Introduction

Attention can be voluntarily directed according to one’s behavioral goals (goal-directed or endogenous attention) or reflexively oriented by salient information in the environment, such as abrupt onsets (stimulus-driven or exogenous attention). Exogenous attentional orienting is crucial for efficient visual search in cluttered scenes (Klein et al., 2023; Li et al., 2023; Ma et al., 2011; Wang & Klein, 2010; Wolfe & Horowitz, 2017). Studies employing the cue-target paradigm (Posner & Cohen, 1984) have shown that exogenous attentional orienting follows a biphasic temporal pattern. An uninformative peripheral cue initially facilitates the processing of a subsequent target at the cued location at short stimulus-onset asynchronies (SOAs), but slows responses to targets at the cued location at long SOAs (typically over 200 ms), a phenomenon known as inhibition of return (IOR) (Klein, 2000; Lupiáñez et al., 2006; Posner et al., 1985; Seidel Malkinson et al., 2024). This characteristic shift from facilitation to inhibition at the cued location has fueled decades of theoretical debate, giving rise to multiple competing accounts of its underlying mechanisms (Funes et al., 2008; Klein & Dick, 2002; Lupiáñez, 2010; Lupiáñez et al., 2001; Milliken et al., 2000; Prime & Jolicoeur, 2009; Taylor & Klein, 1998; Vivas et al., 2007).

Among these, one of the most influential and extensively developed accounts of the biphasic effect is the integration-segregation theory proposed by Lupiáñez and colleagues (Funes et al., 2008; Lupiáñez et al., 2001; Milliken et al., 2000). Rooted in the object file framework (Kahneman et al., 1992), this theory attributes the biphasic pattern to the dynamic competition between two opposing processes, cue-target integration and segregation. Whereas the integration process favors integrating targets at the cued location into an existing episodic representation (an object file) that has been activated by the preceding peripheral cue (Kahneman et al., 1992), the segregation process tends to create a new episodic representation for targets at the uncued locations (Funes et al., 2008; Lupiáñez & Funes, 2005; Lupiáñez et al., 2001; Milliken et al., 2000). At short SOAs (e.g., shorter than about 250 ms), the integration process in the cued trials is more efficient than the segregation process in the uncued trials, resulting in faster responses for the cued than the uncued trials (i.e., the facilitation effect). At longer SOAs, however, the original object file activated by the cue gradually closes, making it less efficient to integrate new stimuli. Consequently, constructing a new object file at the uncued location gradually becomes easier than updating the closing one, resulting in the IOR effect.

Over the past two decades, the integration-segregation theory has been widely accepted as a flexible and extensible framework for explaining the accumulating IOR findings (Chen et al., 2007; Funes et al., 2008; Hu et al., 2011; Li et al., 2018; Luo et al., 2010; Lupiáñez et al., 2001; Lupiáñez et al., 2007; Zu et al., 2023), and it has proven particularly useful in accounting for diverse findings across sensory modalities, feature domains, and task contexts. Originally developed to explain spatial IOR (Funes et al., 2008; Lupiáñez & Funes, 2005; Lupiáñez et al., 2001; Milliken et al., 2000), it has subsequently been extended to nonspatial (e.g., color-, shape-, or frequency-based) IOR in complex environments (Chen et al., 2007; Hu et al., 2011). When the target and the cue share both nonspatial and spatial features, these consecutive stimuli are integrated into a single event representation, hindering the detection of the target (Chen et al., 2007; Hu et al., 2011). Such integrative interference has also been observed across sensory domains, including the auditory modality, and, more recently, in supramodal contexts involving abstract semantic features (Zu et al., 2023). Researchers have further extended the explanatory scope of the integration-segregation framework to electrophysiological data (Li et al., 2018; Martín-Arévalo et al., 2014, 2016). For example, enhanced P3 amplitudes at the uncued locations have been interpreted as reflecting greater cognitive demands associated with new object-file creation underlying IOR (Li et al., 2018), whereas reduced P1 amplitudes at the cued locations have been taken to index a perceptual detection cost arising from disrupted cue-target integration (Martín-Arévalo et al., 2014, 2016).

In addition, the integration-segregation theory also succeeds in accounting for findings that are difficult to explain by other attentional theories, particularly the task-dependent variations in IOR (Chen et al., 2007; Lupiáñez & Milliken, 1999; Lupiáñez et al., 2001; Lupiáñez et al., 2007). According to this framework, the timing of IOR is attributed to task demands, with exogenous orientation being modulated top-down by task relevance. It predicts earlier IOR onsets in detection tasks (favoring an early file closure for new event encoding) and delayed onsets in discrimination tasks (supporting late closure for information accumulation) (Lupiáñez & Milliken, 1999; Lupiáñez et al., 2001). Lupiáñez et al. (2007) further extended this view by showing that, in discrimination tasks, facilitation for infrequent targets also occurred at long SOAs when IOR for frequent targets took place. This finding is incompatible with the attentional capture and disengagement accounts, and instead suggests cue-target integration dependent on the task set (see also its three-factor extension; Lupiáñez, 2010). In auditory attention research, IOR has also been found to vary with task demands, as task-irrelevant features can either enhance or eliminate the IOR effect depending on whether the cue and target share the same task-relevant dimension (Chen et al., 2007). As reported by Chen et al. (2007), this pattern is better explained by the integration-segregation theory than by the traditional accounts. Collectively, by emphasizing the cue-target interplay, the integration-segregation account provides a comprehensive interpretation of exogenous attention within a unified theoretical framework by incorporating stimulus features, modalities, and task demands.

Despite these strengths, current support for the theory remains largely inferential. Specifically, the hypothesized dual processes of integration and segregation have not been directly evidenced in brain activity. This gap likely reflects the fact that empirical research has not kept pace with the theoretical advances: most of the early neuroimaging work was conducted when the theoretical account of IOR was still evolving and the integration-segregation framework was not yet fully formulated. These studies generally assumed that IOR reflects a time-dependent inhibitory state that builds up with increasing SOA, and thus expected stronger brain activation at longer SOAs regardless of cue validity (Lepsien & Pollmann, 2002; Mayer, Seidenberg, et al., 2004). Following this assumption, past studies typically contrasted long and short SOAs to capture the neural dynamics underlying the inhibitory phase of visual attentional orienting, either by collapsing across the cued and uncued trials (Lepsien & Pollmann, 2002; Mayer, Dorflinger, et al., 2004; Müller & Kleinschmidt, 2007; Zhou & Chen, 2008) or by performing the short- vs. long-SOA comparison separately for each cueing condition (Mayer, Seidenberg, et al., 2004). These studies observed the involvement of the frontoparietal attention network, particularly the frontal eye fields (FEF), anterior cingulate cortex (ACC), and inferior parietal lobule (IPL). However, such SOA-based contrasts are insufficient for testing the integration-segregation framework, because they capture only the temporal changes in attentional orienting rather than the functional distinction (integration vs. segregation) that characterizes the theory. To directly test the dual processes of event integration (for the cued targets) and segregation (for the uncued targets), it is necessary to compare the cued and uncued conditions.

Some studies did attempt this direct comparison but did not reveal reliable neural differences between the cued and uncued trials during the inhibitory period (Chen et al., 2006; Lepsien & Pollmann, 2002; Mayer, Seidenberg, et al., 2004). For instance, Chen et al. (2006) reported a cue validity effect, but only in the left FEF. The limited neuroimaging evidence for distinct neural responses to the cued and uncued targets in IOR research was likely due to the statistical power constraints inherent in event-related functional magnetic resonance imaging (ER-fMRI) experiments (despite their high psychological validity and estimation efficiency), further aggravated by the suboptimal temporal structure of stimulus sequences and by the limited sample sizes and trial numbers (Buracas & Boynton, 2002; Liu, 2004; Liu & Frank, 2004; Liu et al., 2001; Wager & Nichols, 2003).

In the current study, we aimed to obtain direct neuroimaging evidence for the integration-segregation theory by employing ER-fMRI with a stimulus sequence optimized within the genetic algorithm (GA) framework (Wager & Nichols, 2003). GAs are a class of flexible search algorithms that iteratively optimize sequences based on multiple fitness measures, improving the statistical power of contrast detection while taking into account the estimation efficiency of the hemodynamic response function (HRF). According to the integration-segregation theory, targets appearing at the previously cued locations engage the integration process, which updates the existing cue-activated object file. In contrast, targets appearing at the uncued locations should trigger the segregation process to form a new object file. Therefore, we predicted dissociable neural activations for the cued versus uncued targets. Specifically, we hypothesized that the uncued targets would show a greater engagement of the regions involved in new episodic encoding, such as the parahippocampal gyrus (PHG) (Burgess et al., 2002; Danieli et al., 2023; Hayes et al., 2007; Li et al., 2016; Menon et al., 2000; Torres-Morales & Cansino, 2024), and that the cued targets would show a stronger activation of the regions involved in information integration and attentional reorienting, such as the FEF (Astafiev et al., 2003; Corbetta & Shulman, 2002; Liu et al., 2023).
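The logic of GA-based sequence optimization can be illustrated with a minimal sketch. It is not the authors' implementation or the Wager and Nichols (2003) toolbox. It assumes a fixed 2-s event spacing, a double-gamma HRF with conventional parameters, two event types (cued and uncued) plus null events, and a single fitness measure (detection efficiency for the cued-vs-uncued contrast), whereas the full framework combines several fitness measures. All function names and parameter values are hypothetical.

```python
import math
import numpy as np

def hrf(t):
    """Double-gamma HRF (peak near 5 s, undershoot near 15 s); parameters are assumed."""
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def design_matrix(seq, n_cond, step=2.0):
    """One HRF-convolved column per condition (codes 1..n_cond; 0 = null) plus an intercept."""
    kernel = hrf(np.arange(0.0, 32.0, step))
    cols = [np.convolve((seq == c).astype(float), kernel)[: len(seq)]
            for c in range(1, n_cond + 1)]
    return np.column_stack(cols + [np.ones(len(seq))])

def detection_efficiency(seq, contrasts, n_cond):
    """Efficiency = 1 / trace(C (X'X)^-1 C'); higher means smaller contrast variance."""
    if len(np.unique(seq[seq > 0])) < n_cond:
        return 0.0  # reject sequences in which a condition never occurs
    X = design_matrix(seq, n_cond)
    C = np.hstack([contrasts, np.zeros((contrasts.shape[0], 1))])  # zero weight on intercept
    var = np.trace(C @ np.linalg.pinv(X.T @ X) @ C.T)
    return 1.0 / var if var > 0 else 0.0

def ga_optimize(contrasts, n_cond=2, n_events=120, pop_size=30, n_gen=40,
                p_mut=0.02, seed=0):
    """Evolve event sequences: keep the fitter half, refill by crossover and mutation."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, n_cond + 1, size=(pop_size, n_events))
    for _ in range(n_gen):
        fit = np.array([detection_efficiency(s, contrasts, n_cond) for s in pop])
        parents = pop[np.argsort(fit)[::-1][: pop_size // 2]]  # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_events)                      # single-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            mutate = rng.random(n_events) < p_mut                # random point mutations
            child[mutate] = rng.integers(0, n_cond + 1, size=mutate.sum())
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    fit = np.array([detection_efficiency(s, contrasts, n_cond) for s in pop])
    return pop[np.argmax(fit)], float(fit.max())

if __name__ == "__main__":
    cued_vs_uncued = np.array([[1.0, -1.0]])
    best_seq, best_fit = ga_optimize(cued_vs_uncued)
    print("best efficiency:", best_fit)
```

Because the fitter half of each generation is carried over unchanged, the best efficiency in this sketch cannot decrease across generations. A realistic application would add fitness terms for HRF estimation efficiency and counterbalancing.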

In addition to the main comparison of the cued vs. uncued locations (i.e., IOR generation mechanisms), we also investigated IOR expression by systematically manipulating the type of target stimulus, varying the color-to-response-key mappings in a Stroop paradigm (De Houwer, 2003; Veen & Carter, 2005; Veen et al., 2001). Three types of stimuli were presented: a neutral condition (non-color words shown in color, producing no conflict), a semantic conflict condition (word meaning and ink color were incongruent, but mapped to the same response), and a combined semantic-response conflict condition (word meaning and ink color were incongruent and mapped to different responses). This design enabled us to examine how spatial attention interacts with distinct levels of cognitive conflict. A previous fMRI study by Chen et al. (2006) reported dissociable neural signatures for the semantic and response conflicts when spatial attention was engaged. Specifically, the increased dorsolateral prefrontal cortex (DLPFC) activity at the cued locations was interpreted within the inhibitory tagging framework as reflecting a temporary disconnection of pre-response representations from response processes. However, that study manipulated response eligibility by excluding certain incongruent color words from the response set (Milham et al., 2001). This method has been criticized for conflating the semantic and response conflicts, as ineligible distractors may not be processed in the same way as task-relevant words (Veen & Carter, 2005). To overcome this limitation, the present study adopted an improved Stroop design that clearly separates the semantic and response conflicts (De Houwer, 2003; Veen & Carter, 2005; Veen et al., 2001). This approach provided a more precise test of whether spatial attention differentially modulates these two conflict types and their associated neural mechanisms.

Results

Participants performed a spatial cueing task (long SOA to elicit IOR) combined with a Stroop paradigm adapted for colored Chinese characters (Chen et al., 2006) (Fig 1A), with the characters appearing at either the cued or the uncued location. The experimental manipulation dissociated the semantic and response conflicts, following a well-established three-condition design (Fig 1B). These conditions were neutral (NE; non-color characters), semantically incongruent (SI; the character and the color are incongruent but mapped to the same response, causing only the semantic conflict), and response-incongruent (RI; the character and the color are incongruent and mapped to opposite responses, causing both the semantic and response conflicts) (De Houwer, 2003; Veen & Carter, 2005). Participants responded using two keys, each assigned to two colors. To ensure sufficient statistical power for detecting condition-specific neural differences, the ER-fMRI design was optimized using the Genetic Algorithm (Wager & Nichols, 2003), with the goal of maximizing experimental efficiency for three contrasts of interest. To directly evaluate the prediction of the integration-segregation theory, we first examined the brain activity differences between the conditions of cued-NE (targets at the cued location in the neutral condition) and uncued-NE (targets at the uncued location in the neutral condition). Subsequent analyses examined how IOR modulated the conflict-related neural activities. The contrast of cued-SI minus cued-NE vs. uncued-SI minus uncued-NE was used to assess the effect of IOR on semantic conflict processing, whereas the contrast of cued-RI minus cued-SI vs. uncued-RI minus uncued-SI was employed to capture the modulation of response conflict by IOR.
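The three contrasts of interest described above can be written as weight vectors over the six cells of the design. The sketch below is illustrative only: the ordering of the conditions, the function name, and the example parameter estimates are assumptions, not values from the study.

```python
import numpy as np

# Assumed (hypothetical) ordering of the six condition regressors.
CONDITIONS = ["cued-NE", "cued-SI", "cued-RI", "uncued-NE", "uncued-SI", "uncued-RI"]

CONTRASTS = {
    # cued-NE minus uncued-NE
    "IOR in neutral trials": np.array([1, 0, 0, -1, 0, 0]),
    # (cued-SI minus cued-NE) minus (uncued-SI minus uncued-NE)
    "IOR x semantic conflict": np.array([-1, 1, 0, 1, -1, 0]),
    # (cued-RI minus cued-SI) minus (uncued-RI minus uncued-SI)
    "IOR x response conflict": np.array([0, -1, 1, 0, 1, -1]),
}

def contrast_value(betas, weights):
    """Weighted sum of the condition-wise parameter estimates."""
    return float(np.dot(weights, betas))

if __name__ == "__main__":
    example_betas = np.array([0.8, 0.7, 1.0, 0.5, 0.9, 1.1])  # made-up values
    for name, weights in CONTRASTS.items():
        print(name, round(contrast_value(example_betas, weights), 3))
```

Each weight vector sums to zero, and the two interaction contrasts are differences of differences, so a negative value indicates a smaller conflict effect at the cued than at the uncued location.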

Fig 1. Experimental Materials.

A. Trial sequence and display sizes. Each trial started with a 150-ms non-informative cue at one of the peripheral boxes, followed by a 150-ms fixation cue with a 300-ms SOA. The target was a 450-ms colored Chinese character presented 600 ms after the fixation cue onset at one of the two target locations with equal probability. ISI = interstimulus interval; ITI = intertrial interval. B. The character-color combinations in the three congruency conditions. In the neutral condition (first row), the characters were not color-related. In the other conditions, the characters were color names (translations added for illustration purposes). S-R mapping = stimulus-response mapping; NE = neutral; SI = semantically incongruent; RI = response-incongruent.

Behavioral Results

Mean reaction times (RTs) and accuracies are shown in Fig 2. A two-way (Cue Validity × Congruency) repeated-measures analysis of variance (rm-ANOVA) for the RTs revealed a significant IOR effect (main effect of Cue Validity), F(1, 28) = 12.057, p = .002, ηp² = 0.301, showing slower responses to targets at the cued location (M ± SE: 642 ± 20 ms) than at the uncued location (632 ± 19 ms). The main effect of Congruency was also significant, F(1.53, 42.88) = 29.602, p < .001, ηp² = 0.514 (the Greenhouse–Geisser correction was applied due to the violation of the sphericity assumption). Post hoc comparisons with the Holm-Bonferroni correction (Holm, 1979) revealed significant differences among all conditions (NE vs. SI: t(28) = −2.179, p = .038, Cohen’s d = 0.071; NE vs. RI: t(28) = −5.957, p < .001, Cohen’s d = 0.275; SI vs. RI: t(28) = −6.715, p < .001, Cohen’s d = 0.203), with NE showing the shortest RT (624 ± 19 ms), followed by SI (632 ± 20 ms) and then RI (654 ± 21 ms). These data clearly isolated the two distinct conflict components of the Stroop effect, namely the semantic conflict (SI-NE difference) and the response conflict (RI-SI difference). The interaction did not approach significance, F(2, 56) = 0.930, p = .401, ηp² = 0.032.
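The effect sizes accompanying the F statistics above are numerically consistent with partial eta squared, which can be recovered from an F value and its degrees of freedom as F × df_effect / (F × df_effect + df_error). The short check below assumes that this is the reported measure; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)

if __name__ == "__main__":
    # F values and degrees of freedom as reported for the RT analysis.
    reported = {
        "Cue Validity": (12.057, 1, 28),
        "Congruency (Greenhouse-Geisser)": (29.602, 1.53, 42.88),
        "Cue Validity x Congruency": (0.930, 2, 56),
    }
    for effect, (f_value, df1, df2) in reported.items():
        print(effect, round(partial_eta_squared(f_value, df1, df2), 3))
```

Rounded to three decimals, these computations give 0.301, 0.514, and 0.032, matching the values reported with each test.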

Fig 2. Behavioral Results.

Mean reaction times (A) and accuracies (B) as a function of cue validity and congruency. Error bars extend to one standard error of the mean (SEM).

The rm-ANOVA for the accuracy data (Fig 2B) only showed a significant main effect of Congruency, F(2, 56) = 7.685, p = .001, ηp² = 0.215. Post hoc comparisons confirmed that this came from a lower accuracy in the RI condition (M ± SE: 0.943 ± 0.009) than in the NE (0.966 ± 0.009; t(28) = −3.596, p = .002, Cohen’s d = 0.446) and SI (0.963 ± 0.009; t(28) = −3.150, p = .005, Cohen’s d = 0.391) conditions. No significant difference was found between the NE and SI conditions (t(28) = 0.446, p = .657, Cohen’s d = 0.055). The main effect of Cue Validity (F(1, 28) = 0.021, p = .887, ηp² < 0.001) and the interaction (F(2, 56) = 1.298, p = .281, ηp² = 0.044) were not significant. No speed-accuracy trade-off was evident in the current data.
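For readers unfamiliar with the Holm-Bonferroni procedure (Holm, 1979) used for the post hoc comparisons, the step-down adjustment can be sketched as follows. The p-values in the example are hypothetical and are not taken from the present data, and the function name is an assumption.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest p to largest
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1) and capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values are forced to be non-decreasing along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

if __name__ == "__main__":
    hypothetical_p = [0.01, 0.04, 0.03]  # made-up unadjusted p-values
    print([round(p, 3) for p in holm_adjust(hypothetical_p)])
```

With three comparisons, the smallest p-value is multiplied by 3, the next by 2, and the largest by 1, with each adjusted value constrained to be no smaller than the preceding one.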

Neuroimaging Results

IOR Effect in the Neutral Condition

The contrast between the cued-NE and uncued-NE conditions was examined to identify the underlying neural mechanisms of the IOR effect during the processing of neutral targets. Whole-brain fMRI findings revealed two distinct activation patterns in response to these conditions (Fig 3A). Relative to the uncued-NE condition, the cued-NE condition showed enhanced activations in the dorsal attention network (DAN) including the bilateral FEF and IPS, along with the right-lateralized TPJ from the ventral attention network (VAN), and the left dACC. In contrast, the uncued-NE condition demonstrated stronger activations in the bilateral PHG and STG than the cued-NE condition. To further compare the activity levels in each brain region between the cued-NE and uncued-NE conditions, paired t-tests were conducted on the average parameter estimates (beta weights) in the left and right IPS, FEF, PHG, and STG, as well as the left dACC and right TPJ (all ps < .001, see Fig 3B). Detailed information on the activated regions’ coordinates, cluster sizes, and statistical significance is provided in Table 1.

Fig 3. IOR Effect in the Neutral Condition and Parameter Estimation.

A. Brain regions showing significant activations in the contrast between the cued-NE and uncued-NE conditions, with a threshold of p < 0.005 (uncorrected) with a minimum cluster size of 540 mm³ (20 voxels), yielding a corrected p < 0.05 based on 2,500 Monte Carlo simulations in BrainVoyager. Warm colors represent stronger activations in the cued condition, and cold colors represent stronger activations in the uncued condition. B. Parameter estimates for each activation region. Error bars extend to 1 SEM. L = left; R = right. *** p < .001.

Table 1. Brain regions showing significant activation differences between the cued-NE and uncued-NE conditions.

Effect of IOR on Semantic Conflict

Although IOR did not modulate either the semantic conflict difference (SI-NE) or the response conflict difference (RI-SI) in behavioral performance, differential neural activities were observed between these conditions (summarized in Fig 4 and Table 2). The effect of IOR on the semantic conflict was examined as the contrast between the SI-NE differences (SI minus NE) in the cued and the uncued conditions. As illustrated in Fig 4A, the right dACC showed significantly reduced activation in this contrast. A two-way rm-ANOVA was conducted on the average parameter estimates (beta weights) obtained from these contrasts for each activated region. The results confirmed a significant interaction between semantic conflict and IOR in the right dACC, F(1,28) = 15.946, p < .001, ηp² = 0.363. Greater neural activities were found in the SI condition compared to the NE condition when the targets were presented at the uncued location (t(28) = 3.262, p = .003, Cohen’s d = 0.606), but not for the targets at the cued location (t(28) = −1.010, p = .321, Cohen’s d = 0.187).

Fig 4. IOR Effect under the Semantic Conflict (A) and the Response Conflict (B).

These two effects were quantified as cued-SI minus cued-NE > uncued-SI minus uncued-NE and cued-RI minus cued-SI > uncued-RI minus uncued-SI, respectively. Parameter estimations were based on a threshold of p < 0.005 (uncorrected), with a minimum cluster size of 540 mm³ (20 voxels), yielding a corrected p < 0.05 based on 2,500 Monte Carlo simulations in BrainVoyager. Error bars extend to 1 SEM. **p < .01, *p < .05, n.s. = non-significant.

Table 2. Brain regions showing a significant modulation effect of IOR on semantic conflict (cued-SI minus cued-NE > uncued-SI minus uncued-NE) or response conflict (cued-RI minus cued-SI > uncued-RI minus uncued-SI).

Effect of IOR on Response Conflict

To explore the influence of IOR on response conflict, we compared the RI–SI difference between the cued and the uncued conditions (Fig 4B and Table 2). The right superior parietal cortex (SPC) showed a significant activation reduction (Fig 4B, left), while the right putamen exhibited an activation enhancement (Fig 4B, right). A two-way rm-ANOVA on the beta weights revealed a significant interaction in the right SPC, F(1,28) = 20.833, p < .001, ηp² = 0.427. Specifically, it showed greater activations in the RI condition compared to the SI condition when the targets were presented at the uncued location (t(28) = 3.447, p = .002, Cohen’s d = 0.640), but not at the cued location (t(28) = −0.962, p = .344, Cohen’s d = 0.179). The right putamen also demonstrated a significant interaction (F(1,28) = 26.686, p < .001, ηp² = 0.488), but with a different pattern. The activation was stronger in the RI than in the SI condition at the cued location (t(28) = 2.983, p = .006, Cohen’s d = 0.554), and the opposite pattern was observed at the uncued location (t(28) = −2.404, p = .023, Cohen’s d = 0.446).

Discussion

The integration-segregation theory has emerged as an influential framework for explaining the dynamic effects of exogenous attention (Chen et al., 2007; Funes et al., 2008; Hu et al., 2011; Li et al., 2018; Lupiáñez et al., 2001; Lupiáñez et al., 2007; Zu et al., 2023), attributing the shift from early attentional facilitation to later IOR to the dynamic processes of cue-target integration and segregation (Funes et al., 2008; Lupiáñez & Funes, 2005; Lupiáñez et al., 2001; Milliken et al., 2000). In the current study, by contrasting the cued versus the uncued targets, we provided the first direct neuroimaging evidence supporting this theory by dissociating the brain activation patterns associated with these two processes. Stronger responses were found in the bilateral FEF and IPS, right TPJ, and left dACC for the cued than for the uncued targets. Relative to the cued targets, the uncued targets showed stronger activations in the bilateral PHG and STG.

The heightened activation observed for targets appearing at the cued locations (particularly in the bilateral FEF, IPS, and right TPJ) likely reflects the attentional demand associated with the integration process. According to the integration-segregation theory (Funes et al., 2008; Lupiáñez & Funes, 2005; Lupiáñez et al., 2001; Milliken et al., 2000), the cue-initiated object file is likely to have closed or begun closing under the long SOA conditions, hindering immediate integration of the subsequent targets. To integrate a target appearing again at the cued location, the object file needs to be reopened through a reallocation of attentional resources. Our neuroimaging data captured this process by showing coordinated activation in the bilateral FEF and IPS (key nodes of the dorsal attention network) and in the right TPJ (a core region of the ventral attention network) (Ahrens et al., 2019; Corbetta & Shulman, 2002; Fox et al., 2006; Vossel et al., 2014). These regions act in concert to support the attentional shifts and reorienting necessary for reopening the object file for integration. Additionally, the observed increase in left dACC activity under the cued relative to the uncued condition likely reflects the engagement of cognitive control mechanisms (Botvinick et al., 2004; Chung et al., 2024; Mayer et al., 2012; Veen & Carter, 2005) needed to meet the task-driven requirement of target integration when the cue-initiated representation has become less accessible. It is also possible that the heightened activation of the dACC represents the inhibitory bias toward the direction of the cued location (Mayer, Seidenberg, et al., 2004).

In contrast, the stronger activation in the uncued condition was observed in the bilateral PHG extending into the STG. Located in the medial temporal lobe surrounding the hippocampus, the PHG is critically involved in episodic encoding, particularly for novel visual or spatial stimuli (Burgess et al., 2002; Danieli et al., 2023; Hayes et al., 2007; Li et al., 2016; Menon et al., 2000; Torres-Morales & Cansino, 2024). In our task, the targets appearing at the uncued locations likely represented novel events requiring new spatial registration. Following the theorized segregation process, such events require the creation of new object files (Kahneman et al., 1992) by engaging brain regions involved in novelty detection and contextual updating. The enhanced PHG/STG activities observed in the uncued condition may reflect the encoding of new spatial representations of the segregation process.

Our data provided clear support for the integration-segregation theory. It is also noteworthy that, although prior studies investigated the neural mechanisms of IOR (Bourgeois et al., 2013a, 2013b; Hanlon et al., 2017; Lepsien & Pollmann, 2002; Mayer, Dorflinger, et al., 2004; Mayer et al., 2007; Mayer, Seidenberg, et al., 2004; Müller & Kleinschmidt, 2007; Satel et al., 2019; Yang & Mayer, 2014; Zhou & Chen, 2008), none identified distinct activation patterns corresponding to the integration and segregation processes as in our data. Specifically, most of the previous IOR studies did not show significant brain activations when contrasting the cued and uncued conditions (Lepsien & Pollmann, 2002; Mayer, Seidenberg, et al., 2004), with the exception of Chen et al. (2006), who reported a cue-validity effect confined to the left FEF. Instead, some indirect approaches, such as comparing long- and short-SOA trials while collapsing over the cueing conditions, reported activations in regions such as the FEF, TPJ, ACC, and posterior parietal cortex (Lepsien & Pollmann, 2002; Mayer, Dorflinger, et al., 2004; Mayer, Seidenberg, et al., 2004; Müller & Kleinschmidt, 2007; Zhou & Chen, 2008), showing some similarity with the integration-related network observed in the current study. However, the findings were inconsistent across studies, with some reporting only a limited subset of regions and others showing lateralized instead of bilateral effects (e.g., stronger right-hemisphere FEF activation; Lepsien & Pollmann, 2002; Mayer, Dorflinger, et al., 2004).
Similar frontoparietal engagement has also been observed in auditory and cross-modal IOR studies (Hanlon et al., 2017; Mayer et al., 2009; Mayer et al., 2007; Yang & Mayer, 2014), typically present across various SOAs (e.g., sustained activation in both the dorsal and ventral frontoparietal regions regardless of SOA length; Hanlon et al., 2017) or showing SOA-dependent effects (e.g., reversed direction of activation differences between short and long SOAs; Mayer et al., 2007). Complementing these observations, transcranial magnetic stimulation (TMS) studies have provided causal evidence for the contribution of frontoparietal regions in IOR (Bourgeois et al., 2013a, 2013b; Chica et al., 2011; Ro et al., 2003). For instance, stimulation over the right FEF during the cue-target interval has been shown to eliminate the typical IOR effect for the cued targets in the ipsilateral hemifield (Ro et al., 2003). Similarly, TMS applied to the right IPS/TPJ also disrupted the IOR effect (Bourgeois et al., 2013a; Chica et al., 2011), whereas stimulation over their left-hemisphere counterparts did not cause much change in IOR (Bourgeois et al., 2013b). These findings suggest a possible right-lateralized neural organization of the integration process. However, this lateralization notion conflicts with the largely bilateral activation pattern observed in our study. The lack of systematic testing for the left-hemisphere contribution in previous TMS studies leaves this asymmetry open to further investigation. Notably, despite offering partial (and often lateralized) support for the integration process, none of these prior studies have addressed the neural mechanisms underlying the segregation process, which is uniquely revealed by the present neuroimaging findings.

The above discrepancies between our findings and the previous studies may stem from several methodological and design factors. Firstly, the prior studies likely introduced confounds when investigating IOR indirectly. When comparing long and short SOAs, the observed effects may have been jointly influenced by factors unrelated to IOR, such as working memory (i.e., increased demand of maintaining the cue representation over longer intervals; Mayer et al., 2007) and temporal attention (i.e., distinct temporal expectations formed by variations in SOA; Nobre & van Ede, 2018). Moreover, the IOR effect depends not only on cue-induced attentional orienting, but also on the dynamic interaction between the target onset and the ongoing cue-related neural activity (Lupiáñez, 2010; Nobre & van Ede, 2018; Taylor & Donnelly, 2002). These confounds could potentially obscure the genuine IOR effect. Secondly, differences in statistical power may also account for the discrepancies. In the present study, we employed an optimized GA stimulus sequence (Wager & Nichols, 2003), which provides greater statistical power than simple random sequences while maintaining a high estimation efficiency (for details, see the Methods and Supplementary Information sections). This optimization likely enhanced the reliability of the estimated neural responses (Wager & Nichols, 2003). In addition, the previous neuroimaging studies on IOR often relied on relatively small sample sizes (around 10-12 participants; Chen et al., 2006; Mayer, Dorflinger, et al., 2004; Mayer, Seidenberg, et al., 2004; Müller & Kleinschmidt, 2007) or a limited number of trials (e.g., 30 trials per condition; Lepsien & Pollmann, 2002), leading to much reduced statistical power and a higher probability of false negatives. 
In contrast, the current study increased both the number of trials and the sample size, which should have enhanced the sensitivity for detecting differences between experimental conditions (Baker et al., 2021; Chen et al., 2022). Finally, task design differences may further contribute to the observed inconsistencies. The earlier studies often employed simple localization or detection tasks (Lepsien & Pollmann, 2002; Mayer, Dorflinger, et al., 2004; Mayer, Seidenberg, et al., 2004; Müller & Kleinschmidt, 2007), while the current study adopted a discrimination task. According to the integration-segregation theory (Funes et al., 2008; Lupiáñez & Funes, 2005; Lupiáñez et al., 2001; Lupiáñez et al., 2007; Milliken et al., 2000), more complex stimuli may require greater cognitive resources to establish object files, leading to more extensive object-file processing and thus making the underlying integration and segregation processes easier to detect.

Another novelty of the current study is the integration of the IOR paradigm with a modified Stroop task that separates semantic- and response-level conflicts (De Houwer, 2003; Veen & Carter, 2005). Through this design, we made an additional discovery about how IOR modulates the ongoing Stroop interference effect at the inhibited (i.e., cued) locations. Behaviorally, our results showed no significant interaction between IOR and either type of Stroop conflict, failing to replicate the previous findings (Chen et al., 2006; Vivas & Fuentes, 2001) of reduced Stroop interference at the cued relative to the uncued locations. Yet, at the neural level, the brain regions involved in conflict processing were engaged in the interaction between IOR and the Stroop effect. The right dACC, which is involved in semantic conflict processing (Li et al., 2017; Milham et al., 2001; Veen & Carter, 2005), appeared to serve as a critical neural interface for the interaction between semantic conflict and IOR. Specifically, in the uncued condition, semantic incongruency elicited stronger activations than the neutral condition, a pattern that disappeared (and numerically reversed) in the cued condition. Regarding the interaction between response conflict and IOR, the right SPC, which is involved in detecting response conflict and orienting spatial attention (Li et al., 2017), played a key role. Similarly, this region exhibited stronger conflict effects (i.e., greater activation in the RI than in the SI condition) in the uncued condition than in the cued condition. These results can be interpreted in terms of the inhibitory tagging mechanism proposed by Fuentes et al. (1999), which posits that, when attention is drawn away from a cued location, stimuli presented there are temporarily tagged with inhibition (Fuentes et al., 2000; Fuentes et al., 1999; Vivas & Fuentes, 2001). ERP evidence supporting this mechanism was reported by Zhang et al. (2012), who showed that the Stroop conflict-related N450 effects were delayed and attenuated at the cued compared to the uncued locations, suggesting a temporary disruption of the stimulus-response link. Such inhibitory tagging may attenuate or even disrupt conflict processing at the inhibited location, offering a plausible account for the neural interactions between IOR and Stroop conflicts observed in our study. The current results also suggest that the effects of inhibitory tagging may not be limited to stimulus-response connections (as proposed by Fuentes et al., 1999), but may extend to semantic representations, as evidenced by the modulation of the right dACC observed in our study. This notion is consistent with a previous finding that the N400 ERP component (a biomarker of semantic processing) had a decreased amplitude for targets at the cued position (Zhang & Zhang, 2007), which highlights that spatial attention can affect subsequent cognitive processes at the semantic level (Cristescu & Nobre, 2008; Zhang & Zhang, 2007).

Furthermore, we observed pronounced neural responses in the right putamen when contrasting the RI and SI conditions at the cued versus the uncued locations. The putamen is a subcortical nucleus in the basal ganglia and has been found to be involved in the control of response interference (Schmidt et al., 2018; Schmidt et al., 2020). For example, Schmidt et al. (2020) demonstrated that the dorsal striatum, including the putamen, is engaged during Simon-type interference by supporting task-appropriate response selection and suppression of competing alternatives, and damage to this structure leads to less efficient interference control (Schmidt et al., 2018). These findings support the view that the putamen is recruited when interference arises at the response-selection level. Building on this, we speculate that the enhanced putamen activation in the cued conditions in the current study reflects an increased demand for response control when attentional resources were reduced by IOR. Taken together, our findings highlight a potential neural basis for the interaction between IOR and conflict processing, encompassing both the semantic and response domains.

However, these results should be interpreted with caution given the absence of behavioral support. One potential explanation for this dissociation is the use of the GA-optimized sequence in the current study, which prioritizes both detection and HRF estimation efficiency. This optimization caused the event order to converge towards a block-like structure, thereby reducing event counterbalancing and increasing sequential predictability (Wager & Nichols, 2003). Such unintended regularities may have influenced participants’ behavioral strategies (e.g., forming expectations about upcoming events), thereby weakening the correspondence between the neural and behavioral findings. Future studies should address this limitation by employing optimized designs that also take psychological factors into account (e.g., event counterbalancing; Wager & Nichols, 2003) to better validate the observed neural mechanisms. Alternatively, the observed neural-behavioral dissociation may reflect differences in measurement sensitivity between the neural and behavioral indices (Chen et al., 2006; Wilkinson & Halligan, 2004). A similar pattern of dissociation was reported by Chen et al. (2006), who found that the response conflict was not modulated by IOR behaviorally, yet the regions associated with conflict resolution exhibited stronger activations in the cued condition. As noted by Wilkinson and Halligan (2004), RTs and accuracies are not perfect measures of cognition, whereas neural signals can reveal finer-grained or “hidden” processes that precede overt behavior. Thus, neural modulations may emerge even in the absence of detectable behavioral differences.

In conclusion, the current study provides the first direct neuroimaging evidence lending support to the hypothesis of the integration–segregation theory (Funes et al., 2008; Lupiáñez & Funes, 2005; Lupiáñez et al., 2001; Milliken et al., 2000). We revealed distinct neural mechanisms for processing of the cued and uncued targets during IOR, with attentional integration engaging the frontoparietal attention network (FEF, IPS, TPJ, dACC) and segregation recruiting the medial temporal regions (PHG–STG) associated with new object-file formation and novelty encoding. These dissociated activations offered direct support for the dynamic interplay between the integration and segregation processes. We also identified interactions between IOR and cognitive conflict in brain activities, suggesting that attentional orienting can modulate conflict processing at both the semantic and response levels. Taken together, our findings revealed the neural underpinnings of the integration-segregation theory and advanced our understanding of the neural mechanisms linking exogenous attentional orienting and cognitive control.

Methods

Participants

Thirty-two healthy participants with normal or corrected-to-normal vision and normal color vision were recruited. None reported a history of neurological disorders. Data from three participants were excluded due to excessive head movements and high global variances (see fMRI Data Analysis), leaving 29 participants for analysis (18 female, 11 male; mean age ± SD = 22.69 ± 2.58 years). All participants were naïve to the purpose of the study, provided written informed consent in accordance with procedures approved by the Ethics Committees of Northeast Normal University and Soochow University, and received monetary compensation. The sample size was informed by a power analysis using MorePower 6.0 (Campbell & Thompson, 2012) for a within-subjects rm-ANOVA. To achieve a statistical power of 80% at the threshold of α = .05 (Chen et al., 2006), 14 participants were required. We also acknowledged that effect sizes from published studies are often inflated due to publication bias (Albers & Lakens, 2018). To mitigate this potential risk, we decided to collect data from a sample at least twice the size suggested by the power analysis (i.e., N ≥ 28).

Experimental Design

The experiment adopted a within-subjects design with two factors, namely Cue Validity (cued vs. uncued) and Congruency (SI, RI, and NE). The targets appeared at the cued location in the cued trials and at the other peripheral location in the uncued trials. The Congruency factor referred to the relationship between the color and the meaning of the Chinese characters (i.e., targets). In total, eight characters and four colors were used (see Fig 1B). In an SI trial, the color and the character meaning differed but were mapped to the same response key (e.g., the character “红” [“red”] displayed in green, which was mapped to the same response key). In an RI trial, the color and the character meaning differed and were mapped to different response keys (e.g., “红” displayed in yellow, which was mapped to the other key). The NE trials used characters whose meanings were unrelated to color and that shared the same orthographic structures (character complexity and form) as the color-meaning characters. In addition to the six experimental conditions, we added a null condition, in which no Chinese character was shown, to increase the statistical power of detecting differences across conditions (Friston et al., 1999).
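The congruency rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' experiment code: the category labels `"A"` and `"B"`, the function name `classify_trial`, and the use of `None` to mark a neutral character are all assumptions introduced here.

```python
# Assumed labels for the two response categories described in the task:
# red/green share one key, blue/yellow share the other.
COLOR_CATEGORY = {"red": "A", "green": "A", "blue": "B", "yellow": "B"}


def classify_trial(word_meaning, ink_color):
    """Return 'NE', 'SI', or 'RI' for one target character.

    word_meaning: color named by the character, or None for a neutral character.
    ink_color:    display color of the character.
    """
    if ink_color not in COLOR_CATEGORY:
        raise ValueError(f"unknown ink color: {ink_color!r}")
    if word_meaning is None:
        return "NE"  # character meaning unrelated to color
    if word_meaning not in COLOR_CATEGORY:
        raise ValueError(f"unknown color word: {word_meaning!r}")
    if word_meaning == ink_color:
        # Identical word and ink color is not one of the three conditions described.
        raise ValueError("word and ink color must differ in SI and RI trials")
    same_key = COLOR_CATEGORY[word_meaning] == COLOR_CATEGORY[ink_color]
    # Same key: only the meanings conflict (SI); different keys: responses conflict too (RI).
    return "SI" if same_key else "RI"
```

For the examples given in the text, `classify_trial("red", "green")` returns `"SI"` and `classify_trial("red", "yellow")` returns `"RI"`.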

It is worth noting that the statistical power of effects in rapid ER-fMRI depends greatly on the specific sequences of stimulus events (Liu & Frank, 2004; Wager & Nichols, 2003). To ensure high design efficiency, we optimized the stimulus sequences using a genetic algorithm (Wager & Nichols, 2003; see Supplementary Information for details). This optimization improves the detection efficiency for the contrasts of interest by moderately sacrificing the efficiency of less relevant contrasts (Wager & Nichols, 2003). In the current study, we focused on three contrasts: cued-NE vs. uncued-NE, cued-SI minus cued-NE vs. uncued-SI minus uncued-NE, and cued-RI minus cued-SI vs. uncued-RI minus uncued-SI. These contrasts respectively examined the IOR effect, the modulation of semantic conflict processing by IOR, and the modulation of response conflict processing by IOR. The optimized sequences were used for all but two participants, whose trial sequences were constructed using a truncated M-sequence (Buracas & Boynton, 2002) implemented in an earlier version of the experiment.
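As an illustration, the three contrasts of interest can be written as weight vectors over the six experimental conditions. The sketch below is not taken from the authors' analysis scripts: the condition ordering, the helper names, and the sign convention (cued minus uncued) are assumptions made here for clarity.

```python
# Assumed ordering of the six experimental conditions.
CONDITIONS = ["cued-SI", "cued-RI", "cued-NE", "uncued-SI", "uncued-RI", "uncued-NE"]


def contrast(positive, negative):
    """Build a weight vector with +1 for `positive` and -1 for `negative` conditions."""
    weights = {c: 0 for c in CONDITIONS}
    for c in positive:
        weights[c] += 1
    for c in negative:
        weights[c] -= 1
    return [weights[c] for c in CONDITIONS]


# IOR effect: cued-NE vs. uncued-NE.
IOR = contrast(["cued-NE"], ["uncued-NE"])
# Semantic conflict x IOR: (cued-SI - cued-NE) - (uncued-SI - uncued-NE).
SEMANTIC_X_IOR = contrast(["cued-SI", "uncued-NE"], ["cued-NE", "uncued-SI"])
# Response conflict x IOR: (cued-RI - cued-SI) - (uncued-RI - uncued-SI).
RESPONSE_X_IOR = contrast(["cued-RI", "uncued-SI"], ["cued-SI", "uncued-RI"])


def apply_contrast(weights, betas):
    """Weighted sum of per-condition estimates (betas given in CONDITIONS order)."""
    return sum(w * b for w, b in zip(weights, betas))
```

Each vector sums to zero, so a contrast value reflects only differences between conditions and is unaffected by a constant added to all six estimates.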

Stimuli and Procedure

Each participant completed two functional scans (i.e., experimental runs) and one anatomical scan in a single session. Each experimental run employed a rapid event-related design and had each of the seven conditions (six experimental conditions plus the null condition) repeated 48 times (336 trials per run). Across the two runs, this yielded a total of 672 trials (96 trials per condition).

All trials presented a three-box display against a gray background, comprising a central black fixation box (1°×1°, line width of 0.02°) and two black placeholder boxes (1.5°×1.5°, line width of 0.02°) positioned 4° (center-to-center) to the left and right of the fixation box. Each run began and ended with this display for 16 and 20 s, respectively. The trial sequence is illustrated in Fig 1A. In a null trial, only the three boxes were shown for the trial duration. In each of the six experimental conditions, a trial started with one of the peripheral boxes turning white with a line width of 0.05° for 150 ms to attract attention to this peripheral location (cue). At 150 ms after the offset of the peripheral cue, the central fixation box turned white with a line width of 0.05° for 150 ms to draw attention back to the central location (central cue). After another 450 ms, a colored Chinese character (in the STSong font, 1.4°×1.4°) was presented (target) for 450 ms inside one of the two peripheral boxes with equal probability. Participants were required to ignore the meaning of the character and identify its color as quickly and accurately as possible by pressing one of the two keys designated for the color categories (red/green and blue/yellow) with their middle and index fingers, respectively (Fig 1B). The color category-button mapping was counterbalanced across participants. Furthermore, to avoid a possible occurrence of the Simon effect (Klein & Ivanoff, 2011), the response keys were vertically arranged. Each trial ended with an inter-trial interval (ITI) with a duration of 850, 1,050, 1,250, or 1,450 ms (randomized with equal probabilities). The average trial duration was 2,500 ms.
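The trial timeline described above can be checked with a short arithmetic sketch. The constant and function names are hypothetical, and the code assumes that the 450-ms interval before the target starts at the offset of the central cue; under this reading the event durations add up to the reported average trial duration.

```python
# Event durations in milliseconds, taken from the procedure description.
CUE_MS = 150                # peripheral cue
CUE_TO_CENTRAL_MS = 150     # blank interval between cue offset and central cue
CENTRAL_CUE_MS = 150        # central cue
CENTRAL_TO_TARGET_MS = 450  # assumed to run from central-cue offset to target onset
TARGET_MS = 450             # target character
ITI_MS = (850, 1050, 1250, 1450)  # equiprobable inter-trial intervals


def cue_target_soa_ms():
    """Time from peripheral-cue onset to target onset."""
    return CUE_MS + CUE_TO_CENTRAL_MS + CENTRAL_CUE_MS + CENTRAL_TO_TARGET_MS


def mean_trial_duration_ms():
    """Expected trial duration when the four ITIs are equally likely."""
    mean_iti = sum(ITI_MS) / len(ITI_MS)
    return cue_target_soa_ms() + TARGET_MS + mean_iti
```

Under these assumptions the cue-target SOA is 900 ms and the mean trial duration is 2,500 ms, in line with the value reported in the text.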

Before scanning, all participants completed a two-part practice session outside the scanner to familiarize themselves with the task and the stimuli. In the first part, the participants practiced a discrimination task with only color patches (no Chinese characters) using the predefined color category-button mapping. Once they had reached an accuracy of 96%, the participants proceeded to the second part and completed 24 practice trials of the experimental task used in the scanning runs.

Apparatus and Data Acquisition

The imaging data were acquired at two research sites following comparable protocols. At the Imaging Center for Brain Research of Beijing Normal University, the stimuli were presented with E-Prime (Psychological Software Tools, Pittsburgh, PA) on an LCD monitor (1024×768 resolution, 60 Hz refresh rate) viewed through a head-coil-mounted mirror (115 cm optical distance). The data were collected using a Siemens 3-Tesla Tim Trio scanner with a head coil. The functional data were acquired through a T2*-weighted echo planar imaging (EPI) sequence (TR = 2,000 ms; TE = 30 ms; flip angle = 90°; FOV = 220×220 mm; matrix size = 64×64). Thirty-three transversal slices covering the whole brain (slice thickness = 4 mm; in-plane resolution = 3.44×3.44 mm; slice gap = 0.4 mm) were acquired in an interleaved ascending order. Each participant completed two functional runs of 400 volumes (including 8 initial dummy volumes). High-resolution anatomical images were collected using a T1-weighted magnetization-prepared rapid gradient echo (MP-RAGE) sequence consisting of 128 sagittal slices (TR = 2,300 ms; TE = 3.9 ms; flip angle = 8°; FOV = 256×256 mm; matrix size = 256×256; voxel resolution = 1.33×1×1 mm; slice gap = 0 mm).

At the Imaging Center of the First Affiliated Hospital of Soochow University, the stimuli were presented with MATLAB (The MathWorks, Natick, MA) and the Psychophysics Toolbox (Brainard, 1997) on a monitor (1920×1080 resolution, 60 Hz refresh rate) viewed through a mirror at an optical distance of 251 cm. The imaging data were recorded using a 3-Tesla Philips Ingenia scanner equipped with a head coil. The functional images featured a matrix size of 80×80, an in-plane resolution of 2.75×2.74 mm, and no slice gap. The structural images were acquired with a voxel resolution of 1×1×1 mm across 180 slices (FOV = 240×240 mm; matrix size = 240×240). The other parameters remained the same as those used at Beijing Normal University.

Data Analysis

Behavioral analysis

Trials with incorrect responses or with RTs shorter than 100 ms or longer than 1,300 ms (5.52% of total trials) were excluded from statistical analyses. Mean RTs on correct trials and response accuracies were entered into a two-way (Cue Validity × Congruency) rm-ANOVA.
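A minimal sketch of this trimming step is given below. It is not the authors' analysis code: the trial record layout (a dict with `condition`, `correct`, and `rt_ms` keys) and the function names are hypothetical, and computing accuracy over all trials of a condition before RT trimming is an assumption.

```python
from collections import defaultdict
from statistics import mean

RT_MIN_MS, RT_MAX_MS = 100, 1300  # RT limits stated in the text


def is_valid(trial):
    """Keep correct trials whose RT lies within the stated limits."""
    return trial["correct"] and RT_MIN_MS <= trial["rt_ms"] <= RT_MAX_MS


def summarize(trials):
    """Return per-condition mean RT (valid trials only) and accuracy.

    Accuracy is computed over all trials of a condition (an assumption made here).
    """
    by_condition = defaultdict(list)
    for trial in trials:
        by_condition[trial["condition"]].append(trial)
    summary = {}
    for condition, items in by_condition.items():
        valid_rts = [t["rt_ms"] for t in items if is_valid(t)]
        summary[condition] = {
            "mean_rt": mean(valid_rts) if valid_rts else None,
            "accuracy": sum(1 for t in items if t["correct"]) / len(items),
        }
    return summary
```

The per-condition means produced this way would then be the cell values entered into the repeated-measures ANOVA for each participant.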

fMRI Data Analysis

The fMRI preprocessing and analysis were conducted with the BrainVoyager QX (version 2.2, Brain Innovation) software package (Goebel et al., 2006). The initial eight functional volumes of each scan were discarded to allow signal equilibration. For the remaining functional images, slice timing correction was applied using sinc interpolation, followed by 3D motion correction with trilinear/sinc interpolation for intra-session alignment to the middle volume. Each run for each participant was examined for the six head motion parameters (three rotations and three translations). Runs with motion exceeding one voxel length in any direction were excluded. An isotropic Gaussian kernel of 8-mm full width at half maximum (FWHM) was then applied to spatially smooth the images. Finally, linear trend removal was performed, along with high-pass temporal filtering at a cutoff of approximately 0.0081 Hz (corresponding to seven cycles per run), to remove low-frequency nonlinear drifts. After these steps, we checked the variance in the global signals to assess the data quality, specifically to detect any abrupt changes in the time course of each run. Volumes containing abrupt changes were interpolated, and runs exhibiting a global variance exceeding 0.1% were excluded. For each participant, the accepted fMRI data were co-registered with the anatomical scan in the native space and further transformed to the standard Montreal Neurological Institute (MNI) 152 space, with a resampled voxel size of 3×3×3 mm.

After the preprocessing, statistical analyses were performed using a random effects general linear model (RFX-GLM) analysis within BrainVoyager, executing a multi-subject GLM with distinct predictors for each participant. Using a deconvolution and multiple regression approach, we modeled six experimental conditions and one “error” term (including all the error trials) for each participant, with each condition including six sampling points taken from the 0-12 s period after the cue presentation (i.e., one sampling point every 2 s). The functional images occurring 5-10 s after the cue onsets, corresponding to the peak of the hemodynamic response function (HRF; Cohen, 1997), were used to provide parameter estimates for the amplitudes of the HRF. Volumes deviating in intensity by ±3 SDs or more from the individual means were removed by a weighted vector that was included in the model as a covariate of no interest. In addition, the six mean-centered head motion parameters were modeled as covariates of no interest to further remove any residual variance due to head motion. To mitigate noise related to global physiological processes, the model incorporated the global signal, which represented the normalized average activity across all voxels at each time point in the standard space, as an additional predictor. We examined the three contrasts of interest introduced earlier. Corrections for multiple comparisons at p < 0.05 were made through the Cluster Threshold plugin (BrainVoyager) using 2,500 Monte Carlo simulations. The minimum cluster size required for corrected significance at a voxel-wise threshold of p < 0.005 (uncorrected) was computed for each contrast (540 mm³, corresponding to 20 voxels; Forman et al., 1995). The approximate Brodmann areas (BAs) and the corresponding anatomical labels of the peak voxel of the significant clusters in the MNI space were identified using the Neuroelf toolbox v1.1 (Weber, 2017).
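To make the deconvolution approach more concrete, the sketch below builds finite impulse response (FIR) predictors for one condition, with one column per sampling point after cue onset. It is a simplified illustration and not BrainVoyager's implementation: the function names are hypothetical, onsets are assumed to be snapped to the nearest volume, and treating the sampling points as falling at 0, 2, 4, 6, 8, and 10 s is one possible reading of the description.

```python
TR_S = 2.0   # repetition time in seconds
N_LAGS = 6   # six sampling points, one every 2 s after cue onset


def fir_design(onsets_s, n_volumes, tr_s=TR_S, n_lags=N_LAGS):
    """Return an n_volumes x n_lags FIR design matrix (list of lists) for one condition."""
    design = [[0] * n_lags for _ in range(n_volumes)]
    for onset in onsets_s:
        first_volume = int(onset / tr_s + 0.5)  # assumption: snap onset to nearest volume
        for lag in range(n_lags):
            volume = first_volume + lag
            if volume < n_volumes:
                design[volume][lag] += 1  # column `lag` models the response lag*TR after onset
    return design


def lags_in_window(start_s, end_s, tr_s=TR_S, n_lags=N_LAGS):
    """Indices of the FIR columns whose post-onset time falls within [start_s, end_s]."""
    return [lag for lag in range(n_lags) if start_s <= lag * tr_s <= end_s]
```

Under these assumptions, `lags_in_window(5, 10)` selects the columns at 6, 8, and 10 s, which is one way to operationalize the peak window of 5-10 s described above. Because overlapping events simply add their entries to the same rows, the regression can separate responses to closely spaced trials in a rapid event-related design.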

Data availability

The processed data used for the final analyses are available at https://github.com/yangzhangpsy/ER-fMRI-IOR. Raw data for this study can be requested from the Lead Contact, and the authors confirm that all reasonable requests will be fulfilled.

Acknowledgements

This research was supported by grants from the Brain Science and Brain-like Intelligence Technology-National Science and Technology Major Project (2025ZD0215702), the National Natural Science Foundation of China (32171049), the Social Science Foundation of Jiangsu Province (22JYB015), the China Postdoctoral Science Foundation (2024M752310), and Jiangsu Funding Program for Excellent Postdoctoral Talent (2024ZB496, 2025ZB642).

Additional files

Supplemental Materials

Additional information

Funding

Brain Science and Brain-like Intelligence Technology - National Science and Technology Major Project (2025ZD0215702)

  • Yang Zhang

MOST | National Natural Science Foundation of China (NSFC) (32171049)

  • Yang Zhang

Social Science Foundation of Jiangsu Province (江苏省社会科学基金项目) (22JYB015)

  • Yang Zhang

China Postdoctoral Science Foundation (中国博士后科学基金) (2024M752310)

  • Yujie Chen

Jiangsu Funding Program for Excellent Postdoctoral Talent (2024ZB496)

  • Yujie Chen

Jiangsu Funding Program for Excellent Postdoctoral Talent (2025ZB642)

  • Ai-Su Li