Introduction

Visual spatial attention is a fundamental mechanism that enables both humans (Carrasco, 2011) and animals (Saban et al., 2017; Hahner and Nieder, 2025) to selectively process information from their environment. Often, shifts in spatial attention are accompanied by eye movements to focus on a specific location, a process known as overt spatial attention (Kaspar, 2013). However, covert spatial attention — the ability to shift attention independently of eye movements — is equally crucial in daily life. This ability enables us to monitor locations beyond our line of sight, such as when driving and keeping track of peripheral surroundings.

Covert spatial attention is typically categorized into two types: endogenous and exogenous attention. Endogenous attention refers to the voluntary allocation of processing resources to a specific location. While this shift occurs relatively slowly, taking approximately 200–300 ms to reach the target region, it can be sustained for an extended duration (Theeuwes, 1994; Findlay, 2003; Chica and Lupiáñez, 2009; Carrasco, 2011; Dugué et al., 2020). In contrast, exogenous attention is driven by salient stimuli that automatically capture attention (Theeuwes, 1994; Findlay, 2003; Chica and Lupiáñez, 2009; Carrasco, 2011; Dugué et al., 2020). This shift is rapid but transient, often followed by a phenomenon known as inhibition of return, moving attention away from the initially attended location (Klein, 2000; Chica and Lupiáñez, 2009). Compared to endogenous attention, exogenous attention is more automatic and less flexible Corbetta and Shulman (2002); Knudsen (2007); Carrasco (2011).

Until recently, research on the effects of covert attention on visual perception has focused primarily on extrafoveal vision. A vast body of literature has demonstrated that covert attention enhances visual contrast sensitivity (Carrasco et al., 2000; Martínez-Trujillo and Treue, 2002; Reynolds and Chelazzi, 2004; Pestilli and Carrasco, 2005; Li et al., 2008; Foster et al., 2021) and increases spatial resolution (Yeshurun and Carrasco, 1998; Carrasco et al., 2002, 2006; Jigo and Carrasco, 2018) at selectively cued locations in the extrafovea. In contrast, attention within the high-acuity 1-deg foveola has often been considered uniform and distributed evenly throughout this small region. Therefore, the effects of attention in the fovea are traditionally studied using large stimuli encompassing one or more degrees of visual angle (Miniussi et al., 2002; Jigo and Carrasco, 2020; Papaioannou and Luck, 2020). However, recent findings showed that even within the 1-degree foveola, both endogenous (Poletti et al., 2017) and exogenous (Guzhang et al., 2021) attention can be covertly allocated in a highly spatially selective manner. For both types of covert attention, observers were better able to discriminate the orientation of fine details at an attended location—cued endogenously or exogenously—compared to nearby uncued locations just 0.26° away. Although these results highlighted the strikingly fine grain of attentional control, they also raised new questions. In particular, it remains unknown which spatial frequencies benefit from fine-grained attentional shifts within the foveola. While Guzhang et al. demonstrated visual enhancement from exogenous attention at the foveal scale, the orientation discrimination task used in the study was relatively coarse, requiring participants to determine whether a stimulus was tilted ±45°. Despite the small size of the stimulus, such a task does not require high spatial frequencies (e.g., > 10 cycles per degree); in fact, frequencies around 4-8 cycles per degree (CPD) should be sufficient to perform it effectively. Therefore, the perceptual enhancement observed in Guzhang et al. could be due to an enhancement of only lower or only higher spatial frequencies, or perhaps a broad range of spatial frequencies. The overall improvement in orientation discrimination of fine spatial stimuli is compatible with any of these scenarios.

The effects of extrafoveal attention have been found to differ across spatial frequencies: while extrafoveal endogenous attention enhances a broad range of spatial frequencies (Lu and Dosher, 2004; Jigo and Carrasco, 2020), extrafoveal exogenous attention selectively enhances high spatial frequencies (Carrasco et al., 2006; Barbot et al., 2011, 2012; Carretié et al., 2012; Jigo and Carrasco, 2020; Fernández et al., 2022), peaking just above the spatial frequency characterized by the highest sensitivity at a given eccentricity (Jigo and Carrasco, 2020). Whether fine spatial exogenous attention at the foveal scale modulates visual discrimination similarly is an open question. Generally, fine control of spatial attention at the foveal scale is required when examining fine spatial details, such as reading small text in a book or noticing subtle changes, like a traffic light switching or unexpected pedestrians from afar while driving (Figure 1A). In these tasks, precise allocation of attention likely helps distinguishing and recognizing individual letters and details. It is possible that in the foveola, exogenous attention modulates a narrow range of spatial frequencies, similar to how it operates extrafoveally. However, while humans can resolve spatial frequencies up to 30 cycles per degree (CPD) in the foveola (Intoy and Rucci, 2020; Clark et al., 2022), extrafoveally, spatial frequencies above 10 CPD cannot be resolved. Therefore, even if the perceptual enhancement driven by fine spatial attention is limited to a narrow range of lower spatial frequencies, the enhanced frequency range may shift toward higher spatial frequencies in the foveola compared to what happens extrafoveally (Figure 1B). Alternatively, attention at the foveal scale might preserve its enhancement of low spatial frequencies while extending it to high spatial frequencies, leading to a broad, rather than narrow, range of modulation (Figure 1C). Any of these scenarios could account for the improvement in orientation discrimination observed in Guzhang et al..

Fine-tuning exogenous attention within the foveola.

(A) Fine-tuning of exogenous attention within the foveola occurs, for example, when we are looking at a distant traffic light—occupying less than 1° of our visual field—that suddenly turns green, capturing our attention and prompting us to move forward. As a result of the fine-tuning of exogenous attention, contrast sensitivity could be enhanced for a narrow range of spatial frequencies, centered around lower spatial frequencies (blue) or higher spatial frequencies (green) (B). On the other hand, contrast sensitivity may be enhanced uniformly across a wide range of spatial frequencies (C).

In the current study, we addressed two main questions. First, does exogenous attention at the foveolar scale enhance visual processing across a narrow or a broader range of spatial frequencies? Second, if the enhancement operated within a narrow frequency band, which range of spatial frequencies benefits the most from such fine-grained shifts of attention? Addressing these questions is crucial because, while it is now established that covert attention can be selectively shifted even within the central fovea, it remains unclear whether it follows the same modus operandi foveally and extrafoveally. If a similar range of spatial frequencies is enhanced by exogenous attention in both the foveola and extrafovea, it would suggest that exogenous attention operates similarly across the visual field, regardless of the spatial resolution achievable at different eccentricities. In contrast, if the modulation of spatial frequencies differs, it would indicate that the mechanisms of exogenous attention are flexibly tuned in the foveola and adjusted based on the spatial resolution that can be achieved at this scale.

Studying attentional control in the foveola presents unique challenges. Continuous microscopic eye movements during fixation cause constant displacement of the retinal input (Martinez-Conde et al., 2004; Rucci and Poletti, 2015; Krauzlis et al., 2017), making it difficult to limit visual stimulation to the desired eccentricity at this scale. This poses a significant issue when investigating covert attention in the central foveola. To address these challenges, we employed high-precision eye-tracking (Wu et al., 2023) combined with gaze-contingent display control (Santini et al., 2007) to precisely monitor gaze position throughout each trial to ensure that any effects observed are solely due to covert exogenous attention and are not driven by fixational saccades (Hafed and Clark, 2002; Yuval-Greenberg et al., 2014; Shelchkova and Poletti, 2020; Guzhang et al., 2024).

Results

To examine the effects of high-resolution exogenous attention within the fovea on visual discrimination of stimuli at different spatial frequencies, we employed a 2AFC visual discrimination task in which observers were asked to discriminate the orientation of a small Gabor patch (30′ x 30′ with an overlaying 5.4′ Gaussian window, tilted ±45°) 30′ from either left or right of the fixation marker when prompted by a response cue (Figure 2A). Note that, the Gabor patches used in the current study were much smaller than those typically used in studies probing extrafoveal attention (Rossi and Paradiso, 1995; Gobell and Carrasco, 2005; Herrmann et al., 2010; Jigo and Carrasco, 2020). On each trial, the orientation and phase of the Gabor patch was randomized at each location independently. Eye movements were monitored at high resolution using a digital Dual Purkinje Image eye tracker (Wu et al., 2023) to ensure that observers maintained the center of gaze within a 10′×10′ window around the fixation point throughout the trial (Figure 2D).

Experimental protocol.

(A) Trials started with a fixation marker at the center of the monitor. Observers were instructed to maintain fixation at the center throughout the trial. After a brief flash of the exogenous cue to capture observers’ attention, two Gabor patches independently tilted (±45°) were briefly displayed, one on each side of the fixation marker. At the end of the trial, a response cue appeared, and observers had to report the orientation of the stimulus that was previously presented at the cued location. (B) In valid trials, the exogenous cue and response cue indicated the same spatial location. In neutral trials, no exogenous cue was presented. Valid and neutral trials had the same probability of occurrence. The Gabor patches had a Gaussian window of 5.4′ standard deviation, creating a 30′ x 30′ visible region. (C) Stimuli used in the experiment. Gabor patch of all spatial frequencies tested from 4 to 20 cycles per degree (CPD). (D) 68% contour of the gaze probability distribution in valid and neutral conditions during Gabor presentation. Color represents individual observers. Figure 2—figure supplement 1. Number of trials included in the analysis after filtering.

The experiment included two cueing conditions—valid and neutral—that were randomly interleaved with equal probability within each experimental block. In the valid condition, shifts of exogenous attention were elicited by a small and brief white flash (exogenous cue) presented 100 ms before the target either on the left or the right of the visual field. The cue, with 100% validity, appeared just outside the upcoming target location. In the neutral condition, no exogenous cue was presented.

We tested four spatial frequencies (SFs)—4, 8, 12, and 20 CPDs, Figure 2C)—ensuring at least one full cycle of modulation within the Gabor patch even at the lowest SF (4 CPD) (Howell and Hess, 1978). The highest spatial frequency (20 CPD) is close to the limit of visual resolution at the eccentricity tested here. For each spatial frequency tested, an initial threshold contrast was estimated by methods of constant stimuli, then discrimination accuracy was measured at four contrast levels around the initial threshold estimate, and one additional level at the maximum contrast to measure the asymptotic performance (see Methods). The contrast level was kept constant within each experimental block but was randomized across blocks.

To examine how high-resolution exogenous attention influences performance for stimuli at different spatial frequencies, we fitted individual psychometric curves of contrast level versus discrimination accuracy and estimated contrast thresholds for each cueing condition and spatial frequency for each observer (Figure 3A shows the psychometric curves for one example observer with the result of all psychometric curves included in Figure 3—figure Supplement 1).

Effect of fine-grained exogenous attention on contrast sensitivity.

(A) Psychometric functions illustrating example observers’ discrimination accuracy for Gabor patches with spatial frequencies of 4, 8, 12, and 20 cycles per degree (CPD). The size of each dot corresponds to the number of trials included at a specific contrast value. Vertical lines indicate contrast thresholds, while horizontal lines represent the accuracy level midway between chance performance and maximum performance. (B) Average contrast sensitivity, calculated as the inverse of the contrast threshold, across spatial frequencies in valid and neutral conditions. Each dot represents an individual observer. Error bars denote the standard error of the mean (SEM). Asterisks mark a significant difference in contrast sensitivity between pairs of spatial frequencies. (C) Average contrast sensitivity in neutral condition against that in valid condition at each spatial frequency. Each dot represents an individual observer. Error bars indicate the bootstrapped confidence intervals. (D) Average difference in log-scaled contrast sensitivity between valid and neutral conditions across different spatial frequencies. Each line corresponds to the log-scaled contrast sensitivities from each observer. Error bars represent the bootstrapped confidence intervals. Asterisks mark post-hoc pairwise comparison results between valid and neutral conditions within each spatial frequency. Figure 3—figure supplement 1. Psychometric functions for all observers in all conditions. Figure 3—figure supplement 2. Contrast sensitivity results including 2 CPD. Figure 3—figure supplement 3. Relation between mean and variability of contrast sensitivity (CS).

We first examined the full factorial effects of spatial frequency (SF) and cueing on contrast sensitivity estimated from the observers’ psychometric functions by fitting a linear mixed-effects regression to the estimates of log-transformed contrast sensitivity at all 8 conditions (4 SF x 2 cueing) across observers. The contrast sensitivity captures the effects on the intercept and/or slope of the psychometric function.

Consistent with the literature (Howell and Hess, 1978; Bex et al., 2009; Lovegrove et al., 1980; Rovamo et al., 1992, 1993; Pointer and Hess, 1989), we observed a main effect of spatial frequency on contrast sensitivity (χ2(3) = 65.3, p < 0.0001, Figure 3B); contrast sensitivity was highest at 4 CPD and decreased as the spatial frequency increased. [Post-hoc pairwise comparisons revealed several significant differences between spatial frequencies (post-hoc pairwise comparisons of the estimated marginal means, ps ≤ 0.0001 for 4 vs 8 CPD, 4 vs. 12 CPD, 4 vs. 20 CPD, 8 vs. 20 CPD, 12 vs. 20 CPD).] We also observed a significant main effect of attention on contrast sensitivity (χ2(1) = 7.3, p < 0.001; meanvalid = 10.1 ± 7.56SD and meanneutral = 9.03 ± 6.39SD): averaging across all spatial frequencies, the valid cueing condition resulted in higher contrast sensitivity (Figure 3B, C). These findings indicate that exogenous attention led to a contrast gain across spatial frequencies in the attended subfoveolar region.

Notably, the improvement in contrast sensitivity driven by fine-grained attention was not uniform across spatial frequencies (Figure 3D, also visible in Figure 3B, C). We observed a statistically significant interaction between spatial frequency and attention (χ2(3) = 9.3, p = 0.0258), indicating that contrast gains were selective. Specifically, contrast sensitivity exhibited a contrast gain in the valid condition compared to the neutral condition at lower spatial frequencies (4 and 8 CPD) (mean gain4 CPD = 2.62 ± 2.13 SD and mean gain8 CPD = 1.36 ± 1.25 SD, ps< 0.004). However, the contrast sensitivity gains at higher spatial frequencies (12 and 20 CPD) were smaller and did not reach statistical significance (mean gain12 CPD = 0.09 ± 0.91SD and mean gain20 CPD = 0.11 ± 0.13SD, ps> 0.5). In addition to examining the contrast gain within each spatial frequency, we also compared the amount of contrast gain within each pair of spatial frequencies. Post-hoc pairwise comparisons revealed that attention modulation did not differ between the two low-mid spatial frequencies (4 and 8 CPD, p > 0.8) or between the two mid-hight spatial frequencies (12 and 20 CPD, p > 0.7). Importantly, attention led to a larger enhancement of contrast sensitivity at the low-mid spatial frequencies (4 and 8 CPD) compared to 12 CPD (Δmean gain4 CPD - 12 CPD = 2.53 ± 2.16SD and Δmean gain8 CPD - 12 CPD = 1.27 ± 0.99SD, ps< 0.025). When comparing the low-mid spatial frequencies (4 and 8 CPD) to the highest spatial frequency tested (20 CPD), differences in contrast gains were even larger, though these effects were not statistically significant (Δmean gain4 CPD - 20 CPD = 3.35±1.96SD, p = 0.066 and Δmean gain8 CPD - 20 CPD = 1.45±1.46SD, p = 0.084). As detailed in Methods, two observers did not have data for the 20 CPD condition. Statistical power might thus have been reduced for comparisons against this condition.

These results demonstrate that, similar to the selectivity in enhancements in visual periphery, micro-shifts of exogenous attention within the central fovea selectively enhanced contrast sensitivity primarily for low- to mid-range frequencies (4 to 8 CPD).

In addition to contrast sensitivity, we examined the effects of attention on asymptotic performance. Research on extrafoveal vision has returned mixed results with respect to the effects of exogenous attention on asymptotic performance. On the one hand, it has been found that both exogenous and endogenous attention can enhance not only contrast sensitivity (contrast gain) but also asymptotic performance, a phenomenon known as response gain (Morrone et al., 2002, 2004; Ling and Carrasco, 2006; Pestilli et al., 2009; Reynolds and Heeger, 2009; Herrmann et al., 2010). For exogenous attention, however, some previous studies have found significant effects only on contrast sensitivity, with no notable impact on asymptotic performance in extrafoveal vision (Cameron et al., 2002; Herrmann et al., 2010; Jigo and Carrasco, 2020). It has been argued that whether covert attention influences contrast gain, response gain, or both, depends on stimulus size, size of the attended area as well as spatial uncertainty(Herrmann et al., 2010; Wu et al., 2021; Reynolds and Heeger, 2009).

To examine how asymptotic performance was impacted by micro-shifts of foveal exogenous attention across spatial frequencies, we fit a generalized linear mixed-effects regression to the estimated asymptotic performance from all 8 conditions of all observers. Observers’ ability to discriminate the orientation at full contrast (asymptotic performance) decreased with increasing frequency (Figure 4B, C), but this change was not significant (χ2(3) = 7.6, p = 0.0545). [Post-hoc pairwise comparisons revealed significant differences in asymptotic performance only between 8 and 20 CPD (p = 0.015) as well as 12 and 20 CPD (p = 0.048).] The main effect of attention was significant, with overall higher asymptotic performance in the valid condition compared to the neutral condition (χ2(1) = 11.9, p < 0.001; meanvalid = 0.97 ± 0.03SD and meanneutral = 0.94 ± 0.04SD; see Figure 4A, C). These findings suggest that fine-grained attention resulted in a general response gain, enhancing the ability to discriminate the orientation of high-contrast stimuli.

Effect of fine-grained exogenous attention on asymptotic performance.

(A) Average asymptotic performance, defined as discrimination accuracy at maximum contrast, pooled across spatial frequencies in valid and neutral conditions. Each dot represents an individual observer. Error bars denote the standard error of the mean (SEM). (B) Average asymptotic performance across spatial frequencies in valid and neutral conditions. Each dot represents an individual observer. Error bars denote the standard error of the mean (SEM). Asterisks mark a significant difference in asymptotic performance between pairs of spatial frequencies. (C) Average asymptotic performance in the neutral condition against that in the valid condition at each spatial frequency. Each dot represents an individual observer. Error bars indicate the bootstrapped confidence intervals. (D) Average difference in asymptotic performance between valid and neutral conditions across different spatial frequencies. Each line corresponds to an individual observer. Error bars indicate the bootstrapped confidence intervals. Asterisks mark post-hoc pairwise comparison results between valid and neutral conditions within each spatial frequency. Figure 4—figure supplement 1. Maximum a posteriori (MAP) estimates (points) and 95% confidence intervals for contrast sensitivity (CS) and asymptotic performance (AP).

Unlike for contrast sensitivity, the effect of attention on asymptotic performance was not modulated by spatial frequency; we did not observe a significant interaction between spatial frequency and attention on asymptotic performance (Figs.4D, χ2(3) = 5.1, p > 0.15). [Post-hoc pairwise comparisons between valid and neutral conditions within each spatial frequency revealed that attention significantly increased asymptotic performance compared to neutral condition at 8 and 12 CPD (mean gain8 CPD = 0.05 ± 0.04SD, p < 0.0001 and mean gain12 CPD = 0.03 ± 0.03SD, p = 0.0108) but not at 4 and 20 CPD (mean gain4 CPD = 0.02 ± 0.03SD, p = 0.0819 and Δmean gain20 CPD = 0.03 ± 0.04SD, p = 0.0501). When comparing the attentional benefit on asymptotic performance across pairs of spatial frequencies, we found a significant difference (Δmean gain) only between 4 and 8 CPD (p = 0.0240).] Therefore, asymptotic performance showed only independent, additive effects of frequency and attention, without a systematic influence of spatial frequency on the attentional benefit.

Discussion

Whereas attention is often believed to be either uniformly allocated at the center of gaze or selectively shifted to locations outside of the central fovea, recent research has shown that humans are also capable of allocating attention within the central fovea in a spatially selective manner, enhancing our ability to perceive fine spatial stimuli (Poletti et al., 2017; Guzhang et al., 2021). Humans can focus processing resources on a specific region of the central fovea, enhancing visual processing within that small area while suppressing processing at other unattended locations just a few arcminutes away.

These findings raise the question of what spatial frequencies are enhanced by fine-grained shifts of covert attention within the foveola. In our previous work (Guzhang et al., 2021), observers were asked to perform a coarse orientation (±45°) discrimination task. This experiment did not manipulate spatial frequency. Subjectively, however, the stimuli in this task could be distinguished as long as frequency information of more than 3 cycles per degree (CPD) was available to the observer. The attentional gain we observed in the discrimination task could therefore have resulted from foveal exogenous attention enhancing spatial frequencies anywhere above 3 CPD. Thus, it remains unclear which spatial frequencies are enhanced when exogenous attention is allocated at the fine scale within the foveola. Additionally, it is unclear whether fine-grained attention in the foveola is governed by the same principles and is modulated similarly to extrafoveal attention, especially considering the stark differences in spatial resolution between the foveola and the rest of the visual field. The high-acuity foveola can resolve spatial frequencies up to 30 CPD, whereas just five degrees away from the center of gaze, this limit drops to around 10 CPD (Virsu and Rovamo, 1979). Therefore, rather than enhancing the same range of low spatial frequencies as for extrafoveal vision, fine-grained foveal attention may shift or extend its enhancement toward higher frequencies in the foveola.

Our findings indicate that fine-grained shifts of covert exogenous attention in the foveola enhance contrast sensitivity within a narrow range of spatial frequencies, peaking at low to mid frequencies (4–8 CPD). In particular, we found little or no attentional gain at higher spatial frequencies (12–20 CPD), which are closer to the limits of visual resolution at the eccentricity tested (0.3° from the preferred locus of fixation). Whereas enhancements in contrast sensitivity were relatively selective to a narrow band of spatial frequencies, overall asymptotic performance increased as a result of exogenous attention, with no detected dependence on spatial frequency. However, it is important to remember that the statistical power to detect the interaction might differ between analyses of contrast sensitivity and analyses of asymptotic performance, given that the latter (almost by definition) tends to involve differences close to its bounds (Bicknell et al., 2025; Jaeger, 2008).

These results shed light on the benefits of exogenous attention reported in Guzhang et al.. In that study, oriented bars were presented at threshold contrast levels. The results described here suggest that the performance improvements observed in Guzhang et al. were primarily driven by the enhancement of the low-mid spatial frequencies between 4 to 8 cycles per degree, which were sufficient to perform the coarse ±45° orientation discrimination task. In everyday life, we frequently rely on fine-grained exogenous attention, for example when a salient road sign captures our attention while we are looking into the distance while driving. Accurately identifying the details, such as the small texts on the road sign, often depends on the ability to perceive high spatial frequency components of the stimulus. Based on our findings, if the text on the road sign is at maximal contrast, drivers should benefit from exogenous attention, as we have shown that attention enhances asymptotic performance across a broad range of spatial frequencies, including those near the limits of visual resolution. However, if the sign is faded and the text is close to the threshold contrast, the benefit of exogenous attention will be minimal, as the high spatial frequencies crucial for resolving fine details receive little enhancement from fine-grained exogenous attention in the foveola.

For extrafoveal vision, exogenous attention has been reported to enhance contrast sensitivity selectively within a narrow range of spatial frequencies, peaking at the spatial frequency slightly higher than the ones with the highest contrast sensitivity for a given eccentricity, while sensitivity drops sharply at adjacent spatial frequencies (Jigo and Carrasco, 2020; Fernández et al., 2022). In our study, the small stimulus size required to assess localized attention within the foveola limited our ability to measure contrast sensitivity reliably for spatial frequencies below 4 CPD. Within the spatial frequency range we tested, we observed a monotonic decrease in contrast sensitivity from 4 CPD to 20 CPD as spatial frequency increased. Additionally, we conducted a post-hoc evaluation of contrast sensitivity at 2 CPD using the same Gabor patch size to estimate the peak spatial frequency (see Figure 3—figure Supplement 2). We note that at 2 CPD, the stimulus contained less than one full cycle of modulation, which may have affected the observers’ performance (Howell and Hess, 1978). With this caveat in mind, our post-hoc analysis found similar levels of contrast sensitivity at 2 and 4 CPD. This, combined with the drop in contrast sensitivity at 8 CPD, suggests sensitivity might reach a plateau between 2 to 4 CPD. In addition, attention significantly improved contrast sensitivity at 2 CPD (pair-wise comparison did not reveal any difference in contrast gain between 2 CPD and the rest of the spatial frequencies tested). While our study was not specifically designed to examine the relationship between spatial frequencies with peak contrast sensitivity and those with peak contrast gain from attention, our results were in line with previous research studying the effects of exogenous attention in extrafoveal vision (Jigo and Carrasco, 2020; Fernández et al., 2022), showing a contrast gain in a similar range of spatial frequencies.

While the present study examined the range of spatial frequencies enhanced by fine-grained shifts of attention within the foveola, attention can also be distributed uniformly across the entire foveola. In such cases, it would be interesting to explore whether the range of spatial frequencies enhanced by the broadly distributed attention is comparable to that of fine-grained attention reported here. It is possible that the enhancements from broad attention primarily peak around lower spatial frequencies compared to fine-grained attention. This is because localized attention in the foveola is often used to explore and preview fine spatial details before engaging in further examination, whereas broadly distributed attention may function more as a mechanism to enhance the global pattern of the foveal input, which does not necessarily require the selective processing of high spatial frequencies.

In addition to contrast gain, we also observed a sizable response gain, evident as an increase in asymptotic performance in the valid compared to the neutral condition. This indicates that even at the maximum contrast level, observers were better at discriminating the orientation of the small Gabor patch when attending, compared to baseline. On the contrary, no such enhancement in asymptotic performance has been observed in the extrafovea (Jigo and Carrasco, 2020). Methodological differences, particularly in spatial uncertainty, might account for the presence of response gain in our results. In our study, the neutral condition had only two possible stimulus locations, making the target location more predictable, whereas Jigo and Carrasco required observers to monitor seven locations, introducing greater spatial uncertainty. Prior research has shown that the presence of a response gain is more likely when the stimulus location is fixed and the stimulus is small, whereas high spatial uncertainty and larger stimuli tend to elicit contrast gain (Herrmann et al., 2010). Due to the lower spatial uncertainty and small stimulus size in our study, attention likely influenced both contrast gain and response gain.

Here, we examined selective contrast sensitivity enhancements in spatial frequency driven by fine-grained exogenous attention within the foveola. In the extrafovea, endogenous attention can flexibly modulate different visual features depending on their relevance, while the effects of exogenous attention are relatively inflexible (Hein et al., 2006; Giordano et al., 2009; Bocanegra and Zeelenberg, 2011; Carrasco and Barbot, 2014; Jigo and Carrasco, 2020). Consequently, if fine-grained endogenous attention is engaged, we would expect the range of enhanced spatial frequencies to be higher and closer to the limit of visual resolution or to be modulated based on task demands. Further research is needed to address this point. Understanding how these attentional mechanisms operate in the foveola could provide valuable insights into the functional distinctions between endogenous and exogenous attention in fine spatial processing.

In everyday life, our covert exogenous attention is often engaged when a salient stimulus captures our focus. This evolutionarily important mechanism ensures that we continuously monitor our environment for unexpected events and prepare to respond accordingly (Yantis, 1993; Theeuwes, 2010). Our previous work has demonstrated that attentional shifts can also occur locally within the high-acuity foveola. Here we show that these fine-grained attention shifts function similarly to those in the extrafoveal region, enhancing visual sensitivity to coarse stimulus features. This mechanism is essential for everyday tasks, such as driving or reading. Our findings not only shed light on the functionality of fine-grained covert attention within the foveola but also reinforce the idea that exogenous attention operates under similar principles as extrafoveal vision. Specifically, exogenous attention remains an inflexible mechanism for selective processing—even in the foveola, where higher spatial frequency information is available, it does not enhance contrast sensitivity of the finer details but instead prioritizes coarser stimulus features. Functionally, this selective enhancement of contrast sensitivity at low to mid spatial frequencies provides a preview of small but salient stimuli located just a few arcminutes from the preferred locus of fixation in everyday tasks. By enhancing the visibility of these stimuli before direct fixation, this mechanism enables the visual system to rapidly assess their relevance and guide the planning of microsaccades, ensuring efficient and precise shifts of gaze to bring these stimuli into the foveal region for detailed examination.

Methods and Materials

Observers

7 human observers in total, 6 emmetropic observers, and 1 observer with 20/20 corrected vision participated in the experiments (4 females, 3 males; age range 18 - 27 years old). The experiment was approved by the University of Rochester Institutional Review Boards. The experimenter reviewed and explained the material in the consent form to the participants before conducting the experiment. The form was signed only after the participant fully understood the material and voluntarily agreed to take part in the study. Consent was obtained from all participants in the study.

Stimuli and Apparatus

Stimuli were displayed on an LCD monitor (ASUS ROG SWIFT 360Hz PG259QN) at a refresh rate of 360 Hz and spatial resolution of 1920 x 1080 pixels. Observers performed the task monocularly with their right eye while the left eye was patched. A dental-imprint bite bar and a headrest were used to prevent head movements. Eye movements were recorded with high precision using a custom-made digital Dual Purkinje Image (dDPI) eye tracker, which has a sampling rate of 1 kHz (Wu et al., 2023). The system has an internal noise well below 1′ and a spatial resolution of 1′ (Ko et al., 2016; Wu et al., 2023). Stimuli were rendered using EyeRIS, a custom-developed system that allows flexible gaze-contingent display control (Santini et al., 2007). This system acquires eye movement signals from the eye tracker, processes them in real time, and updates the stimulus on the display according to the desired combination of estimated oculomotor variables.

Procedure and Experimental Task

Calibration

Every session started with the setup of the bite bar. A magnetized helmet was used to position the observer’s head. When accurate localization of gaze position is necessary, calibration represents an important stage of the experimental procedure, which was performed in two phases. During the first phase, observers sequentially fixated on each of the nine points of a 3-by-3 grid, as is customary in all oculomotor experiments. In the second phase, observers refined the pixel-to-pixel mapping, given by the automatic calibration. Observers fixated again on each of the nine points of the grid while the location of the line of sight was displayed in real time on the screen. Observers used a joypad to correct the predicted gaze location, shifting the real-time display to align with the grid point for each fixation, if necessary. These corrections were then incorporated into the transformation of the gaze position as well. This dual-step calibration procedure allows more accurate localization of gaze position than standard single-step procedures. The manual calibration procedure was repeated at the center of the fixation before each trial to compensate for unpreventable head movements.

Experimental task

Observers were instructed to fixate on a central marker (5 by 5 arcminutes) throughout each trial. On valid trials, an exogenous cue—a white square (8-by-8 arcminutes)—appeared 500ms after fixation. The cue appeared for 30 ms at 0.75 deg eccentricity to the left/right of the fixation marker, with each location occurring randomly with equal probability. The smaller exogenous cue, positioned offset from the Gabor patch, was used to prevent forward masking and ensure clear perception of the Gabor patch. Shortly after the cue disappeared (70 ms), two small Gabor patches (0.5 deg visible area), tilted +/− 45 degrees, with a phase of 0 or 90 degrees were shown (50 ms) on the left/right side at 0.5 deg eccentricity. The tilt of the two Gabor patches was randomly and independently chosen on each trial. And the phase was randomly selected on each trial but consistent between the two patches. The spatial frequency of the Gabors was 4, 8, 12, or 20 CPD. After the stimulus offset, a response cue was presented, and observers were instructed to report the orientation of the stimulus previously presented at that location. The trial concluded either when observers responded or automatically after 1000 ms if no response was given following the appearance of the response cue. On valid trials, the response cue always indicated the same location as the exogenous cue, making the cue 100% valid. On neutral trials, no exogenous cue was presented, and the response cue indicated one of the two possible locations randomly.

Given that contrast sensitivity varies considerably across SF and eccentricity (Rovamo et al., 1992), for each spatial frequency tested, observers underwent a preliminary session in which an initial estimate of contrast threshold, defined as the contrast needed to achieve 70% discrimination accuracy in the neutral condition. The order of spatial frequency tested was randomized across observers. After the threshold was obtained, each observer was tested at five different contrast values around the estimated threshold. One of these values included presenting the grating at 100% contrast to obtain a precise estimate for the upper-performance asymptote. The remaining four levels were ±0.075 and ±0.225 log10 units from the initial threshold estimate. If the initial estimates were within 0.225 log10 units of 100% contrast (i.e., >= 60% contrast), the rest of the four contrast values were −0.6, −0.45, −0.3, and −0.15 log10 units compared to the initial estimate (Prins, 2012). Within each experimental session, a single spatial frequency was tested, and the corresponding contrast levels were presented in a block design. All five contrast levels were tested within a single experimental session on the same day. Each contrast level included between 50 and 100 filtered trials.

Two observers were not tested at 20 CPD because their performance remained at chance level even with gratings at maximal contrast. It is possible given that 20 CPD was near the visual resolution limit at the tested eccentricity.

Data Analysis

Only trials with uninterrupted tracking in which the fourth Purkinje image was never eclipsed by the pupil margin, were selected for data analysis. Trials in which the gaze was > 10’ away from the center position 50 ms before the onset of the exogenous cue t0 50 ms after the offset of the target, and trials with blinks, saccades, or microsaccades occurring at any time during the period of interest (50 ms before the onset of the exogenous cue to 200 ms after the offset of the Gabor patches), were discarded. Periods of blinks were automatically detected by the dDPI eye tracker. Eye movements with a minimal amplitude of 30 and a peak velocity higher than 3°/s were categorized as saccades. Saccades with an amplitude of less than 0.5° (30’) were defined as microsaccades. Saccade amplitude was defined as the vector connecting the point where the speed of the gaze shift grew greater than 3°/s (saccade onset) and the point where it became less than 3°/s (saccade offset). Periods that were not classified as saccades or blinks were labeled as drifts. Observers had 1000 ms to respond, and trials were excluded from further analysis if observers responded too fast (< 100 ms) or too slow (> 1000 ms), resulting in the exclusion of 0.02% ± 0.02% of the trials. Table 1 summarizes the data remaining for analysis for each condition (see Figure 2—figure Supplement 1 for a detailed breakdown by observer).

Average number of trials included in the analysis after filtering (± SEs over by-observer counts), across different spatial frequencies and cueing conditions.

Parentheses show inclusions as percentages of total trials collected.

Weibull functions were fitted to the responses of the orientation discrimination task, using the maximum likelihood procedure implemented in the psignifit 4 toolbox (Schütt et al., 2016) for MATLAB. Separate functions were fitted for each combination of observer, attention conditions (attended and neutral), and spatial frequency, for a total of 8 psychometric function fits (2 attention conditions x 4 spatial frequencies) per observer. Each fit resulted in maximum a posteriori (MAP) estimates for the intercept α, slope β, threshold θ, and lapse rate λ of the psychometric function (the guess rate γ was set to .5, given the 2AFC task). Two estimates were extracted from the MAP estimates to examine the effects of fine-grained exogenous attention across spatial frequencies — contrast sensitivity and asymptotic performance. Contrast sensitivity was defined as the inverse of the threshold (the midpoint on the psychometric curve between chance performance and maximum performance). Asymptotic performance was calculated by subtracting the lapse rate from 1, representing the discrimination accuracy at the highest contrast level of the stimuli. In total, this procedure resulted in 52 estimates each of contrast sensitivity and asymptotic performance (5 observers with 4 SFs x 2 cueing conditions, and 2 observers, who did not complete the 20 CPD condition, with 3 SFs x 2 cueing conditions).

Contrast sensitivity (CS) and asymptotic performance (AP) were both analyzed with (different types of) mixed-effects regressions. Each of the mixed-effects regressions contained cueing, spatial frequency, and their interactions as fixed-effects predictors. Cueing was effect-coded (“attended” = .5 vs. “neutral” = −.5), and frequency was coded using sliding difference, comparing the effects for each spatial frequency against the next highest spatial frequency (4 vs. 8, 8 vs. 12, 12 vs. 20). Following the recommended procedure (Lohse, 2022), we included the maximal possible random effect structure: random intercepts by observer, by unique combination of observer and cueing condition, and by unique combination of observer and spatial frequency condition.

CS is a bounded variable with a natural limit in that it cannot be lower than zero. Importantly, the variance of bounded variables tends to systematically decrease as their mean approaches the bound. This violates the assumption of homoskedasticity—the idea that variance should be independent of the mean and thus remain roughly constant across different conditions — an assumption that is shared by widely used statistical methods like t-tests, ANOVA (analysis of variance) and linear mixed-effects models (LMMs). When this assumption is violated, it can impact the reliability of statistical conclusions, affecting both Type I errors (false positives) and Type II errors (false negatives) (Jaeger, 2008).

Indeed, we observed a strong positive correlation between the mean and variance of contrast sensitivity: smaller variances for smaller means (Figure 3—figure Supplement 3). To address this issue, we log-transformed CS before analyzing it with an LMM using the lmer function from the lme4 package (Bates et al., 2025) in R (R Core Team, 2024). This largely mitigated the heteroskedasticity, except potentially in the 20 CPD condition (see Figure 3—figure Supplement 3). As a precautionary measure, we verified that all main findings remained unchanged when this condition was excluded. This included the critical interaction between spatial frequency and attention (χ2(2) = 6.6, p < .04), which remained statistically significant.

AP is bounded both at the lower and the upper end (as it cannot be larger than 1, or smaller than the guess rate). Following recommended procedure, we thus normalized asymptotic performance to the range between 0 and 1, and analyzed it with a mixed-effects Beta model (with a logit link) using lme4’s glmer function.

Post-hoc pairwise comparisons for both mixed-effects analyses were conducted by estimating the relevant marginal means of the fitted mixed-effects regression, using the emmeans package (Lenth et al., 2025) in R.

Below, we summarize the key considerations for interpreting our results.

Limitations

Here, we analyzed log-transformed CS and AP in two separate analyses, each conducted over the observer-level estimates for each condition. We did so both (1) because this approach remains the standard in psychophysics—including in research on the role of covert attention (Cameron et al., 2002; Pestilli and Carrasco, 2005; Herrmann et al., 2010; Li et al., 2021)—and (2) because the alternative would have required fitting mixed-effects trial-level psychometric models to the combined data from all conditions and observers (an approach that is computationally demanding, and has not yet been broadly validated). The approach taken here and in prior work on extrafoveal attention does, however, have several known limitations, some of which might be of particular relevance to questions about the effects of attention. We summarize these potential downsides here, so that they can be considered in the interpretation of our results, and addressed in future work.

First, summarizing each observer’s performance in a particular condition in terms of the best-fitting estimates of the parameters of the psychometric function (e.g., sensitivity or asymptotic performance) discards all uncertainty about these estimates. The repeated-measures ANOVA or, in our case, mixed-effects regression analyses that are conducted over the best-fitting estimates have no access to information contained in the trial-level observations. This means that two estimates with very different levels of uncertainty (see Figure 4—figure Supplement 1) can have the same impact on the analysis, underweighting less uncertain estimates and overweighting more uncertain estimates. [Differences in uncertainty across conditions and/or across observers can result from a variety of sources. This includes, among other things, differences in the number of trials available to fit the psychometric model, differences in which subsets of trials meet the exclusion criteria (even when those criteria are constant across conditions), and differences in where along the perceptual continuum observers were queried—i.e., at which levels of the observer’s true probability of an accurate answer, p(accurate answer |stimulus), which in turn depends on the true parameterization—intercept, slope, and lapse rate—of the observer’s psychometric function. For instance, for observers who have not reached peak performance at 100% contrast, it is difficult to dissociate the slope and lapse rate of their psychometric function, leading to increased uncertainty about the observer’s contrast threshold, sensitivity, and asymptotic performance.] A failure to adequately account for uncertainty can also increase Type I and Type II errors.

Second, the standard approach discards all information about the covariation between the different parameters. For instance, the same data can sometimes be described well by either having a larger slope in the perceptual model (and thus a lower threshold and higher sensitivity) combined with a lower asymptotic performance or a smaller slope combined with higher asymptotic performance. Any information about such correlations between parameters of the psychometric function is lost under the traditional approach we have followed here. This might contribute to seemingly heterogeneous results across the literature with regard to whether the effects of attention result in contrast or response gain (see Discussion).

Future research could employ alternative analysis approaches. In particular, it is now possible to fit mixed-effects psychometric models to the trial-level data from all conditions and all observers (Prins, 2024; Tan and Jaeger, 2025). While this approach is computationally more demanding and requires familiarity with nonlinear mixed-effects modeling, it allows statistical tests that avoid the downsides described above.

Supplementary Figures

Number of trials included for analysis by condition.

Each line is an observer. Error bars show the mean of by-observer counts and SEM.

Psychometric Weibull function fits for all observers and conditions.

(A) Average contrast sensitivity, calculated as the inverse of the contrast threshold, across spatial frequencies, including 2 CPD in valid and neutral conditions. Each dot represents an individual observer. Error bars denote the standard error of the mean (SEM). Asterisks mark a significant difference in contrast sensitivity between 2 CPD and other spatial frequencies. (B) Average difference in log-scaled contrast sensitivity between valid and neutral conditions across different spatial frequencies. Each line corresponds to the log-scaled contrast sensitivities from each observer. Error bars represent the bootstrapped confidence intervals. Asterisks mark post-hoc pairwise comparison results between valid and neutral conditions within each spatial frequency.

Relation between mean and variability of contrast sensitivity (CS), depending on whether CS is log-transformed (right panel) or not (left panel).

Without a logtransform, the mean and standard deviation (SD) of CS are almost perfectly correlated, constituting a strong violation of the homoskedasticity assumption of linear models. For the present data, logtransforming CS mostly removes this correlation (except for the 20 CPD condition).

Maximum a posteriori (MAP) estimates (points) and 95% confidence intervals for contrast sensitivity (CS) and asymptotic performance (AP) for each experimental condition across seven observers.

Both CS and AP were transformed in the same way as used in our mixed-effects analyses.

Acknowledgements

The authors wish to thank Nora Rooney for assisting with data collection.

Additional Information

Funding

This work was funded by META, NIH R01 EY029788-01 (M.P.), and NIH grant EY001319 to the Center for Visual Science.

Author Contributions

Y.G. and M.P. designed research; Y.G. performed research; Y.G. and T.J. analyzed data; all authors drafted the paper and approved the final version of the manuscript.