Visual Attention in The Fovea and The Periphery during Visual Search

Jie Zhang; Xiaocang Zhu; Shanshan Wang; Zhengyu Ma; Hossein Esteky; Yonghong Tian; Robert Desimone; Huihui Zhou

doi:10.7554/eLife.109498.1

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Reviewing Editor
Tirin Moore
Stanford University, Howard Hughes Medical Institute, Stanford, United States of America
Senior Editor
Tirin Moore
Stanford University, Howard Hughes Medical Institute, Stanford, United States of America

Reviewer #1 (Public review):

Summary:

This manuscript aims to differentiate between foveal and peripheral attentional mechanisms in visual and frontal brain regions in monkeys engaged in a free-gaze visual search task.

Strengths:

The manuscript is clearly written, the question is important, and the behavioral task is interesting.

Weaknesses:

I have two major concerns.

(1) The authors interpret divergence in neural responses to target vs nontarget as attention. But it is not. The subject has to attend to both target and nontarget stimuli to determine the stimulus category and thereby decide on the next action. Thus, divergence between target and nontarget responses could reflect categorical discrimination, but I am not sure this can be interpreted as attentional modulation. While it may be tempting to suggest that finding a stimulus of a specific category is "feature attention", analogous to, e.g., attending to the red stimulus, I don't believe this is correct. For the former, the animals have to attend to a stimulus, and examine the stimulus to determine the stimulus category, unlike a simpler discrimination, which may pop out. Given this, I am unconvinced that the interpretations in this manuscript are valid.

(2) Regarding the RF classification of foveal and peripheral RFs for IT and PFC, prior work suggests that neurons in IT cortex (especially AIT) and PFC have RFs that largely include the foveal visual field. So, it would be important to include figures that show the RFs of neurons classified as foveal versus peripheral for all three areas.

https://doi.org/10.7554/eLife.109498.1.sa2

Reviewer #2 (Public review):

Summary:

In natural visual behavior, such as when one is looking for a face in the crowd, the eyes are moved from site to site, seeking possible matching targets. This involves attention both to the current view at the center of vision (the foveal location) as well as to upcoming views via attention to targets in the periphery. While it has been established that attention generally enhances neuronal response (compared to simple visual activation) at the attended spatial location, this study provides solid evidence that attention during active visual search leads to neuronal response enhancement only when the eye moves towards targets that exhibit the desired feature and category. This study thus moves the field towards understanding the neural encoding of active vision.

This study examines the neuronal basis of feature-selective attention during active, free-gaze visual search. Traditional electrophysiological studies of visual attention in monkeys have commonly combined eye fixation with a covert attention paradigm, but have not sufficiently addressed the roles of both foveal and peripheral attention at play during natural looking behavior. Here, the authors present a novel paradigm in which, during eye-movement mediated search, neuronal receptive fields are recorded in multiple cortical areas (sensory V4, temporal, and prefrontal areas). In this manner, as the eye foveates, items in the array fall into foveal or non-foveal recorded sites. Thus, the experimental paradigm is elegant, offering the opportunity to make multiple types of comparisons: target/distractor, towards/away from fovea, and areal. Specifically, following a category cue (face, house, hand, flower), freely initiated saccades are made to locate a categorically matching 'target' in an array of distractors. Feature attention is assessed by comparing eye saccades made to targets vs to distractors. Spatial attention is assessed by comparing saccades made 'towards' vs 'away' from targets. Statistics are rigorous and nicely designed. The detailed association of simultaneously obtained eye movement sequences and neural parameters is well done. These are valuable data that will contribute to our understanding of attentional modulation in visual search.

Strengths:

The significance of these findings is fundamental. Decades of attention research in vision have been based on the paradigm of visual fixation and covert peripheral attention. However, increasingly, the field has moved towards understanding how the visual system works during active vision. Here, the authors use an active visual search paradigm and record from multiple areas (V4, IT, PFC). They find enhancement of attention both in the foveal and peripheral locations, and, furthermore, a high degree of feature and categorical specificity. This provides valuable data for the concept of a foveal-peripheral attentional window in natural vision. The controls (comparisons of neuronal response during looks to targets vs distractors, and looks towards and away from the target) and statistical rigor make these findings quite compelling.

Weaknesses:

While the study is generally quite strong, there are a few weaknesses to be addressed.

(1) Little rationale is provided for recording in the selected areas, V4, IT, and PFC. Given the respective roles in sensory, object recognition, and goal-directed behavior, some rationale for this design should be offered, and commonalities/distinctions between these areas should be discussed.

(2) Given the reliance of all analyses on saccadic behavior (towards target/distractor, towards/away from target), additional description and summaries of eye movement behavior during single trials and across trials should be provided.

(3) The dependency of findings on top-down (categorical & feature-specific) task design should be discussed.

https://doi.org/10.7554/eLife.109498.1.sa1

Reviewer #3 (Public review):

In this manuscript, the authors investigate the role of attention in foveal processing during a naturalistic task. They record neural activity from extrastriate visual areas V4 and inferotemporal cortex, as well as from the lateral prefrontal cortex, in macaques performing a free-gaze visual search task. In this task, animals searched for a face or house target among multiple complex stimuli, with no constraints on eye movements. Unlike classic studies of visual attention, which often rely on controlled fixation, this work examines neural activity in both foveal and peripheral receptive fields during naturalistic eye movements.

The main question addressed by the authors is how feature-based attention is distributed and coordinated across foveal and peripheral visual fields during active search, and how this attentional processing influences saccade behavior. The authors show that foveal units in visual areas exhibit feature-based attentional enhancement, with stronger responses when a fixated stimulus is a target compared to when the same stimulus serves as a distractor. Peripheral units in visual and prefrontal areas show both feature-based and spatial attentional modulation, consistent with prior work. Finally, the authors show that attentional modulation depends primarily on stimulus category rather than response magnitude, with neurons showing similar enhancement for all images within the target category regardless of how strongly individual images drive the cell.

There are several notable strengths of this paper, including:

(1) Disentangling feature-based and spatial attention during naturalistic vision remains a central challenge. This paper tackles both simultaneously, parsing neural populations by object selectivity (face-selective, house-selective, non-selective) and RF position (foveal vs. peripheral).

(2) The unconstrained search task (Figure 1A) moves beyond the dominant fixed-gaze, cued-attention designs (Zhou & Desimone, 2011) to study attention as it operates during natural behavior, with sequential fixations and voluntary saccades.

(3) The scale of the multi-area recordings is a major strength and is well aligned with current trends in primate and human neuroscience toward large-scale, multi-area recordings. Simultaneous recordings from visual and prefrontal areas, comprising over 4,900 foveal units and more than 1,500 peripheral units, enable meaningful cross-area latency comparisons and area-specific analyses of attentional modulation. This study builds on the authors' previous analyses of this dataset by expanding the scope to show that feature-based attention generalizes across neuronal classes and operates on categorical identity rather than response magnitude.

(4) The combination of simultaneous multi-area recordings and a rich behavioral paradigm provides a dataset that is well-suited for population decoding, cross-area interaction analyses, and trial-by-trial prediction of saccade choices, which could substantially deepen mechanistic understanding beyond the largely univariate comparisons presented here.

While the data broadly support the paper's main conclusions, several issues limit the strength of the mechanistic interpretation and should be taken into consideration:

(1) Receptive field size is not explicitly quantified and may confound foveal-peripheral comparisons. Units are classified as foveal or peripheral based on responsiveness to the cue versus the search array (Methods, p. 17), but the manuscript lacks essential information about receptive field sizes, eccentricities, and the number of search stimuli falling within each receptive field, as well as the related controls. This is critical because receptive fields in visual area V4 at foveal eccentricities are relatively small (Gattass et al., 1988; Desimone & Schein, 1987), whereas receptive fields in inferotemporal cortex can span several degrees to tens of degrees and often include the fovea (Op de Beeck & Vogels, 2000; DiCarlo & Maunsell, 2003; Zoccolan et al., 2007). Given the 2° × 2° stimulus size, multiple search items could potentially fall simultaneously within peripheral receptive fields. This introduces a potential confound, as attentional modulation is known to be strongest when multiple stimuli appear within a single receptive field (Reynolds et al., 1999). Although the authors acknowledge this issue for visual area V4 (p. 17), it is neither quantified nor controlled for. Without explicit receptive field mapping relative to the search array, comparisons between foveal and peripheral units, as well as between visual areas, are difficult to interpret cleanly.

(2) Attentional modulation is difficult to dissociate from saccade planning and decision-related signals. The free-gaze paradigm enhances ecological validity but introduces a temporal confound: mean distractor fixation durations are approximately 156 ms (p. 9), while attentional effects emerge between 137 and 170 ms after fixation onset (Figure 2). As a result, the reported attentional modulation coincides with the preparation of the subsequent saccade. Neural activity measured in the primary analysis window (150-225 ms; p. 19), therefore, likely reflects a mixture of visual, attentional, motor planning, target recognition, and behavioral relevance signals, all of which are known to modulate responses in visual areas at similar latencies (e.g., Chelazzi et al., 1998). Moreover, target fixations (~257 ms) and distractor fixations (~156 ms) occur on fundamentally different behavioral timescales, which may inflate apparent foveal attentional effects. While the authors suggest that these timing differences support the idea that foveal feature-based attention facilitates prolonged fixation on target stimuli, this interpretation is not fully supported by the current analyses. That said, the saccade-aligned analyses of peripheral units (Figure S3) partially mitigate this concern by demonstrating that feature-based modulation persists through saccade execution.
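As a rough illustration of the timing concern in point (2), the overlap can be worked through using only the mean values quoted above. This is a minimal sketch, not the authors' analysis: it assumes every fixation lasts exactly the mean duration, which the actual distribution of fixation durations (not given here) will not satisfy, and the function name is hypothetical.

```python
# Values quoted in the review, in ms after fixation onset.
MEAN_DISTRACTOR_FIXATION = 156.0
MEAN_TARGET_FIXATION = 257.0
ANALYSIS_WINDOW = (150.0, 225.0)


def fraction_after_fixation_end(window, fixation_duration):
    """Fraction of the analysis window that falls after the fixation ends.

    Assumption: the fixation lasts exactly `fixation_duration` ms.
    """
    start, end = window
    overlap = max(0.0, end - max(start, fixation_duration))
    return overlap / (end - start)


distractor_fraction = fraction_after_fixation_end(
    ANALYSIS_WINDOW, MEAN_DISTRACTOR_FIXATION
)
target_fraction = fraction_after_fixation_end(
    ANALYSIS_WINDOW, MEAN_TARGET_FIXATION
)
print(f"distractor: {distractor_fraction:.2f}, target: {target_fraction:.2f}")
# prints "distractor: 0.92, target: 0.00"
```

Under this simplification, about 92% of the 150-225 ms window lies after the mean distractor fixation has ended, whereas the whole window lies within the mean target fixation, which is one way to see why the two conditions are hard to compare on equal terms.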

(3) The "attention-out" condition for spatial attention lacks directional control. In the spatial attention analyses (Figures 4D-F), the "attention-out" condition appears to include all fixations followed by saccades directed away from the receptive field, regardless of saccade direction. This differs from classic spatial attention designs, which typically use controlled anti-saccades or saccades to fixed locations opposite the receptive field (e.g., Moore & Armstrong, 2003; Gregoriou et al., 2009). Saccades directed toward locations adjacent to, but outside, the receptive field may still partially engage spatial attention mechanisms near the receptive field via broad attentional fields or motor preparation gradients (Bisley & Goldberg, 2010). In addition, the "attention-out" condition likely contains a heterogeneous mixture of trials in which the stimulus in the receptive field is either a target or a distractor, since feature-based attention effects are derived from this same pool of trials. As a result, spatial and feature attention effects are not fully orthogonal, and variance related to feature attention may already be embedded in the spatial attention baseline.
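One way the directional control requested in point (3) could be approximated post hoc is to label each saccade by its angle relative to the direction of the receptive field from the current fixation, and to restrict the "attention-out" pool to saccades directed roughly opposite. The sketch below is only an illustration under stated assumptions: the function names, the coordinate convention (degrees of visual angle in screen coordinates), and the 45° tolerance are hypothetical and are not taken from the manuscript. Note that "toward" here is purely directional and does not test whether the saccade lands inside the receptive field.

```python
import math


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees (0-180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def classify_saccade(fix_xy, saccade_end_xy, rf_center_xy, tolerance_deg=45.0):
    """Label a saccade relative to the RF direction from the current fixation.

    Returns 'toward', 'opposite', or 'other'.
    tolerance_deg is a hypothetical choice, not a value from the manuscript.
    """
    sac_dir = math.degrees(math.atan2(saccade_end_xy[1] - fix_xy[1],
                                      saccade_end_xy[0] - fix_xy[0]))
    rf_dir = math.degrees(math.atan2(rf_center_xy[1] - fix_xy[1],
                                     rf_center_xy[0] - fix_xy[0]))
    diff = angular_difference(sac_dir, rf_dir)
    if diff <= tolerance_deg:
        return "toward"
    if diff >= 180.0 - tolerance_deg:
        return "opposite"
    return "other"
```

With labels of this kind, the spatial attention comparison could be repeated using only 'opposite' saccades as the baseline, and separately for fixations in which the receptive field stimulus is a target or a distractor, to check how much the effect depends on the heterogeneous "attention-out" pool.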

https://doi.org/10.7554/eLife.109498.1.sa0
