Solving visual search and symmetry tasks using visual homogeneity.

(A) Example target-present search display, containing a single oddball target (horse) among identical distractors (dog). Participants in such tasks have to indicate whether the display contains an oddball or not, without knowing the features of the target or distractor. This means they have to perform this task by detecting some property of each display rather than some feature contained in it.

(B) Example target-absent search display containing no oddball target.

(C) Hypothesized neural computation for target present/absent judgements. According to multiple object normalization, the response to multiple items is an average of the responses to the individual items. Thus, the response to a target-absent array will be identical to the response to its individual items, whereas the response to a target-present array will lie along the line joining the corresponding target-absent arrays. This causes the target-absent arrays to stand apart (red lines) and the target-present arrays to come closer together due to mixing (blue lines). If we calculate the distance of each display from a fixed center in this space (VH, for visual homogeneity), then target-absent arrays will have a larger distance (VHa) than target-present arrays (VHp), and this distance can be used to distinguish between them. Inset: schematic distance from the center for target-absent arrays (red) and target-present arrays (blue). Note that this approach may only reflect the initial target selection process involved in oddball visual search and does not capture all forms of visual search.
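The averaging computation hypothesized here can be illustrated with a minimal numerical sketch; the two-dimensional "responses" and the center location below are made-up values for illustration, not measured data:

```python
import numpy as np

# Hypothetical population responses (2-D for illustration) to single items
r_dog = np.array([1.0, 3.0])
r_horse = np.array([3.0, 1.0])

# Multiple object normalization: the response to an array is the average
# of the responses to its constituent items
absent = r_dog                     # all items identical -> same as single item
present = 0.5 * (r_dog + r_horse)  # mixing pulls the response inward

# Visual homogeneity = distance from a fixed center in this space
# (the center placement is optimized against behavior in practice)
center = np.array([0.0, 0.0])
VH_absent = np.linalg.norm(absent - center)
VH_present = np.linalg.norm(present - center)
assert VH_present < VH_absent  # present arrays lie closer to the center
```

With these toy numbers the mixed (target-present) response lands between the two pure responses and therefore closer to the center, reproducing the VHp < VHa ordering described above.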

(D) Example asymmetric object in a symmetry detection task. Here too, participants have to indicate if the display contains a symmetric object or not, without knowing the features of the object itself. This means they have to perform this task by detecting some property in the display.

(E) Example symmetric object in a symmetry detection task.

(F) Hypothesized neural computation for symmetry detection. Following multiple object normalization, the response to an object containing repeated parts is equal to the response to the individual part, whereas the response to an object containing two different parts will lie along the line joining the responses to the objects in which each part repeats. This causes symmetric objects to stand apart (red lines) and asymmetric objects to come closer together due to mixing (blue lines). Thus, the visual homogeneity for symmetric objects (VHs) will be larger than for asymmetric objects (VHa). Inset: schematic distance from the center for symmetric objects (red) and asymmetric objects (blue).

(G) Behavioral predictions for VH. If visual homogeneity (VH) is a decision variable in visual search and symmetry detection tasks, then response times (RT) should be longest for displays whose VH lies close to the decision boundary. This predicts opposite signs for the correlation between RT and VH for the present/absent or symmetry/asymmetry judgements, and consequently little or no overall correlation between VH and RT.

(H) Neural predictions for VH. Left: correlation between brain activations and VH for two hypothetical brain regions. In the VH-encoding region, brain activations should correlate positively with VH. In any region that encodes task difficulty as indexed by response time, brain activity should show no correlation with VH, since VH itself is uncorrelated with RT overall (see Panel G). Right: correlation between brain activations and RT. Since VH is uncorrelated with RT overall, the VH-encoding region should show little or no correlation with RT, whereas regions encoding task difficulty should show a positive correlation.

Visual homogeneity predicts target present/absent responses

(A) Example search array in an oddball search task (Experiment 1). Participants viewed an array containing identical items except for an oddball present either on the left or right side, and had to indicate using a key press on which side the oddball appeared. The reciprocal of the average search time was taken as the perceptual distance between the target and distractor items. We measured all possible pairwise distances for 32 grayscale natural objects in this manner.

(B) Perceptual space reconstructed using multidimensional scaling performed on the pairwise perceptual dissimilarities. In the resulting plot, nearby objects represent hard searches, and far away objects represent easy searches. Some images are shown at a small size due to space constraints; in the actual experiment, all objects were equated to have the same longer dimension. The correlation on the top right indicates the match between the distances in the 2D plot and the observed pairwise distances (**** is p < 0.00005).
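The embedding step can be sketched with classical (Torgerson) MDS implemented directly in NumPy; the dissimilarities below are random stand-ins for the measured 1/RT values, and the study's own analysis may have used different MDS software and settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up pairwise dissimilarities between 32 objects, standing in for
# the 1/RT values (in s^-1) measured in oddball search
n = 32
D = rng.uniform(0.5, 3.0, size=(n, n))
D = (D + D.T) / 2            # dissimilarities are symmetric
np.fill_diagonal(D, 0.0)     # zero self-dissimilarity

# Classical (Torgerson) MDS: double-center the squared distances and
# use the top two eigenvectors as 2-D coordinates
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
evals, evecs = np.linalg.eigh(B)
top = np.argsort(evals)[::-1][:2]           # two largest eigenvalues
coords = evecs[:, top] * np.sqrt(np.maximum(evals[top], 0))

# Goodness of fit: correlation between embedded and observed distances,
# analogous to the correlation reported on the plot
iu = np.triu_indices(n, k=1)
embedded = np.sqrt(((coords[:, None] - coords[None, :]) ** 2).sum(-1))
r = np.corrcoef(embedded[iu], D[iu])[0, 1]
```

The correlation `r` between embedded and observed distances plays the role of the goodness-of-fit value reported in the top right of the panel.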

(C) Example display from Experiment 2. Participants performed this task inside the scanner. On each trial, they had to indicate using a key press whether an oddball target was present or absent.

(D) Predicted response to target-present and target-absent arrays, using the principle that the neural response to multiple items is the average of the individual item responses. This predicts that target-present arrays become similar due to mixing of responses, whereas target-absent arrays stand apart. Consequently, these two types of displays can be distinguished using their distance to a central point in this space. We define this distance as visual homogeneity, and it is obtained by finding the optimum center that maximizes the difference in correlations with response times (see Methods).
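The center optimization could be sketched as below. The coordinates, response times, and exact objective are illustrative assumptions standing in for the study's actual fitting procedure; the sign convention follows Panel G, where target-present RTs should rise with VH (approaching the boundary) and target-absent RTs should fall:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Made-up 2-D perceptual coordinates (e.g. from MDS) and response times
# for 32 target-present and 32 target-absent displays
pres = rng.normal(size=(32, 2))
abst = rng.normal(size=(32, 2))
rt_pres = rng.uniform(0.5, 2.0, 32)
rt_abs = rng.uniform(0.5, 2.0, 32)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

def objective(center):
    # VH of each display = distance from the candidate center
    vh_p = np.linalg.norm(pres - center, axis=1)
    vh_a = np.linalg.norm(abst - center, axis=1)
    # Maximize the difference in correlations with response times
    # (positive for present, negative for absent searches)
    return -(corr(vh_p, rt_pres) - corr(vh_a, rt_abs))

res = minimize(objective, x0=np.zeros(2), method='Nelder-Mead')
center = res.x   # optimized center coordinates in perceptual space
```

A single center is fit jointly to both display types, mirroring the single-model fit noted for Panels F and G.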

(E) Mean visual homogeneity relative to the optimum center for target-present and target-absent displays. Error bars represent s.e.m. across all displays. Asterisks represent statistical significance (**** is p < 0.00005, unpaired rank-sum test comparing visual homogeneity for 32 target-absent and 32 target-present arrays).

(F) Response time for target-present searches in Experiment 2 plotted against visual homogeneity calculated from Experiment 1. Asterisks represent statistical significance of the correlation (**** is p < 0.00005). Note that a single model is fit to find the optimum center in representational space that predicts the response times for both target-present and target-absent searches.

(G) Response time for target-absent searches in Experiment 2 plotted against visual homogeneity calculated from Experiment 1. Asterisks represent statistical significance of the correlation (**** is p < 0.00005).

A localized brain region encodes visual homogeneity

(A) Searchlight map showing the correlation between mean activation in each 3x3x3 voxel neighborhood and visual homogeneity.

(B) Searchlight map showing the correlation between neural dissimilarity in each 3x3x3 voxel neighborhood and perceptual dissimilarity measured in visual search.
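A single searchlight step of this dissimilarity analysis might look like the following sketch. The voxel responses and perceptual dissimilarities are random stand-ins, and neural dissimilarity is taken as 1 minus the pattern correlation, a common choice assumed here rather than confirmed from the text:

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up responses of one 3x3x3 searchlight (27 voxels) to 32 objects,
# plus made-up perceptual dissimilarities (standing in for 1/RT)
n_obj, n_vox = 32, 27
resp = rng.normal(size=(n_obj, n_vox))
iu = np.triu_indices(n_obj, k=1)
perc_dissim = rng.uniform(0.5, 3.0, size=iu[0].size)

# Neural dissimilarity between two objects: 1 - correlation between
# their voxel activity patterns
neural_dissim = 1 - np.corrcoef(resp)[iu]

# Match between neural and perceptual dissimilarity for this searchlight;
# repeating this at every neighborhood yields the searchlight map
r = np.corrcoef(neural_dissim, perc_dissim)[0, 1]
```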

(C) Key visual regions identified using standard anatomical masks: early visual cortex (EVC), area V4, and the lateral occipital (LO) region. The visual homogeneity (VH) region was identified using the searchlight map in Panel A.

(D) Correlation between the mean activation and visual homogeneity in the key visual regions EVC, V4, LO and VH. Error bars represent the standard deviation of the correlation obtained using a bootstrap process, by repeatedly sampling participants with replacement 10,000 times. Asterisks represent statistical significance, estimated by calculating the fraction of bootstrap samples in which the observed trend was violated (* is p < 0.05, ** is p < 0.01, **** is p < 0.0001).
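The bootstrap procedure described here can be sketched as follows; the participant activations, VH values, and group sizes are random stand-ins for the real data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up data: ROI activation for 15 participants x 64 displays,
# plus the visual homogeneity (VH) of each display
n_subj, n_disp = 15, 64
act = rng.normal(size=(n_subj, n_disp))
vh = rng.normal(size=n_disp)

def roi_corr(subjects):
    # Correlate the participant-averaged activation with VH
    return np.corrcoef(act[subjects].mean(axis=0), vh)[0, 1]

# Resample participants with replacement 10,000 times
boot = np.array([roi_corr(rng.integers(0, n_subj, n_subj))
                 for _ in range(10_000)])
err = boot.std()         # error bar: s.d. of the bootstrap correlations
p = (boot < 0).mean()    # fraction of samples violating a positive trend
```

The same resampling logic applies to Panel E, with the region's neural dissimilarity replacing mean activation.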

(E) Correlation between neural dissimilarity in key visual regions and perceptual dissimilarity. Error bars represent the standard deviation of the correlation obtained using a bootstrap process, by repeatedly sampling participants with replacement 10,000 times. Asterisks represent statistical significance, estimated by calculating the fraction of bootstrap samples in which the observed trend was violated (** is p < 0.001).

Visual homogeneity predicts symmetry perception

(A) Example search array in Experiment 3. Participants viewed an array containing identical items except for an oddball present either on the left or right side, and had to indicate using a key press on which side the oddball appeared. The reciprocal of the average search time was taken as the perceptual distance between the target and distractor items. We measured all possible pairwise distances for 64 objects (32 symmetric, 32 asymmetric) in this manner.

(B) Perceptual space reconstructed using multidimensional scaling performed on the pairwise perceptual dissimilarities. In the resulting plot, nearby objects represent hard searches, and far away objects represent easy searches. Some images are shown at a small size due to space constraints; in the actual experiment, all objects were equated to have the same longer dimension. The correlation on the top right indicates the match between the distances in the 2D plot and the observed pairwise distances (**** is p < 0.00005).

(C) Two example displays from Experiment 4. Participants had to indicate using a key press whether the object was symmetric or asymmetric.

(D) Using the perceptual representation of symmetric and asymmetric objects from Experiment 3, we reasoned that the two groups could be distinguished using their distance from a center in perceptual space. The coordinates of this center were optimized to maximize the match to the observed symmetry detection times.

(E) Visual homogeneity relative to the optimum center for asymmetric and symmetric objects. Error bars represent s.e.m. across images. Asterisks represent statistical significance (* is p < 0.05, unpaired rank-sum test comparing visual homogeneity for 32 symmetric and 32 asymmetric objects).

(F) Response time for asymmetric objects in Experiment 4 plotted against visual homogeneity calculated from Experiment 3. Asterisks represent statistical significance of the correlation (** is p < 0.001).

(G) Response time for symmetric objects in Experiment 4 plotted against visual homogeneity calculated from Experiment 3. Asterisks represent statistical significance of the correlation (* is p < 0.05).

Brain region encoding visual homogeneity during symmetry detection

(A) Searchlight map showing the correlation between mean activation in each 3x3x3 voxel neighborhood and visual homogeneity.

(B) Searchlight map showing the correlation between neural dissimilarity in each 3x3x3 voxel neighborhood and perceptual dissimilarity measured in visual search.

(C) Key visual regions identified using standard anatomical masks: early visual cortex (EVC), area V4, and the lateral occipital (LO) region. The visual homogeneity (VH) region was identified using the searchlight map in Panel A.

(D) Correlation between the mean activation and visual homogeneity in the key visual regions EVC, V4, LO and VH. Error bars represent the standard deviation of the correlation obtained using a bootstrap process, by repeatedly sampling participants with replacement 10,000 times. Asterisks represent statistical significance, estimated by calculating the fraction of bootstrap samples in which the observed trend was violated (* is p < 0.05, ** is p < 0.01, **** is p < 0.0001).

(E) Correlation between neural dissimilarity in key visual regions and perceptual dissimilarity. Error bars represent the standard deviation of the correlation obtained using a bootstrap process, by repeatedly sampling participants with replacement 10,000 times. Asterisks represent statistical significance, estimated by calculating the fraction of bootstrap samples in which the observed trend was violated (** is p < 0.001).