Experimental task and spatial predictors of fixation selection.

(A) Behavioral paradigm. Monkeys initiated each trial by fixating a central point (500 ms) before freely viewing a naturalistic scene for 5 s to receive a juice reward. (B) Example scene and corresponding spatial predictor maps. The top-left panel shows an example scene with fixation locations (green dots) and randomly sampled non-fixated locations (black dots). Together, these locations sample regions that did and did not capture the subject’s attention. At each location, the meaning, salience, and center proximity maps were averaged within an approximately 3-degree window, illustrated by the circles around example fixated (green) and non-fixated (black) locations. The three maps visualize, respectively, the spatial distribution of semantic informativeness, low-level physical features, and the central fixation bias used to model fixation selection. (C) Predictor distributions for Monkey V. Density plots comparing the normalized (z-scored) values of meaning, salience, and center proximity at actual fixated locations (green) versus randomly sampled non-fixated locations (black). (D) Predictor distributions for Monkey I. Same as in (C).
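
The window averaging described in (B) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the circular window shape, and the pixel radius (which depends on a pixels-per-degree conversion not given here) are all assumptions.

```python
import numpy as np

def window_mean(pred_map, x, y, radius_px):
    """Mean of a 2-D predictor map inside a circular window centered at (x, y).

    radius_px is a hypothetical pixel radius; converting the ~3-degree window
    to pixels requires the display geometry, which is assumed here.
    Pixels outside the image are simply ignored.
    """
    h, w = pred_map.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (xx - x) ** 2 + (yy - y) ** 2 <= radius_px ** 2
    return float(pred_map[mask].mean())

def zscore(values):
    """Normalize values to zero mean and unit standard deviation."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std()
```

Applying `window_mean` to the meaning, salience, and center proximity maps at every fixated and non-fixated location, then z-scoring each predictor, would yield values of the kind compared in (C) and (D).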

Posterior estimates for the fixation model.

The fixed effects (main effects and interactions) and the scene-level random effect are presented for the fixation-likelihood GLMM. Columns display the posterior mean and standard deviation (SD) of β coefficients, the 95% highest density interval (HDI), and the probability of a positive effect (P(β > 0 | data)). Probabilities close to 1 indicate strong evidence for a positive effect, whereas probabilities close to 0 indicate strong evidence for a negative effect. Values that would round to 1.00 or 0.00 are reported as P(β > 0 | data) > 0.999 or P(β > 0 | data) < 0.001 to avoid implying exact certainty.
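
As a rough sketch of how the table columns could be derived from posterior draws of a single coefficient, the code below computes a sample-based HDI (the narrowest interval containing 95% of the draws) and formats P(β > 0 | data). The function names are hypothetical, and the exact cutoffs used for the "> 0.999" and "< 0.001" labels are an assumption.

```python
import numpy as np

def hdi(samples, prob=0.95):
    """Narrowest interval containing `prob` of the sorted posterior draws."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = s.size
    k = int(np.ceil(prob * n))           # number of draws inside the interval
    widths = s[k - 1:] - s[:n - k + 1]   # width of every candidate interval
    i = int(np.argmin(widths))
    return float(s[i]), float(s[i + k - 1])

def format_p_positive(samples):
    """Share of draws above zero, avoiding a report of exact certainty."""
    p = float(np.mean(np.asarray(samples, dtype=float) > 0))
    if p > 0.999:        # assumed cutoff for the upper label
        return "> 0.999"
    if p < 0.001:        # assumed cutoff for the lower label
        return "< 0.001"
    return f"{p:.3f}"
```

The posterior mean and SD columns correspond to `np.mean` and `np.std` of the same draws.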

Scene meaning is a robust predictor of fixation selection.

(A, C) Posterior estimates of the fixed effects for Monkey V (A) and Monkey I (C). Points represent the posterior mean of β coefficients, and error bars indicate the 95% highest density interval (HDI). The vertical red line marks 0 (no effect). Predictors are considered reliable if their 95% HDI excludes 0. (B, D) Predicted fixation probability as a function of the three main spatial predictors for Monkey V (B) and Monkey I (D). Curves visualize the marginal effects of meaning (red), salience (blue), and center proximity (green) as they vary from −3 to +3 standard deviations (z-scores), while holding other predictors constant at their means (0). Solid lines indicate the posterior mean prediction, and shaded areas represent the 95% HDI. (E, F) Model classification performance for Monkey V (E) and Monkey I (F). Density distributions show the model’s predicted probabilities for observed fixations (dark gray) and non-fixated locations (light gray). The vertical dashed line represents the classification threshold (0.5), and the percentages indicate the proportion of observations in each class that fall above or below the threshold. (G, H) Relative importance of spatial predictors for Monkey V (G) and Monkey I (H). Bars represent the unique share of explained variance (ΔR²) attributable to meaning (red), salience (blue), and center proximity (green). Error bars indicate the 95% HDI.
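
The marginal-effect curves in (B, D) can be approximated along the following lines, assuming the GLMM uses a logit link (an assumption, since the link function is not stated here). With all other z-scored predictors held at 0, the linear predictor reduces to the intercept plus one slope times the predictor value. The function name is hypothetical, and an equal-tailed 95% interval is used as a simple stand-in for the HDI.

```python
import numpy as np

def marginal_curve(intercept_draws, beta_draws, z=None):
    """Predicted probability across a z-score grid for one predictor.

    Returns the grid, the posterior mean curve, and an equal-tailed 95%
    interval (a stand-in for the HDI shown in the figure).
    """
    if z is None:
        z = np.linspace(-3, 3, 61)
    a = np.asarray(intercept_draws, dtype=float)[:, None]
    b = np.asarray(beta_draws, dtype=float)[:, None]
    p = 1.0 / (1.0 + np.exp(-(a + b * z)))   # shape: draws x grid points
    band = np.quantile(p, [0.025, 0.975], axis=0)
    return z, p.mean(axis=0), band
```

Thresholding the same predicted probabilities at 0.5 gives the classification split summarized in (E, F).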

Attention is directed to high-meaning regions, with salience playing an increasing role as meaning decreases.

(A, B) Visualization of the meaning × salience interaction for Monkey V (A) and Monkey I (B). Curves show the predicted fixation probability as a function of meaning at three representative levels of image salience: low (−2 SD, purple), average (0 SD, red), and high (+2 SD, yellow). Solid lines represent the posterior mean prediction, and shaded areas indicate the 95% highest density interval (HDI). The vertical dashed line marks the average meaning value (0). In both animals, when scene meaning is high, fixation probability remains consistently high with little influence of visual salience. In contrast, when scene meaning is low, visual salience strongly modulates fixation probability.
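
To illustrate how an interaction term can produce this pattern, the sketch below evaluates a logistic model with a meaning × salience term. It assumes a logit link, and the coefficient values are made up for illustration; they are not the fitted estimates. On the linear-predictor scale the effect of salience is b_s + b_ms × meaning, so a negative b_ms shrinks the salience effect as meaning increases.

```python
import numpy as np

# Hypothetical coefficients chosen only to illustrate the pattern.
COEF = {"b0": 0.0, "b_m": 1.0, "b_s": 0.5, "b_ms": -0.25}

def fixation_prob(meaning, salience, c=COEF):
    """Predicted fixation probability under an assumed logit link."""
    eta = (c["b0"] + c["b_m"] * meaning + c["b_s"] * salience
           + c["b_ms"] * meaning * salience)
    return 1.0 / (1.0 + np.exp(-eta))

def salience_spread(meaning):
    """Difference in predicted probability between +2 SD and -2 SD salience."""
    return float(fixation_prob(meaning, 2.0) - fixation_prob(meaning, -2.0))
```

With these illustrative values, `salience_spread(2.0)` is zero while `salience_spread(-2.0)` is large, mirroring curves that converge at high meaning and fan out at low meaning.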

Attention is directed to high-meaning regions, with familiarity playing an increasing role as meaning decreases.

(A) Example unfamiliar and familiar scenes. The top row displays an unfamiliar scene with fixations (green dots) alongside its meaning and salience maps. The bottom row displays a familiar scene with its corresponding maps. (B, C) Posterior estimates of familiarity effects for Monkey V (B) and Monkey I (C). Points represent the posterior mean of the interaction coefficients between familiarity and the spatial predictors; error bars indicate the 95% HDI. The vertical red line marks 0. Note the negative interaction between meaning and familiarity in both animals. (D–I) Predicted fixation probability as a function of spatial predictors for unfamiliar and familiar scenes. Top row (D, F, H) shows results for Monkey V; bottom row (E, G, I) shows results for Monkey I. Curves visualize fixation probability as a function of meaning (D, E), salience (F, G), and center proximity (H, I) for familiar (colored lines) versus unfamiliar (gray lines) scenes. Solid lines represent the posterior mean prediction, and shaded areas represent the 95% HDI. In both animals, the probability of fixating high-meaning regions remains consistently high regardless of scene familiarity. However, as scene regions become less meaningful, they receive increasingly more fixations in familiar compared to unfamiliar scenes.

Attentional guidance by meaning is enhanced during high engagement.

(A, B) Distribution of engagement duration (total fixation time per scene) for Monkey V (A) and Monkey I (B). Histograms show the variability in scene viewing time across trials. (C, D) Posterior estimates of engagement interactions for Monkey V (C) and Monkey I (D). Points represent the posterior mean of the interaction coefficients between engagement and the spatial predictors; error bars indicate the 95% HDI. The vertical red line marks 0. Note the positive interaction between meaning and engagement in both animals. (E, F) Visualization of the meaning × engagement interaction for Monkey V (E) and Monkey I (F). Curves show the predicted fixation probability as a function of meaning at three representative levels of engagement: low (−2 SD, yellow), average (0 SD, red), and high (+2 SD, purple). Solid lines represent the posterior mean prediction, and shaded areas indicate the 95% HDI. In both animals, the slope of the meaning function is steepest when engagement is high (purple curve), indicating that the influence of meaning on gaze selection scales with the subject’s level of scene engagement.