A normative, Bayesian framework for linking center-surround processing in perception and neural activity.

This figure provides a conceptual overview of the study’s approach, from the experimental paradigm and Bayesian model to its connection with behavioral and neural data. (a) Human experimental paradigm. Observers view a central moving dot patch (green) within a moving surround (red) and report the center’s perceived direction. Green and red arrows depict the retinal velocities of the center and surround, respectively. (b) Generative model for the observed stimulus velocity. The model assumes that the true retinal center velocity, νcenter, equals the vector sum of the reference frame velocity, νreference, and the center’s velocity relative to that reference frame (the relative velocity). This true center velocity then produces a noisy sensory measurement, νobservation. (c) Posterior beliefs. The brain infers posterior beliefs over latent variables, such as causal structure and velocities. In the example, the observer assigns posterior probability to two causal structures (left): Intermediate reference (perceiving motion relative to a reference frame intermediate between the center and surround motions) and Independence (perceiving the two patches as moving independently). The posterior over relative velocity (right) reflects this, showing two modes that correspond to these two structures. (d) Connecting the model to behavior. The model is constrained by fitting it to human psychophysical data. The panel shows reports (gray dots) and the model’s posterior predictions (black violins) for an example observer from Shivkumar et al. (2025). The plots display the center’s perceived direction (y-axis) as a function of the directional difference between the center and surround (x-axis). (e) Connecting the model to neural activity. The model’s posterior beliefs, constrained by behavior, are then used to predict the modulation (suppression/facilitation) of single MT neurons in response to complex surround motion. (f) Overall research pipeline.
In summary, the framework uses a pipeline where the model is first fitted to behavioral responses (Fig. 2). These subject-specific parameters are then used to compute posterior beliefs for any center-surround stimulus (Fig. 3). Finally, these posteriors are combined with baseline neural tuning curves (measured with a stationary surround) to generate testable predictions (Fig. 4) for how single neurons will respond to complex moving surrounds (Fig. 5).
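The generative model in panel (b) can be illustrated with a minimal sketch. It assumes 2D velocities stored as (x, y) tuples and isotropic Gaussian measurement noise; the function name, argument names, and the noise form are illustrative assumptions, not taken from the study.

```python
import random


def sample_observation(v_reference, v_relative, noise_sd, rng):
    """Draw one noisy measurement of the center's retinal velocity.

    All names are hypothetical; isotropic Gaussian noise is an assumption.
    """
    # True retinal center velocity: vector sum of the reference-frame
    # velocity and the velocity relative to that reference frame.
    v_center = (v_reference[0] + v_relative[0],
                v_reference[1] + v_relative[1])
    # Noisy sensory measurement of the true center velocity.
    v_observation = (rng.gauss(v_center[0], noise_sd),
                     rng.gauss(v_center[1], noise_sd))
    return v_center, v_observation


rng = random.Random(0)
v_center, v_observation = sample_observation((2.0, 0.0), (1.0, 1.0), 0.5, rng)
```

Passing an explicit `random.Random` instance keeps the draws reproducible, which is convenient when comparing simulated measurements across stimuli.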

A hierarchical Bayesian model for inferring the causal structure of visual motion.

(a) Same as Fig. 1a. (b) Same as Fig. 1b. (c-g) Posterior probability assigned to competing causal motion structures. The model considers 12 hypotheses for the relationship between center and surround motion. Five of the 12 structures are shown here, depicted schematically in the inset boxes (arrows: motion; dots: stationary). These include four structures assuming a common cause (c-f) and one assuming independent causes (g). The plots show the median posterior probability across five observers (solid grey lines) with 95% credible intervals (shading) as a function of the directional difference between the center and surround stimuli. Observers consistently assigned negligible probability to the independent-causes structure (g). Adapted from Shivkumar et al. (2025).
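As a rough illustration of how posterior probabilities over competing structures arise, the sketch below applies Bayes' rule to a set of hypotheses given their log priors and log likelihoods. The function name and inputs are hypothetical; how the actual model computes each structure's likelihood is not specified here.

```python
import math


def structure_posterior(log_priors, log_likelihoods):
    """Posterior over hypotheses: prior times likelihood, normalized.

    Hypothetical helper; works in log space for numerical stability.
    """
    log_joint = [lp + ll for lp, ll in zip(log_priors, log_likelihoods)]
    # Subtract the maximum before exponentiating (log-sum-exp trick).
    peak = max(log_joint)
    weights = [math.exp(v - peak) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]


# Example: 12 structures with a uniform prior and made-up log likelihoods.
n_structures = 12
log_priors = [math.log(1.0 / n_structures)] * n_structures
log_likelihoods = [-0.5 * k for k in range(n_structures)]
posterior = structure_posterior(log_priors, log_likelihoods)
```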

Causal inference generates complex, multi-modal posterior beliefs about latent motion variables.

Posteriors were computed for observer #2 from Shivkumar et al. (2025) in response to two distinct center-surround (CS) stimuli (top vs. bottom row). (a) The CS stimuli. The surround motion was fixed at 0° (red arrow), while the center motion differed between the two examples (green arrow). The relative velocity vector (center velocity minus surround velocity) is shown in blue. The blue dot in the top row represents zero velocity. (b) Posterior probabilities over the five causal motion structures (see Fig. 2c-g). Note how the most probable structure changes depending on the stimulus. (c-e) Posterior beliefs about relative motion. These panels show the full 2D posterior over velocity and the marginal posteriors for direction and speed (dashed line: posterior beliefs; colored lines: individual mixture components corresponding to the motion structures). The complex, multi-modal shapes reflect the observer’s uncertainty over the causal structure. (f, g) Posterior beliefs about the reference frame (νreference) and retinal motion (νcenter). Marginal posteriors for direction and speed are not shown for these variables. To aid visualization, all 2D velocity posteriors (c, f, g) are displayed using log probability density, which makes secondary modes visible. In contrast, the 1D marginal posteriors (d, e) are plotted using linear probability density.
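The mixture structure of these posteriors can be sketched as follows: the marginal posterior over a latent variable is a sum of per-structure components, each weighted by that structure's posterior probability. The 1D Gaussian components and all names below are simplifying assumptions for illustration; the actual components need not be Gaussian.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a 1D Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_posterior(x, structure_probs, component_means, component_sds):
    """Marginal posterior density at x, averaged over causal structures.

    Each structure contributes one component (assumed Gaussian here),
    weighted by the posterior probability of that structure.
    """
    return sum(p * gaussian_pdf(x, m, s)
               for p, m, s in zip(structure_probs,
                                  component_means,
                                  component_sds))


# Example: two structures with made-up weights and component parameters
# produce a bimodal posterior over direction (in degrees).
probs, means, sds = [0.6, 0.4], [0.0, 90.0], [10.0, 10.0]
density_at_first_mode = mixture_posterior(0.0, probs, means, sds)
```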

Linking posterior beliefs to neural activity via the neural sampling hypothesis.

Schematic of the linking model used to generate neural predictions from the Bayesian observer model’s posterior distributions. All firing rates are normalized. (a) For a given center-surround stimulus (inset), the model’s posterior belief over a latent variable (here, relative motion) is approximated by a set of discrete samples (blue dots), in line with the neural sampling hypothesis (Fiser et al., 2010; Hoyer and Hyvärinen, 2002). (b) A baseline tuning curve is measured (here, in response to a stimulus with a stationary surround). This baseline serves as a look-up table to convert each sample from the test posterior (here, in response to a stimulus with a moving surround) into a predicted firing rate. (c) The mapped activities for speed and direction are multiplied (symbol ×), yielding a full distribution of predicted firing rates. The mean (solid black line) and variance (black arrow) of this distribution form our predictions for the neuron’s response to the specific stimulus in (a). (d) This entire process is repeated for a wide range of CS stimuli to generate complete 2D tuning curves. Shown are the predicted mean firing rate (top) and excess variance beyond the variability of a simple Poisson process (bottom) for a neuron assumed to represent either retinal motion (νcenter, left) or relative motion (middle). Model parameters are from observer #2 shown in Fig. 2.
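Steps (a) to (c) can be sketched in a few lines. The sketch assumes Gaussian-shaped baseline tuning curves normalized to a peak of 1 and treats the variance of the mapped rates across samples as the excess variance (which holds if spike counts are Poisson given the rate). The tuning shapes and all function and parameter names are illustrative assumptions.

```python
import math


def direction_tuning(direction, preferred, width):
    """Baseline direction tuning (assumed Gaussian, peak 1), in degrees."""
    # Wrap the angular difference into [-180, 180).
    diff = (direction - preferred + 180.0) % 360.0 - 180.0
    return math.exp(-0.5 * (diff / width) ** 2)


def speed_tuning(speed, preferred, width):
    """Baseline speed tuning (assumed Gaussian, peak 1)."""
    return math.exp(-0.5 * ((speed - preferred) / width) ** 2)


def predict_rate_stats(direction_samples, speed_samples,
                       pref_dir, dir_width, pref_speed, speed_width):
    """Map posterior samples through baseline tuning; return mean, variance."""
    # Each sample is looked up in the baseline curves; the direction and
    # speed activities are multiplied to give one predicted rate per sample.
    rates = [direction_tuning(d, pref_dir, dir_width)
             * speed_tuning(s, pref_speed, speed_width)
             for d, s in zip(direction_samples, speed_samples)]
    mean = sum(rates) / len(rates)
    variance = sum((r - mean) ** 2 for r in rates) / len(rates)
    return mean, variance
```

A unimodal posterior concentrated at the preferred stimulus yields a high mean and near-zero variance, whereas a bimodal posterior with one mode far from the preferred direction lowers the mean and raises the variance.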

Predicted 4D tuning curves reveal complex signatures of causal inference.

Predictions are shown for observer #2 in Shivkumar et al. (2025) and two hypothetical neurons with different tuning curve shapes. (a)-(b) Schematics of the narrow and wide tuning curves used to generate predictions. (c)-(d) Predicted mean firing rates as a function of center direction (x-axis), surround direction (y-axis), center speed (columns), and surround speed (rows). Axes are relative to the neuron’s preferred direction and speed (where zero on both axes indicates the preferred direction). The key signature of causal inference is the complex pattern in the tuning curve, which consists of a mixture of different diagonal interactions corresponding to the different reference frames under different motion structures. (e)-(f) Predicted excess variance beyond the variability of a simple Poisson process for the same conditions as (c-d). All firing rates are normalized.

Model predictions qualitatively capture diverse surround modulation patterns in empirical MT data.

The model’s predictions are compared to previously published single-neuron data from monkeys. In all heatmaps and plots, axes are expressed relative to the neuron’s preferred direction (labeled as zero). (a) Surround modulation is computed by subtracting the predicted response to a stationary surround (middle heatmap) from the response to a moving surround (left heatmap). The top inset illustrates the stationary- and moving-surround stimulus conditions. The model predicts suppression for neurons encoding relative motion (e.g., bottom row) and minimal modulation for neurons encoding νcenter. Facilitation is predicted for νcenter under moderate to high sensory uncertainty (e.g., top row; see main text). The dashed red lines indicate the cross-section corresponding to the stimuli used in the experiments. (b) Comparison to three modulation types from Allman et al. (1985a). We performed qualitative model fitting; see main text and Methods for details. Model predictions assuming relative-motion coding (red lines) are overlaid on average empirical data (blue lines). The plot shows modulation (facilitation/suppression, x-axis) as a function of surround direction (y-axis), while the center patch moves in the neuron’s preferred direction. The full 2D prediction is shown in the heatmap, with the dashed red line indicating the plotted cross-section, which corresponds to the experimental data we aim to match. Pink text indicates the best-matching observer model and sensory uncertainty level (“low,” “moderate,” and “high” correspond to the fitted noise value from Shivkumar et al. (2025), 10x that value, and 100x that value, respectively). (c) Comparison to four functional classes from Born (2000). The format is the same as in (b). The model accounts for the different neural classes by assuming different latent variable encodings. Predictions based on νcenter under moderate uncertainty capture the observed facilitation effects (left column). In contrast, predictions based on relative motion capture the observed suppression effects (right column).
(d) Comparison to surround modulation from Tanaka et al. (1986). Following the same format, model predictions for relative motion (red lines) are compared to example neurons and population data where surround properties were varied: in the top row, direction was varied while speed was fixed at the neuron’s preferred speed; in the bottom row, speed was varied while direction was fixed at the neuron’s preferred direction. The heatmap and its red dashed cross-section correspond to the direction-varying data in the top row. A corresponding heatmap for the speed-varying data (bottom row) is not shown, as it would represent a different 3D parameter space that also contains surround speed.
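The modulation measure described in (a) and the cross-section plotted in (b) to (d) can be sketched as follows. The sketch assumes responses are stored as nested lists indexed as [surround direction][center direction]; that layout and the function names are illustrative assumptions.

```python
def surround_modulation(moving_response, stationary_response):
    """Moving-surround response minus stationary-surround response.

    Negative entries indicate suppression, positive entries facilitation.
    Both inputs are grids indexed as [surround][center] (assumed layout).
    """
    return [[m - s for m, s in zip(moving_row, stationary_row)]
            for moving_row, stationary_row
            in zip(moving_response, stationary_response)]


def cross_section_at_preferred_center(modulation, center_dirs):
    """Modulation versus surround direction, center at preferred direction.

    Directions are relative to the preferred direction, so the preferred
    center direction is the column labeled zero.
    """
    column = center_dirs.index(0)
    return [row[column] for row in modulation]


center_dirs = [-90, 0, 90]
moving = [[0.2, 0.5, 0.2], [0.1, 0.9, 0.1]]
stationary = [[0.3, 0.8, 0.3], [0.3, 0.8, 0.3]]
section = cross_section_at_preferred_center(
    surround_modulation(moving, stationary), center_dirs)
```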

The model predicts when tuning curve shifts due to surround motion are large and when they are minimal.

In all panels, axes are plotted relative to the neuron’s preferred direction and speed (labeled as zero). Specific points on the tuning curves correspond to the stimulus conditions tested in Born (2000). (a) Predicted shifts in center direction tuning (y-axis) as a function of four different surround directions (x-axis). Violin plots show the distribution of shifts across 45 combinations of observer parameters and neuron tuning profiles (5 observers × 9 tuning curves). The predicted shifts for some of the observers fall within the empirically observed range from Born (2000) (dotted horizontal lines), and most shifts are below those predicted by a pure surround-relative model under the surround reference structure (blue dashed line). (b) Example 2D tuning curves (top) and their 1D cross-sections (bottom) for direction. The colored dashed lines in the heatmaps indicate the surround directions for which the center tuning curves below are plotted. Predictions are shown for low (fitted values) and moderate (10x fitted values) levels of sensory uncertainty. The shift of the peaks is smaller for moderate sensory noise, consistent with the points around y=0 in (a). (c) Predicted shifts in center speed tuning, analogous to (a). The x-axis represents surround speed scaled by the tuning width (SD). The predicted shifts are smaller than those predicted by a pure relative-motion model (blue dashed line) and mostly fall within the empirically measured range (dotted horizontal lines). (d) Example 4D tuning curves and 1D cross-sections for speed. The top row displays two examples of the predicted 4D tuning curves, showing firing rate as a function of center speed (columns) and surround speed (rows). The bottom row shows four example 1D center speed tuning curves. The first and last of these are cross-sections corresponding to the 4D predictions shown directly above them, while the two middle curves are additional examples.
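One simple way to quantify a tuning shift of the kind plotted in (a) is to compare peak locations of the center direction tuning curve with and without surround motion. The peak-based definition below is an assumption made for illustration; the study may define shifts differently (for example, via fitted curve centers), and the function names are hypothetical.

```python
def peak_location(xs, rates):
    """Stimulus value at which the tuning curve is maximal."""
    best = max(range(len(rates)), key=lambda i: rates[i])
    return xs[best]


def direction_tuning_shift(center_dirs, baseline_rates, test_rates):
    """Peak shift (degrees) of the test curve relative to the baseline.

    The difference is wrapped into [-180, 180) because direction is circular.
    """
    shift = (peak_location(center_dirs, test_rates)
             - peak_location(center_dirs, baseline_rates))
    return (shift + 180.0) % 360.0 - 180.0


center_dirs = [-60.0, -30.0, 0.0, 30.0, 60.0]
baseline = [0.1, 0.5, 1.0, 0.5, 0.1]   # stationary surround
test = [0.05, 0.2, 0.6, 0.9, 0.3]      # moving surround (made-up values)
shift = direction_tuning_shift(center_dirs, baseline, test)
```

A peak-based measure is sensitive to the sampling resolution of the center directions, so a finer grid or interpolation would be needed for small shifts.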