Abstract
Center–surround interactions are a hallmark of visual processing and are especially prominent in area MT, where surround motion can either suppress or facilitate neuronal responses depending on context. However, existing mechanistic descriptions, including divisive normalization, do not explain the full diversity of these effects or their relationship to motion perception. Here, we show that both perceptual and neuronal center–surround phenomena can be understood as consequences of Bayesian causal inference over reference frames. Building on a normative model of motion perception, we derived predictions for the mean responses and variability of single MT neurons across the full four-dimensional space of center and surround directions and speeds. The model generates structured patterns of suppression, facilitation, and coordinate-frame selectivity that qualitatively match the diversity of center–surround effects reported in primate MT. Our results provide a unified computational account linking motion integration and segmentation in perception with contextual response modulation in MT, and yield testable predictions for how the visual system infers and represents reference frames.
1 Introduction
Center-surround (CS) processing is a canonical computation in the brain, shaping vision from its earliest stages (Allman et al., 1985b; Angelucci et al., 2017). Starting in the retina to enhance contrast sensitivity and facilitate the detection of boundaries (Baden et al., 2020; Euler et al., 2014; Gaynes et al., 2022; Kuffler, 1953; Turner et al., 2018), this principle of surround-dependent contextual modulation is repeated throughout the visual hierarchy. In the ventral stream, it refines feature selectivity in V1 (Angelucci et al., 2017; Nurminen & Angelucci, 2014) and shapes object representations in the inferotemporal cortex (Angelucci et al., 2017; Carandini & Heeger, 2012; Zoccolan et al., 2005). This principle is also fundamental to the dorsal stream’s processing of motion (Born, 2000; Born & Bradley, 2005; Britten & Heuer, 1999; DeAngelis & Uka, 2003; Pack et al., 2004; Rust et al., 2006). In the middle temporal area (MT), a moving surround can powerfully suppress a neuron’s response or, under different conditions, facilitate it (Allman et al., 1985a; Born, 2000; Born & Bradley, 2005; Born & Tootell, 1992; Bradley & Andersen, 1998; DeAngelis & Uka, 2003; Huang et al., 2007; Inaba et al., 2011, 2007; Newsome et al., 1988; Pack et al., 2004, 2005; Raiguel et al., 1995; Tanaka et al., 1986; Tzvetanov & Womelsdorf, 2008; Xiao et al., 1997; Zemel et al., 1998). Perceptually, this same stimulus can cause a central motion to be integrated with the surround into a unified whole object motion, or to be segmented from the surround in stark contrast (Bill et al., 2022, 2020; Braddick, 1993; Dakin & Mareschal, 2000; Gershman et al., 2016; Penaloza et al., 2024; Shivkumar et al., 2025; Tadin et al., 2003, 2019; Zarei Eskikand et al., 2020). Despite its ubiquity, a unifying computational model explaining these diverse CS interactions at both the perceptual and neural levels has remained elusive. 
At the neural level, we have a rich, process-level description of what happens, but no normative theory for why it happens and what computational principles can produce such profoundly different and context-dependent outcomes. For instance, decades of neurophysiological research have meticulously catalogued the variety of CS interactions in MT, from the direction-selective antagonistic suppression first described by Allman et al. (1985a) and Tanaka et al. (1986), to the reinforcing facilitation and more complex patterns later categorized by Born (2000), Born and Bradley (2005), and Born and Tootell (1992). However, existing models of the V1-MT pathway, often based on linear-nonlinear integration or divisive normalization, can account for a subset of suppressive phenomena but struggle to explain the full range of effects (Adelson & Bergen, 1985; Albright, 1984; Barth & Watson, 2000; Movshon et al., 1985; Qian et al., 1994a; Rust et al., 2006; Simoncelli & Heeger, 1998; Wilson et al., 1992; Wilson & Kim, 1994; Zarei Eskikand et al., 2020). Conversely, influential models of perception successfully describe phenomena like motion integration and segmentation but have not been linked to the rich diversity of single-neuron CS response properties (Bill et al., 2022, 2020; Braddick, 1993; Dakin & Mareschal, 2000; Gershman et al., 2016; Penaloza et al., 2024; Shivkumar et al., 2025; Tadin et al., 2003, 2019; Zarei Eskikand et al., 2020).
Recent normative models of motion perception suggest that causal inference over reference frames provides a unifying framework for both motion integration and segmentation (Bill et al., 2022, 2020; Gershman et al., 2016; Penaloza et al., 2024; Shivkumar et al., 2025). Causal inference is a Bayesian computational motif for inferring when to combine information across cues (Körding et al., 2007), and has recently been proposed as a universal computation across the cortex (Shams & Beierholm, 2022). This line of work proposed that the brain infers the causal structure underlying the elements moving in a visual scene, and uses that structure to determine the reference frames within which to represent the motion of each element. These models successfully explained the observed behavior in psychophysical experiments using CS motion stimuli, effectively bridging the gap between normative computations and behavior (Bill et al., 2022, 2020; Gershman et al., 2016; Penaloza et al., 2024; Shivkumar et al., 2025).
In this study, we built on one of these recently proposed models, which accurately captures the perceptual effects of CS stimuli (Fig. 1a,d) using a biologically plausible architecture (Fig. 1b) (Shivkumar et al., 2025). From this model, we derived neural predictions for mean responses (tuning curves, Figs. 4 and 5, and Figs. S2 to S4) and neural variability (Fig. 5, and Figs. S2 and S3) across the entire 4-dimensional (motion direction and speed in center and surround) stimulus space. The neural predictions of Bayesian causal inference captured a wide range of neural CS interaction effects (Figs. 5 to 7, and Figs. S2 to S5). The predicted CS interactions reflected both the representation of motion relative to a reference frame, inferred by causal inference, and the representation of motion in retinal coordinates. Finally, we show that these predictions are compatible with classic neurophysiological findings (Figs. 6 and 7, and Fig. S5), providing a normative explanation for why the surround motion facilitates or suppresses neural responses, and when these effects are direction dependent. Our results suggest future experiments – both to provide more comprehensive tests of the normative theory and to link it to circuit-level mechanistic models.

A normative, Bayesian framework for linking center-surround processing in perception and neural activity.
This figure provides a conceptual overview of the study’s approach, from the experimental paradigm and Bayesian model to its connection with behavioral and neural data. (a) Human experimental paradigm. Observers view a central moving dot patch (green) within a moving surround (red) and report the center’s perceived direction. The retinal velocities of the center and surround are depicted by green and red arrows, respectively. (b) Generative model for the observed stimulus velocity. The model assumes the true retinal center velocity, νcenter, is equal to the vector sum of the reference frame velocity, νreference, and the center velocity relative to that reference frame, 

2 Results
Our paper is structured as follows. First, we introduce the Bayesian causal inference model we adapt, along with the CS motion experiment and the behavioral results used to fit the model parameters. Next, we examine the variety and complexity of posterior beliefs inferred by the model in CS motion tasks. We then describe how neural predictions – including tuning curves and their variability – are derived from the model. Following this, we characterize the CS interactions present in these predictions. Finally, we demonstrate that Bayesian causal inference qualitatively accounts for a wide range of neural CS interactions observed in MT.
2.1 Task, model, and prior empirical results
We used the model of Shivkumar et al. (2025) (Fig. 2c-g) with parameters inferred from behavioral data of five human observers in their experiment. Participants reported their perceived motion direction of a central target patch composed of multiple moving dots (Fig. 2a, green dots), while the target was surrounded by a ring of moving dots (Fig. 2a, red dots). The experiment was designed to measure how participants’ motion direction estimates are influenced by the directions and speeds of both the center and the surround. This stimulus is commonly used to investigate CS processing in motion perception and in neurophysiology (e.g., Adelson and Bergen, 1985; Born, 2000; Tadin et al., 2008, 2019; Tanaka et al., 1986).

A hierarchical Bayesian model for inferring the causal structure of visual motion.
(a) Same as Fig. 1a. (b) Same as Fig. 1b. (c-g) Posterior probability assigned to competing causal motion structures. The model considers 12 hypotheses for the relationship between center and surround motion. Here, we show 5 of the 12 structures depicted schematically (arrows: motion; dots: stationary) in the inset boxes. These include four structures assuming a common cause (c-f) and one assuming independent causes (g). The plots show the median posterior probability across five observers (solid grey lines) with 95% credible intervals (shading) as a function of the directional difference between the center and surround stimuli. Observers consistently assigned negligible probability to the independent-causes structure (g). Adapted from Shivkumar et al. (2025).
The hierarchical Bayesian causal inference model from Shivkumar et al. (2025) proposes that the brain jointly infers the appropriate reference frames (vreference in Fig. 2) for each stimulus and its motion in that reference frame (
The first of the four main causal structures is classical motion “integration” (Braddick, 1993; Tadin & Lappin, 2005; Tadin et al., 2003) (see Fig. 2c; generative model in Fig. S1c). Observers infer this structure as the most probable when the differences between center and surround patches’ motions are minimal, leading to a “cue-combined” perception in which both motions are integrated (Fig. 2c, curves). In contrast, motion segmentation (Braddick, 1993; Tadin et al., 2019), the second causal structure, arises when the motion differences are significant. We term this second structure “surround reference” (Fig. 2d and Fig. S1e) because observers segment the center’s motion from the surround, perceiving it relative to the surround’s motion. Interestingly, this structure was dominant for only one of five observers, playing a less significant role for the other participants (Fig. 2d).
The third and fourth causal structures account for the behavioral variability beyond canonical integration and segmentation. In the third structure, observers perceive the surround motion relative to the center group, using the center as a reference frame (Fig. 2e and Fig. S1g). Since observers in this task report the motion direction of the center patch, this “center reference” structure generates responses that are unaffected by the surround patch, much like the independence structure (Fig. 2g, Fig. S1k). The fourth structure captures when observers adopt an “intermediate reference” frame, between the center and surround patches’ motions, perceiving motions of both patches relative to this intermediate frame. Most observers put significant probability mass on this structure across many different CS stimulus configurations (Fig. 2f and Fig. S1i).
Footnote 1: In an experiment where participants’ heads and eyes are fixed, retinal, ego-centered, and world coordinates are indistinguishable – and our predictions do not depend on that distinction.
These latter two structures can be easily miscategorized as classical integration or segmentation. For instance, the center reference structure can be confused with motion integration when center and surround motions are very similar, because the theoretically integrated percept would lie very close to the center’s actual motion. Similarly, the intermediate reference structure can be mistaken for motion segmentation if one assumes that the subtraction of the surround’s velocity from the center’s is incomplete (Shivkumar et al., 2025). The Bayesian causal inference model resolves these ambiguities, differentiating between all these structures and explaining diverse behavioral phenomena through a single computational principle.
2.2 Causal inference produces complex posterior beliefs
In order to generate neural predictions for arbitrary CS stimuli, we independently varied CS velocities (consisting of direction and speed each) and computed the posteriors over all latent variables in the Bayesian causal inference model using parameters fitted to the five observers in Shivkumar et al. (2025). The posteriors can be computed using any parameter values hypothesized by the researcher. However, the parameters fitted to real behavioral responses will reveal posterior beliefs that are backed up by empirical data. The model (Fig. 2b) contains three latent variables that are involved in representing the center patch’s motion: vreference, vcenter, and 



Causal inference generates complex, multi-modal posterior beliefs about latent motion variables.
Posteriors were computed for observer #2 from Shivkumar et al. (2025) in response to two distinct center-surround stimuli (top vs. bottom row). a: The CS stimuli. The surround motion was fixed at 0° (red arrow), while the center motion differed between the two examples (green arrow). The relative velocity vector (center - surround velocity) is shown in blue. The blue dot in the top row represents zero velocity. b: Posterior probabilities over the five causal motion structures (see Fig. 2c-g). Note how the most probable structure changes depending on the stimulus. c-e: Posterior beliefs about relative motion (
To illustrate the complexity arising from the causal inference process, we first examine the model’s posterior beliefs for a nominally simple stimulus where the center and surround move identically (Fig. 3, top row). Contrary to expectation for such a simple stimulus, most observers did not infer a single causal structure. Instead, they assigned significant probability mass to a mixture of competing structures. For instance, observer #2 in Shivkumar et al. (2025) assigned high probability to three structures: motion integration, the center patch acting as a reference frame, and an intermediate reference frame. This mixture of inferred structures directly shapes the posterior distributions. The posterior over the reference frame (vreference; Fig. 3f) consequently shows three distinct peaks corresponding to these probable structures. The dominant peak aligns with motion integration (around 0° direction and 4 deg/s speed), while secondary modes correspond to the center-reference (middle peak at 3.7 deg/s) and intermediate-reference (left peak at 2.8 deg/s) interpretations. Notably, the peak for the center-reference mode reflects a slower motion than the true stimulus speed; this discrepancy is a direct consequence of the slow-speed prior inherent in the model (Shivkumar et al., 2025).
Correspondingly, the posterior over relative velocity (
The posterior over retinal motion (vcenter; Fig. 3g), while typically unimodal, also displays a subtle bimodality in this case. This corresponds to the center-reference structure, which, as noted, was inferred to be slightly slower than the true stimulus speed. This could indicate a small perceptual bias for this observer even under this simple CS stimulus. Together, this example demonstrates that due to uncertainty over the underlying causal structure, the model’s posterior beliefs are highly complex even when the stimulus itself seems unambiguous to the experimenter.
A second example highlights how the model arbitrates between competing causal structures when the directional difference between the center and surround motion is large (Fig. 3, bottom row). The same example observer infers two probable structures for this CS stimulus (Fig. 3b): (1) that the motions are independent, or (2) that they share a common intermediate reference frame. Each of these interpretations corresponds to a distinct mode in the bimodal posteriors for vreference and 


2.3 A novel sampling-based method for predicting single-neuron activity from posterior beliefs
To bridge the gap between our normative, Bayesian causal inference model and neural data, we introduce a novel linking method which posits that single-neuron activity reflects samples from the model’s posterior beliefs (i.e., the neural sampling hypothesis; Fiser et al., 2010; Hoyer and Hyvärinen, 2002). Neural sampling enables us to generate neural predictions that reflect the complex, multi-modal nature of the model posteriors. Our method provides a general framework for predicting single-neuron responses to any “test” stimulus from any Bayesian model (Fig. 4). Specifically, it posits that the response to the test stimulus can be derived from the same neuron’s tuning curve measured for a simpler “baseline” stimulus set. This baseline tuning curve acts as a “look-up table”: each sample from the test posterior is converted into an individual firing rate (using the look-up table), and aggregating all these rates forms a full distribution of predicted responses (see Methods 4.3 for details). This method can be applied in any domain, as long as its two key components are well-defined: (1) the Bayesian model parameters to compute posterior beliefs and (2) the tuning curve parameters to link posterior samples to firing rates.

Linking posterior beliefs to neural activity via the neural sampling hypothesis.
Schematic of the linking model used to generate neural predictions from the Bayesian observer model’s posterior distributions. All firing rates are normalized. (a) For a given center-surround stimulus (inset), the model’s posterior belief over a latent variable (here, 

To apply this method to our CS motion stimuli, we define these two components as follows. For the Bayesian model parameters, we used the unique parameters fitted to each of the five human observers in Shivkumar et al. (2025). For the tuning curve parameters, we used the median of the parameters of empirically measured MT tuning curves from monkeys (DeAngelis & Uka, 2003). While this combines human behavioral models with monkey neurophysiology, it provides a strong proof of principle. Ideally, both behavioral and neural data would be collected from the same subjects.
We then implemented the linking procedure (Fig. 4). First, for a given test stimulus (moving surround), we generated samples from the model’s posterior distribution over a latent variable (Fig. 4a). Second, we used the baseline tuning curve (derived from the median parameters of MT neurons’ tuning curves measured with a stationary surround in DeAngelis and Uka (2003)) as a look-up table (Fig. 4b) to map each posterior sample to a predicted firing rate, yielding a full distribution of predicted firing rates (Fig. 4c). Finally, we calculated the mean and variance of this distribution, which serve as our predictions for the neuron’s mean firing rate and response variability (Fig. 4d). In essence, this method leverages a neuron’s simple motion tuning curve to predict its complex response to center motion with a moving surround. It achieves this by assuming the neuron’s firing is determined not by the raw stimulus, but by the brain’s internal belief about the probable motion components and their reference frames, as represented by the samples from the model’s posteriors (full details of the method are provided in Methods 4.3).
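The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation described in Methods 4.3: the Gaussian direction tuning curve, its width, and the Gaussian stand-in for the posterior samples are all assumptions made here for illustration, and speed is ignored for brevity.

```python
import math
import random
import statistics


def gaussian_tuning(direction_deg, preferred_deg=0.0, width_deg=30.0):
    """Hypothetical baseline tuning curve returning a normalized rate in (0, 1]."""
    # Wrap the angular difference into [-180, 180) before applying the Gaussian.
    diff = (direction_deg - preferred_deg + 180.0) % 360.0 - 180.0
    return math.exp(-0.5 * (diff / width_deg) ** 2)


def predict_response(posterior_samples, tuning=gaussian_tuning):
    """Use the tuning curve as a look-up table for each sample, then summarize."""
    rates = [tuning(sample) for sample in posterior_samples]
    # Mean -> predicted mean firing rate; variance -> predicted response variability.
    return statistics.fmean(rates), statistics.pvariance(rates)


rng = random.Random(0)
# Stand-in for samples from a model posterior over a latent motion direction (deg).
# A real application would draw these from the fitted causal inference model.
samples = [rng.gauss(20.0, 10.0) for _ in range(5000)]
mean_rate, rate_variance = predict_response(samples)
```

Because the summary statistics are computed over the mapped firing rates rather than over the samples themselves, a multimodal posterior can yield a mean rate that no single stimulus value would produce.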
2.4 Predicted 4D tuning curves reveal complex signatures of causal inference
Applying our linking method, we generated neural predictions for neurons representing posterior beliefs over either vcenter or 
First, we tested the hypothesis that the two main classes of MT neurons (Born & Bradley, 2005) – those insensitive to the surround and those with strong surround modulation – correspond to neurons encoding vcenter and 



Predicted 4D tuning curves reveal complex signatures of causal inference.
Predictions are shown for observer #2 in Shivkumar et al. (2025) and two hypothetical neurons with different tuning curve shapes. (a)-(b) Schematics of the narrow and wide tuning curves used to generate predictions. (c)-(d) Predicted mean firing rates as a function of center direction (x-axis), surround direction (y-axis), center speed (columns), and surround speed (rows). Axes are relative to the neuron’s preferred direction and speed (where zero on both axes indicates the preferred direction). The key signature of causal inference is the complex pattern in the tuning curve, which consists of a mixture of different diagonal interactions corresponding to the different reference frames under different motion structures. (e)-(f) Predicted excess variance beyond the variability of a simple Poisson process for the same conditions as (c-d). All firing rates are normalized.
Examining the predictions for 
The mechanism for this diagonal is twofold. First, the observer’s posterior over 
It is important to note that all predictions in Fig. 5 (and Figs. S2 and S3) are displayed relative to the neuron’s preferred speed and direction, making these 4D tuning curves generalizable to any neuron with a similar tuning profile. The specific patterns of CS interaction, however, depend on two key factors. First is the neuron’s intrinsic tuning shape, with narrower and broader tuning curves yielding qualitatively different interactions (Fig. 5, left vs. right columns). Second is the observer’s perception, as variations in the fitted model parameters lead to substantial differences in the predicted responses across individuals (compare panels in Figs. S2 to S4).
Importantly, causal inference across potential reference frames shapes the diagonal interaction. When an observer infers a mixture of probable motion structures, the model’s posterior becomes multimodal (Fig. 3). This mixing of beliefs translates into complex response patterns. For instance, in Fig. 4d (top-right panel), the neural prediction for observer #2 is shaped by a mixture of three stimulus-dependent interpretations: (1) for large CS directional differences, the patches are inferred as independent, producing vertical bands; (2) for moderate differences, an intermediate reference frame is used, creating a curved peak near the diagonal; and (3) for similar directions, integration occurs (i.e., relative motion, 
A second key characteristic of our Bayesian models is the assumption that people not only form beliefs about motion but also represent the uncertainty in those beliefs. This allows us to generate neural predictions capturing this uncertainty as variability in the firing rates. Thus, instead of computing the mean of the predicted firing rates mapped from posterior samples, we computed the variance of these firing rates across the same four dimensions (CS directions and speeds) and found it also exhibits diagonal structures (Fig. 4d, bottom panels, Fig. 5e & f, and Figs. S2 and S3, bottom rows). This diagonal structure appears because the highest uncertainty in posterior beliefs – and thus the highest predicted firing rate variance – occurs when posteriors are multimodal. This happens when an observer considers multiple causal structures to be simultaneously plausible, for example, when CS directions are similar making both integration and segmentation with respect to an intermediate reference plausible (see the high variability regions along the diagonal in Fig. 5e & f) or far apart making both independence and segmentation plausible (see the regions further away from the diagonal in Fig. 5e & f).
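The link between multimodal posteriors and high predicted variance can be illustrated with a toy comparison. The sketch below is an assumption-laden example, not the paper's analysis: the tuning curve, the posterior shapes, and the mode locations (0° and 90°) are hypothetical, and it computes only the variance of the mapped firing rates, not the excess over Poisson variability shown in the figures.

```python
import math
import random
import statistics


def tuning(direction_deg, width_deg=30.0):
    """Hypothetical normalized direction tuning curve with preferred direction 0 deg."""
    diff = (direction_deg + 180.0) % 360.0 - 180.0
    return math.exp(-0.5 * (diff / width_deg) ** 2)


def rate_variance(samples):
    """Variance of the firing rates obtained by mapping samples through the tuning curve."""
    return statistics.pvariance([tuning(s) for s in samples])


rng = random.Random(1)
n = 4000
# Unimodal posterior: a single causal structure dominates.
unimodal = [rng.gauss(0.0, 5.0) for _ in range(n)]
# Bimodal posterior: two structures are about equally plausible (modes at 0 and 90 deg).
bimodal = [rng.gauss(0.0 if rng.random() < 0.5 else 90.0, 5.0) for _ in range(n)]

var_unimodal = rate_variance(unimodal)
var_bimodal = rate_variance(bimodal)
```

In this toy case the bimodal posterior sends roughly half of the samples to the peak of the tuning curve and half to its tail, so the variance of the mapped rates is much larger than in the unimodal case even though each mode is equally narrow.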
It is important to note that this predicted neural variability represents the posterior uncertainty over the latent variables (vcenter and 
2.5 Comparing causal inference predictions to the empirical literature
The role of area MT in encoding motion remains a topic of debate. While some neurons were found to represent retinal motion and are unaffected by surround motion (Inaba et al., 2011, 2007; Newsome et al., 1988), many others appear to encode motion relative to the surround (Allman et al., 1985a; Born & Tootell, 1992; Huang et al., 2007, 2008; Tanaka et al., 1986; Tzvetanov & Womelsdorf, 2008). Complicating this picture, studies have found a wide variety of additional CS interactions, such as direction-selective suppression, non-selective suppression, and even direction-selective and non-selective facilitation (Allman et al., 1985a; Born, 2000; Born & Bradley, 2005; Tanaka et al., 1986). No single theory yet captures all these diverse CS effects (Born & Bradley, 2005). In this section, we show that our causal inference model predicts this observed diversity. These varied response patterns emerge naturally from the model, depending on whether a given neuron encodes the retinal-centric (vcenter) or relative-motion (
We qualitatively compared our model’s predictions against published data from three key neurophysiology studies that used conceptually identical CS stimuli: Allman et al. (1985a) (61 single units, 3 owl monkeys), Tanaka et al. (1986) (105 single units, 4 Japanese monkeys), and Born (2000) (169 single units, 18 owl monkeys). A key challenge in this comparison is that the model’s predictions depend on both observer-specific model parameters and neuron-specific tuning curve parameters. Ideally, one would fit the model to an animal’s behavior and use tuning curves recorded from the same animal. Since this was not possible, we generated a diverse set of predictions using the five sets of model parameters from the human observers in Shivkumar et al. (2025) and a library of nine hypothetical but plausible tuning curves (Fig. S5a). This library was constructed by combining three distinct tuning widths for both speed and direction (3 speed widths × 3 direction widths), approximating the 25th, 50th, and 75th percentiles of MT tuning curve parameters measured in DeAngelis and Uka (2003). To evaluate whether these observed MT responses support our model, we compared this large set of predictions against the empirically measured responses. We first categorized the recorded neurons by their functional type of CS interaction. This grouping allowed us to assess whether each distinct neural response pattern could be explained by the causal inference framework. For each empirical CS interaction type, we then identified the best-matching prediction from our library (i.e., the one with the lowest mean squared error; more details are provided in Methods 4.6).
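The matching step can be sketched as follows. This is a schematic illustration only: the width labels, the toy response curves, and the helper names `mean_squared_error` and `best_match` are hypothetical, and the actual procedure is the one described in Methods 4.6.

```python
import itertools


def mean_squared_error(predicted, observed):
    """Mean squared error between two equally sampled response curves."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def best_match(predictions, observed):
    """Return the key of the prediction with the lowest MSE to the observed curve."""
    return min(predictions, key=lambda k: mean_squared_error(predictions[k], observed))


# Prediction library: 5 observer parameter sets x 3 speed widths x 3 direction widths.
observers = range(1, 6)
speed_widths = ("narrow", "median", "wide")      # stand-ins for 25th/50th/75th percentiles
direction_widths = ("narrow", "median", "wide")
library = list(itertools.product(observers, speed_widths, direction_widths))

# Toy curves (normalized rate vs. surround direction), purely for illustration.
toy_predictions = {
    "flat": [0.5, 0.5, 0.5],
    "suppressed_at_zero": [0.9, 0.1, 0.9],
}
toy_observed = [0.8, 0.2, 0.8]
winner = best_match(toy_predictions, toy_observed)
```

Selecting the minimum-error entry from a fixed library, rather than fitting free parameters to each neuron, keeps the comparison constrained by the behaviorally fitted observer parameters and the empirically motivated tuning widths.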
2.5.1 Neurons unaffected by surround motion
A subset of MT neurons exhibits no measurable modulation by surround motion. This response profile (which 8% of MT neurons show in Allman et al. (1985a)) indicates that these cells respond selectively to center motion while remaining unaffected by motion in the surround. This response profile aligns with classical accounts of MT as representing motion in retinal coordinates (Inaba et al., 2011, 2007; Newsome et al., 1988). This pattern is predicted by the causal inference model for neurons representing the retinal-centric latent motion variable (vcenter) as the posterior over this variable is largely unaffected by surround motion (Fig. 4d, left column, and Fig. S3).
2.5.2 Neurons exhibiting direction-selective antagonistic suppression
The most common class of MT neurons exhibits strong suppression when the center and surround move in the same direction, with suppression decreasing as their directions diverge (Allman et al., 1985a; Born, 2000; Tanaka et al., 1986). The causal inference model predicts this suppression for neurons representing the relative velocity latent variables (


Model predictions qualitatively capture diverse surround modulation patterns in empirical MT data.
The model’s predictions are compared to previously published single-neuron data from monkeys. In all heatmaps and plots, axes are expressed relative to the neuron’s preferred direction (labeled as zero). (a) Surround modulation is computed by subtracting the predicted response to a stationary surround (middle heatmap) from the response to a moving surround (left heatmap). The top inset illustrates the stationary- and moving-surround stimulus conditions. The model predicts suppression for neurons encoding 



2.5.3 Neurons showing non-directional surround suppression
Approximately 30% of MT neurons in Allman et al. (1985a) (and a smaller fraction in other studies) exhibited suppression from the surround regardless of its direction (Born, 2000; Tanaka et al., 1986). In our model, such tuning arises for a neuron encoding 

2.5.4 Neurons with reinforcing surround modulation
Our model can also explain the “reinforcing modulation” observed in a subset of MT neurons, which constitutes a particular challenge for normalization-based models. For those neurons, responses increase when the surround moves in the same direction as the center (Born, 2000). This facilitation is predicted for two of our five observer models under specific conditions: that the neuron encodes the retinal-centric variable (vcenter) and that sensory uncertainty is moderate (10x the value inferred from human behavior) (Fig. 6c, left column).
2.5.5 Neurons showing surround facilitation orthogonal to their preferred direction
It might seem that our model is capable of explaining all observed phenomena, potentially rendering it unfalsifiable. We stress, however, that this is not the case. Our framework is constrained both by model parameter values fitted to behavioral data and by measured baseline tuning curves. Consequently, the model can only account for a small, specific subset of all possible four-dimensional CS tuning curves, which makes the agreement with the data described in previous sections a non-trivial success. This point is illustrated by the fact that a small fraction of neurons (8%) in Allman et al. (1985a) were facilitated by surround motion orthogonal to their preferred direction (Fig. 6b, middle column). Our model does not reproduce this effect under any parameter settings, suggesting these neurons may perform a different computation than the one described by our framework. It is noteworthy, however, that this specific finding involved a small proportion of neurons and lacked replication in later studies (Born, 2000; Born & Tootell, 1992; Tanaka et al., 1986).
2.5.6 Neurons showing speed-dependent suppression
Tanaka et al. (1986) reported that surround suppression in MT is modulated by surround speed in addition to direction. Our model successfully accounts for the primary pattern they observed, where higher surround speeds led to stronger suppression (Fig. 6d, neurons 3 and 4). This effect is robustly predicted for a neuron encoding 
2.5.7 Neurons with minimal shifts in tuning under surround manipulation
To directly test whether neurons encode relative motion with respect to the surround (i.e., surround reference causal motion structure), Born (2000) recorded from neurons while jointly varying center and surround direction or speed. If neurons truly represent center motion relative to the surround, we would expect the center tuning to shift accordingly. However, the authors found little to no tuning curve shift, merely an overall response modulation. While this finding seems to challenge pure relative-motion models, the small number of neurons in these experiments (3 for direction, 9 for speed) and sparse stimulus sampling limit strong conclusions.
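The distinction between a tuning shift and an overall response modulation can be made concrete with a small sketch. The peak-based shift measure and the toy tuning curves below are assumptions chosen for illustration; they are not the analysis used by Born (2000) or in our Methods.

```python
def peak_location(stimulus_values, responses):
    """Stimulus value at which the response is maximal (first maximum if tied)."""
    best_index = max(range(len(responses)), key=responses.__getitem__)
    return stimulus_values[best_index]


def tuning_shift(stimulus_values, baseline_responses, modulated_responses):
    """Shift of the tuning peak between baseline and surround-modulated conditions."""
    return (peak_location(stimulus_values, modulated_responses)
            - peak_location(stimulus_values, baseline_responses))


# Center directions relative to the preferred direction (deg); toy normalized rates.
directions = [-90, -45, 0, 45, 90]
baseline = [0.1, 0.5, 1.0, 0.5, 0.1]

# Pure gain modulation: the surround scales the response but leaves the peak in place.
gain_modulated = [0.6 * r for r in baseline]
# Relative-motion coding: the peak moves with the surround direction (here by 45 deg).
shifted = [0.1, 0.1, 0.5, 1.0, 0.5]

shift_gain = tuning_shift(directions, baseline, gain_modulated)
shift_relative = tuning_shift(directions, baseline, shifted)
```

With sparse stimulus sampling, as in this five-point example, a peak-based measure can only resolve shifts in steps of the sampling interval, which is one reason small shifts are hard to rule out empirically.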
Our causal inference model demonstrates that a neuron representing relative motion (i.e., 

The model predicts when tuning curve shifts due to surround motion are large and when they are minimal.
In all panels, axes are plotted relative to the neuron’s preferred direction and speed (labeled as zero). Specific points on the tuning curves correspond to the stimulus conditions tested in Born (2000). (a) Predicted shifts in center direction tuning (y-axis) as a function of four different surround directions (x-axis). Violin plots show the distribution of shifts across 45 combinations of observer parameters and neuron tuning profiles (5 observers × 9 tuning curves). The predicted shifts for some of the observers fall within the empirically observed range from Born (2000) (dotted horizontal lines), and most shifts are smaller than those predicted by a pure surround-relative model under the surround reference structure (blue dashed line). (b) Example 2D tuning curves (top) and their 1D cross-sections (bottom) for direction. The colored dashed lines in the heatmaps indicate the surround directions for which the center tuning curves below are plotted. Predictions are shown for low (fitted values) and moderate (10x fitted values) levels of sensory uncertainty. The shift of the peaks is smaller for moderate sensory noise, consistent with the points around y=0 in (a). (c) Predicted shifts in center speed tuning, analogous to (a). The x-axis represents surround speed scaled by the tuning width (SD). The predicted shifts are smaller than those predicted by a pure relative-motion model (blue dashed line) and mostly fall within the empirically measured range (dotted horizontal lines). (d) Example 4D tuning curves and 1D cross-sections for speed. The top row displays two examples of the predicted 4D tuning curves, showing firing rate as a function of center speed (columns) and surround speed (rows). The bottom row shows four example 1D center speed tuning curves. The first and last of these are cross-sections corresponding to the 4D predictions shown directly above them, while the two middle curves are additional examples.
Thus, the data from Born (2000) do not rule out relative motion coding in MT. They may challenge the assumption of a fixed, pure surround-based reference frame. However, even this conclusion is limited by the small number of neurons and sparse stimulus sampling used in the experiments. Importantly, our model provides a way to resolve this ambiguity. It generates specific, quantitative predictions for the expected shift based on an observer’s behavioral parameters and a neuron’s baseline tuning. In fact, for some observers from Shivkumar et al. (2025), the minimal shifts reported by Born (2000) are exactly what our Bayesian causal inference model would predict. This highlights the need for such a principled model to guide new, densely sampled experiments that can definitively distinguish among competing theories of motion encoding.
3 Discussion
In this study, we demonstrated that a normative Bayesian causal inference model of motion perception can qualitatively account for the diverse center-surround (CS) interaction effects reported in the primate middle temporal area (MT). By deriving comprehensive predictions from a model fitted to human psychophysical data, we showed that the diverse, and often seemingly contradictory, response properties of MT neurons can be understood as principled manifestations of a single underlying computation: inferring the latent causal structure of the visual world. This framework unifies disparate findings in motion perception and neurophysiology, reinterprets long-standing debates about neural coding in MT, and provides a clear, theory-driven path for future experiments.
3.1 A single model for diverse neural effects
The function of CS interactions in area MT has been a puzzle, primarily due to the diversity of the observed phenomena. Our framework reframes this diversity not as a catalogue of many distinct neuron types with fixed properties, but as the consequence of a single inferential process: Bayesian causal inference.
Classic neurophysiological studies categorized the variety of surround modulations. A small subset of MT neurons (~8%) shows no surround modulation at all, consistent with the traditional view of MT encoding motion in purely retinal coordinates (Inaba et al., 2011, 2007; Newsome et al., 1988). Our model naturally accounts for these neurons through the retinal-centric latent variable (vcenter). The most common class of neurons exhibits direction-selective antagonistic suppression, strongest when center and surround move together (Allman et al., 1985a; Born, 2000; Born & Bradley, 2005; Born & Tootell, 1992; Tanaka et al., 1986). Our model robustly reproduces this effect as a signature of relative motion coding (
Crucially, the model also explains phenomena that have challenged simpler descriptive or mechanistic models. For instance, reinforcing modulation (facilitation), where responses are enhanced by same-direction surrounds, has been difficult to reconcile with a purely suppressive mechanism like classic divisive normalization (Coen-Cagli et al., 2015). Our framework predicts this exact effect for neurons encoding retinal-centric motion (vcenter) under conditions of moderate to high sensory uncertainty, providing a normative explanation for its emergence. Our model also accounts for the key finding from Born (2000) that center tuning curves modulate their gain but do not shift their peak in response to surround motion – a result that poses a direct challenge to any pure surround-relative coding scheme. The causal inference model predicts this when there is greater uncertainty about the causal structure, and hence ambiguity about the correct reference frame. Because the inferred reference is often intermediate – not locked to the surround – the posterior belief about the center’s motion is modulated while its peak shifts only minimally, in line with empirical observations.
Our framework not only reinterprets past findings but also generates new, directly testable predictions for surround modulation. Facilitation, for instance, is predicted for neurons encoding retinal-centric motion (vcenter) under high sensory uncertainty. In contrast, suppression is the hallmark of neurons encoding relative motion (
3.2 Reinterpreting MT: from motion filters to causal inference
Our findings suggest a conceptual shift in our understanding of MT’s role. In traditional models of neural activity, CS processing in motion perception arises from a cascade of linear and nonlinear computations that integrate local motion signals into the representation of a more global motion of the visual scene (Adelson & Bergen, 1985; Albright, 1984; Barth & Watson, 2000; Movshon et al., 1985; Rust et al., 2006; Simoncelli & Heeger, 1998; Wilson et al., 1992). In these models, V1 simple cells function as linear space-time filters, detecting only local motion energy within their passband, without encoding the global motion in the scene. The role of V1 complex cells remains debated: some models suggest they compute local averages of V1 activity to represent phase-insensitive motion energy (Adelson & Bergen, 1985; Albright, 1984; Movshon et al., 1985; Rust et al., 2006; Simoncelli & Heeger, 1998), while others propose they already signal global motion direction by selectively responding to endpoints of long contours, thereby avoiding the aperture problem (Pack et al., 2003a; Pack et al., 2003b; Zarei Eskikand et al., 2016). For MT, most models posit that its cells integrate local motion energy signals from V1 and apply nonlinear computations, such as divisive normalization and half-squaring, to compute the global motion in the visual scene (Adelson & Bergen, 1985; Albright, 1984; Movshon et al., 1985; Rust et al., 2006; Simoncelli & Heeger, 1998). Alternative theories argue that nonlinear computations, such as texture boundary motion and feature extraction, occur earlier in V1 and V2, where global motion in the visual scene may already be represented. In this framework, MT primarily pools these inputs and reduces noise (Barth & Watson, 2000; Wilson et al., 1992; Zarei Eskikand et al., 2019, 2020, 2016).
While these traditional models successfully explain many aspects of neural activity along the V1–MT pathways, they are fundamentally descriptive: they explain how a computation might be implemented but not why it takes the particular form it does. They struggle to explain the functional purpose behind the diverse CS effects and do not clarify how the brain determines the appropriate reference frame for motion integration or segmentation. Consequently, these models fail to explain CS effects in perception.
In contrast, our model assumes that neural activity represents posterior beliefs about latent variables in a generative model of retinal motion. These latent variables represent not only the velocities of moving objects but also the reference frames within which the velocities are represented, both of which are inferred from sensory input. The resulting relationships between the 4-dimensional stimulus (center and surround motion, each with speed and direction) and neural responses are complex and difficult to capture with a feedforward model. Furthermore, these relationships are dynamically modulated by contextual variables that influence the brain’s belief about causal structure, such as stimulus uncertainty (number of elements and/or contrast) and the distance between center and surround. These variables help account for reported differences between studies.
Therefore, our results reframe MT not merely as a sophisticated filter for integrating V1 signals using simple, fixed rules, but as an inferential system that infers the relationships between moving objects to form coherent perceptual representations. This perspective not only provides the missing functional role ("why") for the complex repertoire of CS interactions – suggesting they emerge as a solution to a causal inference problem – but can also explain the wide range of empirical tuning functions.
3.3 Bridging perceptual phenomena and neural circuits
A central contribution of our work is bridging the gap between perceptual studies of motion and their neural underpinnings. Decades of psychophysical research have established two dominant effects in motion perception: integration and segmentation (Braddick, 1993; Dakin & Mareschal, 2000; Penaloza et al., 2024; Tadin & Lappin, 2005; Tadin et al., 2003; Zarei Eskikand et al., 2024, 2019). Integration is thought to improve the signal-to-noise ratio, especially in low-contrast conditions (Britten & Heuer, 1999; Britten et al., 1993; Tadin et al., 2003), while segmentation enhances the perception of relative motion, aiding in object tracking and parsing scenes relative to a reference frame (Huang et al., 2007; Johansson, 1950; Tadin et al., 2019). MT neurons appear to shift between integration and segmentation in a dynamic, stimulus-dependent way (Huang et al., 2007, 2008).
Our results provide a unifying explanation for these observations, both at the perceptual and at the neural levels. The causal inference model casts both integration and segmentation as outcomes of probabilistic inference over latent motion structures (Penaloza et al., 2024; Shivkumar et al., 2025; Yang et al., 2021). We show that MT neurons can exhibit response properties consistent with either computation by representing posterior beliefs over latent variables that flexibly change with stimulus context. This offers a principled account of how the same population of neurons can support both perceptual states (integration and segmentation) and why certain CS stimuli elicit ambiguous or mixed neural responses. At the circuit level, divisive normalization is often proposed as a canonical computation for CS effects (Carandini & Heeger, 2012; Coen-Cagli et al., 2015). While simple divisive normalization models cannot capture the full range of effects predicted by our framework, extended variants that include multiplicative interactions within their normalization pool may provide a mechanistic approximation (Penaloza et al., 2023). Several mechanistic models have been proposed to explain the integration and segmentation properties of MT neurons (Kim & Wilson, 1997; Qian et al., 1994a, 1994b; Wilson & Kim, 1994; Zarei Eskikand et al., 2019, 2020). These models generally fall into three categories: (1) those attributing segmentation to inhibitory interactions between direction-tuned neurons (Kim & Wilson, 1997; Qian et al., 1994a, 1994b; Wilson & Kim, 1994), (2) those proposing distinct populations of neurons specialized for integration versus segmentation (Beck & Neumann, 2011; Zarei Eskikand et al., 2020), and (3) those positing a single, adaptive population that dynamically adjusts its CS interactions based on stimulus context (Zarei Eskikand et al., 2019). 
While these models can account for many CS interactions, they struggle to explain cases that contradict simple relative motion encoding (e.g., Born (2000) and Tanaka et al. (1986)). It also remains unclear if circuit models based on divisive normalization or stabilized supralinear networks (Rubin et al., 2015) can support the probabilistic computations required by Bayesian causal inference. Building on Festa et al. (2014) and Echeveste et al. (2020), future work could explore this connection, using the detailed predictions from our framework as a clear computational target for mechanistic modeling.
3.4 Limitations and Future Directions
This study provides a qualitative, proof-of-principle demonstration that a normative model of perception can account for a wide range of neurophysiological data. Two main limitations, however, point toward crucial avenues for future research.
First, linking Bayesian models to neural activity requires an assumption about the neural code. We used the neural sampling hypothesis (e.g., Buesing et al., 2011; Fiser et al., 2010; Haefner et al., 2016; Hoyer and Hyvärinen, 2002; Orbán et al., 2016), which belongs to a broad class of Linear Distributional Codes (LDCs) that also includes Distributed Distributional Codes (DDCs) (Lange & Haefner, 2022). While our predictions for mean tuning curves are generalizable across LDCs (see Methods 4.3 and Lengyel et al., 2023), our predictions for response variability are specific to sampling-based codes. The broader debate over whether the brain uses LDCs or other encoding schemes, like Probabilistic Population Codes (PPCs), is ongoing (Beck et al.; Fiser et al., 2010; Haefner et al., 2024; Lange et al.; Ma et al., 2006; Pouget et al.; Tajima et al., 2016; Ujfalussy & Orbán, 2022; Vértes & Sahani, 2018, 2019). A critical next step is to derive predictions for non-LDC codes like PPCs, which would generate directly competing and testable hypotheses.
Second, our comparison to historical data is qualitative. While we show that the patterns of neural activity are consistent with our model, rigorous quantitative model comparison against alternatives (like extended divisive normalization models) is required. This would be best achieved by fitting the model parameters to an animal’s behavior while simultaneously recording from MT neurons. Our framework is ideally suited to guide such work, as it can generate predictions for any stimulus, allowing researchers to design optimized experiments that can maximally differentiate between competing models (i.e., controversial stimuli; Golan et al., 2020).
3.5 Conclusion
This study establishes Bayesian causal inference as a unifying normative principle for center-surround interactions in motion. Our framework parsimoniously explains a wide array of disparate findings – from surround suppression and facilitation to minimal tuning curve shifts in response to surround motion – as the principled outputs of a single computation over latent reference frames. By reinterpreting classic neurophysiological data and bridging the long-standing gap between models of perception and neural activity, this work reframes our understanding of MT’s role from that of a motion filter to that of an inference engine. This provides a clear foundation for designing new, theory-driven experiments to dissect the circuit-level implementation of these abstract cognitive computations.
4 Methods
4.1 The behavioral task and data source
The Bayesian causal inference model was constrained using behavioral data from five observers in Experiment 1 of Shivkumar et al. (2025). In that study, observers used a dial to report their perceived motion direction of a central patch of green dots surrounded by a peripheral patch of red dots (Fig. 2a). The surrounding dots were either stationary or moved horizontally (0° or 180°) at a speed of 1 deg/s. Stimuli were displayed at an eccentricity of 5° in the periphery. On each trial, the direction of the central patch was randomly selected from the set 0°, ±2.5°, ±5°, ±10°, ±20°, ±45°. Both the center and surround patches shared a common horizontal velocity (0° or 180°), resulting in the center’s velocity relative to the surround being ±90°, depending on the direction of the center patch. After a fixation period of 0.5 seconds, the stimulus appeared and oscillated back and forth for 1.5 cycles. The patch envelopes moved at a constant velocity and reversed direction after 1.5 seconds, following a square wave velocity profile with a period of 3 seconds. This oscillatory motion ensured that the envelopes remained within a fixed area on the screen. For a detailed description of the stimuli, task, and design, see Shivkumar et al. (2025).
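The stimulus geometry described above can be checked with a short worked example. This is a minimal sketch under one stated assumption: the center shares the surround's horizontal velocity exactly, so that the center's direction is set only by an added vertical component. The function names and the restriction to a rightward (0°) surround are illustrative choices, not part of the original task code.

```python
import math

SURROUND_SPEED = 1.0  # deg/s, horizontal surround motion as in the task


def center_velocity(center_direction_deg, horizontal_speed=SURROUND_SPEED):
    """Retinal velocity (vx, vy) of the center patch in deg/s.

    Assumption: the center shares the surround's horizontal velocity, so its
    direction is set entirely by the added vertical component.
    """
    vy = horizontal_speed * math.tan(math.radians(center_direction_deg))
    return horizontal_speed, vy


def relative_direction_deg(center_direction_deg, horizontal_speed=SURROUND_SPEED):
    """Direction of center motion relative to the surround, or None if zero."""
    vx, vy = center_velocity(center_direction_deg, horizontal_speed)
    # The surround velocity is (horizontal_speed, 0), so subtract it.
    rel_x, rel_y = vx - horizontal_speed, vy
    if rel_x == 0.0 and rel_y == 0.0:
        return None  # center and surround move identically
    return math.degrees(math.atan2(rel_y, rel_x))


for direction in (0.0, 2.5, -10.0, 45.0):
    print(direction, relative_direction_deg(direction))
```

Under this assumption every nonzero center direction yields a purely vertical relative velocity, which is why the relative direction is always +90° or -90° regardless of how small the retinal direction offset is.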
When the surround was stationary, reported directions closely followed the true motion of the center patch as displayed on the screen. However, when the surround patch was moving, observers’ responses deviated systematically from the true motion of the center patch. For small center directions, responses were biased towards zero degrees (the surround direction), consistent with observers integrating the center and surround velocities. For larger center directions, responses were biased towards 90°, indicating that observers perceived the relative velocity between the center and surround. Additional findings from this experiment, along with further analyses, are detailed in Shivkumar et al. (2025).
4.2 The Bayesian causal inference model
The Bayesian causal inference model described in Shivkumar et al. (2025) was used to derive posterior beliefs for center-surround (CS) motion stimuli. In this generative model, the retinal velocity of each visual element (i.e., the center and surround patches) is the sum of the reference frame’s velocity to which the element belongs and its relative velocity with respect to that reference frame. The prior over relative velocity is modeled as a mixture of a delta function centered at zero and a zero-mean Gaussian distribution. This prior naturally enables the model to infer whether elements are stationary relative to their reference frame or moving within their reference frame. A detailed description of the full generative model, inference equations, and model fitting procedures can be found in Shivkumar et al. (2025).
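The structure of this generative model can be illustrated with a one-dimensional toy sampler. The sketch below is not the fitted model of Shivkumar et al. (2025): it collapses velocity to a single dimension, uses the same mixture prior for every variable, and all parameter values (`p_zero`, `slab_sd`) and function names are hypothetical.

```python
import random


def sample_spike_and_slab(rng, p_zero, slab_sd):
    """Draw from a mixture of a delta at zero and a zero-mean Gaussian."""
    if rng.random() < p_zero:
        return 0.0  # delta component: exactly stationary
    return rng.gauss(0.0, slab_sd)  # Gaussian "slow speed" component


def sample_scene(rng, p_zero=0.8, slab_sd=2.0):
    """One draw of (center, surround) retinal velocities in a 1D toy scene.

    Each retinal velocity is the reference-frame velocity plus the element's
    velocity relative to that frame. All parameter values are hypothetical.
    """
    v_reference = sample_spike_and_slab(rng, p_zero, slab_sd)
    v_center = v_reference + sample_spike_and_slab(rng, p_zero, slab_sd)
    v_surround = v_reference + sample_spike_and_slab(rng, p_zero, slab_sd)
    return v_center, v_surround


rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(10_000)]
identical = sum(c == s for c, s in scenes) / len(scenes)
print(f"fraction of scenes with identical center and surround velocity: {identical:.2f}")
```

The delta component is what makes exact equality of center and surround velocities a finite-probability event under the prior, which is the property that lets inference distinguish "stationary within the reference frame" from "moving slowly within it".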
Observers allocated most of the posterior probability mass to the delta component, reflecting the brain’s strong expectation that relative motion in the world is exactly zero, rather than merely slow. Moreover, observers attributed significant probability mass to only four out of the 12 possible causal motion structures:
Motion integration: Integrating center and surround velocities, leading to the perception of cue-combined motion velocity (Fig. 2c). The posterior probability of this structure was highest when the center and surround moved at the same velocity and decreased as the separation between center and surround velocities increased (Fig. S1c and d).
Surround reference: Perceiving center motion in the reference frame defined by the surround (i.e., pure relative motion or motion segmentation, see Fig. 2d). This structure was prominent for intermediate differences between center and surround motion directions for a subset of observers (Fig. S1e and f).
Center reference: Perceiving surround motion in the reference frame of the center. In this case, the perception of center motion largely aligned with its retinal motion (Fig. 2e), with only a few observers attributing significant probability mass to this structure (Fig. S1g and h).
Intermediate reference: Perceiving both center and surround motion relative to a reference frame intermediate between the two (Fig. 2f). This structure accounted for incomplete subtraction of the surround velocity from the center velocity at larger differences between center and surround motion directions and was prominent among most observers (Fig. S1i and j).
Notably, none of the observers attributed any probability mass to the possibility that the center and surround patches were independent, indicating that they did not perceive the two patches as causally unrelated (Fig. 2g and Fig. S1k and l).
Interestingly, motion structures with intermediate reference frames have higher posterior probabilities than other structures at large separations. This is due to the Gaussian slow-speed component in the prior, which favors smaller relative velocities under an intermediate reference frame, outweighing the mass concentrated in the delta component of the prior. Additional insights from fitting the model to observers’ responses, along with further analyses, can be found in Shivkumar et al. (2025).
We adopted a simplified version of the Bayesian causal inference model of Shivkumar et al. (2025), which includes 10 parameters: (1) two sensory uncertainty parameters associated with the center and surround patches’ observed velocities, (2) one computational noise parameter, (3) three mixture prior parameters corresponding to the reference, center, and surround velocities, (4) three parameters for the prior widths of the zero-mean Gaussian slow speed priors over the reference, center, and surround velocities, and (5) one parameter representing the probability of grouping the center and surround patches into a single causal structure. Detailed descriptions of these parameters and their fitted values can be found in Shivkumar et al. (2025).
4.3 Generating neural predictions from Bayesian models
We generated neural predictions from the model’s posterior beliefs, assuming that they are represented in the single-neuron activity via Neural Sampling (Fiser et al., 2010; Hoyer and Hyvärinen, 2002).
We used the following algorithm to link a posterior distribution to mean firing rates and variability in the firing rates:
1 Draw samples, S, from the posterior distribution over the latent variables (the relative-motion posterior, $p_{v_{\mathrm{center}}}$, and $p_{v_{\mathrm{reference}}}$):

$$S = \{(u_i, \omega_i)\}_{i=1}^{N}, \qquad (u_i, \omega_i) \sim p^{(k)}(u, \omega),$$

where $p^{(k)}(u, \omega)$ is the posterior over the speed $u$ and direction $\omega$ of the encoded latent variable for the CS stimulus under consideration.

2 Link each sample in S, which contains a combination of a speed and a direction value, with a predicted instantaneous firing rate using the measured (or hypothesized) tuning curve of the neuron in response to stimuli with a stationary surround:

$$R = f_{\mathrm{speed}}(S_{\mathrm{speed}}) * f_{\mathrm{direction}}(S_{\mathrm{direction}}),$$

where $f_{\mathrm{speed}}$ and $f_{\mathrm{direction}}$ denote the speed and direction tuning functions of the neuron, respectively, $S_{\mathrm{speed}} = \{u \mid (u, \omega) \in S\}$ and $S_{\mathrm{direction}} = \{\omega \mid (u, \omega) \in S\}$, while $*$ denotes element-wise multiplication. $R$ represents the distribution over the predicted instantaneous firing rates corresponding to the pairs of speed and direction samples.

3 Take the mean or the variance of $R$ to compute the predicted firing rate and the variability in the firing rate:

$$\bar{R} = \frac{1}{N} \sum_{i=1}^{N} R_i,$$

$$\mathrm{Var}(R) = \frac{1}{N} \sum_{i=1}^{N} \left(R_i - \bar{R}\right)^2,$$

where $N$ is the number of samples in S.

4 Repeat this process for each CS stimulus.
All predicted tuning curves (mean firing rate) and response variability (variance of the firing rates) presented in this study were generated using the algorithm described above. This variability term quantifies the uncertainty in the model’s posterior beliefs and represents the predicted "excess variance" (overdispersion) that modulates a baseline Poisson process, accounting for super-Poisson spike count statistics (Goris et al., 2014).
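The sampling algorithm above can be sketched in a few lines. This is an illustration under loudly stated assumptions, not the authors' implementation: the posterior is replaced by an independent Gaussian over speed and direction (the real posterior comes from the causal inference model), the tuning curves are Gaussian with made-up preferred values and widths, and the variance is normalized by N. All function names are hypothetical.

```python
import math
import random
import statistics


def gaussian_tuning(value, preferred, width):
    """Unit-height Gaussian bump centered on the preferred value."""
    return math.exp(-0.5 * ((value - preferred) / width) ** 2)


def wrap_deg(angle):
    """Wrap an angle to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def f_speed(u, preferred=8.0, width=4.0, peak_rate=30.0):
    """Hypothetical speed tuning curve (spikes/s) for a stationary surround."""
    return peak_rate * gaussian_tuning(u, preferred, width)


def f_direction(omega, preferred=0.0, width=40.0):
    """Hypothetical direction tuning curve, normalized to a peak of 1."""
    return gaussian_tuning(wrap_deg(omega - preferred), 0.0, width)


def predict_response(samples):
    """Steps 2 and 3: map samples through the tuning curves, then summarize."""
    rates = [f_speed(u) * f_direction(omega) for u, omega in samples]
    mean_rate = statistics.fmean(rates)
    variance = statistics.pvariance(rates, mu=mean_rate)
    return mean_rate, variance


def toy_posterior_samples(rng, n, speed_mean, speed_sd, dir_mean, dir_sd):
    """Step 1 stand-in: independent Gaussian posterior over speed and direction."""
    return [(rng.gauss(speed_mean, speed_sd), rng.gauss(dir_mean, dir_sd))
            for _ in range(n)]


rng = random.Random(1)
narrow = toy_posterior_samples(rng, 5000, 8.0, 0.5, 0.0, 5.0)
broad = toy_posterior_samples(rng, 5000, 8.0, 3.0, 0.0, 30.0)
print("narrow posterior (mean, variance):", predict_response(narrow))
print("broad posterior (mean, variance):", predict_response(broad))
```

In this toy setting a broader posterior centered on the preferred stimulus lowers the predicted mean rate and raises the predicted variance, because more samples fall on the flanks of the tuning curve.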
4.4 Predicted tuning curves hold for all Linear Distributional Codes
The method described above can be regarded as a special case of the approach introduced in Lengyel et al. (2023). That work developed a method for testing Bayesian models using neural activity, which applies to all encoding models based on Linear Distributional Codes (LDCs; Lange and Haefner, 2022). This approach also enables researchers to generate predicted tuning curves for single neurons that are consistent across all LDCs (Lange and Haefner, 2022). LDCs encompass a broad class of encoding models, including schemes within two of the three principal families of hypotheses about neural encoding: Neural Sampling Codes (Fiser et al., 2010; Hoyer and Hyvärinen, 2002) and Distributed Distributional Codes (DDCs; Vértes and Sahani, 2018, 2019). Consequently, the neural predictions derived from this method hold for any Neural Sampling or DDC-based encoding used by the brain to represent probability distributions. However, the method does not apply to Probabilistic Population Codes (PPCs; Ma et al., 2006) or other percentile codes (see Lengyel et al., 2023, for details).
In essence, the method identifies predicted "test" stimuli whose posteriors can be expressed as mixtures of the posteriors of other "baseline" stimuli, referred to as predictor stimuli. It demonstrates that, under the assumption of any LDC-based encoding, these predicted test stimuli are expected to elicit neural responses that are mixtures of the neural responses to the predictor baseline stimuli. Crucially, the mixture weights used to generate the neural responses correspond to the same mixture weights used to match the posteriors for the predicted test stimuli (see the derivation in Lengyel et al., 2023).
The method we used in this work to generate tuning curves (i.e., mean firing rates) is mathematically equivalent to the approach in Lengyel et al. (2023), under the following assumptions. To predict neural activity in response to a CS motion stimulus with a moving surround (the predicted test stimulus), we first compute the posterior belief for this stimulus using a Bayesian model with fixed or fitted parameters (e.g., $p^{(k)}$ in step 1 above). Next, we draw samples from this posterior (e.g., S in step 1 above) and use these as predictor baseline CS motion stimuli, with a stationary surround. We assume that the posteriors corresponding to these predictor baseline stimuli, $i$, are delta posteriors:

$$p_i(u, \omega) = \delta\big((u, \omega) - (u_i, \omega_i)\big).$$

Then, the posterior for the predicted test stimulus is expressed as a mixture of the predictor baseline stimuli’s posteriors:

$$p^{(k)}(u, \omega) \approx \sum_i w_i \, p_i(u, \omega).$$

The optimal match is achieved when the mixture weights, $w$, are equal to the probability density of the posterior samples drawn for the predicted test stimulus:

$$w_i = p^{(k)}(u_i, \omega_i).$$

Using measured or hypothesized tuning curves for baseline CS motion stimuli with a stationary surround, we look up the firing rate for each predictor baseline stimulus, $i$:

$$R_i = f_{\mathrm{speed}}(u_i) \cdot f_{\mathrm{direction}}(\omega_i),$$

where, similar to step 2 above, $f_{\mathrm{speed}}$ and $f_{\mathrm{direction}}$ denote the speed and direction tuning functions of the neuron, and $u_i$ and $\omega_i$ denote the speed and direction samples drawn from the posterior. For a large number of samples drawn from the posterior, the mixture of these neural responses to the predictor baseline stimuli, weighted by the mixture weights, $w$, matches the predicted mean firing rate, $\bar{R}$:

$$\bar{R} \approx \sum_i w_i R_i,$$

since the weighted sum $\sum_i w_i R_i$ converges to the integral $\int p^{(k)}(u, \omega) \, R(u, \omega) \, d(u, \omega)$, which defines the expected firing rate.
Therefore, the predicted tuning curves generated with the algorithm described above hold for any LDC, whether Neural Sampling or DDC-based. However, the method does not apply to other encoding schemes such as PPCs. Additionally, the predicted variability in neural responses applies only to Neural Sampling codes.
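The equivalence between the weighted mixture of baseline responses and the sample mean can be checked numerically in a simplified setting. The sketch below is one-dimensional (speed only) and uses a Gaussian posterior and a Gaussian tuning curve so that the expected firing rate has a closed form; these choices, the parameter values, and the function names are all assumptions made for illustration. The grid weights are taken as posterior density times grid spacing so that they sum to approximately one, which is a normalization assumption of this sketch.

```python
import math
import random


def posterior_pdf(u, mean, sd):
    """Gaussian stand-in for the posterior density over speed."""
    return math.exp(-0.5 * ((u - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def tuning(u, preferred=8.0, width=4.0):
    """Hypothetical unit-height speed tuning curve for a stationary surround."""
    return math.exp(-0.5 * ((u - preferred) / width) ** 2)


def weighted_sum_rate(mean, sd, lo=-20.0, hi=40.0, n=6001):
    """Sum_i w_i R_i over a grid of delta-posterior baseline stimuli.

    Assumption: w_i is the posterior density times the grid spacing, so that
    the weights sum to approximately one.
    """
    du = (hi - lo) / (n - 1)
    grid = [lo + i * du for i in range(n)]
    return sum(posterior_pdf(u, mean, sd) * du * tuning(u) for u in grid)


def sampled_rate(rng, mean, sd, n=20000):
    """Mean tuning-curve output over samples drawn from the posterior."""
    return sum(tuning(rng.gauss(mean, sd)) for _ in range(n)) / n


def analytic_rate(mean, sd, preferred=8.0, width=4.0):
    """Closed form of the integral for a Gaussian posterior and tuning curve."""
    total_var = sd ** 2 + width ** 2
    return width / math.sqrt(total_var) * math.exp(
        -0.5 * (mean - preferred) ** 2 / total_var)


rng = random.Random(0)
print("weighted sum:", weighted_sum_rate(6.0, 3.0))
print("sample mean: ", sampled_rate(rng, 6.0, 3.0))
print("closed form: ", analytic_rate(6.0, 3.0))
```

The three numbers agree up to discretization and sampling error, which is the sense in which the sampling-based tuning curve predictions coincide with the mixture-based construction for mean responses.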
4.5 Analyzing the predicted neural activity
The CS stimulus described in 4.1 has four important properties that modulate posterior beliefs in the Bayesian causal inference model: (1) the direction of the center, (2) the direction of the surround, (3) the speed of the center, and (4) the speed of the surround patches. Therefore, we plotted the predicted tuning curves and variability as a function of these four stimulus dimensions (see Figs. 5 and S3).
We then demonstrate that these predictions are different for different observers via the model parameter values fitted to their behavior (see and S3). Finally, we demonstrate that these predictions are different for different neurons via their different tuning curves (see Figs. 5 and S5).
In order to investigate the variety of possible neural CS interactions predicted by Bayesian causal inference, we generated neural predictions using a wide range of possible model parameters besides the fitted parameters for the 

The main signature of our causal inference model encoding relative motion is the diagonal interaction between the CS directions. The prediction maps in Fig. S4 are ordered from strongest (upper left corner) to weakest (lower right corner) diagonal interactions. We measured diagonal interaction as how symmetrical the prediction was about the horizontal cross-section line in the middle (i.e., when the surround direction equals 0 with varying center directions). The lower the symmetry score, the stronger the CS interaction (see the scores on top of each prediction map). This diagonal CS interaction reflects the fact that these predictions assume that the neuron represents the center motion relative to the inferred reference frame’s motion. The more strongly observers infer the surround patch as the reference frame, and thus perceive the center motion relative to the surround, the stronger the predicted diagonal CS interaction in the mean neural responses. Importantly, while this library of 47,250 maps was generated to explore the model’s full predictive range (Fig. S4), it was not used for the subsequent comparison with empirical data. For that validation, we only used the parameters fitted to the five observers in Shivkumar et al. (2025). This allowed us to test the specific hypothesis that the same parameters explaining human behavior could also predict the neural responses.
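To make the idea of a symmetry score concrete, the sketch below scores a prediction map by the cosine similarity between the map and its reflection about the middle row (surround direction 0). The text does not specify the exact metric used for Fig. S4, so this particular score, the toy maps, and all names are hypothetical; the sketch only illustrates why a diagonal (relative-motion) map scores lower than a map with no surround dependence.

```python
import math


def symmetry_score(rate_map):
    """Cosine similarity between a map and its reflection about the middle row.

    rate_map[i][j] is the predicted rate for surround direction i and center
    direction j, with the middle row corresponding to surround direction 0.
    This is a hypothetical score, not necessarily the one used for Fig. S4.
    """
    flipped = rate_map[::-1]  # reflect surround direction s -> -s
    dot = sum(a * b
              for row, flipped_row in zip(rate_map, flipped)
              for a, b in zip(row, flipped_row))
    norm = sum(a * a for row in rate_map for a in row)
    return dot / norm


def bump(delta_deg, width=40.0):
    """Unit-height Gaussian bump on a circular direction difference."""
    delta = (delta_deg + 180.0) % 360.0 - 180.0
    return math.exp(-0.5 * (delta / width) ** 2)


directions = list(range(-180, 181, 30))  # degrees, relative to preferred

# Toy map with no surround dependence (retinal-centric coding).
retinal_map = [[bump(c) for c in directions] for _s in directions]
# Toy map tuned to center direction minus surround direction (diagonal ridge).
relative_map = [[bump(c - s) for c in directions] for s in directions]

print("retinal-centric map score:", symmetry_score(retinal_map))
print("relative-motion map score:", symmetry_score(relative_map))
```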
4.6 Qualitative comparison to empirical data
To qualitatively compare our model’s predictions to previously published neural data, we followed a three-step procedure. First, we generated a catalogue of potential neural responses. Second, we defined the specific stimulus conditions ("cross-sections") needed to match each historical study. Third, for each study, we selected the best-fitting prediction from our catalogue.
Generation of the neural prediction catalogue
Our predictions needed to account for variability across both observers and neurons. We therefore generated a catalogue spanning two dimensions: (1) Observer Parameters: We used the five unique parameter sets fitted to the human observers in Shivkumar et al. (2025). (2) Neuron Tuning Profiles: We created nine plausible tuning profiles by combining three distinct tuning widths (approximating the 25th, 50th, and 75th percentiles from DeAngelis and Uka (2003)) for both speed and direction (3 speed widths × 3 direction widths; see Fig. S5a).
For each of the 45 combinations (5 observers × 9 tuning profiles), we generated predicted mean firing rates (tuning curves) and response variability across a dense, four-dimensional grid of center and surround speeds and directions.
Stimulus cross-section selection and matching procedure
To compare our predictions with the modulation effects reported in the literature, we first calculated a surround modulation index (facilitation or suppression) by subtracting the predicted response to a stationary surround from the predicted response to a moving surround (Fig. 6a).
We then extracted specific “cross-sections” from our 4D predictions to precisely match the stimulus conditions of each empirical study. For each dataset, we identified the best-fitting prediction from our 45-member catalogue by selecting the observer/tuning-profile combination that yielded the lowest root mean squared error (RMSE). The specific cross-sections were defined as follows:
To match Allman et al. (1985a): We fixed the center motion to the neuron’s preferred direction and speed, and varied the surround direction (0°, 30°, 60°, 90°, 120°, 150°, 180°).
To match Born (2000): We used a similar cross-section, varying the surround direction from 0° to 180° in 45° increments.
To match Tanaka et al. (1986): For direction modulation, we varied the surround direction (0°, 30°, 60°, 90°). For speed modulation, we fixed both center and surround to the preferred direction and varied the surround speed (0.25x, 1x, and 4x the preferred speed).
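The selection step described above can be sketched as a minimum-RMSE search over catalogue entries. The sketch assumes that predicted and observed modulation values are already sampled at the same stimulus conditions and expressed in the same units; whether any rescaling was applied before computing the RMSE is not stated in the text. All names and numbers are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def best_fit(candidates, observed):
    """Return the catalogue key with the lowest RMSE, and that RMSE.

    candidates maps an (observer, tuning profile) key to the predicted
    modulation values along one study's cross-section.
    """
    best_key = min(candidates, key=lambda k: rmse(candidates[k], observed))
    return best_key, rmse(candidates[best_key], observed)


# Hypothetical modulation values at four surround directions (not real data).
observed = [-10.0, -6.0, 0.0, 4.0]
candidates = {
    ("observer_1", "narrow"): [-2.0, -1.0, 0.0, 1.0],
    ("observer_2", "medium"): [-9.0, -7.0, 1.0, 3.0],
    ("observer_3", "broad"): [5.0, 5.0, 5.0, 5.0],
}

best_key, best_error = best_fit(candidates, observed)
```

In the procedure described here, `candidates` would hold 45 entries per study, one for each observer/tuning-profile combination.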
Data availability
The current manuscript is a computational study, so no data have been generated for this manuscript. The modelling and analysis code can be found at https://github.com/GaborLengyel/CI_neural_predictions.
Acknowledgements
This work was supported by a BRAIN Initiative grant from the National Institute of Neurological Disorders and Stroke (https://www.ninds.nih.gov/; U19NS118246 to GCD and RMH), and by the Computing Module of a National Eye Institute Core grant (https://www.nei.nih.gov/; EY001319 to GCD). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Additional information
Funding
National Institute of Neurological Disorders and Stroke (NINDS) (U19NS118246)
Gregory DeAngelis
Ralf M Haefner
National Eye Institute (NEI) (EY001319)
Gregory DeAngelis
References
- Spatiotemporal energy models for the perception of motion. J Opt Soc Am A 2:284–299. https://doi.org/10.1364/JOSAA.2.000284
- Direction and orientation selectivity of neurons in visual area MT of the macaque. Journal of Neurophysiology 52:1106–1130. https://doi.org/10.1152/jn.1984.52.6.1106
- Direction- and velocity-specific responses from beyond the classical receptive field in the middle temporal visual area (MT). Perception 14:105–126. https://doi.org/10.1068/p140105
- Stimulus specific responses from beyond the classical receptive field: Neurophysiological mechanisms for local-global comparisons in visual neurons. Annual Review of Neuroscience 8:407–430. https://doi.org/10.1146/annurev.ne.08.030185.002203
- Circuits and mechanisms for surround modulation in visual cortex. Annual Review of Neuroscience 40. https://doi.org/10.1146/annurev-neuro-072116-031418
- Understanding the retinal basis of vision across species. Nature Reviews Neuroscience 21:5–20. https://doi.org/10.1038/s41583-019-0242-1
- A geometric framework for nonlinear visual coding. Opt Express 7:155–165. https://doi.org/10.1364/OE.7.000155
- Combining feature selection and integration—a neural model for MT motion selectivity. PLOS One 6:1–15. https://doi.org/10.1371/journal.pone.0021254
- Complex inference in neural circuits with probabilistic population codes and topic models. In: Pereira F., Burges C., Bottou L., Weinberger K. (Eds.)
- Competing theories of probabilistic computations in the brain. In: Ccn 2020 workshop gac. https://openreview.net/forum?id=uZMMO2obl50
- Visual motion perception as online hierarchical inference. Nature Communications 13:7403. https://doi.org/10.1038/s41467-022-34805-5
- Hierarchical structure is employed by humans during visual motion perception. Proceedings of the National Academy of Sciences 117:24581–24589. https://doi.org/10.1073/pnas.2008961117
- Center-surround interactions in the middle temporal visual area of the owl monkey. Journal of Neurophysiology 84:2658–2669. https://doi.org/10.1152/jn.2000.84.5.2658
- Structure and function of visual area MT. Annual Review of Neuroscience 28:157–189. https://doi.org/10.1146/annurev.neuro.26.041002.131052
- Segregation of global and local motion processing in primate middle temporal visual area. Nature 357:497–499. https://doi.org/10.1038/357497a0
- Segmentation versus integration in visual motion processing. Trends in Neurosciences 16:263–268. https://doi.org/10.1016/0166-2236(93)90179-P
- Center-surround antagonism based on disparity in primate area MT. J Neurosci 18:7552–7565. https://doi.org/10.1523/jneurosci.18-18-07552.1998
- Spatial summation in the receptive fields of MT neurons. Journal of Neuroscience 19:5074–5084. https://doi.org/10.1523/JNEUROSCI.19-12-05074.1999
- Responses of neurons in macaque MT to stochastic motion signals. Visual Neuroscience 10:1157–1169. https://doi.org/10.1017/S0952523800010269
- Neural dynamics as sampling: A model for stochastic computation in recurrent networks of spiking neurons. PLOS Computational Biology 7:1–22. https://doi.org/10.1371/journal.pcbi.1002211
- Normalization as a canonical neural computation. Nature Reviews Neuroscience 13:51–62. https://doi.org/10.1038/nrn3136
- Do monkeys see the way we do? Qualitative similarities and differences between monkey and human perception. bioRxiv. https://doi.org/10.1101/2025.01.30.635614
- Flexible gating of contextual influences in natural vision. Nature Neuroscience 18:1648–1655. https://doi.org/10.1038/nn.4128
- The role of relative motion computation in ‘direction repulsion’. Vision Research 40:833–841. https://doi.org/10.1016/S0042-6989(99)00226-6
- Coding of horizontal disparity and velocity by MT neurons in the alert macaque. Journal of Neurophysiology 89:1094–1111. https://doi.org/10.1152/jn.00717.2002
- Cortical-like dynamics in recurrent circuits optimized for sampling-based probabilistic inference. Nature Neuroscience 23:1138–1149. https://doi.org/10.1038/s41593-020-0671-1
- Retinal bipolar cells: Elementary building blocks of vision. Nature Reviews Neuroscience 15:507–519. https://doi.org/10.1038/nrn3783
- Analog memories in a balanced rate-based network of E-I neurons. In: Ghahramani Z., Welling M., Cortes C., Lawrence N., Weinberger K. (Eds.)
- Statistically optimal perception and learning: From behavior to neural representations. Trends Cogn Sci 14:119–130. https://doi.org/10.1016/j.tics.2010.01.003
- Classical center-surround receptive fields facilitate novel object detection in retinal bipolar cells. Nature Communications 13:5575. https://doi.org/10.1038/s41467-022-32761-8
- Discovering hierarchical motion structure [Quantitative Approaches in Gestalt Perception]. Vision Research 126:232–241. https://doi.org/10.1016/j.visres.2015.03.004
- Controversial stimuli: Pitting neural networks against each other as models of human cognition. Proceedings of the National Academy of Sciences 117:29330–29337. https://doi.org/10.1073/pnas.1912334117
- Partitioning neuronal variability. Nature Neuroscience 17:858–865. https://doi.org/10.1038/nn.3711
- How does the brain compute with probabilities? arXiv. https://doi.org/10.48550/arXiv.2409.02709
- Perceptual decision-making as probabilistic inference by neural sampling. Neuron 90:649–660. https://doi.org/10.1016/j.neuron.2016.03.020
- Interpreting neural response variability as Monte Carlo sampling of the posterior. In: Proceedings of the 15th international conference on neural information processing systems. NIPS’02. MIT Press. pp. 293–300
- Adaptive surround modulation in cortical area MT. Neuron 53:761–770. https://doi.org/10.1016/j.neuron.2007.01.032
- Stimulus dependency and mechanisms of surround modulation in cortical area MT. Journal of Neuroscience 28:13889–13906. https://doi.org/10.1523/JNEUROSCI.1946-08.2008
- Direction and speed tuning to visual motion in cortical areas MT and MSTd during smooth pursuit eye movements. Journal of Neurophysiology 105:1531–1545. https://doi.org/10.1152/jn.00511.2010
- MST neurons code for visual motion in space independent of pursuit eye movements. Journal of Neurophysiology 97:3473–3483. https://doi.org/10.1152/jn.01054.2006
- Configurations in the perception of velocity. Acta Psychologica 7:25–79. https://doi.org/10.1016/0001-6918(50)90003-5
- Motion integration over space: Interaction of the center and surround motion. Vision Research 37:991–1005. https://doi.org/10.1016/S0042-6989(96)00254-4
- Discharge patterns and functional organization of mammalian retina. J Neurophysiol 16:37–68. https://doi.org/10.1152/jn.1953.16.1.37
- Causal inference in multisensory perception. PLOS One 2:1–10. https://doi.org/10.1371/journal.pone.0000943
- Task-induced neural covariability as a signature of approximate Bayesian learning and inference. PLOS Computational Biology 18:1–39. https://doi.org/10.1371/journal.pcbi.1009557
- Bayesian encoding and decoding as distinct perspectives on neural coding. bioRxiv. https://doi.org/10.1101/2020.10.14.339770
- Neural correlates of perceptual learning in a sensory-motor, but not a sensory, cortical area. Nature Neuroscience 11:505–513. https://doi.org/10.1038/nn2070
- A general method for testing Bayesian models using neural data. In: Unireps: The first workshop on unifying representations in neural models. https://openreview.net/forum?id=oWJP0NhcY7
- Bayesian inference with probabilistic population codes. Nature Neuroscience 9:1432–1438. https://doi.org/10.1038/nn1790
- The analysis of moving visual patterns. In: Chagas C., Gattass R., Gross C. (Eds.)
- Relation of cortical areas MT and MST to pursuit eye movements. II. Differentiation of retinal from extraretinal inputs. Journal of Neurophysiology 60:604–620. https://doi.org/10.1152/jn.1988.60.2.604
- Multiple components of surround modulation in primary visual cortex: Multiple neural circuits with multiple functions? [The Function of Contextual Modulation]. Vision Research 104:47–56. https://doi.org/10.1016/j.visres.2014.08.018
- Neural variability and sampling-based probabilistic representations in the visual cortex. Neuron 92:530–543. https://doi.org/10.1016/j.neuron.2016.09.038
- Two-dimensional substructure of stereo and motion interactions in macaque visual cortex. Neuron 37:525–535. https://doi.org/10.1016/S0896-6273(02)01187-X
- Integration of contour and terminator signals in visual area MT of alert macaque. Journal of Neuroscience 24:3268–3280. https://doi.org/10.1523/JNEUROSCI.4387-03.2004
- Contrast dependence of suppressive influences in cortical area MT of alert macaque. Journal of Neurophysiology 93:1809–1815. https://doi.org/10.1152/jn.00629.2004
- End-stopping and the aperture problem: Two-dimensional motion signals in macaque V1. Neuron 39:671–680. https://doi.org/10.1016/S0896-6273(03)00439-2
- Causal inference predicts the transition from integration to segmentation in motion perception. Scientific Reports 14:27704. https://doi.org/10.1038/s41598-024-78820-6
- Divisive normalization as a mechanism for hierarchical causal inference in motion perception. In: Cosyne
- Probabilistic brains: Knowns and unknowns. Nat Neurosci 16:1170–1178. https://doi.org/10.1038/nn.3495
- Transparent motion perception as detection of unbalanced motion signals. I. Psychophysics. Journal of Neuroscience 14:7357–7366. https://doi.org/10.1523/JNEUROSCI.14-12-07357.1994
- Transparent motion perception as detection of unbalanced motion signals. III. Modeling. Journal of Neuroscience 14:7381–7392. https://doi.org/10.1523/JNEUROSCI.14-12-07381.1994
- Shape and spatial distribution of receptive fields and antagonistic motion surrounds in the middle temporal area (V5) of the macaque. European Journal of Neuroscience 7:2064–2082. https://doi.org/10.1111/j.1460-9568.1995.tb00629.x
- Distinct decision processes for 3D and motion stimuli in both humans and monkeys revealed by computational modelling. bioRxiv. https://doi.org/10.1101/2024.11.25.625189
- The stabilized supralinear network: A unifying circuit motif underlying multi-input integration in sensory cortex. Neuron 85:402–417. https://doi.org/10.1016/j.neuron.2014.12.026
- How MT cells analyze the motion of visual patterns. Nature Neuroscience 9:1421–1431. https://doi.org/10.1038/nn1786
- Bayesian causal inference: A unifying neuroscience theory. Neuroscience & Biobehavioral Reviews 137:104619. https://doi.org/10.1016/j.neubiorev.2022.104619
- Hierarchical motion perception as causal inference. Nature Communications 16:3868. https://doi.org/10.1038/s41467-025-58797-0
- A model of neuronal responses in visual area MT. Vision Research 38:743–761. https://doi.org/10.1016/S0042-6989(97)00183-1
- Linking psychophysics and physiology of center-surround interactions in visual motion processing. In: Seeing spatial form. Oxford University Press. pp. 278–314. https://doi.org/10.1093/acprof:oso/9780195172881.003.0014
- Perceptual consequences of centre–surround antagonism in visual motion processing. Nature 424:312–315. https://doi.org/10.1038/nature01800
- Contextual modulations of center-surround interactions in motion revealed with the motion aftereffect. Journal of Vision 8. https://doi.org/10.1167/8.7.9
- Spatial suppression promotes rapid figure-ground segmentation of moving objects. Nature Communications 10:2732. https://doi.org/10.1038/s41467-019-10653-8
- Population code dynamics in categorical perception. Scientific Reports 6:22536. https://doi.org/10.1038/srep22536
- Analysis of local and wide-field movements in the superior temporal visual areas of the macaque monkey. Journal of Neuroscience 6:134–144. https://doi.org/10.1523/JNEUROSCI.06-01-00134.1986
- Receptive field center-surround interactions mediate context-dependent spatial contrast encoding in the retina (A. J. King, Ed.). eLife 7:e38841. https://doi.org/10.7554/eLife.38841
- Predicting human perceptual decisions by decoding neuronal information profiles. Biological Cybernetics 98:397–411. https://doi.org/10.1007/s00422-008-0226-0
- Sampling motion trajectories during hippocampal theta sequences. eLife 11:e74058. https://doi.org/10.7554/eLife.74058
- Flexible and accurate inference and learning for deep generative models. In: Proceedings of the 32nd international conference on neural information processing systems. NIPS’18. Curran Associates Inc. pp. 4170–4179
- A neurally plausible model learns successor representations in partially observable environments. In: Wallach H., Larochelle H., Beygelzimer A., d’Alché-Buc F., Fox E., Garnett R. (Eds.)
- A psychophysically motivated model for two-dimensional motion perception. Visual Neuroscience 9:79–97. https://doi.org/10.1017/S0952523800006386
- A model for motion coherence and transparency. Visual Neuroscience 11:1205–1220. https://doi.org/10.1017/S0952523800007008
- Selectivity of macaque MT/V5 neurons for surface orientation in depth specified by motion. Eur J Neurosci 9:956–964. https://doi.org/10.1111/j.1460-9568.1997.tb01446.x
- Human visual motion perception shows hallmarks of Bayesian structural inference. Scientific Reports 11:3714. https://doi.org/10.1038/s41598-021-82175-7
- Understanding visual processing of motion: Completing the picture using experimentally driven computational models of MT. Reviews in the Neurosciences 35:243–258. https://doi.org/10.1515/revneuro-2023-0052
- Pattern motion processing by MT neurons. Frontiers in Neural Circuits 13. https://doi.org/10.3389/fncir.2019.00043
- Adaptive surround modulation of MT neurons: A computational model. Frontiers in Neural Circuits 14. https://doi.org/10.3389/fncir.2020.529345
- A possible role for end-stopped V1 neurons in the perception of motion: A computational model. PLOS One 11:1–27. https://doi.org/10.1371/journal.pone.0164813
- Probabilistic interpretation of population codes. Neural Computation 10:403–430. https://doi.org/10.1162/089976698300017818
- Multiple object response normalization in monkey inferotemporal cortex. Journal of Neuroscience 25:8150–8164. https://doi.org/10.1523/JNEUROSCI.2058-05.2005
Article and author information
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.110621. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2026, Lengyel et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.