Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Editors
- Reviewing Editor: Peter Latham, University College London, London, United Kingdom
- Senior Editor: Joshua Gold, University of Pennsylvania, Philadelphia, United States of America
Joint Public Review:
Summary:
Lengyel et al. present a normative model of single-neuron activity in area MT, which is known for its role in processing visual motion. The authors focus on responses to a center and a surround that move at different velocities. Both the center and surround are rigid: picture a set of dots all moving at the same velocity. The center dots are arranged in a disc and the surround dots in an annulus; in both cases, the common velocity varies over time.
The core proposal is that the brain does not process motion in a fixed coordinate system, but instead infers a latent reference frame, and that MT neurons encode motion either in retinal coordinates or relative to this inferred reference frame. The model is meant to overcome a challenge in the existing literature on area MT: on the one hand, experimental findings are heterogeneous, including both surround suppression and surround facilitation of neural responses; on the other, existing models are either designed ad hoc to capture specific phenomena or they are somewhat general (e.g., divisive normalization), but in either case they can't explain the full range of responses. This manuscript proposes that the full range of responses in MT is explained as Bayesian inference over the reference frame in which center motion speed and direction should be estimated. The model extends one introduced in a previous publication from the same lab (Shivkumar et al. 2025). That publication focused on human perception of motion; this one makes predictions about MT mean responses and across-trial variability.
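To make the core computation concrete, here is a minimal sketch of causal inference over the reference frame. It is built on simplifying assumptions of our own and is not the authors' model: velocities are one-dimensional, all distributions are Gaussian, and there are only two hypotheses (the center moves relative to a surround-defined frame, or it moves independently of the surround). The function names and the parameters `sigma_v`, `sigma_rel`, `sigma_noise`, and `prior_shared` are hypothetical.

```python
import math


def bivariate_normal_logpdf(x, y, var_x, var_y, cov):
    """Log density of a zero-mean bivariate Gaussian evaluated at (x, y)."""
    det = var_x * var_y - cov * cov
    # Quadratic form (x, y) Sigma^{-1} (x, y)^T using the explicit 2x2 inverse.
    quad = (var_y * x * x - 2.0 * cov * x * y + var_x * y * y) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def p_shared_frame(m_center, m_surround, sigma_v=10.0, sigma_rel=1.0,
                   sigma_noise=1.0, prior_shared=0.5):
    """Posterior probability that the center moves in the surround's frame.

    Hypothetical 1-D generative model (an illustration, not the paper's):
      surround velocity v_s ~ N(0, sigma_v^2)
      H_shared: v_c = v_s + r, with relative motion r ~ N(0, sigma_rel^2)
      H_indep:  v_c ~ N(0, sigma_v^2), independent of v_s
      measurements m = v + N(0, sigma_noise^2) for center and surround.
    Integrating out the velocities, each hypothesis predicts a zero-mean
    bivariate Gaussian over (m_center, m_surround).
    """
    n2 = sigma_noise ** 2
    var_s = sigma_v ** 2 + n2
    # Shared frame: the center inherits the surround velocity plus an offset,
    # so the two measurements covary.
    ll_shared = bivariate_normal_logpdf(
        m_center, m_surround,
        var_x=sigma_v ** 2 + sigma_rel ** 2 + n2, var_y=var_s,
        cov=sigma_v ** 2)
    # Independent motion: no covariance between center and surround.
    ll_indep = bivariate_normal_logpdf(
        m_center, m_surround,
        var_x=sigma_v ** 2 + n2, var_y=var_s, cov=0.0)
    log_odds = (math.log(prior_shared) - math.log1p(-prior_shared)
                + ll_shared - ll_indep)
    # Numerically stable logistic function of the log posterior odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


if __name__ == "__main__":
    print(p_shared_frame(5.0, 5.0))    # similar center and surround motion
    print(p_shared_frame(10.0, -10.0))  # opposite center and surround motion
```

With the default parameters, nearly matched center and surround measurements yield a posterior above 0.5 for the shared frame, whereas opposite motions drive it close to zero. Equations of roughly this form, stated for the authors' actual model, are what we would like to see in the main text.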
Strengths:
Processing visual motion is important for normal visual function, including for the integration and segmentation of visual objects. This manuscript presents a normative theory, supported by recent human perceptual data, and extends it to make predictions about neural firing rate and variability in area MT. The theory is well motivated and supported by the simulation analysis and comparison to data. It provides new insight into how causal inference of relative motion reference frames can modulate neural activity in MT. The richness of the theory's predictions can guide future experiments. In particular, the theory explains both center-surround suppression and facilitation, unifying disparate empirical observations in MT for which no unified explanation had been proposed. The manuscript also demonstrates a new method to map ideal observer predictions (posterior distributions over speed and direction, which depend on the posterior inference over reference frames) onto predicted neural activity for center-surround stimuli, using only basic tuning curves measured in the center-alone condition. This is a useful methodological contribution. Finally, the manuscript offers a thorough review of center-surround modulation studies in MT.
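As an illustration of the kind of mapping we have in mind, and as a hedged sketch rather than a description of the authors' method, suppose a neuron's mean response is its center-alone tuning curve averaged over the ideal observer's posterior. Here the posterior is a hypothetical mixture of Gaussians over one-dimensional velocity (for instance, one component per reference-frame hypothesis), and the tuning curve is a hypothetical Gaussian, so the average has a closed form.

```python
import math


def expected_gaussian_tuning(mu, sd, pref, width):
    """E[exp(-(v - pref)^2 / (2 width^2))] for v ~ N(mu, sd^2), in closed form."""
    total_var = width ** 2 + sd ** 2
    return (width / math.sqrt(total_var)) * math.exp(
        -(mu - pref) ** 2 / (2.0 * total_var))


def predicted_rate(posterior, pref, width, baseline, peak):
    """Mean rate obtained by averaging a tuning curve over a posterior.

    posterior: list of (weight, mean, sd) Gaussian components over velocity;
    the weights are assumed to sum to 1.
    Hypothetical center-alone tuning curve:
        f(v) = baseline + peak * exp(-(v - pref)^2 / (2 width^2))
    """
    return baseline + peak * sum(
        w * expected_gaussian_tuning(m, s, pref, width)
        for w, m, s in posterior)


if __name__ == "__main__":
    # Hypothetical posterior: 70% mass near the preferred velocity, 30% far away.
    post = [(0.7, 8.0, 1.0), (0.3, 2.0, 1.0)]
    print(predicted_rate(post, pref=8.0, width=2.0, baseline=5.0, peak=40.0))
```

Under this assumption, shifting posterior mass away from the preferred velocity, or broadening the posterior, lowers the predicted rate relative to a sharp posterior at the preferred velocity. If the manuscript's mapping differs (for example, sampling-based rather than expectation-based), stating it in an equation of this kind would make the comparison with data easier to follow.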
Weaknesses:
We found this paper difficult to read for two reasons. First, the mathematics is generally described only in words. This made it extremely difficult (impossible for some reviewers) to understand the details of the model, which are important. We're not against words, but it's critical that they be accompanied by equations.
Second, the manuscript is not self-contained in the sense that many of the motivations, assumptions, and limitations of the approach are only evident if one carefully reads the group's prior work, Shivkumar et al. (2025). Following up on previous work isn't necessarily a flaw, but the introduction of the paper is written from a very broad perspective that does not effectively summarize the prior work and lay out the specific questions that motivate the current study. For example, it is not clear from the introduction whether the authors believe this framework can explain all sorts of center-surround interactions (including in non-motion stimuli and in other areas like the retina), or if the focus is only on area MT.
Finally, the connection to neural data is confusing and mostly qualitative. The authors create a library of "hypothetical but plausible tuning curves" and show that their modeling framework is flexible enough to capture a variety of center-surround interactions. Although they do state that their model can't explain all possible tuning curves, it's still hard to tell whether they have particularly strong evidence for the Bayesian causal inference hypothesis.
We also have several technical, but potentially important, comments.
Line 427: 'Our framework not only reinterprets past findings but also generates new, testable predictions. The model makes directly testable predictions for surround modulation. Facilitation, for instance, is predicted for neurons encoding retinal-centric motion ($v_{\text{center}}$) under high sensory uncertainty. In contrast, suppression is the hallmark of neurons encoding relative motion ($v_{\text{center}}^{\text{relative}}$) with respect to a surround-influenced reference frame.' It seems that to test the predictions of the model, one would need to first determine whether a neuron encodes retinal or relative motion, without relying on the patterns predicted by this model, and then test whether the two types of neurons behave as predicted. It is unclear how one can obtain this labeling of neurons independently of the model predictions.
Line 492: 'This offers a principled account of how the same population of neurons can support both perceptual states (integration and segmentation)'. However, because the theory assumes each neuron encodes either center velocity or center velocity relative to a moving reference frame, but not both, it does not explain how the same neuron could shift from suppression to facilitation. It may be worth considering another possibility, using V1 surround modulation as an analogy. Different neuron types are required to implement the surround computation: in mouse V1, SST interneurons are surround-facilitated, and they are necessary to implement surround suppression of pyramidal neurons https://pmc.ncbi.nlm.nih.gov/articles/PMC3621107, but their (SST) outputs are not communicated to downstream targets. In that view, facilitation is not a signature of some neurons encoding a particular latent variable; it is only an intermediate step in the computation of the other latents (those that require suppression).
Misspecification of either the prior or the likelihood can bias the conclusions of Bayesian inference. Discussion of this point, and in particular evidence that both are well specified (for the prior, say, from analysis of natural scene statistics), would strengthen the manuscript.