Neural encoding of multiple motion speeds in visual cortical area MT

  1. Department of Neuroscience, University of Wisconsin-Madison, Wisconsin 53705, USA

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Read more about eLife’s peer review process.

Editors

  • Reviewing Editor
    Jennifer Groh
    Duke University, Durham, NC, United States of America
  • Senior Editor
    Joshua Gold
    University of Pennsylvania, Philadelphia, United States of America

Reviewer #1 (Public Review):

Summary:

Most studies in sensory neuroscience investigate how individual sensory stimuli are represented in the brain (e.g., the motion or color of a single object). This study starts tackling the more difficult question of how the brain represents multiple stimuli simultaneously and how these representations help to segregate objects from cluttered scenes with overlapping objects.

Strengths

The authors first document the ability of humans to segregate two motion patterns based on differences in speed. Then they show that a monkey's performance is largely similar; thus establishing the monkey as a good model to study the underlying neural representations.

Careful quantification of the neural responses in the middle temporal area during the simultaneous presentation of fast and slow speeds leads to the surprising finding that, at low average speeds, many neurons respond as if the slowest speed is not present, while they show averaged responses at high speeds. This unexpected complexity of the integration of multiple stimuli is key to the model developed in this paper.

One experiment in which attention is drawn away from the receptive field supports the claim that this is not due to the involuntary capture of attention by fast speeds.

A classifier using the neuronal response and trained to distinguish single-speed from bi-speed stimuli shows a similar overall performance and dependence on the mean speed as the monkey. This supports the claim that these neurons may indeed underlie the animal's decision process.

The authors expand the well-established divisive normalization model to capture the responses to bi-speed stimuli. The incremental modeling (eq 9 and 10) clarifies which aspects of the tuning curves are captured by the parameters.

Weaknesses

While the comparison of the overall pattern of behavioral performance between monkeys and humans is important, some of the detailed comparisons are not well supported by the data. For instance, whether the monkey used the apparent coherence simply wasn't tested and a difference between 4 human subjects and a single monkey subject cannot be tested statistically in a meaningful manner. I recommend removing these observations from the manuscript and leaving it at "The difference between the monkey and human results may be due to species differences or individual variability" (and potentially add that there are differences in the task as well; the monkey received feedback on the correctness of their choice, while the humans did not.)

A control experiment aims to show that the "fastest speed takes all" behavior is general by presenting two stimuli that move at fast/slow speeds in orthogonal directions. The claim that these responses also show the "fastest speed takes all" is not well supported by the data. In fact, for directions in which the slow speed leads to the largest response on its own, the population response to the bi-speed stimulus is the average of the response to the components. Only for the directions where the fast speed stimulus is the preferred direction is there a bias towards the faster speed (Figure 7A). The quantification of this effect in Figure 7B seems to suggest otherwise, but I suspect that this is driven by the larger amplitude of Rf in Figure 8, and the constraint that ws and wf are constant across directions. The interpretation of this experiment needs to be reconsidered.

Reviewer #2 (Public Review):

Summary:

This is a paper about the segmentation of visual stimuli based on speed cues. The experimental stimuli are random dot fields in which each dot moves at one of two velocities. By varying the difference between the two speeds, as well as the mean of the two speeds, the authors estimate the capacity of observers (human and non-human primates) to segment overlapping motion stimuli. Consistent with previous work, perceptual segmentation ability depends on the mean of the two speeds. Recordings from area MT in monkeys show that the neuronal population to compound stimuli often shows a bias towards the faster-speed stimuli. This bias can be accounted for with a computational model that modulates single-neuron firing rates by the speed preferences of the population. The authors also test the capacity of a linear classifier to produce the psychophysical results from the MT data.

Strengths:

Overall, this is a thorough treatment of the question of visual segmentation with speed cues. Previous work has mostly focused on other kinds of cues (direction, disparity, color), so the neurophysiological results are novel. The connection between MT activity and perceptual segmentation is potentially interesting, particularly as it relates to existing hypotheses about population coding.

Weaknesses:

Page 10: The relationship between (R-Rs) and (Rf-Rs) is described as "remarkably linear". I don't actually find this surprising, as the same term (Rs) appears on both the x- and y-axes. The R^2 values are a bit misleading for this reason.

Figure 9: I'm confused about the linear classifier section of the paper. The idea makes sense - the goal is to relate the neuronal recordings to the psychophysical data. However the results generally provide a poor quantitative match to the psychophysical data. There is mention of a "different paper" (page 26) involving a separate decoding study, as well as a preprint by Huang et al. (2023) that has better decoding results. But the Huang et al. preprint appears to be identical to the current manuscript, in that neither has a Figure 12, 13, or 14. The text also says (page 26) that the current paper is not really a decoding study, but the linear classifier (Figure 9F) is a decoder, as noted on page 10. It sounds like something got mixed up in the production of two or more papers from the same dataset. In any case, I think that some kind of decoding analysis would really strengthen the current paper by linking the physiology to the psychophysics, but given the limitations of the linear classifier, a more sophisticated approach might be necessary -- see for example Zemel, Dayan, and Pouget, 1998. The authors might also want to check out closely related work by Treue et al. (Nature Neuroscience 2000) and Watamaniuk and Duchon (1992).

What do we learn from the normalization model? Its formulation is mostly a restatement of the results - that the faster and slower speeds differentially affect the combined response. This hypothesis is stated quantitatively in equation 8, which seems to provide a perfectly adequate account of the data. The normalization model in equation 10 is effectively the same hypothesis, with the mean population response interposed - it's not clear how much the actual tuning curve in Figure 10A even matters, since the main effect of the model is to flatten it out by averaging the functions in Figure 10B. Although the fit to the data is reasonable, the model uses 4 parameters to fit 5 data points and is likely underconstrained; the parameters other than alpha should at least be reported, as it would seem that sigma is actually the most important one. And I think it would help to examine how robust the statistical results are to different assumptions about the normalization pool.

Reviewer #3 (Public Review):

Summary:

This study concerns how macaque visual cortical area MT represents stimuli composed of more than one speed of motion.

Strengths:

The study is valuable because little is known about how the visual pathway segments and preserves information about multiple stimuli. The study presents compelling evidence that (on average) MT neurons represent the average of the two speeds, with a bias that accentuates the faster of the two speeds. An additional strength of the study is the inclusion of perceptual reports from both humans and one monkey participant performing a task in which they judged whether the stimuli involved one vs two different speeds. Ultimately, this study raises intriguing questions about how exactly the response patterns in visual cortical area MT might preserve information about each speed, since such information could potentially be lost in an average response as described here, depending on assumptions about how MT activity is evaluated by other visual areas.

Weaknesses:

My main concern is that the authors are missing an opportunity to make clear that the divisive normalization, while commonly used to describe neural response patterns in visual areas (and which fits the data here), fails on the theoretical front as an explanation for how information about multiple stimuli can be preserved. Thus, there is a bit of a disconnect between the goal of the paper - how does MT represent multiple stimuli? - and the results: mostly averaging responses which, while consistent with divisive normalization, would seem to correspond to the perception of a single intermediate speed. This is in contrast to the psychophysical results which show that subjects can at least distinguish one from two speeds. The paper would be strengthened by grappling with this conundrum in a head-on manner.

  1. Howard Hughes Medical Institute
  2. Wellcome Trust
  3. Max-Planck-Gesellschaft
  4. Knut and Alice Wallenberg Foundation