Introduction

Neuroscientists have been investigating how neurons in the brain represent sensory information for decades. Previous studies were often concerned with the neural coding of a single visual stimulus. However, natural environments are abundant with multiple entities that often co-occupy visual neurons’ receptive fields (RFs). Segmenting visual objects from each other and their background is a fundamental function of vision (Braddick, 1993), yet the neural mechanisms underlying the representation of multiple stimuli remain poorly understood. As the field moves toward understanding visual processing under more naturalistic conditions, it becomes increasingly important to uncover the principles by which the brain encodes multiple visual stimuli. Visual motion is a particularly salient cue for scene segmentation. Elements that share common motion are typically grouped into a single perceptual object, while entities moving at different velocities can often be segregated from each other. For instance, an object moving at a speed distinct from its background is more readily segmented. In this study, we investigated how the primate visual system represents multiple motion speeds.

The extrastriate middle-temporal cortex (area MT) is important for motion processing and motion-based segmentation (Allman et al., 1985; Britten, 2003; Born and Bradley, 2005; Pasternak et al., 2020; Born et al., 2000; Huang et al., 2007, 2008). Segmentation of overlapping stimuli moving in different directions and at different speeds gives rise to the perception of transparent motion (Braddick, 1997; Braddick et al., 2002; Mestre et al., 2001; Masson et al., 1999). Previous studies have investigated how neurons in area MT represent two motion directions of transparently moving stimuli (Snowden et al., 1991; Qian and Andersen, 1994; McDonald et al., 2014; Xiao et al., 2014; Xiao and Huang, 2015; Wiesner et al., 2020; Stoner and Albright, 1992; Krekelberg and van Wezel, 2013). Although how cortical neurons represent the speed of a single stimulus has been well studied (Maunsell and van Essen, 1983; Lisberger and Movshon, 1999; Nover et al., 2005; Pack et al., 2005; Krekelberg et al., 2006a; Perrone and Thiele, 2001; Priebe et al., 2003, 2006; Liu and Newsome, 2003), how neurons represent multiple speeds of transparently moving stimuli is largely unknown.

In characterizing how MT neurons represent multiple directions of transparently moving stimuli, we have previously shown that many neurons do not pool two directions equally, but weigh one direction more than the other (Xiao and Huang, 2015). We have also found that some MT neurons show response nonlinearity in pooling two motion directions in a way that better represents the individual direction components. The heterogeneous response weights and response nonlinearity in representing multiple directions can benefit the neural coding of multiple stimuli (Orhan and Ma, 2015; Xiao and Huang, 2015), and may constitute an optimal population representation of visual motion with multiple directions (Huang et al., 2017). Unlike two motion directions for which the individual directions appear to be balanced in perceptual quality and salience, visual stimuli moving at two speeds appear to be asymmetrical – one slower and one faster. The goal of this study is to determine the neural coding principle for multiple speeds of overlapping stimuli.

Visual information is encoded in the brain by populations of neurons, and Bayesian inference provides a robust framework for understanding the population neural code (Pouget et al., 2000; Averbeck et al., 2006; Ma et al., 2006; Fisher et al., 2010). Additionally, the visual system may be optimized to represent information in natural environments and to enhance performance in key behavioral tasks (Barlow, 1961; Atick and Redlich, 1992; Simoncelli and Olshausen, 2001; Ganguli and Simoncelli, 2014; Manning et al., 2024). Within this framework, we consider several scenarios for how MT neurons might encode two motion speeds within their RFs. 1) Response averaging: MT neurons may average the responses elicited by individual speed components, a phenomenon often observed in neural responses to multiple stimuli (e.g., Recanzone et al., 1997; Zoccolan et al., 2005). When the separation between the two speeds is smaller than the neuron’s tuning width, the population response to two speeds would appear unimodal, peaking at an intermediate speed. While decoding two stimuli from a unimodal response is theoretically possible (Zemel et al., 1998; Treue et al., 2000), response averaging may result in poorer segmentation compared to encoding schemes that emphasize individual components, as demonstrated in neural coding of overlapping motion directions (Xiao and Huang, 2015). 2) Bias toward the stronger component response: A neuron may favor the speed component that elicits a stronger response, following a soft-max operation (Riesenhuber and Poggio, 1999). This scheme allows neurons to preferentially encode stimuli closer to their preferred speed and maintain a population code representing both components. 3) Bias toward the slower speed component: Given that slower speeds are more prevalent in natural environments (Weiss et al., 2002; Stocker and Simoncelli, 2006; Zhang and Stocker, 2022), MT neurons may favor slower components. Such encoding would align with the prior probability of natural speed distributions, optimizing for more frequent stimuli. 4) Bias toward the faster speed component: Neurons may prioritize faster-moving components. This scheme would allow better segmentation of a faster-moving stimulus from a slower background and facilitate the critical perceptual task of figure-ground segregation. Finally, we explored whether these encoding rules depend on the stimulus speeds and the speed preferences of individual neurons.

Regarding neural decoding, previous studies successfully extracted single stimulus speeds from neuronal populations in area MT using decoders such as vector-averaging and maximum likelihood estimators (Lisberger and Movshon, 1999; Churchland and Lisberger, 2001; Priebe and Lisberger, 2004; Huang and Lisberger, 2009; Yang and Lisberger, 2009; Krekelberg et al., 2006a, b; Krekelberg and van Wezel, 2013). However, it is unclear whether simultaneously presented multiple speeds can be extracted from population neural responses, which would be difficult for decoders that only read out a single value. Zemel and colleagues developed a decoding framework that recovers the probabilistic distribution of a stimulus feature (Zemel et al., 1998; Pouget et al., 2003). Decoders of this type remain to be tested with neurophysiological and perceptual data.

We first characterized the perception of overlapping stimuli that moved simultaneously at two speeds. Our results showed that human and monkey subjects can segment overlapping stimuli based only on speed cues. The performance was better when the separation between two stimulus speeds was larger, and the ability of speed segmentation was reduced when stimulus speeds were fast. Next, we recorded neuronal responses from area MT of male macaque monkeys. We made a novel finding that MT neurons showed a strong faster-speed bias when stimulus speeds were slow, and as stimulus speeds increased, the faster-speed bias gradually shifted to response averaging. We also showed that a classifier could differentiate a two-speed stimulus from a single-speed stimulus based on MT responses in a way generally consistent with perception. We proposed a model in which each speed component was weighted by the responses of a population of neurons with a broad range of speed preferences elicited by that speed component. We also found that information about multiple speeds was carried in the population neural response in MT, and it was possible to extract either a single speed or multiple speeds from area MT in a way largely consistent with perception, with limitations when two stimulus speeds were less separated from each other. This study helps to fill a gap in understanding the neural coding principle of multiple motion speeds and provides new insight into the mechanism underlying the neural representation of multiple visual stimuli.

Results

Perception of overlapping stimuli moving at different speeds

Human psychophysics

To establish the perceptual basis for our study, we first characterized how human subjects perceived overlapping stimuli moving at different speeds. The visual stimuli in our psychophysics experiments were similar to those in our neurophysiology experiments. We asked how perceptual segmentation was affected by the separation between the two stimulus speeds and by the change in mean stimulus speed from slow to fast.

The visual stimuli were two overlapping random-dot patches presented within a stationary square aperture 10° wide and centered at 11° eccentricity. The random dots translated within the aperture in the same direction at two different speeds. It has been suggested that the neural representation of speed in the visual cortex is encoded on a logarithmic scale (Maunsell and van Essen, 1983; Lisberger and Movshon, 1999; Nover et al., 2005), so we used a fixed ratio between two speeds, which gave rise to a fixed speed difference in the logarithmic scale. One set of stimuli had a “large speed separation”, and the speed of the faster component was four times (x4) that of the slower component. The five speed pairs used were 1.25 and 5°/s, 2.5 and 10°/s, 5 and 20°/s, 10 and 40°/s, and 20 and 80°/s (Fig. 1B1). Another set of stimuli had a “small speed separation”, and the speed ratio was two (x2). The five speed pairs were 1.25 and 2.5°/s, 2.5 and 5°/s, 5 and 10°/s, 10 and 20°/s, and 20 and 40°/s (Fig. 1B2). Experimental trials of bi-speed stimuli that had large and small speed separations were randomly interleaved.
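The stimulus design can be made concrete with a short sketch. It builds the five speed pairs for each separation from the slower-component speeds listed above and computes the log mean speed of each pair, which equals the geometric mean of the two component speeds. The function and variable names are illustrative and are not taken from the study's code.

```python
import math

def speed_pairs(slow_speeds, ratio):
    """Return (slow, fast, log_mean) tuples for a fixed fast/slow speed ratio."""
    pairs = []
    for slow in slow_speeds:
        fast = slow * ratio
        # The mean on a logarithmic scale is the geometric mean of the two speeds.
        log_mean = math.exp((math.log(slow) + math.log(fast)) / 2)
        pairs.append((slow, fast, log_mean))
    return pairs

slow_speeds = [1.25, 2.5, 5, 10, 20]     # slower-component speeds in deg/s, from the text
large_sep = speed_pairs(slow_speeds, 4)  # x4 speed separation
small_sep = speed_pairs(slow_speeds, 2)  # x2 speed separation

for slow, fast, log_mean in large_sep + small_sep:
    print(f"{slow:5.2f} and {fast:5.2f} deg/s -> log mean {log_mean:6.2f} deg/s")
```

Because the ratio is fixed within each set, the difference between the two component speeds is constant on a logarithmic axis (two octaves at x4, one octave at x2), while the log mean speed doubles from one pair to the next.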

Psychophysical tasks and performance of human subjects.

A. Illustration of the 2AFC and 3AFC tasks. B. Motion speeds of visual stimuli. The speeds of two stimulus components were plotted versus the log mean speed of each bi-speed stimulus. C. Discriminability of four human subjects performing a standard 2AFC task. Letters are coded symbols for individual subjects. D. In the 3AFC task, the percentage of trials that human subjects reported “no two-speeds”. E. Discriminability of the same subjects performing the 3AFC task. B1-E1. X4 speed separation. B2-E2. X2 speed separation. Each color represents data from one subject. The solid line shows the subject-averaged result. Error bars and error bands represent ±STE.

Human subjects first performed a standard two-alternative-forced-choice (2AFC) task to discriminate a bi-speed stimulus from the corresponding single-speed stimulus that moved at the log mean speed of the two component speeds. In each trial, the bi-speed and single-speed stimuli were presented in two consecutive time intervals in a random and balanced order (Fig. 1A). At large (x4) speed separation, all four subjects could perform the task well when the component speeds were less than 20 and 80°/s (Fig. 1C1). At 20 and 80°/s, the discrimination performance was poor (mean d’ = 0.74, standard error STE = 0.5), indicating that subjects could not segment the speed components. At the small (x2) speed separation, the discriminability was worse than at the x4 separation. When the component speeds were less than 20 and 40°/s, subjects on average could differentiate the bi-speed stimulus from the single-speed stimulus (d’ > 1.5), but not when speeds were at 20 and 40°/s (mean d’ = 0.17, STE = 0.1) (Fig. 1C2).

In the standard 2AFC task, it is possible that subjects could not segment the bi-speed stimulus into two separate speeds, but were still able to differentiate the bi-speed from single-speed stimuli based on their appearances (e.g., the distribution of the random dots of the bi-speed stimulus may appear less uniform). Because our goal was to measure discriminability based on perceptual segmentation, we designed a novel 3AFC task to address this concern. In the modified task, subjects still discriminated the bi-speed stimulus from the corresponding single-speed stimulus but had the option to make a third choice on trials when they thought neither stimulus interval appeared to contain two speeds (“no two-speeds” choice) (Fig. 1A). Panels D1 and D2 show the percentage of trials in which subjects made the no two-speeds choice (NTC). At x4 speed separation, the percentage of NTC was low at most speed pairs. The exception was the highest speed pair of 20 and 80°/s, at which subjects reported that they could not see two speeds in most trials (Fig. 1D1). At x2 speed separation, the percentage of NTC showed a U-shape as a function of the stimulus speed, and was near 100% at 20 and 40°/s (Fig. 1D2). These results confirmed that human subjects had difficulty segmenting two speeds when stimulus speeds were high. In addition, at low stimulus speeds with a small (x2) speed separation, subjects tended to perceive only one speed (Fig. 1D2). We incorporated the NTC into the d’ calculation by evenly splitting the NTC trials into “hit” trials and “false alarm” trials (see Methods). In this way, the NTC trials were accounted for by d’, in the sense that they did not contribute to successful discrimination.
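One possible reading of the NTC correction is sketched below; the exact procedure is given in the Methods and may differ. The sketch assumes that a trial with the bi-speed stimulus in the first interval is a "signal" trial, that choosing the first interval counts as a hit on signal trials and as a false alarm otherwise, and that each NTC trial contributes half a hit or half a false alarm. The clipping of extreme rates and the absence of any 2AFC scaling factor are also assumptions.

```python
from statistics import NormalDist

def dprime_with_ntc(n_hit, n_miss, n_ntc_signal, n_fa, n_cr, n_ntc_noise):
    """d' in which NTC trials are split evenly (an assumed reading of the text).

    Signal trials: bi-speed stimulus in interval 1 (hit = chose interval 1).
    Noise trials: bi-speed stimulus in interval 2 (false alarm = chose interval 1).
    """
    n_signal = n_hit + n_miss + n_ntc_signal
    n_noise = n_fa + n_cr + n_ntc_noise
    # Each NTC trial counts as half a hit or half a false alarm.
    hit_rate = (n_hit + 0.5 * n_ntc_signal) / n_signal
    fa_rate = (n_fa + 0.5 * n_ntc_noise) / n_noise
    # Keep rates away from 0 and 1 so the z-transform stays finite (assumed correction).
    hit_rate = min(max(hit_rate, 0.5 / n_signal), 1 - 0.5 / n_signal)
    fa_rate = min(max(fa_rate, 0.5 / n_noise), 1 - 0.5 / n_noise)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical trial counts, for illustration only.
d_no_ntc = dprime_with_ntc(45, 5, 0, 5, 45, 0)
d_some_ntc = dprime_with_ntc(25, 5, 20, 5, 25, 20)
d_all_ntc = dprime_with_ntc(0, 0, 50, 0, 0, 50)
```

Under these assumptions, a block in which every trial is an NTC trial gives equal hit and false-alarm rates and therefore a d’ of zero, which matches the statement that NTC trials do not contribute to successful discrimination.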

The d’ values from the 3AFC task were similar to those of the 2AFC task, with a slight reduction of d’ across conditions as the NTC trials reduced discrimination performance (Fig. 1E1 vs. 1C1, 1E2 vs. 1C2). The small performance difference between the 2AFC and 3AFC tasks suggests that human subjects generally relied on speed segmentation to perform the 2AFC task. Based on the results from the 3AFC task, we performed a two-way ANOVA, in which the two factors were the mean speed of the stimulus components and the speed separation (x4 or x2). We found that both factors had significant effects. d’ changed significantly with the mean stimulus speed (F(4,30) = 26.8, p = 1.60×10⁻⁹) and the d’ at x4 separation differed significantly from that at x2 separation (F(1,30) = 84.1, p = 3.29×10⁻¹⁰). d’ was higher at x4 than at x2 speed separation except at the fastest speeds of 20 and 80°/s vs. 20 and 40°/s (Fig. 1E1 vs. 1E2). Our results also showed that segmentation was significantly worse at fast speeds: d’ dropped significantly as the stimulus speeds increased from 10 and 40°/s to 20 and 80°/s for x4 separation (one-way ANOVA, F(1,6) = 38.6, p = 8.1×10⁻⁴) (Fig. 1E1), and from 10 and 20°/s to 20 and 40°/s for x2 separation (one-way ANOVA, F(1,6) = 32.7, p = 1.24×10⁻³) (Fig. 1E2).

Monkey psychophysics

We next measured the monkey’s ability to segment overlapping stimuli moving at two speeds. We trained one male macaque monkey to perform a 2AFC task to report whether a stimulus contained one or two speeds (Fig. 2A, see Methods). The monkey’s performance at x2 speed separation (Fig. 2B2) was very similar in shape to that of humans (Fig. 1C2 of the 2AFC task). In addition, the monkey’s performance was generally better at x4 separation than at x2 separation (Fig. 2B1 vs 2B2).

Monkey psychophysics.

A. Behavioral task and visual stimuli. B. Discriminability of a monkey subject performing a 2AFC task. B1. X4 speed separation. B2. X2 speed separation. Error bars and error bands represent ±STE.

At x4 separation, the performance improved as the stimulus speeds increased from 1.25 and 5°/s to 5 and 20°/s. As the stimulus speeds increased from 5 and 20°/s to 20 and 80°/s, the performance declined (Fig. 2B1), similar to the human results (Fig. 1C1). However, the monkey was still able to differentiate the bi-speed and single-speed stimuli at the fastest speeds of 20 and 80°/s (Fig. 2B1), whereas the average human performance was poor (Fig. 1C1). Note that one human subject (NP) performed better than other subjects at 20 and 80°/s (mean d’ = 2.12, STE = 0.12) (Fig. 1C1). The difference between the monkey and human results may be due to species differences or individual variability. The differences in behavioral tasks may also play a role – the monkey received feedback on the correctness of the choice, whereas human subjects did not.

Another notable difference between the monkey and human results was that, at low stimulus speeds of 1.25 and 5°/s, human subjects could differentiate the bi-speed stimulus from the corresponding single-speed (2.5°/s) stimulus nearly perfectly. In comparison, the ability of the monkey subject to segment 1.25 and 5°/s was lower (d’ = 2.8, STE = 0.51), although still good (Fig. 2B1 vs. 1C1). This may be explained by how the monkey performed the task. For human subjects, while the motion of the faster component (5°/s) of the bi-speed stimulus appeared to be salient, it required effort to notice that the very slow component (1.25°/s) was moving rather than stationary. In some trials, the monkey may have been able to segment the 5°/s component from the bi-speed stimulus but considered the slower component of 1.25°/s as stationary and, therefore, reported that the stimulus contained only one speed. Despite some differences between the human and monkey results, the two general trends (better segmentation performance at larger than at smaller speed separation, and reduced segmentation ability at very fast speeds) were consistent across species.

Neuronal responses in MT elicited by bi-speed stimuli and single-speed components

To characterize how neurons in the visual cortex encode two overlapping stimuli moving at different speeds, we recorded extracellularly from 100 isolated neurons in area MT of two male macaque monkeys (60 neurons from IM and 40 neurons from MO) while the monkeys performed a fixation task. Figure 3 shows the responses from four example neurons. To visualize the relationship between the responses to the bi-speed stimulus (red) and the constituent speed components, the plots of the response tuning curves to the slower (green) and faster (blue) components are shifted horizontally so that the responses elicited by the bi-speed stimulus and its constituent single-speed components are aligned along a vertical line as illustrated in Figure 3A1.

Speed tuning curves of four example neurons to bi-speed stimuli and constituent single-speed components.

A. Illustration of the visual stimuli and the response tuning curves of an example neuron. Green and blue dots in the diagram indicate two overlapping achromatic random-dot patterns moving in the same direction at different speeds. Colors are used for illustration purposes only. The abscissas in green and blue show the speeds of the slower and faster components, respectively. The abscissa in black shows the log mean speed of the two speed components. A-D. Four example neurons are sorted by their preferred speeds (PS) from slow to fast. Error bars represent ±STE. For some data points, error bars were comparable to the symbol size. A1-D1. X4 speed separation. A2-D2. X2 speed separation.

We found that the relationship between the responses elicited by the bi-speed stimulus and the constituent components depended on the stimulus speeds. Figure 3A1-D1 shows the responses of four MT neurons when the speed separation was large (×4). The component speeds were the same as the bi-speed stimuli used in the psychophysics experiments. When the two component speeds were slow (1.25 and 5°/s), the response to the bi-speed stimulus nearly followed the response elicited by the faster-speed component (the leftmost data points in Fig. 3A1-D1). Importantly, the response elicited by the bi-speed stimuli did not simply follow the stronger component response. When the preferred speed of a neuron was sufficiently low such that the response elicited by the faster component was weaker than that elicited by the slower component, the response to the bi-speed stimulus still followed the weaker response elicited by the faster component (Fig. 3A1). When the speeds of the two stimulus components were at 2.5 and 10°/s, the response elicited by the bi-speed stimulus was also biased toward the faster component, albeit to a lesser degree. As the mean speed of the two stimulus components increased, the bi-speed response became closer to the average of the two component responses (Fig. 3A1-D1). We found similar results when the speed separation between the two stimulus components was small (×2) (Fig. 3A2-D2).

We found the same trend in the neural responses averaged across 100 neurons (Fig. 4A). At ×4 speed separation, the population-averaged response showed a strong bias toward the faster component when the stimulus speeds were low and shifted toward the average of the component responses as the speeds increased (Fig. 4A1). To determine whether this trend held for neurons with different preferred speeds, we divided the neuron population into three groups with “low” (<2.5°/s), “intermediate” (between 2.5 and 25°/s), and “high” (>25°/s) preferred speeds. For 10 neurons that preferred low speeds, the response to the faster component was weaker than that to the slower component. However, the response to the bi-speed stimuli was strongly biased toward the faster component when the stimulus speeds were low (Fig. 4B1). This finding suggests that the bi-speed response is not biased toward the stimulus component that the neuron prefers when presented alone but biased toward the faster speed component.

Population-averaged speed tuning curves to bi-speed stimuli and constituent single-speed components.

Speed tuning curves averaged across A. 100 neurons in our dataset. B. 10 neurons that had PS lower than 2.5°/s. C. 61 neurons that had PS between 2.5 and 25°/s. D. 29 neurons that had PS greater than 25°/s. Error bars represent ±STE. For some data points, error bars were comparable to the symbol size. A1-D1. X4 speed separation. A2-D2. X2 speed separation.

For 61 neurons that preferred intermediate speeds (Fig. 4C1) and 29 neurons that preferred high speeds (Fig. 4D1), we also found a strong bias toward the faster speed component when the stimulus speeds were low, and a gradual change toward the average of the component responses as the stimulus speeds increased. At the lowest stimulus speeds of 1.25 and 5°/s, the bi-speed response was nearly identical to that elicited by the faster component, showing “faster-component-take-all”. For neurons that preferred high speeds, faster-component-take-all was also found for the stimulus speeds of 2.5 and 10°/s (Fig. 4D1). At the fastest speeds of 20 and 80°/s, the response to the bi-speed stimuli showed a slight bias toward the slower component. We found similar results at x2 speed separation (Fig. 4A2-D2), although the effect was not as pronounced as at x4 speed separation.

Relationship between the responses to bi-speed stimuli and constituent stimulus components

We aimed to quantify the relationship between the response elicited by the bi-speed stimuli and the corresponding component responses. We first assumed that the response R of a neuron elicited by two component speeds can be described as a weighted sum of the component responses Rs and Rf elicited by the slower (vs) and faster (vf) component speeds, respectively (Eq. 5):

$$R = w_s R_s + w_f R_f \qquad (5)$$

in which ws and wf are the response weights for the slower and faster speed components vs and vf, respectively.

Our goal was to estimate the weights for each speed pair and determine whether the weights change with the stimulus speeds. In our main data set, the two speed components moved in the same direction. To determine the weights ws and wf for each neuron at each speed pair, we have three data points R, Rs, and Rf, which are trial-averaged responses. Since it is not possible to solve for both variables, ws and wf, from a single equation (Eq. 5) with three data values, we introduced an additional constraint: ws + wf = 1. While this constraint may not yield the exact weights that would be obtained with a fully determined system, it nevertheless allows us to characterize how the relative weights vary with stimulus speed. With this constraint, as long as Rf ≠ Rs, R can be expressed as (Eq. 6):

$$R = \frac{R_f - R}{R_f - R_s} R_s + \frac{R - R_s}{R_f - R_s} R_f \qquad (6)$$

The response weights are $w_s = \frac{R_f - R}{R_f - R_s}$ and $w_f = \frac{R - R_s}{R_f - R_s}$. Intuitively, if R were closer to one component response, that stimulus component would have a higher weight. Note that Equation 6 is not intended for fitting the response R using Rs and Rf, but rather to use the relationship among R, Rs, and Rf to determine the weights for the faster and slower components.
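As a worked illustration, substituting ws = 1 - wf into the weighted sum gives R - Rs = wf (Rf - Rs), so that wf = (R - Rs)/(Rf - Rs) and ws = (Rf - R)/(Rf - Rs). The minimal sketch below applies this to made-up firing rates; the function name and the numbers are hypothetical and are not taken from the recordings.

```python
def response_weights(r_bi, r_slow, r_fast):
    """Solve R = ws*Rs + wf*Rf under the constraint ws + wf = 1."""
    denom = r_fast - r_slow
    if denom == 0:
        raise ValueError("weights are undefined when Rf equals Rs")
    w_fast = (r_bi - r_slow) / denom
    w_slow = 1.0 - w_fast
    return w_slow, w_fast

# Hypothetical trial-averaged firing rates in spikes/s.
# Case 1: the faster component alone drives the neuron more strongly.
ws1, wf1 = response_weights(r_bi=38.0, r_slow=20.0, r_fast=40.0)
# Case 2: the faster component alone drives the neuron less strongly,
# yet the bi-speed response still sits close to the faster-component response.
ws2, wf2 = response_weights(r_bi=22.0, r_slow=40.0, r_fast=20.0)

print(ws1, wf1)
print(ws2, wf2)
```

In both cases the bi-speed response lies 90% of the way from the slower-component response to the faster-component response, so the weight for the faster component is 0.9 regardless of which component response is larger.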

Using this approach to estimate response weights for individual neurons can be unreliable, particularly when Rf and Rs are similar. This situation often arises when the two speeds fall on opposite sides of the neuron’s preferred speed, resulting in a small denominator (Rf - Rs) and consequently an artificially inflated weight estimate. We, therefore, used the neuronal responses across the population to determine the response weights (Fig. 5). For each pair of stimulus speeds, we plotted (R - Rs) in the ordinate versus (Rf - Rs) in the abscissa. Figure 5A1-E1 shows the results obtained at ×4 speed separation. Across the neuronal population, the relationship between (R - Rs) and (Rf - Rs) is well described by a linear equation with slope k and intercept b, (R - Rs) = k (Rf - Rs) + b (Eq. 7) (R² ranged from 0.94 to 0.76, see Table 1). This linearity suggests that the response weights for each speed pair are consistent across the neuronal population.

Response weight for faster component based on linear regression (N = 100)

Relationship between the responses to the bi-speed stimuli and the constituent stimulus components.

A-E. Each panel shows the responses from 100 neurons. Each dot represents the response from one neuron. The ordinate shows the difference between the responses to a bi-speed stimulus and the slower component (R - Rs). The abscissa shows the difference between the responses to the faster and slower components (Rf - Rs). The regression line is shown in red. F. Response weights for the faster stimulus component obtained from the slope of the linear regression based on the recorded responses of 100 neurons (black symbols), and based on simulated responses to the bi-speed stimuli (gray symbols). Error bars represent 95% confidence intervals. A1-F1. X4 speed separation. A2-F2. X2 speed separation.

Because all the regression lines in Figure 5 nearly go through the origin (i.e. intercept b ≈ 0, Table 1), the slope k obtained from the linear regression approximates (R - Rs)/(Rf - Rs), which is the response weight wf for the faster component (Eq. 6). Hence, for each pair of stimulus speeds, we can estimate the response weight for the faster component using the slope of the linear regression of the responses from the neuronal population.
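The population-level estimate can be sketched as an ordinary least-squares fit of (R - Rs) against (Rf - Rs) across neurons, with the slope read out as the weight for the faster component. The example below uses synthetic responses generated with a known weight of 0.8; the response ranges, the noise level, and the function name are assumptions for illustration, not the recorded data or the authors' analysis code.

```python
import numpy as np

def faster_weight_from_population(r_bi, r_slow, r_fast):
    """Regress (R - Rs) on (Rf - Rs) across neurons; return slope k, intercept b, R^2."""
    x = np.asarray(r_fast, dtype=float) - np.asarray(r_slow, dtype=float)
    y = np.asarray(r_bi, dtype=float) - np.asarray(r_slow, dtype=float)
    k, b = np.polyfit(x, y, 1)           # least-squares line y = k*x + b
    residuals = y - (k * x + b)
    r2 = 1.0 - residuals.var() / y.var()  # coefficient of determination
    return k, b, r2

# Synthetic population of 100 neurons (hypothetical firing rates in spikes/s).
rng = np.random.default_rng(0)
n_neurons = 100
r_slow = rng.uniform(5, 60, n_neurons)
r_fast = rng.uniform(5, 60, n_neurons)
true_wf = 0.8
r_bi = (1 - true_wf) * r_slow + true_wf * r_fast + rng.normal(0, 2, n_neurons)

k, b, r2 = faster_weight_from_population(r_bi, r_slow, r_fast)
print(f"slope k = {k:.3f}, intercept b = {b:.3f}, R^2 = {r2:.3f}")
```

Pooling across neurons avoids dividing by a small (Rf - Rs) for any single neuron: neurons with similar component responses fall near the origin and have little leverage on the fitted slope.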

Our results showed that the bi-speed response was strongly biased toward the faster component when the speeds were slow and changed progressively from a scheme of “faster-component-take-all” to “response-averaging” as the speeds of the two stimulus components increased (Fig. 5F1). We found similar results when the speed separation between the stimulus components was small (×2), although the bias toward the faster component at low stimulus speeds was not as strong as at x4 speed separation (Fig. 5A2-F2 and Table 1).

In the regression between (R - Rs) and (Rf - Rs), Rs was a common term and therefore could artificially introduce correlations. We wanted to determine whether our estimates of the regression slope (wf) and the coefficient of determination (R²) can be explained by this confounding factor. At each speed pair and for each neuron from the data sample of the 100 neurons shown in Figure 5, we simulated the response to the bi-speed stimuli (Re) as a randomly weighted sum of Rf and Rs of the same neuron:

$$R_e = a R_f + (1 - a) R_s$$

in which a was a randomly generated weight (between 0 and 1) for Rf, and the weights for Rf and Rs summed to one. We then calculated the regression slope and the correlation coefficient between the simulated (Re - Rs) and (Rf - Rs) across the 100 neurons. We repeated the process 1000 times and obtained the mean and 95% confidence interval (CI) of the regression slope and the R². The mean slope based on the simulated responses was 0.5 across all speed pairs. The estimated slope (wf) based on the data was significantly greater than the simulated slope at slow speeds of 1.25/5 and 2.5/10°/s (Fig. 5F1), and 1.25/2.5, 2.5/5, and 5/10°/s (Fig. 5F2) (bootstrap test, see p values in Table 1). The estimated R² based on the data was also significantly higher than the simulated R² for most of the speed pairs (Table 1). These results suggest that the faster-speed bias at the slow stimulus speeds and the consistent response weights across the neuron population at each speed pair are not analysis artifacts.
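The control analysis can be sketched as follows. Each simulated bi-speed response is a randomly weighted sum of a neuron's own Rf and Rs, the regression of (Re - Rs) on (Rf - Rs) is repeated 1000 times, and the mean and 95% interval of the slope are collected. The uniform distribution of the random weight, the synthetic component responses, and the function name are assumptions made for this illustration.

```python
import numpy as np

def simulated_slopes(r_slow, r_fast, n_repeats=1000, seed=0):
    """Regression slopes obtained when each neuron gets a random weight for Rf."""
    rng = np.random.default_rng(seed)
    r_slow = np.asarray(r_slow, dtype=float)
    r_fast = np.asarray(r_fast, dtype=float)
    x = r_fast - r_slow
    slopes = np.empty(n_repeats)
    for i in range(n_repeats):
        # One random weight per neuron, drawn uniformly in [0, 1) (assumed distribution).
        a = rng.uniform(0.0, 1.0, size=x.size)
        r_sim = a * r_fast + (1.0 - a) * r_slow
        slopes[i] = np.polyfit(x, r_sim - r_slow, 1)[0]
    return slopes

# Synthetic component responses for 100 neurons (hypothetical, spikes/s).
rng = np.random.default_rng(1)
r_slow = rng.uniform(5, 60, 100)
r_fast = rng.uniform(5, 60, 100)

slopes = simulated_slopes(r_slow, r_fast)
mean_slope = slopes.mean()
ci_low, ci_high = np.percentile(slopes, [2.5, 97.5])
print(f"mean slope = {mean_slope:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Because the random weight is independent of (Rf - Rs) and has a mean of 0.5, the simulated slope centers on 0.5. A data-derived slope well above the upper bound of the simulated interval therefore cannot be attributed to the shared Rs term alone.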

Timecourse of MT responses to bi-speed stimuli

The temporal dynamics of the response bias toward the faster component may provide a useful constraint on the neural model that accounts for this phenomenon. We therefore examined the timecourse of MT response to the bi-speed stimuli. We asked whether the faster-speed bias occurred early in the neuronal response or developed gradually.

Figure 6 shows the timecourse of the normalized responses averaged across 100 neurons in the population. The bias toward the faster speed component occurred at the very beginning of the neuronal response when the stimulus speeds were less than 20°/s (Fig. 6A-C). The first 20-30 ms of the neuronal response elicited by the bi-speed stimulus was nearly identical to the response elicited by the faster component alone, as if the slower component were not present. The early dominance of the faster component on the bi-speed response cannot be explained by the difference in the response latencies of the faster and slower components. Faster stimuli elicit a shorter response latency (Lisberger and Movshon, 1999), which can be seen in Figure 6A-C. However, the bi-speed response still closely followed the faster component for a period of time after the response to the slower component started to rise. The effect of the slower component on the bi-speed response was delayed for about 25 ms, as indicated by the arrows in Figure 6A-C. During the rest of the response period, the bias toward the faster component was persistent. As the stimulus speeds increased, the bi-speed response gradually changed to follow the average of the component responses (Fig. 6E). We found similar results when the speed separation between the two stimulus components was x4 (Fig. 6A1-E1) and x2 (Fig. 6A2-E2). At slow speeds, the very early faster-speed bias suggests a likely role of feedforward inputs to MT on the faster-speed bias. The slightly delayed reduction (normalization) in the bi-speed response relative to the stronger component response also helps constrain the circuit model for divisive normalization.

Timecourse of MT responses averaged across neurons to bi-speed stimuli.

Peristimulus time histograms (PSTHs) were averaged across 100 neurons. The bin width of the PSTH was 10 ms. A1-E1. X4 speed separation. A2-E2. X2 speed separation. In A-C, the left dashed line indicates the latency of the response to a bi-speed stimulus, and the right dashed line and the arrow indicate when the response to a bi-speed stimulus started to diverge from the response to the faster component.
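A PSTH with 10 ms bins can be computed from spike times as in the minimal sketch below. The spike times are made up, the function name is hypothetical, and the normalization and across-neuron averaging used for Figure 6 are not reproduced here because their details are not given in this section.

```python
import numpy as np

def psth(spike_times_per_trial, t_start, t_stop, bin_width=0.010):
    """Trial-averaged firing rate (spikes/s) in fixed-width bins; times in seconds."""
    n_bins = int(round((t_stop - t_start) / bin_width))
    edges = t_start + bin_width * np.arange(n_bins + 1)
    counts = np.zeros(n_bins)
    for spikes in spike_times_per_trial:
        counts += np.histogram(spikes, bins=edges)[0]
    # Divide by the number of trials and the bin width to convert counts to a rate.
    rate = counts / (len(spike_times_per_trial) * bin_width)
    return edges[:-1], rate

# Two hypothetical trials, spike times in seconds relative to motion onset.
trials = [[0.012, 0.015, 0.031], [0.018, 0.044]]
bin_starts, rate = psth(trials, t_start=0.0, t_stop=0.05)
print(rate)
```

Comparing the bi-speed PSTH with the faster-component PSTH bin by bin is one way to locate the time at which the two responses start to diverge.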

Faster-speed bias still present when attention was directed away from the RFs

One possible explanation for the faster-speed bias is that bottom-up attention is drawn toward the faster stimulus component, enhancing the response to the faster component. To test this possibility, we asked whether the faster-speed bias was still present if attention was directed away from the RFs. We trained one monkey (RG) to perform a demanding fine direction-discrimination task in the visual hemifield opposite to the RFs. The perifoveal/peripheral viewing of the attended stimulus and the use of a fine direction-discrimination task made the task attention-demanding (see Methods). The monkey performed the task well with an average correct rate of 86.7 ± 7.3% (mean ± std) (see Methods and Supplementary Material 1).

We recorded the responses from 48 MT neurons in 23 experimental sessions while the monkey performed the task. Among the 48 neurons, 32 were recorded using both the attention-away paradigm and a fixation paradigm. We found a similar faster-speed bias at low speeds. The results obtained using the attention-away paradigm and the fixation paradigm were similar (Supplementary Fig. 1). The faster-speed bias was more evident at x4 speed separation than at x2 speed separation. Based on the neuronal responses across the population, we calculated the weight for the faster stimulus component at each of the five speed pairs using linear regression (Eqs. 6, 7), as we did in Figure 5. When attention was directed away from the RFs, the response weight for the faster component decreased from a strong faster-speed bias to response averaging as the stimulus speeds increased, similar to the results from the fixation paradigm (Fig. 7). These results suggest that the faster-speed bias at low speeds cannot be explained by attention drawn to the faster-speed component.

Comparison of response weights between attention-away and fixation paradigms.

The red and blue curves indicate the response weights for the faster speed component in an attention-away paradigm and a fixation paradigm, respectively, obtained from the same population of 32 neurons. The black curves are the replot of the data in Figure 5F, obtained from 100 neurons in a fixation paradigm. A. X4 speed separation. B. X2 speed separation.

Faster-speed bias also occurs when stimulus components move in different directions

We showed that at low speeds, MT response to the bi-speed stimulus was biased toward the faster stimulus component when two overlapping components moved in the same direction (at the preferred direction of the neuron). We asked whether this faster-speed bias also occurred when visual stimuli moved in different directions. We presented overlapping random-dot stimuli moving in two directions separated by 90° in the RF. The two stimulus components moved at speeds of 2.5 and 10°/s. The faster speed component moved on the clockwise side of the two directions. We varied the vector-average (VA) direction of the two component directions across 360° to characterize the direction tuning curves. Each neuron’s direction tuning curve was fitted with a spline and circularly shifted such that the VA direction 0° was aligned with the neuron’s preferred direction before averaging across neurons.

Figure 8A shows the results averaged across 21 neurons (13 from monkey RG, 8 from monkey GE). The peak response to the faster component (Fig. 8A, blue curve) was stronger than that to the slower component (green curve), consistent with the overall speed preference of a large MT neuron population (Nover et al., 2005). MT responses elicited by the bi-directional stimuli (red curve) showed a strong bias toward the faster component, more than expected by the average of the two component responses (gray curve). The bi-speed response was biased toward the faster component regardless of whether the response to the faster component was stronger (in positive VA directions) or weaker (in negative VA directions) than that to the slower component (Fig. 8A). The result from an example neuron further demonstrated that, even when the peak firing rates of the faster and slower component responses were similar, the response elicited by the bi-speed stimuli was still biased toward the faster component (Fig. 8B). These results suggest that the bias was not toward the stronger component response of the individual neuron, but toward the faster component.

MT responses to bi-speed stimuli moving in different directions and the linear weighted sum (LWS) and normalization model fits.

A. Population-averaged direction tuning curves of 21 neurons in response to stimuli moving at two speeds and in two directions separated by 90° (red). The component direction Dir. 1 (blue) moved at 10°/s, and the component direction Dir. 2 (green) moved at 2.5°/s. The faster component Dir. 1 was always on the clockwise side of Dir. 2. The abscissas in blue and green show the directions of stimulus components Dir. 1 and Dir. 2, respectively. The blue and green axes are shifted by 90° relative to each other. The abscissa in black shows the corresponding VA direction of the two direction components. Error bands represent ±STE. The gray curve represents the average of the component responses. The orange and black curves are the LWS and normalization model fits, respectively, of the population-averaged direction-tuning curve to the bi-speed stimuli. B. The direction-tuning curves of an example neuron showing similar peak responses to the slower and faster components. The orange and black curves are nearly identical and are the LWS and normalization model fits of the bi-speed responses. The weights wf and ws are from the normalization model fit. C. Response weights for the stimulus components obtained using the LWS model fit. Each circle represents one neuron. D. Response weights obtained using the normalization model fit. The dashed lines in C, D indicate where ws and wf sum to one. Although ws and wf are not constrained to sum to one in the model fits, the fitted weights are roughly aligned with the dashed lines. E. Population-averaged speed tuning curves of MT neurons recorded in our data sample in response to single speeds. The red circles indicate responses to 2.5 and 10°/s. Error bars represent ±STE.

To quantify the response weights, for each neuron we fitted the MT raw firing rates of the direction tuning curve to bi-speed/bi-directional stimuli as a linear weighted sum (LWS) of the direction tuning curves to the individual stimulus components moving at different speeds:

Rbi is the model-fitted direction-tuning curve for the bi-speed, bi-directional stimuli. Rs and Rf are the measured direction tuning curves to the slower and faster stimulus components, respectively. θ1 and θ2 are the motion directions of the two components; ws, wf, and c are model parameters, which represent the response weights for the slower and faster components and an offset constant, respectively. c was constrained to be between 0 and 100 spikes/s. Because the model can be well constrained by the measured direction-tuning curves, it is not necessary to require ws and wf to sum to one, which makes the model more general. An implicit assumption of the model is that, at a given pair of stimulus speeds, the response weights for the slower and faster components are fixed across motion directions. The model fitted MT responses very well, accounting for an average of 91.8% of the response variance (std = 7.2%, N = 21) (see Methods). The success of the model supports the assumption that the response weights are fixed across motion directions. The median response weights for the faster and slower components were 0.74 and 0.26, respectively, and were significantly different (Wilcoxon signed-rank test, p = 8.0 × 10⁻⁵). For most neurons (20 out of 21), the response weight for the faster component was larger than that for the slower component (Fig. 8C). This result suggests that at low speeds, the faster-speed bias is a general phenomenon that applies to overlapping stimuli moving either in the same direction or different directions.
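A minimal Python sketch of a linear weighted sum fit of this kind is given here for illustration. It assumes the model has the form Rbi ≈ ws·Rs + wf·Rf + c, evaluated at matching VA directions. The von Mises-shaped tuning curves, the helper names (`tuning`, `solve3`, `fit_lws`), and the synthetic weights are all hypothetical. The bound on c is applied by clipping after an ordinary least-squares solve, which is a simplification and not necessarily the optimizer used in the study.

```python
import math

def tuning(theta_deg, pref_deg, peak, kappa=2.0):
    """Hypothetical von Mises-shaped direction tuning curve (spikes/s)."""
    d = math.radians(theta_deg - pref_deg)
    return peak * math.exp(kappa * (math.cos(d) - 1.0))

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, 3):
            f = m[r][i] / m[i][i]
            for k in range(i, 4):
                m[r][k] -= f * m[i][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][k] * x[k] for k in range(i + 1, 3))) / m[i][i]
    return x

def fit_lws(r_slow, r_fast, r_bi):
    """Least-squares fit of r_bi ~ ws*r_slow + wf*r_fast + c via normal equations."""
    cols = [r_slow, r_fast, [1.0] * len(r_bi)]
    ata = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    atb = [sum(u * v for u, v in zip(ci, r_bi)) for ci in cols]
    ws, wf, c = solve3(ata, atb)
    # Simplification: clip c into [0, 100] spikes/s instead of a bounded solver.
    return ws, wf, min(max(c, 0.0), 100.0)

# Synthetic example: two components 90 deg apart, sampled over VA direction.
va_dirs = list(range(-180, 180, 15))
r_slow = [tuning(va + 45, 0.0, peak=30.0) for va in va_dirs]
r_fast = [tuning(va - 45, 0.0, peak=50.0) for va in va_dirs]
r_bi = [0.26 * s + 0.74 * f + 2.0 for s, f in zip(r_slow, r_fast)]

ws, wf, c = fit_lws(r_slow, r_fast, r_bi)
print(f"ws = {ws:.2f}, wf = {wf:.2f}, c = {c:.2f}")
```

Because the synthetic bi-speed curve is built from known weights without noise, the fit returns those weights, which makes the sketch easy to check before applying it to measured tuning curves.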

Normalization model fit of the direction-tuning curves to bi-speed stimuli

We showed that the neuronal response in MT to a bi-speed stimulus can be described by a weighted sum of the neuron’s responses to the individual speed components. However, what determines the response weights? The divisive normalization model (Carandini and Heeger, 2012) has been used to explain a wide array of phenomena, including neuronal responses elicited by multiple visual stimuli (e.g. Britten and Heuer 1999; Heuer and Britten 2002; Busse et al. 2009; Xiao et al. 2014; Xiao and Huang 2015; Bao and Tsao 2018). In the normalization model, while the division by the activity of a population of neurons in the denominator (the normalization pool) is well accepted, the nature of the numerator is less understood. We have previously proposed that the weight of a stimulus component is proportional to the activity of a population of neurons elicited by the stimulus component (Xiao et al., 2014; Wiesner et al., 2020). We refer to this neuronal population as the “weighting pool”. Here, we assumed that the weighting pool was composed of neurons with a broad range of speed preferences in response to multiple speed components. So, the summed response of the weighting pool reflects the speed preference of the neuronal population rather than the speed preference of individual neurons. We used the following equation (Eq. 9) to fit the direction-tuning curves of each neuron in response to two speed components moving in different directions:

Rbi, Rs, Rf, θ1, and θ2 are the same as in Equation 8. Ss and Sf are the population neural responses of the weighting pools to the slower and faster component speeds, respectively. n, σ, α, and c are model parameters and have the following constraints: 0.01 ≤ n ≤ 100, 0 ≤ σ ≤ 500, 0.01 ≤ α ≤ 100, 0 ≤ c ≤ 100. α is a parameter that controls the tuned normalization (Ni et al., 2012; Rust et al., 2006; Carandini et al., 1997). We approximated Ss and Sf based on the population-averaged responses of our recorded MT neurons (N = 100) in response to single speeds moving in the preferred direction of each neuron (Fig. 8E). For the speeds of 2.5 and 10°/s, Ss = 36.7 and Sf = 62.5 spikes/s (red circles in Fig. 8E). The normalization model fit the data well, accounting for an average of 90.5% of the response variance (std = 7.1%, N = 21), slightly lower than, but comparable to, the fit by the LWS model. The median response weights obtained from the normalization model for the faster and slower components were 0.78 and 0.15, respectively, and were significantly different (Wilcoxon signed-rank test, p = 6.0 × 10⁻⁵) (Fig. 8D). The median values of the fitted parameters across 21 neurons are n = 4.13, σ = 123, α = 1.57, c = 0.03.

So far, we have described the neural encoding of multiple speeds in area MT. We will next examine the decoding of speed(s) from population neural responses in MT and compare the performance of decoding with perceptual performance.

Discriminating bi-speed and single-speed stimuli based on neuronal responses in area MT

We asked whether the responses of MT neurons contained information about bi-speed and single-speed stimuli that were suitable to support the perceptual discrimination of these stimuli. To address this question, we first examined the responses elicited by the bi-speed and single-speed stimuli from a population of MT neurons with different preferred speeds (PS). Next, we used a classifier to discriminate the bi-speed stimuli from the single, log-mean speed stimuli based on MT responses.

In different experimental sessions, we centered visual stimuli on neurons’ RFs. The visual stimuli were identical across experimental sessions except for the spatial location of the RF. This allowed us to pool the trial-averaged responses recorded from different neurons to form a pseudo-population (see Methods). One can interpret the responses as from a population of neurons elicited by the same visual stimulus. Figure 9 shows the pseudo-population neural response (referred to in brief as the population response) plotted as a function of neurons’ PS, constructed from 100 neurons that we recorded using a fixation paradigm (see Methods). To capture the population response evenly across a full range of PS, we spline-fitted the recorded response elicited by the bi-speed stimulus (the red curves) and by the single, log-mean speed (the black curves) (Fig. 9A-E). At x4 and x2 speed separations, the population responses elicited by two speeds did not show two separate peaks. Instead, they had a single hump that shifted from low PS to high PS as the stimulus speeds increased. At x4 speed separation across all five speed pairs, the population response elicited by two speeds was broader and flatter than that elicited by the single log-mean speed (Fig. 9A1-E1).

Population neural responses elicited by the bi-speed and single-speed stimuli and the performance of a linear classifier.

A population response of 100 recorded neurons was reconstructed by pooling across recordings in different experimental sessions. Each neuron’s response was averaged across experimental trials and normalized by the maximum response of the spline-fitted speed tuning curve to single speeds. Each dot represents the response from one neuron plotted against the neuron’s PS on a natural logarithm scale. The curves represent the spline-fitted population neural responses. Red: response to the bi-speed stimulus; Black: the response to the corresponding single, log-mean speed. A1-F1. X4 speed separation. The speeds of the bi-speed stimuli are 1.25 and 5°/s (A1), 2.5 and 10°/s (B1), 5 and 20°/s (C1), 10 and 40°/s (D1), 20 and 80°/s (E1). A2-F2. X2 speed separation. The speeds of the bi-speed stimuli are 1.25 and 2.5°/s (A2), 2.5 and 5°/s (B2), 5 and 10°/s (C2), 10 and 20°/s (D2), 20 and 40°/s (E2). Two red dots on the X-axis indicate two component speeds; the black dot indicates the log-mean speed. F1, F2. Performance of a linear classifier to discriminate the population neural responses to the bi-speed stimulus and the corresponding single log-mean speed. Error bars represent STE.

In our experiments, we directly measured the neuronal responses elicited by the log-mean speed of x4 but not x2 speed separation. Because we had characterized each neuron’s tuning curve to single speeds, we could infer the responses elicited by the log-mean speed of x2 separation by interpolating the speed tuning curve using a spline fit. At x2 speed separation, the population response elicited by two speeds was similar to that elicited by the single log-mean speed, with the two-speed population response slightly broader (Fig. 9A2-E2).
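The interpolation step can be illustrated with a short Python sketch. It uses piecewise-linear interpolation on a log-speed axis as a simple stand-in for the spline fit described above. The list of single speeds and the firing rates are hypothetical values chosen for illustration, and `interp_log_speed` is a hypothetical helper name.

```python
import math

def interp_log_speed(speeds, rates, query_speed):
    """Linearly interpolate a firing rate at query_speed on a log-speed axis."""
    x = math.log(query_speed)
    xs = [math.log(s) for s in speeds]
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return rates[i] + t * (rates[i + 1] - rates[i])
    raise ValueError("query speed is outside the measured speed range")

# Hypothetical single-speed tuning curve of one neuron (speeds in deg/s).
speeds = [1.25, 2.5, 5.0, 10.0, 20.0, 40.0, 80.0]
rates = [12.0, 20.0, 35.0, 48.0, 40.0, 25.0, 10.0]

# Log-mean speed of the x2 pair 2.5 and 5 deg/s, not among the listed speeds.
log_mean = math.sqrt(2.5 * 5.0)
rate = interp_log_speed(speeds, rates, log_mean)
print(f"log-mean speed = {log_mean:.2f} deg/s, inferred rate = {rate:.1f} spikes/s")
```

The log-mean speed lies halfway between 2.5 and 5°/s on a logarithmic axis, so linear interpolation returns the midpoint of the two neighboring rates. A spline would generally give a different value where the tuning curve is strongly curved.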

We used a linear classifier to perform a discrimination task to evaluate the discriminability between MT population responses elicited by the bi-speed stimulus and the corresponding log-mean speed. Trial-by-trial population responses were generated randomly according to a Poisson process, and the mean response of each neuron was set to the trial-averaged neuronal response. The classifier was trained and tested using k-fold cross-validation. The classifier determined whether a population response from the recorded 100 neurons in our data set was elicited by two speeds or a single speed (see Methods). Discriminability of the classifier was measured in d’ as in our psychophysics study.

Consistent with perceptual discrimination (Figs. 1F, 2B), the classifier’s performance at x4 speed separation (Fig. 9F1) was better than that at x2 speed separation (Fig. 9F2). This provides a neural correlate of better perceptual speed segmentation at larger speed separation. At x4 speed separation, the discriminability of the classifier decreased slightly as the stimulus speed increased (Fig. 9F1), which was generally consistent with the human psychophysics results (Fig. 1E1). However, one difference was that at 20 and 80°/s, the classifier’s performance did not drop to a low level as human performance did (compare Fig. 9F1 with Fig. 1E1), but was more comparable to that of the monkey subject (Fig. 2B1). At x2 speed separation, the classifier’s performance (Fig. 9F2) had a similar shape to that of the human (Fig. 1E2) and monkey (Fig. 2B2) subjects, but the performance was not as good as the perceptual performance at intermediate speeds.

When the stimulus speeds were 20 and 80°/s, the population responses elicited by the bi-speed stimulus and the single log-mean speed stimulus were noticeably different (Fig. 9E1), which explains the good performance of the classifier in differentiating the two stimuli. However, the difference in the population neural responses may contribute to differences in perceptual qualities other than motion speed, and the monkey subject might be able to pick up these perceptual cues at these high speeds to aid task performance. To directly evaluate whether the population neural responses elicited by the bi-speed stimulus carry information about two speeds, it is important to conduct a decoding analysis to extract speed(s) from MT population responses.

Decoding either a single speed or two speeds from trial-averaged population neural response

Since the population responses elicited by the x4 and x2 speed separations only had a single peak centered between the two component speeds (Fig. 9A-E), this raised the question of how neuronal populations represent multiple speeds of the motion components. To address this question, we used a decoding approach motivated by the theoretical framework of coding multiplicity and probability distribution of visual features in neuronal populations proposed by Zemel and colleagues (Zemel et al., 1998; Pouget et al., 2003; also see Treue et al., 2000). Our decoder extracted speeds that minimized the difference (sum of squared errors) between the estimated population response elicited by the extracted speeds and the reconstructed population neural response based on the neural recording (Eqs. 10-13, see Methods). Rather than searching for a probability distribution of speed, we constrained the search to either a single speed or two speeds. We also constrained the weights for the extracted speeds to sum to one, consistent with a probability distribution.

Our approach is akin to the forward encoding model for decoding that is often used in brain imaging studies (e.g., Kay et al., 2008; Brouwer and Heeger, 2009; Naselaris et al., 2011; Vintch and Gardner, 2014; van Bergen et al., 2015). We applied an encoding rule and found the visual stimuli that generated a population response that best matched the recorded neural response. Our assumed encoding rule in the decoder is that a neuron’s response to multiple speeds is the linear sum of the neuron’s responses to individual speed components presented alone based on the neuron’s speed tuning curve, and weighted by the strength (or probability) of each speed component. The decision to use this encoding model for decoding, rather than the encoding rule characterized in this study, was made primarily for practical reasons. Our experimental data only covered two speed separations (x4 and x2) and 5 log mean speeds. We do not yet know a general encoding rule for two speeds across all different speed separations and log mean speeds. However, if the linear encoding of the two speeds, as characterized in this study, generalizes across a broader range of speed combinations – such that only the weights of the speed components vary within the general encoding rule – then our choice of encoding model for decoding would not alter the decoded speeds themselves, but would merely affect the estimated weights associated with those speeds (Eq. 10).
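The logic of this forward-model decoder can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of Eqs. 10-13: the log-Gaussian speed tuning curves, the population of preferred speeds, the exhaustive grid search, and the names `tuning`, `encode`, and `decode` are all hypothetical. The sketch keeps the two constraints described above, namely that the decoder reports one or two speeds and that the weights sum to one.

```python
import math
from itertools import combinations

def tuning(speed, pref_speed, width=1.0):
    """Hypothetical log-Gaussian speed tuning curve with peak response 1."""
    d = math.log(speed) - math.log(pref_speed)
    return math.exp(-(d * d) / (2.0 * width * width))

def encode(pref_speeds, speeds, weights):
    """Assumed encoding rule: weighted linear sum of single-speed responses."""
    return [sum(w * tuning(s, ps) for s, w in zip(speeds, weights))
            for ps in pref_speeds]

def sse(a, b):
    """Sum of squared errors between two population responses."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def decode(pref_speeds, response, speed_grid, weight_grid):
    """Grid search for one or two speeds (weights summing to one) minimizing SSE."""
    best = None
    for s in speed_grid:  # single-speed candidates
        err = sse(encode(pref_speeds, [s], [1.0]), response)
        if best is None or err < best[0]:
            best = (err, (s,), (1.0,))
    for s1, s2 in combinations(speed_grid, 2):  # two-speed candidates, s1 < s2
        for w in weight_grid:  # w is the weight of the faster speed
            err = sse(encode(pref_speeds, [s1, s2], [1.0 - w, w]), response)
            if err < best[0]:
                best = (err, (s1, s2), (1.0 - w, w))
    return best

# Hypothetical population: 29 preferred speeds, log-spaced from 0.5 to 64 deg/s.
pref_speeds = [0.5 * 2 ** (k / 4) for k in range(29)]
speed_grid = [1.25 * 2 ** (k / 2) for k in range(13)]  # 1.25 to 80 deg/s
weight_grid = [i / 20 for i in range(1, 20)]

# Noiseless response to 2.5 and 10 deg/s with a faster-speed bias.
response = encode(pref_speeds, [2.5, 10.0], [0.25, 0.75])
err, speeds_hat, weights_hat = decode(pref_speeds, response, speed_grid, weight_grid)
print(speeds_hat, weights_hat)
```

On noiseless synthetic data generated with the same encoding rule, the search recovers the component speeds and their weights. With recorded responses, the quality of the readout depends on how well the assumed encoding rule matches the neurons.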

Figure 10 shows the decoding procedure and the results of extracting speed(s) from the population neural responses reconstructed based on the trial-averaged responses of 100 recorded neurons to the bi-speed stimuli. To capture the population neural response across a full range of the PS, we spline-fitted the recorded (red dots) and estimated (blue dots) population responses. The estimated population responses (Fig. 10, blue curves) matched the recorded neural responses well (Fig. 10, red curves) (for five speed pairs, R² > 0.96 at x4 speed separation; R² > 0.99 at x2 speed separation). At x4 speed separation, the decoder extracted two speeds for all speed combinations (Fig. 10A-E). The readout speeds were generally close to the veridical stimulus speeds. At low stimulus speeds of 1.25 and 5°/s (Fig. 10A) and 2.5 and 10°/s (Fig. 10B), the decoded faster speed component had a higher weight than the slower component. At the highest speeds of 20 and 80°/s, the decoder extracted two speeds (Fig. 10E), whereas human subjects could not perceive two speeds (Fig. 1E1) (see Supplementary Figure 3). At x2 speed separation, the decoder extracted two speeds only at low stimulus speeds of 1.25 and 2.5°/s (Fig. 10F). At higher stimulus speeds, the decoder extracted a dominant speed that was between the two component speeds, with or without a second nearby speed that had a very low weight (Fig. 10G-J). In contrast, human subjects could perceive two speeds when stimulus speeds were below 20 and 40°/s (Fig. 1E2) (see below and Discussion).

Illustration of the decoding procedure and extraction of speed(s) from population responses reconstructed based on the trial-averaged neuronal responses to the bi-speed stimuli. A-E. X4 speed separation. F-J. X2 speed separation.

The neural population contains 100 recorded neurons, as shown in Figure 9. Each red dot represents the trial-averaged response from one neuron plotted versus the PS of the neuron in the natural logarithm scale. The red curve represents the spline-fitted population neural response. The decoder found either one speed or two speeds with different weights (vertical green bars on the X-axis), giving rise to the estimated and spline-fitted population response (blue curve) that best fitted the recorded and spline-fitted population neural response (red curve). Each blue dot represents the estimated response from one neuron, and the blue curve represents the spline-fitted estimated population response. Two red dots on the X-axis indicate the stimulus speeds. The Y-axis on the right side shows the weight of the readout speed (A, F).

Decoding speeds from trial-by-trial population neural responses

To determine the distribution of the readout speed across trials, we randomly generated 200 trials based on the trial-averaged responses of 100 recorded neurons in our data sample. In each simulated trial, a given neuron’s response was determined by a Poisson process, with the mean set to the spike count averaged across the recorded trials. The trial-by-trial response of each neuron was normalized to construct the population response and then spline-fitted for decoding. The speeds extracted from the recorded neural responses to single stimulus speeds (Suppl. Fig. 2A-G) and from the inferred responses to the log-mean speed of x2 speed separation (Suppl. Fig. 2H-L) matched the single stimulus speed well (Suppl. Fig. 2M).
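The trial simulation can be sketched in Python using only the standard library. The example draws Poisson spike counts with Knuth's multiplication method, which is adequate for modest mean counts. The mean counts, the seed, and the names `poisson_sample` and `simulate_trials` are assumptions for illustration, and the normalization and spline-fitting steps are omitted.

```python
import math
import random

def poisson_sample(mean, rng):
    """Draw one Poisson count using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_trials(mean_counts, n_trials, seed=0):
    """Return n_trials population responses, one Poisson count per neuron."""
    rng = random.Random(seed)
    return [[poisson_sample(m, rng) for m in mean_counts]
            for _ in range(n_trials)]

# Hypothetical trial-averaged spike counts for three neurons.
mean_counts = [4.0, 12.5, 20.0]
trials = simulate_trials(mean_counts, n_trials=200)

# Across simulated trials, each neuron's average count approaches its mean.
avg_last = sum(trial[2] for trial in trials) / len(trials)
print(f"simulated mean count of neuron 3: {avg_last:.2f}")
```

Drawing each neuron independently, as in this sketch, ignores correlated variability between neurons, which a pseudo-population built from separate recording sessions cannot capture in any case.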

Figure 11 shows the speeds extracted from the neural response to the bi-speed stimuli. The decoder often extracted two speeds across trials. In some trials, the readout of one speed component had a minimal weight. We considered a trial having a “single” readout speed if the weight difference between the two readout speeds was greater than 0.7 (i.e., the weaker weight < 0.15). This usually happened when the readout speed having a minimal weight was either at one of the boundaries of the speed range (i.e., 1.25°/s or 80°/s) or separated from the other readout speed by a large speed separation (x27.86, which was the largest speed separation searched by the algorithm) (see Methods). These small weights were likely artifacts due to the boundaries of the stimulus speeds used in our experiments or the range of speed separation searched by the decoder.
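The single-readout criterion can be written as a small Python check. Because the two weights sum to one, a weight difference greater than 0.7 is equivalent to the weaker weight being below 0.15. The function name `classify_readout` is hypothetical.

```python
def classify_readout(w_a, w_b, diff_threshold=0.7):
    """Label a trial 'single' when the two weights differ by more than the threshold."""
    if abs((w_a + w_b) - 1.0) > 1e-9:
        raise ValueError("the two weights are assumed to sum to one")
    return "single" if abs(w_a - w_b) > diff_threshold else "two"

print(classify_readout(0.9, 0.1))   # difference 0.8, weaker weight 0.1
print(classify_readout(0.6, 0.4))   # difference 0.2, weaker weight 0.4
```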

Trial-by-trial readout speeds decoded from population neural responses to the bi-speed stimuli.

The neural population contains 100 recorded neurons and the trial-by-trial responses are randomly generated based on a Poisson process. The convention is the same as in Figure 10. A-E. Speeds decoded from population responses to x4 speed separation. The vertical red lines indicate two component speeds, which are 1.25 and 5°/s (A), 2.5 and 10°/s (B), 5 and 20°/s (C), 10 and 40°/s (D), 20 and 80°/s (E). F-J. Speeds decoded from population responses to x2 speed separation. The red vertical lines indicate two component speeds, and the black vertical line indicates the log mean speed. The component speeds are 1.25 and 2.5°/s (F), 2.5 and 5°/s (G), 5 and 10°/s (H), 10 and 20°/s (I), 20 and 40°/s (J).

At x4 speed separation, the decoder was able to extract the speeds of the stimulus components (Fig. 11A-D), except at the fastest speeds of 20 and 80°/s. At low stimulus speeds of 1.25 and 5°/s, and 2.5 and 10°/s, the readout speed around the faster stimulus component had a higher weight than that around the slower stimulus component (Fig. 11A, B). At stimulus speeds of 1.25 and 5°/s, in trials with two readout speeds (Fig. 11A, on the white background), the faster readout speeds were close to the faster stimulus speed of 5°/s. The slower readout speeds were closely aligned with the slower stimulus speed of 1.25°/s, which was also the lower boundary of the speed range. In trials considered to have a single readout speed, the readout was very close to the faster stimulus speed of 5°/s (Fig. 11A, on the grey background). For some of these trials (at the top of Fig. 11A), the faster readout speed was near the upper-speed boundary of 80°/s and had a minimal weight (< 0.15). Those faster readout speeds were boundary artifacts.

At stimulus speeds of 2.5 and 10°/s, the decoder extracted two speeds that had a separation close to the veridical separation (Figs. 11B, 12B). In trials considered to have a single-speed readout, the readout speed was close to the faster stimulus speed of 10°/s. In some single- and two-readout speed trials, the slower readout speeds aligned with the 1.25°/s boundary and had a small weight, suggesting they were boundary artifacts.

At stimulus speeds of 5 and 20°/s, nearly all trials had two readout speeds with a separation well aligned with the veridical speed separation (Figs. 11C, 12C). At stimulus speeds of 10 and 40°/s, the decoder was able to extract two speeds for most of the trials (Fig. 11D). A small percentage of the trials (about 10%) were considered to have a single readout speed, which was close to the log mean speed of the two stimulus speeds (20°/s) (at the top of Fig. 11D on the grey background).

At the fastest stimulus speeds of 20 and 80°/s, about 40% of the total trials were considered to have only a single readout speed, which was near the log mean speed of the stimulus components (40°/s) (Fig. 11E). In other trials, the decoder extracted two speeds – the slower readout speeds were generally higher than the slower stimulus speed (20°/s), and the faster readout speeds aligned with the faster stimulus speed (80°/s), which was also the upper boundary speed. However, an examination of the objective function as the decoder searched for the best-fit population response across speed separations revealed that the trial-averaged objective function was flat within a wide range of speed separations (Suppl. Fig. 3A). Further analysis showed that the decoder was uncertain about how many speeds were in the visual stimuli and therefore had difficulty segmenting the visual stimuli at these fast stimulus speeds of 20 and 80°/s (Suppl. Fig. 3).

At x2 speed separation, the decoder was not able to extract two speeds of the stimulus components, except at the slowest speeds of 1.25 and 2.5°/s (Fig. 11F-J). At stimulus speeds of 1.25 and 2.5°/s (Fig. 11F), in 38% of total trials considered to have a single readout speed, the readout speed was close to the faster stimulus speed of 2.5°/s (mean = 1.97°/s, STD = 1.08). In trials that had two readout speeds, the slower readout speeds roughly followed the slower stimulus speed (1.25°/s), which was also the lower boundary of the speed range (Fig. 11F). At stimulus speeds higher than 1.25 and 2.5°/s, most trials were considered to have a single readout speed (Fig. 11G-J). The mean speeds of the single readout-speed trials were 3.9°/s (STD = 1.07), 7.3°/s (STD = 1.99), 13.5°/s (STD = 1.06), and 31°/s (STD = 1.07), respectively, for stimulus speeds of 2.5 and 5°/s, 5 and 10°/s, 10 and 20°/s, and 20 and 40°/s. These mean readout speeds were close to the log mean speeds of the two stimulus speeds (3.54°/s, 7.07°/s, 14.14°/s, and 28.28°/s, respectively).
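The log mean speeds quoted above are the geometric means of the two component speeds. The short Python check below recomputes them and expresses each reported mean readout speed as a ratio to the corresponding log mean. The speed pairs and mean readout speeds are taken from the text; the helper name `log_mean` is hypothetical.

```python
import math

def log_mean(slow, fast):
    """Mean of two speeds on a logarithmic axis, i.e., their geometric mean."""
    return math.exp((math.log(slow) + math.log(fast)) / 2.0)

pairs = [(2.5, 5.0), (5.0, 10.0), (10.0, 20.0), (20.0, 40.0)]  # deg/s
mean_readouts = [3.9, 7.3, 13.5, 31.0]  # reported single-readout means, deg/s

log_means = [log_mean(slow, fast) for slow, fast in pairs]
ratios = [readout / lm for readout, lm in zip(mean_readouts, log_means)]

for (slow, fast), lm, ratio in zip(pairs, log_means, ratios):
    print(f"{slow:g} and {fast:g} deg/s: log mean = {lm:.2f}, readout/log mean = {ratio:.2f}")
```

In all four cases the mean readout speed is within roughly 10% of the log mean speed.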

Discrimination between single- and bi-speed stimuli based on decoded speeds

To compare the decoder with perceptual discrimination between the bi-speed stimuli and the log-mean speed, we used the decoding results to perform a discrimination task similar to that used in our psychophysical experiments. Figure 12 shows the distributions of the speed separation between two readout speeds extracted from the reconstructed population neural responses to the bi-speed stimuli and the corresponding single log-mean speed. As stated above, when the difference between the weights of two readout speeds in a trial was greater than 0.7, the trial was considered to have a single readout speed, and the speed separation was set to zero. At x4 speed separation, the separations between the readout speeds extracted from the response to the bi-speed stimuli generally matched the veridical speed separation. They were larger than those extracted from the response to the single log-mean speed (Fig. 12A-E). Based on the distributions of the decoded speeds, we used a speed separation threshold of x1.3 (i.e., 0.26 on the log scale, marked by a black triangle in Fig. 12) to distinguish single- and bi-speed stimuli and to evaluate the hit rate and false alarm rate. The exact choice of the threshold within a range from x1.1 to x1.7 did not change the results qualitatively. We calculated d’ to measure the ability to discriminate the bi-speed stimuli from the corresponding single log-mean speed. At x4 speed separation, the d’ (Fig. 12K) was similar to the psychophysical performance of the monkey subject (Fig. 2B1), reaching its peak at 5 and 20°/s. Although d’s at stimulus speeds of 1.25 and 5°/s and 2.5 and 10°/s were smaller than those of human subjects (Fig. 1C1, E1), the fact that in many trials, the readout speeds matched the faster stimulus speeds (Fig. 12A, B) indicated that the decoder was able to segment the visual stimuli when stimulus speeds were low.
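The threshold-based discrimination can be sketched in Python. The example counts a trial as a "two-speed" report when the decoded separation (on the natural log scale) exceeds ln(1.3) ≈ 0.26, then computes d’ as the difference between the z-scored hit rate and false alarm rate. The decoded separations are made-up values, and the clipping of extreme rates is an assumed convention, since the handling of rates of 0 or 1 is not described here.

```python
import math
from statistics import NormalDist

def rate_above(separations, threshold):
    """Fraction of trials whose decoded log speed separation exceeds the threshold."""
    return sum(s > threshold for s in separations) / len(separations)

def d_prime(hit_rate, fa_rate, n_trials):
    """d' = z(hit rate) - z(false alarm rate), with rates clipped away from 0 and 1."""
    lo, hi = 0.5 / n_trials, 1.0 - 0.5 / n_trials  # assumed correction
    z = NormalDist().inv_cdf
    hit = min(max(hit_rate, lo), hi)
    fa = min(max(fa_rate, lo), hi)
    return z(hit) - z(fa)

threshold = math.log(1.3)  # x1.3 separation, about 0.26 on the natural log scale

# Hypothetical decoded separations; zero marks a single-readout trial.
bi_speed_seps = [1.40, 1.30, 0.0, 1.50, 1.20, 1.39, 0.0, 1.10, 1.45, 1.35]
single_speed_seps = [0.0, 0.0, 0.10, 0.0, 0.50, 0.0, 0.0, 0.20, 0.0, 0.0]

hit_rate = rate_above(bi_speed_seps, threshold)      # bi-speed trials read as two speeds
fa_rate = rate_above(single_speed_seps, threshold)   # single-speed trials read as two speeds
dp = d_prime(hit_rate, fa_rate, n_trials=len(bi_speed_seps))
print(f"hit = {hit_rate:.2f}, false alarm = {fa_rate:.2f}, d' = {dp:.2f}")
```

Moving the threshold anywhere between the two clusters of separations leaves the hit and false alarm rates of this toy data unchanged, which mirrors the observation that the exact threshold within x1.1 to x1.7 did not change the results qualitatively.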

Discrimination between single- and bi-speed stimuli based on decoded speeds.

A-J. The distributions of the speed separation between two readout speeds in each trial for the bi-speed stimuli (yellow) and the single, log-mean speed (blue). The bin width is 0.05. The abscissa is shown in the natural logarithm scale. The red dotted line indicates veridical speed separation. A-E. X4 speed separation. The speeds of the bi-speed stimuli are 1.25 and 5°/s (A), 2.5 and 10°/s (B), 5 and 20°/s (C), 10 and 40°/s (D), 20 and 80°/s (E). F-J. X2 speed separation. The speeds of the bi-speed stimuli are 1.25 and 2.5°/s (F), 2.5 and 5°/s (G), 5 and 10°/s (H), 10 and 20°/s (I), 20 and 40°/s (J). K-L. The performance of discriminating a bi-speed stimulus from the corresponding log-mean speed is based on the speed separation of the decoded speeds. K. X4 speed separation; L. X2 speed separation. The black triangles in A-J indicate the speed separation threshold of x1.3 (0.26 on the log scale) used for discriminating bi-speed and single-speed stimuli.

At x2 speed separation, except at 1.25 and 2.5°/s, the distribution of the speed separation extracted from the response to the bi-speed stimuli was similar to that extracted from the inferred response to the single log-mean speed (see Methods) (i.e., orange and blue bars overlapping) (Fig. 12F-J). The d’ calculated based on the decoded speed separation (Fig. 12L) was smaller than the psychophysical performance of human and monkey subjects (Fig. 1C2, E2; Fig. 2B2), suggesting that the decoder was not able to segment the visual stimuli at x2 speed separation, except at the lowest speeds of 1.25 and 2.5°/s.

Discussion

Perceptual segmentation of multiple motion speeds

Our human psychophysical study employed a novel 3AFC task. The task combined an identification task (to report whether a stimulus had one or two speeds) with a discrimination task (to compare a two-speed stimulus with a single-speed stimulus) (Fig. 1A, E1, E2). This approach allowed us to characterize discriminability based on perceptual segmentation, rather than other perceptual appearances of the stimuli. We made two findings. First and intuitively, the performance of speed segmentation was better when the separation between two stimulus speeds was larger. Second, at a fixed speed separation, speed segmentation became harder at fast speeds. Our results are consistent with previous studies. Masson et al. (1999) showed that the speed segmentation threshold increased sharply when the mean stimulus speed increased from 8°/s to 16°/s. By varying the width of a speed notch, Rocchi et al. (2018) showed that transparent motion perception was stronger with a wider notch width, and that transparent motion was well perceived at slow speeds (mean speed = 4.6°/s) but not at faster speeds (mean speed = 20.6°/s) at a range of notch width from 1 to 6°/s. Our study tested a larger range of speeds and showed that the segmentation performance dropped sharply at speeds of 20 and 80°/s (x4), and 20 and 40°/s (x2), faster than those shown in the previous studies. This discrepancy is likely due to the larger speed separations used in our study and the difference in stimuli. The visual stimuli used in our study had either one or two speeds, whereas those used by Rocchi et al. (2018) were sampled from a distribution of motion speeds and had multiple elements.

Neural encoding of multiple speeds and implication for efficient coding

We found that, at low stimulus speeds, MT neurons showed a faster-speed bias in representing two speeds of overlapping stimuli. We also showed that faster-speed bias in MT is a robust phenomenon regardless of whether the stimulus components move in the same or different directions. A faster-speed bias in representing two motion speeds is a novel finding. It adds to a growing body of studies demonstrating that visual neurons do not necessarily average the responses elicited by individual stimulus components in response to multiple stimuli (e.g., Ni et al., 2012; Bao and Tsao, 2018). Our laboratory has previously reported that the responses of MT neurons to multiple moving stimuli can show a bias toward the stimulus component with a higher signal strength such as motion coherence or luminance contrast (Xiao et al., 2014; Wiesner et al., 2020), a directional side bias toward one of two motion directions, even when the stimulus components have the same signal strength (Xiao and Huang, 2015), and a disparity bias toward one of two surfaces moving at different stereoscopic depths (Chakrala et al., 2024). These different response biases enhance the representation of individual stimulus components and can help to facilitate the segmentation of multiple moving stimuli.

While the faster-speed bias reported in this study may facilitate the segregation of faster-moving stimuli, it may come at the cost of reduced ability to segregate slower speeds. Why does the primate visual system encode multiple speeds in this way? An efficient way to represent sensory information is to devote limited resources to better represent signals that occur more frequently in the natural environment (Attneave 1954; Barlow 1961; Simoncelli and Olshausen 2001). Previous studies have suggested that slow speeds are more likely to occur than fast speeds in natural scenes (Weiss et al., 2002; Stocker and Simoncelli, 2006; Zhang and Stocker, 2022). If neurons in the primate visual cortex are optimized to efficiently represent speeds that are more likely to occur in natural scenes, one may expect to find neurons showing a slower-speed bias rather than a faster-speed bias. However, besides maximizing information about the environment, neural representation in the sensory cortices may be optimized for other goals, such as maximizing the performance of certain behavioral tasks (Simoncelli and Olshausen, 2001; Manning et al., 2024). Since a figural object tends to move faster than its background in natural scenes (Huang et al., 2019), a neural representation of multiple motions with a faster-speed bias would help to identify the figure and, therefore, benefit the performance of an essential behavioral task – figure-ground segregation. Our finding of a faster-speed bias at slow stimulus speeds underscores the possibility that, when choosing between efficiently representing the most commonly occurring features in natural scenes (e.g., slow speeds) and enhancing behavioral performance in critical tasks (e.g., figure-ground segregation), some brain areas in the visual system may prioritize representing the stimulus features that enhance behavioral performance.

Potential mechanisms underlying the neural encoding of multiple speeds

We found that the faster-speed bias was still present when attention was directed away from the RFs, suggesting that the faster-speed bias cannot be explained by an attentional modulation. The faster-speed bias cannot be explained by the apparent contrast of the stimulus component either – the random dots of the faster-speed component had shorter dwell time on the video display and appeared dimmer than the slower component. We suggest a modified normalization model that may explain why the faster-speed bias in MT occurs at low stimulus speeds and diminishes at high speeds.

Previous studies that characterize the neural representation of multiple stimuli have used stimulus strength to weigh the component responses in the divisive normalization model (e.g., Busse et al., 2009; Ni et al., 2012; Xiao et al., 2014; Heuer and Britten, 2002). In comparison to the standard normalization model, we suggest that the response of a population of neurons (i.e., the weighting pool) defines the numerator of the normalization equation (Eq. 9). The weighting pool may or may not be the same as the normalization pool that defines the response in the denominator. We suggest that the weighting pool contains a population of neurons with a broad range of speed preferences. In this way, the summed (or averaged) response of the weighting pool depends mainly on the stimulus speed, and therefore the weighting is less sensitive to the individual neuron’s speed preference. In this study, we used MT population-averaged responses to single speeds to approximate the responses of the weighting pool. MT population-averaged speed tuning in our data peaked around 20°/s (Fig. 8E), consistent with previous studies (Maunsell and Van Essen 1983; Lisberger and Movshon, 1999; Nover et al., 2005; Huang and Lisberger, 2009). At stimulus speeds less than 20°/s, the population speed tuning has a positive slope, and a faster component would elicit a stronger population response than a slower component. This insight explains the faster-speed bias at low stimulus speeds and why the faster-speed bias tends to be stronger at x4 than x2 speed separation. Conceptually, this model can also explain why faster-speed bias diminishes at higher speeds because, when two stimulus speeds are at opposite sides of the population’s preferred speed, they elicit similar population responses in the weighting pool. This model also predicts that when both stimulus speeds are higher than the preferred speed of the weighting pool, the response weight for the slower component should be higher than that for the faster component.

This modified normalization model provided a good description of our data on MT responses to two stimuli moving in different directions at 2.5 and 10°/s (Fig. 8). However, our current data set is limited in its ability to validate this model fully. The normalization model (Eq. 9) can also describe reasonably well our data on MT responses to the bi-speed stimuli moving in the same direction across five speed pairs (Fig. 5) (results not shown). However, since the responses of each neuron to the bi-speed stimuli only have five data points (see Fig. 3) and our model has four free parameters (Eq. 9), the model is underconstrained. In future work, it will be important to extend the experiment to include pairs of stimuli moving in different directions at varying combinations of speeds across a broader range. By incorporating full direction tuning curves (as shown in Fig. 8) to better constrain the model, and systematically varying speed combinations, such a study could test the model’s prediction that the response bias shifts from a faster-speed bias to response averaging and eventually to a slower-speed bias, as overall stimulus speeds increase.

Although we used the responses of a population of MT neurons to estimate the responses of the weighting pool in our model, it is possible that the weighting pool is composed of neurons that feed signals into MT and have population-averaged speed tuning similar to that of MT neurons. MT neurons receive feedforward motion-selective input mainly from V1, and also from V2 and V3 (Ungerleider and Desimone, 1986; Movshon and Newsome, 1996; Anderson et al., 1998; Anderson and Martin, 2002; Rockland, 2002). Speed-selective complex cells in V1 have preferred speeds in a range similar to that of MT neurons, but the mean preferred speed is slower than in MT (Mikami et al., 1986; Orban et al., 1986; Priebe et al., 2006). In future work, examination of the transition speeds at which faster-speed bias changes to response averaging and slower-speed bias may help to differentiate whether the weighting pool consists of neurons in MT or early visual areas such as V1.

Decoding multiple speeds from population neural responses

Theoretical studies have proposed neural coding of probability distribution and multiplicity of a visual attribute (Pouget et al., 2000). The key idea of this framework is that neurons are not coding a single stimulus value but instead coding the distribution of the stimulus (Zemel et al., 1998; Pouget et al., 2003). However, neurophysiological evidence supporting this framework on coding multiplicity is limited. Previous studies have not demonstrated the ability to extract multiple speeds from population neural responses. Our results provide experimental support for this framework of coding multiplicity. Our decoding analysis reveals that the population neural response in MT carries information about multiple speeds of overlapping stimuli, and it is possible to extract multiple speeds and their weights even when the population neural response has a unimodal distribution.

At large (x4) speed separation, our decoding results captured several key features of human and monkey perception of multiple speeds – the decoded speeds support perceptual segmentation at low to intermediate speeds (Figs. 11A-E, 12A-K). At 20 and 80°/s, the decoder was uncertain about whether a single speed or two speeds were present in the visual stimuli and, therefore, had difficulty segmenting the visual stimuli at these fast speeds (Suppl. Fig. 3). However, at small (x2) speed separation, the decoding results showed very little segmentation (Figs. 11G-J, 12L), except at very low speeds. This result differs from the perception at stimulus speeds less than 20 and 40°/s (Figs. 1C2, E, 2B2). What are the potential reasons for the decoder’s inadequacy in segmenting small speed separations? The best-fit population response predicted by the encoding rule of the decoder matched the neural responses remarkably well (R² > 0.99 for all five speed pairs of x2 separation, Fig. 10F-J). The encoding model used for decoding therefore described the population neural responses to the bi-speed stimuli well. Because we found the same results when performing decoding based on neural responses averaged across experimental trials (Fig. 10G-J), this inadequacy was unlikely to be due to our assumption that the trial-by-trial response variability follows a Poisson process, or to the lack of consideration of noise correlations (Zohary et al., 1994; Huang and Lisberger, 2009). We consider several factors that may contribute to this discrepancy.

First, this may be attributed to the limited sample size of our dataset. If we had a much larger MT neuron population, potential differences in neuronal responses to bi-speed stimuli and the single log-mean speed might be captured by the data, which may lead to better decoding. Second, it may be due to the choice of the “objective function”. Our decoder minimized the sum squared error between the predicted population response and the recorded neural response. In contrast, Zemel et al. (1998) found motion directions that maximized the posterior probability P(s|r) using a maximum a posteriori (MAP) estimate. It remains to be determined whether maximizing the posterior probability can improve the resolution of segmenting multiple speeds. Third, neuronal responses in different sensory areas, including MT, to two stimuli can fluctuate from trial to trial between representing one stimulus component and the other (Li et al., 2016; Caruso et al., 2018; Jun et al., 2022; Schmehl et al., 2024; Groh et al., 2024). If this trial-varying stimulus multiplexing also occurred for representing two speeds with a small separation, information about individual speed components would be lost in the trial-averaged responses (with added variability based on a Poisson process), as in our decoding procedure. Future studies with a large number of repeated experimental trials would be needed to test this possibility. Finally, while area MT is clearly important for motion-based segmentation, other motion-sensitive brain areas may be important for segmenting speeds with a small separation.

Materials and methods

We conducted psychophysical experiments using human subjects, and psychophysical and neurophysiological experiments using macaque monkeys.

Human psychophysics

Subjects

Four adult human subjects (CN, CO, IN, NP), two men and two women, with normal or corrected-to-normal visual acuity participated in the psychophysics experiments. Subject CN was naive about the purposes of the experiments. Subjects CO and IN had a general idea about this study but did not know the specific design of the experiments. Informed consent was obtained from the subjects. All aspects of the study were in accordance with the principles of the Declaration of Helsinki and were approved by the Institutional Review Board at the University of Wisconsin-Madison.

Apparatus

Visual stimuli were generated by a Linux workstation using an OpenGL application and displayed on a 19-inch CRT monitor. The monitor had a resolution of 1,024 × 768 pixels and a refresh rate of 100 Hz. The output of the video monitor was measured with a photometer (LS-110, Minolta) and was gamma-corrected. Stimulus presentation was controlled by a real-time data acquisition and stimulus control program “Maestro” (https://sites.google.com/a/srscicomp.com/maestro/) as in the animal behavior and neurophysiology experiments. Subjects viewed the visual stimuli in a dark room with dim background illumination. The viewing distance was 58 cm. A chin rest and forehead support were used to restrict the head movements of the observers. During experimental trials, human subjects maintained fixation on a small spot within a 2 × 2° window. Eye positions were monitored using a video-based eye tracker (EyeLink, SR Research) at a rate of 1 kHz.

Visual stimuli

Visual stimuli were two spatially overlapping random-dot patches presented within a square aperture 10° wide. Each square stimulus was centered 11° to the right of the fixation spot, therefore covering 6° to 16° eccentricity. This range roughly matched the RF eccentricity of the recorded MT neurons in our neurophysiological experiments. The random dots were achromatic. Each random dot was 3 pixels on a side and had a luminance of 15.0 cd/m². The background luminance was 0.03 cd/m². The dot density of each random-dot patch was 2 dots/degree². The two random-dot patches translated horizontally in the same direction. To reduce adaptation, the motion direction was leftward in half of the trials and rightward in the other half, and stimulus trials were randomly interleaved. In one set of trials, two overlapping random-dot patches had a “large speed separation” and the speed of the faster component was always four times (x4) that of the slower component. In another set of trials, visual stimuli had a “small speed separation” and the speed of the faster component was always twice (x2) that of the slower component (see Fig. 1B1, B2). For each bi-speed stimulus, there was a corresponding single-speed stimulus composed of two overlapping random-dot patches moving in the same direction at the same speed. The single speed was the natural logarithmic (log) mean speed of the bi-speed stimulus: $\exp\{[\ln(\mathrm{Spd1}) + \ln(\mathrm{Spd2})]/2\}$, in which Spd1 and Spd2 were the two component speeds. The motion coherence of each random-dot patch was always 100%.
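As an illustration of the log-mean speed, the short Python sketch below computes it for the x4 and x2 speed pairs listed later in the Methods. It assumes the log mean is the exponential of the average of the natural logarithms of the two component speeds (equivalently, their geometric mean); the function name `log_mean_speed` is ours and is not taken from the experimental software.

```python
import math

def log_mean_speed(spd1, spd2):
    """Mean of two speeds taken on a natural-log axis (the geometric mean)."""
    return math.exp((math.log(spd1) + math.log(spd2)) / 2.0)

# Slower component speeds (deg/s) of the five speed pairs
slow_speeds = [1.25, 2.5, 5.0, 10.0, 20.0]

for ratio in (4, 2):  # x4 = large separation, x2 = small separation
    for slow in slow_speeds:
        fast = slow * ratio
        single = log_mean_speed(slow, fast)
        print(f"x{ratio}: {slow:g} and {fast:g} deg/s -> single speed {single:.2f} deg/s")
```

At x4 separation the log-mean speed is simply twice the slower speed; at x2 separation it is the slower speed multiplied by the square root of 2.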

Procedure

In a standard two-alternative-forced-choice (2AFC) task, subjects discriminated a bi-speed stimulus from the corresponding single log-mean speed stimulus. The bi-speed and single-speed stimuli were presented in two consecutive time intervals with a 500 ms gap in between, in random, balanced order. In each time interval, the visual stimulus appeared, remained stationary for 250 ms, and then moved for 500 ms. At the end of each trial, subjects reported which time interval contained a bi-speed stimulus by pressing one of two buttons (left or right) within a 1500-ms window. After the button press, the inter-trial interval was 1300 ms. Each block of trials contained 40 trials, i.e. 5 speed pairs × 2 speed separations × 2 temporal orders (the bi-speed stimulus appeared in the first or second time-interval) × 2 motion directions (visual stimuli moved either to the left or right). Each experimental session typically contained 5 blocks, i.e. 200 trials.
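The block structure described above can be sketched as follows. This is only an illustration of the 5 × 2 × 2 × 2 factorial design; the variable names, the dictionary layout, and the use of a seeded shuffle to interleave trials are our assumptions, not a description of the Maestro trial scheduler.

```python
import itertools
import random

SLOW_SPEEDS = [1.25, 2.5, 5.0, 10.0, 20.0]                 # slower component (deg/s)
SPEED_RATIOS = [4, 2]                                      # x4 and x2 separations
TEMPORAL_ORDERS = ["bi-speed first", "bi-speed second"]
DIRECTIONS = ["left", "right"]

def make_block(rng):
    """Return one block: every combination of the four factors, in random order."""
    trials = [
        {"slow": slow, "fast": slow * ratio, "ratio": ratio,
         "order": order, "direction": direction}
        for slow, ratio, order, direction in itertools.product(
            SLOW_SPEEDS, SPEED_RATIOS, TEMPORAL_ORDERS, DIRECTIONS)
    ]
    rng.shuffle(trials)  # assumption: trials are interleaved by shuffling within a block
    return trials

rng = random.Random(0)
block = make_block(rng)
session = [trial for _ in range(5) for trial in make_block(rng)]
print(len(block), len(session))  # 40 200
```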

Subjects also performed a 3AFC task. As in the 2AFC task, subjects discriminated a bi-speed stimulus from the corresponding single log-mean speed stimulus but had the option to make a third choice by pressing the middle button on trials when they thought neither stimulus interval appeared to contain two speeds (“no two-speeds” choice, NTC). When subjects thought one of the two stimulus intervals contained two speeds, they pressed either the left or the right button to indicate which interval had two speeds.

Data analysis

The hit rate was calculated as the percentage of trials in which a subject correctly picked the bi-speed stimulus as having two speeds. The false alarm rate was calculated as the percentage of trials in which a subject incorrectly picked the single-speed stimulus as having two speeds. As a measure of discriminability between the bi-speed and the corresponding single-speed stimuli, we calculated the discriminability index d′ = norminv(hit rate) – norminv(false alarm rate). norminv is a MATLAB function that calculates the inverse of the normal cumulative distribution function, with the mean and standard deviation set to 0 and 1, respectively. When the hit or false alarm rate was occasionally close to 1, to avoid infinite d’ values, d’ was calculated using a modified formula: d’ = norminv{[(100 x hit rate)+1]/102} - norminv{[(100 x false alarm rate) +1]/102}. In analyzing the results of the 3AFC task, we incorporated the NTC trials into the d’ calculation by evenly splitting the NTC trials into “hit” trials and “false alarm” trials. In this way, the NTC trials were still accounted for by the hit rate and false alarm rate, in the sense that they did not contribute to the discrimination. We also examined the percentage of trials in which subjects made the NTC choice at different stimulus speeds.
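The d’ calculation can be sketched in Python as below. The sketch uses `statistics.NormalDist().inv_cdf` from the standard library as a stand-in for MATLAB's norminv with mean 0 and standard deviation 1. Rates are expressed as fractions between 0 and 1. The function names and the way the NTC split is written (half of the NTC trials added to the hit count and half to the false alarm count) are our reading of the description above, not the original analysis code.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF, the counterpart of norminv(p, 0, 1)
_norminv = NormalDist(mu=0.0, sigma=1.0).inv_cdf

def d_prime(hit_rate, fa_rate, corrected=False):
    """d' = norminv(hit rate) - norminv(false alarm rate).

    With corrected=True, both rates are first mapped through
    (100 * rate + 1) / 102, which keeps d' finite when a rate is 0 or 1.
    """
    if corrected:
        hit_rate = (100.0 * hit_rate + 1.0) / 102.0
        fa_rate = (100.0 * fa_rate + 1.0) / 102.0
    return _norminv(hit_rate) - _norminv(fa_rate)

def rates_3afc(n_hit, n_fa, n_ntc):
    """Hit and false alarm rates with NTC trials split evenly (assumed reading)."""
    n_total = n_hit + n_fa + n_ntc
    half = n_ntc / 2.0
    return (n_hit + half) / n_total, (n_fa + half) / n_total

# Example with made-up counts: 30 hits, 4 false alarms, 6 NTC choices
hit, fa = rates_3afc(30, 4, 6)
print(round(d_prime(hit, fa), 2))
```

Under this reading, a condition in which the subject always chose NTC gives hit and false alarm rates of 0.5 each and therefore d’ = 0.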

Neurophysiological and psychophysical experiments

Subjects

Five male adult rhesus monkeys (Macaca mulatta) were used in the experiments. Four monkeys were used in the neurophysiological experiments, and one was used in the psychophysical experiment. Experimental protocols were approved by the local Institutional Animal Care and Use Committee and were in strict compliance with U.S. Department of Agriculture regulations and the National Institutes of Health Guide for the Care and Use of Laboratory Animals.

Apparatus and electrophysiological recording

Procedures for surgical preparation and electrophysiological recording were routine and similar to those described previously (Huang and Lisberger, 2009; Xiao et al., 2014). For subjects IM and MO, horizontal and vertical eye positions were monitored using the search coil method at a sampling rate of 1 kHz on each channel. For subjects RG, GE, and BJ, eye positions were monitored using a video-based eye tracker (EyeLink, SR Research) at a rate of 1 kHz. For electrophysiological recordings, we lowered single-contact tungsten microelectrodes (Thomas Recording or FHC) into the posterior bank of the superior temporal sulcus using either the MiniMatrix microdrive (Thomas Recording) or the NAN drive (NAN Instruments). The impedances of the electrodes were 1–3 MΩ. We identified area MT by its characteristically large proportion of directionally selective neurons, small classical RFs relative to those in the neighboring medial superior temporal area, and location on the posterior bank of the superior temporal sulcus. Electrical signals were filtered, amplified, and digitized conventionally. Single units were identified with a real-time template-matching system (Plexon). Spikes were carefully sorted using the Plexon Offline Sorter.

Stimulus presentation and the behavioral paradigm were controlled by a real-time data acquisition program Maestro as described in the human psychophysics experiment. For neurophysiological recordings from IM and MO, visual stimuli were presented on a 20-inch CRT monitor at a viewing distance of 38 cm. Monitor resolution was 1,280 × 1,024 pixels and the refresh rate was 85 Hz. For RG, GE, and BJ, visual stimuli were presented on a 25-inch CRT monitor at a viewing distance of 63 cm. Monitor resolution was 1,024 × 768 pixels and the refresh rate was 100 Hz. Visual stimuli were generated by a Linux workstation using an OpenGL application that communicated with the main experimental-control computer over a dedicated Ethernet link. The output of the video monitor was gamma-corrected.

Visual stimuli and experimental procedure of the main experiment

All visual stimuli were presented in individual trials while monkeys maintained fixation. Monkeys were required to maintain fixation within a 1.5 × 1.5° window centered around a fixation spot during each trial to receive juice rewards, although actual fixation was typically more accurate. In a trial, visual stimuli were illuminated after the animal had acquired fixation for 200 ms. To assist the isolation of direction-selective neurons in area MT, we used circular translation of a large random-dot patch (30 × 30°) as a search stimulus (Schoppmann and Hoffmann, 1976). After an MT neuron was isolated, we characterized the direction tuning by randomly interleaved trials of 30 × 30° random-dot patches moving at 10°/s in eight different directions from 0 to 315° in 45° steps. Next, we mapped the RF by recording responses to a series of 5 × 5° patches of random dots that moved in the preferred direction of the neuron at 10°/s. The location of the patch was varied randomly to tile the screen in 5° steps without overlap and to cover an area of either 40 × 30° or 35 × 25°. The raw map of the RF was interpolated using the MATLAB function interp2 at an interval of 0.5° and the location giving rise to the highest firing rate was taken as the center of the RF. In the following experiments, testing stimuli were centered on the RF.

Monkeys IM and MO were tested with the main visual stimuli used in our experiments, which were two spatially overlapping random-dot patches presented within a square aperture 10° wide. The random dots were achromatic. The dot density of each random-dot patch was 2 dots/deg². Each random dot was 3 pixels on a side and had a luminance of 15.0 cd/m². The background luminance was < 0.2 cd/m². In each trial, the random dots moved within the aperture. The two random-dot patches translated at two different speeds at 100% motion coherence and in the same direction (the preferred direction of the recorded neuron). The ratio between the two component speeds was fixed either at 4 (i.e. the large speed separation) or 2 (i.e. the small speed separation) (see Methods for human psychophysics above). At x4 speed separation, the five speed pairs used were 1.25 and 5°/s, 2.5 and 10°/s, 5 and 20°/s, 10 and 40°/s, and 20 and 80°/s (Fig. 1B1). At x2 speed separation, the speed pairs used were 1.25 and 2.5°/s, 2.5 and 5°/s, 5 and 10°/s, 10 and 20°/s, and 20 and 40°/s (Fig. 1B2). Experimental trials of bi-speed stimuli that had x4 or x2 speed separations were randomly interleaved. Also randomly interleaved were trials that showed only a single random-dot patch moving at a speed of 1.25, 2.5, 5, 10, 20, 40, or 80°/s, which were the individual stimulus components of the bi-speed stimuli.

Monkeys RG and GE were tested with a variation of the main visual stimuli, in which two overlapping random-dot stimulus components moved at two fixed speeds of 2.5 and 10°/s, respectively, and in two different directions separated by 90°. The diameter of the stimulus aperture was 3°. The faster component moved at the clockwise side of the two component directions (illustrated in Fig. 8). We varied the vector average direction of the two stimulus components across 360° in a step of 15° to characterize the direction-tuning curves of MT neurons. We also measured the direction-tuning curves to a single stimulus moving at the individual component speeds.

Behavioral paradigm and visual stimuli of attention control

Monkey RG was also tested in a control experiment in which the attention of the animal was directed away from the RFs of MT neurons. The attended stimulus was a random-dot patch moving in a single direction at 100% motion coherence within a stationary circular aperture that had a diameter of 5°. The stimulus patch was centered 10° to the left of the fixation spot, in the visual hemifield contralateral to the hemifield of the recorded MT neurons’ RFs. The monkey performed a fine direction-discrimination task to report whether the attended stimulus moved clockwise or counter-clockwise relative to the vertical direction. While the animal fixated on a point at the center of the monitor, both the attended stimulus and the RF stimulus were turned on and remained stationary for 250 ms before they moved for 500 ms. The attended stimulus translated at a speed of 10°/s and in a direction either clockwise or counter-clockwise from an invisible vertical (upward) direction by an offset of 10°, 15°, or 20°. The RF stimuli were the same as our main visual stimuli, with either a single-speed or bi-speed stimulus moving in the same direction. All trials were randomly interleaved. After the motion period, all the visual stimuli were turned off, and two reporting targets appeared 10° eccentric on the left and right sides of the fixation point. To receive a juice reward, the animal was required to make a saccadic eye movement within 400 ms after the fixation spot was turned off, either to the left or right target when the motion direction of the attended stimulus was counter-clockwise or clockwise from the vertical direction, respectively.

Monkey psychophysics

Monkey BJ was trained to perform a 2AFC discrimination task. The visual stimuli were the same as our main visual stimuli in the neurophysiological experiments except that the stimulus moving at a single speed was also composed of two overlapping random-dot patches moving in the same direction at the same speed, the same as in the human psychophysics experiments. In this way, the single-speed stimulus and the bi-speed stimuli had the same dot density. Visual stimuli were random-dot patches moving within a square aperture of 10° × 10°, centered 10° to the right of the fixation spot. The motion direction of the visual stimuli was always rightward. Experimental trials of bi-speed stimuli that had x4 or x2 speed separations, as well as the single-speed stimulus that moved at the log-mean speed of the bi-speed stimuli, were randomly interleaved. Visual stimuli were turned on and remained stationary for 250 ms before they moved for 500 ms. Following the stimulus offset, two reporting targets (dots) were presented 5.7° away from the fixation spot, at upper right (4°, 4°) and lower left (−4°, −4°) positions relative to the fixation spot. To receive a juice reward, the animal was required to make a saccadic eye movement to one of the two targets within 300 ms after the fixation spot was turned off. In a majority of the experimental trials, the animal received juice rewards if selecting the upper-right target when visual stimuli moved at two different speeds and selecting the lower-left target when visual stimuli moved at a single speed. Guided by our human psychophysics results, we made an exception to always reward the animal when the bi-speed stimuli moved at 20 and 80°/s or at 20 and 40°/s, regardless of which target was selected, to avoid biasing the monkey’s choice by veridically rewarding the animal. This was because, at these fast speeds, human subjects could not segment the bi-speed stimuli. During training, the animal was never presented with the bi-speed stimuli of 20 and 80°/s, and 20 and 40°/s. During testing, the trials of 20 and 80°/s, and 20 and 40°/s were randomly interleaved with bi-speed and single-speed trials that were rewarded veridically to anchor the task rule. Among all testing trials, only 10% of the trials were rewarded with a 100% rate. We collected 50 trials of data for x4 speed separation across 5 experimental sessions, and 90 trials for x2 speed separation across 9 sessions during the testing phase. The hit rate, false alarm rate, and the d’ were calculated in the same way as in the human psychophysics experiments.

Model fit of the tuning curves to bi-speed stimuli

We used a linear weighted summation model (Eq. 8) to fit the direction-tuning curves to overlapping stimuli moving in different directions and at different speeds. We also fitted the direction-tuning curves to the bi-speed/bi-directional stimuli using a modified divisive normalization model (Eq. 9). These model fits were obtained using the constrained minimization tool “fmincon” (MATLAB) to minimize the sum of squared error. To evaluate the goodness of fit of models for the response tuning curves, we calculated the percentage of variance (PV) accounted for by the model as follows: $\mathrm{PV} = 100 \times (1 - \mathrm{SSE}/\mathrm{SST})$, where SSE is the sum of squared errors between the model fit and the neuronal data, and SST is the sum of squared differences between the data and the mean of the data (Morgan et al., 2008).
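A minimal Python sketch of the goodness-of-fit measure is given below. It assumes the conventional form PV = 100 × (1 − SSE/SST), built from the definitions of SSE and SST given above; the function name and the example numbers are ours.

```python
def percent_variance(data, fit):
    """Percentage of variance accounted for by a model fit.

    data: measured responses (e.g., firing rates at each stimulus direction)
    fit:  model predictions at the same stimulus values
    """
    mean = sum(data) / len(data)
    sse = sum((d - f) ** 2 for d, f in zip(data, fit))   # model error
    sst = sum((d - mean) ** 2 for d in data)             # total variance about the mean
    return 100.0 * (1.0 - sse / sst)

# Made-up tuning-curve values for illustration only
responses = [5.0, 12.0, 30.0, 22.0, 8.0]
model_fit = [6.0, 11.0, 28.0, 24.0, 8.0]
print(round(percent_variance(responses, model_fit), 1))
```

A perfect fit gives PV = 100, and a model that predicts only the mean of the data gives PV = 0.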

Construction of population neural response

For each recorded MT neuron, we plotted the trial-averaged speed tuning curve in response to the single speeds and spline-fitted the tuning curve using the MATLAB function csaps with the smoothing parameter p set to 0.93. We found that p = 0.93 best captured the trend of the speed tuning, without obvious overfitting. We then found the preferred speed (PS) of the neuron, which is the speed at which the maximum firing rate was reached in the spline-fitted tuning curve. The neuron’s responses to all single-speed and bi-speed stimuli were normalized by the maximum firing rate at the PS. To construct the population neural response to a given stimulus, we took the normalized firing rate of each neuron elicited by that stimulus and plotted it against the PS of the neuron. Because the PSs of the neurons in our data sample did not cover the full speed range evenly, we spline-fitted (with a smoothing parameter of 0.93) the population neural response to sample it evenly across the full range of PS.

Discrimination of population neural responses using a classifier

We trained a linear classifier to discriminate the constructed population neural responses to a bi-speed stimulus from those to the corresponding single-speed stimulus moving at the log-mean speed. Constructed trial-by-trial population responses were generated randomly according to a Poisson process with the mean set to the recorded neuronal response averaged across experimental trials.

For each speed combination, we generated 200 trials of responses to the bi-speed stimuli and the corresponding single-speed stimulus, respectively. Constructed population responses were partitioned into training and testing sets using k-fold cross-validation (k = 40). The 200 generated trials were randomly divided into 40 folds. The classifier was trained on 39 data folds and tested on the remaining fold, and the process was repeated 40 times to ensure that each fold was used for testing exactly once. The MATLAB fitclinear function was used to fit a linear classifier to the training data. The logistic learner and lasso regularization were specified during the model training. The Stochastic Gradient Descent solver was used to optimize the objective function during the training of the classifier. The performance of the classifier was evaluated by d’, calculated using the hit rate and false alarm rate as described in human psychophysics.
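The trial generation and fold partitioning can be sketched in Python as follows. The sketch covers only the Poisson sampling and the 40-fold split; the classifier itself (fitclinear with a logistic learner and lasso regularization) is not reproduced. The mean responses are made-up numbers for a hypothetical four-neuron population, Knuth's multiplication method is our choice of Poisson sampler, and partitioning 200 trial indices into 40 folds of 5 is our reading of the description.

```python
import math
import random

def poisson_sample(mean, rng):
    """Draw one Poisson count using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_trials(mean_responses, n_trials, rng):
    """One row per trial, one Poisson count per neuron."""
    return [[poisson_sample(m, rng) for m in mean_responses] for _ in range(n_trials)]

def k_fold_indices(n_trials, k, rng):
    """Randomly partition trial indices into k folds of equal size."""
    order = list(range(n_trials))
    rng.shuffle(order)
    return [order[fold::k] for fold in range(k)]

rng = random.Random(1)
mean_bi = [4.0, 7.5, 9.0, 6.0]        # hypothetical trial-averaged responses, bi-speed
mean_single = [3.0, 8.0, 10.0, 5.0]   # hypothetical responses, single log-mean speed
bi_trials = simulate_trials(mean_bi, 200, rng)
single_trials = simulate_trials(mean_single, 200, rng)

folds = k_fold_indices(200, 40, rng)
test_fold = folds[0]
train_idx = [i for fold in folds[1:] for i in fold]
# A classifier would be fit on the trials in train_idx and scored on test_fold,
# repeating with each of the 40 folds held out once.
print(len(folds), len(test_fold), len(train_idx))  # 40 5 195
```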

Population Decoding

We define a given probability distribution of stimulus speed as: ∅m = {Pm,j}, in which Pm,j is the probability of speed Sj, j = 1, 2, 3, …, 121, and j evenly samples speeds from 1.25°/s to 80°/s (referred to as the “full speed range”) in a natural logarithm scale and at a “speed interval” of 0.0347. Because ∅m is a probability distribution, ∑j Pm,j = 1. m is an index for different distributions.
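The speed axis defined above can be reproduced with a few lines of Python, which also confirms that 121 evenly spaced values between ln(1.25) and ln(80) give a step of about 0.0347. The variable names are ours.

```python
import math

N_SPEEDS = 121
LOW_SPEED, HIGH_SPEED = 1.25, 80.0   # deg/s, the "full speed range"

# Step between neighboring speeds on the natural-log axis
speed_interval = (math.log(HIGH_SPEED) - math.log(LOW_SPEED)) / (N_SPEEDS - 1)

# S_j for j = 1..121 (list index j - 1)
speeds = [math.exp(math.log(LOW_SPEED) + j * speed_interval) for j in range(N_SPEEDS)]

print(round(speed_interval, 4))  # 0.0347
```

Because ln(80/1.25) = ln(64) = 6 ln(2), each factor of 2 in speed spans exactly 20 speed intervals, so all component speeds used in the experiments (1.25, 2.5, 5, 10, 20, 40, and 80°/s) fall on the grid.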

The estimated response (ES) of neuron i to stimulus speeds with a probability distribution ∅m is a linear sum of the responses of neuron i to each single speed Sj within the full speed range, weighted by the probability of each speed in ∅m. The probability can also be considered the weight (signal strength) of the speed:

$$ES_i(∅_m) = \sum_{j=1}^{121} P_{m,j}\, f_i(S_j)$$

where fi is the spline-fitted speed tuning curve of neuron i in response to single speeds. The estimated population response (EP) of N neurons to ∅m is:

$$EP_m(\ln(PS_i)) = ES_i(∅_m)$$

where PSi is the preferred speed of neuron i, i = 1, 2, 3, …, N. N = 100 in our neural data.
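The weighted sum described above can be illustrated with a small Python sketch. The log-Gaussian tuning curve below is a hypothetical stand-in for the spline-fitted tuning curve fi of a recorded neuron, and the function names are assumptions.

```python
import math


def log_gaussian_tuning(preferred_speed, width=1.0):
    """Toy tuning curve peaking at 1.0 when speed equals preferred_speed."""
    def f(speed):
        distance = (math.log(speed) - math.log(preferred_speed)) / width
        return math.exp(-0.5 * distance ** 2)
    return f


def estimated_response(tuning_curve, speeds, probabilities):
    """Sum of single-speed responses weighted by the probability of each speed."""
    return sum(p * tuning_curve(s) for s, p in zip(speeds, probabilities))


def estimated_population_response(preferred_speeds, speeds, probabilities):
    """List of (ln PS_i, estimated response of neuron i) pairs."""
    return [(math.log(ps),
             estimated_response(log_gaussian_tuning(ps), speeds, probabilities))
            for ps in preferred_speeds]


speeds = [5.0, 10.0, 20.0]
single = estimated_response(log_gaussian_tuning(10.0), speeds, [0.0, 1.0, 0.0])
mixed = estimated_response(log_gaussian_tuning(10.0), speeds, [0.5, 0.0, 0.5])
```

Here `single` is the response to the preferred speed alone, and `mixed` is the equally weighted average of the responses to 5°/s and 20°/s.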

We then spline-fitted the estimated population response EPm(ln (PSi)) using a smoothing parameter of 0.93 and interpolated the PS to 121 evenly spaced values on a natural-logarithm scale within the full speed range from 1.25°/s to 80°/s. The spline-fitted estimated population response is represented as spEPm(ln (PSj)), j = 1, 2, 3, …, 121.

Similarly, we spline-fitted the recorded and normalized population neural response RPm(ln (PSi)), i = 1, 2, 3, …, 100, and interpolated the PS to the same 121 values on a logarithmic scale within the full speed range as above. The spline-fitted, recorded population neural response is represented as spRPm(ln (PSj)), j = 1, 2, 3, …, 121.

The decoded probability distribution of the stimulus speed ∅e is the ∅m that maximizes the objective function (OF), which is defined as the negative of the sum of squared errors (SSE) between the spline-fitted estimated population response and the spline-fitted recorded population neural response:

$$OF_m = -\sum_{j=1}^{121} \left[ spEP_m(\ln(PS_j)) - spRP_m(\ln(PS_j)) \right]^2$$

Rather than finding an arbitrary distribution, we constrained ∅e to contain either a single speed with a probability (referred to as the “weight”) of 1 or two speeds with the same or different weights that sum to 1.

Algorithm to search for the probability distribution of stimulus speed

We first searched for the best-fit distribution ∅e1 that contained a single speed SP with non-zero probability (P = 1) and gave rise to the maximum OF across the full speed range (OFmax1). We next searched for the best-fit distribution ∅e2 that contained two speeds, SP1 and SP2, with non-zero probabilities and gave rise to the maximum OF for two speeds (OFmax2). We varied the speed separation, the center position, and the probabilities of the two speeds. For each speed separation and center position, the probabilities of SP1 and SP2 were varied from 0 to 1 in steps of 0.01, with the constraint that they summed to 1. We searched the speed separation, ln(SP2) - ln(SP1), from 0.0693 (i.e., 2 speed intervals) to 3.3271 (i.e., 96 speed intervals) in steps of 0.0693. The search range covered speed ratios SP2/SP1 from x1.07 to x27.86, substantially broader than the x2 and x4 separations used in our visual stimuli.

For each speed separation, we started the search where the center position of the two speeds, [ln(SP1) + ln(SP2)]/2, was in the middle of the 121 possible speed values, referred to as the “speed axis”. We then moved the center position toward the left border, ln(1.25), in steps of 0.0347 to find the maximum OF value (OFleftmax) along the left half of the speed axis. If the OF value at the center position next to the current position was higher, the search moved to that position. Otherwise, the current position was considered a local maximum. After finding a local maximum, the search continued in the same direction for up to another 30 speed intervals, stopping when one of the component speeds hit a border, 30 intervals were reached, or an OF value greater than the previous local maximum was found. If a larger OF was found, the local maximum was updated, the search jumped to that position, and the procedure was repeated until OFleftmax was found. We then returned to the middle of the speed axis and searched speed pairs toward the right border, ln(80), to find the maximum OFrightmax.

The larger of OFleftmax and OFrightmax was the maximum OF for two speeds (OFmax2). The decoded distribution ∅e was either ∅e1 or ∅e2, whichever corresponded to the larger of OFmax1 and OFmax2.
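The search procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the code used in the study: the population response to a speed mixture is taken as the weighted sum of rows of a precomputed single-speed response table, the spline-fitting step is omitted so the SSE is computed directly across neurons, and the toy log-Gaussian population and all function names are hypothetical. Speed indices are 0-based.

```python
import math

N_SPEEDS = 121
LOG_MIN, LOG_MAX = math.log(1.25), math.log(80.0)
LOG_STEP = (LOG_MAX - LOG_MIN) / (N_SPEEDS - 1)  # about 0.0347


def objective(estimated, recorded):
    """Negative sum of squared errors between two population responses."""
    return -sum((e - r) ** 2 for e, r in zip(estimated, recorded))


def best_pair_at(table, recorded, i1, i2, weight_steps):
    """Best OF, and the weight of the slower speed, for a fixed speed pair."""
    best_of, best_w = -math.inf, 0.0
    for k in range(weight_steps + 1):
        w = k / weight_steps
        estimated = [w * a + (1.0 - w) * b for a, b in zip(table[i1], table[i2])]
        value = objective(estimated, recorded)
        if value > best_of:
            best_of, best_w = value, w
    return best_of, best_w


def climb(of_at, start, direction, lowest, highest, lookahead=30):
    """Hill-climb along the center position with a bounded look-ahead."""
    best_pos, best_val = start, of_at(start)
    improved = True
    while improved:
        improved = False
        for offset in range(1, lookahead + 1):
            pos = best_pos + direction * offset
            if pos < lowest or pos > highest:  # a component speed hit a border
                break
            value = of_at(pos)
            if value > best_val:  # exceeds the current local maximum: jump there
                best_pos, best_val, improved = pos, value, True
                break
    return best_pos, best_val


def decode(table, recorded, weight_steps=100, max_half=48):
    """Return (OF, [(speed index, weight), ...]) for the best one- or two-speed fit."""
    # Single speed: exhaustive search over the speed axis.
    of1, j1 = max((objective(table[j], recorded), j) for j in range(len(table)))
    best = (of1, [(j1, 1.0)])
    middle = (len(table) - 1) // 2
    for half in range(1, max_half + 1):  # separation = 2 * half speed intervals
        def of_at(center):
            return best_pair_at(table, recorded, center - half, center + half,
                                weight_steps)[0]
        lowest, highest = half, len(table) - 1 - half
        for direction in (-1, 1):  # left half, then right half of the speed axis
            center, value = climb(of_at, middle, direction, lowest, highest)
            if value > best[0]:
                _, w = best_pair_at(table, recorded, center - half, center + half,
                                    weight_steps)
                best = (value, [(center - half, w), (center + half, 1.0 - w)])
    return best


def toy_table(preferred_log_speeds, width=1.0):
    """Hypothetical single-speed responses: table[j][n] for speed j, neuron n."""
    return [[math.exp(-0.5 * ((LOG_MIN + j * LOG_STEP - ps) / width) ** 2)
             for ps in preferred_log_speeds] for j in range(N_SPEEDS)]
```

For example, with `table = toy_table(prefs)` and a noiseless `recorded` response built as an equal mixture of two rows of `table`, `decode(table, recorded)` returns the two speed indices and their weights. The default `weight_steps=100` corresponds to the 0.01 weight step, and `max_half=48` to the maximum separation of 96 speed intervals.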

Acknowledgements

We thank Dr. Steven Lisberger for his support in the early phase of this project, Emily Ausloos and Jianbo Xiao for data collection in early human psychophysics experiments, Bryce Arseneau for animal training, Ying Cao for collecting additional neural data, Drs. Jennifer Coonen and Kevin Brunner at the Wisconsin National Primate Research Center for excellent veterinary care and surgical assistance, Dr. Kechen Zhang for helpful suggestions on the study, and Drs. Emily Cooper and Greg DeAngelis for their valuable comments on the manuscript.

Supplementary Materials and Figures

1. Behavioral performance of the fine-direction discrimination task and MT response properties when attention was directed away from MT neurons’ RFs

Monkey RG performed the fine-direction discrimination task with an average correct rate of 86.7 ± 7.3% (mean ± std) across 23 sessions and over 5000 trials. The correct rates for direction offsets of 10°, 15°, and 20° were 78.8 ± 9.7%, 87.5 ± 8.3%, and 93.9 ± 5.8%, respectively (see Methods).

Population-averaged speed tuning curves to bi-speed stimuli and constituent single-speed components recorded in an attention-away and a fixation paradigm.

Speed tuning curves from one monkey (RG), averaged across subgroups of neurons. A1-D1. 5 neurons that had PS ≤ 2.5°/s. A2-D2. 6 neurons that had PS between 2.5 and 25°/s. A3-D3. 21 neurons that had PS > 25°/s. Error bars represent ±STE. A1-A3 and B1-B3. x4 speed separation. C1-C3 and D1-D3. x2 speed separation. A1-A3 and C1-C3. Attention directed away from the RF. B1-B3 and D1-D3. Fixation paradigm.

2. Trial-by-trial readouts from population neural responses to single speeds

Trial-by-trial readout speeds decoded from population neural responses to single speeds.

The neural population contained 100 recorded neurons, as shown in Figure 9. The trial-by-trial responses were randomly generated based on a Poisson process, with the mean set to the spike count averaged across the recorded trials. Each row shows the readout speed(s) from one trial, and each dot’s size is proportional to the weight of the readout speed. If only one speed is decoded in a trial, that readout speed is shown in red. In trials with two readout speeds, the slower and faster readout speeds are shown in green and blue, respectively. The white background indicates trials in which the weight difference between the two readout speeds is less than 0.7; these trials are considered to have two readout speeds. The gray background indicates trials in which the weight difference is greater than 0.7; these trials are considered to have only one readout speed. The vertical black line and the speed marked in each panel indicate the stimulus speed. A-G. Speeds decoded from recorded population neural responses to single speeds from 1.25 to 80°/s. Note that, at the stimulus speed of 80°/s (G), in addition to picking up the veridical speed of 80°/s (log speed of 4.382), the decoder often picked up a slower speed of 2.872°/s (log speed of 1.055), which was at the largest speed separation from 80°/s used in our search algorithm (x27.86, log value of 3.327). This border effect can also be seen at the stimulus speed of 1.25°/s (A), in which a weaker and faster speed was sometimes picked up around 34.8°/s (log speed of 3.55). H-L. Speeds decoded from inferred population neural responses to single speeds, which are the log-mean speeds of the bi-speed stimuli with x2 speed separation. The responses to these log-mean speeds were obtained from the spline-fitted, trial-averaged speed tuning curve of each neuron. M. Comparison of the readout speeds and the stimulus speeds. The diagonal line is the unity line. The ordinate represents the speed at the peak of the readout speed distribution pooled across simulated trials (not shown). At the stimulus speed of 1.77°/s (H), the distribution of the readout speed has two peaks, indicated by a solid circle (at 1.77°/s) and an open circle (at 1.25°/s). At the stimulus speed of 80°/s (G), the distribution of the readout speed also has two peaks; only the readout speed for the higher peak is shown in M.

3. Analysis of decoded speeds of the bi-speed stimulus with the fastest speeds of 20 and 80°/s

At the fastest stimulus speeds of 20 and 80°/s, across all trials, the mean objective function value peaked at a speed separation of x3.25 (mean OF = -0.17, std = 0.14) (purple vertical line in Suppl. Fig. 3A). However, the peak value was not significantly different from the mean objective function value at the largest speed separation searched (x27.86, 3.3 on the log scale; mean OF = -0.19, std = 0.14) (paired t-test, p = 0.31) (orange vertical line in Suppl. Fig. 3A). The flat objective function suggests high uncertainty in the extracted speed separation at this speed pair.

We divided the trials into two subgroups according to whether they were considered to have one or two readout speeds, and calculated the objective function for each subgroup. For trials considered to have one readout speed, the mean objective function peaked at the speed separation of x27.86 (3.3 on the log scale), which was the largest speed separation searched (orange vertical line in Suppl. Fig. 3E). As shown in Supplementary Figure 3F, as the searched speed separation increased, the dominant faster readout speed approached the log mean speed (40°/s, 3.7 on the log scale) (thick navy blue curve) and its mean weight increased to 0.94 (cyan curve), whereas the slower readout speed approached the lower boundary speed (1.25°/s) (thick green curve) with its weight diminishing to a negligible 0.06 (thin green curve), likely an artifact. For trials considered to have two readout speeds, the objective function peaked at the speed separation of x3.25 (1.18 on the log scale) (purple vertical line in Suppl. Fig. 3B), corresponding to two readout speeds of 17.8 and 58.0°/s (the intersection points of the purple vertical line with the thick navy blue and thick green curves) (Suppl. Fig. 3C).

Furthermore, we compared the population neural responses averaged across the one-readout-speed trials and the two-readout-speed trials. The spline-fitted population responses of the two subgroups were highly correlated (R² = 0.99) and statistically indistinguishable (paired t-test, p = 0.30) (Suppl. Fig. 3D). This indicates that a tiny change in the population response (e.g., a slightly higher peak near the log preferred speed of 3.7) would lead the decoder to extract one speed rather than two speeds (Suppl. Fig. 3D). In other words, the decoder was uncertain about how many speeds were in the visual stimuli and therefore had difficulty segmenting the visual stimuli at these fast stimulus speeds of 20 and 80°/s.

Analysis of decoding the speeds of the bi-speed stimulus with the fastest speeds of 20 and 80°/s.

A. Evolution of the objective function averaged across all 200 trials as the decoder searched through different speed separations. The red dot on the X-axis indicates the speed separation of the stimulus speeds. B, E. Evolution of the objective functions averaged across trials considered to have two (B) and one (E) readout speed(s). In A, B, and E, the error bands indicate ±STE. The black arrow indicates the speed separation where the objective function reaches its peak. The horizontal dotted line indicates the peak value of the objective function. C, F. Evolution of the readout speeds (darker and thick lines) and their weights (lighter and thin lines) as the decoder searched through different speed separations in trials considered to have two (C) and one (F) readout speed(s). D. Population neural responses averaged across trials that are considered to have two readout speeds (purple) and one readout speed (orange). Each dot represents the trial-averaged response of one neuron. The curves represent the spline-fitted population neural responses. The two red dots on the X-axis indicate stimulus speeds of 20 and 80°/s.

Additional information

Funding

National Eye Institute (R01 EY022443)