Brain representations of motion and position in the double-drift illusion

Abstract

In the ‘double-drift’ illusion, local motion within a window moving in the periphery of the visual field alters the window’s perceived path. The illusion is strong even when the eyes track a target whose motion matches the window so that the stimulus remains stable on the retina. This implies that the illusion involves the integration of retinal signals with non-retinal eye-movement signals. To identify where in the brain this integration occurs, we measured BOLD fMRI responses in visual cortex while subjects experienced the double-drift illusion. We then used a combination of univariate and multivariate decoding analyses to identify (1) which brain areas were sensitive to the illusion and (2) whether these brain areas contained information about the illusory stimulus trajectory. We identified a number of cortical areas that responded more strongly during the illusion than during a control condition that was matched for low-level stimulus properties. Only in area hMT+ was it possible to decode the illusory trajectory. We additionally performed a number of important controls that rule out possible low-level confounds. Concurrent eye tracking confirmed that subjects accurately tracked the moving target; we were unable to decode the illusion trajectory using eye position measurements recorded during fMRI scanning, ruling out explanations based on differences in oculomotor behavior. Our results provide evidence for a perceptual representation in human visual cortex that incorporates extraretinal information.

Editor's evaluation

This important and elegant imaging experiment in humans shows that visual area hMT+, but not other candidate brain areas, signals the perceived motion path in a visual drift illusion. Using a convincing computational decoding approach, the results indicate a perceptual representation of the illusory position in space for moving stimuli even when the actual retinal position of the stimulus is kept stable. Such a representation and the underlying neural mechanisms are of broad importance for our understanding of the neural basis of sensory perception.

https://doi.org/10.7554/eLife.76803.sa0

Introduction

Neurons throughout visual cortex encode the location of visual stimuli on the retina, suggesting that the visual system uses a primarily retina-centered reference frame. Yet visual perception is stable across frequent eye movements that displace the retinal image. This observation has led to the idea that the brain maintains a world-centered or ‘spatiotopic’ representation that is invariant to changes in eye position. This idea has perhaps received its strongest support from single-unit recording studies showing that neurons in the ventral intraparietal area (VIP) of macaque monkeys exhibit receptive fields that do not change position when the eyes move (Duhamel et al., 1997). Observations of spatiotopic representation have also been reported in a number of human brain imaging studies. For example, visually evoked responses in both human MT/MST (hMT+) and the lateral occipital complex (LOC) have been reported to be invariant to changes in eye position (d’Avossa et al., 2007; McKyton and Zohary, 2007). These brain imaging studies suggest a broad agreement in the brain’s representation of space in monkey and human visual cortex.

Spatiotopic encoding has not been observed in all fMRI studies, however. For example, a number of studies have measured spatial receptive fields for a range of eye positions and found that receptive fields change position when the eyes move, suggesting that the brain uses a retinotopic reference frame (Gardner et al., 2008; Golomb and Kanwisher, 2012; Merriam et al., 2013). The discrepancy between these studies and reports of spatiotopic representations has not been fully resolved. One suggestion is that the reference frame for stimulus encoding depends on cognitive or task demands. For example, Crespi et al. (2011) reported that the reference frame of visual responses can shift from retinotopic to spatiotopic depending on the attentional state of the observer. Behavioral studies have reported that spatiotopic representations become more prominent in tasks requiring sequences of eye movements, suggesting that spatiotopic coordinates are built up over time (Poletti et al., 2013; Sun and Goldberg, 2016). Together, these observations suggest that reference frames can be dynamic and depend on a variety of factors, such as visual context or the specific task (Steinberg et al., 2022).

In the current study, we used a version of the double-drift illusion to investigate a fundamental paradox of spatiotopic visual processing. The double-drift illusion occurs when a combination of local motion and an orthogonal global motion trajectory causes a strong perception of illusory drift away from the veridical trajectory. The illusion can be strikingly large so that the stimulus appears to deviate by as much as 45° away from the veridical motion path (Tse and Hsieh, 2006; Shapiro et al., 2010; Lisi and Cavanagh, 2015). A recent study revealed that the illusion persists even during smooth pursuit when the stimulus is stabilized on the retina (Cavanagh and Tse, 2019). This pursuit version of the double-drift illusion highlights the paradox of spatiotopic processing: even though the stimulus is at a constant position on the retina, it is perceived to change position in world-centered coordinates. Here, we asked if this illusion could provide insight into spatiotopic encoding in the brain. We hypothesized that several regions in occipital and parietal cortex are involved in computing the illusory percept. A number of brain areas encode stable stimulus position during pursuit eye movements (i.e., ‘real position’ cells) (Nau et al., 2018). Moreover, several of these areas have been implicated in spatiotopic processing (d’Avossa et al., 2007). If this hypothesis is correct, we predict that activity in extrastriate cortex will reflect the illusory motion path instead of the veridical stimulus path.

Results

We tested whether fMRI BOLD activity in human visual cortex reflects the perceived spatial position of a visual stimulus that remained at a constant retinal location. We measured BOLD activity during a version of the double-drift illusion in which the perceived location of the stimulus differed from its actual location by several degrees (Figure 1).

Figure 1 (with 1 supplement). Double-drift illusion during smooth pursuit.

(A) Leftward drift illusion. Participants made smooth pursuit eye movements, tracking the target as it moved vertically in tandem with a Gabor stimulus. Both the Gabor and the target moved for 12 s. The conjunction of local motion (grating phase drift) and global motion (displacement of the Gaussian envelope) produces an illusion in which the Gabor appears to drift several degrees to the left of its actual trajectory, even when smooth pursuit eye movements stabilize the Gabor on the retina. (B) Rightward drift illusion. The conjunction of local and global motion produces the illusion of a rightward Gabor trajectory. (C) No-illusion control condition. Randomly updated grating phase does not produce an illusory stimulus trajectory. All three stimulus conditions contain the same net motion energy and involve the same pursuit eye movements, yet are associated with strongly different percepts.

To determine whether BOLD activity contained information about the visual illusion, we trained a classifier to decode blocks of illusory trials from blocks of trials in which no illusion was perceived. Using leave-one-run-out cross-validation, we found that responses in multiple cortical areas were sensitive to the double-drift illusion (Figure 2); the classifier could accurately decode illusory drift in all four visual area regions of interest (ROIs) (Figure 3A, left).
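The leave-one-run-out scheme described above can be sketched as follows. This is a minimal illustration rather than the analysis pipeline used in the study: the nearest-centroid classifier, the function names (`leave_one_run_out`, `simulate_blocks`), and the synthetic voxel patterns are all assumptions made for the example.

```python
import random
from statistics import mean

def centroid(patterns):
    """Average a list of voxel patterns element-wise."""
    return [mean(col) for col in zip(*patterns)]

def sq_dist(a, b):
    """Squared Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def leave_one_run_out(blocks):
    """blocks: list of (run_id, label, pattern) tuples.

    For each run, train class centroids on all other runs, classify the
    held-out run's blocks, and return the mean accuracy across runs.
    """
    runs = sorted({run for run, _, _ in blocks})
    accuracies = []
    for held_out in runs:
        train = [b for b in blocks if b[0] != held_out]
        test = [b for b in blocks if b[0] == held_out]
        labels = sorted({label for _, label, _ in train})
        centroids = {
            lab: centroid([p for _, l, p in train if l == lab]) for lab in labels
        }
        correct = sum(
            min(labels, key=lambda lab: sq_dist(p, centroids[lab])) == label
            for _, label, p in test
        )
        accuracies.append(correct / len(test))
    return mean(accuracies)

def simulate_blocks(n_runs=8, n_voxels=20, effect=1.0, seed=0):
    """Synthetic data (hypothetical): two conditions, two blocks per run."""
    rng = random.Random(seed)
    signal = [rng.gauss(0, 1) for _ in range(n_voxels)]
    blocks = []
    for run in range(n_runs):
        for label, sign in (("illusion", 1), ("control", -1)):
            for _ in range(2):
                pattern = [sign * effect * s / 2 + rng.gauss(0, 1) for s in signal]
                blocks.append((run, label, pattern))
    return blocks

accuracy = leave_one_run_out(simulate_blocks())
print(f"leave-one-run-out accuracy: {accuracy:.2f}")
```

Holding out whole runs, rather than individual blocks, keeps training and test data from sharing run-specific noise, which is the usual motivation for this form of cross-validation.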

Figure 2. Modulation of fMRI response amplitude during double-drift illusion.

(A) Stimulus localizer-evoked activity in cortical regions representing stimulus location (center), eye movement localizer-evoked diffuse activity in visual cortex, extending well beyond stimulus representation. Data from a single participant in the stimulus localizer shown on an inflated cortical surface (left) and a flattened patch of the occipital lobe (center). Data from the same subject in the eye-movement localizer shown on the right. Boundaries of retinotopic visual areas identified according to an anatomical template. Color indicates the phase of the response. Yellow hues indicate a response in phase with the onset of the stimulus (center) or onset of smooth pursuit (right). (B) Time course of fMRI response from voxels identified in the stimulus localizer, from three cortical areas exhibiting a larger response for the double-drift illusion than during a no-illusion control condition.

Figure 3. Stimulus location information encoding during double-drift illusion.

(A) Accuracy of discriminating the double-drift illusion from a control condition that was matched for net motion energy. Participants either attended the peripheral stimulus and reported the presence of the illusion (Expt 1, left), or attended the fovea and reported a luminance decrement at fixation (Expt 2, right). (B) Accuracy of discriminating rightward vs. leftward drift illusion paths in Expt 2 (attend fixation) based on fMRI responses in voxels selected to match the retinotopic location of the stimulus (left) and voxels selected based on responses to pursuit eye movements (right). (C) Decoding accuracy for independent replication and control experiments (Expt 3). Left, decoding illusory drift paths, replicating results of Expt 2. Right, decoding local-motion only control conditions, which did not produce a drift illusion. Vertical lines extend from minimum to maximum bootstrap decoding accuracy. Horizontal lines denote median bootstrap decoding accuracy. Maroon dot, p<0.01; orange dot, p<0.05; gray dot, nonsignificant (p>0.05).

A number of different factors could lead to accurate decoding of the double-drift illusion. One possibility is that the decoder was sensitive to neural activity related to computing the location of the stimulus. Alternatively, it is possible that the perception of the illusion attracted spatial attention, and the classifier was picking up on attentional differences between illusory and non-illusory conditions. To control for this second possibility, we repeated the experiment, but had participants perform a demanding task at fixation that required sustained attention (Haladjian et al., 2018). The fixation task minimized differences in spatial attention to the stimulus across conditions. We again tested whether the classifier could discriminate the double-drift illusion from the control condition. While overall decoding accuracy was slightly reduced in this experiment, we found that decoding accuracy remained robust and significant in LO, hMT+, and V3A/B, but not in early visual cortex (EVC) (Figure 3A, right), consistent with other recent observations (Liu et al., 2019; Ho and Schwarzkopf, 2022). Participants were not attending the stimulus; therefore, these results cannot be attributed to differences in spatial attention. Instead, we conclude that the classifier was sensitive to information related to encoding the perceived position of the stimulus during the illusion.

The critical test in this study is whether BOLD fMRI activity in visual cortex can discriminate between different illusory paths. We tested whether a classifier could decode the drift path of the illusion. Of all the visual areas tested, only area hMT+ could discriminate leftward from rightward illusory paths (Figure 3B, left). Because the stimulus remained at a constant retinal location, the ability to discriminate the illusory motion path suggests a non-retinotopic representation of stimulus position.

We next tested alternative explanations for the ability to discriminate motion trajectory in hMT+. It is conceivable that decoding of the drift path was due to subtle differences in smooth pursuit eye movements, rather than encoding of the stimulus position. Specifically, we wondered if perceiving the illusion caused a change in oculomotor behavior, which could in turn result in decodable differences in fMRI activity. Under this alternative explanation, the ability to decode the trajectory of the illusion would be a secondary consequence of any difference in oculomotor behavior between illusory conditions. We conducted the following analyses to rule out this explanation. First, we repeated the classification analysis, this time using only voxels that were selective for smooth pursuit eye movements, as identified in a separate pursuit control experiment. In this analysis, we specifically excluded voxels that responded in the stimulus localizer (see ‘Smooth pursuit control experiment’). We reasoned that voxels that responded in the pursuit localizer should be most sensitive to any differences in pursuit eye movements in the main experiment. Note that this logic should apply regardless of whether these voxels are selective for pursuit eye movements or for the visual consequences of retinal slip during pursuit (i.e., during catch-up saccades). We found that responses in these voxels did not carry information that distinguishes the drift paths in any of the ROIs (Figure 3B, right). Results from this control analysis suggest that the information being utilized by the classifier is not due to differences in pursuit eye movements. Second, in a subset of subjects, we repeated the fMRI experiment but with concurrent eye tracking (see ‘Eye tracking data with concurrent fMRI’). In this subset of subjects, we replicated our main fMRI results (decoding the illusion trajectory from hMT+ responses), but we were unable to decode the illusory trajectory from the eye position measurements alone, indicating that there was no information contained in the oculomotor behavior related to perceiving the illusion (Figure 4). Moreover, we quantified microsaccade characteristics (amplitude and direction) and found no reliable differences between illusory conditions (illusion vs. no-illusion), or between the directions of the illusion (left vs. right). We conclude that oculomotor behavior was unlikely to be an underlying cause of the fMRI findings reported here.

Figure 4. Eye position measurements did not reflect the trajectory of the illusion.

(A) Eye position measured during blocks of double-drift illusion; one representative subject averaged over all blocks in a scan session. Traces show stable fixation during the first 12 s followed by 12 s of vertical smooth pursuit. Eye position did not differ between leftward and rightward illusion. (B) Eye position during blocks of local-motion only trials. (C) Polar histogram of microsaccade direction during leftward and rightward double-drift illusion and the two motion conditions. (D) Histogram of saccade amplitude during the two illusion conditions and the two motion conditions. (E) Decoding accuracies, using the horizontal and vertical eye position measurements to train and test a linear classifier to discriminate the direction of local motion. The bar labeled 'illusion' indicates accuracy for decoding trajectory during the illusion; the bar labeled 'fixation' indicates decoding during fixation (when no illusion was perceived); the bar labeled 'local motion' indicates decoding during the local-motion only condition (during fixation). Horizontal gray dashed line denotes chance decoding for binary decision (50%). Horizontal black dashed line denotes 95% confidence interval of null distribution estimated using a permutation test.
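A permutation-based null distribution of the kind mentioned in the caption could be estimated along the following lines. This is a hedged sketch under stated assumptions, not the procedure used in the study: the one-dimensional feature (mean horizontal eye position per block), the nearest-class-mean decoder, the number of permutations, and the names `loo_accuracy` and `permutation_threshold` are all hypothetical.

```python
import random
from statistics import mean

def loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-class-mean rule on 1-D features."""
    hits = 0
    for i, (x, y) in enumerate(zip(features, labels)):
        rest = [(f, l) for j, (f, l) in enumerate(zip(features, labels)) if j != i]
        class_means = {
            lab: mean(f for f, l in rest if l == lab)
            for lab in sorted({l for _, l in rest})
        }
        guess = min(class_means, key=lambda lab: abs(x - class_means[lab]))
        hits += guess == y
    return hits / len(features)

def permutation_threshold(features, labels, n_perm=500, percentile=95, seed=0):
    """Accuracy at the given percentile of a label-shuffled null distribution."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between features and labels
        null.append(loo_accuracy(features, shuffled))
    null.sort()
    return null[(n_perm * percentile + 99) // 100 - 1]

# Synthetic example (assumed values): per-block mean horizontal eye position
# in degrees, drawn from the same distribution for both illusion directions.
rng = random.Random(1)
eye_x = [rng.gauss(0.0, 0.2) for _ in range(24)]
direction = ["left", "right"] * 12

observed = loo_accuracy(eye_x, direction)
threshold = permutation_threshold(eye_x, direction)
print(f"observed {observed:.2f}, 95th percentile of null {threshold:.2f}")
```

Decoding would be called significant only if the observed accuracy exceeded the threshold. Shuffling labels while keeping the features fixed preserves the structure of the data and removes only the condition information.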

While the two illusory drift paths in our experiment were carefully balanced for net motion energy (i.e., a combination of a vertical global trajectory and horizontal local motion), we wondered if the ability of the classifier to discriminate leftward and rightward illusion drift paths could be due to the difference in temporal sequence of events within the trial (e.g., leftward followed by rightward motion, and vice versa). To test this possibility, we scanned another group of participants in an experiment (Expt 3) in which we included both illusion conditions from Expt 2, and two control conditions that contained the same local motion, but no global trajectory (and no smooth pursuit eye movements). For the illusory double-drift conditions, we again found that drift path was decodable in hMT+, replicating the results from Expt 2 in an independent group of participants (Figure 3C, left). However, the classifier was unable to decode the conditions containing local motion alone (i.e., discriminating left-followed-by-right from right-followed-by-left; Figure 3C, right). This result demonstrates that information about illusory motion paths in hMT+ is not due to local motion of the stimulus alone.

Discussion

We found that fMRI BOLD responses in several visual cortical areas could reliably discriminate the double-drift illusion from a control condition that was matched for motion energy. In EVC, this result could be explained by attentional effects associated with perceiving the illusion, since when attention was directed away from the illusion, decoding in EVC dropped to chance. Beyond early visual cortex, several areas (hMT+, LO, and V3A/B) exhibited significant decoding of the illusion itself, even when controlling for spatial attention. Moreover, responses in hMT+ could also discriminate the illusory drift path, suggesting that retinal and extraretinal information are integrated in hMT+ and used to construct the spatiotopic perception experienced during the illusion. A number of control experiments indicate that these findings cannot be attributed to low-level stimulus or oculomotor factors. Our results may indicate non-retinal stimulus position encoding occurs in human extrastriate visual cortex.

Source of illusory drift path information

What is the source of decodable drift path information? One possibility is related to a coarse-scale map for direction of motion, which has been observed throughout visual cortex, including all of the areas included in our study (Wang et al., 2014). The coarse-scale map for direction of motion in early visual areas (V1/V2/V3) is thought to result from an aperture-inward bias: larger responses were observed for motion away from the aperture edge (Wang et al., 2014). In contrast, the coarse-scale map observed in hMT+ did not depend on the aperture boundary, but instead consisted of a bias for motion toward the fovea (Wang et al., 2014). Could this fovea-centered bias explain the ability to decode the path of the double-drift illusion? In the current study, a fovea-centered bias would predict a leftward preference across voxels within hMT+ since the stimulus was always in the right visual field and leftward motion would be toward the fovea. However, the two illusory conditions (Figure 1A and B) had identical net amounts of leftward and rightward local motion, and identical proportions of time of leftward and rightward illusory drift paths. We therefore think it is unlikely that a net motion bias toward the fovea in hMT+ accounts for the observed results.

An alternative account is that differences in BOLD activity to the two illusory drift paths arise because of the topographic organization within hMT+ (Huk et al., 2002; Amano et al., 2009). The rightward drift path begins with illusory drift up-and-to-the-right, which increases the perceived eccentricity of the Gabor (Lisi and Cavanagh, 2015). The path continues with drift down-and-to-the-left, which brings the perceived position back to the original position. This cycle repeats throughout the block of trials. In contrast, the leftward drift path begins with illusory drift up-and-to-the-left, which decreases the perceived eccentricity of the Gabor, and continues with drift down-and-to-the-right, bringing the perceived position back to the original position. Thus, the average eccentricity of the perceived drift path is higher during rightward drift and lower during leftward drift. This shift in the perceived eccentricity of the stimulus could result in slightly different patterns of activity in hMT+, and this difference could underlie the ability to decode the illusion drift path.
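The eccentricity argument above can be made concrete with a small worked example. The sketch below is illustrative only: the starting eccentricity (`START_ECC`), the maximum perceived horizontal shift (`MAX_SHIFT`), and the linear out-and-back profile are assumed values chosen for the arithmetic, not measurements from the study.

```python
from statistics import mean

def perceived_offsets(start_ecc_deg, max_shift_deg, direction, n_samples=101):
    """Perceived horizontal offset from fixation over one out-and-back cycle.

    direction: +1 for the rightward path (away from the fovea),
               -1 for the leftward path (toward the fovea).
    The perceived shift ramps linearly from 0 to max_shift_deg and back.
    """
    outward = [max_shift_deg * i / (n_samples - 1) for i in range(n_samples)]
    cycle = outward + outward[::-1]
    return [start_ecc_deg + direction * shift for shift in cycle]

START_ECC = 8.0  # hypothetical stimulus eccentricity in the right visual field (deg)
MAX_SHIFT = 2.0  # hypothetical peak illusory horizontal displacement (deg)

mean_right = mean(perceived_offsets(START_ECC, MAX_SHIFT, +1))
mean_left = mean(perceived_offsets(START_ECC, MAX_SHIFT, -1))
print(f"mean perceived eccentricity, rightward path: {mean_right:.2f} deg")
print(f"mean perceived eccentricity, leftward path:  {mean_left:.2f} deg")
```

Under these assumptions the two paths average to roughly `START_ECC + MAX_SHIFT / 2` and `START_ECC - MAX_SHIFT / 2`, so the paths differ in mean perceived eccentricity even though the retinal position is identical in both conditions.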

This second account depends on there being an explicit representation of the perceived position of a stimulus in hMT+, while position encoding in EVC is entirely veridical. Since veridical position did not differ for rightward and leftward drift paths, the classifier was unable to decode the drift path from activity in EVC. Consistent with this account, one fMRI study (Maus et al., 2013) has reported that BOLD activity in hMT+ reflects the illusory position during motion-induced position shift. Activity throughout visual cortex is known to encode stimulus position in retinal, not spatiotopic, coordinates (Gardner et al., 2008). However, it remains unknown whether retinotopic coding is also universal in visual cortex for motion illusions. If the second account is accurate, our data may imply a difference between EVC and downstream areas in the spatial encoding of illusory motion.

Spatiotopic coordinates in visual cortex

The double-drift illusion results from combining local motion of the Gabor with a global trajectory of the envelope. In the smooth-pursuit variant of the illusion, the envelope only has a trajectory when defined in spatiotopic coordinates, since the stimulus remains at a constant retinal location. With a stable position of the stimulus on the retina, the presence of the illusion suggests some degree of spatiotopic processing in the brain (Turi and Burr, 2012). This could be accomplished by the formation of an explicit spatiotopic reference frame (Duhamel et al., 1997; d’Avossa et al., 2007; Crespi et al., 2011). Alternatively, this could be accomplished through a computation by which a retinotopic input is combined with an eye position gain field (Merriam et al., 2013). Our data do not speak to which of these two possibilities is more likely.
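The geometry behind this point can be illustrated with a short sketch: a stimulus held at a fixed retinal location traces a path in world-centered coordinates once gaze position is added. The coordinates and the helper `spatiotopic_position` are hypothetical, and the simple addition stands in for the coordinate transform only; it is not a model of either candidate neural mechanism.

```python
def spatiotopic_position(retinal_pos, gaze_pos):
    """World-centered position = retinal position + gaze position (x, y in deg)."""
    return (retinal_pos[0] + gaze_pos[0], retinal_pos[1] + gaze_pos[1])

# Assumed values: the stimulus sits 8 deg to the right of the pursuit target,
# and gaze follows a vertical pursuit trajectory.
RETINAL = (8.0, 0.0)
gaze_path = [(0.0, y) for y in (-4.0, -2.0, 0.0, 2.0, 4.0)]

world_path = [spatiotopic_position(RETINAL, gaze) for gaze in gaze_path]
for gaze, world in zip(gaze_path, world_path):
    print(f"gaze {gaze} -> retinal {RETINAL} -> world {world}")
```

The retinal coordinate is the same at every time point, whereas the world-centered coordinate moves with the eyes, which is why the envelope has a trajectory only in the spatiotopic frame.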

Decoding the drift illusion beyond hMT+

In addition to decoding the illusory drift path, patterns of activity in multiple visual areas enabled classification of the perception of the illusion. When subjects were attending the stimulus, illusion decoding was possible in all the visual areas that we studied, raising the possibility that the illusion attracted attention, resulting in a higher BOLD response during illusory blocks (see Figure 2). When subjects were attending to a task at fixation (Expt 2), the illusion could still be decoded from activity in LO, V3A/B, and hMT+, but not from EVC. Previous fMRI studies have claimed that the locus of attention can affect the apparent reference frame in which a stimulus is encoded (Crespi et al., 2011). It is unclear, however, whether attention and task indeed change the spatial reference frame, or instead affect global response amplitudes (Roth et al., 2020), which may constitute an additive signal obfuscating measurement of the underlying reference frame. In Expt 1, subjects performed a task on the stimulus, and so attention was likely directed toward the stimulus. In an earlier fMRI study on the double-drift illusion (Liu et al., 2019), subjects also performed a task on the stimulus, and while the tasks were different in the two experiments, in both cases attention was focused on the stimulus. Our results in Expt 2, in which subjects attended fixation and not the stimulus, demonstrate that attention and task do have an impact on the spatial encoding of the double-drift illusion, and highlight the importance of controlling the attentional state of the observer when studying visual reference frames (Crespi et al., 2011).

Disentangling spatiotopic representations and remapping

Two potential mechanisms have been suggested for the visual system’s ability to preserve a stable percept across saccades. The first is a spatiotopic representation, relying on afferent signals that update across saccades. This can be thought of as a combination of two representations: a retinotopic representation of the visual world, and a representation of gaze direction in the world. The two are integrated to form our perception of the outside world, independent of changes in direction of gaze. The second mechanism is remapping. During (or a brief moment prior to) a saccade, receptive fields shift to where they will naturally be positioned after the saccade. This shift, or remapping, ensures that a neuron’s activity before and after the saccade will reflect the same region in the visual field. After the saccade, the receptive field returns to its natural retinotopic position. Behavioral investigations into mechanisms for visual stability across eye movements have found evidence that both spatiotopic representations and receptive field remapping underlie visual stability (Poletti et al., 2013), with the relative contribution of each mechanism depending on the number of intervening saccades. After a single saccade, receptive field remapping is the primary mechanism underlying visual stability, whereas spatiotopic representations become prominent after multiple saccades (Sun and Goldberg, 2016). It is therefore possible that fMRI studies exploring spatiotopic representations could in fact probe retinotopic coding that is updated by remapping across saccades.

The version of the double-drift illusion employed in the current study did not require saccadic eye movements, making it unlikely that perisaccadic remapping contributed to our results. Remapping can take place during saccades since saccades are discrete events separated in time, leaving time for both the shift and the return. However, shifting the receptive field cannot be used for continuous gradual changes such as smooth pursuit. A receptive field shift in the direction of the planned motion before the beginning of the pursuit would not correct for the rest of the pursuit, and furthermore, there would be no opportunity to shift back. We find it plausible to assume, therefore, that the remapping mechanism is relevant only for saccades. Note that subjects may perform saccades during the pursuit (e.g., catch-up saccades) that could be corrected by remapping, but the pursuit itself cannot be corrected by remapping. Therefore, a spatiotopic signal robust to smooth pursuit provides evidence for a different correction mechanism, namely a spatiotopic representation. From this, our results suggest that stimulus position was encoded in a spatiotopic representation.

Relationship to a previous study of the double-drift illusion

A recent study (Liu et al., 2019) used a decoding approach to identify brain activity reflecting the percept during a version of the double-drift illusion that did not include smooth pursuit. A classifier was used to decode both the veridical direction of a diagonally moving Gabor patch, and the illusion direction during the double-drift illusion. Briefly, they found that both veridical motion and illusory motion direction could be decoded from visual cortex, but a classifier trained on veridical motion could not decode illusory motion and vice versa, suggesting differences between the patterns of activity in the two conditions. Instead, cross-decoding was possible primarily in prefrontal cortex (PFC), suggesting that activity in PFC reflects the perceived motion direction.

The results reported by Liu et al. (2019) are surprising on several counts. First, when the double-drift experiment was repeated with exactly the same stimuli, the brain regions that showed significant decoding changed (compare their Figure 4A with their Figure 6A). Second, when the veridical motion stimuli were changed slightly, the pattern of regions supporting decoding changed substantially (compare their Figure 4B with their Figure 6B), as did the regions supporting cross-decoding between veridical motion and illusory motion (compare their Figure 4C with their Figure 6C). These findings raise important questions and suggest that multiple factors, such as spatial attention, may influence the ability to decode an illusory motion path, as we have demonstrated in our study.

Regardless, the results of Liu et al. do not have direct bearing on which reference frame was used to encode the stimulus location, which is the topic of the current study. Because in Liu et al. subjects were fixating on the Gabor, the encoding of the illusion could have been in either retinal or spatiotopic coordinates. In contrast, in our study, the stimulus must have been encoded in spatiotopic coordinates. However, one potentially interesting extension of the cross-decoding approach would be to train the decoder on a version of the illusion involving fixation (as in Liu et al.), but then test the decoder on the illusion during pursuit (as in the current study). If perceived motion direction is represented in spatiotopic coordinates in both cases, one would expect the classifier to succeed in cross-decoding. However, if spatiotopic coding is used during pursuit (as we have shown here) but not during fixation, this cross-decoding should fail.

Materials and methods

Participants

Data were acquired from 19 healthy participants (11 females, age range 23–34 y, mean 25.8 y) with normal or corrected-to-normal vision. Experiments were conducted with the written consent of each observer. The consent and experimental protocol were in compliance with the safety guidelines for MRI research and were approved by the Institutional Review Board of the National Institutes of Health. Of the 19 participants, 12 were scanned in multiple sessions and in multiple experimental conditions. In total, 9, 12, and 5 participants took part in Expt 1, 2, and 3, respectively.

Author details

Noah J Steinberg, Zvi N Roth, J Anthony Movshon, Elisha Merriam
