Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Read more about eLife’s peer review process.
Editors
- Reviewing Editor: Marisa Carrasco, New York University, New York, United States of America
- Senior Editor: Joshua Gold, University of Pennsylvania, Philadelphia, United States of America
Reviewer #1 (Public review):
Summary:
This carefully executed study uncovers the functional relevance of curl signals that impinge on the retina every time an observer's gaze direction and movement direction are not aligned.
Strengths:
This finding is important, highlighting the functional role of an abundant incidental signal (curl in retinal motion) that has thus far been believed to be a nuisance that must be filtered out of the retinal motion stream.
The study's evidence is compelling: a combination of psychophysical experiments with critical manipulations, control theory, and neural modeling, which together make an internally consistent and biologically plausible case for the role of curl signals in estimating heading direction.
This study uncovers the functional relevance of curl signals that occur on the retina when an observer is moving, and gaze is not straight ahead. The experimental and modeling results clearly go beyond previous studies and significantly advance our understanding of vision-based navigation.
Another clear strength is that the study uses tightly controlled experimental manipulations to provide strong test cases for the hypothesis that curl is used for visual navigation. These conditions are important for constraining the proposed model (and future models) of heading control.
The modeling is very clearly described, and the modeling and analysis code is published and freely available. The authors go beyond a back-of-the-envelope control model and show how it might be implemented at the neural-circuit level. The model is biologically plausible.
Weaknesses:
The discussion would benefit from an extended treatment of the study's implications and of the predictions of the authors' model.
Reviewer #2 (Public review):
This study examines how curl in the retinal flow field can be used as a control variable for estimating and controlling the heading of a moving observer. The basic idea (which is not entirely new, see Matthis et al. 2022) is that translation along a path with eccentric gaze (meaning that the subject is not heading toward the point they are looking at) produces a pattern of optic flow on the retina with a rotational component around the point of fixation (which can be captured by the mathematical "curl" operator). The sign and magnitude of retinal curl vary with heading relative to the point of fixation, such that curl can be used as a control variable to steer rightward or leftward to move toward the fixated target. The authors perform behavioral experiments and show that there are biases in perceived heading that seem to be largely governed by retinal curl. They also show that a simple controller model can use curl to steer toward a target, and they provide a neural network model that provides a biologically plausible implementation of the controller (although there are some questions about that).
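The steering principle described here, with the sign and magnitude of retinal curl serving as a control variable, can be sketched minimally as follows. This is an illustration of the idea, not the authors' model: the assumption that curl varies as `k * sin(heading - gaze)` and the gain `g` are illustrative choices.

```python
# Minimal sketch of a "steer on curl" proportional controller.
# Assumption (illustrative, not from the paper): the signed curl measured at
# the fixation point is k * sin(heading - gaze), so it is zero when the
# observer heads toward the fixated target, and its sign tells the observer
# which way to turn.
import numpy as np

def simulate(heading_deg, gaze_deg=0.0, k=1.0, g=2.0, dt=0.05, steps=200):
    """Drive heading toward the fixated direction using only the curl signal."""
    h = np.radians(heading_deg)
    gz = np.radians(gaze_deg)
    for _ in range(steps):
        curl = k * np.sin(h - gz)   # signed curl: zero when heading == gaze
        h -= g * curl * dt          # steer against the curl
    return np.degrees(h)

final = simulate(heading_deg=20.0)  # start 20 deg off the fixated target
```

Because the control law only nulls the difference between heading and gaze, the sketch also makes the reviewer's point concrete: such a controller steers toward the fixated target but carries no information about heading in a world-centered frame.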
There is a core of interesting work here that I think can be important to the field. However, there is a lack of clarity on several important fronts, including design of the behavioral experiments, presentation of the behavioral data, conceptual framing of what curl can and cannot do, etc. Equally importantly, the manuscript is not written in a manner that will make it accessible to most vision scientists. I consider myself to be pretty knowledgeable about optic flow, and I had to read most of the manuscript 3 or 4 times to be able to understand the bulk of it. And my experience is that most vision scientists do not understand optic flow well, so I fear that most of the readers that the authors should want to reach would struggle to understand the work. As written, this is mainly going to make an impact on a handful of optic flow gurus. Thus, I consider that this manuscript will need a major overhaul to clarify important issues and make it more accessible.
Major issues:
(1) The manuscript contains inconsistent, if not misleading, messaging about what information retinal curl does, and does not, provide regarding heading estimation. In the Abstract, the authors state: "We propose an alternative: the visual system utilizes retinal curl directly to estimate heading, rendering the explicit recovery of the FOE unnecessary." Based on my understanding of the rest of the manuscript, I find this statement to be a misrepresentation for two main reasons:
a) To "directly estimate heading" relative to what? When not qualified, most people interpret "heading" to mean an observer's heading relative to the world (or some allocentric reference frame). But retinal curl only gives information about an observer's heading relative to the point on which their eyes are fixated. Moreover, that point of fixation will change every few hundred milliseconds in natural viewing, so the retinal curl will change with each new fixation even as heading relative to the world remains unchanged. So I think most readers would grossly misinterpret the claim that retinal curl can be used "directly to estimate heading". Indeed, in the authors' controller model, the initial heading needs to be given, and then the controller can work. But from where does the visual system get the initial heading, since it does not come from curl? These issues are left hanging. Thus, while curl can provide a very useful input for steering toward a fixated target, other signals are needed to estimate heading relative to the world. This has to be made much clearer early on, and a conceptual schematic diagram might help. Also, the authors generally do not specify the reference frame of the variables they are talking about, leaving lots of room for misinterpretations. It should be clear each time they are talking about a variable, such as heading, whether it is relative to the fixation target, body, world, etc.
b) It seems to me that retinal curl will depend on other variables, in addition to heading relative to the fixation target. For example, it seems to me that the magnitude of retinal curl will depend on self-motion speed, the depth structure of the scene, the angle of elevation of the fixated target, and perhaps others. This is not discussed at all, and many readers would get the misguided impression that there is a 1:1 mapping from curl to heading (relative to fixation). If I am right that this is not correct, it means that retinal curl can tell the observer whether to steer right or left to move toward the fixated target, but it cannot tell them how much to steer. Indeed, in the authors' controller model, there is a free parameter that calibrates curl to angle. It makes sense that this works to fit trajectory data that are given from a fixed environment, but it is unclear how the brain would use retinal curl to control steering when these other variables are uncertain or changing unpredictably. Moreover, how does the system change the mapping from curl to steering command as the location of fixation changes relative to the current heading? These are issues that need to be brought up in framing the problem and discussed at some length. If the authors can show mathematically that retinal curl is only dependent on heading (relative to fixation) and not any of these other variables, it would be very valuable to show the equations for this relationship.
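The dependence described in (b) can be checked numerically from the standard motion-field equations for a translating, fixating observer (Longuet-Higgins & Prazdny form). The depth map, speeds, and angles below are illustrative assumptions, not values from the paper; the sketch shows that foveal curl scales with self-motion speed and vanishes without a depth gradient, so the curl-to-heading mapping is not 1:1.

```python
# Numerical sketch: retinal flow for an observer translating with eccentric
# heading while fixating a point at depth Z0 along the gaze axis (+z).
# The eye rotation is chosen to null flow at the fovea (fixation), and the
# linear depth map Z = Z0 + a*x + b*y is an arbitrary illustrative choice.
import numpy as np

def foveal_curl(speed, heading_deg, Z0=5.0, a=0.5, b=0.5, n=41, half=0.1):
    """Curl of the retinal flow field at the fovea (gaze along +z)."""
    th = np.radians(heading_deg)
    T = speed * np.array([np.sin(th), 0.0, np.cos(th)])  # translation, eye coords
    # Eye rotation that stabilizes the fixation point (x = y = 0, depth Z0)
    Om = np.array([T[1] / Z0, -T[0] / Z0, 0.0])          # (Ox, Oy, Oz)
    x, y = np.meshgrid(np.linspace(-half, half, n), np.linspace(-half, half, n))
    Z = Z0 + a * x + b * y                               # illustrative depth map
    # Standard instantaneous motion-field equations (focal length 1)
    u = (-T[0] + x * T[2]) / Z + x * y * Om[0] - (1 + x**2) * Om[1] + y * Om[2]
    v = (-T[1] + y * T[2]) / Z + (1 + y**2) * Om[0] - x * y * Om[1] - x * Om[2]
    dx = x[0, 1] - x[0, 0]
    curl = np.gradient(v, dx, axis=1) - np.gradient(u, dx, axis=0)
    return curl[n // 2, n // 2]                          # value at the fovea
```

In this sketch, doubling `speed` exactly doubles the foveal curl, flipping the heading sign flips its sign, and setting the depth gradient `b` to zero drives it to zero, consistent with the reviewer's concern that curl confounds heading with self-motion speed and scene structure.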
(2) The description of the behavioral experiment and presentation of behavioral data leaves a lot to be desired.
a) First, it is stated (line 158) that "Participants continuously reported their perceived direction of self-motion while maintaining fixation on the yellow dot." Again, the reference frame is completely unspecified. Participants were reporting their perceived heading relative to what? The fixation target? The world? What exactly were the instructions given to the subjects to perform the task? Based on the description of how perceived paths are computed (line 166-), it seems to be presumed that subjects are reporting their heading relative to the world because those angles are then converted into x and z coordinates in what I presume is a world-centered reference frame. But how do we know that subjects are accurately reporting their heading relative to the world? What if they are biased in their reports by the location of the fixation target relative to the scene, or by some other reference signal? Is it possible for the authors to rule out the possibility that perceptual biases seen in the unaltered curl condition result from observers not fully adopting the assumed reference frame of the task? If this cannot be firmly excluded, it seems to create problems for the rest of the study.
b) I also feel that there is a mismatch between what the behavioral task requires and what the controller model does. Subjects are apparently asked to report their heading relative to the world, but the controller model only controls their heading relative to the point that they are fixating. I understand how this is resolved in the model, but I think this type of distinction is buried and will not be apparent to most readers. Again, the reference frames of what is being measured and controlled need to be specified explicitly in all parts of the paper, and the authors need to explain how the system would combine curl-based control with some other measures of (at least initial) heading for world-centered heading to be computed. All of the assumptions need to be clearly specified.
c) In addition, I found it frustrating that the authors never present raw perceptual data from the observers. Rather, in Figure 2, we see reconstructed trajectories that are perfectly smooth with no indications of noise whatsoever. Since these paths are computed from the perceptual reports, there must be some noise inherent in them. The figures should represent this uncertainty somehow, and it should be explained how these perfectly smooth trajectories are obtained.
(3) "...the magnitude of retinal curl in the fovea can specify the body trajectory relative to gaze (Matthis et al., 2022)." The main idea put forward by the authors here seems to overlap heavily with this statement that they attribute to Matthis et al. 2022. While I think this paper still adds importantly to the topic, the authors do not discuss how their findings are different from those of Matthis et al. 2022, why they are an important extension, etc. Readers should not have to go read this other paper to have any idea how the present findings are placed in importance relative to the literature.
(4) The analysis and treatment of eye movements is extremely weak. The authors discarded trials for which gaze deviated from the fixation point by more than 3 degrees (which is a LOT given that the eye speeds are generally in the neighborhood of 0.5 deg/sec), and they provide basic stats on the distribution of positions. But this largely misses the point: it is not small position errors that are likely to matter, but rather velocity errors. Even a small amount of retinal slip of the target while it is being pursued will cause image motion that is going to alter the optic flow field around the fixation target. So, for example, the retinal curl field may no longer be centered on the fixation target. How do we know that some of the perceptual biases are not influenced by image motion resulting from imperfect tracking of the fixation target? This needs to be analyzed and discussed.
(5) I found the sections of text comparing the separate and joined fits (starting line 287) to be a bit too rosy. The authors show the separate fits in the main text, and it is not very surprising that these fits are good, given that the model has 30 parameters, and these data are pretty low-dimensional. The authors only show the joined fits in the supplement, and they say that they are almost as good as the separate fits (indeed, they are better in a model comparison sense, but this is 30 parameters vs. 2 parameters). However, when I look at the fits of the joined model in the supplement, I don't find them to be very impressive. In particular, the model grossly misses the data for the straight paths for several subjects (e.g., id5, id6, id8, id10). And fitting the straight paths would presumably be easiest. This implies that the joined model is really missing something and that fitting the curved paths interacts strongly with fitting the data for different fixation target locations on the straight path. I think that the authors should discuss the results a bit more soberly and tone down their conclusions here.
(6) The section of the paper on neural simulations (starting line 387) has a few weaknesses. First, why are only straight paths simulated here? This does not seem to provide a very rigorous test of the model. Second, it is awkward that the simulation results are presented in units of pixels, rather than degrees. Third, the authors seem to downplay the fact that the neural estimates of heading seem to oscillate rather wildly (over a range of hundreds of pixels, whatever that means, see especially Figure S16). It was far from clear to me how an estimate of heading with these large oscillations is useful. It would seem to require that heading estimates are integrated over substantial lengths of time to be reliable. It was therefore unclear how the model produces such smooth paths from these oscillating estimates.
Reviewer #3 (Public review):
Summary:
This manuscript uses a novel paradigm to demonstrate that rotational motion patterns in the retinal image, called curl, directly influence perception of heading direction. This means that it is not necessary to recover the focus of expansion, defined by the point of zero motion when moving along a straight trajectory toward a target, as is commonly thought.
Strengths:
It has long been accepted that the focus of expansion of the optic flow field generated by self-motion is used to guide heading direction. While there have been many challenges to the need to recover the focus of expansion when gaze is not in the direction of travel, it is still not well understood how retinal motion patterns contribute to heading perception. Recent work has demonstrated the complexity of the retinal motion patterns during natural walking, where body motion adds a rotational component. A rotational component also results from curved paths as well as gaze off the direction of travel. This rotational component is called curl. The primary contribution of this manuscript is to demonstrate convincingly that curl influences perception of heading, and that it is not necessary to recover the focus of expansion.
A strength of the manuscript is that realistic retinal motion patterns are generated by recording the image sequences generated by a walker in a virtual environment, and then using those patterns as stimuli in the experiment. This allows the creation of the more complex flow patterns that are a consequence of the bob and sway of natural walking, which are often considered a minor factor. The elegant experimental design allows direct manipulation of the curl signal, and this in turn directly influences measured heading perception. Another strength is that the authors ground their findings in control theory and neural computations, using a model that produces human-like path trajectories.
The study is timely, given the long history of this question, together with the growing understanding of the complexity of naturally generated retinal motion and the absence of direct evidence for the way that these motion patterns are used in heading perception. It adds an important piece of evidence for how retina-centered optic flow may be used by the visual system, which is critical for our understanding of motion processing in the brain.
Weaknesses:
The primary limitation of the paper is that it avoids discussion of some of the inevitable complexities of heading perception. The main issue is what exactly is meant by heading. Different behaviors evolve over different timescales. The geometry of retinal motion defines instantaneous heading, which varies widely through the gait cycle. Time-varying information like this is known to be important in the momentary control of balance. Heading can also be thought of as steering the body toward a distant goal, which evolves over longer timescales. The current manuscript appears to be concerned with heading information integrated over a few seconds and seems to provide evidence that heading is indeed integrated over the gait cycle. The issue of the time scale of the computation is touched on, but it is not related to how it might be used in normal walking or what situations it might apply to. Steering toward a distant goal during walking is not a very difficult problem and may not require evaluation of retinal motion, but control of balance is more challenging and may depend critically on curl. Consequently, the timescale of the computation needs to be considered in order to understand what is meant by heading.
