
Contrast sensitivity reveals an oculomotor strategy for temporally encoding space

  1. Antonino Casile (corresponding author)
  2. Jonathan D Victor
  3. Michele Rucci (corresponding author)
  1. Istituto Italiano di Tecnologia, Italy
  2. Center for Neuroscience and Cognitive Systems, Italy
  3. Harvard Medical School, United States
  4. Weill Cornell Medical College, United States
  5. University of Rochester, United States
Research Article
Cite this article as: eLife 2019;8:e40924 doi: 10.7554/eLife.40924

Abstract

The contrast sensitivity function (CSF), how sensitivity varies with the frequency of the stimulus, is a fundamental assessment of visual performance. The CSF is generally assumed to be determined by low-level sensory processes. However, the spatial sensitivities of neurons in the early visual pathways, as measured in experiments with immobilized eyes, diverge from psychophysical CSF measurements in primates. Under natural viewing conditions, as in typical psychophysical measurements, humans continually move their eyes even when looking at a fixed point. Here, we show that the resulting transformation of the spatial scene into temporal modulations on the retina constitutes a processing stage that reconciles human CSF and the response characteristics of retinal ganglion cells under a broad range of conditions. Our findings suggest a fundamental integration between perception and action: eye movements work synergistically with the spatio-temporal sensitivities of retinal neurons to encode spatial information.

https://doi.org/10.7554/eLife.40924.001

Introduction

Contrast sensitivity, the ability to distinguish a patterned input from a uniform background, is one of the most important measures of visual function (Robson, 1966; Campbell and Robson, 1968; De Valois et al., 1974; Owsley, 2003). Elucidating its underlying mechanisms is, thus, essential for understanding how the visual system operates in both health and disease.

It has long been established that sensitivity varies in a specific manner with the spatial frequency of the stimulus, yielding the so-called contrast sensitivity function (henceforth CSF). Under photopic conditions, the CSF measured with stationary gratings exhibits a well-known band-pass shape that typically peaks around 3–5 cycles/deg and sharply declines at higher and lower spatial frequencies. The mechanisms responsible for this dependence on spatial frequency are not fully understood. At high spatial frequencies, a decline in sensitivity is expected for several reasons, including the filtering of the eyes’ optics (Campbell and Green, 1965) and the spatial limits in sampling imposed by the cone mosaic on the retina (Hirsch and Miller, 1987; Rossi and Roorda, 2010). At low spatial frequencies, however, the reasons for a reduced sensitivity have remained less clear.

A popular theory directly links the low-frequency attenuation in visual sensitivity to the neural mechanisms of early visual encoding (Atick and Redlich, 1990; Atick and Redlich, 1992). Building on theories of efficient coding (Barlow, 1961), it has been argued that this attenuation reflects a form of matching between the characteristics of the natural visual world and the response tuning of neurons in the retina: retinal ganglion cells (henceforth RGCs) respond less strongly at low spatial frequencies so as to counterbalance the spectral distribution of natural scenes. According to this proposal, this filtering eliminates part of the redundancy intrinsic in natural scenes and enables more efficient (i.e. more compact) visual representations.
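The whitening argument can be made concrete with a toy calculation. The sketch below assumes an idealized natural-scene amplitude spectrum falling as 1/k (power as 1/k²) and ignores noise, which in Atick and Redlich's analysis is what makes the optimal filter roll off at high frequencies. The frequency range and variable names are illustrative assumptions, not taken from the article.

```python
import numpy as np

# Illustrative spatial frequencies (cycles/deg); the range is an assumption.
k = np.linspace(0.5, 30.0, 60)

# Assumed natural-scene amplitude spectrum: amplitude ~ 1/k, i.e. power ~ 1/k**2.
scene_amplitude = 1.0 / k

# Idealized noise-free whitening filter: gain grows linearly with k,
# so it attenuates low spatial frequencies relative to high ones.
whitening_gain = k

# Output power spectrum after filtering.
output_power = (whitening_gain * scene_amplitude) ** 2

print(np.allclose(output_power, 1.0))  # True: the filtered spectrum is flat
```

A flat output spectrum means that every spatial frequency carries the same power, which is the sense in which such a filter removes the redundancy of a 1/k² input; the price is a strongly reduced gain at low spatial frequencies.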

Although very influential, this proposal conflicts with experimental data. Neurophysiological recordings have long shown that the way the responses of retinal ganglion cells vary with spatial frequency deviates sharply from the CSF. The CSF of macaques is very similar to that of humans (De Valois et al., 1974); yet neurons in the macaque retina respond much more strongly at low spatial frequencies than one would expect from behavioral measurements of the CSF (Figure 1A). This deviation cannot be reconciled with standard models of retinal ganglion cells. It persists even when one takes into account obvious differences in the stimuli often used in neurophysiological and behavioral measurements (i.e. drifting gratings vs. temporally modulated gratings), as well as the nonlinear attenuation in responsiveness at low spatial frequencies exhibited by some retinal ganglion cells (Derrington and Lennie, 1984; Croner and Kaplan, 1995; Benardete and Kaplan, 1997a). This mismatch between neuronal and behavioral sensitivity indicates that additional mechanisms contribute to the CSF.

Figure 1. Contrast sensitivity and fixational eye movements.

(A) Behavioral and neurophysiological measurements of contrast sensitivity. The contrast sensitivity functions (CSF) of humans and macaques (black curves; De Valois et al., 1974) are compared to the receptive-field profiles of magno- (M) and parvocellular (P) retinal ganglion cells (red curves; Croner and Kaplan, 1995). (B) Fixational eye movements (FEMs; red curve in magnified inset and black and gray traces in Cartesian graph), which include small saccades (microsaccades; red-shaded interval) and fixational drift (green), continually displace the stimulus on the retina.

https://doi.org/10.7554/eLife.40924.002

A fundamental difference between neurophysiological and behavioral measurements of contrast sensitivity is the presence of eye movements in the latter. Under natural viewing conditions, humans and other primates incessantly move their eyes (Kowler, 2011; Cherici et al., 2012). Small movements, known as fixational eye movements (FEMs), occur even when attempting to maintain steady gaze on a single point (Figure 1B). Although humans tend to suppress saccades of all sizes, including microsaccades, during measurements of contrast sensitivity (Mostofi et al., 2016), ocular drift (the seemingly erratic motion in between saccades and microsaccades) keeps the stimulus continually in motion on the retina and may cover an area as large as that of the foveola (Rucci and Poletti, 2015a). Critically, this retinal image motion is completely eliminated or markedly attenuated in many neurophysiological preparations, in which the retina is studied in vitro or the eye muscles are paralyzed by anesthesia and/or neuromuscular blockade.

In previous work, we have shown that eye drift profoundly reshapes visual input signals, redistributing the 0 Hz (DC) power of the external static stimulus to non-zero temporal frequencies on the retina (Casile and Rucci, 2006; Casile and Rucci, 2009; Kuang et al., 2012; Aytekin et al., 2014). These modulations appear to be used by humans for fine spatial discrimination (Rucci et al., 2007; Boi et al., 2017; Ratnam et al., 2017), providing new support for the long-standing proposal that the visual system uses oculomotor-induced luminance fluctuations to encode spatial information in a temporal format (see Rucci and Victor, 2015b and Rucci et al., 2018 for reviews). Building upon this previous work, here we investigate whether this temporal encoding strategy, coupled with the known response characteristics of retinal neurons, accounts for the most fundamental properties of human spatial sensitivity.

In addition to the properties described above, it is well established that contrast sensitivity is affected by temporal modulations in the stimulus. Although the CSF exhibits a strong attenuation at low spatial frequencies when tested with stationary gratings, the shape of this function changes when gratings are modulated in time, transitioning from band-pass to low-pass as the temporal frequency of the stimulus increases (Robson, 1966). Furthermore, although strongly attenuated, sensitivity also tends to shift to higher spatial frequencies when retinal image motion is strongly reduced, as in experiments of retinal stabilization (Kelly, 1979). In both these conditions, the temporal modulations impinging onto retinal receptors differ drastically from those generated by normal eye drift over stationary gratings.

Does a temporal strategy of spatial encoding reconcile neurophysiological and behavioral measurements of contrast sensitivity? Does this strategy explain the differences in the CSF measured under various experimental conditions? More broadly, do the oculomotor-driven dynamics of retinal ganglion cells provide a unified account of human spatial sensitivity? Answers to these questions are critical not only for advancing our comprehension of the mechanisms of visual encoding but also for understanding the consequences of abnormal retinal image motion and their clinical implications. In the following, we use neuronal models to quantitatively examine the impact of eye drift on neural activity and compare the responses of retinal ganglion cells to the CSF of primates.

Results

Figure 1A compares the mean receptive fields of ganglion cells in the primate retina, as estimated by Croner and Kaplan (1995), with the contrast sensitivity of alert and behaving macaques (De Valois et al., 1974). The two sets of data deviate considerably, especially at low spatial frequencies. In this range, unlike the CSF, neural sensitivity is not strongly attenuated, a trend reported by multiple neurophysiological studies (e.g., Kaplan and Shapley, 1982; Hicks et al., 1983; Derrington and Lennie, 1984). This deviation is not simply the outcome of incorrectly extrapolating receptive-field measurements, as neural responses have been directly measured at very low spatial frequencies (down to 0.07 cpd in Croner and Kaplan, 1995; Figure 1A).

While a difference-of-Gaussians model can yield reduced responses at low spatial frequencies, attenuation similar to that observed in the CSF can only be achieved with highly unrealistic model parameters. As shown in Figure 1—figure supplement 1A–B, for both M and P cells, matching the behavioral CSF requires a surround strength more than twice the value found in physiological measurements, a condition that yields an almost perfect balance between excitation and inhibition. Even small deviations from this balance lead to marked departures from the CSF (Figure 1—figure supplement 1C–D). Thus, contrary to previous proposals, the spatial sensitivity of retinal ganglion cells appears to be quantitatively incompatible with the characteristics of the CSF. A greater attenuation of neural sensitivity at low spatial frequencies would be required to counterbalance the large power of natural scenes in this range.
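Why near-perfect balance is needed can be seen from the difference-of-Gaussians (DoG) amplitude response in the form commonly used for primate ganglion cells (as in Croner and Kaplan, 1995): at 0 cycles/deg the response is proportional to one minus the integrated surround/center ratio. The radii below (0.03 and 0.18 deg) are illustrative assumptions, not the fitted M or P values used in the article, and `low_freq_attenuation` is a hypothetical helper.

```python
import numpy as np

def dog_response(k, kc, rc, ks, rs):
    """Amplitude response of a difference-of-Gaussians receptive field.

    k: spatial frequency (cycles/deg); rc, rs: center and surround radii (deg);
    kc, ks: peak sensitivities of the center and surround Gaussians.
    """
    center = kc * np.pi * rc**2 * np.exp(-((np.pi * rc * k) ** 2))
    surround = ks * np.pi * rs**2 * np.exp(-((np.pi * rs * k) ** 2))
    return center - surround

def low_freq_attenuation(surround_strength, rc=0.03, rs=0.18):
    """Response at 0 cycles/deg divided by the peak response.

    surround_strength is the integrated surround/center ratio
    (ks * rs**2) / (kc * rc**2); values near 1 mean near-perfect balance.
    """
    kc = 1.0
    ks = surround_strength * kc * rc**2 / rs**2
    k = np.linspace(0.0, 60.0, 6001)
    r = dog_response(k, kc, rc, ks, rs)
    return r[0] / r.max()

for s in (0.55, 0.9, 0.99):
    print(f"surround strength {s:.2f}: R(0)/R(peak) = {low_freq_attenuation(s):.3f}")
```

With these illustrative radii, the low-frequency response is still about half of the peak at a surround strength of 0.55, and drops to about 1% only when the surround cancels 99% of the center.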

The response of a neuron, however, depends not only on the cell’s spatial preference but also on its temporal sensitivity. Temporal transients are always present in the input signals to the retina during behavioral measurements of contrast sensitivity. Experimenters often take great care to minimize these transients, for example by slowly ramping the stimulus up at the beginning and down at the end of a trial, and by enforcing fixation to prevent visual changes caused by saccadic eye movements (Figure 2A). Yet, despite these precautions, fixational eye movements are always present and modulate the visual flow impinging on the retina even when the stimulus does not change on the monitor. Could sensitivity to these oculomotor fluctuations reconcile neurophysiological and behavioral measurements of spatial sensitivity?

Figure 2. Input transients during measurement of contrast sensitivity.

(A) Temporal modulations in the stimulus. Measurements of contrast sensitivity often gradually change the contrast of the stimulus during the course of the trial. In this case, the stimulus is a static grating. (B) Fixational jitter modulates input signals in a way that depends on the spatial frequency of the stimulus. The same amount of fixational drift yields larger temporal fluctuations with gratings at higher spatial frequencies (vertical arrows). (C) Input power with gratings at 1 and 8 cycles/deg (left panel). Higher spatial frequencies lead to broader temporal distributions (right panel). (D) Total power available at non-zero temporal frequencies with and without fixational drift. In the latter case, temporal modulations are caused only by the temporal contrast envelope of stimulus presentation. Shaded regions represent one standard deviation (see inset). Data represent averages over N=5 observers. (E) Temporal sensitivities of modeled retinal ganglion cells. Model parameters are reported in Tables 1 and 2 in Materials and methods.

https://doi.org/10.7554/eLife.40924.004

To investigate this question, we recorded eye movements in human observers as they performed a grating detection task at threshold. We then exposed spatiotemporal filters approximating the receptive fields of retinal ganglion cells to the luminance signals experienced by the retina in each individual trial. Figure 2B shows the temporal modulations impinging onto retinal neurons during a typical measurement of contrast sensitivity. In the absence of any transient, the power of a stationary visual stimulus would be confined to the DC (0 Hz) temporal frequency axis. In practice, however, both eye drift and the turning of the stimulus on and off on the display introduce temporal modulations. These modulations effectively redistribute part of the stimulus DC power to nonzero temporal frequencies; that is, they transform static power (the original power at 0 Hz) into dynamic power (power at nonzero temporal frequencies).
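The redistribution of static power can be reproduced with a small Monte Carlo sketch. It replaces the recorded eye traces with an assumed 1-D Brownian drift along the grating axis (displacement variance 2·D·t, with D = 20 arcmin²/s as an illustrative value) and follows the luminance seen by a single receptor over a 1 s window; the function name and parameters are hypothetical. By Parseval's theorem, the share of power outside the 0 Hz bin of the discrete Fourier transform is the dynamic power.

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamic_power_fraction(k, D=20 / 3600, T=1.0, fs=1000.0, n_trials=200):
    """Fraction of a grating's power moved to non-zero temporal frequencies.

    k: spatial frequency (cycles/deg); D: assumed diffusion coefficient
    (deg^2/s, 1-D displacement variance 2*D*t); T: trial duration (s);
    fs: sampling rate (Hz). Powers are pooled across simulated trials.
    """
    n = int(T * fs)
    dc_power = 0.0
    total_power = 0.0
    for _ in range(n_trials):
        # Brownian eye position along the grating axis (deg).
        x = np.cumsum(rng.normal(0.0, np.sqrt(2 * D / fs), size=n))
        # Luminance modulation at one receptor, random grating phase.
        s = np.cos(2 * np.pi * k * x + rng.uniform(0.0, 2 * np.pi))
        power = np.abs(np.fft.fft(s)) ** 2
        dc_power += power[0]
        total_power += power.sum()
    return 1.0 - dc_power / total_power

for k in (1.0, 8.0):
    print(f"{k:.0f} cycles/deg: dynamic fraction = {dynamic_power_fraction(k):.2f}")
```

The dynamic fraction grows with spatial frequency because the same eye displacement sweeps a larger portion of a finer grating's period. A contrast ramp without drift would instead multiply the signal by an envelope that does not involve k, so its dynamic fraction would be the same at every spatial frequency.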

As shown in Figure 2C–D, because of the characteristics of ocular drift, the resulting dynamic power increases with spatial frequency up to approximately 30 cpd (magenta line in Figure 2D), which, interestingly, roughly corresponds to the frequency limit given by the spatial resolution of photoreceptors in the fovea. Unlike drift, contrast modulations due to the onset and offset of the stimulus on the display cause power redistributions that do not depend on the spatial frequency of the stimulus (black line in Figure 2D). It is important to keep in mind that eye movements do not generate new power in the retinal input. They only redistribute the original DC power of the stimulus, so that a complementary frequency-dependent attenuation of power occurs along the 0 Hz axis (Figure 2—figure supplement 1).
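Under a Brownian idealization of drift, the average split between static and dynamic power has a closed form, which makes the conservation argument explicit: the DC fraction and the dynamic fraction sum to one at every spatial frequency. With γ = 4π²k²D and a window of length T, the expected DC fraction is 2(γT − 1 + exp(−γT))/(γT)². This is a derivation under stated assumptions (1-D Brownian drift with displacement variance 2·D·t, random grating phase, an illustrative D of 20 arcmin²/s), not a formula from the article.

```python
import numpy as np

def expected_dc_fraction(k, D=20 / 3600, T=1.0):
    """Expected fraction of grating power left at 0 Hz over a window of T s.

    Assumes 1-D Brownian drift with displacement variance 2*D*t along the
    grating axis; k must be > 0 (cycles/deg), D in deg^2/s.
    """
    x = 4 * np.pi**2 * k**2 * D * T        # gamma * T
    # x + expm1(-x) equals x - 1 + exp(-x); expm1 avoids 1 - exp(-x) round-off
    return 2 * (x + np.expm1(-x)) / x**2

for k in (1.0, 4.0, 8.0, 16.0):
    dc = expected_dc_fraction(k)
    print(f"{k:4.0f} cycles/deg: static {dc:.2f}, dynamic {1 - dc:.2f}")
```

Under these assumptions the dynamic share rises from about 0.07 at 1 cycle/deg to about 0.87 at 8 cycles/deg; the exact numbers depend on the assumed D and T.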

Both eye drift and contrast changes yield temporal modulations that are well within the range of temporal sensitivity of retinal ganglion cells (cf. Figure 2C and E). However, in simulations that replicated the standard conditions of contrast sensitivity measurements, drift modulations predominated. Since drift modulations convey little power at low spatial frequencies, the responses of modeled ganglion cells were attenuated in this frequency range (Figure 3B–C). This happened for both M and P cells, despite the well-known differences in their spatiotemporal sensitivity. As a consequence, a simple linear combination of the resulting M and P responses accurately predicted human contrast sensitivity with stationary stimuli over the entire range of relevant spatial frequencies (solid line in Figure 3A).
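The article fixes its two free parameters through Equation 7 in Materials and methods, which is not reproduced in this section. As a hedged illustration, the sketch below assumes that the predicted CSF is a weighted sum a·r_M + b·r_P of M and P responses and recovers the global gain a and the P/M ratio b/a by ordinary least squares. The function name and the synthetic response curves are hypothetical.

```python
import numpy as np

def fit_mp_combination(r_m, r_p, csf):
    """Fit csf ~ a * r_m + b * r_p by least squares.

    Returns (gain, ratio) with gain = a and ratio = b / a, i.e. a global
    gain and the relative weight of P versus M responses.
    """
    design = np.column_stack([r_m, r_p])
    (a, b), *_ = np.linalg.lstsq(design, csf, rcond=None)
    return a, b / a

# Synthetic example: made-up response curves, not the article's data.
k = np.logspace(-1, 1.5, 20)             # spatial frequency (cycles/deg)
r_m = k * np.exp(-k / 4.0)               # hypothetical M-cell response
r_p = k * np.exp(-k / 12.0)              # hypothetical P-cell response
csf = 2.0 * r_m + 0.5 * r_p              # generated with gain 2, ratio 0.25

gain, ratio = fit_mp_combination(r_m, r_p, csf)
print(round(gain, 3), round(ratio, 3))   # 2.0 0.25
```

Fitting in linear units weights high-sensitivity points more heavily; fitting log sensitivities is an equally plausible choice, and the sketch does not assume which error metric the article used.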

Figure 3. Influence of fixational drift on contrast sensitivity.

Predicted CSFs in the presence (Drift; solid line) and absence (No Drift; dashed line) of eye movements. Stimuli were stationary gratings. (A) A linear combination of the responses of M and P cells closely matches classical measurements (circles; data from De Valois et al., 1974) only when eye drift occurs. (B–C) CSFs predicted separately from the responses of M (panel B) and P (panel C) cells.

https://doi.org/10.7554/eLife.40924.008

In contrast, in the absence of eye movements, when the only temporal modulations were those caused by the onset and offset of the stimulus on the monitor, the CSF predicted by the same linear combination of neural responses exhibited a low-pass behavior that deviated considerably from human contrast sensitivity, especially at low spatial frequencies (dashed lines in Figure 3). In fact, no linear combination of modeled responses could approximate the CSF in this condition. This happened because, unlike the luminance modulations resulting from ocular drift, the amplitude of the contrast modulations of the stimulus on the display does not depend on the spatial frequency of the stimulus (black line in Figure 2D). Thus, without taking ocular drift into account, modeled neurons respond more strongly at low spatial frequencies, as dictated by the spatial sensitivity of their kernels, and this response profile strongly deviates from the CSF (Figure 1A).

In sum, standard models of the responses of M and P RGCs predict the shape of the human CSF measured with stationary gratings well, but only when one considers their sensitivity to the temporal modulations caused on the retina by fixational drift.

Contrast sensitivity is a function not only of the spatial frequency of the stimulus but also of its temporal frequency. Measurements with gratings modulated in time have long shown that the CSF in humans is not space-time separable: the way contrast sensitivity varies with spatial frequency depends on the temporal frequency of the modulation (Robson, 1966). As the temporal frequency increases, the CSF changes its shape, transitioning from band-pass to low-pass (Figure 4A).

Figure 4. Contributions of fixational drift to contrast sensitivity with temporally modulated gratings.

(A) Human CSFs measured with static (0 Hz; data from De Valois et al., 1974) and sinusoidally modulated (6 Hz; data from Robson, 1966) gratings. (B) Contrast sensitivity functions predicted by our model in the presence of temporally modulated gratings are compared with measurements from Robson (1966). See Figure 4—figure supplement 3 for the separate contributions of M and P cells. (C) Power spectra of the responses of modeled retinal ganglion cells during viewing of gratings temporally modulated at 6 Hz. Each point in the map represents the amount of power at a given temporal frequency resulting from translating the modeled receptive fields over a grating at the corresponding spatial frequency following the recorded eye drift trajectories.

https://doi.org/10.7554/eLife.40924.010

To investigate whether our model also accounts for this change in shape, we repeated our simulations using gratings modulated at various temporal frequencies. The same linear combination of the responses of M and P cells as in Figure 3 continued to closely match human performance when the stimulus was temporally modulated on the display, and the predicted CSF replicated the band-pass to low-pass transition observed in primates as the frequency of the modulation increased (Figure 4B).

This change in shape was a consequence of the different amounts of dynamic power that the combination of fixational drift and temporal modulation of the stimulus delivered within the range of neuronal sensitivity. Since we assume that there is no sensitivity to unchanging stimuli, the DC power does not contribute to the cells’ responses. However, flickering a grating shifts the 0 Hz power of the grating to the temporal frequency of the modulation (Figure 4C). As a consequence, as the frequency of the modulation increased, this DC power was progressively moved into the sensitivity range of modeled neurons. At low modulation frequencies (e.g. 1 Hz or below), only a small fraction of this power fell within the region of neuronal sensitivity, and the temporal redistribution resulting from eye drift continued to exert a strong influence, forcing the CSF to maintain its band-pass shape. At higher temporal frequencies (e.g. 6 Hz and above), however, the power that would be confined to the 0 Hz axis in the absence of stimulus modulation became fully available within the cells’ peak sensitivity region. Since this static power is predominantly at low spatial frequencies (Figure 2—figure supplement 1), it caused a transition from band-pass to low-pass behavior in the responses of simulated M and P neurons, as well as in the shape of the CSF. Estimates of the CSF at intermediate temporal frequencies between 0 Hz and 6 Hz (Figure 4—figure supplement 1) suggest that this transition occurs around 3 Hz, in agreement with psychophysical results (Bowker and Tulunay-Keesey, 1983).
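The effect of flicker can be checked on a single receptor without drift. In this sketch, the sampling rate, duration, grating phase, and 6 Hz modulation are illustrative assumptions, and the 0.6 Hz cutoff mirrors the low-frequency threshold described in the Discussion. A static grating delivers a constant luminance offset whose power sits entirely at 0 Hz; multiplying it by a 6 Hz sinusoid moves all of that power to 6 Hz, above the cutoff.

```python
import numpy as np

fs, T = 200.0, 2.0                          # sampling rate (Hz), duration (s)
t = np.arange(int(fs * T)) / fs
freqs = np.fft.rfftfreq(t.size, d=1 / fs)

def one_sided_power(signal):
    """One-sided power spectrum whose bins sum to the total (Parseval)."""
    p = np.abs(np.fft.rfft(signal)) ** 2
    p[1:] *= 2                              # fold in negative frequencies
    if signal.size % 2 == 0:
        p[-1] /= 2                          # the Nyquist bin has no mirror
    return p

def fraction_above(signal, cutoff=0.6):
    """Share of the signal's power at temporal frequencies >= cutoff (Hz)."""
    p = one_sided_power(signal)
    return p[freqs >= cutoff].sum() / p.sum()

static = np.full(t.size, np.cos(0.3))           # stationary grating, fixed eye
flicker = static * np.cos(2 * np.pi * 6.0 * t)  # contrast modulated at 6 Hz

print(fraction_above(static))                   # ~0: all power at 0 Hz
print(fraction_above(flicker))                  # ~1: power moved to 6 Hz
print(freqs[np.argmax(one_sided_power(flicker))])  # 6.0
```

Because the flicker moves the entire 0 Hz power regardless of spatial frequency, it favors whichever spatial frequencies hold the most static power, which drift leaves predominantly at low spatial frequencies.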

In sum, our model attributes the space-time inseparability of the CSF to the structure of the temporal modulations delivered within the range of sensitivity of retinal ganglion cells. Modulations resulting from eye drift yield a band-pass CSF, whereas sinusoidally modulated gratings yield a low-pass CSF. The interplay between these two components of the retinal input explains not only contrast sensitivity with stationary gratings, but also the band-pass to low-pass transition that occurs with temporally modulated gratings. Notably, it correctly predicts the temporal frequency range at which this transition takes place. Our results, thus, suggest a functional link between the physiological instability of visual fixation and the characteristics of the CSF.

A natural question then emerges: how is contrast sensitivity affected by the elimination of the luminance modulations caused by ocular drift? Ideally, in the complete absence of eye movements, neural responses in our model would be driven only by the modulations present in the external stimulus. Under such conditions, the model predicts that sensitivity to a stationary grating would be greatly attenuated and the CSF would shift toward a low-pass shape, as it would lack the frequency-dependent amplification exerted by ocular drift.

In real experiments, however, complete elimination of oculomotor-induced luminance modulations is impossible. Retinal stabilization, a laboratory procedure that attempts to immobilize an image on the retina (Riggs et al., 1953; Yarbus, 1957), is always affected by noise in the oculomotor recordings and by imperfections in gaze-contingent display control, which leave some residual motion on the retina. Under these conditions, contrast sensitivity has indeed been found to be attenuated, but it maintains its band-pass shape and peaks at higher spatial frequencies (Kelly, 1979).

To examine whether sensitivity to temporal transients accounts for the changes in the CSF measured under retinal stabilization, we exposed modeled neurons to reconstructions of the visual input signals experienced in these experiments. Previous studies have established that a Brownian model captures the characteristics of retinal image motion during fixation well (Kuang et al., 2012; Poletti et al., 2015). Building on this finding, we modeled the residual motion of the retinal image in stabilization experiments as a Brownian process, but with a greatly reduced diffusion coefficient relative to that measured during normal, unstabilized fixation.

Figure 5A shows how the spatial frequency content of the luminance fluctuations experienced by retinal receptors (the power available at nonzero temporal frequencies) varies with the scale of the Brownian motion process (i.e. its diffusion coefficient, D). Changing the amount of retinal image motion has interesting repercussions on the characteristics of temporal modulations. As expected, a smaller diffusion coefficient delivers less dynamic power to the retina within the range of neural sensitivity, a direct consequence of the fact that luminance modulations are now smaller. However, a smaller D also shifts the range of amplification to higher spatial frequencies, by a factor equal to the square root of the reduction in D. This happens because reducing the scale of retinal image motion is functionally equivalent to spatially stretching the stimulus, which translates, in the Fourier domain, into a compression of the spatial frequency axis that moves the amplification range toward higher spatial frequencies.
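The square-root scaling follows from a Brownian idealization: the temporal statistics of the input at spatial frequency k depend on k and D only through the product k²D (the decorrelation rate is 4π²k²D). Reducing D by a factor c therefore reproduces, at frequency √c·k, exactly what normal drift produced at frequency k. The sketch below checks this with the expected DC fraction for 1-D Brownian drift (an assumption, not the article's code), using an illustrative normal D of 20 arcmin²/s and the 125-fold reduction used in the stabilization simulation.

```python
import numpy as np

def expected_dc_fraction(k, D, T=1.0):
    """Expected 0 Hz share of grating power under 1-D Brownian drift
    (displacement variance 2*D*t); k in cycles/deg, D in deg^2/s."""
    x = 4 * np.pi**2 * k**2 * D * T
    return 2 * (x + np.expm1(-x)) / x**2

d_normal = 20 / 3600        # deg^2/s, illustrative value for normal drift
c = 125.0                   # reduction in D used to model stabilization
d_stab = d_normal / c

k = np.array([1.0, 2.0, 4.0, 8.0])
normal = expected_dc_fraction(k, d_normal)
stabilized = expected_dc_fraction(k * np.sqrt(c), d_stab)

print(np.allclose(normal, stabilized))   # True: same statistics at sqrt(c) * k
print(np.sqrt(c))                        # scale factor, about 11.2
```

This equivalence concerns only the input modulations. The receptive-field kernels do not rescale with D, so the shift of the predicted CSF peak need not match this factor.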

Figure 5. Consequences of retinal stabilization.

(A) Spatial spectral density of the luminance modulations resulting from a Brownian model of retinal image motion with different diffusion constants. Lowering D both attenuates the power available at each spatial frequency (vertical arrow) and shifts the distribution to higher spatial frequencies (horizontal arrow). (B) Predicted contrast sensitivity under retinal stabilization. Sensitivity is reduced and shifted to higher spatial frequencies. Dashed vertical lines mark the maxima of the two curves (color coded according to their D in panel A). Results quantitatively match classical experimental data from Kelly (1979). CSFs predicted separately from the responses of M and P neurons are shown in Figure 5—figure supplement 1.

https://doi.org/10.7554/eLife.40924.014

These changes in the spectral distribution of the retinal input closely match the changes in contrast sensitivity observed in retinal stabilization experiments. Figure 5B compares classical retinal stabilization data from Kelly (1979) with the sensitivity predicted by our model when the diffusion coefficient of retinal image motion was reduced by a factor of 125, which corresponds to shrinking the spatial scale of eye movements by approximately one order of magnitude (√125 ≈ 11). Model predictions closely followed psychophysical measurements: a reduction in the amount of retinal image motion attenuated contrast sensitivity while maintaining its band-pass shape, and shifted its peak to higher spatial frequencies, from 4 cycles/deg to 5.5 cycles/deg (Figure 5B).

These data show that consideration of the luminance modulations resulting from the motion of the stimulus on the retina accounts not only for behavioral sensitivity measurements performed in the presence of normal eye movements, but also for measurements made under conditions of retinal stabilization, when retinal image motion is greatly reduced.

Discussion

Contrast sensitivity is a fundamental descriptor of visual function. In many species, including humans, sensitivity strongly depends on the spatial and temporal frequency of the stimulus. Here, we show that a temporal scheme of spatial encoding, in which spatial vision is driven by temporal changes, predicts such dependencies when the temporal modulations introduced by incessant eye movements are taken into account. In contrast, when these consequences of fixational drift are ignored, the known response characteristics of retinal ganglion cells fail to account for the human CSF. As described below, these results are highly robust, bear multiple consequences, and lead to important predictions.

An important consequence of our results concerns the strategies by which the visual system encodes spatial information. Existing theories of visual processing have attributed the shape of the CSF to the characteristics of early visual processing. In an influential study, Atick and Redlich (1992) found that the theoretical filter that optimally decorrelates natural images closely matches the CSF. Since decorrelated responses enable compact neural representations, these authors assumed that the CSF reflects the average spatial selectivity of ganglion cells in the retina. However, experimental measurements have long shown that the response selectivity of RGCs differs considerably from the CSF, particularly at low spatial frequencies, where decorrelation would be most beneficial (Hicks et al., 1983; Kaplan and Shapley, 1982; Derrington and Lennie, 1984; Croner and Kaplan, 1995). As expected from this deviation, broad spatial correlations in RGC responses have been found in preparations in which natural images are displayed in the absence of eye movements (Puchalla et al., 2005; Segal et al., 2015). These findings are consistent with our model: when the transients in stimulus presentation override the consequences of eye drift, spatial sensitivity follows the spatial kernels of modeled receptive fields. For this reason, responses to low spatial frequencies are enhanced relative to the level that would be needed to decorrelate activity.

The same principle also explains the band-pass to low-pass transition of the CSF as the temporal frequency of the stimulus increases. This transition is a consequence of the spectral characteristics of the signals that the combination of fixational drift and stimulus transients delivers within the range of neuronal temporal sensitivity. With stationary gratings, temporal modulations in the retinal input are heavily influenced by ocular drift, which enhances high spatial frequencies and thereby imposes a band-pass sensitivity (Figure 3). With temporally modulated gratings, neuronal responses are also affected by the contrast modulation imposed on the stimulus on the display. Above a frequency of a few Hz, the impact of this external modulation outweighs the effects of eye movements, removes the space-time inseparability that ocular drift introduces in cell responses, and again enhances sensitivity to low spatial frequencies (Figure 4B).

Rather than attributing spatial sensitivity solely to the spatial selectivity of RGCs, our analysis shows that the CSF is shaped by the joint spatial and temporal characteristics of retinal responses and by how they interact with oculomotor transients. This account predicts the complex way in which contrast sensitivity varies with the spatial and temporal frequency of the stimulus by means of a linear combination of the space-time separable functions of P and M channels. While our study cannot exclude that other mechanisms, at various stages of visual processing, may also play a role in shaping the CSF (e.g. the number of neurons in different frequency channels), it suggests that these other contributions are minimal. Consideration of RGC temporal sensitivity provides a parsimonious, unifying framework for a wide range of experimental measurements of the CSF with only a minimal set of assumptions.

In our model, we assumed that retinal ganglion cells possess negligible sensitivity below the frequencies at which sensitivity can practically be measured (~0.2–0.3 Hz). This hypothesis may appear to conflict with the neurophysiological data reported in the low temporal frequency range by several studies. However, in both neurophysiological and psychophysical experiments, measuring sensitivity in this range is challenging because it requires long trials, consideration of the visual stimuli present before and after each trial, and estimation of long impulse responses. Typically, the transfer functions reported at low temporal frequencies are extrapolations outside the range of measured values, based on models that were not designed for this purpose (e.g. the linear cascade model (Victor, 1987) in Benardete and Kaplan, 1997b; Benardete and Kaplan, 1997a; a difference of exponentials in Derrington and Lennie, 1984, etc.). These extrapolations must be interpreted with great caution, as they merely reflect untested model assumptions.

The few studies that specifically examined retinal ganglion cells’ responses at low temporal frequencies found a decline in sensitivity down to the lowest frequency they could measure (Victor, 1987; Purpura et al., 1990). These studies suggest that the response attenuation takes the form of an approximately linear decrease in log-log scale. Such behavior is expected from theoretical considerations based on the characteristics of adaptation (Thorson and Biederman-Thorson, 1974), considerations that appear to apply to the responses of cones in the retina of the macaque (Boynton and Whitten, 1970) and would therefore limit the low-frequency behavior of retinal ganglion cells. Furthermore, temporal signals at frequencies below ~0.3 Hz, even if present, are not likely to be useful to an observer in a psychophysical experiment, as they will contain noise power due to visual stimulation on previous trials and during the intertrial interval (such as eye blinks and glances around the lab). Our results are robust to the specifics of how this low-frequency attenuation in sensitivity was implemented. The curves presented in Figures 3, 4 and 5 were obtained by simply discarding responses below 0.6 Hz. Results were, however, virtually identical when we used different frequency thresholds (Figure 4—figure supplement 2A), or when we modeled sensitivity as a power-law function in the low-frequency range, as in Purpura et al. (1990) and Thorson and Biederman-Thorson (1974) (Figure 4—figure supplement 2B).
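The two low-frequency treatments compared in this robustness check can be written as gain functions applied to the temporal power spectrum of a model response. In this sketch the 0.6 Hz threshold comes from the text, while the power-law exponent, the corner frequency, and the Lorentzian test spectrum (the spectral shape that Brownian drift produces at a single spatial frequency) are hypothetical choices for illustration.

```python
import numpy as np

def hard_cutoff(f, f_c=0.6):
    """Discard everything below f_c Hz (the threshold used for Figures 3-5)."""
    return (np.abs(f) >= f_c).astype(float)

def power_law(f, f0=0.6, beta=1.0):
    """Gain rising as (|f| / f0)**beta below f0 and flat above it.

    A straight line in log-log coordinates; f0 and beta are hypothetical.
    """
    f = np.abs(f)
    return np.where(f >= f0, 1.0, (f / f0) ** beta)

def retained_power(power, freqs, gain):
    """Power passed by a low-frequency gain, on a uniform frequency grid."""
    df = freqs[1] - freqs[0]
    return np.sum(power * gain(freqs)) * df

freqs = np.linspace(0.0, 50.0, 5001)                  # Hz
gamma = 14.0                                          # 1/s, illustrative
lorentzian = gamma / (gamma**2 + (2 * np.pi * freqs) ** 2)

for gain in (hard_cutoff, power_law):
    print(gain.__name__, retained_power(lorentzian, freqs, gain))
```

With these illustrative settings the two rules differ only in how much of the narrow sub-0.6 Hz band they keep, which is a small share of the total when the spectrum extends over several Hz.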

We specifically focused on fixational drift both because of its ubiquitous presence and its known influence on fine pattern vision (Ratliff and Riggs, 1950; Ditchburn, 1955; Steinman et al., 1973; Rucci et al., 2007; Ratnam et al., 2017). Other types of eye movements, like saccades and microsaccades, tend to be suppressed during measurements of contrast sensitivity (Mostofi et al., 2016) and were not considered in this study. The transients produced by these movements, however, differ spectrally from those produced by eye drift, as they provide equal temporal power across a broad range of spatial frequencies. Thus, during normal viewing, the visual system could benefit from different types of modulations. In keeping with this idea, it has been argued that the stereotypical alternation of oculomotor transients resulting from the natural saccade/drift cycle contributes to coarse-to-fine processing dynamics at each visual fixation (Boi et al., 2017).

It is worth emphasizing that our results are very robust and do not depend on fitting model parameters. With regard to oculomotor activity, we did not model eye movements but used real traces recorded from human subjects during measurements of contrast sensitivity. With regard to neuronal properties, we implemented standard M and P filters obtained from the neurophysiological literature and frequently adopted in modeling studies (Croner and Kaplan, 1995; Benardete and Kaplan, 1997a; Benardete and Kaplan, 1999). We chose to estimate the CSF by linearly combining M and P responses in a fixed ratio because this was the simplest model. We note, however, that other ways of combining M and P signals yield very similar conclusions, since the space-time inseparability originates from the visual input rather than from the neuronal models. Our two parameters (the global gain at a given temporal frequency and the ratio of M and P contributions; see Equation 7 in the Materials and methods section) were merely used to align the modeled CSF quantitatively with the experimental data. They play no role in explaining the shape of the CSF and its band- to low-pass transition.

In addition to providing a comprehensive explanation of the CSF, our study makes important predictions at different levels. At the neural level, our results predict that the response selectivity of RGCs will differ when measured in the presence and in the absence of the fixational motion of the retinal image. Neurophysiological studies already suggest that fixational eye movements are an important component of visual encoding (Gur et al., 1997; Leopold and Logothetis, 1998; Martinez-Conde et al., 2000; Olveczky et al., 2003; Kagan et al., 2008; Meirovithz et al., 2012; McFarland et al., 2016). Eye jitter has been found to reduce redundancy in the responses of retinal neurons (Segal et al., 2015) and to synchronize them, enhancing visual features (Greschner et al., 2002) even beyond the physiological limitations imposed by photoreceptor spacing (Juusola et al., 2016). Furthermore, retinal ganglion cells have been described that may distinguish between the global motion produced by fixational eye movements and the local motion of objects (Olveczky et al., 2003). Yet, retinal responses are traditionally measured with the eyes immobilized, a condition in which RGCs tend to exhibit relatively strong responses at low spatial frequencies (Croner and Kaplan, 1995). Our model predicts that the spatial frequency amplification produced by fixational drift in the retinal input (Figure 2D) will enhance neuronal sensitivity to higher spatial frequencies and reduce sensitivity to low spatial frequencies. As a consequence, RGCs' spatial sensitivity should exhibit a more pronounced band-pass behavior, with its peak shifted toward higher frequencies. This prediction is difficult to test in vivo, because of the need to completely stabilize the retinal input, but it can be thoroughly investigated in vitro, where the motion of the retinal image is under full experimental control.

At the perceptual level, an interesting observation comes from the changes in the frequency content of the retinal input shown in Figure 5A. The amplitude of fixational instability regulates the power available in different spatial frequency bands: the smaller the retinal image motion, the more the range of amplification shifts to higher spatial frequencies. The visual system could, in principle, exploit this relationship by dynamically matching the spatial scale of eye drift to the frequency content of the visual scene, or to the frequency range that is task-relevant. Within a certain range, smaller drifts would optimize information accrual when foveating regions rich in high spatial frequencies. Such adjustments could be driven directly by the stimulus in a bottom-up fashion, but could also serve top-down demands in high-acuity tasks. Indeed, several studies support the idea that humans can control the amount of their ocular drift (Steinman et al., 1973; Cherici et al., 2012; Poletti et al., 2015). In the same vein, the relationship between fixational drift and the frequency content of the retinal input may also explain individual perceptual differences: subjects with relatively smaller drifts are expected to perform better in tasks in which high spatial frequencies are critical. Studies that quantitatively relate the characteristics of fixational eye drift to visual perception are needed to investigate these predictions.

Furthermore, our model predicts that manipulating the temporal modulations from eye drift will affect performance. We have shown that reducing retinal jitter reproduces both the overall reduction in contrast sensitivity and the shift to higher spatial frequencies observed in retinal stabilization experiments. In the other direction, enlarging fixational jitter increases the power available at low spatial frequencies, predicting an improvement in contrast sensitivity in this range. This prediction is consistent with the improvements in word and object recognition reported in patients with central visual loss when images or text are jittered or scrolled (Watson et al., 2012; Harvey and Walker, 2014; Gustafsson and Inde, 2004). Because the spatial frequency band of retinal ganglion cells shifts toward lower frequencies with eccentricity, enlarging retinal image motion brings more power into their range of sensitivity.

Our study also has clinical implications, as it predicts that disturbances in fixational oculomotor control will affect visual sensitivity. Oculomotor anomalies and impaired sensitivity co-occur in a variety of disorders, including conditions as diverse as dyslexia (Stein and Fowler, 1981; Stein and Fowler, 1993) and schizophrenia (Dowiasch et al., 2016; Egaña et al., 2013). Patients with these conditions exhibit similar visual deficits including reduced sensitivity (Lovegrove et al., 1980a; Lovegrove et al., 1980b; Slaghuis, 1998), low-level visual impairments (Eden et al., 1996; Li, 2002; Butler et al., 2001; Kim et al., 2006) and reading disabilities (Revheim et al., 2006) possibly caused by the disturbances in low-level vision (Revheim et al., 2006; Lovegrove et al., 1980a). Our results suggest a potential link between fine-scale eye movements and these visual deficits, which has not yet been investigated and which may inspire novel therapeutic approaches.

Materials and methods

Data collection and analysis

To examine the influence of eye movements on visual sensitivity, neuronal models were exposed to reconstructions of the input signals typically experienced by observers in contrast sensitivity experiments. To this end, we used oculomotor traces recorded during measurements of contrast sensitivity to move the stimuli presented as input to the models. Methods for the collection and analysis of eye-movement data, as well as the perceptual results, have been described in previous publications and are only briefly summarized here (see Mostofi et al., 2016 and Boi et al., 2017). This section focuses on the methods that are novel to this study.

Subjects

Eye movements were recorded from five observers (all female, aged 21–31). To optimize the precision of the recordings, only subjects with normal, uncorrected vision took part in the study. Informed consent was obtained from all participants following the procedures approved by the Boston University Charles River Campus Institutional Review Board (protocol number 1062E).

Apparatus

Stimuli were displayed on a gamma-corrected fast-phosphor CRT monitor (Iiyama HM204DT) in a dimly illuminated room. They were viewed monocularly with the left eye patched, while movements of the right eye were recorded by means of a Dual Purkinje Image eyetracker (Fourward Technology) and sampled at 1 kHz. This system has a resolution, measured by means of an artificial eye, of approximately 1 arcmin (Crane and Steele, 1985; Ko et al., 2016). A dental-imprint bite bar and a head rest prevented head movements. Stimuli were rendered by means of EyeRIS, a custom system that enables precise synchronization between oculomotor events and the refresh of the image on the monitor (Santini et al., 2007).

Stimuli and procedure

As in typical psychophysical CSF measurements, we used a standard grating-detection paradigm (see Mostofi et al., 2016 for the behavioral data). In a forced-choice procedure, observers detected 2D Gabor patterns oriented at ±45°. Their contrast varied across trials following the PEST adaptive procedure (Taylor and Creelman, 1967). The frequency and standard deviation of the Gabor were 10 cycles/deg and 2.25, respectively. Stimuli were displayed over a uniform field with a luminance of 21 cd/m². Oculomotor traces were segmented into complementary periods of drift and saccades based on a speed threshold of 2°/s (Mostofi et al., 2016). Only oculomotor traces that were collected near threshold levels of sensitivity and contained no saccades, microsaccades, or blinks were used in this study.
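The drift/saccade segmentation by speed threshold can be illustrated with a minimal Python sketch, assuming eye positions in arcmin sampled uniformly at 1 kHz. The function name `segment_drift` is hypothetical, and the sketch omits steps that practical pipelines typically add (velocity filtering, merging of nearby events, minimum-duration criteria), so it should not be read as the exact procedure of Mostofi et al. (2016).

```python
import numpy as np


def segment_drift(x_arcmin, y_arcmin, fs_hz=1000.0, speed_threshold_deg_s=2.0):
    """Label each sample as drift (True) or saccadic (False) by a speed threshold.

    Hypothetical helper: positions are assumed to be in arcmin and sampled
    uniformly at fs_hz. No velocity smoothing is applied in this sketch.
    """
    x = np.asarray(x_arcmin, dtype=float)
    y = np.asarray(y_arcmin, dtype=float)
    # Central-difference velocity in arcmin/s (np.gradient preserves length).
    vx = np.gradient(x) * fs_hz
    vy = np.gradient(y) * fs_hz
    # Convert speed from arcmin/s to deg/s before comparing with the threshold.
    speed_deg_s = np.hypot(vx, vy) / 60.0
    return speed_deg_s <= speed_threshold_deg_s
```

A boolean mask keeps the segmentation aligned sample-by-sample with the trace, so drift periods can be selected directly with `x[mask]`.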

Modeled neurons were exposed to the same retinal input experienced by human participants, replicated identically at all spatial frequencies. Gratings were presented for 3.2 s. They were smoothly ramped up and down in contrast at the beginning and end of the trial by means of the modulating function $M(t)$ and were also modulated in time at frequency $\omega_t$ ($\omega_t$ = 0, 1, 6, 16, or 22 Hz). The reconstructed retinal input was thus given by:

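$$I(\mathbf{x},t \mid f_s,\omega_t,\alpha_s,\phi_s) = \sin\!\left(2\pi\,\mathbf{f}_s\,(\mathbf{x}-\boldsymbol{\xi}(t))^{T} + \phi_s\right)\sin(2\pi\omega_t t)\,M(t) \tag{1}$$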

where $\boldsymbol{\xi}(t) = [\xi_x(t), \xi_y(t)]$ represents eye movements and $\mathbf{f}_s = [f_s\cos\alpha_s,\, f_s\sin\alpha_s]$ the stimulus frequency ($f_s$ = 0.1–60 cycles/deg). The orientation $\alpha_s$ and the phase $\phi_s$ uniformly spanned the range $[0, 2\pi)$.
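Equation 1 can be evaluated directly once an eye trace is available. The sketch below is illustrative only: the helper names (`retinal_input`, `contrast_ramp`), the raised-cosine shape of $M(t)$, and the 0.1 s ramp duration are our assumptions, since the text does not specify them. Because the factor $\sin(2\pi\omega_t t)$ is identically zero when $\omega_t = 0$, the sketch also assumes that the 0 Hz condition corresponds to an unmodulated (static) grating.

```python
import numpy as np


def contrast_ramp(t, duration_s, ramp_s):
    """Hypothetical raised-cosine on/off ramp standing in for M(t)."""
    t = np.asarray(t, dtype=float)
    m = np.ones(t.shape)
    rise = t < ramp_s
    m[rise] = 0.5 * (1.0 - np.cos(np.pi * t[rise] / ramp_s))
    fall = t > duration_s - ramp_s
    m[fall] = 0.5 * (1.0 - np.cos(np.pi * (duration_s - t[fall]) / ramp_s))
    return m


def retinal_input(x_deg, t_s, xi_deg, f_s, alpha_s, phi_s, omega_t,
                  duration_s=3.2, ramp_s=0.1):
    """Evaluate Equation 1 at retinal positions x_deg [P, 2] and times t_s [T].

    xi_deg is the eye trace [T, 2] in deg, one sample per time point.
    Returns an array of shape [T, P].
    """
    x = np.asarray(x_deg, dtype=float)
    t = np.asarray(t_s, dtype=float)
    xi = np.asarray(xi_deg, dtype=float)
    f_vec = f_s * np.array([np.cos(alpha_s), np.sin(alpha_s)])  # cycles/deg
    # Grating phase at each retinal point: 2*pi * f_s . (x - xi(t)) + phi_s
    rel = x[None, :, :] - xi[:, None, :]                       # [T, P, 2]
    spatial = np.sin(2.0 * np.pi * (rel @ f_vec) + phi_s)      # [T, P]
    # Assumption: omega_t = 0 denotes an unmodulated (static) grating.
    if omega_t > 0:
        temporal = np.sin(2.0 * np.pi * omega_t * t)
    else:
        temporal = np.ones_like(t)
    envelope = temporal * contrast_ramp(t, duration_s, ramp_s)
    return spatial * envelope[:, None]
```

Setting `xi_deg` to zeros reproduces the "no eye movements" condition, in which the retinal input at each point changes only through $M(t)$ and the temporal modulation.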

Neural models

The mean instantaneous rates of retinal ganglion cells (RGCs) were simulated by means of standard space-time separable linear filters with transfer function:

$$\mathrm{RF}(\mathbf{f},\omega) = K(\mathbf{f})\,H(\omega) \tag{2}$$

where $\mathbf{f}$ and $\omega$ indicate spatial and temporal frequencies, respectively. The spatial kernel $K(\mathbf{f})$ was modeled as in Croner and Kaplan (1995) with a standard difference of Gaussians:

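$$K(\mathbf{f}) = C\left(K_c\,\pi r_c^{2}\,e^{-(\pi r_c \lvert\gamma\mathbf{f}\rvert)^{2}} - K_s\,\pi r_s^{2}\,e^{-(\pi r_s \lvert\gamma\mathbf{f}\rvert)^{2}}\right) \tag{3}$$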

with parameters adjusted based on neurophysiological recordings from macaques (Table 1 in Croner and Kaplan, 1995). The scaling factor $\gamma$ was set to 0.5 to model the smaller receptive fields in the fovea, following cortical magnification (Eq. 8 in Van Essen et al., 1984).

Table 1
Parameters used in Equation 3 to model the spatial kernels of magno- (upper row) and parvo-cellular (bottom row) neurons.

Data are from Croner and Kaplan (1995).

https://doi.org/10.7554/eLife.40924.006
| | $r_c$ | $K_c$ | $r_s$ | $K_s$ |
|---|---|---|---|---|
| M cells | 0.10 | 148 | 0.72 | 1.1 |
| P cells | 0.03 | 353.2 | 0.18 | 4.4 |
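A minimal sketch of Equation 3, using the Table 1 values and $\gamma$ = 0.5. The contrast factor $C$ is left as a free argument (default 1) because its value is not given here; the names `SPATIAL_PARAMS` and `spatial_kernel` are hypothetical.

```python
import numpy as np

# Values transcribed from Table 1 (Croner and Kaplan, 1995, as used here).
SPATIAL_PARAMS = {
    "M": {"rc": 0.10, "Kc": 148.0, "rs": 0.72, "Ks": 1.1},
    "P": {"rc": 0.03, "Kc": 353.2, "rs": 0.18, "Ks": 4.4},
}


def spatial_kernel(f, cell="M", gamma=0.5, C=1.0):
    """Equation 3: difference-of-Gaussians spatial transfer function K(f)."""
    p = SPATIAL_PARAMS[cell]
    gf = gamma * np.abs(np.asarray(f, dtype=float))  # foveal scaling by gamma
    center = p["Kc"] * np.pi * p["rc"] ** 2 * np.exp(-(np.pi * p["rc"] * gf) ** 2)
    surround = p["Ks"] * np.pi * p["rs"] ** 2 * np.exp(-(np.pi * p["rs"] * gf) ** 2)
    return C * (center - surround)
```

By direct evaluation of Equation 3 with these parameters, the M-cell kernel peaks near 1.5 cycles/deg and the P-cell kernel near 6 cycles/deg, reflecting the smaller P-cell receptive fields; both remain well above zero at 0 cycles/deg.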

The temporal sensitivity function $H(\omega)$ consisted of a series of low-pass filters and a high-pass stage, as proposed by Victor (1987):

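$$H(\omega) = A\,e^{-i\rho 2\pi\omega D}\left(1 - \frac{H_s}{1 + i\rho 2\pi\omega\tau_S}\right)\left(\frac{1}{1 + i\rho 2\pi\omega\tau_L}\right)^{N} \tag{4}$$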

Parameters were taken from neurophysiological studies that fitted this model to recorded neurons (M cells: median values in Table 2 in Benardete and Kaplan, 1999; P cells: median values in Table 2 in Benardete and Kaplan, 1997a). The scaling factor ρ was set to 1/1.6 to include the effects of large stimuli on retinal responses (Figure 7B in Alitto and Usrey, 2015).

Table 2
Parameters used in Equation 4 to model the temporal kernels of magno- (upper row) and parvo-cellular (bottom row) neurons.
https://doi.org/10.7554/eLife.40924.007
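| | $N$ | $A$ | $D$ | $H_s$ | $\tau_L$ | $\tau_S$ |
|---|---|---|---|---|---|---|
| M cells | 30 | 499.77 | 2 | 1 | 1.1 | 2.23 |
| P cells | 38 | 67.59 | 3.5 | 0.69 | 1.27 | 29.36 |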
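Equation 4 is the product of a pure delay, a single high-pass stage, and an $N$-stage low-pass cascade. The sketch below transcribes it with the Table 2 values and $\rho$ = 1/1.6. Table 2 does not state units; as an illustrative assumption, $D$, $\tau_L$, and $\tau_S$ are taken to be in milliseconds and $\omega$ in Hz. The names `TEMPORAL_PARAMS` and `temporal_kernel` are hypothetical.

```python
import numpy as np

# Values transcribed from Table 2; D, tauL and tauS are ASSUMED to be in ms.
TEMPORAL_PARAMS = {
    "M": {"N": 30, "A": 499.77, "D": 2.0, "Hs": 1.0, "tauL": 1.1, "tauS": 2.23},
    "P": {"N": 38, "A": 67.59, "D": 3.5, "Hs": 0.69, "tauL": 1.27, "tauS": 29.36},
}


def temporal_kernel(omega_hz, cell="P", rho=1 / 1.6):
    """Equation 4: delay x high-pass stage x N-stage low-pass cascade."""
    p = TEMPORAL_PARAMS[cell]
    w = rho * 2.0 * np.pi * np.asarray(omega_hz, dtype=float)  # rad/s, scaled by rho
    ms = 1e-3  # convert ms to s
    delay = np.exp(-1j * w * p["D"] * ms)
    high_pass = 1.0 - p["Hs"] / (1.0 + 1j * w * p["tauS"] * ms)
    low_pass = (1.0 / (1.0 + 1j * w * p["tauL"] * ms)) ** p["N"]
    return p["A"] * delay * high_pass * low_pass
```

At $\omega$ = 0 the delay and low-pass terms equal 1, so the gain reduces to $A(1 - H_s)$: zero for the M-cell parameters ($H_s$ = 1) and nonzero for the P-cell parameters.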

Estimating contrast sensitivity

The main hypothesis of our study is that the visual system is insensitive to temporal stimulation at 0 Hz so that spatial sensitivity is entirely driven by temporal transients. For this reason, we estimated the predicted CSF on the basis of cell responses to input changes.

For each spatial frequency $f_s$ of the grating, we first estimated the space-time power spectrum of the retinal input, $P_I(\mathbf{f},\omega)$, by averaging the squared magnitude of the Fourier transform of Equation 1 across trials, stimulus orientations $\alpha_s$, and phases $\phi_s$. Since both $P_I(\mathbf{f},\omega)$ and the spatial kernels $K(\mathbf{f})$ are circularly symmetric in spatial frequency, we reduced the spatial dimensionality from 2D to 1D by radial averaging. We then computed the power spectrum of neuronal responses, $O(f,\omega)$, by multiplying the space-time power spectrum of the retinal input, $P_I(f,\omega)$, by the squared magnitude of the transfer functions of the cells' filters:

$$O_\zeta(f,\omega) = P_I(f,\omega)\,\lvert \mathrm{RF}_\zeta(f,\omega)\rvert^{2} \tag{5}$$

where $\mathrm{RF}_\zeta(f,\omega)$, with $\zeta$ = M or P, represents the Fourier transform of the M or P cells' receptive fields (Equation 2).
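Radial averaging and Equation 5 can be sketched as follows for a separable receptive field. The bin edges, the use of NaN for empty annuli, and the helper names (`radial_average`, `response_power`) are our assumptions; the text does not specify how the annuli were chosen.

```python
import numpy as np


def radial_average(power, fx, fy, f_edges):
    """Average a spectrum over annuli of constant |f|.

    power: array of shape [len(fy), len(fx), ...]; trailing axes (e.g. temporal
    frequency) are preserved. f_edges: increasing bin edges in cycles/deg.
    Bins that contain no samples are returned as NaN.
    """
    fr = np.hypot(*np.meshgrid(fx, fy))          # |f| for each (fy, fx) cell
    bins = np.digitize(fr.ravel(), f_edges) - 1
    flat = np.asarray(power, dtype=float).reshape(fr.size, -1)
    n_bins = len(f_edges) - 1
    out = np.full((n_bins, flat.shape[1]), np.nan)
    for b in range(n_bins):
        members = bins == b
        if members.any():
            out[b] = flat[members].mean(axis=0)
    return out.reshape((n_bins,) + np.shape(power)[2:])


def response_power(P_I, K_f, H_w):
    """Equation 5 for a separable filter: O(f, w) = P_I(f, w) |K(f) H(w)|^2."""
    RF = np.outer(K_f, H_w)                      # Equation 2 on the (f, w) grid
    return np.asarray(P_I) * np.abs(RF) ** 2
```

Because the filter is separable, $\lvert\mathrm{RF}\rvert^2$ is simply the outer product of $\lvert K(f)\rvert^2$ and $\lvert H(\omega)\rvert^2$, which `np.outer` builds in one step.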

Finally, we evaluated the CSF at each spatial frequency $f$ by computing the square root of the temporal power integrated across all non-zero temporal frequencies:

$$\mathrm{CSF}_\zeta(f) = \sqrt{\int_{0^{+}}^{+\infty} O_\zeta(f,\omega)\,d\omega} \tag{6}$$

where $O_\zeta$ represents the power spectrum of M or P responses. The integral in Equation 6 was computed numerically. To avoid artifacts from finite bandwidth, the first two temporal samples of the spectrum were discarded, so that the integral over temporal frequency started at $\omega$ = 0.63 Hz. However, virtually identical results were obtained when we used lower thresholds or when we modeled the low-frequency range of temporal sensitivity as a power law (Figure 4—figure supplement 2).
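The 0.63 Hz starting point follows from the frequency resolution of the spectrum. Assuming the spectrum is computed over the full 3.2 s trial, temporal samples are spaced 1/3.2 s = 0.3125 Hz apart, so discarding the first two samples (0 and 0.3125 Hz) starts the integral at 0.625 Hz. A minimal sketch of Equation 6 with an explicit trapezoidal rule follows; the function name is hypothetical, and discarding by sample index (rather than comparing against 0.63) avoids excluding the 0.625 Hz bin by accident.

```python
import numpy as np


def csf_from_response_power(O, omega_hz, n_discard=2):
    """Equation 6: sqrt of response power integrated over temporal frequency.

    O: array [n_spatial, n_temporal] sampled at non-negative, increasing
    temporal frequencies omega_hz. The first n_discard samples are dropped
    before integrating.
    """
    w = np.asarray(omega_hz, dtype=float)[n_discard:]
    y = np.asarray(O, dtype=float)[:, n_discard:]
    # Trapezoidal rule written out explicitly along the temporal axis.
    integral = np.sum(0.5 * (y[:, 1:] + y[:, :-1]) * np.diff(w), axis=1)
    return np.sqrt(integral)
```

Any response power concentrated at 0 Hz (or in the lowest non-zero bin) therefore contributes nothing to the estimated sensitivity, which is the model's central assumption.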

The predicted CSF was then estimated, for each condition, by a linear combination of the contrast sensitivities of the two types of neurons, $\mathrm{CSF}_M(f)$ and $\mathrm{CSF}_P(f)$:

$$\mathrm{CSF}_{\mathrm{est}}(f) = A\left[\lambda\,\mathrm{CSF}_M(f) + (1-\lambda)\,\mathrm{CSF}_P(f)\right] \tag{7}$$

where $\lambda$ ($\lambda$ = 0.57 for all conditions) weights the contributions of the M and P populations and $A$ is a global rescaling coefficient.

Note that the parameters $A$ and $\lambda$ were merely used to align model predictions quantitatively with classical data, but had no role in explaining our findings. That is, the emergence of a space-time inseparability in the CSF was caused neither by the specific value of $\lambda$ (both M and P cells show this transition; Figure 4—figure supplement 3) nor by the global scaling factor $A$, which had no effect on the shape of the predicted CSF. We chose to combine the contributions of M and P neurons linearly because this was the simplest model. However, other models (e.g. the maximum of either population at each spatial frequency $f$) produced virtually the same results, given the robustness of the underlying phenomenon.
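Equation 7 and the "maximum" alternative can be written as one small function. The max rule below is our reading of the text (an unweighted maximum scaled by $A$); the exact form used by the authors is not specified, and the function name is hypothetical.

```python
import numpy as np


def combine_csf(csf_m, csf_p, lam=0.57, A=1.0, rule="linear"):
    """Combine M and P sensitivities into a predicted CSF.

    rule="linear" implements Equation 7; rule="max" is a sketch of the
    alternative mentioned in the text (unweighted maximum, scaled by A).
    """
    csf_m = np.asarray(csf_m, dtype=float)
    csf_p = np.asarray(csf_p, dtype=float)
    if rule == "linear":
        return A * (lam * csf_m + (1.0 - lam) * csf_p)
    if rule == "max":
        return A * np.maximum(csf_m, csf_p)
    raise ValueError(f"unknown rule: {rule!r}")
```

Because $A$ multiplies every spatial frequency by the same factor, normalizing the predicted curve (e.g. by its peak) removes it entirely, which is why $A$ cannot affect the shape of the CSF.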

The same procedure was used to estimate the CSF in the absence of eye movements and under retinal stabilization (Figures 3, 4 and 5). In the former condition (no eye movements), $\boldsymbol{\xi}(t)$ was set to zero in Equation 1. In the latter condition (retinal stabilization), we modeled retinal image motion as a 2D random walk with a reduced diffusion coefficient (D = 2 rather than the normal value D = 250). Brownian motion with D in the range 100–350 is known to be a good model of normal retinal image motion when the head is not immobilized (Aytekin et al., 2014).
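Brownian retinal image motion with diffusion coefficient D can be sketched as a Gaussian random walk. We assume D is in arcmin²/s and use the convention that each axis accumulates variance 2DΔt per step, so the mean squared 2D displacement grows as 4Dt; other conventions differ by a constant factor. Under this convention, going from D = 250 to D = 2 shrinks the typical displacement by √125 ≈ 11. The function name `brownian_drift` is hypothetical.

```python
import numpy as np


def brownian_drift(D, duration_s=3.2, fs_hz=1000.0, rng=None):
    """2D Gaussian random walk standing in for fixational drift.

    ASSUMPTIONS: D in arcmin^2/s; per-axis step variance 2 * D * dt, so the
    mean squared 2D displacement after time t is 4 * D * t.
    Returns positions in arcmin, shape [n_samples, 2].
    """
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(duration_s * fs_hz))
    dt = 1.0 / fs_hz
    steps = rng.normal(0.0, np.sqrt(2.0 * D * dt), size=(n, 2))
    return np.cumsum(steps, axis=0)
```

To use such a trace as $\boldsymbol{\xi}(t)$ in Equation 1 with positions in degrees, divide it by 60.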

References

  5. Barlow HB (1961). Possible principles underlying the transformations of sensory messages. In: Rosenblith W, editor. Sensory Communication. Cambridge, USA: MIT Press. pp. 217–234.
  22. Ditchburn RW (1955). Eye-movements in relation to retinal action. Optica Acta: International Journal of Optics 1:171–176. https://doi.org/10.1080/713818684
  28. Gustafsson J, Inde K (2004). The MoviText method: efficient pre-optical reading training in persons with central visual field loss. Technology and Disability 6:211–221.
  41. Li CS (2002). Impaired detection of visual motion in schizophrenia patients. Progress in Neuro-Psychopharmacology & Biological Psychiatry 26:929–934.
  70. Thorson J, Biederman-Thorson M (1974). Distributed relaxation processes in sensory adaptation. Science 183:161–172.
  74. Yarbus AL (1957). The perception of an image fixed with respect to the retina. Biophysics 2:683–690.

Decision letter

  1. Fred Rieke
    Reviewing Editor; University of Washington, United States
  2. Timothy E Behrens
    Senior Editor; University of Oxford, United Kingdom

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

[Editors’ note: a previous version of this study was rejected after peer review, but the authors submitted for reconsideration. The first decision letter after peer review is shown below.]

Thank you for submitting your work entitled "Changes in visual sensitivity reveal an active strategy for temporally encoding space" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The reviewers have opted to remain anonymous.

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work cannot be considered further, at least in present form. We would be willing to consider a revised paper if you thoroughly deal with the reviewers’ concerns.

All three reviewers agreed that the work was of broad interest. We also agreed that several issues needed to be strengthened before considering the paper further. These and several other issues are detailed in the individual reviews. The two most pressing issues follow:

1) Recent work from some of the same authors has deepened understanding of the relation between FEM and neural coding. The relation between the present work and this past work is not as clear as it needs to be in the Introduction, and as a result the advance that the present work represents is difficult to appreciate immediately.

2) At the heart of the paper is the discrepancy between physiological and psychophysical CSFs. But the ganglion cell examples in the first figure appear not to accurately reflect ganglion cell behavior since they do not include points at low frequencies (with instead a horizontal line from the lowest spatial frequency point plotted). Most ganglion cell measurements show some low frequency roll-off. A thorough justification of the properties used to represent measured ganglion cell CSFs is needed.

Reviewer #1:

This paper describes the interaction between eye movements and spatial coding. The paper starts by describing a discrepancy between spatial contrast sensitivity as measured behaviorally and as measured in responses of retinal or geniculate cells. It then proposes that this discrepancy originates because the neural recordings were made in the absence of eye movements, and that the discrepancy can be resolved when eye movements are taken into account. The case for the importance of eye movements, made here and in previous work from some of the same authors, is compelling. There are several issues with the present paper, however, that limit my enthusiasm:

1) M and P cell models. The literature contains numerous measurements of M and P cell spatial contrast sensitivity. From a scan of these papers (e.g. Derrington and Lennie, 1984; Kaplan and Shapley, 1982), measured spatial contrast sensitivity curves for both M and P cells fall, sometimes sharply, at low spatial frequencies. This fall is not evident in the responses depicted in Figure 1. Even the Croner and Kaplan (1995) paper cited as the basis of Figure 1 shows a clear falloff at low spatial frequencies, and describes properties of the associated surround (relatedly, the Benardete and Kaplan, 1997 reference in the text is about M sequences and temporal properties, so it is not immediately clear how it is relevant). I am not clear on why the red curves in Figure 1 should be extrapolated from their peak to a 0 spatial frequency asymptote; this extrapolation appears to be inconsistent with most experimental results. More generally, I think it is critical that the paper contains a thorough summary of the measured M and P properties across studies, and a clear justification of the M and P models used. Figure 1 presently undermines confidence in the central motivation for the paper.

2) Overlap with previous work. The content of Figures 1 and 2 is a review of previous work. Of particular importance, previous work from some of the same authors has established that eye movements decrease responses to low spatial frequencies in natural images. I am concerned, given that, about the intellectual advance of the present paper. Specifically, I think the paper would benefit from more experimental tests of the proposal. For example, are there measurements of spatial contrast sensitivity in awake behaving monkeys (i.e. with eye movements)? Recognizing that eye movements cannot be fully suppressed, can they still be manipulated to test the proposal (e.g. can they be increased)? Are there differences across eccentricity that could be exploited to provide additional tests?

Reviewer #2:

The contrast sensitivity function (CSF) is a fundamental characteristic of vision – it describes how the ability to see depends on the spatial frequency of the input. The CSF is believed to be due to processing limitations at the earliest stages of processing (i.e., at the retina). However, there is a long-known discrepancy between the CSF (measured by having subjects report) and the properties of retinal output signals (measured physiologically) – subjects are less sensitive to low spatial frequencies than you would expect from the signals recorded in the retina. This paper puts forward and tests a specific explanation for this discrepancy – subjects make small eye movements during fixation, and this introduces temporal modulations that shift the power spectrum of the visual input. The paper convincingly demonstrates that this explanation is adequate to explain the discrepancy, using high-quality measurements of eye movements in human subjects combined with computational modeling of retinal ganglion cell responses. Moreover, they extend this approach to show that reducing fixational eye movements would be expected to shift the CSF in ways that are consistent with previous studies using retinal stabilization.

Overall, this is an expertly conducted combination of computational modeling and eye movement recording that provides an elegant solution to a long-standing problem in vision science. My comments are aimed mainly at clarifying the presentation.

The paper refers to "fixational eye movements" or "FEMs" throughout, but what is meant by this is the slow drift component of fixational eye movements, and not microsaccades or oscillatory movements. A few points. First, the paper is reasonably clear on this point, but not completely and not everywhere. For example, the casual reader who only looks at the Abstract and skims the details (including the Results where the focus on drift is pointed out) might conclude that this applies to microsaccades, since they are the best-known component of FEMs. Is there a reason to avoid using the more specific term "ocular drift" rather than "FEMs" throughout, or at least, more often? Second, what is the impact of microsaccades? Would these also be expected to affect the CSF or is the scale of their temporal modulation too high to affect responses? This point should be clarified. Third, if ocular drift and microsaccades have distinguishable effects on CSF, then the authors should be especially careful with the use of the term "FEMs", especially in the Discussion, where it seems that the conclusions based on ocular drift appear to be generalized to all types of fixational eye movements.

In several places – starting with the title – the manuscript implies that there is an active strategy behind these effects of FEMs, rather than it being an incidental effect. To me, "active strategy" implies some reliable relationship between the circumstance and the subject's behavior. For example, the needle-threading task shows that microsaccades appear to be generated based on some sort of active strategy. For ocular drift, it is not clear that such a relationship has been established. Do you have some basis of this claim, or is it simply a speculation? If it is a speculation, the text should be written more conservatively to match.

Another plausible explanation for why the behavioral CSF does not match the properties of the retinal output is that some part of the rest of the visual system is responsible. Is there a basis for excluding this possibility?

Reviewer #3:

This paper explores how non-separable spatio-temporal frequency tuning of M and P cells combines with measured fixational eye movements to account for observed behaviorally measured visual contrast sensitivity. A minimal model agrees well with a range of experimental data (free viewing of gratings at different temporal frequencies as well as retinal stabilization experiments) and suggests that eye movements have been optimized for neuronal contrast sensitivity curves (this work) as well as the content of natural scenes (previous work from the same groups). Experimental tests are suggested, particularly for retina-in-a-dish style experiments, where movements typical of FEMs are rarely added to visual stimuli. Experiments are also suggested for human subjects and for some clinical applications.

1) The paper is clearly presented and the work is thorough and interesting, but there should be more emphasis in the Introduction on why it's so important to understand the discrepancies between neuronal and behavioral CSFs for stationary stimuli. This could be addressed with a bit of text that previews some of the Discussion points about the neural coding and potential clinical applications of these findings.

2) Some recap of existing retinal recordings with FEM-like perturbations (e.g. Greschner et al., 2002) should be added to the Discussion.

3) Figure 1A – no data points are apparent below about 2Hz for the P and M cells, right around where the neuronal and behavioral CSFs diverge. It would be nice to include some data points there, perhaps from other studies. If data points do exist at 0Hz, they should be made more apparent.

[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]

Thank you for submitting your article "Contrast sensitivity reveals an oculomotor strategy for temporally encoding space" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Timothy Behrens as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

The revisions have improved the paper substantially. There is one outstanding point remaining about the ganglion cell model (see reviews for details). In consultation, all of the reviewers agreed this was important to resolve.

Reviewer #1:

This is a revision of a paper about the relationship between fixational eye movements and spatial contrast sensitivity. The paper centers around the observation that the spatial contrast sensitivity function measured behaviorally in primates differs from that of retinal ganglion cells, particularly at low spatial frequencies. The paper argues that eye movements and the temporal sensitivity of ganglion cells account for this difference.

The paper has improved in revision. In particular, I found the Introduction more compelling, and the connection of the spatial ganglion cell models to past work is much less confusing with Figure 1 fixed. An issue that became clearer in the revised version of the paper is that the temporal model for the ganglion cells needs to be described more thoroughly. This and some smaller points follow.

The crux of the paper is how the dynamics of eye movements make nominally static spatial inputs dynamic, and this shifts static inputs into a temporal region where the ganglion cells are more responsive. The ganglion cell model in Equation 4 is taken straight from prior measurements, and hence is well supported in the literature (see suggestion below about including relevant parameters). However, this model does not predict a complete lack of response at a temporal frequency of 0 Hz, as assumed in the paper. There are several issues with how this is treated in the paper. First, the temporal model is central to the paper, and should not be relegated entirely to the Materials and methods (one suggestion would be to add a figure between the current Figure 2 and 3 showing the temporal response of the ganglion cell model). Second, is there experimental evidence that static components of the stimulus do not modulate the ganglion cell responses? This assumption should be clarified earlier (in reference to Figure 3) and needs to be justified, especially as it introduces an abrupt discontinuity in the ganglion cell model between 0 and nonzero temporal frequencies. Related to this point, the Robson measurements shown in Figure 4 show a marked attenuation of the CSF at low spatial frequencies for stimuli modulated at 1 Hz – such that the 1 Hz CSF is not very different from that shown for static gratings by DeValois. This suggests that the static/dynamic separation is not completely correct. Third, temporal sensitivity measured behaviorally depends strongly on spatial frequency (e.g. see Robson paper). Shouldn't this affect the argument presented in the paper – i.e. that gratings of different spatial frequencies are subject to quite different temporal filtering apparently? Some discussion of this is needed.

Unless the lack of ganglion cell responses to static images can be justified based on past experiments, the paper needs to be much more careful in asserting that known ganglion cell models can account for the discrepancy between CSFs measured behaviorally and in ganglion cells. Statements to this effect are made in many places in the paper – starting in the Abstract, Results, seventh paragraph, Discussion, fourth paragraph, etc.

Reviewer #2:

The authors have thoroughly revised the paper and, to my mind, have satisfied the reviewers' concerns and questions. The manuscript is much clearer and is now suitable for publication in eLife.

One small point that may further improve the clarity of the presentation:

In Figure 4 and Figure 4—figure supplement 1, results are presented linking model predictions to CSFs measured during the presentation of temporally modulated gratings. From the Results and Discussion, it's clear that the effects of drift are more apparent at low temporal frequency. It would be useful to see an array of curves from the model without drift, pinpointing the transition around a few Hz where transients from the stimulus override the effects of eye movements. At the very least, it would be useful to label the Figure 4—figure supplement 1 model lines with "drift".

Reviewer #3:

I thought the original submission was interesting and excellent. In the revised manuscript, the authors have done a thorough job of responding to both my comments and those of the other reviewers.

https://doi.org/10.7554/eLife.40924.022

Author response

[Editors’ note: the author responses to the first round of peer review follow.]

All three reviewers agreed that the work was of broad interest. We also agreed that several aspects needed to be strengthened before the paper could be considered further. These and several other issues are detailed in the individual reviews. The two most pressing issues follow:

1) Recent work from some of the same authors has deepened understanding of the relation between FEM and neural coding. The relation between the present work and this past work is not as clear as it needs to be in the Introduction, and as a result the advance that the present work represents is difficult to appreciate immediately.

We have carefully revised Abstract, Introduction, and Discussion to make sure that the novel contributions of this study are clear. A detailed reply to this point is provided in our letter to reviewer 1, who raised this issue (see section “Overlap with previous work”). In brief, whereas our previous work focused on how eye movements affect visual input signals (i.e., that they result in a whitening of the input), here we combine measurements of eye movements with models of retinal ganglion cell responses to bridge the gap between neuronal mechanisms and psychophysics. We show that consideration of the influences of oculomotor transients on retinal responses provides a unifying account of a large range of experimental findings on human visual sensitivity. As we now better explain in the manuscript, this finding carries important implications.

2) At the heart of the paper is the discrepancy between physiological and psychophysical CSFs. But the ganglion cell examples in the first figure appear not to accurately reflect ganglion cell behavior since they do not include points at low frequencies (with instead a horizontal line from the lowest spatial frequency point plotted). Most ganglion cell measurements show some low frequency roll-off. A thorough justification of the properties used to represent measured ganglion cell CSFs is needed.

We apologize for this issue. The low-frequency properties of RGCs were incorrectly displayed in the original Figure 1A, as we had inadvertently plotted the Fourier transform of a 1D section of the receptive field, rather than a section of the 2D Fourier Transform. This led to the lack of a roll-off in the receptive field profile, which triggered the reviewers’ question. When correctly plotted, we find the standard roll-off in responses reported by many studies (see section on “M and P models”, in our reply to reviewer 1). We corrected this error in the resubmitted manuscript and now emphasize that the plots are not based on an extrapolation. We have also added additional material, including new Figure 1—figure supplement 1, to support the point that this response attenuation in physiological sensitivity falls far short of what is needed to account for psychophysical measurements of contrast sensitivity, unless eye movements are explicitly taken into account.

Reviewer #1:

This paper describes the interaction between eye movements and spatial coding. The paper starts by describing a discrepancy between spatial contrast sensitivity as measured behaviorally and as measured in responses of retinal or geniculate cells. It then proposes that this discrepancy originates because the neural recordings were made in the absence of eye movements, and that the discrepancy can be resolved when eye movements are taken into account. The case for the importance of eye movements, made here and in previous work from some of the same authors, is compelling. There are several issues with the present paper, however, that limit my enthusiasm:

1) M and P cell models. The literature contains numerous measurements of M and P cell spatial contrast sensitivity. From a scan of these papers (e.g. Derrington and Lennie, 1984; Kaplan and Shapley, 1982), measured spatial contrast sensitivity curves for both M and P cells fall, sometimes sharply, at low spatial frequencies. This fall is not evident in the responses depicted in Figure 1. Even the Croner and Kaplan (1995) paper cited as the basis of Figure 1 shows a clear falloff at low spatial frequencies, and describes properties of the associated surround (related, the Benardete and Kaplan, 1997 reference in the text is about M sequences and temporal properties, so not immediately clear how it is relevant).

As mentioned above, we inadvertently plotted the 1D Fourier Transform of a section of the receptive field, rather than a section of the 2D Fourier Transform. This made the curves deviate from the receptive fields that were actually used in the simulations and appear considerably flatter at low spatial frequencies. When properly plotted (as in the revised Figure 1A), the expected low-frequency attenuation is seen. Crucially, as we show in the main text and Figure 1—figure supplement 1, this attenuation is not nearly enough to account for psychophysical CSFs. We apologize for this error and the confusion that it generated.

I am not clear on why the red curves in Figure 1 should be extrapolated from their peak to a 0 spatial frequency asymptote; this extrapolation appears to be inconsistent with most experimental results.

We now clarify that these are not extrapolations; they are the values determined by the receptive field models measured by Croner and Kaplan (1995). As we now mention in the text, these curves were determined by the difference-of-Gaussian fits of these authors, based on measurements at spatial frequencies down to 0.07 cycles/deg, which extends below the range plotted. We have modified both the legend and the caption to avoid possible misunderstandings.

More generally, I think it is critical that the paper contains a thorough summary of the measured M and P properties across studies, and a clear justification of the M and P models used. Figure 1 presently undermines confidence in the central motivation for the paper.

We used the models of retinal ganglion cells (RGCs) reported by Croner and Kaplan (1995) because this study provides one of the most thorough investigations of receptive field parameters that we could find in the literature. Unlike the few “typical” receptive fields reported by other articles, this study contains detailed lists of model parameters estimated over sizeable neuronal populations.

The degree of roll-off at low spatial frequencies reported by Croner and Kaplan is very similar to that found by other studies in macaques. See, for example, Figure 9 in Kaplan and Shapley, 1982, Figure 6 in Hicks et al., 1983, Figure 3 in Derrington and Lennie, 1984. These studies did not report parameters that we could use for modeling cell responses, except for Derrington and Lennie, who reported the spatial receptive field parameters of the six P cells in their Figure 3 (Table 1 in Derrington and Lennie, 1984). We have now also included in the manuscript the spatial sensitivity estimated from these parameters, which is very similar to that obtained with the data from Croner and Kaplan (Figure 1—figure supplement 1B in the resubmitted manuscript). Crucially, although the response attenuation exhibited by neurons in the low spatial frequency range is in the same direction as the attenuation in contrast sensitivity measured in humans, it falls far short of accounting for the attenuation in psychophysically-measured CSFs. In addition to the difference in the amount of attenuation, there is also a difference in the shape of the curves – a leveling-off for physiological sensitivities, compared to a continued decline in psychophysical measurements. These elements are clearly visible in all the figures listed above and in the new data in Figure 1—figure supplement 1.

Following the reviewer’s comments, we revised the Introduction to make clear that the deviation between neural and psychophysical measurements requires additional mechanisms. As we now discuss in the first paragraph of the Results, the only way in which Croner and Kaplan’s models could be modified to match contrast sensitivity functions is by doubling the ratio between the area of the center and surround – from the measured 0.5-0.6 values to close to 1. That is, receptive fields with highly unrealistic characteristics would be necessary to match behavioral functions. We have added a figure (Figure 1—figure supplement 1) to explain this point. Since the effect in our model originates from the redistribution of power resulting from eye movements, rather than the specific shape of the spatial receptive field of modeled neurons, our results are instead very robust.
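The center/surround argument can be checked with a difference-of-Gaussians (DoG) model of the kind fit by Croner and Kaplan (1995). For a 2-D Gaussian center and surround, the Fourier amplitude is the center volume times exp(-(π r_c f)²) minus the surround volume times exp(-(π r_s f)²), so at zero spatial frequency it equals center volume minus surround volume. This sketch treats the "area" ratio mentioned above as the ratio of integrated (volume) sensitivities, which is an assumption here, and the radii are hypothetical illustrative values, not the fitted parameters:

```python
import numpy as np

def dog_sensitivity(f, center_radius, surround_radius, ratio, center_volume=1.0):
    """Amplitude of a 2-D difference-of-Gaussians receptive field at spatial
    frequency f (cycles/deg). Radii are in degrees; `ratio` is the surround
    volume divided by the center volume."""
    surround_volume = ratio * center_volume
    center = center_volume * np.exp(-(np.pi * center_radius * f) ** 2)
    surround = surround_volume * np.exp(-(np.pi * surround_radius * f) ** 2)
    return center - surround

def low_frequency_attenuation(ratio, center_radius=0.03, surround_radius=0.2):
    """Sensitivity at the lowest sampled frequency divided by peak sensitivity.
    The default radii are hypothetical, chosen only for illustration."""
    f = np.logspace(-2, 2, 2000)  # 0.01 to 100 cycles/deg
    s = dog_sensitivity(f, center_radius, surround_radius, ratio)
    return s[0] / s.max()

for ratio in (0.55, 0.75, 0.95):
    print(f"ratio={ratio:.2f}  low/peak={low_frequency_attenuation(ratio):.3f}")
```

With these assumed radii, a ratio of 0.55 leaves the lowest frequency at roughly half of peak sensitivity; the low-frequency response approaches zero only as the ratio approaches 1.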

“related, the Benardete and Kaplan, 1997 reference in the text is about M sequences and temporal properties, so not immediately clear how it is relevant.”

We thank the reviewer for pointing out this issue. The relevant article here is Croner and Kaplan, (1995). We corrected the reference in the revised manuscript.

2) Overlap with previous work. The content of Figures 1 and 2 is a review of previous work. Of particular importance, previous work from some of the same authors has established that eye movements decrease responses to low spatial frequencies in natural images. I am concerned, given that, about the intellectual advance of the present paper.

We extensively modified both the Introduction and the Discussion to clarify the significance of the present study and better explain how it goes beyond our previous work. In brief, our previous studies focused on the consequences of eye movements for visual input signals, while here, we build on these previous results to bridge between neuronal mechanisms and psychophysics. We show that standard models of retinal neurons quantitatively account for the way human contrast sensitivity depends on both spatial and temporal frequency, but only when one takes into account the temporal sensitivity of ganglion cells and their interaction with oculomotor luminance modulations. These results challenge widely accepted hypotheses about retinal functions (Atick and Redlich, 1992), which rely on the assumption that spatial filtering is sufficient to account for the shape of the CSF, and show that the proposal that space is encoded via oculomotor transients provides a unifying account of a large range of experimental findings on human visual sensitivity. As we describe in the revised manuscript, these conclusions have important consequences at the neural, perceptual, and clinical levels (Discussion).

Specifically, I think the paper would benefit from more experimental tests of the proposal. For example, are there measurements of spatial contrast sensitivity in awake behaving monkeys (i.e. with eye movements)? Recognizing that eye movements cannot be fully suppressed, can they still be manipulated to test the proposal (e.g. can they be increased)? Are there differences across eccentricity that could be exploited to provide additional tests?

While our framework makes novel predictions, we think that including further experimental tests would not improve the focus of the manuscript, as our goal here is to show that there is a logical explanation for a gap between well-established sets of measurements across many labs. However, we fully agree with the reviewer that experiments with controlled retinal image motion at selected eccentricities are highly interesting, and we have been working in this direction. We recently reported preliminary results consistent with the predictions of this work at the annual meeting of the Vision Sciences Society (Intoy et al., 2018). Given the reviewer’s comment, we expanded the Discussion to mention the improvements in word and object recognition reported in patients with central visual loss when images or text are jittered or scrolled (Gustafsson and Inde, 2004; Watson et al., 2012; Harvey and Walker, 2014). These results are consistent with our prediction that a larger fixational instability should enhance sensitivity at low spatial frequencies, because more power is available in this range (Figure 5A).

Finally, contrast sensitivity measurements have been performed in awake behaving monkeys (De Valois et al., 1974), and are included in Figure 1A, so that the reader can directly compare neurophysiological and behavioral measurements in the same species. These measurements are very similar to those reported in humans (also shown in the same figure for comparison purposes).

Reviewer #2:

[…] Overall, this is an expertly conducted combination of computational modeling and eye movement recording that provides an elegant solution to a long-standing problem in vision science. My comments are aimed mainly at clarifying the presentation.

The paper refers to "fixational eye movements" or "FEMs" throughout, but what is meant by this is the slow drift component of fixational eye movements, and not microsaccades or oscillatory movements. A few points. First, the paper is reasonably clear on this point, but not completely and not everywhere. For example, the casual reader who only looks at the Abstract and skims the details (including the Results where the focus on drift is pointed out) might conclude that this applies to microsaccades, since they are the best-known component of FEMs. Is there a reason to avoid using the more specific term "ocular drift" rather than "FEMs" throughout, or at least, more often?

The only reason for using “fixational eye movements” was to adopt a terminology with which readers may be more familiar. But we fully agree with the reviewer that this may lead to ambiguity, and in the revised manuscript we replaced this term with “fixational drift” (or, in many cases, simply “drift”) both in the text and figures.

Second, what is the impact of microsaccades? Would these also be expected to affect the CSF or is the scale of their temporal modulation too high to affect responses? This point should be clarified. Third, if ocular drift and microsaccades have distinguishable effects on CSF, then the authors should be especially careful with the use of the term "FEMs", especially in the Discussion, where it seems that the conclusions based on ocular drift appear to be generalized to all types of fixational eye movements.

This manuscript specifically focuses on eye drift not only because its consideration is by itself sufficient to account for experimental data, but also because humans tend to suppress microsaccades during contrast sensitivity measurements (see Mostofi et al., 2016). But we agree, microsaccades are interesting and important, and we now comment on them, as summarized below.

Briefly, microsaccades and, in general, saccades redistribute the spatiotemporal power of the retinal stimulus in a very different manner from ocular drift, providing significantly more temporal power at low spatial frequencies. This difference leads to the prediction that microsaccades and saccades should enhance visual sensitivity at low spatial frequencies. This prediction has been recently confirmed for larger saccades (see Boi et al., 2017), but not for saccades smaller than 1 degree (Mostofi et al., 2016). A possibility, discussed in Mostofi et al., 2016, is that for microsaccades the beneficial consequences of luminance transients and the negative consequences of saccadic suppression (a reduction in sensitivity before and during saccades) may more evenly counterbalance each other. In the resubmitted manuscript, we now comment on the possible function of microsaccades and how their input reformatting differs from drift in the Discussion (fifth paragraph).

In several places – starting with the title – the manuscript implies that there is an active strategy behind these effects of FEMs, rather than it being an incidental effect. To me, "active strategy" implies some reliable relationship between the circumstance and the subject's behavior. For example, the needle-threading task shows that microsaccades appear to be generated based on some sort of active strategy. For ocular drift, it is not clear that such a relationship has been established. Do you have some basis of this claim, or is it simply a speculation? If it is a speculation, the text should be written more conservatively to match.

Our use of the term “active” was simply intended to convey the notion that contrast sensitivity is not merely the outcome of sensory processes, but it also includes oculomotor contributions. To avoid ambiguity, we changed the title by replacing the word “active” with “oculomotor” and revised the text similarly. However, there is some evidence that humans do in fact actively control the overall amount of drift, and we now comment on this in the Discussion (eighth paragraph) in the context of the predictions of our study.

Another plausible explanation for why the behavioral CSF does not match the properties of the retinal output is that some part of the rest of the visual system is responsible. Is there a basis for excluding this possibility?

This is a reasonable possibility that we cannot exclude; we added a comment about this in the fourth paragraph of the Discussion. However, while one might expect contributions from multiple stages of the visual system, our results show that retinal sensitivity and fixational drift suffice to account for the CSF over a broad spatiotemporal range, without a further downstream reshaping. This is a further reason why we think our results are significant, as we also now comment.

Reviewer #3:

This paper explores how non-separable spatio-temporal frequency tuning of M and P cells combines with measured fixational eye movements to account for observed behaviorally measured visual contrast sensitivity. A minimal model agrees well with a range of experimental data (free viewing of gratings at different temporal frequencies as well as retinal stabilization experiments) and suggests that eye movements have been optimized for neuronal contrast sensitivity curves (this work) as well as the content of natural scenes (previous work from the same groups). Experimental tests are suggested, particularly for retina-in-a-dish style experiments, where movements typical of FEMs are rarely added to visual stimuli. Experiments are also suggested for human subjects and for some clinical applications.

1) The paper is clearly presented and the work is thorough and interesting, but there should be more emphasis in the Introduction on why it's so important to understand the discrepancies between neuronal and behavioral CSFs for stationary stimuli. This could be addressed with a bit of text that previews some of the Discussion points about the neural coding and potential clinical applications of these findings.

We rewrote large sections of the Introduction to better explain the rationale for our work. As suggested by the reviewer, we now start by reviewing the relation between the shape of the CSF and theories of efficient encoding and mention that an explanation of contrast sensitivity in humans has important conceptual consequences and potential clinical implications. While the discrepancies fundamentally come down to quantitative measurements, there are two main qualitative points about function: whether the physiologically-measured low-frequency attenuation in neural responses accounts for spatial decorrelation of natural images (as theorized by Atick and Redlich), and whether the physiologically-measured spatiotemporal properties of ganglion cell receptive fields account for the spatiotemporal inseparability present in the human CSF. In both cases, we show that the effects of FEMs – and not just neuronal filtering characteristics – are critical. The Introduction section has been modified extensively to explain these points, see the third and seventh paragraphs.

2) Some recap of existing retinal recordings with FEM-like perturbations (e.g. Greschner et al., 2002) should be added to the Discussion.

We expanded discussion of previous physiological studies in which stimuli included temporal modulations like those resulting from fixational eye movements. These studies, most of them performed in vitro, provide support to our proposal that FEMs are an integral component of the retinal neural code. The relevant text is in the seventh paragraph of the Discussion.

3) Figure 1A – no data points are apparent below about 2Hz for the P and M cells, right around where the neuronal and behavioral CSFs diverge. It would be nice to include some data points there, perhaps from other studies. If data points do exist at 0Hz, they should be made more apparent.

If we understand correctly, the reviewer is referring to 2cpd, not 2Hz, as this is what was plotted on the abscissa of Figure 1A. As mentioned above, this figure was problematic in several respects, all of which have now been fixed. Specifically, the physiological data are not extrapolations; they are the receptive fields of retinal ganglion cells given by standard difference of Gaussians models, as measured by Croner and Kaplan (1995). The symbols previously on these curves were simply markers used to distinguish the curves. We can see how they could be misinterpreted as data points, so we now distinguish the curves by line styles. The parameters for these models were experimentally estimated by Croner and Kaplan by presenting gratings at different spatial frequencies, with a minimum of 0.07 cpd (Croner and Kaplan, 1995), which extends below the range plotted.

[Editors' note: the author responses to the re-review follow.]

The revisions have improved the paper substantially. There is one outstanding point remaining about the ganglion cell model (see reviews for details). In consultation, all of the reviewers agreed this was important to resolve.

Reviewer #1:

This is a revision of a paper about the relationship between fixational eye movements and spatial contrast sensitivity. The paper centers around the observation that the spatial contrast sensitivity function measured behaviorally in primates differs from that of retinal ganglion cells, particularly at low spatial frequencies. The paper argues that eye movements and the temporal sensitivity of ganglion cells account for this difference.

The paper has improved in revision. In particular, I found the Introduction more compelling, and the connection of the spatial ganglion cell models to past work is much less confusing with Figure 1 fixed. An issue that became clearer in the revised version of the paper is that the temporal model for the ganglion cells needs to be described more thoroughly. This and some smaller points follow.

The crux of the paper is how the dynamics of eye movements make nominally static spatial inputs dynamic, and this shifts static inputs into a temporal region where the ganglion cells are more responsive. The ganglion cell model in Equation 4 is taken straight from prior measurements, and hence is well supported in the literature (see suggestion below about including relevant parameters). However, this model does not predict a complete lack of response at a temporal frequency of 0 Hz, as assumed in the paper.

We think we see where some of the difficulties have arisen. It is impossible to measure responses that are truly at 0 Hz, as it would require an infinitely long experiment to do so – and this holds not only for physiological measurements, but also for psychophysical ones. As a practical matter, the lowest frequency measurable is comparable to the reciprocal of the duration of a trial (i.e., 0.2 or 0.3 Hz); any attempt to infer behavior at lower frequencies would be confounded by the visual input on adjacent trials (or between trials). With this in mind, we had intended our statements about insensitivity to 0 Hz as shorthand for the more legalistic, “negligible sensitivity below the frequencies at which sensitivity can practically be measured.” As explained in detail below, this hypothesis is well justified from the data available in the literature, and our results are robust to the specifics of how exactly this hypothesis is incorporated into the neural models. We now make our point of view explicit in the paper and better explain how the model relates to physiological data at low temporal frequencies.
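The "reciprocal of the trial duration" limit is the frequency resolution of a finite record: the discrete Fourier transform of T seconds of data has bins spaced 1/T apart, so nothing between 0 Hz and 1/T is resolved. A minimal NumPy sketch; the 1 kHz sampling rate is an arbitrary assumption and does not affect the result:

```python
import numpy as np

def lowest_resolvable_frequency(duration_s, sample_rate_hz=1000.0):
    """First nonzero frequency bin (Hz) of the DFT of a record lasting duration_s."""
    n_samples = int(round(duration_s * sample_rate_hz))
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate_hz)
    return freqs[1]

for duration in (0.5, 2.0, 3.0, 5.0):
    print(f"{duration:.1f} s record -> {lowest_resolvable_frequency(duration):.2f} Hz")
```

The same arithmetic gives the 2 Hz floor for an impulse response measured over about 0.5 s, and roughly 0.2 to 0.5 Hz for trials lasting a few seconds.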

There are several issues with how this is treated in the paper. First, the temporal model is central to the paper, and should not be relegated entirely to the Materials and methods (one suggestion would be to add a figure between the current Figure 2 and 3 showing the temporal response of the ganglion cell model).

We fully agree with the reviewer and added a panel in Figure 2. This new panel (Figure 2E) shows the temporal profiles of both M and P cells. We also provide two new tables (Table 1 and 2), in which we report the values of the parameters used to model both spatial (Table 1) and temporal kernels (Table 2).

Second, is there experimental evidence that static components of the stimulus do not modulate the ganglion cell responses? This assumption should be clarified earlier (in reference to Figure 3) and needs to be justified, especially as it introduces an abrupt discontinuity in the ganglion cell model between 0 and nonzero temporal frequencies.

Strictly speaking, 0 Hz is a mathematical abstraction, and measuring sensitivity at this frequency is not physically possible. We therefore focus here on the broader question of sensitivity of retinal ganglion cells to low temporal frequencies, an issue that regards primarily P cells. These are the neurons with more sustained responses (Figure 2E). There is considerable experimental evidence in support of the way we handle the low-frequency limit, and we have added two paragraphs to the Discussion as well as a figure (Figure 4—figure supplement 3) to comment on this issue.

It is first important to realize that, in most neurophysiological (and psychophysical) studies, the data reported in the low temporal frequency range do not provide reliable estimates of sensitivity. This happens for multiple reasons, including the short duration of experimental trials, the lack of consideration of the visual stimuli present before and after each trial, and the length of the estimated impulse response.

Typically, the transfer functions reported at low temporal frequencies are extrapolations outside of the range of measured values based on models that were not designed for this purpose (e.g., the linear cascade model (Victor, 1987) in Benardete and Kaplan, 1997 and 1999; a difference of exponentials in Derrington and Lennie, 1984). Both of these models use functional forms that flatten out at very low temporal frequencies, but this flattening occurs below the frequencies at which data are acquired to fit the model. These extrapolations must be interpreted with great caution, as they merely reflect untested model assumptions. Victor’s (1987) linear cascade model estimated by Kaplan and colleagues, which we used in our study (Figure 2E), was never meant to serve as an extrapolation to frequencies outside of the range used to fit it, and this is why we don’t merely use model values down to DC. Benardete and Kaplan, for example, only measured impulse responses for ~0.5 sec, so the frequencies in the Fourier transform of the impulse response, which they used to fit the model, did not go below 2 Hz.
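The flattening at low frequencies is built into the functional form of a linear cascade model. As a sketch, assume the form K(f) = A e^(-i2πfD) [1 - H_s/(1 + i2πfτ_s)] [1/(1 + i2πfτ_L)]^N_L, which is our reading of Victor's model as fit by Benardete and Kaplan (the paper's Equation 4 is not reproduced in this excerpt and may differ in detail). Its gain at 0 Hz is A(1 - H_s), a nonzero constant. All parameter values below are hypothetical, not fitted P- or M-cell values:

```python
import numpy as np

def cascade_gain(f_hz, A=1.0, D=0.004, Hs=0.7, tau_s=0.05, tau_L=0.0015, N_L=30):
    """Complex transfer function of an assumed linear cascade: a pure delay D (s),
    one high-pass stage of strength Hs and time constant tau_s (s), and N_L
    low-pass stages with time constant tau_L (s). Defaults are hypothetical."""
    w = 2 * np.pi * np.asarray(f_hz, dtype=float)  # angular frequency, rad/s
    delay = np.exp(-1j * w * D)
    high_pass = 1 - Hs / (1 + 1j * w * tau_s)
    low_pass = (1 / (1 + 1j * w * tau_L)) ** N_L
    return A * delay * high_pass * low_pass

freqs = np.array([0.0, 0.01, 0.1, 0.5, 2.0, 10.0])
for f, g in zip(freqs, np.abs(cascade_gain(freqs))):
    print(f"{f:6.2f} Hz  |K| = {g:.3f}")
```

With these assumed values, the gain levels off at A(1 - H_s) = 0.3 well below 1/(2πτ_s), about 3 Hz, which is the range that a 0.5 s impulse response cannot constrain.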

So, rather than use these extrapolations, we turn to the very few studies that specifically examined low temporal frequencies in retinal ganglion cells. These studies found a decline in sensitivity down to the lowest frequency that they could measure. This applies both to the more sustained X channel in the cat (Frishman et al., 1987; Victor, 1987) and to P cells in the macaque, as shown in Figure 12A-B in Purpura et al., 1990. These studies suggest that the response attenuation takes the form of an approximately linear decrease on log-log axes. Such behavior is also expected from theoretical considerations based on the characteristics of adaptation (Thorson and Biederman-Thorson, 1974), considerations that seem to apply to the responses of cones in the retina of the macaque (Boynton and Whitten, 1970) and therefore will limit the low-frequency behavior of retinal ganglion cells.

Independently, any retinal sensitivity at frequencies ~0.3 Hz and below is likely to be virtually useless in a psychophysical experiment in trials of 2-3 sec or less – because whatever low-TF signal is present is likely to be masked by low-TF noise contributed by visual input on adjacent trials, or what the subject does between trials (e.g., looks around the lab, blinks, etc.).

Our model is highly robust to the specifics of how this reduction in sensitivity at low temporal frequencies is implemented in the simulations. In the manuscript, we simply discarded responses below a frequency threshold of 0.63 Hz. However, results were virtually identical when we used different frequency thresholds, or when we modeled sensitivity as a power law function of temporal frequency in the low-frequency range (as in Purpura et al., 1990, and Thorson and Biederman-Thorson, 1974, respectively). In the latter case, results are also robust with respect to the slope of the power-law function. We comment on these results in the Discussion and refer the interested reader to a new supplementary figure (Figure 4—figure supplement 3).
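The two ways of handling the low-frequency limit described here can be written as weighting functions applied to a response spectrum. A minimal sketch, taking f0 = 0.63 Hz from the text and treating the log-log slope as a hypothetical free parameter; these functions are illustrative and are not the study's simulation code:

```python
import numpy as np

def hard_cutoff(f_hz, f0=0.63):
    """Weight 0 below f0 and 1 at or above it (responses below f0 discarded)."""
    f = np.abs(np.asarray(f_hz, dtype=float))
    return np.where(f >= f0, 1.0, 0.0)

def power_law_rolloff(f_hz, f0=0.63, slope=1.0):
    """Weight (f/f0)**slope below f0 and 1 above it: a straight line of the
    given slope on log-log axes that reaches zero only at 0 Hz."""
    f = np.abs(np.asarray(f_hz, dtype=float))
    return np.where(f >= f0, 1.0, (f / f0) ** slope)

f = np.array([0.0, 0.063, 0.3, 0.63, 2.0])  # Hz
print(hard_cutoff(f))
print(power_law_rolloff(f, slope=1.0))
print(power_law_rolloff(f, slope=2.0))
```

Both weights vanish at 0 Hz and leave frequencies above f0 untouched; they differ only in how abruptly sensitivity falls within the narrow band below f0.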

Related to this point, the Robson measurements shown in Figure 4 show a marked attenuation of the CSF at low spatial frequencies for stimuli modulated at 1 Hz – such that the 1 Hz CSF is not very different from that shown for static gratings by De Valois. This suggests that the static/dynamic separation is not completely correct.

We do not follow the reviewer here: our model correctly predicts the gradual change in spatial sensitivity shown in the Robson data, as the temporal frequency of the grating is increased (see Figure 4). There is no abrupt transition in the predicted CSF, which is consistent with the smooth transition between dynamic and static power in the retinal input (see Figure 2C).

Perhaps, confusion occurred here between the temporal modulation of the stimulus on the monitor (0 Hz for a static grating) and the temporal fluctuations in the responses of our models (where 0 Hz indicates a constant response). The two things are not equivalent: we always use model responses at all non-zero temporal frequencies – rather than just at the temporal frequency of the grating – to estimate contrast sensitivity. We think part of this confusion was generated by the previous version of Figure 2, which was giving the false impression of a dichotomy in the visual input by showing next to each other the spatial distributions of power on the retina at 0 Hz and the power integrated across all non-zero temporal frequencies (panels D and E respectively). These two curves were only meant to show that the power distributions differ at different temporal frequencies (already evident from the map in Figure 2B), not to imply the presence of a discontinuity.
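A grating that is static on the monitor acquires power at nonzero temporal frequencies on the retina, and that power varies smoothly with spatial frequency rather than splitting at a discontinuity. This can be illustrated analytically if drift is idealized as Brownian motion, a simplifying assumption of this sketch (the study itself combines recorded eye movements with neuron models). The diffusion constant D = 0.01 deg²/s is a hypothetical value; the 0.63 Hz cutoff is the threshold mentioned in this response:

```python
import numpy as np

def dynamic_power_fraction(spatial_freq_cpd, diffusion_deg2_s=0.01, f0_hz=0.63):
    """Fraction of a static grating's retinal power that idealized Brownian drift
    shifts to temporal frequencies above f0_hz.

    Assumes drift along the grating's axis has variance 2*D*t, so the signal's
    autocorrelation is exp(-D*k**2*|t|) and its temporal power spectrum is a
    Lorentzian with half-width a = D*k**2 (rad/s).
    """
    k = 2 * np.pi * np.asarray(spatial_freq_cpd, dtype=float)  # rad/deg
    a = diffusion_deg2_s * k ** 2                                # rad/s
    # Power inside |f| < f0 is (2/pi)*arctan(2*pi*f0/a); the rest is dynamic.
    return 1 - (2 / np.pi) * np.arctan2(2 * np.pi * f0_hz, a)

sf = np.array([0.5, 1.0, 2.0, 5.0, 10.0])  # cycles/deg
print(np.round(dynamic_power_fraction(sf), 3))
```

Under these assumptions, the dynamic fraction rises smoothly with spatial frequency, from under 2% at 0.5 cycles/deg to over 90% at 10 cycles/deg. This is the redistribution that attenuates low spatial frequencies once only nonzero temporal frequencies are counted.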

In the revised manuscript, we have modified Figure 2 to eliminate possible ambiguity. We also added a supplementary figure (Figure 4—figure supplement 2) in which we show the predicted CSF at temporal modulations of the gratings not present in the Robson data in Figure 4, so as to further highlight that the model captures the smooth low-pass to band-pass transition in the CSF.

Third, temporal sensitivity measured behaviorally depends strongly on spatial frequency (e.g., see the Robson paper). Shouldn't this affect the argument presented in the paper, given that gratings of different spatial frequencies are apparently subject to quite different temporal filtering? Some discussion of this is needed.

A strength of our study is that it explains how human temporal sensitivity varies across spatial frequencies (a space-time inseparable function) on the basis of space-time separable neural filters, like those of retinal ganglion cells. In other words, our model captures the full spatiotemporal pattern of contrast sensitivity through a linear combination of the sensitivities of P and M cells, without requiring, as one might assume, additional temporal filters to process different ranges of spatial frequencies. We have added a paragraph to the Discussion to comment on this point.
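The algebraic point behind this answer, that a weighted sum of space-time separable filters is generally not separable, can be checked numerically. The sketch below is illustrative and does not reproduce the authors' P and M cell models: the tuning shapes (`log_gaussian`, `low_pass`), their parameters, the frequency grids and the 0.5 weight are all hypothetical. A separable sensitivity is an outer product of one spatial and one temporal profile, so its spatial-by-temporal matrix has rank 1; the numerical rank from a singular value decomposition therefore serves as a separability test.

```python
import numpy as np


def log_gaussian(f, peak, width):
    """Band-pass tuning curve, Gaussian on a log-frequency axis (illustrative shape)."""
    return np.exp(-0.5 * (np.log(f / peak) / width) ** 2)


def low_pass(f, cutoff):
    """Simple low-pass amplitude profile (illustrative shape)."""
    return 1.0 / np.sqrt(1.0 + (f / cutoff) ** 2)


def separability_rank(sensitivity, rel_tol=1e-10):
    """Numerical rank of a spatial x temporal sensitivity matrix.

    Rank 1 means exactly space-time separable (an outer product of one
    spatial and one temporal profile); rank > 1 means inseparable.
    """
    s = np.linalg.svd(sensitivity, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))


sf = np.logspace(-1, 1.5, 40)   # spatial frequency grid, cycles/deg (hypothetical)
tf = np.logspace(-0.5, 1.7, 30)  # temporal frequency grid, Hz (hypothetical)

# Hypothetical P-like channel: tuned to higher spatial frequencies, low-pass in time.
p_cell = np.outer(log_gaussian(sf, peak=4.0, width=1.0), low_pass(tf, cutoff=5.0))
# Hypothetical M-like channel: tuned to lower spatial frequencies, band-pass in time.
m_cell = np.outer(log_gaussian(sf, peak=0.8, width=1.0),
                  log_gaussian(tf, peak=10.0, width=0.8))

# Weighted linear combination of the two separable channels (weight is arbitrary).
combined = p_cell + 0.5 * m_cell

print(separability_rank(p_cell), separability_rank(m_cell), separability_rank(combined))
```

Each channel on its own is separable (rank 1), while their sum has rank 2, so the temporal tuning of the combined sensitivity changes with spatial frequency even though neither channel's temporal filter does.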

Unless the lack of ganglion cell responses to static images can be justified based on past experiments, the paper needs to be much more careful in asserting that known ganglion cell models can account for the discrepancy between CSFs measured behaviorally and in ganglion cells. Statements to this effect are made in many places in the paper, starting with the Abstract, the seventh paragraph of the Results, the fourth paragraph of the Discussion, etc.

As explained above, our model is well justified by the literature. We have carefully revised the text, including the points highlighted by the reviewer, to eliminate confusion and make sure that our claims are clear.

Reviewer #2:

The authors have thoroughly revised the paper and, to my mind, have satisfied the reviewers' concerns and questions. The manuscript is much clearer and is now suitable for publication in eLife.

One small point that may further improve the clarity of the presentation:

In Figure 4 and Figure 4—figure supplement 1, results are presented linking model predictions to CSFs measured during the presentation of temporally modulated gratings. From the Results and Discussion, it's clear that the effects of drift are more apparent at low temporal frequencies. It would be useful to see an array of curves from the model without drift, pinpointing the transition around a few Hz where transients from the stimulus override the effects of eye movements. At the very least, it would be useful to label the Figure 4—figure supplement 1 model lines with "drift".

We thank Reviewer 2 for the kind comments. We added the suggested figure (Figure 4—figure supplement 2), which plots the CSF predicted by our model at two additional intermediate temporal modulation frequencies: 2 Hz and 3 Hz. This figure shows that the transition from band-pass to low-pass behavior of the CSF occurs around 3 Hz. This result is consistent with psychophysical data; see, for example, the data reported in Figure 1 of Bowker and Tulunay-Keesey, 1983, as we now mention in the eleventh paragraph of the Results section. We also relabeled the curves in Figure 4—figure supplement 1 and Figure 5—figure supplement 1 with “Drift”, as suggested by the reviewer.

https://doi.org/10.7554/eLife.40924.023

Article and author information

Author details

  1. Antonino Casile

    1. Center for Translational Neurophysiology, Istituto Italiano di Tecnologia, Ferrara, Italy
    2. Center for Neuroscience and Cognitive Systems, Rovereto, Italy
    3. Department of Neurobiology, Harvard Medical School, Boston, United States
    Contribution
    Conceptualization, Data curation, Software, Supervision, Investigation, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    1. antonino.casile@iit.it
    2. toninocasile@gmail.com
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7824-9274
  2. Jonathan D Victor

    1. Brain and Mind Research Institute, Weill Cornell Medical College, New York, United States
    2. Department of Neurology, Weill Cornell Medical College, New York, United States
    Contribution
    Conceptualization, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9293-0111
  3. Michele Rucci

    1. Brain and Cognitive Sciences, University of Rochester, Rochester, United States
    2. Center for Visual Science, University of Rochester, Rochester, United States
    Contribution
    Conceptualization, Supervision, Methodology, Writing—review and editing
    For correspondence
    rucci.michele@gmail.com
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-3066-1964

Funding

National Eye Institute (EY018363)

  • Michele Rucci

National Science Foundation (BCS-1457238)

  • Michele Rucci

National Eye Institute (NEI 07977)

  • Jonathan D Victor

National Science Foundation (1420212)

  • Michele Rucci

Harvard/MIT Joint Research Program

  • Antonino Casile

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Ethics

Human subjects: Informed consent was obtained from all participants following the procedures approved by the Boston University Charles River Campus Institutional Review Board (protocol number 1062E).

Senior Editor

  1. Timothy E Behrens, University of Oxford, United Kingdom

Reviewing Editor

  1. Fred Rieke, University of Washington, United States

Publication history

  1. Received: August 9, 2018
  2. Accepted: December 3, 2018
  3. Version of Record published: January 8, 2019 (version 1)

Copyright

© 2019, Casile et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

