1. Neuroscience
Download icon

Nonlinear circuits for naturalistic visual motion estimation

  1. James E Fitzgerald Is a corresponding author
  2. Damon A Clark Is a corresponding author
  1. Harvard University, United States
  2. Yale University, United States
Research Article
Cited
6
Views
1,981
Comments
0
Cite as: eLife 2015;4:e09123 doi: 10.7554/eLife.09123

Abstract

Many animals use visual signals to estimate motion. Canonical models suppose that animals estimate motion by cross-correlating pairs of spatiotemporally separated visual signals, but recent experiments indicate that humans and flies perceive motion from higher-order correlations that signify motion in natural environments. Here we show how biologically plausible processing motifs in neural circuits could be tuned to extract this information. We emphasize how known aspects of Drosophila's visual circuitry could embody this tuning and predict fly behavior. We find that segregating motion signals into ON/OFF channels can enhance estimation accuracy by accounting for natural light/dark asymmetries. Furthermore, a diversity of inputs to motion detecting neurons can provide access to more complex higher-order correlations. Collectively, these results illustrate how non-canonical computations improve motion estimation with naturalistic inputs. This argues that the complexity of the fly's motion computations, implemented in its elaborate circuits, represents a valuable feature of its visual motion estimator.

https://doi.org/10.7554/eLife.09123.001

eLife digest

Many animals have evolved the ability to estimate the speed and direction of visual motion. They use these estimates to judge their own motion, so that they can navigate through an environment, and to judge how other animals are moving, which allows them to avoid predators or detect prey.

In the 1950s, a physicist and a biologist used measurements of beetle behavior in response to visual stimuli to develop a model for how the brain estimates motion. The model became known as the Hassenstein-Reichardt correlator (HRC). The HRC and related models accurately predict the behavioral and neural responses of insects and mammals to many types of motion stimuli.

However, there are visual stimuli that generate motion percepts in fruit flies (and humans) that cannot be accounted for by the HRC. Are these differences between real brains and the HRC simply imperfections in visual circuits, whose neurons cannot perform idealized mathematical operations, or are these deviations intentional, somehow improving motion estimates? In other words: are the observed deviations a bug or a feature of visual circuits?

To address this question, Fitzgerald and Clark evaluated how different models of motion detection performed when presented with natural scenes. Natural scenes are fundamentally different from most stimuli used in lab, since they contain a rich set of regularities that are not present in simple stimuli. Fitzgerald and Clark compared the ability of the HRC, along with new, more general models, to estimate the speed and direction at which images moved across a screen. This revealed that many models could out-perform the HRC by taking advantage of regularities in natural scenes. Those models that were tuned to perform well with natural scenes could also predict the paradoxical motion percepts that were not predicted by the HRC. This suggests that visual circuits may have evolved to perform well with natural inputs, and the paradoxical motion percepts represent a feature of the real circuit, rather than a bug.

Models that performed well with natural inputs treated light and dark visual information differently. This different treatment of light and dark is a property of most visual systems, but not of the HRC or related models. In the future, these models of motion processing may help us understand how biological details of the fruit fly's visual circuits help it to estimate motion.

https://doi.org/10.7554/eLife.09123.002

Introduction

A major goal in neuroscience is to understand how the brain computes behaviorally relevant stimulus properties from streams of incoming sensory data (Sejnowski et al., 1988). Visual motion guides behaviors across the animal kingdom. To navigate, many vertebrates and invertebrates use visual data to estimate the velocity of full field motion, and they use that estimate to judge their motion with respect to their environment (Sperry, 1950; Kalmus, 1964; Reichardt and Poggio, 1976; Orger et al., 2000). Spatially localized motion perception (Hubel and Wiesel, 1962; Barlow and Hill, 1963; Buchner, 1976) is also important, as it can indicate the presence of predators or prey in the environment (Reichardt et al., 1983; Gabbiani et al., 1999; Nordström et al., 2006; Zhang et al., 2012), and spatial velocity gradients allow animals to judge relative distances (Rogers and Graham, 1979; Srinivasan et al., 1991; Kral, 2003; Pick and Strauss, 2005). In principle, different algorithms could be used to estimate different types of motion. However, data suggest that many animals compute local motion over an array of spatially localized elementary motion detectors, or EMDs, and then differentially pool those signals for use in different behaviors and neural operations (Hubel and Wiesel, 1962; Barlow and Hill, 1963; Buchner, 1976; Britten et al., 1992; Gabbiani et al., 1999; Franz and Krapp, 2000; Rust et al., 2006).

The Hassenstein-Reichardt correlator (HRC) was introduced nearly sixty years ago to model the EMD underlying the beetle's optomotor response (Hassenstein and Reichardt, 1956). It has since provided numerous insights into motion-guided behaviors across a variety of insect species. The HRC's successes are most striking in flies, where the HRC accurately predicts a wide variety of behavioral and neural responses (Götz, 1968; Buchner, 1976; Egelhaaf and Borst, 1989; Haag et al., 2004), and even adaptation to stimulus statistics (Borst et al., 2005). The HRC's importance also extends to primates and vertebrates, where the EMDs are often described in terms of the motion energy model (Adelson and Bergen, 1985). In particular, although the HRC and motion energy models differ in terms of their intuition and neuronal bases, both models rely on the same mathematical fact about moving visual stimuli—motion causes pairs of spatially separated points to become correlated when a stimulus moves from one location towards the other (Adelson and Bergen, 1985; van Santen and Sperling, 1985). A simple algebraic identity shows that the HRC and motion energy models are computationally equivalent.

Research on Drosophila's motion detection system has progressed quickly in recent years. With an influx of novel genetic, anatomical, and physiological tools, Drosophila researchers are able to perform experiments that have revealed an intricate neural circuit whose details were not anticipated. For example, separate pathways process the motion of light and dark moving edges (Joesch et al., 2010; Clark et al., 2011; Maisak et al., 2013), and different neurons within these pathways coordinate the motion response depending on the velocity of motion (Ammer et al., 2015). Furthermore, connectomic analysis has revealed that more spatial and temporal channels converge onto the fly's motion computing neurons than had been predicted by the HRC's two-input architecture (Takemura et al., 2013). Going forward, it is critical that the field discovers which of these circuit details are computationally relevant and which are not. Since many of these details go beyond the HRC's premise, we must consider alternate theories if we hope to understand how circuit details contribute to motion estimation.

A large body of theoretical and experimental work supports the hypothesis that visual systems are tailored for functionality in the animal's natural behavioral context (Simoncelli and Olshausen, 2001). For example, photoreceptors adapt effectively across the ecological range of light levels (Juusola and Hardie, 2001), the excess number of OFF vs ON retinal ganglion cells matches the excess information of dark vs light contrasts in natural images (Ratliff et al., 2010), and several learning algorithms predict receptive fields similar to early cortical neurons when applied to natural images (Olshausen and Field, 1996; Bell and Sejnowski, 1997). These examples are special cases of the general hypothesis that the early visual system provides an efficient code for the natural visual environment, and recent research suggests that efficient coding accounts for certain aspects of higher-level coding and perception as well (Tkačik et al., 2010; Hermundstad et al., 2014; Yu et al., 2015).

Several recent studies have established connections between the biological algorithms used for visual motion estimation and the statistical demands of naturalistic motion estimation. Natural stimuli are intricately structured and light–dark asymmetric (Geisler, 2008), and a variety of low and high order correlations characterize motion in such an environment. Although the HRC and motion energy models only respond to pairwise correlations in their inputs (Adelson and Bergen, 1985; van Santen and Sperling, 1985), the Bayes optimal visual motion estimator also incorporates a variety of higher-order correlations of both even and odd order (Potters and Bialek, 1994; Fitzgerald et al., 2011). Accordingly, certain visual stimuli that contain only higher-order correlations induce motion percepts in both vertebrates and insects (Chubb and Sperling, 1988; Quenzer and Zanker, 1991; Zanker, 1993; Orger et al., 2000; Hu and Victor, 2010; Clark et al., 2014), and theoretical work shows that the correlations that characterize these stimuli can also improve motion estimation in natural environments (Clark et al., 2014). This demonstrates that neither the HRC nor the motion energy model can account for the totality of experimentally observed motion percepts and suggests that departures from these canonical models might improve motion estimation accuracy. Relatively little is known about the neural basis of these higher-order motion percepts, although several studies have suggested intriguing commonalities across insect and primate species (Clark et al., 2014; Nitzany et al., 2014).

Here we investigate whether the computational demands imposed by accurate motion estimation in natural environments can illuminate the unexpected details of Drosophila's motion estimation circuit or account for non-Reichardtian motion perception in flies. We study a sequence of five computational models, each of which considers a conceptually new aspect of the motion estimation problem. Since each model succeeds in improving estimation accuracy, these results provide a range of nonlinear circuit mechanisms that flies and other animals might incorporate into their motion estimators. We describe how observed elements of Drosophila's motion estimation circuitry could support such computations (Table 1). Importantly, four of the five models also predict the signs and approximate magnitudes of known non-Reichardtian motion percepts in flies. Since the models were tuned exclusively for estimation accuracy, these results support the view that non-Reichardtian motion percepts probe ethologically relevant aspects of biological motion estimators. More generally, our results posit normative interpretations for some unexpected aspects of the fly's motion estimation circuit and behavior and suggest that non-Reichardtian aspects of fly circuitry and behavior might be closely linked through the statistics of natural scenes.

Table 1

The different models used in this paper, experimental results that support each model, and references for those results

https://doi.org/10.7554/eLife.09123.003
ModelSupporting evidence
Front-end nonlinearity• Photoreceptors show nonlinear responses to contrast changes (Laughlin, 1989; Juusola and Hardie, 2001; Juusola and Hardie, 2001; van Hateren and Snippe, 2006)
• Some neurons in the early visual system have nonlinear responses that make their output signals nearly uniform (Laughlin, 1981)
Weighted 4-quadrant model• Visual processing is divided early into ON and OFF channels (Joesch et al., 2010; Clark et al., 2011; Behnia et al., 2014; Meier et al., 2014; Strother et al., 2014)
• The two output channels (T4/T5) are sensitive to light and dark edges (Maisak et al., 2013), but their inputs are incompletely rectified (Behnia et al., 2014)
• Stimuli targeting the four quadrants are differentially represented in neural substrates (Clark et al., 2011; Joesch et al., 2013)
Non-multiplicative nonlinearity• Pure multiplication is not a trivial neural operation (Koch, 2004)
• Inputs to T4/T5 are nonlinearly transformed (Behnia et al., 2014), which also contributes to the biologically implemented non-multiplicative nonlinearity
Unrestricted nonlinearity• The direction-selective neurons T4 receive inputs from more than two types of neurons (Takemura et al., 2013; Ammer et al., 2015)
• T4 receives inputs from both its major input channels at overlapping points in space (Takemura et al., 2013)
Extra input nonlinearity• The direction-selective neuron T4 receives inputs from more than two discrete retinotopic locations (Takemura et al., 2013)

Results

Flies incorporate motion signals that the HRC neglects

The HRC is the dominant model of motion computation in flies and other insects. In this paper we describe several generalizations of the HRC, but it is helpful to first review this canonical model. The HRC comprises three stages of processing. First, two different temporal filters (here, a low-pass filter (f(t)) and a high-pass filter (g(t))) are applied to each of two spatially filtered visual input streams (Figure 1A, ‘Materials and methods’). These four filtered signals are then paired and multiplied (Figure 1A). Finally, the HRC takes the difference between the two multiplied signals to obtain a mirror anti-symmetric motion estimator (Figure 1A). Because the HRC combines its two input channels via a multiplication operation, the average output of the HRC depends only on 2-point correlations in the visual stimulus. We thus refer to the HRC as a 2-point correlator, and we will return to this mathematical characterization of the HRC repeatedly throughout this work.

The Hassenstein-Reichardt correlator (HRC) model is an incomplete description of Drosophila's motion estimator.

(A) Diagram of the HRC model. (B) We assessed motion estimation performance across an ensemble of naturalistic motions, each of which consisted of a natural image (van Hateren and van der Schaaf, 1998) and a velocity chosen from a normal distribution. (C) We quantified model accuracy by comparing the model response to the true velocity using the mean squared error. (D) We summarized the error with the correlation coefficient between the model output and the true velocity. (E) In previous work (Clark et al., 2014), we used a panoramic display and spherical treadmill to measure the rotational responses of Drosophila to visual stimuli. (F) We presented flies with binary stimuli called gliders (Hu and Victor, 2010), which imposed specific 2-point and 3-point correlations (Clark et al., 2014). (G) Flies turned in response to 3-point glider stimuli, but these responses cannot be predicted by the standard HRC. (H) Diagram of the converging 3-point correlator, which is designed to detect higher-order motion signals like those found in 3-point glider stimuli. (I) Adding the converging 3-point correlator to the HRC improved motion estimation performance with naturalistic inputs. We optimized weighting coefficients to minimize the mean squared error over the ensemble of naturalistic motions and used cross-validation to protect against over-fitting. (J) This model predicted that Drosophila would weakly turn in response to 3-point glider stimuli.

https://doi.org/10.7554/eLife.09123.004

No motion estimator is perfect for every stimulus, and this paper explores the hypothesis that evolution has tuned Drosophila's motion estimator for visual experiences that are likely to result from ordinary behavior in natural environments (Appendix 1). We assessed the accuracy of the HRC and other motion estimators by approximating naturalistic motion as the rigid translation of natural images (Clark et al., 2014), with a velocity distribution that mimicked Drosophila's natural behavior (Figure 1B, ‘Materials and methods’) (Katsov and Clandinin, 2008). We spatiotemporally filtered the input signals to simulate the responses of two neighboring photoreceptors (‘Materials and methods’). We quantified the performance of each model as the mean squared error between the input velocity and model output. However, we report each model's accuracy as the correlation coefficient between its output and the true velocity (Figure 1C), an intuitive metric that is equivalent to the mean squared error for correctly scaled model outputs (‘Materials and methods’). In isolation, the local HRC was weakly correlated with the velocity of motion (Figure 1D). Although the HRC's performance can be improved by averaging over space and time (Dror et al., 2001; Clark et al., 2014), this study explores how alternate nonlinear processing can improve motion estimation accuracy without sacrificing spatial or temporal resolution (Clark et al., 2014).

Researchers can probe a fly's motion estimate by measuring its behavioral optomotor turning response (Hassenstein and Reichardt, 1956; Götz and Wenking, 1973; Buchner, 1976; Reichardt and Poggio, 1976). We previously measured optomotor responses from flies walking on a spherical treadmill by recording their turning responses to various visual stimuli (Figure 1E) (Clark et al., 2014). We emphasized binary stimuli called gliders (Hu and Victor, 2010) (Figure 1F), which enforce spatiotemporal correlations to interrogate the fly's motion estimation algorithm. For example, 2-point gliders contain only 2-point correlations (first two stimuli, Figure 1F). Drosophila turned in response to these stimuli (Clark et al., 2014) (black bars, left, Figure 1G), and the HRC correctly predicted that flies would respond to both positive and negative 2-point correlations (gray bars, left, Figure 1G). On the other hand, 3-point gliders contain 3-point correlations without 2-point correlations (last four stimuli, Figure 1F). These stimuli generated motion responses in flies (black bars, right, Figure 1G) that the HRC could not explain (gray bars, right, Figure 1G). Thus, behavioral responses to glider stimuli show that the HRC is an incomplete description of fly motion estimation and provide a useful benchmark for evaluating alternate models.

In this study, we tune our models to optimize motion estimation accuracy, rather than to fit the behavioral data, for two main reasons. First, we want to explore the hypothesis that Drosophila's glider responses follow from performance optimization within biologically plausible circuit architectures. Second, we seek models that will generalize well across visual stimuli, and the measured glider responses under-constrain possible motion estimation models. It's useful to illustrate our procedure with a simple example. The HRC does not account for 3-point glider responses because it is insensitive to 3-point correlations. Nevertheless, 3-point correlations are present in natural stimuli (Clark et al., 2014; Nitzany and Victor, 2014), and their use might facilitate accurate motion estimation. We can explore this hypothesis by summing the HRC with a motion estimator designed to respond specifically to 3-point correlations. For instance, the mirror anti-symmetric ‘converging’ 3-point correlator multiplies one high-pass filtered signal with two low-pass filtered signals (Figure 1H) and mimics the converging structure present in certain glider stimuli (last two stimuli, Figure 1F). We tune the model for motion estimation accuracy by choosing the weights of the HRC and the converging 3-point correlator to minimize the mean squared error (‘Materials and methods’). The resulting model is more accurate than the HRC (Figure 1I) and it predicts that flies should respond to glider stimuli in the observed directions (Figure 1J, ‘Materials and methods’). Nevertheless, this simple model underestimates 3-point turning magnitudes (Figure 1J), indicating a discrepancy between the fly's motion estimator and this performance-optimized model.

In this study, we apply this same basic model building procedure to a series of increasingly general model architectures. There are four benefits to this approach. First, each model incorporates a type of computation that was neglected by earlier models. Thus, we can compare model accuracies to quantify how important various computations are for naturalistic motion estimation. Second, each model has a distinct biological interpretation in terms of Drosophila's motion estimation circuit (Table 1). This allows us to enumerate many directions for future experimental and computational research. Third, this set of models reveals several distinct principles of accurate naturalistic motion estimators, yet no single model illustrates every principle. Finally, by comparing the glider predictions of each model to behavioral data, we can gain insight into which principles underlie Drosophila's known glider responses.

Nonlinear preprocessing of HRC inputs improves estimation but poorly predicts responses to gliders

The HRC correlates pairs of photoreceptor signals (Figure 1A). We previously assumed that each photoreceptor's response was generated from incoming contrast signals through linear spatiotemporal filtering. However, real photoreceptors are linear only over a limited range of inputs (Laughlin, 1981; Juusola and Hardie, 2001) (Table 1). Our first model thus modifies the HRC by allowing the photoreceptor responses to become nonlinear (Figure 2A). More specifically, we consider models in which a static nonlinearity transforms the filtered contrast signals before a standard HRC is applied to the two input streams (Figure 2A, ‘Materials and methods’). Since the nonlinearity occurs before the HRC, we refer to this model as the front-end nonlinearity model. By nonlinearly transforming the contrast signals, the front-end nonlinearity model is able to reshape natural sensory statistics. In particular, linear photoreceptor signals inherit complex non-Gaussian statistics from their natural inputs (Figure 2B), but front-end nonlinearities (Figure 2C) can produce transformed signals with alternate statistics (Figure 2D, ‘Materials and methods’). Thus, optimal front-end nonlinearity models should reshape natural statistics into those that best suit the HRC. Previous studies have already demonstrated example front-end nonlinearity models that improve naturalistic motion processing by the HRC (Dror et al., 2001; Brinkworth and O'Carroll, 2009). Here we provide new theoretical insight into these improvements and their consequences for glider responses.

Figure 2 with 1 supplement see all
Front-end nonlinearities improved naturalistic motion estimation but did not reproduce the psychophysical results.

(A) Diagram of the front-end nonlinearity model. The nonlinearity occurs after the spatiotemporal filtering of photoreceptors but before the temporal filtering of the HRC. (B) The distribution of contrast signals after photoreceptor filtering had a kurtosis of 9.6. The kurtosis of unfiltered pixels in the image database was 7.8. (C) Three different nonlinearities that transformed this input distribution into a Gaussian distribution, a uniform distribution, and a binary distribution. (D) After these transformations, the kurtosis of the contrast signal was reduced to 3, 1.8, and 1, respectively. (E) Each front-end nonlinearity model improved the HRC's estimation accuracy, and uniform output signals worked best. (F, G) The front-end nonlinearity models reproduced the sign of the negative 2-point glider psychophysical responses but did not reproduce the pattern of psychophysical responses to 3-point gliders.

https://doi.org/10.7554/eLife.09123.005

Although the statistics of natural images are complicated, the mean squared error between the HRC's output and the velocity of motion depends only on a few statistical quantities. Since the HRC is a 2-point correlator, the mean velocity signal decoded by an HRC is determined by the second-order statistics of the image ensemble (Dror et al., 2001). The variance of the motion signal comes from the square of a quadratic signal, and thus the noise statistics of the HRC depend on the fourth-order statistics of the image ensemble (Appendix 2). If the image ensemble is spatially uncorrelated, the situation simplifies further and the correlation between the estimated and true image velocity is determined entirely by the standardized fourth central moment of the input streams, a quantity known as kurtosis (Appendix 3). A larger kurtosis results in a larger error in the motion estimate. Note that some authors use ‘kurtosis’ to refer to the ‘excess kurtosis’, which shifts kurtosis values such that the Gaussian distribution has zero excess kurtosis. This shift is not relevant for our purposes. Because large positive contrasts are relatively probable, naturalistic inputs are highly kurtotic (kurtosis = 9.6 for the spatiotemporal filtering in our simulations) and are thus expected to hinder HRC performance (Figure 2B).

The Gaussian, uniform, and symmetric Bernoulli distributions have much lower kurtosis values (kurtosis = 3.0, 1.8, 1.0, respectively, Figure 2D). In fact, the symmetric Bernoulli distribution has the lowest kurtosis of any probability distribution (DeCarlo, 1997). When we transformed the HRC's inputs to have these statistics (‘Materials and methods’), we found that each nonlinearity substantially improved the accuracy of the HRC (Figure 2E). The contrast equalizing nonlinearity, which produces uniform outputs, performed best and also plays a prominent role in efficient coding theory (Laughlin, 1981). It is interesting that contrast equalization improved the accuracy of the HRC more than binarization (Figure 2E), even though it produced outputs with greater kurtosis. The reason for this is that natural images are spatially correlated, and the accuracy of the HRC over a general image ensemble depends on the ensemble's spatial correlation structure (Appendix 2). Binarization attenuated spatial correlations more strongly than contrast equalization over the natural image ensemble (Figure 2—figure supplement 1), and spatial correlations can enhance the performance of the HRC (Appendices 4, 5). Designing a nonlinearity that optimally sculpts the correlation structure of natural images is not simple and goes beyond the scope of this study.

Each front-end nonlinearity model is sensitive to a variety of higher-order correlations (Appendix 6). We thus tested whether accurate front-end nonlinearity models would predict Drosophila's glider response pattern. However, each front-end nonlinearity model performed poorly at this task (Figure 2F,G). None of the three models predicted that Drosophila would invert its response to positive and negative 3-point gliders (Figure 2G), even though they predicted that the 3-point glider responses would be nonzero. The simplest explanation for this observation is that the front-end nonlinearity models responded to fourth-order correlations that are common to the stimuli, rather than the third-order correlations that defined the glider stimuli and primarily drove the experimental response (Clark et al., 2014). Mechanistically, this result follows from the fact that the nonlinearities that reduced kurtosis (Figure 2C) were not strongly asymmetric around zero contrast (Appendix 6). The binarizing front-end nonlinearity model also failed to predict that Drosophila would respond less to negative 2-point glider stimuli than positive 2-point glider stimuli (Figure 2F). Since this effect was correctly predicted by the standard HRC (Figure 1G), this observation shows that accurate front-end nonlinearity models can distort the processing of 2-point correlations. Although the front-end nonlinearity model did not explain the phenomenon of fly glider perception, future work should investigate whether its merits make it functionally relevant for motion processing in other contexts or species.

Separating ON and OFF signals improves motion estimation and predicts responses to gliders

Instead of a front-end nonlinearity, Drosophila could use an alternative non-Reichardtian motion estimation strategy that reflects natural sensory statistics, without necessarily requiring nonlinear preprocessing. Previous computational analyses show that motion estimation strategies that distinguish light and dark information can enhance motion processing with natural inputs (Fitzgerald et al., 2011; Clark et al., 2014; Nitzany and Victor, 2014), and recent experiments indicate that flies use separate channels to process the motion of light and dark edges (Joesch et al., 2010; Clark et al., 2011; Behnia et al., 2014; Clark et al., 2014; Meier et al., 2014; Strother et al., 2014) (Table 1). Our next model explores the hypothesis that Drosophila segregates ON and OFF signals in order to facilitate naturalistic motion estimation (Clark et al., 2014) (Figure 3A, ‘Materials and methods’). There are four ways to pair the ON and OFF components of the two filtered signals that enter the HRC's multiplier. For example, one possibility is to pair the ON component of the low-pass filtered signals with the OFF component of the high-pass filtered signal. Since each pairing restricts the HRC's multiplier to a single quadrant of the Cartesian plane, we refer to these four signals as HRC-quadrants. If the quadrants are summed with equal weights, then this model is mathematically identical to the HRC (Hassenstein and Reichardt, 1956; Clark et al., 2011). Unequal weighting coefficients enable the motion estimator to prioritize some quadrants over others, and here we select quadrant weightings that minimize the mean squared error between the model output and velocity (Figure 3B, ‘Materials and methods’). More generally, we refer to any model that linearly combines the four HRC-quadrants as a weighted 4-quadrant model. The precise manner in which the four HRC-quadrants might map onto circuitry remains unclear; we do not suggest there exists separate circuitry for each quadrant. For instance, studies have identified only two motion-processing channels in the Drosophila brain, which might suggest that the fly only uses a subset of the quadrants (Eichner et al., 2011; Joesch et al., 2013; Maisak et al., 2013). On the other hand, each channel appears imperfectly selective for light vs dark signals (Behnia et al., 2014), which in principle enables these two channels to access all four quadrants (Table 1).

Figure 3 with 2 supplements see all
The weighted 4-quadrant model improved estimation performance and reproduced the directionality of psychophysical results.

(A) Diagram of the weighted 4-quadrant model. Similar to ON/OFF processing in the visual system, the weighted 4-quadrant model splits the four differentially filtered signals into positive and negative components. As in the HRC, these component signals are paired, multiplied, and subtracted to produce four mirror anti-symmetric signals. We refer to these signals as HRC-quadrants. The model output is a weighted sum of the quadrant signals. We identify quadrants by whether they respond to the positive or negative components of each filtered signal and denote the four quadrants as (+ +), (+ −), (− +), and (− −). In this notation, the first index refers to the sign of the low-pass filtered signal (emanating from f(t)), and the second refers to the high-pass filtered signal (emanating from g(t)). (B) We measured the response of each quadrant to naturalistic motions and chose the quadrant weightings to minimize the mean squared error between the model output and the true velocity. (C) Comparison of the estimation performance of individual quadrants, multiple quadrants, and the HRC. The best two quadrants were (− −) and (− +); the best three also included (+ −). (D) The performance-optimized weighted 4-quadrant model reproduced the signs and approximate magnitudes of the psychophysical results.

https://doi.org/10.7554/eLife.09123.007

We began by examining how well individual quadrants predicted the velocity of motion. The four quadrants provided motion signals of strikingly different quality (first four red bars, Figure 3C). The most accurate quadrant correlated negative low-pass filtered signals with negative high-pass filtered signals ((− −) bar, Figure 3C). This isolated quadrant already outperformed the full HRC. The quadrant that correlated negative low-pass filtered signals with positive high-pass filtered signals also performed relatively well ((− +) bar, Figure 3C), whereas the quadrants that involved positive low-pass filtered signals performed poorly ((+ +) and (+ −) bars, Figure 3C). This shows that negative signals emanating from the low-pass filter better facilitate motion estimation, and the HRC's uniform weighting of all four quadrants is computationally detrimental.

We next considered all subsets of two, three, or four quadrants. The best subsets for each number of predictors were nested, and the quadrants were incorporated in the order (i) (− −); (ii) (− +); (iii) (+ +); (iv) (+ −). Although all four quadrants enhanced the accuracy of the weighted 4-quadrant model, the benefit of each added quadrant decreased with the number of quadrants (Figure 3C). It is possible to reparameterize the weighted 4-quadrant model in a form that isolates the contributions of various higher-order correlations to the model's accuracy (Appendix 7). Interestingly, this parameterization showed that nearly all the accuracy of the weighted 4-quadrant model can be obtained by supplementing the HRC with a set of odd-ordered correlations that account for the asymmetry between positive and negative low-pass filtered signals (Figure 3—figure supplement 1, Appendix 8). Principal component analysis (PCA) did not reveal this simple interpretation of the model's computation (Appendix 9).

The performance-optimized weighted 4-quadrant model also offered an interesting interpretation of Drosophila's glider response pattern. First note that the model preserved the HRC's response pattern to 2-point glider stimuli (compare left subpanels of Figure 3D and Figure 1G). More interestingly, the model predicted behavioral responses to 3-point glider stimuli that matched the experimentally observed turning directions, and even the response magnitudes were similar between the model and the data (right, Figure 3D). Nevertheless, the model's predictions were imperfect. The primary qualitative discrepancy was that the model failed to predict that positive 3-point glider stimuli would generate smaller turning responses than negative 3-point glider stimuli. The simplest interpretation for this experimental result is that flies might incorporate both 3-point correlations and 4-point correlations into their motion estimation strategy. In particular, since the positive and negative 3-point glider stimuli have inverted 3-point correlations and matched 4-point correlations, third-order and fourth-order correlations would have the same sign for one parity and opposite signs for the other parity. This observation makes it easier to understand the glider predictions of the weighted 4-quadrant model. The optimized model does a good job accounting for the direction and approximate magnitude of the glider responses because it draws heavily on second-order and odd-order correlations, but it fails to predict the 3-point glider magnitude asymmetry because it finds little added utility in higher-order even correlations (Figure 3—figure supplement 1, Appendix 8). This failure stems from architectural limitations in the weighted 4-quadrant model (Figure 3—figure supplement 2), so it is important to consider alternate model classes.

Drosophila circuitry contains additional elements that might facilitate motion estimation

The previous section suggested that the segregation of light and dark signals by Drosophila's motion estimation circuitry might enhance naturalistic motion estimation in a manner that also generates the observed glider responses. In this section, we introduce three hierarchical models to investigate other features of Drosophila's circuit that might have functional consequences for the processing of natural stimuli and gliders (Table 1). We refrain from modifying the temporal filtering of the motion estimator, and we focus on its nonlinear architecture.

The first of these models recasts the HRC and the weighted 4-quadrant model in a more general architecture. This model is the class of mirror anti-symmetric models that apply a 2-dimensional nonlinearity to the low-pass filtered signal from one point in space and the high-pass filtered signal from a neighboring point in space (Figure 4A). Since the observed glider responses indicate that flies use higher-order correlations of both even and odd order, we model this 2-dimensional nonlinearity as a fourth-order polynomial (‘Materials and methods’). The HRC corresponds to the special case of this nonlinearity that multiplies the two inputs (left, Figure 4B). To emphasize how the model class in Figure 4A generalizes the HRC, we refer to it as the non-multiplicative nonlinearity model. In comparison, the weighted 4-quadrant model corresponds to a different nonlinearity that separately scales a pure multiplication in each quadrant of the Cartesian plane. Compared to the HRC, the optimized forms of both the weighted 4-quadrant model and the non-multiplicative nonlinearity model substantially attenuated positive low-pass filtered signals (middle and right, Figure 4B), though the non-multiplicative nonlinearity shows less attenuation. This model architecture provides enough flexibility to generate the glider response pattern (Figure 4—figure supplement 1).

Figure 4 with 2 supplements see all
Several biologically motivated generalizations of the motion estimator further improved estimation performance without sacrificing glider responses.

See Table 1 for a description of the biological rationales behind these models. (A) The ‘non-multiplicative nonlinearity’ model substitutes a 2-dimensional nonlinearity for the pure multiplication of the HRC. Here, we approximated the nonlinearity with a fourth order polynomial. (B) Two-dimensional nonlinearities underlying the HRC, the weighted 4-quadrant model, and the non-multiplicative nonlinearity model. The latter models reflect optimized cases, in which the weighting coefficients maximized estimation performance with natural inputs. Iso-output lines are shown in each plot, and the horizontal and vertical limits are chosen to include 95% of the naturalistic input signals. (C) Another generalization, the ‘unrestricted nonlinearity’ model allows all 4 input signals to be combined nonlinearly. We approximate this 4-dimensional nonlinearity with a fourth-order polynomial. (D) A final generalization, the ‘extra input nonlinearity’ model, relaxes the restriction that the motion estimator only uses 2 spatial inputs. We approximate this 6-dimensional nonlinearity with a fourth-order polynomial. (E) Comparison of the estimation performance of these models to the HRC. We compare the extra input nonlinearity model to the average of two neighboring motion estimators. (F, G) The three models correctly predicted the directions of psychophysical responses. The pattern of 3-point responses differed somewhat across the models, and the extra input nonlinearity model was the first to predict a large asymmetry between positive and negative 3-point glider responses.

https://doi.org/10.7554/eLife.09123.010

The non-multiplicative nonlinearity model relaxes some restrictions of the 4-quadrant model. This is prudent because the exact nonlinear transformations implemented by neural circuits in the Drosophila brain remain poorly understood. For example, T4 and T5 are the first direction-selective neurons in the fly brain (Maisak et al., 2013), but the mechanism by which they become direction-selective is not yet known. Furthermore, neurons upstream of T4 and T5 imperfectly segregate light and dark information (Behnia et al., 2014) and show overlap between the two motion pathways (Silies et al., 2013), suggesting that ON/OFF segregation may not be crisply realized. We will discuss this model's estimation accuracy and glider performance in the next section.

Drosophila's motion processing circuitry suggests two more generalizations of the non-multiplicative nonlinearity model. First, note that the non-multiplicative nonlinearity model inherits the HRC's assumption that each nonlinear unit only acts upon the low-pass filtered signal from one point in space and the high-pass filtered signal from the neighboring point (Figure 4A). In contrast, the converging 3-point correlator (Figure 1H) shows that the accuracy of motion estimation can sometimes be enhanced by nonlinearly combining both low-pass filtered signals (Figure 1I). Moreover, connectomic evidence conflicts with the non-multiplicative nonlinearity model's constraints, because each T4 cell receives synaptic connections from both the Mi1 cell and the Tm3 cells (T4's two major input channels) at overlapping points in space (Takemura et al., 2013). The unrestricted nonlinearity model removes this restriction of the non-multiplicative nonlinearity model by allowing a 4-dimensional nonlinearity to act on all four filtered signals (Figure 4C). Here, we again model this nonlinearity as a fourth-order polynomial (‘Materials and methods’). The unrestricted nonlinearity allows the motion estimator to nonlinearly combine multiple temporal channels from the same point in space. Recent experiments indicate that the Mi1 and Tm3 cells alone are insufficient to account for the motion processing of the T4 channel (Ammer et al., 2015). Future work might generalize the unrestricted nonlinearity model to include three or more temporal channels at each point in space.

The models presented so far operate only on a pair of neighboring photoreceptors, and the final generalization incorporates a third point in space. Averaging EMDs over space improves the accuracy of whole-field motion estimation (Dror et al., 2001), but Drosophila's neural circuitry suggests that it might adopt a more sophisticated strategy to combine signals across space. In particular, single T4 cells receive synaptic inputs from Mi1 cells and Tm3 cells from more than two retinotopic columns (Takemura et al., 2013). This arrangement could allow the circuit to incorporate higher-order correlations that are distributed across three or more spatial input channels. To explore whether this possibility has computational significance, we generalized the unrestricted nonlinearity model to provide unrestricted access to six temporal channels distributed across three points in space (Figure 4D). We refer to this model as the extra input nonlinearity model. We approximate its 6-dimensional nonlinearity as a fourth-order polynomial (‘Materials and methods’).

Elaborated circuit architectures improve motion estimation without sacrificing glider responses

Having introduced the rationale behind the non-multiplicative, unrestricted, and extra input nonlinearity models, it is straightforward to examine their performance as motion estimators. First note that the polynomial non-multiplicative nonlinearity model was a better motion estimator (Figure 4E) than the weighted 4-quadrant model (Figure 3C). This implies that some useful signatures of naturalistic motion are not made accessible by simply segregating ON and OFF motion signals. Interestingly, this performance improvement is largely due to 3-point correlations, and models that exclude fourth-order polynomial terms still outperform the weighted 4-quadrant model (Figure 4—figure supplement 2). Third-order correlations are only useful for motion estimation because of light–dark asymmetries in natural stimulus statistics (Fitzgerald et al., 2011; Clark et al., 2014), so this result implies that ON/OFF segregation provides an imperfect way to account for the complexity of light–dark asymmetries found in the natural world. The non-multiplicative nonlinearity model also made novel use of low-order correlations to improve its motion estimate (Appendix 10).

The three models are hierarchical because the non-multiplicative nonlinearity model is a special case of the unrestricted nonlinearity model, which is itself a special case of the extra input nonlinearity model. Thus, we expect each model to perform at least as well as its predecessor, but it is possible that some circuit elaborations will not introduce useful computational cues. Nevertheless, we found that that the unrestricted nonlinearity model performed better than the non-multiplicative nonlinearity model, and the extra input model performed better than the average of two neighboring unrestricted nonlinearity models (Figure 4E). Therefore, both models incorporated novel computational signatures with relevance for visual motion estimation. Although the relative improvements were fairly small, it's worth noting that the improvement from spatial averaging is also small, and it is possible that the fly brain builds an accurate motion estimator by combining a large number of weak predictors of motion.

Each of these three generalized models predicted 2-point glider responses (Figure 4F) that closely resembled the standard HRC (left, Figure 1G). Each model also correctly predicted the experimental turning directions to each of the 3-point glider stimuli (Figure 4G). The magnitudes of the 3-point glider turning responses did not unambiguously favor any of the three hierarchical models (Figure 4G) or the weighted 4-quadrant model (right, Figure 3D). Each model did better on some stimuli and worse on others. Nevertheless, the predicted glider responses did make several interesting points. First, the extra input nonlinearity model predicted a clear asymmetry between positive and negative 3-point gliders (Figure 4G). This shows that some of the even-ordered correlations found in 3-point glider stimuli have relevance for naturalistic motion estimation. Second, the observation that each model provides qualitatively similar glider response patterns illustrates that animals could use multiple nonlinear mechanisms to access ethologically relevant higher-order correlations. Future experiments should directly assess the functional relevance of the different models in the hierarchy. Finally, the qualitative agreement between all of these predictions and the experimental data supports the general hypothesis that glider responses could reflect underlying nonlinear mechanisms that facilitate motion estimation in natural environments.

The extra input nonlinearity model contains the conceptual content of the other considered models

In this paper, we sequentially introduced several models in order to isolate specific ideas about the relationships between Drosophila's behavior, its motion estimation circuit, and the statistical demands of accurate motion estimation in natural environments. The front-end nonlinearity model explored an interesting candidate principle for visual motion estimation, but it conflicted sharply with fly behavior (Figure 2G) and excluded the conceptual insights offered by other models. For example, the front-end nonlinearities we considered eliminated the asymmetry between light and dark contrasts (Figure 2D), removing the need for separate ON and OFF processing. However, the remaining models embodied ideas that are complementary rather than exclusive, and these models should not be thought of as competitors. Instead, we will show here that the final, most general model incorporates the variety of conceptual points that were initially illustrated by specific models.

The structure of the non-multiplicative nonlinearity models can be directly plotted (Figure 4B), but is not easy to visualize the 6-dimensional nonlinearity that defines the extra input nonlinearity model. We therefore need an alternate technique to illustrate its computations. We proceed by leveraging three ideas. First, a wide variety of visual motion estimators can be expanded as an abstract series of multipoint correlators (e.g., see Poggio and Reichardt, 1980, Fitzgerald et al., 2011, and Appendices 6, 7, 11), and it is straightforward to pictorially represent a multipoint correlator (e.g., see Figure 1A,H, and more to come). In the extra input nonlinearity model, this expansion is immediate because we have already parameterized its 6-dimensional nonlinearity as a polynomial. Importantly, this expansion should be considered at the algorithmic level (Marr and Poggio, 1976), and we do not suggest that the wiring of brain circuits will reflect a large number of higher-order correlators. To the contrary, a large number of higher-order multipoint correlators may be implemented implicitly by high-dimensional nonlinearities suggested by Drosophila's visual circuitry. Second, we note that certain multipoint correlators can be recombined into a 2-dimensional non-multiplicative nonlinearity that facilitates easy comparisons with the HRC and weighted 4-quadrant models (e.g., see Figure 4B). Taken together, these two points mean that we can represent the performance-optimized extra input nonlinearity model in terms of non-multiplicative nonlinearity models and multipoint correlators, each of which are easy to represent graphically.

This graphical representation could be unwieldy because of the shear number of higher-order correlators in the model. Thus the third and final point is that we need a way to identify a relatively small number of terms that substantially improve the accuracy of motion estimation and illustrate the conceptual content of the model. To achieve this, we used lasso regression (Tibshirani, 1996) to identify models with fewer multipoint correlators that still enabled accurate motion estimation (‘Materials and methods’). This analysis revealed that fewer than half of all multipoint correlators were needed to account for the full accuracy of the extra input nonlinearity model (rightmost bars, Figure 5A). In fact, the accuracy of naturalistic motion estimation increased rapidly as the few correlators were sequentially added (left bars, Figure 5A), and a model that used 16 out of the 209 possible predictors was already able to produce 74% of the gain offered by the full extra input nonlinearity model (red bar, Figure 5A).

Figure 5 with 1 supplement see all
Computational interpretation of the extra input nonlinearity model.

(A) We used lasso regression to select subsets of predictors that might enable accurate estimation (see ‘Materials and methods’). With only 16 predictors, the model improved naturalistic performance over the HRC by 68%, and including fewer than half of the predictors improved it by the full 92%. The maximum number of predictors corresponds to the number of polynomial coefficients that were fit in the full model. (B) We visualized the 6-dimensional nonlinearity as the sum of several simpler computational modules. When only 16 predictors were used (red bar in (A)), the model used four distinct types of computations. In particular, the model included nearest-neighbor and next-nearest-neighbor non-multiplicative nonlinearities (top row). It also included a converging 3-point correlator from the two furthest photoreceptors and a 4-point correlator that combined three spatial inputs (bottom row). (C) Venn diagram illustrating the hierarchical nesting of models used in this paper. All models in this paper contain sets of parameters that reproduce the HRC (gray dot). The weighted 4-quadrant model is a subset of non-multiplicative nonlinearity models, which are themselves a subset of unrestricted nonlinearity models. The extra input nonlinearity encompasses all the models. When we approximated the nonlinearites with fourth order polynomials, we restricted them to a smaller portion of the model space. The 4-quadrant nonlinearities only overlapped with the fourth-order polynomial approximation at the HRC, because the weighted 4-quadrant model is infinite order when expanded as a polynomial (see Appendix 7).

https://doi.org/10.7554/eLife.09123.013

The leading 16 predictors compactly illustrated how the extra input nonlinearity model recapitulates the conceptual advances offered by the other models (Figure 5B). Four of the predictors combine to implement a mirror-symmetric non-multiplicative nonlinearity model that acts on the first and second points in space (first term, Figure 5B). The dominant contribution to the nonlinearity is the HRC's multiplier, but an additional third-order term breaks the symmetry between positive and negative low-pass filtered signals (Figure 5—figure supplement 1). Thus, the extra input nonlinearity model approximately correlates neighboring points in space, as the HRC would suggest, but it differentially weights positive and negative low-pass filtered signals, like the weighted 4-quadrant model. It also replicates the main insight from the non-multiplicative nonlinearity model: the best treatment of asymmetric light and dark information need not be as simple as pure ON/OFF segregation. The model used another eight predictors to construct two more non-multiplicative nonlinearity models, one that surveyed the second and third points in space and another that surveyed the first and third points (first and second terms, Figure 5B, Figure 5—figure supplement 1). These components make the previously highlighted conceptual points and add the observation that spatial averaging improves estimates.

The final four predictors implemented two mirror anti-symmetric multipoint correlators (third and fourth terms, Figure 5B). In particular, two predictors went towards implementing a converging 3-point correlator that spanned the first and third spatial points (third term, Figure 5B). This estimator made the model's asymmetric treatment of light and dark signals more nuanced than permitted by the non-multiplicative nonlinearity model, and it also incorporated motion signals that combine multiple temporal signals from the same point in space. This latter point was the main conceptual motivation for the unrestricted nonlinearity model. Finally, the last two predictors implemented a 4-point correlator that combined temporal signals from three distinct points in space (fourth term, Figure 5B). This component reinforces the conceptual motivation for the extra input nonlinearity model and gives a concrete example of a computationally relevant higher-order correlator that is distributed across three points in space. It's interesting that the leading fourth-order correlator spanned three spatial points, because the extra input nonlinearity model was the first performance-optimized model that generated a substantially asymmetric response to positive and negative 3-point gliders (Figure 4G).

This paper set out with the goal of exploring whether the statistical demands of naturalistic motion estimation could provide a useful lens for interpreting features of Drosophila's behavior and neural circuitry that push beyond the canonical HRC. Although we have considered several interesting classes of visual motion estimators, the space of possible motion estimators is much larger (Figure 5C). For instance, these models have not explored the impact of temporal filter choice on naturalistic motion estimation. Nor have they assessed the possibility of more than two temporal filters, which is be suggested by anatomical (Takemura et al., 2013) and physiological (Ammer et al., 2015) experiments. More generally, the neural circuits contributing to Drosophila's motion estimator are still incompletely known, and the extent to which the fly brain's biological complexity reflects computational sophistication remains an open question. Theoretical considerations will be critical for resolving that question and pinpointing the most relevant principles underlying visual motion estimation.

Discussion

Ongoing research is providing an increasingly detailed picture of the anatomy and physiology of the visual circuitry that implements motion processing in Drosophila. Through the combination of genetic silencing experiments, connectomic analysis, and functional recordings, researchers have identified many individual neurons in the fly brain that contribute to visual motion processing (Silies et al., 2014). Although the HRC provided the initial theoretical impetus for these experiments, specific experimental outcomes have often been unanticipated. For instance, the fly brain contains multiple pathways that segregate different types of motion information (Joesch et al., 2010; Clark et al., 2011; Silies et al., 2013); its direction-selective neurons receive inputs from more than two neighboring points in visual space (Takemura et al., 2013); and the biological substrates for reverse-phi signals, which were fundamental to the formulation of the HRC, remain poorly understood (Clark et al., 2011; Tuthill et al., 2011; Joesch et al., 2013). Theoretical work to illuminate the computational significance of these various discrepancies is critical for understanding Drosophila's motion estimator.

The results presented in this paper provide a new theoretical perspective on these experimental results. While previous research has addressed how neural circuits could use four quadrants to carry out algebraic multiplication, here, the recurring theme of our models was that motion-processing circuits should treat light and dark signals differently for functional reasons. We first showed that visual systems could use ON and OFF processing channels that separately correlate light and dark signals to improve the accuracy of motion estimation (Figure 3). This model was inspired by the experimental observation that Drosophila's motion processing channels distinguish between light increments and decrements (Joesch et al., 2010; Clark et al., 2011), but this study is the first to explicitly demonstrate how such processing channels can improve the accuracy of motion estimation. Furthermore, our model shows that both the phi channels (i.e., the (+ +) and (− −) and quadrants) and the reverse-phi channels (i.e., the (+ −) and (− +) quadrants) can contribute productively to motion estimation in natural environments. Since many animals experience similar sensory statistics and ON and OFF visual processing channels are pervasive across visual systems (Schiller, 1992; Westheimer, 2007), these mechanisms might be very general. Ultimately, the performance gains from weighted quadrants were a consequence of statistical asymmetries between light and dark contrasts in natural images, and our models showed that neural circuits could perform even better if they made distinctions between light and dark signals that were subtler than simple ON/OFF segregation (Figure 4). Recent experimental evidence indicates that the fly's motion processing channels are imperfectly selective for ON vs OFF information (Silies et al., 2013; Behnia et al., 2014; Strother et al., 2014), and it is important that future experiments characterize such subtleties in the computations performed by these circuits.

Our most general model contained three spatial inputs and showed that spatial averaging of local motion detectors was suboptimal (Figure 4E). Anatomy suggests that single T4 cells receive inputs from several different retinotopic columns, and also from multiple neuron types in a single retinotopic column (Takemura et al., 2013). Our modeling suggests that these two forms of circuit heterogeneity could enhance motion estimation by facilitating computations that go beyond averaging to compute higher-order correlations that are distributed across multiple points in space (Figure 5B). Overall, our results demonstrate how the subtleties of neural circuit nonlinearities can improve motion detection with naturalistic inputs. It therefore seems likely that some of the complexities of Drosophila's circuitry are critical to its performance under natural conditions.

It is remarkable that our approximation of natural motion by the rigid translation of natural images revealed substantial utility for higher-order correlations in motion processing. Truly naturalistic motion would include spatial velocity gradients, occlusion, expansion, and contraction, yet the simplified naturalism we used to optimize our models already sufficed to account for many aspects of the fly's glider responses. This may be because the rotational optomotor response measured in the fly experiments is thought to be sensitive primarily to full-field rotations, which our naturalism emulates well. However, since other higher-order correlations may be associated with non-rigid translation (Nitzany and Victor, 2014), one might expect a different set of glider sensitivities to be optimal in the context of other motion-guided behaviors, such as looming responses (Gabbiani et al., 1999; Tammero and Dickinson, 2002; Card and Dickinson, 2008). Since a common elementary motion detector might underlie many or all motion-guided behaviors, incorporating more complex optic flow patterns may even diminish discrepancies between our models and Drosophila's behavior.

The approach of this study is also relevant to vertebrate vision, where researchers typically model motion estimation using the motion energy model (Adelson and Bergen, 1985). Like the HRC, the motion energy model only responds to 2-point correlations in the visual stimulus. Consequently, many of the theoretical considerations in this paper apply directly to the motion energy model. Furthermore, each of our computational models can be straightforwardly generalized to the architecture of the motion energy model. For example, one could incorporate non-multiplicative nonlinearities by replacing the squaring operation of the motion energy model with a more flexible nonlinearity. Nevertheless, the numerical benefits offered by each modification to the motion energy model might differ from those found for the HRC because the motion energy model and HRC use distinct spatial and temporal filtering. Such differences could in principle manifest themselves as a different pattern of predicted glider responses (Hu and Victor, 2010; Clark et al., 2014), but comparative electrophysiology experiments in macaques and dragonflies currently suggest that similarities between primate and insect motion processing are abundant (Nitzany et al., 2014).

Our models make predictions that are testable with new experiments. Researchers hypothesize that the T4 and T5 neurons in the fly lobula nonlinearly combine visual inputs across space and time to become the first direction-selective neurons in Drosophila's visual system (Maisak et al., 2013). In accordance with the HRC model, conventional wisdom says that these neurons will multiply their input channels. In contrast, we predict that T4 and T5 will combine their visual input streams with non-multiplicative nonlinearities that facilitate accurate motion estimation in natural sensory environments. It's crucial to note that subtle differences between biology's nonlinearity and a pure multiplication can correspond to substantial functional effects. In particular, the optimized nonlinearity that we found here (Figure 4B) is superficially similar a simple multiplication, yet its subtle distinctions manifest themselves by improving the local estimation accuracy of the HRC by an impressive margin (Figure 2D).

In this paper, we studied several simple models to most clearly illustrate the computational consequences of fundamental nonlinear circuit operations. Each of these operations individually provided a way for Drosophila to improve their motion estimation accuracy in natural environments, but they are not necessarily exclusive. For example, if a front-end nonlinearity does not fully remove the asymmetry between light and dark contrasts, then subsequent ON and OFF processing might further improve estimation accuracy. Similarly, non-multiplicative nonlinearities might enable an even better combination of ON and OFF signals for motion estimation. The general approach that we adopted here is to restrict the space of candidate models to those that have immediate biological relevance and to identify interesting models by optimizing the model's estimation accuracy over naturalistic stimuli. Future models should incorporate more biological details to better emulate the specifics of Drosophila's visual circuitry, which is rapidly being dissected through unprecedented anatomical, functional, and behavioral experiments (Silies et al., 2014).

Materials and methods

Simulated ensemble of naturalistic motions

We simulated the linear responses of neighboring photoreceptors to naturalistic motion using methods similar to previous work (Clark et al., 2014). We began with a database of natural images (van Hateren and van der Schaaf, 1998). We converted each natural image to a contrast scale, C(x)=(I(x)I0)/I0, where C(x) is the contrast at the spatial point x, I(x) is its intensity, and I0 is the average intensity across the image. Since we only consider horizontal motion, we emulated the spatial blurring of Drosophila's photoreceptors in the vertical dimension by filtering across rows with a Gaussian kernel (FWHM = 5.7°). We then took the central row of each filtered image to represent a one-dimension variant of the natural image, denoted c(x). We applied reflective boundary conditions to generate images that covered 360° and down-sampled each resulting image to 1° pixels by averaging. Photoreceptor blurring from signals in the horizontal dimension depends on the velocity of motion. In particular, we model the response of the ith photoreceptor as

Vi(t)=dtT(t)dxM(xxi)c(xv(tt)),

where T is a causal exponential kernel (timescale = 10 ms), M is a Gaussian kernel (FWHM = 5.7°), xi is the location of the ith photoreceptor, and ν is the velocity of motion.

Each naturalistic motion comprised a randomly selected one-dimensional natural image, an offset to set the initial location of the photoreceptors, and a velocity drawn from a zero-mean normal distribution with a standard deviation of 90°/s. In this manner, we simulated the responses of three horizontally adjacent photoreceptors (spaced by 5.1°) to 5 × 105 naturalistic motions (each with duration = 800 ms, time step = 5 ms). We then explicitly enforced left-right symmetry in the naturalistic ensemble by pairing each naturalistic motion with a new simulated motion, in which the natural image is reflected, the velocity is inverted, and the offset is chosen such that {V1(t),V2(t),V3(t)} in the new naturalistic motion is exactly {V3(t),V2(t),V1(t)} from its partner. The final symmetric ensemble thus consists of 106 naturalistic motions.

The HRC

The HRC applies two temporal filters to its photoreceptor inputs. We denote the kernels of the low-pass and high-pass filters as f and g, respectively, such that the output of a local HRC is

R(t)=(f*V1)(t)(g*V2)(t)(g*V1)(t)(f*V2)(t),

where * denotes convolution (Figure 1A). We consider the HRC's velocity estimate for a given naturalistic motion as its value at the final time point of the simulation. We model the filter kernels as

f(t)=tet/τ,t0

and

g(t)=df(t)dt,

where τ = 20 ms and g(t) is comparable to lamina monopolar cell responses (Clark et al., 2011; Behnia et al., 2014). We built the alternate motion estimators considered in this work from the same four filtered signals, {(fV1),(gV1),(fV2),(gV2)}, always considering the estimator's output at the final time point as its velocity estimate. Thus, none of our models modified the spatial or temporal processing of the HRC, reflecting our emphasis on how nonlinear processing might be tuned for naturalistic motion estimation. The global output of an array of HRCs would be obtained by pooling signals across space. Here we focus on spatiotemporally local strategies for motion estimation and at most pool motion signals across two neighboring motion detectors.

Relationship between the mean squared error and the correlation coefficient

We evaluate motion estimators by the mean squared error between their output and the true velocity. To minimize the mean squared error of the HRC, we scale its output by r(R)σν/σR, where r(R) is the correlation coefficient between the HRC's output and the velocity of motion, σν is the standard deviation of the velocity distribution, and σR is the standard deviation of the HRC's output. Once the HRC is scaled in this manner, its mean squared error is

ϵ=σv2(1(r(R))2).

More generally, this equation rewrites the mean squared error of any optimally scaled motion estimator in terms of its correlation coefficient with the velocity. All motion estimators considered in this paper are optimally scaled, and we find the correlation coefficient to be more intuitive than the mean squared error. We thus always report the performance of each motion estimator in terms of the correlation coefficient between the true and estimated velocity.

Model fitting procedure

We fit the linear weighting parameters in the models of Figures 1I, 3A, 4A,C,D to maximize the estimation accuracy over a simulated ensemble of naturalistic motions. The formulas provided in subsequent sections of the ‘Materials and methods’ will cast each motion estimation scheme as a linear combination of a variety of motion predictors,

ve=iwixi,

where the xi are nonlinear combinations of {(fV1),(gV1),(fV2),(gV2),(fV3),(gV3)} that depend on the model architecture, and the wi are associated weighting coefficients. We chose the weights to minimize the mean squared error between the true and predicted velocity, which is the standard scenario considered by ordinary least-squares regression. The same weights maximize the correlation coefficient between the true and predicted velocity, and we typically present model accuracies as correlation coefficients.

We used twofold cross-validation to protect against over-fitting. In particular, we randomly divided the ensemble of naturalistic motions into a training set of 500,000 symmetrically paired examples and a testing set of the remaining 500,000 examples. We determined the weighting coefficients by minimizing the empirical error over the training set, and we reported accuracies over the test set. To estimate error bars for each model's accuracy, we computed twenty random divisions of the naturalistic motion ensemble and calculated the standard deviation of the estimation accuracy.

Model responses to glider stimuli

We generated 25 random instantiations of each glider stimulus considered by our previous experimental work (Figure 1F, duration = 3 s, update rate = 40 Hz, pixel size = 5°) (Hu and Victor, 2010; Clark et al., 2014). We evaluated the response of each model to these stimuli by averaging the outputs of 60 identical local motion estimators (each separated by 5.1°) over the last two seconds of visual stimulation. Glider predictions were equal and opposite for the left and right variants of the stimuli, so we pooled leftward and rightward stimuli in all figures (Figure 1F shows the rightward variants). We scaled each model's output such that the average response to the positive 2-point glider was 1. All figures associated with glider responses show the mean and standard error of each model's response across the 25 glider instantiations.

Front-end nonlinearity model

The model in Figure 2A replaces the linear photoreceptor signals, V1 and V2, with nonlinear photoreceptor signals

yi=h(Vi),

where h is some nonlinear function. Thus, the motion estimate from the front-end nonlinearity model is

F=(f*y1)(g*y2)(g*y1)(f*y2).

To implement the contrast equalizing nonlinearity, we replaced values of Vi(t) by their rank-order (scaled and shifted to range between −1 and +1). Note that all Vi(t) were sorted together (i.e., including all spatial points, temporal points, and simulated naturalistic motions). When multiple Vi(t) had the same value, they were given the same rank. To implement binarizing nonlinearities, we again sorted the Vi(t) and found the values corresponding to the threshold locations. For example, to calculate the binarizing nonlinearity with two steps (Appendix 4): (i) we found the Vi(t) values corresponding to the 25th and 75th percentiles; (ii) signals below the 25th percentile or above the 75th percentiles were assigned the value of −1; and (iii) signals between 25th and 75th percentiles were assigned the value of +1. To implement the Gaussianizing nonlinearity, we again rank-ordered the Vi(t) (scaled to range between 0 and 1) and applied the inverse Gaussian cumulative distribution function to these ranks. The HRC is the special case of this model where the front-end nonlinearity is linear.

Weighted 4-quadrant model

The weighted 4-quadrant model in Figure 3A separately correlates bright and dark signals. Mathematically, it is

Q=a{+,}b{+,}wab(Q)Qab,

where wab(Q) are adjustable weights that parameterize the model,

Qab=[(f*V1)]a[(g*V2)]b[(g*V1)]b[(f*V2)]a,

[x]+ is equal to x when x is positive and zero otherwise, and [x] is equal to x when x is negative and zero otherwise. The HRC is the special case of this model where w++(Q)=w+(Q)=w+(Q)=w(Q).

Non-multiplicative nonlinearity model

The non-multiplicative nonlinearity model in Figure 4A replaces the HRC's multiplication step with a more flexible two-dimension nonlinearity. In particular, it is

N=η((f*V1),(g*V2))η((f*V2),(g*V1)),

where we approximate the nonlinearity, η, as a fourth-order polynomial

η(x,y)=i=04j=04iwij(N)xiyj,

and wij(N) are adjustable weights that parameterize the model. We include terms up to fourth order in this model to ensure that it is flexible enough to describe the published glider response data. In particular: (i) the second-order terms accommodate responses to 2-point glider stimuli; (ii) the third-order terms accommodate parity-inverting responses to 3-point glider stimuli; and (iii) the fourth-order terms enable the model to respond with unequal magnitude to positive and negative parity 3-point glider stimuli (Figure 4—figure supplement 1). Thus, this model has 14 parameters. The HRC is the special case of this model where only w11(N) is nonzero.

Unrestricted nonlinearity model

Here we model the 4-dimensional nonlinearity in Figure 4C as a fourth-order polynomial of the four filtered signals in the HRC. In general, this motion estimator is

S=i=04j=04ik=04ijl=04ijkwijkl(S)(f*V1)i(g*V1)j(f*V2)k(g*V2)l,

where wijkl(S) are adjustable weights that parameterize the model, and we set w0000(S)=0 because this term has no utility for naturalistic motion estimation. Thus, this model has 69 parameters. The HRC is the special case of this model where w1001(S)= w0110(S)0, and all other parameters are zero.

Extra input nonlinearity model

Here we model the 6-dimensional nonlinearity in Figure 4D as a fourth-order polynomial of the six filtered signals in two neighboring HRCs. In general, this motion estimator is

E=i=04j=04ik=04ijl=04ijkm=04ijkln=04ijklmwijklmn(E)(f*V1)i(g*V1)j×(f*V2)k(g*V2)l(f*V3)m(g*V3)n,

where wijklmn(E) are adjustable weights that parameterize the model, and we set w000000(E)=0 because this term has no utility for naturalistic motion estimation. Thus, this model has 209 parameters. The average of two neighboring HRCs is the special case of this model where w100100(E)= w011000(E)=w001001(E)=w000110(E)0, and all other parameters are zero.

Lasso regression for predictor selection

Lasso regression augments the squared error with an L1 penalty on nonzero weighting coefficients that favors sparse solutions (Tibshirani, 1996). We used lasso regression to identify subsets of predictors that might enable accurate motion estimation (Figure 5A). Once we identified a predictor subset using lasso regression, we refit the nonzero model weights using ordinary least squares regression (i.e., without the weight penalty).

Appendix 1

Visual signatures of motion.

The pattern of light that stimulates the retina encodes information about the relative motion between the retina and its visual environment. The manner in which this information is encoded depends on the geometry of the photoreceptor array, the statistics of self-motion, and the statistics of the visual environment. The principal goal of this paper is to illustrate several ways that the brain's nonlinear processing of visual motion signals might be tuned to reflect specific features of the natural visual environment. We thus begin by enumerating some computational signatures of visual motion in natural environments, thereby exposing a diversity of stimulus features that visual system nonlinearities might aim to extract.

In the real world, animals encounter visual environments that are intricately structured and far from random (Appendix figure 1A) (Ruderman and Bialek, 1994; van Hateren and van der Schaaf, 1998; Geisler, 2008). When an animal rotates with constant angular velocity through the environment, the spatiotemporal response profile of the photoreceptor array encodes the velocity of self-motion through the slope of oriented streaks in space-time (front face, Appendix figure 1B) (Adelson and Bergen, 1985). Thus, a visual system with a dense array of noiseless photoreceptors could extract the angular velocity of an arbitrary image by computing the ratio of temporal and spatial derivatives (Potters and Bialek, 1994). The statistics of the image ensemble become relevant once multiple interpretations of the sensory world become logically consistent with the photoreceptor data. In particular, the optimal motion estimator depends on the statistics of the image ensemble when photoreceptors have noise (Potters and Bialek, 1994; Fitzgerald et al., 2011), and a nonzero spacing between photoreceptors introduces ambiguity via aliasing (Potters and Bialek, 1994). In these cases, the animal can use prior information regarding the sensory environment and its motion in order to weigh the plausibility of each sensory interpretation.

Appendix figure 1
Motion transforms spatial correlations into temporal correlations.

(A) An example natural image (van Hateren and van der Schaaf, 1998). (B) When a natural image (top face) moves to the right, streaks in space-time (front face) indicate the direction and speed of the motion. Alternatively, motion influences the temporal correlation structure of visual signals (side face). (C) Second-order correlation function between pairs of spatially separated contrast signals (across the natural image ensemble [van Hateren and van der Schaaf, 1998]). (D) For constant velocity motion, the temporal correlation function between a pair of spatially separated points is shifted and stretched relative to the spatial correlation function. We separated the two points by Drosophila's photoreceptor spacing (5.1°). (E) Example third-order spatial correlation function involving two points in space. (F) As with pairwise correlations, higher-order temporal correlations between spatially separated visual signals are shifted and stretched (relative to higher-order spatial correlation functions) in a manner that indicates the speed and direction of motion.

https://doi.org/10.7554/eLife.09123.015

Full field motion transforms spatial features (top face, Appendix figure 1B) into temporal features (side face, Appendix figure 1B) in a manner that depends upon the velocity of motion. Consequently, one can also think about the visual signatures of motion in terms of spatiotemporal correlations between photoreceptors. The luminance contrast encoded by the ith photoreceptor is Ci(t)=(Ii(t)I0)/I0, where Ii(t) is the luminance intensity seen by the ith photoreceptor at time t and I0 is the average luminance intensity over the visual field. Thus, the average contrast is zero, and the simplest correlation function corresponds to the product of two spatially separated contrast signals. Measured over an ensemble of natural images, this 2-point correlation function had a global maximum at zero spatial offset (Appendix figure 1C). Consequently, the velocity of motion is encoded by the peak of the temporal cross-correlation function between two neighboring photoreceptors, which occurs at the temporal offset that equals the photoreceptor spacing (5.1° for Drosophila) divided by the velocity of motion (Appendix figure 1D). Natural images also contain many higher-order correlations (Ruderman and Bialek, 1994; Geisler, 2008). For instance, the nonzero skewness of natural images implies that the third-order correlation that multiplies the contrast at one point with the squared contrast at a neighboring point also has a peak at zero spatial offset (Appendix figure 1E). Correspondingly, the peak of the temporal 3-point correlation function between neighboring photoreceptors encodes the velocity of motion (Appendix figure 1F). This argument generalizes to nth-order correlation functions when the ensemble of natural images has a nonzero nth moment. Note that this argument does not necessarily imply that a motion estimator would benefit from the incorporation of all nonzero correlation functions, because the velocity signals provided by one correlation function could be redundant with those provided by others.

Importantly, photoreceptor correlation functions also encode velocity information away from their peaks. For example, the velocity of motion influences the widths of the temporal cross-correlation functions between pairs of photoreceptors (Appendix figure 1D,F). To see this, note that the values of the temporal correlation functions at zero temporal offsets are velocity independent, whereas the peak locations are closer to zero for larger speeds (Appendix figure 1D,F). This implies a more rapid falloff for higher speeds. This fundamental effect occurs because nearby points are more correlated in natural environments and photoreceptors rapidly survey distant points when the speed of motion is high.

The description above illustrates how visual motion becomes encoded in photoreceptor correlations. A central goal of research in visual motion estimation is to understand how neural circuits invert (or decode) that encoding of velocity. Just as a broad class of functions can be represented as a power series, a broad class of motion estimators can be represented as a Volterra series (Poggio and Reichardt, 1980; Fitzgerald et al., 2011). Each term in the Volterra series can be interpreted as a multipoint correlator that decodes velocity information from a specific correlation function (Fitzgerald et al., 2011). For example, the HRC and the motion energy model are 2-point correlators that decode velocity from 2-point correlations, whereas the Bayes optimal motion estimator capitalizes on a wider variety of correlation functions (Potters and Bialek, 1994; Fitzgerald et al., 2011). Because multipoint correlators relate intuitively to measurable properties of the image ensemble, we will find that decomposing a motion estimator in terms of multipoint correlators is often illuminating. Moreover, we will use multipoint correlators as a common basis to compare the computations performed by mechanistically distinct models.

Appendix 2

Accuracy of 2-point correlators.

In this section we derive an expression for the accuracy of a general 2-point correlator in terms of the statistics of naturalistic motion.

We consider a general 2-point correlator that temporally correlates visual signals from the spatial points i and j. Mathematically, this estimator has the form

(1) ve(2)(t)=dt1dt2ki,j(2)(t1,t2)Vi(tt1)Vj(tt2),

where the 2-point kernel, ki,j(2)(t1,t2), defines the correlator by specifying how each 2-point correlation contributes to the motion estimate. We model the response of the ith photoreceptor as

(2) Vi(t)=dτT(τ)dθM(θθi)c(θ0tτdtv(t)),

where T is a temporal integration kernel, M is the photoreceptor's spatial acceptance profile, θi is the location of the ith photoreceptor, c(θ) is the spatial contrast pattern of the visual world, and ν(t) is the time-dependent velocity. This formula simplifies to the formula in the ‘Materials and methods’ when ν(t) is time-independent. If T is an invertible linear filter, then a more convenient representation of the photoreceptor signals is

(3) Ui(t)=C(θi0tdtv(t)),

where Ui=T1Vi, C=M*c, and * is the convolution operator (Potters and Bialek, 1994). We can rewrite the 2-point correlator in this representation as

(4) ve(2)(t)=dt1dt2κi,j(2)(t1,t2)Ui(tt1)Uj(tt2),

where

(5) κi,j(2)(t1,t2)dt3T(t3)dt4T(t4)ki,j(2)(t1t3,t2t4)

is the 2-point kernel that converts correlations in the U variables to a velocity estimate.

Recall that we quantify the performance of visual motion estimators based on the mean squared error between the true and estimated velocities

(6) ϵ(ve(2)(t)v(t))2=σv22v(t)ve(2)(t)+(ve(2)(t))2,

where σν = 90°/s is the standard deviation of the velocity distribution. For estimators that are scaled to minimize their mean squared error (‘Materials and methods’), this formula can be rewritten as

(7) ϵ=σv2(1r2),

where

(8) r=v(t)ve(2)(t)(v(t))2(ve(2)(t))2=v(t)ve(2)(t)σv(ve(2)(t))2

is the correlation coefficient between the estimated and true velocities. Thus, minimizing the mean squared error is mathematically equivalent to maximizing the correlation coefficient if all motion estimators are correctly scaled. We find the correlation coefficient to be a more intuitive error metric than the mean squared error, so many of our results will be presented in terms of correlation coefficients.

The numerator of the correlation coefficient is determined by the second-order statistics of the image ensemble,

(9) v(t)ve(2)(t)=dt1dt2κi,j(2)(t1,t2)v(t)Ui(tt1)Uj(tt2),=dt1dt2κi,j(2)(t1,t2)v(t)C(2)(Δij+t1t2dtv(t))v,

where Δij is the angular separation between the ith and jth photoreceptors, and

(10) C(2)(Δ)C(x)C(x+Δ)C

is the 2-point correlation function over the ensemble of spatially filtered natural scenes. Note that C(2) is independent of x because reasonable image ensembles are translationally invariant. Also note that the 2-point correlation function of filtered natural images is related to the correlation function of unfiltered images by

(11) C(2)(Δ)=dxM(x)dxM(x)c(xx)c(x+Δx)=((M*M)*C(2))(Δ),

where C(2)(Δ) is the correlation function of unfiltered images, and we've assumed that M is a symmetric function. We model M as Gaussian with FWHM of 5.7°, so M*M is also Gaussian with FWHM of 2×5.7=8.1.

On the other hand, the denominator of the correlation coefficient is determined by fourth-order statistics of the image ensemble,

(12) (ve(2)(t))2=dt1dt2dt3dt4κi,j(2)(t1,t2)κi,j(2)(t3,t4)×Ui(tt1)Uj(tt2)Ui(tt3)Uj(tt4)=dt1dt2dt3dt4κi,j(2)(t1,t2)κi,j(2)(t3,t4)×C(4)(Δij+t1t2dtv(t),t1t3dtv(t),Δij+t1t4dtv(t))v,

where

(13) C(4)(Δ1,Δ2,Δ3)=C(x)C(x+Δ1)C(x+Δ2)C(x+Δ3)C

is the 4-point correlation function of the ensemble of filtered natural images. Notice that the second argument of C(4) in Equation 12 lacks the additive factor of Δij because Ui(tt1) and Ui(tt3) correspond to the same point in space. As above, C(4) is related to the unfiltered 4-point correlation function through a fourfold application of the photoreceptor spatial acceptance filter.

The preceding analysis shows that only the second-order and fourth-order statistics of the natural image ensemble contribute to the correlation coefficient between an arbitrary 2-point correlator and the true velocity. The same quantities also determine the mean squared error. Thus, the second-order and fourth-order statistics of the image ensembles are the critical determinants of a 2-point correlator's motion estimation accuracy. Note that both the HRC and the motion energy model fall into this important class of visual motion estimators, so our analysis is also important for understanding visual motion estimation by vertebrates.

Appendix 3

Motion estimation without spatial correlations—the role of kurtosis on the accuracy of 2-point correlators.

In this section, we apply the results of Appendix 2 to the special case of normally distributed velocities and spatially uncorrelated image ensembles. This calculation reveals an important role for kurtosis in motion estimation, and we discuss how nonlinearities in the early visual system could cope with highly kurtotic naturalistic inputs.

In this section, we assume that the velocity is time-independent (i.e., ν(t)=ν) and normally distributed

(14) Pv(v)=12πσv2ev2/(2σv2).

We also assume that the image ensemble is spatially uncorrelated. By this, we mean that the luminance contrast at each point in space is statistically independent of the luminance contrast at all other points in space. Thus, the second-order correlation function is

(15) C(2)(Δ)=σC2δ(Δ),

where σC is the standard deviation of the luminance contrast, and δ(Δ) is the Dirac delta-function. The fourth-order correlation function is

(16) C(4)(Δ1,Δ2,Δ3)=κ4σC4δ(Δ1)δ(Δ2)δ(Δ3)+σC4(δ(Δ1)δ(Δ2Δ3)+δ(Δ2)δ(Δ1Δ3)+δ(Δ3)δ(Δ1Δ2)),

where k4 is the excess kurtosis of the contrast distribution. The excess kurtosis is zero for normally distributed contrasts. It can either be positive or negative for other contrast distributions. Note that we define the kurtosis of a probability distribution to be its fourth central moment normalized by the square of its second central moment. Thus, the kurtosis of a normal distribution is 3. We caution readers that some other sources use ‘kurtosis’ to refer to the excess kurtosis.

With these assumptions, the signal term represented by Equation 9 is

(17) v(t)ve(2)(t)=σC2Δij2πσv2dt1dt2κi,j(2)(t1,t2)eΔij2/(2σv2(t2t1)2)(t2t1)|t2t1|,

and the noise term represented by Equation 12 is

(18) (ve(2)(t))2=σC4Δij2πσv2dt1dt2dt3dt4κi,j(2)(t1,t2)κi,j(2)(t3,t4)×(eΔij2/(2σv2(t1t4)2)δ((t1t4)(t3t2))+eΔij2/(2σv2(t1t2)2)δ((t1t2)(t3t4))+κ4|t1t2|ΔijeΔij2/(2σv2(t1t2)2)δ(t3t1)δ(t4t2)),

where we've assumed that the 2-point correlator is mirror anti-symmetric,

(19) κi,j(2)(t1,t2)=κi,j(2)(t2,t1),

in order to ignore contributions from static signals. This mirror-symmetry assumption holds for the HRC and the motion energy model. Since the denominator of the correlation coefficient is set by (ve(2)(t))2, both the signal and the noise are proportional to σC2. Thus, the only remaining dependence on the image ensemble is through the excess kurtosis. Note that

(20) d(ve(2)(t))2dκ4=σC4Δij22πσv2dt1dt2(κij(2)(t1,t2))2|t1t2|eΔij2/(2σv2(t1t2)2)>0.

Thus, the correlation coefficient is maximized by making k4 as small as possible.

In conclusion, if the image ensemble is spatially uncorrelated (at second and fourth-order), then the image ensemble only affects the correlation coefficient between the velocity and a 2-point correlator through its kurtosis. The best accuracy is achieved when the kurtosis is minimized. In reality, the assumption that the image ensemble is spatially uncorrelated is clearly wrong. Natural images are strongly correlated, and even if they weren't, they'd become correlated once they are filtered by the photoreceptors' spatial acceptance filter. Nevertheless, Figure 2E empirically shows that introducing several front-end nonlinearities that decrease the kurtosis also improve the accuracy of naturalistic motion estimation. Thus, kurtosis provides a useful guide for the design of neuronal nonlinearities. On the other hand, Figure 2D,E demonstrate that it's too simplistic to assume that the kurtosis is the only relevant factor for the accuracy of a 2-point correlator. As we'll discuss in the next section, spatial correlations in the image ensemble also affect the accuracy of 2-point correlators.

Appendix 4

The HRC benefits from spatially correlated input signals.

When we applied a contrast-equalizing or binarizing nonlinearity to naturalistic inputs before evaluating the HRC, we found that both nonlinearities substantially improved the accuracy of the HRC (Figure 2E). Interestingly, contrast equalization improved the accuracy of the HRC more than binarization (Figure 2E), even though it produced outputs with greater kurtosis. The reason for this is that natural images are correlated (Appendix figure 1), and the accuracy of the HRC over a general image ensemble depends on the ensemble's spatial correlation structure (Appendix 2). Binarization attenuated spatial correlations more strongly than contrast equalization over the natural image ensemble (Figure 2—figure supplement 1), which leads us to hypothesize that correlations present in the natural image ensemble might benefit the HRC's performance. In Appendix 5 we will provide theoretical support for this idea. Here we begin with a less mathematical argument that also supports our hypothesis.

A comparison between the estimation performance of binarizing and equalizing front-end nonlinearities was complicated by the fact that the models produced outputs that differed in both their point statistics and their correlation structures. To gain more direct insight into how spatial correlations affect motion estimation performance, it would be helpful to compare front-end nonlinearity models that differ only through their output correlation structures. We implemented this comparison using a family of binarizing front-end nonlinearities that undergo multiple steps between +1 and −1 (Appendix figure 2A). Although these nonlinearities are not physiologically realistic, they are conceptually useful because they each produced a stimulus ensemble that minimized the kurtosis yet achieved distinct correlation structures (Appendix figure 2B). These nonlinearities thus allow us to assess directly whether spatial decorrelation of inputs degrades the motion estimation performance of the HRC. We found that each binarizing front-end nonlinearity model outperformed the original HRC (Appendix figure 2C). However, we found that the magnitude of the improvement decreased with the number of steps (Appendix figure 2C). Since spatial cross-correlations also decreased as a function of the number of steps (Appendix figure 2B), these results support our hypothesis that the correlations present in natural visual inputs aid the functionality of the standard HRC.

Appendix figure 2
Correlations in binarized natural images.

(A) We transformed each image in the van Hateren natural image database (van Hateren and van der Schaaf, 1998) with several binarizing nonlinearities. To implement the simplest binarizing nonlinearity, we set all pixels to +1 or −1 depending on whether that pixel exceeded or fell below the median intensity in the image. For the nonlinearity with two steps, the thresholds were at the 25th and 75th intensity percentiles. For the nonlinearity with three steps, the thresholds were at the 25th, 50th, and 75th intensity percentiles. When a pixel intensity exactly equaled a threshold, we considered its value below threshold. Binary nonlinearities with a larger number of steps produced grainier images that indicate a spatial decorrelation of the transformed image. (B) We computed second-order spatial correlation functions across the nonlinearly transformed natural image ensemble. This confirmed that each step in the binarizing nonlinearity further decorrelated the image ensemble. (C) In addition to decreasing the spatial extent of correlations, a larger number of transitions also degraded the performance of the front-end nonlinearity model.

https://doi.org/10.7554/eLife.09123.016

The HRC correlates two signals that are offset in space and differentially delayed in time. One intuition that researchers often apply to this computation is that the correlation operation effectively detects times when two signals that are offset in space and time are equal. However, a motion estimator that strictly obeyed this intuition would be agnostic to the spatial correlation structures present in the input signals, and our results show that the HRC is not (see also Appendix 5). Instead, the HRC also generates motion signals when its two input channels are imperfectly aligned, and these signals depend strongly on the correlation structure of the inputs (Appendix figure 1D). Our results thus show that the HRC's ability to detect imperfect coincidences contributes significantly to its performance as a motion estimator, as was suggested intuitively in Appendix 1.

Appendix 5

Motion estimation with Gaussian image statistics—the role of spatial correlations on the accuracy of 2-point correlators.

In this section, we apply the results of Appendix 2 to the special case of normally distributed velocities and normally distributed image ensembles. This model formalizes how spatial correlations in the natural world affect the accuracy of motion estimation by 2-point correlators and shows how spatial decorrelation can adversely affect estimation accuracy. For example, we'll show that the simplest HRC is unable to extract motion signals from high frequency components of the image ensemble, yet those components still lead to variability in the motion estimator. Thus, this HRC works best when the image ensemble is correlated in a manner that avoids high-frequency components in the signal, and spatial low-pass filtering at the photoreceptor level can help to eliminate the high-frequency image components that hurt the HRC's accuracy.

Here we use the same velocity distribution that we used in Appendix 3 (i.e., Equation 14). However, we now allow the two point correlation function to have arbitrary structure

(21) C(2)(Δ)=k=0Skcos(kΔ),

where Sk are the Fourier coefficients for C(2)(Δ), and we have noted that the image ensemble is 2π-periodic. Note that Sk is called the power spectrum of the image ensemble, and uncorrelated ensembles correspond to the special case where Sk = constant. With these assumptions

(22) v(t)C(2)(Δij+t1t2dtv(t))v=k=0Skvcos(k(Δi,j+v(t2t1)))v.

By evaluating the integral, we find that this velocity expectation is

(23) vcos(k(Δi,j+v(t2t1)))v=k(t1t2)σv2sin(kΔij)e12k2(t2t1)2σv2.

Thus, if we define

(24) γk=kσv2sin(kΔij)dt1dt2κij(2)(t1,t2)(t1t2)e12k2(t2t1)2σv2,

then

(25) v(t)ve(2)(t)=k=0γkSk.

Each frequency component of the image ensemble linearly contributes to the correlation between the 2-point correlator's response and the velocity. The weight of each frequency component is determined by the structure of the 2-point correlator and the width of the velocity distribution.

We compute the fourth-order moment of the image ensemble using Wick's theorem for Gaussian moments, which says

(26) C(x1)C(x2)C(x3)C(x4)=C(x1)C(x2)C(x3)C(x4)+C(x1)C(x3)C(x2)C(x4)+C(x1)C(x4)C(x2)C(x3).

This immediately implies that

(27) C(4)(Δ1,Δ2,Δ3)=C(2)(Δ1)C(2)(Δ3Δ2)+C(2)(Δ2)C(2)(Δ3Δ1)+C(2)(Δ3)C(2)(Δ2Δ1).

Once again, it's convenient to rewrite this expression in the Fourier domain

(28) C(4)(Δ1,Δ2,Δ3)=k1=0k2=0Sk1Sk2(cos(k1Δ1)cos(k2(Δ3Δ2))+cos(k1Δ2)cos(k2(Δ3Δ1))+cos(k1Δ3)cos(k2(Δ2Δ1))).

With these assumptions

(29) C(4)(Δij+t1t2dtv(t),t1t3dtv(t),Δij+t1t4dtv(t))v=k1=0k2=0Sk1Sk2cos(k1(Δij+v(t2t1)))cos(k2(Δij+v(t4t3)))+cos(k1v(t3t1))cos(k2v(t4t2))+cos(k1(Δij+v(t4t1)))cos(k2(Δij+v(t2t3)))v.

We evaluate the expectations over velocity by noting that each has the form

(30) cos(k1(Δ+vδ1))cos(k2(Δ+vδ2))v=12e12(k1δ1+k2δ2)2σv2(cos(Δ(k1+k2))+e2k1k2δ1δ2σv2cos(Δ(k1k2)))

for some spatial offset Δ and temporal offsets {δ1,δ2}. Thus, if we define

(31) Γk1k2=dt1dt2κij(2)(t1,t2)dt3dt4κij(2)(t3,t4)(12e12(k1(t2t1)+k2(t4t3))2σv2(cos(Δij(k1+k2))+e2k1k2(t2t1)(t4t3)σv2cos(Δij(k1k2)))+12e12(k1(t4t1)+k2(t2t3))2σv2(cos(Δij(k1+k2))+e2k1k2(t4t1)(t2t3)σv2cos(Δij(k1k2)))+12e12(k1(t3t1)+k2(t4t2))2σv2(1+e2k1k2(t3t1)(t4t2)σv2)),

then

(32) (ve(2)(t))2=k1=0k2=0Γk1k2Sk1Sk2.

Power spectrum components contribute to the 2-point correlator's variance quadratically.

Putting these pieces together, the expected squared error achieved by a 2-point correlator is a quadratic function of the power spectrum

(33) ϵ=σv22k=0γkSk+k1=0k2=0Γk1k2Sk1Sk2.

We're interested to know whether spatial correlations can enhance the accuracy of the 2-point correlator. This will be the case unless a uniform power spectrum minimizes ϵ. Note that every physically meaningful power spectrum is non-negative

(34) Sk0.

Thus, the minimum of ϵ either occurs at an extremum point or on the boundary of admissible solutions. If the minimum occurs on the boundary, then a subset of the Sk are exactly equal to zero. In particular, the power spectrum would not be constant, which implies that the image ensemble would be spatially correlated. At an extremum point, we must find

(35) 0=ϵSk=2γk+2k=0ΓkkSk

for every k. A uniform power spectrum can only satisfy this condition if

(36) γk=βk=0Γkk,

where β>0 is the (constant) value of each power spectrum component. This is generally not the case, so correlations exist that would help typical 2-point correlators.

For example, the simplest HRC, which replaces the low-pass and high-pass filters with pure time delays, is

(37) R~=A(U1(tτ)U2(t)U1(t)U2(tτ)),

where A is a constant with units of °/s. For this model,

(38) κ1,2(2)(t1,t2)=A(δ(t1τ)δ(t2)δ(t1)δ(t2τ)).

Substituting this expression into the above formulas, we find

(39) γk=2Akτσv2sin(kΔ0)e12k2τ2σv2

and

(40) Γk1k2=A2(3sin(k1Δ0)sin(k2Δ0)(e12(k1k2)2τ2σv2e12(k1+k2)2τ2σv2)+(1cos(k1Δ0)cos(k2Δ0))(2e12(k1k2)2τ2σv2e12(k1+k2)2τ2σv2)),

where Δ0 is the spacing between adjacent photoreceptors. Note that

(41) limkγk=0.

On the other hand,

(42) limk2k1Γk1k2=2A2(1cos(k1Δ0)cos(k2Δ0)).

This does not approach zero, even for large values of k1. Therefore, k=0Γkk diverges and γkβk=0Γkk. In this model, high frequency components lack signal but contribute noise. It's helpful if these frequency components are absent from the image ensemble. Future work should more fully investigate the role of spatial correlations in naturalistic motion estimation.

Appendix 6

Front-end nonlinearities give the HRC access to higher-order correlations.

The response of the front-end nonlinearity model to a 3-point glider stimulus is determined by the higher-order correlations that it detects in the stimulus. Furthermore, we argued in Appendix 1 and Figure 1I that higher-order correlations can contribute to the accuracy of visual motion estimators. We now describe how front-end nonlinearities provide pair-correlation mechanisms with access to certain types of higher-order correlations.

We suppose that the front-end nonlinearity, denoted h, has a power series expansion:

(43) h(x)=n=0hnxn.

Then the cross-correlation function between two non-linearly transformed input streams, denoted y1 and y2, is

(44) y1(t)y2(t+τ)=h(V1(t))h(V2(t+τ))=n,m=0hnhmV1n(t)V2m(t+τ),

where V1 and V2 are linear photoreceptor signals. This substitution explicitly demonstrates that the front-end nonlinear transformation enables pair correlation mechanisms to incorporate higher-order correlations of the form V1n(t)V2m(t+τ). The choice of nonlinearity specifies the expansion coefficients, hn, which in turn determines the pattern of higher-order correlations that the pair correlator incorporates into its velocity estimate. For example, sensitivity to odd-ordered correlations demands that hn be large for some even values of n. These expansion coefficients would manifest themselves in the structure of the front-end nonlinearity as asymmetries between positive and negative contrasts, but strong asymmetries were not needed to eliminate kurtosis in natural image ensembles (Figure 2C). Inversely, one could use this equation to determine whether a set of expansion coefficients exist that would implement a desired series of multipoint correlators. The preceding argument implies that strongly asymmetric front-end nonlinearities would be needed to account for the 3-point glider responses.

Appendix 7

Expansion of the weighted 4-quadrant model.

In this Appendix, we rewrite the weighted 4-quadrant model in a basis that isolates its dependence on 2-point correlations, on higher-even-ordered correlations, and on two types of odd-ordered correlations. In Appendix 8, we'll discuss the motion estimation performance of the weighted 4-quadrant model in this basis in order to gain insight into why performance-optimized weighted 4-quadrant models also predict 3-point glider responses that resemble Drosophila behavior.

The weighted 4-quadrant model supposes that the input signals are segregated into four separate streams:

(45) Q++=[f*V1]+[g*V2]+[g*V1]+[f*V2]+Q+=[f*V1]+[g*V2][g*V1][f*V2]+Q+=[f*V1][g*V2]+[g*V1]+[f*V2]Q=[f*V1][g*V2][g*V1][f*V2]

where Qab denotes the (ab) quadrant for a,b{+,}, [x]+ is x for x>0 and is zero otherwise, and [x] is x for x<0 and is zero otherwise. The HRC is equal to

(46) R=Q+++Q++Q++Q.

More generally, we suppose that Drosophila could estimate motion as any linear combination of these signals, and we define the weighted 4-quadrant model as

(47) Q=w++(Q)Q+++w+(Q)Q++w+(Q)Q++w(Q)Q,

where w++(Q), w+(Q), w+(Q), and w(Q) are linear weighting coefficients that specify the computation performed by the model. Since this section, and the next two, focus entirely on the weighted 4-quadrant model, we simplify notation by dropping the superscript (Q).

The weighted 4-quadrant model can be rewritten in an alternate form that facilitates an understanding of how various correlation types contribute to its motion estimates. We begin by noting that

(48) [x]+=x2(1+sgn(x)),
(49) [x]=x2(1sgn(x)),

where sgn(x) is +1 for positive arguments and −1 for negative arguments. We thus see that

(50) Qab=[f*V1]a[g*V2]b[g*V1]b[f*V2]a=(f*V1)(g*V2)4(1+asgn(f*V1)+bsgn(g*V2)+absgn(f*V1)sgn(g*V2))(g*V1)(f*V2)4(1+bsgn(g*V1)+asgn(f*V2)+absgn(g*V1)sgn(f*V2)).

Therefore, the complete weighted 4-quadrant model is

(51) Q=w+++w++w++w4((f*V1)(g*V2)(g*V1)(f*V2))+w+++w+w+w4((f*V1)sgn(f*V1)(g*V2)(g*V1)(f*V2)sgn(f*V2))+w++w++w+w4((f*V1)(g*V2)sgn(g*V2)(g*V1)sgn(g*V1)(f*V2))+w++w+w++w4((f*V1)sgn(f*V1)(g*V2)sgn(g*V2)(g*V1)sgn(g*V1)(f*V2)sgn(f*V2))

This expression for the weighted 4-quadrant model groups the four weighting coefficients into four alternate terms. The first term is proportional to a standard HRC, which computes second-order correlations. We denote its associated coefficient as

(52) weven=2=14(w+++w++w++w).

The second and third terms invert sign and retain magnitude under contrast inversion. Therefore, they only compute odd-ordered correlations:

(53) wodd=14(w+++w+w+w),
(54) wodd*=14(w++w++w+w).

The fourth term is unaffected by contrast inversion. Thus, it only computes even-ordered correlations. We'll soon see that the lowest-order contribution from this term is fourth-order, so we denote its coefficient as

(55) weven>2=14(w++w+w++w).

These four coefficients define the correlational basis considered in Figure 3—figure supplement 1. For example, note that Figure 3—figure supplement 1A shows the transformation defined by Equations 5255.

Because sgn(x) is a non-analytic function, it is still somewhat opaque how the weighted 4-quadrant model relates to specific higher-order correlations in the visual stimulus. We thus rewrite sgn(x) as the limit of an analytic function:

(56) sgn(x)=limβerf(βx),

where

(57) erf(x)=2π0xdyey2

is the Gauss error function. The Gauss error function is entire, which means that it has a power series expansion for any value x. Also note that since real biological nonlinearities are not infinitely sharp, a more realistic weighted 4-quadrant model would fix β at a finite value. We thus consider the follow approximation,

(58) sgn(x)2πn=0(1)n(βx)2n+1n!(2n+1)=2π(βx(βx)33+O((βx)5)),

where β(0,). Although high-order terms might not be negligible in this expansion, the contributions of low-order correlations to visual motion estimation are set by low-order terms. In particular, the contributions of second, third, and fourth-order correlations to the weighted four quadrant model are determined by the leading terms in the expansion,

(59) F=weven=2((f*V1)(g*V2)(g*V1)(f*V2))+wodd2βπ((f*V1)2(g*V2)(g*V1)(f*V2)2)+wodd*2βπ((f*V1)(g*V2)2(g*V1)2(f*V2))+weven>24β2π((f*V1)2(g*V2)2(g*V1)2(f*V2)2)+O(β3V5).

Thus, the third-order term associated with wodd squares the low-pass filtered signal and might help to account for light–dark asymmetries in the low-pass filtered signal. The third-order term associated with wodd* squares the high-pass filtered signal. Finally, note that this formula confirms that the lowest-order term associated with weven>2 is fourth-order.

Appendix 8

The weighted 4-quadrant model improves motion estimation with odd-ordered correlations.

In the main text we quantitatively characterized the weighted 4-quadrant model by discussing its accuracy given various subsets of the four quadrants (Figure 3C). Here we consider the performance of the weighted 4-quadrant model in the correlational basis defined in Appendix 7 and Figure 3—figure supplement 1A. These results lead to a simple interpretation of the computation performed by performance optimized weighted 4-quadrant models.

Models that oriented all of their weight along the even = 2 axis outperformed models that focused their weight along any other correlational axis (Figure 3—figure supplement 1B). This reinforces the foremost importance of second-order correlations for motion estimation. In isolation, odd-ordered correlations were weaker predictors of motion than second-order correlations (Figure 3—figure supplement 1B). Nevertheless, the odd class well complemented the HRC, and the full accuracy of the weighted 4-quadrant model was obtained by linearly combining the even = 2 and odd correlation classes (best 2 bar, Figure 3—figure supplement 1B). This result suggests that the weighted 4-quadrant model has two relevant dimensions. In particular, accurate models combine an HRC with odd-ordered correlations that account for statistical light–dark asymmetries in the HRC's low-pass filtered branch.

Since the weighted 4-quadrant model only has four parameters, it's possible to exhaustively study its parameter dependence. We have in mind models that are correctly scaled, in which case the mean squared error is determined by the correlation coefficient (Appendix 2). Since the value of the correlation coefficient is unchanged when all four weighting coefficients are scaled by the same positive factor, it suffices to consider weighting coefficients drawn from the 3-sphere, such that w++2+w+2+w+2+w2=1. Because the 3-sphere has a finite volume, we were able to densely sample the correlation coefficient for all parameter values (Appendix figure 3). This function has one global maximum, corresponding to the optimal weight vector discussed in the main text. Its global minimum occurs on the polar opposite side of the 3-sphere, where the weighted 4-quadrant model is most strongly anti-correlated with the velocity. More generally, correlation coefficients corresponding to model parameters on opposite poles of the 3-sphere always have the same magnitude and opposite sign. Both models explain the same amount of variance about the velocity, and they become equivalent after they're correctly scaled. Thus, we henceforth focus our discussion on the hemisphere where the correlation coefficient was positive.

Appendix figure 3
Accuracy of the weighted 4-quadrant model across model parameters.

(A, B) We computed the correlation coefficient between the velocity and the response of the weighted 4-quadrant model for all possible sets of model parameters. Since rescaling the weight vector does not affect the correlation coefficient, we assumed that all model parameters satisfy a,b{+,}(wab(Q))2=1. We color-coded each set of model parameters by its accuracy and projected the parameter space onto various subspaces. (A) We first examined the quadrant basis by projecting onto the {(− −), (− +)} (left) and {(+ −), (+ +)} (right) subspaces. (B) We next examined the correlational basis by projecting onto the {even = 2, odd} (left) and {odd*, even >2} (right) subspaces. These project into different linear combinations of the original quadrant weightings. One of the projections is the pure HRC (even = 2), while the other projections contain only odd correlations, of two different types (odd and odd*), or only even correlations of order greater than 2 (even >2). These projections show that accurate weighted 4-quadrant models always put positive weight into 2-point correlations and negative weight into odd-ordered correlations. Note that the glider responses predicted by the weighted 4-quadrant model mirror this pattern (Figure 3D).

https://doi.org/10.7554/eLife.09123.017

Weighted 4-quadrant models were most accurate when w−+ and w−− were large (Appendix figure 3A, left) and w++ and w−+ were small (Appendix figure 3A, right). In the correlational basis, the HRC is the model with maximum weight in weven=2 and with zero weight in wodd, wodd*, and weven>2. Thus, this basis makes it easy to compare the accuracy of the HRC to other weighted 4-quadrant models (Appendix figure 3B). Furthermore, this basis clearly sorts the weighted 4-quadrant models according to their accuracy and confirms that that the accuracy of a weighted 4-quadrant model is largely determined by weven=2 and wodd (Appendix figure 3B, left). Higher even-ordered correlations and odd-ordered correlations that account for light–dark asymmetries in the high-pass filtered visual signals did not contribute prominently to the accuracy of the weighted 4-quadrant model (Appendix figure 3B, right). Interestingly, Appendix figure 3A shows that there is a diversity of ways to combine the four quadrants in order to improve the accuracy of the HRC, which translates into a diversity of correlational responses (Appendix figure 3B). Similarly, the HRC is only one of many models that achieve a comparable level of accuracy. Every other motion estimator that achieves the HRC's performance level incorporates higher-order correlations into its estimate.

Appendix 9

The weighted 4-quadrant model in the basis of PCs.

PCA is a popular method to reduce the dimensionality of neural population recordings. In this section, we conceptualize the four quadrants as a small neural population and study how each PC accounts for variance in the system and contributes to motion estimation. We show that most of the weighted 4-quadrant model's variance is due to two of the four PCs. Interestingly, most of this variance is not velocity-related, and we show that the two low-variance PCs are the ones that dominate motion estimation.

We began by directly applying PCA to the weighted 4-quadrant model. We computed the 4 × 4 covariance matrix of the four quadrants over the ensemble of simulated motions (Appendix figure 4A). The eigenvectors of the covariance matrix are called the PCs (Appendix figure 4B), and the associated eigenvalues specify the amount of variance accounted for by each PC (Appendix figure 4C). We found that the first two PCs accounted for 86.3% of the variance, whereas the third and fourth PCs each contributed about 7% of the variance (Appendix figure 4C). The high-variance eigenvectors roughly corresponded to a sum and a difference of the (+ +) and (+ −) quadrants, whereas the low-variance PCs roughly corresponded to a sum and a difference of the (− +) and (− −) quadrants (Appendix figure 4B). The (− −) and (− +) quadrants best facilitated motion estimation (Figure 3C). Thus, the low-variance PCs were most important for motion estimation.

Appendix figure 4
The weighted 4-quadrant model in the basis of principal components (PCs).

(A) We computed the covariance matrix of quadrant responses across the simulated ensemble of naturalistic motions. (B) The eigenvectors of the covariance matrix are called PCs. Signals from the (+ +) and (+ −) quadrants primarily comprised the first two PCs, whereas the (− +) and (− −) components comprised the third and fourth PCs. (C) The first two PCs accounted for the vast majority of the weighted 4-quadrant model's response variance. (D) Each member of the ensemble of naturalistic motions comprised a velocity and a natural image, and both components contributed variance to the model response. Although the first two PCs accounted for most of the variance, little of that variance was associated with the velocity of motion. Instead, the third and fourth PCs best aided motion estimation, because they contributed the vast majority of the velocity-associated variance.

https://doi.org/10.7554/eLife.09123.018

This result is counter to one's usual intuition, but it is a straightforward consequence of the mathematics of linear regression and PCA. We want to linearly combine the PC signals to best predict the velocity:

(60) β=argmin(vβTx)2,

where β is a four-dimensional column vector of weights, v denotes the velocity, the superscript T denotes the matrix transpose, and x is the 4-vector of PC signals. The solution to this problem is well-known from the theory of linear regression:

(61) β=M1U,

where Mij=xixj is the covariance matrix of the predictors, and Ui=vxi is the covariance of each predictor with the velocity. In practice, we estimate these expectations from the empirical data, and PCs are uncorrelated over the naturalistic motion ensemble by construction

(62) Mij=λiδij,

where λi is the variance associated with ith PC, and δij is the Kronecker δ-function. Thus,

(63) βi=vxixi2=σvλiriλi=σvriλi,

where σv is the standard deviation of the velocity signal, and ri is the correlation coefficient between the velocity and the ith PC.

It is also easy to calculate the correlation coefficient between the true velocity and the estimated velocity. First note that

(64) vβTx=βTvx=iσvriλiσvλiri=σv2iri2,
(65) (βTx)2=i,jβiβjxixj=iσv2ri2λiλi=σv2iri2.

Thus the square of the correlation coefficient between the true and estimated velocities is

(66) r2=(vβTx)2v2(βTx)2=iri2.

Because the PCs are uncorrelated, each contributes independently to the motion estimator's accuracy. The amount that each PC contributes to the estimation accuracy is determined by its correlation with the velocity, and all dependence on the total amount of variance associated with the PC has dropped out entirely. These conclusions are also true when we look at the squared error directly

(67) ϵ=(vβTx)2=σv2+(βTx)22vβTx=σv2(1iri2).

As would be expected from this formula, the third and fourth PCs account for much more of the velocity-associated variance than the first and second PCs (Appendix figure 4D). Nevertheless, the first PC does account for a significant portion of the velocity-associated variance (Appendix figure 4D), so the basis of PCs does not fully reveal the structure that was apparent in the correlational basis (Appendix 8).

Appendix 10

Novel use of low-order signatures for motion estimation.

The non-multiplicative nonlinearity model (Figure 4A) relaxed the assumption that Drosophila's motion estimator multiplies its inputs and substantially improved the accuracy of visual motion estimation (Figure 4E). Surprisingly, the non-multiplicative nonlinearity model slightly outperformed the HRC when we parameterized it as a second-order polynomial (Figure 4—figure supplement 2). This indicates that there are useful low-order correlations that the HRC neglects. In this section, we will explain how visual motion estimators can sometimes productively incorporate computational signatures that do not nonlinearly combine signals across space.

This section considers computational signatures that clash harshly with our usual intuition for visual motion estimation, and we need to unpack how the motion estimator in Figure 4—figure supplement 2 works before we can understand why it works. The observed improvement results from a linear combination of the HRC

(68) R=(f*V1)(g*V2)(g*V1)(f*V2)

with a linear transformation of the photoreceptor signals

(69) L=g*V1g*V2.

We thus must consider the motion estimator

(70) ve(low)=βRR+βLL,

where βR and βL are the weighting coefficients that minimize the mean-squared error. Note that L linearly combines signals from multiple points in space. Like the HRC, it is mirror anti-symmetric:

(71) {V1(t),V2(t)}{V2(t),V1(t)}LL.

It is useful to take a detour to abstractly consider how motion estimation performance depends on the joint statistics of R, L, and the velocity of motion, v. All three of these quantities are zero mean. We denote their variances as

(72) σR2=R2,σL2=L2,σv2=v2

and their cross-correlation coefficients as

(73) r(R)=vRσvσR,r(L)=vLσvσL,c(RL)=RLσRσL.

The optimal weighting coefficients are determined by these quantities (see Equation 61):

(74) βR=σv(r(R)c(RL)r(L))σR(1(c(RL))2),βL=σv(r(L)c(RL)r(R))σL(1(c(RL))2);

as is the correlation coefficient between the true velocity and ve(auto):

(75) r(low)=(r(R))2+(r(L))22c(RL)r(R)r(L)1(c(RL))2.

Across the simulated ensemble of naturalistic motions we empirically found that r(R)0.24, r(L)0.0017, and c(RL)0.28. Thus, we note that |r(L)||r(R)| and approximate the correlation coefficient as

(76) r(low)r(R)11(c(RL))2.

Thus, we expect the inclusion of the linear term L to improve the accuracy of motion estimation by about 4.3% (compare to Figure 4—figure supplement 2). Interested readers can find a complete derivation of these equations in section V of the supplemental materials for (Clark et al., 2014).

With this machinery in hand, we can start to understand the utility of the linear term. First, note that this term was only weakly correlated with the velocity across the simulated ensemble of motions. Furthermore, the correlation would have been exactly zero if v(g*V1) had been equal to v(g*V2), as would have been the case for an ensemble that was perfectly translationally invariant. So the small correlation we observed between L and v is nothing more than residual noise resulting from a finitely sized data sample that did not explicitly enforce translation invariance. Nevertheless, it's critical to realize that Equation 76 treated r(L) as if it were zero, yet it still managed to account for the results of Figure 4—figure supplement 2. Thus, this residual sampling noise has nothing to do with the improvements offered by the hybrid estimator. As intuitively expected, the linear term is completely uncorrelated with the velocity of motion.

Equation 76 suggests that a linear term, which is itself uncorrelated with the velocity of motion, can nevertheless help velocity estimation. However, this improvement demands that it be combined with another motion estimator that: (i) is correlated with the velocity (i.e., r(R)0); and (ii) is correlated with the linear term (i.e., c(RL)0). Our numerical results indicate that the HRC is an example of such a motion estimator. The HRC obviously satisfies the first condition. To examine the second condition, we note that correlation between the HRC and the linear term is nonzero if and only if

(77) RL=(f*V1)(g*V1)(g*V2)+(g*V1)(f*V2)(g*V2)(f*V1)(g*V2)2(g*V1)2(f*V2)

is nonzero. As long as the image ensemble is light–dark asymmetric, there are no symmetry principles that force this number to vanish for a general choice of f and g. Our numerical results show that the associated correlation coefficient is far from zero for natural inputs and our choices of filters. Fundamentally, this correlation can be nonzero because the HRC's response depends on the pattern that is moving, as does the linear response. Because image-induced variability is partially shared between the HRC and the linear term, the latter can help to eliminate image-induced noise from the HRC, thereby improving the motion estimate.

Although our results indicate that a linear term can improve local motion estimation, its benefits do not sum over space. In particular, imagine an ensemble of elementary motion detectors that combine a local HRC and a local linear estimator:

(78) ve,i(low)=βR((f*Vi)(g*Vi+1)(g*Vi)(f*Vi+1))+βL(g*Vig*Vi+1),

where i indexes the first point in space surveyed by the ith local estimator. A whole field motion percept could be found by averaging these local motion signals over space

(79) ve(low)=1Ni=1Nve,i(low),

where N denotes the total number of local motion detectors. However, the second term in the linear estimator at point i cancels the first term in the linear estimator at point i+1. Thus, spatial averaging eliminates most of the dependence on the linear term

(80) ve(low)=βRNi=1N((f*Vi)(g*Vi+1)(g*Vi)(f*Vi+1))+βLN(g*V1g*VN+1).

All that remains of the linear term is a boundary term that depends on photoreceptor activity at the edges of the visual field. Furthermore, the magnitude of this contribution decreases with N. Thus, linear estimators have little utility for full-field motion estimation. Nevertheless, it's conceivable that such terms could play a role in Drosophila's motion estimation circuit, because the same elementary motion detector is thought to underlie a wide variety of motion-guided behaviors, and the inclusion of this locally beneficial term is not detrimental to whole field motion estimation.

Finally we note that the principles discussed in the context of linear motion estimators also apply in other counterintuitive contexts. For example, consider an autocorrelator,

(81) A=(f*V1)(g*V1)(f*V2)(g*V2),

which correlates visual signals from the same point in space. Like the HRC, it is mirror anti-symmetric:

(82) {V1(t),V2(t)}{V2(t),V1(t)}AA,

but it is uncorrelated with the velocity. Nevertheless, the autocorrelator's correlation with the HRC is determined by

(83) RA=(f*V1)2(g*V1)(g*V2)+(g*V1)(f*V2)2(g*V2)(f*V1)(f*V2)(g*V2)2(f*V1)(g*V1)2(f*V2)

and need not be zero. Empirically, we find the relevant correlation coefficient to be −0.40 across the ensemble of naturalistic motions, so Equation 76 implies that this autocorrelator would enhance the HRC by 8.9%. However, such improvements do not sum over space. Thus, autocorrelators might be relevant for local motion estimates, but not for motion estimates that average over space.

Appendix 11

Regarding the computational problem of visual motion estimation.

Throughout this paper, we have illustrated connections between the computations performed by our models and spatiotemporal correlations. These links are important for both practical and theoretical reasons. First, the many experimental successes of the HRC already suggest that the fly's computation of motion is organized around spatiotemporal correlations in the stimulus (Silies et al., 2014). Thus, by relating our models to spatiotemporal correlations, we were able to discern how each model generalizes this canonical model. For example, Figure 3—figure supplement 1B shows that the optimal weighted 4-quadrant model supplements the standard HRC with a specific subclass of odd-ordered correlations, an observation that both reiterates the importance of the HRC and highlights the most critical signals that it lacks. Second, spatiotemporal correlations provide a fundamental connection between the motion estimation strategies used by invertebrates and vertebrates (Adelson and Bergen, 1985; van Santen and Sperling, 1985). In particular, although the HRC and motion energy models differ in their architectural details, both models are ultimately driven by 2-point correlations in the stimulus. Therefore, general arguments framed in terms of spatiotemporal correlations are easy to investigate in the specific context of either the HRC or motion energy model. Third, an understanding of the spatiotemporal correlations computed by each model facilitates the design of psychophysical experiments that test the models. For example, glider stimuli (Hu and Victor, 2010) provide flexible experimental tools to probe how specific correlations contribute to motion percepts. Future work will lead to a variety of more realistic models that can also be characterized by the stimulus correlations that they detect. These models can be distinguished by carefully designed glider experiments.

From a theoretical point of view, correlation functions are important because they provide a mathematical basis in which to decompose neural computations (Poggio and Reichardt, 1973, 1980; Fitzgerald et al., 2011). David Marr famously proposed that neural computation must be understood at several levels (Marr and Poggio, 1976). He described his second level as “that at which the algorithms that implement a computation are characterized.” Our emphasis on correlation functions is directed towards unraveling motion estimation at this algorithmic level. As illustrated concretely by Figure 3—figure supplement 1, it's possible for an algorithm to have a simple characterization in terms of correlation functions, even when the fundamental computational units (e.g., the quadrants) do not actually compute correlations. Furthermore, correlation functions intuitively relate the visual signatures of motion to measurable features of natural visual environments (Appendix figure 1). Nevertheless, it's possible that correlation functions will ultimately provide an inefficient basis for representing the algorithms of visual motion estimation. For example, although the weighted 4-quadrant model is well understood in terms of the correlations that it detects, it would be nontrivial to discern its underlying simplicity based solely on its responses to glider stimuli, because the constraints relating various higher order correlators would be a priori unknown. Overall, we consider correlation functions to provide a useful lens for characterizing and understanding the algorithms of visual motion estimation, but research should also consider visual motion estimation in alternate bases that might reflect the brain's biological substrates more directly (Rust et al., 2006).

Our characterization of visual motion estimation in terms of correlation functions provides an interesting perspective on the computational problem faced by Drosophila's visual motion estimator in natural environments. Natural images contain many low and high-order correlations (Geisler, 2008), and this implies that the fly brain could in principle use a wide array of correlations for visual motion estimation (Appendix figure 1). However, each correlation is only weakly associated with the velocity of motion in naturalistic settings (Dror et al., 2001; Clark et al., 2014). The reason for this is that the specific structure of the scene that is moving acts as a nuisance parameter that hinders the unambiguous assignment of a velocity to pattern of light input. For example, it's well known that the temporal frequency of a moving sinusoidal grating shapes the HRC's output (Egelhaaf et al., 1989), thereby conflating the velocity with the grating's spatial frequency. More generally, the variability of a multipoint correlator across an ensemble of moving scenes is determined by higher-order statistics of the image ensemble (e.g., see Appendix 2). The fact that the same natural image drives every multipoint correlator also implies that the correlators co-vary with each other across the naturalistic motion ensemble. This shared variability can sometimes enable higher-order multipoint correlators to compensate effectively for image-induced noise that contaminates the HRC (Clark et al., 2014).

Questions of how brains compute behaviorally relevant stimulus features from sensory inputs are central to neuroscience, but they are extraordinarily difficult to answer, even in principle. In the context of Drosophila's visual motion estimator, the ensemble of photoreceptor signals contains many nonlinear cues that are weakly correlated with the stimulus velocity and with each other under naturalistic conditions. There are many ways to pool these signals into an improved motion estimate. The space of possible stimuli is astronomically large, so it is impossible for experiments to sample it completely. Nevertheless, synthetic laboratory stimuli can be designed to rule out specific algorithms that the brain might use to estimate motion. Thus, to deconstruct a neural computation, one must find ways to dramatically restrict the space of candidate models and to identify interesting models that can be experimentally ruled out. It's important to note that we did not construct our models to reproduce the behavioral data, even though this is a straightforward exercise (Figure 4—figure supplement 1). Instead we aimed for a predictive framework that can relate behavioral responses to the statistics of natural sensory inputs, the statistics of natural behavior, and the constraints imposed by the neural circuits that implement the computation. Such constructions are complicated and depend on features of neural circuits that are incompletely known. Nevertheless, we hope that this added complexity will eventually pay off in computational models that have a rational structure from the viewpoint of the stimulus, the animal, and the brain.

References

  1. 1
    Spatiotemporal energy models for the perception of motion
    1. E Adelson
    2. J Bergen
    (1985)
    Journal of the Optical Society of America. A, Optics and Image Science 2:284–299.
    https://doi.org/10.1364/JOSAA.2.000284
  2. 2
  3. 3
    Selective sensitivity to direction of movement in ganglion cells of the rabbit retina
    1. HB Barlow
    2. RM Hill
    (1963)
    Science 139:412–414.
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
    The analysis of visual motion: a comparison of neuronal and psychophysical performance
    1. KH Britten
    2. MN Shadlen
    3. WT Newsome
    4. JA Movshon
    (1992)
    The Journal of Neuroscience 12:4745–4765.
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
    Accuracy of velocity estimation by Reichardt correlates
    1. DO Dror
    2. DC O'Carroll
    3. SB Laughlin
    (2001)
    Journal of the Optical Society of America. A, Optics and Image Science 18:241–252.
    https://doi.org/10.1364/JOSAA.18.000241
  16. 16
    Transient and steady-state response properties of movement detectors
    1. M Egelhaaf
    2. A Borst
    (1989)
    Journal of the Optical Society of America. A, Optics and Image Science 6:116–127.
    https://doi.org/10.1364/JOSAA.6.000116
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
    Computation of object approach by a wide-field, motion-sensitive neuron
    1. F Gabbiani
    2. HG Krapp
    3. G Laurent
    (1999)
    The Journal of Neuroscience 19:1122–1141.
  22. 22
  23. 23
    Flight control in Drosophila by visual perception of motion
    1. KG Götz
    (1968)
    Biological Cybernetics 4:199–208.
  24. 24
    Visual control of locomotion in the walking fruitfly Drosophila
    1. K Götz
    2. H Wenking
    (1973)
    Journal of Comparative Physiology. A, Sensory, Neural, and Behavioral Physiology 85:235–266.
    https://doi.org/10.1007/BF00694232
  25. 25
  26. 26
    Systemtheoretische Analyse der Zeit-, Reihenfolgen-und Vorzeichenauswertung bei der Bewegungsperzeption des Rüsselkäfers Chlorophanus
    1. B Hassenstein
    2. W Reichardt
    (1956)
    Zeitschrift für Naturforschung B 11:513–524.
  27. 27
  28. 28
    A set of high-order spatiotemporal stimuli that elicit motion and reverse-phi percepts
    1. Q Hu
    2. JD Victor
    (2010)
    Journal of Vision, 10, 10.1167/10.3.9.
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
    Animals as mathematicians
    1. H Kalmus
    (1964)
    Animals as mathematicians.
  35. 35
  36. 36
    Biophysics of computation: information processing in single neurons
    1. C Koch
    (2004)
    Oxford university press.
  37. 37
  38. 38
    A simple coding procedure enhances a neuron's information capacity
    1. SB Laughlin
    (1981)
    Zeitschrift für Naturforschung C 36:51.
  39. 39
    The role of sensory adaptation in the retina
    1. SB Laughlin
    (1989)
    The Journal of Experimental Biology 146:39–62.
  40. 40
  41. 41
    From understanding computation to understanding neural circuitry
    1. D Marr
    2. T Poggio
    (1976)
    Massachusetts Institute of Technology Artificial Intelligence Laboratory A.I. Memo 357.
  42. 42
  43. 43
  44. 44
    Evolutionary convergence in computation of local motion signals in monkey and dragonfly
    1. EI Nitzany
    2. G Menda
    3. PS Shamble
    4. JR Golden
    5. RR Hoy
    6. JD Victor
    (2014)
    Computational and Systems Neuroscience (CoSyNe).
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
    Collision-avoidance and landing responses are mediated by separate pathways in the fruit fly, Drosophila melanogaster
    1. LF Tammero
    2. MH Dickinson
    (2002)
    The Journal of Experimental Biology 205:2785–2798.
  69. 69
    Regression shrinkage and selection via the lasso
    1. R Tibshirani
    (1996)
    Journal of the Royal Statistical Society. Series B (Methodological), 267–288.
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
    Elaborated reichardt detectors
    1. JPH van Santen
    2. G Sperling
    (1985)
    Journal of the Optical Society of America. A, Optics and Image Science 2:300–320.
    https://doi.org/10.1364/JOSAA.2.000300
  75. 75
  76. 76
  77. 77
  78. 78

Decision letter

  1. Matteo Carandini
    Reviewing Editor; University College London, United Kingdom

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

Thank you for submitting your work entitled “Naturalistic visual motion estimation by Drosophila” for peer review at eLife. Your submission has been favorably evaluated by Eve Marder (Senior Editor), Matteo Carandini (Reviewing Editor), Jonathan Victor (peer reviewer), and two other reviewers.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

This is an important computational and mathematical study of visual motion analysis: what kinds of motion signals are present in the natural environment, how they can be extracted by biologically-plausible neural circuitry, and the features that govern their performance. The point of reference for the paper is the Hassenstein-Reichardt correlator (HRC), the canonical model for motion signal detection, and the starting point for a great deal of important work on how motion computations are reduced to circuitry. The main point of the paper is that this model fails to account for some qualitative aspects of motion processing (responses to glider stimuli), and that relatively simple extensions of the model enable it to properly predict responses to glider stimuli, and also, substantially improve its accuracy for the naturalistic motion stimuli that are the focus of the paper.

The big-picture findings (which at the moment are very difficult to grasp, see below) are important for the community at large: simple multiplication is not the best approach, and deviations from multiplication – which are being revealed by circuit-level analysis – should likely be viewed as a feature, rather than a bug. This changes the way we view the analysis of the implementation of the HRC in circuitry – rather than attempting to understand how neurons carry out a multiplication, the focus is shifted into how they make use of nonlinearities to do a computation that is, perhaps surprisingly, even better for the task.

Essential revisions:

The paper is currently written in such a way that only an exquisitely trained and alert specialist can appreciate these points. It is essential that the paper be reorganized so that there are clear questions set in context of the literature, and the paper and its logic become easier to follow for someone who is not working on motion detection in fly.

The current organization of the paper, indeed, is obscure. The paper starts by introducing 4 models that are elaborations of the Reichardt detector: (1) multiple channels that explicitly construct higher order correlation statistics, (2) A static front end nonlinearity, (3) separate correlations between lights and darks, (4) nonlinear combination through operations other than multiplication. This is a lot to take in, and only the most motivated readers will get past this point. Most readers, instead, will stop there. It would be much better to introduce the Reichardt detector (and ideally an energy model), show their failings (e.g. with glider stimuli) and then motivate ways in which they could be improved, and introduce a few of the alternative models (do we need all 4?). Otherwise, introducing the 4 models right away feels like introducing solutions in search of a problem.

A key problem is that all the models considered seem to work better than the Reichardt model, with no compelling argument for one vs. the other. Possibly this should be regarded as generating a set of alternate hypotheses for physiologists and anatomists to investigate, but it would be useful to understand better what separates these models.

Because of this, and because the style of writing is opaque, it is currently difficult to discern the “bottom line” in this paper. It reads as a survey of many possible models and some of their virtues and discontents.

It is essential for the authors to decide what exactly they want the paper to convey, and set out a clear set of questions in Introduction, answer them in Results, and return to them in Discussion. The current paper, by contrast, seems to change its mind along the way as to what those questions are, and the reader is left grasping for a specific set of questions that need addressing. For instance, the Discussion (second paragraph) emphasizes processing in segregated ON and OFF channels. Is this the main take-home message? If so, it should be clearly set out as a question in the Introduction.

More generally, it seems essential that the authors introduce the problem in a broader context. As written, the paper uses computational methods to specifically examine some possible mechanisms of motion estimation in flies in view of the inability of the traditional Reichardt model to explain how animals use higher order motion cues. These are important issues, but to engage that broader interest it would be good to relate more substantially with the literature on higher order statistics in natural scenes and adaptation of circuit structure and perceptual phenomena to these statistics. Some suggestions appear in Reviewer 3's specific comments, appended below.

The language needs to become clearer. For instance, at the end of the subsection “Model responses to glider stimuli”, you state, “utility of higher-order correlations for naturalistic motion processing is not restricted to a specific neural circuit implementation.” What does this mean?

Also, while it is understandable that the authors may want to protect the reader from too much detail, at the moment the paper seems to put too much away into the appendices for the reader to be able to follow exactly what is going on. In turn, the material in the appendices is written in an extremely mathematical style, too much so for the readers of this journal – for example, “simply” is overused.

In addition to these key issues of organization and style, there are also some conceptual aspects that need work:

1) The paper seems to concern only rigid, constant, fronto-parallel motion. While this limitation needs to be explicitly stated, it also increases the strength of the findings. That is, there are many ways in which real motion estimation differ from this simplified scenario: for example, there are objects that may be moving independently of the visual flow, and objects may move in depth. Getting rid of these confounds will further complicate algorithms for extraction of visual flow, and influence what is “optimal”. So it is really quite remarkable that even with these real-world complications neglected, the authors still find that extraction of motion is benefited by mechanisms that make use of high-order nonlinearities.

2) The authors' choice of a way to evaluate performance, i.e. correlation of the inferred motion signal with the veridical motion, is problematic for two related reasons. Firstly, performance can be increased merely by attenuating large-magnitude outliers in the estimate (even simply by passing the output through a sigmoidal nonlinearity, independent of the stimulus). Secondly, the mathematical analysis is complicated because of the need to compute the denominator. There is no single best measure of performance, but for the above reasons, correlation is suboptimal. Two possible alternatives are: (a) to use the covariance (or, equivalently, mean-squared error) more extensively – this gets rid of the denominator problem, and may also simplify the understanding of the role of the autocorrelators; (b) to use mutual information between true and estimated velocity. The advantage of this is that it will not depend on any output transformation. There's a good argument that this is biologically appropriate, since after all, the fly does not need to know the velocity, it just needs to control its behavior – and there may well be nonlinear transformations that intervene between the motion signal output, and the motor command. In sum, it would be useful to know whether maximizing covariance, or maximizing information about velocity, would lead to the same conclusions as maximizing correlation. Note that the authors don't necessarily need to refit the models to maximize mutual information. Rather, all that is needed is to take the existing fitted models and calculate the mutual information from the scatter plot of true velocity vs. calculated velocity – a one dimensional calculation that can be done with binning in, say, 16 velocity bins of approximately equal occupancy.

3) The benefits of the autocorrelator are not clear, despite the paper's lengthy material on this point. To get off the ground: is it the case that the stimulus set is symmetric with respect to velocity – i.e., that each spatial profile is presented as moving both with positive and negative velocity? If this is not the case, then it would seem important to explain why this is justified, and of course, autocorrelators may help. And unless this asymmetry is present in natural stimuli, it would seem to be a flaw in the analysis that needs to be fixed.

Indeed, perhaps the stimulus set is actually not symmetric with respect to velocity. This is suggested by the sentence following Eq. 7 of Appendix 4: “The performance of the combined motion estimator is superior to the original direction selective estimator whenever the latter [i.e., the direction selective estimator] has a nonzero correlation with the autocorrelator.”

But if the stimulus set is symmetric in this regard, the result is quite puzzling. Let's say a particular estimator A that was optimal had the following behavior. For a particular stimulus moving with a positive velocity (say, S+, with signed velocity v(S+)) gave a result E+, and for the same stimulus moving with a negative velocity (say, S-, with signed velocity v(S-)=-v(S+)) gave a result E-. We could also construct an estimator A*, which first inverted the stimulus, and then applied A, and then inverted the result – so that A* would give the result of-(E-) for S+, and -(E+) for S-. If A is optimal, then (by symmetry) A* would also have to be optimal. But then it would seem that a new estimator, B=(A+A*)/2, would be better than both, unless A and A* are identical. The reason for this is that B removes any bias (towards either positive or negative velocities) that A or A* might have, and one can always decompose the error (across the entire ensemble) into the error for the summed velocity (v(S+)+v(S-)) and for the difference velocity (v(S+)-v(S-)), for each pair of stimuli. B reduces the error for the summed velocity to zero, and does not influence the error for the difference velocity. Is there a problem with this argument?

Assuming that the stimulus set is symmetric w.r.t. velocity, and, the above argument is correct, then it is hard to understand how linear addition of a signal Z that is generated in a spatially-symmetric fashion, including an autocorrelator, can improve the performance of an estimator if measured by covariance. But perhaps it might help by improving the correlation (see item 2 above) – and this should be clarified. On the other hand, using a symmetric signal Z in a nonlinear way (e.g., dividing by (1+Z2)) could improve the performance as measured by covariance or correlation, by reducing outliers. However, it is not clear that this is what the authors are doing.

In sum – the paper should clarify whether the stimulus set is symmetric with respect to velocity sign. If it is asymmetric, it should justify this asymmetry, and, be clear whether it is responsible for the utility of autocorrelators. If it is symmetric, then further explanation is needed as to why autocorrelators are useful.

4) Regarding the Discussion section concerning motion energy computations in vertebrates: In considering how the conclusions might apply to the motion energy model, it looks like the paper is suggesting that vertebrates might use different kinds of deviations from strict multiplication, because of the different linear structure of the motion energy model. In this regard, it would be useful to point out that cellular-level analysis in the macaque (Nitzany et al., Evolutionary convergence in computation of local motion signals in monkey and dragonfly. CoSyNe , 2014) shows that the consequences of these deviations is very similar in terms of the motion signal that is extracted – at least in terms of detection of three-point correlations, which are crucial to this paper.

5) Nesting of models. A diagram that shows the “nesting” relationships of the models – which ones are special cases of others, and that the HRC is a special case of all – would be helpful.

6) Static nonlinearities at the front end: the paper considers binarization and histogram-equalization. Might it be useful to add a front-end nonlinearity that converts the intensity distribution to a Gaussian? Kurtosis would be higher than either, but this is the distribution that maximizes information (entropy) for a given variance – so it would be interesting to see how it does.

7) The authors must make their program code for the results available either on the journal web site or on a publicly accessible data base. Please add details to the Method on this last point.

Reviewer #3:

1) At the end of the Introduction, the authors say that the perceptual measurements are consistent with only a subset of their models. It would be useful to be clear up front here about which models worked and which ones didn't and in what ways.

2) A comment about writing style. The paper frequently has expressions like “we hypothesize that biology tunes its motion estimators[…]” (subsection “Strategies for visual motion estimation”, first paragraph). Personally, I find it a bit grating to read this sort of broad (over-)generalization about “biology”. Are there motion detectors of some kind in plants, cyanobacteria and tube-worms at deep-sea vents? Maybe, but the authors are not saying anything that would convince me about how they work. There are many such occurrences of the “biology does X or Y” phrasing, which the authors would be well-advised to remove. (Another example is in the third paragraph, but there are many others.) In any case, the hypothesis of tuning of circuits to natural scenes statistics arising from ordinary behavior is hardly new, so a nod to the venerable history of this idea would be a good thing here.

3) In reading the third paragraph of the subsecton “Strategies for visual motion estimation” and Figure 1, I could not understand how the static front-end nonlinearity allowed the circuit compute higher order spatial correlations more easily. Or maybe that is not being implied, but it seemed to be. Later on it becomes more clear that the nonlinearity is supposed to remove kurtosis, but I was confused at this stage in the paper. More generally, it would be helpful to get some conceptual sense of why these particular models are considered – otherwise one gets a sense of a bit of a grab-bag, especially since many of these models could be combined with each other, no doubt giving improvements in each case.

4) In the subsection entitled “Each mechanisms outperforms the HRC[…]”, the authors show that their various models work better than the HRC. They then say that they can gain insight because their models are theoretical tractable. As an example, they say that their front-end nonlinearity does contrast equalization. But this is the first time they say anything about the nature of the nonlinearity, and they don't say why this kind of normalization helps. So I am afraid that the insight does not come through. Continuing in this vein, they say that “a large fraction of […] performance was afforded by a small number of correlation types”. But there has been no discussion in the text at this point of the possible correlation types and which ones are being used, so the comment remains opaque. Then they write “binarizing nonlinearities also offered certain […] advantages”. Again, this is the first time binarizing nonlinearities are mentioned and it is not clear what the advantages are. It seems clear from the opacity of the text that more details of the model variations should be described earlier, in order to make the paper easier to read. Looking at Figure 2 it is clear that everything outperforms the HRC. But the HRC really does poorly – a correlation coefficient of 0.25 or so. The best mechanism has a correlation coefficient of 0.5. This leads me to worry that none of these proposed mechanisms actually works that well. Also, a main message of this paper is that multi-point correlations are important to motion estimation. But it seems in this Figure that the front-end nonlinearity has the biggest effect. Why didn't the authors combine this nonlinearity with the other mechanisms they consider?

5) The authors cite their own work for the glider stimuli. Don't these stimuli and associated analysis come from the decades of work by J. Victor and collaborators? I might be mistaken about this – I am most familiar with the spatial stimuli created with gliders that I learned about from those papers. I understand how these stimuli are constructed from those works, but I suspect that the general reader will need a brief introduction at this stage in the paper, even though this is covered in previous works.

6) I appreciated the clear statement in the first paragraph of subsection “Model responses to glider stimuli” about what the HRC model fails to predict. In the next paragraph (and Figure 3) the authors discuss how the front-end nonlinearity increases accuracy compared to HRC broadly speaking, but fail to match the responses to negative 2-point correlations and some 3-point gliders. Then they show that models that explicitly compute higher-order correlations correctly predicted the sign of all glider responses, but did not predict the detailed response amplitudes. They try different architectures and find that several architectures show similar performance overall. Of course the output of any of these circuits need not be directly equal to the turning rate of the animal. Am I correct in understanding that the model is that a single gain parameter should relate all of the turning rates under different conditions to the output of the circuit, and that this single gain is fixed by normalizing the positive two-point glider? Also, in Figure 3, is the “equalized” model the same as the front-end nonlinearity model in Figure 1? In what sense is it “equalized”?

7) In the subsection “Improving motion estimation by accounting for natural light-dark asymmetries”, the authors describe results that show that accounting for bright-dark asymmetry improved the results of all of the mechanisms that got the sign of glider responses right. While the text has a technical discussion how various combinations of signals work, I did not understand from the text the conceptual reason why the bright-dark separation helps. At some broad level it is not surprising that adapting to the natural statistics helps with the detection of signals, and indeed it seems here that all the mechanisms are helped by incorporating bright-dark asymmetries. Is there a deeper insight here? Or maybe the point is that the authors are simply making a prediction that the higher order motion detection circuits in flies will be discovered to segregate ON and OFF pathways and then recombine them, independently of which detailed mechanism and nonlinearities are being used? Now in Figure 4A, most of the quadrant models seem to do worse than HRC, and the (— —) quadrant seems not to be significantly better. And all of these models have a very low correlation with velocity. In view of this, could the authors please clarify why they are saying that accounting for bright-dark asymmetry in their model improves things?

8) In the subsection “Improving motion estimation by reducing kurtosis” the authors explain that their front end nonlinearities improved motion estimation by reducing the kurtosis in the inputs, even though they did not help with predicting glider responses. Would it help to have these nonlinearities along with the correlation detectors and the ON-OFF segregation discussed in previous sections?

9) As written, the paper uses computational methods to specifically examine some possible mechanisms of motion estimation in flies in view of the inability of the traditional Reichardt model to explain how animals use higher order motion cues. I agree with the authors that the work has potentially broader significance beyond the specific example discussed here. But in order to engage that broader interest it would be really good if the authors would engage more substantially with the literature on higher order statistics in natural scenes and adaptation of circuit structure and perceptual phenomena to these statistics. At the moment the engagement is cursory. Some suggestions appear below. None of the papers mentioned below is specifically about motion estimation and the suggestion to engage a bit more with all this is not intended to detract from the novelty of the present work; rather it will likely help readers to find the work situated better against this well-known context.

A) Light-dark asymmetry of natural stimuli and consequences for visual cicrcuits:

(i)Ratliff, Charles P., et al. “Retina is structured to process an excess of darkness in natural scenes.” Proceedings of the National Academy of Sciences 107.40 (2010): 17368-17373, and references therein on light-dark statistics;

(ii) Komban, Stanley Jose, Jose-Manuel Alonso, and Qasim Zaidi. “Darks are processed faster than lights.” The Journal of Neuroscience 31.23 (2011): 8654-865, and references therein.

B) Kurtosis and higher moments of the distribution of light:

(i) Bonin, Vincent, Valerio Mante, and Matteo Carandini. “The statistical computation underlying contrast gain control.” The Journal of neuroscience 26.23 (2006): 6346-6353, and references therein.

C) Higher order spatial correlations and perception in the “glider” formulation used here:

(i) A long history of studies by Victor, and specifically the recent work: Tkačik, Gašper, et al. “Local statistics in natural scenes predict the saliency of synthetic textures.” Proceedings of the National Academy of Sciences 107.42 (2010): 18149-18154; Hermundstad, Ann M., et al. “Variance predicts salience in central sensory processing.” eLife 3 (2014): e03722, and references therein;

(ii) Many papers by the Simoncelli group about visual textures.

https://doi.org/10.7554/eLife.09123.019

Author response

Essential revisions:

The paper is currently written in such a way that only an exquisitely trained and alert specialist can appreciate these points. It is essential that the paper be reorganized so that there are clear questions set in context of the literature, and the paper and its logic become easier to follow for someone who is not working on motion detection in fly.

The current organization of the paper, indeed, is obscure. The paper starts by introducing 4 models that are elaborations of the Reichardt detector: (1) multiple channels that explicitly construct higher order correlation statistics, (2) A static front end nonlinearity, (3) separate correlations between lights and darks, (4) nonlinear combination through operations other than multiplication. This is a lot to take in, and only the most motivated readers will get past this point. Most readers, instead, will stop there. It would be much better to introduce the Reichardt detector (and ideally an energy model), show their failings (e.g. with glider stimuli) and then motivate ways in which they could be improved, and introduce a few of the alternative models (do we need all 4?). Otherwise, introducing the 4 models right away feels like introducing solutions in search of a problem.

In response to this suggestion and others below, we have fully restructured our paper. Instead of introducing all models simultaneously at the beginning of the paper, we now first discuss experimental and theoretical reasons that the pure HRC might be disfavored. This includes introducing the puzzling glider psychophysical results. We then show that front-end nonlinearities, while able to improve motion signals with natural inputs, are unable to recapitulate the glider response patterns. We next explore the ON/OFF model, which is probably most proximal to the field’s current conception of the circuit, and show that it improves natural motion estimation and matches experimental glider responses. However, it is important not to get stuck on the first model that fits the glider data reasonably well, because other aspects of Drosophila’s circuitry still conflict with the HRC’s hypotheses. We thus continue to explore the space of models with other plausible nonlinear modifications to the HRC: the non-multiplicative nonlinearity model, an unrestricted nonlinearity model, which is a circuit-motivated renaming of the former “explicit multipoint correlator model,” and an extra-input nonlinearity model. This last model is new to this revision, but we believe it is informative because it moves us further beyond the HRC and is motivated by EM reconstructions that show wide-field inputs to the fly’s motion detectors.

As hinted above, we now focus much more on explaining the biological motivation for each model. This is aided by a new table in the manuscript, which gives experimental justifications for each generalization. One of our goals in this paper is to move away from orthodox models by emphasizing the potential of more complex circuit computations. However, the models we consider are roughly hierarchical, moving to ever more general conceptualizations. This enables us to conclude the paper with our bottom line – the final model incorporates the variety of conceptual advances that were initially illustrated with specific models. This includes, but goes beyond, our point that many of our models improve natural motion estimation through asymmetric treatment of light and dark by the motion estimator. We believe this conceptual hierarchy of models clarifies why it is helpful to consider multiple models, rather than just one: the computational structure of the extra input nonlinearity model is too complicated to take in at once, and the other models in the hierarchy help to dissect it.

A key problem is that all the models considered seem to work better than the Reichardt model, with no compelling argument for one vs. the other. Possibly this should be regarded as generating a set of alternate hypotheses for physiologists and anatomists to investigate, but it would be useful to understand better what separates these models.

Our new structure, described above, provides a stronger rationale for each of the models, introducing them one by one and in a logical progression, from the most conceptually-proximal to the most generalized. Table 1 now gives experimental rationales for examining each model type. We do not intend to present our models as competitors. Rather, each illustrates something new about naturalistic motion estimation. As the reviewers say, each generates a set of hypotheses for experiments. Our new hierarchical organization should make this clearer.

Because of this, and because the style of writing is opaque, it is currently difficult to discern the “bottom line” in this paper. It reads as a survey of many possible models and some of their virtues and discontents.

It is essential for the authors to decide what exactly they want the paper to convey, and set out a clear set of questions in Introduction, answer them in Results, and return to them in Discussion. The current paper, by contrast, seems to change its mind along the way as to what those questions are, and the reader is left grasping for a specific set of questions that need addressing. For instance, the Discussion (second paragraph) emphasizes processing in segregated ON and OFF channels. Is this the main take-home message? If so, it should be clearly set out as a question in the Introduction.

As described above, we have restructured the paper to better tease out the bottom line of our paper. More generally, we have highlighted several points that are most critical to the paper: (a) In the motion estimation circuit, nonlinearities are a feature, not a bug, and can be tuned to improve motion estimation in natural scenes. Such tuning also accounts for puzzling psychophysical results. (b) We must explore the space of models to discern critical details from non-critical ones, and make sure our results aren’t the product of idiosyncracies of one particular model. (c) The critical detail of the models that reproduce the behavioral data is that they treat light and dark asymmetrically. (d) The models are conceptually hierarchical, and the most general model recapitulates the insights provided by its predecessors.

More generally, it seems essential that the authors introduce the problem in a broader context. As written, the paper uses computational methods to specifically examine some possible mechanisms of motion estimation in flies in view of the inability of the traditional Reichardt model to explain how animals use higher order motion cues. These are important issues, but to engage that broader interest it would be good to relate more substantially with the literature on higher order statistics in natural scenes and adaptation of circuit structure and perceptual phenomena to these statistics. Some suggestions appear in Reviewer 3's specific comments, appended below.

Thank you for this comment. Our revised manuscript rewrites the Introduction and parts of the Discussion to place our work in the broader contexts suggested by the reviewers.

The language needs to become clearer. For instance, at the end of the subsection “Model responses to glider stimuli”, you state, “utility of higher-order correlations for naturalistic motion processing is not restricted to a specific neural circuit implementation.” What does this mean?

In response to this comment we have tried to clarify and simplify our writing. (In this particular case, we meant that many models were sufficient to make use of HOCs to improve motion estimates. This is now clarified in the text.)

Also, while it is understandable that the authors may want to protect the reader from too much detail, at the moment the paper seems to put too much away into the appendices for the reader to be able to follow exactly what is going on. In turn, the material in the appendices is written in an extremely mathematical style, too much so for the readers of this journal – for example, “simply” is overused.

We have moved some of the material from the appendices to the main text to make the paper easier to follow. All mathematics remains in the appendices. We have gone through all of the appendices to clarify our arguments and simplify our prose. This word “simply” no longer appears. We have added explanatory prose to help ease the reader through the math. More importantly, we think it’s now possible for the reader to understand the paper without ever looking at the math, which is an important improvement.

In addition to these key issues of organization and style, there are also some conceptual aspects that need work:

1) The paper seems to concern only rigid, constant, fronto-parallel motion. While this limitation needs to be explicitly stated, it also increases the strength of the findings. That is, there are many ways in which real motion estimation differ from this simplified scenario: for example, there are objects that may be moving independently of the visual flow, and objects may move in depth. Getting rid of these confounds will further complicate algorithms for extraction of visual flow, and influence what is “optimal”. So it is really quite remarkable that even with these real-world complications neglected, the authors still find that extraction of motion is benefited by mechanisms that make use of high-order nonlinearities.

We agree that this is remarkable and interesting, and we have added a Discussion paragraph about this point.

2) The authors' choice of a way to evaluate performance, i.e. correlation of the inferred motion signal with the veridical motion, is problematic for two related reasons. Firstly, performance can be increased merely by attenuating large-magnitude outliers in the estimate (even simply by passing the output through a sigmoidal nonlinearity, independent of the stimulus). Secondly, the mathematical analysis is complicated because of the need to compute the denominator. There is no single best measure of performance, but for the above reasons, correlation is suboptimal. Two possible alternatives are: (a) to use the covariance (or, equivalently, mean-squared error) more extensively – this gets rid of the denominator problem, and may also simplify the understanding of the role of the autocorrelators; (b) to use mutual information between true and estimated velocity. The advantage of this is that it will not depend on any output transformation. There's a good argument that this is biologically appropriate, since after all, the fly does not need to know the velocity, it just needs to control its behavior – and there may well be nonlinear transformations that intervene between the motion signal output, and the motor command. In sum, it would be useful to know whether maximizing covariance, or maximizing information about velocity, would lead to the same conclusions as maximizing correlation. Note that the authors don't necessarily need to refit the models to maximize mutual information. Rather, all that is needed is to take the existing fitted models and calculate the mutual information from the scatter plot of true velocity vs. calculated velocity – a one dimensional calculation that can be done with binning in, say, 16 velocity bins of approximately equal occupancy.

We thank the reviewers for this comment. We emphasized the correlation coefficient because it is intuitive and has a general relationship with the mean- squared error. In particular, we’ve always been choosing parameters that minimize the mean-squared error, and we’re only using the correlation coefficient as a more intuitive metric to present those results. This was not clear in the previous draft. To make it clear in the new draft, we now introduce our optimization procedures in terms of the mean squared error, and we explicitly describe the relationship between the correlation coefficient and the mean squared error in the Methods and in Appendix 2. We also use this relationship to explicitly motivate our continued graphical reliance on the correlation coefficient. We hope that the situation is clear now. Simply put, everything in this paper is in terms of the mean squared error, and the correlation coefficient is merely an intuitive way to signify the mean squared error of correctly scaled models.

We have refrained from using mutual information to assess our motion estimators for both practical and conceptual reasons. Practically speaking, it is more difficult to tune high-dimensional models for mutual information than for the mean-squared error. The referees already allude to this practical problem by not asking us to refit the models. More importantly, we fear that the mutual information could be conceptually misleading in the present context. For example, the referees’ comment about autocorrelators clearly indicates that they would not consider the speed of motion to be a good estimator of its velocity. We agree. Nevertheless, the mutual information between the velocity of motion and its speed is the entropy of the velocity distribution minus one bit, where the “minus one” corresponds to the missing information about the motion’s direction. Thus, the mutual information between the speed and velocity can be huge for broad velocity distributions. For example, it can certainly be greater than 1 bit, in which case the mutual information metric would claim that the speed of motion is a better estimator of the velocity than the direction of motion. We think this is the wrong conclusion. As the referees say, what ultimately matters is how well the animal can control its behavior, and the direction of motion is clearly more useful for orienting behaviors than the speed of motion, independently of how broad the velocity distribution is. Furthermore, we would not consider the ensemble of raw photoreceptor signals to be a good estimator of the velocity of motion, but these signals have at least as much mutual information with the velocity as any motion estimator we can hope to dream up. Overall, we believe that both the mutual information and the mean squared error have their own domains of superiority, and we think that the mean squared error is a more appropriate fit for this paper. Both the mean squared error and the mutual information are standard metrics, and we hope the referees will permit us to emphasize the metric that we strongly prefer.

3) The benefits of the autocorrelator are not clear, despite the paper's lengthy material on this point.

Thank you for pointing out that this section was unclear. Your comments lead us to carefully revisit the utility of the autocorrelator, and these efforts have led us to clearer understanding. As described below, we still argue that autocorrelators have utility for motion estimation. We have fully rewritten the associated Appendix (now Appendix 10) to enhance its clarity in light of the referees’ specific concerns. Note that the new appendix discusses the mechanisms underlying the autocorrelator’s utility in a more general setting. The new paper organization leads us to emphasize a different computational example in Appendix 10, but we still comment on autocorrelators at the end. Below we will discuss Appendix 10 as if it had emphasized an autocorrelator example, because we want our response to most directly connect to the referees’ specific comments. We hope the combined concreteness and generality of the Appendix will help to clarify the situation for the referees and future readers.

To get off the ground: is it the case that the stimulus set is symmetric with respect to velocity – i.e., that each spatial profile is presented as moving both with positive and negative velocity? If this is not the case, then it would seem important to explain why this is justified, and of course, autocorrelators may help. And unless this asymmetry is present in natural stimuli, it would seem to be a flaw in the analysis that needs to be fixed.

The previous ensemble was statistically symmetric, in the sense that each spatial profile was equally likely to be presented with velocity v and –v. However, it is conceivable that our finite ensemble of simulated motions was too small to have achieved the desired level of left-right symmetry. To eliminate this possibility, we generated a new ensemble that explicitly enforced left-right symmetry: whenever a pattern with velocity v was randomly sampled, we reflected the pattern and presented it with velocity –v. We numerically confirmed that mirror-paired photoreceptor signals were exactly equal between the two simulations. This change didn’t affect the results, which suggests that our original ensemble was large enough to safely consider as left-right symmetric. Nevertheless, we now use the new ensemble with enforced left-right symmetry throughout the paper.

Indeed, perhaps the stimulus set is actually not symmetric with respect to velocity. This is suggested by the sentence following Eq. 7 of Appendix 4: “The performance of the combined motion estimator is superior to the original direction selective estimator whenever the latter [i.e., the direction selective estimator] has a nonzero correlation with the autocorrelator.”

It is mathematically possible for the autocorrelator to be correlated with a direction- selective motion estimator (e.g. the HRC) without being correlated with velocity of motion. Indeed, this is what happens. The Appendix now discusses this initially counter-intuitive point at greater length. Basically, this result occurs because the HRC’s response depends on the pattern that is moving, as does the autocorrelator’s response. Because image-induced variability is partially shared between the HRC and the autocorrelator, the autocorrelator can help to eliminate image-induced noise from the HRC, thereby improving the motion estimate.

Let's say a particular estimator A that was optimal had the following behavior. For a particular stimulus moving with a positive velocity (say, S+, with signed velocity v(S+)) gave a result E+, and for the same stimulus moving with a negative velocity (say, S-, with signed velocity v(S-)=-v(S+)) gave a result E-. We could also construct an estimator A*, which first inverted the stimulus, and then applied A, and then inverted the result – so that A* would give the result of-(E-) for S+, and -(E+) for S-. If A is optimal, then (by symmetry) A* would also have to be optimal. But then it would seem that a new estimator, B=(A+A*)/2, would be better than both, unless A and A* are identical. The reason for this is that B removes any bias (towards either positive or negative velocities) that A or A* might have, and one can always decompose the error (across the entire ensemble) into the error for the summed velocity (v(S+)+v(S-)) and for the difference velocity (v(S+)-v(S-)), for each pair of stimuli. B reduces the error for the summed velocity to zero, and does not influence the error for the difference velocity. Is there a problem with this argument?

We think this argument is elegant and 100% correct.

Assuming that the stimulus set is symmetric w.r.t. velocity, and, the above argument is correct, then it is hard to understand how linear addition of a signal Z that is generated in a spatially-symmetric fashion, including an autocorrelator, can improve the performance of an estimator if measured by covariance.

Your argument helped us to realize that autocorrelators are only useful when they are combined in a mirror anti-symmetric way across space, but this is what they had been doing all along. For example, in Appendix 10 we discuss the usefulness of the autocorrelator: (f*V_1)(g*V_1) – (f*V_2)(g*V_2). When the referees consider this specific form of autocorrelator, we think they will understand why its utility doesn’t contradict their argument. Nevertheless, its utility might still be opaque, and we hope the material Appendix 10 will make its utility easier to understand.

But perhaps it might help by improving the correlation (see item 2 above) – and this should be clarified.

As described above, we do not think that our choice of error function is a cause for concern.

On the other hand, using a symmetric signal Z in a nonlinear way (e.g., dividing by (1+Z2)) could improve the performance as measured by covariance or correlation, by reducing outliers. However, it is not clear that this is what the authors are doing.

Part of the confusion regarding “what the authors are doing” probably relates to the fact that the old version of the Appendix had some discussion of autocorrelators that went beyond trying to explain the specific results that we observed. For example, we described some models that utilized autocorrelation functions in a nonlinear way. We have now deleted that material from the Appendix entirely.

In sum – the paper should clarify whether the stimulus set is symmetric with respect to velocity sign. If it is asymmetric, it should justify this asymmetry, and, be clear whether it is responsible for the utility of autocorrelators. If it is symmetric, then further explanation is needed as to why autocorrelators are useful.

We hope Appendix 10 has succeeded in this regard.

4) Regarding the Discussion section concerning motion energy computations in vertebrates: In considering how the conclusions might apply to the motion energy model, it looks like the paper is suggesting that vertebrates might use different kinds of deviations from strict multiplication, because of the different linear structure of the motion energy model. In this regard, it would be useful to point out that cellular-level analysis in the macaque (Nitzany et al., Evolutionary convergence in computation of local motion signals in monkey and dragonfly. CoSyNe , 2014) shows that the consequences of these deviations is very similar in terms of the motion signal that is extracted – at least in terms of detection of three-point correlations, which are crucial to this paper.

We thank the reviewers for this citation, which is now referenced appropriately in that Discussion paragraph.

5) Nesting of models. A diagram that shows the “nesting” relationships of the models – which ones are special cases of others, and that the HRC is a special case of all – would be helpful.

This was a fun diagram to think about, and we agree it is helpful. We have included it in Figure 5, where we discuss the most general model, and its relationship to the others. In fact, this conception of the hierarchy of models was key to reorganizing the paper.

6) Static nonlinearities at the front end: the paper considers binarization and histogram-equalization. Might it be useful to add a front-end nonlinearity that converts the intensity distribution to a Gaussian? Kurtosis would be higher than either, but this is the distribution that maximizes information (entropy) for a given variance – so it would be interesting to see how it does.

We simulated this Gaussianizing nonlinearity and found that it performed less well in natural scenes than the binarizing or equalizing nonlinearities. Like the other front-end nonlinearities, it didn’t generate glider responses that matched the data. This information is fully included in the results section about front-end nonlinearities and in Figure 2.

7) The authors must make their program code for the results available either on the journal web site or on a publicly accessible data base. Please add details to the methods on this last point.

We agree to provide code that will be posted on the journal website. However, we want to first ensure that the referees and editors are satisfied by our new paper organization. The process of assembling the code is time-consuming, so we want to do it only once in a way that reflects the final published paper’s organization.

Reviewer #3:

We thank Reviewer 3 for these specific comments. We have addressed them all in our revision. In particular, we have included an Introduction paragraph that puts our work in the broader context of natural scene statistics and visual processing.

1) At the end of the Introduction, the authors say that the perceptual measurements are consistent with only a subset of their models. It would be useful to be clear up front here about which models worked and which ones didn't and in what ways.

We think this should be clear in the revised manuscript, because we now go through the models more sequentially.

2) A comment about writing style. The paper frequently has expressions like “we hypothesize that biology tunes its motion estimators[…]” (subsection “Strategies for visual motion estimation”, first paragraph). Personally, I find it a bit grating to read this sort of broad (over-)generalization about “biology”. Are there motion detectors of some kind in plants, cyanobacteria and tube-worms at deep-sea vents? Maybe, but the authors are not saying anything that would convince me about how they work. There are many such occurrences of the “biology does X or Y” phrasing, which the authors would be well-advised to remove. (Another example is in the third paragraph, but there are many others.) In any case, the hypothesis of tuning of circuits to natural scenes statistics arising from ordinary behavior is hardly new, so a nod to the venerable history of this idea would be a good thing here.

We agree with this comment and have removed references to what biology does. In a few places, we have emphasized that evolution might tune specific parameters, and we have added appropriate citations for this idea.

3) In reading the third paragraph of the subsecton “Strategies for visual motion estimation” and Figure 1, I could not understand how the static front-end nonlinearity allowed the circuit compute higher order spatial correlations more easily. Or maybe that is not being implied, but it seemed to be. Later on it becomes more clear that the nonlinearity is supposed to remove kurtosis, but I was confused at this stage in the paper. More generally, it would be helpful to get some conceptual sense of why these particular models are considered – otherwise one gets a sense of a bit of a grab-bag, especially since many of these models could be combined with each other, no doubt giving improvements in each case.

In the new structure of the paper, we believe it is clearer why we chose these particular models. We have added Table 1, which gives clear experimental rationales (and citations) for each of the models and generalized models we consider. We have also changed our Introduction to give more background for why these models are chosen.

4) In the subsection entitled “Each mechanisms outperforms the HRC[…]”, the authors show that their various models work better than the HRC. They then say that they can gain insight because their models are theoretical tractable. As an example, they say that their front-end nonlinearity does contrast equalization. But this is the first time they say anything about the nature of the nonlinearity, and they don't say why this kind of normalization helps. So I am afraid that the insight does not come through. Continuing in this vein, they say that “a large fraction of […] performance was afforded by a small number of correlation types”. But there has been no discussion in the text at this point of the possible correlation types and which ones are being used, so the comment remains opaque. Then they write “binarizing nonlinearities also offered certain […] advantages”. Again, this is the first time binarizing nonlinearities are mentioned and it is not clear what the advantages are. It seems clear from the opacity of the text that more details of the model variations should be described earlier, in order to make the paper easier to read. Looking at Figure 2 it is clear that everything outperforms the HRC. But the HRC really does poorly – a correlation coefficient of 0.25 or so. The best mechanism has a correlation coefficient of 0.5. This leads me to worry that none of these proposed mechanisms actually works that well. Also, a main message of this paper is that multi-point correlations are important to motion estimation. But it seems in this Figure that the front-end nonlinearity has the biggest effect. Why didn't the authors combine this nonlinearity with the other mechanisms they consider?

To address this comment, we have first clarified our prose surrounding the introduction of the front-end nonlinearities, and we describe more details earlier in the new structure of the manuscript.

The second point about the low correlation coefficient of the HRC is worth exploring. Our models perform no averaging in time or space, so their performance is worse than one could do if one pooled signals over space. The HRC performance we measure is reminiscent of the noisy performance reported in Dror et al. (2001). The fact that the HRC doesn’t work very well for natural inputs is part of the reason that we’re looking for alternatives.

It is true that the front-end nonlinearity had the largest effect of any nonlinearity in improving natural motion estimates. Yet it poorly predicted the psychophysics. We agree that stringing models together may improve motion estimates, but our goal here was to first understand how the different mechanisms work in isolation. We believe that combining these nonlinearities is fodder for future work, but it is not appropriate in this paper, in part because of the combinatorial explosion expected when we start combining models, and in part because combining models makes it far more difficult to understand how they function to improve estimates. That said, we do now emphasize the hierarchical nature of our models, and the final figure of the paper should help future reader think about how these models might be combined.

It’s also interesting to note that the front-end nonlinearities considered here eliminate the light-dark asymmetry from their natural inputs. Thus, such non- linearly preprocessed signals cannot combine effectively with some of the other mechanisms that we describe, such as ON/OFF processing. We now point this out in the text. Our overall view is that tuning frontend nonlinearities for motion estimation is a good way to improve estimation performance, but it doesn’t provide a good way of understanding Drosophila’s computation and circuitry.

5) The authors cite their own work for the glider stimuli. Don't these stimuli and associated analysis come from the decades of work by J. Victor and collaborators? I might be mistaken about this – I am most familiar with the spatial stimuli created with gliders that I learned about from those papers. I understand how these stimuli are constructed from those works, but I suspect that the general reader will need a brief introduction at this stage in the paper, even though this is covered in previous works.

We have introduced the gliders more prominently in the new structure of the manuscript (Figure 1). We continue to cite Hu and Victor prominently as the originator of these spatiotemporal stimuli. We also now cite some of Victor’s references containing spatial gliders in the Introduction.

6) I appreciated the clear statement in the first paragraph of subsection “Model responses to glider stimuli” about what the HRC model fails to predict. In the next paragraph (and Figure 3) the authors discuss how the front-end nonlinearity increases accuracy compared to HRC broadly speaking, but fail to match the responses to negative 2-point correlations and some 3-point gliders. Then they show that models that explicitly compute higher-order correlations correctly predicted the sign of all glider responses, but did not predict the detailed response amplitudes. They try different architectures and find that several architectures show similar performance overall. Of course the output of any of these circuits need not be directly equal to the turning rate of the animal. Am I correct in understanding that the model is that a single gain parameter should relate all of the turning rates under different conditions to the output of the circuit, and that this single gain is fixed by normalizing the positive two-point glider? Also, in Figure 3, is the “equalized” model the same as the front-end nonlinearity model in Figure 1? In what sense is it “equalized”?

In response to the first question, the answer is yes: we are assuming that the behavior is proportional to the model output. This is in line with typical models of the optomotor response, which were first modeled as proportional to the HRC output.

In response to the second question, the equalized model is a specific front-end nonlinearity that transforms the natural input distribution into a uniform distribution. This is in contrast to the binarizing nonlinearity, which turns the natural input distribution into a binary distribution. We have clarified this point when we introduce the front-end nonlinearity models. Furthermore, the new paper architecture avoids the specific issue that the referee is referring to.

7) In the subsection “Improving motion estimation by accounting for natural light-dark asymmetries”, the authors describe results that show that accounting for bright-dark asymmetry improved the results of all of the mechanisms that got the sign of glider responses right. While the text has a technical discussion how various combinations of signals work, I did not understand from the text the conceptual reason why the bright-dark separation helps. At some broad level it is not surprising that adapting to the natural statistics helps with the detection of signals, and indeed it seems here that all the mechanisms are helped by incorporating bright-dark asymmetries. Is there a deeper insight here? Or maybe the point is that the authors are simply making a prediction that the higher order motion detection circuits in flies will be discovered to segregate ON and OFF pathways and then recombine them, independently of which detailed mechanism and nonlinearities are being used? Now in Figure 4A, most of the quadrant models seem to do worse than HRC, and the (— —) quadrant seems not to be significantly better. And all of these models have a very low correlation with velocity. In view of this, could the authors please clarify why they are saying that accounting for bright-dark asymmetry in their model improves things?

We apologize that this section was unclear. In the new manuscript, the presentation of this material is very different from the previous draft. For example, the results that were compressed into Figure 4 are now more distributed throughout the narrative, which affords us the opportunity to discuss how each model incorporates light and dark information differently than prior models. We hope that our presentation is substantially clearer this time around.

When we say that accounting for light-dark asymmetry is helpful in the weighted 4- quadrant model, we meant to emphasize that the four quadrants can be combined in a better manner than the HRC predicts. We did not mean to say that the benefits of ON/OFF processing could be achieved by the isolated quadrants.

In terms of whether there is a deeper insight, we now include some discussion of the many different ways that the models utilized light-dark asymmetries. For example, we discuss how simple ON/OFF misses certain useful cues that are captured by other models. We also try to emphasize a broader set of conceptual points that go beyond the asymmetry between light and dark.

8) In the subsection “Improving motion estimation by reducing kurtosis\ the authors explain that their front end nonlinearities improved motion estimation by reducing the kurtosis in the inputs, even though they did not help with predicting glider responses. Would it help to have these nonlinearities along with the correlation detectors and the ON-OFF segregation discussed in previous sections?

As discussed previously (see response to comment 4), the frontend nonlinearities that we use will eliminate the benefit of subsequent ON/OFF processing. It is less obvious how effectively frontend nonlinearities could be combined with the non- multiplicative nonlinearity model. However, the situation will certainly be much more complicated, because the signal and noise of the non-multiplicative nonlinearity model depend on more than the second and fourth-order statistics of the image ensemble. We point out in our discussion that several of these models could be combined in interesting ways, though combinations would change the optimal weightings. We consider pursuing this to be beyond the scope of the current work.

9) As written, the paper uses computational methods to specifically examine some possible mechanisms of motion estimation in flies in view of the inability of the traditional Reichardt model to explain how animals use higher order motion cues. I agree with the authors that the work has potentially broader significance beyond the specific example discussed here. But in order to engage that broader interest it would be really good if the authors would engage more substantially with the literature on higher order statistics in natural scenes and adaptation of circuit structure and perceptual phenomena to these statistics. At the moment the engagement is cursory. Some suggestions appear below. None of the papers mentioned below is specifically about motion estimation and the suggestion to engage a bit more with all this is not intended to detract from the novelty of the present work; rather it will likely help readers to find the work situated better against this well-known context.

We thank the reviewer for the suggestion to add broader context. We have modified the Introduction and Discussion sections substantially to situate the reader within the broader context of natural scene regularities and their influence on visual processing. Through these changes, we have included many of the references suggested.

https://doi.org/10.7554/eLife.09123.020

Article and author information

Author details

  1. James E Fitzgerald

    Center for Brain Science, Harvard University, Cambridge, United States
    Contribution
    JEF, Conceived of research, designed research, performed research analyzed data, and wrote paper
    For correspondence
    jamesfitzgerald@fas.harvard.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon 0000-0002-0949-4188
  2. Damon A Clark

    1. Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, United States
    2. Department of Physics, Yale University, New Haven, United States
    Contribution
    DAC, Conceived of research, designed research, and wrote paper
    For correspondence
    damon.clark@yale.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon 0000-0001-8487-700X

Funding

Swartz Foundation (Postdoctoral Fellowship)

  • James E Fitzgerald

Searle Scholars Program (Scholar Award)

  • Damon A Clark

Richard and Susan Smith Family Foundation (Scholar Award)

  • Damon A Clark

Alfred P. Sloan Foundation (Sloan Research Fellowship in Neuroscience)

  • Damon A Clark

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors thank Ruben Portugues, Haim Sompolinsky, and Florian Engert for helpful conversations. JEF acknowledges fellowship support from the Swartz Foundation. DAC acknowledges support from a Searle Scholar Award, a Sloan Research Fellowship, and the Smith Family Foundation.

Reviewing Editor

  1. Matteo Carandini, Reviewing Editor, University College London, United Kingdom

Publication history

  1. Received: June 1, 2015
  2. Accepted: October 23, 2015
  3. Accepted Manuscript published: October 24, 2015 (version 1)
  4. Version of Record published: November 27, 2015 (version 2)

Copyright

© 2015, Fitzgerald and Clark

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,981
    Page views
  • 482
    Downloads
  • 6
    Citations

Article citation count generated by polling the highest count across the following sources: Scopus, PubMed Central, Crossref.

Comments

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Cell Biology
    2. Developmental Biology and Stem Cells
    Amy E Brinegar et al.
    Research Article Updated