Shallow neural networks trained to detect collisions recover features of visual loom-selective neurons
Abstract
Animals have evolved sophisticated visual circuits to solve a vital inference problem: detecting whether or not a visual signal corresponds to an object on a collision course. Such events are detected by specific circuits sensitive to visual looming, or objects increasing in size. Various computational models have been developed for these circuits, but how the collision-detection inference problem itself shapes the computational structures of these circuits remains unknown. Here, inspired by the distinctive structures of LPLC2 neurons in the visual system of Drosophila, we build anatomically constrained shallow neural network models and train them to identify visual signals that correspond to impending collisions. Surprisingly, the optimization arrives at two distinct, opposing solutions, only one of which matches the actual dendritic weighting of LPLC2 neurons. Both solutions can solve the inference problem with high accuracy when the population size is large enough. The LPLC2-like solutions reproduce experimentally observed LPLC2 neuron responses for many stimuli, and reproduce canonical tuning of loom-sensitive neurons, even though the models are never trained on neural data. Thus, LPLC2 neuron properties and tuning are predicted by optimizing an anatomically constrained neural network to detect impending collisions. More generally, these results illustrate how optimizing inference tasks that are important for an animal’s perceptual goals can reveal and explain computational properties of specific sensory neurons.
Editor's evaluation
This paper trains a simple neural network model to perform a behaviorally important task: the detection of looming objects. Two solutions emerge, one of which shares several properties with the actual circuit. This is a nice demonstration that training a CNN on a behaviorally relevant task can reveal how the underlying computations work.
https://doi.org/10.7554/eLife.72067.sa0
Introduction
For animals living in dynamic visual environments, it is important to detect the approach of predators or other dangerous objects. Many species, from insects to humans, rely on a range of visual cues to identify approaching, or looming, objects (Regan and Beverley, 1978; Sun and Frost, 1998; Gabbiani et al., 1999; Card and Dickinson, 2008; Münch et al., 2009; Temizer et al., 2015). Looming objects create characteristic visual flow fields. When an object is on a straight-line collision course with an animal, its edges will appear to the observer to expand radially outward, gradually occupying a larger and larger portion of the visual field (Video 1). An object heading toward the animal, but which will not collide with it, also expands to occupy an increasing portion of the visual field, but its edges do not expand radially outwards with respect to the observer. Instead, they expand with respect to the object’s center so that opposite edges can move in the same direction across the retina (Video 2). A collision detector must distinguish between these two cases, while also avoiding predicting collisions in response to a myriad of other visual flow fields, including those created by an object moving away (Video 3) or by the animal’s own motion (Video 4). Thus, loom detection can be framed as a visual inference problem.
Many sighted animals solve this inference problem with high precision, thanks to robust loom-selective neural circuits evolved over hundreds of millions of years. The neuronal mechanisms underlying responses to looming stimuli have been studied in a wide range of vertebrates, from cats and mice to zebrafish, as well as in humans (King et al., 1992; Hervais-Adelman et al., 2015; Ball and Tronick, 1971; Liu et al., 2011; Salay et al., 2018; Liu et al., 2011; Shang et al., 2015; Wu et al., 2005; Temizer et al., 2015; Dunn et al., 2016; Bhattacharyya et al., 2017). In invertebrates, detailed anatomical, neurophysiological, behavioral, and modeling studies have investigated loom detection, especially in locusts and flies (Oliva and Tomsic, 2014; Sato and Yamawaki, 2014; Santer et al., 2005; Rind and Bramwell, 1996; Card and Dickinson, 2008; de Vries and Clandinin, 2012; Muijres et al., 2014; Klapoetke et al., 2017; von Reyn et al., 2017; Ache et al., 2019). An influential mathematical model of loom detection was derived by studying the responses of the giant descending neurons of locusts (Gabbiani et al., 1999). This model established a relationship between the timing of the neurons’ peak responses and an angular size threshold for the looming object. Similar models have been applied to analyze neuronal responses to looming signals in flies, where genetic tools make it possible to precisely dissect neural circuits, revealing various neuron types that are sensitive to looming signals (von Reyn et al., 2017; Ache et al., 2019; Morimoto et al., 2020).
However, these computational studies did not directly investigate the relationship between the structure of loom-sensitive neural circuits and the inference problem they appear to solve. There are two reasons to expect such a relationship. First, the properties of many sensory circuits appear specifically tuned to the tasks that they execute (Turner et al., 2019); in particular, by taking into account relevant behaviors mediated by specific sensory neurons, experiments can provide insight into their tuning properties (Krapp and Hengstenberg, 1996; Sabbah et al., 2017). Second, computational studies that have trained artificial neural networks to solve specific visual and cognitive tasks, such as object recognition or motion estimation, have revealed response patterns similar to those of the corresponding biological circuits (Yamins et al., 2014; Yamins and DiCarlo, 2016; Richards et al., 2019) or even of individual neurons (Mano et al., 2021). Here, we therefore ask whether we can reproduce the properties associated with neural loom detection simply by optimizing shallow neural networks for collision detection.
The starting point for our computational model of loom detection is the known neuroanatomy of the visual system of the fly. In particular, the loom-sensitive neuron LPLC2 (lobula plate/lobula columnar, type 2) has been studied in detail (Wu et al., 2016). These neurons tile visual space, sending their axons to descending neurons called the giant fibers (GFs), which trigger the fly’s jumping and takeoff behaviors (Tanouye and Wyman, 1980; Card and Dickinson, 2008; von Reyn et al., 2017; Ache et al., 2019). Each LPLC2 neuron has four dendritic branches that receive inputs at the four layers of the lobula plate (LP) (Figure 1A; Maisak et al., 2013; Klapoetke et al., 2017). The retinotopic LP layers host the axon terminals of motion detection neurons, and each layer uniquely receives motion information in one of the four cardinal directions (Maisak et al., 2013). Moreover, the physical extensions of the LPLC2 dendrites align with the preferred motion directions in the corresponding LP layers (Figure 1B; Klapoetke et al., 2017). These dendrites form an outward radial structure, which matches the moving edges of a looming object that expands from the receptive field center (Figure 1C). Common stimuli such as the wide-field motion generated by movement of the insect only match part of the radial structure, and strong inhibition for inward-directed motion suppresses responses to such stimuli. Thus, the structure of the LPLC2 dendrites favors responses to visual stimuli with edges moving radially outwards, corresponding to objects approaching the receptive field center.
The focus of this paper is to investigate how loom detection in LPLC2 can be seen as the solution to a computational inference problem. Can the structure of the LPLC2 neurons be explained in terms of optimization, carried out during the course of evolution, for the task of predicting which trajectories will result in collisions? How does coordination among the population of more than 200 LPLC2 neurons tiling a fly’s visual system affect this optimization? To answer these questions, we built simple anatomically constrained neural network models, which receive motion signals in the four cardinal directions. We trained the models using artificial stimuli to detect visual objects on a collision course with the observer. Surprisingly, optimization finds two distinct types of solutions, with one resembling the LPLC2 neurons and the other having a very different configuration. We analyzed how each of these solutions detects looming events and where they show distinct individual and population behaviors. When tested on visual stimuli not in the training data, the optimized solutions with filters that resemble LPLC2 neurons exhibit response curves that are similar to those of LPLC2 neurons measured experimentally (Klapoetke et al., 2017). Importantly, although it only receives motion signals, the optimized model shows characteristics of an angular size encoder, which is consistent with many biological loom detectors, including LPLC2 (Gabbiani et al., 1999; von Reyn et al., 2017; Ache et al., 2019). Our results show that optimizing a neural network to detect looming events can give rise to the properties and tuning of LPLC2 neurons.
Results
A set of artificial visual stimuli is designed for training models
Our goal is to compare computational models trained to perform loom detection with the biological computations in LPLC2 neurons. We first created a set of stimuli to act as training data for the inference task (Materials and methods). We considered the following four types of motion stimuli: loom-and-hit (abbreviated as hit), loom-and-miss (miss), retreat, and rotation (Figure 2). The hit stimuli consist of a sphere that moves in a straight line towards the origin on a collision course (Figure 3, Video 1). The miss stimuli consist of a sphere that moves in a straight line toward the origin but misses it (Figure 3, Video 2). The retreat stimuli consist of a sphere moving in a straight line away from the origin (Figure 3, Video 3). The rotation stimuli consist of objects rotating about an axis that goes through the origin (Figure 3, Video 4). All stimuli were designed to be isotropic: the first three stimuli could have any orientation in space, while the rotation could be about any axis through the origin (Figure 2). All trajectories were simulated in the frame of reference of the fly at the origin, with distances measured with respect to the origin. For simplicity, the fly is assumed to be a point particle with no volume (red dots in Figure 2 and the apexes of the cones in Figure 3). For hit, miss, and retreat stimuli, the spherical object has unit radius, and for rotation stimuli, there are 100 objects of various radii scattered isotropically around the fly (Figure 3).
An anatomically constrained mathematical model
We designed and trained simple, anatomically constrained neural networks (Figure 4) to infer whether or not a moving object will collide with the fly. The features of these networks were designed to mirror anatomical features of the fly’s LPLC2 neurons (Figure 1). We will consider two types of single units in our models (Figure 4): a linear receptive field (LRF) unit and a rectified inhibition (RI) unit. Both types of model units receive input from a 60-degree-diameter cone of visual space, represented by white cones and grey circles in Figure 3, approximately the same size as the receptive fields measured in LPLC2 (Klapoetke et al., 2017). The four stimulus sets were projected into this receptive field for training and evaluating the models. The inputs to the units are local directional signals computed in the four cardinal directions at each point of the visual space: downward, upward, rightward, and leftward (Figure 3). These represent the combined motion signals from T4 and T5 neurons in the four layers of the lobula plate (Maisak et al., 2013). They are computed as the non-negative components of a Hassenstein-Reichardt correlator model (Hassenstein and Reichardt, 1956) in both horizontal and vertical directions (Figure 3—figure supplement 1; Materials and methods). The motion signals are computed with a spacing of 5 degrees, roughly matching the spacing of the ommatidia and processing columns in the fly eye (Stavenga, 2003).
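To make the input stage concrete, here is a minimal one-dimensional sketch of an opponent Hassenstein-Reichardt correlator whose output is split into two non-negative directional components, in the spirit of the description above. It assumes a first-order low-pass filter as the delay stage, a 50 ms time constant, and a drifting sine grating of 30-degree wavelength sampled at two points 5 degrees apart; these choices and all function names are illustrative, not the model's actual parameters (those are given in Materials and methods).

```python
import numpy as np


def low_pass(signal, dt, tau):
    """First-order low-pass filter, used here as the correlator's delay stage."""
    out = np.zeros_like(signal)
    for n in range(1, len(signal)):
        out[n] = out[n - 1] + (dt / tau) * (signal[n] - out[n - 1])
    return out


def hr_correlator(a, b, dt, tau):
    """Opponent Hassenstein-Reichardt output; positive on average for motion from a toward b."""
    return low_pass(a, dt, tau) * b - a * low_pass(b, dt, tau)


def directional_components(a, b, dt=1e-3, tau=0.05):
    """Split the opponent output into two non-negative directional signals."""
    hr = hr_correlator(a, b, dt, tau)
    return np.maximum(hr, 0.0), np.maximum(-hr, 0.0)


def grating_at_two_points(direction, spacing_deg=5.0, wavelength_deg=30.0,
                          temporal_hz=2.0, duration_s=2.0, dt=1e-3):
    """Luminance of a drifting sine grating at two neighboring points.

    direction=+1 moves the grating from point a toward point b; -1 reverses it.
    """
    t = np.arange(0.0, duration_s, dt)
    spatial_phase = 2 * np.pi * spacing_deg / wavelength_deg
    a = np.sin(2 * np.pi * temporal_hz * t)
    b = np.sin(2 * np.pi * temporal_hz * t - direction * spatial_phase)
    return a, b


a, b = grating_at_two_points(direction=+1)
toward_b, toward_a = directional_components(a, b)
steady = slice(len(a) // 2, None)  # discard the filter's onset transient
print(toward_b[steady].mean() > toward_a[steady].mean())  # True
```

Running the same pair with `direction=-1` swaps which component dominates, so each rectified channel carries motion in only one direction, analogous to the separate LP layers.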
For the LRF model unit (Figure 4A), there is a single set of four real-valued filters, the elements of which can be positive (excitatory, red) or negative (inhibitory, blue). The four filters integrate motion signals from the four cardinal directions, respectively, and their outputs are summed and rectified to generate the output of a single model unit (Figure 4A). These spatial filters effectively represent excitatory inputs to LPLC2 directly from T4 and T5 in the LP (positive elements), and inhibitory inputs mediated by local interneurons (negative elements) (Mauss et al., 2015; Klapoetke et al., 2017). All filters act on the 60-degree receptive field of a unit. A 90-degree rotational symmetry is imposed on the filters, so that the filters for the four layers are identical up to a rotation by a multiple of 90 degrees. Moreover, each filter is symmetric about the axis of motion (Materials and methods). Although these symmetry assumptions are not necessary and can emerge from training (Figure 5—figure supplement 3), they greatly reduce the number of parameters in the models. No further assumptions were made about the structure of the filters. Note that the opposing spatial patterns of the excitatory and inhibitory components in Figure 4 are only for illustration purposes and are not imposed on the models. The LRF model unit is equivalent to a linear-nonlinear model and constitutes one of the simplest possible models for the LPLC2 neurons. In this work, we focus most of our analysis on this simplest form of the model.
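The LRF unit can be sketched in a few lines. This hypothetical version approximates the 60-degree receptive field by a flat 13 × 13 grid at 5-degree spacing (the actual model samples a spherical cap), derives the upward, leftward, and downward filters from a single rightward filter by 90-degree rotations, enforces symmetry about the motion axis, and rectifies the summed filter outputs plus an intercept. The hand-made `np.sign(X)` filter is an outward-like toy, not a trained solution.

```python
import numpy as np

STEP_DEG = 5.0                                     # input spacing, as in the text
COORDS = np.arange(-30.0, 30.0 + STEP_DEG, STEP_DEG)
X, Y = np.meshgrid(COORDS, COORDS[::-1])           # row 0 is the top (Y = +30 deg)
MASK = (X**2 + Y**2) <= 30.0**2                    # flat stand-in for the 60-degree cap


def rotated_filters(w_right):
    """Build the four direction filters from one rightward filter."""
    w_right = 0.5 * (w_right + np.flipud(w_right))  # symmetric about the motion axis
    w_right = w_right * MASK
    # np.rot90 turns the array counterclockwise as displayed, so with row 0 at the
    # top the rightward filter becomes the upward one, then leftward, then downward.
    return {
        "right": w_right,
        "up": np.rot90(w_right, 1),
        "left": np.rot90(w_right, 2),
        "down": np.rot90(w_right, 3),
    }


def lrf_response(inputs, w_right, intercept):
    """Rectified sum of the four filtered motion channels plus an intercept."""
    filters = rotated_filters(w_right)
    drive = sum(np.sum(filters[d] * inputs[d]) for d in filters)
    return float(max(drive + intercept, 0.0))


def radial_motion(outward=True):
    """Toy non-negative inputs for motion expanding from (or contracting to) the center."""
    s = 1.0 if outward else -1.0
    return {
        "right": np.maximum(s * X, 0.0),
        "left": np.maximum(-s * X, 0.0),
        "up": np.maximum(s * Y, 0.0),
        "down": np.maximum(-s * Y, 0.0),
    }


w_outward = np.sign(X)  # toy rightward filter: excite right half, inhibit left half
print(lrf_response(radial_motion(True), w_outward, intercept=-1.0))   # positive
print(lrf_response(radial_motion(False), w_outward, intercept=-1.0))  # 0.0
```

Flipping the sign of the toy filter (`-w_outward`) gives an inward-like unit that responds to the contracting pattern instead.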
In addition, we will also consider a more complex model unit, which we call the rectified inhibition (RI) model unit (Figure 4B). In this unit, all the same filter symmetries are enforced, but there are two sets of non-negative filters: a set of excitatory filters and a set of inhibitory filters. The RI unit incorporates a fundamental difference between the excitatory and inhibitory filters: while the integrated signals from each excitatory filter are sent directly to the downstream computations, the integrated signals from each inhibitory filter are rectified before being sent downstream. The outputs of the eight filters are summed and rectified to generate the output of a single model unit in response to a given stimulus (Figure 4B). If one removes the rectification of the inhibitory filters (as is possible with an appropriate choice of parameters), the RI unit becomes equivalent to the LRF unit (Materials and methods). This difference between the two model units reflects different potential constraints on the inhibitory inputs to an actual LPLC2 neuron. While the excitatory inputs to LPLC2 are direct connections, the inhibitory inputs are mediated by inhibitory interneurons (LPi) between LP layers (Mauss et al., 2015; Klapoetke et al., 2017). The LRF model unit assumes that LPi interneurons are linear and do not strongly rectify signals, while the RI model unit approximates rectified transmission by LPi interneurons.
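A sketch of the RI unit follows, under one assumption not stated in this section: each inhibitory branch has a shared offset inside its rectifier (`inh_offset`, a hypothetical parameter). Because the filters and motion inputs are non-negative, any non-negative offset keeps every inhibitory rectifier in its linear range, and the unit then reduces to an LRF unit with weights `exc - inh` and a shifted intercept; a negative offset makes the inhibition thresholded.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def ri_response(inputs, exc, inh, inh_offset, intercept):
    """RI unit: direct excitation minus separately rectified inhibition, then rectified."""
    excitation = sum(np.sum(w_e * u) for w_e, u in zip(exc, inputs))
    # Each inhibitory branch is rectified on its own before being subtracted.
    inhibition = sum(relu(np.sum(w_i * u) + inh_offset) for w_i, u in zip(inh, inputs))
    return float(relu(excitation - inhibition + intercept))


def lrf_response(inputs, weights, intercept):
    """LRF unit with real-valued weights, for comparison."""
    drive = sum(np.sum(w * u) for w, u in zip(weights, inputs))
    return float(relu(drive + intercept))


rng = np.random.default_rng(0)
inputs = [rng.random((13, 13)) for _ in range(4)]  # four non-negative motion channels
exc = [rng.random((13, 13)) for _ in range(4)]     # non-negative excitatory filters
inh = [rng.random((13, 13)) for _ in range(4)]     # non-negative inhibitory filters

# With a non-negative offset, the inhibitory rectifiers never engage.
ri = ri_response(inputs, exc, inh, inh_offset=0.5, intercept=100.0)
lrf = lrf_response(inputs, [e - i for e, i in zip(exc, inh)], intercept=100.0 - 4 * 0.5)
print(np.isclose(ri, lrf))  # True
```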
In the fly brain, a population of LPLC2 neurons converges onto the GFs (Figure 1D). Accordingly, our model contains $M$ replicates of the model unit, with orientations spread uniformly over the $4\pi$ steradians of the unit sphere (Figure 4C and D, Figure 4—figure supplement 1, Materials and methods). In this way, the receptive fields of the $M$ units roughly tile the whole angular space, with or without overlap, depending on the value of $M$. The sum of the responses of the $M$ model units is fed into a sigmoid function to generate the predicted probability of collision for a given trajectory (Materials and methods). The loss function is then defined as the cross entropy between the predicted probabilities and the stimulus labels (hits are labeled 1 and all other stimuli are labeled 0).
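The population readout and loss can be written compactly. In this sketch the pooled response passes through a logistic sigmoid with a scale and a bias (`readout_scale`, `readout_bias`); whether the model's readout includes such parameters is an assumption here, and the exact form is given in Materials and methods.

```python
import numpy as np


def hit_probability(unit_responses, readout_scale=1.0, readout_bias=0.0):
    """Sum the M unit responses (last axis) and map the total to a probability."""
    pooled = np.sum(unit_responses, axis=-1)
    return 1.0 / (1.0 + np.exp(-(readout_scale * pooled + readout_bias)))


def cross_entropy(p_hit, labels, eps=1e-12):
    """Binary cross entropy; labels are 1 for hits and 0 for all other stimuli."""
    p = np.clip(p_hit, eps, 1.0 - eps)
    y = np.asarray(labels, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


# Toy population of M = 3 units responding to two stimuli.
responses = np.array([[0.0, 2.0, 1.5],   # a hit activating two units
                      [0.0, 0.0, 0.1]])  # a non-hit barely activating one unit
labels = np.array([1, 0])
p_hit = hit_probability(responses, readout_bias=-1.0)
print(p_hit, cross_entropy(p_hit, labels))
```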
Optimization finds two distinct solutions to the loominference problem
The objective of this study is to investigate how the binary classification task shapes the structure of the filters, and how the number of units $M$ affects the results. We begin with the simplest LRF model, which possesses only a single unit, $M=1$. After training with 200 random initializations of the filters, we find that the converged solutions fall into three broad categories (Figure 5, Figure 5—figure supplement 1). Two solution types have spatial structures that are, surprisingly, roughly opposite to one another (magenta and green). Based on the configurations of the positive-valued elements (stronger excitation) of the filters (Materials and methods), we call one solution type outward solutions (magenta) and the other inward solutions (green) (Figure 5C, Figure 5—figure supplement 1). In this single-unit model, the inward solutions have higher area under the curve (AUC) scores for both the receiver operating characteristic (ROC) and precision-recall curves, and thus perform better than the outward solutions on the discrimination task (Figure 5D). A third category of solution has all filter elements very close to zero (zero solutions in black squares, Figure 5—figure supplement 1). This solution arises in roughly 5–15% of initializations and appears to be a local minimum of the optimization, reached depending on the random initialization. This uninteresting category of solutions is ignored in subsequent analyses.
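For readers less familiar with the two AUC scores, the following sketch computes ROC-AUC as the probability that a randomly chosen hit receives a higher score than a randomly chosen non-hit (ties count one half), and PR-AUC as average precision over the ranked list. It is a plain NumPy illustration, not the evaluation code used for the figures, and the average-precision function gives tied scores no special treatment.

```python
import numpy as np


def roc_auc(scores, labels):
    """ROC-AUC as P(score of a positive > score of a negative); ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def average_precision(scores, labels):
    """PR-AUC as average precision: mean precision at the rank of each positive."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-scores, kind="stable")
    hits = labels[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float(np.sum(precision_at_k * hits) / np.sum(hits))


scores = np.array([0.1, 0.4, 0.35, 0.8])  # e.g. predicted hit probabilities
labels = np.array([0, 0, 1, 1])           # 1 = hit
print(roc_auc(scores, labels), average_precision(scores, labels))
```

For these toy scores the two functions give 0.75 and about 0.833; both reach 1 only when every hit is ranked above every non-hit.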
As the number of units $M$ increases, the population of units covers more angular space, and when $M$ is large enough ($M\ge 16$), the receptive fields of the units begin to overlap with one another (Figure 6A). In the fly visual system there are over 200 LPLC2 neurons across both eyes (Ache et al., 2019), which corresponds to a dense distribution of units. This is illustrated by the third row in Figure 6A, where $M=256$. When $M$ is large, objects approaching from any direction are detectable, and such object signals can be detected simultaneously by many neighboring units. The two oppositely structured solutions persist regardless of the value of $M$ (Figure 6, Figure 6—figure supplement 1, Figure 6—figure supplement 2, Figure 6—figure supplement 3). Strikingly, the inhibitory component of the outward solutions becomes broader as $M$ increases, eventually extending across the entire receptive field (Figure 6B). This broad inhibition is consistent with the large receptive fields of LPi neurons suggested by experiments (Mauss et al., 2015; Klapoetke et al., 2017). The outward, inward, and zero solutions also all appear among trained solutions of the RI models (Figure 5—figure supplement 2, Figure 6—figure supplement 3).
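A back-of-the-envelope area argument, which is not part of the original analysis, helps explain when overlap must appear. A 60-degree-diameter receptive field is a spherical cap of angular radius $30^{\circ}$, covering a fraction $(1-\cos 30^{\circ})/2 \approx 0.067$ of the sphere. Once $M$ times this fraction exceeds 1, that is, from $M = 15$ on, some fields must overlap however the units are arranged; the exact onset for the model's own arrangement is a separate question. The sketch also gives the average number of units covering each direction.

```python
import numpy as np


def cap_fraction(half_angle_deg):
    """Fraction of the sphere's 4*pi steradians covered by one circular cap."""
    return (1.0 - np.cos(np.deg2rad(half_angle_deg))) / 2.0


def mean_coverage(m_units, half_angle_deg=30.0):
    """Average number of receptive fields containing a given direction."""
    return m_units * cap_fraction(half_angle_deg)


# Smallest M whose total cap area exceeds the sphere, so overlap is unavoidable.
m_forced_overlap = int(np.floor(1.0 / cap_fraction(30.0))) + 1

for m in (8, 16, 32, 256):
    print(m, round(mean_coverage(m), 2))
print("overlap unavoidable from M =", m_forced_overlap)
```

At $M=256$ each direction is covered by roughly 17 units on average, which is why a single approaching object can activate many neighboring units at once.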
Units with outward-oriented filters are activated by motion radiating outwards from the center of the receptive field, such as the hit event illustrated in Figure 3. These excitatory components resemble the dendritic structures of the actual LPLC2 neurons observed in experiments, where, for example, the rightward-motion-sensitive component (LP2) occupies mainly the right side of the receptive field. In the outward solutions of the LRF models, the rightward-motion-sensitive inhibitory components mainly occupy the left side of the receptive field (Figure 6B, Figure 6—figure supplement 2). This is also consistent with the properties of the lobula plate intrinsic (LPi) interneurons, which project inhibitory signals roughly retinotopically from one LP layer to the adjacent layer with opposite directional tuning (Mauss et al., 2015; Klapoetke et al., 2017).
The unexpected inward-oriented filters have the opposite structure. In the inward solutions, the rightward-sensitive excitatory component occupies the left side of the receptive field, and the inhibitory component occupies the right side. Such weightings make the model selective for motion converging toward the receptive field center, such as the retreat event shown in Figure 3. This is a puzzling structure for a loom detector, and warrants a more detailed exploration of the response properties of the inward and outward solutions.
Units with outward and inward filters respond to hits originating in distinct regions
To understand the differences between the two types of solutions and why the inward ones can predict collisions, we investigated how units respond to hit stimuli originating at different angles $\theta $ (Figure 7A). When there is no signal, the baseline activity of outward units is zero; however, the baseline activity of inward units is above zero (grey dashed lines in Figure 7B and C). This is because the trained intercepts are negative (positive) in the outward (inward) case and when the input is zero (no signal), the unit activity cannot (can) get through the rectifier (Materials and methods). (The training did not impose any requirements on these intercepts.) The outward units respond strongly to stimuli originating near the center of the receptive field, but do not respond to stimuli originating at angles larger than approximately ${30}^{\circ}$ (Figure 7B and C). In contrast, inward units respond below baseline to hit stimuli approaching from the center and above baseline to stimuli approaching from the periphery of the receptive field, with $\theta $ between roughly ${30}^{\circ}$ and ${90}^{\circ}$ (Figure 7B and C). This helps explain why the inward units can act as loom detectors: they are sensitive to hit stimuli coming from the edges of the receptive field rather than from the center. The hit stimuli are isotropic (Figure 2A), so the number of stimuli with angles between ${30}^{\circ}$ and ${90}^{\circ}$ is much larger than the number of stimuli with angles below ${30}^{\circ}$ (Figure 7D). Thus, the inward units are sensitive to more hit cases than the outward ones. One may visualize these responses as heat maps of the mean response of the units in terms of object distance to the fly and the incoming angle (Figure 7E). For the hit cases, the response patterns are consistent with the intuition about trajectory angles (Figure 7C). Both outward and inward units respond less strongly to miss signals than to hit signals. 
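The geometric argument in the preceding paragraph can be checked directly. For isotropically distributed directions, the probability that the angle to a unit's receptive-field center lies between $\theta_1$ and $\theta_2$ is $(\cos\theta_1 - \cos\theta_2)/2$. The sketch evaluates this for the central ($0^{\circ}$–$30^{\circ}$) and peripheral ($30^{\circ}$–$90^{\circ}$) bands and confirms it with a Monte Carlo sample; the function name is illustrative.

```python
import numpy as np


def isotropic_band_probability(theta_lo_deg, theta_hi_deg):
    """P(angle to a fixed axis lies in [lo, hi]) for isotropically distributed directions."""
    lo, hi = np.deg2rad([theta_lo_deg, theta_hi_deg])
    return (np.cos(lo) - np.cos(hi)) / 2.0


center = isotropic_band_probability(0.0, 30.0)      # about 0.067
periphery = isotropic_band_probability(30.0, 90.0)  # about 0.433
print(round(periphery / center, 2))                 # 6.46

# Monte Carlo check: normalized Gaussian vectors are uniform on the sphere.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(200_000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
angles = np.rad2deg(np.arccos(np.clip(dirs[:, 2], -1.0, 1.0)))
print(np.mean(angles < 30.0), np.mean((angles >= 30.0) & (angles <= 90.0)))
```

So, for isotropic hit stimuli, roughly 6.5 times as many trajectories arrive from the peripheral band, where inward units respond, as from the central band, where outward units respond.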
As expected, while the outward units respond at most weakly to retreating signals, the inward ones respond to these signals with angles near ${180}^{\circ}$, since the motion of edges in such cases is radially inward. The RI model units have similar response patterns to the LRF model units (Figure 7—figure supplement 1).
Outward solutions have sparse codings and populations of units accurately predict hit probabilities
Individual units of the two solutions are very different from one another in both their filter structure and their response patterns to different stimuli. In populations of units, the outward and inward solutions also exhibit very different response patterns for a given hit stimulus (Figure 8A and B, Videos 5 and 6). In particular, active outward units usually respond more strongly than inward units, but more inward units will be activated by a hit stimulus. This is consistent with the findings above, in which inward filter shapes responded to hits arriving from a wider distribution of angles (Figure 7). For all four types of stimuli, the outward solutions generally show relatively sparse activity among units, especially for models with larger numbers of units $M$, while the inward solutions show broader activity among units (Figure 8A and B, Figure 8—figure supplement 1, Figure 8—figure supplement 2, Videos 5–12).
When a population of units encodes stimuli, at each time point, the sum of the activities of the units is used to infer the probability of hit. In our trained models, the outward and inward solutions predict similar trajectories of probabilities of hit (Figure 8A). The outward solutions suppress the miss and retreat signals better, while the inward solutions better suppress rotation signals (Figure 8C). Both 32-unit and 256-unit models have units covering the entire visual field (Figure 6), but the models with 256 units can more accurately detect hit stimuli (Figure 8C). In some cases, misses can appear very similar to hits if the object passes near the origin. Both inward and outward solutions reflect this in their predictions in response to near misses, which have higher hit probabilities than far misses (Figure 8D).
The inward and outward solutions of RI models have similar behaviors to the LRF models (Figure 8—figure supplement 3). This indicates that the linear receptive field intuition also captures the behavior of the potentially more complicated RI model.
Large populations of units improve performance
Since a larger number of units will cover a larger region of the visual field, a larger population of units can in principle provide more information about the incoming signals. In general, the models perform better as the number of units $M$ increases (Figure 9A). When $M$ is above 32, both the ROC-AUC and PR-AUC scores are almost 1 (Materials and methods), which indicates that the model is very accurate on the binary classification task presented by the four types of synthetic stimuli. As $M$ increases, the outward solutions become closer to the inward solutions in terms of both AUC score and cross entropy loss (Figure 9A and B). This is also true for the RI models (Figure 9—figure supplement 1A and B).
Beyond the performance of the two solution types, we also calculated the ratio of the number of outward to inward solutions in 200 random initializations of the models with $M$ units. For the LRF models, this ratio remained relatively constant as the number of units increased, fluctuating around 0.5 (Figure 9—figure supplement 2). For the RI models, on the other hand, an increasing proportion of solutions have outward filters as the number of units increases (Figure 9—figure supplement 2). For RI models with 256 units, the chance that an outward filter appears as a solution is almost 90%, compared with roughly 50% when $M=1$.
Because of this qualitative difference between the LRF and RI models, we next asked whether the form of the nonlinearity in the LRF model could influence the solutions found through optimization. When we replaced the rectified linear unit (ReLU) in the LRF models with the exponential linear unit (ELU) (Materials and methods), only inward solutions appeared for models with $M<32$, and outward solutions emerged more often as $M$ increased (Figure 9—figure supplement 2). Combined, these results with LRF and RI models indicate that the form and position of the nonlinearity in the circuit play a role in selecting between different optimized solutions. This suggests that further studies of the nonlinearities in LPLC2 processing will lead to additional insight into how a population of LPLC2s encodes looming stimuli.
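For reference, the two nonlinearities differ only for negative drive. The sketch uses the standard ELU with $\alpha=1$ (the value used in the model is an assumption here). Unlike ReLU, ELU returns a graded, bounded negative output and a nonzero gradient when the summed input is negative, so in a model of this form a sub-threshold unit still contributes to the pooled response and keeps receiving gradient updates. This illustrates the difference only; it does not by itself explain which solution type the optimization selects.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def elu(x, alpha=1.0):
    # np.minimum keeps expm1 from overflowing on large positive inputs.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def relu_grad(x):
    return np.where(x > 0, 1.0, 0.0)


def elu_grad(x, alpha=1.0):
    return np.where(x > 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))


drive = np.array([-3.0, -1.0, 0.0, 2.0])  # summed filter output plus intercept
print(relu(drive), elu(drive))
print(relu_grad(drive), elu_grad(drive))
```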
The current binary prediction task is relatively easy for our loom detection models, as can be seen from the saturated AUC scores when $M$ is large (Figure 9A, Figure 9—figure supplement 1A). We therefore engineered a new set of stimuli in which, for the hit, miss, and retreat cases, we added a rotational background to increase the difficulty of the task. The object of interest, which is moving toward or away from the observer, also rotates with the background. This arrangement mimics self-rotation of the fly while it observes a looming or retreating object in a cluttered background. To train an LRF model, we added such rotational distraction to half of the hit, miss, and retreat cases. The outward and inward solutions both persist, although in this case outward solutions outperform inward ones (Figure 9—figure supplement 3). It remains unclear whether this added complexity brings our artificial stimuli closer to the actual detection tasks performed by flies, but this result makes clear that identifying the natural statistics of looming stimuli will be important for understanding loom inference.
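One way to superimpose such self-rotation on a trajectory is sketched below: every point in the scene, including the looming object, is rotated about a fixed axis through the fly at a constant angular speed, using Rodrigues' rotation formula. The axis, angular speed, and background layout are hypothetical choices for illustration. Because the rotation preserves each point's distance from the origin, it leaves the hit, miss, or retreat label of the object unchanged while altering the motion field the units receive.

```python
import numpy as np


def rotation_matrix(axis, angle_rad):
    """Rodrigues' formula: rotation by angle_rad about a unit axis through the origin."""
    x, y, z = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    k = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle_rad) * k + (1.0 - np.cos(angle_rad)) * (k @ k)


def add_self_rotation(points, times, axis, omega_rad_s):
    """Rotate the scene at each time step; points has shape (n_times, n_points, 3)."""
    return np.stack([
        pts @ rotation_matrix(axis, omega_rad_s * t).T for pts, t in zip(points, times)
    ])


# Hypothetical scene: one object on a collision course plus a fixed background.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 1.8, 10)
obj = np.array([0.0, 0.0, 10.0]) + times[:, None] * np.array([0.0, 0.0, -5.0])
background = rng.normal(size=(100, 3)) * 20.0
points = np.concatenate(
    [obj[:, None, :], np.broadcast_to(background, (len(times), 100, 3))], axis=1
)
rotated = add_self_rotation(points, times, axis=[0.0, 1.0, 0.0], omega_rad_s=np.pi / 4)
print(np.allclose(np.linalg.norm(rotated, axis=-1), np.linalg.norm(points, axis=-1)))
```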
Activation patterns of computational solutions resemble biological responses
The outward solutions have a receptive field structure that is similar to LPLC2 neurons, based on anatomical and functional studies. However, it is not clear whether these models possess the functional properties of LPLC2 neurons, which have been studied systematically (Klapoetke et al., 2017; Ache et al., 2019). To see how trained units compare to LPLC2 neuron properties, we presented stimuli to the trained outward model solution to compare its responses to those measured in LPLC2 (Figure 10).
The outward unit behaves similarly to LPLC2 neurons on many different types of stimuli. Not surprisingly, the unit is selective for loom signals and does not have strong responses to non-looming signals (Figure 10B). Moreover, the unit closely follows the responses of LPLC2 neurons to various expanding bar stimuli, including the inhibitory effects of inward motion (Figure 10C and D). In addition, in experiments, motion signals that appear at the periphery of the receptive field suppress the activity of the LPLC2 neurons (periphery inhibition) (Klapoetke et al., 2017), and this phenomenon is successfully predicted by the outward unit (Figure 10E and F) due to its broad inhibitory filters (Figure 10A). The unit also correctly predicts response patterns of the LPLC2 neurons for expanding bars with different orientations (Figure 10G and H).
The ratio of object size to approach velocity, or $R/v$, is an important parameter for looming stimuli, and many studies have investigated how the response patterns of loom-sensitive neurons depend on this ratio (Top panels in Figure 10I, J, K and L; Gabbiani et al., 1999; von Reyn et al., 2017; Ache et al., 2019; de Vries and Clandinin, 2012). Here, we presented the trained model (Figure 10A) with hit stimuli with different $R/v$ ratios, and compared its response with experimental measurements (Figure 10I–L). Surprisingly, although our model only has angular velocities as inputs (Figure 3), it reliably encodes the angular size of the stimulus rather than its angular velocity (Figure 10J). This is indicated by the collapsed response curves with different $R/v$ ratios (up to different scales) when plotted against the angular sizes (von Reyn et al., 2017). When the curves are plotted against angular velocity, they shift for different $R/v$ ratios, which means the response depends on the velocity $v$ of the object, since $R$ is fixed to be 1. The relative shifts in these curves are consistent with properties of LPLC2.
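The relation between angular size and angular velocity for a hit stimulus can be made explicit. In the small-object approximation commonly used for looming stimuli (Gabbiani et al., 1999), an object of radius $R$ approaching at speed $v$ subtends $\theta(\tau) = 2\arctan\left(\frac{R}{v\tau}\right)$ at time $\tau$ before collision, and differentiating gives $\dot{\theta} = \frac{2\sin^{2}(\theta/2)}{R/v}$. At a fixed angular size, angular velocity therefore scales as $1/(R/v)$, which is why response curves that collapse against angular size must shift against angular velocity. The sketch assumes this approximation rather than the exact sphere geometry of the training stimuli; the $R/v$ values are illustrative.

```python
import numpy as np


def angular_size(tau_s, r_over_v_s):
    """Full angular size (rad) at time tau_s before collision (small-object approximation)."""
    return 2.0 * np.arctan(r_over_v_s / tau_s)


def angular_velocity_at_size(theta_rad, r_over_v_s):
    """Expansion rate d(theta)/dt in rad/s when the object subtends theta_rad."""
    return 2.0 * np.sin(theta_rad / 2.0) ** 2 / r_over_v_s


theta = np.deg2rad(40.0)
for r_over_v in (0.01, 0.04, 0.08):  # seconds; hypothetical sweep
    rate = np.rad2deg(angular_velocity_at_size(theta, r_over_v))
    print(f"R/v = {r_over_v:.2f} s: {rate:.0f} deg/s at 40 deg")
```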
There are two ways that this angular size tuning likely arises. First, in hit stimuli, the angular size and angular velocity are strongly correlated (Gabbiani et al., 1999), which means the angular size affects the magnitude of the motion signals. Second, in hit stimuli, the angular size is proportional to the path length of the outward-moving edges. This angular circumference of the hit stimulus determines how many motion detectors are activated, so that integrated motion signal strength is related to the size. Both of these effects influence the response patterns of the model units (and the LPLC2 neurons). Beyond the tuning to stimulus size, the outward model also reproduces a canonical linear relationship between the peak response time relative to the collision and the $R/v$ ratio (Figure 10L; Gabbiani et al., 1999; Ache et al., 2019).
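The linear relationship follows from a simple angular-threshold picture (Gabbiani et al., 1999). Under the small-object approximation $\theta(\tau) = 2\arctan\left(\frac{R}{v\tau}\right)$, the object reaches a threshold size $\theta_{\mathrm{th}}$ at $\tau_{\mathrm{th}} = \frac{R/v}{\tan(\theta_{\mathrm{th}}/2)}$ before collision, which is linear in $R/v$; a fixed delay $\delta$ between threshold crossing and the response peak shifts the intercept but not the slope. The 30-degree threshold and 20 ms delay below are hypothetical values, not fits to the model or to LPLC2.

```python
import numpy as np


def peak_time_before_collision(r_over_v_s, theta_th_deg, delay_s):
    """Time before collision (s) of a response peak that lags a size threshold by delay_s."""
    tau_threshold = r_over_v_s / np.tan(np.deg2rad(theta_th_deg) / 2.0)
    return tau_threshold - delay_s


r_over_v = np.array([0.01, 0.02, 0.04, 0.08])  # s, hypothetical sweep
tau_peak = peak_time_before_collision(r_over_v, theta_th_deg=30.0, delay_s=0.02)
slope, intercept = np.polyfit(r_over_v, tau_peak, deg=1)
print(round(slope, 3), round(intercept, 3))  # slope ≈ 1/tan(15°) ≈ 3.732, intercept ≈ -0.02
```

Reading the fit the other way round, the slope of a measured peak-time line gives the implied threshold angle, $\theta_{\mathrm{th}} = 2\arctan(1/\mathrm{slope})$.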
Not surprisingly, the inward solution cannot reproduce the neural data (Figure 10—figure supplement 1). On the other hand, some RI model outward solutions can closely reproduce the neural data (Figure 10—figure supplement 2), while other outward solutions fail to do so (Figure 10—figure supplement 3, Figure 10—figure supplement 4). For example, some RI outward solutions predict responses to the wide expanding bars that differ from, and are out of phase with, the biological data (Figure 10—figure supplement 3H), poorly predict the response curves of the LPLC2 neurons to looming signals with different $R/v$ ratios (Figure 10—figure supplement 3J and K), respond strongly to moving gratings (Figure 10—figure supplement 4B), fail to show peripheral inhibition (Figure 10—figure supplement 4E and F), and so on. This shows that, for the RI models, even within the family of learned outward solutions, there is variability in the learned response properties. Although solving the inference problem with the RI model yields many of the neural response properties, additional constraints could be required. These additional constraints may be built into the LRF model, causing its outward solutions to more closely match neural responses.
Discussion
In this study, we have shown that training a simple network to detect collisions gives rise to a computation that closely resembles that of neurons sensitive to looming signals. Specifically, we optimized a neural network model to detect whether an object is on a collision course based on visual motion signals (Figure 3), and found that one class of optimized solutions matched the anatomy of motion inputs to LPLC2 neurons (Figures 1, 5 and 6). Importantly, this solution reproduces a wide range of experimental observations of LPLC2 neuron responses (Figure 10; Klapoetke et al., 2017; von Reyn et al., 2017; Ache et al., 2019).
The radially structured dendrites of the LPLC2 neuron in the LP can account for its response to motion radiating outward from the receptive field center (Klapoetke et al., 2017). Our results show that the logic of this computation can be understood in terms of inferential loom detection by the population of units. In particular, for an individual detector unit, an inward structure can make a better loom detector than an outward structure, since it is sensitive to colliding objects arriving from a wider range of angles (Figure 7). As the number of units across visual space increases, the performance of the outward-sensitive receptive field structure comes to match that of the inward solutions (Figure 9, Figure 9—figure supplement 1, Figure 9—figure supplement 2). At the same time, the inhibitory component of the outward solutions becomes broader as the population grows, which is crucial for reproducing key experimental observations, such as peripheral inhibition (Figure 10; Klapoetke et al., 2017). The dependence of the optimized solutions on the number of detectors is likely related to the increasing overlap of receptive fields as the population grows (Figure 6). This result is consistent with prior work showing that populations of neurons often exhibit different and improved coding strategies compared to individual neurons (Pasupathy and Connor, 2002; Georgopoulos et al., 1986; Vogels, 1990; Franke et al., 2016; Zylberberg et al., 2016; Cafaro et al., 2020). Thus, understanding the anatomical, physiological, and algorithmic properties of individual neurons can require considering the population response. The solutions we found to the loom inference problem suggest that individual LPLC2 responses should be interpreted in light of the population of LPLC2 responses.
Our results shed light on discussions in the literature of $\eta$-like (encoding angular size) and $\rho$-like (encoding angular velocity) loom-sensitive neurons (Gabbiani et al., 1999; Wu et al., 2005; Liu et al., 2011; Shang et al., 2015; Temizer et al., 2015; Dunn et al., 2016; von Reyn et al., 2017; Ache et al., 2019). In particular, these optimized models clarify an interesting but puzzling fact: LPLC2 neurons transform their direction-selective motion inputs into computations of angular size (Ache et al., 2019). Consistent with this tuning, our model also shows a linear relationship between the peak time relative to collision and the $R/v$ ratio, a relationship expected of loom-sensitive neurons that encode angular size (Peek and Card, 2016). In both cases, these properties appear to be a simple consequence of training the constrained model to reliably detect looming stimuli.
The units of the outward solution exhibit sparse responses to looming stimuli, in contrast to the denser representations in the inward solution (Figure 8). During a looming event, most units of an outward solution are quiet and only a few adjacent units are strongly active, reminiscent of the sparse codes that appear to be favored, for instance, in the cortical encoding of visual scenes (Olshausen and Field, 1996; Olshausen and Field, 1997). Since the readout of our model sums the activities of the units, sparsity does not directly affect the model's performance; rather, it is an attribute of the favored solution. For a model with a different loss function or with noise, the degree of sparsity might be crucial. For instance, the sparse code of the outward model might make it easier to localize a hit stimulus (Morimoto et al., 2020), or might make the population response more robust to noise (Field, 1994).
Experiments have shown that inhibitory circuits play an important role in the selectivity of LPLC2 neurons. For example, motion signals at the periphery of an LPLC2 neuron's receptive field inhibit its activity. This peripheral inhibition produces various interesting response patterns of LPLC2 neurons to different types of stimuli (Figure 10E and F; Klapoetke et al., 2017). However, the structure of this inhibitory field is not fully understood, and our model provides a tool to investigate how the inhibitory inputs to LPLC2 neurons affect circuit performance on loom detection tasks. Strong inhibition at the periphery of the receptive field arises naturally in the outward solutions after optimization, and the extent of the inhibitory components increases as more units are added to the models (Figure 6). In our model, the broad inhibition appears to suppress responses to non-hit stimuli, and, as in the data, the inhibition is broader than one would expect if the neuron were simply inhibited by inward motion. These larger inhibitory fields are also consistent with the larger spatial pooling likely supplied by inhibitory LPi inputs (Klapoetke et al., 2017).
The synthetic stimuli used to train models in this study were unnatural in two ways. The first was the proportion of hits and non-hits. We trained with 25% of the training data representing hits. The true fraction of hits among all stimuli encountered by a fly is undoubtedly much smaller, and this fraction affects how the loss function weights different types of errors. It is also clear that a false positive (in which a fly might jump to escape an object not on a collision course) is penalized much less during evolution than a false negative (in which a fly does not jump and an object collides, presumably to the detriment of the fly). It remains unclear how to choose these weights in the training data or in the loss function, but they affect the receptive field weights optimized by the model.
The second issue is that the stimuli were caricatures of stimulus types that did not incorporate the richness of natural stimuli. This richness could include natural textures and spatial statistics (Ruderman and Bialek, 1994), which seem to affect motion detection algorithms (Fitzgerald and Clark, 2015; Leonhardt et al., 2016; Chen et al., 2019). It could also include more natural trajectories for approaching objects. Another way to enrich the stimuli would be to add noise, either to the model inputs or to the model units themselves. We explored this briefly by adding background motion generated by self-rotation; under those conditions, both solutions were present, but the optimized outward solutions performed better than the inward solutions (Figure 9—figure supplement 3, Materials and methods). This indicates that the statistics of the stimuli may play an important role in selecting solutions for loom detection. However, the true performance limits of loom detection remain unclear, since most experiments use substantially impoverished looming stimuli. Moreover, it is challenging to characterize the properties of natural looming events. An interesting future direction will be to investigate the effects of more complex and naturalistic stimuli on the model's filters and performance, as well as on LPLC2 neuron responses themselves.
For simplicity, our models did not impose the hexagonal geometry of the compound eye ommatidia. Instead, we assumed that the visual field is divided into a Cartesian lattice with $5^{\circ}$ spacing, with each lattice point representing a local motion detector with two spatially separated inputs (Figure 3). This simplification slightly alters the geometry of the motion signals compared to the real motion detector receptive fields (Shinomiya et al., 2019). It could potentially affect the learned spatial weightings and the reproduction of LPLC2 responses to various stimuli, since the specific shapes of the filters matter (Figure 10). Thus, the hexagonal ommatidial structure and the full extent of the inputs to T4 and T5 might be crucial for detailed comparisons with the dynamics and responses of LPLC2 neurons. However, this geometric distinction seems unlikely to affect the main results about how to infer the presence of hit stimuli.
Our model requires a field of estimates of the local motion. Here, we used the simplest such model, the Hassenstein-Reichardt correlator (Equation 3; Materials and methods; Hassenstein and Reichardt, 1956), but the model could be extended by replacing it with a more sophisticated model of motion estimation. Some biophysically realistic models might take into account synaptic conductances (Gruntman et al., 2018; Gruntman et al., 2019; Badwan et al., 2019; Zavatone-Veth et al., 2020) and could respond to static features of visual scenes (Agrochao et al., 2020). Moreover, contrasts in natural environments fluctuate in time and space. Thus, if one includes more naturalistic spatial and temporal patterns, one might consider a motion detection model that adapts to changing contrasts in time and space (Drews et al., 2020; Matulis et al., 2020).
Although the outward filter of the unit emerges naturally from our gradient descent training protocol, this does not mean that the structure is learned by LPLC2 neurons in the fly. There may be some experience-dependent plasticity in the fly eye (Kikuchi et al., 2012), but these visual computations are likely to be primarily genetically determined. Thus, one may think of the computation of the LPLC2 neuron as having been shaped by millions of years of evolutionary optimization. Optimization algorithms at play in evolution may be able to avoid getting stuck in local optima (Stanley et al., 2019), and thus work well with the sort of shallow neural network found in the fly eye.
In this study, we focused on the motion signal inputs to LPLC2 neurons and neglected their other inputs, such as those from the lobula that likely report non-motion visual features. It would be interesting to investigate how this additional non-motion information affects the performance and optimized solutions of the inference units. For instance, another lobula columnar neuron, LC4, is loom-sensitive and receives inputs in the lobula (von Reyn et al., 2017). The LPLC2 and LC4 neurons are the primary excitatory inputs to the GF, which mediates escape behaviors (von Reyn et al., 2014; Ache et al., 2019). The inference framework set out here would allow one to incorporate parallel non-motion intensity channels, either by adding them to the inputs of the LPLC2-like units, or by adding a parallel population of LC4-like units. This would require a reformulation of the probabilistic model in Equation 6. Notably, one of the most studied loom-detecting neurons, the lobula giant movement detector (LGMD) in locusts, does not appear to receive direction-selective inputs, unlike LPLC2 (Rind and Bramwell, 1996; Gabbiani et al., 1999). Thus, the inference framework set out here could be flexibly modified to investigate loom detection under a wide variety of constraints and inputs, allowing it to be applied to neurons beyond LPLC2.
Materials and methods
Code availability
Code to perform all simulations in this paper and to reproduce all figures is available at https://github.com/ClarkLabCode/LoomDetectionANN (copy archived at swh:1:rev:864fd3d591bc9e3923189320d7197bdd0cd85448; Zhou, 2021).
Coordinate system and stimuli
We designed a suite of visual stimuli to simulate looming objects, retreating objects, and rotational visual fields. In this section, we describe the suite of stimuli and the coordinate systems used in our simulations (Figure 4—figure supplement 1).
In our simulations and training, the fly is at rest on a horizontal plane, with its head pointing in a specific direction. The fly head is modeled as a point with no volume. A three-dimensional right-handed frame of reference $\Sigma$ is attached to the fly head at the origin. The $z$ axis points in the anterior direction from the fly head, perpendicular to the line connecting the two eyes and in the horizontal plane of the fly; the $y$ axis points toward the right eye, also in the horizontal plane; and the $x$ axis points upward, perpendicular to the horizontal plane. Looming or retreating objects are represented in this space by a sphere of radius $R=1$, and the coordinates of an object's center at time $t$ are denoted $\mathbf{r}(t)=(x(t),y(t),z(t))$. Thus, the distance between the object center and the fly head is $D(t)=\Vert\mathbf{r}(t)\Vert=\sqrt{x^{2}(t)+y^{2}(t)+z^{2}(t)}$.
Within this coordinate system, we set up cones to represent individual units. The receptive field of LPLC2 neurons measures roughly $60^{\circ}$ in diameter (Klapoetke et al., 2017). Thus, we model each unit as a cone with its vertex at the origin and a half-angle of $30^{\circ}$. For each unit $m$ ($m=1,2,\dots,M$), we set up a local frame of reference $\Sigma_m$ (Figure 4—figure supplement 1): the $z_m$ axis is the axis of the cone, with its positive direction pointing outward from the origin. The local frame $\Sigma_m$ can be obtained from $\Sigma$ by two rotations: one around the $x$ axis of $\Sigma$, and one around the new $y'$ axis produced by the first rotation. For each unit, the cardinal directions are defined as upward (positive direction of $x_m$), downward (negative direction of $x_m$), leftward (negative direction of $y_m$), and rightward (positive direction of $y_m$). To obtain the signals received by a specific unit $m$, the coordinates of the object in $\Sigma$ are rotated into the local frame of reference $\Sigma_m$.
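The frame change into $\Sigma_m$ can be illustrated with a short NumPy sketch. It is not taken from the authors' repository: the rotation angles `alpha` (about $x$) and `beta` (about the rotated $y'$), the standard right-handed rotation matrices, and the helper names `rot_x`, `rot_y`, `local_frame`, and `to_local` are assumptions made for illustration.

```python
import numpy as np

def rot_x(alpha):
    """Rotation matrix about the x axis by alpha radians."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(beta):
    """Rotation matrix about the y axis by beta radians."""
    c, s = np.cos(beta), np.sin(beta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def local_frame(alpha, beta):
    """Columns are the x_m, y_m, z_m axes of Sigma_m written in Sigma.

    Intrinsic rotations (first about x, then about the new y') compose by
    right-multiplication: R = R_x(alpha) @ R_y(beta).
    """
    return rot_x(alpha) @ rot_y(beta)

def to_local(r, alpha, beta):
    """Express a point r, given in Sigma, in the coordinates of Sigma_m."""
    return local_frame(alpha, beta).T @ np.asarray(r, dtype=float)

# Usage: a point on the unit's cone axis lies on the local z_m axis.
alpha, beta = np.deg2rad(20.0), np.deg2rad(-35.0)
axis_m = local_frame(alpha, beta)[:, 2]
print(to_local(5.0 * axis_m, alpha, beta))   # approximately [0, 0, 5]
```

Because the frame matrix is orthogonal, its transpose is its inverse, so `to_local` needs no explicit matrix inversion.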
Within this coordinate system, we can also set up cones representing the extent of a spherical object moving in space. The visible outline of a spherical object spans a cone with its vertex at the origin. The half-angle of this cone is a function of time, denoted $\theta_{\text{s}}(t)$:
$$\theta_{\text{s}}(t)=\arcsin\left(\frac{R}{D(t)}\right).$$
One can calculate how the cone of the object overlaps with the receptive field cones of each unit.
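For a sphere of radius $R$ whose center lies at distance $D$, the visible half-angle is $\arcsin(R/D)$, and two circular cones sharing a vertex overlap exactly when the angle between their axes is smaller than the sum of their half-angles. The hypothetical helpers below sketch both checks for a unit with a $30^{\circ}$ half-angle; they illustrate the geometry under these assumptions and are not the simulation code used in the paper.

```python
import numpy as np

R_OBJECT = 1.0                      # object radius R (R = 1 in the text)
RF_HALF_ANGLE = np.deg2rad(30.0)    # half-angle of each unit's cone

def object_half_angle(r, R=R_OBJECT):
    """Half-angle theta_s of the cone spanned by a sphere centered at r."""
    D = np.linalg.norm(r)
    return np.arcsin(min(R / D, 1.0))   # clipped: inside the sphere -> 90 deg

def cones_overlap(r, unit_axis, R=R_OBJECT):
    """True if the object's cone intersects a unit's receptive-field cone.

    Two circular cones sharing a vertex intersect when the angle between
    their axes is less than the sum of their half-angles.
    """
    u = np.asarray(unit_axis, dtype=float) / np.linalg.norm(unit_axis)
    v = np.asarray(r, dtype=float) / np.linalg.norm(r)
    separation = np.arccos(np.clip(u @ v, -1.0, 1.0))
    return bool(separation < object_half_angle(r, R) + RF_HALF_ANGLE)

# Usage: an object 2R straight ahead subtends a 30-degree half-angle.
print(np.degrees(object_half_angle([0.0, 0.0, 2.0])))
print(cones_overlap([0.0, 0.0, 2.0], [0.0, 0.0, 1.0]))    # True
print(cones_overlap([0.0, 10.0, 0.0], [0.0, 0.0, 1.0]))   # False
```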
There are multiple layers of processing in the fly visual system (Takemura et al., 2017), but here we focus on two coarse-grained stages: (1) the estimation of local motion direction from optical intensities by the motion detection neurons T4 and T5, and (2) the integration of the resulting flow fields by LPLC2 neurons. In our simulations, the interior of the $m$th unit cone is represented by an $N$-by-$N$ matrix, so that each element of this matrix (except those at the four corners) indicates a specific direction in angular space within the unit cone. If an element also falls within the object cone, its value is set to 1; otherwise it is 0. Thus, at each time $t$, this matrix is an optical intensity signal, denoted $C(x_m,y_m,t)$, where $(x_m,y_m)$ are the coordinates in $\Sigma_m$. In general, $N$ should be large enough to provide good angular resolution. Then, $K^{2}$ ($K<N$) motion detectors are evenly distributed within the unit cone, each occupying an $L$-by-$L$ grid in the $N$-by-$N$ matrix, where $L=N/K$. This $L$-by-$L$ grid represents a $5^{\circ}$-by-$5^{\circ}$ square in angular space, consistent with the approximate spacing of the inputs to the motion detectors T4 and T5. This arrangement uses high-resolution intensity data to compute local intensity before it is discretized into motion signals with a resolution of $5^{\circ}$. Since the receptive field of an LPLC2 neuron is roughly $60^{\circ}$ across, $K$ is set to 12. To obtain sufficient angular resolution for the local motion detectors, $L$ is set to 4, so that $N=48$.
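The sketch below rasterizes an object into the $N$-by-$N$ intensity frame $C(x_m,y_m,t)$ with $N=48$, $K=12$, and $L=4$. The mapping from grid indices to viewing directions (evenly spaced angles converted through tangents) is a hypothetical choice, since the text does not specify it, and the Gaussian spatial filter applied later by each motion-detector input is omitted.

```python
import numpy as np

N, K = 48, 12          # intensity grid size and motion-detector grid size
L = N // K             # intensity samples per detector side (4)
RF_DEG = 60.0          # receptive-field width of one unit (degrees)

def grid_directions(n=N, fov_deg=RF_DEG):
    """Unit vectors, shape (n, n, 3), for the centers of an n-by-n grid.

    Hypothetical parameterization: element centers are evenly spaced in
    angle along x_m and y_m and mapped to directions via their tangents.
    """
    step = fov_deg / n
    ang = np.deg2rad(-fov_deg / 2.0 + step * (np.arange(n) + 0.5))
    ax, ay = np.meshgrid(ang, ang, indexing="ij")
    d = np.stack([np.tan(ax), np.tan(ay), np.ones_like(ax)], axis=-1)
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

def intensity_frame(r_local, R=1.0):
    """Binary N-by-N frame C: 1 where a grid direction lies inside the object cone."""
    r = np.asarray(r_local, dtype=float)
    D = np.linalg.norm(r)
    theta_s = np.arcsin(min(R / D, 1.0))
    cos_sep = grid_directions() @ (r / D)   # cosine of angle to object center
    return (cos_sep >= np.cos(theta_s)).astype(float)

# Each detector spans L intensity samples, i.e. L * (60 / 48) = 5 degrees.
print(L, L * RF_DEG / N)   # 4 5.0
C = intensity_frame([0.0, 0.0, 2.0])   # object centered in the unit's view
print(C.shape, int(C.sum()))
```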
Each motion detector is assumed to be a Hassenstein-Reichardt correlator (HRC) that calculates local flow fields from $C(x_m,y_m,t)$ (Hassenstein and Reichardt, 1956; Figure 3—figure supplement 1). The HRC used here has two inputs, separated by $5^{\circ}$ in angular space. Each input first applies a spatial filter to the contrast $C(x_m,y_m,t)$ and then temporal filters (with $*$ denoting spatial convolution):
$$s_j(x_m,y_m,t)=\int f_j(t')\,\big[G*C\big](x_m,y_m,t-t')\,dt',$$
where $f_j$ ($j\in\{1,2\}$) is a temporal filter and $G$ is a discrete 2D Gaussian kernel with mean $0^{\circ}$ and standard deviation $2.5^{\circ}$ that approximates the acceptance angle of fly photoreceptors (Stavenga, 2003). The temporal filter $f_1$ was chosen to be an exponential, $f_1(t')=(1/\tau)\exp(-t'/\tau)$ for $t'\ge 0$, with $\tau=0.03$ s (Salazar-Gatzimas et al., 2016), and $f_2$ a delta function, $f_2(t')=\delta(t')$. This leads to
$$F(t)=s_1(x_{m1},y_{m1},t)\,s_2(x_{m2},y_{m2},t)-s_2(x_{m1},y_{m1},t)\,s_1(x_{m2},y_{m2},t)$$
as the local flow field at time $t$ between two inputs located at $({x}_{m1},{y}_{m1})$ and $({x}_{m2},{y}_{m2})$.
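A minimal sketch of the temporal part of this opponent HRC, with $\tau=0.03$ s and a 0.01 s time step as in the text. The discretization of $f_1$ as a first-order recursive filter, the function names, and the omission of the Gaussian spatial prefilter $G$ are assumptions; the correlator multiplies the $f_1$-filtered signal at one input by the unfiltered ($f_2=\delta$) signal at the other and subtracts the mirror-image product.

```python
import numpy as np

DT = 0.01    # time step (s), from the text
TAU = 0.03   # time constant of f1 (s), from the text

def lowpass(s, tau=TAU, dt=DT):
    """Discretized causal exponential filter f1(t') = exp(-t'/tau) / tau.

    Assumed discretization: first-order recursive filter with unit DC gain.
    """
    a = np.exp(-dt / tau)
    out = np.zeros(len(s))
    acc = 0.0
    for i, x in enumerate(s):
        acc = a * acc + (1.0 - a) * x
        out[i] = acc
    return out

def hrc(s1, s2):
    """Opponent HRC: delayed input 1 times input 2, minus the mirror product.

    Positive for motion from input 1 toward input 2, negative for the reverse.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    return lowpass(s1) * s2 - s1 * lowpass(s2)

# Usage: a bright edge reaches input 1 at step 10 and input 2 at step 15.
t = np.arange(50)
s1 = (t >= 10).astype(float)
s2 = (t >= 15).astype(float)
print(hrc(s1, s2).sum() > 0, hrc(s2, s1).sum() < 0)   # True True
```

Swapping the two inputs flips the sign of the output exactly, which is the opponent property used in the next step.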
Four types of T4 and T5 neurons have been found that project to layers 1, 2, 3, and 4 of the LP. Each type is sensitive to one of the cardinal directions: down, up, left, right (Maisak et al., 2013). Thus, in our model, there are four non-negative local flow fields that serve as the only inputs to the model: $U_{-}(t)$ (downward, corresponding to LP layer 4), $U_{+}(t)$ (upward, LP layer 3), $V_{-}(t)$ (leftward, LP layer 1), and $V_{+}(t)$ (rightward, LP layer 2), each of which is a $K$-by-$K$ matrix. To calculate these matrices, two sets of motion detectors are needed, one for the vertical directions and one for the horizontal directions. The HRC model in Equation 3 is direction-selective and opponent, meaning that its output is positive (negative) for motion in the preferred (null) direction. Denote by $F^{\text{v}}_{k_1k_2}(t)$ and $F^{\text{h}}_{k_1k_2}(t)$ the outputs of Equation 3 for the vertical and horizontal motion detectors at position $(k_1,k_2)$. Assuming that upward (rightward) is the preferred vertical (horizontal) direction, we obtain the non-negative elements of the four flow fields as
$$[U_{+}(t)]_{k_1k_2}=\max\big(F^{\text{v}}_{k_1k_2}(t),0\big),\qquad [U_{-}(t)]_{k_1k_2}=\max\big(-F^{\text{v}}_{k_1k_2}(t),0\big),$$
$$[V_{+}(t)]_{k_1k_2}=\max\big(F^{\text{h}}_{k_1k_2}(t),0\big),\qquad [V_{-}(t)]_{k_1k_2}=\max\big(-F^{\text{h}}_{k_1k_2}(t),0\big),$$
where $k_1,k_2\in\{1,2,\dots,K\}$. In the above expressions, for $[U_{-}(t)]_{k_1k_2}$ and $[U_{+}(t)]_{k_1k_2}$, the vertical motion detector at $(k_1,k_2)$ has its two inputs located at $(x_{m1},y_m)$ and $(x_{m2},y_m)$. Similarly, for $[V_{-}(t)]_{k_1k_2}$ and $[V_{+}(t)]_{k_1k_2}$, the horizontal motion detector at $(k_1,k_2)$ has its two inputs located at $(x_m,y_{m1})$ and $(x_m,y_{m2})$. Using the opponent HRC output as the motion signal for each layer is reasonable because the motion detectors T4 and T5 are highly direction-selective over a large range of inputs (Maisak et al., 2013; Creamer et al., 2018), and synaptic three-input models of T4 are approximately equivalent to opponent HRC models (Zavatone-Veth et al., 2020).
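Splitting the opponent outputs into the four rectified layer inputs amounts to taking positive and negative parts. The hypothetical helper below assumes, as in the text, that upward and rightward are the preferred directions; the dictionary keys are illustrative names for $U_{-}$, $U_{+}$, $V_{-}$, and $V_{+}$.

```python
import numpy as np

def split_opponent(h_vertical, h_horizontal):
    """Split opponent HRC outputs (K-by-K arrays) into four non-negative fields.

    Assumes upward and rightward are the preferred directions, so positive
    vertical output feeds U_plus and negative vertical output feeds U_minus.
    """
    h_v = np.asarray(h_vertical, dtype=float)
    h_h = np.asarray(h_horizontal, dtype=float)
    return {
        "U_minus": np.maximum(-h_v, 0.0),  # downward, LP layer 4
        "U_plus": np.maximum(h_v, 0.0),    # upward, LP layer 3
        "V_minus": np.maximum(-h_h, 0.0),  # leftward, LP layer 1
        "V_plus": np.maximum(h_h, 0.0),    # rightward, LP layer 2
    }

# Usage: the opponent signal is recovered as U_plus - U_minus.
fields = split_opponent([[1.0, -2.0]], [[0.0, 3.0]])
print(fields["U_plus"], fields["U_minus"])
```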
We simulated the trajectories $\mathbf{r}(t)$ of the object in the frame of reference $\Sigma$ at a time resolution of 0.01 s, which is also the time step of the training and testing stimuli. For hit, miss, and retreat cases, the trajectories of the object are always straight lines, and the speeds of the object were randomly sampled from the range $[2R,10R]\,\mathrm{s}^{-1}$, with the trajectories confined within a sphere of radius $5R$ centered at the fly head. The radius of the object, $R$, is always set to 1 except in the rotational stimuli. To generate rotational stimuli, we placed 100 objects with radii drawn uniformly from $[0,1]$ at random distances (in $[5,15]$) and positions around the fly, and rotated them all around a randomly chosen axis. The rotational speed was drawn from a Gaussian distribution with mean $0^{\circ}/\mathrm{s}$ and standard deviation $200^{\circ}/\mathrm{s}$, a reasonable rotational velocity for walking flies (DeAngelis et al., 2019). In one case, the training data included both a moving object and global rotation due to self-rotation (Figure 9—figure supplement 3). These stimuli were combinations of the rotational stimuli with the other three cases (hit, miss, and retreat), so that the object moving in the depth dimension also rotates together with the background.
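A hedged sketch of how a single hit trajectory might be sampled under the constraints stated above: constant speed in $[2R,10R]\,\mathrm{s}^{-1}$, straight-line motion inside a sphere of radius $5R$, and a 0.01 s time step. Aiming the object exactly at the fly head, starting it on the $5R$ boundary, and stopping at contact ($D=R$) are simplifying assumptions, and the function names are hypothetical; miss and retreat trajectories, which would use offset or reversed paths, are not shown. The rotational speed sampler follows the Gaussian described in the text.

```python
import numpy as np

DT = 0.01   # time resolution (s), from the text

def sample_hit_trajectory(rng, R=1.0, r_max=5.0):
    """Hypothetical head-on hit: a straight line from the surface of a sphere
    of radius r_max * R toward the fly head, at a constant speed drawn
    uniformly from [2R, 10R] per second, stopping when D(t) reaches R."""
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)       # uniform on the unit sphere
    speed = rng.uniform(2.0 * R, 10.0 * R)
    duration = (r_max * R - R) / speed           # time until contact (D = R)
    t = DT * np.arange(int(np.floor(duration / DT)) + 1)
    start = r_max * R * direction
    return start - np.outer(speed * t, direction)   # shape (T, 3)

def sample_rotation_speed(rng):
    """Rotational speed (deg/s) from a Gaussian with mean 0 and SD 200."""
    return rng.normal(0.0, 200.0)

rng = np.random.default_rng(0)
traj = sample_hit_trajectory(rng)
D = np.linalg.norm(traj, axis=1)
print(traj.shape[1], round(D[0], 6))   # 3 5.0
```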
We reproduced a range of stimuli used in a previous study (Klapoetke et al., 2017) and tested them on our trained model (Figure 10B–H). To match the cardinal directions of the LP layers (Figure 1), we rotated the stimuli (except in Figure 10B) by 45 degrees relative to those displayed in the figures of Klapoetke et al., 2017. The disc (Figure 10B and C) expands from $20^{\circ}$ to $60^{\circ}$ with an edge speed of $10^{\circ}/\mathrm{s}$. All bar and edge motions have an edge speed of $20^{\circ}/\mathrm{s}$. The widths of the bars are $60^{\circ}$ (right panel of Figure 10E and H), $20^{\circ}$ (middle panel of Figure 10E), and $10^{\circ}$ (all others). All model responses (except in Figure 10B) were normalized by the peak response to the expanding disc (Figure 10B).
We created a range of hit stimuli with $R/v$ ratios of 0.01, 0.02, 0.04, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, and 0.20 s. The radius $R$ of the spherical object is fixed at 1, and the velocity is adjusted to achieve each $R/v$ ratio.
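These $R/v$ values connect to the linear timing relationship mentioned in the Results. As an illustration only (not the model's mechanism), suppose a unit peaks when the half-angle of a head-on object reaches a fixed threshold. Because $\sin\theta_{\text{s}}=R/(R+v\tau)$, where $\tau$ is the time before contact, the threshold time is $\tau=(R/v)\,(1/\sin\theta_{\text{th}}-1)$, which is linear in $R/v$. The sketch below checks this numerically for the ten ratios listed above, using a hypothetical threshold of $15^{\circ}$ and a hypothetical helper name.

```python
import numpy as np

R_OVER_V = np.array([0.01, 0.02, 0.04, 0.08, 0.10, 0.12, 0.14, 0.16, 0.18, 0.20])  # s

def threshold_time(r_over_v, theta_th_deg=15.0, R=1.0, dt=1e-4):
    """Time before contact at which a head-on object's half-angle first
    reaches theta_th_deg, found numerically on a fine time grid."""
    v = R / r_over_v                                   # speed giving this R/v
    tau = np.arange(0.0, 5.0, dt)                      # time before contact (s)
    theta = np.degrees(np.arcsin(R / (R + v * tau)))   # D = R + v * tau
    return tau[np.argmax(theta < theta_th_deg)]        # first tau below threshold

times = np.array([threshold_time(x) for x in R_OVER_V])
slope, intercept = np.polyfit(R_OVER_V, times, 1)
# Analytically, tau = (R/v) * (1/sin(theta_th) - 1): a line through the origin.
print(round(slope, 2), round(intercept, 4))
```

The fitted slope depends on the chosen threshold, while the near-zero intercept reflects the proportionality to $R/v$.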
Models
LPLC2 neurons have four dendritic structures in the four LP layers, and they receive direct excitatory inputs from T4/T5 motion detection neurons (Maisak et al., 2013; Klapoetke et al., 2017). It has been proposed that each dendritic structure also receives inhibitory inputs mediated by lobula plate intrinsic interneurons, such as LPi43 (Klapoetke et al., 2017). We built two model units to approximate this anatomy.
LRF models
A linear receptive field (LRF) model is characterized by a real-valued filter, represented by a 12-by-12 matrix $W^{\text{r}}$ (Figure 4A). The elements of the filter combine the effects of the excitatory and inhibitory inputs and can take both positive (stronger excitation) and negative (stronger inhibition) values. We rotate $W^{\text{r}}$ counterclockwise by multiples of $90^{\circ}$ to obtain the filters used to integrate the four motion signals $U_{-}(t)$, $U_{+}(t)$, $V_{-}(t)$, and $V_{+}(t)$. Specifically, we define the four filters as $W^{\text{r}}_{U_{-}}=\text{rotate}(W^{\text{r}},270^{\circ})$, $W^{\text{r}}_{U_{+}}=\text{rotate}(W^{\text{r}},90^{\circ})$, $W^{\text{r}}_{V_{-}}=\text{rotate}(W^{\text{r}},180^{\circ})$, and $W^{\text{r}}_{V_{+}}=\text{rotate}(W^{\text{r}},0^{\circ})$. In addition, we impose mirror symmetry on the filters: with the above definitions of the rotated filters, the upper half of $W^{\text{r}}$ is a mirror image of its lower half. Thus, the filters have 72 free parameters in total. Moreover, since only the elements within the $60^{\circ}$ cone contribute to a unit's filter, the corner elements are excluded, leaving 56 trainable parameters.
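The rotation and mirror constraints, and the resulting parameter count, can be checked with a short NumPy sketch. It assumes that a cell belongs to the $60^{\circ}$ receptive field when its center lies within 6 cells ($30^{\circ}$) of the grid center, that array row 0 is the top of the filter, and that the counterclockwise rotation of `np.rot90` in array coordinates matches the paper's `rotate`; the helper names are hypothetical. Under these assumptions the mask contains 112 cells, 56 of them in the upper half, matching the counts quoted above.

```python
import numpy as np

K = 12

def receptive_field_mask(k=K):
    """Boolean k-by-k mask of cells whose centers lie within k/2 cells of center."""
    c = np.arange(k) - (k - 1) / 2.0
    yy, xx = np.meshgrid(c, c, indexing="ij")
    return xx**2 + yy**2 <= (k / 2.0) ** 2

def build_filter(params, k=K):
    """Fill the in-mask upper half with params, then mirror it to the lower half."""
    free = receptive_field_mask(k).copy()
    free[k // 2:] = False                  # only the upper half is free
    W = np.zeros((k, k))
    W[free] = params
    W[k // 2:] = np.flipud(W[: k // 2])    # lower half mirrors upper half
    return W

def rotated_filters(W):
    """Four layer filters; np.rot90 rotates counterclockwise by k * 90 degrees."""
    return {
        "U_minus": np.rot90(W, 3),   # 270 degrees
        "U_plus": np.rot90(W, 1),    # 90 degrees
        "V_minus": np.rot90(W, 2),   # 180 degrees
        "V_plus": W.copy(),          # 0 degrees
    }

mask = receptive_field_mask()
print(mask.sum(), mask[: K // 2].sum())   # 112 56
W = build_filter(np.arange(1, 57, dtype=float))
filters = rotated_filters(W)
```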
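In computer simulations, the filters (weights) and flow fields are flattened into one-dimensional column vectors. The response of a single LRF unit $m$ is
$$r_m(t)=\varphi\Big((W^{\text{r}}_{U_{-}})^{\top}U_{-}(t)+(W^{\text{r}}_{U_{+}})^{\top}U_{+}(t)+(W^{\text{r}}_{V_{-}})^{\top}V_{-}(t)+(W^{\text{r}}_{V_{+}})^{\top}V_{+}(t)+b^{\text{r}}\Big),$$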
where $\varphi(\cdot)=\max(\cdot,0)$ is the rectified linear unit (ReLU) and $b^{\text{r}}\in\mathbb{R}$ is the intercept (Figure 4A). The ReLU is used as the activation function in all figures except one panel (Figure 9—figure supplement 2), where an exponential linear unit (ELU) is used:
$$\varphi(x)=\begin{cases}x, & x>0,\\ \alpha\left(e^{x}-1\right), & x\le 0,\end{cases}$$
with $\alpha>0$.
For $M=1$, the LRF model is very close to a generalized linear model, except that it includes an additional activation function $\varphi$. This activation function generally makes the optimization problem non-convex.
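The LRF unit response can be written in a few lines of NumPy. Summing the elementwise product of a $K$-by-$K$ filter and flow field is equivalent to the dot product of the flattened vectors described above. The layer names and the function name `lrf_response` are illustrative, and only the ReLU variant is shown.

```python
import numpy as np

LAYERS = ("U_minus", "U_plus", "V_minus", "V_plus")

def relu(x):
    return np.maximum(x, 0.0)

def lrf_response(filters, flows, b_r):
    """Response of one LRF unit at one time step.

    filters, flows: dicts mapping each layer name to a K-by-K array. Summing
    the elementwise product equals the dot product of the flattened arrays.
    """
    drive = sum(np.sum(filters[k] * flows[k]) for k in LAYERS)
    return relu(drive + b_r)

# Usage with toy 2-by-2 arrays: each layer contributes 4, so the drive is 16.
ones = {k: np.ones((2, 2)) for k in LAYERS}
print(lrf_response(ones, ones, b_r=1.0))    # 17.0
```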
RI models
The rectified inhibition (RI) models have two types of non-negative filters, one excitatory and one inhibitory, represented by $W^{\text{e}}$ and $W^{\text{i}}$, respectively (Figure 4B). Each filter is a 12-by-12 matrix. The same rotational and mirror symmetries are imposed as in the LRF models, which leads to four excitatory filters, $W^{\text{e}}_{U_{-}}=\text{rotate}(W^{\text{e}},270^{\circ})$, $W^{\text{e}}_{U_{+}}=\text{rotate}(W^{\text{e}},90^{\circ})$, $W^{\text{e}}_{V_{-}}=\text{rotate}(W^{\text{e}},180^{\circ})$, and $W^{\text{e}}_{V_{+}}=\text{rotate}(W^{\text{e}},0^{\circ})$, and four inhibitory filters, $W^{\text{i}}_{U_{-}}=\text{rotate}(W^{\text{i}},270^{\circ})$, $W^{\text{i}}_{U_{+}}=\text{rotate}(W^{\text{i}},90^{\circ})$, $W^{\text{i}}_{V_{-}}=\text{rotate}(W^{\text{i}},180^{\circ})$, and $W^{\text{i}}_{V_{+}}=\text{rotate}(W^{\text{i}},0^{\circ})$. Thus, excluding the corner elements, the two sets of filters have 112 parameters in total.
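The responses of the inhibitory units are
$$r^{\text{i}}_{X}(t)=\varphi\big((W^{\text{i}}_{X})^{\top}X(t)+b^{\text{i}}\big),\qquad X\in\{U_{-},U_{+},V_{-},V_{+}\},$$
where $\varphi(\cdot)=\max(\cdot,0)$ is the ReLU and $b^{\text{i}}\in\mathbb{R}$ is the intercept. In the RI model, the rectification of each inhibitory layer is motivated by the LPi neurons, which mediate the inhibition within each layer and could themselves rectify their inhibitory input to LPLC2. The response of a single RI unit $m$ is
$$r_m(t)=\varphi\Big(\sum_{X}(W^{\text{e}}_{X})^{\top}X(t)-\sum_{X}r^{\text{i}}_{X}(t)+b^{\text{e}}\Big),$$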
where $b^{\text{e}}\in\mathbb{R}$ is the intercept (Figure 4B). Interestingly, the RI models become equivalent to the LRF models if we remove the ReLU in the inhibition and define $W^{\text{r}}_{U_{-}}=W^{\text{e}}_{U_{-}}-W^{\text{i}}_{U_{-}}$, $W^{\text{r}}_{U_{+}}=W^{\text{e}}_{U_{+}}-W^{\text{i}}_{U_{+}}$, $W^{\text{r}}_{V_{+}}=W^{\text{e}}_{V_{+}}-W^{\text{i}}_{V_{+}}$, $W^{\text{r}}_{V_{-}}=W^{\text{e}}_{V_{-}}-W^{\text{i}}_{V_{-}}$, and $b^{\text{r}}=b^{\text{e}}-4b^{\text{i}}$.
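The RI unit and its equivalence to the LRF unit can be verified numerically. In this sketch, whose function names and random test values are hypothetical, each layer's inhibitory drive is rectified before being subtracted. Removing that rectification and setting $W^{\text{r}}=W^{\text{e}}-W^{\text{i}}$ and $b^{\text{r}}=b^{\text{e}}-4b^{\text{i}}$ reproduces the LRF response, because the four inhibitory intercepts add up to $4b^{\text{i}}$.

```python
import numpy as np

LAYERS = ("U_minus", "U_plus", "V_minus", "V_plus")

def relu(x):
    return np.maximum(x, 0.0)

def ri_response(We, Wi, flows, b_e, b_i, rectify_inhibition=True):
    """Response of one RI unit at one time step.

    We, Wi, flows: dicts mapping each layer name to a K-by-K array.
    Each layer's inhibitory drive (plus b_i) is rectified, then subtracted.
    """
    excitation = sum(np.sum(We[k] * flows[k]) for k in LAYERS)
    inhibition = [np.sum(Wi[k] * flows[k]) + b_i for k in LAYERS]
    if rectify_inhibition:
        inhibition = [relu(x) for x in inhibition]
    return relu(excitation - sum(inhibition) + b_e)

def lrf_equivalent(We, Wi, b_e, b_i):
    """LRF filters and intercept matching an RI unit without inhibitory ReLUs."""
    Wr = {k: We[k] - Wi[k] for k in LAYERS}
    return Wr, b_e - 4.0 * b_i

# Usage: compare against the equivalent LRF unit on random non-negative inputs.
rng = np.random.default_rng(0)
We = {k: rng.random((12, 12)) for k in LAYERS}
Wi = {k: 0.5 * rng.random((12, 12)) for k in LAYERS}
flows = {k: rng.random((12, 12)) for k in LAYERS}
Wr, b_r = lrf_equivalent(We, Wi, b_e=0.5, b_i=0.2)
lrf = relu(sum(np.sum(Wr[k] * flows[k]) for k in LAYERS) + b_r)
print(np.isclose(ri_response(We, Wi, flows, 0.5, 0.2, rectify_inhibition=False), lrf))  # True
```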
For both the LRF and RI models, the inferred probability of hit for a specific trajectory is
$$\widehat{P}_{\text{hit}}=\sigma\left(\frac{1}{T}\sum_{t=1}^{T}\sum_{m=1}^{M}r_m(t)+b\right),$$
where $T$ is the total number of time steps in the trajectory and $\sigma(\cdot)$ is the sigmoid function. Since we add two intercepts ($b^{\text{r}}$ and $b$) to the LRF models and three intercepts ($b^{\text{i}}$, $b^{\text{e}}$, and $b$) to the RI models, the two models have 58 and 115 trainable parameters, respectively.
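The population readout reduces to summing unit responses, averaging over time, adding $b$, and applying a sigmoid. The sketch below assumes that the responses of one trajectory are stored as an array of shape $(T, M)$, and it restates the parameter arithmetic (56 free weights per filter type, plus intercepts); the names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_hit(responses, b):
    """Inferred probability of hit for one trajectory.

    responses: array of shape (T, M) holding r_m(t) for every time step and unit.
    """
    r = np.asarray(responses, dtype=float)
    return sigmoid(r.sum(axis=1).mean() + b)   # sum over units, mean over time

# Parameter counts from the text: 56 free weights per filter type.
N_PARAMS_LRF = 56 + 2        # W^r plus intercepts b^r and b
N_PARAMS_RI = 2 * 56 + 3     # W^e and W^i plus intercepts b^i, b^e, and b

print(p_hit(np.ones((4, 2)), -2.0), N_PARAMS_LRF, N_PARAMS_RI)   # 0.5 58 115
```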
Training and testing
We created a synthetic data set containing four types of motion: loom-and-hit, loom-and-miss, retreat, and rotation, in proportions of 0.25, 0.125, 0.125, and 0.5, respectively. In total, there were 5200 trajectories, 4000 for training and 1200 for testing. Trajectories of type loom-and-hit are labeled as hit, $y_n=1$ (probability of hit is 1), while trajectories of the other types are labeled as non-hit, $y_n=0$ (probability of hit is 0), where $n$ is the index of each sample. In models with smaller $M$, fewer trajectories fall within the receptive field of any given unit. For training stability, we therefore increased the number of trajectories by factors of eight, four, and two for $M=1$, 2, and 4, respectively.
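The loss function minimized during training was the cross entropy between the label $y_n$ and the inferred probability of hit $\widehat{P}_{\text{hit}}$, averaged across all samples (denoted $\langle\cdot\rangle_n$), plus a regularization term:
$$\mathcal{L}=-\Big\langle y_n\log\widehat{P}_{\text{hit}}(n)+(1-y_n)\log\big(1-\widehat{P}_{\text{hit}}(n)\big)\Big\rangle_n+\beta\Vert W\Vert_2^{2},$$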
where ${\widehat{P}}_{\text{hit}}(n)$ is the inferred probability of hit for sample $n$, $\beta $ is the strength of the ${\mathrm{\ell}}_{2}$ regularization, and $W$ represents all the effective parameters in the two excitatory and inhibitory filters.
The regularization strength $\beta$ was set to $10^{-4}$, a value obtained by gradually increasing $\beta$ until the model's performance on test data started to drop. The regularization sped up convergence, but its strength did not strongly influence the main results in the paper.
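The training objective can be sketched in NumPy as mean binary cross-entropy plus an $\ell_2$ penalty with $\beta=10^{-4}$. Whether the penalty carries an extra factor such as $1/2$, and the clipping constant `eps` used to avoid $\log 0$, are assumptions of this sketch rather than details from the text.

```python
import numpy as np

def loss(p_hat, y, weights, beta=1e-4, eps=1e-12):
    """Mean binary cross-entropy plus an l2 penalty on the filter weights.

    p_hat: inferred probabilities of hit; y: 0/1 labels;
    weights: iterable of weight arrays included in the penalty.
    """
    p = np.clip(np.asarray(p_hat, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    cross_entropy = -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    l2 = sum(np.sum(np.square(w)) for w in weights)
    return cross_entropy + beta * l2

# Usage: two samples (one hit, one non-hit) and a toy weight vector.
print(loss([0.9, 0.2], [1, 0], [np.ones(3)]))
```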
To speed up training, rather than taking a temporal average as shown in Equation 6, a snapshot was sampled randomly from each trajectory, and the probability of hit of this snapshot was used to represent the whole trajectory, that is, ${\widehat{P}}_{\text{hit}}=\sigma \left({\sum}_{m}{r}_{m}(t)+b\right)$, where $t$ is a random sample from $\{1,2,\mathrm{\dots},T\}$. Minibatch gradient descent was used in training, and the learning rate was 0.001.
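The snapshot approximation replaces the time average with a single randomly chosen time step. In this hedged sketch, whose function names are illustrative, `rng.integers` draws the time index uniformly. The snapshot estimate applies the sigmoid to one time step rather than to the average, so it is a cheap proxy for the full-trajectory probability rather than an unbiased estimate of it.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def snapshot_p_hit(responses, b, rng):
    """Training-time proxy: probability of hit from one random time step.

    responses: array of shape (T, M) holding r_m(t) for one trajectory.
    """
    r = np.asarray(responses, dtype=float)
    t = rng.integers(r.shape[0])              # uniform over the T time steps
    return sigmoid(r[t].sum() + b)

def full_p_hit(responses, b):
    """Test-time probability of hit, averaging the summed response over time."""
    r = np.asarray(responses, dtype=float)
    return sigmoid(r.sum(axis=1).mean() + b)

# Usage: the two agree when responses are constant in time.
rng = np.random.default_rng(1)
const = np.full((10, 3), 0.2)
print(snapshot_p_hit(const, -0.6, rng), full_p_hit(const, -0.6))
```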
After training, the models were tested on entire trajectories, with the probability of hit defined in Equation 6. Models trained only on snapshots performed well on the test data. During testing, model performance was evaluated by the area under the curve (AUC) of the receiver operating characteristic (ROC) and precision-recall (PR) curves (Hanley and McNeil, 1982; Davis and Goadrich, 2006). TensorFlow (Abadi et al., 2016) was used to train all models.
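Both evaluation metrics can be computed without extra dependencies. The sketch below computes the ROC AUC as the Mann-Whitney probability that a random hit outscores a random non-hit (ties count one half), and summarizes the precision-recall curve by average precision, one common estimator of the PR AUC. The paper may have used a different PR integration scheme (Davis and Goadrich, 2006, discuss such choices), and the function names are illustrative.

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC AUC as P(score of random positive > score of random negative),
    counting ties as one half (the Mann-Whitney U statistic, normalized)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels).astype(bool)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Mean of the precision at the rank of each positive (ties by sort order)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels).astype(bool)
    hits = y[np.argsort(-s, kind="stable")]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return precision_at_k[hits].mean()

# Usage: predicted hit probabilities for four test trajectories.
scores, labels = [0.9, 0.8, 0.7, 0.1], [1, 0, 1, 0]
print(roc_auc(scores, labels), average_precision(scores, labels))
```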
Clustering the solutions
We used the following procedure to cluster the solutions. For the LRF models, the filter of each solution was simply flattened into a vector. For the RI models, each solution had an excitatory and an inhibitory filter; we flattened these two filters and concatenated them into a single vector. (The corner elements were deleted, since they lie outside the receptive field.) Each solution was thus represented by a vector, from which we calculated the cosine distance between every pair of solutions. The resulting distance matrix was fed into a hierarchical clustering algorithm (Virtanen et al., 2020). After hierarchical clustering, the outward and inward filters were identified by their shape: we counted the positive filter elements corresponding to flow fields with components radiating outward and subtracted the number of positive filter elements corresponding to flow fields with components directed inward. If the result was positive, the filters were labeled as outward; otherwise, they were labeled as inward. We could also have summed the elements rather than counting them, but we found counting to be more robust. If the elements of the concatenated vector were all close to zero, the corresponding filters were labeled as zero solutions.
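Two pieces of this procedure are sketched below in NumPy: the pairwise cosine distance matrix and the counting rule that labels a filter as outward, inward, or zero. The labeling assumes a rightward-motion ($V_{+}$) filter whose columns run left to right, so that positive weights right of center respond to outward-radiating flow; the tolerance `tol` and the helper names are hypothetical. The distance matrix, with its diagonal set exactly to zero, could then be condensed with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage`; the linkage method used in the paper is not stated here.

```python
import numpy as np

def cosine_distance_matrix(X):
    """Pairwise cosine distances 1 - cos(angle) between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / np.where(norms > 0, norms, 1.0)
    D = 1.0 - U @ U.T
    np.fill_diagonal(D, 0.0)
    return D

def label_filter(W, tol=1e-6):
    """Label a rightward-motion (V_plus) filter as outward, inward, or zero.

    Counts positive weights right of center (outward flow) minus positive
    weights left of center (inward flow), mirroring the rule in the text.
    """
    W = np.asarray(W, dtype=float)
    if np.all(np.abs(W) < tol):
        return "zero"
    half = W.shape[1] // 2
    outward = np.count_nonzero(W[:, half:] > tol)
    inward = np.count_nonzero(W[:, :half] > tol)
    return "outward" if outward - inward > 0 else "inward"

# Usage: a filter excited only right of center is labeled outward.
W_out = np.zeros((12, 12))
W_out[:, 8] = 1.0
print(label_filter(W_out))
```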
Statistics
To calculate the fraction of active units for the model with $M=256$ (Figure 8B), we examined the response curves of each unit to all trajectories of a given stimulus type. If a unit's response was above baseline (dotted lines in Figure 7B), the unit was counted as active. For each trajectory, we obtained the number of active units, and we used these numbers to calculate the mean and standard deviation of active units across all trajectories of each stimulus type (hit, miss, retreat, rotation).
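The active-unit statistic can be sketched as follows, assuming responses are stored as an array of shape (trajectories, time steps, units) and reading "above baseline" as exceeding a scalar baseline at any time step. Both the storage layout and that reading are assumptions, as is the helper name; the function returns the mean and standard deviation of the fraction of active units.

```python
import numpy as np

def active_unit_stats(responses, baseline):
    """Mean and SD, across trajectories, of the fraction of active units.

    responses: array (n_trajectories, T, M) of unit outputs. A unit counts
    as active in a trajectory if it exceeds baseline at any time step.
    """
    r = np.asarray(responses, dtype=float)
    active = (r > baseline).any(axis=1)          # (n_trajectories, M)
    fraction = active.sum(axis=1) / r.shape[2]   # fraction of M units per trajectory
    return fraction.mean(), fraction.std()

# Usage: two trajectories, three time steps, four units.
r = np.zeros((2, 3, 4))
r[0, 1, 0] = 1.0                   # trajectory 0: one active unit
r[1, 0, 0] = r[1, 2, 1] = 1.0      # trajectory 1: two active units
print(active_unit_stats(r, baseline=0.5))
```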
For a model with $M$ units, where $M\in\{1,2,4,8,16,32,64,128,192,256\}$, 200 random initializations were used for training. For the LRF models, across these 200 training runs, the numbers of outward solutions $N_{\text{out}}$ were (in order of increasing $M$) 45, 59, 60, 65, 52, 63, 52, 57, 58, 49, and the numbers of inward solutions $N_{\text{in}}$ were 142, 135, 127, 119, 139, 133, 130, 118, 123, 119. For the RI models, the numbers of outward solutions $N_{\text{out}}$ were 44, 46, 48, 50, 48, 50, 53, 55, 58, 64, and the numbers of inward solutions $N_{\text{in}}$ were 39, 40, 39, 46, 53, 51, 35, 38, 12, 10. The average score curves and points in Figure 9A, Figure 9—figure supplement 1A, and Figure 9—figure supplement 2A were obtained by averaging within each type of solution, with shading indicating the standard deviations. The curves and points in Figure 9—figure supplement 1C show the ratio of the number of outward solutions to the number of inward solutions. To obtain error bars (grey shading), we modeled the training outcomes with a binomial distribution, with probability $N_{\text{out}}/(N_{\text{out}}+N_{\text{in}})$ of obtaining an outward solution and probability $N_{\text{in}}/(N_{\text{out}}+N_{\text{in}})$ of obtaining an inward solution. The standard deviation of this binomial distribution is thus $\sigma_{\text{b}}=\sqrt{N_{\text{out}}N_{\text{in}}/(N_{\text{out}}+N_{\text{in}})}$. From this, we calculate the error bars as the propagated standard deviation (Morgan et al., 1990):
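The binomial standard deviation $\sigma_{\text{b}}$ can be computed directly from the counts listed above. The propagation step in this sketch is an assumption, since the paper's exact expression is not reproduced here: it uses first-order propagation to the ratio $N_{\text{out}}/N_{\text{in}}$ with the total $N=N_{\text{out}}+N_{\text{in}}$ fixed, which gives $\sigma_{r}=\sigma_{\text{b}}N/N_{\text{in}}^{2}$. The function name is hypothetical.

```python
import numpy as np

def ratio_with_error(n_out, n_in):
    """Outward-to-inward ratio with a binomial error estimate.

    sigma_b = sqrt(N_out * N_in / N) is the SD of the outward count under a
    binomial model with p = N_out / N. Assumed propagation: with N fixed,
    d(N_out / N_in) / dN_out = N / N_in**2, so sigma_ratio = sigma_b * N / N_in**2.
    """
    n_out = np.asarray(n_out, dtype=float)
    n_in = np.asarray(n_in, dtype=float)
    n = n_out + n_in
    sigma_b = np.sqrt(n_out * n_in / n)
    ratio = n_out / n_in
    sigma_ratio = sigma_b * n / n_in**2
    return ratio, sigma_b, sigma_ratio

# Usage with the LRF counts reported above (M = 1, 2, ..., 256).
n_out_lrf = [45, 59, 60, 65, 52, 63, 52, 57, 58, 49]
n_in_lrf = [142, 135, 127, 119, 139, 133, 130, 118, 123, 119]
ratio, sigma_b, sigma_ratio = ratio_with_error(n_out_lrf, n_in_lrf)
print(np.round(ratio, 3))
print(np.round(sigma_ratio, 3))
```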
Data availability
Code to perform all simulations in this paper and to reproduce all figures is available at https://github.com/ClarkLabCode/LoomDetectionANN, (copy archived at swh:1:rev:864fd3d591bc9e3923189320d7197bdd0cd85448).
References

TensorFlow: A system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation, pp. 265–283.

Global Motion Processing by Populations of Direction-Selective Retinal Ganglion Cells. The Journal of Neuroscience 40:5807–5819. https://doi.org/10.1523/JNEUROSCI.0564-20.2020

Visually mediated motor planning in the escape response of Drosophila. Current Biology 18:1300–1307. https://doi.org/10.1016/j.cub.2008.07.094

The relationship between Precision-Recall and ROC curves. Proceedings of the 23rd International Conference on Machine Learning, pp. 233–240. https://doi.org/10.1145/1143844.1143874

Dynamic Signal Compression for Robust Motion Vision in Flies. Current Biology 30:209–221. https://doi.org/10.1016/j.cub.2019.10.035

What Is the Goal of Sensory Coding? Neural Computation 6:559–601. https://doi.org/10.1162/neco.1994.6.4.559

Computation of object approach by a wide-field, motion-sensitive neuron. The Journal of Neuroscience 19:1122–1141.

Systemtheoretische Analyse der Zeit-, Reihenfolgen- und Vorzeichenauswertung bei der Bewegungsperzeption des Rüsselkäfers Chlorophanus. Zeitschrift Für Naturforschung B 11:513–524. https://doi.org/10.1515/znb-1956-9-1004

Looming sensitive cortical regions without V1 input: evidence from a patient with bilateral cortical blindness. Frontiers in Integrative Neuroscience 9:51. https://doi.org/10.3389/fnint.2015.00051

Experience-dependent plasticity of the optomotor response in Drosophila melanogaster. Developmental Neuroscience 34:533–542. https://doi.org/10.1159/000346266

Neuronal responses to looming objects in the superior colliculus of the cat. Brain, Behavior and Evolution 77:193–205. https://doi.org/10.1159/000327045

Approach sensitivity in the retina processed by a multifunctional neural circuit. Nature Neuroscience 12:1308–1316. https://doi.org/10.1038/nn.2389

Computation of object approach by a system of visual motion-sensitive neurons in the crab Neohelice. Journal of Neurophysiology 112:1477–1490. https://doi.org/10.1152/jn.00921.2013

Population coding of shape in area V4. Nature Neuroscience 5:1332–1338. https://doi.org/10.1038/nn972

Comparative approaches to escape. Current Opinion in Neurobiology 41:167–173. https://doi.org/10.1016/j.conb.2016.09.012

Looming detectors in the human visual pathway. Vision Research 18:415–421. https://doi.org/10.1016/0042-6989(78)90051-2

A deep learning framework for neuroscience. Nature Neuroscience 22:1761–1770. https://doi.org/10.1038/s41593-019-0520-2

Neural network based on the input organization of an identified neuron signaling impending collision. Journal of Neurophysiology 75:967–985. https://doi.org/10.1152/jn.1996.75.3.967

Statistics of natural images: Scaling in the woods. Physical Review Letters 73:814–817. https://doi.org/10.1103/PhysRevLett.73.814

Gliding behaviour elicited by lateral looming stimuli in flying locusts. Journal of Comparative Physiology. A, Neuroethology, Sensory, Neural, and Behavioral Physiology 191:61–73. https://doi.org/10.1007/s00359-004-0572-x

Role of a looming-sensitive neuron in triggering the defense behavior of the praying mantis Tenodera aridifolia. Journal of Neurophysiology 112:671–682. https://doi.org/10.1152/jn.00049.2014

Designing neural networks through neuroevolution. Nature Machine Intelligence 1:24–35. https://doi.org/10.1038/s42256-018-0006-z

Angular and spectral sensitivity of fly photoreceptors. II. Dependence on facet lens F-number and rhabdomere type in Drosophila. Journal of Comparative Physiology. A, Neuroethology, Sensory, Neural, and Behavioral Physiology 189:189–202. https://doi.org/10.1007/s00359-003-0390-6

Motor outputs of giant nerve fiber in Drosophila. Journal of Neurophysiology 44:405–421. https://doi.org/10.1152/jn.1980.44.2.405

A Visual Pathway for Looming-Evoked Escape in Larval Zebrafish. Current Biology 25:1823–1834. https://doi.org/10.1016/j.cub.2015.06.002

Stimulus- and goal-oriented frameworks for understanding natural vision. Nature Neuroscience 22:15–24. https://doi.org/10.1038/s41593-018-0284-0

Population coding of stimulus orientation by striate cortical cells. Biological Cybernetics 64:25–31. https://doi.org/10.1007/BF00203627

A spike-timing mechanism for action selection. Nature Neuroscience 17:962–970. https://doi.org/10.1038/nn.3741

Tectal neurons signal impending collision of looming objects in the pigeon. European Journal of Neuroscience 22:2325–2331. https://doi.org/10.1111/j.1460-9568.2005.04397.x

Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience 19:356–365. https://doi.org/10.1038/nn.4244

LoomDetectionANN, version swh:1:rev:864fd3d591bc9e3923189320d7197bdd0cd85448. Software Heritage.
Decision letter

Fred Rieke, Reviewing Editor; University of Washington, United States

Ronald L Calabrese, Senior Editor; Emory University, United States

Fred Rieke, Reviewer; University of Washington, United States

Catherine von Reyn, Reviewer
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "Shallow neural networks trained to detect collisions recover features of visual loom-selective neurons" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Fred Rieke as Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Ronald Calabrese as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Catherine von Reyn (Reviewer #3).
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
Essential revisions:
The following issues emerged in review and were agreed upon by all of the reviewers in consultation.
1. Questions about the model architecture. Several model components (rotation and symmetry) were imposed rather than learned. Was this necessary? Can the model make (testable) predictions about connectomics data?
2. Types of solutions. The text and results need to explore all three types of solution (inward, outward and unstructured) in more detail. It is currently difficult to understand why the inward and unstructured solutions are essentially dropped part way through.
3. More challenging tests of the model. Can you add distracting optic flow to the current stimulus set and/or use more naturalistic stimuli? This could help reduce the number of viable solutions.
4. Inhibitory component of the model. Inhibition is assumed to have specific properties (e.g. rectification) – and it is not clear if these are essential. Further, it is absent in some solutions. Are the properties of inhibition (when present) consistent with the broad LPi receptive fields?
5. Comparison of model with neural data. A stronger rationale is needed for why two of the many outward models are selected for comparison with neural data (and why comparisons are not made for the inward or unstructured models). It is also important to quantify the similarity of the models with neural data.
Reviewer #1 (Recommendations for the authors):
Lines 26–27: It would be helpful to make a somewhat more general statement about the power of the approach that you take here.
Figure 3 is the first figure referred to, so moving it up to Figure 1 would make reading easier.
Line 79: clarify here you mean object motion, not motion of one of the edges.
Lines 94–95: the relationship between timing and size-to-speed ratio is likely hard for most readers to make sense of here – suggest deleting.
Lines 150–151: suggest clarifying that excitation and inhibition in the model are not constrained to have opposite spatial dependencies as depicted in Figure 4.
Line 170: suggest describing the loss function in a sentence in the Results.
Lines 174–176: It would be helpful to connect the outward and inward model terminology more clearly to the flow fields in Figure 3 here. I think this is just a matter of highlighting which elements of the grid in Figure 3 are relevant for each model.
Lines 177–178: describe performance measures here qualitatively.
Lines 206–209: the reason for the difference in baseline activity is not clear – and it requires a lot of effort to extract that from the methods. Can you give more intuition here in the results?
Lines 336–340: this is helpful, and some of it could come up earlier in the Results. More generally, it would be helpful to be clearer (especially in results) how much of the encoding of angular size is a property of expansion of the stimulus, and how much is a property of how the computation is implemented.
Reviewer #2 (Recommendations for the authors):
– The manuscript is a bit difficult to understand. The authors may want to improve their explanations and figures to make them more accessible. For example, in Figure 7B, I can barely see the responses and don't see any grey lines. Perhaps showing only a subset of responses would make the figure clearer; less is more.
– The usage of the term "ballistic" in the introduction is confusing. In many contexts, "ballistic" suggests free-falling motion; in this paper, the authors are referring to the distinction between ballistic and diffusive motion. To avoid confusion, I would suggest not using the term ballistic at all; instead, "straight line" or "linear" is just as expressive.
– The first figure that is cited in the text is Figure 3. I suggest reorganizing either the text or the figures so that the first figure that is cited is Figure 1.
– Figure 5, panel D: why are there two magenta curves?
– I would also suggest a careful reading to screen for typos; I found a dozen or so, from misspelled words to mismatched parentheses.
Reviewer #3 (Recommendations for the authors):
1. Suggestions for improved or additional experiments, data or analyses:
a. The authors should provide their criteria for selecting a particular solution to compare to neural data.
b. The authors should evaluate how well their solutions predict neural data.
c. The authors need to mention that certain outward solutions have no inhibitory component (see Figure 5C, Figure 6 supplement 2). It needs to be discussed in the text and it would be very interesting to see how well these solutions recreate actual data.
d. It would be helpful for the authors to provide an example of an "unstructured" solution and an evaluation of its performance, even if it is included as a supplemental figure.
2. Recommendations for improving writing and presentation
a. Lines 89–90 – this can be better supported by adding the criteria/evaluation mentioned above.
b. Methods (~ line 483) – How is the HRC model using T5 (off) and T4 (on) motion input?
c. Lines 492–502 – What was the frame rate (timestep) for both training and testing stimuli?
d. Figures – Please increase the size when there is white space available. Make sure the pink and green color scheme for the two solution sets are very obvious.
e. Figure 1 caption – approximately half of the 200 LPLC2 are directly synaptic to the GF.
f. Figure 5 – is cross entropy loss the same as what is referred to as the loss function (equation 6) in the methods? If so, keep consistent. If not, please explain.
g. Figure 8D, it is difficult to see the boxplots.
h. Figure 10I–L, it is difficult at first glance to realize what is neural data vs model output. Maybe label the rows instead?
i. Supplemental Figure 1. Add a schematic for the HRC model for readers who may not be familiar with it.
https://doi.org/10.7554/eLife.72067.sa1
Author response
Essential revisions:
The following issues emerged in review and were agreed upon by all of the reviewers in consultation.
1. Questions about the model architecture. Several model components (rotation and symmetry) were imposed rather than learned. Was this necessary? Can the model make (testable) predictions about connectomics data?
Thank you for these two questions. The first question is whether symmetries would arise naturally if not imposed. In our optimization, we imposed rotational and mirror symmetries on the excitatory and inhibitory weights, and also aligned the upward directions when there were multiple units in the models. At the same time, we also chose our stimuli to be isotropic: all stimulus positions were distributed at random across visual angles. This is a reasonable null distribution in the absence of information about the true distribution of looming stimuli. Because such isotropic stimuli contain no left-right or up-down asymmetries, our baseline expectation is that the learned filters should be largely symmetric, especially in the model with only one unit (M = 1).
There is one large advantage to training our models with imposed symmetries: it reduces the total number of parameters in the model by almost eightfold. In terms of training examples per parameter, one may also think of this as effectively decreasing the data required for training by eightfold. This computational efficiency justified our use of the symmetries in the paper.
To answer this question directly, we retrained our M = 1 model, this time without imposing the rotation and mirror symmetries (Figure 5—figure supplement 3 in the revised manuscript). As this figure shows, the trained weights roughly possess the rotational and mirror symmetries noted above. To see this more clearly, we quantified the degree of symmetry in these learned filters. To do this, we averaged the filters of the four directions (each rotated so that they were aligned) and averaged the two halves of each filter. In this way, we created a symmetrized weighting, which we compared to the original unsymmetrized weights by calculating the cosine distance between them. The distribution of the cosine distances (Figure 5—figure supplement 3) shows that most of the trained weights are very close to their symmetrized versions. In addition, when we trained the model using a data set eightfold larger than before, the trained weights became more symmetric (Figure 5—figure supplement 3). These results suggest that isotropic training naturally leads to symmetric weights, and that imposing these symmetries in advance makes training more efficient.
When we align multiple units on a sphere, with upward filters all pointing north, we break the isotropy of the model. In this case, it is possible that solutions without imposed filter symmetry would have asymmetric filters. However, those asymmetries would depend on the details of how we align and position our units. Those unit positions and their alignment, especially near the poles, are not well constrained by data, so we would be extremely reluctant to interpret any asymmetries that arose from our choices about how to distribute and align units. Thus, we do not believe it would be profitable to interpret any such asymmetries or try to tie them to the connectome.
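The symmetry check described above can be sketched in NumPy as follows. The sketch assumes the four direction-selective filters are stored as square arrays of shape (4, K, K), with filter d equal to the upward filter rotated by d × 90° (that is, `np.rot90(upward, k=d)`), and that the mirror axis is the left-right axis of the upward filter. This layout and the function names `symmetrize` and `cosine_distance` are hypothetical; the released code may organize the weights differently.

```python
import numpy as np

def symmetrize(filters):
    """Rotation- and mirror-symmetrized copy of four direction filters.

    filters: array (4, K, K). Assumes filters[d] is the upward filter
    rotated by d * 90 degrees, i.e. np.rot90(upward, k=d).
    """
    # Rotate every filter back into the upward frame and average them.
    aligned = np.stack([np.rot90(filters[d], k=-d) for d in range(4)])
    upward = aligned.mean(axis=0)
    # Average the two mirror-image halves (left-right flip of the upward filter).
    upward = 0.5 * (upward + np.fliplr(upward))
    # Rebuild the four directions from the symmetrized upward filter.
    return np.stack([np.rot90(upward, k=d) for d in range(4)])

def cosine_distance(a, b):
    """1 - cosine similarity between two weight arrays, flattened."""
    a, b = np.ravel(a), np.ravel(b)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical usage: random weights stand in for learned, unsymmetrized filters.
rng = np.random.default_rng(0)
learned = rng.normal(size=(4, 12, 12))
print(cosine_distance(learned, symmetrize(learned)))
```

A distance near 0 means the learned filters are already close to their symmetrized version; because `symmetrize` averages over rotations and reflections, applying it twice gives the same result as applying it once.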
A further important difficulty in comparing our results to connectomic data is that the hemibrain dataset from Janelia regrettably cut off most of the lobula plate, so that there is limited ultrastructure information about LPLC2 dendrites in that brain region. In the future, more ultrastructure data on LPLC2 would be useful both in constraining models of the sort we use here and in testing model predictions.
2. Types of solutions. The text and results need to explore all three types of solution (inward, outward and unstructured) in more detail. It is currently difficult to understand why the inward and unstructured solutions are essentially dropped part way through.
Thank you for this suggestion. Although we never closely analyzed the unstructured solutions, in our original submission we did analyze both outward and inward solutions until the very last figure in the main text. In that last figure, we only showed the comparison of the outward solutions with the experimental data. In the revised manuscript, we now provide a supplementary figure for the last figure (Figure 10—figure supplement 1) that shows the comparison of the response curves of the inward model with the experimental data. Thus, we now follow the inward and outward solutions through to the end of the paper.
In the revised manuscript, we have also addressed the unstructured solutions. In the original manuscript, we created some confusion by calling these ’unstructured solutions’, when a more accurate term would have been ’zero solutions’. The spatial weights of these solutions are exactly zero or very close to zero, which makes them rather uninteresting. In the revised manuscript, we have updated Figure 5 to include the zero solutions, labeled in black boxes (Figure 5—figure supplement 1). We also now describe the unstructured solutions more clearly in the first paragraph of the section ’Optimization finds two distinct solutions to the loom-inference problem’. In the new manuscript, we also refer to these solutions as ’zero solutions’ to reduce the confusion brought about by ’unstructured’.
3. More challenging tests of the model. Can you add distracting optic flow to the current stimulus set and/or use more naturalistic stimuli? This could help reduce the number of viable solutions.
This is an interesting suggestion. When we were designing the stimuli used for training, it was not clear to us what naturalistic looming stimuli should look like. It was also not clear what the statistics of the stimuli are and how they should be distributed. Thus, it is not easy to systematically engineer stimuli that are close to the ones that a fly could experience in reality.
However, following this suggestion, we have engineered a new set of stimuli in which we added a rotational background to the hit, miss, and retreat cases. In these new stimuli, the object of interest, i.e., the one moving in the depth direction, also rotates with the background. This mimics the effect of self-rotation of the fly while it observes a looming or retreating object. In a new training set, we replaced half of the hit cases, half of the miss cases, and half of the retreat cases in the original data set with these rotational ones. When optimizing with this more challenging dataset, both the outward and inward solutions continue to exist, and we observed the expected decrease in model performance, with a larger decrease for the inward solutions. We present the optimization to this more challenging training set in Figure 9—figure supplement 3 and discuss it in the Results section ’Large populations of units improve performance’.
There are two important points to make about interpreting these results. First, we are not aware of any experiments measuring LPLC2 responses to loom with additional background flow fields. Thus, it is not clear that LPLC2 can even perform this task, which makes it difficult to connect these results to data. We are also not aware of data on insect performance with looming stimuli of this more difficult type, so it is also difficult to relate these results to a true task of an insect. Second, as mentioned in the Discussion section of the manuscript, the input to our model is not the optical signal itself, but rather the flow field calculated by a motion estimator. We have used the simple Reichardt correlator to estimate the motion signals. A more thorough future investigation might examine a more realistic motion detection model that can deal with more complicated, naturalistic visual stimuli, including textures and natural scene statistics, which could add new signals to the inputs to the model.
4. Inhibitory component of the model. Inhibition is assumed to have specific properties (e.g. rectification) – and it is not clear if these are essential. Further, it is absent in some solutions. Are the properties of inhibition (when present) consistent with the broad LPi receptive fields?
This comment and question led to our major change, the switch to a linear receptive field model. It was indeed not clear whether the rectified inhibition was necessary, and in fact it was not. As we discussed in our revision summary, we have replaced the model with a rectified inhibitory component, used in our original submission, with a simpler model that has a linear receptive field (Figure 4A). This new model performs almost identically to the previous model, in terms of both AUC performance and replication of the neural data (Figure 9 and supplements, Figure 10 and supplements). The primary difference we observed using the linear receptive field model is that the ratio between the numbers of outward and inward solutions does not increase as the number of units increases (Figure 9—figure supplement 2), and there are fewer outward solutions than inward ones. The nonlinearity of the inhibitory components therefore plays an important role in selecting the outward solutions over the inward ones. Interestingly, if we replace the rectified linear unit (ReLU) with an exponential linear unit (ELU), which takes negative values below the threshold instead of being zero, all solutions are inward for a small number of units. But when the number of units increases beyond 16, outward solutions emerge more often from the training, although the ratio in this case remains below 1. Combined, these results indicate that the form and position of the nonlinearity in the circuit play a role in selecting between different optimized solutions. This suggests that further studies of the nonlinearities may lead to additional insight into how a population of LPLC2s encodes looming stimuli.
This question also mentions the fact that in some outward solutions, the inhibitory components are zero. In the revised manuscript, for the simpler linear receptive field model, the inhibitory negative component exists in all outward solutions. While it is interesting to examine the family of solutions from the rectified inhibition model, they are no longer the focus of the paper and not central to its claims. In the interest of length, we have not expanded the paper with an analysis of this specific type of outward solution.
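For readers less familiar with the two nonlinearities, a minimal NumPy sketch contrasts them. The ELU scale `alpha = 1.0` is an assumption (a common default), not a value taken from the paper.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: zero below threshold, identity above."""
    return np.maximum(x, 0.0)

def elu(x, alpha=1.0):
    """Exponential linear unit: identity above threshold; below it,
    alpha * (exp(x) - 1), which is negative and saturates at -alpha."""
    x = np.asarray(x, dtype=float)
    # np.minimum keeps exp() from overflowing on the unused positive branch.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

x = np.array([-3.0, -1.0, 0.0, 2.0])
print(relu(x))
print(elu(x))
```

Below threshold, the ELU outputs are negative, but its slope there, alpha × exp(x), is positive and shrinks toward zero; the ReLU instead clips these inputs to exactly zero.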
The second question above is whether the inhibitory fields are consistent with broad LPi receptive fields. Here, the new linear model shows that the negative regions (stronger inhibition) are generally broader than the positive regions (stronger excitation) when the number of units is large. In the rectified inhibition model, where excitatory and inhibitory components are dissociable, the inhibitory weights of outward solutions spanned most of the 60-degree receptive field. For both models, the inhibition in outward solutions extended out far enough that responses were inhibited by motion far from looming centers (Figure 10 panels E, F and Figure 10—figure supplement 2 E, F), as in the data from LPLC2.
It is a bit difficult to compare these results directly to LPi data. LPis extend over regions that are similar in size to LPLC2 dendrites (Klapoetke et al., Nature, 2017, Figure 5K and Extended Data Figure 9). Moreover, it is not clear how much LPi cells integrate over space, or whether they can have more localized input-output signals, as in neurons like CT1. Overall, for models with a large number of units, the inhibition is broader than the excitation and seems consistent with broad (averaged) inputs from LPi neurons.
5. Comparison of model with neural data. A stronger rationale is needed for why two of the many outward models are selected for comparison with neural data (and why comparisons are not made for the inward or unstructured models). It is also important to quantify the similarity of the models with neural data.
Thank you for these suggestions. One advantage of the new linear receptive field model is that the variability in the solutions is mostly eliminated. In the revised manuscript, we now show the outward solution comparison with the data in the main figure (Figure 10) and the inward solution comparison in Figure 10—figure supplement 1. We now include the rectified inhibition model comparison in a supplementary figure (Figure 10—figure supplement 2).
In the initial submission, in which there was a distribution of solutions, it might have been useful to quantify the relative fits of the different outward solutions. But with the linear receptive field model, this within-model quantification does not seem warranted because there is no distribution of trained filters.
Our goal in the comparison between the model and the data is to see how the two compare qualitatively, rather than quantitatively. For a quantification of the comparison to be interpretable, one would have to account for calcium indicator dynamics, which we have not included in our model. The important points in comparing the model to the data are: (1) the model responds strongly and selectively to loom signals rather than other non-looming signals; (2) the model qualitatively reproduces LPLC2 responses to various expanding bar stimuli; (3) the model shows periphery inhibition as observed in experiments; and (4) the model shows similar size tuning properties to LPLC2 neurons. We outline these qualitative similarities in the text analyzing the data in Figure 10. In general, we are strong proponents of quantifying similarities and differences, but in this case a quantification of these qualitative results does not seem as though it would provide additional insight.
Reviewer #1 (Recommendations for the authors):
Lines 26–27: It would be helpful to make a somewhat more general statement about the power of the approach that you take here.
We have added a more general statement here, and expanded later in the introduction on how this approach relates to others.
Figure 3 is the first figure referred to, so moving it up to Figure 1 would make reading easier.
We want to keep the anatomy as the first figure, and so we removed the reference to Figure 3 in the first paragraph of the introduction.
Line 79: clarify here you mean object motion, not motion of one of the edges.
We rewrote the sentence to make it more clear that it is object motion.
Lines 94–95: the relationship between timing and size-to-speed ratio is likely hard for most readers to make sense of here – suggest deleting.
Removed.
Lines 150–151: suggest clarifying that excitation and inhibition in the model are not constrained to have opposite spatial dependencies as depicted in Figure 4.
We have added some sentences in both the main text (model section in the results) and the model figure caption to clarify this.
Line 170: suggest describing the loss function in a sentence in the Results.
Did as suggested in the last paragraph of the Results section ’An anatomically constrained mathematical model’.
Lines 174–176: It would be helpful to connect the outward and inward model terminology more clearly to the flow fields in Figure 3 here. I think this is just a matter of highlighting which elements of the grid in Figure 3 are relevant for each model.
In the revised manuscript, these connections are made in the last two paragraphs of Results section ’Optimization finds two distinct solutions to the loom-inference problem’.
Lines 177–178: describe performance measures here qualitatively.
We have added this.
Lines 206–209: the reason for the difference in baseline activity is not clear – and it requires a lot of effort to extract that from the methods. Can you give more intuition here in the results?
Thank you for highlighting this. Yes, it does require the details of the model to think through this. The baseline activity of the inward solutions does not have to be positive, but it just happens to be. We have added some comments on this in the section ’Outward and inward filters are selective to signals in different ranges of angles’.
Lines 336–340: this is helpful, and some of it could come up earlier in the Results. More generally, it would be helpful to be clearer (especially in results) how much of the encoding of angular size is a property of expansion of the stimulus, and how much is a property of how the computation is implemented.
These comments have been moved earlier, to the Results section ’Activation patterns of computational solutions resemble biological responses’. With these comments, we want to provide an intuitive explanation of why LPLC2 neurons and our models are angular size encoders, but it is not straightforward to quantify the contributions of the two aspects to the angular size tuning.
Reviewer #2 (Recommendations for the authors):
– The manuscript is a bit difficult to understand. The authors may want to improve their explanations and figures to make them more accessible. For example, in Figure 7B, I can barely see the responses and don't see any grey lines. Perhaps showing only a subset of responses would make the figure clearer; less is more.
We have made the lines thicker and panels larger to make the figures clearer.
– The usage of the term "ballistic" in the introduction is confusing. In many contexts, "ballistic" suggests free-falling motion; in this paper, the authors are referring to the distinction between ballistic and diffusive motion. To avoid confusion, I would suggest not using the term ballistic at all; instead, "straight line" or "linear" is just as expressive.
We agree this was inappropriate. We now use the suggested term “straight line motion”.
– The first figure that is cited in the text is Figure 3. I suggest reorganizing either the text or the figures so that the first figure that is cited is Figure 1.
We have deleted the reference to the Figure 3 in the first paragraph of the introduction.
– Figure 5, panel D: why are there two magenta curves?
In the initial submission, there was more than one example. In the new linear receptive field model, the curves lie on top of each other, so only one curve is apparent. We now state in figure captions when curves lie on top of one another.
– I would also suggest a careful reading to screen for typos; I found a dozen or so, from misspelled words to mismatched parentheses.
We have read carefully through the manuscript and attempted to find and correct all typos.
Reviewer #3 (Recommendations for the authors):
1. Suggestions for improved or additional experiments, data or analyses:
a. The authors should provide their criteria for selecting a particular solution to compare to neural data.
Please see Essential Revisions 5. The new linear RF model means that we no longer deal with a distribution of solutions for the main model we study, so selection is not required. Moreover, we now show all three different subtypes of outward solutions for the rectified inhibition model in Figure 10—figure supplements 2, 3, and 4.
b. The authors should evaluate how well their solutions predict neural data.
Please see Essential Revisions 5. We believe that the qualitative evaluation of the model with data is extremely informative, and without a family of solutions, we are not sure of the goal of a more formal, quantitative comparison between model and data.
c. The authors need to mention that certain outward solutions have no inhibitory component (see Figure 5C, Figure 6 supplement 2). It needs to be discussed in the text and it would be very interesting to see how well these solutions recreate actual data.
The inhibition-absent outward solutions exist only in the rectified inhibition models, not in the new linear receptive field models. The outward solution without an inhibitory component responds strongly to the moving gratings in Figure 10—figure supplement 4B (unlike the experimental observations), and it does not show the periphery inhibition in Figure 10—figure supplement 4E and F. We now mention this solution as one member of the family of outward solutions in the rectified inhibition model, and point out its shortcomings.
d. It would be helpful for the authors to provide an example of an "unstructured" solution and an evaluation of its performance, even if it is included as a supplemental figure.
This is now provided in Figure 5 supplemental figure 1, where these are shown as the zero solutions. Please see Essential Revisions 2.
2. Recommendations for improving writing and presentation
a. Lines 89-90 – this can be better supported by adding the criteria/evaluation mentioned above.
Thank you for this suggestion. We have added more detail about the evaluations of the models in the Results section 'Optimization finds two distinct solutions to the loom-inference problem'.
b. Methods (~ line 483) – How is the HRC model using T5 (off) and T4 (on) motion input?
The HRC model we use does not distinguish between light and dark edges. Using it as the input is most similar to having both T4 and T5 input (which is also why HS cell activity can often be well-approximated by an HRC).
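The polarity invariance can be seen in a minimal sketch of a standard opponent Hassenstein-Reichardt correlator (HRC). This sketch assumes a first-order low-pass filter as the delay arm, a hypothetical time constant `tau = 0.05` s, and a 0.01 s time step. The function names `lowpass` and `hrc` are illustrative and do not reproduce the authors' implementation. Because the filter is linear, inverting the contrast of both inputs negates both factors in each product, so a dark bar produces the same output as a bright bar moving in the same direction.

```python
import numpy as np


def lowpass(x, tau, dt):
    """First-order low-pass filter (forward Euler), used as the HRC delay arm."""
    y = np.zeros(len(x))
    alpha = dt / tau  # must be < 1 for a stable Euler step
    for t in range(1, len(x)):
        y[t] = y[t - 1] + alpha * (x[t - 1] - y[t - 1])
    return y


def hrc(left, right, tau=0.05, dt=0.01):
    """Opponent HRC: positive for left-to-right motion, negative for the reverse.

    tau is a hypothetical delay time constant (s); dt is the time step (s).
    """
    return lowpass(left, tau, dt) * right - left * lowpass(right, tau, dt)


# A bright bar passes two neighboring points: first the left, then the right.
dt = 0.01
t = np.arange(200) * dt
left = np.exp(-((t - 0.5) / 0.05) ** 2)
right = np.exp(-((t - 0.6) / 0.05) ** 2)

bright = hrc(left, right).sum()
dark = hrc(-left, -right).sum()   # same motion, inverted contrast (dark bar)
reverse = hrc(right, left).sum()  # same bar, opposite direction

print(bright > 0, np.isclose(bright, dark), np.isclose(reverse, -bright))
```

The opponent subtraction makes the output direction-selective, and the multiplication makes it insensitive to the sign of contrast. An ON/OFF-separated input, analogous to separate T4 and T5 channels, would instead require rectifying each input before correlation.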
c. Lines 492-502 – What was the frame rate (timestep) for both training and testing stimuli?
We have added this information to the Methods: the time step for the stimuli is also 0.01 s.
d. Figures – Please increase the size when there is white space available. Make sure the pink and green color scheme for the two solution sets are very obvious.
We have increased the sizes of some panels.
e. Figure 1 caption – approximately half of the 200 LPLC2 are directly synaptic to the GF.
We are uncertain where this information comes from. Ache et al. (Current Biology, 2019) reported 108 LPLC2 neurons projecting to the GF in the right hemisphere of an adult Drosophila, so in total there should be about 200 LPLC2 neurons projecting directly to the two GFs. In the hemibrain dataset, there are 68 annotated LPLC2R neurons, and all 68 are listed as presynaptic to the right giant fiber in a neuprint query. When the query is not restricted to the 'R' suffix, a similarly large fraction of LPLC2 neurons is presynaptic to the giant fiber. Unless we are mistaken, most LPLC2 neurons synapse onto the GF. In the Figure 1 caption and the introduction, we changed GF to GFs to indicate that these 200 LPLC2 neurons project to the two GFs, about half to each. If we have missed an important measurement of this connectivity, we would be happy to correct this description if the reviewer could provide the reference.
f. Figure 5 – is cross entropy loss the same as what is referred to as the loss function (equation 6) in the methods? If so, keep consistent. If not, please explain.
Yes, they are the same. We have changed the left-hand side of Equation 6 from 'loss' to 'cross entropy loss'.
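For readers unfamiliar with the term, the generic binary cross-entropy for hit/non-hit labels y ∈ {0, 1} and predicted hit probabilities p is sketched below. This is only the textbook form. Equation 6 in the Methods may differ in its details, for example in how p is computed from the model responses. The function name `binary_cross_entropy` is hypothetical.

```python
import numpy as np


def binary_cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy between labels y in {0, 1} and probabilities p."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0) for saturated predictions
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


labels = np.array([1.0, 0.0, 1.0, 0.0])  # 1 = collision (hit), 0 = non-hit
uninformed = binary_cross_entropy(labels, np.full(4, 0.5))
confident = binary_cross_entropy(labels, np.array([0.9, 0.1, 0.8, 0.2]))
print(uninformed, confident)
```

A classifier that always outputs 0.5 pays a loss of log 2. Confident, correct predictions drive the loss toward zero.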
g. Figure 8D, it is difficult to see the boxplots.
In the revised manuscript, we have made the boxes larger and hopefully easier to see.
h. Figure 10I-L, it is difficult at first glance to tell which panels show neural data and which show model output. Maybe label the rows instead?
We have labeled the rows as suggested.
i. Supplemental Figure 1. Add a schematic for the HRC model for readers who may not be familiar with it.
Added as suggested.
https://doi.org/10.7554/eLife.72067.sa2
Article and author information
Author details
Funding
National Institutes of Health (R01EY026555)
 Baohua Zhou
 Damon A Clark
National Science Foundation (CCF1839308)
 Baohua Zhou
 John Lafferty
 Damon A Clark
National Science Foundation (DMS1513594)
 John Lafferty
Kavli Foundation
 John Lafferty
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
Research supported in part by NSF grants DMS1513594, CCF1839308, DMS2015397, NIH R01EY026555, a JP Morgan Faculty Research Award, and the Kavli Foundation. We thank G Card and N Klapoetke for sharing data traces from their paper. We thank members of the Clark lab for discussions and comments.
Senior Editor
 Ronald L Calabrese, Emory University, United States
Reviewing Editor
 Fred Rieke, University of Washington, United States
Reviewers
 Fred Rieke, University of Washington, United States
 Catherine von Reyn
Publication history
 Preprint posted: July 8, 2021
 Received: July 8, 2021
 Accepted: January 11, 2022
 Accepted Manuscript published: January 13, 2022 (version 1)
 Version of Record published: February 16, 2022 (version 2)
Copyright
© 2022, Zhou et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.