Shallow neural networks trained to detect collisions recover features of visual loom-selective neurons
Abstract
Animals have evolved sophisticated visual circuits to solve a vital inference problem: detecting whether or not a visual signal corresponds to an object on a collision course. Such events are detected by specific circuits sensitive to visual looming, or objects increasing in size. Various computational models have been developed for these circuits, but how the collision-detection inference problem itself shapes the computational structures of these circuits remains unknown. Here, inspired by the distinctive structures of LPLC2 neurons in the visual system of Drosophila, we build anatomically-constrained shallow neural network models and train them to identify visual signals that correspond to impending collisions. Surprisingly, the optimization arrives at two distinct, opposing solutions, only one of which matches the actual dendritic weighting of LPLC2 neurons. Both solutions can solve the inference problem with high accuracy when the population size is large enough. The LPLC2-like solution reproduces experimentally observed LPLC2 neuron responses for many stimuli, and reproduces canonical tuning of loom-sensitive neurons, even though the models are never trained on neural data. Thus, LPLC2 neuron properties and tuning are predicted by optimizing an anatomically-constrained neural network to detect impending collisions. More generally, these results illustrate how optimizing inference tasks that are important for an animal’s perceptual goals can reveal and explain computational properties of specific sensory neurons.
Editor's evaluation
This paper trains a simple neural network model to perform a behaviorally important task: the detection of looming objects. Two solutions emerge, one of which shares several properties with the actual circuit. This is a nice demonstration that training a CNN on a behaviorally-relevant task can reveal how the underlying computations work.
https://doi.org/10.7554/eLife.72067.sa0
Introduction
For animals living in dynamic visual environments, it is important to detect the approach of predators or other dangerous objects. Many species, from insects to humans, rely on a range of visual cues to identify approaching, or looming, objects (Regan and Beverley, 1978; Sun and Frost, 1998; Gabbiani et al., 1999; Card and Dickinson, 2008; Münch et al., 2009; Temizer et al., 2015). Looming objects create characteristic visual flow fields. When an object is on a straight-line collision course with an animal, its edges will appear to the observer to expand radially outward, gradually occupying a larger and larger portion of the visual field (Video 1). An object heading toward the animal, but which will not collide with it, also expands to occupy an increasing portion of the visual field, but its edges do not expand radially outwards with respect to the observer. Instead, they expand with respect to the object’s center so that opposite edges can move in the same direction across the retina (Video 2). A collision detector must distinguish between these two cases, while also avoiding predicting collisions in response to a myriad of other visual flow fields, including those created by an object moving away (Video 3) or by the animal’s own motion (Video 4). Thus, loom detection can be framed as a visual inference problem.
Many sighted animals solve this inference problem with high precision, thanks to robust loom-selective neural circuits evolved over hundreds of millions of years. The neuronal mechanisms for response to looming stimuli have been studied in a wide range of vertebrates, from cats and mice to zebrafish, as well as in humans (King et al., 1992; Hervais-Adelman et al., 2015; Ball and Tronick, 1971; Liu et al., 2011; Salay et al., 2018; Shang et al., 2015; Wu et al., 2005; Temizer et al., 2015; Dunn et al., 2016; Bhattacharyya et al., 2017). In invertebrates, detailed anatomical, neurophysiological, behavioral, and modeling studies have investigated loom detection, especially in locusts and flies (Oliva and Tomsic, 2014; Sato and Yamawaki, 2014; Santer et al., 2005; Rind and Bramwell, 1996; Card and Dickinson, 2008; de Vries and Clandinin, 2012; Muijres et al., 2014; Klapoetke et al., 2017; von Reyn et al., 2017; Ache et al., 2019). An influential mathematical model of loom detection was derived by studying the responses of the giant descending neurons of locusts (Gabbiani et al., 1999). This model established a relationship between the timing of the neurons’ peak responses and an angular size threshold for the looming object. Similar models have been applied to analyze neuronal responses to looming signals in flies, where genetic tools make it possible to precisely dissect neural circuits, revealing various neuron types that are sensitive to looming signals (von Reyn et al., 2017; Ache et al., 2019; Morimoto et al., 2020).
However, these computational studies did not directly investigate the relationship between the structure of the loom-sensitive neural circuits and the inference problem they appear to solve. On the one hand, the properties of many sensory circuits appear specifically tuned to the tasks that they are executing (Turner et al., 2019). In particular, by taking into account relevant behaviors mediated by specific sensory neurons, experiments can provide insight into their tuning properties (Krapp and Hengstenberg, 1996; Sabbah et al., 2017). On the other hand, computational studies that have trained artificial neural networks to solve specific visual and cognitive tasks, such as object recognition or motion estimation, have revealed response patterns similar to the corresponding biological circuits (Yamins et al., 2014; Yamins and DiCarlo, 2016; Richards et al., 2019) or even individual neurons (Mano et al., 2021). Thus, here we ask whether we can reproduce the properties associated with neural loom detection simply by optimizing shallow neural networks for collision detection.
The starting point for our computational model of loom detection is the known neuroanatomy of the visual system of the fly. In particular, the loom-sensitive neuron LPLC2 (lobula plate/lobula columnar, type 2) has been studied in detail (Wu et al., 2016). These neurons tile visual space, sending their axons to descending neurons called the giant fibers (GFs), which trigger the fly’s jumping and takeoff behaviors (Tanouye and Wyman, 1980; Card and Dickinson, 2008; von Reyn et al., 2017; Ache et al., 2019). Each LPLC2 neuron has four dendritic branches that receive inputs at the four layers of the lobula plate (LP) (Figure 1A; Maisak et al., 2013; Klapoetke et al., 2017). The retinotopic LP layers host the axon terminals of motion detection neurons, and each layer uniquely receives motion information in one of the four cardinal directions (Maisak et al., 2013). Moreover, the physical extensions of the LPLC2 dendrites align with the preferred motion directions in the corresponding LP layers (Figure 1B; Klapoetke et al., 2017). These dendrites form an outward radial structure, which matches the moving edges of a looming object that expands from the receptive field center (Figure 1C). Common stimuli such as the wide-field motion generated by movement of the insect only match part of the radial structure, and strong inhibition for inward-directed motion suppresses responses to such stimuli. Thus, the structure of the LPLC2 dendrites favors responses to visual stimuli with edges moving radially outwards, corresponding to objects approaching the receptive field center.
The focus of this paper is to investigate how loom detection in LPLC2 can be seen as the solution to a computational inference problem. Can the structure of the LPLC2 neurons be explained in terms of optimization—carried out during the course of evolution—for the task of predicting which trajectories will result in collisions? How does coordination among the population of more than 200 LPLC2 neurons tiling a fly’s visual system affect this optimization? To answer these questions, we built simple anatomically-constrained neural network models, which receive motion signals in the four cardinal directions. We trained the model using artificial stimuli to detect visual objects on a collision course with the observer. Surprisingly, optimization finds two distinct types of solutions, with one resembling the LPLC2 neurons and the other having a very different configuration. We analyzed how each of these solutions detects looming events and where they show distinct individual and population behaviors. When tested on visual stimuli not in the training data, the optimized solutions with filters that resemble LPLC2 neurons exhibit response curves that are similar to those of LPLC2 neurons measured experimentally (Klapoetke et al., 2017). Importantly, although it only receives motion signals, the optimized model shows characteristics of an angular size encoder, which is consistent with many biological loom detectors, including LPLC2 (Gabbiani et al., 1999; von Reyn et al., 2017; Ache et al., 2019). Our results show that optimizing a neural network to detect looming events can give rise to the properties and tuning of LPLC2 neurons.
Results
A set of artificial visual stimuli is designed for training models
Our goal is to compare computational models trained to perform loom detection with the biological computations in LPLC2 neurons. We first created a set of stimuli to act as training data for the inference task (Materials and methods). We considered the following four types of motion stimuli: loom-and-hit (abbreviated as hit), loom-and-miss (miss), retreat, and rotation (Figure 2). The hit stimuli consist of a sphere that moves in a straight line towards the origin on a collision course (Figure 3, Video 1). The miss stimuli consist of a sphere that moves in a straight line toward the origin but misses it (Figure 3, Video 2). The retreat stimuli consist of a sphere moving in a straight line away from the origin (Figure 3, Video 3). The rotation stimuli consist of objects rotating about an axis that goes through the origin (Figure 3, Video 4). All stimuli were designed to be isotropic; the first three stimuli could have any orientation in space, while the rotation could be about any axis through the origin (Figure 2). All trajectories were simulated in the frame of reference of the fly at the origin, with distances measured with respect to the origin. For simplicity, the fly is assumed to be a point particle with no volume (red dots in Figure 2 and the apexes of the cones in Figure 3). For hit, miss, and retreat stimuli, the spherical object has unit radius, and for the case of rotation, there were 100 objects of various radii scattered isotropically around the fly (Figure 3).
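To make the three straight-line stimulus classes concrete, they can be sketched as trajectories of a unit-radius sphere relative to an observer at the origin. This is a minimal illustrative sketch with made-up speeds and starting distances; the function and parameter names are ours, not the paper's code.

```python
import numpy as np

def make_trajectory(kind, speed=0.5, n_steps=50, start_dist=25.0, rng=None):
    """Sketch of hit / miss / retreat trajectories of a unit sphere.

    The observer sits at the origin; positions are returned as an
    (n_steps, 3) array. Illustrative parameters, not the paper's.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # random isotropic starting direction at a fixed starting distance
    d = rng.normal(size=3)
    d /= np.linalg.norm(d)
    start = start_dist * d
    if kind == "hit":
        velocity = -speed * d                 # straight at the origin
    elif kind == "miss":
        # aim at a point offset from the origin (assumes d is not
        # parallel to the z axis) so the sphere passes by without contact
        offset = np.cross(d, [0.0, 0.0, 1.0])
        offset /= np.linalg.norm(offset)
        target = 3.0 * offset                 # offset > unit radius: no collision
        velocity = speed * (target - start) / np.linalg.norm(target - start)
    elif kind == "retreat":
        velocity = speed * d                  # straight away from the origin
    else:
        raise ValueError(kind)
    t = np.arange(n_steps)[:, None]
    return start + t * velocity
```

A rotation stimulus would instead rigidly rotate a cloud of scattered objects about an axis through the origin, as described above.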
An anatomically constrained mathematical model
We designed and trained simple, anatomically constrained neural networks (Figure 4) to infer whether or not a moving object will collide with the fly. The features of these networks were designed to mirror anatomical features of the fly’s LPLC2 neurons (Figure 1). We will consider two types of single units in our models (Figure 4): a linear receptive field (LRF) unit and a rectified inhibition (RI) unit. Both types of model units receive input from a 60 degree diameter cone of visual space, represented by white cones and grey circles in Figure 3, approximately the same size as the receptive fields measured in LPLC2 (Klapoetke et al., 2017). The four stimulus sets were projected into this receptive field for training and evaluating the models. The inputs to the units are local directional signals computed in the four cardinal directions at each point of the visual space: downward, upward, rightward, and leftward (Figure 3). These represent the combined motion signals from T4 and T5 neurons in the four layers of the lobula plate (Maisak et al., 2013). They are computed as the nonnegative components of a Hassenstein-Reichardt correlator model (Hassenstein and Reichardt, 1956) in both horizontal and vertical directions (Figure 3—figure supplement 1; Materials and methods). The motion signals are computed with a spacing of 5 degrees, roughly matching the spacing of the ommatidia and processing columns in the fly eye (Stavenga, 2003).
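The directional input signals can be illustrated with a minimal Hassenstein-Reichardt correlator along one spatial axis, with the signed output split into its two nonnegative components, one per direction, mimicking the directional channels of the LP layers. This is a sketch under our own discretization and time constant; the paper's detector parameters may differ.

```python
import numpy as np

def hrc_motion(frames, tau=2.0):
    """Minimal Hassenstein-Reichardt correlator along one spatial axis.

    frames: (time, space) luminance array. Each input is low-pass
    filtered in time; delayed and undelayed arms from neighboring
    points are cross-multiplied, and the signed result is split into
    nonnegative directional components.
    """
    alpha = 1.0 / tau
    lp = np.zeros_like(frames)
    for t in range(1, frames.shape[0]):       # first-order temporal low-pass
        lp[t] = lp[t - 1] + alpha * (frames[t] - lp[t - 1])
    # delayed arm at one point times undelayed arm at its right neighbor,
    # minus the mirror-image product
    signed = lp[:, :-1] * frames[:, 1:] - frames[:, :-1] * lp[:, 1:]
    rightward = np.maximum(signed, 0.0)       # positive-direction channel
    leftward = np.maximum(-signed, 0.0)       # negative-direction channel
    return rightward, leftward
```

Running two such correlators, one horizontal and one vertical, yields the four nonnegative motion fields that the model units integrate.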
For the LRF model unit (Figure 4A), there is a single set of four real-valued filters, the elements of which can be positive (excitatory, red) or negative (inhibitory, blue). The four filters integrate motion signals from the four cardinal directions, respectively, and their outputs are summed and rectified to generate the output of a single model unit (Figure 4A). These spatial filters effectively represent excitatory inputs to LPLC2 directly from T4 and T5 in the LP (positive elements), and inhibitory inputs mediated by local interneurons (negative elements) (Mauss et al., 2015; Klapoetke et al., 2017). All filters act on the 60 degree receptive field of a unit. A 90 degree rotational symmetry is imposed on the filters, so that the four filters are identical up to 90 degree rotations. Moreover, each filter is symmetric about the axis of motion (Materials and methods). Although these symmetry assumptions are not necessary and may be learned from training (Figure 5—figure supplement 3), they greatly reduce the number of parameters in the models. No further assumptions were made about the structure of the filters. Note that the opposing spatial patterns of the excitatory and inhibitory components in Figure 4 are only for illustration purposes, and are not imposed on the models. The LRF model unit is equivalent to a linear-nonlinear model and constitutes one of the simplest possible models for the LPLC2 neurons. In this work, we will focus most of our analysis on this simplest form of the model.
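In this linear-nonlinear form, a single unit reduces to a weighted sum over the four motion fields followed by a rectifier. The array shapes and names below are our own assumptions for illustration.

```python
import numpy as np

def lrf_unit(motion, filters, intercept=0.0):
    """Linear receptive field (LRF) unit: a linear-nonlinear model.

    motion:  (4, H, W) nonnegative motion signals, one field per
             cardinal direction (down, up, right, left).
    filters: (4, H, W) real-valued weights; positive entries are
             excitatory, negative entries inhibitory.
    """
    drive = float(np.sum(motion * filters)) + intercept
    return max(drive, 0.0)                    # rectified unit output
```

A positive filter overlapping outward motion drives the unit, while a negative (inhibitory) filter over the same motion silences it through the rectifier.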
In addition, we will also consider a more complex model unit, which we call the rectified inhibition (RI) model unit (Figure 4B). In this unit, all the same filter symmetries are enforced, but there are two sets of nonnegative filters: a set of excitatory filters and a set of inhibitory filters. The RI unit incorporates a fundamental difference between the excitatory and inhibitory filters: while the integrated signals from each excitatory filter are sent directly to the downstream computations, the integrated signals from each inhibitory filter are rectified before being sent downstream. The outputs of the eight filters are summed and rectified to generate the output of a single model unit in response to a given stimulus (Figure 4B). If one removes the rectification of the inhibitory filters (as is possible with an appropriate choice of parameters), the RI unit becomes equivalent to the LRF unit (Materials and methods). This difference between the two model units reflects different potential constraints on the inhibitory inputs to an actual LPLC2 neuron. While the excitatory inputs to LPLC2 are direct connections, the inhibitory inputs are mediated by inhibitory interneurons (LPi) between LP layers (Mauss et al., 2015; Klapoetke et al., 2017). The LRF model unit assumes that LPi interneurons are linear and do not strongly rectify signals, while the RI model unit approximates rectified transmission by LPi interneurons.
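The rectified-inhibition variant can be sketched the same way, with nonnegative excitatory and inhibitory filter banks and a rectifier applied to each direction's inhibitory drive before it is subtracted. The parameterization below is our own simplification of the model described above.

```python
import numpy as np

def ri_unit(motion, exc_filters, inh_filters, inh_intercepts, intercept=0.0):
    """Rectified inhibition (RI) unit sketch.

    All filters are (4, H, W) and nonnegative. Excitatory drive passes
    linearly; each direction's inhibitory drive is rectified first,
    approximating rectified transmission through LPi interneurons.
    Removing the inner rectifiers recovers an LRF-like unit with
    effective filters exc_filters - inh_filters.
    """
    exc = float(np.sum(motion * exc_filters))
    inh = sum(
        max(float(np.sum(motion[i] * inh_filters[i])) + inh_intercepts[i], 0.0)
        for i in range(4)
    )
    return max(exc - inh + intercept, 0.0)    # final rectified output
```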
In the fly brain, a population of LPLC2 neurons converges onto the GFs (Figure 1D). Accordingly, in our model there are $M$ replicates of model units, with orientations that are spread uniformly over the $4\pi $ steradians of the unit sphere (Figure 4C and D, Figure 4—figure supplement 1, Materials and methods). In this way, the receptive fields of the $M$ units roughly tile the whole angular space, with or without overlap, depending on the value of $M$. The sum of the responses of the $M$ model units is fed into a sigmoid function to generate the predicted probability of collision for a given trajectory (Materials and methods). The loss function then is defined as the cross entropy between the predicted probabilities and the stimulus labels (hits are labeled 1 and all others are labeled 0).
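The population readout and training objective amount to a pooled sigmoid classifier with a binary cross-entropy loss. In the paper the sigmoid's scale and offset are trainable; they are fixed arguments here for illustration.

```python
import numpy as np

def predict_hit_probability(unit_outputs, scale=1.0, offset=0.0):
    """Sum the M unit responses and squash through a sigmoid."""
    s = scale * float(np.sum(unit_outputs)) + offset
    return 1.0 / (1.0 + np.exp(-s))

def cross_entropy(p, label, eps=1e-12):
    """Binary cross-entropy; label is 1 for hit stimuli, 0 otherwise."""
    p = min(max(p, eps), 1.0 - eps)           # clip for numerical safety
    return -(label * np.log(p) + (1 - label) * np.log(1.0 - p))
```

Minimizing this loss over the stimulus set is what shapes the filters analyzed in the following sections.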
Optimization finds two distinct solutions to the loom-inference problem
The objective of this study is to investigate how the binary classification task shapes the structure of the filters, and how the number of units $M$ affects the results. We begin with the simplest LRF model, which possesses only a single unit, $M=1$. After training with 200 random initializations of the filters, we find that the converged solutions fall into three broad categories (Figure 5, Figure 5—figure supplement 1). Two solutions have spatial structures that are—surprisingly—roughly opposite from one another (magenta and green). Based on the configurations of the positive-valued elements (stronger excitation) of the filters (Materials and methods), we call one solution type outward solutions (magenta) and the other type inward solutions (green) (Figure 5C, Figure 5—figure supplement 1). In this single-unit model, the inward solutions have higher area under the curve (AUC) scores for both receiver operating characteristic (ROC) and precision-recall curves, and thus perform better than the outward solutions on the discrimination task (Figure 5D). A third category of solution has all the elements in the filters very close to zero (zero solutions in black squares, Figure 5—figure supplement 1). This solution appears roughly 5–15% of the time, and appears to be a local minimum of the optimization, dependent on the random initialization. This uninteresting category of solutions is ignored in subsequent analyses.
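The ROC-AUC used to compare solutions has a simple rank interpretation: the probability that a randomly chosen hit stimulus receives a higher score than a randomly chosen non-hit. A sketch implementation for small score lists:

```python
def roc_auc(pos_scores, neg_scores):
    """ROC area under the curve via the Mann-Whitney rank statistic."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC of 1 means the model's hit scores are perfectly separated from its non-hit scores; 0.5 is chance level.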
As the number of units $M$ increases, the population of units covers more angular space, and when $M$ is large enough ($M\ge 16$), the receptive fields of the units begin to overlap with one another (Figure 6A). In the fly visual system there are over 200 LPLC2 neurons across both eyes (Ache et al., 2019), which corresponds to a dense distribution of units. This is illustrated by the third row in Figure 6A, where $M=256$. When $M$ is large, objects approaching from any direction are detectable, and such object signals can be detected simultaneously by many neighboring units. The two oppositely structured solutions persist, regardless of the value of $M$ (Figure 6, Figure 6—figure supplement 1, Figure 6—figure supplement 2, Figure 6—figure supplement 3). Strikingly, the inhibitory component of the outward solutions becomes broader as $M$ increases and expands to extend across the entire receptive field (Figure 6B). This broad inhibition is consistent with the large receptive field of LPi neurons suggested by experiments (Mauss et al., 2015; Klapoetke et al., 2017). The outward, inward, and zero solutions also all appear in trained solutions of the RI models (Figure 5—figure supplement 2, Figure 6—figure supplement 3).
Units with outward-oriented filters are activated by motion radiating outwards from the center of the receptive field, such as the hit event illustrated in Figure 3. These excitatory components resemble the dendritic structures of the actual LPLC2 neurons observed in experiments, where for example, the rightward motion-sensitive component (LP2) occupies mainly the right side of the receptive field. In the outward solutions of the LRF models, the rightward motion-sensitive inhibitory components mainly occupy the left side of the receptive field (Figure 6B, Figure 6—figure supplement 2). This is also consistent with the properties of the lobula plate intrinsic (LPi) interneurons, which project inhibitory signals roughly retinotopically from one LP layer to the adjacent layer with opposite directional tuning (Mauss et al., 2015; Klapoetke et al., 2017).
The unexpected inward-oriented filters have the opposite structure. In the inward solutions, the rightward-sensitive excitatory component occupies the left side of the receptive field, and the inhibitory component occupies the right side. Such weightings make the model selective for motion converging toward the receptive field center, such as the retreat event shown in Figure 3. This is a puzzling structure for a loom detector, and warrants a more detailed exploration of the response properties of the inward and outward solutions.
Units with outward and inward filters respond to hits originating in distinct regions
To understand the differences between the two types of solutions and why the inward ones can predict collisions, we investigated how units respond to hit stimuli originating at different angles $\theta $ (Figure 7A). When there is no signal, the baseline activity of outward units is zero; however, the baseline activity of inward units is above zero (grey dashed lines in Figure 7B and C). This is because the trained intercepts are negative in the outward case and positive in the inward case, so with zero input (no signal), only the inward units’ activity passes through the rectifier (Materials and methods). (The training did not impose any requirements on these intercepts.) The outward units respond strongly to stimuli originating near the center of the receptive field, but do not respond to stimuli originating at angles larger than approximately ${30}^{\circ}$ (Figure 7B and C). In contrast, inward units respond below baseline to hit stimuli approaching from the center and above baseline to stimuli approaching from the periphery of the receptive field, with $\theta $ between roughly ${30}^{\circ}$ and ${90}^{\circ}$ (Figure 7B and C). This helps explain why the inward units can act as loom detectors: they are sensitive to hit stimuli coming from the edges of the receptive field rather than from the center. The hit stimuli are isotropic (Figure 2A), so the number of stimuli with angles between ${30}^{\circ}$ and ${90}^{\circ}$ is much larger than the number of stimuli with angles below ${30}^{\circ}$ (Figure 7D). Thus, the inward units are sensitive to more hit cases than the outward ones. One may visualize these responses as heat maps of the mean response of the units in terms of object distance to the fly and the incoming angle (Figure 7E). For the hit cases, the response patterns are consistent with the intuition about trajectory angles (Figure 7C). Both outward and inward units respond less strongly to miss signals than to hit signals.
As expected, while the outward units respond at most weakly to retreating signals, the inward ones respond to these signals with angles near ${180}^{\circ}$, since the motion of edges in such cases is radially inward. The RI model units have similar response patterns to the LRF model units (Figure 7—figure supplement 1).
Outward solutions have sparse coding, and populations of units accurately predict hit probabilities
Individual units of the two solutions are very different from one another in both their filter structure and their response patterns to different stimuli. In populations of units, the outward and inward solutions also exhibit very different response patterns for a given hit stimulus (Figure 8A and B, Videos 5 and 6). In particular, active outward units usually respond more strongly than inward units, but more inward units will be activated by a hit stimulus. This is consistent with the findings above, in which inward filter shapes responded to hits arriving from a wider distribution of angles (Figure 7). For all four types of stimuli, the outward solutions generally show relatively sparse activity among units, especially for models with larger numbers of units $M$, while the inward solutions show broader activity among units (Figure 8A and B, Figure 8—figure supplement 1, Figure 8—figure supplement 2, Videos 5–12).
When a population of units encodes stimuli, at each time point, the sum of the activities of the units is used to infer the probability of hit. In our trained models, the outward and inward solutions predict similar trajectories of probabilities of hit (Figure 8A). The outward solutions suppress the miss and retreat signals better, while the inward solutions better suppress rotation signals (Figure 8C). Both 32-unit and 256-unit models have units covering the entire visual field (Figure 6), but the models with 256 units can more accurately detect hit stimuli (Figure 8C). In some cases, misses can appear very similar to hits if the object passes near the origin. Both inward and outward solutions reflect this in their predictions in response to near misses, which have higher hit probabilities than far misses (Figure 8D).
The inward and outward solutions of RI models have similar behaviors to the LRF models (Figure 8—figure supplement 3). This indicates that the linear receptive field intuition captures behavior of the potentially more complicated RI model, as well.
Large populations of units improve performance
Since a larger number of units will cover a larger region of the visual field, a larger population of units can in principle provide more information about the incoming signals. In general, the models perform better as the number of units $M$ increases (Figure 9A). When $M$ is above 32, both the ROC-AUC and PR-AUC scores are almost 1 (Materials and methods), which indicates that the model is very accurate on the binary classification task presented by the four types of synthetic stimuli. As $M$ increases, the outward solutions become closer to the inward solutions in terms of both AUC score and cross entropy loss (Figure 9A and B). This is also true for the RI models (Figure 9—figure supplement 1A and B).
Beyond performance of the two solution types, we also calculated the ratio of the number of outward to inward solutions in 200 random initializations of the models with $M$ units. For the LRF models, as the number of units increases, the ratio remains relatively constant, fluctuating around 0.5 (Figure 9—figure supplement 2). For the RI models, on the other hand, as the number of units increases, an increasing proportion of solutions have outward filters (Figure 9—figure supplement 2). For RI models with 256 units, the chance that an outward filter appears as a solution is almost 90% compared with roughly 50% when $M=1$.
Because of this qualitative difference between the LRF and RI models, we next asked whether the form of the nonlinearity in the LRF model could influence the solutions found through optimization. When we replaced the rectified linear unit (ReLU) in LRF models with the exponential linear unit (ELU) (Materials and methods), only inward solutions exist for models with $M<32$, and the outward solutions emerge more often as $M$ increases (Figure 9—figure supplement 2). Combined, these results with LRF and RI models indicate that the form and position of the nonlinearity in the circuit play a role in selecting between different optimized solutions. This suggests that further studies of the nonlinearities in LPLC2 processing will lead to additional insight into how a population of LPLC2s encodes looming stimuli.
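For reference, the two nonlinearities differ only in how they treat negative drive: the ELU remains smooth and nonzero below zero, which is what distinguishes its optimization behavior from the hard-clipping ReLU. These are the standard definitions, not paper-specific code.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: zero for negative input, identity otherwise."""
    return np.maximum(x, 0.0)

def elu(x, alpha=1.0):
    """Exponential linear unit: identity for x > 0, smoothly saturating
    toward -alpha for large negative input."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))
```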
The current binary prediction task is relatively easy for our loom detection models, as can be seen by the saturated AUC scores when $M$ is large (Figure 9A, Figure 9—figure supplement 1A). Thus, we engineered a new set of stimuli, where for the hit, miss, and retreat cases, we added a rotational background to increase the difficulty of the task. The object of interest, which is moving toward or away from the observer, also rotates with the background. This arrangement mimics self-rotation of the fly while observing a looming or retreating object in a cluttered background. To train an LRF model, we added such rotational distraction to half of the hit, miss, and retreat cases. The outward and inward solutions both persist (Figure 9—figure supplement 3), although in this case, outward solutions outperform inward ones (Figure 9—figure supplement 3). It remains unclear whether this added complexity brings our artificial stimuli closer to actual detection tasks performed by flies, but this result makes clear that identifying the natural statistics of loom will be important to understanding loom inferences.
Activation patterns of computational solutions resemble biological responses
The outward solutions have a receptive field structure that is similar to LPLC2 neurons, based on anatomical and functional studies. However, it is not clear whether these models possess the functional properties of LPLC2 neurons, which have been studied systematically (Klapoetke et al., 2017; Ache et al., 2019). To test this, we presented the trained outward solution with the stimuli used in those experiments and compared its responses to those measured in LPLC2 (Figure 10).
The outward unit behaves similarly to LPLC2 neurons on many different types of stimuli. Not surprisingly, the unit is selective for loom signals and does not have strong responses to non-looming signals (Figure 10B). Moreover, the unit closely follows the responses of LPLC2 neurons to various expanding bar stimuli, including the inhibitory effects of inward motion (Figure 10C and D). In addition, in experiments, motion signals that appear at the periphery of the receptive field suppress the activity of the LPLC2 neurons (periphery inhibition) (Klapoetke et al., 2017), and this phenomenon is successfully predicted by the outward unit (Figure 10E and F) due to its broad inhibitory filters (Figure 10A). The unit also correctly predicts response patterns of the LPLC2 neurons for expanding bars with different orientations (Figure 10G and H).
The ratio of object size to approach velocity, or $R/v$, is an important parameter for looming stimuli, and many studies have investigated how the response patterns of loom-sensitive neurons depend on this ratio (Top panels in Figure 10I, J, K and L; Gabbiani et al., 1999; von Reyn et al., 2017; Ache et al., 2019; de Vries and Clandinin, 2012). Here, we presented the trained model (Figure 10A) with hit stimuli with different $R/v$ ratios, and compared its response with experimental measurements (Figure 10I–L). Surprisingly, although our model only has angular velocities as inputs (Figure 3), it reliably encodes the angular size of the stimulus rather than its angular velocity (Figure 10J). This is indicated by the collapsed response curves with different $R/v$ ratios (up to different scales) when plotted against the angular sizes (von Reyn et al., 2017). When the curves are plotted against angular velocity, they shift for different $R/v$ ratios, which means the response depends on the velocity $v$ of the object, since $R$ is fixed to be 1. The relative shifts in these curves are consistent with properties of LPLC2.
There are two likely ways that this angular size tuning arises. First, in hit stimuli, the angular size and angular velocity are strongly correlated (Gabbiani et al., 1999), which means the angular size affects the magnitude of the motion signals. Second, in hit stimuli, the angular size is proportional to the path length of the outward-moving edges. This angular circumference of the hit stimulus determines how many motion detectors are activated, so that the integrated motion signal strength is related to the size. Both of these effects influence the response patterns of the model units (and the LPLC2 neurons). Beyond the tuning to stimulus size, the outward model also reproduces a canonical linear relationship between the peak response time relative to the collision and the $R/v$ ratio (Figure 10L; Gabbiani et al., 1999; Ache et al., 2019).
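The geometry behind this tuning is compact: for an object of radius $R$ approaching head-on at speed $v$, the angle subtended at time $t$ before collision depends on $R$ and $v$ only through the ratio $R/v$. The sketch below uses the thin-object approximation $\theta(t) = 2\arctan\big((R/v)/t\big)$ as in Gabbiani et al., 1999; the exact half-angle for a sphere would be an arcsine.

```python
import numpy as np

def angular_size(r_over_v, t_to_collision):
    """Full subtended angle (radians) at time t before collision.

    theta(t) = 2 * arctan((R / v) / t): the distance to the object is
    v * t, so R and v enter only through their ratio R / v.
    """
    return 2.0 * np.arctan(r_over_v / t_to_collision)
```

Because the angle is a function of $(R/v)/t$ alone, response curves plotted against angular size collapse across $R/v$ ratios, while curves plotted against time or angular velocity shift, matching the behavior described above.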
Not surprisingly, the inward solution cannot reproduce the neural data (Figure 10—figure supplement 1). On the other hand, some forms of the RI model outward solutions can closely reproduce the neural data (Figure 10—figure supplement 2), while other outward solutions fail to do so (Figure 10—figure supplement 3, Figure 10—figure supplement 4). For example, some RI outward solutions predict response patterns to the wide expanding bars that are out of phase with the biological data (Figure 10—figure supplement 3H), do a poor job predicting the response curves of the LPLC2 neurons to looming signals with different $R/v$ ratios (Figure 10—figure supplement 3J and K), respond strongly to the moving gratings (Figure 10—figure supplement 4B), cannot show the peripheral inhibition (Figure 10—figure supplement 4E and F), and so on. This shows that, for the RI models, even within the family of learned outward solutions, there is variability in the learned response properties. Although solving the inference problem with the RI model obtains many of the neural response properties, additional constraints could be required. These additional constraints may be built into the LRF model, causing its outward solutions to more closely match neural responses.
Discussion
In this study, we have shown that training a simple network to detect collisions gives rise to a computation that closely resembles the one performed by neurons sensitive to looming signals. Specifically, we optimized a neural network model to detect whether an object is on a collision course based on visual motion signals (Figure 3), and found that one class of optimized solutions matched the anatomy of motion inputs to LPLC2 neurons (Figures 1, 5 and 6). Importantly, this solution class reproduces a wide range of experimental observations of LPLC2 neuron responses (Figure 10; Klapoetke et al., 2017; von Reyn et al., 2017; Ache et al., 2019).
The radially structured dendrites of the LPLC2 neuron in the LP can account for its response to motion radiating outward from the receptive field center (Klapoetke et al., 2017). Our results show that the logic of this computation can be understood in terms of inferential loom detection by the population of units. In particular, for an individual detector unit, an inward structure can make a better loom detector than an outward structure, since it is sensitive to colliding objects approaching from a wider array of incoming angles (Figure 7). As the number of units across visual space increases, the performance of the outward-sensitive receptive field structure comes to match the performance of the inward solutions (Figure 9, Figure 9—figure supplement 1, Figure 9—figure supplement 2). As the population grows, the inhibitory component of the outward solutions also becomes broader, which is crucial for reproducing key experimental observations, such as peripheral inhibition (Figure 10; Klapoetke et al., 2017). The optimized solutions depend on the number of detectors, and this is likely related to the increasing overlap in receptive fields as the population grows (Figure 6). This result is consistent with prior work showing that populations of neurons often exhibit different and improved coding strategies compared to individual neurons (Pasupathy and Connor, 2002; Georgopoulos et al., 1986; Vogels, 1990; Franke et al., 2016; Zylberberg et al., 2016; Cafaro et al., 2020). Thus, understanding anatomical, physiological, and algorithmic properties of individual neurons can require considering the population response. The solutions we found to the loom inference problem suggest that individual LPLC2 responses should be interpreted in light of the population of LPLC2 responses.
Our results shed light on discussions of $\eta$-like (encoding angular size) and $\rho$-like (encoding angular velocity) looming-sensitive neurons in the literature (Gabbiani et al., 1999; Wu et al., 2005; Liu et al., 2011; Shang et al., 2015; Temizer et al., 2015; Dunn et al., 2016; von Reyn et al., 2017; Ache et al., 2019). In particular, these optimized models clarify an interesting but puzzling fact: LPLC2 neurons transform their direction-selective motion inputs into a computation of angular size (Ache et al., 2019). Consistent with this tuning, our model also shows a linear relationship between the peak time relative to collision and the $R/v$ ratio, a relationship that is expected of loom-sensitive neurons that encode angular size (Peek and Card, 2016). In both cases, these properties appear to be the simple result of training the constrained model to reliably detect looming stimuli.
The units of the outward solution exhibit sparsity in their responses to looming stimuli, in contrast to the denser representations in the inward solution (Figure 8). During a looming event, in an outward solution, most of the units are quiet and only a few adjacent units have very large activities, reminiscent of sparse codes that seem to be favored, for instance, in cortical encoding of visual scenes (Olshausen and Field, 1996; Olshausen and Field, 1997). Since the readout of our model is a summation of the activities of the units, sparsity does not directly affect the performance of the model, but is an attribute of the favored solution. For a model with a different loss function or with noise, the degree of sparsity might be crucial. For instance, the sparse code of the outward model might make it easier to localize a hit stimulus (Morimoto et al., 2020), or might make the population response more robust to noise (Field, 1994).
Experiments have shown that inhibitory circuits play an important role in the selectivity of LPLC2 neurons. For example, motion signals at the periphery of the receptive field of an LPLC2 neuron inhibit its activity. This peripheral inhibition causes various interesting response patterns of the LPLC2 neurons to different types of stimuli (Figure 10E and F; Klapoetke et al., 2017). However, the structure of this inhibitory field is not fully understood, and our model provides a tool to investigate how the inhibitory inputs to LPLC2 neurons affect circuit performance on loom detection tasks. The strong inhibition at the periphery of the receptive field arises naturally in the outward solutions after optimization. The extent of the inhibitory components increases as more units are added to the model (Figure 6). In our model, the broad inhibition appears to suppress responses to the non-hit stimuli, and, as in the data, the inhibition is broader than one might expect if the neuron were simply being inhibited by inward motion. These larger inhibitory fields are also consistent with the larger spatial pooling likely to be supplied by inhibitory LPi inputs (Klapoetke et al., 2017).
The synthetic stimuli used to train models in this study were unnatural in two ways. The first way was in the proportion of hits and non-hits. We trained with 25% of the training data representing hits. The true fraction of hits among all stimuli encountered by a fly is undoubtedly much less, and this affects how the loss function weights different types of errors. It is also clear that a false positive (in which a fly might jump to escape an object not on a collision course) is much less penalized during evolution than a false negative (in which a fly doesn’t jump and an object collides, presumably to the detriment of the fly). It remains unclear how to choose these weights in the training data or in the loss function, but they affect the receptive field weights optimized by the model.
The second issue with the stimuli is that they were caricatures of stimulus types, but did not incorporate the richness of natural stimuli. This richness could include natural textures and spatial statistics (Ruderman and Bialek, 1994), which seem to impact motion detection algorithms (Fitzgerald and Clark, 2015; Leonhardt et al., 2016; Chen et al., 2019). This richness could also include more natural trajectories for approaching objects. Another way to enrich the stimuli would be to add noise, either in inputs to the model or in the model’s units themselves. We explored this briefly by adding background motion generated by self-rotation; under those conditions, both solutions were present, but optimized outward solutions performed better than the inward solutions (Figure 9—figure supplement 3, Materials and methods). This indicates that the statistics of the stimuli may play an important role in selecting solutions for loom detection. However, it remains less clear what the true performance limits of loom detection are, since most experiments use substantially impoverished looming stimuli. Moreover, it is challenging to characterize the properties of natural looming events. An interesting future direction will be to investigate the effects of more complex and naturalistic stimuli on the model’s filters and performance, as well as on LPLC2 neuron responses themselves.
For simplicity, our models did not impose the hexagonal geometry of the compound eye ommatidia. Instead, we assumed that the visual field is separated into a Cartesian lattice with ${5}^{\circ}$ spacing, with each lattice point representing a local motion detector with two spatially separated inputs (Figure 3). This simplification slightly alters the geometry of the motion signals compared to the real motion detector receptive fields (Shinomiya et al., 2019). This could potentially affect the learned spatial weightings and the reproduction of the LPLC2 responses to various stimuli, since the specific shapes of the filters matter (Figure 10). Thus, the hexagonal ommatidial structure and the full extent of inputs to T4 and T5 might be crucial if one wants to make comparisons with the dynamics and detailed responses of LPLC2 neurons. However, this geometric distinction seems unlikely to affect the main results of how to infer the presence of hit stimuli.
Our model requires a field of estimates of the local motion. Here, we used the simplest model, the Hassenstein-Reichardt correlator model of Equation 3 (Materials and methods; Hassenstein and Reichardt, 1956), but the model could be extended by replacing it with a more sophisticated model for motion estimation. Some biophysically realistic ones might take into account synaptic conductances (Gruntman et al., 2018; Gruntman et al., 2019; Badwan et al., 2019; Zavatone-Veth et al., 2020) and could respond to static features of visual scenes (Agrochao et al., 2020). Moreover, in natural environments, contrasts fluctuate in time and space. Thus, if one includes more naturalistic spatial and temporal patterns, one might consider a motion detection model that can adapt to changing contrasts in time and space (Drews et al., 2020; Matulis et al., 2020).
Although the outward filter of the unit emerges naturally from our gradient descent training protocol, that does not mean that the structure is learned by LPLC2 neurons in the fly. There may be some experience-dependent plasticity in the fly eye (Kikuchi et al., 2012), but these visual computations are likely to be primarily genetically determined. Thus, one may think of the computation of the LPLC2 neuron as being shaped through millions of years of evolutionary optimization. Optimization algorithms at play in evolution may be able to avoid getting stuck in local optima (Stanley et al., 2019), and thus work well with the sort of shallow neural network found in the fly eye.
In this study, we focused on the motion signal inputs to LPLC2 neurons, and we neglected other inputs to LPLC2 neurons, such as those coming from the lobula that likely report non-motion visual features. It would be interesting to investigate how this additional non-motion information affects the performance and optimized solutions of the inference units. For instance, another lobula columnar neuron, LC4, is loom-sensitive and receives inputs in the lobula (von Reyn et al., 2017). The LPLC2 and LC4 neurons are the primary excitatory inputs to the GF, which mediates escape behaviors (von Reyn et al., 2014; Ache et al., 2019). The inference framework set out here would allow one to incorporate parallel non-motion intensity channels, either by adding them into the inputs to the LPLC2-like units, or by adding a parallel population of LC4-like units. This would require a reformulation of the probabilistic model in Equation 6. Notably, one of the most studied loom-detecting neurons, the lobula giant movement detector (LGMD) in locusts, does not appear to receive direction-selective inputs, unlike LPLC2 (Rind and Bramwell, 1996; Gabbiani et al., 1999). Thus, the inference framework set out here could be flexibly modified to investigate loom detection under a wide variety of constraints and inputs, allowing it to be applied to other neurons beyond LPLC2.
Materials and methods
Code availability
Code to perform all simulations in this paper and to reproduce all figures is available at https://github.com/ClarkLabCode/LoomDetectionANN (copy archived at swh:1:rev:864fd3d591bc9e3923189320d7197bdd0cd85448; Zhou, 2021).
Coordinate system and stimuli
We designed a suite of visual stimuli to simulate looming objects, retreating objects, and rotational visual fields. In this section, we describe the suite of stimuli and the coordinate systems used in our simulations (Figure 4—figure supplement 1).
In our simulations and training, the fly is at rest on a horizontal plane, with its head pointing in a specific direction. The fly head is modeled as a point particle with no volume. A three-dimensional right-handed frame of reference $\mathrm{\Sigma}$ is set up and attached to the fly head at the origin. The $z$ axis points in the anterior direction from the fly head, perpendicular to the line that connects the two eyes, and in the horizontal plane of the fly; the $y$ axis points toward the right eye, also in the horizontal plane; and the $x$ axis points upward and perpendicular to the horizontal plane. Looming or retreating objects are represented in this space by a sphere with radius $R=1$, and the coordinates of an object’s center at time $t$ are denoted as $\mathbf{r}(t)=(x(t),y(t),z(t))$. Thus, the distance between the object center and the fly head is $D(t)=\Vert \mathbf{r}(t)\Vert =\sqrt{{x}^{2}(t)+{y}^{2}(t)+{z}^{2}(t)}$.
Within this coordinate system, we set up cones to represent individual units. The receptive field of LPLC2 neurons is measured at roughly ${60}^{\circ}$ in diameter (Klapoetke et al., 2017). Thus, we here model each unit as a cone with its vertex at the origin and with a half-angle of ${30}^{\circ}$. For each unit $m$ ($m=1,2,\mathrm{\dots},M$), we set up a local frame of reference ${\mathrm{\Sigma}}_{m}$ (Figure 4—figure supplement 1): the z_{m} axis is the axis of the cone and its positive direction points outward from the origin. The local ${\mathrm{\Sigma}}_{m}$ can be obtained from $\mathrm{\Sigma}$ by two rotations: first around the $x$ axis of $\mathrm{\Sigma}$, then around the new ${y}^{\prime}$ axis obtained after the rotation around $x$. For each unit, its cardinal directions are defined as: upward (positive direction of x_{m}), downward (negative direction of x_{m}), leftward (negative direction of y_{m}), and rightward (positive direction of y_{m}). To get the signals that are received by a specific unit $m$, the coordinates of the object in $\mathrm{\Sigma}$ are rotated to the local frame of reference ${\mathrm{\Sigma}}_{m}$.
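The change of frame described above can be sketched numerically. In this minimal Python snippet (function and variable names are ours, not from the paper's code), the two rotations are composed and a point given in $\mathrm{\Sigma}$ is mapped into ${\mathrm{\Sigma}}_{m}$ by the inverse (transposed) rotation:

```python
import numpy as np

# Sketch of the change of frame (helper names are ours, not from the paper's
# code). Sigma_m is obtained from Sigma by a rotation about x followed by a
# rotation about the new y' axis; a point expressed in Sigma is mapped into
# Sigma_m by applying the inverse rotation, i.e., the transpose.
def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def to_local(r, alpha, beta):
    """Coordinates of point r (given in Sigma) expressed in the local frame Sigma_m."""
    R = rot_x(alpha) @ rot_y(beta)  # columns are the axes of Sigma_m in Sigma
    return R.T @ r                  # orthogonal matrix: inverse = transpose

# With no rotation the frames coincide:
r_local = to_local(np.array([0.0, 0.0, 2.0]), 0.0, 0.0)  # -> [0, 0, 2]
```

The transpose implements the inverse rotation because rotation matrices are orthogonal.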
Within this coordinate system, we can set up cones representing the extent of a spherical object moving in the space. The visible outline of a spherical object spans a cone with its vertex at the origin. The half-angle of this cone is a function of time and can be denoted as ${\theta}_{\text{s}}(t)$:

$${\theta}_{\text{s}}(t)=\mathrm{arcsin}\left(\frac{R}{D(t)}\right).$$
One can calculate how the cone of the object overlaps with the receptive field cones of each unit.
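The half-angle subtended by a sphere follows from tangent-line geometry: a sphere of radius $R$ whose center is at distance $D$ spans a cone of half-angle $\mathrm{arcsin}(R/D)$. A one-line sketch (the function name is ours):

```python
import numpy as np

# Half-angle of the cone subtended by a sphere of radius R whose center is at
# distance D from the origin (tangent-line geometry; function name is ours).
def half_angle(R, D):
    return np.arcsin(R / D)

theta_deg = np.degrees(half_angle(1.0, 2.0))  # 30 degrees when D = 2R
```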
There are multiple layers of processing in the fly visual system (Takemura et al., 2017), but here we focus on two coarse-grained stages of processing: (1) the estimation of local motion direction from optical intensities by motion detection neurons T4 and T5 and (2) the integration of the flow fields by LPLC2 neurons. In our simulations, the interior of the $m$th unit cone is represented by an $N$-by-$N$ matrix, so that each element in this matrix (except the ones at the four corners) indicates a specific direction in the angular space within the unit cone. If an element also falls within the object cone, then its value is set to 1; otherwise it is 0. Thus, at each time $t$, this matrix is an optical intensity signal and can be represented by $C({x}_{m},{y}_{m},t)$, where $({x}_{m},{y}_{m})$ are the coordinates in ${\mathrm{\Sigma}}_{m}$. In general, $N$ should be large enough to provide good angular resolution. Then, ${K}^{2}$ ($K<N$) motion detectors are evenly distributed within the unit cone, with each occupying an $L$-by-$L$ grid in the $N$-by-$N$ matrix, where $L=N/K$. This $L$-by-$L$ grid represents a ${5}^{\circ}$-by-${5}^{\circ}$ square in the angular space, consistent with the approximate spacing of the inputs of motion detectors T4 and T5. This arrangement effectively uses high-spatial-resolution intensity data to compute local motion signals, which are then discretized at a resolution of ${5}^{\circ}$. Since the receptive field of an LPLC2 neuron is roughly ${60}^{\circ}$, the value of $K$ is chosen to be 12. To get sufficient angular resolution for the local motion detectors, $L$ is set to be 4, so that $N$ is set to 48.
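The grid bookkeeping above can be sketched as a block-tiling of the fine intensity matrix into detector inputs. This toy snippet (our own variable names; it only illustrates the grid layout, not the motion computation) pools the $N$-by-$N$ binary intensity into the $K$-by-$K$ array of detector positions:

```python
import numpy as np

# Toy sketch of the grid layout described above (variable names ours): the
# unit cone's interior is an N-by-N binary intensity matrix, tiled by K*K
# motion-detector inputs, each covering an L-by-L block (~5 deg square).
K, L = 12, 4
N = K * L  # 48

C = np.zeros((N, N))  # binary intensity: 1 where the object cone covers an element
C[:24, :24] = 1.0     # toy object covering the upper-left quadrant

# Mean intensity within each detector's L-by-L block:
pooled = C.reshape(K, L, K, L).mean(axis=(1, 3))  # shape (12, 12)
```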
Each motion detector is assumed to be a Hassenstein-Reichardt correlator (HRC) and calculates local flow fields from $C({x}_{m},{y}_{m},t)$ (Hassenstein and Reichardt, 1956; Figure 3—figure supplement 1). The HRC used here has two inputs, separated by ${5}^{\circ}$ in angular space. Each input applies first a spatial filter on the contrast $C({x}_{m},{y}_{m},t)$ and then temporal filters:

$${s}_{j}({x}_{m},{y}_{m},t)={\int}_{0}^{\mathrm{\infty}}{f}_{j}({t}^{\prime})\,(G\ast C)({x}_{m},{y}_{m},t-{t}^{\prime})\,\mathrm{d}{t}^{\prime},\phantom{\rule{1em}{0ex}}j=1,2,$$

where ${f}_{j}$ ($j\in 1,2$) is a temporal filter and $G$ is a discrete 2d Gaussian kernel with mean ${0}^{\circ}$ and standard deviation of ${2.5}^{\circ}$ to approximate the acceptance angle of the fly photoreceptors (Stavenga, 2003). The temporal filter ${f}_{1}$ was chosen to be an exponential function ${f}_{1}({t}^{\prime})=(1/\tau )\mathrm{exp}(-{t}^{\prime}/\tau )$ with $\tau$ set to 0.03 s (Salazar-Gatzimas et al., 2016), and ${f}_{2}$ a delta function ${f}_{2}({t}^{\prime})=\delta ({t}^{\prime})$. This leads to

$$\mathrm{HRC}(t)={s}_{1}({x}_{m1},{y}_{m1},t)\,{s}_{2}({x}_{m2},{y}_{m2},t)-{s}_{2}({x}_{m1},{y}_{m1},t)\,{s}_{1}({x}_{m2},{y}_{m2},t)$$

as the local flow field at time $t$ between two inputs located at $({x}_{m1},{y}_{m1})$ and $({x}_{m2},{y}_{m2})$.
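A discrete-time sketch of this correlator (our own minimal implementation: the exponential filter is approximated by a first-order recursive low-pass and the delta filter by the identity):

```python
import numpy as np

DT, TAU = 0.01, 0.03  # time step and filter time constant from the text

def lowpass(s):
    """First-order recursive approximation of f1(t') = (1/tau) exp(-t'/tau)."""
    out, y, a = np.zeros_like(s), 0.0, DT / TAU
    for i, x in enumerate(s):
        y += a * (x - y)
        out[i] = y
    return out

def hrc(s1, s2):
    """Opponent HRC: low-passed input 1 times input 2, minus the mirror term.
    Positive for motion in the preferred direction (input 1 leads input 2)."""
    return lowpass(s1) * s2 - s1 * lowpass(s2)

# An edge crossing input 1 at t=20 and input 2 at t=25 (preferred direction):
t = np.arange(100)
s1 = (t >= 20).astype(float)
s2 = (t >= 25).astype(float)
resp = hrc(s1, s2).sum()  # net positive response for this stimulus
```

Reversing the inputs flips the sign of the response, which is the opponency used below to split motion into the four cardinal directions.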
Four types of T4 and T5 neurons have been found that project to layers 1, 2, 3, and 4 of the LP. Each type is sensitive to one of the cardinal directions: down, up, left, right (Maisak et al., 2013). Thus, in our model, there are four non-negative, local flow fields that serve as the only inputs to the model: ${U}_{-}(t)$ (downward, corresponding to LP layer 4), ${U}_{+}(t)$ (upward, LP layer 3), ${V}_{-}(t)$ (leftward, LP layer 1), and ${V}_{+}(t)$ (rightward, LP layer 2), each of which is a $K$-by-$K$ matrix. To calculate these matrices, two sets of motion detectors are needed, one for the vertical directions and one for the horizontal directions. The HRC model in Equation 3 is direction sensitive and is opponent, meaning that for motion in the preferred (null) direction, the output of the HRC model is positive (negative). Thus, assuming that upward (rightward) is the preferred vertical (horizontal) direction, we obtain the nonnegative elements of the four flow fields as

$${[{U}_{+}(t)]}_{{k}_{1}{k}_{2}}=\mathrm{max}\left({\mathrm{HRC}}_{{k}_{1}{k}_{2}}^{\text{v}}(t),0\right),\phantom{\rule{1em}{0ex}}{[{U}_{-}(t)]}_{{k}_{1}{k}_{2}}=\mathrm{max}\left(-{\mathrm{HRC}}_{{k}_{1}{k}_{2}}^{\text{v}}(t),0\right),$$
$${[{V}_{+}(t)]}_{{k}_{1}{k}_{2}}=\mathrm{max}\left({\mathrm{HRC}}_{{k}_{1}{k}_{2}}^{\text{h}}(t),0\right),\phantom{\rule{1em}{0ex}}{[{V}_{-}(t)]}_{{k}_{1}{k}_{2}}=\mathrm{max}\left(-{\mathrm{HRC}}_{{k}_{1}{k}_{2}}^{\text{h}}(t),0\right),$$

where ${k}_{1},{k}_{2}\in \{1,2,\mathrm{\dots},K\}$ index the detectors, and ${\mathrm{HRC}}^{\text{v}}$ and ${\mathrm{HRC}}^{\text{h}}$ denote the outputs of the vertical and horizontal detector sets. In the above expressions, for ${[{U}_{-}(t)]}_{{k}_{1}{k}_{2}}$ and ${[{U}_{+}(t)]}_{{k}_{1}{k}_{2}}$, the vertical motion detector at $({k}_{1},{k}_{2})$ has its two inputs located at $({x}_{m1},{y}_{m})$ and $({x}_{m2},{y}_{m})$, respectively. Similarly, for ${[{V}_{-}(t)]}_{{k}_{1}{k}_{2}}$ and ${[{V}_{+}(t)]}_{{k}_{1}{k}_{2}}$, the horizontal motion detector at $({k}_{1},{k}_{2})$ has its two inputs located at $({x}_{m},{y}_{m1})$ and $({x}_{m},{y}_{m2})$. Using the opponent HRC output as the motion signals for each layer is reasonable because the motion detectors T4 and T5 are highly direction-selective over a large range of inputs (Maisak et al., 2013; Creamer et al., 2018) and synaptic, 3-input models for T4 are approximately equivalent to opponent HRC models (Zavatone-Veth et al., 2020).
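The rectification into four non-negative fields can be sketched as follows (our notation): each signed opponent output is split into its preferred- and null-direction halves, and the split is lossless.

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)

K = 12
rng = np.random.default_rng(0)
h_vert = rng.normal(size=(K, K))   # toy vertical opponent HRC outputs (up preferred)
h_horiz = rng.normal(size=(K, K))  # toy horizontal opponent HRC outputs (right preferred)

U_plus, U_minus = relu(h_vert), relu(-h_vert)    # upward, downward fields
V_plus, V_minus = relu(h_horiz), relu(-h_horiz)  # rightward, leftward fields

# The signed output is recovered as the difference of the two rectified halves:
recon = U_plus - U_minus
```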
We simulated the trajectories $\mathbf{r}(t)$ of the object in the frame of reference $\mathrm{\Sigma}$ at a time resolution of 0.01 s, which is also the time step of the training and testing stimuli. For hit, miss, and retreat cases, the trajectories of the object are always straight lines, and the speeds of the object were randomly sampled from the range $[2R,10R]\ {\text{s}}^{-1}$, with the trajectories confined to be within a sphere of radius $5R$ centered at the fly head. The radius of the object, $R$, is always set to be one except in the rotational stimuli. To generate rotational stimuli, we placed 100 objects with various radii selected uniformly from $[0,1]$ at random distances (in $[5,15]$) and positions around the fly, and rotated them all around a randomly chosen axis. The rotational speed was chosen from a Gaussian distribution with mean ${0}^{\circ}/\text{s}$ and standard deviation ${200}^{\circ}/\text{s}$, a reasonable rotational velocity for walking flies (DeAngelis et al., 2019). In one case, training data included both an object moving and global rotation due to self-rotation (Figure 9—figure supplement 3). These stimuli were simply combinations of the rotational stimuli with the other three cases (hit, miss, and retreat), so that the object moving in the depth dimension also rotated together with the background.
We reproduced a range of stimuli used in a previous study (Klapoetke et al., 2017) and tested them on our trained model (Figure 10B–H). To match the cardinal directions of the LP layers (Figure 1), we rotated the stimuli (except in Figure 10B) by ${45}^{\circ}$ compared with the ones displayed in the figures in Klapoetke et al., 2017. The disc (Figure 10B and C) expands from ${20}^{\circ}$ to ${60}^{\circ}$ with an edge speed of ${10}^{\circ}/\text{s}$. All the bar and edge motions have an edge speed of ${20}^{\circ}/\text{s}$. The widths of the bars are ${60}^{\circ}$ (right panel of Figure 10E and H), ${20}^{\circ}$ (middle panel of Figure 10E), and ${10}^{\circ}$ (all the rest). All the responses of the models (except in Figure 10B) have been normalized by the peak of the response to the expanding disc (Figure 10B).
We created a range of hit stimuli with various $R/v$ ratios: $0.01s$, $0.02s$, $0.04s$, $0.08s$, $0.10s$, $0.12s$, $0.14s$, $0.16s$, $0.18s$, $0.20s$. The radius $R$ of the spherical object is fixed to be 1, and the velocity is changed accordingly to achieve different $R/v$ ratios.
Models
LPLC2 neurons have four dendritic structures in the four LP layers, and they receive direct excitatory inputs from T4/T5 motion detection neurons (Maisak et al., 2013; Klapoetke et al., 2017). It has been proposed that each dendritic structure also receives inhibitory inputs mediated by lobula plate intrinsic interneurons, such as LPi4-3 (Klapoetke et al., 2017). We built two types of model units to approximate this anatomy.
LRF models

A linear receptive field (LRF) model is characterized by a real-valued filter, represented by a 12-by-12 matrix ${W}^{\text{r}}$ (Figure 4A). The elements of the filter combine the effects of the excitatory and inhibitory inputs and can take both positive (stronger excitation) and negative (stronger inhibition) values. We rotate ${W}^{\text{r}}$ counterclockwise by multiples of ${90}^{\circ}$ to obtain the filters that are used to integrate the four motion signals: ${U}_{-}(t)$, ${U}_{+}(t)$, ${V}_{-}(t)$, ${V}_{+}(t)$. Specifically, we define the corresponding four filters as: ${W}_{{U}_{-}}^{\text{r}}=\text{rotate}({W}^{\text{r}},{270}^{\circ})$, ${W}_{{U}_{+}}^{\text{r}}=\text{rotate}({W}^{\text{r}},{90}^{\circ})$, ${W}_{{V}_{-}}^{\text{r}}=\text{rotate}({W}^{\text{r}},{180}^{\circ})$, ${W}_{{V}_{+}}^{\text{r}}=\text{rotate}({W}^{\text{r}},{0}^{\circ})$. In addition, we impose mirror symmetry on the filters; with the above definitions of the rotated filters, the upper half of ${W}^{\text{r}}$ is a mirror image of the lower half of ${W}^{\text{r}}$. Thus, there are in total 72 parameters in the filters. In fact, since only the elements within a ${60}^{\circ}$ cone contribute to the filter for the units, the corners are excluded, resulting in only 56 trainable parameters.
In computer simulations, the filters (or weights) and flow fields are flattened into one-dimensional column vectors. The response of a single LRF unit $m$ is:

$${r}_{m}(t)=\varphi \left({({W}_{{U}_{-}}^{\text{r}})}^{\mathsf{T}}{U}_{-}(t)+{({W}_{{U}_{+}}^{\text{r}})}^{\mathsf{T}}{U}_{+}(t)+{({W}_{{V}_{-}}^{\text{r}})}^{\mathsf{T}}{V}_{-}(t)+{({W}_{{V}_{+}}^{\text{r}})}^{\mathsf{T}}{V}_{+}(t)+{b}^{\text{r}}\right),$$
where $\varphi (\cdot )=\mathrm{max}(\cdot ,0)$ is the rectified linear unit (ReLU), and ${b}^{\text{r}}\in \mathbb{R}$ is the intercept (Figure 4A). The ReLU is used as the activation function in all the figures, except in one panel (Figure 9—figure supplement 2), where an exponential linear unit is used:

$$\varphi (x)=\left\{\begin{array}{ll}x,& x>0,\\ \mathrm{exp}(x)-1,& x\le 0.\end{array}\right.$$
For $M=1$, the LRF model is very close to a generalized linear model, except that it includes the additional activation function $\varphi$, which in general makes the optimization problem nonconvex.
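A minimal numerical sketch of one LRF unit (variable names are ours, and `np.rot90` plays the role of the rotate operation; the filter here is a toy stand-in for a trained one):

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)
K = 12
rng = np.random.default_rng(0)

W_r = rng.normal(size=(K, K))  # toy filter (trained by gradient descent in the paper)
b_r = -0.1                     # toy intercept

# Counterclockwise rotations by multiples of 90 degrees, one per flow field:
W_Vp, W_Up, W_Vm, W_Um = (np.rot90(W_r, k) for k in (0, 1, 2, 3))

def lrf_response(U_m, U_p, V_m, V_p):
    """ReLU of the summed filtered flow fields plus the intercept."""
    drive = (np.sum(W_Um * U_m) + np.sum(W_Up * U_p)
             + np.sum(W_Vm * V_m) + np.sum(W_Vp * V_p) + b_r)
    return relu(drive)

r_zero = lrf_response(*(np.zeros((K, K)),) * 4)  # 0.0: no input, negative intercept
```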
RI models

The rectified inhibition (RI) models have two types of non-negative filters, one excitatory and one inhibitory, represented by ${W}^{\text{e}}$ and ${W}^{\text{i}}$, respectively (Figure 4B). Each filter is a 12-by-12 matrix. The same rotational and mirror symmetries are imposed as in the LRF models, which leads to four excitatory filters: ${W}_{{U}_{-}}^{\text{e}}=\text{rotate}({W}^{\text{e}},{270}^{\circ})$, ${W}_{{U}_{+}}^{\text{e}}=\text{rotate}({W}^{\text{e}},{90}^{\circ})$, ${W}_{{V}_{-}}^{\text{e}}=\text{rotate}({W}^{\text{e}},{180}^{\circ})$, ${W}_{{V}_{+}}^{\text{e}}=\text{rotate}({W}^{\text{e}},{0}^{\circ})$, and four inhibitory filters: ${W}_{{U}_{-}}^{\text{i}}=\text{rotate}({W}^{\text{i}},{270}^{\circ})$, ${W}_{{U}_{+}}^{\text{i}}=\text{rotate}({W}^{\text{i}},{90}^{\circ})$, ${W}_{{V}_{-}}^{\text{i}}=\text{rotate}({W}^{\text{i}},{180}^{\circ})$, ${W}_{{V}_{+}}^{\text{i}}=\text{rotate}({W}^{\text{i}},{0}^{\circ})$. Thus, there are in total 112 parameters in the two sets of filters, excluding the elements in the corners.
The responses of the inhibitory units are:

$${r}_{m,d}^{\text{i}}(t)=\varphi \left({({W}_{d}^{\text{i}})}^{\mathsf{T}}{F}_{d}(t)+{b}^{\text{i}}\right),\phantom{\rule{1em}{0ex}}d\in \{{U}_{-},{U}_{+},{V}_{-},{V}_{+}\},$$

where ${F}_{d}(t)$ denotes the flow field for direction $d$, $\varphi (\cdot )=\mathrm{max}(\cdot ,0)$ is the ReLU, and ${b}^{\text{i}}\in \mathbb{R}$ is the intercept. In the RI model, the rectification of each inhibitory layer is motivated by the LPi neurons, which mediate the inhibition within each layer, and could themselves rectify their inhibitory input into LPLC2. The response of a single RI unit $m$ is

$${r}_{m}(t)=\varphi \left(\sum _{d}\left[{({W}_{d}^{\text{e}})}^{\mathsf{T}}{F}_{d}(t)-{r}_{m,d}^{\text{i}}(t)\right]+{b}^{\text{e}}\right),$$

where ${b}^{\text{e}}\in \mathbb{R}$ is the intercept (Figure 4B). Interestingly, the RI models become equivalent to the LRF models if we remove the ReLU in the inhibitions and define ${W}_{{U}_{-}}^{\text{r}}={W}_{{U}_{-}}^{\text{e}}-{W}_{{U}_{-}}^{\text{i}}$, ${W}_{{U}_{+}}^{\text{r}}={W}_{{U}_{+}}^{\text{e}}-{W}_{{U}_{+}}^{\text{i}}$, ${W}_{{V}_{+}}^{\text{r}}={W}_{{V}_{+}}^{\text{e}}-{W}_{{V}_{+}}^{\text{i}}$, ${W}_{{V}_{-}}^{\text{r}}={W}_{{V}_{-}}^{\text{e}}-{W}_{{V}_{-}}^{\text{i}}$, and ${b}^{\text{r}}={b}^{\text{e}}-4{b}^{\text{i}}$.
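The equivalence noted above can be checked numerically. This sketch (our notation, with toy filters) implements an RI unit and verifies that, with the inner ReLU removed, it matches an LRF unit with ${W}^{\text{r}}={W}^{\text{e}}-{W}^{\text{i}}$ and ${b}^{\text{r}}={b}^{\text{e}}-4{b}^{\text{i}}$:

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)
K = 12
rng = np.random.default_rng(0)

W_e = np.abs(rng.normal(size=(K, K)))  # non-negative excitatory filter (toy)
W_i = np.abs(rng.normal(size=(K, K)))  # non-negative inhibitory filter (toy)
b_e, b_i = 0.2, 0.05

flows = [np.abs(rng.normal(size=(K, K))) for _ in range(4)]  # U-, U+, V-, V+
rots = [3, 1, 2, 0]  # np.rot90 counts matching the four layer filters

def ri_response(rectify_inhibition=True):
    """RI unit: excitation minus (optionally rectified) per-layer inhibition."""
    inner = relu if rectify_inhibition else (lambda x: x)
    drive = b_e
    for F, k in zip(flows, rots):
        drive += np.sum(np.rot90(W_e, k) * F) - inner(np.sum(np.rot90(W_i, k) * F) + b_i)
    return relu(drive)

def lrf_equivalent():
    """LRF unit with W_r = W_e - W_i and b_r = b_e - 4*b_i."""
    drive = b_e - 4 * b_i
    for F, k in zip(flows, rots):
        drive += np.sum(np.rot90(W_e - W_i, k) * F)
    return relu(drive)
```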
For both the LRF and RI models, the inferred probability of hit for a specific trajectory is

$${\widehat{P}}_{\text{hit}}=\frac{1}{T}\sum _{t=1}^{T}\sigma \left(\sum _{m=1}^{M}{r}_{m}(t)+b\right),$$

where $T$ is the total number of time steps in the trajectory and $\sigma (\cdot )$ is the sigmoid function. Since we are adding two intercepts (${b}^{\text{r}}$ and $b$) to the LRF models, and three intercepts (${b}^{\text{i}}$, ${b}^{\text{e}}$, and $b$) to the RI models, there are 58 and 115 parameters to train in the two models, respectively.
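A sketch of the population readout under our reading of Equation 6 (the trajectory probability as the temporal average of per-snapshot sigmoid readouts, matching the snapshot form $\sigma ({\sum}_{m}{r}_{m}(t)+b)$ used during training; function names are ours):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def p_hit(R, b):
    """R: M-by-T array of unit responses r_m(t); b: readout intercept.
    Returns the temporal average of per-snapshot sigmoid readouts."""
    return float(np.mean(sigmoid(R.sum(axis=0) + b)))

p_silent = p_hit(np.zeros((4, 10)), b=0.0)  # 0.5 when all units are silent
```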
Training and testing
We created a synthetic data set containing four types of motion: loom-and-hit, loom-and-miss, retreat, and rotation. The proportions of these types were 0.25, 0.125, 0.125, and 0.5, respectively. In total, there were 5200 trajectories, with 4000 for training and 1200 for testing. Trajectories with motion type loom-and-hit are labeled as hit or ${y}_{n}=1$ (probability of hit is 1), while trajectories of other motion types are labeled as non-hit or ${y}_{n}=0$ (probability of hit is 0), where $n$ is the index of each specific sample. Models with smaller $M$ have fewer trajectories in the receptive field of any unit. For stability of training, we therefore increased the number of trajectories by factors of eight, four, and two for $M=1,2,4$, respectively.
The loss function minimized in training was the cross entropy between the label ${y}_{n}$ and the inferred probability of hit ${\widehat{P}}_{\text{hit}}$, averaged across all samples, together with a regularization term:

$$\mathcal{L}=-\frac{1}{N}\sum _{n=1}^{N}\left[{y}_{n}\mathrm{log}{\widehat{P}}_{\text{hit}}(n)+(1-{y}_{n})\mathrm{log}\left(1-{\widehat{P}}_{\text{hit}}(n)\right)\right]+\beta {\Vert W\Vert}_{2}^{2},$$

where ${\widehat{P}}_{\text{hit}}(n)$ is the inferred probability of hit for sample $n$, $N$ is the number of samples, $\beta$ is the strength of the ${\mathrm{\ell}}_{2}$ regularization, and $W$ represents all the effective filter parameters.
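The objective can be sketched directly (our implementation; the small `eps` guard is ours, added for numerical safety and not part of the original formulation):

```python
import numpy as np

def loss(p_hat, y, W, beta):
    """Mean cross entropy between labels y and inferred probabilities p_hat,
    plus l2 regularization of strength beta on the filter parameters W."""
    eps = 1e-12  # numerical guard (ours, not part of the original formulation)
    ce = -np.mean(y * np.log(p_hat + eps) + (1.0 - y) * np.log(1.0 - p_hat + eps))
    return ce + beta * np.sum(W ** 2)

y = np.array([1.0, 0.0])
p_hat = np.array([0.9, 0.1])
val = loss(p_hat, y, W=np.zeros(56), beta=1e-4)  # -log(0.9), about 0.105
```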
The strength of the regularization $\beta$ was set to ${10}^{-4}$, which was obtained by gradually increasing $\beta$ until the performance of the model on test data started to drop. The regularization sped up convergence of solutions, but the regularization strength did not strongly influence the main results in the paper.
To speed up training, rather than taking a temporal average as shown in Equation 6, a snapshot was sampled randomly from each trajectory, and the probability of hit of this snapshot was used to represent the whole trajectory, that is, ${\widehat{P}}_{\text{hit}}=\sigma \left({\sum}_{m}{r}_{m}(t)+b\right)$, where $t$ is a random sample from $\{1,2,\mathrm{\dots},T\}$. Minibatch gradient descent was used in training, and the learning rate was 0.001.
After training, the models were tested on the entire trajectories with the probability of hit defined in Equation 6. Models trained only on snapshots performed well on the test data. During testing, the performance of the model was evaluated by the area under the curve (AUC) of the receiver operating characteristic (ROC) and precisionrecall (PR) curves (Hanley and McNeil, 1982; Davis and Goadrich, 2006). TensorFlow (Abadi et al., 2016) was used to train all models.
Clustering the solutions
We used the following procedure to cluster the solutions. For the LRF models, the filter of each solution was simply flattened to form a vector. For the RI models, each solution had an excitatory and an inhibitory filter; we flattened these two filters and concatenated them into a single vector. (The elements at the corners were deleted, since they are outside of the receptive field.) Thus, each solution was represented by a vector, from which we calculated the cosine distance for each pair of solutions. The resulting distance matrix was then fed into a hierarchical clustering algorithm (Virtanen et al., 2020). After obtaining the hierarchical clustering, the outward and inward filters were identified by their shape. We counted the positive filter elements corresponding to flow fields with components radiating outward and subtracted the number of positive filter elements corresponding to flow fields with components directed inward. If the resulting value was positive, the filters were labeled as outward; otherwise, the filters were labeled as inward. We could also have summed the elements rather than counting them, but we found counting to be more robust. If the elements in the concatenated vector were all close to zero, the corresponding filters were labeled as zero solutions.
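A sketch of the distance computation and labeling (our own implementation, with toy data: two groups of sign-flipped filter vectors produce two tight clusters, which a hierarchical clustering routine would then separate):

```python
import numpy as np

# Sketch of the clustering preprocessing (our implementation): each solution's
# filters are flattened into one vector, pairwise cosine distances are
# computed (the paper fed these into hierarchical clustering), and solutions
# are labeled outward or inward by comparing counts of positive weights.
def cosine_dist(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)
base = rng.normal(size=56)  # 56 effective filter parameters per LRF solution
sols = np.array([base + 0.01 * rng.normal(size=56) for _ in range(5)]
                + [-base + 0.01 * rng.normal(size=56) for _ in range(5)])

# Pairwise distance matrix: two tight clusters (sign-flipped copies of base).
D = np.array([[cosine_dist(a, b) for b in sols] for a in sols])

def label_solution(n_pos_outward, n_pos_inward):
    """Count-based labeling (zero solutions are handled separately)."""
    return 'outward' if n_pos_outward - n_pos_inward > 0 else 'inward'
```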
Statistics
To calculate the fraction of active units for the model with $M=256$ (Figure 8B), we examined the response curves of each unit to all trajectories of a specific type of stimuli. If a unit response is above baseline (dotted lines in Figure 7B), then the unit is counted as active. For each trajectory/stimulus, we obtained the number of active units. We used this number to calculate the mean and standard deviation of active units across all the trajectories within each type of stimulus (hit, miss, retreat, rotation).
For a model with $M$ units, where $M\in \{1,2,4,8,16,32,64,128,192,256\}$, 200 random initializations were used for training. For the LRF models, within these 200 training runs, the numbers of outward solutions ${N}_{\text{out}}$ were (starting from smaller values of $M$) 45, 59, 60, 65, 52, 63, 52, 57, 58, 49, and the numbers of inward solutions ${N}_{\text{in}}$ were 142, 135, 127, 119, 139, 133, 130, 118, 123, 119. For the RI models, the numbers of outward solutions ${N}_{\text{out}}$ were 44, 46, 48, 50, 48, 50, 53, 55, 58, 64, and the numbers of inward solutions ${N}_{\text{in}}$ were 39, 40, 39, 46, 53, 51, 35, 38, 12, 10. The average score curves and points in Figure 9A, Figure 9—figure supplement 1A and Figure 9—figure supplement 2A were obtained by taking the average among each type of solution, with the shading indicating the standard deviations. The curves and points in Figure 9—figure supplement 1C are the ratio of the number of outward solutions to the number of inward solutions. To obtain error bars (grey shading), we treated the training results as draws from a binomial distribution, with the probability of obtaining an outward solution being ${N}_{\text{out}}/({N}_{\text{out}}+{N}_{\text{in}})$ and the probability of obtaining an inward solution being ${N}_{\text{in}}/({N}_{\text{out}}+{N}_{\text{in}})$. The standard deviation of this binomial distribution is ${\sigma}_{\text{b}}=\sqrt{{N}_{\text{out}}{N}_{\text{in}}/({N}_{\text{out}}+{N}_{\text{in}})}$, from which we calculated the error bars as the standard deviation propagated to the ratio (Morgan et al., 1990).
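The binomial spread ${\sigma}_{\text{b}}$ can be computed directly (the function name is ours):

```python
import numpy as np

# Standard deviation of the outward-solution count when the 200 runs are
# treated as binomial draws: sigma_b = sqrt(N_out * N_in / (N_out + N_in)).
def sigma_binomial(n_out, n_in):
    return np.sqrt(n_out * n_in / (n_out + n_in))

s = sigma_binomial(45, 142)  # first LRF entry (M = 1), about 5.8
```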
Data availability
Code to perform all simulations in this paper and to reproduce all figures is available at https://github.com/ClarkLabCode/LoomDetectionANN (copy archived at swh:1:rev:864fd3d591bc9e3923189320d7197bdd0cd85448).
References
TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation, pp. 265–283.
Global Motion Processing by Populations of Direction-Selective Retinal Ganglion Cells. The Journal of Neuroscience 40:5807–5819. https://doi.org/10.1523/JNEUROSCI.0564-20.2020
Visually mediated motor planning in the escape response of Drosophila. Current Biology 18:1300–1307. https://doi.org/10.1016/j.cub.2008.07.094
The relationship between Precision-Recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 233–240. https://doi.org/10.1145/1143844.1143874
Dynamic Signal Compression for Robust Motion Vision in Flies. Current Biology 30:209–221. https://doi.org/10.1016/j.cub.2019.10.035
What Is the Goal of Sensory Coding? Neural Computation 6:559–601. https://doi.org/10.1162/neco.1994.6.4.559
Computation of object approach by a wide-field, motion-sensitive neuron. The Journal of Neuroscience 19:1122–1141.
Systemtheoretische Analyse der Zeit-, Reihenfolgen- und Vorzeichenauswertung bei der Bewegungsperzeption des Rüsselkäfers Chlorophanus. Zeitschrift für Naturforschung B 11:513–524. https://doi.org/10.1515/znb-1956-9-1004
Looming sensitive cortical regions without V1 input: evidence from a patient with bilateral cortical blindness. Frontiers in Integrative Neuroscience 9:51. https://doi.org/10.3389/fnint.2015.00051
Experience-dependent plasticity of the optomotor response in Drosophila melanogaster. Developmental Neuroscience 34:533–542. https://doi.org/10.1159/000346266
Neuronal responses to looming objects in the superior colliculus of the cat. Brain, Behavior and Evolution 77:193–205. https://doi.org/10.1159/000327045
Approach sensitivity in the retina processed by a multifunctional neural circuit. Nature Neuroscience 12:1308–1316. https://doi.org/10.1038/nn.2389
Computation of object approach by a system of visual motion-sensitive neurons in the crab Neohelice. Journal of Neurophysiology 112:1477–1490. https://doi.org/10.1152/jn.00921.2013
Population coding of shape in area V4. Nature Neuroscience 5:1332–1338. https://doi.org/10.1038/nn972
Comparative approaches to escape. Current Opinion in Neurobiology 41:167–173. https://doi.org/10.1016/j.conb.2016.09.012
Looming detectors in the human visual pathway. Vision Research 18:415–421. https://doi.org/10.1016/0042-6989(78)90051-2
A deep learning framework for neuroscience. Nature Neuroscience 22:1761–1770. https://doi.org/10.1038/s41593-019-0520-2
Neural network based on the input organization of an identified neuron signaling impending collision. Journal of Neurophysiology 75:967–985. https://doi.org/10.1152/jn.1996.75.3.967
Statistics of natural images: scaling in the woods. Physical Review Letters 73:814–817. https://doi.org/10.1103/PhysRevLett.73.814
Gliding behaviour elicited by lateral looming stimuli in flying locusts. Journal of Comparative Physiology A 191:61–73. https://doi.org/10.1007/s00359-004-0572-x
Role of a looming-sensitive neuron in triggering the defense behavior of the praying mantis Tenodera aridifolia. Journal of Neurophysiology 112:671–682. https://doi.org/10.1152/jn.00049.2014
Designing neural networks through neuroevolution. Nature Machine Intelligence 1:24–35. https://doi.org/10.1038/s42256-018-0006-z
Angular and spectral sensitivity of fly photoreceptors. II. Dependence on facet lens F-number and rhabdomere type in Drosophila. Journal of Comparative Physiology A 189:189–202. https://doi.org/10.1007/s00359-003-0390-6
Motor outputs of giant nerve fiber in Drosophila. Journal of Neurophysiology 44:405–421. https://doi.org/10.1152/jn.1980.44.2.405
A Visual Pathway for Looming-Evoked Escape in Larval Zebrafish. Current Biology 25:1823–1834. https://doi.org/10.1016/j.cub.2015.06.002
Stimulus- and goal-oriented frameworks for understanding natural vision. Nature Neuroscience 22:15–24. https://doi.org/10.1038/s41593-018-0284-0
Population coding of stimulus orientation by striate cortical cells. Biological Cybernetics 64:25–31. https://doi.org/10.1007/BF00203627
A spike-timing mechanism for action selection. Nature Neuroscience 17:962–970. https://doi.org/10.1038/nn.3741
Tectal neurons signal impending collision of looming objects in the pigeon. European Journal of Neuroscience 22:2325–2331. https://doi.org/10.1111/j.1460-9568.2005.04397.x
Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience 19:356–365. https://doi.org/10.1038/nn.4244
Software: LoomDetectionANN, version swh:1:rev:864fd3d591bc9e3923189320d7197bdd0cd85448. Software Heritage.
Article and author information
Author details
Funding
National Institutes of Health (R01EY026555)
 Baohua Zhou
 Damon A Clark
National Science Foundation (CCF1839308)
 Baohua Zhou
 John Lafferty
 Damon A Clark
National Science Foundation (DMS1513594)
 John Lafferty
Kavli Foundation
 John Lafferty
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
Research supported in part by NSF grants DMS1513594, CCF1839308, DMS2015397, NIH R01EY026555, a JP Morgan Faculty Research Award, and the Kavli Foundation. We thank G Card and N Klapoetke for sharing data traces from their paper. We thank members of the Clark lab for discussions and comments.
Version history
 Preprint posted: July 8, 2021
 Received: July 8, 2021
 Accepted: January 11, 2022
 Accepted Manuscript published: January 13, 2022 (version 1)
 Version of Record published: February 16, 2022 (version 2)
Copyright
© 2022, Zhou et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.