Introduction

Fruit flies demonstrate impressive agility in flight (Card and Dickinson, 2008; Fry et al., 2005; Liu et al., 2019; Muijres et al., 2014), and vision plays a pivotal role in flight control, with visual inputs often translating directly into precise flight maneuvers. The neural circuit mechanisms underlying many of these visuomotor behaviors have been studied intensively in recent years, both structurally and functionally (Fenk et al., 2022; Joesch, 2009; Joesch et al., 2010; Takemura et al., 2013; Wu et al., 2016) (see Currier et al., 2023 and Ryu et al., 2022 for reviews). While we have gained a qualitative grasp of the neural substrates underlying various visual behaviors, a quantitative understanding remains less developed (Ache et al., 2019; Maisak et al., 2013; Mauss et al., 2015; Städele et al., 2020).

In the pioneering era of insect vision research, from the 1950s to the 1970s, the field leaned heavily on computational modeling. For instance, the Hassenstein-Reichardt elementary motion detector was formulated to explain visual locomotion behaviors observed in beetles (Hassenstein and Reichardt, 1956). Another example is the model of the efference copy (EC), put forward to describe the visual behavior of hoverflies (Collett, 1980; von Holst and Mittelstaedt, 1950). While some recent studies have offered quantitative models of Drosophila visual circuits (Borst and Weber, 2011; Clark et al., 2011; Gruntman et al., 2018; Tanaka and Clark, 2020), most have focused on circuit-level function without considering its behavioral context. To contextualize the function of these circuits within behavior, a phenomenological model integrating these visual functions within a moving animal would be invaluable.

One important consideration for such an integrative model is the interaction among visuomotor circuits tuned to different visual features. Because complex visual environments may activate multiple motor programs simultaneously, circuit mechanisms that integrate distinct sensorimotor pathways adaptively are necessary. A foundational theory regarding the interaction of multiple sensorimotor pathways is the theory of the EC, which underscores the necessity of a feedback signal that filters out undesired sensory input (von Holst and Mittelstaedt, 1950). Previous behavioral studies argued that such a mechanism may indeed exist in some dipteran species (Collett, 1980; Heisenberg and Wolf, 1988). Recent studies in Drosophila have uncovered neural correlates of the EC during both spontaneous and visually evoked rapid flight turns (Fenk et al., 2021; Kim et al., 2017, 2015).

In this study, we present an integrative model of Drosophila visuomotor behaviors that is derived from and validated by behavioral experiments. The experiments provided clear behavioral evidence of an EC in flying Drosophila, in line with one of the integration strategies used in the model. The model predicted the steering behavior of flying Drosophila in response to various visual patterns presented individually as well as in superposition. Our model provides a framework in which detailed neural circuit models of Drosophila visuomotor processing can be implemented and tested in a real-world context.

Results

Flight responses of Drosophila to singly presented visual patterns

To develop a quantitative model of visuomotor control for individually presented visual patterns, we measured the wing responses of flying Drosophila. We placed a tethered, flying fly at the center of a cylindrical LED display and presented a rotating visual pattern on the display (Fig. 1A). To measure the angular torque generated by the fly in response to these patterns, we subtracted the right wingbeat amplitude (RWBA) from the left wingbeat amplitude (LWBA), resulting in a time signal termed L-R WBA (Fig. 1B). Previous studies showed that L-R WBA is correlated with the angular torque that a fly exerts on its body (3, 34).

Construction of flight control models for singly presented visual patterns.

(A) Schematic of the experimental setup (left), a frame captured by the infrared camera (middle), and a simplified schematic of the experimental setup (right). The annulus surrounding the fly schematic represents the visual display viewed from above. (B) Wing responses of a sample fly to three rotating visual patterns: bar, spot, and grating. L-R WBA in the bottom row represents the angular torque of the fly, calculated by subtracting the right wingbeat amplitude (RWBA) from the left wingbeat amplitude (LWBA). (C) L-R WBA traces of a sample fly in response to the three visual patterns. The thick black lines indicate the average of all trials (top) or the average of all flies (bottom). Thin gray lines indicate individual trials (top) or fly averages (bottom). (D) Schematic of the position-velocity-based flight control model. (E) Average wing responses of a population of flies to the three rotating visual patterns, moving in either the clockwise (red) or counterclockwise (blue) direction. Top traces show the position of each pattern. The red and blue shadings at the bottom indicate the 95% confidence interval. (F) Position and velocity functions estimated from the wing responses in E. Light purple shading indicates the 95% confidence interval.

We presented three visual patterns rotating through a full circle at 36°/s: a dark vertical bar, a dark spot, and a vertical grating (Fig. 1B). These simple patterns, when tested at various velocities and sizes, were previously shown to elicit robust visuomotor reflexes in flying Drosophila (Götz, 1968; H. Kim et al., 2023; Maimon et al., 2008; Tammero et al., 2004). In response to a clockwise-rotating bar starting from the back, flies generated wing responses corresponding to a fictive rightward turn throughout the duration of the stimulus (Fig. 1B-C). The response increased gradually, peaked after the bar crossed the midline, and then declined. In response to the moving spot, flies exhibited L-R WBA indicating turning away from the position of the spot (Fig. 1C). In response to the moving grating, flies showed strong wing responses that would exert angular torque in the direction of the grating movement, also consistent with previous studies (Cellini et al., 2022; Cellini and Mongeau, 2020; Tammero et al., 2004).

Flight control models of Drosophila for singly presented visual patterns

We built a simple model of these visual behaviors by adopting a classical approach originally applied to the housefly Musca domestica (Poggio and Reichardt, 1973; Reichardt and Poggio, 1976). In this approach, the torque response of an animal, $W(\psi, \dot{\psi})$, is modeled as the sum of two components: one defined purely by a position response, $P(\psi)$, and the other by a position-dependent velocity response, $V(\psi)\,\dot{\psi}$ (Fig. 1D):

$$W(\psi, \dot{\psi}) = P(\psi) + V(\psi)\,\dot{\psi}.$$

In this model, the position response can be obtained experimentally by adding the torque responses to a visual pattern rotating in the clockwise and counterclockwise directions, evaluated at the same object position:

$$P(\psi) = \frac{1}{2}\left[W_{CW}(\psi) + W_{CCW}(\psi)\right],$$

where $W_{CW}(\psi)$ denotes the wing response to the clockwise-rotating pattern, and $W_{CCW}(\psi)$ to the counterclockwise-rotating pattern. The velocity response, $V(\psi)\,\dot{\psi}$, can be obtained similarly, but by subtraction: $V(\psi) = \left[W_{CW}(\psi) - W_{CCW}(\psi)\right]/(2\dot{\psi})$. To apply this approach, we tested flies with both clockwise- and counterclockwise-rotating visual patterns (Fig. 1E). The wingbeat signals for the two directions were largely mirror-symmetric, with subtle asymmetries that are likely due to experimental variability, such as misalignment of the fly's body orientation with respect to the display arena. To minimize the influence of such experimental anomalies, we combined the two L-R WBA signals after inverting one of them in both sign and time, and used the result for further analyses.
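This decomposition can be computed directly from the averaged wing-response traces. Below is a minimal MATLAB sketch of the procedure, in which synthetic traces with illustrative shapes and amplitudes of our choosing stand in for the trial-averaged L-R WBA recordings:

    % Recover P(psi) and V(psi) from responses to CW and CCW rotations.
    % W_cw and W_ccw are synthetic stand-ins for trial-averaged L-R WBA.
    omega  = deg2rad(36);                  % pattern speed used in Fig. 1
    psi    = linspace(-pi, pi, 96);        % pattern positions (96-pixel display)
    P_true = -0.5*sin(psi);                % assumed position component (bar-like)
    V_true =  0.2*ones(size(psi));         % assumed velocity component
    W_cw   = P_true + V_true*omega;        % response to clockwise rotation
    W_ccw  = P_true - V_true*omega;        % response to counterclockwise rotation
    P_est  = (W_cw + W_ccw)/2;             % position function estimate
    V_est  = (W_cw - W_ccw)/(2*omega);     % velocity function estimate
    plot(psi, P_est, psi, V_est); legend('P(\psi)', 'V(\psi)');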

When we computed the position functions, P(ψ), for the spot and bar patterns with the above method, we found that they were symmetric with respect to the origin (Fig. 1F). That is, for the rotating bar, the position response was positive when the bar was in the left hemifield and negative when it was in the right hemifield. This result is consistent with the fly's fixation behavior toward a vertical bar, as this position response would have made flies turn toward the bar from either side. For the spot pattern, the overall shape of the position function was opposite to that for the bar, which would have made flies turn away from the spot.

The velocity function, V(ψ), was nearly zero for the spot, except around the frontal visual field (Fig. 1F), consistent with the weak direction dependence of the wing response (Fig. 1E). By contrast, the velocity function for the bar was consistently positive (Fig. 1F), poised to contribute to flight turns in the direction of the bar movement (Reichardt and Poggio, 1976). The amplitude of the velocity response, however, varied with object position, unlike the nearly constant velocity function estimated in blowflies in response to a moving bar (Reichardt and Poggio, 1976).

In response to the moving grating, the velocity response was consistently positive, but its amplitude varied with position (Fig. 1E). The grating pattern consists of 12 periods of bright and dark vertical stripes, with each stripe spanning 15°, which is larger than the fruit fly's inter-ommatidial angle (∼5°). Because we did not observe any corresponding periodicity in the wing response to the grating (Fig. 1E), however, we reasoned that the variation in the response amplitude was not caused by the spatial structure of the pattern. Instead, the apparent position dependence of the position and velocity responses was likely due to non-visual, time-dependent factors, such as adaptation or fatigue in the motor system caused by the exceptionally large L-R WBA. Furthermore, the relatively slow change in L-R WBA at the beginning and end of the grating stimulus suggests a dynamical-system transformation between the visual stimulus and the wing response. We therefore expanded our models with a set of dynamical system equations involving a fatigue variable (Fig. S1), which successfully reproduced the L-R WBAs for the three visual patterns as well as the position and velocity responses associated with each pattern. We also performed this experiment and analysis in a different wild-type strain (Canton S) and found that the profiles of the position and velocity functions remained largely unchanged (Fig. S2).
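The exact fatigue equations are given in Fig. S1. As a rough illustration of the idea only, the MATLAB sketch below implements one minimal form that we assume here, a slow gain variable depressed by the recent output magnitude; it is not the study's exact formulation, and all parameter values are ours:

    % Illustrative fatigue dynamics (our assumed minimal form; see Fig. S1).
    tau_f = 2.0; k_f = 0.5;                % assumed time constant and strength
    dt = 1/60; t = 0:dt:10;                % 10-s stimulus sampled at 60 Hz
    drive = ones(size(t));                 % constant motion drive from the grating
    f = 0; W = zeros(size(t));             % fatigue state and wing output
    for i = 1:numel(t)
        W(i) = drive(i) / (1 + k_f*f);     % output attenuated by fatigue
        f = f + dt*(-f + abs(W(i)))/tau_f; % fatigue tracks output magnitude
    end
    plot(t, W);                            % gradual decline, as in Fig. 1E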

Simulating the orientation behavior of the virtual fly model in response to individual visual patterns

We further tested our model, termed the “virtual fly model” hereafter, by adding a biophysical block that transforms the torque response of the fly into an actual heading change according to kinematic parameters estimated previously (Dickinson, 2005; Ristroph et al., 2010) (Fig. 2A; see Equation 4 in Methods and Movie S1). The virtual fly model, featuring position and velocity blocks conditioned on the type of visual pattern, can now simulate the visual orientation behavior of flies in a free-flight-like condition. This simulation is reminiscent of the magnetically tethered flight assay, in which a flying fly remains fixed at a position but is free to rotate around its yaw axis (Bender and Dickinson, 2006; Cellini et al., 2022). Additionally, we simplified the position and velocity functions for the bar and spot patterns with mathematically simple representations (Fig. 2B; see Equations 5-7 in Methods). For the grating pattern, we approximated the position response as zero and the velocity response as a constant value (Fig. 2B); no fatigue effect was considered, for the sake of simplicity (Fig. S1).

The flight control models with a biomechanics block predicted the dynamics of the orientation behavior to individual visual patterns.

(A) Schematics of the three visuomotor response models with a biomechanics block. (B) Simplified versions of the position and velocity responses for each pattern. (C) Simulation results for the three patterns (bar, spot, and grating) moving with sigmoidal dynamics. The spot response is plotted with a 180° offset to facilitate comparison. Bar plots on the right show the latency of the body angle with respect to the stimulus onset, measured at the 50% point of the pattern movement. (D) Simulation results for the three patterns moving with sinusoidal dynamics. In the bar plots on the right, the amplitude was measured as the peak-to-peak amplitude, and the phase shift was calculated by measuring the peak time of the cross-correlation between the pattern and the fly heading. (E) Same as in C, but for visual patterns remaining static at the 0° position. The simulation was performed 10 times with a Gaussian noise component (gray lines). The mean response is plotted as thick colored lines. The probability density function of the body angle is shown on the right for each pattern.

Magnetically tethered flight experiments confirmed orientation changes predicted by the virtual fly model.

(A) Schematic of the magnetically tethered flight assay with an LED display (left). The image acquired from below was analyzed to estimate the body angle (right). The stimulus protocol (bottom) consisted of four phases: alignment, ready, go, and freeze. (B) Body orientation responses of a single fly for the bar, spot and grating patterns moving horizontally. (C) Same as in B for a population of flies. The population averages were replotted at the bottom to facilitate the comparison of their dynamics. (D) Amplitude and latency of the body orientation responses. The box represents the interquartile range (IQR), with the median indicated by the horizontal black line. The whiskers extend to the minimum and maximum values within 1.5 times the IQR. Outliers are denoted by “+” marks beyond the whiskers.

We tested the model using visual patterns that moved horizontally with three distinct dynamics: a rapid sigmoidal jump, sinusoidal oscillations, and Gaussian noise. The virtual fly model’s response to the sigmoidal stimuli varied depending on the pattern (Fig. 2C). For the bar, the model turned quickly toward the bar and refixated on it (see Equation 8 in Methods). The spot elicited a heading change in the opposite direction, albeit at a slower pace, leaving the pattern positioned behind the fly at the end (see Equation 9 in Methods). For the grating, the response was the most rapid, but the heading change ceased early, once the visual pattern stopped (see Equation 10 in Methods). This is due to the absence of a positional cue in the grating and is consistent with experimental results in a previous study (Collett, 1980). We measured the visual response latencies at the 50% point of the pattern movement: 150 ms for the bar and grating, and 360 ms for the spot. These latencies are considerably longer than the wingbeat response latency reported previously (H. Kim et al., 2023), likely in part because of the additional delay introduced by the biophysical block in our model. We also expanded our model to take into account the known delay in the visual system (Fig. S3A-B), but for the sake of simplicity we used the simplified models in what follows.

For sinusoidally oscillating patterns, we measured the peak-to-peak amplitude and the phase shift of the heading responses (Fig. 2D and S3C). The amplitude of the heading change was comparable to the amplitude of the bar and grating stimuli, but much smaller for the spot pattern (Fig. 2D). The phase shift was longest for the spot, shortest for the grating, and intermediate for the bar, consistent with the delays observed in response to the rapidly shifting patterns (H. Kim et al., 2023) (Fig. 2C). We also simulated our models with static visual patterns but with Gaussian noise added to the body torque (Fig. 2E; see Equation 11 in Methods). We observed fixation on the bar, anti-fixation of the spot, and stabilization with respect to the grating, whereas the virtual fly model drifted randomly when no visual cue was present (Fig. 2E, bottom). Together, the behavior of our models appeared to recapitulate the results of previous behavioral experiments (Collett, 1980; Götz, 1987; H. Kim et al., 2023; Maimon et al., 2008).
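The amplitude and phase-shift measurements can be illustrated with a short MATLAB sketch. The signals below are toy sinusoids that we generate for illustration; the actual analysis uses the simulated heading traces:

    % Peak-to-peak amplitude and phase shift via cross-correlation.
    fs = 60; t = (0:1/fs:10)';             % 60-Hz sampling, 10-s trial
    f_stim  = 0.5;                         % assumed oscillation frequency (Hz)
    pattern = 45*sin(2*pi*f_stim*t);       % pattern position (deg)
    heading = 40*sin(2*pi*f_stim*t - 0.4); % toy heading with a phase lag
    amp_pp  = max(heading) - min(heading); % peak-to-peak amplitude
    [c, lags] = xcorr(heading - mean(heading), pattern - mean(pattern));
    [~, k]  = max(c);                      % peak of the cross-correlation
    lag_s   = lags(k)/fs;                  % heading lag in seconds
    phase_deg = 360*f_stim*lag_s;          % lag converted to phase shift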

Orientation behaviors of magnetically tethered flying Drosophila in response to rapidly moving visual patterns

We validated the predictions of our model in flies flying in an environment similar to the one used in the simulation (Fig. 3A). A fly was tethered to a short steel pin held in place by a vertically oriented magnetic field, which allowed it to rotate around its yaw axis with little friction (Fig. 3A). We captured images of the fly from below during visual stimulation and calculated the body angle post hoc (Fig. 3A, Methods). To present the visual patterns at a defined position relative to the fly’s heading, we coaxed the fly into aligning with a reference angle by oscillating a stripe pattern widely across the visual display (“alignment” phase, Fig. 3A, bottom). We then presented the initial frame of the visual pattern for 0.5 s (“ready” phase), rotated the pattern rapidly for 0.2 s with sigmoidal dynamics (“go” phase), and kept it static for 2 s afterward (“freeze” phase) (Fig. 3A).

For all visual patterns, flies turned in the direction predicted by the simulation (Fig. 2): toward the moving bar, away from the moving spot, and in the direction of the grating motion (Fig. 3B). The mean amplitude was comparable across the three patterns, at around 20°, almost half the 45° pattern displacement (Fig. 3D). While the small response angle was expected for the grating pattern (Fig. 2C), the bar and spot response amplitudes were considerably smaller than the model predictions. This contrasts with the match between the model prediction and behavioral results for a slowly moving object (Fig. S1C). Recent studies showed that the visual responses of wings and head are strongly suppressed by proprioceptive feedback (Cellini and Mongeau, 2022; Rimniceanu et al., 2023). Our model may therefore overestimate the heading change amplitude because it was derived from a behavioral assay lacking rotational proprioceptive feedback and tested in one where such feedback is present. The heading response latency was longest for the spot and similar for the bar and grating (Fig. 3D), in line with the virtual fly model (Fig. 2C). The remaining discrepancy in latency between the model and the fly might be attributable to factors such as the approximations in the model and the different neuromechanical parameters of the rigidly and magnetically tethered flight conditions.

With these experimental results from the magnetically tethered assay, we confirmed that our virtual fly model captured the essential dynamics of vision-based flight control in Drosophila when patterns are presented individually. In natural conditions, however, multiple patterns may appear simultaneously in a visual scene. This led us to develop the model further, adding the capacity to control orientation in more complex visual environments.

Integrative visuomotor control models in a complex visual environment

In natural environments, various visual patterns might simultaneously activate distinct visuomotor circuits, which may in turn trigger either synergistic or opposing actions. For instance, when a fly turns to track a foreground object moving against a static background (Fig. 4A), it experiences rotational optic flow from the background in the direction opposite to the object-evoked flight turn. This optic flow would activate the visual stability reflex, which opposes the object-evoked turn. This raises the question of how outputs from different visuomotor circuits are integrated to control a shared actuator, such as the wing motor system. We considered three integrative models with different strategies. The first, termed the “addition-only model”, sums the outputs of the bar and grating response circuits to control flight orientation (Fig. 4B; see Equation 12 in Methods).

Three integrative models of the visuomotor control and their predictions in a complex visual environment.

(A) Schematic of the visual environments used in the simulation. A moving bar is presented as a foreground object over the static grating background. (B) A diagram of the addition-only model. Bar and grating response circuits are joined at their output by addition. (C) A diagram of the graded EC model. An EC block translates the bar-evoked motor command to the negative image of the predicted grating input to counteract visual feedback. (D) A diagram of the all-or-none EC model. An EC switches off the grating response circuit during the bar-evoked turn. (E,F,G) Simulation results of the three models. The bar position and the heading of the virtual fly model (top). The associated torques as well as the EC signal (bottom).

In the second and third models, an EC is used to set priorities between different visuomotor circuits (Fig. 4C,D). In particular, the EC is derived from the bar-induced motor command and is sent to the grating response system to nullify the visual feedback caused by the bar-evoked turn (22, 41). Although recent studies demonstrated the existence of cell-type-specific EC-like inputs in Drosophila visual neurons (Fenk et al., 2021; Kim et al., 2017, 2015), it has not been addressed whether the amplitude of these inputs is fixed or modifiable depending on the amplitude of the turn. We therefore proposed two EC-based models. In the “graded” EC model (Fig. 4C; see Equation 13 in Methods), the amplitude of the EC, ε, is adjusted according to the predicted optic flow feedback (eq. 12). An EC of this kind has been characterized in detail in the cerebellum-like structure of the electric fish (Bell, 1981). In the all-or-none EC model (Fig. 4D; see Equation 14 in Methods), by contrast, the grating response is blocked entirely by the EC, η, during the bar-evoked turn, regardless of the amplitude of the flight turn. An EC of this type has been reported in the auditory neurons of crickets during stridulation (Poulet and Hedwig, 2006).

When we simulated the addition-only model, the virtual fly model reduced the error angle rather slowly, as expected (Fig. 4E, magenta trace). The slowdown is due to the torque generated by the grating, $F_{grtng}$, which has the opposite sign to the bar-evoked torque, $F_{bar}$ (Fig. 4E, bottom). In the graded EC model, the bar response was not impeded (Fig. 4F, orange trace, top), and the heading dynamics matched those of the simulation with an empty background (Fig. 2C). The torque due to the grating, $F_{grtng}$, remained at zero because the graded EC (Fig. 4F, dotted black trace, bottom) counteracted the input to the grating response (Fig. 4F, solid black trace, bottom). In the all-or-none EC model, the response to the bar was not dampened either (Fig. 4G, brown trace, top), because the grating response was fully suppressed by the EC signal, η, during the turn (Fig. 4G, dotted brown trace, bottom).

These results suggest that an EC can be used to combine multiple visual reflexes, allowing a high-priority sensorimotor circuit (e.g., object tracking/avoidance) to silence low-priority pathways (e.g., the visual stability response). When we tested our models in other visual settings, the EC-based models exhibited shorter latencies than the addition-only model (Fig. S4). Our EC-based models thus solve an important problem in visuomotor integration: preventing reafferent visual input from antagonizing voluntary turns.

In changing visual environments, the graded EC model, but not the all-or-none EC model, requires auto-tuning

Both EC-based models disengaged the visual stability reflex during object-evoked flight turns. The two models differ, however, in that the graded EC model responds to subtle mismatches between the predicted and actual background optic flow (Fig. S4C), whereas the all-or-none EC model should always exhibit the same dynamics regardless of the background. For example, when the visual environment changes suddenly, such as when moving from a forest to a desert (Fig. 5A), the dynamics of object-evoked flight turns would be unaffected in the all-or-none EC model, whereas a significant change in the heading response would occur in the graded EC model.

Behavior of EC-based models in changing visual backgrounds.

(A) Two different visual environments, one with few objects in the background, representing weak visual feedback, and the other with many objects, such as trees, representing strong visual feedback. (B) A diagram of the auto-tuned, graded EC model, in which a multi-layered perceptron (MLP) adapts the EC amplitude to the changing visual feedback. The loss function is the mean squared error (MSE) between the angular position of the fly and that obtained when the efference copy amplitude matches the visual feedback. (C) Simulation results for a moving bar with a static background over iterations, with the visual feedback increasing at the 400th iteration. The top panel depicts the change in the feedback amplitude and the EC amplitude, the second panel the MSE, and the third panel the 50% latency of the heading response. The bottom panel shows sample traces before and after the feedback change. (D) Same as in C, but with the feedback decreasing at the 400th iteration. (E) Same as in C, but for the all-or-none EC model.

This also implies that the graded EC model must update its EC according to changing visual environments, adaptively adjusting its prediction of the reafferent input. Such modifiable ECs have been reported in the cerebellum-like structure of the electric fish (Kennedy et al., 2014). We thus implemented an algorithm in which the amplitude of the EC is auto-tuned by a multi-layer perceptron (MLP). The MLP generates the EC amplitude, α, as a prediction of the current feedback amplitude, $V_{grating}$. In this model, the tracking error during each iteration of the visuomotor response serves as a loss function, and the weight parameters of the MLP are updated in the direction that minimizes this error (Fig. 5B; see Equation 15 in Methods).

Using this model, we simulated a moving bar over a static grating for 2000 iterations, with the background changing from sparse to dense optic flow at the 400th iteration (Fig. 5C). The response of the virtual fly model to the bar pattern slowed down markedly when the background changed, and as a result both the error and the latency increased significantly (Fig. 5C). Over subsequent iterations, the EC amplitude increased toward the new feedback amplitude, and the latency decreased toward its initial value (Fig. 5C, bottom two rows). When the background changed from dense to sparse optic flow, we observed the opposite behavior (Fig. 5D). For the all-or-none EC model, by contrast, the background change did not alter the flight turn dynamics, as expected (Fig. 5E).

Behavioral evidence of a visual efference copy in flying Drosophila

To determine whether flying Drosophila use an EC-based mechanism, and to dissociate the two EC-based models, we replicated our simulation in behavioral experiments using the magnetically tethered flight assay. In particular, we measured the amplitude of the orientation response to a moving vertical bar against two random dot backgrounds that differed in density (Fig. 6A and Movie S2). We first tested how the visual stability responses differ between the two random dot patterns. As expected, the body orientation change was considerably larger with the dense background than with the sparse one (Fig. 6B). Note also that the response amplitude for the dense-dot background was comparable to that for the grating motion observed previously (Fig. 3D). This indicates that, unless the stability reflex is suppressed during the fly’s object-evoked turns, the turn should slow down more strongly against the dense background than against the sparse one.

Dynamics of object-evoked flight turns were not affected by the background pattern.

(A) Temporal profile of the visual stimulus used to test the bar response when the background changes from the uniform to the dense-dot background. In each frame, the horizontal midline of the display is sampled and plotted over time (top). Sample pattern images at the moment of the bar movement onset (bottom middle). (B) Body angle changes in response to the rotation of the random dot backgrounds at two different densities (left). The amplitude is significantly larger for the dense background than for the sparse one (right, Wilcoxon rank-sum test). (C) Average body orientation traces in response to horizontally moving bars when the initial uniform, bright background is kept the same or changed to one of the two random-dot backgrounds. The thick colored lines in the top plots represent the population average, whereas the gray lines represent averages for individual flies. Box-and-whisker plots depict the amplitude (middle) and latency (bottom) of the body angle change. Error bars (bottom) indicate the 95% confidence interval. (D) Same as in (C), but for different combinations of background patterns. (E) Schematic of the stimulus protocol for testing adaptation of the bar response with a maintained background. The background was maintained as either the sparse or the dense dot pattern for 7 minutes (“priming phase”) and then changed to the other density. After this change, the same background pattern was maintained for 14 minutes (“test phases” #1 and #2). (F) Population-averaged body angle traces in response to a moving bar in the three different phases.

We then assessed the fly’s bar-evoked flight turns with a changing visual background (Fig. 6A,C), as in the simulation (Fig. 5C-E). That is, we ran the “alignment” and “ready” phases with a uniform, bright background and then either kept it the same or changed it to one of the two random dot backgrounds at the moment of the bar movement onset (Fig. 6A). To keep the saliency of the bar pattern the same across the different backgrounds, the frontal visual field (±55° from the center) was kept identical to the initial, uniform background. Surprisingly, we found that the bar response was virtually the same regardless of the background dot density (Fig. 6C).

To further corroborate this result, we carried out the same experiment, but with the dots filling the entire panoramic visual field and the background changed 0.25 s prior to the bar movement onset (see Fig. S5A for a depiction of the protocol). We tested flies with four combinations of background patterns: sparse-to-sparse, sparse-to-dense, dense-to-dense, and dense-to-sparse. We found that the amplitude of the flight turn, in terms of both body orientation and head yaw angle, was not significantly influenced by the background pattern (Fig. 6D, Fig. S5B). Together, these results argue against the addition-only model, which predicts a strong dependence of the heading response on visual feedback from the background pattern. They also argue against the graded EC model, because in that model a background change at, or immediately prior to, the bar movement onset would cause a mismatch between the predicted and actual optic flow feedback and thereby change the response, as shown in the simulation (Fig. 5C,D); we observed no such changes in behaving animals. Furthermore, no significant response change was observed even when we repeated the experiment with a looming disc in place of the moving bar (Fig. S5C-E). That is, the loom-evoked flight turn was likewise unaffected by the change in background optic flow density.

To further test the possibility of an auto-tuned, graded efference copy, we measured the fly’s bar response against a maintained background over an extended period of flight (∼14 minutes). Prior to this extended exposure, we primed the fly with a random-dot background of the other density for roughly 7 minutes (Fig. 6E). The resulting data revealed no significant changes in flight turn amplitude across the three phases: a priming phase with one background (∼7 min), an early test phase with the other background (∼7 min), and a late test phase with the maintained background (∼7 min) (Fig. 6F). Thus, flies’ object-evoked flight turns were not significantly affected by the density of random dots in the background, at least on the time scales we tested. Altogether, our results suggest that flying Drosophila use an all-or-none EC strategy in complex visual environments (Fig. 4D), in which the visual stability response is entirely suppressed during object-evoked turns (Fig. 5E).

Discussion

Drosophila visuomotor circuits have been dissected successfully in recent years, but in a fragmented manner, each within a limited sensory and behavioral context. To build a framework for quantitatively testing these circuits and their interactions in complex visual environments, we developed an integrative model of visuomotor flight control in which multiple visual features collectively determine flight heading through an EC mechanism. The model accurately predicted flight orientation changes in response to singly presented patterns as well as to superpositions of those patterns. Our results provide definitive behavioral evidence for a visual EC in Drosophila and offer a framework in which detailed models of the constituent neural circuits can be implemented and tested.

Models of Drosophila visuomotor processing

Visual systems extract features from the environment by computing spatiotemporal relationships among neural activities within an array of photoreceptors. In Drosophila, these computations occur initially on a local scale in the peripheral layers of the optic lobe (Gruntman et al., 2018; Ketkar et al., 2020). When these features enter the central brain, they are integrated across a large visual field by converging onto a small brain region or even a single cell (Mauss et al., 2015; Wu et al., 2016). These features are subsequently relayed to other central brain regions or descending neural circuits, which ultimately control action (Aymanns et al., 2022; Namiki et al., 2018). With recent advances in understanding the structural and functional principles of its visuomotor circuits, Drosophila presents a unique opportunity to reverse-engineer the entire visuomotor process. The mechanisms of key visual computations, such as direction-dependent motion computation, collision avoidance, the stability reflex, object tracking, and navigation, are under active investigation, and in silico models for some of these sensory computations are already available (Borst and Weber, 2011; Gruntman et al., 2018). On the motor end, the neuromechanical interaction between the animal’s actuators and the external world was modeled in a recent study of walking (Lobato-Rios et al., 2022). What remains to be explored is the mapping of features in the visual space onto coordinated activation patterns of motor neurons by higher-order brain circuits, in a way that optimizes the organism’s objective functions (i.e., survival and reproduction). Our study provides an integrative model of this visuomotor mapping in Drosophila. When expanded with neuromechanical models as well as models of visual circuits, the in silico model can be used not only for a quantitative understanding of visuomotor processing but also for building insect-inspired robots (Franceschini et al., 1992; Webb, 2002; Yue and Rind, 2005).

In upcoming studies, our model will be expanded to incorporate additional details of visual behaviors as well as the underlying neural circuits. For instance, recent studies have shown that flies fixate on a rotating bar by interspersing rapid flight turns (or saccades) with stable heading periods, an algorithm referred to as the “integrate-and-saccade” strategy (G. Kim et al., 2023; Mongeau and Frye, 2017). The distinction between the integrate-and-saccade strategy and smooth tracking is crucial because it strongly suggests the existence of distinct neural circuits for the two visually guided orientation behaviors. However, when we averaged wingbeat signals across trials to determine the position and velocity functions (Fig. 1), we effectively approximated the integrate-and-saccade strategy as a smooth pursuit movement. Future experimental and modeling research will refine our current model with behavioral and biophysical details.

Visual efference copy in Drosophila

Under natural conditions, various visual features in the environment are poised to concurrently activate their cognate motor programs, which may interfere with one another through reafferent inputs. It is thus crucial for the central brain to arbitrate among motor signals originating from different sensorimotor circuits. The EC, as a copy of a motor command, participates in this process by modifying or inhibiting specific sensory signaling based on the motor context (Kim et al., 2017, 2015; von Holst and Mittelstaedt, 1950). Recent studies reported such EC-like signals in Drosophila visual neurons (Fenk et al., 2021; Kim et al., 2017, 2015). One type of EC-like signal was identified in a set of wide-field visual motion-sensing neurons shown to control neck movement for gaze stability (Kim et al., 2017). The EC-like signals in these cells were bidirectional, depending on the direction of the flight turn, and their amplitudes were quantitatively tuned to that of the expected visual input in each cell type. Another type of EC-like signal observed in the Drosophila visual system possesses contrasting features: it suppresses visual signaling entirely, irrespective of the direction of the self-generated turn (Kim et al., 2015). Our behavioral experiments confirmed for the first time that the EC in Drosophila vision acts to fully suppress optic flow-induced stability reflexes during visually evoked flight turns. Why would Drosophila use the all-or-none EC instead of the graded one? Recent studies in electric fish suggested that a large array of neurons in a multi-layer network is crucial for generating a modifiable efference copy signal matched to the current environment (Muller et al., 2019). Given their small brains, flies might opt for a more economical design that suppresses unwanted visual inputs regardless of the visual environment. Circuits mediating this type of EC have been identified, for example, in the cricket auditory system during stridulation (Poulet and Hedwig, 2006). Our study strongly suggests the existence of a similar circuit mechanism in the Drosophila visual system.

Materials and Methods

Fly stocks and rearing

We used female Oregon-R Drosophila melanogaster 2-7 days post-eclosion for all experiments, unless mentioned otherwise. Flies were reared on standard cornmeal agar in 25°C incubators at 70% humidity with a 12 h-12 h light/dark cycle. For all experiments, flies were cold-anesthetized at 4°C and tethered either to a tungsten pin for the rigidly tethered experiments (Fig. 1) or to a steel pin for the magnetically tethered experiments (Figs. 3,6).

Visual display

We used a modular LED display system composed of 8×8 dot-matrix LED arrays (Reiser and Dickinson, 2008). The visual stimuli were displayed on the interior of a cylindrical display arena featuring inward-facing green LED pixels with a 570-nm peak wavelength. The display covered 360° (96 pixels) in azimuth and 94° (32 pixels) in elevation. From the animal’s perspective, each LED pixel subtended at most 3.75°, which is narrower than the inter-ommatidial angle of the fly (∼5°) (Götz, 1965).

Magnetic tethering setup

We constructed a magnetic tethering setup following the designs used in previous studies (Duistermars and Frye, 2008; Kim et al., 2017; Mongeau and Frye, 2017). The support frame for the two neodymium magnets was designed in Fusion 360 (Autodesk, Inc.) and manufactured using a 3D printer (Ender 3 S1, Creality) (Fig. 3A). The bottom magnet was ring-shaped, with an outer diameter of 32 mm, an inner diameter of 13 mm, and a height of 10 mm. The top magnet was cylindrical, with a diameter of 12 mm and a height of 10 mm. The distance between the top of the bottom magnet and the bottom of the top magnet was 25 mm. A V-shaped jewel bearing (Swiss Jewel Company) was attached to the bottom surface of the top magnet using epoxy and served to anchor a steel pin-tethered fly. We positioned a Prosilica GE-680C camera (Allied Vision Inc.) with a macro lens (Infinistix, 1.5x zoom, 94-mm working distance) on the ground to capture the fly’s behavior from below. To illuminate the fly without interfering with its vision, we affixed four infrared LEDs (850-nm peak wavelength) concentrically to the top surface of the bottom magnet support. Furthermore, we applied matte black acrylic paint (Chroma Inc.) to the bottom surface of the top magnet to minimize background glare in the acquired images, and to the top surface of the bottom magnet to minimize visual interference for the flies.

Visual stimuli

To estimate the position and velocity functions, we measured flies’ wing responses to three visual patterns moving at 36°/s for 10 seconds (Fig. 1). The bar pattern was a dark vertical stripe, 15° (4 pixels) wide, on a bright background; it moved horizontally from the back through a full rotation in either the clockwise or counterclockwise direction. The spot pattern was a 2×2-pixel dark square on a bright background and moved with the same dynamics as the bar. The grating pattern consisted of 12 cycles of dark and bright vertical stripes, each 15° (4 pixels) wide. For the magnetically tethered flight experiments (Figs. 3,6), the patterns moved rapidly by 45° with sigmoidal dynamics. Each trial consisted of four distinct phases: alignment, ready, go, and freeze. The vertical pattern used for alignment consisted of 3 dark, 2 bright, 1 dark, 2 bright, and 3 dark vertical pixel stripes, spanning 41.25° (11 pixels) in total width. To align the orientation of the fly to the reference angle, this pattern oscillated horizontally with sinusoidal dynamics for 4-5 seconds at 1 cycle/s, with its envelope amplitude decreasing linearly (Movie S2). The pattern then changed to a uniform dark bar, 11.25° (3 pixels) wide, over a bright background, which was used for the rest of the stimulus period. The random dot patterns were created by placing dark pixels at random positions on the display arena; the sparse pattern covered 7% of the total pixels, and the dense pattern 40%. To compensate for any asymmetry in the rigidly tethered flight experiments, each pattern was presented multiple times in the two opposing directions. Similarly, in the magnetically tethered flight experiments, we presented the same pattern at two opposing positions and in the two opposing directions. We inverted all body orientation signals from the counterclockwise trials and merged them with the clockwise trials.

Behavioral experiment

For the rigidly tethered flight experiments (Fig. 1), we glued the anterior-dorsal part of the fly’s thorax to a tungsten pin using a UV-cured glue (Bondic). The other end of the pin was inserted into a pin receptacle located at the center of the visual display. The fly’s body orientation was pitched down 30° from the horizontal to match the normal flight attitude (David, 1978). The fly was illuminated by four 850-nm LEDs on a ring-shaped platform positioned on top of the camera (Fig. 1A). The fly was imaged from below by a GE-680 camera at 60 frames/s with a macro zoom lens (MLM3X-MP, Computar, NC, USA) and an infrared long-pass filter. Wingbeat signals were analyzed in real time using FView and the motmot package (Straw and Dickinson, 2009). For the magnetically tethered flight experiments (Figs. 3,6), we instead used a short steel pin to levitate the fly in the magnetic field.

Data analysis

To quantify wing responses from the rigidly tethered flight experiments, we first subtracted the right wingbeat amplitude from the left wingbeat amplitude to calculate the L-R WBA for each recording (Fig. 1B). For each fly, we determined the stimulus-triggered L-R WBA for every pattern, provided that the animal flew continuously in at least 6 trials. We then computed the mean L-R WBA across trials for each fly, and from these individual means we determined the population-averaged L-R WBA (Fig. 1). In the magnetically tethered flight experiments, we captured images of the flies at 60 frames per second (Figs. 3,6). The camera was externally triggered by an Ubuntu computer via a USB-connected microcontroller board (Teensy, PJRC), and the trigger signal was stored on a Windows computer using WinEDR (University of Strathclyde, Glasgow), along with the stimulus type and position signals. The acquired images, along with their timestamps, were saved on the Ubuntu computer using FView (Straw and Dickinson, 2009). To analyze body kinematics (Figs. 3,6), we wrote custom machine-vision code in MATLAB. Namely, we calculated the body angle by binarizing each frame, applying an erosion operation to remove small non-fly objects, and then obtaining the orientation of the largest region using the regionprops() function (Movie S3). To measure the amplitude of the visually evoked responses (Figs. 3D, 6B-D), we subtracted the mean body angle during the 300-ms interval immediately prior to the stimulus onset from the mean in the 300-ms interval starting 200 ms after the onset. The 50% latency was measured as the time, relative to the pattern movement onset, at which the body angle crossed 50% of the total pattern displacement from baseline (Fig. 3D). Additionally, we measured the head angle (Fig. S5B) by tracking the positions of both antennae with respect to the neck (Movie S3) using the deep learning-based software DeepLabCut (Mathis et al., 2018). The body kinematic variables were stored separately from the stimulus parameters and combined post hoc for further analyses. To synchronize these signals, we inserted a 2-s pause in the camera trigger at the beginning and end of each experiment. These no-trigger intervals appeared in both sets of data files and were used to align the kinematic variables acquired from the image data to the stimulus parameters.
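The core of the body-angle extraction and the latency measurement can be sketched in a few lines of MATLAB. The file name, threshold behavior, structuring-element size, and the synthetic trace below are illustrative placeholders, not the study's exact values:

    % Per-frame body-angle extraction (illustrative parameters).
    frame = imread('fly_frame.png');            % hypothetical sample frame
    bw = imbinarize(rgb2gray(frame));           % binarize (invert if fly is dark)
    bw = imerode(bw, strel('disk', 3));         % remove small non-fly objects
    stats = regionprops(bw, 'Area', 'Orientation');
    [~, idx] = max([stats.Area]);               % keep the largest region (the fly)
    bodyAngle = stats(idx).Orientation;         % body angle in degrees

    % 50% latency on a synthetic, baseline-subtracted body-angle trace:
    fs = 60; t = (0:1/fs:2)'; t_on = 0.5;       % toy time base and onset
    theta = 40*(1 - exp(-max(t - t_on, 0)/0.15));  % toy response trace (deg)
    iCross = find(theta >= 0.5*45, 1);          % first crossing of 50% of 45 deg
    latency50 = t(iCross) - t_on;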

Statistical analysis

All statistical analyses were performed in MATLAB. Heading responses to the random dot patterns at the two different densities were compared with the Wilcoxon rank-sum test (Fig. 6B). Responses to moving bars (Fig. 6C-D) and to looming discs (Fig. S5D-E) across different background profiles were analyzed by one-way ANOVA.
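In MATLAB, these tests reduce to one call each; the variables below are hypothetical stand-ins for the measured response amplitudes:

    % amp_sparse, amp_dense: per-fly response amplitudes for the two densities.
    p_rank = ranksum(amp_sparse, amp_dense);    % Wilcoxon rank-sum test (Fig. 6B)
    % amp_by_bkgd: matrix with one column per background condition.
    p_anova = anova1(amp_by_bkgd, [], 'off');   % one-way ANOVA (Fig. 6C-D)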

Biomechanics models of flying Drosophila (Fig. 2)

The orientation of the fly is obtained through a second-order dynamical system in which the position and velocity components are summed as the input:

$$I\,\ddot{\psi}_{fly} + \beta\,\dot{\psi}_{fly} = P(\psi_e) + V(\psi_e)\,\dot{\psi}_e, \qquad (4)$$

where $\psi_{fly}$ is the angular position of the fly, $\dot{\psi}_{fly}$ the angular velocity, $\ddot{\psi}_{fly}$ the angular acceleration, $I = 6\times10^{-14}\ \mathrm{kg\,m^2}$ the moment of inertia of the fly's body, $\beta = 1\times10^{-11}\ \mathrm{kg\,m^2\,s^{-1}}$ the drag coefficient of its wings (Dickinson, 2005; Ristroph et al., 2010), $\psi_e$ the error angle between the fly and the pattern, and $\dot{\psi}_e$ the error velocity.

We approximated the experimentally obtained position and velocity functions with the following simple mathematical representations for each pattern (Fig. 2B).

We defined the dynamical system (eq. 4) as a system of ordinary differential equations (ODEs) and solved it with the Runge-Kutta (4,5) method using the MATLAB function ode45. We defined one ODE system per visual pattern (Fig. 2A), each taking the form of eq. 4 with the pattern-specific position and velocity functions and an additive noise term:

$$I\,\ddot{\psi}_{fly} + \beta\,\dot{\psi}_{fly} = P_{pat}(\psi_e) + V_{pat}(\psi_e)\,\dot{\psi}_e + n, \qquad pat \in \{\text{bar, spot, grating}\},$$

where n is the Gaussian noise component.
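A self-contained MATLAB sketch of this simulation for the bar pattern is shown below. The simplified P and V forms, their amplitudes, and the sigmoid parameters are our illustrative assumptions (the sign of P is chosen so that the model fixates the bar); the study's exact forms are those of Fig. 2B and Equations 5-7:

    % Virtual fly model (Eq. 4) for a bar pattern, solved with ode45.
    I    = 6e-14;                          % moment of inertia (kg m^2)
    beta = 1e-11;                          % drag coefficient (kg m^2 s^-1)
    P = @(e) 2e-11*sin(e);                 % assumed bar position function (N m)
    V = @(e) 1e-12*ones(size(e));          % assumed bar velocity function
    psi_p  = @(t) (pi/4)./(1 + exp(-(t - 0.5)/0.03));       % sigmoidal 45-deg jump
    dpsi_p = @(t) (psi_p(t + 1e-4) - psi_p(t - 1e-4))/2e-4; % numerical derivative
    % State x = [psi_fly; dpsi_fly]; error angle e = psi_pattern - psi_fly.
    rhs = @(t, x) [x(2);
                   (-beta*x(2) + P(psi_p(t) - x(1)) ...
                    + V(psi_p(t) - x(1)).*(dpsi_p(t) - x(2)))/I];
    [t, x] = ode45(rhs, [0 2.5], [0; 0]);
    plot(t, rad2deg(psi_p(t)), t, rad2deg(x(:,1)));  % pattern vs. heading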

Addition-only model (Fig. 4B)

Following the same ODE-system approach, in the addition-only model we integrated the torque response from the bar system, $F_{bar}$, and that from the grating system, $F_{grtng}$, additively:

$$I\,\ddot{\psi}_{fly} + \beta\,\dot{\psi}_{fly} = F_{bar} + F_{grtng}.$$
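A minimal sketch of this model, under the same illustrative parameters as above, follows. With a static background, the grating pathway receives only the self-generated optic flow, $-\dot{\psi}_{fly}$, and therefore opposes the bar-evoked turn:

    % Addition-only integration: F_bar and F_grtng sum at the motor output.
    I = 6e-14; beta = 1e-11;
    P_bar = @(e) 2e-11*sin(e);             % assumed bar position function
    V_grt = 5e-12;                         % assumed grating velocity gain
    psi_p = @(t) (pi/4)./(1 + exp(-(t - 0.5)/0.03));  % bar trajectory
    rhs = @(t, x) [x(2);
                   (-beta*x(2) ...
                    + P_bar(psi_p(t) - x(1)) ...      % F_bar
                    + V_grt*(0 - x(2)))/I];           % F_grtng, static background
    [t, x] = ode45(rhs, [0 2.5], [0; 0]);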

Graded EC-based model (Fig. 4C)

In the graded EC-based model, we defined the integration of the torque responses through an additional ODE that reproduces the angular velocity due to the bar, $\dot{\psi}_{bar}$, and subtracts its predicted value from the input to the grating system:

where $\varepsilon$ is the value of $\dot{\psi}_{bar}$ predicted by the EC, and $\dot{\varepsilon}$ its change over time.
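The sketch below implements one possible reading of this scheme; the EC state tracks the bar-commanded angular velocity with an assumed time constant and is added to the grating pathway's input, nulling the reafferent optic flow. The parameters and the first-order form of the $\varepsilon$ dynamics are our assumptions:

    % Graded EC: state eps predicts the bar-evoked angular velocity.
    I = 6e-14; beta = 1e-11; tau_ec = 0.02;    % assumed EC time constant (s)
    P_bar = @(e) 2e-11*sin(e);
    V_grt = 5e-12;
    psi_p = @(t) (pi/4)./(1 + exp(-(t - 0.5)/0.03));
    % State x = [psi_fly; dpsi_fly; eps].
    rhs = @(t, x) [x(2);
                   (-beta*x(2) + P_bar(psi_p(t) - x(1)) ...
                    + V_grt*(-x(2) + x(3)))/I;        % EC cancels reafference
                   (P_bar(psi_p(t) - x(1))/beta - x(3))/tau_ec];
    [t, x] = ode45(rhs, [0 2.5], [0; 0; 0]);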

All-or-none EC-based model (Fig. 4D)

In the all-or-none EC-based strategy, the grating torque, $F_{grtng}$, is switched off to 0 whenever the bar-evoked torque is non-zero:

$$I\,\ddot{\psi}_{fly} + \beta\,\dot{\psi}_{fly} = F_{bar} + (1 - \eta)\,F_{grtng},$$

where $\eta$ is an indicator function that takes the value 1 when $|F_{bar}| > 1.5\times10^{-11}$ and 0 otherwise.
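A corresponding sketch follows. The bar position-function amplitude is raised here (an illustrative choice of ours) so that the bar-evoked torque actually crosses the stated threshold during the turn:

    % All-or-none EC: the grating pathway is gated off while |F_bar| > thresh.
    I = 6e-14; beta = 1e-11; thresh = 1.5e-11;
    P_bar = @(e) 5e-11*sin(e);             % amplitude chosen to engage the gate
    V_grt = 5e-12;
    psi_p = @(t) (pi/4)./(1 + exp(-(t - 0.5)/0.03));
    rhs = @(t, x) [x(2);
                   (-beta*x(2) + P_bar(psi_p(t) - x(1)) ...
                    + (abs(P_bar(psi_p(t) - x(1))) <= thresh) ... % (1 - eta) gate
                      * V_grt*(0 - x(2)))/I];
    [t, x] = ode45(rhs, [0 2.5], [0; 0]);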

Auto-tuning mechanism for the graded EC-based model (Fig. 5B)

We added a multi-layer perceptron (MLP) to the graded EC-based model (eq. 13). The output of the MLP is injected into the EC (Fig. 5B). The equations and pseudocode for these computations are as follows:

Initialize MLP with random parameters wout, wh

for iteration = 1, M do

where M = 2000 is the number of iterations, T = 2.5 s the simulated time range, $\Delta V_{grating}$ the change in the visual feedback, $\alpha$ the output of the MLP, $r_h$ the output of the hidden layer, $r_{in}$ the input, $\psi_{fly}^{0}$ the heading of the fly under unchanging visual feedback conditions, and $\gamma = 1\times10^{-4}$ the learning rate.
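A toy MATLAB version of this loop is sketched below. Because the heading error is minimized exactly when the EC amplitude matches the feedback amplitude, we substitute the quadratic error between $\alpha$ and $V_{grating}$ for the full simulation-based MSE, use arbitrary feedback units, and use a learning rate larger than the study's $\gamma = 1\times10^{-4}$ so the toy converges within M iterations; all of these are our simplifications:

    % One-hidden-layer MLP auto-tuning the EC amplitude alpha by gradient descent.
    rng(1);
    M = 2000; gamma = 1e-2;                % iterations; enlarged learning rate
    nh = 10;                               % hidden units (assumed)
    wh = randn(nh, 1); wout = randn(1, nh);
    r_in = 1;                              % constant input cue
    V_grating = 0.2*ones(M, 1);            % feedback amplitude (arbitrary units)
    V_grating(400:end) = 1.0;              % feedback steps up at iteration 400
    alpha_hist = zeros(M, 1);
    for it = 1:M
        rh    = tanh(wh*r_in);             % hidden-layer activity
        alpha = wout*rh;                   % EC amplitude (MLP output)
        err   = alpha - V_grating(it);     % error on the proxy loss
        g_out = err*rh';                   % gradient wrt wout
        g_h   = err*(wout'.*(1 - rh.^2))*r_in;  % gradient wrt wh
        wout  = wout - gamma*g_out;        % gradient-descent updates
        wh    = wh - gamma*g_h;
        alpha_hist(it) = alpha;
    end
    plot(alpha_hist);                      % alpha re-converges after the step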

Fly simulator

We built a graphical user interface (GUI) in which the user can simulate the magnetically tethered flying fly for the three simple visual patterns, as well as for their superpositions (e.g., bar and grating) under the addition-only and EC-based models. The GUI shows the heading and torque traces in real time, as well as an animated fruit fly at the center of an LED cylinder that rotates in response to the visual patterns (Movie S1). The GUI allows the user to choose the pattern to be displayed and the maximum angular position amplitude of the stimulus. It also permits recording the simulation to generate a movie.

Code availability

The essential code used to generate the primary results and conduct the simulations for this study is available in our GitHub repository (https://github.com/nisl-hyu/flightsim).

Acknowledgements

We would like to thank all the lab members for the discussion and comments on the manuscript.

This research was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) under Grant (No.2020-0-01373, Artificial Intelligence Graduate School Program (Hanyang University)); by Basic Science Research Program, the Bio & Medical Technology Development Program, and Police-Lab 2.0 Program through the National Research Foundation of Korea (NRF) funded by the Korea Government (MSIT) under Grant NRF2020R1A4A1016840, NRF2021M3E5D2A01023888, NRF2022R1A2C2007599, NRF2022M3E5E8081195, and RS-2023-00243032.

Author contributions

A.C. and A.J.K. conceived the study and wrote the manuscript. A.C., with input from A.J.K., performed all the simulations. Behavioral experiments were designed by A.C. and A.J.K., carried out by Y.K. and J.P., and analyzed by A.C. and A.J.K.

(Related to Fig. 1) An expanded model predicts the history-dependent dynamics in the wing responses.

(A) Schematic of an expanded flight control model with a motor-wing dynamics and a fatigue block. (B) A heat map indicating the errors in the grating responses for different combinations of the fatigue block parameters. (C) Visually evoked wing responses to the three rotating patterns and the predictions of the two models. (D) Position and velocity responses of the simple and expanded models.

(Related to Fig. 1) Construction of flight control models for Canton S flies.

(A) Schematic of the experimental setup (left), a frame captured by the infrared camera (middle), and a simplified schematic of the experimental setup (right). The annulus surrounding the fly schematic represents the visual display viewed from above. (B) L-R WBA traces of a sample fly in response to the bar and spot visual patterns. The thick black lines indicate the average of all trials (top) or the average of all flies (bottom). Thin gray lines indicate individual trials (top) or fly averages (bottom). (C) Schematic of the position-velocity-based flight control model. (D) Position and velocity functions estimated from wing responses in C. Light purple shading indicates 95% confidence interval.

(Related to Fig. 2) The response latency of the flight control model.

(A) Same as in Fig. 2A with an additional delay block. (B) Same as in Fig. 2C comparing delayed and non-delayed heading of the virtual fly model for each visual pattern. (C) Same as in Fig. 2C for different peak amplitudes of the sigmoid-like pattern trajectory and different frequencies of the sine-like trajectory. Plots on the right show the change of delay with respect to the stimulus amplitude (top) and the frequency of the sine stimulus (bottom two).

(Related to Fig. 4) Orientation behaviors of the two efference copy-based models for different stimulus conditions.

(A) Diagrams of the graded and all-or-none EC models. (B) The change in the EC signals with respect to the bar-evoked flight torque. (C) Heading and torque responses to four different stimulus conditions: a moving bar in the first step, a bar and grating moving leftward in the second step, a bar moving rightward and a grating moving leftward in the third step, and a moving grating in the fourth step. (D) Bar plots indicating the amplitude of the response and the latency with respect to the stimulus onset, measured at the 50% point of the pattern movement for the different stimulus combinations. Dotted lines were added to facilitate comparison across conditions. The all-or-none EC model exhibited less variability than the graded EC model.

(Related to Fig. 6) Unexpected changes of the background pattern did not affect loom-evoked behavioral responses.

(A) Temporal change of the visual stimulus profile along the horizontal midline of the display. Body angle responses are shown in Fig. 6D. (B) Head angle responses to the stimulus profile in A with four different combinations of background patterns. The head angle was calculated from DeepLabCut. (C) On the left, schematic of the experimental display showing the initial position of the loom stimulus. On the right, stimulus profile and background combinations. (D) Body angle responses to the stimulus profile in C, for the different background combinations. (E) Same as in D for different background combinations.

Movie S1 (Related to Fig. 2). A movie showing a simulation using the GUI fly simulator. The simulation corresponds to a moving bar stimulus. On the left are the virtual fly model animation and the simulated LED display; on the right are plots showing the bar position, body angle, and torque.

Movie S2 (Related to Fig. 6). A movie showing visual patterns used for testing object-evoked flight turns in changing backgrounds.

Movie S3 (Related to Figs. 3,6). A fly movie corresponding to an experimental trial in the magnetic tethering setup, showing the original frame, the processed frame used to obtain the body angle, and the frame in which the body parts are tracked via DeepLabCut. The plots below the frames show the stimulus position, body orientation, and head angle, respectively. The motion of the body and head is synchronized with the video. The video was captured at 60 frames per second.