Introduction

In recent decades, biologists have increasingly relied on a handful of genetically tractable species to study questions related to behavioral mechanisms and their underlying neural circuitry (Devineni & Scaplen, 2022; Piggott et al., 2011; Zhu & Goodhill, 2023). However, it is often not clear to what extent these findings in model organisms can be generalized to other taxa or even closely related species. For example, recent comparative behavioral studies in drosophilids that diverged less than 40 mya found differences in both locomotion (York et al., 2022) and sequencing of spontaneous behaviors (Hernández et al., 2021). For vertebrates, the zebrafish larva has become a dominant model for understanding the neuronal circuits and pathways controlling innate behavior (Baier & Scott, 2009; Friedrich et al., 2010; Gahtan & Baier, 2004; Orger & de Polavieja, 2017; Portugues & Engert, 2011). While many aspects of the visuomotor transformations serving hunting behavior (i.e., search, pursuit, and capture of prey) have been revealed in this species (reviewed by Zhu & Goodhill, 2023), it is not known if strategies employed by zebrafish are representative of those across larval teleosts or if they constitute a derived adaptation to the zebrafish’s specific ecological niche. Such knowledge would allow a search for conservation, or divergence, of neural circuit implementations across species.

Here, we applied advances in computational ethology, such as automated tracking with deep neural networks (Mathis et al., 2018; Pereira et al., 2019) and unsupervised analyses (Berman et al., 2014; Marques et al., 2018; Mearns et al., 2020; Wiltschko et al., 2015) to compare hunting behavior of zebrafish to the Japanese rice fish, medaka (O. latipes) (Mano & Tanaka, 2012) and four cichlids from Lake Tanganyika (El Taher et al., 2021; Higham et al., 2007). Medaka and cichlids belong to a diverse clade of teleosts known as percomorphs, whose last common ancestor with ostariophysi, such as zebrafish, goldfish, and Mexican cavefish, lived ∼240 mya (Betancur-R et al., 2017) (Fig. 1A). While medaka and cichlids are only distantly related (∼100 mya; Betancur-R et al., 2017), the haplochromine species and the three lamprologine species studied here diverged recently, within the past 3 million years (Ronco et al., 2021). We discovered that swim kinematics and the use of eye movements differ qualitatively and quantitatively between these species. This divergence may be driven, in part, by differences in the sensorineural mechanisms underlying prey detection.

Tracking prey capture behavior in five species of fish larvae. (A) Phylogenetic relationship between percomorph species used in this study, and their relationship to zebrafish (a cyprinid). Based on Betancur-R et al. (2017). (B) Schematics of larvae of each species studied here (and zebrafish, Danio rerio, for reference). Scale bar: 1 mm. (C) Outline of tracking procedure. Raw video frames were analyzed with three different neural networks to extract tail pose, eye pose and prey position in each frame. Kinematic features were extracted from raw pose data and used for subsequent analyses. (D-F) Tracking output for each neural network for a single frame from a recording of an L. attenuatus larva, showing identified artemia (D), tail points (E) and eye points (F). (G) One minute of tail and eye tracking from L. attenuatus showing tail tip angles (top trace) and right and left eye angles (bottom traces) relative to midline. Bottom trace, gray shaded boxes: automatically identified prey capture periods, when the eyes are converged (see Figure 2). Top trace, shaded boxes: automatically identified and classified bouts. Color corresponds to cluster identity in Figure 4. (H) Cumulative distribution of bout lengths across species for all fish combined. Colors as in (A) and (B). Medaka bouts (purple line) are longer than cichlid bouts.

Results

Artificial neural networks track swimming behavior and eye movements in percomorph larvae

At five to seven days post fertilization (dpf), zebrafish larvae robustly feed on paramecia. At 12 dpf, they are able to eat the larger and more nutritious artemia. To stage-match larvae of different species to zebrafish, we determined the age at which they clearly pursued prey (Fig. 1B). When placed into a dish, cichlid larvae started feeding later than zebrafish, from 12-14 dpf, and could be fed artemia right away. Medaka started feeding at approximately 10 dpf, a stage where they are unable to eat artemia, due to their small size. We adapted an experimental paradigm used to study prey capture in zebrafish to these other species (Mearns et al., 2020). Individual larvae were placed in chambers with prey items (either artemia for cichlids or paramecia for medaka). We recorded each animal for 15 minutes using a high-speed camera (Suppl. Videos 1-5, Methods). We then used neural networks to extract tail pose, eye movements, and prey locations from videos (Fig. 1C). Turning to neural networks allowed us to train models to efficiently and accurately track the tail pose and eye movements in species characterized by varying morphological features (Fig. 1B; Suppl. Video 6). We trained a 12-point SLEAP model (Pereira et al., 2022) to track the tail, and a 7-point model to track the eyes of larvae (Fig. 1E, F). We tracked prey position using YOLO (Redmon et al., 2016; Fig. 1D). We subsequently used these estimated pose dynamics and prey location information to compare hunting and swimming across species.

Larvae of different species have diverse swim patterns

At the first-feeding stage, zebrafish larvae swim in discrete, discontinuous bouts (Budick & O’Malley, 2000), which merge into continuous swimming at the juvenile stage (Westphal & O’Malley, 2013). In contrast, Danionella cerebrum, a close relative of zebrafish, exhibits slow continuous swimming already as larvae (Rajan et al., 2022). It is not known how larval swimming patterns differ between more distantly related species.

We found marked variation in the continuity of swimming in early percomorph larvae (Fig. 1G; Suppl. Fig. 1). On one extreme, medaka (OL) and A. burtoni (AB) swam with a “motorboating” style characterized by sustained, uninterrupted tail undulations over many seconds with only short breaks between swimming episodes, similar to Danionella. On the other extreme, L. ocellatus (LO) and N. multifasciatus (NM) had an intermittent swimming style, often resting at the bottom of the chamber for minutes at a time with quiescent periods interrupted by short, rapid bursts of activity. L. attenuatus (LA) showed a behavior intermediate to these extremes, alternating between rapid tail beating and quiescence, each lasting on the order of a few seconds, a discontinuous style similar to zebrafish.

Analyses across the animal kingdom suggest that behavior is organized into sub-second kinematic motifs (Berman et al., 2014; Stephens et al., 2008; Tinbergen, 1951; Wiltschko et al., 2015). Inspecting periods of swimming across percomorph species revealed significant substructure within swimming episodes (Fig. 1G; Suppl. Fig. 1). We segmented continuous tail traces into discrete bouts using a change detection algorithm adapted from Mearns et al. (2020; see Methods), which revealed changes in swim kinematics occurring on the order of tens to hundreds of milliseconds (median bout duration in seconds: 0.33, LO; 0.29, NM; 0.38, LA; 0.36, AB; 0.49, OL). Furthermore, medaka exhibited significantly longer swims than cichlids (p < 0.001, KS test) (Fig. 1H). These results highlight differences in how fish larvae pattern their swim episodes.
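The comparison of bout-duration distributions can be illustrated with a two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between two empirical cumulative distributions. The following is a minimal NumPy sketch, assuming bout durations are available as one-dimensional arrays in seconds. The function name and the synthetic durations are illustrative assumptions and are not the analysis code or data of this study.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])  # the gap can only change at observed values
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Hypothetical bout durations in seconds (synthetic, for illustration only).
rng = np.random.default_rng(0)
short_bouts = rng.lognormal(mean=np.log(0.3), sigma=0.4, size=500)
long_bouts = rng.lognormal(mean=np.log(0.5), sigma=0.4, size=500)
d = ks_statistic(short_bouts, long_bouts)
```

In practice, scipy.stats.ks_2samp returns this statistic together with a p-value; the hand-written version is shown only to make the quantity being tested explicit.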

Larvae of different species use diverse strategies to hunt prey

Zebrafish larvae converge their eyes at the onset of hunting episodes, and keep their eyes converged over the course of prey capture (Bianco et al., 2011; Patterson et al., 2013). Eye convergence aids prey capture by bringing prey into a binocular zone in the central visual field (Gahtan et al., 2005; Gebhardt et al., 2019).

During pursuit of prey, cichlids exhibited eye convergence similar to zebrafish. We did not observe a hunting episode in any of the four cichlid species in which eyes were not converged. In striking contrast, medaka did not converge their eyes. We plotted the distributions of eye convergence angles for each species and found that two peaks in the distribution of eye convergence angles were visible in LA. Other cichlids also spent significant time with their eyes converged by more than 50 degrees (Fig. 2A). To test whether two eye convergence states also existed in these species, we fitted Gaussian mixture models to the eye convergence data of each species with one or two components. For all species except medaka, modeling the data using two underlying distributions provided a better fit than a single distribution (Fig. 2B). The unimodal distribution of eye vergence angles in medaka was also present when we plotted the joint distribution of left and right eye angles (Suppl. Fig. 2), indicating that this species does not perform eye convergence in our behavioral assay. Taken together, these results are in line with our observations that cichlids converge their eyes during hunting, while medaka do not, and justify the use of eye convergence to identify hunting events in the four cichlid species.
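The model comparison described above can be sketched as follows: fit a one-component and a two-component Gaussian mixture to the convergence angles and compare their BIC values. This is a minimal, hand-written EM fit in NumPy under stated assumptions (one-dimensional data, quantile-based initialization, a fixed number of iterations); the function name and the synthetic angles are hypothetical, and this is not the fitting code used for Fig. 2.

```python
import numpy as np

def gmm_bic_1d(x, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM and return its BIC."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread initial means
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)

    def log_joint():
        # log(w_k * Normal(x | mu_k, var_k)) for every sample and component
        return (np.log(w)
                - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)

    for _ in range(n_iter):
        lj = log_joint()
        # E-step: posterior responsibility of each component for each sample
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1)[:, None])
        # M-step: update weights, means, and variances
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6

    log_lik = np.logaddexp.reduce(log_joint(), axis=1).sum()
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return n_params * np.log(n) - 2 * log_lik

# Synthetic convergence angles (degrees) with two states; illustration only.
rng = np.random.default_rng(0)
angles = np.concatenate([rng.normal(20, 6, 1500), rng.normal(60, 6, 500)])
delta_bic = gmm_bic_1d(angles, 1) - gmm_bic_1d(angles, 2)  # > 0 favors two states
```

A positive difference means the two-component model has the lower BIC, which is the sense in which two mixtures provide a better fit in Fig. 2B.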

Eye movements during prey capture and statistics of hunting sequences. (A) Histograms of eye convergence angles for each species. The convergence angle is the angle between the long axes of the eyes. Shaded bars: normalized binned counts of convergence angles from all fish. Black lines: best-fit Gaussian mixture model for each species (with one or two components). For all cichlid species, the data are better modeled as being drawn from two underlying distributions. For medaka (O. latipes), the data are better modeled with a single underlying distribution. (B) Improvement in fit for a two-component over a single-component Gaussian mixture model, assessed using the Bayesian information criterion (BIC). The BIC measures model fit while penalizing model complexity. Lower values are better. Two mixtures provide a better fit over a single mixture for all species except medaka (OL). (C) Proportion of time spent by cichlids engaged in prey capture within the first five minutes of being introduced to the behavior arena. Points are single animals; black bar is the median. (D) Hunting rate, measured as the number of times eye convergence is initiated per minute, within the first five minutes. Points are single animals; black bar is the median. (E) Kernel density estimation of hunt durations for each species (all animals pooled). A. burtoni (pink) hunts skew shorter, L. attenuatus (brown) hunts tend to be longer. (F) Median hunt duration for each animal compared across species. Black bar is the median across animals. (G) Capture rate (number of artemia consumed) per minute over the first five minutes for three cichlid species. Black bar is the median across animals.

The correlation of eye convergence and prey pursuit enabled us to measure the frequency and duration of hunting episodes in the four cichlids (Fig. 2C). This analysis could not be extended to medaka on a sufficient scale, due to their lack of eye convergence and the poor visibility of paramecia in our setup. We determined that LO, NM, and AB spent less than 10% of the time hunting (median proportions: 0.064, LO; 0.036, NM; 0.096, AB). LA, on the other hand, spent about a quarter of the time actively engaged in prey capture, with all individuals tested spending at least 10% of their time and one individual spending over 40% of the time hunting (median: 0.25, p-values < 0.05 comparing LA to all other species, Mann-Whitney U test with Bonferroni correction).

Surprisingly, although LO, NM and AB all spent comparable total time engaged in prey capture, AB initiated hunts at a much higher rate, similar to LA (Fig. 2D) (median hunts/min: 2.45, LO; 1.98, NM; 6.85, LA; 8.49, AB; p-values < 0.05 comparing either LO or NM to LA or AB, Mann-Whitney U test with Bonferroni correction). AB hunts were also shorter than those of other species, while LA hunts tended to be longer (Fig. 2E-F). LO and NM individual hunt durations were similar to each other and intermediate to the other species (medians across fish (seconds): 1.12, LO; 1.14, NM; 1.77, LA; 0.73, AB). Despite these differences in time engaged in prey capture, species consumed prey at similar rates of approximately one artemia per minute (Fig. 2G).
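The three hunting measures reported here (proportion of time hunting, hunt initiations per minute, and hunt duration) can all be derived from a per-frame record of whether the eyes are converged. The sketch below assumes such a boolean trace and a known frame rate; the function name, the frame rate, and the example trace are hypothetical and serve only to show how the quantities relate.

```python
import numpy as np

def hunt_statistics(converged, fps):
    """Summarize hunting episodes from a boolean per-frame eye-convergence trace."""
    converged = np.asarray(converged, dtype=bool)
    # Pad with False so episodes touching either end still get an onset and offset.
    padded = np.concatenate([[False], converged, [False]]).astype(int)
    edges = np.diff(padded)
    onsets = np.flatnonzero(edges == 1)    # first converged frame of each episode
    offsets = np.flatnonzero(edges == -1)  # first non-converged frame after it
    durations = (offsets - onsets) / fps   # seconds
    minutes = converged.size / fps / 60.0
    return {
        "proportion_hunting": float(converged.mean()),
        "hunts_per_min": onsets.size / minutes,
        "median_duration_s": float(np.median(durations)) if durations.size else float("nan"),
    }

# One minute at an assumed 100 frames/s with a 1 s and a 3 s hunting episode.
trace = np.zeros(6000, dtype=bool)
trace[1000:1100] = True
trace[3000:3300] = True
stats = hunt_statistics(trace, fps=100)
```

Because the proportion of time hunting is the product of hunt rate and mean hunt duration, two species can spend similar total time hunting while differing in rate and duration, as seen for AB compared with LO and NM.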

Taken together, our results highlight differences among closely related cichlids in hunting behavior, at least in our setup and at the stages examined. LA are the most persistent hunters, initiating prey capture often and spending a longer time engaged in the behavior once initiated (Suppl. Video 1). In contrast, AB prey capture dynamics are characterized by a high rate of short-duration hunting episodes (Suppl. Video 4). Both LA and AB actively hunt in our behavioral assay, while LO and NM initiate prey capture rarely, spending much time resting at the bottom of the chamber and moving only occasionally when they dart towards prey (Suppl. Videos 2&3).

Cichlids center prey within a strike zone

Zebrafish larvae strike at prey once it is localized in the center of their binocular visual field and ∼0.5 mm away (the “strike zone”) (Mearns et al., 2020; Patterson et al., 2013). This centering behavior is impaired when animals are blinded in one eye (Mearns et al., 2020). To test whether cichlids similarly center prey within a strike zone, we identified the most likely targeted artemia during each hunting episode and studied how the prey moved through the visual field as hunting sequences progressed (Fig. 3A, see Methods) (number of events: 176 for LO; 361 for NM; 756 for LA). This revealed that, within each prey capture sequence, prey became increasingly localized to the near central visual field over time. Computing kernel density estimates of prey distributions at the beginning, middle, and end of each hunting episode revealed that cichlids initiate prey capture at a wide range of distances and that they center prey in the visual field in the early stages of a hunting sequence (Fig. 3B). For instance, the azimuthal angle of prey only decreased over the start of the hunting episode (Fig. 3D) (p-value > 0.05, bootstrap test difference between medians at middle and end of prey capture). They then subsequently close the distance between themselves and their prey over the latter half of a hunt sequence (p-value < 0.001, bootstrap test difference between medians). On average, the cichlids’ eyes converged when prey were ∼5 mm away (median distance, mm: 5.43, LO; 3.98, NM; 4.17, LA). However, LO, NM, and LA could detect prey up to 15 mm away (Fig. 3C). LA often initiated prey capture when artemia were already in the center of the visual field (Fig. 3A, B & D).
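Expressing prey position in the visual field amounts to converting arena coordinates into a distance and an azimuthal angle relative to the fish's position and heading. The following is a minimal sketch of that transformation; the function name, the choice of the reference point on the fish, and the sign convention for azimuth are assumptions made for illustration.

```python
import numpy as np

def prey_in_visual_field(fish_xy, heading_rad, prey_xy):
    """Return (distance, azimuth in degrees) of prey relative to the fish.

    Azimuth is 0 straight ahead and positive counter-clockwise
    (to the fish's left), a convention assumed for this sketch.
    """
    dx, dy = np.asarray(prey_xy, dtype=float) - np.asarray(fish_xy, dtype=float)
    distance = float(np.hypot(dx, dy))
    bearing = np.arctan2(dy, dx) - heading_rad
    # Wrap the angle into [-180, 180) degrees.
    azimuth = np.degrees((bearing + np.pi) % (2 * np.pi) - np.pi)
    return distance, float(azimuth)

# Fish at the origin facing along +x; prey 5 units away, ahead and to the left.
dist, az = prey_in_visual_field((0.0, 0.0), 0.0, (3.0, 4.0))
```

Applying this to every frame of a hunting episode yields the distance and azimuth time courses whose distributions are compared at the onset, middle, and end of hunts.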

Location of prey in the visual field during prey capture in cichlids. (A) Trajectories of prey in the visual field for all automatically identified and tracked hunting events. Each prey is represented by a single line that changes in color from blue to yellow from the onset of eye convergence to when the eyes de-converge. Scale bar: 5 mm. (B) Kernel density estimation of the distribution of prey in the visual field across all hunting events. Rows: individual species. Columns: snapshots showing the distribution of hunted prey items at the beginning, middle and end of hunting sequences. Scale bar: 2 mm. (C-D) Kernel density estimation of prey distance (C) and azimuthal angle from the midline (D) at the onset (top), in the middle (center) and at the end (bottom) of hunting episodes. Each column shows the distribution from all events for a single species.

In all cichlid species analyzed, prey were highly localized to a central strike zone ∼1-2 mm away at the end of prey capture (Fig. 3B-C). In NM, we observed a second peak in prey distance at double the length (∼3 mm) of the most frequently used distance. This peak corresponds to events where NM aborted prey capture prior to the strike. The tails of the distributions in Fig. 3C are, similarly, due to a significant number of aborted hunting events. Together, these results demonstrate that cichlids are able to detect prey at a large distance and possess the fine sensorimotor control required to localize prey to a strike zone.

Fish larvae of different species share a common pose space

Zebrafish larvae have a unique swim repertoire during prey capture, which is distinct from exploratory swim bouts (Borla et al., 2002; McElligott & O’Malley, 2005). During prey capture, these swims each mediate distinct transformations of the visual scene: J-turns center prey in the visual field, approach swims bring prey to the strike zone, and strikes are deployed to capture prey. A limited repertoire of pose positions allows for the capture of variation in tail shape over time with a greatly reduced number of dimensions (Johnson et al., 2020; Marques et al., 2018; Mearns et al., 2020). Performing principal components analysis (PCA) on all species combined revealed that tail pose dynamics are similarly low-dimensional (Fig. 4A, black line; 3 PCs explain > 90% variance). These results also hold when each species is analyzed individually (Fig. 4A, colored lines). As with zebrafish larvae (Girdhar et al., 2015; Mearns et al., 2020), the principal components correspond to a harmonic series (Fig. 4B), with the first PC capturing “turniness”, and PCs 2 and 3 capturing tail oscillations during locomotion. The first three PCs were similar across species (Fig. 4C), suggesting that animals that share a common body plan exhibit similar low-dimensional pose-spaces.
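The dimensionality argument can be made concrete with a short sketch: PCA on a matrix of tail angles (frames by tail segments) gives the cumulative explained variance, and the absolute cosine similarity between two sets of components indicates whether they span similar shapes. The code below uses a plain SVD in NumPy; the number of tail segments, the function names, and the synthetic tail shapes are assumptions for illustration, not the data or code behind Fig. 4.

```python
import numpy as np

def tail_pca(angles):
    """PCA of tail shapes; `angles` has shape (n_frames, n_segments)."""
    centered = angles - angles.mean(axis=0)
    # Rows of vt are unit-norm principal components ("eigenfish").
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return vt, np.cumsum(explained)

def pc_similarity(pcs_a, pcs_b):
    """Absolute cosine similarity between two sets of unit-norm PCs.

    The absolute value is taken because the sign of a PC is arbitrary.
    """
    return np.abs(pcs_a @ pcs_b.T)

# Synthetic tail angles built from three smooth modes plus small noise.
rng = np.random.default_rng(0)
s = np.linspace(0, 1, 11)  # hypothetical 11 tail segments
modes = np.stack([s, np.sin(np.pi * s), np.sin(2 * np.pi * s)])
angles = rng.normal(size=(2000, 3)) @ modes + 0.01 * rng.normal(size=(2000, 11))
pcs, cumvar = tail_pca(angles)
similarity = pc_similarity(pcs[:3], pcs[:3])
```

A strong diagonal in the similarity matrix computed between species-specific and pooled components corresponds to the structure shown in Fig. 4C.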

Interspecies comparison of prey capture strategies. (A) Cumulative explained variance for the “canonical” principal components (PCs) obtained from all species (black dotted line), and PCs for each species individually (colored lines). In all cases, three PCs explain >90% of the variance in tail shape. (B) “Eigenfish” of the first three canonical PCs. Each principal component represents a vector of angles from the base to the tip of the tail (oriented with the fish facing up). At a given moment, the shape of the tail can be described as a linear combination of these vectors. Colors correspond to the tail shape obtained by scaling each PC from -4 to 4 standard deviations from the mean. (C) Each species’ eigendecomposition compared against the canonical PCs computed for all species together. Color intensity represents the cosine similarity between pairs of vectors. The strong diagonal structure (particularly in the first three PCs) shows that similar PCs are obtained by analyzing species separately or together. (D) Behavioral space of L. attenuatus. Each point represents a single bout. Bouts are projected onto the first three PCs, aligned to the peak distance from the origin in PC space and then projected into a two-dimensional space using TSNE. Color intensity represents density of surrounding points in the embedding. (E) Clustered behavioral space of L. attenuatus. Clusters (colors) are computed via affinity propagation independently of the embedding. (F) Prey capture and spontaneous bouts in L. attenuatus. Prey capture score is the probability that the eyes are converged at the peak of each bout. Bouts are colored according to the mean prey capture score for their cluster. Blue: clusters of bouts that only occur during spontaneous swimming; red: clusters of bouts that only occur during prey capture. (G-H) Example prey capture clusters from L. attenuatus (G) and medaka, O. latipes (H).
For each cluster, top left: mean rostrocaudal bending of the tail over time; bottom left: time series of tail pose projected onto first three PCs (mean +/- standard deviation); bottom right: reconstructed tail shape over time for mean bout. (I) Representative frames of an L. attenuatus capture spring (left) and medaka side swing (right), highlighting differences in tail curvature between these behaviors. (J) Location of prey in the visual field (black dots) immediately prior to the onset of a side swing. All events mirrored to be on the right. X marks the midpoint of the eyes, aligned across trials. (K) Confusion matrix of clustered hunt termination bouts from all species. Termination bouts from all species were sorted into five clusters based on their similarity. Rows show the proportion of bouts from each species that were assigned to each cluster (columns). Cichlid bouts are mixed among multiple clusters, while medaka bouts (OL) mostly sorted into a single cluster. (L) Capture strikes in cichlids. Representative examples of tail kinematics during attack swims (left) and capture springs (right) from each species, including time series of tail pose projected onto the first three PCs (mean +/- standard deviation, all species combined) shown for each type of strike. (M) Two hypotheses for distance estimation make different predictions of how heading (black dotted line) changes over time as fish approach prey (pink star). Top: fish maintain prey in the central visual field and use binocular cues to judge distance. Bottom: fish “spiral” in towards prey, using motion parallax to determine distance. Black arrows indicate motion of prey stimulus across the retina. (N) Change in heading over time leading up to a capture strike for L. attenuatus (brown) and medaka (purple). Left: time series of heading. Zero degrees represents the heading 25 ms prior to the peak of the strike (black dotted line). Mean ± s.e.m. across hunting events. 
Right: comparison of the rate of heading change, computed as the slope of a line fit to each hunting event 200-25 ms prior to the peak of the strike. The heading decreases more rapidly leading up to a strike in medaka than in L. attenuatus.

Hunting sequences in cichlids share kinematic motifs with zebrafish but are more complex

A common low-dimensional pose space in drosophilids has aided with the identification of evolutionary trajectories in behavior (York et al., 2022). We therefore wanted to investigate the trajectories through pose space of cichlids and medaka to understand if they share common behaviors during hunting, or if their behaviors have diverged over evolution. For each species, we projected the time series of rostrocaudal tail angles for each distinct bout onto the first three principal components (Mearns et al., 2020). We then performed affinity propagation to identify clusters corresponding to different swim types and performed t-distributed stochastic neighbor embedding (TSNE) to visualize these clusters in a low-dimensional embedding (Fig. 4D, E, Suppl. Fig. 3). We found between 11 (LO) and 32 (AB) behavioral clusters per species. AB and LA held the richest behavioral repertoires, reflecting the more elaborate prey pursuit behaviors in these species (Fig. 4E, Suppl. Fig. 3). The number of clusters we find is on par with or greater than the bout types that have been found in zebrafish larvae (Marques et al., 2018; Mearns et al., 2020).

Similar to zebrafish, we found that cichlids have specific kinematic motifs for prey capture (Fig. 4F). Some of these swims share characteristics with hunting bouts previously described in zebrafish, such as J-turns and approach swims, but we also identify new types that do not have clear analogs in zebrafish (Fig. 4G). For example, cichlid larvae often “hover” in place, oscillating their tail without any forward movement (Fig. 4G). Such behaviors tend to be longer lasting than approach swims. During hunting, cichlids will alternate between approaches and these hover swims. Another feature of cichlid hunting behavior not present in zebrafish is tail coiling. Here, the tail coils into an S-shape over many hundreds of milliseconds leading up to a strike. The coil is released in a spring-loaded attack to capture prey (Fig. 4I). In addition to such capture springs, we have also observed suction captures and ram-like attack swims in cichlids.

Following the strike, cichlids exhibit a range of behavioral maneuvers not present in zebrafish. In cases where capture strikes miss, cichlid larvae may swim backwards to recenter prey in the visual field and initiate a second strike (Suppl. Video 8). We also observed that, after a successful capture, cichlid larvae may expel prey from their mouth (Suppl. Video 7). Such “spitting” behavior often triggered another attempt to capture the same prey and was particularly common in LA. Repeated attempts at capturing the same prey explain the high incidence of hunt initiations when prey are already in the central visual field in this species (Fig. 3A, B & D). The function of spitting is not clear, and it might indicate that the prey are slightly too large for LA to swallow. Alternatively, such a behavior may be adaptive, providing already sated animals the opportunity to practice hunting without overeating. We also find that cichlid captures often occur in two phases, with the prey first being caught by the pharyngeal jaws (a second, internal set of jaws present in cichlids; Liem, 1973) before being ingested. Together, these results reveal a richer behavioral repertoire for pursuing, capturing, and reorienting towards missed prey in cichlids than in zebrafish.

Cichlids share two common capture strikes that are distinct from medaka side-swing strikes

Studies in invertebrates have demonstrated that behavior can evolve through changing the kinematics of a behavior performed in a given sensory context (Ding et al., 2016), changing which sensory cues drive behavior (Seeholzer et al., 2018), or changing the transition frequencies between a common set of kinematic motifs (Hernández et al., 2021). It is not known to what extent these factors play a role in the evolution of vertebrate behaviors, nor the evolutionary timescales over which behaviors diverge to the point of becoming distinct. To address this question, we focused on the capture strike kinematics of cichlids and medaka. The capture strike of zebrafish larvae has been extensively characterized, consisting of two distinct types: the attack swim and S-strike. These strikes are deployed variably, dependent on experience and different prey distances (Lagogiannis et al., 2020; McClenahan et al., 2012; Mearns et al., 2020; Patterson et al., 2013).

Medaka swimming is nearly continuous, and their eyes do not converge during prey capture. Nevertheless, medaka may still perform different behaviors during spontaneous swimming and prey capture. Investigating clusters of behavior in medaka revealed types of swim that are similar to prey capture bouts of cichlids and zebrafish. These included J-turns, approach swims, and slow swims (Fig. 4H). The presence of these swims suggests that medaka, like other species studied, may also have a separate suite of behaviors reserved for prey capture. Remarkably, we found that medaka did not strike at prey in the central visual field but rather captured prey in the lateral visual field with a unique side-swing behavior (Fig. 4H-J). Although the eyes do not converge, there is sometimes a slight nasalward rotation of the ipsilateral eye leading up to and during the side swing (Fig. 4I).

To confirm that side-swing behavior was unique to medaka, we clustered capture strikes of cichlids and medaka after projecting their tail kinematics into the common pose space, setting the number of clusters to the number of species (Fig. 4A). We reasoned that if bouts were similar within each species but different between species, cluster identity would correlate with species identity. In contrast, if species shared similar kinematic motifs, clusters would contain bouts from all species. We found that strikes from all cichlid species were similar to each other, while the medaka side-swing strike was unique (Fig. 4K).
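The logic of this test can be sketched as a species-by-cluster table: given a species label and a cluster label for each strike, count the strikes in each cell and normalize each row. Rows concentrated in a single column indicate a species-specific strike, while rows spread over shared columns indicate common motifs. The function name and the toy labels below are hypothetical, and the clustering step itself is not reproduced here.

```python
import numpy as np

def cluster_proportions(species, clusters, n_species, n_clusters):
    """Row-normalized species-by-cluster table of bout counts.

    Assumes every species contributes at least one bout
    (otherwise its row would divide by zero).
    """
    species = np.asarray(species, dtype=int)
    clusters = np.asarray(clusters, dtype=int)
    counts = np.zeros((n_species, n_clusters))
    np.add.at(counts, (species, clusters), 1)  # accumulates repeated index pairs
    return counts / counts.sum(axis=1, keepdims=True)

# Toy labels: species 0 splits between clusters 0 and 1; species 1 uses only cluster 2.
species_id = [0, 0, 0, 0, 1, 1, 1, 1]
cluster_id = [0, 1, 0, 1, 2, 2, 2, 2]
table = cluster_proportions(species_id, cluster_id, n_species=2, n_clusters=3)
```

In this toy example the second species occupies a cluster of its own, which is the pattern reported for medaka side swings in Fig. 4K.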

This analysis uncovered at least two kinematically distinct capture strikes in cichlids. The slower of these behaviors is similar to the zebrafish attack swim, characterized by a symmetric undulation of the tail about the midline which propels the fish towards the prey, while the faster corresponds to the capture spring. The S-shape of the tail leading up to the capture spring shares structural similarities with the S-strike of zebrafish larvae (Mearns et al., 2020), differing primarily in the time course over which they occur. The S-strike of zebrafish may represent an accelerated form of the capture spring, forming and releasing over tens of milliseconds rather than hundreds of milliseconds seen in cichlids or, vice versa, the capture spring may represent an augmentation of the S-strike.

Different prey capture strategies may reflect different sensorineural solutions to prey detection

Centralized prey and converged eyes in cichlids suggest that the mechanism they use to determine the distance to prey could be similar to that of zebrafish. One possibility is that these species use binocular disparity to judge depth (Qian, 1997). In contrast, we hypothesized that medaka might use a different, monocular strategy to judge the distance to prey, such as motion parallax (Yoonessi & Baker, 2011). These two mechanisms of depth perception make different predictions about how larvae should approach prey to make best use of visual cues (Fig. 4M). To investigate these possibilities, we computed the change in heading leading up to a strike as a proxy for the change in visual angle of the prey for cichlids and medaka (Fig. 4N). We found that the rate of change in heading was greater in medaka than in cichlids (median degrees/second: -13.4, LA; -34.1, OL; p < 0.001, Mann-Whitney U-test). These results are consistent with cichlids using binocular cues, such as binocular disparity, to determine distance to prey, while medaka may employ a monocular strategy.
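The rate of heading change used in this comparison is the slope of a straight line fitted to heading over the 200 to 25 ms preceding the peak of the strike (Fig. 4N). A minimal sketch of that computation follows, assuming heading has already been unwrapped and time is expressed in milliseconds relative to the strike peak; the function name and the synthetic trace are illustrative assumptions.

```python
import numpy as np

def heading_rate(t_ms, heading_deg, window=(-200.0, -25.0)):
    """Slope (degrees/second) of a line fit to heading within `window`.

    `t_ms` is time in milliseconds relative to the peak of the strike.
    """
    t_ms = np.asarray(t_ms, dtype=float)
    heading_deg = np.asarray(heading_deg, dtype=float)
    mask = (t_ms >= window[0]) & (t_ms <= window[1])
    slope_per_ms, _ = np.polyfit(t_ms[mask], heading_deg[mask], 1)
    return float(slope_per_ms * 1000.0)

# Synthetic event: heading falls at 20 degrees/second and is zero at -25 ms.
t = np.arange(-300.0, 1.0, 5.0)
heading = -0.02 * (t + 25.0)
rate = heading_rate(t, heading)
```

Computing one slope per hunting event gives the per-event rates that are then compared between species.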

Discussion

We have found that zebrafish and cichlids share the same general hunting strategy: striking at prey localized to a binocular strike zone while the eyes are converged, with two distinct capture bout types. The similarities are remarkable, given these species diverged 240 million years ago (see Fig. 1A). In contrast, medaka, which are phylogenetically intermediate to zebrafish and cichlids, deploy a different hunting strategy: striking at prey laterally in the monocular visual field with a side-swing behavior. Fig. 5 provides a schematic summary of similarities and differences among the teleost larvae investigated. Characterization of prey capture behavior in a wider range of species could reveal whether the similarities between zebrafish and cichlids represent conservation of an ancestral hunting strategy or convergent evolution.

Schematic of hunting strategies in teleost larvae. Top: prey capture in zebrafish larvae (Danio rerio) begins with eye convergence and a J-turn to orient towards the prey. The prey is then approached with a series of low amplitude approach swims. Once in range in the central visual field, the prey is captured with an attack swim or an S-strike. Middle: prey capture in cichlids (represented by L. attenuatus) also begins with eye convergence and a J-turn. Prey is approached with a wide variety of tail movements. The prey is captured when it is in the central visual field with either an attack swim, or a capture spring, during which the tail coils over several hundreds of milliseconds. Bottom: prey capture in medaka begins with reorienting J-like turns, but these do not centralize prey in the visual field. Instead, the prey is kept lateral in the visual field and is captured with a side swing.

What sensorineural mechanisms might underlie these two different hunting strategies? The side swing of medaka is reminiscent of hunting in (adult) blind cavefish (Lloyd et al., 2018), which position prey laterally prior to strikes using their mechanosensory lateral line. Adult cichlids are known to use their lateral line for prey capture (Schwalbe et al., 2016), and some percomorph species exhibit both an S-shaped and a sideways capture strategy as adults. Therefore, it is conceivable that medaka larvae do not rely on vision to the same extent for hunting, but use their lateral line instead. However, lateral-line use is not uncoupled from eye convergence across the teleost clade. Zebrafish larvae can still hunt in the dark, as do blind zebrafish lakritz mutants, albeit with greatly reduced efficacy, and with their eyes converged (Gahtan et al., 2005; Mearns et al., 2020; Patterson et al., 2013). Early-stage cavefish, which evolved from sighted surface ancestors and are completely blind, exhibit vestigial eye convergence movements during prey capture before their eyes fully degenerate (Espinasa et al., 2023; Espinasa & Lewis, 2023). This comparative evidence suggests that the teleost brain employs cues from both sensory modalities, if available, to locate prey, but that eye movements are a poor predictor of the dominance of vision.

On the other hand, species that attack centrally located prey could be using different visual cues to judge prey distance than those that use a side swing to ingest food: binocular disparity (Qian, 1997) vs. motion parallax (Yoonessi & Baker, 2011). Zebrafish use binocular information (Gahtan et al., 2005; Gebhardt et al., 2019; Henriques et al., 2019) and stationary differences such as brightness and contrast (Khan et al., 2023) to estimate depth and distance. Here we have shown that cichlid larvae also likely use binocular cues, while medaka’s approach to prey conforms to a monocular strategy (see Fig. 4M, N). Strikingly, cichlid and zebrafish larvae swim in bouts with intermittent pauses. We speculate that the pauses may serve to sample the distance to the prey by comparing its position across the two eyes. In contrast, to sample motion cues, it is advantageous to perform continuous glides (“motorboating”) and swim lateral to the prey up until it is close enough for a sideways strike. Thus, natural selection may have favored specific locomotor (bouts vs. glides) and oculomotor (convergent vs. divergent) adaptations of behavioral control depending on the dominant distance measurement mechanism (binocular vs. monocular) (Fig. 5). While the behavioral evidence for this conjunction is still circumstantial and calls for further comparative work, this scenario makes testable predictions about its neurobiological and genetic implementation.

We have demonstrated that eye convergence is not always a hallmark of prey capture in fish larvae, and that multiple, kinematically distinct capture swims exist at these early stages in different species. We speculate that ocular vergence angles and swim kinematics for prey capture might co-evolve, with S-shaped captures and attack swims being associated with convergent eyes and side-swing captures occurring in the absence of eye convergence. These two strategies may have been present in a common ancestor of teleosts, with one or the other becoming dominant in certain lineages. Comparing neural circuitry across these (now) experimentally tractable animals could help pinpoint evolvable synaptic network nodes (“hotspots” of evolutionary change; Roberts et al., 2022; Seeholzer et al., 2018) and genetic loci that underlie prey localization strategies and swim kinematics in fishes.

Methods

Key Resource Table

Experimental animals

The following species were used in this study: Lamprologus ocellatus (n = 14), Lepidiolamprologus attenuatus (n = 12), Neolamprologus multifasciatus (n = 22) and Astatotilapia burtoni (n = 14); medaka, Oryzias latipes (n = 3). Adult fish were raised at the Max Planck Institute for Biological Intelligence. The cichlid colonies were originally provided by the Max Planck Institute for Animal Behavior, Konstanz, Germany. L. ocellatus were housed in 50 l tanks with one male and two females and a shell for each individual for egg laying. L. attenuatus were housed in groups (10-15 mixed-sex individuals) within a 50 l tank with a half flower pot for egg laying. N. multifasciatus were housed in large groups (20-30 mixed-sex individuals) within a 160 l tank with shells for egg laying. A. burtoni were housed in 160 l tanks in groups of 15 (1 male and 14 females) with plants to encourage courtship arenas. O. latipes were raised like zebrafish. All behavioral observations were non-invasive and were performed in accordance with applicable laws and governmental regulations.

Adult cichlids were fed live artemia, kept under a 13 hour: 11 hour day-night cycle in 27°C water (pH 8.2, conductivity ∼550 µS) on sand substrate. Egg-laying behavior was monitored, and clutches were collected at 7-8 days post-fertilization (dpf) from the shell (L. ocellatus, N. multifasciatus), the flower pot half (L. attenuatus), or gently released from the mouth of the mother (A. burtoni). O. latipes embryos were harvested from natural crosses. Clutches were incubated at 28°C in sterile petri dishes with filtered fish facility water. Larvae were assayed as soon as they displayed robust feeding behaviors: L. ocellatus and L. attenuatus at 12 dpf, N. multifasciatus at 14 dpf, A. burtoni at 14-16 dpf, and O. latipes at 10 dpf.

Video acquisition

High-speed (300-500 frames per second) videography data of single larvae of each species were collected as described previously (Mearns et al., 2020). Behavior chambers were molded out of 2% agarose and measured 30 mm x 30 mm for cichlids or 20 mm x 20 mm for medaka. Single larvae were introduced into chambers with a drop of food culture (Artemia salina for cichlids, ∼30 artemia each; Paramecium multimicronucleatum (Carolina Biological Supply Company, Burlington, NC) for medaka, which were too small to be fed artemia). Each animal was recorded for 10-15 minutes using a high-speed camera (EoSens CL MC1362, Mikrotron; objective: Sigma 50 mm f/2.8 ex DG Macro, Japan), backlit with a custom-built infrared LED array.

Tracking

To track the tail and eyes, we trained neural networks using SLEAP (Social LEAP Estimates Animal Poses) (Pereira et al., 2022). We designed a 12-point skeleton to track the tail (left eye center, right eye center, rostral tip, swim bladder and eight points along the tail), and a 7-point skeleton to track the eyes (swim bladder, and a nasal, central and temporal point on the perimeter of each eye).

Tail tracking

For tail tracking, a single instance model was trained for L. attenuatus on a total of 701 labeled frames from 8 videos across 8 fish. 290 frames were manually labeled and 411 were corrected across two rounds of human-in-the-loop training. This model was used to track the tail in L. ocellatus and N. multifasciatus in addition to L. attenuatus. To track A. burtoni and medaka, the L. attenuatus base model was trained with an additional 345 and 190 frames, respectively, from each of these species.

Eye tracking

For the eyes, we first tracked 4,900 frames of L. attenuatus behavior using an unsupervised tracking algorithm. These points were used to train a base model. This base model was further refined for each species using human-in-the-loop training on additional frames: 402 for A. burtoni, 331 for O. latipes, and 314 for L. ocellatus (the L. ocellatus model was also used to track the eyes in N. multifasciatus).

Prey tracking

Artemia were tracked using a single neural network developed for object detection, YOLO (You Only Look Once) (Redmon et al., 2016). We trained the model on 128 frames. Due to the lower prey contrast in the A. burtoni and medaka videos, we could not track prey location in these videos. Prey locations prior to the medaka side swing were manually annotated in 17 frames where the targeted paramecium was clearly visible.

To obtain continuous tracks of the most likely targeted prey, we tracked prey location across all frames of episodes of eye convergence. We assigned identities to individual artemia across frames using the Hungarian algorithm. We assumed that the artemia nearest to the larva in the anterior visual field in the 100 ms leading up to eye deconvergence was the targeted prey.
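As an illustration of frame-to-frame identity assignment with the Hungarian algorithm, the following is a minimal sketch and not the code used in the study. It assumes prey detections are available as (x, y) centroids per frame; the function name `match_detections` and the gating distance `max_dist` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_detections(prev_xy, curr_xy, max_dist=20.0):
    """Match prey detections between two consecutive frames.

    prev_xy, curr_xy: (n, 2) and (m, 2) arrays of centroids in pixels.
    max_dist: hypothetical gating distance (not a value from the study);
    matches further apart than this are discarded.
    Returns a list of (prev_index, curr_index) pairs.
    """
    prev_xy = np.asarray(prev_xy, dtype=float)
    curr_xy = np.asarray(curr_xy, dtype=float)
    # Pairwise Euclidean distances form the assignment cost matrix.
    cost = np.linalg.norm(prev_xy[:, None, :] - curr_xy[None, :, :], axis=-1)
    # Hungarian algorithm: minimizes the summed distance over all matches.
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
```

Chaining these pairwise matches across frames gives one continuous track per artemia; unmatched detections would start new tracks.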

Analysis

All analysis was performed using custom-written Python code (libraries: numpy, pandas, scikit-learn, scipy, opencv).

Eye convergence analysis

Computing eye convergence angles

The angle of each eye in each frame was calculated as the angle between the heading vector of the fish (vector between the swimbladder and the midpoint of the two eyes) and the long axis of the eye (vector between the temporal and nasal points). Zero signifies that the long axis of the eye is parallel with the heading vector of the fish; positive angles signify nasalward rotation. The convergence angle was computed as the sum of these two angles (positive convergence indicates both eyes are rotated inwards towards the midline). Eye angles of < -10° (diverged) or > 60° (converged) indicated tracking errors and were excluded from further analysis.
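The angle definitions above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: each eye is passed as a dictionary with hypothetical keys `nasal`, `center` and `temporal`, and the sign of the rotation is resolved from which side of the body axis the eye lies on, so the result does not depend on whether the image y-axis points up or down.

```python
import numpy as np

def _cross(a, b):
    """z-component of the 2D cross product."""
    return a[0] * b[1] - a[1] * b[0]

def nasalward_angle(heading, origin, center, nasal, temporal):
    """Rotation of one eye relative to heading, in degrees (nasalward > 0)."""
    axis = np.asarray(nasal, float) - np.asarray(temporal, float)
    # Signed angle from the heading vector to the long axis of the eye.
    signed = np.degrees(np.arctan2(_cross(heading, axis), np.dot(heading, axis)))
    # Side of the body axis the eye sits on; flips the sign so that
    # rotation towards the midline is positive for both eyes.
    side = np.sign(_cross(heading, np.asarray(center, float) - origin))
    return -side * signed

def convergence_angle(swim_bladder, left, right):
    """Sum of the two eye angles; also returns the per-eye angles."""
    origin = np.asarray(swim_bladder, float)
    mid = (np.asarray(left["center"], float) + np.asarray(right["center"], float)) / 2
    heading = mid - origin
    angles = [nasalward_angle(heading, origin, e["center"], e["nasal"], e["temporal"])
              for e in (left, right)]
    return angles[0] + angles[1], angles
```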

Gaussian mixture modeling

To determine when the eyes were converged, we fitted a Gaussian mixture model to the distribution of convergence angles for each species. We tested models with either one or two mixtures to determine whether any discernible eye convergence was present. We used the Bayesian information criterion (BIC) to determine whether one or two mixtures resulted in a better fit. For all cichlid species, a two-mixture model outperformed a single Gaussian distribution. Convergence phases coincided with hunting events. For medaka (O. latipes), a two-mixture model performed marginally worse than the single distribution model.
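A model comparison of this kind could be sketched with scikit-learn as follows. The function name `compare_mixtures` and the fixed random seed are assumptions made for illustration; a lower BIC indicates the preferred model.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def compare_mixtures(convergence_deg, seed=0):
    """Fit one- and two-component Gaussian mixtures and compare them by BIC.

    convergence_deg: 1D array of convergence angles (degrees) for one species.
    Returns the preferred number of components, the BIC values, and the models.
    """
    x = np.asarray(convergence_deg, dtype=float).reshape(-1, 1)
    models = {k: GaussianMixture(n_components=k, random_state=seed).fit(x)
              for k in (1, 2)}
    bic = {k: m.bic(x) for k, m in models.items()}
    best_k = min(bic, key=bic.get)  # lower BIC = better trade-off of fit and complexity
    return best_k, bic, models
```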

Identification of hunting episodes

To identify periods of hunting in cichlids, we computed the probability of eye convergence in each frame as the probability it belonged to the Gaussian mixture with the higher mean. This yielded a time series of eye convergence probability, which we smoothed with a 200 ms rolling mean. We defined a hunting episode as any continuous period where the probability of eye convergence was above 50%, with the additional condition that the convergence probability was above 80% for at least half this period.
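The episode definition can be illustrated with a short sketch. It assumes the per-frame convergence probability has already been computed (for example with `predict_proba` of a fitted scikit-learn `GaussianMixture`); the function name and the centered, zero-padded rolling mean are choices made here, not details confirmed by the text.

```python
import numpy as np

def hunting_episodes(p_conv, fps, window_ms=200.0, low=0.5, high=0.8):
    """Find hunting episodes from a per-frame eye convergence probability.

    Returns the smoothed trace and a list of (start, stop) frame indices,
    with stop exclusive.
    """
    p = np.asarray(p_conv, dtype=float)
    win = max(1, int(round(window_ms * fps / 1000.0)))
    # Centered rolling mean; edges are implicitly zero-padded (an assumption).
    smooth = np.convolve(p, np.ones(win) / win, mode="same")
    above = smooth > low
    # Locate the boundaries of each continuous run above the low threshold.
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    episodes = []
    for start, stop in zip(edges[::2], edges[1::2]):
        # Keep the run only if at least half of it exceeds the high threshold.
        if np.mean(smooth[start:stop] > high) >= 0.5:
            episodes.append((int(start), int(stop)))
    return smooth, episodes
```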

Tail kinematic analysis

We tracked eight equally spaced points from the tail base to the tail tip in each frame using the whole body SLEAP model (see above). In each frame, we represented the shape of a tail as a seven-dimensional vector, computed by calculating the angle between a line through each consecutive pair of tail points and the heading vector (from the swimbladder to the midpoint between the eyes).
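As a sketch of this representation, the function below turns eight tail points into seven segment angles. It is illustrative only: the signed-angle convention, and measuring each segment against the caudal extension of the heading axis so that a straight tail gives zeros, are assumptions.

```python
import numpy as np

def tail_angles(tail_xy, swim_bladder, eye_mid):
    """Seven segment angles (radians) for eight tail points, base to tip.

    tail_xy: (8, 2) array of tail points ordered from base to tip.
    swim_bladder, eye_mid: (x, y) points defining the heading vector.
    """
    tail_xy = np.asarray(tail_xy, dtype=float)
    heading = np.asarray(eye_mid, dtype=float) - np.asarray(swim_bladder, dtype=float)
    segments = np.diff(tail_xy, axis=0)  # (7, 2) vectors between consecutive points
    # Tail segments point caudally, so compare them with the reversed heading:
    # a tail lying straight behind the fish then has an angle of zero.
    ref = -heading
    cross = ref[0] * segments[:, 1] - ref[1] * segments[:, 0]
    dot = segments @ ref
    return np.arctan2(cross, dot)
```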

Principal component analysis (PCA) and data curation

To reduce the dimensionality of pose data, we performed PCA as described previously (Mearns et al., 2020). We concatenated the seven-dimensional tail vectors in time (all animals pooled, grouped by species, or for all species combined). We z-scored the data matrix prior to PCA by subtracting the mean angle at each point and dividing by the standard deviation. This ensures equal weighting of all points of the tail during PCA, since otherwise loadings in the principal components would be biased towards points closer to the tail tip, potentially missing subtle but important curvature in the rostral trunk.

PCA provides a set of principal components (PCs; orthogonal vectors, representing some loading across the input dimensions), each associated with an eigenvalue (variance in the data captured by that component). Each principal component can be represented as an “eigenfish” by mapping the raw loadings multiplied by some scalar back into the original space (Mearns et al., 2020). In all cases, we found that three components were sufficient to capture over 90% of the variance in the data, and so for subsequent analyses we represented tail kinematics by a time varying three-dimensional vector representing the projection of the tail shape onto the first three PCs in each frame.
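The z-scoring, PCA and "eigenfish" steps could look roughly like the following sketch, written with scikit-learn. The function names and the scalar `scale` are hypothetical; the input is assumed to be a frames-by-seven matrix of tail angles.

```python
import numpy as np
from sklearn.decomposition import PCA

def fit_tail_pca(angles, n_components=3):
    """Z-score each tail point, then project onto the leading PCs.

    angles: (n_frames, 7) array of tail angles, all frames concatenated in time.
    Returns the fitted PCA, the per-frame PC scores, and the (mean, std) used.
    """
    angles = np.asarray(angles, dtype=float)
    mean, std = angles.mean(axis=0), angles.std(axis=0)
    z = (angles - mean) / std  # equal weighting of every point along the tail
    pca = PCA(n_components=n_components).fit(z)
    return pca, pca.transform(z), (mean, std)

def eigenfish(pca, mean, std, component=0, scale=3.0):
    """Map a scaled PC loading back into tail-angle space for plotting."""
    return mean + scale * pca.components_[component] * std
```

The cumulative variance captured by the retained components is available as `pca.explained_variance_ratio_.sum()`.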

Since we were unable to track every tail point in every frame of our data, we performed some interpolation to fill in missing frames in our tracking. For any period where there was a gap of four or fewer frames (≤ 10 ms) of missing tracking data, we inferred the tail shape using a cubic spline interpolation in each PC dimension. Gaps greater than four frames were excluded from our analyses. We found that tracking errors were easiest to detect by inspecting the weight of the third PC in each frame. Therefore, we excluded frames where the absolute value of this PC exceeded 10 standard deviations (or 5, in the case of A. burtoni). We linearly interpolated tail kinematics so that all trials had a common sample rate in cases where the frame rates of videos did not match.
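A minimal sketch of the gap-filling step is shown below, assuming missing frames are marked with NaN in a frames-by-PCs array. The function name is hypothetical, and fitting a single spline through all tracked frames is a simplification chosen here for brevity.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def fill_short_gaps(pcs, max_gap=4):
    """Fill gaps of up to max_gap frames with a cubic spline in each PC.

    pcs: (n_frames, n_pcs) array with NaN rows where tracking failed.
    Longer gaps, and gaps touching either end, are left as NaN.
    """
    pcs = np.array(pcs, dtype=float)
    missing = np.isnan(pcs).any(axis=1)
    good = np.flatnonzero(~missing)
    # One spline per PC dimension, fitted through every tracked frame.
    spline = CubicSpline(good, pcs[good], axis=0)
    filled = pcs.copy()
    padded = np.concatenate(([False], missing, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    for start, stop in zip(edges[::2], edges[1::2]):
        interior = start > 0 and stop < len(pcs)
        if interior and (stop - start) <= max_gap:
            filled[start:stop] = spline(np.arange(start, stop))
    return filled
```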

Segmentation

We adapted our previously published bout detection algorithm for zebrafish larvae (Mearns et al., 2020) to segment the more continuous swimming behavior of the larval species used in this study. This algorithm involves five steps:

  1. The tail was represented as a one-dimensional time series by computing the L2 norm in PC space for each frame.

  2. The absolute value of the derivative of this trace was convolved with a Gaussian kernel (standard deviation: 40 ms). This produces a smooth trace with peaks whenever the shape of the tail is changing rapidly in a non-cyclic manner (i.e., at transitions between behaviors).

  3. A threshold was automatically computed by performing a kernel density estimation across all values of this trace (bandwidths: 0.005, NM, AB & OL; 0.01, LA; 0.02, LO). If this generated a bimodal distribution, we set the threshold to the antimode (local minimum) of this distribution; otherwise, we set the threshold to the first inflexion point after the mode.

  4. Peaks in the traces (step 2) that were above the threshold (step 3) were detected (scipy.signal.find_peaks function). Bouts were segmented at local minima between these peaks.

  5. We performed a round of cleaning to exclude noisy bouts. For each bout, we computed the standard deviation in each PC dimension (low standard deviations in every dimension suggest no tail movement but rather noise in tracking). We then fit a Gaussian mixture model to the three-dimensional space of these standard deviations, and excluded bouts predicted to belong to the mixture with the smallest standard deviation across PC dimensions. For each species we manually set the number of mixture components to exclude most noisy bouts while keeping real movement in the dataset.

This pipeline identified the following numbers of bouts in each species: 2,067 (LO); 11,237 (NM); 10,445 (LA); 9,796 (AB); 3,437 (OL).
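Steps 1 to 4 of the segmentation procedure can be sketched as follows. This is an approximation under stated assumptions, not the published implementation: the kernel density estimate uses SciPy's default (Scott's rule) bandwidth rather than the per-species bandwidths listed above, bimodality is judged by counting local maxima of the density, and all function names are hypothetical. The cleaning round (step 5) is omitted.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde

def change_trace(pcs, fps, sigma_ms=40.0):
    """Steps 1-2: smoothed absolute derivative of the L2 norm in PC space."""
    norm = np.linalg.norm(np.asarray(pcs, dtype=float), axis=1)
    speed = np.abs(np.gradient(norm))
    return gaussian_filter1d(speed, sigma=sigma_ms * fps / 1000.0)

def kde_threshold(trace, n_grid=512):
    """Step 3: antimode if the density is bimodal, else the first
    inflexion point after the mode."""
    trace = np.asarray(trace, dtype=float)
    kde = gaussian_kde(trace)  # default bandwidth: an assumption
    pad = 3 * kde.factor * trace.std()  # let the density fall off at both ends
    grid = np.linspace(trace.min() - pad, trace.max() + pad, n_grid)
    density = kde(grid)
    modes, _ = find_peaks(density)
    if len(modes) >= 2:
        # Take the two most prominent modes and return the minimum between them.
        lo, hi = np.sort(modes[np.argsort(density[modes])[-2:]])
        return grid[lo + int(np.argmin(density[lo:hi + 1]))]
    mode = int(np.argmax(density))
    curvature = np.gradient(np.gradient(density, grid), grid)
    convex = np.flatnonzero(curvature[mode:] > 0)  # concave-to-convex transition
    return grid[mode + convex[0]] if len(convex) else grid[mode]

def segment_bouts(trace, threshold):
    """Step 4: peaks above threshold, cut at the minimum between neighbours."""
    trace = np.asarray(trace, dtype=float)
    peaks, _ = find_peaks(trace, height=threshold)
    cuts = [p0 + int(np.argmin(trace[p0:p1 + 1]))
            for p0, p1 in zip(peaks[:-1], peaks[1:])]
    return peaks, np.asarray(cuts, dtype=int)
```

In use, the output of `change_trace` would be passed to `kde_threshold`, and both to `segment_bouts`; consecutive cut points then delimit individual bouts.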

Alignment

Having segmented behavior into individual bouts, we sought to align these bouts for clustering and embedding. Our segmentation algorithm produced bouts of varying duration; however, we reasoned that the most important features of a bout that would distinguish it from others would occur when the tail is in the “most extreme” or characteristic position, so we aligned all bouts to their peak distance in PC space (see segmentation step 1). Then, we truncated or zero-padded the aligned bouts such that we fully encompassed 80% of bout starts (before the peak) and 80% of bout ends (after the peak).
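Peak alignment with truncation or zero-padding might be implemented along these lines. The sketch assumes each bout is a frames-by-PCs array; the function name and the use of rounded percentiles to set the window are illustrative choices.

```python
import numpy as np

def align_bouts(bouts, coverage=80):
    """Align bouts to their peak distance in PC space, then truncate or zero-pad.

    bouts: list of (n_frames_i, n_pcs) arrays of varying length.
    The window keeps `coverage` percent of bout starts and ends in full.
    Returns an (n_bouts, pre + post, n_pcs) array and the peak index `pre`.
    """
    peaks = [int(np.argmax(np.linalg.norm(b, axis=1))) for b in bouts]
    pre = int(round(np.percentile(peaks, coverage)))
    post = int(round(np.percentile([len(b) - p for b, p in zip(bouts, peaks)],
                                   coverage)))
    aligned = np.zeros((len(bouts), pre + post, bouts[0].shape[1]))
    for i, (b, p) in enumerate(zip(bouts, peaks)):
        start, stop = max(0, p - pre), min(len(b), p + post)
        # Place the bout so that its peak always lands on index `pre`.
        aligned[i, pre - (p - start):pre + (stop - p)] = b[start:stop]
    return aligned, pre
```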

To aid the identification of behavioral clusters, we chose to ignore the left/right orientation of bouts. Therefore, we flipped bouts such that at their peaks the tail was always oriented in the same direction. We performed an initial clustering step using Ward hierarchical clustering with 30 clusters. For this clustering step, we included each bout and its mirror image to produce symmetric clusters. To determine the directionality of a bout, we computed the mean value of the first PC at the peak of each cluster and used either the original or mirrored version of the bout accordingly. We found that flipping bouts based on cluster was more robust than flipping each bout individually, since in many instances the value of the first PC is close to zero at the bout peak, and we wanted to ensure all similar bouts were oriented in the same direction to aid embedding and subsequent clustering.

Clustering

To identify behavioral clusters, we performed affinity propagation on the aligned, flipped kinematic motifs. We treated each motif as a single vector, concatenating the time series for each PC. Affinity propagation computes a set of clusters (represented by “exemplar” data points) for a given dataset at a given preference value. Lower preference values produce fewer clusters, so we used the minimum similarity between data points, which is a standard metric for computing a preference value (Frey & Dueck, 2007). This approach yielded between 11 and 32 clusters per species (11, LO; 23, NM; 29, LA; 32, AB; 17, OL), which is a reasonable number to expect in these data, being the same order as the number of behaviors that have been identified in zebrafish larvae (Marques et al., 2018).
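A sketch of this clustering step with scikit-learn is given below. It assumes the similarity between motifs is the negative squared Euclidean distance (the measure used by Frey & Dueck and scikit-learn's default); the damping and iteration settings are choices made here to help convergence, not values from the study.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics.pairwise import euclidean_distances

def cluster_motifs(aligned, seed=0):
    """Affinity propagation with the preference set to the minimum similarity.

    aligned: (n_bouts, n_frames, n_pcs) array of aligned, flipped motifs.
    Returns cluster labels and the indices of the exemplar motifs.
    """
    n = len(aligned)
    # One vector per motif: the time series of each PC, concatenated.
    X = np.transpose(aligned, (0, 2, 1)).reshape(n, -1)
    S = -euclidean_distances(X, squared=True)  # similarity matrix
    ap = AffinityPropagation(affinity="precomputed", preference=S.min(),
                             damping=0.9, max_iter=500, random_state=seed)
    labels = ap.fit_predict(S)
    return labels, ap.cluster_centers_indices_
```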

Embedding

To visualize behavior maps, we embedded motifs in a two-dimensional space using t-distributed stochastic neighbor embedding (t-SNE) (Maaten & Hinton, 2008), concatenating the time series for each PC such that each motif was represented by a single vector.

Prey capture bouts

To assign bouts a prey capture score, we computed the probability of eye convergence (see eye convergence analysis, above) at the peak of each bout. We then averaged these probabilities over all bouts belonging to a given cluster to compute the prey capture score for that cluster. Thus, a prey capture score of zero signifies zero probability of eye convergence for all bouts within a cluster, and a prey capture score of one indicates a 100% probability of eye convergence for all bouts within a cluster.
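The score is a per-cluster average, which could be computed as in this short sketch (the function name is hypothetical):

```python
import numpy as np

def prey_capture_scores(labels, p_conv_at_peak):
    """Mean eye convergence probability at the bout peak, per cluster.

    labels: cluster label of each bout.
    p_conv_at_peak: eye convergence probability at the peak of each bout.
    Returns a dict mapping cluster label to prey capture score (0 to 1).
    """
    labels = np.asarray(labels)
    p = np.asarray(p_conv_at_peak, dtype=float)
    return {int(k): float(p[labels == k].mean()) for k in np.unique(labels)}
```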

Capture strike analysis

Identifying putative strikes

To identify putative capture strikes in cichlids, we found all bouts whose peak coincided with the end of a hunting episode (see eye convergence analysis, above), allowing the peak to occur up to 100 ms before and up to 200 ms after the detected end of eye convergence. Since we could not use eye convergence to detect prey capture in medaka, we manually inspected clusters obtained from analyzing tail movements, and found that one of these clusters corresponded to a strike behavior (side swing). For cross-species comparisons, all data were interpolated to a common frame rate of 400 fps. We performed an initial round of clustering on these putative strike bouts (Ward hierarchical clustering with n clusters = 8) and removed small clusters containing fewer than 20 bouts. After this filtering, we had 341 putative strike bouts (16, LO; 31, NM; 191, LA; 59, AB; 44, OL).

Clustering strikes

To only consider tail kinematics around the moment of the strike, we truncated bouts to a 150 frame window (375 ms) around the peak. We then clustered these bouts with Ward hierarchical clustering, with the number of clusters set to five (same as the number of species). To determine whether each species had its own unique strike behavior, or whether the same behaviors were shared across species, we computed the confusion matrix using the cluster labels and species identity labels. Two clusters corresponded to abort behaviors in cichlids, two clusters corresponded to cichlid capture strikes (attack swim and capture spring), and one cluster corresponded to medaka side swings.
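The clustering and cross-tabulation can be illustrated as follows. This sketch assumes strikes are stored as a bouts-by-frames-by-PCs array with one species label per bout; the function name and the layout of the table (species in rows, clusters in columns) are choices made here.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_strikes(strikes, species, n_clusters=5):
    """Ward clustering of strike bouts, cross-tabulated against species.

    strikes: (n_bouts, n_frames, n_pcs) array of truncated strike bouts.
    species: sequence of species labels, one per bout.
    Returns cluster labels, the sorted species names, and a count table.
    """
    X = np.asarray(strikes, dtype=float).reshape(len(strikes), -1)
    labels = AgglomerativeClustering(n_clusters=n_clusters,
                                     linkage="ward").fit_predict(X)
    names = sorted(set(species))
    table = np.zeros((len(names), n_clusters), dtype=int)
    for s, k in zip(species, labels):
        table[names.index(s), k] += 1  # rows: species, columns: clusters
    return labels, names, table
```

A cluster populated by a single species shows up as a column with one non-zero row; shared behaviors appear as columns with counts in several rows.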

Change in heading during prey capture

To compute the change in heading leading up to a strike, we considered a 200 ms window before the peak of each strike bout. So as to not bias our analysis to the rapid changes in heading at the moment of the medaka side swing, we only analyzed heading changes up to 25 ms before the peak of each bout. We virtually rotated and mirrored all capture sequences such that the heading 25 ms before the bout peak was zero, and the mean heading prior to this moment was positive. For each event, we then computed the slope of a regression line (least squares fit).
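A sketch of this computation for a single strike is given below. It assumes heading is available as a per-frame angle in radians sampled at 400 fps, and that the peak lies at least 200 ms into the recording; the function name is hypothetical.

```python
import numpy as np

def heading_slope(heading, peak_idx, fps=400.0, window_ms=200.0, exclude_ms=25.0):
    """Slope (rad/s) of heading over the approach to a strike.

    heading: 1D array of heading angles in radians, one per frame.
    peak_idx: frame index of the strike bout peak.
    """
    n_win = int(round(window_ms * fps / 1000.0))
    n_exc = int(round(exclude_ms * fps / 1000.0))
    # Window from 200 ms before the peak up to 25 ms before the peak.
    seg = np.unwrap(np.asarray(heading[peak_idx - n_win:peak_idx - n_exc + 1],
                               dtype=float))
    seg = seg - seg[-1]  # rotate: heading is zero 25 ms before the peak
    if seg.mean() < 0:   # mirror: mean earlier heading is positive
        seg = -seg
    t = np.arange(len(seg)) / fps
    slope, _ = np.polyfit(t, seg, 1)  # least-squares regression line
    return float(slope)
```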

Additional information

Author contributions

D.S.M. and H.B. conceived the project. D.S.M. and M.W.S. collected the data. S.A.H. trained models and performed tracking. A.V.P. and M.S. maintained cichlid colonies and provided larvae for experiments. D.S.M. and S.A.H. analyzed the data. D.S.M. and H.B. wrote the manuscript with input from all authors.

Competing interests

No competing interests declared.

Funding

Funding was provided by the Max Planck Society. M.W.S. was supported by a Boehringer Ingelheim Fonds Fellowship.

Acknowledgements

We thank Alex Jordan for sharing cichlid stocks and Joachim Wittbrodt for sharing medaka stocks; Jessica Zung, Qing Wang and Greg Marquart for critical feedback on the manuscript; Abdelrahmen Adel for technical assistance training neural networks; Krasimir Slanchev and the animal caretakers at the Max Planck Institute for Biological Intelligence for assistance raising and breeding fish; and Julia Kuhl for illustrations.