Figures and data

Megabouts pipeline overview and performance
(a) Overview of Megabouts’ pipeline. Megabouts can process either high-resolution tail and trajectory tracking or low-resolution trajectory tracking. The pipeline begins with data loading and preprocessing, followed by swim bout identification via a segmentation algorithm. A Transformer neural network then classifies each swim bout and predicts the location and sign of the first tail beat. (b) Behavioral repertoire of zebrafish larvae, composed of 13 swim bout categories. Left: Exemplar tail angle time series for each category, representing the bouts closest to the category mean. Right: Corresponding head trajectories, with arrows indicating final head orientation. (c) Performance of the Transformer neural network. Top: Balanced classification accuracy at different frame rates. The downsampled input data were presented either with or without tail masking. Accuracy remains an order of magnitude above chance level (7.7%) in all conditions. Bottom: 95% confidence interval for the predicted location of the first tail beat, which the Transformer predicts alongside the bout category. For frame rates below 100 fps, the prediction error is smaller than the data sampling interval, achieving super-resolution.
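As a minimal sketch of the two stages described above (segmentation, then per-bout classification), the snippet below runs on synthetic data; all names, thresholds, and the stub classifier are illustrative placeholders, not the actual Megabouts API.

```python
# Minimal sketch of the pipeline stages in (a), on synthetic data.
# All names and thresholds are illustrative placeholders, not the
# actual Megabouts API.
import numpy as np

rng = np.random.default_rng(0)
tail_angles = rng.standard_normal((7000, 7))  # 10 s of 7 tail angles at 700 fps
vigor = np.abs(tail_angles).sum(axis=1)       # crude stand-in for tail vigor

# 1) Segmentation: threshold the vigor to find bout onsets and offsets.
active = vigor > np.quantile(vigor, 0.9)
edges = np.diff(active.astype(int))
onsets = np.where(edges == 1)[0] + 1
offsets = np.where(edges == -1)[0] + 1

# 2) Classification: a stub standing in for the Transformer, which returns
#    a bout category plus the location and sign of the first tail beat.
def classify(bout):
    return {"category": 0, "first_beat_idx": 0, "first_beat_sign": +1}

bouts = [classify(tail_angles[on:off]) for on, off in zip(onsets, offsets)]
```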

Sensorimotor coupling and behavioral phenotyping
(a) Sensorimotor coupling. Conditional probability of swim bout occurrence given a stimulus, computed from 1.95 million bouts from 108 larvae. Bout categories are color-coded as in Fig. 1b. For directional stimuli, bouts are split into ipsilateral and contralateral based on the direction of the first tail beat relative to the stimulus. (b) Temporal dynamics of swim bouts recorded using the Zebrabox. Average frequency of swim bout initiation for each category as a function of time relative to the light on/off stimuli, averaged over 4 trials for N=372 larvae. The subplots for each category share the same y-scale. (c) Enhanced phenotyping of neuroactive drugs. A MiniRocket classifier was trained on three behavioral time series: binary movement (top), locomotion speed (middle), and Megabouts action categories (bottom). The vertical dotted line indicates chance level (11%).
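A common way to implement such a MiniRocket classifier is random convolutional features followed by a linear (ridge) readout; the sketch below assumes sktime’s MiniRocket implementation, and the data shapes, labels, and train-on-train evaluation are synthetic stand-ins only.

```python
# Sketch of a MiniRocket classifier as in (c): random convolutional
# features followed by a linear (ridge) readout. Assumes sktime and
# scikit-learn; the data are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sktime.transformations.panel.rocket import MiniRocket

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 1, 500))  # (n_larvae, n_channels, n_timepoints)
y = rng.integers(0, 9, size=60)        # one of 9 drug labels -> ~11% chance level

features = MiniRocket().fit_transform(X)  # ~10,000 convolutional kernels
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)).fit(features, y)
print(clf.score(features, y))  # training score; use held-out larvae in practice
```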

Comparison of Megabouts Bout Classification with Labels from Mearns et al., 2020
This contingency table compares swim bout categories classified by Megabouts (rows) with the labels from Mearns et al., 2020 (columns), computed from approximately 40,000 bouts in the Mearns et al., 2020 dataset. Each cell shows the proportion of bouts in a Megabouts category that match a given Mearns et al. category, normalized by the total number of bouts in that Megabouts category (values range from 0 to 1). Darker shades indicate higher correspondence. The differences, even between similarly named categories, highlight the need for the standardized classification that Megabouts provides.
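As a sketch, this row normalization corresponds to a cross-tabulation where each row sums to 1, e.g. pandas’ crosstab with normalize="index"; the category labels below are placeholders.

```python
# Sketch of the row-normalized contingency table: each row (Megabouts
# category) sums to 1. Category labels are illustrative placeholders.
import pandas as pd

bouts = pd.DataFrame({
    "megabouts": ["slow1", "slow1", "turn", "turn", "turn"],
    "mearns":    ["slow1", "slow2", "turn", "turn", "j_turn"],
})
table = pd.crosstab(bouts["megabouts"], bouts["mearns"], normalize="index")
print(table)  # proportions within each Megabouts category, in [0, 1]
```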

Tracking configurations compatible with Megabouts
Megabouts supports multiple tracking configurations: ‘tail tracking’ (for head-restrained conditions), ‘head tracking’, or ‘full tracking’. Head tracking requires two keypoints to estimate the position and orientation of the larva. Tail tracking requires at least four keypoints from the swim bladder to the tail tip. Data can be provided as keypoint coordinates or as posture variables, including head yaw and tail curvature.
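As an illustration of converting keypoints into posture variables, one common convention derives head yaw from the swim-bladder-to-head vector and tail angles from successive segment angles; everything in the sketch below is an assumption, not the Megabouts preprocessing code.

```python
# Sketch of deriving posture variables from keypoints (one common
# convention; not the actual Megabouts preprocessing).
import numpy as np

head = np.array([10.0, 5.0])         # head keypoint (x, y)
swim_bladder = np.array([9.0, 5.0])  # second keypoint for orientation

# Head yaw: angle of the swim-bladder-to-head vector.
dx, dy = head - swim_bladder
yaw = np.arctan2(dy, dx)

# Tail angles: angle of each tail segment relative to the heading.
tail = np.array([[9.0, 5.0], [8.0, 5.1], [7.0, 5.3], [6.0, 5.6]])  # >= 4 keypoints
seg = np.diff(tail, axis=0)
tail_angles = np.unwrap(np.arctan2(seg[:, 1], seg[:, 0])) - yaw
```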

Segmentation algorithm and performance
(a) Illustration of the segmentation algorithm. (i) High frame-rate segmentation. A 10-second sample of tail tracking at 700 fps is shown. Applying a threshold to the tail vigor identifies the onset (green line) and offset (red dot) of swim bouts. (ii) Downsampled to 80 fps with tail masking. The same data are downsampled to 80 fps. Trajectory speed is computed from the derivatives of the x, y, and yaw coordinates. First, a peak-finding algorithm locates the maximum trajectory speed; a threshold at 20% of this peak value then determines the onset and offset of each swim bout. (iii) Downsampled to 20 fps with tail masking. The same process as in (ii), but with downsampling to 20 fps. (b) Onset jitter. Distribution of the difference between onset times computed from tail angle vigor at 700 fps and from the trajectory at 20 fps, based on N=4 larvae from high-resolution recordings. (c) Segmentation accuracy. A reference segmentation was computed from the tail angle at 700 fps using 20 larvae and compared with segmentations obtained from trajectory-only data downsampled to frame rates between 20 and 700 fps. See Supp. Note 4.F.1.
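A sketch of the trajectory-based rule in (a.ii), assuming scipy’s peak finder; the synthetic speed trace and the threshold walk are illustrative.

```python
# Sketch of trajectory-based segmentation as in (a.ii): locate the peak
# trajectory speed, then bound the bout where speed drops below 20% of
# that peak. Synthetic data; names are illustrative.
import numpy as np
from scipy.signal import find_peaks

fps = 80
t = np.arange(0, 2, 1 / fps)
speed = np.exp(-((t - 1.0) ** 2) / 0.01)  # one bout-like speed bump

peaks, _ = find_peaks(speed, height=0.5)
for p in peaks:
    thr = 0.2 * speed[p]                       # 20% of the peak value
    onset = p - np.argmax(speed[p::-1] < thr)  # walk left until below threshold
    offset = p + np.argmax(speed[p:] < thr)    # walk right until below threshold
    print(f"bout from {onset / fps:.2f} s to {offset / fps:.2f} s")
```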

Transformer input and architecture
(a) Input data for the Transformer. For each swim bout, the first seven of the ten tail angles (γ1, …, γ7) are used as input, along with coordinates computed after bout onset, relative to the fish’s position and orientation at onset. (b) Transformer architecture. Each token [Tok ti] corresponds to the pose measured at time ti relative to movement onset. [Tok ti] is a vector of 11 values: the tail angles (γ1(ti), …, γ7(ti)), the x(ti) and y(ti) coordinates, and the head angle as cos(θ(ti)) and sin(θ(ti)). Missing measurements are replaced with a fixed learned value. After passing through an embedding layer, temporal embeddings are added. The tokens are then processed by three layers of 8-head self-attention. Finally, outputs are computed from the learned classification token [CLS]. (c) Illustration of the downsampling/masking data augmentation. The process begins with an example bout; each line corresponds to one token dimension. A random shift is applied to the segmentation window (two vertical black lines) to achieve translational invariance in bout classification and to jitter the first tail beat location (vertical orange line). The time series is then downsampled, and masking may omit the tail information.
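A minimal PyTorch sketch of the architecture in (b); the embedding width (d_model=64), maximum sequence length, and the exact parameterization of the two output heads are assumptions, not the published hyperparameters.

```python
# Minimal PyTorch sketch of (b): 11-dim pose tokens, a learned [CLS]
# token, temporal embeddings, and 3 layers of 8-head self-attention.
# d_model, max_len, and the output-head shapes are illustrative.
import torch
import torch.nn as nn

class BoutTransformer(nn.Module):
    def __init__(self, d_token=11, d_model=64, n_classes=13, max_len=200):
        super().__init__()
        self.embed = nn.Linear(d_token, d_model)
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))  # learned [CLS]
        self.time_embed = nn.Embedding(max_len, d_model)     # temporal embeddings
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.head_category = nn.Linear(d_model, n_classes)   # bout category
        self.head_first_beat = nn.Linear(d_model, 2)         # location and sign

    def forward(self, tokens):  # tokens: (batch, time, 11)
        x = self.embed(tokens)
        x = x + self.time_embed(torch.arange(x.shape[1], device=x.device))
        x = torch.cat([self.cls.expand(x.shape[0], -1, -1), x], dim=1)
        out = self.encoder(x)[:, 0]  # read out from the [CLS] position
        return self.head_category(out), self.head_first_beat(out)

logits, first_beat = BoutTransformer()(torch.randn(4, 120, 11))
```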

Classification accuracy and bout directionality
(a) Confusion matrices. Left: Confusion matrix at high resolution, corresponding to a balanced accuracy of 89.1%. Right: Confusion matrix using only trajectory data downsampled to 60 fps, corresponding to a balanced accuracy of 71.2%. (b) Performance on bout subcategories and directionality. Left: Balanced classification accuracy for swim bout subcategories across different frame rates. The downsampled input data were presented either with or without tail masking. Right: Accuracy of estimating the sign of the first tail half-beat amplitude, which determines the directionality of the movement.
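For reference, balanced accuracy is the mean of the per-class recalls, i.e. the average of the diagonal of a row-normalized confusion matrix; a toy scikit-learn check:

```python
# Toy check that balanced accuracy equals the mean of the row-normalized
# confusion-matrix diagonal (per-class recalls).
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(y_true, y_pred, normalize="true")  # rows sum to 1
assert np.isclose(cm.diagonal().mean(), balanced_accuracy_score(y_true, y_pred))
```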

Swim bouts by subcategory
(a) For each swim bout subcategory, 50 swim bouts were selected from those in the test dataset classified with high probability (>98%). (b) Probability of bouts from specific subcategories occurring given the presented stimulus. The O-bend subcategories O1 and O2 are shown in separate plots because their probabilities of occurrence differ markedly.

Occurrence of O-bend in response to light stimuli
Probability of O-bend occurrence over time for different well sizes. The probability was computed using N=2 wells per size and smoothed with a 1-s box-car filter.
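As a sketch, the box-car smoothing amounts to convolving the binary event series with a normalized window one second long; the frame rate and event series below are assumptions.

```python
# Sketch of 1-s box-car smoothing of the O-bend occurrence probability.
# The frame rate and event series are illustrative.
import numpy as np

fps = 25
events = np.random.default_rng(0).random(60 * fps) < 0.02  # binary O-bend events
kernel = np.ones(fps) / fps                                # 1-second box-car
prob = np.convolve(events.astype(float), kernel, mode="same")
```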

Head-restrained pipeline for motif analysis
(a) Illustration of the four motifs learned from the data: slow, fast, turn, struggle. Left: Heat map of each motif showing tail beat propagation across rostro-caudal segments over time. Right: The same data shown as time series, with contrast increasing along the rostro-caudal axis. (The motif color code is shared across a, b, and d.) (b) Illustration of sparse coding and motif vigor. Top: The original tail angle (black) is reconstructed by summing the four components (red). Middle: Each component is computed by convolving a sparse code with its motif. Bottom: The contribution of each motif is computed as the rolling variance of its component (40 ms window). (c) A grating moving at different speeds was displayed below a head-restrained larva. A virtual-reality system adjusted the grating speed depending on the fish’s tail speed. (d) For each motif, the vigor (left) or the proportion of motif recruitment (right) is shown as a function of grating speed (shaded area: mean ± s.e.m., N = 25 larvae).
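A sketch of the reconstruction and vigor computation in (b): each component is the convolution of a sparse code with its motif, and the vigor is a rolling variance over a 40 ms window (28 samples at 700 fps); the motif shape and code below are synthetic stand-ins.

```python
# Sketch of (b): component = sparse code convolved with its motif, and
# vigor = rolling variance of the component over a 40 ms window.
# Motif, codes, and sampling rate are synthetic stand-ins.
import numpy as np

fps, win = 700, 28                # 40 ms window at 700 fps
motif = np.hanning(70)            # stand-in for one learned motif
code = np.zeros(7000)
code[[1000, 3500]] = [1.0, -0.6]  # sparse activations
component = np.convolve(code, motif, mode="same")

pad = np.pad(component, win // 2, mode="edge")
vigor = np.array([pad[i:i + win].var() for i in range(component.size)])

# The full tail angle is reconstructed by summing such components over
# all four motifs.
```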