Performance for a benchmark of 33 videos of flies, zebrafish and mice.

a. Median IDF1 score computed using all images of animals in the videos excluding animal crossings. The videos are ordered by decreasing IDF1 score of the original idtracker.ai results for ease of visualization. b. Median tracking times are shown for the scale of hours and, in the inset, for the scale of days. The videos are ordered by increasing tracking times in the original idtracker.ai results for ease of visualization. Supplementary Table 1, Supplementary Table 2 and Supplementary Table 3 give more complete statistics (median, mean and 20-80 percentiles) for the original idtracker.ai (version 4 of the software), optimized v4 (version 5) and new idtracker.ai (version 6), respectively. The names of the videos in a. and b. start with a letter for the species (z, d, m), followed by the number of animals in the video, and possibly an extra number to distinguish the video if there are several of the same species and animal group size. Lines between points are for visualization purposes only.

Figure 1—figure supplement 1. Performance for the benchmark with full trajectories with animal crossings.
Figure 1—figure supplement 2. Boxplot representation of the benchmark results.
Figure 1—figure supplement 3. Robustness to blurring and light conditions.
Figure 1—figure supplement 4. Memory usage across the different software packages.

Tracking by identification using deep contrastive learning.

a Schematic representation of a video with five fish. b A ResNet18 network with 8 outputs generates a representation of each animal image as a point in an 8-dimensional space (here shown in 2D for visualization). Each pair of images corresponds to two points in this space, separated by a Euclidean distance. The ResNet18 network is trained to minimize this distance for positive pairs and maximize it for negative pairs. c 2D t-SNE visualizations of the learned 8-dimensional representation space. Each dot represents an image of an animal from the video. As training progresses, clusters corresponding to individual animals become clearer. Here we plot this process for the example video zebrafish_60_1 after training for 0, 2,000, 4,000 and 15,000 batches (each batch contains 400 positive and 400 negative pairs of images, that is, 1,600 images per batch). The t-SNE plot at 15,000 training batches is also shown color-coded by human-validated ground-truth identities. The pink rectangle at 2,000 batches of training highlights clear clusters and the orange square fuzzy clusters. d The Silhouette score measures cluster coherence and increases during training, as illustrated for a video with 60 zebrafish. e A Silhouette score of 0.91 corresponds to a human-validated error rate of less than 1% per image.
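The training objective described in panel b can be illustrated with a minimal sketch of a pairwise contrastive loss. This is an illustrative NumPy implementation, not the paper's actual loss function: the function name `contrastive_loss` and the margin value are assumptions; in practice the distances would be computed on the 8-dimensional ResNet18 embeddings and backpropagated through the network.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, is_positive, margin=10.0):
    """Pairwise contrastive loss on embedding batches of shape (pairs, dim).

    Positive pairs (two images of the same animal) are pulled together by
    penalizing their squared Euclidean distance; negative pairs (different
    animals) are pushed apart until their distance exceeds the margin.
    The margin value here is a hypothetical choice for illustration."""
    d = np.linalg.norm(emb_a - emb_b, axis=-1)   # Euclidean distance per pair
    pos = is_positive * d**2                     # minimize distance for positives
    neg = (1 - is_positive) * np.maximum(margin - d, 0.0)**2  # repel negatives
    return float((pos + neg).mean())

# Toy batch of two pairs in an 8-D space: one positive pair already collapsed
# to the same point, one negative pair already farther apart than the margin.
a = np.zeros((2, 8))
b = np.vstack([np.zeros(8), np.r_[20.0, np.zeros(7)]])
labels = np.array([1, 0])
print(contrastive_loss(a, b, labels))  # → 0.0, both pairs satisfy the objective
```

When a negative pair sits inside the margin, the loss grows quadratically with the remaining gap, which is what drives the cluster separation visible in the t-SNE plots of panel c.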

Tracking with strong occlusions.

Accuracies when we mask a region of a video defined by an angle θ, so that the tracking system has no access to the information behind the mask. Light and dark gray regions correspond to the angles for which no global fragments exist in the video. Dark gray regions correspond to angles for which the video has a fragment connectivity lower than 0.5, with the fragment connectivity defined as the average number of other fragments each fragment coexists with, divided by N - 1, with N the total number of animals; see Figure 3—figure supplement 1 for an analysis justifying this value of 0.5. The original idtracker.ai (v4) and its optimized version (v5) cannot work in the gray regions, and the new idtracker.ai is expected to deteriorate only in the dark gray region.

Figure 3—figure supplement 1. Fragment connectivity analysis.

Performance of original idtracker.ai (v4) in the benchmark.

Performance of optimized v4 (v5) in the benchmark.

Performance of new idtracker.ai (v6) in the benchmark.

Performance of TRex in the benchmark.

Performance for the benchmark with full trajectories with animal crossings.

a. Median IDF1 score computed using all images of animals in the videos including animal crossings. b. Median tracking times. Supplementary Table 1, Supplementary Table 2, Supplementary Table 3 and Supplementary Table 4 give more complete statistics (median, mean and 20-80 percentiles) for the original idtracker.ai (version 4 of the software), optimized v4 (version 5), new idtracker.ai (version 6) and TRex, respectively. Lines between points are for visualization purposes only.

Boxplot representation of the benchmark results.

IDF1 scores without (a) and with crossings (b) and tracking times (c) for the three versions of idtracker.ai and TRex.

Robustness to blurring and light conditions.

First column: Unmodified video zebrafish_60_1. Second column: zebrafish_60_1 with a Gaussian blur of 1 pixel, plus a resolution reduction to 40% of the original, plus MJPG video compression. Version 5 of idtracker.ai fails to track this video, with an 82% IDF1 score. Third column: Videos of 60 zebrafish with manipulated light conditions (same test as in idtracker.ai, Romero-Ferrero et al. (2019)). First row: Uniform light conditions across the arena (zebrafish_60_1). Second row: Similar setup but with lights off on the bottom and right sides of the arena.

Memory usage across the different software packages.

The solid line is a logarithmic fit to the (RAM) memory peak as a function of the number of blobs in a video. The Python tool psutil.virtual_memory().used was used to measure memory usage 5 times per second during tracking. The presented value is the maximum measured value minus the baseline measured before tracking starts. Disclaimer: both software packages include automatic optimizations that adjust based on machine resources, so results may vary on systems with less available memory. These results were measured on computers with the specifications in Methods.
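The measurement protocol described above (sample memory at 5 Hz, report peak minus a pre-tracking baseline) can be sketched as follows. This is a simplified illustration, not the benchmark's actual instrumentation: the function name `peak_memory_delta` is hypothetical, and the probe is passed in as a callable so the sketch stays self-contained; in the paper's setup the probe would be `lambda: psutil.virtual_memory().used`.

```python
import time

def peak_memory_delta(probe, n_samples=10, interval=0.2):
    """Peak of probe() over n_samples readings, minus a baseline reading
    taken before sampling starts. interval=0.2 s gives the 5 samples per
    second used in the figure. `probe` is any zero-argument callable
    returning a memory reading in bytes."""
    baseline = probe()
    peak = baseline
    for _ in range(n_samples):
        time.sleep(interval)
        peak = max(peak, probe())   # keep the largest reading seen so far
    return peak - baseline

# Illustration with a fake probe replaying canned readings (interval=0 so
# the example runs instantly): baseline 100, peak 300, so the delta is 200.
readings = iter([100, 150, 300, 120])
print(peak_memory_delta(lambda: next(readings), n_samples=3, interval=0.0))  # → 200
```

In a real run the sampler would execute in a background thread for the whole duration of the tracking, since the peak can occur at any point.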

Fragment connectivity analysis.

The relation between fragment connectivity and the corresponding IDF1 score for every video and angle θ of the new idtracker.ai in Figure 3. Fragment connectivity is calculated as the average number of other fragments each fragment coexists with, divided by N - 1, with N the total number of animals. Values below 0.5 are associated with low IDF1 scores, and idtracker.ai alerts the user in such cases.

Protocol 2 failure rate.

For idtracker.ai (v4 and v5), the probability that the video is not tracked with protocol 2; for TRex, the probability that it fails without generating trajectories.

Models comparison.

Error in image identification as a function of training time for different deep learning models, randomly initialized, in 6 test videos. For each network we report the multiply-accumulate operations (MAC) in giga operations (G) (for a batch of 1,600 images of size 40×40×1) and the number of parameters in millions (M). Every 100 training batches, we perform k-means clustering on a randomly selected set of 20,000 images, assigning identities based on clusters. We then compute the Silhouette score and ground-truth error on the same set. The reported error corresponds to the model with the best Silhouette score observed up to that point.
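The reporting rule in the last sentence is a form of unsupervised model selection: at every evaluation step the checkpoint with the highest Silhouette score so far is the one whose ground-truth error gets reported. A minimal sketch of that bookkeeping, assuming the per-evaluation (Silhouette, error) pairs are already available (the function name `best_by_silhouette` is hypothetical):

```python
def best_by_silhouette(checkpoints):
    """checkpoints: iterable of (silhouette, error) pairs, one per
    evaluation (every 100 training batches in the figure). Returns, for
    each step, the ground-truth error of the checkpoint with the highest
    Silhouette score observed up to that point. The Silhouette score needs
    no labels, so this selection works without ground truth."""
    best_sil = float("-inf")
    best_err = None
    reported = []
    for sil, err in checkpoints:
        if sil > best_sil:          # new best unsupervised score
            best_sil, best_err = sil, err
        reported.append(best_err)   # report error of best-so-far checkpoint
    return reported

# At step 3 the Silhouette score drops, so the step-2 checkpoint's error
# keeps being reported.
print(best_by_silhouette([(0.2, 0.5), (0.5, 0.1), (0.4, 0.3)]))  # → [0.5, 0.1, 0.1]
```

This is why the reported error curves in these figures are monotone in the selected checkpoint rather than in raw training time.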

ResNet models comparison.

Error in image identification as a function of training time for different deep learning models, randomly initialized (except the pre-trained ResNet18), in 6 test videos. For each network we report the multiply-accumulate operations (MAC) in giga operations (G) (for a batch of 1,600 images of size 40×40×1) and the number of parameters in millions (M). Every 100 training batches, we perform k-means clustering on a randomly selected set of 20,000 images, assigning identities based on clusters. We then compute the Silhouette score and ground-truth error on the same set. The reported error corresponds to the model with the best Silhouette score observed up to that point.

Embedding dimensions comparison.

Error in image identification as a function of training time for different embedding dimensions in 6 test videos. Every 100 training batches, we perform k-means clustering on a randomly selected set of 20,000 images, assigning identities based on clusters. We then compute the Silhouette score and ground-truth error on the same set. The reported error corresponds to the model with the best Silhouette score observed up to that point.

Dneg over Dpos comparison.

Error in image identification as a function of training time for different ratios of Dneg/Dpos in 6 test videos. Every 100 training batches, we perform k-means clustering on a randomly selected set of 20,000 images, assigning identities based on clusters. We then compute the Silhouette score and ground-truth error on the same set. The reported error corresponds to the model with the best Silhouette score observed up to that point.

Batch size comparison.

Error in image identification as a function of training time for different batch sizes of pairs of images in 6 test videos. Every 100 training batches, we perform k-means clustering on a randomly selected set of 20,000 images, assigning identities based on clusters. We then compute the Silhouette score and ground-truth error on the same set. The reported error corresponds to the model with the best Silhouette score observed up to that point.

Exploration and exploitation comparison.

Error in image identification as a function of training time for different exploration/exploitation weights α in 6 test videos. Every 100 training batches, we perform k-means clustering on a randomly selected set of 20,000 images, assigning identities based on clusters. We then compute the Silhouette score and ground-truth error on the same set. The reported error corresponds to the model with the best Silhouette score observed up to that point.

Segmentation GUI.

Enables users to set the basic parameters required for running idtracker.ai.

Validator GUI.

Enables users to inspect tracking results, correct errors, and access additional tools.

Video Generator GUI.

Allows users to define parameters for general and individual video generation.

SocialNet output example showing learned attraction-repulsion and alignment areas for social interactions around the focal animal.

Confusion matrix between the two parts of drosophila_80 in v6 of idmatcher.ai.

It contains predictions from the network trained on the first part applied to images from the second part, and vice versa. In this example, the image-level accuracy is 82.9%, sufficient for 100% accurate identity matching.