New idtracker.ai rethinks multi-animal tracking as a representation learning problem to increase accuracy and reduce tracking time
Figures
Performance for a benchmark of 33 videos of flies, zebrafish, and mice.
(a) Median IDF1 score computed using all images of animals in the videos, excluding animal crossings. The videos are ordered by decreasing IDF1 score of the original idtracker.ai results for ease of visualization. (b) Median tracking times, shown at the scale of hours and, in the inset, at the scale of days. The videos are ordered by increasing tracking times of the original idtracker.ai results for ease of visualization. The names of the videos in (a) and (b) start with a letter for the species, followed by the number of animals in the video, and possibly an extra number to distinguish videos of the same species and group size. Lines between points are for visualization purposes only.
Figure 1—source data 1
Numerical benchmark results with median, mean, and 20–80 percentile values of tracking accuracy and times.
- https://cdn.elifesciences.org/articles/107602/elife-107602-fig1-data1-v1.xlsx
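The IDF1 score reported in the benchmark can be illustrated with a minimal toy sketch (an illustration of the metric's idea, not the implementation used in the paper): find the one-to-one mapping between ground-truth and predicted identity labels that maximizes matches, then compute the F1 score of that mapping.

```python
from itertools import permutations

def idf1(gt_ids, pred_ids):
    """Toy IDF1-style score over equal-length sequences of per-detection
    identity labels. Brute-forces the label mapping (fine for small label
    sets; real trackers use the Hungarian algorithm instead)."""
    gt_labels = sorted(set(gt_ids))
    pred_labels = sorted(set(pred_ids))
    best_tp = 0
    for perm in permutations(pred_labels, len(gt_labels)):
        mapping = dict(zip(gt_labels, perm))
        tp = sum(1 for g, p in zip(gt_ids, pred_ids) if mapping[g] == p)
        best_tp = max(best_tp, tp)
    false_pos = len(pred_ids) - best_tp
    false_neg = len(gt_ids) - best_tp
    return 2 * best_tp / (2 * best_tp + false_pos + false_neg)
```

For example, `idf1(["a", "a", "b", "b"], [1, 1, 2, 1])` maps a→1, b→2 (three matches out of four detections) and yields 0.75.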
Performance for the benchmark with full trajectories with animal crossings.
(a) Median IDF1 score computed using all images of animals in the videos including animal crossings. (b) Median tracking times. Lines between points are for visualization purposes only. Source data can be found in Figure 1—source data 1.
Boxplot representation of the benchmark results.
IDF1 scores without (a) and with (b) animal crossings, and tracking times (c), for the three versions of idtracker.ai and for TRex.
Robustness to blurring and light conditions.
First column: the unmodified video zebrafish_60_1. Second column: zebrafish_60_1 with Gaussian blurring of 1 pixel, a resolution reduction to 40% of the original, and MJPG video compression. Version 5 of idtracker.ai fails on this video, reaching only an 82% IDF1 score. Third column: videos of 60 zebrafish with manipulated light conditions (the same test as in the original idtracker.ai; Romero-Ferrero et al., 2019). First row: uniform light conditions across the arena (zebrafish_60_1). Second row: a similar setup but with the lights off on the bottom and right sides of the arena.
Memory usage across the different software.
The solid line is a logarithmic fit of the peak RAM usage as a function of the number of blobs in a video. The Python tool psutil.virtual_memory().used was used to measure memory usage five times per second during tracking. The value presented is the maximum measured value minus the baseline measured before tracking starts. Disclaimer: both software packages include automatic optimizations that adjust to machine resources, so results may vary on systems with less available memory. These results were measured on computers with the specifications given in ‘Methods’.
Tracking by identification using deep contrastive learning.
(a) Schematic representation of a video with five fish. (b) A ResNet18 network with eight outputs generates a representation of each animal image as a point in an eight-dimensional space (here shown in 2D for visualization). Each pair of images corresponds to two points in this space, separated by a Euclidean distance. The ResNet18 network is trained to minimize this distance for positive pairs and maximize it for negative pairs. (c) 2D t-SNE visualizations of the learned 8-dimensional representation space. Each dot represents an image of an animal from the video. As training progresses, clusters corresponding to individual animals become clearer. Here, we plot this process for the example video zebrafish_60_1 after training for 0, 2000, 4000, and 15,000 batches (each batch contains 400 positive and 400 negative pairs of images, that is, 1600 images per batch). The t-SNE plot at 15,000 training batches is also shown color-coded by human-validated ground-truth identities. The pink rectangle at 2000 batches of training highlights clear clusters, and the orange square fuzzy clusters. (d) The Silhouette score measures cluster coherence and increases during training, as illustrated for a video with 60 zebrafish. (e) A Silhouette score of 0.91 corresponds to a human-validated error rate of less than 1% per image.
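The pairwise objective in (b), minimizing the Euclidean distance for positive pairs and maximizing it for negative pairs, can be sketched as a margin-based contrastive loss. This is a standard formulation; the exact loss and margin used by the software are assumptions here.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, is_positive, margin=1.0):
    """Margin-based contrastive loss on batches of embedding pairs.
    Positive pairs (same animal) are pulled together; negative pairs
    (different animals) are pushed apart, up to `margin`."""
    d = np.linalg.norm(emb_a - emb_b, axis=1)            # Euclidean distances
    pos_term = is_positive * d ** 2                      # shrink distance
    neg_term = (1 - is_positive) * np.maximum(0.0, margin - d) ** 2
    return float(np.mean(pos_term + neg_term))
```

With the 8-dimensional embeddings of the caption, `emb_a` and `emb_b` would each be arrays of shape `(batch, 8)` produced by the ResNet18, and `is_positive` a 0/1 vector marking which pairs come from the same fragment.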
Tracking with strong occlusions.
Accuracies when we mask a region of a video defined by an angle θ, so that the tracking system has no access to the information behind the mask. Light and dark gray regions correspond to angles for which no global fragments exist in the video. Dark gray regions correspond to angles for which the video has a fragment connectivity lower than 0.5, with fragment connectivity defined as the average number of other fragments each fragment coexists with, divided by N − 1, with N the total number of animals; see Figure 3—figure supplement 1 for an analysis justifying this value of 0.5. The original idtracker.ai (v4) and its optimized version (v5) cannot work in the gray regions, and the new idtracker.ai is expected to deteriorate only in the dark gray region.
Fragment connectivity analysis.
The relation between fragment connectivity and the corresponding IDF1 score for every video and angle θ of the new idtracker.ai in Figure 3. Fragment connectivity is calculated as the average number of other fragments each fragment coexists with, divided by N − 1, with N the total number of animals. Values below 0.5 are associated with low IDF1 scores, and idtracker.ai alerts the user in such cases.
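The fragment-connectivity quantity defined in this caption can be sketched directly. The helper is hypothetical, and fragments are represented here as simple (start_frame, end_frame) intervals; two fragments coexist when their intervals overlap.

```python
def fragment_connectivity(fragments, n_animals):
    """Average number of other fragments each fragment coexists with
    (overlapping frame intervals), divided by N - 1."""
    def coexist(a, b):
        return a[0] <= b[1] and b[0] <= a[1]    # intervals overlap
    counts = [
        sum(1 for j, g in enumerate(fragments) if j != i and coexist(f, g))
        for i, f in enumerate(fragments)
    ]
    return sum(counts) / len(counts) / (n_animals - 1)
```

For three animals whose fragments all span the same frames, each fragment coexists with the other two, giving a connectivity of 2 / (3 − 1) = 1.0; fully disjoint fragments give 0.0, well below the 0.5 threshold the caption warns about.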
Protocol 2 failure rate.
For idtracker.ai (v4 and v5), the probability of not tracking the video with protocol 2; for TRex, the probability that it fails without generating trajectories.
Models comparison.
Error in image identification as a function of training time for different deep learning models, randomly initialized, in six test videos. For each network, we report the multiply-accumulate operations (MAC) in giga-operations (G) (for a batch of 1600 images of size 40 × 40 × 1) and the number of parameters in millions (M). Every 100 training batches, we perform k-means clustering on a randomly selected set of 20,000 images, assigning identities based on the clusters. We then compute the Silhouette score and ground-truth error on the same set. The reported error corresponds to the model with the best Silhouette score observed up to that point.
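The evaluation loop described above (cluster every 100 batches, score with the Silhouette, keep the best model so far) rests on two routines that can be sketched minimally in NumPy. These are illustrative implementations, not the library routines the authors used.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: assign points to the nearest centroid, recompute
    centroids, repeat."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

def silhouette(X, labels):
    """Mean silhouette score: (b - a) / max(a, b) per point, where a is the
    mean intra-cluster distance and b the mean distance to the nearest
    other cluster. Near 1 means tight, well-separated clusters."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    n = len(X)
    scores = []
    for i in range(n):
        same = labels == labels[i]
        a = D[i, same & (np.arange(n) != i)].mean()
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

In the paper's loop, `X` would be the learned embeddings of 20,000 sampled images and `k` the number of animals; the Silhouette score then serves as a label-free proxy for identification accuracy, matching panel (e) of Figure 2.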
ResNet models comparison.
Error in image identification as a function of training time for different deep learning models, randomly initialized (except the pre-trained ResNet18), in six test videos. For each network, we report the multiply-accumulate operations (MAC) in giga-operations (G) (for a batch of 1600 images of size 40 × 40 × 1) and the number of parameters in millions (M). Every 100 training batches, we perform k-means clustering on a randomly selected set of 20,000 images, assigning identities based on the clusters. We then compute the Silhouette score and ground-truth error on the same set. The reported error corresponds to the model with the best Silhouette score observed up to that point.
Embedding dimensions comparison.
Error in image identification as a function of training time for different embedding dimensions in six test videos. Every 100 training batches, we perform k-means clustering on a randomly selected set of 20,000 images, assigning identities based on clusters. We then compute the Silhouette score and ground-truth error on the same set. The reported error corresponds to the model with the best Silhouette score observed up to that point.
over comparison.
Error in image identification as a function of training time for different ratios of in six test videos. Every 100 training batches, we perform k-means clustering on a randomly selected set of 20,000 images, assigning identities based on clusters. We then compute the Silhouette score and ground-truth error on the same set. The reported error corresponds to the model with the best Silhouette score observed up to that point.
Batch size comparison.
Error in image identification as a function of training time for different batch sizes of pairs of images in six test videos. Every 100 training batches, we perform k-means clustering on a randomly selected set of 20,000 images, assigning identities based on clusters. We then compute the Silhouette score and ground-truth error on the same set. The reported error corresponds to the model with the best Silhouette score observed up to that point.
Exploration and exploitation comparison.
Error in image identification as a function of training time for different exploration/exploitation weights α in six test videos. Every 100 training batches, we perform k-means clustering on a randomly selected set of 20,000 images, assigning identities based on clusters. We then compute the Silhouette score and ground-truth error on the same set. The reported error corresponds to the model with the best Silhouette score observed up to that point.
Segmentation GUI.
Enables users to set the basic parameters required for running idtracker.ai.
Validator GUI.
Enables users to inspect tracking results, correct errors, and access additional tools.
Video Generator GUI.
Allows users to define parameters for general and individual video generation.
SocialNet output example showing learned attraction-repulsion and alignment areas for social interactions around the focal animal.
Confusion matrix between the two parts of drosophila_80 in v6 of idmatcher.ai.
It contains predictions from the network trained on the first part applied to images from the second part, and vice versa. In this example, the image-level accuracy is 82.9%, which is enough for 100% accurate identity matching.