Anti-drift pose tracker (ADPT), a transformer-based network for robust animal pose estimation cross-species
Figures

Anti-drift pose tracker (ADPT).
(A) Three examples of drift in deep learning-based animal behavioral analysis. Similar object disturbance means that an object resembling a specific body part misleads the deep learning-based method. Inexplicable keypoint drift is caused by the network predicting a high confidence score at the wrong location. Failure to detect a keypoint is probably caused by a low predicted confidence score. (B) The anti-drift effects of ADPT. (C) The general workflow of ADPT. The network is trained to predict confidence heatmaps, low-resolution semantic segmentation (LRSS), and location refinement. (D) The network architecture of ADPT.
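The distinction in panel (A) between a drift (high confidence at the wrong place) and a miss (low confidence) can be illustrated with a minimal sketch. It assumes that a reference location is available for each keypoint and that a fixed confidence threshold and pixel tolerance separate the cases. The function names, `conf_threshold`, and `max_error_px` are hypothetical and are not taken from the ADPT code or the authors' scoring procedure.

```python
import math

def classify_keypoint(pred_xy, true_xy, confidence,
                      conf_threshold=0.2, max_error_px=20.0):
    """Label one predicted keypoint as 'ok', 'drift', or 'miss'.

    Assumed rule (illustrative only): low confidence counts as a miss;
    a confident prediction far from the reference counts as a drift.
    """
    if confidence < conf_threshold:
        return "miss"
    error = math.dist(pred_xy, true_xy)  # Euclidean distance in pixels
    return "drift" if error > max_error_px else "ok"

def label_percentages(labels):
    """Percentage of frames carrying each label."""
    n = len(labels)
    return {name: 100.0 * labels.count(name) / n
            for name in ("ok", "drift", "miss")}

labels = [
    classify_keypoint((0, 0), (0, 0), 0.90),    # confident and correct
    classify_keypoint((100, 0), (0, 0), 0.90),  # confident but far away
    classify_keypoint((0, 0), (0, 0), 0.05),    # low confidence
    classify_keypoint((3, 4), (0, 0), 0.50),    # 5 px off, within tolerance
]
print(label_percentages(labels))
```

Both thresholds would need to be chosen per dataset; the values here are placeholders.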

Analysis of the anti-drift pose tracker’s (ADPT’s) anti-drift performance on a mouse dataset collected by our lab.
(A) The time course of the y-axis position of sixteen body parts extracted from a 1 min video using ADPT, DeepLabCut, and SLEAP. ADPT successfully detected all 16 body parts of the mouse, whereas DeepLabCut and SLEAP encountered inexplicable tracking drifts. (B) Two anti-drift examples from ADPT, where the tail drifted with DeepLabCut and the hind claw was not detected by SLEAP. (C) Overall percentage of frames with tracking drift and failure to detect (miss) for the three methods. ADPT demonstrated a significantly lower drift percentage than the other methods. (D) The percentage of frames with tracking drift (left) and failure to detect (right). Drifts came mainly from four body parts: the tip tail, the left and right hind claws, and the middle tail. (E) The average RMSE across all body parts (left) and the RMSE of the four body parts with the most drift (right). ADPT achieved a smaller RMSE than the other two tools when thresholded at 0.2. *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001. RMSE: root mean square error.
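As an illustration of the thresholded RMSE in panel (E), the sketch below computes the root mean square of the Euclidean keypoint errors, keeping only predictions whose confidence reaches the threshold. Treating the 0.2 threshold as a confidence cutoff is an assumption made for this example, and `thresholded_rmse` is a hypothetical helper, not a function from ADPT, DeepLabCut, or SLEAP.

```python
import math

def thresholded_rmse(pred, truth, conf, threshold=0.2):
    """RMSE (in pixels) over keypoints with confidence >= threshold.

    pred, truth: sequences of (x, y) pairs; conf: per-keypoint confidences.
    Returns NaN when no keypoint passes the threshold.
    """
    squared_errors = [
        (px - tx) ** 2 + (py - ty) ** 2
        for (px, py), (tx, ty), c in zip(pred, truth, conf)
        if c >= threshold
    ]
    if not squared_errors:
        return float("nan")
    return math.sqrt(sum(squared_errors) / len(squared_errors))

pred = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
truth = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
conf = [0.9, 0.5, 0.1]  # the last keypoint falls below the threshold
print(thresholded_rmse(pred, truth, conf))  # sqrt((0 + 25) / 2)
```

Under this reading, excluded low-confidence keypoints do not contribute to the RMSE, so the metric should be interpreted together with the miss percentage.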

Anti-drift performance across backgrounds and individuals, where the percentage of frames includes two types of drift phenomena: drift and miss.
(A) The overall cross-individual anti-drift performance of the anti-drift pose tracker (ADPT) and the other methods. The drift percentage of ADPT is significantly lower than that of the other methods. (B) After training the model 5 times on the dataset shuffles, the cross-individual drift percentage for each shuffle was analysed using one-way ANOVA. The ANOVA results revealed differences in the inference results of the SLEAP model among individuals, and no differences for ADPT or DeepLabCut. (C) The overall cross-background anti-drift performance of ADPT and the other methods. The drift percentage of ADPT is significantly lower than that of the other methods. (D) The cross-background drift percentage for each shuffle was analysed using one-way ANOVA. The ANOVA results revealed slight differences in the inference results of the DeepLabCut model among individuals, and no differences for ADPT or SLEAP. ns.: not significant, *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001.
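The one-way ANOVA in panels (B) and (D) can be sketched as follows. The example assumes one group per individual (or background), each holding one drift percentage per training shuffle; this grouping and the numbers are illustrative and are not data from the study. Only the F statistic and its degrees of freedom are computed; a library routine such as `scipy.stats.f_oneway` would also return the p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples.

    Requires at least two groups and non-zero within-group variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical drift percentages: three groups, three shuffles each
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
print(one_way_anova_f(groups))  # (3.0, 2, 6)
```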
Video file containing clips of mouse behavior videos.

Analysis of the anti-drift pose tracker’s (ADPT’s) anti-drift performance on monkey data, showing its cross-species anti-drift ability.
(A) The time course of the -axis position of sixteen body parts extracted from a 1 min video using ADPT, DeepLabCut, and SLEAP. ADPT successfully detected all 17 body parts of the monkey, while the other two methods encountered tracking drift because of the appearance of humans. (B) DeepLabCut and SLEAP both mistakenly located the monkey’s eyes on humans when they appeared, while ADPT achieved robust tracking. (C, D) The percentage of frames with tracking drift and failure to detect (miss). Drift was mainly concentrated in the limbs because of the appearance of humans.
Video file containing a clip of monkey behavior video.

Results of evaluation on public datasets.
(A) Samples of predictions on the single fly dataset. (B) Mean average precision (mAP) on the fly dataset, where the anti-drift pose tracker (ADPT) achieved an average accuracy of 92.8% (the best model achieved 93.27%). (C) Low-resolution semantic segmentation (LRSS) improved the average accuracy by 0.3% on the single fly dataset. (D) Relationship between the number of annotated images and the accuracy of ADPT on the fly dataset, where ADPT achieved acceptable performance with only 350 annotated images in a simple laboratory environment. Points indicate the validation accuracy of models trained on datasets with a specific number of labels. (E) The transformer improved the average accuracy by 0.4% on the single fly dataset. (F) Samples of predictions on OMS_Dataset. (G) Root mean square error (RMSE) on OMS_Dataset, where ADPT achieved a smaller RMSE than SLEAP when threshold = 0.2, and a smaller RMSE than DeepLabCut when threshold = 0.6. p-values, **: 0.001862, ns.: 0.243472, ***: 8.700e-06. (H) RMSE comparison on the hip and tail of OMS_Dataset. p-values, ***: 0.000561, Hip ns.: 0.023766, Tail ns.: 0.336642, *: 0.035782.
Video examples of dog pose estimation.

Illustration of mix-up social animal dataset generation.
(A) Frames originating from different videos and the corresponding background. (B) Mix-up image. (C) Schematic diagrams illustrating the keypoints generated from single animal pose estimation by the anti-drift pose tracker (ADPT). (D) An augmented mix-up image. (E) Schematic diagrams of the augmented annotation. (F) Augmented keypoints. (G) Augmented low-resolution semantic segmentation (LRSS). (H) Schematic diagrams of augmented Body Affinity Fields (BAF), inspired by Part Affinity Fields (Cao et al., 2021).
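The step from panel (A) to panel (B) can be pictured as pasting the animal pixels of several single-animal frames onto a shared background. The sketch below is a toy version under strong assumptions: images are nested lists of the same size, a binary mask marking the animal is already available for each frame, and later frames overwrite earlier ones where they overlap. The function `mix_up` and its arguments are hypothetical and do not reproduce the authors' generation code.

```python
def mix_up(background, animal_frames, animal_masks):
    """Composite masked animal pixels from each frame onto the background.

    background, frames: 2D lists of pixel values with identical shape.
    animal_masks: 2D lists of 0/1 flags marking animal pixels per frame.
    """
    mixed = [row[:] for row in background]  # copy so the input is untouched
    for frame, mask in zip(animal_frames, animal_masks):
        for r, mask_row in enumerate(mask):
            for c, is_animal in enumerate(mask_row):
                if is_animal:
                    mixed[r][c] = frame[r][c]
    return mixed

background = [[0, 0, 0],
              [0, 0, 0]]
frame_a = [[7, 7, 0],
           [0, 0, 0]]
frame_b = [[0, 0, 0],
           [0, 9, 9]]
mask_a = [[1, 1, 0],
          [0, 0, 0]]
mask_b = [[0, 0, 0],
          [0, 1, 1]]

print(mix_up(background, [frame_a, frame_b], [mask_a, mask_b]))
# [[7, 7, 0], [0, 9, 9]]
```

Because each animal keeps its original pixel coordinates in this sketch, keypoints predicted on the source frames could be reused as annotations for the mixed image without transformation.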

Applications of anti-drift pose tracker (ADPT) for multi-animal pose tracking.
(A) Left: The pipeline for the multi-animal identity-pose tracking task. (B) Confusion matrix of the 10-mouse classification (accuracy = 93.16%). (C) Social mice tracking pipeline with an identification accuracy of 99.72%.
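For reference, an overall accuracy such as the one reported with the confusion matrix in panel (B) is conventionally the sum of the diagonal divided by the total count. The sketch below shows that calculation on a small made-up matrix; the values are illustrative and are not the study's data.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy: correctly classified samples / all samples.

    matrix[i][j] counts samples of true class i predicted as class j.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical two-class example
confusion = [[8, 2],
             [1, 9]]
print(accuracy_from_confusion(confusion))  # 0.85
```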
Video file demonstrating single animal pose estimation and identity synchronized tracking.
Video file demonstrating social animal pose estimation and identity synchronized tracking.

Evaluation of the anti-drift pose tracker (ADPT) in a homecage social mice scenario.
(A) Illustration of the homecage social mice dataset. (B) Filtered back locations of different mice predicted by ADPT. (C) Comparison of different methods and manual labels. We trained each model three times, and this figure presents the results from one of those training sessions. We calculated the average root mean square error (RMSE) between predictions and manual labels, demonstrating that ADPT achieved an average RMSE of 15.8±0.59 pixels, while DeepLabCut (DLC) and SLEAP recorded RMSEs of 113.19±42.75 pixels and 94.76±1.95 pixels, respectively. (D) Pose estimation accuracy comparison between ADPT and DLC based on the DLC evaluation metric. ADPT achieved an accuracy of 6.35±0.14 pixels across all body parts of the mice, while DLC reached 7.49±0.2 pixels. (E) Pose estimation accuracy comparison between ADPT and SLEAP using the SLEAP evaluation metric. ADPT achieved 8.33±0.19 pixels across all body parts of the mice, compared to SLEAP’s 9.82±0.57 pixels. (F) Body affinity fields (BAF) improved pose estimation accuracy by 0.4 pixels under the SLEAP evaluation metric.
Video file demonstrating homecage social mice pose estimation and identity synchronized tracking.
Additional files
- Supplementary file 1
  Comparison among three methods on single fly dataset and OMS_Dataset.
  https://cdn.elifesciences.org/articles/95709/elife-95709-supp1-v1.xlsx
- MDAR checklist
  https://cdn.elifesciences.org/articles/95709/elife-95709-mdarchecklist1-v1.pdf