Anti-drift pose tracker (ADPT), a transformer-based network for robust animal pose estimation cross-species

  1. Guoling Tang
  2. Yaning Han
  3. Xing Sun
  4. Ruonan Zhang
  5. Ming-Hu Han
  6. Quanying Liu (corresponding author)
  7. Pengfei Wei (corresponding author)
  1. University of Chinese Academy of Sciences, China
  2. Guangxi University of Science and Technology, China
  3. Shenzhen University of Advanced Technology, China
  4. Department of Biomedical Engineering, Southern University of Science and Technology, China
8 figures and 2 additional files

Figures

Figure 1
Anti-drift pose tracker (ADPT).

(A) Three examples of drift in deep learning-based animal behavioral analysis. Similar-object disturbance occurs when an object that resembles a specific body part misleads the network. Inexplicable keypoint drift occurs when the network predicts a high confidence score at the wrong location. Failure to detect a keypoint is probably caused by a low predicted confidence score. (B) The anti-drift effects of ADPT. (C) The general workflow of ADPT. The network is trained to predict a confidence heatmap, low-resolution semantic segmentation (LRSS), and location refinement. (D) The network architecture of ADPT.
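As a rough illustration of how a confidence heatmap and a location refinement map (panel C) can be combined into one keypoint coordinate, the sketch below picks the highest-confidence heatmap cell and adds a sub-cell offset. The function name `decode_keypoint`, the `stride` parameter, and the offset layout are assumptions made for this example; they are not taken from the ADPT code.

```python
from typing import List, Tuple

Grid = List[List[float]]


def decode_keypoint(
    heatmap: Grid,
    offset_x: Grid,
    offset_y: Grid,
    stride: int = 8,
) -> Tuple[float, float, float]:
    """Return (x, y, confidence) in input-image pixels for one body part.

    Assumption: offsets are stored in pixels relative to the top-left
    corner of each heatmap cell, and `stride` is the downsampling factor
    between the input image and the heatmap.
    """
    best_conf = float("-inf")
    best_row, best_col = 0, 0
    # Find the heatmap cell with the highest predicted confidence.
    for r, row in enumerate(heatmap):
        for c, conf in enumerate(row):
            if conf > best_conf:
                best_conf, best_row, best_col = conf, r, c
    # Map the cell back to image pixels and apply the refinement offset.
    x = best_col * stride + offset_x[best_row][best_col]
    y = best_row * stride + offset_y[best_row][best_col]
    return x, y, best_conf


# Hypothetical 2x3 maps for a single body part.
heatmap = [[0.1, 0.2, 0.1], [0.1, 0.9, 0.3]]
offset_x = [[0.0, 0.0, 0.0], [0.0, 3.5, 0.0]]
offset_y = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
x, y, conf = decode_keypoint(heatmap, offset_x, offset_y, stride=8)
```

Under these assumptions, a high confidence at the wrong cell yields a drifted coordinate, while a uniformly low confidence map corresponds to a missed detection.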

Figure 2
Analysis of the anti-drift pose tracker's (ADPT's) anti-drift performance on a mouse dataset collected by our lab.

(A) The time course of the y-axis position of 16 body parts extracted from a 1 min video using ADPT, DeepLabCut, and SLEAP. ADPT successfully detected all 16 body parts of the mouse, whereas DeepLabCut and SLEAP encountered inexplicable tracking drifts. (B) Two anti-drift examples from ADPT: in one, the tail position predicted by DeepLabCut drifted; in the other, SLEAP failed to detect the hind claw. (C) Overall percentage of frames with tracking drift or failed detection (miss) for the three methods. ADPT showed a significantly lower drift percentage than the other methods. (D) The percentage of frames with tracking drift (left) and failed detection (right). Drifts came mainly from four body parts: the tip tail, the left and right hind claws, and the middle tail. (E) The RMSE averaged across all body parts (left) and the RMSE of the four body parts with the most drift (right). ADPT achieved a smaller RMSE than the other two tools when thresholded at 0.2. *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001. RMSE: root mean square error.
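The quantities in panels C to E can be sketched as follows. This is a minimal example under stated assumptions: a frame counts as a miss when its confidence falls below the threshold, as a drift when a confident prediction lies farther from the manual label than a distance tolerance, and the RMSE is computed only over predictions that pass the threshold. The exact criteria used in the paper may differ, and `drift_tolerance` and all data values here are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def drift_miss_rmse(
    preds: List[Point],
    confs: List[float],
    truths: List[Point],
    conf_threshold: float = 0.2,
    drift_tolerance: float = 20.0,
) -> Tuple[float, float, Optional[float]]:
    """Return (drift %, miss %, thresholded RMSE in pixels) for one body part."""
    if not preds:
        raise ValueError("at least one frame is required")
    drift = miss = 0
    squared_errors: List[float] = []
    for (px, py), conf, (tx, ty) in zip(preds, confs, truths):
        if conf < conf_threshold:
            miss += 1  # assumed definition of a missed detection
            continue
        err = math.hypot(px - tx, py - ty)
        squared_errors.append(err ** 2)
        if err > drift_tolerance:
            drift += 1  # assumed definition of a drifted keypoint
    n = len(preds)
    rmse = (
        math.sqrt(sum(squared_errors) / len(squared_errors))
        if squared_errors
        else None
    )
    return 100.0 * drift / n, 100.0 * miss / n, rmse


# Four hypothetical frames of one body part whose true position is (10, 10).
preds = [(10.0, 10.0), (13.0, 14.0), (100.0, 100.0), (0.0, 0.0)]
confs = [0.9, 0.8, 0.95, 0.05]
truths = [(10.0, 10.0)] * 4
drift_pct, miss_pct, rmse = drift_miss_rmse(preds, confs, truths)
```

In this toy input, one frame is a confident but wrong prediction (drift) and one frame has low confidence (miss), so each category accounts for a quarter of the frames.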

Figure 3 with 1 supplement
Anti-drift performance across backgrounds and individuals, where the percentage of frames includes two types of drift phenomena: drift and miss.

(A) The overall cross-individual anti-drift performance of the anti-drift pose tracker (ADPT) and the other methods. The drift percentage of ADPT is significantly lower than that of the other methods. (B) After training the model five times on shuffles of the dataset, the cross-individual drift percentage for each shuffle was analysed using one-way ANOVA. The ANOVA results revealed differences in the inference results of the SLEAP model among individuals, and no differences for ADPT or DeepLabCut. (C) The overall cross-background anti-drift performance of ADPT and the other methods. The drift percentage of ADPT is significantly lower than that of the other methods. (D) The cross-background drift percentage for each shuffle was analysed using one-way ANOVA. The ANOVA results revealed slight differences in the inference results of the DeepLabCut model among individuals, and no differences for ADPT or SLEAP. ns.: not significant, *p<0.05, **p<0.01, ***p<0.001, ****p<0.0001.
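The one-way ANOVA used in panels B and D compares the variance between groups (for example, individuals) with the variance within each group (for example, the shuffles). The sketch below computes only the F statistic in plain Python on made-up numbers; the group values are hypothetical and are not data from the paper. In practice a library routine such as `scipy.stats.f_oneway` would also return the p-value.

```python
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """Return the one-way ANOVA F statistic for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical drift percentages for three individuals, three shuffles each.
drift_pct_by_individual = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat = one_way_anova_f(drift_pct_by_individual)
```

A large F statistic indicates that the groups differ more than the shuffle-to-shuffle variation alone would explain, which is the pattern reported as a significant difference among individuals.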

Figure 3—video 1
Video file containing clips of mouse behavior videos.
Figure 4 with 1 supplement
Analysis of the anti-drift pose tracker's (ADPT's) anti-drift performance on monkey data, showing its cross-species anti-drift ability.

(A) The time course of the y-axis position of sixteen body parts extracted from a 1 min video using ADPT, DeepLabCut, and SLEAP. ADPT successfully detected all 17 body parts of a monkey, while the other two methods encountered tracking drift because of the appearance of humans. (B) DeepLabCut and SLEAP both mistakenly located the monkey's eyes on humans when they appeared, while ADPT achieved robust tracking. (C, D) The percentage of frames with tracking drift and failed detection (miss). Drift was mainly concentrated in the limbs because of the appearance of humans.

Figure 4—video 1
Video file containing a clip of monkey behavior video.
Figure 5 with 2 supplements
Results of public datasets evaluation.

(A) Sample predictions on the single fly dataset. (B) Mean average precision (mAP) on the fly dataset, where the anti-drift pose tracker (ADPT) achieved an average accuracy of 92.8% (the best model achieved 93.27%). (C) Low-resolution semantic segmentation (LRSS) improved the average accuracy by 0.3% on the single fly dataset. (D) Relationship between the number of annotated images and the accuracy of ADPT on the fly dataset, where ADPT achieved acceptable performance with only 350 annotated images in a simple laboratory environment. Points indicate the validation accuracy of models trained on datasets with a specific number of labels. (E) The transformer improved the average accuracy by 0.4% on the single fly dataset. (F) Sample predictions on OMS_Dataset. (G) Root mean square error (RMSE) on OMS_Dataset, where ADPT achieved a smaller RMSE than SLEAP when threshold = 0.2, and a smaller RMSE than DeepLabCut when threshold = 0.6. p-value, **: 0.001862, ns.: 0.243472, ***: 8.700e-06. (H) RMSE comparison on the hip and tail of OMS_Dataset. p-value, ***: 0.000561, Hip ns.: 0.023766, Tail ns.: 0.336642, *: 0.035782.

Figure 5—figure supplement 1
Picture examples of dog pose estimation.
Figure 5—video 1
Video examples of dog pose estimation.
Figure 6
Illustration of mix-up social animal dataset generation.

(A) Frames originating from different videos and the corresponding backgrounds. (B) Mix-up image. (C) Schematic diagram of the keypoints generated by single-animal pose estimation with the anti-drift pose tracker (ADPT). (D) Augmented mix-up image. (E) Schematic diagram of the augmented annotation. (F) Augmented keypoints. (G) Augmented low-resolution semantic segmentation (LRSS). (H) Schematic diagram of the augmented Body Affinity Fields (BAF), inspired by Part Affinity Fields (Cao et al., 2021).

Figure 7 with 2 supplements
Applications of anti-drift pose tracker (ADPT) for multi-animal pose tracking.

(A) Left: The pipeline for the multi-animal identity-pose tracking task. (B) Confusion matrix for the classification of 10 mice (accuracy = 93.16%). (C) Social mice tracking pipeline with an identification accuracy of 99.72%.
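For reference, an overall accuracy such as the one reported with the confusion matrix in panel B is the fraction of samples on the matrix diagonal. The sketch below uses a small hypothetical three-class matrix, not the 10-mouse matrix from the figure, and the function name is chosen for this example.

```python
from typing import List


def confusion_matrix_accuracy(matrix: List[List[int]]) -> float:
    """Return overall accuracy: correctly classified samples / all samples.

    Rows are assumed to be true identities and columns predicted identities.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical counts for three animals, 50 test images each.
matrix = [
    [48, 1, 1],
    [2, 47, 1],
    [0, 3, 47],
]
accuracy = confusion_matrix_accuracy(matrix)
```

Off-diagonal entries show which identities are confused with each other, which the single accuracy figure does not reveal.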

Figure 7—video 1
Video file demonstrating single animal pose estimation and identity synchronized tracking.
Figure 7—video 2
Video file demonstrating social animal pose estimation and identity synchronized tracking.
Figure 8 with 1 supplement
Evaluation of the anti-drift pose tracker (ADPT) in a homecage social mice scenario.

(A) Illustration of the homecage social mice dataset. (B) Filtered back locations of different mice predicted by ADPT. (C) Comparison of different methods and manual labels. We trained each model three times, and this figure presents the results from one of those training sessions. We calculated the average root mean square error (RMSE) between predictions and manual labels: ADPT achieved an average RMSE of 15.8±0.59 pixels, while DeepLabCut (DLC) and SLEAP recorded RMSEs of 113.19±42.75 pixels and 94.76±1.95 pixels, respectively. (D) Pose estimation accuracy comparison between ADPT and DLC based on the DLC evaluation metric. ADPT achieved an accuracy of 6.35±0.14 pixels across all body parts of the mice, while DLC reached 7.49±0.2 pixels. (E) Pose estimation accuracy comparison between ADPT and SLEAP using the SLEAP evaluation metric. ADPT achieved 8.33±0.19 pixels across all body parts of the mice, compared to SLEAP's 9.82±0.57 pixels. (F) Body affinity fields (BAF) improved pose estimation accuracy by 0.4 pixels under the SLEAP evaluation metric.

Figure 8—video 1
Video file demonstrating homecage social mice pose estimation and identity synchronized tracking.


Guoling Tang, Yaning Han, Xing Sun, Ruonan Zhang, Ming-Hu Han, Quanying Liu, Pengfei Wei (2025)
Anti-drift pose tracker (ADPT), a transformer-based network for robust animal pose estimation cross-species
eLife 13:RP95709.
https://doi.org/10.7554/eLife.95709.3