Figures and data

Anti-drift pose tracker (ADPT). A Three examples of drift in deep learning-based animal behavioral analysis. Similar-object disturbance occurs when an object resembling a specific body part misleads the deep learning-based method. Inexplicable keypoint drift is caused by the network predicting a high confidence score at the wrong location. Failure to detect a keypoint is probably caused by a low predicted confidence score. B The anti-drift effects of ADPT. C The general workflow of ADPT. The network is trained to predict confidence heatmaps, LRSS, and location refinement. D The network architecture of ADPT.

Analysis of ADPT’s anti-drift performance on a mouse dataset collected by our lab. A The time course of the y-axis position of sixteen body parts extracted from a one-minute video using ADPT, DeepLabCut, and SLEAP. ADPT successfully detected all 16 body parts of the mouse, whereas DeepLabCut and SLEAP encountered inexplicable tracking drifts. B Two anti-drift examples from ADPT: DeepLabCut’s tail prediction drifted, and SLEAP failed to detect the hind claw. C Overall percentage of frames with tracking drift or failed detection (miss) for the three methods. ADPT showed a significantly lower drift percentage than the other methods. D The percentage of frames with tracking drift (left) and failed detection (right). Drifts came mainly from four body parts: the tail tip, the left and right hind claws, and the middle tail. E The RMSE averaged across all body parts (left) and the RMSE of the four body parts with the most drift (right). ADPT achieved a smaller RMSE than the other two tools when thresholded at 0.2. *: P<0.05, **: P<0.01, ***: P<0.001, ****: P<0.0001. RMSE: root mean square error.
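As a rough illustration of the thresholded RMSE reported in panel E, the following minimal sketch computes the RMSE only over keypoints whose predicted confidence reaches a threshold, and reports the remaining keypoints as misses. The function name `thresholded_rmse`, the input layout, and the rule that a keypoint below the threshold counts as a miss are assumptions made for illustration; the caption does not specify the exact procedure.

```python
import math

def thresholded_rmse(pred, truth, conf, threshold=0.2):
    """RMSE over keypoints whose predicted confidence is >= threshold.

    pred, truth: lists of (x, y) pixel coordinates (hypothetical layout).
    conf: list of predicted confidence scores, one per keypoint.
    Returns (rmse, miss_fraction); rmse is None if no keypoint passes.
    """
    sq_errors = []
    misses = 0
    for (px, py), (tx, ty), c in zip(pred, truth, conf):
        if c < threshold:
            # Assumption: a low-confidence keypoint is treated as a miss
            # and excluded from the RMSE.
            misses += 1
            continue
        sq_errors.append((px - tx) ** 2 + (py - ty) ** 2)
    miss_fraction = misses / len(conf) if conf else 0.0
    if not sq_errors:
        return None, miss_fraction
    return math.sqrt(sum(sq_errors) / len(sq_errors)), miss_fraction

# Example: three keypoints, the last one falls below the threshold.
rmse, miss = thresholded_rmse(
    pred=[(0, 0), (3, 4), (10, 10)],
    truth=[(0, 0), (0, 0), (0, 0)],
    conf=[0.9, 0.9, 0.1],
)
```

Raising the threshold discards more low-confidence predictions, which typically lowers the RMSE while raising the miss fraction, so the two numbers are best read together.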

Anti-drift performance across backgrounds and individuals, where the percentage of frames includes both types of drift phenomena: drift and miss. A The overall cross-individual anti-drift performance of ADPT and the other methods. The drift percentage of ADPT is significantly lower than that of the other methods. B After the model was trained 5 times on dataset shuffles, the cross-individual drift percentage for each shuffle was analysed using one-way ANOVA. The ANOVA revealed differences in the inference results of the SLEAP model among individuals, and no differences for ADPT or DeepLabCut. C The overall cross-background anti-drift performance of ADPT and the other methods. The drift percentage of ADPT is significantly lower than that of the other methods. D The cross-background drift percentage for each shuffle was analysed using one-way ANOVA. The ANOVA revealed slight differences in the inference results of the DeepLabCut model among individuals, and no differences for ADPT or SLEAP. ns.: not significant, *: P<0.05, **: P<0.01, ***: P<0.001, ****: P<0.0001.
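The one-way ANOVA used in panels B and D can be sketched as follows. This is a minimal, standard-library illustration of the F statistic, assuming each group holds the drift percentages of one individual (or background) across the 5 shuffles; the grouping and the function name `one_way_anova_f` are assumptions, and the numbers in the example are made up rather than taken from the study. In practice a library routine such as `scipy.stats.f_oneway` returns both the F statistic and the p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups.

    Each group is a list of observations, e.g. drift percentages of one
    individual across shuffles (hypothetical grouping).
    Requires non-zero within-group variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Illustrative values only (not data from the paper).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F relative to the F distribution with `(df_between, df_within)` degrees of freedom indicates that the group means differ more than the within-group scatter would explain.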

Analysis of ADPT’s anti-drift performance on monkey data, showing its cross-species anti-drift ability. A The time course of the y-axis position of sixteen body parts extracted from a one-minute video using ADPT, DeepLabCut, and SLEAP. ADPT successfully detected all 17 body parts of the monkey, while the other two methods encountered tracking drift because of the appearance of humans. B DeepLabCut and SLEAP both mistakenly located the monkey’s eyes on humans when they appeared, whereas ADPT tracked robustly. C, D The percentage of frames with tracking drift and failed detection (miss). Drift was concentrated mainly in the limbs because of the appearance of humans.

Evaluation results on public datasets. A Sample predictions on the single fly dataset. B Mean average precision (mAP) on the fly dataset, where ADPT achieved an average accuracy of 92.8% (the best model achieved 93.27%). C LRSS improved the average accuracy by 0.3% on the single fly dataset. D Relationship between the number of annotated images and the accuracy of ADPT on the fly dataset, where ADPT achieved acceptable performance with only 350 annotated images in a simple laboratory environment. Points indicate the validation accuracy of models trained on datasets with a specific number of labels. E The transformer improved the average accuracy by 0.4% on the single fly dataset. F Sample predictions on OMS_Dataset. G Root mean square error (RMSE) on OMS_Dataset, where ADPT achieved a smaller RMSE than SLEAP when threshold = 0.2, and a smaller RMSE than DeepLabCut when threshold = 0.6. P value, **: 0.001862, ns.: 0.243472, ***: 8.700e-06. H RMSE comparison on the hip and tail of OMS_Dataset. P value, ***: 0.000561, Hip ns.: 0.023766, Tail ns.: 0.336642, *: 0.035782.

Illustration of mix-up social animal dataset generation. A Frames originating from different videos and the corresponding backgrounds. B Mix-up image. C Schematic diagrams of the keypoints generated by ADPT’s single-animal pose estimation. D An augmented mix-up image. E Schematic diagrams of the augmented annotation. F Augmented keypoints. G Augmented LRSS. H Schematic diagrams of augmented Body Affinity Fields (BAF), inspired by Part Affinity Fields (Cao et al., 2021).

Applications of ADPT for multi-animal pose tracking.
A Left: The pipeline for the multi-animal identity-pose tracking task. B Confusion matrix of the 10-mice classification (accuracy = 93.16%). C Social mice tracking pipeline with an identification accuracy of 99.72%.

Evaluation of ADPT in the homecage social mice scenario.
A Illustration of the homecage social mice dataset. B Filtered back locations of different mice predicted by ADPT. C Comparison of different methods against manual labels. We trained each model three times, and this figure presents the results from one of those training sessions. The average RMSE between predictions and manual labels was 15.8 ± 0.59 pixels for ADPT, compared with 113.19 ± 42.75 pixels for DeepLabCut (DLC) and 94.76 ± 1.95 pixels for SLEAP. D Pose estimation accuracy comparison between ADPT and DLC based on the DLC evaluation metric. ADPT achieved an accuracy of 6.35 ± 0.14 pixels across all body parts of the mice, while DLC reached 7.49 ± 0.2 pixels. E Pose estimation accuracy comparison between ADPT and SLEAP using the SLEAP evaluation metric. ADPT achieved 8.33 ± 0.19 pixels across all body parts of the mice, compared to SLEAP’s 9.82 ± 0.57 pixels. F Body Affinity Fields (BAF) improved pose estimation accuracy by 0.4 pixels under the SLEAP evaluation metric.
