Real-time, low-latency closed-loop feedback using markerless posture tracking

  1. Gary A Kane
  2. Gonçalo Lopes
  3. Jonny L Saunders
  4. Alexander Mathis
  5. Mackenzie W Mathis (corresponding author)
  1. The Rowland Institute at Harvard, Harvard University, United States
  2. NeuroGEARS Ltd, United Kingdom
  3. Institute of Neuroscience, Department of Psychology, University of Oregon, United States
  4. Center for Neuroprosthetics, Center for Intelligent Systems, & Brain Mind Institute, School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Switzerland
6 figures, 5 tables and 1 additional file

Figures

Figure 1
Overview of using DLC networks in real-time experiments within Bonsai, the DLC-Live! GUI, and AutoPilot.
Figure 2 with 4 supplements
Inference speed of different networks with different image sizes on different system configurations.

Solid lines are tests using DLC-MobileNetV2-0.35 networks and dashed lines using DLC-ResNet-50 networks. Shaded regions represent 10th-90th percentile of the distributions. N = 1000–10,000 observations per point.
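These measurements can be reproduced with the DeepLabCut-live package, which also ships a dedicated benchmarking tool. Below is a minimal sketch of the underlying timing loop; the model and video paths are placeholders, and `DLCLive`, `init_inference`, and `get_pose` follow the package's core API (check your installed version for exact argument names).

```python
# Minimal sketch of an inference-speed benchmark with DeepLabCut-live.
# Paths are placeholders for an exported DLC model and a test video.
import time

import cv2
import numpy as np
from dlclive import DLCLive

dlc_live = DLCLive("/path/to/exported_model", resize=0.5)  # resize rescales frames before inference

cap = cv2.VideoCapture("/path/to/test_video.avi")
ret, frame = cap.read()
dlc_live.init_inference(frame)  # first call builds the graph and is slow; exclude it from timing

times = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    start = time.perf_counter()
    pose = dlc_live.get_pose(frame)  # (n_keypoints, 3): x, y, likelihood
    times.append(time.perf_counter() - start)
cap.release()

fps = 1.0 / np.array(times)
print(f"median FPS = {np.median(fps):.1f}; "
      f"10th-90th percentile = {np.percentile(fps, 10):.1f}-{np.percentile(fps, 90):.1f}")
```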

Figure 2—figure supplement 1
Inference speed of different networks with different image sizes on different system configurations, using CPUs for inference.

Solid lines are tests using DLC-MobileNetV2-0.35 networks and dashed lines using DLC-ResNet-50 networks. Shaded regions represent the 10th-90th percentile of the distributions. N = 1000–10,000 observations per point.

Figure 2—figure supplement 2
Accuracy of DLC networks on resized images.

Accuracy of DLC networks was tested on a range of image sizes (test image scale = 0.125–1.5, different panels). Lines indicate the root mean square error (RMSE, y-axis) between keypoints predicted by DLC networks trained with different scale parameters (x-axis) and human labels. Different colors indicate errors on images in the training dataset (blue) vs. the test dataset (red). Note that the RMSE is reported not on the scaled image but in original-image coordinates (640 × 480) for comparability.
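For reference, a minimal sketch of how such an RMSE in original-image coordinates can be computed; the array shapes and the `scale` convention are assumptions for illustration, not the paper's exact evaluation code.

```python
import numpy as np

def rmse_original_coords(pred_scaled, labels_orig, scale):
    """RMSE between predictions made on rescaled images and human labels,
    reported in original-image pixels.

    pred_scaled : (N, K, 2) keypoints predicted on images resized by `scale`
    labels_orig : (N, K, 2) human labels in original-image coordinates
    """
    pred_orig = pred_scaled / scale                          # map predictions back to original pixels
    err = np.linalg.norm(pred_orig - labels_orig, axis=-1)   # per-keypoint Euclidean error
    return np.sqrt(np.nanmean(err ** 2))                     # nanmean ignores unlabeled keypoints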

Figure 2—figure supplement 3
Dynamic cropping increased inference speed with little sacrifice to accuracy.

(Left) Inference speeds (in FPS) for full-size images compared to dynamic cropping with a 50, 25, or 10 pixel margin around the animal's last position. Boxplots represent the 0.1, 0.3, 0.5, 0.7, and 0.9 quantiles of each distribution. (Right) The cumulative distribution of errors (i.e. the proportion of images with a smaller error) for dynamic cropping relative to full-size images.
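The idea behind dynamic cropping is to restrict inference to a box around the last confident detection, falling back to the full frame if the animal is lost. A hand-rolled sketch of that logic follows; the function and its defaults are illustrative. In the DeepLabCut-live package itself, the equivalent behavior is enabled via a `dynamic` argument to `DLCLive` (check your version for the exact signature).

```python
import numpy as np

def dynamic_crop(frame, last_pose, margin=25, threshold=0.5):
    """Crop `frame` to a box around confidently detected keypoints.

    last_pose : (K, 3) array of x, y, likelihood from the previous frame.
    Returns the cropped frame and the (x0, y0) offset needed to map
    poses estimated on the crop back to full-frame coordinates.
    """
    confident = last_pose[last_pose[:, 2] > threshold]
    if len(confident) == 0:
        return frame, (0, 0)  # animal lost: fall back to the full frame
    x0 = max(int(confident[:, 0].min()) - margin, 0)
    y0 = max(int(confident[:, 1].min()) - margin, 0)
    x1 = min(int(confident[:, 0].max()) + margin, frame.shape[1])
    y1 = min(int(confident[:, 1].max()) + margin, frame.shape[0])
    return frame[y0:y1, x0:x1], (x0, y0)
```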

Figure 2—figure supplement 4
The number of keypoints in a DeepLabCut network does not affect inference speed.

Inference speeds were tested on a series of dog tracking networks with 1 (only the nose), 7 (the face), 13 (the upper body), or 20 (the entire body) keypoints, on different image sizes (x-axis) and different DeepLabCut architectures (different panels). Lines represent the median and the ribbons represent the 0.1–0.9 quantiles of the distribution of inference speeds.

Figure 3
Pose estimation latency using the DLC-Live! GUI.

Top: A lick sequence from the test video with eight keypoints measured using the DLC-Live! GUI (on a Windows/GeForce GTX 1080 GPU system). Image timestamps are presented in the bottom-left corner of each image. Bottom: Latency from image acquisition to obtaining the pose, from the last pose to the current pose, and from image acquisition when the mouse's tongue was detected to turning on an LED. The width of the violin plots indicates the probability density – the likelihood of observing a given value on the y-axis.
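Closed-loop feedback of this kind is implemented via a DeepLabCut-live `Processor` object, which receives each pose as soon as it is estimated. A minimal sketch, where the keypoint index, likelihood threshold, and `set_led` hardware call are placeholders for your own configuration and hardware interface (GPIO, serial to a Teensy, etc.):

```python
# Sketch of a DeepLabCut-live Processor that drives an LED while the
# tongue keypoint is confidently detected. `Processor` is the dlclive
# base class; TONGUE, LIK_THRESHOLD, and set_led are hypothetical.
from dlclive import DLCLive
from dlclive.processor import Processor

TONGUE = 0           # hypothetical index of the tongue keypoint
LIK_THRESHOLD = 0.8  # hypothetical minimum likelihood for a detection

def set_led(on: bool) -> None:
    """Placeholder for the real hardware call (GPIO write, serial command, ...)."""
    pass

class TongueLEDProcessor(Processor):
    def process(self, pose, **kwargs):
        # pose is (n_keypoints, 3): x, y, likelihood
        set_led(bool(pose[TONGUE, 2] > LIK_THRESHOLD))
        return pose

# The processor is attached at construction, so feedback runs as soon as
# each pose is estimated:
# dlc_live = DLCLive("/path/to/exported_model", processor=TongueLEDProcessor())
```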

Figure 4
Inference speed using the Bonsai-DLC plugin.

Dashed lines are ResNet-50, solid lines are MobileNetV2-0.35. (A) Overview of using DeepLabCut within Bonsai. Both capture-to-write and capture-to-detect workflows for real-time feedback are possible. (B) Top: Direct comparison of inference speeds using the Bonsai-DLC plugin vs. the DeepLabCut-live package across image sizes on the same computer (OS: Windows 10, GPU: NVIDIA GeForce GTX 1080 with Max-Q Design). Bottom: Inference speeds using the Bonsai-DLC plugin while the GPU was engaged in a particle simulation; more particles indicate greater competition for GPU resources.

Figure 5
Latency of DeepLabCut-live with Autopilot.

(A) Images of an LED were captured on a Raspberry Pi 4 (shaded blue) with Autopilot and sent to an NVIDIA Jetson TX2 (shaded red) to be processed with DeepLabCut-live and Autopilot Transforms, which triggered a TTL pulse from the Pi when the LED was lit. (B) Latencies from LED illumination to TTL pulse (software timestamps shown, points and densities) varied by resolution (color groups) and acquisition frame rate (shaded FPS bar). Processing time at each stage of the chain in (A) contributes to latency, but pose inference on the relatively slow TX2 (shaded color areas, from Figure 2) had the largest effect. Individual measurements (n = 500 per condition) cluster around multiples of the inter-frame interval (e.g. at 30 FPS, 1/30 s ≈ 33.3 ms).

Figure 6 with 2 supplements
Real-time feedback based on posture.

(A) Diagram of the workflow for the dog feedback experiment. Image acquisition and pose estimation were controlled by the DLC-Live! GUI. A DeepLabCut-live Processor was used to detect ‘rearing’ postures based on the likelihood and y-coordinate of the withers and elbows and turn an LED on or off accordingly. (B) An example jump sequence with LED status, labeled with keypoints measured offline using the DeepLabCut-live benchmarking tool. The images are cropped for visibility. (C) Example trajectories of the withers and elbows, locked to one jump sequence. Larger, transparent points represent the true trajectories – trajectories measured offline, from each image in the video. The smaller opaque points represent trajectories measured in real-time, in which the time of each point reflects the delay from image acquisition to pose estimation, with and without the Kalman filter forward prediction. Without forward prediction, estimated trajectories are somewhat delayed from the true trajectories. With the Kalman filter forward prediction, trajectories are less accurate but less delayed when keypoints exhibit rapid changes in position, such as during a rearing movement. (D) The delay from the time the dog first exhibited a rearing posture (from postures measured offline) to the time the LED was turned on or off. Each point represents a single instance of the detection of a transition to a rearing posture or out of a rearing posture.
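The posture classification itself reduces to a simple geometric rule applied inside a Processor. The sketch below shows one plausible rule under stated assumptions: the keypoint indices, likelihood threshold, and pixel margin are placeholders, and the exact criterion used in the experiment is described in the paper's methods, not reproduced here.

```python
# Illustrative rearing detector based on the likelihood and y-coordinate
# of the withers and elbows. All constants are hypothetical placeholders.
WITHERS, ELBOW_L, ELBOW_R = 0, 1, 2  # hypothetical keypoint indices
LIK_THRESHOLD = 0.8                  # hypothetical likelihood threshold
MARGIN_PX = 20                       # hypothetical height margin in pixels

def is_rearing(pose):
    """pose: (n_keypoints, 3) array of x, y, likelihood (image y grows downward)."""
    if (pose[[WITHERS, ELBOW_L, ELBOW_R], 2] < LIK_THRESHOLD).any():
        return False  # require confident detections of all three keypoints
    higher_elbow_y = min(pose[ELBOW_L, 1], pose[ELBOW_R, 1])  # smaller y = higher in image
    return pose[WITHERS, 1] < higher_elbow_y - MARGIN_PX      # withers well above elbows
```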

Figure 6—figure supplement 1
The Kalman filter predictor reduces errors when predicting up to three frames into the future in a mouse reaching task.

(Top) Images at different stages of the mouse reach, with the blue circle indicating the DeepLabCut-predicted coordinate of the back of the hand. (Bottom left) The trajectory of the x-coordinate of the back of the mouse's hand during the reach (pictured above). The thicker gray line indicates the trajectory measured by DLC. The blue line indicates the DLC trajectory as if it were delayed by 1–7 frames (indicated by the panel label on the right). The red line indicates the trajectory as if it were predicted 1–7 frames into the future by the Kalman filter predictor. (Bottom right) The cumulative distribution of errors for the back-of-the-hand coordinate as if it were delayed 1–7 frames (blue) or predicted 1–7 frames into the future using the Kalman filter predictor (red), compared to the DLC-estimated back-of-the-hand coordinate. The dotted vertical line indicates the error at which the Kalman filter distribution intersects the delayed-pose distribution (i.e. the point at which more frames have this pixel error or smaller for the Kalman filter distribution than for the delayed-pose distribution).
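A constant-velocity Kalman filter of this kind maintains position and velocity estimates per keypoint coordinate and can be rolled forward without measurements to predict future positions. The following is a generic sketch, not the exact implementation shipped in the dlclive package; the noise parameters `q` and `r` are illustrative.

```python
import numpy as np

class KalmanForward:
    """Constant-velocity Kalman filter for one keypoint coordinate,
    with n-frame forward prediction (generic illustrative sketch)."""

    def __init__(self, q=1.0, r=10.0):
        self.F = np.array([[1.0, 1.0], [0.0, 1.0]])  # state transition over one frame (pos, vel)
        self.H = np.array([[1.0, 0.0]])              # we observe position only
        self.Q = q * np.eye(2)                       # process noise covariance
        self.R = np.array([[r]])                     # measurement noise covariance
        self.x = np.zeros(2)                         # state estimate [position, velocity]
        self.P = np.eye(2) * 100.0                   # state covariance (large = uncertain)

    def update(self, z):
        """Predict one step, then correct with the new measurement z."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        y = np.atleast_1d(z) - self.H @ self.x        # innovation
        S = self.H @ self.P @ self.H.T + self.R       # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P

    def predict_ahead(self, n_frames):
        """Roll the constant-velocity model forward without measurements."""
        x = self.x.copy()
        for _ in range(n_frames):
            x = self.F @ x
        return x[0]  # predicted position n frames in the future
```

In practice one such filter runs per keypoint coordinate, and the forward-prediction horizon is matched to the measured acquisition-to-pose delay.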

Figure 6—figure supplement 2
The Kalman filter predictor reduces errors when predicting up to three frames into the future in a mouse open-field task.

(Top) Images at different stages of the mouse open-field video, with the blue circle indicating the DeepLabCut-predicted coordinate of the snout. (Bottom left) The trajectory of the x-coordinate of the mouse's snout. The thicker gray line indicates the trajectory measured by DLC. The blue line indicates the DLC trajectory as if it were delayed by 1–7 frames (indicated by the panel label on the right). The red line indicates the trajectory as if it were predicted 1–7 frames into the future by the Kalman filter predictor. (Bottom right) The cumulative distribution of errors for the snout coordinate as if it were delayed 1–7 frames (blue) or predicted 1–7 frames into the future using the Kalman filter predictor (red), compared to the DLC-estimated snout coordinate. The dotted vertical line indicates the error at which the two distributions intersect (i.e. the point at which more frames have this pixel error or smaller for either the delayed-pose or the Kalman filter distribution).

Tables

Table 1
Inference speed using the DeepLabCut-live package.

All values are inference speeds in frames per second (FPS), mean ± standard deviation. See also: https://deeplabcut.github.io/DLC-inferencespeed-benchmark/.

Dog video (image size in pixels, w × h):

| OS | Processor | DLC model | 75 × 133 | 150 × 267 | 300 × 533 | 424 × 754 | 600 × 1067 |
|---|---|---|---|---|---|---|---|
| Linux | Intel Xeon CPU (3.1 GHz) | MobileNetV2-0.35 | 62 ± 5 | 31 ± 1 | 14 ± 0 | 8 ± 0 | 4 ± 0 |
| | | ResNet-50 | 24 ± 1 | 11 ± 0 | 3 ± 0 | 2 ± 0 | 1 ± 0 |
| | GeForce GTX 1080 | MobileNetV2-0.35 | 256 ± 51 | 208 ± 46 | 124 ± 19 | 80 ± 9 | 43 ± 1 |
| | | ResNet-50 | 121 ± 15 | 95 ± 13 | 52 ± 3 | 33 ± 1 | 19 ± 0 |
| | TITAN RTX | MobileNetV2-0.35 | 168 ± 29 | 144 ± 20 | 133 ± 14 | 115 ± 9 | 71 ± 3 |
| | | ResNet-50 | 132 ± 14 | 113 ± 11 | 82 ± 4 | 56 ± 2 | 33 ± 1 |
| | Tesla V100-SXM2-16GB | MobileNetV2-0.35 | 229 ± 13 | 189 ± 10 | 138 ± 11 | 105 ± 6 | 64 ± 3 |
| | | ResNet-50 | 169 ± 7 | 133 ± 4 | 90 ± 4 | 65 ± 2 | 42 ± 1 |
| | Tesla P100-PCIE-16GB | MobileNetV2-0.35 | 220 ± 12 | 179 ± 9 | 114 ± 7 | 77 ± 3 | 44 ± 1 |
| | | ResNet-50 | 115 ± 3 | 91 ± 2 | 59 ± 2 | 45 ± 1 | 26 ± 1 |
| | Tesla K80 | MobileNetV2-0.35 | 118 ± 4 | 105 ± 3 | 64 ± 4 | 47 ± 2 | 26 ± 1 |
| | | ResNet-50 | 58 ± 2 | 43 ± 1 | 21 ± 1 | 13 ± 0 | 7 ± 0 |
| | Tesla T4 | MobileNetV2-0.35 | 200 ± 17 | 166 ± 13 | 117 ± 10 | 86 ± 5 | 49 ± 2 |
| | | ResNet-50 | 134 ± 8 | 99 ± 5 | 51 ± 3 | 33 ± 1 | 18 ± 0 |
| | Quadro P400 | MobileNetV2-0.35 | 105 ± 4 | 76 ± 2 | 31 ± 1 | 18 ± 0 | 10 ± 0 |
| | | ResNet-50 | 24 ± 0 | 16 ± 0 | 6 ± 0 | 4 ± 0 | 2 ± 0 |
| Windows | Intel Xeon Silver CPU (2.1 GHz) | MobileNetV2-0.35 | 28 ± 1 | 13 ± 0 | 6 ± 0 | 3 ± 0 | 1 ± 0 |
| | | ResNet-50 | 22 ± 1 | 9 ± 0 | 3 ± 0 | 2 ± 0 | 1 ± 0 |
| | GeForce GTX 1080 with Max-Q Design | MobileNetV2-0.35 | 142 ± 17 | 132 ± 14 | 98 ± 5 | 69 ± 3 | 41 ± 1 |
| | | ResNet-50 | 87 ± 3 | 77 ± 3 | 48 ± 1 | 31 ± 1 | 18 ± 0 |
| | GeForce GTX 1080 | MobileNetV2-0.35 | 128 ± 11 | 115 ± 10 | 94 ± 7 | 72 ± 3 | 41 ± 1 |
| | | ResNet-50 | 101 ± 5 | 86 ± 4 | 49 ± 1 | 32 ± 1 | 18 ± 0 |
| | Quadro P1000 | MobileNetV2-0.35 | 120 ± 11 | 108 ± 10 | 58 ± 2 | 39 ± 1 | 20 ± 0 |
| | | ResNet-50 | 54 ± 2 | 38 ± 1 | 17 ± 0 | 10 ± 0 | 5 ± 0 |
| MacOS | Intel Core i5 CPU (2.4 GHz) | MobileNetV2-0.35 | 39 ± 5 | 20 ± 2 | 11 ± 1 | 7 ± 1 | 4 ± 0 |
| | | ResNet-50 | 8 ± 1 | 4 ± 0 | 1 ± 0 | 1 ± 0 | 0 ± 0 |
| | Intel Core i7 CPU (3.5 GHz) | MobileNetV2-0.35 | 117 ± 8 | 47 ± 3 | 15 ± 1 | 8 ± 0 | 4 ± 0 |
| | | ResNet-50 | 29 ± 2 | 11 ± 1 | 3 ± 0 | 2 ± 0 | 1 ± 0 |
| | Intel Core i9 CPU (2.4 GHz) | MobileNetV2-0.35 | 126 ± 25 | 66 ± 13 | 19 ± 3 | 11 ± 1 | 6 ± 0 |
| | | ResNet-50 | 31 ± 6 | 16 ± 2 | 6 ± 1 | 4 ± 0 | 2 ± 0 |
| Jetson | Xavier | MobileNetV2-0.35 | 68 ± 8 | 60 ± 7 | 54 ± 4 | 41 ± 1 | 25 ± 1 |
| | | ResNet-50 | 46 ± 1 | 34 ± 1 | 17 ± 0 | 12 ± 0 | 6 ± 0 |
| | Tegra X2 | MobileNetV2-0.35 | 37 ± 2 | 32 ± 2 | 13 ± 0 | 7 ± 0 | 4 ± 0 |
| | | ResNet-50 | 18 ± 1 | 12 ± 0 | 5 ± 0 | 3 ± 0 | 2 ± 0 |

Mouse video (image size in pixels, w × h):

| OS | Processor | DLC model | 115 × 87 | 229 × 174 | 459 × 349 | 649 × 493 | 917 × 698 |
|---|---|---|---|---|---|---|---|
| Linux | Intel Xeon CPU (3.1 GHz) | MobileNetV2-0.35 | 61 ± 4 | 32 ± 1 | 15 ± 0 | 8 ± 0 | 4 ± 0 |
| | | ResNet-50 | 28 ± 1 | 11 ± 0 | 4 ± 0 | 2 ± 0 | 1 ± 0 |
| | GeForce GTX 1080 | MobileNetV2-0.35 | 285 ± 24 | 209 ± 23 | 134 ± 9 | 86 ± 2 | 44 ± 1 |
| | | ResNet-50 | 136 ± 8 | 106 ± 3 | 60 ± 1 | 35 ± 0 | 19 ± 0 |
| | TITAN RTX | MobileNetV2-0.35 | 169 ± 28 | 145 ± 19 | 152 ± 15 | 124 ± 9 | 78 ± 3 |
| | | ResNet-50 | 140 ± 16 | 119 ± 11 | 92 ± 3 | 58 ± 2 | 35 ± 1 |
| | Tesla V100-SXM2-16GB | MobileNetV2-0.35 | 260 ± 12 | 218 ± 9 | 180 ± 18 | 121 ± 8 | 68 ± 3 |
| | | ResNet-50 | 184 ± 6 | 151 ± 5 | 111 ± 6 | 75 ± 3 | 47 ± 2 |
| | Tesla P100-PCIE-16GB | MobileNetV2-0.35 | 246 ± 12 | 198 ± 7 | 138 ± 8 | 79 ± 3 | 46 ± 1 |
| | | ResNet-50 | 128 ± 3 | 103 ± 2 | 70 ± 3 | 46 ± 1 | 28 ± 1 |
| | Tesla K80 | MobileNetV2-0.35 | 127 ± 6 | 119 ± 5 | 79 ± 4 | 52 ± 2 | 28 ± 1 |
| | | ResNet-50 | 60 ± 2 | 45 ± 2 | 23 ± 1 | 13 ± 0 | 7 ± 0 |
| | Tesla T4 | MobileNetV2-0.35 | 242 ± 21 | 197 ± 16 | 156 ± 14 | 101 ± 6 | 56 ± 2 |
| | | ResNet-50 | 141 ± 7 | 102 ± 5 | 57 ± 3 | 34 ± 1 | 20 ± 0 |
| | Quadro P400 | MobileNetV2-0.35 | 114 ± 5 | 84 ± 3 | 37 ± 1 | 20 ± 0 | 10 ± 0 |
| | | ResNet-50 | 25 ± 0 | 16 ± 0 | 7 ± 0 | 4 ± 0 | 2 ± 0 |
| Windows | Intel Xeon Silver CPU (2.1 GHz) | MobileNetV2-0.35 | 28 ± 1 | 14 ± 0 | 6 ± 0 | 3 ± 0 | 1 ± 0 |
| | | ResNet-50 | 23 ± 1 | 9 ± 0 | 3 ± 0 | 2 ± 0 | 1 ± 0 |
| | GeForce GTX 1080 with Max-Q Design | MobileNetV2-0.35 | 147 ± 17 | 136 ± 15 | 108 ± 6 | 73 ± 3 | 46 ± 1 |
| | | ResNet-50 | 107 ± 5 | 93 ± 4 | 54 ± 2 | 32 ± 1 | 18 ± 0 |
| | GeForce GTX 1080 | MobileNetV2-0.35 | 133 ± 15 | 119 ± 14 | 100 ± 9 | 77 ± 3 | 43 ± 1 |
| | | ResNet-50 | 110 ± 5 | 94 ± 3 | 55 ± 1 | 34 ± 1 | 19 ± 0 |
| | Quadro P1000 | MobileNetV2-0.35 | 129 ± 13 | 115 ± 11 | 72 ± 3 | 43 ± 1 | 22 ± 0 |
| | | ResNet-50 | 57 ± 2 | 41 ± 1 | 19 ± 0 | 10 ± 0 | 6 ± 0 |
| MacOS | Intel Core i5 CPU (2.4 GHz) | MobileNetV2-0.35 | 42 ± 5 | 22 ± 2 | 12 ± 1 | 7 ± 1 | 4 ± 0 |
| | | ResNet-50 | 9 ± 1 | 4 ± 0 | 2 ± 0 | 1 ± 0 | 0 ± 0 |
| | Intel Core i7 CPU (3.5 GHz) | MobileNetV2-0.35 | 127 ± 10 | 48 ± 3 | 16 ± 1 | 9 ± 1 | 4 ± 0 |
| | | ResNet-50 | 30 ± 3 | 10 ± 2 | 3 ± 0 | 2 ± 0 | 1 ± 0 |
| | Intel Core i9 CPU (2.4 GHz) | MobileNetV2-0.35 | 178 ± 15 | 74 ± 16 | 19 ± 3 | 10 ± 1 | 6 ± 0 |
| | | ResNet-50 | 35 ± 8 | 14 ± 2 | 6 ± 1 | 4 ± 0 | 2 ± 0 |
| Jetson | Xavier | MobileNetV2-0.35 | 79 ± 9 | 68 ± 9 | 59 ± 5 | 44 ± 2 | 27 ± 1 |
| | | ResNet-50 | 46 ± 2 | 36 ± 1 | 19 ± 0 | 12 ± 0 | 7 ± 0 |
| | TX2 | MobileNetV2-0.35 | 39 ± 2 | 30 ± 2 | 18 ± 1 | 9 ± 0 | 5 ± 0 |
| | | ResNet-50 | 19 ± 1 | 11 ± 0 | 6 ± 0 | 4 ± 0 | 2 ± 0 |
Table 2
Performance of the DeepLabCut-live-GUI (DLG).

F-P = delay from image acquisition to pose estimation; F-L = delay from image acquisition to turning on the LED; FPS (DLG) = rate of pose estimation (in frames per second) in the DeepLabCut-live-GUI; FPS (DLCLive) = rate of pose estimation for the exact same configuration tested directly with the DeepLabCut-live benchmarking tool. All values are mean ± SD.

176 × 137 pixel images:

| System | GPU type | Mode | F-P (ms) | F-L (ms) | FPS (DLG) | FPS (DLCLive) |
|---|---|---|---|---|---|---|
| Windows | GeForce GTX 1080 | Latency | 10 ± 1 | 11 ± 1 | 56 ± 16 | 123 ± 16 |
| | | Rate | 15 ± 3 | 16 ± 3 | 91 ± 11 | |
| Linux | Quadro P400 | Latency | 11 ± 1 | 11 ± 1 | 63 ± 19 | 105 ± 4 |
| | | Rate | 15 ± 3 | 16 ± 3 | 93 ± 9 | |
| Jetson | Xavier | Latency | 13 ± 1 | 14 ± 1 | 48 ± 3 | 84 ± 7 |
| | | Rate | 18 ± 3 | 18 ± 3 | 75 ± 5 | |
| MacOS | CPU | Latency | 29 ± 5 | 29 ± 4 | 29 ± 5 | 62 ± 6 |
| | | Rate | 34 ± 7 | 35 ± 6 | 35 ± 4 | |

352 × 274 pixel images:

| System | GPU type | Mode | F-P (ms) | F-L (ms) | FPS (DLG) | FPS (DLCLive) |
|---|---|---|---|---|---|---|
| Windows | GeForce GTX 1080 | Latency | 12 ± 2 | 12 ± 2 | 48 ± 6 | 112 ± 12 |
| | | Rate | 16 ± 3 | 17 ± 3 | 84 ± 9 | |
| Linux | Quadro P400 | Latency | 20 ± 1 | 21 ± 1 | 42 ± 7 | 52 ± 1 |
| | | Rate | 27 ± 4 | 27 ± 4 | 47 ± 4 | |
| Jetson | Xavier | Latency | 13 ± 1 | 14 ± 1 | 48 ± 3 | 73 ± 9 |
| | | Rate | 18 ± 3 | 18 ± 3 | 74 ± 5 | |
| MacOS | CPU | Latency | 79 ± 19 | 79 ± 22 | 12 ± 2 | 21 ± 3 |
| | | Rate | 91 ± 19 | 92 ± 22 | 12 ± 2 | |
Table 3
Materials for Autopilot tests.
| Tool | Version |
|---|---|
| Raspberry Pi | 4, 2 GB |
| Autopilot | 0.3.0-2f31e78 |
| Jetson | TX2 Developer Kit |
| Camera | FLIR CM3-U3-13Y3M-CS |
| Spinnaker SDK | 2.0.0.147 |
| Oscilloscope | Tektronix TDS 2004B |
Table 4
Software packages presented with this paper.
| Name | URL |
|---|---|
| DeepLabCut-Live! SDK | GitHub Link |
| Benchmarking Submission | GitHub Link |
| Benchmarking Results | Website Link |
| DLC-Live! GUI | GitHub Link |
| Bonsai - DLC Plugin | GitHub Link |
| AutoPilot - DLC | GitHub Link |
Table 5
Relevant DLC updates.
| Feature | DLC version | Pub. link |
|---|---|---|
| DLC-ResNets | 1, 2.0+ | Mathis et al., 2018b; Nath et al., 2019 |
| DLC-MobileNetV2s | 2.1+ | Mathis et al., 2020a |
| Model Export Fxn | 2.1.8+ | this paper |
| DeepLabCut-live | new package | this paper |


Cite this article

Gary A Kane, Gonçalo Lopes, Jonny L Saunders, Alexander Mathis, Mackenzie W Mathis (2020) Real-time, low-latency closed-loop feedback using markerless posture tracking. eLife 9:e61909. https://doi.org/10.7554/eLife.61909