DeePosit, an AI-based tool for detecting mouse urine and fecal depositions from thermal video clips of behavioral experiments

  1. David Peles (corresponding author)
  2. Shai Netser
  3. Natalie Ray
  4. Taghreed Suliman
  5. Shlomo Wagner
  1. Sagol Department of Neurobiology, Faculty of Natural Sciences, University of Haifa, Israel
6 figures, 2 videos, 2 tables and 1 additional file

Figures

Figure 1
Investigation time across sexes and tests in CD1 mice.

Each of the tests (SP, SxP, and ESPs) comprises a 15-min habituation stage with empty chambers, followed by a 5-min trial stage in which the stimuli are present in the chambers (a). The setup row shows schematic representations of the arena for the (b) SP, (c) SxP, and (d) ESPs tests, while the males and females rows show the mean (± SEM) time dedicated by male (n = 36, blue bars) and female (n = 35, red bars) mice to investigating each stimulus during the various tests. The two leftmost bars in each panel show the total investigation time, the two middle bars show the time spent on short (≤ 6 s) investigation bouts, and the two rightmost bars show the time spent on long (> 6 s) investigation bouts. A two-sided Wilcoxon rank sum test was used to assess statistical significance. ***p < 0.001.
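The split into short and long bouts described in this caption can be sketched as follows. This is a minimal illustration, assuming the bout durations are already available in seconds. The function name `split_investigation_time` and the example durations are hypothetical; only the 6 s boundary comes from the caption.

```python
def split_investigation_time(bout_durations_s, boundary_s=6.0):
    """Return (total, short, long) investigation time in seconds.

    Short bouts last <= boundary_s; long bouts last > boundary_s.
    """
    short = sum(d for d in bout_durations_s if d <= boundary_s)
    long_ = sum(d for d in bout_durations_s if d > boundary_s)
    return short + long_, short, long_


# Hypothetical bout durations (seconds) toward one stimulus.
total, short, long_ = split_investigation_time([2.0, 6.0, 7.5, 12.0])
print(total, short, long_)
```

A bout of exactly 6 s counts as short here, matching the "≤ 6 s" wording of the caption.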

Figure 2 with 2 supplements
The experimental setup and analysis method.

The experimental setup (a) includes a visible light (VIS) camera, an infrared (IR) camera, and a blackbody set to 37°C. VIS (b) and IR (c) images captured at the same moment, shortly after a urine deposition, show that because the urine is still warm, it appears as a highly contrasted blob in the IR image but not in the VIS one. Large urine spots, such as the one shown in (d), may be smeared across the arena’s floor (e), which is one limitation of using filter paper to quantify urination at the end of the experiment. The preliminary detection algorithm is based on subtracting a background image from each frame in the video (f), which allows the detection of hot blobs reflecting the animal itself and urine and feces deposits. The detected blobs are then classified using a transformer-based artificial neural network (g), which receives as input a time series of patches cropped around the detection and outputs its classification. Every three patches in that time series are merged into a single RGB image (see Methods). In the confusion matrix presenting the accuracy of the full pipeline for test videos (h) in CD1 mice, the ‘Miss’ row counts the events that were not detected by the preliminary hot blob detection and, hence, were not fed to the classifier. The BG (background) column counts the automatic detections for which no matching manually tagged event exists in the relevant space and time window. The test set includes videos from 60 experiments. See Methods for more details. The precision, recall, and F1 score are 0.90, 0.86, and 0.88, respectively, for urine detection and 0.91, 0.89, and 0.90 for feces detection. The mean F1 score, (F1Urine+F1Feces)/2, is 0.89.
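As a quick consistency check on the scores quoted in this caption, the F1 score can be recomputed as the harmonic mean of precision and recall. This is a sketch that starts from the rounded values printed in the caption, so it is not the authors' evaluation code; the function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall values taken from the caption.
f1_urine = f1_score(0.90, 0.86)
f1_feces = f1_score(0.91, 0.89)
mean_f1 = (f1_urine + f1_feces) / 2
print(round(f1_urine, 2), round(f1_feces, 2), round(mean_f1, 2))
```

Rounded to two decimals, the results agree with the reported 0.88, 0.90, and 0.89.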

Figure 2—figure supplement 1
Accuracy for small and large detections in CD1 mice.

(a, b) Confusion matrices on test videos with separation between large and small automatic detections. The threshold for large detections is an area of 1 cm², which corresponds to 47.3 pixels. The percentages shown sum to 100% for each column in (a) and each row in (b). The Large Urination class is correct in 98.2% of the cases in which it was reported by the classifier, while Small Urination is correct in only 84.5%, as shown in (b). Most of the confusion between feces and urine spots is for small detections: 2.3% of the ground truth (GT) urine events were classified as Small Feces and 0% as Large Feces, as shown in (a). Also, 2.4% of the GT feces events were classified as Small Urine and 0% as Large Urine. No GT feces event was classified as Large BG. While feces are usually small, a Large Feces detection might occur when two adjacent feces are detected as a single segment or when the detected segment contains both urine and feces. The test set includes videos from 60 experiments.
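The size split used in these matrices amounts to a pixel-to-area conversion followed by a threshold. The sketch below uses the 47.3 pixels per cm² figure from the caption. Treating a detection of exactly 1 cm² as large is an assumption (elsewhere, small detections are described as having an area below 1 cm²), and the function names are hypothetical.

```python
PIXELS_PER_CM2 = 47.3  # from the caption: 1 cm^2 corresponds to 47.3 pixels
LARGE_AREA_CM2 = 1.0   # threshold separating small from large detections


def area_cm2(pixel_count):
    """Convert a blob's pixel count to an area in cm^2."""
    return pixel_count / PIXELS_PER_CM2


def size_class(pixel_count):
    """Label a detection 'large' at >= 1 cm^2 (assumed), otherwise 'small'."""
    return "large" if area_cm2(pixel_count) >= LARGE_AREA_CM2 else "small"


print(size_class(47), size_class(48))
```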

Figure 2—video 1
Video for the events in the confusion matrix.

Each urine or feces event is shown in a 65 × 65 pixel window from 11 s before the event to 60 s afterward (similar to the classifier input). The video shows the manual annotation side by side with the automatic detection that was matched with it. Note that there are no automatic detections for ‘Miss’ and no manual annotations for ‘BG’. The video plays at 3× speed.

Figure 3 with 3 supplements
Validation of DeePosit accuracy.

Mean accuracy ± SEM of urine (a) and fecal (b) deposit detection by DeePosit, as measured by F1 score across various stages of the experiment. Each ‘+’ or ‘o’ marks the F1 score for a single mouse in a single experiment. No significant difference was found. Similarly, DeePosit accuracy was not significantly affected by the experiment type (c), by the sex of the subject mouse (d), or by the spatial location of the deposition in the arena (the arena’s floor was divided into three equal parts) (e). A two-sided Wilcoxon rank sum test was used. The tests in (a–c, e) are FDR-corrected rank sum tests (Benjamini and Hochberg, 1995). The # in (b) stands for an FDR-corrected p-value of 0.08. Sixty test videos (24 videos with a male subject mouse and 36 with a female) were used in (a, b, d). Forty-six test videos were used in (c, e), of which 18, 14, and 14 videos were SP, SxP, and ESPs, respectively. Mice without manually annotated depositions of the relevant type (either urine or feces) during the relevant period, experiment, or spatial location were excluded (since F1 is not defined in such cases). Since differentiating small urine and feces spots in thermal videos can be challenging even for humans, we evaluated the accuracy of a second human annotator on 25 test videos of CD1 mice (a subset of the full test set) and reported the accuracy achieved on these videos by both DeePosit (f) and the second human annotator (g). The mean F1 score, (F1Urine+F1Feces)/2, is 0.86 for the second human annotator and 0.84 for the DeePosit algorithm. To compare our result with another popular object detection approach, we annotated 39 training videos of CD1 mice with bounding boxes to match the YOLOv8 framework. For fairness, we trained both algorithms on the same training set of videos. We tested the accuracy on the test set, which includes 60 videos. (h) shows the confusion matrix for DeePosit, while (i, j) show the confusion matrices achieved using YOLOv8 with a single image as input (YOLOv8 Gray) and with three images as input, representing times t + 0, t + 10, and t + 30 s from each event (YOLOv8 RGB). DeePosit accuracy surpasses the YOLOv8 results in both cases. YOLOv8 RGB accuracy surpasses YOLOv8 Gray, suggesting that temporal information is helpful in the detection of urine and feces.
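The FDR correction cited in this caption (Benjamini and Hochberg, 1995) can be sketched in a few lines. This is a generic implementation of the standard adjusted p-value procedure, not the authors' analysis code, and the example p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is scaled by m / rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.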

Figure 3—figure supplement 1
Accuracy for small and large detections in C57BL/6 mice.

To check the robustness of our method for different strains of mice and experimental conditions, we tested our algorithm on black C57BL/6 male mice and a white arena (the arena is white in visible light but looks dark in long-wave infrared). (a) Confusion matrices reflecting the accuracy of the DeePosit algorithm on 10 SP and 10 SxP videos that were not included in the training set. The mean F1 for C57BL/6 is 0.81. Interestingly, C57BL/6 mice do not produce small urine spots, and hence, all the ‘small urine’ detections were wrong. Ignoring the small urine detections improves the mean F1 score to 0.86.

Figure 3—figure supplement 2
Detection accuracy at various values of ΔTThreshold.

DeePosit accuracy was measured for several values of the preliminary heuristic detection temperature threshold, ΔTThreshold. The best results were achieved with ΔTThreshold = 1.6°C. However, a good accuracy level (F1 score between 0.88 and 0.89) was observed for all thresholds between 1.1 and 3.0°C. See Methods for more details.
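To make the role of ΔTThreshold concrete, the sketch below marks pixels that are warmer than a background image by at least the threshold. It is a simplified illustration only: how the background is estimated and how hot pixels are grouped into blobs are described in the Methods and are not reproduced here. The function name, the use of `>=`, and the example temperatures are assumptions.

```python
def hot_pixel_mask(frame, background, delta_t_threshold=1.6):
    """Mark pixels warmer than the background by at least the threshold.

    frame and background are equally sized 2D lists of temperatures (deg C).
    """
    return [
        [(f - b) >= delta_t_threshold for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


# Hypothetical 2 x 3 thermal images.
background = [[24.0, 24.0, 24.0], [24.0, 24.0, 24.0]]
frame = [[24.1, 30.5, 24.0], [24.2, 29.0, 25.0]]
mask = hot_pixel_mask(frame, background)
print(mask)
```

Raising the threshold removes weakly warm pixels (such as the 25.0°C pixel in this example, which is already excluded at 1.6°C), at the cost of possibly missing deposits that have cooled.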

Figure 3—figure supplement 3
Examples of detections in test videos.

(a, b) Examples of urination and defecation events that were detected and classified correctly. Each pair of columns includes a ground truth detection (to the left) next to the matched automatic detection (to the right), which includes the mask of the detected blob. The overlaid text mentions the video index and the frame index. (b) Urination events that were wrongly classified as background. (c) Urine depositions that were classified as feces. (d) Fecal depositions that were classified as urine.

Figure 4 with 2 supplements
Urine and fecal deposition detection results across tests in CD1 mice.

Each o represents a single detection of urine deposition (a), while each + represents a single detection of fecal deposition (b). A black dot in the center of a circle or a + sign marks that this detection is on the side of the preferred stimulus, defined as the social stimulus in the SP trial, the female in the SxP trial, and the stressed mouse in the ESPs trial. Short green lines mark the start and end of the habituation stage and the end of the trial stage, while short vertical black lines mark the end of minute 14 of the habituation stage. The vertical black line at time = 0 marks the start of the trial stage after stimuli introduction to the arena, while the vertical dashed line marks 4 min after the beginning of the trial. Dynamics plots (right) show mean rate (c) and mean area (d) per minute for both urine and fecal deposits. Error bars represent ± SEM.
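The dynamics plots report a mean rate per minute with SEM error bars. A minimal sketch of that binning is shown below, assuming event times are given in seconds from the start of the stage being analyzed. The function name and the example data are hypothetical, and SEM is taken as the sample standard deviation divided by the square root of the number of mice.

```python
import math
import statistics


def per_minute_rate(event_times_by_mouse, n_minutes):
    """Mean and SEM of event counts per 1-min bin across mice.

    event_times_by_mouse: one list of event times (seconds) per mouse.
    Returns a list of (mean, sem) tuples, one per minute.
    """
    n_mice = len(event_times_by_mouse)
    results = []
    for minute in range(n_minutes):
        start, end = 60 * minute, 60 * (minute + 1)
        counts = [
            sum(1 for t in times if start <= t < end)
            for times in event_times_by_mouse
        ]
        mean = statistics.mean(counts)
        # SEM = sample standard deviation / sqrt(number of mice).
        sem = statistics.stdev(counts) / math.sqrt(n_mice) if n_mice > 1 else 0.0
        results.append((mean, sem))
    return results


# Two hypothetical mice, two minutes of recording.
rates = per_minute_rate([[5.0, 70.0], [10.0, 20.0]], n_minutes=2)
print(rates)
```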

Figure 4—figure supplement 1
Urine and fecal deposition detection results across tests in C57BL/6 mice.

DeePosit detections for 10 SP and 10 SxP tests performed by male C57BL/6 mice that were not included in the training set are shown in (a–h) in a similar manner to Figure 4. We chose to ignore small urine detections (deposition area < 1 cm²) as we found that C57BL/6 males do not emit small urine depositions. Dynamics plots (c, d, g, h) show mean rate or mean deposition area, while error bars show ± SEM.

Figure 4—figure supplement 2
Urine and fecal deposition side preference.

A comparison of the mean rate ± SEM (a, b) and area (c, d) of urine (two left bars in each panel) and fecal (two right bars in each panel) depositions made by male (blue bars) and female (red bars) subject mice on each side of the arena. For all three tests, two-sided Wilcoxon rank sum test p-values equal to or smaller than 0.1 and 0.05 are marked with # and *, respectively. Mice with zero urine detections in the relevant test were excluded from the urine analysis, and likewise for the feces analysis. For SP, n = 10 for male urine and n = 6 for feces, n = 4 for female urine and n = 4 for feces. For SxP, n = 24 for male urine and n = 11 for feces, n = 9 for female urine and n = 3 for feces. For ESPs, n = 17 for male urine and n = 6 for feces, n = 7 for female urine and n = 4 for feces.

Figure 5 with 2 supplements
Comparison between test stages.

Mean rate ± SEM of urination and defecation events detected during habituation start (minutes 1–4), habituation end (minutes 11–14), and trial (minutes 1–4) stages, for male CD1 mice (a), female CD1 mice (b), and male C57BL/6 mice (d). Percent of active mice (mice with at least one detection) across tests during habituation start, habituation end, and trial stages, for CD1 mice (c) and for male C57BL/6 mice (e). Two-sided Wilcoxon rank sum test p-values equal to or smaller than 0.1, 0.05, 0.01, and 0.001 are marked with #, *, **, and ***, respectively. In (a, b, d), only mice with urination in at least one of the periods were included in the urine analysis, and likewise for feces. In (a, b), for CD1 males, n = 13, 27, and 19 for urination in SP, SxP, and ESPs, respectively, and n = 21, 28, and 21 for defecation. Correspondingly, for CD1 females, n = 5, 9, and 8 for urination and n = 9, 14, and 14 for defecation. In (d), n = 6 and 6 for urination and n = 7 and 9 for defecation in SP and SxP, respectively. In (c), the total number of CD1 male mice is 24, 28, and 21 in SP, SxP, and ESPs, respectively, and the total number of female mice is 15, 16, and 17. In (e), the total number of male C57BL/6 mice is 10 in both SP and SxP.
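Two bookkeeping rules appear in this caption: a mouse counts as active in a period if it has at least one detection, and a mouse enters the rate analysis only if it deposited in at least one of the compared periods. Both can be sketched as below; the function names and the example counts are hypothetical.

```python
def percent_active(detections_per_mouse):
    """Percentage of mice with at least one detection in a period."""
    if not detections_per_mouse:
        raise ValueError("need at least one mouse")
    active = sum(1 for n in detections_per_mouse if n >= 1)
    return 100.0 * active / len(detections_per_mouse)


def included_in_rate_analysis(counts_by_period):
    """True if the mouse deposited in at least one of the compared periods."""
    return any(n > 0 for n in counts_by_period)


# Hypothetical urine detection counts for four mice in one period.
print(percent_active([0, 2, 1, 0]))
# Hypothetical counts for one mouse: habituation start, habituation end, trial.
print(included_in_rate_analysis([0, 0, 3]))
```

Note that the two rules use different denominators, which is why the n values for the rate panels are smaller than the total numbers of mice given for the percent-active panels.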

Figure 5—figure supplement 1
Comparison of deposition events rate between test stages using 5-min periods.

Mean rate ± SEM of urination and defecation events during habituation start (minutes 1–5), habituation end (minutes 11–15), and trial (minutes 1–5) stages for male CD1 (a), female CD1 (b), and male C57BL/6 mice (d). Percent of active mice (mice with at least one detection) across tests during habituation start, habituation end, and trial stages for male and female CD1 mice (c) and for male C57BL/6 mice (e). Two-sided Wilcoxon rank sum test p-values equal to or smaller than 0.1, 0.05, 0.01, and 0.001 are marked with #, *, **, and ***, respectively. In (a, b, d), only mice with urination in at least one of the periods were included in the urine analysis, and likewise for feces. In (a, b), for CD1 males, n = 13, 27, and 19 for urination in SP, SxP, and ESPs, respectively, and n = 23, 28, and 21 for defecation. Correspondingly, for CD1 females, n = 6, 10, and 9 for urination and n = 12, 15, and 15 for defecation. In (d), n = 6 and 6 for urination and n = 7 and 9 for defecation in SP and SxP, respectively. In (c), the total number of CD1 male mice is 24, 28, and 21 in SP, SxP, and ESPs, respectively, and the total number of female mice is 15, 16, and 17. In (e), the total number of male C57BL/6 mice is 10 in both SP and SxP.

Figure 5—figure supplement 2
Comparison of deposition area between test stages using 4-min periods.

Mean area ± SEM of urine and fecal depositions per minute during habituation start (minutes 1–4), habituation end (minutes 11–14), and trial (minutes 1–4) stages. Statistical comparisons between the three periods (three pair-wise comparisons) were done separately for urine and fecal depositions. Mice with no urine or feces detections in any of these periods were excluded from the urine or feces analysis, respectively. Two-sided Wilcoxon rank sum test p-values equal to or smaller than 0.1, 0.05, 0.01, and 0.001 are marked with #, *, **, and ***, respectively. For males, n = 13, 27, and 19 for urination in SP, SxP, and ESPs, respectively, and n = 21, 28, and 21 for defecation. Correspondingly, for females, n = 5, 9, and 8 for urination and n = 9, 14, and 14 for defecation during SP, SxP, and ESPs.

Figure 6 with 1 supplement
Comparison of deposition rates between sexes.

The mean rate ± SEM of urination and defecation events for males (blue bars) versus females (red bars) during early (minutes 1–4) (a) and late (minutes 11–14) (b) periods of the habituation stage and during the first minute (c) and minutes 2–4 (d) of the trial stage. A difference between the mean rates of urine or fecal depositions (two-sided Wilcoxon rank sum test) with a p-value equal to or smaller than 0.1, 0.05, 0.01, and 0.001 is marked with #, *, **, and ***, respectively. A difference in the distribution of non-depositing animals (Chi-square test) with a p-value equal to or smaller than 0.1, 0.05, 0.01, and 0.001 is marked with !, +, ++, and +++, respectively. For male mice, n = 24, 28, and 21 for SP, SxP, and ESPs, respectively. For female mice, n = 15, 16, and 17.
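The Chi-square comparison of non-depositing animals between sexes can be illustrated with a 2 × 2 contingency table (sex by depositing or not). The sketch below computes the Pearson statistic without a continuity correction. Whether the original analysis applied such a correction is not stated in the caption, so that choice is an assumption, and the counts in the example are invented.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table = [[a, b], [c, d]], e.g. rows = sex, columns = depositing or not.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    if 0 in row or 0 in col:
        raise ValueError("every row and column needs a non-zero total")
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row[i] * col[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square tail equals erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: [depositing, non-depositing] for males and females.
stat, p = chi_square_2x2([[10, 20], [20, 10]])
print(stat, p)
```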

Figure 6—figure supplement 1
Comparison of mean deposition areas between sexes.

The mean area ± SEM of urine and fecal depositions for males (blue bars) versus females (red bars) during early (minutes 1–4) (a) and late (minutes 11–14) (b) periods of the habituation stage and during the first minute (c) and minutes 2–4 (d) of the trial stage. A difference between the mean areas of urine or fecal depositions (two-sided Wilcoxon rank sum test) with a p-value equal to or smaller than 0.1, 0.05, 0.01, and 0.001 is marked with #, *, **, and ***, respectively. A difference in the distribution of non-depositing animals (Chi-square test) with a p-value equal to or smaller than 0.1, 0.05, 0.01, and 0.001 is marked with !, +, ++, and +++, respectively. For male mice, n = 24, 28, and 21 for SP, SxP, and ESPs, respectively. For female mice, n = 15, 16, and 17.

Videos

Video 1
IR video of a single ESPs trial of a male mouse with an overlay of the automatic detections.

Automatic detections are overlaid in red for feces, green for urine, and blue for BG. The stressed mouse side of the arena is marked in green, and the object side is marked in red. Counters of the number and area of automatic detections on each side of the arena are shown at the top left. The video plays at 8× speed.

Video 2
IR video of a single ESPs habituation of a male mouse with an overlay of the automatic detections.

The video shows the habituation part of the experiment in Video 1.

Tables

Table 1
The effect of the test (SP, SxP, and ESPs) on urination or defecation events rates.

A Kruskal–Wallis test was used to check whether the test type affects the rate of urination or defecation events. p-values equal to or smaller than 0.1, 0.05, 0.01, and 0.001 are marked with #, *, **, and ***, respectively. For male mice, n = 24, 28, and 21 for SP, SxP, and ESPs, respectively. For female mice, n = 15, 16, and 17.

| Measurement | Habituation, minutes 1–4 | Habituation, minutes 11–14 | Trial, minute 1 | Trial, minutes 2–4 |
|---|---|---|---|---|
| Male #Urine | 0.0004*** | 0.3804 | 0.0015** | 0.0301* |
| Female #Urine | 0.3777 | 0.3943 | 0.4287 | 0.3918 |
| Male #Feces | 0.0221* | 0.1178 | 0.3054 | 0.9251 |
| Female #Feces | 0.0635# | 0.2653 | 0.1553 | 0.5663 |

Appendix 1—table 1
The effect of the test on the urine and feces area.

A Kruskal–Wallis test was used to check whether the test type (SP, SxP, and ESPs) affects the area of urine or feces. p-values equal to or smaller than 0.1, 0.01, and 0.001 are marked with #, **, and ***, respectively. For male mice, n = 24, 28, and 21 for SP, SxP, and ESPs, respectively. For female mice, n = 15, 16, and 17.

| Measurement | Habituation, minutes 1–4 | Habituation, minutes 11–14 | Trial, minute 1 | Trial, minutes 2–4 |
|---|---|---|---|---|
| Male urine area | 0.0003*** | 0.3436 | 0.0011** | 0.0614# |
| Female urine area | 0.3847 | 0.374 | 0.399 | 0.3124 |
| Male feces area | 0.0098** | 0.3315 | 0.2738 | 0.8938 |
| Female feces area | 0.2352 | 0.5138 | 0.1553 | 0.571 |
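The p-values in the two tables above come from Kruskal–Wallis tests across the three test types. For readers unfamiliar with the test, the sketch below computes the H statistic with the usual tie correction for exactly three groups and uses the chi-square approximation with 2 degrees of freedom, whose tail probability has the closed form exp(-H/2). It is a generic illustration, not the authors' code, and the example values are invented.

```python
import math


def kruskal_wallis_3groups(g1, g2, g3):
    """Kruskal-Wallis H test for exactly three non-empty groups.

    Returns (H, p) using the chi-square approximation with 2 degrees of
    freedom, for which the tail probability is exp(-H / 2).
    """
    groups = [g1, g2, g3]
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Average rank for each distinct value, so tied values share a rank.
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        i = j
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    h /= correction  # tie correction
    return h, math.exp(-h / 2)


# Hypothetical per-mouse event rates in the SP, SxP, and ESPs tests.
h, p = kruskal_wallis_3groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(h, p)
```

The tie correction matters for deposition data, where many mice share the same count (often zero) in a given period.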


Cite this article

David Peles, Shai Netser, Natalie Ray, Taghreed Suliman, Shlomo Wagner (2025) DeePosit, an AI-based tool for detecting mouse urine and fecal depositions from thermal video clips of behavioral experiments. eLife 13:RP100739. https://doi.org/10.7554/eLife.100739.3