Investigation time across sexes and tests in CD1 mice.

Each of the tests (SP, SxP, and ESPs) comprises a 15-minute habituation stage with empty chambers, followed by a 5-minute trial stage in which the stimuli are present in the chambers (a). The Setup row shows schematic representations of the arena for the (b) SP, (c) SxP, and (d) ESPs tests, while the Males and Females rows show the mean (±SEM) time spent by male (n=36, blue bars) and female (n=35, red bars) mice investigating each stimulus during the various tests. The two leftmost bars in each panel show the total investigation time, the two middle bars show the time spent on short (≤6 s) investigation bouts, and the two rightmost bars show the time spent on long (>6 s) investigation bouts.
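The split into short and long bouts can be illustrated with a minimal sketch. It assumes that the bout durations for one mouse and one stimulus are already available as a list of seconds; the function name and the example values are hypothetical and are not taken from the study.

```python
SHORT_BOUT_MAX_S = 6.0  # bouts lasting 6 s or less count as "short" (from the caption)


def split_investigation_time(bout_durations_s):
    """Return (total, short, long) investigation time in seconds."""
    short = sum(d for d in bout_durations_s if d <= SHORT_BOUT_MAX_S)
    long_ = sum(d for d in bout_durations_s if d > SHORT_BOUT_MAX_S)
    return short + long_, short, long_


# Hypothetical bout durations (seconds), for illustration only.
bouts = [2.0, 6.0, 7.5, 12.0, 0.5]
total, short, long_ = split_investigation_time(bouts)
print(total, short, long_)
```

A bout of exactly 6 s falls into the short category, matching the ≤6 s boundary stated above.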

The experimental setup and analysis method

The experimental setup (a) includes a visible light (VIS) camera, an infrared (IR) camera, and a blackbody set to 37°C. VIS (b) and IR (c) images captured at the same moment, shortly after a urine deposition, show that the still-warm urine appears as a high-contrast blob in the IR image but not in the VIS one. Large urine spots, such as the one shown in (d), may be smeared across the arena’s floor (e), which is one limitation of using filter paper to quantify urination at the end of the experiment. The preliminary detection algorithm subtracts a background image from each frame in the video (f), which allows the detection of hot blobs reflecting the animal itself as well as urine and feces deposits. The detected blobs are then classified using a transformer-based artificial neural network (g), which takes as input a time series of patches cropped around the detection and outputs its classification. Every three patches in that time series are merged into a single RGB image (see Methods). In the confusion matrix presenting the accuracy of the full pipeline on test videos of CD1 mice (h), the “Miss” row counts the events that were not detected by the preliminary hot-blob detection and, hence, were not fed to the classifier. The BG (background) column counts the automatic detections for which no matching manually tagged event exists in the relevant space and time window. See Methods for more details. The precision, recall, and F1 score are 0.90, 0.86, and 0.88, respectively, for urine detection and 0.91, 0.89, and 0.90 for feces detection. The mean F1 score, (F1_Urine + F1_Feces)/2, is 0.89.
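As a quick consistency check, the F1 scores can be recomputed from the reported precision and recall using the standard definition F1 = 2PR/(P + R). This is only a sketch: the inputs are the rounded values quoted in the caption, so the recomputed scores agree with the reported ones only after rounding to two decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall values as reported in the caption.
urine_f1 = f1_score(0.90, 0.86)
feces_f1 = f1_score(0.91, 0.89)
mean_f1 = (urine_f1 + feces_f1) / 2

print(round(urine_f1, 2), round(feces_f1, 2), round(mean_f1, 2))
```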

Figure 2—figure supplement 1. Accuracy for small and large detections in CD1 mice.

Figure 2—video 1. Video for the events in the confusion matrix. Each urine or feces event is shown in a 65×65 pixel window from 11 seconds before the event to 60 seconds after it (similar to the classifier input). The video shows the manual annotation and the automatic detection that was matched with it, side by side. Note that there are no automatic detections for “Miss” and no manual annotations for “BG”. The video plays at 3× speed.

Validation of DeePosit accuracy

Accuracy of detecting urine (a) and fecal (b) deposits by DeePosit, as measured by F1 score across various stages of the experiment. Each “+” or “o” marks the F1 score for a single mouse in a single experiment. No significant difference was found. Similarly, DeePosit accuracy was not significantly affected by the experiment type (c), by the sex of the subject mouse (d), or by the spatial location of the deposition in the arena (the arena’s floor was divided into three equal parts) (e). Comparisons in (a, b, c, e) used FDR-corrected rank sum tests (Benjamini and Hochberg, 1995). The # in (b) denotes an FDR-corrected p-value of 0.08. Since differentiating small urine and feces deposits in thermal videos can be challenging even for humans, we evaluated the accuracy of a second human annotator on 25 test videos of CD1 mice (a subset of the full test set) and report the accuracy achieved on these videos by both DeePosit (f) and the second human annotator (g). The mean F1 score, (F1_Urine + F1_Feces)/2, is 0.86 for the second human annotator and 0.84 for the DeePosit algorithm. To compare our results with another popular object detection approach, we annotated 39 training videos of CD1 mice with bounding boxes to match the YOLOv8 framework. For fairness, we trained both algorithms on the same training set of videos. (h) shows the confusion matrix for DeePosit, while (i,j) show the confusion matrices achieved using YOLOv8 with a single image as input (YOLOv8 Gray) and with three images as input, representing t+0, t+10, and t+30 seconds from each event (YOLOv8 RGB). DeePosit accuracy surpasses the YOLOv8 results in both cases. YOLOv8 RGB accuracy surpasses that of YOLOv8 Gray, suggesting that temporal information is helpful in the detection of urine and feces.
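The Benjamini and Hochberg (1995) FDR correction mentioned above can be sketched in a few lines. This is a generic step-up implementation and not the authors' code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotonic in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

A comparison is then reported as significant when its adjusted p-value falls below the chosen level (for example 0.05).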

Figure 3—figure supplement 1. Accuracy for small and large detections in C57BL/6 mice.

Figure 3—figure supplement 2. Detection accuracy at various values of ΔT_Threshold.

Figure 3—figure supplement 3. Examples of detections in test videos.

The effect of the test (SP, SxP, and ESPs) on urination or defecation event rates.

A Kruskal-Wallis test was used to check whether the test type affects the rate of urination or defecation events.
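For readers unfamiliar with the test, the following is a minimal standard-library sketch of a tie-corrected Kruskal-Wallis test for exactly three groups, as would apply to the three test types. It is not the authors' implementation, and the example rates are hypothetical. It relies on the chi-square approximation, which for three groups (2 degrees of freedom) has the closed-form survival function exp(−H/2).

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # average of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_3(g1, g2, g3):
    """Return (H, p) for three groups; fails if all pooled values are identical."""
    groups = [g1, g2, g3]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        h += rank_sum ** 2 / len(g)
        pos += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)  # tie correction
    return h, math.exp(-h / 2)  # chi-square survival function with 2 d.o.f.


# Hypothetical per-mouse event rates for the SP, SxP, and ESPs tests.
h, p = kruskal_wallis_3([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(h, p)
```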

Urine and fecal deposition detection results across tests in CD1 mice.

Each o represents a single detection of urine deposition (a), while each + represents a single detection of fecal deposition (b). A black dot in the center of a circle or a + sign marks that this detection is on the side of the preferred stimulus, defined as the social stimulus in the SP trial, the female in the SxP trial, and the stressed mouse in the ESPs trial. Short green lines mark the start and end of the habituation stage and the end of the trial stage, while short vertical black lines mark the end of minute 14 of the habituation stage. The vertical black line at time=0 marks the start of the trial stage after stimuli introduction to the arena, while the vertical dashed line marks four minutes after the beginning of the trial. Dynamics plots (right) show mean rate (c) and mean area (d) per minute for both urine and fecal deposits. Error bars represent standard error.

Figure 4—figure supplement 1. Urine and fecal deposition detection results across tests in C57BL/6 mice.

Figure 4—figure supplement 2. Urine and fecal deposition side preference.

Comparison between test stages.

Mean rate of urination and defecation events detected during habituation start (minutes 1-4), habituation end (minutes 11-14), and trial (minutes 1-4) stages, for male CD1 mice (a), female CD1 mice (b), and male C57BL/6 mice (d). (c,e): Percentage of active mice (mice with at least one detection) across tests during the habituation start, habituation end, and trial stages, for CD1 mice (c) and for male C57BL/6 mice (e).
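The two quantities plotted here, the mean event rate in a time window and the percentage of active mice, can be sketched as follows. The sketch assumes that each mouse is represented by a list of event times in minutes from the start of the stage, and that "minutes 1-4" corresponds to the half-open interval from 0 to 4 minutes; both the data layout and the function names are hypothetical.

```python
def events_in_window(event_times_min, start_min, end_min):
    """Count events with start_min <= t < end_min."""
    return sum(1 for t in event_times_min if start_min <= t < end_min)


def mean_rate_and_percent_active(mice_event_times, start_min, end_min):
    """Return (mean events per minute per mouse, percentage of active mice)."""
    counts = [events_in_window(times, start_min, end_min)
              for times in mice_event_times]
    duration = end_min - start_min
    mean_rate = sum(counts) / (len(counts) * duration)
    percent_active = 100.0 * sum(1 for c in counts if c > 0) / len(counts)
    return mean_rate, percent_active


# Hypothetical event times (minutes) for four mice.
mice = [[0.5, 3.2], [], [1.0, 2.0, 5.5], [10.0]]
print(mean_rate_and_percent_active(mice, 0, 4))
```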

Figure 5—figure supplement 1. Comparison of deposition event rates between test stages using 5-minute periods.

Figure 5—figure supplement 2. Comparison of deposition area between test stages using 4-minute periods.

Comparison of deposition rates between sexes.

The mean rate of urination and defecation events for males (blue bars) vs. females (red bars) during early (minutes 1-4) and late (minutes 11-14) periods of the habituation stage and during the first minute and minutes 2-4 of the trial stage. A significant difference between the mean rate of urine or fecal depositions (Wilcoxon rank sum test) is marked with * (or # for 0.05<p-value ≤0.1), and a significant difference in the distribution of non-depositing animals (Chi-square test) is marked with + (or ! for 0.05<p-value ≤0.1).

Figure 6—figure supplement 1. Comparison of deposition areas between sexes.

The effect of the test on the urine and feces area.

Kruskal-Wallis test was used to check if the test type (SP, SxP, and ESPs) affects the area of urine or feces.

Code for computing the two-way chi-square test, which was used to compare the distribution of active mice (mice with at least one detection) between males and females.
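The authors' own code is not reproduced here. As an illustration only, a 2×2 chi-square test of independence (active vs. non-active mice, by sex) can be sketched with the Python standard library. The sketch assumes no continuity correction, and the function name and example counts are hypothetical. With one degree of freedom, the p-value has the closed form erfc(√(χ²/2)).

```python
import math


def chi_square_2x2(active_a, total_a, active_b, total_b):
    """Chi-square test of independence for a 2x2 table (no continuity correction).

    Rows are the two groups; columns are active and non-active animals.
    Fails if any expected count is zero (e.g. no animal is active in either group).
    """
    observed = [[active_a, total_a - active_a],
                [active_b, total_b - active_b]]
    row_totals = [total_a, total_b]
    col_totals = [observed[0][0] + observed[1][0],
                  observed[0][1] + observed[1][1]]
    n = total_a + total_b
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (observed[r][c] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 d.o.f.
    return chi2, p


# Hypothetical counts: 15 of 20 animals active in one group vs. 5 of 20 in the other.
print(chi_square_2x2(15, 20, 5, 20))
```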

Accuracy for small and large detections in CD1 mice.

(a,b) Confusion matrices on test videos, with large and small automatic detections separated. The threshold for large detections is an area of 1 cm², which corresponds to 47.3 pixels. The percentages shown sum to 100% for each column in (a) and for each row in (b). The Large Urination class is correct in 98.2% of the cases in which it was reported by the classifier, whereas Small Urination is correct in only 84.5%, as shown in (b). Most of the confusion between feces and urine spots occurs for small detections: 2.3% of the ground truth (GT) urine events were classified as Small Feces and 0% as Large Feces, as shown in (a). Similarly, 2.4% of the GT feces events were classified as Small Urine and 0% as Large Urine. No GT feces event was classified as Large BG. While feces are usually small, a Large Feces detection might occur when two adjacent feces are detected as a single segment or when the detected segment contains both urine and feces.
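The difference between the two normalizations can be made concrete with a short sketch. It assumes a matrix of raw counts whose rows are the classifier outputs and whose columns are the ground truth classes, so that column normalization (as in (a)) gives the fraction of GT events assigned to each output, and row normalization (as in (b)) gives the fraction of each reported class that is correct. The counts below are hypothetical, and the pixel-to-area helper simply applies the 47.3 pixels per cm² conversion quoted above.

```python
PIXELS_PER_CM2 = 47.3  # conversion quoted in the caption


def pixels_to_cm2(area_pixels):
    """Convert a detection area from pixels to square centimeters."""
    return area_pixels / PIXELS_PER_CM2


def normalize_columns(matrix):
    """Percentages that sum to 100 down each column (per ground truth class)."""
    col_sums = [sum(row[c] for row in matrix) for c in range(len(matrix[0]))]
    return [[100.0 * v / col_sums[c] if col_sums[c] else 0.0
             for c, v in enumerate(row)] for row in matrix]


def normalize_rows(matrix):
    """Percentages that sum to 100 along each row (per reported class)."""
    return [[100.0 * v / sum(row) if sum(row) else 0.0 for v in row]
            for row in matrix]


# Hypothetical counts: rows = reported (urine, feces), columns = GT (urine, feces).
counts = [[8, 1],
          [2, 9]]
print(normalize_columns(counts))
print(normalize_rows(counts))
```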

Accuracy for small and large detections in C57BL/6 mice.

To check the robustness of our method across mouse strains and experimental conditions, we tested our algorithm on black C57BL/6 male mice in a white arena (the arena is white in visible light but looks dark in long-wave infrared). (a) Confusion matrices reflecting the accuracy of the DeePosit algorithm on 10 SP and 10 SxP videos that were not included in the training set. The mean F1 score for C57BL/6 mice is 0.81. Interestingly, C57BL/6 mice do not produce small urine spots, and hence, all the “small urine” detections were wrong. Ignoring the small urine detections improves the mean F1 score to 0.86.

Detection accuracy at various values of ΔT_Threshold.

DeePosit accuracy was measured for several values of the temperature threshold of the preliminary heuristic detection, ΔT_Threshold. The best results were achieved with a threshold of ΔT_Threshold = 1.6°C. However, a good accuracy level (F1 score between 0.88 and 0.89) was observed for all thresholds between 1.1 and 3.0°C. See Methods for more details.
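The role of the temperature threshold can be illustrated with a deliberately simplified sketch of the thresholding step: a pixel is flagged as hot when its temperature exceeds the background by more than the threshold. This is not the detection algorithm described in Methods, which involves further steps. The sketch assumes that frames are given as nested lists of temperatures in °C and that the comparison is strict; both are assumptions, as are the function name and example values.

```python
DELTA_T_THRESHOLD_C = 1.6  # best-performing threshold reported in the caption


def hot_pixel_mask(frame, background, threshold=DELTA_T_THRESHOLD_C):
    """Flag pixels warmer than the background by more than `threshold` degrees C."""
    return [[(f - b) > threshold for f, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


# Hypothetical 2x3 temperature images (degrees C).
background = [[24.0, 24.0, 24.0],
              [24.0, 24.0, 24.0]]
frame = [[24.0, 30.0, 25.0],
         [24.0, 31.0, 24.0]]
mask = hot_pixel_mask(frame, background)
print(mask)
```

Lowering the threshold flags more candidate pixels and raising it flags fewer, which is the trade-off evaluated across the range of thresholds reported above.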

Examples of detections in test videos.

(a,b): Examples of urination and defecation events that were detected and classified correctly. Each pair of columns includes a ground truth detection (to the left) next to the matched automatic detection (to the right), which includes the mask of the detected blob. The overlaid text mentions the video index and the frame index. (b): Urination events that were wrongly classified as background. (c): Urine depositions that were classified as feces. (d): Fecal depositions that were classified as urine.

Urine and fecal deposition detection results across tests in C57BL/6 mice.

DeePosit detections for 10 SP and 10 SxP tests performed by male C57BL/6 mice that were not included in the training set are shown in (a-h), in a similar manner to Figure 4. We chose to ignore small urine detections (deposition area < 1 cm²), as we found that C57BL/6 males do not emit small urine depositions.

Urine and fecal deposition side preference.

A comparison of the mean ±SEM rate ((a) and (b)) and area ((c) and (d)) of urine (two left bars in each panel) and fecal (two right bars in each panel) depositions made by male (blue bars) and female (red bars) subject mice on each side of the arena, for all three tests. Rank sum p-values equal to or smaller than 0.1, 0.05, 0.01, and 0.001 are marked with #, *, **, and ***, respectively.
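The marker convention described above amounts to a simple threshold lookup, sketched below. The function name is hypothetical, and the empty string for non-significant results is an assumption about how unmarked comparisons are displayed.

```python
def significance_marker(p_value):
    """Map a p-value to the marker used in the figure (thresholds are inclusive)."""
    if p_value <= 0.001:
        return "***"
    if p_value <= 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    if p_value <= 0.1:
        return "#"
    return ""  # assumed: no marker for p-values above 0.1


for p in (0.0005, 0.01, 0.03, 0.08, 0.5):
    print(p, repr(significance_marker(p)))
```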

Comparison of deposition event rates between test stages using 5-minute periods.

Mean rate of urination and defecation events during habituation start (minutes 1-5), habituation end (minutes 11-15), and trial (minutes 1-5) stages for male CD1 (a), female CD1 (b), and male C57BL/6 mice (d). (c,e): Percentage of active mice (mice with at least one detection) across tests during habituation start, habituation end, and trial for male and female CD1 mice (c) and for male C57BL/6 mice (e).

Comparison of deposition area between test stages using 4-minute periods.

Mean area ±SEM of urine and fecal depositions per minute during habituation start (minutes 1-4), habituation end (minutes 11-14), and trial (minutes 1-4) stages. Statistical comparisons between the three periods (three pairwise comparisons) were done separately for urine and fecal depositions. Mice with no urine or feces detection in these periods were excluded from the urine or feces analysis, respectively.

Comparison of mean deposition areas between sexes.

The mean area ±SEM of urine and fecal depositions for males (blue bars) vs. females (red bars) during early (minutes 1-4) and late (minutes 11-14) periods of the habituation stage and during the first minute and minutes 2-4 of the trial stage. A significant difference between the mean area of urine or fecal depositions (Wilcoxon rank sum test) is marked with * (or # for 0.05<p-value ≤0.1) and a significant difference in the distribution of non-depositing animals (Chi-square test) is marked with + (or ! for 0.05<p-value ≤0.1).