Preliminary Detection of Hot Blobs

Investigation time and bout duration across sexes and tests. The first row shows the arena setup, while the second and third rows show the mean (±SEM) time that male (n=36, blue bars) and female (n=35, red bars) mice spent investigating each stimulus during the (a) SP, (b) SxP, and (c) ESPs tests. In each panel, the two leftmost bars show the total investigation time, the two middle bars show the time spent on short (≤6 s) investigation bouts, and the two rightmost bars show the time spent on long (>6 s) investigation bouts.

The experimental setup and analysis method. The experimental setup (a) includes a visible light (VIS) camera, an infrared (IR) camera, and a blackbody set to 37°C. VIS (b) and IR (c) images captured at the same moment, shortly after a urine deposition, show that the still-warm urine appears as a high-contrast blob in the IR image but not in the VIS image. Large urine spots, such as the one shown in (d), may be smeared across the arena floor (e), which is one limitation of using filter paper to quantify urination at the end of the experiment. The preliminary detection algorithm subtracts a background image from each video frame (f), which allows the detection of hot blobs corresponding to the animal itself and to urine and feces deposits. The detected blobs are then classified by a transformer-based artificial neural network (g), which receives as input a time series of patches cropped around the detection and outputs a classification. Every three patches in that time series are merged into a single RGB image (see Methods). In the confusion matrix showing the accuracy of the full pipeline on the test videos (h), the “Miss” row counts events that were not detected by the preliminary hot-blob detection and hence were never fed to the classifier. The BG (background) column counts automatic detections for which no matching manually tagged event exists in the relevant space and time window. See Methods for more details. Figure 2—figure supplement 1. Accuracy for small and large detections. Figure 2—video 1. Video for the events in the confusion matrix. Each part of the video matches a cell in the confusion matrix (h) and shows the events included in that cell (up to 48 events). Each event is shown in a 65×65 pixel window from 11 seconds before the event to 60 seconds after it (similar to the classifier input).
The video shows the manual annotation side by side with the automatic detection that was matched to it. Note that there are no automatic detections for the “Miss” row of the confusion matrix and no manual annotations for the BG column. The video plays at 3× speed.
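The preliminary detection step described in (f) can be illustrated with a minimal sketch. It assumes that frames and the background are 2D lists of IR intensities; the function name `detect_hot_blobs`, the `threshold` and `min_area` values, and the use of 4-connectivity are all illustrative assumptions, not the authors' implementation.

```python
from collections import deque

def detect_hot_blobs(frame, background, threshold=2.0, min_area=1):
    """Return blobs (lists of (row, col) pixels) that are hotter than the background.

    threshold and min_area are illustrative values, not taken from the paper.
    """
    rows, cols = len(frame), len(frame[0])
    # Background subtraction: keep pixels notably hotter than the background image.
    hot = [[frame[r][c] - background[r][c] > threshold for c in range(cols)]
           for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not hot[r][c] or seen[r][c]:
                continue
            # Flood fill (4-connectivity, an assumption) to collect one connected blob.
            queue, blob = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and hot[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(blob) >= min_area:
                blobs.append(blob)
    return blobs

# Hypothetical 4x5 frame with two warm spots on a uniform background.
background = [[20.0] * 5 for _ in range(4)]
frame = [row[:] for row in background]
frame[0][0] = frame[0][1] = 30.0   # one two-pixel warm spot
frame[3][4] = 35.0                 # one single-pixel warm spot
blobs = detect_hot_blobs(frame, background)
```

In this sketch every connected hot region becomes a candidate detection; deciding whether a candidate is the animal, urine, feces, or background is left to the classifier stage described in (g).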

Urine and fecal deposition detection results across tests.

Each circle represents a single detection of urine deposition (a), while each + represents a single detection of fecal deposition (b). Green lines mark the start and end of habituation and the end of the trial. The vertical black line at time=0 marks the introduction of the stimuli and the start of the trial period. The vertical dotted line marks 4 minutes after the beginning of the trial. The short vertical black lines mark the end of minute 14 of the habituation. A black dot in the center of a circle or a + sign indicates that the detection was on the side of stimulus 1 (the preferred stimulus), defined as the social stimulus in the SP trial, the female in the SxP trial, and the stressed mouse in the ESPs trial. Dynamics graphs show the mean rate (c) and mean area (d) per minute of urine and feces. Error bars represent the standard error.

Comparison between test periods.

The mean rate of urine and fecal deposition during habituation start (minutes 1-5), habituation end (minutes 11-14), and trial (minutes 1-4) for males (a) and females (b). (c): Percent of active mice (mice with at least one detection) across tests during the same periods as above. Figure 4—figure supplement 1. Urine and fecal depositions area during habituation start, habituation end, and trial.
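As a rough illustration of how the per-period quantities above could be computed, the sketch below derives the mean deposition rate per minute and the percent of active mice for each period. The input layout (one list of `(stage, minute)` detections per mouse, with 1-based minutes), the function name `period_summary`, and the example data are assumptions made for this sketch.

```python
# Periods as defined in the caption: (stage, first minute, last minute), inclusive.
PERIODS = {
    "habituation start": ("habituation", 1, 5),
    "habituation end": ("habituation", 11, 14),
    "trial": ("trial", 1, 4),
}

def period_summary(events_per_mouse):
    """Return {period: (mean rate per minute, percent of active mice)}.

    events_per_mouse holds one list per mouse of (stage, minute) detections.
    """
    summary = {}
    for name, (stage, first, last) in PERIODS.items():
        n_minutes = last - first + 1
        # Number of detections each mouse made inside this period.
        counts = [sum(1 for s, m in events if s == stage and first <= m <= last)
                  for events in events_per_mouse]
        mean_rate = sum(c / n_minutes for c in counts) / len(counts)
        # A mouse is "active" if it has at least one detection in the period.
        percent_active = 100.0 * sum(c > 0 for c in counts) / len(counts)
        summary[name] = (mean_rate, percent_active)
    return summary

# Hypothetical detections for four mice.
mice = [
    [("habituation", 2), ("habituation", 3), ("trial", 1)],
    [("habituation", 12)],
    [],
    [("trial", 2), ("trial", 4)],
]
summary = period_summary(mice)
```

Normalizing by the number of minutes in each period keeps the rates comparable even though the periods differ in length (5, 4, and 4 minutes).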

Comparison of deposition rates between sexes.

The mean rate of urine and fecal depositions in males (blue bars) vs. females (red bars) during early (minutes 1-5) and late (minutes 11-14) minutes of habituation and during the first minute and minutes 2-4 of the trial. A significant difference between the mean rate of urine or fecal depositions (Wilcoxon rank sum test) is marked with * (or # for 0.05<p-value ≤0.1), and a significant difference in the distribution of non-depositing animals (Chi-square test) is marked with + (or ! for 0.05<p-value ≤0.1). Figure 5—figure supplement 1. Comparison of deposition areas between sexes.

Urine and fecal deposition side preference.

A comparison of the mean ±SEM rate ((a) and (b)) and area ((c) and (d)) of urine (two left bars in each panel) and fecal (two right bars in each panel) depositions made by male (blue bars) and female (red bars) subject mice on each side of the arena, for all three tests. Rank sum p-values equal to or smaller than 0.1, 0.05, 0.01, and 0.001 are marked with #, *, **, and ***, respectively.
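The marker convention above amounts to a simple threshold lookup, sketched below. The function name `significance_marker` is hypothetical, and returning an empty string for p-values above 0.1 is an assumption of this sketch.

```python
def significance_marker(p_value):
    """Map a p-value to the marker used in the figure (thresholds are inclusive)."""
    # Check the strictest threshold first so the strongest marker wins.
    for threshold, marker in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, "#")):
        if p_value <= threshold:
            return marker
    return ""  # no marker for p-values above 0.1 (assumption)

markers = [significance_marker(p) for p in (0.0005, 0.03, 0.07, 0.2)]
```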

The effect of the test (SP, SxP, and ESPs) on the urine and fecal deposition rates. A Kruskal-Wallis test was used to check whether the test type affects the rate of urine or fecal depositions.

The effect of the test on the urine and feces area. A Kruskal-Wallis test was used to check whether the test type (SP, SxP, and ESPs) affects the area of urine or feces.
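For readers unfamiliar with the Kruskal-Wallis test, a minimal standard-library sketch for exactly three groups follows. It uses the textbook H statistic with average ranks and a tie correction, and relies on the fact that the chi-square survival function with 2 degrees of freedom is exp(−H/2). The function names and the example values are hypothetical, and this is not the authors' code.

```python
import math

def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three independent samples (chi-square approximation)."""
    groups = [a, b, c]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    h /= correction
    p = math.exp(-h / 2)  # chi-square survival function for 2 degrees of freedom
    return h, p

# Hypothetical deposition rates for the three tests.
h, p = kruskal_wallis_three_groups([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

The closed-form p-value only holds for three groups; a general implementation would need the chi-square distribution with k−1 degrees of freedom.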

Code for computing the two-way chi-square test, which was used to compare the distribution of active mice (mice with at least one detection) between males and females.
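The source code file itself is not reproduced here. As an independent illustration only, a Pearson chi-square test on a 2×2 table (sex × active/inactive) might look like the sketch below. It applies no continuity correction, which is an assumption, since the text does not say whether one was used. The group sizes (36 males, 35 females) follow the sample sizes reported above, while the active counts are hypothetical.

```python
import math

def chi_square_2x2(active_m, total_m, active_f, total_f):
    """Return (chi2, p) for a 2x2 table of sex by active/inactive mice."""
    observed = [[active_m, total_m - active_m],
                [active_f, total_f - active_f]]
    n = total_m + total_f
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("an empty row or column makes the test undefined")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of sex and activity.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Chi-square survival function for 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts: 30 of 36 males and 20 of 35 females are active.
chi2, p = chi_square_2x2(30, 36, 20, 35)
```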

Examples of detections in test videos. (a,b,c) are screenshots taken from Figure 2—video 1. (a): Examples of urination events that were detected and classified correctly. Each pair of columns includes a ground truth detection (to the left) next to the matched automatic detection (to the right), which includes the mask of the detected blob. The overlaid text mentions the video index and the frame index. (b): Urination events that were wrongly classified as background. Note that all of these urine spots are very small. (c): Fecal depositions that were detected and classified correctly.

Accuracy for small and large detections. (a,b) Confusion matrices on the test videos, with large and small automatic detections shown separately. The threshold for large detections is an area of 1 cm², which corresponds to 47.3 pixels. The percentages shown sum to 100% for each column in (a) and each row in (b). The Large Urination class is correct in 100% of the cases in which it was reported by the classifier, while Small Urination is correct in only 75.8% of cases, as shown in (b). Most of the confusion between feces and urine spots occurs for small detections: 7.1% of the Ground Truth (GT) urine events were classified as Small Feces and 0% as Large Feces, as shown in (a). Similarly, 6.7% of the GT feces events were classified as Small Urine and 0% as Large Urine. No GT urine or GT feces event was classified as Large BG.
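The two normalizations used in (a) and (b), and the large/small split, can be sketched as follows. The counts are hypothetical, the helper names are illustrative, and treating an area exactly at the threshold as large is an assumption.

```python
PIXELS_PER_CM2 = 47.3  # from the caption: 1 cm² corresponds to 47.3 pixels

def is_large(area_pixels):
    """Classify a detection as large if it covers at least 1 cm² (inclusive: assumption)."""
    return area_pixels >= PIXELS_PER_CM2

def normalize(counts, axis):
    """Convert a count matrix to percentages that sum to 100 per row or per column."""
    n_rows, n_cols = len(counts), len(counts[0])
    if axis == "row":
        totals = [sum(row) for row in counts]
        return [[100.0 * counts[r][c] / totals[r] if totals[r] else 0.0
                 for c in range(n_cols)] for r in range(n_rows)]
    if axis == "column":
        totals = [sum(counts[r][c] for r in range(n_rows)) for c in range(n_cols)]
        return [[100.0 * counts[r][c] / totals[c] if totals[c] else 0.0
                 for c in range(n_cols)] for r in range(n_rows)]
    raise ValueError("axis must be 'row' or 'column'")

# Hypothetical 2x2 count matrix.
counts = [[8, 2],
          [1, 9]]
by_row = normalize(counts, "row")        # each row sums to 100
by_column = normalize(counts, "column")  # each column sums to 100
```

The same count matrix therefore yields two different percentage matrices, each summing to 100 along a different axis, which is why (a) and (b) answer different questions about the same detections.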

Urine and fecal deposition area during habituation start, habituation end, and trial. The mean area ±SEM of urine and fecal depositions per minute during habituation start (minutes 1-5), habituation end (minutes 11-14), and trial (first four minutes of the trial). Statistical comparisons between the three periods (three pair-wise comparisons) were done separately for urine and fecal depositions. Mice with no urine or feces detection in these periods were excluded from the urine or feces analysis, respectively.

Comparison of mean deposition areas between sexes. The mean area ±SEM of urine and fecal depositions in males (blue bars) vs. females (red bars) during early (minutes 1-5) and late (minutes 11-14) minutes of habituation and during the first minute and minutes 2-4 of the trial. A significant difference between the mean area of urine or fecal depositions (Wilcoxon rank sum test) is marked with * (or # for 0.05<p-value ≤0.1) and a significant difference in the distribution of non-depositing animals (Chi-square test) is marked with + (or ! for 0.05<p-value ≤0.1).