The experimental setup and analysis method.
The experimental setup (a) includes a visible light (VIS) camera, an infrared (IR) camera, and a blackbody set to 37°C. VIS (b) and IR (c) images captured at the same moment, shortly after a urine deposition, show that the still-warm urine appears as a high-contrast blob in the IR image but not in the VIS image. Large urine spots, such as the one shown in (d), may be smeared across the arena's floor (e), which is one limitation of using filter paper to quantify urination at the end of the experiment. The preliminary detection algorithm is based on subtracting a background image from each frame in the video (f), which allows the detection of hot blobs corresponding to the animal itself and to urine and feces deposits. The detected blobs are then classified using a transformer-based artificial neural network (g), which takes as input a time series of patches cropped around the detection and outputs a classification. Every three patches in that time series are merged into a single RGB image (see Methods). In the confusion matrix presenting the accuracy of the full pipeline for test videos (h), the "Miss" row counts the events that were not detected by the preliminary hot-blob detection and, hence, were not fed to the classifier. The BG (background) column counts the automatic detections for which no matching manually tagged event exists within the relevant spatial and temporal window. See Methods for more details.
Figure 2—figure supplement 1. Accuracy for small and large detections.
Figure 2—video 1. Video for the events in the confusion matrix. Each part of the video matches a cell in the confusion matrix (h) and shows the events included in this cell (up to 48 events). Each event is shown in a 65×65 pixel window from 11 seconds before the event to 60 seconds after it (similar to the classifier input).
The video shows the manual annotation and the automatic detection matched with it side by side. Note that there are no automatic detections for the "Miss" row of the confusion matrix and no manual annotations for the BG column. The video plays at 3× speed.
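The preliminary detection step in panel (f) can be illustrated with a minimal sketch: subtract a background image from a frame, threshold the difference, and group the remaining warm pixels into connected blobs. This is not the authors' implementation. The function name `detect_hot_blobs`, the `threshold` and `min_area` values, the 4-connectivity rule, and the synthetic temperatures are all assumptions made for illustration only.

```python
from collections import deque

import numpy as np


def detect_hot_blobs(frame, background, threshold=1.0, min_area=2):
    """Return (row, col, area) for each connected region warmer than the background.

    threshold (same units as the images) and min_area (pixels) are
    illustrative assumptions, not values taken from the paper.
    """
    # Background subtraction: positive values mark pixels warmer than the floor.
    diff = frame.astype(np.float64) - background.astype(np.float64)
    mask = diff > threshold
    visited = np.zeros_like(mask, dtype=bool)
    n_rows, n_cols = mask.shape
    blobs = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if visited[r0, c0]:
            continue
        # Breadth-first flood fill over 4-connected neighbours.
        queue = deque([(r0, c0)])
        visited[r0, c0] = True
        pixels = []
        while queue:
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                if inside and mask[rr, cc] and not visited[rr, cc]:
                    visited[rr, cc] = True
                    queue.append((rr, cc))
        if len(pixels) >= min_area:
            rows, cols = zip(*pixels)
            # Report the blob centroid and its area in pixels.
            blobs.append((float(np.mean(rows)), float(np.mean(cols)), len(pixels)))
    return blobs


# Synthetic example (assumed values): a flat 25 °C floor with one warm 3×3 spot.
background = np.full((20, 20), 25.0)
frame = background.copy()
frame[5:8, 10:13] += 8.0
blobs = detect_hot_blobs(frame, background)
```

In this sketch each blob is reduced to a centroid and an area. A pipeline like the one in the caption would then crop a patch around each centroid over time and pass that series to the classifier, which this example does not attempt to reproduce.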