Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Editors
- Reviewing Editor: Gordon Berman, Emory University, Atlanta, United States of America
- Senior Editor: Kate Wassum, University of California, Los Angeles, Los Angeles, United States of America
Reviewer #1 (Public Review):
Summary:
The manuscript provides a novel method for the automated detection of scent marks from urine and feces in rodents. Given the importance of scent communication in these animals and their role as model organisms, this is a welcome tool.
Strengths:
The method uses a single video stream (thermal video) to allow for the distinction between urine and feces. It is automated.
Weaknesses:
The accuracy shown is lower than may be practically useful for many studies. The accuracy for urine is 80%. This is understandable given the variability of urine deposition, but it makes it challenging to know whether the data are accurate. If the same kinds of mistakes are maintained across many conditions, it may be reasonable to use the software (i.e., if every individual is under- or over-counted to the same extent). Differences in deposition on the scale of 20% would be difficult to trust with the current method, though differences of that magnitude may be of biological interest. Understanding how well the data maintain the same relative ranking of individuals across various timing and spatial deposition metrics may provide further evidence for the utility of the method.
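One way to run the ranking check described above is a Spearman rank correlation between manually annotated and automatically detected counts per animal. The sketch below is illustrative only: the function names and the per-animal counts are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-animal urine counts (placeholders, not data from the study).
manual = [10, 4, 7, 15, 2]
automated = [8, 3, 6, 11, 2]  # undercounted, but in the same order

rho = spearman(manual, automated)
print(f"Spearman rho = {rho:.2f}")
```

A value of rho near 1 would indicate that the method preserves the relative ordering of individuals even when absolute counts are off, which is the scenario in which the software may still be usable.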
Reviewer #2 (Public Review):
Summary:
The authors built a tool to extract the timing and location of mouse urine and fecal deposits in their laboratory setup. They indicate that they are happy with the results they achieved in this effort.
The authors note that urine is thought to be an important piece of an animal's behavioral repertoire and communication toolkit, so methods that make studying these dynamics easier would be impactful.
Strengths:
With the proposed method, the authors are able to detect 79% of the urine that is present and 84% of the feces that is present in a mostly automated way.
Weaknesses:
The proposed method involves a large number of design choices across two detection steps, and these choices are not investigated. That is, do other design choices make the performance better, worse, or the same? Are these choices robust across a range of laboratory environments? How much better are the demonstrated results than those of a simple object detection pipeline (e.g., Faster R-CNN or YOLO on the raw thermal images)?
The method is implemented with a mix of MATLAB and Python.
One proposed reason why this method is better than a human annotator is that it "is not biased." While the authors may mean that it isn't influenced by what the researcher wants to see, the model they present is still statistically biased, since each object class has a different recall score. This bias wasn't investigated. In general, there was little discussion of the quality of the model. Precision scores were not reported. Is a recall value of 78.6% good for the types of studies they and others want to carry out? What are the implications of using the resulting data in a study? How do these results compare to the data that would be generated by a "biased human"?
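For reference, per-class precision and recall can be computed from true positive, false positive, and false negative counts, as in the minimal sketch below. All counts are hypothetical: the true positive and false negative values are chosen only so that recall matches the 79% and 84% figures quoted above, and the false positive values are arbitrary placeholders because precision was not reported.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall); NaN when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Hypothetical confusion counts per class (not from the manuscript).
counts = {
    "urine": {"tp": 79, "fp": 20, "fn": 21},  # fp is an arbitrary placeholder
    "feces": {"tp": 84, "fp": 10, "fn": 16},  # fp is an arbitrary placeholder
}

scores = {label: precision_recall(**c) for label, c in counts.items()}
for label, (precision, recall) in scores.items():
    print(f"{label}: precision={precision:.3f}, recall={recall:.3f}")

# Unequal recall distorts between-class comparisons: the observed
# urine-to-feces ratio is the true ratio scaled by this factor
# (ignoring false positives).
recall_ratio = scores["urine"][1] / scores["feces"][1]
```

With these placeholder numbers the recall ratio is below 1, meaning urine would be systematically undercounted relative to feces. This is the sense in which the model is statistically biased even though it is free of experimenter expectations.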
Five of the six figures in the paper relate not to the method but to results from a study whose data were generated by the method. This makes a paper that, based on its title, is about the method much longer and more complicated than it would be if it focused on the method. Also, even in the context of the experiments, there is no discussion of the implications of analyzing data generated by a method with precision and recall values of only 70-80%. Surely this noise affects how p-values and related statistics should be calculated. Instead, the authors seem to proceed as if the generated data were simply correct.
Reviewer #3 (Public Review):
Summary:
The authors introduce a tool that employs thermal cameras to automatically detect urine and feces deposits in rodents. The detection process involves a heuristic to identify potential thermal regions of interest, followed by a transformer network-based classifier to differentiate between urine, feces, and background noise. The tool's effectiveness is demonstrated through experiments analyzing social preference, stress response, and temporal dynamics of deposits, revealing differences between male and female mice.
Strengths:
The method effectively automates the identification of deposits.
The application of the tool in various behavioral tests demonstrates its robustness and versatility.
The results highlight notable differences in behavior between male and female mice.
Weaknesses:
The definition of 'start' and 'end' periods for statistical analysis is arbitrary. A robustness check with varying time windows would strengthen the conclusions.
The paper could better address the generalizability of the tool to different experimental setups, environments, and potentially other species.
The results are based on tests of individual animals, and there is no discussion of how this method could be generalized to experiments tracking multiple animals simultaneously in the same arena (e.g., pair or collective behavior tests, where multiple animals may deposit urine or feces).
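The robustness check suggested in the first weakness above could be sketched as follows: count deposits in the first and last `window` minutes of a session while sweeping the window length, then confirm that the start-versus-end comparison does not hinge on one particular choice. The function name, event times, and session length are hypothetical placeholders, not values from the manuscript.

```python
def window_counts(event_times, session_length, window):
    """Count events in the first and last `window` minutes of a session."""
    start = sum(1 for t in event_times if t < window)
    end = sum(1 for t in event_times if t >= session_length - window)
    return start, end


# Hypothetical deposit times in minutes for one animal (placeholder data).
events = [0.5, 1.2, 2.8, 3.1, 9.0, 14.5]
session = 15.0

# Sweep the window length instead of fixing a single arbitrary value.
results = {w: window_counts(events, session, w) for w in (1, 2, 3, 4, 5)}
for w, (start, end) in results.items():
    print(f"window={w} min: start={start}, end={end}")
```

If the direction of the start-versus-end difference holds across the sweep, the conclusion is less likely to be an artifact of the chosen period definition.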