High-throughput unsupervised quantification of patterns in the natural behavior of marmosets

  1. Hock E. Tan and K. Lisa Yang Center for Autism Research, Yang Tan Collective, McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, USA
  2. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, USA

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.


Editors

  • Reviewing Editor
    Jean-Paul Noel
    University of Minnesota, Minneapolis, United States of America
  • Senior Editor
    Michael Taffe
    University of California, San Diego, San Diego, United States of America

Reviewer #1 (Public review):

Summary:

The authors demonstrate a fully unsupervised, high-throughput (that is, requiring very little human intervention) approach to quantifying marmoset behavior in unconstrained environments.

Strengths:

The authors provide an approach that is scalable, easy to implement at face value, and highly robust. Currently, most behavioral quantification approaches do not work well on marmosets, and the published examples that do look promising do not scale to the high throughput demonstrated by the authors.

While marmosets can certainly be a useful translational research model even without quantification of free behavior, the authors make a compelling point about how this approach can be useful in the study of treatments in emerging marmoset disease models.

Overall, this is a very exhaustive manuscript that overcomes significant shortcomings in previous work and makes a strong case for the use of marmosets in unconstrained behavioral and neural assessment.

Weaknesses:

Recording marmoset behavior at a 60 Hz frame rate is a significant limitation of the approach, one that will hopefully be alleviated in the future through better cameras and reconstruction pipelines. Marmosets (in the reviewer's experience) have a lot of motion energy above the 30 Hz Nyquist limit imposed by this system, and they are agile to a degree that requires higher frame rates.
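To make the frame-rate concern concrete, the following is a minimal sketch of the sampling arithmetic. The Nyquist limit is half the frame rate, and a periodic movement faster than that limit shows up in the video at a lower, aliased frequency. The function names and the 40 Hz example are illustrative assumptions, not measurements from the manuscript.

```python
def nyquist_hz(frame_rate_hz):
    """Highest frequency that can be represented without aliasing."""
    return frame_rate_hz / 2.0


def aliased_frequency_hz(signal_hz, frame_rate_hz):
    """Apparent frequency of a pure sinusoid after sampling at frame_rate_hz.

    The sampled signal folds onto the nearest multiple of the frame rate,
    so the apparent frequency always lies between 0 and the Nyquist limit.
    """
    nearest_multiple = frame_rate_hz * round(signal_hz / frame_rate_hz)
    return abs(signal_hz - nearest_multiple)


# Hypothetical example: a 40 Hz limb movement filmed at 60 frames per second.
limit = nyquist_hz(60.0)
apparent = aliased_frequency_hz(40.0, 60.0)
print(f"Nyquist limit: {limit} Hz, 40 Hz motion appears at {apparent} Hz")
```

Under these assumptions, any motion component above 30 Hz is not simply lost; it is folded into the band below 30 Hz, where it can be mistaken for slower movement.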

The manuscript neglects recent approaches to non-human primate behavioral quantification from other groups that should be included. Simians are simians after all.

As a minor weakness, this reviewer would have liked to see code shared for the reviewers to evaluate, especially pertaining to the high throughput and robustness of the approach.

Reviewer #2 (Public review):

In this manuscript, Menegas et al. classify the "control" behavior of captive marmosets. They combine behavioral screening from video recordings with audio and neural recordings (from the striatum) to better define what can be considered a typical behavioral repertoire for captive marmoset monkeys. A range of analyses is presented, investigating various aspects of behavior, such as social interactions and the detection of atypical individuals.

The manuscript is compelling in many respects, especially due to the richness of the dataset and the breadth of analyses presented. However, a significant issue with the manuscript lies in its writing: the results are conveyed in an overly succinct and superficial manner, and the "Methods" section is nearly absent. Key concepts are often undefined, and the mathematical details underlying the figures are not explained, leaving readers to guess the authors' approach.

Another issue is the vague use of the term "natural behavior." All data presented here appear to have been collected in small cages with limited climbing opportunities and enrichment. Thus, the authors should refrain from using "natural" to describe these conditions.

Below, we elaborate further on the lack of methodological detail. Based on these issues, we believe the manuscript, in its current form, does not meet the scientific standards necessary for proper review. We strongly encourage the authors to undertake an extensive revision.

Major Revision Points:

The methods and results require significantly more detail. A scientific publication should provide readers with enough information to reproduce the study. Here, the detail level is far too low to fully understand, or reproduce, the study, and in many instances, readers are left to guess how the figure panels were produced. Below is a non-exhaustive list of examples illustrating these issues:

(1) "we temporarily placed horizontal cage dividers to reduce the total cage size during data collection": What were the resulting (and initial) cage dimensions?

(2) "After training the network, we hierarchically clustered the latent space": What is the latent space? Based on Figure 2a, it appears related to the network's recurrent layer, but this is not clarified in the text.

(3) Alpha and perplexity parameters: Please define these terms. Since these concepts appear fundamental, readers should not have to consult external references.

(4) "We then traced cluster identities across hierarchical levels": What are hierarchical levels?

(5) "To understand how the input time series data was weighed in the bottleneck layer of the model": What is the bottleneck layer?

(6) "we measured the average attention allocation to previous time points": The authors should define "attention allocation."

(7) "we compared each neuron's firing rate distribution to shuffled data based on the overall frequency of each behavior during the session": This description is insufficient to understand the analysis.

(8) "we hierarchically clustered neurons according to their firing rate enrichment maps": No mathematical explanation is provided for neuron clustering, nor is the concept of a "firing rate enrichment map" clarified.

(9) "Cluster 4 showed higher activity when neurons were 'alone' or 'active'": This is vague and uses unclear jargon (e.g., "neurons alone"). Additionally, no mathematical explanation is provided for assigning neuronal activity to behavioral states.

(10) Figure 3f, right-side panels: The analysis seems to involve cage mate positioning, yet no description is provided.

(11) "we used motion watches to measure activity across all hours": Are these motion-sensitive watches physically attached to the animals? The methodology should be described, including data analysis details.

This list could continue, but we trust the authors understand the point. There is a wealth of analyses and information in this study, but the descriptions are too superficial. We understand that fully describing each analysis may require significant rewriting, including supplementary figures, and will likely make the manuscript longer. This is entirely acceptable, as the ideas presented here are worth the added rigor.
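As an illustration of the level of detail that point (7) asks for, here is a minimal sketch of one possible reading of a shuffle-based comparison. It is an assumption about what such an analysis could look like, not the authors' method: behavior labels are permuted across time bins (which preserves the overall frequency of each behavior), and the observed mean firing rate per behavior is compared with the resulting null distribution. All names, the toy data, and the number of shuffles are hypothetical.

```python
import random
from collections import defaultdict


def mean_rate_by_behavior(rates, labels):
    """Mean firing rate of one neuron within each behavior label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for rate, label in zip(rates, labels):
        sums[label] += rate
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def shuffle_enrichment(rates, labels, n_shuffles=1000, seed=0):
    """Return {behavior: (z_score, p_value)} from a label-permutation test."""
    rng = random.Random(seed)
    observed = mean_rate_by_behavior(rates, labels)
    null = {label: [] for label in observed}
    shuffled = list(labels)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # keeps each behavior's overall frequency fixed
        for label, value in mean_rate_by_behavior(rates, shuffled).items():
            null[label].append(value)
    result = {}
    for label, obs in observed.items():
        mu = sum(null[label]) / n_shuffles
        sd = (sum((x - mu) ** 2 for x in null[label]) / n_shuffles) ** 0.5
        z = (obs - mu) / sd if sd > 0 else 0.0
        extreme = sum(abs(x - mu) >= abs(obs - mu) for x in null[label])
        result[label] = (z, (1 + extreme) / (n_shuffles + 1))  # two-sided p
    return result


# Toy data (hypothetical): a neuron firing faster during "active" bins.
demo_rng = random.Random(42)
demo_labels = ["rest"] * 100 + ["active"] * 100
demo_rates = ([demo_rng.gauss(2.0, 1.0) for _ in range(100)]
              + [demo_rng.gauss(8.0, 1.0) for _ in range(100)])
enrichment = shuffle_enrichment(demo_rates, demo_labels, n_shuffles=500)
for label, (z, p) in enrichment.items():
    print(f"{label}: z = {z:.1f}, p = {p:.3f}")
```

A description at this level (what is shuffled, what is held fixed, which statistic is compared, and how significance is assigned) would let readers reproduce the analysis. Note that a simple permutation of time bins ignores temporal autocorrelation in both behavior and firing rate; whether the authors accounted for this is one of the details the text should state.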

"Natural behavior": Typically, the term "natural" suggests that the dataset reflects the range of behaviors exhibited by animals in the wild. Here, however, recordings were made in a small cage with limited climbing opportunities and enrichment. Under these conditions, it's hard to justify describing the behavior as "natural". In a project aimed at classifying the behavioral repertoire of marmoset monkeys and making this dataset accessible to other laboratories, it would be helpful to include more detailed information about the animals' housing conditions. This might include cage sizes, temperature, humidity, and details on food quantities, quality, and feeding times.

Correlation versus causation: In the section titled "Large-scale data collection reveals variability across days and correlation between cagemates," the authors conclude: "Overall, these results indicate that measurements of animals' behavioral traits depend heavily on their social environment." This interpretation seems incorrect. We know that animal behavior varies throughout the day, with activity peaks typically occurring in the morning and afternoon. Such factors, or other external influences, could induce correlations between animals that are not caused by social interactions.
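The concern can be illustrated with a small simulation, sketched below under stated assumptions. Two simulated animals never interact; each follows the same hypothetical daily activity profile (peaks in the morning and afternoon) plus independent noise. Their raw activity is nevertheless correlated, and the correlation largely disappears once the shared time-of-day profile is subtracted. The profile shape, noise level, and function names are illustrative and are not taken from the manuscript.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def daily_profile(hour):
    """Hypothetical shared drive: activity peaks near 09:00 and 15:00."""
    return math.exp(-((hour - 9) / 2) ** 2) + math.exp(-((hour - 15) / 2) ** 2)


def simulate_animal(n_days, rng, noise_sd=0.3):
    """Hourly activity: shared daily profile plus animal-specific noise."""
    return [daily_profile(h) + rng.gauss(0.0, noise_sd)
            for _ in range(n_days) for h in range(24)]


def remove_time_of_day(series, bins_per_day=24):
    """Subtract each hour's across-day mean from every sample at that hour."""
    n_days = len(series) // bins_per_day
    hourly_mean = [sum(series[d * bins_per_day + h] for d in range(n_days)) / n_days
                   for h in range(bins_per_day)]
    return [v - hourly_mean[i % bins_per_day] for i, v in enumerate(series)]


sim_rng = random.Random(1)
animal_a = simulate_animal(20, sim_rng)
animal_b = simulate_animal(20, sim_rng)  # independent noise, no interaction

raw_r = pearson(animal_a, animal_b)
residual_r = pearson(remove_time_of_day(animal_a), remove_time_of_day(animal_b))
print(f"raw correlation: {raw_r:.2f}, after removing daily profile: {residual_r:.2f}")
```

A control of this kind, applied to the real data, would help separate correlations driven by a shared schedule or room-level events from those that reflect social interaction between cagemates.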

Figure 4g: What are we intended to conclude from this analysis?

Figure 5: Please specify the type of calls analyzed. For example, did you analyze only long-distance calls (aka 'loud phees' or 'shrills')? In "We split the audio data into 5-minute (non-continuous) segments and found that the average call rate in these segments varied from 0 calls per minute to 60 calls per minute (Fig. 5d-e)," does the call rate refer to individual animals or the entire cage?
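To show why the distinction matters, here is a minimal sketch of how a per-segment call rate could be computed from call onset times. It assumes calls are given as timestamps in seconds and that segments are consecutive 5-minute windows; the manuscript describes its segments as non-continuous, so this is an illustration rather than a reproduction. If the timestamps are pooled from a room or cage microphone, the resulting rate describes the whole group; a per-animal rate would require caller identity, which this sketch does not assume is available.

```python
def call_rate_per_segment(call_times_s, total_duration_s, segment_s=300.0):
    """Calls per minute in each full segment of a recording.

    call_times_s: call onset times in seconds from the start of the recording.
    total_duration_s: recording length; a trailing partial segment is ignored.
    """
    n_segments = int(total_duration_s // segment_s)
    counts = [0] * n_segments
    for t in call_times_s:
        index = int(t // segment_s)
        if 0 <= index < n_segments:
            counts[index] += 1
    minutes_per_segment = segment_s / 60.0
    return [count / minutes_per_segment for count in counts]


# Hypothetical example: three calls in a 10-minute recording.
example_rates = call_rate_per_segment([10.0, 20.0, 310.0], total_duration_s=600.0)
print(example_rates)
```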

"This implies that a high rate of calls in a room can interrupt animals during social resting states and cause them to preferentially exhibit more active/attentive states." Does it? This could simply indicate that more active animals produce more calls.

"We recorded neural activity in the striatum because it is known to contain diverse signals related to movement and social interactions." While we understand that the authors intend to publish neural data separately, a brief discussion of the striatum's role here would be helpful.

Author response:

We would like to thank the editors and reviewers for taking the time to help improve our manuscript. We appreciate the feedback and will definitely increase the level of methodological detail in a revised submission.

Here is a brief summary of our plan to address the points raised by the reviewers. We will respond to the comments in a point-by-point manner when we resubmit a revised manuscript.

Reviewer 1

This reviewer raised a question about the 60 Hz frame rate for recording. We agree that increasing the number of cameras and frame rate would improve the tracking quality, but this would come at the cost of scalability. In the current study (and other concurrent studies in the lab), we recorded from 10-20 families simultaneously to try to sample the distribution of behavioral responses to stimuli observed in animals in our colony. This was only possible logistically because of the lightweight equipment design allowing us to record data from animals without large disruptions to their home-cage environment.

One strategy for acquiring higher-resolution data is to build a small number of enclosures that are fully surrounded by cameras, and to cycle animals through these enclosures (1). However, this strategy limits throughput by reducing the number of animals that can be studied per day. If the size and cost of cameras and computers decrease in the future, then this recording strategy will be scalable to the whole-colony level. For our current study and analysis, we are limited by the resolution of our dataset. We do believe that our data (although not a perfect 3D reconstruction or an extremely high frame rate) is sufficient to label behavioral states with high accuracy. We will add a figure to show more clearly that behavioral state data can be accurately inferred from this imperfect data, a point that has also been highlighted recently by other groups (2).

Additionally, with recent progress in the application of deep learning to animal pose tracking, new models can infer 3D pose dynamics from 2D data (3) and leverage spatiotemporal structure to clean up noisy data (4). We believe that other groups will be able to use these types of approaches to extract much more value from this dataset. In summary, we understand the concern related to reconstruction quality and will (1) more clearly define the usefulness of our current models, (2) release our data and code so that others can build upon it or repurpose it, and (3) plan future experiments with higher camera count and frame rate as permitted by logistical constraints.

Reviewer 2

This reviewer asked for an increased level of methodological detail. We will try to address this in a few ways:

(1) Code and data sharing. We believe that many of the questions related to the methodology will be best answered by sharing the data and code directly. Because there is a large amount of code associated with this manuscript, it is impractical to list every step and every parameter in the paper. Along with our revised manuscript, we will make our data and code publicly available. That said, we will improve our description of key parameters in the paper as the reviewer suggested.

(2) More detailed Methods section. The reviewer asked us to provide more methodological detail. We understand that this is currently a weakness of our manuscript, and we will focus on addressing it. For instance, the reviewer rightly points out that we did not describe the motion watches used to generate the data in Figure S7. We will address this.

(3) Simplify the manuscript. The paper currently has 22 figures, and further analysis could be done based on the results shown in any of them. For instance, this reviewer asked us to add a comparison across females and males (similar to our comparison of juveniles and adults). While we plan to add that analysis, we recognize that there are several figures/panels that are not closely related to our intended goal of describing the patterns we found in our large dataset. We will simplify the manuscript by removing some excess figures/panels and focus on describing the parts of the analysis that are crucial to our conclusions in greater detail.

(4) More careful language. This reviewer pointed out that there were some inaccuracies with our descriptive language. For instance, we used the term "natural" behavior to describe the behavior of animals in captivity, which may more accurately be described as their home-cage behavior. We will be more careful to align our language to the standard for the field. For instance, several studies refer to unrestrained behavior in a laboratory setting as "spontaneous" behavior rather than "natural" behavior (5). In our case, the data consists of both spontaneously occurring behavior and responses to a set of stimuli. We will make sure that the descriptions are more precise in the revised manuscript.

(1) Bala, P. C. et al. Automated markerless pose estimation in freely moving macaques with OpenMonkeyStudio. Nat Commun 11, (2020).

(2) Weinreb, C. et al. Keypoint-MoSeq: parsing behavior by linking point tracking to pose dynamics. bioRxiv (2023) doi:10.1101/2023.03.16.532307.

(3) Gosztolai, A. et al. LiftPose3D, a deep learning-based approach for transforming two-dimensional to three-dimensional poses in laboratory animals. Nat Methods 18, 975–981 (2021).

(4) Wu, A. et al. Deep Graph Pose: a semi-supervised deep graphical model for improved animal pose tracking. Adv Neural Inf Process Syst 33, 6040–6052 (2020).

(5) Levy, D. R. et al. Mouse spontaneous behavior reflects individual variation rather than estrous state. Curr Biol 33, 1358-1364.e4 (2023).
