Pose estimation for marmosets in group housing

a) Schematic of home cage video recording, and keypoint label accuracy (pixel-based root mean square error (RMSE) across an increasing number of labeled frames). Bars indicate standard deviation across training sessions. b) Example time series of all keypoints (Y coordinate), colored based on animal hair color. c) Example labeled image, and ID accuracy based on 8000 labeled animals drawn from 112 different camera installations. “Missing” indicates that an animal was not labeled and “Switched” indicates that an animal was labeled with the wrong identity. d) Example time series of all keypoints (X and Y coordinates), colored based on animal hair color. e) Raw data time series (z-scored) across 6 minutes (left), and PCA of raw data (right). f) Egocentric data time series (z-scored) after normalization using X/Y position and orientation (left), and PCA of egocentric data (right).
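
The pixel RMSE quoted in panel a) is a standard quantity; a minimal sketch of how it could be computed is below. The function name, array shapes, and toy data are illustrative assumptions, not the paper's code.

```python
import numpy as np

def keypoint_rmse(pred, truth):
    """Pixel RMSE between predicted and ground-truth keypoints.

    pred, truth: (n_frames, n_keypoints, 2) arrays of (x, y) pixel
    coordinates from matched labeled frames.
    """
    err = np.linalg.norm(pred - truth, axis=-1)  # Euclidean error per keypoint
    return np.sqrt(np.mean(err ** 2))

# Toy usage: 100 labeled frames, 21 keypoints, ~3 px of label noise.
rng = np.random.default_rng(0)
truth = rng.uniform(0, 640, size=(100, 21, 2))
pred = truth + rng.normal(scale=3.0, size=truth.shape)
print(f"RMSE: {keypoint_rmse(pred, truth):.2f} px")
```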

Hierarchical description of natural behavior using deep learning

a) Schematic of network structure. Latent state at each time point is used for downstream analysis. b) Example ethogram across 60 seconds with colors indicating behavioral state as defined by the cluster ID. c) Latent space clustering using t-SNE with decreasing values of α and perplexity to produce a hierarchical clustering (with colors representing different clusters). d) Hierarchical organization of behavioral states across levels of analysis, with the average usage for each state (% of total time) shown as a bar graph. e) Hierarchical organization of behavioral states, with colors representing cluster ID. f) Average state duration at each level of analysis, with each point representing the average state duration from one video session. Bars indicate quartiles (25%, 50%, 75%) and open circles indicate the median across sessions.
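
As a rough sketch of the panel c) procedure (embedding the latent states and clustering at several granularities): scikit-learn's TSNE exposes perplexity but not the heavy-tail parameter α (openTSNE's `dof` argument would be the stand-in), and the clustering method and `levels` values here are assumptions rather than the paper's pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

def hierarchical_states(latents, levels=((50, 4), (30, 10), (10, 30))):
    """Cluster latent states at several granularities.

    latents: (n_timepoints, n_dims) network latent states.
    levels: (perplexity, n_clusters) pairs; lower perplexity with more
    clusters yields finer, shorter-lived behavioral states.
    """
    labels = {}
    for perplexity, n_clusters in levels:
        emb = TSNE(n_components=2, perplexity=perplexity,
                   init="pca", random_state=0).fit_transform(latents)
        labels[n_clusters] = KMeans(n_clusters=n_clusters, n_init=10,
                                    random_state=0).fit_predict(emb)
    return labels
```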

Striatal neurons reflect behavioral states defined by deep learning

a) Schematic of silicon probe implant design and on-head wireless recording. b) Example image from video of two animals living in the same cage, with one of those animals wearing a wireless logger. c) Example of spike data from 1 minute of recording from 128 channels (spikes sorted into 25 units). d) All recorded neurons clustered by similarity, with colors indicating the highest-level clusters of neurons. e) For each of the 4 high-level clusters: the regions in the behavioral latent space with enriched activity (left), the normalized change in firing rate in each of the highest-level behavioral clusters (mid-left), the transitions with enriched activity (mid-right), and a summary of the enriched states and transitions (right). *** indicates p < 10⁻²⁰ using a one-sample t-test to determine if neurons had a higher firing rate during each behavioral state. f) Data from four example neurons showing their firing rate enrichment (left) and the normalized change in firing rate based on the position of the other animal (right). Position is displayed as a distance relative to the head radius (h.r.) of the animal. One example neuron was selected from each of the 4 highest-level clusters.
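
A minimal sketch of the enrichment test in panel e), assuming z-scored firing rates and the legend's one-sample t-test across neurons; `state_enrichment` and the array layout are assumptions.

```python
import numpy as np
from scipy import stats

def state_enrichment(rates, states, n_states):
    """Per-state firing-rate enrichment with a one-sample t-test.

    rates: (n_neurons, n_timepoints) binned firing rates.
    states: (n_timepoints,) integer behavioral-state labels.
    Returns (state, mean z-scored rate, p-value) per state; a positive
    mean with small p indicates a higher rate during that state.
    """
    z = stats.zscore(rates, axis=1)  # normalize each neuron's rate
    out = []
    for s in range(n_states):
        per_neuron = z[:, states == s].mean(axis=1)
        t, p = stats.ttest_1samp(per_neuron, 0.0)
        out.append((s, float(per_neuron.mean()), float(p)))
    return out
```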

Behavioral distribution varies across days and is highly correlated to cagemates

a) Example behavioral state usage from 4 different days sampled from two cagemates. Circle diameter indicates the frequency of each behavioral state. b) Distribution across all animals segmented into 10 quantiles with error bars representing the 25%, 50%, and 75% boundaries in the data. c) Distribution of behavioral states across 5400 sessions sampled from 112 animals, sorted based on PC1 from principal component analysis across sessions. d) Example distribution of behavioral states across 58 sessions sampled from a single animal, also sorted based on PC1. e) Correlation of time spent active across all animals (each dot represents one paired observation of an animal and one cagemate). f) Top: distribution of PC1 and PC2 across 45 sessions from an example family of 4 marmosets. Bottom: total time spent in active behavioral states across the 45 sessions, with colors representing animal ID. g) Correlation between measurements of activity normalized by cagemate activity (X-axis) or normalized by average colony activity (Y-axis). Each dot represents a single-animal average.
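
A sketch of the PC1 sorting used in panels c) and d); the usage-matrix layout is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

def sort_sessions_by_pc1(usage):
    """Order sessions along the main axis of behavioral variation.

    usage: (n_sessions, n_states) matrix, each row the fraction of time
    a session spent in each behavioral state.
    """
    scores = PCA(n_components=2).fit_transform(usage)
    order = np.argsort(scores[:, 0])  # sort sessions by their PC1 score
    return usage[order], scores[order]
```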

Room call rate varies across days and is highly correlated to behavior

a) Schematic of microphone distribution and example of raw amplitude data converted to spectrogram data. b) Example audio data classified by a supervised neural network as time containing no calls (15 minutes total of concatenated spectrograms, top), and containing calls of any type (15 minutes total of concatenated spectrograms, bottom). c) Distribution of time spent near other animals from 5-minute clips (n = 3808 clips) from 36 cages. d) Distribution of average call rates across all 5-minute clips (n = 3808 clips) from 36 cages. e) Example series of non-continuous 5-minute clips comparing the time spent near other animals (top) and total call rate in the room (bottom). f) Comparison of cages with 2 animals (top) or 4 animals (bottom). g) Correlation between behavioral state usage and call rate across n = 52 pair-housed animals, with * indicating p<10⁻² and *** indicating p<10⁻¹⁰ using a one-sample t-test to determine if each behavioral state was positively/negatively correlated with call rate across animals. h) Top: Correlations between behavioral states and call rate from one example animal, colored based on behavioral state. Each point represents one 5-minute clip. Bottom: Hierarchical organization of the correlation between behavioral state usage and room call rate. Bars indicate the quartiles (25%, 50%, and 75%) and circles indicate the median across animals.
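
Panels g) and h) combine a per-animal correlation with a population-level test; a minimal sketch under the legend's definitions (function names are illustrative):

```python
from scipy import stats

def usage_call_correlation(state_usage, call_rate):
    """Pearson correlation across 5-minute clips for one animal, between
    one behavioral state's usage and the room call rate."""
    r, _ = stats.pearsonr(state_usage, call_rate)
    return r

def population_test(per_animal_r):
    """One-sample t-test across animals: is a state consistently
    positively/negatively correlated with call rate (H0: mean r = 0)?"""
    return stats.ttest_1samp(per_animal_r, 0.0)
```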

Comparing across behavioral conditions induced by stimuli

a) Change in total time spent active during the presentation of each stimulus. * indicates p<10⁻³, ** indicates p<10⁻⁶, and *** indicates p<10⁻²⁰ using a one-sided t-test to determine whether each stimulus increased or decreased the time spent active. b) Change in time spent in the region of the cage where the stimulus is placed (for each stimulus). * indicates p<10⁻³, ** indicates p<10⁻⁶, and *** indicates p<10⁻²⁰ using a one-sided t-test to determine whether each stimulus increased or decreased the time spent near the stimulus. c) Left: average usage of behavioral states when Toy or Tablet are presented. d) Difference in usage between conditions and discriminability between conditions based on usage. * indicates p<10⁻³, ** indicates p<10⁻⁶, and *** indicates p<10⁻²⁰ using a paired t-test to determine whether there was a difference in behavioral state usage between conditions. Paired t-tests compare data from the same animal in different conditions. e) Left: average correlation between behavioral state usage and time spent interacting with stimuli in each condition. f) Difference in correlations between conditions and discriminability between conditions based on these correlations. * indicates p<10⁻³, ** indicates p<10⁻⁶, and *** indicates p<10⁻²⁰ using a paired t-test to determine whether there was a difference in behavioral state correlations between conditions. Paired t-tests compare data from the same animal in different conditions.
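
The paired comparisons in panels d) and f) amount to `scipy.stats.ttest_rel` on per-animal measurements; a sketch with the legend's significance thresholds (the helper name is an assumption):

```python
from scipy import stats

def compare_conditions(values_a, values_b):
    """Paired t-test where values_a[i] and values_b[i] come from the same
    animal in two stimulus conditions (e.g., one state's usage)."""
    t, p = stats.ttest_rel(values_a, values_b)
    stars = ("***" if p < 1e-20 else
             "**" if p < 1e-6 else
             "*" if p < 1e-3 else "n.s.")
    return t, p, stars
```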

Discriminating between conditions using outlier detection

a) Strategy for assigning outlier score based on a probability distribution defined by “Condition 1” and then applied to either held-out data from “Condition 1” or to held-out data from “Condition 2”. b) Outlier scores for each behavioral state across 32 age-matched animals in Condition 2. c) Discriminability between example conditions (Toy and Touchscreen) based on 2-class classification. d) Discriminability between example conditions (Toy and Touchscreen) based on 1-class outlier detection. Bars indicate quartiles (25%, 50%, 75%) and open circles indicate the median score for each group. e) Schematic of 2-class classification. See methods for full detail. f) Comparison of auROC scores (from held-out data) across models trained using a different number of animals (n=4 to n=32). g) Schematic of outlier detection. See methods for full detail. h) Comparison of auROC scores (from held-out data) across models trained using a different number of animals (n=4 to n=32).
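
A minimal sketch contrasting the two strategies in panels e) and g), with a logistic regression standing in for the 2-class classifier and a Local Outlier Factor density model standing in for the 1-class outlier detector (the paper's actual models are described only in its methods):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor

def two_class_auroc(train_1, train_2, test_1, test_2):
    """Train on labeled data from both conditions; score held-out data."""
    X = np.vstack([train_1, train_2])
    y = np.r_[np.zeros(len(train_1)), np.ones(len(train_2))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    scores = clf.predict_proba(np.vstack([test_1, test_2]))[:, 1]
    truth = np.r_[np.zeros(len(test_1)), np.ones(len(test_2))]
    return roc_auc_score(truth, scores)

def one_class_auroc(train_1, test_1, test_2):
    """Fit a density model on Condition 1 only; held-out Condition 2
    data should receive higher outlier scores."""
    lof = LocalOutlierFactor(novelty=True).fit(train_1)
    scores = -lof.score_samples(np.vstack([test_1, test_2]))
    truth = np.r_[np.zeros(len(test_1)), np.ones(len(test_2))]
    return roc_auc_score(truth, scores)
```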

Outlier detection has stable performance regardless of heterogeneity in the dataset

a) Scores assigned to inliers (blue) or outliers added to Condition 2 (red) using 2-class classification, with an increasing number of outliers artificially added to the training set. ** indicates p<10⁻¹² using a t-test to determine whether the scores differed when the number of outliers added to the training set was 0 or 16. b) Scores assigned to inliers (blue) or outliers added to Condition 2 (red) using outlier detection, with an increasing number of outliers artificially added to the training set. ** indicates p<10⁻¹² using a t-test to determine whether the scores differed when the number of outliers added to the training set was 0 or 16. c) Schematic of 2-class classification with heterogeneity artificially added to Condition 2. See methods for full detail. d) Change in score assigned to outliers as a function of the number of outliers artificially added to the training data. e) Schematic of outlier detection with heterogeneity artificially added to Condition 2. See methods for full detail. f) Change in score assigned to outliers as a function of the number of outliers artificially added to the training data.

PCA in raw data compared to egocentric data

a) Example time series of raw postural data (21 keypoints) plotted across 6 minutes. b) Example time series of egocentric data (normalized by position and body angle) plotted across 6 minutes. c) For the first four raw data principal components (PCs), the distribution within the image (left), example regression with features (middle), and average posture at different scores (right). d) For the first four egocentric data principal components (PCs), the distribution within the image (left), example egocentric body part coordinate colored based on score (middle), and average posture at different scores (right).
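
A sketch of the egocentric normalization in panel b): subtract a body-center keypoint, then rotate each frame by the body angle so a heading keypoint points along +x. The keypoint indices used for the origin and heading are assumptions.

```python
import numpy as np

def egocentrize(keypoints, origin_idx=0, heading_idx=1):
    """Convert raw keypoints to egocentric coordinates.

    keypoints: (n_frames, n_keypoints, 2) pixel coordinates. The keypoint
    at origin_idx (e.g., body center) becomes the origin, and each frame
    is rotated so the keypoint at heading_idx (e.g., head) points along +x.
    """
    centered = keypoints - keypoints[:, origin_idx:origin_idx + 1, :]
    head = centered[:, heading_idx, :]
    angle = np.arctan2(head[:, 1], head[:, 0])        # body orientation
    c, s = np.cos(-angle), np.sin(-angle)
    rot = np.stack([np.stack([c, -s], -1),
                    np.stack([s, c], -1)], -2)         # (n_frames, 2, 2)
    return np.einsum("fij,fkj->fki", rot, centered)    # rotate every frame
```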

Depth estimates for top-down home-cage video

a) Comparison of methods for depth estimation using single view, stereo vision, or stereo vision with mesh grid extraction. b) Example single-view image, and comparison of score (pixel-based) between human-annotated frames labeled with binary 0 (not climbing) or 1 (climbing) labels. c) Example depth time series based on single view (30-minute trace). d) Example stereo vision image, and comparison of score (depth-based) between human-annotated frames labeled with binary 0 (not climbing) or 1 (climbing) labels. e) Example time series (depth estimate) based on stereo vision. f) Example mesh grid based on stereo depth estimates, and comparison of score (depth-based) between human-annotated frames labeled with binary 0 (not climbing) or 1 (climbing) labels. g) Example mesh grid time series, with depths sorted at each time point. h) Example plots of uncorrected position (left) and depth-corrected position (right), with color corresponding to distance from the camera.
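
Stereo depth in panels d)–f) ultimately rests on the pinhole relation Z = f·B/d; a sketch, where the focal length and baseline are calibration values not given in the legend:

```python
import numpy as np

def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Depth from horizontal disparity of matched keypoints.

    x_left, x_right: x pixel coordinates of the same keypoint in the two
    views (assumes rectified cameras with x_left >= x_right).
    focal_px: focal length in pixels; baseline_m: camera separation in meters.
    """
    disparity = np.asarray(x_left, float) - np.asarray(x_right, float)
    return focal_px * baseline_m / np.clip(disparity, 1e-6, None)
```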

Hierarchical behavioral state representation

a) Latent space clustered into 4 components with a median duration of 4.0 seconds. b) Latent space clustered into 10 components with a median duration of 3.0 seconds. c) Latent space clustered into 18 components with a median duration of 1.5 seconds. d) Latent space clustered into 30 components with a median duration of 1.0 seconds. e) Latent space clustered into 80 components with a median duration of 0.5 seconds.
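
The median durations quoted here can be read off a frame-wise label sequence; a sketch (the frame rate is an assumption):

```python
import numpy as np

def median_state_duration(states, fps=30.0):
    """Median duration (seconds) of runs of identical cluster labels.

    states: (n_timepoints,) integer cluster IDs sampled at `fps`.
    """
    states = np.asarray(states)
    change = np.flatnonzero(np.diff(states)) + 1   # indices where runs end
    bounds = np.r_[0, change, len(states)]
    run_lengths = np.diff(bounds)                  # frames per run
    return np.median(run_lengths) / fps
```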

Mapping human description and features to clusters in latent space

Attention distribution varies depending on previous states

a) Schematic of recurrent neural network model. b) Attention distribution concentrated on recent time points when previous states have high variability (top) or attention distribution distributed widely when previous states have low variability (bottom). c) Top: Example time series of the fraction of attention allocated to previous states (>1 second prior) over the course of one minute of video. d) Average attention allocation across behavioral states where each row represents one behavioral state and color corresponds to average attention. e) Attention to previous states plotted as a function of the consistency of previous states. Consistency is defined where 1.0 is the case where all states in the last 20 seconds are identical. f) Attention to previous states plotted for discrete ranges of past state consistency. Data is divided into ten groups, starting with the range (0 to 0.1) and ending with the range (0.9 to 1.0).
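
Panels e) and f) depend on a consistency score; one natural reading (the fraction of the last 20 s matching the modal state, so 1.0 means all identical) is sketched below, along with the decile binning from panel f). Both helpers are interpretive assumptions.

```python
import numpy as np

def past_consistency(states, t, window_frames):
    """Fraction of the last `window_frames` states (before time t) that
    match the most common state in that window; 1.0 = all identical."""
    window = np.asarray(states[max(0, t - window_frames):t])
    if window.size == 0:
        return 1.0
    _, counts = np.unique(window, return_counts=True)
    return counts.max() / window.size

def bin_attention_by_consistency(consistency, attention, n_bins=10):
    """Mean attention to the distant past within each consistency decile
    (ranges 0-0.1 up to 0.9-1.0, as in panel f)."""
    c = np.asarray(consistency)
    a = np.asarray(attention)
    bins = np.clip((c * n_bins).astype(int), 0, n_bins - 1)
    return [a[bins == b].mean() for b in range(n_bins)]
```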

Measuring correlation between cagemates across timescales

a) All sessions plotted by PC1 and PC2. Colors correspond to the frequency of “Climbing”, “Active”, “Social”, and “Alone” behavioral states. b) Correlation measurements with different bin sizes ranging from 1 second to 4 hours. Gray circles represent correlations obtained using shuffled data (with the same time bin sizes). c) Distribution of behavioral state usage, and PCA across sessions. d) Time spent active relative to cagemates, with animals significantly less active than cagemates (p<0.001) shown in red and animals significantly more active than cagemates (p<0.001) shown in blue. Paired t-tests were used to compare simultaneously collected activity between animals and their cagemates. Cagemate data was averaged to generate a single measurement for each session.
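
A sketch of the binned-correlation analysis in panel b), including the shuffled control; the binary activity traces and fixed seed are assumptions.

```python
import numpy as np
from scipy import stats

def binned_correlation(active_a, active_b, bin_frames, seed=0):
    """Cagemate activity correlation at one timescale, plus a shuffle control.

    active_a, active_b: (n_timepoints,) simultaneously recorded activity
    traces (e.g., 0/1 per frame). bin_frames: bin size in frames, spanning
    1 second up to 4 hours in panel b).
    """
    n = (len(active_a) // bin_frames) * bin_frames
    a = np.asarray(active_a[:n]).reshape(-1, bin_frames).mean(axis=1)
    b = np.asarray(active_b[:n]).reshape(-1, bin_frames).mean(axis=1)
    r, _ = stats.pearsonr(a, b)
    rng = np.random.default_rng(seed)
    r_shuffled, _ = stats.pearsonr(a, rng.permutation(b))
    return r, r_shuffled
```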

Sleep cycle monitoring

a) Activity monitoring using a motion watch. Continuous data from one week of monitoring is plotted for each animal with n = 44 animals. Blue bars indicate nighttime (lights off) and yellow bars indicate daytime (lights on). Darker areas in the plot indicate higher activity. In cases where animals removed or disabled motion watches prior to the end of the experiment, remaining values were replaced by NaN. b) Average activity trace across 44 animals. c) Average 24-hour cycle (based on data from 1 week) plotted for each animal. d) Raw motion watch data (a.u. = arbitrary units) separated into 4 time windows of 6 hours. e) Filtered motion watch data representing an estimate of time spent at rest (%) separated into 4 time windows of 6 hours.
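
A sketch of the filtering in panel e): smooth the raw counts with a moving average and score sub-threshold epochs as rest. The threshold and window are illustrative, not the paper's calibration.

```python
import numpy as np

def fraction_at_rest(counts, threshold=5.0, smooth_epochs=10):
    """Estimate time spent at rest (%) from raw actigraphy counts (a.u.).

    Smooths the trace with a moving average and scores sub-threshold
    epochs as rest. Threshold and window size are illustrative only.
    """
    kernel = np.ones(smooth_epochs) / smooth_epochs
    smoothed = np.convolve(np.asarray(counts, float), kernel, mode="same")
    return 100.0 * np.mean(smoothed < threshold)
```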

Call localization using array of microphones

a) Example time series (35 seconds) of raw data from 4 microphones placed near different cages. Dotted lines indicate calls labeled by the neural network. b) Amplitudes (running average of the absolute value of the spectrogram) used to assign each call to the microphone with the largest amplitude recording. c) Spectrograms depicting the distribution of signal across frequencies. d) Four individual call spectrograms, where the brightness indicates the amplitude measurement of the call in each microphone, relative to the amplitude measurement from the other microphones.
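
Panel b)'s assignment rule, sketched directly from the legend: compute a running average of the absolute spectrogram amplitude per microphone and pick the loudest one during each call window (array layout and smoothing width are assumptions).

```python
import numpy as np

def assign_call_to_microphone(spectrograms, t_start, t_end, smooth=5):
    """Assign a detected call to the loudest microphone.

    spectrograms: list of (n_freqs, n_times) arrays, one per microphone,
    time-aligned across the room. Returns the index of the microphone
    whose running-average amplitude is largest in the call window.
    """
    kernel = np.ones(smooth) / smooth
    mean_amps = []
    for spec in spectrograms:
        envelope = np.abs(spec).mean(axis=0)               # amplitude trace
        envelope = np.convolve(envelope, kernel, mode="same")
        mean_amps.append(envelope[t_start:t_end].mean())
    return int(np.argmax(mean_amps))
```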

Supervised categorical audio classification

Comparison of 2-animal cage and 4-animal cage correlation patterns

a) Correlation between behavioral state usage and call rate from cages with 2 adult animals (filled circles indicate positive correlation, open circles indicate negative correlation). b) Data from n = 52 animals, with bars indicating quartile boundaries (25%, 50%, and 75%) and circles indicating median across animals. c) Correlation between behavioral state usage and call rate from cages with 4 animals (filled circles indicate positive correlation, open circles indicate negative correlation). d) Data from n = 43 animals, with bars indicating quartile boundaries (25%, 50%, and 75%) and circles indicating median across animals. e) Correlation between data from cages with 2 animals and data from cages with 4 animals. f) Correlation from three example animals that were recorded in both conditions (both in the 2-animal condition and the 4-animal condition).

Behavioral responses to a panel of stimuli

a) Recording timeline, which was repeated every 3 months. b) Example correlations between behavioral state usage and time spent near tablet (using 1-minute bins). c) Difference in behavioral state usage, comparing periods of “high” interaction (>90% time spent near stimulus per minute) and periods of “low” interaction (<10% time spent near stimulus per minute). d) Schematic of stimulus delivery. e) Distribution of time spent near tablet across 91 non-age-matched animals (quantified based on % time spent near stimulus per session). f) Distribution of time spent near tablet quantified based on % time spent near stimulus per minute. g) Comparison of cage location during times of “high” interaction with tablet (>90% per minute) and times of “low” interaction with tablet (<10% per minute). h) Difference in behavioral state usage between times of “high” interaction with tablet (>90% per minute) and “low” interaction with tablet (<10% per minute).

Comparing frequency-based metrics and correlation-based metrics

a) Change in behavioral state usage across hierarchical levels (purple indicates an increase in state usage and grey indicates a decrease in state usage). Comparisons are relative to control (bed-only condition with no other stimuli). b) Change in behavioral state correlations across hierarchical levels (blue indicates an increase in correlation and red indicates a decrease in correlation). Comparisons are relative to control (bed-only condition with no other stimuli). c) Example comparing the usage of two behavioral states when animals were presented with toy (green) or touchscreen (purple). * indicates p < 10⁻⁶, ** indicates p < 10⁻¹², and *** indicates p < 10⁻²⁰ using paired t-tests to determine whether the same animals used each behavioral state more or less in each condition. d) Example comparing the correlation between two behavioral states with the toy (green) or touchscreen (purple) stimuli. * indicates p < 10⁻⁶, ** indicates p < 10⁻¹², and *** indicates p < 10⁻²⁰ using paired t-tests to determine whether the same animals had higher or lower correlation measurements for each behavioral state in each condition.

Outlier detection using data from non-age-matched animals

a) auROC scores from 2-class classification across an increasing number of animals (non-age-matched) used in training data. Score indicates the discriminability between each pair of stimulus-response conditions. b) auROC scores from 1-class outlier detection across an increasing number of animals (non-age-matched) used in training data. Score indicates the discriminability between each pair of stimulus-response conditions.

Differences in behavioral distribution between juvenile and adult animals

a) Comparison of the fraction of time spent upright/climbing per session between juvenile and adult animals (left), and a paired comparison across cagemates of different ages (right). b) Comparison of the fraction of time spent active/running per session between juvenile and adult animals (left), and a paired comparison across cagemates of different ages (right). c) Comparison of the fraction of time spent social/near other per session between juvenile and adult animals (left), and a paired comparison across cagemates of different ages (right). d) Comparison of the fraction of time spent alone per session between juvenile and adult animals (left), and a paired comparison across cagemates of different ages (right). e) Normalized difference in usage of behavioral states at level 5. f) Example graphical representation. g) Normalized difference in usage of behavioral states across hierarchical levels. h) Example graphical representation. Statistical tests were unpaired (left) or paired using data simultaneously collected from cagemates (right).