JABS data acquisition module (JABS-DA).

The JABS-DA consists of hardware and software for video data acquisition and processing. (A) JABS pipeline highlighting individual steps towards automated behavioral quantification. (B) Detailed example of JABS data acquisition, including a picture of the monitoring hardware, the architecture of the real-time monitoring app, and screenshots from videos taken during daytime and nighttime. The open field arena is shown from the outside (left), and a screenshot of video data is shown on the right. JABS-DA blocks visible light from reaching the camera and collects data using only IR illumination, which produces uniform data during day and night. The JABS-DA computer hardware and software (middle) allow streaming of video data from edge devices, which enables remote welfare checks and web-based experiment setup and monitoring. Data compression is handled on these edge devices.

JABS data acquisition module (JABS-DA) consists of a web-based control system for recording and monitoring experiments.

(A) JABS pipeline. (B-E) Screenshots from the Angular web client, which allows multiple JABS acquisition units in multiple physical locations to be monitored on one screen (B). The dashboard view allows monitoring of all JABS units and their status, and Device Status provides detailed data on individual devices (C). The recording session dashboard allows initiation of new experiments (D), and the remote welfare view allows live video to be streamed from each unit (E).

JABS-AL is a behavior annotation and classification module that allows training classifiers with sparse labels.

(A) JABS pipeline highlighting individual steps towards automated behavioral quantification. (B) Screenshot of the Python-based open source GUI application used for annotating multiple videos frame by frame. One can annotate multiple mice and multiple behaviors. The labeled data is used for training classifiers using either random forest or gradient boosting methods. An adjustable window size (number of frames to the left and right of the current frame) controls how many neighboring frames contribute features to the current frame. The labels and predicted labels are displayed at the bottom. (C) A sample workflow for training a typical classifier. Multiple experts can sparsely label videos to train multiple classifiers for the same behavior. These classifiers can be compared, and experts can consult each other to iterate through the training process.
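The window idea in panel (B) can be illustrated with a minimal sketch. It assumes a single scalar per-frame feature and three summary statistics (mean, min, max); the function name `window_features` and the example values are hypothetical and are not taken from the JABS code base.

```python
from statistics import mean


def window_features(values, window_size):
    """Summarize a per-frame feature over the frames [i - w, i + w].

    Windows are truncated at the start and end of the video rather than padded.
    """
    n = len(values)
    features = []
    for i in range(n):
        lo = max(0, i - window_size)
        hi = min(n, i + window_size + 1)
        window = values[lo:hi]
        features.append(
            {"mean": mean(window), "min": min(window), "max": max(window)}
        )
    return features


# Hypothetical per-frame speeds for a five-frame clip.
speeds = [0.0, 1.0, 2.0, 3.0, 4.0]
feats = window_features(speeds, window_size=1)
```

Each frame's dictionary could then be appended to that frame's feature vector before it is passed to a random forest or gradient boosting classifier.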

JABS Benchmarks: Selecting hyper-parameters and benchmarking JABS classifiers using the grooming dataset.

(A) JABS pipeline highlighting individual steps towards automated behavioral quantification. Feature window size, type of classification algorithm, and number of training videos are used as benchmarking parameters. (B) Accuracy of JABS classifiers trained with features computed over different window sizes (in frames). Each boxplot shows the range of accuracy values across different numbers of training videos and types of classification algorithm. (C, D) The effect of increasing the training data size on the accuracy and AUROC score of the JABS classifiers. (E) ROC curves for the JABS classifier trained with a window size of 60, the XGB algorithm, and varying training data size. (F) True positive rate at 5% false positive rate for the JABS classifier from panel (E) as the amount of training data is changed. (G) Comparison of the performance of JABS-based classifiers with a 3D convolutional neural network (CNN) and JAABA-based classifiers for different training data sizes. JAABA and CNN results were adopted from [9].
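The metric in panel (F) can be sketched as follows. This is a minimal illustration, assuming binary frame labels and one classifier score per frame; `tpr_at_fpr` is a hypothetical helper, and the labels and scores are made up for the example.

```python
def tpr_at_fpr(labels, scores, max_fpr=0.05):
    """Largest true positive rate reachable while keeping FPR <= max_fpr."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best = 0.0
    # Sweep every distinct score as a decision threshold (score >= threshold).
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        if fp / negatives <= max_fpr:
            best = max(best, tp / positives)
    return best


# Made-up data: 4 behavior frames and 30 non-behavior frames.
labels = [1, 1, 1, 1] + [0] * 30
scores = [0.9, 0.8, 0.6, 0.3] + [0.7, 0.5] + [0.1] * 28
result = tpr_at_fpr(labels, scores)
```

With these numbers, one false positive out of 30 negatives is tolerated, so three of the four positive frames can be recovered before the second false positive pushes the rate above 5%.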

Frame based comparison of classifiers from different annotators but trained for the same behavior.

(A) JABS pipeline highlighting individual steps towards automated behavioral quantification. (B, C) Two sample ethograms for the left turn behavior showing variation in behavior inference between two annotators. (D, G) Kernel density estimate (KDE) of the percentage of frames predicted to be a left turn and a right turn, respectively, by each annotator across all the videos. The major discrepancy between the two annotators is that A2 systematically predicts a larger number of frames as behavior than A1. (E, H) Confusion matrices showing the agreement between the predictions of the two classifiers over all the videos in the strain survey for left and right turn behavior. (F, I) Venn diagrams capturing the frame-wise behavior agreement between the two annotators for left and right turn behavior.
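The frame-wise comparison behind the confusion matrices and Venn diagrams can be sketched as below. This assumes two equal-length binary prediction sequences; the helper name `framewise_agreement`, the Jaccard summary, and the example predictions are illustrative additions, not part of the original analysis.

```python
def framewise_agreement(pred_a, pred_b):
    """Count frames by agreement category for two binary prediction tracks."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction tracks must have the same length")
    both = sum(1 for a, b in zip(pred_a, pred_b) if a and b)
    only_a = sum(1 for a, b in zip(pred_a, pred_b) if a and not b)
    only_b = sum(1 for a, b in zip(pred_a, pred_b) if b and not a)
    neither = len(pred_a) - both - only_a - only_b
    union = both + only_a + only_b
    return {
        "both": both,        # Venn intersection
        "only_a": only_a,    # frames called behavior by the first classifier only
        "only_b": only_b,    # frames called behavior by the second classifier only
        "neither": neither,  # frames both classifiers call non-behavior
        "jaccard": both / union if union else 1.0,
    }


# Made-up predictions in which the second track labels more frames as behavior.
a1 = [0, 1, 1, 1, 0, 0, 1, 0]
a2 = [0, 1, 1, 1, 1, 0, 1, 1]
counts = framewise_agreement(a1, a2)
```

The four counts are the cells of a 2x2 confusion matrix, and `both`, `only_a`, and `only_b` are the three regions of the Venn diagram.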

Bout based comparison of classifier predictions from different annotators but trained for the same behavior.

(A) JABS pipeline highlighting individual steps towards automated behavioral quantification. (B) Ethogram depicting frame-wise left turn predictions for annotators A1 (red) and A2 (blue). (C) Ethograph corresponding to the ethogram in panel (B), capturing the bout-level information as a bipartite network. The nodes represent bouts, with node size and color corresponding to the bout length and annotator, respectively. Edge weights capture the fraction of bout overlap between two bouts predicted by different annotators for the same behavior. Edge weights and node sizes with zero value indicate bouts missed by an annotator. These have been given a small positive value for visualization purposes only. (D-E) Bout length distribution of annotators A1 and A2 for left and right turn behavior. (F) The mathematical definition of the average bout agreement between two annotators, where w(u, v) represents the weight between nodes u and v (u ∈ U, v ∈ V) in the ethograph G = (U, V, E) and w is the bout overlap threshold (fixed at 0.5 for our study). (G) Overview of the workflow for stitching and filtering at the bout level. (H, I) Hyper-parameter tuning to find optimal filtering and stitching thresholds. (J) Sample ethogram and its corresponding ethograph before and after applying stitching and filtering. (K) Inter-annotator agreement in frame-wise predictions underestimates the agreement, whereas the bout-wise comparison after filtering and stitching captures the overall agreement in a more biologically meaningful way.
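The bout-level steps described in this caption can be sketched as follows. This is an illustration under stated assumptions: bouts are half-open frame intervals, stitching merges bouts separated by a small gap, filtering drops short bouts, and the edge weight is taken to be intersection over union. The caption does not specify the overlap normalization, so that choice, all function names, and the example thresholds are hypothetical. The average bout agreement of panel (F) is not reproduced here.

```python
def find_bouts(frames):
    """Return (start, end) pairs, end exclusive, for each run of 1s."""
    bouts, start = [], None
    for i, value in enumerate(frames):
        if value and start is None:
            start = i
        elif not value and start is not None:
            bouts.append((start, i))
            start = None
    if start is not None:
        bouts.append((start, len(frames)))
    return bouts


def stitch(bouts, max_gap):
    """Merge consecutive bouts separated by at most max_gap frames."""
    merged = []
    for start, end in bouts:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def filter_short(bouts, min_length):
    """Drop bouts shorter than min_length frames."""
    return [(s, e) for s, e in bouts if e - s >= min_length]


def overlap_weight(bout_u, bout_v):
    """Intersection over union of two bouts (assumed edge-weight definition)."""
    inter = max(0, min(bout_u[1], bout_v[1]) - max(bout_u[0], bout_v[0]))
    union = (bout_u[1] - bout_u[0]) + (bout_v[1] - bout_v[0]) - inter
    return inter / union


def ethograph_edges(bouts_u, bouts_v):
    """Bipartite edges {(i, j): weight} between overlapping bouts."""
    edges = {}
    for i, u in enumerate(bouts_u):
        for j, v in enumerate(bouts_v):
            weight = overlap_weight(u, v)
            if weight > 0:
                edges[(i, j)] = weight
    return edges


# Made-up frame-wise predictions for two annotators' classifiers.
a1 = [0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0]
a2 = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
bouts_a1 = filter_short(stitch(find_bouts(a1), max_gap=1), min_length=2)
bouts_a2 = filter_short(stitch(find_bouts(a2), max_gap=1), min_length=2)
edges = ethograph_edges(bouts_a1, bouts_a2)
```

In this example the first track has three raw bouts; stitching merges the first two across a one-frame gap and filtering removes the isolated single-frame bout, leaving one bout per track joined by a single edge.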

JABS-AI (Analysis and Integration) module: Strain-level behavioral phenotyping across genetically diverse mouse populations.

(A) JABS pipeline highlighting individual steps towards automated behavioral quantification. (B) Heatmap showing Z-transformed behavioral scores for aggregate phenotypes measured at three time points (5, 20, and 55 minutes) across the JABS600 strain survey. Each column represents a genetically distinct mouse strain, and each row corresponds to a specific behavioral measure including locomotion (turning left/right), exploration (rearing), self-directed behaviors (grooming, scratching), and escape responses. Color intensity indicates deviation from the population mean, with red representing increased behavioral expression and blue representing decreased expression relative to the strain average. Z-score thresholding (|Z-score| > 1) was applied to all behavioral measures, with escape behaviors displayed separately using modified thresholding parameters to preserve detection of outlier strains exhibiting rare but phenotypically important escape responses. Behavioral measures are stratified by time point (T5, T20, T55) to capture temporal dynamics of phenotypic variation across genetic backgrounds.
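The Z-transformation and thresholding in panel (B) can be sketched for a single behavioral measure. This assumes one value per strain, standardization with the population standard deviation, and thresholding that simply flags strains with |Z| > 1; the caption does not state these details, and the strain names and values are hypothetical.

```python
from statistics import mean, pstdev


def z_transform(values_by_strain):
    """Standardize one behavioral measure across strains."""
    values = list(values_by_strain.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return {strain: 0.0 for strain in values_by_strain}
    return {strain: (v - mu) / sigma for strain, v in values_by_strain.items()}


def flag_outliers(z_scores, threshold=1.0):
    """Return the strains whose |Z| exceeds the threshold, sorted by name."""
    return sorted(s for s, z in z_scores.items() if abs(z) > threshold)


# Hypothetical grooming durations (seconds) for five placeholder strains.
grooming_seconds = {
    "strain_a": 10.0,
    "strain_b": 20.0,
    "strain_c": 30.0,
    "strain_d": 40.0,
    "strain_e": 50.0,
}
z_scores = z_transform(grooming_seconds)
flagged = flag_outliers(z_scores)
```

Repeating this for each row (behavioral measure and time point) gives the matrix of Z-scores that a heatmap would display.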

JABS-AI (Analysis and Integration) module: Large-scale GWAS investigation of different mouse behaviors utilizing the JABS1200 dataset.

(A) Statistical power comparison between two datasets (JABS600 vs JABS1200) at the genome-wide significance threshold of 2.4e-07. The y axis shows how power varies with SNP effect size (x axis). (B) Heritability (PVE) estimates of the aggregate (55 min) phenotypes. (C) Lower triangular matrix representation of the genotypic correlation among all of the 55 minute aggregate phenotypes, estimated using a bivariate linear mixed model. (D) Linkage disequilibrium (LD) block size, along with the mean genotype correlations for SNPs at varying genomic distances. (E) Aggregated GWAS results shown as a Manhattan plot. Peak SNP clusters, extracted from (F), determine the colors; SNPs within the same LD block are colored to match their peak SNP. Each SNP is assigned the minimum p-value derived from all phenotypes. (F) Heatmap of all the significant peak SNPs for each phenotype. Each row, representing a SNP, is colored according to its assigned cluster from k-means clustering. The same k-means cluster color scheme is applied in panel (E).

JABS-AI (Analysis and Integration): A web application for sharing the JABS classifiers and automated downstream genetic analysis.

(A) The basic workflow of the web application, beginning with the user employing a classifier trained via the JABS active learning application. The user then deposits this classifier into our web application, which performs automated behavioral and genetic analyses on a dataset selected by the user from our curated strain survey collection (JABS600, JABS1200), accessible via a dropdown menu. The outcomes of these analyses, including detailed behavioral patterns and genetic correlations, are then sent to the user’s designated email address within a short timeframe. (B) Screenshot of the web application highlighting the tabular presentation of the repository of classifiers developed in our laboratory, complete with pertinent metadata such as the date of creation, training hyperparameters, and user ratings. When any two classifiers are selected, the application offers the option to analyze the genetic correlations between the phenotypes corresponding to the selected classifiers, together with their heritability scores.

Data used for grooming benchmark.

Number of videos (first column), and number of annotated frames (second and third columns).

Summary of framewise behavioral phenotypes and their definitions.

Each value corresponds to the total duration (in seconds) of the indicated behavior during the specified time window, averaged across all analyzed videos.

JABS data acquisition module: Environmental parameters in the arena.

(A) JABS pipeline highlighting individual steps towards automated behavioral quantification. (B) Carbon dioxide concentrations and (C) ammonia concentrations were both much higher in the standard wean cage than in the JABS arena. Carbon dioxide was also compared to room background levels. (D) Temperature and (E) humidity measured at floor level in JABS arenas and a standard wean cage compared to room background across a 14 day period. (F) Average body weight as percent of start weight in each JABS arena and wean cage across the 14 day period. (G) Food and (H) water consumption shown as grams per mouse per day for one JABS arena and one wean cage for a 14 day period.

Representative hematoxylin and eosin (H&E) stained tissue sections from mice after spending 14 days in the JABS arena or control wean cage.

Tissues selected for examination (eye, lung, trachea and nasal passages) are those expected to be most affected if the mice lived in a space with inadequate air flow. All tissues appeared normal.

List of JABS features.

Per-strain F1 score and accuracy for the grooming dataset.

F1 score and accuracy for mice with different coat colors in the grooming dataset.

Behavioral phenotypes annotated by different annotators (A1, A2).

Each phenotype measures a specific metric related to bouts of the indicated behavior during the first 55 minutes of the video averaged across all the analyzed videos.

JABS600 Strain Distribution by Sex.

JABS1200 Strain Distribution by Sex.

Classifiers trained by JABS with their respective window sizes and F1 scores.

JABS behavior characterization module: Univariate analysis captures the combined effect of sex and strain on the aggregate phenotypes using the JABS600 dataset.

(A) JABS pipeline highlighting individual steps towards automated behavioral quantification. (B) The LOD scores (−log10(qvalue)) and effect sizes are shown in the left and right panels, respectively. In the left panel, the number of *s represents the strength of evidence against the null hypothesis of no sex effect, while + represents a suggestive effect. In the right panel, the color (red for female and blue for male) and area of the circle (area being proportional to the size of the effect) represent the direction and magnitude of the effect size. Strains with a sex difference in at least one of the aggregated phenotypes are colored pink.