Novel and optimized mouse behavior enabled by fully autonomous HABITS: Home-cage assisted behavioral innovation and testing system

eLife Assessment

This manuscript describes a novel approach for assessing cognitive function in freely moving mice in their home-cage, without human involvement. The authors provide convincing evidence in support of the tasks they developed to capture a variety of complex behaviors and demonstrate the utility of a machine learning approach to expedite the acquisition of task demands. This work is important given its potential utility for other investigators interested in studying mouse cognition.

https://doi.org/10.7554/eLife.104833.3.sa0

Significance of the findings:

Important: Findings that have theoretical or practical implications beyond a single subfield

Landmark
Fundamental
Important
Valuable
Useful

Strength of evidence:

Convincing: Appropriate and validated methodology in line with current state-of-the-art

Exceptional
Compelling
Convincing
Solid
Incomplete
Inadequate

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate). Learn more about eLife Assessments

Abstract
eLife digest
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

Mice are among the most prevalent animal models used in neuroscience, benefiting from the extensive physiological, imaging, and genetic tools available to study their brain. However, the development of novel and optimized behavioral paradigms for mice has been laborious and inconsistent, impeding the investigation of complex cognitions. Here, we present a home-cage assisted mouse behavioral innovation and testing system (HABITS), enabling free-moving mice to learn challenging cognitive behaviors in their home-cage without any human involvement. Supported by the general programming framework, we have not only replicated established paradigms in current neuroscience research but also developed novel paradigms previously unexplored in mice, resulting in more than 300 mice demonstrated in various cognition functions. Most significantly, HABITS incorporates a machine-teaching algorithm, which comprehensively optimized the presentation of stimuli and modalities for trials, leading to more efficient training and higher-quality behavioral outcomes. To our knowledge, this is the first instance where mouse behavior has been systematically optimized by an algorithmic approach. Altogether, our results open a new avenue for mouse behavioral innovation and optimization, which directly facilitates investigation of neural circuits for novel cognitions with mice.

eLife digest

Mice are widely used in neuroscience research due to the many tools available to study their brain function and behavior. However, training mice for complex tasks requires extensive human involvement, which can stress the animals and introduce inconsistencies in methods and results.

Automated systems can reduce any potential bias, but most focus on single tasks only and lack optimization. To address these issues, Yu et al. developed the Home-cage Assisted Behavioral Innovation and Testing System (HABITS) – a fully autonomous platform where mice learn tasks in their cages without human intervention.

Using HABITS, mice successfully acquired a wide range of cognitive skills – including decision-making, working memory, and attention – entirely without handling. The system uses machine learning to adjust training sequences, improving learning speed and minimizing bias. In tests with over 300 mice across more than 20 paradigms, including some never attempted in mice, HABITS also improves the overall health of mice compared to the conventionally used water-restriction training. The system’s AI-driven adjustments help mice learn challenging tasks more efficiently and with fewer errors.

Its low cost and automation make it an efficient and reliable tool to study behavior, making it suitable for large-scale studies. Future developments may incorporate wireless neural recordings to directly link behavior with brain activity, providing deeper insight into learning and decision-making mechanisms.

Introduction

Complex goal-directed behaviors are the macroscopic manifestation of high-dimensional neural activity, making animal training in well-defined tasks a cornerstone of neuroscience (Carandini, 2012; Krakauer et al., 2017; Niv, 2021). Over the past decades, researchers have developed a diverse array of cognitive tasks and achieved significant insights into the underlying cognitive computations (Nau et al., 2024). Mouse has been increasingly utilized as model animal for investigating neural mechanisms of decision making, due to the abundant tools available in monitoring and manipulating individual neurons in intact brain (Vázquez-Guardado et al., 2020). One notable example is the field of motor planning Tanji and Evarts, 1976, which, after introduced into mouse models, was significantly advanced by obtaining causal results from whole brain circuits to genetics (Svoboda and Li, 2018).

Traditionally, training mice in cognitive tasks was inseparable from human involvement in frequent handling and modifying shaping strategies according to the performance, thus labor-intensive and inconsistent as well as introducing unnecessary noise and stress (Balcombe et al., 2004). Recently, many works dedicated to design standard training setups and workflows, aiming for more stable and reproducible outcomes (Findling et al., 2023; Han et al., 2018; Benson et al., 2023; Mah et al., 2023; Mohammadi et al., 2024; Pan-Vazquez et al., 2024; Rich et al., 2024; Scott et al., 2013; Aguillon-Rodriguez et al., 2021; Zhou et al., 2024). For example, the International Brain Laboratory has recently shown mice can perform the task comparably across labs after a standard training within identical experimental setup (Aguillon-Rodriguez et al., 2021). However, human intervention still has been a significant factor, which inevitably introduced the variability. Furthermore, training efficiency was still restricted by the limited experimental time as before, which necessitates motivational techniques like water restriction. The training sessions were often restricted to a specific task and have not been extensively tested in other complex paradigms because of the requirement of prolonged training duration. These limitations highlighted the challenges to broadly and swiftly study comprehensive cognitive behaviors in mice.

A promising solution to this scenario is to implement fully autonomous training systems. In recent years, researchers have focused on combining home-cage environments with automated behavioral training methodologies, offering a viable avenue to realize autonomous training (Aoki et al., 2017; Bernhard et al., 2020; Bollu et al., 2019; Caglayan et al., 2021; Francis et al., 2019; Francis and Kanold, 2017; Hao et al., 2021; Kiryk et al., 2020; Murphy et al., 2020; Murphy et al., 2016; Poddar et al., 2013; Qiao, 2019; Salameh et al., 2020; Silasi et al., 2018). For instance, efforts have been made to incorporate voluntary head fixation within the home-cage to train mice on cognitive tasks (Aoki et al., 2017; Hao et al., 2021; Murphy et al., 2020; Murphy et al., 2016). At the cost of increased training difficulty, they successfully integrate large-scale whole-brain calcium imaging (Aoki et al., 2017; Murphy et al., 2020; Murphy et al., 2016) and optogenetic modulations Hao et al., 2021 with fully automated home-cage behavior training. Moreover, other groups have conducted mouse behavior training in group-housed home-cages and utilize RFID technology to distinguish individual mice (Kiryk et al., 2020; Murphy et al., 2020; Qiao, 2019 Qiao, 2019; Silasi et al., 2018). There are also studies which directly train freely moving animals in their home-cage to reduce stress and facilitate deployment (Balcombe et al., 2004; Francis et al., 2019; Kiryk et al., 2020; Poddar et al., 2013; Qiao, 2019 Qiao, 2019; Silasi et al., 2018; Jankowski et al., 2023; Schaefer and Claridge-Chang, 2012). However, many of these studies have focused on single paradigms and incorporated complex components in their systems, which hindered high-throughput deployment for high-efficiency and long-term behavioral testing and exploring.

The training protocols employed in existing studies, no matter in manual or autonomous training, were often artificially designed (Aguillon-Rodriguez et al., 2021 Aguillon-Rodriguez et al., 2021; Hao et al., 2021), potentially failing to achieve optimal outcomes. For instance, a key issue in cognitive behavioral training is that the mouse is likely to develop bias, that is obtaining reward only from one side. Various ‘anti-bias’ techniques Aguillon-Rodriguez et al., 2021; Hao et al., 2021; Do et al., 2023 have been implemented to counteract the bias, yet their efficacy in accelerating training or enhancing testing reliability remains unproven. From a machine learning standpoint, if we can accurately infer the animal’s internal models, it is possible to select a specific sequence of stimuli that will reduce the ‘teaching’ dimension of the animal and thus maximize the learning rate (Zhu et al., 2018). Recently, an adaptive optimal training policy, known as AlignMax, was developed to generate an optimal sequence of stimuli to expedite the animal’s training process in simulation experiments (Bak et al., 2016). While many relative works have realized the theoretical demonstration of the validity of machine teaching algorithms under specific conditions, these have been limited to teaching silicon-based learners in simulated environments (Bak et al., 2016; Jha et al., 2024; Liu et al., 2017). The direct application and empirical demonstration of these algorithms in real-world scenarios, particularly in teaching animals to master complex cognitive tasks, remains unexplored. There are two fundamental barriers for testing these algorithms in real animals training: firstly, traditional session-based behavioral training results in a discontinuous training process, introducing abrupt variation of learning rate and uncontrollable noise (Aguillon-Rodriguez et al., 2021; Roy et al., 2021), which could undermine the algorithm’s capability; secondly, the high computation complexity of model fitting (Bak et al., 2016; Jha et al., 2024) pose challenges for deployment on microprocessors, thereby impeding extensive and high-throughput experiments. In the lack of human supervision, a fully autonomous behavioral training system necessitates an optimized training algorithm. Therefore, integrating fully automated training with machine teaching-based algorithms could yield mutually beneficial outcomes.

To address these challenges, we introduced a home-cage assisted behavioral innovation and testing system, referred to as HABITS, which is a comprehensive platform featuring adaptable hardware and a universal programming framework. This system facilitates the creation of a wide array of mouse behavioral paradigms, regardless of complexity. With HABITS, we have not only replicated established paradigms commonly used in contemporary neuroscience research but have also pioneered novel paradigms that have never been explored in mice. Crucially, we have integrated a machine teaching-based optimization algorithm into HABITS, which significantly enhances the efficiency of both training and testing. Consequently, this study provides a holistic system and workflow for a variety of complex, long-term mouse behavioral experiments, which has the potential to greatly expand the behavioral reservoir for neuroscience and pathology research.

Results

System design of HABITS

The entire architecture of HABITS was comprised of two parts: a custom home-cage and behavioral components embedded in the home-cage (Figure 1A). The home-cage was made of acrylic plates with a dimension of 20×20 × 30 cm, which is more extensive than most of the commercial counterparts for single-housed mice. A tray was located at the bottom of the cage where ad libitum food, bedding, nesting material (cotton), and enrichment (a tube) were placed (Figure 1B). Experimenters can change the bedding effortlessly just by exchanging the tray. The home-cage also included an elevated, arc-shaped weighting platform inside, providing a loose body constraint for the mouse during task performing (Figure 1A). Notably, a micro load cell was installed beneath the platform, which can read out body weight of the mouse for long-term health monitoring. The cage was compatible with the standard mouse rack and occupied as small a space as standard mouse cage (Figure 1C).

Figure 1 with 1 supplement see all

Download asset Open asset

System setup for HABITS.

(A), Front (left) and side (right) view of HABITS, showing components for stimulus presenting (LEDs & buzzers), rewarding (water tanks and pumps), behavioral reporting (lickports) and health monitoring (weight platform). These components are coordinated by the controller unit and integrated into the mouse home-cage with a tray for bedding change. (B), HABITS installed on standard mouse cage rack. (C), Mouse, living in home-cage with food, bedding, nesting material (cotton), and enrichment (tube), is performing task on the weight platform. (D), System architecture for high-throughput behavioral training, showing different tasks are running in parallel groups of HABITS, which further wirelessly connect to one single PC through Wi-Fi to stream real-time data to the graphic user interface (GUI).

To perform behavioral training and testing in HABITS, we constructed a group of training-related components embedded in the front panel of the home-cage (Figure 1A). Firstly, three stimulus modules (LEDs and buzzers) for light and sound presenting were protruded from the front panel and placed in the left, right, and top positions around the weighting platform, enabling generation of visual and auditory stimulus modalities in three different spatial locations. The mouse reported the decisions about the stimuli by licking either left, right, or middle lickports installed in the front of the weighting platform. Finally, peristaltic pumps draw water from water tanks into lickports, serving as the reward for the task, which was the sole water source for the mouse throughout the period living in the home-cage. In the most common scenario, mice living in home-cage stepped on the weighting platform and triggered trials by licking on the lickports to obtain water (Figure 1B, Video 1).

Video 1

Download asset

posterframe for video — Free-moving mouse performing task in HABITS.

To endow autonomy to HABITS, a microcontroller was used to interface with all training-related components and coordinate the training procedure (Figure 1—figure supplement 1A). We implemented a microcontroller-based general programming framework to run finite state machine with millisecond-level precision (see Materials and methods). By using the APIs provided by the framework, we can easily construct arbitrarily complex behavioral paradigms and deploy them into HABITS (Figure 1—figure supplement 1B). Meanwhile, the paradigms were usually divided into small steps from easy to hard and advanced according to the performance of the mouse. Another important role of the microcontroller was to connect the HABITS to PC wirelessly and stream real-time behavioral data via a Wi-Fi module. A graphic user interface (GUI), designed to remotely monitor each individual mouse’s performance, displayed history performance, real-time trial outcomes, body weights, etc. (Figure 1—figure supplement 1C). Meanwhile, the program running in the microcontroller could be updated remotely by the GUI when changing mouse and/or task. As a backup, information of training setup, task parameters, and all behavioral data were also stored in an SD card for offline analysis.

To increase the throughput of behavioral testing, we built more than a hundred independent HABITS and installed them on standard mouse racks (Figure 1—figure supplement 1D). The total material cost for each HABITS was less than $100 (Supplementary file 1). All HABITS, operating different behavioral tasks across different cohorts of mice, were organized into groups according to the tasks and wirelessly connected to a single PC (Figure 1D). The states of each individual HABITS can be accessed and controlled remotely by monitoring the corresponding panels in the GUI, thereby significantly improving the experimental efficiency. We developed a workflow to run behavioral training and testing experiments in HABITS (Figure 1—figure supplement 1E). Firstly, initiate the HABITS system by preparing the home-cage, deploying training protocols, and weighing the naive mouse that did not need to undergo any water restriction. Then, the mouse interacted with fully autonomous HABITS at their own pace without any human intervention. In this study, mice housed in HABITS went through 24/7 behavioral training and testing for up to 3 months (Video 2), although longer duration was supported. Finally, data were harvested from the SD card and offline analyzed.

Video 2

Download asset

Therefore, HABITS permitted high-throughput, parallel behavioral training and testing in a fully autonomous way, which, contrasting with manual training, allows for possible behavioral innovation and underlining neural mechanism investigation.

HABITS performance probed by multimodal d2AFC task

We next deployed a well-established mouse behavioral paradigm, delayed two-alternative forced choice (d2AFC), which was used to study motor planning in mouse Guo et al., 2014b; Inagaki et al., 2018, in HABITS to demonstrate the performance of our system (Figure 2).

Figure 2 with 3 supplements see all

Download asset Open asset

HABITS performance in d2AFC task.

(A) Task structure for d2AFC based on sound frequency. (B) Example licks for correct (blue), error (red), and early lick (gray) trials. Choice is the first lick after response onset. (C) Correct rate (black line) and early lick rate (gray line) of an example mouse during training in HABITS for the first 13 days. Shaded blocks indicate trials occurred in the dark cycle. Trials with early lick inhabitation only occur after the blue vertical line. Red vertical dash lines represent delay duration advancement from 0.2 s to 1.2 s. (D) Averaged correct rate (left) and early lick rate (right) for all mice trained in d2AFC. Criterion level (75%) and chance level (50%) are labeled as gray and red dash lines, respectively. (E) Same as (D) but for manual training (1~3 hr/day in home-cage). (F) Averaged correct rate (left), early lick rate (middle), and no response rate (right) of expert mice trained with the two protocols. (G) Averaged number of trials (left) and days (right) to reach the criterion performance for the two training protocols. Circles, individual mice. Error bar, mean, and 95% CI across mice. (H) Left, number of trials performed per day throughout the training schedule for three different protocols. Error bar indicates the mean and 95% confidence interval (CI) across mice. Middle, volume of water harvested per day. Right, Relative body weights of mice in days 0, 8, 16, 26. Bold line and shades indicate mean and 95% CI across mice. (I) Behavioral performance of all mice training in d2AFC task based on sound orientation (left), light orientation (middle), and light color (right). (J) Box plot of average number of trials (left) and days (right) to reach the criterion performance for d2AFC tasks with different sensory modalities. (K), Left, percentage of trials performed as a function of time in a day for the four modalities trained autonomously (thick black shows the average). Shaded area indicates the dark cycle. Top right, averaged correct rate of grouped mice in dark cycle versus light cycle. Error bars show 95% CI across mice. Bottom right, box plot of the averaged proportion of trials performed in dark cycle for the four modalities. Data collected from expert mice. (L) Left, percentage of trials in blocks with varying number of consecutive trials for automated training in home-cage. Right, correct rate and early lick rate as functions of trial block size. Gray dash line, the criterion performance; Red dash line, chance performance level. Data collected from trials of expert mice. For significance levels not mentioned in all figures, n.s., not significant, p>0.05; *, p<0.05; **, p<0.01 (two-sided Wilcoxon rank-sum tests).

Mice, living in HABITS all the time, licked any of the lickports to trigger a trial block at any time. In the d2AFC task with sound frequency modality (Figure 2A), mice needed to discriminate two tone frequencies presented at the beginning of the trial for a fixed duration (sample epoch, 1.2 s) and responded to the choice by licking left (low tone) or right (high tone) lickports following a brief ‘go’ cue after a fixed delay duration (1.2 s). Licks during the delay epoch were prohibited, and unwished licks (early licks) will pause the trial for a while. Correct choices were rewarded, while incorrect choices resulted in noise and timeout. If mice did not lick any of the lickports after the ‘go’ cue for a fixed period (i.e. no-response trial), the trial block was terminated. The next trial block was immediately entering the state of to be triggered. Mice can only learn the stimulus-action contingency by trial-and-error. To promote learning, we designed an algorithm comprised of many subprotocols to shape the behavior step by step (Figure 2—figure supplement 1A). Figure 2B illustrated example licks during correct, error, and early lick trials for the task. As training progressed, the correct rate increased and early lick rate declined gradually for the example mouse within the first 2 week (Figure 2C). All the 10 mice enrolled in this task effectively learned the task and suppressed licking during the delay period within 15 days (Figure 2D), achieving an average of 980±25 (mean ± SEM) trials per day.

We also tested another training scheme that is limited duration per day in HABITS to simulate the situation of manual training with human supervision. These mice were water-restricted and manually transferred from traditional home-cage to HABITS daily for 1–3 hr training sessions. The duration of sessions was determined by the speed of harvesting water as we controlled the daily available volumes of water to approximately 1 ml (Guo et al., 2014a). All six mice learned the task as the autonomous counterpart (Figure 2) with similar final correct and early lick rate (except that the no-response rate was significantly lower for manual training; Figure 2F). Logistic regression of the mice’s choice revealed similar behavioral strategies were utilized throughout the training for both groups (Figure 2—figure supplement 1B-D). Autonomous training needed significantly more trials than manual training (10,164 ± 1062 vs. 6845 ± 782) to reach the criterion performance; however, the number of days was slightly less due to the high trial number per day (Figure 2G). As shown in Figure 2H, autonomous training yielded significantly higher number of trial/day (980 ± 25 vs. 611 ± 26, Figure 2H left) and more volume of water consumption/day (1.65 ± 0.06 vs. 0.97 ± 0.03 ml, Figure 2H middle), which resulted in monotonic increase of body weight that was even comparable to the free water group (Figure 2H right). In contrast, the body weight in the manual training group experienced a sharp drop at the beginning of training and was constantly lower than the autonomous group throughout the training stage (Figure 2H right). As the training advanced, the number of trials triggered by mice per day decreased gradually for both groups of mice, but the water consumption per day kept relatively stable. At the end of manual training, we transferred all mice to autonomous testing and found that the number of trial and consumption water per day dramatically increased to the level of the autonomous training, suggesting mice actually needed more water throughout the day (Figure 2—figure supplement 1E). These results indicated that autonomous training achieved similar training performance as manual training and maintained a more healthy state of mice.

Three more cohorts of mice were used in d2AFC tasks with different modalities, which included sound orientation (left vs. right sound), light orientation (left vs. right light), and light color (red vs. blue). Using the same training protocol as in the sound frequency modality, we trained 10, 11, and 8 mice on these tasks, respectively (Figure 2I). Mice required different amounts of trials or days to discriminate these modalities, with light color discrimination being the most challenging (an average of 82,932±6817 trials), consistent with the limited sensitivity to light wavelength of mice (Figure 2J). We also tried other modalities, like blue vs. green and flashed blue vs. green, but all failed (see Table 1). The learning rate between sound and light orientation discrimination tasks was similar (p=0.439, two-sided Wilcoxon rank-sum test), but the variation for light orientation was large, indicating possible individual differences (Figure 2J). All modalities maintained good health state indicated by the body weight for up to 2 months (Figure 2—figure supplement 1F). Another two types of 2AFC, reaction time (Munoz and Everling, 2004) (RT-2AFC, Figure 2—figure supplement 2) and random foraging (Figure 2—figure supplement 3) task, were also successfully tested in HABITS. Notably, the dynamic foraging task (Hattori et al., 2023; Hattori et al., 2019), which heavily relies on historical information, was first demonstrated in a fully autonomous training scheme for freely moving mice with similar block size and performance.

Table 1

All tasks training in HABITS.

Protocol name (abbreviation)	Modality	Animals trained(trained / used)	Note
delayed 2-Alternative Forced Choice (d2AFC)	Sound frequency (3 k vs. 10 kHz)	11/11	Figure 2
	Sound orientation (Left vs. Right)	10/11
	Light orientation (Left vs. Right)	10/11
	Light color (Blue VS. Red)	8/11
	Light color (Green VS. Blue)	0/10	Mice are insensitive to light colors.
	Light color (flashed Green VS. Blue)	0/10	Mice are insensitive to light colors.
Reaction time 2AFC (RT-2AFC)	Sound frequency (3 k vs. 12 kHz)	6/6	Figure 2—figure supplement 2
Contingency reversal	RT-2AFC, sound frequency (3 k vs. 12 kHz)	8/8	Figure 3A
Continuous learning	Sound freq. (3 k vs. 12 kHz), reversal sound freq., sound orient. (Left vs. Right), reversal sound orient., light orient. (Left vs. Right)	10/30	Figure 4A; 20 mice did not learn light oriental modal within 90 days.
Evidence accumulation with spatial cue	Poisson distributed clicks with spatial diff. (Left vs. Right)	8/10	Figure 3C
Multimodal Integration	Poisson distributed clicks and flashes (4 vs 20 events/s)	13/15	Figure 3D
Multimodal Integration	Poisson distributed light flashes (4 vs 20 events/s)	3/15	12/15 mice failed to discriminate light flash
Confidence probing task	Sound frequency (8 k vs. 32 kHz)	5/6	Figure 3E
Confidence probing task	Sound frequency (8 k vs. 32 kHz) and Poisson distributed clicks with spatial diff. (Left vs. Right)	0/12	Delay period up to 8 sec, failed
Value-based dynamic foraging task	No sensory cues (Block size from 500 to 100)	6/6	Figure 2—figure supplement 3
Working memory task	Temporal regular clicks with 5 alternative rates (8, 16, 32, 64, 128 Hz)	5/6	Figure 3B
Working memory task	Temporal regular clicks with 3 alternative rates (8, 32, 128 Hz)	8/8	Figure 3—figure supplement 1 C
double Delayed Match Sample (dDMS)	Sound frequency (3 k & 12 kHz)	10/10	Figure 4B; Sample and test period: 500ms
double Delayed Match Sample (dDMS)	Sound frequency (3 k & 12 kHz)	0/10	Random sample and test period: 100 or 1000ms
delayed 3-Alternative Forced Choice (d3AFC)	Sound frequency (8 k vs. 16 k vs. 32 kHz)	14/14	Figure 4C
	Sound frequency reversal (8 k vs. 16 k vs. 32 kHz)	4/4	Figure 3—figure supplement 1B
	Sound orientation (Left vs. Middle vs. Right)	6/6	Figure 3—figure supplement 1A
Context-dependent attention task	Sound frequency (3 k vs. 12 kHz) or sound orientation (Left vs. Right); regular click rates (16 vs 64 Hz) as context cue	6/6	Figure 4D
	Sound orientation (Left vs. Right) or light orientation (Left vs. Right); regular click rates (16 vs 64 Hz) as context cue	0/10	Mice failed in light modality
	Sound orientation (Left vs. Right) or light orientation (Left vs. Right); Sound frequency (3 k vs. 12 kHz) as context cue	0/20	Mice failed in light modality
Machine teaching algorithm	Working memory task, Temporal regular clicks with 5 alternative rates and full stimulus matrix (8, 16, 32, 64, 128 Hz)	7/7 (MT) 5/8(Random)	Figure 5B
	RT-2AFC, Sound frequency (3 k vs. 12 kHz)	10/10 (MT) 10/10 (random) 10/10 (antibias)	Figure 5F; Same group of mice used in continuous learning
	2AFC with Sound frequency (3 k vs. 12 kHz), sound orientation (Left vs. Right) and sound orientation reversal, respectively	10/10 (MT) 8/8 (random)	Figure 6
Total	N/A	200/284	N/A

Two important behavioral characteristics were revealed across all the modalities in 2AFC task. Firstly, as our high-throughput behavioral training platform operated on a 12:12 light-dark cycle, the long-term circadian rhythm of mice can be evaluated based on the number of triggered trials and performance during both cycles. We found all mice exhibited clear nocturnal behavior with peaks in trial proportion at the beginning and end of the dark period (Figure 2K), which was consistent with previous studies (Francis et al., 2019; Hao et al., 2021). The light color modality exhibited a slightly lower percentage of trials during dark (57.89% ± 3.18% vs. above 66% for other modalities; Figure 2K), possibly the light stimulus during trials affected the circadian rhythms of the mice. It was also observed that all mice except the light color modality showed no significant differences in correct rate between light and dark cycle after they learned the task (Figure 2K). The higher performance in a dark environment for light color modality implied that light stimulus presented in a dark environment was with higher contrast and thus better discernibility. Secondly, as we organized the trials into blocks, the training temporal dynamic at trial-level could be examined. We found more than two-thirds of the trials were done in >5-trial blocks (Figure 2L left), which resulted in more than 55% of the trials being with inter-trial intervals less than 2 s (Figure 2—figure supplement 1H). The averaged block duration was 27.64±1.73 s and mice triggered another block of trials within 60 s in more than 60% of cases. Meanwhile, we also found that the averaged correct rate increased and the early lick rate decreased as the length of the block increased (Figure 2L right), which suggested that mice were more engaged in the task during longer blocks.

These results showed that mice can learn and perform cognitive tasks in HABITS with various modalities in a fully autonomous way. During the training process, mice maintained good health conditions, although without any human intervention. Due to the high-efficiency training with less labor and time, it gave us an opportunity to explore and study more widespread cognitive behavioral tasks in mice.

Various cognitive tasks demonstrated in HABITS

We next tested several representative tasks that were commonly used in the field of cognitive neuroscience to demonstrate the universality of HABITS. It is worth noting that many new features of the behavior could be explored due to the autonomy and advantages of HABITS, in terms of either quantity or quality (Figure 3).

Figure 3 with 1 supplement see all

Download asset Open asset

Representative cognitive task performed in HABITS.

(A) Contingency reversal task. (A1) Task structure. (A2) Correct rate of example mice with different learning rates. Gray vertical lines indicate contingency reversal. (A3) Relative number of trials to reach the criterion as a function of reverse times. Gray lines, individual mice. Black lines, linear fit. (A4) Number of trials in the first reversal learning versus the average number of trials of the rest of contingency reversal learning for each mouse (each dot). Black line, linear regression. Red dash line, diagonal line. (B) Working memory task with sound frequency modality. (B1) Task structure. (B2) Stimulus generation matrix (SGM) for left (orange) and right (green) trials. (B3) Left, averaged correct rate for each stimulus combination tested. Right, averaged correct rate for each (S1 +S2) stimulus combination across mice. Black line and shade, linear regression and 95% CI. (B4) Averaged psychometric curves, that is percentage of right choice as a function of frequency difference between sample 1 and sample 2. (B5) Averaged correct rate as a function of delay duration. (C) Evidence accumulation with spatial cue task. (C1) Task structure. (C2) Averaged psychometric curves, that is performance as a function of the difference between right and left clicks rates. (C3) Averaged correct rate across all mice as a function of sample duration for different Poisson rates (different colors). Error bar represents 95% CI. (D) Multimodal integration task. (D1) Task structure. (D2) Averaged correct rate across all mice as a function of sample duration for different stimulus modalities (different colors). (D3) Averaged event rates during sample period for left (red) and right (blue) choice trials. (D4) Averaged weights (black line) of logistic regression fitting to the choice of trials across expert mice tested in >1000 trials (N=11 mice) from the first bin (40ms) to the last bin (1000ms) of the sample period. A gray dash line represents the null hypothesis. Gray dots indicate significance, p<0.05, two-sided t-tests. (D5), Psychometric curve for trials with multimodal stimulus. (E) Confidence probing task. (E1) Task structure. (E2) Psychometric curve, that is right choice rate as a function of relative contrast (log scaled relative frequency). (E3) Histogram of time invested (TI) for both correct and error trials. (E4) Averaged correct rate across all mice as a function of TI. (E5) Averaged TI as a function of absolute relative contrast for both correct and error trials. Cycles, individual mice; *, p<0.05; **, p<0.01, two-sided Wilcoxon rank-sum tests.

Contingency reversal task was a cognition-demanding task and previously used to investigate cognitive flexibility (Izquierdo et al., 2017; Figure 3A). In the task, the contingency for reward switched without any cues once mice hit the criteria performance (Figure 3A1). Mice can dynamically reverse their stimulus-action contingency, though with different learning rates across individuals (Figure 3A2). Given the advantages of long-term autonomous training in HABITS, all mice underwent contingency reversal for an average of 52.25±16.39 times, and one mouse achieved up to 125 times in 113 days. Most of the mice (6/8) gradually decreased the number of trials to reach criterion across multiple contingency reversals representing an effect of learning to learn (Hattori et al., 2023; Akrami et al., 2018; Figure 3A3). Meanwhile, the average number of trials to reach criterion during the reversal was highly correlated with the trial number in the first reversal learning which represented the initial cognitive ability or the sensory sensitivity of individual mice (Figure 3A4).

Working memory was another important cognition that was vastly investigated using rodent model with auditory and somatosensory modalities (Akrami et al., 2018; Fassihi et al., 2014). Here, we utilized a novel modality, regular sound clicks, to implement a self-initiated working memory task, which required mice to compare two click rates separated by a random delay period (Figure 3B1). We initially validated that mice did employ a comparison strategy in a 3×3 stimulus generation matrix (SGM), instead of just taking the first cue as a context (Figure 3—figure supplement 1A). Subsequently, we expanded the paradigm’s perception dynamics to 5×5 and reduced the relative perceptual contrast between two neighboring stimuli to one octave (Figure 3B2). HABITS enabled investigation of detailed behavioral parameters swiftly. We noticed that the discrimination ability for mice was significantly better in the higher frequency range (Figure 3B3), which may be caused by the different sensitivity of mice across the spectrum of regular click rate. During the testing stage, we varied the contrast of the preceding stimulus while maintaining the succeeding one; the psychometric curves affirmed mice’s decision-making based on perceptual comparison of two stimuli and validated their perceptual and memory capacities (Figure 3B4). Meanwhile, mice could maintain stable working memory during up to 1.8 s delay, demonstrating mice can perform this task robustly (Figure 3B5).

Evidence accumulation introduced more dynamics to the decision-making within individual trials and was widely utilized (Brunton et al., 2013; Erlich et al., 2015; Hanks et al., 2015). In the task implemented in HABITS, mice needed to compare the rate of sound clicks that randomly appeared over two sides during the sample epoch and made decisions following a ‘go’ cue after a brief delay (Figure 3C1). We successfully trained 8/10 mice to complete this task as revealed by the psychometric curves (Figure 3C2). After mice learned the task, we systematically tested the effect of evidence accumulation versus the task performance. All mice exhibited consistent behavioral patterns, which were correlated with the evidence of stimulus, that is longer sample period and/or higher stimulus contrast led to higher performance (Figure 3C3). It showed an evident positive correlation between evidence accumulation and task performance.

Multimodal integration Meijer et al., 2018; Odoemene et al., 2018; Raposo et al., 2012 was also tested in HABITS with sound clicks and light flashes as dual-modality events in the evidence accumulation framework. Mice were required to differentiate whether event rates were larger or smaller than 12 event/s (Figure 3D1). We successfully trained 13/15 mice in this paradigm with multimodal or sound clicks stimulus (Figure 3D2), and only three mice achieved performance criteria in trials with light flashes stimulus. Since the events were presented non-uniformly within each trial, we wondered about the dynamics of the decision-making process along the trial. Firstly, we divided all trials with multimodal stimulus into two groups according to the choice of expert mice. The uniform distribution of events within a trial indicated that mice considered the whole sample period to make a decision (Figure 3D3). Secondly, we used a logistic regression model to illustrate the mice’s dependency on perceptual decisions throughout the entire sample period. We found that mice indeed tended to favor earlier stimuli (i.e. higher weights) in making their final choices (Figure 3D4), consistent with previous research findings (Odoemene et al., 2018). Lastly, we further tested modulated unimodal stimuli with different sample periods in the testing stage, in which accuracy correlated positively with sample period and conditioned test for different combinations of modalities demonstrated evidence accumulation and multimodal integration effect, respectively. (Figure 3D5).

Confidence was another important cognition along with the process of decision-making, which was investigated in rat (Kepecs et al., 2008; Masset et al., 2020) and more recently in mice (Schmack et al., 2021) model previously. We introduced a confidence-probing task in HABITS (Figure 3E), in which mice needed to lick twice the correct side for acquiring a reward; the two licks were separated by a random delay during which licking at other lickports prematurely ended the trial (Figure 3E1). This unique design connects mice’s confidence about the choice, which was hidden, with the time investment (TI) of mice between two licks, which was an explicit and quantitative metric. We successfully trained 5/6 mice (Figure 3E2), and most importantly, there was a noticeable difference of TI in correct versus incorrect trials (Figure 3E3). In detail, trials with longer TIs tended to have higher accuracies (Figure 3E4). Furthermore, the TI was also modulated by the contrasts of stimuli; as contrast decreased, mice exhibited reduced confidence about their choices, manifesting as decreased willingness to wait in correct trials, and conversely in error trials (Figure 3E5).

In summary, mice could undergo stable and effective long-term training in HABITS with various cognitive tasks commonly used in state-of-the-art neuroscience. These tasks running in HABITS were demonstrated to exhibit similar behavioral characteristics to previous studies. In addition, some new aspects of the behavior could be systematically tested in HABITS due to its key advantage of autonomy. This high level of versatility, combined with the ability to support arbitrary paradigm designs, suggests that more specialized behavioral paradigms could potentially benefit from HABITS to enhance experimental novelty.

Innovating mouse behaviors in HABITS

One of the main goals of HABITS was to expand mouse behavioral reservoir by developing complex and innovative paradigms that had previously proven challenging or even impossible for mice. These paradigms imposed higher cognitive abilities demands, which required an extensively long period to test in a mouse model. HABITS enabled unsupervised, efficient, and standardized training of these challenging paradigms at scale, and thus was suitable for behavioral innovations.

Firstly, leveraging the autonomy of HABITS, we tested mouse’s ability to successively learn up to 5 tasks one after another without any cues (Figure 4A). These tasks included 2AFC based on sound frequency, sound frequency reversal, sound orientation (pro), sound orientation (anti), and light orientation (Figure 4A1). Firstly, the results showed that mice could quickly switch from one task to another and the learning rates across these sub-tasks roughly followed the learning difficulty of modalities (Figure 4A2). Specifically, reversal of sound frequency was cognitively different from reversal of sound orientation (i.e., from pro to anti), which resulted in significantly longer learning duration (Figure 4A2). Secondly, mice dealt with new tasks with higher reaction time and gradually decreased as training progressed (Figure 4A3). It implied a uniform strategy mice applied: mice chose to respond more slowly in order to learn quickly (Masís et al., 2023). Lastly, mice exhibited large bias at the beginning of each task in all tasks, including tasks without reversals (Figure 4). This means that mice acquired reward only from one lickport in the early training and switched strategy to follow current stimulus gradually, which implied a changing strategy from exploration to exploitation.

Figure 4

Download asset Open asset

Challenging mouse tasks innovated in HABITS.

(A) Continuous learning task. (A1) Task structure showing mice learning five subtasks one by one. (A2) Left, averaged correct rate of all mice performing the five tasks (different colors) continually. All task schedules are normalized to their maximum number of trials and divided into 10 stages equally. Right, box plot of number of trials to criteria for each task. (A3) Left, averaged reaction time of all mice performing the five tasks continually. Right, averaged median reaction time across the five tasks during early (perf. <0.55), middle (perf. <0.75), and trained (perf. >0.75) stage. Error bar indicates 95% CI. (A4) Same as (A3) but for absolute performance bias. n.s., p>0.05; **, p<0.01, two-sided Wilcoxon signed-rank tests. (B) Double delayed match sample task (dDMS) with sound frequency modality. (B1) Task structure. (B2) Averaged correct rate across all mice during training (left) and averaged number of days to reach the criterion (right). (B3) Averaged early lick rate across all mice. (B4) Averaged correct rate (black) and early lick rate (gray) for all combination of sample and test stimulus. (B5) Heatmap of error rate (left) and early lick rate (right) varies with different combination of delay1 and delay2 durations. (C) Delayed 3 alternative forced choice (d3AFC). (C1) Task structure. (C2) Averaged correct rate across all mice during training (left, colors indicate trial types) and averaged number of days to reach the criterion performance (right). (C3) Averaged correct rate (colors indicate trial types) and early lick rate (gray) for different trial types. (C4) Averaged error rate of choices conditioning trial types. In each subplot, the position of bars corresponds to different choices. ****, p<0.0001, n.s., p>0.05, two-sided t-tests. (C5) Averaged choice rates for the three lickports (colors) as a function of sample frequency. Data collected from trained mice. (D) Context-dependent attention task. (D1) Task structure. (D2) Averaged correct rate across all mice during training (left, data only from trials with multimodal w/ conflict) and averaged number of days to reach the criterion (right). (D3) Correct rate (left) and reaction time (right) conditioning modalities. (D4) Averaged psychometric curve and partitioned linear regression for the multimodal with and without conflict conditions, respectively. (D5) Performance bias to sound orientation modal as a function of pre-cue contrast, for the two multimodal conditions. (D6) Averaged correct rate as a function of delay duration.

The delayed match sample task was quite challenging for mice, and only olfactory and tactile modalities were implemented previously Condylis et al., 2020; Liu et al., 2014; Taxidis et al., 2020. Recently, auditory modality was introduced but only in a go/no-go paradigm Yu, 2021. We next constructed a novel double delayed match sample task (dDMS) task (Figure 4B), which required mice to keep working memory of first sound frequency (low or high) during the first delay, match to the second sound based on XOR rules, make a motor planning during the second delay, and finally make a 2AFC choice (Figure 4B1). All the 10 mice achieved the performance criteria during the automated training process, though, an averaged 64.45±7.88 days was required, which was equivalent to more than 120,000 trials (Figure 4B2-3). After training, the four trial types (i.e. four combinations of frequencies) achieved equally well performance (Figure 4B4). During the testing stage, we systematically randomized the duration of the two delays (ranging from 1 to 3 s) and revealed increased error rates and early lick rates as the delay increased (Figure 4B5). Challenging tasks that demanded months of training were well suited for HABITS; otherwise, they were difficult or even impossible for manual training.

Subsequently, we attempted to expand the choice repertoire of d2AFC into 3-alternative forced choice (d3AFC), utilizing the three lickports installed in HABITS (Figure 4C). Previous studies have implemented multiple choice tasks, but only based on spatial modalities Asinof and Paine, 2014; Birtalan et al., 2020; Piantadosi et al., 2019. In our system, mice learned to discriminate low (8 kHz), medium (16 kHz), and high (32 kHz) sound frequencies and lick respectively left, middle, and right lickport to get reward (Figure 4C1). Mice needed to construct two psychological thresholds to conduct correct choices. We successfully trained all the 14 mice to perform the tasks in 13.85±1.05 days, with similar performance among the left, middle, and right trial types (Figure 4C2). The final correct rate and early lick rate made no differences for the three trial types (Figure 4C3). Interestingly, mice made error choices more in the most proximity lickport; for the middle trials, mice made error choices equally in the left and right sides (Figure 4C4). In addition, we tested the whole spectrum of sound frequencies between 8 and 32 kHz and found that mice presented two evident psychological thresholds to deal with this three-choice task (Figure 4C5). Finally, we also implemented sound orientation-based d3AFC in another separated group of mice, which actually required longer training duration (45.16±11.46 days) (Figure 3—figure supplement 1B). The d3AFC was also tested for reversal contingency paradigm, and an accelerated learning was revealed (Figure 3—figure supplement 1C), which potentially provides new insight into the cognitive flexibility Piantadosi et al., 2019.

Finally, we introduced one of the most challenging cognitive tasks in mouse model, delayed context-dependent attention task, in HABITS (Figure 4D). This task was previously implemented by light and sound orientation modalities Mukherjee et al., 2021; Wimmer et al., 2015; however, due to the difficulty, it was not well repeated broadly. In HABITS, we tailored this task into a sound-only-based but multimodal decision-making task (Figure 4D1). We constructed this task using three auditory modalities: regular clicks (16 vs 64 clicks/s) as context, sound frequency (3 k vs. 12 kHz), and sound orientation (left vs. right) as the two stimulus modalities. Mice needed to pay attention to one of the modalities, which presented simultaneously during sample epoch, according to the context cue indicated by the clicks (low click rate to sound frequency and high rate to sound orientation), and make a 2AFC decision accordingly. We successfully trained all six mice enrolled in this task, with an average of 48.09±7.54 days (Figure 4D2). To validate the paradigm’s stability and effectiveness, the direction of stimulus features was presented randomly and independently during the final testing stage. Mice exceeded criterion performance across different trial types (i.e. unimodal, multimodal w/ conflict, multimodal w/o conflict), indicating effective attention to both stimulus features (Figure 4D3). Trials with conflicting stimulus features, requiring mice to integrate context information for correct choice, exhibited reduced decision speeds and accuracy compared to trials without conflicting for all tested mice (Figure 4D3). We further systematically varied the click rate from 16 to 64 Hz to change context contrast. For trials with conflicts, mice decreased their accuracy following the decline of context contrast, formulating a flat psychometric curve. However, for trials without conflicts, mice performed as a near-optimal learner (Figure 4D4). Meanwhile, as the contrast decreased, mice tended to bias to orientation feature against frequency in conflicting trials, but not for the trials without conflicts (Figure 4D5). All these results represented mice dealt with different conditioned trials by a dynamic decision strategy synthesizing context-dependent, multimodal integration, and perceptual bias. Lastly, the performance of both trial types declined with increased delay duration but maintained criterion above up to 2 s delay (Figure 4D6), confirming mice could execute this paradigm robustly in our system.

As a summary, by introducing changes in trial structure, cognition demands, and perceptual modalities, we extended mice behavior patterns in HABITS. These behaviors were usually challenging and very difficult to test previously with manual training. Thus, the training workflow of our system potentially allows for large-scale and efficient validation and iteration of innovative paradigms aimed to explore unanswered cognitive questions with mouse models.

Machine-learning-aided behavioral optimization in HABITS

Mice can be trained to learn challenging tasks in a fully autonomous way in HABITS; however, whether the training efficiency is optimal was unknown. We hypothesized that an optimal train sequence generated by integrating all histories could enhance training procedure, compared with commonly used random or anti-bias strategies. Benefited from recent advances in machine teaching (MT) Zhu et al., 2018, and inspired by previous simulations in optimal animal behavior training experiments Bak et al., 2016, we developed a MT-based automated training sequence generation framework in HABITS to further improve the training qualities.

Figure 5A illustrated the architecture of the MT-based training framework. Initially, mice made an action corresponding to the stimuli presented in current trial t in HABITS; subsequently, an online logistic regression model was constructed to fit the mice’s history choices by weighted sum of multiple features including current stimulus, history, and rules. This model was deemed as the surrogate of the mouse and was used in the following steps; finally, sampling was performed across the entire trial type repertoire and the fitted model predicted positions of potential future trials in the latent weight space; the trial type with closest position to the goal was selected as the next trials. This entire process forms a closed-loop behavioral training framework, ensuring that the mice’s training direction continually progresses towards the goal.

Figure 5 with 1 supplement see all

Download asset Open asset

MT enabled faster learning with higher quality.

(A) The framework of machine teaching (MT) algorithm (see text for details). (B) Working memory task as in Figure 4A, but with full stimulus generation matrix. (C) Averaged number of trials needed to reach the criterion for MT-based and random trial type selection strategies. **, p<0.01, two-sided Wilcoxon rank-sum test. (D) The absolute difference between contrast (contr.) of sample1 (S1) and sample2 (S2) during training process for the two strategies. (E) Same as (D) but for correct rate. (F) MT-based d2AFC task training. Box plot of correct rate of expert mice (left) and number of trials needed to reach the criterion (right) for different training strategies (MT, anti-bias, and random). n.s., p>0.05, Kruskal–Wallis tests. (G) Left, averaged absolute performance bias for the three strategies during different training stages. Right, averaged across training stages. (H) Same as (G) but for absolute trial type bias. (I) Percentage of trials showing significance for different regressors during task learning. (**J–K**) Box plot of correct rate (J) and prediction performance difference between the full model and partial model excluding current stimulus (S0) (K) for different trained stage, including early (perf. >75%), middle (perf. >80%), and well (perf. >85%) trained. *, p<0.05, **, p<0.01, ***, p<0.001, n.s., p>0.05, two-sided Wilcoxon rank-sum tests with Bonferroni correction.

We first validated the theoretical feasibility and efficiency of the algorithms in simulated 2AFC experiments (Figure 5—figure supplement 1). Faster increasement in sensitivity to current stimuli was observed through effective suppression of noise-like biases and history dependence with the MT algorithm (Figure 5—figure supplement 1A–B). Notably, if the learner was ideal (i.e. without any noise), there was no difference between random and MT strategies to train (Figure 5—figure supplement 1C). This implied that the training efficiency was improved by suppression of noise in MT.

To demonstrate MT in real animal training, we initially tested a working memory task similar to Figure 3B, but with a fully stimulus generation matrix. As shown in Figure 5B, this task utilized a complete set of twenty trial types (colored dots), categorized into four levels of difficulty (dot sizes) according to the distance from decision boundary. Trial type selection using a MT algorithm against a baseline of random selection was tested in two separate groups. Mice trained with the MT algorithm achieved criteria performance with significantly fewer trials compared with the random group (Figure 5C); three out of the eight mice in the random group even did not reach the criteria performance at the end of training (60 days). We then asked what kind of strategy the algorithm used that supported an accelerated learning. Analysis of the trial type across learning revealed that the MT-based training presented easier trials first, then gradually increased the difficulty, that is exhibiting a clear curriculum learning trajectory (Figure 5D). However, this did not mean that the MT only presented easy trials at the beginning; hard trials were occasionally selected when the model deemed that a hard trial could facilitate the learning. This strategy enabled mice to maintain consistently higher performance than random groups throughout the training process (Figure 5E). These results suggested that MT-based method enabled more efficient training for specific challenging tasks.

To further validate the effectiveness of MT in more generalized perceptual decision-making tasks, we trained three groups of mice using random, antibias, and MT strategy, respectively, in sound-frequency-based 2AFC task. Due to the fact that this task was relatively simple, all three groups of mice achieved successful training, with comparable efficiency and final performance (Figure 5F). But interestingly, the MT algorithm effectively reduced mice’s preference towards a specific lickport (i.e. bias; Figure 5G) throughout the training process by generating trial types with opposite bias more aggressively (Figure 5H). Using a model-based methodology, we demonstrated that while the MT algorithm minimized bias dependency, it did not increase, and even decreased, mice’s reliance on other noise variables, like previous action A₁, reward R₁ and stimulus S₁ (Figure 5I). Notably, we noticed that all trained mice demonstrated similar low bias (Figure 5G), while only the MT algorithm still exhibited relatively high anti-bias strategy during the trained stage (Figure 5H). This suggests that the MT algorithm might keep regulating cognitive processes actively even in expert mice. To verify this, we segmented the trained trials into early, middle, and well-trained stages based on performance level and showed that all three groups of mice had similar overall accuracies across stages (Figure 5J). However, when we examined the reliance on the current stimulus S₀, that is to what extent the decision was made according to current stimulus, we found that the MT group had significantly higher weights for S₀ than both anti-bias and random groups (Figure 5K). This means that MT-generated sequences across all stages encouraged mice to rely only on current stimuli, rather than noise factors. These results suggested that MT-based training had higher quality for both training and trained stage.

In summary, the MT algorithm automatically chose optimal trial type sequences to enable faster and more efficient training. By modeling the effect of history, choice, and other noise behavioral variables, the MT method manifested higher quality training results. Based on these characteristics, the MT algorithm could enhance training efficiency in challenging paradigms and promote testing robustness in more general paradigms.

Behavioral optimization for multi-dimensional tasks

One of the advantages of the MT algorithm remains that it considers multi-dimensional features altogether and gives the optimal trial sequences to guide the subject to learn. To test this feature, we next expanded the 2AFC task to two stimulus dimensions and trained mice to learn a dynamic stimuli-action contingency. The task was similar to the one applied in Figure 4D which presented both sound frequency and orientation features but without context cues. As illustrated in Figure 6A, the mice were initially required to focus on sound orientation to obtain reward, while ignoring the frequency (Goal 1). Subsequently, the stimulus-action contingency changed, making sound frequency, rather than orientation, the relevant stimuli dimension for receiving reward (Goal 2). Finally, the relevant cue was still sound frequency but with reversed stimulus-action contingency (Goal 3). Throughout the training process, we employed the MT algorithm to adaptively generate trial types about not only the reward location (left or right) but also the components of the sound stimuli (frequency and orientation combinations) and compared with the random control group. The MT algorithm allowed for the straightforward construction of a dynamic multi-goal training just by setting the coordinates of target goals within the latent weight space (Figure 6B).

Figure 6 with 1 supplement see all

Download asset Open asset

MT manifested distinct learning path with faster forgetting and higher learning rate.

(A) Task structure. (B) Chart of training path in latent decision space following three goals one by one. (C) Top, averaged correct rate across grouped mice during training (color, machine teaching; black, random). Bottom, same as top but performance for non-relative cue. (D) Top, the slopes of linear regression between trial number and correct rate. Bottom, same as top but between trial number and performance for non-relative cue. **, p<0.01; n.s., p>0.05; two-sided Wilcoxon rank-sum tests. (E) The learning path of mice (lines) in latent decision space for machine teaching and random training strategies. Light dots represent model weights fitted by individual mice’s behavioral data. Shaded dots, averaged across mice. (Square dots, testing protocol; Cross dots, the first or the last half of trials in learning protocol; Cycle dots, all trials in learning protocol) (F) Left, averaged absolute trial type bias between stay and switch conditions across grouped mice for the MT and random strategies from L1 to L3. Right, same as middle but for the bias between left and right trials. (G) Same as (H) but for absolute performance bias in T1 and T2 protocols. L1, the first 500 trials of frequency learning protocol; L2, intermediate trials of frequency learning protocol; L3, the last 200 trials of frequency learning protocol; T1, testing orientation protocol; T2, testing frequency protocol. *, p<0.05; n.s., p>0.05; two-sided t-tests.

Both groups of mice successfully completed all the goals, achieving an overall correct rate of over 80% (Figure 6C). During the first and the third goal of training, the learning rates of mice in both groups were similar, showing low sensitivity to irrelevant cue. However, in the second goal of training (i.e. transition from orientation to frequency modality), the MT group exhibited a significantly higher learning rate compared to the random group, along with a significantly faster rate of forgetting of the irrelevant cue (Figure 6D and Figure 6—figure supplement 1A–D). To construct a paired comparison, we also retrained the MT group mice in the same protocols but with random sequences after forgetting and confirmed the improvements for both the learning rate and training efficiency (Figure 6—figure supplement 1F–G). We again employed the logistic regression model to extract the weights for each variable and plotted the learning trajectories in the latent weight space. MT algorithm manifested distinct learning path against random group in the space (Figure 6E); MT algorithm quickly suppressed the sensitivity of irrelevant modality (i.e. W_orient.), keeping low sensitivity to noise dimensions (i.e. W_noise) in the meantime. This resulted in a circuitous learning path in the space compared to the random group.

We then asked how MT achieved this learning strategy and what the benefit is. Compared to the random sequence, the MT algorithm effectively suppressed the mice’s reliance on irrelevant strategies by dynamically adjusting the ratio of stay/switch trials and left/right trial types (Figure 6F and Figure 6—figure supplement 1E). After training, we employed the same random trial sequence to test the performance of both groups. Notably, those mice trained with the MT algorithm exhibited significantly lower left/right bias and stay/switch bias compared to the randomly trained mice (Figure 6G and Figure 6—figure supplement 1H). This suggested that the MT algorithm enabled mice to exhibit more stable and task-aligned behavioral training, which implied an internal influence to psychological decision strategies after the MT conditioning.

Discussion

In this study, we developed a fully autonomous behavioral training system known as HABITS, which facilitates free-moving mice to engage in 24/7 self-triggered training in their home-cage without the need for water restriction. The HABITS is equipped with a versatile hardware and software framework, empowering us to swiftly deploy a spectrum of cognitive functions, including motor planning, working memory, confidence, attention, evidence accumulation, multimodal integration, etc. Leveraging the advantages of long-term and parallel running of HABITS, we explored several challenging and novel tasks, some of which introduced new modalities or had never been previously attempted in mouse models. Notably, the acquisition of several tasks spanned more than three months to learn. Benefited from the power of machine teaching algorithms, we endowed the automated training in HABITS with an optimal training sequence which further significantly improved the training efficiency and testing robustness. Furthermore, we extended the machine teaching algorithm to a more generalized form, incorporating multi-dimensional stimuli, which has resulted in diverse training trajectories in the latent space and thus elevated training qualities. Altogether, this study presents a fully autonomous platform that offers innovative and optimal mouse behaviors that could advance the field of behavioral, cognitive, and computational neuroscience.

One of the pivotal contributions of this research is the provision of an extensive behavioral dataset, derived from ~300 mice training and testing in more than 20 diverse paradigms. This comprehensive dataset consisted of the entire learning trajectory with well-documented training strategies and behavioral outcomes. This unprecedented scale of data generated in a single study is mainly attributed to the three distinct features of the HABITS system. Firstly, the hardware was engineered to fit a broad spectrum of cognitive tasks for mice, diverging from the typical focus on specific tasks in previous studies (Aoki et al., 2017; Bernhard et al., 2020; Bollu et al., 2019; Caglayan et al., 2021; Francis et al., 2019; Francis and Kanold, 2017; Hao et al., 2021; Kiryk et al., 2020; Murphy et al., 2020; Murphy et al., 2016; Poddar et al., 2013; Qiao, 2019 Qiao, 2019; Salameh et al., 2020; Silasi et al., 2018). It integrated both visual and auditory stimuli across three spatial locations, as well as up to three lickports for behavioral report, offering various combinations to explore. Secondly, the software within HABITS implemented a general-purpose state machine runner for step-by-step universal task learning and ran standalone without PC in the loop, which contrasted with previous system running on PCs (Aoki et al., 2017; Caglayan et al., 2021; Francis et al., 2019; Francis and Kanold, 2017; Poddar et al., 2013). Thirdly, the cost of the HABITS was quite low (less than $100) compared to previous systems (ranging from $250 to $1500; Francis et al., 2019; Francis and Kanold, 2017; Qiao, 2019 Qiao, 2019; Silasi et al., 2018), which facilitated large-scale deployment and thus high-throughput training and testing. Together, these unique attributes have simplified behavioral exploration, which would otherwise be a time- and labor-intensive endeavor.

Another significant contribution was the expansion of the behavioral repertoire for mice, made possible again by the autonomy of HABITS. We have introduced auditory stimuli and multiple delay epochs into the DMS paradigm (Condylis et al., 2020; Liu et al., 2014; Taxidis et al., 2020; Yu, 2021 Yu, 2021), allowing for investigation of working memory across different modality and stages within single trials. Additionally, we have advanced the traditional d2AFC task (Guo et al., 2014b; Inagaki et al., 2018) to d3AFC, enabling the study of motor planning in multi-classification scenarios. Furthermore, we have implemented delayed context-dependent tasks Mukherjee et al., 2021; Wimmer et al., 2015 based on multi-dimensional auditory stimuli, facilitating research of complex and flexible decision-making processes. Collectively, the advancement of our high-throughput platform was anticipated to improve the experimental efficiency and reproducibility, in either the creation of standardized behavioral datasets for individual paradigms or in the exploration of a multitude of complex behavioral training paradigms.

Last but not least, we have incorporated machine teaching algorithms into the mouse behavioral training process and significantly enhanced the training efficacy and quality. To our knowledge, this is the first study demonstrating the utility of machine teaching in augmenting animal behavioral training, which complements previous simulation studies Bak et al., 2016. The impact of machine teaching algorithms is threefold. First, the training duration of complex tasks was substantially reduced, primarily due to the real-time optimization of trial sequences based on mouse performance, which significantly reduces the mice’s reliance on suboptimal strategies. Second, the final training outcomes were demonstrated to be less influenced by the task-irrelevant variables. A prior study has indicated that suboptimal strategies, such as biases, are common among expert mice trained in various paradigms, potentially stemming from their exploration in real-world uncertain environments (Pisupati et al., 2021; Rosenberg et al., 2021; Wang et al., 2023; Zhu and Kuchibhotla, 2024). Machine teaching-based techniques can significantly reduce the noise dependency, thus facilitating the analysis of the relationship between behavior and neural signals. Third, the machine learning algorithm lowers the barriers for designing effective anti-bias strategies, which were challenging and prone to lopsided in multidimensional tasks. By simply setting the task goals in the fitted decision model, machine teaching can automatically guide the mouse to approach the goal optimally and robustly.

Our study was designed to standardize behavior for the precise interrogation of neural mechanisms, specifically addressing within-subject questions. However, investigators are often interested in between-subject differences—such as sex differences or genetic variants—which can have long-term behavioral and cognitive implications (Lassalle et al., 2008; Weekes, 1994). This is particularly relevant in mouse models due to their genetic tractability (Hemann et al., 2012). Although our primary focus was not on between-subject differences, the dataset we generated provides preliminary evidence for such investigations. Several behavioral readouts revealed individual variability among mice, including large disparities in learning rates across individuals (Figure 2I), differences in overall learning rates between male and female subjects (Figure 2D vs. Figure 2—figure supplement 1G), variations in nocturnal behavioral patterns (Figure 2K), etc. Furthermore, a detailed logistic regression analysis dissected the strategies mice employed during training (Figure 2—figure supplement 1B). Notably, the regression identified variables associated with individual task-performance strategies (Figure 2—figure supplement 1C), which also differed between manually and autonomously trained groups (Figure 2—figure supplement 1D). Thus, our system could facilitate high-throughput behavioral studies exploring between-subject differences in the future.

Our study marks the inaugural endeavor to innovate mouse behavior through autonomous setups, yet it comes with several limitations. Firstly, our experiments were confined to single-housed mice, which is known to influence murine behavior and physiology, potentially affecting social interaction and stress levels (Arndt et al., 2009). In our study, individual housing was necessary to ensure precise behavioral tracking, eliminate competitive interactions during task performance, and maintain consistent training schedules without disruptions from cage-mate disturbances. However, the potential of group-housed training has been explored with technologies such as RFID Kiryk et al., 2020; Murphy et al., 2020; Qiao, 2019; Salameh et al., 2020; Silasi et al., 2018 to distinguish individual mice, which potentially improves the training efficiency and facilitates research of social behaviors (Torquet et al., 2018). Notably, it has shown that simultaneous training of group-housed mice, without individual differentiation, can still achieve criterion performance (Francis et al., 2019). Secondly, we have not yet analyzed any videos or neural signals from mice trained in the home-cage environment. Recent studies have harnessed a variety of technologies and methodologies to gain a deeper understanding of natural animal behavior in home-cage environments (Grieco et al., 2021; Jhuang et al., 2010). Voluntary head fixation, employed in previous studies, has facilitated real-time brain imaging (Rich et al., 2024; Scott et al., 2013; Aoki et al., 2017; Murphy et al., 2020; Murphy et al., 2016). Future integration of commonly used tethered, wireless head-mounted (Shin et al., 2022), or fully implantable devices (Ouyang et al., 2023; Wright et al., 2022), could allow for investigation of neural activity during the whole period in home-cage. Lastly, while HABITS achieves criterion performance in a similar or even shorter overall days compared to manual training, it requires more trials to reach the same learning criterion (Figure 2G). We hypothesize that this difference in trial efficiency may stem from the constrained engagement duration imposed by the experimenter in manual training, which could compel mice to focus more intensely on task execution, resulting in less trial omissions (Figure 2F). In contrast, the self-paced nature of autonomous training may permit greater variability in attentional engagement (Smolen et al., 2016) and inter-trial intervals, which could be problematic for data analysis relying on consistent intervals and/or engagements. Future studies should explore how controlled contextual constraints enhance learning efficiency and whether incorporating such measures into HABITS could optimize its performance.

The large-scale autonomous training system we proposed can be readily integrated into current fundamental neuroscience research, offering novel behavioral paradigms, extensive datasets on mouse behavior and learning, and a large cohort of mice trained on specific tasks for further neural analysis. Additionally, our research provides a potential platform for testing computational models of cognitive learning, contributing to the field of computational neuroscience.

Share this article

Cite this article

System setup for HABITS.

Free-moving mouse performing task in HABITS.

The 24 hr activities of mice living in HABITS.

HABITS performance in d2AFC task.

All tasks training in HABITS.

Representative cognitive task performed in HABITS.

Challenging mouse tasks innovated in HABITS.

MT enabled faster learning with higher quality.

MT manifested distinct learning path with faster forgetting and higher learning rate.

Author details

Bowen Yu

Contribution

Competing interests

Penghai Li

Contribution

Competing interests

Haoze Xu

Contribution

Competing interests

Yueming Wang

Contribution

Competing interests

Kedi Xu

Contribution

Competing interests

Yaoyao Hao

Contribution

For correspondence

Competing interests

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organism