1 Introduction

How animals make decisions among alternative actions is a key question in neuroscience. In natural environments, decisions emerge from the integration of sensory inputs, past experience, and internal states. Such innate decision-making requires no training, and animals learn quickly from experience (Meister, 2022). However, in the laboratory, most decision-making researchers rely on trained behaviors such as two-alternative forced choice (2AFC) tasks, where animals undergo weeks of training involving thousands of trials (Burgess et al., 2017). This overtraining can induce brain-wide synchronized neural activity (Steinmetz et al., 2019), potentially engaging neural circuits distinct from those evolved for innate decision-making.

Innate decision-making plays an important role in animal survival and reproductive success. For example, a prey animal that decides not to escape a lethal predator dies. Evolutionary pressure has shaped these responses to occur within extremely short time windows to enhance the likelihood of survival. However, speed alone is not enough – these decisions must also be correct. In complex and ever-changing environments, correct decisions often require cognitive regulation to assess risk, evaluate alternative defensive strategies, and adjust responses accordingly (Evans et al., 2019; Ydenberg and Dill, 1986).

Supporting this notion, studies across species have revealed how defensive decisions are influenced by various factors. Crayfish modify their reactions based on food availability (Liden et al., 2010), satiation levels (Schadegg and Herberholz, 2017), and social status (Krasne et al., 1997). In mice, responses to overhead visual threats depend on the physical properties of visual stimuli (De Franceschi et al., 2016; Liden and Herberholz, 2008; Tammero and Dickinson, 2002; Yang et al., 2020; Yilmaz and Meister, 2013), prior experience (Vale et al., 2017), and housing conditions (Lenzi et al., 2022). However, although mice are social animals and most predator encounters occur during foraging, little is known about how they weigh perceived risks and rewards when making defensive decisions, or how these decisions are influenced by social hierarchy.

To address these questions, we developed an ecologically relevant behavioral paradigm examining decision-making in foraging mice exposed to overhead visual threats. Our findings demonstrate that defensive choices are collectively influenced by threat intensity, reward value, and social hierarchy. Threat intensity plays a dominant role in driving decisions, while the influence of reward depends on threat levels. Under low-threat conditions, elevated reward suppresses defensive responses, consistent with a value-based economic strategy. Yet the same reward increase enhances defensive responses during high-threat scenarios, indicating a shift to a survival-priority strategy. Furthermore, social hierarchy plays a significant role, with dominant mice exhibiting stronger risk-aversion behaviors and heightened vigilance. To quantify the contributions of various factors to decision-making, we proposed a drift-diffusion leaky integrator model that successfully captures key characteristics of the experimental results. Our work provides a valuable behavioral framework for future investigations into the neural mechanisms underlying cognitive control of innate decision-making.

2 Results

2.1 A behavioral paradigm for studying instinctive decision-making in mice

To investigate how animals make innate decisions in natural environments, we designed a behavioral paradigm to simulate the escape behavior of foraging mice in the wild. In this paradigm, a group of 2-5 co-housed mice was placed in a nest, with each individual identified using a radio frequency identification (RFID) tag. Only one mouse was allowed to enter a linear arena to obtain water delivered at the end. As the mouse approached the reward, an overhead expanding dark disc that mimics the approach of an aerial predator was triggered (Figure 1A), forcing the animal to decide whether to risk pursuing the reward or to take defensive action. Behavioral data were recorded using a ground-mounted camera, and DeepLabCut was used to track the movements of the mouse’s nose and tail base (Mathis et al., 2018) (Figure S1A). Compared to the exploratory phase, the presence of looming stimuli significantly increased the frequency of arena entries and decreased the duration of each visit (Figures 1B and 1C).

A behavioral paradigm for investigating innate decision-making in mice.

(A) Schematic of the behavioral assay (3D and top-down views). (B) Top: arena occupancy patterns for five example mice in one session. Bottom: visit frequency and average duration for each visit. Colors mark the mouse’s identity. (C) Visit frequency and duration under exploration and looming conditions. Error bars represent the standard deviation. N = 46 mice, paired t-test. (D) The pipeline for behavioral classification. (E) Distribution of behavioral decisions across 3862 trials from 140 mice. (F) Left: distance to the safe zone for four decision types (10 example trials each). Red dashed lines mark the onset of each stimulus repetition; solid lines mark the end of the last repetition. Grey shade indicates the safe zone. Right: locomotion speed towards the safe zone for the same trials. (G) Distribution of the longest stationary time for “freezing” and “assessment+flight”. N = 224 and 1667, p < 0.001, Mann-Whitney-Wilcoxon test. (H) Distribution of latency to flee for “direct flight” and “assessment+flight”. N = 458 and 1667, p < 0.001, Mann-Whitney-Wilcoxon test. (I) Distribution of peak speed for “direct flight” and “assessment+flight”. N = 458 and 1667, p < 0.001, Mann-Whitney-Wilcoxon test. (J) Distribution of latency to hide in the safe zone for “direct flight” and “assessment+flight”. N = 455 and 1563, p < 0.001, Mann-Whitney-Wilcoxon test. (K) Distribution of behavioral states at the end of each trial for “direct flight”, “assessment+flight”, and “freezing”. (L) Distribution of fear recovery time for “direct flight”, “assessment+flight”, and “freezing”. N = 74, 458, and 205, p < 0.001, Kruskal-Wallis test and Dunn’s post-hoc test.

To identify distinct behavioral patterns in response to the looming stimuli, we defined 19 behavioral features from key body points of the mouse and fed them into a random forest classifier to predict its decisions (Figures 1D and S1B, see Methods). Animal decisions across 3861 trials were categorized into four types: direct flight (11.8%), flight after assessment (43.2%), freezing (5.8%), and no reaction (39.2%, Figures 1E and 1F). The classification model achieved an accuracy of 0.95 (Figure S1C). A key distinction between “assessment+flight” and “freezing” decisions was the duration of stationary behavior: mice that decided to freeze remained stationary significantly longer than those in the “assessment+flight” group (Figure 1G). Furthermore, the latency to flee differed significantly between “direct flight” and “assessment+flight” decisions (Figure 1H).
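As an illustration of the kind of kinematic features that separate these decision types, the sketch below computes the longest stationary time, the latency to flee, and the peak speed from a tracked distance-to-safe-zone trace. It is a minimal example under stated assumptions, not the feature set defined in Methods: the function name `kinematic_features`, the speed cutoffs `still_thresh` and `flee_thresh`, and the synthetic trial are all hypothetical.

```python
from typing import Dict, List, Optional


def kinematic_features(
    dist_cm: List[float],
    fps: float,
    onset_frame: int,
    still_thresh: float = 1.0,   # cm/s, hypothetical cutoff for "stationary"
    flee_thresh: float = 20.0,   # cm/s, hypothetical cutoff for "fleeing"
) -> Dict[str, Optional[float]]:
    """Summarize one trial from the tracked distance to the safe zone."""
    # Speed towards the safe zone: positive when the distance shrinks.
    speed = [(dist_cm[i] - dist_cm[i + 1]) * fps for i in range(len(dist_cm) - 1)]
    post = speed[onset_frame:]  # samples from stimulus onset onwards

    # Longest uninterrupted run of near-zero speed after stimulus onset.
    longest = run = 0
    for v in post:
        run = run + 1 if abs(v) < still_thresh else 0
        longest = max(longest, run)

    # Latency to flee: time from onset to the first sample above flee_thresh.
    latency = next((i / fps for i, v in enumerate(post) if v >= flee_thresh), None)

    return {
        "longest_stationary_s": longest / fps,
        "latency_to_flee_s": latency,
        "peak_speed_cm_s": max(post) if post else None,
    }


# Synthetic trial at 10 frames/s: still at 50 cm, then a steady run to the nest.
trial = [50.0] * 10 + [45.0 - 5.0 * k for k in range(10)]
features = kinematic_features(trial, fps=10.0, onset_frame=5)
```

Features of this kind, computed per trial, could then be passed to any standard classifier; the cutoffs would need to be tuned to the actual tracking data.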

We hypothesized that fear level increases progressively across the decision types: “freezing”, “flight after assessment”, and “direct flight”. This is supported by the observations that mice in the “direct flight” group exhibited higher flee speeds (Figure 1I) and shorter hiding latency (Figure 1J). Additionally, the proportion of mice that recovered within 20 seconds of the stimulus was about 20%, 40%, and 80% for “direct flight”, “assessment+flight”, and “freezing”, respectively (Figure 1K). Consistently, recovery time decreased progressively across these decision types (Figure 1L). Our findings align with the theory of predatory imminence continuum (Fanselow and Lester, 1988), which proposes that defensive responses become more intense as the perceived threat level increases.

Although the looming stimulus triggered robust defensive behavior, animals quickly learned to recognize it as non-threatening (Lenzi et al., 2022; Vale et al., 2017). Our behavioral data showed a gradual habituation to the looming stimulus, beginning immediately after the first trial (Figure 2A). Specifically, the defensive probability decreased substantially within the first few trials (Figures 2B and 2C), indicating rapid adaptation to the stimulus. Hiding latency increased progressively across trials (Figure 2E), stabilizing after the 12th trial. Intriguingly, peak fleeing speed increased over the first five trials before declining (Figure 2F), suggesting a rapid modulation of vigilance after just a few exposures to the threat.

Mice learn quickly from experience.

(A) Distance to the safe zone (top) and locomotion speed (bottom) for the 1st, 2nd, 5th, 10th, and 20th trials of 19 mice. Dashed lines mark the start of each stimulus, and solid lines mark the end of the stimulus. Positive speed is towards the safe zone. (B) Defensive probability decreases with the number of trials. (C) Summary of the decisions made by 19 mice for the first 20 trials. (D) Latency to flee increases with the number of trials. Gray shade indicates the standard error of the mean. N = 19. (E) Hiding latency increases with the number of trials. N = 19. (F) Peak fleeing speed varies with the number of trials. N = 19.

2.2 Mice make economic decisions modulated by vigilance

In the wild, most prey-predator encounters occur while prey animals are foraging. It remains unclear how prey balance the value of food against the risk of predation in their decision-making process. Recent work has shown that rodents make economically driven decisions in learned tasks that require intensive training (Constantinople et al., 2019). However, it is unknown whether these findings extend to natural behaviors. Furthermore, prey animals have evolved to maintain high levels of vigilance during foraging to increase their chances of survival. Yet, the interplay among food value, threat intensity, and vigilance in shaping defensive decisions remains elusive.

Here, we demonstrate that mice exhibit distinct behavioral patterns in response to varying threat intensities and reward values (Figure 3A). As threat intensity increased, animals were more likely to prioritize threat avoidance and engage in direct flight behaviors (Figure 3B), indicating that perceived threat plays a dominant role in their decision-making processes. This was further evidenced by longer flight distances (Figure 3C) and reduced time spent in the reward zone (Figure 3D), suggesting a trade-off between threat avoidance and reward pursuit. Mice also initiated flight sooner under higher threat levels (Figure 3E). This reduction in reaction time, a measure of vigilance (Buck, 1966), suggests heightened vigilance in response to increased threat. This is consistent with findings in birds, where elevated predation risk increases vigilance and reduces feeding time (Caraco et al., 1980). Furthermore, fleeing speeds increased significantly with threat level (Figure 3F), reflecting the combined effects of perceived higher risk and heightened vigilance.

Mice make economic decisions modulated by vigilance.

(A) Distance to the safe zone and locomotion speed towards the safe zone across trials in response to low- and high-contrast looming stimuli with different reward values. Dashed lines indicate stimulus onset; solid lines mark stimulus offset. (B) Distribution of decision types for six experimental conditions. N = 40 trials from 4 mice (none, low), 50 trials from 5 mice (water, low), 50 trials from 5 mice (sucrose, low), 50 trials from 5 mice (none, high), 39 trials from 5 mice (water, high) and 30 trials from 5 mice (sucrose, high), Chi-squared test. (C-F) Distance fled, duration in the reward zone, latency to flee, and peak fleeing speed across all conditions. Two-way ANOVA with post hoc Tukey’s test. N = 40, 50, 50, 50, 39, 30 trials for C, D, F, and N = 27, 30, 19, 49, 39, 29 trials for E. (G) Water or sucrose consumption during exploration and looming experiments at low and high contrasts. N = 10, 10, 5, 5, 5, 5 trials. Mann-Whitney-Wilcoxon test. Whiskers indicate the minimum and maximum. Boxes indicate the first and third quartiles. (H) Number of up-attention action bouts before exposure to looming in the first and second trials. N = 4, 5, 5, 5, 5, 5 mice. Wilcoxon signed-rank test. (I) Experimental timeline investigating the influence of water reward on innate decision-making in the same mouse. (J) Distance to the safe zone and locomotion speed towards the safe zone across trials in response to low-contrast looming stimuli with and without water reward in an example mouse. Left, non-reward condition followed by reward condition; Right, reward condition followed by non-reward condition. (K) Decision types of nine mice across five trials. Gray squares indicate trials where the mouse did not enter the arena within 30 min. (L) Defensive probability in non-reward and reward conditions. N = 9 mice, paired t-test. (M) Distance fled, duration in reward zone, latency to flee, and peak speed in non-reward and reward conditions. N = 5 mice for latency to flee and 9 mice for other features, paired t-test, two-sided. For all panels: #p < 0.1, *p < 0.05, **p < 0.01, ***p < 0.001.

While threat intensity plays a major role in decision-making, the influence of reward is more nuanced and depends on the threat context. Under low-threat conditions, decisions were primarily driven by perceived reward value. We first confirmed that sucrose was more valuable than water by measuring consumption during exploration, which remained unaffected by low-level threats (Figure 3G). As reward value increased, mice exhibited fewer defensive responses (Figure 3B), indicating that they weighed risk and reward to make economic decisions. Higher rewards also led to shorter flight distance (Figure 3C) and longer stays in the reward zone (Figure 3D), supporting the value-based decision-making strategy. Interestingly, reward value had little effect on latency to flee (Figure 3E), suggesting that vigilance remains relatively stable across reward conditions. Consistently, fleeing speed decreased as reward value increased (Figure 3F).

Conversely, under high-threat conditions, increasing reward value led to more direct flight behaviors (Figure 3B). This counterintuitive observation can be explained by heightened vigilance induced by the increased reward value. The positive correlation between reward value and vigilance likely reflects evolutionary pressures, where high risk is often associated with high reward. In these scenarios, escaping the predator becomes the top priority, even in the presence of high rewards. Evidence for increased vigilance includes shorter latencies to flee (Figure 3E) and higher escape speeds (Figure 3F) when a reward was present. Further supporting this, mice spent more time monitoring the upper visual field during the second trial while approaching sucrose (Figure 3H). These results suggest that, under high-threat conditions, reward value indirectly shapes decision-making through its impact on vigilance.

To confirm value-based decision-making under low-threat conditions while controlling for individual variation, we performed behavioral experiments using the same mice (Figures 3I and 3J, see Methods). When a reward was introduced, mice showed reduced defensive probabilities (Figures 3K and 3L), increased duration in the reward zone, and slower escape speeds (Figure 3M). Latency to flee remained unaffected (Figure 3M), reinforcing the idea that vigilance is unaffected by reward under low-threat conditions. Collectively, these findings reveal how innate decision-making in response to looming stimuli is shaped by the dynamic interplay between perceived threat intensity, reward value, and vigilance.

2.3 Influence of social hierarchy structure on decision-making

Mice are social animals that live in groups, where social hierarchy often plays a critical role in shaping behavior. To investigate how social rank influences decision-making in response to looming threats, we compared the behaviors of dominant and subordinate mice. Rank order within each mouse pair was determined using the tube test before and after behavioral experiments (Figure 4A, see Methods), and only pairs with stable rankings were included in the analysis.

Influence of social hierarchy structure on innate decision-making.

(A) Schematic of the social hierarchy experiment. Top, experimental timeline. Each session lasted 2 hours per mouse pair during the pre-looming, looming, and post-looming phases. Bottom, schematic of the tube test. (B) Arena occupancy for an example pair of mice during the three sessions. (C) Visiting frequency and average duration per visit for dominant and subordinate mice during the pre-looming, looming, and post-looming sessions. N = 9 pairs, paired t-test. (D) Distance to the safe zone over seven days for an example pair. Looming stimuli were presented on days 4 and 5. (E) Percentage of time spent in the reward zone across days. Error bars represent SEM. N = 9 pairs, paired t-test. (F) Distance to the safe zone (left) and locomotion speed (right) for an example pair. (G) Behavioral decisions across the first 10 trials for 9 mouse pairs. (H) Pie charts showing decision distributions for dominant and subordinate mice. N = 90, 90 trials, Chi-squared test. (I) Violin plots comparing dominant and subordinate mice for latency to flee (N = 90, 78 trials), distance fled, peak fleeing speed, and duration in the reward zone (N = 90, 90 trials). Mann-Whitney-Wilcoxon test. For all panels: *p < 0.05, **p < 0.01, ***p < 0.001.

We first examined how arena occupation was influenced by social rank before and during looming stimulation. Subordinate mice spent more time exploring the arena than dominant mice during the pre-looming exploration (Figures 4B and 4C). However, this difference disappeared during and after exposure to the looming stimulus. No differences were observed in the number of arena entries (Figure 4D). These findings suggest that looming has a long-lasting impact on animal behavior.

Interestingly, when the threat co-localized with the reward, only dominant mice reduced their time in the reward zone (Figures 4E and 4F), indicating greater vigilance compared to subordinates. This heightened vigilance in dominant mice was further supported by reduced habituation to the looming threat (Figures 4G and 4H) and a higher proportion of defensive decisions (Figure 4I). Consistent with their heightened vigilance, dominant mice fled with shorter latencies and higher speeds (Figure 4J). Moreover, they prioritized threat avoidance over reward – fleeing longer distances and spending less time in the reward zone (Figure 4J).

These findings indicate that dominant mice prioritize threat avoidance over reward and are more sensitive to risk. In contrast, subordinates are more easily drawn to the allure of reward and display reduced vigilance in the presence of threats.

2.4 A model for innate decision-making

Our experimental results demonstrate that innate decision-making in response to visual threats is influenced by perceived threat intensity, reward value, and vigilance. Specifically, the distinct effects of reward under low- and high-threat conditions can be explained by taking the animal’s vigilance level into account. To interpret these behavioral results quantitatively, we developed a model of innate decision-making, in which the evidence for escape was modeled as a drift-diffusion leaky integrator (see Methods). In this model, threat intensity and vigilance jointly impact the threat gain and charge the integrator, while a constant leak drives it back towards a stable state. Reward value modulates the drift rate, discharging the integrator and thereby reducing the likelihood of escape.

Model parameters were estimated in two steps. First, we fit the model to experimental data without considering reward and identified the optimal leakage rate, threat gain, diffusion rate, and escape threshold. This revealed distinct decision-making dynamics under low- and high-threat conditions. At low threat, the average evidence level remained below the escape threshold, and escape decisions emerged from stochastic fluctuations in individual trials (Figure 5A). At high threat, the average evidence level crossed the threshold, indicating that escape decisions were mainly driven by threat gain (Figure 5B). Notably, threat gain and diffusion rate had distinct effects on latency to flee: threat gain shifted the mean latency, while diffusion rate affected its variance. Thus, the model captured not only the observed decision patterns but also the distributions of latency to flee.
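The two regimes described above can be illustrated with a small simulation. The sketch below integrates an assumed form of the leaky integrator with the Euler-Maruyama method; the update rule, the function names, and every numerical value except the two threat gains (1.0 and 1.2, which echo the no-reward fits reported in this section) are hypothetical choices for illustration and are not the fitted parameters from Methods.

```python
import math
import random
from typing import Optional


def simulate_trial(
    beta: float,            # threat gain
    mu: float = 0.0,        # reward-related drift that discharges the integrator
    alpha: float = 1.0,     # leak rate (assumed)
    delta: float = 0.05,    # diffusion rate (assumed)
    theta: float = 1.1,     # escape threshold (assumed)
    h: float = 1.0,         # habituation factor (assumed)
    s: float = 1.0,         # sensory input, held constant for simplicity
    dt: float = 0.01,
    t_max: float = 5.0,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Return the latency to flee in seconds, or None if no escape occurs."""
    if rng is None:
        rng = random.Random()
    x = 0.0  # evidence for escape
    for k in range(int(round(t_max / dt))):
        # Threat charges the integrator; leak and reward drift discharge it.
        drive = h * beta * s - mu - alpha * x
        x += drive * dt + delta * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if x >= theta:
            return (k + 1) * dt
    return None


def escape_probability(beta: float, n_trials: int = 300, seed: int = 0, **kwargs) -> float:
    """Fraction of simulated trials in which the threshold is crossed."""
    rng = random.Random(seed)
    escapes = sum(
        simulate_trial(beta, rng=rng, **kwargs) is not None for _ in range(n_trials)
    )
    return escapes / n_trials


p_low = escape_probability(beta=1.0)   # mean evidence stays below threshold
p_high = escape_probability(beta=1.2)  # mean evidence crosses threshold
```

With `delta=0.0` the low-gain trace never reaches the threshold while the high-gain trace always does, so in this sketch any escape at low gain arises purely from the noise term.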

The drift-diffusion leaky integrator model.

(A-F) Simulated evidence level for escape and predicted latencies to flee and decision types across six conditions. (G) Comparison of decision models based on threat intensity, reward value, and vigilance with experimental results. Color saturation indicates the likelihood of defensive decisions.

In the second step, we incorporated the effect of reward value and identified the optimal drift rate and threat gain across different threat and reward conditions. Consistent with experimental results showing that decisions at low threat were primarily driven by reward value, drift rate varied across reward conditions (0, 0.01, and 0.05 for no reward, water, and sucrose, respectively), while threat gain remained relatively constant (1, 1, and 0.9 for no reward, water, and sucrose, respectively). Accordingly, the model reproduced the finding that increasing reward value reduced the likelihood of escape without affecting latency to flee (Figures 5A, 5C, and 5E). The same drift rates were applied to model the decision-making process at high threat (Figures 5B, 5D, and 5F). In line with experimental data, threat gain varied substantially across reward conditions (1.2, 1.7, and 1.9 for no reward, water, and sucrose, respectively), exerting an effect opposite to that of drift rate and effectively canceling and reversing its influence.
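To see how the reported drift rates and threat gains can cancel and reverse, the sketch below computes a steady-state mean evidence level for each of the six conditions. It assumes a linear leaky integrator in which the mean settles at (gain × input − drift) / leak; this closed form, the unit values for leak, habituation, and sensory input, and the function names are assumptions made for illustration. Only the drift rates and threat gains are taken from the text.

```python
REWARDS = ("none", "water", "sucrose")

# Values reported in the text for the fitted drift rate and threat gain.
DRIFT_RATE = {"none": 0.0, "water": 0.01, "sucrose": 0.05}
THREAT_GAIN = {
    "low": {"none": 1.0, "water": 1.0, "sucrose": 0.9},
    "high": {"none": 1.2, "water": 1.7, "sucrose": 1.9},
}


def mean_evidence(beta, mu, alpha=1.0, h=1.0, s=1.0):
    """Steady-state mean evidence under the assumed linear leaky integrator."""
    return (h * beta * s - mu) / alpha


def evidence_by_condition(alpha=1.0, h=1.0, s=1.0):
    """Mean evidence for each threat level, ordered as in REWARDS."""
    return {
        threat: [mean_evidence(gains[r], DRIFT_RATE[r], alpha, h, s) for r in REWARDS]
        for threat, gains in THREAT_GAIN.items()
    }


table = evidence_by_condition()
for threat, values in table.items():
    print(threat, [round(v, 2) for v in values])
```

Under these assumptions the low-threat values fall as reward increases, whereas the high-threat values rise, because the increase in threat gain outweighs the small reward-related drift.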

As summarized in Figure 5G, if decisions are based solely on threat intensity, the likelihood of defensive decisions increases with increasing threat. If based solely on reward value, the likelihood of defensive decisions decreases with increasing reward. If only vigilance is considered, the likelihood of defensive decisions increases with both threat and reward. Our experimental findings and model simulations indicate that while threat intensity plays a central role in innate decision-making, the impact of reward on the decision-making process depends on the threat level. Under low-threat conditions, decisions are primarily driven by reward value, whereas under high-threat conditions, vigilance becomes the dominant factor.

3 Discussion

When confronted by a predator, an animal must decide whether to escape, freeze, or ignore the threat. This life-or-death decision is crucial for survival in the natural environment. In the laboratory, mice make similar decisions when exposed to an overhead expanding dark disc, which mimics the approach of an aerial predator. At first glance, their reactions may seem like simple startle responses to a sudden stimulus, such as thunder. However, a closer examination reveals that their behavior follows complex and structured patterns, distinct from simple reflexes. A growing body of research has demonstrated that these decisions involve cognitive control. Expanding on this idea, we show that an animal’s response to a visual threat is further influenced by economic and social factors.

3.1 Main findings

We designed a behavioral paradigm to simulate how mice respond to visual threats during foraging (Figure 1). Quantitative analysis identified four distinct behavioral decisions in response to the looming stimulus (Figure 1E), characterized by stationary time, latency to flee, escape speed, hiding latency, and recovery time (Figures 1G–1L). Mice rapidly adapted to looming threats starting from the second trial (Figure 2). When a reward was introduced, decision-making was modulated by both threat intensity and reward value, albeit in different ways (Figure 3). Specifically, as threat intensity increased, mice shifted towards more defensive behaviors. In contrast, reward magnitude had a differential effect: at low-threat levels, higher rewards suppressed defensive responses, aligning with value-based decision theory; whereas at high-threat levels, greater rewards promoted defensive decisions, potentially due to heightened vigilance. Furthermore, innate decision-making was shaped by social hierarchy (Figure 4): dominant mice exhibited a stronger tendency towards defensive behaviors, while subordinates were reward-driven and less likely to flee. Finally, we proposed a drift-diffusion leaky integrator model to simulate the decision-making process, effectively capturing the key characteristics of the observed decision patterns (Figure 5).

3.2 Relation to earlier work

Aerial predators pose a significant threat to rodents. Previous studies have shown that overhead visual stimuli in the laboratory elicit robust defensive behaviors in rodents (Wallace et al., 2013; Yilmaz and Meister, 2013). Because such stimuli may mimic various predatory actions, the corresponding behavioral responses depend on specific physical properties of the stimulus. For example, an overhead expanding dark disc that mimics the approach of an aerial predator triggers more flight than freezing behavior (Yilmaz and Meister, 2013), while a small black moving dot that mimics a cruising predator induces more freezing behavior (De Franceschi et al., 2016). These findings suggest that the reaction is not a simple reflex but involves a decision-making process. Additional stimulus properties that influence the action selection include contrast, speed, size, and shape (De Franceschi et al., 2016; Evans et al., 2018; Yang et al., 2020). Support for a decision-making process also comes from its modulation by environmental factors and experience: mice freeze more when no refuge is available (Wei et al., 2015) and quickly learn that the looming stimulus poses no real threat (Lenzi et al., 2022; Vale et al., 2017; Zhong et al., 2023). This innate decision-making process is observed across invertebrates and vertebrates (Evans et al., 2019).

Compared with earlier work, the present study differs in two respects. First, instead of manually annotating behavioral responses, we employed a machine-learning approach to classify behavioral decisions. This approach substantially reduced the labor required for labeling and minimized misclassifications due to inter-individual variability. Second, unlike previous studies using 2D arenas, we designed a 1D linear arena to simulate foraging conditions, where the nest and reward zone were separated by a long corridor. Decision patterns observed here were similar to those reported in 2D environments (Yang et al., 2020). Consistent with previous findings (Lenzi et al., 2022), mice quickly learned that the looming stimulus was non-threatening starting from the 2nd trial. Interestingly, peak escape speed increased over the first five trials, indicating a brief period of heightened vigilance.

One potential concern with the linear arena design is that the looming stimulus is typically displayed at the end of the corridor, raising the possibility that mice may simply return to the nest upon reaching a physical boundary. To address this, we presented the stimulus halfway along the corridor. Even under this condition, mice quickly fled towards the nest – directly towards the threat – rather than away from it. This behavior strongly suggests that their escape responses are not simple reflexes, but instead incorporate the relative safety of the nest into their decisions (Figure S2).

3.3 Economic influence on decision-making

Value-based decision-making has been extensively studied across species (Rangel et al., 2008), including both behavioral paradigms and their underlying neural basis. In humans, decision-making under risk is well characterized by prospect theory, proposed by Daniel Kahneman and Amos Tversky nearly half a century ago (Kahneman and Tversky, 1979). This theory also accounts for decision-making in rodents after extensive training (Constantinople et al., 2019). Our study demonstrates that mice make value-based decisions even in innate behaviors that require no training. A core component in value-based decision-making is the ability to learn the value of action outcomes. This learning process was clearly observed in our study: within just a few trials, mice habituated to the looming stimulus after recognizing that it posed no real harm (Figure 2). Their decisions reflected the relative values of reward and threat (Figure 3). Interestingly, while mice consistently exhibited value-based responses to threat, reward value-based decision-making was observed only under low-threat conditions. At high threat levels, however, mice exhibited more direct flight behavior regardless of reward value, indicating that their decisions were not value-based. At first glance, this may appear counterintuitive. However, it can be understood through the lens of vigilance. In nature, foraging often coincides with exposure to predation risk, and evolutionary pressure has favored prey that maintain heightened vigilance during foraging. As a result, reward value, threat intensity, and vigilance level are all correlated. Under high-threat conditions, the influence of reward value is suppressed because survival becomes the top priority and vigilance dominates the decisions. In other words, decisions at low threat levels are more rational, incorporating reward value with limited influence from threat, whereas decisions at high threat levels are more instinctive, dominated by the immediate need to escape.

What neural circuits underlie this innate decision-making under threat and reward? A growing body of research has implicated the superior colliculus (SC) as a central hub for processing looming-evoked defensive behaviors (Evans et al., 2018; Shang et al., 2015; Wei et al., 2015). In parallel, reward-related signals have been found in several brain regions, including the ventral tegmental area (VTA) (Cohen et al., 2012; Schultz, 1998), ventral striatum (Schultz et al., 1992), orbitofrontal cortex (Tremblay and Schultz, 2000), dorsal raphe nucleus (Liu et al., 2014; Miyazaki et al., 2011), and cerebellum (Wagner et al., 2017). Recent work has reported reward-modulated responses in the superficial SC of trained mice (Baruchin et al., 2022). However, such responses may reflect brain-wide synchronized activity induced by overtraining (Steinmetz et al., 2019), raising questions about their presence in untrained, natural behavior. Additionally, the lateral SC is involved in approach behaviors towards rewarding stimuli such as food or prey (Comoli et al., 2012; Krauzlis et al., 2013), and the deep layers of SC directly project to dopaminergic neurons in the substantia nigra (Comoli et al., 2003). Thus, it would be interesting to investigate how SC integrates reward and threat signals during innate decision-making.

3.4 Social influence on decision-making

Mice live in social groups in the wild, and social hierarchy plays a crucial role in shaping their behavior. The heightened vigilance and risk-averse behavior observed in dominant mice have also been observed in other rodents, such as voles (Kleiman et al., 2014). From an evolutionary perspective, this behavioral strategy benefits not only individual survival but also the success of the species. Interestingly, the faster and more consistent escape responses observed in dominant mice are reminiscent of agonistic behaviors in crayfish (Krasne et al., 1997). During social confrontations, dominant crayfish exhibit short-latency escape responses, while subordinates display more flexible escape behavior with longer and more variable latencies.

The neural mechanisms underlying status-dependent defensive decision-making remain poorly understood. In primates, serotonin promotes the acquisition of dominance, particularly under unstable social conditions (Raleigh et al., 1991). Similarly, in crayfish, serotonin modulates escape behavior in a status-dependent manner, enhancing escape responses in dominant individuals while suppressing them in subordinates (Edwards and Kravitz, 1997; Yeh et al., 1997). Investigating the neural circuit mechanisms driving status-dependent defensive decisions in mice would be an exciting avenue of research.

3.5 Mathematical modeling

The drift-diffusion leaky integrator model proposed here builds on an integrator model of internal state (Gibson et al., 2015) and incorporates a reward-driven drift-diffusion process (Ratcliff, 1978). Conceptually, the integrator part of our model aligns with Lorenz’s “hydraulic” model of motivation, which describes how internal drives shape behavior (Lorenz, 1950). While our model resembles the leaky integrator used to model escape behavior in flies (Gibson et al., 2015), it differs in three ways. First, the earlier model lacks a reward-related drift component. Second, whereas that model treated the looming effect as a delta function, our model allows the evidence level to vary continuously with the stimulus size. Third, the influence of vigilance is absent in that model. A similar model has been proposed (Evans et al., 2018), but it did not incorporate reward or vigilance. Note that the evidence level for escape in our model is not equivalent to the fear level (Anderson and Adolphs, 2014); rather, when it crosses a threshold, the animal may enter a fear state. After escape is initiated, the fear level decreases while the evidence level remains unaffected.

This hybrid model fits our experimental data well. Below we briefly discuss the roles of different parameters. The leakage rate α represents a drive towards the resting state. The perceived threat was modeled as the product of the habituation parameter h_i, the threat gain β, and the sensory input s(t), where β is modulated by vigilance. For simplicity, h_i is fixed across all conditions. Under low-threat conditions, the relatively unchanged β across reward conditions suggests minimal variation in vigilance. In contrast, β varies with reward value under high-threat conditions, indicating that reward-related vigilance plays an important role in decision-making. The drift rate captures the influence of reward on decision-making. Interestingly, although the drift rate differences across reward conditions are small, they are sufficient to account for observed behavioral variability. Lastly, the diffusion rate δ, which may reflect fluctuations in the animal’s internal state, plays a critical role, especially under low-threat conditions where the average evidence trajectory does not reach the threshold. The diffusion rate captures the trial-to-trial variability: even small noise can accumulate over time, resulting in large differences across individual trials and contributing to variability in escape latency. Overall, this computational framework not only accounts for our experimental observations but also offers a quantitative foundation for studying decision-making across species and ecological contexts. Understanding how these parameters are implemented at the circuit level is an exciting direction for future research.

4 Resource availability

4.1 Lead Contact

Further information and requests should be directed to and will be fulfilled by the Lead Contact: Ya-tang Li at yatangli@cibr.ac.cn.

4.2 Materials Availability

This study did not generate new unique reagents.

4.3 Data and Code Availability

Data and code will be available in a public repository upon acceptance of the manuscript.

5 Methods

5.1 Animals

Male C57BL/6J mice were group-housed on a 12-hour light / 12-hour dark cycle and used at ages 2-3 months. Behavioral experiments were conducted during the light phase. All experimental procedures were performed under the animal welfare guidelines and approved by the Institutional Animal Care and Use Committee at the Chinese Institute for Brain Research, Beijing.

5.2 Behavioral platform

The behavioral platform consisted of a nest, a linear arena, a radio frequency identification (RFID) system, a real-time mouse position detection system, and a reward delivery port. As shown in Figure 1A, the nest was 40 cm (L) × 20 cm (W) × 30 cm (H), and the linear arena was 100 cm long × 10 cm wide × 30 cm high. They were connected by a one-way tunnel (20 cm × 3 cm × 3 cm) for entering the arena and a 20 cm × 5 cm × 30 cm safe zone with a 5-cm × 5-cm one-way door for returning to the nest. Mouse identity was labeled via implanted RFID tags, which were detected by an RFID reader placed around the tunnel. Only one mouse was allowed to enter the arena at a time. Mice were rewarded at the end of the arena with either water or sucrose. Licking time and reward volume were recorded.

To monitor mouse position in real time, an OpenMV camera with a lens of 90° field of view was mounted on the ground, 110 cm beneath the arena. The mouse position was used to control the tunnel doors and to trigger stimulus presentation when the mouse entered the arena. Specifically, the tunnel had two doors: the first connected the tunnel to the nest, and the second door connected the tunnel to the arena. Once a mouse entered the tunnel, the first door closed and the second door opened. A second camera (LBAS-U350-74M) with a lens (FA0615A) was also placed on the ground to record animal behavior at 30 frames per second.

5.3 Visual stimulation

Looming stimuli were presented on a 55-inch monitor (121 cm × 68 cm) suspended 32 cm above the arena. Visual stimulation was implemented using PsychoPy in Python and was aligned to the mouse’s real-time location. The stimulus was a black disc that expanded ten times in succession on a gray background (∼ 65 cd/m²). In each repetition, the disc expanded from 0° to 20° of visual angle at a speed of 40°/s, followed by a stationary phase lasting 0.3 s. The inter-stimulus interval was randomized between 1 and 2 minutes. Two stimulus contrasts at 0.2 and 0.99 were displayed by adjusting the luminance of the disc.
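The timing implied by these parameters can be sketched in Python. This is an illustrative sketch, not the stimulus code used in the experiments: the function names are hypothetical, the disc is assumed to reset to 0° at the start of each repetition with no gap between repetitions, and the conversion to on-screen size assumes the disc is viewed from directly below at the 32 cm monitor distance (eye height ignored).

```python
import math

def looming_diameter_deg(t, max_deg=20.0, speed_deg_s=40.0, hold_s=0.3, n_reps=10):
    """Disc diameter in degrees of visual angle, t seconds after stimulus onset."""
    expand_s = max_deg / speed_deg_s      # 0.5 s of expansion per repetition
    period_s = expand_s + hold_s          # 0.8 s per repetition
    if t < 0 or t >= n_reps * period_s:
        return 0.0                        # no disc before onset or after the last repetition
    phase = t % period_s                  # time elapsed within the current repetition
    return min(speed_deg_s * phase, max_deg)

def disc_diameter_cm(angle_deg, viewing_distance_cm=32.0):
    """On-screen diameter of a disc subtending angle_deg at the assumed viewing distance."""
    return 2.0 * viewing_distance_cm * math.tan(math.radians(angle_deg) / 2.0)

if __name__ == "__main__":
    for t in (0.0, 0.25, 0.5, 0.7, 1.0):
        print(f"t = {t:.2f} s: {looming_diameter_deg(t):.1f} deg")
    print(f"full-size disc: {disc_diameter_cm(20.0):.1f} cm on screen")
```

Under these assumptions each repetition lasts 0.8 s, so the ten repetitions together span 8 s.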

5.4 Economic effects on decision-making

All mice were habituated to the behavioral platform for two days before the looming test. On the first day, five mice from the same home cage were placed in the nest for 30 minutes. Each mouse was allowed to explore the arena for at least 10 minutes, twice. Afterward, all five mice were returned to the nest with all gates open, allowing free exploration of the arena for two hours to ensure that each mouse learned a reward was available at the end of the linear arena. On the second day, each mouse was given one hour to explore the arena and further acclimate to the environment. The looming test was conducted the following day. An overhead looming stimulus with either 0.25 or 0.99 contrast was displayed when a mouse’s position was detected within 20 cm of the arena’s end. For each mouse, both the looming contrast and reward type were fixed. In total, we collected and analyzed behavioral data from 259 trials across 29 mice.

To control for individual variation in decision-making, we compared the same animal’s behavior under reward and no-reward conditions. Two groups of mice were used in the experiment. In the first group (n = 5), mice were first tested under the reward condition followed by the no-reward condition. In the second group (n = 4), the order was reversed. For the reward condition, mice were water-deprived one day before the exploration, and water was provided via the reward port. In the no-reward condition, mice were not water-deprived, and the reward port was removed. Before the looming test, mice were acclimated to the linear arena for two days as described above. On the third day, a looming stimulus with 0.25 contrast was displayed for five trials. In total, behavioral data from 84 trials across nine mice were recorded and analyzed.

5.4.1 Social effects on decision-making

To investigate the impact of social hierarchy on decision-making, we first determined the social rank of each mouse pair using the tube test (Wang et al., 2011). Mice were co-housed with a glass tube (3 cm diameter, 10 cm long) for one week, then trained to cross a 30 cm tube 10 times per day over two consecutive days. On the third day, the tube test was conducted up to 6 times until the rank order was stable for 4 consecutive trials. Pairs that failed to reach a stable hierarchy were excluded (1 out of 6 pairs).

Prior to the looming test, mice were water-deprived and allowed to explore the linear arena for two hours per day over three days, during which water rewards were delivered at the end of the arena. Over the following two days, the looming test with a stimulus contrast of 0.99 was performed for two hours per day. After the looming test, mice continued to explore the arena for an additional two days, after which their social rank was reassessed with the tube test. All remaining pairs maintained a stable rank order.

To assess whether the tube test itself influenced defensive decision-making, additional looming experiments were conducted on four mouse pairs before and after introducing the tube test. Specifically, the first looming experiment was conducted prior to the tube test in Figure 4A. The behavioral patterns in a total of 180 trials from 18 mice were recorded and analyzed.

5.5 Behavioral quantification

Animal behaviors in these trials were quantified in three steps. First, two key points – the nose and tail base – were labeled and tracked in the recorded videos using DeepLabCut (Figure S1A). Second, individual 30-second trials were extracted, each consisting of 10 seconds before, 8 seconds during, and 12 seconds after the looming stimulus. For each trial, 19 behavioral features were defined from the tracked key points. Finally, these features were fed into a random forest model with a maximum depth of 5 to classify animal behaviors. The model was trained on 238 trials and tested on 102 trials. The accuracy score for the test set was 0.95. In total, behavioral decisions in 3862 trials from 140 mice were classified into four types: direct flight, flight after assessment, freezing, and no reaction (Figure S1B).
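The trial-extraction step can be illustrated with a short Python sketch that converts a stimulus-onset frame into the three analysis windows. The function name and the dictionary layout are hypothetical; the sketch assumes the 30 frames-per-second recording rate described for the behavior camera and that the onset frame of each stimulus is already known.

```python
FPS = 30                                # frame rate of the behavior camera
PRE_S, STIM_S, POST_S = 10, 8, 12       # seconds before, during, and after the looming stimulus

def trial_window(onset_frame, fps=FPS):
    """Return frame ranges for the pre-stimulus, stimulus, and post-stimulus epochs."""
    start = onset_frame - PRE_S * fps
    if start < 0:
        raise ValueError("not enough frames recorded before stimulus onset")
    stim_end = onset_frame + STIM_S * fps
    end = stim_end + POST_S * fps
    return {
        "pre": range(start, onset_frame),
        "stim": range(onset_frame, stim_end),
        "post": range(stim_end, end),
    }

if __name__ == "__main__":
    window = trial_window(onset_frame=3000)
    for name, frames in window.items():
        print(name, frames.start, frames.stop, len(frames))
```

Each extracted trial then spans 900 frames, from which per-trial features such as speed or distance to the safe zone could be computed.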

5.6 Behavioral modeling

The animal’s decision-making process in response to the looming stimulus during foraging was modeled as a drift-diffusion leaky integrator. An escape response is triggered when the accumulated evidence level crosses a defined threshold:

Here, x_i(t) is the evidence level for escape at time t on the i-th trial. The parameter α is the leakage rate, which drives the evidence level towards zero and is the reciprocal of the integration time constant. The stimulus function s(t) denotes the normalized diameter of the looming stimulus, while β is the threat gain that modulates the perceived threat level based on stimulus contrast and the animal’s vigilance. The habituation parameter h_i captures the adaptation to repeated stimuli. The term r denotes the drift rate reflecting the perceived reward value. W(t) is a Wiener process, such that W(t + Δt) − W(t) ∼ 𝒩(0, Δt); δ is the diffusion rate. The binary variable E(t) indicates the escape decision, with ℋ denoting the Heaviside step function.

Given that animals exhibited rapid habituation to the looming stimulus for the first five trials followed by a slower decline, the habituation effect was modeled as the average of fast and slow exponential decay components:

where tr_i is the i-th trial; τ_f and τ_s are the time constants for the fast and slow components, respectively.
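One way to write the model that is consistent with the definitions above is sketched below. The arrangement of terms is an assumption of this sketch: in particular, the negative sign on the reward drift r and the use of simple exponentials in the trial index for the habituation term are inferred from the verbal description only. Equation numbers follow the references in the parameter paragraph.

```latex
\begin{align}
dx_i(t) &= \bigl[-\alpha\, x_i(t) + h_i\, \beta\, s(t) - r\bigr]\, dt + \delta\, dW(t) \tag{1}\\
E(t) &= \mathcal{H}\bigl(x_i(t) - x_{\mathrm{thr}}\bigr) \tag{2}\\
h_i &= \tfrac{1}{2}\Bigl(e^{-\mathit{tr}_i/\tau_f} + e^{-\mathit{tr}_i/\tau_s}\Bigr) \tag{3}
\end{align}
```

In this form, without noise and with the stimulus held at full size, the evidence level settles at (h_i β − r)/α, so whether the mean trajectory can reach x_thr depends mainly on the ratio of threat gain to leakage rate.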

Model parameters were fitted using the lmfit Python package. In Equation 1, α = 1.78 and δ = 6.9. For low-threat conditions, β = 1, 1, 0.9 for no reward, water, and sucrose, respectively. For high-threat conditions, β = 1.8, 2.4, 2.5 across reward conditions. The drift rate r = 0, 0.02, 0.07 for no reward, water, and sucrose, respectively. In Equation 2, the decision threshold x_thr = 0.77. For the habituation function in Equation 3, τ_f = 300 and τ_s = 7.96.
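A single trial of such a drift-diffusion leaky integrator can be simulated with an Euler-Maruyama scheme, as in the minimal Python sketch below. It is not the fitting code used here. The function names are hypothetical, the update rule assumes that the reward drift enters with a negative sign and that time is measured in seconds, and the default noise amplitude is an illustrative placeholder rather than the fitted δ, because its scaling depends on time-step conventions that are not specified in this section. The defaults for α, β, r, and x_thr are taken from the values above for the high-threat water condition.

```python
import math
import random

def stimulus(t, expand_s=0.5, hold_s=0.3, n_reps=10):
    """Normalized disc diameter s(t) in [0, 1] across ten loom repetitions."""
    period = expand_s + hold_s
    if t < 0 or t >= n_reps * period:
        return 0.0
    return min((t % period) / expand_s, 1.0)

def simulate_trial(alpha=1.78, beta=2.4, r=0.02, delta=0.3, x_thr=0.77,
                   h=1.0, dt=0.001, t_max=8.0, rng=None):
    """Return the escape latency in seconds, or None if the threshold is never crossed."""
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    while t < t_max:
        # Assumed form: leak towards zero, habituated threat input, reward drift.
        drift = -alpha * x + h * beta * stimulus(t) - r
        # Euler-Maruyama step: Wiener increments have variance dt.
        x += drift * dt + delta * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if x >= x_thr:
            return t
    return None

if __name__ == "__main__":
    rng = random.Random(0)
    latencies = [simulate_trial(rng=rng) for _ in range(20)]
    escaped = [lat for lat in latencies if lat is not None]
    print(f"{len(escaped)} of {len(latencies)} simulated trials crossed the threshold")
```

Setting delta to 0 gives the mean trajectory. With β = 1 and r = 0 it never reaches the threshold in this sketch, in line with the low-threat behavior described in the Discussion, so escapes in that regime arise only through the noise term.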

5.7 Quantification and statistical analysis

No statistical method was used to predetermine sample size. The Shapiro-Wilk test was applied to assess the normality of data distributions. For comparison between two groups, a two-sample t-test or paired t-test was used for normally distributed data; otherwise, non-parametric tests, including the Mann–Whitney U-test and the one-sample Wilcoxon signed-rank test, were applied. Two-way analysis of variance (ANOVA) followed by post-hoc tests was used for multi-group comparisons. Detailed statistical information for each experiment is provided in the Results section and figure legends.

6 Supplemental information

Here we report details related to the Results and Methods sections.

Quality control for behavioral classification model.

(A) Tracking accuracy of mouse nose and tail base in DeepLabCut. (B) Model performance evaluated by a confusion matrix on the test dataset.

Behavioral responses to looming stimuli presented at four distances from the safe zone.

(A) Distance to the safe zone. Red dashed lines mark the onset of each stimulus repetition; solid lines mark the end of the last repetition. Grey shade indicates the safe zone. N = 18 (70 cm), 19 (50 cm), 13 (30 cm), and 11 (10 cm) trials. (B) Distribution of escape directions.

Comparison of behavioral responses to looming stimuli before and after the tube test.

(A) Schematic timeline of the looming experiment before and after the tube test. (B) Behavioral decisions for dominant and subordinate mice before and after the tube test; Chi-squared test. (C) Violin plots showing latency to flee, distance fled, peak fleeing speed, and duration in the reward zone for dominant (top) and subordinate (bottom) mice before and after the tube test; Mann-Whitney-Wilcoxon test. *p<0.05.

Acknowledgements

We thank Lei Zhang, Xueting Sun, Haojun Sang, Qun Zhang, Bing Zhao, and Zhuofan Li in the CIBR instrumentation core for their help in designing and building the linear arena. Ya-tang Li is supported by the National Natural Science Foundation of China (32271060), the Natural Science Foundation of Beijing Municipality (IS23073), and the start-up fund from CIBR. Ling-yun Li is supported by the Natural Science Foundation of Beijing Municipality (5244028), the National Natural Science Foundation of China (32471071), and the R&D Program of Beijing Municipal Education Commission (1240030201).

Additional information

Author contributions

Ya-tang Li supervised the project; Ya-tang Li, Zhe Li, and Yidan Sun designed the experiments; Zhe Li collected all the data; Zhe Li, Jiahui Wang, and Jialin Li analyzed the data; Ya-tang Li carried out the neural modeling; Zhe Li and Ya-tang Li prepared figures; Ya-tang Li, Ling-yun Li, and Zhe Li wrote the manuscript.