Introduction

In many social species, individuals organize into hierarchical structures characterized by dominance and subordination relative to others. Social rank shapes access to resources and reduces conflict 1–3, and is the most prominent resilience factor identified to date 4–6. In mice, hierarchies manifest across different behavioral domains, including competitive encounters and agonistic behaviors such as chasing 7–9. However, the precise nature of the relationship between these different hierarchy-related behaviors remains unclear, particularly in relatively stable social groups and under semi-naturalistic longitudinal conditions 9–12. Moreover, prior studies have yielded inconsistent findings on whether specific facets of social positioning are influenced by individual differences in reward-related traits, such as impulsivity, cognitive flexibility, and inhibitory control 13–15.

Social structures in animals are often assumed to emerge from group dynamics that unfold over time 2,3. These processes are difficult to capture with reductionist paradigms 16–19 that are not designed for tracking moment-to-moment interactions over long periods and under ecologically valid conditions. Classic assays typically assess social and cognitive behaviors separately, often in different contexts or at different time points, limiting our ability to test how these domains interact and how stable they are within individuals 16. Moreover, standardized tests such as the tube test 20 rely on forced encounters and manual handling, which may introduce confounds 21,22 and thereby distort natural behavior.

Social hierarchies form and stabilize through repeated interactions 1–3,5. In mice, small groups tend to form highly despotic hierarchies 23–25, whereas larger groups exhibit more complex structures 26. These patterns suggest a strong influence of emergent group-level dynamics on social structure 12,26. Yet animals do not enter social groups as blank slates 27–29; stable, latent traits of the individuals may also contribute to hierarchy formation 30. Thus, it remains unclear to what extent social rank reflects significant stable, trait-like properties that generalize across different social contexts 31, or is driven purely by emergent group-level dynamics. Disentangling these possibilities requires experimental conditions that allow unperturbed, continuous tracking of all individuals in sufficiently large groups, together with systematic changes of group composition to modulate social context.

To address these open questions, we developed the Non-invasive Sensor-rich Maze (NoSeMaze). The NoSeMaze provides a fully automated, semi-naturalistic environment for continuous, long-term behavioral assessment of individual mice living in groups of ten. The system tracks real-time outcomes of naturalistic tube competitions, proactive chasing, and self-initiated reinforcement learning via a go/no-go olfactory task. Importantly, it continuously monitors social and cognitive domains in parallel, without human intervention. By combining these measurements across multiple group compositions, the NoSeMaze enables testing of the relative contributions of internalized traits to social rank, and quantifying how social and cognitive domains interact in shaping an individual’s social position.

The resulting massive longitudinal dataset, encompassing more than 4,000 mouse-days, was used to address three specific goals: (1) establish a high-resolution, longitudinal framework that captures individualized profiles of social status-related behaviors and reinforcement learning in group-living mice; (2) disentangle how proactive social behaviors relate to hierarchy networks from tube competitions, and whether either behavior associates with individual differences in non-social reward-seeking features; and (3) determine whether social status and proactive chasing, as well as non-social reward-seeking behaviors, represent stable individual traits or are dynamic features across time and with changing group members.

Results

Ecological longitudinal assessment in the NoSeMaze

The NoSeMaze provides a complex, semi-naturalistic environment designed for long-term automated assessment of reinforcement learning and social-status-related behaviors in mouse societies. It contains an open-field arena and a housing arena with nesting material and free access to food (Fig. 1A, Supplementary Fig. S1). The arenas are connected by two tubes equipped with RFID readers, which enable automated tracking of social interactions by detecting the identity and movement of individual RFID-tagged mice. Access to water is provided via the olfactory stimulus-outcome learning module, in which mice perform trials ad libitum to meet their water needs. The module is connected to the open-field arena, requiring mice to cross the tubes regularly to access food and water.

Automated tracking in the NoSeMaze and experimental design.

A, The NoSeMaze (see Supplementary Fig. S1) tracks social status and reinforcement learning in mouse societies (n = 9-10 per group) over multiple weeks without human intervention. RFID-tagged animals are automatically identified at the tube entrances and during olfactory stimulus-outcome learning tasks at the water lickport. B, Two experimental groups of male mice (young and older adults; n = 41 and n = 38, respectively) on a C57BL/6J background were used. Prior to the main experiment, repeated reshuffling of home cage group compositions ensured familiarity among all group members. Mice were then housed in the NoSeMaze for several three-week rounds. In each NoSeMaze round, group compositions were shuffled so that mice lived with different group members. Between rounds, mice returned to their home cages for several weeks, again including reshuffling of group compositions. NoSeMaze, Non-invasive Sensor-rich Maze; RFID, radio frequency identification.

We investigated a large population of 79 male mice on the C57BL/6J background that were homozygous for floxed oxytocin receptor alleles (OXTRfl/fl), enabling conditional deletion of the receptor. In 26 mice, OXTR was selectively deleted in the AON pars centralis (OXTRΔAON) via viral recombination, induced only after mice had reached full adulthood 32. OXTRΔAON mice show selective impairments in de-novo social olfactory recognition learning 32,33. The remaining 53 mice received a control virus that did not induce OXTR deletion.

The NoSeMaze was originally optimized for male mice because social hierarchies are better characterized in males 9. Female cohorts will be examined in future studies. Each mouse carried a subcutaneous RFID chip for individual identification. The study population comprised two age groups: younger (16-30 weeks) and older adults (55-97 weeks) (Fig. 1B). Mice lived in groups of 9-10 for multiple rounds in the NoSeMaze, with different group members in each round (Fig. 1B, see Supplementary Table S1). This allowed us to test which individual behaviors were stable across specific group compositions. Before the start of the experiment, mice had been housed in groups of four in their home cages, with regular reshuffling to ensure familiarity among potential group members. To prevent bias in group assignment and behavioral interpretation, both the composition of experimental NoSeMaze groups and the home-cage groupings between rounds were assigned randomly and blinded to prior behavioral outcomes, individual performance, or genotype. Up to four NoSeMazes operated in parallel, with each round spanning three weeks. Between rounds, mice were reassigned to different home cage groups for several weeks (mean: 6.8 weeks; SD: 2.8 weeks). In total, data from 21 NoSeMaze rounds, resulting in more than 4,000 mouse-days, were acquired and analyzed (see Supplementary Table S2).

Inter-individual differences in reward-based learning strategies in the NoSeMaze

Before the first round in the NoSeMaze, mice were acclimatized to the reward port (see Methods). To assess individual differences in reinforcement learning, they then performed a go/no-go task with two distinct odors in the stimulus-outcome learning module. Mice initiated trials by inserting their head into the odor port, where they were identified via RFID. In go trials, licking twice during the conditioned stimulus predicting reward (CS+) led to water delivery, while licking twice during the conditioned stimulus predicting no reward (CS−) in no-go trials triggered a timeout of six seconds (Fig. 2A). Reward contingencies alternated every three days between the two odors (i.e., ‘reversal’; Fig. 2B). As expected, activity levels peaked during the dark cycle (Fig. 2C). On average, mice performed 463 trials per day (Supplementary Fig. S2A). The number of trials required to reach 80% correct hits and correct rejections increased after the first reversal, suggesting that the initial reversal was challenging. With subsequent reversals, performance gradually improved, indicating that mice generally learned the task structure (Fig. 2D).

Inter-individual differences in reinforcement-learning in the NoSeMaze.

A, Mice performed stimulus-outcome learning trials for daily water intake, initiated by head insertion into the odor port. In CS+ trials, licking twice during odor presentation released water, while in CS− trials, it triggered a 6-second timeout. B, Reward contingencies switched every three days. C, Trial activity peaked during the dark phase. D, Number of trials required to reach 80% performance (hits or correct rejections, mean ± SEM) in each phase (cf. Fig. 2B). Performance decreased after the first reversal and improved in later phases, consistent with learning the task structure. Only data from the first round in the NoSeMaze was considered. E-G, Example lick patterns from three mice show variability in pre-CS licking rates and modulations during CS presentation in the stable phases of the last 150 trials before reversals. H, Pre-CS licking rates varied widely across animals. I, No correlation was observed between pre-CS licking rates and CS+ modulation peaks (see Supplementary Fig. S2B). J, The fraction of correct rejections was more broadly distributed across mice than hits. K, Pre-CS licking rates negatively correlated with correct rejection rates (see Supplementary Fig. S2C). L, Spearman’s correlation matrix (data from 17 groups) revealed significant relationships between reinforcement-learning metrics, including pre-CS licking rate, correct hit and rejection rates, CS+ modulation peak, and latency to switch after contingency reversals. Only statistically significant correlation coefficients are shown (Bonferroni-corrected for multiple comparisons). M, K-means clustering based on reward-seeking features identified three distinct learning strategies: impulsive go learners (purple), cautious no-go learners (green), and flexible learners (orange). N, The three learning strategies (color labeling as in M) differed across reward-seeking features including hit and rejection rates, pre-CS licking rates, CS switch latencies, and CS+ modulation peak.
Horizontal bars denote statistical comparisons between clusters (p-values from permutation tests with n = 100,000 permutations using the median). P-values were corrected for multiple comparisons using the Benjamini–Hochberg false discovery rate (FDR) procedure (α = 0.05). Significant comparisons after FDR correction are marked with #. CI, confidence interval; CS+, rewarded conditioned stimulus; CS−, unrewarded conditioned stimulus; SEM, standard error of the mean. Note: The regression line is illustrative only; statistical inference is based on Spearman’s correlation, which does not assume linearity.

Licking behavior varied widely across individuals, as shown for stable phases of the last 150 trials before reversals (Fig. 2E-G). Inter-individual differences were already seen in the pre-CS licking rates (Fig. 2H) that may serve as a proxy of impulsivity 34. Mice also differed in their licking patterns during reward anticipation. While all mice exhibited increased anticipatory licking during CS+ trials (CS+ modulation peak, cf. Fig. 2E-G), this measure of reward sensitivity was independent of pre-CS licking rates, suggesting distinct underlying mechanisms (Fig. 2I, Supplementary Fig. S2B). Correct hit rates were relatively high across animals (Fig. 2J), indicating effective CS+ recognition. The high variability in pre-CS licking and correct rejection rates (Fig. 2J) among individuals may again reflect individual differences in impulsivity and inhibitory control 34. Pre-CS licking was strongly negatively correlated with correct rejection rates (Fig. 2K, Supplementary Fig. S2C), suggesting impaired response suppression in these mice, leading to more timeouts in CS− trials (cf. Fig. 2B).

To quantify behavioral flexibility among mice, we measured CS+ and CS− switch latencies, defined as the trials required to restore (70% correct hits for CS+) or suppress (50% correct rejections for CS−) licking relative to each animal’s pre-reversal CS+ rate. As individualized measures, these latencies capture flexible adaptation rather than absolute performance. Higher pre-CS licking was associated with shorter CS+ switch latencies but slower adaptation to CS− reversals (Fig. 2L, Supplementary Fig. S3), suggesting that impulsivity facilitates detection of the rewarded cue but hinders economic response suppression.

We therefore examined whether multiple reward-seeking features cluster individuals into different learning strategies. K-means clustering based on CS+ and CS− median switch latencies, correct hit and rejection rates, CS+ modulation peak, and pre-CS licking identified three distinct learning strategies (Fig. 2M). Cautious no-go learners (Fig. 2N, green) exhibited the lowest impulsivity and adapted quickly to CS− reversals but updated CS+ associations more slowly, resulting in the lowest correct hit rates. In contrast, impulsive go learners (Fig. 2N, purple), characterized by the highest pre-CS licking, quickly adjusted to CS+ reversals but exhibited poor inhibitory control during CS− reversals, leading to frequent timeouts and a low correct rejection rate. Lastly, flexible learners (Fig. 2N, orange), who showed relatively low impulsivity, efficiently adapted to both CS+ and CS− reversals, achieving high performance in correct hits and rejections. This last cluster thus displayed an optimal balance between retrieving rewards with minimal timeouts while adapting effectively when cognitive flexibility was required. Flexible learners formed the largest group and also exhibited the strongest modulation of anticipatory CS+ licking.
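The clustering step described above can be illustrated with a minimal sketch. It assumes that each mouse is represented by one row of reward-seeking features and that features are z-scored before clustering; the function names `zscore_columns` and `kmeans`, the random initialization, and the iteration limit are illustrative assumptions and are not taken from the analysis pipeline of this study.

```python
import random
import statistics


def zscore_columns(rows):
    """Standardize each feature (column) to mean 0 and SD 1 (assumed preprocessing)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def kmeans(rows, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # k distinct rows as start points
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centroids[c])))
            for r in rows
        ]
        if new_labels == labels:  # converged, assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels
```

For the six features named above, `kmeans(zscore_columns(feature_rows), k=3)` would return one of three cluster labels per mouse, where `feature_rows` is a hypothetical list of per-mouse feature vectors. K-means results depend on initialization, so in practice several restarts are usually compared.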

In summary, the NoSeMaze provides quantifiable measures of reinforcement learning, including flexibility, response adaptation, and impulsivity. The three behavioral clusters demonstrate how the NoSeMaze captures individual differences in learning flexibility and inhibitory control.

Social interactions in the tubes

The NoSeMaze enables automated, real-time tracking of two distinct social interactions in the tubes: dyadic dominance competitions and voluntary tube chasing.

Tube competitions occur incidentally when two mice enter the tube from opposite ends at the same time, revealing dominance as the other retreats (Fig. 3A; see Supplementary Videos S1–3 for insightful examples). The dominant mouse is detected sequentially at both ends, whereas the subordinate is detected twice at its entry site – once upon entry and again when retreating. Each dyadic tube competition results in symmetrical win/loss outcomes, enabling the construction of interaction networks and social hierarchies using David’s scores (Fig. 3B), which also consider the opponents’ ranks. Competitions occurred primarily during the dark cycle (Supplementary Fig. S4A-B).
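As an illustration of how a win/loss matrix translates into hierarchy positions, the following sketch computes David's scores in their basic proportion-based form. It assumes that `wins[i][j]` holds the number of competitions mouse i won against mouse j and that dyads that never met contribute zero; whether the study applied a chance-corrected dyadic index is not stated here, so this variant is an assumption, as are the function names.

```python
def davids_score(wins):
    """David's score from a square matrix where wins[i][j] = wins of i over j."""
    n = len(wins)
    # P[i][j]: proportion of encounters between i and j that i won.
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                total = wins[i][j] + wins[j][i]
                if total > 0:  # dyads without encounters stay at 0
                    P[i][j] = wins[i][j] / total
    w = [sum(P[i]) for i in range(n)]                        # summed win proportions
    l = [sum(P[j][i] for j in range(n)) for i in range(n)]   # summed loss proportions
    # Second-order terms weight each dyad by the opponent's own record,
    # which is how the score takes the opponents' standing into account.
    w2 = [sum(P[i][j] * w[j] for j in range(n)) for i in range(n)]
    l2 = [sum(P[j][i] * l[j] for j in range(n)) for i in range(n)]
    return [w[i] + w2[i] - l[i] - l2[i] for i in range(n)]


def ranks_from_scores(scores):
    """Convert scores to linear ranks, 1 = highest score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks
```

For a perfectly linear three-animal hierarchy, `davids_score([[0, 4, 4], [0, 0, 4], [0, 0, 0]])` yields scores of 3, 0, and -3, which `ranks_from_scores` converts to ranks 1, 2, and 3.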

Social hierarchy in the spontaneous tube competitions in the NoSeMaze.

A, Incidental encounters in the tubes triggered dyadic tube competitions. B, Network graph derived from the cumulative tube competitions in an example group over three weeks. Outgoing arrows indicate wins, incoming arrows indicate losses. Hierarchical positions were calculated using the David’s score and can be converted into linear social ranks from 1 (highest) to 10 (lowest). C, Box plots of metrics characterizing social hierarchy for 18 groups, including transitivity, steepness, stability, and uncertainty-by-repeatability (for details, see ‘Source Data’). D, Scatterplots show that David’s scores (z-scored) were significantly correlated across the three different weeks, indicating temporal stability of social hierarchy within one NoSeMaze round (data from 18 groups). E, David’s scores were strongly correlated with Elo ratings, demonstrating convergent validity of rank measures. F, Elo ratings corrected for differences in entry time were highly correlated with uncorrected Elo ratings. G, Relative body weight (z-scored per group) was significantly negatively correlated with the fraction of losses in tube competitions (cube root transformed, red), with heavier animals being less likely to lose. No association was found between body weight and the fraction of wins in tube competitions (cube root transformed). Body weight was measured prior to the animals’ introduction to the NoSeMaze. CI, confidence interval; SD, standard deviation. Note: Regression lines are illustrative only; statistical inference is based on Spearman’s correlation, which does not assume linearity.

Hierarchies derived from tube competitions were well-structured and stable, showing transitivity values exceeding 0.7 and moderate steepness across groups (Fig. 3C, see ‘Source Data’). Stability scores above 0.8 and high uncertainty-by-repeatability values 35 confirmed robust hierarchy sampling over the three-week period (Fig. 3C). David’s scores also showed stable correlations across weeks (Fig. 3D), indicating consistent social ranks over time. Moreover, David’s scores correlated strongly with dynamic hierarchy measures such as the Elo rating (Fig. 3E), demonstrating their convergent validity in capturing social rank. Notably, social ranks with and without adjustment for entry time differences were highly correlated (Fig. 3F), supporting that rank assessments were largely independent of entry time. Finally, David’s scores from the NoSeMaze significantly correlated with those from a manual tube test conducted after the experiment, when animals were again individually housed (Supplementary Fig. S4C).
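In contrast to the cumulative David's score, an Elo rating is updated after every single competition, which is what makes it a dynamic hierarchy measure. A minimal sketch of the standard Elo update follows; the starting value of 1000, the gain factor `k`, and the function name are illustrative assumptions, and the entry-time correction mentioned above is not modelled.

```python
def elo_ratings(contests, n_animals, k=100.0, start=1000.0):
    """Sequential Elo ratings from a time-ordered list of (winner, loser) index pairs."""
    ratings = [start] * n_animals
    for winner, loser in contests:
        # Expected probability that the eventual winner wins, given current ratings.
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        # Unexpected wins (low expected_win) shift the ratings more strongly.
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return ratings
```

Because every update adds to one animal what it subtracts from the other, the group mean rating stays at the starting value, and only the order of contests determines how ratings evolve.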

Body weight showed a moderate positive association with social hierarchy (Supplementary Fig. S5A). Interestingly, this association was driven by a significantly lower fraction of losses in heavier individuals (Fig. 3G, Supplementary Fig. S5B), while body weight did not correlate with the fraction of wins (Fig. 3G, Supplementary Fig. S5C). Thus, larger mice were less likely to be defeated in tube competitions, but did not win more frequently.

Unlike incidental tube competitions, tube chasing is a voluntarily initiated interaction in which one mouse actively chases another through the tube (Fig. 4A). To quantify these behaviors, we calculated the fraction of actively initiated chases and the fraction of times being chased for each mouse, relative to the total number of chasing events in its group (Fig. 4A, Supplementary Fig. S6A, left panel). Chasing events occurred predominantly at night (Fig. 4B). The expression of chasing behavior varied widely across individuals (cf. Fig. 4A, Supplementary Fig. S6A, left panel) and was notably skewed. More specifically, a small subset of mice initiated most chases (Fig. 4C), with the two top chasers accounting for more than 50% of all chases. Across weeks, individual chasing behavior remained stable within the same NoSeMaze round (Fig. 4D, Supplementary Fig. S7). Body weight was unrelated to active chases (Fig. 4E), but heavier animals were less likely to be chased.

Proactive chasing behavior in the NoSeMaze.

A, Schematic of the NoSeMaze setup for automated tracking of voluntary tube chasing. Chasing was quantified as the fraction of chases initiated by each individual relative to the total number of chases in their group (‘active chases’), as well as by the fraction of times being chased (see Supplementary Fig. S6A). B, Chasing events occurred predominantly during the dark phase. C, Cumulative fraction of chases initiated (dark blue) and received (dark red) by individuals ranked within each group. Top-ranking individuals initiated a high number of chases, with the two highest-ranking individuals accounting for more than 50% of all chases. Asterisks mark significant differences between the two curves (unpaired permutation test, n = 10,000 permutations, ***, p < 0.001). D, Active chases were highly consistent across weeks. E, Active chases were not significantly correlated with relative weight before entering the NoSeMaze. However, the fraction of times being chased was negatively correlated with weight, indicating that heavier mice were less likely to be chased. CI, confidence interval; SD, standard deviation. Note: Regression lines are illustrative only; statistical inference is based on Spearman’s correlation, which does not assume linearity.

In summary, social interactions in the NoSeMaze revealed distinct, quantifiable differences in social dominance and proactive chasing, which remained stable within a NoSeMaze round. Tube competitions established reliable dominance hierarchies, while tube chasing reflected individual differences in proactive social behavior.

Behavioral features in the NoSeMaze reflect stable individual traits

To assess whether behavioral features persist beyond a single NoSeMaze round, we examined their stability across rounds with different group members. More specifically, we quantified across-round stability with Spearman’s correlations between rounds 1 and 2 and with intraclass correlation coefficients (ICCs) computed across all rounds. High between-round correlations and high ICCs would indicate that these behaviors are dominated by stable internalized traits rather than transient adjustments to changing social contexts.
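The between-round comparison can be sketched as a rank correlation between the values that each mouse obtained in two rounds. The following minimal implementation computes Spearman's ρ as the Pearson correlation of average ranks; the function names are illustrative, confidence intervals and the ICC are not covered, and the inputs are assumed to be paired per animal with no missing values.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Because only ranks enter the calculation, a mouse that keeps its relative position within the population contributes to a high ρ even if absolute values shift between rounds.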

We found strong correlations between rounds for social hierarchy and volitional chasing. Specifically, stability was high for the competition-based David’s score (Fig. 5A, ρ = 0.57), as well as for the fraction of active chases (Fig. 5B, ρ = 0.75) and of times being chased (Fig. 5C, ρ = 0.54). Across the full series, ICCs for these social measures were also in the good–excellent range, indicating high across-round stability (Fig. 5D).

Stability of social and reward-seeking features across repeated NoSeMaze rounds.

A–D, Social hierarchy and chasing behavior were stable across NoSeMaze rounds. Individual David’s scores (A), fraction of active chases (B), and fraction of times being chased (C) were all significantly correlated between round 1 and 2 (Spearman’s ρ = 0.57, 0.75, and 0.54, respectively; all p < 0.001). Lines in A–C are illustrative only; inference is based on Spearman’s correlation. D, Intraclass correlation coefficients (ICC) confirm similar stability for the three metrics across all rounds. Blue circles show Spearman’s ρ between rounds 1 and 2 with 95% CIs; golden squares show ICC with 95% CIs estimated across all rounds to quantify across-round stability of each metric. E, Stability of key reward-seeking features. As in D, blue circles depict Spearman’s ρ between rounds 1 and 2 (95% CIs), whereas orange squares give ICCs computed across all rounds (95% CIs) to index stability over the full longitudinal series. Features include correct hits and correct rejections, pre-CS licking rate, CS+ modulation peak, and switch latencies for CS+ and CS− (see also Supplementary Fig. S8A-C). F, Sankey diagram illustrates consistency in individual reward-seeking strategies across rounds. Most mice retained their behavioral strategy classification (flexible, cautious, or impulsive) between rounds, with the majority of flexible and impulsive animals remaining in the same category (see Supplementary Fig. S8D). CI, confidence interval; CS+, rewarded conditioned stimulus; CS−, unrewarded conditioned stimulus. Note: Regression lines are illustrative only; statistical inference is based on Spearman’s correlation, which does not assume linearity.

For reinforcement learning, some features were stable across rounds while others were not (Fig. 5E). Correct hit rates correlated only moderately between rounds (ρ = 0.34), with even lower ICCs (ICC = 0.21). In contrast, pre-CS licking (ρ = 0.69, Supplementary Fig. S8A) and correct rejection rates (ρ = 0.78, Supplementary Fig. S8B) showed higher between-round correlations but only moderate ICCs. Despite shifts in absolute levels, the pattern is consistent with their interpretation as stable individual traits related to impulsivity and inhibitory control (cf. Fig. 5E). Reward sensitivity, measured by peak CS+ licking modulation during anticipation, also remained stable across rounds (ρ = 0.56, cf. Fig. 5E, Supplementary Fig. S8C) with a good ICC. In contrast, the stability of CS+ and CS− switch latencies varied substantially. CS+ switch latencies exhibited low-to-moderate between-round correlations (ρ = 0.16 to 0.45 across reversals; median: ρ = 0.50) and only fair ICCs, whereas CS− switch latencies showed no reliable between-round correlation and poor ICCs (Fig. 5E). The generally shorter switch latencies in the second round (Supplementary Table S3) are consistent with meta-learning of the reversal rule, with a stronger effect on suppressing responses to CS− than on re-learning CS+.

Cluster assignments across rounds showed moderate-to-strong contingencies (χ² = 25.51, p < 0.0001, Cramér’s V = 0.4573, Fig. 5F). Flexible learners, and especially impulsive go-learners, remained mostly stable across rounds (contingencies: 62.5% and 72.2%, respectively, Supplementary Fig. S8D). In contrast, cautious no-go learners were more likely to shift (contingency: 36.4%).
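The effect size reported above relates the χ² statistic to the table size. A minimal sketch is given below, assuming a contingency table with round 1 clusters as rows and round 2 clusters as columns, Pearson's χ² without continuity correction, and no empty rows or columns; the function names are illustrative.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and total count for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected under independence of row and column category.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1))), ranging from 0 to 1."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))
```

A value of 0 corresponds to cluster membership in one round carrying no information about the next, and a value of 1 to every animal keeping a fully predictable assignment.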

Together, convergent evidence from Spearman’s correlations between the first two rounds and ICCs across all rounds indicates that social hierarchy, chasing behavior, and reward-seeking features including impulsivity, inhibitory control, and reward sensitivity measured in the NoSeMaze represent stable individual traits.

We also tested whether a type of de novo social olfactory learning required for social clique formation in these mice 36 plays a critical role in the domains studied here. Consistent with largely internalized traits, OXTRΔAON produced only transient effects. Social rank was significantly lower during the first week in OXTRΔAON mice, but normalized over the following two weeks (Supplementary Fig. S9A), and no effects of OXTRΔAON were observed on chasing behavior (Supplementary Fig. S9B-C) or reinforcement learning (Supplementary Fig. S10). In line with this, all reported stability relationships remained robust and statistically significant when controlling for OXTRΔAON (see Extended Data Main).

Relationship between social rank, chasing behavior, and reward-seeking traits

Finally, we examined whether and how competition-based social rank, proactive chasing behavior, and reward-seeking features were related, both at the individual and group level. We first compared social rank and chasing. Notably, David’s scores from tube competitions showed a moderately positive association with the fraction of active chases (Fig. 6A, ρ = 0.34, p<0.001), suggesting that higher-ranked mice were more likely to initiate chases. However, this association varied substantially across groups (range: ρ = –0.03 to ρ = 0.62, cf. Fig. 6B). We thus asked whether the strength of this relationship depended on how clearly the group’s hierarchy was structured. Indeed, at the group level, the strength of the association between David’s scores and proactive chasing was negatively related to transitivity (Fig. 6B, ρ = –0.64, p<0.001), indicating that in groups with less clearly defined hierarchies, chasing behavior aligned more strongly with social rank from tube competitions. This suggests that aggressive signaling of status becomes more important when dominance relationships are ambiguous. Conversely, in highly transitive groups, where hierarchies were well established, chasing behavior was less closely aligned with rank, implying that clear social structure reduces the need for proactive reinforcement of dominance. Supporting this interpretation, the fraction of active chases performed by the top-ranking individual also decreased with higher group transitivity (Supplementary Fig. S11, ρ = –0.61, p<0.001).

Relationship between social rank, chasing behavior, and reward-seeking features.

A, Social hierarchy (tube competition David’s score) was positively correlated with the fraction of active chases. B, Across groups, the correlation coefficient calculated between David’s scores from tube competitions and fraction of active chases (x-axis, calculated separately within each group, as in A) was negatively associated with the group-level transitivity of the social hierarchy derived from tube competitions. Groups with lower transitivity showed a stronger association between social rank and chasing, suggesting that aggressive status signaling via chasing becomes more relevant when hierarchies are less well-defined. C, Correlation matrix showing relationships between social hierarchy and chasing metrics. Color scale indicates Spearman’s correlation coefficients; statistically significant values (Bonferroni-corrected) are shown in white. Note that the David’s score for tube competitions correlated with both the fraction of wins and the fraction of losses in tube competitions, while the ‘chasing David’s score’ was strongly correlated with the fraction of active chases, but not with the fraction of times being chased. D-F, Wins and losses in tube competitions (D) were concentrated in the top-right, occurring most consistently between animals of opposing ranks. In contrast, chasing interactions (E, F) clustered in the top-left, and thus within a subset of high-rank mice, in which they may help clarify dominance. G, Comparison of the fraction of active chases between the three different clusters defined from the reward-seeking task. Cautious mice initiated fewer chases than flexible individuals (p-values from permutation tests with n = 100,000 permutations using the median). Box plots show the median (line), interquartile range (IQR, box), and whiskers extending to 1.5 × the IQR. P-values were corrected for multiple comparisons using the Benjamini–Hochberg false discovery rate (FDR) procedure (α = 0.05). Significant comparisons after FDR correction are marked with #.
H, Comparison of the David’s score from tube competitions between the three different clusters defined from the reward-seeking task showed no significant difference (p-values from permutation tests with n = 100,000 permutations using the median). Box plots show the median (line), interquartile range (IQR, box), and whiskers extending to 1.5 × the IQR. I, Correlation heatmap between social and reward-seeking traits showed no significant associations after FDR correction for multiple comparisons. J, Loading values from a principal component analysis of social and reward-seeking (cognitive) traits for the top three components (PC1–PC3). Bars indicate each trait’s contribution to the respective component. PC1 and PC2 were dominated by cognitive traits (p = 0.0009 and p = 0.0063), PC3 by social traits (p = 0.0016; permutation tests, n = 10,000). The largely non-overlapping loadings suggest orthogonality between social and cognitive domains. CI, confidence interval; CS+, rewarded conditioned stimulus; CS−, unrewarded conditioned stimulus. Note: Regression lines are illustrative only; statistical inference is based on Spearman’s correlation, which does not assume linearity.

To gain further insight into the interplay between social rank and chasing behavior, we examined the structural properties of both behaviors in more detail. When considering only social rank, the David’s score correlated strongly with both the fraction of wins (ρ = 0.76, p<0.001) and the fraction of losses (ρ = -0.67, p<0.001) in tube competitions (Fig. 6C, Supplementary Fig. S12A-B). The automated tube test thus meets the assumption of symmetry, where both dominant pushing and subordinate retreat equally contribute to social rank 1,37. Furthermore, the lack of association between number of competition events and social rank (ρ = 0.11, not significant; Fig. 6C, Supplementary Fig. S12C) underscores that tube competitions occurred incidentally, ruling out the possibility that lower-ranking mice actively avoided encounters in the tube. This, in turn, aligns with the design and geometry of the automated tube test, where regular crossings are necessary to access resources. In contrast, these assumptions were not met for chasing behavior in the tube (Fig. 6C). This is best illustrated by constructing a ‘chasing David’s score’. This score was positively associated with the total number of chasing events (ρ = 0.45, p<0.001, Fig. 6C, Supplementary Fig. S12D), reinforcing that chasing reflects volitional, actively initiated behavior. Importantly, the constructed ‘chasing David’s score’ also correlated strongly with the fraction of active chases (ρ = 0.85, p<0.001, Fig. 6C, Supplementary Fig. S12E) but was entirely unrelated to the fraction of times being chased (ρ = -0.01, p=0.897, Fig. 6C, Supplementary Fig. S12F). This fundamental difference suggests that, within a group, chasing behavior did not follow a symmetric dominance-subordination gradient as seen in social rank. Instead, it reflected an unbalanced distribution of initiators and recipients.
Consistently, the relationship between the fractions of active chases and being chased differed significantly from the relationship between the fraction of wins and losses in the tube competition (cf. Supplementary Fig. S6A-B, Supplementary Fig. S13). As a result, the fraction of active chases provided a more direct measure of volitional engagement in chasing events than its David’s score, as it directly quantified proactive initiation.

With this distinction in mind, we further explored the specific relationship between incidental competitions and actively initiated chases (Fig. 6D-F, Supplementary Fig. S14). Ranked wins and losses in the tube competitions (Fig. 6D, Supplementary Fig. S14A) distributed differently from active chases and being chased (Fig. 6E, Supplementary Fig. S14B). Specifically, a subset of mice expressed high numbers of both active chases and being chased (Fig. 6F), and this subset preferentially mapped onto mice with high social ranks in tube competitions (top left cluster, Fig. 6F, Supplementary Fig. S14C). As such, chasing behavior may serve to negotiate the social rank among animals high in hierarchy.

Finally, we examined how social traits mapped onto reward-seeking strategies. Cautious no-go learners were less likely to engage in active chases compared to flexible learners (Fig. 6G). Apart from this, no significant group differences were found for competition-based David’s score (cf. Fig. 6H) or fraction of times being chased (Supplementary Fig. S15). Moreover, neither social rank nor volitional chasing was associated with individual reward-seeking traits, including impulsive pre-CS licking, reward sensitivity, inhibitory control, or cognitive flexibility (Fig. 6I). Consistent with this dissociation, principal component analysis showed that social and cognitive traits loaded onto distinct components, further supporting their independence (Fig. 6J). Notably, all reported between-metric relationships remained robust and statistically significant when controlling for OXTRΔAON (see Extended Data Main).

Together, these findings suggest that non-social cognitive traits explain little of the variation in social dominance behaviors; instead, social and cognitive traits form largely orthogonal behavioral domains when describing ‘personalities’.

Discussion

Understanding behavioral individuality and social structures requires moving beyond reductionist paradigms toward enriched, ecological assessments. Here, we show that individual mice carry stable, internalized, and multi-faceted profiles of social status and cognitive traits across changing group contexts. To capture this, we developed the NoSeMaze, an open-source modular platform for continuous tracking of spontaneous behavior in socially housed mice. Although ecological approaches are increasingly valued in neuroscience 16, few existing systems support long-term, automated monitoring of individuals across multiple behavioral domains in larger mouse groups. Most current paradigms are tailored to specific behaviors, such as aggression 23,24, conflict 38, or locomotor activity 29, and offer only limited integrative capacities 12,18,39–41. The NoSeMaze addresses this gap through the simultaneous, longitudinal assessment of social hierarchy, proactive chasing, and olfactory reinforcement learning in a semi-naturalistic space. Its modular, open design supports future expansion to other behavioral domains and integration with neurobiological tools, providing a flexible framework for studying social structure and behavioral individuality. This study focused on the relationships between behavioral domains that reflect stable dimensions of individuality. Importantly, neither the stability of these traits across rounds nor their relationships to one another were altered by the OXTRΔAON phenotype. The richness of the dataset enabled multiple lines of investigation. Social status and chasing were largely unaffected by OXTRΔAON, apart from an initially slightly lower social rank during the first week that normalized thereafter. By contrast, OXTRΔAON did affect de novo higher-order social bonding, as identified by video tracking of self-paced interactions in the same cohort 36. 
Together, these findings suggest that OXT-dependent olfactory learning is critical for higher-order social bonding but plays a limited role in shaping hierarchy-related behaviors.

Social hierarchies 1,42 were assessed from incidental dyadic interactions in the novel, automated tube-test module. This method enabled rank estimation without experimenter intervention and was validated against traditional tube testing. Importantly, participation frequency and differences in tube entry time did not confound the resulting ranks, underscoring the robustness of the automated hierarchy assessment. Social rank was equally shaped by dominance and subordination in tube competitions. This supports Bernstein’s assertion 37 that hierarchies are maintained not only by dominance but also by voluntary subordination. While this study focused on male mice, in which social hierarchies are best established 9, future work is needed to explore sex-specific expressions of social structure and their neurobiological underpinnings in female groups. Critically, male hierarchies were robustly maintained within the same group over time. Even more interestingly, the longitudinal accelerated design, in which animals lived in different groups composed of different individuals, allowed us to test whether social rank is predominantly driven by group composition or by internalized social status of the individuals, expressed irrespective of social context. Indeed, we observed that social rank is internalized, with individuals occupying similar positions across groups with different members. This temporal and contextual stability defines social rank as a stable, trait-like characteristic of individuals. Our findings also highlight the value of the NoSeMaze for lifespan-oriented studies of behavioral individuality.

Existing models in more constrained or despotic conditions often view chasing as a unidirectional, dominance-related behavior directed at subordinates 9,43,44. Our socially enriched, semi-naturalistic setting reveals a more nuanced and complex role for chasing behavior: it is both individually stable and socially context-sensitive. Chasing was neither broadly distributed nor consistently directed down the hierarchy. Instead, it was initiated by a small subset of individuals – primarily those occupying high tube ranks – and frequently occurred reciprocally within this group, suggesting intra-elite social dynamics rather than broad dominance enforcement. Rather than solely serving to impose hierarchy, chasing appeared to function as a means through which high-status individuals monitor, negotiate, and maintain their relative positions within the top tier. Notably, the identity of frequent chasers remained stable over time and persisted across changing group compositions, indicating that the propensity to initiate chases reflects a consistent individual-level trait rather than a purely situational response.

However, the function and social significance of chasing varied by context. Specifically, its relationship with social rank was not fixed but modulated by the group’s dominance structure. In groups with less clearly defined hierarchies (i.e., lower transitivity), chasing behavior aligned more strongly with rank, suggesting that mice in less structured groups rely more on proactive status signaling to clarify their social position. Indeed, the top-ranked mice in these groups exhibited relatively high levels of active chasing. This context-sensitivity highlights chasing’s dual role: it serves as a tool for elite-level status negotiation among high-ranked individuals, and additionally functions to establish or reinforce hierarchical clarity when social structures are ambiguous. These dynamic aspects of chasing are not captured by the tube test, which relies on incidental, reactive encounters and lacks the volitional and sustained nature of proactive social interaction. Within the more naturalistic complex environment, chasing emerges as a flexible behavioral dimension that is dissociable from formal rank yet contributes dynamically to the maintenance, negotiation, or clarification of social structure. These findings reveal a novel dimension of social dynamics: chasing is not merely a dominance display but a flexible context-dependent mechanism shaped by both individual disposition and group-level social structure.

In addition to behavioral strategies, physical characteristics can also influence patterns of social interaction. Body weight is a factor implicated in social rank and aggression 45, yet studies in mice have yielded mixed results 2,46. In the NoSeMaze, heavier animals were less likely to lose tube competitions or to be targeted in chases, but they were not more likely to win tube competitions or initiate chases. This suggests that weight may buffer against subordination rather than confer dominance, acting as a passive protective factor that reduces the likelihood of being challenged by others.

Beyond social and physical characteristics, we also assessed reward-guided cognitive traits, including impulsivity, reward sensitivity, and inhibitory control. These traits were relatively stable despite the dynamics of reinforcement learning. Mice adopted distinct cognitive styles: impulsive ‘go’, cautious ‘no-go’, and flexible learners. The two styles associated with the most effective reward retrieval also showed the highest stability over months. The cognitive profiles also map onto known axes of behavioral variation observed across species 47–50. Although both domains revealed internally consistent trait profiles, social and cognitive features were largely orthogonal. This is in line with meta-analyses showing weak associations between social status and cognitive traits in mice 13. However, when viewed as composite styles, some links emerged: cautious ‘no-go’ learners were least likely to chase others, while flexible learners showed the highest levels of proactive social interaction. Contrary to expectation, impulsivity did not reliably predict chasing behavior, suggesting that social engagement in dominance negotiation may require flexible rather than impulsive behavior.

Together, these findings highlight the value of the NoSeMaze as an integrative ecological platform for dissecting social group organization and behavioral individuality. Social status features represent major factors in conferring resilience. The approach may thus provide a suitable testbed for modeling and causally dissecting resilience mechanisms. The NoSeMaze balances environmental complexity with experimental control, enabling longitudinal high-dimensional phenotyping of individuals. Its modular design allows for the future introduction of environmental stressors, hierarchy perturbations, or resource competition, while maintaining the social and cognitive richness of naturalistic settings. Crucially, the NoSeMaze is compatible with longitudinal, individualized neuroscience, including chemogenetic 51 or pharmacological circuit manipulations that can be applied in a time-locked, subject-specific manner via the lickports. The NoSeMaze can also be combined with intermittent task-based fMRI protocols, mirroring study designs currently emerging in human neuroscience 52,53. Together, this makes the NoSeMaze uniquely suited for unraveling the neural, genetic, and environmental contributions to individuality over time and across contexts.

Conclusion

The NoSeMaze provides a novel and powerful platform for studying the emergence of social structure and behavioral individuality in socially housed mice. By enabling continuous, automated tracking of social hierarchy, proactive social behaviors, and cognitive strategies within the same ecological context, the NoSeMaze revealed that individual behavioral traits are remarkably stable over time, even across changing group compositions. This suggests that social status is not merely a product of immediate social environments, but rather reflects a stable, internalized construct that individuals carry across different social contexts.

Crucially, social status was not a single axis but comprised distinct dimensions: competition-based rank and proactive chasing. Chasing behavior played a dual role, reflecting a stable individual tendency while also adapting to group-level structure. It was most pronounced among high-ranking individuals and aligned more closely with formal rank when hierarchies were less well clarified, suggesting that chasing serves both trait-driven and context-sensitive functions. At the same time, social and cognitive traits were largely independent, challenging the idea of unified behavioral types and supporting a multidimensional view of individuality.

Finally, the NoSeMaze exemplifies how the complexity of real-world behavior can be captured without sacrificing experimental control. Its modular design and compatibility with longitudinal neural interventions make it an ideal tool for linking behavioral traits to neural circuit dynamics, genetic factors, and environmental influences. By combining high-dimensional phenotyping with ecological validity, NoSeMaze opens new avenues for understanding how stable individual differences in social and cognitive function arise and how they shape, and are shaped by, the brain across time and context.

Methods

Animals

Animal strain and housing conditions

A total of 79 adult male homozygous OXTRfl/fl mice (B6.129(SJL)-Oxtrtm1.1Wsy/J, RRID:IMSR_JAX:008471, Jackson Laboratory) backcrossed > F10 to C57BL/6J background (Charles River, Sulzfeld) were used for the experiments. Of these, 26 mice were injected six weeks before the start of the experiment with an AAV expressing Cre recombinase (rAAV1/2-CBA-Cre) into the AON pars centralis to induce bilateral OXTR deletion (OXTRΔAON). The remaining 53 animals received an AAV expressing only dTomato (rAAV1/2-CBA-dTomato). Details on viral constructs, stereotactic surgery, and injection procedures are provided in Wolf et al. 32. Genotype-dependent effects are mainly addressed in a separate manuscript 36. Each mouse was implanted in the neck with a subcutaneous RFID chip (Euro I.D. ID 100 trovan®) for individual identification using a microchip syringe injector under brief anesthesia with 1.5% isoflurane. During home cage conditions, mice were housed in groups of four at a 12 h day-and-night-cycle (room temperature 24 °C, air humidity 55%). Food and water were provided ad libitum. Before entering the NoSeMaze for the first time, group compositions in the home cages were shuffled every 3-4 days to ensure that each mouse had the opportunity to interact with all potential group members (cf. Fig. 1B). A similar reshuffling procedure was applied both during the intervals between NoSeMaze rounds and to the experimental group compositions within each NoSeMaze round. Notably, group assignments in both the NoSeMaze rounds and the home cages were randomized and blinded to prior behavioral outcomes or individual performance. In the NoSeMaze, animals had free access to food. Water was obtained from the self-paced stimulus-outcome learning task that was accessible 24/7. The number of trials determined the water intake, which was continuously monitored by the number of licks in go trials.

Ethics statement

All procedures were in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and the EU 2010/63 directive, and approved by the local animal welfare authority (Referat 35, Regierungspräsidium Karlsruhe, Karlsruhe, Germany).

Sex

In this study, we used male mice, for which social hierarchy is best studied 9 and for which the NoSeMaze was originally designed and optimized. The cohabitation of male and female mice in the same NoSeMaze leads to mating and thus different dynamics in the social behavior. Future studies will leverage current efforts to determine optimal NoSeMaze conditions for female mouse societies to generalize the observations across sexes.

Non-invasive sensor-rich maze (NoSeMaze) and behavioral data acquisition

NoSeMaze infrastructure and habituation

The NoSeMaze contains an olfactory learning module (Fig. 1A and Supplementary Fig. S1), which is controlled by an open source Python software that we modified based on Erskine et al. 40. Full documentation is available at https://github.com/KelschLAB/NoSeMaze. Four identical NoSeMazes, each housing n=9-10 male mice, were used in parallel (for group sizes, see Supplementary Table S2). All mice were habituated to the NoSeMaze twice in subgroups of 6-7 animals. Each acclimatization period lasted 16-20 hours including one dark cycle, giving the animals enough time to explore the novel environment. The age ranges at which mice entered the NoSeMaze for the different rounds are shown in Supplementary Table S2.

Stimulus-outcome learning task at the water lick port

The task at the olfactory stimulus-outcome learning module, where the animals earned their daily water, was gradually introduced during acclimatization. Decanal (Sigma Aldrich W236217) and octanal (Sigma Aldrich W279706), both diluted in mineral oil, were used as odors and applied at a final concentration of 0.1 % with a custom-built olfactometer. Initially, a drop of water (10 µl) was given each time the lick port was visited. Then, only one of the two odors was associated with a water drop. The odor presentation started 500 ms after the animal was recognized by its RFID chip and lasted for 2 s (cf. Fig. 2A). Once the mice lived continuously in the NoSeMaze, in the go trials, one odor (CS+) was followed by a water drop in 100% of the trials if the animal licked twice during the odor presentation. In the no-go trials, the other odor (CS−) was presented and not rewarded. If the animals licked more than once during odor presentation, they received a timeout of 6 s before they could initiate the next trial. The reward contingencies switched every three days (cf. Fig. 2B). The number of trials and licks were registered with a Darlington sensor and assigned to the RFID tag of each mouse recognized by the antenna at the lick port. Together with the licking behavior, the water intake per animal was recorded and controlled daily. All animals consumed at least 2 ml (up to 6 ml) water per day.
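The trial contingencies described above can be summarized in a few lines of Python. This is only an illustrative sketch: the function name, its arguments and the returned labels are hypothetical and not taken from the NoSeMaze software, and it assumes that "licked twice" means at least two licks and that the timeout rule applies to no-go (CS−) trials.

```python
def trial_outcome(is_cs_plus, licks_during_odor):
    """Return the consequence of one trial (hypothetical labels).

    is_cs_plus: True for a go trial (rewarded odor), False for a no-go trial.
    licks_during_odor: number of licks registered during the 2 s odor window.
    """
    if is_cs_plus:
        # Go trial: a water drop is delivered if the animal licked at least twice.
        return "reward" if licks_during_odor >= 2 else "no_reward"
    # No-go trial (assumption): more than one lick triggers the 6 s timeout.
    return "timeout_6s" if licks_during_odor > 1 else "no_timeout"
```

For example, `trial_outcome(True, 3)` yields a reward, whereas `trial_outcome(False, 3)` yields a timeout.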

Tube competitions and chasing

Two tubes connected the housing area and the social arena, where the access to the water port was located. Thus, mice had to cross the tubes regularly to consume both food and water. Custom-written MATLAB scripts detected events of tube competitions and tube chasing. A competition event was defined as a sequence of RFID-tag detections indicating that two mice entered the tube from different sides at the same time and one mouse pushed the other back out of the tube. While the ‘dominant’ mouse was detected on both sides of the tube, the ‘subordinate’ mouse was detected twice at its entry detector (when entering and when exiting backwards, see Fig. 3A). Detection patterns of putative competition events were visually inspected and included in the analysis if they met the criteria outlined above. A ‘chasing event’ was defined as a sequence of detections indicating the simultaneous presence of two mice in the tube, moving in the same direction and in close succession (Fig. 4A). Events were classified as chases if (1) the time difference between the two animals at the detector was less than 1.5 seconds (Δtime at detector < 1.5 s), and (2) both animals crossed the tube within 2 seconds (Δtime through tube < 2 s). Animals were visually inspected daily for their well-being. Notably, we did not observe any intensive fighting or relevant injuries during the entire duration of the experiment in any of the groups.
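The two chase criteria can be illustrated with a short Python sketch. It is not the original MATLAB implementation: the function name, the argument names and the representation of each detection as a timestamp in seconds are assumptions, as is the choice to apply criterion (1) at both the entry and the exit detector.

```python
def is_chase(lead_entry, lead_exit, follow_entry, follow_exit,
             max_detector_gap=1.5, max_transit=2.0):
    """Classify one same-direction tube passage of two mice as a chase.

    All arguments are timestamps in seconds (hypothetical data layout):
    entry and exit detector times of the leading and the following mouse.
    """
    # Criterion 1: the follower reaches each detector less than 1.5 s
    # after the leader (assumed to hold at both detectors).
    gap_ok = (0 <= follow_entry - lead_entry < max_detector_gap
              and 0 <= follow_exit - lead_exit < max_detector_gap)
    # Criterion 2: each animal crosses the tube in less than 2 s.
    transit_ok = (lead_exit - lead_entry < max_transit
                  and follow_exit - follow_entry < max_transit)
    return gap_ok and transit_ok
```

A passage in which the follower trails by 0.5 s and both animals cross in about 1 s is classified as a chase, whereas a 2 s gap at the detector or a 2.5 s crossing is not.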

Quantification of individual reward-seeking behavioral metrics

Metrics derived from the stimulus-outcome learning task

A detailed summary of the data across NoSeMaze rounds is provided in Supplementary Table S1. Data from 17 NoSeMaze groups were used for the analysis of the reward-seeking domains (see Supplementary Table S2). Six animals, distributed across different groups, were excluded due to insufficient RFID recognition at the lickport. The variability in licking responses was quantified using the following metrics: (1) correct hit rate, (2) correct rejection rate, (3) pre-CS licking rate before odor presentation (from +0.05 to +0.5 s relative to trial start; see Fig. 2G), (4) peak licking during the CS+ window (from +1 s to +2.4 s relative to trial start; ‘CS+ modulation peak’, Fig. 2G), (5) switch latencies for CS+ and CS−, computed separately for each reversal and summarized as the median across reversals, and (6) the number of trials required for animals to achieve an 80% correct hit or correct rejection rate (‘time to criterion’). All trials were used to calculate pre-CS licking rate, correct hit rate, and correct rejection rate. The pre-CS licking rate served as a measure of impulsivity, while licking inhibition during CS− trials reflected inhibitory control. Reward sensitivity was assessed using CS+ modulation peak, defined as the maximum licking rate during CS+ presentation, by concatenating the last 150 CS+ trials before each reversal to ensure that animals had reached a stable state. CS+ switch latency quantified how quickly an animal increased licking behavior in response to the newly rewarded conditioned stimulus (CS+) after a reversal. It was defined as the number of CS+ trials needed after a reversal for the animal to return to at least 70% of its pre-reversal CS+ licking rate. A shorter CS+ latency reflected faster acquisition of the new reward contingency. CS− switch latency captured how quickly an animal suppressed licking in response to the newly unrewarded conditioned stimulus (CS−) after a reversal. 
It was calculated as the number of CS− trials after a reversal needed for licking to drop below 50% of the pre-reversal CS+ licking rate. A shorter CS− latency indicated faster response inhibition or suppression of previously reinforced behavior. For both metrics, the pre-reversal licking rates were computed from the last 150 CS+ trials preceding each reversal. All rates were calculated from the odor presentation window (0.5-2.5 s). If no switch was detected, the switch latency was set to the total number of available post-reversal CS+ or CS− trials, which varied by animal and phase. Note that both metrics, CS+ and CS− switch latencies, reflect individualized measures, as they referred to each mouse’s own pre-reversal licking rates. Lastly, time to criterion was assessed as the number of trials following a reversal required for the animal to achieve at least 80% correct responses – hits for CS+ and correct rejections for CS− – sustained over a 10-trial moving average. If the animal did not reach this performance level during the post-reversal phase, the time to criterion was set to the total number of available post-reversal trials for the respective stimulus, indicating that the performance criterion was not met within the phase. In contrast to the switch latencies, time to criterion referred to a uniform performance threshold for all animals. For illustrative purposes, some of the data were cube root or Box-Cox transformed in cases of non-normal distribution.
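As an illustration of the time-to-criterion rule, the following Python sketch scans a sequence of post-reversal trial outcomes with a 10-trial moving window. The function name and the input format (a list of 0/1 outcomes for one stimulus type) are hypothetical, and the sketch assumes that the criterion is counted at the last trial of the first window reaching 80% correct; the exact convention used in the original analysis is not stated.

```python
def time_to_criterion(correct, window=10, threshold=0.8):
    """Number of post-reversal trials needed to reach the performance criterion.

    correct: list of 0/1 outcomes after a reversal for one stimulus type
             (1 = hit for CS+ trials, or correct rejection for CS- trials).
    """
    for end in range(window, len(correct) + 1):
        # Fraction of correct responses in the most recent `window` trials.
        if sum(correct[end - window:end]) / window >= threshold:
            return end
    # Criterion never met: fall back to the number of available trials.
    return len(correct)
```

With five incorrect trials followed by ten correct ones, the first window containing eight correct responses ends at trial 13, so the function returns 13; a sequence that never reaches criterion returns its own length.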

Clustering of reward-seeking profiles

Data from six features were used for the clustering approach: CS+ and CS− median switch latencies, correct hit rates, correct rejection rates, CS+ modulation peak, and pre-CS licking rates. All features were standardized using min-max normalization to preserve individual ranges despite potential skew. Then, K-means clustering was applied. The optimal number of clusters (k = 3) was determined using the elbow method. A silhouette score of 0.50 indicated a moderately good separation between clusters. Cluster assignments were visualized in PCA-reduced space using the first three principal components, and statistically compared across features using permutation tests. To assess differences of the six reward-seeking features between clusters, we performed unpaired permutation tests using 100,000 iterations and a median-based test statistic to account for data skewness. To address repeated measures and within-animal dependencies, a realistic swap ratio was estimated by computing the ratio of between-animal to total variability (i.e., between-animal / [within-animal + between-animal] variability). This swap ratio defined the proportion of data globally shuffled across clusters during each permutation. It was calculated per feature within each cluster and was applied globally during permutation tests to maintain realistic within-subject structure. Specifically, we used the mean of the two swap ratios from the two clusters that were compared. By incorporating both within- and between-subject variability, this approach generated a more robust null distribution. Final p-values were computed using two-tailed tests comparing observed and permuted differences. To control for multiple testing, p-values were adjusted using the Benjamini–Hochberg false discovery rate (FDR) procedure 54.
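The statistical comparison between clusters can be illustrated with a simplified Python sketch of a two-tailed, median-based permutation test followed by Benjamini–Hochberg adjustment. It deliberately omits the swap-ratio step, so every permutation fully shuffles the pooled data; the function names, the fixed random seed and the add-one correction of the p-value are assumptions made for illustration and are not taken from the original analysis code.

```python
import random
from statistics import median


def median_permutation_test(a, b, n_perm=10000, seed=0):
    """Two-tailed permutation p-value for a difference in group medians."""
    rng = random.Random(seed)
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # full shuffle; no swap ratio applied here
        diff = abs(median(pooled[:len(a)]) - median(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction (assumption) keeps the estimate above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In practice, one p-value per feature and cluster pair would be collected with `median_permutation_test` and the resulting list passed to `benjamini_hochberg`.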

Quantification of individual social behavioral metrics

Social rank competition and chasing metrics

Individual-level social status was quantified using the David’s score from tube competitions. It was calculated based on the counts of wins and losses between all pairs of animals. Specifically, for each pair of animals 𝑖 and 𝑗, we calculated the number of times that 𝑖 defeated 𝑗 (𝑎𝑖𝑗) and the total number of encounters between them (𝑛𝑖𝑗 = 𝑎𝑖𝑗 + 𝑎𝑗𝑖). The directed proportion of wins was then defined as 𝑃𝑖𝑗 = 𝑎𝑖𝑗/𝑛𝑖𝑗 whenever 𝑛𝑖𝑗 > 0. For each individual 𝑖, the first-order scores were calculated as

$$w_i = \sum_{j \neq i} P_{ij}, \qquad l_i = \sum_{j \neq i} P_{ji},$$

representing the summed proportions of wins and losses, respectively. Pairs with no interactions (𝑛𝑖𝑗 = 0) produce undefined 𝑃𝑖𝑗 values, which were excluded from the sums. To capture the strength of opponents, we additionally computed second-order terms, where

$$w_{2,i} = \sum_{j \neq i} P_{ij}\, w_j, \qquad l_{2,i} = \sum_{j \neq i} P_{ji}\, l_j.$$

These weighted proportions assign greater value to a win against a high-ranking opponent than against a low-ranking one, and analogously for losses 55. The David’s score for each individual was then obtained as

$$DS_i = w_i + w_{2,i} - l_i - l_{2,i}.$$

Linear ranking of David’s scores within a group yielded each animal’s social rank, with 1 denoting the highest rank. In this study, we determined the social hierarchy position for every animal based on the first three weeks in the NoSeMaze. Similarly, a ‘chasing David’s score’ was calculated using the counts of active chases and times being chased between all pairs of animals in place of wins and losses. Moreover, we computed the fraction of active chases and the fraction of times being chased for each animal, defined as the number of actively initiated chases or times being chased divided by the total number of chasing events in its group. Similarly, the fractions of wins and losses in tube competitions were calculated for each individual based on the outcomes of these events.
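The David’s score calculation can be sketched in Python as follows. The function names and the input format (a square matrix of win counts) are illustrative assumptions; the sketch uses the plain win proportions described in the text, excludes pairs that never met, and is not the original analysis code. The same functions could be applied to a matrix of active chases to obtain a ‘chasing David’s score’.

```python
def davids_scores(wins):
    """David's scores from a win matrix.

    wins[i][j] = number of times animal i defeated animal j (hypothetical input).
    """
    n = len(wins)
    # P[(i, j)]: proportion of encounters between i and j that i won.
    P = {}
    for i in range(n):
        for j in range(n):
            total = wins[i][j] + wins[j][i]
            if i != j and total > 0:  # pairs without encounters are skipped
                P[(i, j)] = wins[i][j] / total
    # First-order scores: summed proportions of wins (w) and losses (l).
    w = [0.0] * n
    l = [0.0] * n
    for (i, j), p in P.items():
        w[i] += p
        l[i] += P[(j, i)]
    # Second-order scores: proportions weighted by the opponents' scores.
    w2 = [0.0] * n
    l2 = [0.0] * n
    for (i, j), p in P.items():
        w2[i] += p * w[j]
        l2[i] += P[(j, i)] * l[j]
    return [w[i] + w2[i] - l[i] - l2[i] for i in range(n)]


def social_ranks(scores):
    """Rank 1 = highest David's score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks
```

For a perfectly linear three-animal hierarchy, `davids_scores([[0, 4, 4], [0, 0, 4], [0, 0, 0]])` returns `[3.0, 0.0, -3.0]`, and `social_ranks` maps these scores to ranks 1, 2 and 3.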

Group-level characterization of social hierarchy structure

Social hierarchies were evaluated using metrics recommended in recent literature 35, including steepness, transitivity, stability, and uncertainty-by-repeatability (Fig. 3C, see also ‘Source Data’). Triangle transitivity was used to quantify transitivity of the social hierarchy by calculating the proportion of transitive triangles (i.e., triads of animals) from all triangles 56. A triangle is transitive when, if A dominates B and B dominates C, then A also dominates C. The triangle transitivity index therefore reflects the fraction of all triads in the group that meet this criterion. Significance of triangle transitivity was tested with a permutation test by computing 1,000 random networks (while preserving the distributions of observed null, tied and asymmetric edges in the hierarchy network). These random networks’ triangle transitivity values were compared against the observed values 56. Steepness describes the degree of dominance success over subordinate animals and is defined as the slope of a linear fit of normalized David’s scores 57. The significance of a hierarchy’s steepness was also assessed with a permutation test, in which 1,000 hierarchies with wins and losses assigned by chance are calculated and compared to the observed steepness. The stability index ranges between 0 and 1, with 0 indicating unstable hierarchies, in which the ordering reverses every other day, and 1 describing stable hierarchies without rank changes 35. The uncertainty-by-repeatability is calculated as the intraclass correlation coefficients for each individual after randomizing the order of the interactions. A repeatability score above 0.8 suggests a reasonably robust hierarchy 35. The metrics were calculated in R using the packages ‘aniDom’ (https://cran.r-project.org/package=aniDom) 35 and ‘EloRating’ (https://cran.r-project.org/web/packages/EloRating/index.html) 58.
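The proportion of transitive triangles can be illustrated with a short Python sketch. The published metrics were computed with the R packages named above; this sketch only illustrates the counting principle. The function name and the win-matrix input are assumptions, and triads containing a tied or unobserved dyad are skipped here, which may differ from the handling in the original analysis.

```python
from itertools import combinations


def proportion_transitive(wins):
    """Fraction of complete triads that are transitive.

    wins[i][j] = number of times animal i defeated animal j (hypothetical input).
    """
    transitive = cyclic = 0
    for triad in combinations(range(len(wins)), 3):
        a, b, c = triad
        # Skip triads with a tied or unobserved dyad (assumption).
        if any(wins[i][j] == wins[j][i] for i, j in ((a, b), (b, c), (a, c))):
            continue
        # Number of triad members that each animal dominates.
        out_degree = [sum(wins[x][y] > wins[y][x] for y in triad if y != x)
                      for x in triad]
        # In a cycle every animal dominates exactly one other member.
        if sorted(out_degree) == [1, 1, 1]:
            cyclic += 1
        else:
            transitive += 1
    total = transitive + cyclic
    return transitive / total if total else float("nan")
```

A strictly linear hierarchy yields 1.0, whereas a single cyclic triad (A over B, B over C, C over A) yields 0.0.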

Validation of social hierarchy metrics

To validate the integrated social hierarchy metrics and demonstrate their convergent validity in assessing social rank, David’s scores derived from the integrated tube test were compared to two established dynamic dominance hierarchy estimation methods: the traditional Elo-rating and the randomized Elo-rating 35. Further, to cross-validate the integrated tube test readout, a conventional tube test was performed the day after mice were removed from the NoSeMaze and had returned to standard cages. The standard tube test was performed on ten NoSeMaze groups. In this standard setup, mice were habituated to a classical tube test apparatus consisting of two compartments connected by a tube with a central barrier. Random dyads were drawn from each group and a tube test was conducted by placing the two mice in the tube facing each other. If one mouse pushed the other out of the tube within one minute, it was scored as a win or loss, respectively; otherwise, the trial was scored as a tie. Each animal performed five such trials to avoid exhaustion 20. The resulting David’s scores from the manual tube tests correlated significantly with those from the integrated tube test (Supplementary Fig. S4C). To assess the influence of entry-time differences in tube competitions, we computed two versions of dynamic Elo ratings for each animal based on its observed win/loss outcomes. The ‘uncorrected’ Elo rating was calculated using standard update rules, where each win/loss updated an individual’s Elo score based on the pre-match Elo ratings and a group-specific K-factor 58. To evaluate and correct for potential biases due to asymmetries in tube entry time, we first quantified the influence of entry-time differences on competition outcomes using a linear mixed-effects model (LME). For this, we compiled a dataset combining tube competition events across all groups. Each event was included twice – once from the perspective of the winner and once from the loser. 
Each row contained the individual’s and the opponent’s identity, the pre-match Elo ratings of both animals, and their entry times. From these, the differences in both Elo rating and entry time were computed and included as predictors. Extreme values for entry-time difference were capped at a threshold of 10 seconds to minimize the influence of outliers. The LME was implemented in R (v4.4.1) using the lme4 package 59 (glmer() function, binomial family). The binary outcome variable indicated wins or losses, with scaled Elo difference and scaled entry-time difference as fixed effects, and mouse identity as a random intercept. The estimated fixed-effect coefficient for entry-time difference was then used to adjust the expected win probability in the Elo update formula. Specifically, for each match, the scaled entry-time difference was multiplied by the LME-derived coefficient and added to the log-odds of the standard Elo expectation. This resulted in a ‘corrected’ Elo rating, which was updated step-wise throughout the match sequences in each group. Finally, corrected scores were rescaled to fit the distributional properties (mean and SD) of the ‘uncorrected’ Elo scores to ensure comparability across methods.
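The entry-time correction of the Elo update can be sketched in Python as below. This is a hedged illustration only: it assumes the common logistic Elo expectation on a 400-point scale, whereas the original analysis relied on the R package ‘EloRating’, whose expectation function may differ. The function name, the default K-factor, and the sign convention of the scaled entry-time difference (winner relative to loser) are likewise assumptions.

```python
import math


def corrected_elo_update(r_winner, r_loser, dt_scaled, beta, k=100.0):
    """One Elo update with an entry-time term added on the log-odds scale.

    r_winner, r_loser: pre-match ratings of the winner and the loser.
    dt_scaled: scaled entry-time difference for this match (hypothetical sign:
               winner relative to loser).
    beta: fixed-effect coefficient for entry-time difference from the mixed model.
    k: K-factor (the study used a group-specific value; 100 is a placeholder).
    """
    # Log-odds of the standard logistic Elo expectation (assumed 400-point scale).
    log_odds = math.log(10) * (r_winner - r_loser) / 400.0
    # Entry-time correction added to the log-odds.
    log_odds += beta * dt_scaled
    p_win = 1.0 / (1.0 + math.exp(-log_odds))
    # The less expected the win, the larger the rating shift.
    shift = k * (1.0 - p_win)
    return r_winner + shift, r_loser - shift
```

With equal ratings and no entry-time difference, the expected win probability is 0.5 and the winner gains half the K-factor; a positive correction term raises the expected probability and therefore shrinks the update.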

Statistical analysis

Basic data assessments

We quantified the number of trials at the lickport, the number of tube competitions, and the number of chasing events per group and day, and illustrated those using boxplots. The time of day at which licking and social interaction events occurred was visualized with polar plots.

Correlation and stability analyses of behavioral metrics

We used Spearman’s rank correlation coefficients (ρ) to examine relationships among behavioral metrics, as well as between behavioral metrics and body weight. To account for repeated measures, we also fitted LMEs with mouse identity as a random intercept (see Extended Data Main and Extended Data Supplement). Note that the LMEs confirmed all results from the correlation analyses shown in the manuscript. To assess the relationship between social traits and body weight, the z-scored body weight (within groups) before entering the NoSeMaze was correlated with the David’s score from tube competitions, the fraction of active chases, and the fraction of times being chased. To evaluate the temporal stability of social traits within a NoSeMaze round, social metrics were calculated separately for the first, second, and third week, and cross-correlated across weeks. Most mice (n = 68) participated in at least two NoSeMaze rounds with reshuffled group members. We examined the trait stability of social and reward-seeking metrics by correlating values from the first and second round (Spearman’s correlation, cf. Fig. 5). Additionally, we assessed the consistency of cluster assignments based on reinforcement learning metrics between rounds using a chi-square test and visualized transitions with a Sankey diagram. Importantly, to quantify stability across all available rounds, we also estimated intraclass correlation coefficients (ICCs) from variance-component LMEs, interpreting ICCs as the proportion of total variance attributable to stable between-mouse differences. The ICC treated mouse identity as a random intercept and group as a random effect, with repetition fixed (i.e., stability across groups while holding repetition means constant). Formally, for a given metric 𝑦,

$$\mathrm{ICC}(y) = \frac{\sigma^2_{\text{mouse}}}{\sigma^2_{\text{mouse}} + \sigma^2_{\text{context}} + \sigma^2_{\text{residual}}}$$

with $\sigma^2_{\text{context}}$ containing the group as the relevant context term. The time between rounds extended over several weeks (mean: 6.8 weeks, SD: 2.8 weeks; see Supplementary Table S2 for details). For correlation analyses, we obtained 95% confidence intervals by subject-level bootstrap (resampling mice with replacement; B = 2,000 draws; percentile intervals), preserving the within-mouse pairing of rounds. For ICCs, we computed 95% confidence intervals by subject-level bootstrap using the same resampling scheme and refitting the LME for each bootstrap sample.
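The subject-level percentile bootstrap for the round-to-round correlations could be sketched as follows. This is an illustrative, standard-library-only Python example rather than the study's code; the function names, the handling of degenerate resamples, and the exact percentile indexing are assumptions.

```python
import random

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation undefined for constant input
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def bootstrap_ci(round1, round2, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for Spearman's rho, resampling mice with replacement.

    round1[i] and round2[i] belong to the same mouse, so drawing indices
    (not values) preserves the within-mouse pairing of rounds.
    """
    rng = random.Random(seed)
    n = len(round1)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([round1[i] for i in idx], [round2[i] for i in idx])
        if rho == rho:  # drop NaN from degenerate resamples (assumption)
            stats.append(rho)
    stats.sort()
    lo = stats[int((alpha / 2) * (len(stats) - 1))]
    hi = stats[int(round((1 - alpha / 2) * (len(stats) - 1)))]
    return lo, hi
```

For the ICC intervals, the same index resampling would apply, with the LME refitted on each resampled set of mice instead of recomputing a correlation.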

Comparative analysis of chasing and social hierarchy

Distributions of the fraction of active chases and the fraction of times being chased, pooled across all groups, were assessed using histograms and empirical cumulative distribution plots (Supplementary Fig. S6A). Distribution shapes were compared via Kolmogorov–Smirnov tests. A similar analysis was applied to the distributions of wins and losses in tube competitions (Supplementary Fig. S6B). To further examine differences between the fractions of active chases and of times being chased, we computed cumulative sums over the sorted data within each group, visualized them using shaded error bar plots, and tested group-level differences between the two metrics using unpaired permutation tests (cf. Fig. 4C, 10,000 permutations). Lastly, to explore whether win–loss relationships differed between chasing and tube competition events, we combined data from both modalities and modeled the fraction of ‘wins’ – defined as successful tube competitions or active chases – using an LME (Supplementary Fig. S13). Active chases were treated as ‘wins’ in the context of chasing, being chased as ‘losses’. The model included the fraction of ‘losses’, the event type (chasing or competition), and their interaction as fixed effects, with mouse identity as a random intercept. Model assumptions were tested using residual diagnostics and normality tests (Lilliefors, Anderson–Darling, and Shapiro–Wilk). Due to assumption violations, we performed cluster-resampled bootstrapping (10,000 iterations, resampled by subject) to estimate robust confidence intervals for fixed effects. The interaction term captured differences in the win–loss relationship between tube competitions and chasing events.

Relationship between social hierarchy and chasing

First, to assess how competition metrics and chasing relate at the individual level, we conducted correlation analyses using Spearman’s rank correlation coefficients (cf. Fig. 6A and Fig. 6C). To account for repeated measures across mice, we also fitted LMEs with mouse identity as a random intercept that confirmed all results (see Extended Data Main). To further examine and illustrate structural patterns within social interactions, we processed win–loss match matrices from tube competition and chasing events for each NoSeMaze group. These match matrices were sorted by the David’s score (competition-based) or by the fraction of active chases (chasing-based) and normalized by dividing all entries by the maximum value within that matrix, thus setting the strongest observed interaction to 1. This procedure was applied at the group level to preserve the relative structure of interactions while allowing comparison across groups with differing activity levels. Normalized matrices were then averaged across all groups and visualized as heatmaps, with rows and columns representing individuals ordered by their social rank or chasing activity (cf. Fig. 6D-F). Color intensity indicated the relative frequency of interactions between each pair of individuals. To quantify asymmetries in interaction patterns, we modeled quadrant-level differences in the frequencies of normalized dyadic interactions using an LME with a beta distribution and logit link function. Proportions of dyadic events were bounded away from 0 and 1 by a small constant (ε = 0.0001) to meet the assumptions of the beta distribution. The model included quadrant as a fixed effect and group identity as a random intercept. Model fitting was performed using the glmmTMB package in R (https://cran.r-project.org/web/packages/glmmTMB/index.html) 60. 
Estimated marginal means were computed for each quadrant using the emmeans package (https://cran.r-project.org/web/packages/emmeans/index.html) 61, and back-transformed to the original scale (0–1) for interpretability. Pairwise post hoc comparisons between quadrants were conducted on the response scale, with Tukey-adjusted p-values to control for multiple comparisons.
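The matrix preprocessing described above (sorting by rank, max-normalization per group, and bounding proportions away from 0 and 1) can be sketched in Python as follows. The function names are hypothetical, equal group sizes are assumed for averaging, and clamping is only one possible reading of how values were bounded by ε.

```python
def sort_and_normalize(matrix, scores):
    """Reorder rows and columns by descending score, then divide by the maximum entry.

    matrix[i][j] counts events in which individual i 'won' against j;
    scores[i] is the ordering metric (e.g. David's score or fraction of active chases).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    sorted_m = [[float(matrix[i][j]) for j in order] for i in order]
    peak = max(max(row) for row in sorted_m)
    if peak == 0:
        return sorted_m  # no interactions observed in this group
    return [[v / peak for v in row] for row in sorted_m]

def average_matrices(matrices):
    """Element-wise mean across groups (assumes all groups have the same size)."""
    n = len(matrices[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]

def clamp_open_unit(p, eps=1e-4):
    """Keep a proportion strictly inside (0, 1) for a beta likelihood."""
    return min(max(p, eps), 1.0 - eps)
```

Normalizing before averaging means that a highly active group does not dominate the averaged heatmap, which matches the stated goal of comparing groups with differing activity levels.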

Group-level analysis of rank–chasing relationships

To evaluate the relationship between social rank and proactive chasing behavior at the group level, we computed Spearman’s correlation coefficients between the David’s scores from tube competitions and the fraction of active chases within each NoSeMaze group. This yielded one correlation coefficient per group (n = 18). We then tested whether these group-level coefficients were associated with group-level transitivity scores, using a Spearman’s rank correlation across groups (cf. Fig. 6B). This analysis assessed whether the strength of the within-group association between rank and proactive chasing varied systematically with the clarity of the social hierarchy. To further characterize this relationship, we also correlated group transitivity with the fraction of active chases performed by the top-ranking individual (cf. Supplementary Fig. S11), reflecting the extent to which dominant individuals engaged in proactive aggression as a function of group hierarchy structure.

Relationships between social and reward-seeking features

Differences in the David’s score from tube competitions, the fraction of active chases, and the fraction of times being chased between the three learning clusters were assessed in the same way as for the six stimulus-outcome learning features (cf. 'Clustering of reward-seeking profiles'), using the same median-based permutation method accounting for between- and within-animal variability. Again, to control for multiple comparisons, p-values were adjusted using the Benjamini–Hochberg FDR procedure 54. Then, associations between the social metrics (David’s score and active chases) and the reward-seeking traits (correct hit and rejection rate, pre-CS licking rate, CS+ modulation peak, and switch latencies for CS+ and CS− (from reversal 1 to 4 and for the median)) were assessed with Spearman’s correlation analyses and LMEs, as described above (cf. 'Relationship between social hierarchy and chasing'). Lastly, we performed principal component analysis (PCA) on z-scored data to examine the underlying structure of social and cognitive traits in more depth. Skewed variables were first log- or square-root-transformed based on skewness direction and magnitude. PCA was conducted combining social (David’s score, active chases, and being chased) and reward-seeking traits. Loadings were interpreted to assess the contribution of each trait to the top three components. To evaluate domain specificity, permutation tests (n = 10,000) were used to test whether social and cognitive traits differed significantly in their contributions to each principal component.
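For reference, the Benjamini–Hochberg adjustment mentioned above can be written compactly. The following Python sketch shows the standard step-up procedure and returns adjusted p-values in the input order; the function name is chosen for illustration and it is not taken from the study's code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the order of the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A comparison is then called significant at FDR level q if its adjusted p-value is at most q.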

Genotype effects

Group differences in social and reinforcement-learning metrics (cf. Supplementary Fig. S9 and Supplementary Fig. S10) between OXTRΔAON and control animals were assessed using permutation tests (10,000 iterations) comparing group medians. To account for repeated measures, group labels were shuffled at the level of the animal. Week-wise comparisons for the social metrics were FDR-corrected for multiple comparisons. To test whether the observed behavioral associations were influenced by the OXTRΔAON genotype, genotype (OXTRΔAON vs. control) was included as a covariate in LMEs and correlation analyses (see Extended Data Main and Extended Data Supplement). Specifically, LMEs were extended with the covariate as an additional predictor, and Spearman’s partial correlation analyses were performed including the covariate as a factor of no interest. In each analysis, we tested whether the primary relationships of interest remained significant when accounting for genotype. All analyses remained statistically significant after inclusion of the genotype covariate, indicating that the observed associations were robust across genotypes. For instance, in the relationship between the David’s score from tube competitions and the fraction of active chases, the inclusion of genotype in the LME did not alter the significance of the main association.
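A permutation test on group medians with label shuffling at the animal level might look like the following Python sketch. The data layout, the label names, the two-sided test statistic, and the +1 correction in the p-value are assumptions for illustration and are not taken from the study's code.

```python
import random
from statistics import median

def median_difference(values_by_animal, labels, group_a, group_b):
    """Median of pooled group_a values minus median of pooled group_b values."""
    a = [v for animal, vals in values_by_animal.items() if labels[animal] == group_a for v in vals]
    b = [v for animal, vals in values_by_animal.items() if labels[animal] == group_b for v in vals]
    return median(a) - median(b)

def animal_level_permutation_test(values_by_animal, labels, group_a, group_b,
                                  n_perm=10000, seed=0):
    """Two-sided permutation p-value; labels are shuffled between animals.

    All repeated measures of one animal stay together, so the shuffling
    respects the within-animal dependence of the data.
    """
    rng = random.Random(seed)
    animals = list(values_by_animal)
    observed = median_difference(values_by_animal, labels, group_a, group_b)
    label_list = [labels[a] for a in animals]
    extreme = 0
    for _ in range(n_perm):
        shuffled = label_list[:]
        rng.shuffle(shuffled)
        perm_labels = dict(zip(animals, shuffled))
        stat = median_difference(values_by_animal, perm_labels, group_a, group_b)
        if abs(stat) >= abs(observed):
            extreme += 1
    # The +1 correction avoids p = 0 with a finite number of permutations (assumption).
    return observed, (extreme + 1) / (n_perm + 1)
```

Because whole animals are relabeled, the number of distinct permutations depends on the number of animals rather than the number of observations.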

Acknowledgements

We thank Michael Bram and Jan Ringkamp for their help in engineering the NoSeMaze, and Dr. Carla Filosa for her initial help in developing the behavioral analyses. We also thank Michal Adveev and Laura Haenschke for help with preparing the competition videos.

We acknowledge the use of ChatGPT (OpenAI, GPT-4, accessed 2025) for assistance with language editing, improving readability, and annotating code. All outputs were reviewed, validated, and verified by the authors, who take full responsibility for the content of this manuscript.

Additional information

Data availability

The full documentation of the NoSeMaze, including hardware information and the open-source Python-based software package, is available on GitHub: https://github.com/KelschLAB/NoSeMaze. Source data files for each main figure and each supplementary figure, as well as the extended data, are provided with the manuscript. All analysis code and data matrices used to generate the results are available on GitHub: https://github.com/KelschLAB/NoSeMaze-social-status.

Funding

The work was funded by BMBF 3R consortium grants ‘NoSeMaze1’ (161L0277A) and ‘NoSeMaze2’ (16LW0333K) to W.K., Leibniz Association program grant ‘Learning resilience’ (K430/2021) to W.K., Boehringer Ingelheim Foundation grant ‘Complex Systems’ to W.K., BMBF CRCNS grant ‘Oxystate’ (01GQ1708) to W.K., DFG CRC 379 Project C03 to W.K., and the DFG Clinician Scientist Program ‘Interfaces and Interventions in Complex Chronic Conditions’ (EB187/8-1) to J.R.

Additional files

Supplementary material.