Introduction

Exploration and exploitation are fundamental components of adaptive behavior under uncertainty in both humans and non-human animals (for review see [28]). However, relatively little is known about how these behavioral processes are modulated by outcomes that are both extremely rare and highly consequential - hereafter referred to as Rare and Extreme Events (REE). Although risk processing has been extensively studied across psychology, behavioral economics, and neuroscience, most laboratory decision-making tasks rely on outcome distributions in which event probabilities exceed 10% (e.g. [9], [10], [41], [53]; see also [20], [27], [32] for notable exceptions). This represents a major limitation considering that in natural environments, many species must cope with events that occur far less frequently yet exert disproportionately large negative or positive impacts. Such events have shaped evolutionary trajectories and can even lead to species-wide extinction [19], while in humans they have occasionally driven technological and cultural leaps [29]. Because their probabilities and magnitudes cannot be reliably estimated from experience, REE pose profound challenges for learning and decision-making mechanisms. It remains largely unknown whether, and how, animals - including humans - detect, integrate, or adapt to such events.

Here, we address this gap by studying rats, a species widely used in behavioral and systems neuroscience to investigate the neural substrates of cost-benefit learning and decision-making (e.g. [6], [1], [49], [51], [24], [33], [47], [21], [58]; see [23] for relevance to human decision processes). We developed a novel four-armed bandit task in which rats repeatedly interact with their environment and encounter REE with probabilities < 1%. Critically, these REE produce outcome magnitudes - both gains and losses - that exceed 10 standard deviations relative to the average outcome. Standard rodent analogs of the Iowa Gambling Task typically use lower-magnitude rare events with probabilities ≥ 10% ([4], [1], [37], [57]). The central feature of our task is that it allows us to test whether rats adopt behavioral strategies that minimize exposure to extreme losses (Black Swans) while preserving the possibility of extreme gains (Jackpots) - analogous to “Anti-fragile” strategies described in conceptual work [48]. In contrast, strategies that eliminate Jackpots while maintaining exposure to Black Swans are here termed “Fragile”. Operationally, complete exposure to Jackpots and complete avoidance of Black Swans defines the Anti-fragile o ption, whereas the inverse pattern defines the Fragile o ption. We additionally define the Robust option, as the one excluding all REE and Vulnerable option the one exposing animals to both positive and negative REE. This framework is compatible with evolutionary hypotheses proposing that organisms may be biased toward choice policies that enhance long-term viability under uncertainty.

Our design separates outcomes into three probabilistic domains: (i) “normal events” (NE, ≈ 90% probability), (ii) “rare events” (RE, ≈ 10%), and (iii) REE (< 1%). Reward magnitudes scale nonlinearly with decreasing probability such that convex options produce increasingly larger gains as probability decreases (i.e., exposure to Jackpots) while concave options limit the possible gain within a short range whatever the probability (i.e. avoidance of Jackpots). In contrast, in the negative domain, concave options produce increasingly larger losses (exposure to Black Swans), while convex ones limit the possible losses within a short range whatever the probability. Convexity here refers not to utility curvature but to the accelerating relationship between magnitude and rarity.

Based on stochastic dominance arguments, value-maximizing agents with non-decreasing value functions should prefer concave options in the NE domain (first-order dominance), and continue to do so in the RE domain if they also display risk aversion (second-order dominance). However, in the full environment including the REE, second-order dominance reverses, and convex options become optimal because avoiding Black Swans is increasingly costly when REE fail to materialize. Thus, Anti-fragile strategies are only advantageous if REE are internally represented.

We incorporate both gains (sugar pellets) and losses (time-out punishment) to capture sensitivity across valence domains. We derive two key behavioral metrics: (1) Total Sensitivity to REE, defined as the combined frequency of convex choices in gain and loss domains, indexing whether rats incorporate REE into their strategy; and (2) One-Sided Sensitivity to REE, defined as the difference between convex gain choices and convex loss choices, quantifying in particular asymmetric Jackpot-seeking versus Black Swan avoidance.

Using 20 rats (about 6000 trials per subject over 41 sessions), we report two major behavioral findings. First, rats diversify their behavior across options but typically concentrate on roughly two of the four available strategies. Nineteen of 20 animals show moderate-to-high Total Sensitivity, indicating that most animals systematically combine some exposure to extreme rewards with partial avoidance of extreme losses. Second, 13 of 20 rats demonstrate near-complete avoidance of Black Swans while only partially seeking Jackpots, producing negative One-Sided Sensitivity and consistent choice mixtures of Anti-fragile and Robust options. This suggests differential weighting of rare positive versus rare negative outcomes. Consistent with this interpretation, both Total Sensitivity and Black Swan avoidance increase after rats directly experience a rare extreme loss. To interpret computationally these results, we evaluated a family of augmented Q-learning models that incorporate explicit sensitivity parameters for Jackpots and Black Swans. Model comparison using information criteria identifies an architecture in which NE+RE and REE are weighted separately during action selection, suggesting that rats assign distinct cognitive or neural salience to REE - especially negative ones - when making decisions under uncertainty.

Results

Measures of sensitivity to Rare and Extreme Events in a four-armed bandit task

Rats were tested in an operant chamber equipped with four nose-poke ports, each linked to a specific distribution of reward (sucrose pellets) and punishment (time-out delays) drawn from a fixed schedule across sessions (Fig. 1a). After training, animals completed 41 experimental sessions (20 min/session; ≈ 120 trials/session). As shown in Figure 1b, positive outcomes consisted of sucrose pellets, whereas negative outcomes corresponded to time-out periods during which ports were inactive, preventing pellet acquisition.

Experimental Design.

(a) Schematic of the operant conditioning chamber. Rats performed a self-paced four-option choice task using nose-poke apertures, each corresponding to a distinct option leading to distribution of reward (sucrose pellets) or punishment (time-out delay) at various size and probability. (b) Outcome structure for each option. Rewards and losses follow either convex (green) or concave (red) probability-magnitude functions. Outcomes are grouped into three probability domains: high-probability Normal Events (NE; 20 − 25%, blue), Rare Events (RE; 5%, yellow), and Rare and Extreme Events (REE; < 1%, pink). (c) Functional characterization of the four choice options based on exposure to rare and extreme gains (“Jackpots”) and rare and extreme losses (“Black Swans”). On the horizontal x-axis are reported (in decreasing order moving away from the origin) the ex-ante probabilities that are unknown to the subject, which only observes the outcome that is measured on the vertical y-axis. For convenience, gains appear in the upper-right quadrant while losses appear in the lower-left one. Convex curve are in green while concave are red. Robust: convex losses and concave gains (avoids both REE). Anti-fragile: convex losses and convex gains (avoids Black Swans while retaining access to Jackpots). Fragile: concave losses and concave gains (exposure to Black Swans, avoidance of Jackpots). Vulnerable: concave losses and convex gains (exposed to both Black Swans and Jackpots). (d) Behavioral sensitivity metrics. Total Sensitivity (y-axis) quantifies overall incorporation of REE by summing convex choices in gain and loss domains. One-sided Sensitivity (x-axis) measures asymmetry between REE seeking and REE avoidance. Example profiles illustrate: an Anti-fragile-like pattern (green), a Fragile-like pattern (red), and a mixed Anti-fragile/Robust profile with strong Black Swan avoidance and partial Jackpot seeking (blue). See Material and Methods for computational definitions.

Each choice option was defined by a characteristic mapping between outcome magnitude and event probability. Outcome magnitudes could follow either convex (accelerating; green) or concave (decelerating; red) functions as event probability decreased (Fig. 1b-c). Critically, all options included a spectrum of outcome classes: highly probable Normal Events (NE; ≈ 90%), Rare Events (RE; ≈ 10%), and Rare and Extreme Events (REE; ≈ 1%), which corresponded to exceptionally large rewards (“Jackpots”: 80 pellets) or exceptionally large punishments (“Black Swans”: 240-s delay).

Combining convex or concave gain and loss functions generated four distinct options (Anti-fragile, Robust, Vulnerable, Fragile; Fig. 1c). The Anti-fragile option paired convex gains with convex losses, thereby allowing exposure to Jackpots while avoiding exposure to Black Swans. The Fragile option combined concave gains and concave losses, yielding small plateauing rewards but exposing animals to extreme losses (Black Swans). Robust and Vulnerable options showed complementary patterns: Robust avoided Black Swans but also prevented access to Jackpots, whereas Vulnerable exposed animals to both extreme outcomes.

Because the behavioral structure explicitly manipulated exposure to REE, we quantified how frequently each rat selected options containing convex (versus concave) outcome functions. Two measures were derived (Fig. 1d). Total Sensitivity to REE reflects the overall tendency to select options that allow contact with extreme outcomes (i.e., convex functions in either domain). One-sided Sensitivity quantifies asymmetry between the gain and loss domains - specifically, whether an animal preferentially seeks extreme rewards (Jackpot Seeking) or preferentially avoids extreme punishments (Black Swan Avoidance).

These two dimensions generate a behavioral space in which animals may cluster near extreme phenotypes (e.g., exclusively Anti-fragile or exclusively Fragile) or diversify among several strategies. Points lying between edges reflect mixed strategies. For example, a profile near the Anti-fragile - Fragile axis indicates strong Black Swan Avoidance combined with frequent-but not exclusive - Jackpot seeking.

Importantly, although Normal and Rare Events dominate the experience of the task, the inclusion of REE changes the adaptive landscape: extreme outcomes, despite their low probability, produce large shifts in reinforcement history. Thus, rats that behave as though REE “matter” (high Total Sensitivity) preferentially select options allowing contact with large rewards while minimizing catastrophic time-outs, whereas rats that behave as if REE are irrelevant (low Total Sensitivity) preferentially sample concave options that yield stable but limited outcomes, according to the stochastic dominance of order 1. Together, this framework allows quantification of how individual animals integrate extremely infrequent but high-impact outcomes into their decision strategy, enabling downstream analyses linking these behavioral phenotypes to neural processes.

Main phenotype: strong avoidance of extreme punishments with partial sampling of extreme rewards

We analyzed the behavioral data collected across 41 daily sessions from 20 rats (see Material and Methods). To highlight the dominant features of the behavioral strategies adopted by the animals, Figure 2 plots each rat’s choices within the two-dimensional space defined by Total Sensitivity to REE and One-sided Sensitivity (relative weighting of extreme losses vs extreme rewards). This representation corresponds to the rotated square introduced in Figure 1d, where each vertex corresponds to exclusive selection of one of the four outcome distributions. A first inspection of Figure 2 shows that no animal operates at a single vertex - i.e., no rat commits exclusively to one contingency. Instead, all animals sample from multiple options, producing distributed behavioral profiles rather than categorical strategies. This diversification is qualitatively consistent with what is often observed in naturalistic foraging scenarios in which animals hedge against uncertainty or extreme events ([5], [40], [43], [48]).

Total sensitivity to Rare and Extreme Events (REE) (y-axis) plotted against one-sided sensitivity to REE (x-axis), where rightward movement indicates REE seeking and leftward movement indicates REE avoidance.

For a detailed description of the modeling and statistical analysis methods used, please refer to the Materials and Methods section. (a) the 820 black dots represent data from 41 sessions conducted with a total of 20 rats. (b) individual rat profiles, with each square corresponding to a single rat. The colored dots within each square denote the performance across the 41 sessions for each rat, categorized into two primary sensitivity profiles: blue dots indicate rats that exhibit strong sensitivity to REE and are classified as “Black Swan avoiders”, while green dots indicate rats that are responsive to REE but maintain a neutral stance, situated near the midline of the graph. Additionally, two outlier data points are highlighted in red and pink. (c) color-coded dots, each representing the average sensitivity measures across all 41 sessions per rat (n=20 rats).

Population-level patterns

Panel 2a shows the full dataset (≈ 800 session-points). Most sessions lie on the left half of the behavioral space, corresponding to strategies that rather avoid REE. Within this domain, most sessions fall in the upper quadrant, indicating moderate to high Total Sensitivity - that is, rats systematically adjusted their choices in ways that reflect the presence of rare and extreme outcomes by minimizing exposure to extreme losses (Black Swans). Training history contributing to these patterns is detailed in Material and Methods.

Individual behavioral profiles

Panel 2b displays each individual rat’s pattern across the 41 sessions. Each square corresponds to one rat, and the four vertices correspond to the four possible outcome distributions (A: Antifragile; R: Robust; V: Vulnerable; F: Fragile). With the exception of one animal (rat 17, red), all rats show moderate to high Total Sensitivity. Within this dominant group, two sub-groups emerge (blue and green). A second outlier, rat 19 (purple), is the only animal expressing both very high Total Sensitivity and strong Jackpot Seeking, achieved by alternating mainly between Antifragile and, less frequently, Vulnerable options. The remaining 18 animals fall into two stable phenotypes:

  1. Blue group (13 rats): These animals primarily oscillate between the Robust and Antifragile options. This phenotype combines (i) moderate to high Total Sensitivity with (ii) pronounced avoidance of Black Swans. In other words, these rats preferentially avoid extreme punishments more than they seek extreme rewards.

  2. Green group (5 rats): These rats primarily alternate between the Fragile and Antifragile options. They show moderate to high Total Sensitivity but display no directional bias in One-sided Sensitivity, indicating neither consistent avoidance of extreme losses nor active seeking of extreme rewards. Their choices are symmetric with respect to REE in the gain and loss domains.

Average phenotypes

Figure 2c summarizes each rat’s average metrics across sessions. All but four rats exhibit an average Total Sensitivity above half of the maximum value; three are slightly below one, and one rat (rat 17) sits around 0.5, making it the closest to a Fragile-like pattern, though still deviating from true Fragile behavior. In total, 19 of 20 rats show clear sensitivity to the presence of REE.

Panel 2c also reveals a pronounced dominance of Black Swan Avoidance (negative One-sided Sensitivity) in the majority of animals. Rats 17 and 19 are opposite outliers: rat 17 has low Total Sensitivity, whereas rat 19 shows the highest Total Sensitivity coupled with strong Jackpot Seeking.

Within the 18-rat majority, the two main phenotypes described above are clearly distinguished: the green subgroup shows no One-sided bias, while the blue subgroup shows pronounced Black Swan Avoidance. Quantitative values are provided in Supplementary Material: Behavioral Measures per Rat.

Structure in the relationship between Total and One-sided Sensitivity

A notable feature in panel 2c is a positive relationship between Total Sensitivity and One-sided Sensitivity among the 16 rats with the highest Total Sensitivity. Within this high-sensitivity group, animals that more strongly incorporate REE into their choices tend to show reduced Black Swan Avoidance (i.e., their One-sided Sensitivity becomes less negative). This shift reflects a progressive tilt toward convexity in the gain domain relative to the loss domain. Rat 19 represents an extreme case, approaching the maximal combination of Total Sensitivity and Jackpot Seeking.

This relationship is confirmed statistically: in the full dataset of 20 rats, the correlation between average Total and One-sided Sensitivity is near zero. However, restricting the analysis to the 16 high-sensitivity rats yields a correlation of ≈0.57. Over learning, this relationship strengthens: splitting the 41 sessions into four blocks of 10, the correlation increases from ≈0.17 initially to ≈0.56, ≈0.71, and ≈0.64 in successive epochs. Thus, the structure linking Total and One-sided Sensitivity emerges early and stabilizes over time.

Behavioral measures of reward seeking and punishment avoidance

Figure 3 reports more intuitive behavioral metrics: the proportion of choices that (i) avoid Black Swans and (ii) seek Jackpots (detailed computation in Materials and Methods: Modeling and Statistical Analysis). Median values show a sharp dissociation between the two phenotypes. Approximately half of the blue rats avoid extreme punishments in > 90% of trials, whereas only ≈40% of green rats-and none of the outliers-achieve comparable avoidance.

Behavioral Differences Between Phenotypes in Avoidance and Seeking Behaviors.

This figure illustrates the median behavioral measures of Black Swan Avoidance and Jackpot Seeking across the two phenotypic groups, blue and green. The phenotypes are defined by distinct responses to Rare and Extreme Events (REE), with the blue phenotype exhibiting higher sensitivity to REEs and a tendency towards avoiding Black Swans, while the green phenotype shows less bias and remains closer to neutral behavior. Left Panel: This panel depicts the percentage of nosepokes where rats were exposed (black) or not exposed (light gray) to Black Swans. Increased Black Swan Avoidance is indicated by a higher proportion of non-exposure, suggesting a behavioral preference to avoid aversive outcomes among the blue phenotype compared to green. Right Panel: Conversely, this panel shows the percentage of nosepokes where rats were exposed (black) or not exposed (light gray) to Jackpots. Enhanced Jackpot Seeking is indicated by a higher proportion of exposure, suggesting greater motivation towards reward acquisition in the blue phenotype compared to green. Statistical comparisons between the phenotypes were conducted using Wilcoxon Rank Sum tests. Significant differences are denoted with ⋆ (p < 0.05) and ⋆⋆ (p < 0.01).

By contrast, Jackpot Seeking does not differ between blue and green rats. Wilcoxon paired tests confirm this: blue rats differ significantly between punishment avoidance and reward seeking (p = 0.0002), whereas green rats do not (p = 0.6250). The two outliers diverge in opposite directions: rat 17 seeks almost maximal Jackpot exposure, while rat 19 nearly eliminates it. Exact values are given in the Supplementary Material.

Black Swan outcomes strengthen sensitivity to rare and extreme events, whereas Jackpots do not

After establishing that most animals (19/20) display moderate-to-high Total Sensitivity to REE and robust avoidance of extreme punishments (Black Swans), we next asked how the actual experience of an REE shapes subsequent choice behavior within the same session. Specifically, we compared each rat’s behavioral sensitivity in the 10 nose-pokes preceding an REE with the 10 nose-pokes following it.

Figure 4 summarizes these within-session dynamics. For each rat (color-coded as in Figure 2), panel 4a shows the effects of experiencing a Jackpot, whereas panel 4b shows the effects of encountering a Black Swan. Total Sensitivity is plotted on the left column, and One-Sided Sensitivity (relative weighting of extreme losses vs gains) on the right. The dotted diagonal indicates no change.

Responses of Total Sensitivity to REE and of One-sided Sensitivity to REE, averaged over the 41 final sessions, following Jackpots in panel (a) and Black Swans in panel (b), for each of the 20 rats.

Each dot indicates one rat and dots are color coded to indicate the profile of the animal, as in Figure 2. Black dotted lines materialize no change in sensitivities after exposure to the REE compared to before. Black solid lines represent spline estimates -see Material and Methods.

A clear asymmetry emerges: Jackpots produce minimal change in either metric, with most points lying close to the diagonal. In contrast, Black Swan outcomes shift behavior systematically. Most animals show an increase in Total Sensitivity - i.e., they become more responsive to the structure of rare events-along with a decrease in One-Sided Sensitivity, reflecting a further shift toward avoidance of extreme losses. In both measures, points cluster above (Total Sensitivity) or below (One-Sided Sensitivity) the 45° line, indicating a consistent strengthening of REE-related behavioral weighting after punishment but not after reward.

Thus, rare and severe losses - but not rare gains-acutely potentiate the animals’ sensitivity to REE. This pattern aligns with our overall finding that rats prioritize avoidance of extreme negative outcomes and that negative REE have disproportionate influence on choice updating. Bootstrapped statistics and additional tests reported in Material and Methods (see Table 3) confirm the significance of these effects.

In summary, within-session analyses reveal a strong valence asymmetry: Black Swan events transiently enhance both global sensitivity to REE and the bias to avoid rare losses, whereas Jackpots do not measurably alter behavioral sensitivity. This suggests that rare aversive outcomes exert a privileged influence on decision-making processes in this task, reinforcing the dominant phenotype of heightened vigilance toward extreme negative events.

Q-Learning models require specific REE decision weights to mimic rats’ behavior

We have introduced sensitivity to Black Swans and Jackpots in a new class of augmented Reinforcement Learning models and we have estimated their parameters using observed choices and outcomes for each rat. The selected model ended up with a distinctive feature: it separates normal from rare and extreme outcomes through different weights in the Decision-Making process. Adding such specific sensitivity results in a good fit of the selected model - and simulated behaviors that are close - to behavioral observations, whereas a standard Q-learning model without sensitivity to REE is rejected for almost all rats.

To each of the four available options, indexed below by “o” (from 1 to 4, referring to Antifragile, Fragile, Robust, Vulnerable) in equations (1), we attach at each moment in trial/time t a gain sub-value Qg(o) and a loss sub-value Ql(o) that are updated as follows:

where the convention that updated values are indexed by t + 1 means right after observing the gain or loss corresponding to the choice made at t. Parameters αg and αl are usually labeled learning rates, indicating the speed at which reward prediction error gets updated into the latest values. Limiting cases occur when setting the learning rate either to zero - its smallest possible value - which means no subvalue updating, or to one - its largest possible value - which means that the subvalue tracks the obtained reward itself. While equations (1) applies for the updating of the chosen option, a similar updating rule applies for the three other options that are not chosen at that moment, by assuming that this happens as if the reward was zero, with different parameters. That is, for options that are not chosen, the updating rules for subvalues are simply and , where the forgetting rates and are assumed to be bounded between zero and one.

The novel feature of our augmented Q-Learning model, besides integrating gains and losses, is that specific weights are attached to options that may produce REE. More specifically, we model the value of each option o as:

In equation (2), the value of each option Vo is the sum of two terms. The first is the decision weight attached that sums up the gain and loss subvalues, each weighted by parameters λg and λl that may reflect a differential effect. Importantly, in the class of nested models that we estimated, REE may or may not be incorporated in the gain and loss subvalues, that should be thought of as averages - though not arithmetic but of an exponential moving type in view of equations (1) - see page 32 in [50]. The second and key term in equation (2) specifically captures the decision weight attached to REE, and it can itself be decomposed into the decision weights on Jackpots and on Black Swans. More precisely, the indicator function 1JP (o) equals one if the option exposes to Jackpots (namely when the chosen option is either A or V) and zero otherwise. Similarly, 1BS(o) equals one if the option exposes to Black Swans (namely options F and V) and zero otherwise. Therefore, γg and γl reflect the (possibly different) weights attached to Jackpots and Black Swans in the decision to choose one particular option rather than any other available.

Finally, the associated so-called action probabilities are given, for each option, by the Softmax function (a.k.a. the multinomial Logit model), each a number between zero and one.

In our model-fitting procedure (described in detail below), the key parameters of interest are the decision-weight terms for rare and extreme gains (γg) and losses (γl). In the baseline Q-learning model, both terms are fixed at zero, meaning that REE do not receive any special weighting beyond their contribution to standard value updates. We therefore compared two families of candidate models to this benchmark: one in which REE contribute directly to option sub-values, and another in which REE influence choice only through dedicated decision-weight parameters (γg,γl).

The latter class of models is especially informative. If selected, it suggests that animals treat REE as qualitatively distinct inputs to the decision process-processed separately from average reward estimates-rather than integrating them into conventional value updating. Overall, the augmented Q-learning framework includes up to eight free parameters: learning rates for gains and losses, forgetting rates for gains and losses, decision weights for gain- and loss-related sub-values, and the two REE-specific decision weights γg and γl.

Model comparison using information criteria (summarized by phenotype in Table 1) shows that 17 of 20 animals incorporate REE using dedicated decision weights, whereas only three animals are best fit by models in which REE carry no special computational status. Most rats (including 8 blue, 4 green, and the red and pink animals) rely specifically on γg and/or γl to shape choices, with or without distinct forgetting parameters. A minority of blue rats incorporate REE both through decision-weight parameters and through REE contributions to sub-values. Learning and forgetting rates did not systematically distinguish between model classes.

Selected augmented Q-Learning models by phenotype

Figure 5 presents the predicted sensitivities (bottom left) and simulated sensitivities (bottom right), compared with empirically observed sensitivities (top; identical to Figure 2, right panel). Predicted sensitivities - computed using each rat’s estimated parameters and the actual reward sequence - closely reproduce empirical sensitivity patterns. Simulated sensitivities were obtained by running each selected model through artificial 41-session datasets, using median parameters for the blue and green phenotypes (and individual parameters for the red and pink rats). These simulations show that the selected models generate stable behavioral phenotypes for the blue, red, and pink animals. Green rats, however, tend to segregate into two stable subgroups along the vertical (One-Sided Sensitivity) axis.

Total Sensitivity to REE (y-axis) against One-sided Sensitivity to REE (x-axis).

Averages for each rat over the 41 sessions; top panel replicates the right panel in Figure 2: observed sensitivities; bottom left panel: predicted sensitivities derived from selected models; bottom right panel: simulated sensitivities computed from runs of selected models over artificial 41 sessions

Estimated parameters for each rat are shown in Figure 6. A prominent distinction between phenotypes emerges when focusing on the REE-specific d ecision w eights: b lue rats show a strong selective sensitivity to Black Swan, expressed as significantly n egative γl values (p = 0.0102). This indicates a robust computational penalty assigned to options that expose the animal to Black Swans (options F and V), consistent with their observed behavioral avoidance of these outcomes. In contrast, green rats do not show this selective aversive weighting and typically combine options F and A, suggesting a less selective treatment of REE. The before/after analysis illustrated in Figure 4 shows that the behavioral response to Black Swans is locally small in terms of both Total and One-sided sensitivities. This suggests that such effects are likely to be too subtle to be captured by this class of models for most rats.

Parameters of the selected Q-Learning model, estimated for each rat over the 41 sessions.

Left panel: decision weights for averages of gains and losses and for REE; right panel: learning and forgetting rates for gains and losses; while individual parameter for pink and red rats are represented as dots, parameters for blue and green rats are represented by box plots with 10th and 90th percentiles, interquartile ranges and median; between phenotype comparisons carried out by Wilcoxon Rank Sum tests (⋆ means p < 0.05) - see Material and Methods subsection Augmented Q-Learning Model Estimation and Simulation for details

Discussion

Asymmetric behavioral responses to rare extreme losses and rare extreme gains

We showed, first, that all rats in our sample exhibit some degree of Total Sensitivity to REE, with most animals displaying medium to high sensitivity. Second, we identified a dominant behavioral phenotype characterized by near-complete Black Swan Avoidance combined with partial Jackpot Seeking. We interpret these findings as reflecting a fundamental asymmetry between negative and positive REE, with a few caveats. Though we cannot fully exclude the possibility that satiety contributed to the lack of complete Jackpot Seeking, we did not systematically observe animals that stopped seeking the Jackpot. Because we restricted the rare outcome to occur on either the 10th or the 60th activation in a session (Table S1 in Supplementary Material), the question of whether the animals learn this association may arise. If the animals had learnt the 10th and 60th activation, they would exhibit a choice strategy that would tend to be more optimized than what is observed. For example, the options offering the possibility to obtain the Jackpot are not optimal in terms of gains for the frequent events, therefore the animals should tend to select these options only around the 10th and 60th choice. Most of their other choices should favor the options delivering the larger gains in the frequent domain. This is not what is observed.

In our task, avoiding Black Swans requires accepting longer frequent delays, and seeking Jackpots requires foregoing larger frequent gains. However, rats can guarantee the avoidance of Black Swans by selecting convex loss schedules, making these options relatively attractive. In contrast, no sequence of choices can guarantee the occurrence of a Jackpot-its realization depends on rare stochastic draws. For illustration, if 150 choices are made in a session (within the observed range) and the ex-ante REE probability is 1%, a Poisson approximation gives a probability of ≈ 78% for observing exactly one REE-a non-negligible chance for zero occurrence. Thus, our design captures a general asymmetry likely to characterize positive versus negative REE more broadly: while extreme losses can in principle be fully avoided, extreme gains cannot be assured, even under persistent optimal seeking. This asymmetry has important consequences for choice behavior. Rats experience a continuous stream of evidence-derived from the frequent-outcome domain-that points toward concave options as locally optimal. In particular, the Fragile option yields higher frequent gains and smaller frequent losses, and stochastically dominates the other options when REE do not occur. Although Fragile is dominated by Anti-fragile when both the Black Swan and Jackpot occur, complete Black Swan Avoidance provides near-certain protection against extreme losses. This makes strategies combining Anti-fragile and Robust options attractive, despite violating Stochastic Dominance in the frequent domain: full-domain Stochastic Dominance is effectively secured.

In contrast, complete Jackpot Seeking is less attractive because it cannot ensure the occurrence of a Jackpot; the cost of receiving smaller gains on most trials becomes unjustified unless the compensatory extreme reward actually occurs. Partial Jackpot Seeking therefore emerges as a more balanced strategy: it reduces violations of Stochastic Dominance in the frequent domain while still allowing some exposure to large rare gains. Put simply, bearing frequent relatively large losses is acceptable only if the avoided Black Swan is certain, and bearing frequent small gains is acceptable only if the compensating Jackpot is likely to appear-which it is not. Our results indicate that the “blue” rats have learned this property, combining Robust and Anti-fragile options to achieve near-complete Black Swan Avoidance with partial Jackpot Seeking. In contrast, “green” rats continue to mix Anti-fragile and Fragile options, leaving them exposed to Black Swans.

Interestingly, since rats do not stick to one choice but show some flexibility in their strategy, this rules out the influence of a spatial bias as the main driver in their response. Interestingly indeed, the design of the task was biased in two ways: the favorite hole during training was allocated to the antifragile option as a proof of concept and the REE occurred at the 10th and 60th choice if the appropriate option was chosen. Since most rats diversified their choices, the biased location does not seem to have prevented rats to learn all options. One might wonder whether or not they may have understood and learnt the 10th and 60th choice REE delivery. If this was the case, a trend towards optimal choice such as options delivering the largest rewards and the lowest losses in the most frequent domain with a shift towards options with positive REE and avoidance of negative REE around the 10th and 60th trial should have been observed. This was never the case.

In theory, because frequent outcomes occur often, animals should be able to detect Stochastic Dominance in the frequent domain relatively quickly. However, prior work indicates that animals require many sessions to learn Stochastic Dominance (e.g. [59], [3]). Our findings suggest a possible explanation: substantial experience may be needed to infer that REE have not occurred and can be excluded from the estimated distribution.

At this stage, it is natural to ask whether our findings reflect loss aversion or ambiguity aversion. Although most studies do not focus on REE, tasks involving both positive and negative outcomes generally report loss aversion (e.g., [18], [14], [61]), consistent with Prospect Theory ([22]). Models based on this framework, however, tend to fit rat behavior more effectively in tasks involving o nly rewards, w here losses correspond to the a bsence of a positive outcome rather than punishment ([10],[3]). Emotional and anxiety-related mechanisms may also contribute ([60]), potentially implicating the basolateral amygdala, which has been linked to differential evaluation of negative outcomes in rats ([54]). Ambiguity aversion in non-human animals remains relatively unexplored (but see [44]).

Our results tentatively suggest that rats may exhibit ambiguity reduction with respect to REE: combining near-complete Black Swan Avoidance with only partial Jackpot Seeking collapses the sampling probability of Black Swans toward zero, while keeping the sampling probability of Jackpots bounded between zero and ε. This behavioral asymmetry provides further evidence that extreme gains and losses receive distinct treatment during decision making. Classical Q-learning models, which allow differential weighting of gains and losses, fail to capture these patterns. In contrast, the augmented Q-learning models incorporating explicit sensitivity to REE accurately reproduce the observed behaviors - principally through the strong aversion to Black Swans that distinguishes the “blue” phenotype from the “green” one.

While a wavy recency effect has been documented in human experiments ([38]), the before/after analysis reported in Figure 4 suggests that there is no sizeable immediate recency effect for Jackpots in rats. Even for Black Swans, the immediate recency effect we report remains modest when using a 10-trial window, and the analysis of the choice immediately following a REE does not show evidence of immediate negative recency. Although further investigation is beyond the scope of this paper, possibly interacting recency effects of Jackpots and Black Swans seems a promising research avenue. Also note that the task design does not allow rats to completely avoid negative rare events (RE) unless they cease performing the task altogether - a pattern typically seen in paradigms involving aversive stimuli such as electric foot shocks. The fact that all 20 rats maintained stable performance across the 41 sessions therefore provides evidence against a pronounced “hot-stove effect” (see [12]).

The augmented Q-learning models suggest a specific neural pathway for REE encoding

The close fit and predictive accuracy of the augmented Q-learning model for the dominant behavioral phenotype point to a novel neurobiological hypothesis. Within this modeling framework, 11 of the 13 rats in the blue group (as defined in the Materials and Methods) appear to exclude REE from the Q-value update, yet display a specific sensitivity to Black Swans through a negative parameter γl. In other words, when evaluating available actions, “blue” rats do not integrate Black Swans into the running average of losses; instead, they assign them a distinct decision weight, effectively reducing the subjective value of options that expose the animal to these extreme losses.

In humans, rare and extreme stimuli are known to be disproportionately salient in perception and memory compared to moderate ones (e.g., [26]). Our behavioral and modeling results suggest that rats may exhibit a similar cognitive bias. This, in turn, raises the possibility that REE - particularly rare and extreme losses-may rely on neural encoding mechanisms that differ from those that process frequent, small rewards, which are classically attributed to dopamine neurons ([9], [46], [61], [55]), the striatum ([7], [11]), and cortical regions such as the anterior cingulate and orbitofrontal cortex ([45], [15]).

Identifying where and how REE are encoded in the brain remains an open question. If integrating REE requires representing long-term or low-frequency outcomes-beyond the short-term reward horizons typically studied (e.g., [56]) - candidate regions may include the anterior insula and the amygdala. The latter, given its central role in emotional salience, is a plausible substrate for representing highly aversive and unpredictable outcomes, which are known to bias choice asymmetrically toward negative events ([7], [54]).

From a computational standpoint, it is statistically rational to avoid folding extreme outliers into the mean of a distribution, as this can destabilize the estimate. Analogously, adaptive decision-making may benefit from treating REE separately from frequent outcomes, and from imposing different weights on Black Swans versus Jackpots. If such differential encoding confers evolutionary advantages in volatile environments (see [8]), then we might expect dedicated neural mechanisms to have evolved-possibly across multiple species.

To our knowledge, this specific hypothesis, grounded jointly in behavioral data and computational modeling, has not yet been directly tested in neuroscience. Doing so would align with the computational-validity framework advocated in [42], and represents an important direction for future research.

Relevance for humans

Whether humans would also find Black Swan Avoidance a nd Jackpot Seeking attractive - and how they might combine these strategies in a similar experimental context-is an obvious question raised by our results. Our experimental design can be readily adapted to human participants, allowing us to test whether people likewise tend to avoid harmful REE and pursue beneficial ones. We have started exploring modified designs for human subjects in which the rare but non-extreme outcomes are removed, in line with [27] who emphasize that combining rarity and extremity is key. Preliminary results indicate that the behavioral phenotypes observed in rats also emerge in humans under these modified conditions, suggesting that REE are the primary drivers of observed behaviors. Furthermore, such an extended task may be a promising tool to delineate decision-making profiles in professional environments (e.g. firemen) and also to investigate vulnerability to addiction in patients. This translational potential makes the present findings i n r ats r elevant t o b roader issues concerning the past and future of humanity.

Within the framework of the Anthropocene, mounting evidence shows that climate change and other environmental disruptions driven by human activities generate extreme events (see e.g., [13]), including rare but potentially catastrophic ones that could threaten many species with partial or complete extinction. Although uncertainty remains about the magnitude of these risks, empirical data place humanity in a situation that raises difficult but pressing questions. Why has our species historically failed to avoid behaviors and policies that increase exposure to destructive REE? And to what extent will humans be able to cope with such events in the near future, should they occur? Put differently, are humans following trajectories that diverge from Black Swan Avoidance rather than moving toward it?

While it is intuitive to assume that avoidance of destructive REE-and the ability to cope with them when they occur - provides adaptive value, compelling evidence for such evolutionary mechanisms remains scarce (but see [8]). If correct, this idea introduces a paradox: humans disproportionately contribute to creating environments rich in harmful REE. Does our species possess a particular trait that impairs our ability to protect ourselves against them, despite the apparent evolutionary benefits of doing s o? Because destructive REE are increasingly endogenously generated by human activity, one may even wonder whether a uniquely human developmental or epigenetic factor predisposes us to decisions that fail to eliminate exposure to extremely harmful outcomes. If so, such a factor could also impair our ability to cope with REE once they become unavoidable. For instance, humans might struggle to detect predictive cues of REE - such as convexity in our experimental design (see [2]).

The results from our rat experiments fortunately do not support this pessimistic view. The substantial individual variability we observe-where some animals fail to avoid rare but extreme losses even when given the opportunity - suggests that humans might share this variability rather than being uniquely deficient. Importantly, the majority of rats consistently chose options that nearly eliminated exposure to harmful REE while still allowing partial access to beneficial REE in the form of Jackpots. This “adaptive” subgroup constitutes the largest portion of our sample.

A speculative hypothesis follows: some humans, and perhaps many, may also adopt strategies that avoid harmful - and sometimes pursue beneficial - rare extreme outcomes. More broadly, if sensitivity and behavioral responses to REE vary across individuals within a species, this raises important questions about how such heterogeneity shapes population dynamics, social structures, and, crucially, collective decision-making.

Material and Methods

Experimental Model and Subject Details

Animal Subjects

Adult Lister Hooded males (n = 20, ≈200 g at arrival, Charles River) were housed in groups of two in Plexiglas cages and maintained on an inverted 12 h light/dark cycle (light onset at 7 pm) with water available ad libitum, in a temperature - and humidity - controlled environment. Food was slightly restricted (≈80% of daily intake). Animal care and use conformed to the French regulation (Decree 2010-118) and were approved by local ethic committee and the French Ministry of Agriculture under #03129.01.

Experimental Method Details

Apparatus

All behavioral experiments took place during the animals’ dark phase in standard five-hole operant boxes (MedAssociates) located in ventilated sound-attenuating cubicles. One side of each box was equipped with a central house light, a tone generator and a food magazine, outfitted an infrared beam for detecting nose poke inputs. Sucrose pellets (20 mg; Bio-Concept Scientific) were delivered from an external food pellet dispenser. An array of five response holes was located on the opposite curved wall, each equipped with stimulus lights and infrared beams for detecting input (nose poke). The center hole was continuously closed throughout the experiments (Fig. 1a). Data were acquired on a PC running MedPC-IV.

Design of Behavioral Sequences

Four menus were elaborated by mixing convex and concave exposures for both the gains (sugar pellets) and the losses (time-out punishment) described on Figure 1 (panel (b)). For the gains, animals could obtain 1, 3 (NE domain, blue), 12 (RE domain, yellow) or 80 (REE domain; pink) sugar pellets in the convex exposure or 2, 4, 5 or 5 pellets in the concave exposure. For the losses, convex exposure may impose 6, 12, 15 or 15 sec of time-out punishment while it was 3, 9 (NE domain; blue), 36 (RE domain, yellow) or 240 sec (REE; pink) for the concave exposure.

The four behavioral options depicted in Figure 1, panel (c), are therefore combinations of the above convex and concave exposures: the “Anti-fragile” exposure at the top middle is convex in both gains and losses, while the “Robust” option (left) is only convex for the losses. On the other hand, the “Vulnerable” exposure (right) is convex only for the gains, while the “Fragile” option (bottom middle) is concave for both gains and losses. This implies that the extreme - but rare - gain of 80 pellets (i.e. Jackpot) may be delivered only when either the Anti-fragile or the Vulnerable options are picked, while the extreme and rare time-out punishment of 240 sec (i.e. Black Swan) may be only experienced if choosing either the Fragile or the Vulnerable options.

The first three events of each behavioral options belong to the frequent domain as their frequency of occurrence, respectively 0.5, 0.4 or 0.1, is significantly larger than zero. During behavioral testing, animals equivalently experienced the gain and loss domains. On the other hand, extreme outcomes, i.e. Jackpot or Black Swan, have a much smaller likelihood of occurring since they may appear only at particular point during the behavioral sequences (see below). This means that they could happen if a rat has chosen an exposure that is either convex in the gain domain or concave in the loss domain at a given time. This implies that the frequency of rare and extreme events is less than half of one percent for all rats. Despite their low probability, extreme events have some importance because of their value. Our calibration of both frequencies and outcomes ensures that concave exposures dominate convex exposures: if the extreme gain (Jackpot) does not materialize, the expected payoff in sugar pellets is larger for concave exposures than convex exposures (i.e. ex-post first order stochastic dominance, over the frequent domain). Similarly, the expected time-out punishment is lower for concave exposures compared to convex ones, if the Black Swan does not happen. However, ex-post first-order stochastic dominance is reversed in the presence of extreme events, in which case convex exposures become more interesting in terms of payoff than concave ones. The reason we imposed such a dominance reversal was as follows. If rats always choose concave exposures, then they show no sensitivity to rare and extreme events, since they act as if those events never occur and always go for first-order stochastic dominance. This gives us our first measure, Total Sensitivity to REE, that simply sums up the proportion of convex exposures that are chosen for each rat over the 41 sessions. Because we deliberately integrate gains and losses in our design, we need to distinguish whether rats tend to choose convex exposures symmetrically over the gain and loss domains. We say that a particular rat exhibits Black Swan avoidance when it picks convex exposures in the loss domain more often than in the gain domain. Likewise, Jackpot Seeking occurs when convex choices are more frequent in the gain domain.

To avoid potential learning of the event occurrence during behavioral training and testing, ten different sequences of events, with respect of the first-order stochastic dominance as well as the balance between gain and loss domain exposures, were generated and randomly used for behavioral training and testing (Supplemental Table S3). Furthermore, to increase the rarity and the unpredictable nature of extreme events, the sequences of events used during behavioral training and testing were declined into seven various sequences, in which Jackpot and Black Swan are either unavailable, solely or both available at a given time point of the sequence of events (Supplemental Table S1): when available, extreme events could be obtained at the 10th or the 60th activation within a given sequence, but could not occur at the same time. For example, in a behavioral sequence where the Jackpot should be available at the 10th position, any poke in the Anti-fragile and Vulnerable holes following nine responses made in one of these menus (regardless of the positive or negative outcomes) would trigger the delivery of the Jackpot. Of note, depending on the sequence used, animals could experience both extreme events in a single session.

Behavioral Training and Testing

Training was divided into five distinct phases before the final test: acquisition of the food collecting responses, acquisition of nose poking in the holes, training with four holes, attribution of menus to hole and training on the menus (Figure 7). Each session started with the illumination of the house light.

Time-line of experiments.

  1. Acquisition of the food collecting responses: Animals were trained to collect sucrose pellets in the food magazine during three 30-min daily sessions (100 pellets max) under fixed ratio 1 schedule of reinforcement (FR1): one nose-poke into the food magazine triggered the delivery of one sucrose pellet. During this initial phase, nose-poke in any hole had no programmed consequences.

  2. Acquisition of nose-poking in the holes: Here, animals had only access to one hole (the three others being occluded) during the 20-min sessions. One nose-poke in the available hole triggered the illumination of the hole-light and the delivery of one sucrose pellet in the food magazine. Perseverative nose-pokes (those performed before food collection) had no consequences. Following food collection in the food tray, animals were allowed to poke again in the opened hole. During each session, a maximum of 100 pellets were delivered. All animals were trained twice on each hole.

  3. Training with four holes: Following the eight training sessions, animals were allowed to poke in the four different holes during twenty daily sessions of 20-min. Nose poke in any hole triggered the illumination of the associated light and the delivery of one sugar pellet in the food magazine. Both perseverative activations and pokes in other holes had no consequences. After food collection in the magazine, animals were allowed to nose poke again in any hole. The first ten training sessions were limited to 100 sugar pellets. During the following ten sessions, animals were able to collect up to 200 pellets per session.

  4. Attribution of menus to hole: We determined the spatial preference for each rat by establishing the percentage of activation of each hole during the last ten sessions. To favor the emergence of Anti-fragile choices, menus’ attribution was made as follow:

    • Anti-fragile exposure was associated to the preferred hole

    • Robust exposure was associated to the 2nd preferred hole

    • Vulnerable exposure was associated to the 3rd preferred hole

    • Fragile exposure was associated to the least preferred hole

  5. Training on the exposures: Here, animals were first trained on the gain domain, i.e. no time-out punishment, for each menu/hole. They were subjected to two 20-min sessions, with unlimited number of pellets, during which only one hole was available (eight training sessions in total). For Anti-fragile and Vulnerable options, we used the two sequence-types where the jackpot was available at the 10th and 60th activations (Supplemental Table S1) to ensure that all individuals could experience both an early and delayed jackpot during training. Animals were then allowed to explore all gain options (four opened holes) during 9 20-min sessions, for which different behavioral sequence-types were used. Before training on the loss domain of the different menus, animals were first exposed to a mild and constant 3-sec time-out punishment. Here, animals had access to all options, but half of the activations lead to a 3-sec time-out punishment, notified by a 3-sec tone and the extinction of the house light. During this period, pokes in the different holes or the food magazine had no programmed consequences. After the 3-sec time-out punishment, the house light was turned on and the animals could again pokes in the different holes. Following nine 20-min training sessions, the loss domain of each menus was progressively introduced. As described above, each time-out punishment was notified by a 3-sec tone and the extinction of the house light for the whole duration of the punishment, which termination was signaled by house light illumination. Animals were first exposed to concave exposures, having only access to vulnerable and fragile holes during four 20-min sessions. They were then exposed to convex losses (only Anti-fragile and Robust holes available) for another four 20-min sessions. Thus, at the end of the training, all animals experienced both extreme events at least four times.

  6. Final tests: For the final tests, animals were free to explore all menus throughout the forty 20-min sessions. Animals experienced four times the ten different sequences, randomly distributed across sessions. Population (n = 20) was subdivided into two groups that experienced different, but equivalent, sequence-type distribution (Supplemental Table S2).

Figure 7 depicts a simplified time-line of the experimental procedure.

Table S3 shows the chain of events in the ten sequences used for behavioral training and testing. Numbers in the second column indicate which event would occur and the sign preceding it whether it belongs to the gain (no sign) or loss (minus sign) domains. For example, the third event in the first sequence (noted −2) triggers a time-out punishment of 9 or 12 sec, depending on whether an animal performed his third nose poke in concave or convex menu, respectively. Note that extreme events (which should be noted 4 or −4) do not appear in the sequences, as they would automatically replace the 10th or 60th events of the sequence. See supplementary material for detailed information on sequences of events.

Modelling and Statistical Analysis

Modelling Convex and Concave Exposures

Central to our experimental design is the notion of convex/concave exposure under radical uncertainty, that is, when probabilities and consequences are unknown a priori to subjects. A well known measure of convexity is the Jensen gap that is derived from Jensen’s inequality (see [30] for a graphical exposition), which we now define and relate to statistical moments. In our context, the relevant form of Jensen’s equality states, loosely speaking, that the expectation of a convex function of a random variable is larger than the value of that function when evaluated at the expectation of the random variable. Jensen’s gap is then defined as the difference between the expectation of the function minus the value of the function at the expectation (hence positive by construction). The inequality is reversed for a concave function and the (positive again) Jensen’s gap is then defined as the difference between the value of the function at the expectation minus the expectation of the function. In the gain domain, rats obtain 1, 3, 12 or 80 sugar pellets if the convex exposure is chosen, or 2, 4, 5, 5 pellets if the concave exposure chosen. In the loss domain, convex exposure imposes 6, 12, 15 or 15 seconds of time-out punishment, and 3, 9, 36, 240 seconds for the concave exposure. Because the values in the loss domain are proportional to the values in the gain domain, the former corresponding to 3 times the latter, we focus here on values for pellets. The statistical properties of convex and concave losses follow accordingly. More formally, sequences of gains are ordered by increasing values and are denoted for the convex exposure and for concave gains. We assume identical probabilities for both concave and convex gains, where ε is the ex-ante probability of the rare and extreme event (REE for short). The third value (that is, 12 for convex gains and 5 for concave gains) is labeled a rare event (RE), and the sets of the lowest two values (that is, 1 and 3 for convex gains, 2 and 4 for concave gains) for both exposures are composed of normal events (NE).

Making use of the exponential transform, we define Jensen gaps for the convex and concave exposures as, respectively, which relate to statistical moments as follows. The moment generating function for the convex and concave exposures are given by, respectively: and . The series expansion of the exponential function allows us to derive:

where and are the N-th statistical raw moments of the convex and concave exposures, respectively. For example, for any integer N ≥ 1. Jensen’s gaps for both exposures are then given by . Using again the series expansion of the exponential of the first moment for each exposure, it follows that both positive Jensen gaps can be written in terms of moments, as follows:

Note that the sums in equation (3) nicely allow for a straightforward decomposition in terms on all raw moments with order larger than two.

More specifically, we set probabilities to identically for both concave and convex gains. Treating the probability for the REE as a varying parameter, it can be shown that Jensen’s gaps for convex and concave exposures are ordered, with the former larger than the latter when ε ≤ 0.02. Over that range, it also holds true that both Jensen’s gaps are monotone increasing function of ε. While this property also holds for the expectations and variances of both exposures, it doesn’t for higher-order moments. For example, the skewness and kurtosis of the convex exposure turn out to be hump-shaped functions of ε, with peaks corresponding to values of ε smaller than one percent. This fact underlines that convexity measured by Jensen’s gap offers a unifying approach to rank exposures, seen as lotteries, whereas statistical moments do not necessarily do so. We conjecture that, more generally, all lotteries satisfying our assumptions (including monotone probability distribution) can be ranked according to their convexity as measured by the Jensen gap.

In Table 2, we report the central moments and the log of Jensen gaps in the top five rows, for both exposures when ε = 1/75 ≈ 1.3%, which corresponds to obtaining one REE out of 75 nose pokes. For that particular parametrization, convex and concave exposures differ mostly in their respective kurtosis and in their Jensen gaps. From row six and below, we report the decomposition of Jensen gaps in terms of the raw moments. For example, in row six is given the ratio of (m2 − (m1)2)/2 - that is, half the variance - to the corresponding Jensen gap for each exposure, in percentage. Strikingly, while moments of order 2 to 8 explain about 90% of the concave exposure’s Jensen gap, they contribute negligibly to that of the convex exposure. In fact, it turns out that moments with order around 80 start contributing, although for a small share each, to the convex exposure’s Jensen gap. In sum, while for the concave exposure a few low-order moments concentrate the contributions to the Jensen gap, the contributions of raw moments to the convex exposure’s Jensen gap spread across a larger number of very high order raw moments.

Statistical moments and decomposition of Jensen gaps for convex and concave exposures

Before/after differences for Total Sentivity to REE and for One-sided Sensitivity to REE, averaged over the 41 sessions, for each of the 20 rats - see Figure 4.

We highlight negative mean differences in red and positive mean differences in yellow.

We derive next the ex-ante properties of the probability distributions associated with concave and convex gains (for definitions of stochastic dominance and congruent utility classes, see for example Fishburn and Vickson [16]; also note that first-order, resp. second-order, stochastic dominance implies second-order, resp. third-order, stochastic dominance):

  • In the domain restricted to NE, concave gains first-order stochastically dominate convex gains. In addition, concave gains then have a larger expected value than that of convex gains, with equal variance, skewness and kurtosis for concave and convex exposures.

  • In the domain restricted to NE and RE, concave gains second-order stochastically dominate convex gains. In addition, concave gains then have a larger expected value and smaller variance, skewness and kurtosis than that of convex gains.

  • In the full domain including NE, RE and REE, convex gains second-order stochastically dominate concave gains if and only if ε ≥ 0.302%. In addition, convex gains then have a larger expected value, variance, skewness and kurtosis than that of concave gains if ε ≥ 0.302%.

The above assumptions on the experimental design are stated in terms of stochastic dominance and moments of the probability distributions. They also have implications in relation to standard approaches to decision-making. First, from an ex-ante perspective with perfect information about the probability distributions of all exposures, value-maximizing subjects whose preferences are represented by any non-decreasing value function would choose concave gains in the domain restricted to NE. They would continue to do so in the domain restricted to NE and RE for any non-decreasing and concave value function. However, in the full domain with REE, value-maximizing subjects endowed with any non-decreasing and concave value function are predicted to choose convex gains.

We denote the value attributed to sequences of gains with probabilities , for n ≤ 4. The above assumptions have the following implications in terms of value maximization:

  • In the domain restricted to NE, holds.

  • In the domain restricted to NE and RE, holds.

  • In the unrestricted domain, holds if and only if ε ≥ 0.302%.

Note that the reversal in second-order stochastic dominance (and value) in the full domain favors convex gains when extreme events that are indeed very rare - with probabilities much smaller that one percent - are added. In contrast, absent REE, adding only a RE is not enough to favor convex gains, even though it allows subjects to possibly detect convexity/acceleration and concavity/deceleration of gains and losses.

The above assumptions hold true under expected utility, that is when with the appropriate conditions on the utility function u (non-decreasing for first-order stochastic dominance, non-decreasing and concave for second-order stochastic dominance; see [16]). To better match experimental data, however, expected utility is increasingly supplemented by some form of probability weighting w(pi). For instance, [10] use a two-parameter functional form due to [39] and find that 30 rats out of 36 behave as if their probability weighting w(pi) is concave. In that case, the above ex-ante properties hold mutatis mutandis, for example, under expected utility with probability weighting, defined as . Alternatively, in the setting of rank-dependent expected utility, we can make use of results 3 and 4 in [25] to show that the above ex-ante properties still hold provided that the transformation of the cumulative distribution function is any increasing-concave function. Note that since in our experimental design REEs are both extremes (in the sense of being the largest values) and rare (that is, they have very low probabilities), one expects similar results under probability weighting and rank-dependent expected utility (or cumulative prospect theory for that matter). Although in theory assuming simply a weighting function w(pi) that is inverse-s-shaped and “very” convex for large gains (see [39] for the associated parametric restrictions) could overturn the rankings of concave and convex exposures stated above, results in [10] suggest that this is not to be expected for the overwhelming majority of the rats that are subject to their experiments.

Choice Data Analysis

Given the four options modelled in the previous section, each of the 20 rats went through 41 final sessions. Denote fA, fF, fR, and fV the relative frequencies (in terms of nose pokes) with which each of the four options Antifragile, Fragile, Robust and Vulnerable, respectively, is chosen over a particular session. In order to represent the rats’ choices, we construct the rotated square in panel (d) of Figure 1 using a linear transformation of the 3-frequency vector, as follows:

In equation (4), the first two rows deliver the two coordinates of interest that are the Total Sentivity to REE (TSREE in short) and One-sided Sentivity to REE (OSREE), while the last row requires that all frequencies sum to one. To derive TSREE and OSREE, it is useful to go through the following steps. Since we are interested in the convexity property of each chosen option, we first measure the frequencies of convex options in the gain domain by summing fA and fV. Similarly, fA + fR measures how frequent choices are convex in the loss domain. It follows that TSREE measures total convexity if it equals 2fA + fV + fR, while OSREE equals fVfR and measures asymmetric convexity (that in the gain domain minus that in the loss domain). Hence the first two rows in equation (4). Using that fA + fF + fR + fV = 1 - that is, the last row of equation (4) - it follows that TSREE equals the simpler expression 1 + fAfF. Note that the 3 × 3 matrix in equation (4) is invertible, which implies that all three frequencies can be recovered from the left vector. In addition, that matrix is not unique in the sense that the definitions of TSREE and OSREE that appear above as its first two lines could be moved to any other rows, provided that the identity in the third row and the constraint in the last row are adapted accordingly. This essentially means that given both differences TSREE (= 1 + fAfF) and OSREE (= fVfR), any of the four relative frequencies provides enough information to derive the remaining three, using the constraint that all four frequencies sum up to one. Finally, it follows that in the rotated square in panel (d) of Figure 1, the four vertices have the following coordinates: (0, 0) for Fragile, (0, 2) for Antifragile, (−1, 1) for Robust, and (1, 1) for Vulnerable. TSREE and OSREE are depicted in Figure 2.

TSREE and OSREE are also related to behavioral measures of the share of nosepokes that lead to either exposure or non exposure to REE, as follows: we define Black Swan Avoidance as the share of nosepokes that lead to avoidance of Black Swans - i.e. fA + fR - and Jackpot Seeking as the share of nosepokes that lead to be exposed to Jackpots - i.e. fA + fV. It follows that TSREE is by definition the sum of those two shares, while OSREE is the difference between the former and the latter. Formally, if we denote JPS and BSA the shares of nosepokes that lead to be exposed to Jackpots and to avoid Black Swans, this implies that JPS= 0.5(TSREE+OSREE) and BSA= 0.5(TSREE−OSREE). In other words, in the (not rotated) square that appears in panel (d) of Figure 1, the four vertices have the alternative coordinates: (0, 0) for Fragile, (1, 1) for Antifragile, (0, 1) for Robust, and (1, 0) for Vulnerable, in the (JPS,BSA) coordinates. Obviously, both interpretations are equivalent up to a change in coordinates which is a bijection. Note that in the rotated square in panel (d) of Figure 1, lines with a 45-degree slope depict choices with constant BSA so that lines moving north-west represent increasing BSA. Similarly, lines with a −45-degree slope depict choices with constant JPS so that lines moving north-east represent increasing JPS. Median BSA and JP are depicted in Figure 3.

In Figure 4 we report the short term behavioral responsiveness of rat to REE. Behavioral responsiveness is measured by calculating for each rat before/after differences in TSREE and OSREE. This amounts to calculate TSREE and OSREE over the 10 choices preceding a REE and over the 10 choices following the occurrence of each REE. The window of 10 choices before and after is dictated by the configuration of REE in our d esign. A REE can happen at 11th choice in a sequence (see appendix). Before/after TSREE and OSREE are averaged by rat over the 41 sessions in the gain domain (Jackpot) and in the loss domain (Black Swan). Table 3 reports the values for the mean before/after differences. I n each panel of Figure 4, each rat is depicted by a color-coded dot and the dotted black line represents the 45-degree line. Color-coded dot on the 45-degree line indicate no difference in TSREE or OSREE before and after a REE, i.e. no short term behavioral responsiveness to REE. To visualisation purpose, we computed for each panel a smooth spline regression that estimates a non parametric relationship between before and after TSREE and OSREE. They are plotted as solid black lines in each panel.

We then test the statistical significance of short term behavioral by conducting bootstrap paired-sample mean tests on before/after coordinates (separately) for each type of REE. We assessed short-term behavioral responses using a paired bootstrap test (boot.paired.bca) in R (wBoot package). This non-parametric method estimates the mean difference between paired conditions by resampling with replacement and computes a bias-corrected and accelerated (BCa) confidence interval, providing robust inference without assuming normality. An observation in the sample is the mean behavioral responsiveness in TSREE or OSREE for a given rat. Observations therefore respect statistical independence. Bootstrap tests lead to the following p-values, under the null hypothesis that the mean difference i s zero. Following Jackpots, p = 0.3470 for TSREE and p = 0.0522 for OSREE, which indicates that the null hypothesis is not rejected at 1%. Following Black Swans, however, the null hypothesis is rejected at 1% since p = 0.0045 for TSREE and p = 0.0064 for OSREE. In sum, the mean before/after difference following a REE in the loss domain is s ignificant for both Total and One-sided sensitivities, further confirming the Black Swan avoidance result that we document in this paper.

We also report in Figure 8 how choices by rats after the 41 final sessions compare to choices made in the training step 3, defined in the section on Behavioral Training and Testing. We do so by computing TSREE and OSREE in step 3 sessions and pair coordinates resulting from training with that resulting from the 40 experimental sessions reported in Figure 2. The lines between two color-coded dot connects for each rat its conditioning coordinates with its coordinates resulting from its behavior in the 41 sessions. We report on the right side of the figure the total variation distance between the distribution of nosepokes in the 4 holes in the training and the distribution in the 41 experimental sessions. The interpretation of total variation distance in our setting is simple: it measures, for each rat, the proportion of nosepokes that need to be changed in order to equalize the behavior in the training and the behavior in the 41 final experimental sessions. Median and mean total variation distance are 0.293 and 0.324 respectively, which implies that half of the rats changed more than 30% of their choices in the final experimental sessions compared to the training sessions.

Variation of choices observed in final experimental sessions compared to that observed in training sessions (step 3 in Material and Methods section on Experimental Method Details).

Panel (a) shows variations of Total Sensitivity to REE (y-axis) and of One-sided Sensitivity to REE (x-axis), that is, dark color-coded dots replicate averages over the 40 sessions for each of the 20 rats - see panel (c) in Figure 2 - and are paired with averages over training sessions for the same rat, depicted with light color-coded dots connected through lines with corresponding dark color-coded dots; panel (b) shows total variation distance, that is, the proportion of total nose pokes in training sessions that need to be changed to replicate nosepokes in final sessions

Finally, in Figure S1 of the Supplementary Material, we report what we label “convexity premiums”, which are counterfactual situations that we construct as follows. Given the rats’ choices over the 41 sessions, we recall all random sequences that have been used to generate gains and losses for each rat. We next compute the outcome that would have been obtained, had the rat chosen convex options in the gain domain (that is, Antifragile or Vulnerable) and in the loss domain (that is, Antifragile or Robust) all the time. The right and left panels in Figure S1 depict the normalized convexity premiums thus computed, for each rat represented in rows, with convex exposures yielding more pellets and implying less waiting in terms of seconds.

Augmented Q-Learning Model Estimation and Simulation

Parameter optimization is carried out for each rat over the 41 sessions, using observed choices and outcomes. We estimate by maximum likelihood two series of augmented Q-Learning models. The first series integrates REE in Q subvalues whereas the second series does not. Each series consists of a first baseline model without specific forgetting rates and decision weights attached to REE. Two partial models integrate either specific forgetting parameters or decisions weights on the presence of REE. The fourth model integrate both. In total, we estimate eight nested models for each rat. Formally, we estimate four different models: a baseline model (Model 1) without specific forgetting parameters (, with k{g, l}) and with zero decision weights of REE (γg = γl = 0). Model 2 introduces specific forgetting parameters (, with k{g, l}) and Model 3 introduces decision weights of REE (γg and γl0). Model 4 introduces both.

To ensure convergence and avoid local maxima, each parameter in baseline models was initialized with two different points of its parameter space. This makes in total 42 = 16 combinations of initial parameter values. Each of the 16 models is estimated using an automatized two-step procedure. The fist step consists of an unconstrained downhill simplex method in [31], which does not make use of first derivatives. Estimated parameters are then used as initial parameters in the second step of the estimation procedure, a quasi-Newton method in [17]. Parameter estimates from baseline models are then used as initial parameters for subsequent augmented models. Estimation is carried out using the above two-step procedure to ensure that we convergence is reached.

Once the eight models are estimated, we then compare them using both Akaike information criterion (AIC) and Bayesien Information criterion (BIC), that penalizes the use of additional parameters, so as to select which model configuration best fits observed behavior for each rat (see Supplementary Material for detailed results per rat). We check whether the selected model properly estimates sensitivities to REE by comparing estimated sensitivities to observed sensitivities to REE. This is done by computing Total Sensitivity to REE and One-sided Sensitivity to REE based on estimated choice frequencies from the extended Q-Learning model over the 41 sessions. Fitted sensitivities are reported bottom right panel in Figure 5. Finally, we checked that selected models were able to reproduce experimental data by running simulations for each phenotype observed in the sample of animals (as in [35]). Simulations are conducted over the artificially generated original 41 sessions, using individual parameter estimates for pink and red rats, median parameter estimates for blue and green rats and estimated Q subvalues as prior Qs. For each simulation, we then computed Total Sensitivity to REE and One-sided Sensitivity to REE. Simulations are reported in bottom right panel of Figure 5. See Supplementary Material for more details on the estimation results and for an exploration in related outcome range-adaption models

Supplementary Material

Sequences used in the experimental design

In Table S1 we report the different types of sequences used for behavioral training and testing, which shows the position of the extreme events. For example, in the sequence-type 6, jackpot could be obtained at the 10th position while the Black-swan would be triggered at the 60th activation.

Position of REE in the sequence

Table S2 shows the different behavioral sequence-types used for the forty testing sessions. As indicated in the text, half of the population was subjected to the sequence-types described in the left part of the ‘type column’, while the other half experienced the sequence-types described in the right part of the ‘type column’. Table S3 shows the succession of events for each sequence.

Sequences and types

Succession of events for each sequence

Behavioral measures for each rat

All behavioral measures for each rat that are used in the main text are gathered in Table S4: Total and One-sided Sensitivities to REE, Black Swan Avoidance and Jackpot Seeking (both in %).

Behavioral measures for each rat - Total and One-sided Sensitivities to REE - TSREE and OSREE in short; Black Swan Avoidance and Jackpot Seeking in percentage - BSA and JPS in short.

The last four columns present the percentage choice of each option.

Augmented Q-Learning model selection according to BIC

Table S5 presents BIC values for all rat models. In the first column, the rats’ phenotypes are color-coded as in the text. Each uniquely selected model is identified by light blue color.

BIC values for all augmented Q-Learning models, with selected value highlighted

Augmented Q-Learning model selection according to AIC

Table S6 presents AIC values for all rat models. In the first column, the rats’ phenotypes are color-coded as in the text. Each uniquely selected model is identified by light blue color.

AIC values for all augmented Q-Learning models, with selected value highlighted

Parameter estimates for selected Q-Learning models

Table S7 presents for each rat all estimated parameters arising from models selected using BIC (see Table S5 for values of this criterion).

Parameter estimates by rat for all selected Q-Learning models

Comparison of mean and median parameter estimates of augmented Q-Learning models for blue and green phenotypes

Table S8 presents median and mean parameter values estimated from augmented Q-Learning models for green and blue phenotypes, as well as the relevant statistical tests. Parameters difference tests are carried out by Wilcoxon Rank Sum tests (column 4) and Mean Difference tests (column 7). Significant results are highlighted.

Mean and median parameter estimates for blue and green phenotypes, and difference tests with significant results highlighted

An augmented Q-Learning model with outcome range-adaptation

As a robustness check, we introduce in our augmented Q-learning models the range principle due to [36] - see also [34] - which captures the notion that subjective evaluation of rewards may take into account their range through the Min-Max normalization presented below. That is, we replace objective rewards by their subjective judgements s(.) as follows:

Subjective judgements were then substituted in equation (1) for gains and losses and the resulting Q-learning models were estimated using the procedure described in Material and Methods, section Augmented Q-Learning Model Estimation and Simulation. More precisely, subjective judgements for gains and losses are as follows: s(rg) = (rg − 1)/(80 − 1) for options that give convex gains and s(rg) = (rg − 2)/(5 − 2) for options that give concave gains; similarly, s(rl) = (rl + 15)/(−6 + 15) when losses are convex losses and s(rl) = (rl + 240)/(−3 + 240) when losses are concave. REE are then used (as Min and Max values) to normalize all rewards and they are included in the Q-sub-values when they happen. This implies, by definition, that subjective judgements equal zero when Black Swans materialize, and equal one when Jackpots occur.

When REE are introduced through the normalization of gains and losses described above, models that consider nonzero decisions weights for REE are selected for 18 rats out of 20. Augmented Q-learning models with outcome range-adaptation proved however to improve the quality of the models for four rats only and we kept with our parsimonious models in the article. BIC and AIC values by rat are presented in the Tables S9-S10, for all four models with REE included in the Q-sub-values.

BIC values for all augmented Q-Learning model with outcome range-adaptation, with selected value highlighted

AIC values for all augmented Q-Learning model with outcome range-adaptation, with selected value highlighted

Outcomes

What are the ex-post outcomes of all choices made by the different rats over the course of the 41 final sessions? In the first left columns of Table S11, we report for each rat the number of nose pokes, the number of REE of each type the rat has experienced. We also report, for the rewards in pellets and the waiting times in seconds, the sum as well as the first four moments of the outcome per nose poke. The overall pattern that emerges from Table S11 is that a typical rats in with high Total Sensitivity group tends to have outcomes that differ from a typical rat with moderate Total Sensitivity. For gains (i.e. sugar pellets), the former’s rewards have typically higher mean and variance but smaller skewness and kurtosis that the later’s. This happens for instance, to an extreme degree, when we compare rat 15 (the most Anti-fragile rat) and rat 17. On the loss side, by symmetry the waiting time of the average high-sensitivity rat tends to have lower mean and variance, but larger skewness and kurtosis. These facts are consistent with the fact that a typical high-Total Sensitivity rat tend to pick a mix of exposures that is more convex both on the gain domain and on the loss domain, compared to an average low-Total Sensitivity rat.

Ex-post total outcomes for each of the 20 rats over the final 41 sessions - summary statistics.

Each row represents one rat and the color codes indicate the profile of the animal, as in Figure 2

Convexity Premiums

Evidently the outcomes that we report in Table S11 depend on the convexity mix of options that each rat has chosen, as captured by our notions of Total and One-Sided Sensitivity. To go beyond Table S11 so as to capture the extent to which rats exploit convexity in their choices, it is perhaps informative to look at the outcomes for each rat in the following way. Right and left panels in Figure S1 refer to losses and rewards, respectively.

In each panel, rats are color-coded as in Figure 2, and they are represented horizontally by two segments which are normalized in the following way. Consider for example (blue) rat 1 in the first line. In the right panel about pellets, the point of the black segment most to the left represents the outcome that would have happened, had the rat chosen a concave option in the gain domain (that is, either Robust or Fragile), exclusively in all sessions. The point most to right corresponds, in contrast, to the counterfactual outcome in which all nose pokes of the rat correspond to a convex option (that is, either Vulnerable or AntiFragile). Superimposed on this background grey segment, is a bold-colored segment (blue because rat 1 is blue) that ends with a circle indicating what the rat has gained due to convexity in relative terms - what we call the convexity premium (see Appendix for more details). Rat 1 has a normalized convexity premium that corresponds to roughly 80% of what it would have got, in terms of sugar pellets, had it counterfactually chosen to mix only Vulnerable and Fragile, instead of Robust and Anti-fragile as it did. Similarly, one sees from the right panel of Figure S1 that the Anti-fragile rat 15 has a convexity premium that corresponds to roughly 90% of what it would have got, in terms of sugar pellets, had he chosen to mix only Vulnerable and Fragile. From panel (b) of Figure 2, we know that rat 15 has also, more rarely, picked the Robust and Fragile exposures and this is why its convexity premium in the gain domain is less than maximal. At the other extreme, the rat 6 has a convexity premium near zero. Note that two rats are outside the black segment in the gain domain, that is, have “negative” convexity premiums, and they are indicated by dashed lines. Rat 17 has picked the Anti-fragile exposure a few times but got only one extreme gain, while rat 14 is more sensitive and hence got more extreme gains but still too few of them. Therefore, both rats got less sugar pellets than what they would have gotten by mixing exposures that are concave in the gain domain (Robust and Fragile).

Convexity Premiums for each of the 20 rats (row) - see Appendix for details.

Each row represents a rat and the color code indicates the profile of the animal, as in Figure 2. Towards the right, the dot materializes how many pellets were obtained relative to the number that would have been obtained if a convex menu had been chosen at each trial. Towards the left, the coloured dot indicates the total seconds of penalty obtained relative to what would have been obtained if a concave menu had been chosen at each trial.

The left panel in Figure S1 about time-out punishments can be read in a similar way, with now the point most to the left of the black segment corresponds to the largest loss in terms of waiting times, while the end point to the right corresponds to the lowest amount of time wasted. Therefore, the bold and colored segment indicates a negative premium: for instance, rat 2 on the first line in the left panel of Figure S1 has been exposed to the largest time-out punishment: it has a negative and large convexity premium because it mixed exposures with concave losses (that, is Fragile and Vulnerable), too often. Symmetrically, some rats turn out to get convexity premiums that are above 100% because they got too few black swans when they picked exposures that are concave in the loss domain). Comparing the right and left panels in Figure S1, one infers that rats exploit convexity better in the loss domain than in the gain domain, that is, they more often avoid Black Swans than they get Jackpots. This is of course consistent with our earlier observation that most rats exhibit moderate to high Black Swan Avoidance.

Data availability

Data availability: All data needed to replicate the results are available via GitHub at https://github.com/PatrickPintus/Rare-and-Extreme-Events. Code is available from the authors upon request

Acknowledgements

This paper supersedes a previous version circulated under the title “Decision-Making in Rats is Sensitive to Rare and Extreme Events: the Black Swan Avoidance”. The authors would like to gratefully thank the Editors and Referees at eLife for constructive and helpful comments and suggestions. The authors would like to also thank, without implicating, Raouf Boucekkine, Andrea Brovelli, Giorgio Coricelli, John Duffy, Thibault Gajdos, Brian Hill, Tobias Kalenscher, Fabio Maccheroni, Dean Mobbs, Stefano Palminteri, Drazen Prelec, Guillaume Rocheteau and Wolfram Schultz for very useful discussions and encouragement at early and more recent stages of this research, which has had a long gestation lag, as well as many conference and seminar participants for helpful feedback, especially at European Winter Conference on Brain Research, CERGIC, BSE Summer Forum, Institut Jean Nicod, Bocconi University, CREST, GREDEG, UC Irvine, MUSEES conference in Lyon, Comparative Psychology Research Colloquium in Düsseldorf, LNC2, IAST.

Additional information

Funding

The authors are grateful for the support of ANR through BEAM (ANR-15-ORAR-0004-03) and of CNRS through a MITI interdisciplinary grant (call for projects on rare events).

Author contributions

C.B., M.D, S.L., and P.P. designed the experiments. M.D and L.W.M. performed the experiments. S.L. and P.P. analyzed and modelled the data. S.L, P.P. and C.B, wrote the manuscript.