Individual differences in tail risk sensitive exploration using Bayes-adaptive Markov decision processes
Figures
Extraction of phase-averaged behavioral statistics from minute-resolution bouts data.
(a) Detailed visualization of minute-to-minute statistics of animal 25 (in the sessions after the introduction of the novel object). From top to bottom, the plots show % time within the 7 cm threshold of Akiti et al. (2022) around the object with (cautious) and without (confident) tail-behind, the length of a bout at the object, and the number of bouts per minute. Orange lines are the box-car functions fitted to segment phases and illustrate the change in time, duration, and frequency statistics across phases. The transition points, as well as the initial cautious, final cautious, peak confident, and steady-state confident approach percentage times, are shown. The right plots show examples of minute-to-minute and phase-averaged approach time, duration, and frequency for (b) brave, (c) intermediate, and (d) timid animals. Note that animals are ordered by the group-timidity animal index (see A spectrum of risk-sensitive exploration trajectories). Green indicates cautious and blue indicates confident approach. Darker colors indicate higher values. For the purpose of modeling, we average the idiosyncrasies of behavior over phases and thereby characterize a high-level summary of learning dynamics.
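The box-car segmentation described above can be illustrated with a minimal sketch. It assumes a single transition point and a least-squares criterion, neither of which is stated in the caption; the function name and the per-minute values are hypothetical.

```python
def fit_single_changepoint(values):
    """Fit a two-level step (box-car) function by least squares.

    Returns (split, left_mean, right_mean), where `split` is the index of
    the first sample belonging to the second phase.
    """
    best = None
    for split in range(1, len(values)):
        left, right = values[:split], values[split:]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Sum of squared errors of the piecewise-constant fit.
        sse = sum((v - left_mean) ** 2 for v in left)
        sse += sum((v - right_mean) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, split, left_mean, right_mean)
    return best[1:]


# Hypothetical per-minute % time near the object (not data from the paper).
minutes = [2.0, 3.0, 2.5, 3.5, 10.0, 11.0, 9.5, 10.5]
split, before, after = fit_single_changepoint(minutes)
```

Averaging within each fitted phase is what turns the minute-resolution trace into the phase-level summary used for modeling.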
Categorization of animals into timid, intermediate, and brave groups based on cautious and confident bout statistics.
The x-axis shows the ratio of total time spent in confident versus cautious bouts. The y-axis shows the ratio of bout time in the first 10 min of confident approach and the last 10 min of confident approach (set to 0 for timid animals that do not have a confident phase). The horizontal line indicates . All nine timid animals are close to the origin. We separate brave and intermediate animals according to the line. Solid green dots are brave animals that pass the Benjamini–Hochberg procedure for at level according to a random permutation test. Hollow dots represent brave animals that did not pass. We decided to model these animals as brave since they had and hence a relatively clear confident-peak to confident-steady-state transition point. Modeling them as intermediate animals instead would not have significantly affected our results. Black dots are intermediate animals. They did not pass the Benjamini–Hochberg procedure for either or .
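For readers unfamiliar with the Benjamini–Hochberg procedure, the following is a minimal sketch of the standard step-up rule. The p-values and the level `q` below are hypothetical placeholders; in the analysis described above, the p-values would come from the random permutation test, and the level actually used is not recoverable from this caption.

```python
def benjamini_hochberg(p_values, q):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= q * k / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per animal, and a hypothetical level.
p_vals = [0.01, 0.04, 0.03, 0.20]
passes = benjamini_hochberg(p_vals, q=0.05)
```

In this toy example only the first animal passes, because the second-smallest p-value (0.03) already exceeds its rank-scaled threshold (0.025).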
Markov decision process underlying the Bayes-adaptive Markov decision process (BAMDP) model.
Four real (nest, cautious object, confident object, retreat) and three imagined (cautious detect, confident detect, dead) states. Agent actions are italicized. Blue arrows indicate (possibly stochastic) transitions caused by agent actions. Green arrows indicate (possibly stochastic) forced transitions. A cautious approach provides less informational reward but has a smaller chance of death compared to a confident approach. Travel and dying costs are not shown.
Hazard function learning for (a) brave and (b) timid animals.
Brave animals start with a flexible hazard prior with a low mean for . This leads to longer bouts (first length 2, then 3 and 4), which imply that the hazard posterior quickly approaches zero (here, after 10 bouts). Timid animals start with an inflexible hazard prior with a higher mean, and are limited to length bouts. The hazard posterior only changes slightly after 10 bouts.
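The contrast between flexible and inflexible priors can be illustrated with a small sketch. It assumes a Beta prior over a single per-step hazard probability with conjugate updating; this is a stand-in chosen for simplicity, not the noisy-or model used in the paper, and all numerical values are hypothetical.

```python
def posterior_hazard_mean(prior_mean, concentration, safe_steps):
    """Posterior mean of a Beta-distributed hazard after observing
    `safe_steps` steps with no detection (assumed Beta-Bernoulli model).

    A small `concentration` gives a high-variance (flexible) prior;
    a large one gives a low-variance (inflexible) prior.
    """
    a = prior_mean * concentration          # pseudo-count of detections
    b = (1.0 - prior_mean) * concentration  # pseudo-count of safe steps
    return a / (a + b + safe_steps)


# Hypothetical brave animal: low mean, flexible prior.
flexible_after = posterior_hazard_mean(0.1, concentration=2.0, safe_steps=10)
# Hypothetical timid animal: higher mean, inflexible prior.
inflexible_after = posterior_hazard_mean(0.3, concentration=200.0, safe_steps=10)
```

Under these assumed numbers the flexible posterior mean falls from 0.1 to about 0.017, while the inflexible one moves only from 0.3 to about 0.286, mirroring the qualitative pattern in the two panels.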
Summary of model fit.
Left panels: minute-to-minute time the animals spend within 7 cm of the novel object (top), duration (middle), and frequency (bottom). There are 26 animals (one per row) sorted by the group-timidity animal index (see A spectrum of risk-sensitive exploration trajectories). Central panels: the same values averaged over behavioral phases. Right panels: time, duration, and frequency of bouts generated as sample trajectories from the individual fits of the Bayes-adaptive Markov decision process (BAMDP) model. Legend: green and blue distinguish cautious and confident bouts, respectively. More intense colors indicate higher values, and gray indicates zeros.
The bout durations of brave animals depend on the hazard prior.
(a) Brave animals that initially perform cautious-2 bouts, then confident-3 bouts. The prior mean for is higher than in (c) because there is some hazard to overcome before the animal does a duration-3 bout. Blue indicates individual animals and black indicates the mean. The y-axis shows averaged over the Approximate Bayesian Computation Sequential Monte Carlo (ABCSMC) posterior particles for each animal. (b) Cautious-2, then confident-4 animals. Since the mean prior is low, once the animal overcomes the hazard, it quickly transitions from duration 2 to 4. (c) Cautious-3, then confident-3 animals. These animals are fitted with a low prior and high prior because they never perform duration-4 bouts. (d) Cautious-3, then confident-4 animals. Since the prior is lower than in (b), these animals begin with duration-3 bouts.
Group-level and within-group variation in fitted risk-sensitivity and hazard priors.
(a) nCVaR's α versus the group-timidity animal index ranking defined in A spectrum of risk-sensitive exploration trajectories. Color indicates the animal group. More timid animals are generally fitted by a lower α. Prior hazard parameter for t = 2 (b), t = 3 (c), and t = 4 (d) versus timidity ranking. Dots indicate the mean; the probability density is represented by color, where darker means higher-density regions. The t = 2 prior mean is similar across all animals (timid = , intermediate = , brave = ), explaining the short, cautious bouts all animals initially use to assess risk. However, timid animals are best fit with lower variance (inflexible) and higher t = 3 and t = 4 prior means. This leads to shorter, cautious bouts in the long run. Brave animals are fitted by a low-slope (indicated by lower means for t = 3 and t = 4) and high-variance (flexible) hazard prior. This allows them to perform longer bouts over time. The t = 4 mean is low (panel d) for brave animals that perform length-4 bouts. Like brave animals, most intermediate animals have flexible, gradual hazards up to t = 3.
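To make the role of α concrete, the sketch below computes an ordinary (single-step) lower-tail CVaR for a discrete outcome distribution. This is a simplification: the paper's nCVaR is a nested measure applied within the BAMDP, and the outcomes and probabilities here are hypothetical.

```python
def cvar(outcomes, probs, alpha):
    """Lower-tail CVaR: the mean of the worst alpha-fraction of outcomes.

    alpha = 1 recovers the ordinary expectation (risk neutral);
    smaller alpha focuses on worse outcomes (more risk averse).
    """
    remaining = alpha
    total = 0.0
    # Walk through outcomes from worst to best, consuming probability mass.
    for value, p in sorted(zip(outcomes, probs)):
        take = min(p, remaining)
        total += take * value
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


# Hypothetical approach: small chance of a large loss, otherwise a small gain.
outcomes, probs = [-10.0, 1.0], [0.1, 0.9]
risk_neutral = cvar(outcomes, probs, alpha=1.0)
moderate = cvar(outcomes, probs, alpha=0.5)
extreme = cvar(outcomes, probs, alpha=0.1)
```

As α decreases, the evaluation of the same approach drops from about -0.1 to -1.2 and then to -10, which is the sense in which a lower fitted α corresponds to a more timid animal.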
Influence of exploration pool and forgetting rate on steady-state behavior.
(a) The relationship between and the peak to steady-state change point for brave animals. The best fit line is shown in black. Higher means the agent explores longer, hence postponing the change point. (b) versus peak to steady-state change point for timid animals. (c) Forgetting rate versus steady-state turns at the nest state for brave animals. A higher forgetting rate leads to quicker replenishment of the exploration pool and hence fewer turns at the nest before approaching the object. (d) Forgetting rate versus steady-state turns at the nest state for timid animals. All correlations are significant with .
Non-identifiability of nCVaR's α against the hazard prior.
Animals are labeled using the group-timidity animal index. (a) The scatter plot shows the t = 2 prior mean () versus α for Approximate Bayesian Computation Sequential Monte Carlo (ABCSMC) particles of timid animal 1. The ellipse indicates one standard deviation in a Gaussian density model. Animal 1 (and timid animals generally) can be either fit with a higher α and a higher , or a lower α and a lower . The box-and-whisker plot illustrates the correlation between and α across all timid animals. (b) The scatter plot shows an example intermediate animal 10; the box-and-whisker plot shows versus α for the intermediate population. (c) The scatter plot shows an example animal 11 from the group containing cautious-2/confident-4 and cautious-2/confident-3 animals. This group of animals starts with duration = 2 bouts and hence must overcome the prior . The box-and-whisker plot shows versus α for the population. (d) The scatter plot shows an example animal 25 from the group containing cautious-2/confident-4 and cautious-3/confident-4 animals. This group of animals eventually performs duration = 4 bouts and hence must overcome the prior . The box-and-whisker plot shows versus for the population. and are correlated in the ABCSMC posterior for all animals and hence non-identifiable. for all correlations.
Comparing the behavior of FONC and UONC conditions.
There are 9 FONC and 11 UONC brave animals (one per row). Left panels: minute-to-minute time the animals spend within 7 cm of the novel object (top), duration (middle), and frequency (bottom). Animals are again sorted by group-timidity animal index but split by experiment condition (UONC then FONC). Central panels: the same values averaged over behavioral phases. Right panels: time, duration, and frequency of bouts generated as sample trajectories from the individual fits of the Bayes-adaptive Markov decision process (BAMDP) model.
Approximate Bayesian Computation Sequential Monte Carlo (ABCSMC) parameter fits of the 9 FONC and 11 UONC animals (with the latter replotted from Figure 7 for convenience).
The x-axis shows group-timidity animal index, but UONC and FONC animals are separated. (a) Average nCVaR's α over posterior particles of each animal. Color indicates the animal group. Dashed lines indicate the average (across animals) values of each condition (UONC brave or FONC brave). for the Kolmogorov–Smirnov test of condition differences is shown. and therefore the α values of brave FONC animals are significantly higher than those of brave UONC animals. (b) Exploration bonus pool, which is also significantly different between FONC and UONC animals. (c) Forgetting rate, which is not significantly different between the two conditions. Prior hazard parameter for t = 2 (d), t = 3 (e), and t = 4 (f). The probability density is represented by color, where darker means higher-density regions. Dots indicate the mean. Dashed lines indicate the average of mean values across animals, while dotted lines indicate the average of standard deviation values across animals. testing the difference between the two conditions' means and standard deviations is shown on the right-hand side and left-hand side of the plots, respectively. Brave FONC animals have both significantly lower hazard prior mean and standard deviation than brave UONC animals.
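The two-sample Kolmogorov–Smirnov comparison used here rests on the largest gap between two empirical distribution functions. The following is a minimal sketch of that statistic only (no p-value); the sample values are hypothetical and do not come from the fitted parameters.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max distance between empirical CDFs."""
    points = sorted(set(sample_a) | set(sample_b))
    largest_gap = 0.0
    for x in points:
        # Empirical CDF of each sample evaluated at x.
        cdf_a = sum(v <= x for v in sample_a) / len(sample_a)
        cdf_b = sum(v <= x for v in sample_b) / len(sample_b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical per-animal parameter values for two conditions.
condition_one = [1.0, 2.0, 3.0, 4.0]
condition_two = [3.0, 4.0, 5.0, 6.0]
d_stat = ks_statistic(condition_one, condition_two)
```

A statistic near 1 means the two samples barely overlap; converting it to a p-value requires the KS null distribution, which a statistics library would normally supply.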
Bayes net showing the relationship between the random variables in the noisy-or model.
Only is shown . depends on , and so on.
Recovery targets versus the closest particles in the ABCSMC posterior.
Each subplot plots one of the nine fitted parameters for all 26 animals. The colors of the points indicate the animal group. The gray line represents a perfect recovery of the recovery targets. Most points lie close to the line, suggesting our ABCSMC fitting algorithm has good recoverability.
Identical to Appendix 1–figure 1 but the recovery targets are plotted against the (marginal) means of the ABCSMC posterior.
We chose the final ABCSMC population for the posterior (population 15). is high for and, suggesting that these parameters are identifiable. is low for and the hazard priors due to the non-identifiability discussed in the main text. In particular, is less than 0.0 for and suggesting these parameters are the most confounded. However, is high for, suggesting does not confound the flexibility of the hazard function. Finally, the for is nearly zero. This is expected because timid and some intermediate animals do not perform duration-3 approaches, and for these animals, can take on arbitrarily large values.
The ABCSMC posterior for animal 24.
Univariate and bivariate marginals are shown on the diagonal and off-diagonal, respectively. Recovery targets are shown as green vertical lines in univariate plots and green points on bivariate plots. Marginal means are shown in orange. Recovery targets and means are close for and due to their identifiability. and the hazard prior parameters are non-identifiable. Hence, the recovery targets are farther from the mean but still lie in a region of the posterior with support.
Tables
Table of Approximate Bayesian Computation Sequential Monte Carlo (ABCSMC) parameters.
| Parameter | Description | Value |
|---|---|---|
| | Number of populations | 30 |
| | Population size | 100 |
| | | Set adaptively to lowest 30-percentile |
| | Prior distributions for fitted parameters | Uniform |
| | Transition kernel | |
| | Distance function | distance |
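The settings in this table can be read against a generic ABC-SMC loop. The sketch below is a toy, one-parameter illustration, not the authors' fitting code: the simulator, the Gaussian perturbation kernel, its width, and the absolute-difference distance are all assumptions, and only a handful of populations are run to keep the demo fast. The uniform prior, the population size of 100, and the adaptive threshold at the lowest 30th percentile follow the table.

```python
import math
import random


def simulate(theta, rng):
    # Hypothetical stand-in simulator: a noisy observation of theta.
    return theta + rng.gauss(0.0, 0.1)


def abc_smc(observed, n_populations=5, population_size=100,
            quantile=0.3, seed=0):
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0   # support of the uniform prior (assumed range)
    kernel_sd = 0.05    # Gaussian perturbation kernel width (assumed)

    # Population 0: draw from the prior and accept every particle.
    particles = [rng.uniform(lo, hi) for _ in range(population_size)]
    distances = [abs(simulate(p, rng) - observed) for p in particles]
    weights = [1.0 / population_size] * population_size

    for _ in range(1, n_populations):
        # Adaptive threshold: 30th percentile of the previous distances.
        epsilon = sorted(distances)[int(quantile * population_size)]
        new_particles, new_distances, new_weights = [], [], []
        while len(new_particles) < population_size:
            base = rng.choices(particles, weights=weights)[0]
            candidate = base + rng.gauss(0.0, kernel_sd)
            if not lo <= candidate <= hi:
                continue  # zero prior density, reject
            d = abs(simulate(candidate, rng) - observed)
            if d > epsilon:
                continue  # simulation too far from the data, reject
            # Importance weight: (constant) prior density divided by the
            # kernel mixture density; constants cancel on normalization.
            kernel_mass = sum(
                w * math.exp(-0.5 * ((candidate - p) / kernel_sd) ** 2)
                for p, w in zip(particles, weights)
            )
            new_particles.append(candidate)
            new_distances.append(d)
            new_weights.append(1.0 / kernel_mass)
        total = sum(new_weights)
        particles, distances = new_particles, new_distances
        weights = [w / total for w in new_weights]
    return particles, weights


particles, weights = abc_smc(observed=0.6)
posterior_mean = sum(p * w for p, w in zip(particles, weights))
```

Because the threshold shrinks by roughly the chosen quantile at each population, the acceptance rate falls as populations accumulate; in this toy setting a run of 30 populations would need a far larger simulation budget than the five used here.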