Individual differences in tail risk sensitive exploration using Bayes-adaptive Markov decision processes

  1. Tingke Shen (corresponding author)
  2. Peter Dayan
  1. Max Planck Institute for Biological Cybernetics, Germany
15 figures, 1 table and 1 additional file

Figures

Extraction of phase-averaged behavioral statistics from minute-resolution bouts data.

(a) Detailed visualization of minute-to-minute statistics of animal 25 (in the sessions after the introduction of the novel object). From top to bottom, the plots show the percentage of time spent within the 7 cm threshold of the object (Akiti et al., 2022) with (cautious) and without (confident) tail-behind, the length of a bout at the object, and the number of bouts per minute. Orange lines are the box-car functions fitted to segment the phases; they illustrate the change in time, duration, and frequency statistics across phases. The transition points 𝑡1 and 𝑡2 as well as the initial cautious 𝑔𝑖cau, final cautious 𝑔𝑠cau, peak confident 𝑔𝑝con and steady-state confident 𝑔𝑠con approach percentage times are shown. The right plots show examples of minute-to-minute and phase-averaged approach time, duration, and frequency for (b) brave, (c) intermediate, and (d) timid animals. Note that animals are ordered by the group-timidity animal index (see A spectrum of risk-sensitive exploration trajectories). Green indicates cautious and blue indicates confident approach. Darker colors indicate higher values. For the purpose of modeling, we average the idiosyncrasies of behavior over phases and thereby characterize a high-level summary of learning dynamics.
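The box-car segmentation described above can be illustrated with a small sketch. The caption does not specify how the box-car functions were fitted, so the exhaustive least-squares search below is only an assumption, and the name `fit_boxcar` and its `n_min` parameter are hypothetical. It splits a per-minute series into three constant phases with two transition points, analogous to 𝑡1 and 𝑡2.

```python
from statistics import fmean


def fit_boxcar(values, n_min=1):
    """Fit a three-phase piecewise-constant function by exhaustive search.

    Phases are values[:t1], values[t1:t2], and values[t2:]. Returns
    (t1, t2, levels) minimizing the summed squared error, where levels
    holds the mean of each phase. This is an illustrative assumption,
    not the fitting procedure used in the paper.
    """
    def sse(segment):
        m = fmean(segment)
        return sum((v - m) ** 2 for v in segment)

    n = len(values)
    best = None
    for t1 in range(n_min, n - 2 * n_min + 1):
        for t2 in range(t1 + n_min, n - n_min + 1):
            first, second, third = values[:t1], values[t1:t2], values[t2:]
            cost = sse(first) + sse(second) + sse(third)
            if best is None or cost < best[0]:
                best = (cost, t1, t2, (fmean(first), fmean(second), fmean(third)))
    if best is None:
        raise ValueError("series too short for three phases")
    return best[1], best[2], best[3]
```

For a series such as `[1, 1, 1, 5, 5, 5, 2, 2, 2]`, the search places the transitions at indices 3 and 6 and reports the three phase means, which play the role of the phase-averaged statistics.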

Categorization of animals into timid, intermediate, and brave groups based on cautious and confident bout statistics.

The x-axis shows the ratio of total time spent in confident versus cautious bouts. The y-axis shows the ratio of bout time in the first 10 min of confident approach to that in the last 10 min of confident approach (set to 0 for timid animals that do not have a confident phase). The horizontal line indicates y=1.0. All nine timid animals are close to the origin. We separate brave and intermediate animals according to the y=1 line. Solid green dots are brave animals that pass the Benjamini–Hochberg procedure for 𝑦>1 at level 𝑞=0.05 according to a random permutation test. Hollow dots represent brave animals that did not pass. We decided to model these animals as brave since they had 𝑦>1.5 and hence a relatively clear confident-peak to confident-steady-state transition point. Modeling them as intermediate animals instead would not have significantly affected our results. Black dots are intermediate animals. They did not pass the Benjamini–Hochberg procedure for either 𝑦>1 or 𝑦<1.
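As a reminder of what passing the Benjamini–Hochberg procedure at level 𝑞 means, here is a minimal sketch of the standard step-up rule. The function name and the example p-values are hypothetical; the per-animal p-values from the permutation test are not reproduced here.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard step-up rule: sort the p-values, find the largest rank k
    with p_(k) <= (k / m) * q, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank  # keep the largest rank that satisfies the bound
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

With the hypothetical p-values `[0.001, 0.04, 0.03, 0.5]` only the first hypothesis is rejected at 𝑞=0.05, even though three of the values are below 0.05 individually.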

Markov decision process underlying the Bayes-adaptive Markov decision process (BAMDP) model.

Four real (nest, cautious object, confident object, retreat) and three imagined (cautious detect, confident detect, dead) states. Agent actions are italicized. Blue arrows indicate (possibly stochastic) transitions caused by agent actions. Green arrows indicate (possibly stochastic) forced transitions. A cautious approach provides less informational reward 𝑟2<𝑟1 but has a smaller chance of death 𝑝2<𝑝1 compared to a confident approach. Travel and dying costs are not shown.

Hazard function learning for (a) brave and (b) timid animals.

Brave animals start with a flexible hazard prior with a low mean for 𝜏=2. This leads to longer bouts (first length 2, then 3 and 4), which imply that the hazard posterior quickly approaches zero (here, after 10 bouts). Timid animals start with an inflexible hazard prior with a higher mean for 𝜏=2, and are limited to length-2 bouts. The hazard posterior only changes slightly after 10 bouts.
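The contrast between flexible and inflexible priors can be illustrated with a deliberately simplified Beta-Bernoulli update. This is a stand-in for intuition only, not the paper's noisy-or hazard model, and the prior means and concentrations below are hypothetical values rather than fitted parameters.

```python
def beta_params(mean, concentration):
    """Convert a prior mean and concentration (pseudo-count) to Beta(a, b)."""
    return mean * concentration, (1.0 - mean) * concentration


def hazard_mean_after_safe_bouts(mean, concentration, n_safe):
    """Posterior mean hazard after n_safe bouts with no detection.

    Each safe bout adds one pseudo-count to b (the 'no hazard' side), so a
    low-concentration (flexible) prior moves quickly and a
    high-concentration (inflexible) prior barely moves.
    """
    a, b = beta_params(mean, concentration)
    return a / (a + b + n_safe)


# Hypothetical priors: flexible with low mean versus inflexible with higher mean.
flexible = hazard_mean_after_safe_bouts(mean=0.2, concentration=2.0, n_safe=10)
inflexible = hazard_mean_after_safe_bouts(mean=0.3, concentration=100.0, n_safe=10)
```

Under these assumed values the flexible prior's mean drops from 0.2 to roughly 0.03 after 10 safe bouts, while the inflexible prior's mean only moves from 0.3 to roughly 0.27, mirroring the qualitative pattern in panels (a) and (b).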

Summary of model fit.

Left panels: minute-to-minute time the animals spend within 7 cm of the novel object (top), duration (middle), and frequency (bottom). There are 26 animals (one per row) sorted by the group-timidity animal index (see A spectrum of risk-sensitive exploration trajectories). Central panels: the same values averaged over behavioral phases. Right panels: time, duration, and frequency of bouts generated as sample trajectories from the individual fits of the Bayes-adaptive Markov decision process (BAMDP) model. Legend: green/blue distinguishes cautious and confident bouts. Darker colors indicate higher values, and gray indicates zeros.

The bout durations of brave animals depend on the hazard prior.

(a) Brave animals that initially perform cautious-2 bouts, then confident-3 bouts. The prior mean 𝜇3 for 𝜏=3 is higher than in (c) because there is some hazard to overcome before the animal does a duration-3 bout. Blue indicates individual animals and black indicates the mean. The y-axis, 𝐸[𝜇𝜏], shows 𝜇𝜏 averaged over the Approximate Bayesian Computation Sequential Monte Carlo (ABCSMC) posterior particles for each animal. (b) Cautious-2 then confident-4 animals. Since the 𝜇4 prior mean is low, once the animal overcomes the 𝜏=2 hazard, it quickly transitions from duration 2 to 4. (c) Cautious-3, then confident-3 animals. These animals are fitted with a low 𝜇3 prior and high 𝜇4 prior because they never perform duration-4 bouts. (d) Cautious-3 then confident-4 animals. Since the 𝜇3 prior is lower than in (b), these animals begin with duration-3 bouts.

Group-level and within-group variation in fitted risk-sensitivity and hazard priors.

(a) nCVaR’s α versus the group-timidity animal index ranking defined in A spectrum of risk-sensitive exploration trajectories. Color indicates the animal group. More timid animals are generally fitted by a lower α. Prior hazard parameter for t = 2 (b), t = 3 (c), and t = 4 (d) versus timidity ranking. Dots indicate the mean; the probability density is represented by color where darker means higher density regions. The t = 2 prior mean is similar across all animals (timid = 0.28±0.02, intermediate = 0.26±0.04, brave = 0.22±0.08) explaining the short, cautious bouts all animals initially use to assess risk. However, timid animals are best fit with lower variance (inflexible) and higher t = 3 and t = 4 prior means. This leads to shorter, cautious bouts in the long run. Brave animals are fitted by a low slope (indicated by lower mean for t = 3 and t = 4) and high variance (flexible) hazard prior. This allows them to perform longer bouts over time. t = 4 mean is low (panel d) for brave animals that perform length 4 bouts. Like brave animals, most intermediate animals have flexible, gradual hazards up to t = 3.
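To make the role of α concrete, the sketch below computes a plain lower-tail CVaR over equally likely outcomes: the average of the worst α fraction. It is only a one-step illustration; as we understand it, the nested variant (nCVaR) applies such a risk measure at every step of the decision process, which is not implemented here. The function name and example outcomes are hypothetical.

```python
def cvar(outcomes, alpha):
    """Mean of the worst alpha fraction of equally likely outcomes.

    alpha = 1 recovers the ordinary mean (risk neutral); smaller alpha
    focuses on the lower tail (more risk averse).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(outcomes)
    n = len(ordered)
    mass_left = alpha * n  # tail mass, measured in units of one outcome
    total = 0.0
    for value in ordered:
        weight = min(1.0, mass_left)  # the last tail outcome may count partially
        total += weight * value
        mass_left -= weight
        if mass_left <= 0.0:
            break
    return total / (alpha * n)
```

For the hypothetical outcomes `[-10, 0, 1, 1]`, the value is -2.0 at α=1, -5.0 at α=0.5, and -10.0 at α=0.25, showing how a lower α makes a rare catastrophic outcome dominate the evaluation, consistent with timid animals being fitted by lower α.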

Influence of exploration pool and forgetting rate on steady-state behavior.

(a) The relationship between 𝐺0 and the peak to steady-state change point for brave animals. The best fit line is shown in black. Higher 𝐺0 means the agent explores longer, hence postponing the change point. (b) 𝐺0 versus peak to steady-state change point for timid animals. (c) Forgetting rate versus steady-state turns at the nest state for brave animals. A higher forgetting rate leads to quicker replenishment of the exploration pool and hence fewer turns at the nest before approaching the object. (d) Forgetting rate versus turns at the nest state for timid animals. All correlations are significant with 𝑝<0.002.

Non-identifiability of nCVaR’s α against the hazard prior.

Animals are labeled using the group-timidity animal index. (a) The scatter plot shows the t = 2 prior mean (𝜇2) versus α for Approximate Bayesian Computation Sequential Monte Carlo (ABCSMC) particles of timid animal 1. The ellipse indicates one standard deviation in a Gaussian density model. Animal 1 (and timid animals generally) can be either fit with a higher α and a higher 𝜇2, or a lower α and a lower 𝜇2. The box-and-whisker plot illustrates the correlation between 𝜇2 and α across all timid animals. (b) The scatter plot shows an example intermediate animal 10; the box-and-whisker plot shows 𝜇2 versus α for the intermediate population. (c) The scatter plot shows an example animal 11 from the group containing cautious-2/confident-4 and cautious-2/confident-3 animals. This group of animals starts with duration = 2 bouts and hence must overcome the prior 𝜇3. The box-and-whisker plot shows 𝜇3 versus α for the population. (d) The scatter plot shows an example animal 25 from the group containing cautious-2/confident-4 and cautious-3/confident-4 animals. This group of animals eventually performs duration = 4 bouts and hence must overcome the prior 𝜇4. The box-and-whisker plot shows 𝜇4 versus 𝛼 for the population. 𝛼 and 𝜇 are correlated in the ABCSMC posterior for all animals and hence non-identifiable. 𝑝<0.05 for all correlations.

Comparing the behavior of FONC and UONC conditions.

There are 9 FONC and 11 UONC brave animals (one per row). Left panels: minute-to-minute time the animals spend within 7 cm of the novel object (top), duration (middle), and frequency (bottom). Animals are again sorted by group-timidity animal index but split by experiment condition (UONC then FONC). Central panels: the same values averaged over behavioral phases. Right panels: time, duration, and frequency of bouts generated as sample trajectories from the individual fits of the Bayes-adaptive Markov decision process (BAMDP) model.

Approximate Bayesian Computation Sequential Monte Carlo (ABCSMC) parameter fits of the 9 FONC and 11 UONC animals (with the latter replotted from Figure 7 for convenience).

The x-axis shows the group-timidity animal index, but UONC and FONC animals are separated. (a) Average nCVaR’s α over posterior particles of each animal. Color indicates the animal group. Dashed lines indicate the average (across animals) values of each condition (UONC brave or FONC brave). 𝑝-values for the Kolmogorov–Smirnov test of condition differences are shown. 𝑝<0.05 and therefore the α values of brave FONC animals are significantly higher than those of brave UONC animals. (b) Exploration bonus pool, which is also significantly different between FONC and UONC animals. (c) Forgetting rate, which is not significantly different between the two conditions. Prior hazard parameter for t = 2 (d), t = 3 (e), and t = 4 (f). The probability density is represented by color where darker means higher density regions. Dots indicate the mean. Dashed lines indicate the average of mean values across animals, while dotted lines indicate the average of standard deviation values across animals. 𝑝-values testing the difference between the two conditions’ means and standard deviations are shown on the right-hand side and left-hand side of the plots, respectively. Brave FONC animals have both significantly lower hazard prior mean and standard deviation than brave UONC animals.

Bayes net showing the relationship between the random variables in the noisy-or model.

Only x_τ is shown; x_{τ+1} depends on z_{t=1:τ+1}, and so on.

Appendix 1—figure 1
Recovery targets versus the closest particles in the ABCSMC posterior.

Each subplot plots one of the nine fitted parameters for all 26 animals. The colors of the points indicate the animal group. The gray 𝑦=𝑥 line represents a perfect recovery of the recovery targets. Most points lie close to the 𝑦=𝑥 line, suggesting our ABCSMC fitting algorithm has good recoverability.

Appendix 1—figure 2
Identical to Appendix 1—figure 1 but the recovery targets are plotted against the (marginal) means of the ABCSMC posterior.

We chose the final ABCSMC population for the posterior (population 15). 𝑅² is high for 𝐺0 and 𝑓, suggesting that these parameters are identifiable. 𝑅² is low for nCVaR𝛼 and the hazard priors due to the non-identifiability discussed in the main text. In particular, 𝑅² is less than 0.0 for nCVaR𝛼 and 𝜃2-mean, suggesting these parameters are the most confounded. However, 𝑅² is high for 𝜃2-deviation, suggesting nCVaR𝛼 does not confound the flexibility of the hazard function. Finally, the 𝑅² for 𝜃3 is nearly zero. This is expected because timid and some intermediate animals do not have duration-3 approaches, and for these animals, 𝜃3 can take on arbitrarily large values.
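A negative 𝑅² may look surprising, so the sketch below shows how it can arise. It assumes 𝑅² is computed against the 𝑦=𝑥 line, treating the posterior mean as the prediction of the recovery target with no fitted slope or intercept; this is an assumption about the computation, and the numbers in the usage note are made up.

```python
def r_squared(targets, estimates):
    """Coefficient of determination of estimates as predictions of targets.

    Computed against the identity line (no fitted slope or intercept), so
    the value drops below zero whenever the estimates predict the targets
    worse than the constant mean of the targets would.
    """
    mean_target = sum(targets) / len(targets)
    ss_res = sum((t - e) ** 2 for t, e in zip(targets, estimates))
    ss_tot = sum((t - mean_target) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot
```

For example, `r_squared([1, 2, 3], [1, 2, 3])` is 1.0, while `r_squared([1, 2, 3], [3, 2, 1])` is -3.0, because the anti-correlated estimates do worse than the constant mean.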

Appendix 1—figure 3
The ABCSMC posterior for animal 24.

Univariate and bivariate marginals are shown on the diagonal and off-diagonal, respectively. Recovery targets are shown as green vertical lines in univariate plots and green points on bivariate plots. Marginal means are shown in orange. Recovery targets and means are close for 𝐺0 and 𝑓 due to their identifiability. nCVaRα and the hazard prior parameters are non-identifiable. Hence, the recovery targets are farther from the mean but still lie in a region of the posterior with support.

Tables

Table 1
Table of Approximate Bayesian Computation Sequential Monte Carlo (ABCSMC) parameters.
Parameter   Description                                Value
T           Number of populations                      30
B           Population size                            100
𝜖𝑡          Acceptance threshold                       Set adaptively to lowest 30-percentile
π(θ)        Prior distributions for fitted parameters  Uniform
Kt(θ|θ)     Transition kernel                          N(0,Σ)
d(x,x0)     Distance function                          L1 distance
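How the settings in Table 1 fit together can be sketched with a toy one-parameter ABC-SMC loop. It uses a uniform prior, an L1 distance, a Gaussian perturbation kernel, and a threshold set to the 30th percentile of the previous population's distances. This is not the authors' implementation: the function name, the choice of kernel variance (twice the weighted population variance, a common heuristic), the smaller default number of populations, and the toy simulator are all assumptions for illustration.

```python
import math
import random


def abc_smc(simulate, observed, prior_low, prior_high,
            n_populations=5, population_size=100, quantile=0.3, seed=0):
    """Toy 1-D ABC-SMC with a uniform prior on [prior_low, prior_high]."""
    rng = random.Random(seed)

    def distance(x, x0):
        return sum(abs(a - b) for a, b in zip(x, x0))  # L1 distance

    # Population 0: sample the prior and accept every particle.
    particles = [rng.uniform(prior_low, prior_high) for _ in range(population_size)]
    weights = [1.0 / population_size] * population_size
    dists = [distance(simulate(p, rng), observed) for p in particles]

    for _ in range(1, n_populations):
        # Adaptive threshold: a low quantile of the previous distances.
        epsilon = sorted(dists)[int(quantile * (population_size - 1))]
        # Assumed kernel scale: twice the weighted variance of the population.
        mean = sum(w * p for w, p in zip(weights, particles))
        var = sum(w * (p - mean) ** 2 for w, p in zip(weights, particles))
        sigma = math.sqrt(2.0 * var) or 1e-6

        new_particles, new_weights, new_dists = [], [], []
        while len(new_particles) < population_size:
            base = rng.choices(particles, weights=weights)[0]
            theta = rng.gauss(base, sigma)  # perturb with N(0, sigma^2) noise
            if not prior_low <= theta <= prior_high:
                continue  # zero prior density
            d = distance(simulate(theta, rng), observed)
            if d > epsilon:
                continue  # reject: simulated data too far from observed data
            # Importance weight: constant uniform prior over the mixture proposal.
            proposal = sum(w * math.exp(-0.5 * ((theta - p) / sigma) ** 2)
                           for w, p in zip(weights, particles))
            new_particles.append(theta)
            new_weights.append(1.0 / proposal)
            new_dists.append(d)

        total = sum(new_weights)
        particles = new_particles
        weights = [w / total for w in new_weights]
        dists = new_dists
    return particles, weights
```

As a usage example, with the toy simulator `lambda theta, rng: [theta]`, observed data `[2.0]`, and a prior on [0, 10], the weighted mean of the returned particles concentrates near 2 as the threshold shrinks across populations.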

  1. Tingke Shen
  2. Peter Dayan
(2025)
Individual differences in tail risk sensitive exploration using Bayes-adaptive Markov decision processes
eLife 13:RP100366.
https://doi.org/10.7554/eLife.100366.3