Individual differences in tail risk sensitive exploration using Bayes-adaptive Markov decision processes

  1. Tingke Shen (corresponding author)
  2. Peter Dayan
  1. Max Planck Institute for Biological Cybernetics, Germany
15 figures, 1 table and 1 additional file

Figures

Extraction of phase-averaged behavioral statistics from minute-resolution bouts data.

(a) Detailed visualization of minute-to-minute statistics of animal 25 (in the sessions after the introduction of the novel object). From top to bottom, the plots show the percentage of time spent within the 7 cm threshold of the object used by Akiti et al., 2022, with (cautious) and without (confident) tail-behind, the length of a bout at the object, and the number of bouts per minute. Orange lines are the box-car functions fitted to segment phases and illustrate the change in time, duration, and frequency statistics across phases. The transition points $t_1$ and $t_2$ as well as the initial cautious $g_i^{\mathrm{cau}}$, final cautious $g_s^{\mathrm{cau}}$, peak confident $g_p^{\mathrm{con}}$, and steady-state confident $g_s^{\mathrm{con}}$ approach percentage times are shown. The right plots show examples of minute-to-minute and phase-averaged approach time, duration, and frequency for (b) brave, (c) intermediate, and (d) timid animals. Note that animals are ordered by the group-timidity animal index (see A spectrum of risk-sensitive exploration trajectories). Green indicates cautious and blue indicates confident approach. Darker colors indicate higher values. For the purpose of modeling, we average the idiosyncrasies of behavior over phases and thereby characterize a high-level summary of learning dynamics.
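The phase segmentation described above can be illustrated with a small sketch. The caption does not say how the box-car functions were fitted, so the following assumes a brute-force least-squares fit of a three-level piecewise-constant function with two change points (playing the role of $t_1$ and $t_2$). The function name and the example series are hypothetical.

```python
def fit_two_changepoints(series):
    """Least-squares fit of a three-level piecewise-constant function.

    Returns (t1, t2, levels), where levels are the means of
    series[:t1], series[t1:t2], and series[t2:].
    This is an illustrative assumption, not the authors' procedure.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least three samples")

    # Prefix sums let us compute any segment's squared error in O(1).
    s1, s2 = [0.0], [0.0]
    for v in series:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(a, b):
        # Sum of squared deviations from the mean over series[a:b].
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = None
    for t1 in range(1, n - 1):
        for t2 in range(t1 + 1, n):
            cost = sse(0, t1) + sse(t1, t2) + sse(t2, n)
            if best is None or cost < best[0]:
                best = (cost, t1, t2)

    _, t1, t2 = best
    levels = [(s1[b] - s1[a]) / (b - a) for a, b in ((0, t1), (t1, t2), (t2, n))]
    return t1, t2, levels


# Hypothetical per-minute approach fractions with three phases.
example = [0.1] * 5 + [0.8] * 4 + [0.3] * 6
t1, t2, levels = fit_two_changepoints(example)
```

The exhaustive search is quadratic in the number of minutes, which is acceptable for session-length series; the phase-averaged statistics are then simply the returned levels.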

Categorization of animals into timid, intermediate, and brave groups based on cautious and confident bout statistics.

The x-axis shows the ratio of total time spent in confident versus cautious bouts. The y-axis shows the ratio of bout time in the first 10 min of confident approach to that in the last 10 min of confident approach (set to 0 for timid animals that do not have a confident phase). The horizontal line indicates $y = 1.0$. All nine timid animals are close to the origin. We separate brave and intermediate animals according to the $y = 1$ line. Solid green dots are brave animals that pass the Benjamini–Hochberg procedure for $y > 1$ at level $q = 0.05$ according to a random permutation test. Hollow dots represent brave animals that did not pass. We decided to model these animals as brave since they had $y > 1.5$ and hence a relatively clear confident-peak to confident-steady-state transition point. Modeling them as intermediate animals instead would not have significantly affected our results. Black dots are intermediate animals. They did not pass the Benjamini–Hochberg procedure for either $y > 1$ or $y < 1$.
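For readers unfamiliar with the Benjamini–Hochberg step-up procedure mentioned above, here is a minimal sketch. It takes per-animal p-values (for instance from a permutation test, which is not shown) and returns which hypotheses are rejected at false discovery rate $q$. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the null is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank

    # Step-up rule: reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


# Hypothetical p-values for four animals.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.2]))
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ranking passes.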

Markov decision process underlying the Bayes-adaptive Markov decision process (BAMDP) model.

The model has four real (nest, cautious object, confident object, retreat) and three imagined (cautious detect, confident detect, dead) states. Agent actions are italicized. Blue arrows indicate (possibly stochastic) transitions caused by agent actions. Green arrows indicate (possibly stochastic) forced transitions. A cautious approach provides less informational reward ($r_2 < r_1$) but has a smaller chance of death ($p_2 < p_1$) than a confident approach. Travel and dying costs are not shown.

Hazard function learning for (a) brave and (b) timid animals.

Brave animals start with a flexible hazard prior with a low mean for $h_2$. This leads to longer bouts (first length 2, then 3 and 4), which imply that the hazard posterior quickly approaches zero (here, after 10 bouts). Timid animals start with an inflexible hazard prior with a higher mean for $h_2$, and are limited to length-2 bouts. The hazard posterior changes only slightly after 10 bouts.
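The contrast between flexible and inflexible priors can be illustrated with a conjugate Beta-Bernoulli update. This is a deliberately simplified stand-in: the paper's hazard model is a noisy-or model, which is not reproduced here, and the prior means and strengths below are hypothetical numbers chosen only to show the qualitative effect.

```python
def beta_posterior_mean(prior_mean, prior_strength, safe_bouts, detections=0):
    """Posterior mean hazard under a Beta prior after observing bouts.

    prior_strength is the prior pseudo-count (a + b): small values give a
    flexible prior, large values an inflexible one. Illustrative only.
    """
    a = prior_mean * prior_strength + detections
    b = (1.0 - prior_mean) * prior_strength + safe_bouts
    return a / (a + b)


# Hypothetical "brave" prior: low mean, weak (flexible).
flexible = beta_posterior_mean(prior_mean=0.2, prior_strength=2, safe_bouts=10)
# Hypothetical "timid" prior: higher mean, strong (inflexible).
inflexible = beta_posterior_mean(prior_mean=0.4, prior_strength=100, safe_bouts=10)
print(round(flexible, 3), round(inflexible, 3))
```

After the same ten uneventful bouts, the weak prior's mean hazard collapses toward zero while the strong prior's mean barely moves, mirroring the qualitative pattern described in the caption.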

Summary of model fit.

Left panels: minute-to-minute time the animals spend within 7 cm of the novel object (top), duration (middle), and frequency (bottom). There are 26 animals (one per row) sorted by the group-timidity animal index (see A spectrum of risk-sensitive exploration trajectories). Central panels: the same values averaged over behavioral phases. Right panels: time, duration, and frequency of bouts generated as sample trajectories from the individual fits of the Bayes-adaptive Markov decision process (BAMDP) model. Legend: green and blue distinguish cautious and confident bouts. More intense colors indicate higher values, and gray indicates zeros.

The bout durations of brave animals depend on the hazard prior.

(a) Brave animals that initially perform cautious-2 bouts, then confident-3 bouts. The prior mean $\mu_3$ for $\tau = 3$ is higher than in (c) because there is some hazard to overcome before the animal does a duration-3 bout. Blue indicates individual animals and black indicates the mean. The y-axis, $E[\mu_\tau]$, shows $\mu_\tau$ averaged over the Approximate Bayesian Computation Sequential Monte Carlo (ABCSMC) posterior particles for each animal. (b) Cautious-2 then confident-4 animals. Since the prior mean $\mu_4$ is low, once the animal overcomes the $\tau = 2$ hazard, it quickly transitions from duration 2 to 4. (c) Cautious-3 then confident-3 animals. These animals are fitted with a low $\mu_3$ prior and a high $\mu_4$ prior because they never perform duration-4 bouts. (d) Cautious-3 then confident-4 animals. Since the $\mu_3$ prior is lower than in (b), these animals begin with duration-3 bouts.

Group-level and within-group variation in fitted risk-sensitivity and hazard priors.

(a) nCVaR’s α versus the group-timidity animal index ranking defined in A spectrum of risk-sensitive exploration trajectories. Color indicates the animal group. More timid animals are generally fitted by a lower α. Prior hazard parameter for t = 2 (b), t = 3 (c), and t = 4 (d) versus timidity ranking. Dots indicate the mean; the probability density is represented by color, where darker means higher density regions. The t = 2 prior mean is similar across all animals (timid = 0.28 ± 0.02, intermediate = 0.26 ± 0.04, brave = 0.22 ± 0.08), explaining the short, cautious bouts all animals initially use to assess risk. However, timid animals are best fit with lower variance (inflexible) and higher t = 3 and t = 4 prior means. This leads to shorter, cautious bouts in the long run. Brave animals are fitted by a low-slope (indicated by lower means for t = 3 and t = 4) and high-variance (flexible) hazard prior. This allows them to perform longer bouts over time. The t = 4 mean is low (panel d) for brave animals that perform length-4 bouts. Like brave animals, most intermediate animals have flexible, gradual hazards up to t = 3.
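To give intuition for why a lower α corresponds to more timid behavior, the sketch below computes the standard (static) lower-tail conditional value at risk of a discrete outcome distribution: the expected outcome over the worst α fraction of probability mass. The nCVaR used in the model is a variant whose exact definition is not given in this section, so this is only an illustration of the α parameter; the function name and the outcome values are hypothetical.

```python
def cvar_lower_tail(outcomes, probs, alpha):
    """Expected outcome over the worst alpha fraction of a discrete distribution."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    remaining = alpha
    total = 0.0
    # Walk from the worst outcome upward, taking mass until alpha is used up.
    for value, p in sorted(zip(outcomes, probs)):
        take = min(p, remaining)
        total += take * value
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


# Hypothetical approach: small reward usually, rare large loss.
outcomes, probs = [-10.0, 1.0], [0.1, 0.9]
for alpha in (1.0, 0.5, 0.1):
    print(alpha, cvar_lower_tail(outcomes, probs, alpha))
```

At α = 1 the measure reduces to the ordinary expectation; as α shrinks, the evaluation is increasingly dominated by the rare bad outcome, so the same approach looks less attractive.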

Influence of exploration pool and forgetting rate on steady-state behavior.

(a) The relationship between $G_0$ and the peak to steady-state change point for brave animals. The best fit line is shown in black. Higher $G_0$ means the agent explores longer, hence postponing the change point. (b) $G_0$ versus peak to steady-state change point for timid animals. (c) Forgetting rate versus steady-state turns at the nest state for brave animals. A higher forgetting rate leads to quicker replenishment of the exploration pool and hence fewer turns at the nest before approaching the object. (d) Forgetting rate versus turns at the nest for timid animals. All correlations are significant with $p < 0.002$.

Non-identifiability of nCVaR’s α against the hazard prior.

Animals are labeled using the group-timidity animal index. (a) The scatter plot shows the t = 2 prior mean ($\mu_2$) versus α for Approximate Bayesian Computation Sequential Monte Carlo (ABCSMC) particles of timid animal 1. The ellipse indicates one standard deviation in a Gaussian density model. Animal 1 (and timid animals generally) can be fit either with a higher α and a higher $\mu_2$, or with a lower α and a lower $\mu_2$. The box-and-whisker plot illustrates the correlation between $\mu_2$ and α across all timid animals. (b) The scatter plot shows an example intermediate animal 10; the box-and-whisker plot shows $\mu_2$ versus α for the intermediate population. (c) The scatter plot shows an example animal 11 from the group containing cautious-2/confident-4 and cautious-2/confident-3 animals. This group of animals starts with duration = 2 bouts and hence must overcome the prior $\mu_3$. The box-and-whisker plot shows $\mu_3$ versus α for the population. (d) The scatter plot shows an example animal 25 from the group containing cautious-2/confident-4 and cautious-3/confident-4 animals. This group of animals eventually performs duration = 4 bouts and hence must overcome the prior $\mu_4$. The box-and-whisker plot shows $\mu_4$ versus α for the population. α and $\mu$ are correlated in the ABCSMC posterior for all animals and hence non-identifiable. $p < 0.05$ for all correlations.

Comparing the behavior of FONC and UONC conditions.

There are 9 FONC and 11 UONC brave animals (one per row). Left panels: minute-to-minute time the animals spend within 7 cm of the novel object (top), duration (middle), and frequency (bottom). Animals are again sorted by group-timidity animal index but split by experiment condition (UONC then FONC). Central panels: the same values averaged over behavioral phases. Right panels: time, duration, and frequency of bouts generated as sample trajectories from the individual fits of the Bayes-adaptive Markov decision process (BAMDP) model.

Approximate Bayesian Computation Sequential Monte Carlo (ABCSMC) parameter fits of the 9 FONC and 11 UONC animals (with the latter replotted from Figure 7 for convenience).

The x-axis shows the group-timidity animal index, but UONC and FONC animals are separated. (a) Average nCVaR’s α over posterior particles of each animal. Color indicates the animal group. Dashed lines indicate the average (across animals) values of each condition (UONC brave or FONC brave). $p$-values for the Kolmogorov–Smirnov test of condition differences are shown. $p < 0.05$, and therefore the α values of brave FONC animals are significantly higher than those of brave UONC animals. (b) Exploration bonus pool, which is also significantly different between FONC and UONC animals. (c) Forgetting rate, which is not significantly different between the two conditions. Prior hazard parameter for t = 2 (d), t = 3 (e), and t = 4 (f). The probability density is represented by color, where darker means higher density regions. Dots indicate the mean. Dashed lines indicate the average of mean values across animals, while dotted lines indicate the average of standard deviation values across animals. $p$-values testing the difference between the two conditions’ means and standard deviations are shown on the right-hand side and left-hand side of the plots, respectively. Brave FONC animals have significantly lower hazard prior means and standard deviations than brave UONC animals.

Bayes net showing the relationship between the random variables in the noisy-or model.

Only $x_\tau$ is shown; $x_{\tau+1}$ depends on $z_{t=1:\tau+1}$, and so on.

Appendix 1-figure 1
Recovery targets versus the closest particles in the ABCSMC posterior.

Each subplot shows one of the nine fitted parameters for all 26 animals. The colors of the points indicate the animal group. The gray $y = x$ line represents a perfect recovery of the recovery targets. Most points lie close to the $y = x$ line, suggesting our ABCSMC fitting algorithm has good recoverability.

Appendix 1-figure 2
Identical to Appendix 1-figure 1 but the recovery targets are plotted against the (marginal) means of the ABCSMC posterior.

We chose the final ABCSMC population for the posterior (population 15). $R^2$ is high for $G_0$ and $f$, suggesting that these parameters are identifiable. $R^2$ is low for nCVaR$_\alpha$ and the hazard priors due to the non-identifiability discussed in the main text. In particular, $R^2$ is less than 0.0 for nCVaR$_\alpha$ and $\theta_2$-mean, suggesting these parameters are the most confounded. However, $R^2$ is high for $\theta_2$-deviation, suggesting nCVaR$_\alpha$ does not confound the flexibility of the hazard function. Finally, the $R^2$ for $\theta_3$ is nearly zero. This is expected because timid and some intermediate animals do not have duration-3 approach, and for these animals, $\theta_3$ can take on arbitrarily large values.
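A negative $R^2$ may look surprising, so here is a short sketch of how it can arise. It assumes $R^2$ is computed as the coefficient of determination of the posterior means against the recovery targets relative to the $y = x$ line (the caption does not state the exact formula). The function name and data are hypothetical.

```python
def r_squared_identity(targets, estimates):
    """Coefficient of determination of estimates against targets (y = x line)."""
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - e) ** 2 for t, e in zip(targets, estimates))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


targets = [1.0, 2.0, 3.0]
print(r_squared_identity(targets, [1.0, 2.0, 3.0]))  # perfect recovery
print(r_squared_identity(targets, [3.0, 2.0, 1.0]))  # worse than predicting the mean
```

Under this definition, $R^2$ drops below zero whenever the estimates are farther from the targets than a constant prediction at the target mean would be, which is the situation expected for strongly confounded parameters.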

Appendix 1-figure 3
The ABCSMC posterior for animal 24.

Univariate and bivariate marginals are shown on the diagonal and off-diagonal, respectively. Recovery targets are shown as green vertical lines in univariate plots and green points on bivariate plots. Marginal means are shown in orange. Recovery targets and means are close for $G_0$ and $f$ due to their identifiability. nCVaR$_\alpha$ and the hazard prior parameters are non-identifiable. Hence, the recovery targets are farther from the mean but still lie in a region of the posterior with support.

Tables

Table 1
Table of Approximate Bayesian Computation Sequential Monte Carlo (ABCSMC) parameters.
Parameter | Description | Value
$T$ | Number of populations | 30
$B$ | Population size | 100
$\epsilon_t$ | | Set adaptively to lowest 30-percentile
$\pi(\theta)$ | Prior distributions for fitted parameters | Uniform
$K_t(\theta \mid \theta^*)$ | Transition kernel | $\mathcal{N}(0, \Sigma)$
$d(x, x_0)$ | Distance function | L1 distance
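To show how the settings in Table 1 fit together, the following is a toy sketch of an ABC-SMC loop with a uniform prior, a Gaussian perturbation kernel, an L1 distance, and a threshold set to the lowest 30-percentile of the previous population's distances. Everything else is an assumption made for illustration: the one-parameter Bernoulli simulator, the kernel width (twice the weighted variance of the previous population), the function names, and the default of 5 populations rather than 30 to keep the run short. It is not the authors' implementation.

```python
import math
import random


def simulate(theta, rng, n=50):
    """Hypothetical toy simulator: success fraction in n Bernoulli(theta) trials."""
    return [sum(rng.random() < theta for _ in range(n)) / n]


def l1_distance(x, x0):
    return sum(abs(a - b) for a, b in zip(x, x0))


def gaussian_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def percentile(values, frac):
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(frac * len(ordered)))]


def abc_smc(x0, n_populations=5, pop_size=100, quantile=0.3, seed=0):
    rng = random.Random(seed)
    # Population 0: sample the Uniform(0, 1) prior and accept everything.
    thetas = [rng.random() for _ in range(pop_size)]
    dists = [l1_distance(simulate(th, rng), x0) for th in thetas]
    weights = [1.0 / pop_size] * pop_size

    for _ in range(1, n_populations):
        eps = percentile(dists, quantile)  # adaptive acceptance threshold
        mean = sum(w * th for w, th in zip(weights, thetas))
        var = sum(w * (th - mean) ** 2 for w, th in zip(weights, thetas))
        sd = max(math.sqrt(2.0 * var), 1e-3)  # assumed kernel width

        new_thetas, new_dists, new_weights = [], [], []
        while len(new_thetas) < pop_size:
            parent = rng.choices(thetas, weights=weights)[0]
            cand = rng.gauss(parent, sd)  # perturb with the Gaussian kernel
            if not 0.0 <= cand <= 1.0:
                continue  # zero prior density outside the uniform support
            d = l1_distance(simulate(cand, rng), x0)
            if d > eps:
                continue
            # Importance weight: prior (constant here) over the kernel mixture.
            denom = sum(w * gaussian_pdf(cand, th, sd)
                        for w, th in zip(weights, thetas))
            new_thetas.append(cand)
            new_dists.append(d)
            new_weights.append(1.0 / denom)

        total = sum(new_weights)
        thetas, dists = new_thetas, new_dists
        weights = [w / total for w in new_weights]
    return thetas, weights


particles, weights = abc_smc(x0=[0.3])
posterior_mean = sum(w * th for w, th in zip(weights, particles))
```

Setting the threshold from a quantile of the previous population guarantees a reasonable acceptance rate at each step, at the cost of a tolerance schedule that depends on the data rather than being fixed in advance.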


Cite this article

Tingke Shen, Peter Dayan (2025) Individual differences in tail risk sensitive exploration using Bayes-adaptive Markov decision processes. eLife 13:RP100366. https://doi.org/10.7554/eLife.100366.3