Active inference. (a) Qualitatively, agents receive observations from the environment and use them to optimize Bayesian beliefs under an internal cognitive (a.k.a. world or generative) model of the environment. Agents then actively sample environmental states through action, choosing actions that would bring them into more favorable states. The environment changes its state according to the agents’ policies (action sequences) and its transition functions, after which the agents again receive new observations. (b) Quantitatively, agents optimize their Bayesian beliefs under the internal cognitive model by minimizing variational free energy, and then select policies that minimize expected free energy, that is, the surprise expected in the future under a particular policy.
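
A minimal Python sketch of panel (b), assuming the standard discrete-state formulation with a likelihood matrix `A` (observations × states), a prior over states, and log-preferences `C` over outcomes. State inference minimizes variational free energy F, whose minimizer is the exact posterior; each policy's expected free energy G is split into risk (divergence of predicted from preferred outcomes) and ambiguity (expected entropy of the likelihood), and policies are scored by a softmax of -G. Only a single time step is shown, and the function names, the precision `gamma`, and the example numbers are hypothetical rather than taken from the study.

```python
import numpy as np

EPS = 1e-16  # guards against log(0)

def variational_free_energy(q_s, prior_s, A, obs):
    """F = E_q[ln q(s) - ln p(o, s)] for observation index `obs`."""
    log_joint = np.log(A[obs, :] + EPS) + np.log(prior_s + EPS)
    return float(np.sum(q_s * (np.log(q_s + EPS) - log_joint)))

def infer_states(prior_s, A, obs):
    """Exact minimizer of F: q(s) proportional to p(o|s) p(s)."""
    post = A[obs, :] * prior_s
    return post / post.sum()

def expected_free_energy(q_s, A, C):
    """One-step G = risk + ambiguity for the states q_s predicted under a policy."""
    log_c = C - np.log(np.exp(C).sum())             # normalized log-preferences over outcomes
    q_o = A @ q_s                                   # predicted outcome distribution
    risk = np.sum(q_o * (np.log(q_o + EPS) - log_c))
    entropy = -np.sum(A * np.log(A + EPS), axis=0)  # H[p(o|s)] for each hidden state
    ambiguity = entropy @ q_s
    return float(risk + ambiguity), float(risk), float(ambiguity)

def policy_posterior(G, gamma=1.0):
    """q(pi) = softmax(-gamma * G): lower expected free energy, higher probability."""
    x = -gamma * np.asarray(G, dtype=float)
    x -= x.max()                                    # numerical stability
    p = np.exp(x)
    return p / p.sum()

if __name__ == "__main__":
    A = np.array([[0.9, 0.2], [0.1, 0.8]])          # hypothetical 2-outcome, 2-state likelihood
    prior = np.array([0.5, 0.5])
    q = infer_states(prior, A, obs=0)
    print(variational_free_energy(q, prior, A, 0))  # about 0.598, i.e. -ln p(o=0) = -ln 0.55
```

At the exact posterior, F equals the surprise -ln p(o), which is why minimizing F both fits beliefs and bounds surprise; any other q, such as the prior, gives a larger F.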

Ingredients for computational modeling of active inference

The contextual two-armed bandit task. (a) In this task, agents make two choices in each trial. The first choice is between “Stay” and “Cue”: “Stay” yields nothing, while “Cue” yields a reward of -1 but reveals the context of the “Risky” option in the current trial. The second choice is between “Safe” and “Risky”: “Safe” always yields a reward of +6, while “Risky” yields a probabilistic reward ranging from 0 to +12 that depends on the current context (context 1 or context 2). (b) The four policies in this task are “Cue” then “Safe”, “Stay” then “Safe”, “Cue” then “Risky”, and “Stay” then “Risky”. (c) The likelihood matrix maps 8 hidden states (columns) to 7 observations (rows).
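
The payoff structure above can be sketched in a few lines of Python. The -1, +6, and +12 rewards come from the caption; the probability that “Risky” pays +12 in each context, the equal prior over contexts, and modeling “ranging from 0 to +12” as a two-outcome gamble (+12 or 0) are hypothetical placeholders, since the caption does not give them.

```python
# Rewards stated in the caption
CUE_COST = -1.0
SAFE_REWARD = 6.0
RISKY_HIGH = 12.0

# Hypothetical values, not from the study
P_RISKY_HIGH = {1: 0.8, 2: 0.2}  # P("Risky" pays +12) in context 1 / context 2; otherwise 0
P_CONTEXT = {1: 0.5, 2: 0.5}     # contexts assumed equally likely

# The four fixed policies from panel (b)
POLICIES = [("Cue", "Safe"), ("Stay", "Safe"), ("Cue", "Risky"), ("Stay", "Risky")]

def expected_second_reward(choice, context):
    """Expected reward of the second choice in a given context."""
    if choice == "Safe":
        return SAFE_REWARD
    return RISKY_HIGH * P_RISKY_HIGH[context]

def policy_value(first, second):
    """Expected reward of a fixed two-step policy, averaged over contexts."""
    cost = CUE_COST if first == "Cue" else 0.0
    return cost + sum(P_CONTEXT[c] * expected_second_reward(second, c) for c in P_CONTEXT)

def cue_then_best_value():
    """Ask for the cue, then take the better second option once the context is known."""
    return CUE_COST + sum(
        P_CONTEXT[c] * max(expected_second_reward(o, c) for o in ("Safe", "Risky"))
        for c in P_CONTEXT
    )

if __name__ == "__main__":
    for first, second in POLICIES:
        print(first, second, round(policy_value(first, second), 2))
    print("Cue, then adapt:", round(cue_then_best_value(), 2))
```

Under these placeholder numbers no fixed policy beats +6 in expectation, but cueing and then choosing according to the revealed context yields about +6.8, which illustrates how information can be worth its -1 cost.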

The simulation experiment results. This figure shows how an agent selects actions and updates beliefs over 60 trials under the active inference framework. The first two panels (a-b) display the agent’s policy and how the policy probabilities are updated (choosing between the stay and cue options in the first choice, and between the safe and risky options in the second choice). The scatter points indicate the agent’s actions: green represents the cue option when the context of the risky path is “Context 1” (high-reward context), orange represents the cue option when the context is “Context 2” (low-reward context), purple represents the stay option when the agent is uncertain about the context of the risky path, and blue indicates the second (safe or risky) choice. The shaded region represents the agent’s confidence, with darker shading indicating greater confidence. The third panel (c) displays the reward obtained by the agent in each trial. The fourth panel (d) shows the agent’s prediction error in each trial, which decreases over time. Finally, the fifth panel (e) illustrates the agent’s expected rewards of the “Risky” path in the two contexts.
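
One common way for an active inference agent to learn the context-dependent payoff of the risky path, as tracked in panels (d) and (e), is count-based (Beta/Dirichlet) learning of outcome probabilities. The caption does not say which learning rule the simulated agent used, so the sketch below is an assumption: the class name is hypothetical, each risky outcome is treated as +12 or 0, prior counts of 1 are a placeholder, and the returned reward prediction error shrinks as counts accumulate.

```python
class RiskyRewardLearner:
    """Beta/Dirichlet counts over risky outcomes (+12 vs 0) for each context.

    Hypothetical sketch: the prior counts and the two-outcome payoff are assumptions.
    """

    def __init__(self, prior_win=1.0, prior_loss=1.0, high=12.0):
        self.high = high
        # counts[context] = [pseudo-count of +12 outcomes, pseudo-count of 0 outcomes]
        self.counts = {c: [prior_win, prior_loss] for c in (1, 2)}

    def p_win(self, context):
        """Posterior mean probability that the risky path pays +12."""
        win, loss = self.counts[context]
        return win / (win + loss)

    def expected_reward(self, context):
        return self.high * self.p_win(context)

    def update(self, context, won):
        """Observe one risky outcome; return its reward prediction error."""
        reward = self.high if won else 0.0
        error = reward - self.expected_reward(context)
        self.counts[context][0 if won else 1] += 1.0
        return error

if __name__ == "__main__":
    learner = RiskyRewardLearner()
    print([round(learner.update(1, True), 2) for _ in range(4)])
```

Four consecutive +12 outcomes in context 1 give prediction errors of 6, 4, 3, and 2.4: each new count moves the estimate less, so the error decays even though the outcome is unchanged.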

The experiment task and behavioral results. (a) The five stages of the experiment: the “You can ask” stage, which prompts participants to decide whether to request information from the Ranger; the “First choice” stage, in which they decide whether to ask the Ranger for information; the “First result” stage, which displays the outcome of the “First choice” stage; the “Second choice” stage, in which they choose between the left and right paths under different levels of uncertainty; and the “Second result” stage, which shows the outcome of the “Second choice” stage. (b) The number of times each option was selected. Error bars indicate the variance across participants. (c) The Bayesian information criterion (BIC) of the active inference, model-free reinforcement learning, and model-based reinforcement learning models.
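
The BIC in panel (c) trades goodness of fit against model complexity: BIC = k ln(n) - 2 ln(L_hat), where k is the number of free parameters, n the number of observations (choices), and L_hat the maximized likelihood; lower values indicate a better trade-off. A minimal sketch follows; how the study counted k and n for each model is not stated here, and the model names and numbers in the example are made up for illustration.

```python
import math

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k*ln(n) - 2*ln(L_hat). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def best_model(scores: dict) -> str:
    """Return the name of the model with the lowest BIC."""
    return min(scores, key=scores.get)

if __name__ == "__main__":
    # Hypothetical fits, for illustration only (not the study's results)
    scores = {
        "model_a": bic(-50.0, 2, 100),  # fewer parameters
        "model_b": bic(-48.0, 5, 100),  # better fit, but more parameters
    }
    print(scores, best_model(scores))
```

In this made-up example model_b fits better (higher log-likelihood) yet loses, because its three extra parameters cost 3 ln(100), about 13.8, while the fit improves BIC by only 4.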

EEG results at the sensor level. (a) The electrode layout. (b) The signal amplitude of different brain regions in the first and second halves of the experiment during the “Second choice” stage. Error bars indicate the amplitude variance within each region. The right panel visualizes the evoked and spectral data. (c) The signal amplitude of different brain regions during the “Second choice” stage when participants did or did not know the context of the right path. Error bars indicate the amplitude variance within each region. The right panel visualizes the evoked and spectral data.
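
A minimal NumPy sketch of the region-level comparison in panel (b), assuming epoched data shaped (trials, channels, samples). The channel-to-region mapping, the use of the mean absolute amplitude of the trial-averaged (evoked) signal, and taking the variance across a region's channels as the error bar are all assumptions, since the caption does not define them; the function names are hypothetical.

```python
import numpy as np

def region_stats(epochs, regions):
    """Per-region (mean, variance) of channel amplitudes.

    epochs: array of shape (n_trials, n_channels, n_samples).
    regions: hypothetical map {region_name: [channel indices]}.
    """
    evoked = epochs.mean(axis=0)       # average over trials -> (n_channels, n_samples)
    amp = np.abs(evoked).mean(axis=1)  # mean absolute amplitude per channel
    return {name: (float(amp[idx].mean()), float(amp[idx].var()))
            for name, idx in regions.items()}

def compare_halves(epochs, regions):
    """Region stats for the first and second halves of the trials, in recorded order."""
    half = epochs.shape[0] // 2
    return region_stats(epochs[:half], regions), region_stats(epochs[half:], regions)
```

Calling `region_stats` separately on trials where the context was known and on trials where it was not would give the kind of comparison shown in panel (c).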

The source estimation results of expected free energy and the value of avoiding risk in the “First choice” stage. (A) The regression intensity (β) of expected free energy. The right panel indicates the regression intensity between the frontal pole (1, right half) and expected free energy; the black line indicates the average intensity of this region, and the gray-shaded region indicates the range of variation. The blue-shaded region indicates p < 0.01. (B) The regression intensity (β) of the value of avoiding risk. The right panel indicates the regression intensity between the middle temporal gyrus (5, left half) and the value of avoiding risk; the black line indicates the average intensity of this region, and the gray-shaded region indicates the range of variation. The green-shaded region indicates p < 0.05.
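
A minimal sketch of a mass-univariate regression that would produce β time courses like those in the right panels: at every time sample, one region's source activity across trials is regressed on a trial-wise model quantity (for example, expected free energy) with an intercept. The function name, the standardization of the regressor, the OLS estimator, and the t-statistic are assumptions rather than the authors' documented pipeline. Two-sided p-values could then be obtained from a t distribution (for example `2 * scipy.stats.t.sf(abs(t), dof)`), typically with correction for multiple comparisons across time.

```python
import numpy as np

def regress_over_time(source, regressor):
    """Per-time-sample OLS of source activity on one trial-wise regressor.

    source: array (n_trials, n_times) of one region's activity.
    regressor: array (n_trials,), e.g. trial-wise expected free energy.
    Returns (beta, t), each of shape (n_times,), for the regressor term.
    """
    n = source.shape[0]
    z = (regressor - regressor.mean()) / regressor.std()  # standardized regressor
    X = np.column_stack([np.ones(n), z])                  # intercept + regressor
    coef, _, _, _ = np.linalg.lstsq(X, source, rcond=None)  # shape (2, n_times)
    resid = source - X @ coef
    dof = n - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof                # residual variance per time sample
    xtx_inv = np.linalg.inv(X.T @ X)
    se = np.sqrt(sigma2 * xtx_inv[1, 1])                   # standard error of beta
    return coef[1], coef[1] / se
```

Standardizing the regressor puts β in units of activity per standard deviation of the model quantity, which keeps β values comparable across regressors such as expected free energy and the value of avoiding risk.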

The source estimation results of avoiding risk and reducing ambiguity in the two result stages. (A) The regression intensity (β) of avoiding risk in the “First result” stage. The right panel indicates the regression intensity between the medial orbitofrontal cortex (5, left half) and avoiding risk; the black line indicates the average intensity of this region, and the gray-shaded region indicates the range of variation. The green-shaded region indicates p < 0.05. (B) The regression intensity (β) of reducing ambiguity in the “Second result” stage. The right panel indicates the regression intensity between the middle temporal gyrus (5, right half) and reducing ambiguity; the black line indicates the average intensity of this region, and the gray-shaded region indicates the range of variation. The green-shaded region indicates p < 0.05.

The source estimation results of expected free energy and the value of reducing ambiguity in the “Second choice” stage. (a) The regression intensity (β) of expected free energy. The right panel indicates the regression intensity between the rostral middle frontal gyrus (1, left half) and expected free energy; the black line indicates the average intensity of this region, and the gray-shaded region indicates the range of variation. The yellow-shaded region indicates p < 0.001. (b) The regression intensity (β) of the value of reducing ambiguity. The right panel indicates the regression intensity between the middle temporal gyrus (5, left half) and the value of reducing ambiguity; the black line indicates the average intensity of this region, and the gray-shaded region indicates the range of variation. The green-shaded region indicates p < 0.05.