Active inference. (a) Qualitatively, agents receive observations from the environment and use them to optimize their internal cognitive model of the environment. Agents then actively sample environmental states through action, choosing actions that bring them into more favorable states. The environment changes its state according to the agents’ actions, and the agents again receive new observations. (b) Quantitatively, agents optimize the internal cognitive model by minimizing the variational free energy. They then select policies (actions) by minimizing the expected free energy, so as to minimize future surprise.
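The two quantities named in panel (b) can be written out as follows. This is a sketch in the notation commonly used in the active inference literature, not necessarily the notation of this paper: q denotes the agent's approximate beliefs, s hidden states, o observations, π a policy, and p̃(o) the agent's prior preferences over outcomes; all of these symbols are assumptions introduced here for illustration.

```latex
% Variational free energy: an upper bound on surprise, -ln p(o)
F[q] = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
     = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o)
     \;\ge\; -\ln p(o)

% Expected free energy of a policy \pi, with its usual approximate decomposition
G(\pi) = \mathbb{E}_{q(o, s \mid \pi)}\big[\ln q(s \mid \pi) - \ln \tilde{p}(o, s)\big]
       \approx
       -\,\mathbb{E}_{q(o \mid \pi)}\Big[D_{\mathrm{KL}}\big[q(s \mid o, \pi)\,\|\,q(s \mid \pi)\big]\Big]
       \;-\; \mathbb{E}_{q(o \mid \pi)}\big[\ln \tilde{p}(o)\big]
```

Under this reading, minimizing F tightens the bound on the surprise of the current observation, while minimizing G favors policies that are expected to be informative (first term) and to lead to preferred outcomes (second term).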

Ingredients for computational modeling of active inference

Generative model of the contextual two-armed bandit task. (a) The task has two stages. The first choice is between “Stay” and “Cue”. The “Stay” option yields nothing, while the “Cue” option yields a −1 reward together with context information about the “Risky” option in the current trial. The second choice is between “Safe” and “Risky”. The “Safe” option yields a +6 reward, whereas the “Risky” option yields a probabilistic reward between 0 and +12 that depends on the current context (context 1 or context 2). (b) The four policies in this task are “Cue” and “Safe”, “Stay” and “Safe”, “Cue” and “Risky”, and “Stay” and “Risky”. (c) The A-matrix maps from 8 hidden states (columns) to 7 observable outcomes (rows).
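The task structure in panels (a) and (b) can be sketched as a small simulation. The rewards for “Stay”, “Cue”, and “Safe” come from the caption; everything else is an assumption made for illustration: the caption does not give the reward distribution of the “Risky” option, so the sketch simply pays either 0 or +12, with a hypothetical probability `P_HIGH` per context, and the names `POLICIES` and `run_trial` are invented here.

```python
import itertools
import random

FIRST_CHOICES = ("Stay", "Cue")
SECOND_CHOICES = ("Safe", "Risky")

# The four policies of panel (b): every pairing of a first and a second choice.
POLICIES = list(itertools.product(FIRST_CHOICES, SECOND_CHOICES))

# ASSUMPTION: hypothetical probability of the +12 outcome in each context.
# The caption only says the risky reward lies between 0 and +12.
P_HIGH = {1: 0.8, 2: 0.2}


def run_trial(policy, context, rng):
    """Play one trial; return (total reward, context revealed to the agent)."""
    first, second = policy
    reward = 0
    revealed = None
    if first == "Cue":
        reward -= 1          # asking for the cue costs 1
        revealed = context   # ...and reveals the current context
    if second == "Safe":
        reward += 6          # the safe option always pays +6
    else:
        # ASSUMPTION: binary risky payoff (0 or +12)
        reward += 12 if rng.random() < P_HIGH[context] else 0
    return reward, revealed


if __name__ == "__main__":
    rng = random.Random(0)
    for policy in POLICIES:
        print(policy, run_trial(policy, context=1, rng=rng))
```

With these assumed probabilities, paying for the cue is worthwhile only insofar as knowing the context changes the second choice, which is the trade-off the task is designed to probe.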

The simulation experiment results. This figure demonstrates how an agent selects actions and updates beliefs over 60 trials in the active inference framework. The first two panels (a-b) display the agent’s policy and depict how the policy probabilities are updated (choosing between the stay and cue options in the first choice, and between the safe and risky options in the second choice). The scatter plot indicates the agent’s actions, with green representing the cue option when the context of the risky path is “Context 1” (high-reward context), orange representing the cue option when the context of the risky path is “Context 2” (low-reward context), purple representing the stay option when the agent is uncertain about the context of the risky path, and blue indicating the safe-risky choice. The shading represents the agent’s confidence, with darker shading indicating greater confidence. The third panel (c) displays the rewards obtained by the agent in each trial. The fourth panel (d) shows the agent’s prediction error in each trial, which decreases over time. Finally, the fifth panel (e) illustrates the agent’s expected rewards for the “Risky Path” in the two contexts.

The experiment task and behavioral results. Panel (a) outlines the five stages of the experiment: the “You can ask” stage, which determines whether participants can request information from the ranger; the “First choice” stage, in which participants decide whether to ask the ranger for information; the “First result” stage, which displays the result of the “First choice” stage; the “Second choice” stage, in which participants choose between the left and right paths under different uncertainties; and the “Second result” stage, which shows the result of the “Second choice” stage. Panel (b) displays the number of times each option was selected. Finally, panel (c) compares the model-free RL and active inference models.

EEG results at the sensor level. (a) The electrode distribution. (b) The signal amplitude of different brain regions in the first and second halves of the experiment during the “Second choice” stage. The right panel visualizes the evoked data and spectrum data. (c) The signal amplitude of different brain regions in the “Second choice” stage when participants do or do not know the context of the right path. The right panel visualizes the evoked data and spectrum data.

The source estimation results for expected free energy and active inference in the “First choice” stage. (A) The regression coefficients (β) of the expected free energy regressor. The blue point indicates the most correlated brain region (the lateral occipital cortex, left hemisphere, MNI: [−9.9, −96.8, 9.8]). The right panel shows the neural activity at the blue point, and the shaded area marks the time intervals (0.380 s to 0.581 s and 1.172 s to 1.724 s) in which this activity is significantly correlated with expected free energy (p < 0.05, with intervals longer than 0.2 s). (B) The regression coefficients (β) of the active inference regressor. The blue point indicates the most correlated brain region (the middle temporal gyrus, left hemisphere, MNI: [−63.5, −23.5, −13.8]). The right panel shows the neural activity at the blue point, and the green shaded area marks the time intervals (0.08 s to 0.636 s, 0.657 s to 0.906 s, and 1.32 s to 2.00 s) in which this activity is significantly correlated with active inference (p < 0.05, with intervals longer than 0.2 s).
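The interval criterion used in these captions (p < 0.05 sustained for longer than 0.2 s) can be illustrated with a short sketch. It assumes one p-value per time sample for the regression coefficient at a single source location; the caption does not state how the p-values were obtained or corrected, and the function name `significant_intervals` and its parameters are hypothetical.

```python
def significant_intervals(times, pvals, alpha=0.05, min_duration=0.2):
    """Return (start, end) times of runs where p < alpha lasts longer than min_duration.

    ASSUMPTION: `times` (seconds) and `pvals` are equal-length sequences with
    one entry per time sample, sorted by time.
    """
    runs = []
    start = None  # index where the current significant run began, if any
    for i, p in enumerate(pvals):
        if p < alpha:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at the previous sample.
            runs.append((times[start], times[i - 1]))
            start = None
    if start is not None:
        # The run extends to the last sample.
        runs.append((times[start], times[-1]))
    # Keep only runs strictly longer than the minimum duration.
    return [(t0, t1) for t0, t1 in runs if t1 - t0 > min_duration]


if __name__ == "__main__":
    times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    pvals = [0.5, 0.01, 0.02, 0.01, 0.03, 0.04, 0.3, 0.6, 0.01, 0.4]
    print(significant_intervals(times, pvals))
```

In the example, the isolated significant sample at 0.8 s is discarded because it does not persist for longer than 0.2 s, whereas the run from 0.1 s to 0.5 s is kept.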

The source estimation results for active inference and active learning in the two result stages. (A) The regression coefficients (β) of the active inference regressor in the “First result” stage. The blue point indicates the most correlated brain region (the middle temporal gyrus, right hemisphere, MNI: [52.6, −32.3, −19.7]). The right panel shows the neural activity at the blue point, and the shaded area marks the time intervals (0.076 s to 0.436 s, 0.47 s to 0.67 s, and 0.80 s to 1.24 s) in which this activity is significantly correlated with active inference (p < 0.05, with intervals longer than 0.2 s). (B) The regression coefficients (β) of the active learning regressor in the “Second result” stage. The blue point indicates the most correlated brain region (the intersection of the middle temporal gyrus and the bankssts, left hemisphere, MNI: [−52.5, −56.6, 5.5]). The right panel shows the neural activity at the blue point, and the shaded area marks the time intervals (0.355 s to 0.745 s and 1.441 s to 1.651 s) in which this activity is significantly correlated with active learning (p < 0.05, with intervals longer than 0.2 s).

The source estimation results for expected free energy and active learning in the “Second choice” stage. (a) The regression coefficients (β) of the expected free energy regressor. The blue point indicates the most correlated brain region (the lateral occipital cortex, left hemisphere, MNI: [−5.5, −92.8, 15.4]). The right panel shows the neural activity at the blue point, and the shaded area marks the time intervals (0.188 s to 0.516 s, 0.532 s to 1.312 s, and 1.336 s to 2.00 s) in which this activity is significantly correlated with expected free energy (p < 0.05, with intervals longer than 0.2 s). (b) The regression coefficients (β) of the active learning regressor. The blue point indicates the most correlated brain region (the lateral occipital cortex, left hemisphere, MNI: [−9.9, −96.8, 9.8]). The right panel shows the neural activity at the blue point, and the shaded area marks the time interval (0.108 s to 2.00 s) in which this activity is significantly correlated with active learning (p < 0.05).