Uncertainty-modulated prediction errors in cortical microcircuits

  1. Katharina Anna Wilmes (corresponding author)
  2. Mihai A Petrovici
  3. Shankar Sachidhanandam
  4. Walter Senn
  1. Department of Physiology, University of Bern, Switzerland
9 figures, 6 tables and 1 additional file

Figures

Figure 1
Distributed uncertainty-modulated prediction error computation in cortical circuits.

(A) A person who learned that buses are unreliable has a prior expectation, which can be described by a wide Gaussian distribution of expected bus arrival times. When the bus does not arrive at the scheduled time, this person is not surprised and remains calm, as everything happens according to their model of the world. On the other hand, a person who learned that buses are punctual, which can be described by a narrow distribution of arrival times, may notice that the bus is late and get nervous, as they expected the bus to be punctual. This person can learn from this experience. If they always took this particular bus, and their uncertainty estimate is accurate, the prediction error could indicate that the bus schedule changed. (B) Models of uncertainty representation in cortex. Some models suggest that uncertainty is only represented in higher-level areas concerned with decision-making (left). In contrast, we propose that uncertainty is represented at each level of the cortical hierarchy (right, shown is the visual hierarchy as an example). (C) A mouse learns the association between a sound (a) and a whisker deflection (s). The posterior parietal cortex (PPC) receives inputs from both somatosensory and auditory cortex. (D) The whisker stimulus intensities are drawn from a Gaussian distribution with mean μ and standard deviation σ. (E) Negative (left) and positive (right) prediction error circuit consisting of three cell types: layer 2/3 pyramidal cells (triangle), somatostatin-positive interneurons (SST, circle) and parvalbumin-positive interneurons (PV). SSTs represent the mean prediction in the positive circuit and the stimulus in the negative circuit, and PVs represent the variance.
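Panels (C, D) fully specify the stimulation protocol used throughout: a sound cue a signals a context, and the whisker stimulus intensity s is drawn from that context's Gaussian. As a concrete illustration only (variable names and values are ours, not taken from the paper's code), the generative setup can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each sound cue is associated with its own whisker stimulus statistics (illustrative values).
contexts = {"tone_low": (1.0, 0.4), "tone_high": (5.0, 0.8)}  # sound -> (mu, sigma)

def sample_trial(sound):
    """Return (a, s): the auditory cue (on) and a whisker intensity drawn from N(mu, sigma)."""
    mu, sigma = contexts[sound]
    a = 1.0                    # auditory cue present
    s = rng.normal(mu, sigma)  # whisker deflection intensity
    return a, s

trials = [sample_trial("tone_high") for _ in range(5)]
```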

Figure 2
SSTs learn to represent the mean context-dependently.

(A) Illustration of the changes in the positive prediction error circuit. Thicker lines denote stronger weights. (B) Two different tones (red, orange) are associated with two somatosensory stimulus distributions with different means (red: high, orange: low). (C) SST firing rates (mean and std) during stimulus input. (D) SST firing rates over time for low (orange) and high (red) stimulus means. (E) Weights (mean and std) from sound a to SST for different values of μ. (F) SST firing rates (mean and std) for different values of μ. Mean and std were computed over 1000 data points from timesteps 9000–10,000.
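Panels (E, F) state that the sound-to-SST weight, and hence the SST rate under sound input, settles at the stimulus mean μ of the cued context. A plain delta rule already has this fixed point; the sketch below uses such a rule purely as an illustration and is not the paper's exact plasticity rule (learning rate and initial weight follow Table 4).

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 3.0, 0.8           # context-specific stimulus statistics (illustrative)
eta_sst, w_sst_a = 0.1, 0.01   # learning rate and initial weight (Table 4)

for _ in range(10_000):
    a = 1.0                       # sound cue on
    s = rng.normal(mu, sigma)     # whisker stimulus sample
    r_sst = w_sst_a * a           # SST rate driven by the sound alone
    w_sst_a += eta_sst * a * (s - r_sst)  # delta rule: fixed point at E[s] = mu

print(round(w_sst_a, 1))  # ~3.0: the SST rate under sound input approximates mu
```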

Figure 3 with 3 supplements
PVs learn to estimate the variance context-dependently.

(A) Illustration of the changes in the positive prediction error circuit. Thicker lines denote stronger weights. (B) Two different tones (purple, green) are associated with two somatosensory stimulus distributions with different variances (purple: high, green: low). (C) Weights from sound a to PV over time for two different values of stimulus variance (high: σ=0.8 [purple], low: σ=0.4 [green]). (D) PV firing rates over time given sound input (without whisker stimulus input) for low (green) and high (purple) stimulus variance. (E) PV firing rates (mean and std) given sound input and whisker stimuli for low and high stimulus variance. (F) PV firing rates (mean and std) during sound and stimulus input. (G) Weights (mean and std) from sound a to PV for different values of σ. (H) PV firing rates (mean and std) given sound input for different values of σ². Mean and std were computed from 150,000 data points from timesteps 450,000–600,000.
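Panels (C, G, H), together with the appendix figure, state that the sound-to-PV weight approaches σ and that the PV rate under sound input approaches σ², given the PVs' quadratic activation function. The sketch below shows one simple update with exactly that fixed point; it is an illustration under our own assumptions (in particular, that the mean μ is already subtracted, e.g. by the SSTs), not the plasticity rule derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 3.0, 0.8         # context statistics (illustrative)
eta_pv, w_pv_a = 0.01, 0.01  # illustrative learning rate; initial weight as in Table 4

phi_pv = lambda x: x ** 2    # quadratic PV activation function (exponent 2.0)

for _ in range(200_000):
    a = 1.0
    s = rng.normal(mu, sigma)
    # Move the weight towards the point where phi(w * a) matches the squared error;
    # the fixed point is w = sigma, so the PV rate under sound alone is sigma**2.
    w_pv_a += eta_pv * a * ((s - mu) ** 2 - phi_pv(w_pv_a * a))

print(round(w_pv_a, 1), round(phi_pv(w_pv_a), 2))  # ~0.8 and ~0.64
```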

Figure 3—figure supplement 1
Different choice of supralinear activation function for PV.

Learning the variance in the positive prediction error circuit with PVs with a power activation function (exponent = 3.0). (A, B) are analogous to Figure 3G and H, and the circuit is the same except that the activation function of the PVs (ϕ_PV(x)) has an exponent of 3.0 instead of 2.0. (C, D) are zoomed-out versions of (A, B).

Figure 3—figure supplement 2
Plastic weights from SST to PV learn to match weights from s to PV.

With inhibitory plasticity, weights from SST to PV can be learned. This figure shows that the weight from SST to PV (w_PV,SST) is equal to the weight from s to PV (w_PV,s). The inhibitory plasticity rule is described in the Appendix.

Figure 3—figure supplement 3
PVs learn to represent the variance given an associative cue in the negative prediction error circuit.

(A) Illustration of the changes in the negative prediction error circuit. Thicker lines denote stronger weights. (B) Two different tones (purple, green) are associated with two somatosensory stimulus distributions with different variances (purple: high, green: low). (C) Weights from sound a to PV over time for two different values of stimulus variance (high: σ=0.8 [purple], low: σ=0.4 [green]). (D) PV firing rates over time given sound input (without stimulus input) for low (green) and high (purple) stimulus variance. (E) PV firing rates (mean and std) given sound input for low and high stimulus variance. (F) PV firing rates (mean and std) during sound and stimulus input. (G) Weights from sound a to PV for different values of σ (mean and std). (H) PV firing rates given sound input for different values of σ² (mean and std).

Figure 4 with 1 supplement
Calculation of the UPE in layer 2/3 error neurons.

(A) Illustration of the negative prediction error circuit. (B) Distributions with different standard deviations σ. (C) Illustration of the positive prediction error circuit. (D) Firing rate of the error neuron in the negative prediction error circuit (UPE−) as a function of σ for two values of |s − μ| after learning μ and σ. (E) Rates of both UPE+- and UPE−-representing error neurons with a nonlinear activation function, where k=2.0, as a function of the difference between the stimulus and the mean (s − μ). (F) Firing rate of the error neuron in the positive prediction error circuit (UPE+) as a function of σ for two values of |s − μ| after learning μ and σ. (G–I) Same as (D–F) for error neurons with k=2.5, with the same legend as in (D–F).
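One non-authoritative way to read this figure: after learning, the SSTs supply the subtractive term (μ in the positive circuit, s in the negative one), the PVs supply the variance, and the error neurons apply a power nonlinearity with exponent k to the rectified, uncertainty-scaled difference. In the sketch below, both the rectification and the divisive use of the PV rate are our assumptions for illustration; the paper's exact dendritic normalization may differ.

```python
def upe_rates(s, mu, sigma, k=2.0, eps=1e-6):
    """Illustrative uncertainty-modulated prediction errors (not the paper's exact equations).

    Assumptions (ours): SSTs subtract mu (positive circuit) or s (negative circuit),
    the PV rate approximates sigma**2 and acts divisively, and the error neurons
    apply a power nonlinearity with exponent k.
    """
    pv = sigma ** 2                                  # PV rate after learning (~ variance)
    upe_plus = max(s - mu, 0.0) ** k / (pv + eps)    # positive prediction error circuit
    upe_minus = max(mu - s, 0.0) ** k / (pv + eps)   # negative prediction error circuit
    return upe_plus, upe_minus

# The same mismatch |s - mu| = 1 drives a weaker error response under higher uncertainty:
print(upe_rates(s=4.0, mu=3.0, sigma=0.4))
print(upe_rates(s=4.0, mu=3.0, sigma=0.8))
```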

Figure 4—figure supplement 1
Learning the weights from the SSTs to the UPE neurons.

This figure shows that the weights from the SSTs to the UPEs in both the positive (left) and the negative (right) prediction error circuit can be learned with inhibitory plasticity to match the weights from the stimulus representation s to the UPEs. The inhibitory plasticity rule is described in the Appendix.

Figure 5 with 1 supplement
Learning the mean representation with UPEs.

(A) Illustration of the circuit. A representation neuron (turquoise) receives input from both positive and negative prediction error circuits (UPE+ and UPE−) and projects back to them. The UPE− has a negative impact on the firing rate of the representation neuron (r_R). A weight w_R,a from the higher-level representation of the context given by sound a is learned. (B) Weights w_R,a over time for different values of μ (μ ∈ {1, 3, 5}). (C) R firing rates given sound input for different values of μ (mean and std over 50,000 data points from timesteps 50,000–100,000, the end of the simulation). (D) Activity of the different cell types (PV: light green, R: turquoise, UPE: black) and whisker stimulus samples (grey dots) over time. Learning the mean representation with PVs (light green) reflecting the MSE at the beginning, which is compensated by the nonlinear activation of L2/3 neurons (black). The evolution of the mean rate of neuron R (turquoise) is similar to the perfect case in (E). (E) Learning the mean representation assuming PVs (light green) perfectly represent the variance; same colour code as in (D). Inset: comparison to (D).
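Panel (A) fixes the sign conventions: UPE+ excites the representation neuron, UPE− inhibits it, and the weight w_R,a from the sound cue is learned so that R's rate under sound input matches the stimulus mean. Below is a minimal, self-contained sketch of that loop under the same illustrative assumptions as above (generic update rule, divisive variance scaling), not the paper's exact equations.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 5.0, 0.8      # context statistics (illustrative)
eta_r, w_r_a = 0.1, 0.01  # learning rate and initial weight (Table 4)
w_r_upe = 0.1             # weight from UPE+/- to R (Table 1)

for _ in range(20_000):
    a = 1.0
    s = rng.normal(mu, sigma)
    r_R = w_r_a * a                       # R's current estimate of the mean
    err = s - r_R
    # Illustrative UPEs: rectified errors, divisively scaled by the PV-encoded variance.
    upe_plus = max(err, 0.0) ** 2 / sigma ** 2
    upe_minus = max(-err, 0.0) ** 2 / sigma ** 2
    # UPE+ pushes the representation (and its weight) up, UPE- pushes it down.
    w_r_a += eta_r * a * w_r_upe * (upe_plus - upe_minus)

print(round(w_r_a, 1))  # ~5.0: R's rate under sound input approaches mu
```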

Figure 5—figure supplement 1
PV firing rates are proportional to the variance in the recurrent circuit model.

Weights from a to PV as a function of σ in the positive (A) and negative (C) prediction error subcircuit. PV firing rates as a function of σ² in the positive (B) and negative (D) prediction error circuit.

Figure 6
Cell-type-specific experimentally testable predictions.

(A) Illustration of the two experienced stimulus distributions with different variances that are associated with two different sounds (green, purple). The presented mismatch (MM) stimulus (black) is larger than expected (positive prediction error). (B–F) Simulated firing rates of different cell types to positive prediction errors when a sound associated with high (purple) or low (green) uncertainty is presented. (G) As in (A), but the presented mismatch stimulus (grey) is smaller than expected (negative prediction error). (H–L) Firing rates of different cell types to the negative mismatch when a sound associated with high (purple) or low (green) uncertainty is presented. Because the firing rate predictions are equal for PV+ and PV−, we only show the results for PV+ in the figure.

Figure 7
Effective learning rate is automatically adjusted with UPEs.

(A, B) Firing rate over time of the representation neuron in a circuit with uncertainty-modulated prediction errors (gold) and in a circuit with unmodulated errors (black) in a low uncertainty setting (A) and a high uncertainty setting (B). (C) Standard deviation of the firing rate of the representation neuron in the low uncertainty setting (the inset has a different scale; the outer axis scale matches the one in D). (D) Standard deviation of the firing rate of the representation neuron in the high uncertainty setting. (E) Standard deviation of the firing rate r_R as a function of the standard deviation of the presented stimulus distribution σ_s. Standard deviations were computed over 100,000 data points from timesteps 100,000–200,000.
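The mechanism behind this figure is that dividing prediction errors by the learned variance acts as an automatic learning-rate controller. The toy stand-in below strips away the circuit and simply compares a scalar estimate updated by raw errors with one updated by variance-scaled errors; it is meant to convey the mechanism, not to reproduce the panels quantitatively.

```python
import numpy as np

rng = np.random.default_rng(4)

def steady_state_std(sigma_s, modulated, eta=0.1, mu=3.0, n=100_000):
    """Std of a scalar estimate tracking a noisy stimulus, with or without
    uncertainty modulation (toy stand-in for the representation neuron)."""
    r = mu
    trace = np.empty(n)
    for t in range(n):
        s = rng.normal(mu, sigma_s)
        err = s - r
        r += eta * (err / sigma_s ** 2 if modulated else err)  # modulated: effective rate eta / sigma_s**2
        trace[t] = r
    return trace[n // 2:].std()

for sigma_s in (0.4, 0.8, 1.6):
    print(sigma_s, steady_state_std(sigma_s, modulated=False), steady_state_std(sigma_s, modulated=True))
```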

Figure 8
UPEs ensure uncertainty-based weighting of prior and sensory information.

(A) Illustration of the hierarchical two-area model. A representation neuron (r) in area 2 receives positive and negative UPEs from area 1 below (sensory prediction errors, as in Figure 5), and positive and negative UPEs from within the same area (representation prediction errors) with different signs, see Equation 5. In this example, the uncertainty in area 1 corresponds to the sensory uncertainty σ_s², and the uncertainty in the area above corresponds to the prior uncertainty σ_p². Both uncertainties are represented by PV activity in the respective area. (B, C) The simulation results from the hierarchical circuit model yield an uncertainty-based weighting of prior and sensory information as in Equation 4. (B) The activity r of the representation neuron as a function of the stimulus s for different amounts of prior uncertainty σ_p², when the sensory uncertainty is fixed at σ_s² = 0.5. The histograms show the distribution of the sensory stimulus samples (grey) and the distribution of the activity of the representation neuron (green). (C) The activity of the representation neuron as a function of the stimulus for different amounts of sensory uncertainty σ_s², when the prior uncertainty is fixed at σ_p² = 0.5.
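For orientation, in the standard Gaussian case the uncertainty-based weighting referred to as Equation 4 takes the familiar reliability-weighted form (our rendering; the notation in the main text may differ):

$$
\hat r \;=\; \frac{\sigma_p^2\, s + \sigma_s^2\, \mu_p}{\sigma_p^2 + \sigma_s^2} \;=\; \mu_p + \frac{\sigma_p^2}{\sigma_p^2 + \sigma_s^2}\,(s - \mu_p),
$$

so the representation follows the stimulus more closely when the prior is broad (large σ_p²) and stays near the prior mean μ_p when the sensory input is noisy (large σ_s²), which is the trend shown in panels (B) and (C).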

For small β and w_s = 2ββ, the weight from a to PV approaches σ and the PV firing rate approaches σ².

Tables

Table 1
Parameters of the network.
Parameter | Value | Description
w_PV+,SST+ | 2ββ | Weight from SST+ to PV+
w_PV+,s | 2ββ | Weight from s to PV+
w_PV−,SST− | 2ββ | Weight from SST− to PV−
w_PV−,R | 2ββ | Weight from R to PV−
w_SST+,R | 1.0 | Weight from R to SST+
w_SST−,s | 1.0 | Weight from s to SST−
w_UPE+,SST+ | 1.0 | Weight from SST+ to UPE+
w_UPE+,s | 1.0 | Weight from s to UPE+
w_UPE−,R | 1.0 | Weight from R to UPE−
w_UPE−,SST− | 1.0 | Weight from SST− to UPE−
w_R,UPE+ | 0.1 (1.0 in Figure 6) | Weight from UPE+ to R
w_R,UPE− | 0.1 (1.0 in Figure 6) | Weight from UPE− to R
x_max | 20 | Limits neuronal activity
β | 0.1 | Nudging parameter
Table 2
Additional parameters of the hierarchical network.
Parameter | Value | Description
w_R,UPE+ | 1.0 | Weight from UPE+ to R
w_R,UPE− | 1.0 | Weight from UPE− to R
w_UPE,R | 1.0 | Weight from R to UPE
w_UPE,prior | 1.0 | Weight from the prior representation to UPE
Table 3
Inputs.
Parameter | Value | Description
a | {0.0, 1.0} | Auditory stimulus (on/off)
s | N(μ, σ) | Somatosensory (whisker) stimulus
N | 1000–20,000 | Number of whisker stimulus samples
D | {10, 100} | Stimulus duration (Figures 1–5 and 7)
Table 4
Parameters of the plasticity rules.
Parameter | Value | Description
η_SST | 0.1 | Learning rate for w_SST+/−,a
η_PV | 0.01 · η_R = 0.001 | Learning rate for w_PV+/−,a
η_R | 0.1 | Learning rate for w_R,a
w_SST,a (initial) | 0.01 | Initial value for w_SST+/−,a
w_PV,a (initial) | 0.01 | Initial value for w_PV+/−,a
w_R,a (initial) | 0.01 | Initial value for w_R,a
Table 5
Parameters of simulations in Figures 2–5.
Parameter | Value | Description
T | N · D | Simulation time
dt | 0.1 | Simulation time step
τ_E | 1.0 | Excitatory membrane time constant
τ_I | 1.0 | Inhibitory membrane time constant
Table 6
Parameters of the simulation in Figure 6.
Parameter | Value | Description
T | N · D | Simulation time
dt | 0.1 | Simulation time step
τ_E | 10.0 | Excitatory membrane time constant
τ_I | 2.0 | Inhibitory membrane time constant
η_R | 0.01 | Learning rate of w_R,a


Cite this article

Katharina Anna Wilmes, Mihai A Petrovici, Shankar Sachidhanandam, Walter Senn (2025) Uncertainty-modulated prediction errors in cortical microcircuits. eLife 13:RP95127. https://doi.org/10.7554/eLife.95127.4