Distributed uncertainty-modulated prediction error computation in cortical circuits

A: A person who learned that buses are unreliable has a prior expectation, which can be described by a wide Gaussian distribution of expected bus arrival times. When the bus does not arrive at the scheduled time, this person is not surprised and remains calm, as everything happens according to their model of the world. On the other hand, a person who learned that buses are punctual, which can be described by a narrow distribution of arrival times, may notice that the bus is late and get nervous, as they expected the bus to be punctual. This person can learn from this experience. If they always took this particular bus, and their uncertainty estimate is accurate, the prediction error could indicate that the bus schedule changed. B: Models of uncertainty representation in cortex. Some models suggest that uncertainty is represented only in higher-level areas concerned with decision-making (left). In contrast, we propose that uncertainty is represented at each level of the cortical hierarchy (right; shown is the visual hierarchy as an example). C: A mouse learns the association between a sound (a) and a whisker deflection (s). The posterior parietal cortex (PPC) receives inputs from both somatosensory and auditory cortex. D: The whisker stimulus intensities are drawn from a Gaussian distribution with mean µ and standard deviation σ. E: Negative (left) and positive (right) prediction error circuit consisting of three cell types: layer 2/3 pyramidal cells (triangle), somatostatin-positive interneurons (SST, circle), and parvalbumin-positive interneurons (PV). SSTs represent the mean prediction in the positive circuit and the stimulus in the negative circuit, and PVs represent the variance.
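The modulation of surprise by prior uncertainty in panel A can be sketched numerically. The function below is a hypothetical stand-in (not taken from the text) that scales the squared prediction error by the prior variance, so the same delay is more surprising under a narrow prior:

```python
def surprise(delay, sigma):
    """Hypothetical surprise signal: squared prediction error
    scaled down by the prior variance sigma**2."""
    return delay**2 / sigma**2

# The same 5-minute delay under two priors over arrival times:
wide = surprise(5.0, sigma=10.0)   # unreliable bus: wide prior, little surprise
narrow = surprise(5.0, sigma=1.0)  # punctual bus: narrow prior, large surprise
assert narrow > wide
```

The wide-prior commuter barely registers the delay, while the narrow-prior commuter receives a large, learnable error signal.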

SSTs learn to represent the mean context-dependently.

A: Illustration of the changes in the positive prediction error circuit. Thicker lines denote stronger weights. B: Two different tones (red, orange) are associated with two somatosensory stimulus distributions with different means (red: high, orange: low). C: SST firing rates (mean and std) during stimulus input. D: SST firing rates over time for low (orange) and high (red) stimulus means. E: Weights (mean and std) from sound a to SST for different values of µ. F: SST firing rates (mean and std) for different values of µ. Mean and std were computed over 1000 data points from timestep 9000 to 10000.
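How a weight from the sound cue could come to represent the stimulus mean can be sketched with a simple delta rule; this is an illustrative stand-in (the exact plasticity rule is given in the Methods), with all parameter values assumed:

```python
import random

random.seed(0)
eta, a = 0.01, 1.0      # illustrative learning rate; sound cue active (a = 1)
mu, sigma = 3.0, 0.5    # whisker stimulus distribution for this context
w = 0.0                 # weight from sound a to SST

for _ in range(20000):
    s = random.gauss(mu, sigma)   # whisker stimulus sample
    w += eta * a * (s - w * a)    # delta rule: fixed point w * a = E[s] = mu

r_sst = w * a                     # SST rate given the sound cue alone
assert abs(r_sst - mu) < 0.2
```

Because the cue a gates the update, each tone learns the mean of its own stimulus distribution, giving the context dependence shown in panels E and F.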

PVs learn to estimate the variance context-dependently.

A: Illustration of the changes in the positive prediction error circuit. Thicker lines denote stronger weights. B: Two different tones (purple, green) are associated with two somatosensory stimulus distributions with different variances (purple: high, green: low). C: Weights from sound a to PV over time for two different values of stimulus variance (high: σ = 0.8 (purple), low: σ = 0.4 (green)). D: PV firing rates over time given sound input (without whisker stimulus input) for low (green) and high (purple) stimulus variance. E: PV firing rates (mean and std) given sound input and whisker stimuli for low and high stimulus variance. F: PV firing rates (mean and std) during sound and stimulus input. G: Weights (mean and std) from sound a to PV for different values of σ. H: PV firing rates (mean and std) given sound input for different values of σ². Mean and std were computed from 150000 data points from timestep 450000 to 600000.
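How the cue-to-PV weight could converge to σ (and the quadratic PV rate to σ²) can be sketched with a rule that drives the squared output toward the squared deviation from the already-learned mean. This is an illustrative rule consistent with the limits stated above, not necessarily the paper's exact one; parameters are assumed:

```python
import random

random.seed(1)
eta, a = 0.005, 1.0
mu, sigma = 5.0, 0.8    # high-variance context (purple)
w = 0.1                 # weight from sound a to PV

for _ in range(100000):
    s = random.gauss(mu, sigma)
    # drive the squared output toward the squared deviation from the mean
    w += eta * a * ((s - mu)**2 - (w * a)**2)

r_pv = (w * a)**2       # quadratic PV activation -> approaches sigma**2
assert abs(w - sigma) < 0.1
assert abs(r_pv - sigma**2) < 0.15
```

At the fixed point the average update is zero, so (w·a)² matches E[(s − µ)²] = σ², which is the pattern in panels G and H.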

Calculation of the UPE in layer 2/3 error neurons

A: Illustration of the negative prediction error circuit. B: Illustration of the positive prediction error circuit. C: Illustration of an error neuron with a nonlinear integration of inputs (k = 2). D: Firing rate of the error neuron in the negative prediction error circuit (UPE−) as a function of σ for two values of |s − µ| after learning µ and σ. E: Rates of both UPE+ and UPE− error neurons as a function of the difference between the stimulus and the mean (s − µ). F: Firing rate of the error neuron in the positive prediction error circuit (UPE+) as a function of σ for two values of |s − µ| after learning µ and σ. G: Illustration of an error neuron with a nonlinear integration of inputs (k = 2.5). H-J: Same as D-F for error neurons with k = 2.5.
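The uncertainty-modulated error in panels D-F can be sketched as a rectified, nonlinearly integrated mismatch divided by PV activity (which approximates σ²). The function below is an illustrative reading of this computation, with the exponent k and the divisive form assumed:

```python
def upe(s, mu, sigma, k=2.0, sign=+1):
    """Illustrative UPE: rectified mismatch raised to the power k,
    divided by PV activity, which approximates sigma**2."""
    err = max(sign * (s - mu), 0.0)   # + circuit: s - mu; - circuit: mu - s
    return err**k / sigma**2

# The same mismatch elicits a smaller error under higher uncertainty:
assert upe(6.0, 5.0, sigma=0.5) > upe(6.0, 5.0, sigma=1.0)
# The negative circuit responds only when the stimulus is smaller than expected:
assert upe(4.0, 5.0, sigma=0.5, sign=-1) > 0.0
assert upe(6.0, 5.0, sigma=0.5, sign=-1) == 0.0
```

The rectification splits the error into the two pathways (UPE+ and UPE−), and the division by σ² produces the downward slopes with σ seen in D and F.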

Learning the mean representation with UPEs

A: Illustration of the circuit. A representation neuron (turquoise) receives input from both positive and negative prediction error circuits (UPE+ and UPE−) and projects back to them. The UPE− has a negative impact on the firing rate of the representation neuron (rR). A weight wR,a from the higher-level representation of the context given by sound a is learned. B: Weights wR,a over time for different values of µ (µ ∈ [1, 3, 5]). C: R firing rates given sound input for different values of µ (mean and std over 50000 data points from timestep 50000 to 100000, the end of the simulation). D: Activity of the different cell types (PV: light green, R: turquoise, UPE: black) and whisker stimulus samples (grey dots) over time, while learning the mean representation with PVs (light green) initially reflecting the MSE, which is compensated by the nonlinear activation of L2/3 neurons (black). The evolution of the mean rate of neuron R (turquoise) is similar to the perfect case in E. E: Same colour code as in D, but learning the mean representation assuming PVs (light green) perfectly represent the variance. Inset shows a comparison to D.
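The loop in panel A can be sketched as a toy simulation, assuming the PVs already represent σ² and using illustrative rates and learning parameters: the representation rate r is pushed up by UPE+ and down by UPE−, settling near the stimulus mean:

```python
import random

random.seed(2)
eta = 0.01
mu, sigma = 5.0, 0.5
r = 0.0                                        # representation neuron rate

for _ in range(20000):
    s = random.gauss(mu, sigma)
    upe_pos = max(s - r, 0.0)**2 / sigma**2    # PV assumed to supply sigma**2
    upe_neg = max(r - s, 0.0)**2 / sigma**2
    r += eta * (upe_pos - upe_neg)             # UPE+ excites, UPE- inhibits

assert abs(r - mu) < 0.3
```

At convergence the two error pathways balance on average, so r fluctuates around µ, as in panel E.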

Cell-type specific experimentally testable predictions

A: Illustration of the two experienced stimulus distributions with different variances that are associated with two different sounds (green, purple). The presented mismatch (MM) stimulus (black) is larger than expected (positive prediction error). B-F: Simulated firing rates of different cell types to positive prediction errors when a sound associated with high (purple) or low (green) uncertainty is presented. G: As in A. The presented mismatch stimulus (grey) is smaller than expected (negative prediction error). H-L: Firing rates of different cell types to the negative mismatch when a sound associated with high (purple) or low (green) uncertainty is presented. Because the firing rate predictions are equal for PV+ and PV−, we only show the results for PV+ in the figure.

Effective learning rate is automatically adjusted with UPEs

A, B: Firing rate over time of the representation neuron in a circuit with uncertainty-modulated prediction errors (gold) and in a circuit with unmodulated errors (black), in a low-uncertainty setting (A) and a high-uncertainty setting (B). C: Standard deviation of the firing rate of the representation neuron in the low-uncertainty setting (inset has a different scale; outer axis scale matches the one in D). D: Standard deviation of the firing rate of the representation neuron in the high-uncertainty setting. E: Standard deviation of the firing rate rR as a function of the standard deviation of the presented stimulus distribution σs. Standard deviations were computed over 100000 data points from timestep 100000 to 200000.
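The automatic adjustment of the effective learning rate can be illustrated with a linear toy model in which the error is simply divided by σ² (a stand-in for the circuit's divisive PV modulation; all parameters are illustrative). In a high-uncertainty context this damps the fluctuations of the representation rate, as in panels B and D:

```python
import random
import statistics

def run(sigma, modulate, eta=0.1, n=20000, seed=3):
    """Track a fixed mean mu = 5 and return the steady-state std of r."""
    rng = random.Random(seed)
    mu, r, trace = 5.0, 5.0, []
    for _ in range(n):
        s = rng.gauss(mu, sigma)
        err = s - r
        if modulate:
            err /= sigma**2       # uncertainty-modulated error
        r += eta * err
        trace.append(r)
    return statistics.stdev(trace[n // 2:])

# In a high-uncertainty context, modulation damps the fluctuations of r:
assert run(2.0, modulate=True) < run(2.0, modulate=False)
```

Dividing by σ² shrinks the effective learning rate exactly when samples are noisy, so the representation stops chasing individual noisy stimuli.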

UPEs ensure uncertainty-based weighting of prior and sensory information.

A: Illustration of the hierarchical two-area model. A representation neuron (r) in area ℳ receives positive and negative UPEs from the area ℳ − 1 below (sensory prediction error, as in Fig. 5), and positive and negative UPEs from the same area (representation prediction error) with different signs, see Eq. 5. In this example, the uncertainty in area ℳ − 1 corresponds to the sensory uncertainty, and the uncertainty in area ℳ, the area above, corresponds to the prior uncertainty. Both uncertainties are represented by PV activity in the respective area. B, C: The simulation results from the hierarchical circuit model yield an uncertainty-based weighting of prior and sensory information as in Eq. 4. B: The activity r of the representation neuron as a function of the stimulus s for different amounts of prior uncertainty, with the sensory uncertainty fixed. The histograms show the distribution of the sensory stimulus samples (grey) and the distribution of the activity of the representation neuron (green). C: The activity of the representation neuron as a function of the stimulus for different amounts of sensory uncertainty, with the prior uncertainty fixed.
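The uncertainty-based weighting that the hierarchical circuit approximates (Eq. 4) is the standard precision-weighted combination of prior and likelihood; a small sketch with assumed symbol names:

```python
def combine(s, mu_prior, sigma_s, sigma_prior):
    """Precision-weighted combination of prior and sensory evidence."""
    w_s = sigma_prior**2 / (sigma_s**2 + sigma_prior**2)
    return w_s * s + (1.0 - w_s) * mu_prior

# High sensory uncertainty pulls the estimate toward the prior mean (5.0):
assert abs(combine(6.0, 5.0, sigma_s=2.0, sigma_prior=0.5) - 5.0) < \
       abs(combine(6.0, 5.0, sigma_s=0.5, sigma_prior=2.0) - 5.0)
```

This reproduces the qualitative behaviour of panels B and C: the less reliable source, prior or senses, receives the smaller weight.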

Parameters of the network.

For small β, the weight from a to PV approaches σ and the PV firing rate approaches σ².
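A hedged reconstruction of the fixed-point argument behind this limit, assuming a plasticity rule that nulls the gap between the squared deviation and the squared drive (the paper's exact rule and the role of β are given in the Methods):

```latex
\Delta w_{\mathrm{PV},a} \propto a\left[(s-\mu)^2 - (w_{\mathrm{PV},a}\,a)^2\right],
\qquad
\mathbb{E}\left[\Delta w_{\mathrm{PV},a}\right] = 0
\;\Rightarrow\;
(w_{\mathrm{PV},a}\,a)^2 = \mathbb{E}\!\left[(s-\mu)^2\right] = \sigma^2 .
```

With a = 1 this gives w → σ, and with the quadratic PV activation the firing rate (w a)² → σ².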

Additional Parameters of the hierarchical network.

Inputs.

Parameters of the plasticity rules.

Parameters of simulations in Figs. 2-5.

Parameters of the simulation in Fig. 6.

Learning the variance in the positive prediction error circuit with PVs with a power activation function (exponent = 3.0). A and B are analogous to Fig. 3G and H, and the circuit is the same except that the activation function of the PVs (φPV(x)) has an exponent of 3.0 instead of 2.0. C and D are zoomed-out versions of A and B.

With inhibitory plasticity, weights from SST to PV can be learned. This figure shows that the weight from SST to PV (wPV,SST) is equal to the weight from s to PV (wPV,s). The inhibitory plasticity rule is described in the Supplementary Methods.

PVs learn to represent the variance given an associative cue in the negative prediction error circuit.

A: Illustration of the changes in the negative prediction error circuit. Thicker lines denote stronger weights. B: Two different tones (purple, green) are associated with two somatosensory stimulus distributions with different variances (purple: high, green: low). C: Weights from sound a to PV over time for two different values of stimulus variance (high: σ = 0.8 (purple), low: σ = 0.4 (green)). D: PV firing rates over time given sound input (without stimulus input) for low (green) and high (purple) stimulus variance. E: PV firing rates (mean and std) given sound input for low and high stimulus variance. F: PV firing rates (mean and std) during sound and stimulus input. G: Weights from sound a to PV for different values of σ (mean and std). H: PV firing rates given sound input for different values of σ² (mean and std).

Learning the weights from the SSTs to the UPE neurons. This figure shows that the weights from the SSTs to the UPEs in both the positive (left) and the negative (right) prediction error circuit can be learned with inhibitory plasticity to match the weights from the stimulus representation s to the UPEs. The inhibitory plasticity rule is described in the Supplementary Methods.

PV firing rates are proportional to the variance in the recurrent circuit model. Weights from a to PV as a function of σ in the positive (A) and negative (C) prediction error subcircuit. PV firing rates as a function of σ² in the positive (B) and negative (D) prediction error circuit.