Learning predictive cognitive maps with spiking neurons during behavior and replays

  1. Jacopo Bono
  2. Sara Zannone
  3. Victor Pedrosa
  4. Claudia Clopath (corresponding author)
  1. Department of Bioengineering, Imperial College London, United Kingdom
5 figures, 1 table and 1 additional file

Figures

Figure 1 with 2 supplements
Successor representation and neuronal network.

(A) Our simple example environment consists of a linear track with 4 states (S1 to S4) and the animal always moves from left to right — i.e. one epoch consists of starting in S1 and ending in S4. (B)…
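For a deterministic left-to-right walk like this, the successor representation (SR) has a simple closed form, M = (I − γT)⁻¹. The sketch below illustrates it in plain NumPy; the discount γ = 0.9 is an assumed value for illustration, not a parameter taken from the paper.

```python
import numpy as np

# Illustrative sketch: closed-form SR for the 4-state linear track of Figure 1A.
# gamma = 0.9 is an assumed discount, not the paper's value.
gamma = 0.9

# Deterministic transitions S1 -> S2 -> S3 -> S4; the epoch ends at S4,
# so S4 has no outgoing transitions.
T = np.zeros((4, 4))
T[0, 1] = T[1, 2] = T[2, 3] = 1.0

# SR entry M[i, j]: expected discounted future occupancy of state j starting in state i.
M = np.linalg.inv(np.eye(4) - gamma * T)
print(np.round(M, 3))   # row for S1: [1, 0.9, 0.81, 0.729]
```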

Figure 1—figure supplement 1
Learning the SR in a two-dimensional environment.

(A) Our two-dimensional environment contains 16 states. The 11th state is assumed to be an inaccessible obstacle. A random policy guides the trajectories, while the starting state is always …
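A hedged sketch of how such an environment could be set up: assuming a 4×4 row-major grid, a wall at the 11th state (index 10), and a uniform random policy over the accessible neighbours, the SR under that policy follows from the induced transition matrix. The grid layout and γ below are illustrative assumptions.

```python
import numpy as np

gamma, side = 0.9, 4          # assumed discount and assumed 4x4 row-major layout
obstacle, n = 10, 16          # 11th state (0-indexed 10) is inaccessible

T = np.zeros((n, n))
for s in range(n):
    if s == obstacle:
        continue
    r, c = divmod(s, side)
    # accessible up/down/left/right neighbours under a uniform random policy
    nbrs = [rr * side + cc
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if 0 <= rr < side and 0 <= cc < side and rr * side + cc != obstacle]
    T[s, nbrs] = 1.0 / len(nbrs)

M = np.linalg.inv(np.eye(n) - gamma * T)   # SR under the random policy
```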

Figure 1—figure supplement 2
The equivalence with TD(λ) guarantees convergence even with random initial synaptic weights.

Evolution of synaptic weights from CA3 state 2 to CA1 state 3 over time. Ten simulations of the linear track task (Figure 2) are performed, where the CA3-CA1 synaptic weights are randomly …

Figure 2 with 1 supplement
Comparison between TD(λ) and our spiking model.

(A-top) Learning during behavior corresponds to TD(λ ≈ 0). States are traversed on timescales larger than the plasticity timescales and place cells use a rate-code. (A-middle) Comparison of the …
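As a reference point for this comparison, a minimal tabular TD(λ) learner of the SR with accumulating eligibility traces might look as follows; the learning rate, γ, and λ are illustrative assumptions and terminal-state handling is simplified.

```python
import numpy as np

def td_lambda_sr(episodes, n_states, gamma=0.9, lam=0.5, alpha=0.1):
    """Tabular TD(lambda) for the successor representation M.

    `episodes` is a list of visited-state sequences; gamma, lam and alpha
    are illustrative values, not the paper's parameters.
    """
    M = np.eye(n_states)                       # SR initialised to the identity
    for states in episodes:
        e = np.zeros(n_states)                 # accumulating eligibility trace
        for s, s_next in zip(states[:-1], states[1:]):
            e *= gamma * lam
            e[s] += 1.0
            onehot = np.zeros(n_states)
            onehot[s] = 1.0
            # SR-TD error: delta_j = 1[s = j] + gamma * M[s', j] - M[s, j]
            delta = onehot + gamma * M[s_next] - M[s]
            M += alpha * np.outer(e, delta)
    return M
```

With λ close to 0 the update bootstraps only from the next state (the behavioral regime), while λ close to 1 approaches a Monte-Carlo-style update (the replay-like regime).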

Figure 2—figure supplement 1
Comparison of the exact and approximate equations for the parameters.

The discount parameter γ when we vary T by: (a) increasing the duration of the presynaptic current θ, leading to a hyperbolic discount, and (b) increasing ψ while keeping θ fixed, leading to …

Figure 3
Learning on behavioral timescales and state-dependent discounting.

(A) In our model, the network can learn relationships between neurons that are active seconds apart, while the plasticity rule acts on a millisecond timescale. (B) Due to transitions between …

Figure 4 with 2 supplements
Replays can be used to control the bias-variance trade-off.

(A) The agent follows a stochastic policy starting from the initial state (denoted by START). The probability of moving to either neighboring state is 50%. An epoch stops when reaching a terminal …
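For concreteness, one epoch of this unbiased random walk could be generated as below; the number of states and the starting position are illustrative assumptions, not the values used in the figure.

```python
import random

def random_walk_epoch(n_states=7, start=3):
    """One epoch: start in the middle, move left or right with probability 50%,
    stop at either terminal end state (sizes are illustrative)."""
    s, trajectory = start, [start]
    while 0 < s < n_states - 1:
        s += random.choice((-1, 1))
        trajectory.append(s)
    return trajectory
```

Learning online from such an epoch behaves like a low-λ TD estimate (more bias, less variance), whereas replaying the stored epoch end-to-end behaves like a high-λ, Monte-Carlo-like estimate (less bias, more variance), which is the trade-off the figure explores.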

Figure 4—figure supplement 1
Combining equal amounts of replays and behavioral learning.

Unlike Figure 4, where the likelihood of replays decays exponentially over time, here we simulate equal probability for learning using replays (MC) and using behavioral experience (TD) …
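A small sketch of the scheduling difference relative to Figure 4; the decay constant below is an assumed value for illustration.

```python
import numpy as np

epochs = np.arange(100)
p_replay_fig4 = np.exp(-epochs / 20.0)       # Figure 4: replay probability decays exponentially (decay constant assumed)
p_replay_here = np.full(epochs.shape, 0.5)   # this supplement: replays (MC) and behavior (TD) equally likely
```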

Figure 4—figure supplement 2
Setting the noise for replays.

Using the settings of Figure 4, we first calculate the variance caused by random spiking for the behavioral model. For this purpose, we simulate the same trajectory of the agent 25 times with …
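A sketch of that variance estimate, assuming a hypothetical `run_trajectory` function that re-simulates the identical trajectory with fresh spiking noise and returns the learned CA3-to-CA1 weight matrix.

```python
import numpy as np

def spiking_noise_variance(run_trajectory, n_repeats=25):
    """Across-run variance of the learned weights when the same trajectory
    is simulated n_repeats times with independent random spiking."""
    weights = np.stack([run_trajectory() for _ in range(n_repeats)])
    return weights.var(axis=0)
```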

Figure 5 with 4 supplements
Reproducing place-avoidance experiments with the spiking model.

(A–C) Data from Wu et al., 2017. (A) Experimental protocol: the animal is first allowed to freely run in the track (Pre). In the next trial, a footshock is given in the shock zone (SZ). In …

Figure 5—figure supplement 1
Doubling the time-steps in the scenario without replays.

In Figure 5, the policy without replays has fewer SR updates. Here, we simulated this policy but doubled the time (and SR updates). Even with this modification, the agent keeps exploring the dark …

Figure 5—figure supplement 2
The value of states can be read out by downstream neurons.

(A) A linear track with 4 states is simulated as in Figure 1, where we now assume that state 4 contains a reward with value 1. This reward is assumed to be encoded in the synaptic weights from the …
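In SR terms this readout is the matrix-vector product V = M r; a minimal numerical sketch for the 4-state track with a reward of 1 in state 4 (the discount γ = 0.9 is an assumed value).

```python
import numpy as np

gamma = 0.9                                 # assumed discount
T = np.zeros((4, 4))
T[0, 1] = T[1, 2] = T[2, 3] = 1.0           # deterministic walk, epoch ends at S4
M = np.linalg.inv(np.eye(4) - gamma * T)    # successor representation
r = np.array([0.0, 0.0, 0.0, 1.0])          # reward of 1 in state 4
V = M @ r                                   # state values: [0.729, 0.81, 0.9, 1.0]
print(V)
```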

Figure 5—figure supplement 3
Dependency of γ, λ and the place-tuned input to CA1 on θ/T, for various values of T and depression amplitude Apre.

(A) The γ variable increases with the ratio θ/T and decreases with T. (B) The λ variable decreases with the ratio θ/T and decreases with T. (C) The ρbias input to CA1 increases with the ratio θ/T.

Figure 5—figure supplement 4
Readout of the state value for various parameters.

A population of readout neurons is simulated as in Figure 5—figure supplement 2. Each row corresponds to a different choice of parameters θ/T and Apre, chosen such that the γ variable has a value …

Tables

Table 1
Parameters used for the spiking network.
Parameter   Value
ϵ0          1
ρpre        0.1 ms⁻¹
τm          2 ms
Npost       1
Npretot     1
Npre        1
step size   0.01 ms
ηstdp       0.003
τLTP        60 ms
ALTP        1 ms⁻¹
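For reference, the same parameters could be gathered into a single configuration object for a simulation script; the variable names below are illustrative, while the values are those listed in Table 1.

```python
# Table 1 parameters as one configuration dict (names are illustrative).
params = {
    "eps0": 1.0,          # ϵ0
    "rho_pre": 0.1,       # ρpre, in ms^-1
    "tau_m": 2.0,         # τm, in ms
    "N_post": 1,
    "N_pre_tot": 1,
    "N_pre": 1,
    "dt": 0.01,           # step size, in ms
    "eta_stdp": 0.003,    # ηstdp
    "tau_LTP": 60.0,      # τLTP, in ms
    "A_LTP": 1.0,         # ALTP, in ms^-1
}
```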

Additional files
