(A) Our simple example environment consists of a linear track with 4 states (S1 to S4) and the animal always moves from left to right — i.e. one epoch consists of starting in S1 and ending in S4. (B)…
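To make the setup concrete, here is a minimal sketch (not taken from the paper; the discount value and matrix formulation are illustrative assumptions) of the deterministic left-to-right track and the successor representation such a policy induces:

```python
import numpy as np

# Minimal sketch of the 4-state track with a deterministic left-to-right policy.
# The discount value gamma is an illustrative assumption, not the paper's parameter.
n_states = 4
T = np.zeros((n_states, n_states))
for s in range(n_states - 1):
    T[s, s + 1] = 1.0                      # S1 -> S2 -> S3 -> S4, S4 ends the epoch

gamma = 0.9
# Successor representation induced by this policy: M = (I - gamma * T)^(-1)
M = np.linalg.inv(np.eye(n_states) - gamma * T)
print(M)  # row i: expected discounted future occupancy of each state, starting from S(i+1)
```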
(A) Our two-dimensional environment contains 16 states. The 11th state is assumed to be an inaccessible obstacle. A random policy guides the trajectories, while the starting state is always …
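A small sketch of this environment, assuming 0-indexed states so that the 11th state has index 10; the start state and trajectory length below are placeholders, since they are not specified here:

```python
import numpy as np

# Sketch of the 4x4 gridworld: 16 states, the 11th state (index 10 with 0-indexing)
# is an inaccessible obstacle. Start state and trajectory length are placeholders.
rng = np.random.default_rng(0)
n_side, obstacle, start = 4, 10, 0

def accessible_neighbors(s):
    r, c = divmod(s, n_side)
    out = []
    for rr, cc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
        if 0 <= rr < n_side and 0 <= cc < n_side and rr * n_side + cc != obstacle:
            out.append(rr * n_side + cc)
    return out

# Random policy: move uniformly at random to an accessible neighboring state.
state, trajectory = start, [start]
for _ in range(20):
    state = int(rng.choice(accessible_neighbors(state)))
    trajectory.append(state)
print(trajectory)
```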
Evolution of synaptic weights from CA3 state 2 to CA1 state 3 over time. Ten simulations of the linear track task (Figure 2) are performed, where the CA3-CA1 synaptic weights are randomly …
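The logic of this demonstration, convergence of one weight to the same value from different random initializations, can be illustrated with a tabular stand-in for the spiking CA3-CA1 model; the update rule and all parameter values below are assumptions for illustration only:

```python
import numpy as np

# Sketch: ten runs of a tabular TD update on the 4-state track, each started
# from random weights, tracking the entry from state 2 to state 3.
rng = np.random.default_rng(4)
gamma, alpha, n_sims, n_epochs = 0.9, 0.1, 10, 200
traces = np.zeros((n_sims, n_epochs))
for sim in range(n_sims):
    M = rng.uniform(0.0, 1.0, size=(4, 4))        # random initial "synaptic weights"
    for epoch in range(n_epochs):
        for s, s_next in zip([0, 1, 2], [1, 2, 3]):
            M[s] += alpha * (np.eye(4)[s] + gamma * M[s_next] - M[s])
        M[3] += alpha * (np.eye(4)[3] - M[3])     # terminal state predicts only itself
        traces[sim, epoch] = M[1, 2]              # weight from state 2 (CA3) to state 3 (CA1)
print(traces[:, -1])                              # all runs converge to ~gamma
```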
(A-top) Learning during behavior corresponds to TD(λ). States are traversed on timescales larger than the plasticity timescales and place cells use a rate code. (A-middle) Comparison of the …
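For reference, the standard tabular TD(λ) update of the successor representation looks like the sketch below; the eligibility trace is what lets states visited earlier in the epoch be credited by a rule that only acts locally in time. The parameter values are illustrative, and this is the textbook formulation, not the spiking implementation.

```python
import numpy as np

# Tabular TD(lambda) update of the successor representation M over one epoch.
# gamma, lam and alpha are illustrative choices.
def td_lambda_sr_epoch(states, M, gamma=0.9, lam=0.5, alpha=0.1):
    n = M.shape[0]
    e = np.zeros(n)                                       # eligibility trace over predecessor states
    for s, s_next in zip(states[:-1], states[1:]):
        e = gamma * lam * e
        e[s] += 1.0
        delta = np.eye(n)[s] + gamma * M[s_next] - M[s]   # vector TD error, one entry per future state
        M += alpha * np.outer(e, delta)                   # credit all recently visited states
    return M

M = np.zeros((4, 4))
for _ in range(100):
    M = td_lambda_sr_epoch([0, 1, 2, 3], M)               # left-to-right epochs on the track
print(M[0])
```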
The discount parameter when we vary it by: (a) increasing the duration of the presynaptic current, leading to a hyperbolic discount, and (b) increasing , while keeping fixed, leading to …
(A) In our model, the network can learn relationships between neurons that are active seconds apart, while the plasticity rule acts on a millisecond timescale. (B) Due to transitions between …
(A) The agent follows a stochastic policy starting from the initial state (denoted by START). The probability of moving to either neighboring state is 50%. An epoch stops when the agent reaches a terminal …
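A minimal sketch of such an unbiased random walk; the number of states, the start state and the terminal positions below are placeholders, not values taken from the figure:

```python
import numpy as np

# Sketch of the unbiased random walk: 50% probability of moving to either
# neighboring state, and an epoch ends at a terminal state.
rng = np.random.default_rng(1)
n_states, start = 7, 3                      # terminals assumed at both ends (placeholders)
state, epoch = start, [start]
while state not in (0, n_states - 1):
    state += int(rng.choice([-1, 1]))       # 50% left, 50% right
    epoch.append(state)
print(epoch)
```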
Unlike Figure 4, where the likelihood of replays decays exponentially over time, here we simulate an equal probability of learning from replays (MC) and from behavioral experience (TD) …
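The mixing scheme can be sketched as below, with a 50/50 choice per epoch between a replay-style Monte-Carlo update and a behavior-style TD update of the successor representation; both rules are standard tabular stand-ins (gamma and alpha are illustrative), not the paper's spiking implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def td_update(M, states, gamma=0.9, alpha=0.1):
    # behavioral experience: one-step bootstrapped (TD) update of the SR
    n = M.shape[0]
    for s, s_next in zip(states[:-1], states[1:]):
        M[s] += alpha * (np.eye(n)[s] + gamma * M[s_next] - M[s])
    return M

def mc_update(M, states, gamma=0.9, alpha=0.1):
    # replay: Monte-Carlo update towards the observed discounted future occupancy
    n = M.shape[0]
    for t, s in enumerate(states):
        target = np.zeros(n)
        for k, s_future in enumerate(states[t:]):
            target[s_future] += gamma ** k
        M[s] += alpha * (target - M[s])
    return M

M, trajectory = np.zeros((4, 4)), [0, 1, 2, 3]
for _ in range(100):
    # equal probability of a replay-based (MC) or behavior-based (TD) update
    M = mc_update(M, trajectory) if rng.random() < 0.5 else td_update(M, trajectory)
```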
Using the settings of Figure 4, we first calculate the variance caused by random spiking for the behavioral model. For this purpose, we simulate the same trajectory of the agent 25 times with …
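The structure of this estimate, repeating an identical trajectory with independent spiking noise and taking the variance across runs, is sketched below; the Poisson spike counts and the weight proxy are purely illustrative stand-ins for the spiking model.

```python
import numpy as np

# Sketch of the across-run variance estimate: the same trajectory is repeated
# 25 times with independent random spiking, and the variance of the resulting
# quantity is taken across runs.
rng = np.random.default_rng(3)
n_runs, rate, duration = 25, 0.1, 60.0            # illustrative spikes/ms and ms per state

outcomes = []
for _ in range(n_runs):
    pre = rng.poisson(rate * duration)            # identical trajectory,
    post = rng.poisson(rate * duration)           # independent spiking noise
    outcomes.append(pre * post)                   # toy proxy for a Hebbian weight change
print(np.var(outcomes))                           # variance caused by random spiking
```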
(A–C) Data from Wu et al., 2017. (A) Experimental protocol: the animal is first allowed to freely run in the track (Pre). In the next trial, a footshock is given in the shock zone (SZ). In …
In Figure 5, the policy without replays has fewer SR updates. Here, we simulated this policy but doubled the time (and thus the number of SR updates). Even with this modification, the agent keeps exploring the dark …
(A) A linear track with 4 states is simulated as in Figure 1, where we now assume that state 4 contains a reward of value 1. This reward is assumed to be encoded in the synaptic weights from the …
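For this configuration, the standard successor-representation value readout V = M R gives values that increase toward the rewarded state; the sketch below uses an illustrative discount, not the paper's parameter.

```python
import numpy as np

# Sketch of the standard SR value readout V = M @ R for this track.
gamma = 0.9
T = np.zeros((4, 4))
T[0, 1] = T[1, 2] = T[2, 3] = 1.0             # deterministic left-to-right policy
M = np.linalg.inv(np.eye(4) - gamma * T)      # successor representation
R = np.array([0.0, 0.0, 0.0, 1.0])            # reward of value 1 in state 4
V = M @ R                                     # state values increase toward the reward
print(V)                                      # [gamma**3, gamma**2, gamma, 1]
```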
(A) The variable increases with the ratio and decreases with T. (B) The variable decreases with the ratio and decreases with T. (C) The input to CA1 increases with the ratio. …
A population of readout neurons is simulated as in Figure 5—figure supplement 2. Each row corresponds to a different choice of parameters and , chosen such that the variable has a value …
| Parameter | Value |
| --- | --- |
|  | 1 |
|  | 0.1 ms⁻¹ |
|  | 2 ms |
|  | 1 |
|  | 1 |
|  | 1 |
| stepsize | 0.01 ms |
|  | 0.003 |
|  | 60 ms |
|  | 1 ms⁻¹ |