Recurrent network model for learning goal-directed sequences through reverse replay
Abstract
Reverse replay of hippocampal place cells occurs frequently at rewarded locations, suggesting its contribution to goal-directed path learning. Symmetric spike-timing-dependent plasticity (STDP) in CA3 likely potentiates recurrent synapses for both forward (start to goal) and reverse (goal to start) replays during sequential activation of place cells. However, how reverse replay selectively strengthens the forward synaptic pathway is unclear. Here, we show computationally that firing sequences bias synaptic transmission toward the direction opposite to their propagation under symmetric STDP in the co-presence of short-term synaptic depression or afterdepolarization. We demonstrate that significant biases are created in biologically realistic simulation settings, and that this bias enables reverse replay to enhance goal-directed spatial memory on a W-maze. Further, we show that essentially the same mechanism works in a two-dimensional open field. Our model provides, for the first time, a mechanistic account of how reverse replay contributes to hippocampal sequence learning for reward-seeking spatial navigation.
https://doi.org/10.7554/eLife.34171.001
eLife digest
To find their way around, animals – including humans – rely on an area of the brain called the hippocampus. Studies in rodents have shown that certain neurons in the hippocampus called place cells become active when an animal passes through specific locations. At each position, a different set of place cells fires. A journey from A to B will thus be accompanied by a sequential activation of place cells corresponding to a particular point.
Rats can learn new routes to a given place. Every time they take a specific way, the connections between the activated place cells become stronger. After learning, the hippocampus replays the sequence of place cell activation both in the order the rat has experienced and backwards. This is known as reverse replay, which occurs more often when the animals find rewards at their destination. This suggests that reverse replay may help animals learn the routes to locations where food is available.
To test this idea, Haga and Fukai built a computer model that simulates the hippocampal activity seen in a rat running through a maze. In contrast to previous models, which featured only forward replay, the new simulation also included reverse replay. The results confirmed that reverse replay helps the rodents to learn routes to rewarded locations. It also enables the hippocampus to combine multiple past experiences, which may teach animals that a combination of previous paths will lead to a reward, even if they have never tried the combined route before.
The hippocampus has a central role in many different types of memory. The findings by Haga and Fukai may therefore provide a framework for studying the mechanisms of memory and decision-making. The results could even offer insight into the mechanisms of logical thinking. After all, the ability to combine multiple known paths into a new route bears some similarity to joining up thought processes such as ‘If Sophie oversleeps, she will miss the bus’ with ‘If Sophie misses the bus, she will be late for school’ to reason that ‘If Sophie oversleeps, she will be late for school’. Future studies should test whether reverse replay helps with this process of deduction.
https://doi.org/10.7554/eLife.34171.002
Introduction
The hippocampus plays an important role in episodic memory and spatial processing in the brain (O'Keefe and Dostrovsky, 1971; Scoville and Milner, 1957). Because an episode is a sequence of events, sequential neural activity has been extensively studied as the basis of hippocampal memory processing. In the rodent hippocampus, firing sequences of place cells are replayed during awake immobility and sleep (Carr et al., 2011; Pfeiffer, 2018) and these reactivations are crucial for performance in spatial memory tasks (Girardeau et al., 2009; Jadhav et al., 2012; Singer et al., 2013). Replay can be either in the same firing order as experienced (forward replay) or in the reversed order (reverse replay). Forward replay is observed during sleep after exploration (Lee and Wilson, 2002; Wikenheiser and Redish, 2013) or in immobile states before rats start to travel towards reward (Diba and Buzsáki, 2007; Pfeiffer and Foster, 2013), hence forward replay is thought to engage in the consolidation and retrieval of spatial memory. In contrast, reverse replay presumably contributes to the optimization of goal-directed paths because rewarded spatial paths are replayed around the timing of reward delivery (Diba and Buzsáki, 2007; Foster and Wilson, 2006), and the occurrence frequency is modulated by the presence and the amount of reward (Ambrose et al., 2016; Singer and Frank, 2009).
Because sequences are essentially time asymmetric, models of sequence learning often assume the asymmetric spike-timing-dependent plasticity (STDP) found in CA1 (Bi and Poo, 2001), which induces long-term potentiation (LTP) for pre-to-post firing order and long-term depression (LTD) for post-to-pre firing order. This type of STDP enables a recurrent network to reactivate sequential firing in the same order as the experience. However, in the hippocampal area CA3, it was recently reported that the default form of STDP is time symmetric at recurrent synapses (Mishra et al., 2016). Because CA3 is the most likely source of hippocampal firing sequences (Middleton and McHugh, 2016; Nakashiba et al., 2009), this finding raises the question of whether and how STDP underlies sequence learning in the hippocampus. A symmetric time window implies that a firing sequence equally strengthens both the forward synaptic pathways leading to the rewarded location and the reverse pathways leading away from the rewarded location in the CA3 recurrent network. However, reward-based optimization requires selective reinforcement of forward pathways, as this will strengthen prospective place-cell sequences in subsequent trials and forward replay events in the consolidation phase. Such bias toward forward sequences in post-experience sleep (Wikenheiser and Redish, 2013) and goal-directed behavior (Johnson and Redish, 2007; Pfeiffer and Foster, 2013; Wikenheiser and Redish, 2015) is actually observed in the hippocampus. How this directionality arises in replay events and how reverse replay enables the learning of goal-directed navigation remain unclear.
In this paper, we first show how goal-directed path learning is naturally realized through reverse replay in a one-dimensional chain model. To this end, we hypothesize that the contribution of presynaptic spiking to STDP is attenuated in CA3 by short-term depression, as was revealed in the rat visual cortex (Froemke et al., 2006). Under this condition, symmetric STDP and a rate-based Hebbian plasticity rule bias recurrent synaptic weights toward the direction opposite to the propagation of a firing sequence, implying that the combined rule virtually acts like anti-Hebbian STDP. We also show that accumulation of afterdepolarization (Mishra et al., 2016) in postsynaptic neurons results in the same effect. By simulating the model with various spiking patterns, we confirm this effect for a broad range of spiking patterns and parameters of plasticity rules, including those observed in experiments. Based on this mechanism, we built a two-dimensional recurrent network model of place cells with the combination of reverse replay, Hebbian plasticity with short-term plasticity, and reward-induced enhancement of replay frequency (Ambrose et al., 2016; Singer and Frank, 2009). We first demonstrate that the network model can learn forward pathways leading to reward on a W-shaped track. Further, we extend the role of reverse replay to unbiased sequence propagation from reward sites on a two-dimensional open field, which enables learning of goal-directed behavior in the open field. Unlike the previous models for hippocampal sequence learning (Blum and Abbott, 1996; Gerstner and Abbott, 1997; Jahnke et al., 2015; Jensen and Lisman, 1996; Sato and Yamaguchi, 2003; Tsodyks and Sejnowski, 1995), in which recurrent networks learn and strengthen forward sequences through forward movements, our model proposes goal-directed path learning through reverse sequences.
Results
Hebbian plasticity with short-term depression potentiates reverse synaptic transmissions
We first simulated a sequential firing pattern that propagates through a one-dimensional recurrent neural network, and evaluated weight changes by rate-based Hebbian plasticity rules. The network consisted of 500 rate neurons, which were connected with distance-dependent excitatory synaptic weights modulated by short-term synaptic plasticity (STP) (Romani and Tsodyks, 2015; Wang et al., 2015). In addition, the network had global inhibitory feedback to all neurons. A first external input to a neuron at one end (#0) elicited traveling waves of neural activity propagating to the opposite end (Figure 1A). Here, we regard these activity patterns as a model of hippocampal firing sequences (Romani and Tsodyks, 2015; Wang et al., 2015). A second external input to a neuron at the center of the network triggered firing sequences propagating in both directions (Figure 1A) because synaptic weights were symmetric (Figure 1B). Here, we implemented a standard Hebbian plasticity rule, which potentiated excitatory synaptic weights by the product of postsynaptic and presynaptic neural activities. During the propagation of the first unidirectional sequence, this rule potentiated synaptic weights symmetrically without creating any bias in the synaptic weights (Figure 1B).
However, when synaptic weights were changed by a modified Hebbian plasticity rule (see Materials and methods), in which long-term plasticity is also regulated by STP at presynaptic terminals (Froemke et al., 2006), the second firing sequence propagated only in the reverse direction of the first one (Figure 1C). This selective propagation occurred because the first firing sequence potentiated synaptic weights asymmetrically in the forward and reverse directions, thus creating a bias in the spatial distribution of synaptic weights (Figure 1D). This means that firing sequences strengthen the reverse synaptic transmissions more strongly than the forward ones in this model, and reverse sequences are more likely to be generated after forward sequences.
Why does this model generate such a bias to the reverse direction? To explain this, we show the packet of neural activity during the first firing sequence (at $t=300\ \mathrm{ms}$) in Figure 1E, top. We also plotted neurotransmitter release from the presynaptic terminal of each neuron (presynaptic outputs), which is determined by the product of presynaptic neural activity and the amount of available neurotransmitters in the combined Hebbian rule (Figure 1E, bottom). Because neurotransmitters are exhausted in the tail of the activity packet, presynaptic outputs are effective only at the head of the activity packet. Due to this spatial asymmetry of presynaptic outputs, connections from the head to the tail are strongly potentiated but those from the tail to the head are not (Figure 1F). This results in the biased potentiation of reverse synaptic transmissions in this model (Figure 1G). In other words, the packet of presynaptic outputs becomes somewhat ‘prospective’ (namely, slightly skewed toward the direction of activity propagation), so the weight changes based on the coincidences between presynaptic outputs and postsynaptic activities result in the selective potentiation of connections from the ‘future’ to the ‘past’ in the firing sequence. This mechanism was not known previously and enables forward sequences to potentiate the synaptic transmissions responsible for reverse replay, and vice versa.
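The argument above can be illustrated with a minimal Python sketch. It is not the network model of this paper: the activity packet is prescribed as a moving Gaussian rather than generated by recurrent dynamics, facilitation is ignored, and the function name `hebbian_bias` and all parameter values are assumptions chosen only for illustration. The sketch compares the Hebbian weight change of connections sent from a central neuron to neurons ahead of it (forward) and behind it (reverse), with and without depletion of presynaptic resources.

```python
import numpy as np

def hebbian_bias(use_stp, n=101, speed=0.1, sigma=5.0, U=0.6,
                 tau_std=500.0, gain=0.05, dt=1.0, t_max=1000.0):
    """Return (forward, reverse) Hebbian weight change for the central neuron.

    All names and parameter values are illustrative assumptions.
    Time is in ms, positions are neuron indices.
    """
    idx = np.arange(n)
    x = np.ones(n)            # fraction of available neurotransmitter
    dW = np.zeros((n, n))     # dW[post, pre]
    for step in range(int(t_max / dt)):
        t = step * dt
        # Prescribed Gaussian activity packet moving toward larger indices.
        r = np.exp(-(idx - speed * t) ** 2 / (2.0 * sigma ** 2))
        # Presynaptic output: activity times available resources.
        out = U * x * r if use_stp else r
        # Hebbian update: postsynaptic activity times presynaptic output.
        dW += dt * np.outer(r, out)
        if use_stp:
            # Short-term depression: recovery minus activity-driven depletion.
            x += dt * ((1.0 - x) / tau_std - U * gain * x * r)
    c = n // 2
    forward = dW[c + 1:c + 11, c].sum()   # from c to neurons ahead of it
    reverse = dW[c - 10:c, c].sum()       # from c to neurons behind it
    return forward, reverse

if __name__ == "__main__":
    print("plain Hebbian (forward, reverse):", hebbian_bias(False))
    print("with depression (forward, reverse):", hebbian_bias(True))
```

Under these assumptions the plain rule gives equal forward and reverse changes, whereas the depression-modulated rule favors the reverse connections, because the central neuron releases most of its transmitter while the packet is still arriving from behind.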
In the above simulations, STP contributed crucially to the asymmetric potentiation of synaptic connections between neurons. Meanwhile, a similar asymmetry, and hence the potentiation of reverse synaptic transmission, can also emerge when the effect of postsynaptic activity on Hebbian plasticity accumulates through time (Figure 1—figure supplement 1). Because STDP in CA3 depends on the afterdepolarization (ADP) in dendrites (Mishra et al., 2016), this phenomenon will occur if the effect of ADPs accumulates over multiple postsynaptic spikes, just as afterhyperpolarization modulates STDP in the visual cortex (Froemke et al., 2006). Because we can obtain essentially the same results for STP-dependent and ADP-dependent plasticity rules, we only consider the effect of STP in the rest of this paper.
Potentiation of reverse synaptic transmissions by STDP
The potentiation of reverse synaptic transmissions also occurs robustly in spiking neurons with STDP. To show this, we constructed a one-dimensional recurrent network of Izhikevich neurons (Izhikevich, 2003b, 2004) connected via conductance-based AMPA and NMDA synaptic currents (see Materials and methods). We chose the Izhikevich model because its frequency adaptation induces instability and enhances the generation of moving activity bumps. However, we note that the learning mechanism itself does not depend on a specific model of spiking neurons. Initial synaptic weights, STP, and global inhibitory feedback were similar to those used in the rate neuron model. We tested two types of STDP: asymmetric STDP, in which pre-to-post firing leads to potentiation and post-to-pre firing leads to depression, and symmetric STDP, in which both firing orders result in potentiation if two spikes are temporally close or depression if two spikes are temporally distant. Experimental evidence suggests that recurrent synapses in hippocampal CA3 undergo symmetric STDP (Mishra et al., 2016).
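To make the two window shapes concrete, the following Python sketch gives one possible parameterization of each. The functional forms (two exponentials for the asymmetric window, a difference of two Gaussians for the symmetric window), the function names, and every amplitude and time constant are assumptions for illustration, not the forms or values used in the simulations. The spike-time difference is taken as postsynaptic minus presynaptic time, in ms.

```python
import numpy as np

def asymmetric_stdp(delta_t, a_plus=1.0, a_minus=0.5, tau=20.0):
    """Asymmetric window: potentiation for pre-to-post (delta_t >= 0),
    depression for post-to-pre (delta_t < 0). Values are illustrative;
    treating delta_t == 0 as potentiation is an arbitrary choice."""
    delta_t = np.asarray(delta_t, dtype=float)
    decay = np.exp(-np.abs(delta_t) / tau)
    return np.where(delta_t >= 0, a_plus * decay, -a_minus * decay)

def symmetric_stdp(delta_t, a_pot=1.0, s_pot=20.0, a_dep=0.3, s_dep=80.0):
    """Symmetric window: potentiation for temporally close spike pairs and
    depression for distant pairs, regardless of firing order. Modeled here
    as a narrow positive Gaussian minus a broad negative one (assumption)."""
    delta_t = np.asarray(delta_t, dtype=float)
    pot = a_pot * np.exp(-delta_t ** 2 / (2.0 * s_pot ** 2))
    dep = a_dep * np.exp(-delta_t ** 2 / (2.0 * s_dep ** 2))
    return pot - dep

if __name__ == "__main__":
    lags = np.array([-100.0, -10.0, 10.0, 100.0])
    print("asymmetric:", asymmetric_stdp(lags))
    print("symmetric: ", symmetric_stdp(lags))
```

The only property that matters for the argument in the text is that the symmetric window depends on the absolute spike-time difference, so any directional bias must come from another factor such as STP.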
Among these STDP types, only symmetric STDP showed a similar effect to the rate-based Hebbian rule. Under symmetric STDP without modulation by STP, synaptic transmissions were potentiated in both directions (Figure 2A and B). However, symmetric STDP modulated by STP biased the weight changes to the reverse direction (Figure 2C), and accordingly the second firing sequence selectively traveled in the direction opposite to the first sequence (Figure 2D). This effect did not depend significantly on synaptic time constants, and we could obtain the same effect when we turned off the NMDA current and shortened the time constants for AMPA and inhibition (Figure 2—figure supplement 1A,B). In contrast, under asymmetric STDP without modulation by STP, firing sequences strengthened the forward sequence propagation (Figure 2—figure supplement 1C) and the related synaptic connections (Figure 2—figure supplement 1D), as expected. Introducing modulation by STP did not change the qualitative results (Figure 2—figure supplement 1E,F). These results indicate that a greater potentiation of reverse synaptic transmissions in CA3 occurs under the modified symmetric STDP.
Evaluation of parameter dependence in Poisson spike trains
We further confirmed the bias effect of the modified symmetric STDP in broader conditions than in the above network simulation. To this end, we generated sequential firing patterns along the one-dimensional network by sampling from a Poisson process, while manually controlling the number of propagating spikes per neuron, the mean inter-spike interval (ISI) of Poisson input spike trains, and time lags in spike propagation (i.e. the time difference between the first spikes of neighboring neurons) (Figure 3A). The amount of neurotransmitter release by each presynaptic spike was calculated by the STP rule (see Materials and methods), and the magnitude of long-term synaptic changes was calculated by a Gaussian-shaped symmetric STDP (Figure 3B) (Mishra et al., 2016). The net effect of synaptic plasticity was given as the product of the two quantities, as in the previous simulations. The parameter values of short-term and long-term plasticity were adopted from experimental results (Figure 3B,C) (Guzman et al., 2016; Mishra et al., 2016). We calculated the long-term weight changes in synapses sent from the central neuron in the network, and defined a weight bias as the difference in synaptic weights between forward and reverse directions (in which positive values mean bias to the reverse direction). We then obtained the mean bias and the fraction of positive biases (P(bias > 0)) over 100 different realizations of spike trains generated with the same parameter values. For each number of spikes per neuron (2, 3, 4 and 5 spikes), we ran 1000 simulations using different mean ISIs and time lags randomly sampled from the interval [5, 50] ms.
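The procedure described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the code used for Figure 3: only three neurons are considered (a central presynaptic neuron, one forward neighbor and one reverse neighbor), the STP update is a common Tsodyks-Markram-style form, the Gaussian width of 70 ms is an assumed reading of the STDP time window, and the function names and default values of `U`, `tau_std` and `tau_stf` are hypothetical rather than the experimentally fitted ones.

```python
import numpy as np

def stp_release(spike_times, U=0.5, tau_std=500.0, tau_stf=200.0):
    """Release amount of each presynaptic spike (assumed STP update rule)."""
    D, F, last = 1.0, U, None          # resources, release probability
    release = []
    for t in spike_times:
        if last is not None:
            gap = t - last
            D = 1.0 - (1.0 - D) * np.exp(-gap / tau_std)   # recovery
            F = U + (F - U) * np.exp(-gap / tau_stf)       # decay to baseline
        rel = F * D
        release.append(rel)
        D -= rel                        # depression
        F += U * (1.0 - F)              # facilitation
        last = t
    return np.array(release)

def stdp_gauss(delta_t, amp=1.0, width=70.0):
    """Gaussian-shaped symmetric STDP window (width in ms is an assumption)."""
    return amp * np.exp(-delta_t ** 2 / (2.0 * width ** 2))

def weight_change(pre, post, release):
    """Sum over all spike pairs of release(pre spike) * STDP(t_post - t_pre)."""
    delta_t = post[None, :] - pre[:, None]
    return float((release[:, None] * stdp_gauss(delta_t)).sum())

def weight_bias(pre, post_fwd, post_rev, use_stp=True):
    """Reverse minus forward weight change; positive means reverse bias."""
    pre, post_fwd, post_rev = (np.asarray(a, dtype=float)
                               for a in (pre, post_fwd, post_rev))
    release = stp_release(pre) if use_stp else np.ones_like(pre)
    return (weight_change(pre, post_rev, release)
            - weight_change(pre, post_fwd, release))

def poisson_burst(n_spikes, mean_isi, start, rng):
    """Spike times with exponential ISIs; the first spike is at `start`."""
    isis = rng.exponential(mean_isi, size=n_spikes - 1)
    return start + np.concatenate(([0.0], np.cumsum(isis)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lag, isi = 10.0, 10.0
    biases = np.array([weight_bias(poisson_burst(5, isi, 0.0, rng),
                                   poisson_burst(5, isi, lag, rng),
                                   poisson_burst(5, isi, -lag, rng))
                       for _ in range(100)])
    print("mean bias:", biases.mean(), "P(bias > 0):", (biases > 0).mean())
```

For a regular five-spike burst whose neighbors fire time-shifted copies of the same train, the sketch yields a positive (reverse) bias with STP and a vanishing bias without it, which is the qualitative behavior described in the text.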
Significant biases toward the reverse direction were observed in broad simulation conditions (Figure 3D). The biases were statistically significant (p<0.01 in Wilcoxon signed rank test for mean bias or binomial test for P(bias > 0)) already in a part of the conditions with two spikes per neuron, and the effect became prominent as the number of spikes increased. In general, the magnitude of synaptic changes was larger for faster spike propagation (Figure 3D, Table 1), which is reasonable because the potentiation of symmetric STDP becomes stronger as presynaptic firing and postsynaptic firing get closer in time. On the other hand, P(bias > 0) was greater for a smaller mean ISI, and it did not depend on the propagation speed (Figure 3D, Table 1). In particular, all simulations showed statistically significant biases to the reverse direction regardless of the propagation speed when the number of spikes per neuron was 4 or 5 and the mean ISI was below 20 ms. This implies that the bias toward the reverse direction is the most prominent when the neural network propagates a sequence of bursts with intra-burst ISIs of less than 20 ms. Such bursting is actually observed in CA3 in vivo (Mizuseki et al., 2012), and simulations of a CA3 recurrent network model suggest that bursting plays a crucial role in the propagation of firing sequences (Omura et al., 2015).
Parameters that regulate STP also influence the bias toward the reverse direction. These parameters vary considerably in the hippocampus depending on experimental settings (Guzman et al., 2016). Therefore, we also performed simulations with randomly sampled values of the initial release probability of neurotransmitters ($U$) and the time constants of short-term depression and facilitation ($\tau_{\mathrm{STD}}$ and $\tau_{\mathrm{STF}}$). Here, the number of spikes per neuron was fixed to five, and the ISI and time lag were independently sampled from the interval [5, 20] ms in every trial. As shown in Figure 4 and Table 1, both mean bias and P(bias > 0) clearly depend on the initial release probability. Prominent bias was observed for $U>0.3$, and it gradually disappeared as $U$ was decreased. The bias was weakly correlated with $\tau_{\mathrm{STD}}$, but there was almost no correlation between the bias and $\tau_{\mathrm{STF}}$ (Table 1). In sum, the bias toward the reverse direction occurred robustly for a sufficiently high intra-burst firing frequency and a sufficiently high release probability of neurotransmitters.
In contrast to symmetric STDP, asymmetric STDP was not effective in potentiating reverse synaptic transmissions even if it was modulated by STP. In our simulations with parameters taken from experiments in CA1 (Bi and Poo, 2001), such a potentiation effect was never observed for asymmetric STDP with all-to-all spike coupling for any parameter value of STP and the number of spikes per neuron (5 or 15 spikes) (Figure 4—figure supplement 1). However, asymmetric STDP with nearest-neighbor spike coupling (Izhikevich and Desai, 2003a), in which only the nearest postsynaptic spikes before and after a presynaptic spike were taken into account, generated statistically significant biases toward the reverse direction in some parameter regions (Figure 4—figure supplement 1). In this case, large biases required large values of $U$ and $\tau_{\mathrm{STD}}$, and a large number of spikes per neuron (15 spikes). Thus, the conditions under which asymmetric STDP generates biases to the reverse direction are severely limited, although we cannot exclude this possibility.
Bias effects induced by spike trains during run
We also tested whether the realistic activity pattern of place cells during run can induce the directional bias. When a rat passes through the place fields of CA3 place cells, the cells typically show bell-shaped activity patterns with a duration of about 1 s and a mean peak firing rate of about 13 Hz (Mizuseki and Buzsáki, 2013; Mizuseki et al., 2012). Firing of place cells is phase-locked to theta oscillation, and the firing phase gradually advances as a rat moves through place fields (theta-phase precession). Furthermore, firing sequences of place cells scan the path from behind to ahead of the rat in every theta cycle, a phenomenon called theta sequence (Dragoi and Buzsáki, 2006; Foster and Wilson, 2007; Huxter et al., 2008; O'Keefe and Recce, 1993; Wang et al., 2015; Wikenheiser and Redish, 2015). Some models of hippocampal sequence learning hypothesized that these compressed sequential activity patterns enhance memory formation through STDP (Jensen and Lisman, 1996, 2005; Sato and Yamaguchi, 2003).
Here, we simulated weight biases induced by Poisson spike trains that mimic place-cell activities when a rat is running through 81 equidistantly spaced place fields. We modified both mean peak firing rates and the magnitude of theta phase-locking (phase selectivity), as shown in Figure 5A (see Materials and methods). We found that weight biases primarily depended on coarse-grained firing rates, but phase selectivity had no significant effect in our model (Figure 5B, Table 2). Noticeable effects of phase selectivity on weight biases emerged for a narrow STDP time window of 10 ms (Figure 5—figure supplement 1, Table 2), suggesting that the experimentally observed broad time window of STDP (70 ms) masks small differences in firing phases. As for the mean peak firing rate, 13 Hz was not high enough to induce statistically significant bias in these simulations. However, in two scenarios, weight biases can become strong during movement in our model. First, the summation of synaptic inputs from ten place cells with identical (or overlapping) place fields amplified weight biases, and the bias became significant at 13 Hz in this case (Figure 5C, Table 2). Second, firing rates of place cells obey a log-normal distribution, and a small fraction of place cells exhibit extremely high firing rates (>30 Hz) (Mizuseki and Buzsáki, 2013). These cells created large weight changes and significant directional biases (Figure 5B and C). We note that the bias effects of replay sequences can also be enhanced in the above scenarios. Taken together, the bias effect in our model is weaker during run than in replay events because the mean firing rate is lower and theta phase-locking is not effective for learning with broad symmetric STDP. However, non-negligible bias effects may arise in some biologically plausible situations.
Learning strongly biased forward synaptic pathways through reverse replay
By the mechanism described above, our network model can potentiate forward synaptic pathways through reverse replay. However, whether reverse replay creates a strong bias to the forward direction depends crucially on two parameters, that is, the slow time constant of long-term plasticity and the strength of short-term depression. Here, we demonstrate this by simulating a one-dimensional recurrent network model similar to that in Figure 1 in three different conditions. In all the cases (Figure 6A,B,C), we repeatedly triggered firing sequences at the beginning of each simulation trial (0 s < time < 5 s), which are regarded as ‘forward’ sequences corresponding to repeated sequential experiences. After this (time > 10 s), we repeatedly stimulated central neurons in the network to induce firing sequences, which selectively traveled along the reverse synaptic pathway strengthened by the forward sequences, as was demonstrated in Figure 1. These sequences overwrote the weight bias induced by forward sequences and eventually reversed it into the forward direction on neuron #100 (Figure 6D). In the end, the bias converged to a value at which reverse replay could no longer propagate over a long distance. On the other hand, neuron #400 was not recruited in reverse replay and hence the reversal of weight bias did not occur (Figure 6E).
In condition 1 (Figure 6A), modifications of synaptic weights were relatively slow (the time constant was 5000 ms) and the depression effect of STP was the same as in Figure 1. Due to the slow modifications of synaptic weights, a large number of reverse replay events were generated before the bias was overwritten. Therefore, the accumulated weight bias to the forward direction became the largest in the final state (Figure 6D, red). In condition 2 (Figure 6B), the time constant for weight changes was shorter (500 ms) and accordingly each reverse replay rapidly changed the weight bias. Consequently, reverse replay stopped earlier and the weight bias became smaller than in condition 1 (Figure 6D, blue). However, the bias still converged to the forward direction because firing sequences could propagate even when synaptic weights were weakly biased to the opposite direction. In condition 3 (Figure 6C), we weakened STP in addition to using the short time constant for weight changes (see Materials and methods). Because short-term depression enhances the generation of firing sequences (Romani and Tsodyks, 2015), this manipulation further reduced the number of reverse replay events and consequently the final value of the weight bias (Figure 6D, green).
These results demonstrate that enhanced firing propagation by STP and relatively slow long-term plasticity are necessary to create strongly biased forward synaptic pathways through reverse replay. Strong short-term depression may be replaced by other mechanisms such as dendritic spikes, which also enhance the propagation of firing sequences (Jahnke et al., 2015). However, this possibility was not pursued in the present study.
Goal-directed path learning through reverse replay
We now demonstrate how reverse replay events starting from a rewarded position enable the learning of goal-directed paths. We consider the case where an animal is exploring a W-maze (Figure 7A). During navigation, the animal gets a reward at the end of one arm (position D2), but not at the opposite end (position D1) or at other locations. In each trial, the animal starts at the center arm (position A) and runs into one of the two side arms at position B. In the present simulations, the animal visits both ends alternately: it reaches D1 in $(2n+1)$-th trials and D2 in $(2n)$-th trials, where $n$ is an integer. After reaching either of the ends (i.e. D1 or D2), the animal stops there for 7 s.
We constructed a 50 × 50 two-dimensional (2D) place-cell network associated with the 2D space that the animal explored (Figure 7B), using the rate neuron model. Each place cell had a place field in the corresponding position in the 2D space and received global inhibitory feedback proportional to the overall network activity. Neighboring place cells were reciprocally connected with excitatory synapses, which were modulated by short-term and long-term plasticity rules as in Figure 1 and Figure 6. During the delivery of reward, we mimicked dopaminergic modulations by enhancing the inputs to CA3 and increasing the frequency of triggering firing sequences (Ambrose et al., 2016; Singer and Frank, 2009). Under this condition, a larger number of reverse replay events were generated at the rewarded position. Thus, larger potentiation of synaptic pathways toward reward is expected in our model.
After a few traversals on the W-maze, the network generated reverse replay at D1 and D2 (Figure 7D and E, red arrows) and forward replay at A (Figure 7D and E, black arrows) during immobility, and theta oscillation induced theta sequences along the animal’s path (see Figure 7—video 1). Notably, from trial 5 onward, forward replay selectively traveled towards the rewarded position D2. Furthermore, firing sequences that started from the non-rewarded position D1 propagated to the rewarded position D2 but not to the start A (Figure 7D and E, blue arrows). We note that the animal never traveled directly from D2 to D1 in our simulation. Thus, the network model could combine multiple spatial paths to form a synaptic pathway that has not been traversed by the animal. All these properties of firing sequences appear well suited to the goal-directed learning of a spatial map.
We statistically confirmed the above-mentioned biases in firing sequences. We performed independent simulations of 10 model rats, in which five rats visited the two arms in the above-mentioned order, and the other five rats visited the arms in the reversed sequential order. In each simulation, we counted the number of firing sequences propagating along different synaptic pathways. Propagation of firing sequences triggered at the start A was significantly biased to the rewarded position D2 (Figure 7F, Wilcoxon’s signed rank test, p=$5.86\times 10^{-3}$). While firing sequences from D2 tended to be reverse replay propagating to A (Figure 7G, Wilcoxon’s signed rank test, p=$5.86\times 10^{-3}$), the majority of firing sequences from D1 propagated prospectively to D2 and hence were goal-directed sequences (Figure 7H, Wilcoxon’s signed rank test, p=$5.57\times 10^{-3}$).
To visualize how the recurrent network was optimized for the goal-directed exploratory behavior, we defined ‘connection vectors’ from recurrent synaptic weights. For each place cell, we calculated the weighted sum of eight unit vectors, each directed towards one of the eight neighboring neurons, using the corresponding synaptic weights (Figure 8A). These connection vectors represent the average direction of neural activity transmitted from each neuron, and the 2D vector field shows the flow of neural activities in the 2D recurrent network and hence in the 2D maze. We note that these vectors bias the flow, but actual firing sequences can sometimes travel in different directions from the vector flow. Initially, synaptic connections were random and the connection vector field was not spatially organized (Figure 8—figure supplement 1). However, after the exploration, the vector field was organized so as to route neural activities to those neurons encoding the rewarded position on the track (Figure 8B). A similar route map was also obtained when we reversed the sequential order of visits to the two arms (Figure 8—figure supplement 2) but was abolished when we removed reward (Figure 8—figure supplement 3) or the effect of STP on Hebbian plasticity (Figure 8—figure supplement 4). As demonstrated previously, direct synaptic pathways from D1 to D2 were also created. The emergence of direct paths relies on two mechanisms in this model. First, as seen in Figure 8—figure supplement 3, connections are biased from goal to start when there is no reward at the goal because theta sequences enhance synaptic pathways opposite to the direction of the animal's movement. In non-rewarded travels, these directional biases are not overwritten by reverse replay. Thus, the relative preference of a synaptic pathway in hippocampal sequential firing decreases for exploration that does not result in reward. Second, some reverse replay sequences from D2 propagate into D1 instead of the stem arm (Figure 7E and G and Figure 7—video 1), and such joint replay enhances biases towards the goal through unexperienced spatial paths. Thus, our model creates a map not only for the spatial paths experienced by the animal, but also for their possible combinations if they guide the animal directly to the rewarded position from a point in the space. In this sense, our model optimizes the cognitive map of the spatial environment.
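The definition of connection vectors given above can be written as a short Python sketch. The array layout is an assumption made for illustration: outgoing weights are stored as an array of shape (height, width, 8) in the neighbor order listed in `OFFSETS`, and weights to missing neighbors at the border are assumed to be zero. The names `OFFSETS` and `connection_vectors` are hypothetical.

```python
import numpy as np

# Neighbor offsets as (dy, dx); this ordering is an assumption of the sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
           (0, -1),           (0, 1),
           (1, -1),  (1, 0),  (1, 1)]

def connection_vectors(w_out):
    """Weighted sum of the eight unit vectors pointing to the neighbors.

    w_out: array of shape (H, W, 8) holding the outgoing synaptic weight
           from each cell to each neighbor, in the order of OFFSETS.
    Returns an array of shape (H, W, 2) with (x, y) components.
    """
    units = np.array([(dx, dy) for dy, dx in OFFSETS], dtype=float)
    units /= np.linalg.norm(units, axis=1, keepdims=True)  # unit length
    return w_out @ units   # (H, W, 8) @ (8, 2) -> (H, W, 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.random((50, 50, 8))          # random initial weights
    vec = connection_vectors(w)
    print(vec.shape)
```

With uniform weights the eight contributions cancel and the vector is zero, so a nonzero connection vector directly reflects an anisotropy in the outgoing weights of that cell.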
We also examined the role of theta oscillation in our network model. When we turned off theta oscillation, the network model generated replay-like long-lasting firing sequences not only during immobility, but also during run (Figure 7—figure supplement 1). These sequences propagated randomly in both forward and reverse directions. In our model, theta oscillation offers periodic hyperpolarization of the membrane potentials that terminates firing sequences in a short period and localizes place-cell activity. Although the absence of theta oscillation does not impair place-cell sequences during run if we weaken recurrent connection weights (e.g. the model in Wang et al., 2015), such a model does not show replay events. At least in our model, theta oscillation during run is useful for the robust generation of both local place-cell activity during run and replay events. We further explore the role of theta sequences in the next section.
Unbiased sequence propagation enhances goal-directed behavior in a 2D space
So far, we have shown the role of reverse replay for goaldirected learning in a 1D environment. However, whether a similar mechanism works in a 2D environment remains unclear. Previous models produce reverse replay of recent paths by transient upregulation of the excitability of recently activated neurons (Foster and Wilson, 2006; Molter et al., 2006), which may also work in a 2D space. Such an effect has been experimentally observed, but the bias to recent paths disappears rapidly (Csicsvari et al., 2007). In an open arena, prospective placecell sequences tend to propagate from the current position to reward sites, but sequence propagation from the goal position was not biased to recent paths (Pfeiffer, 2018; Pfeiffer and Foster, 2013). These observations suggest that firing sequences during immobility in a 2D space propagate isotropically from trigger points, rather than reverse replay of recent paths.
Our model predicts that such firing sequences triggered at reward sites are beneficial for goal-directed path learning in a 2D space. To show this, we simulated the same 2D neural network model as in the previous section, but with increased initial connection weights. Because we connected only neighboring neurons, this strong neuronal wiring reflected the topological structure of the 2D square space. We intermittently stimulated the central place cells to trigger firing sequences, which propagated homogeneously through the 2D neural network (Figure 9A). Because sequence propagation potentiates synaptic pathways in the opposite direction, these divergent firing sequences created a connection vector field converging to the center (Figure 9B). This result generalizes the role of 1D reverse replay to higher-dimensional spaces: isotropic sequence propagation from reward sites achieves goal-directed sequence learning in a 2D (or even 3D) open field.
We further demonstrate how this learning mechanism works for goaldirected navigation in an open arena in a task similar to Morris water maze task (Foster et al., 2000; Morris et al., 1986; Vorhees and Williams, 2006) and a 2D foraging task (Pfeiffer and Foster, 2013). In the simulations, an animal started from random positions in a 2D square space to search for a reward placed at one of the four candidate reward sites (Figure 10A). In each trial, the animal stayed at the starting position for 3 s, ran around the 2D space at a constant speed until it found the reward, and stayed at the reward site for 15 s. We triggered replay sequences every 1 s during immobility and theta oscillation induced theta sequences during run. At each time, we calculated a vector from the animal’s current position to the gravity center of the neural activity (corresponding to the current position expressed by the neural network), which we call the activity vector (Figure 10B). We used this vector to rotate the angle of the velocity vector of the animal’s movement. The velocity vector was also updated during immobility to determine the direction of the animal’s movement at the next start. Consequently, goaldirected replay sequences or theta sequences bias the animal’s movement toward the goal. Such a relationship between the animal’s movement and hippocampal firing sequences has been suggested in several experiments (Huxter et al., 2008; Pfeiffer and Foster, 2013; Wikenheiser and Redish, 2015). One simulation set consisted of 20 trials, and the position of reward was changed every five trials. Therefore, memory of the previous trials could guide the animal to the reward in trials 2–5, 7–10, 12–15, and 17–20 (REPEAT trials), but not in trials 6, 11, 16 (SWITCH trials). We performed 10 independent simulation sets, and 10 control simulation sets in which learning was disabled and the animal’s behavior was similar to a random search.
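The activity vector and its use for steering can be sketched as follows. The text specifies only that the activity vector rotates the velocity vector, so the proportional rotation rule and the `gain` parameter below are assumptions made for illustration, as are the function names.

```python
import math

def activity_vector(position, cell_positions, rates):
    """Vector from the animal's position to the centre of mass of activity."""
    total = sum(rates)
    if total == 0.0:
        return (0.0, 0.0)           # no activity, hence no directional signal
    cx = sum(r * x for r, (x, y) in zip(rates, cell_positions)) / total
    cy = sum(r * y for r, (x, y) in zip(rates, cell_positions)) / total
    return (cx - position[0], cy - position[1])

def steer(velocity, act_vec, gain=0.5):
    """Rotate the velocity toward the activity vector, keeping the speed.

    gain in [0, 1] is a hypothetical parameter: 0 ignores the activity
    vector, 1 aligns the heading with it in a single update.
    """
    if act_vec == (0.0, 0.0):
        return velocity
    speed = math.hypot(velocity[0], velocity[1])
    heading = math.atan2(velocity[1], velocity[0])
    target = math.atan2(act_vec[1], act_vec[0])
    # Signed smallest angular difference between target and heading.
    diff = math.atan2(math.sin(target - heading), math.cos(target - heading))
    new_heading = heading + gain * diff
    return (speed * math.cos(new_heading), speed * math.sin(new_heading))
```

Because only the angle is changed, the constant running speed described in the text is preserved.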
The model quickly learned efficient goal-directed navigation after every SWITCH trial (Figure 10C). Animals took relatively short paths from start to goal in REPEAT trials (Figure 10E), and accordingly exploration time was significantly shorter in REPEAT trials than in other trials (Figure 10D, Wilcoxon’s rank sum test, p<${10}^{-10}$ for both REPEAT-SWITCH and REPEAT-CONTROL). In contrast, exploration time in SWITCH trials was longer than that in control simulations (Figure 10D, Wilcoxon’s rank sum test, p=$3.43\times {10}^{-6}$) because animals typically explored around the previous reward site in SWITCH trials (Figure 10E). The exploration time around the previous reward site in SWITCH trials was significantly longer in simulations with learning than in control simulations (Wilcoxon’s rank sum test, p=$3.21\times {10}^{-8}$), which is consistent with rodents’ behavior in the Morris water maze task (Vorhees and Williams, 2006).
To analyze biases in sequence propagation, we calculated the angle between the instantaneous activity vector and a reference vector. The reference vector was a vector from the animal’s current position to a goal (reward) before and during exploration, or a vector from the animal’s current position to the recent path (the animal’s average position within 3 s before reaching the goal) at a goal. Bias to the goal or recent paths is strong if the angle is small. However, because of the small size of the network and the 2D space, the angles were not exactly uniform even in control simulations. To remove this effect, we calculated a mean angular displacement in each behavioral state (start, run and goal) in control simulations and subtracted these baseline values. In REPEAT trials, the bias to reward before and during exploration was stronger than the bias to recent path at the goal (Figure 10F, paired sample t-test, p=$2.98\times {10}^{-7}$ for Start-Goal, p=$5.02\times {10}^{-3}$ for Run-Goal). The bias at the goal was almost zero, that is, the same level as a random search. These results suggest that the bias from start to goal was stronger than that from goal to recent paths (reverse replay) in REPEAT trials, which is consistent with experimental observation (Pfeiffer, 2018; Pfeiffer and Foster, 2013). Furthermore, the bias during exploration suggests that weight biases also affected propagation of theta sequences. In contrast, in SWITCH trials, the bias to recent paths at the goal was significantly stronger than the bias to the goal in other periods (Figure 10G, paired sample t-test, p=$5.9\times {10}^{-4}$ for Start-Goal in SWITCH, p=$4.83\times {10}^{-5}$ for Run-Goal in SWITCH). This bias to recent paths in SWITCH trials can be explained by the fact that the animal typically reached reward after visiting the previous reward site, which strongly attracted firing sequences (Figure 10E and F).
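The angular bias measure described in this paragraph can be sketched as below. The sign convention (control mean minus observed mean, so that a positive value means a stronger bias toward the reference) is an assumption, and the function names are hypothetical.

```python
import math

def angle_between(u, v):
    """Unsigned angle in radians between two 2D vectors, in [0, pi]."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return abs(math.atan2(cross, dot))

def baseline_corrected_bias(angles, control_angles):
    """Mean control angle minus mean observed angle for one behavioral state.

    A positive value means the activity vector points closer to the
    reference vector than in the control simulations (assumed convention).
    """
    mean = sum(angles) / len(angles)
    baseline = sum(control_angles) / len(control_angles)
    return baseline - mean
```

Subtracting the control mean removes the non-uniformity that comes purely from the small network and arena, which is the purpose of the baseline in the text.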
Notably, this bias gives an efficient way to update connection weights for the new reward site because the connection vector fields converging to the two goals differ only in the space between the goals and updating weight biases is unnecessary outside this space. Thus, our model predicts that the bias of sequence propagation from a novel goal position to previously learned goals appears when the animal should update previous memories to a new memory. Taken together, these results suggest that divergent sequences create weight biases for convergent sequence propagation to goals in our model and this learning mechanism can be a basis for efficient goaldirected navigation in the 2D space.
Discussion
In this paper, we showed that the modified Hebbian plasticity rule modulated by STP biases synaptic weights toward the reverse direction of the firing sequences that traveled through a recurrent network. We demonstrated this counterintuitive phenomenon in network models of rate neurons and those of spiking neurons obeying symmetric STDP. We further clarified for various Poisson-like sequential firing patterns that the phenomenon favors spike bursts and a high presynaptic release probability. We also showed that the selective potentiation of reverse directions is unlikely to occur for the conventional asymmetric STDP.
Reverse replay and STP-modulated symmetric STDP for goal-directed learning
Our results have several implications for spatial memory processing by the hippocampus. Suppose that the animal is rewarded at a spatial position after exploring a particular path. Reverse replay propagating backward from the rewarded location will strengthen the neuronal wiring in CA3 that preferentially propagates forward firing sequences to this location along the path. Because the frequency of reverse replay increases at rewarded positions (Ambrose et al., 2016; Singer and Frank, 2009), reward delivery induces the preferential potentiation of forward synaptic pathways, which in turn results in an enhanced occurrence of forward replay in the consolidation phase. Thus, our model predicts that reverse replay is crucial for the reinforcement of reward-seeking behavior in the animal and gives, for the first time, a mechanistic account of how reverse replay enables the hippocampal prospective coding of reward-seeking navigation. Furthermore, if the occurrence of reverse replay is modulated not only by reward but also by other salient events for the animal, this model immediately generalizes to the memorization of important paths to be replayed afterwards. An interesting example would be learning of spatial paths associated with fear memory (Wu et al., 2017). Our computational results are qualitatively consistent with the experimentally observed properties of forward and reverse replay events (Ambrose et al., 2016; Carr et al., 2011; Diba and Buzsáki, 2007; Foster and Wilson, 2006; Pfeiffer, 2018; Pfeiffer and Foster, 2013; Singer and Frank, 2009) and match the recent finding of symmetric STDP time windows in CA3 (Mishra et al., 2016).
Importantly, our model reinforces unexperienced spatial paths by connecting the multiple paths that were previously encoded by separate experiences. In the simulations on the W-maze, the network model not only learned actually traversed paths from the start (A) to the goal (D2), but also remembered paths from other locations (C1 and D1) to the goal even though the animal had not experienced these paths. This reinforcement occurs because reverse replay sequences starting from the visited arm occasionally propagate or bifurcate into an unvisited arm at the branching point. The hippocampus can generate replay along joint paths (Wu and Foster, 2014) even when the animal has no direct experience (Gupta et al., 2010). Therefore, the above mechanism is biologically possible.
Testable predictions of the model
In 1D tracks, reverse replay is observed immediately after the first lap (Foster and Wilson, 2006; Wu and Foster, 2014). Our model shows that such a replay creates a bias to the forward direction (i.e. toward the reward) even in the very early stage of learning. Consistently, a weak bias to forward replay was observed in the first exposure to a long 1D track (Davidson et al., 2009). Our model predicts that the bias to the goal-directed sequences will be suppressed (or enhanced) if we selectively block (or enhance) reverse replay at the goal. Such experiments are possible by using the techniques of real-time decoding feedback (Ciliberti and Kloosterman, 2017; Sodkomkham et al., 2016).
The most critical assumption in our model is the rapid modulation of STDP coherent with the presynaptic neurotransmitter release governed by STP. Such a modulation was actually reported in the visual cortex (Froemke et al., 2006). Although short-term depression also exists in CA3 (Guzman et al., 2016), STDP is modulated in a slightly different fashion in the hippocampus: the strong modulation arises from the second presynaptic spike rather than the first one (Wang et al., 2005). However, the experiment was performed in a dissociated culture in which hippocampal subregions were not distinguished. Moreover, the modulation of STDP was not tested for more than two presynaptic spikes, and whether a third presynaptic spike further facilitates or rather depresses STDP remains unknown. Therefore, the contributions of STP to STDP should be further validated in CA3. The proposed role of short-term depression in biasing replay events may also be examined by pharmacological blockade or enhancement of STP in the hippocampus (Froemke et al., 2006). In addition, we showed that the dendritic ADP accumulated over multiple spikes causes a similar phenomenon (Figure 1—figure supplement 1). This can be directly tested in CA3 by modifying the protocol described previously (Mishra et al., 2016). Moreover, a bias to the reverse direction can be further strengthened if some neuromodulator expands the time window of symmetric STDP selectively toward the anti-causal temporal domain (${t}_{\mathrm{pre}}>{t}_{\mathrm{post}}$). Our model also predicts this type of metaplasticity in CA3.
Neuromodulations, especially reward-triggered facilitation of replay events (Ambrose et al., 2016; Singer and Frank, 2009), play important roles in the proposed mechanism of goal-directed learning with reverse replay. CA3 primarily receives dopaminergic input from the locus coeruleus (LC), which signals novelty and facilitates learning in the hippocampus (Takeuchi et al., 2016; Wagatsuma et al., 2018; Walling et al., 2012). Therefore, sequence learning in CA3 is also affected by novelty and salience of events (Lisman and Grace, 2005; Lisman et al., 2011). Consequently, any place in which the animal experiences behaviorally important events can be a potential goal for the hippocampal path learning, and valence of the goal is encoded in downstream areas (de Lavilléon et al., 2015; Redondo et al., 2014). This may explain sequence learning for fear memory (Wu et al., 2017) and vicarious trial and error in hippocampus, that is, evaluation of potential paths before decision making (Johnson and Redish, 2007; Singer et al., 2013). In this case, reverse replay in CA3 contributes to selective forward replay of paths informative for decision making.
In our model, modulation of triggering replay events (or sharp wave ripples) crucially affects the learning with reverse replay. Therefore, other hippocampal subregions may also participate in goal-directed path learning. The area CA2 encodes the current position during immobility (Kay et al., 2016) and can trigger sharp wave ripples (Oliva et al., 2016). Thus, dopaminergic enhancement in CA2 may increase replay events from the current position as in our simulation settings. Dentate gyrus intensively encodes reward sites and triggers sharp wave ripples in CA3 in a working memory task (Sasaki et al., 2018). It is interesting to examine whether these ripples accompany replay sequences, which remains unclear at present. If this is indeed the case, our results suggest that CA2 and dentate gyrus also play active roles in the goal selection of hippocampal path learning.
Our model also predicts that the release probability of neurotransmitter strongly affects the magnitude and probability of bias toward the reverse direction (Figure 4). Therefore, modulation of neurotransmitter release in CA3 can regulate the behavioral impact of hippocampal firing sequences. For example, acetylcholine suppresses neurotransmitter release at recurrent synapses (Hasselmo, 2006), which may abolish the directional biases created during movement (see Figure 4 and Figure 8—figure supplement 3). Presynaptic long-term plasticity (Costa et al., 2015) may also affect the directional biases at longer timescales.
Role of theta sequences in hippocampal goal-directed learning
In our simulation, the strength of theta phase-locking of place cell activity did not have significant effects on learning (Figure 5 and Figure 5—figure supplement 1). This result calls into question the long-standing hypothesis that theta sequences are essential for sequence learning (Jensen and Lisman, 1996; 2005; Sato and Yamaguchi, 2003). However, it is possible that theta phase-locking is effective in learning CA3-to-CA1 connections at which STDP is asymmetric and more sensitive to time differences between presynaptic and postsynaptic activities than at CA3 recurrent synapses (Bi and Poo, 2001). Furthermore, our simulations in the 2D space showed that theta sequences can act as readout of weight biases to reward, and hence are useful for planning future trajectories during exploration. This role of theta sequences is consistent with experimental findings (Huxter et al., 2008; Wikenheiser and Redish, 2015). Short-term facilitation also had only limited effects on the proposed learning mechanism (Figure 4); however, it contributed to the generation of theta sequences (Wang et al., 2015). Therefore, in our model, short-term facilitation also enhances the readout of spatial information during run.
Relationships to the previous models
By extending the role of reverse replay in 1D space, we showed that divergent sequences that isotropically propagate from reward sites help goal-directed sequence learning and hence efficient navigation in 2D space. Similar foraging tasks can be solved by temporal difference (TD) learning models (Dayan, 1993; Sutton, 1988) and the relevance of TD learning to hippocampal information processing has been proposed (Foster et al., 2000; Stachenfeld et al., 2017). Thus, how our learning mechanism is related to TD learning should be mathematically investigated in the future.
A conceptual model has been proposed for goal-directed sequence learning with symmetric STDP based on the gradient field of connection strength (Pfeiffer, 2018). The conceptual model concluded that goal-directed sequence learning would not occur if sequences homogeneously propagate through the entire space. However, our model shows that this is not the case for STP-modulated symmetric STDP and proposes a network mechanism to implement the conceptual model based on reverse replay.
Some limitations of the present model
While the present model could demonstrate goal-directed path learning, the model has yet to be improved to learn context-dependent switching of behavior such as navigation on an alternating T-maze. In our simulation, animals learn only reward delivered at the same location. However, in alternating T-maze tasks, the animal has to remember recent experiences to change its choices based on the stored memory. Furthermore, the experiment in Pfeiffer and Foster (2013) also contains working-memory-based switching between predictable and unpredictable reward searches. Thus, the consistency between our simulations and experiments is still limited. A straightforward extension of our model is to maintain multiple charts representing the alternating paths (Samsonovich and McNaughton, 1997) and switch them according to the stored short-term memory or certain context information. Each chart will be selectively reinforced by reverse replay along one of the alternating paths. This switching of CA3 activity may be supported by the dentate gyrus (Sasaki et al., 2018). How to protect the previous memories from being overwritten by novel reward experiences is another important issue that is left unsolved by the present model. If the animal can memorize all four candidate reward sites simultaneously, the animal can efficiently search for reward even when the reward position is changed in every trial. One solution for this issue is triggering remote replay (Gupta et al., 2010; Karlsson and Frank, 2009) at the previous reward sites. Thus, we predict that bias of starting points of remote replay affects persistence of multiple memories.
While 2D hippocampal place cells are omnidirectional, the majority of 1D place cells are unidirectional (Buzsáki, 2005). We did not take into account this property of 1D place cells in this study. Because we only simulated unidirectional movements in the W-maze, the network may describe an ensemble of place cells for one direction and another neuron ensemble is necessary for the opposite direction. In this case, place cells for the path B→D1 and those for the opposite path D1→B are different. Thus, the directional bias learned in our network may not imply direct paths from D1 to D2 for the animal. To learn unexperienced paths, continuous replay events of multiple unidirectional place-cell ensembles are necessary, and such events have been experimentally observed (Davidson et al., 2009; Gupta et al., 2010; Wu and Foster, 2014). Neural network models also demonstrated that Hebbian plasticity can connect multiple unidirectional place cells at the junction points (Brunel and Trullier, 1998; Buzsáki, 2005; Káli and Dayan, 2000). Relating to this, replay of long paths and complex spatial structures is often accompanied by concatenated sharp wave ripples (Davidson et al., 2009). Interestingly, each sharp wave ripple corresponds to a segment in the spatial structure (Wu and Foster, 2014). The effect of this segmentation on our learning mechanism is not obvious. Elucidating the underlying mechanism will reveal how the hippocampal circuit segments and concatenates sequential experiences.
Possible implications of our results in other cognitive tasks
Reverse replay has not been found in the neocortical circuits. For instance, firing sequences in the rodent prefrontal cortex are reactivated only in the forward directions (Euston et al., 2007). To the best of our knowledge, neocortical synapses obey asymmetric STDP (Froemke et al., 2006). This seems to be consistent with the selective occurrence of forward sequences because, as suggested by our model, the sensory-evoked forward firing sequences should only strengthen forward synaptic pathways under asymmetric STDP. However, if dopamine turns asymmetric STDP into symmetric STDP, which is actually the case in hippocampal area CA1 (Brzosko et al., 2015; Zhang et al., 2009), forward firing sequences will reinforce the propagation of reverse sequences. Whether reverse sequences exist in the neocortex and, if not, what functional roles replay events play in the neocortical circuits are still open questions.
Whether the present neural mechanism to combine experienced paths into a novel path accounts for cognitive functions other than memory is an intriguing question. For instance, does this mechanism explain the transitivity rule of inference by neural networks? The transitive rule is one of the fundamental rules in logical thinking and says, ‘if A implies B and B implies C, then A implies C.’ This flow of logic has some similarity to that of joint forward-replay sequences, which says, ‘if visiting A leads to visiting B and visiting B leads to visiting C, then visiting A leads to visiting C.’ Logical thinking is more complex and should be more rigorous than spatial navigation, and little is known about its neural mechanisms. The proposed neural mechanism of path learning may give a cue for exploring the neural substrate for logic operations by the brain.
In sum, our model proposes a biologically plausible mechanism for goal-directed path learning through reverse sequences. In dynamic programming-based path-finding methods such as Dijkstra’s algorithm for finding the shortest path (Dijkstra, 1959) and the Viterbi algorithm for finding the most likely state sequences in a hidden Markov model (Bishop, 2010), a globally optimal path is determined by backtracking from the goal to the start after an exhaustive local search of forward paths. Our model enables a similar path-finding mechanism through reverse replay. Such a mechanism has been suggested in the machine learning literature (Foster and Knierim, 2012), but whether and how neural dynamics achieves it remained unknown. In addition, our model predicts the roles of neuromodulators that modify plasticity rules and activity levels in sequence learning. These predictions are testable by physiological experiments.
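For readers unfamiliar with the analogy, the backtracking step of Dijkstra's algorithm can be sketched as follows. This is a generic textbook implementation, not part of the model; the graph layout (a dictionary of neighbor costs) and the function name `shortest_path` are illustrative assumptions.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; graph maps node -> {neighbour: cost}."""
    dist = {start: 0.0}
    prev = {}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue                      # stale queue entry
        if node == goal:
            break
        for nxt, cost in graph.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node          # remember how we reached nxt
                heapq.heappush(queue, (nd, nxt))
    if goal not in dist:
        return None
    # Backtrack from the goal to the start through stored predecessors.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

The forward search only records local predecessor links; the route itself is read out in reverse from the goal, which is the step the text compares with reverse replay.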
Materials and methods
One-dimensional recurrent network model with rate neurons (Figure 1)
We simulated the network of 500 neurons. Firing rate of neuron $i$ was determined as
The function ${\mathrm{f}}_{\mathrm{rate}}\left(I\right)$ was a threshold-linear function
where $\rho =0.0025$ and $\epsilon =0.5$. Excitatory synaptic current ${I}_{i}^{\mathrm{exc}}$ and inhibitory feedback ${I}^{\mathrm{inh}}$ followed
where ${\tau}^{\mathrm{exc}}={\tau}^{\mathrm{inh}}=10\,\mathrm{ms}$, and ${w}^{\mathrm{inh}}=1$. Variables for short-term synaptic plasticity ${D}_{j}$ and ${F}_{j}$ obeyed the following equations (Wang et al., 2015):
Parameters were ${\tau}_{\mathrm{STD}}=500\,\mathrm{ms}$, ${\tau}_{\mathrm{STF}}=200\,\mathrm{ms}$, and $U=0.6$. External input ${I}_{i}^{\mathrm{ext}}$ was usually zero and changed to $5$ for 10 ms when the cell was stimulated. We stimulated neurons $0\le i\le 10$ at the beginning of simulations, and neurons $245\le i\le 255$ at 3 s after the beginning.
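As an illustration of how depression and facilitation variables of this kind behave, the sketch below integrates a commonly used rate-based Tsodyks-Markram form with the parameter values quoted above. The specific equations, the rate unit (spikes per ms) and the function names are assumptions for illustration and may differ from the equations used in the paper.

```python
def simulate_stp(rate_khz, t_max=5000.0, dt=0.1,
                 tau_std=500.0, tau_stf=200.0, U=0.6):
    """Forward-Euler integration of D (depression) and F (facilitation).

    Assumed dynamics (times in ms, rate in spikes per ms):
        dD/dt = (1 - D) / tau_std - rate * D * F
        dF/dt = (U - F) / tau_stf + U * (1 - F) * rate
    Returns the final (D, F); their product scales transmitter release.
    """
    D, F = 1.0, U
    for _ in range(int(t_max / dt)):
        dD = (1.0 - D) / tau_std - rate_khz * D * F
        dF = (U - F) / tau_stf + U * (1.0 - F) * rate_khz
        D += dt * dD
        F += dt * dF
    return D, F

def steady_state_stp(rate_khz, tau_std=500.0, tau_stf=200.0, U=0.6):
    """Closed-form fixed point of the assumed equations above."""
    F = (U / tau_stf + U * rate_khz) / (1.0 / tau_stf + U * rate_khz)
    D = (1.0 / tau_std) / (1.0 / tau_std + rate_khz * F)
    return D, F
```

Under these assumptions, sustained firing drives `D` well below 1 while `F` rises above `U`, so with a release probability as high as 0.6 the product `D * F` is dominated by depression.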
We determined initial values of excitatory weights ${w}_{ij}$ as
where ${w}_{\mathrm{max}}=27$ and $d=5$. We set self-connections ${w}_{ii}$ to zero throughout the simulations.
The weights were modified according to the rate-based Hebbian synaptic plasticity as
where $\eta =20$ and ${\tau}_{\mathrm{w}}=1000\,\mathrm{ms}$. When we simulated Hebbian synaptic plasticity without the modulation by short-term plasticity, we removed ${D}_{j}{F}_{j}$ from this equation and changed the value of $\eta $ to 4.
In the simulation with accumulation of ADP (Figure 1—figure supplement 1), we calculated smoothed postsynaptic activity ${p}_{i}$ by solving
where ${\tau}_{\mathrm{ADP}}=80\,\mathrm{ms}$, and changed Hebbian plasticity (9) to
The value of $\eta $ was also changed to 4.
One-dimensional recurrent network model with spiking neurons (Figure 2)
We used the Izhikevich model (Izhikevich, 2003b) for the simulation of spiking neurons:
If the membrane voltage ${v}_{i}\ge 30\,\mathrm{mV}$, the neuron emitted a spike and the two variables were reset as ${v}_{i}\leftarrow c$ and ${u}_{i}\leftarrow {u}_{i}+d$. Parameter values were $a=0.02$, $b=0.2$, $c=-65$ and $d=8$. Excitatory synaptic current was
and the synaptic conductance followed
where ${t}_{jk}^{\mathrm{f}}$ is the timing of the $k$-th spike of neuron $j$ and parameter values were ${\tau}_{\mathrm{AMPA}}=5\,\mathrm{ms}$, ${\tau}_{\mathrm{NMDA}}=150\,\mathrm{ms}$ and ${t}_{\mathrm{delay}}=2\,\mathrm{ms}$. The voltage dependence of NMDA current (Izhikevich et al., 2004) was
Inhibitory feedback ${I}_{i}^{\mathrm{i}\mathrm{n}\mathrm{h}}$ was calculated as
where ${w}^{\mathrm{inh}}=1$ and ${\tau}^{\mathrm{inh}}=10\,\mathrm{ms}$. Short-term synaptic plasticity obeyed the following dynamics:
Parameter values were ${\tau}_{\mathrm{STD}}=500\,\mathrm{ms}$, ${\tau}_{\mathrm{STF}}=200\,\mathrm{ms}$ and $U=0.6$. External input ${I}_{i}^{\mathrm{ext}}$ was the same as that in the rate neuron model.
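The single-neuron dynamics can be illustrated with the standard published form of the Izhikevich (2003) model. The sketch below is a minimal forward-Euler simulation of one isolated neuron with constant input; it omits the synaptic currents, inhibition and short-term plasticity of the network, and the reset value `c = -65` assumes the standard regular-spiking parameter set.

```python
def simulate_izhikevich(current, t_max=1000.0, dt=0.1,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Forward-Euler simulation of one Izhikevich neuron.

    Standard model equations (v in mV, t in ms):
        dv/dt = 0.04 v^2 + 5 v + 140 - u + I
        du/dt = a (b v - u)
    current is a constant input I; returns the spike times in ms.
    """
    v, u = c, b * c
    spikes = []
    for step in range(int(t_max / dt)):
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
        if v >= 30.0:               # spike threshold reached: record and reset
            spikes.append(step * dt)
            v = c
            u += d
    return spikes
```

For example, `simulate_izhikevich(10.0)` produces repetitive firing, whereas `simulate_izhikevich(0.0)` stays silent because the neuron relaxes to its resting state.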
We determined initial values of synaptic weights ${w}_{ij}^{\mathrm{A}\mathrm{M}\mathrm{P}\mathrm{A}}$ as
where ${w}_{\mathrm{max}}$ was 0.3 and $d=5$. We set self-connections ${w}_{ii}^{\mathrm{AMPA}}$ to zero throughout the simulations. The weights of NMDA current ${w}_{ij}^{\mathrm{NMDA}}$ were determined as ${w}_{ij}^{\mathrm{NMDA}}=0.2{w}_{ij}^{\mathrm{AMPA}}$ and fixed at these values throughout the simulations.
The weights of AMPA current were modified by STDP as
We simulated two different STDP types by changing the function ${\mathrm{f}}_{\mathrm{S}\mathrm{T}\mathrm{D}\mathrm{P}}\left({t}_{\mathrm{p}\mathrm{o}\mathrm{s}\mathrm{t}},{t}_{\mathrm{p}\mathrm{r}\mathrm{e}}\right)$ as follows.
Asymmetric STDP:
Symmetric STDP:
Parameter values were $\eta =0.05$, ${\tau}_{\mathrm{w}}=1000\,\mathrm{ms}$, ${A}_{+}=1$, ${A}_{-}=0.5$, ${\tau}_{+}=20\,\mathrm{ms}$ and ${\tau}_{-}=40\,\mathrm{ms}$ for all STDP types. We took into account the contributions of all spike pairs in the simulations. When we simulated STDP without the modulation by short-term plasticity, we removed ${D}_{j}{F}_{j}$ from the above equation and changed the value of $\eta $ to 0.01.
In the simulation with short time constants (Figure 2—figure supplement 1A), we changed the values of parameters to ${\tau}_{\mathrm{AMPA}}=2.5\,\mathrm{ms}$, ${\tau}^{\mathrm{inh}}=5\,\mathrm{ms}$, ${w}_{\mathrm{max}}=0.35$ and ${w}_{ij}^{\mathrm{NMDA}}=0$.
Evaluation of weight changes with Poisson spike trains (Figures 3 and 4)
In each simulation, we sampled spike trains of 21 neurons (neurons #1 to #21) 100 times for given values of the number of spikes per neuron ${N}_{\mathrm{spike}}$, mean inter-spike interval (ISI) ${t}_{\mathrm{ISI}}$, and firing propagation speed ${t}_{\mathrm{speed}}$. We set the first spike of neuron #$n$ to ${t}_{n,1}^{\mathrm{f}}=\left(n-1\right){t}_{\mathrm{speed}}$. Following the first spike, a Poisson spike train for each neuron was simulated by sampling ISIs ($\mathrm{\Delta}{t}_{n,k}^{\mathrm{f}}={t}_{n,k}^{\mathrm{f}}-{t}_{n,k-1}^{\mathrm{f}}$) from the following exponential distribution:
We imposed an absolute refractory period by resampling the ISI if it was shorter than 1 ms. After we generated spike trains, we simulated neurotransmitter release by solving Equations (19) and (20) for each neuron. In Figure 3, parameter values of short-term plasticity were ${\tau}_{\mathrm{STD}}=150\,\mathrm{ms}$, ${\tau}_{\mathrm{STF}}=40\,\mathrm{ms}$ and $U=0.37$. In Figure 4, we sampled the values of $U$, ${\tau}_{\mathrm{STD}}$ and ${\tau}_{\mathrm{STF}}$ from [0.1, 0.6], [50 ms, 500 ms] and [10 ms, 300 ms], respectively.
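The sampling procedure for the sequential spike trains can be sketched as below, assuming that the exponential distribution has mean ${t}_{\mathrm{ISI}}$. The function and argument names are hypothetical, and neurons are indexed from 0 here, so neuron `n` corresponds to neuron #(n+1) in the text.

```python
import random

def sample_spike_trains(n_neurons, n_spikes, t_isi, t_speed,
                        t_ref=1.0, rng=None):
    """Sequential Poisson-like spike trains (all times in ms).

    Neuron n (0-based) fires its first spike at n * t_speed. Later spikes
    follow exponentially distributed ISIs with mean t_isi, and an ISI is
    resampled whenever it is shorter than the refractory period t_ref.
    """
    rng = rng or random.Random()
    trains = []
    for n in range(n_neurons):
        t = n * t_speed
        train = [t]
        for _ in range(n_spikes - 1):
            isi = rng.expovariate(1.0 / t_isi)   # rate = 1 / mean ISI
            while isi < t_ref:
                isi = rng.expovariate(1.0 / t_isi)
            t += isi
            train.append(t)
        trains.append(train)
    return trains
```

Passing a seeded generator, for example `sample_spike_trains(21, 5, 10.0, 5.0, rng=random.Random(0))`, makes a sample reproducible.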
We calculated changes in the weight from the neuron in the center ($j=11$) to the neuron $i$ as
In alltoall STDP, we calculated the above summation over all spike pairs. In nearestneighbor STDP (Izhikevich and Desai, 2003a), we considered only pairs of a presynaptic spike and the nearest postsynaptic spikes before and after the presynaptic spike. For symmetric STDP, ${\mathrm{f}}_{\mathrm{S}\mathrm{T}\mathrm{D}\mathrm{P}}\left({t}_{\mathrm{p}\mathrm{o}\mathrm{s}\mathrm{t}},{t}_{\mathrm{p}\mathrm{r}\mathrm{e}}\right)$ was Gaussian (Mishra et al., 2016)
where $A=1$ and $\tau=70\ \mathrm{ms}$. For asymmetric STDP, $\mathrm{f}_{\mathrm{STDP}}\left(t_{\mathrm{post}},t_{\mathrm{pre}}\right)$ was the same as in Equation (24) except that the parameter values were changed to $A_{+}=0.777$, $A_{-}=0.273$, $\tau_{+}=16.8\ \mathrm{ms}$ and $\tau_{-}=33.7\ \mathrm{ms}$ (Bi and Poo, 2001). We calculated the weight bias for each spike train as $\sum_{i=1}^{10}\Delta_{ij}-\sum_{i=12}^{21}\Delta_{ij}$.
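To make the difference between all-to-all and nearest-neighbor pairing concrete, the following Python sketch evaluates a symmetric Gaussian window and the weight bias on toy spike trains. It is only an illustration under assumptions: the exact form of the Gaussian (here $A\exp(-\frac{1}{2}((t_{\mathrm{post}}-t_{\mathrm{pre}})/\tau)^2)$), all function names, and the toy spike times are hypothetical, and the modulation by neurotransmitter release from Equations (19) and (20) is deliberately omitted.

```python
import numpy as np

A, TAU = 1.0, 70.0  # amplitude and width (ms) quoted in the text

def f_stdp_sym(t_post, t_pre):
    """Symmetric Gaussian STDP window (functional form assumed)."""
    return A * np.exp(-0.5 * ((t_post - t_pre) / TAU) ** 2)

def dw_all_to_all(pre, post):
    """Sum the window over every (pre, post) spike pair."""
    return float(f_stdp_sym(post[None, :], pre[:, None]).sum())

def dw_nearest(pre, post):
    """For each presynaptic spike, pair it only with the nearest
    postsynaptic spike before it and the nearest one after it."""
    post = np.sort(post)
    total = 0.0
    for t in pre:
        k = np.searchsorted(post, t)      # post[k-1] < t <= post[k]
        if k > 0:
            total += f_stdp_sym(post[k - 1], t)
        if k < len(post):
            total += f_stdp_sym(post[k], t)
    return float(total)

def weight_bias(trains, j=10, dw=dw_all_to_all):
    """Sum of changes onto neurons #1-#10 minus the sum onto #12-#21.
    The 0-based index j=10 is the central neuron #11."""
    d = np.array([dw(trains[j], trains[i]) for i in range(len(trains))])
    return d[:j].sum() - d[j + 1:].sum()

# Toy, perfectly regular sequence: 21 neurons, 3 spikes each (ms).
trains = np.arange(21)[:, None] * 5.0 + np.array([0.0, 20.0, 40.0])
bias = weight_bias(trains)
```

With a purely symmetric window and no short-term plasticity, this regular toy sequence produces no bias, which is consistent with the point that the bias arises only when the window is combined with release dynamics.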
Evaluation of weight changes with theta-modulated Poisson spike trains during run (Figure 5)
In each simulation, we sampled one-second-long spike trains of 81 neurons (neurons #1 to #81) 100 times. Firing probabilities were calculated and spikes were sampled in 1 ms time bins. We assumed a constant running speed of the rat and expressed the animal’s current position by the current time $t$ [s]. We defined the theta phase at time $t$ as $\theta\left(t\right)=2\pi\times 8t+c$, where $c$ is a random offset. The place field of each neuron was given by a normalized Gaussian function
The place-field center of neuron $i$ was $\mu_{i}=2\times\frac{i-1}{80}-0.5$, which spanned from $-0.5$ to $1.5$, and $\sigma_{\mathrm{PF}}=0.2$. We simulated theta phase-locking with the von Mises distribution function (Bishop, 2010), which is a periodic version of the Gaussian function
where $\mathrm{I}_{0}\left(\beta\right)$ is the zeroth-order modified Bessel function of the first kind. The mean firing phase of neuron $i$ at time $t$ changed through time as $p_{i}\left(t\right)=\pi\left(\mu_{i}-t\right)$. Using these functions, we determined the firing rate of neuron $i$ at time $t$ as $\alpha PF_{i}\left(t\right)PL_{i}\left(t\right)$ [kHz]. In this setting, $\alpha$ and $\beta$ were the tuning parameters that control the peak firing rate and the phase selectivity, respectively. The sampling ranges were $0.01\le\alpha\le 0.15$ and $0.1\le\beta\le 10$, and the peak firing rates were calculated by smoothing the spike trains of the central neuron (#41) with a Gaussian kernel with a standard deviation of 50 ms. Changes and biases of weights from the central neuron ($j=41$) were calculated in the same way as in the previous section. In the evaluation of the biases summed over 10 overlapping place cells, we sampled 10 spike trains for the central neuron (#41) and summed the weight biases calculated from these samples.
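A possible way to generate such theta-modulated spike trains is sketched below in Python. The sketch rests on several assumptions that are not confirmed by the text: the place field is taken as a Gaussian density that integrates to one, the phase-locking term is the standard von Mises density centered on $p_i(t)=\pi(\mu_i-t)$, and a spike is drawn in each 1 ms bin with probability equal to the rate in kHz. The function names and the default values of `alpha` and `beta` are hypothetical.

```python
import numpy as np

def firing_rates(alpha=0.1, beta=1.0, n_neurons=81, sigma_pf=0.2, c=0.0):
    """Return (t, rate), with rate[i, k] in kHz for neuron i in time bin k."""
    t = np.arange(1000) / 1000.0                           # 1 ms bins over 1 s [s]
    mu = 2.0 * np.arange(n_neurons) / (n_neurons - 1) - 0.5  # centers in [-0.5, 1.5]
    theta = 2.0 * np.pi * 8.0 * t + c                      # 8 Hz theta phase
    # Normalized Gaussian place field (assumed to integrate to one).
    pf = np.exp(-(t[None, :] - mu[:, None]) ** 2 / (2.0 * sigma_pf ** 2))
    pf /= np.sqrt(2.0 * np.pi * sigma_pf ** 2)
    # von Mises phase locking around the preferred phase p_i(t).
    p = np.pi * (mu[:, None] - t[None, :])
    pl = np.exp(beta * np.cos(theta[None, :] - p)) / (2.0 * np.pi * np.i0(beta))
    return t, alpha * pf * pl

def sample_spikes(rate_khz, rng):
    """One Bernoulli draw per 1 ms bin: P(spike) = rate [kHz] x 1 ms, clipped to 1."""
    return rng.random(rate_khz.shape) < np.clip(rate_khz, 0.0, 1.0)

rng = np.random.default_rng(0)
c = rng.uniform(0.0, 2.0 * np.pi)          # random theta offset
t, rate = firing_rates(c=c)
spikes = sample_spikes(rate, rng)          # boolean array, shape (81, 1000)
```

Larger values of `beta` concentrate firing around the preferred phase, while `alpha` scales the overall rate, which matches the roles the text assigns to the two tuning parameters.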
Simulation of learning forward synaptic pathways through reverse replay (Figure 6)
We used the same network model as in Figure 1 and additionally implemented normalization of synaptic weights for homeostasis. We calculated the sum of incoming weights on each neuron ($\sum_{j}w_{ij}$) at each time step and normalized the weights as $w_{ij}\leftarrow\frac{\sum_{j}w_{ij}^{\mathrm{init}}}{\sum_{j}w_{ij}}w_{ij}$, where $\left\{w_{ij}^{\mathrm{init}}\right\}$ denotes the initial synaptic weights. We changed the time constant of long-term plasticity ($\tau_{\mathrm{w}}$) to 5000 ms in condition 1 and to 500 ms in conditions 2 and 3. In condition 3, we also changed the parameters of short-term plasticity and the initial synaptic weights to $\tau_{\mathrm{STD}}=200\ \mathrm{ms}$, $U=0.3$ and $w_{\mathrm{max}}=30$. Neurons were stimulated every 1 s. Weight biases were calculated as $\sum_{i<j}w_{ij}-\sum_{i>j}w_{ij}$ for two neurons ($j=100, 400$).
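The homeostatic normalization and the weight-bias measure can be written compactly with array operations. The Python sketch below is illustrative only: the function names are hypothetical, the matrix convention `w[i, j]` (weight from neuron `j` to neuron `i`) follows the notation of the text, and every neuron is assumed to have a nonzero sum of incoming weights.

```python
import numpy as np

def normalize_incoming(w, w_init):
    """Rescale each row i so that sum_j w[i, j] equals sum_j w_init[i, j]."""
    target = w_init.sum(axis=1, keepdims=True)
    current = w.sum(axis=1, keepdims=True)   # assumed nonzero for every neuron
    return w * target / current

def weight_bias(w, j):
    """sum_{i<j} w[i, j] - sum_{i>j} w[i, j] for presynaptic neuron j (0-based)."""
    return w[:j, j].sum() - w[j + 1:, j].sum()

# Toy example with 5 neurons.
rng = np.random.default_rng(0)
w_init = rng.random((5, 5))
w = rng.random((5, 5))
w_norm = normalize_incoming(w, w_init)
```

Because the rescaling acts on incoming weights per neuron, it preserves the relative strengths of a neuron's inputs while keeping its total input constant.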
Simulations of goal-directed sequence learning on a W-maze (Figures 7 and 8)
We simulated an animal moving in a two-dimensional space spanned by $x$ and $y$ ($0\le x\le 50$, $0\le y\le 50$). The coordinates of the six corners (A, B, C1, C2, D1, D2) of the W-maze were $\mathbf{z}_{\mathrm{A}}=(25,15)$, $\mathbf{z}_{\mathrm{B}}=(25,35)$, $\mathbf{z}_{\mathrm{C1}}=(45,35)$, $\mathbf{z}_{\mathrm{D1}}=(45,15)$, $\mathbf{z}_{\mathrm{C2}}=(5,35)$ and $\mathbf{z}_{\mathrm{D2}}=(5,15)$. In each set of trials with time length $T=15\ \mathrm{s}$, we determined the position of the animal $\mathbf{z}_{\mathrm{pos}}$ at time $t'=t \bmod T$ as
We set $\mathbf{z}_{\mathrm{X}}=\mathbf{z}_{\mathrm{C1}}$ and $\mathbf{z}_{\mathrm{Y}}=\mathbf{z}_{\mathrm{D1}}$ in the $(2n+1)$th trials, and $\mathbf{z}_{\mathrm{X}}=\mathbf{z}_{\mathrm{C2}}$ and $\mathbf{z}_{\mathrm{Y}}=\mathbf{z}_{\mathrm{D2}}$ in the $(2n)$th trials.
The neural network consisted of 2500 place cells arranged on a 50 × 50 two-dimensional square lattice. The place-field center of the neuron in the $i$-th column (x-axis) and the $j$-th row (y-axis) was denoted as $\mathbf{z}_{i,j}=\left(i,j\right)$. Each place cell received excitatory connections from its eight surrounding neurons. The connection weight from the neuron at $\left(i-k,j-l\right)$ to the neuron at $\left(i,j\right)$ was denoted as $w_{i,j}^{k,l}$, where the possible combinations of $(k,l)$ were given as $S=\left\{(-1,-1),(-1,0),(-1,1),(0,-1),(0,1),(1,-1),(1,0),(1,1)\right\}$. Initial connection weights were uniformly random and normalized such that the sum of the eight connections obeyed $\sum_{\left(k,l\right)\in S}w_{i,j}^{k,l}=0.5$.
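The lattice connectivity and the initial normalization can be sketched in Python as follows. This is an assumption-laden illustration: the array layout `w[i, j, m]` (weight onto neuron `(i, j)` from the neighbor given by the `m`-th offset in `S`), the function name `init_weights`, and the treatment of neurons at the lattice edge (all eight weights are stored even where a neighbor does not exist) are hypothetical choices, since the text does not specify boundary handling.

```python
import numpy as np

# Offsets (k, l) of the eight surrounding neurons; the weight w[i, j, m]
# belongs to the connection from (i - k, j - l) to (i, j) with (k, l) = S[m].
S = [(k, l) for k in (-1, 0, 1) for l in (-1, 0, 1) if (k, l) != (0, 0)]

def init_weights(n=50, total=0.5, rng=None):
    """Uniformly random weights, rescaled so that the eight incoming
    weights of every neuron sum to `total`."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.random((n, n, len(S)))
    return total * w / w.sum(axis=2, keepdims=True)

w = init_weights(rng=np.random.default_rng(0))
```

Storing the weights per target neuron makes both this initial normalization and any later renormalization of incoming weights a single operation along the last axis.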
We simulated activities of place cells ${r}_{i,j}$ as
The time constant $\tau$ was $10\ \mathrm{ms}$. The function $\mathrm{f}_{\mathrm{rate}}\left(I\right)$ was a threshold-linear function
where $\rho=1$ and $\epsilon=0.002$. Inhibitory feedback $I^{\mathrm{inh}}$ followed
where $\tau^{\mathrm{inh}}=10\ \mathrm{ms}$ and $w^{\mathrm{inh}}=0.0005$. The variables for short-term synaptic plasticity, $D_{i,j}$ and $F_{i,j}$, obeyed
with parameter values $\tau_{\mathrm{STD}}=300\ \mathrm{ms}$, $\tau_{\mathrm{STF}}=200\ \mathrm{ms}$ and $U=0.4$. We induced theta oscillation by
where $B=0.005\ \mathrm{kHz}$ and $t_{\mathrm{theta}}=\frac{1000}{7}\ \mathrm{ms}$. $I_{i,j}^{\mathrm{noise}}$ was independent Gaussian noise with a standard deviation of 0.0005 kHz. We determined the place-dependent input to each neuron, $I_{i,j}^{\mathrm{place}}$, from the place-field center of the neuron $\mathbf{z}_{i,j}$ and the current position of the animal $\mathbf{z}_{\mathrm{pos}}$:
where $d=2$. The parameter $C$ was set to 0.005 kHz when the animal was moving and to 0.001 kHz when the animal was stopping at D2 (the position of the reward). When the animal was stopping at other positions, $C$ was set to zero but was occasionally changed to 0.001 kHz for a short interval of 200 ms. The occurrence of this brief activation followed a Poisson process at 0.1 Hz, but it always occurred one second after the onset of each trial to trigger prospective firing sequences.
We implemented the Hebbian synaptic plasticity as
where $\eta=1$ and $\tau_{\mathrm{w}}=30\ \mathrm{s}$. If the sum of synaptic weights on a neuron ($\sum_{\left(k,l\right)\in S}w_{i,j}^{k,l}$) was greater than unity, we renormalized its synaptic weights by dividing them by the sum. When we simulated Hebbian synaptic plasticity without modulation by short-term plasticity, $D_{i-k,j-l}F_{i-k,j-l}$ was removed from the above equation and the value of $\eta$ was changed to 0.1.
We calculated the 'connection vector' of each neuron, $\mathbf{u}_{i,j}$, by the following weighted sum of unit vectors $\mathbf{v}_{k,l}=\left(\frac{k}{\sqrt{k^{2}+l^{2}}},\frac{l}{\sqrt{k^{2}+l^{2}}}\right)$:
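As an illustration of the connection vector, the Python sketch below assumes that $\mathbf{u}_{i,j}$ is the plain weighted sum $\sum_{(k,l)\in S}w_{i,j}^{k,l}\mathbf{v}_{k,l}$; any additional normalization or sign convention used by the authors is not known from the text. The array layout `w[i, j, m]` and the function name `connection_vectors` are hypothetical.

```python
import numpy as np

# Offsets (k, l) of the eight surrounding neurons, in a fixed order.
S = [(k, l) for k in (-1, 0, 1) for l in (-1, 0, 1) if (k, l) != (0, 0)]

# Unit vectors v_{k,l} = (k, l) / sqrt(k^2 + l^2), one row per offset.
V = np.array(S, dtype=float)
V /= np.linalg.norm(V, axis=1, keepdims=True)

def connection_vectors(w):
    """w has shape (n, n, 8); returns u with shape (n, n, 2),
    where u[i, j] = sum_m w[i, j, m] * V[m] (assumed weighted sum)."""
    return w @ V

# A neuron with uniform weights has no preferred direction ...
u_uniform = connection_vectors(np.full((3, 3, 8), 0.125))
# ... whereas a single nonzero weight at offset (1, 0) points along +x.
w_single = np.zeros((1, 1, 8))
w_single[0, 0, S.index((1, 0))] = 1.0
u_single = connection_vectors(w_single)
```

Under this assumption, the length of the vector reflects how anisotropic a neuron's incoming weights are, and its direction summarizes the dominant offset.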
Simulations of goal-directed sequence learning in a 2D space (Figures 9 and 10)
We used a two-dimensional space and a 50 × 50 neural network model similar to those used in the simulation of the W-maze, with some minor changes: (1) We normalized the initial connection weights as $\sum_{\left(k,l\right)\in S}w_{i,j}^{k,l}=1$. (2) In order to create finite-length firing sequences, we added an external inhibitory input $I^{\mathrm{extinh}}$ to Equation (33). When the animal was immobile, we kept the excitatory input $I_{i,j}^{\mathrm{place}}$ nonzero ($C=0.001$ kHz), triggered sequences by disinhibiting the network ($I^{\mathrm{extinh}}=0$) every 1 s, and terminated them ($I^{\mathrm{extinh}}=0.1$) 800 ms after each trigger.
In the simulation of divergent sequences (Figure 9), the parameters of Hebbian plasticity were changed to $\eta=0.5$ and $\tau_{\mathrm{w}}=10\ \mathrm{s}$. We triggered firing sequences at the point $\left(x,y\right)=\left(25,25\right)$ for 30 s.
In the simulation of the foraging task (Figure 10), the parameters of Hebbian plasticity were changed to $\eta=0.1$ and $\tau_{\mathrm{w}}=10\ \mathrm{s}$. Four candidate reward sites were positioned at $\left(x,y\right)=\left(15,15\right),\left(15,35\right),\left(35,15\right),\left(35,35\right)$. The reward position was randomly determined in each simulation. The animal could explore the space defined by $5\le x\le 45$, $5\le y\le 45$. We randomly determined a starting position in this area such that the initial distance to the reward position was longer than 10, and the animal began to explore after a 3-s-long period of immobility. When the animal came within a distance of 3 from the reward, it was set immobile for 15 s and then reset to the starting position of the next trial. We terminated a trial when the animal did not reach the reward within 300 s, and regarded the exploration time of such trials as 300 s. We excluded these trials from the analysis of angular displacements.
We calculated the activity vector in the 2D coordinate system as
when $\sum _{i,\mathit{j}}{r}_{i,j}>0$. We changed the animal’s position ${\mathbf{z}}_{\mathrm{p}\mathrm{o}\mathrm{s}}$ through the velocity vector $\mathbf{v}$ as
and rotated the velocity vector towards the direction of the activity vector:
where $\mathbf{v}$ was normalized to unity at each time step and $\mathbf{a}_{\mathrm{noise}}$ is two-dimensional independent standard Gaussian noise. The speed $\gamma_{\mathrm{v}}$ was 0.01 during exploration and 0 during immobility. The other parameter values were $\gamma_{\mathrm{a}}=0.01$ and $\gamma_{\mathrm{noise}}=0.05$ throughout the simulation. We calculated angular displacements from the cosine similarity between $\mathbf{a}$ and the reference vector $\boldsymbol{\rho}$ every 20 ms. Before and during exploration, $\boldsymbol{\rho}$ was a vector from $\mathbf{z}_{\mathrm{pos}}$ to the reward position. At the reward position, $\boldsymbol{\rho}$ was a vector from the current $\mathbf{z}_{\mathrm{pos}}$ to the mean of $\mathbf{z}_{\mathrm{pos}}$ within 3 seconds before reaching the reward position. We excluded the periods in which the maximum firing rate in the network was below 0.01 kHz or the length of $\mathbf{a}$ was below 1. Due to the small size of the network and the 2D space, angular displacements were not uniform even in the control simulations. To compensate for these background biases, we calculated the mean angular displacements of the control simulations and subtracted those baseline values from the mean angular displacements obtained in each condition.
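The angular displacement and the heading update can be illustrated with the short Python sketch below. Both functions are hedged sketches rather than the authors' code: the angle is reported in degrees as an arbitrary choice, and the update rule (add $\gamma_{\mathrm{a}}\mathbf{a}$ and $\gamma_{\mathrm{noise}}\mathbf{a}_{\mathrm{noise}}$ to $\mathbf{v}$, then renormalize) is an assumed form of the rotation equation, which is not reproduced in the text. The function names are hypothetical.

```python
import numpy as np

def angular_displacement(a, rho):
    """Angle in degrees between the activity vector a and the reference
    vector rho, obtained from their cosine similarity."""
    cos_sim = np.dot(a, rho) / (np.linalg.norm(a) * np.linalg.norm(rho))
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0)))

def update_heading(v, a, rng, gamma_a=0.01, gamma_noise=0.05):
    """Assumed update: nudge v toward a, add 2D Gaussian noise,
    and renormalize v to unit length."""
    v = v + gamma_a * a + gamma_noise * rng.standard_normal(2)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
v = update_heading(np.array([1.0, 0.0]), np.array([0.0, 1.0]), rng)
angle = angular_displacement(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

An angular displacement near zero indicates that the activity vector points along the reference direction, for example toward the reward during exploration.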
We applied the paired-sample t-test after checking the normality of the data with the Shapiro–Wilk test (p>0.05).
Code availability
Simulations and visualization were written in C++ and Python 3. The code is available at https://github.com/TatsuyaHaga/reversereplaymodel_codes (Haga, 2018; copy archived at https://github.com/elifesciencespublications/reversereplaymodel_codes).
References
2. Synaptic modification by correlated activity: Hebb's postulate revisited. Annual Review of Neuroscience 24:139–166. https://doi.org/10.1146/annurev.neuro.24.1.139
4. A model of spatial map formation in the hippocampus of the rat. Neural Computation 8:85–93. https://doi.org/10.1162/neco.1996.8.1.85
9. Falcon: a highly flexible open-source software for closed-loop neuroscience. Journal of Neural Engineering 14:045004. https://doi.org/10.1088/17412552/aa7526
15. Forward and reverse hippocampal place-cell sequences during ripples. Nature Neuroscience 10:1241–1242. https://doi.org/10.1038/nn1961
16. A note on two problems in connexion with graphs. Numerische Mathematik 1:269–271. https://doi.org/10.1007/BF01386390
19. Sequence learning and the role of the hippocampus in rodent navigation. Current Opinion in Neurobiology 22:294–300. https://doi.org/10.1016/j.conb.2011.12.005
23. Contribution of individual spikes in burst-induced long-term synaptic modification. Journal of Neurophysiology 95:1620–1629. https://doi.org/10.1152/jn.00910.2005
24. Learning navigational maps through potentiation and modulation of hippocampal place cells. Journal of Computational Neuroscience 4:79–94. https://doi.org/10.1023/A:1008820728122
25. Selective suppression of hippocampal ripples impairs spatial memory. Nature Neuroscience 12:1222–1223. https://doi.org/10.1038/nn.2384
28. The role of acetylcholine in learning and memory. Current Opinion in Neurobiology 16:710–715. https://doi.org/10.1016/j.conb.2006.09.002
31. Spike-timing dynamics of neuronal groups. Cerebral Cortex 14:933–944. https://doi.org/10.1093/cercor/bhh053
32. Simple model of spiking neurons. IEEE Transactions on Neural Networks 14:1569–1572. https://doi.org/10.1109/TNN.2003.820440
34. A unified dynamic model for learning, replay, and Sharp-Wave/Ripples. Journal of Neuroscience 35:16236–16258. https://doi.org/10.1523/JNEUROSCI.397714.2015
36. Hippocampal sequence-encoding driven by a cortical multi-item working memory buffer. Trends in Neurosciences 28:67–72. https://doi.org/10.1016/j.tins.2004.12.001
37. Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point. Journal of Neuroscience 27:12176–12189. https://doi.org/10.1523/JNEUROSCI.376107.2007
38. Awake replay of remote experiences in the hippocampus. Nature Neuroscience 12:913–918. https://doi.org/10.1038/nn.2344
40. The involvement of recurrent connections in area CA3 in establishing the properties of place fields: a model. The Journal of Neuroscience 20:7463–7477. https://doi.org/10.1523/JNEUROSCI.201907463.2000
42. A neo-Hebbian framework for episodic memory; role of dopamine-dependent late LTP. Trends in Neurosciences 34:536–547. https://doi.org/10.1016/j.tins.2011.07.006
44. Silencing CA3 disrupts temporal coding in the CA1 ensemble. Nature Neuroscience 19:945–951. https://doi.org/10.1038/nn.4311
48. How reward can induce reverse replay of behavioral sequences in the hippocampus. International Conference on Neural Information Processing, 1–10. https://doi.org/10.1007/11893028_1
54. A lognormal recurrent network model for burst generation during hippocampal sharp waves. Journal of Neuroscience 35:14585–14601. https://doi.org/10.1523/JNEUROSCI.494414.2015
60. Path integration and cognitive mapping in a continuous attractor neural network model. The Journal of Neuroscience 17:5900–5920. https://doi.org/10.1523/JNEUROSCI.171505900.1997
62. Memory encoding by theta phase precession in the hippocampal network. Neural Computation 15:2379–2397. https://doi.org/10.1162/089976603322362400
63. Loss of recent memory after bilateral hippocampal lesions. Journal of Neurology, Neurosurgery & Psychiatry 20:11–21. https://doi.org/10.1136/jnnp.20.1.11
67. The hippocampus as a predictive map. Nature Neuroscience 20:1643–1653. https://doi.org/10.1038/nn.4650
68. Learning to predict by the methods of temporal differences. Machine Learning 3:9–44. https://doi.org/10.1007/BF00115009
70. Associative memory and hippocampal place cells. International Journal of Neural Systems 6:81–86.
74. Coactivation and timing-dependent integration of synaptic potentiation and depression. Nature Neuroscience 8:187–193. https://doi.org/10.1038/nn1387
75. Theta sequences are essential for internally generated hippocampal firing fields. Nature Neuroscience 18:282–288. https://doi.org/10.1038/nn.3904
77. Hippocampal theta sequences reflect current goals. Nature Neuroscience 18:289–294. https://doi.org/10.1038/nn.3909
78. Hippocampal awake replay in fear memory retrieval. Nature Neuroscience 20:571–580. https://doi.org/10.1038/nn.4507
79. Hippocampal replay captures the unique topological structure of a novel environment. Journal of Neuroscience 34:6459–6469. https://doi.org/10.1523/JNEUROSCI.341413.2014
Decision letter

Upinder Singh Bhalla, Reviewing Editor; National Centre for Biological Sciences, Tata Institute of Fundamental Research, India

Michael J Frank, Senior Editor; Brown University, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for sending your article entitled "Recurrent network model for learning goal-directed sequences through reverse replay" for peer review at eLife. Your article is being evaluated by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation is being overseen by Michael Frank as the Senior Editor.
The reviewers felt that the topic of the paper, to obtain reverse replay through modified STDP rules, was interesting and led to some potentially significant predictions of network behavior. However, there were numerous concerns with the study, particularly relating to clarification of figures and results, and consistency with the literature.
Given the list of essential revisions, which could conceivably involve extensive new work, the editors and reviewers invite you to respond within the next two weeks with an action plan and timetable for the completion of the additional work. We plan to share your responses with the reviewers and then issue a binding recommendation.
1) The authors must clear up several points of confusion in data presentation. a) Figure 6 is confusing in many respects and can be interpreted in different ways from the authors. b) The section on theta modulation and sequences is unclear and must be clarified.
2) The authors must address concerns about apparent inconsistency with the experimental literature. These include the time-course of excitatory input, the presence of forward replays in many studies, the conjunctive coding of space and direction of motion by place cells on linear tracks, and whether replay should correlate with previous experience or future navigation. There are also concerns about whether the learning rules are appropriate for hippocampus, and whether the firing patterns in the model look like in-vivo place cell patterns.
3) The authors should make some more testable predictions, for example, the effect of NMDA knockdown on reverse replay.
These concerns and other suggestions from the reviewers are attached to help the authors.
Reviewer #1:
In this study, the authors show that one can obtain reverse replay in 1D and 2D networks. This relies on symmetric STDP in combination with various forms of STP or postsynaptic after depolarization. The authors also map their learning rules to a T-maze context using a 2D network and state that the network learns to do goal-directed path learning through reverse sequences. They examine how network connections organize following such learning. In principle this study is interesting as an implementation of a plasticity-driven approach to the emergence of forward and reverse replay, and provides a way to link it to goal planning.
The initial logic of the paper builds up nicely from Figure 1 through 5. One can see how to obtain reverse replay, there is evidence that this is reasonably robust, and one can see conditions where the replay will erase itself due to plasticity.
Figure 6 is a key figure, applying the learning rule to a 2D network upon which the authors place a T maze. This is a key figure and unfortunately is very confusing.
1) The authors talk about early and late trials. I am going to assume that only trial 9 and 10 are late trials, but the authors must clarify this point.
2) There is a listing of positions A, B, C1, C2, D1, D2 in Panel A mapping to positions 0, 1, 2, 3, 4, 5 in panels C and D. It is strange to have to jump around with the naming within a single figure.
3) Worse, the mapping is different in odd trials and even trials because position 3 from panel C can either be mapped to D1 (panel A) or to B (panel A). Similarly, position B of panel A could be either position 1 or position 3 of Panels C and D. This makes no sense.
4) The authors make several statements about the reverse replay sequences that are hard to identify in the figure. They should individually highlight specific reverse sequences that they want to talk about.
5) I do not see any case in the later trials where sequences travel to D2 except possibly in the first couple of spontaneous responses in trial 9 and one spontaneous trial in 10. Each of these occurs before the training run. Instead almost all the cases, e.g., in Trial 10, start from D2 and go to B. This is a perfectly reasonable reverse replay but is not presented as such in the text.
6) I do not see any case where backward sequences start from D1 and go to D2, unless the authors are conflating position 3 with D1, rather than B. If so then it is the 3 cases (2 in Trial 9 and 1 in Trial 10, which occur before the training run) which fit the bill. If so, those 3 cases look like forward sequences to me.
7) The text says that some of the reverse replay sequences from D1 propagate into D2 instead of the stem arm. I do not see any instances of this, except again if the authors have confused the identity of position 3.
8) The supplementary video looks interesting but lacks annotation to give clarity. Its value is considerably diminished as a result. Nevertheless, my impression on watching it is that my interpretation of Figure 6 is correct, and that the authors have confused position B and position D1.
9) The key paragraph three in subsection “Goal-directed path learning through reverse replay” is very anecdotal. For every statement of a certain kind of replay, the authors need to first, point to examples, and second, give statistics for how often such replays occur in a series of randomized runs.
It may well be that I am quite misunderstanding this figure, in which case the authors should explain it more clearly. Otherwise I think the figure and movie do not support the text.
Reviewer #2:
The manuscript "Recurrent network model for learning goal-directed sequences through reverse replay" proposes an intriguing mechanism for reverse replay of the sequential activation of place cells: combining spike-timing dependent plasticity with synaptic depression, the proposal envisions a wave packet of activity traveling through the network of neurons, such that neurons at the tail end of the packet still fire, but due to synaptic depression, no longer synaptically impinge on the neurons at the front of the packet. As a consequence, Hebbian plasticity strengthens front to back connections, hence enabling reverse replay upon reactivation. In a 2D model, the authors present a nice application of how such a network can produce sequences of place cell activations towards a goal on a path that the animal has never experienced.
The most critical assumption is not, in my opinion, the "rapid modulation of STDP" by synaptic depression, but rather the persistence of neural activity behind the immediate wavefront. Because of profound synaptic depletion, the fact is that there is no reverberant activity that would support the packet and cause neurons to continue to be active.
The trick appears to lie in a time constant of τ^{exc}=10 ms with which the excitatory input is convolved (Equation 3). Or, in the spiking network, an NMDA time constant of 150 ms (Equation 16), wherein the peak conductance for NMDA is slaved to the AMPA conductance.
The value of 10 ms for the time constant is, at the very least, debatable. Going back to classic papers (Koch, Rapp, Segev, 1996) or Treves (1993), the real excitatory synaptic time constant (as opposed to the membrane time constant of 10–20 ms) is extraordinarily short and on the order of 2 ms. With that kind of time constant, though, I believe the entire mechanism might collapse.
The second critique I would levy is that the model inhabits an intermediate realm: it is neither minimalist, nor veridically detailed.
In particular:
i) Equations 5–6 cover both facilitation and depression. As far as I can tell, facilitation is not at all necessary for the mechanism. Why is facilitation then included?
ii) Equations 12–13 describe the Izhikevich, 2003 model with the parameter set for the regular spiking cell (though the fact that it is the parameter set for RS is not explicitly mentioned). The only possible advantage of the Izhikevich model over an integrate-and-fire model might lie in the adaptation of the firing. But is this important in the model?
iii) Going from 1D to 2D in subsection “Goal-directed path learning through reverse replay”, the idea of theta modulation and theta sequences is sprung upon the reader, but it never became clear to me whether Equation 36 (for the theta-modulated current input) is really necessary or not.
The third critique reflects the color scheme. Throughout, inactivity is represented by black, which makes the figures hard to read in a printout or even on the screen. Please choose another color scheme that results in a white (or light-colored) background. Figure 6 is confusing, as it mixes letters on the W-shaped track (which, for some reason, is called a T-shaped track), but then the panels use numeric labels; the mapping is only explained in the last sentence of the caption. On some panels of Supplementary Figure 2, the tick labels on the y-axes cannot be read.
Reviewer #3:
The paper describes a modified version of spike-timing dependent long-term plasticity (STDP) modulated by short-term synaptic plasticity (STP). The main advantage of this modulation is to be able to obtain an effectively asymmetric STDP rule starting from a symmetric one (symmetric STDP has been recently observed in hippocampus). The authors show for instance how an imposed sequential activation of neurons can modify synapses in a network, so that a network can spontaneously "replay" the sequence in the opposite order, as observed in place cells.
I think that the results nicely characterize the properties of the hypothesized plasticity rule and show a potential application in neural networks.
I have two major concerns:
1) The connection with dynamics of hippocampal place cell activity strongly focuses on reverse replays, but there might be other aspects to consider more carefully:
i) Forward replays: The modeling presented in the first part of the paper, where a reverse replay is elicited by the stimulation of place cell coding for the middle of the track, should be probably compared to the results of Davidson et al., 2009. In that paper, the rodents frequently stop away from the two ends of the track; this more closely resembles the scenario depicted in the manuscript. The results of the paper indicated that forward replays were as (or more) likely to occur compared to backward replays (similar to Diba and Buzsaki, 2007), and the propagation speed of forward and backward replays was comparable. How can one reconcile the results reported in the manuscript with these observations?
ii) Directionality of firing in linear tracks: It is generally observed that place cells code conjunctively for spatial location and direction of motion in 1D. This aspect of place field firing has not been discussed and it's not entirely clear to me how to interpret the authors’ results in light of this experimental observation.
iii) Replays in 2D: Pfeiffer and Foster, 2013 reported replays that generally did not correlate with the previous experience of the animal, rather there was a correlation with the immediate future navigational behavior of the rodent. In those experiments, the goal location was moved from session to session, and those sequences were observed within the first few trials. In a follow-up commentary (Pfeiffer, 2017), it is argued that "reverse replay does not facilitate learning in a familiar environment". It would be useful to see those results and claims more carefully discussed in the manuscript.
iv) Theta sequences: During running, fast sequential activity (forward direction) of place cells within individual cycles of the theta oscillation are observed. I guess that before replays, these sequences could help in building up the asymmetric connections that later generates reverse replays. But what happens to theta sequences after the synapses are modified? Would they be less likely to occur? In general, I found the few references to theta sequences in the manuscript to be confusing.
2) The plasticity rule is loosely based on previous work in visual cortical synapses (Froemke et al., 2006). As such there is no evidence that this rule quantitatively captures the dynamics of synapses in neocortex or hippocampus. It would be helpful to test the plasticity rules with firing patterns more closely resembling the activity of in vivo place cells (e.g., fields of ~1s durations, peak firing ~10Hz, phase precessing) and realistic learning rates.
[Editors' note: the authors’ plan for revisions was approved and the authors made a formal revised submission.]
https://doi.org/10.7554/eLife.34171.027
Author response
The reviewers felt that the topic of the paper, to obtain reverse replay through modified STDP rules, was interesting and led to some potentially significant predictions of network behavior. However, there were numerous concerns with the study, particularly relating to clarification of figures and results, and consistency with the literature.
1) The authors must clear up several points of confusion in data presentation.a) Figure 6 is confusing in many respects and can be interpreted in differentways from the authors.
Following the reviewers’ suggestions, we have clarified the definitions of types of firing sequences and added quantitative evaluations in Figure 7 (Figure 6 in the previous manuscript). In short, reverse replay refer to sequences propagating backward along the spatial paths that the animal has traveled and joint replay is sequences through unexperienced paths combining two (or more) of experienced paths. Forward replay represents sequences that start from the starting point (A) and propagate along oncetraveled paths in the forward direction. We specifically call sequences from A to D2 (reward site) as “goaldirected sequence”. We have confirmed our conclusion by statistics from simulations of 10 model rats (5 rats visited the arms in the reversed visiting order) (Figure 7FH). These points are explained in paragraph four of subsection “Goaldirected path learning through reverse replay” in the revised manuscript.
We have clarified the relationship between the linearized plots and the three arms of the Ymaze by showing how we linearized the track in Figure 7C (we took a visualization method from Wu and Foster, 2014). We have also added explanations on the structure of the track, the number of trials, the schedule of rat’s movements, reward ON/OFF, and important activity patterns to the movie. In figures including heatmap (ex. Figures 1, 2 and 7), we have changed a color code in which white indicates zero.
b) The section on theta modulation and sequences is unclear and must be clarified.
We have added references for phase precession and theta sequences (e.g. O’Keefe and Recce, 1993; Dragoi and Buzsaki, 2006; Foster and Wilson, 2007), and have performed additional simulations to discuss the effect of phase precession and theta sequences in our model (Figure 5). We have evaluated the weight biases generated by spike trains mimicking place-cell activity with phase precession. The simulation results suggest that a bias to the reverse direction is reliably induced when the peak firing rate is >20 Hz (Figure 5B). However, since this value is higher than the average peak firing rate of place cells, we have additionally shown two biologically plausible cases in which significant biases can appear. First, the summation of the bias effects induced by multiple presynaptic place cells with overlapping place fields can result in a significant overall bias in realistic settings of the mean firing rate (Figure 5C). Second, because the firing rates of place cells are lognormally distributed (Mizuseki and Buzsaki, 2013), some place cells show higher firing rates and produce large weight changes with large biases. We have also found that phase precession hardly affects the weight biases (Figure 5B and 5C). All these points are explained in paragraph two of subsection “Bias effects induced by spike trains during run”.
Furthermore, simulations in 2D space have indicated the role of theta sequences for reading out the learned paths towards the goal (Figure 10), as suggested in experiments (Johnson and Redish, 2007; Wikenheiser and Redish, 2015). These points are explained in paragraph three of subsection “Unbiased sequence propagation enhances goal-directed behavior in a 2D space”.
2) The authors must address concerns about apparent inconsistency with the experimental literature. These include the time course of excitatory input, the presence of forward replays in many studies, the conjunctive coding of space and direction of motion by place cells on linear tracks, and whether replay should correlate with previous experience or future navigation. There are also concerns about whether the learning rules are appropriate for hippocampus, and whether the firing patterns in the model look like in vivo place cell patterns.
As for the time course of excitatory input, we have included simulation data to show that the bias to the reverse direction occurs even without NMDA current and with shorter AMPA and inhibitory time constants. Furthermore, we have evaluated the weight biases generated by realistic place-cell activity with phase precession, as mentioned above. We have added discussion on the directionality of place cells and the related experimental results (Davidson et al., 2009; Pfeiffer and Foster, 2013; Pfeiffer, 2017) in paragraph two of subsection “Some limitations of the present model”.
In particular, to connect our model to Pfeiffer and Foster, 2013 and Pfeiffer, 2017, we have additionally simulated learning in a 2D open field using a 2D neural network (see Figure 9). In a 2D open field, the directions of propagation of firing sequences from a reward site are often isotropic in our model, and are not limited to the (reverse) direction of the path that the animal has just experienced. This isotropic propagation will create connections converging to the reward site and bias both replays and theta sequences towards the reward site from an arbitrary surrounding point on the field. We suggest that these properties of our model account for experimental observations in open-field exploration tasks (Pfeiffer and Foster, 2013). We have added a new subsection "Unbiased sequence propagation enhances goal-directed behavior in a 2D space" to explain all these results together with a quantitative evaluation of the directional biases in replay and theta sequences.
3) The authors should make some more testable predictions, for example, the effect of NMDA knockdown on reverse replay.
Experimental evidence suggests that NMDA knockdown impairs the plasticity in the hippocampus and hence degrades the formation of replay (Silva et al., 2016) and the performance of spatial learning task (Morris et al., 1986). In that case, all replay events including reverse replay will disappear and they will become ineffective for learning, whatever the roles of these events are. Therefore, we think that predictions in such a condition may not be meaningful enough. Below, we list other possible predictions of the model, which we have included in the subsection "Testable predictions of the model".
Selective control of reverse replay is now possible using techniques of real-time decoding feedback (Ciliberti and Kloosterman, 2017; Ciliberti, Michon and Kloosterman, 2016). First, our model predicts the consequence of such a control. For instance, if we selectively block reverse replay at a reward site, prospective firing sequences tend to propagate away from the spatial site, and accordingly the animal’s preference for the reward location will be abolished.
Second, as shown in Figure 1A-D, our model predicts that the modulation of STDP by short-term depression is crucial for the preferential strengthening of synaptic pathways in the direction opposite to the preceding activity propagation. In addition, based on the plasticity mechanisms shown in Figure 1E and 1F, we expect that this preference should be further enhanced in our model if the time window of symmetric STDP is extended into the acausal temporal domain (t_{post} < t_{pre}). We speculate that some neuromodulators (e.g., dopamine or acetylcholine) may cause such a metaplasticity effect in the STDP of CA3.
Third, modulation of the triggering of replay events (or sharp-wave ripples) is crucial for goal-directed learning in our model. Recently, it has been revealed that CA2 (Oliva et al., 2016) and dentate gyrus (Sasaki et al., 2018) trigger a significant number of sharp-wave ripples in the awake state. Therefore, we predict that CA2 and dentate gyrus also play active roles in the goal selection of hippocampal path learning.
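The second prediction can be illustrated with a toy calculation. The following Python sketch is not the model from the manuscript: it assumes two neurons firing identical short bursts with a fixed lag, an exponential symmetric STDP window, and a multiplicative weighting of each spike pairing by the presynaptic depression variable. All names and parameter values (`TAU_STDP`, `TAU_REC`, `U`, the burst timing) are hypothetical choices made for illustration only.

```python
import math

TAU_STDP = 0.070   # s, width of the symmetric STDP window (illustrative)
TAU_REC = 0.5      # s, recovery time constant of depression (assumed)
U = 0.5            # fraction of resources used per spike (assumed)


def burst(start, n_spikes=5, isi=0.010):
    """Spike times (s) of a regular burst beginning at `start`."""
    return [start + k * isi for k in range(n_spikes)]


def efficacies(spike_times, use_depression=True):
    """Presynaptic efficacy at each spike under short-term depression."""
    if not use_depression:
        return [1.0] * len(spike_times)
    x, last, out = 1.0, None, []
    for t in spike_times:
        if last is not None:
            # Resources recover exponentially toward 1 between spikes.
            x = 1.0 - (1.0 - x) * math.exp(-(t - last) / TAU_REC)
        out.append(U * x)   # efficacy of this spike
        x -= U * x          # resources consumed by this spike
        last = t
    return out


def weight_change(pre_times, post_times, use_depression=True):
    """Sum symmetric STDP over all spike pairs, scaled by presynaptic efficacy."""
    eff = efficacies(pre_times, use_depression)
    return sum(
        e * math.exp(-abs(t_post - t_pre) / TAU_STDP)
        for t_pre, e in zip(pre_times, eff)
        for t_post in post_times
    )


def forward_and_reverse(lag=0.005, use_depression=True):
    """Neuron 1 fires first; neuron 2 follows after `lag` seconds."""
    s1, s2 = burst(0.0), burst(lag)
    forward = weight_change(s1, s2, use_depression)   # synapse 1 -> 2
    reverse = weight_change(s2, s1, use_depression)   # synapse 2 -> 1
    return forward, reverse
```

Calling `forward_and_reverse()` returns the summed changes for the forward (1 -> 2) and reverse (2 -> 1) synapses. In this sketch the reverse change is larger because, among closely timed spike pairs, the trailing neuron's spike tends to come earlier in its burst (and is therefore less depressed) than the leading neuron's spike; with `use_depression=False` the two changes coincide.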
These concerns and other suggestions from the reviewers are attached to help the authors.
Reviewer #1:
In this study, the authors show that one can obtain reverse replay in 1D and 2D networks. This relies on symmetric STDP in combination with various forms of STP or postsynaptic afterdepolarization. The authors also map their learning rules to a T-maze context using a 2D network and state that the network learns to do goal-directed path learning through reverse sequences. They examine how network connections organize following such learning. In principle this study is interesting as an implementation of a plasticity-driven approach to the emergence of forward and reverse replay, and provides a way to link it to goal planning.
The initial logic of the paper builds up nicely from Figure 1 through 5. One can see how to obtain reverse replay, there is evidence that this is reasonably robust, and one can see conditions where the replay will erase itself due to plasticity.
We thank the reviewer for their overall positive comments on our work.
Figure 6 is a key figure, applying the learning rule to a 2D network upon which the authors place a T maze. This is a key figure and unfortunately is very confusing.
We apologize that our figure was confusing. We have improved the data presentation in the revised manuscript as shown below. Please note that the previous Figure 6 is shown as Figure 7 in the revised manuscript.
1) The authors talk about early and late trials. I am going to assume that only trial 9 and 10 are late trials, but the authors must clarify this point.
Late trials referred to trials 5 to 10 because sequences from D1 to D2 already appear in trial 5. However, we agree that the usage of early and late trials was not clear enough. In the revised manuscript, we showed linearized plots only for trials 5 and 6, which contain all types of sequences we want to discuss. In Figure 7, we have indicated the sequences referred to in the text by colored (red, blue and black) arrows.
2) There is a listing of positions A, B, C1, C2, D1, D2 in Panel A mapping to positions 0, 1, 2, 3, 4, 5 in panels C and D. It is strange to have to jump around with the naming within a single figure.
3) Worse, the mapping is _different_ in odd trials and even trials because position 3 from panel C can either be mapped to D1 (panel A) or to B (panel A). Similarly, position B of panel A could be either position 1 or position 3 of Panels C and D. This makes no sense.
These confusions arose because we mapped two branches of a two-dimensional maze onto a one-dimensional axis. As a consequence, the portions 0→1, 1→3 and 3→5 in the linearized plot refer to the portions A→B, B→D1 and B→D2, respectively, of the actual maze. Thus, the correspondence between the two labeling schemes is not one-to-one, which confused the reviewer. As illustrated in Figure 7C-E, we have improved the data presentation by using the color scheme introduced originally in a related experimental paper (Wu and Foster, 2014).
4) The authors make several statements about the reverse replay sequences that are hard to identify in the figure. They should individually highlight specific reverse sequences that they want to talk about.
We have identified the examples of replay events in all the panels (Figure 7D and 7E) we mention in the text. The following three comments raised by the reviewer have been also clarified in the revised manuscript.
5) I do not see any case in the later trials where sequences travel to D2 except possibly in the first couple of spontaneous responses in trial 9 and one spontaneous trial in 10. Each of these occurs before the training run. Instead almost all the cases, e.g., in Trial 10, start from D2 and go to B. This is a perfectly reasonable reverse replay but is not presented as such in the text.
We apologize to the reviewer for our misleading presentation of replay events. In Figure 7, sequences traveling from A to D2 are the sequences propagating from neuron #1 to #20 (A→B) and successively from neuron #61 to #100 (B→D2) at the initial stages of trials 5 and 6. These sequences are indicated by black arrows. Note that the previous panels for trials 9 and 10 have been removed from the figure and novel panels have been included (Figure 7F-H) to show some quantitative evaluations of the relative frequencies among different firing sequences.
6) I do not see any case where backward sequences start from D1 and go to D2, unless the authors are conflating position 3 with D1, rather than B. If so then it is the 3 cases (2 in Trial 9 and 1 in Trial 10, which occur before the training run) which fit the bill. If so, those 3 cases look like forward sequences to me.
Examples of such sequences are sequences involving neuron #60 to #21 (D1 → B) and neuron #61 to neuron #100 (B → D2). This sequential firing pattern represents one compound sequence. The end point of such a sequence is indicated by a blue arrow in trial 5.
7) The text says that some of the reverse replay sequences from D1 propagate into D2 instead of the stem arm. I do not see any instances of this, except again if the authors have confused the identity of position 3.
Replay events from D2 to D1 are represented by slanted lines starting from neuron #100 and ending at neuron #61 (D2 to B), followed by lines starting from neuron #21 and ending at neuron #60 (B to D1). This means that the sequence jumps from neuron #61 to #21 in the linearized plots. The start point of such a sequence is indicated by a blue arrow in trial 6.
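The index conventions used in the responses above can be summarized in a small lookup. The Python sketch below is only an illustration: the neuron ranges (#1 to #20 for A-B, #21 to #60 for B-D1, #61 to #100 for B-D2) are taken from the text, while the function names and the representation of a replay event as a list of (start, end) sweeps are assumptions.

```python
# Neuron index ranges (1-based, inclusive) as quoted in the text.
# Each entry: (lowest index, highest index, landmark at lowest, landmark at highest).
SEGMENTS = [
    (1, 20, "A", "B"),
    (21, 60, "B", "D1"),
    (61, 100, "B", "D2"),
]


def segment_of(neuron):
    """Return the segment tuple that contains the given neuron index."""
    for seg in SEGMENTS:
        if seg[0] <= neuron <= seg[1]:
            return seg
    raise ValueError(f"neuron index {neuron} is outside 1-100")


def sweep_direction(start, end):
    """Landmarks (from, to) for a sweep that stays within one segment."""
    seg_start = segment_of(start)
    if segment_of(end) != seg_start:
        raise ValueError("a single sweep must stay within one segment")
    _, _, low_label, high_label = seg_start
    return (low_label, high_label) if start <= end else (high_label, low_label)


def trace_path(sweeps):
    """Chain consecutive sweeps into the ordered list of landmarks visited."""
    path = []
    for start, end in sweeps:
        src, dst = sweep_direction(start, end)
        if path and path[-1] != src:
            raise ValueError("consecutive sweeps do not meet at a shared landmark")
        if not path:
            path.append(src)
        path.append(dst)
    return path
```

For example, `trace_path([(100, 61), (21, 60)])` gives the landmarks D2, B, D1, which corresponds to the jump from neuron #61 to #21 described above.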
8) The supplementary video looks interesting but lacks annotation to give clarity. Its value is considerably diminished as a result. Nevertheless, my impression on watching it is that my interpretation of Figure 6 is correct, and that the authors have confused position B and position D1.
We have added explanations of each trial to the movie (e.g. the structure of the track, the trial number, the schedule of rat’s movement, reward ON/OFF, explanation of important activity patterns).
9) The key paragraph three in subsection “Goal-directed path learning through reverse replay” is very anecdotal. For every statement of a certain kind of replay, the authors need to first, point to examples, and second, give statistics for how often such replays occur in a series of randomized runs.
We have clearly pointed out the examples of replay events in the 2D network, and then given the statistics of replay events in simulations performed with different random seeds and different visiting orders of the two arms, in paragraph three of subsection “Goal-directed path learning through reverse replay”.
It may well be that I am quite misunderstanding this figure, in which case the authors should explain it more clearly. Otherwise I think the figure and movie do not support the text.
We agree that the figures and movie were hard to interpret. We hope the improved manuscript resolves all the doubts raised by the reviewer.
Reviewer #2:
[…] The most critical assumption is not, in my opinion, the "rapid modulation of STDP" by synaptic depression, but rather the persistence of neural activity behind the immediate wavefront. Because of profound synaptic depletion, the fact is that there is no reverberant activity that would support the packet and cause neurons to continue to be active.
The trick appears to lie in a time constant of τ^{exc} =10 ms with which the excitatory input is convolved (Equation 3) or, in the spiking network, an NMDA time constant of 150 ms (Equation 16), wherein the peak conductance for NMDA is slaved to the AMPA conductance.
The value of 10 ms for the time constant is, at the very least, debatable. Going back to classic papers (Koch, Rapp, Segev, 1996) or Treves (1993), the real excitatory synaptic time constant (as opposed to the membrane time constant of 10-20 ms) is extraordinarily short and on the order of 2 ms. With that kind of time constant, though, I believe the entire mechanism might collapse.
We built the model on the basis of bump attractor network models (e.g. Samsonovich and McNaughton, 1997; Romani and Tsodyks, 2014). In such models, the generation of localized activity packets (bumps) mainly depends on the distribution of connection weights (e.g., strong localized excitation and weak global inhibition), whereas the time constant is not important. As for the structure of connection weights, some experiments (e.g., Takahashi et al., 2010; Guzman et al., 2016) suggested that clusters of strong excitatory connections are overexpressed in CA3, which supports the assumption of strong localized excitation necessary for our model. To show that our learning mechanism does not depend on the synaptic time constants, we performed additional simulations with the NMDA current removed and with shortened AMPA and inhibitory time constants (Figure 2—figure supplement 1).
The second critique I would levy is that the model inhabits an intermediate realm: it is neither minimalist, nor veridically detailed.
In particular:
i) Equations 5-6 cover both facilitation and depression. As far as I can tell, facilitation is not at all necessary for the mechanism. Why is facilitation then included?
As the reviewer pointed out, facilitation is not necessary for our proposed mechanism. However, we implemented it to evaluate the model’s performance under a realistic situation because strong facilitation is observed in the hippocampus (e.g. Guzman et al., 2016). We could show that our learning mechanism reliably works with a physiologically realistic range of short-term facilitation (Figure 4). We wish to note that in the simulations of the 2D network the facilitation contributes to the generation of theta sequences (Wang et al., 2014).
ii) Equations 12-13 describe the Izhikevich, 2003 model with the parameter set for the regular spiking cell (though the fact that it is the parameter set for RS is not explicitly mentioned). The only possible advantage of the Izhikevich model over an integrate-and-fire model might lie in the adaptation of the firing. But is this important in the model?
In our model, the choice of spiking neuron model is relatively arbitrary because the proposed mechanism does not crucially depend on the specific spiking patterns. Therefore, the use of the Izhikevich model is not required. However, the adaptation of firing prevents the persistence of activity packets within the same neuron group, hence making their sequential propagation easier. We briefly mentioned this point in subsection “Potentiation of reverse synaptic transmissions by STDP” in the revised manuscript.
iii) Going from 1D to 2D in subsection “Goal-directed path learning through reverse replay”, the idea of theta modulation and theta sequences is sprung upon the reader, but it never became clear to me whether Equation 36 (for the theta-modulated current input) is really necessary or not.
We apologize for the confusion. We have performed additional simulations to discuss the effect of theta sequences on the proposed mechanisms. Simulation results are shown in the subsection "Bias effects induced by spike trains during run", and suggest that theta sequences have almost no effect on learning in our model (Figure 5). However, as suggested in Johnson and Redish, 2007 and Wikenheiser and Redish, 2015, theta sequences are useful for the readout of the learned paths towards the goal. Namely, once synaptic connections are biased towards the position of reward, theta sequences also tend to extend towards the reward, which gives the animal information on the adequate paths. We have shown this role of theta sequences by simulations in a 2D open field (Figure 10). We have included references to phase precession and theta sequences (e.g. O’Keefe and Recce, 1993; Dragoi and Buzsaki, 2006; Foster and Wilson, 2007).
The third critique reflects the color scheme. Throughout, inactivity is represented by black, which makes the figures hard to read in a printout or even on the screen. Please choose another color scheme that results in a white (or light-colored) background. Figure 6 is confusing, as it mixes letters on the W-shaped track (which, for some reason, is called a T-shaped track), but then the panels use numeric labels; the mapping is only explained in the last sentence of the caption. On some panels of Supplementary Figure 2, the tick labels on the y-axes cannot be read.
We apologize that some figures were not clear enough and that Figure 6 (Figure 7 in the revised manuscript) was confusing. Following your suggestions, we have changed the color scheme and labels in the related panels, and explained the mapping in the linearized plots in a more comprehensive manner. We also corrected the labels in Figure 4—figure supplement 1 (previous Supplementary Figure 2).
Reviewer #3:
The paper describes a modified version of spike-timing dependent long-term plasticity (STDP) modulated by short-term synaptic plasticity (STP). The main advantage of this modulation is to be able to obtain an effectively asymmetric STDP rule starting from a symmetric one (symmetric STDP has been recently observed in hippocampus). The authors show for instance how an imposed sequential activation of neurons can modify synapses in a network, so that a network can spontaneously "replay" the sequence in the opposite order, as observed in place cells.
I think that the results nicely characterize the properties of the hypothesized plasticity rule and show a potential application in neural networks.
We thank the reviewer for their positive comments on the STDP learning rule proposed in the manuscript.
I have two major concerns:
1) The connection with dynamics of hippocampal place cell activity strongly focuses on reverse replays, but there might be other aspects to consider more carefully:
i) Forward replays: The modeling presented in the first part of the paper, where a reverse replay is elicited by the stimulation of place cell coding for the middle of the track, should be probably compared to the results of Davidson et al., 2009. In that paper, the rodents frequently stop away from the two ends of the track; this more closely resembles the scenario depicted in the manuscript. The results of the paper indicated that forward replays were as (or more) likely to occur compared to backward replays (similar to Diba and Buzsaki, 2007), and the propagation speed of forward and backward replays was comparable. How can one reconcile the results reported in the manuscript with these observations?
As suggested by the reviewer, our model may explain the weak bias towards forward replays observed in the neural activity recorded in the early stage of learning (Davidson et al., 2009). Once the rat got a reward at the end of the track and reverse replay generated a bias towards the reward site, we expect, from the results of simulations of our model, to observe more forward replays in the middle of the track. Because reverse replay can be observed immediately after the first lap (Foster and Wilson 2006; Wu and Foster, 2014), the bias to the forward direction (i.e., towards the reward) can appear during the very early stage of learning. On the other hand, our evaluation of the bias effect with Poisson spike trains suggests that the bias to the reverse direction is stochastic (not 100% reliable). Therefore, we speculate that the consolidation of strong bias requires repeated experiences. However, we also note that the situation studied in Davidson et al., 2009 is more complex than our simulation setting because they used a very long track to observe long replay sequences across multiple ripples. We briefly mentioned these points in paragraph one of subsection “Testable predictions of the model” and paragraph two of subsection “Some limitations of the present model” in the revised manuscript. We wish to study this interesting problem in the future.
We also wish to leave the problem of propagation speed for our future study because the speed depends on many factors. Propagation speed is directly affected by weight biases in our simple model. However, in the hippocampus, many other factors such as oscillation and dendritic spikes (Jahnke et al., 2015) can affect the propagation speed. Furthermore, some experiments suggest that the propagation speed can be actively controlled without changing sensory inputs (Pastalkova et al., 2008). We feel that taking all these factors into account is beyond the scope of this study.
ii) Directionality of firing in linear tracks: It is generally observed that place cells code conjunctively for spatial location and direction of motion in 1D. This aspect of place field firing has not been discussed and it's not entirely clear to me how to interpret the authors’ results in light of this experimental observation.
A straightforward interpretation of our results under unidirectionality is that a 1D track is represented by two subnetworks of unidirectional place cells encoding opposite movement directions at each location. In that case, learning an experienced path (e.g. A to D2 in Figure 7) is possible in one of the subnetworks through the mechanism we have shown. However, to learn an unexperienced path by combining multiple paths with different directionality (e.g. D1 to D2 in Figure 7), joint replay through the two subnetworks (B to D1 and B to D2) is necessary. Such replay is experimentally observed (Davidson et al., 2009; Wu and Foster, 2014). Previous studies proposed that Hebbian plasticity supports the generation of joint replay by connecting multiple unidirectional place cells at the junction (Brunel and Trullier, 1998; Káli and Dayan, 2000; Buzsaki, 2005). This issue is related to how the hippocampus connects together and generalizes multiple different episodes and chunks in information streams. We have briefly discussed this issue in paragraph two of subsection “Some limitations of the present model” in the revised manuscript. We will investigate this interesting problem in the future.
iii) Replays in 2D: Pfeiffer and Foster, 2013 reported replays that generally did not correlate with the previous experience of the animal, rather there was a correlation with the immediate future navigational behavior of the rodent. In those experiments, the goal location was moved from session to session, and those sequences were observed within the first few trials. In a follow-up commentary (Pfeiffer, 2017), it is argued that "reverse replay does not facilitate learning in a familiar environment". It would be useful to see those results and claims more carefully discussed in the manuscript.
In our example of a 2D neural network, the task was essentially 1D. Therefore, the previous simulations were not really adequate for discussing the implications of our model in navigating a 2D open field. In a 2D open field, we observed in our model that firing sequences often propagate isotropically from a reward site irrespective of the previous experience. Such an isotropic activity propagation creates 2D connections that converge to the reward site, and hence biases the directions of both forward replays and theta sequences towards the reward site from any position on the field. We think that these properties of our model correspond to the experimental observation in open fields (Pfeiffer and Foster, 2013). In the revised manuscript, we have presented the simulation results of a 2D open field in Figures 9 and 10 and have discussed the relationship to Pfeiffer and Foster, 2013 and Pfeiffer, 2018 in the subsection "Unbiased sequence propagation enhances goal-directed behavior in a 2D space". A brief discussion is also found in subsection “Some limitations of the present model” in the Discussion.
iv) Theta sequences: During running, fast sequential activity (forward direction) of place cells within individual cycles of the theta oscillation are observed. I guess that before replays, these sequences could help in building up the asymmetric connections that later generates reverse replays. But what happens to theta sequences after the synapses are modified? Would they be less likely to occur? In general, I found the few references to theta sequences in the manuscript to be confusing.
We have added references for theta sequences (e.g. O’Keefe and Recce 1993; Dragoi and Buzsaki, 2006; Foster and Wilson, 2007). We have performed simulations of place cell activity with phase precession in Figure 5, and the results suggest that the bias is not affected by the existence of phase precession (or theta sequences). The subsection "Bias effect induced by spike trains during run" is devoted to these points. However, we think that theta sequences assist the readout of the learned paths towards the goal (Johnson & Redish 2007; Wikenheiser and Redish, 2015). This is because after synaptic connections are biased, theta sequences tend to spread towards the reward site and give the animal useful information on the rewarding paths. We have shown this role of theta sequences by simulating navigation in a 2D open field (Figure 10).
2) The plasticity rule is loosely based on previous work in visual cortical synapses (Froemke et al., 2006). As such there is no evidence that this rule quantitatively captures the dynamics of synapses in neocortex or hippocampus. It would be helpful to test the plasticity rules with firing patterns more closely resembling the activity of in vivo place cells (e.g., fields of ~1s durations, peak firing ~10Hz, phase precessing) and realistic learning rates.
In the revised manuscript, we have evaluated the weight biases generated by place-cell-like activity with phase precession. Our results suggest that biases towards the reverse direction are reliably induced when the peak firing rate during theta precession firing is greater than 20 Hz. This rate seems to be higher than the average peak firing rate over the place cell population. However, summing up bias effects induced in multiple place cells with overlapping place fields resulted in significant bias in the realistic mean firing rate setting (Figure 5B). Furthermore, because the firing rates of place cells are lognormally distributed (Mizuseki and Buzsaki, 2013), a small fraction of place cells with high firing rates (~30Hz) will contribute to the generation of weight biases. Our results also suggest that phase precession does not significantly affect the weight bias. This is because the time windows of STDP (70ms) are long enough to eliminate the small influences of different theta phases in spiking such that only coarse-grained firing rates (place fields) determine the effect of synaptic plasticity (Figure 5—figure supplement 1). These points are discussed in the second paragraph of subsection “Bias effects induced by spike trains during run”.
https://doi.org/10.7554/eLife.34171.028
Article and author information
Author details
Funding
Ministry of Education, Culture, Sports, Science, and Technology (15H04265)
 Tomoki Fukai
Core Research for Evolutional Science and Technology (JPMJCR13W1)
 Tomoki Fukai
Ministry of Education, Culture, Sports, Science, and Technology (16H01289)
 Tomoki Fukai
Ministry of Education, Culture, Sports, Science, and Technology (17H06036)
 Tomoki Fukai
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We are grateful to Masami Tatsuno, Kaoru Inokuchi, Takeshi Yagi, and members in Fukai lab for fruitful discussion.
Senior Editor
 Michael J Frank, Brown University, United States
Reviewing Editor
 Upinder Singh Bhalla, National Centre for Biological Sciences, Tata Institute of Fundamental Research, India
Publication history
 Received: December 7, 2017
 Accepted: July 2, 2018
 Accepted Manuscript published: July 3, 2018 (version 1)
 Version of Record published: July 25, 2018 (version 2)
Copyright
© 2018, Haga et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.