Synaptic learning rules for sequence learning

  1. Eric Torsten Reifenstein (corresponding author)
  2. Ikhwan Bin Khalid
  3. Richard Kempter
  1. Institute for Theoretical Biology, Department of Biology, Humboldt-Universität zu Berlin, Germany
  2. Bernstein Center for Computational Neuroscience Berlin, Germany
  3. Einstein Center for Neurosciences Berlin, Germany

Abstract

Remembering the temporal order of a sequence of events is a task easily performed by humans in everyday life, but the underlying neuronal mechanisms are unclear. This problem is particularly intriguing as human behavior often proceeds on a time scale of seconds, which is in stark contrast to the much faster millisecond time scale of neuronal processing in our brains. One long-held hypothesis in sequence learning suggests that a particular temporal fine-structure of neuronal activity — termed ‘phase precession’ — enables the compression of slow behavioral sequences down to the fast time scale of the induction of synaptic plasticity. Using mathematical analysis and computer simulations, we find that — for short enough synaptic learning windows — phase precession can improve temporal-order learning tremendously and that the asymmetric part of the synaptic learning window is essential for temporal-order learning. To test these predictions, we suggest experiments that selectively alter phase precession or the learning window and evaluate memory of temporal order.

Introduction

It is a pivotal quality for animals to be able to store and recall the order of events (‘temporal-order learning’; Kahana, 1996; Fortin et al., 2002; Lehn et al., 2009; Bellmund et al., 2020), but there is little work on the neural mechanisms generating asymmetric memory associations across behavioral time intervals (Drew and Abbott, 2006). Putative mechanisms need to bridge the gap between the faster time scale of the induction of synaptic plasticity (typically milliseconds) and the slower time scale of behavioral events (seconds or slower). The slower time scale of behavioral events is mirrored, for example, in the time course of firing rates of hippocampal place cells (O'Keefe and Dostrovsky, 1971), which signal when an animal visits certain locations (‘place fields’) in the environment. The faster time scale is given by the temporal properties of the induction of synaptic plasticity (Markram et al., 1997; Bi and Poo, 1998) — and spike-timing-dependent plasticity (STDP) is a common form of synaptic plasticity that depends on the millisecond timing and temporal order of presynaptic and postsynaptic spiking. For STDP, the so-called ‘learning window’ describes the temporal intervals at which presynaptic and postsynaptic activity induce synaptic plasticity. Such precisely timed neural activity can be generated by phase precession, which is the successive across-cycle shift of spike phases from late to early with respect to a background oscillation (Figure 1). As an animal explores an environment, phase precession can be observed in the activity of hippocampal place cells with respect to the theta oscillation (O'Keefe and Recce, 1993; Buzsáki, 2002; Qasim et al., 2021). Phase precession is highly significant in single trials (Schmidt et al., 2009; Reifenstein et al., 2012) and occurs even in first traversals of a place field in a novel environment (Cheng and Frank, 2008). Interestingly, phase precession allows for a temporal compression of a sequence of behavioral events from the time scale of seconds down to milliseconds (Figure 1; Skaggs et al., 1996; Tsodyks et al., 1996; Cheng and Frank, 2008), which matches the widths of generic STDP learning windows (Abbott and Nelson, 2000; Bi and Poo, 2001; Froemke et al., 2005; Wittenberg and Wang, 2006). This putative advantage of phase precession for temporal-order learning, however, has not yet been quantified. To assess the benefit of phase precession for temporal-order learning, we determine the synaptic weight change between pairs of cells whose activity represents two events of a sequence. Using both analytical methods and numerical simulations, we find that phase precession can dramatically facilitate temporal-order learning by increasing the synaptic weight change and the signal-to-noise ratio by up to an order of magnitude. We thus provide a mechanistic description of associative chaining models (Lewandowsky and Murdock, 1989) and extend these models to explain how to store serial order.

Rationale for temporal-order learning via phase precession.

Top: Behavioral events (A to D) happen on a time scale of seconds. Middle: These events are represented by different cells (a–d), which fire a burst of stochastic action potentials in response to the onset of their respective event. We assume that each cell shows phase precession with respect to the LFP’s theta oscillation (every second cycle is marked by a gray box). When the activities of multiple cells overlap, the sequence of behavioral events is compressed in time to within one theta cycle (two examples highlighted in the dashed, shaded boxes). Bottom: This faster time scale can be picked up by STDP, which strengthens the connections between the cells of the sequence. Figure adapted from Korte and Schmitz, 2016.

Results

To address the question of how behavioral sequences could be encoded in the brain, we study the change of synapses between neurons that represent events in a sequence. We assume that the temporal order of two events is encoded in the asymmetry of the efficacies of synapses that connect neurons representing the two events (Figure 1). After the successful encoding of a sequence, a neuron that was activated earlier in the sequence has a strengthened connection to a neuron that was activated later in the sequence, whereas the connection in the reverse direction may be unchanged or is even weakened. As a result, when the first event is encountered and/or the first neuron is activated, the neuron representing the second event is activated. Consequently, the behavioral sequence could be replayed (as illustrated by simulations for example in Tsodyks et al., 1996; Sato and Yamaguchi, 2003; Leibold and Kempter, 2006; Shen et al., 2007; Cheng, 2013; Chenkov et al., 2017; Malerba and Bazhenov, 2019; Gillett et al., 2020) and the memory of the temporal order of events is recalled (Diba and Buzsáki, 2007; Schuck and Niv, 2019). We note, however, that in what follows we do not simulate such a replay of sequences, which would also depend on a vast number of parameters that define the network; instead, we focus on the underlying change in connectivity, which is the very basis of replay, and draw connections to ‘replay’ in the Discussion.

Let us now illustrate key features of the encoding of the temporal order of sequences. To do so, we consider the weight change induced by the activity of two sequentially activated cells i and j that represent two behavioral events (dashed lines in Figure 2A). Classical Hebbian learning (Hebb, 1949), where weight changes Δwij depend on the product of the firing rates fi and fj, is not suited for temporal-order learning because the weight change is independent of the order of cells:

$\Delta w_{ij} \propto f_i\,f_j = f_j\,f_i \propto \Delta w_{ji}.$
Model of two sequentially activated phase-precessing cells.

(A) Oscillatory firing-rate profiles for two cells (solid blue and cyan lines). The black curve depicts the population theta oscillation. For easier comparison of the two different frequencies, the population activity’s troughs are continued by thin gray lines, and the peaks of the cell-intrinsic theta oscillation are marked by dots. Dashed lines depict the underlying Gaussian firing fields without theta modulation. (B) Phase precession of the two cells (same colors as in A). The compression factor c describes the phase shift per theta cycle for an individual cell (2πc). For the temporal separation Tij of the firing fields and the theta frequency ω, the phase difference between the cells is ωcTij. The dots depict the times of the maxima in (A). (C) Resulting cross-correlation for the two firing rates from (A). The solid red curve shows the full cross-correlation. The dashed line depicts the cross-correlation without theta modulation. The gray region indicates small (|t| < 20 ms) time lags. (D) Same as in (C), but zoomed in. Note that the first peak of the theta modulation is at a positive non-zero time lag, reflecting phase precession. The dashed black curve shows the approximation of the cross-correlation for the analytical treatment (Materials and methods, Equation 17). (E) Synaptic learning window. The gray region indicates the region in which the learning window is large, and this region is also indicated in (C) and (D). Positive time lags correspond to postsynaptic activity following presynaptic activity. Parameters for all plots: Tij = 0.3 s, ω = 2π·10 Hz, σ = 0.3 s, τ = 10 ms, μ = 1, c = 0.042, A = 10.

Therefore, a classical Hebbian weight change is symmetric, that is, Δwij − Δwji = 0. This result can be generalized to learning rules that are based on the product of two arbitrary functions of the firing rates. We note that, although not suited for temporal-order learning, Hebbian rules are able to achieve more general ‘sequence learning’, where an association between sequence elements is created — independent of the order of events. To become sensitive to temporal order, we use spike-timing-dependent plasticity (STDP; Markram et al., 1997; Bi and Poo, 1998). For STDP, average weight changes depend on the cross-correlation function of the firing rates (example in Figure 2C,D),

Cij(t):=dtfi(t)fj(t+t),

which is anti-symmetric: Cij(t)=Cji(-t). Assuming additive STDP, that is, weight changes resulting from pairs of pre- and postsynaptic action potentials are added, the average synaptic weight change Δwij between the two cells in a sequence can then be calculated explicitly (Kempter et al., 1999):

(1) $\Delta w_{ij} = \int_{-\infty}^{+\infty} dt\, W(t)\, C_{ij}(t),$

where W is the STDP learning window (example in Figure 2E). We aim to solve Equation 1 for given firing rates fi and fj. To do so, we assume that the synaptic weight wij is generally small and thus only has a weak impact on the cross-correlation of the cells during encoding, that is, for the ‘encoding’ of a sequence the cross-correlation function is dominated by feedforward input, whereas the recurrent inputs are neglected.
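To make Equation 1 concrete, the integral can be evaluated by simple quadrature once W and Cij are tabulated on a common grid of time lags. Below is a minimal numerical sketch in Python; the grid, the unnormalized Gaussian cross-correlation, and all variable names are our own illustrative choices, not the authors' published code.

```python
import numpy as np

# Minimal quadrature sketch of Equation 1: Delta_w_ij = integral dt W(t) C_ij(t).
# Illustrative parameter values mirroring Figure 2; prefactors of C are omitted.
dt = 1e-4                                       # lag resolution (s)
t = np.arange(-2.0, 2.0, dt)                    # time lags (s)

tau, mu = 0.010, 1.0                            # window width (s), learning rate
W = mu * np.sign(t) * np.exp(-np.abs(t) / tau)  # odd STDP window

T_ij, sigma = 0.3, 0.3                          # field separation and width (s)
C = np.exp(-(t - T_ij)**2 / (4 * sigma**2))     # Gaussian cross-correlation (no theta)

delta_w = np.sum(W * C) * dt                    # discretized integral of Equation 1
print(f"average weight change (arbitrary units): {delta_w:.2e}")
```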

Next, let us show that the symmetry of W is essential for temporal-order learning. Any learning window W can be split up into an even part Weven, with Weven(t)=Weven(-t), and an odd part Wodd, with Wodd(t)=-Wodd(-t), such that W=Weven+Wodd. For even learning windows, one can derive from Equation 1 and the anti-symmetry of Cij that weight changes are symmetric, that is, Δwij=Δwji; therefore, only the odd part Wodd of W is useful for learning temporal order.
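Numerically, this decomposition is a one-liner on a lag grid that is symmetric around zero; a small sketch (the example window is our own choice):

```python
import numpy as np

# Split an arbitrary window W into even and odd parts on a grid symmetric
# around zero, so that W[::-1] corresponds to W(-t). Example window: odd
# exponential plus an even Gaussian bump (illustrative only).
t = np.linspace(-0.1, 0.1, 2001)
W = np.sign(t) * np.exp(-np.abs(t) / 0.01) + 0.3 * np.exp(-t**2 / 0.02**2)

W_even = 0.5 * (W + W[::-1])   # W_even(t) = [W(t) + W(-t)] / 2
W_odd  = 0.5 * (W - W[::-1])   # W_odd(t)  = [W(t) - W(-t)] / 2
assert np.allclose(W, W_even + W_odd)
```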

To further explore requirements for encoding the temporal order of a sequence of events, we restrict our analysis to odd learning windows. We can then relate the weight change Δwij to the essential features of Cij(t). To do so, we integrate Equation 1 by parts (with W replaced by Wodd),

(2) $\Delta w_{ij} = \underbrace{\left[\overline{W}_{\mathrm{odd}}(t)\,C_{ij}(t)\right]_{-\infty}^{+\infty}}_{=0} + \int_{-\infty}^{+\infty} dt\,\left[-\overline{W}_{\mathrm{odd}}(t)\right]C_{ij}'(t),$

with the primitive $\overline{W}_{\mathrm{odd}}(t) := \int_{-\infty}^{t} dt'\, W_{\mathrm{odd}}(t')$ and the derivative $C_{ij}'(t) := \frac{d}{dt}C_{ij}(t)$. Because $\overline{W}_{\mathrm{odd}}(t)$ can be assumed to have finite support (note that $\int_{-\infty}^{+\infty} dt\, W_{\mathrm{odd}}(t) = 0$), the first term in Equation 2 vanishes. Also the learning window has finite support, and therefore we can restrict the integral in the second term in Equation 2 to a finite region of width K around zero:

(3) $\Delta w_{ij} = \int_{|t|<K} dt\,\left[-\overline{W}_{\mathrm{odd}}(t)\right]C_{ij}'(t),$

where K describes the width of the learning window W (gray region in Figure 2E). The integral in Equation 3 can be interpreted as the cross-correlation’s slope around zero, weighted by the symmetric function $-\overline{W}_{\mathrm{odd}}(t)$; interestingly, features of Cij for |t| ≳ K, for example whether side lobes of the correlation function are decreasing or not, are irrelevant.

As a generic example of sequence learning, let us consider the activities of two cells i and j that encode two behavioral events, for example the traversal of two place fields of two hippocampal place cells. In general, the cells’ responses to these events are called ‘firing fields’. We model these firing fields as two Gaussian functions G0,σ and GTij,σ that have the same width σ but different mean values 0 and Tij (we note that Tij and σ are measured in units of time, that is, seconds; Figure 2A, dashed curves). In this case of identical Gaussian shapes of the two firing fields, the cross-correlation Cij(t) is also a Gaussian function, denoted by GTij,√2σ, with mean Tij and width √2σ (dashed curve in Figure 2C). The value σ = 0.3 s, which we use in the example of Figure 2, matches experimental findings on place cells (O'Keefe and Recce, 1993; Geisler et al., 2010).

It is widely assumed that phase precession facilitates temporal-order learning (Skaggs et al., 1996; Dragoi and Buzsáki, 2006; Schmidt et al., 2009), but it has never been quantitatively shown. To test this hypothesis and to calculate how much phase precession contributes to temporal-order learning, we consider Gaussian firing fields that exhibit oscillatory modulations with theta frequency ω (Figure 2A, solid curves). The time-dependent firing rate of cell i is described by fi(t) ∝ Gμi,σ(t){1 + cos[ω(t − cμi)]}, that is, a Gaussian that is multiplied by a sinusoidal oscillation; see also Equation 11 in Materials and methods. Phase precession occurs with respect to the population theta, which oscillates at a frequency (1 − c)ω that is slightly smaller than ω, with a ‘compression factor’ c that is usually small: 0 ≤ c ≪ 1 (Dragoi and Buzsáki, 2006; Geisler et al., 2010). This compression factor c describes the average advance of the firing phase — from theta cycle to theta cycle — in units of the fraction of a theta cycle; c thus determines the slope ωc of phase precession (Figure 2B). A typical value is c ≈ π/(4σω), which accounts for ‘slope-size matching’ of phase precession (Geisler et al., 2010); that is, c is inversely proportional to the field size L := 4σ of the firing field, and the total range of phase precession within the firing field is constant and equals π (i.e. 180°). If there are multiple theta oscillation cycles within a firing field (ωσ ≫ 1), which is typical for place cells, the cross-correlation Cij(t) is a theta-modulated Gaussian (solid curve in Figure 2C; see also Equation 15 in Materials and methods).
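For illustration, the rate model and its cross-correlation can be tabulated directly; the following sketch mirrors Equation 11 with the parameter values quoted above (the discretization and all names are our own choices):

```python
import numpy as np

# Sketch of the firing-rate model (cf. Equation 11) and its cross-correlation.
dt = 1e-3                                    # time step (s)
t = np.arange(-2.0, 4.0, dt)
A, sigma, omega, c = 10.0, 0.3, 2*np.pi*10, 0.042
T_ij = 0.3                                   # separation of field centers (s)

def rate(center):
    gauss = np.exp(-(t - center)**2 / (2*sigma**2)) / (np.sqrt(2*np.pi)*sigma)
    return A * gauss * (1 + np.cos(omega*(t - c*center)))

f_i, f_j = rate(0.0), rate(T_ij)

# C_ij(lag) = sum over t' of f_i(t') f_j(t'+lag) dt (cf. Equation 13)
C = np.correlate(f_j, f_i, mode="same") * dt
lags = (np.arange(t.size) - t.size // 2) * dt   # lag axis centered at zero
```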

The generic shape of the cross-correlation Cij in Figure 2C allows for an advanced interpretation of Equation 3, which critically depends on the width K of the learning window W. We distinguish here two limiting cases: narrow learning windows (K ≪ 1/ω ≪ σ), that is, the width K of the learning window is much smaller than a theta cycle and the width of a firing field, and wide learning windows (K ≳ σ), that is, the width K of the learning window exceeds the width of a firing field. Let us first consider narrow learning windows; only later in this manuscript will we turn to the case of wide learning windows.

Dependence of temporal-order learning on the overlap of firing fields for narrow learning windows (K ≪ 1/ω ≪ σ)

We first show formally that sequence learning with narrow learning windows requires that the two firing fields overlap, that is, their separation Tij should be smaller than or at most comparable to the width σ of the firing fields. In Equation 3, which was derived for odd learning windows, the weight change Δwij is determined by C′ij(t) around t = 0 in a region of width K. For narrow learning windows (K ≪ 1/ω), this region is small compared to a theta oscillation cycle and much smaller than the width σ of a firing field. Because the envelope of the cross-correlation Cij(t) is a Gaussian with mean Tij and width √2σ, the slope C′ij(t=0) scales with the Gaussian factor GTij,√2σ(0) ∝ exp[−Tij²/(4σ²)]. The weight change Δwij therefore strongly depends on the separation Tij of the firing fields. When the two firing fields do not overlap (Tij ≫ σ), the factor exp[−Tij²/(4σ²)] quickly tends to zero, and sequence learning is not possible. On the other hand, when the two firing fields do have considerable overlap (Tij ≲ σ), we have exp[−Tij²/(4σ²)] ≈ 1. In this case, sequence learning may be feasible with narrow learning windows. In this section, we will proceed with the mathematical analysis for overlapping fields, which allows us to assume exp[−Tij²/(4σ²)] ≈ 1.

For overlapping firing fields (Tij ≲ σ), let us now consider the fine structure of the cross-correlation Cij(t) for |t| < K, as illustrated in Figure 2D. Importantly, phase precession causes the first positive peak (i.e. for t > 0) of Cij to occur at time cTij with c ≪ 1 (Dragoi and Buzsáki, 2006; Geisler et al., 2010); phase precession also increases the slope C′ij(t) around t = 0, which could be beneficial for temporal-order learning according to Equation 3. To quantify this effect, we calculated the cross-correlation’s slope at t = 0 (see also Equation 18 in Materials and methods):

(4) $C_{ij}'(0) \propto G_{T_{ij},\sqrt{2}\sigma}(0)\left[\frac{T_{ij}}{\sigma}+\omega\sigma\,\sin(\omega c T_{ij})+\frac{T_{ij}}{2\sigma}\,\cos(\omega c T_{ij})\right].$

How does C′ij(0) depend on the temporal separation Tij of the firing fields? If the two fields overlap entirely (Tij = 0), the sequence has no defined temporal order, and thus C′ij(0) is zero. For at least partly overlapping firing fields (Tij ≲ σ) and typical phase precession where c = π/(4ωσ) ≪ 1, we will show in the next paragraph (and explain in Materials and methods in the text below Equation 18) that the second addend in Equation 4 dominates the other two. In this case, C′ij(0) is much larger than the cross-correlation slope in the absence of phase precession (c = 0), leading to a clearly larger synaptic weight change for phase precession. The maximum of C′ij(0) is mainly determined by this second addend (multiplied by GTij,√2σ(0)), and it can be shown (see Materials and methods) that this maximum is located near Tij ≈ √2σ.

The increase of C′ij(0) induced by phase precession can be exploited by learning windows W that are narrower than a theta cycle (e.g. gray regions in Figure 2C,D,E). To quantify this effect, let us consider a simple but generic shape of a learning window, for example, the odd STDP window W(t) = μ sign(t) exp(−|t|/τ) with time constant τ and learning rate μ > 0 (Figure 2E); this STDP window is narrow for τ ≪ 1/ω. Equations 3 and 4 then lead to (see Materials and methods, Equation 19) the average weight change

(5) $\Delta w_{ij} = A^2\mu\tau^2\,\frac{G_{T_{ij},\sqrt{2}\sigma}(0)}{\sigma}\left[\frac{T_{ij}}{\sigma}+\frac{\omega\sigma\,\sin(\omega c T_{ij})}{1+\omega^2\tau^2}+\frac{T_{ij}}{2\sigma}\,\cos(\omega c T_{ij})\,\frac{1-\omega^2\tau^2}{(1+\omega^2\tau^2)^2}\right],$

where A denotes the number of spikes per field traversal. Note that, according to Equation 3, the weight change Δwij in Equation 5 can be interpreted as a time-averaged version of C′ij(t) near t = 0 from Equation 4. Thus, Equations 4 and 5 have a similar structure, but Equation 5 includes multiple incidences of the term ω²τ² that account for this averaging. This term is small for narrow learning windows (τ ≪ 1/ω) and can thus be neglected (ω²τ² ≪ 1) in this limiting case; however, for typical biological values of τ ≈ 10 ms and ω = 2π·10 Hz, the peculiar structure of the ω²τ²-containing factor in the third addend in the square brackets is the reason why this addend can be neglected compared to the first one; as a result, the cases of ‘phase locking’ (c = 0) and ‘no theta’ (only the first addend remains) are basically indistinguishable. Moreover, for narrow odd learning windows, Δwij in Equation 5 inherits a number of properties from C′ij(0) in Equation 4: the second addend still remains the dominant one for Tij ≲ σ; inherited are also the absence of a weight change for fully overlapping fields (Δwij = 0 for Tij = 0), the maximum weight change for Tij ≈ √2σ, and Δwij → 0 for Tij → ∞ (Figure 3A). Furthermore, the prefactor A²μτ² in Equation 5 suggests that the average weight change increases with increasing width τ of the learning window, but we emphasize that this increase is restricted to τ ≪ 1/ω (as we assumed for the derivation), which prohibits a generalization of the quadratic scaling to large τ; the exact dependence on τ will be explained later.
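As a numerical cross-check, Equation 5 can be transcribed directly and evaluated for phase precession versus phase locking at the parameter values of Figure 3 (the function below is our own transcription, not the authors' code):

```python
import numpy as np

# Direct transcription of Equation 5; defaults follow the figure parameters.
def delta_w(T_ij, c, A=10.0, mu=1.0, tau=0.010, sigma=0.3, omega=2*np.pi*10):
    G0 = np.exp(-T_ij**2 / (4*sigma**2)) / (np.sqrt(2*np.pi) * np.sqrt(2)*sigma)
    wt2 = (omega * tau)**2
    bracket = (T_ij/sigma
               + omega*sigma*np.sin(omega*c*T_ij) / (1 + wt2)
               + T_ij/(2*sigma) * np.cos(omega*c*T_ij) * (1 - wt2)/(1 + wt2)**2)
    return A**2 * mu * tau**2 * G0 / sigma * bracket

print(delta_w(0.3, c=0.042))   # phase precession
print(delta_w(0.3, c=0.0))     # phase locking
```

At Tij = σ = 0.3 s, this gives roughly 0.26 with phase precession versus roughly 0.03 without, consistent with the order-of-magnitude benefit reported below.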

Temporal-order learning for narrow learning windows (τ ≪ 1/ω).

(A) The average synaptic weight change Δwij depends on the temporal separation Tij between the firing fields. Phase precession (blue) yields higher weight changes than phase locking (red). Simulation results (circles, averaged across 10⁴ repetitions) and analytical results (lines, Equation 5) match well. The vertical lines mark time lags of σ and 4σ, respectively, where 4σ approximates the total field width. (B) The benefit B of phase precession is determined by the ratio of the average weight changes of the two scenarios from (A). The solid and dashed lines depict the analytical expression for the benefit (Equation 20) and its approximation for small Tij (Equation 7), respectively. (C) Signal-to-noise ratio (SNR) of the weight change as a function of the firing-field separation Tij. The SNR is defined as the mean weight change divided by the standard deviation across trials in the simulation. Colors as in (A). Parameters for all plots: ω = 2π·10 Hz, σ = 0.3 s, τ = 10 ms, μ = 1, c = 0.042, A = 10.

To quantify how much better a sequence can be learned with phase precession as compared to phase locking, we use the ratio of the weight change Δwij with phase precession (c>0) and the weight change Δwij(c=0) without phase precession (Figure 3A), and define the benefit B of phase precession as

(6) $B := \frac{\Delta w_{ij}}{\Delta w_{ij}(c=0)} - 1.$

By inserting Equation 5 in Equation 6, we can explicitly calculate the benefit B of phase precession (see Equation 20 in Materials and methods and solid line in Figure 3B). For Tij ≲ σ and ω⁴τ⁴ ≪ 1 (see Materials and methods), the benefit B is well approximated by a Taylor expansion up to third order in Tij (dashed line in Figure 3B),

(7) $B \approx \frac{2}{3}\,\omega^2\sigma^2 c\left[1-\left(\frac{c}{4\sigma^2}\,\frac{1-\omega^2\tau^2}{1+\omega^2\tau^2}+\frac{1}{6}\,\omega^2c^2\right)T_{ij}^2\right].$

The maximum of B as a function of Tij is obtained for Tij = 0 (fully overlapping fields), but the average weight change Δwij is zero at this point. We note, however, that B decays slowly with increasing Tij, so B(Tij = 0) can be used to approximate the benefit for small field separations Tij (i.e. largely overlapping fields). For narrow (ωτ ≪ 1) odd STDP windows and slope-size matching (ωσc = π/4), we find the maximum Bmax ≈ ωσ/2, which has an interesting interpretation: if we relate σ to the field size L of a Gaussian firing field through L = 4σ and if we relate the frequency ω to the period Tθ of a theta oscillation cycle through Tθ = 2π/ω, we obtain Bmax ≈ 0.82·L/Tθ, that is, the maximum benefit of phase precession is about the number of theta oscillation cycles in a firing field. The example in Figure 3B (with firing fields in Figure 2A) has the maximum benefit Bmax ≈ 10, and the benefit remains in this range for partly overlapping firing fields (0 < Tij ≲ σ). We thus conclude that phase precession can boost temporal-order learning by about an order of magnitude for typical cases in which learning windows are narrower than a theta oscillation cycle and overlapping firing fields are an order of magnitude wider than a theta oscillation cycle.
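As a worked example: with ω = 2π·10 Hz and σ = 0.3 s, we have ωσ ≈ 18.8 and thus Bmax ≈ (π/6)·18.8 ≈ 9.9 (Equation 22 in Materials and methods); equivalently, the field size L = 4σ = 1.2 s spans L/Tθ = 12 theta cycles of length Tθ = 0.1 s, and 0.82·12 ≈ 9.8. Both routes give the quoted Bmax ≈ 10.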

So far, we have considered ‘average’ weight changes that resulted from neural activity that was described by a deterministic firing rate. However, neural activity often shows large variability, that is, different traversals of the same firing field typically lead to very different spike trains. To account for such variability, we have simulated neural activity as inhomogeneous Poisson processes (see Materials and methods for details). As a result, the change of the weight of a synapse, which depends on the correlation between spikes of the presynaptic and the postsynaptic cells, is a stochastic variable. It is important to consider the variability of the weight change (‘noise’) in order to assess the significance of the average weight change. For this reason, we utilize the signal-to-noise ratio (SNR), that is, the mean weight change divided by its standard deviation (see Materials and methods for details). To do so, we perform stochastic simulations of spiking neurons and calculate the average weight change and its variability across trials. This is done for phase-precessing as well as phase-locked activity. To connect this approach to our previous results, we confirm that the average weight changes estimated from many field traversals match the analytical predictions well (Figure 3A and B, see Materials and methods for details).
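One standard way to draw spikes from such a rate function is to thin a bounding homogeneous Poisson process; the sketch below does this for the rate model of Equation 11 (the function name and the bounding rate are our own choices, not the published implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_spikes(rate_fn, t0, t1, rate_max):
    """Inhomogeneous Poisson spike times on [t0, t1] via thinning."""
    n = rng.poisson(rate_max * (t1 - t0))       # candidates from bounding process
    cand = np.sort(rng.uniform(t0, t1, n))      # candidate spike times
    keep = rng.uniform(0.0, rate_max, n) < rate_fn(cand)
    return cand[keep]

# Rate of Equation 11 for a field centered at mu_i = 0 (parameters as in
# Figure 2); the peak rate 2A/(sqrt(2*pi)*sigma) bounds the rate everywhere.
A, sigma, omega = 10.0, 0.3, 2*np.pi*10
f = lambda u: (A / (np.sqrt(2*np.pi)*sigma) * np.exp(-u**2 / (2*sigma**2))
               * (1 + np.cos(omega*u)))
spikes = poisson_spikes(f, -1.5, 1.5, rate_max=2*A/(np.sqrt(2*np.pi)*sigma))
```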

The SNR shown in Figure 3C summarizes how reliable the learning signal is in a single traversal of the two firing fields — for the assumed odd learning window. The SNR further depends on Tij and follows a shape similar to that of the weight changes in Figure 3A. For phase precession, there is a maximum SNR that is slightly shifted to larger Tij; for phase locking, the SNR is always much lower. For the synapse connecting two cells with firing fields as in Figure 2A, where Tij = σ, we find an SNR of 0.27, which is insufficient for a reliable representation of a sequence.

To allow reliable temporal-order learning, one possible solution is to increase the number of spikes per field traversal A (SNR ∝ √A, as shown in Appendix 1). Another possibility is to increase the number of synapses. In Materials and methods, we show that SNR ∝ √M, where M is the number of identical and uncorrelated synapses. Therefore, to achieve SNR ≈ 1 for A = 10, one needs M ≈ 14 synapses.
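As a plausibility check of the √M scaling: starting from SNR ≈ 0.27 for a single synapse (Figure 3C at Tij = σ), reaching √M · 0.27 ≈ 1 requires M ≈ (1/0.27)² ≈ 14.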

In summary, for narrow, odd learning windows (τ ≪ 1/ω ≪ σ), temporal-order learning could benefit tremendously from phase precession as long as firing fields have some overlap. Average weight changes and the SNR are highest, however, for clearly distinct but still overlapping firing fields. It should be noted that any even component of the learning window would increase the noise and thus further decrease the SNR.

Dependence of temporal-order learning on the width of the learning window for overlapping firing fields

To investigate how temporal-order learning for an odd learning window depends on its width, we vary the parameter τ and quantify the average synaptic weight change Δwij and the SNR both analytically and numerically. We first study overlapping firing fields (Figure 4) and later consider non-overlapping firing fields (Figure 5).

Effect of the learning-window width on temporal-order learning for overlapping fields (here: Tij=σ).

(A) Average weight change Δwij as a function of the width τ (for the asymmetric window W in Equation 14) for phase precession and phase locking (colored curves). The solid black line depicts the theoretical maximum for large τ (Δwij ≈ 52, Equation 8). The dashed curves show the analytical small-τ approximations (Equation 5). The dotted curve depicts the analytical approximation for the ‘no theta’ case (Equation A2-46 in Appendix 2). The vertical dashed lines mark 1/ω ≈ 0.016 s and the value of σ = Tij = 0.3 s, respectively. (B) The benefit B of phase precession is largest for narrow learning windows, and it approaches 0 for wide windows. Simulations (gray line) and analytical result (black line, small-τ approximation from Equation 20) match well. (C) The signal-to-noise ratio (SNR; phase precession: blue, phase locking: red, no theta: cyan) takes into account that only the asymmetric part of the learning window is helpful for temporal-order learning. For large τ, all three coding scenarios induce the same SNR. The horizontal dashed black line depicts the analytical limit of the SNR for large τ and overlapping firing fields (SNR ≈ 1.6, Equation A1-17 of Appendix 1). The dotted black line depicts the analytical expression for the ‘no theta’ case (Equation A2-48 in Appendix 2; the curve could not be plotted for τ ≳ 0.1 s due to numerical instabilities). Dots represent the SNR for experimentally observed learning windows. The learning windows were taken from ‘B&P’, Bi and Poo, 2001: their Figure 1; ‘F’, Froemke et al., 2005: their Figure 1D bottom; ‘W&W’, Wittenberg and Wang, 2006: their Figure 3; ‘P&G’, Pfister and Gerstner, 2006: their Table 4, ‘All to All’, ‘minimal model’; and ‘B’, Bittner et al., 2017: their Figure 3D. For ‘B&P’, ‘F’, and ‘B’, the position of the dots on the horizontal axis was estimated as the average of the time constants for the positive and negative lobes of the learning windows. Wittenberg and Wang modeled their learning rule by a difference of Gaussians — we approximated the corresponding time constant as 30 ms. For the triplet rule by Pfister and Gerstner, we used the average of three time constants: the two pairwise-interaction time constants (as in Bi and Poo) and the triplet-potentiation time constant. Parameters for all plots: Tij = 0.3 s, ω = 2π·10 Hz, σ = 0.3 s, c = 0.042, A = 10, μ = 1. Colored/gray curves and dots are obtained from stochastic simulations; see Materials and methods for details.

Temporal-order learning for non-overlapping firing fields using wide, asymmetric learning windows.

(A) Firing rates of two example cells with non-overlapping firing fields. (B) Cross-correlation Cij of the two cells from (A). (C) Asymmetric learning window with large width (τ=5 s). (D) Resulting weight change Δwij for wide learning window and non-overlapping firing fields. The solid gray line depicts the average weight change. The dashed gray lines represent ±1 standard deviation across 1000 repetitions of stochastic spiking simulations. The analytical curve (dashed black line, Equation 9) matches the simulation results. (E) SNR of the weight change. Results of the stochastic simulations are shown by the gray curve. The SNR saturates for larger Tij, which fits the analytical expectation (dashed black line, Equation 10). Parameters, unless varied in a plot: Tij=6 s, σ=0.3 s, τ=5 s, μ=1, A=10.

For partly overlapping firing fields (e.g. Tij = σ), we find numerically that the average synaptic weight change Δwij (the ‘learning signal’) increases monotonically with increasing τ and saturates (colored curves in Figure 4A). This is because for increasing τ the overlap between the learning window and the cross-correlation function grows, and this overlap begins to saturate as soon as the learning window is wider than Tij, that is, the value at which the cross-correlation assumes its maximum (compare the dashed curve in Figure 2C). To analytically calculate the saturation value of Δwij for large learning-window widths (τ ≫ σ), we can approximate the learning window as a step function (see Materials and methods for details) and find the maximum

(8) $\Delta w_{ij}^{\max} \approx A^2\mu\,\operatorname{erf}\!\left(\frac{T_{ij}}{2\sigma}\right)$

that provides an upper bound to the weight change for overlapping firing fields (solid line in Figure 4A). For τ ≪ 1/ω (and actually well beyond this region), the analytical small-τ approximation of Δwij (Equation 5, dashed curves in Figure 4A) matches the numerical results well.

The results in Figure 4A confirm that Δwij is increased by phase precession for narrow learning windows but is independent of phase precession for τ ≫ 1/ω. Thus, the benefit B becomes small for large τ (Figure 4B) because, for large enough τ, the theta oscillation completes multiple cycles within the width of the learning window. To better understand this behavior, let us return to Equation 1: if the product of a wide learning window and the cross-correlation Cij is integrated to obtain the weight change, the oscillatory modulation of the cross-correlation (e.g. as in Figure 2C) becomes irrelevant; similarly, according to Equation 3, the particular value of the derivative C′ij(t) near t = 0 can be neglected. Consequently, for τ ≫ 1/ω, phase precession and phase locking as well as the scenario of firing fields that are not theta modulated yield the same weight change (Figure 4A), and the benefit approaches 0 (Figure 4B). Wide learning windows thus ignore the temporal (theta) fine structure of the cross-correlation.

How noisy is this learning signal Δwij across trials? Figure 4C shows that for odd learning windows the SNR increases with increasing τ and, for τ ≫ 1/ω, approaches a constant value. This constant value is the same for phase precession, phase locking, or no theta oscillations at all. Taken together, for large enough τ, the advantage of phase precession vanishes. For small enough τ, phase precession increases the SNR, which confirms and generalizes the results in Figure 3C. Remarkably, the SNR for ‘phase locking’ is lower than the one for ‘no theta’, which means that theta oscillations without phase precession degrade temporal-order learning, even though theta oscillations as such have been emphasized as improving the modification of synaptic strength in many other cases (e.g. Buzsáki, 2002; D'Albis et al., 2015).

Figure 4C predicts that a large τ yields the largest SNR, and thus wide learning windows are the best choice for temporal-order learning; however, we note that this conclusion is restricted to odd (i.e. asymmetric) learning windows. An additional even (i.e. symmetric) component of a learning window would increase the noise without affecting the signal, and thus would decrease the SNR (dots in Figure 4C). It is remarkable that the only experimentally observed instance of a wide window (with τ ≈ 1 s in Bittner et al., 2017) has a strong symmetric component, which leads to a low SNR (dot marked ‘B’ in Figure 4C).

Taken together, we predict that temporal-order learning would strongly benefit from wide, asymmetric windows. However, to date, all experimentally observed (predominantly) asymmetric windows are narrow (e.g. Bi and Poo, 2001; Froemke et al., 2005; Wittenberg and Wang, 2006; see Abbott and Nelson, 2000; Bi and Poo, 2001 for reviews).

Temporal-order learning for wide learning windows (K ≳ σ)

We finally restrict our analysis to wide learning windows, which then allows us to also consider non-overlapping firing fields (Figure 5A; we again use two Gaussians with width σ and separation Tij). To allow for temporal-order learning in this case, the spikes of two non-overlapping fields can only be ‘paired’ by a wide enough learning window. As already indicated in Figure 4, phase precession does not affect the weight change for such wide learning windows, where the width τ of the learning window obeys τ ≫ 1/ω (note that we always assumed many theta oscillation cycles within a firing field, that is, 1/ω ≪ σ). Furthermore, Figure 4 indicated that only the asymmetric part of the learning window contributes to temporal-order learning. For the analysis of temporal-order learning with non-overlapping firing fields and wide learning windows, we thus ignore any theta modulation and phase precession and evaluate, again, only the odd STDP window W(t) = μ sign(t) exp(−|t|/τ). In this case, the weight change (Equation 1) is still determined by the cross-correlation function and the learning window (examples in Figure 5B,C). The resulting weight change Δwij as a function of the temporal separation Tij of firing fields is shown in Figure 5D: with increasing Tij, the weight change Δwij quickly increases, reaches a maximum, and slowly decreases. The initial increase is due to the increasing overlap of the Gaussian bump in Cij with the positive lobe of the learning window. The decrease, on the other hand, is dictated by the time course of the learning window. For τ ≫ σ, these two effects can be approximated by

(9) $\Delta w_{ij} \approx A^2\mu\,\operatorname{erf}\!\left(\frac{T_{ij}}{2\sigma}\right)\exp\!\left(-\frac{T_{ij}}{\tau}\right),$

in which the error function describes the overlap of the cross-correlation with the learning window and the exponential term describes the decay of the learning window (dashed black curve in Figure 5D, see also Equation 25 in Materials and methods for details).
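Equation 9 is straightforward to evaluate directly; a short sketch (our own transcription, with the parameter values of Figure 5) that also locates the maximum on a grid:

```python
import numpy as np
from scipy.special import erf

# Equation 9 transcribed: rise given by the error function, decay by the window.
A, mu, tau, sigma = 10.0, 1.0, 5.0, 0.3   # parameter values as in Figure 5

def delta_w_wide(T_ij):
    return A**2 * mu * erf(T_ij / (2*sigma)) * np.exp(-T_ij / tau)

T = np.linspace(0.0, 20.0, 2001)
dw = delta_w_wide(T)
print(f"maximum near T_ij = {T[np.argmax(dw)]:.2f} s")   # quick rise, slow decay
```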

How does the SNR of the weight change depend on the separation Tij of firing fields? For Tij = 0, the signal is zero and thus also the SNR. As Tij increases, both signal and noise increase, but quickly settle on a constant ratio. The height of this SNR plateau can be approximated by

(10) $\mathrm{SNR} \approx \frac{A}{\sqrt{2A+1}}$

(dashed line in Figure 5E), where A is the number of spikes within a firing field (Equation 11). For A = 10, we find SNR ≈ 2.2, allowing for temporal-order learning with a single synapse. We note that this conclusion is limited to asymmetric STDP windows. A symmetric component (like in Bittner et al., 2017) decreases the SNR and makes temporal-order learning less efficient.

Taken together, temporal-order learning can be performed with wide STDP windows, and phase precession does not provide any benefit; but temporal-order learning requires a purely asymmetric plasticity window. For non-overlapping firing fields, wide learning windows are essential to bridge a temporal gap between the fields.

Discussion

In this report, we show that phase precession facilitates the learning of the temporal order of behavioral sequences for asymmetric learning windows that are shorter than a theta cycle. To quantify this improvement, we use additive, pairwise STDP and calculate the expected weight change for synapses between two activated cells in a sequence. We confirm the long-held hypothesis (Skaggs et al., 1996) that phase precession bridges the vastly different time scales of the slow sequence of behavioral events and the fast STDP rule. Synaptic weight changes can be an order of magnitude higher when phase precession organizes the spiking of multiple cells at the theta time scale as compared to phase-locking cells.

Other mechanisms and models for sequence learning

As an alternative mechanism to bridge the time scales of behavioral events and the induction of synaptic plasticity, Drew and Abbott, 2006 suggested STDP and persistent activity of neurons that code for such events. The authors assume regularly firing neurons that slowly decrease their firing rate after the event and show that this leads to a temporal compression of the sequence of behavioral events. For stochastically firing neurons, this approach is similar to ours with two overlapping, unmodulated Gaussian firing fields. In this case, sequence learning is possible, but the efficiency can be improved considerably by phase precession.

Sato and Yamaguchi, 2003 as well as Shen et al., 2007 investigated the memory storage of behavioral sequences using phase precession and STDP in a network model. In computer simulations, they find that phase precession facilitates sequence learning, which is in line with our results. In contrast to these approaches, our study focuses on a minimal network (two cells), but this simplification allows us to (i) consider a biologically plausible implementation of STDP, firing fields, and phase precession and (ii) derive analytical results. These mathematical results predict parameter dependencies, which is difficult to achieve with only computer simulations.

Related to our work is also the approach by Masquelier and colleagues (Masquelier et al., 2009), who showed that pattern detection can be performed by single neurons using STDP and phase coding, yet they did not include phase precession. They consider patterns in the input whereas, in our framework, it might be argued that patterns between input and output are detected instead.

Noisy activity of neurons and prediction of the minimum number of synapses for temporal-order learning

To account for stochastic spiking, we use Poisson neurons. We find that a single synapse is not sufficient to reliably encode a minimal two-neuron sequence in a single trial because the fluctuations of the weight change are too large. Fortunately, the SNR scales with √(MA), that is, with the square root of the product of the number M of identical, but independent, synapses and the number A of spikes per field traversal of the neurons. For generic hippocampal place fields and typical STDP, we predict that about 14 synapses are sufficient to reliably encode temporal order in a single traversal. Interestingly, peak firing rates of place fields are remarkably high (up to 50 spikes/s; e.g. O'Keefe and Recce, 1993; Huxter et al., 2003). Taken together, in hippocampal networks, reliable encoding of the temporal order of a sequence is possible with a low number of synapses, which matches simulation results on memory replay (Chenkov et al., 2017).

Width, shape, and symmetry of the STDP window are critical for temporal-order learning

Various widths have been observed for STDP learning windows (Abbott and Nelson, 2000; Bi and Poo, 2001). We show that for all experimentally found STDP time constants phase precession can improve temporal-order learning. However, for learning windows much wider than a theta oscillation cycle, the benefit of phase precession for temporal-order learning is small. Wide learning windows, where the width can be even on a behavioral time scale of 1 s (Bittner et al., 2017) or larger, could, on the other hand, enable the association of non-overlapping firing fields. Alternatively, non-overlapping firing fields might also be associated by narrow learning windows if additional cells (with firing fields that fill the temporal gap) help to bridge a large temporal difference, much like 'time cells' in the hippocampal formation (reviewed in Eichenbaum, 2014).

STDP windows typically have symmetric and asymmetric components (Abbott and Nelson, 2000; Mishra et al., 2016). We find that only the asymmetric component supports the learning of temporal order. In contrast, the symmetric component strengthens both forward and backward synapses by the same amount and thus contributes to the association of behavioral events independent of their temporal order. For example, the learning window reported by Bittner et al., 2017 shows only a mild asymmetry and is thus unfavorable for storing the temporal order of behavioral events. Only long, predominantly asymmetric STDP windows would allow for effective temporal-order learning (Figure 4).

Generally, the shape of STDP windows is subject to neuromodulation; for example, cholinergic and adrenergic modulation can alter its polarity and symmetry (Hasselmo, 1999). Also dopamine can change the symmetry of the learning window (Zhang et al., 2009). Therefore, sequence learning could be modulated by the behavioral state (attention, reward, etc.) of the animal.

Key features of phase precession for temporal-order learning: generalization to non-periodic modulation of activity

For STDP windows narrower (∼10 ms) than a theta cycle (∼100 ms), we argue that the slope of the cross-correlation function at zero offset controls the change of the weight of the synapse connecting two neurons; and we show that phase precession can substantially increase this slope. This result predicts that features of the cross-correlation at temporal offsets that are larger than the width of the learning window are irrelevant for temporal-order learning. It is thus conceivable to boost temporal-order learning even without phase precession, which is weak if theta oscillations are weak, as for example in bats (Ulanovsky and Moss, 2007) and humans (Herweg and Kahana, 2018; Qasim et al., 2021). In this case, temporal-order learning may instead benefit from two other phenomena that could create an appropriate shape of the cross-correlation: (i) spiking of cells is locked to common (aperiodic) fluctuations of excitability; (ii) each cell responds to an increase in its excitability the faster the longer ago its firing field has been entered, which may be mediated by a progressive facilitation mechanism. Together, these phenomena can make the cross-correlation exhibit a steeper slope around zero and could even give rise to a local maximum at a positive offset. This temporal fine structure is superimposed on a slower modulation, which is related to the widths of the firing fields. In summary, a progressively decreasing delay of spiking with respect to non-rhythmic fluctuations in excitation generalizes the notion of phase precession. Interestingly, synaptic short-term facilitation, which could generate the described fine structure of the cross-correlation, has also been proposed as a mechanism underlying phase precession (Leibold et al., 2008).

Model assumptions

In our model, we assumed that recurrent synapses (e.g. between neurons representing a sequence) are plastic but weak during encoding, such that they have a negligible influence on the postsynaptic firing rate; and that the feedforward input dominates neuronal activity. These assumptions seem justified as Hasselmo, 1999 indicated that excitatory feedback connections may be suppressed during encoding to avoid interference from previously stored information (see also Haam et al., 2018). Furthermore, neuromodulators facilitate long-term plasticity (reviewed, e.g. by Rebola et al., 2017), which also supports our assumptions.

The assumption of weak recurrent connections implies that these connections do not affect the dynamics. Consequently (and in contrast to Tsodyks et al., 1996), we thus hypothesize that phase precession is not generated by the local, recurrent network (see also, e.g. Chadwick et al., 2016); instead, we assume that phase precession is inherited from upstream feedforward inputs (Chance, 2012; Jaramillo et al., 2014) or generated locally by a cellular/synaptic mechanism (Magee, 2001; Harris et al., 2002; Mehta et al., 2002; Thurley et al., 2008). After temporal-order learning was successful, the resulting asymmetric connections could indeed also generate phase precession (as demonstrated by the simulations in Tsodyks et al., 1996), and this phase precession could then even be similar to the one that has initially helped to shape synaptic connections. Finally, inherited or local cellularly/synaptically generated phase precession and locally network-generated phase precession could interact (as reviewed, for example in Jaramillo and Kempter, 2017).

We assumed in our model that the widths of the two firing fields that represent two events in a sequence are identical (see, e.g. Figure 2A). But firing fields may have different widths, and in this case a slope-size matched phase precession would fail to reproduce the timing of spikes required for the learning of the correct temporal order of the two events. For example, the learned temporal order of events (timed according to field entry) would even be reversed if two fields with different sizes are aligned at their ends. How could the correct temporal order nevertheless be learned in our framework? In the hippocampus, theta oscillations are a traveling wave (Lubenov and Siapas, 2009; Patel et al., 2012) such that there is a positive phase offset of theta oscillations for the wider firing fields in the more ventral parts of the hippocampus. This traveling-wave phenomenon could preserve the temporal order in the phase-precession-induced compressed spike timing, as also pointed out earlier (Leibold and Monsalve-Mercado, 2017; Muller et al., 2018).

Our results on learning rules for sequence learning rely on pairwise STDP, in which pairs of presynaptic and postsynaptic spikes are considered. In contrast, triplet STDP also considers motifs of three spikes (either two presynaptic and one postsynaptic, or two postsynaptic and one presynaptic; Pfister and Gerstner, 2006). Triplet STDP models can reproduce a number of experimental findings that pairwise STDP could not, for example the dependence on the repetition frequency of spike pairs (Sjöström et al., 2001). To investigate the influence of triplet interactions on sequence learning, we implemented the generic triplet rule by Pfister and Gerstner, 2006. We used their ‘minimal’ model, which was regarded as the best model in terms of number of free parameters and fitting error; for the parameters they obtained from fitting the triplet STDP model to hippocampal data, we found only mild differences to our results (see, e.g. Figure 4C). Differences are small because the fitted time constant of the triplet term (40 ms) is smaller than typical inter-spike intervals (50 ms, minimum in field centers) in our simulations.

Replay of sequences and storage of multiple and overlapping sequences

A sequence imprinted in recurrent synaptic weights can be replayed during rest or sleep (Wilson and McNaughton, 1994; Nádasdy et al., 1999; Diba and Buzsáki, 2007; Peyrache et al., 2009; Davidson et al., 2009), which was also observed in network-simulation studies (Matheus Gauy et al., 2020; Malerba and Bazhenov, 2019; Gillett et al., 2020). Replay could thus be a possible readout of the temporal-order learning mechanism. However, replay depends on the many parameters of the network, and a thorough investigation of replay is beyond the scope of this manuscript. Therefore, we focus on synaptic weight changes that represent the formation of sequences in the network, which underlies replay, and we do not simulate replay.

We have considered the minimal example of a sequence of two neurons. Sequences can contain many more neurons, and the question arises how two different sequences can be told apart if they both contain a certain neuron but proceed in different directions — as they might do for sequences of spatial or non-spatial events (Wood et al., 2000). In this case, it may be beneficial to not only strengthen synapses that connect direct successors in the sequence but also synapses that connect to the second-to-next neuron. In this way, the two crossing sequences could be disambiguated, and the wider context in which an event is embedded becomes associated, which is in line with retrieved-context theories of serial-order memory (Long and Kahana, 2019). More generally, it is an interesting question how many sequences can be stored in a network of a given size. Gillett et al., 2020 analytically calculated the capacity for the storage of sequences in a Hebbian network.

In conclusion, our model predicts that phase precession enables efficient and robust temporal-order learning. To test this hypothesis, we suggest experiments that modulate the shape of the STDP window or selectively manipulate phase precession and evaluate memory of temporal order.

Materials and methods

Experimental design: model description

We model the time-dependent firing rate of a phase-precessing cell i (two examples in Figure 2A) as

(11) $f_i(t) = A\,G_{\mu_i,\sigma}(t)\left\{1+\cos[\omega(t-c\mu_i)]\right\},$

where the scaling factor A determines the number of spikes per field traversal and $G_{\mu_i,\sigma}(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(t-\mu_i)^2}{2\sigma^2}\right]$ is a Gaussian function that describes a firing field with center at μi and width σ. The firing field is sinusoidally modulated with theta frequency ω (but the sinusoidal modulation is not a critical assumption, see Discussion), with typically many oscillation cycles in a firing field (ωσ ≫ 1). The compression factor c can be used to vary between phase precession (c > 0), phase locking (c = 0), and phase recession (c < 0) because the average population activity of many such cells oscillates at the frequency (1 − c)ω (Geisler et al., 2010; D'Albis et al., 2015), which provides a reference frame to assign theta phases (Figure 2A). Usually, |c| ≪ 1, with typical values c ∼ 1/(σω) (Geisler et al., 2010); for a pair of cells with overlapping firing fields (centers separated by Tij := μj − μi), the phase delay is ωcTij (Figure 2B).

To quantify temporal-order learning, we consider the average weight change Δwij of the synapse from cell i to cell j, which is (Kempter et al., 1999)

(12) $\Delta w_{ij} = \int_{-\infty}^{+\infty} dt\, W(t)\, C_{ij}(t),$

where Cij(t) is the cross-correlation between the firing rates fi and fj of cells i and j, respectively (Figure 2C,D):

(13) $C_{ij}(t) = \int_{-\infty}^{+\infty} dt'\, f_i(t')\, f_j(t'+t).$

W(t) denotes the synaptic learning window, for example the asymmetric window

(14) $W(t) = \mu\begin{cases}+\exp(-t/\tau), & t \ge 0\\ -\exp(+t/\tau), & t < 0,\end{cases}$

where τ is the time constant and μ>0 is the learning rate (Figure 2E).

For the following calculations, we make two assumptions that are reasonable in the hippocampal formation (O'Keefe and Recce, 1993; Bi and Poo, 2001; Geisler et al., 2010):

  1. The theta oscillation has multiple cycles within the Gaussian envelope of the firing field in Equation 11 (1/ω ≪ σ).

  2. The window W is short compared to the theta period (τ ≪ 1/ω).

Analytical approximation of the cross-correlation function

To explicitly calculate the cross-correlation Cij(t) as defined in Equation 13, we plug in the firing-rate functions (Equation 11) for the two neurons:

$C_{ij}(t) = \int dt'\, A\,G_{0,\sigma}(t')\left[1+\cos(\omega t')\right]\,A\,G_{T_{ij},\sigma}(t'+t)\left\{1+\cos[\omega(t'+t-cT_{ij})]\right\}$
$= A^2\int dt'\,\Big\{G_{0,\sigma}(t')\,G_{T_{ij},\sigma}(t'+t) + G_{0,\sigma}(t')\,G_{T_{ij},\sigma}(t'+t)\cos(\omega t') + G_{0,\sigma}(t')\,G_{T_{ij},\sigma}(t'+t)\cos[\omega(t'+t-cT_{ij})] + G_{0,\sigma}(t')\,G_{T_{ij},\sigma}(t'+t)\cos(\omega t')\cos[\omega(t'+t-cT_{ij})]\Big\}.$

The first term (out of four) describes the cross-correlation of two Gaussians, which results in a Gaussian function centered at Tij and with width √2σ. For the second term, we note that the product of two Gaussians yields a function proportional to a Gaussian with width σ/√2, and then use assumption (i). When integrated, the second term’s contribution to Cij(t) is negligible because the cosine function oscillates multiple times within the Gaussian bump, that is, positive and negative contributions to the integral approximately cancel. The same argument applies to the third term. For the fourth term, we use the trigonometric property cos(α)cos(β) = ½[cos(α+β) + cos(α−β)]. We set α = ωt′, β = ω(t′ + t − cTij) and find

$G_{0,\sigma}(t')\,G_{T_{ij},\sigma}(t'+t)\cos(\omega t')\cos[\omega(t'+t-cT_{ij})] = \frac{1}{2}\,G_{0,\sigma}(t')\,G_{T_{ij},\sigma}(t'+t)\cos[\omega(2t'+t-cT_{ij})] + \frac{1}{2}\,G_{0,\sigma}(t')\,G_{T_{ij},\sigma}(t'+t)\cos[\omega(t-cT_{ij})].$

Again, we use assumption (i) and neglect the first addend on the right-hand side. Notably, the cosine function in the second addend is independent of the integration variable t′. Taken together, we find

(15) $C_{ij}(t) \approx A^2\,G_{T_{ij},\sqrt{2}\sigma}(t)\left\{1+\frac{1}{2}\cos[\omega(t-cT_{ij})]\right\}.$

Thus, the cross-correlation can be approximated by a Gaussian function (center at Tij, width √2σ) that is theta modulated with an amplitude scaled by the factor 1/2.

To further simplify Equation 15, we note that the time constant τ of the STDP window is usually small compared to the theta period (assumption (ii), Figure 2C,D,E). Structures in Cij(t) for |t| ≫ τ thus have a negligible effect on the synaptic weight change. Therefore, we can focus on the cross-correlation for small temporal lags. In this range, we approximate the (slow) Gaussian modulation of Cij(t) (Figure 2C,D, dashed red line) by a linear function, that is,

(16) $G_{T_{ij},\sqrt{2}\sigma}(t) \approx \left.\frac{d}{dt}G_{T_{ij},\sqrt{2}\sigma}(t)\right|_{t=0} t + G_{T_{ij},\sqrt{2}\sigma}(0) = G_{T_{ij},\sqrt{2}\sigma}(0)\left(\frac{T_{ij}}{2\sigma^2}\,t+1\right).$

Inserting this result in Equation 15, we approximate the cross-correlation function Cij(t) for |t| ≲ τ as (Figure 2D, dashed black line)

(17) $C_{ij}(t) \approx A^2\,G_{T_{ij},\sqrt{2}\sigma}(0)\left(\frac{T_{ij}}{2\sigma^2}\,t+1\right)\left[1+\frac{1}{2}\cos[\omega(t-cT_{ij})]\right].$

In the Results, we show that the slope of the cross-correlation function at t=0 is important for temporal-order learning. From Equation 17 we find

(18) $C_{ij}'(0) \approx A^2\,\frac{T_{ij}}{2\sigma^2}\,G_{T_{ij},\sqrt{2}\sigma}(0)\left[1+\omega\sigma\,\omega c\sigma\,\frac{\sin(\omega c T_{ij})}{\omega c T_{ij}}+\frac{\cos(\omega c T_{ij})}{2}\right],$

which has three addends within the square brackets. Let us estimate the relative size of the second and third addends with respect to the first one. The third addend is at most of the order of 0.5 because |cos(ωcTij)| ≤ 1. For the second addend, we note that sin(ωcTij)/(ωcTij) approaches 1 for Tij → 0 and remains in this range for |ωcTij| ≲ π/4. This condition is fulfilled for |Tij| ≲ σ if we assume slope-size matching of phase precession (Geisler et al., 2010), that is, ωcσ = π/4 ≈ 0.79. Then, the size of the second addend is dictated by the factor ωσ, which is large according to assumption (i). In other words, for typical phase precession and |Tij| ≲ σ, the second addend is much larger than the other two.

To further understand the structure of C′ij(0), which is also shaped by the prefactors in front of the square brackets, we first note that C′ij(0) is zero for fully overlapping firing fields (Tij = 0). On the other hand, for very large field separations (Tij ≫ σ), the Gaussian term GTij,√2σ(0) causes C′ij(0) to become zero. The prefactors have a maximum at |Tij| = √2σ. The maximum’s exact location is slightly shifted by the second addend but remains near √2σ. This peak will be important because it is inherited by the average weight change (Equation 3).

Average weight change

Having approximated the cross-correlation function and its slope at zero (Equations 17 and 18), we are now ready to calculate the average synaptic weight change (Equation 3) for the assumed STDP window (Equation 14). Standard integration methods yield

(19) $\Delta w_{ij} = A^2\mu\tau^2\,\frac{T_{ij}}{\sigma^2}\,G_{T_{ij},\sqrt{2}\sigma}(0)\left[1+\frac{\omega^2\sigma^2 c}{\omega^2\tau^2+1}\,\frac{\sin(\omega c T_{ij})}{\omega c T_{ij}}+\frac{(1-\omega^2\tau^2)\cos(\omega c T_{ij})}{2\,(1+\omega^2\tau^2)^2}\right].$

Because Δwij is a temporal average of C′ij(t) for small t (see interpretation of Equation 3), the weight change’s structure resembles the previously discussed structure of C′ij(0). The averaging introduces additional factors proportional to 1 ± ω²τ², but for ωτ ≪ 1 [assumption (ii)] those have only minor effects on the relative size of the three addends. The second addend still dominates. Importantly, Δwij = 0 for Tij = 0, and the position of the peak at Tij ≈ √2σ is inherited from C′ij(0) (Figure 3A).

The benefit of phase precession

To quantify the benefit B of phase precession, we consider the expression Δwij/Δwij(c=0) − 1, because Δwij describes the overall weight change (including phase precession), and Δwij(c=0) serves as the baseline weight change due to the temporal separation of the firing fields (without phase precession). We subtract 1 to obtain B = 0 when the weight changes are the same with and without phase precession. From Equation 19 we find

(20) $B = \frac{2}{3}\,\omega^2\sigma^2 c\,\frac{\sin(\omega c T_{ij})}{\omega c T_{ij}}\,\frac{1+\omega^2\tau^2}{1+\omega^2\tau^2+\frac{2}{3}\omega^4\tau^4} + \frac{\cos(\omega c T_{ij})-1}{3}\,\frac{1-\omega^2\tau^2}{1+\omega^2\tau^2+\frac{2}{3}\omega^4\tau^4}.$

To better understand the structure of $B$, we Taylor-expand it in $T_{ij}$ up to the third order and assume $\omega^4\tau^4 \ll 1$ [assumption (ii)]. The result is

(21) $B \approx \frac{2}{3}\,\omega^2\sigma^2 c\left[1 - \left(\frac{1}{6}\,\omega^2 c^2 + \frac{c}{4\sigma^2}\,\frac{1-\omega^2\tau^2}{1+\omega^2\tau^2}\right)T_{ij}^2\right].$

Thus, $B$ assumes a maximum for $T_{ij}=0$ and slowly decays for small $T_{ij}$ (Figure 3B). Using slope-size matching ($\omega\sigma c = \pi/4$), the maximal benefit is

(22) $B_{\max} \approx \frac{\pi}{6}\,\omega\sigma = \frac{\pi^2}{12}\,\frac{L}{T_\theta} \approx 0.82\,\frac{L}{T_\theta},$

where $L = 4\sigma$ denotes the total field size and $T_\theta = 2\pi/\omega$ is the period of the theta oscillation. Thus, the number of theta cycles per firing field determines the benefit for small separations of the firing fields.
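As a quick plausibility check, the following sketch (again with assumed parameter values) evaluates Equation 20 at small field separations and compares it with $B_{\max} = (\pi/6)\,\omega\sigma$; the two agree up to corrections of order $\omega^2\tau^2$:

```python
# Sketch: benefit of phase precession (Equation 20) vs. its maximum (Equation 22).
import numpy as np

sigma, tau = 0.5, 0.01             # field width and STDP time constant (s); assumed
omega = 2 * np.pi * 8.0            # theta frequency (rad/s); assumed 8 Hz
c = np.pi / (4 * omega * sigma)    # slope-size matching

def benefit(T):
    """Equation 20; np.sinc(x/pi) equals sin(x)/x."""
    wt2 = (omega * tau)**2
    denom = 1 + wt2 + (2/3) * wt2**2
    term1 = (2/3) * (omega*sigma)**2 * c * np.sinc(omega*c*T/np.pi) * (1+wt2) / denom
    term2 = (np.cos(omega*c*T) - 1) / 3 * (1-wt2) / denom
    return term1 + term2

print("B(T_ij -> 0):", benefit(1e-9))
print("B_max = (pi/6) omega sigma:", np.pi/6 * omega * sigma)
```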

Average weight change for wide learning windows


In this paragraph we relax assumption (ii), that is, we consider wide asymmetric learning windows $W$ (Equation 14 with $\tau \gg \sigma$). Furthermore, we neglect any theta-oscillatory modulation of the firing fields in Equation 11 and, thus, of $C_{ij}$ in Equation 15.

First, for non-overlapping fields ($T_{ij} \gg \sigma$), the learning window can be approximated to be constant near the peak of the Gaussian bump of $C_{ij}$. We can thus rewrite Equation 1 as

(23) $\langle\Delta w_{ij}\rangle \approx W(T_{ij})\int_{-\infty}^{\infty} C_{ij}(t)\,dt = A^2\mu\exp\left(-\frac{T_{ij}}{\tau}\right).$

Second, for overlapping fields ($0 < T_{ij} \lesssim \sigma$), the Gaussian bump of $C_{ij}$ partly lies on the negative lobe of $W$. We can approximate $W(t) = \mu\,\mathrm{sgn}(t)$, and the average weight change in Equation 1 then reads

(24) $\langle\Delta w_{ij}\rangle \approx A^2\mu\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right).$

Combining the two limiting cases in Equations 23 and 24 yields

(25) $\langle\Delta w_{ij}\rangle \approx A^2\mu\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\exp\left(-\frac{T_{ij}}{\tau}\right).$
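A one-function sketch of Equation 25 (all parameter values are placeholders): the product of error function and exponential reproduces the rise, peak, and slow decay of the weight change with increasing field separation:

```python
# Sketch of Equation 25: average weight change for a wide asymmetric window.
import numpy as np
from scipy.special import erf

def delta_w_wide(T_ij, sigma, tau, A=10.0, mu=1.0):
    """<dw_ij> ~ A^2 mu erf(T_ij/(2 sigma)) exp(-T_ij/tau), Equation 25."""
    return A**2 * mu * erf(T_ij / (2 * sigma)) * np.exp(-T_ij / tau)

T = np.linspace(0.0, 5.0, 501)
dw = delta_w_wide(T, sigma=0.5, tau=2.0)       # assumed sigma and tau (s)
print("weight change peaks near T_ij =", T[np.argmax(dw)])
```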

Signal-to-noise ratio


To correctly encode the temporal order of behavioral events, the average weight change $\langle\Delta w_{ij}\rangle$ of a forward synapse needs to be larger than the average weight change $\langle\Delta w_{ji}\rangle$ of the corresponding backward synapse. We thus define the signal-to-noise ratio as

$\mathrm{SNR} = \dfrac{\langle\Delta w_{ij}\rangle - \langle\Delta w_{ji}\rangle}{\mathrm{std}\bigl(\Delta w_{ij}^{(k)}\bigr) + \mathrm{std}\bigl(\Delta w_{ji}^{(k)}\bigr)},$

where $\mathrm{std}(\cdot)$ denotes the standard deviation and $\Delta w_{ij}^{(k)}$, $\Delta w_{ji}^{(k)}$ are the weight changes for trial $k \in \{1, \ldots, N\}$, the averages across trials being $\langle\Delta w_{ij}\rangle = \langle\Delta w_{ij}^{(k)}\rangle_k$ and $\langle\Delta w_{ji}\rangle = \langle\Delta w_{ji}^{(k)}\rangle_k$. This expression for the SNR ‘punishes’ the non-sequence-specific strengthening of backward synapses. Specifically, $\mathrm{SNR}=0$ for a symmetric (even) learning window, because the numerator (which represents the ‘signal’) is zero. On the other hand, a perfectly asymmetric learning window, like the one used throughout this study (Equation 14), yields $\mathrm{SNR} = \langle\Delta w_{ij}\rangle / \mathrm{std}\bigl(\Delta w_{ij}^{(k)}\bigr)$, because $\Delta w_{ij}^{(k)} = -\Delta w_{ji}^{(k)}$. Asymmetric learning windows thus recover the classical definition of the SNR as the ratio between the average weight change and the standard deviation of the weight change.

We note that the generalized definition above can be used to calculate the SNR for arbitrary windows, such as the learning window from Bittner et al., 2017 (Figure 4C).

Assuming an asymmetric window and M uncorrelated synapses with the same mean and variance of the weight change, we can write the signal-to-noise ratio as

$\mathrm{SNR} = \dfrac{M\,\langle\Delta w_{ij}\rangle}{\sqrt{\mathrm{var}\left(\sum_{k=1}^{M}\Delta w_{ij}^{(k)}\right)}} = \dfrac{\sqrt{M}\,\langle\Delta w_{ij}\rangle}{\mathrm{std}\bigl(\Delta w_{ij}^{(k)}\bigr)},$

because the variance of the sum can be decomposed into the sum of variances and covariances. All covariances are zero because synapses are uncorrelated. This leaves a sum of $M$ identical variances. Therefore, the standard deviation of the summed weight change scales with $\sqrt{M}$, and consequently the SNR also scales with $\sqrt{M}$.
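This scaling can be illustrated with a few lines of Python; all statistics below are synthetic assumptions chosen for illustration:

```python
# Sketch: SNR of the summed weight change over M uncorrelated synapses
# grows with sqrt(M); the per-synapse statistics are synthetic.
import numpy as np

rng = np.random.default_rng(1)
N, M = 10_000, 25                       # trials and synapses; assumed
mean_dw, std_dw = 0.2, 1.0              # per-synapse mean and std; assumed

dw = rng.normal(mean_dw, std_dw, size=(N, M))   # independent weight changes
total = dw.sum(axis=1)                          # summed change per trial

snr_single = mean_dw / std_dw
snr_total = total.mean() / total.std()
print("SNR ratio:", snr_total / snr_single, "~ sqrt(M) =", np.sqrt(M))
```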

Numerical simulations


To numerically simulate the synaptic weight change, spikes were generated by inhomogeneous Poisson processes with rate functions according to Equation 11. For every spike pair, the contribution to the weight change was calculated according to Equation 14. We repeated the simulations for $N = 10^4$ trials, and the mean weight change as well as the standard deviation across trials and the SNR were estimated. All simulations were implemented in Python 3.8 using the packages NumPy (RRID:SCR_008633) and SciPy (RRID:SCR_008058). Matplotlib (RRID:SCR_008624) was used for plotting; Inkscape (RRID:SCR_014479) was used for final adjustments to the figures. The Python code is available at https://gitlab.com/e.reifenstein/synaptic-learning-rules-for-sequence-learning (Reifenstein and Kempter, 2021; copy archived at swh:1:rev:157c347a735a090f591a2b77a71b90d7de65bca5).
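The following condensed sketch illustrates this simulation protocol. Field and window parameters are assumptions chosen for illustration, and the theta modulation of the rates in Equation 11 is omitted here for brevity:

```python
# Condensed sketch of the simulation protocol (illustrative parameters only).
import numpy as np

rng = np.random.default_rng(0)
A, sigma, T_ij = 10.0, 0.5, 0.5      # spikes per field, field width (s), separation (s)
tau, mu = 0.02, 1.0                  # STDP time constant (s) and learning rate

def rate(t, center):
    """Gaussian firing field with A expected spikes per traversal."""
    return A / (np.sqrt(2 * np.pi) * sigma) * np.exp(-(t - center)**2 / (2 * sigma**2))

def poisson_spikes(center, t0=-5.0, t1=6.0):
    """Inhomogeneous Poisson process, generated by thinning."""
    rmax = rate(center, center)                       # peak rate
    n = rng.poisson(rmax * (t1 - t0))
    t = np.sort(rng.uniform(t0, t1, n))
    return t[rng.uniform(0, rmax, n) < rate(t, center)]

def weight_change(pre, post):
    """Sum an odd exponential window (cf. Equation 14) over all spike pairs."""
    dt = post[None, :] - pre[:, None]                 # post minus pre spike times
    return mu * np.sum(np.sign(dt) * np.exp(-np.abs(dt) / tau))

dw = np.array([weight_change(poisson_spikes(0.0), poisson_spikes(T_ij))
               for _ in range(1000)])
print("mean:", dw.mean(), " std:", dw.std(), " SNR:", dw.mean() / dw.std())
```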

Appendix 1

The signal-to-noise ratio of Δwij

A synapse with weight $w_{ij}$ is assumed to connect neuron $i$ to neuron $j$. Here, we aim to derive the signal-to-noise ratio (SNR) of the weight changes $\Delta w_{ij}$, which is defined as (Materials and methods)

(A1-1) $\mathrm{SNR} = \dfrac{\langle\Delta w_{ij}\rangle - \langle\Delta w_{ji}\rangle}{\mathrm{std}(\Delta w_{ij}) + \mathrm{std}(\Delta w_{ji})},$

where $\langle\Delta w_{ij}\rangle$ is the average signal. The noise is described by the standard deviation of the weight change,

$\mathrm{std}(\Delta w_{ij}) = \sqrt{\mathrm{var}(\Delta w_{ij})} = \sqrt{\langle\Delta w_{ij}^2\rangle - \langle\Delta w_{ij}\rangle^2}.$

Signal and noise are generated by additive STDP and spiking activity that is modeled by two inhomogeneous Poisson processes with rates $f_i(t)$ and $f_j(t)$ that have finite support. The average weight change is calculated as $\langle\Delta w_{ij}\rangle = \int dt\,W(t)\,C_{ij}(t)$, where $W(t)$ is the synaptic learning window and $C_{ij}(t)$ denotes the cross-correlation function $C_{ij}(t) = \int dt'\,f_i(t')\,f_j(t'+t)$. From Kempter et al., 1999, we use

(A1-2) $\langle\Delta w_{ij}^2(t)\rangle = \langle\Delta w_{ij}(t_0)\rangle^2 + 2\,\langle\Delta w_{ij}(t_0)\rangle\,\langle\Delta w_{ij}(t)\rangle + \int_{t_0}^{t}dt'\int_{t_0}^{t}du\,\Big\{\langle S_i(t')S_i(u)\rangle\,(w^{\mathrm{in}})^2 + \langle S_j(t')S_j(u)\rangle\,(w^{\mathrm{out}})^2 + \langle S_i(t')S_j(u)\rangle\,2\,w^{\mathrm{in}}w^{\mathrm{out}} + 2\int ds\,W(s)\,\bigl(\langle S_i(t')S_i(u+s)S_j(u)\rangle\,w^{\mathrm{in}} + \langle S_j(t')S_i(u+s)S_j(u)\rangle\,w^{\mathrm{out}}\bigr) + \int ds\int dv\,W(s)\,W(v)\,\langle S_i(t'+s)S_i(u+v)S_j(t')S_j(u)\rangle\Big\},$

where $S_i(t) = \sum_n \delta(t - t_i^{(n)})$ and $S_j(t) = \sum_n \delta(t - t_j^{(n)})$ are the presynaptic and postsynaptic spike trains, respectively. To simplify, we set $t_0 = -\infty$ and $\Delta w_{ij}(t_0) = 0$. Furthermore, we are interested in paired STDP and thus set $w^{\mathrm{in}} = w^{\mathrm{out}} = 0$. For $\langle\Delta w_{ij}^2\rangle = \lim_{t\to\infty}\langle\Delta w_{ij}^2(t)\rangle$, Equation A1-2 reduces to

(A1-3) $\langle\Delta w_{ij}^2\rangle = \int_{-\infty}^{\infty}dt\int_{-\infty}^{\infty}du\int ds\int dv\,W(s)\,W(v)\,\langle S_i(t+s)\,S_i(u+v)\,S_j(t)\,S_j(u)\rangle.$

Because both spike trains are drawn from different Poisson processes, $S_i$ and $S_j$ are statistically independent, and therefore we can simplify

$\langle S_i(t+s)\,S_i(u+v)\,S_j(t)\,S_j(u)\rangle = \langle S_i(t+s)\,S_i(u+v)\rangle\,\langle S_j(t)\,S_j(u)\rangle.$

Moreover, in a spike train the spikes at different times are uncorrelated,

$\langle S_i(t+s)\,S_i(u+v)\rangle = \langle S_i(t+s)\rangle\,\langle S_i(u+v)\rangle + \langle S_i(t+s)\rangle\,\delta(t+s-u-v)$

and

$\langle S_j(t)\,S_j(u)\rangle = \langle S_j(t)\rangle\,\langle S_j(u)\rangle + \langle S_j(t)\rangle\,\delta(t-u).$

As $S_i$ and $S_j$ are realizations of inhomogeneous Poisson processes with rates $f_i(t)$ and $f_j(t)$, respectively, we find

$\langle S_i(t+s)\,S_i(u+v)\rangle = f_i(t+s)\,f_i(u+v) + f_i(t+s)\,\delta(t+s-u-v)$

and

$\langle S_j(t)\,S_j(u)\rangle = f_j(t)\,f_j(u) + f_j(t)\,\delta(t-u).$

We insert these expressions into Equation A1-3:

(A1-4) $\langle\Delta w_{ij}^2\rangle = \int_{-\infty}^{\infty}dt\int_{-\infty}^{\infty}du\int ds\int dv\,W(s)\,W(v)\,F(t,s,u,v),$

where

$F(t,s,u,v) = \bigl[f_i(t+s)\,f_i(u+v) + f_i(t+s)\,\delta(t+s-u-v)\bigr]\bigl[f_j(t)\,f_j(u) + f_j(t)\,\delta(t-u)\bigr].$

To explicitly calculate the SNR, we parameterize the firing rates as

(A1-5) $f_i(t) = \dfrac{A}{\sqrt{2\pi}\,\sigma}\exp\left(-\dfrac{t^2}{2\sigma^2}\right)\bigl[1+\cos(\omega t)\bigr]$

and

(A1-6) $f_j(t) = \dfrac{A}{\sqrt{2\pi}\,\sigma}\exp\left(-\dfrac{(t-T_{ij})^2}{2\sigma^2}\right)\bigl[1+\cos(\omega(t-cT_{ij}))\bigr].$

See main text for definitions of symbols. Furthermore, we assume $W(s) = W_{\mathrm{odd}}(s) + W_{\mathrm{even}}(s)$ with

$W_{\mathrm{odd}}(s) = \mu\begin{cases}+\exp(-s/\tau), & s \ge 0\\ -\exp(+s/\tau), & s < 0\end{cases}$

(see Equation 14 in Materials and methods) and

$W_{\mathrm{even}}(s) = \lambda\exp(-|s|/\kappa).$

In what follows we consider a limiting case of wide learning windows, for which we can explicitly calculate the SNR. The results obtained in this case match well to the numerical simulations for wide learning windows (Figures 4 and 5 in the main text).

Wide learning windows

For wide windows (formally: $\tau \to \infty$, $\kappa \to \infty$), we can approximate $W_{\mathrm{even}} = \lambda$ and $W_{\mathrm{odd}}(t) = \mu\,\mathrm{sgn}(t)$, and neglect the sinusoidal modulations of $f_i$ and $f_j$ in Equations A1-5 and A1-6; phase precession does not affect the SNR in this case.

The following calculations are similar for odd and even windows. We elaborate the calculations in detail for odd windows and use ‘±’ and ‘∓’ to include the similar calculations for even windows. The top symbol (‘+’ and ‘−’, respectively) corresponds to odd windows; the bottom symbol corresponds to even windows.

To start, we split the third and fourth integrals in Equation A1-4 into positive and negative time lags $s$ and $v$, respectively:

(A1-7) $\frac{1}{\mu^2}\langle\Delta w_{ij}^2\rangle = \int dt\int du\int ds\int dv\,\mathrm{sgn}(s)\,\mathrm{sgn}(v)\,F(t,s,u,v) = \int dt\int du\,\Big\{\int_0^{\infty}ds\int_0^{\infty}dv\,F(t,s,u,v) - \int_0^{\infty}ds\int_{-\infty}^{0}dv\,F(t,s,u,v) - \int_{-\infty}^{0}ds\int_0^{\infty}dv\,F(t,s,u,v) + \int_{-\infty}^{0}ds\int_{-\infty}^{0}dv\,F(t,s,u,v)\Big\}$

We rewrite F as

$F(t,s,u,v) = \underbrace{f_i(t+s)\,f_i(u+v)\,f_j(t)\,f_j(u)}_{\text{(i)}} + \underbrace{f_i(t+s)\,f_i(u+v)\,f_j(t)\,\delta(t-u)}_{\text{(ii)}} + \underbrace{f_i(t+s)\,\delta(t+s-u-v)\,f_j(t)\,f_j(u)}_{\text{(iii)}} + \underbrace{f_i(t+s)\,\delta(t+s-u-v)\,f_j(t)\,\delta(t-u)}_{\text{(iv)}},$

which has four addends and occurs in four integrals in Equation A1-7. Thus, there are 16 terms we need to evaluate. We label these terms (1.i) to (1.iv) for the first integral, (2.i) to (2.iv) for the second integral and so on until (4.iv).

For the term (1.i) we find

(A1-8) $\int dt\int du\int_0^{\infty}ds\int_0^{\infty}dv\,f_i(t+s)\,f_i(u+v)\,f_j(t)\,f_j(u) = \int dt\,f_j(t)\int du\,f_j(u)\int_0^{\infty}ds\,f_i(t+s)\int_0^{\infty}dv\,f_i(u+v) = \frac{A^2}{4}\int dt\,f_j(t)\left[1-\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right]\int du\,f_j(u)\left[1-\mathrm{erf}\left(\frac{u}{\sqrt{2}\sigma}\right)\right] = \frac{A^2}{4}\left\{\int dt\,f_j(t)\left[1-\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right]\right\}^2 = \frac{A^2}{4}\left\{\int dt\,f_j(t) - \int dt\,f_j(t)\,\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right\}^2.$

The first integral is $\int_{-\infty}^{\infty}dt\,f_j(t) = A$. The second integral can be solved by taking the derivative with respect to $T_{ij}$:

$\int dt\,f_j(t)\,\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right) = \int dT_{ij}\int dt\,\frac{d}{dT_{ij}}f_j(t)\,\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right) = \int dT_{ij}\int dt\left[-\frac{d}{dt}f_j(t)\right]\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)$ (A1-9; because $\frac{d}{dT_{ij}}f_j(t) = -\frac{d}{dt}f_j(t)$) $= \int dT_{ij}\left\{\left[\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\bigl[-f_j(t)\bigr]\right]_{-\infty}^{\infty} + \int dt\,\frac{2}{A}\,f_i(t)\,f_j(t)\right\}$ (integration by parts) $= \int dT_{ij}\left\{\bigl[1\cdot 0 - (-1)\cdot 0\bigr] + \frac{2}{A}\cdot\frac{A^2}{2\sqrt{\pi}\sigma}\exp\left(-\frac{T_{ij}^2}{4\sigma^2}\right)\right\} = \frac{A}{\sqrt{\pi}\sigma}\int dT_{ij}\exp\left(-\frac{T_{ij}^2}{4\sigma^2}\right) = A\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right).$ (A1-10)

Term (1.i) (Equation A1-8) thus reads:

$\int dt\int du\int_0^{\infty}ds\int_0^{\infty}dv\,f_i(t+s)\,f_i(u+v)\,f_j(t)\,f_j(u) = \frac{A^2}{4}\left[A - A\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right]^2 = \frac{A^4}{4}\left[1-\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right]^2$

For (2.i) we find

$\int dt\int du\int_0^{\infty}ds\int_{-\infty}^{0}dv\,f_i(t+s)\,f_i(u+v)\,f_j(t)\,f_j(u) = \int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\int du\,f_j(u)\int_{-\infty}^{0}dv\,f_i(u+v) = \frac{A^2}{4}\int dt\,f_j(t)\left[1-\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right]\int du\,f_j(u)\left[1+\mathrm{erf}\left(\frac{u}{\sqrt{2}\sigma}\right)\right] = \frac{A^2}{4}\left[A - A\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right]\left[A + A\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right]$ (using Equation A1-10) $= \frac{A^4}{4}\left[1-\mathrm{erf}^2\left(\frac{T_{ij}}{2\sigma}\right)\right]$

Term (3.i) is symmetric to (2.i) and thus yields the same result. For (4.i) we find (in analogy to the term (1.i)):

$\int dt\int du\int_{-\infty}^{0}ds\int_{-\infty}^{0}dv\,f_i(t+s)\,f_i(u+v)\,f_j(t)\,f_j(u) = \frac{A^2}{4}\left(\int dt\,f_j(t)\left[1+\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right]\right)^2 = \frac{A^4}{4}\left[1+\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right]^2$

We sum the contributions (1.i) to (4.i) for the odd learning window:

(A1-11) $\frac{A^4}{4}\left[1-\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right]^2 - 2\,\frac{A^4}{4}\left[1-\mathrm{erf}^2\left(\frac{T_{ij}}{2\sigma}\right)\right] + \frac{A^4}{4}\left[1+\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right]^2 = \frac{A^4}{4}\left[1 - 2\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right) + \mathrm{erf}^2\left(\frac{T_{ij}}{2\sigma}\right) - 2 + 2\,\mathrm{erf}^2\left(\frac{T_{ij}}{2\sigma}\right) + 1 + 2\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right) + \mathrm{erf}^2\left(\frac{T_{ij}}{2\sigma}\right)\right] = A^4\,\mathrm{erf}^2\left(\frac{T_{ij}}{2\sigma}\right)$

Let us continue with the second term of F, which is labeled by ‘(ii)’, and consider the first (of four) integrals in Equation A1-7, that is, we continue with contribution (1.ii):

$\int dt\int du\int_0^{\infty}ds\int_0^{\infty}dv\,f_i(t+s)\,f_i(u+v)\,f_j(t)\,\delta(t-u) = \int dt\,f_j(t)\int du\,\delta(t-u)\int_0^{\infty}ds\,f_i(t+s)\int_0^{\infty}dv\,f_i(u+v) = \frac{A^2}{4}\int dt\,f_j(t)\left[1-\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right]^2 = \frac{A^2}{4}\int dt\,f_j(t)\left[1-2\,\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)+\mathrm{erf}^2\left(\frac{t}{\sqrt{2}\sigma}\right)\right] = \frac{A^3}{4}\left[1-2\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)+\frac{C}{A}\right]$ (using Equation A1-10),

with

$C = \int_{-\infty}^{\infty}dt\,f_j(t)\,\mathrm{erf}^2\left(\frac{t}{\sqrt{2}\sigma}\right),$

which will be solved later for special cases. Note that $C$ depends on $T_{ij}$ because $f_j(t)$ depends on $T_{ij}$. For (2.ii) we find:

$\int dt\int du\int_0^{\infty}ds\int_{-\infty}^{0}dv\,f_i(t+s)\,f_i(u+v)\,f_j(t)\,\delta(t-u) = \frac{A^2}{4}\int dt\,f_j(t)\left[1-\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right]\left[1+\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right] = \frac{A^2}{4}\int dt\,f_j(t)\left[1-\mathrm{erf}^2\left(\frac{t}{\sqrt{2}\sigma}\right)\right] = \frac{A^3}{4}\left(1-\frac{C}{A}\right).$

For (3.ii) we find the same:

$\int dt\int du\int_{-\infty}^{0}ds\int_0^{\infty}dv\,f_i(t+s)\,f_i(u+v)\,f_j(t)\,\delta(t-u) = \frac{A^2}{4}\int dt\,f_j(t)\left[1+\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right]\left[1-\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right] = \frac{A^3}{4}\left(1-\frac{C}{A}\right).$

For (4.ii) we find:

$\int dt\int du\int_{-\infty}^{0}ds\int_{-\infty}^{0}dv\,f_i(t+s)\,f_i(u+v)\,f_j(t)\,\delta(t-u) = \frac{A^2}{4}\int dt\,f_j(t)\left[1+\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right]^2 = \frac{A^2}{4}\int dt\,f_j(t)\left[1+2\,\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)+\mathrm{erf}^2\left(\frac{t}{\sqrt{2}\sigma}\right)\right] = \frac{A^3}{4}\left[1+2\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)+\frac{C}{A}\right].$

Summing contributions (1.ii) to (4.ii) for the odd window yields:

(A1-12) $\frac{A^3}{4}\left[1-2\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)+\frac{C}{A}\right] - 2\,\frac{A^3}{4}\left(1-\frac{C}{A}\right) + \frac{A^3}{4}\left[1+2\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)+\frac{C}{A}\right] = \frac{A^3}{4}\cdot\frac{4C}{A} = CA^2$

We continue with contribution (1.iii):

$\int dt\int du\int_0^{\infty}ds\int_0^{\infty}dv\,f_j(t)\,f_j(u)\,f_i(t+s)\,\delta(t+s-u-v) = \int dt\,f_j(t)\int du\,f_j(u)\int_0^{\infty}ds\,f_i(t+s)\int_0^{\infty}dv\,\delta(t+s-u-v)$

Contribution (1.iii) is non-zero if the argument $t+s-u-v$ of the delta function in the last integral (across $v$) is zero for some $v$, which varies from 0 to $\infty$. The argument of the delta function is thus zero for some $v$ if $0 \le t+s-u < \infty$, which we can rewrite as $u \le t+s$ and then use in the integral across $u$, which leads to

$\int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\int_{-\infty}^{t+s}du\,f_j(u) = \frac{A}{2}\int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\left[1+\mathrm{erf}\left(\frac{s+t-T_{ij}}{\sqrt{2}\sigma}\right)\right] = \frac{A}{2}\int dt\,f_j(t)\left[\int_0^{\infty}ds\,f_i(t+s) + \int_0^{\infty}ds\,f_i(t+s)\,\mathrm{erf}\left(\frac{s+t-T_{ij}}{\sqrt{2}\sigma}\right)\right] = \frac{A^2}{4}\int dt\,f_j(t)\left[1-\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right] + \frac{A}{2}\int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\,\mathrm{erf}\left(\frac{s+t-T_{ij}}{\sqrt{2}\sigma}\right) = \frac{A^3}{4}\left[1-\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right] + \frac{DA}{2}$ (using Equation A1-10)

with $D := \int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\,\mathrm{erf}\left(\frac{s+t-T_{ij}}{\sqrt{2}\sigma}\right)$. $D$ will be evaluated later for special cases.

Similarly to (1.iii), we treat (2.iii):

$\mp\int dt\int du\int_0^{\infty}ds\int_{-\infty}^{0}dv\,f_j(t)\,f_j(u)\,f_i(t+s)\,\delta(t+s-u-v) = \mp\int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\int_{t+s}^{\infty}du\,f_j(u) = \mp\frac{A}{2}\int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\left[1-\mathrm{erf}\left(\frac{s+t-T_{ij}}{\sqrt{2}\sigma}\right)\right] = \mp\frac{A}{2}\int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s) \pm \frac{A}{2}\int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\,\mathrm{erf}\left(\frac{s+t-T_{ij}}{\sqrt{2}\sigma}\right) = \mp\frac{A^2}{4}\int dt\,f_j(t)\left[1-\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right] \pm \frac{A}{2}D = \mp\frac{A^3}{4}\left[1-\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right] \pm \frac{DA}{2}$

For (3.iii) we find:

$\mp\int dt\int du\int_{-\infty}^{0}ds\int_0^{\infty}dv\,f_j(t)\,f_j(u)\,f_i(t+s)\,\delta(t+s-u-v) = \mp\int dt\,f_j(t)\int_{-\infty}^{0}ds\,f_i(t+s)\int_{-\infty}^{t+s}du\,f_j(u) = \mp\frac{A}{2}\int dt\,f_j(t)\int_{-\infty}^{0}ds\,f_i(t+s)\left[1+\mathrm{erf}\left(\frac{s+t-T_{ij}}{\sqrt{2}\sigma}\right)\right] = \mp\frac{A^2}{4}\int dt\,f_j(t)\left[1+\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right] \mp \frac{A}{2}D' = \mp\frac{A^3}{4}\left[1+\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right] \mp \frac{D'A}{2},$

with $D' := \int dt\,f_j(t)\int_{-\infty}^{0}ds\,f_i(t+s)\,\mathrm{erf}\left(\frac{s+t-T_{ij}}{\sqrt{2}\sigma}\right)$, which we will evaluate later for special cases.

Finally, for (4.iii) we find

$\int dt\int du\int_{-\infty}^{0}ds\int_{-\infty}^{0}dv\,f_j(t)\,f_j(u)\,f_i(t+s)\,\delta(t+s-u-v) = \int dt\,f_j(t)\int_{-\infty}^{0}ds\,f_i(t+s)\int_{t+s}^{\infty}du\,f_j(u) = \frac{A}{2}\int dt\,f_j(t)\int_{-\infty}^{0}ds\,f_i(t+s)\left[1-\mathrm{erf}\left(\frac{s+t-T_{ij}}{\sqrt{2}\sigma}\right)\right] = \frac{A^2}{4}\int dt\,f_j(t)\left[1+\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right] - \frac{A}{2}D' = \frac{A^3}{4}\left[1+\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right] - \frac{D'A}{2}.$

To sum the four contributions (1.iii) to (4.iii) for the odd window, we note that the first terms (square brackets) of (1.iii) and (2.iii) cancel, as do the first terms of (3.iii) and (4.iii). We thus obtain:

$\frac{DA}{2} + \frac{DA}{2} - \frac{D'A}{2} - \frac{D'A}{2} = A\,(D - D').$

We continue with contribution (1.iv):

$\int dt\int du\int_0^{\infty}ds\int_0^{\infty}dv\,f_i(t+s)\,\delta(t+s-u-v)\,f_j(t)\,\delta(t-u) = \int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\int_{-\infty}^{t+s}du\,\delta(t-u) = \int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\int_{-\infty}^{s}du'\,\delta(u')$ (with $u' = u - t$) $= \int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\begin{cases}1, & s>0\\ 0, & \text{else}\end{cases} = \int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s) = \frac{A}{2}\int dt\,f_j(t)\left[1-\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right] = \frac{A^2}{2}\left[1-\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right]$ (using Equation A1-10)

By similar arguments, (2.iv) yields:

$\int dt\int du\int_0^{\infty}ds\int_{-\infty}^{0}dv\,f_i(t+s)\,\delta(t+s-u-v)\,f_j(t)\,\delta(t-u) = \int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\int_{t+s}^{\infty}du\,\delta(t-u) = \int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\int_{s}^{\infty}du'\,\delta(u') = \int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\begin{cases}1, & s<0\\ 0, & \text{else}\end{cases} = 0.$

(3.iv) yields

$\int dt\int du\int_{-\infty}^{0}ds\int_0^{\infty}dv\,f_i(t+s)\,\delta(t+s-u-v)\,f_j(t)\,\delta(t-u) = \int dt\,f_j(t)\int_{-\infty}^{0}ds\,f_i(t+s)\int_{-\infty}^{t+s}du\,\delta(t-u) = \int dt\,f_j(t)\int_{-\infty}^{0}ds\,f_i(t+s)\begin{cases}1, & s>0\\ 0, & \text{else}\end{cases} = 0.$

(4.iv) yields

$\int dt\int du\int_{-\infty}^{0}ds\int_{-\infty}^{0}dv\,f_i(t+s)\,\delta(t+s-u-v)\,f_j(t)\,\delta(t-u) = \int dt\,f_j(t)\int_{-\infty}^{0}ds\,f_i(t+s)\int_{t+s}^{\infty}du\,\delta(t-u) = \int dt\,f_j(t)\int_{-\infty}^{0}ds\,f_i(t+s)\begin{cases}1, & s<0\\ 0, & \text{else}\end{cases} = \int dt\,f_j(t)\int_{-\infty}^{0}ds\,f_i(t+s) = \frac{A}{2}\int dt\,f_j(t)\left[1+\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right] = \frac{A^2}{2}\left[1+\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right]$ (using Equation A1-10).

We sum the contributions (1.iv) and (4.iv) and obtain $A^2$. We now collect all terms for the odd window:

$\frac{1}{\mu^2}\langle\Delta w_{ij}^2\rangle = A^4\,\mathrm{erf}^2\left(\frac{T_{ij}}{2\sigma}\right) + CA^2 + A\,(D-D') + A^2.$

So far, we have calculated the second moment of $\Delta w_{ij}$. In order to determine the variance, we need to calculate the average weight change for the odd window:

(A1-13) $\langle\Delta w_{ij}\rangle = \int dt\,W(t)\,C_{ij}(t) = \mu\int dt\,\mathrm{sgn}(t)\,C_{ij}(t) = \mu\int dt\,\mathrm{sgn}(t)\int dt'\,f_i(t')\,f_j(t'+t) = \frac{A^2\mu}{2\sqrt{\pi}\sigma}\int dt\,\mathrm{sgn}(t)\exp\left(-\frac{(t-T_{ij})^2}{4\sigma^2}\right)$ (using Equation 15 of the main text) $= \frac{A^2\mu}{2\sqrt{\pi}\sigma}\left[\int_0^{\infty}dt\,\exp\left(-\frac{(t-T_{ij})^2}{4\sigma^2}\right) - \int_{-\infty}^{0}dt\,\exp\left(-\frac{(t-T_{ij})^2}{4\sigma^2}\right)\right] = \frac{A^2\mu}{2}\left[1+\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right) - \left(1-\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right)\right] = A^2\mu\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right).$

The variance thus reads:

(A1-14) $\frac{1}{\mu^2}\mathrm{var}(\Delta w_{ij}) = \frac{1}{\mu^2}\left(\langle\Delta w_{ij}^2\rangle - \langle\Delta w_{ij}\rangle^2\right) = A^4\,\mathrm{erf}^2\left(\frac{T_{ij}}{2\sigma}\right) + CA^2 + A\,(D-D') + A^2 - A^4\,\mathrm{erf}^2\left(\frac{T_{ij}}{2\sigma}\right) = (C+1)\,A^2 + A\,(D-D').$

For the signal-to-noise ratio, we note that the definition from Equation A1-1, for odd learning windows, simplifies to

$\mathrm{SNR} = \dfrac{\langle\Delta w_{ij}\rangle - \langle\Delta w_{ji}\rangle}{\mathrm{std}(\Delta w_{ij}) + \mathrm{std}(\Delta w_{ji})} = \dfrac{\langle\Delta w_{ij}\rangle}{\mathrm{std}(\Delta w_{ij})},$

because $\langle\Delta w_{ij}\rangle = -\langle\Delta w_{ji}\rangle$ for odd learning windows.

We insert Equations A1-13 and A1-14 and find

$\mathrm{SNR} = \dfrac{A^2\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)}{\sqrt{(C+1)\,A^2 + A\,(D-D')}}.$

To obtain the final result, we have to evaluate $C$, $D$, and $D'$. We distinguish the two cases $T_{ij} \gg \sigma$ and $T_{ij} = \sigma$ to approximate these three terms:

1. $T_{ij} \gg \sigma$:

(A1-15) $\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right) \approx 1,$
$C = \int_{-\infty}^{\infty}dt\,f_j(t)\,\mathrm{erf}^2\left(\frac{t}{\sqrt{2}\sigma}\right) \approx A,$

because the Gaussian function $f_j$ is shifted far into the positive lobe of the error function.

$D = \int dt\,f_j(t)\int_0^{\infty}ds\,f_i(t+s)\,\mathrm{erf}\left(\frac{t+s-T_{ij}}{\sqrt{2}\sigma}\right) \approx \int dt\,f_j(t)\int_0^{\infty}ds\,\bigl[-f_i(t+s)\bigr] = -\frac{A}{2}\int dt\,f_j(t)\left[1-\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right] = -\frac{A}{2}\left[A - A\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right]$ (using Equation A1-10) $\approx -\frac{A}{2}\,[A-A]$ (using Equation A1-15) $= 0.$
$D' = \int dt\,f_j(t)\int_{-\infty}^{0}ds\,f_i(t+s)\,\mathrm{erf}\left(\frac{t+s-T_{ij}}{\sqrt{2}\sigma}\right) \approx -\frac{A}{2}\int dt\,f_j(t)\left[1+\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right] \approx -\frac{A}{2}\,[A+A]$ (using Equations A1-10 and A1-15) $= -A^2.$

Thus,

(A1-16) $\mathrm{SNR} \approx \dfrac{A^2\cdot 1}{\sqrt{(A+1)\,A^2 + A\,(0+A^2)}} = \dfrac{A^2}{\sqrt{2A^3+A^2}} = \dfrac{A}{\sqrt{2A+1}}.$

This number (for $A=10$) is indicated as the analytical comparison in Figure 5E. For large $A$, the SNR (Equation A1-16) approaches $\sqrt{A/2}$.

2. $T_{ij} = \sigma$:

$\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right) = \mathrm{erf}\left(\frac{1}{2}\right) \approx 0.52,$
$C \approx 0.494\,A,$
$D \approx -0.013\,A^2,$
$D' \approx -0.507\,A^2,$

all of which we calculated numerically.

It follows:

(A1-17) $\mathrm{SNR} \approx \dfrac{A^2\,\mathrm{erf}\left(\frac{1}{2}\right)}{\sqrt{(C+1)\,A^2 + A\,(D-D')}} \approx \dfrac{0.52\,A^2}{\sqrt{0.494\,A^3 + A^2 + 0.494\,A^3}} = \dfrac{0.52\,A^2}{\sqrt{0.99\,A^3 + A^2}} \overset{A=10}{\approx} 1.58$

This number is plotted as the large-$\tau$ approximation in Figure 4C. For large $A$, we find $\mathrm{SNR} \propto \sqrt{A}$.

Even windows

As argued in the main text, for even windows, the weight change $\Delta w_{ij}$ contains no information about the order of events because $\langle\Delta w_{ij}\rangle = \langle\Delta w_{ji}\rangle$. This can be seen from Equation A1-1 of Appendix 1. The SNR is zero for purely even windows because the signal is zero. Nonetheless, we can calculate the variance of the weight change. To do so, we collect all terms of $\langle\Delta w_{ij}^2\rangle$ for even windows (indicated by the bottom symbol of all occurrences of ‘±’ and ‘∓’ in the previous section). Again, we assume wide windows ($\kappa \to \infty$).

Collecting the terms (1.i) to (4.i) yields

$\frac{A^4}{4}\left[1-\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right]^2 + 2\,\frac{A^4}{4}\left[1-\mathrm{erf}^2\left(\frac{T_{ij}}{2\sigma}\right)\right] + \frac{A^4}{4}\left[1+\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right]^2 = A^4.$

Similarly, we sum the terms (1.ii) to (4.ii):

$\frac{A^3}{4}\left[1-2\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)+\frac{C}{A}\right] + 2\,\frac{A^3}{4}\left(1-\frac{C}{A}\right) + \frac{A^3}{4}\left[1+2\,\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)+\frac{C}{A}\right] = A^3.$

We continue to collect the contributions (1.iii) to (4.iii):

$\frac{A^3}{4}\left[1-\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right] + \frac{DA}{2} + \frac{A^3}{4}\left[1-\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right] - \frac{DA}{2} + \frac{A^3}{4}\left[1+\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right] + \frac{D'A}{2} + \frac{A^3}{4}\left[1+\mathrm{erf}\left(\frac{T_{ij}}{2\sigma}\right)\right] - \frac{D'A}{2} = A^3$

Finally, summing (1.iv) to (4.iv) yields the same result as for the odd window: $A^2$.

Overall,

$\frac{1}{\lambda^2}\langle\Delta w_{ij}^2\rangle = \frac{1}{\lambda^2}\langle\Delta w_{ji}^2\rangle = A^4 + 2A^3 + A^2.$

Together with

$\frac{1}{\lambda}\langle\Delta w_{ij}\rangle = \frac{1}{\lambda}\langle\Delta w_{ji}\rangle = A^2,$

the variance reads:

$\frac{1}{\lambda^2}\mathrm{var}(\Delta w_{ij}) = \frac{1}{\lambda^2}\mathrm{var}(\Delta w_{ji}) = 2A^3 + A^2.$

We now insert these variances in the denominator of Equation A1-1:

$\frac{1}{\lambda}\bigl[\mathrm{std}(\Delta w_{ij}) + \mathrm{std}(\Delta w_{ji})\bigr] = 2\sqrt{2A^3 + A^2},$

which, assuming $\mu = \lambda$, is twice the noise obtained for odd windows ($\sqrt{2A^3+A^2}$, Equation A1-16).

In summary, for a learning window with both even and odd contributions, the signal solely depends on the odd part, whereas both parts, even and odd, contribute to the noise. Any even contribution thus only decreases the SNR.

Appendix 2

Calculating SNR for learning windows of arbitrary width

We again consider odd learning windows of the shape

(A2-1) $W_{\mathrm{odd}}(s) = \mu\begin{cases}+\exp(-s/\tau), & s \ge 0\\ -\exp(+s/\tau), & s < 0.\end{cases}$

As in the case of wide learning windows, we again consider the second moment of the weight change (similar to Equation A1-4 of Appendix 1):

(A2-2) $\frac{\langle\Delta w_{ij}^2\rangle}{\mu^2} = \frac{1}{\mu^2}\int dt\int du\int ds\int dv\,W_{\mathrm{odd}}(s)\,W_{\mathrm{odd}}(v)\,F(t,s,u,v) = \int dt\int du\,\Big\{\int_0^{\infty}ds\int_0^{\infty}dv\,e^{-s/\tau}e^{-v/\tau}F(t,s,u,v) - \int_0^{\infty}ds\int_{-\infty}^{0}dv\,e^{-s/\tau}e^{+v/\tau}F(t,s,u,v) - \int_{-\infty}^{0}ds\int_0^{\infty}dv\,e^{+s/\tau}e^{-v/\tau}F(t,s,u,v) + \int_{-\infty}^{0}ds\int_{-\infty}^{0}dv\,e^{+s/\tau}e^{+v/\tau}F(t,s,u,v)\Big\}$

We write F similarly as before, neglecting the theta modulation of the firing rate:

(A2-3) $F(t,s,u,v) = f_i(t+s)\,f_i(u+v)\,f_j(t)\,f_j(u) + f_i(t+s)\,f_i(u+v)\,f_j(t)\,\delta(t-u) + f_i(t+s)\,\delta(t+s-u-v)\,f_j(t)\,f_j(u) + f_i(t+s)\,\delta(t+s-u-v)\,f_j(t)\,\delta(t-u)$

with

(A2-4) $f_i(t) = \dfrac{A}{\sqrt{2\pi}\,\sigma}\exp\left(-\dfrac{t^2}{2\sigma^2}\right)$

and

(A2-5) $f_j(t) = \dfrac{A}{\sqrt{2\pi}\,\sigma}\exp\left(-\dfrac{(t-T_{ij})^2}{2\sigma^2}\right).$

We label the addends of Equation A2-2 as {1,2,3,4} and the addends of Equation A2-3 as {i,ii,iii,iv}. In evaluating the second moment of the weight change, we realize that many integrands have similar forms, that is, products of exponentials, error functions, and delta functions. Consequently, we will first state the integral identities we use, and will then explicitly derive the term (2.iii) as an example. The other terms can be evaluated in a similar manner.

Integral identities

For the evaluation of the second moment of the weight change, many integrands consist of exponential functions containing linear and squared terms. To tackle these integrals, we use Albano et al., 2011

(A2-6) $\int_a^{\infty}\exp\left[-q^2x^2 - px\right]dx = \frac{\sqrt{\pi}}{2q}\exp\left[\frac{p^2}{4q^2}\right]\left[1-\mathrm{erf}\left(\frac{p+2aq^2}{2q}\right)\right]$ with $q>0$, $a>0$, $p>0$.

The second recurring form of integrals is

(A2-7) $\int_{-\infty}^{\infty}dt\,\exp\left[-a^2(t^2+bt)\right]\mathrm{erf}\left(-a\{t+c\}\right)$ with $a>0$ and $b,c\in\mathbb{R}$.

Substituting

$x := -a(t+c)\ \Rightarrow\ dx = -a\,dt,\quad t = -c-\frac{x}{a},\quad t^2 = c^2 + \frac{2cx}{a} + \frac{x^2}{a^2},$

we can rewrite

(A2-8) $\int_{-\infty}^{\infty}dt\,\exp\left[-a^2(t^2+bt)\right]\mathrm{erf}\left(-a\{t+c\}\right) = \frac{1}{a}\exp\left[\frac{a^2b^2}{4}\right]\int_{-\infty}^{\infty}dx\,\exp\left[-\left(x+a\left\{c-\frac{b}{2}\right\}\right)^2\right]\mathrm{erf}[x].$

We now use an integral identity by Ng and Geller, 1969 (their section 4.3, eq. 13):

(A2-9) $\int_{-\infty}^{\infty}\mathrm{erf}(x)\,\exp\left[-(px+q)^2\right]dx = -\frac{\sqrt{\pi}}{p}\,\mathrm{erf}\left[\frac{q}{\sqrt{p^2+1}}\right],$

which yields the desired solution ($p=1$, $q = a\{c-b/2\}$):

(A2-10) $\int_{-\infty}^{\infty}dt\,\exp\left[-a^2(t^2+bt)\right]\mathrm{erf}\left(-a\{t+c\}\right) = -\frac{\sqrt{\pi}}{a}\exp\left[\frac{a^2b^2}{4}\right]\mathrm{erf}\left[\frac{a}{\sqrt{2}}\left(c-\frac{b}{2}\right)\right].$

Example: deriving the term (2.iii)

For the term (2.iii), we have

(A2-11) $\text{(2.iii)} = -\int dt\int du\int_0^{\infty}ds\int_{-\infty}^{0}dv\,\exp\left[-\frac{s}{\tau}\right]\exp\left[\frac{v}{\tau}\right]f_i(t+s)\,\delta(t+s-u-v)\,f_j(t)\,f_j(u) = -\left(\frac{A}{\sqrt{2\pi}\sigma}\right)^3\int dt\,\exp\left[-\frac{(t-T_{ij})^2}{2\sigma^2}\right]\exp\left[-\frac{t^2}{2\sigma^2}\right]\int du\,\exp\left[-\frac{(u-T_{ij})^2}{2\sigma^2}\right]\int_0^{\infty}ds\,\exp\left[-\frac{1}{2\sigma^2}\left(s^2+2s\left\{t+\frac{\sigma^2}{\tau}\right\}\right)\right]\int_{-\infty}^{0}dv\,\exp\left[\frac{v}{\tau}\right]\delta(t+s-u-v)$

When applying the sifting property of the Dirac delta function, we note that the integral over $v$ is nonzero for $-\infty < t+s-u \le 0$, that is, for $t+s \le u$. Thus we have:

(A2-12) $\text{(2.iii)} = -\left(\frac{A}{\sqrt{2\pi}\sigma}\right)^3\int dt\,\exp\left[-\frac{(t-T_{ij})^2}{2\sigma^2}\right]\exp\left[-\frac{t^2}{2\sigma^2}\right]\int_0^{\infty}ds\int_{t+s}^{\infty}du\,\exp\left[-\frac{(u-T_{ij})^2}{2\sigma^2}\right]\exp\left[-\frac{1}{2\sigma^2}\left(s^2+2s\left\{t+\frac{\sigma^2}{\tau}\right\}\right)\right]\exp\left[\frac{t+s-u}{\tau}\right] = -\left(\frac{A}{\sqrt{2\pi}\sigma}\right)^3\exp\left[-\frac{T_{ij}^2}{\sigma^2}\right]\int dt\,\exp\left[-\frac{1}{2\sigma^2}\left(2t^2-2t\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)\right]\int_0^{\infty}ds\,\exp\left[-\frac{1}{2\sigma^2}\left(s^2+2ts\right)\right]\int_{t+s}^{\infty}du\,\exp\left[-\frac{1}{2\sigma^2}\left(u^2-2u\left\{T_{ij}-\frac{\sigma^2}{\tau}\right\}\right)\right]$

The integral over u can be evaluated by using Equation A2-6, which yields:

(A2-13) $\text{(2.iii)} = -\frac{A^3}{4\pi\sigma^2}\exp\left[\frac{\sigma^2}{2\tau^2}-\frac{T_{ij}^2}{2\sigma^2}-\frac{T_{ij}}{\tau}\right]\int dt\,\exp\left[-\frac{1}{2\sigma^2}\left(2t^2-2t\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)\right]\int_0^{\infty}ds\,\exp\left[-\frac{1}{2\sigma^2}\left(s^2+2ts\right)\right]\left[1-\mathrm{erf}\left(\frac{1}{\sqrt{2}\sigma}\left\{t+s-T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)\right]$

The second part of the integral over s (involving the error function) will be solved numerically. For this purpose, we define D2 as:

(A2-14) $D_2 := \frac{1}{\pi\sigma^2}\int dt\,\exp\left[-\frac{1}{2\sigma^2}\left(2t^2-2t\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)\right]\int_0^{\infty}ds\,\exp\left[-\frac{1}{2\sigma^2}\left(s^2+2ts\right)\right]\mathrm{erf}\left(\frac{1}{\sqrt{2}\sigma}\left\{t+s-T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)$

For the first part of the integral over s in Equation A2-13, we again use Equation A2-6, which results in:

(A2-15) $\text{(2.iii)} = -\frac{A^3}{4\sqrt{2\pi}\sigma}\exp\left[\frac{\sigma^2}{2\tau^2}-\frac{T_{ij}^2}{2\sigma^2}-\frac{T_{ij}}{\tau}\right]\int dt\,\exp\left[-\frac{1}{2\sigma^2}\left(t^2-2t\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)\right]\left[1-\mathrm{erf}\left(\frac{t}{\sqrt{2}\sigma}\right)\right] + \frac{A^3}{4}\exp\left[\frac{\sigma^2}{2\tau^2}-\frac{T_{ij}^2}{2\sigma^2}-\frac{T_{ij}}{\tau}\right]D_2$

The first part of the integral over $t$ can be solved by applying Equation A2-6 in the limit $a \to -\infty$. For the second part we use Equation A2-10.

(A2-16) $\text{(2.iii)} = -\frac{A^3}{4}\exp\left[\frac{\sigma^2}{\tau^2}\right]\left[1-\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)\right] + \frac{A^3}{4}\exp\left[\frac{\sigma^2}{2\tau^2}-\frac{T_{ij}^2}{2\sigma^2}-\frac{T_{ij}}{\tau}\right]D_2$

We now observe that defining D2 in the following way:

(A2-17) $D_2 = \frac{1}{\pi\sigma^2}\int dt\,\exp\left[-\frac{1}{2\sigma^2}\left(t-\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)^2\right]\int_0^{\infty}ds\,\exp\left[-\frac{1}{2\sigma^2}(s+t)^2\right]\mathrm{erf}\left[\frac{1}{\sqrt{2}\sigma}\left(t+s-\left\{T_{ij}-\frac{\sigma^2}{\tau}\right\}\right)\right]$

allows us to write (2.iii) as:

(A2-18) $\text{(2.iii)} = -\frac{A^3}{4}\exp\left[\frac{\sigma^2}{\tau^2}\right]\left[1-\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)-D_2\right]$

Addends of the second moment

By similar logic, all four addends of Equation A2-2 (with four parts each) can be obtained. We list the results here:

First Addend

(A2-19) $\text{(1.i)} = \frac{A^4}{4}\exp\left[2\left(\frac{\sigma^2}{\tau^2}+\frac{T_{ij}}{\tau}\right)\right]\left[1-\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}+\frac{2\sigma^2}{\tau}\right\}\right)\right]^2$
(A2-20) $\text{(2.i)} = -\frac{A^4}{4}\exp\left[\frac{2\sigma^2}{\tau^2}\right]\left[1-\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}+\frac{2\sigma^2}{\tau}\right\}\right)\right]\left[1+\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}-\frac{2\sigma^2}{\tau}\right\}\right)\right]$
(A2-21) $\text{(3.i)} = -\frac{A^4}{4}\exp\left[\frac{2\sigma^2}{\tau^2}\right]\left[1-\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}+\frac{2\sigma^2}{\tau}\right\}\right)\right]\left[1+\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}-\frac{2\sigma^2}{\tau}\right\}\right)\right]$
(A2-22) $\text{(4.i)} = \frac{A^4}{4}\exp\left[2\left(\frac{\sigma^2}{\tau^2}-\frac{T_{ij}}{\tau}\right)\right]\left[1+\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}-\frac{2\sigma^2}{\tau}\right\}\right)\right]^2$

Second Addend

(A2-23) $\text{(1.ii)} = \frac{A^3}{4}\exp\left[\frac{3\sigma^2}{\tau^2}+\frac{2T_{ij}}{\tau}\right]\left[1-2\,\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}+\frac{3\sigma^2}{\tau}\right\}\right)+C_1\right]$
(A2-24) $\text{(2.ii)} = -\frac{A^3}{4}\exp\left[\frac{\sigma^2}{\tau^2}\right]\left[1-\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)+\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}-\frac{\sigma^2}{\tau}\right\}\right)-C_2\right]$
(A2-25) $\text{(3.ii)} = -\frac{A^3}{4}\exp\left[\frac{\sigma^2}{\tau^2}\right]\left[1-\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)+\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}-\frac{\sigma^2}{\tau}\right\}\right)-C_2\right]$
(A2-26) $\text{(4.ii)} = \frac{A^3}{4}\exp\left[\frac{3\sigma^2}{\tau^2}-\frac{2T_{ij}}{\tau}\right]\left[1+2\,\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}-\frac{3\sigma^2}{\tau}\right\}\right)+C_4\right]$

with the integral terms:

(A2-27) $C_1 = \frac{1}{\sqrt{2\pi}\sigma}\int dt\,\exp\left[-\frac{1}{2\sigma^2}\left(t-\left\{T_{ij}+\frac{2\sigma^2}{\tau}\right\}\right)^2\right]\mathrm{erf}^2\left[\frac{1}{\sqrt{2}\sigma}\left\{t+\frac{\sigma^2}{\tau}\right\}\right]$
(A2-28) $C_2 = \frac{1}{\sqrt{2\pi}\sigma}\int dt\,\exp\left[-\frac{1}{2\sigma^2}(t-T_{ij})^2\right]\mathrm{erf}\left[\frac{1}{\sqrt{2}\sigma}\left\{t+\frac{\sigma^2}{\tau}\right\}\right]\mathrm{erf}\left[\frac{1}{\sqrt{2}\sigma}\left\{t-\frac{\sigma^2}{\tau}\right\}\right]$
(A2-29) $C_4 = \frac{1}{\sqrt{2\pi}\sigma}\int dt\,\exp\left[-\frac{1}{2\sigma^2}\left(t-\left\{T_{ij}-\frac{2\sigma^2}{\tau}\right\}\right)^2\right]\mathrm{erf}^2\left[\frac{1}{\sqrt{2}\sigma}\left\{t-\frac{\sigma^2}{\tau}\right\}\right]$

Third Addend

(A2-30) $\text{(1.iii)} = \frac{A^3}{4}\exp\left[\frac{3\sigma^2}{\tau^2}+\frac{2T_{ij}}{\tau}\right]\left[1-\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}+\frac{3\sigma^2}{\tau}\right\}\right)+D_1\right]$
(A2-31) $\text{(2.iii)} = -\frac{A^3}{4}\exp\left[\frac{\sigma^2}{\tau^2}\right]\left[1-\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)-D_2\right]$
(A2-32) $\text{(3.iii)} = -\frac{A^3}{4}\exp\left[\frac{\sigma^2}{\tau^2}\right]\left[1+\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}-\frac{\sigma^2}{\tau}\right\}\right)+D_3\right]$
(A2-33) $\text{(4.iii)} = \frac{A^3}{4}\exp\left[\frac{3\sigma^2}{\tau^2}-\frac{2T_{ij}}{\tau}\right]\left[1+\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}-\frac{3\sigma^2}{\tau}\right\}\right)-D_4\right]$

with the integral terms:

(A2-34) $D_1 = \frac{1}{\pi\sigma^2}\int dt\,\exp\left[-\frac{1}{2\sigma^2}\left(t-\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)^2\right]\int_0^{\infty}ds\,\exp\left[-\frac{1}{2\sigma^2}\left(s+\left\{t+\frac{2\sigma^2}{\tau}\right\}\right)^2\right]\mathrm{erf}\left[\frac{1}{\sqrt{2}\sigma}\left(t+s-\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)\right]$
(A2-35) $D_2 = \frac{1}{\pi\sigma^2}\int dt\,\exp\left[-\frac{1}{2\sigma^2}\left(t-\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)^2\right]\int_0^{\infty}ds\,\exp\left[-\frac{1}{2\sigma^2}(s+t)^2\right]\mathrm{erf}\left[\frac{1}{\sqrt{2}\sigma}\left(t+s-\left\{T_{ij}-\frac{\sigma^2}{\tau}\right\}\right)\right]$
(A2-36) $D_3 = \frac{1}{\pi\sigma^2}\int dt\,\exp\left[-\frac{1}{2\sigma^2}\left(t-\left\{T_{ij}-\frac{\sigma^2}{\tau}\right\}\right)^2\right]\int_{-\infty}^{0}ds\,\exp\left[-\frac{1}{2\sigma^2}(s+t)^2\right]\mathrm{erf}\left[\frac{1}{\sqrt{2}\sigma}\left(t+s-\left\{T_{ij}+\frac{\sigma^2}{\tau}\right\}\right)\right]$
(A2-37) $D_4 = \frac{1}{\pi\sigma^2}\int dt\,\exp\left[-\frac{1}{2\sigma^2}\left(t-\left\{T_{ij}-\frac{\sigma^2}{\tau}\right\}\right)^2\right]\int_{-\infty}^{0}ds\,\exp\left[-\frac{1}{2\sigma^2}\left(s+\left\{t-\frac{2\sigma^2}{\tau}\right\}\right)^2\right]\mathrm{erf}\left[\frac{1}{\sqrt{2}\sigma}\left(t+s-\left\{T_{ij}-\frac{\sigma^2}{\tau}\right\}\right)\right]$

Fourth Addend

(A2-38) $\text{(1.iv)} = \frac{A^2}{2}\exp\left[\frac{4\sigma^2}{\tau^2}+\frac{2T_{ij}}{\tau}\right]\left[1-\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}+\frac{4\sigma^2}{\tau}\right\}\right)\right]$
(A2-39) $\text{(2.iv)} = \text{(3.iv)} = 0$
(A2-40) $\text{(4.iv)} = \frac{A^2}{2}\exp\left[\frac{4\sigma^2}{\tau^2}-\frac{2T_{ij}}{\tau}\right]\left[1+\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}-\frac{4\sigma^2}{\tau}\right\}\right)\right]$

By collecting all 16 terms, we will obtain the average squared weight change. To calculate the variance, we also need the squared average weight change, which we will calculate in the next section.

Average weight change

The average weight change for odd learning windows is given by (cf. Equation 1 in the main text):

(A2-41) $\langle\Delta w_{ij}\rangle = \int dt\,W(t)\,C_{ij}(t) = \mu\int_0^{\infty}dt\,\exp\left(-\frac{t}{\tau}\right)C_{ij}(t) - \mu\int_{-\infty}^{0}dt\,\exp\left(\frac{t}{\tau}\right)C_{ij}(t) = \mu\int_0^{\infty}dt\,\exp\left(-\frac{t}{\tau}\right)\int dt'\,f_i(t')\,f_j(t'+t) - \mu\int_{-\infty}^{0}dt\,\exp\left(\frac{t}{\tau}\right)\int dt'\,f_i(t')\,f_j(t'+t)$

We again neglect the theta modulation of the firing fields. Evaluating the first addend yields:

(A2-42) $\mu\int_0^{\infty}dt\,e^{-t/\tau}\int dt'\,f_i(t')\,f_j(t'+t) = \frac{A^2\mu}{2\pi\sigma^2}\int_0^{\infty}dt\,e^{-t/\tau}\int dt'\,\exp\left(-\frac{t'^2}{2\sigma^2}\right)\exp\left(-\frac{(t'+t-T_{ij})^2}{2\sigma^2}\right) = \frac{A^2\mu}{2\pi\sigma^2}\int_0^{\infty}dt\,e^{-t/\tau}\int dt'\,\exp\left(-\frac{t^2+2t'^2+T_{ij}^2+2t't-2t'T_{ij}-2tT_{ij}}{2\sigma^2}\right) = \frac{A^2\mu}{2\pi\sigma^2}\exp\left(-\frac{T_{ij}^2}{2\sigma^2}\right)\int_0^{\infty}dt\,\exp\left(-\frac{t^2-2\left(T_{ij}-\frac{\sigma^2}{\tau}\right)t}{2\sigma^2}\right)\int dt'\,\exp\left(-\frac{t'^2+(t-T_{ij})t'}{\sigma^2}\right) = \frac{A^2\mu}{2\sqrt{\pi}\sigma}\exp\left(-\frac{T_{ij}^2}{2\sigma^2}\right)\int_0^{\infty}dt\,\exp\left(-\frac{t^2-2\left(T_{ij}-\frac{\sigma^2}{\tau}\right)t}{2\sigma^2}\right)\exp\left(\frac{(t-T_{ij})^2}{4\sigma^2}\right) = \frac{A^2\mu}{2\sqrt{\pi}\sigma}\exp\left(-\frac{T_{ij}^2}{4\sigma^2}\right)\int_0^{\infty}dt\,\exp\left(-\frac{t^2-2\left(T_{ij}-\frac{2\sigma^2}{\tau}\right)t}{4\sigma^2}\right) = \frac{A^2\mu}{2}\exp\left(-\frac{T_{ij}^2}{4\sigma^2}\right)\exp\left[\frac{\left(T_{ij}-\frac{2\sigma^2}{\tau}\right)^2}{4\sigma^2}\right]\left[1+\mathrm{erf}\left(\frac{T_{ij}-\frac{2\sigma^2}{\tau}}{2\sigma}\right)\right] = \frac{A^2\mu}{2}\exp\left(\frac{\sigma^2}{\tau^2}-\frac{T_{ij}}{\tau}\right)\left[1+\mathrm{erf}\left(\frac{T_{ij}-\frac{2\sigma^2}{\tau}}{2\sigma}\right)\right]$

The second addend can be similarly evaluated:

(A2-43) $\mu\int_{-\infty}^{0}dt\,e^{t/\tau}\int dt'\,f_i(t')\,f_j(t'+t) = \frac{A^2\mu}{2\sqrt{\pi}\sigma}\exp\left(-\frac{T_{ij}^2}{4\sigma^2}\right)\int_{-\infty}^{0}dt\,\exp\left(-\frac{t^2-2\left(T_{ij}+\frac{2\sigma^2}{\tau}\right)t}{4\sigma^2}\right) = \frac{A^2\mu}{2}\exp\left(-\frac{T_{ij}^2}{4\sigma^2}\right)\exp\left[\frac{\left(T_{ij}+\frac{2\sigma^2}{\tau}\right)^2}{4\sigma^2}\right]\left[1-\mathrm{erf}\left(\frac{T_{ij}+\frac{2\sigma^2}{\tau}}{2\sigma}\right)\right] = \frac{A^2\mu}{2}\exp\left(\frac{\sigma^2}{\tau^2}+\frac{T_{ij}}{\tau}\right)\left[1-\mathrm{erf}\left(\frac{T_{ij}+\frac{2\sigma^2}{\tau}}{2\sigma}\right)\right]$

The average weight change thus reads:

(A2-44) $\frac{1}{\mu}\langle\Delta w_{ij}\rangle = \frac{A^2}{2}\left\{\exp\left(\frac{\sigma^2}{\tau^2}-\frac{T_{ij}}{\tau}\right)\left[1+\mathrm{erf}\left(\frac{T_{ij}-\frac{2\sigma^2}{\tau}}{2\sigma}\right)\right] - \exp\left(\frac{\sigma^2}{\tau^2}+\frac{T_{ij}}{\tau}\right)\left[1-\mathrm{erf}\left(\frac{T_{ij}+\frac{2\sigma^2}{\tau}}{2\sigma}\right)\right]\right\}$

Equation A2-44 might show numerical instabilities for small $\tau$. These instabilities can be fixed using the following approximation for the error function proposed by Abramowitz, 1974:

(A2-45) $\mathrm{erf}(x) \approx 1 - \left(a_1 t + a_2 t^2 + a_3 t^3\right)\exp\left(-x^2\right), \qquad t = \frac{1}{1+px}, \quad x \ge 0,$

where $p=0.47047$, $a_1=0.3480242$, $a_2=-0.0958798$, $a_3=0.7478556$. Along with a new set of variables

$x_1 = \frac{1}{2\sigma}\left(\frac{2\sigma^2}{\tau}-T_{ij}\right), \qquad x_2 = \frac{1}{2\sigma}\left(\frac{2\sigma^2}{\tau}+T_{ij}\right) \ge 0 \text{ because } T_{ij}\ge 0, \qquad u_1 = \begin{cases}\frac{1}{1+px_1}, & x_1 \ge 0\\ \frac{1}{1-px_1}, & x_1 < 0,\end{cases} \qquad u_2 = \frac{1}{1+px_2},$

the approximation yields

(A2-46) $\frac{1}{\mu}\langle\Delta w_{ij}\rangle = \frac{A^2}{2}\begin{cases}\exp\left(-\frac{T_{ij}^2}{4\sigma^2}\right)\left[a_1u_1+a_2u_1^2+a_3u_1^3-a_1u_2-a_2u_2^2-a_3u_2^3\right], & x_1\ge 0,\ x_2\ge 0\\ 2\exp\left(\frac{\sigma^2}{\tau^2}-\frac{T_{ij}}{\tau}\right)-\exp\left(-\frac{T_{ij}^2}{4\sigma^2}\right)\left[a_1(u_1+u_2)+a_2(u_1^2+u_2^2)+a_3(u_1^3+u_2^3)\right], & x_1<0\end{cases}$

Note that the exponential containing the $\sigma^2/\tau^2$ term vanishes for $x_1 \ge 0$, and only one addend contains this term for $x_1 < 0$. Therefore, this approximation results in improved numerical stability for small $\tau$. Equation A2-46 is shown in Figure 4A.
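A direct transcription of this numerically stable evaluation (a sketch; the parameter values in the example call are assumptions):

```python
# Sketch: numerically stable evaluation of Equation A2-44 via Equation A2-46.
import numpy as np

P, A1, A2, A3 = 0.47047, 0.3480242, -0.0958798, 0.7478556   # A&S coefficients

def delta_w_stable(T_ij, sigma, tau, A=10.0, mu=1.0):
    """Equation A2-46; valid for T_ij >= 0 (then x2 >= 0 automatically)."""
    x1 = (2*sigma**2/tau - T_ij) / (2*sigma)
    x2 = (2*sigma**2/tau + T_ij) / (2*sigma)
    u1 = 1.0 / (1.0 + P*abs(x1))          # covers both branches of u_1
    u2 = 1.0 / (1.0 + P*x2)
    poly = lambda u: A1*u + A2*u**2 + A3*u**3
    gauss = np.exp(-T_ij**2 / (4*sigma**2))
    if x1 >= 0:
        return A**2*mu/2 * gauss * (poly(u1) - poly(u2))
    return A**2*mu/2 * (2*np.exp(sigma**2/tau**2 - T_ij/tau)
                        - gauss * (poly(u1) + poly(u2)))

print(delta_w_stable(0.5, 0.5, 0.01))     # small tau: no overflow
```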

Variance and signal-to-noise ratio of the weight change

With all of the above results, we are now ready to state the variance and signal-to-noise ratio of the weight change:

(A2-47) $\frac{1}{\mu^2}\mathrm{var}(\Delta w_{ij}) = \frac{1}{\mu^2}\left(\langle\Delta w_{ij}^2\rangle - \langle\Delta w_{ij}\rangle^2\right) = \frac{A^3}{4}\Bigg\{\exp\left[\frac{3\sigma^2}{\tau^2}+\frac{2T_{ij}}{\tau}\right]\left[2+C_1+D_1-3\,\mathrm{erf}\left(\frac{T_{ij}+\frac{3\sigma^2}{\tau}}{2\sigma}\right)\right] - \exp\left[\frac{\sigma^2}{\tau^2}\right]\left[4-2C_2-D_2+D_3-3\,\mathrm{erf}\left(\frac{T_{ij}+\frac{\sigma^2}{\tau}}{2\sigma}\right)+3\,\mathrm{erf}\left(\frac{T_{ij}-\frac{\sigma^2}{\tau}}{2\sigma}\right)\right] + \exp\left[\frac{3\sigma^2}{\tau^2}-\frac{2T_{ij}}{\tau}\right]\left[2+C_4-D_4+3\,\mathrm{erf}\left(\frac{T_{ij}-\frac{3\sigma^2}{\tau}}{2\sigma}\right)\right]\Bigg\} + \frac{A^2}{2}\Bigg\{\exp\left[\frac{4\sigma^2}{\tau^2}+\frac{2T_{ij}}{\tau}\right]\left[1-\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}+\frac{4\sigma^2}{\tau}\right\}\right)\right] + \exp\left[\frac{4\sigma^2}{\tau^2}-\frac{2T_{ij}}{\tau}\right]\left[1+\mathrm{erf}\left(\frac{1}{2\sigma}\left\{T_{ij}-\frac{4\sigma^2}{\tau}\right\}\right)\right]\Bigg\}$

The signal-to-noise ratio is then given by:

(A2-48) $\mathrm{SNR} = \dfrac{\langle\Delta w_{ij}\rangle}{\mathrm{std}(\Delta w_{ij})} = \dfrac{\langle\Delta w_{ij}\rangle}{\sqrt{\mathrm{var}(\Delta w_{ij})}}$

Equation A2-48 (with the variance from Equation A2-47 and the mean from Equation A2-44) is shown in Figure 4C. We observe that the analytical solution fits the numerical solution well for $\tau \gtrsim 0.1$ s, but numerical instabilities cause it to diverge for $\tau \lesssim 0.1$ s.

The numerical instability for $\tau \lesssim 0.1$ s is likely due to a combination of two factors: the exponential terms $\exp[\sigma^2/\tau^2]$ become very large for small $\tau$, and large arguments of the error function cause the terms $(1\pm\mathrm{erf}(\cdot))$ to be very close to zero. The product of the two is numerically unstable for small $\tau$. Unfortunately, unlike in the case of the average weight change, we did not find an approximation that cancels out these exponential terms in the noise.

Data availability

Code and data are available at https://gitlab.com/e.reifenstein/synaptic-learning-rules-for-sequence-learning (copy archived at https://archive.softwareheritage.org/swh:1:rev:157c347a735a090f591a2b77a71b90d7de65bca5).

References

  1. Abramowitz M (1974) Handbook of Mathematical Functions, With Formulas, Graphs, and Mathematical Tables. USA: Dover Publications, Inc.
  2. Albano M, Amdeberhan T, Beyerstedt E, Moll V (2011) The integrals in Gradshteyn and Ryzhik. Part 19: The error function. SCIENTIA Series A: Mathematical Sciences 21:25–42.
  3. Ng EW, Geller M (1969) A table of integrals of the error functions. Journal of Research of the National Bureau of Standards, Section B: Mathematical Sciences 73B:1. https://doi.org/10.6028/jres.073B.001
  4. Shen E, Wang R, Zhang Z (2007) Theta Phase Precession Enhance Single Trial Learning in an STDP Network. In: Wang R, Shen E, Gu F, editors. Advances in Cognitive Neurodynamics ICCN 2007. Springer. pp. 109–114. https://doi.org/10.1007/978-1-4020-8387-7_21

Decision letter

  1. Martin Vinck
    Reviewing Editor; Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Germany
  2. Joshua I Gold
    Senior Editor; University of Pennsylvania, United States
  3. Francesco P Battaglia
    Reviewer; Donders Institute, Netherlands

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

One of the major challenges for cortical circuits is to learn associations between events that are separated by long time periods, given that spike-timing-dependent plasticity operates on short time scales. In the hippocampus, a structure critical for memory formation, phase precession is known to compress the sequential activation of place fields to the theta-cycle (~8 Hz) time period. Reifenstein et al. describe a simple yet elegant mathematical principle through which theta phase precession contributes to learning the sequential order by which place fields are activated.

Decision letter after peer review:

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting your work entitled "Synaptic learning rules for sequence learning" for consideration by eLife. Your article has been reviewed by 4 peer reviewers, including Martin Vinck as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by a Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Francesco P Battaglia (Reviewer #2); Frances Chance (Reviewer #4).

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife. However, eLife would welcome a substantially improved manuscript that addresses concerns raised; this would be treated as a new submission, but likely go to the same reviewers.

The reviewers acknowledged that the study addresses an important topic. They also applauded the rigor and elegance of the analytical approach. However, reviewers individually, and in subsequent discussion, expressed the concern that the physiological relevance of the findings is far from clear; this point would require a substantial amount of new simulations and models. They furthermore commented that the generation and storage of sequences remains unclear, again requiring substantial additions to the manuscript. Reviewers therefore recommended that, at present, the manuscript appears to be more suited for a more specialized journal.

Reviewer #1:

This paper develops a model of the way in which phase precession modulates synaptic plasticity. The idea and derivations are simple and easy to follow. The results, while not surprising, are overall interesting and important for researchers on phase precession and sequence learning. There are some useful analytical approximations in the paper. I have several comments:

1. The paper is all based on pairwise STDP.

How robust are these results when we consider perhaps more realistic STDP rules like triplet STDP? Perhaps this is something to discuss or explore, because it is not a priori obvious to me.

2. What are the widths reported in the literature for hippocampus? With all the recent literature on the dependence of STDP in vitro on Ca2+ levels, one has to take this with a grain of salt of course. I would think it's around 100 ms, which would make the benefit small?

3. The approximation of theta as an oscillator that shows no dampening is of course not realistic; in reality autocorrelation functions will show decreasing sidelobes. It's maybe not a problem, but could actually benefit your model.

4. To say that phase precession benefits sequence learning is maybe not the whole story. It seems that in general long STDP kernels benefit sequence learning for place fields, and they do this equally well for phase or no phase precession. If the STDP kernels are short, sequence learning is more difficult (and requires huge place field overlap), and phase precession is beneficial for that.

How does benefit interact with place field overlap? If the place fields are highly overlapping, then how does STDP kernel size regulate the sequence learning? Are longer STDP kernels invariantly better for sequence learning in the hippocampus? Or does this depend on place field separation? In other words, are there some scenarios where short STDP kernels have a clear benefit and where phase precession then gives a huge boost?

Reviewer #2:

Reifenstein and Kempter propose an analytical formulation for synaptic plasticity dynamics with STDP and phase precession as observed in hippocampal place cells.

The main result is that phase precession increases the slope of the 2-cell cross-correlation around the origin, which is the key driver of plasticity under asymmetric STDP, therefore improving the encoding of sequences in the synaptic matrix.

While the overall concept of phase precession favoring time compression of sequences and plasticity (when combined with STDP) has been present in the literature since the seminal Skaggs and McNaughton, 1996 paper, the novel contribution of this study is the elegant analytical formulation of the effect, which can be very useful to embed this effect into a network model. As a suggestion of a further direction, one could look at models (e.g. Tsodyks et al., Hippocampus, 1996) where asymmetries in synaptic connections are the driver for phase precession. One could use this formulation, e.g., for seeing how experience may induce phase-precessing place fields by shaping synaptic connections (maybe starting from a small symmetry-breaking term in the initial condition).

The analytical calculation seems crystal clear to me (and quite simple, once one finds the right framework)

Reviewer #3:

The study uses analytical and numerical approaches to quantify conditions in which spike timing-dependent plasticity (STDP) and theta phase precession may promote sequence learning. The strengths of the study are that the question is of general interest and the analytical approach, in so far as it can be applied, is quite rigorous. The weaknesses are that the extent to which the conclusions would hold in more physiological scenarios is not considered, and that the study does not investigate sequences but rather the strength of synaptic connections between sequentially activated neurons.

1. While the stated focus is on sequences, the key results are based on measures of synaptic weight between sequentially activated neurons. Given the claims of the study, a more relevant readout might be generation of sequences by the trained network.

2. The target network appears very simple. Assuming it can generate sequences, it's unclear whether the training rule would function under physiologically relevant conditions. For example, can the network trained in this way store multiple sequences? To what extent do sequences interfere with one another?

3. In a behaving animal movement speed varies considerably, with the consequence that the time taken to cross a place field may vary by an order of magnitude. I think it's important to consider the implications that this might have for the results.

4. Phase precession, STDP and sequence learning have been considered in previous work (e.g. Sato and Yamaguchi, Neural Computation, 2003; Shen et al., Advances in Cognitive Neurodynamics ICCN, 2007; Masquelier et al., J. Neurosci. 2009; Chadwick et al., eLife 2016). These previous approaches differ to various degrees from the present work, but each offers alternative suggestions for how STDP and phase precession could interact during memory. It's not clear what the advantages are of the framework proposed here.

5. While theta sequences are the focus of the introduction, many of the same arguments could be applied to compressed representations during sharp wave ripple events. This may be worth considering. Also, given the model involves excitatory connections between neurons that represent sequences, the relevance here may be more to CA3, where such connectivity is more common, rather than CA1, which is the focus of many of the studies cited in the introduction.

Reviewer #4:

This manuscript argues that phase precession enhances sequence learning by compressing a slower behavioral sequence, for example movement of an animal through a sequence of place fields, into the faster time scales associated with synaptic plasticity. The authors examine the synaptic weight change between pairs of neurons encoding different events in the behavioral sequence and find that phase precession enhances sequence learning when the learning rule is asymmetric over a relatively narrow time window (assuming the behavioral events encoded by the two neurons overlap, i.e., the place fields of the neurons overlap). For wider time windows, however, phase precession does not appear to convey any advantage.

I thought the study was interesting – the idea that phase precession "compresses" sequences into theta cycles has been around for a bit, but this is the first study that I've seen that does analysis at this level. I think many researchers who are interested in temporal coding would find the work very interesting.

I did, however, have a little trouble understanding what conclusions the study draws about the brain (if we are supposed to draw any). The authors conclude that phase precession facilitates learning if the learning window is shorter than a theta cycle – that seems in line with published STDP rules from slice studies. However, Figure 4 seems to imply that the authors have recovered a 1 second learning window from Bittner's data – are they suggesting that phase precession is not an asset for the learning in that study (or did I miss something)? Are there predictions to be made about how place fields must be spaced for optimal sequence learning?

Also, I'd be curious to know how the authors' analysis fits in with replay – is the assumption that neuromodulation is changing the time window or other learning dynamics?

https://doi.org/10.7554/eLife.67171.sa1

Author response

[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]

Reviewer #1:

This paper develops a model of the way in which phase precession modulates synaptic plasticity. The idea and derivations are simple and easy to follow. The results, while not surprising, are overall interesting and important for researchers on phase precession and sequence learning. There are some useful analytical approximations in the paper. I have several comments:

1. The paper is all based on pairwise STDP.

How robust are these results when we consider perhaps more realistic STDP rules like triplet STDP? Perhaps this is something to discuss or explore, because it is not a priori obvious to me.

In “pairwise STDP”, pairs of presynaptic and postsynaptic spikes are considered. Conversely, “triplet STDP” considers triplet motifs of spiking (either 2 presynaptic – 1 postsynaptic or 2 postsynaptic – 1 presynaptic). Triplet STDP models allow one to account for a number of experimental findings that pairwise STDP fails to reproduce, for example the dependence on the repetition frequency of spike pairs. However, it is unclear whether our results on sequence learning still hold for generic triplet STDP rules.

To investigate the relative weight change (forward weight minus backward weight), we reproduced results like the ones shown in Figure 3 of our manuscript for generic versions of the triplet rule from Pfister and Gerstner (2006), who fitted triplet rule models to data from the hippocampus. Their model consists of four terms: pairwise potentiation, pairwise depression, triplet potentiation, and triplet depression. To be able to compare triplet STDP models with pairwise STDP models, we first simulated pairwise potentiation and pairwise depression according to the learning rule from Bi and Poo (1998). The results closely resembled our Figure 3 (see Author response image 1) because the Bi-and-Poo rule is close to the perfectly odd learning rule used for Figures 3 and 4 (see also the new simulation results shown in Figure 4C, for example for the Bi-and-Poo rule). We then added triplet terms with the parameters of the minimal model described in Table 4 (“All-to-All”, “Minimal” model, Pfister and Gerstner, 2006). This “minimal” model, which added only one triplet term (the “triplet potentiation term”, i.e., a 1-pre-2-post term) to pairwise STDP, was regarded as the best model in terms of number of free parameters and fitting error. We found that the results were very similar to the pairwise Bi-and-Poo rule (see Author response image 1).

The small difference between pairwise and triplet STDP is probably due to the fact that the time constant for the triplet potentiation term is only 40 ms, which is shorter than the average ISI in our simulations with values typically > 50 ms (the minimum average ISI is 50 ms in the center of the firing field with a peak rate of 20 spikes/s; see, e.g., Figure 2A). This comparison therefore suggests that we can neglect triplet potentiation in our framework because the time constant of the triplet term is short enough.

Author response image 1
Comparison of pairwise and triplet STDP.

(A) Average weight change for the pairwise Bi-and-Poo learning rule (circles) and the minimal triplet model from Pfister and Gerstner, 2006 (squares). Bluish symbols represent phase precession, reddish symbols represent phase locking. Note that we have normalized the weight changes here (by the peak of the respective phase-locking curve) because the parameters A+ and A- differ in Bi and Poo (1998) as compared to Pfister and Gerstner, (2006) as they were fitted to different data sets. In Figure 3A in the manuscript we show the un-normalized weight change because we choose the learning-rate parameter 𝜇=1 for mathematical convenience. (B) Benefit of phase precession as defined in the manuscript (equation 6) for the Bi-and-Poo learning rule (circles) and the minimal triplet model (squares). (C) Signal-to-noise ratio for Bi-and-Poo learning rule and the minimal triplet model. Symbols and colors as in (A). We note that the results in B and C are independent of the value of the learning-rate parameter.

We note, however, that we found larger differences (for weight changes, benefit, and SNR) when we used (instead of the “minimal” model) the “All-to-All”, “Full” model from Table 4 in Pfister and Gerstner (2006), which included triplet depression, i.e., a 2-pre-1-post term. This marked difference between “minimal” and “full” models in our simulations was surprising because Pfister and Gerstner (2006) observed very similar fitting errors. A closer inspection revealed the origin of this difference: the triplet depression term has a time constant τ_x = 946 ms, which leads in our simulations to a strong accumulation of the corresponding dynamic variable r2 that keeps track of presynaptic events (equation 2 in Pfister and Gerstner, 2006). This accumulation is particularly relevant in our simulations, in which we consider widths of firing fields and time lags between firing fields on the order of one second. On the other hand, the data to which Pfister and Gerstner (2006) fitted their model did not critically rely on such long delays; instead the data were dominated by pairs and triplets with time differences on a 10 ms scale. Therefore, the long time constant of τ_x = 946 ms of triplet depression should not be a critical parameter of their model. This fact was recognized by Pfister and Gerstner (2006), who showed that “minimal” triplet learning rules are almost as good as the “full” ones but have two fewer parameters. Therefore the “minimal” model was regarded as the best model. Because this “minimal” model did not change the outcome in our sequence-learning paradigm, we conclude that our results obtained for pairwise STDP are robust with respect to effects originating from triplets of spikes.

To indicate in the manuscript that our results are robust when triplet STDP is considered, we have added simulation results of the “minimal” triplet rule from Pfister and Gerstner, (2006) in Figure 4C and have included a paragraph on triplet rules in the Discussion.

2. What are the widths reported in the literature for hippocampus? With all the recent literature on the dependence of STDP in vitro on Ca2+ levels, one has to take this with a grain of salt of course. I would think it's around 100ms which would make the benefit small?

The reported time constants in the literature for hippocampus are on the order of 15-30 ms (e.g. Abbott and Nelson, 2000; Bi and Poo, 2001; Wittenberg and Wang, 2006; Inglebert et al., 2020). In this case, the benefit is large (Figure 4B), and the “width” of a learning window, i.e., the range of the time interval in which weights are affected, appears to be in the range of 100 ms (see e.g. Figure 1 in Bi and Poo, 2001).

We agree with the reviewer that STDP depends on the Ca2+ level, as recent studies show (e.g. Inglebert et al., 2020). However, the total width of the STDP kernels rarely exceeds 100 ms — corresponding to ~50 ms for each lobe, positive and negative time lags. These widths are in line with the reported time constants of 15-30 ms mentioned above.

To create a stronger connection to the STDP literature, we added and discussed the following references to the manuscript: Froemke et al., 2005; Wittenberg and Wang, 2006; Inglebert et al., 2020. Furthermore, we estimated the SNR of the synaptic weight change for a number of experimental STDP kernels and included the results in Figure 4C, as a comparison to our theoretical results.

3. The approximation of theta as an oscillator that shows no dampening is of course not realistic; in reality autocorrelation functions will show decreasing sidelobes. It's maybe not a problem, but could actually benefit your model.

The reviewer is right in that we do not explicitly model dampening sidelobes in the spiking autocorrelation function. For narrow STDP windows (τ ≪ 1/ω ≪ σ), however, dampening sidelobes would have no effect on the synaptic weight change because only the slope of the cross-correlation function around zero time lag matters (Figure 2C,D and Equation 3 in the manuscript). Also for wide STDP windows (τ ≫ σ), dampening sidelobes of the theta modulation would not cause a difference because we show (e.g. in Figure 4 for τ > 0.3 s) that “phase precession” and “phase locking” are basically identical to the case of no theta. In the Discussion (section “Key features of phase precession for temporal-order learning: generalization to non-periodic modulation of activity”) we even mention a scenario in which temporal-order learning could benefit from spike statistics similar to phase precession but in the absence of any periodic modulation.

Taken together, we think that the shape of the theta modulation is not a critical model assumption. To better emphasize this, we have added a brief note on irrelevant features of the autocorrelation already at the end of the paragraph following Equation 3.

4. To say that phase precession benefits sequence learning is maybe not the whole story. It seems that in general long STDP kernels benefit sequence learning for place fields, and they do this equally well for phase or no phase precession. If the STDP kernels are short, sequence learning is more difficult (and requires huge place field overlap), and phase precession is beneficial for that.

Indeed, phase precession can benefit temporal-order learning for short STDP kernels and overlapping firing fields, and this is one of our main findings. To make sure that this is not (erroneously) generalized to wide STDP kernels, we checked our wording throughout the manuscript to be clear about the “short STDP kernels” condition. As a result, we added the words “for short synaptic learning windows” to the abstract.

On the other hand, wide STDP kernels also can facilitate temporal-order learning, but only if they are sufficiently asymmetric. Any symmetric component of the STDP kernel disturbs temporal-order learning, as we exemplify in Figure 4C by applying the learning window from Bittner et al., (2017) to our framework. We further fully agree with the reviewer that the temporal fine structure of spiking (phase precession vs. phase locking) becomes irrelevant for wide STDP kernels. For short kernels, however, phase precession makes all the difference (Figures 3 and 4).

To further clarify what we mean by “sequence learning”, we defined the terms “sequence learning” (in the Results, below the first equation) and “temporal-order learning” (in the Introduction) and replaced the more general term “sequence learning” by “temporal-order learning” at many suitable places in the manuscript.

How does benefit interact with place field overlap?

The larger the overlap, the stronger the benefit. This is shown in Figure 3B and described analytically in Equation 7 where we relate the benefit to the field separation Tij (which is the inverse of the overlap). We improved the text below Equation 7 to more strongly emphasize the relationship between the field separation Tij and the field overlap. We additionally made sure to include the field overlap in the summary sentence of the paragraph describing Figure 3B.

If the place fields are highly overlapping, then how does STDP kernel size regulate the sequence learning?

For overlapping fields (Tij = 𝜎), we show in Figure 4 that weight change (Figure 4A) and SNR (Figure 4C) increase with increasing STDP kernel width. We note that these results crucially depend on the STDP kernel being perfectly asymmetric. For learning windows with a symmetric component (dots in Figure 4C) the SNR is much lower and decreases for large widths. To better emphasize this in the manuscript, we included in Figure 4C further experimentally observed learning windows (for more details, see also our response to the next question by the reviewer).

Are longer STDP kernels invariantly better for sequence learning in the hippocampus?

Long STDP kernels are not invariantly better for temporal-order learning because it depends on the symmetry of the STDP kernel. A symmetric component of the STDP kernel reduces temporal-order learning — as we exemplify in Figure 4C by applying the very long (on the order of one second) and mostly symmetric learning window from Bittner et al. (2017) to our framework. Additionally, in the updated version of Figure 4C, we added several experimentally found learning windows (which also had symmetric components) from other studies (dots). SNRs were again below the blue line, which indicates the maximal SNR for purely asymmetric windows and phase precession. The most extreme case, i.e., a long (τ ≫ σ) and purely asymmetric STDP kernel, would yield the largest weight change and largest SNR but, in this scenario, phase precession would not alter the result compared to phase locking or no theta (as we show in Figure 4A,B,C). However, long and predominantly asymmetric STDP windows have — to date and to the best of our knowledge — not been experimentally observed.

Or does this depend on place field separation.

For longer, asymmetric STDP kernels, the dependence of the weight change on place-field separation is given by Equation 9 and illustrated in Figure 5D: with increasing Tij, the weight change quickly increases, reaches a maximum, and then slowly decreases. Figure 5E shows that the SNR increases with increasing Tij, but quickly settles on a constant value. We again note that weight change and SNR would be lower for learning windows with a symmetric component.

In other words, are there some scenarios where short STDP kernels have a clear benefit and where phase precession then gives a huge boost?

We assume that the reviewer means “benefit” as we use the term in the manuscript, i.e., the benefit of phase precession over phase locking (instead of short vs. long STDP kernels, which we discussed above). In that case, Figure 4B shows that short (τ < 100 ms) STDP kernels generate a clear benefit, i.e., phase precession leads to clearly larger weight changes than phase locking (Figure 4A); this result is also supported by Equation 7 (solid black line in Figure 4B), which shows how the benefit depends on all parameters of the model. The benefit is stronger for smaller τ. This effect is confirmed by the signal-to-noise ratio analysis in Figure 4C.

We believe that the reviewer's last set of questions suggests that we should clarify the distinction between symmetric/asymmetric (mathematically even/odd) STDP kernels in the manuscript. We did so in the Results, and we also used more informative headlines for the subsections in the Results.

Reviewer #2:

Reifenstein and Kempter propose an analytical formulation for synaptic plasticity dynamics with STDP and phase precession as observed in hippocampal place cells.

The main result is that phase precession increases the slope of the 2-cell cross-correlation around the origin, which is the key driver of plasticity under asymmetric STDP, therefore improving the encoding of sequences in the synaptic matrix.

While the overall concept of phase precession favoring time compression of sequences and plasticity (when combined with STDP) has been present in the literature since the seminal Skaggs and McNaughton 1996 paper, the novel contribution of this study is the elegant analytical formulation of the effect, which can be very useful to embed this effect into a network model.

We thank the reviewer for this very positive assessment of our work and particularly for pointing out the appeal of the analytical approach.

As a suggestion of a further direction, one could look at models (e.g. Tsodyks et al., Hippocampus, 1996) where asymmetries in synaptic connections are the driver for phase precession. One could use this formulation, e.g., for seeing how experience may induce phase-precessing place fields by shaping synaptic connections (maybe starting from a small symmetry-breaking term in the initial condition).

We thank the reviewer for this interesting direction to extend our work. In the current manuscript, we assume that asymmetries in synaptic connections do not generate phase precession (in contrast to Tsodyks et al., 1996). We even assume, for simplicity of the analytical treatment, that recurrent connections do not affect the dynamics. We thus hypothesize that phase precession is not generated by the local, recurrent network; instead, phase precession is inherited or generated locally by a cellular/synaptic mechanism. After experience, the resulting asymmetric connections could indeed also generate phase precession (as demonstrated by the simulations by Tsodyks et al., 1996), and this phase precession could then even be similar to the one that has initially helped to shape synaptic connections. Finally, inherited or local cellularly/synaptically-generated phase precession and locally network-generated phase precession could interact (as reviewed, for example in Jaramillo and Kempter, 2017). We added this important line of thought to the Discussion (new section “Model assumptions”) of our manuscript.

The analytical calculation seems crystal clear to me (and quite simple, once one finds the right framework)

We thank the reviewer for this encouraging comment.

Reviewer #3:

The study uses analytical and numerical approaches to quantify conditions in which spike timing-dependent plasticity (STDP) and theta phase precession may promote sequence learning. The strengths of the study are that the question is of general interest and the analytical approach, in so far as it can be applied, is quite rigorous.

We thank the reviewer for these positive comments on general interest and thorough analysis.

The weaknesses are that the extent to which the conclusions would hold in more physiological scenarios is not considered, and that the study does not investigate sequences but rather the strength of synaptic connections between sequentially activated neurons.

We regret that we did not make clear enough how the conclusions of this theoretical work could be applied to more physiological scenarios, and why the predicted changes of synapses have strong implications for the replay of sequences. We try to respond to this general critique in detail below (see points 1-5).

1. While the stated focus is on sequences, the key results are based on measures of synaptic weight between sequentially activated neurons. Given the claims of the study, a more relevant readout might be generation of sequences by the trained network.

We agree that an appropriate readout of the result of learning would be the generation of sequences by a trained network. However, the fundamental basis of replay is a specific connectivity, whereas the detailed characteristics of replay also depend on a variety of other parameters that define the neurons and the network. To illustrate the enormous complexity of the relation between the weights in a very simple network and the properties of replay, we refer the reviewer, for example, to Cheng (2013) or Chenkov et al. (2017). Also the reviewer's suggestions below (Sato and Yamaguchi, 2003; Shen et al., 2007) offer insights into the challenges of network simulations that include replay. To repeat such network simulations would be well beyond the scope of our manuscript, which tries to reveal the intricate relation between plasticity and weight changes. Because replay is indeed an important readout, we nevertheless thoroughly linked our work to the literature on the generation of sequences in trained networks; for example, we now also discuss the work by Gauy et al. (2018), Malerba and Bazhenov (2019), and Gillett et al. (2020). Additionally, we have extensively discussed models of replay in the Discussion.

To better state our aims, to weaken our claims, and to scale down (possibly wrong) expectations, we added at end of the first paragraph of the Results (where we first mention “replay”) a remark on the scope of this work: “We note, however, that in what follows we do not simulate such a replay of sequences, which would depend also on a vast number of parameters that define the network; instead, we focus on the underlying changes in connectivity, which is the very basis of replay, and draw connections to replay in the Discussion.”

On the other hand, we note that we have removed the paragraph on replay speed because we felt that the numbers used for its estimation were questionable: (i) the delay of 10 ms for the propagation of activity from one assembly to the next in Chenkov et al. (2017) might depend on the specific choice of parameters, and (ii) the estimated spatial width of a place field (here 0.09 m) is realistic but arbitrary; much larger place fields exist. The two numbers on which the estimate rests are therefore variable, and the replay speed (the ratio of the two) might vary strongly.
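For transparency, the removed estimate amounted to the simple ratio of the two quoted numbers:

\[
v_{\text{replay}} \;\approx\; \frac{\text{place-field width}}{\text{assembly-to-assembly delay}} \;=\; \frac{0.09\ \text{m}}{10\ \text{ms}} \;=\; 9\ \text{m/s}.
\]

Since both numerator and denominator can plausibly vary severalfold, the ratio is only weakly constrained.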

2. The target network appears very simple. Assuming it can generate sequences, it's unclear whether the training rule would function under physiologically relevant conditions. For example, can the network trained in this way store multiple sequences? To what extent do sequences interfere with one another?

We agree with the reviewer that the anatomy of the target network appears very simple. However, the problem of evaluating weight changes as a function of phase precession, place-field properties, and STDP parameters is already quite complex. Though it is unclear whether physiologically relevant conditions can ever be fully achieved in a computational model (e.g., Almog and Korngreen, 2016, J Neurophysiol), our model attempts to capture the essence of biological reality, and we consider our parameter settings to be as physiologically realistic as possible. Other theoretical work has demonstrated that multiple sequences can be stored in a network and that the memory capacity for sequences can be large (e.g., Leibold and Kempter, 2006; Trengove et al., 2013; Chenkov et al., 2017). We note that the corresponding network simulations are usually quite involved and typically depend on a vast number of parameters. We therefore think that these kinds of network simulations are well beyond the scope of the current study.

We nevertheless address the topics of replay in general, and multiple (and possibly overlapping) sequences in particular, in the Discussion (sections “Other mechanisms and models for sequence learning” and “Replay of sequences and storage of multiple and overlapping sequences”), and we have now added a note on storing multiple sequences and memory capacity.

3. In a behaving animal movement speed varies considerably, with the consequence that the time taken to cross a place field may vary by an order of magnitude. I think it's important to consider the implications that this might have for the results.

Running speed indeed affects field size and field distance (when measured in units of time). Because our theory investigates plasticity as a function of field size (which we quantify in units of time) and field separation (also in units of time; Figures 3 and 5; Equations 4, 5, and 7; as well as the derivation of the maximal benefit in the paragraph after Equation 7), our results implicitly cover variations in running speed.

To make this more explicit, we clarified in the manuscript (in the second paragraph after Equation 3, starting with “As a generic example…”) that field width and field separation are measured in units of time (and not in units of length).
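To illustrate with numbers of our own choosing (not taken from the manuscript): a place field of spatial width \(\sigma_x\), traversed at speed \(v\), has temporal width

\[
\sigma = \frac{\sigma_x}{v}, \qquad \text{e.g., } \sigma_x = 0.3\ \text{m}: \quad v = 0.1\ \text{m/s} \Rightarrow \sigma = 3\ \text{s}, \qquad v = 1\ \text{m/s} \Rightarrow \sigma = 0.3\ \text{s},
\]

so a tenfold change in running speed simply moves a cell pair along the field-width axis of our figures rather than invalidating the analysis.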

4. Phase precession, STDP and sequence learning have been considered in previous work (e.g. Sato and Yamaguchi, Neural Computation, 2003; Shen et al., Advances in Cognitive Neurodynamics ICCN, 2007; Masquelier et al., J. Neurosci. 2009; Chadwick et al., eLife 2016). These previous approaches differ to various degrees from the present work, but each offers alternative suggestions for how STDP and phase precession could interact during memory. It's not clear what the advantages are of the framework proposed here.

We thank the reviewer for pointing us to these references and have included all of them in our manuscript. In particular, Sato and Yamaguchi (2003) add a valuable contribution, investigating phase precession and STDP in a network of coupled phase oscillators, a clear and rewarding approach, yet somewhat detached from biology.

We would like to point out that our formulation of phase precession and STDP intends to reflect biological reality as closely as possible: all individual components, such as phase precession, STDP, and place fields, have been described experimentally. This is in contrast to, e.g., phase precession in interneurons, as assumed by Chadwick et al. (2016). Compared to the cited new references, a major advantage of our approach is its analytical tractability. Our mathematical treatment of the problem yields a clear description of parameter dependencies, in contrast to, e.g., Shen et al. (2007), who investigate only one example of a small-network simulation and thus cannot predict how learning depends on the various parameters.

Finally, Masquelier et al. (2009) offer an alternative approach to learning using phase coding and STDP, and Chadwick et al. (2016) nicely explain the generation of phase precession via recurrent networks.

5. While theta sequences are the focus of the introduction, many of the same arguments could be applied to compressed representations during sharp wave ripple events. This may be worth considering. Also, given the model involves excitatory connections between neurons that represent sequences, the relevance here may be more to CA3, where such connectivity is more common, rather than CA1, which is the focus of many of the studies cited in the introduction.

The place field width is also measured in seconds; we now point this out more clearly (in the second paragraph after Equation 3, starting with “As a generic example…”).

Reviewer #4:

This manuscript argues that phase precession enhances sequence learning by compressing a slower behavioral sequence, for example movement of an animal through a sequence of place fields, onto the faster time scales associated with synaptic plasticity. The authors examine the synaptic weight change between pairs of neurons encoding different events in the behavioral sequence and find that phase precession enhances sequence learning when the learning rule is asymmetric over a relatively narrow time window (assuming the behavioral events encoded by the two neurons overlap, i.e., the place fields of the neurons overlap). For wider time windows, however, phase precession does not appear to convey any advantage.

I thought the study was interesting – the idea that phase precession "compresses" sequences into theta cycles has been around for a bit, but this is the first study that I've seen that does analysis at this level. I think many researchers who are interested in temporal coding would find the work very interesting.

We thank the reviewer for the very positive comments.

I did, however, have a little trouble understanding what conclusions the study draws about the brain (if we are supposed to draw any). The authors conclude that phase precession facilitates learning if the learning window is shorter than a theta cycle – that seems in line with published STDP rules from slice studies. However, Figure 4 seems to imply that the authors have recovered a 1 second learning window from Bittner's data – are they suggesting that phase precession is not an asset for the learning in that study (or did I miss something)?

Correct. In Figure 4C we compare the signal-to-noise ratio (SNR) of the weight change as a function of the width of the learning window. For asymmetric windows (colored solid lines), phase precession is helpful for temporal-order learning, but only for learning windows that are narrow compared, e.g., to the theta cycle. Interestingly, the SNR increases and saturates for larger widths of the learning window, which seems to suggest that very wide learning windows are optimal for temporal-order learning. To indicate that this behavior critically depends on the symmetry of the learning window, we included in this graph the SNR obtained with the learning window from Bittner's data (dot marked “Bittner et al., 2017” in the updated version of Figure 4), which has a strong symmetric component. In this case, the SNR is much lower than that of an asymmetric window of the same width. Even though the Bittner window has a low SNR for temporal-order learning, it may still be useful for other tasks, for example place-field formation, a purpose the seconds-long learning rule seems to serve well. For temporal-order memory, however, it is not well suited, owing to its strong symmetric component.

To indicate that there are published STDP rules that are more useful for temporal-order learning, we now include in the same graph (new Figure 4C) the SNR for several other experimentally observed learning windows. These windows have strong asymmetric components and, depending on the proportion of symmetric and asymmetric parts, can reach SNR values close to the theoretical prediction for perfectly asymmetric windows. This graph (like many other results presented in our study) suggests that phase precession, in combination with experimentally determined narrow asymmetric learning windows, could be a mechanism supporting temporal-order learning in hippocampal networks. This conclusion is also summarized in the abstract.
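For readers who want to experiment with this trade-off, the following minimal Python sketch (our own illustration for this response, not the manuscript's code) estimates the mean weight change and its SNR for a pair of theta-modulated place cells with and without phase precession. All parameter values, the exponential window, and the linear precession model are illustrative assumptions:

```python
# Minimal sketch (our own illustration, not the manuscript's code):
# pairwise weight change between two theta-modulated place cells under an
# additive STDP rule, with and without phase precession.
import numpy as np

rng = np.random.default_rng(0)

dt = 1e-3                          # time step (s)
T = 4.0                            # trial duration (s)
t = np.arange(0.0, T, dt)

f_theta = 8.0                      # theta frequency (Hz), assumed
sigma = 0.3                        # temporal field width (s), assumed
r_max = 20.0                       # peak firing rate (Hz), assumed
tau_w = 0.010                      # STDP window time constant (s), assumed

def rate(t_center, precession):
    """Theta-modulated Gaussian firing field; with precession, the preferred
    theta phase shifts from late to early as the field is traversed."""
    envelope = r_max * np.exp(-0.5 * ((t - t_center) / sigma) ** 2)
    slope = np.pi / (3.0 * sigma) if precession else 0.0
    theta = 1.0 + np.cos(2.0 * np.pi * f_theta * t + slope * (t - t_center))
    return envelope * theta

def poisson_spikes(r):
    """Inhomogeneous Poisson spike times by thinning on the time grid."""
    return t[rng.random(t.size) < r * dt]

def window(lag):
    """Perfectly asymmetric window: potentiation for post-after-pre (lag > 0),
    equal-magnitude depression for pre-after-post."""
    return np.sign(lag) * np.exp(-np.abs(lag) / tau_w)

def weight_change(separation, precession, n_trials=200):
    dw = np.empty(n_trials)
    for k in range(n_trials):
        pre = poisson_spikes(rate(T / 2, precession))
        post = poisson_spikes(rate(T / 2 + separation, precession))
        lags = post[:, None] - pre[None, :]   # all post-minus-pre spike lags
        dw[k] = window(lags).sum()
    return dw.mean(), dw.mean() / dw.std()

for prec in (False, True):
    mean_dw, snr = weight_change(0.3, prec)
    print(f"phase precession={prec}: mean dW = {mean_dw:+.2f}, SNR = {snr:.2f}")
```

With these settings, phase precession compresses the 0.3 s behavioral offset between the fields to a within-cycle lag of roughly 20 ms, which the narrow asymmetric window converts into a consistent potentiation signal; without precession the lags are nearly symmetric around zero and largely cancel, so both the mean weight change and the SNR should come out substantially smaller.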

Are there predictions to be made about how place fields must be spaced for optimal sequence learning?

Yes. For example, Figure 3 predicts that, for temporal-order learning with narrow learning windows, there is an optimal overlap between firing fields: for asymmetric STDP windows, the maximum weight change (Figure 3A) and the maximum SNR (Figure 3C) are achieved for partially overlapping firing fields in which the overlap Tij is in the range of the width σ of the firing fields. Wide STDP windows, on the other hand, support temporal-order learning only if they are largely asymmetric (Figures 4 and 5), but in that case phase precession is not beneficial.
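As a rough check of where such an optimum lies, one can evaluate the expected weight change in a rate-based approximation without theta modulation (again our own sketch with assumed parameters, not the manuscript's Equations 4–7): the expected weight change is the overlap integral of the STDP window with the cross-correlation of the two Gaussian firing fields.

```python
# Rate-based sketch (our own approximation, not the manuscript's equations):
# expected weight change between two Gaussian firing fields under a perfectly
# asymmetric exponential STDP window, as a function of field separation.
import numpy as np

sigma = 0.3                               # field width (s), assumed
tau_w = 0.010                             # window time constant (s), assumed
dtau = 1e-4
taus = np.arange(-2.0, 2.0, dtau)         # lag axis (s)
win = np.sign(taus) * np.exp(-np.abs(taus) / tau_w)

def expected_dw(sep):
    # The cross-correlation of two unit Gaussian fields separated by `sep` is
    # itself a Gaussian in the lag, centered at `sep`, of width sqrt(2)*sigma.
    corr = np.exp(-0.25 * ((taus - sep) / sigma) ** 2)
    return np.sum(win * corr) * dtau      # overlap integral

seps = np.linspace(0.0, 1.5, 151)
dws = np.array([expected_dw(s) for s in seps])
print(f"optimal separation ~ {seps[dws.argmax()]:.2f} s (sigma = {sigma} s)")
```

For a window much narrower than the field (tau_w << sigma), the optimum falls at a separation of order sigma (here roughly 0.4 s for sigma = 0.3 s), that is, at partially overlapping fields, in line with Figure 3.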

Also, I'd be curious to know how the authors' analysis fits in with replay – is the assumption that neuromodulation is changing the time window or other learning dynamics?

A key assumption underlying our work is that neuromodulation affects the plasticity and the strength of synapses (see, e.g., the sections “Model assumptions” and “Width, shape, and symmetry of the STDP window…” in the Discussion). For example, acetylcholine (among other neuromodulators) seems to play a particular role by differentially modulating the distinct phases of memory encoding and memory consolidation. In our work we follow the idea [proposed by Hasselmo (1999) and supported by many later studies] that during encoding, excitatory feedback connections (and replay) are suppressed to avoid interference from previously stored information, while the same synapses are highly plastic in this phase in order to store sequences; this may be mediated by high levels of acetylcholine. During slow-wave sleep, on the other hand, when acetylcholine levels are low and synapses are strong but less plastic, a sequence imprinted in recurrent synaptic weights can be replayed without having too strong an impact on the recurrent weights themselves.

We have mentioned all these ideas on differential modulation of synapses and replay at various points in the manuscript. To better outline and summarize these important points in our manuscript, we have thoroughly revised and extended the Discussion.

https://doi.org/10.7554/eLife.67171.sa2

Article and author information

Author details

  1. Eric Torsten Reifenstein

    1. Institute for Theoretical Biology, Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
    2. Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany
    Contribution
    Conceptualization, Software, Formal analysis, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    eric@bccn-berlin.de
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-6898-0178
  2. Ikhwan Bin Khalid

    Institute for Theoretical Biology, Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
    Contribution
    Validation, Investigation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-1783-2834
  3. Richard Kempter

    1. Institute for Theoretical Biology, Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
    2. Bernstein Center for Computational Neuroscience Berlin, Berlin, Germany
    3. Einstein Center for Neurosciences Berlin, Berlin, Germany
    Contribution
    Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-5344-2983

Funding

Bundesministerium für Bildung und Forschung (01GQ1705)

  • Richard Kempter

Deutsche Forschungsgemeinschaft (GRK 1589/2)

  • Richard Kempter

Deutsche Forschungsgemeinschaft (SPP 1665)

  • Richard Kempter

Deutsche Forschungsgemeinschaft (SFB 1315)

  • Richard Kempter

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation; Grants GRK 1589/2, SPP 1665, SFB 1315 - project-ID 327654276) and the German Federal Ministry for Education and Research (BMBF; Grant 01GQ1705). We thank Lukas Kunz, Natalie Schieferstein, Tiziano D’Albis, Paul Pfeiffer, and Adam Wilkins for helpful discussions and feedback on the manuscript. ETR and RK designed the research. ETR, IBK, and RK performed the research, wrote and discussed the manuscript. ETR, IBK, and RK declare no conflict of interest.

Senior Editor

  1. Joshua I Gold, University of Pennsylvania, United States

Reviewing Editor

  1. Martin Vinck, Ernst Strüngmann Institute (ESI) for Neuroscience in Cooperation with Max Planck Society, Germany

Reviewer

  1. Francesco P Battaglia, Donders Institute, Netherlands

Version history

  1. Received: February 3, 2021
  2. Accepted: March 31, 2021
  3. Accepted Manuscript published: April 16, 2021 (version 1)
  4. Version of Record published: June 3, 2021 (version 2)

Copyright

© 2021, Reifenstein et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
