1. Neuroscience
Download icon

Predicting non-linear dynamics by stable local learning in a recurrent spiking neural network

  1. Aditya Gilra  Is a corresponding author
  2. Wulfram Gerstner
  1. École Polytechnique Fédérale de Lausanne, Switzerland
Research Article
  • Cited 6
  • Views 2,305
  • Annotations
Cite this article as: eLife 2017;6:e28295 doi: 10.7554/eLife.28295

Abstract

The brain needs to predict how the body reacts to motor commands, but how a network of spiking neurons can learn non-linear body dynamics using local, online and stable learning rules is unclear. Here, we present a supervised learning scheme for the feedforward and recurrent connections in a network of heterogeneous spiking neurons. The error in the output is fed back through fixed random connections with a negative gain, causing the network to follow the desired dynamics. The rule for Feedback-based Online Local Learning Of Weights (FOLLOW) is local in the sense that weight changes depend on the presynaptic activity and the error signal projected onto the postsynaptic neuron. We provide examples of learning linear, non-linear and chaotic dynamics, as well as the dynamics of a two-link arm. Under reasonable approximations, we show, using the Lyapunov method, that FOLLOW learning is uniformly stable, with the error going to zero asymptotically.

https://doi.org/10.7554/eLife.28295.001

Introduction

Over the course of life, we learn many motor tasks such as holding a pen, chopping vegetables, riding a bike or playing tennis. To control and plan such movements, the brain must implicitly or explicitly learn forward models (Conant and Ross Ashby, 1970) that predict how our body responds to neural activity in brain areas known to be involved in motor control (Figure 1A). More precisely, the brain must acquire a representation of the dynamical system formed by our muscles, our body, and the outside world in a format that can be used to plan movements and initiate corrective actions if the desired motor output is not achieved (Pouget and Snyder, 2000; Wolpert and Ghahramani, 2000; Lalazar and Vaadia, 2008). Visual and/or proprioceptive feedback from spontaneous movements during pre-natal (Khazipov et al., 2004) and post-natal development (Petersson et al., 2003) or from voluntary movements during adulthood (Wong et al., 2012; Hilber and Caston, 2001) are important to learn how the body moves in response to neural motor commands (Lalazar and Vaadia, 2008; Wong et al., 2012; Sarlegna and Sainburg, 2009; Dadarlat et al., 2015), and how the world reacts to these movements (Davidson and Wolpert, 2005; Zago et al., 2005, 2009; Friston, 2008). We wondered whether a non-linear dynamical system, such as a forward predictive model of a simplified arm, can be learned and represented in a heterogeneous network of spiking neurons by adjusting the weights of recurrent connections.

Figure 1 with 1 supplement see all
Schematic for learning a forward model.

(A) During learning, random motor commands (motor babbling) cause movements of the arm, and are also sent to the forward predictive model, which must learn to predict the joint angles and velocities (state variables) of the arm. The deviation of the predicted state from the reference state, obtained by visual and proprioceptive feedback, is used to learn the forward predictive model with architecture shown in B. (B) Motor command u is projected onto neurons with random weights elβff. The spike trains of these command representation neurons Slff are sent via plastic feedforward weights wilff into the neurons of the recurrent network having plastic weights wij (plastic weights in red). Readout weights dαi decode the filtered spiking activity of the recurrent network as the predicted state x^α(t). The deviation of the predicted state from the reference state of the reference dynamical system in response to the motor command, is fed back into the recurrent network with error encoding weights keiα. (C) A cartoon depiction of feedforward, recurrent and error currents entering a neuron i in the recurrent network. (D) Spike trains of a few randomly selected neurons of the recurrent network from the non-linear oscillator example are plotted (alternate red and blue colours are for guidance of eye only). A component x^2 of the network output during a period of the oscillator is overlaid on the spike trains to indicate their relation to the output.

https://doi.org/10.7554/eLife.28295.002

Supervised learning of recurrent weights to predict or generate non-linear dynamics, given command input, is known to be difficult in networks of rate units, and even more so in networks of spiking neurons (Abbott et al., 2016). Ideally, in order to be biologically plausible, a learning rule must be online that is constantly incorporating new data, as opposed to batch learning where weights are adjusted only after many examples have been seen; and local that is the quantities that modify the weight of a synapse must be available locally at the synapse as opposed to backpropagation through time (BPTT) (Rumelhart et al., 1986) or real-time recurrent learning (RTRL) (Williams and Zipser, 1989) which are non-local in time or in space, respectively (Pearlmutter, 1995; Jaeger, 2005). Even though Long-Short-Term-Memory (LSTM) units (Hochreiter and Schmidhuber, 1997) avoid the vanishing gradient problem (Bengio et al., 1994; Hochreiter et al., 2001) in recurrent networks, the corresponding learning rules are difficult to interpret biologically.

Our approach toward learning of recurrent spiking networks is situated at the crossroads of reservoir computing (Jaeger, 2001; Maass et al., 2002; Legenstein et al., 2003; Maass and Markram, 2004; Jaeger and Haas, 2004; Joshi and Maass, 2005; Legenstein and Maass, 2007), FORCE learning (Sussillo and Abbott, 2009, 2012; DePasquale et al., 2016; Thalmeier et al., 2016; Nicola and Clopath, 2016), function and dynamics approximation (Funahashi, 1989; Hornik et al., 1989; Girosi and Poggio, 1990; Sanner and Slotine, 1992; Funahashi and Nakamura, 1993; Pouget and Sejnowski, 1997; Chow and Xiao-Dong Li, 2000; Seung et al., 2000; Eliasmith and Anderson, 2004; Eliasmith, 2005) and adaptive control theory (Morse, 1980; Narendra et al., 1980; Slotine and Coetsee, 1986; Weiping Li et al., 1987; Narendra and Annaswamy, 1989; Sastry and Bodson, 1989; Ioannou and Sun, 2012). In contrast to the original reservoir scheme (Jaeger, 2001; Maass et al., 2002) where learning was restricted to the readout connections, we focus on a learning rule for the recurrent connections. Whereas neural network implementations of control theory (Sanner and Slotine, 1992; DeWolf et al., 2016) modified adaptive feedback weights without a synaptically local interpretation, we modify the recurrent weights in a synaptically local manner. Compared to FORCE learning where recurrent synaptic weights have to change rapidly during the initial phase of learning (Sussillo and Abbott, 2009, 2012), we aim for a learning rule that works in the biologically more plausible setting of slow synaptic changes. While previous work has shown that linear dynamical systems can be represented and learned with local online rules in recurrent spiking networks (MacNeil and Eliasmith, 2011; Bourdoukan and Denève, 2015), for non-linear dynamical systems the recurrent weights in spiking networks have typically been computed offline (Eliasmith, 2005).

Here, we propose a scheme for how a recurrently connected network of heterogeneous deterministic spiking neurons may learn to mimic a low-dimensional non-linear dynamical system, with a local and online learning rule. The proposed learning rule is supervised, and requires access to the error in observable outputs. The output errors are fed back with random, but fixed feedback weights. Given a set of fixed error-feedback weights, the learning rule is synaptically local and combines presynaptic activity with the local postsynaptic error variable.

Results

A forward predictive model (Figure 1A) takes, at each time step, a motor command u(t) as input and predicts the next observable state x^(t+Δt) of the system. In the numerical implementation, we consider Δt=1ms, but for the sake of notational simplicity we drop the Δt in the following. The predicted system state x^ (e.g., the vector of joint angles and velocities of the arm) is assumed to be low-dimensional with dimensionality Nd (4-dimensional for a two-link arm). The motor command u(t) is used to generate target movements such as ‘lift your arm to a location’, with a dimensionality Nc of the command typically smaller than the dimensionality Nd of the system state.

The actual state of the reference system (e.g., actual joint angles and velocities of the arm) is described by a non-linear dynamical system, which receives the control input u(t)RNc and evolves according to a set of coupled differential equations

(1) dxα(t)dt=hα(x(t),u(t)),

where x with components xα (where α=1,,Nd) is the vector of observable state variables, and h is a vector whose components are arbitrary non-linear functions hα. For example, the observable system state x(t) could be the joint angles and velocities of the arm deduced from visual and proprioceptive input (Figure 1A). We show that, with training, the forward predictive model learns to make the error

(2) ϵαxα(t)-x^α(t)

between the actual state x(t) and the predicted state x^(t) negligible.

Network architecture for learning the forward predictive model

In our neural network model (Figure 1B), the motor command u(t) drives the spiking activity of a command representation layer of 3000 to 5000 leaky integrate-and-fire neurons via connections with fixed random weights. These neurons project, via plastic feedforward connections, to a recurrent network of also 3000 to 5000 integrate-and-fire neurons. We assume that the predicted state x^ is linearly decoded from the activity of the recurrent network. Denoting the spike train of neuron i by Si(t), the component α of the predicted system state is

(3) x^α(t)=idαi-tSi(s)κ(t-s)dsidαi(Si*κ)(t),

where dαi are the readout weights. The integral represents a convolution with a low-pass filter

(4) κ(t)exp(-t/τs)/τs,

with a time constant τs=20 ms, and is denoted by (Si*κ)(t).

The current into a neuron with index l (l=1,,N), in the command representation layer comprising N neurons, is

(5) Jlff=αelαffuα+blff,

where elαff are fixed random weights, while blff is a neuron-specific constant for bias (see Methods) (Eliasmith and Anderson, 2004). We use Greek letters for the indices of low-dimensional variables (such as command) and Latin letters for neuronal indices, with summations going over the full range of the indices. The number of neurons N in the command representation layer is much larger than the dimensionality of the input, that is NNc.

The input current to a neuron with index i (i=1,,N) in the recurrent network is

(6) Ji=lwilff(Slffκ)(t)+jwij(Sjκ)(t)+αkeiα(ϵακ)(t)+bi,

where wilff and wij are the feedforward and recurrent weights, respectively, which are both subject to our synaptic learning rule, whereas keiα are fixed error feedback weights (see below). The spike trains travelling along the feedforward path Slff and those within the recurrent network Sj are both low-pass filtered (convolution denoted by *) at the synapses with the exponential filter κ defined above. The constant parameter bi is a neuron specific bias (see Methods). The constant k>0 is the gain for feeding back the output error. The number of neurons N in the recurrent network is much larger than the dimensionality Nd of the represented variable x^, that is NNd.

For all numerical simulations, we used deterministic leaky integrate and fire (LIF) neurons. The voltage Vl of each LIF neuron indexed by l, was a low-pass filter of its driving current Jl:

(7) τmdVldt=-Vl+Jl,

with a membrane time constant, of τm=20 ms. The neuron fired when the voltage Vl crossed a threshold θ=1 from below, after which the voltage was reset to zero for a refractory period τr of 2 ms. If the voltage went below zero, it was clipped to zero. Mathematically, the spike trains Slff(t) in the command representation layer and Sl(t) in the recurrent network, are a sequence of events, modelled as a sum of Dirac delta-functions.

Biases and input weights of the spiking neurons vary between one neuron and the next, both in the command representation layer and the recurrent network, yielding different frequency versus input curves for different neurons (Figure 1—figure supplement 1). Since arbitrary low-dimensional functions can be approximated by linear decoding from a basis of non-linear functions (Funahashi, 1989; Girosi and Poggio, 1990; Hornik et al., 1989), such as neuronal tuning curves (Sanner and Slotine, 1992; Seung et al., 2000; Eliasmith and Anderson, 2004), we may expect that suitable feedforward weights onto, and lateral weights within, the recurrent network can be found that approximate the role of the function h in Equation (1). In the next subsection, we propose an error feedback architecture along with a local and online synaptic plasticity rule that can train these feedforward and recurrent weights to approximate this role, while the readout weights are kept fixed, so that the network output mimics the dynamics in Equation (1).

Negative error feedback via auto-encoder enables local learning

To enable weight tuning, we make four assumptions regarding the network architecture. The initial two assumptions are related to input and output. First, we assume that, during the learning phase, a random time-dependent motor command input u(t) is given to both the muscle-body reference system described by Equation (1) and to the spiking network. The random input generates irregular trajectories in the observable state variables, mimicking motor babbling (Meltzoff and Moore, 1997; Petersson et al., 2003). Second, we assume that each component x^α of the output predicted by the spiking network is compared to the actual observable output xα produced by the reference system of Equation (1) and their difference (the output error ϵα; Equation (2)) is calculated, similar to supervised learning schemes such as perceptron learning (Rosenblatt, 1961).

The final two assumptions are related to the error feedback. Our third assumption is that the readout weights dαi have been pre-learned, possibly earlier in development, in the absence of feedforward and recurrent connections, so as to form an auto-encoder of gain k with the fixed random feedback weights keiα. Specifically, an arbitrary value ϵα sent via the error feedback weights to the recurrent network and read out, from its N neurons, via the decoding weights gives back (approximately) kϵα. Thus, we set the decoding weights so as to minimize the squared error between the decoded output and required output kϵ for a set of randomly chosen vectors ϵ while setting feedforward and recurrent weights to zero (see Methods). We used an algorithmic learning scheme here, but we expect that these decoding weights can also be pre-learned by biologically plausible learning schemes (D'Souza et al., 2010; Urbanczik and Senn, 2014; Burbank, 2015).

Fourth, we assumed that the error ϵα=xα-x^α is projected back to neurons in the recurrent network through the above-mentioned fixed random feedback weights. From the third term in Equation (6) and Figure 1B–C, we define a total error input that neuron i receives:

(8) Iiϵkαeiαϵα,

with feedback weights keiα, where k is fixed at a large constant positive value.

The combination of the auto-encoder and the error feedback implies that the output stays close to the reference, as explained now. In open loop that is without connecting the output x^ and the reference x to the error node, an input ϵ to the network generates an output x^=kϵ due to the auto-encoder of gain k. In closed loop, that is with the output and reference connected to the error node (Figure 1B), the error input is ϵ=x-x^, and the network output x^ settles to:

(9) x^=kϵ=k(xx^)x^=kk+1xx,

that is approximately the reference x for large positive k. The fed-back residual error ϵ=x/(k+1) drives the neural activities and thence the network output. Thus, feedback of the error causes the output x^α to approximately follow xα, for each component α, as long as the error feedback time scale is fast compared to the reference dynamical system time scale, analogous to negative error feedback in adaptive control (Narendra and Annaswamy, 1989; Ioannou and Sun, 2012).

While error feedback is on, the synaptic weights wilff and wij on the feedforward and recurrent connections, respectively, are updated as:

(10) w˙ilff=η(Iiϵκϵ)(Slffκ)(t),w˙ij=η(Iiϵκϵ)(Sjκ)(t),

where η is the learning rate (which is either fixed or changes on the slow time scale of minutes), and κϵ is an exponentially decaying filter kernel with a time constant of 80 or 200 ms. For a postsynaptic neuron i, the error term Iiϵ*κϵ is the same for all its synapses, while the presynaptic contribution is synapse-specific.

We call the learning scheme ‘Feedback-based Online Local Learning Of Weights’ (FOLLOW), since the predicted state x^ follows the true state x from the start of learning. Under precise mathematical conditions, we show in the Methods that the FOLLOW scheme converges to a stable solution, while simultaneously deriving the learning rule.

Because of the error feedback, with constant k1, the output is close to the reference from the start of learning. However, initially the error is not exactly zero, and this non-zero error drives the weight updates via Equation (10). After a sufficiently long learning time, a vanishing error (ϵα=0 for all components) indicates that the neuronal network now autonomously generates the desired output, so that feedback is no longer required. In the Methods section, we show that not just the low-dimensional output x^, but also the spike trains Si(t), for i=1,,N, are entrained by the error feedback to be close to the ideal ones required to generate x.

During learning, the error feedback via the auto-encoder in a loop serves two roles: (i) to make the error current available in each neuron, projected correctly, for a local synaptic plasticity rule, and (ii) to drive the spike trains to the target ones for producing the reference output. In other learning schemes for recurrent neural networks, where neural activities are not constrained by error feedback, it is difficult to assign credit or blame for the momentarily observed error, because neural activities from the past affect the present output in a recurrent network. In the FOLLOW scheme, the spike trains are constrained to closely follow the ideal time course throughout learning, so that the present error can be attributed directly to the weights, enabling us to change the weights with a simple perceptron-like learning rule (Rosenblatt, 1961) as in Equation (10), bypassing the credit assignment problem. In the perceptron rule, the weight change Δw(pre)δ is proportional to the presynaptic input (pre) and the error δ. In the FOLLOW learning rule of Equation (10), we can identify (Si*κ) with (pre) and (Iiϵ*κϵ) with δ. In Methods, we derive the learning rule of Equation (10) in a principled way from a stability criterion.

FORCE learning (Sussillo and Abbott, 2009, 2012; DePasquale et al., 2016; Thalmeier et al., 2016; Nicola and Clopath, 2016) also clamps the output and neural activities to be close to ideal during learning, by using weight changes that are faster than the time scale of the dynamics. In our FOLLOW scheme, clamping is achieved via negative error feedback using the auto-encoder, which allows weight changes to be slow and makes the error current available locally in the post-synaptic neuron. Other methods used feedback based on adaptive control for learning in recurrent networks of spiking neurons, but were limited to linear systems (MacNeil and Eliasmith, 2011; Bourdoukan and Denève, 2015), whereas the FOLLOW scheme was derived for non-linear systems (see Methods). Our learning rule of Equation (10) uses an error ϵαxα-x^α in the observable state, rather than an error involving the derivative dxα/dt in Equation (1), as in other schemes (see Appendix 1) (Eliasmith, 2005; MacNeil and Eliasmith, 2011). The reader is referred to Discussion for detailed further comparisons. The FOLLOW learning rule is local since all quantities needed on the right-hand-side of Equation (10) could be available at the location of the synapse in the postsynaptic neuron. For a potential implementation and prediction for error-based synaptic plasticity, and for a critical evaluation of the notion of ‘local rule’, we refer to the Discussion.

Spiking networks learn target dynamics via FOLLOW learning

In order to check whether the FOLLOW scheme would enable the network to learn various dynamical systems, we studied three systems describing a non-linear oscillator (Figure 2), low-dimensional chaos (Figure 3) and simulated arm movements (Figure 4) (additional examples in Figure 2—figure supplement 2, Figure 2—figure supplement 4 and Methods). In all simulations, we started with vanishingly small feedforward and recurrent weights (tabula rasa), but assumed pre-learned readout weights matched to the error feedback weights. For each of the three dynamical systems, we had a learning phase and a testing phase. During each phase, we provided time-varying input to both the network (Figure 1B) and the reference system. During the learning phase, rapidly changing control signals mimicked spontaneous movements (motor babbling) while synaptic weights were updated according to the FOLLOW learning rule Equation (10).

Figure 2 with 5 supplements see all
Learning non-linear dynamics via FOLLOW: the van der Pol oscillator.

(A-C) Control input, output, and error are plotted versus time: before the start of learning; in the first 4 s and last 4 s of learning; and during testing without error feedback (demarcated by the vertical red lines). Weight updating and error current feedback were both turned on after the vertical red line on the left at the start of learning, and turned off after the vertical red line in the middle at the end of learning. (A) Second component of the input u2. (B) Second component of the learned dynamical variable x^2 (red) decoded from the network, and the reference x2 (blue). After the feedback was turned on, the output tracked the reference. The output continued to track the reference approximately, even after the end of the learning phase, when feedback and learning were turned off. The output tracked the reference approximately, even with a very different input (Bii). With higher firing rates, the tracking without feedback improved (Figure 2—figure supplement 1). (C) Second component of the error ϵ2=x2-x^2 between the reference and the output. (Cii) Trajectory (x1(t),x2(t)) in the phase plane for reference (red,magenta) and prediction (blue,cyan) during two different intervals as indicated by and in Bii. (D) Mean squared error per dimension averaged over 4 s blocks, on a log scale, during learning with feedback on. Learning rate was increased by a factor of 20 after 1,000 s to speed up learning (as seen by the sharp drop in error at 1000 s). (E) Histogram of firing rates of neurons in the recurrent network averaged over 0.25 s (interval marked in green in H) when output was fairly constant (mean across neurons was 12.4 Hz). (F) As in E, but averaged over 16 s (mean across neurons was 12.9 Hz). (G) Histogram of weights after learning. A few strong weights |wij|>10 are out of bounds and not shown here. (H) Spike trains of 50 randomly-chosen neurons in the recurrent network (alternating colors for guidance of eye only). (I) Spike trains of H, reverse-sorted by first spike time after 0.5 s, with output component x^2 overlaid for timing comparison.

https://doi.org/10.7554/eLife.28295.004
Figure 3 with 1 supplement see all
Learning chaotic dynamics via FOLLOW: the Lorenz system.

Layout and legend of panels (A-C) are analogous to Figure 2A–C. (D) The trajectories of the reference (left panel) and the learned network (right panel) are shown in state space for 40 s with zero input during the testing phase, forming the well-known Lorenz attractor. (E) Tent map, that is local maximum of the third component of the reference signal (blue)/network output (red) is plotted versus the previous local maximum, for 800 s of testing with zero input. The reference is plotted with filtering in panels (A-C), but unfiltered for the strange attractor (panel D left) and the tent map (panel E blue).

https://doi.org/10.7554/eLife.28295.010
Learning arm dynamics via FOLLOW.

Layout and legend of panels A-C are analogous to Figure 2A–C except that: in panel (A), the control input (torque) on the elbow joint is plotted; in panel (B), reference and decoded angle θ2,θ^2 (solid) and angular velocity ω2,ω^2 (dotted) are plotted, for the elbow joint; in panel (C), the error θ2-θ^2 in the elbow angle is plotted. (Aii-Cii) The control input was chosen to perform a swinging acrobot-like task by applying small torque only on the elbow joint. (Cii) The shoulder angle θ1(t) is plotted versus the elbow angle θ2(t) for the reference (blue) and the network (red) for the full duration in Aii-Bii. The green arrow shows the starting direction. (D) Reaching task. Snapshots of the configuration of the arm, reference in blue (top panels) and network in red (bottom panels) subject to torques in the directions shown by the circular arrows. After 0.6 s, the tip of the forearm reaches the cyan target. Gravity acts downwards in the direction of the arrow. (E) Acrobot-inspired swinging task (visualization of panels of Aii-Cii). Analogous to D, except that the torque is applied only at the elbow. To reach the target, the arm swings forward, back, and forward again.

https://doi.org/10.7554/eLife.28295.012

During learning, the mean squared error, where the mean was taken over the number of dynamical dimensions Nd and over a duration of a few seconds, decreased (Figure 2D). We stopped the learning phase that is weight updating, when the mean squared error approximately plateaued as a function of learning time (Figure 2D). At the end of the learning phase, we switched the error feedback off (‘open loop’) and provided different test inputs to both the reference system and the recurrent spiking network. A successful forward predictive model should be able to predict the state variables in the open-loop model over a finite time horizon (corresponding to the planning horizon of a short action sequence) and in the closed-loop mode (with error feedback) without time limit.

Non-linear oscillator

Our FOLLOW learning scheme enabled a network with 3000 neurons in the recurrent network and 3000 neurons in the motor command representation layer to approximate the non-linear 2-dimensional van der Pol oscillator (Figure 2). We used a superposition of random steps as input, with amplitudes drawn uniformly from an interval, changing on two time scales, 50 ms and 4 s (see Methods).

During the four seconds before learning started, we blocked error feedback. Because of zero error feedback and our initialization with zero feedforward and recurrent weights, the output x^ decoded from the network of spiking neurons remained constant at zero while the reference system performed the desired oscillations. Once the error feedback with large gain (k=10) was turned on, the feedback forced the network to roughly follow the reference. Thus, with feedback, the error dropped to a very low value, immediately after the start of learning (Figure 2B,C). During learning, the error dropped even further over time (Figure 2D). After having stopped learning at 5000 s (2 hr), we found the weight distribution to be uni-modal with a few very large weights (Figure 2G). In the open-loop testing phase without error feedback, a sharp square pulse as initial input on different 4 s long pedestal values caused the network to track the reference as shown in Figure 2Aii–Cii panels. For some values of the constant pedestal input, the phase of the output of the recurrent network differed from that of the reference (Figure 2Bii), but the shape of the non-linear oscillation was well predicted as indicated by the similarity of the trajectories in state space (Figure 2Cii).

The spiking pattern of neurons of the recurrent network changed as a function of time, with inter-spike intervals of individual neurons correlated with the output, and varying over time (Figure 2H,I). The distributions of firing rates averaged over a 0.25 s period with fairly constant output, and over a 16 s period with time-varying output, were long-tailed, with the mean across neurons maintained at approximately 12–13 Hz (Figure 2E,F). The distribution averaged over 16 s had a smaller number of neurons firing at very low and very high rates compared to the distribution over 0.25 s, consistent with the expectation that the identity of low-rate and high-rate neurons changed over time for time-varying output (Figure 2E,F). We repeated this example experiment (‘van der Pol oscillator’) with a network of equal size but with neurons that had higher firing rates, so that some neurons could reach a maximal rate of 400 Hz (Figure 1—figure supplement 1). The reference was approximated better and learning time was shorter with higher rates (Figure 2—figure supplement 1 – 10,000 s with constant learning rate) compared to the low rates here (Figure 2 – 5,000 s with 20 times the learning rate after 1,000 s). Hence, for all further simulations, we set neuronal parameters to enable peak firing rates up to 400 Hz (Figure 1—figure supplement 1B).

Convergence of error, weights and spike times for a realizable reference network.

(A) We ran our FOLLOW scheme on a network for learning one of two different implementations of the reference van der Pol oscillator: (1) differential equations, versus (2) a network realized using FOLLOW learning for 10,000 s (3 hr). We plot the evolution of the mean squared error, mean over number of dimensions Nd and over 4 s time blocks, from the start to 100,000 s of learning, with the weights starting from zero. Mean squared error for the differential equations reference (1) is shown in black, while that for the realizable network reference (2) is in red. (B) The feedforward weights (top panel) and the recurrent weights (bottom panel) at the end of 100,000 s (28 hr) of learning, are plotted versus the corresponding weights of the realizable target network. The coefficient of determination, that is the R2 value of the fit to the identity line (y=x), is also displayed for each panel. A value of R2=1 denotes perfect equality of weights to those of the realizable network. Some weights fall outside the plot limits. (C) After 0 s, 10,000 s (3 hr), and 100,000 s (28 hr) of the learning protocol against the realizable network as reference, we show spike trains of a few neurons in the recurrent network (red) and the reference network (blue) in the top, middle and bottom panels respectively, from test simulations while providing the same control input and keeping error feedback on. With error feedback off, the low-dimensional output diverged slightly from the reference, hence the spike trains did too (not shown).

https://doi.org/10.7554/eLife.28295.013

We also asked whether merely the distribution of the learned weights in the recurrent layer was sufficient to perform the task, or whether the specific learned weight matrix was required. This question was inspired from reservoir computing (Jaeger, 2001; Maass et al., 2002; Legenstein et al., 2003; Maass and Markram, 2004; Jaeger and Haas, 2004; Joshi and Maass, 2005; Legenstein and Maass, 2007), where the recurrent weights are random, and only the readout weights are learned. To answer this question, we implemented a perceptron learning rule on the readout weights initialized at zero, with the learned network’s output as the target, after setting the feedforward and/or recurrent weights to either the learned weights as is or after shuffling them. The readout weights could be approximately learned only for the network having the learned weights and not the shuffled ones (Figure 2—figure supplement 3), supporting the view that the network does not behave like a reservoir (Methods).

Chaotic Lorenz system

Our FOLLOW scheme also enabled a network with 5000 neurons each in the command representation layer and recurrent network, to learn the 3-dimensional non-linear chaotic Lorenz system (Figure 3). We considered a paradigm where the command input remained zero so that the network had to learn the autonomous dynamics characterized in chaos theory as a ’strange attractor’ (Lorenz, 1963). During the testing phase without error feedback minor differences led to different trajectories of the network and the reference which show up as large fluctuations of ϵ3(t) (Figure 3A–C). Such a behaviour is to be expected for a chaotic system where small changes in initial condition can lead to large changes in the trajectory. Importantly, however, the activity of the spiking network exhibits qualitatively the same underlying strange attractor dynamics, as seen from the butterfly shape (Lorenz, 1963) of the attractor in configuration space, and the tent map (Lorenz, 1963) of successive maxima versus the previous maxima (Figure 3D,E). The tent map generated from our network dynamics (Figure 3E) has lower values for the larger maxima compared to the reference tent map. However, very large outliers like those seen in a network trained by FORCE (Thalmeier et al., 2016) are absent. Since we expected that the observed differences are due to the filtering of the reference by an exponentially-decaying filter, we repeated learning without filtering the Lorenz reference signal (Figure 3—figure supplement 1), and found that the mismatch for large maxima reduced, but a doubling appeared in the tent map (Figure 3—figure supplement 1E) which had been almost imperceptible with filtering (cf. Figure 3E).

FOLLOW enables learning a two-link planar arm model under gravity

To turn to a task closer to real life, we next wondered if a spiking network can also learn the dynamics of a two-link arm via the FOLLOW scheme. We used a two-link arm model adapted from (Li, 2006) as our reference. The two links in the model correspond to the upper and fore arm, with the elbow joint in between and the shoulder joint at the top. The arm moved in the vertical plane under gravity, while torques were applied directly at the two joints, so as to coarsely mimic the action of muscles. To avoid full rotations, the two joints were constrained to vary in the range from -90 to +90 where the resting state is at 0 (see Methods).

The dynamical system representing the arm is four-dimensional with the state variables being the two joint angles and two angular velocities. The network must integrate the torques to obtain the angular velocities which in turn must be integrated for the angles. Learning these dynamics is difficult due to these sequential integrations involving non-linear functions of the state variables and the input. Still, our feedforward and recurrent network architecture (Figure 1B) with 5000 neurons in each layer was able to approximate these dynamics.

Similar to the previous examples, random input torque with amplitudes of short and long pulses changing each 50 ms and 1 s, respectively, was provided to each joint during the learning phase. The input was linearly interpolated between consecutive values drawn every 50 ms. In the closed loop scenario with error feedback, the trajectory converged rapidly to the target trajectory (Figure 4). We found that the FOLLOW scheme learned to reproduce the arm dynamics even without error feedback for a few seconds during the test phase (Figure 4 and Video 1 and Video 2), which corresponds to the time horizon needed for the planning of short arm movements.

Video 1
Reaching by the reference arm is predicted by the network.

After training the network as a forward model of the two-link arm under gravity as in Figure 4, we tested the network without feedback on a reaching task. Command input was provided to both joints of the two-link reference arm so that the tip reached the cyan square. The same command input was also provided to the network without error feedback. The state (blue, left) of the reference arm and the state predicted (red, right) by the learned network without error feedback are animated as a function of time. The directions of the circular arrows indicate the directions of the command torques at the joints. The animation is slowed 5× compared to real life.

https://doi.org/10.7554/eLife.28295.014
Video 2
Acrobot-like swinging by the reference arm is predicted by the network.

After training the network as a forward model of the two-link arm under gravity as in Figure 4, we tested the network without feedback on a swinging task analogous to an acrobot. Command input was provided to the elbow joint of the two-link reference arm so that the tip reached the cyan square by swinging. The same command input was also provided to the network without error feedback. The state (blue, left) of the reference arm and the state predicted (red, right) by the learned network without error feedback are animated as a function of time. The directions of the circular arrows indicate the directions of the command torques at the joints. The animation is slowed 5× compared to real life.

https://doi.org/10.7554/eLife.28295.015

To assess the generalization capacity of the network, we fixed the parameters post learning, and tested the network in the open-loop setting on a reaching task and an acrobot-inspired swinging task (Sutton, 1996). In the reaching task, torque was provided to both joints to enable the arm-tip to reach beyond a specific (x,y) position from rest. The arm dynamics of the reference model and the network are illustrated in Figure 4D and animated in Video 1. We also tested the learned network model of the 2-link arm on an acrobot-like task that is a gymnast swinging on a high-bar (Sutton, 1996), with the shoulder joint analogous to the hands on the bar, and the elbow joint to the hips. The gymnast can only apply small torques at the hip and none at the hands, and must reach beyond a specified (x,y) position by swinging. Thus, during the test, we provided input only at the elbow joint, with a time course that could make the reference reach beyond the target (x,y) position from rest by swinging. The control input and the dynamics (Figure 4A–C right panels, Figure 4E and Video 2) show that the network can perform the task in open-loop condition suggesting that it has learned the inertial properties of the arm model, necessary for this simplified acrobot task.

Feedback in the FOLLOW scheme entrains spike timings

In Methods, we show that the FOLLOW learning scheme is Lyapunov stable and that the error tends to zero under certain reasonable assumptions and approximations. Two important assumptions of the proof are that the weights remain bounded and that the desired dynamics are realizable by the network architecture, that is there exist feedforward and recurrent weights that enable the network to mimic the reference dynamics perfectly. However, in practice the realizability is limited by at least two constraints. First, even in networks of N rate neurons with non-linear tuning curves, the non-linear function h of the reference system in Equation (1) can in general only be approximated with a finite error (Funahashi, 1989; Girosi and Poggio, 1990; Hornik et al., 1989; Sanner and Slotine, 1992; Eliasmith and Anderson, 2004) which can be interpreted as a form of frozen noise, that is even with the best possible setting of the weights, the network predicts, for most values of the state variables, a next state which is slightly different than the one generated by the reference differential equation. Second, since we work with spiking neurons, we expect on top of this frozen noise the effect of shot noise caused by pseudo-random spiking. Both noise sources may potentially cause drift of the weights (Narendra and Annaswamy, 1989; Ioannou and Sun, 2012) which in turn can make the weights grow beyond any reasonable bound. Ameliorative techniques from adaptive control are discussed in Appendix 1. In our simulations, we did not find any effect of drift of weights on the error during a learning time up to 100,000 s (Figure 5A), 10 times longer than that required for learning this example (Figure 2—figure supplement 1).

To highlight the difference between a realizable reference system and non-linear differential equations as a reference system, we used, in an additional simulation experiment, a spiking network with fixed weights as the reference. More precisely, instead of using directly the differential equations of the van der Pol oscillator as a reference, we now used as a reference a spiking approximation of the van der Pol oscillator, that is the spiking network that was the final result after 10,000 s (3 hr) of FOLLOW learning in Figure 2—figure supplement 1. For both the spiking reference network and the to-be-trained learning network we used the same architecture, the same number of neurons, and the same neuronal parameters as in Figure 2—figure supplement 1 for the learning of the van der Pol oscillator. The readout and feedback weights of the learning network also had the same parameters as those of the spiking reference network, but the feedforward and recurrent weights of the learning network were initialized to zero and updated, during the learning phase, with the FOLLOW rule. We ran FOLLOW learning against the reference network for 100,000 s (28 hr) (Figure 5). With the realizable network as reference, learning was more rapid than with the original van der Pol oscillator as reference (Figure 5A).

We emphasize that, analogous to the earlier simulations, the feedback error ϵα was low-dimensional and calculated from the decoded outputs. Nevertheless, the low-dimensional error feedback was able to entrain the network spike times to the reference spike times (Figure 5C). In particular, a few neurons learned to fire only two or three spikes at very precise moments in time. For example, after learning, the spikes of neuron i=9 in the learning network were tightly aligned with the spike times of the neuron with the same index i in the spiking reference network. Similarly, neuron i=8 that was inactive at the beginning of learning was found to be active, and aligned with the spikes of the reference network, after 100,000 s (28 hr) of learning. The spike trains were entrained by the low-dimensional feedback. With the feedback off, even the low-dimensional output, and hence the spike trains, diverged from the reference. It will be interesting to explore if this entrainment by low-dimensional feedback via an auto-encoder loop can be useful in supervised spike train learning (Gütig and Sompolinsky, 2006; Pfister et al., 2006; Florian, 2012; Mohemmed et al., 2012; Gütig, 2014; Memmesheimer et al., 2014; Gardner and Grüning, 2016).

Our results with the spiking reference network suggest that the error is reduced to a value close to zero for a realizable or closely-approximated system (Figure 5A) as shown in Methods, analogous to proofs in adaptive control (Ioannou and Sun, 2012; Narendra and Annaswamy, 1989). Moreover, network weights became very similar, though not completely identical, to the weights of the realizable reference network (Figure 5B), which suggests that the theorem for convergence of parameters from adaptive control should carry over to our learning scheme.

Learning is robust to sparse connectivity, noisy error or reference, and noisy decoding weights, but not to delays

So far, our spiking networks had all-to-all connectivity. We next tested whether sparse connectivity (Markram et al., 2015; Brown and Hestrin, 2009) of the feedforward and recurrent connections was sufficient for learning low-dimensional dynamics. We ran the van der Pol oscillator learning protocol with the connectivity varying from 0.1 (10 percent connectivity) to 1 (full connectivity). Connections that were absent after the sparse initialization could not appear during learning, while the existing sparse connections were allowed to evolve according to FOLLOW learning. As shown in Figure 6A, we found that learning was slower with sparser connectivity; but with twice the learning time, a sparse network with about 25% connectivity reached similar performance as the fully connected network with standard learning time.

Robustness of FOLLOW learning.

We ran the van der Pol oscillator (A–D) or the linear decaying oscillator (F,H) learning protocol for 10,000 s for different parameter values and measured the mean squared error, over the last 400 s before the end of learning, mean over number of dimensions Nd and time. (A) We evolved only a fraction of the feedforward and recurrent connections, randomly chosen as per a specific connectivity, according to FOLLOW learning, while keeping the rest zero. The round dots show mean squared errors for different connectivities after a 10,000 s learning protocol (default connectivity = 1 is starred); while the square dots show the same after a 20,000 s protocol. (B) Mean squared error after 10,000 s of learning versus the standard deviation of noise added to each component of the error, or equivalently to each component of the reference, is plotted. (C) We multiplied the original decoding weights (that form an auto-encoder with the error encoders) by a random factor (1 + uniform(-χ,χ)) drawn for each weight. The mean squared error at the end of a 10,000 s learning protocol for increasing values of χ is plotted (default χ=0 is starred). (D) We multiplied the original decoding weights by a random factor (1 + uniform(-χ+ξ,χ+ξ)), fixing χ=2, drawn independently for each weight. The mean squared error at the end of a 10,000 s learning protocol, for a few values of ξ on either side of zero, is plotted. (E,G) Architectures for learning the forward model when the reference x(t) is available after a sensory feedback delay Δ for computing the error feedback. The forward model may be trained without a compensatory delay in the motor command path (E) or with it (G). (F,H) Mean squared error after 10,000 s of learning the linear decaying oscillator is plotted (default values are starred) versus the sensory feedback delay Δ in the reference, for the architectures without and with compensatory delay, in F and H respectively.

https://doi.org/10.7554/eLife.28295.016

We added Gaussian white noise to each component of the error, which is equivalent to adding it to each component of the reference, and ran the van der Pol oscillator learning protocol for 10,000 s for different standard deviations of the noise (Figure 6B). The learning was robust to noise with standard deviation up to around 0.001, which must be compared with the error amplitude of the order of 0.1 at the start of learning, and orders of magnitude lower later.

The readout weights have been pre-learned until now, so that, in the absence of recurrent connections, error feedback weights and decoding weights formed an auto-encoder. We sought to relax this requirement. Simulations showed that with completely random readout weights, the system did not learn to reproduce the target dynamical system. However, if the readout weights had some overlap with the auto-encoder, learning was still possible (Figure 6C). If for a feedback error ϵ, the error encoding followed by output decoding yields k(1+ξ)ϵ+n(ϵ), where n is a vector of arbitrary functions not having linear terms and small in magnitude compared to the first term, and ξ is sufficiently greater than -1 so that the effective gain k(1+ξ) remains large enough, then the term that is linear in error can still drive the output close to the desired one (see Methods).

To check this intuition in simulations, we incorporated multiplicative noise on the decoders by multiplying each decoding weight of the auto-encoder by one plus γ, where for each weight γ was drawn independently from a uniform distribution between -χ+ξ and χ+ξ. We found that the system was still able to learn the van der Pol oscillator up to χ5 and ξ=0, or χ=2 and ξ variable (Figure 6B,C). Negative values of ξ result in a lower overlap with the auto-encoder leading to the asymmetry seen in Figure 6C. Thus, the FOLLOW learning scheme is robust to multiplicative noise on the decoding weights. Alternative approaches for other noise models are discussed in Appendix 1.

We also asked if the network could handle sensory feedback delays in the reference signal. Due to the strong limit cycle attractor of the van der Pol oscillator, the effect of delay is less transparent than for the linear decaying oscillator (Figure 2—figure supplement 2), so we decided to focus on the latter. For the linear decaying oscillator, we found that learning degraded rapidly with a few milliseconds of delay in the reference, that is if x(t-Δ) was provided as reference instead of x(t) (Figure 6E–F). We compensated for the sensory feedback delay by delaying the motor command input by identical Δ (Figure 6G), which is equivalent to time-translating the complete learning protocol, to which the learning is invariant, and thus the network would learn for arbitrary delay (Figure 6H). In the Discussion, we suggest how a forward model learned with a compensatory delay (Figure 6G) could be used in control mode to compensate for sensory feedback delays.

Discussion

The FOLLOW learning scheme enables a spiking neural network to function as a forward predictive model that mimics a non-linear dynamical system activated by one or several time-varying inputs. The learning rule is supervised, local, and comes with a proof of stability.

It is supervised because the FOLLOW learning scheme uses error feedback where the error is defined as the difference between predicted output and the actual observed output. Error feedback forces the output of the system to mimic the reference, an effect that is widely used in adaptive control theory (Narendra and Annaswamy, 1989; Ioannou and Sun, 2012).

The learning rule is local in the sense that it combines information about presynaptic spike arrival with an abstract quantity that we imagine to be available in the postsynaptic neuron. In contrast to standard Hebbian learning, the variable representing this postsynaptic quantity is not the postsynaptic firing rate, spike time, or postsynaptic membrane potential, but the error current projected by feedback connections onto the postsynaptic neuron, similar in spirit to modern biological implementations of approximated backpropagation (Roelfsema and van Ooyen, 2005; Lillicrap et al., 2016) or local versions of FORCE (Sussillo and Abbott, 2009) learning rules. We emphasize that the postsynaptic quantity is different from the postsynaptic membrane potential or the total postsynaptic current which would also include input from feedforward and recurrent connections.

A possible implementation in a spatially extended neuron would be to imagine that the postsynaptic error current Iiϵ arrives in the apical dendrite where it stimulates messenger molecules that quickly diffuse or are actively transported into the soma and basal dendrites where synapses from feedfoward and feedback input could be located, as depicted in Figure 7A. Consistent with the picture of a messenger molecule, we low-pass filtered the error current with an exponential filter κϵ of time constant 80 ms or 200 ms, much longer than the synaptic time constant of 20 ms of the filter κ. Simultaneously, filtered information about presynaptic spike arrival Sj*κ is available at each synapse, possibly in the form of glutamate bound to the postsynaptic receptor or by calcium triggered signalling chains localized in the postsynaptic spines. Thus the combination of effects caused by presynaptic spike arrival and error information available in the postsynaptic cell drives weight changes, in loose analogy to standard Hebbian learning.

Possible implementation of learning rule, and delay compensation using forward model.

(A) A cartoon depiction of feedforward, recurrent and error currents entering a neuron i in the recurrent network. The error current enters the apical dendrite and triggers an intra-cellular chemical cascade generating a signal that is available at the feedforward and recurrent synapses in the soma and basal dendrites, for weight updates. The error current must trigger a cascade isolated from the other currents, here achieved by spatial separation. (B-C) An architecture based on the Smith predictor, that can switch between learning the forward model (B), versus using the forward model for motor control (C, adapted from (Wolpert and Miall, 1996)), to compensate for the delay in sensory feedback. Active pathways are in blue and inactive ones are in red. (B) The learning architecture (blue) is identical to Figure 6G, but embedded within a larger control loop (red). During learning, when error feedback gain k1, the motor command is fed in with a compensatory delay identical to the sensory feedback delay. Thus motor command and reference state are equally delayed, hence temporally matched, and the forward model learns to produce the motor system output for given input. (C) Once the forward model is learned, the system switches to motor control mode (feedback gain k=0). In this mode, the forward model receives the present motor command and predicts the current state of the motor system, for rapid feedback to the controller (via loop indicated by thick lines), even before the delayed sensory feedback arrives. Of course the delayed sensory feedback can be further taken into account by the controller, by comparing it with the delayed output of the forward model, to better estimate the true state. Thus the forward model learned as in B provides a prediction of the state, even before feedback is received, acting to compensate for sensory feedback delays in motor control.

https://doi.org/10.7554/eLife.28295.017

The separation of the error current from the currents at feedforward or recurrent synapses could be spatial (such as suggested in Figure 7A) or chemical if the error current projects onto synapses that trigger a signalling cascade that is different from that at other synapses. Importantly, whether it is a spatial or chemical separation, the signals triggered by the error currents need to be available throughout the postsynaptic neuron. This leads us to a prediction regarding synaptic plasticity that, say in cortical pyramidal neurons, the plasticity of synapses that are driven by pre-synaptic input in the basal dendrites, should be modulated by currents injected in the apical dendrite or on stimulation of feedback connections.

The learning scheme is provenly stable with errors converging asymptotically to zero under a few reasonable assumptions (Methods). The first assumption is that error encoding feedback weights and output decoding readout weights form an auto-encoder. This requirement can be met if, at an early developmental stage, either both sets of weights are learned using say mirrored STDP (Burbank, 2015), or the output readout weights are learned, starting with random encoding weights, via a biological perceptron-like learning rule (D'Souza et al., 2010; Urbanczik and Senn, 2014). A pre-learned auto-encoder in a high-gain negative feedback loop is in fact a specific prediction of our learning scheme, to be tested in systems-level experiments. The second assumption is that the reference dynamics f(x) is realizable. This requirement can be approximately met by having a recurrent network with a large number N of neurons with different parameters (Eliasmith and Anderson, 2004). The third assumption is that the state variables x(t) are observable. While currently we calculate the feedback error directly from the state variables as a difference between reference and predicted state, we could soften this condition and calculate the difference in a higher-dimensional space with variables y(t) as long as y=K(x) is an invertible function of x(t) (Appendix 1). The fourth assumption is that the system dynamics be slower than synaptic dynamics. Indeed, typical reaching movements extend over hundreds of milliseconds or a few seconds whereas neuronal spike transmission delays and synaptic time constants can be as short as a few milliseconds. In our simulations, neuronal and synaptic time constants were set to 20 ms, yet the network dynamics evolved on the time scale of hundreds of milliseconds or a few seconds, even in the open-loop condition when error feedback was switched off (Figures 2 and 4). The fifth assumption is that weights stay bounded. Indeed, in biology, synaptic weights should not grow indefinitely. Algorithmically, a weight decay term in the learning rule can suppress the growth of large weights (see also Appendix 1), though we did not need to implement a weight decay term in our simulations.

One of the postulated uses of the forward predictive model is to compensate for delay in the sensory feedback during motor control (Wolpert and Miall, 1996; Wolpert et al., 1995) using the Smith predictor configuration (Smith, 1957). We speculate that the switch from the closed-loop learning of forward model with feedback gain k1 to open-loop motor prediction k=0 could also be used to switch delay lines: the system can have either a delay before the forward model as required for learning (Figure 7B), or after the forward model as required for the Smith predictor (Figure 7C). We envisage that FOLLOW learning of the forward model occurs in closed loop mode (k1) with a delay in the motor command path, as outlined earlier in Figure 6G and now embedded in the Smith predictor architecture in Figure 7B. After learning, the network is switched to motor control mode, with the forward predictive model in open loop (k=0), implementing the Smith predictor (Figure 7C). In this motor control mode, the motor command is fed with zero delay to the forward model. This enables to rapidly feed the estimated state back to the motor controller so as to take corrective actions, even before sensory feedback arrives. In parallel, available sensory feedback is compared with a copy of the forward model that has passed through a compensatory delay after the forward model (Figure 7C).

Simulations with the FOLLOW learning scheme have demonstrated that strongly non-linear dynamics can be learned in a recurrent spiking neural network using a local online learning rule that does not require rapid weight changes. Previous work has mainly focused on a limited subset of these aspects. For example, Eliasmith and colleagues used a local learning rule derived from stochastic gradient descent, in a network structure comprising heterogeneous spiking neurons with error feedback (MacNeil and Eliasmith, 2011), but did not demonstrate learning non-linear dynamics (Appendix 1). Denève and colleagues used error feedback in a homogeneous spiking network with a rule similar to ours, for linear dynamics only (Bourdoukan and Denève, 2015), and while this article was in review, also for non-linear dynamics (Alemi et al., 2017), but their network requires instantaneous lateral interactions and in the latter case, also non-linear dendrites.

Reservoir computing models exploit recurrent networks of non-linear units in an activity regime close to chaos where temporal dynamics is rich (Jaeger, 2001; Maass et al., 2002; Legenstein et al., 2003; Maass and Markram, 2004; Jaeger and Haas, 2004; Joshi and Maass, 2005; Legenstein and Maass, 2007). While typical applications of reservoir computing are concerned with tasks involving a small set of desired output trajectories (such as switches or oscillators), our FOLLOW learning enables a recurrent network with a single set of parameters to mimic a dynamical system over a broad range of time-dependent inputs with a large family of different trajectories in the output.

Whereas initial versions of reservoir computing focused on learning the readout weights, applications of FORCE learning to recurrent networks of rate units made it possible to also learn the recurrent weights (Sussillo and Abbott, 2009, 2012). However, in the case of a multi-dimensional target, multi-dimensional errors were typically fed to distinct parts of the network, as opposed to the distributed encoding used in our network. Moreover, the time scale of synaptic plasticity in FORCE learning is faster than the time scale of the dynamical system which is unlikely to be consistent with biology. Modern applications of FORCE learning to spiking networks (DePasquale et al., 2016; Thalmeier et al., 2016; Nicola and Clopath, 2016) inherit these issues.

Adaptive control of non-linear systems using continuous rate neurons (Sanner and Slotine, 1992; Weiping Li et al., 1987; Slotine and Coetsee, 1986) or spiking neurons (DeWolf et al., 2016) has primarily focused on learning parameters in adaptive feedback paths, rather than learning weights in a recurrent network, using learning rules involving quantities that do not appear in the pre- or post-synaptic neurons, making them difficult to interpret as local to synapses. Recurrent networks of rate units have occasionally been used for control (Zerkaoui et al., 2009), but trained either via real-time recurrent learning or the extended Kalman filter which are non-local in space, or via backpropagation through time which is offline (Pearlmutter, 1995). Recent studies have used neural network techniques to train inverse models by motor babbling, to describe behavioral data in humans (Berniker and Kording, 2015) and song birds (Hanuschkin et al., 2013), albeit with abstract networks. Optimal control methods (Hennequin et al., 2014) or stochastic gradient descent (Song et al., 2016) have also been applied in recurrent networks of neurons, but with limited biological plausibility of the published learning rules. As an alternative to supervised schemes, biologically plausible forms of reward-modulated Hebbian rules on the output weights of a reservoir have been used to learn periodic pattern generation and abstract computations (Hoerzer et al., 2014; Legenstein et al., 2010), but how such modulated Hebbian rules could be used in predicting non-linear dynamics given time-dependent control input remains open.

Additional features of the FOLLOW learning scheme are that it does not require full connectivity but also works with biologically more plausible sparse connectivity; and it is robust to multiplicative noise in the output decoders, analogous to recent results on approximate error backpropagation in artificial neural networks (Lillicrap et al., 2016). Since the low-dimensional output and all neural currents are spatially averaged over a large number of synaptically-filtered spike trains, neurons in the FOLLOW network do not necessarily need to fire at rates higher than the inverse of the synaptic time scale. In conclusion, we used a network of heterogeneous neurons as in the Neural Engineering Framework (Eliasmith and Anderson, 2004), employed a pre-learned auto-encoder to enable negative feedback of error as in adaptive control theory (Morse, 1980; Narendra et al., 1980; Slotine and Coetsee, 1986; Weiping Li et al., 1987; Narendra and Annaswamy, 1989; Sastry and Bodson, 1989; Ioannou and Sun, 2012), and derived and demonstrated a local and online learning rule for recurrent connections that learn to reproduce non-linear dynamics.

Our present implementation of the FOLLOW learning scheme in spiking neurons violates Dale’s law because synapses originating from the same presynaptic neuron can have positive or negative weights, but in a different context extensions incorporating Dale’s law have been suggested (Parisien et al., 2008). Neurons in cortical networks are also seen to maintain a balance of excitatory and inhibitory incoming currents (Denève and Machens, 2016). It would be interesting to investigate a more biologically plausible extension of FOLLOW learning that maintains Dale’s law; works in the regime of excitatory-inhibitory balance, possibly using inhibitory plasticity (Vogels et al., 2011); pre-learns the auto-encoder, potentially via mirrored STDP (Burbank, 2015); and possibly implements spatial separation between different compartments (Urbanczik and Senn, 2014). It would also be interesting for future work to see whether our model of an arm trained on motor babbling with FOLLOW, can explain aspects of human behavior in reaching tasks involving force fields (Shadmehr and Mussa-Ivaldi, 1994), uncertainty (Körding and Wolpert, 2004; Wei and Körding, 2010) or noise (Burge et al., 2008). Further directions worth pursuing include learning multiple different dynamical transforms within one recurrent network, without interference; hierarchical learning with stacked recurrent layers; and learning the inverse model of motor control so as to generate the control input given a desired state trajectory.

Methods

Simulation software

All simulation scripts were written in python (https://www.python.org/) for the Nengo simulator (Stewart et al., 2009) (http://www.nengo.ca/, version 2.4.0) with minor custom modifications to support sparse weights. We ran the model using the Nengo GPU back-end (https://github.com/nengo/nengo_ocl) for speed. The script for plotting the figures was written in python using the matplotlib module (http://matplotlib.org/). These simulation and plotting scripts are available online at https://github.com/adityagilra/FOLLOW (Gilra, 2017). A copy is archived at https://github.com/elifesciences-publications/FOLLOW.

Network parameters

Initialization of plastic weights

The feedforward weights wilff from the command representation layer to the recurrent network and the recurrent weights wij inside the network were initialized to zero.

Update of plastic weights

With the error feedback loop closed, that is with reference output x and predicted output x^ connected to the error node, and feedback gain k=10, the FOLLOW learning rule, Equation (10), was applied on the feedforward and recurrent weights, wilff and wij. The error for our learning rule was the error ϵα=xα-x^α in the observable output x, not the error in the desired function h(x,u) (cf. Eliasmith, 2005; MacNeil and Eliasmith, 2011, Appendix 1). The observable reference state x was obtained by integrating the differential equations of the dynamical system. The synaptic time constant τs was 20 ms in all synapses, including those for calculating the error and for feeding the error back to the neurons (decaying exponential κ with time constant τs in Equation (6)). The error used for the weight update was filtered by a 200 ms decaying exponential (κϵ in Equation (10)).

Random setting of neuronal parameters and encoding weights

We used leaky integrate-and-fire neurons with a threshold θ=1 and time constant τm=20 ms. After each spike, the voltage was reset to zero, and the neuron entered an absolute refractory period of τr=2 ms. When driven by a constant input current J, a leaky integrate-and-fire neuron with absolute refractoriness fires at a rate a=g(J) where g is the gain function with value g(J)=0 for J1 and

(11) g(J)=1/(τr+τmlnJJ1),for J>1.

Our network was inhomogeneous in the sense that different neurons had different parameters as described below. The basic idea is that the ensemble of N neurons, with different parameters, forms a rich set of basis functions in the Nc or Nd dimensional space of inputs or outputs, respectively. This is similar to tiling the space with radial basis functions, except that here we replace the radial functions by the gain functions of the LIF neurons (Equation (11)) each having different parameters (Eliasmith and Anderson, 2004). These parameters were chosen randomly once at the beginning of a simulation and kept fixed during the simulation.

For the command representation layer, we write the current J into neuron l, in the case of a constant input u, as

(12) Jlff=νlffβe~lβffuβ+blff,with elβffνlffe~lβff,

where νlff and blff are neuron-specific gains and biases, and e~lβff are ‘normalized’ encoding weights (cf. Equation (5)).

These random gains, biases and ‘normalized’ encoding weights must be chosen so that the command representation layer adequately represents the command input u, whose norm is bounded in the interval [0,R1] (Table 1). First, we choose the ‘normalized’ encoding weight vectors on a hypersphere of radius 1/R1, so that the scalar product between the command vector and the vector of ‘normalized’ encoding weights, βe~lβffuβ, lies in the normalized range [1,1]. Second, the distribution of the gains sets the distribution of the firing rates in a target range. Third, we see from Equation (11) that the neuron starts to fire at the rheobase threshold J=1. The biases blff randomly shift this rheobase threshold over an interval (see Figure 1—figure supplement 1). For the distributions used to set the fixed random gains and biases, see Table 1.

Table 1
Network and simulation parameters for example systems.
https://doi.org/10.7554/eLife.28295.018
Linearvan der PolLorenzArmNon-linear feedforward
Number of neurons/layer20003000500050002000
Tperiod (s)242022
Representation radius R10.20.260.20.2
Representation radius R2153011
Gains νi and biases bi for command representation and recurrent layersNengo v2.4.0 default Figures 1 and 2: νi=2 and bi chosen uniformly from [2,2). All other Figures: Nengo v2.4.0 default Nengo v2.4.0 default Nengo v2.4.0 default Nengo v2.4.0 default
Learning pulse ζ1R1/6R1/6,R1/2R2/10R2/0.3R1/0.6
Learning pedestal ζ2R2/16R1/6, R1/20R2/0.3R2/1.6
Learning rate η2e-42e-4*2e-42e-42e-4
FiguresFigure 2—figure supplement 2Figure 1, Figure 2, Figure 2—figure supplement 1, Figure 2—figure supplement 3, Figure 5, Figure 6, Figure 7Figure 3, Figure 3—figure supplement 1Figure 4Figure 2—figure supplement 4, Figure 2—figure supplement 5
  1. * 4.5 for Figures 1 and 2.

  2. † 4e-3 after 1,000 s for Figures 1 and 2. 1e-4 for readout weights in Figure 2—figure supplement 3.

  3. ‡ Nengo v2.4.0 sets the gains and biases indirectly, by default. The projected input at which the neuron just starts firing (i.e. αe~iαxα=J~i0) is chosen uniformly from [-1,1), while the firing rate for αe~iαxα=1 is chosen uniformly between 200 and 400 Hz. From these, νi and bi are computed using Equations (11) and (13).

Analogously, for the recurrent network, we write the current into neuron i, for a constant ‘pseudo-input’ vector x~ being represented in the network, as

(13) Ji=νiαe~iαx~α+bi,with eiανie~iα,

where νi, bi are neuron-specific gains and biases, and e~iα are ‘normalized’ encoding weights. We call x~ a ‘pseudo-input’ for two reasons. First, the error encoding weights keiα are used to feed the error ϵα=(xα-x^α) back to neuron i in the network (cf. Equation (6)). However, ϵα=xα/(k+1), due to the feedback loop according to Equation (9). Thus, the ‘pseudo-input’ x~α=kxα/(k+1) has a similar range as x, whose norm lies in the interval [0,R2] (see Table 1). Second, the neuron also gets feedforward and recurrent input. However, the feedforward and recurrent inputs get automatically adjusted during learning (starting from zero), so their absolute values do not matter for the initialization of parameters that we discuss here. Thus, we choose the ‘normalized’ encoding weight vectors on a hypersphere of radius 1/R2. For the distributions used to set the fixed random gains and biases, see Table 1.

Setting output decoding weights to form an auto-encoder with respect to error encoding weights

The linear readout weights dαi from the recurrently connected network were pre-computed algorithmically so as to form an auto-encoder with the error encoding weights eiα (for k=1), while setting the feedforward and recurrent weights to zero (wlβff=0 and wij=0). To do this, we randomly selected P error vectors ϵ(p), that we used as training samples for optimization, with sample index p=1,,P, and having vector components ϵα(p), α=1,,Nd. Since the observable system is Nd-dimensional, we chose the training samples randomly from within an Nd-dimensional hypersphere of radius R2. We applied each of the error vectors statically as input for the error feedback connections and calculated the activity

(14) ai(p)ai(ϵ(p))=g(αeiαϵα(p)+bi),

of neuron i for error vector ϵ(p) using the static rate Equation (11). The decoders dαi acting on these activities should yield back the encoded points thus forming an auto-encoder. A squared-error loss function , with L2 regularization of the decoders,

(15) =12pPαNd(iNdαiai(p)-ϵα(p))2+12λαNdiNdαi2,

setting λ=P(0.1max({ai(p)}))2 with number of samples P=N, was used for this linear regression (default in Nengo v2.4.0) (Eliasmith and Anderson, 2004; Stewart et al., 2009). Biologically plausible learning rules exist for auto-encoders, either by training both encoding and decoding weights (Burbank, 2015), or by training decoding weights given random encoding weights (D'Souza et al., 2010; Urbanczik and Senn, 2014), but we simply calculated and set the decoding weights as if they had already been learned.

Compressive and expansive auto-encoder

Classical three-layer (input-hidden-output-layer) auto-encoders come in two different flavours, viz. compressive or expansive, which have the dimensionality of the hidden layer smaller or larger respectively, than that of the input and output layers. Instead of a three-layer feedfoward network, our auto-encoder forms a loop from the neurons in the recurrent network via readout weights to the output and from there via error-encoding weights to the input. Since the auto-encoder is in the loop, we expect that it works both as a compressive one (from neurons in the recurrent network over the low-dimensional output back to the neurons) and as an expansive one (from the output through the neurons in the recurrent network back to the output).

Rather than constraining, as in Equation (15), the low-dimensional input ϵα and round-trip output idαiai(ϵ) to be equal for each component α (expansive auto-encoder), we can alternatively enforce the high dimensional input Ij (projection into neuron j of low-dimensional input ϵ)

(16) Ijαejαϵα,

and round-trip output Iji,αejαdαig~i(Ii), where g~i(Ii)ai(ϵ), to be equal for each neuron j in the recurrent network (compressive auto-encoder) in order to optimize the decoding weights of the auto-encoder. Thus, the squared-error loss for this compressive auto-encoder becomes:

(17) =pPjN(αNdejα(iNdαiai(p)ϵα(p)))2=pPjN(αNdejα(iNdαiai(p)ϵα(p)))(γNdejγ(lNdγlal(p)ϵγ(p)))=pPjNαNdejα2(iNdαiai(p)ϵα(p))2+pPjN(α,γ,αγejαejγ(iNdαiai(p)ϵα(p))(lNdγlal(p)ϵγ(p)))cpPαNd(iNdαiai(p)ϵα(p))2,

where in the approximation, we exploit that (i) the relative importance of the term involving pPjα,γ,αγejαejγ tends to zero as 1/NP, since ejα and ejγ are independent random variables; and (ii) jejα2c is independent of α. Thus, the loss function of Equation (17) is approximately proportional to the squared-error loss function of Equation (15) (not considering the L2 regularization) used for the expansive auto-encoder, showing that for an auto-encoder embedded in a loop with fixed random encoding weights, the expansive and compressive descriptions are equivalent for those N-dimensional inputs Ii that lie in the Nd-dimensional sub-space spanned by {eiα}that is Ii is of the form αeiαϵα where ϵα lies in a finite domain (hypersphere). We employed a large number P=N of random low-Nd-dimensional inputs when constraining the expansive auto-encoder.

Command input

The command input vector u(t) to the network was Nc-dimensional (Nc=Nd for all systems except the arm) and time-varying. During the learning phase, input changed over two different time scales. The fast value of each command component was switched every 50 ms to a level uα chosen uniformly between (-ζ1,ζ1) and this number was added to a more slowly changing input variable u¯α (called ’pedestal’ in the main part of the paper) which changed with a period Tperiod. Here u¯α is the component of a vector of length ζ2 with a randomly chosen direction. The value of component α of the command is then uα=u¯α+uα. Parameter values for the network and input for each dynamical system are provided in Table 1. Further details are noted in the next subsection.

During the testing phase without error feedback, the network reproduced the reference trajectory of the dynamical system for a few seconds, in response to the same kind of input as during learning. We also tested the network on a different input not used during learning as shown in Figures 2 and 4.

Equations and parameters for the example dynamical systems

The equations and input modifications for each dynamical system are detailed below. Time derivatives are in units of s-1.

Linear system

The equations for a linear decaying oscillator system (Figure 2—figure supplement 2) are

x˙1=u1/0.02+(-0.2x1-x2)/0.05x˙2=u2/0.02+(x1-0.2x2)/0.05.

For this linear dynamical system, we tested the learned network on a ramp of 2 s followed by a step to a constant non-zero value. A ramp can be viewed as a preparatory input before initiating an oscillatory movement, in a similar spirit to that observed in (pre-)motor cortex (Churchland et al., 2012). For such input too, the network tracked the reference for a few seconds (Figure 2—figure supplement 2A–C).

van der Pol oscillator

The equations for the van der Pol oscillator system are

x˙1=u1/0.02+x2/0.125x˙2=u2/0.02+(2(1-x12)x2-x1)/0.125.

Each component of the pedestal input u¯α was scaled differently for the van der Pol oscillator as reported in Table 1.

Lorenz system

The equations for the chaotic Lorenz system (Lorenz, 1963) are

x˙1=u1/0.02+10(x2-x1)x˙2=u2/0.02-x1x3-x2x˙3=u3/0.02+x1x2-8(x3+28)/3.

In our equations above, Z of the original Lorenz equations (Lorenz, 1963) is represented by an output variable x3=Z-28 so as to have observable variables that vary around zero. This does not change the system dynamics, just its representation in the network. For the Lorenz system, only a pulse at the start for 250 ms, chosen from a random direction of norm ζ1, was provided to set off the system, after which the system followed autonomous dynamics.

Non-linearly transformed input to linear system

For the above dynamical systems, the input adds linearly on the right hand sides of the differential equations. Our FOLLOW scheme also learned non-linear feedforward inputs to a linear dynamical system, as demonstrated in Figure 2—figure supplement 4 and Figure 2—figure supplement 5. As the reference, we used the linear dynamical system above, but with its input transformed non-linearly by gα(u)=10((uα/0.1)3-uα/0.4). Thus, the equations of the reference were:

x˙1=10((u1/0.1)3-u1/0.4)+(-0.2x1-x2)/0.05x˙2=10((u2/0.1)3-u2/0.4)+(x1-0.2x2)/0.05.

The input to the network remained u. Thus, effectively the feedforward weights had to learn the non-linear transform g(u) while the recurrent weights learned the linear system.

Arm dynamics

In the example of learning arm dynamics, we used a two-link model for an arm moving in the vertical plane with damping, under gravity (see for example http://www.gribblelab.org/compneuro/5_Computational_Motor_Control_Dynamics.html and https://github.com/studywolf/control/tree/master/studywolf_control/arms/two_link), with parameters from (Li, 2006). The differential equations for the four state variables, namely the shoulder and elbow angles θ=(θ1,θ2)T and the angular velocities ω=(ω1,ω2)T, given input torques τ=(τ1,τ2)T are:

(18) θ˙=ω
(19) ω˙=M(θ)1(τC(θ,ω)BωgD(θ)),

with

M(θ)=(d1+2d2cosθ2+m1s12+m2s22d3+d2cosθ2+m2s22d3+d2cosθ2+m2s22d3+m2s22)C(θ,ω)=(θ˙2(2θ˙1+θ˙2)θ˙12)d2sinθ2,B=(b11b12b21b22),D(θ)=((m1s1+m2l1)sinθ1+m2s2sin(θ1+θ2)m2s2sin(θ1+θ2)),d1=I1+I2+m2l12,d2=m2l1s2,d3=I2,

where mi is the mass, li the length, si the distance from the joint centre to the centre of the mass, and Ii the moment of inertia, of link iM is the moment of inertia matrix; C contains centripetal and Coriolis terms; B is for joint damping; and D contains the gravitational terms. Here, the state variable vector x=[θ1,θ2,ω1,ω2]T, but the effective torque τ is obtained from the input torque u as follows.

To avoid any link from rotating full 360 degrees, we provide an effective torque τα to the arm, by subtracting a term proportional to the input torque uα, if the angle crosses ±90 degrees and uα is in the same direction:

τα=uα{uασ~(θα)uα>00uα=0uασ~(θα)uα<0,

where σ~(θ) increases linearly from 0 to 1 as θ goes from π/2 to 3π/4:

σ~(θ)={0θπ/2(θπ/2)/(π/4)3π/4>θ>π/21θ3π/4.

The parameter values were taken from the human arm (Model 1) in section 3.1.1 of the PhD thesis of Li (Li, 2006) from the Todorov lab; namely m1=1.4kg, m2=1.1kg, l1=0.3m, l2=0.33m, s1=0.11m, s2=0.16m, I1=0.025kgm2, I2=0.045kgm2, and b11=b22=0.05kgm2/s, b12=b21=0.025kgm2/s. Acceleration due to gravity was set at g=9.81m/s2. For the arm, we did not filter the reference variables for calculating the error.

The input torque u(t) for learning the two-link arm was generated, not by switching the pulse and pedestal values sharply, every 50 ms and Tperiod as for the others, but by linearly interpolating in-between to avoid oscillations from sharp transitions.

The input torque u and the variables ω, θ obtained on integrating the arm model above were scaled by 0.02 (Nm)-1, 0.05 (rad/s)-1 and 1/2.5 rad-1 respectively, and then these dimensionless variables were used as the input and reference for the spiking network. Effectively, we scaled the input torques to cover one-fifth of the representation radius R2, the angular velocities one-half, and the angles full, as each successive variable was the integral of the previous one.

Learning readout weights with recurrent weights fixed

For learning the readout weights after setting either the true or shuffled set of learned recurrent weights (Figure 2—figure supplement 3), we used a perceptron learning rule.

(20) ddtdαi=-ηr(jdαj(Sj*κ)(t)-xα(t))(Si*κ)(t)=-ηr(x^α(t)-xα(t))(Si*κ)(t),

with learning rate ηr=1e-4.

Derivation and proof of stability of the FOLLOW learning scheme

We derive the FOLLOW learning rule Equation (10), while simultaneously proving the stability of the scheme. We assume that: (1) the feedback {keiα} and readout weights {dαj} form an auto-encoder with gain k; (2) given the gains and biases of the spiking LIF neurons, there exist feedforward and recurrent weights that make the network follow the reference dynamics perfectly (in practice, the dynamics is only approximately realizable by our network, see Appendix 1 for a discussion); (3) the state x of the dynamical system is observable; (4) the intrinsic time scales of the reference dynamics are much larger than the synaptic time scale and the time scale of the error feedback loop, and much smaller than the time scale of learning; (5) the feedforward and recurrent weights remain bounded; and (6) the input u and reference output x remain bounded.

The proof proceeds in three major steps: (1) using the auto-encoder assumption to write the evolution equation of the low-dimensional output state variable in terms of the recurrent and feedforward weights; (2) showing that output follows the reference due to the error feedback loop; and (3) obtaining the evolution equation for the error and using it in the time-derivative of a Lyapunov function V, to show that V˙0 for uniform stability, similar to proofs in adaptive control theory (Narendra and Annaswamy, 1989; Ioannou and Sun, 2012).

Role of network weights for low-dimensional output

The filtered low-dimensional output of the recurrent network is given by Equation (3) of Results and repeated here:

(21) x^α=jdαj(Sj*κ)(t),

where dαj are the readout weights. Since κ is an exponential filter with time constant τs, Equation (21) can also be written as

(22) τsx^˙α(t)=-x^α(t)+jdαjSj(t),

We convolve this equation with kernel κ, multiply by the error feedback weights, and sum over the output components α

(23) τsαeiα(x^˙α*κ)(t)=-αeiα(x^α*κ)(t)+αeiαjdαj(Sj*κ)(t).

We would like to write Equation (23) in terms of the recurrent and feedforward weights in the network.

To do this, we exploit assumptions (1) and (4). Having shown the equivalence of the compressive and expansive descriptions of our auto-encoder in the error-feedback loop (Equations (15) and (17)), we formulate our non-linear auto-encoder as compressive: we start with a high-dimensional set of inputs IjJj-bj (where Jj is the current into neuron j with bias bj, cf. Equations (5) and (6)); transform these inputs non-linearly into filtered spike trains Sj*κ; decode these filtered spike trains into a low-dimensional representation z with components zα=jdαj(Sj*κ); and increase the dimensionality back to the original one, via weights keiα, to get inputs:

(24) Ii=αkeiαzα=kαjeiαdαj(Sj*κ).

Using assumption (1) we expect that the final inputs Ii are approximately k times the initial inputs Ii:

(25) kαjeiαdαj(Sj*κ)kIi.

This is valid only if the high-N-dimensional input Ii lies in the low-Nd-dimensional subspace spanned by {eiα} (Equation (17)). We show that this requirement is fulfilled in the next major step of the proof (see text accompanying Equations (31)–(35)).

Our assumption (4) says that the state variables of the reference dynamics change slowly compared to neuronal dynamics. Due to the spatial averaging (sum over j in Equation (25)) over a large number of neurons, individual neurons do not necessarily have to fire at a rate higher than the inverse of the synaptic time scale, while we can still assume that the total round trip input Ii on the left hand side of Equation (25) is varying only on the slow time scale. Therefore, we used firing rate equations to compute mean outputs given static input when pre-calculating the readout weights (earlier in Methods).

Inserting the approximate Equation (25) into Equation (23) we find

(26) τsαeiα(x^˙α*κ)(t)-αeiα(x^α*κ)(t)+Ii(t).

We replace IiJi-bi, using the current Ji from Equation (6) for neuron i of the recurrent network, to obtain

(27) τsαeiα(x^˙ακ)(t)αeiα(x^ακ)(t)+jwij(Sjκ)(t)+lwilff(Slffκ)(t)+αkeiα(ϵακ)(t).

Thus, the change of the low-dimensional output x^α*κ depends on the network weights, which need to be learned. This finishes the first step of the proof.

Error-feedback loop ensures that output follows reference

Because of assumption (2), we may assume that there exists a recurrent network of spiking neurons that represents the desired dynamics of Equation (1) without any error feedback. This second network serves as a target during learning and has variables and parameters indicated with an asterisk. In particular, the second network has feedforward weights wilff and recurrent weights wij*. We write an equation similar to Equation (27) for the output xα* of the target network:

(28) τsαeiα(x˙ακ)(t)=αeiα(xακ)(t)+jwij(Sjκ)(t)+lwilff(Slffκ)(t),

where (Slffκ)(t) and (Sj**κ)(t) are defined as the filtered spike trains of neurons in the realizable target network. We emphasize that this target network does not need error feedback because its output is, by definition, always correct. In fact, the readout from the spike trains Sj* gives the target output which we denote by x*. The weights of the target network are constant and their actual values are unimportant. They are mere mathematical devices to demonstrate stable learning of the first network which has adaptable weights. For the first network, we choose the same number of neurons and the same neuronal parameters as for the second network; moreover, the input encoding weights from the command input to the representation layer and the readout weights from the recurrent network to the output are identical for both networks. Thus, the only difference is that the feedforward and recurrent weights of the target network are realized, while for the first network they need to be learned.

In view of potential generalization, we note that any non-linear dynamical system is approximately realizable due to the expansion in a high-dimensional non-linear basis that is effectively performed by the recurrent network (see Appendix 1). Approximative weights (close to the ideal ones) could in principle also be calculated algorithmically (see Appendix 1). In the following we exploit assumption (2) and assume that the dynamics is actually (and not only approximately) realized by the target network.

Our assumption (3) states that the output is observable. Therefore the error component ϵα can be computed directly via a comparison of the true output x of the reference with the output x^ of the network: ϵα=xα-x^α. (In view of potential generalizations, we remark that the observable output need not be the state variables themselves, but could be a higher-dimensional non-linear function of the state variables, as shown for a general dynamical system in Appendix 1.)

As the second step of the proof, we now show that the error feedback loop enables the first network to follow the target network under assumptions (4 - 6). More precisely, we want to show that the readout and neural activities of the first network remain close to those of the target network at all times, that is x^α(t)xα*(t) for each component α and (Si*κ)(t)(Si**κ)(t) for each neuron index i. To do so, we use assumption (4) and exploit that (i) learning is slow compared to the network dynamics so the weights of the first network can be considered momentarily constant, and (ii) the reference dynamics is slower than the synaptic and feedback loop time scales, so the reference output xα can be assumed momentarily constant. Thus, we have a separation of time scales in Equation (27): for a given input (transmitted via the feedforward weights) and a given target value xα*, the network dynamics settles on the fast time scale τs to a momentary fixed point x^ which we find by setting the derivative on the left-hand side of Equation (27) to zero:

(29) 0=αeiα(x^ακ)(t)+jwij(Sjκ)(t)+lwilff(Slffκ)(t)+αkeiα((xαx^α)κ)(t).

We rewrite this equation in the form

(30) αeiα(x^ακ)(t)=kk+1αeiα(xακ)(t)+1k+1(jwij(Sjκ)(t)+lwilff(Slffκ)(t)).

We choose the feedback gain for the error much larger than 1 (k1), such that k/(k+1)1. We show below (in the text accompanying Equations (31)–(35)), that the factor in parentheses multiplying 1/(k+1) in the second term starts from zero and tends, with learning, towards αeiα(xακ), which is the factor multiplying k/(k+1) in the first term. Thus, the first term remains approximately k times larger than the second during learning. To obtain x^αxα, we set k1.

To show that the momentary fixed point is stable at the fast synaptic time scale, we calculate the Jacobian =[il], for the dynamical system given by Equation (27). We introduce auxiliary variables yiαeiα(x^ακ) to rewrite Equation (27) with the new variables in the form y˙i=Fi(y); and then we take derivative of its right hand side to obtain the elements of the Jacobian matrix at the fixed point αeiαx^α (using 0κ(τ)dτ=1):

𝒥ilFi(y)yl=(k+1)τsδil+1τsjwij(Sjκ)(t)yl|yi=αeiαx^α

where δil is the Kronecker delta function. We note that jwij(Sjκ) is a spatially and temporally averaged measure of the population activity in the network with appropriate weighting factors wij. We assume that the population activity varies smoothly with input, which is equivalent to requiring that on the time scale τs, the network fires asynchronously, i.e. there are no precisely timed population spikes. Then we can take the second term to be bounded, in absolute value, by say B1/τs. The Jacobian matrix 𝒥 is of the form (k+1)I/τs+Λ, where I is the N×N identity matrix and Λ is a matrix with each element bounded in absolute value by B1/τs. If we set kNB1, then all eigenvalues of the Jacobian have negative real parts, applying the Gerschgorin circle theorem (the second term can perturb any eigenvalue from (k+1)/τs to within a circle of radius NB1/τs at most), rendering the momentary fixed point asymptotically stable.

Thus, we have shown that if the initial state of the first network is close to the initial state of the target network, e.g., both start from rest, then on the slow time scale of the system dynamics of the reference x*, the first network output follows the target network output at all times, x^αxα*.

We now show that neurons are primarily driven by inputs close to those in the target network due to error feedback, and that these lie in the low-dimensional manifold spanned by {eiα}, as required for Equation (25). We compute the projected error using Equation (30):

(31) αeiα(ϵακ)(t)=αeiα((xαx^α)κ)(t)=1k+1αeiα(xακ)(t)1k+1(jwij(Sjκ)(t)+lwilff(Slffκ)(t)),

and insert it into Equation (6) to obtain the current into a neuron in the recurrent network:

(32) Ji=kk+1αeiα(xακ)(t)+1k+1(jwij(Sjκ)(t)+lwilff(Slffκ)(t))+bi

At the start of learning, if the feedforward and recurrent weights are small, then the neural input is dominated by the fed-back error input that is the first term, making Ji close to the ideal current

(33) Ji*=αeiα(xα**κ)(t)+bi.

Thus, the neural input at the start of learning is of the form αeiαxα* which lies in the low-dimensional subspace spanned by {eiα} as required for Equation (25). Furthermore, over time, the feedforward and recurrent weights get modified so that their contribution tends towards αeiα(xα**κ), such that the two terms of Equation (32) add to make Ji even closer to the ideal current Ji* given by Equation (33). This is made clearer by considering the weight update rule Equation 10 as stochastic gradient descent on a loss function,

(34) J=12i(αeiα(xακ)(t)lwilff(Slffκ)(t)jwij(Sjκ)(t))2,

leading us to (for each recurrent weight wij, and similarly for wilff):

(35) w˙ij=ηJwij=η(αeiα(xακ)lwilff(Slffκ)jwij(Sjκ))(Sjκ)=ηk+1k(Iiϵκ)(Sjκ),

which is identical to the FOLLOW learning rule for wij in Equation (10) except for the time-scale of filtering of the error current (see Discussion), and a factor involving k that can be absorbed into the learning rate η. In the last step above, we used the projected error current from Equation (31) and the definition of Iiϵ in Equation (8). Thus, the feedforward and recurrent connections evolve to inject, after learning, the same ideal input within the low-dimensional manifold, as was provided by the error feedback during learning. Hence, the neural input remains in the low-dimensional manifold spanned by {eiα} throughout learning, as required for Equation (25), making this major step and the previous one self-consistent.

Since the driving neural currents are close to ideal throughout learning, the filtered spike trains of the recurrent neurons in the first network will also be approximately the same as those of the target network, so that (Si*κ)(t) can be used instead of (Si**κ)(t) in (Equation (28)). Moreover, the filtered spike trains (Slffκ)(t) of the command representation layer in the first network are the same as those in the target network, since they are driven by the same command input u and the command encoding weights are, by construction, the same for both networks. The similarity of the spike trains in the first and target networks will be used in the next major part of the proof.

Stability of learning via Lyapunov’s method

We now turn to the third step of the proof and consider the temporal evolution of the error ϵα=xα-x^α. We exploit that the network dynamics is realized by the target network and insert Equations (27) and (28) so as to find

(36) τsαeiα(ϵ˙ακ)(t)=τsαeiα((x^˙αx˙α)κ)(t)τsαeiα((x^˙αx˙α)κ)(t)j(wijwij)(Sjκ)(t)+l(wilffwilff)(Slffκ)(t)+(k+1)αeiα(ϵακ)(t)jψij(Sjκ)(t)+lϕil(Slffκ)(t)+(k+1)αeiα(ϵακ)(t).

In the second line, we have replaced the reference output by the target network output; and in the third line we have used Equations (27) and (28), and replaced the filtered spike trains of the target network by those of the first network, exploiting the insights from the previous paragraph. In the last line, we have introduced abbreviations ψijwij-wij* and ϕilwilffwilff.

In order to show that the absolute value of the error decreases over time with an appropriate learning rule, we consider the candidate Lyapunov function:

(37) V(ϵ~,ψ,ϕ)=12iϵ~i2+121η~1i,j(ψij)2+121η~2i,l(ϕil)2,

where ϵ~iτsαeiα(ϵα*κ) and η~1,η~2>0 are positive constants. We use Lyapunov’s direct method to show the stability of learning. For this, we require the following properties for the Lyapunov function. (a) The Lyapunov function is positive semi-definite V(ϵ~,ψ,ϕ)0, with the equality to zero only at (ϵ~,ψ,ϕ)=(0,0,0). (b) It has continuous first-order partial derivatives. Furthermore, V is (c) radially unbounded since

V(ϵ~,ψ,ϕ)>|(ϵ~,ψ,ϕ)|2/(4max(1,η~1,η~2)),

and (d) decrescent since

V(ϵ~,ψ,ϕ)<|(ϵ~,ψ,ϕ)|2/min(1,η~1,η~2),

where |(ϵ~,ψ,ϕ)|2i(ϵ~i)2+i,j(ψij)2+i,k(ϕil)2 and min/max take the minimum/maximum of their respective arguments.

Apart from the above conditions (a)-(d), we need to show the key property V˙0 for uniform global stability (which implies that bounded orbits remain bounded, so the error remains bounded); or the stronger property V˙<0 for asymptotic global stability (see for example [Narendra and Annaswamy, 1989; Ioannou and Sun, 2012]). Taking the time derivative of V, and replacing ϵ~˙ithat is τsαeiα(ϵ˙α*κ) from (Equation (36)), we have:

(38) V˙=iϵ~iϵ~˙i+1η~1i,jψijψ˙ij+1η~2i,lϕilϕ˙iliϵ~i(jψij(Sjκ)(t)+lϕil(Slffκ)(t)+(k+1)αeiα(ϵακ)(t))+1η~1i,jψijψ˙ij+1η~2i,lϕilϕ˙il=i,jψij(ϵ~i(Sjκ)(t)+1η~1ψ˙ij)+i,kϕil(ϵ~i(Slffκ)(t)+1η~2ϕ˙il)(k+1)iϵ~i2/τs.

If we enforce the first two terms above to be zero, we derive a learning rule

(39) ψ˙ij=η~1ϵ~i(Sjκ)(t)ϕ˙il=η~2ϵ~i(Slffκ)(t),

and then

V˙=-(k+1)iϵ~i2/τs0

requiring k>-1, which is subsumed under k1 for the error feedback. Equation (39) with η1η~1τs/k and η2η~2τs/k, and κ replaced by a longer filtering kernel κϵ, is the learning rule used in the main text, Equation (10).

Thus, in the (ϵ~,ψ,ϕ)-system given by Equations (36) and (39), we have proven the global uniform stability of the fixed point (ϵ~,ψ,ϕ)=(0,0,0), which is effectively (ϵ,ψ,ϕ)=(0,0,0), choosing η1,η2>0 and kmax(1,NB1), under assumptions (1 - 6), while simultaneously deriving the learning rule (Equation (39)).

This ends our proof. So far, we have shown that the system is Lyapunov stable, that is bounded orbits remain bounded, and not asymptotically stable. Indeed, with bounded firing rates and fixed readout weights, the output will remain bounded, as will the error (for a bounded reference). However, here, we also derived the FOLLOW learning rule, and armed with the inequality for the time derivative of the Lyapunov function in terms of the error, we further show in the following subsection that the error ϵ goes to zero asymptotically, so that after learning, even without error feedback, x^ reproduces the dynamics of x.

A major caveat of this proof is that under assumption (2) the dynamics be realizable by our network. In a real application this might not be the case. Approximation errors arising from a mismatch between the best possible network and the actual target dynamics are currently ignored. The adaptive control literature has shown that errors in approximating the reference dynamics appear as frozen noise and can cause runaway drift of the parameters (Narendra and Annaswamy, 1989; Ioannou and Sun, 2012). In our simulations with a large number of neurons, the approximations of a non-realizable reference dynamics (e.g., the van der Pol oscillator) were sufficiently good, and thus the expected drift was possibly slow, and did not cause the error to rise during typical time-scales of learning. A second caveat is our assumption (5). While the input is under our control and can therefore be kept bounded, some additional bounding is needed to stop weights from drifting. Various techniques to address such model-approximation noise and bounding weights have been studied in the robust adaptive control literature (e.g., (Ioannou and Tsakalis, 1986; Slotine and Coetsee, 1986; Narendra and Annaswamy, 1989; Ioannou and Fidan, 2006; Ioannou and Sun, 2012)). We discuss this issue and briefly mention some of these ameliorative techniques in Appendix 1.

To summarize, the FOLLOW learning rule (Equation (39)) on the feedforward or recurrent weights has two terms: (i) a filtered presynaptic firing trace (Slffκ)(t) or (Sj*κ)(t) that is available locally at each synapse; and (ii) a projected filtered error αkeiα(ϵακ)(t) used for all synapses in neuron i that is available as a current in the postsynaptic neuron i due to error feedback, see Equation (6). Thus the learning rule can be classified as local. Moreover, it uses an error in the observable x, not in its time-derivative. While we have focused on spiking networks, the learning scheme can be easily used for non-linear rate units by replacing the filtered spikes (Si*κ)(t) by the output of the rate units r(t). Our proof is valid for arbitrary dynamical transforms h(x,u) as long as they are realizable in a network. The proof shows uniform global stability using Lyapunov’s method.

Proof of error tending to zero asymptotically

In the above subsection, we showed uniform global stability using V˙=-(k+1)i(ϵ~i)20, with kmax(1,NB1) and ϵ~iτsαejα(ϵα*κ). This only means that bounded errors remain bounded. Here, we show more importantly that the error tends to zero asymptotically with time. We adapt the proof in section 4.2 of (Ioannou and Sun, 2012), to our spiking network.

Here, we want to invoke a special case of Barbălat’s lemma: if f,f˙ and fp for some p[1,), then f(t)0 as t. Recall the definitions: function fp when ||x||p(0|f(τ)|pdτ)1/p exists (is finite); and similarly function f when ||x||supt0|f(t)| exists (is finite).

Since V is positive semi-definite (V0) and is a non-increasing function of time (V˙0), its limtV=V exists and is finite. Using this, the following limit exists and is finite:

i0(ϵ~i(τ))2dτ=-1k+10V˙(τ)dτ=1k+1(V(0)-V).

Since each term in the above sum i is positive semi-definite, 0(ϵ~i(τ))2dτ also exists and is finite i, and thus ϵ~i2i.

To show that ϵ~i,ϵ~˙ii, consider Equation (36). We use assumption (6) that the input u(t) and the reference output x(t) are bounded. Since network output x^ is also bounded due to saturation of firing rates (as are the filtered spike trains), the error (each component) is bounded that is ϵ~ii. If we also bound the weights from diverging during learning (assumption (5)), then ψij,ϕili,j,l. With these reasonable assumptions, all terms on the right hand side of the Equation (36) for ϵ~˙i are bounded, hence ϵ~˙ii.

Since ϵ~i2i and ϵi~,ϵ~˙ii, invoking Barbălat’s lemma as above, we have ϵ~i0i as t. We have shown that the error tends to zero asymptotically under assumptions (1 - 6). In practice, the error shows fluctuations on a short time scale while the mean error over a longer time scale reduces and then plateaus, possibly due to approximate realizability, imperfections in the error-feedback, and spiking shot noise (cf. Figure 5).

We do not further require the convergence of parameters to ideal ones for our purpose, since the error tending to zero, that is network output matching reference, is functionally sufficient for the forward predictive model. In the adaptive control literature (Ioannou and Sun, 2012; Narendra and Annaswamy, 1989), the parameters (weights) are shown to converge to ideal ones if input excitation is ‘persistent’, loosely that it excites all modes of the system. It should be possible to adapt the proof to our spiking network, as suggested by simulations (Figure 5), but is not pursued here.

Appendix 1

Decoding

Consider only the command representation layer without the subsequent recurrent network. Assume, following (Eliasmith and Anderson, 2004), we wish to decode an arbitrary output v(u) corresponding to the u encoded in the command representation layer, from the spike trains Slff(t) of the neurons, by synaptically filtering and linearly weighting the trains with decoding weights dαl(v):

(40) v^α(u)=ldαl(v)(Slffκ)(t),

where * denotes convolution (Slffκ)(t)tSlff(t)κ(tt)dt=0Slff(tt)κ(t)dt, and κ(t)exp(-t/τs)/τs is a normalized filtering kernel.

We can obtain the decoders dαi(v) by minimizing the loss function

(41) L=α(vα(u)ldαl(v)Slffκt)2u

with respect to the decoders. The average u over u guarantees that the same constant decoders are used over the whole range of constant inputs u. The time average t denotes an analytic rate computed for each constant input for a LIF neuron. Linear regression with a finite set of constant inputs u was used to obtain the decoders (see Methods). With these decoders, if the input u varies slowly compared to the synaptic time constant τs, we have v^α=ldαl(v)(Slffκ)(t)vα(u).

Any function of the input v(u) can be approximated with appropriate linear decoding weights dαl(v) from the high-dimensional basis of non-linear tuning curves of heterogeneous neurons with different biases, encoding weights and gains, schematized in Figure 1—figure supplement 1. With a large enough number of such neurons, the function is expected to be approximated to arbitrary accuracy. While this has not been proven rigorously for spiking neurons, this has theoretical underpinnings from theorems on universal function approximation using non-linear basis functions (Funahashi, 1989; Hornik et al., 1989; Girosi and Poggio, 1990) successful usage in spiking neural network models by various groups (Seung et al., 2000; Eliasmith and Anderson, 2004; Eliasmith, 2005), and biological plausibility (Poggio, 1990; Burnod et al., 1992; Pouget and Sejnowski, 1997).

Here, the neurons that are active at any given time operate in the mean driven regime, that is the instantaneous firing rate increases with the input current (Gerstner et al., 2014). The dynamics is dominated by synaptic filtering, and the membrane time constant does not play a significant role (Eliasmith and Anderson, 2004; Eliasmith, 2005; Seung et al., 2000; Abbott et al., 2016). Thus, the decoding weights derived from Equation (41) with stationary input are good approximations even in the time-dependent case, as long as the input varies on a time scale slower than the synaptic time constant.

Online learning based on a loss function and its shortcomings

Suppose that a dynamical system given by

(42) x˙α=fα(x)+gα(u)

is to be mimicked by our spiking network implementing a different dynamical system with an extra error feedback term as in Equation (27). This can be interpreted as:

(43) τsx^˙α=-x^α+f˘α(x^,u)+g˘α(u)+kϵα.

Comparing with the reference Equation (42), after learning we want that f˘α(x^,u)+g˘α(u) should approximate τsfα(x^)+x^α+τsgα(u). One way to achieve this (Eliasmith and Anderson, 2004) is to ensure that f˘α(x^,u) and g˘α(u) approximate f~α(x^)τsfα(x^)+x^α and g~α(u)τsgα(u) respectively, as used in the loss functions below. In our simulations, we usually start with zero feedforward and recurrent weights, so that initially f˘(x^,u)=0=g˘α(u).

Assuming that the time scales of dynamics are slower than synaptic time scale τs, we can approximate the requisite feedforward and recurrent weights, by minimizing the following loss functions respectively, with respect to the weights (Eliasmith and Anderson, 2004):

(44) Lff=j(αekαffg~α(u)lwjlSlffκt)2x,
(45) Lrec=j(αejαf~α(x)-iwjiSi*κt)2x.

Using these loss functions, we can pre-calculate the weights required for any dynamical system numerically, similarly to the calculation of decoders in the subsection above.

We now derive rules for learning the weights online based on stochastic gradient descent of these loss functions, similar to (MacNeil and Eliasmith, 2011), and point out some shortcomings.

The learning rule for the recurrent weights by gradient descent on the loss function given by Equation (45) is

(46) dwjidt=12ηLrecwjiη(αejαf~α(x)iwji(Siκ)(t))(Siκ)(t)xηϵj(f~)(Siκ)(t)x.

In the second line, the effect of the weight change on the filtered spike trains is assumed small and neglected, using a small learning rate η. With requisite dynamics slower than synaptic τs, and with large enough number of neurons, we have approximated iwjiSi*κt(t)iwji(Si*κ)(t). The third line defines an error in the projected f~(x), which is the supervisory signal.

If we assume that the learning rate is slow, and the input samples the range of x uniformly, then we can remove the averaging over x, similar to stochastic gradient descent

(47) dwjidtηϵj(f~)(Siκ)(t),

where ϵj(f~)(αejαf~α(x)-iwji(Si*κ)(t)). This learning rule is the product of a multi-dimensional error ϵj(f~) and the filtered presynaptic spike train (Si*κ)(t). However, this error in the unobservable f~ is not available to the postsynaptic neuron, making the learning rule non-local. A similar issue arises in the feedforward case.

In mimicking a dynamical system, we want only the observable output of the dynamical system, that is x to be used in a supervisory signal, not a term involving the unknown f(x)that appears in the derivative x˙. Even if this derivative is computed from the observable x, it will be noisy. Furthermore, this derivative cannot be obtained by differentiating the observable versus time, if the observable is not directly the state variable, but an unknown non-linear function of it, which however our FOLLOW learning can handle (see next subsection). Thus, this online rule, if using just the observable error, can learn only an integrator for which f(x)x (MacNeil and Eliasmith, 2011).

Indeed, learning both the feedforward and recurrent weights simultaneously using gradient descent on these loss functions, requires two different and unavailable error currents to be projected into the postsynaptic neuron to make the rule local.

General dynamical system and transformed observable

General dynamical systems of the form

dx(t)dt=h(x(t),u(t)),y(t)=K(x(t))

can be learned with the same network configuration (Figure 1B) used for systems of the form Equation 1. Here, the state variable is x, but the observable which serves as the reference to the network is y. The transformation equation of the observable (second equation) can be absorbed into the first equation as below.

Consider the transformation equation for the observable. The dimensionality of the relevant variables: (1) the state variables (say joint angles and velocities) x; (2) the observables represented in the brain (say sensory representations of the joint angles and velocities) y; and (3) the control input (motor command) u, can be different from each other, but must be small compared to the number of neurons. Furthermore, we require the observable y to not lose information compared to x, that is K must be invertible, so y will have at least the same dimension as x.

The time evolution of the observable is

y˙β=αKβ(x)xαx˙α=αKβ(x)xαhα(x,u)pβ(y,u).

The last step follows since function K is invertible, so that x=K-1(y). So we essentially need to learn y˙β=pβ(y,u).

Having solved the observable transformation issue, we use x now for our observable instead of y, consistent with the main text. The dynamical system to be learned is now x˙β=pβ(x,u). Since our learning network effectively evolves as Equation (43), it can approximate pβ(x,u). Thus our network can learn general dynamical systems with observable transformations.

Approximation error causes drift in weights

A frozen noise term ξ(x(t)) due to the approximate decoding from non-linear tuning curves of neurons, by the feedforward weights, recurrent weights and output decoders, will appear additionally in Equation (36). If this frozen noise has a non-zero mean over time as x(t) varies, leading to a non-zero mean error, then it causes a drift in the weights due to the error-based learning rules in Equation (10), and possibly a consequent increase in error. Note that the stability and error tending to zero proofs assume that this frozen noise is negligible.

Multiple strategies with contrasting pros and cons have been proposed to counteract this parameter drift in the robust adaptive control literature (Ioannou and Sun, 2012; Narendra and Annaswamy, 1989; Ioannou and Fidan, 2006). These include a weight leakage/regularizer term switched slowly on, when a weight crosses a threshold (Ioannou and Tsakalis, 1986; Narendra and Annaswamy, 1989), or a dead zone strategy with no updating of weights once the error is lower than a set value (Slotine and Coetsee, 1986; Ioannou and Sun, 2012). In our simulations, the error continued to drop even over longer than typical learning time scales (Figure 5), and so, we did not implement these strategies.

In practice, the learning can be stopped once error is low enough, while the error feedback can be continued, so that the learned system does not deviate too much from the observed one.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
    Enforcing balance allows local supervised learning in spiking recurrent networks
    1. R Bourdoukan
    2. S Denève
    (2015)
    In: C Cortes, N. D Lawrence, D. D Lee, M Sugiyama, R Garnett, R Garnett, editors. Advances in Neural Information Processing Systems, 28. Curran Associates, Inc. pp. 982–990.
  6. 6
  7. 7
  8. 8
  9. 9
    Visuomotor transformations underlying arm movements toward visual targets: a neural network model of cerebral cortical operations
    1. Y Burnod
    2. P Grandguillaume
    3. I Otto
    4. S Ferraina
    5. PB Johnson
    6. R Caminiti
    (1992)
    Journal of Neuroscience 12:1435–1453.
  10. 10
    Modeling of continuous time dynamical systems with input by recurrent neural networks
    1. TWS Chow
    2. Xiao-Dong Li
    (2000)
    IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 47:575–578.
    https://doi.org/10.1109/81.841860
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
    Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems
    1. C Eliasmith
    2. CH Anderson
    (2004)
    MIT Press.
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
    Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies
    1. S Hochreiter
    2. Y Bengio
    3. P Frasconi
    4. J Schmidhuber
    (2001)
    Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies.
  35. 35
  36. 36
  37. 37
  38. 38
    Adaptive Control Tutorial
    1. P Ioannou
    2. B Fidan
    (2006)
    Philadelphia, PA: SIAM, Society for Industrial and Applied Mathematics.
    https://doi.org/10.1137/1.9780898718652
  39. 39
    Robust Adaptive Control (first edition)
    1. P Ioannou
    2. J Sun
    (2012)
    Mineola, New York: Dover Publications.
  40. 40
  41. 41
  42. 42
    The ”Echo State” Approach to Analysing and Training Recurrent Neural Networks
    1. H Jaeger
    (2001)
    Technical report.
  43. 43
    A Tutorial on Training Recurrent Neural Networks, Covering BPPT, RTRL, EKF and the "Echo StateNetwork" Approach
    1. H Jaeger
    (2005)
    A Tutorial on Training Recurrent Neural Networks, Covering BPPT, RTRL, EKF and the "Echo StateNetwork" Approach.
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
    On the adaptive control of robot manipulators
    1. W Li
    (1987)
    The International Journal of Robotics Research 6:49–59.
    https://doi.org/10.1177/027836498700600303
  52. 52
    Optimal Control for Biological Movement Systems
    1. W Li
    (2006)
    San Diego: University of California.
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
    Stable Adaptive Systems
    1. KS Narendra
    2. AM Annaswamy
    (1989)
    Prentice-Hall, Inc.
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
    A theory of how the brain might work
    1. T Poggio
    (1990)
    Cold Spring Harbor Symposia on Quantitative Biology 55:899–910.
    https://doi.org/10.1101/SQB.1990.055.01.084
  71. 71
  72. 72
  73. 73
  74. 74
    Principles of neurodynamics. perceptrons and the theory of brain mechanisms
    1. F Rosenblatt
    (1961)
    Technical report.
  75. 75
    Parallel Distributed Processing: Explorations in the Microstructure of Cognition
    1. D Rumelhart
    2. G Hinton
    3. R Williams
    (1986)
    Learning Internal Representations by Error Propagation, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1, Cambridge, MA, USA, MIT Press.
  76. 76
  77. 77
  78. 78
    Adaptive Control: Stability, Convergence, and Robustness
    1. S Sastry
    2. M Bodson
    (1989)
    Upper Saddle River, NJ, USA: Prentice-Hall, Inc.
  79. 79
  80. 80
    Adaptive representation of dynamics during learning of a motor task
    1. R Shadmehr
    2. FA Mussa-Ivaldi
    (1994)
    Journal of Neuroscience 14:3208–3224.
  81. 81
  82. 82
    Closer control of loops with dead time
    1. OJM Smith
    (1957)
    Chemical Engineering Progress 53:217–219.
  83. 83
  84. 84
    Python scripting in the Nengo simulator
    1. TC Stewart
    2. B Tripp
    3. C Eliasmith
    (2009)
    Frontiers in Neuroinformatics, 3, 10.3389/neuro.11.007.2009, 19352442.
  85. 85
  86. 86
  87. 87
    Generalization in reinforcement learning: Successful examples using sparse coarse coding
    1. RS Sutton
    (1996)
    Advances in Neural Information Processing Systems 8:138–1044.
  88. 88
  89. 89
  90. 90
  91. 91
  92. 92
  93. 93
  94. 94
  95. 95
  96. 96
  97. 97
  98. 98
  99. 99

Decision letter

  1. Peter Latham
    Reviewing Editor; University College London, United Kingdom

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Predicting non-linear dynamics by stable local learning in a recurrent spiking neural network" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors and the evaluation has been overseen by Timothy Behrens as the Senior Editor.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This paper proposes a learning scheme that allows a recurrent network of spiking neurons to predict the dynamics of a non-linear system by training feedforward and recurrent connections. The learning is achieved using a local learning rule. The error signal is fed back by fixed, random connections, with the feedback loop forming an auto-encoder. After learning, which is done via motor "babbling," the network is able to reproduce the output in an open-loop setting without feedback. The authors showcase their training algorithm in a number of linear and non-linear applications, including the dynamics of a two-link arm. In addition, the authors prove analytically under mild assumptions that learning converges if the non-linear dynamics can be realized by the network. The paper is mainly clearly written, and has the potential to be of significant interest to the community.

Essential revisions:

1) The firing rates are far higher than is biologically plausible. It is important that this is fixable, at least in principle. If not, then the scheme has nothing to do with the brain. We hesitate to ask for additional simulations, but we think you need to show that the network can work when firing rates are in a more biologically plausible range, as in less than about 20 Hz on average (still high, but closer to reality).

2) We did not follow a key step. The sixth equation in subsection “Network architecture and parameters” says:

sum_{beta, i} e_{j,beta} d_{beta,i} a_i(sum_alpha e_{i,alpha} epsilon_alpha)

\approx sum_alpha e_{j,alpha} epsilon_alpha

It seems that you effectively rewrote this as

sum_{beta, i} e_{j,beta} d_{beta,i} a_i(I_i) \approx I_i

(see Eq. (13)). Strictly speaking, this holds only if the current (whose ith component is I_i) lies in the low dimensional submanifold spanned by the e_{i,alpha}. However, because of the recurrent dynamic, the network is likely be chaotic, and so the current is likely to live on a high-dimensional manifold.

This seems to indicate a problem with the derivation. Or are we missing something? In any case, this needs to be explained much more clearly.

3) It would be good to have a clear, succinct description of what new feature allowed you to train spiking networks with a local, biologically realistic learning rule, something that others have failed to do. Perhaps it's the auto-encoder? (See next comment.)

4) The auto-encoder is critical for the operation of the network. Can you provide any intuition about why it's there, and what it does? This may not be possible, but it would greatly strengthen the manuscript if you could do it.

5) A potentially important limitation of this paper is that it does not take into account feedback delays during training. This would be a significant limitation if the purpose of this work is to propose how forward models can be learned in the brain: one of the main reasons to have forward models in the first place is being able to deal with feedback delays (see e.g. Wolpert, Ghahramani & Jordan, 1995). Learning that ignores feedback delays is therefore not fully biologically realistic. If this limitation is indeed correct, it needs to be mentioned more prominently in the Discussion section (we only found it buried in the Methods section); if the network can in fact handle feedback delays during training (that are on a similar scale as the target dynamics), it should be demonstrated, as it would make the impact of the paper significantly stronger.

6) An alternative approach to training recurrent networks is to initialize the recurrent weights in some appropriate way and to just train the readout weights during learning (reservoir computing). How much more computationally powerful is the proposed method? A simple way to test this would be to compare the performance of the trained network to that of a network where the recurrent weights have been shuffled after training, the feedback weights set to zero, and the output weights retrained by linear regression. If such a network had comparable computational performance, it would suggest that only the correct recurrent weight distribution is required to learn the task.

7) Does learning still work if the dynamical system has small amounts of stochasticity, or if the error signal is noisy?

8) It's critical to have experimental predictions. Or at least a summary of the implications of this work for what's going on in the brain.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for submitting your revised article "Predicting non-linear dynamics by stable local learning in a recurrent spiking neural network" for consideration by eLife. The evaluation has been overseen by a Peter Latham as the Reviewing Editor and Timothy Behrens as the Senior Editor.

I want to congratulate you on a very nice paper, and thank you for your very thorough responses; you addressed almost all the points. The only problem is that in parts the manuscript is still not very clear (at least not to me). Please take a very hard look at the Methods section "Network architecture and parameters", and make sure it's crystal clear.

https://doi.org/10.7554/eLife.28295.022

Author response

Summary:

This paper proposes a learning scheme that allows a recurrent network of spiking neurons to predict the dynamics of a non-linear system by training feedforward and recurrent connections. The learning is achieved using a local learning rule. The error signal is fed back by fixed, random connections, with the feedback loop forming an auto-encoder. After learning, which is done via motor "babbling," the network is able to reproduce the output in an open-loop setting without feedback. The authors showcase their training algorithm in a number of linear and non-linear applications, including the dynamics of a two-link arm. In addition, the authors prove analytically under mild assumptions that learning converges if the non-linear dynamics can be realized by the network. The paper is mainly clearly written, and has the potential to be of significant interest to the community.

We are sincerely thankful to the reviewers and editors for their insightful comments. We have incorporated all the essential revisions and minor points in our revised manuscript, as detailed in the individual responses below. We hope we have addressed them all to your satisfaction.

Essential revisions:

1) The firing rates are far higher than is biologically plausible. It is important that this is fixable, at least in principle. If not, then the scheme has nothing to do with the brain. We hesitate to ask for additional simulations, but we think you need to show that the network can work when firing rates are in a more biologically plausible range, as in less than about 20 Hz on average (still high, but closer to reality).

We thank the reviewers for exhorting us to reduce the firing rates to a more biologically plausible range.

We have replaced Figure 2 with newer simulations run with modified neuronal parameters that yield 13 Hz mean firing rate (averaged over 16s) now compared to 37 Hz before. Third paragraph of subsection “Non-linear oscillator” now reads:

“The distributions of firing rates averaged over a 0.25 s period with fairly constant output, and over a 16 s period with time-varying output, were long-tailed, with the mean across neurons maintained at approximately 12-13 Hz (Figure 2E, F).”

We have also modified the Methods section to include these different neuronal parameters in the second para of subsection “Network architecture and parameters”:

“For the low firing rate simulation in Figure 2, the gains $\nu_i$ were all set to 2, with the biases chosen uniformly from (−2, 2). Choosing a fixed gain for all neurons led to lower variance in firing rates compared to random gains.”

See also Figure 1—figure supplement 1 for representative neuronal tuning curves.

2) We did not follow a key step. The sixth equation in subsection “network architecture and parameters” says:

sum_{beta, i} e_{j,beta} d_{beta,i} a_i(sum_alpha e_{i,alpha} epsilon_alpha)

\approx sum_alpha e_{j,alpha} epsilon_alpha

It seems that you effectively rewrote this as

sum_{beta, i} e_{j,beta} d_{beta,i} a_i(I_i) \approx I_i

(see Eq. (13)). Strictly speaking, this holds only if the current (whose ith component is I_i) lies in the low dimensional submanifold spanned by the e_{i,alpha}. However, because of the recurrent dynamic, the network is likely be chaotic, and so the current is likely to live on a high-dimensional manifold.

This seems to indicate a problem with the derivation. Or are we missing something? In any case, this needs to be explained much more clearly.

We profusely thank the reviewers for pointing out this issue.

The neural input actually remains always in the requisite low-dimensional manifold. We have now clarified and derived it in detail in the text as below.

Below equation (17) for the compressive auto-encoder we write:

“Thus, the loss function of equation (17) is approximately proportional to the squared-error loss function of equation (15) used for the expansive auto-encoder, showing that for an auto-encoder embedded in a loop with fixed random encoding weights, the expansive and compressive descriptions are equivalent for those $N$-dimensional inputs $I_i$ that lie in the $N_d$-dimensional sub-space spanned by $\{e_{i\alpha}\}$ i.e. $I_i$ is of the form $\sum_\alpha e_{i\alpha} \epsilon_\alpha$ where $\epsilon_\alpha$ lies in a finite domain (hypersphere).”

Below equation (25) [above-referred previous equation (13)] in the first major step of the proof, we have added:

“This is valid only if the high-$N$-dimensional input $I_i$ lies in the low-$N_d$-dimensional subspace spanned by $\{e_{i\alpha}\}$ (equation (17)). We show that this requirement is fulfilled in the next major step of the proof (see text accompanying equations (31)-(35)).”

The text in subsection “Error-feedback loop ensures that output follows reference” around new equations (31)-(35) in the second major step of the proof confirms this requirement. The text is too large to excerpt here.

In brief, the derivation shows that the fed-back error current dominates over the feedforward and recurrent contributions, beginning with small feedforward and recurrent weights, at the start of learning. The fed-back error input lies in the low dimensional manifold spanned by the error encoding weights, driving the neural activity close to ideal (also in a low-dimensional manifold) and this in turn ensures that the feedforward and recurrent weights evolve to contribute terms essentially in the low-dimensional manifold. Thus, at all times in the learning and testing, the neural currents lie in a low dimensional manifold.

3) It would be good to have a clear, succinct description of what new feature allowed you to train spiking networks with a local, biologically realistic learning rule, something that others have failed to do. Perhaps it's the auto-encoder? (See next comment.)

We thank the reviewers for asking us to clarify the new features of our learning scheme.

We address both points 3 and 4 with this response. The key new features are the use of the error-feedback loop along with the auto-encoder.

We have introduced a sub-section early in the Results section titled: “Negative-feedback of error via auto-encoder enables local learning”.

This incorporates old and new text, explaining what new features enable learning and how.

In brief, the error feedback using the auto-encoder of gain k, serves two purposes: (1) it makes the projected error available as a local current in every neuron in the recurrent network, and (2) it drives each neuron's activity in the recurrent network to be close to ideal. Thus, since the activities are already ideal, only the weights are responsible for the output error, and the weights can be trained with a simple perceptron-like learning rule, overcoming the credit-assignment problem. In FORCE, for example, neural activities are also forced close to ideal, but by very fast weights swings initially, at a time scale faster than the reference dynamics (biologically implausible), and without the error being projected locally in each neuron (non-local).

In particular please see the paragraphs (much easier to read in the typeset pdf, hence not excerpting fully here):

Subsection “Negative error feedback via auto-encoder enables local learning:

"The combination of the auto-encoder and the error feedback implies that the output stays close to the reference, as explained now..."

In the Methods section, we show that not just the low-dimensional output $\vec{\hat{x}}$, but also the spike trains $S_i(t)$, for $i=1,\ldots,N$, are entrained by the error feedback to be close to the ideal ones required to generate $\vec{x}$.

"During learning, the error feedback via the auto-encoder in a loop serves two roles: (i) to make the error current available in each neuron, projected correctly, for a local synaptic plasticity rule, and (ii) to drive the spike trains to the target ones for producing the reference output. In other learning schemes for recurrent neural networks, where neural activities are not constrained by error feedback, it is difficult to assign credit or blame for the momentarily observed error, because neural activities from the past affect the present output in a recurrent network. In the FOLLOW scheme, the spike trains are constrained to closely follow the ideal time course throughout learning, so that the present error can be attributed directly to the weights, enabling us to change the weights with a simple perceptron-like learning rule \citep{rosenblatt_principles_1961} as in equation (10), bypassing the credit assignment problem. In the perceptron rule, the weight change $\Δ w \sim (\text{pre})\cdot \δ$ is proportional to the presynaptic input $(\text{pre})$ and the error $\δ$. In the FOLLOW learning rule of equation (10), we can identify $(S_i*\kappa)$ with $(\text{pre})$ and $(I_i^\epsilon * \kappa^\epsilon)$ with $\δ$. In Methods, we derive the learning rule of equation (10) in a principled way from a stability criterion."

We further mention how this is different from FORCE and some other similar methods (briefly here after introducing the rule and above intuition in the results, see also the detailed comparisons in the Discussion section):

"FORCE learning [Sussillo and Abbott, 2009; 2012; DePasquale et al., 2016; Thalmeier et al., 2016; Nicola and Clopath, 2016] also clamps the output and neural activities to be close to ideal during learning, by using weight changes that are faster than the time scale of the dynamics. In our FOLLOW scheme, clamping is achieved via negative error feedback using the auto-encoder, which allows weight changes to be slow and makes the error current available locally in the post-synaptic neuron. Other methods used feedback based on adaptive control for learning in recurrent networks of spiking neurons, but were limited to linear systems [MacNeil and Eliasmith, 2011; Bourdoukan and Deneve, 2015], whereas the FOLLOW scheme was derived for non-linear systems (see Methods section). Our learning rule of equation (10) uses an error $\epsilon_\alpha \equiv x_\alpha – \hat{x}_\alpha$ in the observable state, rather than an error involving the derivative $dx_\alpha/dt$ in equation (1), as in other schemes (see Appendix 1) [Eliasmith, 2005; MacNeil and Eliasmith, 2011]. The reader is referred to the Discussion section for detailed further comparisons. The FOLLOW learning rule is local since all quantities needed on the right-hand-side of equation (10) could be available at the location of the synapse in the postsynaptic neuron. For a potential implementation and prediction for error-based synaptic plasticity, and for a critical evaluation of the notion of ‘local rule’, we refer to the Discussion section."

4) The auto-encoder is critical for the operation of the network. Can you provide any intuition about why it's there, and what it does? This may not be possible, but it would greatly strengthen the manuscript if you could do it.

Thanks for asking us to clarify the auto-encoder. It is key to ensuring that the error is fed-back projected correctly to the neurons for a local learning rule, and to ensure that the neurons are driven to the ideal activities by the error current.

The text added for this clarification is incorporated in the response for the above revision.

See in particular Subsection “Negative error feedback via auto-encoder enables local learning.

“The combination of the auto-encoder and the error feedback implies that the output stays close to the reference, as explained now. In open loop i.e. without connecting the output $\vec{\hat{x}}$ and the reference $\vec{x}$ to the error node, an input $\vec{\epsilon}$ to the network generates an output $\vec{\hat{x}} = k\vec{\epsilon}$ due to the auto-encoder of gain $k$. In closed loop, i.e. with the output and reference connected to the error node (Figure 1B), the error input is $\vec{\epsilon} = \vec{x}-\vec{\hat{x}}$, and the network output $\vec{\hat{x}}$ settles to:

\vec{\hat{x}} = k\vec{\epsilon} = k\left(\vec{x}-\vec{\hat{x}}\right)

\implies \vec{\hat{x}} = \frac{k}{k+1}\vec{x} \approx \vec{x},

i.e. approximately the reference $\vec{x}$ for large positive $k$. The fed-back residual error $\vec{\epsilon}=\vec{x}/(k+1)$ drives the neural activities and thence the network output. Thus, feedback of the error causes the output $\hat{x}_\alpha$ to approximately follow $x_\alpha$, for each component $\alpha$, as long as the error feedback time scale is fast compared to the reference dynamical system time scale, analogous to negative error feedback in adaptive control [Narendra and Annaswamy, 1989; Ioannou and Sun, 2012].”

5) A potentially important limitation of this paper is that it does not take into account feedback delays during training. This would be a significant limitation if the purpose of this work is to propose how forward models can be learned in the brain: one of the main reasons to have forward models in the first place is being able to deal with feedback delays (see e.g. Wolpert, Ghahramani & Jordan, 1995). Learning that ignores feedback delays is therefore not fully biologically realistic. If this limitation is indeed correct, it needs to be mentioned more prominently in the Discussion section (we only found it buried in the Methods section); if the network can in fact handle feedback delays during training (that are on a similar scale as the target dynamics), it should be demonstrated, as it would make the impact of the paper significantly stronger.

Thank you for pointing out this important use case of the forward model vis-a-vis our learning scheme.

We performed these simulations and incorporated the results (Figure 6E-H). While our learning scheme cannot really learn a dynamical system followed by a delay taken together as the reference system, we propose a scheme in the Discussion section (Figure 7B-C) using the Smith predictor configuration, adapted from suggestions by Miall & Wolpert, 1996 and Smith, 1957, whereby our forward model can learn with and compensate for sensory feedback delays in motor control. Our forward model is able to feedback a corrective prediction to the controller, before the sensory feedback arrives, thus improving control.

In the Results section, we have added panels E-H in Figure 6 and added further text at the very end of Results section:

"We also asked if the network could handle sensory feedback delays in the reference signal. Due to the strong limit cycle attractor of the van der Pol oscillator, the effect of delay is less transparent than for the linear decaying oscillator (Figure 2—figure supplement 2), so we decided to focus on the latter. For the linear decaying oscillator, we found that learning degraded rapidly with a few milliseconds of delay in the reference, i.e. if $x(t − \Δ)$ was provided as reference instead of $x(t)$ (Figure 6E-F). We compensated for the sensory feedback delay by delaying the motor command input by identical $\Δ$ (Figure 6G), which is equivalent to time-translating the complete learning protocol, to which the learning is invariant, and thus the network would learn for arbitrary delay (Figure 6H). In the Discussion section, we suggest how a forward model learned with a compensatory delay (Figure 6G) could be used in control mode to compensate for sensory feedback delays."

In the Discussion section, we have added panels B and C in Figure 7 and the text below:

“One of the postulated uses of the forward predictive model is to compensate for delay in the sensory feedback during motor control [Miall and Wolpert, 1996; Wolpert et al., 1995] using the Smith predictor configuration [Smith, 1957]. We speculate that the switch from the closed-loop learning of forward model with feedback gain k >> 1 to open-loop motor prediction k = 0 could also be used to switch delay lines: the system can have either a delay before the forward model as required for learning (Figure 7B), or after the forward model as required for the Smith predictor (Figure 7C). We envisage that FOLLOW learning of the forward model occurs in closed loop mode (k >> 1) with a delay in the motor command path, as outlined earlier in Figure 6G and now embedded in the Smith predictor architecture in Figure 7B. After learning, the network is switched to motor control mode, with the forward predictive model in open loop (k = 0), implementing the Smith predictor (Figure 7C). In this motor control mode, the motor command is fed with zero delay to the forward model. This enables to rapidly feed the estimated state back to the motor controller so as to take corrective actions, even before sensory feedback arrives. In parallel, available sensory feedback is compared with a copy of the forward model that has passed through a compensatory delay after the forward model (Figure 7C).”

6) An alternative approach to training recurrent networks is to initialize the recurrent weights in some appropriate way and to just train the readout weights during learning (reservoir computing). How much more computationally powerful is the proposed method? A simple way to test this would be to compare the performance of the trained network to that of a network where the recurrent weights have been shuffled after training, the feedback weights set to zero, and the output weights retrained by linear regression. If such a network had comparable computational performance, it would suggest that only the correct recurrent weight distribution is required to learn the task.

Thank you for this interesting comparison to reservoir computing.

We performed these simulations. Our recurrent network does not behave like a reservoir. Rather as shown for revision 2, the neural activities remain low-dimensional as they are driven strongly by the error current.

At the end of subsection “Non-linear oscillator” (van der Pol) oscillator, we added the text below:

“We also asked whether merely the distribution of the learned weights in the recurrent layer was sufficient to perform the task, or whether the specific learned weight matrix was required. This question was inspired from reservoir computing [Jaeger, 2001; Maass et al., 2002; Legenstein et al., 2003; Maass and Markram, 2004; Jaeger and Haas, 2004; Joshi and Maass, 2005; Legenstein and Maass, 2007], where the recurrent weights are random, and only the readout weights are learned. To answer this question, we implemented a perceptron learning rule on the readout weights initialized at zero, with the learned network’s output as the target, after setting the feedforward and / or recurrent weights to either the learned weights as is or after shuffling them. The readout weights could be approximately learned only for the network having the learned weights and not the shuffled ones (Figure 2—figure supplement 3), supporting the view that the network does not behave like a reservoir (Methods section).”

We have also added the corresponding text in the Methods section.

7) Does learning still work if the dynamical system has small amounts of stochasticity, or if the error signal is noisy?

Thanks for asking about noise in the reference / error signals. Yes, it is robust.

We have added in the last sub-section of Results section, panel B in Figure 6 with accompanying text:

“We added Gaussian white noise to each component of the error, which is equivalent to adding it to each component of the reference, and ran the van der Pol oscillator learning protocol for 10,000 s for different standard deviations of the noise (Figure 6B). The learning was robust to noise with standard deviation up to around 0.001, which must be compared with the error amplitude of the order of 0.1 at the start of learning, and orders of magnitude lower later.”

8) It's critical to have experimental predictions. Or at least a summary of the implications of this work for what's going on in the brain.

Thank you for pointing out this important lacuna. Our two predictions about error-current driven synaptic plasticity and the presence of an auto-encoder in a feedback loop have been highlighted now.

We have added in the Discussion section with Figure 7A, a prediction about error-current driven synaptic plasticity:

“A possible implementation in a spatially extended neuron would be to imagine that the post-synaptic error current $I_i^\epsilon$ arrives in the apical dendrite where it stimulates messenger molecules that quickly diffuse or are actively transported into the soma and basal dendrites where synapses from feedforward and feedback input could be located, as depicted in Figure 7A. Consistent with the picture of a messenger molecule, we low-pass filtered the error current with an exponential filter $\kappa^\epsilon$ of time constant 80 ms or 200 ms, much longer than the synaptic time constant of 20 ms of the filter $\kappa$. Simultaneously, filtered information about presynaptic spike arrival $S_j*\kappa$ is available at each synapse, possibly in the form of glutamate bound to the postsynaptic receptor or by calcium triggered signalling chains localized in the postsynaptic spines. Thus, the combination of effects caused by presynaptic spike arrival and error information available in the postsynaptic cell drives weight changes, in loose analogy to standard Hebbian learning.

The separation of the error current from the currents at feedforward or recurrent synapses could be spatial (such as suggested in Figure 7A) or chemical if the error current projects onto synapses that trigger a signalling cascade that is different from that at other synapses. Importantly, whether it is a spatial or chemical separation, the signals triggered by the error currents need to be available throughout the postsynaptic neuron. This leads us to a prediction regarding synaptic plasticity that, say in cortical pyramidal neurons, the plasticity of synapses that are driven by pre-synaptic input in the basal dendrites, should be modulated by currents injected in the apical dendrite or on stimulation of feedback connections.”

Further, the presence of an auto-encoder in a feedback loop is a prediction also added to the Discussion section:

“The first assumption is that error encoding feedback weights and output decoding readout weights form an auto-encoder. This requirement can be met if, at an early developmental stage, either both sets of weights are learned using say mirrored STDP [Burbank, 2015], or the output readout weights are learned, starting with random encoding weights, via a biological perceptron-like learning rule [D'Souza et al., 2010; Urbanczik and Senn, 2014]. A pre-learned auto-encoder in a high-gain negative feedback loop is in fact a specific prediction of our learning scheme, to be tested in systems-level experiments.”

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for submitting your revised article "Predicting non-linear dynamics by stable local learning in a recurrent spiking neural network" for consideration by eLife. The evaluation has been overseen by a Peter Latham as the Reviewing Editor and Timothy Behrens as the Senior Editor.

I want to congratulate you on a very nice paper, and thank you for your very thorough responses; you addressed almost all the points. The only problem is that in parts the manuscript is still not very clear (at least not to me). Please take a very hard look at the Methods section "Network architecture and parameters", and make sure it's crystal clear.

We thank you for the compliments and for pointing out the lack of clarity in the Methods subsection: "Network architecture and parameters". We have re-written the unclear subsection, and we hope that the same is now crystal clear. The changes are outlined below.

We have substantially re-written the said Methods subsection. It is now renamed "Network parameters" as the network architecture was already described at the start of the Results section. We have partitioned this subsection into smaller subheadings. The key changes are, in brief, that the gains and biases of neurons and the encoding weights for each layer are first clearly defined. We then explain the procedure for initializing them randomly at the start of a simulation, after which they remain fixed. Figure —figure supplement 1 has also been suitably modified to show the different gain functions of the heterogeneous neurons. We have also improved the explanation of setting the readout weights to form an auto-encoder with respect to the error encoding weights, as well as improved the discussion of the expansive versus compressive view of our auto-encoder.

We have also made a few minor word-level corrections / improvements in other sections.

https://doi.org/10.7554/eLife.28295.023

Article and author information

Author details

  1. Aditya Gilra

    1. Brain-Mind Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
    2. School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    aditya.gilra@epfl.ch
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8628-1864
  2. Wulfram Gerstner

    1. Brain-Mind Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
    2. School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Writing—review and editing
    Competing interests
    No competing interests declared

Funding

European Research Council (Multirules 268 689)

  • Aditya Gilra
  • Wulfram Gerstner

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (CRSII2_147636)

  • Aditya Gilra
  • Wulfram Gerstner

Horizon 2020 Framework Programme (Human Brain Project 720270)

  • Wulfram Gerstner

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Johanni Brea, Samuel Muscinelli and Laureline Logiaco for helpful discussions, and Samuel Muscinelli, Laureline Logiaco, Chris Stock, Tilo Schwalger, Olivia Gozel, Dane Corneil and Vasiliki Liakoni for comments on the manuscript. Financial support was provided by the European Research Council (Multirules, grant agreement no. 268689), the Swiss National Science Foundation (Sinergia, grant agreement no. CRSII2_147636), and the European Commission Horizon 2020 Framework Program (H2020) (Human Brain Project, grant agreement no. 720270).

Reviewing Editor

  1. Peter Latham, University College London, United Kingdom

Publication history

  1. Received: May 3, 2017
  2. Accepted: November 22, 2017
  3. Accepted Manuscript published: November 27, 2017 (version 1)
  4. Version of Record published: December 14, 2017 (version 2)
  5. Version of Record updated: February 23, 2018 (version 3)

Copyright

© 2017, Gilra et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 2,305
    Page views
  • 489
    Downloads
  • 6
    Citations

Article citation count generated by polling the highest count across the following sources: Scopus, Crossref, PubMed Central.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)