Learning recurrent dynamics in spiking networks
 Cited 2
 Views 1,983
 Annotations
Abstract
Spiking activity of neurons engaged in learning and performing a task show complex spatiotemporal dynamics. While the output of recurrent network models can learn to perform various tasks, the possible range of recurrent dynamics that emerge after learning remains unknown. Here we show that modifying the recurrent connectivity with a recursive least squares algorithm provides sufficient flexibility for synaptic and spiking rate dynamics of spiking networks to produce a wide range of spatiotemporal activity. We apply the training method to learn arbitrary firing patterns, stabilize irregular spiking activity in a network of excitatory and inhibitory neurons respecting Dale’s law, and reproduce the heterogeneous spiking rate patterns of cortical neurons engaged in motor planning and movement. We identify sufficient conditions for successful learning, characterize two types of learning errors, and assess the network capacity. Our findings show that synapticallycoupled recurrent spiking networks possess a vast computational capability that can support the diverse activity patterns in the brain.
https://doi.org/10.7554/eLife.37124.001Introduction
Neuronal populations exhibit diverse patterns of recurrent activity that can be highly irregular or wellstructured when learning or performing a behavioral task (Churchland and Shenoy, 2007; Churchland et al., 2012; Harvey et al., 2012; Pastalkova et al., 2008; Li et al., 2015). An open questions whether learninginduced synaptic rewiring is sufficient to give rise to the wide range of spiking dynamics that encodes and processes information throughout the brain.
It has been shown that a network of recurrently connected neuron models can be trained to perform complex motor and cognitive tasks. In this approach, synaptic connections to a set of outputs are trained to generate a desired populationaveraged signal, while the activity of individual neurons within the recurrent network emerges in a selforganized way that harnesses chaotic temporally irregular activity of a network of ratebased neurons (Sompolinsky et al., 1988) that is made repeatable through direct feedback from the outputs or through training of the recurrent connections (Maass et al., 2007; Sussillo and Abbott, 2009). The resulting irregular yet stable dynamics provides a rich reservoir from which complex patterns such as motor commands can be extracted by trained output neurons (Sussillo and Abbott, 2009; Buonomano and Maass, 2009; Jaeger and Haas, 2004), and theoretical studies have shown that the network outputs are able to perform universal computations (Maass et al., 2007).
Here, we explore whether there is even a need for a set of output neurons. Instead, each unit in the recurrent network could be considered to be an output and learn target patterns directly while simultaneously serving as a reservoir. Laje and Buonomano (2013) showed that individual rate units in a recurrent network can learn to stabilize innate chaotic trajectories that an untrained network naturally generates. The trained trajectories are then utilized to accomplish timing tasks by summing their activities with trained weights. DePasquale et al. (2018) obtained a set of target trajectories from a target network driven externally by the desired network output. They showed that training individual units on such target trajectories and then adjusting the readout weights yielded better performance than an untrained random recurrent network with a trained feedback loop (i.e. ‘traditional’ FORCE learning). Rajan et al., 2016 trained a small fraction of synaptic connections in a randomly connected rate network to produce sequential activity derived from cortical neurons engaged in decision making tasks.
Although these studies demonstrate that units within a ratebased network can learn recurrent dynamics defined by specific forms of target functions, the possible repertoire of the recurrent activity that a recurrent network can learn has not been extensively explored. Moreover, extending this idea to spiking networks, where neurons communicate with time dependent spike induced synapses, poses an additional challenge because it is difficult to coordinate the spiking dynamics of many neurons, especially, if spike times are variable as in a balanced network (London et al., 2010). Some success has been achieved by training spiking networks directly with a feedback loop (Nicola and Clopath, 2017) or using a ratebased network as an intermediate step (DePasquale et al., 2016; Thalmeier et al., 2016). A different topdown approach is to build networks that emit spikes optimally to correct the discrepancy between the actual and desired network outputs (Boerlin et al., 2013; Denève and Machens, 2016). This optimal coding strategy in a tightly balanced network can be learned with a local plasticity rule (Brendel et al., 2017) and is able to generate arbitrary network output at the spike level (Bourdoukan and Deneve, 2015; Denève et al., 2017).
We show that a network of spiking neurons is capable of supporting arbitrarily complex coarsegrained recurrent dynamics provided the spatiotemporal patterns of the recurrent activity are diverse, the synaptic dynamics are fast, and the number of neurons in the network is large. We give a theoretical basis for how a network can learn and show various examples, which include stabilizing strong chaotic rate fluctuations in a network of excitatory and inhibitory neurons that respects Dale’s law and constructing a recurrent network that reproduces the spiking rate patterns of a large number of cortical neurons involved in motor planning and movement. Our study suggests that individual neurons in a recurrent network have the capability to support near universal dynamics.
Results
Spiking networks can learn complex recurrent dynamics
We considered a network of $N$ quadratic integrateandfire neurons that are recurrently connected with spikeactivated synapses weighted by a connectivity matrix $W$. We show below that our results do not depend on the spiking mechanism. We focused on two measures of coarsegrained timedependent neuron activity: (1) the synaptic drive ${u}_{i}\left(t\right)$ to neuron $i$ which is given by the $W$weighted sum of lowpass filtered incoming spike trains, and (2) the timeaveraged spiking rate ${R}_{i}\left(t\right)$ of neuron $i$. The goal was to find a weight matrix $W$ that can autonomously generate desired recurrent target dynamics when the network of spiking neurons connected by $W$ is stimulated briefly with an external stimulus (Figure 1a). The target dynamics were defined by a set of functions ${f}_{1}\left(t\right),{f}_{2}\left(t\right),\mathrm{...},{f}_{N}\left(t\right)$ on a time interval $[0,T]$. Learning of the recurrent connectivity $W$ was considered successful if ${u}_{i}\left(t\right)$ or ${R}_{i}\left(t\right)$ evoked by the stimulus matches the target functions ${f}_{i}\left(t\right)$ over the time interval $[0,T]$ for all neurons $i=1,2,\mathrm{...},N$.
Previous studies have shown that recurrently connected rate units can learn specific forms of activity patterns, such as chaotic trajectories that the initial network could already generate (Laje and Buonomano, 2013), trajectories from a target network (DePasquale et al., 2018), and sequential activity derived from imaging data (Rajan et al., 2016). Our study expanded these results in two ways: first, we trained the recurrent dynamics of spiking networks, and, second, we showed that the repertoire of recurrent dynamics that can be encoded is vast. The primary goal of our paper was to investigate the computational capability of spiking networks to generate arbitrary recurrent dynamics, therefore we neither trained the network outputs (Sussillo and Abbott, 2009; Sussillo et al., 2015; Nicola and Clopath, 2017) nor constrained the target signals to those designed for performing specific computations (DePasquale et al., 2018). We focused on training the recurrent activity as in the work of Laje and Buonomano (2013) (without the readouts) and Rajan et al., 2016, and considered arbitrary target functions. To train the activity of individual neurons within a spiking network, we extended the Recursive Least Squares (RLS) algorithm developed by Laje and Buonomano in ratebased networks (Laje and Buonomano, 2013). The algorithm was based on the FORCE algorithm (Haykin, 1996; Sussillo and Abbott, 2009), originally developed to train the network outputs by minimizing a quadratic cost function between the activity measure and the target together with a quadratic regularization term (see Materials and methods, 'Training recurrent dynamics'). Example code that trains a network of quadratic integrateandfire neurons is available at https://github.com/chrismkkim/SpikeLearning (Kim and Chow, 2018; copy archived at https://github.com/elifesciencespublications/SpikeLearning).
As a first example, we trained the network to produce synaptic drive patterns that matched a set of sine functions with random frequencies and the spiking rate to match the positive part of the same sine functions. The initial connectivity matrix had connection probability $p=0.3$ and the coupling strength was drawn from a Normal distribution with mean $0$ and standard deviation $\sigma $. Prior to training, the synaptic drive fluctuated irregularly, but as soon as the RLS algorithm was instantiated, the synaptic drives followed the target with small error; rapid changes in $W$ quickly adjusted the recurrent dynamics towards the target (Sussillo and Abbott, 2009) (Figure 1b,c). As a result, the population spike trains exhibited reproducible patterns across training trials. A brief stimulus preceded each training session to reset the network to a specific state. If the training was successful, the trained response could be elicited whenever the same stimulus was applied regardless of the network state. We were able to train a network of ratebased neurons to learn arbitrarily complex target patterns using the same learning scheme (Figure 1—figure supplement 1).
Training the spiking rate was more challenging than training the synaptic drive because small changes in recurrent connectivity did not immediately affect the spiking activity if the effect was below the spikethreshold. Therefore, the spike trains may not follow the desired spiking rate pattern during the early stage of training, and the population spike trains no longer appeared similar across training trials (Figure 1d). This was also reflected in relatively small changes in recurrent connectivity and the substantially larger number of training runs required to produce desired spiking patterns (Figure 1e). However, by only applying the training when the total input to a neuron is suprathreshold, the spiking rate could be trained to reproduce the target patterns. The correlation between the actual filtered spike trains and the target spiking rate increased gradually as the training progressed.
Previous work that trained the network readout had proposed that the initial recurrent network needed to be at the ‘edge of chaos’ to learn successfully (Bertschinger and Natschläger, 2004; Sussillo and Abbott, 2009; Abbott et al., 2016; Thalmeier et al., 2016; Nicola and Clopath, 2017). However, we found that the recurrent connectivity could learn to produce the desired recurrent dynamics regardless of the initial network dynamics and connectivity. Even when the initial network had no synaptic connections, the brief stimulus preceding the training session was sufficient to build a fully functioning recurrent connectivity that captured the target dynamics. The RLS algorithm could grow new synapses or tune existing ones as long as some of the neurons became active after the initial stimulus (Figure 1—figure supplement 2).
Learning was not limited to one set of targets; the same network was able to learn multiple sets of targets. We trained the network to follow two independent sets of targets, where each target function was a sine function with random frequency. Every neuron in the network learned both activity patterns after training, and, when stimulated with the appropriate cue, the network recapitulated the specified trained pattern of recurrent dynamics, regardless of initial activity. The synaptic drive and the spiking rate were both able to learn multiple target patterns (Figure 2).
Learning arbitrary patterns of activity
Next, we considered targets generated from various families of functions: complex periodic functions, chaotic trajectories, and OrnsteinHollenbeck (OU) noise. We randomly selected $N$ different target patterns from one of the families to create a set of heterogeneous targets, and trained the synaptic drive of a network consisting of $N$ neurons to learn the target dynamics. These examples demonstrated that recurrent activity patterns that a spiking network can generate is not limited to specific forms of patterns considered in previous studies (Laje and Buonomano, 2013; Rajan et al., 2016; DePasquale et al., 2018), but can be arbitrary functions. The successful learning suggested that single neurons embedded in a spiking network have the capability to perform universal computations.
As we will show more rigorously in the next section, we identified two sufficient conditions on the dynamical state and spatiotemporal structure of target dynamics that ensure a wide repertoire of recurrent dynamics can be learned. The first is a ‘quasistatic’ condition that stipulates that the dynamical time scale of target patterns must be slow enough compared to the synaptic time scale and average spiking rate. The second is a ‘heterogeneity’ condition that requires the spatiotemporal structure of target patterns to be diverse enough. The target patterns considered in Figure 3 had slow temporal dynamics in comparison to the synaptic time constant (${\tau}_{s}=20$ ms) and the patterns were selected randomly to promote diverse structure. After training each neuron’s synaptic drive to produce the respective target pattern, the synaptic drive of every neuron in the network followed its target.
To verify the quasistatic condition, we compared the actual to a quasistatic approximation of the spiking rate and synaptic drive. The spiking rates of neurons were approximated using the currenttorate transfer function with timedependent synaptic input, and the synaptic drive was approximated by a weighted sum of the presynaptic neurons’ spiking rates. We elicited the trained patterns over multiple trials starting at random initial conditions to calculate the trialaveraged spiking rates. The quasistatic approximations of the synaptic drive and spiking rate closely matched the actual synaptic drive (Figure 3a) and trialaveraged spiking rates (Figure 3b).
To examine how the heterogeneity of target patterns may facilitate learning, we created sets of target patterns where the fraction of randomly generated targets was varied systematically. For nonrandom targets, we used the same target pattern repeatedly. Networks trained to learn target patterns with strong heterogeneity showed that a network is able to encode target patterns with high accuracy if there is a large fraction of random targets (Figure 3c). Networks that were trained on too many repeated target patterns failed to learn. Beyond a certain fraction of random patterns, including additional patterns did not improve the performance, suggesting that the set of basis functions was overcomplete. We probed the stability of overcomplete networks under neuron loss by eliminating all the synaptic connections from a fraction of the neurons. A network was first trained to learn target outputs where all the patterns were selected randomly (i.e. fraction of random targets equals 1) tonsure that the target patterns form a set of redundant basis functions. Then, we elicited the trained patterns after removing a fraction of neurons from the network, which entails eliminating all the synaptic connections from the lost neurons. A trained network with 5% neuron loss was able to generate the trained patterns perfectly, 10% neuron loss resulted in a mild degradation of network response, and trained patterns completely disappeared after 40% neuron loss (Figure 3d).
The target dynamics considered in Figure 3 had population spiking rates of 9.1 Hz (periodic), 7.2 Hz (chaotic) and 12.1 Hz (OU) within the training window. To examine how population activity may influence learning, we trained networks to learn target patterns whose average amplitude was reduced gradually across target sets. The networks were able to learn when the population spiking rate of the target dynamics was as low as 1.5 Hz. However, the performance deteriorated as the population spiking rate decreased further (Figure 3—figure supplement 1). To demonstrate that learning does not depend on the spiking mechanism, we trained the synaptic drive of spiking networks using different neuron models. A network of leaky integrateandfire neurons, as well as a network of Izhikevich neurons whose neuron parameters were tuned to have five different firing patterns, successfully learned complex synaptic drive patterns (Figure 3—figure supplement 2).
Stabilizing rate fluctuations in a network respecting Dale’s law
A random network with balanced excitation and inhibition is a canonical model for a cortical circuit that produces asynchronous single unit activity (Sompolinsky et al., 1988; van Vreeswijk et al., 1996; Renart et al., 2010; Ostojic, 2014; Rosenbaum et al., 2017). The chaotic activity of balanced rate models (Sompolinsky et al., 1988) has been harnessed to accomplish complex tasks by including a feedback loop (Sussillo and Abbott, 2009), stabilizing chaotic trajectories (Laje and Buonomano, 2013) or introducing lowrank structure to the connectivity matrix (Mastrogiuseppe and Ostojic, 2017). Balanced spiking networks have been shown to possess similar capabilities (Thalmeier et al., 2016; DePasquale et al., 2016; Abbott et al., 2016; Nicola and Clopath, 2017; Denève and Machens, 2016), but it is unknown if it is possible to stabilize the heterogeneous fluctuations of the spiking rate in the strong coupling regime (Ostojic, 2014). Here, we extended the work of Laje and Buonomano (2013) to spiking networks and showed that strongly fluctuating single neuron activities can be turned into dynamic attractors by adjusting the recurrent connectivity.
We considered a network of randomly connected excitatory and inhibitory neurons that respected Dale’s Law. Prior to training, the synaptic and spiking activity of individual neurons showed large variations across trials because small discrepancies in the initial network state led to rapid divergence of network dynamics. When simulated with two different initial conditions, the synaptic drive to neurons deviated strongly from each other (Figure 4a), and the spiking activity of single neurons was uncorrelated across trials and the trialaveraged spiking rate had little temporal structure (Figure 4b). The network activity was also sensitive to small perturbation; the microstate of two identically prepared networks diverged rapidly if one spike was deleted from one of the networks (Figure 4c). It has been previously questioned as to whether the chaotic nature of an excitatoryinhibitory network could be utilized to perform reliable computations (London et al., 2010; Monteforte and Wolf, 2012).
As in Laje and Buonomano (2013), we sought to tame the chaotic trajectories of single neuron activities when the coupling strength is strong enough to induce large and irregular spiking rate fluctuations in time and across neurons (Ostojic, 2014). We initiated the untrained network with random initial conditions to harvest innate synaptic activity, that is a set of synaptic trajectories that the network already knows how to generate. Then, the recurrent connectivity was trained so that the synaptic drive of every neuron in the network follows the innate pattern when stimulated with an external stimulus. To respect Dale’s Law, the RLS learning rule was modified such that it did not update synaptic connections if there were changes in their signs.
After training, the synaptic drive to every neuron in the network was able to track the innate trajectories in response to the external stimulus within the trained window and diverged from the target pattern outside the trained window (Figure 4d). When the trained network was stimulated to evoke the target patterns, the trialaveraged spiking rate developed a temporal structure that was not present in the untrained network (Figure 4e). To verify the reliability of learned spiking patterns, we simulated the trained network twice with identical initial conditions but deleted one spike $200$ ms after evoking the trained response from one of the simulations. Within the trained window, the relative deviation of the microstate was markedly small in comparison to the deviation observed in the untrained network. Outside the trained window, however, two networks diverged rapidly again, which demonstrated that training the recurrent connectivity created an attracting flux tube around what used to be chaotic spike sequences (Monteforte and Wolf, 2012) (Figure 4f,g). Analyzing the eigenvalue spectrum of the recurrent connectivity revealed that the distribution of eigenvalues shifts towards zero and the spectral radius decreased as a result of training, which is consistent with the more stable network dynamics found in trained networks (Figure 4h).
To demonstrate that learning the innate trajectories works well when an excitatoryinhibitory network satisfies the quasistatic condition, we scanned the coupling strength $J$ (see Materials and methods, 'Training recurrent dynamics' for the definition) and synaptic time constant ${\tau}_{s}$ over a wide range and evaluated the accuracy of the quasistatic approximation in untrained networks. We find that increasing either $J$ or ${\tau}_{s}$ promoted strong fluctuations in spiking rates (Ostojic, 2014; Harish and Hansel, 2015), hence improving the quasistatic approximation (Figure 4i). Learning performance was correlated with adherence to the quasistatic approximation, resulting in better performance for strong coupling and long synaptic time constants.
Generating an ensemble of in vivo spiking patterns
We next investigated if the training method applied to actual spike recordings of a large number of neurons. In a previous study, a network of rate units was trained to match sequential activity imaged from posterior parietal cortex as a possible mechanism for shortterm memory (Harvey et al., 2012; Rajan et al., 2016). Here, we aimed to construct recurrent spiking networks that captured heterogeneous spiking activity of cortical neurons involved in motor planning and movement (Churchland and Shenoy, 2007; Churchland et al., 2012; Li et al., 2015).
The in vivo spiking data was obtained from the publicly available data of Li et al. (2015), where they recorded the spike trains of a large number of neurons from the anterior lateral motor cortex of mice engaged in planning and executing directed licking over multiple trials. We compiled the trialaverage spiking rate of ${N}_{\text{cor}}=227$ cortical neurons from their data set (Li et al., 2014), and trained a recurrent network model to reproduce the spiking rate patterns of all the $N}_{cor$ neurons autonomously in response to a brief external stimulus. We only trained the recurrent connectivity and did not alter single neuron dynamics or external inputs.
First, we tested if a recurrent network of size ${N}_{\text{cor}}$ was able to generate the spiking rate patterns of the same number of cortical neurons. This network model assumed that the spiking patterns of ${N}_{\text{cor}}$ cortical neurons could be selfgenerated within a recurrent network. After training, the spiking rate of neuron models captured the overall trend of the spiking rate, but not the rapid changes that may be pertinent to the short term memory and motor response (Figure 5b). We hypothesized that the discrepancy may be attributed to other sources of input to the neurons not included in the model, such as recurrent input from other neurons in the local population or input from other areas of the brain, or the neuron dynamics that cannot be captured by our neuron model. We thus sought to improve the performance by adding ${N}_{\text{aux}}$ auxiliary neurons to the recurrent network to mimic the spiking activity of unobserved neurons in the local population, and trained the recurrent connectivity of a network of size ${N}_{\text{cor}}={N}_{\text{aux}}=2$ (Figure 5a). The auxiliary neurons were trained to follow spiking rate patterns obtained from an OU process and provided heterogeneity to the overall population activity patterns. When ${N}_{\text{aux}}/{N}_{\text{cor}}\ge 2$, the spiking patterns of neuron models accurately fit that of cortical neurons (Figure 5c), and the population activity of all ${N}_{\text{cor}}$ cortical neurons was well captured by the network model (Figure 5d). The fit to cortical activity improved gradually as a function of the fraction of auxiliary neurons in the network due to increased heterogeneity in the target patterns (Figure 5e)
To verify that the cortical neurons in the network model were not simply driven by the feed forward inputs from the auxiliary neurons, we randomly shuffled a fraction of recurrent connections between cortical neurons after a successful training. The fit to cortical data deteriorated as the fraction of shuffled synaptic connections between cortical neurons was increased, which confirmed that the recurrent connections between the cortical neurons played a role in generating the spiking patterns (Figure 5f).
Sufficient conditions for learning
We can quantify the sufficient conditions the target patterns need to satisfy in order to be successfully encoded in a network. The first condition is that the dynamical time scale of both neurons and synapses must be sufficiently fast compared to the target patterns such that targets can be considered constant (quasistatic) on a short time interval. In terms of network dynamics, the quasistatic condition implies that the synaptic and neuron dynamics operate as if in a stationary state even though the stationary values change as the network activity evolves in time. In this quasistatic state, we can use a mean field description of the spiking dynamics to derive a selfconsistent equation that captures the timedependent synaptic and spiking activity of neurons (Buice and Chow, 2013; Ostojic, 2014; Brunel, 2000) (see Materials and methods, 'Mean field description of the quasistatic dynamics'). Under the quasistatic approximation, the synaptic drive satisfies
and the spiking rate ${R}_{i}=\varphi ({U}_{i}+{I}_{i})$ satisfies
where $\varphi $ is the currenttorate transfer (i.e. gain) function and ${I}_{i}$ is a constant external input.
The advantage of operating in a quasistatic state is that both measures of network activity become conducive to learning new patterns. First, Equation 1 is closed in terms of $U$, which implies that training the synaptic drive is equivalent to training a ratebased network. Second, the RLS algorithm can efficiently optimize the recurrent connectivity $W$, thanks to the linearity of Equation 1 in $W$, while the synaptic drive closely follows the target patterns as shown in Figure 1b. The spiking rate also provides a closed description of the network activity, as described in Equation 2. However, due to nonlinearity in $W$, it learns only when the total input to a neuron is suprathreshold, that is the gradient of $\varphi $ must be positive. For this reason, the learning error cannot be controlled as tightly as the synaptic drive and requires additional trials for successful learning as shown in Figure 1d.
The second condition requires the target patterns to be sufficiently heterogeneous in time and across neurons. Such complexity allows the ensemble of spiking activity to have a rich spatiotemporal structure to generate the desired activity patterns of every neuron within the network. In the perspective of ‘reservoir computing’ (Maass et al., 2002; Jaeger and Haas, 2004; Sussillo and Abbott, 2009), every neuron in a recurrent network is considered to be a readout, and, at the same time, it is part of the reservoir that is collectively used to produce desired patterns in single neurons. The heterogeneity condition is equivalent to having a set of complete (or overcomplete) basis functions, that is $\varphi ({U}_{j}+{I}_{j}),j=1,\mathrm{...},N$ in Equation 1 and ${R}_{j},j=1,\mathrm{...},N$ in Equation 2, to generate the target patterns, that is the left hand side of Equations 1 and 2. The two conditions are not necessarily independent. Heterogeneous targets also foster asynchronous spiking activity that support quasistatic dynamics.
We can illustrate the necessity of heterogeneous target functions with a simple argument. Successful learning is achieved for the synaptic drive when Equation 1 is satisfied. If we discretize time into $P$‘quasistatic’ bins then we can consider the target ${U}_{i}\left(t\right)$ as a $N\times P$ matrix that satisfies the system of equations expressed in matrix form as $U=WV$, where $V\equiv \varphi (U+I)$ is an $N\times P$ matrix. Since the elements of $W$ are the unknowns, it is convenient to consider the transpose of the matrix equation, ${U}^{T}={V}^{T}{W}^{T}$. Solving for ${W}^{T}$ is equivalent to finding ${w}_{i}$ in ${u}_{i}={V}^{T}{w}_{i}$ for $i=1,\mathrm{...},N$, where ${u}_{i}$ is a vector in $P$dimensional Euclidean space ${\mathbb{R}}^{P}$ denoting the ${i}^{\text{th}}$ column of ${U}^{T}$ (the synaptic drive of neuron $i$) and ${w}_{i}$ is an $N$dimensional vector denoting the ${i}^{\text{th}}$ column of ${W}^{T}$ (the incoming synaptic connections to neuron $i$). We also denote the column vectors of ${V}^{T}$ in $\mathbb{R}}^{P$ by ${v}_{1},\mathrm{...},{v}_{N}$ (the firing rate patterns of neurons induced by the target functions). For each $i$, the system of equations consists of $P$ equations and $N$ unknowns.
In general, the system of equations is solvable if all target functions ${u}_{i},i=1,\mathrm{...},N$ lie in the subspace spanned by ${v}_{1},\mathrm{...},{v}_{N}$. This is equivalent to stating that the target functions can be selfconsistently generated by the firing rate patterns induced by the target functions. We define target functions to be sufficiently heterogeneous if $\text{rank}\left(V\right)$ is maximal and show that this is a sufficient condition for solutions to exist. Since the span of ${v}_{1},\mathrm{...},{v}_{N}$ encompasses the largest possible subspace in ${\mathbb{R}}^{P}$ if $\text{rank}\left(V\right)$ is maximal, it is justified as a mathematical definition of sufficiently heterogeneous. In particular, if $N\ge P$ and $\text{rank}\left(V\right)$ is maximal, we have $\text{dim}\phantom{\rule{thinmathspace}{0ex}}\text{span}\left\{{\mathbf{v}}_{1},...,{\mathbf{v}}_{N}\right\}=P$, which implies that the set of firing rate vectors ${v}_{1},\mathrm{...},{v}_{N}$ fully span ${\mathbb{R}}^{P}$, of which the target vectors ${u}_{i}$ are elements; in other words, ${v}_{1},\mathrm{...},{v}_{N}$ forms an (over)complete set of basis functions of ${\mathbb{R}}^{P}$. On the other hand, if $N<P$ and $\text{rank}\left(V\right)$ is maximal, we have $\text{dim}\phantom{\rule{thinmathspace}{0ex}}\text{span}\left\{{\mathbf{v}}_{1},...,{\mathbf{v}}_{N}\right\}=N$, which implies linearly independent ${v}_{1},\mathrm{...},{v}_{N}$ can only span an $N$dimensional subspace of ${\mathbb{R}}^{P}$, but such subspace still attains the largest possible dimension.
Now we consider the solvability of ${u}_{i}={V}^{T}{w}_{i}$ when $\text{rank}\left(V\right)$ is maximal. For $N\ge P$, the set of vectors ${v}_{1},\mathrm{...},{v}_{N}$ fully span ${\mathbb{R}}^{P}$, or equivalently we can state that there are more unknowns ($N$) than independent equations ($P$), in which case the equation can always be satisfied and learning the pattern is possible. If $N$ is strictly larger than $P$ then a regularization term is required for the algorithm to converge to a specific solution out of the many possible solutions, the number of which decreases as $P$ approaches $N$. For $N<P$, on the other hand, ${v}_{1},\mathrm{...},{v}_{N}$ spans an $N$dimensional subspace of ${\mathbb{R}}^{P}$, or equivalently there will be more equations than unknowns and perfect learning is not possible. However, since $\text{rank}\left(V\right)$ is maximal, there is an approximate regression solution of the form $W=U{V}^{T}{\left(V{V}^{T}\right)}^{1}$, where the inverse of $V{V}^{T}$ exists since the set of vectors ${v}_{1},\mathrm{...},{v}_{N}$ is linearly independent.
When $\text{rank}\left(V\right)$ is not maximal, successful learning is still possible as long as all ${u}_{i},i=1,\mathrm{...},N$ lie close to the subspace spanned by ${v}_{1},\mathrm{...},{v}_{N}$. However, the success depends on the specific choice of target functions, because the dimension of the subspace spanned by ${v}_{1},\mathrm{...},{v}_{N}$ is strictly less than $P$, so whether the rows of $U$ are contained in or close to this subspace is determined by the geometry of the subspace. This shows why increasing pattern heterogeneity, which makes the columns of ${V}^{T}$ more independent and the rank higher, is beneficial for learning. Conversely, as a larger number of neurons is trained on the same target, as considered in Figure 3c, it becomes increasingly difficult to develop the target pattern ${u}_{i}$ with the limited set of basis functions ${v}_{1},\mathrm{...},{v}_{N}$.
This argument also shows why learning capability declines as $P$ increases, with a steep decline for $P>N$. If we ascribe a quasistatic bin to some fraction of the pattern correlation time then $P$ will scale with the length of the pattern temporal length. In this way, we can intuitively visualize the temporal storage capacity demonstrated below in Figure 7 through simulations.
We note that although Equations 1 and 2 describe the dynamical state in which learning works well, merely finding $W$ that satisfies one of the equations does not guarantee that a spiking network with recurrent connectivity $W$ will produce the target dynamics in a stable manner. The recurrent connectivity $W$ needs to be trained iteratively as the network dynamics unfold in time to ensure that the target dynamics is generated in a stable manner (Sussillo and Abbott, 2009). There are three aspects of the training scheme that promote stable dynamics around the target trajectories. First, the stimulus at the onset of the learning window is applied at random times so it only sets the initial network states close to each other but with some random deviations. Training with initial conditions sampled from a small region in the state space forces the trained network to be robust to the choice of initial condition, and the target dynamics can be evoked reliably. Second, various network states around the target trajectories are explored while $W$ is learning the desired dynamics. Inbetween the time points when $W$ is updated, the network states evolve freely with no constraints and can thus diverge from the desired trajectory. This allows the network to visit different network states in the neighborhood of the target trajectories during training, and the trained network becomes resistant to relatively small perturbations from the target trajectories. Third, the synaptic update rule is designed to reduce the error between the target and the ongoing network activity each time $W$ is updated. Thus, the sequential nature of the training procedure automatically induces stable dynamics by contracting trajectories toward the target throughout the entire path. In sum, robustness to initial conditions and network states around the target trajectories, together with the contractive property of the learning scheme, allow the trained network to generate the target dynamics in a stable manner.
Characterizing learning error
Learning errors can be classified into two categories. There are tracking errors, which arise because the target is not a solution of the true spiking network dynamics and sampling errors, which arise from encoding a continuous function with a finite number of spikes. We note that for a rate network, there would only be a tracking error. We quantified these learning errors as a function of the network and target time scales. The intrinsic time scale of spiking network dynamics was the synaptic decay constant ${\tau}_{s}$, and the time scale of target dynamics was the decay constant ${\tau}_{c}$ of OU noise. We used target patterns generated from OU noise since the trajectories have a predetermined time scale and their spatiotemporal patterns are sufficiently heterogeneous.
We systematically varied $\tau}_{s$ and ${\tau}_{c}$ from fast AMPAlike (~ 1 ms) to slow NMDAlike synaptic transmission (~ 100 ms) and trained the synaptic drive of networks with synaptic time scale ${\tau}_{s}$ to learn a set of OU trajectories with timescale ${\tau}_{c}$. The parameter scan revealed a learning regime, where the networks successfully encoded the target patterns, and two errordominant regimes. The tracking error was prevalent when synapses were slow in comparison to target patterns, and the sampling error dominated when the synapses were fast (Figure 6a).
A network with a synaptic decay time ${\tau}_{s}=200$ ms failed to follow rapid changes in the target patterns, but still captured the overall shape, when the target patterns had a faster time scale ${\tau}_{c}=100$ ms (Figure 6b, Tracking error). This prototypical example showed that the synaptic dynamics were not fast enough to encode the target dynamics in the tracking error regime. With a faster synapse ${\tau}_{s}=30$ ms, the synaptic drive was able to learn the identical target trajectories with high accuracy (Figure 6b, Learning). Note that although the target time scale (${\tau}_{c}=100$ ms) was significantly slower than the synaptic time scale (${\tau}_{s}=30$ ms), tuning the recurrent synaptic connections was sufficient for the network to generate slow network dynamics using fast synapses. This phenomenon was shown robustly in the learning regime in Figure 4a where learning occurred successfully for the parameters lying above the diagonal line (${\tau}_{c}>{\tau}_{s}$). When the synapse was too fast ${\tau}_{s}=5$ ms, however, the synaptic drive fluctuated around the target trajectories with high frequency (Figure 6b, Sampling error). This was a typical network response in the sampling error regime where discrete spikes with narrow width and large amplitude were summed to ‘sample’ the target synaptic activity.
To better understand how network parameters determined the learning errors, we mathematically analyzed the errors assuming that (1) target dynamics can be encoded if the quasistatic condition holds, and (2) the mean field description of the target dynamics is accurate (see Materials and methods, 'Analysis of learning error'). The learning errors were characterized as a deviation of these assumptions from the actual spiking network dynamics. We found that the tracking errors ${\u03f5}_{\text{track}}$ were substantial if the quasistatic condition was not valid, that is synapses were not fast enough for spiking networks to encode targets, and the sampling errors ${\u03f5}_{\text{sample}}$ occurred if the mean field description became inaccurate, that is discrete representation of targets in terms of spikes deviated from their continuous representation in terms of spiking rates. The errors were estimated to scale with
which implied that tracking error can be controlled as long as synapses are relatively faster than target patterns, and the sampling error can be controlled by either increasing ${\tau}_{s}$ to stretch the width of individual spikes or increasing $N$ to encode the targets with more input spikes. The error estimates revealed the versatility of recurrent spiking networks to encode arbitrary patterns since ${\u03f5}_{\text{track}}$ can be reduced by tuning ${\tau}_{s}$ to be small enough and ${\u03f5}_{\text{sample}}$ can be reduced by increasing $N$ to be large enough. In particular, target signals substantially slower than the synaptic dynamics (i.e. ${\tau}_{s}/{\tau}_{c}\ll 1$) can be encoded reliably as long as the network size is large enough to represent the slow signals with filtered spikes that have narrow widths. Such slow dynamics were also investigated in randomly connected recurrent networks when coupling is strong (Sompolinsky et al., 1988; Ostojic, 2014) and reciprocal connections are overrepresented (Martí et al., 2018).
We examined the performance of trained networks to verify if the theoretical results can explain the learning errors. The learning curve, as a function of ${\tau}_{s}$, had an inverted Ushape when both types of errors were present (Figure 6c, d). Successful learning occurred in an optimal range of ${\tau}_{s}$, and, consistent with the error analysis, the performance decreased monotonically with ${\tau}_{s}$ on the right branch due to increase in the tracking error while the performance increased monotonically with ${\tau}_{s}$ on the left branch due to decrease in the sampling error. The tracking error was reduced if target patterns were slowed down from ${\tau}_{c}=50$ ms to ${\tau}_{c}=200$ ms, hence decreased the ratio ${\tau}_{s}/{\tau}_{c}$. Then, the learning curve became sigmoidal, and the performance remained high even when ${\tau}_{s}$ was in the slow NMDA regime (Figure 6c). On the other hand, the sampling error was reduced if the network size was increased from $N=500$ to $1500$, which lifted the left branch of the learning curve (Figure 6d). Note that when two error regimes were well separated, changes in target time scale ${\tau}_{c}$ did not affect ${\u03f5}_{\text{sample}}$, and changes in network size $N$ did not affect ${\u03f5}_{\text{sample}}$, as predicted.
Finally, we condensed the training results over a wide range of target time scales in the tracking error regime (Figure 6e), and similarly condensed the training results over different network sizes in the sampling error regime (Figure 6f) to demonstrate that ${\tau}_{s}/{\tau}_{c}$ and $N{\tau}_{s}$ explained the overall performance in the tracking and sampling error regimes, respectively.
Learning capacity increases with network size
It has been shown that a recurrent rate network’s capability to encode target patterns deteriorates as a function of the length of time (Laje and Buonomano, 2013), but increase in network size can enhance its storage capacity (Jaeger, 2001; White et al., 2004; Rajan et al., 2016). Consistent with these results, we found that the performance of recurrent spiking networks to learn complex trajectories decreased with target length and improved with network size (Figure 7a).
To assess the storage capacity of spiking networks, we evaluated the maximal target length that can be encoded in a network as a function of network size. It was necessary to define the target length in terms of its ‘effective length’ to account for the fact that target patterns with the same length may have different effective length due to their temporal structures; for instance, OU noise with short temporal correlation times has more structure to be learned than a constant function. For target trajectories generated from an OU process with decay time ${\tau}_{c}$, we rescaled the target length $T$ with respect to ${\tau}_{c}$ and defined the effective length $\stackrel{~}{T}=T/{\tau}_{c}$. The capacity of a network was the maximal $\stackrel{~}{T}$ that can be successfully encoded in a network.
To estimate the maximal $\stackrel{~}{T}$, we trained networks of fixed size to learn OU trajectories while varying $T$ and ${\tau}_{c}$ (each panel in Figure 7b). Then, for each ${\tau}_{c}$, we found the maximal target length ${T}_{\text{max}}$ that can be learned successfully, and estimated the maximal $\stackrel{~}{T}$ by finding a constant $\stackrel{~}{T}}_{max$ that best fits the line $T}_{max}={\stackrel{~}{T}}_{max}{\tau}_{c$ to training results (black lines in Figure 7b). Figure 7c shows that the learning capacity $\stackrel{~}{T}}_{max$ increases monotonically with the network size.
Discussion
Our findings show that individual neurons embedded in a recurrent network can learn to produce complex activity by adjusting the recurrent synaptic connections. Most previous research on learning in recurrent neural networks focused on training the network outputs to perform useful computations and subsequently analyzed the recurrent activity in comparison with measured neuron activity (Sussillo and Abbott, 2009; Sussillo et al., 2015; Sussillo and Barak, 2013; Wang et al., 2018; Chaisangmongkon et al., 2017; Remington et al., 2018). In contrast to such outputcentric approaches, our study takes a networkcentric perspective and directly trains the activity of neurons within a network individually. Several studies have trained a ratebased network model to learn specific forms of target recurrent activity, such as innate chaotic dynamics (Laje and Buonomano, 2013), sequential activity (Rajan et al., 2016), and trajectories from a target network (DePasquale et al., 2018). In this study, we showed that the synaptic drive and spiking rate of a synapticallycoupled spiking network can be trained to follow arbitrary spatiotemporal patterns. The necessary ingredients for learning are that the spike train inputs to a neuron are weakly correlated (i.e. heterogeneous target patterns), the synapses are fast enough (i.e. small tracking error), and the network is large enough (i.e. small sampling error and large capacity). We demonstrated that (1) a network consisting of excitatory and inhibitory neurons can learn to track its strongly fluctuating innate synaptic trajectories, and (2) are current spiking network can learn to reproduce the spiking rate patterns of an ensemble of cortical neurons involved in motor planning and movement.
Our scheme works because the network quickly enters a quasistatic state where the instantaneous firing rate of a neuron is a fixed function of the inputs (Figure 3a, b; Equations 1, 2). Learning fails if the synaptic time scale is slow compared to the time scale of the target, in which case the quasistatic condition is violated and the tracking error becomes large. There is a tradeoff between tracking error and sampling noise; fast synapse can decrease the tracking error, but it also increases the sampling noise. Increasing the network size can decrease sampling noise without affecting the tracking error (Figure 6e, f; Equation 3). Therefore, analysis of learning error and simulations suggest that it is possible to learn arbitrarily complex recurrent dynamics by adjusting the synaptic time scale and network size.
An important structural property of our network model is that the synaptic inputs are summed linearly, which allows the synaptic activity to be trained using a recursive form of linear regression (Sussillo and Abbott, 2009; Equation 6). Linear summation of synaptic inputs is a standard assumption for many spiking network models (van Vreeswijk et al., 1996; Renart et al., 2010; Brunel, 2000; Wang and Buzsáki, 1996; Rosenbaum et al., 2017) and there is physiological evidence that linear summation is prevalent (Cash and Yuste, 1998; Cash and Yuste, 1999). Training the spiking rate, on the other hand, cannot take full advantage of the linear synapse due to the nonlinear currenttotransfer function (Figure 1d, e; Equation 2). The network is capable of following a wide repertoire of patterns because even though the network dynamics are highly nonlinear, the system effectively reduces to a linear system for learning. Moreover, learning capacity can be estimated using a simple solvability condition for a linear system. However, nonlinear dendritic processing has been widely observed (Gasparini and Magee, 2006; Nevian et al., 2007) and may have computational consequences (Memmesheimer, 2010; Memmesheimer and Timme, 2012; Thalmeier et al., 2016). It requires further investigation to find out whether a recurrent network with nonlinear synapses can be trained to learn arbitrary recurrent dynamics.
We note that our learning scheme does not train precise spike times; it either trains the spiking rate or the synaptic drive. The stimulus at the onset of the learning window attempts to set the network to a specific state, but due to the variability of the initial conditions the network states can only be set approximately close to each other across trials. Because of this discrepancy in network states at the onset, the spike times are not aligned precisely across trials. Hence, our learning scheme supports rate coding as opposed to spike coding. However, spike trains that have temporally irregular structure across neurons actually enhance the rate coding scheme by providing sufficient computational complexity to encode the target dynamics (Results, 'Sufficient conditions for learning'). In fact, all neurons in the network can be trained to follow the same target patterns as long as there is sufficient heterogeneity, for example noisy external input, and the neuron time constant is fast enough (Figure 3—figure supplement 3). We also note that the same learning scheme can also be used to train the recurrent dynamics of ratebased networks (Figure 1—figure supplement 1). In fact, the learning is more efficient in a rate network since there is no sampling error to avoid.
The RLS algorithm, as demonstrated in this and other studies (Sussillo and Abbott, 2009; Sussillo et al., 2015; Laje and Buonomano, 2013; Rajan et al., 2016; DePasquale et al., 2018; Wang et al., 2018), successfully generates desired outputs in a stable manner because the synaptic update rule contracts the network activity towards the target output, and the synaptic connections are adjusted while the network explores various states around the target trajectories. It would be interesting to examine more rigorously how such an iterative learning scheme turns a set of arbitrary functions into dynamic attractors to which the network dynamics converge transiently. Recent studies investigated how stable dynamics emerge when the readouts of a ratebased network are trained to learn fixed points or continuous values (Rivkind and Barak, 2017; Beer and Barak, 2018). In addition, previous studies have investigated the mathematical relationship between the patterns of stored fixed points and the recurrent connectivity in simple network models (Curto et al., 2013; Brunel, 2016).
Although our results demonstrated that recurrent spiking networks have the capability to generate a wide range of repertoire of recurrent dynamics, it is unlikely that a biological network is using this particular learning scheme. The learning rule derived from recursive least squares algorithm is very effective but is nonlocal in time, that is it uses the activity of all presynaptic neurons within the train time window to update synaptic weights. Moreover, each neuron in the network is assigned with a target signal and the synaptic connections are updated at a fast time scale as the error function is computed in a supervised manner. It would be of interest to find out whether more biologically plausible learning schemes, such as rewardbased learning (Fiete and Seung, 2006; Hoerzer et al., 2014; Miconi, 2017) can lead to similar performance.
Materials and methods
Network of spiking neurons
We considered a network of $N$ randomly and sparsely connected quadratic integrateandfire neurons given by
where ${v}_{i}$ is a dimensionless variable representing membrane potential, ${I}_{i}\left(t\right)$ is an applied input, ${u}_{i}\left(t\right)$ is the total synaptic drive the neuron receives from other neurons in the recurrent network, and $\tau =10$ ms is a neuron time constant. The threshold to spiking is zero input. For negative total input, the neuron is at rest and for positive input, ${v}_{i}$ will go to infinity or ‘blow up’ in finite time from any initial condition. The neuron is considered to spike at ${v}_{i}=\infty $ whereupon it is reset to $\infty $ (Ermentrout, 1996; Latham et al., 2000).
To simulate the dynamics of quadratic integrateandfire neurons, we used its phase representation, that is theta neuron model, that can be derived by a simple change of variables, ${v}_{i}=\mathrm{tan}\left({\theta}_{i}/2\right)$; its dynamics are governed by
where a spike is emitted when $\theta \left(t\right)=\pi $. The synaptic drive to a neuron obeys
where ${s}_{j}\left(t\right)={\sum}_{{t}_{j}^{k}<t}\text{}\delta \left(t{t}_{j}^{k}\right)$ is the spike train neuron $j$ generates up to time $t$, and ${\tau}_{s}$ is a synaptic time constant.
The recurrent connectivity ${W}_{ij}$ describes the synaptic coupling from neuron $j$ to neuron $i$. It can be any real matrix but in many of the simulations we use a random matrix with connection probability $p$, and the coupling strength of nonzero elements is modeled differently for different figures.
Training recurrent dynamics
To train the synaptic and spiking rate dynamics of individual neurons, it is more convenient to divide the synaptic drive Equation 6 into two parts; one that isolates the spike train of single neuron and computes its synaptic filtering
and the other that combines all the presynaptic neurons’ spiking activity and computes the synaptic drive
The synaptic drive ${u}_{i}$ and the filtered spike train ${r}_{i}$ are two measures of spiking activity that have been trained in this study. Note that Equations 7 and 8 generate synaptic dynamics that are equivalent to Equation 6.
Training procedure
We select $N$ target trajectories ${f}_{1}\left(t\right),\mathrm{...},{f}_{N}\left(t\right)$ of length $T$ ms for a recurrent network consisting of $N$ neurons. We train either the synaptic drive or spiking rate of individual neuron $i$ to follow the target ${f}_{i}\left(t\right)$ over time interval $[0,T]$ for all $i=1,\mathrm{...},N$. External stimulus ${I}_{i}$ with amplitude sampled uniformly from $[1,1]$ is applied to neuron $i$ for all $i=1,2,\mathrm{...},N$ for 100 ms immediately preceding the training to situate the network at a specific state. During training, the recurrent connectivity $W$ is updated every $\Delta t$ ms using a learning rule described below in order to steer the network dynamics toward the target dynamics. The training is repeated multiple times until changes in the recurrent connectivity stabilize.
Training synaptic drive
Recent studies extended the RLS learning (also known as FORCE methods) developed in rate networks (Sussillo and Abbott, 2009) either directly (Nicola and Clopath, 2017) or indirectly using rate networks as an intermediate step (DePasquale et al., 2016; Abbott et al., 2016; Thalmeier et al., 2016) to train the output of spiking networks. Our learning rule uses the RLS learning but is different from previous studies in that (a) it trains the activity of individual neurons within a spiking network and (b) neurons are trained directly by adjusting the recurrent synaptic connections without using any intermediate networks. We modified the learning rule developed by Laje and Buonomano in a network of rate units (Laje and Buonomano, 2013) and also provided mathematical derivation of the learning rules for both the synaptic drive and spiking rates (see Materials and methods, 'Derivation of synaptic learning rules' for details).
When learning the synaptic drive patterns, the objective is to find recurrent connectivity $W$ that minimizes the cost function
which measures the meansquare error between the targets and the synaptic drive over the time interval $[0,T]$ plus a quadratic regularization term. To derive the learning rule, we use Equation 8 to express $u$ as a function of $W$, view the synaptic connections ${W}_{i1},\mathrm{...},{W}_{iN}$ to neuron $i$ to be the readout weights that determine the synaptic drive ${u}_{i}$, and apply the learning rule to the row vectors of $W$. To keep the recurrent connectivity sparse, learning occurs only on synaptic connections that are nonzero prior to training.
Let ${\mathit{w}}_{i}\left(t\right)$ be the reduced row vector of $W\left(t\right)$ consisting of elements that have nonzero connections to neuron $i$ prior to training. Similarly, let ${\mathit{r}}_{i}\left(t\right)$ be a (column) vector of filtered spikes of presynaptic neurons that have nonzero connections to neuron $i$. The synaptic update to neuron $i$ is
where the error term is
and the inverse of the correlation matrix of filtered spike trains is
Finally, $W\left(t\right)$ is obtained by concatenating the row vectors ${\mathit{w}}_{i}\left(t\right),i=1,...,N$.
Training spiking rate
To train the spiking rate of neurons, we approximate the spike train ${s}_{i}\left(t\right)$ of neuron $i$ with its spiking rate $\varphi \left({u}_{i}\right(t)+{I}_{i})$ where $\varphi $ is the currenttorate transfer function of theta neuron model. For constant input,
and for noisy input
Since ${\varphi}_{2}$ is a good approximation of ${\varphi}_{1}$ and has a smooth transition around $x=0$, we used $\varphi \equiv {\varphi}_{2}$ with $c=0.1$ (Brunel and Latham, 2003). The objective is to find recurrent connectivity $W$ that minimizes the cost function
If we define ${w}_{i}$ and ${r}_{i}$ as before, we can derive the following synaptic update to neuron $i$
where the error term is
and
(see Materials and methods, 'Derivation of synaptic learning rules' for details). Note that the nonlinear effects of the transfer function is included in
which scales the spiking activity of neuron $i$ by its gain function ${\varphi}^{\prime}$.
As before, $W\left(t\right)$ is obtained by concatenating the row vectors ${\mathit{r}}_{i}\left(t\right),i=1,...,N$.
Simulation parameters
Figure 1
A network of $N=200$ neurons was connected randomly with probability $p=0.3$ and the coupling strength was drawn from a Normal distribution with mean 0 and standard deviation $\sigma /\sqrt{Np}$ with $\sigma =4$. In addition, the average of all nonzero synaptic connections to a neuron was subtracted from the connections to the neuron such that the summed coupling strength was precisely zero. Networks with balanced excitatory and inhibitory connections produced highly fluctuating synaptic and spiking activity in all neurons. The synaptic decay time was ${\tau}_{s}=20$ ms.
The target functions for the synaptic drive (Figure 1b) were sine waves $f\left(t\right)=A\mathrm{sin}\left(2\pi \left(t{T}_{0}\right)/{T}_{1}\right)$ where the amplitude $A$, initial phase ${T}_{0}$, and period ${T}_{1}$ were sampled uniformly from $[0.5,1.5]$, $[0,1000\mathrm{m}\mathrm{s}]$ and $[300\mathrm{m}\mathrm{s},1000\mathrm{m}\mathrm{s}]$, respectively. We generated $N$ distinct target functions of length $T=1000$ ms. The target functions for the spiking rate (Figure 1d) were ${\pi}^{1}\sqrt{{\left[f\right(t\left)\right]}_{+}}$ where $f\left(t\right)$ were the same synaptic drive patterns that have been generated.
Immediately before each training loop, every neuron was stimulated for 50 ms with constant external stimulus that had random amplitude sampled from $[1,1]$. The same external stimulus was used across training loops. The recurrent connectivity was updated every $\Delta t=2$ ms during training using the learning rule derived from RLS algorithm and the learning rate was $\lambda =1$. After training, the network was stimulated with the external stimulus to evoke the trained patterns. The performance was measured by calculating the average Pearson correlation between target functions and the evoked network response.
Figure 2
The initial network and target functions were generated as in Figure 1 using the same parameters, but now the target functions consisted of two sets of $N$ sine waves. To learn two sets of target patterns, the training loops alternated between two patterns, and immediately before each training loop, every neuron was stimulated for 50 ms with constant external stimuli that had random amplitudes, using a different stimulus for each pattern. Each target pattern was trained for 100 loops (i.e. total 200 training loops), synaptic update was every $\Delta t=2$ ms, and the learning rate was $\lambda =10$. To evoke one of the target patterns after training, the network was stimulated with the external stimulus that was used to train that target pattern.
The network consisted of $N=500$ neurons. The initial connectivity was sparsely connected with connection probability $p=0.3$ and coupling strength was sampled from a Normal distribution with mean $0$ and standard deviation $\sigma /\sqrt{Np}$ with $\sigma =1$. The synaptic decay time was ${\tau}_{s}=20$ ms.
We considered three families of target functions with length $T=1000$ ms. The complex periodic functions were defined as a product of two sine waves $f\left(t\right)=A\phantom{\rule{thinmathspace}{0ex}}\mathrm{s}\mathrm{i}\mathrm{n}\left(2\pi \left(t{T}_{0}\right)/{T}_{1}\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{s}\mathrm{i}\mathrm{n}\left(2\pi \left(t{T}_{0}\right)/{T}_{2}\right)$ where $A$, ${T}_{0}$, ${T}_{1}$ and ${T}_{2}$ were sampled randomly from intervals $[0.5,1.5]$, $[0,1000\text{\hspace{0.17em}ms}]$, $[500\text{\hspace{0.17em}ms},1000\text{\hspace{0.17em}ms}]$, and $[100\text{\hspace{0.17em}ms},500\text{\hspace{0.17em}ms}]$, respectively. The chaotic rate activity was generated from a network of $N$ randomly connected rate units, $\tau {\dot{x}}_{i}={x}_{i}+\sum _{j=1}^{N}\text{}{M}_{ij}h\left({x}_{j}\right)$ where $\tau =40$ ms, $h\left(x\right)={\pi}^{1}\sqrt{{\left[x\right]}_{+}}$ and ${M}_{ij}$ is nonzero with probability $p=0.3$ and is drawn from Gaussian distribution with mean zero and standard deviation $g/\sqrt{Np}$ with $g=5$. The OrnsteinUlenbeck process was obtained by simulating, ${\tau}_{c}\dot{x}=x+s\xi \left(t\right)$, $N$ times with random initial conditions and different realizations of the white noise $\xi \left(t\right)$ satisfying $\u27e8\xi \u27e9=0$ and $\u27e8\xi \left(t\right)\xi \left({t}^{\mathrm{\prime}}\right)\u27e9=\delta \left(t{t}^{\mathrm{\prime}}\right)$. The decay time constant was ${\tau}_{c}=200$ ms, and the amplitude of target function was determined by $s=0.3$.
The recurrent connectivity was updated every $\Delta t=2$ ms during training, the learning rate was $\lambda =1$, and the training loop was repeated 30 times.
Figure 4
A balanced network had two populations where the excitatory population consisted of $(1f)N$ neurons and the inhibitory population consisted of $f\phantom{\rule{thinmathspace}{0ex}}N$ neurons with ratio $f=0.2$ and network size $N=1000$. Each neuron received $p(1f)N$ excitatory connections with strength $J$ and $p\phantom{\rule{thinmathspace}{0ex}}f\phantom{\rule{thinmathspace}{0ex}}N$ inhibitory connections with strength $gJ$ from randomly selected excitatory and inhibitory neurons. The connection probability was set to $p=0.1$ to have sparse connectivity. The relative strength of inhibition to excitation $g$ was set to $5$ so that the network was inhibition dominant (Brunel, 2000). In Figure 4a–h, the initial coupling strength $J=6$ and synaptic decay time ${\tau}_{s}=60$ ms were adjusted to be large enough, so that the synaptic drive and spiking rate of individual neurons fluctuated strongly and slowly prior to training.
After running the initial network that started at random initial conditions for 3 s, we recorded the synaptic drive of all neurons for 2 s to harvest target trajectories that are innate to the balanced network. Then, the synaptic drive was trained to learn the innate trajectories, where synaptic update occurred every $10$ ms, learning rate was $\lambda =10$ and training loop was repeated 40 times. To respect Dale’s Law while training the network, we did not modify the synaptic connections if the synaptic update reversed the sign of original connections, either from excitatory to inhibitory or from inhibitory to excitatory. Moreover, the synaptic connections that attempted to change their signs were excluded in subsequent trainings. In bf Figure 4h, the initial and trained connectivity matrices were normalized by a factor $\sqrt{\left[\right(1f){J}^{2}+f{\left(gJ\right)}^{2}](1p)}$ so that the spectral radius of the initial connectivity matrix is approximately 1, then we plotted the eigenvalue spectrum of the normalized matrices.
In Figure 4i, the coupling strength $J$ was scanned from 1 to 6 in increments of 0.25, and the synaptic decay time ${\tau}_{s}$ was scanned from 5 ms to 100 ms in increments of 5 ms. To measure the accuracy of quasistatic approximation in untrained networks, we simulated the network dynamics for each pair of $J$ and ${\tau}_{s}$, then calculated the average Person correlation between the predicted synaptic drive (Equation 1) and the actual synaptic drive. To measure the performance of trained networks, we repeated the training 10 times using different initial network configurations and innate trajectories, and calculated the Pearson correlation between the innate trajectories and the evoked network response for all 10 trainings. The heat map shows the best performance out of 10 trainings for each pair, $J$ and ${\tau}_{s}$.
Figure 5
The initial connectivity was sparsely connected with connection probability $p=0.3$ and the coupling strength was sampled from a Normal distribution with mean $0$ and standard deviation $\sigma /\sqrt{Np}$ with $\sigma =1$. The synaptic decay time was ${\tau}_{s}=50$ ms. There were in total $N$ neurons in the network model, of which ${N}_{\text{cor}}$ neurons, called cortical neurons, were trained to learn the spiking rate patterns of cortical neurons, and ${N}_{\text{aux}}$ neurons, called auxiliary neurons, were trained to learn trajectories generated from OU process.
We used the trialaveraged spiking rates of neurons recorded in the anterior lateral motor cortex of mice engaged in motor planning and movement that lasted 4600 ms (Li et al., 2015). The data was available from the website CRCNS.ORG (Li et al., 2014). We selected ${N}_{\text{cor}}=227$ neurons from the data set, whose average spiking rate during the behavioral task was greater than $5$ Hz. Each cortical neuron in the network model was trained to learn the spiking rate pattern of one of the real cortical neurons.
To generate target rate functions for the auxiliary neurons, we simulated an OU process, ${\tau}_{c}\dot{x}\left(t\right)=x\left(t\right)+s\xi \left(t\right)$, with ${\tau}_{c}=800$ ms and $s=0.1$, then converted into spiking rate $\varphi \left({\left[x\right(t\left)\right]}_{+}\right)$ and lowpass filtered with decay time ${\tau}_{s}$ to make it smooth. Each auxiliary neuron was trained on 4600 mslong target rate function that was generated with a random initial condition.
Figure 6 and 7
Networks consisting of $N=500$ neurons with no initial connections and synaptic decay time ${\tau}_{s}$ were trained to learn OU process with decay time ${\tau}_{c}$ and length $T$. In Figure 6, target length was fixed to $T=1000$ ms while the time constants ${\tau}_{s}$ and ${\tau}_{c}$ were varied systematically from ${10}^{0}$ ms to $5\cdot {10}^{2}$ ms in logscale. The trainings were repeated five times for each pair of ${\tau}_{s}$ and ${\tau}_{c}$ to find the average performance. In Figure 7, the synaptic decay time was fixed to ${\tau}_{s}=20$ ms and $T$ was scanned from 250 ms to 5000 ms in increments of $250$ ms, ${\tau}_{c}$ was scanned from 25 ms to 500 ms in increments of 25 ms, and $N$ was scanned from 500 to 1000 in increments of 50.
To ensure that the network connectivity after training is sparse, synaptic learning occurred only on connections that were randomly selected with probability $p=0.3$ prior to training. Recurrent connectivity was updated every $\Delta t=2$ ms during training, learning rate was $\lambda =1$, and training loop was repeated 30 times. The average Pearson correlation between the target functions and the evoked synaptic activity was calculated to measure the network performance after training.
Derivation of synaptic learning rules
Here, we derive the synaptic update rules for the synaptic drive and spiking rate trainings, Equations 10 and 16. We use RLS algorithm (Haykin, 1996) to learn target functions ${f}_{i}\left(t\right),i=1,2,\mathrm{...},N$ defined on a time interval $[0,T]$, and the synaptic update occurs at evenly spaced time points, $0={t}^{0}\le {t}^{1}\mathrm{...}\le {t}^{K}=T$.
In the following derivation, superscript $k$ on a variable ${X}_{i}^{k}$ implies that $X$ is evaluated at ${t}^{k}$, and the subscript $i$ implies that $X$ pertains to neuron $i$.
Training synaptic drive
The cost function measures the discrepancy between the target functions ${f}_{i}\left(t\right)$ and the synaptic drive ${u}_{i}\left(t\right)$ for all $i=1,\mathrm{...},N$ at discrete time points ${t}_{0},\mathrm{...},{t}_{K}$,
The Recursive Least Squares (RLS) algorithm solves the problem iteratively by finding a solution ${W}^{n}$ to Equation 20 at ${t}^{n}$ and updating the solution at next time step ${t}^{n+1}$. We do not directly find the entire matrix ${W}^{n}$, but find each row of ${W}^{n}$, that is synaptic connections to each neuron $i$ that minimize the discrepancy between ${u}_{i}$ and ${f}_{i}$, then simply combine them to obtain ${W}^{n}$.
To find the ${i}^{th}$ row of ${W}^{n}$, we denote it by ${w}_{i}^{n}$ and rewrite the cost function for neuron $i$ that evaluates the discrepancy between ${f}_{i}\left(t\right)$ and ${u}_{i}\left(t\right)$ on a time interval $[0,{t}^{n}]$,
Calculating the gradient and setting it to 0, we obtain
We express the equation concisely as follows.
To find ${w}_{i}^{n}$ iteratively, we rewrite Equation 22 up to ${t}^{n1}$,
and subtract Equations 23 and 24 to obtain
The update rule for ${w}_{i}^{n}$ is then given by
where the error term is
The matrix inverse $P}^{n}={\left[{R}^{n}+\lambda I\right]}^{1$ can be computed iteratively
using the matrix identity
Training spiking rate
To train the spiking rate of neurons, we approximate the spike train ${s}_{i}\left(t\right)$ of neuron $i$ with its spiking rate $\varphi \left({u}_{i}\right(t)+{I}_{i})$ where $\varphi$ is the currenttorate transfer function of theta neuron model. For constant input,
and for noisy input
Since ${\varphi}_{2}$ is a good approximation of ${\varphi}_{1}$ and has a smooth transition around $x=0$, we used $\varphi \equiv {\varphi}_{2}$ with $c=0.1$ (Brunel and Latham, 2003).
If the synaptic update occurs at discrete time points, $t}^{0},...,{t}^{K$, the objective is to find recurrent connectivity $W$ that minimizes the cost function
As in training the synaptic drive, we optimize the following cost function to train each row of ${W}^{n}$ that evaluates the discrepancy between the spiking rate of neuron $i$ and the target spiking rate $f}_{i$ over a time interval $[0,{t}^{n}]$,
Calculating the gradient and setting it to zero, we obtain
where
is the vector of filtered spike trains scaled by the gain of neuron $i$. Note that when evaluating $\varphi}^{\mathrm{\prime}$ in Equation 32, we use the approximation $u}_{i}^{k}\approx {\mathit{w}}_{i}^{n}\cdot {\mathit{r}}^{k$ to avoid introducing nonlinear functions of $\mathit{w}}_{i}^{n$.
To find an update rule for ${w}_{i}^{n}$, we rewrite Equation 31 up to ${t}_{n1}$,
and subtract Equations 31 and 33 and obtain
Since ${w}_{i}^{n1}$ is updated by small increment, we can approximate the first line in Equation 34,
where we use the approximation $u}_{i}^{k}\approx {w}_{i}^{n}\cdot {r}^{k$ as before to evaluate the derivative ${\varphi}^{\prime}$. Substituting Equation 35 to Equation 34, we obtain the update rule
where the error is
and the correlation matrix of the normalized spiking activity is
As shown above, the matrix inverse $P}^{n}={\left[{R}^{n}+\lambda I\right]}^{1$ can be computed iteratively,
Mean field description of the quasistatic dynamics
We say that a network is in a quasistatic state if the synaptic drive to a neuron changes sufficiently slower than the dynamical time scale of neurons and synapses. Here, we use a formalism developed by Buice and Chow (2013) and derive Equations 1 and 2, which provide a mean field description of the synaptic and spiking rate dynamics of neurons in the quasistatic state.
First, we recast single neuron dynamic Equation 5 in terms of the empirical distribution of neuron’s phase ${\eta}_{i}\left(\theta ,t\right)=\delta \left({\theta}_{i}\left(t\right)\theta \right)$. Since the number of neurons in the network is conserved, we can write the Klimontovich equation for the phase distribution
where $F\left(\theta ,I\right)=1cos\theta +I\left(1+cos\theta \right)$. The synaptic drive Equation 6 can be written in the form
since $s}_{j}\left(t\right)={\eta}_{j}\left(\pi ,t\right)\dot{\theta}{}_{{\theta}_{j}=\pi$ and ${\dot{\theta}}_{j}{}_{{\theta}_{j}=\pi}=2$ for a theta neuron model. Equation 39, together with Equation 40, fully describes the network dynamics.
Next, to obtain a mean field description of the spiking dynamics, we take the ensemble average prepared with different initial conditions and ignore the contribution of higher order moments resulting from nonlinear terms $\u27e8{u}_{i}{\eta}_{i}\u27e9$. Then we obtain the mean field equation
where $\u27e8{u}_{i}\u27e9={U}_{i}$ and $\u27e8{\eta}_{i}\u27e9={\rho}_{i}$. We note that the mean field Equations 41 and 42 provide a good description of the trained network dynamics because $W$ learns over repeatedly training trials and starting at random initial conditions, to minimize the error between target trajectories and actual neuron activity.
Now, we assume that the temporal dynamics of synaptic drive and neuron phase can be suppressed in the quasistatic state,
Substituting Equation 43 to Equation 41, but allowing $U\left(t\right)$ to be timedependent, we obtain the quasistatic solution of phase density
the currenttorate transfer function of a theta neuron model. Substituting Equation 43 and Equation 45 to Equation 42, we obtain a quasistatic solution of the synaptic drive
If we define the spiking rate of a neuron as ${R}_{i}\left(t\right)=\varphi \left({U}_{i}+{I}_{i}\right)$, we immediately obtain
Analysis of learning error
In this section, we identify and analyze two types of learning errors, assuming that for sufficiently heterogeneous targets, (1) the learning rule finds a recurrent connectivity $W$ that can generate target patterns if the quasistatic condition holds, and (2) the mean field description of the spiking network dynamics is accurate due to the error function and repeated training trials. These assumptions imply that Equations 46 and 47 hold for the target patterns ${U}_{i}\left(t\right)$ and the trained $W$. We show that learning errors arise when our assumptions become inaccurate, hence the network dynamics described by Equations 46 and 47 deviate from the actual spiking network dynamics. As we will see, tracking error is prevalent if the target is not an exact solution of the mean field dynamics (i.e. quasistatic approximation fails), and the sampling error dominates if the discrete spikes do not accurately represent continuous targets (i.e. mean field approximation fails).
Suppose we are trying to learn a target $\hat{u}}_{i$ which obeys an OrnsteinUlenbeck process
on a time interval $0<t<T$ where ${\xi}_{i}\left(t\right)$ are independent white noise with zero mean and variance $\sigma}^{2$. The time constant $\tau}_{c$ determines the temporal correlation of a target trajectory. In order for perfect training, the target dynamics (Equation 48) needs to be compatible with the network dynamics (Equation 6); in other words, there must exist a recurrent connectivity $W$ such that the following equation
obtained by substituting the solution of Equation 48 into Equation 6 must hold for $0<t<T$. Here, $s\left[{\hat{u}}_{j}\left(t\right)\right]$ maps the synaptic drive ${\hat{u}}_{j}\left(t\right)$ to the entire spike train ${s}_{j}\left(t\right)$.
It is very difficult to find $W$ that may solve Equation 49 exactly since it requires fully understanding the solution space of a highdimensional system of nonlinear ordinary differential equations. Instead, we assume that the target patterns are quasistatic and the learning rule finds a recurrent connectivity $W$ that satisfies
We then substitute Equation 50 to Equation 49 to estimate how the quasistatic mean field dynamics deviate from the actual spiking network dynamics. A straightforward calculation shows that
where we define the tracking and sampling errors as
and
on the time interval $0<t<T$.
Tracking error
From its definition, $\u03f5}_{\text{track}$ captures the deviation of the quasistatic solution (Equation 50) from the exact solution of the mean field description obtained when ${\u03f5}_{\text{sample}}=0$. $\u03f5}_{\text{track}$ becomes large if the quasistatic condition (Equation 43) fails and, in such network state, the synaptic dynamic is not able to ‘track’ the target patterns, thus learning is obstructed. In the following, we estimate $\u03f5}_{\text{track}$ in terms of two time scales ${\tau}_{s}$ and ${\tau}_{c}$.
First, we take the Fourier transform of Equation 52 and obtain
Next, normalize $F\left[{\u03f5}_{\text{track}}\right]$ with respect to $F\left[\hat{u}\right]$ to estimate the tracking error for target patterns with different amplitudes, then compute the power of normalized tracking error.
where ${\mathrm{\Omega}}_{c}=1/\left(2\pi {\tau}_{c}\right)$ is the cutoff frequency of the power spectrum of a Gaussian process, ${S}_{GP}\left(\omega \right)={\sigma}^{2}{\tau}_{c}^{2}/\left(1+4{\pi}^{2}{\tau}_{c}^{2}\omega \right)$. Thus, the tracking error scales with $\tau}_{s}/{\tau}_{c$.
Sampling error
$\u03f5}_{\text{sample}$ captures how the actual representation of target patterns in terms of spikes deviates from their continuous representation in terms of rate functions. In the following, we estimate $\u03f5}_{extsample$ in terms of ${\tau}_{s}$ and $N$ under the assumption that the continuous representation provides an accurate description of the target patterns.
We lowpass filtered $\u03f5}_{\text{sample}$ to estimate the sampling error since the synaptic drive (i.e. the target variable in this estimate) is a $W$ weighted sum of filtered spikes with width that scales with ${\tau}_{s}$. If the spike trains of neurons are uncorrelated (i.e. cross product terms are negligible),
where ${r}_{j}\left(t\right)$ is the filtered spike train and ${\overline{r}}_{j}=\u27e8{r}_{j}\left(t\right)\u27e9=\frac{1}{\mathrm{\Delta}t}{\int}_{{t}_{k}}^{{t}_{k+1}}\text{}{r}_{j}\left(s\right)ds$ is the empirical estimate of mean spiking rate on a short time interval.
First, we calculate the fluctuation of filtered spike trains under the assumption that a neuron generates spikes sparsely, hence the filtered spikes are nonoverlapping. Let ${s}_{j}\left(t\right)=\sum _{k}\text{}\delta \left(t{t}_{j}^{k}\right)$ be a spike train of neuron $j$ and the filtered spike train ${r}_{j}\left(t\right)=\frac{1}{{\tau}_{s}}\sum _{k}\text{}\mathrm{e}\mathrm{x}\mathrm{p}\left(\left(t{t}_{j}^{k}\right)/{\tau}_{s}\right)H\left(t{t}_{j}^{k}\right)$. Then, the rate fluctuation of neuron $j$ is
where $k$ is summed over the average number of spikes, ${\overline{r}}_{j}\mathrm{\Delta}t$, generated in the time interval of length $\Delta t$.
Next, to estimate the effect of network size on the sampling error, we examined Equation 50 and observed that $O\left(W\right)\sim 1/N$. This follows from that, for predetermined target patterns, $O\left(U\right),O\left(\varphi \left(U\right)\right)\sim 1$ regardless of the network size, hence $O\left(W\right)$ must scale with $1/N$ in order for both sides of the equation to be compatible. If the network is dense, that is the number of synaptic connections to a neuron is $pN$ on average, then the sampling error scales as follows.
References

1
Building functional networks of spiking model neuronsNature Neuroscience 19:350–355.https://doi.org/10.1038/nn.4241
 2

3
Realtime computation at the edge of chaos in recurrent neural networksNeural Computation 16:1413–1436.https://doi.org/10.1162/089976604323057443

4
Predictive coding of dynamical variables in balanced spiking networksPLoS Computational Biology 9:e1003258.https://doi.org/10.1371/journal.pcbi.1003258

5
Enforcing balance allows local supervised learning in spiking recurrent networksAdvances in Neural Information Processing SystemsAdvances in Neural Information Processing Systems 28 (NIPS 2015). pp. 982–990.
 6

7
Dynamics of sparsely connected networks of excitatory and inhibitory spiking neuronsJournal of Computational Neuroscience 8:183–208.https://doi.org/10.1023/A:1008925309027

8
Firing rate of the noisy quadratic integrateandfire neuronNeural Computation 15:2281–2306.https://doi.org/10.1162/089976603322362365

9
Is cortical connectivity optimized for storing information?Nature Neuroscience 19:749–755.https://doi.org/10.1038/nn.4286

10
Dynamic finite size effects in spiking neural networksPLoS Computational Biology 9:e1002872.https://doi.org/10.1371/journal.pcbi.1002872

11
Statedependent computations: spatiotemporal processing in cortical networksNature Reviews Neuroscience 10:113–125.https://doi.org/10.1038/nrn2558

12
Input summation by cultured pyramidal neurons is linear and PositionIndependentThe Journal of Neuroscience 18:10–15.https://doi.org/10.1523/JNEUROSCI.180100010.1998
 13
 14

15
Temporal complexity and heterogeneity of singleneuron activity in premotor and motor cortexJournal of Neurophysiology 97:4235–4257.https://doi.org/10.1152/jn.00095.2007
 16

17
Encoding binary neural codes in networks of thresholdlinear neuronsNeural Computation 25:2858–2903.https://doi.org/10.1162/NECO_a_00504

18
Efficient codes and balanced networksNature Neuroscience 19:375–382.https://doi.org/10.1038/nn.4243
 19
 20
 21

22
Type I membranes, phase resetting curves, and synchronyNeural Computation 8:979–1001.https://doi.org/10.1162/neco.1996.8.5.979

23
Gradient learning in spiking neural networks by dynamic perturbation of conductancesPhysical Review Letters 97:e048104.https://doi.org/10.1103/PhysRevLett.97.048104

24
Statedependent dendritic computation in hippocampal CA1 pyramidal neuronsJournal of Neuroscience 26:2088–2100.https://doi.org/10.1523/JNEUROSCI.442805.2006

25
Asynchronous rate Chaos in spiking neuronal circuitsPLoS Computational Biology 11:e1004266.https://doi.org/10.1371/journal.pcbi.1004266
 26
 27
 28

29
Simple model of spiking neuronsIEEE Transactions on Neural Networks 14:1569–1572.https://doi.org/10.1109/TNN.2003.820440

30
Short Term Memory in Echo State Networks, 5GMDGerman National Research Institute for Computer Science.
 31

32
Spike learning, version 693e790Github.

33
Robust timing and motor patterns by taming Chaos in recurrent neural networksNature Neuroscience 16:925–933.https://doi.org/10.1038/nn.3405

34
Intrinsic dynamics in neuronal networks. I. TheoryJournal of Neurophysiology 83:808–827.https://doi.org/10.1152/jn.2000.83.2.808

35
Extracellular recordings from anterior lateral motor cortex (alm) neurons of adult mice performing a tactile decision behaviorCollaborative Research in Computational Neuroscience  Data Sharing.
 36
 37
 38

39
Computational aspects of feedback in neural circuitsPLoS Computational Biology 3:e165.https://doi.org/10.1371/journal.pcbi.0020165
 40
 41
 42

43
Nonadditive coupling enables propagation of synchronous spiking activity in purely random networksPLoS Computational Biology 8:e1002384.https://doi.org/10.1371/journal.pcbi.1002384
 44
 45
 46

47
Supervised learning in spiking neural networks with FORCE trainingNature Communications 8:2208.https://doi.org/10.1038/s41467017018273
 48
 49
 50
 51
 52

53
Local dynamics in trained recurrent neural networksPhysical Review Letters 118:258101.https://doi.org/10.1103/PhysRevLett.118.258101

54
The spatial structure of correlated neuronal variabilityNature Neuroscience 20:107–114.https://doi.org/10.1038/nn.4433

55
Chaos in random neural networksPhysical Review Letters 61:259–262.https://doi.org/10.1103/PhysRevLett.61.259
 56

57
Opening the black box: lowdimensional dynamics in highdimensional recurrent neural networksNeural computation, 25, 10.1162/NECO_a_00409, 23272922.

58
A neural network that finds a naturalistic solution for the production of muscle activityNature Neuroscience 18:1025–1033.https://doi.org/10.1038/nn.4042

59
Learning Universal Computations with SpikesPLoS computational biology 12:e1004895.https://doi.org/10.1371/journal.pcbi.1004895
 60

61
Gamma oscillation by synaptic inhibition in a hippocampal interneuronal network modelThe Journal of Neuroscience 16:6402–6413.https://doi.org/10.1523/JNEUROSCI.162006402.1996

62
Flexible timing by temporal scaling of cortical responsesNature Neuroscience 21:102–110.https://doi.org/10.1038/s4159301700286

63
Shortterm memory in orthogonal neural networksPhysical Review Letters 92:148102.https://doi.org/10.1103/PhysRevLett.92.148102
Decision letter
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Learning recurrent dynamics in spiking networks" for consideration by eLife. Your article has been reviewed by Timothy Behrens as the Senior Editor and two reviewers, one of whom is the Reviewing Editor. One of the reviewers, Brian DePasquale, has agreed to reveal his identity.
The Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary:
The focus of most (maybe all?) research on RNNs has been on output: how do you train recurrent networks to produce a low dimensional (compared to the dimensionality of the network) target function? This paper asks a different question: how do you train recurrent networks so that neurons within the network produce their own target function? Here the target can be either firing rate or synaptic drive. What's interesting about this paper is that the authors give the conditions under which training is possible. These rules provide insight into the limits of performance of spiking RNNs, something that has, I believe, been lacking in the field.
Essential revisions:
Reviewers A and B's full reviews, along with the subsequent discussion are included below. You should focus on making A happy (which is pretty easy, since he/she had only minor comments). However, when you revise the paper, you probably should keep B's review in mind – his/her interpretation of your paper may be shared by others.
Reviewer A comments:
The focus of most (maybe all?) research on RNNs has been on output: how do you train recurrent networks to produce a low dimensional (compared to the dimensionality of the network) target function? This paper asks a different question: how do you train recurrent networks so that neurons within the network produce their own target function? Here the target can be either firing rate or synaptic drive.
What's interesting about this paper is that the authors give the conditions under which training is possible:
 The target functions have to be sufficiently diverse..
 A sufficiently large number of neurons have to be associated with targets.
 The synaptic time constant can't be too long compared to the timescale of the target functions.
 the synaptic time constant can't be too short compared to 1/N (N is the number of neurons in the network).
 The training period can't be too long compared to N.
These rules provide insight into the limits of performance of spiking RNNs, something that has, I believe, been lacking in the field. [Minor comments not shown.]
Reviewer B comments:
The authors present an interesting study examining the capability of recurrent networks of QIF neurons for learning various types of signals. They apply a RLS learning rule so that the network produces various periodic functions, filtered noise, the activity of chaotic networks, and experimental data. They present evidence to support their contention that two key conditions need be met for learning: (1) including a sufficiently rich set of target functions; (2) appropriately matched synaptic timescale relative to the target dynamics. Additionally, a mean field analysis was performed and shown to be consistent with dynamics after learning.
While a handful of curious findings are presented, for the following two reasons I do not believe the work, as presented, is suitable for publication in this journal.
First, the examples, learning conditions and supporting experiments do not constitute a significant advance over previous work. While I cannot refer to prior work that has studied the ability to produce the specific signals presented, the authors do not properly motivate the importance of these particular signals. Prior publications that train RNNs (spiking too) typically focus on constructing networks that can perform tasks, where the motivation for the signals they construct is obvious: they use signals necessary for performing computations. For three of the examples (rate chaos, OU noise and spiking chaos) it's not clear what computation these networks might perform after learning (in fact, the primary motivation for stabilizing rate chaos, e.g. in FORCE learning, was so that the network COULD produce a computation after learning). In the case of the work of Laje and Buonomano (which the authors claim to extend to spiking networks), they sought to stabilize chaotic rate trajectories as a model for understanding timing tasks; here the authors do not present any use for these dynamical patterns, just that they can be learned.
Second, the presented work is simultaneously too technical (and therefore too specific) and too vague to be of significant interest to the general neuroscience community. The mean field analysis is interesting and stands out when compared to other published works, which have not extensively studied how analytic solutions compare to simulations in trained networks. Nevertheless, this is a minor, technical point, not likely to be interesting to most neuroscientists (unless the authors can explain why this is interesting).
While the question of universal computation is of general interest, the presented results are not sufficiently rigorous to match the claims. While they claim that previous work has been impressive but not sufficient to showcase the full richness of what can be expressed by spiking RNNs, the authors claim to learn "arbitrary" and "near universal" dynamics but only show examples (as previous work has) from a rather small set of potential dynamics. If this paper were to be published, the authors would need to clarify what they mean by this to accurately reflect the presented results or include a proof of the universality they are claiming to show. Another question of general interest is what constitutes a "sufficient basis" for learning but here again the authors only provide a few examples that are not shown to extend beyond the direct application studied. This approach is inferior to previous methods, since no prescription is offered for establishing sufficiency other than generating additional random signals; in previous work where specific computations were examined, the only signals required were those necessary for performing the computation and these could be "bootstrapped" by exploiting knowledge about the trainability of firing rate networks. The authors offer a bold claim (identifying the sufficiency of a rich basis) with pretty weak support (just keep adding random signals).
I conclude by highlighting the most interesting and promising result: training with neural data. That the training required auxiliary dynamics is interesting and provides a prediction of a set of dynamical patterns that should exist in the brain in order for the network to perform the task it did. Expounding on these results would yield a very interesting finding, albeit beyond the scope of a simple revision to the present work.
A's response to B's comments (during discussion phase):
My main feeling is that they are looking at a very different problem than anybody else: they're training every single neuron (or a large fraction of neurons) to produce target functions, with different target functions (but similar timescales) for each neuron. This is, in my opinion, completely unrealistic – it's much more realistic to train networks to produce relatively low dimensional output patterns. But what I thought was interesting was that the paper gave bounds on performance in general in terms of timescales of the network and the target output of each neuron. If you can't produce target functions with a particular timescale when you can train each neuron individually, I would think that you certainly can't train the network as a whole to produce target functions with that timescale. Which is a point they might have emphasized more, but that would be easy to fix. But maybe I'm interested simply because it was something I always wondered about.
With this view, your first objection, which is that they didn't consider interesting functions, becomes, I think, moot. Your second objection is that the work is too technical and too vague. I agree the mean field analysis is very technical, and will be followed by only a handful of people. But it doesn't seem to be an integral part of the paper. And I think the rest is pretty easy to follow. Not sure what you meant by too vague – could you expand?
You also say that their results are not sufficiently rigorous. However, if they can make the arguments rigorous, would you be satisfied, at least about the rigorousness (but not necessarily about the first two points)?
B's response to A:
I disagree with you that they are looking at a different problem than anybody else. However, none of this works addresses the conditions for learning, I agree with you there.
I agree with you that understanding the limits of learning in RNNs is interesting and lacking, but to me, their results related to that point only account for a fraction of the full paper. As written, that point is embedded within a range of other findings that are unmotivated (e.g. the various other signals they train networks to perform that do not support their results about the limits of learning), incomplete (using data to train networks, as I wrote in my review) and that do not ultimately support the point about learnability. However, if they were to remove these other results, the point on learnability alone would not constitute a full paper.
In my view, they introduce two main conditions for learning: that the synaptic timescale should roughly match the targets and that the targets should be "suffciently rich". The second condition highlights my concerns about the vagueness and rigorness of the results. They illustrate the need for sufficient richness by removing targets from some of the training sets they used and show that these network cannot learn (Figure 3). While this is anecdotally interesting, I do not believe that illustrating that this is true in a handful of cases is sufficient to say anything general about the conditions necessary for learning. At best, this is good enough to make some pretty handwaving arguments about the need for an overcomplete basis, which is precisely what they do in Section 2. Again, it's not that I don't find this interesting, I just don't seem much substance there.
Their first condition for learning is definitely much more rigorously addressed (the simulations and calculations of Figure 6 are nice) but I find this neither surprising or groundbreaking (you can't learn slow signals without slow synapses, or fast signals without fast synapses....) I think this point could be stated much more simply without the need for the various training targets they used (chaotic rate models and stochastic noise) which again, strengthens my resolve that much of what they do feels unmotivated. Even though Figure 6 being nicely done, 1 figure is not enough for a paper.
Finally, I think the tasks they are learning are perfectly "interesting", just unmotivated.
https://doi.org/10.7554/eLife.37124.016Author response
We would like to thank the reviewers for their valuable comments. We have made the code for learning recurrent spiking dynamics available online at https://github.com/chrismkkim/SpikeLearning
We will use following summary as reference points when responding to the reviewers’ comments:
S1) As Reviewer A pointed out, most research on reservoir computing in RNNs has focused on training the network output to perform a specific computation [Sussillo and Abbott, 2009; Laje and Buonomano, 2013; Rajan, Harvey, and Tank, 2016; DePasquale et al., 2018]. The implicit idea is that the RNN’s job is to provide a reservoir of trajectories that can be combined linearly with trainable weights to produce a desired output. In contrast, we ask the question of whether the output neurons are necessary at all. What if the RNN could be trained such that the output and reservoir were one and the same? And can this be done in a spiking network? None of these questions had answers before our work.
S2) In order for the reservoir computing scheme to work the RNN needs to be rich enough and repeatable, which is nontrivial to attain [Sussillo and Abbott, 2009] achieves this by stabilizing an inherently chaotic network with feedback or training, [Laje and Buonomano, 2013] trains the weights of the RNN to stabilize specific chaotic trajectories, [Rajan, Harvey, and Tank, 2016] used sequential activity derived from experimental data, and [DePasquale et al., 2018] used network activity extracted from a “target network”.
S3) We identified two conditions under which learning is successful for any arbitrary signal: (1) the synaptic time scales must be fast enough and the network size must be large enough and (2) the set of target functions must be sufficiently heterogeneous.
Response to essential revisions:
Reviewer B commented that the set of target functions a network learns (e.g. OU noise) was not motivated for performing computations. We respectfully disagree. As stated in (S1), our goal was to show that an RNN could produce arbitrary patterns such that there would be no need to distinguish between the RNN and the output neurons. The question we address is specifically the computational capability of an RNN in terms of the patterns it can produce. As stated in (S2), previous studies considered only specific forms of target functions for the RNN, hence it remains unknown whether a spiking network can learn “arbitrary” functions. Laje and Buonomano even stated (see their Online Methods) that “choosing innate trajectories from another network did not produce effective training”, alluding that recurrent networks may not be able to learn arbitrary functions. Our examples show that this is not the case and that individual neurons may be able to perform universal computations. The target functions considered in Figure 3 (periodic functions, rate chaos, OU noise) provide a few examples of random functions that a network can learn. We felt that this provided a generic selection of arbitrary functions, and also showed that the network is able to reproduce firing patterns of an animal performing a task. We revised the introduction of the original manuscript and added additional explanation in the first paragraph of discussion to make it clear that our goal was to investigate the computational capability of RNNs to encode recurrent spiking dynamics.
We also wish to correct Reviewer B’s interpretation of the first condition for learning (small sampling and tracking errors) since his statement “you can’t learn slow signals without slow synapses, or fast signals without fast synapses” is not correct and we apologize for not making this point more clear in our original text. It is possible to learn slow signals without slow synapses. As stated in (S3), the first condition requires that synapses to be fast enough (i.e. small tracking error,etrack~ts/tc) and network size to be large enough (i.e. small sampling error,esample~1/tsN), then the network can learn any signals, in particular the slow ones. As long as the synapse is faster than the target (i.e. their time scales need not be comparable), and there are plenty of neurons in the network to produce continuous target signals, forming appropriate recurrent connections can compensate for the short synaptic time scale and allows the network to generate slow signals. This is the main conclusion of the error analysis and is stated immediately after Equation 3. Our simulation results also support the error analysis: in Figure 6A, the parameter regime above the diagonal line (i.e. slow signal and fast synapses, t_{c} > t_{s}) learns successfully, and similarly learning is successful if t_{s}/t_{c} < 1 in Figure 6E. Such slow dynamics, consistent with our results, have been studied in random rate networks [Sompolinsky, Crisanti and Sommers, 1988], random spiking networks [Ostojic, 2014] and random rate networks when reciprocal connections are overrepresented [Mart´ı, Brunel and Ostojic, 2018]. The example shown in Figure 6B also demonstrates that networks with fast synapses (τ_{s} = 30ms) can learn slow signals (τ_{c} = 100ms). We pointed out these facts in the third and fourth paragraphs of Results section. What may have led to the confusion is that for a fixed network size, synapses cannot be too fast or too slow, as there is a trade off between tracking and sampling error (See inverted “U”curves in Figure 6C,D). However, as we also show, the sampling error can be overcome with more neurons so synapses can be arbitrarily fast as long as the network can be arbitrarily large (Figure 6D).
The second condition for learning is that the set of target functions need to be sufficiently heterogeneous. Reviewer B criticized that our definition of the second condition is vague and lacks rigor since we only provided anecdotal examples in Figure 3. However, we actually discussed the mathematical basis for the second condition in subsection “Learning capacity increases with network size” in the original manuscript. We now realize that our original treatment was not convincing and have strived to improve it so that it better articulates our argument. Here we present the main ideas of the mathematical argument (See the revised manuscript for details).
The network dynamics under the quasistatic condition can be expressed in a matrix form U = WV where U and V ≡ φ(U + I) are N × P matrices. Here, the i^{th} row of U and V is the time discretized trajectory of neuron i’s synaptic drive and spiking rate, respectively. The question of successful learning is reduced to analyzing the solution set of W to this system of equations. In principle, learning is possible when the rows of U are spanned by the rows of V, i.e. target functions can be selfconsistently generated by firing rate patterns induced by the targets. We define target functions to be sufficiently heterogeneous if the firing rate matrix V is fullrank because under such condition that the image of V encompasses the largest possible space.
We show that rank(V) being maximal is sufficient for an exact or approximate solution W to exist. When N ≥ P, the maximal rank(V) implies that the rows of V span the entire Euclidean space R^{P}, of which the target functions (i.e. rows of U) are elements. Equivalently, we can say that the number of unknowns (N) exceeds or equals the number of independent equations (P). Therefore, multiple perfect solutions are possible, and the regularization term is required for the learning scheme to converge to one solution. When N < P, the maximal rank(V) implies that the linearly independent rows of V span an Ndimensional subspace of R^{P}. The number of unknowns (N) is less than the number of equations (P), so perfect learning is not possible and we can only find an approximate regression solution W = UV^{T}(VV^{T})^{−1} where the inverse of VV^{T} exists because rank(V) is maximal. If rank(V) is not maximal, a solution can still exists if rows of U are close to the subspace spanned by V. However, in this case the success depend on the specific choice of target functions because the dimension of the subspace spanned by V is strictly less than P, so whether the rows of U are contained in or close to this subspace is determined by the geometry of the subspace. This shows why increasing pattern heterogeneity, which makes the rows of V independent and the rank higher, is beneficial for learning. Conversely, if a large number of neurons is trained on the same target, it becomes increasingly difficult to develop the target patterns with limited basis functions. Given that this mathematical explanation is easily missed, we discussed the underlying mathematics in detail after introducing the second condition for learning in the fourth paragraph of Results section in the revised manuscript.
https://doi.org/10.7554/eLife.37124.017Article and author information
Author details
Funding
National Institute of Diabetes and Digestive and Kidney Diseases (Intramural Research Program)
 Christopher M Kim
 Carson C Chow
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This research was supported [in part] by the Intramural Research Program of the NIH, The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK).
Publication history
 Received: March 29, 2018
 Accepted: September 14, 2018
 Accepted Manuscript published: September 20, 2018 (version 1)
 Version of Record published: October 15, 2018 (version 2)
Copyright
This is an openaccess article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Metrics

 1,983
 Page views

 379
 Downloads

 2
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Download citations (links to download the citations from this article in formats compatible with various reference manager tools)
Open citations (links to open the citations from this article in various online reference manager services)
Further reading

 Computational and Systems Biology
 Structural Biology and Molecular Biophysics

 Chromosomes and Gene Expression
 Computational and Systems Biology