1. Neuroscience

Spike frequency adaptation supports network computations on temporally dispersed information

  1. Darjan Salaj
  2. Anand Subramoney
  3. Ceca Kraisnikovic
  4. Guillaume Bellec
  5. Robert Legenstein
  6. Wolfgang Maass  Is a corresponding author
  1. Institute of Theoretical Computer Science, Graz University of Technology, Austria
  2. Laboratory of Computational Neuroscience, Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Research Article
Cite this article as: eLife 2021;10:e65459 doi: 10.7554/eLife.65459

Abstract

For solving tasks such as recognizing a song, answering a question, or inverting a sequence of symbols, cortical microcircuits need to integrate and manipulate information that was dispersed over time during the preceding seconds. Creating biologically realistic models for the underlying computations, especially with spiking neurons and for behaviorally relevant integration time spans, is notoriously difficult. We examine the role of spike frequency adaptation in such computations and find that it has a surprisingly large impact. The inclusion of this well-known property of a substantial fraction of neurons in the neocortex – especially in higher areas of the human neocortex – moves the performance of spiking neural network models for computations on network inputs that are temporally dispersed from a fairly low level up to the performance level of the human brain.

Introduction

Since brains have to operate in dynamic environments and during ego-motion, neural networks of the brain need to be able to solve ‘temporal computing tasks’, that is, tasks that require integration and manipulation of temporally dispersed information from continuous input streams on the behavioral time scale of seconds. Models for neural networks of the brain have inherent difficulties in carrying out such temporal computations on the time scale of seconds since spikes and postsynaptic potentials take place on the much shorter time scales of milliseconds and tens of milliseconds. It is well known that biological neurons and synapses are also subject to a host of slower dynamic processes, but it has remained unclear whether any of these can be recruited for robust temporal computation on the time scale of seconds. We focus here on a particularly prominent one of these slower processes: spike frequency adaptation (SFA) of neurons. SFA denotes the effect that preceding firing activity of a neuron transiently increases its firing threshold (see Figure 1A for an illustration). Experimental data from the Allen Institute (Allen Institute, 2018b) show that a substantial fraction of excitatory neurons of the neocortex, ranging from 20% in mouse visual cortex to 40% in the human frontal lobe, exhibit SFA (see Appendix 1—figure 8). Although a rigorous survey of time constants of SFA is still missing, the available experimental data show that SFA does produce history dependence of neural firing on the time scale of seconds, in fact, up to 20 s according to Pozzorini et al., 2013, Pozzorini et al., 2015. The biophysical mechanisms behind SFA include inactivation of depolarizing currents and the activity-dependent activation of slow hyperpolarizing or shunting currents (Gutkin and Zeldenrust, 2014; Benda and Herz, 2003).

Experimental data on neurons with spike frequency adaptation (SFA) and a simple model for SFA.

(A) The response to a 1-s-long step current is displayed for three sample neurons from the Allen brain cell database (Allen Institute, 2018b). The cell id and sweep number identify the exact cell recording in the Allen brain cell database. (B) The response of a simple leaky integrate-and-fire (LIF) neuron model with SFA to the 1-s-long step current. Neuron parameters used: top row β=0.5mV, τa=1s, Iinput=0.024A; middle row β=1mV, τa=1s, Iinput=0.024A; bottom row β=1mV, τa=300ms, Iinput=0.022A. (C) Symbolic architecture of recurrent spiking neural network (SNN) consisting of LIF neurons with and without SFA. (D) Minimal SNN architecture for solving simple instances of STORE-RECALL tasks that we used to illustrate the negative imprinting principle. It consists of four subpopulations of input neurons and two LIF neurons with SFA, labeled NR and NL, that project to two output neurons (of which the stronger firing one provides the answer). (E) Sample trial of the network from (D) for two instances of the STORE-RECALL task. The input ‘Right’ is routed to the neuron NL, which fires strongly during the first STORE signal (indicated by a yellow shading of the time segment), causing its firing threshold (shown at the bottom in blue) to increase strongly. The subsequent RECALL signal (green shading) excites both NL and NR, but NL fires less, that is, the storing of the working memory content ‘Right’ has left a ‘negative imprint’ on its excitability. Hence, NR fires more strongly during recall, thereby triggering the answer ‘Right’ in the readout. After a longer pause, which allows the firing thresholds of NR and NL to reset, a trial is shown where the value ‘Left’ is stored and recalled.

SFA is an attractive feature from the perspective of the metabolic cost of neural coding and computation since it reduces firing activity (Gutierrez and Denève, 2019). But this increased metabolic efficiency comes at the cost of making spike codes for sensory stimuli history-dependent. Hence, it becomes harder to decode information from spikes if neurons with SFA are involved (Weber and Fairhall, 2019; Weber et al., 2019). This problem appears already for very simple input streams, where the same stimulus is presented repeatedly, since each presentation is likely to create a somewhat different neural code in the network. However, it has recently been shown that a careful network construction can ensure that stable neural codes emerge on the network level (Gutierrez and Denève, 2019).

A number of potential computational advantages of neurons with SFA have already been identified, such as cellular short-term memory (Marder et al., 1996; Turrigiano et al., 1996), enhancement of sensitivity to synchronous input and desirable modifications of the frequency response curve (Benda et al., 2010; Ermentrout, 1998; Wang, 1998; Gutkin and Zeldenrust, 2014). On the network level, SFA may enhance rhythms and support Bayesian inference (Kilpatrick and Ermentrout, 2011; Deneve, 2008). The contribution of SFA to temporal computing capabilities of recurrent spiking neural networks (SNNs) had first been examined in Bellec et al., 2018a, and its role for language processing in feedforward networks was subsequently examined in Fitz et al., 2020.

Here we are taking a closer look at enhanced temporal computing capabilities of SNNs that are enabled through SFA and also compare the computational benefit of SFA with that of previously considered slower dynamic processes in SNNs: short-term synaptic plasticity. Most experimental analyses of temporal computing capabilities of biological neural networks have focused on arguably the simplest type of temporal computing, where the response to a stimulus has to be given after a delay. In other words, information about the preceding stimulus has to be kept in a working memory. We start with this simple task (STORE-RECALL task) since this task makes the analysis of neural coding especially transparent. We use it here to demonstrate a novel principle that is used by neurons with SFA to implement working memory – ‘the negative imprinting principle.’ That is, firing of a neuron leaves a negative imprint of its activity because its excitability is reduced due to SFA. Such negative imprinting has previously been utilized in Gutierrez and Denève, 2019. We then show that the working memory capability of SNNs with SFA scales up to much more demanding and ecologically more realistic working memory tasks, where not just a single but numerous features, for example, features that characterize a previously encountered image, movie scene, or sentence, have to be stored simultaneously.

However, these working memory tasks capture just a small fragment of temporal computing capabilities of brains. Substantially more common and more difficult are tasks where information is temporally dispersed over a continuous input stream, say in a sentence or a video clip, and has to be integrated over time in order to solve a task. This requires that the information that is stored in working memory has to be continuously updated. We tested temporal computing capabilities of SNNs with SFA for two types of such tasks. First, we consider standard benchmark tasks for temporal computing capabilities: keyword spotting, time-series classification (sequential MNIST), and delayed XOR task. Then we consider two tasks that are arguably at the heart of higher-level cognitive brain processing (Lashley, 1951): processing and manipulations of sequences of symbols according to dynamic rules. We also analyze the neural codes that emerge in SNNs with SFA for such tasks and compare them with neural codes for corresponding tasks in the brain (Barone and Joseph, 1989; Liu et al., 2019; Carpenter et al., 2018). Since our focus is on computing, rather than learning capabilities, we use a powerful tool for optimizing network parameters for task performance: backpropagation through time (BPTT). While this method is not assumed to be biologically realistic, it has recently been shown that almost the same task performance can, in general, be achieved for SNNs – with and without SFA – through training with a biologically more realistic learning method: e-prop (Bellec et al., 2020). We demonstrate this here for the case of the 12AX task.

Results

Network model

We employ a standard simple model for neurons with SFA, the generalized leaky integrate-and-fire (LIF) model GLIF2 from Teeter et al., 2018; Allen Institute, 2018a. A practical advantage of this simple model is that it can be simulated efficiently and is amenable to gradient descent training (Bellec et al., 2018a). It is based on a standard LIF neuron model. In a LIF neuron, inputs are temporally integrated, giving rise to its membrane potential. The neuron produces a spike when its membrane potential is above a threshold vth. After the spike, the membrane potential is reset and the neuron enters a refractory period during which it cannot spike again. The precise dynamics of the LIF model is given in Equation (2) in Materials and methods. The GLIF2 model extends the LIF model by adding a second hidden variable to the membrane voltage: a variable component a(t) of the firing threshold A(t) that increases by a fixed amount after each spike z(t) of the neuron and then decays exponentially back to 0 (see Figure 1B, E). This variable threshold models the inactivation of voltage-dependent sodium channels in a qualitative manner. We write zj(t) for the spike output of neuron j, which takes the value 1 at time t if the neuron fires at time t, and is otherwise 0. With this notation, one can define the SFA model by the equations

(1) Aj(t) = vth + β aj(t),    aj(t+1) = ρj aj(t) + (1 − ρj) zj(t),

where vth is the constant baseline of the firing threshold Aj(t), and β>0 scales the amplitude of the activity-dependent component. The parameter ρj = exp(−δt/τa,j) controls the speed by which aj(t) decays back to 0, where τa,j is the adaptation time constant of neuron j. For simplicity, we used a discrete-time model with a time step of δt=1 ms (see Materials and methods for further details). We will in the following also refer to this model as ‘LIF with SFA.’ Consistent with the experimental data (Allen Institute, 2018a), we consider recurrent networks of LIF neurons (SNNs), of which some fraction is equipped with SFA. It turns out that the precise fraction of neurons with SFA does not matter for most tasks, especially if it stays somewhere in the biological range of 20–40%. For simplicity, we usually consider fully connected recurrent networks, but most tasks can also be solved with sparser connectivity. Neurons in the recurrent network project to readout neurons, which produce the output of the network (see Figure 1C). The final network output was obtained either by selecting the readout neuron with the maximal value after applying a softmax, or by thresholding the value of each readout neuron after applying a sigmoid.
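The threshold dynamics of Equation (1) can be sketched in a few lines of code. The parameter values below are illustrative only; they are not the fitted values used in the paper.

```python
import numpy as np

def adaptive_threshold(spikes, v_th=0.6, beta=1.0, tau_a=1000.0, dt=1.0):
    """Simulate the adaptive firing threshold A_j(t) of Equation (1).

    spikes : binary array z_j(t) (1 = spike in that time step)
    v_th   : constant baseline threshold
    beta   : scaling of the activity-dependent component
    tau_a  : adaptation time constant (ms)
    dt     : simulation time step (ms)
    """
    rho = np.exp(-dt / tau_a)          # per-step decay factor rho_j
    a = 0.0                            # activity-dependent component a_j(t)
    A = np.empty(len(spikes))
    for t, z in enumerate(spikes):
        A[t] = v_th + beta * a                 # A_j(t) = v_th + beta * a_j(t)
        a = rho * a + (1.0 - rho) * z          # a_j(t+1) = rho*a_j(t) + (1-rho)*z_j(t)
    return A
```

Running this with a burst of spikes followed by silence reproduces the qualitative behavior in Figure 1B, E: the threshold ramps up during firing and then relaxes exponentially back to its baseline vth.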

In order to analyze the potential contribution of SFA to temporal computing capabilities of SNNs, we optimized the weights of the SNN for each task. For this we used stochastic gradient descent in the form of BPTT (Mozer, 1989; Robinson and Fallside, 1987; Werbos, 1988), which is, to the best of our knowledge, the best-performing optimization method. Although this method works best for differentiable neural network models, the non-differentiable output of a spiking neuron can be handled quite well with the help of a suitably scaled pseudo-derivative (Bellec et al., 2018a). In general, similar task performance can also be achieved with a biologically plausible learning method for SNNs, e-prop (Bellec et al., 2020). Although computing, rather than learning, capabilities are the focus of this paper, we demonstrate for one of the most demanding tasks that we consider, the 12AX task, that almost the same task performance as with BPTT can be achieved with e-prop.
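The idea of the pseudo-derivative can be illustrated as follows: in the forward pass the spike output is the non-differentiable step function, while in the backward pass its derivative is replaced by a smooth surrogate. The triangular shape and the dampening factor γ below follow the general form used in Bellec et al., 2018a, but the specific values are illustrative.

```python
import numpy as np

def spike_forward(v, v_th):
    """Forward pass: non-differentiable spike function z(t) = 1 iff v >= v_th."""
    return (v >= v_th).astype(float)

def spike_pseudo_derivative(v, v_th, gamma=0.3):
    """Surrogate used in place of dz/dv during backpropagation through time.

    A triangular function of the normalized distance to the threshold:
    it peaks (value gamma) at v = v_th and vanishes far from the threshold,
    so gradients flow mainly through neurons close to firing.
    """
    v_scaled = (v - v_th) / v_th
    return gamma * np.maximum(0.0, 1.0 - np.abs(v_scaled))
```

In a full BPTT implementation, an automatic-differentiation framework would use `spike_forward` for the network dynamics and substitute `spike_pseudo_derivative` wherever the chain rule requires dz/dv.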

SFA provides working memory simultaneously for many pieces of information and yields powerful generalization capability

To elucidate the mechanism by which SFA supports temporal computing capabilities of SNNs, we first consider classical working memory tasks, where information just has to be temporally stored by the neural network, without the need for frequent updates of this working memory during an instance of the task.

Negative imprinting principle

To demonstrate how neurons with SFA can contribute to solving working memory tasks, we first consider the standard case where just a single value, for example, the position left or right of a prior stimulus, has to be stored during a delay. The simple network shown in Figure 1D, consisting of two neurons with SFA (NL and NR), can already solve this task if there are long gaps between different instances of the task. We assume that these two neurons receive spike inputs from four populations of neurons. Two of them encode the value that is to be stored, and the other two convey the commands STORE and RECALL through high-firing activity in these populations of input neurons (see Figure 1E for an illustration). The neuron NL (NR) fires when the population that encodes the STORE command fires (yellow shading in Figure 1E) and simultaneously the input population for value ‘Right’ (‘Left’) is active. Furthermore, we assume that the input population that encodes the RECALL command (green shading in Figure 1E) causes both NL and NR to fire. However, the firing threshold of that one of them that had fired during the preceding STORE command is higher (see the blue threshold in the left half and the red threshold in the right half of Figure 1E), causing a weaker response to this RECALL command. Hence, if the spikes of NL and NR are each routed to one of the two neurons in a subsequent winner-take-all (WTA) circuit, the resulting winner encodes the value that has been stored during the preceding STORE command. The time courses of the firing thresholds of neurons NL and NR in Figure 1E clearly indicate the negative imprinting principle that underlies this working memory mechanism: the neuron that fires less during RECALL was the one that had responded the strongest during the preceding STORE phase, and this firing left a stronger ‘negative imprint’ on this neuron.
Note that this hand-constructed circuit does not work for large ranges of time differences between STORE and RECALL, and more neurons with SFA are needed to solve the subsequently discussed full versions of the task.
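The negative imprinting principle can be demonstrated with a toy simulation of just the adaptive thresholds of two neurons (standing in for NL and NR). All numbers below are illustrative, not fitted to the experiments of the paper: during STORE only the neuron matching the stored value is driven to fire, which raises its threshold; during RECALL both neurons receive the same drive, and the previously active neuron fires less, so the stored value can be read out from the less active neuron.

```python
import numpy as np

def recall_winner(stored_value, v_th=1.0, beta=5.0, tau_a=1000.0,
                  recall_drive=1.05, store=(100, 150), recall=(1100, 1150)):
    """Toy demo of the negative imprinting principle (illustrative parameters).

    Returns the value decoded at RECALL: the name of the neuron that
    fired LESS during the RECALL window, i.e., the one carrying the
    'negative imprint' of the preceding STORE.
    """
    rho = np.exp(-1.0 / tau_a)                    # threshold decay per ms
    a = {"Left": 0.0, "Right": 0.0}               # adaptation variables
    recall_spikes = {"Left": 0, "Right": 0}
    for t in range(recall[1]):
        for name in a:
            threshold = v_th + beta * a[name]
            if store[0] <= t < store[1] and name == stored_value:
                z = 1.0                           # strong STORE input
            elif recall[0] <= t < recall[1] and recall_drive > threshold:
                z = 1.0                           # RECALL drives both neurons
                recall_spikes[name] += 1
            else:
                z = 0.0
            a[name] = rho * a[name] + (1.0 - rho) * z
    return min(recall_spikes, key=recall_spikes.get)
```

As in the hand-constructed circuit, this sketch only works for a limited range of delays between STORE and RECALL relative to the adaptation time constant.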

Scaling the negative imprinting principle up to more realistic working memory tasks

We wondered whether SNNs with SFA can also solve more realistic working memory tasks, where not just a single bit, but a higher-level code of a preceding image, sentence, or movie clip needs to be stored. Obviously, brains are able to do that, but this has rarely been addressed in models. In addition, brains are generally exposed to ongoing input streams also during the delay between STORE and RECALL, but need to ignore these irrelevant input segments. Both of these more demanding aspects are present in the version of the STORE-RECALL task that is considered in Figure 2A. Here the values of 20 input bits, instead of just 1, need to be stored during a STORE command and recalled during a RECALL command. More precisely, a 20-dimensional stream of input bits is given to the network, whose values during each 200 ms time segment are visualized as a 4 × 5 image in the top row of Figure 2A. Occasionally, a pattern in the input stream is marked as being salient through simultaneous activation of a STORE command in a separate input channel, corresponding, for example, to an attentional signal from a higher brain area (see yellow shading in Figure 2A). The task is to reproduce during a RECALL command the pattern that had been presented during the most recent STORE command. Delays between STORE and RECALL ranged from 200 to 1600 ms. Twenty binary values were simultaneously extracted as network outputs during RECALL by thresholding the output values of 20 linear readout neurons. We found that an SNN consisting of 500 neurons with SFA, whose adaptive firing threshold had a time constant of τa=800 ms, was able to solve this task with an accuracy above 99% and an average firing activity of 13.90 ± 8.76 Hz (mean ± standard deviation). SFA was essential for this performance: the recall accuracy of a recurrent network of LIF neurons without SFA, trained in exactly the same way, stayed at chance level (see Materials and methods).
In Figure 2A, one sees that those neurons with SFA that fired stronger during STORE fire less during the subsequent RECALL, indicating a use of the negative imprinting principle also for this substantially more complex working memory task.
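The structure of a trial of this task can be sketched at the level of 200 ms segments. The generator below is a simplified illustration of the task layout described above (random 20-bit pattern per segment, one STORE followed later by one RECALL), not the exact trial generator used in the experiments.

```python
import numpy as np

def store_recall_trial(rng, n_bits=20, n_segments=10):
    """Sketch of one trial of the 20-dimensional STORE-RECALL task.

    Returns per-segment input patterns, the STORE and RECALL command
    channels, and the target pattern to reproduce during RECALL.
    """
    patterns = rng.integers(0, 2, size=(n_segments, n_bits))
    store_at = int(rng.integers(0, n_segments - 1))    # STORE segment
    recall_at = int(rng.integers(store_at + 1, n_segments))  # later RECALL
    store = np.zeros(n_segments, dtype=int)
    recall = np.zeros(n_segments, dtype=int)
    store[store_at], recall[recall_at] = 1, 1
    target = patterns[store_at].copy()   # pattern marked as salient by STORE
    return patterns, store, recall, target
```

The segments between STORE and RECALL still carry random patterns, which is what makes the task a test of ignoring irrelevant input during the delay.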

20-dimensional STORE-RECALL and sMNIST task.

(A) Sample trial of the 20-dimensional STORE-RECALL task where a trained spiking neural network (SNN) of leaky integrate-and-fire (LIF) neurons with spike frequency adaptation (SFA) correctly stores (yellow shading) and recalls (green shading) a pattern. (B, C) Test accuracy comparison of recurrent SNNs with different slow mechanisms: dual version of SFA where the threshold is decreased and causes enhanced excitability (ELIF), predominantly depressing (STP-D) and predominantly facilitating short-term plasticity (STP-F). (B) Test set accuracy of five variants of the SNN model on the one-dimensional STORE-RECALL task. Bars represent the mean accuracy of 10 runs with different network initializations. (C) Test set accuracy of the same five variants of the SNN model for the sMNIST time-series classification task. Bars represent the mean accuracy of four runs with different network initializations. Error bars in (B) and (C) indicate standard deviation.

Interestingly, this type of working memory in an SNN with SFA shares an important feature with the activity-silent form of working memory in the human brain that had been examined in the experiments of Wolff et al., 2017. It had been shown there that the representation of working memory content changes significantly between memory encoding and subsequent network reactivation during the delay by an ‘impulse stimulus’: a classifier trained on the network activity during encoding was not able to classify the memory content during a network reactivation in the delay, and vice versa. Obviously, this experimental result from the human brain is consistent with the negative imprinting principle. We also tested directly whether the experimentally observed change of neural codes in the human brain also occurs in our model. We trained a classifier for decoding the content of working memory during STORE and found that this classifier was not able to decode this content during RECALL, and vice versa (see Materials and methods). Hence, our model is in this regard consistent with the experimental data of Wolff et al., 2017.

We also found that it was not possible to decode the stored information from the firing activity between STORE and RECALL, as one would expect if the network would store the information through persistent firing. Actually, the firing activity was quite low during this time period. Hence, this demo shows that SNNs with SFA have, in addition to persistent firing, a quite different method for transient storage of information.

Generalization of SFA-enhanced temporal computations to unseen inputs

In contrast to the brain, many neural network models for working memory can store only information on which they have been trained. In fact, this tends to be unavoidable if a model can only store a single bit. The human brain, however, is also able to retain new information in its working memory. The SNN with SFA that we used for the 20-dimensional working memory task also had this capability. It achieved a performance of 99.09%, that is, 99.09% of the stored 20-dimensional bit vectors were accurately reproduced during recall, on bit vectors that had never occurred during training. In fact, we made sure that all bit vectors that had to be stored during testing had a Hamming distance of at least five bits to all bit vectors used during training. A sample segment of a test trial is shown in Figure 2A, with the activity of input neurons at the top and the activation of readout neurons at the bottom.

No precise alignment between time constants of SFA and working memory duration is needed

Experimental data from the Allen Institute database suggest that different neurons exhibit a diversity of SFA properties. We show that, correspondingly, a diversity of SFA time constants across neurons provides high performance for temporal computing. We consider for simplicity the one-dimensional version of the task of Figure 2A, where just a single bit needs to be stored in working memory between STORE and RECALL commands. The expected delay between STORE and RECALL (see the header row of Table 1) scales the working memory time span that is required to solve this task. Four fixed time constants were tested for SFA (τa = 200 ms, 2 s, 4 s, and 8 s; see rows 2–5 of Table 1). In addition, a power-law distribution of these time constants, as well as a uniform distribution, was considered (see the last two rows of Table 1). One sees that the resulting diversity of time constants for SFA yields about the same performance as a fixed choice of the time constant that is aligned with the required memory span of the task. Moreover, even a much larger time constant (see the row with τa = 8 s in the columns with expected memory spans of 200 ms or 2 s) or a substantially smaller time constant (see the row with τa = 2 s in the column with an expected memory span of 8 s) tends to work well.

Table 1
Recall accuracy (in %) of spiking neural network (SNN) models with different time constants of spike frequency adaptation (SFA) (rows) for variants of the STORE-RECALL task with different required memory time spans (columns).

Good task performance does not require good alignment of SFA time constants with the required time span for working memory. An SNN consisting of 60 leaky integrate-and-fire (LIF) neurons with SFA was trained for many different choices of SFA time constants for variations of the one-dimensional STORE-RECALL task with different required time spans for working memory. A network of 60 LIF neurons without SFA trained under the same parameters did not improve beyond chance level (~50% accuracy), except for the task instance with an expected delay of 200 ms where the LIF network reached 96.7% accuracy (see top row).

| Expected delay between STORE and RECALL | 200 ms | 2 s  | 4 s  | 8 s  | 16 s |
| Without SFA (τa = 0 ms)                 | 96.7   | 51   | 50   | 49   | 51   |
| τa = 200 ms                             | 99.92  | 73.6 | 58   | 51   | 51   |
| τa = 2 s                                | 99.0   | 99.6 | 98.8 | 92.2 | 75.2 |
| τa = 4 s                                | 99.1   | 99.7 | 99.7 | 97.8 | 90.5 |
| τa = 8 s                                | 99.6   | 99.8 | 99.7 | 97.7 | 97.1 |
| τa power-law dist. in [0, 8] s          | 99.6   | 99.7 | 98.4 | 96.3 | 83.6 |
| τa uniform dist. in [0, 8] s            | 96.2   | 99.9 | 98.6 | 92.1 | 92.6 |

SFA improves the performance of SNNs for common benchmark tasks that require nonlinear computational operations on temporally dispersed information

We now turn to more demanding temporal computing tasks, where temporally dispersed information not only needs to be stored, but continuously updated in view of new information. We start out in this section with three frequently considered benchmark tasks of this type: sequential MNIST, Google Speech Commands, and delayed XOR.

Sequential MNIST (sMNIST) is a standard benchmark task for time-series classification. In this variant of the well-known handwritten digit recognition dataset MNIST, the pixel values of each sample of a handwritten digit are presented to the network in the form of a time series, one pixel in each ms, as they arise from a row-wise scanning pattern of the handwritten digit. This task also requires very good generalization capability since the resulting sequence of pixel values for different handwriting styles of the same digit may be very different, and the network is tested on samples that had never before been presented to the network.
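The conversion of a digit image into the sMNIST input sequence amounts to a row-wise scan, one pixel value per millisecond. The helper below sketches this conversion; the spike encoding of the resulting pixel values, as used in the paper, is described in Section 2 of Appendix 1.

```python
import numpy as np

def to_smnist_sequence(image):
    """Turn a 28 x 28 grayscale digit image into the sMNIST time series:
    pixel values presented one per millisecond in row-wise scanning order,
    yielding a sequence of 784 values."""
    image = np.asarray(image)
    assert image.shape == (28, 28), "MNIST digits are 28 x 28"
    return image.reshape(784)   # row-major flattening = row-wise scan
```

Because the scan serializes the image, a classifier must integrate information over the full 784 ms sequence before the digit class can be decided, which is what makes this a temporal computing benchmark.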

An SNN with SFA was able to solve this task with a test accuracy of 93.7%, whereas an SNN without SFA only reached an accuracy of 51.8%. We refer to Section 2 of Appendix 1 for further details.

We also compared the performance of SNNs with and without SFA on the keyword-spotting task of the Google Speech Commands Dataset (Warden, 2018) (v0.02). To solve this task, the network needs to correctly classify audio recordings of silence, spoken unknown words, and utterances of 1 of 10 keywords by different speakers. On this task, the performance of SNNs increases with the inclusion of SFA (from 89.04% to 91.21%) and approaches that of the state-of-the-art artificial recurrent model (93.18%) (see Section 3 of Appendix 1 and Appendix 1—table 1).

Finally, we tested the performance of SNNs with SFA on the delayed-memory XOR task, which had previously been used as a benchmark task for SNNs in Huh and Sejnowski, 2018. In this task, the network is required to compute the exclusive-or operation on a time series of binary input pulses and provide the answer when prompted by a go-cue signal. Across 10 different runs, an SNN with SFA solved the task with 95.19 ± 0.014% accuracy, whereas the SNN without SFA only achieved 61.30 ± 0.029% (see Section 4 of Appendix 1 and Appendix 1—figure 3).

The good performance of SNNs with SFA on all three tasks demonstrates that SFA provides computational benefits to SNNs also for substantially more demanding temporal computing tasks in comparison with standard working memory tasks. Before we turn to further temporal computing tasks that are of particular interest from the perspective of neuroscience and cognitive science, we first analyze the contribution of other slow mechanisms in biological neurons and synapses on the basic working memory task and on sMNIST.

Comparing the contribution of SFA to temporal computing with that of other slow processes in neurons and synapses

Facilitating short-term plasticity (STP-F) and depressing short-term plasticity (STP-D) are the most frequently discussed slower dynamic processes in biological synapses. STP-F of synapses, also referred to as paired-pulse facilitation, increases the amplitudes of postsynaptic potentials for the later spikes in a spike train. Whereas synaptic connections between pyramidal cells in the neocortex are usually depressing (Markram et al., 2015), it was shown in Wang et al., 2006 that there are facilitating synaptic connections between pyramidal cells in the medial prefrontal cortex of rodents, with a mean time constant of 507 ms (standard deviation 37 ms) for facilitation. It was shown in Mongillo et al., 2008 that if one triples the experimentally found mean time constant for facilitation, then this mechanism supports basic working memory tasks.

STP-D of synapses, also referred to as paired-pulse depression, reduces the amplitude of postsynaptic potentials for later spikes in a spike train. The impact of this mechanism on simple temporal computing tasks had been examined in a number of publications (Maass et al., 2002; Buonomano and Maass, 2009; Masse et al., 2019; Hu et al., 2020).

In addition, we consider a dual version of SFA: a neuron model where each firing of the neuron causes its firing threshold to decrease – rather than increase as in SFA – which then returns exponentially to its resting value. We call this neuron model the enhanced-excitability LIF (ELIF) model. Such a neural mechanism has been considered, for example, in Fransén et al., 2006. Note that a transient increase in the excitability of a neuron can also be caused by depolarization-mediated suppression of inhibition, a mechanism that has been observed in many brain areas (Kullmann et al., 2012). The dynamics of the salient hidden variables in all three models are described in Materials and methods and illustrated in Appendix 1—figure 2.

We tested the resulting five different types of SNNs, each consisting of 60 neurons, first on the simple 1D working memory task. The results in Figure 2B show that SNNs with SFA provide by far the best performance on this task.

Figure 2C shows that for sMNIST both SNNs with SFA and SNNs with STP-D achieve high performance. Surprisingly, the performance of SNNs with facilitating synapses is much worse, both for sMNIST and for the working memory task.

SFA supports demanding cognitive computations on sequences with dynamic rules

Complex cognitive tasks often contain a significant temporal processing component, including the requirement to flexibly incorporate task context and rule changes. To test whether SFA can support such cognitive processing, we consider the 12AX task (Frank et al., 2001). This task is an extension of the A-X version of the continuous performance task (CPT-AX), which has been extensively studied in humans (Barch et al., 2009). It tests the ability of subjects to apply dynamic rules when detecting specific subsequences in a long sequence of symbols while ignoring irrelevant inputs (O'Reilly and Frank, 2006; MacDonald, 2008). It also probes the capability to maintain and update a hierarchical working memory since the currently active rule, that is, the context, stays valid for a longer period of time and governs what other symbols should be stored in working memory.

More precisely, in the 12AX task, the subject is shown a sequence of symbols from the set {1, 2, A, X, B, Y, C, Z}. After processing any symbol in the sequence, the network should output ‘R’ if this symbol terminates a context-dependent target sequence and ‘L’ otherwise. The current target sequence depends on the current context, which is defined through the symbols ‘1’ and ‘2.’ If the most recently received digit was a ‘1’, the subject should output ‘R’ only when it encounters a symbol ‘X’ that terminates a subsequence A…X. This occurs, for example, for the seventh symbol in the trial shown in Figure 3. If the most recent input digit was a ‘2’, the subject should instead respond ‘R’ only after the symbol ‘Y’ in a subsequent subsequence B…Y (see the 20th symbol in Figure 3). In addition, the processed sequence contains letters ‘C’ and ‘Z’ that are irrelevant and serve as distractors. This task requires a hierarchical working memory because the most recently occurring digit determines whether subsequent occurrences of ‘A’ or ‘B’ should be placed into working memory. Note also that neither the content of the higher-level working memory, that is, the digit, nor the content of the lower-level working memory, that is, the letter A or B, is simply recalled. Instead, they affect the target outputs of the network in a more indirect way. Furthermore, the higher-level processing rule affects what is to be remembered at the lower level.
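The rule described above can be made precise as a small finite-state ground-truth labeler with one higher-level state (the current context digit) and one lower-level state (the stored letter). One detail is our assumption: after a correct ‘R’ response, the stored letter is cleared, so each A (or B) licenses at most one target X (or Y).

```python
def target_outputs(symbols):
    """Reference outputs for the 12AX task: after each input symbol,
    'R' if it terminates the context-dependent target sequence
    (A...X in context 1, B...Y in context 2), otherwise 'L'."""
    context = None    # higher-level working memory: last digit seen
    stored = None     # lower-level working memory: 'A' or 'B'
    outputs = []
    for s in symbols:
        response = "L"
        if s in ("1", "2"):
            context, stored = s, None         # context switch resets the letter
        elif context == "1" and s == "A":
            stored = "A"
        elif context == "2" and s == "B":
            stored = "B"
        elif (context == "1" and s == "X" and stored == "A") or \
             (context == "2" and s == "Y" and stored == "B"):
            response, stored = "R", None      # target sequence terminated
        outputs.append(response)
    return outputs
```

Distractor symbols (‘C’, ‘Z’, and any relevant-looking letter in the wrong context) fall through all branches and leave both memory levels unchanged, which mirrors the hierarchical working memory demands of the task.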

Solving the 12AX task by a network of spiking neurons with spike frequency adaptation (SFA).

A sample trial of the trained network is shown. From top to bottom: full input and target output sequence for a trial, consisting of 90 symbols each, blow-up for a subsequence – spiking input for the subsequence, the considered subsequence, firing activity of 10 sample leaky integrate-and-fire (LIF) neurons without and 10 sample LIF neurons with SFA from the network, time course of the firing thresholds of these neurons with SFA, output activation of the two readout neurons, the resulting sequence of output symbols which the network produced, and the target output sequence.

A simpler version of this task, where X and Y were relevant only if they directly followed A or B, respectively, and where fewer irrelevant letters occurred in the input, was solved in O'Reilly and Frank, 2006; Martinolli et al., 2018; Kruijne et al., 2020 through biologically inspired artificial neural network models that were endowed with special working memory modules. Note that for this simpler version no lower-order working memory is needed because one just has to wait for an immediate transition from A to X in the input sequence or for an immediate transition from B to Y. But neither the simpler version nor the more complex version considered here has previously been solved by a network of spiking neurons.

In the version of the task that we consider, the distractor symbols between relevant symbols occur rather frequently. Robust maintenance of relevant symbols in the hierarchical working memory therefore becomes crucial: the time spans between relevant symbols are longer, which makes the task more demanding, especially for a neural network implementation.

Overall, the network received during each trial (episode) sequences of 90 symbols from the set {1, 2, A, B, C, X, Y, Z}, with repetitions as described in Materials and methods. See the top of Figure 3 for an example (the context-relevant symbols are marked in bold for visual ease).

We show in Figure 3 that a generic SNN with SFA can solve this quite demanding version of the 12AX task. The network consisted of 200 recurrently connected spiking neurons (100 with and 100 without SFA), with all-to-all connections between them. After training, for new symbol sequences that had never occurred during training, the network produced an output string with all correct symbols in 97.79% of episodes.

The average firing activity of LIF neurons with SFA and LIF neurons without SFA was (12.37±2.90) Hz and (10.65±1.63) Hz (mean ± standard deviation), respectively (the average was calculated over 2000 test episodes for one random initialization of the network). Hence, the network operated in a physiologically meaningful regime.

These results were obtained after optimizing synaptic weights via BPTT. However, training with a recently published biologically plausible learning method called random e-prop (Bellec et al., 2020) produced a similar performance of 92.89% (averaged over five different network initializations).

We next asked how the fraction of neurons with SFA affects SNN performance in the case of the usual parameter optimization via BPTT. When all 200 LIF neurons, rather than just half of them, were endowed with SFA, a much lower accuracy of just 72.01% was achieved. On the other hand, if just 10% of the neurons had SFA, a performance of 95.39% was achieved. In contrast, a recurrent SNN with the same architecture but no neurons with SFA achieved a performance of only 0.39% (each success rate was averaged over five network initializations). Hence, a few neurons with SFA suffice for good performance, and for this task it is important to also have neurons without SFA.

Neuronal networks in the brain are subject to various sources of noise. A highly optimized SNN model with sparse firing activity might exploit brittle spike-time correlations, and such a network would therefore be highly susceptible to noise. To test whether this was the case in our model, we examined how the performance of the above network changed when various levels of noise were added to all network neurons during testing. We found that although the spike responses of the neurons became quite different (see Appendix 1—figures 5 and 6), the performance of the SNN model was little affected by low noise and decayed gracefully for higher levels of noise. For details, see Section 5 of Appendix 1.

Surprisingly, it was not necessary to create a special network architecture for the two levels of working memory that our more complex version of the 12AX task requires: a near perfectly performing network emerged from training a generic fully connected SNN with SFA.

SFA enables SNNs to carry out complex operations on sequences of symbols

Learning to carry out operations on sequences of symbols in such a way that they generalize to new sequences is a fundamental capability of the human brain, but a generic difficulty for neural networks (Marcus, 2003). Not only humans but also non-human primates are able to carry out operations on sequences of items, and numerous neural recordings, starting with Barone and Joseph, 1989 and continuing up to recent results such as Carpenter et al., 2018 and Liu et al., 2019, provide information about the neural codes for sequences that accompany such operations in the brain. The fundamental question of how the serial order of items is encoded in working memory emerges from the more basic question of how the serial position of an item is combined with the content information about its identity (Lashley, 1951). The experimental data of both Barone and Joseph, 1989 and Liu et al., 2019 suggest that the brain uses a factorial code where the position and identity of an item in a sequence are encoded separately by some neurons, thereby facilitating flexible generalization of learned experience to new sequences.

We show here that SNNs with SFA can be trained to carry out complex operations on sequences, are able to generalize such capabilities to new sequences, and produce spiking activity and neural codes that can be compared with neural recordings from the brain. In particular, they also produce factorial codes, where separate neurons encode the position and identity of a symbol in a sequence. One basic operation on sequences of symbols is remembering and reproducing a given sequence (Liu et al., 2019). This task had been proposed by Marcus, 2003 to be a symbolic computation task that is fundamental for symbol processing capabilities of the human brain. But non-human primates can also learn simpler versions of this task, and hence it was possible to analyze how neurons in the brain encode the position and identity of symbols in a sequence (Barone and Joseph, 1989; Carpenter et al., 2018). Humans can also reverse sequences, a task that is more difficult for artificial networks to solve (Marcus, 2003; Liu et al., 2019). We show that an SNN with SFA can carry out both of these operations and is able to apply them to new sequences of symbols that did not occur during the training of the network.

We trained an SNN consisting of 320 recurrently connected LIF neurons (192 with and 128 without SFA) to carry out these two operations on sequences of 5 symbols from a repertoire of 31 symbols. Once trained, the SNN with SFA could duplicate and reverse sequences that it had not seen previously, with a success rate of 95.88% (average over five different network initializations). The ‘success rate’ was defined as the fraction of test episodes (trials) where the full output sequence was generated correctly. Sample episodes of the trained SNN are shown in Figure 4A, and a zoom-in of the same spike rasters is provided in Appendix 1—figure 7. For comparison, we also trained a LIF network without SFA in exactly the same way with the same number of neurons. It achieved a performance of 0.0%.

Analysis of a spiking neural network (SNN) with spike frequency adaptation (SFA) trained to carry out operations on sequences.

(A) Two sample episodes where the network carried out sequence duplication (left) and reversal (right). Top to bottom: spike inputs to the network (subset), sequence of symbols they encode, spike activity of 10 sample leaky integrate-and-fire (LIF) neurons (without and with SFA) in the SNN, firing threshold dynamics for these 10 LIF neurons with SFA, activation of linear readout neurons, output sequence produced by applying argmax to them, and target output sequence. (B–F) Emergent neural coding of 279 neurons in the SNN (after removal of neurons detected as outliers) and peri-condition time histogram (PCTH) plots of two sample neurons. Neurons are sorted by time of peak activity. (B) A substantial number of neurons were sensitive to the overall timing of the tasks, especially for the second half of trials when the output sequence is produced. (C) Neurons separately sorted for duplication episodes (top row) and reversal episodes (bottom row). Many neurons responded to input symbols according to their serial position, but differently for different tasks. (D) Histogram of neurons categorized according to conditions with statistically significant effect (three-way ANOVA). Firing activity of a sample neuron that fired primarily when (E) the symbol ‘g’ was to be written at the beginning of the output sequence. The activity of this neuron depended on the task context during the input period; (F) the symbol ‘C’ occurred in position 5 in the input, irrespective of the task context.

The average firing activity of LIF neurons without SFA and LIF neurons with SFA was (19.88±2.68) Hz, and (21.51±2.95) Hz (mean ± standard deviation), respectively. The average was calculated over 50,000 test episodes for one random initialization of the network.

A diversity of neural codes emerge in SNNs with SFA trained to carry out operations on sequences

Emergent coding properties of neurons in the SNN are analyzed in Figure 4B–D, and two sample neurons are shown in Figure 4E, F. Neurons are sorted in Figure 4B, C according to the time of their peak activity (averaged over 1000 episodes), like in Harvey et al., 2012. The neurons have learned to abstract the overall timing of the tasks (Figure 4B). A number of network neurons (about one-third) participate in sequential firing activity independent of the type of task and the symbols involved (see the lower part of Figure 4B and the trace for the average activity of neurons left of the marker for the start of duplication or reversal). This kind of activity is reminiscent of the neural activity relative to the start of a trial that was recorded in rodents after they had learned to solve tasks that had a similar duration (Tsao et al., 2018).

The time of peak activity of other neurons in Figure 4B depended on the task and the concrete content, as indicated by a weak activation during the loading of the sequence (left of the marker) but a stronger activation after the start of duplication or reversal (right of the marker). The dependence on the concrete content and task is shown in Figure 4C. Interestingly, these neurons change their activation order already during the loading of the input sequence, in dependence of the task (duplication or reversal). Using three-way ANOVA, we categorized each neuron as selective to a specific condition (symbol identity, serial position in the sequence, and type of task) or to a nonlinear combination of conditions, based on the effect size ω2. A neuron could belong to more than one category if the effect size was above the threshold of 0.14 (as suggested by Field, 2013). Similar to recordings from the brain (Carpenter et al., 2018), a diversity of neural codes emerged, encoding one variable or a combination of variables. In particular, a large fraction of neurons encoded a nonlinear combination of all three variables (see Figure 4D). Peri-condition time histogram (PCTH) plots of two sample neurons are shown in Figure 4E, F: one neuron is selective to symbol ‘g’, but at different positions depending on the task context; the other neuron is selective to symbol ‘C’ occurring at position 5 in the input, independent of the task context. Thus a spiking-network realization of this task, which was previously not available, provides rich opportunities for comparing emergent spike codes in the model with neuronal recordings from the brain. For more details, see the last section of Materials and methods.

Discussion

Brains are able to carry out complex computations on temporally dispersed information, for example, on visual input streams or on sequences of words or symbols. We have addressed the question of how the computational machinery of the brain, recurrently connected networks of spiking neurons, can accomplish this.

The simplest type of temporal computing task requires holding just one item, which can typically be characterized by a single bit, in a working memory during a delay, until it is needed for a behavioral response. This can be modeled in neural networks by creating an attractor in the network dynamics that retains this bit of information through persistent firing during the delay. But this implementation is inherently brittle, especially for SNNs, and it is not clear whether it can be scaled up to more ecological working memory tasks in which multiple features, for example, the main features that characterize an image or a story, are kept in working memory, even in the presence of a continuous stream of distracting network inputs. We have shown that SFA enables SNNs to solve this task, even for feature vectors that the network had never encountered before (see Figure 2A).

This model for working memory shares many properties with how the human brain stores content that is not attended (Wolff et al., 2017):

  1. The data of Wolff et al., 2017 suggest that in order to understand such working memory mechanisms it is necessary to ‘look beyond simple measures of neural activity and consider a richer diversity of neural states that underpin content-dependent behavior.’ We propose that the current excitability of neurons with SFA is an example of such a hidden neural state that is highly relevant in this context. This provides a concrete experimentally testable hypothesis.

  2. They proposed more specifically that ‘activity silent neural states are sufficient to bridge memory delays.’ We have shown this in Figure 2A for a quite realistic working memory task, where a complex feature vector has to be kept in memory, also in the presence of continuous distractor inputs.

  3. They found that an unspecific network input, corresponding to the activation of a population of input neurons for RECALL in our model, is able to recover, in the human brain, an item that has been stored in working memory, but that storing and recalling an unattended item by ‘pinging’ the brain generate very different network activity. This is exactly what happens in our model. We have shown that a classifier that was trained to decode the stored item from the neural activity during encoding (STORE) was not able to decode the working memory content during RECALL, and vice versa. Furthermore, we have elucidated a particular neural coding principle, the negative imprinting principle, that is consistent with this effect (see the illustration in Figure 1E). An immediate experimentally testable consequence of the negative imprinting principle is that the same network responds with reduced firing to repeated presentations of an unattended item. This has in fact already been demonstrated for several brain areas, such as sensory cortices (Kok and de Lange, 2015) and perirhinal cortex (Winters et al., 2008).

  4. They found that decoding of an unattended working memory item without ‘pinging’ the network ‘dropped to chance relatively quickly after item presentation.’ We found that in our model, too, the content of working memory could not be decoded during the delay between STORE and RECALL.

But ecologically relevant temporal computing tasks that the brain routinely solves are obviously much more complex and demanding than such standard working memory tasks. The 12AX task is a prime example of such a more demanding temporal computing task, where two different types of memories need to be continuously updated: memory about the currently valid rule and memory about data. The currently valid rule determines which item needs to be extracted from the continuous stream of input symbols and remembered: the symbol A or the symbol B. We have shown that SFA enables an SNN to solve this task without requiring a specific architecture corresponding to the two types of working memory that it requires. This result suggests that such a two-tiered architecture is also unlikely to be found in the brain; rather, both types of working memory appear to be intertwined in the neural circuitry.

Our result suggests that most other temporal computing tasks that brains are able to solve can also be reproduced by such simple models. We have tested this hypothesis for another cognitively demanding task on temporally dispersed information that has been argued to represent an important ‘atom of neural computation’ in the brain, that is, an elementary reusable computing primitive on which the astounding cognitive capabilities of the human brain rely (Marcus, 2003; Marcus et al., 2014): the capability to reproduce or invert a given sequence of symbols, even if this sequence has never been encountered before. We have shown in Figure 4 that SFA enables SNNs to solve this task. Since monkeys can also be trained to carry out simple operations on sequences of symbols, experimental data are available in this case on the neural codes that the primate brain uses to encode the serial order and identity of a sequence item. We found that, as in the brain, a diversity of neural codes emerged: neurons encode one or several of the relevant variables – symbol identity, serial position of a symbol, and type of task. Such a comparison of the neural coding properties of brains and neural network models is only possible if the model employs – like the brain – spiking neurons, and if the firing rates of these neurons remain in a physiological range of sparse activity, as the presented SNN models do. Hence, the capability to produce such brain-like computational capabilities in SNNs is likely to enhance the convergence of further biological experiments, models, and theory for uncovering the computational primitives of the primate brain.

Since there is a lack of further concrete benchmark tasks from neuroscience and cognitive science for the temporal computing capabilities of brains, we have also tested the performance of SNNs with SFA on some benchmark tasks that are commonly used in neuromorphic engineering and AI, such as sequential MNIST and the Google Speech Commands Dataset. We found that SNNs with SFA can also solve these tasks very well, almost as well as the state-of-the-art models in machine learning and AI: artificial neural networks with special – unfortunately biologically implausible – units for temporal computing, such as long short-term memory (LSTM) units.

Besides SFA, there are several other candidates for hidden states of biological neurons and synapses that may support brain computations on temporally dispersed information. We examined three prominent candidates and analyzed how well they support these computations in comparison with SFA: depressing and facilitating short-term plasticity of synapses, as well as an activity-triggered increase in the excitability of neurons (ELIF neurons). We have shown in Figure 2B that these other types of hidden states provide lower performance than SFA for the simple working memory task. However, for a more demanding time-series classification task, short-term depression of synapses provides about the same performance as SFA (see Figure 2C). An important contribution of depressing synapses to temporal computing had already been proposed previously (Hu et al., 2020). This is at first sight counter-intuitive, just like the fact that a spike-triggered reduction, rather than increase, of neural excitability provides better support for temporal computing. But a closer look shows that extracting information from reduced firing activity merely requires a ‘sign inversion’ of the readout units. Moreover, reduced excitability of neurons has the advantage that this hidden state is better protected against perturbations by ongoing network activity: a neuron that is in an adapted state, where it is more reluctant to fire, is likely to respond less to noise inputs, thereby protecting its hidden state from such noise. In contrast, a more excitable neuron is likely to respond also to weaker noise inputs, thereby diluting its hidden state. These observations point in a similar direction as the results of Mongillo et al., 2018; Kim and Sejnowski, 2021, which suggest that inhibition, rather than excitation, is critical for robust memory mechanisms in the volatile cortex.

Finally, it should be pointed out that there are numerous other hidden states in biological neurons and synapses that change on slower time scales. One prominent example is metabotropic glutamate receptors, which are present in a large fraction of synapses throughout the thalamus and cortex. Metabotropic receptors engage a complex molecular machinery inside the neuron, which integrates signals over long time scales from seconds to hours and days (Sherman, 2014). However, at present, we lack mathematical models for these processes, and hence it is hard to evaluate their contribution to temporal computing.

We have analyzed here the capabilities and limitations of various types of SNNs and have not addressed the question of how such capabilities could be induced in SNNs of the brain. Hence, we have used the most powerful optimization method for inducing computational capabilities in SNNs: a spike-oriented adaptation of BPTT. It was previously shown in Bellec et al., 2020 that, in general, almost the same performance can be achieved in SNNs with SFA when BPTT is replaced by e-prop, a more biologically plausible network gradient descent algorithm. We have tested this also for the arguably most difficult temporal computing task examined in this paper, 12AX, and found that e-prop provides almost the same performance. However, temporal computing capabilities are likely to arise in brains through a combination of nature and nurture, and it remains to be examined to what extent the genetic code endows SNNs of the brain with temporal computing capabilities. In one of the first approaches for estimating the impact of genetically encoded connection probabilities on computational capabilities, it was shown that connection probabilities can already provide some type of working memory without any need for learning (Stöckl et al., 2021), but this approach has not yet been applied to SNNs with SFA or other slowly changing hidden states.

Our results also raise the question of whether the distribution of time constants of SFA in a cortical area is related to the intrinsic time scale of that cortical area, as measured, for example, via intrinsic fluctuations of spiking activity (Murray et al., 2014; Wasmuht et al., 2018). Unfortunately, we lack experimental data on the time constants of SFA in different brain areas. We tested the relation between the time constants of SFA and the intrinsic time scale of neurons, as defined in Wasmuht et al., 2018, for the case of the STORE-RECALL task (see Section 1 of Appendix 1 and Appendix 1—figure 1). We found that the time constants of neurons with SFA had little impact on their intrinsic time scale for this task, in particular much less than the network input. We have also shown in control experiments that the alignment between the time scale of SFA and the duration of working memory maintenance can be rather loose. Even a random distribution of time constants for SFA works well.

Altogether, we have shown that SFA, a well-known feature of a substantial fraction of neurons in the neocortex, provides an important new facet for our understanding of computations in SNNs: it enables SNNs to integrate temporally dispersed information seamlessly into ongoing network computations. This paves the way for reaching a key goal of modeling: to combine detailed experimental data from neurophysiology on the level of neurons and synapses with the brain-like high computational performance of the network.

Materials and methods

In this section, we first describe the details of the network models that we employ, and then we continue with the description of the training methods. After that, we give details about all the tasks and analyses performed.

Network models

LIF neurons

A LIF neuron j spikes as soon as its membrane potential Vj(t) exceeds its threshold vth. At each spike time t, the membrane potential Vj(t) is reset by subtracting the threshold value vth, and the neuron enters a strict refractory period of 3–5 ms (depending on the experiment) during which it cannot spike again. Between spikes, the membrane voltage Vj(t) follows the dynamics

(2) τmV˙j(t)=-Vj(t)+RmIj(t),

where τm is the membrane time constant of neuron j, Rm is the resistance of the cell membrane, and Ij is the input current.

Our simulations were performed in discrete time with a time step δt = 1 ms. In discrete time, the input and output spike trains are modeled as binary sequences xi(t), zj(t) ∈ {0, 1/δt}, respectively. Neuron j emits a spike at time t if it is currently not in a refractory period and its membrane potential Vj(t) is above its threshold. During the refractory period following a spike, zj(t) is fixed to 0. The neural dynamics in discrete time reads as follows:

(3) Vj(t+δt)=αVj(t)+(1-α)RmIj(t)-vthzj(t)δt,

where α = exp(-δt/τm), with τm being the membrane time constant of neuron j. The spike of neuron j is defined by zj(t) = H((Vj(t) - vth)/vth) · 1/δt, with H(x) = 0 if x < 0 and 1 otherwise. The term -vth zj(t) δt implements the reset of the membrane voltage after each spike.
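For concreteness, one time step of the discrete LIF dynamics of Equation (3), including the reset term and the refractory period, can be sketched as follows. The parameter values (threshold, membrane time constant, membrane resistance, refractory length) are illustrative placeholders, not the exact settings used in the experiments:

```python
import numpy as np

def lif_step(V, I, refrac, dt=1e-3, tau_m=20e-3, R_m=1.0, v_th=0.03, n_ref=3):
    """One discrete-time update of a population of LIF neurons (Eq. 3).

    V, I, refrac: arrays of membrane potentials, input currents, and
    remaining refractory steps. Parameter values are illustrative.
    """
    alpha = np.exp(-dt / tau_m)
    # spike variable z(t) in units of 1/dt, suppressed during refractoriness
    z = ((V > v_th) & (refrac == 0)).astype(float) / dt
    # leaky integration plus the reset term -v_th * z * dt of Eq. (3)
    V = alpha * V + (1 - alpha) * R_m * I - v_th * z * dt
    # start a refractory period after a spike, otherwise count it down
    refrac = np.where(z > 0, n_ref, np.maximum(refrac - 1, 0))
    return V, z, refrac
```

Iterating `lif_step` over the simulation steps, with `I` given by the weighted spike sums of Equation (4), reproduces the dynamics described above.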

In all simulations, Rm was set to 1 GΩ. The input current Ij(t) is defined as the weighted sum of spikes from external inputs and from other neurons in the network:

(4) Ij(t)=iWjiinxi(t-djiin)+iWjireczi(t-djirec),

where Wjiin and Wjirec denote respectively the input and the recurrent synaptic weights and djiin and djirec the corresponding synaptic delays.

LIF neurons with SFA

The SFA is realized by replacing the fixed threshold vth with the adaptive threshold Aj(t), which follows the dynamics (reproducing Equation (1) for arbitrary δt):

(5) Aj(t)=vth+βaj(t),  aj(t+δt)=ρjaj(t)+(1-ρj)zj(t)δt.

The parameter ρj is given by ρj = exp(-δt/τa,j), where τa,j is the adaptation time constant of neuron j. In all our simulations, δt was set to 1 ms.

The spiking output of LIF neuron j with SFA is then defined by zj(t) = H((Vj(t) - Aj(t))/Aj(t)) · 1/δt.
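The threshold dynamics of Equation (5) amounts to a low-pass filter of the neuron's own spike train, added on top of the baseline threshold. A minimal sketch, with illustrative values for β and τa (the experiments used task-dependent values, see Table 1):

```python
import numpy as np

def sfa_threshold_step(a, z, dt=1e-3, tau_a=2.0, v_th=0.03, beta=0.07):
    """One update of the adaptive threshold A_j(t) of Eq. (5).

    a: adaptation variable, z: spike variable (in units 1/dt).
    beta and tau_a are illustrative values, not the paper's settings.
    """
    rho = np.exp(-dt / tau_a)
    a = rho * a + (1 - rho) * z * dt   # low-pass filter of the spike train
    A = v_th + beta * a                # effective (adaptive) threshold
    return a, A
```

Each spike raises the threshold A, which then relaxes back to vth with time constant τa; setting β negative yields the ELIF model described below, where spiking transiently increases excitability instead.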

Adaptation time constants of neurons with SFA were chosen to match the task requirements while still conforming to the experimental data from rodents (Allen Institute, 2018b; Pozzorini et al., 2013; Pozzorini et al., 2015; Mensi et al., 2012). For an analysis of the impact of the adaptation time constants on performance, see Table 1.

LIF neurons with activity-dependent increase in excitability: ELIF neurons

There exists experimental evidence that some neurons fire more upon repetition of the same sensory stimulus. We refer to such neurons as ELIF neurons since they become more excitable. Such repetition enhancement was discussed, for example, in Tartaglia et al., 2014. But to the best of our knowledge, it has remained open whether repetition enhancement is a network effect, resulting, for example, from a transient depression of inhibitory synapses onto the cell that is caused by postsynaptic firing (Kullmann et al., 2012), or a result of an intrinsic firing property of some neurons. We used a simple model for ELIF neurons that is dual to the above-described LIF neuron model with SFA: the threshold is lowered by each spike of the neuron and then decays exponentially back to its resting value. This can be achieved by using a negative value for β in Equation (1).

Models for short-term plasticity (STP) of synapses

We modeled the STP dynamics according to the classical model of STP in Mongillo et al., 2008. The STP dynamics in discrete time, derived from the equations in Mongillo et al., 2008, are as follows:

(6) u′ji(t+δt)=exp(-δt/F)u′ji(t)+Uji(1-u′ji(t))zi(t)δt,
(7) uji(t+δt)=Uji+u′ji(t+δt),
(8) r′ji(t+δt)=exp(-δt/D)r′ji(t)+uji(t)(1-r′ji(t))zi(t)δt,
(9) rji(t+δt)=1-r′ji(t+δt),
(10) WjiSTP(t+δt)=Wjirecuji(t)rji(t),

where zi(t) is the spike train of the presynaptic neuron and Wjirec scales the synaptic efficacy of synapses from neuron i to neuron j. Networks with STP were constructed from LIF neurons with the weight Wjirec in Equation (4) replaced by the time-dependent weight WjiSTP(t).
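Under one consistent reading of Equations (6)-(10), a single time step of the synaptic-efficacy update can be sketched as follows. The update order within a step and the default parameter values (the PFC-E1 numbers quoted below, with F and D in seconds) are assumptions of this sketch:

```python
import numpy as np

def stp_step(u_p, r_p, z_pre, dt=1e-3, F=0.507, D=0.194, U=0.28, W_rec=1.0):
    """One discrete-time update of the STP model (Eqs. 6-10).

    u_p, r_p: auxiliary facilitation/depression traces, z_pre: presynaptic
    spike variable (units 1/dt). Update order is one consistent reading.
    """
    u_p = np.exp(-dt / F) * u_p + U * (1 - u_p) * z_pre * dt  # facilitation trace
    u = U + u_p                                               # release probability
    r_p = np.exp(-dt / D) * r_p + u * (1 - r_p) * z_pre * dt  # depression trace
    r = 1 - r_p                                               # available resources
    W = W_rec * u * r                                         # effective weight
    return u_p, r_p, W
```

Without presynaptic spikes, the effective weight relaxes to W_rec · U; each presynaptic spike transiently increases the release probability u (facilitation) while consuming resources r (depression).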

STP time constants of the facilitation-dominant and depression-dominant network models were based on experimental recordings in Wang et al., 2006 of the PFC-E1 (D=194±18 ms, F=507±37 ms, U=0.28±0.02) and PFC-E2 (D=671±17 ms, F=17±5 ms, U=0.25±0.02) synapse types, respectively. The recordings in Wang et al., 2006 were performed in the medial prefrontal cortex of young adult ferrets. In the sMNIST task, we used values based on PFC-E2 for the depression-dominant network model (STP-D) and values based on PFC-E1 for the facilitation-dominant network model (STP-F) (see the sMNIST task section below). For the STORE-RECALL task, we trained the network with the data-based time constants of PFC-E2 and PFC-E1, and also with an extended-time-constants variant in which both the facilitation and depression time constants were scaled up equally until the larger time constant matched the requirement of the task (see the One-dimensional STORE-RECALL task section below).

Weight initialization

Initial input and recurrent weights were drawn from a Gaussian distribution Wji ∼ (w0/√nin) 𝒩(0,1), where nin is the number of afferent neurons, 𝒩(0,1) is the zero-mean unit-variance Gaussian distribution, and w0 = (1 Volt/Rm) δt is a normalization constant (Bellec et al., 2018a). In the default setting, neurons can have both positive and negative outgoing weights and can change their sign during the optimization process. See Section 2 of Appendix 1 for more results with sparse connectivity and enforcement of Dale’s law using deep rewiring (Bellec et al., 2018b).
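A minimal sketch of this initialization; the treatment of units (Rm in ohms, δt in seconds, so that w0 carries the 1 Volt/Rm · δt normalization) is an assumption of this sketch:

```python
import numpy as np

def init_weights(n_post, n_in, R_m=1.0, dt=1e-3, seed=0):
    """Gaussian initialization W ~ (w0 / sqrt(n_in)) * N(0, 1).

    w0 = (1 Volt / R_m) * dt is the normalization constant; the SI-unit
    convention here is illustrative, not the paper's exact code.
    """
    rng = np.random.default_rng(seed)
    w0 = 1.0 / R_m * dt   # normalization constant (1 Volt / R_m) * dt
    return rng.normal(0.0, 1.0, size=(n_post, n_in)) * w0 / np.sqrt(n_in)
```

The 1/√nin scaling keeps the variance of the summed input current independent of the number of afferent neurons.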

Sigmoid and softmax functions

In the STORE-RECALL task (1- and 20-dimensional), the sigmoid function was applied to the neurons in the output layer. The sigmoid function is given by

(11) σ(x)=11+e-x,

where x represents a real-valued variable. The result, bounded to the range [0, 1], is then thresholded at the value 0.5 to obtain the final prediction of whether the neuron is active or not. More precisely, the neuron is considered active if σ(x) ≥ 0.5 and inactive otherwise.

The softmax function (used in tasks sMNIST, 12AX, Duplication/Reversal) is given by

(12) Softmax(xi) = exp(xi) / Σj=1..m exp(xj),

where xi is the real-valued output of neuron i in an output layer with m neurons. The final prediction is then obtained by selecting the output neuron with the maximal softmax value.
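The softmax readout of Equation (12), followed by the argmax prediction, can be sketched in a few lines. Subtracting the maximum inside the exponential is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax_predict(x):
    """Softmax of Eq. (12) over the output layer, plus argmax prediction."""
    e = np.exp(x - np.max(x))     # shift for numerical stability
    p = e / e.sum()               # probabilities over the m output neurons
    return p, int(np.argmax(p))   # predicted class = most active readout
```

For example, `softmax_predict(np.array([1.0, 2.0, 0.5]))` returns a probability vector summing to 1 and selects the second output neuron.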

Training methods

BPTT

In artificial recurrent neural networks, gradients can be computed with BPTT (Mozer, 1989; Robinson and Fallside, 1987; Werbos, 1988). In SNNs, complications arise from the non-differentiability of the output of spiking neurons. In our discrete-time simulation, this is formalized by the discontinuous step function H arising in the definition of the spike variable zj(t). All other operations can be differentiated exactly with BPTT. For feedforward artificial neural networks using step functions, a solution was to replace the derivative of H with the pseudo-derivative max{0, 1 - |x|} (Esser et al., 2016), but this method is unstable for recurrently connected neurons. It was found in Bellec et al., 2018a that dampening this pseudo-derivative with a factor γ < 1 (typically γ = 0.3) resolves this issue. Hence, we use the pseudo-derivative

(13) dzj(t)/dvj(t) := γ · max{0, 1 − |vj(t)|},

where vj(t) denotes the normalized membrane potential vj(t) = (Vj(t) − Aj(t))/Aj(t). Importantly, gradients can propagate in adaptive neurons through many time steps in the dynamic threshold without being affected by the dampening.
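In an autodiff framework this amounts to overriding the gradient of the spike function; a NumPy sketch of the forward step function and the dampened surrogate derivative of Equation (13):

```python
import numpy as np

GAMMA = 0.3  # dampening factor, the typical value from Bellec et al., 2018a

def heaviside(v):
    # Forward pass: a spike is emitted when the normalized potential crosses 0
    return (v > 0).astype(float)

def pseudo_derivative(v, gamma=GAMMA):
    # Equation (13): dampened triangular surrogate used in place of dH/dv
    return gamma * np.maximum(0.0, 1.0 - np.abs(v))
```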

Unless stated otherwise, the input, the recurrent, and the readout layers were fully connected and the weights were trained simultaneously.

e-prop

Request a detailed protocol

In the 12AX task, the networks were trained using the biologically plausible learning method random e-prop (Bellec et al., 2020) in addition to BPTT.

Tasks

One-dimensional STORE-RECALL task

Request a detailed protocol

The input to the network consisted of 40 input neurons: 10 for STORE, 10 for RECALL, and 20 for population coding of a binary feature. Whenever a subpopulation was active, it exhibited Poisson firing at a rate of 50 Hz. For experiments reported in Figure 2D, each input sequence consisted of 20 steps (200 ms each) in which the STORE and RECALL populations were activated interchangeably with probability 0.09, resulting in delays between STORE-RECALL pairs in the range [200, 3600] ms. For experiments reported in Table 1, the input sequences of experiments with expected delays of 2, 4, 8, and 16 s were constructed as sequences of 20, 40, 80, and 120 steps, respectively, with each step lasting 200 ms. For the experiment with an expected delay of 200 ms, the input sequence consisted of 12 steps of 50 ms.
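The step-label sampling described above can be sketched as follows (a hypothetical helper, not the authors' code; the alternation of STORE and RECALL is enforced explicitly):

```python
import numpy as np

def gen_store_recall_steps(n_steps=20, p=0.09, rng=None):
    """Sample step labels for the one-dimensional STORE-RECALL task.

    At each 200 ms step, STORE or RECALL is activated with probability p,
    interchangeably: a RECALL can only follow a pending STORE, and a new
    STORE can only follow a RECALL.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    steps, expect_store = [], True
    for _ in range(n_steps):
        if rng.random() < p:
            steps.append("STORE" if expect_store else "RECALL")
            expect_store = not expect_store
        else:
            steps.append("NONE")
    return steps
```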

Networks were trained for 400 iterations with a batch size of 64 in Table 1 and 128 in Figure 2B. We used Adam optimizer with default parameters and initial learning rate of 0.01, which was decayed every 100 iterations by a factor of 0.3. To avoid unrealistically high firing rates, the loss function contained a regularization term (scaled with coefficient 0.001) that minimizes the squared difference of the average firing rate of individual neurons from a target firing rate of 10 Hz. In Figure 1D, E, the weights were chosen by hand and not trained. The test performance was computed as the batch average over 2048 random input sequences.
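The firing-rate regularization term can be sketched as below (a minimal NumPy version; the sum over neurons as the reduction is an assumption):

```python
import numpy as np

def rate_regularization(spikes, dt=0.001, target_hz=10.0, coeff=0.001):
    """Regularization term penalizing deviations from a target firing rate.

    spikes: binary array of shape (batch, time, neurons), with time step dt.
    Computes each neuron's average firing rate in Hz and returns the scaled
    squared deviation from target_hz, summed over neurons (assumed reduction).
    """
    rates = spikes.mean(axis=(0, 1)) / dt  # average rate per neuron in Hz
    return coeff * np.sum((rates - target_hz) ** 2)
```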

Networks consisted of 60 recurrently connected neurons in all experiments except in Figure 1D, E, where only two neurons were used without recurrent connections. The membrane time constant was τm=20 ms, the refractory period 3 ms. In Figure 1D, E, the two LIF neurons with SFA had β=3 mV and τa=1200 ms. In Figure 2D, for LIF with SFA and ELIF networks, we used β=1 mV and β=0.5 mV, respectively, with τa=2000 ms. Table 1 defines the adaptation time constants and expected delays of the experiments in that section. To provide a fair comparison between the STP and SFA models in Figure 2D, we trained two variants of the STP model: one with the original parameters from Wang et al., 2006 and another where we scaled up both F and D until the larger one reached 2000 ms, the same time constant used in the SFA model. The scaled-up synapse parameters of the STP-D network were F=51±15 ms, D=2000±51 ms, and U=0.25, and of the STP-F network F=2000±146 ms, D=765±71 ms, and U=0.28. The data-based synapse parameters are described in the STP synapse dynamics section above. The baseline threshold voltage was 10 mV for all models except ELIF, for which it was 20 mV, and the two neurons in Figure 1D, E, for which it was 5 mV. The synaptic delay was 1 ms. The input to the sigmoidal readout neurons was given by the neuron traces, calculated by passing all network spikes through a low-pass filter with a time constant of 20 ms.
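The low-pass filtering of spikes into readout traces can be sketched as below (assuming the standard discrete-time exponential filter; the exact discretization used by the authors is not stated):

```python
import numpy as np

def lowpass_trace(spikes, tau=0.02, dt=0.001):
    """Exponential low-pass filter of a spike train (input to the readouts).

    trace[t] = alpha * trace[t-1] + spikes[t], with alpha = exp(-dt/tau).
    spikes: array of shape (time, neurons).
    """
    alpha = np.exp(-dt / tau)
    trace = np.zeros_like(spikes, dtype=float)
    acc = np.zeros(spikes.shape[1:], dtype=float)
    for t in range(spikes.shape[0]):
        acc = alpha * acc + spikes[t]
        trace[t] = acc
    return trace
```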

20-Dimensional STORE-RECALL task

Request a detailed protocol

The input to the network consisted of commands STORE and RECALL, and 20 bits, which were represented by subpopulations of spiking input neurons. STORE and RECALL commands were represented by four neurons each. The 20 bits were represented by population coding where each bit was assigned four input neurons (two for value 0, and two for value 1). When a subpopulation was active, it exhibited Poisson firing at a rate of 400 Hz. Each input sequence consisted of 10 steps (200 ms each), with a different population-encoded bit string shown during every step. The input populations representing the 20 bits were silent only during the RECALL period. At every step, the STORE and RECALL populations were activated interchangeably with probability 0.2, resulting in delays between STORE-RECALL pairs in the range [200, 1600] ms.

To measure the generalization capability of a trained network, we first generated a test set dictionary of 20 unique feature vectors (random bit strings of length 20) that had a Hamming distance of at least 5 bits from each other. For every training batch, a new dictionary of 40 random bit strings (of length 20) was generated, where each string had a Hamming distance of at least 5 bits from any of the bit strings in the test set dictionary. This way we ensured that, during training, the network never encountered any bit string similar to one from the test set.
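The dictionary generation can be sketched with rejection sampling (a hypothetical helper; the authors' actual sampling procedure is not specified). The same function covers both cases: the test dictionary enforces mutual distance, and a training dictionary only needs distance from the test set:

```python
import numpy as np

def hamming(a, b):
    # Number of positions at which two bit strings differ
    return int(np.sum(a != b))

def gen_dictionary(n_strings, length=20, min_dist=5, avoid=(), rng=None):
    """Generate random bit strings by rejection sampling (assumed method).

    Each accepted string has Hamming distance >= min_dist from every
    string in `avoid` and from every previously accepted string.
    """
    if rng is None:
        rng = np.random.default_rng(42)
    out = []
    while len(out) < n_strings:
        cand = rng.integers(0, 2, size=length)
        if all(hamming(cand, a) >= min_dist for a in list(avoid) + out):
            out.append(cand)
    return out
```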

Networks were trained for 4000 iterations with a batch size of 256, and training was stopped early if the error on the training batch was below 1%. We used the Adam optimizer (Kingma and Ba, 2014) with default parameters and an initial learning rate of 0.01, which was decayed every 200 iterations by a factor of 0.8. We also used learning rate ramping, which, for the first 200 iterations, monotonically increased the learning rate from 0.00001 to 0.01. The same firing rate regularization term was added to the loss as in the one-dimensional STORE-RECALL setup (see above). To improve convergence, we also included an entropy component in the loss (scaled with coefficient 0.3), which was computed as the mean of the entropies of the outputs of the sigmoid neurons. The test performance was computed as the average over 512 random input sequences.

We trained SNNs with and without SFA, consisting of 500 recurrently connected neurons. The membrane time constant was τm=20 ms, and the refractory period was 3 ms. Adaptation parameters were β=4 mV and τa=800 ms with baseline threshold voltage 10 mV. The synaptic delay was 1 ms. The same sigmoidal readout neuron setup was used as in the one-dimensional STORE-RECALL setup (see above).

We ran five training runs with different random seeds (initializations) for both SNNs with and without SFA. All runs of the SNN with SFA converged after ~3600 iterations to a training error below 1%. At that point we measured the accuracy on 512 test sequences generated using the previously unseen test bit strings, which resulted in a test accuracy of 99.09% with a standard deviation of 0.17%. The LIF network was not able to solve the task in any of the runs (all runs resulted in 0% training and test accuracy with zero standard deviation). On the level of individual feature recall accuracy, the best one out of five training runs of the LIF network was able to achieve 49% accuracy, which is the chance level since individual features are binary bits. In contrast, all runs of the SNN with SFA had individual feature-level accuracy above 99.99%.

Decoding memory from the network activity

Request a detailed protocol

We trained a support vector machine (SVM) to classify the stored memory content from the network spiking activity in the step before the RECALL (200 ms before the start of RECALL command). We performed a cross-validated grid-search to find the best hyperparameters for the SVM, which included kernel type {linear, polynomial, RBF} and penalty parameter C of the error term {0.1, 1, 10, 100, 1000}. We trained SVMs on test batches of the five different training runs of 20-dimensional STORE-RECALL task. SVMs trained on the period preceding the RECALL command of a test batch achieved an average of 4.38% accuracy with a standard deviation of 1.29%. In contrast, SVMs trained on a period during the RECALL command achieved an accuracy of 100%. This demonstrates that the memory stored in the network is not decodable from the network firing activity before the RECALL input command.
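The cross-validated grid search can be sketched with scikit-learn (a minimal sketch; extracting the feature vectors from the spike counts in the 200 ms window is assumed to happen upstream):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hyperparameter grid as described: kernel type and penalty parameter C
param_grid = {
    "kernel": ["linear", "poly", "rbf"],
    "C": [0.1, 1, 10, 100, 1000],
}

def fit_decoder(features, labels, cv=3):
    """Cross-validated grid search for an SVM that decodes the memory
    content from network activity (hypothetical helper)."""
    search = GridSearchCV(SVC(), param_grid, cv=cv)
    search.fit(features, labels)
    return search
```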

Additionally, analogous to the experiments of Wolff et al., 2017, we trained SVMs on network activity during the encoding (STORE) period and evaluated them on the network activity during reactivation (RECALL), and vice versa. In both scenarios, the classifiers were not able to classify the memory content of the evaluation period (0.0% accuracy).

sMNIST task

Request a detailed protocol

The input consisted of sequences of 784 pixel values created by unrolling the handwritten digits of the MNIST dataset, one pixel after the other in a scanline manner as indicated in Appendix 1—figure 3A. We used 1 ms presentation time for each pixel gray value. Each of the 80 input neurons was associated with a particular threshold for the gray value, and this input neuron fired whenever the gray value crossed its threshold in the transition from the previous to the current pixel.
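The threshold-crossing encoding can be sketched as below (a hypothetical implementation; the threshold placement and the handling of crossing direction are assumptions, since only the crossing rule itself is stated):

```python
import numpy as np

def threshold_crossing_spikes(pixels, n_inputs=80):
    """Encode a gray-value sequence as spikes of threshold-crossing units.

    Input neuron k is assigned a fixed threshold theta_k; it spikes at
    step t when the gray value crosses theta_k in the transition from
    pixel t-1 to pixel t (either direction, an assumption).
    pixels: sequence of gray values in [0, 1].
    """
    thresholds = np.linspace(0.0, 1.0, n_inputs + 2)[1:-1]  # interior thresholds
    spikes = np.zeros((len(pixels), n_inputs))
    for t in range(1, len(pixels)):
        lo, hi = sorted((pixels[t - 1], pixels[t]))
        spikes[t] = (thresholds > lo) & (thresholds <= hi)
    return spikes
```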

Networks were trained for 36,000 iterations using the Adam optimizer with batch size 256. The initial learning rate was 0.01, and every 2500 iterations the learning rate was decayed by a factor of 0.8. The same firing rate regularization term was added to the loss as in the STORE-RECALL setup (see above) but with the scaling coefficient of 0.1.

All networks consisted of 220 neurons. The network models labeled LIF with SFA and ELIF in Figure 2C had 100 of their 220 neurons equipped with SFA or transient excitability, respectively; the remaining neurons were LIF neurons without adaptation. The neurons had a membrane time constant of τm=20 ms, a baseline threshold of vth=10 mV, and a refractory period of 5 ms. LIF neurons with SFA and ELIF neurons had the adaptation time constant τa=700 ms with adaptation strength β=1.8 mV and –0.9 mV, respectively. The synaptic delay was 1 ms. Synapse parameters were F=20 ms, D=700 ms, and U=0.2 for the STP-D model, and F=500 ms, D=200 ms, and U=0.2 for the STP-F model. The output of the SNN was produced by the softmax of 10 linear output neurons that received the low-pass filtered version of the spikes from all neurons in the network, as shown in the bottom row of Appendix 1—figure 3B. The low-pass filter had a time constant of 20 ms. For training the network to classify into one of the 10 classes, we used the cross-entropy loss computed between the labels and the softmax of the output neurons.

The 12AX task

Request a detailed protocol

The input for each training and testing episode consisted of a sequence of 90 symbols from the set {1,2,A,B,C,X,Y,Z}. A single episode could contain multiple occurrences of digits 1 or 2 (up to 23), each time changing the target sequence (A…X or B…Y) after which the network was supposed to output R. Each digit could be followed by up to 26 letters before the next digit appeared. More precisely, the following regular expression describes the string that was produced: [12][ABCXYZ]{1,10}((A[CZ]{0,6}X|B[CZ]{0,6}Y)|([ABC][XYZ])){1,2}. Each choice in this regular expression was made randomly.
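A hypothetical Python sampler for one segment of this regular expression is sketched below (the uniform choice probabilities are assumptions; only the structure follows the expression given above):

```python
import random
import re

# Regular expression from the text describing one generated segment
PATTERN = re.compile(r"[12][ABCXYZ]{1,10}((A[CZ]{0,6}X|B[CZ]{0,6}Y)|([ABC][XYZ])){1,2}")

def sample_segment(rng):
    """Generate one string matching PATTERN, making each choice in the
    regular expression uniformly at random (hypothetical sketch)."""
    s = rng.choice("12")
    s += "".join(rng.choice("ABCXYZ") for _ in range(rng.randint(1, 10)))
    for _ in range(rng.randint(1, 2)):
        if rng.random() < 0.5:  # a target sequence A..X or B..Y
            head, tail = rng.choice([("A", "X"), ("B", "Y")])
            s += head + "".join(rng.choice("CZ") for _ in range(rng.randint(0, 6))) + tail
        else:  # a distractor pair
            s += rng.choice("ABC") + rng.choice("XYZ")
    return s
```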

The network received spike trains from an input population of spiking neurons producing Poisson spike trains. Possible input symbols were encoded using a ‘one-hot encoding’ scheme. Each input symbol was signaled through a high firing rate of a separate subset of five input neurons for 500 ms. The output consisted of two readout neurons, one for the L and one for the R response. During each 500 ms time window, the input to these readouts was the average activity of neurons in the SNN during that time window. The final output symbol was based on which of the two readouts had the maximum value.

The neurons had a membrane time constant of τm=20 ms, a baseline threshold vth=30 mV, a refractory period of 5 ms, and synaptic delays of 1 ms. LIF neurons with SFA had an adaptation strength of β=1.7 mV, and adaptation time constants were chosen uniformly from [1,13500] ms.

A cross-entropy loss function was used to minimize the error between the softmax applied to the output layer and targets, along with a regularization term (scaled with coefficient 15) that minimized the squared difference of the average firing rate of individual neurons from a target firing rate of 10 Hz. The SNN was trained using the Adam optimizer for 10,000 iterations with a batch size of 20 episodes and a fixed learning rate of 0.001. An episode consisted of 90 steps, with between 4 and 23 tasks generated according to the task generation procedure described previously. We trained the network consisting of 200 LIF neurons (100 with and 100 without SFA) with BPTT using five different network initializations, which resulted in an average test success rate of 97.79% with a standard deviation of 0.42%.

In the experiments where the fraction of neurons with SFA varied, the network with 200 LIF neurons with SFA (i.e., all LIF neurons with SFA) achieved a success rate of 72.01% with a standard deviation of 36.15%, whereas the network with only 20 LIF neurons with SFA and 180 LIF neurons without SFA achieved a success rate of 95.39% with a standard deviation of 1.55%. The network consisting of 200 LIF neurons without SFA (i.e., all neurons without SFA) was not able to solve the task, and it achieved a success rate of 0.39% with a standard deviation of 0.037%. Each success rate reported is an average calculated over five different network initializations.

The network consisting of 100 LIF neurons with and 100 LIF neurons without SFA, trained with random e-prop, resulted in an average test success rate of 92.89% with a standard deviation of 0.75% (average over five different network initializations).

Symbolic computation on strings of symbols (Duplication/Reversal task)

Request a detailed protocol

The input to the network consisted of 35 symbols: 31 symbols represented symbols from the English alphabet {a, b, c, d, … x, y, z, A, B, C, D, E}, one symbol was for ‘end-of-string’ (EOS) ‘*’, one for the cue for the output prompt ‘?’, and two symbols denoted whether the task command was duplication or reversal. Each of the altogether 35 input symbols was given to the network in the form of higher firing activity of a dedicated population of 5 input neurons outside of the SNN (‘one-hot encoding’). This population of input neurons fired at a ‘high’ rate (200 Hz) to encode 1, and at a ‘low’ rate (2 Hz) otherwise. The network output was produced by linear readouts (one per potential output symbol, each with a low-pass filter with a time constant of 250 ms) that received spikes from neurons in the SNN (see the row ‘Output’ in Figure 4A). The final output symbol was selected using the readout that had the maximum value at the end of each 500 ms time window (a softmax instead of the hard argmax was used during training), mimicking winner-take-all (WTA) computations in neural circuits of the brain (Chettih and Harvey, 2019) in a qualitative manner.

The network was trained to minimize the cross-entropy error between the softmax applied to the output layer and targets. The loss function contained a regularization term (scaled with coefficient 5) that minimizes the squared difference of average firing rate between individual neurons and a target firing rate of 20 Hz.

The training was performed for 50,000 iterations, with a batch size of 50 episodes. We used Adam optimizer with default parameters and a fixed learning rate of 0.001. Each symbol was presented to the network for a duration of 500 ms. The primary metric we used for measuring the performance of the network was success rate, which was defined as the percentage of episodes where the network produced the full correct output for a given string, that is, all the output symbols in the episode had to be correct. The network was tested on 50,000 previously unseen strings.

The network consisted of 192 LIF neurons with SFA and 128 LIF neurons without SFA. All the neurons had a membrane time constant of τm=20 ms, a baseline threshold vth=30 mV, a refractory period of 5 ms, and a synaptic delay of 1 ms. LIF neurons with SFA in the network had an adaptation strength of β=1.7 mV. It was not necessary to assign particular values to adaptation time constants of firing thresholds of neurons with SFA; we simply chose them uniformly randomly to be between 1 ms and 6000 ms, mimicking the diversity of SFA effects found in the neocortex (Allen Institute, 2018b) in a qualitative manner. All other parameters were the same as in the other experiments. We trained the network using five different network initializations (seeds) and tested it on previously unseen strings. Average test success rate was 95.88% with standard deviation 1.39%.

Analysis of spiking data for Duplication/Reversal task

Request a detailed protocol

We used three-way ANOVA to analyze whether a neuron’s firing rate was significantly affected by task, serial position in the sequence, symbol identity, or a combination of these (similar to Lindsay et al., 2017). In such a multifactorial experiment, factors are crossed with each other, and we refer to these factors as ‘conditions.’ For two possible tasks, 5 possible positions in the input sequence, and 31 possible symbols, there are 2 × 5 × 31 = 310 different conditions. The analysis was performed on the activity of the neurons of the trained SNN during 50,000 test episodes. From each episode, a serial position from the input period was chosen randomly, and hence each episode could be used only once, that is, as one data point. This was to make sure that each entry in the three-way ANOVA was completely independent of other entries, since a neuron’s activity within an episode is highly correlated. Each data point was labeled with the corresponding triple of (task type, serial position, symbol identity). To ensure that the dataset was balanced, the same number of data points per particular combination of conditions was used, discarding all excess data points, resulting in a total of 41,850 data points – 135 data points per condition, that is, 135 repeated measurements for each condition and per neuron, but with no carryover effects for repetitions per neuron since the internal state variables of a neuron are reset between episodes. In such a scenario, neurons can be seen as technical replicates. Neurons whose average firing rate over all episodes (for the input period) was lower than 2 Hz or greater than 60 Hz were discarded from the analysis to remove large outliers. This left 279 out of the 320 neurons. To categorize a neuron as selective to one or more conditions, or a combination of conditions, we observed p-values obtained from the three-way ANOVA and calculated the effect size ω2 for each combination of conditions.
If the p-value was less than 0.001 and ω2 greater than 0.14 for a particular combination of conditions, the neuron was categorized as selective to that combination of conditions. The ω2 threshold of 0.14 was suggested by Field, 2013 to select large effect sizes. Each neuron can have a large effect size for more than one combination of conditions. Thus, the values shown in Figure 4D sum to a value greater than 1. The neuron shown in Figure 4E had the most prominent selectivity for the combination of Task × Position × Symbol, with ω2=0.394 and p<0.001. The neuron shown in Figure 4F was categorized as selective to a combination of Position × Symbol category, with ω2=0.467 and p<0.001. While the three-way ANOVA tells us if a neuron is selective to a particular combination of conditions, it does not give us the exact task/symbol/position that the neuron is selective to. To find the specific task/symbol/position that the neuron was selective to, Welch’s t-test was performed, and a particular combination with maximum t-statistic and p<0.001 was chosen to be shown in Figure 4E, F.
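The selectivity criterion can be made concrete with the textbook ω² formula (the formula itself is the standard one, e.g. from Field, 2013; the helper functions are hypothetical):

```python
def omega_squared(ss_effect, df_effect, ss_total, ms_error):
    """Effect size omega^2 for one ANOVA factor (standard formula):
    omega^2 = (SS_effect - df_effect * MS_error) / (SS_total + MS_error)."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

LARGE_EFFECT = 0.14  # threshold for a "large" effect size (Field, 2013)

def is_selective(p_value, w2, alpha=0.001):
    # A neuron is categorized as selective to a combination of conditions
    # if p < 0.001 and omega^2 > 0.14
    return p_value < alpha and w2 > LARGE_EFFECT
```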

Appendix 1

Autocorrelation-based intrinsic time scale of neurons trained on STORE-RECALL task

We wondered whether the adaptive firing threshold of LIF neurons with SFA affects the autocorrelation function of their firing activity – termed intrinsic time scale in Wasmuht et al., 2018. We tested this for an SNN consisting of 200 LIF neurons without and 200 LIF neurons with SFA that was trained to solve a one-dimensional version of the STORE-RECALL task. It turned out that during the delay between STORE and RECALL these intrinsic time constants were in the same range as those measured in the monkey cortex (see Figure 1C in Wasmuht et al., 2018). Furthermore, neurons of the trained SNN exhibited very similar distributions of these time constants (see Appendix 1—figure 1), suggesting that these intrinsic time constants are determined largely by their network inputs, and less by the neuron type.

Appendix 1—figure 1
Histogram of the intrinsic time scale of neurons trained on STORE-RECALL task.

We trained 64 randomly initialized spiking neural networks (SNNs) consisting of 200 leaky integrate-and-fire (LIF) neurons with and 200 without spike frequency adaptation (SFA) on the single-feature STORE-RECALL task. Measurements of the intrinsic time scale were performed according to Wasmuht et al., 2018 on the spiking data of SNNs solving the task after training. Averaged data of all 64 runs is presented in the histogram. The distribution is very similar for neurons with and without SFA.

Appendix 1—figure 2
Illustration of models for an inversely adapting enhanced-excitability LIF (ELIF) neuron, and for short-term synaptic plasticity.

(A) Sample spike train. (B) The resulting evolution of firing threshold for an inversely adapting neuron (ELIF neuron). (C, D) The resulting evolution of the amplitude of postsynaptic potentials (PSPs) for spikes of the presynaptic neuron for the case of a depression-dominant (STP-D: D >> F) and a facilitation-dominant (STP-F: F >> D) short-term synaptic plasticity.

sMNIST task with sparsely connected SNN obeying Dale’s law

This task has originally been used as a temporal processing benchmark for ANNs and has successfully been solved with the LSTM type of ANNs (Hochreiter and Schmidhuber, 1997). LSTM units store information in registers – like a digital computer – so that the stored information cannot be perturbed by ongoing network activity. Networks of LSTM units or variations of such units have been widely successful in temporal processing and reach the level of human performance for many temporal computing tasks.

Since LSTM networks also work well for tasks on larger time scales, for comparing SNNs with LSTM networks, we used a version of the task with 2 ms presentation time per pixel, thereby doubling the length of sequences to be classified to 1568 ms. Gray values of pixels were presented to the LSTM network simply as analog values. A trial of a trained SNN with SFA (with an input sequence that encodes a handwritten digit ‘3’ using population rate coding) is shown in Appendix 1—figure 3B. The top row of Appendix 1—figure 3B shows a version where the gray value of the currently presented pixel is encoded by population coding, through the firing probability of 80 input neurons. Somewhat better performance was achieved when each of the 80 input neurons was associated with a particular threshold for the gray value, and this input neuron fired whenever the gray value crossed its threshold in the transition from the previous to the current pixel (this input convention was used to produce the results below).

Appendix 1—figure 3
sMNIST time-series classification benchmark task.

(A) Illustration of the pixel-wise input presentation of handwritten digits for sMNIST. (B) Rows top to bottom: input encoding for an instance of the sMNIST task, network activity, and temporal evolution of firing thresholds for randomly chosen subsets of neurons in the SC-SNN, where 25% of the leaky integrate-and-fire (LIF) neurons were inhibitory (their spikes are marked in red). The light color of the readout neuron for digit ‘3’ around 1600 ms indicates that this input was correctly classified. (C) Resulting connectivity graph between neuron populations of an SC-SNN after backpropagation through time (BPTT) optimization with DEEP R on sMNIST task with 12% global connectivity limit. 

Besides a fully connected network of LIF neurons with SFA, we also tested the performance of a variant of the model, called SC-SNN, that integrates additional constraints of SNNs in the brain: it is sparsely connected (12% of possible connections are present) and consists of 75% excitatory and 25% inhibitory neurons that adhere to Dale’s law. By adapting the sparse connections with the rewiring method of Bellec et al., 2018a during BPTT training, the SC-SNN was able to perform even better than the fully connected SNN of LIF neurons with SFA. The resulting architecture of the SC-SNN is shown in Appendix 1—figure 3C. The activity of its excitatory and inhibitory neurons, as well as the time courses of the adaptive thresholds of its (excitatory) LIF neurons with SFA, is shown in Appendix 1—figure 3B. In this setup, the SFA had τa=1400 ms. The SNN with SFA reached an accuracy of 96.4% on this task, approaching the 98.0% accuracy of the artificial LSTM model.

We also trained a liquid state machine version of the SNN model with SFA where only the readout neurons are trained. This version of the network reached the accuracy of 63.24±1.48% over five independent training runs.

Google Speech Commands

We trained SNNs with and without SFA on the keyword-spotting task with the Google Speech Commands Dataset (Warden, 2018) (v0.02). The dataset consists of 105,000 audio recordings of people saying 30 different words. Fully connected networks were trained to classify audio recordings, which were clipped to 1 s length, into one of 12 classes (10 keywords, as well as 2 special classes for silence and unknown words; the remaining 20 words had to be classified as ‘unknown’). A comparison of the maximum performance of trained spiking networks against state-of-the-art artificial recurrent networks is shown in Appendix 1—table 1. Averaging over five runs, the SNN with SFA reached 90.88±0.22%, and the SNN without SFA reached 88.79±0.16% accuracy. Thus, an SNN without SFA can already solve this task quite well, but the inclusion of SFA halves the performance gap to the published state of the art in machine learning. The only other report on a solution to this task with spiking networks is Zenke and Vogels, 2020. There the authors train a network of LIF neurons using surrogate gradients with BPTT and achieve 85.3±0.3% accuracy on the full 35-class setup of the task. In this setup, the SNN with SFA reached 88.5±0.16% test accuracy.

Appendix 1—table 1
Google Speech Commands.

Accuracy of the spiking network models on the test set compared to the state-of-the-art artificial recurrent model reported in Kusupati et al., 2018. Accuracy of the best out of five simulations for spiking neural networks (SNNs) is reported. SFA: spike frequency adaptation.

Model | Test accuracy (%)
FastGRNN-LSQ (Kusupati et al., 2018) | 93.18
SNN with SFA | 91.21
SNN | 89.04

Features were extracted from the raw audio using the Mel Frequency Cepstral Coefficient (MFCC) method with 30 ms window size, 1 ms stride, and 40 output features. The network models were trained to classify the input features into one of the 10 keywords (yes, no, up, down, left, right, on, off, stop, go) or into one of two special classes for silence or unknown words (into which the remaining 20 recorded keywords were grouped). The training, validation, and test sets were assigned 80%, 10%, and 10% of the data, respectively, while making sure that audio clips from the same person stayed in the same set.

All networks were trained for 18,000 iterations using the Adam optimizer with batch size 100. The output spikes of the networks were averaged over time, and the linear readout layer was applied to those values. During the first 15,000 iterations, we used a learning rate of 0.001 and for the last 3000, we used a learning rate of 0.0001. The loss function contained a regularization term (scaled with coefficient 0.001) that minimizes the squared difference of average firing rate between individual neurons and a target firing rate of 10 Hz.

Both SNNs with and without SFA consisted of 2048 fully connected neurons in a single recurrent layer. The neurons had a membrane time constant of τm=20 ms, the adaptation time constant of SFA was τa=100 ms, and adaptation strength was β=2 mV. The baseline threshold was vth=10 mV, and the refractory period was 2 ms. The synaptic delay was 1 ms.

Delayed-memory XOR

We also tested the performance of SNNs with SFA on a previously considered benchmark task, where two items in the working memory have to be combined nonlinearly: the delayed-memory XOR task (Huh and Sejnowski, 2018). The network is required to compute the exclusive-or operation on the history of input pulses when prompted by a go-cue signal (see Appendix 1—figure 4).

Appendix 1—figure 4
Delayed-memory XOR task.

Rows top to bottom: input signal, go-cue signal, network readout, network activity, and temporal evolution of firing thresholds.

The network received on one input channel two types of pulses (up or down) and a go-cue on another channel. If the network received two input pulses since the last go-cue signal, it should generate the output ‘1’ during the next go-cue if the input pulses were different or ‘0’ if the input pulses were the same. Otherwise, if the network only received one input pulse since the last go-cue signal, it should generate a null output (no output pulse). Variable time delays are introduced between the input and go-cue pulses. The time scale of the task was 600 ms, which limited the delay between input pulses to 200 ms.
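The task rule described above can be summarized in a small target function (a sketch of the rule only; input encoding and timing are handled elsewhere):

```python
def delayed_xor_target(pulses):
    """Target output of the delayed-memory XOR task at a go-cue, given the
    pulses ('+' or '-') received since the previous go-cue.

    Two pulses: '1' if they differ, '0' if they are the same.
    One pulse (or none): null output, represented here as None.
    """
    if len(pulses) == 2:
        return "1" if pulses[0] != pulses[1] else "0"
    return None
```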

This task was solved in Huh and Sejnowski, 2018, without providing performance statistics, by using a type of neuron that has not been documented in biology – a non-leaky quadratic integrate and fire neuron. We are not aware of previous solutions by networks of LIF neurons. To compare and investigate the impact of SFA on network performance in the delayed-memory XOR task, we trained SNNs, with and without SFA, of the same size as in Huh and Sejnowski, 2018 – 80 neurons. Across 10 runs, SNNs with SFA solved the task with 95.19±0.014% accuracy, whereas SNNs without SFA converged to a lower accuracy of 61.30±0.029%.

The pulses on the two input channels were generated with 30 ms duration and the shape of a normal probability density function normalized in the range [0,1]. The pulses were added or subtracted from the baseline zero input current at appropriate delays. The go-cue was always a positive current pulse. The six possible configurations of the input pulses (+, –, ++, ––, +-, –+) were sampled with equal probability during training and testing.

Networks were trained for 2000 iterations using the Adam optimizer with batch size 256. The initial learning rate was 0.01, and every 200 iterations the learning rate was decayed by a factor of 0.8. The loss function contained a regularization term (scaled with coefficient 50) that minimizes the squared difference of the average firing rate of individual neurons from a target firing rate of 10 Hz. This regularization resulted in networks with a mean firing rate of 10 Hz where firing rates of individual neurons were spread in the range [1, 16] Hz.

Both SNNs with and without SFA consisted of 80 fully connected neurons in a single recurrent layer. The neurons had a membrane time constant of τm=20 ms, a baseline threshold vth=10 mV, and a refractory period of 3 ms. SFA had an adaptation time constant of τa=500 ms and an adaptation strength of β=1 mV. The synaptic delay was 1 ms. For training the network to classify the input into one of the three classes, we used the cross-entropy loss between the labels and the softmax of three linear readout neurons. The input to the linear readout neurons were the neuron traces that were calculated by passing all the network spikes through a low-pass filter with a time constant of 20 ms.

12AX task in a noisy network

As a control experiment aimed at testing the robustness of the solution (performance as a function of the strength of added noise), we simulated the injection of an additional noise current into all LIF neurons (with and without SFA). The previously trained network (trained without noise) was reused and tested on a test set of 2000 episodes. In each discrete time step, the noise was added to the input current $I_j(t)$ (see Equation (4) in the main text), thereby affecting the membrane voltage of the neuron:

(AE1) $I_j(t) = \sum_i W_{ji}^{\text{in}} x_i(t - d_{ji}^{\text{in}}) + \sum_i W_{ji}^{\text{rec}} z_i(t - d_{ji}^{\text{rec}}) + I_{\text{noise}},$

where $I_{\text{noise}}$ was drawn from a normal distribution with mean zero and standard deviation σ ∈ {0.05, 0.075, 0.1, 0.2, 0.5}.
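In code, the noise injection amounts to adding an independent Gaussian sample to each neuron's input current at every time step. The sketch below follows Equation (AE1); the synaptic delays $d_{ji}$ of the full model are omitted for brevity.

```python
import numpy as np

def noisy_input_current(W_in, x, W_rec, z, sigma, rng):
    """Total input current with injected Gaussian noise, per Equation (AE1).

    W_in, W_rec: input and recurrent weight matrices; x: input signals;
    z: recurrent spikes from the previous step; sigma: noise standard
    deviation. Synaptic delays are left out of this sketch.
    """
    I = W_in @ x + W_rec @ z                         # deterministic synaptic current
    return I + rng.normal(0.0, sigma, size=I.shape)  # add zero-mean Gaussian noise

rng = np.random.default_rng(0)
W_in = np.ones((5, 3))
W_rec = np.zeros((5, 5))
I = noisy_input_current(W_in, np.ones(3), W_rec, np.zeros(5), sigma=0.1, rng=rng)
```

Setting `sigma=0.0` recovers the noiseless current, which is how the baseline performance above was obtained.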

Performance of the network without noise was 97.85% (for one initialization of the network with 100 LIF neurons with SFA and 100 LIF neurons without SFA). During testing, injecting a noise current with mean zero and standard deviation σ ∈ {0.05, 0.075, 0.1, 0.2, 0.5} reduced performance to 92.65, 89.05, 80.25, 27.25, and 0.25%, respectively. Network performance thus degrades gracefully up to a noise standard deviation of about 0.1.

For an illustration of the effect of noise, see Appendix 1—figures 5 and 6. There, we compare the output spikes, adaptive threshold, and membrane voltage of one neuron with noise current to the versions without noise. The shown simulations started from exactly the same initial condition and noise with standard deviation 0.05 (0.075) was injected only into the shown neuron (other neurons did not receive any noise current). One sees that even this weak noise current produces a substantial perturbation of the voltage, adaptive threshold, and spiking output of the neuron.

Appendix 1—figure 5
Effect of a noise current with zero mean and standard deviation 0.05 added to a single neuron in the 12AX task.

Spike train of a single neuron without noise, followed by spike train in the presence of the noise, adaptive threshold of the neuron that corresponds to the spike train with no noise (shown in blue), spike train with noise present (shown in orange), and corresponding neuron voltages over the time course of 200 ms.

Appendix 1—figure 6
Effect of a noise current with zero mean and standard deviation 0.075 added to a single neuron in the network for the 12AX task.

Spike train of a single neuron without noise, followed by spike train in the presence of the noise, adaptive threshold of the neuron that corresponds to the spike train with no noise (shown in blue), spike train with noise present (shown in orange), and corresponding neuron voltages over the time course of 200 ms.

Duplication/Reversal task

A zoom-in of the rasters shown in Figure 4A (from the main text) is shown in Appendix 1—figure 7 for the time period 3–4 s.

Appendix 1—figure 7
A zoom-in of the spike raster for a trial solving Duplication task (left) and Reversal task (right).

A sample episode where the network carried out sequence duplication (left) and sequence reversal (right), shown for the time period of 3–4 s (two steps after the start of network output). Top to bottom: spike inputs to the network (subset), sequence of symbols they encode, spike activity of 10 sample leaky integrate-and-fire (LIF) neurons (without and with spike frequency adaptation [SFA]) in the spiking neural network (SNN), firing threshold dynamics for these 10 LIF neurons with SFA, activation of linear readout neurons, output sequence produced by applying argmax to them, and target output sequence.

Appendix 1—figure 8
Distribution of adaptation index from Allen Institute cell measurements (Allen Institute, 2018b).

Data availability

An implementation of the network model in TensorFlow/Python is available at https://github.com/IGITUGraz/LSNN-official (copy archived at https://archive.softwareheritage.org/swh:1:rev:a9158a3540da92ae51c46a3b7abd4eae75a2bb86). The sMNIST dataset is available at https://www.tensorflow.org/datasets/catalog/mnist. The Google Speech Commands dataset is available at https://storage.cloud.google.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz. The code for the experiments presented in the main paper is available at https://github.com/IGITUGraz/Spike-Frequency-Adaptation-Supports-Network-Computations.

References

  1. Field A. (2013). Discovering Statistics Using IBM SPSS Statistics. Sage.
  2. Kok P, de Lange FP. (2015). Predictive coding in sensory cortex. In: de Lange FP, editors. An Introduction to Model-Based Cognitive Neuroscience. Berlin, Germany: Springer. pp. 221–244. https://doi.org/10.1007/978-1-4939-2236-9_11
  3. Kruijne W, Bohte SM, Roelfsema PR, Olivers CNL. (2020). Flexible working memory through selective gating and attentional tagging. Neural Computation pp. 1–40.
  4. Lashley KS. (1951). The Problem of Serial Order in Behavior. Oxford, United Kingdom: Bobbs-Merrill.
  5. Marcus GF. (2003). The Algebraic Mind: Integrating Connectionism and Cognitive Science. Cambridge, United States: MIT Press.
  6. Mozer MC. (1989). A focused back-propagation algorithm for temporal pattern recognition. Complex Systems 3:349–381.
  7. Robinson A, Fallside F. (1987). The Utility Driven Dynamic Error Propagation Network. University of Cambridge Press.
  8. Sherman SM. (2014). The function of metabotropic glutamate receptors in thalamus and cortex. The Neuroscientist 20:136–149. https://doi.org/10.1177/1073858413478490

Decision letter

  1. Timothy O'Leary
    Reviewing Editor; University of Cambridge, United Kingdom
  2. Timothy E Behrens
    Senior Editor; University of Oxford, United Kingdom
  3. Gabrielle Gutierrez
    Reviewer; University of Washington, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Although it is clear that the brain computes using spikes, there are inherent obstacles to implementing many simple computational tasks in spiking networks, in part due to the short timescales of spiking events and synaptic potentials, which make computations involving short-term memory problematic. This paper demonstrates in simulations that a longer-timescale process – spike frequency adaptation – can facilitate such computations in spiking networks by endowing them with a cellular form of short-term memory. Interestingly, computational performance appears to benefit from heterogeneity in a network, with a subset of cells exhibiting adaptation. This demonstrates the importance of commonly overlooked physiological properties in implementing nontrivial computational tasks.

Decision letter after peer review:

Thank you for submitting your article "Spike frequency adaptation supports network computations on temporally dispersed information" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Timothy Behrens as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Gabrielle Gutierrez (Reviewer #1).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this decision letter to help you prepare a revised submission.

Essential Revisions:

1) The authors need to rewrite the results and introduction with a broad biological audience in mind. As it stands, it targets readers who work on artificial neural networks and the results are essentially benchmarks against artificial neural networks. Many of the assumptions and connections to biology are not discussed adequately. In particular, key omissions and simplifications need to be discussed: the artificial nature of training, the lack of structure in the recurrent network, the likely differences between the statistics of spiking in the model vs what might be observed in an intact brain.

2) It would be helpful to lead the results with a simplified model that illustrates the negative imprinting mechanism directly. The results as they stand use rather complex models and provide largely observational evidence for the main claims of the paper. Since performance gains over other artificial neural networks are variable according to the task, some handle on when SFA is beneficial would help.

3) The relevance of SFA (as opposed to other slow timescale mechanisms) deserves fuller attention. The SFA characteristics are extracted from intracellular slice recordings where there is no substantial background activity. In an intact animal (with realistic spiking statistics) it is questionable whether the results will hold in a network where ongoing activity could maintain the SFA recovery variable close to steady-state. If this is the case, the conclusions need to be adequately tempered.

Reviewer #1 (Recommendations for the authors):

The authors demonstrate the utility of spike frequency adaptation (SFA) in an artificial spiking neural network (SNN) for accomplishing tasks with a temporal dimension. They first show that a SNN with SFA can perform a simple store/recall task. They then test their model against tasks with increasing complexity. The authors find that in all of these tests, the SNN with SFA is able to match or outperform the most commonly implemented artificial neural network mechanisms which are also less biologically relevant. This study brings an awareness of the SFA mechanism to neural network modelers and it quantifies the improvement it brings to the performance of complex tasks. Furthermore, this study offers neurobiologists a new perspective on the SFA mechanism and its function in neural processing.

The most exciting strength of this paper is its multidisciplinary nature. It has the potential to drive innovation in the field of artificial neural networks (ANNs) by exploring how a known biological neural mechanism – SFA – can impact the performance of ANNs for solving temporally-complex tasks.

The multidisciplinary nature of this paper is a double-edged sword, however. At times there was a bit of a disconnect between the conclusions and the results. In most cases, the performance of the model against a canonical benchmark is rigorously quantified in a way that is standard in many neuroengineering papers. However, there were also multiple attempts to connect the model's performance to actual findings in the brain. I think that this is a worthwhile thing to do because it targets a broader readership that makes the study more relevant to the eLife audience; however, those comparisons were less rigorous than the ANN benchmark comparisons. This was particularly the case for the Negative Imprinting phenomenon and the emergent mixed-selectivity codes.

With regards to the rigorous benchmark comparisons presented, this paper could be made more accessible to the general neuroscience audience by putting the results and benchmark comparisons into a fuller context for those who are not as familiar with how well recent models have performed on these tasks. At times, the results showed very impressive improvement of the model over standard benchmarks while for other tasks the improvements appeared incremental. It would be helpful to have the performance improvements from the SFA model placed in the context of recent improvements in the canonical or state-of-the-art ANNs. This would allow the reader to know whether a small increase in performance represents an incremental improvement or a giant leap forward in the context of the history of ANNs.

Overall, there are a few main things I suggest the authors focus on revising. The figures and their captions could be made clearer in some places, as well as places in the main text that refer to specific parts of the figures. The other areas I think the authors should focus on are the last section of the Results, the Negative Imprinting Principle section, and the Discussion section.

The last section of the Results should probably be rewritten – especially the first paragraph. In general, in this section a more careful account of how the authors arrive at their conclusions is needed. For example, the authors state that "a large fraction of neurons were mixed-selective" but they only show or mention an analysis of two neurons (Figure 4E,F).

A few times in the text (line 19 in abstract, line 43, line 160, line 363), reference was made to the amounts of SFA observed in the brain or in particular regions, or the properties of neurons with SFA. These statements should have citations of actual studies (in the main text) to support them. If any of these claims are based on the author's own analysis of the Allen Institute data, they may want to include that in the Methods or Supplement and cite their Methods or Supplement section in the main text instead of citing the database.

It should be explicitly stated which synapses were being trained (recurrent weights, readout weights, or both) and there should be an explicit expression for output sigmoid neurons in the Methods.

An explanation or rationale for why SFA networks only had SFA in a portion of LIF neurons would be helpful.

One of the results I found most striking was the finding that only 1 network level with SFA was needed to achieve the 12AX task rather than the canonical 2 network levels. Given that the authors present the first instance of a SNN solving the 12AX task, it seems that these results are worth unpacking in greater depth in the Results and Discussion sections.

Figure 1: Perhaps include STORE and RECALL in schematic in C, or a general command module that represents analogous commands from the various tasks (i.e. duplicate, replicate, etc.).

Figure 2: The caption refers to "grey values" but it was not clear what/where exactly those are in the context of this figure. This should be clarified.

Figure 3: I found the Top to Bottom listing in the caption to be confusing because it did not refer to the labels on the figure very directly, leaving things open to interpretation. The caption should be re-written to clarify this.

Figure 4: Error in number of neurons (B-F)? Caption says 279 neurons but the main text says there were 320 neurons. Please clarify the discrepancy.

The plots in B and C should have annotations that orient the reader to the relevant parts of the task. For example, a line down the middle of Figure 4B indicating the start of duplication or reversal. In Figure 4C, it would be helpful to see where the abcde sequence falls on the time axes.

Line 52: Fairhall and others have written several reviews that could be cited here as well (Weber and Fairhall, 2019, for example).

Line 122: Replace "perfectly reproduced" with "accurately reproduced" since the output displayed in the figure is clearly not a "perfect" reproduction of the input even if it is deemed an accurate representation by whatever tolerance metric is used.

Line 129: It was confusing for me to determine which part of Figure 2 was the 3rd to last row. I suggest either labeling the rows in Figure 2 with A,B,C, etc or referring to the labels that are already there when describing a given row in the text, i.e. "The row labelled Thresholds in Figure 2 shows the temporal dynamics…".

Paragraph on line 146: I suggest re-wording this paragraph to make it clearer and more focused on your own results. After the first sentence, you should directly describe what you did to test whether the memory storage is supported or not by attractor dynamics. The discussion of the Wolff study and how it relates to your results should be incorporated into the Discussion section instead.

Line 166: I think you meant "upper five rows" not "upper four rows". In either case, I found it confusing to refer to the rows described in the text. This paragraph would be more readable if the results were relayed more directly. For example, the sentence that starts on line 166 could be re-written as: "Four fixed time constants were tested for the SNN with SFA (tau_a = 200 ms, 2 s, etc., see Table 1)."

Line 299: It is not clear how Figure 4B relates to this sentence. The citation to Figure 4B may be misplaced, or the sentence may need to be rewritten. The rest of this paragraph does not make a clear point and should be re-written.

Paragraph starting line 342 refers to points A, B, and C in Figure 2, but there are no such labels in that figure.

Line 349: Replace "currently not in the focus of attention" with "not attended".

Line 363: Given that this study did not examine the extent of SFA needed to improve performance, I suggest softening this statement. There could be a nonlinear relationship between the amount of SFA and performance on temporal computing tasks – perhaps too much SFA could diminish performance.

Line 365: This sentence should be rewritten. As it is written now, it indicates that the analysis itself was loose, but I think the intention was to point out that a strict alignment between the SFA and the task time scales is not necessary.

Reviewer #2 (Recommendations for the authors):

Salaj et al., simulate biophysically-inspired spiking neural networks that solve a range of sequential processing tasks. Understanding how neural networks perform sequential tasks is important for understanding how animals recognize patterns in time or perform sequential behaviors. Such processes are especially relevant for understanding language processing in humans. The authors show that spike-frequency adaptation can endow spiking networks with a form of working memory that allows them to solve sequential tasks.

In spike-frequency adaptation (SFA), neurons that have been active recently become harder to activate in the future, emitting fewer spikes for a given input.

Salaj et al., offers a proof-of-principle that spiking neural networks with SFA can support a dynamic memory, comparable to the rate-based Long Short-Term Memory (LSTM) networks from machine learning. Because of spike-frequency adaptation, neurons that have been active recently become harder to re-activate. A network can query a memory by checking which neurons remain silent upon recall. This model offers a compelling hypothesis for a form of short-term working memory that they term "negative imprinting".

As a computational study, this work has very few weaknesses. The model is simplified to the point where many components are biologically implausible, but this avoids excess detail that would make the results hard to interpret. Care is taken to incorporate spike-frequency adaptation in an abstract way that provides insight, but avoids strong assumptions. The memory capabilities and mechanisms that emerge in the trained models imply some testable predictions about features that one might find in biological networks.

However, the grounding of the model in biology is limited. This is understandable, since it is difficult to constrain a model using datasets that do not yet exist. The paper does not explore whether the statistics of spiking activity that emerge in the simulations are consistent with any of the operating regimes of spiking networks observed in experiments. The hypothesis is interesting, but it is likely that the models exhibit some unrealistic behaviors. It would be useful to discuss the implications of these differences. More specific, experimentally testable predictions that could falsify the model would be welcome.

In the context of machine learning and neural computation, Salaj et al., offer empirical evidence that networks with spike-frequency adaptation can compete with the LSTM networks on simple sequential tasks. However, it remains unclear whether the strategies learned in spiking networks have a deep connection to LSTMs, or whether these results scale to harder problems. But, there is no need to solve everything at once. Salaj et al., address several technical problems in constructing such networks, which will be of interest to researchers exploring similar models.

Overall, this study asked whether spike-frequency adaptation could provide a computational substrate for sequential processing. Salaj et al., show, through a range of simulations on diverse tasks, that the answer is yes (at least in theory). This provides a clear modeling foundation that can guide further experimental and modeling studies.

However, this modeling framework might be in some sense too powerful. Spike-frequency adaptation endows neurons with a slow variable. Training via backpropagation-through-time allows networks to use these slow variables for sequential memory, but it is not clear that this is unique to spike-frequency adaptation. Any slow process could be similarly harnessed, provided it is affected by the history of spiking activity, and alters a neuron's computational properties.

An important question, then, is whether biological learning rules build networks that use spike-frequency adaptation in a way that is computationally similar to these simulations. This is not a question that can be answered through simulation alone, but the principles outlined in Salaj et al. suggest experiments that could address it.

Salaj et al., borrow concepts from machine learning to construct interpretable models of spiking neural computation. Their simulated networks solve sequential tasks using spiking neurons. It is not clear that their work applies directly to biological neural networks, but it provides concrete hypotheses. This work extends our theoretical understanding of potential roles for spike-frequency adaptation in neural computation. Previous work has indicated that spike-frequency adaptation can store an 'afterimage' of recent neural activity. Some models propose that this could allow neurons to optimize communication, by sending fewer spikes for persistent inputs. Previous studies have conjectured that similar mechanisms could support sequential processing. Salaj et al., stands as an important contribution toward making such hypotheses concrete, and positing specific mechanisms.

I was pleased to receive your manuscript for review. I found the results intriguing, and liked the paper. As a computational study, the work is very impressive and solid. However, I think the presentation must be revised to reach eLife's target audience.

My understanding is that eLife targets a general audience in biology. Your work uses modeling to make a concrete hypothesis about the function of spike-frequency adaptation in neural computation. As written, it is relevant and accessible to theorists working at the intersection of neurophysiology, neural computation, and machine learning. Would it be worth elaborating on parts of the manuscript to reach out to an even broader audience of experimentalists in biology? Perhaps:

– Better orient readers as to what biophysical processes the paper addresses in the introduction.

– Discuss a specific compelling experiment that suggests a computational role for SFA, what specifically was measured, etc.

– Draw more specific connections to biology when possible.

– Highlight which aspects of the model have a clear biological interpretation.

– Be explicit about unphysiological aspects of the model (and why these assumptions are ok).

– Be critical of the simulations: are there any unphysiological behaviors? Should we be concerned?

– What ’specific’ experimental results could falsify your model? What would proving your model false imply?

– What core predictions of the model will likely survive, even if many of the details are wrong?

– Computational mechanisms are very flexible: there are likely other models that could show something similar. Is there something special about this solution that makes it more plausible than others?

Especially toward the end, further discussing the theory in the context of neurophysiology, and proposing a few concrete experiments (perhaps extending ones in prior literature), could be useful.

The library that supports these simulations is provided, but the code to generate the specific models in the paper is not. (Is this correct?) If so, is there any way to provide this? I realize research code is hard to clean up for publication, but for this study it would be welcome. It doesn't need to be a detailed tutorial, just some scripts or ipython notebooks that reproduce the major findings. (If you used notebooks, PDF snapshots of the executed notebooks might also do, if the code is visible?)

You show that a SVM cannot decode a memory trace during the delay/hold period from spiking activity. Can you also show that the SVM ‘can’ decode the memory, if it is also provided the slow adaptation variables? This will better confirm that it is spike-frequency adaptation that stores the memory trace.

Google Speech Commands are mentioned in the data availability statement and supplement, but nowhere in the main text. Can you discuss why these experiments were performed and what they tell us? Or, are they really necessary for the main result?

The "negative imprinting principle" is a major organizing concept introduced in this paper. At present, it is mentioned (with little explanation) in the introduction. The section introducing "negative imprinting" in the results is isolated. The idea is very general, and could apply to any process with slow timescales that depends on spiking history and affects neuronal dynamics. Should we simply think of "negative imprinting" as "firing rate suppression stores (some sort of?) memory trace". Is there any deeper connection with the components of a LSTM? Can you elaborate or speculate on further connections to other studies?

Can you address the concerns outlined in the public review, and above?

– Improve the presentation of the results to provide clearer and more concrete benefits to a broader audience in biology.

– Discuss limitations: nonphysical aspects of the model, and nonphysical behavior of the model.

– Be as concrete as possible about what would be needed to falsify the prediction, ideally proposing experiments (at least abstractly).

– Discuss likely failure modes of the theory, and what this would imply for neurophysiology/neural computation.

– Discuss whether the backpropagation training is too powerful (Sejnowski lab applied it to neuronal parameters, and it learned slow synapses, so I think it really is: it uses anything it can to build a memory).

More generally, the text could benefit from a rewrite to improve organization and better guide readers. There are also some parts that are hard to read, but I'm less worried about those, as they can be caught in the final copyediting, etc.

Literature connections

Connections with the literature could be better explored. Discussing these might also make the paper more accessible to a general audience, and increase impact. Here are some I can think of:

GJ Gutierrez, S Denève. (2019). Population adaptation in efficient balanced networks. eLife

Kim, R., Li, Y., and Sejnowski, T. J. (2019). Simple framework for constructing functional spiking recurrent neural networks. Proceedings of the national academy of sciences, 116(45), 22811-22820.

Li Y, Kim R, Sejnowski TJ. (2020). Learning the synaptic and intrinsic membrane dynamics underlying working memory in spiking neural network models. bioRxiv.

The connection to Gutierrez and Denève, (2019) is interesting. In their model, they use spike-frequency adaptation to implement efficient coding: in principle, neurons could reduce their firing rates for persistent stimuli. This also creates a "negative imprint", and introduces a sort of sequential dependence in the neural code that is used to improve its efficiency (reduce the number of spikes needed).

Contrasting your work with Kim et al., (2019) might be useful. They configure the time-constants and population rates so that the network occupies a rate-coding regime. The explicit treatment of SFA in your work counters this: continuous, slow-timescale variables can be stored within neurons; there is no need to create a rate-coding regime to support a slow dynamical system.

Li et al., (2020) applied machine learning to adjust neuronal time constants to create slow timescales. Their model seems to have built working memory by slowing down synaptic integration. I'm not sure this is plausible, especially since the balanced state should be associated with high conductances which reduce the membrane time constant? Internal variables that support spike-frequency adaptation might be a more robust substrate for working memory.

Also, Jing Cai's 2021 Cosyne talk is relevant. (video is online). They trained deep semantic nets, which learn neuronal responses that look like language neurons in the brain. They argue that some neurons involved in language processing have activity that can be interpreted as a predictive model. This seems to point to a larger body of research on the neural correlates of language processing that might be worth discussing. I don't know if Cai's work is directly relevant, but any work on understanding how neurons process language would be very nice to discuss in context.

Perhaps point out: your work provides a neural substrate for working memory that is much more efficient than any of the attractor-based methods developed in neural mass/field theory in the past 30 years. Is all of that work on manifold attractors and neural assemblies nonsense? What about Mark Goldman's various tricks for getting slow dynamics in rate networks (non-normal dynamics, feedback control)? If SFA is more efficient, is there any role for these other mechanisms? Population dynamics in frontal and parietal cortex do seem more consistent with mechanisms for working memory that use reverberating population activity. Can these other models do things that would be hard to implement using SFA? Or is SFA secretly an important factor in stabilizing these working-memory correlates that we observe in large populations?

Comments on the text

192: See section 2 of the Supplement for more results with sparse connectivity, enforcement of Dale's Law and comparison to ANNs.

This is very brief. When reading the main results, I was worried that these results might be incompatible with the operating regimes of biological networks. It is helpful to know whether the model still works with more realistic networks. I think it is also important to examine the statistics of population activity and see if they deviate in any obvious way from neural recordings. Is this possible?

40: SFA denotes a feature of spiking neurons where their preceding firing activity transiently increases their firing threshold.

Surely SFA refers to a broad class of mechanisms that reduce firing rate after prolonged activity, not just threshold adaptation? Gain modulation, synaptic depression, other slow changes in channel states, could lower firing rates by adjusting other physiological parameters. But, in this work threshold adaptation is used as a qualitative proxy for a range of these phenomena. I'm curious, is there any broad survey of different types of spike-frequency adaptation? Would you expect different mechanisms to have different implications for computation?

66 including the well-known 12AX task

What is 12AX? (I had never heard of it.)

70: A practical advantage of this simple model is that it can be very efficiently simulated and is amenable to gradient descent training methods.

Is there a reference that could go here for further reading on simulation/training?

72: It assumes that the firing threshold A(t) of a leaky integrate-and-fire (LIF) neuron contains a variable component a(t) that increases by a fixed amount after each of its spikes z(t).

I suspect this introduction of the LIF model is too fast for a general audience. Some experimentalists (especially students) may not have heard of the LIF model before. Equation 1 is too abrupt. Define the model equations (in Methods) here. This is also important for making it clear what variable abbreviations mean.

88: We used Backpropagation through time (BPTT) for this, which is arguably the best performing optimization method for SNNs that is currently known.

This needs a citation to support it. Please provide a citation that (1) provides a good introduction to BPTT for a general audience, and (2) justifies that this is the best-performing method known.

Figure 1-C:

The implication is that this is a purely excitatory recurrent network; so there are no I cells, and we should not think of this as similar to the balanced networks of Deneve, Machens, and colleagues, nor should we think of this as an inhibition-stabilized network in the stabilized supralinear regime? Should we believe that the operating regime of this network is at least vaguely similar to something that happens in some biological neural networks?

211: The 12AX task – which can be viewed as a simplified version of the Wisconsin Card Sorting task (Berg, 1948) – tests the capability of subjects to apply dynamically changing rules for detecting specific subsequences in a long sequence of symbols as target sequences, and to ignore currently irrelevant inputs (O'Reilly and Frank, 2006; MacDonald III, 2008).

I'm unfamiliar with what the Wisconsin Card Sorting task is. The authors seem to assume that readers will already be familiar with common sequential tasks used in psychophysics. For a general audience, more background should be provided.

Figure 3:

I like this figure a lot; it makes it easy to grasp what the 12AX task is and how the spiking net solves it. The inconsistent font sizes and text orientations for the vertical axes are triggering some moderate dyslexia, however.

270: Obviously, this question also lies at the heart of open questions about the interplay between neural codes for syntax and semantics that enable language understanding in the human brain.

This is one of the more interesting connections; Perhaps it can be hinted at earlier as well. I suspect there is more recent work that one could cite exploring this. (and this was not obvious to me)

Page 9:

The results, as written, are a bit disorienting. Parts of the results seem to flow into discussion, then switch abruptly back to more results. It seems like sections could be better organized and more clearly linked into a single linear story. Some additional text could be provided to guide the reader throughout?

277: In particular, they also produce factorial codes, where separate neurons encode the position and identity of a symbol in a sequence.

I feel like I missed this result, could it be more explicitly emphasized wherever it is presented in the preceding text?

295: For comparison, we also trained a LIF network without SFA in exactly the same way with the same number of neurons. It achieved a performance of 0.0%.

This is really nice (and nicely written!). Could the simulations presented earlier benefit from a similar comparison?

However! At this point I'm very impressed, but now wondering: is this too powerful? Can backpropagation simply learn to harness any slow variables that might be present? Should we be worried about whether a backpropagation-trained network can be clearly compared to biology?

297: A diversity of neural codes in SNNs with SFA…

Is this section still talking about the repeat/reverse task?

322: Previous neural network solutions for similarly demanding temporal computing tasks were based on artificial Long Short-Term Memory (LSTM) units. These LSTM units are commonly used in machine learning, but they cannot readily be mapped to units of neural networks in the brain.

This is nice; more citations would help. Also, hint at this in the introduction? It's a good hook.

326: We have shown that by adding to the standard model for SNNs one important feature of a substantial fraction of neurons in the neocortex, SFA, SNNs become able to solve such demanding temporal computing tasks.

Can we conjecture (1) whether this in some way implements anything similar to the gates in LSTMs? and (2) whether we should expect to see anything like this in any biological network?

Methods, lines 429-235 are key:

This explains why we see scaling factors inside the Heaviside step function earlier; these do nothing for the actual network dynamics, but scale the pseudo-gradients in some useful way. The choice to attenuate the pseudo-gradients by $\gamma=0.3$ is arbitrary and mysterious. Is it possible to provide some intuition about why and how training fails when $\gamma$ is too large? Is it possible to provide some intuition about why the pseudo-gradients should also be normalized by the current threshold of the neuron?
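For context, a sketch of the kind of triangular pseudo-derivative commonly used for the Heaviside spike function in this line of work (following Bellec et al.; the exact form and normalization in this paper's code may differ):

```python
import numpy as np

def pseudo_derivative(v, A, gamma=0.3):
    """Surrogate gradient for the spike function z = H(v - A) (a sketch).

    The Heaviside step has zero derivative almost everywhere, so BPTT
    replaces it with a triangle centered at the threshold. Dividing the
    distance to threshold by the current threshold A scales the
    triangle's support with the adaptive threshold; gamma attenuates
    the gradient magnitude.
    """
    return gamma * np.maximum(0.0, 1.0 - np.abs((v - A) / A))
```

The pseudo-derivative is maximal (equal to gamma) when the membrane potential sits exactly at the threshold, and falls linearly to zero away from it.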

Presumably the role of the adaptation variable $a$ is also included in the BPTT derivatives? Or is it enough to adapt weights and leave the effect of this implicit? I ask because it seemed like Bellec, 2020 truncated the expansion of some of the partial derivatives, but training still worked.

What training approach was used for BPTT? In my own attempts, BPTT is nearly always unstable because of successive loss of precision. Dissipative systems and a mixture of time constants on disparate scales are especially problematic. Any suggestions for us mortals?

https://doi.org/10.7554/eLife.65459.sa1

Author response

Essential Revisions:

1) The authors need to rewrite the results and introduction with a broad biological audience in mind. As it stands, it targets readers who work on artificial neural networks and the results are essentially benchmarks against artificial neural networks. Many of the assumptions and connections to biology are not discussed adequately. In particular, key omissions and simplifications need to be discussed: the artificial nature of training, the lack of structure in the recurrent network, the likely differences between the statistics of spiking in the model vs what might be observed in an intact brain.

We have rewritten the Introduction and Discussion, and also many parts of Results with a broad biological audience in mind. In particular, we have extended discussions about the relation of our work to neuroscience literature. We have changed some model parameters in order to reduce firing rates, and have added data on firing rates in our models. They appear to be now in a physiological range. In addition, we have carried out, for the 12AX task, control experiments that evaluate network performance in the presence of noise.

We have also added remarks regarding the lack of structure in the architecture of our models, in particular for the network that solves the 12AX task. We are pointing to the possibility that this may be a feature rather than a bug of our model: Cortical circuitry tends to be in our view much less structured than most handcrafted models. In particular, many simple assumptions about hierarchical processing in cortical networks tend to get into conflict with experimental data.

We have added further remarks that point out that this paper studies the computational capabilities and limitations of different types of spiking neural networks, and not how these capabilities emerge in neural networks of the brain. We also have pointed to another recent paper (Bellec et al., Nature Communications) where it had been shown that, for a variety of tasks and spiking neural network architectures, the learning performance of BPTT can be approximated quite well by e-prop, which appears to be a substantially more plausible learning method from the biological perspective. We have also added performance data for e-prop for one central task in this paper, the 12AX task. We also have pointed out at the end of the Discussion that temporal computing capabilities are likely to be to a large extent genetically encoded, rather than learned, and suggested for future work a concrete approach for testing whether this is in principle possible.

2) It would be helpful to lead the results with a simplified model that illustrates the negative imprinting mechanism directly. The results as they stand use rather complex models and provide largely observational evidence for the main claims of the paper. Since performance gains over other artificial neural networks are variable according to the task, some handle on when SFA is beneficial would help.

We agree, and hence we added a simplified model (new Figure 1D, E) that illustrates the negative imprinting principle directly, for a very simple temporal computing task. We have also added a number of control experiments where the same task was solved by different types of spiking neural networks, with and without SFA; see in particular Figure 2B and C.

3) The relevance of SFA (as opposed to other slow timescale mechanisms) deserves fuller attention. The SFA characteristics are extracted from intracellular slice recordings where there is no substantial background activity. In an intact animal (with realistic spiking statistics) it is questionable whether the results will hold in a network where ongoing activity could maintain the SFA recovery variable close to steady-state. If this is the case, the conclusions need to be adequately tempered.

We agree, and have added a new section “Comparing the contribution of SFA to temporal computing with that of other slow processes in neurons and synapses” to Results.

Regarding the remark that the SFA variable could saturate in vivo, due to bombardment by synaptic input: We have added a discussion of the implicit self-protection mechanism of the adapted state of neurons with SFA: Whenever information has been loaded into their hidden variable, the adaptive threshold, this automatically protects the neurons from firing in response to weaker inputs. But if these neurons do not fire, the value of their salient hidden variable is not affected by noise inputs. We have discussed this mechanism in particular in the context of another candidate mechanism for storing information in the hidden state of a neuron: spike-triggered increases of neuronal excitability. This mechanism has no such self-protection against noise, and also the contribution of such neurons ("ELIF neurons") to temporal computing is much less pronounced.

Reviewer #1 (Recommendations for the authors):

The authors demonstrate the utility of spike frequency adaptation (SFA) in an artificial spiking neural network (SNN) for accomplishing tasks with a temporal dimension. They first show that a SNN with SFA can perform a simple store/recall task. They then test their model against tasks with increasing complexity. The authors find that in all of these tests, the SNN with SFA is able to match or outperform the most commonly implemented artificial neural network mechanisms which are also less biologically relevant. This study brings an awareness of the SFA mechanism to neural network modelers and it quantifies the improvement it brings to the performance of complex tasks. Furthermore, this study offers neurobiologists a new perspective on the SFA mechanism and its function in neural processing.

The most exciting strength of this paper is its multidisciplinary nature. It has the potential to drive innovation in the field of artificial neural networks (ANNs) by exploring how a known biological neural mechanism – SFA – can impact the performance of ANNs for solving temporally-complex tasks.

The multidisciplinary nature of this paper is a double-edged sword, however. At times there was a bit of a disconnect between the conclusions and the results. In most cases, the performance of the model against a canonical benchmark is rigorously quantified in a way that is standard in many neuroengineering papers. However, there were also multiple attempts to connect the model's performance to actual findings in the brain. I think that this is a worthwhile thing to do because it targets a broader readership that makes the study more relevant to the eLife audience; however, those comparisons were less rigorous than the ANN benchmark comparisons. This was particularly the case for the Negative Imprinting phenomenon and the emergent mixed-selectivity codes.

With regards to the rigorous benchmark comparisons presented, this paper could be made more accessible to the general neuroscience audience by putting the results and benchmark comparisons into a fuller context for those who are not as familiar with how well recent models have performed on these tasks. At times, the results showed very impressive improvement of the model over standard benchmarks while for other tasks the improvements appeared incremental. It would be helpful to have the performance improvements from the SFA model placed in the context of recent improvements in the canonical or state-of-the-art ANNs. This would allow the reader to know whether a small increase in performance represents an incremental improvement or a giant leap forward in the context of the history of ANNs.

Overall, there are a few main things I suggest the authors focus on revising. The figures and their captions could be made clearer in some places, as well as places in the main text that refer to specific parts of the figures. The other areas I think the authors should focus on are the last section of the Results, the Negative Imprinting Principle section, and the Discussion section.

The last section of the Results should probably be rewritten – especially the first paragraph. In general, in this section a more careful account of how the authors arrive at their conclusions is needed. For example, the authors state that "a large fraction of neurons were mixed-selective" but they only show or mention an analysis of two neurons (Figure 4E,F).

The first paragraph is rewritten (a few sentences reordered). More details (what one can see from Figure 4) are given (L416-420).

The analysis we conducted to assess mixed selectivity of neurons was described, but a reference to Figure 4D, which shows the fractions of neurons selective to different combinations of variables, was missing; it has now been added (L428-434). More details about the analysis are provided in Methods, last paragraph.

A few times in the text (line 19 in abstract, line 43, line 160, line 363), reference was made to the amounts of SFA observed in the brain or in particular regions, or the properties of neurons with SFA. These statements should have citations of actual studies (in the main text) to support them. If any of these claims are based on the authors' own analysis of the Allen Institute data, they may want to include that in the Methods or Supplement and cite their Methods or Supplement section in the main text instead of citing the database.

We added a new Appendix 1—figure 8 to Appendix 1 and referenced it in the Introduction (L37) to make this finding clearer. We will also publish the code for reproducing this figure based on the Allen Institute data.

It should be explicitly stated which synapses were being trained (recurrent weights, readout weights, or both) and there should be an explicit expression for output sigmoid neurons in the Methods.

The Methods are updated to explicitly state the trained synapses (L669). The expressions for the sigmoid (used in 1D and 20D STORE-RECALL task) and softmax function (used in sMNIST, 12AX, Duplication/Reversal tasks) are added (L645).

An explanation or rationale for why SFA networks only had SFA in a portion of LIF neurons would be helpful.

We added remarks regarding this decision and new results demonstrating how the proportion of neurons with SFA in the network affects performance (L355-362).

One of the results I found most striking was the finding that only 1 network level with SFA was needed to achieve the 12AX task rather than the canonical 2 network levels. Given that the authors present the first instance of a SNN solving the 12AX task, it seems that these results are worth unpacking in greater depth in the Results and Discussion sections.

We included a paragraph discussing this (L485-494).

Figure 1: Perhaps include STORE and RECALL in schematic in C, or a general command module that represents analogous commands from the various tasks (i.e. duplicate, replicate, etc.).

We added a simplified model for illustration of a simple version of the STORE-RECALL task, (new schematic in Figure 1D), and we also use it to explain the negative imprinting principle.

Figure 2: The caption refers to "grey values" but it was not clear what/where exactly those are in the context of this figure. This should be clarified.

The caption of Figure 2 has been adapted to avoid confusion.

Figure 3: I found the Top to Bottom listing in the caption to be confusing because it did not refer to the labels on the figure very directly, leaving things open to interpretation. The caption should be re-written to clarify this.

Some labels in Figure 3 were vertical, some horizontal. This is changed now (everything horizontal), and the caption is slightly adapted.

Figure 4: Error in number of neurons (B-F)? Caption says 279 neurons but the main text says there were 320 neurons. Please clarify the discrepancy.

The plots in B and C should have annotations that orient the reader to the relevant parts of the task. For example, a line down the middle of Figure 4B indicating the start of duplication or reversal. In Figure 4C, it would be helpful to see where the abcde sequence falls on the time axes.

A sentence stating that some neurons were discarded from the analysis has been added to the caption. The criteria for removing neurons were described in the last section of Materials and methods (L860).

Figure 4B and C are modified as suggested.

Line 52: Fairhall and others have written several reviews that could be cited here as well (Weber and Fairhall, 2019, for example).

Citations added in the paragraph discussing efficient neural coding (L43-51).

Line 122: Replace "perfectly reproduced" with "accurately reproduced" since the output displayed in the figure is clearly not a "perfect" reproduction of the input even if it is deemed an accurate representation by whatever tolerance metric is used.

Suggestion implemented (L214).

Line 129: It was confusing for me to determine which part of Figure 2 was the 3rd to last row. I suggest either labeling the rows in Figure 2 with A,B,C, etc or referring to the labels that are already there when describing a given row in the text, i.e. "The row labelled Thresholds in Figure 2 shows the temporal dynamics…".

In the previous version, we had referred to the thresholds in Figure 2 to explain the negative imprinting principle. This is improved now – we describe and explain the negative imprinting principle by referring to Figure 1E (L141-163).

Paragraph on line 146: I suggest re-wording this paragraph to make it clearer and more focused on your own results. After the first sentence, you should directly describe what you did to test whether the memory storage is supported or not by attractor dynamics. The discussion of the Wolff study and how it relates to your results should be incorporated into the Discussion section instead.

Since the reference to Wolff study is the motivation for that experiment, we did not find it appropriate to remove it from this section (Results). However, the paragraph is rewritten (L190-206), stating that the main result is consistent with the experimental data, and all the details are described in Materials and methods (L749).

The Wolff study is discussed again in the Discussion (L456).

Line 166: I think you meant "upper five rows" not "upper four rows". In either case, I found it confusing to refer to the rows described in the text. This paragraph would be more readable if the results were relayed more directly. For example, the sentence that starts on line 166 could be re-written as: "Four fixed time constants were tested for the SNN with SFA (tau_a = 200 ms, 2 s, etc., see Table 1)."

Suggestion implemented (L227).

Line 299: It is not clear how Figure 4B relates to this sentence. The citation to Figure 4B may be misplaced, or the sentence may need to be rewritten. The rest of this paragraph does not make a clear point and should be re-written.

The relation of the sentence to Figure 4B is rewritten, by explicitly pointing to the parts of the figure (L416-425). The paragraph and its relation to Figure 4B-F is improved (L417).

Paragraph starting line 342 refers to points A, B, and C in Figure 2, but there are no such labels in that figure.

This was a misunderstanding by the reader (we referred to time points A, B, C, and also to Figure 2), but we removed the references to these time points A, B, C, and rephrased the section (L464).

Line 349: Replace "currently not in the focus of attention" with "not attended".

Modification implemented (L457).

Line 363: Given that this study did not examine the extent of SFA needed to improve performance, I suggest softening this statement. There could be a nonlinear relationship between the amount of SFA and performance on temporal computing tasks – perhaps too much SFA could diminish performance.

Suggestion implemented, (paragraph starting on L562).

We also included a new result in the section for the 12AX where we demonstrate the effect that the amount of SFA (fraction of neurons with SFA) has on the performance (L355-362), and state the result where too much SFA diminished the performance (L356).

Line 365: This sentence should be rewritten. As it is written now, it indicates that the analysis itself was loose, but I think the intention was to point out that a strict alignment between the SFA and the task time scales is not necessary.

Suggestion implemented (L569-571).

Reviewer #2 (Recommendations for the authors):

Salaj et al., simulate biophysically-inspired spiking neural networks that solve a range of sequential processing tasks. Understanding how neural networks perform sequential tasks is important for understanding how animals recognize patterns in time or perform sequential behaviors. Such processes are especially relevant for understanding language processing in humans. The authors show that spike-frequency adaptation can endow spiking networks with a form of working memory that allows them to solve sequential tasks.

In spike-frequency adaptation (SFA), neurons that have been active recently become harder to activate in the future, emitting fewer spikes for a given input.

Salaj et al., offers a proof-of-principle that spiking neural networks with SFA can support a dynamic memory, comparable to the rate-based Long Short-Term Memory (LSTM) networks from machine learning. Because of spike-frequency adaptation, neurons that have been active recently become harder to re-activate. A network can query a memory by checking which neurons remain silent upon recall. This model offers a compelling hypothesis for a form of short-term working memory that they term "negative imprinting".
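The negative-imprinting mechanism described above can be illustrated with a toy two-neuron simulation (a hypothetical sketch with illustrative parameters, not the authors' trained network): during STORE only neuron 0 is driven; during RECALL a uniform probe is applied, and the stored item is read out from which neuron stays silent.

```python
import numpy as np

def negative_imprinting_demo(n_store=300, n_recall=300):
    """Toy demo of negative imprinting with two adaptive-threshold neurons."""
    dt, tau_m, tau_a, beta, v_th0 = 1e-3, 20e-3, 2.0, 0.5, 1.0
    alpha_m, alpha_a = np.exp(-dt / tau_m), np.exp(-dt / tau_a)
    v, a = np.zeros(2), np.zeros(2)
    counts = {"store": np.zeros(2), "recall": np.zeros(2)}
    phases = [("store", np.array([2.0, 0.0]), n_store),    # drive neuron 0 only
              ("recall", np.array([2.0, 2.0]), n_recall)]  # uniform probe input
    for name, inp, n_steps in phases:
        for _ in range(n_steps):
            v = alpha_m * v + (1 - alpha_m) * inp
            fired = v >= v_th0 + beta * a    # compare to adaptive threshold
            counts[name] += fired
            v = np.where(fired, 0.0, v)      # reset spiking neurons
            a = (a + fired) * alpha_a        # threshold jump + slow decay
    return counts
```

During recall, the neuron that fired during storage remains silent because its threshold is still elevated; the memory is read out from the missing spikes.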

As a computational study, this work has very few weaknesses. The model is simplified to the point where many components are biologically implausible, but this avoids excess detail that would make the results hard to interpret. Care is taken to incorporate spike-frequency adaptation in an abstract way that provides insight, but avoids strong assumptions. The memory capabilities and mechanisms that emerge in the trained models imply some testable predictions about features that one might find in biological networks.

However, the grounding of the model in biology is limited. This is understandable, since it is difficult to constrain a model using datasets that do not yet exist. The paper does not explore whether the statistics of spiking activity that emerge in the simulations are consistent with any of the operating regimes of spiking networks observed in experiments. The hypothesis is interesting, but it is likely that the models exhibit some unrealistic behaviors. It would be useful to discuss the implications of these differences. More specific, experimentally-testable predictions that could falsify the model would be welcome.

In the context of machine learning and neural computation, Salaj et al., offer empirical evidence that networks with spike-frequency adaptation can compete with the LSTM networks on simple sequential tasks. However, it remains unclear whether the strategies learned in spiking networks have a deep connection to LSTMs, or whether these results scale to harder problems. But, there is no need to solve everything at once. Salaj et al., address several technical problems in constructing such networks, which will be of interest to researchers exploring similar models.

Overall, this study asked whether spike-frequency adaptation could provide a computational substrate for sequential processing. Salaj et al., show, through a range of simulations on diverse tasks, that the answer is yes (at least in theory). This provides a clear modeling foundation that can guide further experimental and modeling studies.

However, this modeling framework might be in some sense too powerful. Spike-frequency adaptation endows neurons with a slow variable. Training via backpropagation-through-time allows networks to use these slow variables for sequential memory, but it is not clear that this is unique to spike-frequency adaptation. Any slow process could be similarly harnessed, provided it is affected by the history of spiking activity, and alters a neuron's computational properties.

An important question, then, is whether biological learning rules build networks that use spike-frequency adaptation in a way that is computationally similar to these simulations. This is not a question that can be answered through simulation alone, but the principles outlined in Salaj et al. suggest experiments that could address it.

Salaj et al., borrow concepts from machine learning to construct interpretable models of spiking neural computation. Their simulated networks solve sequential tasks using spiking neurons. It is not clear that their work applies directly to biological neural networks, but it provides concrete hypotheses. This work extends our theoretical understanding of potential roles for spike-frequency adaptation in neural computation. Previous work has indicated that spike-frequency adaptation can store an 'afterimage' of recent neural activity. Some models propose that this could allow neurons to optimize communication, by sending fewer spikes for persistent inputs. Previous studies have conjectured that similar mechanisms could support sequential processing. Salaj et al., stands as an important contribution toward making such hypotheses concrete, and positing specific mechanisms.

I was pleased to receive your manuscript for review. I found the results intriguing, and liked the paper. As a computational study, the work is very impressive and solid. However, I think the presentation must be revised to reach eLife's target audience.

My understanding is that eLife targets a general audience in biology. Your work uses modeling to make a concrete hypothesis about the function of spike-frequency adaptation in neural computation. As written, it is relevant and accessible to theorists working at the intersection of neurophysiology, neural computation, and machine learning. Would it be worth elaborating on parts of the manuscript to reach out to an even broader audience of experimentalists in biology? Perhaps:

– Better orient readers as to what biophysical processes the paper addresses in the introduction.

– Discuss a specific compelling experiment that suggests a computational role for SFA, what specifically was measured, etc.

– Draw more specific connections to biology when possible.

– Highlight which aspects of the model have a clear biological interpretation.

– Be explicit about unphysiological aspects of the model (and why these assumptions are ok).

– Be critical of the simulations: are there any unphysiological behaviors? Should we be concerned?

– What 'specific' experimental results could falsify your model? What would proving your model false imply?

– What core predictions of the model will likely survive, even if many of the details are wrong?

– Computational mechanisms are very flexible: there are likely other models that could show something similar. Is there something special about this solution that makes it more plausible than others?

We have added a new section with results comparing SFA to other mechanisms (L270-299).

We also discussed the self-protection mechanism of memory content (L532-539) achieved through the means of SFA (but also, other depressing mechanisms).

Especially toward the end, further discussing the theory in the context of neurophysiology, and proposing a few concrete experiments (perhaps extending ones in prior literature), could be useful.

Suggestion implemented. The Discussion section is rewritten, and we make a few suggestions for the experiments relating to the existing ones in prior literature (L458-463, L540-546, L557-561, L562-565).

The library that supports these simulations is provided, but the code to generate the specific models in the paper is not. (Is this correct?) If so, is there any way to provide this? I realize research code is hard to clean up for publication, but for this study it would be welcome. It doesn't need to be a detailed tutorial, just some scripts or ipython notebooks that reproduce the major findings. (If you used notebooks, PDF snapshots of the executed notebooks might also do, if the code is visible?)

We are in the process of preparation and plan to release all code to reproduce all results presented in the paper and Appendix 1.

You show that an SVM cannot decode a memory trace during the delay/hold period from spiking activity. Can you also show that the SVM 'can' decode the memory, if it is also provided the slow adaptation variables? This will better confirm that it is spike-frequency adaptation that stores the memory trace.

This is possible but is somewhat trivial. We attempted to clarify this better by adding a new result and Figure 1E where we illustrate the negative imprinting principle and how slow adaptation variables are exploited for the memory.

Google Speech Commands are mentioned in the data availability statement and supplement, but nowhere in the main text. Can you discuss why these experiments were performed and what they tell us? Or, are they really necessary for the main result?

We included two paragraphs: one in the Results (L252), and one explaining the reason for considering this task (L513-516).

The "negative imprinting principle" is a major organizing concept introduced in this paper. At present, it is mentioned (with little explanation) in the introduction. The section introducing "negative imprinting" in the results is isolated. The idea is very general, and could apply to any process with slow timescales that depends on spiking history and affects neuronal dynamics. Should we simply think of "negative imprinting" as "firing rate suppression stores (some sort of?) memory trace". Is there any deeper connection with the components of a LSTM? Can you elaborate or speculate on further connections to other studies?

We significantly expanded the negative imprinting section (L141) and added new results (also Fig1D, E) to illustrate the negative imprinting principle better. We also expanded the Discussion section with a paragraph on the implications of our results (L520-546), and proposed a testable hypothesis (L458-463, L476-480). The relation to LSTM is discussed in more detail at another point in the rebuttal below, but mentioned in the Discussion (L516-519).

Can you address the concerns outlined in the public review, and above?

– Improve the presentation of the results to provide clearer and more concrete benefits to a broader audience in biology.

Introduction section is expanded (e.g., L43-51), Discussion section is rewritten, and the connection to neuroscience literature is discussed (L478, L528-546, L562-565).

– Discuss limitations: nonphysical aspects of the model, and nonphysical behavior of the model.

The LIF model, in principle, can produce neuronal activity in an unphysiological range. We state the average firing rates for our experiments, and they appear to be in a meaningful range.

In the text, we state that firing rates of neurons must lie in a physiological range of sparse activity if we want to compare them to brain activity, and we discuss the implications for understanding the brain (L506-512).

– Be as concrete as possible about what would be needed to falsify the prediction, ideally proposing experiments (at least abstractly).

– Discuss likely failure modes of the theory, and what this would imply for neurophysiology/neural computation.

Our model makes a number of concrete suggestions for further experiments which can validate the role of SFA for temporal computation. It suggests that a refined decoder that takes negative imprinting into account would be able to elucidate the transformation of stored information between time point A (encoding) and an intermediate time point C (network reactivation) in the experiment of (Wolff et al., 2017). A more direct approach would be to reproduce the results of (Wolff et al., 2017) with calcium imaging. If one could separately tag neurons with SFA using genetic tools, one could observe their particular role within temporal computing tasks. Another approach would be an ablation study where SFA is disabled via optogenetic manipulation, for which our results predict that the performance of such networks on working memory tasks would degrade.

In the Discussion, we note that mathematical models describing the processes on slower time scales are still missing (L540-546). Such models could, in principle, be used in experiments to support or contradict our results.

We added the remarks on the implications of our results (L520-539).

– Discuss whether the backpropagation training is too powerful (Sejnowski lab applied it to neuronal parameters, and it learned slow synapses, so I think it really is: it uses anything it can to build a memory).

We added a new section with results (L270-299) where we compare SFA to other mechanisms (see Figure 2 B,C) which also demonstrates that backpropagation is not too powerful and without a capable substrate (like SFA) cannot solve the tasks. Also, we did not train any of the time constants or similar parameters which could change the temporal dynamics of the models.

More generally, the text could benefit from a rewrite to improve organization and better guide readers. There are also some parts that are hard to read, but I'm less worried about those, as they can be caught in the final copyediting, etc.

Literature connections

Connections with the literature could be better explored. Discussing these might also make the paper more accessible to a general audience, and increase impact. Here are some I can think of:

GJ Gutierrez, S Denève. (2019). Population adaptation in efficient balanced networks. ELife

We have added the suggested reference (L43-51, L71).

Kim, R., Li, Y., and Sejnowski, T. J. (2019). Simple framework for constructing functional spiking recurrent neural networks. Proceedings of the national academy of sciences, 116(45), 22811-22820.

This is one of many papers presenting a novel method of converting rate networks to spiking networks working in a rate regime. Since our work is not about learning methods in spiking networks, we do not find it appropriate to discuss this specific algorithm.

Li Y, Kim R, Sejnowski TJ. (2020). Learning the synaptic and intrinsic membrane dynamics underlying working memory in spiking neural network models. bioRxiv.

This paper also discusses the learning method and learning the time constants, none of which are related to the results or message of our paper. Our conclusion is that this paper is also not appropriate related literature for our work. However, we have added more recent neuroscience-related literature regarding working memory (L537): (Mongillo et al., 2018; Kim and Sejnowski, 2021).

The connection to Gutierrez and Denève, (2019) is interesting. In their model, they use spike-frequency adaptation to implement efficient coding: in principle, neurons could reduce their firing rates for persistent stimuli. This also creates a "negative imprint", and introduces a sort of sequential dependence in the neural code that is used to improve its efficiency (reduce the number of spikes needed).

Discussed in Introduction (L43-51), but in the context of efficient, and stable neural codes. Referenced again after introducing the negative imprinting (L71).

Contrasting your work with Kim et al., (2019) might be useful. They configure the time-constants and population rates so that the network occupies a rate-coding regime. The explicit treatment of SFA in your work counters this: continuous, slow-timescale variables can be stored within neurons; there is no need to create a rate-coding regime to support a slow dynamical system.

Li et al., (2020) applied machine learning to adjust neuronal time constants to create slow timescales. Their model seems to have built working memory by slowing down synaptic integration. I'm not sure this is plausible, especially since the balanced state should be associated with high conductances which reduce the membrane time constant? Internal variables that support spike-frequency adaptation might be a more robust substrate for working memory.

Also, Jing Cai's 2021 Cosyne talk is relevant. (video is online). They trained deep semantic nets, which learn neuronal responses that look like language neurons in the brain. They argue that some neurons involved in language processing have activity that can be interpreted as a predictive model. This seems to point to a larger body of research on the neural correlates of language processing that might be worth discussing. I don't know if Cai's work is directly relevant, but any work on understanding how neurons process language would be very nice to discuss in context.

Interesting work (and related, to some extent, because of the similar approach to analyzing the selectivity of neurons), but no publication or preprint is available yet.

Perhaps point out: your work provides a neural substrate for working memory that is much more efficient than any of the attractor-based methods developed in neural mass/field theory in the past 30 years. Is all of that work on manifold attractors and neural assemblies nonsense? What about Mark Goldman's various tricks for getting slow dynamics in rate networks (non-normal dynamics, feedback control)? If SFA is more efficient, is there any role left for these other mechanisms? Population dynamics in frontal and parietal cortex do seem more consistent with mechanisms for working memory that use reverberating population activity. Can these other models do things that would be hard to implement using SFA? Or is SFA secretly an important factor in stabilizing these working-memory correlates that we observe in large populations?

Comments on the text

192: See section 2 of the Supplement for more results with sparse connectivity, enforcement of Dale's Law and comparison to ANNs.

This is very brief. When reading the main results, I was worried that these results might be incompatible with the operating regimes of biological networks. It is helpful to know whether the model still works with more realistic networks. I think it is also important to examine the statistics of population activity and see if they deviate in any obvious way from neural recordings. Is this possible?

We added the firing rates of neurons in experiments (L183, L348, L410), and a new result testing the robustness of the trained model to noise in the input (L363-370) and more details in (Appendix, section 5).

40: SFA denotes a feature of spiking neurons where their preceding firing activity transiently increases their firing threshold.

Surely SFA refers to a broad class of mechanisms that reduce firing rate after prolonged activity, not just threshold adaptation? Gain modulation, synaptic depression, other slow changes in channel states, could lower firing rates by adjusting other physiological parameters. But, in this work threshold adaptation is used as a qualitative proxy for a range of these phenomena. I'm curious, is there any broad survey of different types of spike-frequency adaptation? Would you expect different mechanisms to have different implications for computation?

We added a new section with results comparing the SFA to other mechanisms (L270).

However, the primary focus of our work was whether the SFA could implement working memory (through negative imprinting) and enable powerful temporal computations in SNNs (L61-75, L190-201, L456).

66 including the well-known 12AX task

What is 12AX? (I had never heard of it.)

We removed “well-known”, and expanded the section about this task with a couple of introductory sentences (L302-L306).

70: A practical advantage of this simple model is that it can be very efficiently simulated and is amenable to gradient descent training methods.

Is there a reference that could go here for further reading on simulation/training?

We added a citation to (Bellec et al., 2018) (L99).

72: It assumes that the firing threshold A(t) of a leaky integrate-and-fire (LIF) neuron contains a variable component a(t) that increases by a fixed amount after each of its spikes z(t).

I suspect this introduction of the LIF model is too fast for a general audience. Some experimentalists (especially students) may not have heard of the LIF model before. Equation 1 is too abrupt. Define the model equations (in Methods) here. This is also important for making it clear what variable abbreviations mean.

We have adapted the text to introduce the LIF model more gently (L99-103). However, our best judgment was not to move the equations for LIF to the main results.
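For readers new to the model, the update rules described in the quoted passage can be spelled out in discrete time (a sketch following the verbal description above and the convention of Bellec et al., 2018; the per-step decay factor $\rho$ is our shorthand):

```latex
A(t) = v_{\mathrm{th}} + \beta\, a(t), \qquad
a(t+\delta t) = \rho\, a(t) + z(t), \qquad
\rho = e^{-\delta t / \tau_a}
```

Here $v_{\mathrm{th}}$ is the baseline threshold, $\beta$ the adaptation strength, $z(t) \in \{0,1\}$ the spike output, and $\tau_a$ the adaptation time constant; each spike increments the variable component $a$, which then slowly decays back toward zero.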

88: We used Backpropagation through time (BPTT) for this, which is arguably the best performing optimization method for SNNs that is currently known.

This needs a citation to support it. Please provide a citation that (1) provides a good introduction to BPTT for a general audience, and (2) justifies that this is the best-performing method known.

References for introduction of BPTT are added (L126, L657), and for the second part, we softened the statement (L88, L126).

Figure 1-C:

The implication is that this is a purely excitatory recurrent network; so there are no I cells and we should not think of this as similar to the balanced networks of Deneve, Machens, and colleagues, nor should we think of this as an inhibition-stabilized network in the stabilized supralinear regime? Should we believe that the operating regime of this network is at least vaguely similar to something that happens in some biological neural networks?

Appendix 1 contains results for trained sparse networks respecting Dale's law, which in fact achieve even better results than the models not respecting these constraints (Appendix 1, section 2). Figure 1C does not imply that the models are purely excitatory. In fact, the weights of each synapse are learned freely, with no constraint on their sign (L642).

211: The 12AX task – which can be viewed as a simplified version of the Wisconsin Card Sorting task (Berg, 1948) – tests the capability of subjects to apply dynamically changing rules for detecting specific subsequences in a long sequence of symbols as target sequences, and to ignore currently irrelevant inputs (O'Reilly and Frank, 2006; MacDonald III, 2008).

I'm unfamiliar with the Wisconsin Card Sorting task. The authors seem to assume that readers will already be familiar with common sequential tasks used in psychophysics. For a general audience, more background should be provided.

The section is expanded, and the comparison to the Wisconsin Card Sorting is removed (L302-306).

Figure 3:

I like this figure a lot; it makes it easy to grasp what the 12AX task is and how the spiking net solves it. The inconsistent font sizes and text orientations for the vertical axes are triggering some moderate dyslexia, however.

Modified to be consistent.

270: Obviously, this question also lies at the heart of open questions about the interplay between neural codes for syntax and semantics that enable language understanding in the human brain.

This is one of the more interesting connections; Perhaps it can be hinted at earlier as well. I suspect there is more recent work that one could cite exploring this. (and this was not obvious to me)

We updated the introduction to hint at this earlier (L83-87), however, we could not find more recent work to cite.

Page 9:

The results, as written, are a bit disorienting. Parts of the results seem to flow into discussion, then switch abruptly back to more results. It seems like sections could be better organized and more clearly linked into a single linear story. Some additional text could be provided to guide the reader throughout?

This would be difficult to disentangle, because the discussion was the motivation for conducting this analysis. Parts of the paragraph are rewritten for easier readability (page 12). We hint at this task and the neural codes that emerge, and give references in the Introduction (L83-87); hence this paragraph should be easier to understand now.

277: In particular, they also produce factorial codes, where separate neurons encode the position and identity of a symbol in a sequence.

I feel like I missed this result, could it be more explicitly emphasized wherever it is presented in the preceding text?

Introduced in (L385). By a factorial code we mean that, for example, the position and identity of an item are encoded separately by different neurons.

295: For comparison, we also trained a LIF network without SFA in exactly the same way with the same number of neurons. It achieved a performance of 0.0%.

This is really nice (and nicely written!). Could the simulations presented earlier benefit from a similar comparison?

We also do that for the 12AX task (L358), STORE-RECALL and sMNIST task (Figure 2B, C, bar with the label LIF), also in Table 1 (first row).

However! At this point I'm very impressed, but now wondering: is this too powerful? Can backpropagation simply learn to harness any slow variables that might be present? Should we be worried about whether a backpropagation-trained network can be clearly compared to biology?

To address this issue we added a new Results section where we compare SFA to different mechanisms (L270), all trained with backpropagation. It is true that backpropagation can in theory harness any slow variables; however, it is important to point out that the time constants are not trained, and that some mechanisms, despite their long time constants, cannot be exploited to solve the working memory tasks (see these new results, L270).

297: A diversity of neural codes in SNNs with SFA…

Is this section still talking about the repeat/reverse task?

We slightly modified the title for this part, by adding “trained to carry out operations on sequences”, (L413). It should be clearer now that it relates to the duplicate/reverse task.

322: Previous neural network solutions for similarly demanding temporal computing tasks were based on artificial Long Short-Term Memory (LSTM) units. These LSTM units are commonly used in machine learning, but they cannot readily be mapped to units of neural networks in the brain.

This is nice; More citations; Also, hint at this in the introduction? It's a good hook.

We mention that LSTMs are state-of-the-art in machine learning (ML) (L516) and emphasize that machine learning commonly uses artificial models that are not biologically realistic (while we are able to do similar tasks with biologically realistic models).

326: We have shown that by adding to the standard model for SNNs one important feature of a substantial fraction of neurons in the neocortex, SFA, SNNs become able to solve such demanding temporal computing tasks.

Can we conjecture (1) whether this in some way implements anything similar to the gates in LSTMs? and (2) whether we should expect to see anything like this in any biological network?

The similarity to LSTMs is that both LSTM and SFA have a “slow variable”. However, that is where the similarity ends. The slow variable (cell state) of an LSTM is completely controlled by the trained gates, which govern the manipulation of the state based on the input. It can be viewed as a random access memory. In contrast, the slow variable of SFA constantly decays to its baseline and is modified by the neuron output in a single direction only, which makes it much less flexible than an LSTM.

Thus, we can say (1) no, this does not implement anything similar to the LSTM gates. The only similarity is in the improved performance on the temporal computing tasks.

Regarding (2), yes, SFA is a prominent feature of biological networks and in the discussion (L458-463, L476-480), we discuss two hypotheses that could be used to verify if the SFA is used in the biological networks in the way we predict.
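The contrast between the two slow variables can be made concrete with a small sketch (illustrative only; the parameter values, including the 2 s adaptation time constant, are assumptions for this example):

```python
import numpy as np

# Two kinds of "slow variable": the gated LSTM cell state versus the
# SFA adaptation variable, which only decays toward baseline and is only
# pushed upward by the neuron's own spikes.

def lstm_cell_update(c, f_gate, i_gate, g):
    """LSTM cell state: trained gates give random-access-like control."""
    return f_gate * c + i_gate * g

def sfa_update(a, z, rho=np.exp(-1e-3 / 2.0)):
    """SFA variable: passive decay (dt = 1 ms, tau_a = 2 s) plus spike input."""
    return rho * a + z

# With the forget gate at 1 and the input gate at 0, an LSTM can hold a
# value indefinitely...
c = 3.0
for _ in range(1000):
    c = lstm_cell_update(c, f_gate=1.0, i_gate=0.0, g=0.0)

# ...whereas the SFA variable inevitably leaks away without new spikes.
a = 3.0
for _ in range(1000):
    a = sfa_update(a, z=0.0)

print(c, a)
```

After 1000 steps (1 s of simulated time) the LSTM state is unchanged while the SFA variable has decayed toward baseline; this leaky, one-directional behavior is exactly the limited flexibility described above.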

Methods, lines 429-435 are key:

This explains why we see scaling factors inside the Heaviside step function earlier; these do nothing for the actual network dynamics, but scale the pseudo-gradients in some useful way. The choice to attenuate the pseudo-gradients by $\gamma=0.3$ is arbitrary and mysterious. Is it possible to provide some intuition about why and how training fails when $\gamma$ is too large? Is it possible to provide some intuition about why the pseudo-gradients should also be normalized by the current threshold of the neuron?

We added a citation to Bellec et al., 2018, pointing out that we used the same training method as that work and did not consider the effect of this parameter in this paper.
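For readers unfamiliar with this training trick, a minimal sketch of the pseudo-gradient under discussion (following the triangular pseudo-derivative of Bellec et al., 2018, with the attenuation $\gamma=0.3$ mentioned above; the example inputs are ours):

```python
import numpy as np

# Surrogate gradient for the non-differentiable spike function. The forward
# pass uses a hard threshold; the backward pass replaces its zero-almost-
# everywhere derivative with a triangular pseudo-derivative, attenuated by
# gamma and normalized by the current (adaptive) threshold.

def spike(v, threshold):
    """Forward pass: hard Heaviside threshold (non-differentiable)."""
    return (v >= threshold).astype(float)

def pseudo_derivative(v, threshold, gamma=0.3):
    """Backward pass: triangular surrogate for d spike / d v."""
    v_scaled = (v - threshold) / threshold   # normalize by adaptive threshold
    return gamma * np.maximum(0.0, 1.0 - np.abs(v_scaled))

v = np.linspace(-1.0, 3.0, 9)
print(spike(v, threshold=1.0))
print(pseudo_derivative(v, threshold=1.0))
```

The surrogate peaks at $\gamma$ when the membrane potential sits exactly at the threshold and falls off linearly to zero on both sides. One intuition for the normalization is that it keeps the width of this triangle proportional to the threshold, so neurons whose adaptive thresholds are strongly elevated still receive pseudo-gradients of comparable shape.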

Presumably the role of the adaptation variable $a$ is also included in the BPTT derivatives? Or is it enough to adapt weights and leave the effect of this implicit? I ask because it seemed like Bellec, 2020 truncated the expansion of some of the partial derivatives, but training still worked.

We added new results that train the networks with the method of Bellec et al., 2020 (L92, L352-354, L550-557).

What training approach was used for BPTT? In my own attempts, BPTT is nearly always unstable because of successive loss of precision. Dissipative systems and a mixture of time constants on disparate scales are especially problematic. Any suggestions for us mortals?

The code which implements this is already published by Bellec et al., 2018: https://github.com/IGITUGraz/LSNN-official

The code for this paper is in preparation and will be published alongside the paper.

https://doi.org/10.7554/eLife.65459.sa2

Article and author information

Author details

  1. Darjan Salaj

    Institute of Theoretical Computer Science, Graz University of Technology, Graz, Austria
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Anand Subramoney and Ceca Kraisnikovic
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0001-9183-5852
  2. Anand Subramoney

    Institute of Theoretical Computer Science, Graz University of Technology, Graz, Austria
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Supervision, Validation, Investigation, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Darjan Salaj and Ceca Kraisnikovic
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-7333-9860
  3. Ceca Kraisnikovic

    Institute of Theoretical Computer Science, Graz University of Technology, Graz, Austria
    Contribution
    Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Writing - original draft, Writing - review and editing
    Contributed equally with
    Darjan Salaj and Anand Subramoney
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-0906-920X
  4. Guillaume Bellec

    1. Institute of Theoretical Computer Science, Graz University of Technology, Graz, Austria
    2. Laboratory of Computational Neuroscience, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
    Contribution
    Conceptualization, Software, Supervision, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0001-7568-4994
  5. Robert Legenstein

    Institute of Theoretical Computer Science, Graz University of Technology, Graz, Austria
    Contribution
    Resources, Supervision, Funding acquisition, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-8724-5507
  6. Wolfgang Maass

    Institute of Theoretical Computer Science, Graz University of Technology, Graz, Austria
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Investigation, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    maass@igi.tugraz.at
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-1178-087X

Funding

Horizon 2020 Framework Programme (Human Brain Project 785907)

  • Darjan Salaj
  • Anand Subramoney
  • Guillaume Bellec
  • Wolfgang Maass

Horizon 2020 Framework Programme (Human Brain Project 945539)

  • Darjan Salaj
  • Anand Subramoney
  • Guillaume Bellec
  • Wolfgang Maass

Horizon 2020 Framework Programme (SYNCH project 824162)

  • Ceca Kraisnikovic
  • Robert Legenstein

FWF Austrian Science Fund (ERA-NET CHIST-ERA programme (project SMALL project number I 4670-N))

  • Robert Legenstein

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We would like to thank Pieter Roelfsema and Christopher Summerfield for detailed comments on an earlier version of the manuscript. This research was partially supported by the Human Brain Project (Grant Agreement number 785907 and 945539), the SYNCH project (Grant Agreement number 824162) of the European Union, and under partial support by the Austrian Science Fund (FWF) within the ERA-NET CHIST-ERA programme (project SMALL, project number I 4670-N). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Quadro P6000 GPU used for this research. Computations were carried out on the Human Brain Project PCP Pilot Systems at the Juelich Supercomputing Centre, which received co-funding from the European Union (Grant Agreement number 604102) and on the Vienna Scientific Cluster (VSC).

Senior Editor

  1. Timothy E Behrens, University of Oxford, United Kingdom

Reviewing Editor

  1. Timothy O'Leary, University of Cambridge, United Kingdom

Reviewer

  1. Gabrielle Gutierrez, University of Washington, United States

Publication history

  1. Received: December 4, 2020
  2. Accepted: June 29, 2021
  3. Version of Record published: July 26, 2021 (version 1)

Copyright

© 2021, Salaj et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


