Evolving interpretable plasticity for spiking networks
Abstract
Continuous adaptation allows survival in an everchanging world. Adjustments in the synaptic coupling strength between neurons are essential for this capability, setting us apart from simpler, hardwired organisms. How these changes can be mathematically described at the phenomenological level, as socalled ‘plasticity rules’, is essential both for understanding biological information processing and for developing cognitively performant artificial systems. We suggest an automated approach for discovering biophysically plausible plasticity rules based on the definition of task families, associated performance measures and biophysical constraints. By evolving compact symbolic expressions, we ensure the discovered plasticity rules are amenable to intuitive understanding, fundamental for successful communication and humanguided generalization. We successfully apply our approach to typical learning scenarios and discover previously unknown mechanisms for learning efficiently from rewards, recover efficient gradientdescent methods for learning from target signals, and uncover various functionally equivalent STDPlike rules with tuned homeostatic mechanisms.
eLife digest
Our brains are incredibly adaptive. Every day we form memories, acquire new knowledge or refine existing skills. This stands in contrast to our current computers, which typically can only perform preprogrammed actions. Our own ability to adapt is the result of a process called synaptic plasticity, in which the strength of the connections between neurons can change. To better understand brain function and build adaptive machines, researchers in neuroscience and artificial intelligence (AI) are modeling the underlying mechanisms.
So far, most work towards this goal was guided by human intuition – that is, by the strategies scientists think are most likely to succeed. Despite the tremendous progress, this approach has two drawbacks. First, human time is limited and expensive. And second, researchers have a natural – and reasonable – tendency to incrementally improve upon existing models, rather than starting from scratch.
Jordan, Schmidt et al. have now developed a new approach based on ‘evolutionary algorithms’. These computer programs search for solutions to problems by mimicking the process of biological evolution, such as the concept of survival of the fittest. The approach exploits the increasing availability of cheap but powerful computers. Compared to its predecessors (or indeed human brains), it also uses search strategies that are less biased by previous models.
The evolutionary algorithms were presented with three typical learning scenarios. In the first, the computer had to spot a repeating pattern in a continuous stream of input without receiving feedback on how well it was doing. In the second scenario, the computer received virtual rewards whenever it behaved in the desired manner – an example of reinforcement learning. Finally, in the third ‘supervised learning’ scenario, the computer was told exactly how much its behavior deviated from the desired behavior. For each of these scenarios, the evolutionary algorithms were able to discover mechanisms of synaptic plasticity to solve the new task successfully.
Using evolutionary algorithms to study how computers ‘learn’ will provide new insights into how brains function in health and disease. It could also pave the way for developing intelligent machines that can better adapt to the needs of their users.
Introduction
How do we learn? Whether we are memorizing the way to the lecture hall at a conference or mastering a new sport, somehow our central nervous system is able to retain the relevant information over extended periods of time, sometimes with ease, other times only after intense practice. This acquisition of new memories and skills manifests at various levels of the system, with changes of the interaction strength between neurons being a key ingredient. Uncovering the mechanisms behind this synaptic plasticity is a key challenge in understanding brain function. Most studies approach this monumental task by searching for phenomenological models described by symbolic expressions that map local biophysical quantities to changes of the connection strength between cells (Figure 1A,B).
Approaches to deciphering synaptic plasticity can be broadly categorized into bottomup and topdown. Bottomup approaches typically rely on experimental data (e.g., Artola et al., 1990; Dudek and Bear, 1993; Bi and Poo, 1998; Ngezahayo et al., 2000) to derive dynamic equations for synaptic parameters that lead to functional emergent macroscopic behavior if appropriately embedded in networks (e.g., Gütig et al., 2003; Izhikevich, 2007; Clopath et al., 2010). Topdown approaches proceed in the opposite direction: from a highlevel description of network function, for example, in terms of an objective function (e.g., Toyoizumi et al., 2005; Deneve, 2008; Kappel et al., 2015; Kutschireiter et al., 2017; Sacramento et al., 2018; Göltz et al., 2019), dynamic equations for synaptic changes are derived and biophysically plausible implementations suggested. Evidently, this demarcation is not strict, as most approaches seek some balance between experimental evidence, functional considerations and model complexity. However, the relative weighting of each of these aspects is usually not made explicit in the communication of scientific results, making it difficult to track by other researchers. Furthermore, the selection of specific tasks to illustrate the effect of a suggested learning rule is usually made only after the rule was derived based on other considerations. Hence, this typically does not consider competing alternative solutions, as an exhaustive comparison would require significant additional investment of human resources. A related problem is that researchers, in a reasonable effort to use resources efficiently, tend to focus on promising parts of the search space around known solutions, leaving large parts of the search space unexplored (Radi and Poli, 2003). Automated procedures, in contrast, can perform a significantly less biased search.
We suggest an automated approach to discover learning rules in spiking neuronal networks that explicitly addresses these issues. Automated procedures interpret the search for biological plasticity mechanisms as an optimization problem (Bengio et al., 1992), an idea typically referred to as metalearning or learningtolearn. These approaches make the emphasis of particular aspects that guide this search explicit and place the researcher at the very end of the process, supporting much larger search spaces and the generation of a diverse set of hypotheses. Furthermore, they have the potential to discover domainspecific solutions that are more efficient than generalpurpose algorithms. Early experiments focusing on learning in artificial neural networks (ANNs) made use of gradient descent or genetic algorithms to optimize parameterized learning rules (Bengio et al., 1990; Bengio et al., 1992; Bengio et al., 1993) or genetic programming to evolve less constrained learning rules (Bengio et al., 1994; Radi and Poli, 2003), rediscovering mechanisms resembling the backpropagation of errors (Linnainmaa, 1970; Ivakhnenko, 1971; Rumelhart et al., 1985). Recent experiments demonstrate how optimization methods can design optimization algorithms for recurrent ANNs (Andrychowicz et al., 2016), evolve machine learning algorithms from scratch (Real et al., 2020), and optimize parametrized learning rules in neuronal networks to achieve a desired function (Confavreux et al., 2020).
We extend these metalearning ideas to discover freeform, yet interpretable plasticity rules for spiking neuronal networks. The discrete nature of spikebased neuronal interactions endows these networks with rich dynamical and functional properties (e.g., Dold et al., 2019; Jordan et al., 2019; Keup et al., 2020). In addition, with the advent of nonvon Neumann computing systems based on spiking neuronal networks with online learning capabilities (Moradi et al., 2017; Davies et al., 2018; Billaudelle et al., 2019), efficient learning algorithms for spiking systems become increasingly relevant for nonconventional computing. Here, we employ genetic programming (Figure 1C,D; Koza, 2010) as a search algorithm for two main reasons. First, genetic programming can operate on analytically tractable mathematical expressions describing synaptic weight changes that are interpretable. Second, an evolutionary search does not need to compute gradients in the search space, thereby circumventing the need to estimate a gradient in nondifferentiable systems.
We successfully apply our approach, which we refer to as ‘evolvingtolearn’ (E2L), to three different learning paradigms for spiking neuronal networks: rewarddriven, errordriven, and correlationdriven learning. For the rewarddriven task, our approach discovers new plasticity rules with efficient reward baselines that perform competively and even outperform previously suggested methods. The analytic form of the resulting expressions suggests experimental approaches that would allow us to distinguish between them. In the errordriven learning scenario, the evolutionary search discovers a variety of solutions which, with appropriate analysis of the corresponding expressions, can be shown to effectively implement stochastic gradient descent. Finally, in the correlationdriven task, our method generates a variety of STDP kernels and associated homeostatic mechanisms that lead to similar networklevel behavior. This sheds new light onto the observed variability of synaptic plasticity and thus suggests a reevaluation of the reported variety in experimentally measured STDP curves with respect to their possible functional equivalence.
Our results demonstrate the significant potential of automated procedures in the search for plasticity rules in spiking neuronal networks, analogous to the transition from handdesigned to learned features that lies at the heart of modern machine learning.
Results
Setting up an evolutionary search for plasticity rules
We introduce the following recipe to search for biophysically plausible plasticity rules in spiking neuronal networks. First, we determine a task family of interest and an associated experimental setup which includes specification of the network architecture, for example, neuron types and connectivity, as well as stimulation protocols or training data sets. Crucially, this step involves defining a fitness function to guide the evolutionary search towards promising regions of the search space. It assigns high fitness to those individuals, that is, learning rules, that solve the task well and low fitness to others. The fitness function may additionally contain constraints implied by experimental data or arising from computational considerations. We determine each individual’s fitness on various examples from the given task family, for example, different input spike train realizations, to discover plasticity rules that generalize well (Chalmers, 1991; Soltoggio et al., 2018). Finally, we specify the neuronal variables available to the plasticity rule, such as lowpassfiltered traces of pre and postsynaptic spiking activity or neuromodulator concentrations. This choice is guided by biophysical considerations, for example, which quantities are locally available at a synapse, as well as by the task family, for example, whether reward or error signals are provided by the environment. We write the plasticity rule in the general form $\mathrm{\Delta}w=\eta f(\mathrm{\dots})$, where η is a fixed learning rate, and employ an evolutionary search to discover functions $f$ that lead to high fitness.
We propose to use genetic programming (GP) as an evolutionary algorithm to discover plasticity rules in spiking neuronal networks. GP applies mutations and selection pressure to an initially random population of computer programs to artificially evolve algorithms with desired behaviors (e.g., Koza, 1992). Here, we consider the evolution of mathematical expressions. We employ a specific form of GP, Cartesian genetic programming (CGP; e.g., Miller and Thomson, 2000; Miller, 2011), that uses an indexed graph representation of programs. The genotype of an individual is a twodimensional Cartesian graph (Figure 2A, top). Over the course of an evolutionary run, this graph has a fixed number of input, output, and internal nodes. The operation of each internal node is fully described by a single function gene and a fixed number of input genes. A function table maps function genes to mathematical operations (Figure 2A, bottom), while input genes determine from where this node receives data. A given genotype is decoded into a corresponding computational graph (the phenotype, Figure 2B) which defines a function $f$. During the evolutionary run, mutations of the genotype alter connectivity and node operations, which can modify the encoded function (Figure 2C). The indirect encoding of the computational graph via the genotype supports variablelength phenotypes, since some internal nodes may not be used to compute the output (Figure 2B). The size of the genotype, in contrast, is fixed, thereby restricting the maximal size of the computational graph and ensuring compact, interpretable mathematical expressions. Furthermore, the separation into genotype and phenotype allows the buildup of ‘silent mutations’, that is, mutations in the genotype that do not alter the phenotype. These silent mutations can lead to more efficient search as they can accumulate and in combination lead to an increase in fitness once affecting the phenotype (Miller and Thomson, 2000). A $\mu +\lambda $ evolution strategy (Beyer and Schwefel, 2002) drives evolution by creating the next generation of individuals from the current one via tournament selection, mutation and selection of the fittest individuals (see section Evolutionary algorithm). Prior to starting the search, we choose the mathematical operations that can appear in the plasticity rule and other (hyper)parameters of the Cartesian graph and evolutionary algorithm. For simplicity, we consider a restricted set of mathematical operations and additionally make use of nodes with constant output. After the search has completed, for example, by reaching a target fitness value or a maximal number of generations, we analyze the discovered set of solutions.
In the following, we describe the results of three experiments following the recipe outlined above.
Evolving an efficient rewarddriven plasticity rule
We consider a simple reinforcement learning task for spiking neurons. The experiment can be succinctly described as follows: $N$ inputs project to a single readout modeled by a leaky integrator neuron with exponential postsynaptic currents and stochastic spike generation (for details see section Rewarddriven learning task). We generate a finite number $M$ of frozenPoissonnoise patterns of duration $T$ and assign each of these randomly to one of two classes. The output neuron should learn to classify each of these spatiotemporal input patterns into the corresponding class using a spike/nospike code (Figure 3A,B).
The fitness $\mathcal{F}(f)$ of an individual encoding the function $f$ is measured by the mean reward per trial averaged over a certain number of experiments ${n}_{\text{exp}}$, each consisting of $n$ classification trials
where ${R}_{k}(f):=\frac{1}{n}{\sum}_{i=1}^{n}{R}_{k,i}(f)$ is the mean reward per trial obtained in experiment $k$. The reward ${R}_{k,i}$ is the reward obtained in the $i$ th trial of experiment $k$. It is one if the classification is correct and 1 otherwise. In the following, we drop the subscripts from ${R}_{k,i}$ where its meaning is clear from context. Each experiment contains different realizations of frozennoise input spike trains with associated class labels.
Previous work on rewarddriven learning (Williams, 1992) has demonstrated that in policygradientbased approaches (e.g., Sutton and Barto, 2018), subtracting a so called ‘reward baseline’ from the received reward can improve the convergence properties by adjusting the magnitude of weight updates. However, choosing a good reward baseline is notoriously difficult (Williams, 1988; Dayan, 1991; Weaver and Tao, 2001). For example, in a model for reinforcement learning in spiking neurons, Vasilaki et al., 2009 suggest the expected positive reward as a suitable baseline. Here, we consider plasticity rules which, besides immediate rewards, have access to expected rewards. These expectations are obtained as moving averages over a number of consecutive trials (here: 100 trials, i.e., 50 s) during one experiment and can either be estimated jointly ($\overline{R}\in [1,1]$) or separately for positive (${\overline{R}}^{+}\in [0,1]$) and negative (${\overline{R}}^{}\in [1,0]$) rewards, with $\overline{R}={\overline{R}}^{+}+{\overline{R}}^{}$ (for details, see section Rewarddriven learning task). In the former case, the plasticity rule takes the general form
while for seperately estimated positive and negative rewards it takes the form
In both cases, η is a fixed learning rate and ${E}_{j}^{\text{r}}(t)$ is an eligibility trace that contains contributions from the spiking activity of pre and postsynaptic neurons which is updated over the course of a single trial (for details see section Rewarddriven learning task). The eligibility trace arises as a natural consequence of policygradient methods aiming to maximize the expected reward (Williams, 1992) and is a common ingredient of rewardmodulated plasticity rules for spiking neurons (Vasilaki et al., 2009; Frémaux and Gerstner, 2015). It is a lowpass filter of the product of two terms: the first is positive if the neuron was more active than expected by synaptic input; this can happen because the neuronal output is stochastic, to promote exploration. The second is a lowpass filter of presynaptic activity. A simple plasticity rule derived from maximizing the expected rewards would, for example, change weights according to the product of the received reward and the eligibility trace: $\mathrm{\Delta}{w}_{j}=R{E}_{j}^{\text{r}}$. If by chance a neuron is more active than expected, and the agent receives a reward, all weights of active afferents are increased, making it more likely for the neuron to fire in the future given identical input. Reward and eligibility trace values at the end of each trial ($t=T$) are used to determine synaptic weight changes. In the following, we drop the time argument of ${E}_{j}^{\text{r}}$ for simplicity. Using CGP with three ($R$, ${E}_{j}^{\text{r}},\overline{R}$), or four inputs ($R$, ${E}_{j}^{\text{r}},{\overline{R}}^{+},{\overline{R}}^{}$), respectively, we search for plasticity rules that maximize the fitness $\mathcal{F}(f)$ (Equation 1).
None of the evolutionary runs with access to the expected reward ($\overline{R}$) make use of it as a reward baseline (see Appendix section Full evolution data for different CGP hyperparameter choices for full data). Some of them discover highperforming rules identical to that suggested by Urbanczik and Senn, 2009: $\mathrm{\Delta}{w}_{j}=\eta (R1){E}_{j}^{\text{r}}$ (LR0, $\mathcal{F}=216.2$, Figure 3C,D). This rule uses a fixed base line, the maximal reward (${R}_{\text{max}}=1$), rather than the expected reward. Some runs discover a more sophisticated variant of this rule with a term that decreases the effective learning rate for negative rewards as the agent improves, that is, when the expected reward increases: $\mathrm{\Delta}{w}_{j}=\eta (1+R\overline{R})(R1){E}_{j}^{\text{r}}$ (LR1, $\mathcal{F}=234.2$, Figure 3C,D; see also Appendix section Causal and homeostatic terms over trials). Using this effective learningrate, this rule achieve higher fitness than the vanilla formulation at the expense of requiring the agent to keep track of the expected reward.
Using the expected reward as a baseline, for example, $\mathrm{\Delta}{w}_{j}=\eta (R\overline{R}){E}_{j}^{\text{r}}$, is unlikely to yield highperforming solutions: an agent may get stuck in weight configurations in which it continuously receives negative rewards, yet, as it is expecting negative rewards, does not significantly change its weights. This intuition is supported by our observation that none of the highperforming plasticity rules discovered by our evolutionary search make use of such a baseline, in contrast to previous studies (e.g., Frémaux and Gerstner, 2015). Subtracting the maximal reward, in contrast, can be interpreted as an optimistic baseline (cf. Sutton and Barto, 2018, Ch2.5), which biases learning to move away from weight configurations that result in negative rewards, while maintaining weight configurations that lead to positive rewards. However, a detrimental effect of such an optimistic baseline is that learning is sparse, as it only occurs upon receiving negative rewards, an assumption at odds with behavioral evidence.
In contrast, evolutionary runs with access to separate estimates of the negative and positive rewards discover plasticity rules with efficient baselines, resulting in increased fitness (see Appendix section Full evolution data for different CGP hyperparameter choices for the full data). In the following, we discuss four such highperforming plasticity rules with at least 10% higher fitness than LR0 (Figure 3D). We first consider the rule (LR2, $\mathcal{F}=242.0$, Figure 3D)
where we introduced the expected absolute reward ${\overline{R}}_{\text{abs}}:={\overline{R}}^{+}{\overline{R}}^{}={\overline{R}}^{+}+{\overline{R}}^{}$, with ${\overline{R}}_{\text{abs}}\in [0,1]$. Note the difference to the expected reward $\overline{R}={\overline{R}}^{+}+{\overline{R}}^{}$. Since the absolute magnitude of positive and negative rewards is identical in the considered task, ${\overline{R}}_{\text{abs}}$ increases in each trial, starting at zero and slowly converging to one with a time constant of 50 s. Instead of keeping track of the expected reward, the agent can thus simply optimistically increase its baseline with each trial. Behind this lies the, equally optimistic, expectation that the agent improves its performance over trials. Starting out as $R{E}_{j}^{\text{r}}$ and converging to $(R1){E}_{j}^{\text{r}}$ this rule combines efficient learning from both positive and negative rewards initially, with improved convergence towards successful weight configuration during late learning by a rewarddependent modulation of the effective learning rate (see also Appendix section Causal and homeostatic terms over trials). Note that such a strategy may lead to issues with un or relearning. This may be overcome by the agent resetting the expected absolute reward ${\overline{R}}_{\text{abs}}$ upon encountering a new task, similar to a ‘novelty’ signal.
Furthermore, our algorithm discovers a variation of this rule (LR3, $\mathcal{F}=256.0$, Figure 3D), which replaces η with $\eta /(1+{\overline{R}}^{+})$ to decrease the magnitude of weight changes in regions of the weight space associated with high performance. This can improve convergence properties.
We next consider the rule (LR4, $\mathcal{F}=247.2$, Figure 3D):
This rule has the familiar form of LR0 and LR1, with an additional homeostatic term. Due to the prefactors $R1$, this rule only changes weights on trials with negative reward. Initially, the expected reward ${\overline{R}}^{+}$ is close to zero and the homeostatic term results in potentiation of all synapses, causing more and more neurons to spike. In particular, if initial weights are chosen poorly, this can make learning more robust. As the agent improves and the expected positive rewards increases, the homeostatic term becomes negative (see also Appendix section Causal and homeostatic terms over trials). In this regime, it can be interpreted as pruning all weights until only those are left that do not lead to negative rewards. This term can hence be interpreted as an adapting action baseline (Sutton and Barto, 2018).
Finally, we consider the rule (LR5, $\mathcal{F}=254.8$, Figure 3D):
To analyze this seemingly complex rule, we consider the expression for trials with positive and trials with negative reward separately:
Both expressions contain a ‘causal’ term depending on pre and postsynaptic activity via the eligibility trace, as well as, and a ‘homeostatic’ term. Aside from the constant scaling factor, the causal term of $\mathrm{\Delta}{w}_{j}^{+}$ is identical to LR2 (Equation 4), that is, it only causes weight changes early during learning, and converges to zero for later times. Similarly, the causal term of $\mathrm{\Delta}{w}_{j}^{}$ is initially identical to that of LR2 (Equation 4), decreasing weights for connections contributing to wrong decisions. However it increases in magnitude as the agent improves and the expected reward increases. The homeostatic term of $\mathrm{\Delta}{w}_{j}^{+}$ is potentiating, similarly to LR4 (Equation 5): it encourages spiking by increasing all synaptic weights during early learning and decreases over time. The homeostatic term for negative rewards is also potentiating, but does not vanish for long times unless the agent is performing perfectly (${\overline{R}}^{}\to 0$). Over time, this plasticity rule hence reacts less and less to positive rewards, while increasing weight changes for negative rewards. The rewardmodulated potentiating homeostatic mechanisms can prevent synaptic weights from vanishing and thus encourage exploration for challenging scenarios in which the agent mainly receives negative rewards.
In conclusion, by making use of the separately estimated expected negative and positive rewards in precise combinations with the eligibility trace and the instantaneous reward, our evolvingtolearn approach discovered a variety of rewardbased plasticity rules, many of them outperforming previously known solutions (e.g., Urbanczik and Senn, 2009). The evolution of closedform expressions allowed us to analyze the computational principles that allow these newly discovered rules to achieve high fitness. This analysis suggests new mechanisms for efficient learning, for example from ‘novelty’ and via rewardmodulated homeostatic mechanisms. Each of these new hypotheses for rewarddriven plasticity rules makes specific predictions about behavioral and neuronal signatures that potentially allow us to distinguish between them. For example LR2, LR3, and LR5 suggest that agents initially learn both from positive and negative rewards, while later they mainly learn from negative rewards. In particular the initial learning from positive rewards distinguishes these hypotheses from LR0, LR1, and LR4, and previous work (Urbanczik and Senn, 2009). As LR2 does not make use of the, separately estimated, expected rewards, it is potentially employed in settings in which such estimates are difficult to obtain. Furthermore, LR4 and LR5 suggest that precisely regulated homeostatic mechanisms play a crucial role besides weight changes due to pre and postsynaptic activity traces. During early learning, both rules implement potentiating homeostatic mechanisms triggered by negative rewards, likely to explore many possible weight configurations which may support successful behavior. During late learning, LR4 suggests that homeostatic changes become depressing, thus pruning unnecessary or even harmful connections. In contrast, they remain positive for LR5, potentially avoiding catastrophic dissociation between inputs and outputs for challenging tasks. Besides experimental data from the behavioral and neuronal level, different artificial rewardlearning scenarios could further further select for strengths or against weaknesses of the discovered rules. Furthermore, additional considerations, for example achieving small variance in weight updates (Williams, 1986; Dayan, 1991), may lead to particular rules being favored over others. We thus believe that our new insights into reinforcement learning are merely a forerunner of future experimental and theoretical work enabled by our approach.
Evolving an efficient errordriven plasticity rule
We next consider a supervised learning task in which a neuron receives information about how far its output is from the desired behavior, instead of just a scalar reward signal as in the previous task. The widespread success of this approach in machine learning highlights the efficacy of learning from errors compared to correlation or rewarddriven learning (Goodfellow et al., 2016). It has therefore often been hypothesized that evolution has installed similar capabilities in biological nervous systems (see, e.g. Marblestone et al., 2016; Whittington and Bogacz, 2019).
Urbanczik and Senn, 2014 introduced an efficient, flexible, and biophysically plausible implementation of errordriven learning via multicompartment neurons. For simplicity, we consider an equivalent formulation of this learning principle in terms of two point neurons modeled as leaky integrator neurons with exponential postsynaptic currents and stochastic spike generation. One neuron mimics the somatic compartment and provides a teaching signal to the other neuron acting as the dendritic compartment. Here, the difference between the teacher and student membrane potentials drives learning:
where $v$ is the teacher potential, $u$ the student membrane potential, and η a fixed learning rate. ${\overline{s}}_{j}(t)=(\kappa *{s}_{j})(t)$ represents the the presynaptic spike train s_{j} filtered by the synaptic kernel κ. Equation 7 can be interpreted as stochastic gradient descent on an appropriate cost function (Urbanczik and Senn, 2014) and can be extended to support credit assignment in hierarchical neuronal networks (Sacramento et al., 2018). For simplicity, we assume the student has direct access to the teacher’s membrane potential, but in principle one could also employ proxies such as firing rates as suggested in Pfister et al., 2010; Urbanczik and Senn, 2014.
We consider a onedimensional regression task in which we feed random Poisson spike trains into the two neurons (Figure 4A).
The teacher maintains fixed input weights while the student’s weights should be adapted over the course of learning such that its membrane potential follows the teacher’s (Figure 4B,C). The fitness $\mathcal{F}(f)$ of an individual encoding the function $f$ is measured by the root meansquared error between the teacher and student membrane potential over the simulation duration $T$, excluding the initial 10%, averaged over ${n}_{\text{exp}}$ experiments:
Each experiment consists of different input spike trains and different teacher weights. The general form of the plasticity rule for this errordriven learning task is given by:
Using CGP with three inputs ($v,u,{\overline{s}}_{j}$), we search for plasticity rules that maximize the fitness $\mathcal{F}(f)$.
Starting from low fitness, about half of the evolutionary runs evolve efficient plasticity rules (Figure 4D) closely matching the performance of the stochastic gradient descent rule of Urbanczik and Senn, 2014. While two runs evolve exactly Equation 7, other solutions with comparable fitness are discovered, such as
At first sight, these rules may appear more complex, but for the considered parameter regime under the assumptions $v\approx u;v,u\gg 1$, we can write them as (see Appendix section Errordriven learning – simplification of the discovered rules):
with a multiplicative constant ${c}_{1}=\mathcal{O}(1)$ and a negligible additive constant c_{2}. Elementary manipulations of the expressions found by CGP thus demonstrate the similarity of these superficially different rules to Equation 7. Consequently, they can be interpreted as approximations of gradient descent. The constants generally fall into two categories: finetuning of the learning rate for the set of task samples encountered during evolution (c_{1}), which could be responsible for the slight increase in performance, and factors that have negligible influence and would most likely be pruned over longer evolutionary timescales (c_{2}). This pruning could be accelerated, for example, by imposing a penalty on the model complexity in the fitness function, thus preferentially selecting simpler solutions.
In conclusion, the evolutionary search rediscovers variations of a humandesigned efficient gradientdescentbased learning rule for the considered errordriven learning task. Due to the compact, interpretable representation of the plasticity rules we are able to analyze the set of highperforming solutions and thereby identify phenomenologically identical rules despite their superficial differences.
Evolving an efficient correlationdriven plasticity rule
We now consider a task in which neurons do not receive any feedback from the environment about their performance but instead only have access to correlations between pre and postsynaptic activity. Specifically, we consider a scenario in which an output neuron should discover a repeating frozennoise pattern interrupted by random background spikes using a combination of spiketimingdependent plasticity and homeostatic mechanisms. Our experimental setup is briefly described as follows: $N$ inputs project to a single output neuron (Figure 5A).
The activity of all inputs is determined by a Poisson process with a fixed rate. A frozennoise activity pattern of duration $T}_{\text{pattern}}\phantom{\rule{thinmathspace}{0ex}}\mathrm{m}\mathrm{s$ is generated once and replayed every ${T}_{\text{inter}}\mathrm{ms}$ (Figure 5B) while inputs are randomly spiking in between.
We define the fitness $\mathcal{F}(f)$ of an individual encoding the function $f$ by the minimal average signaltonoise ratio ($\mathrm{SNR}$) across ${n}_{\text{exp}}$ experiments:
The signaltonoise ratio ${\mathrm{SNR}}_{k}$, following Masquelier, 2018, is defined as the difference between the maximal free membrane potential during pattern presentation averaged over multiple presentations ($\u27e8{u}_{k,i,\text{max}}\u27e9$) and the mean of the free membrane potential in between pattern presentations ($\u27e8{u}_{k,\text{inter}}\u27e9$) divided by its variance ($\mathrm{Var}({v}_{k,\mathrm{inter}})$):
The free membrane potential is obtained in a separate simulation with frozen weights by disabling the spiking mechanism for the output neuron. This removes measurement noise in the signaltonoise ratio arising from spiking and subsequent membranepotential reset. Each experiment consists of different realizations of a frozennoise pattern and background spiking.
We evolve learning rules of the following general form, which includes a dependence on the current synaptic weight in line with previously suggested STDP rules (Gütig et al., 2003):
Here, ${E}_{j}^{\text{c}}:={\mathrm{e}}^{\mathrm{\Delta}{t}_{j}/\tau}$ represents an eligibility trace that depends on the relative timing of post and presynaptic spiking ($\mathrm{\Delta}{t}_{j}={t}_{\mathrm{post}}{t}_{\mathrm{pre},j}$) and is represented locally in each synapse (e.g., Morrison et al., 2008). η represents a fixed learning rate. The synaptic weight is bound such that ${w}_{j}\in [0,1]$. We additionally consider weightdependent homeostatic mechanisms triggered by pre and postsynaptic spikes, respectively. These are implemented by additional functions of the general form:
Weight changes are determined jointly by Equation 15 and Equation 16 as $\mathrm{\Delta}{w}_{j}=\mathrm{\Delta}{w}_{j}^{\text{STDP}}+\mathrm{\Delta}{w}^{\mathrm{hom}}$. Using CGP, we search for functions ${f}_{\text{dep}}$, ${f}_{\text{fac}}$, ${f}_{\text{pre}}^{\text{hom}}$, and ${f}_{\text{post}}^{\text{hom}}$ that maximize the fitness $\mathcal{F}({f}_{\text{dep}},{f}_{\text{fac}})$ (Equation 13).
As a baseline we consider a rule described by Masquelier, 2018 (Figure 5C). It is a simple additive spiketimingdependent plasticity (STDP) rule that replaces the depression branch of traditional STDP variants with a postsynaptically triggered constant homeostatic term ${w}^{\mathrm{hom}}<0$ (Kempter et al., 1999). The synaptic weight of the projection from input $j$ changes according to (Figure 5G):
with homeostatic mechanisms:
To illustrate the result of synaptic plasticity following Equation 17 and Equation 18, we consider the evolution of the membrane potential of an output neuron over the course of learning (Figure 5C). While the target neuron spikes randomly at the beginning of learning, its membrane potential finally stays subthreshold in between pattern presentations and crosses the threshold reliably upon pattern presentation.
After 2000 generations, half of the runs of the evolutionary algorithm discover highfitness solutions (Figure 5D). These plasticity rules lead to synaptic weight configurations which cause the neuron to reliably detect the frozennoise pattern. From these wellperforming learning rules, we pick two representative examples (Figure 5D,E) to analyze in detail. Learning rule 1 (LR1, Figure 5D) is defined by the following equations:
Learning rule 2 (LR2, Figure 5E) is defined by the following equations:
The form of these discovered learning rules and associated homeostatic mechanisms suggests that they use distinct strategies to detect the repeated spatiotemporal pattern. LR1 causes potentiation for small time differences, regardless of whether they are causal or anticausal (note that $({w}_{j}1)\ge 0$ since ${w}_{j}\in [0,1]$). In the Hebbian spirit, this learning rule favors correlation between presynaptic and postsynaptic firing. Additionally, it potentiates synaptic weights upon presynaptic spikes, and depresses them for each postsynaptic spike. In contrast, LR2 implements a similar strategy as the learning rule of Masquelier, 2018: it potentiates synapses only for small, positive (causal) time differences. Additionally, however, it pronouncedly punishes anticausal interactions. Similarly to LR1, its homeostatic component potentiates synaptic weights upon presynaptic spikes, and depresses them for each postsynaptic spike.
Note how both rules reproduce important components of experimentally established STDP traces (e.g., Caporale and Dan, 2008). Despite their differences both in the form of the STDP kernel as well as the associated homeostatic mechanisms, both rules lead to high fitness, that is, comparable systemlevel behavior.
Unlike the classical perception of homeostatic mechanisms as merely maintaining an ideal working point of neurons (Davis and Bezprozvanny, 2001), in both discovered plasticity rules these components support the computational goal of detecting the repeated pattern. By potentiating large weights more strongly than small weights, the presynaptically triggered homeostatic mechanisms support the divergence of synaptic weights into strong weights, related to the repeated pattern, and weak ones, providing background input. This observation suggests that homeostatic mechanisms and STDP work hand in hand to achieve desired functional outcomes, similar to homeostatic terms in stabilized Hebbian rules (Oja, 1982; Miller and MacKay, 1994). Experimental approaches hence need to take both factors into account and variations in observed STDP curves should be reconsidered from a point of functional equivalence when paired with data on homeostatic changes.
In conclusion, for the correlationdriven task, the evolutionary search discovered a wide variety of plasticity rules with associated homeostatic mechanisms supporting successful task learning, thus enabling new perspectives for learning in biological substrates.
Discussion
Uncovering the mechanisms of learning via synaptic plasticity is a critical step toward understanding brain (dys)function and building truly intelligent, adaptive machines. We introduce a novel approach to discover biophysically plausible plasticity rules in spiking neuronal networks. Our metalearning framework uses genetic programming to search for plasticity rules by optimizing a fitness function specific to the respective task family. Our evolvingtolearn approach discovers highperforming solutions for various learning paradigms, rewarddriven, errordriven, and correlationdriven learning, yielding new insights into biological learning principles. Moreover, our results from the rewarddriven and correlationdriven task families demonstrate that homeostatic terms and their precise interation with plasticity play an important role in shaping network function, highlighting the importance of considering both mechanisms jointly.
The experiments considered here were mainly chosen due to their simplicity and prior knowledge about corresponding plasticity rules that provided us with a highperformance reference for comparison. Additionally, in each experiment, we restricted ourselves to a constrained set of possible inputs to the plasticity rule. Here, we chose quantities which have been previously shown to be linked to synaptic plasticity in various learning paradigms, such as reward, lowpass filtered spike trains, and correlations between pre and postsynaptic activities. This prior knowledge avoids requiring the evolutionary algorithm to rediscover these quantities but limits the search space, thus potentially excluding other efficient solutions.
A key point of E2L is the compact representation of the plasticity rules. We restrict the complexity of the expressions by three considerations. First, we assume that effective descriptions of weight changes can be found that are not unique to each individual synapse. This is a common assumption in computational neuroscience and based on the observation that nature must have found a parsimonious encoding of brain structure, as not every connection in the brain can be specified in the DNA of the organism (Zador, 2019); rather, genes encode general principles by which the neuronal networks and subnetworks are organized and reorganized (Risi and Stanley, 2010). Our approach aims at discovering such general principles for synaptic plasticity. Second, physical considerations restrict the information available to the plasticity rule to local quantities, such as pre and postsynaptic activity traces or specific signals delivered via neuromodulators (e.g., Cox and Witten, 2019; Miconi et al., 2020). Third, we limit the maximal size of the expressions to keep the resulting learning rules interpretable and avoid overfitting.
We explicitly want to avoid constructing an opaque system that has high task performance but does not allow us to understand how the network structure is shaped over the course of learning. Since we obtain analytically tractable expressions for the plasticity rule, we can analyze them with conventional methods, in contrast to approaches representing plasticity rules with ANNs (e.g., Risi and Stanley, 2010; Orchard and Wang, 2016; Bohnstingl et al., 2019), for which it is challenging to fully understand their macroscopic computation. This analysis generates intuitive understanding, facilitating communication and humanguided generalization from a set of solutions to different network architectures or task domains. In the search for plasticity rules suitable for physical implementations in biological systems, these insights are crucial as the identified plasticity mechanisms can serve as building blocks for learning rules that generalize to the actual challenges faced by biological agents. Rather than merely applying the discovered rules to different learning problems, researchers may use the analytic expressions and prior knowledge to distill general learning principles – such as the computational role of homeostasis emerging from the present work – and combine them in new ways to extrapolate beyond the task families considered in the evolutionary search. Therefore, our evolvingtolearn approach is a new addition to the toolset of the computational neuroscientist in which human intuition is paired with efficient search algorithms. Moreover, simple expressions highlight the key interactions between the local variables giving rise to plasticity, thus providing hints about the underlying biophysical processes and potentially suggesting new experimental approaches.
From a different perspective, while the learning rules found in the experiments described above were all evolved from random expressions, one can also view the presented framework as a hypothesistesting machine. Starting from a known plasticity rule, our framework would allow researchers to address questions like: assuming the learning rule would additionally have access to variable $x$, could this be incorporated into the weight updates such that learning would improve? The automated procedure makes answering such questions much more efficient than a humanguided manual search. Additionally, the framework is suitable to find robust biophysically plausible approximations for complex learning rules containing quantities that might be nonlocal, difficult to compute, and/or hard to implement in physical substrates. In particular, multiobjective optimization is suitable to evolve a known, complex rule into simpler versions while maintaining high task performance. Similarly, one could search for modifications of general rules that are purposefully tuned to quickly learn within a specific task family, outperforming more general solutions. In each of these cases, prior knowledge about effective learning algorithms provides a starting point from which the evolutionary search can discover powerful extensions.
The automated search can discover plasticity rules for a given problem that exploit implicit assumptions in the task. It therefore highlights underconstrained searches, be this due to scarcity of biological data, the simplicity of chosen tasks or the omission of critical features in the task design. For instance, without asserting equal average spike rates of background and pattern neurons in the correlationdriven task, one could discover plasticity rules that exploit the rate difference rather than the spatiotemporal structure of the input.
Evolved Plastic Artificial Neural Networks (EPANNs; e.g., Soltoggio et al., 2018) and in particular adaptive HyperNEAT (Risi and Stanley, 2010), represent an alternative approach to designing plastic neural networks. In contrast to our method, however, these approaches include the network architecture itself into the evolutionary search, alongside synaptic plasticity rules. While this can lead to highperformance solutions due to a synergy between network architecture and plasticity, this interplay has an important drawback, as in general it is difficult to tease apart the contribution of plasticity from that of network structure to high task performance (cf. Gaier and Ha, 2019). In addition, the distributed, implicit representation of plasticity rules in HyperNEAT can be difficult to interpret, which hinders a deeper understanding of the learning mechanisms. In machinelearningoriented applications, this lack of credit assignment is less of an issue. For research into plasticity rules employed by biological systems, however, it presents a significant obstacle.
Future work needs to address a general issue of any optimization method: how can we systematically counter overfitting to reveal general solutions? A simple approach would increase the number of sample tasks during a single fitness evaluation. However, computational costs increase linearly in the number of samples. Another technique penalizes the complexity of the resulting expressions, for example, proportional to the size of the computational graph. Besides avoiding overfitting, such a penalty would automatically remove ‘null terms’ in the plasticity rules, that is, trivial subexpressions which have no influence on the expressions’ output. Since it is a priori unclear how this complexity penalty should be weighted against the original fitness measures, one should consider multiobjective optimization algorithms (e.g., Deb, 2001).
Another issue to be addressed in future work is the choice of the learning rate. Currently, this value is not part of the optimization process and all tasks assume a fixed learning rate. The analysis of the reward and errordriven learning rules revealed that the evolutionary algorithm tried to optimize the learning rate using the variables it had access to, partly generating complex terms that that amount to a variable scaling of the learning rate. The algorithm may benefit from the inclusion of additional constants which it could, for example, use for an unmitigated, permanent scaling of the learning rate. However, the dimensionality of the search space scales exponentially in the number of operators and constants, and the feasibility of such an approach needs to be carefully evaluated. One possibility to mitigate this combinatorial explosion is to combine the evolutionary search with gradientbased optimization methods that can finetune constants in the expressions (Topchy and Punch, 2001; Izzo et al., 2017).
Additionally, future work may involve less preprocessed data as inputs while considering more diverse mathematical operators. In the correlationdriven task, one could for example provide the raw times of pre and postsynaptic spiking to the graph instead of the exponential of their difference, leaving more freedom for the evolutionary search to discover creative solutions. We expect particularly interesting applications of our framework to involve more complex tasks that are challenging for contemporary algorithms, such as lifelong learning, which needs to tackle the issue of catastrophic forgetting (French, 1999) or learning in recurrent spiking neuronal networks. In order to yield insights into information processing in the nervous system, the design of the network architecture should be guided by known anatomical features, while the considered task families should fall within the realm of ecologically relevant problems.
The evolutionary search for plasticity rules requires a large number of simulations, as each candidate solution needs to be evaluated on a sufficiently large number of samples from the task family to encourage generalization (e.g., Chalmers, 1991; Bengio et al., 1992). Due to silent mutations in CGP, that is, modifications of the genotype that do not alter the phenotype, we use caching methods to significantly reduce computational cost as only new solutions need to be evaluated. However, even employing such methods, the number of required simulations remains large, in the order of ${10}^{3}{10}^{4}$ per evolutionary run. For the experiments considered here, the computational costs are rather low, requiring $2448$ node hours for a few parallel runs of the evolutionary algorithms, easily within reach of a modern workstation. The total time increases linearly with the duration of a single simulation. When considering more complex tasks which would require larger networks and hence longer simulations, one possibility to limit computational costs would be to evolve scalable plasticity rules in simplified versions of the tasks and architectures. Such rules, quickly evolved, may then be applied to individual instances of the original complex tasks, mimicking the idea of ‘evolutionary hurdles’ that avoid wasting computational power on lowquality solutions (So et al., 2019; Real et al., 2020). A proof of concept for such an approach is the delta rule: originally in used in smallscale tasks, it has demonstrated incredible scaling potential in the context of error backpropagation. Similar observations indeed hold for evolved optimizers (Metz et al., 2020).
Neuromorphic systems – dedicated hardware specifically designed to emulate neuronal networks – provide an attractive way to speed up the evolutionary search. To serve as suitable substrates for the approach presented here, these systems should be able to emulate spiking neuronal networks in an accelerated fashion with respect to real time and provide onchip plasticity with a flexible specification of plasticity mechanisms (e.g., Davies et al., 2018; Billaudelle et al., 2019; Mayr et al., 2019).
We view the presented methods as a machinery for generating, testing, and extending hypotheses on learning in spiking neuronal networks driven by problem instances and prior knowledge and constrained by experimental evidence. We believe this approach holds significant promise to accelerate progress toward deep insights into information processing in physical systems, both biological and biologically inspired, with immanent potential for the development of powerful artificial learning machines.
Materials and methods
Evolutionary algorithm
Request a detailed protocolWe use a $\mu +\lambda $ evolution strategy (Beyer and Schwefel, 2002) to evolve a population of individuals towards high fitness. In each generation, λ new offsprings are created from μ parents via tournament selection (e.g., Miller and Goldberg, 1995) and subsequent mutation. From these $\mu +\lambda $, individuals the best μ individuals are selected as parents for the next generation (Alg. 4.1). In this work, we use a tournament size of one and a fixed mutation probability ${p}_{\text{mutate}}$ for each gene in an offspring individual. Since in CGP crossover of individuals can lead to significant disruption of the search process due to major changes in the computational graphs (Miller, 1999), we avoid it here. In other words, new offspring are only modified by mutations. We use neutral search (Miller and Thomson, 2000), in which an offspring is preferred over a parent with equal fitness, to allow the accumulation of silent mutations that can jointly lead to an increase in fitness. As it is computationally infeasible to exhaustively evaluate an individual on all possible tasks from a task family, we evaluate individuals only on a limited number of sample tasks and aggregate the results into a scalar fitness, either by choosing the minimal result or averaging. We manually select the number of sample tasks to balance computational costs and sampling noise for each task. In each generation, we use the same initial conditions to allow a meaningful comparison of results across generations. If an expression is encountered that cannot be meaningfully evaluated, such as division by zero, the corresponding individual is assigned a fitness of $\mathrm{\infty}$.
Algorithm 1: Variant of $\mu +\lambda $ evolution strategies used in this study. Note the absence of a crossover step. 

Data: Initial random parent Population ${P}_{0}=\{{p}_{i}\}$ of size μ $t\leftarrow 0$ while $t<{n}_{\mathrm{generations}}$ do Create new offspring population ${Q}_{t}=\text{CreateOffspringPopulation}({P}_{t})$ Combine parent + offspring populations ${R}_{t}={P}_{t}\cup {Q}_{t}$ Evaluate fitness of each individual in ${R}_{t}$ Pick ${P}_{t+1}\subset {R}_{t}$ best individuals as new parents $t\leftarrow t+1$ end Function CreateOffspringPopulation ($P$) begin Offspring population $Q=\{\}$ while $Q<\lambda $ do Choose random subset of $P$ of size ${N}_{\mathrm{tournament}}$ Choose best individual in the subset and append to $Q$ end for ${q}_{i}\in Q$ do Mutate each gene of $q}_{i$ with mutation probability ${p}_{\text{mutation}}$ end Return $Q$ end 
HALCGP
Request a detailed protocolHALCGP (Schmidt and Jordan, 2020, https://github.com/HappyAlgorithmsLeague/halcgp, Jordan, 2021b) is an extensible pure Python library implementing Cartesian genetic programming to represent, mutate and evaluate populations of individuals encoding symbolic expressions targeting applications with computationally expensive fitness evaluations. It supports the translation from a CGP genotype, a twodimensional Cartesian graph, into the corresponding phenotype, a computational graph implementing a particular mathematical expression. These computational graphs can be exported as pure Python functions, NumPycompatible functions (van der Walt et al., 2011), SymPy expressions (Meurer et al., 2017) or PyTorch modules (Paszke et al., 2019). Users define the structure of the twodimensional graph from which the computational graph is generated. This includes the number of inputs, columns, rows, and outputs, as well as the computational primitives, that is, mathematical operators and constants, that compose the mathematical expressions. Due to the modular design of the library, users can easily implement new operators to be used as primitives. It supports advanced algorithmic features, such as shuffling the genotype of an individual without modifying its phenotype to introduce additional drift over plateus in the search space and hence lead to better exploration (Goldman and Punch, 2014). The library implements a $\mu +\lambda $ evolution strategy to evolve individuals (see section Evolutionary algorithm). Users need to specify hyperparameters for the evolutionary algorithm, such as the size of parent and offspring populations and the maximal number of generations. To avoid reevaluating phenotypes that have been previously evaluated, the library provides a mechanism for caching results on disk. Exploiting the wide availability of multicore architectures, the library can parallelize the evaluation of all individuals in a single generation via separate processes.
NEST simulator
Request a detailed protocolSpiking neuronal network simulations are based on the 2.16.0 release of the NEST simulator (Gewaltig and Diesmann, 2007, https://github.com/nest/nestsimulator; Eppler, 2021 commit 3c6f0f3). NEST is an opensource simulator for spiking neuronal networks with a focus on large networks with simple neuron models. The computationally intensive propagation of network dynamics is implemented in C++ while the network model can be specified using a Python API (PyNEST; Eppler et al., 2008; Zaytsev and Morrison, 2014). NEST profits from modern multicore and multinode systems by combining local parallelization with OpenMP threads and internode communication via the Message Passing Interface (MPI) (Jordan et al., 2018). The standard distribution offers a variety of established neuron and plastic synapse models, including variants of spiketimingdependent plasticity, rewardmodulated plasticity and structural plasticity. New models can be implemented via a domainspecific language (Plotnikov et al., 2016) or custom C++ code. For the purpose of this study, we implemented a rewarddriven (Urbanczik and Senn, 2009) and an errordriven learning rule (Equation 7; Urbanczik and Senn, 2014), as well as a homeostatic STDP rule (Equation 17; Masquelier, 2018) via custom C++ code. Due to the specific implementation of spike delivery in NEST, we introduce a constant in the STDP rule that is added at each potentiation call instead of using a separate depression term. To support arbitrary mathematical expressions in the errordriven (Equation 9) and correlationdriven synapse models (Equation 15), we additionally implemented variants in which the weight update can be specified via SymPy compatible strings (Meurer et al., 2017) that are parsed by SymEngine (https://github.com/symengine/symengine; SymEngine Contributors, 2021) a C++ library for symbolic computation. All custom synapse models and necessary kernel patches are available as NEST modules in the repository accompanying this study (https://github.com/HappyAlgorithmsLeague/e2lcgpsnn (copy archived at swh:1:rev:2f370ba6ec46a46cf959afcc6c1c1051394cd02a), Jordan, 2021a).
Computing systems
Request a detailed protocolExperiments were performed on JUWELS (Jülich Wizard for European Leadership Science), an HPC system at the Jülich Research Centre, Jülich, Germany, with 12 Petaflop peak performance. The system contains 2271 generalpurpose compute nodes, each equipped with two Intel Xeon Platinum 8168 processors (2×24 cores) and 12×8 GB main memory. Compute nodes are connected via an EDRInfiniband fattree network and run CentOS 7. Additional experiments were performed on the multicore partition of Piz Daint, an HPC system at the Swiss National Supercomputing Centre, Lugano, Switzerland with 1.731 Petaflops peak performance. The system contains 1813 generalpurpose compute nodes, each equipped with two Intel Xeon E52695 v4 processors (2×18 cores) and 64 GB main memory. Compute nodes are connected via Cray Aries routing and communications ASIC with Dragonfly network topology and run Cray Linux Environment (CLE). Each experiment employed a single compute node.
Rewarddriven learning task
Request a detailed protocolWe consider a reinforcement learning task for spiking neurons inspired by Urbanczik and Senn, 2009. Spiking activity of the output neuron is generated by an inhomogeneous Poisson process with instantaneous rate $\varphi $ determined by its membrane potential $u$ (Pfister et al., 2006; Urbanczik and Senn, 2009):
Here, ρ is the firing rate at threshold, ${u}_{\text{th}}$ the threshold potential, and $\mathrm{\Delta}u$ a parameter governing the noise amplitude. In contrast to Urbanczik and Senn, 2009, we consider an instantaneous reset of the membrane potential after a spike instead of an hyperpolarization kernel. The output neuron receives spike trains from sources randomly drawn from an input population of size $N$ with randomly initialized weights (${w}_{\text{initial}}\sim \mathcal{N}(0,{\sigma}_{w})$). Before each pattern presentation, the output neurons membrane potential and synaptic currents are reset.
The eligibility trace in every synapse is updated in continuous time according to the following differential equation (Urbanczik and Senn, 2009; Frémaux and Gerstner, 2015):
where ${\tau}_{\text{M}}$ governs the time scale of the eligibility trace and has a similar role as the decay parameter γ in policygradient methods (Sutton and Barto, 2018), $\mathrm{\Delta}u$ is a parameter of the postsynaptic cell governing its noise amplitude, $y$ represents the postsynaptic spike train, and ${\overline{s}}_{j}(t)=(\kappa *{s}_{j})(t)$ the presynaptic spike train s_{j} filtered by the synaptic kernel κ. The learning rate η was manually tuned to obtain high performance with the one suggested by Urbanczik and Senn, 2009. Expected positive and negative rewards in trial $i$ are separately calculated as moving averages over previous trials (Vasilaki et al., 2009):
where ${m}_{\text{r}}$ determines the number of relevant previous trials and ${[x]}_{+}:=\text{max}(0,x),{[x]}_{}:=\text{min}(0,x)$. Note that ${\overline{R}}^{+}\in [0,1]$ and ${\overline{R}}^{}\in [1,0]$, since $R\in \{1,1\}$. We obtain the average reward as a sum of these separate estimates $\overline{R}={\overline{R}}^{+}+{\overline{R}}^{};\overline{R}\in [1,1]$, while the expected absolute reward is determined by their difference ${\overline{R}}_{\text{abs}}={\overline{R}}^{+}{\overline{R}}^{};{\overline{R}}_{\text{abs}}\in [0,1]$.
Errordriven learning task
Request a detailed protocolWe consider an errordriven learning task for spiking neurons inspired by Urbanczik and Senn, 2014. $N$ Poisson inputs with constant rates (${r}_{i}\sim \mathcal{U}[{r}_{\text{min}},{r}_{\text{max}}],i\in [1,N]$) project to a teacher neuron and, with the same connectivity pattern, to a student neuron. As in section Rewarddriven learning task, spiking activity of the output neuron is generated by an inhomogeneous Poisson process. In contrast to section Rewarddriven learning task, the membrane potential is not reset after spike emission. Fixed synaptic weights from the inputs to the teacher are uniformly sampled from the interval $[{w}_{\text{min}},{w}_{\text{max}}]$, while weights to the student are all initialized to a fixed value w_{0}. In each trial we randomly shift all teacher weights by a global value ${w}_{\text{shift}}$ to avoid a bias in the error signal that may arise if the teacher membrane potential is initially always larger or always smaller than the student membrane potential. Target potentials are read out from the teacher every $\delta t$ and provided instantaneously to the student. The learning rate η was chosen via grid search on a single example task for high performance with Equation 7. Similar to Urbanczik and Senn, 2014, we lowpass filter weight updates with an exponential kernel with time constant ${\tau}_{\text{I}}$ before applying them.
Correlationdriven learning task
Request a detailed protocolWe consider a correlationdriven learning task for spiking neurons similar to Masquelier, 2018: a spiking neuron, modeled as a leaky integrateandfire neuron with deltashaped postsynaptic currents, receives stochastic spike trains from $N$ inputs via plastic synapses.
To construct the input spike trains, we first create a frozennoise pattern by drawing random spikes ${\mathcal{S}}_{i}^{\mathrm{pattern}}\in [0,{T}_{\mathrm{pattern}}],i\in [0,N1]$ from a Poisson process with rate ν. Neurons that fire at least once in this pattern are in the following called ‘pattern neurons’, the remaining are called ‘background neurons’. We alternate this frozennoise pattern with random spike trains of length ${T}_{\mathrm{inter}}$ generated by a Poisson process with rate ν (Figure 5B). To balance the average rates of pattern neurons and background neurons, we reduce the spike rate of pattern neurons in between patterns by a factor α. Background neurons have an average rate of ${\nu}_{\mathrm{inter}}=\nu \frac{{T}_{\mathrm{inter}}}{{T}_{\mathrm{inter}}+{T}_{\mathrm{pattern}}}$. We assume that pattern neurons spike only once during the pattern. Thus, they have an average rate of rate of $\nu =\alpha {\nu}_{\mathrm{inter}}+\frac{1}{{T}_{\mathrm{inter}}+{T}_{\mathrm{pattern}}}=\alpha {\nu}_{\mathrm{inter}}+{\nu}_{\mathrm{pattern}}$. Plugging in the previous expression for ${\nu}_{\text{inter}}$ and solving for α yields $\alpha =1\frac{{\nu}_{\mathrm{pattern}}}{{\nu}_{\mathrm{inter}}}$. We choose the same learning rate as Masquelier, 2018. Due to the particular implementation of STDPlike rules in NEST (Morrison et al., 2007), we do not need to evolve multiple functions describing correlationinduced and homeostatic changes separately, but can evolve only one function for each branch of the STDP window. Terms in these functions which do not vanish for ${E}_{j}^{\text{c}}\to 0$ are effectively implementing presynaptically triggered (in the acausal branch) and postsynaptically triggered (in the causal branch) homeostatic mechanisms.
Appendix 1
A rewarddriven learning
Full evolution data for different CGP hyperparameter choices
The following tables summarize all evolved plasticity rules for the four different hyperparameter sets used for the rewarddriven learning experiments.
CGP hyperparameter set 0  

Population  $\mu =1,{p}_{\text{mutation}}=0.035$ 
Genome  ${n}_{\text{inputs}}=3,{n}_{\text{outputs}}=1,{n}_{\text{rows}}=1,{n}_{\text{columns}}=24,{l}_{\text{max}}=24$ 
Primitives  Add, Sub, Mul, Div, Const(1.0), Const(0.5) 
EA  $\lambda =4,{n}_{\text{breeding}}=4,{n}_{\text{tournament}}=1,\text{reorder}=\text{true}$ 
Other  $\mathrm{m}\mathrm{a}\mathrm{x}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\mathrm{g}\mathrm{e}\mathrm{n}\mathrm{e}\mathrm{r}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{n}\mathrm{s}=1000,\mathrm{m}\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{m}\mathrm{a}\mathrm{l}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\mathrm{f}\mathrm{i}\mathrm{t}\mathrm{n}\mathrm{e}\mathrm{s}\mathrm{s}=500.0$ 
Discovered plasticity rules for hyperparameter set 0  

Label  Fitness $\mathrm{F}$  Expression $f$ 
LR0  216.2  ${E}_{j}^{\text{r}}+{E}_{j}^{\text{r}}/R$ 
LR1  73.0  $(R+E_{j}^{\text{r}}{}^{2})/\overline{R}$ 
LR2  216.2  ${E}_{j}^{\text{r}}(R1.0)$ 
LR3  221.6  ${E}_{j}^{\text{r}}/(2R+(R+1.0)(R+\overline{R})+1.0)$ 
LR4  234.2  ${E}_{j}^{\text{r}}(R1)(R+\overline{R})$ 
LR5  216.2  ${E}_{j}^{\text{r}}(R1)$ 
LR6  69.2  $4.0E_{j}^{\text{r}}{}^{2}/\overline{R}+2.0{E}_{j}^{\text{r}}$ 
LR7  234.2  ${E}_{j}^{\text{r}}(R1)(R+\overline{R})/R$ 
CGP hyperparameter set 1  

Population  $\mu =1,{p}_{\text{mutation}}=0.035$ 
Genome  $n}_{\text{inputs}}={\mathbf{4}}^{\ast},{n}_{\text{outputs}}=1,{n}_{\text{rows}}=1,{n}_{\text{columns}}=\mathbf{12},{l}_{\text{max}}=\mathbf{12$ 
Primitives  Add, Sub, Mul, Div, Const(1.0), Const(0.5) 
EA  $\lambda =4,{n}_{\text{breeding}}=4,{n}_{\text{tournament}}=1,\text{reorder}=\text{true}$ 
Other  $\mathrm{m}\mathrm{a}\mathrm{x}\phantom{\rule{thinmathspace}{0ex}}\mathrm{g}\mathrm{e}\mathrm{n}\mathrm{e}\mathrm{r}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{n}\mathrm{s}=1000,\mathrm{m}\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{m}\mathrm{a}\mathrm{l}\phantom{\rule{thinmathspace}{0ex}}\mathrm{f}\mathrm{i}\mathrm{t}\mathrm{n}\mathrm{e}\mathrm{s}\mathrm{s}=500.0$ 

${}^{*}$ Bold highlights values changed with respect to hyperparameter set 0.
Discovered plasticity rules for hyperparameter set 1  

Label  Fitness $\mathrm{F}$  Expression $f$ 
LR0  238.6  $({E}_{j}^{\text{r}}(R+{\overline{R}}^{}(R+{\overline{R}}^{}))+{E}_{j}^{\text{r}}+{\overline{R}}^{})/(R+{\overline{R}}^{}(R+{\overline{R}}^{}))$ 
LR1  233.4  ${E}_{j}^{\text{r}}(R1)/(R(R{\overline{R}}^{+}))$ 
LR2  217.2  ${E}_{j}^{\text{r}}(R+{\overline{R}}^{}+1.0)$ 
LR3  227.6  $R{\overline{R}}^{}{E}_{j}^{\text{r}}+{E}_{j}^{\text{r}}/R$ 
LR4  247.2  $(R1.0)(R+{E}_{j}^{\text{r}}+2{\overline{R}}^{+})$ 
LR5  198.2  $({E}_{j}^{\text{r}}{\overline{R}}^{+}{\overline{R}}^{})/(R+{\overline{R}}^{+})$ 
LR6  216.2  ${E}_{j}^{\text{r}}(R1)$ 
LR7  225.8  ${E}_{j}^{\text{r}}{\overline{R}}^{}+(R{\overline{R}}^{})({E}_{j}^{\text{r}}+{\overline{R}}^{})$ 
CGP hyperparameter set 2  

Population  $\mu =1,{p}_{\text{mutation}}=0.035$ 
Genome  $n}_{\text{inputs}}=4,{n}_{\text{outputs}}=1,{n}_{\text{rows}}=1,{n}_{\text{columns}}={\mathbf{24}}^{\ast},{l}_{\text{max}}=\mathbf{24$ 
Primitives  Add, Sub, Mul, Div, Const(1.0), Const(0.5) 
EA  $\lambda =4,{n}_{\text{breeding}}=4,{n}_{\text{tournament}}=1,\text{reorder}=\mathbf{\text{false}}$ 
Other  $\mathrm{m}\mathrm{a}\mathrm{x}\phantom{\rule{thinmathspace}{0ex}}\mathrm{g}\mathrm{e}\mathrm{n}\mathrm{e}\mathrm{r}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{n}\mathrm{s}=1000,\mathrm{m}\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{m}\mathrm{a}\mathrm{l}\phantom{\rule{thinmathspace}{0ex}}\mathrm{f}\mathrm{i}\mathrm{t}\mathrm{n}\mathrm{e}\mathrm{s}\mathrm{s}=500.0$ 

${}^{*}$ Bold highlights values changed with respect to hyperparameter set 1.
Discovered plasticity rules for hyperparameter set 2  

Label  Fitness $\mathrm{F}$  Expression $f$ 
LR0  127.2  ${E}_{j}^{\text{r}}/(R+{\overline{R}}^{+}{\overline{R}}^{})$ 
LR1  192.0  ${E}_{j}^{\text{r}}/(R+{\overline{R}}^{+})$ 
LR2  216.2  ${E}_{j}^{\text{r}}(R1)$ 
LR3  170.6  $(2{E}_{j}^{\text{r}}{\overline{R}}^{}(R{\overline{R}}^{})+{E}_{j}^{\text{r}}1)/(R{\overline{R}}^{})$ 
LR4  237.6  $(R{E}_{j}^{\text{r}}({\overline{R}}^{}+1)+{E}_{j}^{\text{r}}+{\overline{R}}^{})/(R({\overline{R}}^{}+1))$ 
LR5  233.4  ${E}_{j}^{\text{r}}(1R)/(R{\overline{R}}^{+})$ 
LR6  120.8  $(R+{\overline{R}}^{})({E}_{j}^{\text{r}}{\overline{R}}^{+})$ 
LR7  254.8  $(R{\overline{R}}^{}+2{E}_{j}^{\text{r}})(R{\overline{R}}^{}+R{\overline{R}}^{+})$ 
CGP hyperparameter set 3  

Population  $\mu =1,{p}_{\text{mutation}}=0.035$ 
Genome  ${n}_{\text{inputs}}=4,{n}_{\text{outputs}}=1,{n}_{\text{rows}}=1,{n}_{\text{columns}}=24,{l}_{\text{max}}=24$ 
Primitives  Add, Sub, Mul, Div, Const(1.0), Const(0.5) 
EA  $\lambda =4,{n}_{\text{breeding}}=4,{n}_{\text{tournament}}=1,\text{reorder}={\mathbf{\text{true}}}^{\mathbf{\ast}}$ 
Other  $\mathrm{m}\mathrm{a}\mathrm{x}\phantom{\rule{thinmathspace}{0ex}}\mathrm{g}\mathrm{e}\mathrm{n}\mathrm{e}\mathrm{r}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{n}\mathrm{s}=1000,\mathrm{m}\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{m}\mathrm{a}\mathrm{l}\phantom{\rule{thinmathspace}{0ex}}\mathrm{f}\mathrm{i}\mathrm{t}\mathrm{n}\mathrm{e}\mathrm{s}\mathrm{s}=500.0$ 

${}^{*}$ Bold highlights values changed with respect to hyperparameter set 2.
Discovered plasticity rules for hyperparameter set 3  

Label  Fitness $\mathrm{F}$  Expression $f$ 
LR0  236.0  ${E}_{j}^{\text{r}}({R}^{3}({\overline{R}}^{}+1)+1)/R$ 
LR1  242.0  ${E}_{j}^{\text{r}}(R{\overline{R}}^{+}+{\overline{R}}^{})$ 
LR2  242.0  ${E}_{j}^{\text{r}}(R{\overline{R}}^{+}+{\overline{R}}^{})$ 
LR3  227.6  $R({E}_{j}^{\text{r}}+{\overline{R}}^{}){E}_{j}^{\text{r}}$ 
LR4  256.0  ${E}_{j}^{\text{r}}(R{\overline{R}}^{+}+{\overline{R}}^{})/({\overline{R}}^{+}+1.0)$ 
LR5  71.0  $({\overline{R}}^{+}(R+{E}_{j}^{\text{r}}+{\overline{R}}^{}(R+{\overline{R}}^{})+{\overline{R}}^{}){\overline{R}}^{})/{\overline{R}}^{+}$ 
LR6  216.2  ${E}_{j}^{\text{r}}(R1.0)$ 
LR7  227.8  $({E}_{j}^{\text{r}}{\overline{R}}^{}{}^{2})(R+{\overline{R}}^{}{}^{2}1.0)$ 
Causal and homeostatic terms over trials
Appendix 1—figure 5 illustrates the behavior of the causal and homeostatic terms of the six plasticity rules discovered in the rewarddriven learning experiments.
Cumulative reward over trials
Appendix 1—figure 6 illustrates the cumulative reward over trials for the six platicity rules discovered in the rewarddriven learning experiments.
Errordriven learning – simplification of the discovered rules
As in the main manuscript $v$ is the teacher potential, $u$ the student membrane potential, and η a fixed learning rate. ${\overline{s}}_{j}(t)=(\kappa *{s}_{j})(t)$ represents the the presynaptic spike train s_{j} filtered by the synaptic kernel κ.
We first consider Equation 10:
where we introduced $\delta :=vu$. From the third to the fourth line, we assumed that the mismatch between student and teacher potential is much smaller than their absolute magnitude and that their absolute magnitude is much larger than one. For our parameter choices and initial conditions, this is a reasonable assumption.
We next consider Equation 11:
As previously, from the third to fourth line, we assumed that the mismatch between student and teacher potential is much smaller than their absolute magnitude and that their absolute magnitude is much larger than one. This implies $\frac{{\overline{s}}_{j}}{v}\ll 1$ as ${\overline{s}}_{j}\approx \mathcal{O}(1)$ for small input rates.
The additional terms in both Equation 10 and Equation 11 hence reduce to a simple scaling of the learning rate and thus perform similarly to the purple rule in Figure 4.
Correlationdriven learning – detailed experimental results
Appendix 1—figure 7 illustrates membrane potential dynamics for the output neuron using the two plasticity rules discovered in the correlationdriven learning experiments.
Simulation details
Appendix 1—table 1, Appendix 1—table 2, and Appendix 1—table 3 summarize the network models used in the experiments (according to Nordlie et al., 2009).
Data availability
All data and scripts required to reproduce the manuscript figures, as well as source code, simulation and analysis scripts are publicly available at https://github.com/HappyAlgorithmsLeague/e2lcgpsnn (copy archived at https://archive.softwareheritage.org/swh:1:rev:2f370ba6ec46a46cf959afcc6c1c1051394cd02a).
References

ConferenceLearning to learn by gradient descent by gradient descent30th Conference on Neural Information Processing Systems. pp. 3981–3989.

ConferenceLearning a synaptic learning ruleIJCNN91Seattle International Joint Conference on Neural Networks.https://doi.org/10.1109/IJCNN.1991.155621

ConferenceOn the optimization of a synaptic learning rulePreprints Conf. Optimality in Artificial and Biological Neural Networks.

BookGeneralization of a Parametric Learning RuleIn: Gielen S, Kappen B, editors. ICANN ’93. Springer. 502.https://doi.org/10.1007/9781447120636_131

ConferenceUse of genetic programming for the search of a new learning rule for neural networksIEEE World Congress on Computational Intelligence. pp. 324–327.https://doi.org/10.1109/ICEC.1994.349932

Evolution strategies–a comprehensive introductionNatural Computing 1:3–52.https://doi.org/10.1023/A:1015059928466

Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell typeJournal of Neuroscience 18:10464–10472.

Neuromorphic hardware learns to learnFrontiers in Neuroscience 13:483.https://doi.org/10.3389/fnins.2019.00483

Spike timingdependent plasticity: a hebbian learning ruleAnnual Review of Neuroscience 31:25–46.https://doi.org/10.1146/annurev.neuro.31.060407.125639

BookThe evolution of learning: An experiment in genetic connectionismIn: Chalmers DJ, editors. Connectionist Models. Amsterdam, Netherlands: Elsevier. pp. 81–90.https://doi.org/10.1016/B9781483214481.500147

Connectivity reflects coding: a model of voltagebased STDP with homeostasisNature Neuroscience 13:344–352.https://doi.org/10.1038/nn.2479

ConferenceA metalearning approach to (re) discover plasticity rules that carve a desired function into a neural network34th Conference on Neural Information Processing Systems.

Striatal circuits for reward learning and decisionmakingNature Reviews Neuroscience 20:482–494.https://doi.org/10.1038/s4158301901892

Maintaining the stability of neural function: a homeostatic hypothesisAnnual Review of Physiology 63:847–869.https://doi.org/10.1146/annurev.physiol.63.1.847

BookConnectionist ModelsIn: Touretzky D, Elman J, Sejnowski T, Hinton G, editors. Oxford Companion to Consciousness. New York: Oxford University Press. pp. 45–51.

BookMultiObjective Optimization Using Evolutionary AlgorithmsNew Jersey, United States: John Wiley & Sons.

Bayesian spiking neurons I: inferenceNeural Computation 20:91–117.https://doi.org/10.1162/neco.2008.20.1.91

Bidirectional longterm modification of synaptic effectiveness in the adult and immature HippocampusThe Journal of Neuroscience 13:2910–2918.https://doi.org/10.1523/JNEUROSCI.130702910.1993

PyNEST: a convenient interface to the NEST simulatorFrontiers in Neuroinformatics 2:12.https://doi.org/10.3389/neuro.11.012.2008

Neuromodulated SpikeTimingDependent plasticity, and theory of ThreeFactor learning rulesFrontiers in Neural Circuits 9:85.https://doi.org/10.3389/fncir.2015.00085

Catastrophic forgetting in connectionist networksTrends in Cognitive Sciences 3:128–135.https://doi.org/10.1016/S13646613(99)012942

Analysis of cartesian genetic programming’s Evolutionary MechanismsIEEE Transactions on Evolutionary Computation 19:359–373.https://doi.org/10.1109/TEVC.2014.2324539

Learning input correlations through nonlinear temporally asymmetric hebbian plasticityThe Journal of Neuroscience 23:3697–3714.https://doi.org/10.1523/JNEUROSCI.230903697.2003

ConferencePolynomial theory of complex systemsIEEE Transactions on Systems, Man, and Cybernetics. pp. 364–378.https://doi.org/10.1109/TSMC.1971.4308320

ConferenceDifferentiable genetic programmingEuropean Conference on Genetic Programming. pp. 35–51.

Deterministic networks for probabilistic computingScientific Reports 9:1–17.https://doi.org/10.1038/s41598019541377

Network plasticity as bayesian inferencePLOS Computational Biology 11:e1004485.https://doi.org/10.1371/journal.pcbi.1004485

Hebbian learning and spiking neuronsPhysical Review E 59:4498–4514.https://doi.org/10.1103/PhysRevE.59.4498

BookGenetic Programming: On the Programming of Computers by Means of Natural SelectionCambridge, United States: MIT press.

Humancompetitive results produced by genetic programmingGenetic Programming and Evolvable Machines 11:251–284.https://doi.org/10.1007/s1071001091123

BookThe Representation of the Cumulative Rounding Error of an Algorithm as a Taylor Expansion of the Local Rounding ErrorsHelsinki, Finland: University of Helsinki.

Toward an integration of deep learning and neuroscienceFrontiers in Computational Neuroscience 10:94.https://doi.org/10.3389/fncom.2016.00094

SymPy: symbolic computing in PythonPeerJ Computer Science 3:e103.https://doi.org/10.7717/peerjcs.103

ConferenceAn empirical study of the efficiency of learning boolean functions using a cartesian genetic programming approachProceedings of the 1st Annual Conference on Genetic and Evolutionary Computation. pp. 1135–1142.https://doi.org/10.5555/2934046.2934074

BookCartesian Genetic ProgrammingBerlin, Heidelberg: Springer.https://doi.org/10.1007/9783642173103_2

Genetic algorithms, tournament selection, and the effects of noiseComplex Systems 9:193–212.

The role of constraints in hebbian learningNeural Computation 6:100–126.https://doi.org/10.1162/neco.1994.6.1.100

ConferenceCartesian genetic programmingEuropean Conference on Genetic Programming. pp. 121–132.

A scalable multicore architecture with heterogeneous memory structures for dynamic neuromorphic asynchronous processors (DYNAPs)IEEE Transactions on Biomedical Circuits and Systems 12:106–122.https://doi.org/10.1109/TBCAS.2017.2759700

Spiketimingdependent plasticity in balanced random networksNeural Computation 19:1437–1467.https://doi.org/10.1162/neco.2007.19.6.1437

Phenomenological models of synaptic plasticity based on spike timingBiological Cybernetics 98:459–478.https://doi.org/10.1007/s0042200802331

Synaptic activity modulates the induction of bidirectional synaptic changes in adult mouse HippocampusThe Journal of Neuroscience 20:2451–2458.https://doi.org/10.1523/JNEUROSCI.200702451.2000

Towards reproducible descriptions of neuronal network modelsPLOS Computational Biology 5:e1000456.https://doi.org/10.1371/journal.pcbi.1000456

A simplified neuron model as a principal component analyzerJournal of Mathematical Biology 15:267–273.https://doi.org/10.1007/BF00275687

ConferenceThe evolution of a generalized neural learning ruleNeural Networks (IJCNN), 2016 International Joint Conference. pp. 4688–4694.

ConferencePyTorch: an imperative style, highperformance deep learning library33rd Conference on Neural Information Processing Systems. .. pp. 8024–8035.

Synapses with shortterm plasticity are optimal estimators of presynaptic membrane potentialsNature Neuroscience 13:1271–1275.https://doi.org/10.1038/nn.2640

BookDiscovering efficient learning rules for feedforward neural networks using genetic programmingIn: Abraham A, Jain LC, Kacprzyk J, editors. Recent Advances in Intelligent Paradigms and Applications. Springer. pp. 133–159.https://doi.org/10.1007/9783790817706_7

ConferenceAutoMLZero: evolving machine learning algorithms from scratchInternational Conference on Machine Learning. pp. 8007–8019.

BookIndirectly encoding neural plasticity as a pattern of local rulesIn: Doncieux S, Girard B, Guillot A, Hallam J, Meyer JA, Mouret JB, editors. From Animals to Animats 11. New York, United States: Springer. pp. 533–543.https://doi.org/10.1007/9783642151934_50

ConferenceDendritic cortical microcircuits approximate the backpropagation algorithmNIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems. pp. 8721–8732.https://doi.org/10.5555/3327546.3327550

ConferenceFaster genetic programming based on local gradient search of numeric leaf valuesProceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation. pp. 155–162.https://doi.org/10.5555/2955239.2955258

Reinforcement learning in populations of spiking neuronsNature Neuroscience 12:250–252.https://doi.org/10.1038/nn.2264

The NumPy array: a structure for efficient numerical computationComputing in Science & Engineering 13:22–30.https://doi.org/10.1109/MCSE.2011.37

ConferenceThe optimal reward baseline for gradientbased reinforcement learningProceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. pp. 535–538.https://doi.org/10.5555/2074022.2074088

Theories of error BackPropagation in the brainTrends in Cognitive Sciences 23:235–250.https://doi.org/10.1016/j.tics.2018.12.005

BookReinforcement Learning in Connectionist Networks: A Mathematical AnalysisSan Diego: University of California.

BookToward a Theory of ReinforcementLearning Connectionist SystemsBoston: Northeastern University.

CyNEST: a maintainable Cythonbased interface for the NEST simulatorFrontiers in Neuroinformatics 8:23.https://doi.org/10.3389/fninf.2014.00023
Decision letter

Mark CW van RossumReviewing Editor; University of Nottingham, United Kingdom

Michael J FrankSenior Editor; Brown University, United States

Henning SprekelerReviewer; Technische Universität Berlin, Germany
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
A precise quantitative description of synaptic plasticity is currently unknown, so that most formulate learning rules somewhat ad hoc. This computational neuroscience paper uses genetic algorithms (GAs) to find synaptic plasticity rules that perform best on a variety of simulated plasticity tasks with spiking neurons. In principle GAs are potentially powerful, but by being nondifferentiable they are limited by computational requirements and can only find simple equations and rely on preprocessing such predefined eligibility traces. Future research might be able to generalize this technique.
Decision letter after peer review:
[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]
Thank you for submitting your work entitled "Evolving to learn, discovering interpretable plasticity rules for spiking networks" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Henning Sprekeler (Reviewer #2).
Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife. Our excuses for the long time we needed to reach our decision.
While the reviewers both saw the potential benefits of the method, the current application only reproduces already known plasticity rules (sometimes not without extra tweaking).
The reviewers agreed that the manuscript does not sufficiently support the applicability of the suggested method to less handcrafted situations. The scalability of the method is potentially a concern. To be eligible for further consideration in eLife, this would have to be shown, either by applying the method to a situation with more and less handcrafted inputs to the learning rule and/or by identifying novel, efficient and task ensemblespecific rules. Such a revision would likely require more time than eLife aims for.
Reviewer #1:
This paper uses genetic algorithms to find synaptic plasticity rules.
Overall, I found the results interesting, but as genetic algorithms have been wellestablished and the amount of new results are limited, I see this more as a tutorial and an effort to bring other researchers to use GAs.
However, the limitations of the method were not clear:
I would like to see a discussion of the computation time, and the scaling with model complexity and other limitations of the technique.
I found it also hard to judge whether this study presents an advance of other methods, such using multiple traces to fit learning rules (Gerstner c.s.).
The inputs to the GAs are quite engineered traces and it was unclear how important this was.
The reworking of the STDP rules in the last Results section was not so clear to me.
First the concept of instability needs to be explained better.
Do these reworked rules perform identically? If so, is there a equivalence class of STDP rules that perform identically on this task?
Furthermore, can stability not be included in the fitness function (either as direct constraint on \Δ w (t>\pm \infty), or by widening the task repertoire)?
It is not so elegant to have a supposedly general technique, and then handtune the solutions….
In conclusion, I'm not convinced the study presents enough of a conceptual or methodological advance.
Reviewer #2:
The article "Evolving to learn, discovering interpretable learning rules for spiking networks" by Jordan et al. proposes an evolutionary algorithm to metalearn plasticity rules for spiking neurons. The algorithm learns to combine userdetermined inputs into a mathematical formula for updating synaptic weights, by gradually mutating and selecting a set of candidates based on their performance. The algorithm is applied to (families of) rewardbased, errorbased, and correlationbased tasks, all three performed by a single neuron. In each case, the algorithm recovers previously proposed learning rules (or variants thereof) that are known to optimize some performance measure.
The article is clearly written, timely, and presents an exciting approach to metalearning that holds the promise of not only generating taskspecific learning rules, but of providing them in an interpretable form. Its key weakness is its limitation to a rediscovery of existing general purpose rules, in a setup where the quantities that enter the rules seem somewhat preengineered. As such, the paper is a proofofconcept presentation of a very exciting method on simplistic examples. Whether the approach will actually be applicable to a situation with more inputs, for more complex settings (e.g., multilayer networks) and whether it will ultimately discover task ensemblespecific learning rules is yet to be seen.
1. Preengineering of inputs: It's nice that the authors test their method on three different learning tasks. However, the inputs that enter these learning rules (e.g., the eligibility traces) are chosen by hand and reflect substantial domain knowledge. To show that the method could be applied by a more agnostic user, it would be nice to see that, e.g., different rules could be learned from the same set of inputs. Would it be possible to learn the shape of the eligibility traces? Does the number of successful evolutionary runs decline quickly with the number of inputs the rules are allowed to use?
2. Computational effort: Metalearning is somewhat notorious in its demand for computing resources, and the authors acknowledge a highperformance computing center. How computationally expensive is the method? How does the computational expense scale with the number of inputs to the learning rules?
3. Discovery of unknown/nongeneral purpose rules: The paper would be strengthened substantially by an example of a discovered rule that is better than known general purpose rules. I fully appreciate that a search for such a situation may amount to finding a needle in a haystack and I suspect the authors have tried. I nevertheless dare to make a suggestion for a candidate: In Vasilaki et al. (2009), the authors used rewardbased spiking learning rules to learn a navigation task and argue that policygradient methods fail. What works much better in the end is a biased rule that effectively amounts to something simple like pre x post x reward. Such a rule is clearly biased and could make catastrophic errors in other situations. However, I've suspected for a while (and I think I discussed this with Walter Senn at some point) that such a rule could actually be very powerful in a setting where rewards and state and action representations are very sparse. I wouldn't suggest to the authors to try their method on a navigation task, but policy gradient rules in spiking neurons are notoriously slow in much simpler settings. It feels like it should be possible to beat them. Maybe there is a simple single neuron task with sparse inputs, sparse target outputs and sparse rewards?
4. How sensitive is the method to hyperparameters? The authors use mutation probability 0.045 in first tasks, but 0.05 in last.
[Editors’ note: further revisions were suggested prior to acceptance, as described below.]
Thank you for submitting your article "Evolving interpretable plasticity for spiking networks" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Reviewing Editor and Michael Frank as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Henning Sprekeler (Reviewer #2).
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
Summary:
This computational neuroscience paper uses genetic algorithms (GAs) to find synaptic plasticity rules that perform best on a variety of simulated plasticity tasks with spiking neurons. While GAs are potential powerful, being nondifferentiable the study is limited by computationally requirements and can only find very simple equations and use fairly advanced preprocessing such predefined eligibility traces.
Essential revisions:
1) The authors have added a number of additions to the manuscript that present clear improvement. However, I have doubts about one of the main additions: the inclusion of different baselines for the RL task. From my understanding, rewards are either 1 or 1. Doesn't that mean that [R]+ = (R+1)/2 and [R] = (R1)/2? If that's true, all three baselines are all linearly dependent. My suspicion is that this leads to substantial amounts of "neutral evolution" of terms that basically sum up to zero (or converge to zero exponentially). I juggled around a bit with the various rules and found quite a few of those "null" terms.
I suspect that the point discussed in the section on errordriven learning rules (l. 270) also applies to the RL rules, and that the differences between the rules mostly amount to learning rate (schedules) and "null terms". This may also explain why the gains in performance are not overwhelming. However, because the difference between the evolved rules is not analyzed in depth, this doesn't become clear in the manuscript. I'd suggest to support the discussion of the learning rules by figures. Maybe plot the different "causal" and "homeostatic" terms over time, and potentially something like a running average covariance with (R1) E?
2) Proper statistical analysis of the findings.
In particular in Figure 3 significance estimates against the nullhypothesis need to be presented.
For that section, I would also like to see if the found learning rules differ in their convergence speed on new data.
3) Include data on the convergence speed of the learning. I still would like to see compute time used for a typical run, which is suspiciously absent from the paper.
Reviewer #1:
This paper uses genetic algorithms (GAs) to find synaptic plasticity rules on a variety of plasticity problems with spiking neurons. GAs have as an advantage that a free exploration of possible models potentially coming up with original, superior solutions. On the other hand, being nondifferentiable they are severely limited by computationally requirements and can only find very simple equations and use fairly advanced preprocessing such predefined eligibility traces.
The paper reads on occasion more like an advertisement than a scientific paper.
For the unsupervised rules, was the LTP made explicitly dependent on (w1)? Or was this found automatically?
The divergencies described in the previous version of the manuscript seemed to have disappeared like snow in the sun.
Reviewer #2:
– The discussion about the need to jointly consider learning rule and homeostatic mechanisms is nice, but of limited novelty. I'd suggest to cite at least Mackay and Miller (1994) here.
– Figure 3: Panel 3 doesn't show any rules with new baselines. Is this old data?
– I failed to find the time window on which the baselines are computed (m=?) This is quite important for your discussion about time varying learning rates.
– Section references are broken. Recompile?
– Double words: l. 82, 377
– l. 364: I learned the hard way to stay away from "first demonstration" claims in papers …
https://doi.org/10.7554/eLife.66273.sa1Author response
[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]
Reviewer #1:
This paper uses genetic algorithms to find synaptic plasticity rules.
Overall, I found the results interesting, but as genetic algorithms have been wellestablished and the amount of new results are limited, I see this more as a tutorial and an effort to bring other researchers to use GAs.
We thank the reviewer for their comments, however, our manuscript is not a tutorial on genetic algorithms. Besides demonstrating for the first time the power of genetic programming in the search for plasticity rules in spiking neuronal networks, it provides new results with implications for both learning algorithm design as well as neuroscientific experiments. We hope that our new results on reinforcement learning and a significant revision of the correlationdriven learning section makes our contributions more evident and improves their accessibility. In the revised version of the manuscript we have addressed all concerns voiced by the reviewer. Please also consider our pointbypoint reply below.
However, the limitations of the method were not clear:
I would like to see a discussion of the computation time, and the scaling with model complexity and other limitations of the technique.
We agree with the reviewer that the discussion of these topics was rather limited. The Discussion section of the revised manuscript now contains a description of the computational requirements of our methods.
I found it also hard to judge whether this study presents an advance of other methods, such using multiple traces to fit learning rules (Gerstner c.s.).
As we argue in the revised manuscript, our approach presents an advance over previous methods on several fronts. In contrast to traditional approaches relying on mathematical derivations requiring a significant number of assumption besides human intuition and manual work, the automated search requires fewer assumptions, is less biased and mainly consumes machine time. In contrast to previous automated searches for optimization algorithms or plasticity rules relying on representation by artificial neural networks [e.g., 1, 2] we obtain symbolic expressions which allow us to understand the encoded computational principles. In contrast to previous automated searches relying on fixed symbolic expressions with optimized coefficients [e.g., 3, 4] we consider a significantly larger search space by allowing our evolutionary algorithm to manipulate the form of the expressions. But most importantly, the revised manuscript presents new insights into biological information processing, which, we argue, would be difficult to obtain with other methods.
The inputs to the GAs are quite engineered traces and it was unclear how important this was.
The inputs to the plasticity rules are quantities that have been previously been shown to be linked to plasticity, such as reward, lowpass filtered spike trains, and correlations between pre and postsynaptic activities. The eligibility trace has been successfully used in previous work on reinforcement learning with spiking neuronal networks and is a natural consequence of policygradient approaches. We agree with the reviewer that these inputs do reflect domain knowledge and that a possible direction of future research may explore which expressions would arise when one would, for example, provide the individual components of the eligibility trace. However, we are convinced that our algorithm would then be able to recover the current form of the eligibility trace from its components if this form is indeed a prerequisite for high performance.
Moreover, we would like to point out that making use of domain knowledge is potentially desirable and explicitly supported by our approach. We can leverage prior knowledge about the relevance of certain kinds of plasticityrelated quantities instead of letting the algorithm “reinvent” them. If the user provides useful variables, the automated search is able to leverage information contained in these to suggest powerful rules. If a certain quantity has been consistently demonstrated to be related to synaptic plasticity in the literature, it seems reasonable to provide it as an input to the plasticity rule if the goal is to achieve high task performance, not the discovery of new “features”, e.g., variants of eligibility traces.
We have expanded the discussion of these points in the revised version of the manuscript.
The reworking of the STDP rules in the last Results section was not so clear to me.
First the concept of instability needs to be explained better.
Do these reworked rules perform identically? If so, is there a equivalence class of STDP rules that perform identically on this task?
We agree with the reviewer that the presentation of these results was suboptimal. The discovered plasticity rules indeed are functionally equivalent, i.e., despite their difference in mechanism, they lead to similar networklevel behavior. The constants in the discovered plasticity rules can indeed be interpreted as pre and postsynaptically triggered homeostatic mechanisms, suggesting that plasticity and homeostatic principles should always be considered jointly. The revised manuscript contains a more accessible presentation of these results.
Furthermore, can stability not be included in the fitness function (either as direct constraint on \Δ w (t>\pm \infty), or by widening the task repertoire)?
Including such direct constraints in the fitness function would be possible. However, this would require an additional hyperparameter for each constraint which governs how strongly this constraint influences the fitness of an individual. Similar to regularizers in traditional optimization problems, these hyperparameters would require careful tuning which is rather unnatural in our taskfocused framework.
Equivalently, a similar effect could be achieved by widening the task repertoire to include specific tasks that effectively implement such constraints. This would, of course, be possible, but would have the same drawback, namely the tuning and weighing of these additional tasks. Another option would be to include multiple natural, but still different tasks, which could implicitly optimize for stability by effectively imposing a higher number of constraints than a single task. This scenario might also require some tuning of the balance between the tasks, but would certainly be more natural and should therefore be explored in future work.
It is not so elegant to have a supposedly general technique, and then handtune the solutions….
We thank the reviewer for raising this concern, but we would like to point out that the analysis and potentially even modification of the discovered solutions is an essential part of the workflow enabled by our approach. One main advantage of our method is the availability of compact humanreadable symbolic expressions, in contrast to the plethora of methods relying on artificial neural networks to represent plasticity rules or optimization algorithms. This allows us to analyze the discovered rules with traditional methods to gain an understanding of the computational principles underlying them. We view our approach as a hypothesisgenerating machine that suggests possible plasticity rules. By themselves, these could provide new, computationally interesting solutions to learning problems. However, the benefit of our approach does not stop here. Equipped with the knowledge and expertise gained over many decades, researchers in our community can use these automatically generated expressions as starting points for further refinement, recombination and extrapolation. While all plasticity rules in our manuscript were generated by the evolutionary algorithm without handtuning, future work may involve manipulating the suggested expressions to incorporate additional expert knowledge not available to the automated search. We have expanded the discussion of these points in the revised manuscript.
In conclusion, I'm not convinced the study presents enough of a conceptual or methodological advance.
We thank the reviewer for their detailed critique of the manuscript and hope that our answers, alongside a substantially revised manuscript including new results, help to better substantiate the relevance of our approach. We hope that the revised manuscript, including new, previously unknown learning rules for reinforcement learning with spiking neurons, make the relevance of our work more accessible.
Reviewer #2:
The article "Evolving to learn, discovering interpretable learning rules for spiking networks" by Jordan et al. proposes an evolutionary algorithm to metalearn plasticity rules for spiking neurons. The algorithm learns to combine userdetermined inputs into a mathematical formula for updating synaptic weights, by gradually mutating and selecting a set of candidates based on their performance. The algorithm is applied to (families of) rewardbased, errorbased, and correlationbased tasks, all three performed by a single neuron. In each case, the algorithm recovers previously proposed learning rules (or variants thereof) that are known to optimize some performance measure.
The article is clearly written, timely, and presents an exciting approach to metalearning that holds the promise of not only generating taskspecific learning rules, but of providing them in an interpretable form. Its key weakness is its limitation to a rediscovery of existing general purpose rules, in a setup where the quantities that enter the rules seem somewhat preengineered. As such, the paper is a proofofconcept presentation of a very exciting method on simplistic examples. Whether the approach will actually be applicable to a situation with more inputs, for more complex settings (e.g., multilayer networks) and whether it will ultimately discover task ensemblespecific learning rules is yet to be seen.
We thank the reviewer for appreciating the novelty of our approach, their valuable comments, and constructive critique. The revised manuscript contains novel results that go beyond merely rediscovering previously know plasticity rules. Below we have addressed the issues raised by the reviewer in a pointtopoint reply.
1. Preengineering of inputs: It's nice that the authors test their method on three different learning tasks. However, the inputs that enter these learning rules (e.g., the eligibility traces) are chosen by hand and reflect substantial domain knowledge. To show that the method could be applied by a more agnostic user, it would be nice to see that, e.g., different rules could be learned from the same set of inputs. Would it be possible to learn the shape of the eligibility traces? Does the number of successful evolutionary runs decline quickly with the number of inputs the rules are allowed to use?
The inputs to the plasticity rules are quantities that have been previously been shown to be linked to plasticity, such as reward, lowpass filtered spike trains, and correlations between pre and postsynaptic activities. The eligibility trace has been successfully used in previous work on reinforcement learning with spiking neuronal networks and is a natural consequence of policygradient approaches. We agree with the reviewer that these inputs do reflect domain knowledge and that a possible direction of future research may explore which expressions would arise when one would, for example, provide the individual components of the eligibility trace. However, we are convinced that our algorithm would then be able to recover the current form of the eligibility trace from its components if this form is indeed a prerequisite for high performance.
Moreover, we would like to point out that making use of domain knowledge is potentially desirable and explicitly supported by our approach. We can leverage prior knowledge about the relevance of certain kinds of plasticityrelated quantities instead of letting the algorithm “reinvent” them. If the user provides useful variables, the automated search is able to leverage information contained in these to suggest powerful rules. If a certain quantity has been consistently demonstrated to be related to synaptic plasticity in the literature, it seems reasonable to provide it as an input to the plasticity rule if the goal is to achieve high task performance, not the discovery of new “features”, e.g., variants of eligibility traces.
The scaling of the evolutionary search with the number of inputs to the expressions heavily depends on a number of factors: the complexity of the “optimal solution”, the types of inputs, e.g., whether they are independent or correlated quantities, the operators available to the search and so on. Naturally, we expect the required runtime to increase with the complexity of the considered problem. We report that so far, we have not encountered insurmountable issues. However, we believe a detailed investigation of how CGP scales along various hyperparameter axes is outside the scope of the current work.
We have expanded the discussion of these points in the revised version of the manuscript and also refer to our answer to the reviewer’s next question.
2. Computational effort: Metalearning is somewhat notorious in its demand for computing resources, and the authors acknowledge a highperformance computing center. How computationally expensive is the method? How does the computational expense scale with the number of inputs to the learning rules?
The reviewer is correctly pointing out the high demand of computational resources for meta learning. For the experiments considered in the present manuscript, the computational costs are rather low, requiring 24 − 48 node hours for a few parallel runs of the evolutionary algorithms, easily within reach of a modern workstation. However, more complex tasks requiring larger networks and/or longer runs will increase the computational resources and thus require longer waiting times, more parallelism, or faster simulation platforms. In particular, we point to neuromorphic systems as likely candidates for this task. Another option is to evolve rules relying on small networks and simplified task, and only run few instances at full scale for example by considering so called “hurdles” in which a fitness evaluation is stopped early unless the fitness value is not high enough after initial simulations [e.g., 5]. Furthermore, previous work has successfully demonstrated the application of CGP to directly evolve agents for complex tasks, such as playing Atari games, providing some confidence in the method’s scalability [6]. We have expanded the discussion of these points in the revised version of the manuscript.
3. Discovery of unknown/nongeneral purpose rules: The paper would be strengthened substantially by an example of a discovered rule that is better than known general purpose rules. I fully appreciate that a search for such a situation may amount to finding a needle in a haystack and I suspect the authors have tried. I nevertheless dare to make a suggestion for a candidate: In Vasilaki et al. (2009), the authors used rewardbased spiking learning rules to learn a navigation task and argue that policygradient methods fail. What works much better in the end is a biased rule that effectively amounts to something simple like pre x post x reward. Such a rule is clearly biased and could make catastrophic errors in other situations. However, I've suspected for a while (and I think I discussed this with Walter Senn at some point) that such a rule could actually be very powerful in a setting where rewards and state and action representations are very sparse. I wouldn't suggest to the authors to try their method on a navigation task, but policy gradient rules in spiking neurons are notoriously slow in much simpler settings. It feels like it should be possible to beat them. Maybe there is a simple single neuron task with sparse inputs, sparse target outputs and sparse rewards?
We agree with the reviewer that discovering new learning principles using the presented approach is what makes it exciting. First, we would like to point out that the initial submission contained such new findings: various forms of STDP kernels lead to similar networklevel behavior. However, the presentation of the corresponding results was certainly improvable and we have accordingly completely revised the corresponding section. Second, we thank the reviewer for suggesting to reexamine Vasilaki et al. (2009) [7]. As discussed there and in many previous works [e.g., 8, 9, 10] the choice of a suitable reward baseline is notoriously difficult in policygradient learning. We therefore expanded our reinforcementlearning task with a number of new inputs representing different quantities suggested as reward baselines, such as the expected rewards and the expected positive and negative reward, respectively. Exploiting the unbiased search of our evolutionary approach, we were curious to see which learning rules would be discovered. Indeed, the evolutionary search found a variety of interesting plasticity rules for learning from rewards making use of the these new quantities. Our results for the task family considered here can be briefly summarized as follows:
References
– average rewards not distinguishing between positive and negative rewards do not provide a good baseline;
– a quantity loosely related to the “novelty” of a task provides a surprisingly effective adaptive baseline that does not require agents to keep track of expected rewards;
– homeostatic mechanisms and their interplay with activityregulated plasticity support high performance.
Not only do these discoveries provide insight into computational aspects of reinforcement learning, they also allow specific experimental predictions on the behavioral and neuronal level. To reflect these new results we have completely rewritten the corresponding section in the manuscript. We hope that the revised manuscript makes a more convincing point of the new insights that can be gained via our approach
4. How sensitive is the method to hyperparameters? The authors use mutation probability 0.045 in first tasks, but 0.05 in last.
The method is not specifically sensitive to hyperparameters, and we have not explicitly tuned these. Where possible we have chosen parameters extracted from previous work on CGP [e.g., 11]. However, compared to previous work our genome size was chosen rather small to keep the complexity of the resulting expression low. Complexity penalties introduced in the fitness evaluation could be introduced to make such choices obsolete in the future.
[Editors’ note: what follows is the authors’ response to the second round of review.]
Summary:
This computational neuroscience paper uses genetic algorithms (GAs) to find synaptic plasticity rules that perform best on a variety of simulated plasticity tasks with spiking neurons. While GAs are potential powerful, being nondifferentiable the study is limited by computationally requirements and can only find very simple equations and use fairly advanced preprocessing such predefined eligibilty traces.
We thank the editorial team for preparing this summary. However, we disagree with several points expressed in the summary:
• Nondifferential optimization algorithms are not fundamentally limited; the recent years have seen several successful largescale applications of nondifferentiable optimization algorithms [1]. We therefore consider that, while historically factual, many concerns about fundamental limitations of GAs are outdated. Along with an increasing number of researchers, we thus believe that the eld of algorithmic optimization has a lot to gain from the application of Gas in appropriate domains, e.g., nondifferentiable search spaces, multidimensional optimization, etc.
• We emphasize throughout the manuscript that it is our explicit goal to discover simple equations: in order to be interpretable, understandable and generalizable by humans; learning rules should not be arbitrarily complex.
• Using \advanced preprocessed quantities" is not a shortcoming of our approach, but an opportunity to leverage prior knowledge, such as eligibility traces which are commonly encountered in reinforcement learning algorithms; if such prior knowledge is intentionally ignored, one is free to supply different (\simpler") signals to the plasticity rule. We would like to point out that this is indeed done in the errordriven learning task.
Essential revisions:
1) The authors have added a number of additions to the manuscript that present clear improvement. However, I have doubts about one of the main additions: the inclusion of different baselines for the RL task. From my understanding, rewards are either 1 or 1. Doesn't that mean that [R]+ = (R+1)/2 and [R] = (R1)/2? If that's true, all three baselines are all linearly dependent. My suspicion is that this leads to substantial amounts of "neutral evolution" of terms that basically sum up to zero (or converge to zero exponentially). I juggled around a bit with the various rules and found quite a few of those "null" terms.
I suspect that the point discussed in the section on errordriven learning rules (l. 270) also applies to the RL rules, and that the differences between the rules mostly amount to learning rate (schedules) and "null terms". This may also explain why the gains in performance are not overwhelming. However, because the difference between the evolved rules is not analyzed in depth, this doesn't become clear in the manuscript. I'd suggest to support the discussion of the learning rules by figures. Maybe plot the different "causal" and "homeostatic" terms over time, and potentially something like a running average covariance with (R1) E?
Yes, this is the correct definition of positive and negative rewards, these quantities are indeed linearly dependent. In many cases, the new terms do represent an effective scaling of the learning rate, which depends on previously received reward and/or punishment as discussed in the main manuscript (e.g., l176ff, l192ff, l202ff, l215ff). Furthermore, these additional quantities are reflected in terms which can be interpreted as homeostatic mechanisms (e.g., l205ff, l218ff). Results on datasets not seen during evolution (Fig3D, new appendix Fig10) demonstrate that these terms, rather than being “null”, do have an effect on learning performance. We have now included figures in the appendix which visualize the time development of these additional causal and homeostatic terms over time, complementary to the mathematical analysis provided in the main manuscript.
2) Proper statistical analysis of the findings.
In particular in Figure 3 significance estimates against the nullhypothesis need to be presented.
For that section, I would also like to see if the found learning rules differ in their convergence speed on new data.
We have performed Welch’s Ttests for all learning rules against LR0 using an even larger number of experiments that previously (now 80, previously 30). pvalues for all discovered learning rules are smaller than floating point precision (p < 10^{−16}). We have included this information in the main manuscript.
Our defined fitness function implicitly measures convergence speed: the fitness is proportional to the cumulative reward obtained over a fixed number of trials. Naturally, learning rules which converge slower will thus have a lower fitness value. We have now included additional figures in the appendix (Fig11) which compare the convergence speed of the different learning rules on new data.
3) Include data on the convergence speed of the learning. I still would like to see compute time used for a typical run, which is suspiciously absent from the paper.
Regarding convergence speed, please see our answer to the previous question.
The compute time required is explicitly addressed in the discussion (l467ff), quite contrary to being “suspiciously absent”. We believe the required time (2448h runtime on a single workstation) is within reasonable reach for most labs.
Reviewer #1:
This paper uses genetic algorithms (GAs) to find synaptic plasticity rules on a variety of plasticity problems with spiking neurons. GAs have as an advantage that a free exploration of possible models potentially coming up with original, superior solutions. On the other hand, being nondifferentiable they are severely limited by computationally requirements and can only find very simple equations and use fairly advanced preprocessing such predefined eligibilty traces.
The paper reads on occasion more like an advertisement than a scientific paper.
While we agree with the reviewer’s assessment concerning the advantages of GAs, we find it necessary to add some nuance to the asserted drawbacks:
– GAs are not “severely limited by computational requirements” (see answer Summary above).
– Finding simple rules is not a drawback but an explicit goal of our approach (see answer to Summary above).
Furthermore, we believe to have offered scientifically rigorous arguments for all of our claims, based on sound theory and extensive simulations. Should the positive conclusions that we draw from such evidence be subsumed as “advertisement”, then we might disagree on semantics, but wholeheartedly stand by our claims.
For the unsupervised rules, was the LTP made explicitly dependent on (w1)? Or was this found automatically?
The divergencies described in the previous version of the manuscript seemed to have disappeared like snow in the sun.
No, we did not adjust the LTP part manually, these terms were found autonomously by the GA.
The divergences discussed in the previous version of the manuscript have not “disappeared like snow in the sun”. On the contrary: the snow remains completely unchanged, but we do shed additional light on its makeup. More specifically, we reanalyzed the learning rules and in particular their implementation in NEST and discovered a more appropriate interpretation of constant terms (terms not decaying to zero for long intervals between pre and postsynaptic spiking) namely as pre and postsynaptically triggered homeostatic mechanism. This is discussed at length in our previous reply letter as well as in our previous manuscript revision.
Reviewer #2:
– The discussion about the need to jointly consider learning rule and homeostatic mechanisms is nice, but of limited novelty. I'd suggest to cite at least Mackay and Miller (1994) here.
We thank the reviewer for their critical evaluation and constructive criticism. We have included the reference in the revised manuscript.
– Figure 3: Panel 3 doesn't show any rules with new baselines. Is this old data?
Figure 3C shows data from multiple runs of the GA without access to the new baselines, in that sense it’s similar to the ”old data”.
– I failed to find the time window on which the baselines are computed (m=?) This is quite important for your discussion about time varying learning rates.
The time window is unfortunately only reported in the parameter tables in the appendix (m_{r} = 100trials (50s)). In the revised version we mention this number in the main text.
– Section references are broken. Recompile?
We have replaced all broken references, thanks.
– Double words: l. 82, 377
We have fixed the double words, thanks.
– l. 364: I learned the hard way to stay away from "first demonstration" claims in papers …
We have removed the “first demonstration” sentence; we agree this is challenging to assess appropriately.
Reference
1. Such, F. P., Madhavan, V., Conti, E., Lehman, J., Stanley, K. O., & Clune, J. (2017). Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. arXiv:1712.06567.; Stanley, K. O., Clune, J., Lehman, J., & Miikkulainen, R. (2019). Designing neural networks through neuroevolution. Nature Machine Intelligence, 1(1), 2435.; Real, E., Aggarwal, A., Huang, Y., & Le, Q. V. (2019, July). Regularized evolution for image classi_er architecture search. In Proceedings of the AAAI conference on arti_cial intelligence (Vol. 33, No. 01, pp. 47804789).; Real, E., Liang, C., So, D., & Le, Q. (2020). AutoMLzero: evolving machine learning algorithms from scratch. In International Conference on Machine Learning (pp. 80078019). PMLR.
https://doi.org/10.7554/eLife.66273.sa2Article and author information
Author details
Funding
European Commission (604102)
 Jakob Jordan
 Walter Senn
 Mihai A Petrovici
European Commission (720270)
 Jakob Jordan
 Walter Senn
 Mihai A Petrovici
European Commission (785907)
 Jakob Jordan
 Walter Senn
 Mihai A Petrovici
Universität Heidelberg (Manfred Stärk Foundation)
 Mihai A Petrovici
National Centre for Supercomputing Applications
 Jakob Jordan
 Maximilian Schmidt
European Commission (800858)
 Jakob Jordan
European Commission (945539)
 Jakob Jordan
 Walter Senn
 Mihai A Petrovici
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We gratefully acknowledge funding from the European Union, under grant agreements 604102, 720270, 785907, 945539 (HBP) and the Manfred Stärk Foundation. We further express our gratitude towards the Gauss Centre for Supercomputing e.V. (https://www.gausscentre.eu) for cofunding this project by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS at Jülich Supercomputing Centre (JSC). We acknowledge the use of Fenix Infrastructure resources, which are partially funded from the European Union’s Horizon 2020 research and innovation programme through the ICEI project under the grant agreement No. 800858. We thank all participants from the HBP SP9 Fürberg meetings for stimulating interactions and Tomoki Fukai for initial discussions and support. We also thank Henrik Mettler and Akos Kungl for helpful comments on the manuscript. All network simulations carried out with NEST (https://www.nestsimulator.org).
Senior Editor
 Michael J Frank, Brown University, United States
Reviewing Editor
 Mark CW van Rossum, University of Nottingham, United Kingdom
Reviewer
 Henning Sprekeler, Technische Universität Berlin, Germany
Publication history
 Received: January 5, 2021
 Accepted: August 19, 2021
 Version of Record published: October 28, 2021 (version 1)
Copyright
© 2021, Jordan et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,362
 Page views

 297
 Downloads

 2
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Developmental Biology
 Computational and Systems Biology
Many developmental processes depend on precise temporal control of gene expression. We have previously established a theoretical framework for regulatory strategies that can govern such high temporal precision, but experimental validation of these predictions was still lacking. Here, we use the timedependent expression of a Wnt receptor that controls neuroblast migration in Caenorhabditis elegans as a tractable system to study a robust, cellintrinsic timing mechanism in vivo. Singlemolecule mRNA quantification showed that the expression of the receptor increases nonlinearly, a dynamic that is predicted to enhance timing precision over an unregulated, linear increase in timekeeper abundance. We show that this upregulation depends on transcriptional activation, providing in vivo evidence for a model in which the timing of receptor expression is regulated through an accumulating activator that triggers expression when a specific threshold is reached. This timing mechanism acts across a cell division that occurs in the neuroblast lineage and is influenced by the asymmetry of the division. Finally, we show that positive feedback of receptor expression through the canonical Wnt pathway enhances temporal precision. We conclude that robust cellintrinsic timing can be achieved by combining regulation and feedback of the timekeeper gene.

 Computational and Systems Biology
 Neuroscience
Brains are not engineered solutions to a welldefined problem but arose through selective pressure acting on random variation. It is therefore unclear how well a model chosen by an experimenter can relate neural activity to experimental conditions. Here we developed 'Model identification of neural encoding (MINE)'. MINE is an accessible framework using convolutional neural networks (CNN) to discover and characterize a model that relates aspects of tasks to neural activity. Although flexible, CNNs are difficult to interpret. We use Taylor decomposition approaches to understand the discovered model and how it maps task features to activity. We apply MINE to a published cortical dataset as well as experiments designed to probe thermoregulatory circuits in zebrafish. MINE allowed us to characterize neurons according to their receptive field and computational complexity, features which anatomically segregate in the brain. We also identified a new class of neurons that integrate thermosensory and behavioral information which eluded us previously when using traditional clustering and regressionbased approaches.