Abstract
During many tasks the brain receives real-time feedback about performance. What should it do with that information, at the synaptic level, so that tasks can be performed as well as possible? The conventional answer is that it should learn by incrementally adjusting synaptic strengths. We show, however, that learning on its own is severely suboptimal. To maximize performance, synaptic plasticity should also operate on a much faster timescale – essentially, the synaptic weights should act as a control signal. We propose a normative plasticity rule that embodies this principle: fast synaptic weight changes greedily suppress downstream errors, while slow synaptic weight changes implement statistically optimal learning. This enables near-perfect task performance immediately and efficient task execution on longer timescales, and it confers robustness to noise and other perturbations. Applied in a cerebellar microcircuit model, the theory explains longstanding experimental observations and makes novel testable predictions.
Introduction
The standard view of learning is that synaptic strengths are adjusted to minimize some loss – typically the performance on a task, or set of tasks (Richards and Kording 2023; Bredenberg and Savin 2023). From this perspective, given the complexity of the nervous system and the range of tasks it must perform, conventional wisdom is that synaptic strengths change slowly. This makes sense for tasks with a discrete set of responses: Do I lick left or right in response to a stimulus? Is that grating oriented to the right or left of vertical? However, for naturalistic tasks with continuous output, such as reaching for an object, feedback about performance could be used to adjust weights on much faster timescales. In fact, for sufficiently fast feedback compared to the timescale over which the world changes, fast weight changes could, at least in principle, enable near-perfect performance.
Here we propose that synapses can maximize task performance by adjusting weights on two timescales. On fast timescales, synapses can use real-time feedback to suppress immediate errors. As long as the world changes relatively smoothly over time, if feedback at one moment indicates that neural output is too high or low, synaptic strengths can be transiently decreased or increased to compensate. On slow timescales, synapses can use the same feedback signal for statistically optimal learning.
We first illustrate the main concept with a toy model, demonstrating how two timescales of plasticity can work in concert. We then consider a more realistic model of a neuron, where synapses dynamically integrate input and feedback to extract useful learning signals from noisy observations. We derive synaptic update rules for this model by framing synaptic plasticity as an optimal control problem. This leads to substantial improvements over classical gradient-based learning. We then generalize the theory to incorporate small populations of neurons and delayed feedback transmitted via spikes. Applied to a model of temporal processing in the cerebellum, our theory provides normative explanations for common experimental observations, and makes novel testable predictions. Altogether, these results provide a principled account of how the brain can exploit multiple timescales of plasticity for efficient online adaptation and learning.
Results
We propose that neurons can optimize their performance by having two separate timescales of synaptic plasticity. To illustrate the general idea, consider a simple online linear regression problem. We model a single neuron that must learn to transform a set of time-varying input rates, νi(t), into a target output, y*(t). The output of the neuron is given by

y(t) = Σi wi νi(t),    (1)
with parameters wi denoting its tunable synaptic weights. The classical solution to this problem is to discretize time and iteratively modify the weights using the delta rule (Widrow and Hoff 1960),

Δwi = −η (y(t) − y*(t)) νi(t),    (2)
where Δwi ≡ wi(t + Δt) − wi(t). This eventually minimizes the output error provided the learning rate, η, is small.
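To make the classical baseline concrete, here is a minimal sketch of one delta-rule step; the function name and learning-rate value are ours, chosen only for illustration.

```python
import numpy as np

# One delta-rule step (equations (1)-(2)): nudge each weight against the
# current error, in proportion to its own input rate.
def delta_rule_step(w, nu, y_star, eta=1e-3):
    y = w @ nu                        # neuron output, y = sum_i w_i nu_i
    w = w - eta * (y - y_star) * nu   # delta w_i = -eta (y - y*) nu_i
    return w, y
```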
The delta rule is attractive because it is generally effective and it is biologically plausible – each synapse requires access only to its own local input rate, νi, and a global error feedback signal, y − y*. If, however, νi and target output, y*, change smoothly over time, temporal correlations can be exploited to dramatically improve performance. To exploit temporal correlations, we endow the synaptic weights with independent fast and slow components,

y(t) = Σi (wi(t) + δwi(t)) νi(t).    (3)
The role of the fast weights, δwi, is to transiently adjust neuronal output to suppress immediate errors – errors that can be predicted if y* changes slowly. The role of the slow weights, wi, is to adapt, on a much slower timescale, to the true weights, thereby reducing the need for ongoing fast-weight corrections (Fig. 1a).

A theory of fast and slow synaptic plasticity.
a) Left: A synapse must make online adjustments to its strength by integrating local signals, such as its own input and error feedback. We propose that these signals can be optimally exploited through two timescales of plasticity. ‘Fast weights’, δwi, fluctuate rapidly to suppress downstream error, whereas ‘slow weights’, wi, converge gradually to the values required of a given task. Right: In the toy-model simulation from panel (b), as the slow weights find the solution, fast-weight fluctuations are reduced. Shown are example weight trajectories from one randomly selected synapse out of 20. b) In an illustrative online regression task, a neuron must learn to match its output to a time-varying target (gray dashed line). With a classical delta rule (black line), weights adapt over time to eventually correct the output. With two timescales of plasticity (purple line), fast weights can pin the output to the target from the outset, while slow weights evolve in the background to learn the task.
For the model described by equation (1), a simple implementation of this strategy is to observe the error at discrete times t, and then update all of the fast weights uniformly via

Δδwi(t) = −(y(t) − y*(t))/(Nν),    (4)
where Δδwi ≡ δwi(t + Δt) − δwi(t) and ν denotes the expected input rate per synapse. These updates cancel the current error, forcing the output to track the target,

y(t + Δt) ≈ y*(t),    (5)
where the approximation is good if there are a large number of synapses and Δt is small relative to how fast the target and input firing rates change (Methods: Online linear regression).
While suppressing error with rapidly fluctuating weights is effective, continually making these short-term adjustments is likely to be energetically costly. This can be addressed by allowing the slow weights to learn. Intuitively, the closer the output is to the target, the less work will be needed to correct it. We therefore seek long-term adjustments to the slow weights such that, ultimately, the fast corrections become unnecessary, δwi → 0. Gradient descent on the size of the required correction leads to the local update (Methods: Online linear regression)

Δwi = η Nν δwi νi,    (6)

where η is a small learning rate.
In simulations of this simple strategy, the weights that solve the regression problem can be learned just as efficiently as in the classical case, but with the striking difference that the output is pinned to the target from the outset (Fig. 1b). Thus, a neuron, or indeed an animal, could accurately execute a task even while it is still being learned. Below we develop a control theory framework that generalizes this to more realistic scenarios.
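A minimal simulation of this toy strategy is sketched below, assuming smoothly varying input rates and illustrative parameter values (not those used in the paper's simulations).

```python
import numpy as np

rng = np.random.default_rng(0)

# Two timescales of plasticity in the toy regression task (equations (1)-(6)).
N, steps = 20, 5000
eta = 1e-4                   # slow learning rate
nu_exp = 1.0                 # expected input rate per synapse, nu

w_star = rng.normal(size=N)  # target weights, used to generate y*
w = np.zeros(N)              # slow weights
dw = np.zeros(N)             # fast weights (uniform across synapses, eq. (4))

for t in range(steps):
    nu = nu_exp + 0.5 * np.sin(2e-3 * t + np.arange(N))  # smooth input rates
    y_star = w_star @ nu              # target output
    y = (w + dw) @ nu                 # actual output (eq. (3))
    err = y - y_star
    dw += -err / (N * nu_exp)         # fast update (eq. (4)): cancel the error
    w += eta * N * nu_exp * dw * nu   # slow update (eq. (6)): make the
                                      # correction permanent
```

From the first few steps the output is pinned to the target, while w drifts toward w_star and the fast corrections shrink, as in Fig. 1.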
Synaptic plasticity as optimal control
We now apply this idea to a more realistic setting: a model neuron driven by multiple spike trains and subject to intrinsic noise. Similar to the regression problem above, the neuron must tune its synaptic weights to drive a downstream output, y, to match a time-varying target, y*. The target could represent, for instance, the movement of a limb, but in this analysis we will take it to be an abstract, time-varying one-dimensional signal. We assume there are N synapses, and each one has access to two sources of information: the presynaptic spikes it receives, and an error signal.
The dynamics of the model neuron are described by equations for the total synaptic current, I, and firing rate, r,

τI İ = −I + Σi wi xi + σI ξI,    (7a)
τr ṙ = −r + αI + σr ξr,    (7b)
where a dot denotes a time derivative. In these equations, τI ∼ 5 ms and τr ∼ 50 ms denote the synaptic and rate-modulation time constants, wi denotes the weight of synapse i, and each xi is an independent Poisson spike train represented by a sum of delta functions,

xi(t) = Σk δ(t − ti,k),    (8)
where ti,k is the time of the kth spike arriving at synapse i. The parameter α sets the scale of firing rate response to synaptic current, and the final terms, ξI and ξr, denote zero-mean, unit-variance Gaussian white noise processes: ⟨ξI(t)ξI(t′)⟩ = δ(t − t′), and similarly for ξr. Functionally, a barrage of input spikes is linearly filtered to drive the firing rate of the neuron.
The downstream output of the neuron, y, is driven by the firing rate of the neuron and white noise,

τy ẏ = −y + r + σy ξy.    (9)
The time constant, τy (taken to be ∼100 ms), determines how quickly the output responds to changes in firing rate, and the final term is, as above, zero-mean, unit-variance Gaussian white noise.
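For reference, the model can be integrated with a simple Euler-Maruyama scheme; the noise amplitudes, weight scale, and input rate below are placeholders rather than the values in the paper's parameter table.

```python
import numpy as np

rng = np.random.default_rng(0)

# Euler-Maruyama integration of equations (7)-(9).
N, dt = 1000, 1e-3                       # synapses; 1 ms time step
tau_I, tau_r, tau_y = 0.005, 0.05, 0.1   # time constants (s)
alpha, nu = 1.0, 5.0                     # rate gain; input rate (spikes/s)
sigma_I = sigma_r = sigma_y = 0.1        # noise amplitudes (placeholders)
w = rng.normal(0.0, 1.0 / np.sqrt(N), size=N)

I = r = y = 0.0
for step in range(10_000):
    x = (rng.random(N) < nu * dt) / dt   # Poisson spikes as area-1 pulses
    eI, er, ey = rng.standard_normal(3)
    I += (dt * (-I + w @ x) + np.sqrt(dt) * sigma_I * eI) / tau_I
    r += (dt * (-r + alpha * I) + np.sqrt(dt) * sigma_r * er) / tau_r
    y += (dt * (-y + r) + np.sqrt(dt) * sigma_y * ey) / tau_y
```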
We develop the theory using a teacher-student learning framework: the target, y*, is generated by simulating equations (7) and (9), except with no noise and with a set of target weights, denoted wi*. The synapses do not observe y* directly; instead, they receive a continuous feedback signal carrying a noisy measurement of the output error,

f = y − y* + σf ξf,    (10)
where ξf is zero-mean, unit-variance Gaussian white noise.
The full derivation of the plasticity rule is provided in Methods: Derivation of the Bayesian plasticity rule. Here we outline the general approach and main results. Similar to the analysis above, we decompose the synaptic weights into independent fast and slow components, wi + δwi.
To derive the fast-weight rule, we consider a typical control strategy: using past error signals (equation (10)), compute a control variable, u, designed to steer the output toward the target. Suppose that each synapse can compute such an error-minimizing control, u. Setting the fast weights uniformly to

δwi = u/(Nν),    (11)

with ν denoting the expected input rate per synapse, then leads to

Σi δwi xi ≈ u,

so that the fast weights collectively add the control signal to the neuron's total synaptic current.
We would like to choose u to minimize the total cost

𝔼[∫0→T dt ((y − y*)² + λu u²)],    (12)

where T is the task duration and the expectation, 𝔼 […], is over the noise. The parameter λu is the ‘cost of control’; making it larger reduces fluctuations in the fast weights but increases the squared error between y and y*; making it smaller has the opposite effect. The quadratic loss in equation (12), along with the Gaussian white noise in the dynamical equations, means that finding the optimal control is a standard linear-quadratic-Gaussian (LQG) problem: the solution combines an optimal online estimate of the error with linear feedback of that estimate (Methods).
The resulting equations predict how synapses should process input and feedback for optimal plasticity. For ease of intuition, we present the results here with some simplifying approximations. The results without approximations, given by equation (76), are used in simulations, although the approximate equations below provide comparable performance (Fig. S1).
To set the fast weights, u is computed by processing the error feedback signal, f (equation (10)), via

τy dδŷ/dt = −δŷ + u + 𝒦 (f − δŷ),    (13a)
u = −ℒ δŷ    (13b)

(Methods, equations (80a) and (92)). Equation (13a) filters the noisy feedback to compute an online estimate, δŷ, of the output error, y − y*, with the feedback gain, 𝒦, setting how strongly each new observation corrects the estimate. Equation (13b) sets the control to the negative of the estimated error, scaled by the control gain, ℒ.
How do the fast weights evolve in this scenario? Taking a time derivative of equation (11), and substituting equations (13a) and (13b), leads to

τy dδwi/dt = −(1 + ℒ + 𝒦) δwi − (ℒ𝒦/(Nν)) f.    (14)
The fast weights are thus a low-pass filtered version of the negative of the error signal, which is the optimal strategy in the presence of noise. To see how this relates to the toy-model strategy above, we use the fact that when the control cost, λu, and feedback noise variance, σf², are small, the gains ℒ and 𝒦 are large. The filter in equation (14) is then fast, and the fast weights track

δwi ≈ −(ℒ𝒦/(1 + ℒ + 𝒦)) (y − y* + σf ξf)/(Nν),    (15)

where we used equation (10) for f. Choosing the control cost to ensure that the product 𝒦ℒ obeys the relationship 𝒦ℒ = 1 + 𝒦 + ℒ then recovers the toy-model updates of equation (4): the fast weights continually cancel the current error, δwi ≈ −(y − y*)/(Nν).
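The estimate-and-cancel loop above is compact enough to state in code. In this sketch the gains 𝒦 and ℒ are passed in as constants (in the theory they follow from the Riccati solutions in Methods), and all names are ours.

```python
# One Euler step of the fast-weight controller (equations (13a)-(13b)).
def fast_weight_step(dy_hat, f, K, L, tau_y, dt, N, nu):
    u = -L * dy_hat                                          # control (13b)
    dy_hat += dt / tau_y * (-dy_hat + u + K * (f - dy_hat))  # error filter (13a)
    delta_w = u / (N * nu)                                   # uniform fast weights (11)
    return dy_hat, delta_w
```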
What about the slow weights – should they even be updated? And if so, how? To answer the first question, we note that much of the error being controlled by the fast weights is due to mismatch between the slow weights, wi, and the target weights, wi*. Learning the target weights therefore permanently reduces the error, and with it the need for ongoing control.
A simple approach to updating the slow weights would be to use a modified delta rule, similar to equation (6). However, in this more realistic setting, the delta rule is suboptimal for two reasons. First, the error feedback signal, f, is noisy, which leads to noisy weight fluctuations, and even instability unless the learning rate is very small. Second, as pointed out by Aitchison et al. (2021), efficient weight updates should depend on uncertainty, with more uncertain weights updated more rapidly.
Qualitatively at least, the first problem can be fixed by filtering the feedback signal, f, and the second by scaling the learning rate by the uncertainty. Determining exactly how to design the filter and how much to scale is, however, somewhat nontrivial. Fortunately, the analysis used to derive the update rule for the fast weights leads naturally to a rule for updating the slow weights that addresses these problems.
As shown in Methods: Derivation of the Bayesian plasticity rule (see in particular equation (80)), the update rule for the slow weights is an uncertainty-weighted, filtered delta rule, given by equations (16a)–(16c).
The first term on the right-hand side of equation (16a) looks very similar to the delta rule. However, there are three notable differences. First, the term proportional to the error uses the filtered estimate, δŷ, rather than the raw noisy feedback, f. Second, the learning rate is set by the synapse's current uncertainty, σi², so that more uncertain weights are updated more rapidly. Third, the instantaneous presynaptic input is replaced by an eligibility trace, zi (equation (16c)), which accounts for the temporal dynamics linking synaptic input to downstream error.
Mathematically, equations (16a) and (16b) represent the evolving mean and variance of a posterior distribution over target weights, similar to the approach introduced by Aitchison et al. (2021). This is the best estimate that can be formed by the synapse, given the observed data, and therefore the most efficient learning rule for inferring the target weights from locally available signals.
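To see why tracking a posterior yields an adaptive learning rate, it helps to consider a deliberately simplified scalar analogue: a Kalman-Bucy filter tracking one drifting weight from noisy observations. This is a caricature for intuition, not the paper's full rule (equation (76)); the names and sign conventions are ours.

```python
# Scalar Kalman-Bucy caricature of slow-weight learning: the posterior
# variance acts as the learning rate, so uncertain weights move more.
# innovation = observed minus predicted error signal; pre = presynaptic
# activity; sigma_f2 = feedback noise variance.
def slow_weight_step(w, var, innovation, pre, sigma_f2, tau_w, prior_var, dt):
    gain = var * pre / sigma_f2                  # uncertainty-scaled gain
    w += dt * (gain * innovation - w / tau_w)    # track data; decay with drift
    var += dt * (2 * (prior_var - var) / tau_w   # drift inflates uncertainty,
                 - gain**2 * sigma_f2)           # observations shrink it
    return w, var
```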
Using the teacher-student task, we compared our plasticity rule to a classical gradient-based rule for a neuron with 1000 synapses. The results are shown in Fig. 2. The classical approach is a generalized delta rule based on the Real-Time Recurrent Learning algorithm (Williams and Zipser 1989; Pearlmutter 1995), which is the canonical approach for online learning in dynamic models. We included simulations of the full Bayesian rule, as well as versions using the slow and fast components alone. Learning with the Bayesian slow-weight rule alone is already superior to the classical rule, both in terms of total output error (Fig. 2a) and the speed of convergence of the weights towards their target values (Fig. 2b). Output error is reduced further still when the fast weights are included, due to suppression of noise and compensation for weight mismatch at early stages of learning. Moreover, weight convergence in this case is almost indistinguishable from when slow-weight learning is implemented in isolation. Although a similarly low level of error is obtained using the fast weights alone, this requires much larger control currents (Fig. 2c), and no memory of the task is actually retained in the weights. With some small modifications (Methods: Feedback delays), our rule also accounts for time lags in the feedback signal, representing communication delays in the brain. In this case, slow-weight learning is virtually unaffected and fast-weight control remains effective for delays on the order of the downstream time constant (Fig. 2d).

Synaptic plasticity as optimal control.
a) Performance comparison between the classical gradient-based rule (black) and Bayesian rule without fast weights (blue), with plastic fast weights and frozen slow weights (red), and with plastic slow and fast weights (purple). Bars denote root-mean-squared error (RMSE) between output, y, and target output, y*, averaged over the entire task. Data points denote different random seeds. The dashed line gives the average error when slow weights are set to their target values at every point in time. The learning rate for the classical rule and control cost for the Bayesian rule were selected via grid search to minimize output error. b) The Bayesian rules yield faster convergence of weights to their target values compared to the classical rule, quantified as RMSE between weight vectors. Shaded areas are standard deviation from 10 random seeds. c) Root-mean-squared output error versus fast weight fluctuations, the latter computed as the root-mean-squared fast weight divided by root-mean-squared slow weight (averages taken over entire task). The size of the fast weight fluctuations was controlled by the cost parameter λu (see equation (12)). In the full plasticity rule (purple), even small fluctuations of ∼ 10% lead to a large reduction in error. Without slow-weight learning (red), much more control is needed for a similar level of performance. Points denote the average over seeds for a fixed λu; the shaded ellipses denote standard deviation across seeds. d) Control is effective for feedback delays shorter than the output time constant (here, τy = 100 ms). Shaded areas are standard deviation from 10 random seeds.
Control and learning with spiking feedback
So far we have considered a continuous feedback signal, representing diffuse transmission of a neuromodulator. But many important feedback pathways in the brain communicate via spikes; we now generalize the theory to incorporate these signals. Our modeling choices are specifically motivated by the sparse climbing-fiber inputs to cerebellar Purkinje cells, which provide instructive signals that drive plasticity at parallel-fiber synapses (Hull and Regehr 2022; Silva et al. 2024). However, the general formulation is applicable to other scenarios, such as dendritic spikes in cortical neurons that are triggered by feedback connections (Larkum 2013; Richards and Lillicrap 2019; Fişek et al. 2023).
We model the spiking feedback signal, fS, as an inhomogeneous Poisson process with rate

γ(t) = γ0 exp(ρ (y − y*)).    (17)
Concretely, fS comprises a series of delta functions centered at spike times that are generated with instantaneous rate γ(t). The synapses ‘see’ the spikes, and can use those to update synaptic strength. The parameters γ0 and ρ set the spontaneous rate of feedback and steepness of the nonlinear response to output error (representing, for instance, the transfer function of an error-coding neuron).
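Generating this feedback signal in simulation is straightforward; the sketch below assumes the exponential rate function of equation (17), with γ0 matching the cerebellar simulations later in the paper and ρ a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Spiking feedback (equation (17)): an inhomogeneous Poisson process whose
# instantaneous rate grows exponentially with the output error.
def feedback_spike(y, y_star, gamma0=2.0, rho=1.0, dt=1e-3):
    rate = gamma0 * np.exp(rho * (y - y_star))  # spikes/s
    return int(rng.random() < rate * dt)        # 0 or 1 spike in this bin
```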
With spiking feedback, the standard formalism cannot be used for estimating the error and target weights. Instead, similar to Eden et al. (2004) and Pfister et al. (2010), we derived a recursive Bayesian algorithm. The details are provided in Methods: Spiking feedback, with the full plasticity rule described by equation (117). This has a similar structure to the rule discussed above (equations (13) and (16)). Incoming feedback spikes are optimally filtered to form an estimate of the underlying continuous error, which is then used both to set the fast weights and to update the slow weights.
The eligibility trace, which evolves similarly to equation (16c), records the recent activity level of the synapse. Qualitatively, the slow weights of recently active synapses tend to decrease in steps when feedback spikes arrive and increase gradually in between. This process eventually reaches a steady state where the observed feedback rate is equal to the predicted rate, which happens when the target weights have been accurately estimated.

Control and learning with spiking feedback.
a) When feedback is communicated via noisy spiking activity, optimal plasticity requires the synapse to infer the true underlying continuous error signal. The estimated error is then used to drive both control and learning. Fast weights seek to improve performance by canceling the estimated error (lower panel), while slow weights seek to reduce the error permanently. b) Performance depends on the rate of feedback. High feedback spike rates increase the precision of error estimates, thereby enhancing learning and control. A population of neurons acting on multiple independent feedback signals (dashed purple trace) can compensate for very low individual feedback rates. Shaded areas are standard deviation from 10 random seeds.
We compare the Bayesian rules to a spiking-feedback version of the classical rule used in the continuous model above (Methods, equation (38)). Notably, the classical spike-based rule conforms to the standard model of Purkinje cell plasticity (Ito 2001; Coesmans et al. 2004): synaptic input alone leads to long-term potentiation (LTP), whereas the conjunction of synaptic input and feedback spikes is required for long-term depression (LTD).
Simulations with the spiking-feedback model were consistent with those of the continuous model, recapitulating all of the previously observed advantages of Bayesian plasticity (Figs. 3b and S2). Learning in this case is substantially harder, requiring an order of magnitude more time to reach equivalent levels of performance, primarily because spikes occur rarely. The spontaneous feedback rate, γ0, is therefore a key determinant of performance, with higher rates leading to faster learning and lower output error (Fig. S2).
Not only does a low feedback rate make learning hard, it also makes fast-weight updates less effective. This is not surprising, since online corrections can only be reliably made soon after receiving an error-encoding spike. However, we found performance can be boosted by distributing the task and feedback across multiple neurons. We explored this by simulating M = 20 neurons, each as described by equation (7). For ease of comparison with the single-neuron simulations, we placed 50 synapses on each of the M neurons to give the same total number of synapses. The firing rates in the population model, rm, are summed to drive a common downstream output, modifying equation (9),

τy ẏ = −y + Σm rm + σy ξy.    (18)
Each neuron receives a separate Poisson feedback signal, with spikes generated independently from the common error-dependent rate γ(t). Because the collected feedback spikes now provide a denser error signal, effective control can be achieved at the population level even when γ0 is small (Fig. 3b, dashed line).
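In simulation, the only change relative to the single-neuron model is the summed drive in equation (18); a sketch, with parameters as in the neuron model above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population output step (equation (18)): M firing rates drive one output.
def population_output_step(y, rates, tau_y=0.1, sigma_y=0.1, dt=1e-3):
    noise = np.sqrt(dt) * sigma_y * rng.standard_normal()
    return y + (dt * (-y + rates.sum()) + noise) / tau_y
```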
Fast and slow synaptic plasticity in the cerebellum
Our theory was derived in a setting where the target weights are defined explicitly and the target output is guaranteed to be realizable by the model. To validate our results in a biological setting and, crucially, to make experimentally testable predictions, we apply it to a cerebellar learning problem.
Our population model can be mapped onto the structure of a cerebellar microzone – a group of Purkinje cells receiving correlated climbing-fiber input and whose outputs converge on common down-stream targets (Fig. 4a) (Apps et al. 2018). As an abstraction of microzone processing, we consider a population of 20 neurons that must learn to transform patterns of synaptic input into time-varying outputs (Fig. 4b). The inputs come from parallel fiber spikes, assumed to be driven by time-varying patterns of granule cell activity (Medina et al. 2000; Gilmer et al. 2023). The output represents a downstream motor signal, or a predicted motor output (Wolpert et al. 1998). Feedback is provided by sparse climbing-fiber spikes, with γ0 = 2 spikes/s, leading to firing rates close to those observed in vivo (Armstrong and Rawson 1979).

Fast and slow plasticity in the cerebellum.
a) Schematic of the cerebellar microzone model. A group of Purkinje cells receiving synaptic input from parallel fibers project to a common downstream output. Each Purkinje cell receives feedback spikes from a separate climbing fiber, signaling the error between actual and target outputs. b) Schematic of the learning task. Ten patterns of temporally organized parallel-fiber input must be mapped by a population of Purkinje cells to time-varying target outputs. c) Example outputs early and late in learning, replicating the results from the toy model in Fig. 1b. d) Performance during early (100 − 200 s) and late (5000 − 5100 s) stages of learning. Bars denote RMSE between output and target output, averaged over 100 s. e) Left: After training on 10 input-output mappings, half of the target outputs were changed. Right: Bayesian plasticity confers faster recovery than the classical approach. With fast weights, the output is largely insensitive to the perturbation. Error curves have been smoothed with a moving average filter of width 10 s. Shaded areas are standard deviations from simulations pooled over 10 random training seeds × 10 perturbations.
We defined 10 pairs of input-output patterns, {νp(t), y*p(t)}, and presented them in random order throughout training. As in the toy model, fast-weight control pinned the output to its target from early in training, while slow-weight learning gradually reduced the underlying error and, with it, the corrective fast-weight currents (Fig. 4c,d).
So far we have focused on learning a particular input-output mapping. What happens when the desired input-output mapping changes? To address this, we tested the response of the trained models to changes in the target outputs. As above, we trained on 10 input-output mappings, but after training we changed 5 of the outputs (Fig. 4e). An efficient response requires synapses to adapt selectively to learn the new input-output pairs, without disrupting previously learned pairs. We find that Bayesian plasticity is much more robust than the classical approach, enabling rapid recovery over a few tens of pattern presentations. With fast-weight updates, the output error is almost unaffected by the perturbation. And in the background, the slow weights gradually converge to their new target values, reducing the magnitude of ongoing control currents and updating the stored memory of the task.
Signatures of fast and slow synaptic plasticity
We use the cerebellar learning task to make experimental predictions for the control and learning components of our theory. In the cerebellar model, the control signal depends strongly on climbing-fiber input, transiently adjusting activity whenever a feedback spike arrives. To measure the size of this effect in our model, we plotted the firing rates of neurons relative to the time of their associated climbing-fiber spikes. These simulations show a pronounced dip in firing rates after a climbing-fiber spike, but only when there are fast-weight updates; the dip disappears both for classical and Bayesian updates of slow weights only (Fig. 5a). Such patterns of activity – climbing-fiber-induced ‘spike pauses’ – have been widely observed in vivo (Bell and Grimm 1969; Bloedel and Roberts 1971; Sato et al. 1992; Barmack and Yakhnitsa 2003; Han et al. 2020). However, while widely observed, their computational significance has remained unclear; our model provides a novel, normative explanation that spike pauses are a strategy for suppressing downstream error. Moreover, it makes two experimentally testable predictions: the duration of the dip increases with the time constant of downstream dynamics, τy, and the magnitude of the dip decreases with feedback delays (Fig. 5b). Both predictions could be tested by manipulating output dynamics experimentally, or exploiting the fact that feedback delays vary across different regions of the cerebellum (Suvrathan et al. 2016; Jayabal et al. 2022).

Signatures of fast and slow synaptic plasticity.
a) Suppression of Purkinje-cell firing rates after climbing-fiber input is a signature of fast-weight updates. The solid lines denote averages of firing rates aligned to climbing-fiber feedback spikes during the cerebellar learning task. Shaded areas are the standard deviation of the average firing rate change of 20 neurons × 10 random seeds. b) The duration and magnitude of the predicted firing rate suppression vary systematically with the time constant of the downstream output and the feedback delay. c) A standard experimental protocol can discriminate between classical and Bayesian plasticity. After training on the cerebellar learning task, synapses were stimulated with conjunctions of parallel-fiber input bursts and climbing-fiber feedback spikes (50 repetitions at 2 reps/s). d) Left: The Bayesian rules produce LTD over a narrower range of PF-CF intervals than the classical rule, and also exhibit greater variability across synapses. Right: The variability is due to the adaptive learning rate: in the Bayesian rules, unlike the classical rule, the magnitude of weight changes depends on the average rate of input during the task. Solid lines denote synaptic weight changes measured at the end of the protocol, averaged over 100 tested synapses. Shaded areas are standard deviation.
Our model also makes predictions about the slow-weight dynamics. For that we used the trained models to simulate a common experimental plasticity protocol. In this protocol, synapses are repeatedly stimulated with a short burst of parallel fiber spikes, followed by one or more climbing-fiber spikes (here repeated 50 times at 2 repetitions/s) (Fig. 5c). In experiments, LTP is generally observed at zero or negative intervals between parallel and climbing-fiber stimulation, but switches to LTD for sufficiently positive intervals or sufficiently many climbing-fiber spikes (Wang et al. 2000; Safo and Regehr 2008; Mathy et al. 2009; Suvrathan et al. 2016; Bouvier et al. 2018). All of our models reproduced these observations, and have qualitatively similar behavior as the parameters of the plasticity protocol are varied (Fig. S3). However, the Bayesian rules yielded sharper tuning to the interval, and, importantly, greater variability across synapses (Fig. 5d). The source of this variability is the adaptive learning rate of Bayesian plasticity: the magnitude of slow-weight changes during the protocol depends on the average rate of input received during the task. Our theory can therefore be discriminated from the standard model by combining well-established plasticity protocols with recording or manipulation of input rates. While this may be possible in vivo in the near future with optogenetics and voltage imaging (Fan et al. 2023), in vitro, we predict that preconditioning a synapse with a long train of parallel fiber spikes can block subsequent plasticity induction by driving down its intrinsic learning rate.
Discussion
We proposed that synaptic plasticity should operate at two timescales: a fast timescale to suppress immediate errors, and a slow timescale to learn. This multiscale plasticity rule significantly outperforms online gradient-based learning, confers robustness to noise and other perturbations, and means that neurons can accurately perform a task almost immediately – well before learning has stored a longer-term solution in the weights.
Fast weights for feedback control
Our theory frames synaptic plasticity as an optimal control problem. This is a natural framework for any system where real-time feedback is available to guide the dynamics. While there is a long tradition of using control theory to understand neural function (Todorov 2004; McNamee and Wolpert 2019), our work is most closely related to recent studies that have also connected fast control mechanisms with slower processes of learning. In one approach, fast feedback was used to drive spiking network dynamics to a regime that enables local learning of complex tasks (Bourdoukan and Denève 2015; Denève et al. 2017; Alemi et al. 2018). In another, the Deep Feedback Control theory uses feedback to drive network output to a target steady-state, and then exploits credit assignment signals implicit within the controller to gradually tune feedforward weights (Meulemans et al. 2021; Meulemans et al. 2022; Rossbroich and Zenke 2023). In a recurrent network model of motor adaptation, Feulner et al. (2022) notably used the same error feedback signal for both control and learning, finding network dynamics resembling data from monkeys. Our theory differs markedly from all of these approaches, however, as the control signals in our models can be computed and implemented by individual synapses. Thus, in place of complex network-level control structures, we are positing a novel processing role for intracellular signaling at the synapse (Bhalla 2014; Benna and Fusi 2016; Zenke et al. 2017). Exploiting this local layer of processing could allow the brain to operate much more efficiently: physically, it reduces the amount of wiring and neurons needed for high performance; algorithmically, neurons make dual use of the feedback signals they receive.
The fast mechanism we propose differs from typical forms of short-term plasticity (Zucker and Regehr 2002; Abbott and Regehr 2004). Namely, in our model, fast weight changes depend strongly on error feedback, rather than presynaptic spike patterns alone. How could this be implemented biologically? If error signals are carried by a neuromodulator, as assumed in our continuous-feedback model, there are a range of candidate pathways. Neuromodulators including acetylcholine (McGehee et al. 1995; Gil et al. 1997), serotonin (Feng et al. 2001), dopamine (Higley and Sabatini 2010), norepinephrine (Cheun and Yeh 1992), among others (Marder 2012; Nadim and Bucher 2014), are all capable of selectively regulating synaptic transmission and the magnitudes of postsynaptic currents. In common with our model, these effects are transient, can be bidirectional, and are distinct from influences on membrane excitability and long-term plasticity. Alternatively, feedback signals might be processed by closely apposed glia. For instance, astrocytes would be well-placed to perform the computations we describe: they are known to integrate a variety of molecular signals and dynamically fine-tune both pre- and postsynaptic properties (Perea et al. 2009; De Pittà et al. 2016; Papouin et al. 2017).
In our cerebellar model, we assume that the fast fluctuations are driven by climbing fiber input. At a functional level, decades of past experiments are in striking agreement with our theory: when a climbing fiber spike is received by a Purkinje cell, there is a brief pause in output firing. Previous functional explanations for spike pauses include precise temporal or multiplexed coding (Steuber et al. 2007; De Schutter and Steuber 2009; Han et al. 2020), encoding dendritic spike rates (Davie et al. 2008), and transmitting teaching signals to downstream neurons (Mathews et al. 2012). While not mutually exclusive, we offer the simple explanation that if climbing fibers are indeed signaling error, then pauses are an effective mechanism for making short-term corrections. Our modeling predicts this mechanism would be most apparent in experiments focused on regions and modalities with short feedback delays, such as those receiving signals from the spinal cord for postural maintenance, or proprioceptive feedback from the periphery. It will be interesting to investigate how feedback control of cerebellar output could also support the function of more complex, nested motor-control loops (Wolpert et al. 1998; Rotondo et al. 2023), and cognitive computations more generally (Hull 2020; Kostadinov and Häusser 2022; Pemberton et al. 2023).
Slow weights for learning
Optimizing weights for control on fast timescales also leads to highly efficient learning on slow timescales. The rule that arises naturally in our framework works efficiently by maintaining a Bayesian estimate of the target synaptic weight and its uncertainty – the best a synapse can do, given a sequence of noisy local observations. Updating weights slowly in line with this evolving estimate gradually reduces the magnitude of corrections that need to be applied via the fast weights.
Bayesian approaches to plasticity were explored in much earlier work, both in machine learning (Buntine and Weigend 1991; MacKay 1992) and neuroscience (Dayan and Kakade 2000), but have only recently begun to be broadly exploited (Blundell et al. 2015; Hernández-Lobato and Adams 2015; Kappel et al. 2015; Kirkpatrick et al. 2017; Drugowitsch et al. 2019; Hiratani and Latham 2020; Aitchison et al. 2021; Jegminat et al. 2022; Malkin et al. 2024). Aitchison et al. (2021), in particular, were the first to develop a Bayesian theory of local synaptic plasticity. In common with our slow-weight rule, they found that tracking weight uncertainty leads to an adaptive learning rate, yielding superior performance to a simple online delta rule. As well as the novel connection to control, the slow-weight component of our theory generalizes their results in two important directions. First, Aitchison et al. (2021) assume that data seen by a synapse comprises independent samples, approximating away any history dependence. By contrast, our dynamic model and learning rule accounts for, and crucially exploits, the temporal correlations inherent in neural signals. Second, our theory incorporates sources of spiking feedback and communication delays, making Bayesian plasticity more widely applicable as a theory of learning in the brain. Simulating our more realistic model, we outlined how this theory could be tested (Fig. 5): if synapses have evolved to use local data as efficiently as possible, then adding a pre-conditioning train of input to common in vitro plasticity protocols should unmask the characteristic adaptive learning rate.
Outlook
Our general framework can be used to make predictions about plasticity in other cells and circuits, including more complex scenarios than we have studied here. For instance, while we have focused on linear neural dynamics – a suitable approximation for cerebellar Purkinje cells (Llinás and Sugimori 1980; Brunel et al. 2004; Walter and Khodakhah 2006) – the theory could be adapted to account for nonlinear integration in cortical and hippocampal neurons (Poirazi et al. 2003; Payeur et al. 2021; Bicknell and Häusser 2021). This could be achieved with a straightforward modification of the Kalman filtering approach that we used, for which there are already established theoretical tools for nonlinear filtering and control (Crassidis and Junkins 2011; Kutschireiter et al. 2020). At the network level, combining fast and slow plasticity with recurrent connectivity would be a challenging, but likely very fruitful, direction. It has previously been shown, for instance, that the concerted action of multiple plasticity rules can enhance memory formation and stability (Zenke et al. 2015).
Finally, to derive our fast and slow plasticity rules, the synapses had to know the underlying equations that transform synaptic drive to the output. How this is learned is an interesting, and important, problem. It is likely that some of the learning can be accomplished on much slower, even evolutionary, timescales (Friedrich et al. 2021). On shorter timescales, Moore et al. (2024) proposed an elegant solution: learn the optimal controller from data without ever learning the underlying equations. Applying this formalism, they demonstrated that a wide range of neurophysiological data can be explained by modeling individual neurons as optimal feedback controllers. Incorporating similar ideas to study nonlinear computations in hierarchical and recurrent neural circuits is an important avenue for future work.
Methods
Online linear regression
Using the simple linear regression model described by equation (1), we illustrated how concurrent fast and slow updates to synaptic weights can be exploited to optimize task performance. Here we provide the derivation of the plasticity rules for that model (equations (4) and (6)).
We consider a discrete-time setting where the inputs, output and weights are updated at small intervals, Δt. Decomposing synaptic weights into fast and slow components, equation (1) becomes

y(t) = Σi (wi(t) + δwi(t)) νi(t).    (19)
After observing the local inputs, νi, and error, y − y*, at time t, synapses update their fast and slow weight components in parallel,

wi(t + Δt) = wi(t) + Δwi(t),    (20)
δwi(t + Δt) = δwi(t) + Δδwi(t).    (21)
The role of the fast weights, δwi, is to greedily suppress the current error, so that y approximately matches y* – independent of the setting of the slow weights. We achieve this by choosing the fast-weight updates to be proportional to the negative of the error,

Δδwi(t) = −(y(t) − y*(t))/(Nν),    (22)
where N is the total number of synapses and ν is the expected input rate per synapse. Inserting equation (22) into equation (19) at the next time step then gives

y(t + Δt) = Σi (wi(t) + δwi(t + Δt)) νi(t + Δt)
         ≈ y*(t).    (23)
To derive the approximate expression in the second line, we replaced νi(t + Δt) by νi(t), which is valid if Δt is small compared to the timescale over which the firing rates change, and dropped the term (y(t) − y*(t))(Σj νj(t)/(Nν) − 1), which is negligible when the number of synapses is large.
Thus, with mild assumptions, the fast-weight updates given in equation (22) force the output to closely match the target at all times.
While the fast-weight updates ensure low output error, this comes at the cost of ongoing, and potentially large, fluctuations in weights. Those fluctuations can be seen in the right panel of Fig. 1a, especially at early times before the slow weights have been updated. We address this with a local rule for the slow weights that permanently pushes y closer to y*, thereby reducing the need for ongoing corrections. To that end, at each time step, we adjust the slow weights with the aim of minimizing the loss

ℓ(t) = ½ (Σj δwj(t) νj(t))²,    (24)

the squared magnitude of the correction currently supplied by the fast weights.
The loss is minimized by making updates in steps proportional to the negative of the gradient. Noting that the fast weights absorb whatever error the slow weights leave, Σj δwjνj ≈ y* − Σj wjνj, the gradient is

−∂ℓ/∂wi = νi Σj δwjνj
        ≈ Nν δwi νi.    (25)
For the last line we used the fact that the fast weights do not depend on the synapse index, so Σj δwjνj = δwi Σj νj ≈ Nνδwi. This leads to a local slow-weight update rule

Δwi(t) = η Nν δwi(t) νi(t),    (26)

where η is a small learning rate.
Classical online learning
For the model described by equations (7) and (9), with either continuous feedback (equation (10)) or spiking feedback (equation (17)), we compared our Bayesian plasticity rules to classical, gradient-based counterparts. The classical rules are based on the Real-Time Recurrent Learning algorithm (RTRL; Williams and Zipser 1989; Pearlmutter 1995), which is the canonical online, gradient-based learning rule for dynamic models. While typically employed in recurrent neural networks, here we apply it to a single-neuron model, adapted slightly to enable fair comparisons with alternatives.
For continuous feedback, we aim to minimize the squared output error, (y − y*)². However, unlike the simple model, here we need to account for history dependence – the output at any given time depends on inputs that were received in the past. To account for this, we start by considering a loss integrated over a time horizon, h, in which the weights are held fixed,

L(t) = ½ ∫t−h→t dt′ (y(t′) − y*(t′))².
The derivative of the loss with respect to the weight of synapse i over that period is

∂L/∂wi = ∫t−h→t dt′ (y(t′) − y*(t′)) ∂y(t′)/∂wi.
The second factor in the integral is a dynamic quantity, which can be computed by taking derivatives of equations (7) and (9) with respect to wi. This gives the linear system

τI ṡI,i = −sI,i + xi,
τr ṡr,i = −sr,i + α sI,i,    (30)
τy ṡy,i = −sy,i + sr,i,
for the sensitivities sI,i ≡ ∂I/∂wi, sr,i ≡ ∂r/∂wi, and sy,i ≡ ∂y/∂wi. Weight updates are then made in proportion to the negative of the loss gradient,

Δwi = −η ∫t−h→t dt′ (y(t′) − y*(t′)) sy,i(t′),
where η is the learning rate. Functionally, sy,i plays the role of an eligibility trace by recording the recent impact of the synapse on the output.
Formally, after the weights have been updated, the sensitivities should then be reset to the zero initial condition to track activity over the next time interval. However, assuming weight changes are slow compared to the model dynamics, the RTRL rule is commonly run as a fully online approximation by taking h → 0, giving a differential equation for the weight dynamics, and neglecting the reset of the sensitivities (Pearlmutter 1995). We take this approach here and make two further modifications. First, we replace the term (y − y*) with its noisy observation, f. Second, we append an additional weight decay term to account for the slow drift of the target weights (see Aitchison et al. (2021) and the derivation of the Bayesian rule below for motivation). This leads to classical online updates for each synapse

ẇi = −η f sy,i − wi/τw.    (31)
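One step of this classical rule might be implemented as below, with the sensitivity variables carried alongside the neuron state; array shapes and names are ours.

```python
import numpy as np

# Classical online updates (equations (30)-(31)): propagate per-synapse
# sensitivities, then descend the gradient using the noisy observation f.
def rtrl_step(s_I, s_r, s_y, w, x, f,
              tau_I, tau_r, tau_y, alpha, eta, tau_w, dt):
    s_I += dt / tau_I * (-s_I + x)            # sensitivity of the current
    s_r += dt / tau_r * (-s_r + alpha * s_I)  # sensitivity of the rate
    s_y += dt / tau_y * (-s_y + s_r)          # sensitivity of the output
    w += dt * (-eta * f * s_y - w / tau_w)    # gradient step + weight decay
    return s_I, s_r, s_y, w
```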
For spiking feedback, we use the same approach, but with a different loss function. In this case the output error is communicated via a Poisson process with rate γ(t), as described by equation (17). To handle the random and discrete nature of spiking feedback, we work with spike probabilities.
Over a time horizon h, the feedback spike count, denoted n, will be distributed as

P(n|γ) = (γh)ⁿ exp(−γh)/n!.    (33)
Updates to the weights should seek to match this distribution to the one that would arise if all of the weights were set correctly. We express this in terms of a ‘target rate’, γ*, which defines a target distribution, P (n|γ*), via equation (33).
We use the cross-entropy loss between target and actual distributions to derive the updates; this is given by

L = −Σn P(n|γ*) log P(n|γ),    (34)

which, dropping terms that do not depend on the weights, reduces to L = Λ − Λ* log Λ, with Λ ≡ ∫t−h→t dt′ γ(t′) and Λ* ≡ ∫t−h→t dt′ γ*(t′) the expected actual and target spike counts.
The derivative of the loss with respect to weight wi is, using ∂γ/∂wi = ρ γ sy,i (which follows from equation (17)),

∂L/∂wi = ρ (1 − Λ*/Λ) ∫t−h→t dt′ γ(t′) sy,i(t′).    (35)
As before, we use equation (30) to compute the sensitivities.
Similar to the continuous feedback model, we define weight updates to be proportional to the negative of the loss gradient, and then take h → 0 to yield a continuous-time expression. Expanding equation (35) to first order in h, and approximating the integrals as h times their integrands, gives the spiking-feedback analogue of equation (31),

Δwi = −ηS ρ h (γ − γ*) sy,i,

where ηS is the learning rate. Taking h → 0, and approximating the unobserved rate γ with feedback spikes, fS, leads to a continuous-time learning rule

ẇi = −ηS ρ (fS − γ*) sy,i − wi/τw,    (38)

where we have appended a weight decay term as before.
Finally, to set γ*, we use the fact that when the weights are set correctly, the error comprises Gaussian fluctuations about zero due to noise. With the exponential nonlinearity in the feedback rate function (equation (17)), the expected rate is the mean of a log-normal variable, giving

γ* = γ0 exp(ρ² σδy²/2),    (39)
where σδy² is the stationary variance of the output error. This variance can be read off from the stationary covariance of the noise-driven dynamics, Σnoise, which satisfies the Lyapunov equation

A Σnoise + Σnoise Aᵀ + Λnoise = 0.    (40)
In equation (40), A is a matrix encoding the model dynamics (see equation (70), below), and Λnoise is a diagonal matrix comprising the variances of the noise terms in equations (7) and (9).
In all simulations, learning rates for the classical rules were selected via grid search to minimize output error.
Derivation of the Bayesian plasticity rule
Problem setting
Given the model dynamics described by equations (7) and (9), we consider a general task in which the goal is to choose weights, wi, to match the output, y, to a time-varying target, y*. We assume y* is generated via deterministic dynamics identical to y, using a set of target weights, wi*. To model slow changes in the task or environment, the target weights drift over time according to an Ornstein-Uhlenbeck process,

τw ẇi* = −wi* + σw ξw,i,    (41)

where each ξw,i is an independent zero-mean, unit-variance Gaussian white noise process.
The drift timescale, τw ∼ 1000 s, is orders of magnitude slower than the neuronal and output dynamics.
The above assumptions about the relationship of target weights to y*, and the drift described by equation (41), are exact in the teacher-student learning paradigm. In the cerebellar learning simulations, target weights are instead defined implicitly as weight values that would solve the task.
Error minimization from the perspective of a single synapse
As in the simple online linear regression model, we decompose the weights into independent slow and fast components, wi + δwi.
To derive learning rules in this more complicated setting, we start by introducing a set of variables equal to the difference between actual and target quantities,

δI = I − I*,   δr = r − r*,   δy = y − y*,

where I*, r* and y* denote the noise-free dynamics generated by the target weights.
From equations (7) and (9), these evolve via

τI δİ = −δI + Σi (wi + δwi − wi*) xi + σI ξI,    (43a)
τr δṙ = −δr + α δI + σr ξr,    (43b)
τy δẏ = −δy + δr + σy ξy.    (43c)
The target weights, wi*, are unknown to the synapses, and so must be inferred from the locally available signals.
However, there’s a problem: δy depends on the activity of all the synapses; information that any one synapse doesn’t have. To remedy this, we’ll work from the perspective of synapse i, and model δy in terms of local variables.
Decomposing synaptic weights into slow and fast components, the sum in equation (43a) can be expanded as

Σj (wj + δwj − wj*) xj = (wi − wi*) xi + Σj≠i (wj − wj*) xj + Σj δwj xj.    (44)
The first term depends only on synapse i, so it's local. The second term represents a source of error due to the other N − 1 unobserved synapses. For Poisson input and large N, this can be approximated as Gaussian white noise (Fourcaud and Brunel 2002),

Σj≠i (wj − wj*) xj ≈ σN ξx,    (45)

where ξx is zero-mean, unit-variance Gaussian white noise and the variance, σN², scales with the number of synapses, N, and the expected input rate per synapse, ν.
For the third term in equation (44), we view the sum over fast weights as approximating a single scalar control variable, denoted u, later to be chosen to minimize output error. Assuming that all of the synapses can compute u locally, by setting the fast weights uniformly as

δwj = u/(Nν),    (46)
the third term in equation (44) can be expressed as

Σj δwj xj = (u/(Nν)) Σj xj ≈ u,    (47)
where the approximation is good in the large N limit. Inserting equations (45) and (47) into equation (44), and inserting that into equation (43), we arrive at a set of equations that uses only information local to synapse i,

τI δİi = −δIi + (wi − wi*) xi + u + σN ξx + σI ξI,    (48a)
τr δṙi = −δri + α δIi + σr ξr,    (48b)
τy δẏi = −δyi + δri + σy ξy.    (48c)
Equation (48) is a model of the error that the synapse can use to guide plasticity. In our rule, the synapse learns by updating its slow weight, wi, toward an online estimate of its target weight, while the fast weight implements the control, u.
Optimal control solution
Because the noise in equation (48) is Gaussian and we want to minimize the squared error, (y − y*)², the problem of finding the optimal u reduces to the well-known linear-quadratic-Gaussian (LQG) control problem (Crassidis and Junkins 2011). In this setting, the optimal control, denoted û, minimizes the expected cumulative cost

𝔼[∫0→T dt (δy² + λu u²)].    (49)
Note that this includes a control cost, λu u², which limits the size of the fast weights.
To solve equation (49) using standard methods, we first need to express our model for the error (equation (48)) and feedback signal (equation (10)) in a canonical linear state space form. We start by combining the model variables into a single vector,

Φi = (wi*, δIi, δri, δyi)ᵀ.    (50)
Then, we express the control variable as a vector, ui, and make a change of variables that will clean up an offset term,
Introducing the vector

H = (0, 0, 0, 1),    (52)
the dynamics of Φi and feedback, f, can be expressed together as

Φ̇i = Ci Φi + B vi + ξ′,    (53a)
f = H · Φi + σf ξf,    (53b)
where

Ci =
( −1/τw     0       0       0    )
( −xi/τI   −1/τI    0       0    )    (54a)
(  0        α/τr   −1/τr    0    )
(  0        0       1/τy   −1/τy ),

B is the matrix whose only nonzero element is (B)22 = 1/τI,    (54b)
and ξ′ is Gaussian white noise with covariance matrix Λ′.
With this set up, the control problem in equation (49) can be written as the minimization of

𝔼[∫0→T dt (Φiᵀ HᵀH Φi + λu viᵀ vi)],    (56)
where HᵀH is an outer product.
Although the control variable, vi, is now a vector instead of a scalar, u, the optimal solution will still only have one non-zero component (the second one). This is because multiplication by B in equation (53a) means that only the second component of vi will influence the output, so the control cost term in equation (56) will force the other components to be zero. The scalar control of equation (49) is then recovered as u = (vi)2,
where the subscript 2 denotes the second component of the vector.
Nevertheless, equation (56) is still not identical to equation (49), because the variable change in equation (51) has shifted the quantity being penalized by an offset; this has no bearing on the structure of the solution.
Note also that we have written the control variable with a subscript i, since at this stage the solution depends on local variables. We will later make approximations that remove this dependence, so that the control variables are the same for all synapses, as assumed in equation (46).
Ignoring biological constraints, LQG theory gives the solution to the problem defined by equations (50)–(56) as

dΦ̂i/dt = Ci Φ̂i + B vi + Ki (f − H · Φ̂i),    (58a)
vi = −Li Φ̂i.    (58b)
The vector Ki is the feedback (Kalman) gain and Li is the control gain. They are given by

Ki = Pi Hᵀ/σf²,    (59a)
Li = Bᵀ Si/λu,    (59b)
where Pi and Si are determined by a pair of matrix Riccati equations (Crassidis and Junkins 2011),

Ṗi = Ci Pi + Pi Ciᵀ + Λ′ − Pi HᵀH Pi/σf²,    (60a)
−Ṡi = Ciᵀ Si + Si Ci + HᵀH − Si B Bᵀ Si/λu.    (60b)
The equation for Pi, a covariance matrix representing uncertainty about the estimate Φ̂i, is integrated forwards in time from an initial condition, whereas the equation for Si is integrated backwards in time from a terminal condition at t = T.
While these equations are exact, they are far from biologically plausible. In the next several sections we remedy that by making several approximations.
Control gain approximation
Solving equation (60b) backwards in time poses a problem for online implementation, because the dynamics matrix, Ci (equation (54a)), depends on the synaptic input, xi, which cannot be known in advance. We therefore simply drop the term xi from the matrix Ci. With this approximation and sufficiently large T, equation (60b) relaxes to a steady state that is independent of the terminal condition and synapse index. The steady-state matrix S is then defined implicitly as the positive-semidefinite solution to the algebraic Riccati equation

0 = Cᵀ S + S C + HᵀH − S B Bᵀ S/λu,    (61)
where C is the same as Ci but with xi set to zero.
Writing equation (61) in component form, and using the fact that S is positive-semidefinite, straight-forward algebra shows that the first row and column of S are all zero. Consequently, because of the structure of B (equation (54b)), the only nonzero elements of Li are the second through fourth elements of the second row. That in turn implies that only the second element of the optimal control vector is nonzero, and it can be written

u = −ℒ · (δÎi, δr̂i, δŷi)ᵀ,
where ℒ is proportional to the second through fourth elements of the second row of S,
and
We can now use equation (58b) to derive an explicit expression for the control variable,
In the approximation in the last line, we ignored the second term, which will allow us to make the controller independent of i, and makes little difference to performance in practice.
We solve equation (61) numerically to obtain the control gain for simulations. An analytical approximation is provided in equation (91) below, for the simpler version of the plasticity rule presented in the main text; that approximation also works well in practice (Fig. S1).
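In practice, the steady-state equation (61) can be handed to a standard solver. A sketch using SciPy, under the assumption that B can be collapsed to its single effective column:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Control Riccati equation (61): with state cost Q = H^T H and control cost
# R = lambda_u, SciPy returns the stabilizing steady-state solution S.
def control_gain(C, B, H, lam_u):
    Q = np.outer(H, H)                 # penalize only the output error
    R = np.array([[lam_u]])
    S = solve_continuous_are(C, B.reshape(-1, 1), Q, R)
    return (B @ S) / lam_u             # gain, cf. equation (59b)
```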
Feedback gain approximation
Although we now have all the ingredients for a local online plasticity rule, we also approximate the feedback gain Ki for the sake of interpretability.
We first unpack equation (60a) by partitioning the state-estimate covariance matrix into blocks,

Pi = ( σi²   qiᵀ )
     ( qi    Σi  ),    (66)

where, to highlight the functional roles that each block will play in the plasticity rule, we introduced the variables

σi² ≡ (Pi)(1,1),   qi ≡ (Pi)(2:4,1),   Σi ≡ (Pi)(2:4,2:4),

where the numeric subscripts on the right-hand side denote the range of row and column indices for each block. The variable σi² represents the synapse's uncertainty about its target weight, qi is the covariance between the target weight and the error variables, and Σi is the covariance of the error variables themselves.
Defining
and using the definitions in equations (70) and (71), equation (60a) can be expanded to give the dynamics of each of the blocks as
where 0(3×2) denotes a 3 × 2 matrix of zeros, A is the lower right (2 : 4) × (2 : 4) block of Ci,

A =
( −1/τI    0       0    )
(  α/τr   −1/τr    0    )    (70)
(  0       1/τy   −1/τy ),
and Λ is the lower right (2 : 4) × (2 : 4) block of Λ′,

Λ = diag((σN² + σI²)/τI², σr²/τr², σy²/τy²).    (71)
Because equations (69a) and (69b) track uncertainty related to the target weight, their temporal dynamics are strongly dependent on the local synaptic input, xi – this is what couples the weight to the observed feedback. By contrast, equation (69c) tracks the uncertainty of δi, which is dominated by uncertainty about the activity of unobserved synapses, so depends only weakly on xi. Mathematically, this is reflected in the fourth term of equation (69c) being dominated by the order-N synaptic noise term in Λ (see equations (45) and (71)). We can, therefore, ignore the xi-dependent term in equation (69c). Then, Σi relaxes to a steady state that is independent of the synapse index. That steady state is the solution to the algebraic Riccati equation,

0 = A Σ + Σ Aᵀ + Λ − Σ h hᵀ Σ/σf²,    (72)

where h ≡ (0, 0, 1)ᵀ picks out the δy component.
We use equation (72) in place of (69c). But to capture the most important dependence on input, xi, we solve equations (69a) and (69b) individually for each synapse.
Finally, we can express Ki, equation (59a), in terms of the quantities in equation (69). Using the fact that Ki is proportional to the fourth column of Pi (see equation (52) for H), the first component is given by

(Ki)1 = qi,y/σf²,    (73)

where qi,y denotes the δy-component of qi,
which is used to update the estimate of the target weight, ŵi*. The remaining three components,

(Ki)(2:4) = Σ h/σf²,    (74)

update the estimates of the error variables.
Computations that should be performed by a synapse
Bringing everything together, we express the plasticity rule as a system of differential equations that optimally process local signals to enable concurrent control and learning.
Unpacking the error estimate from equation (58a), expressing the feedback gain via equations (72) and (74), and applying the approximations described above, yields the full plasticity rule, equation (76).
The only remaining issue is that the control variable, û, as derived, depends on the synapse index through the local state estimates, whereas equation (46) assumed a common control for all synapses. Setting the control using only quantities that are, to a good approximation, identical across synapses removes this dependence. Altogether, this gives synaptic weights wi + δwi, with the slow component, wi, tracking the estimated target weight, and the fast component, δwi = u/(Nν), implementing the control.
Plasticity rule approximation for τI, τr ≪ τy
Control is most effective when the downstream time constant is large. This regime also permits a simple approximation of the plasticity rule that makes parameter dependencies explicit. We use these equations in the main text for ease of intuition (equations (13a) – (16b)). Here we provide the details of the approximation.
Starting from equations (76a) and (76d), we express the components of the vectors
where the subscripts on 𝒦 indicate their components,
With these approximations, equation (76) becomes
where the scalar feedback gain, 𝒦, is a weighted sum of the three components of the Kalman gain (equation (74)). Equation (80) has been copied to the main-text equations (13) and (16) with some minor cosmetic changes; to clean up the presentation, there we dropped the y subscripts and explicitly denoted the control variable, u.
To compute 𝒦, we note first of all that
where the second equality comes from equation (74) and we have defined
To derive an explicit expression for τ · Σ · h, we operate on both sides of equation (72) with τ, giving us
Then, combining the fact that τ · A = −h with equation (82), we arrive at
where
Solving equation (85) gives the scalar feedback gain explicitly as
For the control gain, we make use of the duality of state-estimation and control for the LQG problem (Crassidis and Junkins 2011), which means that knowing the parameters of the equation for the feedback gain immediately gives the corresponding expressions for the control gain.
Defining
as the analogue of equation (59b) then leads to
and thus a scalar control gain
Here we have used the notation
Inserting the optimal control

u = −ℒ δŷ    (92)
into equation (80a) yields the plasticity rule corresponding to equations (13) and (16) in the main text.
Steady-state learning rate
Using the approximate plasticity rule given in equation (16), we can derive an expression for how the steady-state learning rate of a Bayesian synapse depends on key parameters, and especially on the firing rate.
The first observation is that because 𝒦 is typically large, after a spike zi decays rapidly, and so whenever a new spike arrives it's effectively zero. Thus, assuming for the moment that σi is constant, equation (16c) tells us that when a spike occurs at time t = 0, zi(t) is given by (for times t > 0)

zi(t) = zi(0⁺) exp(−𝒦t/τy),    (93)

where zi(0⁺) is the value of the eligibility trace immediately after the spike.
Inserting this into equations (16a) and (16b), and assuming for the moment that both are constant on a timescale of τy/𝒦, we see that when a spike occurs they change by
where the subscript ‘spike’ indicates that these are changes in response to a spike.
In the large 𝒦 limit, both
Because equation (94a) tells us the weight change per pre-synaptic spike, we can identify the term in front of
To derive an explicit expression for
where we used equation (93) for the time course of zi(t) after a spike and, recall, νi is the input firing rate to synapse i. Note the slight abuse of notation: above, and in what follows in this section,
For typical parameter values (see Table 1), and using the definitions in equations (86) and (87), we find that the second term inside the square root is large (∼ 10³ − 10⁴) as long as input rates, νi, are greater than ∼ 0.1 spikes/s. Equation (97) thus simplifies to
Inserting this into equation (95), we arrive at
Notably, this scaling with input rate matches the results of Aitchison et al. (2021), who showed it to be consistent with experimental data.

Model and simulation parameters.
Spiking feedback
We now consider the case where feedback is communicated via spikes. The spiking feedback signal, fS, is modeled as an inhomogeneous Poisson process with rate γ, as described by equation (17). The derivation is essentially the same as for the continuous feedback case above; the difference is that it's not a textbook problem, so we have to do it ourselves. The approach, though, is relatively standard: we compute the time evolution of the mean and covariance of Φi conditioned on feedback spikes. These correspond to the estimate, Φ̂i, and covariance, Pi, of the continuous-feedback derivation.
Model discretization and notation
To model feedback spikes, we discretize time into bins of size Δt, and assume that there are either zero or one spike in each bin; this assumption is valid because we will eventually take the limit Δt → 0. Our first step is to turn equation (53a) into discrete time updates rather than differential equations.
Discrete time variables are distinguished from their continuous versions with a t subscript, and several quantities are rescaled by the time step to reduce clutter. We define
The synaptic inputs xi,t are implemented as discrete-time delta functions, taking the value 1/Δt when there is a spike and zero otherwise. The discrete-time version of equation (53a) is, then, given by
For Poisson feedback with the rate determined by equation (17), the analogue of equation (53b) is an expression for the probability of observing a feedback spike given Φi. Assuming small Δt, and treating spikes as Bernoulli random variables fS,t ∈ {0, 1}, this is given by
with H = (0, 0, 0, 1), as above (see equation (52)).
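To make the discretization concrete, the following sketch simulates Bernoulli feedback spikes as in equations (101)–(102). The rate function of equation (17) is not reproduced above, so an exponential dependence on a toy error signal stands in for it; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

dt = 1e-4                  # bin size; the derivation takes dt -> 0
n_steps = int(1.0 / dt)    # one second of simulated feedback
gamma0 = 64.0              # spontaneous feedback rate (spikes/s)

# Stand-in for equation (17): a rate that grows with a toy error signal
err = 0.5 * np.sin(2 * np.pi * np.arange(n_steps) * dt)
gamma = gamma0 * np.exp(err)

# Bernoulli spikes: at most one per bin, with P(spike) = gamma * dt
f_S = rng.random(n_steps) < np.clip(gamma * dt, 0.0, 1.0)
print(f_S.sum(), "feedback spikes")
```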
Using equations (101) and (102) as the model for feedback observations, we aim to infer a distribution over the unobserved states, Φi,t, conditioned on the history of local data seen by the synapse. The data is denoted
The distribution is modeled as a Gaussian with conditional mean and covariance
Recursive Bayesian filter
We use standard Bayesian recursion to derive explicit expressions for the mean and covariance of Φi. Our approach is similar to that of Eden et al. (2004) and Pfister et al. (2010).
At each time step, we first update the distribution over Φi,t using observation fS,t, via Bayes theorem, and then predict Φi,t+Δt from the model dynamics,
The first equation holds for our model because fS,t only depends on Φi,t, and Φi,t only depends on the input and weight from previous time steps. We use assumed density filtering for the update step, approximating the left-hand side of equation (105a) as Gaussian by matching the mean and covariance of the right-hand side.
When fS,t = 1, the approximation is exact, giving a Gaussian
The normalizer is given by
After some straightforward algebra, the updated mean and covariance are found to be
When fS,t = 0,
which leads to, again after some straightforward algebra,
Putting both cases together, we arrive at
which characterizes the posterior distribution on the left-hand side of the update step in equation (105a).
Next, having completed the update step, we perform the prediction step in equation (105b) by evaluating the integral. The second factor in the integral is Gaussian by assumption, following from the previous update. The first factor, describing the state transition probability, is also Gaussian, following from the dynamics of equation (53a). This means the left-hand side of equation (105b) is Gaussian with
Substituting equations (111a) and (111b) into these expressions, and using the definitions in equations (104a) and (104b), gives a discrete-time estimator
Expanding to first order in Δt, using equations (100a)–(100d), and taking Δt → 0 gives the continuous-time version:
Here, the discrete-time feedback fS,t has converged to a sum of delta functions fS, and
is the estimated feedback rate.
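A minimal sketch of the resulting filter, in the discrete-time form of the recursion above and in the spirit of Eden et al. (2004): predict through the linear-Gaussian dynamics, then apply an assumed-density update on the point-process likelihood. The paper's state matrices and gains are not reproduced above, so a one-dimensional hidden state with placeholder dynamics, an exponential rate function, and illustrative parameters are used.

```python
import numpy as np

rng = np.random.default_rng(1)

dt = 1e-3
A = np.array([[-1.0]])     # placeholder linear dynamics (not the paper's matrices)
Q = np.array([[0.1]])      # process noise covariance
h = np.array([1.0])        # observation vector (stand-in for rho * H)
gamma0 = 20.0              # baseline feedback rate (spikes/s)

mu, P = np.zeros(1), np.eye(1)   # posterior mean and covariance
phi = np.zeros(1)                # true hidden state

for _ in range(5000):
    # simulate the hidden state and a Bernoulli feedback spike
    phi = phi + dt * (A @ phi) + rng.normal(0.0, np.sqrt(Q[0, 0] * dt), 1)
    f = float(rng.random() < min(gamma0 * np.exp(h @ phi) * dt, 1.0))

    # update step: assumed density filtering on the point-process likelihood
    lam = gamma0 * np.exp(h @ mu)        # estimated feedback rate
    mu = mu + P @ h * (f - lam * dt)     # spikes push up, silence pushes down
    P = np.linalg.inv(np.linalg.inv(P) + np.outer(h, h) * lam * dt)

    # prediction step: propagate through the linear-Gaussian dynamics
    mu = mu + dt * (A @ mu)
    P = P + dt * (A @ P + P @ A.T + Q)
```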
Equations (114a) and (114b) correspond to equations (58a) and (60a) in the continuous feedback case, with ρPi · H playing the role of the feedback gain Ki, and
As above, the feedback gain can be approximated by partitioning the covariance matrix Pi (see equation (66)), and ignoring the minor contribution of xi to the lower-right block,
A key difference between these equations and equation (69) is that in the latter Σi relaxes to a steady state, whereas here it depends dynamically on the estimate of the feedback rate
The rest of the derivation proceeds identically to the continuous case. Unpacking equation (114a) into the original notation, and setting
Putting everything together, the plasticity rule for spiking feedback is
with weights set via equation (77) as before.
Feedback delays
In the brain, error feedback cannot reach a neuron instantaneously, owing to communication and sensory processing delays that are typically on the order of ∼10–100 ms. We model this constraint by introducing a time lag τ into the feedback signal, modifying equation (10) to become
Similarly, for the spiking feedback model, equation (17) becomes
For the slow weights, the learning rules can be adapted by simply feeding lagged copies of the input and weight parameters into the governing equations, such that the Bayesian filter estimate reflects the unobserved states as they were at time t − τ. Working through the details, the only change required to the learning rules is to replace xi(t) with xi(t − τ) in equation (76d), and similarly in the spiking feedback case. Such a lagged copy of the input could be maintained at the synapse via simple chemical signaling cascades, as shown by Jayabal et al. (2022). As long as the assumed target-weight drift is slow relative to the feedback delay, τw ≫ τ, this simple modification of the learning rule maintains consistent performance in the presence of delays.
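Such a lagged copy is straightforward to implement with a fixed-length buffer. A minimal sketch, with hypothetical names (this is not taken from the released code):

```python
from collections import deque

class LaggedInput:
    """Ring buffer holding a delayed copy of a synapse's input trace.

    Feeds x_i(t - tau) to the slow-weight filter; structure and
    names are illustrative.
    """
    def __init__(self, tau, dt):
        # buffer of round(tau / dt) past samples, initialized to zero
        self.buf = deque([0.0] * max(1, round(tau / dt)))

    def step(self, x_now):
        self.buf.append(x_now)
        return self.buf.popleft()   # x as it was tau seconds ago
```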
For the fast weights, we employ the Smith predictor technique to build a controller that accounts for the delay (Smith 1959). Conceptually, we need to compute a control signal at time t that will cancel the current error, but we only have an estimate from feedback observations of the error as it was at time t − τ. We therefore use the known model dynamics and recent history of applied control signals to predict the current error by propagating the estimate δ forward in time. We use this to construct the controller
and thus the fast-weight rule
where the control gain ℒ is the same as that above.
Writing
The first term propagates
The convolution can be simplified by considering an online implementation. Denoting the integral by U(t), differentiation gives
which only requires knowing the most recently applied control and a single lagged variable from the past.
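A sketch of this online implementation. Since the propagation kernel is not reproduced above, we assume a single exponential with the output time constant, i.e. U(t) = ∫ from t−τ to t of e^(−(t−s)/τy) u(s) ds, whose derivative indeed needs only the current control, one lagged control, and U itself; parameter values are illustrative.

```python
import numpy as np
from collections import deque

class SmithMemory:
    """Online update of U(t) = integral_{t-tau}^{t} exp(-(t-s)/tau_y) u(s) ds.

    dU/dt = u(t) - exp(-tau/tau_y) * u(t - tau) - U / tau_y, so only the
    most recent control and a single lagged control are needed.
    """
    def __init__(self, tau, tau_y, dt):
        self.dt, self.tau_y = dt, tau_y
        self.decay = np.exp(-tau / tau_y)
        self.u_hist = deque([0.0] * max(1, round(tau / dt)))  # last tau of controls
        self.U = 0.0

    def step(self, u_now):
        u_lag = self.u_hist.popleft()   # u(t - tau)
        self.u_hist.append(u_now)
        self.U += self.dt * (u_now - self.decay * u_lag - self.U / self.tau_y)
        return self.U
```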
Multiple neurons
In Fig. 3C and Fig. 4, we simulated a small feedforward network of M neurons. In this model, each of the neurons has identical dynamics, as described by equations (7a) and (7b), but receives independent synaptic input and noise. Noise variances were scaled by a factor 1/M relative to the single neuron simulations, for ease of comparison (see Table 1 for parameters). A common downstream output is driven by the sum over firing rates, rm, described by equation (19). A spiking feedback signal, fS,m, is generated independently for each neuron using the Poisson rate function in equation (17).
In the plasticity rule, the current, rate and synaptic noise terms in equation (71) are scaled by M to reflect the contribution of all neurons to the error. The fast weight rule (equation (46)) also needs to be modified to account for the fact that the feedback signals, and therefore error estimates, differ across neurons. Ignoring locality constraints, the optimal strategy would be to combine the error estimates computed by each of the M neurons,
and use that to construct a single global control variable,
In equation (125),
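Since equations (124)–(125) are not reproduced above, the following sketch assumes the simplest combination rule: the M per-neuron error estimates are averaged before being scaled by the control gain.

```python
import numpy as np

def global_control(deltas, L):
    """Combine per-neuron error estimates into a single control signal.

    A sketch of the strategy behind equations (124)-(125), assuming the
    estimates are simply averaged; the paper's exact weighting is not
    reproduced above.
    """
    return -L * np.mean(deltas)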
Simulation details
Online linear regression
For the simulations in Fig. 1, the model neuron was driven with sinusoidal inputs and required to learn weights that would produce a time-varying, periodic target output. We used N = 20 synapses and a period Tperiod = 1 s.
Input rates were defined as
The target output was generated as
with
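For concreteness, a sketch of this setup. The exact expressions for the input rates and target are the ones elided above, so sinusoids with random phases and a hypothetical set of fixed teacher weights stand in for them.

```python
import numpy as np

rng = np.random.default_rng(0)

N, T_period, dt = 20, 1.0, 1e-3
t = np.arange(0.0, 5 * T_period, dt)

# Assumed stand-ins for the elided expressions above
phases = rng.uniform(0.0, 2 * np.pi, N)
nu = 1.0 + np.sin(2 * np.pi * t[:, None] / T_period + phases)  # input rates nu_i(t)

w_star = rng.normal(0.0, 1.0, N)   # hypothetical weights defining the target
y_star = nu @ w_star               # time-varying, periodic target y*(t)
```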
Teacher-student task
In Figs. 2 and 3, simulations were initialized by independently drawing target weights
Cerebellar learning task
The cerebellar learning task in Fig. 4 requires mapping time-varying patterns of synaptic input to associated target outputs. Unlike the teacher-student task, the target outputs are generated as random wavelets, rather than through explicit target weights.
Input patterns of duration Tpattern = 1 s were constructed in a similar manner to Bicknell and Häusser (2021). A synapse was selected to be active on a given pattern with probability pactive = 0.5. If active, its time-dependent presynaptic firing rate in that pattern was defined by drawing a peak time tpeak,i ∼ Uniform(0, Tpattern), maximum amplitude νmax,i ∼ Uniform(90, 110) (spikes/s) and centering a Gaussian bump of activity within the pattern interval,
with width parameter σν = 0.1 s. Input spikes were generated stochastically each pattern presentation using this rate function.
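This construction translates directly into code; a minimal sketch (the number of synapses N is illustrative; see Table 1):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100                                   # number of synapses (illustrative)
T_pattern, dt = 1.0, 1e-3
t = np.arange(0.0, T_pattern, dt)
p_active, sigma_nu = 0.5, 0.1             # values from the text

rates = np.zeros((N, t.size))
active = rng.random(N) < p_active         # active on this pattern w.p. 0.5
t_peak = rng.uniform(0.0, T_pattern, N)   # t_peak,i ~ Uniform(0, T_pattern)
nu_max = rng.uniform(90.0, 110.0, N)      # nu_max,i ~ Uniform(90, 110) spikes/s
for i in np.flatnonzero(active):
    rates[i] = nu_max[i] * np.exp(-0.5 * ((t - t_peak[i]) / sigma_nu) ** 2)

# Poisson spikes are regenerated at each pattern presentation
spikes = rng.random(rates.shape) < rates * dt
```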
For the target outputs, we generated random trajectories that drift slowly over time and can be stitched together in random order to yield a single continuous target (analogous to composing different sequences of motor actions). This was implemented using truncated Fourier series with random coefficients that drift via Ornstein-Uhlenbeck processes. Different trajectories are forced to agree at the endpoints by using a common constant offset and multiplying the time-dependent parts by a bump function. Specifically, the drifting target output for pattern p was defined via
with initial coefficients ap,n,
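A sketch of this construction. The specific bump function, the Ornstein-Uhlenbeck parameters, and the number of harmonics are not specified above, so the values below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T, dt, n_harm = 1.0, 1e-3, 5
t = np.arange(0.0, T, dt)

offset = 1.0                          # common constant offset across patterns
bump = np.sin(np.pi * t / T) ** 2     # assumed window vanishing at both endpoints
a = rng.normal(0.0, 1.0, n_harm)      # initial Fourier coefficients a_{p,n}

def target(a):
    # truncated Fourier series, windowed so endpoints always equal `offset`
    series = sum(a[n] * np.sin(2 * np.pi * (n + 1) * t / T) for n in range(n_harm))
    return offset + bump * series

def drift(a, tau_w=100.0, sigma=0.1):
    # Ornstein-Uhlenbeck drift of the coefficients between presentations
    return a - a / tau_w + sigma * rng.normal(size=a.shape)
```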
Plasticity protocol
For the simulated plasticity experiments in Fig. 5, we used models that had been trained on the cerebellar learning task. We applied the protocol to 100 synapses using the classical and Bayesian learning rules, initializing with the trained weights and uncertainties. The classical learning rate, η, and the control cost, λu, were set to the values that minimized output error during the task. For most simulations, the protocol consisted of repeated pairings of parallel fiber input bursts (5 spikes at 10 ms intervals) with single climbing fiber spikes at a given delay. The numbers of spikes were varied in a subset of simulations (Fig. S3c,d). The pairing was repeated 50 times at 500 ms intervals. Plasticity effects were computed at the end of the protocol as the change in slow weights (or classical weights) from their initial trained values.
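The protocol timing translates directly into code. A minimal sketch of the event times (whether the PF–CF interval is measured from burst onset is an assumption):

```python
import numpy as np

def protocol_times(pf_cf_delay, n_pairings=50, pairing_interval=0.5,
                   n_pf=5, pf_isi=0.01):
    """Event times (in seconds) for the simulated pairing protocol (Fig. 5).

    Each pairing: a parallel fiber burst (n_pf spikes at pf_isi intervals)
    plus one climbing fiber spike at pf_cf_delay after burst onset, with
    pairings repeated every pairing_interval.
    """
    pf, cf = [], []
    for k in range(n_pairings):
        t0 = k * pairing_interval
        pf.extend(t0 + i * pf_isi for i in range(n_pf))
        cf.append(t0 + pf_cf_delay)
    return np.array(pf), np.array(cf)

pf_times, cf_times = protocol_times(pf_cf_delay=0.1)
```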
Supplementary Figures

Figure S1. Approximation of the Bayesian learning rule.
Comparison of the full Bayesian rule, given by equation (76), and the approximation presented for ease of interpretation in equations (13) and (16), in terms of output error (a) and weight error (b). The approximation is valid whenever the output time constant dominates the model dynamics, τr, τI ≪ τy. Here we use parameters τI = 1 ms, τr = 10 ms, and τy = 100 ms.

Figure S2. Simulations with spiking feedback.
Results from the spiking feedback model are consistent with the continuous feedback model (Fig. 2), although learning takes longer and performance depends strongly on the spontaneous feedback rate γ0. a) Performance as a function of feedback rate (replotted from Fig. 3 for context). Shaded areas are s.d. from 10 random seeds. b) Detailed comparison between learning rules with γ0 = 64 spikes/s. c) Control remains effective in the presence of feedback delays. Shaded areas are s.d. from 10 random seeds. d) Weights converge to their target values more quickly with increasing rates of feedback.

Figure S3. Parameter dependence of simulated plasticity experiments.
The LTP/LTD curve produced using the protocol in Fig. 5c depends systematically on the parameters of the model and protocol. a) Tuning to the PF-CF interval becomes broader as the output time constant τy increases. b) Peaks in the LTP and LTD lobes of the curve are shifted in proportion to the feedback delay. c) The magnitude of plasticity increases with the number of spikes in the parallel fiber burst. d) LTD is amplified and shifted to earlier PF-CF intervals with increasing numbers of climbing fiber spikes.
Acknowledgements
This work was supported by the Gatsby Charitable Foundation and the Wellcome Trust (110114/Z/15/Z). The authors acknowledge the use of the UCL Myriad High Performance Computing Facility (Myriad@UCL), and associated support services, in the completion of this work.
Additional information
Code availability
Simulation code is available at https://github.com/babicknell/SynControl
References
- Synaptic computation. Nature 431:796–803
- Synaptic plasticity as Bayesian inference. Nat. Neurosci 24:565–571
- Learning nonlinear dynamics in efficient, balanced spiking networks using local plasticity rules. Proc. AAAI Conf. Artif. Intell 32
- Cerebellar modules and their role as operational cerebellar processing units. The Cerebellum 17:654–682
- Activity patterns of cerebellar cortical neurones and climbing fibre afferents in the awake cat. J. Physiol 289:425–448
- Cerebellar climbing fibers modulate simple spikes in Purkinje cells. J. Neurosci 23:7904–7916
- Discharge properties of Purkinje cells recorded on single and double microelectrodes. J. Neurophysiol 32:1044–1055
- Computational principles of synaptic memory consolidation. Nat. Neurosci 19:1697–1706
- Molecular computation in neurons: a modeling perspective. Curr. Opin. Neurobiol 25:31–37
- A synaptic learning rule for exploiting nonlinear dendritic computation. Neuron 109:4001–4017
- Action of climbing fibers in cerebellar cortex of the cat. J. Neurophysiol 34:17–31
- Weight uncertainty in neural network. In: Proc. Int. Conf. Mach. Learn pp. 1613–1622
- Enforcing balance allows local supervised learning in spiking recurrent networks. Adv. Neural Inf. Process. Syst 28
- Cerebellar learning using perturbations. eLife 7:e31599
- Desiderata for normative models of synaptic plasticity. arXiv
- Optimal information storage and the distribution of synaptic weights: perceptron versus Purkinje cell. Neuron 43:745–757
- Bayesian backpropagation. Complex Syst 5:603–643
- Modulation of GABAA receptor-activated current by norepinephrine in cerebellar Purkinje cells. Neuroscience 51:951–960
- Bidirectional parallel fiber plasticity in the cerebellum under climbing fiber control. Neuron 44:691–700
- Optimal estimation of dynamic systems. Chapman & Hall
- The origin of the complex spike in cerebellar Purkinje cells. J. Neurosci 28:7599–7609
- Explaining away in weight space. Adv. Neural Inf. Process. Syst 13
- Astrocytes: Orchestrating synaptic plasticity? Neuroscience 323
- Patterns and pauses in Purkinje cell simple spike trains: experiments, modeling and theory. Neuroscience 162:816–826
- The brain as an efficient and robust adaptive learner. Neuron 94:969–977
- Learning optimal decisions with confidence. Proc. Natl. Acad. Sci. U.S.A 116:24872–24880
- Dynamic analysis of neural encoding by point process adaptive filtering. Neural Comput 16:971–998
- All-optical physiology resolves a synaptic basis for behavioral timescale plasticity. Cell 186:543–559
- Serotonin receptors modulate GABAA receptor channels through activation of anchored protein kinase C in prefrontal cortical neurons. J. Neurosci 21:6502–6511
- Feedback-based motor control can guide plasticity and drive rapid learning. bioRxiv
- Cortico-cortical feedback engages active dendrites in visual cortex. Nature 617:769–776
- Dynamics of the firing probability of noisy integrate-and-fire neurons. Neural Comput 14:2057–2110
- Neural optimal feedback control with local learning rules. Adv. Neural Inf. Process. Syst 34:16358–16370
- Differential regulation of neocortical synapses by neuromodulators and activity. Neuron 19:679–686
- An emergent temporal basis set robustly supports cerebellar time-series learning. J. Neurophysiol 129:159–176
- Climbing fiber synapses rapidly and transiently inhibit neighboring Purkinje cells via ephaptic coupling. Nat. Neurosci 23:1399–1409
- Probabilistic backpropagation for scalable learning of Bayesian neural networks. In: Proc. Int. Conf. Mach. Learn pp. 1861–1869
- Competitive regulation of synaptic Ca2+ influx by D2 dopamine and A2A adenosine receptors. Nat. Neurosci 13:958–966
- Rapid Bayesian learning in the mammalian olfactory system. Nat. Commun 11:3845
- Prediction signals in the cerebellum: beyond supervised motor learning. eLife 9:e54073
- The cerebellar cortex. Annu. Rev. Neurosci 45:151–175
- Cerebellar long-term depression: characterization, signal transduction, and functional roles. Physiol. Rev 81:1143–1195
- Experience adaptively tunes the timing rules for associative plasticity. bioRxiv
- Learning as filtering: Implications for spike-based plasticity. PLoS Comput. Biol 18:e1009721
- Network plasticity as Bayesian inference. PLoS Comput. Biol 11:e1004485
- Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci 114:3521–3526
- Reward signals in the cerebellum: origins, targets, and functional implications. Neuron 110:1290–1303
- The Hitchhiker’s guide to nonlinear filtering. J. Math. Psychol 94:102307
- A cellular mechanism for cortical associations: an organizing principle for the cerebral cortex. Trends Neurosci 36:141–151
- Electrophysiological properties of in vitro Purkinje cell somata in mammalian cerebellar slices. J. Physiol 305:171–195
- A practical Bayesian framework for backpropagation networks. Neural Comput 4:448–472
- Synaptic plasticity forms and functions. Annu. Rev. Neurosci 43:95–117
- Signatures of Bayesian inference emerge from energy-efficient synapses. eLife 12:RP92595
- Neuromodulation of neuronal circuits: back to the future. Neuron 76:1–11
- Effects of climbing fiber driven inhibition on Purkinje neuron spiking. J. Neurosci 32:17988–17997
- Encoding of oscillations by axonal bursts in inferior olive neurons. Neuron 62:388–399
- Nicotine enhancement of fast excitatory synaptic transmission in CNS by presynaptic receptors. Science 269:1692–1696
- Internal models in biological control. Annu. Rev. Control Robot. Auton. Syst 2:339–364
- Timing mechanisms in the cerebellum: testing predictions of a large-scale computer simulation. J. Neurosci 20:5516–5525
- Credit assignment in neural networks through deep feedback control. Adv. Neural Inf. Process. Syst 34:4674–4687
- The least-control principle for local learning at equilibrium. Adv. Neural Inf. Process. Syst 35:33603–33617
- The neuron as a direct data-driven controller. Proc. Natl. Acad. Sci. U.S.A 121:e2311893121
- Neuromodulation of neurons and synapses. Curr. Opin. Neurobiol 29:48–56
- Astrocytic control of synaptic function. Philos. Trans. R. Soc. B 372:20160154
- Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits. Nat. Neurosci 24:1010–1019
- Gradient calculations for dynamic recurrent neural networks: A survey. IEEE Trans. Neural Netw 6:1212–1228
- Cerebellar-driven cortical dynamics enable task acquisition, switching and consolidation. bioRxiv
- Tripartite synapses: astrocytes process and control synaptic information. Trends Neurosci 32:421–431
- Synapses with short-term plasticity are optimal estimators of presynaptic membrane potentials. Nat. Neurosci 13:1271–1275
- Pyramidal neuron as two-layer neural network. Neuron 37:989–999
- Dendritic solutions to the credit assignment problem. Curr. Opin. Neurobiol 54:28–36
- The study of plasticity has always been about gradients. J. Physiol 601:3141–3149
- Dis-inhibitory neuronal circuits can control the sign of synaptic plasticity. Adv. Neural Inf. Process. Syst 37
- How cerebellar architecture and dense activation patterns facilitate online learning in dynamic tasks. bioRxiv
- Timing dependence of the induction of cerebellar LTD. Neuropharmacology 54:213–218
- Short-term modulation of cerebellar Purkinje cell activity after spontaneous climbing fiber input. J. Neurophysiol 68:2051–2062
- Climbing fibers provide essential instructive signals for associative learning. Nat. Neurosci 27:940–951
- A controller to overcome dead time. ISA Journal 6:28–33
- Cerebellar LTD and pattern recognition by Purkinje cells. Neuron 54:121–136
- Timing rules for synaptic plasticity matched to behavioral function. Neuron 92:959–967
- Optimality principles in sensorimotor control. Nat. Neurosci 7:907–915
- The linear computational algorithm of cerebellar Purkinje cells. J. Neurosci 26:12861–12872
- Coincidence detection in single dendritic spines mediated by calcium release. Nat. Neurosci 3:1266–1273
- Adaptive switching circuits. IRE WESCON Convention Record 4:96–104
- A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1:270–280
- Internal models in the cerebellum. Trends Cogn. Sci 2:338–347
- Diverse synaptic plasticity mechanisms orchestrated to form and retrieve memories in spiking neural networks. Nat. Commun 6:6922
- Continual learning through synaptic intelligence. In: Proc. Int. Conf. Mach. Learn pp. 3987–3995
- Short-term synaptic plasticity. Annu. Rev. Physiol 64:355–405
Copyright
© 2025, Brendan A Bicknell & Peter E Latham
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.