Cerebellar learning using perturbations

Abstract

The cerebellum aids the learning of fast, coordinated movements. According to current consensus, erroneously active parallel fibre synapses are depressed by complex spikes signalling movement errors. However, this theory cannot solve the credit assignment problem of processing a global movement evaluation into multiple cell-specific error signals. We identify a possible implementation of an algorithm solving this problem, whereby spontaneous complex spikes perturb ongoing movements, create eligibility traces and signal error changes guiding plasticity. Error changes are extracted by adaptively cancelling the average error. This framework, stochastic gradient descent with estimated global errors (SGDEGE), predicts synaptic plasticity rules that apparently contradict the current consensus but were supported by plasticity experiments in slices from mice under conditions designed to be physiological, highlighting the sensitivity of plasticity studies to experimental conditions. We analyse the algorithm’s convergence and capacity. Finally, we suggest SGDEGE may also operate in the basal ganglia.

https://doi.org/10.7554/eLife.31599.001

Introduction

A central contribution of the cerebellum to motor control is thought to be the learning and automatic execution of fast, coordinated movements. Anatomically, the cerebellum consists of a convoluted, lobular cortex surrounding the cerebellar nuclei (Figure 1A, Eccles et al., 1967; Ito, 1984). The main input to the cerebellum is the heterogeneous mossy fibres, which convey multiple modalities of sensory, contextual and motor information. They excite both the cerebellar nuclei and the cerebellar cortex; in the cortex they synapse with the very abundant granule cells, whose axons, the parallel fibres, excite Purkinje cells. Purkinje cells constitute the sole output of the cerebellar cortex and project an inhibitory connection to the nuclei, which therefore combine a direct and a transformed mossy fibre input with opposite signs. The largest cell type in the nuclei, the projection neurone, sends excitatory axons to several motor effector systems, notably the motor cortex via the thalamus. Another nuclear cell type, the nucleo-olivary neurone, inhibits the inferior olive. The cerebellum receives a second external input: climbing fibres from the inferior olive, which form an extensive, ramified connection with the proximal dendrites of the Purkinje cell. Each Purkinje cell receives a single climbing fibre. A more modular diagram of the olivo-cerebellar connectivity relevant to this paper is shown in Figure 1B; numerous cell types and connections have been omitted for simplicity.

Figure 1

The cerebellar circuitry and properties of Purkinje cells.

(A) Simplified circuit diagram. MF, mossy fibres; CN, (deep) cerebellar nuclei; GC, granule cells; Cb Ctx, cerebellar cortex; PF, parallel fibres; PC, Purkinje cells; PN, projection neurones; NO, nucleo-olivary neurones; IO, inferior olive; CF, climbing fibres. (B) Modular diagram. The $\pm$ signs next to the synapses indicate whether they are excitatory or inhibitory. The granule cell and indirect inhibitory inputs they recruit have been subsumed into a bidirectional mossy fibre–Purkinje cell input, M. Potentially plastic inputs of interest here are denoted with an asterisk. $i$ , input; $o$ , output; $ℰ (o)$ , error (which is a function of the output). (C) Typical Purkinje cell electrical activity from an intracellular patch-clamp recording. Purkinje cells fire two types of action potential: simple spikes and, in response to climbing fibre input, complex spikes. (D) According to the consensus plasticity rule, a complex spike will depress parallel fibre synapses active about 100 ms earlier. The diagram depicts idealised excitatory postsynaptic currents (EPSCs) before and after typical induction protocols inducing long-term potentiation (LTP) or depression (LTD). Grey, control EPSC; blue, green, post-induction EPSCs.

https://doi.org/10.7554/eLife.31599.002

Purkinje cells discharge two distinct types of action potential (Figure 1C). They nearly continuously emit simple spikes—standard, if brief, action potentials—at frequencies that average 50 Hz. This frequency is modulated both positively and negatively by the intensity of inputs from the mossy fibre–granule cell pathway (which can also recruit interneurons that inhibit Purkinje cells; Eccles et al., 1967). Such modulations of Purkinje cell firing are thought to underlie their contributions to motor control. In addition, when the climbing fibre is active, an event that occurs continuously but in a somewhat irregular pattern with a mean frequency of around 1 Hz, the Purkinje cell emits a completely characteristic complex spike under the influence of the intense excitation from the climbing fibre (Figure 1C).

The history of research into cerebellar learning is dominated by the theory due to Marr (1969) and Albus (1971). They suggested that the climbing fibre acts as a ‘teacher’ to guide plasticity of parallel fibre–Purkinje cell synapses. It was several years, however, before experimental support for this hypothesis was obtained (Ito et al., 1982; Ito and Kano, 1982), by which time the notion that the climbing fibre signalled errors had emerged (Ito, 1972; Ito, 1984). Error modalities thought to be represented by climbing fibres include: pain, unexpected touch, imbalance, and retinal slip. According to the modern understanding of this theory, by signalling such movement errors, climbing fibres induce long-term depression (LTD) of parallel fibre synapses that were active at the same time (Ito et al., 1982; Ito and Kano, 1982; Sakurai, 1987; Crepel and Jaillard, 1991) or, more precisely, shortly before (Wang et al., 2000; Sarkisov and Wang, 2008; Safo and Regehr, 2008). A compensating long-term potentiation (LTP) is necessary to prevent synaptic saturation (Lev-Ram et al., 2002; Lev-Ram et al., 2003; Coesmans et al., 2004) and its induction is reported to follow high-frequency parallel fibre activity in the absence of complex spikes (Jörntell and Ekerot, 2002; Bouvier et al., 2016). Plasticity of parallel fibre synaptic currents according to these plasticity rules is diagrammed in Figure 1D.

Cerebellar learning with the Marr-Albus-Ito theory has mostly been considered, both experimentally and theoretically, at the level of single cells or of uniformly responding groups of cells learning a single stereotyped adjustment. Such predictable and constrained movements, exemplified by eye movements and simple reflexes, provide some of the best studied models of cerebellar learning: the vestibulo-ocular reflex (Robinson, 1976; Ito et al., 1974; Blazquez et al., 2004), nictitating membrane response/eye blink conditioning (McCormick et al., 1982; Yeo et al., 1984; Yeo and Hesslow, 1998), saccade adaptation (Optican and Robinson, 1980; Dash and Thier, 2014; Soetedjo et al., 2008) and regulation of limb movements by withdrawal reflexes (Ekerot et al., 1995; Garwicz et al., 2002). All of these motor behaviours have in common that there could conceivably be a fixed mapping between an error and a suitable corrective action. Thus, adaptations necessary to ensure gaze fixation are exactly determined by the retinal slip. Such fixed error-correction relations may have been exploited during evolution to create optimised correction circuitry.

Problems arise with the Marr-Albus-Ito theory if one tries to extend it to more complex situations, where neurones must respond heterogeneously (not uniformly) and/or where the flexibility to learn arbitrary responses is required. Many motor control tasks, for instance coordinated movements involving the hands, can be expected to fall into this class. To learn such complex/arbitrary movements with the Marr-Albus-Ito theory requires error signals that are specific for each cell, each movement and each time within each movement. The theory is thus incomplete, because it does not describe how a global evaluation of movement error can be processed to provide such detailed instructions for plasticity to large numbers of cells. This general difficulty facing the brain was termed the credit assignment problem by Minsky (1961).

A suitable algorithm for solving the general cerebellar learning problem would be stochastic gradient descent, a classical optimisation method that Minsky (1961) suggested might operate in the brain. We shall speak of ‘descent’ when minimising an error and ‘ascent’ when maximising a reward, but the processes are mathematically equivalent. In stochastic gradient descent, the objective function is explored by random variations in the network that alter behaviour, with plasticity then retaining those variations that improve the behaviour, as signalled by a decreased error or increased reward. Several possible mechanisms of varying biological plausibility have been proposed. In particular, perturbations caused by synaptic release (Minsky, 1954; Seung, 2003) or external inputs (Doya and Sejnowski, 1988) have been suggested, while various (abstract) mechanisms have been proposed for extraction of changes in the objective function (Williams, 1992). To avoid confusion, note that these forms of stochastic gradient descent differ from those conforming to the more restrictive definition used in the machine learning community, in which the stochastic element is the random sampling of examples from the training set (Robbins and Monro, 1951; Shalev-Shwartz and Ben-David, 2014). This latter form has achieved broad popularity in online learning from large data sets and in deep learning. Although the theoretical framework for (perturbative) stochastic gradient descent is well established, the goal of identifying in the brain a network and cellular implementation of such an algorithm has proved elusive.
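The perturbative form of stochastic gradient descent described above can be illustrated with a minimal sketch. It is not the cerebellar implementation proposed in this paper: it assumes a single linear unit, a squared error, Gaussian output perturbations and direct access to the unperturbed error, and the function name and all parameter values are hypothetical.

```python
import random

def node_perturbation_descent(x, target, n_steps=2000, sigma=0.05, lr=0.05, seed=0):
    """Perturbative stochastic gradient descent for one linear unit (illustrative)."""
    rng = random.Random(seed)
    w = [0.0] * len(x)          # synaptic weights, initially zero
    errors = []                 # unperturbed error before each update
    for _ in range(n_steps):
        out = sum(wi * xi for wi, xi in zip(w, x))
        base_err = (out - target) ** 2
        # Random perturbation of the output (assumed Gaussian here).
        noise = rng.gauss(0.0, sigma)
        pert_err = (out + noise - target) ** 2
        # Change of error caused by the perturbation.
        delta_err = pert_err - base_err
        # Retain perturbations that reduce the error: the weights move in the
        # direction of the perturbation when the error fell, and against it
        # when the error rose, in proportion to the active inputs.
        step = lr * delta_err * noise / sigma ** 2
        w = [wi - step * xi for wi, xi in zip(w, x)]
        errors.append(base_err)
    return w, errors

w, errors = node_perturbation_descent([1.0, 0.5, -0.5], target=1.0)
```

On average the update follows the true gradient of the error, although each individual step is noisy. The sketch computes the unperturbed error directly for comparison; a biological implementation would have to estimate it.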

The learning behaviour with the best established resemblance to stochastic gradient ascent is the acquisition of song in male songbirds. The juvenile song is refined by a trial and error process to approach a template memorised from a tutor during a critical period (Konishi, 1965; Mooney, 2009). The analogy with stochastic gradient ascent was made by Doya and Sejnowski (1988) and was then further developed experimentally (Olveczky et al., 2005) and theoretically (Fiete et al., 2007). However, despite these very suggestive behavioural correlates, relatively little progress has been made in verifying model predictions for plasticity or identifying the structures responsible for storing the template and evaluating its match with the song.

A gradient descent mechanism for the cerebellum has been proposed by the group of Dean et al. (2002), who term their algorithm decorrelation. Simply stated (Dean and Porrill, 2014), if both parallel fibre and climbing fibre inputs to a Purkinje cell are assumed to vary about their respective mean values, their correlation or anticorrelation indicates the local gradient of the error function and thus the sign of the required plasticity. At the optimum, which is a minimum of the error function, there should be no correlation between variations of the climbing fibre rate and those of the parallel fibre input. Hence the name: the algorithm aims to decorrelate parallel and climbing fibre variations. An appropriate plasticity rule for implementing decorrelation is a modified covariance rule (Sejnowski, 1977; who moreover suggested in abridged form a similar application to cerebellar learning). Although decorrelation provides a suitable framework, its proponents are still in the process of developing a cellular implementation (Menzies et al., 2010). Moreover, we believe that the detailed implementation suggested by Menzies et al. (2010) is unable to solve a temporal credit assignment problem (that we identify below) arising in movements that can only be evaluated upon completion.
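As a rough illustration of the decorrelation idea, the sketch below computes the covariance between parallel fibre and climbing fibre rate variations and uses it to update a single weight. The sign convention (positive covariance depresses the synapse), the learning rate and the function names are assumptions made for illustration and are not taken from the cited work.

```python
def covariance(xs, ys):
    """Covariance of two equally long sequences of rates about their means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def decorrelation_update(weight, pf_rates, cf_rates, lr=0.01):
    """One covariance-rule update of a parallel fibre weight (illustrative).

    Assumed sign convention: parallel fibre activity that covaries positively
    with climbing fibre activity is depressed; anticorrelated activity is
    potentiated; uncorrelated activity leaves the weight unchanged.
    """
    return weight - lr * covariance(pf_rates, cf_rates)

pf = [1.0, 2.0, 3.0, 4.0]
depressed = decorrelation_update(1.0, pf, [1.0, 2.0, 3.0, 4.0])    # correlated
potentiated = decorrelation_update(1.0, pf, [4.0, 3.0, 2.0, 1.0])  # anticorrelated
unchanged = decorrelation_update(1.0, pf, [1.0, 2.0, 2.0, 1.0])    # uncorrelated
```

The fixed point of this update is reached when the covariance vanishes, which corresponds to the decorrelated optimum described in the text.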

Below, we analyse in more detail how the Marr-Albus-Ito theory fails to solve the credit assignment problem and suggest how the cerebellum might implement stochastic gradient descent. We shall provide support for unexpected predictions the proposed implementation makes regarding cerebellar synaptic plasticity, which are different from those suggested for decorrelation. Finally, we shall perform a theoretical analysis demonstrating that learning with the algorithm converges and that it is able to attain maximal storage capacity.

Results

Requirements for cerebellar learning

We begin by examining the current consensus (Marr-Albus-Ito) theory of cerebellar learning and illustrating some of its limitations when extended to the optimisation of complex, arbitrary movements. The learning framework we consider is the following. The cerebellar circuitry must produce at its output trains of action potentials of varying frequencies at given times (Thach, 1968). We consider only firing rates $r (t)$ , which are a function of time in the movement (Figure 2A). The cerebellar output influences the movement and the resulting movement error, which can be sensed and fed back to the Purkinje cells in the form of climbing fibre activity.

Figure 2

Analysis of cerebellar learning.

(A) The model of cerebellar learning we address is to adjust the temporal firing profiles (miniature graphs of rate as a function of time, $r (t)$ , *magenta*) of multiple cerebellar output neurones (nuclear projection neurones) (*black circles*) to optimise a movement, exploiting the evaluated movement error, which is fed back to the cerebellar circuit (Purkinje cells) (*blue arrow*). (B) The Marr-Albus-Ito theory was simulated in a network simulation embodying different error definitions (Results, Materials and methods). The Marr-Albus-Ito algorithm is able to minimise the average signed error (*blue*, *Signed*) of a group of projection neurones, but not the unsigned error (*red*, *Unsigned*). (C) However, as expected, optimising the signed error does not optimise individual cell firing profiles: comparison of the underlying firing profiles (initial, *magenta*; final, *green*) with their target (*grey*) for a specimen projection neurone illustrates that neither the temporal profile nor even the mean single-cell firing rate is optimised. If the unsigned error is used, there is no convergence and the firing rates simply saturate (*red*) and the error *increases*. (D) A different learning algorithm will be explored below: stochastic gradient descent, according to which temporally and spatially localised firing perturbations ( $δ r$ , *red*, of a Purkinje cell) are consolidated if the movement improves; this requires extraction of the *change* of error ( $Δ$ Error).

https://doi.org/10.7554/eLife.31599.003

We constructed a simple network simulation to embody this framework (see Materials and methods). In it, mossy fibre drive to the cerebellum was considered to be movement- and time-dependent but not to vary during learning. Granule cells and molecular layer interneurons were collapsed into the mossy fibre inputs, which acted on Purkinje cells through unconstrained synapses (negative weights could arise through plasticity of interneuron synapses; Jörntell and Ekerot, 2002; Jörntell and Ekerot, 2003; Mittmann and Häusser, 2007) that were modified to optimise the firing profiles of the projection neurones of the deep cerebellar nuclei. We implemented the Marr-Albus-Ito algorithm via a plasticity rule in which synaptic weights were updated by the product of presynaptic activity and the presence of the error signal. Thus, active mossy fibre (pathway) inputs would undergo LTP (increasing Purkinje cell firing; Lev-Ram et al., 2003) after each movement unless the climbing fibre was active, in which case they would undergo LTD (decreasing activity). The model represented a microzone whose climbing fibres were activated uniformly by a global movement error.

The behaviour of the Marr-Albus-Ito algorithm in this network depends critically on how the definition of the error is extended from the error for an individual cell. The most natural definition for minimising all errors would be to sum the absolute differences between the actual firing profile of each cerebellar nuclear projection neurone and its target, see Equation 7. However, this error definition is incompatible with the Marr-Albus-Ito algorithm (Unsigned in Figure 2B). Examination in Figure 2C of the firing profile of one of the projection neurones shows that their rate simply saturates. The algorithm is able to minimise the average signed error defined in Equation 14 (Signed in Figure 2B). However, inspection of a specimen final firing profile illustrates the obvious limitation that neither the detail of the temporal profile nor even the mean firing rate for individual cells is optimised when using this error (see Figure 2C).
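The difference between the two error definitions can be made concrete with a small sketch. The functions below are simplified stand-ins for Equations 7 and 14, which may differ in normalisation; the function names and the example firing rates are hypothetical.

```python
def unsigned_error(rates, targets):
    """Sum of absolute differences between firing profiles and their targets.

    `rates` and `targets` are lists of per-cell profiles (lists of rates).
    """
    return sum(
        abs(r - t)
        for cell_rates, cell_targets in zip(rates, targets)
        for r, t in zip(cell_rates, cell_targets)
    )

def signed_error(rates, targets):
    """Average signed difference between firing profiles and their targets."""
    diffs = [
        r - t
        for cell_rates, cell_targets in zip(rates, targets)
        for r, t in zip(cell_rates, cell_targets)
    ]
    return sum(diffs) / len(diffs)

# Two cells, two time points: one cell fires too fast, the other too slowly.
rates = [[60.0, 60.0], [40.0, 40.0]]
targets = [[50.0, 50.0], [50.0, 50.0]]
print(signed_error(rates, targets))    # 0.0: the deviations cancel on average
print(unsigned_error(rates, targets))  # 40.0: neither cell matches its target
```

In this example the signed error is already minimal although neither cell matches its target, which is the limitation visible in Figure 2C.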

This limitation illustrates the credit assignment problem. How can one work back from a single evaluation after the movement to optimise the firing rates of multiple cells at different time points, when all of those firing profiles also differ between multiple movements that must all be learnt in parallel? A model problem that the Marr-Albus-Ito algorithm cannot solve would be two neurones receiving the same inputs and error signal but needing to undergo opposite plasticity. In the general case of learning complex, arbitrary movements, this requires impractical foreknowledge to be embodied by the climbing fibre system, to know exactly which cell must be depressed to optimise each movement, and this still would not solve the problem of inducing plasticity differentially at different time points during a firing profile, given that often only a single, post-movement evaluation will be available. An example would be a saccade, where the proximity to the target can only be assessed once the movement of the eye has ceased.
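The two-neurone model problem can be sketched in a few lines. The update below is a simplified version of the Marr-Albus-Ito rule as described for the network simulation (LTP of active inputs unless the climbing fibre is active, in which case LTD); the function name, learning rate and input values are assumptions for illustration.

```python
def marr_albus_ito_update(weights, inputs, climbing_fibre_active, lr=0.1):
    """Simplified Marr-Albus-Ito rule: LTD of active inputs when the climbing
    fibre is active, LTP of active inputs otherwise."""
    sign = -1.0 if climbing_fibre_active else 1.0
    return [w + sign * lr * x for w, x in zip(weights, inputs)]

inputs = [1.0, 0.0, 0.5]     # shared mossy fibre (pathway) activity
cell_a = [0.2, 0.2, 0.2]     # suppose this cell needs to fire more
cell_b = [0.2, 0.2, 0.2]     # suppose this cell needs to fire less

# Both cells receive the same global error signal on every movement.
for error_signal in (True, False, True):
    cell_a = marr_albus_ito_update(cell_a, inputs, error_signal)
    cell_b = marr_albus_ito_update(cell_b, inputs, error_signal)

weights_identical = cell_a == cell_b
print(weights_identical)  # True
```

Because the update depends only on the shared inputs and the shared error signal, the two cells always receive the same weight change, whatever their individual targets.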

The complex spike as trial and error

The apparent difficulty of solving the credit assignment problem within the Marr-Albus-Ito algorithm led us to consider whether the cerebellum might implement a form of stochastic gradient descent. By combining a known perturbation of one or a small group of neurones with information about the change of the global error (Figure 2D), it becomes possible to consolidate, specifically in perturbed neurones, those motor command modifications that reduce the error, thus leading to a progressive optimisation. We set out to identify a biologically plausible implementation in the cerebellum. In particular, that implementation should be able to optimise multiple movements in parallel, employing only feasible cellular computations—excitation, inhibition and thresholding.

A cerebellar implementation of stochastic gradient descent must include a source of perturbations of the Purkinje cell firing rate $δ r$ . The fact that Purkinje cells can contribute to different movements with arbitrary and unknown sequencing imposes an implementation constraint preventing simple-minded approaches like comparing movements performed twice in succession. We recall that we assume that no explicit information categorising or identifying movements is available to the Purkinje cell. It is therefore necessary that knowledge of both the presence and sign of $δ r$ be available within the context of a single movement execution.

In practice, a number of different perturbation mechanisms can still satisfy these requirements. For instance, any binary signal would be suitable, since the sign of the perturbation with respect to its mean would be determined by the simple presence or absence of the signal. Several plausible mechanisms along these lines have been proposed, including external modulatory inputs (Doya and Sejnowski, 1988; Fiete et al., 2007), failures and successes of synaptic transmission (Seung, 2003) or the absence and presence of action potentials (Xie and Seung, 2004). However, none of these mechanisms has yet attracted experimental support at the cellular level.

In the cerebellar context, parallel fibre synaptic inputs are so numerous that the correlation between individual input variations and motor errors is likely to be extremely weak, whereas we seek a perturbation that is sufficiently salient to influence ongoing movement. Purkinje cell action potentials are also a poor candidate: the perturbation must be able to establish a synaptic eligibility trace, but these action potentials are not back-propagated to parallel fibre synapses (Stuart and Häusser, 1994) and therefore probably cannot guide their plasticity. Bistable firing behaviour of Purkinje cells (Loewenstein et al., 2005; Yartsev et al., 2009), with the down-state (or long pauses) representing a clear perturbation towards lower (zero) firing rates, is a perturbation candidate. However, exploratory plasticity experiments did not support this hypothesis and the existence of bistability in vivo is disputed (Schonewille et al., 2006a).

We thus propose, in accordance with a suggestion due to Harris (1998), another possible perturbation of Purkinje cell firing: the complex spike triggered by the climbing fibre. We note that there are probably two types of inferior olivary activity. Olivary neurones mediate classical error signalling triggered by external synaptic input, but they also exhibit continuous and irregular spontaneous activity in the absence of overt errors. We suggest the spontaneous climbing fibre activations cause synchronised perturbation complex spikes (pCSs) in small groups of Purkinje cells via the $\sim$ 1:10 inferior olivary–Purkinje cell divergence (Schild, 1970; Mlonyeni, 1973; Caddy and Biscoe, 1976), dynamic synchronisation of olivary neurones through electrical coupling (Llinás and Yarom, 1986; Bazzigaluppi et al., 2012a) and common synaptic drive. The excitatory perturbation—a brief increase of firing rate (Ito and Simpson, 1971; Campbell and Hesslow, 1986; Khaliq and Raman, 2005; Monsivais et al., 2005)—feeds through the cerebellar nuclei (changing sign; Bengtsson et al., 2011) to the ongoing motor command and causes a perturbation of the movement, which in turn may modify the error of the movement.

The perturbations are proposed to guide learning in the following manner. If a perturbation complex spike results in an increase of the error, the raised activity of the perturbed Purkinje cells was a mistake and reduced activity would be preferable; parallel fibre synapses active at the time of the perturbing complex spikes should therefore be depressed. Conversely, if the perturbation leads to a reduction of error (or does not increase it), the increased firing rate should be consolidated by potentiation of the simultaneously active parallel fibres.

How could an increase of the error following a perturbation be signalled to the Purkinje cell? We suggest that the climbing fibre also performs this function. Specifically, if the perturbation complex spike increases the movement error, a secondary error complex spike (eCS) is emitted shortly afterwards, on a time scale of the order of 100 ms (50–300 ms). This time scale is assumed because it corresponds to the classical error signalling function of the climbing fibre, because it allows sufficient time for feedback via the error modalities known to elicit complex spikes (touch, pain, balance, vision) and because such intervals are known to be effective in plasticity protocols (Wang et al., 2000; Sarkisov and Wang, 2008; Safo and Regehr, 2008). The interval could also be influenced by the oscillatory properties of olivary neurones (Llinás and Yarom, 1986; Bazzigaluppi et al., 2012b).

The predicted plasticity rule is therefore as diagrammed in Figure 3. Only granule cell synapses active simultaneously with the perturbation complex spike undergo plasticity, with the sign of the plasticity being determined by the presence or absence of a subsequent error complex spike. Granule cell synapses active in the absence of a synchronous perturbation complex spike should not undergo plasticity, even if succeeded by an error complex spike. We refer to these different protocols with the abbreviations (and give our predicted outcome in parentheses): G_ _ (no change), GP_ (LTP), G_E (no change), GPE (LTD), where G indicates granule cell activity, P the presence of a perturbation complex spike and E the presence of an error complex spike. Note that both granule cells and climbing fibres are likely to be active in high-frequency bursts rather than the single activations idealised in Figure 3.
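The predicted rule amounts to a small truth table, which the following sketch encodes. The function name and the string labels for the outcomes are illustrative choices; the four outcomes themselves are those listed above.

```python
def predicted_plasticity(granule_active, perturbation_cs, error_cs):
    """Predicted outcome at a granule cell synapse for one protocol."""
    # Plasticity requires granule cell activity coincident with a
    # perturbation complex spike (the eligibility condition).
    if not (granule_active and perturbation_cs):
        return "no change"
    # The sign is set by the presence of a subsequent error complex spike.
    return "LTD" if error_cs else "LTP"

# The four protocols named in the text: (G, P, E) flags.
protocols = {
    "G_ _": (True, False, False),
    "GP_": (True, True, False),
    "G_E": (True, False, True),
    "GPE": (True, True, True),
}
predictions = {name: predicted_plasticity(*flags) for name, flags in protocols.items()}
```

Note that the error complex spike alone never changes the outcome: it only selects the sign of plasticity at synapses already made eligible by the perturbation complex spike.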

Figure 3

Predicted plasticity rules.

Synchronous activation of granule cell synapses and a perturbation complex spike (pCS) leads to LTP (GP_, increased synaptic weight $w$ ; top left, *red*), while the addition of a succeeding error complex spike (eCS) leads to LTD (GPE, top right, *magenta*). The bottom row illustrates the corresponding ‘control’ cases from which the perturbation complex spike is absent; no plasticity should result (G_ _, *blue*, and G_E, *green*).

https://doi.org/10.7554/eLife.31599.004

Several of the predictions of this rule appear to be incompatible with the current consensus. Thus, parallel fibre synapses whose activity is simultaneous with (GP_) or followed by a complex spike (G_E) have been reported to be depressed (Sakurai, 1987; Crepel and Jaillard, 1991; Lev-Ram et al., 1995; Coesmans et al., 2004; Safo and Regehr, 2008; Gutierrez-Castellanos et al., 2017), while we predict potentiation and no change, respectively. Furthermore, parallel fibre activity alone (G_ _) leads to potentiation (Lev-Ram et al., 2002; Jörntell and Ekerot, 2002; Lev-Ram et al., 2003; Coesmans et al., 2004; Gutierrez-Castellanos et al., 2017), while we predict no change.

Synaptic plasticity under physiological conditions

As described above, the plasticity rules we predict for parallel fibre–Purkinje cell synapses are, superficially at least, close to the opposite of the consensus in the literature. Current understanding of the conditions for inducing plasticity gives a key role to the intracellular calcium concentration (combined with nitric oxide signalling; Coesmans et al., 2004; Bouvier et al., 2016), whereby high intracellular calcium concentrations are required for LTD and moderate concentrations lead to LTP. Standard experimental conditions for studying plasticity in vitro, notably the extracellular concentration of calcium, are likely to result in more elevated intracellular calcium concentrations during induction than pertain physiologically. Recognising that this could alter plasticity outcomes, we set out to test whether our predicted plasticity rules might be verified under more physiological conditions.

We made several changes to standard protocols (see Materials and methods); one was cerebellum-specific, but the others also apply to in vitro plasticity studies in other brain regions. We did not block GABAergic inhibition. We lowered the extracellular calcium concentration from the standard 2 mM (or higher) used in slice work to 1.5 mM (Pugh and Raman, 2008), which is near the maximum values measured in vivo in rodents (Nicholson et al., 1978; Jones and Keep, 1988; Silver and Erecińska, 1990). To avoid the compact bundles of active parallel fibres produced by the usual stimulation in the molecular layer, we instead used weak granule cell layer stimuli, which result in sparse and spatially dispersed parallel fibre activity. Interestingly, it has been reported that standard protocols using granule cell stimulation are unable to induce LTD (Marcaggi and Attwell, 2007). We used a pipette solution designed to prolong energy supply in extended cells like the Purkinje cell (see Materials and methods). Experiments were carried out in adult mouse sagittal cerebellar slices using otherwise standard patch-clamp techniques.

Pairs of granule cell test stimuli with an interval of 50 ms were applied at 0.1 Hz before and after induction; EPSCs were recorded in voltage clamp at $-$ 70 mV. Pairs of climbing fibre stimuli with a 2.5 ms interval were applied at 0.5 Hz throughout the test periods, mimicking tonic climbing fibre activity, albeit at a slightly lower rate. The interleaved test granule cell stimulations were sequenced 0.5 s before the climbing fibre stimulations. The analysis inclusion criteria and amplitude measurement for the EPSCs are detailed in the Materials and methods. The average amplitude of the granule cell EPSCs retained for analysis was $-$ 62 $\pm$ 46 pA (mean $\pm$ s.d., $n$ = 58). The rise and decay time constants (of the global averages) were 0.74 $\pm$ 0.36 ms and 7.2 $\pm$ 2.7 ms (mean $\pm$ s.d.), respectively.

During induction, performed in current clamp without any injected current, the granule cell input consisted of a burst of five stimuli at 200 Hz, reproducing the propensity of granule cells to burst at high frequencies (Chadderton et al., 2004; Jörntell and Ekerot, 2006). The climbing fibre input reflected the fact that these can occur in very high-frequency bursts (Eccles et al., 1966; Maruta et al., 2007). We used two stimuli at 400 Hz to represent the perturbation complex spike and four for the subsequent error complex spike if it was included in the protocol. Depending on the protocol, the climbing fibre stimuli had different timings relative to the granule cell stimuli: a pair of climbing fibre stimuli at 400 Hz, 11–15 ms or $\sim$ 500 ms after the start of the granule cell burst and/or four climbing fibre stimuli at 400 Hz, 100–115 ms after the beginning of the granule cell burst (timing diagrams will be shown in the Results). In a fraction of cells, the climbing fibre stimuli after the first were not reliable; our grounds for including these cells are detailed in the Materials and methods. The interval between the two bursts of climbing fibre stimuli when the error complex spike was present was about 100 ms. We increased the interval between induction episodes from the standard one second to two, to reduce any accumulating signal during induction. 300 such induction sequences were applied (Lev-Ram et al., 2002).
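The stimulus timing within one induction episode can be worked out explicitly, as in the sketch below. The burst sizes and frequencies are those given in the text; the specific onsets (12 ms, 112 ms and 500 ms) are assumed values chosen within the stated ranges, and the function names are hypothetical.

```python
def burst_times(onset_ms, n_stimuli, rate_hz):
    """Stimulus times (ms) of a regular burst starting at onset_ms."""
    interval_ms = 1000.0 / rate_hz
    return [onset_ms + i * interval_ms for i in range(n_stimuli)]

def induction_episode(protocol, pcs_onset_ms=12.0, ecs_onset_ms=112.0,
                      late_cf_onset_ms=500.0):
    """Stimulus times for one induction episode, relative to burst onset.

    The default onsets are assumptions within the ranges given in the text
    (11-15 ms, 100-115 ms and about 500 ms).
    """
    if protocol not in ("G_ _", "GP_", "G_E", "GPE"):
        raise ValueError(f"unknown protocol: {protocol}")
    # Granule cell burst: five stimuli at 200 Hz.
    episode = {"granule": burst_times(0.0, 5, 200.0), "climbing": []}
    if protocol in ("GP_", "GPE"):
        # Perturbation complex spike: two stimuli at 400 Hz.
        episode["climbing"] += burst_times(pcs_onset_ms, 2, 400.0)
    if protocol in ("G_E", "GPE"):
        # Error complex spike: four stimuli at 400 Hz.
        episode["climbing"] += burst_times(ecs_onset_ms, 4, 400.0)
    if protocol == "G_ _":
        # Temporally separate climbing fibre pair, about 500 ms later.
        episode["climbing"] += burst_times(late_cf_onset_ms, 2, 400.0)
    return episode

N_EPISODES = 300
EPISODE_INTERVAL_S = 2.0
total_induction_s = N_EPISODES * EPISODE_INTERVAL_S  # 600 s, i.e. 10 min

gpe = induction_episode("GPE")
```

With these assumed onsets, the granule cell burst spans 0 to 20 ms, so the perturbation climbing fibre stimuli fall within it, and the two climbing fibre bursts of the GPE protocol begin 100 ms apart.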

We first show the protocols relating to LTP (Figure 4). A granule cell burst was followed by a distant perturbation climbing fibre stimulus or the two inputs were activated simultaneously. In the examples shown, the protocol with simultaneous activation (GP_, Figure 4C,D) caused a potentiation of about 40%, while the temporally separate inputs caused a smaller change of 15% in the opposite direction (G_ _, Figure 4A,B). We note that individual outcomes were quite variable; group data and statistics will be shown below. The mean paired-pulse ratio in our recordings was A2/A1 = 1.75 $\pm$ 0.32 (mean $\pm$ s.d., $n$ = 58). As here, no significant differences of paired-pulse ratio were observed with any of the plasticity protocols: plasticity $-$ baseline difference for GP_, mean $-$ 0.08, 95% c.i. ( $-$ 0.34, 0.20), $n$ = 15; GPE, mean 0.12, 95% c.i. ( $-$ 0.07, 0.33), $n$ = 10; G_ _, mean $-$ 0.01, 95% c.i. ( $-$ 0.24, 0.24), $n$ = 18; G_E, mean $-$ 0.09, 95% c.i. ( $-$ 0.23, 0.29), $n$ = 15.

Figure 4

Simultaneous granule cell and climbing fibre activity induces LTP.

(A) Membrane potential (*blue*) of a Purkinje cell during an induction protocol (G_ _) where a burst of 5 granule cell stimuli at 200 Hz was followed after 0.5 s by a pair of climbing fibre stimuli at 400 Hz. (B) Average EPSCs recorded up to 10 min before (*black*) and 20–30 min after the end of the protocol of A (*blue*). Paired test stimuli (*triangles*) were separated by 50 ms and revealed the facilitation typical of the granule cell input to Purkinje cells. In this case, the induction protocol resulted in a small reduction (*blue* vs. *black*) of the amplitude of responses to both pulses. (C) Purkinje cell membrane potential (*red*) during a protocol (GP_) where the granule cells and climbing fibres were activated simultaneously, with timing otherwise identical to A. (D) EPSCs recorded before (*black*) and after (*red*) the protocol in C. A clear potentiation was observed in both of the paired-pulse responses.

https://doi.org/10.7554/eLife.31599.005

Figure 5 illustrates tests of our predictions regarding the induction of LTD. As before, a granule cell burst was paired with the perturbation climbing fibre, but now a longer burst of climbing fibre stimuli was appended 100 ms later, representing an error complex spike (GPE, Figure 5C,D). A clear LTD of about 40% developed following the induction. In contrast, if the perturbation complex spike was omitted, leaving the error complex spike (G_E, Figure 5A,B), no clear change of synaptic weight occurred (an increase of about 10%). During induction, cells would generally begin in a tonic firing mode, but nearly all ceased firing by the end of the protocol. The specimen sweeps in Figure 5 are taken towards the end of the induction period and illustrate the Purkinje cell responses when spiking had ceased.

Figure 5

LTD requires simultaneous granule cell and climbing fibre activity closely followed by an additional complex spike.

(A) Membrane potential of a Purkinje cell (*green*) during a protocol where a burst of five granule cell stimuli at 200 Hz was followed after 100 ms by four climbing fibre stimuli at 400 Hz (G_E). (B) Average EPSCs recorded up to 10 min before (*black*) and 20–30 min after the end of the protocol of A (*green*). The interval between the paired test stimuli (*triangles*) was 50 ms. The induction protocol resulted in little change (*green* vs. *black*) of the amplitude of either pulse. (C) Purkinje cell membrane potential (*magenta*) during the same protocol as in A with the addition of a pair of climbing fibre stimuli simultaneous with the granule cell stimuli (GPE). (D) EPSCs recorded before (*black*) and after (*magenta*) the protocol in C. A clear depression was observed.

https://doi.org/10.7554/eLife.31599.006

The time courses of the changes of EPSC amplitude are shown in normalised form in Figure 6 (see Materials and methods). The individual data points of the relative EPSC amplitudes for the different protocols are also shown. A numerical summary of the group data and statistical comparisons is given in Table 1.

Figure 6 with 1 supplement

Time course and amplitude of plasticity.

(A) number, (B) box-and-whisker plots of individual plasticity ratios (*coloured lines* represent the means, *open symbols* represent cells with failures of climbing fibre stimulation; see Materials and methods) and (C) time course of the mean EPSC amplitude for GP_ (*red*) and G_ _ (*blue*) protocols of Figure 4, normalised to the pre-induction amplitude. Averages every 2 min, mean $\pm$ sem. Non-integer $n$ arise because the numbers of responses averaged were normalised by those expected in two minutes, but some responses were excluded (see Materials and methods) and some recordings did not extend to the extremities of the bins. Induction lasted for 10 min starting at time 0. (D, E) and (F) similar plots for the GPE (*magenta*) and G_E (*green*) protocols of Figure 5.

https://doi.org/10.7554/eLife.31599.007

Table 1

Group data and statistical tests for plasticity outcomes.

In the upper half of the table, the ratios of EPSC amplitudes after/before induction are described and compared with a null hypothesis of no change (ratio = 1). The GP_ and GPE protocols both induced changes, while the control protocols (G_ _, G_E) did not. The bottom half of the table analyses differences of those ratios between protocols. The 95% confidence intervals (c.i.) were calculated using bootstrap methods, while the $p$-values were calculated using a two-tailed Wilcoxon rank sum test. The $p$-values marked with an asterisk have been corrected for a two-stage analysis by a factor of 2 (Materials and methods).

https://doi.org/10.7554/eLife.31599.009

Comparison	Mean	95% c.i.	$p$	$n$
GP_	1.40	1.26, 1.72	0.0001	15
GPE	0.53	0.45, 0.63	0.002	10
G_ _	0.97	0.78, 1.16	0.77	18
G_E	0.98	0.79, 1.24	0.60	15
GP_ vs G_ _	0.43	0.19, 0.74	0.021*
GPE vs G_E	−0.44	−0.72, −0.24	0.0018*
GP_ vs G_E	0.42	0.14, 0.73	0.01*
GPE vs G_ _	−0.43	−0.65, −0.22	0.008*
G_ _ vs G_E	0.01	−0.26, 0.32	0.93
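
As an illustration of how a bootstrap confidence interval for a mean after/before ratio might be obtained, the following minimal sketch uses a percentile bootstrap. The percentile variant, the function name `bootstrap_ci_mean` and the ratios in `hypothetical_ratios` are assumptions made for this example; they are not the data or necessarily the exact procedure used for Table 1 (see Materials and methods).

```python
import random
import statistics

def bootstrap_ci_mean(values, n_boot=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Made-up after/before ratios, for illustration only (not the recorded data).
hypothetical_ratios = [1.1, 1.6, 1.2, 1.5, 1.9, 1.0, 1.3, 1.4, 1.7, 1.3]
lo, hi = bootstrap_ci_mean(hypothetical_ratios)
print(statistics.fmean(hypothetical_ratios), (lo, hi))
```

An interval that excludes 1 would indicate a change of synaptic weight, which is how the upper half of Table 1 is read.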

In a complementary series of experiments, we explored the plasticity outcome when six climbing fibre stimuli were grouped in a burst and applied simultaneously with the granule cell burst. This allowed comparison with the GPE protocol, which also contained 4 + 2 = 6 climbing fibre stimuli. The results in Figure 6—figure supplement 1 show that a modest LTP was observed on average: after/before ratio 1.12 $\pm$ 0.12 (mean $\pm$ SEM); 95% c.i. 0.89–1.35. This result is clearly different from the LTD observed under GPE ( $p$ = 0.0034; two-tailed Wilcoxon rank sum test, after Bonferroni correction for four possible comparisons).

These results therefore provide experimental support for all four plasticity rules predicted by our proposed mechanism of stochastic gradient descent. We argue in the Discussion that the apparent contradiction of these results with the literature is not unexpected if the likely effects of our altered conditions are considered in the light of known mechanisms of potentiation and depression.

Extraction of the change of error

Above we have provided experimental evidence in support of the counterintuitive synaptic plasticity rules predicted by our proposed learning mechanism. In that mechanism, following a perturbation complex spike, the sign of plasticity is determined by the absence or presence of a follow-up error complex spike that signals whether the movement error increased (spike present) or decreased (spike absent). We now return to the outstanding problem of finding a mechanism able to extract this change of error, $δ ℰ$ .

Several roughly equivalent schemes have been proposed (Williams, 1992), including subtraction of the average error (Barto et al., 1983) and decorrelation (Dean and Porrill, 2014), a specialisation of the covariance rule (Sejnowski, 1977). However, in general, these suggestions have not extended to detailed cellular implementations. In order to restrict our implementation to biologically plausible mechanisms we selected a method that involves subtracting the average error from the trial-to-trial error (Barto et al., 1983; Doya and Sejnowski, 1988). The residual of the subtraction is then simply the variation of the error $δ ℰ$ as desired.

As a mechanism for this subtraction, we propose that the excitatory synaptic drive to the inferior olive is on average balanced by input from the GABAergic nucleo-olivary neurones. A diagram is shown in Figure 7 to illustrate how this might work in the context of repetitions of a single movement (we extend the mechanism to multiple interleaved movements below). Briefly, a feedback plasticity reinforces the inhibition whenever it is too weak to prevent an error complex spike from being emitted. When the inhibition is strong enough to prevent an error complex spike, the inhibition is weakened. If the variations of the strength of the inhibition are sufficiently small and the error does not change rapidly, the level of inhibition will attain a good approximation of the average error. Indeed, this mechanism can be viewed as maintaining an estimate of the movement error. However, the error still varies about its mean on a trial-to-trial basis because of the random perturbations that influence the movement and therefore the error. In consequence, error complex spikes are emitted when the error exceeds the (estimated) average; this occurs when the perturbation increases the error. This mechanism enables extraction of the sign of $δ ℰ$ in the context of a single movement realisation. In support of such a mechanism, there is evidence that inhibition in the olive builds up during learning and reduces the probability of complex spikes (Kim et al., 1998).
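
The adaptive cancellation described above can be illustrated with a minimal sketch. The function name `track_mean_error`, the step size and the synthetic error sequence are assumptions chosen for illustration. Note that with equal up and down steps such a rule strictly tracks the median of the error, which coincides with the mean for the symmetric scatter assumed here.

```python
import random

def track_mean_error(errors, inhibition=0.0, step=0.05):
    """Return (inhibition trace, error-spike trace) for a sequence of trial errors."""
    trace, spikes = [], []
    for err in errors:
        # An error complex spike is emitted when excitation exceeds inhibition.
        spike = err > inhibition
        # Strengthen the inhibition after a spike, weaken it otherwise.
        inhibition += step if spike else -step
        inhibition = max(inhibition, 0.0)
        trace.append(inhibition)
        spikes.append(spike)
    return trace, spikes

rng = random.Random(0)
# Hypothetical errors: mean 5.0 with symmetric trial-to-trial scatter.
errors = [5.0 + rng.uniform(-1.0, 1.0) for _ in range(2000)]
trace, spikes = track_mean_error(errors)

late_inhibition = sum(trace[-500:]) / 500
late_spike_fraction = sum(spikes[-500:]) / 500
print(late_inhibition, late_spike_fraction)
```

Once the inhibition has settled near the average error, a spike occurs on roughly half of the trials and therefore reports only whether the current error lies above or below the running estimate.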

Figure 7

Adaptive tracking to cancel the mean error input to the inferior olive.

(A) The olive is assumed to receive an excitatory signal representing movement error $ℰ$ and an inhibitory input $ℐ$ from the nucleo-olivary neurones of the cerebellar nuclei. (B) The inputs to the inferior olive are represented in discrete time ( $τ$ )—each bar can be taken to represent a discrete movement realisation. The error (*blue*) varies about its average (*dashed blue line*) because perturbation complex spikes influence the movement and associated error randomly. The strength of the inhibition is shown by the *green* trace. When the excitatory error input exceeds the inhibition, an error complex spike is emitted (*bottom black trace*) and the inhibition is strengthened by plasticity, either directly or indirectly. In the converse situation and in the consequent absence of an error complex spike, the inhibition is weakened. In this way, the inhibition tracks the average error and the emission of an error complex spike signals an error exceeding the estimated average. Note that spontaneous perturbation complex spikes are omitted from this diagram.

https://doi.org/10.7554/eLife.31599.010

More than one plasticity mechanism could produce the desired cancellation of excitatory drive to the inferior olive. We outline two possibilities here, but it will be necessary in the implementation below to make a concrete if somewhat arbitrary choice; we shall make it on the basis of the available, circumstantial evidence.

The first possible mechanism would involve plasticity of the inhibitory synapses made by nucleo-olivary neurones in the inferior olive (Figure 1A,B). Perturbation and error complex spikes would be distinguished in an appropriate plasticity rule by the presence of excitatory synaptic input to the olive. This would offer a simple implementation, since plastic and cancelled inputs would be at neighbouring synapses (De Zeeuw et al., 1998); information about olivary spikes would also be directly available. However, the lack of published evidence and our own unsuccessful exploratory experiments led us to consider an alternative plasticity locus.

A second possible implementation for cancelling the average error signal would make the mossy fibre to nucleo-olivary neurone synapses plastic. The presence of an error complex spike would need to potentiate these inputs, thereby increasing inhibitory drive to the olive and tending to reduce the likelihood of future error complex spikes being emitted. Inversely, the absence of the error complex spike should depress the same synapses. Movement specificity could be conferred by applying the plasticity only to active mossy fibres, the patterns of which would differ between movements. This would enable movement-specific cancellation as long as the overlap between mossy fibre patterns was not too great.

How would information about the presence or absence of the error complex spike be supplied to the nucleo-olivary neurones? A direct connection between climbing fibre collaterals and nucleo-olivary neurones exists (De Zeeuw et al., 1997), but recordings of cerebellar neurones following stimulation of the olive suggest that this input is not strong, probably eliciting no more than a single spike per activation (Bengtsson et al., 2011). The function of this apparently weak input is unknown.

An alternative route to the cerebellar nuclear neurones for information about the error complex spike is via the Purkinje cells. Climbing fibres excite Purkinje cells which in turn inhibit cerebellar nuclear neurones, in which a strong inhibition can cause a distinctive rebound of firing (Llinás and Mühlethaler, 1988). It has been reported that peripheral stimulation of the climbing fibre receptive field, which might be expected to trigger the emission of error complex spikes, causes large IPSPs and an excitatory rebound in cerebellar nuclear neurones (Bengtsson et al., 2011). These synaptically induced climbing fibre–related inputs were stronger than spontaneously occurring IPSPs. In our conceptual framework, this could be interpreted as indicating that error complex spikes are stronger and/or arise in a greater number of olivary neurones than perturbation complex spikes. The two types of complex spike would therefore be distinguishable, at least in the cerebellar nuclei.

Plasticity of active mossy fibre inputs to cerebellar nuclear neurones has been reported that follows a rule similar to the one our implementation requires. Thus, mossy fibres that burst before a hyperpolarisation (possibly the result of an error complex spike) that triggers a rebound have their inputs potentiated (Pugh and Raman, 2008), while mossy fibres that burst without a succeeding hyperpolarisation and rebound are depressed (Zhang and Linden, 2006). It should be noted, however, that this plasticity was studied at the input to projection neurones and not at that to the nucleo-olivary neurones. Nevertheless, the existence of highly suitable plasticity rules in a cell type closely related to the nucleo-olivary neurones encouraged us to choose the cerebellar nuclei as the site of the plasticity that leads to cancellation of the excitatory input to the olive.

We now consider how synaptic integration in the olive leads to emission or not of error complex spikes. The nucleo-olivary synapses (in most olivary nuclei) display a remarkable degree of delayed and long-lasting release (Best and Regehr, 2009), suggesting that inhibition would build up during a command and thus be able to oppose the excitatory inputs signalling movement errors that appear some time after the command is transmitted. The error complex spike would therefore be produced (or not) after the command. On this basis, we shall make the simplifying assumption that the cerebellum generates a relatively brief motor control output or ‘command’, of the order of 100 ms or less and a single error calculation is performed after the end of that command. As for the saccade example previously mentioned, many movements can only be evaluated after completion. In effect, this corresponds to an offline learning rule.

Simulations

Above we outlined a mechanism for extracting the error change $δ ℰ$ ; it is based on adapting the inhibitory input to the inferior olive to cancel the average excitatory error input in a movement-specific manner. To verify that this mechanism could operate successfully in conjunction with the scheme for cortical plasticity already described, we turned to simulation.

A reduced model of a cerebellar microzone was developed and is described in detail in the Materials and methods. In overview, mossy fibre input patterns drove Purkinje and cerebellar nuclear neurones during commands composed of 10 discrete time bins. Purkinje cell activity was perturbed by randomly occurring complex spikes, which each increased the firing in a single time bin. The learning task was to adjust the output patterns of the nuclear projection neurones to randomly chosen targets. Cancellation of the average error was implemented by plasticity at the mossy fibre to nucleo-olivary neurone synapse while modifications of the mossy fibre pathway input to Purkinje cells reflected the rules for stochastic gradient descent described earlier. Synaptic weights were updated offline after each command. The global error was the sum of absolute differences between the projection neurone outputs and their target values. Error complex spikes were broadcast to all Purkinje cells when the error exceeded the integral of inhibitory input to the olive during the movement. There were thus 400 (40 projection neurones $\times$ 10 time bins) variables to optimise using a single error value.

The progress of the simulation is illustrated in Figure 8, in which two different movements were successfully optimised in parallel; only one is shown. The error can be seen to decrease throughout the simulation, indicating the progressive improvement of the learnt command. The effect of learning on the difference between the output and the target values can be seen in Figure 8C and D. The initial match is very poor because of random initial weight assignments, but it has become almost exact by the end of the simulation.

Figure 8

Simulated cerebellar learning by stochastic gradient descent with estimated global errors.

The total error ( $ℰ$ , *blue*) at the cerebellar nuclear output and the cancelling inhibition ( $ℐ$ , *green*) reaching the inferior olive are plotted as a function of trial number ( $τ$ ) in (A and B) for one of two interleaved patterns learnt in parallel. An approximately 10-fold reduction of error was obtained. It can be seen in A that the cancelling inhibition follows the error very closely over most of the learning time course. However, the zoom in B shows that there is no systematic reduction in error until the inhibition accurately cancels the mean error. (C) Initial firing profile of a typical cerebellar nuclear projection neurone ( $P N$ , *magenta*); the simulation represented 10 time bins with initially random frequency values per neurone, with a mean of 30 Hz. The target firing profile for the same neurone ( $R$ , *grey*) is also plotted. (D) At the end of the simulation, the firing profile closely matched the target.

https://doi.org/10.7554/eLife.31599.011

The optimisation only proceeds once the inhibitory input to the olive has accurately subtracted the average error. Thus, in Figure 8B it can be seen that the initial values of the inhibitory and excitatory (error) inputs to the olive differed. The inhibition tends towards the error. Until the two match, the overall error shows no systematic improvement. This confirms the need for accurate subtraction of the mean error to enable extraction of the error changes necessary to descend the error gradient. This simulation supports the feasibility of our proposed cerebellar implementation of stochastic gradient descent.

Algorithm convergence and capacity

The simulations above provide a proof of concept for the proposed mechanism of cerebellar learning. Nonetheless, even for the relatively simple network model in the simulations, it is by no means obvious how to determine the regions of parameter space in which the model converges to the desired outputs. It is also difficult to compare the algorithm’s performance with that of more classical ones. To address these issues, we abstract the core mechanism of stochastic gradient descent with estimated global errors in order to understand better the algorithm dynamics and to highlight the role of four key parameters. Analysis of this mechanism shows that this algorithm, even in this very reduced form, exhibits a variety of dynamical regimes, which we characterise. We then show how the key parameters and the different dynamical learning regimes directly appear in an analog perceptron description of the type considered in the previous section. We find that the algorithm’s storage capacity is similar to the optimal capacity of an analog perceptron.

Reduced model

In order to explore more exhaustively the convergence of the learning algorithm, we considered a reduced version with a single principal cell that nevertheless captures its essence (Figure 9A). A detailed analysis of this circuit is presented in Appendix 1, which we summarise and illustrate here.

Figure 9

Single-cell convergence dynamics in a reduced version of the algorithm.

(A) Simplified circuitry implementing stochastic gradient descent with estimated global errors. Combined Purkinje cells/projection neurones (P) provide the output $o$ . These cells receive an excitatory plastic input from mossy fibres ( $M$ ). Mossy fibres also convey a plastic excitatory input $𝒥$ to the nucleo-olivary neurones (NO), which receive inhibitory inputs from the Purkinje cells and supply the inhibitory input $ℐ = {[𝒥 - q P]}_{+}$ to the inferior olive (IO). The inferior olive receives an excitatory error input $ℰ$ . The olivary neurones emit spikes that are transmitted to the P-cells via the climbing fibre $c$ . (B) Effects of plasticity on the simplified system in the plane defined by the P-cell rate $P$ (with optimum $R$ , perturbation $A$ ) and excitatory drive to the nucleo-olivary neurones $𝒥$ . Plastic updates of type $Δ P Δ 𝒥$ (*blue arrows*) and $Δ 𝒥$ (*green arrows*) are shown. The updates change sign on the line C (*dashed blue*) and D (*solid green*), respectively. Lines $C_{\pm}$ and $D_{\pm}$ delimit the ‘convergence corridors’. The diagram is drawn for the case $q < 1$ , in which $A$ , the perturbation of $P$ is larger than the perturbation $q A$ of $ℐ$ . (C) Parameters $β$ and $ρ$ determine along which lines the system converges to $R$ . The *red dot* in the $D_{+} / D_{-}$ region shows the parameter values used in panels E and F and also corresponds to the effective parameters of the perceptron simulation of Figure 10. The other *red dots* show the parameters used in the learning examples displayed in Appendix 1—figure 1. (D) When $q > 1$ , $C_{+}$ and $D_{-}$ do not cross and learning does not converge to the desired rate. After going down the $C_{+} / D_{+}$ corridor, the point $(P, 𝒥)$ continues along the $C_{-} / D_{-}$ corridor without stopping close to the desired target rate $R$ . *Open circle*, start point; *filled circle*, end point. 
(E) Dynamics in the $(P, 𝒥)$ plane for $ρ = 0.2$ and $β = 2$ ( $A = 10, Δ P = 1, Δ 𝒥 = 2, q = 0.5$ ). Trajectories (*coloured lines*) start from different initial conditions (*open circles*). *Open circles*, start points; *filled circles*, end points. (F) Time courses of $P$ as a function of trial $τ$ for the trajectories in E (*same colours*). *Dashed lines*: predicted rate of convergence (Appendix 1). All trajectories end up fluctuating around $P = R - A (q + 1) / 2$ , which is $R - 7.5$ with the chosen parameters. *Open circles*, start points; *filled circles*, end points.

https://doi.org/10.7554/eLife.31599.012

Figure 10

Convergence and capacity for an analog perceptron.

(A) Convergence for a single learned pattern for stochastic gradient descent with estimated global errors (SGDEGE). Different lines (*thin grey lines*) correspond to distinct simulations. The linear decrease of the error ( $P - R$ ) predicted from the simplified model without zero-weight synapses is also shown (*dashed lines*). When learning increases the rate towards its target, the predicted convergence agrees well with the simulation. When learning decreases the rate towards its target, the predicted convergence is larger than observed because a fraction of synapses have zero weight. (B) Mean error vs. number of trials (always per pattern) for different numbers of patterns for SGDEGE with $q = 0.5$ . *Colours from black to red*: 50, 100, 200, 300, 350, 380, 385, 390, 395, 400, 410, 420, 450. (C) Mean error vs. number of trials when learning using the delta rule. The final error is zero when the number of patterns is below capacity (predicted to be 391 for these parameters), up to finite size effects. *Colours from black to blue*: same numbers of patterns as listed for B. (D) Mean error after $1 \times 10^{5}$ trials for the delta rule (*blue*) and for the SGDEGE with $q = 0$ (*cyan*), 0.25 (*green*), 0.5 (*red*), 0.75 (*magenta*), as a function of the number of patterns $p$ . The mean error diverges from its minimal value close to the theoretical capacity (*dashed line*) for both learning algorithms. (E) Dynamics of pattern learning in the $P - 𝒥$ or more precisely $(P - R) - (𝒥 - q P)$ plane below maximal capacity ( $p = 300$ ). The error ( $P - R$ ) corresponding to each pattern (*grey*) is reduced as it moves down its convergence corridor. The trajectory for a single specimen pattern is highlighted in *red*, while the endpoints of all trajectories are indicated by *salmon filled circles* but are all obscured at (0,0) by the *red filled circle* of the specimen trajectory. (F) Same as in E for a number of patterns above maximal capacity ( $p = 420$ ). 
After learning several patterns, rates remain far from their targets (*salmon filled circles*). The SGDEGE algorithm parameters used to generate this figure are $A = 2, Δ 𝒥 = 0.4, Δ P = 0.2, ρ = 0.2$ . The parameter $q = 0.5$ except in panel D where it is explicitly varied. The analog perceptron parameters are $P_{max} = 100, f = 0.2, N_{m} = 1000$ and the threshold $θ = 12.85$ (with $γ = 1$ ).

https://doi.org/10.7554/eLife.31599.013

We focus on the rate $P$ of a single P-cell (principal cell, in essence a combination of a Purkinje cell and a nuclear projection neurone), without considering how the firing is driven by synaptic inputs. The other variable of the model is the strength $𝒥$ of the plastic excitatory input from mossy fibres to nucleo-olivary neurones (Figure 9A). The learning task is to bring the firing rate $P$ of the P-cell to a desired target rate $R$ , guided by an estimation of the current error relying on $𝒥$ , which is refined concurrently with $P$ from trial to trial.

The error $ℰ$ is determined by the difference between $P$ and $R$ ; we choose the absolute difference $ℰ = | P - R |$ . In the presence of a perturbation $A$ , occurring with probability $ρ$ , the rate becomes $P + A$ and so $ℰ = | P + A - R |$ . The current estimate of the error is measured by the strength $ℐ$ of the inhibition on the olivary neurone associated with the P-cell. It assumes the value $ℐ = {[𝒥 - q P]}_{+}$ (the brackets ${[X]}_{+}$ denoting the rectified value of X, ${[X]}_{+} = X$ if $X \geq 0$ and $0$ otherwise); $q$ represents the strength of P-cell (in reality Purkinje cell) synapses on nucleo-olivary neurones. The inhibition of the inferior olive (IO) arises from the discharge of the nucleo-olivary neurones (NO) induced by the mossy fibre inputs, the $𝒥$ component of $ℐ$ . The NOs are themselves inhibited by the P-cell which is accounted for by the $- q P$ component of $ℐ$ (Figure 9A). Note that in the presence of a perturbation of the P-cell, the estimate of the error is itself modified to ${[𝒥 - q P - q A]}_{+}$ to account for the decrease (for $q > 0$ ) of the NO discharge arising from the P-cell firing rate increase.

The system learning dynamics can be analysed in the $P$ - $𝒥$ plane (Figure 9B). After each ‘movement’ realisation, the two system variables, $P$ and $𝒥$ , are displaced within the $P$ - $𝒥$ plane by plasticity according to the algorithm. In the presence of a perturbation, $P$ is decreased (a leftwards displacement) if $ℰ > {[𝒥 - q P - q A]}_{+}$ and otherwise increased. However, $P$ remains unchanged if there is no perturbation. Conversely, $𝒥$ is always updated irrespective of the presence or absence of a perturbation. If $ℰ > {[𝒥 - q P - q A]}_{+}$ in the presence of a perturbation or $ℰ > 𝒥 - q P$ in its absence, $𝒥$ is increased (upwards displacement) and it is decreased otherwise.
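
The update rule just described can be written as a short simulation. This is a minimal sketch rather than the code used for Figure 9: the function names, the target rate `R = 50`, the initial conditions and the random seed are assumptions, while the remaining parameter values are those quoted for Figure 9E.

```python
import random

def sgdege_trial(P, J, R, A, dP, dJ, q, rho, rng):
    """Apply one trial of the reduced-model update rule and return (P, J)."""
    perturbed = rng.random() < rho           # perturbation complex spike?
    rate = P + A if perturbed else P         # perturbed P-cell rate
    error = abs(rate - R)                    # E = |P - R| or |P + A - R|
    inhibition = max(J - q * rate, 0.0)      # I = [J - qP]_+ or [J - qP - qA]_+
    spike = error > inhibition               # error complex spike emitted
    if perturbed:                            # P changes only after a perturbation
        P += -dP if spike else dP
    J += dJ if spike else -dJ                # J is updated on every trial
    return P, J

def run(P0, J0, n_trials, seed=1, R=50.0, A=10.0, dP=1.0, dJ=2.0, q=0.5, rho=0.2):
    """Iterate the rule and return the history of P (R and seed are hypothetical)."""
    rng = random.Random(seed)
    P, J, history = P0, J0, []
    for _ in range(n_trials):
        P, J = sgdege_trial(P, J, R, A, dP, dJ, q, rho, rng)
        history.append(P)
    return history

history = run(P0=70.0, J0=0.0, n_trials=5000)
late_mean = sum(history[-2000:]) / 2000
# According to the analysis, P should end up fluctuating around
# R - A * (q + 1) / 2, i.e. 42.5 for these values.
print(late_mean)
```

Starting points above or below the target can be compared by changing `P0`; in both cases the late values of `P` are expected to settle near the same offset from `R`.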

Updates can therefore be described as $Δ P Δ 𝒥$ or $Δ 𝒥$ . The resultant directions vary between the different regions of the $(P, 𝒥)$ -plane (Figure 9B) delimited by the two borders $C$ and $D$ , defined as $𝒥 - q P - q A = | P - R + A |$ and $𝒥 - q P = | P - R |$ , respectively. The two half-lines bordering each sector are denoted by a plus or minus index according to their slopes.

Stochastic gradient descent is conveniently analysed with a phase-plane description, by following the values $(P, 𝒥)$ from one update to the next. The dynamics randomly alternate between updates of type $Δ P Δ 𝒥$ and $Δ 𝒥$ which leads $(P, 𝒥)$ to follow a biased random walk in the $(P, 𝒥)$ plane. Mathematical analysis (see Appendix 1) shows that the dynamics proceed in three successive phases, as observed in the simulations of Figure 8. First, the pair of values $(P, 𝒥)$ drifts from the initial condition towards one of the two ‘corridors’ between the lines $C_{+}$ and $D_{+}$ or between $C_{-}$ and $D_{-}$ (see Figure 9B). This first phase leads the estimated error $ℐ$ to approximate closely the error $ℰ$ , as seen in the full network simulation (Figure 8).

When a corridor is reached, in a second phase, $(P, 𝒥)$ follows a stochastic walk in the corridor with, under suitable conditions, a mean linear decrease of the error in time with bounded fluctuations. The precise line followed and the mean rate of error decrease depend on the initial conditions and on two parameters: $ρ$ , the probability of a perturbation occurring, and $β = Δ 𝒥 / Δ P$ , as indicated in Figure 9C. Typical trajectories and time courses for specific values of $ρ$ and $β$ and different initial conditions are shown in Figure 9D,E,F as well as in Appendix 1—figure 1.

Error decrease in this second phase requires certain restrictions upon the four key parameters— $ρ$ , $β$ , $Δ P$ , $Δ 𝒥$ . These restrictions are that $0 < ρ < 1$ , which ensures that updates are not restricted to a line, and $Δ 𝒥 < (1 - q) A$ as well as $Δ P + Δ 𝒥 / (1 + q) < A$ , which are sufficient to ensure that a trajectory will eventually enter a convergence corridor (large updates might always jump over the corridor).
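
For a given parameter set, these restrictions can be checked directly. The helper below is an illustrative sketch (the function name `corridor_entry_conditions` is ours); it evaluates the three inequalities stated above for the parameters of Figure 9E and of Figure 10.

```python
def corridor_entry_conditions(A, dP, dJ, q, rho):
    """Evaluate the three sufficient conditions quoted in the text."""
    return {
        "0 < rho < 1": 0.0 < rho < 1.0,
        "dJ < (1 - q) * A": dJ < (1.0 - q) * A,
        "dP + dJ / (1 + q) < A": dP + dJ / (1.0 + q) < A,
    }

# Parameter values quoted in the legends of Figure 9E and Figure 10.
fig9 = corridor_entry_conditions(A=10, dP=1, dJ=2, q=0.5, rho=0.2)
fig10 = corridor_entry_conditions(A=2, dP=0.2, dJ=0.4, q=0.5, rho=0.2)
print(all(fig9.values()), all(fig10.values()))
```

Both parameter sets satisfy all three conditions, whereas an update such as `dJ=6` with `A=10` and `q=0.5` would violate the second.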

In a final phase, $(P, 𝒥)$ fluctuates around the intersection of $C_{+}$ and $D_{-}$ , when it exists.

$C_{+}$ and $D_{-}$ do not cross for $q > 1$ , when the perturbation $q A$ of the estimated error $ℐ$ is larger than $A$ , the perturbation of the P-cell discharge. In this case, $P$ does not stop close to its target value, as illustrated in Figure 9D. For $q < 1$ , $C_{+}$ and $D_{-}$ intersect at $(R - A (q + 1) / 2, q R + (1 - q^{2}) A / 2)$ . The final error in the discharge rate fluctuates around $A (q + 1) / 2$ . Namely, the mean final error grows from $A / 2$ when $q = 0$ (vanishing inhibition of the nucleo-olivary discharge by the P-cell) to $A$ when $q = 1$ (maximal admissible inhibition of the nucleo-olivary discharge by the P-cell). The failed convergence for $q \geq 1$ could be expected, since in this scenario the perturbation corrupts the error estimate more than it influences the global error due to the perturbed P-cell firing rate.
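
The intersection point can be recovered from the definitions of the two borders. In the following worked derivation we assume that $C_{+}$ is the branch of $C$ with $P \geq R - A$ and that $D_{-}$ is the branch of $D$ with $P \leq R$ ; this labelling is our reading of the slope convention and should be checked against Appendix 1.

$$𝒥 = (1 + q) P + (1 + q) A - R \quad (C_{+},\ P \geq R - A)$$

$$𝒥 = R - (1 - q) P \quad (D_{-},\ P \leq R)$$

Equating the two expressions gives

$$2 P = 2 R - (1 + q) A \quad \Rightarrow \quad P^{*} = R - \frac{A (q + 1)}{2}$$

$$𝒥^{*} = R - (1 - q) P^{*} = q R + \frac{(1 - q^{2}) A}{2}$$

so that $| P^{*} - R | = A (q + 1) / 2$ . The solution lies on the assumed branch of $C$ only if $P^{*} \geq R - A$ , that is, $(q + 1) / 2 \leq 1$ or $q \leq 1$ , which is consistent with the absence of a crossing for $q > 1$ .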

Analog perceptron

In order to extend the above analysis to include multiple synaptic inputs and to explore the algorithm’s capacity to learn multiple movements, we considered an analog perceptron using the algorithm of stochastic gradient descent with estimated global errors. This allowed us to investigate the storage capacity attainable with the algorithm and to compare it to the theoretical maximal capacity for an analog perceptron (Clopath and Brunel, 2013).

The architecture was again that of Figure 9A; the details of the methods can be found in Appendix 1 and we summarise the conclusions here, with reference to Figure 10. The simulation included 1000 inputs (mossy fibres) to a P-cell; note that much larger numbers of inputs pertain in vivo (Napper and Harvey, 1988).

As found in the reduced model above, convergence of the algorithm requires the perturbation of the error estimate to be smaller than the effect of the P-cell perturbation on the true error, namely $q < 1$ . When this condition holds, the rate of learning a single pattern and the final error can be related to that predicted from the simplified analysis above (Figure 10A and in Appendix 1).

Learning using perturbations is slower than when using the full error information (i.e., the ‘delta rule’ (Dayan and Abbott, 2001); compare Figure 10B and C), for all numbers of patterns. The difference can be attributed to the different knowledge requirement of the two algorithms. When the error magnitude is known, which requires knowledge of the desired endpoint, large updates can be made for large errors, as done in the delta rule. In contrast, the proposed SGDEGE algorithm does not require knowledge of the absolute magnitude or sign of the error and thus proceeds with constant updating steps. Extensions of the SGDEGE algorithm that incorporate adaptive weight updates and could accelerate learning are mentioned in the Discussion.

The precision of learning is limited by the on-going perturbations to $A (1 + q) / 2$ (e.g. $A / 2$ for $q = 0$ or $0.75 A$ for $q = 0.5$ , as used in Figure 10B), but the final average error increases beyond this floor only very close to the theoretical capacity computed for the input-output association statistics (Figure 10D). The contribution of interfering patterns to the slowed learning is similar for the two algorithms. The behaviour of individual learning trajectories below and above maximal capacity is shown in Figure 10E and F, respectively.

These analyses establish the convergence mechanisms of stochastic gradient descent with estimated global errors and show that it can attain the maximum theoretical storage capacity up to a non-zero final error resulting from the perturbation. Learning is slower than when the full error information is used in the delta rule, but we argue that the availability of that information would not be biologically plausible for most complex movements.

Discussion

A cellular implementation of stochastic gradient descent

Analysis of the requirements and constraints for a general cerebellar learning algorithm highlighted the fact that the current consensus Marr-Albus-Ito model is only capable of learning simple reflex movements. Optimisation of complex, arbitrary movements, of which organisms are certainly capable and to which the cerebellum is widely believed to contribute, would require a different algorithm. We therefore sought to identify within the cerebellar system an implementation of stochastic gradient descent. This should comprise several elements: a source of perturbations, a mechanism for extracting the change of error, and a plasticity rule incorporating this information. We identified a strong constraint on any implementation, requiring each calculation to be made in the context of a single movement realisation. This arises from the potentially arbitrary sequencing of movements with different optima. We also sought a mechanism that only makes use of plausible cellular calculations: summation of excitation and inhibition in the presence of a threshold.

We suggest that the perturbation is provided by the complex spike, which has suitable properties: spontaneous irregular activity, an unambiguous sign during the action potential burst, salience at a cellular and network level, and the ability to influence synaptic plasticity. This choice of perturbation largely determines the predicted cerebellar cortical plasticity rules: only granule cell inputs active at the same time as a perturbation complex spike undergo plasticity, whose sign is determined by the absence (LTP) or presence (LTD) of a succeeding error complex spike. We have provided evidence that the synaptic plasticity rules do operate as predicted, in vitro under conditions designed to be more physiological than is customary.
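The predicted rule can be summarised as a small lookup, sketched here in Python. The function and enum names are illustrative assumptions and not part of the model's specification; the four protocol labels follow those used for the plasticity experiments (G for granule cell input, P for a perturbation complex spike, E for an error complex spike).

```python
from enum import Enum

class Plasticity(Enum):
    NONE = "no change"
    LTP = "potentiation"
    LTD = "depression"

def predicted_plasticity(granule_active: bool,
                         perturbation_cs: bool,
                         error_cs: bool) -> Plasticity:
    """Predicted outcome at one granule cell-Purkinje cell synapse."""
    # Only inputs active together with a perturbation complex spike are eligible.
    if not (granule_active and perturbation_cs):
        return Plasticity.NONE
    # A succeeding error complex spike signals that the perturbation made things worse.
    return Plasticity.LTD if error_cs else Plasticity.LTP

# The four induction protocols: (granule input, perturbation CS, error CS).
PROTOCOLS = {
    "G_ _": (True, False, False),
    "GP_": (True, True, False),
    "G_E": (True, False, True),
    "GPE": (True, True, True),
}

outcomes = {name: predicted_plasticity(*flags) for name, flags in PROTOCOLS.items()}
```

Written this way, it is apparent that an error complex spike alone (G_E) is predicted to leave the synapse unchanged, which is where the rule departs most clearly from the consensus view.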

An additional plasticity mechanism seems to be required to read off the change of error. The general mechanism we propose involves subtraction of the average error to expose the random variations caused by the perturbations of the movement. The subtraction results from adaptive tracking of the excitatory input to the olivary neurones by the inhibitory input from the nucleo-olivary neurones of the cerebellar nuclei. We chose to place the plasticity at the mossy fibre–nucleo-olivary neurone synapse, mostly because of the existence of suitable plasticity rules at the mossy fibre synapse onto the neighbouring projection neurones. However, plasticity in the olive at the nucleo-olivary input would probably be functionally equivalent and we do not intend to rule out this or alternative sites of the error-cancelling plasticity.
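The principle of adaptive cancellation can be sketched in a few lines of Python. This is a toy under stated assumptions: the excitatory error drive is modelled as a constant mean plus uniform noise standing in for perturbation-induced variations, and the inhibitory estimate is adjusted by a fixed step after every trial. The function name, step size and noise model are hypothetical and are not taken from the simulations.

```python
import random

def track_average_error(errors, step=0.02, estimate=0.0):
    """Track the mean of `errors` using only threshold comparisons.

    Returns the final inhibitory estimate and, for each trial, whether the
    net olivary drive was positive (an error complex spike in this toy).
    """
    spikes = []
    for err in errors:
        spike = err > estimate              # excitation exceeds cancelling inhibition
        spikes.append(spike)
        # Strengthen inhibition after a spike, weaken it otherwise.
        estimate += step if spike else -step
    return estimate, spikes

rng = random.Random(0)
mean_error = 1.0                            # illustrative average error
errors = [mean_error + rng.uniform(-0.1, 0.1) for _ in range(2000)]
estimate, spikes = track_average_error(errors)
late_spike_rate = sum(spikes[-1000:]) / 1000
```

Once the estimate has caught up with the average error, a spike is emitted on roughly half of the trials, so that its presence or absence reports only whether the error on that trial was above or below average.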

By simulating a simplified cerebellar network implementing this mechanism, we established the ability of our proposed mechanism to learn multiple arbitrary outputs, optimising 400 variables per movement with a single error value. More formal analysis of a simplified version of stochastic gradient descent with estimated global errors established convergence of the algorithm and allowed us to estimate its storage capacity.

Implications for studies of synaptic plasticity

The plasticity rules for parallel fibre–Purkinje cell synapses predicted by our algorithm appeared to be incompatible with the well-established consensus. However, under different, arguably more physiological, conditions, we were able to provide support for the four predicted outcomes.

We made several changes to the experimental conditions, only one of which is specific to the cerebellum. Thus, keeping synaptic inhibition intact has long been recognised as being of potential importance, with debates regarding its role in hippocampal LTP dating back decades (Wigström and Gustafsson, 1983a; Wigström and Gustafsson, 1983b; Arima-Yoshida et al., 2011). Very recent work also highlights the importance of inhibition in the induction of cerebellar plasticity (Rowan et al., 2018; Suvrathan and Raymond, 2018).

We also made use of a lower extracellular calcium concentration than those almost universally employed in studies of plasticity in vitro. In vivo measurements of the extracellular calcium concentration suggest that it does not exceed 1.5 mM in rodents, yet most studies use at least 2 mM. A 25% alteration of calcium concentration could plausibly change plasticity outcomes, given the numerous nonlinear calcium-dependent processes involved in synaptic transmission and plasticity (Nevian and Sakmann, 2006; Graupner and Brunel, 2007).

A major change of conditions we effected was cerebellum-specific. Nearly all studies of granule cell–Purkinje cell plasticity have employed stimulation of parallel fibres in the molecular layer. Such concentrated, synchronised input activity is unlikely to arise physiologically. Instead of this, we stimulated in the granule cell layer, a procedure expected to generate a more spatially dispersed input on the Purkinje cell, presumably leading to smaller dendritic depolarisations. Changing the stimulation method has been reported to prevent induction of LTD using standard protocols (Marcaggi and Attwell, 2007).

Although we cannot predict in detail the mechanistic alterations resulting from these changes of conditions, it is nevertheless likely that intracellular calcium concentrations during induction will be reduced, and most of the changes we observed can be interpreted in this light. It has long been suggested that high calcium concentrations during induction lead to LTD, while lower calcium concentrations generate LTP (Coesmans et al., 2004); we have recently modelled the induction of this plasticity, incorporating both calcium and nitric oxide signalling (Bouvier et al., 2016). Consistent with this viewpoint, a protocol that under standard conditions produces LTD (simultaneous activation of granule cells and climbing fibres) could plausibly produce LTP in the present conditions as a result of reduced intracellular calcium. Analogously, granule cell stimulation that alone produces LTP under standard conditions might elicit no change if calcium signalling were attenuated under our conditions.

Interestingly, LTP resulting from conjunctive granule cell and climbing fibre stimulation has been previously reported, in vitro (Mathy et al., 2009; Suvrathan et al., 2016) and in vivo (Wetmore et al., 2014). In contrast, our results do not fit well with several other studies of plasticity in vivo (Ito et al., 1982; Jörntell and Ekerot, 2002; Jörntell and Ekerot, 2003; Jörntell and Ekerot, 2011). However, in these studies quite intense stimulation of parallel and/or climbing fibre inputs was used, which may result in greater depolarisations and calcium entry than usually encountered. This difference could therefore account for the apparent discrepancy with the results we predict and have found in vitro.

It is unlikely that the interval between perturbation and error complex spikes would be fixed from trial to trial and it is certainly expected to vary with sensory modality (vision being slow). Our theoretical framework would therefore predict that a relatively wide range of intervals should be effective in inducing LTD through the GPE protocol. However, this prediction is untested and it remains possible that different intervals (potentially in different cerebellar regions) may lead to different plasticity outcomes, as suggested by the recent work of Suvrathan et al. (2016); this might also contribute to some of the variability of individual plasticity outcomes we observe.

Another open question is whether different relative timings of parallel and climbing fibre activity would result in different plasticity outcomes. In particular, one might hypothesise that parallel fibres active during a pause following a perturbation complex spike might display plasticity of the opposite sign to that reported here for synchrony with the complex spike itself.

In summary, while in vitro studies of plasticity are likely to reveal molecular mechanisms leading to potentiation and depression, the outcomes from given stimulation protocols may be very sensitive to the precise conditions, making it difficult to extrapolate to the in vivo setting, as we have shown here for the cerebellum. Similar arguments could apply to in vitro plasticity studies in other brain regions.

Current evidence regarding stochastic gradient descent

As mentioned in the introductory sections, the general cerebellar learning algorithm we propose here is not necessarily required in situations where movements are simple or constrained, admitting a fixed mapping between errors and corrective action. Furthermore, such movements constitute the near totality of well-studied models of cerebellar learning. Thus, the vestibulo-ocular reflex and saccade adaptation involve eye movements, which are naturally constrained, while the eyeblink is a stereotyped protective reflex. There is therefore a possibility that our mechanism does not operate in the cerebellar regions involved in oculomotor behaviour, even if it does operate elsewhere.

In addition, these ocular behaviours apparently display error functions that are incompatible with our assumptions. In particular, disturbance of a well optimised movement would be expected to increase error. However, it has been reported multiple times that climbing fibre activity can provide directional error information, including reductions of climbing fibre activity below baseline (e.g. Soetedjo et al., 2008). This argument is not totally conclusive, however. Firstly, we recall that the error is represented by the input to the inferior olive, not its output. It is thus possible that inputs from the nucleo-olivary neurones (or external inhibitory inputs) to the olive also have their activity modified by the disturbance of the movement, causing the reduction of climbing fibre activity. Secondly, what matters for our algorithm is the temporal sequence of perturbation and error complex spikes, but investigation of these second-order statistics of complex spike activity in relation to plasticity has, to our knowledge, not been reported. Similarly, it has been reported that learning and plasticity (LTD) occur in the absence of modulation of climbing fibre activity (Ke et al., 2009). Although this is difficult to reconcile with either the standard theory or our algorithm, it does not entirely rule out the existence of perturbation-error complex spike pairs that we predict lead to LTD.

Trial-to-trial plasticity correlated with the recent history of complex spikes has been demonstrated in oculomotor adaptation (Medina and Lisberger, 2008; Yang and Lisberger, 2014). This suggests that one way of testing whether our algorithm operates would be to examine whether the history of complex spike activity can predict future changes in the simple spike firing rate, according to the plasticity rules described above. For instance, two complex spikes occurring at a short interval should cause, at the time of the first, a decrease of simple spike firing in subsequent trials. However, the most complete datasets reported to date involve oculomotor control (Yang and Lisberger, 2014; Catz et al., 2005; Soetedjo et al., 2008; Ke et al., 2009), where, as mentioned above, our algorithm may not be necessary.

Beyond the predictions for the plasticity rules at parallel fibre–Purkinje cell synapses tested above, there are a number of aspects of our theory that do fit well with existing observations. The simple existence of spontaneous climbing fibre activity is one. Additional suggestive features concern the evolution of climbing fibre activity during eyeblink conditioning (Ohmae and Medina, 2015). Once conditioning has commenced, the probability of complex spikes in response to the unconditioned stimulus decreases, which would be consistent with the build-up of the inhibition cancelling the average error signal in the olive. Furthermore, omission of the unconditioned stimulus then causes a reduction in the probability of complex spikes below the baseline rate, strongly suggesting a specifically timed inhibitory signal has indeed developed at the time of the unconditioned stimulus (Kim et al., 1998).

We suggest that the cancellation of average error involves plasticity at mossy fibre–nucleo-olivary neurone synapses. To date no study has reported such plasticity, but the nucleo-olivary neurones have only rarely been studied. Plasticity at the mossy fibre synapses on projection neurones has been studied both in vitro (Pugh and Raman, 2006; Pugh and Raman, 2008; Zhang and Linden, 2006) and in vivo (Ohyama et al., 2006), but is not used in our proposed algorithm. Axonal remodelling and synaptogenesis of mossy fibres in the cerebellar nuclei may underlie this plasticity (Kleim et al., 2002; Boele et al., 2013; Lee et al., 2015) and could also contribute to the putative plasticity at mossy fibre synapses on nucleo-olivary neurones.

Finally, our theory of course predicts that perturbation complex spikes perturb ongoing movements. It is well established that climbing fibre activation can elicit movements (Barmack and Hess, 1980; Kimpo et al., 2014; Zucca et al., 2016), but it remains to be determined whether the movements triggered by spontaneous climbing fibre activity are perceptible. Stone and Lisberger (1986) reported the absence of complex-spike-triggered eye movements in the context of the vestibulo-ocular reflex. However, it is known that the visual system is very sensitive to retinal slip (Murakami, 2004), so it may be necessary to carry out high-resolution measurements and careful averaging to confirm or exclude the existence of perceptible movement perturbations.

Climbing fibre receptive fields and the bicycle problem

If the cerebellum is to contribute to the optimisation of complex movements, its output controlling any given muscle must be adjustable by learning involving multiple error signals. Thus, it may be useful to adjust an arm movement using vestibular error information as part of a balancing movement, but using touch information in catching a ball. However, it is currently unclear to what extent individual Purkinje cells can access different error signals.

There is an extensive literature characterising the modalities and receptive fields of climbing fibres. The great majority of reports are consistent with climbing fibres having fixed, specific modalities or very restricted receptive fields, with neighbouring fibres having similar properties (Garwicz et al., 1998; Jörntell et al., 1996). Examples would be a climbing fibre driven by retinal slip in a specific direction (Graf et al., 1988) or responding only to a small patch of skin (Garwicz et al., 2002). These receptive fields are quite stereotyped and have proven to be reliable landmarks in the functional regionalisation of the cerebellum; they are moreover tightly associated with the genetically specified zebrin patterning of the cerebellum (Schonewille et al., 2006b; Mostofi et al., 2010; Apps and Hawkes, 2009).

The apparently extreme specialisation of climbing fibres would limit the ability of the cerebellar circuitry to optimise complex movements. We can illustrate this with a human behaviour: riding a bicycle, which is often taken as an example of a typical cerebellar behaviour. This is an acquired skill for which there is little evolutionary precedent. It is likely to involve learning somewhat arbitrary arm movements in response to vestibular input (it is possible to ride a bike with one’s eyes closed). The error signals guiding learning could be vestibular, visual or possibly cutaneous/nociceptive (as a result of a fall), but not necessarily those related to the arm whose movement is learnt. How can such disparate or uncommon but sometimes essential error signals contribute to cerebellar control of the arm? We call this the ‘bicycle problem’.

At least two, non-exclusive solutions to this problem can be envisaged (see Figure 11). The first we term ‘output convergence’; it involves the convergence of multiple cerebellar regions receiving climbing fibres of different modalities onto each specific motor element (for instance a muscle) being controlled. Striking, if partial, evidence for this is found in a study by Ruigrok et al. (2008), who injected the retrograde, trans-synaptic tracer rabies virus into individual muscles. Multiple cerebellar zones were labelled, showing that they all contribute to the control of those muscles, as posited. What is currently less clear is whether such separate zones receive climbing fibre inputs with different modalities. We note that the output convergence solution to the bicycle problem implies that the synaptic changes in those regions receiving the appropriate error information must outweigh any drift in synaptic weights from those regions deprived of meaningful error information. Adaptation of the learning rate, an algorithmic extension we suggest below, could contribute to ensuring that useful synaptic changes dominate.

Figure 11

Diagram illustrating two possible solutions to the ‘bicycle problem’: how to use vestibular error information to guide learning of arm movements to ride a bicycle.

In the ‘output convergence’ solution, the outputs from cerebellar regions receiving different climbing fibre modalities converge onto a motor unit (represented by a muscle in the diagram). In the ‘error broadcast’ solution, error complex spikes are transmitted beyond their traditional receptive fields, either by divergent synaptic inputs and/or via the strong electrical coupling between inferior olivary neurones.

https://doi.org/10.7554/eLife.31599.014

We term the second solution to the bicycle problem the ‘error broadcast’ solution. According to this, error inputs to the olive are broadcast to olivary neurones (and Purkinje cells) outside the traditional receptive field. Although the weight of literature appears to be against this, there are both possible mechanisms and a small amount of supporting data for this suggestion. In terms of mechanism, the well-known electrical coupling of olivary neurones (Devor and Yarom, 2002) could recruit cells that do not directly receive suprathreshold synaptic input. This may occur much more frequently in vivo than in the quiescent/anesthetised conditions employed for most studies of climbing fibre receptive fields. Evidence for ‘broadcast’ of what we would term error complex spikes in vivo has been reported for visual and auditory stimuli (Mortimer, 1975; Ozden et al., 2012); these stimuli may be correlated with startle responses. Eye blink conditioning using a visual unconditioned stimulus has also been reported (Rogers et al., 1999).

The existence of broadcast error complex spikes would provide a mechanism explaining the giant IPSPs in the cerebellar nuclei elicited by peripheral stimulation (Bengtsson et al., 2011) and could also account for the correlation between visual smooth pursuit behaviour and the nature of individual complex spikes (Yang and Lisberger, 2014): behaviour could only correlate with a single cell if others were receiving the same input.

Possible extensions to the algorithm

Our implementation of cerebellar stochastic gradient descent and its simulation were purposefully kept as simple as possible, to provide a proof-of-concept with a minimum of assumptions and to simplify parallel analyses. It is likely that parts of the implementation will need to be altered and/or extended as further information becomes available.

Probably the most uncertain element of the implementation is the adaptive site in the cancellation of the average error. We chose to make the mossy fibre–nucleo-olivary neurone synapse plastic, but the plasticity could certainly operate in the olive instead of or in addition to the cerebellar nuclear site. Further studies of synaptic transmission and plasticity in both structures are clearly warranted in this context.

A simplification in our implementation is that it represents brief, discrete commands in what amounts to an offline learning rule. Error complex spikes are only emitted after the command and indeed were not simulated explicitly. This has the great advantage of avoiding the question of whether Purkinje cells that have not received a perturbation complex spike would interpret a broadcast error complex spike as a perturbation. A first remark is to stress the fact that the movement evaluation will often only occur after the movement is complete. Even if an eCS caused a small movement or some additional plasticity, that might be less of an issue outside of a movement context. Additionally, it is possible that cellular mechanisms could exist that would enable the Purkinje cell to distinguish the two types of input and therefore avoid spurious plasticity. The most obvious mechanism would be that, as already hinted at in the literature, error complex spikes are likely to be stronger. An extended synaptic plasticity rule could therefore include a case in which an error complex spike received in the absence of a recent perturbation spike has a neutral plasticity effect. There is currently little data on which to base a detailed implementation, although we note that in our experiments, bursts of 6 climbing fibre stimuli—a strong complex spike—resulted in only a modest LTP compared to the GP_ protocol.

A potentially unsatisfactory aspect of our simulations was the time taken to learn. Of the order of 50,000 iterations were required to optimise the 400 independent variables of the cerebellar output. Stochastic gradient descent is inherently slow, since just one or a few of those variables can be perturbed in one movement realisation, and the weight changes are furthermore individually small. Before considering possible acceleration methods, we note that some motor behaviours are repeated huge numbers of times. An obvious example is locomotion. Thus, public health campaigns in vogue in the USA at the time of writing aim for people to take 10,000 steps per day. So, clearly, a target of a few hundred thousand steps could be achieved in a matter of days or weeks.
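As a rough worked illustration of these orders of magnitude, under the simplifying assumptions that each step counts as one movement realisation and that 300,000 is a representative value for ‘a few hundred thousand’:

$$
\frac{50{,}000 \text{ iterations}}{10{,}000 \text{ steps/day}} = 5 \text{ days}, \qquad \frac{300{,}000 \text{ steps}}{10{,}000 \text{ steps/day}} = 30 \text{ days}.
$$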

Part of the slowness of learning results from the conflicting pressures on the plastic weight changes. Large changes allow rapid learning, but could prevent accurate optimisation. An obvious extension of our implementation that would resolve this conflict would be to allow large plastic changes far from the optimum but to reduce them as the optimum is approached. The information required to do this is available as the net drive (error excitation $-$ cancellation inhibition) to the olivary neurones at the time of emission of an error complex spike. If the drive is strong, one can imagine a long burst of action potentials being emitted. There is in vitro (Mathy et al., 2009) and in vivo (Yang and Lisberger, 2014; Rasmussen et al., 2013) evidence that climbing fibre burst length can influence plasticity in Purkinje cells. It seems possible that the same could be true in the cerebellar nuclei (or alternative plastic site in the subtraction of the average error). However, the above mechanism for adapting learning rates would only work directly in the LTD direction, since olivary cells cannot signal the strength of a net inhibition when no error complex spike is emitted.

A mechanism that could regulate the speed of learning in both LTP and LTD directions would be to target perturbations to the time points where they would be most useful—those with the greatest errors. This might be achieved by increasing the probability (and possibly the strength) of the perturbation complex spikes shortly before strongly unbalanced (excitatory or inhibitory) inputs to the olive. This process offers a possible interpretation for various observations of complex spikes occurring before error evaluation: related to movements (Bauswein et al., 1983; Kitazawa et al., 1998) or triggered by conditioned stimuli (Rasmussen et al., 2014; Ohmae and Medina, 2015).

Movement-specific adaptations of the learning rates could provide an explanation for the phenomenon of ‘savings’, according to which relearning a task after extinction occurs at a faster rate than the initial learning. The adaptations could plausibly be maintained during extinction and therefore persist until the relearning phase. These adaptations could appear to represent memories of previous errors (Herzfeld et al., 2014).

Finally, the output convergence solution we proposed above for the bicycle problem could also reflect a parallelisation strategy enabling the computations involved in stochastic gradient descent to be scaled from the small circuit we have simulated to the whole cerebellum. As mentioned above, this might involve one of the above schemes for adjusting learning rates in a way that would help plasticity in regions with ‘useful’ error information to dominate changes in those without.

Insight into learning in other brain regions

We believe that our proposed implementation of stochastic gradient descent offers possible insight into learning processes in other brain regions.

To date, the most compelling evidence for a stochastic gradient descent mechanism has been provided in the context of the acquisition of birdsong. A specific nucleus, the ‘LMAN’, has been shown to be responsible for song variability during learning and also to be required for learning (Doya and Sejnowski, 1988; Olveczky et al., 2005). Its established role is therefore analogous to our perturbation complex spike. Our suggestion that the same input signals both perturbation and error change may also apply in the birdsong context, where it would imply that LMAN also assumes the role of determining the sign of plasticity at the connections it perturbed. However, such an idea has for now not been examined and there is as yet a poor understanding of how the trial song is evaluated and of the mechanism for transmitting that information to the adaptive site; indeed the adaptive site itself has not been identified unequivocally.

We see a stronger potential analogy with our mechanism of stochastic gradient descent in the learning of reward-maximising action sequences by the basal ganglia. Under the stimulus of cortical inputs, ensembles of striatal medium spiny neurones become active, with the resulting activity partition determining the actions selected by disinhibition of central pattern generators through inhibition of the globus pallidus (Grillner et al., 2005). It is thought that the system learns to favour the actions that maximise the (discounted) reward, which is signalled by activity bursts in dopaminergic midbrain neurones and phasic release of dopamine, notably in the striatum itself (Schultz, 1986). This has been argued (Schultz et al., 1997) to reflect reinforcement learning or more specifically temporal difference learning (Sutton and Barto, 1998).

We note that temporal difference learning can be decomposed into two problems: linking actions to potential future rewards and a gradient ascent to maximise reward. In respect of the gradient ascent, we note that dopamine has a second, very well-known action in the striatum: it is necessary for the initiation of voluntary movements, since reduction of dopaminergic input to the striatum is the cause of Parkinson’s disease, in which volitional movement is severely impaired. The key point is to combine the two roles of the dopaminergic system in the initiation of movement and in signalling reward. The initiation of movement by dopamine, which could contribute to probabilistic action selection, would be considered analogous to our perturbation complex spike and could create an eligibility trace. A subsequent reward signal would result in plasticity of eligible synapses reinforcing the selection of that action, with possible sites of plasticity including the cortico-striatal synapses that were successful in exciting an ensemble of striatal neurones (Rui Costa made a similar suggestion at the 5th Colloquium of the Institut du Fer à Moulin, Paris, 2014). This would constitute a mechanism of gradient ascent analogous to that we have proposed for gradient descent in the cerebellar system.

Of particular interest is whether correct optimisation also involves a mechanism for subtracting the average error in order to extract gradient information. Such subtraction would be entirely consistent with the reports of midbrain dopaminergic neurones responding more weakly to expected rewards and responding with sub-baseline firing to omission of a predicted reward. These phenomena moreover appear to involve an adaptive inhibitory mechanism (Eshel et al., 2015). These observations could be interpreted as a subtraction of the average reward by a process analogous to that we propose for the extraction of the change of error $\delta \mathcal{E}$.

Conclusion

We have proposed a complete and plausible mechanism of stochastic gradient descent in the cerebellar system, in which the climbing fibre perturbs movements, creates an eligibility trace, signals error changes and guides plasticity at the sites of perturbation. We verify predicted plasticity rules that contradict the current consensus and highlight the importance of studying plasticity under physiological conditions. The gradient descent requires extraction of the change of error and we propose an adaptive inhibitory mechanism for doing this via cancellation of the average error. Our implementation of stochastic gradient descent suggests the operation of an analogous mechanism (of gradient ascent) in the basal ganglia initiated and rewarded by dopaminergic signalling.

Materials and methods

Electrophysiology

Animal experimentation methods were authorised by the ‘Charles Darwin N°5’ ethics committee (authorisation no. 4445). Adult female C57Bl/6 mice (2–5 months old) were anesthetised with isoflurane (Nicholas Piramal Ltd, India) and killed by decapitation. The cerebellum was rapidly dissected into cold solution containing (in mM): 230 sucrose, 26 NaHCO₃, 3 KCl, 0.8 CaCl₂, 8 MgCl₂, 1.25 NaH₂PO₄, 25 d-glucose supplemented with 50 μM d-APV to protect the tissue during slicing. Sagittal slices (300 μm) were cut in this solution using a Campden Instruments 7000 smz and stored at 32 °C in a standard extracellular saline solution containing (in mM): 125 NaCl, 2.5 KCl, 1.5 CaCl₂, 1.8 MgCl₂, 1.25 NaH₂PO₄, 26 NaHCO₃ and 25 d-glucose, bubbled with 95% O₂ and 5% CO₂ (pH 7.4). Slices were visualised using an upright microscope with a 40×, 0.8 NA water-immersion objective and infrared optics (illumination filter 750 ± 50 nm). The recording chamber was continuously perfused at a rate of 4–6 ml min⁻¹ with a solution containing (in mM): 125 NaCl, 2.5 KCl, 1.5 CaCl₂, 1.8 MgCl₂, 1.25 NaH₂PO₄, 26 NaHCO₃, 25 d-glucose and 10 tricine, a Zn²⁺ buffer (Paoletti et al., 1997), bubbled with 95% O₂ and 5% CO₂ (pH 7.4). Patch pipettes had resistances in the range 2–4 MΩ with the internal solutions given below. Unless otherwise stated, cells were voltage clamped at −70 mV in the whole-cell configuration. Voltages are reported without correction for the junction potential, which was about 10 mV (so true membrane potentials were more negative than we report). Series resistances were 4–10 MΩ and compensated with settings of ~90% in a Multiclamp 700B amplifier (Molecular Devices). Whole-cell recordings were filtered at 2 kHz and digitised at 10 kHz. Experiments were performed at 32–34 °C.

The internal solution contained (in mM): 128 K-gluconate, 10 HEPES, 4 KCl, 2.5 K₂HPO₄, 3.5 Mg-ATP, 0.4 Na₂-GTP, 0.5 l-(–)-malic acid, 0.008 oxaloacetic acid, 0.18 α-ketoglutaric acid, 0.2 pyridoxal 5’-phosphate, 5 l-alanine, 0.15 pyruvic acid, 15 l-glutamine, 4 l-asparagine, 1 reduced l-glutathione, 0.5 NAD⁺, 5 phosphocreatine, 1.9 CaCl₂, 1.5 MgCl₂, 0.1 K_3.8EGTA. Free [Ca²⁺] was calculated with Maxchelator (C. Patton, Stanford) to be 120 nM. Chemicals were purchased from Sigma-Aldrich, d-APV from Tocris.

Recordings were made in the vermis of lobules three to eight of the cerebellar cortex. Granule cell EPSCs were elicited with stimulation in the granule cell layer with a glass pipette of tip diameter 8–12 μm filled with HEPES-buffered saline. Climbing fibre electrodes had $\sim 2 μ$ m diameter and were also positioned in the granule cell layer. Images were taken every 5 min; experiments showing significant slice movement ( $>$ 20 $μ$ m) were discarded. Stimulation intensity was fixed at the beginning of the experiment (1–15 V; 50–200 μs) and maintained unchanged during the experiment.

Analysis

No formal power calculation to determine the sample sizes was performed for the plasticity experiments. Recordings were analysed from 75 cells in 55 animals. Slices were systematically changed after each recording. Animals supplied 1–3 cells to the analysis, usually with rotating induction protocols, but there was no formal randomisation or blocking and there was no blinding. An initial analysis focusing on between-protocol comparisons was performed when $n = 33$ cells were retained for analysis (criteria detailed below). Because the p-value for the GPE vs. G_E difference was close to a $p < 0.001$ threshold (Colquhoun, 2014) and the confidence interval for the GPE plasticity ratio was narrower than for the others, acquisition for that protocol was reduced and additional experiments prioritised the G_ _, GP_ and G_E protocols. Experiments were ultimately halted for external reasons, when $n = 58$ cells were retained. The two-stage analysis is a form of multiple comparison; in consequence, the output of the statistical test on the final data has been corrected by doubling the indicated p-values in Table 1.

Inspection of acquired climbing fibre responses revealed some failures of stimuli after the first in a fraction of cells, presumably because the second and subsequent stimuli at short intervals fell within the relative refractory period. As a complex spike was always produced these cells have been included in our analysis, but where individual data are displayed, we identify those cells in which failures of secondary climbing fibre stimuli were observed before the end of the induction period.

Analysis made use of a modular Python framework developed in house. Analysis of EPSC amplitudes for each cell began by averaging all of the EPSCs acquired to give a smooth time course. The time of the peak of this ‘global’ average response was determined. Subsequent measurement of amplitudes of other averages or of individual responses was performed by averaging the current over 0.5 ms centred on the time of the peak of the global average. The baseline calculated over 5 ms shortly before the stimulus artefact was subtracted to obtain the EPSC amplitude. Similar analyses were performed for both EPSCs of the paired-pulse stimulation.
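
The amplitude measurement described above can be illustrated with a minimal Python sketch. This is not the in-house framework: the function names, the 1 ms gap between the baseline window and the stimulus (`gap_ms`, standing in for 'shortly before'), and the assumption that EPSCs are inward (negative-going) are all hypothetical choices made for illustration. The 0.1 ms sample interval in the usage note corresponds to the 10 kHz digitisation.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def global_peak_ms(sweeps, dt_ms, stim_ms):
    """Time (ms) of the peak of the average of all sweeps, searched after the stimulus.

    Assumes inward (negative-going) EPSCs, so the peak is the minimum.
    """
    average = [mean(samples) for samples in zip(*sweeps)]
    start = int(round(stim_ms / dt_ms))
    peak_index = min(range(start, len(average)), key=lambda k: average[k])
    return peak_index * dt_ms


def epsc_amplitude(trace, dt_ms, peak_ms, stim_ms,
                   window_ms=0.5, baseline_ms=5.0, gap_ms=1.0):
    """Mean current in `window_ms` centred on `peak_ms`, minus the baseline.

    The baseline is the mean over `baseline_ms` ending `gap_ms` before the
    stimulus; `gap_ms` is a hypothetical value, not taken from the source.
    """
    half = window_ms / 2.0
    first = int(round((peak_ms - half) / dt_ms))
    last = int(round((peak_ms + half) / dt_ms))
    peak_current = mean(trace[first:last + 1])
    base_end = int(round((stim_ms - gap_ms) / dt_ms))
    base_start = int(round((stim_ms - gap_ms - baseline_ms) / dt_ms))
    return peak_current - mean(trace[base_start:base_end])
```

In use, `global_peak_ms(sweeps, 0.1, stim_ms)` would be called once per cell, and the returned time passed to `epsc_amplitude` for every individual sweep or sub-average of that cell.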

Individual EPSCs were excluded from further analysis if the baseline current in the sweep exceeded $-$ 1 nA at $-$ 70 mV. Similarly, the analysis automatically excluded EPSCs in sweeps in which the granule cell stimuli elicited an action potential in the Purkinje cell (possibly through antidromic stimulation of its axon or through capacitive coupling to the electrode). However, during induction, in current clamp, such spikes were accepted. For displaying time series, granule cell responses were averaged in bins of 2 min.

The effects of series resistance changes in the Purkinje cell were estimated by monitoring the transient current flowing in response to a voltage step. The amplitude of the current 2 ms after the beginning of the capacity transient was measured. We shall call this the ‘dendritic access’. Modelling of voltage-clamp EPSC recordings in a two-compartment model of a Purkinje cell (Llano et al., 1991) suggests that this measure is approximately proportional to EPSC amplitude as the series resistance changes over a reasonable range (not shown). It therefore offers a better estimate of the effect on the EPSC amplitude of series resistance changes than would the value of the series resistance (or conductance), which is far from proportional to EPSC amplitude. Intuitively, this can be seen to arise because the EPSC is filtered by the dendritic compartment and the measure relates to the dendritic component of the capacitive transient, whereas the series resistance relates to the somatic compartment. We therefore calculated $R_{res}$ , the ratio of the dendritic access after induction (when plasticity was assessed) to the value before induction, in order to predict the changes of the EPSC amplitude arising from changes of series resistance.

Because we elicited EPSCs using constant-voltage stimulation, variations of the resistance of the tip of the stimulating electrode (for instance if cells are drawn into it) could alter the stimulating current flow in the tissue. We monitored this by measuring the amplitude of the stimulus artefact. Specifically, we calculated $R_{stim}$ , the after/before ratio of the stimulation artefact amplitude.

We then used a robust linear model to examine the extent to which changes of series resistance or apparent stimulation strength could confound our measurements of plasticity, which we represented as the after/before ratio of EPSC amplitudes $R_{EPSC}$ ; the model (in R syntax) was:

R_{EPSC} \sim protocol + R_{res} + R_{stim}

This showed that series resistance changes, represented by $R_{res}$ , had a significant influence ( $t$ -value 2.30, 69 degrees of freedom) with a slope close to the predicted unity (1.14). In contrast, changes of the stimulus artefact had no predictive value (slope −0.003, $t$ -value −0.01).

Extending our modelling using a mixed-effect model (call 'lmer' in the lme4 R package) to include animals as a random effect did not indicate that it was necessary to take into account any between-animal variance, as this was reported to be zero. On this basis, we consider each cell recorded to be the biological replicate.

We did not wish to rely on the parametric significance tests of the linear model for comparing the plasticity protocols (although all of the comparisons we report below as significant were also significant in the model). Instead, we equalised the dendritic filtering and stimulation changes between groups by eliminating those cells in which $R_{res}$ or $R_{stim}$ differed by more than 20% from the mean values for all cells (0.94 $\pm$ 0.10 and 1.01 $\pm$ 0.19, respectively; mean $\pm$ sd, $n =$ 75). After this operation, which eliminated 17 cells out of 75 leaving 58 (from 47 animals), the mean ratios varied by only a few percent between groups (ranges 5 % and 2 % for $R_{res}$ and $R_{stim}$ , respectively) and would be expected to have only a minimal residual influence. Normalising the $R_{EPSC}$ s of the trimmed groups by $R_{res}$ did not alter the conclusions presented below. Note that after this trimming, the remaining changes of $R_{res}$ imply that all EPSC amplitudes after induction were underestimated by about 6 % relative to those at the beginning of the recording. The differences of $R_{EPSC}$ between induction protocols were evaluated statistically using two-tailed nonparametric tests implemented by the 'wilcox.test' command in R (R Core Team, 2013).
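
The trimming criterion can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis code actually used: cells are represented as dictionaries with hypothetical keys `R_res` and `R_stim`, and 'differed by more than 20% from the mean' is interpreted as a deviation exceeding 20 % of the all-cell mean.

```python
def trim_cells(cells, tolerance=0.20):
    """Keep cells whose R_res and R_stim both lie within `tolerance`
    (as a fraction of the all-cell mean) of the respective mean."""
    mean_res = sum(c["R_res"] for c in cells) / len(cells)
    mean_stim = sum(c["R_stim"] for c in cells) / len(cells)

    def within(value, centre):
        return abs(value - centre) <= tolerance * centre

    return [c for c in cells
            if within(c["R_res"], mean_res) and within(c["R_stim"], mean_stim)]
```

The means are computed once over all cells before any exclusion, which matches the description of comparing each cell with 'the mean values for all cells'.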

95 % confidence limits were calculated using R bootstrap functions (‘BCa’ method). For confidence limits of differences between means, stratified resampling was used.
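
Stratified resampling for a difference between means can be illustrated as follows. Note that this sketch deliberately substitutes the simpler percentile method for the ‘BCa’ method used in the analysis, and it is written in Python rather than R; the function name and defaults are hypothetical.

```python
import random


def bootstrap_diff_ci(group_a, group_b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(group_a) - mean(group_b).

    Resampling is stratified: each group is resampled with replacement
    separately, so group sizes are preserved in every replicate.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        a = [rng.choice(group_a) for _ in group_a]
        b = [rng.choice(group_b) for _ in group_b]
        diffs.append(sum(a) / len(a) - sum(b) / len(b))
    diffs.sort()
    lower = diffs[int((1.0 - level) / 2.0 * n_boot)]
    upper = diffs[int((1.0 + level) / 2.0 * n_boot) - 1]
    return lower, upper
```

The percentile interval can differ from the BCa interval when the bootstrap distribution is skewed, so this sketch should not be expected to reproduce the reported limits.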

A small additional series of experiments (16 cells from 16 animals, of which 12 were retained using criteria similar to those explained above, except that series resistance changes had to be evaluated using the current amplitude 1 ms into the response to the voltage step) was analysed separately and underlies the plasticity data in Figure 6—figure supplement 1.

Simulation methods

Network simulations were designed as follows (see also diagram in Figure 12). A total of $S \times L$ Purkinje cells were placed on a rectangular grid of extent $S$ in the sagittal plane and width $L$ in the lateral direction. The activity of each Purkinje cell during a ‘movement’ was characterised by its firing rate in $T$ time bins ${t = 1, \dots, T}$ .

{P C_{s, l} (t), s = 1, \dots, S, l = 1, \dots, L}

Purkinje cells contacted projection neurones ( $P N$ ) and nucleo-olivary neurones ( $N O$ ) in the cerebellar nuclei that contained $L$ cells of each type. The activities of both types of nuclear cell were also characterised by their firing rates in the different time bins, ${P N_{l} (t), l = 1, \dots, L}$ and ${N O_{l} (t), l = 1, \dots, L}$ . Mossy fibres, granule cells (parallel fibres) and molecular layer interneurons were subsumed into a single cell type $M$ (for mossy fibre) with $N$ cells restricted to each row of $L$ Purkinje cells, with a total of $N \times S$ mossy fibres. Mossy fibre activity was represented in a binary manner, $M_{i, s} (t) = 0$ or $1$ , with ${i = 1, \dots, N}$ and ${s = 1, \dots, S}$ .

Figure 12

Diagram of the simulated network model.

The details are explained in the text.

https://doi.org/10.7554/eLife.31599.015

The connectivity was chosen such that all Purkinje cells in the sagittal ‘column’ $l$ projected to the $l$ -th nuclear projection neurone with identical inhibitory (negative) weights, as well as to the $l$ -th nucleo-olivary neurone with potentially different, but also identical inhibitory weights. Mossy fibres were chosen to project to Purkinje cells with groups of $N$ fibres at a given sagittal position $s$ , contacting cells on the row of $L$ Purkinje cells at the sagittal position with probability 1/2. The activity of Purkinje and cerebellar nuclear cells in the absence of climbing fibre activity was thus

P C_{s, l} (t) = Φ (\sum_{i = 1, N} w_{i, s}^{l} σ_{i, s}^{l} M_{i, s} (t))

P N_{l} (t) = Φ (\sum_{i = 1, N; s = 1, S} u_{i, s}^{l} M_{i, s} (t) + u_{(P C \to P N)} \sum_{s = 1, S} P C_{s, l} (t))

N O_{l} (t) = Φ (\sum_{i = 1, N; s = 1, S} v_{i, s}^{l} M_{i, s} (t) + u_{(P C \to N O)} \sum_{s = 1, S} P C_{s, l} (t)),

where $σ_{i, s}^{l}$ enforces the 1/2 probability of connection between a Purkinje cell and a parallel fibre that traverses its dendritic tree: $σ_{i, s}^{l}$ is equal to $1$ or $0$ with probability 1/2, independently drawn for each triple $(i, s, l)$ . The f-I curve $Φ$ was taken to be a saturating threshold linear function, $Φ (x) = 0$ for $x < 0$ , $Φ (x) = x$ for $0 < x < r_{m a x}$ and $Φ (x) = r_{m a x}$ for $x > r_{m a x}$ . The weights $u_{i, s}^{l}$ were non-plastic and chosen such that a given mossy fibre $(i, s)$ contacted a single projection neurone and a single nucleo-olivary neurone among the $L$ possible ones of each type. In other words, for each mossy fibre index $(i, s)$ , a number $l_{(i, s)}$ was chosen at random with uniform probability among the $L$ numbers ${1, \dots, L}$ and the weights ${u_{i, s}^{l}, l = 1, \dots, L}$ were determined as

u_{i, s}^{l} = δ_{l, l_{(i, s)}} u_{(M \to P N)},

where $u_{(M \to P N)}$ was a constant. The weights $v_{i, s}^{l}$ and $w_{i, s}^{l}$ were plastic and followed the learning dynamics described below (see Equations 11 and 13).
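
As an illustration, the f-I curve $Φ$ and the one-target wiring of the non-plastic weights $u_{i, s}^{l}$ could be written as below. This is a minimal sketch, not the simulation code itself: the function names are hypothetical and indices are 0-based, whereas the text counts from 1.

```python
import random


def phi(x, r_max=300.0):
    """Saturating threshold-linear f-I curve: 0 below 0, x up to r_max, then r_max."""
    return min(max(x, 0.0), r_max)


def draw_nuclear_targets(n_fibres, n_positions, n_columns, rng):
    """Draw l_(i,s): one nuclear column, chosen uniformly, for each mossy fibre (i, s)."""
    return {(i, s): rng.randrange(n_columns)
            for i in range(n_fibres) for s in range(n_positions)}


def u_weight(i, s, l, targets, u_m_pn):
    """Non-plastic weight: u_m_pn if column l is the fibre's target, else 0."""
    return u_m_pn if targets[(i, s)] == l else 0.0
```

Summing `u_weight` over all columns `l` for a fixed fibre returns exactly `u_m_pn`, reflecting that each mossy fibre contacts a single projection neurone.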

The learning task itself consisted of producing, in response to $μ = 1, \dots, p$ spatiotemporal patterns of mossy fibre inputs $M_{i, s}^{μ} (t)$ , the corresponding output target rates of the projection neurones ${R_{l}^{μ} (t), l = 1, \dots, L, t = 1, \dots, T}$ . For each pattern $μ$ , the inputs were obtained by choosing at random with uniform probability $N S / 2$ active fibres. For each active fibre $(i, s)$ , a time bin $t_{(i, s)}^{μ}$ was chosen at random with uniform probability and the activity of the fibre was set to one in this time bin, $M_{(i, s)}^{μ} (t) = δ_{t, t_{(i, s)}^{μ}}$ . The activity was set to zero in all time bins for the $N S / 2$ inactive fibres. The target rates were independently chosen with uniform probability between 0 and $2 {\bar{r}}_{D}$ for each projection neurone in each pattern $μ$ , where ${\bar{r}}_{D}$ is the desired average firing rate for both projection and nucleo-olivary neurones in the cerebellar nuclei.
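
The construction of one input pattern and its targets can be sketched as follows. The function names are hypothetical, indices are 0-based, and drawing an independent target for every neurone and every time bin is an assumption of this sketch; the text does not state explicitly whether targets vary across time bins.

```python
import random


def make_pattern(n_fibres, n_positions, n_bins, rng):
    """Return M[(i, s)] as a list of n_bins binary activities.

    Half of the fibres (N*S/2) are active, each in a single random time bin;
    the other half are silent in all bins.
    """
    fibres = [(i, s) for i in range(n_fibres) for s in range(n_positions)]
    pattern = {fibre: [0] * n_bins for fibre in fibres}
    for fibre in rng.sample(fibres, len(fibres) // 2):
        pattern[fibre][rng.randrange(n_bins)] = 1
    return pattern


def make_targets(n_columns, n_bins, mean_rate, rng):
    """Target rates R[l][t], uniform between 0 and 2 * mean_rate."""
    return [[rng.uniform(0.0, 2.0 * mean_rate) for _ in range(n_bins)]
            for _ in range(n_columns)]
```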

The olivary neurones were not explicitly represented. It was assumed that the $L \times S$ Purkinje cells were contacted by $L$ climbing fibres with one climbing fibre contacting the $S$ Purkinje cells at a given lateral position.

The learning algorithm then proceeded as follows. Patterns ${M_{i, s}^{μ} (t)}$ were presented sequentially. After pattern $μ$ was chosen, perturbations of Purkinje cell firing by complex spikes were generated as follows. The probability that each climbing fibre emitted a perturbation complex spike was taken to be $ρ$ per pattern presentation; when a climbing fibre was active, it was considered to perturb the firing of its Purkinje cells in a single time bin chosen at random. Denoting by $η_{l} (t) = 1$ that climbing fibre $l$ had emitted a spike in time bin $t$ (and $η_{l} (t) = 0$ when there was no spike), the $S$ firing rates of the Purkinje cells at lateral position $l$ (see Equation 3) were taken to be

P C_{s, l} (t) = Φ (\sum_{i = 1, N} w_{i, s}^{l} σ_{i, s}^{l} M_{i, s} (t) + η_{l} (t) A), s = 1, \dots, S,

where $A$ defines the amplitude of the complex-spike perturbation of Purkinje cell firing.

Given the pattern $μ$ and the firing of Purkinje cells (Equation 6), the activities of cerebellar nuclear neurones were given by Equations 3 and 4. The current ‘error’ for pattern/movement $μ$ was quantified by the average distance of the projection neurones’ activity from their target rates

ℰ^{μ} = \frac{1}{L T} \sum_{l = 1, L; t = 1, T} | P N_{l}^{μ} (t) - R_{l}^{μ} (t) | .

The learning step after the presentation of pattern $μ$ was determined by the comparison (not explicitly implemented) in the olivary neurones between the excitation $ℰ^{μ}$ and the inhibition $ℐ^{μ}$ coming from the discharges of nucleo-olivary neurones with

ℐ^{μ} = \frac{1}{L T} \sum_{l = 1, L; t = 1, T} N O_{l}^{μ} (t) .

An error complex spike was propagated to all Purkinje cells after a ‘movement’ when the olivary activity $I O = ℰ^{μ} - ℐ^{μ}$ was positive. Accordingly, modifications of the weights of mossy fibre synapses on perturbed Purkinje cells ( $w$ ) and on nucleo-olivary neurones ( $v$ ) were determined after presentation of pattern $μ$ by the sign $c$ of $I O$ , $c = sign (I O)$ , as,

w_{i, s}^{l} \to w_{i, s}^{l} - α_{w} c \sum_{t = 1, T} η_{l} (t) M_{i, s}^{μ} (t)

v_{i, s}^{l} \to {[v_{i, s}^{l} + α_{v} c \sum_{t = 1, T} M_{i, s}^{μ} (t)]}_{+}

where the brackets served to enforce a positivity constraint on the weights $v_{i, s}^{l}$ ( ${[x]}_{+} = x, x > 0$ and ${[x]}_{+} = 0, x < 0$ ).
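
The complete learning step for one pattern presentation (perturbation, error evaluation, comparison with the inhibition, and weight updates) can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the names, the nested-list layout `w[i][s][l]`, the 0-based indices and the parameter dictionary `p` are hypothetical, and an olivary activity of exactly zero is treated as $c = - 1$, which is an assumption.

```python
import random


def phi(x, r_max=300.0):
    """Saturating threshold-linear f-I curve."""
    return min(max(x, 0.0), r_max)


def learning_step(w, v, sigma, u, M, R, p, rng):
    """Present one pattern M[i][s][t] with targets R[l][t]; update w and v in place.

    Returns (error, inhibition, c, eta).
    """
    N, S, L, T = p["N"], p["S"], p["L"], p["T"]
    # Each climbing fibre emits a perturbation spike with probability rho,
    # placed in a single randomly chosen time bin.
    eta = [[0] * T for _ in range(L)]
    for l in range(L):
        if rng.random() < p["rho"]:
            eta[l][rng.randrange(T)] = 1
    err = inh = 0.0
    for l in range(L):
        for t in range(T):
            # Summed (perturbed) Purkinje cell rates of sagittal column l.
            pc_sum = sum(
                phi(sum(w[i][s][l] * sigma[i][s][l] * M[i][s][t] for i in range(N))
                    + eta[l][t] * p["A"])
                for s in range(S))
            active = [(i, s) for i in range(N) for s in range(S) if M[i][s][t]]
            pn = phi(sum(u[i][s][l] for i, s in active) + p["u_pc_pn"] * pc_sum)
            no = phi(sum(v[i][s][l] for i, s in active) + p["u_pc_no"] * pc_sum)
            err += abs(pn - R[l][t]) / (L * T)   # global error
            inh += no / (L * T)                  # estimated error (inhibition)
    c = 1 if err - inh > 0 else -1               # sign of the olivary activity
    for i in range(N):
        for s in range(S):
            n_active = sum(M[i][s])
            for l in range(L):
                coincident = sum(eta[l][t] * M[i][s][t] for t in range(T))
                w[i][s][l] -= p["alpha_w"] * c * coincident
                v[i][s][l] = max(0.0, v[i][s][l] + p["alpha_v"] * c * n_active)
    return err, inh, c, eta
```

Only synapses active in the same time bin as a perturbation complex spike change their weight $w$ , whereas every active mossy fibre changes its weight $v$ onto the nucleo-olivary neurones.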

In the reported simulations, the non-plastic weights in Equations 3–5 were identical and constant for all synapses of a given type:

u_{(M \to P N)} = 4 T L {\bar{r}}_{D} / (N S)

u_{(P C \to P N)} = - {\bar{r}}_{D} / ({\bar{r}}_{P C} S)

u_{(P C \to N O)} = q u_{(P C \to P N)} = - q {\bar{r}}_{D} / ({\bar{r}}_{P C} S),

where $q$ describes the strength of the $P C \to N O$ connection relative to the $P C \to P N$ connection. The initial weights of the plastic synapses were drawn from uniform distributions such that initial firing rates were on average given by ${\bar{P C}}_{0} = {\bar{r}}_{P C}$ and ${\bar{N O}}_{0} =$ 15 Hz for Purkinje cells and nucleo-olivary neurones, respectively. The latter value was chosen such that the expected initial mismatch between inhibition $ℐ$ and excitation $ℰ$ for random pairings of initial $P N$ firing rates and target rates $R$ was of the order of 1 Hz. Consequently, for $M \to P C$ synapses $w_{i, s}^{l}$ was drawn from $[0, 8 T {\bar{r}}_{P C} / N]$ and for $M \to N O$ synapses, weights $v_{i, s}^{l}$ were drawn from $[0, ({\bar{N O}}_{0} + q {\bar{r}}_{D}) 4 T L / (N S)]$ . These weights furthermore ensured initial average firing rates close to ${\bar{r}}_{D}$ in the nuclear projection neurones. Target rates $R$ are uniformly distributed between 0 Hz and $2 {\bar{r}}_{D}$ .

The parameters used in the reported simulation (full and simplified) are provided in Table 2.

Table 2

Parameters used in the simulation shown in Figure 12.

https://doi.org/10.7554/eLife.31599.016

Parameter	Symbol	Value
Sagittal extent	$S$	10
Lateral extent	$L$	40
Time bins per movement	$T$	10
Mossy fibres per sagittal position	$N$	2000
Maximum firing rate (all neurones)	$r_{m a x}$	300 Hz
pCS probability per cell per movement	$ρ$	0.03
Amplitude of Purkinje cell firing perturbation	$A$	2 Hz
Learning rate of $M \to P C$ synapses	$α_{w}$	0.02
Learning rate of $M \to N O$ synapses	$α_{v}$	0.0002
Mean Purkinje cell firing rate	${\bar{r}}_{P C}$	50 Hz
Mean firing rate for nuclear neurones	${\bar{r}}_{D}$	30 Hz
$M \to P N$ synaptic weight	$u_{(M \to P N)}$	2.4
$P C \to P N$ synaptic weight	$u_{(P C \to P N)}$	$-$ 0.06
$P C \to N O$ relative synaptic weight	$q$	0.5
Weight increment of $M \to P C$ drift (simple model)	$β_{w}$	0.002
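
As a consistency check, the non-plastic weights listed in Table 2 and the upper bounds of the initial weight distributions can be recomputed from the formulas above. The variable names in this short Python sketch are hypothetical.

```python
# Values taken from Table 2 and from the text (initial NO rate of 15 Hz).
S, L, T, N = 10, 40, 10, 2000
r_D, r_PC, q, NO_0 = 30.0, 50.0, 0.5, 15.0

u_m_pn = 4 * T * L * r_D / (N * S)             # M -> PN weight
u_pc_pn = -r_D / (r_PC * S)                    # PC -> PN weight
u_pc_no = q * u_pc_pn                          # PC -> NO weight
w_max = 8 * T * r_PC / N                       # upper bound of initial M -> PC weights
v_max = (NO_0 + q * r_D) * 4 * T * L / (N * S) # upper bound of initial M -> NO weights

print(u_m_pn, u_pc_pn, u_pc_no, w_max, v_max)
```

The first two values evaluate to 2.4 and −0.06, in agreement with the corresponding rows of Table 2.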

In the following we describe the simplified version of the model without perturbations used to highlight shortcomings of the current Marr-Albus-Ito theory. $M$ , $P C$ , and $P N$ cell types are defined as above with identical synaptic connections and identical initial weights. The learning rules are different as no comparison between an estimated error (inhibition) and the global error (excitation) is made. We therefore do not consider the population of $N O$ cells in this version. Two methods of calculating the error were tested. In the first, the error conveyed by the climbing fibres contains information about the sign of the movement error and is given by

ℰ^{μ} = \frac{1}{L T} \sum_{l = 1, L; t = 1, T} (R_{l}^{μ} (t) - P N_{l}^{μ} (t)) .

In the second, the absolute values according to Equation 7 were used.

The synaptic weight changes are as follows. Whenever the error is positive, the synaptic weights of active $M \to P C$ connections are decreased by an amount $α_{w}$ , while otherwise there is a slow positive drift of $M \to P C$ synaptic weights with increments $β_{w}$ :

w_{i, s}^{l} \to \begin{cases} w_{i, s}^{l} - α_{w} \sum_{t = 1, T} M_{i, s}^{μ} (t) & \text{if } ℰ^{μ} > 0 \\ w_{i, s}^{l} + β_{w} & \text{otherwise.} \end{cases}
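
The simplified rule can be sketched in Python as follows. The function names and the nested-list layout `w[i][s][l]`, `M[i][s][t]` are hypothetical; the default increments are the values of $α_{w}$ and $β_{w}$ in Table 2.

```python
def signed_error(pn, target):
    """Mean of (target rate - PN rate) over neurones l and time bins t."""
    n = sum(len(row) for row in pn)
    return sum(r - x
               for row_x, row_r in zip(pn, target)
               for x, r in zip(row_x, row_r)) / n


def simplified_update(w, M, error, alpha_w=0.02, beta_w=0.002):
    """Update w[i][s][l] in place without perturbations.

    Positive error: depress each synapse in proportion to its fibre's activity.
    Otherwise: apply a slow positive drift to every synapse.
    """
    for i, rows in enumerate(w):
        for s, synapses in enumerate(rows):
            n_active = sum(M[i][s])
            for l in range(len(synapses)):
                if error > 0:
                    synapses[l] -= alpha_w * n_active
                else:
                    synapses[l] += beta_w
```

Because the same error is broadcast to all Purkinje cells, every synapse of an active fibre is changed in the same direction regardless of its individual contribution to the error.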

In order to show relaxation of the average of $P N$ firing rates to the average of the target rates $R$ , we chose different initial distributions of uniformly distributed target rates with averages between 10 Hz and 50 Hz.

We show in the main text that the four key parameters governing convergence of our learning algorithm are the learning rates of mossy fibre–Purkinje cell synapses $α_{w}$ and mossy fibre–nucleo-olivary neurone synapses $α_{v}$ , as well as the probability of a perturbation complex spike occurring in a given movement in a given cell $ρ,$ and the resulting amplitude of the perturbation of Purkinje cell firing $A$ . Varying each of these $\pm$ 10 % individually altered the final error (1.4 Hz, averaged over trials 55,000–60,000) by at most 7%, indicating that this final output was not ill-conditioned or finely tuned with respect to these parameters.

The simulation was coded in Python.

Appendix 1

Mathematical appendix

Reduced model

The model focuses on the rate $P (τ)$ of one P-cell. Learning a pattern (movement) requires adjustment of the $P (τ)$ rate to the target value $R$ . The movement error is defined as

ℰ (τ) = | P (τ) - R |

Note that here $τ$ represents trial/learning step number rather than temporal variations within a single movement realisation. In this simplified model, the P-cell activity during a movement is characterised by a single value rather than a sequence of values in time.

The other variable of the model besides $P (τ)$ is $𝒥 (τ)$ . It is used to quantify the strength

ℐ (τ) = {[𝒥 (τ) - q P (τ)]}_{+}

of the $(N O \to I O)$ inhibitory input which provides the current value of the estimated global error. The two terms in the expression of the current are meant to represent the excitatory action of the MF inputs on the discharge of the $N O$ $(𝒥 (τ))$ and the inhibitory effects of the Purkinje cell discharge ( $- q P (τ)$ ). Learning steps consist of updates of the values $P (τ)$ and $𝒥 (τ)$ . They are of two kinds depending on whether the P-cell rate is perturbed or not.

Updates of type $Δ P Δ 𝒥$ proceed as follows. The firing rate of the P-cell is increased by $A > 0$ and becomes $P (τ) + A$ . The value of the global error corresponding to this perturbed firing rate is thus

ℰ_{p} = | P (τ) + A - R |

$P (τ)$ and $𝒥 (τ)$ are updated depending on whether $ℰ_{p}$ is smaller or greater than the current value of the estimated global error ${[𝒥 (τ) - q P (τ) - q A]}_{+}$ . The parameter $q > 0$ quantifies how much the $N O$ output is affected by the P-cell discharge, with the brackets indicating rectification (which imposes the constraint that the $N O$ output remains non-negative).

If $ℰ_{p} > {[𝒥 (τ) - q P (τ) - q A]}_{+}$ the perturbation is judged to have increased the error and therefore to have the wrong sign: the perturbed firing rate needs to be decreased. Concurrently, $𝒥$ needs to be increased since $ℐ$ is judged too low compared to the real value of the error. Thus $P (τ)$ and $𝒥 (τ)$ are changed to
$P (τ + 1) = {[P (τ) - Δ P]}_{+}$ $𝒥 (τ + 1) = 𝒥 (τ) + Δ 𝒥$

As above the brackets indicate rectification, which imposes the constraint that firing rates are non-negative.

If $ℰ_{p} < {[𝒥 (τ) - q P (τ) - q A]}_{+}$ , the converse reasoning leads to changes of $P (τ)$ and $𝒥 (τ)$ in the opposite directions
$P (τ + 1) = P (τ) + Δ P$ $𝒥 (τ + 1) = {[𝒥 (τ) - Δ 𝒥]}_{+}$

Updates of type $Δ 𝒥$ are performed as described above but without any perturbation ( $A = 0$ ) and without any update of $P$ (only the estimated error $ℐ (τ)$ is updated in updates of type $Δ 𝒥$ ). Namely,

If $ℰ > {[𝒥 (τ) - q P (τ)]}_{+}$ the estimated error is judged too low compared to the real value of the error and $𝒥$ needs to be increased. Thus $P (τ)$ is kept unchanged and $𝒥 (τ)$ is modified
$P (τ + 1) = P (τ)$ $𝒥 (τ + 1) = 𝒥 (τ) + Δ 𝒥$

If $ℰ < {[𝒥 (τ) - q P (τ)]}_{+}$ , the converse reasoning leads to a change of $𝒥 (τ)$ in the opposite direction
$P (τ + 1) = P (τ)$ $𝒥 (τ + 1) = {[𝒥 (τ) - Δ 𝒥]}_{+}$

The proposed core operation of the algorithm for one movement can thus be described as follows. At each time step, a perturbation occurs with probability $ρ$ . The occurrence of a perturbation gives rise to an update of type $Δ P Δ 𝒥$ , performed as described by Equations 19-22. For the complementary $(1 - ρ)$ fraction of time steps, no perturbation occurs $(A = 0)$ and updates of type $Δ 𝒥$ are performed, as described by Equations 23-26.
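
One step of the reduced model can be written compactly, as in the following sketch. The function names are hypothetical, and leaving $P$ and $𝒥$ unchanged when the error and its estimate are exactly equal is an assumption, since the text does not specify that case.

```python
import random


def rectify(x):
    """Rectification [x]_+ ."""
    return max(x, 0.0)


def reduced_step(P, J, R, A, dP, dJ, q, rho, rng):
    """One update of (P, J): type dP-dJ with probability rho, else type dJ."""
    perturbed = rng.random() < rho
    a = A if perturbed else 0.0
    error = abs(P + a - R)                  # (perturbed) global error
    estimate = rectify(J - q * P - q * a)   # estimated global error
    if error > estimate:
        # Error judged too high: decrease P if it was perturbed, raise J.
        if perturbed:
            P = rectify(P - dP)
        J = J + dJ
    elif error < estimate:
        # Error judged too low: increase P if it was perturbed, lower J.
        if perturbed:
            P = P + dP
        J = rectify(J - dJ)
    return P, J
```

Iterating `reduced_step` from an initial `(P, J)` and recording the successive values gives a trajectory in the $(P, 𝒥)$ plane of the kind analysed in the following section.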

This abstract model depends on five parameters: $A$ , the amplitude of the rate perturbation, $Δ P$ and $Δ 𝒥$ , the update amplitudes of the rate and error estimate, respectively, $q$ the strength of inhibition of the NO by the P-cell, and $ρ$ , which describes the probability of an update of type $Δ P Δ 𝒥$ (and the complementary probability $1 - ρ$ of $Δ 𝒥$ updates). The ratio of the update amplitudes plays an important role and is denoted by $β$ , $β = Δ 𝒥 / Δ P$ . The target firing rates are chosen in the range $[0, R_{m a x}]$ , which simply fixes the firing rate scale. We would like to determine the conditions on the five parameters $A, Δ P, Δ 𝒥, q$ and $ρ$ for the algorithm to ‘converge’. For simplicity, we here consider only constant perturbation and learning steps. Therefore, at best the rate and the error fluctuate around the target rate and zero error, respectively, in a bounded domain of size determined by the magnitude of the constant perturbation and learning steps. We say that the algorithm converges when this situation is reached. We would also like to understand how these parameters determine the rate of convergence and the residual error.

Convergence in the one-cell case

Stochastic gradient descent is conveniently analysed with a phase-plane description, by following the values $(P, 𝒥)$ from one update to the next in the $(P, 𝒥)$ plane. The dynamics randomly alternate between updates of type $Δ P Δ 𝒥$ and $Δ 𝒥$ , depending on whether the P-cell rate is perturbed by the CF or not. We consider these two types of update in turn.

For updates of type $Δ P Δ 𝒥$ , the update depends on whether the perturbed error $ℰ_{p} = | P - R + A |$ is larger or smaller than the estimated error ${[𝒥 - q P - q A]}_{+}$ . Namely, it depends on the location of the current $(P, 𝒥)$ with respect to the two lines in the $P$ - $𝒥$ plane, $C_{\pm}$ , ${(q \pm 1) P + (q \pm 1) A \mp R = 𝒥}$ , of slopes $q \pm 1$ (see Figure 9B). The dynamics of Equations 19-22 are such that each update moves the point $(P, 𝒥)$ by adding to it the vectorial increment $\pm (Δ P, - Δ 𝒥)$ with the $+$ sign holding in the quadrant above the lines $C_{\pm}$ and the minus sign elsewhere. These updates move the point $(P, 𝒥)$ towards the lines $C_{\pm}$ . In the triangular domain below the line $C_{-}$ , the update does not directly move the $(P, 𝒥)$ trajectory towards the $C_{-}$ line when $β < (1 - q)$ , because the update then has an angle greater than the inclination of the $C_{-}$ line. However, it reduces $P$ . When $P = 0$ below $C_{-}$ , the updates are strictly upward and towards $C_{-}$ .

Updates of type $Δ 𝒥$ depend on the location of the point $(P, 𝒥)$ with respect to the lines $D_{\pm}$ , ${(q \pm 1) P \mp R = 𝒥}$ , since the cell firing rate is not perturbed in these updates. In the quadrant above the lines $D_{\pm}$ , an update moves the point $(P, 𝒥)$ downwards by $Δ 𝒥$ , in other words, adds the vectorial increment $(0, - Δ 𝒥)$ . In the complementary domain of the $(P, 𝒥)$ plane, an update moves the point $(P, 𝒥)$ upwards by $Δ 𝒥$ , in other words, adds the opposite vectorial increment $(0, Δ 𝒥)$ . Both updates move $(P, 𝒥)$ towards the lines $D_{\pm}$ .

Learning proceeds by performing updates of type $Δ P Δ 𝒥$ and $Δ 𝒥$ with respective probabilities $ρ$ and $(1 - ρ)$ . Starting from an initial coordinate $(P, 𝒥)$ , three phases can be distinguished as described in the main text.

Above the lines $C_{+}$ and $D_{-}$ , the mixed updates lead the point $(P, 𝒥)$ to perform a random walk with a systematic mean rightward-downward drift per update equal to

ρ (Δ P, - Δ 𝒥) + (1 - ρ) (0, - Δ 𝒥) = (ρ Δ P, - Δ 𝒥),

for $(P, 𝒥)$ above $C_{+}$ and $D_{-}$ . Below the lines $C_{-}$ and $D_{+}$ , the updates are opposite and the mean drift per update is leftward-upward, equal to

(- ρ Δ P, Δ 𝒥),

for $(P, 𝒥)$ below $C_{-}$ and $D_{+}$ . Depending on its initial condition and the exact set of updates drawn, this leads $(P, 𝒥)$ to reach either one of the two ‘convergence corridors’, between the lines $C_{+}$ and $D_{+}$ , or between the lines $C_{-}$ and $D_{-}$ (see Figure 9B). In the triangular domain below the line $C_{-}$ , the mean leftward-upward drift has an angle greater than the inclination of the $C_{-}$ line for $ρ Δ P (1 - q) > Δ 𝒥$ (i.e. $β < (1 - q) ρ$ ). In this case, the mean drift does not ensure that the $(P, 𝒥)$ trajectory crosses the $C_{-}$ line. However, if $P$ becomes zero before crossing $C_{-}$ , the positivity constraint on the rate imposes that subsequent updates are strictly upward, until $(P, 𝒥)$ reaches $C_{-}$ . Examples of such trajectories can be seen in Appendix 1—figure 1C and D (red lines).

Appendix 1—figure 1

Reduced model: different cases of stochastic gradient learning are illustrated, for the parameters marked by *solid red circles* in Figure 9C, except for the case already shown in Figure 9E and F.

(A) Dynamics in the $(P, 𝒥)$ plane for $ρ = 0.5$ and $β = 2$ ( $A = 10, Δ P = 1, Δ 𝒥 = 2$ ). Trajectories (*solid lines*) from different initial conditions (*open circles*) are represented in the $(P, 𝒥)$ plane. The trajectories converge by oscillating around $C_{+}$ in the $C_{+} / D_{+}$ ‘corridor’ and around $D_{-}$ in the $C_{-} / D_{-}$ ‘corridor’. Trajectory endings are marked by *filled circles*. (B) Time courses of $P (τ)$ with corresponding colours. The slope of convergence (*dotted lines*) predicted by Equations 32 and 33 agrees well with the observed convergence rate for $C_{+}$ while it is less accurate for $D_{-}$ when the assumption that the ‘corridor’ width is much greater than $Δ P$ and $Δ 𝒥$ does not hold. (**C, D**) Same graphs for $ρ = 0.75$ . The trajectories converge by oscillating around $C_{+}$ in the $C_{+} / D_{+}$ ‘corridor’ and around $C_{-}$ in the $C_{-} / D_{-}$ ‘corridor’. (**E, F**) Same graphs for $ρ = 0.75$ and $β = 0.25$ (i.e. $A = 2, Δ P = 1, Δ 𝒥 = 0.25$ ). The trajectories converge by oscillating around $C_{+}$ , which is the only attractive line.

https://doi.org/10.7554/eLife.31599.019

When a corridor is reached, in a second phase, $(P, 𝒥)$ follows a stochastic walk in the corridor.

A sufficient condition for convergence is that a single update cannot cross the two boundary lines of a corridor at once. For the $(C_{+}, D_{+})$ corridor, the crossing by a single $Δ P Δ 𝒥$ update provides the most stringent requirement,

A > Δ 𝒥 / (q + 1) + Δ P = Δ 𝒥 [1 / (q + 1) + 1 / β] .

The crossing of the $(C_{-}, D_{-})$ corridor by a single $Δ 𝒥$ provides the other requirement

A (1 - q) > Δ 𝒥

When these conditions are met, alternation between the two types of updates produces a mean downward drift of the error. This downward drift controls the convergence rate and depends on the relative size of the perturbation and the discrete $Δ P$ and $Δ 𝒥$ modifications (in other words, the number of modifications that are needed to cross the convergence corridor). For simplicity, we consider the case where the perturbation $A$ is large compared to $Δ P$ and $Δ 𝒥$ , so that oscillations basically take place around one line of the corridor, as illustrated in Figure 9E and Appendix 1—figure 1. This amounts to being able to neglect the probability of crossing the other line of the corridor. The convergence rate then depends on the corridor line around which it takes place and can be obtained by noting that around a given corridor line, one of the two types of updates always has the same sign.
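
The two sufficient conditions stated above can be checked numerically for a given parameter set, as in the following sketch. The function name is hypothetical, and the value $q = 0.5$ used in the usage note is an assumption, since the value of $q$ for the illustrated trajectories is not given here.

```python
def sufficient_conditions(A, dP, dJ, q):
    """Return (plus_ok, minus_ok).

    plus_ok:  a single dP-dJ update cannot cross the (C+, D+) corridor,
              A > dJ / (q + 1) + dP.
    minus_ok: a single dJ update cannot cross the (C-, D-) corridor,
              A * (1 - q) > dJ.
    """
    plus_ok = A > dJ / (q + 1.0) + dP
    minus_ok = A * (1.0 - q) > dJ
    return plus_ok, minus_ok
```

For example, `sufficient_conditions(10, 1, 2, 0.5)` evaluates both conditions for $A = 10, Δ P = 1, Δ 𝒥 = 2$ , and both hold; reducing the perturbation to $A = 1$ with the same update amplitudes violates both.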

In the $(C_{+}, D_{+})$ corridor, performing type $Δ P Δ 𝒥$ updates for fraction $ρ$ of the steps and type $Δ 𝒥$ updates for the complementary fraction $(1 - ρ)$ leads to the average displacement per step,

ρ (- Δ P, Δ 𝒥) + (1 - ρ) (0, - Δ 𝒥) = (- ρ Δ P, (2 ρ - 1) Δ 𝒥)

Thus, as summarised in Figure 9C

For $ρ < β / [(q + 1) + 2 β]$ the average displacement leads to $D_{+}$ (Figure 9E). The average downward drift in the corridor can be obtained by noting that for a large $A$ , the $(P, 𝒥)$ trajectory has a negligibly small probability of crossing the $C_{+}$ line. Therefore, all the type $Δ P Δ 𝒥$ updates are of the same sign, of the form $(- Δ P, Δ 𝒥)$ , and chosen with probability $ρ$ . Since updates of type $Δ 𝒥$ do not change the value of $P$ , $P$ approaches its target rate with a mean speed per step $V_{c}$ ,
$D_{+} : V_{c} = - ρ Δ P$

A comparison between this computed drift and simulated trajectories is shown in Figure 9F.

For $ρ > β / [(q + 1) + 2 β]$ , the average displacement leads to $C_{+}$ (Appendix 1—figure 1A,C,E). In this case, all type $Δ 𝒥$ updates are of the same sign, of the form $(0, - Δ 𝒥)$ . In contrast, a fraction $f$ of type $Δ P Δ 𝒥$ updates are of the form $(- Δ P, Δ 𝒥)$ , while a fraction $(1 - f)$ is of the opposite form $(Δ P, - Δ 𝒥)$ with $f$ to be determined. The average drift is thus $(1 - ρ) (0, - Δ 𝒥) + ρ [f (- Δ P, Δ 𝒥) + (1 - f) (Δ P, - Δ 𝒥)] = (ρ (1 - 2 f) Δ P, [(ρ - 1) + ρ (2 f - 1)] Δ 𝒥)$ . Requiring that this drift has slope $(1 + q)$ , like $C_{+}$ , gives $2 f - 1 = β (1 - ρ) / [(1 + q + β) ρ]$ and $f = [β + ρ (1 + q)] / [2 ρ (1 + q + β)]$ (which indeed obeys $0 < f < 1$ in the parameter domain considered). Thus, $P$ approaches its target rate with a mean speed per step $V_{c}$ ,
$C_{+} : V_{c} = - \frac{1 - ρ}{1 + q + β} Δ 𝒥$

A comparison between this computed drift and simulated trajectories is shown in Appendix 1—figure 1B,D,F.
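The two regimes of the $(C_{+}, D_{+})$ corridor can be collected in a short numerical sketch. It is an illustration only, not the analysis code used for the figures: the function names are hypothetical, and it assumes $β = Δ 𝒥 / Δ P$ , the ratio for which the slope condition and the speed along $C_{+}$ given above are mutually consistent.

```python
def plus_corridor_drift(rho, delta_p, delta_j, q):
    """Return (line, v_c, f) for the (C+, D+) corridor.

    Assumption of this sketch: beta = delta_j / delta_p.
    f is the fraction of perturbed updates of the form (-dP, +dJ);
    it is 1 when the trajectory oscillates around D+.
    """
    beta = delta_j / delta_p
    rho_crit = beta / ((q + 1.0) + 2.0 * beta)
    if rho < rho_crit:
        # Around D+ every perturbed update is (-dP, +dJ).
        return "D+", -rho * delta_p, 1.0
    # Around C+ only a fraction f of perturbed updates is (-dP, +dJ).
    f = (beta + rho * (1.0 + q)) / (2.0 * rho * (1.0 + q + beta))
    v_c = -(1.0 - rho) * delta_j / (1.0 + q + beta)
    return "C+", v_c, f


def mean_step_on_c_plus(rho, f, delta_p, delta_j):
    """Average (P, J) displacement per step around C+ for a given f."""
    dp = rho * (1.0 - 2.0 * f) * delta_p
    dj = ((rho - 1.0) + rho * (2.0 * f - 1.0)) * delta_j
    return dp, dj
```

For instance, with $ρ = 0.5$ , $q = 0.5$ and $Δ P = Δ 𝒥 = 1$ , the sketch selects $C_{+}$ with $f = 0.7$ and $V_{c} = -0.2$ , and the mean step then has slope $1 + q = 1.5$ .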

In the $(C_{-}, D_{-})$ corridor, the average drift per step is opposite to the drift in the $(C_{+}, D_{+})$ corridor (Equation 31). Thus,

For $β > (1 - q)$ and $ρ > β / (2 β + q - 1)$ the average displacement leads to $C_{-}$ (Appendix 1—figure 1A). Updates of type $Δ 𝒥$ are always of the form $(0, Δ 𝒥)$ , while a fraction $f$ of type $Δ P Δ 𝒥$ updates are of the form $(Δ P, - Δ 𝒥)$ and a fraction $(1 - f)$ are of the opposite form $(- Δ P, Δ 𝒥)$ . Again, since the average drift is along $C_{-}$ of slope $q - 1$ , one obtains $2 f - 1 = β (1 - ρ) / [(β + q - 1) ρ]$ or $f = [β + ρ (q - 1)] / [2 ρ (β + q - 1)]$ (which obeys $0 < f < 1$ in the parameter domain considered). The convergence speed $V_{c}$ is thus,
$C_{-} : V_{c} = \frac{1 - ρ}{β + q - 1} Δ 𝒥$

A comparison between this computed drift and simulated trajectories is shown in Appendix 1—figure 1B.

For $β < 1 - q$ , or $\{β > 1 - q, ρ < β / (2 β + q - 1)\}$ , the complementary parameter domain, the drift in the corridor leads to $D_{-}$ (Appendix 1—figure 1A,C). The domain $β < (1 - q) ρ$ can be excluded, since when the point $(P, 𝒥)$ crosses $D_{-}$ , the drift in the upper quadrant $(C_{+}, D_{-})$ (Equation 27) tends to bring it to the other corridor $(C_{+}, D_{+})$ (Appendix 1—figure 1E). Near the line $D_{-}$ , type $Δ P Δ 𝒥$ updates are always of the form $(Δ P, - Δ 𝒥)$ . The convergence speed $V_{c}$ is thus
$D_{-} : V_{c} = ρ Δ P$

A comparison between this computed drift and simulated trajectories is shown in Figure 9F and Appendix 1—figure 1B.
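The $(C_{-}, D_{-})$ corridor can be sketched in the same way. Again this is only an illustration under stated assumptions: the function name is hypothetical, $β$ is taken to be $Δ 𝒥 / Δ P$ , the $D_{-}$ branch is simply the complement of the $C_{-}$ domain, and the excluded domain mentioned above is not treated separately.

```python
def minus_corridor_drift(rho, delta_p, delta_j, q):
    """Return (line, v_c, f) for the (C-, D-) corridor.

    Assumption of this sketch: beta = delta_j / delta_p.
    f is the fraction of perturbed updates of the form (+dP, -dJ);
    it is 1 when the trajectory oscillates around D-.
    """
    beta = delta_j / delta_p
    on_c_minus = beta > 1.0 - q and rho > beta / (2.0 * beta + q - 1.0)
    if on_c_minus:
        f = (beta + rho * (q - 1.0)) / (2.0 * rho * (beta + q - 1.0))
        v_c = (1.0 - rho) * delta_j / (beta + q - 1.0)
        return "C-", v_c, f
    # Complementary domain: every perturbed update is (+dP, -dJ).
    return "D-", rho * delta_p, 1.0
```

With $ρ = 0.8$ , $q = 0.5$ and $Δ P = Δ 𝒥 = 1$ , the sketch selects $C_{-}$ with $f = 0.75$ and $V_{c} = 0.4$ , which equals the mean $P$ displacement $ρ (2 f - 1) Δ P$ .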

Perceptron model and simulations

The architecture of the perceptron model is again the simplified circuit of Figure 9A, with $N_{M} = 1000$ mossy fibres projecting onto a single P-cell, with weights $w_{i}$ , $i = 1, \dots, N_{M}$ (which are all positive or zero). $N_{P}$ patterns are generated randomly. Activities of mossy fibres in different patterns are i.i.d. binary random variables $M_{i}^{μ}$ with coding level $f$ (i.e. $M_{i}^{μ} = 1$ with probability $f$ and $0$ with probability $1 - f$ ; in the present simulations $f = 0.2$ ). The P-cell desired rates for the $N_{P}$ patterns are i.i.d. uniform variables $R^{μ}$ from 0 to $P_{\mathrm{max}}$ (=100 Hz).

In the case of the stochastic gradient descent with estimated global errors algorithm, at each trial a pattern $μ$ is randomly drawn without replacement among the total $N_{P}$ and there is a probability $ρ$ of a perturbation of amplitude $A$ . The output of the P-cell is

$P^{μ} = {[\frac{1}{\sqrt{N_{M}}} (\sum_{i} w_{i} M_{i}^{μ} - θ N_{M})]}_{+} + η A$

where $η = 1$ with probability $ρ$ and $η = 0$ otherwise, which thus introduces a random perturbation of amplitude $A$ into the P-cell firing. The error is defined as $ℰ = | P^{μ} - R^{μ} |$ . Comparison with previously obtained results for the capacity of this analog perceptron (Clopath and Brunel, 2013) motivates our choice of weight normalisation and the parameterisation of the threshold as $θ = \frac{1}{2} P_{\mathrm{max}} (\sqrt{\frac{f}{3 (1 - f) γ}} - \frac{1}{\sqrt{N_{M}}})$ , where $γ$ is a composite parameter reflecting the statistics of input and output firing, which here is equal to one.
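The pattern statistics, threshold and P-cell output described above can be written as a minimal NumPy sketch. It is not the deposited simulation code: the function names, the use of `numpy.random.default_rng` and the choice of seed are assumptions made for illustration, and no particular initialisation of the weights is implied.

```python
import numpy as np

N_M, F, P_MAX, GAMMA = 1000, 0.2, 100.0, 1.0  # values quoted in the text


def threshold(n_m=N_M, f=F, p_max=P_MAX, gamma=GAMMA):
    """Threshold theta as parameterised in the text."""
    return 0.5 * p_max * (np.sqrt(f / (3.0 * (1.0 - f) * gamma))
                          - 1.0 / np.sqrt(n_m))


def make_patterns(n_p, rng, n_m=N_M, f=F, p_max=P_MAX):
    """Binary inputs with coding level f and uniform target rates."""
    m = (rng.random((n_p, n_m)) < f).astype(float)
    r = rng.uniform(0.0, p_max, size=n_p)
    return m, r


def pcell_rate(w, m_mu, theta, eta=0, amp=0.0):
    """Rectified, normalised drive plus the perturbation eta * amp."""
    n_m = w.size
    drive = (w @ m_mu - theta * n_m) / np.sqrt(n_m)
    return max(drive, 0.0) + eta * amp


rng = np.random.default_rng(0)  # hypothetical seed
patterns, targets = make_patterns(5, rng)
```

With the quoted values ( $f = 0.2$ , $γ = 1$ , $N_{M} = 1000$ , $P_{\mathrm{max}} = 100$ Hz) the threshold evaluates to $θ \approx 12.85$ .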

An inferior olivary neurone receives the excitatory error signal but it also receives inhibitory inputs from the nucleo-olivary neurones driven by the mossy fibre inputs (which we have denoted $M$ above), with weights $v_{i}$ . These are also plastic and sign-constrained. They represent the current estimated error. The net drive of the inferior olivary neurone is

$\mathrm{IO} = ℰ - ℐ, \quad \text{with} \quad ℐ = {[\frac{1}{\sqrt{N_{M}}} (\sum_{i} v_{i} M_{i}^{μ} - θ N_{M}) - q P^{μ}]}_{+} .$

Here, the term $q P^{μ}$ represents the specific inhibition of nucleo-olivary neurones by Purkinje cells and the term proportional to $θ$ represents non-specific inhibition. In other words, $\frac{1}{\sqrt{N_{M}}} (\sum_{i} v_{i} M_{i}^{μ} - θ N_{M})$ corresponds to $𝒥$ in Equation 17. For simplicity we assume the non-specific inhibition to be constant and the value of $θ$ equal to that used in Equation 36.

The climbing fibre signal controlling plasticity is

$c = \operatorname{sign}(\mathrm{IO})$

Weights are changed according to the following rule. Weights of $M \to P$ synapses active simultaneously with a perturbation are increased if $c$ is negative and decreased if it is positive. Weights of $M \to N O$ synapses are increased if $c$ is positive and decreased if not. Thus

$w_{i} = {[w_{i} - α_{w} c η M_{i}^{μ}]}_{+}$

$v_{i} = {[v_{i} + α_{v} c M_{i}^{μ}]}_{+}$

with the brackets indicating rectification (to impose the excitatory constraint).
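A single trial of this learning rule can be sketched as follows. This is a hedged illustration rather than the deposited code: the function name and argument order are hypothetical, and `numpy.sign` returns 0 when the olivary drive is exactly zero, a case the text does not specify.

```python
import numpy as np


def sgdege_step(w, v, m_mu, r_mu, theta, q, rho, amp, alpha_w, alpha_v, rng):
    """One trial of the perturbation-based rule for pattern m_mu.

    Returns the updated (w, v), the climbing fibre sign c and eta.
    """
    n_m = w.size
    # Perturbation of amplitude amp occurs with probability rho.
    eta = 1.0 if rng.random() < rho else 0.0
    p = max((w @ m_mu - theta * n_m) / np.sqrt(n_m), 0.0) + eta * amp
    err = abs(p - r_mu)  # global error
    # Inhibitory drive to the olive, reduced by P-cell inhibition q * p.
    inh = max((v @ m_mu - theta * n_m) / np.sqrt(n_m) - q * p, 0.0)
    c = np.sign(err - inh)  # climbing fibre signal
    # M->P weights change only on perturbed trials; both are rectified.
    w_new = np.maximum(w - alpha_w * c * eta * m_mu, 0.0)
    v_new = np.maximum(v + alpha_v * c * m_mu, 0.0)
    return w_new, v_new, c, eta
```

On a perturbed trial with $c = 1$ , the active $M \to P$ weights decrease by $α_{w}$ and the active $M \to N O$ weights increase by $α_{v}$ ; inactive synapses are unchanged.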

The parameters of the simplified model of the previous subsection can be written as a function of those controlling the learning process in the present analog perceptron. The probability $ρ$ of the two types of updates (with and without perturbation) and the amplitude of perturbation $A$ are clearly identical in both models. In order to relate the previous $Δ P$ and $Δ 𝒥$ to the present amplitude change of the weights $α_{w}$ at mossy fibre–P-cell inputs and $α_{v}$ for the indirect drive to the inferior olive from mossy fibres, we neglect the rectification constraints in Equations 38 and 39, which is valid as long as synaptic weights are not very small (i.e. not comparable to $α_{w}$ or $α_{v}$ ; see below for further discussion). Therefore, the weight modifications result in the changes $Δ P$ , of the perceptron firing rate, and $Δ 𝒥$ , of the inhibitory input $ℐ$ to the olive,

$Δ P = \frac{1}{\sqrt{N_{M}}} \sum_{i} α_{w} M_{i}^{μ} = α_{w} f \sqrt{N_{M}}$

$Δ 𝒥 = \frac{1}{\sqrt{N_{M}}} \sum_{i} α_{v} M_{i}^{μ} = α_{v} f \sqrt{N_{M}}$
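The second equality in each line uses the fact that a pattern has on average $f N_{M}$ active mossy fibres. A small sketch makes the conversion from weight increments to rate increments explicit; the function names are hypothetical and the rectification is neglected, as in the text.

```python
import numpy as np


def step_sizes(alpha_w, alpha_v, f=0.2, n_m=1000):
    """Mean increments (delta_p, delta_j) for given weight increments."""
    scale = f * np.sqrt(n_m)
    return alpha_w * scale, alpha_v * scale


def realised_step(alpha, m_mu):
    """Increment produced by one specific binary pattern m_mu."""
    return alpha * m_mu.sum() / np.sqrt(m_mu.size)
```

With $f = 0.2$ and $N_{M} = 1000$ the conversion factor is $f \sqrt{N_{M}} \approx 6.32$ , so that $Δ P \approx 6.32 \, α_{w}$ ; individual patterns deviate from this mean because the number of active inputs fluctuates.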

The convergence rate estimates of the previous section are compared with direct stochastic gradient descent learning simulations of the analog perceptrons in Figure 10. As shown in Figure 10A, for many single patterns, the convergence rate agrees well with the estimate of Equation 33. However, for a fraction of the trajectories, the rate of convergence is about half of the prediction. This arises because these simulations constrained synaptic weights to be non-negative, which creates a significant fraction of synapses with negligible weights (Brunel et al., 2004) and produces a smaller effective step $Δ P$ than estimated without taking this positivity constraint into account. For a larger number of patterns below the maximal learning capacity, the convergence rate per pattern is slower by a factor of $\approx 1.5$ to $5$ for $N_{P}$ between $100$ and $350$ (Figure 10B).

The SGDEGE algorithm is compared to the usual delta rule in Figure 10. For the delta rule, the patterns are presented sequentially. When the pattern $μ$ is presented, the weights are changed according to the signed error $ℰ_{s}$ as,

$w_{i} = {[w_{i} - α_{w} M_{i}^{μ} ℰ_{s}]}_{+}$

The error is defined by the distance (positive or negative) of the P-cell firing rate to the target rate, $ℰ_{s} = P^{μ} - R^{μ}$ , and $P^{μ}$ is given as before by Equation 36 (with $A = 0$ ).
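For comparison, one presentation of a pattern under the delta rule can be sketched as below. The function name is hypothetical and the sketch is not the deposited code; it uses the unperturbed output ( $A = 0$ ) and the signed error defined above.

```python
import numpy as np


def delta_rule_step(w, m_mu, r_mu, theta, alpha_w):
    """One delta-rule update for pattern m_mu with target rate r_mu."""
    n_m = w.size
    # Unperturbed P-cell rate (A = 0).
    p = max((w @ m_mu - theta * n_m) / np.sqrt(n_m), 0.0)
    err_signed = p - r_mu
    # Update proportional to the signed error, then rectified.
    w_new = np.maximum(w - alpha_w * m_mu * err_signed, 0.0)
    return w_new, err_signed
```

Because the step scales with the signed error, the update shrinks as the rate approaches its target, in contrast to the fixed-amplitude steps of the perturbation-based rule.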

The convergence of the SGDEGE algorithm is considerably slower than that obtained for the delta rule (Figure 10C), while the relative slowing due to interference of multiple patterns is comparable for the delta rule and the proposed algorithm (compare panels B,C in Figure 10). The slower convergence rate of the SGDEGE algorithm arises from the use of update steps of fixed amplitude while the update is proportional to the error magnitude for the delta rule. A modified delta rule with constant amplitude updates would give a convergence rate comparable to the SGDEGE one.

Data availability

Source data, analysis/simulation scripts and software libraries have been deposited at the Zenodo repository.

The following data sets were generated

(2018) Zenodo
Cerebellar learning using perturbations: data, analysis/simulation scripts.

https://doi.org/10.5281/zenodo.1481929
(2018) Zenodo
Cerebellar learning using perturbations: software libraries.

https://doi.org/10.5281/zenodo.1481925

References

1. Albus JS
(1971) A theory of cerebellar function
Mathematical Biosciences 10:25–61.

https://doi.org/10.1016/0025-5564(71)90051-4
1. Apps R
2. Hawkes R
(2009) Cerebellar cortical organization: a one-map hypothesis
Nature Reviews Neuroscience 10:670–681.

https://doi.org/10.1038/nrn2698
(2011) The mechanisms of the strong inhibitory modulation of long-term potentiation in the rat dentate gyrus
European Journal of Neuroscience 33:1637–1646.

https://doi.org/10.1111/j.1460-9568.2011.07657.x
1. Barmack NH
2. Hess DT
(1980) Eye movements evoked by microstimulation of dorsal cap of inferior olive in the rabbit
Journal of Neurophysiology 43:165–181.

https://doi.org/10.1152/jn.1980.43.1.165
Conference
(1983) Neuronlike adaptive elements that can solve difficult learning control problems
IEEE Transactions on Systems, Man, and Cybernetics . pp. 834–846.

https://doi.org/10.1109/TSMC.1983.6313077
(1983) Simple and complex spike activity of cerebellar purkinje cells during active and passive movements in the awake monkey
The Journal of Physiology 339:379–394.

https://doi.org/10.1113/jphysiol.1983.sp014722
(2012a) Olivary subthreshold oscillations and burst activity revisited
Frontiers in Neural Circuits 6:91.

https://doi.org/10.3389/fncir.2012.00091
(2012b) Properties of the nucleo-olivary pathway: an in vivo whole-cell patch clamp study
PLOS ONE 7:e46360.

https://doi.org/10.1371/journal.pone.0046360
(2011) In vivo analysis of inhibitory synaptic inputs and rebounds in deep cerebellar nuclear neurons
PLOS ONE 6:e18822.

https://doi.org/10.1371/journal.pone.0018822
1. Best AR
2. Regehr WG
(2009) Inhibitory regulation of electrically coupled neurons in the inferior olive is mediated by asynchronous release of GABA
Neuron 62:555–565.

https://doi.org/10.1016/j.neuron.2009.04.018
(2004) The vestibulo-ocular reflex as a model system for motor learning: what is the role of the cerebellum?
The Cerebellum 3:188–192.

https://doi.org/10.1080/14734220410018120
(2013) Axonal sprouting and formation of terminals in the adult cerebellum during associative motor learning
Journal of Neuroscience 33:17897–17907.

https://doi.org/10.1523/JNEUROSCI.0511-13.2013
1. Bouvier G
2. Higgins D
3. Spolidoro M
4. Carrel D
5. Mathieu B
6. Léna C
7. Dieudonné S
8. Barbour B
9. Brunel N
10. Casado M
(2016) Burst-Dependent bidirectional plasticity in the cerebellum is driven by presynaptic NMDA receptors
Cell Reports 15:104–116.

https://doi.org/10.1016/j.celrep.2016.03.004
1. Brunel N
2. Hakim V
3. Isope P
4. Nadal JP
5. Barbour B
(2004) Optimal information storage and the distribution of synaptic weights: perceptron versus purkinje cell
Neuron 43:745–757.

https://doi.org/10.1016/j.neuron.2004.08.023
1. Caddy KWT
2. Biscoe TJ
(1976) The number of Purkinje cells and olive neurones in the normal and Lurcher mutant mouse
Brain Research 111:396–398.

https://doi.org/10.1016/0006-8993(76)90783-6
1. Campbell NC
2. Hesslow G
(1986) The secondary spikes of climbing fibre responses recorded from purkinje cell axons in cat cerebellum
The Journal of Physiology 377:225–235.

https://doi.org/10.1113/jphysiol.1986.sp016183
1. Catz N
2. Dicke PW
3. Thier P
(2005) Cerebellar complex spike firing is suitable to induce as well as to stabilize motor learning
Current Biology 15:2179–2189.

https://doi.org/10.1016/j.cub.2005.11.037
(2004) Integration of quanta in cerebellar granule cells during sensory processing
Nature 428:856–860.

https://doi.org/10.1038/nature02442
1. Clopath C
2. Brunel N
(2013) Optimal properties of analog perceptrons with excitatory weights
PLOS Computational Biology 9:e1002919.

https://doi.org/10.1371/journal.pcbi.1002919
(2004) Bidirectional parallel fiber plasticity in the cerebellum under climbing fiber control
Neuron 44:691–700.

https://doi.org/10.1016/j.neuron.2004.10.031
1. Colquhoun D
(2014) An investigation of the false discovery rate and the misinterpretation of p-values
Royal Society Open Science 1:140216.

https://doi.org/10.1098/rsos.140216
1. Crepel F
2. Jaillard D
(1991) Pairing of pre- and postsynaptic activities in cerebellar purkinje cells induces long-term changes in synaptic efficacy in vitro
The Journal of Physiology 432:123–141.

https://doi.org/10.1113/jphysiol.1991.sp018380
1. Dash S
2. Thier P
(2014) Cerebellum-dependent motor learning: lessons from adaptation of eye movements in primates
Progress in Brain Research 210:121–155.

https://doi.org/10.1016/B978-0-444-63356-9.00006-6
Book
1. Dayan P
2. Abbott LF
(2001)
Theoretical Neuroscience, 806

Cambridge, MA: MIT Press.
(1997)
Climbing fibre collaterals contact neurons in the cerebellar nuclei that provide a GABAergic feedback to the inferior olive

Neuroscience 80:981–986.
(1998) Microcircuitry and function of the inferior olive
Trends in Neurosciences 21:391–400.

https://doi.org/10.1016/S0166-2236(98)01310-1
(2002) Decorrelation control by the cerebellum achieves oculomotor plant compensation in simulated vestibulo-ocular reflex
Proceedings of the Royal Society B: Biological Sciences 269:1895–1904.

https://doi.org/10.1098/rspb.2002.2103
1. Dean P
2. Porrill J
(2014) Decorrelation learning in the cerebellum: computational analysis and experimental questions
Progress in Brain Research 210:157–192.

https://doi.org/10.1016/B978-0-444-63356-9.00007-8
1. Devor A
2. Yarom Y
(2002) Electrotonic coupling in the inferior olivary nucleus revealed by simultaneous double patch recordings
Journal of Neurophysiology 87:3048–3058.

https://doi.org/10.1152/jn.2002.87.6.3048
Book
1. Doya K
2. Sejnowski TJ
(1988)
A computational model of birdsong learning by auditory experience and auditory feedback

In: Brugge J, Poon P, editors. Central Auditory Processing and Neural Modeling. New York: Plenum Press. pp. 77–88.
(1966) The excitatory synaptic action of climbing fibres on the purkinje cells of the cerebellum
The Journal of Physiology 182:268–296.

https://doi.org/10.1113/jphysiol.1966.sp007824
Book
(1967)
The Cerebellum as a Neuronal Machine

Berlin Heidelberg: Springer.
(1995) Functional relation between corticonuclear input and movements evoked on microstimulation in cerebellar nucleus interpositus anterior in the cat
Experimental Brain Research 106:365–376.

https://doi.org/10.1007/BF00231060
1. Eshel N
2. Bukwich M
3. Rao V
4. Hemmelder V
5. Tian J
6. Uchida N
(2015) Arithmetic and local circuitry underlying dopamine prediction errors
Nature 525:243–246.

https://doi.org/10.1038/nature14855
1. Fiete IR
2. Fee MS
3. Seung HS
(2007) Model of birdsong learning based on gradient estimation by dynamic perturbation of neural conductances
Journal of Neurophysiology 98:2038–2057.

https://doi.org/10.1152/jn.01311.2006
(1998) Cutaneous receptive fields and topography of mossy fibres and climbing fibres projecting to cat cerebellar C3 zone
The Journal of Physiology 512:277–293.

https://doi.org/10.1111/j.1469-7793.1998.277bf.x
(2002) Common principles of sensory encoding in spinal reflex modules and cerebellar climbing fibres
The Journal of Physiology 540:1061–1069.

https://doi.org/10.1113/jphysiol.2001.013507
(1988) Spatial organization of visual messages of the rabbit's cerebellar flocculus. II. Complex and simple spike responses of Purkinje cells
Journal of Neurophysiology 60:2091–2121.

https://doi.org/10.1152/jn.1988.60.6.2091
1. Graupner M
2. Brunel N
(2007) STDP in a bistable synapse model based on CaMKII and associated signaling pathways
PLOS Computational Biology 3:e221.

https://doi.org/10.1371/journal.pcbi.0030221
(2005) Mechanisms for selection of basic motor programs--roles for the striatum and pallidum
Trends in Neurosciences 28:364–370.

https://doi.org/10.1016/j.tins.2005.05.004
(2017) Motor learning requires purkinje cell synaptic potentiation through activation of AMPA-Receptor subunit GluA3
Neuron 93:409–424.

https://doi.org/10.1016/j.neuron.2016.11.046
1. Harris CM
(1998) On the optimal control of behaviour: a stochastic perspective
Journal of Neuroscience Methods 83:73–88.

https://doi.org/10.1016/S0165-0270(98)00063-6
(2014) A memory of errors in sensorimotor learning
Science 345:1349–1353.

https://doi.org/10.1126/science.1253138
1. Ito M
2. Simpson JI
(1971) Discharges in purkinje cell axons during climbing fiber activation
Brain Research 31:215–219.

https://doi.org/10.1016/0006-8993(71)90648-2
1. Ito M
(1972) Neural design of the cerebellar motor control system
Brain Research 40:81–84.

https://doi.org/10.1016/0006-8993(72)90110-2
1. Ito M
2. Shiida T
3. Yagi N
4. Yamamoto M
(1974) Visual influence on rabbit horizontal vestibulo-ocular reflex presumably effected via the cerebellar flocculus
Brain Research 65:170–174.

https://doi.org/10.1016/0006-8993(74)90344-8
1. Ito M
2. Kano M
(1982) Long-lasting depression of parallel fiber-Purkinje cell transmission induced by conjunctive stimulation of parallel fibers and climbing fibers in the cerebellar cortex
Neuroscience Letters 33:253–258.

https://doi.org/10.1016/0304-3940(82)90380-9
(1982) Climbing fibre induced depression of both mossy fibre responsiveness and glutamate sensitivity of cerebellar purkinje cells
The Journal of Physiology 324:113–134.

https://doi.org/10.1113/jphysiol.1982.sp014103
Book
1. Ito M
(1984)
The Cerebellum and Neural Control

Raven Press.
1. Jones HC
2. Keep RF
(1988) Brain fluid calcium concentration and response to acute hypercalcaemia during development in the rat
The Journal of Physiology 402:579–593.

https://doi.org/10.1113/jphysiol.1988.sp017223
(1996) Relation between cutaneous receptive fields and muscle afferent input to climbing fibres projecting to the cerebellar C3 zone in the cat
European Journal of Neuroscience 8:1769–1779.

https://doi.org/10.1111/j.1460-9568.1996.tb01320.x
1. Jörntell H
2. Ekerot CF
(2002) Reciprocal bidirectional plasticity of parallel fiber receptive fields in cerebellar purkinje cells and their afferent interneurons
Neuron 34:797–806.

https://doi.org/10.1016/S0896-6273(02)00713-4
1. Jörntell H
2. Ekerot CF
(2003) Receptive field plasticity profoundly alters the cutaneous parallel fiber synaptic input to cerebellar interneurons in vivo
The Journal of Neuroscience 23:9620–9631.

https://doi.org/10.1523/JNEUROSCI.23-29-09620.2003
1. Jörntell H
2. Ekerot CF
(2006) Properties of somatosensory synaptic integration in cerebellar granule cells in vivo
Journal of Neuroscience 26:11786–11797.

https://doi.org/10.1523/JNEUROSCI.2939-06.2006
1. Jörntell H
2. Ekerot CF
(2011) Receptive field remodeling induced by skin stimulation in cerebellar neurons in vivo
Frontiers in Neural Circuits 5:3.

https://doi.org/10.3389/fncir.2011.00003
1. Ke MC
2. Guo CC
3. Raymond JL
(2009) Elimination of climbing fiber instructive signals during motor learning
Nature Neuroscience 12:1171–1179.

https://doi.org/10.1038/nn.2366
1. Khaliq ZM
2. Raman IM
(2005) Axonal propagation of simple and complex spikes in cerebellar purkinje neurons
Journal of Neuroscience 25:454–463.

https://doi.org/10.1523/JNEUROSCI.3045-04.2005
(1998) Inhibitory cerebello-olivary projections and blocking effect in classical conditioning
Science 279:570–573.

https://doi.org/10.1126/science.279.5350.570
1. Kimpo RR
2. Rinaldi JM
3. Kim CK
4. Payne HL
5. Raymond JL
(2014) Gating of neural error signals during motor learning
eLife 3:e02076.

https://doi.org/10.7554/eLife.02076
(1998) Cerebellar complex spikes encode both destinations and errors in arm movements
Nature 392:494–497.

https://doi.org/10.1038/33141
1. Kleim JA
2. Freeman JH
3. Bruneau R
4. Nolan BC
5. Cooper NR
6. Zook A
7. Walters D
(2002) Synapse formation is associated with memory storage in the cerebellum
PNAS 99:13228–13231.

https://doi.org/10.1073/pnas.202483399
1. Konishi M
(1965)
The role of auditory feedback in the control of vocalization in the white-crowned sparrow

Zeitschrift Für Tierpsychologie 22:770–783.
1. Lee KH
2. Mathews PJ
3. Reeves AM
4. Choe KY
5. Jami SA
6. Serrano RE
7. Otis TS
(2015) Circuit mechanisms underlying motor memory formation in the cerebellum
Neuron 86:529–540.

https://doi.org/10.1016/j.neuron.2015.03.010
1. Lev-Ram V
2. Makings LR
3. Keitz PF
4. Kao JP
5. Tsien RY
(1995) Long-term depression in cerebellar purkinje neurons results from coincidence of nitric oxide and depolarization-induced Ca2+ transients
Neuron 15:407–415.

https://doi.org/10.1016/0896-6273(95)90044-6
1. Lev-Ram V
2. Wong ST
3. Storm DR
4. Tsien RY
(2002) A new form of cerebellar long-term potentiation is postsynaptic and depends on nitric oxide but not cAMP
PNAS 99:8389–8393.

https://doi.org/10.1073/pnas.122206399
(2003) Reversing cerebellar long-term depression
PNAS 100:15989–15993.

https://doi.org/10.1073/pnas.2636935100
(1991) Synaptic- and agonist-induced excitatory currents of purkinje cells in rat cerebellar slices
The Journal of Physiology 434:183–213.

https://doi.org/10.1113/jphysiol.1991.sp018465
1. Llinás R
2. Mühlethaler M
(1988) Electrophysiology of guinea-pig cerebellar nuclear cells in the in vitro brain stem-cerebellar preparation
The Journal of Physiology 404:241–258.

https://doi.org/10.1113/jphysiol.1988.sp017288
1. Llinás R
2. Yarom Y
(1986) Oscillatory properties of guinea-pig inferior olivary neurones and their pharmacological modulation: an in vitro study
The Journal of Physiology 376:163–182.

https://doi.org/10.1113/jphysiol.1986.sp016147
(2005) Bistability of cerebellar purkinje cells modulated by sensory stimulation
Nature Neuroscience 8:202–211.

https://doi.org/10.1038/nn1393
1. Marcaggi P
2. Attwell D
(2007) Short- and long-term depression of rat cerebellar parallel fibre synaptic transmission mediated by synaptic crosstalk
The Journal of Physiology 578:545–550.

https://doi.org/10.1113/jphysiol.2006.115014
1. Marr D
(1969) A theory of cerebellar cortex
The Journal of Physiology 202:437–470.

https://doi.org/10.1113/jphysiol.1969.sp008820
(2007) Intraburst and interburst signaling by climbing fibers
Journal of Neuroscience 27:11263–11270.

https://doi.org/10.1523/JNEUROSCI.2559-07.2007
1. Mathy A
2. Ho SS
3. Davie JT
4. Duguid IC
5. Clark BA
6. Häusser M
(2009) Encoding of oscillations by axonal bursts in inferior olive neurons
Neuron 62:388–399.

https://doi.org/10.1016/j.neuron.2009.03.023
(1982) Initial localization of the memory trace for a basic form of learning
PNAS 79:2731–2735.

https://doi.org/10.1073/pnas.79.8.2731
1. Medina JF
2. Lisberger SG
(2008) Links from complex spikes to local plasticity and motor learning in the cerebellum of awake-behaving monkeys
Nature Neuroscience 11:1185–1192.

https://doi.org/10.1038/nn.2197
1. Menzies JR
2. Porrill J
3. Dutia M
4. Dean P
(2010) Synaptic plasticity in medial vestibular nucleus neurons: comparison with computational requirements of VOR adaptation
PLOS ONE 5:e13182.

https://doi.org/10.1371/journal.pone.0013182
Thesis
1. Minsky M
(1954)
Theory of neural-analog reinforcement systems and its application to the brain-model problem

Princeton.
1. Minsky M
(1961) Steps toward artificial intelligence
Proceedings of the IRE 49:8–30.

https://doi.org/10.1109/JRPROC.1961.287775
1. Mittmann W
2. Häusser M
(2007) Linking synaptic plasticity and spike output at excitatory and inhibitory synapses onto cerebellar purkinje cells
Journal of Neuroscience 27:5559–5570.

https://doi.org/10.1523/JNEUROSCI.5117-06.2007
1. Mlonyeni M
(1973) The number of purkinje cells and inferior olivary neurones in the cat
The Journal of Comparative Neurology 147:1–9.

https://doi.org/10.1002/cne.901470102
(2005) Determinants of action potential propagation in cerebellar purkinje cell axons
Journal of Neuroscience 25:464–472.

https://doi.org/10.1523/JNEUROSCI.3871-04.2005
1. Mooney R
(2009) Neurobiology of song learning
Current Opinion in Neurobiology 19:654–660.

https://doi.org/10.1016/j.conb.2009.10.004
1. Mortimer JA
(1975) Cerebellar responses to teleceptive stimuli in alert monkeys
Brain Research 83:369–390.

https://doi.org/10.1016/0006-8993(75)90831-8
1. Mostofi A
2. Holtzman T
3. Grout AS
4. Yeo CH
5. Edgley SA
(2010) Electrophysiological localization of eyeblink-related microzones in rabbit cerebellar cortex
Journal of Neuroscience 30:8920–8934.

https://doi.org/10.1523/JNEUROSCI.6117-09.2010
1. Murakami I
(2004) Correlations between fixation stability and visual motion sensitivity
Vision Research 44:751–761.

https://doi.org/10.1016/j.visres.2003.11.012
1. Napper RM
2. Harvey RJ
(1988) Number of parallel fiber synapses on an individual purkinje cell in the cerebellum of the rat
The Journal of Comparative Neurology 274:168–177.

https://doi.org/10.1002/cne.902740204
1. Nevian T
2. Sakmann B
(2006) Spine Ca2+ signaling in spike-timing-dependent plasticity
Journal of Neuroscience 26:11001–11013.

https://doi.org/10.1523/JNEUROSCI.1749-06.2006
(1978) Calcium and potassium changes in extracellular microenvironment of cat cerebellar cortex
Journal of Neurophysiology 41:1026–1039.

https://doi.org/10.1152/jn.1978.41.4.1026
1. Ohmae S
2. Medina JF
(2015) Climbing fibers encode a temporal-difference prediction error during cerebellar learning in mice
Nature Neuroscience 18:1798–1803.

https://doi.org/10.1038/nn.4167
1. Ohyama T
2. Nores WL
3. Medina JF
4. Riusech FA
5. Mauk MD
(2006) Learning-induced plasticity in deep cerebellar nucleus
Journal of Neuroscience 26:12656–12663.

https://doi.org/10.1523/JNEUROSCI.4023-06.2006
(2005) Vocal experimentation in the juvenile songbird requires a basal ganglia circuit
PLOS Biology 3:e153.

https://doi.org/10.1371/journal.pbio.0030153
1. Optican LM
2. Robinson DA
(1980) Cerebellar-dependent adaptive control of primate saccadic system
Journal of Neurophysiology 44:1058–1076.

https://doi.org/10.1152/jn.1980.44.6.1058
1. Ozden I
2. Dombeck DA
3. Hoogland TM
4. Tank DW
5. Wang SS
(2012) Widespread state-dependent shifts in cerebellar activity in locomoting mice
PLOS ONE 7:e42650.

https://doi.org/10.1371/journal.pone.0042650
(1997) High-affinity zinc inhibition of NMDA NR1-NR2A receptors
The Journal of Neuroscience 17:5711–5725.

https://doi.org/10.1523/JNEUROSCI.17-15-05711.1997
1. Pugh JR
2. Raman IM
(2006) Potentiation of mossy fiber EPSCs in the cerebellar nuclei by NMDA receptor activation followed by postinhibitory rebound current
Neuron 51:113–123.

https://doi.org/10.1016/j.neuron.2006.05.021
1. Pugh JR
2. Raman IM
(2008) Mechanisms of potentiation of mossy fiber EPSCs in the cerebellar nuclei by coincident synaptic excitation and inhibition
Journal of Neuroscience 28:10549–10560.

https://doi.org/10.1523/JNEUROSCI.2061-08.2008
Software
1. R Core Team
(2013) R: A Language and Environment for Statistical Computing
R Foundation for Statistical Computing, Vienna, Austria.

http://www.R-project.org/
(2013) Number of spikes in climbing fibers determines the direction of cerebellar learning
Journal of Neuroscience 33:13436–13440.

https://doi.org/10.1523/JNEUROSCI.1527-13.2013
(2014) Changes in complex spike activity during classical conditioning
Frontiers in Neural Circuits 8:90.

https://doi.org/10.3389/fncir.2014.00090
1. Robbins H
2. Monro S
(1951) A stochastic approximation method
The Annals of Mathematical Statistics 22:400–407.

https://doi.org/10.1214/aoms/1177729586
1. Robinson DA
(1976) Adaptive gain control of vestibuloocular reflex by the cerebellum
Journal of Neurophysiology 39:954–969.

https://doi.org/10.1152/jn.1976.39.5.954
(1999) The cerebellum is necessary for rabbit classical eyeblink conditioning with a non-somatosensory (photic) unconditioned stimulus
Behavioural Brain Research 104:105–112.

https://doi.org/10.1016/S0166-4328(99)00054-6
1. Rowan MJM
2. Bonnan A
3. Zhang K
4. Amat SB
5. Kikuchi C
6. Taniguchi H
7. Augustine GJ
8. Christie JM
(2018) Graded Control of Climbing-Fiber-Mediated Plasticity and Learning by Inhibition in the Cerebellum
Neuron 99:999–1015.

https://doi.org/10.1016/j.neuron.2018.07.024
(2008) Multiple cerebellar zones are involved in the control of individual muscles: a retrograde transneuronal tracing study with rabies virus in the rat
European Journal of Neuroscience 28:181–200.

https://doi.org/10.1111/j.1460-9568.2008.06294.x
1. Safo P
2. Regehr WG
(2008) Timing dependence of the induction of cerebellar LTD
Neuropharmacology 54:213–218.

https://doi.org/10.1016/j.neuropharm.2007.05.029
1. Sakurai M
(1987) Synaptic modification of parallel fibre-Purkinje cell transmission in in vitro guinea-pig cerebellar slices
The Journal of Physiology 394:463–480.

https://doi.org/10.1113/jphysiol.1987.sp016881
1. Sarkisov DV
2. Wang SS
(2008) Order-dependent coincidence detection in cerebellar Purkinje neurons at the inositol trisphosphate receptor
Journal of Neuroscience 28:133–142.

https://doi.org/10.1523/JNEUROSCI.1729-07.2008
1. Schild RF
(1970) On the inferior olive of the albino rat
The Journal of Comparative Neurology 140:255–259.

https://doi.org/10.1002/cne.901400302
(2006a) Purkinje cells in awake behaving animals operate at the upstate membrane potential
Nature Neuroscience 9:459–461.

https://doi.org/10.1038/nn0406-459
1. Schonewille M
2. Luo C
3. Ruigrok TJ
4. Voogd J
5. Schmolesky MT
6. Rutteman M
7. Hoebeek FE
8. De Jeu MT
9. De Zeeuw CI
(2006b) Zonal organization of the mouse flocculus: physiology, input, and output
The Journal of Comparative Neurology 497:670–682.

https://doi.org/10.1002/cne.21036
1. Schultz W
(1986) Responses of midbrain dopamine neurons to behavioral trigger stimuli in the monkey
Journal of Neurophysiology 56:1439–1461.

https://doi.org/10.1152/jn.1986.56.5.1439
(1997) A neural substrate of prediction and reward
Science 275:1593–1599.

https://doi.org/10.1126/science.275.5306.1593
1. Sejnowski TJ
(1977) Storing covariance with nonlinearly interacting neurons
Journal of Mathematical Biology 4:303–321.

https://doi.org/10.1007/BF00275079
1. Seung HS
(2003) Learning in spiking neural networks by reinforcement of stochastic synaptic transmission
Neuron 40:1063–1073.

https://doi.org/10.1016/S0896-6273(03)00761-X
Book
1. Shalev-Shwartz S
2. Ben-David S
(2014) Understanding Machine Learning: From Theory to Algorithms
Cambridge University Press.

https://doi.org/10.1017/CBO9781107298019
1. Silver IA
2. Erecińska M
(1990) Intracellular and extracellular changes of [Ca2+] in hypoxia and ischemia in rat brain in vivo
The Journal of General Physiology 95:837–866.

https://doi.org/10.1085/jgp.95.5.837
(2008) Complex spike activity signals the direction and size of dysmetric saccade errors
Progress in Brain Research 171:153–159.

https://doi.org/10.1016/S0079-6123(08)00620-1
1. Stone LS
2. Lisberger SG
(1986) Detection of tracking errors by visual climbing fiber inputs to monkey cerebellar flocculus during pursuit eye movements
Neuroscience Letters 72:163–168.

https://doi.org/10.1016/0304-3940(86)90073-X
1. Stuart G
2. Häusser M
(1994) Initiation and spread of sodium action potentials in cerebellar Purkinje cells
Neuron 13:703–712.

https://doi.org/10.1016/0896-6273(94)90037-X
Book
1. Sutton RS
2. Barto AG
(1998) Introduction to Reinforcement Learning (1st ed)
Cambridge, MA: MIT Press.
(2016) Timing Rules for Synaptic Plasticity Matched to Behavioral Function
Neuron 92:959–967.

https://doi.org/10.1016/j.neuron.2016.10.022
1. Suvrathan A
2. Raymond JL
(2018) Depressed by Learning-Heterogeneity of the Plasticity Rules at Parallel Fiber Synapses onto Purkinje Cells
The Cerebellum.

https://doi.org/10.1007/s12311-018-0968-8
1. Thach WT
(1968) Discharge of Purkinje and cerebellar nuclear neurons during rapidly alternating arm movements in the monkey
Journal of Neurophysiology 31:785–797.

https://doi.org/10.1152/jn.1968.31.5.785
(2000) Coincidence detection in single dendritic spines mediated by calcium release
Nature Neuroscience 3:1266–1273.

https://doi.org/10.1038/81792
(2014) Bidirectional plasticity of Purkinje cells matches temporal features of learning
Journal of Neuroscience 34:1731–1737.

https://doi.org/10.1523/JNEUROSCI.2883-13.2014
1. Wigström H
2. Gustafsson B
(1983a) Facilitated induction of hippocampal long-lasting potentiation during blockade of inhibition
Nature 301:603–604.

https://doi.org/10.1038/301603a0
1. Wigström H
2. Gustafsson B
(1983b) Large long-lasting potentiation in the dentate gyrus in vitro during blockade of inhibition
Brain Research 275:153–158.

https://doi.org/10.1016/0006-8993(83)90428-6
1. Williams RJ
(1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning
Machine Learning 8:229–256.

https://doi.org/10.1007/BF00992696
1. Xie X
2. Seung HS
(2004) Learning in neural networks by reinforcement of irregular spiking
Physical Review E 69:041909.

https://doi.org/10.1103/PhysRevE.69.041909
1. Yang Y
2. Lisberger SG
(2014) Purkinje-cell plasticity and cerebellar motor learning are graded by complex-spike duration
Nature 510:529–532.

https://doi.org/10.1038/nature13282
(2009) Pausing Purkinje cells in the cerebellum of the awake cat
Frontiers in Systems Neuroscience 3:2.

https://doi.org/10.3389/neuro.06.002.2009
(1984) Discrete lesions of the cerebellar cortex abolish the classically conditioned nictitating membrane response of the rabbit
Behavioural Brain Research 13:261–266.

https://doi.org/10.1016/0166-4328(84)90168-2
1. Yeo CH
2. Hesslow G
(1998) Cerebellum and conditioned reflexes
Trends in Cognitive Sciences 2:322–330.

https://doi.org/10.1016/S1364-6613(98)01219-4
1. Zhang W
2. Linden DJ
(2006) Long-term depression at the mossy fiber-deep cerebellar nucleus synapse
Journal of Neuroscience 26:6935–6944.

https://doi.org/10.1523/JNEUROSCI.0784-06.2006
(2016) Climbing Fiber Regulation of Spontaneous Purkinje Cell Activity and Cerebellum-Dependent Blink Responses
eNeuro 3:6.

https://doi.org/10.1523/ENEURO.0067-15.2015

Article and author information

Author details

Guy Bouvier

Institut de biologie de l’École normale supérieure (IBENS), École normale supérieure, CNRS, INSERM, PSL University, Paris, France
Present address
1. Department of Physiology, University of California, San Francisco, San Francisco, United States
2. Sandler Neuroscience, University of California, San Francisco, San Francisco, United States
Contribution
Data curation, Investigation, Methodology, Writing—original draft, Writing—review and editing, Performed all experiments shown

Competing interests
No competing interests declared

ORCID iD: 0000-0002-6160-7186

Johnatan Aljadeff

Departments of Statistics and Neurobiology, University of Chicago, Chicago, United States

Present address
Department of Bioengineering, Imperial College London, London, United Kingdom

Contribution
Software, Formal analysis, Writing—original draft, Writing—review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0002-7145-0514

Claudia Clopath

Department of Bioengineering, Imperial College London, London, United Kingdom

Contribution
Writing—review and editing, Contributed to initial theoretical exploration

Competing interests
No competing interests declared

ORCID iD: 0000-0003-4507-8648

Célian Bimbard

Institut de biologie de l’École normale supérieure (IBENS), École normale supérieure, CNRS, INSERM, PSL University, Paris, France

Present address
Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, Paris, France

Contribution
Investigation, Performed pilot experiments and suggested implementation of tracking plasticity

Competing interests
No competing interests declared

ORCID iD: 0000-0002-6380-5856

Jonas Ranft

Institut de biologie de l’École normale supérieure (IBENS), École normale supérieure, CNRS, INSERM, PSL University, Paris, France

Contribution
Software, Formal analysis, Writing—original draft, Writing—review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0002-7843-7443

Antonin Blot

Institut de biologie de l’École normale supérieure (IBENS), École normale supérieure, CNRS, INSERM, PSL University, Paris, France

Present address
Sainsbury-Wellcome Centre for Neural Circuits and Behaviour, University College London, London, United Kingdom

Contribution
Software

Competing interests
No competing interests declared

ORCID iD: 0000-0002-1546-3927

Jean-Pierre Nadal
1. Laboratoire de Physique Statistique, École normale supérieure, CNRS, PSL University, Sorbonne Université, Paris, France
2. Centre d’Analyse et de Mathématique Sociales, EHESS, CNRS, PSL University, Paris, France
Contribution
Formal analysis

Competing interests
No competing interests declared

ORCID iD: 0000-0003-0022-0647

Nicolas Brunel

Departments of Statistics and Neurobiology, University of Chicago, Chicago, United States

Present address
Departments of Neurobiology and Physics, Duke University, Durham, United States

Contribution
Conceptualization, Software, Formal analysis, Supervision, Writing—original draft, Writing—review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0002-2272-3248

Vincent Hakim

Laboratoire de Physique Statistique, École normale supérieure, CNRS, PSL University, Sorbonne Université, Paris, France

Contribution
Conceptualization, Software, Formal analysis, Supervision, Writing—original draft, Writing—review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0001-7505-8192

Boris Barbour

Institut de biologie de l’École normale supérieure (IBENS), École normale supérieure, CNRS, INSERM, PSL University, Paris, France

Contribution
Conceptualization, Data curation, Software, Formal analysis, Supervision, Funding acquisition, Methodology, Writing—original draft, Project administration, Writing—review and editing, Performed all data analysis shown

For correspondence
boris.barbour@ens.fr

Competing interests
No competing interests declared

ORCID iD: 0000-0002-1911-0539

Funding

Agence Nationale de la Recherche (ANR-08-SYSC-005)

Boris Barbour

National Science Foundation (IIS-1430296)

Johnatan Aljadeff
Nicolas Brunel

Fondation pour la Recherche Médicale (DEQ20160334927)

Boris Barbour

Fondation pour la Recherche Médicale

Guy Bouvier

Région Ile-de-France

Guy Bouvier

Labex (ANR-10-LABX-54 MEMOLIFE)

Guy Bouvier
Boris Barbour

Deutsche Forschungsgemeinschaft (RA-2571/1-1)

Jonas Ranft

Idex PSL* Research University (ANR-11-IDEX-0001-02)

Vincent Hakim
Boris Barbour

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We are grateful to the following for discussion and/or comments on this manuscript: David Attwell, Mariano Casado, Paul Dean, Anne Feltz, Richard Hawkes, Clément Léna, Steven Lisberger, Tom Ruigrok, John Simpson, Brandon Stell, Stéphane Supplisson and German Szapiro. We thank Gary Bhumbra for sharing a native Python library for reading Clampex files. A preprint describing this work was posted on the bioRxiv repository on 2016-05-16.

Ethics

Animal experimentation: Experimental procedures were performed according to authorisation 04445.02 granted by the 'Charles Darwin N°5' ethics committee.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.