Self-configuring feedback loops for sensorimotor control

  1. Sergio Oscar Verduzco-Flores (corresponding author)
  2. Erik De Schutter

  Computational Neuroscience Unit, Okinawa Institute of Science and Technology, Japan

Abstract

How dynamic interactions between nervous system regions in mammals perform online motor control remains an unsolved problem. In this paper, we show that feedback control is a simple, yet powerful way to understand the neural dynamics of sensorimotor control. We make our case using a minimal model comprising spinal cord, sensory and motor cortex, coupled by long connections that are plastic. It succeeds in learning how to perform reaching movements of a planar arm with six muscles in several directions from scratch. The model satisfies biological plausibility constraints, like neural implementation, transmission delays, local synaptic learning, and continuous online learning. Using differential Hebbian plasticity, the model can go from motor babbling to reaching arbitrary targets in less than 10 min of in silico time. Moreover, independently of the learning mechanism, properly configured feedback control has many emergent properties: neural populations in motor cortex show directional tuning and oscillatory dynamics, the spinal cord creates convergent force fields that add linearly, and movements are ataxic (as in a motor system without a cerebellum).

Editor's evaluation

This solid modelling study presents a valuable contribution toward understanding the neural control of movement. The authors show that a minimal model comprising key sensorimotor cortical areas as well as spinal circuits controlling a limb readily replicates landmark observations from behavioural and electrophysiological studies. This work will be of broad interest to motor control researchers, as well as to neurophysiologists interested in testing the predictions derived from this model.

https://doi.org/10.7554/eLife.77216.sa0

Introduction

The challenge

Neuroscience has made great progress in decoding how cortical regions perform specific brain functions like primate vision (Kaas and Collins, 2003; Ballard and Zhang, 2021) and rodent navigation (Chersi and Burgess, 2015; Moser et al., 2017). Conversely, the evolutionarily much older motor control system still poses fundamental questions, despite a large body of experimental work. This is because, in mammals, in addition to areas in cortex, like premotor and motor areas and to some degree sensory and parietal ones, many extracortical regions have important and unique functions: basal ganglia, thalamus, cerebellum, pons, brain stem nuclei like the red nucleus, and the spinal cord (Eccles, 1981; Loeb and Tsianos, 2015). These structures are highly interconnected by fast-conducting axons, and all show strong dynamic activity changes related to the ongoing dynamics of the performed motor act. Clinical and lesion studies have confirmed the necessity of each of these regions for normal smooth motor control of arm reaching (Shadmehr and Wise, 2005; Arber and Costa, 2018).

Fully understanding motor control will thus entail understanding the simultaneous function and interplay of all brain regions involved. Little by little, new experimental techniques will allow us to monitor more neurons, in more regions, and for longer periods (e.g. Tanaka et al., 2018). But to make sense of these data, computational models must step up to the task of integrating all those regions to create a functional neuronal machine.

Finally, relatively little is known about the neural basis of motor development in infants (Hadders-Algra, 2018). Nevertheless, a full understanding of primate motor control will not only require explanation of how these brain regions complement and interact with each other but also how this can be learned during childhood.

With these challenges in mind we recently developed a motor control framework based on differential Hebbian learning (Verduzco-Flores et al., 2022). A common theme in physiology is the control of homeostatic variables (e.g. blood glucose levels, body temperature, etc.) using negative feedback mechanisms (Woods and Ramsay, 2007). From a broad perspective, our approach considers the musculoskeletal system as an extension of this homeostatic control system: movement aims to make the external environment conducive to the internal control of homeostatic variables (e.g. by finding food, or shelter from the sun).

Our working hypothesis (see Verduzco-Flores et al., 2022) is that control of homeostatic variables requires a feedback controller that uses the muscles to produce a desired set of sensory perceptions. The motosensory loop, minimally containing motor cortex, spinal cord, and sensory cortex, may implement that feedback controller. To test this hypothesis we implemented a relatively complete model of the sensorimotor loop (Figure 1), using the learning rules in Verduzco-Flores et al., 2022 to produce 2D arm reaching. The activity of the neural populations and the movements they produced showed remarkable consistency with the experimental observations that we describe next.

Main components of the model.

In the left panel, each box stands for a neural population, except for P, which represents the arm and the muscles. Arrows indicate static connections, open circles show input correlation synapses, and the two colored circles show possible locations of synapses with the learning rule in Verduzco-Flores et al., 2022. In the spinal learning model the green circle connections are plastic, and the red circle connections are static. In the cortical learning model the red circle connections are plastic, whereas the green circle connections are static. In the static network all connections are static. A: afferent population. SA: somatosensory cortex, modulated by afferent input. SP: somatosensory cortex, prescribed pattern. SPA: population signaling the difference between SP and SA. M: primary motor cortex. C: spinal cord. Inside the C box the circles represent the excitatory (E) and inhibitory (I) interneurons, organized into six pairs. The interneurons in each pair innervate an alpha motoneuron (α), each of which stimulates one of the six muscles in the arm, numbered from 0 to 5. The trios consisting of E, I, and α units are organized into agonists and antagonists, depending on whether their α motoneurons cause torques in similar or opposite directions. These relations are shown in the right-side panel.

Relevant findings in motor control

Before describing our modeling approach, we summarize some of the relevant experimental data that will be important to understanding the results. We focus on three related issues: (1) the role of the spinal cord in movement, (2) the nature of representations in motor cortex, and (3) muscle synergies, and how the right pattern of muscle activity is produced.

For animals to move, spinal motoneurons must activate the skeletal muscles. In general, descending signals from the corticospinal tract do not activate the motoneurons directly, but instead provide input to a network of excitatory and inhibitory interneurons (Bizzi et al., 2000; Lemon, 2008; Arber, 2012; Asante and Martin, 2013; Alstermark and Isa, 2012; Jankowska, 2013; Wang et al., 2017; Ueno et al., 2018). Learning even simple behaviors involves long-term plasticity, both at the spinal cord (SC) circuit, and at higher regions of the motor hierarchy (Wolpaw et al., 1983; Grau, 2014; Meyer-Lohmann et al., 1986; Wolpaw, 1997; Norton and Wolpaw, 2018). Despite its obvious importance, there are comparatively few attempts to elucidate the nature of the SC computations, and the role of synaptic plasticity.

The role ascribed to the SC is closely related to the role assumed for motor cortex, particularly M1. One classic result is that M1 pyramidal neurons of macaques activate preferentially when the hand is moving in a particular direction. When the preferred directions of a large population of neurons are added as vectors, a population vector appears, which points close to the hand’s direction of motion (Georgopoulos et al., 1982; Georgopoulos et al., 1986). This launched the hypothesis that M1 represents kinematic, or other high-level parameters of the movement, which are transformed into movements in concert with the SC. This hypothesis mainly competes with the view that M1 represents muscle forces. Much research has been devoted to this issue (e.g. Kakei et al., 1999; Truccolo et al., 2008; Kalaska, 2009; Georgopoulos and Stefanis, 2007; Harrison and Murphy, 2012; Tanaka, 2016; Morrow and Miller, 2003; Todorov, 2000).

Another important observation is that the preferred directions of M1 neurons cluster around one main axis. As shown in Scott et al., 2001, this suggests that M1 is mainly concerned with dynamical aspects of the movement, rather than representing its kinematics.

A related observation is that the preferred directions of M1 neurons experience random drifts that overlap learned changes (Rokni et al., 2007; Padoa-Schioppa et al., 2004). This led to the hypothesis that M1 is a redundant network that is constantly using feedback error signals to capture the task-relevant dimensions, placing the configuration of synaptic weights in an optimal manifold.

A different perspective for studying motor cortex is to focus on how it can produce movements, rather than describing its activity (Shenoy et al., 2013). One specific proposal is that motor cortex has a collection of pattern generators, and specific movements can be created by combining their activity (Shenoy et al., 2013; Sussillo et al., 2015). Experimental support for this hypothesis came through the surprising finding of rotational dynamics in motor cortex activity (Churchland et al., 2012), suggesting that oscillators with different frequencies are used to produce desired patterns. This raises the question of how the animal chooses its desired patterns of motion.

Selecting a given pattern of muscle activation requires planning. Motor units are the final actuators in the motor system, but they number in the tens of thousands, so planning movements in this space is unfeasible. A low-dimensional representation of desired limb configurations (such as the location of the hand in Euclidean coordinates) is better. Movement generation likely involves a coordinate transformation, from the endpoint coordinates (e.g. hand coordinates) into actuator coordinates (e.g. muscle lengths), from which motor unit activation follows directly. Even using pure engineering methods, as for robot control, computing this coordinate transformation is very challenging. For example, this must overcome kinematic redundancies, as when many configurations of muscle lengths put the hand in the same location.

The issue of coordinate transformation is central for motor control (Shadmehr and Wise, 2005; Schöner et al., 2018; Valero-Cuevas, 2009); motor primitives and muscle synergies are key concepts in this discussion. Representing things as combinations of elementary components is a fundamental theme in applied mathematics. For example, linear combinations of basis vectors can represent any vector, and linear combinations of wavelets can approximate any smooth function (Keener, 1995). In motor control, this idea arises in the form of motor primitives. Motor primitives constitute a set of basic motions, such that any movement can be decomposed into them (Giszter, 2015; Mussa-Ivaldi and Bizzi, 2000; Bizzi et al., 1991). This is closely related to the concept of synergies. The term ‘synergy’ may mean several things (Kelso, 2009; Bruton and O’Dwyer, 2018), but in this paper, we use it to denote a pattern of muscle activity arising as a coherent unit. Synergies may be composed of motor primitives, or they may be the motor primitives themselves.

A promising candidate for motor primitives comes in the form of convergent force fields, which have been observed in the hindlimbs of frogs and rats (Giszter et al., 1993; Mussa-Ivaldi et al., 1994), and in the forelimbs of monkeys (Yaron et al., 2020). In experiments where the limb is held at a particular location, local stimulation of the spinal cord will cause a force at the limb’s endpoint. The collection of these force vectors for all of the limb endpoint’s positions forms a force field, and these force fields have two important characteristics: (1) they have a unique fixed point, and (2) simultaneous stimulation of two spinal cord locations produces a force field which is the sum of the force fields from stimulating the two locations independently. It is argued that movement planning may be done in terms of force fields, since they can produce movements that are resistant to perturbations, and also permit a solution to the problem of coordinate transformation with redundant actuators (Mussa-Ivaldi and Bizzi, 2000).

The neural origin of synergies, and whether they are used by the motor system, is a matter of ongoing debate (Tresch and Jarc, 2009; de Rugy et al., 2013; Bizzi and Cheung, 2013). To us, it is of interest that single spinal units found in the mouse (Levine et al., 2014) and monkey (Takei et al., 2017) spinal cord (sometimes called Motor Synergy Encoders, or MSEs) can reliably produce specific patterns of motoneuron activation.

Model concepts

We believe that it is impossible to understand the complex dynamical system in biological motor control without the help of computational modeling. Therefore, we set out to build a minimal model that could eventually control an autonomous agent, while still satisfying biological plausibility constraints.

Design principles and biological-plausibility constraints for neural network modeling have been proposed before (Pulvermüller et al., 2021; O’Reilly, 1998; Richards et al., 2019). Placing emphasis on the motor system, we compiled a set of characteristics that cover the majority of these constraints. Namely:

  • Spanning the whole sensorimotor loop.

  • Using only neural elements. Learning their connection strengths is part of the model.

  • Learning does not rely on a training dataset. It is instead done by synaptic elements using local information.

  • Learning arises from continuous-time interaction with a continuous-space environment.

  • There is a clear vision on how the model integrates with the rest of the brain in order to enact more general behavior.

Our aim is hierarchical control of homeostatic variables, with the spinal cord and motor cortex at the bottom of this hierarchy. At first glance, spinal plasticity poses a conundrum, because it changes the effect of corticospinal inputs. Cortex is playing a piano that keeps changing its tuning. A solution comes when we consider the corticospinal loop (e.g. the long-loop reflex) as a negative control system, where the spinal cord activates the effectors to reduce an error. The role of cortex is to produce perceptual variables that are controllable, and can eventually improve homeostatic regulation. In this regard, our model is a variation of Perceptual Control Theory (Powers, 1973; Powers, 2005), but if the desired value of the controller is viewed as a prediction, then this approach resembles active inference models (Adams et al., 2013). Either way, the goal of the system is to reduce the difference between the desired and the perceived value of some variable.

If cortex creates representations for perceptual variables, the sensorimotor loop must be configured so those variables can be controlled. This happens when the error in those variables activates the muscles in a way that brings the perceived value closer to the desired value. In other words, we must find the input-output structure of the feedback controller implicit in the long-loop reflex. We have found that this important problem can be solved by the differential Hebbian learning rules introduced in Verduzco-Flores et al., 2022. We favor the hypothesis that this learning takes place in the connections from motor cortex to interneurons and brainstem. Nevertheless, we show that all our results are valid if learning happens in the connections from sensory to motor cortex.

In the Results section we will describe our model, its variations, and how it can learn to reach. Next we will show that many phenomena described above are present in this model. These phenomena emerge from having a properly configured neural feedback controller with a sufficient degree of biological realism. This means that even if the synaptic weights of the connections are set by hand and are static, the phenomena still emerge, as long as the system is configured to reduce errors. In short, we show that a wealth of phenomena in motor control can be explained simply by feedback control in the sensorimotor loop, and that this feedback control can be configured in a flexible manner by the learning rules presented in Verduzco-Flores et al., 2022.

Results

A neural architecture for motor control

The model in this paper contains the main elements of the long-loop reflex, applied to the control of a planar arm using six muscles. The left panel of Figure 1 shows the architecture of the model, which contains 74 firing rate neurons organized in six populations. This architecture resembles a feedback controller that makes the activity in a neural population SA approach the activity in a different population SP.

The six firing-rate neurons (called units in this paper) in SA represent a region of somatosensory cortex, and its inputs consist of the static gamma (II) afferents. In steady state, activity of the II afferents is monotonically related to muscle length (Mileusnic et al., 2006), which in turn can be used to prescribe hand location. Other afferent signals are not provided to SA in the interest of simplicity.

SP represents a different cortical layer of the same somatosensory region as SA, where a ‘desired’ or ‘predicted’ activity has been caused by brain regions not represented in the model. Each firing rate neuron in SA has a corresponding unit in SP, and they represent the mean activity at different levels of the same microcolumn (Mountcastle, 1997). SPA is a region (either in sensory or motor cortex) that conveys the difference between activities in SP and SA, which is the error signal to be minimized by negative feedback control.

Population A represents sensory thalamus and dorsal parts of the spinal cord. It contains 18 units with logarithmic activation functions, each receiving an input from a muscle afferent. Each muscle provides proprioceptive feedback from models of the Ia, Ib, and II afferents. In rough terms, Ia afferents provide information about contraction velocity, and Ib afferents signal the amount of tension in the muscle and tendons.

Population M represents motor cortex. Ascending inputs to M arise from population A, and use a variation of the input correlation learning rule (Porr and Wörgötter, 2006), where the SPA inputs act as a learning signal. The input correlation rule enhances the stability of the controller. More details are presented in Methods. The SPA inputs to M can either be static, or use a learning rule to be described below.

To represent positive and negative values, both M and SPA use a ‘dual representation’, where each error signal is represented by two units. Let e_i = SP_i − SA_i be the error associated with the i-th muscle. One of the two SPA units representing e_i is a monotonic function of max(e_i, 0), whereas the other unit increases according to max(−e_i, 0). These opposing inputs, along with mutual inhibition between the two units, create dynamics where sensorimotor events cause both excitatory and inhibitory responses, which agrees with experimental observations (Shafi et al., 2007; Steinmetz et al., 2019; Najafi et al., 2020), and allows transmitting ‘negative’ values using excitatory projections. Dual units in M receive the same inputs, but with the opposite sign.
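This dual coding can be sketched in a single function; the name and interface below are illustrative, not the model's actual implementation:

```python
def dual_rates(e):
    """Encode a signed error as two non-negative rates.

    The first unit of the pair grows with max(e, 0) and its dual with
    max(-e, 0), so 'negative' values can travel over excitatory
    projections, as described for the SPA population above.
    """
    return max(e, 0.0), max(-e, 0.0)
```

Mutual inhibition between the two units of a pair (not shown here) keeps at most one of them strongly active at any time.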

Plasticity mechanisms within the sensorimotor loop should specify which muscles contract in order to reduce an error signaled by SPA. We suggest that this plasticity could take place in the spinal cord and/or motor cortex. To show that our learning mechanisms work regardless of where the learning takes place, we created two main configurations of the model. In the first configuration, called the ‘spinal learning’ model, a ‘spinal’ network C transforms the M outputs into muscle stimulation. C learns to transform sensory errors into appropriate motor commands using a differential Hebbian learning rule (Verduzco-Flores et al., 2022). In this configuration, the error input to each M unit comes from one of the SPA activities. A second configuration, called the ‘cortical learning’ model, has ‘all-to-all’ connections from SPA to M using the differential Hebbian rule, whereas the connections from M to C use appropriately patterned static connections. Both configurations are basically the same model; the difference is that one configuration has our learning rule on the inputs to C, whereas the other has it on the inputs to M (Figure 1).

While analyzing our model we reproduced several experimental phenomena (described below). Interestingly, these phenomena did not arise because of the learning rules. To make this explicit, we created a third configuration of our model, called the ‘static network’. This configuration does not change the weight of any synaptic connection during the simulation. The initial weights were hand-set to approximate the optimal solution everywhere (see Methods). We will show that all emergent phenomena in the paper are also present in the static network.

We explain the idea behind the differential Hebbian rule as applied in the connections from M to C. C contains N interneurons, whose activity vector we denote as c = [c_1, …, c_N]. The input to each of these units is an M-dimensional vector e = [e_1, …, e_M]. Each unit in C has an output c_i = σ(Σ_j ω_ij e_j), where σ(·) is a positive sigmoidal function. The inputs are assumed to be errors, and to reduce them we want e_j to activate c_i when c_i can reduce e_j. One way this could happen is when the weight ω_ij from e_j to c_i is proportional to the negative of their sensitivity derivative:

(1) ω_ij ∝ −∂e_j / ∂c_i.

Assuming a monotonic relation between the motor commands and the errors, relation 1 entails that errors will trigger an action to cancel them, with some caveats considered in Verduzco-Flores et al., 2022. Synaptic weights akin to Equation 1 can be obtained using a learning rule that extracts correlations between the derivatives of c_i and e_j (see Methods). Using this rule, the commands coming from population C can eventually move the arm so that SA activity resembles SP activity.
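A minimal caricature of such a rule (not the exact rule of Verduzco-Flores et al., 2022, which includes transmission delays and other refinements) correlates the time derivatives of pre- and postsynaptic activity:

```python
import numpy as np

def diff_hebb_step(w, e_prev, e_now, c_prev, c_now, dt, lr=0.1):
    """One Euler step of a caricature differential Hebbian update.

    w[i, j] changes in proportion to -(dc_i/dt) * (de_j/dt): the weight
    from error e_j to interneuron c_i grows when activity in c_i tends
    to coincide with a *reduction* of e_j, approximating relation 1.
    """
    dc = (c_now - c_prev) / dt   # postsynaptic derivatives, shape (N,)
    de = (e_now - e_prev) / dt   # presynaptic derivatives, shape (M,)
    return w - lr * np.outer(dc, de) * dt
```

During motor babbling, interneurons whose activity consistently precedes a drop in a given error accumulate positive weight from that error, so the error eventually recruits the commands that cancel it.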

C is organized to capture the most basic motifs of spinal cord connectivity using a network where balance between excitation and inhibition is crucial (Berg et al., 2007; Berg et al., 2019; Goulding et al., 2014). Each of the six α motoneurons stimulates one muscle, and is stimulated by one excitatory (CE) and one inhibitory (CI) interneuron. CE and CI stimulate one another, resembling the classic Wilson-Cowan model (Cowan et al., 2016). The trios composed of α, CE, and CI neurons form a group that controls the activation of one muscle, with CE and CI receiving convergent inputs from M. This resembles the premotor network model in Petersen et al., 2014. (α, CE, CI) trios are connected to other trios following the agonist-antagonist motif that is common in the spinal cord (Pierrot-Deseilligny and Burke, 2005). This means that CE units project to the CE units of agonists, and to the CI units of antagonists (Figure 1, right panel). When the agonist/antagonist relation is not strongly defined, muscles can be ‘partial’ agonists/antagonists, or unrelated.
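The agonist/antagonist wiring motif can be sketched as follows; the unit names and the dictionary-based interface are illustrative conveniences, not the model's actual data structures:

```python
def wire_trios(pairs):
    """Build the agonist/antagonist wiring between (CE, CI, alpha) trios.

    pairs: dict mapping a (i, j) muscle-index pair to 'agonist' or
    'antagonist'. Returns directed connections (source, target), where
    CE_i projects to the CE of agonists and to the CI of antagonists.
    Muscle pairs absent from the dict are treated as unrelated.
    """
    conns = []
    for (i, j), rel in pairs.items():
        if rel == 'agonist':
            conns.append((f'CE{i}', f'CE{j}'))
            conns.append((f'CE{j}', f'CE{i}'))
        elif rel == 'antagonist':
            conns.append((f'CE{i}', f'CI{j}'))
            conns.append((f'CE{j}', f'CI{i}'))
    return conns
```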

Connections from A to C (the ‘short-loop reflex’) use the input correlation learning rule, analogous to the connections from A to M.

Direct connections from M to alpha motoneurons are not necessary for the model to reach, but they were included in the model because in higher primates these connections are present for distal joints (Lemon, 2008). Considering that bidirectional plasticity has been observed in corticomotoneural connections (Nishimura et al., 2013), we chose to endow them with the differential Hebbian rule of Verduzco-Flores et al., 2022.

Because timing is essential to support the conclusions of this paper, every connection has a transmission delay, and all firing rate neurons are modeled with ordinary differential equations.
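As a concrete illustration of this setup, a single firing-rate unit driven by a delayed input can be integrated with the forward Euler method; the time constant, delay, and gain below are placeholder values rather than the model's parameters:

```python
import math

def simulate_delayed_unit(inp, t_total=1.0, dt=0.001, tau=0.02,
                          delay=0.01, gain=1.0):
    """Forward-Euler integration of tau * dr/dt = -r + sigmoid(inp(t - delay)).

    inp: function of time giving the presynaptic drive. Drive from
    before t = 0 is treated as zero. Returns the rate trace as a list.
    """
    d_steps = int(round(delay / dt))
    n_steps = int(round(t_total / dt))
    rates, r = [], 0.0
    for step in range(n_steps):
        t_delayed = (step - d_steps) * dt
        drive = inp(t_delayed) if t_delayed >= 0.0 else 0.0
        r += dt / tau * (-r + 1.0 / (1.0 + math.exp(-gain * drive)))
        rates.append(r)
    return rates
```

With many such units coupled through distinct delays, the loop gain and delay lengths jointly determine whether the closed loop is stable or oscillates, which is why timing matters for the results below.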

All the results in this paper apply to the three configurations described above (spinal learning, cortical learning, and static network). To emphasize the robustness and potential of the learning mechanisms, in the Appendix we introduce two variations of the spinal learning model (in the Variations of the spinal learning model section). All results in the paper also apply to those two variations. In one of the variations (the ‘synergistic’ network), each spinal motoneuron stimulates two muscles rather than one. In the second variation (the ‘mixed errors’ network), the inputs from SPA to M are not one-to-one, but instead come from a matrix that combines multiple error signals as the input to each M unit.

Since most results apply to all configurations, and since results could depend on the random initial weights, we report simulation results using three means and three standard deviations (m1±σ1 | m2±σ2 | m3±σ3), with the understanding that these three value pairs correspond respectively to the spinal learning, cortical learning, and static network models. The statistics come from 20 independent simulations with different initial conditions.

A reference section in the Appendix (the Comparison of the 5 configurations section) summarizes the basic traits of all different model configurations (including the two variations of the spinal learning model), and compiles all their numerical results.

For each configuration, a single simulation was used to produce all the representative plots in different sections of the paper.

The model can reach by matching perceived and desired sensory activity

Reaches are performed by specifying an SP pattern equal to the SA activity when the hand is at the target. The acquisition of these SP patterns is not in the scope of this paper (but see Verduzco-Flores et al., 2022).

We created a set of random targets by sampling uniformly from the space of joint angles. Using this to set a different pattern in SP every 40 s, we allowed the arm to move freely during 16 SP target presentations. To encourage exploratory movements we used noise and two additional units described in the Methods.
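This sampling step can be sketched with a two-link planar arm, assuming 0.3 m links consistent with the Figure 3 geometry (shoulder at the origin, elbow at (0.3, 0)); the joint-angle bounds are hypothetical, since the paper does not state the ranges used:

```python
import math
import random

L1 = L2 = 0.3  # link lengths (m), inferred from the Figure 3 geometry

def sample_target(rng, q1_range=(0.0, math.pi), q2_range=(0.0, math.pi)):
    """Sample joint angles uniformly and return the resulting hand position.

    Sampling in joint space guarantees every target is reachable; the
    corresponding SP pattern would be the SA activity at that posture.
    """
    q1 = rng.uniform(*q1_range)  # shoulder angle
    q2 = rng.uniform(*q2_range)  # elbow angle, relative to the upper arm
    x = L1 * math.cos(q1) + L2 * math.cos(q1 + q2)
    y = L1 * math.sin(q1) + L2 * math.sin(q1 + q2)
    return x, y
```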

All model configurations were capable of reaching. To decide if reaching was learned in a trial, we took the average distance between the hand and the target (the average error) during the last four target presentations. Learning was considered achieved when this error was smaller than 10 cm.
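The criterion can be written directly; the function name and the use of meters are our own conventions:

```python
def reaching_learned(errors, threshold=0.10, last_n=4):
    """Apply the paper's learning criterion: the mean hand-target
    distance (in meters) over the last `last_n` target presentations
    must fall below the 10 cm threshold."""
    tail = errors[-last_n:]
    return sum(tail) / len(tail) < threshold
```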

The system learned to reach in 99 out of 100 trials (20 for each configuration). One simulation with the spinal learning model had an average error of 14 cm during the last 4 reaches of training. To assess the speed of learning we recorded the average number of target presentations required before the error became less than 10 cm for the first time. This average number of failed reaches before the first success was (1.8±2.0 | 1.2±0.9 | 0±0).

Figure 2A shows the error through 16 successive reaches (640 s of in silico time) in a typical case for the spinal learning model. A supplementary video (Appendix 1—Video 1) shows the arm’s movements during this simulation. Similar figures for the other configurations can be seen as figure supplements (Figure 2—figure supplements 1 and 2).

Representative training phase of a simulation for the spinal learning model.

(A) Distance between the target and the hand through 640 s of simulation, corresponding to 16 reaches to different targets. The horizontal dotted line corresponds to 10 cm. The times when SP changes are indicated with vertical, dotted yellow lines. Notice that the horizontal time axis is the same for all panels of this figure. The average error can be seen to decrease through the first two reaches. (B) Desired versus actual hand coordinates through the training phase. The straight lines denote the desired X (green) and Y (red) coordinates of the hand. The noisy orange and blue lines show the actual coordinates of the hand. (C) Activity of units 0 and 1 in SP and SA. This panel shows that the desired values in the SP units (straight dotted lines) start to be tracked by the perceived values. (D) Activity of M units 1, 2, and their duals. Notice that even when the error is close to zero the activity in the M units does not disappear. (E) Activity of the CE, CI, α trio for muscle 0. The intrinsic noise in the units causes ongoing activity. Moreover, the inhibitory activity (orange line) dominates the excitatory activity (blue line).

In Figure 2A, the error increases each time a new target is presented (yellow vertical lines), but as learning continues it is consistently reduced below 10 cm.

Panel B also shows the effect of learning, as the hand’s Cartesian coordinates eventually track the target coordinates whenever they change. This is also reflected as the activity in SA becoming similar to the activity in SP (panel C).

Panels D and E of Figure 2 show the activity of a few units in populations M and C during the 640 s of this training phase. During the first few reaches, M shows a large imbalance between the activity of units and their duals, reflecting larger errors. Eventually these activities balance out, leading to more homogeneous activity that may increase when a new target appears. M1 activation patterns that produce no movement are called null-space activity (Kaufman et al., 2014). In our case, this includes patterns where M units have the same activity as their duals. This, together with the noise and oscillations intrinsic to the system, causes the activity in M and C to never disappear.

In panel E, the noise in the C units becomes evident. It can also be seen that inhibition dominates excitation (due to CE to CI connections), which promotes stability in the circuit.

We tested whether any of the novel elements in the model were superfluous. To this end, we removed each of the elements individually and checked if the model could still learn to reach. In conclusion, removing individual elements generally deteriorated performance, but the factor that proved essential for all configurations with plasticity was the differential Hebbian learning in the connections from M to C or from SPA to M. For details, see the Appendix section titled The model fails when elements are removed.

Center-out reaching 1: The reach trajectories present traits of cerebellar ataxia

In order to compare our model with experimental data, after the training phase we began a standard center-out reaching task. Switching to this task merely consisted of presenting the targets in a different way, but for the sake of smoother trajectories we removed the noise from the units in C or M.

Figure 3A shows the eight peripheral targets around a hand rest position. Before reaching a peripheral target, a reach to the center target was performed, so the whole experiment was a single continuous simulation controlled by the SP pattern.

Center-out reaching.

(A) The arm at its resting position, with hand coordinates (0.3, 0.3) meters, where a center target is located. Eight peripheral targets (cyan dots) were located on a circle around the center target, with a 10 cm radius. The muscle lines, connecting the muscle insertion points, are shown in red. The shoulder is at the origin, whereas the elbow has coordinates (0.3, 0). Shoulder insertion points remain fixed. (B–F) Hand trajectories for all reaches in the different model configurations. The trajectory’s color indicates the target. Dotted lines show individual reaches, whereas thick lines indicate the average of the six reaches.

Peripheral targets were selected at random, each appearing six times. This produced 48 reaches (without counting reaches to the center), each one lasting 5 s. Panels B through D of Figure 3 show the trajectories followed by the hand in the three configurations. During these 48 reaches the average distance between the hand and the target was (3.3±.01|2.9±.001|2.9±.0003) centimeters.

Currently our system has neither cerebellum nor visual information. Lacking a ‘healthy’ model to make quantitative comparisons, we analyzed the reach trajectories and compared them to data from cerebellar patients.

For the sake of stability and simplicity, our system is configured to perform slow movements. Fast and slow reaches are different in cerebellar patients (Bastian et al., 1996). Slow reaches undershoot the target, follow longer hand paths, and show movement decomposition (joints appear to move one at a time). In Figure 3 the trajectories begin close to the 135 degree axis, indicating a slower response at the elbow joint. With the parameters used, the spinal learning and cortical learning models tend to undershoot the target, whereas in the static network the hand can oscillate around the target.

The traits of the trajectories can be affected by many hyperparameters in the model, but the dominant factor seems to be the gain in the control loop. Our model involves delays, activation latencies, momentum, and interaction torques. Unsurprisingly, increasing the gain leads to oscillations along with faster reaching. On the other hand, low gain leads to slow, stable reaching that often undershoots the target. Since we do not have a cerebellum to overcome this trade-off, the gain was the only hyperparameter that was manually adjusted for all configurations (See Methods). In particular, we adjusted the slope of the M and SA units so the system was stable, but close to the onset of oscillations. Gain was allowed to be a bit larger in the static network so oscillations could be observed. The figure supplements for Figure 3 show more examples of configurations with higher gain (See Gain and oscillations in Appendix 1 for details).

The shape of the trajectory also depends on the target. Different reach directions cause different interaction forces, and encounter different levels of viscoelastic resistance from the muscles.

Figure 4 reveals that the approach to the target is initially fast, but gradually slows down. Healthy subjects usually present a bell-shaped velocity profile, with some symmetry between acceleration and deceleration. This symmetry is lost with cerebellar ataxia (Becker et al., 1991; Gilman et al., 1976).

Distance to target and reach velocity through time for the three configurations.

Thick lines show the average over 48 reaches (8 targets, 6 repetitions). Filled stripes show standard deviation. For the spinal and cortical learning configurations (left and center plots) the hand initially moves quickly to the target, but the direction is biased, so it needs to gradually correct the error from this initial fast approach; most of the variance in error and velocity appears when these corrections cause small-amplitude oscillations. In the case of the static network (right plots) oscillations are ongoing, leading to a large variance in velocity.

We are not aware of center-out reaching studies for cerebellar patients in the dark, but (Day et al., 1998) does examine reaching in these conditions. Summarizing its findings:

  1. Movements were slow.

  2. The endpoints had small variation, but they had constant errors.

  3. Trajectories were longer and more circuitous, with most changes in direction during the last quarter.

  4. Trajectories to the same target showed variations.

From Figures 3 and 4 we can observe constant endpoint errors when the gain is low, in the spinal and cortical learning models. Circuitous trajectories with a pronounced turn around the end of the third quarter are also observed. Individual trajectories can present variations. A higher gain, as in the static network on the right plots, can increase these variations, as illustrated in the figure supplements for Appendix 1.

Center-out reaching 2: Directional tuning and preferred directions

To find whether directional tuning could arise during learning, we analyzed the M population activity for the 48 radial reaches described in the previous subsection.

For each of the 12 units in M, Figure 5A shows the mean firing rate of the unit when reaching each of the 8 targets. The red arrows show the Preferred Direction (PD) vectors that arise from these distributions of firing rates. For the sake of exposition, Figure 5 shows data for the simpler case of one-to-one connectivity between SPA and M in the spinal learning model, but these results generalize to the case when each M unit receives a linear combination of the SPA activities (the ‘mixed errors’ variation presented in the Variations of the spinal learning model section of the Appendix.)

Directional tuning of the units in M for a simulation with the spinal learning model.

(A) Average firing rate per target, and preferred direction (see Methods) for each of the 12 units in M. Each polar plot corresponds to a single unit, and each of the 8 purple wedges corresponds to one of the 8 targets. The length of a wedge indicates the mean firing rate when the hand was reaching the corresponding target. The red arrow indicates the direction and relative magnitude of the PD vector. The black arrow shows the predicted PD vector, in this case just the corresponding arrows from panel B. (B) For each muscle and target, a wedge shows the muscle’s length at rest position minus the length at the target, divided by the rest position length. The red arrow comes from the sum of the wedges taken as vectors, and represents the muscle’s direction of maximum contraction. Plots corresponding to antagonist muscles are connected by red lines. (C) Average activity of the 6 A units indicating muscle tension. The black arrows come from the sum of wedges taken as vectors, showing the relation between muscle tension and preferred direction.

We found that (11.8±.4|12±0|12±0) units were significantly tuned to reach direction (p<0.001, bootstrap test), with PD vectors of various lengths. The direction of the PD vectors is not mysterious. Each M unit controls the length error of one muscle. Figure 5B shows that the required contraction length depends on both the target and the muscle. The PD vectors of units 0–5 point to the targets that require the most contraction of their muscle. Units 6–11 are the duals of 0–5, and their PD is in the opposite direction. Figure 5C shows that the PD may also be inferred from the muscle activity, reflected as average tension.
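The PD vectors above can be obtained with the standard population-vector estimate: weight the unit vector pointing to each target by the unit's mean firing rate for that target, and sum. A minimal sketch (the normalization in our Methods may differ; the cosine-tuned example rates are invented for illustration):

```python
import numpy as np

def pd_vector(mean_rates, target_angles):
    # Rate-weighted vector sum over targets; the resulting vector
    # points in the unit's preferred direction.
    angles = np.asarray(target_angles, dtype=float)
    rates = np.asarray(mean_rates, dtype=float)
    return np.array([np.sum(rates * np.cos(angles)),
                     np.sum(rates * np.sin(angles))])

# Illustration: a unit with cosine tuning peaked at 0 degrees
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
rates = 1.0 + np.cos(angles)
pd = pd_vector(rates, angles)   # points along the 0-degree direction
```

Significance of the tuning can then be assessed by recomputing the PD after shuffling the rate-to-target assignment, as in a bootstrap test.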

In the case when each M unit receives a linear combination of SPA errors, its PD can be predicted using a linear combination of the ‘directions of maximum contraction’ shown in Figure 5B, using the same weights as the SPA inputs. When accounting for the length of the PD vectors, this can predict the PD angle with a coefficient of determination R² = (.74±.18|.88±.14|.86±.01).

As mentioned in the Introduction, the PDs of motor cortex neurons tend to align in particular directions (Scott et al., 2001). This is almost trivially true for this model, since the PD vectors are mainly produced by linear combinations of the vectors in Figure 5B.

Figure 6 shows the PD for all the M units in a representative simulation for each of the configurations. In every simulation, the PD distribution showed significant bimodality (p<0.001). The main axis of the PD distribution (see Methods) was (59±7|52±2|54±.5) degrees.

Preferred direction vectors for the 12 M units.

In all three plots the arrows denote the direction and magnitude of the preferred direction (PD) for an individual unit. The gray dotted lines show the main axis of the distribution. The red dotted lines are a 45 degree rotation of the gray lines, for comparison with Scott et al., 2001. It can be seen that all configurations display a strong bimodality, especially when considering the units with a larger PD vector. The axis where the PD vectors tend to aggregate is in roughly the same position for the three configurations.

To compare with (Scott et al., 2001) we rotate this line 45 degrees so the targets are in the same position relative to the shoulder (e.g. Lillicrap and Scott, 2013 Figure 1, Kurtzer et al., 2006 Figure 1). This places the average main axes above in a range between 99 and 104 degrees, comparable to the 117 degrees in Scott et al., 2001.

The study in Lillicrap and Scott, 2013 suggested that a rudimentary spinal cord feedback system should be used to understand why the PD distribution arises. Our model is the first to achieve this.

The PD vectors are not stationary, but experience random fluctuations that become more pronounced in new environments (Rokni et al., 2007; Padoa-Schioppa et al., 2004). The brain is constantly remodeling itself, without losing the ability to perform its critical operations (Chambers and Rumpel, 2017). Our model is continuously learning, so we tested the change in the PDs by setting 40 additional center-out reaches (no intrinsic noise) after the previous experiment, once for each configuration.

To encourage changes we set 10 different targets instead of 8. After a single trial for each configuration the change in angle for the 12 PD vectors had means and standard deviations of (3.3±2.4|4.9±2.1|.3±.2) degrees. Larger changes (around 7 degrees) could be observed in the ‘mixed errors’ variation of the model, presented in the Appendix (Variations of the spinal learning model section). We also measured the change in the preferred directions of the muscles, obtained as in Figure 5C. This yielded differences and standard deviations (3.8±2.1|6.4±2.9|.2±.2) degrees.

The average distance between hand and target during the 40 reaches was (3|3.6|2.9) cm, showing that the hand was still moving towards the targets, although with different errors due to their new locations.

Center-out reaching 3: Rotational dynamics

Using a dynamical systems perspective, (Shenoy et al., 2013) considers that the muscle activity m(t) (a vector function of time) arises from the cortical activity vector r(t) after it is transformed by the downstream circuitry:

(2) m(t)=G[r(t)].

The mapping G[] may comprise sophisticated controllers, but for the sake of simplicity it is considered static, omitting spinal cord plasticity. The cortical activity arises from a dynamical system:

(3) τr˙(t)=h(r(t))+u(t),

where u(t) represents inputs to motor cortex from other areas, and h() is a function that describes how the state of the system evolves.

A difficulty associated with Equation 3 is explaining how r(t) generates a desired muscle pattern m(t) when the function h() represents the dynamics of a recurrent neural network. One possibility is that M1 has intrinsic oscillators of various frequencies, and they combine their outputs to shape the desired pattern. This prompted the search for oscillatory activity in M1 while macaques performed center-out reaching motions. A brief oscillation (on the order of 200 ms, or 5 Hz) was indeed found in the population activity (Churchland et al., 2012), and the model in Sussillo et al., 2015 was able to reproduce this result, although this was done in the open-loop version of Equations 2 and 3, where u(t) contains no afferent feedback (this is further commented on in the Supplemental Discussion).

Recently it was shown that the oscillations in motor cortex can arise when considering the full sensorimotor loop, without the need of recurrent connections in motor cortex (Kalidindi et al., 2021). A natural question is whether our model can also reproduce the oscillations in Churchland et al., 2012 without requiring M1 oscillators or recurrent connections.

The analysis in Churchland et al., 2012 is centered around measuring the amount of rotation in the M1 population activity. The first step is to project the M1 activity vectors onto their first six principal components. These six components are then rotated so the evolution of the activity maximally resembles a pure rotation. These rotated components are called the ‘jPCA vectors’. The amount of variance in the M1 activity explained by the first two jPCA vectors is a measure of rotation. The Methods section provides more details of this procedure.
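The core of this procedure, fitting the best skew-symmetric (purely rotational) dynamics matrix to the projected activity, can be sketched as follows. This omits the PCA preprocessing and variance bookkeeping of the full jPCA method, and the function names are ours:

```python
import numpy as np

def fit_skew(X, Xdot):
    # Least-squares fit of Xdot ≈ X @ M with M constrained to be
    # skew-symmetric, parametrized by its k*(k-1)/2 free entries.
    k = X.shape[1]
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    A = np.zeros((X.size, len(pairs)))
    for l, (i, j) in enumerate(pairs):
        B = np.zeros((k, k))
        B[i, j], B[j, i] = 1.0, -1.0
        A[:, l] = (X @ B).ravel()
    c, *_ = np.linalg.lstsq(A, Xdot.ravel(), rcond=None)
    M = np.zeros((k, k))
    for l, (i, j) in enumerate(pairs):
        M[i, j], M[j, i] = c[l], -c[l]
    return M

def jpca_plane(M):
    # Plane of fastest rotation: real and imaginary parts of the
    # eigenvector whose eigenvalue has the largest imaginary part.
    w, v = np.linalg.eig(M)
    lead = np.argmax(np.abs(w.imag))
    u1, u2 = v[:, lead].real, v[:, lead].imag
    return u1 / np.linalg.norm(u1), u2 / np.linalg.norm(u2)
```

For data generated by a pure rotation, the fitted M recovers the rotation plane; applied to population activity, the variance captured by the plane spanned by the two returned vectors measures the rotational structure.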

Considering that we have a low-dimensional, non-spiking, slow-reaching model, we can only expect to qualitatively replicate the essential result in Churchland et al., 2012, which is that most of the variance is contained in the first jPCA plane.

We replicated the jPCA analysis, with adjustments to account for the smaller number of neurons, the slower dynamics, and the fact that there is no delay period before the reach (See Methods). The result can be observed in Figure 7, where 8 trajectories are seen in the plots. Each trajectory is the average activity of the 12 M units when reaching to one of the 8 targets, projected onto the jPCA plane. The signature of a rotational structure in these plots is that most trajectories circulate in a counterclockwise direction. Quantitatively, the first jPCA plane (out of six) captures (.42±.04|.42±.04|.46±.03) of the variance.

Rotational dynamics in the M population in a representative simulation for all configurations.

Each plot shows the first two jPCA components during 0.25 s, for each of the 8 conditions/targets. Traces are colored according to the magnitude of their initial jPCA1 component, from smallest (green) to largest (red).

With this analysis we show that our model does not require intrinsic oscillations in motor cortex to produce rotational dynamics, in agreement with (Kalidindi et al., 2021; DeWolf et al., 2016).

The effect of changing the mass

Physical properties of the arm can change, not only as the arm grows, but also when tools or new environments come into play. As a quick test of whether the properties in this paper are robust to moderate changes, we changed the mass of the arm and forearm from 1 to 0.8 kg and ran one simulation for each of the five configurations.

With a lighter arm the average errors during center-out reaching were (2.5|3.2|3) cm. The hand trajectories with a reduced mass can be seen in the top 3 plots of Figure 8. We can observe that the spinal learning model slightly reduced its mean error, whereas the cortical learning model increased it. This can be understood by noticing that a reduction in mass is akin to an increase in gain. The spinal learning model with its original gain was below the threshold of oscillations at the endpoint, and a slight mass decrease did not change this. The cortical learning model with the original gain was already oscillating slightly, and an increase in gain increased the oscillations.

Hand trajectories with low mass (0.8 kg, top 3 plots) and high mass (1.2 kg, bottom 3 plots) for the 3 configurations.

Plots are as in Figure 3. The spinal learning model and the static network show qualitatively similar trajectories compared to those in Figure 3. In contrast, the cortical learning model began to display considerable endpoint oscillations for several targets after its mass was reduced. These oscillations persist after the mass has been increased.

In the same simulation, after the center-out reaching was completed, we once more modified the mass of the arm and forearm, from 0.8 to 1.2 kg, after which we began the center-out reaching again. This time the center-out reaching errors were (2.4|3.3|2.9) cm. The hand trajectories for this high mass condition are in the bottom 3 plots in Figure 8. It can be seen that the spinal learning and cortical learning models retained their respectively improved and decreased performance, whereas the static network performed roughly the same for all mass conditions. A tentative explanation is that with reduced mass the synaptic learning rules tried to compensate for faster movements with weights that effectively increased the gain in the loop. After the mass was increased these weights did not immediately revert, leading to similar trajectories after the increase in mass.

The results of the paper still held after our mass manipulations. For all configurations, PD vectors could be predicted with a coefficient of determination between .74 and .92; all units in M were significantly tuned to direction; the main axis of the PD distribution ranged between 56 and 61 degrees, and the first jPCA plane captured between 33% and 58% of the variance.

Spinal stimulation produces convergent direction fields

Due to the viscoelastic properties of the muscles, the mechanical system without active muscle contraction will have a fixed point with lowest potential energy at the arm’s rest position. Limited amounts of muscle contraction shift the position of that fixed point. This led us to question whether this could produce convergent force fields, which as discussed before are candidate motor primitives, and have been found experimentally.

To simulate local stimulation of an isolated spinal cord we removed all neuronal populations except for those in C, and applied inputs to the individual pairs of CE,CI units projecting to the same motoneuron. Doing this for different starting positions of the hand, and recording its initial direction of motion, produces a direction field. A direction field maps each initial hand location to a vector pointing in the average direction of the force that initially moves the hand.

The first two panels of Figure 9 show the result of stimulating individual E-I pairs in C, which will indeed produce direction fields with different fixed points.

Two sample direction fields and their linear addition for circuit C1.

(A) Direction Field (DF) from stimulation of the interneurons for muscle 0 (biarticular biceps). The approximate location of the fixed point is shown with a blue dot. (B) DF from stimulation of muscle 3 (biarticular triceps) interneurons. A red dot shows the fixed point. (C) Panels A and B overlapped. (D) In green, the DF from stimulating the interneurons for muscles 0 and 3 together. In purple, the sum of the DFs from panels A and B. Dots show the fixed points. The average angle between the green and purple vectors is 4 degrees.

We found that these direction fields add approximately linearly (Figure 9D). More precisely, let D(a+b) be the direction field from stimulating spinal locations a and b simultaneously, and let \alpha_{a+b}(x,y) be the angle of D(a+b) at hand coordinates (x,y). With analogous definitions for D(a) and D(b), we say the direction fields add linearly if, for all (x,y), \alpha_{a+b}(x,y) equals the angle of the summed vector D(a)(x,y) + D(b)(x,y).

We define the mean angle difference between D(a+b) and D(a)+D(b) as

(4) \gamma_{a,b} = \frac{1}{N_s} \sum_{x,y} \left| \alpha_{a+b}(x,y) - \angle\left[ D(a)(x,y) + D(b)(x,y) \right] \right|,

where Ns is the number of (x,y) sample points. We found that when averaged over the 15 (C1) or 144 (C2) possible (a,b) pairs, the mean of γa,b was 13.5 degrees.

Randomly choosing two possibly different pairs (a,b) and (c,d) for the stimulation locations leads to a mean angle difference of 37.6 degrees between the fields D(a+b) and D(c)+D(d). A bootstrap test showed that these angles are significantly larger (p<0.0001) than in the previous case where (a,b)=(c,d).

The resting field is defined as the direction field when no units are stimulated. Removing the resting field from D(a+b),D(a), and D(b) does not alter these results.
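The comparison behind these numbers can be sketched as follows, with each direction field stored as an array of one vector per sampled hand position (a simplified reimplementation, not the simulation code):

```python
import numpy as np

def mean_angle_diff(D_ab, D_a, D_b):
    # Mean absolute angular difference (in degrees) between the field
    # from joint stimulation and the vector sum of the individual
    # fields; each argument is an (Ns, 2) array of direction vectors.
    ang = lambda F: np.arctan2(F[:, 1], F[:, 0])
    diff = ang(D_ab) - ang(D_a + D_b)
    diff = (diff + np.pi) % (2.0 * np.pi) - np.pi   # wrap to [-pi, pi)
    return np.degrees(np.mean(np.abs(diff)))
```

If the fields add exactly linearly the function returns 0; rotating the joint field by a fixed angle returns that angle.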

Recent macaque forelimb experiments (Yaron et al., 2020) show that the magnitude of the vectors in the D(a+b) fields is larger than expected from D(a)+D(b) (supralinear summation). We found no evidence for this effect, suggesting that it depends on mechanisms beyond those present in our model.

Discussion

Summary of findings and predictions

We have presented a model of the long loop reflex with a main assumption: negative feedback configured with two differential Hebbian learning rules. One novel rule sets the loop’s input-output structure, and the other rule (input correlation) promotes stability. We showed that this model can make arm reaches by trying to perceive a given afferent pattern.

Our study made two main points:

  1. Many experimental phenomena emerge from a feedback controller with minimally-complete musculoskeletal and neural models (emphasis is placed on the balance between excitation and inhibition).

  2. Even if the feedback controller has multiple inputs and outputs, its input-output structure can be flexibly configured by a differential Hebbian learning rule, as long as errors are monotonic.

The first main point above was made using a feedback control network with no learning (called the static network in the Results). We showed that in this static network: (1) reaching trajectories are similar to models of cerebellar ataxia, (2) motor cortex units are tuned to preferred directions, (3) those preferred directions follow a bimodal distribution, (4) motor cortex units present rotational dynamics, (5) reaching is still possible when mass is altered, and (6) spinal stimulation produces convergent direction fields.

The second main point was made using two separate models, both using the same differential Hebbian learning rules, but applied at different locations. The spinal learning model presents the hypothesis that the spinal cord learns to adaptively configure the input-output structure of the feedback controller. The cortical learning model posits that configuring this structure could instead be a function of motor cortex; this would not disrupt our central claims. These two models should not be considered as incompatible hypotheses. Different elements performing overlapping functions are common in biological systems (Edelman and Gally, 2001).

Two variations of the spinal learning model in the Appendix show that this learning mechanism is quite flexible, opening the doors for certain types of synergies, and for more complex errors (that still maintain the constraint of monotonicity).

We list some properties of the model, and possible implications:

  • Basic arm reaching happens through negative feedback, trying to perceive a target value set in cortex. Learning the input-output structure of the feedback controller may require spinal cord plasticity.

    • Cerebellar patients should not be able to adapt to tasks that require fast reactions, as negative feedback alone cannot compensate for delays in the system (Sanguineti et al., 2003). On the other hand, they should be able to learn tasks that require remapping afferent inputs to movements. One example is Richter et al., 2004, where cerebellar patients learned to move in a novel dynamic environment, but their movements were less precise than those of controls.

  • The shape of reaches is dominated by mechanical and viscoelastic properties of the arm and muscles.

    • Unfamiliar viscous forces as in Richter et al., 2004 should predictably alter the trajectory (Figure 3) for cerebellar patients, who should not be able to adapt unless they move slowly and are explicitly compensating.

  • Preferred Directions (PDs) in motor cortex happen because muscles need to contract more when reaching in certain directions.

    • The PD distribution should align with the directions where the muscles need to contract to reduce the error. These directions depend on which error the unit encodes. If the error is not related to reaching (e.g. related to haptic feedback), a different PD distribution may arise after overtraining.

    • Drift in the PD vectors comes from the ongoing adaptation, and it should not disrupt performance.

  • The oscillations intrinsic to delayed feedback control after the onset of a target are sufficient to explain the quasi-oscillations observed in motor cortex (Churchland et al., 2012; Kalidindi et al., 2021).

  • Convergent force fields happen naturally in musculoskeletal systems when there is balance in the stimulation between agonists and antagonists. Linear addition of force fields is a result of forces/torques adding linearly.

Since our relatively simple model reproduces these phenomena, we believe it constitutes a good null hypothesis for them. But beyond explaining experimental observations, this model makes inroads into the hard problem of how the central nervous system (CNS) can generate effective control signals, recently dubbed the ‘supraspinal pattern formation’ problem (Bizzi and Ajemian, 2020). From our perspective, the CNS does not need to generate precise activation patterns for muscles and synergies; it needs to figure out which perceptions need to change. It is subcortical structures that learn the movement details. The key to make such a model work is the differential Hebbian learning framework in Verduzco-Flores et al., 2022, which handles the final credit assignment problem.

We chose not to include a model of the cerebellum at this stage. Our model reflects the brain structure of an infant who can make clumsy reaching movements. At birth the cerebellum is incomplete and presumably not functional. It requires structured input from spinal cord and cortex to establish correct synaptic connections during postnatal development, and will contribute to smooth reaching movements at a later age.

Encompassing function, learning, and experimental phenomena in a single simple model is a promising start towards a more integrated computational neuroscience. We consider that such models have the potential to steer complex large-scale models so they can also achieve learning and functionality from scratch.

Methods

Simulations were run in the Draculab simulator (Verduzco-Flores and De Schutter, 2019). All the parameters from the equations in this paper are presented in the Appendix. Parameters not shown can be obtained from Python dictionaries in the source code. This code can be downloaded from: https://gitlab.com/sergio.verduzco/public_materials/-/tree/master/adaptive_plasticity.

Unit equations

With the exception of the A and SP populations, the activity ui of any unit in Figure 1 has dynamics:

(5) \tau_u\, \dot{u}_i = \sigma(I) - u_i,
(6) \sigma(I) = \frac{1}{1 + \exp\left(-\beta(I - \eta)\right)},

where \tau_u is a time constant, \beta is the slope of the sigmoidal function, \eta is its threshold, and I = \sum_j \omega_{ij} u_j(t - \Delta t_j) is the sum of delayed inputs multiplied by their synaptic weights.
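As an illustration, these unit dynamics can be integrated for a single unit with a forward-Euler scheme; the parameter values and the way delayed inputs are looked up are placeholders, not the model's:

```python
import numpy as np

def simulate_unit(weights, delays, inputs, dt=1e-3, tau=0.02,
                  beta=4.0, eta=0.5, t_end=1.0):
    # Forward-Euler integration of Equations 5-6 for one unit.
    # `inputs(t)` must return the vector of presynaptic activities
    # at time t (including t < 0, as the initial history).
    steps = int(t_end / dt)
    u = 0.0
    trace = np.empty(steps)
    for n in range(steps):
        t = n * dt
        # I: delayed inputs times synaptic weights
        I = sum(w * inputs(t - d)[j]
                for j, (w, d) in enumerate(zip(weights, delays)))
        sig = 1.0 / (1.0 + np.exp(-beta * (I - eta)))   # Equation 6
        u += (dt / tau) * (sig - u)                     # Equation 5
        trace[n] = u
    return trace
```

With a constant input, the activity relaxes toward the sigmoid of the summed input, as the fixed point of Equation 5.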

Units in the CE,CI populations (in the spinal learning model) or in M (in the cortical learning model) had an additional noise term, which turned Equation 5 into this Langevin equation:

(7) du_i(t) = \frac{1}{\tau_u}\left(\sigma(I) - u_i(t)\right)dt + \varsigma\, dW(t),

where W(t) is a Wiener process with unit variance, and ς is a parameter to control the noise amplitude. This equation was solved using the Euler-Maruyama method. All other unit equations were integrated using the forward Euler method. The equations for the plant and the muscles were integrated with SciPy’s (https://scipy.org/) explicit Runge-Kutta 5(4) method.
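A single Euler-Maruyama step for Equation 7 amounts to the deterministic Euler update plus a Wiener increment drawn with standard deviation sqrt(dt); a sketch (interface and names are ours):

```python
import numpy as np

def euler_maruyama_step(u, sig_I, dt, tau, noise_amp, rng):
    # One step of Equation 7: deterministic drift (sigma(I) - u)/tau
    # plus a Wiener increment dW ~ N(0, dt), scaled by the noise
    # amplitude (varsigma in the text).
    dW = rng.normal(0.0, np.sqrt(dt))
    return u + dt * (sig_I - u) / tau + noise_amp * dW
```

With the noise amplitude set to zero, the step reduces to plain forward Euler.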

Units in the A population use a rectified logarithm activation function, leading to these dynamics for their activity:

(8) \tau_a\, \dot{a} = \log\left(\left[1 + I - T\right]_+\right) - a,

where τa is a time constant, I is the scaled sum of inputs, T is a threshold, and [x]+=max(x,0) is the "positive part" function.

Learning rules

The learning rule for the connections from M to CE,CI units in the spinal learning model was first described in Verduzco-Flores et al., 2022. It has an equation:

(9) \dot{\omega}_{ij}(t) = -\left(\ddot{e}_j(t) - \langle \ddot{e}(t) \rangle\right)\left(\dot{c}_i(t - \Delta t) - \langle \dot{c}(t - \Delta t) \rangle\right).

In this equation, e_j(t) represents the activity of the j-th unit in M at time t, and \ddot{e}_j(t) is its second derivative. Angle brackets denote population averages, so that \langle \ddot{e} \rangle \equiv \frac{1}{N_M} \sum_k \ddot{e}_k, where N_M is the number of M units. \dot{c}_i(t) is the derivative of the activity of the postsynaptic unit, and \Delta t is a time delay ensuring that the rule captures the proper temporal causality. In the Supplementary Discussion of the Appendix we elaborate on how such a learning rule could be present in the spinal cord.

The learning rule in Equation 9 was also augmented with soft weight-bounding to prevent connections from changing sign, and multiplicative normalization was used to control the magnitude of the weights by ensuring two requirements: (1) all weights from projections of the same M unit should add to wsa; (2) all weights ending at the same C unit should add to wsb. With this, the learning rule adopted the form:

(10) \dot{\omega}_{ij} = \alpha\, \omega_{ij} \left( \Delta + \lambda \left[ 0.5 \left( \zeta_{sa} + \zeta_{sb} \right) - 1 \right] \right).

In this equation α is a constant learning rate, Δ is the right-hand side expression of Equation 9, and λ is a scalar parameter. The value ζsa is wsa divided by the sum of outgoing weights from the j-th M unit, and ζsb is wsb divided by the sum of incoming M weights on ci. This type of normalization is meant to reflect the competition for resources among synapses, both at the presynaptic and postsynaptic level.
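Equations 9 and 10 can be condensed into a single discrete update; the sketch below (our variable names, illustrative default constants) applies the covariance-of-derivatives term, soft weight-bounding through the multiplicative factor W, and the two-sided normalization:

```python
import numpy as np

def diff_hebbian_update(W, e_dd, c_d_delayed, alpha=0.1, lam=0.1,
                        w_sa=1.0, w_sb=1.0):
    # W[i, j]: weight from M unit j onto C unit i.
    # e_dd: second derivatives of the M activities (Equation 9).
    # c_d_delayed: delayed first derivatives of the C activities.
    # One Euler step of Equation 10, with dt folded into alpha.
    delta = -np.outer(c_d_delayed - c_d_delayed.mean(),
                      e_dd - e_dd.mean())               # Equation 9
    zeta_sa = w_sa / W.sum(axis=0, keepdims=True)   # presynaptic sums
    zeta_sb = w_sb / W.sum(axis=1, keepdims=True)   # postsynaptic sums
    norm = lam * (0.5 * (zeta_sa + zeta_sb) - 1.0)  # normalization
    return W + alpha * W * (delta + norm)   # soft bounding: factor W
```

When both sums already equal their targets and the derivative terms vanish, the update leaves the weights unchanged; correlated accelerations and velocities above their population means depress the corresponding weight.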

The synapses in the connections from A to M and from A to C used the input correlation rule (Porr and Wörgötter, 2006):

(11) \dot{w} = \alpha_{IC}\, w\, I_A\, \dot{I}_{PA},

where I_A is the scaled sum of inputs from the A population, \alpha_{IC} is the learning rate, I_{PA} is the scaled sum of inputs from SPA or M, and \dot{I}_{PA} is its derivative. Unlike the original input correlation rule, this rule uses soft weight bounding to avoid weights changing signs. Moreover, the sum of the weights was kept close to a value \omega_s. In practice this meant dividing each individual w value by the sum of weights from A-to-M (or A-to-C) connections, and multiplying by \omega_s at each update. In addition, weight clipping was used to keep individual weights below a value \omega_{max}.

The learning rule in the cortical learning model was the same, but the presynaptic units were in SPA, and the postsynaptic units in M.
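The modified input correlation rule, with its weight-bounding, sum normalization, and clipping, can be sketched as one discrete update (vectorized over the weights arriving from A; names and constants are ours):

```python
import numpy as np

def input_corr_update(w, I_A, dI_PA, alpha_ic, w_total, w_max, dt):
    # w: weights of the A-to-M (or A-to-C) synapses on one unit.
    # I_A: (scaled) presynaptic A inputs; dI_PA: derivative of the
    # scaled sum of SPA or M inputs.
    w = w + dt * alpha_ic * w * I_A * dI_PA   # Equation 11, bounded by w
    w = w * (w_total / w.sum())               # keep the sum near w_total
    return np.minimum(w, w_max)               # clip individual weights
```

The normalization keeps the weight vector's sum at its target whenever no weight hits the clipping bound.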

Exploratory mechanism

Without any additional mechanisms the model risked getting stuck in a fixed arm position before it could learn. We included two mechanisms to permit exploration in the system. We describe these two mechanisms as they were applied to the spinal learning model and its two variations. The description below also applies to the case of the cortical learning model, with the M units (instead of the C units) receiving the noise and extra connections.

The first exploratory mechanism consists of intrinsic noise in the CE and CI interneurons, which causes low-amplitude oscillations in the arm. We have observed that intrinsic oscillations in the CE,CI units are also effective to allow learning (data not shown), but the option of intrinsic noise permits the use of simple sigmoidal units in C, and contributes to the discussion regarding the role of noise in neural computation.

The second mechanism for exploration consists of an additional unit, called ACT. This unit acted similarly to a leaky integrator of the total activity in SPA, reflecting the total error. If the leaky integral of the SPA activity crossed a threshold, then ACT would send a signal to all the CE and CI units, causing adaptation. The adaptation consisted of an inhibitory current that grew depending on the accumulated previous activity.

To model this, CE and CI units received an extra input I_{adapt}. When the input from the ACT unit was larger than 0.8, and I_{adapt} < 0.2, the value of I_{adapt} would be set to (u_i^{slow})^2, the square of a low-pass filtered version of u_i. More explicitly,

(12) \tau_{slow}\, \dot{u}_i^{slow} = u_i - u_i^{slow}.

If the input from ACT was smaller than 0.8, or Iadapt became larger than 0.2, then Iadapt would decay towards zero:

(13) \tau_{slow}\, \dot{I}_{adapt} = -I_{adapt}.

With this mechanism, if the arm got stuck then error would accumulate, leading to adaptation in the spinal interneurons. This would cause the most active interneurons to receive the most inhibition, shifting the ‘dominant’ activities, and producing larger amplitude exploratory oscillations.
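The switching logic for the adaptation current can be summarized in a few lines; the 0.8 and 0.2 thresholds are those given above, while the function interface is ours:

```python
def adapt_step(I_adapt, u_slow, act_input, dt, tau_slow):
    # u_slow: low-pass filtered activity of the unit (Equation 12).
    if act_input > 0.8 and I_adapt < 0.2:
        return u_slow ** 2                       # adaptation switches on
    return I_adapt - dt * I_adapt / tau_slow     # Equation 13: decay
```

Units that were recently most active thus receive the strongest adaptation current, shifting the dominant activities in the circuit.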

When a new target is presented, ACT must reset its own activity back to a low value. Given our requirement to fully implement the controller using neural elements, we needed a way to detect changes in SP. A unit denominated CHG can detect these changes using synapses that react to the derivative of the activity in SP units. CHG was connected to ACT in order to reset its activity.

More precisely, when inputs from CHG were larger than 0.1, the activity of ACT had dynamics:

(14) $\dot{a}(t) = -40\,a(t)$.

Otherwise it had these dynamics:

(15) $\dot{a}(t) = a(t)\left(\sigma(I) - \theta_{ACT}\right)$, if $\sigma(I) < \theta_{ACT}$,
(16) $\tau_{ACT}\,\dot{a}(t) = \left(\sigma(I) - \theta_{ACT}\right)\left[1 - a(t) + \gamma\,\dot{\sigma}(I)\right]$, otherwise.

As before, $\sigma(\cdot)$ is a sigmoidal function, and $I$ is the scaled sum of inputs other than those from CHG. When $\sigma(I)$ is smaller than a threshold $\theta_{ACT}$ the value of $a$ decreases, as the error is deemed small enough. When $\sigma(I) > \theta_{ACT}$ the activity increases, but the rate of increase is modulated by $\dot{\sigma}(I) \approx \sigma(I) - \sigma(\tilde{I})$, where $\tilde{I}$ is a low-pass filtered version of $I$, and $\gamma$ is a constant parameter.

CHG was a standard sigmoidal unit receiving inputs from SP, with each synaptic weight obeying this equation:

(17) $\dot{\omega}_j(t) = \alpha\,|\dot{s}_j(t)| - \omega_j(t)$,

where sj represents the synapse’s presynaptic input.

Plant, muscles, afferents

The planar arm was modeled as a compound double pendulum, where both the arm and forearm were cylinders with a mass of 1 kg. No gravity was present, and a moderate amount of viscous friction (3 N·m·s/rad) was added at each joint. The derivation and validation of the double pendulum’s equations can be consulted in a Jupyter notebook included with Draculab’s source code (in the tests folder).

The muscles used a standard Hill-type model, as described in Shadmehr and Wise, 2005, Pg. 99. The muscle’s tension T obeys:

(18) $\dot{T} = \frac{K_{SE}}{b}\left[g I + K_{PE}\,\Delta x + b\dot{x} - \left(1 + \frac{K_{PE}}{K_{SE}}\right) T\right]$,

where $I$ is the input, $g$ an input gain, $K_{PE}$ the parallel elasticity constant, $K_{SE}$ the series elasticity constant, $b$ the damping constant for the parallel element, $x$ the length of the muscle, and $\Delta x = x - x_1^* - x_2^*$. Here, $x_1^*$ is the resting length of the series element, whereas $x_2^*$ is the resting length of the parallel element. All resting lengths were calculated from the steady state when the hand was located at coordinates (0.3, 0.3).
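Equation 18 can be integrated numerically with forward Euler. The sketch below uses illustrative placeholder constants, not the paper's fitted values; at steady state with constant length and no input, the tension settles at $K_{PE}\,\Delta x / (1 + K_{PE}/K_{SE})$.

```python
import numpy as np

def muscle_tension(x, xdot, I, dt=0.0005, g=1.0, k_se=100.0,
                   k_pe=50.0, b=10.0, x1_rest=0.3, x2_rest=0.3):
    """Euler integration of the Hill-type tension equation (Eq. 18).

    x, xdot, I: arrays of muscle length, length velocity, and input.
    All constants are illustrative, not the model's actual parameters.
    """
    T = 0.0
    out = []
    for xt, vt, it in zip(x, xdot, I):
        dx = xt - x1_rest - x2_rest   # Δx relative to resting lengths
        dT = (k_se / b) * (g * it + k_pe * dx + b * vt
                           - (1.0 + k_pe / k_se) * T)
        T += dt * dT
        out.append(T)
    return np.array(out)
```

For example, holding the muscle at a constant stretch of $\Delta x = 0.1$ with zero input drives the tension to $50 \cdot 0.1 / 1.5 \approx 3.33$.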

We created a model of the Ia and II afferents using simple structural elements. For each muscle, this model includes one dynamic nuclear bag fiber and one static bag fiber. Both fibers use the same tension equation as the muscle, but with different parameters. For the static bag fiber:

(19) $\dot{T}^s = \frac{K_{SE}^s}{b^s}\left[K_{PE}^s\,\Delta x + b^s\dot{x} - \left(1 + \frac{K_{PE}^s}{K_{SE}^s}\right) T^s\right]$.

The dynamic bag fiber uses the same equation, with the $s$ superscript replaced by $d$. No inputs were applied to the static or dynamic bag fibers, so the input term was removed from these equations. The resting lengths of the static and dynamic bag fibers were those of their corresponding muscles times factors $l_0^s, l_0^d$, respectively.

The Ia afferent output is proportional to a linear combination of the lengths for the serial elements in both dynamic and static bag fibers. The II output has two components, one proportional to the length of the serial element, and one approximately proportional to the length of the parallel element, both in the static bag fiber. In practice this was implemented through the following equations:

(20) $Ia = g_{Ia}\left[\left(\frac{f_s^{Ia}}{K_{SE}^s}\right) T^s + \left(\frac{1 - f_s^{Ia}}{K_{SE}^d}\right) T^d\right]$,
(21) $II = g_{II}\left[\left(\frac{f_s^{II}}{K_{SE}^s}\right) T^s + \left(\frac{1 - f_s^{II}}{K_{PE}^s}\right)\left(T^s - b^s\dot{x}\right)\right]$.

Here, $g_{Ia}$ and $g_{II}$ are gain factors, and $f_s^{Ia}$, $f_s^{II}$ are constants determining the fraction of the Ia and II outputs that comes from the serial element.

The model of the Golgi tendon organ producing the Ib outputs was taken from Lin and Crago, 2002. First, a rectified tension was obtained as:

(22) $r = g_{Ib}\,\log\left(T^{+}/T_0 + 1\right)$.

$g_{Ib}$ is a gain factor, $T_0$ is a constant that can further adjust the slope, and $T^{+} = \max(T, 0)$ is the half-rectified tension. The Ib afferent output followed the dynamics:

(23) $\tau_{Ib}\,\dot{Ib} = r - Ib$.
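The two-stage Ib model (log compression of tension followed by first-order low-pass dynamics, Equations 22–23) can be sketched with forward Euler; the gain, $T_0$, and time constant below are illustrative placeholders.

```python
import numpy as np

def ib_output(T, dt=0.001, g_ib=1.0, T0=1.0, tau_ib=0.05):
    """Golgi tendon organ sketch (Eqs. 22-23): half-rectify and
    log-compress the tension, then low-pass filter the result.
    Parameters are illustrative, not the model's fitted values."""
    ib = 0.0
    out = []
    for t in T:
        r = g_ib * np.log(max(t, 0.0) / T0 + 1.0)  # Eq. 22
        ib += dt * (r - ib) / tau_ib               # Eq. 23
        out.append(ib)
    return np.array(out)
```

With a constant tension of $e - 1$ and $T_0 = 1$, the compressed drive is exactly 1, and the output converges to 1 after a few time constants.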

Static connections

In all cases, the connections to SA used one-to-one connectivity with the A units driven by the II afferents, whereas connections from A to M and C used all-to-all projections from the units driven by the Ia and Ib afferents. Projections from SA to SPA used one-to-one excitatory connections to the first six units, and inhibitory projections to the next six units. Projections from SP to SPA used the opposite signs.

Connections from SPA to M were one-to-one, so the j-th unit in SPA only sent a projection to unit j in M. A variation of this connectivity is presented in the Appendix (See Variations of the spinal learning model).

We now explain how we adjusted the synaptic weights of the static network. To understand the projections from M to C and to the alpha motoneurons it is useful to remember that each CE,CI,α trio is associated with one muscle, and the M units also control the error of a single muscle. This error indicates that the muscle is longer than desired. Thus, the M unit associated with muscle i sent excitatory projections to the CE and α units associated with muscle i, and to the CI units of the antagonists of i. Additionally, weaker projections were sent to the CE,α units of muscle i’s agonists. Notice that only excitatory connections were used.

The reverse logic was used to set the connections from A to C and M. If muscle i is tensing or elongating, this can predict an increase in the error for its antagonists, which is the kind of signal that the input correlation rule is meant to detect. Therefore, the Ib afferent (signaling tension) of muscle i sent an excitatory signal to the CI unit associated with muscle i, and to the CE,α units associated with i’s antagonists. Moreover, this Ib afferent also sent an excitatory projection to the dual of the M unit associated with muscle i. Connections from Ia afferents (roughly signaling elongation speed) followed the same pattern, but with slightly smaller connection strengths.

Rotational dynamics

We explain the method to project the activity of M onto the jPCA plane. For all units in M we considered the activity during a 0.5 s sample beginning 50 ms after target onset. Unlike Churchland et al., 2012, we did not apply PCA preprocessing, since we only have 12 units in M. Let $m_{i,j,k,t}$ be the activity at time $t$ of unit $i$ in M, when reaching to target $j$ for the $k$-th repetition. By $\langle m_{i,j,k,t} \rangle_k$ we denote the average over all repeated reaches to the same target, and by $\langle m_{i,j,k,t} \rangle_{j,k}$ we indicate averaging over both targets and repetitions. The normalized average trace per condition is defined as: $m_{i,j}(t) \equiv \langle m_{i,j,k,t} \rangle_k - \langle m_{i,j,k,t} \rangle_{j,k}$. Let $I$ stand for the number of units in M, $T$ for the number of time points, and $J$ for the number of targets. Following Churchland et al., 2012, we unroll the set of $m_{i,j}(t)$ values into a matrix $X \in \mathbb{R}^{JT \times I}$, so we may represent the data through a matrix $M$ that provides the least-squares solution to the problem $\dot{X} = XM$. This solution comes from the equation $\hat{M} = (X^T X)^{-1} X^T \dot{X}$. Furthermore, this matrix can be decomposed into symmetric and anti-symmetric components $M_{symm} = (\hat{M} + \hat{M}^T)/2$, $M_{skew} = (\hat{M} - \hat{M}^T)/2$. The jPCA plane comes from the complex-conjugate eigenvalues of $M_{skew}$.

In practice, our source code follows the detailed explanation provided in the Supplementary Information of Churchland et al., 2012, which reformulates this matrix problem as a vector problem.
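The matrix formulation can be sketched directly in NumPy. This is a minimal sketch of the procedure described above, not the paper's source code: it solves the least-squares problem, takes the skew-symmetric part, and spans the jPCA plane with the real and imaginary parts of the eigenvector whose eigenvalue has the largest imaginary magnitude.

```python
import numpy as np

def jpca_plane(X, Xdot):
    """Return the skew-symmetric dynamics matrix and an orthonormal
    basis of the jPCA plane, from data X (JT x I) and its derivative."""
    # Least-squares solution of Xdot = X @ M
    M, *_ = np.linalg.lstsq(X, Xdot, rcond=None)
    M_skew = (M - M.T) / 2.0                 # anti-symmetric component
    evals, evecs = np.linalg.eig(M_skew)     # purely imaginary eigenvalues
    k = np.argmax(np.abs(evals.imag))        # strongest rotational pair
    v = evecs[:, k]
    plane = np.stack([v.real, v.imag])       # two real spanning vectors
    q, _ = np.linalg.qr(plane.T)             # orthonormalize the basis
    return M_skew, q
```

On synthetic data generated by a known skew-symmetric matrix, the sketch recovers that matrix exactly and returns an orthonormal two-dimensional basis.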

Parameter search

We kept all parameter values in a range where they still made biological sense. Parameter values that were not constrained by biological data were adjusted using a genetic algorithm and particle swarm optimization (PSO). We used a separate optimization run for each configuration, consisting of roughly 30 iterations of the genetic and PSO algorithms, with population sizes of 90 and 45 individuals, respectively. After this we manually adjusted the gain of the control loop by increasing or decreasing the slope of the sigmoidal units in the M and SA populations. This is further described in the Appendix (Gain and oscillations section).

The parameters used can affect the results in the paper. We chose parameters that minimized either the error during the second half of the learning phase, or the error during center-out reaching. Both of these measures are agnostic to the other results.

Preferred direction vectors

Next we describe how PD vectors were obtained for the M units.

Let $m_j^k$ denote the firing rate of the $j$-th M unit when reaching for the $k$-th target, averaged over 4 s and across reaches to the same target. We created a function $h_j: \mathbb{R}^2 \to \mathbb{R}$ that mapped the X, Y coordinates of each target to its corresponding $m_j^k$ value, but in the domain of $h_j$ the coordinates were shifted so the center location was at the origin.

Next we approximated hj with a plane, using the least squares method, and obtained a unit vector uj normal to that plane, starting at the intersection of the z-axis and the plane, and pointing towards the XY plane. The PD vector was defined as the projection of uj on the XY plane.
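The plane-fitting step can be sketched as follows. For a fitted plane $h_j(x, y) \approx ax + by + c$, the projection of its normal onto the XY plane is parallel to the gradient $(a, b)$, so this sketch (our own illustration, not the paper's code) uses $(a, b)$ directly as the unnormalized PD vector.

```python
import numpy as np

def preferred_direction(targets_xy, rates):
    """Least-squares fit of rate ≈ a*x + b*y + c over target positions
    (already centered at the origin); the gradient (a, b) gives the PD."""
    A = np.column_stack([targets_xy[:, 0], targets_xy[:, 1],
                         np.ones(len(targets_xy))])
    (a, b, c), *_ = np.linalg.lstsq(A, rates, rcond=None)
    pd = np.array([a, b])
    norm = np.linalg.norm(pd)
    return pd, (pd / norm if norm > 0 else pd)
```

A unit with cosine tuning around 45 degrees, sampled at eight equidistant targets, yields a unit PD vector pointing at 45 degrees.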

In order to predict the PD vectors, we first obtained for each muscle the ‘direction of maximum contraction’, verbally described in panel B of Figure 5. More formally, let $l_i^k$ denote the length of the $i$-th muscle when the hand is at target $k$, and let $l_i^0$ denote its length when the hand is at the center location. By $\bar{r}_k$ we denote the unit vector with base at the center location, pointing in the direction of the $k$-th target. The direction of maximum length change for the $i$-th muscle comes from the following vector sum:

(24) $\bar{v}_i = \sum_{k=1}^{8}\left[\frac{l_i^0 - l_i^k}{l_i^0}\right]^{+} \bar{r}_k$,

where $[x]^{+} = \max(x, 0)$.
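Equation 24 is a rectified, length-weighted sum of target directions; a short sketch (function and data layout are our own illustration) makes the computation explicit.

```python
import numpy as np

def max_contraction_direction(l_center, l_targets, target_dirs):
    """Eq. 24 sketch: weight each unit target direction by the
    rectified relative shortening of the muscle, then sum."""
    shortening = (l_center - l_targets) / l_center  # (l0 - lk) / l0
    w = np.maximum(shortening, 0.0)                 # [x]^+ rectification
    return (w[:, None] * target_dirs).sum(axis=0)
```

For instance, a muscle that shortens only toward one target contributes a vector pointing at that target, scaled by the relative shortening.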

For the $j$-th unit in M, its predicted PD vector comes from a linear combination of the $\bar{v}_i$ vectors. Let the input to this unit be $\sum_i w_{ji} e_i$, where $e_i$ is the output of the $i$-th SPF unit (representing the error in the $i$-th muscle). The predicted PD vector is:

(25) $\bar{d}_j = \sum_{i=0}^{5} w_{ji}\,\bar{v}_i$.

To obtain the main axis of the PD distribution, the $i$-th PD vector was expressed in polar form $(r_i, \theta_i)$, with $\theta_i \in [-\pi, \pi]$. We reflected vectors in the lower half using the rule: $\theta_i^* = \theta_i + \pi$ if $\theta_i < 0$, $\theta_i^* = \theta_i$ otherwise. The angle of the main axis was the angle of the average PD vector using these modified angles: $\theta_{main} = \arctan\!\left(\frac{\sum_i r_i \sin\theta_i^*}{\sum_i r_i \cos\theta_i^*}\right)$.
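The reflection and weighted averaging can be sketched in a few lines; `arctan2` is used here as the quadrant-aware form of the arctangent ratio in the text.

```python
import numpy as np

def main_axis_angle(pd_vectors):
    """Main-axis angle of a PD distribution: reflect vectors in the
    lower half-plane, then take the length-weighted circular mean."""
    r = np.linalg.norm(pd_vectors, axis=1)
    theta = np.arctan2(pd_vectors[:, 1], pd_vectors[:, 0])  # in [-pi, pi]
    theta_star = np.where(theta < 0, theta + np.pi, theta)  # reflect
    return np.arctan2(np.sum(r * np.sin(theta_star)),
                      np.sum(r * np.cos(theta_star)))
```

Two opposite PD vectors along the 30-degree axis collapse onto the same reflected angle, so the main axis comes out at 30 degrees.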

Statistical tests

To find whether M units were significantly tuned to the reach direction we used a bootstrap procedure. For each unit we obtained the length of its PD vector 10,000 times when the identity of the target for each reach was randomly shuffled. We considered there was significant tuning when the length of the true PD vector was longer than 99.9% of these random samples.

To obtain the coefficient of determination for the predicted PD angles, let $\theta_{true}^j$ denote the angle of the true PD for the $j$-th M unit, and $\theta_{pred}^j$ be the angle of its predicted PD. We obtained residuals for the angles as $\epsilon_j = \theta_{true}^j - \theta_{pred}^j$, where this difference is actually the angle of the smallest rotation that turns one angle into the other. Each residual was then scaled by the norm of its corresponding PD vector, to account for the fact that these norms were not homogeneous. Denoting the scaled residuals as $\epsilon_j^*$, the residual sum of squares is $SS_{res} = \sum_j (\epsilon_j^*)^2$. The total sum of squares was $SS_{tot} = \sum_j (\theta_{true}^j - \bar{\theta}_{true})^2$, where $\bar{\theta}_{true}$ is the mean of the $\theta_{true}^j$ angles. The coefficient of determination comes from the usual formula $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$.
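A sketch of this computation, where the "smallest rotation" residual is obtained by wrapping the raw angle difference into $(-\pi, \pi]$; the function signature is our own illustration.

```python
import numpy as np

def angular_r2(theta_true, theta_pred, weights):
    """R^2 for predicted PD angles: residuals are the smallest rotation
    between angles, scaled by each PD vector's norm (weights)."""
    raw = theta_true - theta_pred
    eps = np.arctan2(np.sin(raw), np.cos(raw))   # wrap to (-pi, pi]
    eps_star = weights * eps                     # scale by PD norms
    ss_res = np.sum(eps_star ** 2)
    ss_tot = np.sum((theta_true - theta_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Perfect predictions give $R^2 = 1$; any angular error lowers the value.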

To assess bimodality of the PD distribution we used a version of the Rayleigh statistic adapted to detect bimodal distributions whose two modes are oriented 180 degrees apart, introduced in Lillicrap and Scott, 2013. This test consists of computing a modified Rayleigh $r$ statistic defined as:

(26) $r = \frac{1}{N}\sqrt{\left(\sum_{i=1}^{N} \cos(2\phi_i)\right)^2 + \left(\sum_{i=1}^{N} \sin(2\phi_i)\right)^2}$,

where the ϕi angles are the angles for the PDs. A bootstrap procedure is then used, where this r statistic is produced 100,000 times by sampling from the uniform distribution on the (0,π) interval. The PD distribution was deemed significantly bimodal if its r value was larger than 99.9% of the random r values.
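The statistic and its bootstrap test can be sketched as below. Doubling the angles maps modes 180 degrees apart onto the same direction, so the ordinary Rayleigh resultant on the doubled angles measures axial bimodality. For speed, the test here uses far fewer bootstrap samples than the 100,000 used in the paper.

```python
import numpy as np

def bimodal_rayleigh_r(phi):
    """Modified Rayleigh r for axial bimodality (Eq. 26): the standard
    Rayleigh resultant length computed on the doubled angles."""
    n = len(phi)
    return np.sqrt(np.sum(np.cos(2 * phi)) ** 2
                   + np.sum(np.sin(2 * phi)) ** 2) / n

def bimodality_test(phi, n_boot=10000, seed=0):
    """Compare r(phi) against r values from uniform angle samples."""
    rng = np.random.default_rng(seed)
    r_obs = bimodal_rayleigh_r(phi)
    r_null = np.array([bimodal_rayleigh_r(rng.uniform(0, np.pi, len(phi)))
                       for _ in range(n_boot)])
    return r_obs, np.mean(r_null >= r_obs)
```

A sample concentrated around two opposite directions yields $r$ near 1 and easily exceeds the uniform null distribution.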

We used a bootstrap test to find whether there was statistical significance to the linear addition of direction fields. To make this independent of the individual pair of locations stimulated, we obtained the direction fields for all 15 possible pairs of locations, and for each pair calculated the mean angle difference between D(a+b) and D(a)+D(b) as described in the main text. We next obtained the mean of these 15 average angle deviations, to obtain a global average angle deviation γglobal.

We then repeated this procedure 400 times when the identities of the stimulation sites a,b were shuffled, to obtain 400 global average angle deviations γglobalj. We declared statistical significance if γglobal was smaller than 99% of the γglobalj values.

Appendix 1

Supplementary discussion

Comparison with previous models

There are many other models of reaching and motor control. Most of them have one or more of the following limitations:

  1. They use non-neural systems to produce motor commands.

  2. They control a single degree of freedom, sidestepping the problem of controller configuration, since the error is one-dimensional.

  3. They do not model a biologically plausible form of synaptic learning.

We will contrast our model with some of the work that does not strongly present these limitations, and with a few others. Due to space constraints the contributions of many models will not be addressed, and for those mentioned we will limit ourselves to explaining some of their limitations.

The model in DeWolf et al., 2016 has similar goals to our model, but very different assumptions. In their model, motor cortex receives a location error vector x, and transforms it into a vector of joint torques. For this transformation to implement adaptive kinematics, it must approximate a Jacobian matrix that includes the effects of the arm’s inertia matrix, using location errors and joint velocities as training signals. This is accomplished by adapting an algorithm from the robotics literature (Cheah et al., 2006) and implementing it in a spiking neural network. Additionally, a second algorithm from robotics (Sanner and Slotine, 1992) is used to provide an adaptive dynamics component, which is interpreted as the cerebellar contribution.

In order to implement vector functions in spiking neural networks, DeWolf et al., 2016 uses the Neural Engineering Framework (Bekolay et al., 2014). The essence of this approach is to represent values in populations of neurons with cosine-like tuning functions. These populations implement expansive recoding, becoming a massively overcomplete basis of the input space. Implementing a function using this population as the input is akin to using a linear decoder to extract the desired function values from the population activity. This can be done through standard methods, such as least-squares minimization, or random gradient descent. The parameters of the linear decoder then become weights of a feedforward neural layer implementing the function.

The model in DeWolf et al., 2016 therefore takes a rather different approach. They use engineering techniques to create a powerful motor control system, drawing on algorithms from robotics and using 30,000 neurons to approximate the computations, which are then ascribed to sensory, motor, and premotor cortices, as well as the cerebellum. In contrast, we use 74 firing rate units, and unlike DeWolf et al., 2016 we include muscles, muscle afferents, transmission delays, and a spinal cord.

There is nothing intrinsically wrong with using an engineering approach to try to understand a biological function. The crucial part is which model will be experimentally validated. Some differences between the models that may be able to separate them experimentally are: (1) In DeWolf et al., 2016 premotor cortex is required to produce the error signal, whereas we ascribed this to sensory cortex. (2) In DeWolf et al., 2016 direct afferent connections to motor cortex are not considered, whereas in our model they are important to maintain stability during learning (in the absence of a cerebellum). (3) In DeWolf et al., 2016 spinal cord adaptation is not necessary to implement adaptive kinematics. In contrast, spinal cord adaptation is important in one of the interpretations of our model.

The model in Dura-Bernal et al., 2015 uses spiking neurons and a realistic neuromechanical model in order to perform 2D reaching. The feedback is in terms of muscle lengths, rather than muscle afferent signals. There is no mechanism to stop the arm, or hold it on target. Most importantly, learning relies on a critic, sending rewarding or punishing signals depending on whether the hand was approaching or moving away from the target. This implicitly reduces the error dimension using a hidden mechanism. Furthermore, each target must be trained individually, and it is not discussed how this can lead to a flexible reaching mechanism without suffering from catastrophic interference.

The model in Todorov, 2000 is used to obtain characteristics of M1 activity given the required muscle forces to produce a movement. It is an open-loop, locally-linear model, where all connections from M1 directly stimulate a linear motoneuron. Among other things, it showed that representations of kinematic parameters can appear when the viscoelastic properties of the muscles are taken into account, giving credence to the hypothesis that M1 directly activates muscle groups. Outside of its scope are neural implementation, learning, or the role of spinal cord.

Li et al., 2005 proposes a 2-level hierarchical controller for a 2-DOF arm. Since this model is based on optimal control theory, it is given a cost function, and proceeds by iteratively approaching a solution of the associated Jacobi-Bellman equation. There is no neural implementation of these computations, nor a description of learning.

Martin et al., 2009 is not properly a neural model, but it is rooted in Dynamic Field Theory, which assumes that the population of neuronal activities encodes "activation variables". For this model activation variables represent kinematic parameters as a virtual joint location vector λ(t), which is an internal representation of the arm’s configuration.

The innovative part of Martin et al., 2009 is in describing how λ(t) is updated and used to control a redundant manipulator. In particular, the kinematic Jacobian, its null-space matrix, their derivatives, and the Moore-Penrose pseudoinverse are all computed analytically in order to obtain differential equations where the joint motions that move the end effector decouple from those which don’t.

Encapsulating the muscle commands into a virtual joint location whose dynamics are decoupled for motions that don’t affect the end-effector location is a very interesting concept. Still, this is far from a neural implementation, and learning is not considered.

The model in Caligiore et al., 2014 studies the long-term development of infant reaching using a PD controller, and an actor critic mechanism implementing a neural version of TD-learning (Sutton and Barto, 2018). The 2-dimensional PD controller receives two desired joint angles (interpreted as an equilibrium position), producing appropriate torques. Since it uses the Equilibrium Point (EP) Hypothesis (Feldman, 1986), the reaching portion of this model is tantamount to associating states with equilibrium positions. This model thus performs at a higher level of analysis. Our model could operate at the same level if we added a reinforcement learning component to learn SP values allowing the hand to touch a target whose distance is known. (Caligiore et al., 2014) does not consider separate neuronal populations (e.g. spinal cord, sensory cortex), propagation delays, or low-level learning.

The model in Izawa et al., 2004 shows how reinforcement learning can be applied to redundant actuators, such as biological arms. It is, however, not a neural model.

In Mici et al., 2018, a neural network produces predictions of visual input in order to deal with temporal delays in a sensorimotor system. The network used for this study uses non-local learning, and adds or destroys nodes as required during its operation. It is thus not biologically-plausible.

Tsianos et al., 2014 is a reaching model that also considers the spinal cord as a highly-configurable controller. Corticospinal inputs are assumed to be step commands, which means that motor cortex operates in an open-loop configuration. In order to produce reaching based on these constant commands, a biologically-implausible gradient descent mechanism is required, where the same reach is performed for various values of a synaptic weight, keeping the value that led to the best performance. Furthermore, the model learns to reach one target at a time, which would require learning anew when the new target is more than 45 degrees away from the previous one.

As mentioned in the main text, in the context of rotational dynamics, the model in Sussillo et al., 2015 was used to produce desired muscle forces using a recurrent neural network. This model uses the FORCE algorithm (Sussillo and Abbott, 2009) to adjust the weights of a recurrent network so that it reproduces experimentally observed muscle activity c(t). Oscillatory dynamics arise when the model is constrained to be simple.

Although very insightful, this model is limited by the fact that Equations 2 and 3 represent an open-loop configuration, where only the M1 dynamics are considered. In essence, the model is doing a function approximation with the FORCE algorithm. The question of how the training data c(t) is produced is not addressed, nor is the role of spinal cord or sensory cortex (but see their Supplementary Figure 1).

Other than the aforementioned model in Tsianos et al., 2014, we are unaware of spinal cord models addressing arm reaching. When these models are coupled with a musculoskeletal system, it is usually for the control of one degree of freedom using antagonistic neural populations. We mention some examples next.

The spinal cord model in Bashor, 1998 inspired much of the subsequent work, organizing the spinal circuit into six pairs of populations controlling two antagonistic muscle groups at one joint. With this model, the effect of Ia and Ib afferent input was studied in various neuronal populations.

Stienen et al., 2007 used a model similar to that in Bashor, 1998 in conjunction with a one DOF musculoskeletal model to study the effect of Ia afferents in the modulation of reflexes. Cutsuridis, 2007 also adapted a similar model, and used it to inquire whether Parkinsonian rigidity arose from alterations in reciprocal inhibition mediated by reduced dopamine.

The model in Cisi and Kohn, 2008 has several features not found in Bashor, 1998, including a heterogeneous motoneuron population, and a mechanism to generate electromyograms. This was used to study the generation of the H-reflex, and response properties of the motor neuron pool (Farina et al., 2014).

The model in Shevtsova et al., 2015 goes beyond models such as Bashor, 1998 by using populations of genetically-identified interneurons. This model is used to study rhythm generating abilities of the spinal circuit, as well as coordination between flexors and extensors (Shevtsova and Rybak, 2016; Danner et al., 2017). Knowledge regarding the role of genetically identified spinal interneurons in movement generation is still evolving (Zelenin et al., 2021; Stachowski and Dougherty, 2021, e.g.).

This paper focuses on mammals, but the spinal cord of simpler organisms is better characterized (Borisyuk et al., 2011; Cangiano and Grillner, 2005, e.g.), and may lead to the first realistic models producing ethological behavior.

Possible implementations of the learning rule

The learning rule in Equation 9 is a Hebbian rule that also presents derivatives, heterosynaptic competition, and normalization (e.g. removing the mean) of its terms. None of these is new in a model claiming biological plausibility (Porr and Wörgötter, 2006; Zappacosta et al., 2018; Fiete et al., 2010; Kaleb et al., 2021, e.g.). We nevertheless mention possible ways for derivatives and normalized terms to appear.

Formally, the time derivative of a function $f: \mathbb{R} \to \mathbb{R}$ evaluated at time $t$ is the limit of $\frac{f(t+\Delta t) - f(t)}{\Delta t}$ as $\Delta t \to 0$. If $f$ represents the firing rate of a cell, a measure of change roughly proportional to the derivative can come from $f(t) - f(t - \Delta t)$ for some small value $\Delta t$. The most obvious way that such a difference may arise is through feedforward inhibition (for the $e_j$ signal), and feedback inhibition (for the $c_i$ signal). Feedforward and feedback inhibition are common motifs in spinal circuits (Pierrot-Deseilligny and Burke, 2005).

A somewhat different way to approach a time derivative is by using two low-pass filters with different time constants:

$\frac{df}{dt} \approx f^{fast}(t) - f^{slow}(t)$,

where

$\tau_1\,\dot{f}^{fast}(t) = f(t) - f^{fast}(t), \quad \tau_2\,\dot{f}^{slow}(t) = f(t) - f^{slow}(t), \quad \tau_2 \gg \tau_1$.

These principles are illustrated in Lim and Goldman, 2013, where they are used to explain negative-derivative feedback.
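The two-filter construction above can be verified numerically. For a linear ramp, each first-order filter tracks the input with a lag proportional to its time constant, so the difference of the two outputs settles at the slope times $(\tau_2 - \tau_1)$, i.e. the derivative up to a constant gain. The time constants below are illustrative.

```python
import numpy as np

def lowpass_derivative(f, dt=0.001, tau1=0.005, tau2=0.05):
    """Approximate df/dt (up to a gain of tau2 - tau1) as the
    difference of a fast and a slow low-pass filter of f."""
    fast = slow = f[0]
    out = []
    for ft in f:
        fast += dt * (ft - fast) / tau1
        slow += dt * (ft - slow) / tau2
        out.append(fast - slow)
    return np.array(out)
```

For a ramp of slope 2, the output converges to $2(\tau_2 - \tau_1) = 0.09$.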

Low-pass filtering can also arise in the biochemical cascades following synaptic depolarization. The most salient case is intracellular calcium concentration, which has been described as an indicator of firing rate with leaky integrator dynamics (Helmchen, 1999). Although the physiology of spinal interneurons has not been characterized with sufficient detail to make specific hypotheses, it is clearly possible that feedback inhibition and low-pass filtering are enough to approximate a second-order derivative.

The $e_j, c_i$ terms in our learning rule are mean-centered. The most straightforward way to subtract a mean is to have inhibitory units with converging inputs (e.g. receiving all the $e_j$ signals) providing input to the $c_i$ units. The Ib interneurons (Pierrot-Deseilligny and Burke, 2005) are one possibility for mediating this. Another possibility is that the mean-subtraction happens at the single-unit level when the input ($e_j$) and lateral ($c_i$) connections are located at different parts of the dendritic tree. In particular, a larger level of overall input activation could produce a scarcity of postsynaptic resources flowing from the main branches of the dendritic tree into the individual dendritic spines, resulting in reduced plasticity.

Limitations of the model

A model as simple as ours will undoubtedly be wrong in many details. The main simplifying assumptions of our model are:

  • Firing rate encoding. Each unit in this model captures the mean-field activity of a neuronal population. This may occlude computations depending on spike timing, as well as fine-grained computations at the neuronal and dendritic level.

  • Trivial sensory cortex. We assumed that sensory cortex conveyed errors directly based on static gamma afferent activity. Sensory cortex may instead build complex representations of afferent activity. It would be interesting to test how the requirement of controllability could guide learning of these representations.

  • No visual information. Should a visual error be available, it could be treated by M in a similar way to somatosensory errors. If the visual error holds a monotonic relation with the afferent signals, then it should be possible to adjust the long-loop reflex in order to reduce it. When the relation between the visual error and the afferent signals is not monotonic (e.g. in some context the afferent signals correlate positively, and in other negatively), an alternative approach involving reinforcement learning can be pursued (Verduzco-Flores et al., 2022).

  • Very gross anatomical detail. The detailed anatomical organization of cortex and spinal cord is not considered. Moreover, the proportions for different types of cells are not considered.

  • Errors must be monotonic. Muscle activation may not monotonically reduce visual errors. Moreover, the haptic perception of contact is dependent on the environment, so it would not make an appropriate error signal.

  • No cerebellum, basal ganglia, brainstem, or premotor cortex.

Each of these omissions is also a possible route for improving the model. We aim to grow a more sophisticated model, but each new component must integrate with the rest, improve functionality and agree with biological literature.

A final limitation concerns proofs of convergence. Many factors complicate them for this model: transmission delays, noise, synaptic learning, fully neural implementation, as well as complex muscle and afferent models. We tested our results for many initial conditions, but this of course is no guarantee.

50 years ago, Marr’s model of the cerebellum became a stepping stone for further theoretical and experimental progress, despite all its shortcomings (Kawato et al., 2021). We aspire for our model to be a similar step towards a complete model of motor control.

Variations of the spinal learning model

The main text mentions two variations of the spinal learning network that emphasize the robustness and potential of the learning mechanism. We will explain the rationale behind those two variations.

There is evidence for interneurons that drive a set of muscles, possibly setting the circuit foundation for motor synergies (Giszter, 2015; Levine et al., 2014; Bizzi and Cheung, 2013). To explore whether our ideas were compatible with interneurons activating multiple muscles, we tested whether reaching can be learned when the CE and CI units send projections to more than one motoneuron. To achieve this we modified the architecture of Figure 1 so that for every combination of two different muscles there was a pair of CE, CI units that stimulated both of them.

As illustrated in the Appendix 1—figure 1, from the set of 6 muscles there are 15 combinations of 2 muscles, but 3 of them consist of antagonist pairs. Removing these we are left with 12 pairs of muscles, and for each muscle pair we had a Wilson-Cowan-like CE,CI pair sending projections to the alpha motoneurons of both muscles. Furthermore, for each pair of muscles, there is another pair that contains both their antagonists, and we can use this fact to generalize the concept of antagonists when interneurons project to several motoneurons. The CE units sent projections to the CI units of their antagonists. This organization allowed us to maintain the balance between excitation and inhibition in the network, along with the connectivity motifs used previously.

Appendix 1—figure 1
Modified architecture of the C population.

Each pair of CE,CI units projects to two α motoneurons (top). There are 15 possible pairs of muscles, corresponding to the blue lines for each arm in the figure. Three of the pairs (marked with red crosses) contain antagonist muscles, and are not included. The remaining 12 pairs can be arranged into 2 groups of 6 units each. The units in the group marked with green circles are the antagonists of the units with the same number, marked with pink circles.

Because this model could be considered a proof-of-concept for the compatibility of our learning mechanisms with this particular type of synergies, we refer to this model as the “synergistic” network.

To introduce the second variation of the spinal learning network, we may notice that in all configurations so far the projections from SPA to M use one-to-one connectivity (each M unit controls the length error of one muscle). Interestingly, this is not necessary. In a second variation of the spinal learning network, dubbed the “mixed errors” network, each unit in M can be driven by a linear combination of SPA errors.

To ensure that the information about the SPA activity was transmitted to M, we based our SPA to M connections on the following 6-dimensional orthogonal matrix:

(A1) $R = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & -1 & -1 & -1 \\ 1 & -2 & 1 & -1 & 2 & -1 \\ -1 & -1 & 2 & 2 & -1 & -1 \\ -1 & 1 & 0 & 0 & 1 & -1 \\ -1 & 0 & 1 & -1 & 0 & 1 \end{bmatrix}$.

The rows of this matrix form an orthogonal basis of $\mathbb{R}^6$, and normalizing each row we obtain a matrix $R^*$ whose rows form an orthonormal basis. Connections from the 12 SPA units to the 12 M units used this 12 × 12 matrix:

(A2) $\begin{bmatrix} R^* & -R^* \\ -R^* & R^* \end{bmatrix}$.

The one-to-one connections from SPA to M used in our models are unrealistic, but the mixed errors network shows that this simplification can be overcome, since all the results of the paper also apply to this variation.
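The mixing-matrix construction can be checked numerically. The sketch below builds $R$ as parsed row-by-row from Equation A1, normalizes its rows to obtain $R^*$, assembles the block matrix of Equation A2, and verifies that the rows of $R^*$ are indeed orthonormal; the variable names are our own.

```python
import numpy as np

# The 6x6 matrix R with mutually orthogonal rows (Eq. A1)
R = np.array([[ 1,  1,  1,  1,  1,  1],
              [ 1,  1,  1, -1, -1, -1],
              [ 1, -2,  1, -1,  2, -1],
              [-1, -1,  2,  2, -1, -1],
              [-1,  1,  0,  0,  1, -1],
              [-1,  0,  1, -1,  0,  1]], dtype=float)

# Normalize each row to obtain R* with orthonormal rows
R_star = R / np.linalg.norm(R, axis=1, keepdims=True)

# The 12x12 block matrix of Eq. A2, mapping SPA activity to M
W = np.block([[ R_star, -R_star],
              [-R_star,  R_star]])
```

Because the rows of $R^*$ are orthonormal, no direction of the six-dimensional error space is lost when the SPA activity is mixed into M.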

All numerical tests applied to the 3 configurations in the main text of the paper were also applied to the two variations of the spinal learning model. Results can be seen in this Appendix, in the Comparison of the 5 configurations. Figure 2—figure supplements 3 and 4 illustrate the training phase for the synergistic and mixed error networks.

We also tested whether the stimulation of an isolated spinal cord produced convergent direction fields in the synergistic network, as was done in the main text for the spinal network C common to the other four configurations. We found that the mean angle difference γa,b between the direction fields D(a+b) and D(a)+D(b), averaged over the 144 possible (a,b) pairs, was 19.8 degrees.

Randomly choosing pairs (a,b) and (c,d) for the stimulation locations led to an average angle of 72.3 degrees. As before, a bootstrap test showed that the γa,b value is significantly different (p<0.0001). Removing the resting field does not alter this result. Moreover, we found no evidence for supralinear summation of force fields in the synergistic network.
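The angle measure used above can be sketched as follows. The helper names and the synthetic fields are ours (this is not the authors' analysis code); with exactly linear summation the mean angle between D(a+b) and D(a)+D(b) over sampled locations is essentially zero:

```python
# Illustrative sketch: mean angle between a direction field D(a+b) and the
# vector sum D(a)+D(b), sampled at a set of workspace locations.
import math, random

def angle_between(u, v):
    """Angle in degrees between two 2D force vectors."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    c = (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def mean_angle_diff(field_ab, field_a, field_b):
    """Average angle between D(a+b) and D(a)+D(b) over all sampled locations."""
    total = 0.0
    for f_ab, f_a, f_b in zip(field_ab, field_a, field_b):
        total += angle_between(f_ab, (f_a[0] + f_b[0], f_a[1] + f_b[1]))
    return total / len(field_ab)

# Synthetic fields: when summation is exactly linear the mean angle is ~0.
random.seed(0)
Da = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]
Db = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]
Dab = [(a[0] + b[0], a[1] + b[1]) for a, b in zip(Da, Db)]
assert mean_angle_diff(Dab, Da, Db) < 1e-5
```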

The model fails when elements are removed

Due to the larger number of tests, we only used 5 trials for each configuration in this section. The p values reported in this section come from the one-sided t-test. For brevity, the different configurations of the model will be denoted by numbers in the rest of this Appendix: 1=spinal learning, 2=cortical learning, 3=static network, 4=synergistic network, 5=mixed errors network.

A model with fully random connectivity and no plasticity has an exceedingly low probability of having an input-output structure leading it to reduce errors. The configurations of the model with plasticity (configurations 1,2,4,5), however, only have random connections at one of the projections in the sensorimotor loop (either from M to C, or from SPA to M). This may increase the chance of randomly obtaining a good input-output structure, which could throw the usefulness of the learning rules into question.

Removing both types of plasticity in configurations 1,2,4,5 impaired reaching in all 5 tests for each configuration, as reflected by the inability to reduce the average error below 10 centimeters in any of the last 4 reaches of the learning phase. This was also true when removing only the plasticity in the connections from M to C (configurations 1, 4, 5) or from SPA to M (configuration 2). In each one of the plastic configurations (1,2,4,5) the average error for the last 4 training targets was (22±10|22±4|24±9|22±5) centimeters, which was significantly higher than the case with normal plasticity (p<0.001 for configurations 2,4,5, p=0.028 for configuration 1, where the failed trials were not discarded).

Removing plasticity in the connections from A to C or from A to M individually mostly had no statistically significant effect. Removing plasticity in both connections simultaneously, however, roughly doubled the error in configurations 2 and 4 during center-out reaching (one-sided t-test, p<0.001). Error may be slightly increased for configuration 1 (the small sample size allowed no strong conclusions), whereas configuration 5 was seemingly unaffected.

Configurations 1, 2, 4, and 5 could still learn to reach after removing the ACT unit. Configuration 4 roughly doubled its center-out reaching error (p<0.001) and its time to the first successful reach (p=0.016). Configuration 1 roughly tripled its time to the first successful reach (p<0.001), and the other two configurations were seemingly unaffected. The ACT unit was essential for previous, less robust versions of the model.

Removing noise made learning too slow: the mean error in the last 4 training reaches could not be reduced below 10 cm in any trial for configurations 4 and 5, and in only a single trial for configuration 2. Configuration 1 managed to learn normally. Center-out reaches were not possible in configurations 2, 4, and 5, with mean errors of (15±8 | 17±11 | 9±3) centimeters respectively, at least 3 times larger than in the models with noise (p<0.001). Center-out reaches were normal in configuration 1, but the first reach with mean error below 10 centimeters took significantly longer to happen (from 2.5 to 6.4 attempts on average, p=0.001).

Removing Ia and Ib afferents, and instead sending the output of II afferents to C, M, and SA prevented reaching in configurations 1, 2, 3, and 4 (except for a single trial in configuration 1). Configuration 5 could still learn to reach, but the mean error in the last 4 training reaches and during the center-out reaching was significantly higher (p<0.001).

Removing the agonist-antagonist connections in C prevented reaching in (5| 0| 0| 3| 2) trials for configurations (1,2,3,4,5) respectively. Error in center-out reaching was significantly increased for configurations 4 and 5, and it did not increase significantly for configurations 2 and 3.

Appendix 1—figure 2 is the same as Figure 2 of the main text, but in this case the noise and the ACT units were both removed. The average distance from the hand to the target was roughly 18 cm. A video illustrating this failure to learn is included with the supplementary videos.

Appendix 1—figure 2
A failure to learn for a simulation of configuration 1 (spinal learning) with no noise and no ACT unit.

Captions are as in Figure 2 of the main text.

Configuration 1 was resilient to removal of noise and the ACT unit individually. The simulation of Appendix 1—figure 2 suggests that removing more than one element will have larger consequences on the performance of the model.

Comparison of the 5 configurations

Once again, the 5 configurations in this paper are represented by a number: 1=spinal learning, 2=cortical learning, 3=static network, 4=synergistic network, 5=mixed errors network. The following table shows the connectivity in each one. Abbreviations: A2A: all-to-all, O2O: one-to-one, DH: the differential Hebbian learning rule from Verduzco-Flores et al., 2022, IC: the Input Correlation learning rule, S: static connections (see section for details on the weights of static connections).

Configuration | SPA to M | M to C | A to C,M
1 | O2O, S | A2A, DH | A2A, IC
2 | A2A, DH | S | A2A, IC
3 | O2O, S | S | S
4 | O2O, S | A2A, DH | A2A, IC
5 | S | A2A, DH | A2A, IC
  • The spinal learning model (configuration 1) is a “basic” network where the input-output structure of the control loop is resolved in the spinal cord, in the connections from M to C.

  • The cortical learning model (configuration 2) is also a “basic” network, but the input-output structure is resolved in the intracortical connections from SPA to M.

  • The static network (configuration 3) uses only static connections, and is meant to show that the results in the paper appear in a close to optimal configuration of feedback control, rather than being some sophisticated product of the plasticity rules.

  • The synergistic network (configuration 4) is an extension of configuration 1, where the spinal cord has 12 CE,CI pairs rather than 6, and each pair stimulates two α motoneurons.

  • The mixed errors network (configuration 5) is a different variation of configuration 1, where the connections from SPA to M are not one-to-one, but are instead derived from an orthogonal matrix.

The following table summarizes the numerical results for the 5 configurations.

Measurement | 1 | 2 | 3 | 4 | 5
Failed reaches (1) | 1.8±2 | 1.2±.9 | 0±0 | 1.6±1.3 | 4±2.5
Center-out error (2) | 3.3±.01 | 2.9±.001 | 2.9±.0003 | 3±.008 | 2.8±.0007
M units tuned to direction | 11.8±.4 | 12±0 | 12±0 | 12±0 | 12±0
R² for predicted PD | .74±.18 | .88±.14 | .86±.01 | .89±.06 | .82±.03
PD distribution main axis (deg) | 59±7 | 52±2 | 54±.5 | 60±3 | 58±1
PD drift angle (deg) | 3.3±2.4 | 4.9±2.1 | .3±.2 | 1.8±1.3 | 7±6
Muscle PD drift angle (deg) | 3.8±2.1 | 6.4±2.9 | .2±.2 | 11.4±15.2 | 27.7±34.5
Center-out error (10 targets) | 3 | 3.6 | 2.9 | 2.6 | 4.5
Variance in first jPCA | .42±.04 | .42±.04 | .46±.03 | .45±.04 | .47±.07
Center-out error (light arm) | 2.5 | 3.2 | 3 | 6.1 | 2.9
Center-out error (heavy arm) | 2.4 | 3.3 | 2.9 | 5.6 | 3.2

(1) Average number of reaches before the first reach when the mean error was below 10 cm. (2) Average distance (in centimeters) between the hand and the target during center-out reaching.

Gain and oscillations

The gain of a feedback loop describes how much the amplitude of the error signal is amplified as it is transformed into a control signal sent to the plant. As described in the Methods section, we used a relatively low number of iterations of an optimization algorithm to find suitable parameters for each configuration of the model, which left the gains only coarsely tuned. Figure 3—figure supplement 1 is analogous to Figure 3, and shows the hand trajectories right after the optimization algorithm finished. Configurations 2 and 3 were particularly prone to oscillations, and configuration 1 undershot many targets.

To improve performance, as well as to facilitate comparison of the 5 configurations, we adjusted their gains. This involved manually adjusting the slope of the sigmoidal units in populations M and SA until they appeared stable but on the verge of oscillating (so reaching would be faster), which required 1 to 3 attempts per configuration. The gain of configuration 1 was slightly increased, the gains of configurations 2, 3, and 4 were reduced, and configuration 5 was left with the same parameters.
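The link between loop gain and oscillation can be illustrated with a toy scalar feedback loop with a transmission delay. This sketch is not the model's equations; it only shows why a gain that is too high for a given delay turns convergence into growing oscillations around the target:

```python
# Toy sketch: a scalar plant driven by delayed proportional feedback.
# Low loop gain converges to the target; too-high gain (analogous to
# too-steep sigmoid slopes in M and SA) makes the error oscillate.
def simulate(gain, delay=5, steps=400, dt=0.1, target=1.0):
    x = [0.0] * (delay + 1)           # plant state history
    for _ in range(steps):
        err = target - x[-1 - delay]  # controller only sees a delayed error
        x.append(x[-1] + dt * gain * err)
    return x

low = simulate(gain=0.5)
high = simulate(gain=8.0)

# With low gain the trajectory settles near the target...
assert abs(low[-1] - 1.0) < 0.05
# ...while high gain keeps overshooting the target back and forth.
amplitude = max(high[-50:]) - min(high[-50:])
assert amplitude > 0.5
```

Reducing the slope of the output nonlinearity reduces the effective gain of the loop, which is why the manual slope adjustment described above removes the oscillations.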

The trajectories in panels C and D of Figure 3—figure supplement 1 are reminiscent of terminal tremors in cerebellar ataxia. An animation showing the movement of the arm for the 5 configurations before gain adjustment is included among the Supplementary Videos. In addition, Figure 3—figure supplements 2 and 3 show the error and activity of several units during the training reaches for configurations 3 and 4, analogous to Figure 2. It can be observed that the oscillations are present in the whole network, suggesting that the control signals are trying to catch up with an error that keeps reversing direction.

Supplementary videos

To help visualization of the arm’s learning and performance under different conditions, 4 videos were produced. The videos indicate the model configuration using the enumeration from this Appendix: 1=spinal learning, 2=cortical learning, 3=static network, 4=synergistic network, 5=mixed errors network.

To download these videos, please visit https://gitlab.com/sergio.verduzco/public_materials/-/tree/master/adaptive_plasticity

The videos’ content is as follows:

Appendix 1—video 1
Visualization of the arm and the muscles during the learning phase for configuration 1.

Data comes from the simulation shown in Figure 2. Speed is roughly 4 X.

Appendix 1—video 2
Arm animation of the first 180 seconds of center-out reaching for the 5 configurations.

Speed is roughly 4 X.

Appendix 1—video 3
The learning phase for a simulation with configuration 1.

Both noise and the ACT unit were removed, reducing exploration and disrupting learning. Data comes from the same simulation as Appendix 1—figure 2. Speed is 4 X.

Appendix 1—video 4
The first 180 seconds of center-out reaching for the 5 configurations before the gains were adjusted.

Speed is roughly 4 X. Configuration 2 shows target-dependent oscillations after 70 seconds.

Parameter values

Values appear in order for configurations 1–5. A single number means that all configurations use that same value.

Unit parameters

The superscript x on a population name indicates that a parameter has heterogeneous values. This means that a random value was added to the parameter for each individual unit. This random value comes from a uniform distribution centered at zero, with a width equal to 1% of the reported parameter value.
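A minimal sketch of this sampling scheme (the helper function is ours, purely illustrative):

```python
# Illustrative sketch: heterogeneous unit parameters. Each unit receives the
# nominal value plus a uniform random offset centered at zero, with width
# equal to 1% of the nominal value.
import random

def heterogeneous(nominal, n_units, rng=None):
    rng = rng or random.Random(42)    # fixed seed for reproducibility
    half_width = 0.01 * abs(nominal) / 2.0
    return [nominal + rng.uniform(-half_width, half_width) for _ in range(n_units)]

# e.g. tau_u = 20 ms for a population of 10 SPA^x units
taus = heterogeneous(20.0, 10)
assert all(abs(t - 20.0) <= 0.1 for t in taus)  # within half of 1% of 20 ms
```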

Parameter | Equation | Population | Value
τu | 5 | CHG, A, ACT | 10 [ms]
τu | 5 | α, SA, SPAx, CI | 20 [ms]
τu | 5 | CE | 140, 70, 150, 180, 110 [ms]
τu | 5 | Mx | 50 [ms]
β | 6 | αx, ACT | 2
β | 6 | CE | 1.63, 1.72, 1.70, 3.38, 1.44
β | 6 | CI | 4.0, 3.38, 3.44, 2.46, 3.63
β | 6 | Mx | 1.5, 1.5, 2.0, 2.0, 1.17
β | 6 | SA | 3.0, 2.2, 2.0, 2.3, 3.0
β | 6 | CHG, SPAx | 9
η | 6 | αx | 1.1
η | 6 | CE | 2.0, 1.93, 2.13, 2.31, 1.67
η | 6 | CI | 1.5, 1.41, 1.63, 1.72, 1.7
η | 6 | Mx | 1.3, 1.96, 0.68, 1.19, 1.38
η | 6 | ACT | 1
η | 6 | SA (units 0,3) | 0.75
η | 6 | SA (units 1,2,5) | .4
η | 6 | SA (unit 4) | .3
η | 6 | CHG | .25
η | 6 | SPAx | .1
ς | 7 | CE, CI | 0.63, 0, 0, 0.69, 0.72
ς | 7 | M | 0, 0.62, 0, 0, 0
τa | 8 | A | 10 [ms]
T | 8 | A (Ib, II afferents) | .2
T | 8 | A (Ia afferents) | 0
τslow | 12, 13 | CE, CI | 11 [s]
θACT | 15, 16 | ACT | .31
τACT | 16 | ACT | 10 [ms]
γ | 16 | ACT | 8
α | 17 | CHG (synapse) | 20

Learning rules

Parameter | Equation | Value
Δt | 9 | 0.33, 0.37, 0.15, 0.36, 0.32 [s]
α (M to C) | 10 | 500, 0, 0, 500, 500
α (SPA to M) | 10 | 0, 527, 0, 0, 0
α (M to α) | 10 | 300, 0, 0, 300, 300
λ | 10 | .03
ωsa (M to C, α) (1) | 10 | 2.52
ωsa (SPA to M) | 10 | 3.23, 3.19, 2.98, 3.23, 3.23
ωsb (M to C) | 10 | 3.29, 2.14, 1.50, 3.69, 0.57
ωsb (M to α) | 10 | 2.86, 2.86, 1.50, 2.86, 2.86
αIC (A to M) | 11 | 26.17
αIC (A to C) | 11 | 22.5
ωs (A to M) | 11 | 0.85, 1.14, 1, 0.53, 0.53
ωs (A to C) | 11 | 1.68, 2, 2, 1.55, 2.88
ωmax (A to M) | 11 | .48, .22, .2, .25, .33
ωmax (A to C) | 11 | .3, .64, .64, .28, .59

(1) Constraints on the sum of weights are also used with static connections.

Muscles and afferents

Parameter | Equation | Value
KSE | 18 | 20 [N/m]
KPE | 18 | 20 [N/m]
b | 18 | 1 [N·s/m]
g (muscles 0,3) | 18 | 67.11 [N]
g (muscles 1,2,4,5) | 18 | .75 [N]
KSEs | 19, 20, 21 | 2 [N/m]
KPEs | 19, 20, 21 | 2 [N/m]
bs | 19, 21 | .5 [N·s/m]
KSEd | 19, 20, 21 | 1 [N/m]
KPEd | 19 | .2 [N/m]
bd | 19 | 2 [N·s/m]
l0s | 19 | .7
l0d | 19 | .8
gIa (muscles 0–5) | 20 | [7.5, 25, 25, 7.5, 25, 25] [1/m]
fsIa | 20 | 0.1
gII (muscles 0–5) | 21 | [5.46, 8, 8, 5.46, 8, 8]*, [5.8, 8, 8, 5.8, 8, 8]** [1/m]
fsII | 21 | 0.5
gIb | 22 | 1
T0 | 22 | 10 [N]
τg | 23 | 50 [ms]

Connection delays and weights

Connections considered “local” used a delay of 10 [ms], unless those connections implied a disynaptic inhibition that was not modeled. All other connections had a delay of 20 [ms].

Source | Target | Delay
A | M, SA | 20 [ms]
ACT | CE, CI | 20 [ms]
α | muscle | 20 [ms]
M | CE, CI, α | 20 [ms]
M | M | 20 [ms]
afferents | A | 20 [ms]
CHG, SPA | ACT | 20 [ms]
SPA | M, SPA | 20 [ms]
A | M | 10 [ms]
CE, CI | α | 10 [ms]
CE, CI | CE, CI | 10 [ms]
CI | CE | 10 [ms]
SA, SP | SPA | 10 [ms]
SP | CHG | 10 [ms]

The next table shows the values of fixed synaptic weights not specified elsewhere. Each row gives a target population, followed by the weights it receives from each source population; “Aff.” stands for the muscle afferents. Potentially plastic connections are marked as “+”. If a connection marked “+” is static in one of the configurations, its weight is determined by the ωsa, ωsb, ωs parameters of Equations 10 and 11.

Target | Weights from source populations
α | CE: 1; CI: −1; M: +
A | Aff.: 2 (Ia, Ib), 4 (II)
ACT | SPA: 1
CE | A: +; CE: .5 (a), .18 (b); CI: −1.8 (c); M: +
CHG | SP: +
CI | A: +; CE: .5 (c); CI: 1.83 (d), .16 (e); M: +
M | A: +; M: *; SPA: +
muscles | α: 1
SA | A: 1
SPA | SA: 1 or −1; SP: 1 or −1; SPA: −1.77

(a) Agonist connections. (b) Partial agonist connections. (c) Within the same triplet. (d) Antagonist connections. (e) Partial antagonist connections.

* Units inhibited their duals with weights that depended on the configuration: −0.93, −0.74, −1.00, −1.14, 0.0.

All connections whose source is CHG or ACT have a weight of 1.

Data availability

The current manuscript describes a computational study, so no experimental data were generated. The source code to generate all figures is available as two commented Jupyter notebooks, which can be downloaded from the following repository: https://gitlab.com/sergio.verduzco/public_materials/-/tree/master/adaptive_plasticity (copy archived at swh:1:rev:482c0659d6e90b30a4a1acb4ab3e3a03dfd902c4). Instructions are in the "readme.md" file. Briefly, the prerequisites for running the notebooks are: Python 3.5 or above (https://www.python.org); Jupyter (https://jupyter.org); Draculab (https://gitlab.com/sergio.verduzco/draculab). Please see the links above for detailed installation instructions.

References

  1. Book
    1. Helmchen F
    (1999)
    Dendrites as Biochemical Compartments
    Oxford, England: Oxford University Press.
  2. Book
    1. Jankowska E
    (2013)
    Neuroscience in the 21st Century: From Basic to Clinical
    New York, NY: Springer.
  3. Book
    1. Kaas JH
    2. Collins CE
    (2003) The Primate Visual System
    New York, United States: John Wiley & Sons, Ltd.
    https://doi.org/10.1201/9780203507599
  4. Book
    1. Keener JP
    (1995)
    Principles Of Applied Mathematics: Transformation And Approximation
    New York, United States: Avalon Publishing.
  5. Book
    1. Kelso JAS
    (2009)
    Progress in Motor Control: A Multidisciplinary Perspective, Advances in Experimental Medicine and Biology
    Sternad D, editors. Boston, MA: Springer US.
  6. Conference
    1. Li W
    2. Todorov E
    3. Pan X
    (2005) Hierarchical Feedback and Learning for Multi-joint Arm Movement Control
    In 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference. pp. 4400–4403.
    https://doi.org/10.1109/IEMBS.2005.1615441
  7. 
    1. Mussa-Ivaldi FA
    2. Bizzi E
    (2000) Motor learning through the combination of primitives
    Philosophical Transactions of the Royal Society of London. Series B 355:1755–1769.
    https://doi.org/10.1098/rstb.2000.0733
  8. Book
    1. Powers WT
    (2005)
    Behavior: The Control of Perception
    New Canaan, CT, US: Benchmark Press.
  9. Book
    1. Schöner G
    2. Tekülve J
    3. Zibner S
    (2018)
    Reaching for Objects: A Neural Process Account in A Developmental Perspective
    Oxfordshire, United Kingdom: Taylor & Francis Group.
  10. Book
    1. Shadmehr R
    2. Wise SP
    (2005)
    The Computational Neurobiology of Reaching and Pointing: A Foundation for Motor Learning
    MIT Press.
  11. Book
    1. Sutton RS
    2. Barto AG
    (2018)
    Reinforcement Learning: An Introduction
    MIT Press.

Decision letter

  1. Juan Álvaro Gallego
    Reviewing Editor; Imperial College London, United Kingdom
  2. Tamar R Makin
    Senior Editor; University of Cambridge, United Kingdom
  3. Marco Capogrosso
    Reviewer; University of Pittsburgh, United States
  4. Rune W Berg
    Reviewer; University of Copenhagen, Denmark

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

Thank you for submitting your article "Adaptive plasticity in the spinal cord can produce reaching from scratch and reproduces motor cortex directional tuning" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Tamar Makin as the Senior Editor. The following individuals involved in the review of your submission have agreed to reveal their identity: Marco Capogrosso (Reviewer #2); Rune W Berg (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission. The general consensus was that this is an interesting model, which will be a valuable contribution to the literature provided that additional controls and clarifications are provided.

Below you will find the individual reviewers comments. In addition to these, the editor prepared for the authors a summary of the essential revisions required. This is based on both the individual comments and the extensive following discussion between the reviewers and editor.

Essential revisions:

1. The paper focuses on an intriguing new hypothesis: that the spinal cord minimizes an error signal generated in motor cortex. Given the novelty of this hypothesis and its implications for the present manuscript, it is critical that the authors provide additional controls/analyses that specifically address it. These include:

a) What happens if error minimisation happened in the motor cortex rather than in the spinal cord; would the model still produce reaching movements? Do the cortical neurons look like actual neurons (distribution of directional tuning, rotational dynamics)? Do "synergies" still emerge in the spinal cord?

b) Does a model with the same hierarchical structure that doesn't perform error minimisation and is still capable of reaching exhibit the properties that the current model does (in terms of motor cortical activity, synergies, etc)?

2. The authors posit that motor cortex reports an error that is being minimised during learning. Under this assumption, the motor cortical signals should be reduced to zero, which is what indeed seems to happen, e.g., in Figure 2E. Nevertheless, it is known that the activity of motor cortical neurons is not zero during the execution of a correct movement, even in highly trained individuals: e.g., trained monkeys have rotational dynamics when reaching (Churchland, Cunningham et al. Nature 2012) or hand cycling (Russo et al. Neuron 2018). How can these observations be reconciled with the current model?

3. The cerebellum is mentioned a couple of times but not included in the model. Based on the known fact that the cerebellum receives and integrates both massive corticopontine input and spinocerebellar proprioceptive input, its role is hence often assumed to be exactly what the motor cortex does in this model. The rationale behind not letting Cerebellum assume this role, but instead making the motor cortex assume this role, is not clear.

4. The authors claim that previously known phenomena (convergent force fields and rotational dynamics) "emerge" with their learning model. However, this is not supported by the results. Firstly, when they tested the convergent force field, the spinal cord was separated from the descending and ascending connections with cortex. This means that this observed property is independent of having learned the cortico-spinal connections. In addition, rotational dynamics were almost exclusively observed in the one-to-one cortical coupling, which the authors call a "less-plausible" model. Therefore, these phenomena are not an emergent property that results from their learning model. It is necessary to specify what are the essential factors for each observed phenomenon.

5. This model has attributed all of the learning effects to the synaptic connection between the cortex and spinal cord (descending and ascending connections). However, it has been repeatedly demonstrated that plastic changes within the cortex occur during motor learning (Roth et al. Neuron 2020). How is the physiological relevance of attributing all the learning to descending and ascending pathways justified?

6. In relation to the previous comment: an equivalent circuit could be achieved by changing the intracortical synaptic connections. Why is the corticospinal connection the sole target of learning, and how do the results change (or remain unchanged) if learning happens in other parts of the circuit?

7. No model can reproduce the whole CNS. However, the reviewers think that a few key features should be added to the model:

a) The authors have not included any form of feedback in the spinal cord. Indeed, short-latency stretch reflexes are critical for movement: coupled with motoneurons, they are the basic units to generate movement (Xu, Tianqui et al. PNAS 2018); models of human walking show that gait can emerge only considering sensory inputs to the cord (Song and Geyer J Physiol 2015); and their absence prevents recovery from spinal cord injury (Takeoka, Aya et al. Cell 2014). Ia afferents have perhaps the strongest monosynaptic glutamatergic synaptic input to spinal motoneurons and constitute a primary driver of spinal motoneuron excitability which this model is currently missing. Does adding short-latency feedback change the conclusion of the paper?

b) As the authors discussed in the manuscript, the model omits many components that are known to be involved in learning and, as a result, the model reproduces the impaired motor control rather than the intact state. However, this prevents us from verifying whether the model properly reproduces biological motor control. Specifically, they claim that the model reproduced ataxic reaching, which resembles that of cerebellar patients. However, it is not surprising that a poorly tuned feedback controller (i.e. a PID controller) shows oscillation. Therefore, it is necessary for the authors to show that their model significantly resembles biological motor control by comparing it to a model which shares the hierarchical structure but where the error is corrected within the cortex and only motor commands descend to the spinal cord.

c) 20 % of corticospinal projections to motoneurons innervating the arm and particularly the hand are thought to be monosynaptic. Does their addition change the present results?

d) The proportions and the connectivity of the network make the model seem less biologically plausible, which should at least be addressed in the text. First, the number of neurons in the populations at various levels seems rather small compared to the immense size of these structures. Why not have more neurons in the networks? This applies to all parts of the model, including the spinal network. In many species, the ratio of motor neurons to interneurons is about 1:10, yet in this model it is more like 1:2 (the trio). Also, the connectivity has a bias towards the feedforward network, yet local recurrent connectivity seems to be widely present in the nervous system. The proportions in the connectivity from M to C could also be problematic. All the spinal units (c = [c1, …, cN]) receive commands/error signals from M (e = [e1, …, eM]). Is M > N? Presuming M > N, this would represent converging input from M to C, and that is problematic since the corticospinal projections are rather sparse. More likely there is diverging connectivity from M to C. It is unclear to me whether this is an important problem (it may not be), but it would be appropriate to address such proportions in the text.

e) The model does not include any form of "pattern generator", a limitation the authors are aware of, and denote as the "supraspinal pattern formation" problem. They seem to export this to another area of the brain while the motor cortex-spinal cord loop takes care of an appropriate execution. This needs to be clarified and/or addressed with the model.

f) It is well-known that the spinal cord autonomously can produce motor behaviors. It is unclear whether C in the model is capable of this.

g) Regarding the synergistic model, their model has spinal interneurons recruiting two motoneuron pools, which is very different from the muscle synergies observed in animals (d'Avella et al., J Neurosci 2006). It is necessary to justify why this type of synergies should be used here.

Reviewer #1 (Recommendations for the authors):

The authors aimed to study corticospinal control of reach movements. More specifically they aim to understand how an unsupervised system could learn to reach and whether that system would then show properties observed in experiments. For this, the authors conceived a neural network model with defined hierarchy and architecture coupled to a simple biomechanical arm model in a closed-loop that is capable of learning a reach-out task. Interestingly, the model shows several properties found in experimental data such as directional tuning of motor cortical units as well as discrete force fields in the spinal cord.

Strengths

The paper has many strengths. It is rigorous and well written, and while not well articulated in the introduction I completely agree with the authors that computational models are necessary if we want to aim at any variation of the concept of "understanding" in relation to the central nervous system. Therefore their goal is absolutely significant and relevant.

I also particularly liked the approach to start from specific hypotheses on the structure and function of the different modeled layers while leaving completely free the organization and strength of synapses in the system. This approach is elegant because starting from clear hypotheses it allows to observe what properties emerge "spontaneously" and compare these properties with experimental data.

The results obtained are reasonable in terms of kinematics and it's interesting to observe the emergence of these properties from an error minimization rule imposed within the spinal cord.

Weaknesses

The conclusions that the authors make in the paper are very strong. Given the emergence of certain interesting properties within the nervous system, there is an implicit implication that this model has some sort of general meaning and, most importantly, that it validates the homeostatic control idea from a theoretical point of view. More specifically, the paper starts with a very strong hypothesis, which is that the spinal cord is minimizing an error signal generated in motor cortex. For an experimental neuroscientist (as myself) this is intriguing, but it also strikes me as a very strong interpretation of the role of what we call motor cortex, which for decades has been imagined as an actuator of movements. Because of the importance of the implication, I don't think that the properties shown in the paper are sufficient to make such a claim, even implicitly. Therefore I think that some form of control is required. I understand completely that models are hard to build, but I can't escape wondering what would happen if the motor cortex were not producing an error signal. What if that error minimization happened instead in the motor cortex and not in the spinal cord? Would your model still produce reach movements? Would those cortical neurons show directionality? Would "synergies" still emerge in the cord? I think this is an important point if we're claiming that those properties emerge as a consequence of the fact that the motor cortex is calculating an error and the spinal cord is minimizing that error. Therefore, to validate the main hypothesis of the paper, I would like to see that a model that does not do that, but is still capable of reaching while maintaining the same hierarchical architecture, would not show the properties that the authors found in their model.

The second weakness is somehow related to the first. There are certain assumptions on the structure that are not true experimentally and it would be important to incorporate these because they may change the observed properties and could thus be used as further validation.

The first missing property is direct cortico-spinal control of spinal motoneurons. The authors say in the introduction that the motor cortex does not activate spinal motoneurons directly and only connects to interneurons. Instead, the major difference between primates (including humans) and all the other animals is precisely the existence of direct monosynaptic connections between the motor cortex and muscles of the upper limb (particularly the hand). These monosynaptic components are thought to represent about 20% of the cortico-spinal tract. It would be important to see whether their presence changes the results that they obtained.

The second, more important functional inaccuracy concerns the spinal cord. The authors send their sensory signals to the "thalamus", which then relays them to the "sensory cortex" and "motor cortex". In the supplementary material they describe the fact that they decided not to include what they call a "short reflex" that would connect "the thalamus" to the spinal cord, or provide feedback in some way. Therefore, in their model the spinal cord is not receiving feedback at all. Unfortunately, this is in stark contrast with the actual structure of the spinal cord. Proprioceptive afferents are pivotal members of the spinal motor infrastructure. They are so important that in simple animals, coupled with motoneurons, they constitute the units that generate movement (Xu, Tianqi, et al. "Descending pathway facilitates undulatory wave propagation in Caenorhabditis elegans through gap junctions." Proceedings of the National Academy of Sciences 115.19 (2018): E4493-E4502.); models of human locomotion show that it is possible to build walking models without pattern generators that only consider sensory inputs in the spinal cord (Song, S., and Geyer, H. (2015). A neural circuitry that emphasizes spinal feedback generates diverse behaviours of human locomotion. J. Physiol. 593, 3493-3511.); and they seem to be critical to direct motor learning and recovery after spinal cord injury (Takeoka, Aya, et al. "Muscle spindle feedback directs locomotor recovery and circuit reorganization after spinal cord injury." Cell 159.7 (2014): 1626-1639.). Most importantly, Ia afferents have perhaps the strongest monosynaptic glutamatergic synaptic input to the spinal motoneurons and constitute a primary driver of spinal motoneuron excitability, which this model is currently missing.
Additionally, some of the agonist-antagonist relationship that the authors find could be significantly changed by the presence of these afferents because Ia afferents connect not only to homologous motoneurons but up to 60% of motoneurons of synergistic muscles and they inhibit antagonists via Ia inhibitory interneurons (See Moraud E. (2016) Mechanisms Underlying the Neuromodulation of Spinal Circuits for Correcting Gait and Balance Deficits after Spinal Cord Injury, Neuron).

In consequence, the presence of this "short-feedback" seems paramount to me to have an accurate representation of the spinal cord, and it's important to verify that this structure would not change the main conclusions of the paper.

Indeed, as the authors have correctly implied, the main difference between the nervous system and an artificial neural network is the underlying defined architecture and hierarchy that must be represented at least for the most important elements. And spinal sensory feedback is a building block of movement and learning. Indeed, completely paralyzed rats are capable of learning different walking patterns when totally disconnected from the brain if excitation is provided by means of electrical stimulation and likely sensory afferents are the driver of this learning (Courtine, Grégoire, et al., "Transformation of nonfunctional spinal circuits into functional states after the loss of brain input." Nature neuroscience 12.10 (2009): 1333-1342.).

Reviewer #2 (Recommendations for the authors):

In this study, the authors investigate how the motor cortex and spinal cord interact in a long-loop reflex organization, from a computational modeling perspective. The motor cortex represents kinematic aspects, i.e. the actual movement of the hand and arm, as opposed to dynamics, which would involve the forces exerted by the muscles or higher-level parameters of the movement. Nevertheless, to achieve the end goal of a certain movement of the hand, a transformation from actuator (muscle) coordinates has to take place, and this is a challenging process. The authors propose that the motor cortex reports an error between the desired and the actual movement, and use computational modeling to explain how a flexible error reduction mechanism can arise from the interplay between the cortex and the spinal cord.

Strengths: A theoretical analysis of the interplay between the spinal cord and motor cortex has rarely been undertaken previously, especially one that considers the long-loop reflex, dynamics, pattern formation, and plasticity. This theoretical contribution is important, especially in the light of the vast new experimental data being generated by e.g. calcium imaging and electrophysiology, and the new observations of rotational cortical dynamics. It is also rare that the role of the spinal cord is incorporated into the execution of movement in parallel with the motor cortex.

Weaknesses: Some of the assumptions hinge on a flexible error reduction mechanism, described in a previous (unpublished) study, which seems somewhat fragile. The model does not really get to the root of the problem of how the nervous system generates the desired movement, which the authors are aware of and call the "supra spinal pattern formation" problem. They seem to export this problem to another area of the brain and describe how the motor cortex-spinal cord loop takes care of an appropriate execution.

The study evaluates multiple aspects of the model in comparison with experimental observation such as direction tuning curve, rotational dynamics, and learning.

Overall, this is an interesting and well-executed study.

Specific comments:

There are a couple of issues regarding the model that are not so clear (or perhaps well-hidden somewhere in the supplement). It is stated that the role of the motor cortex is to report the "error signal" of the movement, i.e. the signal coming from SPA, which is the difference between the desired movement (SP) and the actual movement (SA). First, if the motor cortex is reporting the error, and this error is being minimized while learning, then the signal in the motor cortex should ideally be reduced to zero, which is also what seems to happen in e.g. Figure 2E. Nevertheless, it is known that activity in the motor cortex is not zero during the execution of correct movement, especially in highly trained individuals; e.g., trained monkeys have rotational dynamics when hand-cycling (Churchland 2012). This also seems to be the case in the current study, with directional tuning (Figure 5) and rotation, which appears contradictory to me; it is unclear how this is possible. Second, the cerebellum is mentioned a couple of times but not included in the model. It is known that the cerebellum receives and integrates both massive corticopontine input and spinocerebellar proprioceptive input, and the role of the cerebellum is hence often assumed to be exactly what the motor cortex does in this model. The rationale behind not letting the cerebellum assume this role, but instead making the motor cortex assume it, is not clear to me.

The current model does not explain what the authors call the "supra spinal pattern formation" problem, i.e. how the motor commands arise in the neuronal networks in the cortex and elsewhere. The problem of motor commands is conveniently exported to the external parameter "SP" module that is generating the intended movement. This is for a good reason since it is indeed a hard problem to explain how motor programs arise. I do not see a solution to this issue, only that it is a limitation to be aware of, which the authors seem to be. Nevertheless, the model proposed by the authors makes an effort to explain some of the dynamics of the different regions involved in properly learning and executing the motor commands.

I have some concerns regarding the network, the proportions, and the connectivity that make the model seem less biologically plausible, which should at least be addressed in the text. First, the number of neurons in the populations at various levels seems rather small compared to the immense size of these structures. Why not have more neurons in the networks? This applies to all parts of the model, also to the spinal network. In many species, the ratio of motor neurons to interneurons is about 1:10, yet in this model it is more like 1:2 (the trio). Also, the connectivity has a bias towards the feedforward network, yet local recurrent connectivity seems to be widely present in the nervous system. The proportions in the connectivity from M to C could also be problematic. All the spinal units (c = [c1, …. ,cN]) receive commands/error signals from M (e = [e1, …. ,eM]). Is M > N? Presuming M > N, this would represent converging input from M to C, and that is problematic since the corticospinal projections are rather sparse. More likely there is diverging connectivity from M to C. It is unclear to me whether this is an important problem (it may not be), but it would be appropriate to address such proportions in the text.

Reviewer #3 (Recommendations for the authors):

This study proposes a new computational model for learning upper limb reaching control. In particular, this study develops a model that satisfies biological plausibility, which has not always been satisfied by previous models: (1) a network of the whole sensorimotor loop, (2) a model using neural elements, (3) learning using local information, and (4) continuous online learning. Specifically, their model showed that learning in the synaptic weights of descending and ascending projections between the cortex and spinal cord can reproduce ataxic reaching often observed in cerebellar patients. Furthermore, various phenomena that have been demonstrated in the motor control field emerge with this model. Based on these results, the authors argue that this is a biologically plausible learning model for animals' reaching control.

Strength:

1) This study examines a number of previous motor learning models and defines the constraints to develop a biologically plausible model. Given that there are too many parameters to be considered to build the motor control model, I truly appreciate the authors defining the constraints for a biological motor learning model and clarifying the research goal.

Weakness:

1) As the authors discussed in the manuscript, the model omits many components that are known to be involved in learning and, as a result, the model reproduces impaired motor control rather than the intact state. However, this prevents us from verifying whether the model properly reproduces biological motor control. Specifically, they claim that the model reproduced ataxic reaching, which resembles that of cerebellar patients. However, it is not surprising that a poorly tuned feedback controller (i.e. a PID controller) shows an oscillation. Therefore, it is necessary for the authors to show that their model significantly resembles biological motor control by comparing it to a model that shares the hierarchical structure but in which the error is corrected within the cortex and only motor commands descend to the spinal cord.

2) This model has attributed all of the learning effects to the synaptic connection between the cortex and spinal cord (descending and ascending connections). However, it has been repeatedly demonstrated that plastic changes within the cortex occur during motor learning [1 and others]. Therefore, the physiological relevance of attributing everything to descending and ascending pathways is questionable.

[1] Roth, R. H. et al. Cortical Synaptic AMPA Receptor Plasticity during Motor Learning. Neuron 105, 895-908.e5 (2020).

3) Relating to the previous comment, this model only changed the connection with the spinal cord, but the equivalent circuit can be achieved by changing the intracortical synaptic connection. It is necessary to justify why the connection with the spinal cord should be the sole target of learning.

4) The authors claim that previously known phenomena (convergent force fields and rotational dynamics) "emerge" with this learning model. However, this is not supported by the results. Firstly, when they tested the convergent force field, the spinal cord was separated from the descending and ascending connections with cortex. This means that this observed property is independent of the learning of the cortico-spinal connections. In addition, rotational dynamics are almost exclusively observed in the one-to-one cortical coupling, which the authors call a "less-plausible" model. Therefore, these phenomena are not emergent properties resulting from their learning model. It is necessary to specify what are the essential factors for each observed phenomenon.

5) Regarding the synergistic model, their model has spinal interneurons recruiting two motoneuron pools, which is very different from the muscle synergies observed in animals [2]. It is necessary to justify why this type of synergies should be used here.

[2] d'Avella A, Portone A, Fernandez L, Lacquaniti F (2006) Control of fast-reaching movements by muscle synergy combinations. The Journal of neuroscience: the official journal of the Society for Neuroscience 26:7791-7810
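A side note on weakness 1: the claim that a poorly tuned feedback controller (e.g. a PID controller) oscillates is easy to illustrate. The sketch below is purely illustrative and not taken from the model under review (the unit point mass, gains, and target are our own assumptions): it drives a point mass toward a target with two PID tunings and counts how often each trajectory crosses the target.

```python
import numpy as np

def simulate_pid(kp, ki, kd, target=1.0, dt=0.001, steps=5000):
    """Drive a unit point mass toward `target` with a PID controller."""
    x, v, integ, prev_err = 0.0, 0.0, 0.0, target
    traj = []
    for _ in range(steps):
        err = target - x
        integ += err * dt
        deriv = (err - prev_err) / dt
        prev_err = err
        force = kp * err + ki * integ + kd * deriv
        v += force * dt  # unit mass: acceleration equals force
        x += v * dt
        traj.append(x)
    return np.array(traj)

def crossings(traj, target=1.0):
    """Number of times the trajectory crosses the target level."""
    s = np.sign(traj - target)
    return int(np.sum(s[1:] != s[:-1]))

# Poorly tuned: strong proportional gain, almost no damping -> oscillation.
osc = simulate_pid(kp=100.0, ki=0.0, kd=0.5)
# Better tuned: enough derivative gain for critical damping -> smooth approach.
smooth = simulate_pid(kp=100.0, ki=0.0, kd=20.0)
```

With a unit mass the closed loop is a damped oscillator with damping ratio kd / (2·sqrt(kp)): the first tuning gives 0.025 (heavily underdamped, many target crossings, qualitatively like an ataxic reach), the second gives 1.0 (critically damped, essentially none).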

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Self-configuring feedback loops for sensorimotor control" for further consideration by eLife. Your revised article has been evaluated by Tamar Makin (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

Although the reviewers were satisfied by the new analyses, they were also unanimous in thinking that the manuscript is hard to follow and that this complexity could make it less broadly appealing. Our recommendation is thus for the authors to try to simplify the text, minimising the mental burden to the reader. The main concerns were:

1) The five different "configurations" are not well explained/motivated throughout the paper.

2) Figure captions are too short, which makes it hard to understand the results (e.g., what is Figure 8 about?).

3) There are too many abbreviations, which again does not help comprehend the text.

4) The Discussion is hard to follow, foremost because it requires that the readers remember all five configurations, which were presented in the much earlier -and very long- Results section.

5) The overall logic could be potentially streamlined so the study becomes easier to understand.

In addition, you may want to consider addressing the comment by Reviewer #4, although it is not mandatory.

Reviewer #4 (Recommendations for the authors):

I really appreciate the authors revising the manuscript. The argument is now much clearer. I found that the authors are very open about which neural implementation this motor learning occurs in, and it seems too early to suggest a neural implementation, since the same learning can occur either in the spinal cord or in the cerebral cortex, or perhaps in other structures such as the cerebellum, which is omitted by this model.

As a neurophysiologist, I am curious about what insights this computational model will provide into our understanding of the neural mechanism of motor learning. I understand that the key assumptions of this model are that (1) errors vary monotonically with motor command and (2) differential Hebbian learning at the output connections reduces the errors. The second assumption has already been discussed for possible implementation in the spinal cord. However, the validity of the first assumption is not yet clear. I would like to suggest the authors elaborate on what neural mechanism could transform errors in the multiple sensory systems (e.g visual systems) into the monotonic error such as muscle length. It is also important to mention where the desired somatosensory activity (Sp) comes from.

https://doi.org/10.7554/eLife.77216.sa1

Author response

Essential revisions:

1. The paper focuses on an intriguing new hypothesis: that the spinal cord minimizes an error signal generated in motor cortex. Given the novelty of this hypothesis and its implications for the present manuscript, it is critical that the authors provide additional controls/analyses that specifically address it. These include:

a) What happens if error minimisation happened in the motor cortex rather than in the spinal cord; would the model still produce reaching movements? Do the cortical neurons look like actual neurons (distribution of directional tuning, rotational dynamics)? Do "synergies" still emerge in the spinal cord?

b) Does a model with the same hierarchical structure that doesn't perform error minimisation and is still capable of reaching exhibit the properties that the current model does (in terms of motor cortical activity, synergies, etc)?

a) We have explicitly modeled the scenario where the input-output structure of the controller is resolved in motor cortex. We now emphasize that the results of the paper do not depend on whether this happens in motor cortex or spinal cord, or even on the learning rule. The main ingredients are properly configured feedback control, and a minimally realistic model of the sensorimotor loop. To leave no ambiguity about this point, we include a version of our model with no plasticity (configuration 3).

b) All the model configurations that we present in the paper show all experimental phenomena. Hopefully this will clarify this point.

2. The authors posit that motor cortex reports an error that is being minimised during learning. Under this assumption, the motor cortical signals should be reduced to zero, which is what indeed seems to happen, e.g., in Figure 2E. Nevertheless, it is known that the activity of motor cortical neurons is not zero during the execution of a correct movement, even in highly trained individuals: e.g., trained monkeys have rotational dynamics when reaching (Churchland, Cunningham et al. Nature 2012) or hand cycling (Russo et al. Neuron 2018). How can these observations be reconciled with the current model?

The error being minimized does not entail that motor cortex activity disappears. We expand on this point on our reply to reviewer 2.

3. The cerebellum is mentioned a couple of times but not included in the model. Based on the known fact that the cerebellum receives and integrates both massive corticopontine input and spinocerebellar proprioceptive input, its role is hence often assumed to be exactly what the motor cortex does in this model. The rationale behind not letting Cerebellum assume this role, but instead making the motor cortex assume this role, is not clear.

The role assigned to learning in this model, and the role typically ascribed to the cerebellum are not actually the same.

Please see our reply to reviewer 2.

4. The authors claim that previously known phenomena (convergent force fields and rotational dynamics) "emerge" with their learning model. However, this is not supported by the results. Firstly, when they tested the convergent force field, the spinal cord was separated from the descending and ascending connections with cortex. This means that this observed property is independent of having learned the cortico-spinal connections. In addition, rotational dynamics were almost exclusively observed in the one-to-one cortical coupling, which the authors call a "less-plausible" model. Therefore, these phenomena are not an emergent property that results from their learning model. It is necessary to specify what are the essential factors for each observed phenomenon.

We did not intend to claim that the phenomena in the paper arise from learning, and we hope that the current manuscript is clear about this point. In the case of an isolated spinal cord the synapses that learn are not present, so the phenomenon is clearly an effect of the musculoskeletal model and of the structure of the spinal cord circuit. We also try to make clearer that the rotational dynamics emerge for all the included configurations of the model.

5. This model has attributed all of the learning effects to the synaptic connection between the cortex and spinal cord (descending and ascending connections). However, it has been repeatedly demonstrated that plastic changes within the cortex occur during motor learning (Roth et al. Neuron 2020). How is the physiological relevance of attributing all the learning to descending and ascending pathways justified?

As clarified in our responses to reviewers 2 and 3, we do not claim that all of the learning is being done in spinal cord. We expect many types of learning to happen along the sensorimotor loop; our model is only addressing how the structure of the feedback controller is obtained. Moreover, we remain open to this learning happening in the motor cortex.

6. In relation to the previous comment: an equivalent circuit could be achieved by changing the intracortical synaptic connections. Why is the corticospinal connection the sole target of learning, and how do the results change (or remain unchanged) if learning happens in other parts of the circuit?

We have addressed this by introducing the version of our model where learning happens in the connections from SPA to M (configuration 2).

7. No model can reproduce the whole CNS. However, the reviewers think that a few key features should be added to the model:

a) The authors have not included any form of feedback in the spinal cord. Indeed, short-latency stretch reflexes are critical for movement: coupled with motoneurons, they are the basic units that generate movement (Xu, Tianqi et al. PNAS 2018); models of human walking show that gait can emerge considering only sensory inputs to the cord (Song and Geyer J Physiol 2015); and their absence prevents recovery from spinal cord injury (Takeoka, Aya et al. Cell 2014). Ia afferents have perhaps the strongest monosynaptic glutamatergic synaptic input to spinal motoneurons and constitute a primary driver of spinal motoneuron excitability, which this model is currently missing. Does adding short-latency feedback change the conclusion of the paper?

We have now included feedback from the muscle afferents to the spinal cord.

b) As the authors discussed in the manuscript, the model omits many components that are known to be involved in learning and, as a result, the model reproduces impaired motor control rather than the intact state. However, this prevents us from verifying whether the model properly reproduces biological motor control. Specifically, they claim that the model reproduced ataxic reaching, which resembles that of cerebellar patients. However, it is not surprising that a poorly tuned feedback controller (i.e. a PID controller) shows an oscillation. Therefore, it is necessary for the authors to show that their model significantly resembles biological motor control by comparing it to a model that shares the hierarchical structure but in which the error is corrected within the cortex and only motor commands descend to the spinal cord.

As noted in our reply to reviewer 3, we have hopefully gone beyond merely showing that a feedback controller oscillates, by explaining the common points between the model and the qualitative traits of ataxic movements.

c) 20 % of corticospinal projections to motoneurons innervating the arm and particularly the hand are thought to be monosynaptic. Does their addition change the present results?

We have included projections with plastic synapses from motor cortex to the α motoneurons to illustrate that this does not change the results.

d) The proportions and the connectivity of the network make the model seem less biologically plausible, which should at least be addressed in the text. First, the number of neurons in the populations at various levels seems rather small compared to the immense size of these structures. Why not have more neurons in the networks? This applies to all parts of the model, also to the spinal network. In many species, the ratio of motor neurons to interneurons is about 1:10, yet in this model it is more like 1:2 (the trio). Also, the connectivity has a bias towards the feedforward network, yet local recurrent connectivity seems to be widely present in the nervous system. The proportions in the connectivity from M to C could also be problematic. All the spinal units (c = [c1, …. ,cN]) receive commands/error signals from M (e = [e1, …. ,eM]). Is M > N? Presuming M > N, this would represent converging input from M to C, and that is problematic since the corticospinal projections are rather sparse. More likely there is diverging connectivity from M to C. It is unclear to me whether this is an important problem (it may not be), but it would be appropriate to address such proportions in the text.

We address these limitations in the paper, and in our response to reviewer 2.

e) The model does not include any form of "pattern generator", a limitation the authors are aware of, and denote as the "supraspinal pattern formation" problem. They seem to export this to another area of the brain while the motor cortex-spinal cord loop takes care of an appropriate execution. This needs to be clarified and/or addressed with the model.

As clarified in the paper, generation of desired sensory perceptions is beyond the scope of this paper, but an approach is outlined in our previous work (reference 11).

f) It is well-known that the spinal cord autonomously can produce motor behaviors. It is unclear whether C in the model is capable of this.

We show that an isolated spinal cord has convergent force fields, which would produce a rudimentary type of reaching. Beyond this we don’t believe our spinal cord model can do much.

g) Regarding the synergistic model, their model has spinal interneurons recruiting two motoneuron pools, which is very different from the muscle synergies observed in animals (d'Avella et al., J Neurosci 2006). It is necessary to justify why this type of synergies should be used here.

As explained to reviewer 3, we used a natural proof of principle for the viability of Motor Synergy Encoders (reference 56). Whether the model can express other types of synergies is an interesting avenue for future research.

Reviewer #1 (Recommendations for the authors):

The authors aimed to study corticospinal control of reach movements. More specifically they aim to understand how an unsupervised system could learn to reach and whether that system would then show properties observed in experiments. For this, the authors conceived a neural network model with defined hierarchy and architecture coupled to a simple biomechanical arm model in a closed-loop that is capable of learning a reach-out task. Interestingly, the model shows several properties found in experimental data such as directional tuning of motor cortical units as well as discrete force fields in the spinal cord.

Strengths

The paper has many strengths. It is rigorous and well written, and while not well articulated in the introduction I completely agree with the authors that computational models are necessary if we want to aim at any variation of the concept of "understanding" in relation to the central nervous system. Therefore their goal is absolutely significant and relevant.

I also particularly liked the approach to start from specific hypotheses on the structure and function of the different modeled layers while leaving completely free the organization and strength of synapses in the system. This approach is elegant because starting from clear hypotheses it allows to observe what properties emerge "spontaneously" and compare these properties with experimental data.

The results obtained are reasonable in terms of kinematics and it's interesting to observe the emergence of these properties from an error minimization rule imposed within the spinal cord.

Weaknesses

The conclusions that the authors make in the paper are very strong. Given the emergence of certain interesting properties within the nervous system, the implicit implication is that this model has some sort of general meaning and, most importantly, that it validates from the theoretical point of view the homeostatic control idea. More specifically, the paper starts with a very strong hypothesis, which is that the spinal cord is minimizing an error signal generated in motor cortex. For an experimental neuroscientist (as myself) this is intriguing, but it also strikes me as a very strong interpretation of the role of what we call motor cortex, which for decades has been imagined as an actuator of movements. Because of the importance of the implication, I don't think that the properties shown in the paper are sufficient to make such a claim, even if implicit. Therefore I think that some form of control is required. I understand completely that models are hard to build, but I can't escape wondering what would happen if the motor cortex were not producing an error signal. What if that error minimization happened instead in the motor cortex and not in the spinal cord? Would your model still produce reach movements? Would those cortical neurons show directionality? Would "synergies" still emerge in the cord? I think this is an important point if we're claiming that those properties emerge as a consequence of the fact that the motor cortex is calculating an error and the spinal cord is minimizing that error. Therefore, to validate this main hypothesis of the paper, I would like to see that a model that does not do that, but is still capable of reaching while maintaining the same hierarchical architecture, does not show the properties that the authors found in their model.

Thank you for the various observations up to this point. The question of motor cortex as a generator of errors or as an actuator is a subtle one. We want to raise the possibility that the spinal cord could be an actuator, but as we make now clear, this is not the central theme of the paper. We have restructured the paper to clarify this.

The paper makes two main points: (1) feedback control with a sufficiently realistic musculoskeletal/neural model can explain many experimental phenomena, and (2) this feedback control can be adaptively configured through differential Hebbian learning rules. To leave no room for misunderstanding we added two new configurations. One of these has no learning at all; it is just a feedback controller (configuration 3). The other one (configuration 2) has the learning in motor cortex rather than the spinal cord. All the results hold for both of these configurations.
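As an illustration of the second point, the configuring role of a differential Hebbian rule can be sketched in isolation. The toy example below is our own construction, not the authors' implementation (the scalar plant, gains, and noise level are assumptions): it correlates each motor command with the ensuing change of the error, and thereby discovers the correct sign and magnitude of its feedback gain without knowing the plant.

```python
import numpy as np

def learn_gain(g, eta=2.0, sigma=0.2, dt=0.01, steps=4000, seed=0):
    """Learn the gain w of a feedback controller u = w*e for the unknown
    scalar plant dx/dt = g*u, using a differential Hebbian rule:
    dw is proportional to -u * d(error)/dt, so commands followed by a
    shrinking error are reinforced."""
    rng = np.random.default_rng(seed)
    target, x, w = 1.0, 0.0, 0.0
    e_prev, u_prev = target, 0.0
    abs_err = []
    for _ in range(steps):
        e = target - x
        dedt = (e - e_prev) / dt        # error change caused by u_prev
        w += -eta * u_prev * dedt * dt  # differential Hebbian update
        u = w * e + sigma * rng.standard_normal()  # command + exploration noise
        x += g * u * dt                 # plant with unknown gain g
        e_prev, u_prev = e, u
        abs_err.append(abs(e))
    return w, float(np.mean(abs_err[-500:]))

# The rule finds the right feedback sign for either plant polarity:
w_pos, err_pos = learn_gain(g=+1.0)
w_neg, err_neg = learn_gain(g=-1.0)
```

Because the update reinforces whichever command direction shrinks the error, w converges to a stabilizing gain (w > 0 for g > 0, w < 0 for g < 0) and the error settles near zero; this captures, in miniature, the idea of self-configuring the input-output structure of a feedback loop.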

The second weakness is somehow related to the first. There are certain assumptions on the structure that are not true experimentally and it would be important to incorporate these because they may change the observed properties and could thus be used as further validation.

The first missing property is direct cortico-spinal control of spinal motoneurons. The authors say in the introduction that the motor cortex does not activate spinal motoneurons directly and only connects to interneurons. Instead, the major difference between primates (including humans) and all the other animals is precisely the existence of direct monosynaptic connections between the motor cortex and muscles of the upper limb (particularly the hand). These monosynaptic components are thought to represent about 20% of the cortico-spinal tract. It would be important to see whether their presence changes the results that they obtained.

We did not mean to convey that direct motoneuron connections are entirely absent. The point that we want to convey is that the most dominant form of connectivity is through interneuron connections. As far as we know, direct connections are found in advanced primates, and mostly for distal joints, accounting more for manual dexterity than for reaching.

The good news is that our learning rule works well after introducing these direct connections, which are now part of the model.

The second, more important functional inaccuracy concerns the spinal cord. The authors send their sensory signals to the "thalamus", which then relays them to the "sensory cortex" and "motor cortex". In the supplementary material they describe the fact that they decided not to include what they call a "short reflex" that would connect "the thalamus" to the spinal cord, or provide feedback in some way. Therefore, in their model the spinal cord is not receiving any feedback at all. Unfortunately, this is in stark contrast with the actual structure of the spinal cord. Proprioceptive afferents are pivotal members of the spinal motor infrastructure. They are so important that in simple animals, coupled with motoneurons, they constitute the units that generate movement (Xu, Tianqi, et al. "Descending pathway facilitates undulatory wave propagation in Caenorhabditis elegans through gap junctions." Proceedings of the National Academy of Sciences 115.19 (2018): E4493-E4502.); models of human locomotion show that it is possible to build walking models without pattern generators that only consider sensory inputs in the spinal cord (Song, S., and Geyer, H. (2015). A neural circuitry that emphasizes spinal feedback generates diverse behaviours of human locomotion. J. Physiol. 593, 3493-3511.); and they seem to be critical in directing motor learning and recovery after spinal cord injury (Takeoka, Aya, et al. "Muscle spindle feedback directs locomotor recovery and circuit reorganization after spinal cord injury." Cell 159.7 (2014): 1626-1639.). Most importantly, Ia afferents have perhaps the strongest monosynaptic glutamatergic synaptic input to the spinal motoneurons and constitute a primary driver of spinal motoneuron excitability, which this model is currently missing.
Additionally, some of the agonist-antagonist relationships that the authors find could be significantly changed by the presence of these afferents, because Ia afferents connect not only to homologous motoneurons but also to up to 60% of motoneurons of synergistic muscles, and they inhibit antagonists via Ia inhibitory interneurons (See Moraud E. (2016) Mechanisms Underlying the Neuromodulation of Spinal Circuits for Correcting Gait and Balance Deficits after Spinal Cord Injury, Neuron).

In consequence, the presence of this "short-feedback" seems paramount to me to have an accurate representation of the spinal cord, and it's important to verify that this structure would not change the main conclusions of the paper.

Indeed, as the authors have correctly implied, the main difference between the nervous system and an artificial neural network is the underlying defined architecture and hierarchy, which must be represented at least for the most important elements. And spinal sensory feedback is a building block of movement and learning. Indeed, completely paralyzed rats are capable of learning different walking patterns when totally disconnected from the brain if excitation is provided by means of electrical stimulation, and sensory afferents are likely the driver of this learning (Courtine, Grégoire, et al., "Transformation of nonfunctional spinal circuits into functional states after the loss of brain input." Nature Neuroscience 12.10 (2009): 1333-1342.).

This is a very important observation. We agree that there is an important role to be played by the connections from muscle afferents to interneurons. To address this omission we have included these connections, under the hypothesis that they play a role similar to connections from afferents to motor cortex, namely, predictive error reduction.

Reviewer #2 (Recommendations for the authors):

In this study, the authors investigate how the motor cortex and spinal cord interact in a long-loop reflex organization, from a computational modeling perspective. The motor cortex represents kinematic aspects, i.e. the actual movement of the hand and arm, as opposed to dynamics, which would involve the forces exerted by the muscles or higher-level parameters of the movement. Nevertheless, to achieve the end goal of a certain movement of the hand, a transformation from actuator (muscle) coordinates has to take place, and this is a challenging process. The authors propose that the motor cortex reports an error between the desired and the actual movement, and use computational modeling of the interplay between cortex and the spinal cord to explain how a flexible error-reduction mechanism can be achieved.

Strengths: A theoretical analysis of the interplay between the spinal cord and motor cortex has rarely been undertaken previously, especially one encompassing the long-loop reflex, dynamics, pattern formation, and plasticity. This theoretical contribution is important, especially in the light of the vast new experimental data being generated by e.g. calcium imaging and electrophysiology, and the new observations of rotational cortical dynamics. It is also rare that the role of the spinal cord is incorporated into the execution of movement in parallel with the motor cortex.

Weaknesses: Some of the assumptions hinge on a flexible error-reduction mechanism, which is described in a previous (unpublished) study, and which seems somewhat fragile. The model does not really get to the root of the problem of how the nervous system generates the desired movement, which the authors are aware of and call the "supra spinal pattern formation" problem. They seem to export this problem to another area of the brain and describe how the motor cortex-spinal cord loop takes care of an appropriate execution.

The study evaluates multiple aspects of the model in comparison with experimental observation such as direction tuning curve, rotational dynamics, and learning.

Overall, this is an interesting and well-executed study.

Thank you for your kind comments. Regarding the weaknesses of the study, I would like to reiterate my initial reply to reviewer 1: a central theme of this study is the explanatory power arising from feedback control and a full model of the sensorimotor loop. This is independent of the learning mechanism. Also, our previous study is now published (reference 11 in the manuscript).

Specific comments:

There are a couple of issues regarding the model that are not so clear (or perhaps well hidden somewhere in the supplement). It is stated that the role of the motor cortex is to report the "error signal" of the movement, i.e. the signal coming from SPA, which is the difference between the desired movement (SP) and the actual movement (SA). First, if the motor cortex is reporting the error, and this error is minimized during learning, then the signal in the motor cortex should ideally be reduced to zero, which is also what seems to happen in e.g. Figure 2E. Nevertheless, it is known that activity in the motor cortex is not zero during the execution of correct movements, especially in highly trained individuals; e.g., trained monkeys have rotational dynamics when hand-cycling (Churchland 2012). This also seems to be the case in the current study with directional tuning (Figure 5) and rotation, which appears contradictory to me. It is a bit unclear how this is possible. Second, the cerebellum is mentioned a couple of times but not included in the model. It is known that the cerebellum receives and integrates both massive corticopontine input and spinocerebellar proprioceptive input, and the role of the cerebellum is hence often assumed to be exactly what the motor cortex does in this model. The rationale behind not letting the cerebellum assume this role, but instead making the motor cortex assume it, is not clear to me.

We did not wish to convey the idea that the role of motor cortex is to convey an error, serving as some sort of relay station, and we apologize if that was the impression. Surely there is more interesting learning being done in motor cortex, and we try to remain open to that possibility. This is one of the reasons for having a configuration of the model where motor cortex neurons are driven by more than one error signal. This can also be observed in the fact that in our model motor cortex mixes error and afferent signals to produce a simple form of predictive control.

In the current manuscript we try to clarify that even if the error is reduced to zero, the activity in motor cortex does not disappear. Activity in motor cortex can be very high, but if excitation and inhibition are balanced in the spinal cord, movement will not be generated. Since there are "dual" subpopulations in M, high homogeneous activity may equally activate excitatory and inhibitory interneurons. A careful look at figures 2, S1-S4, S8, and S9 reveals that the M activity is far from ever being zero, and considering that the firing rate units represent averages over a population, some individual neurons could have very high activity in an equivalent spiking model. For all configurations the lowest stationary activity in M oscillates around 0.2 (figure S4).
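This point can be illustrated with a minimal numerical sketch (not taken from the paper's codebase; all rates and weights below are illustrative assumptions): when matched weights from "dual" cortical subpopulations drive excitatory and inhibitory spinal interneurons, high homogeneous cortical activity yields zero net drive to a motoneuron pool, and only an imbalance, such as the one introduced by an error signal, produces net drive.

```python
# Minimal sketch (hypothetical weights and rates, not the paper's model):
# balanced excitatory/inhibitory drive cancels even when cortical activity
# is high; an asymmetric perturbation produces net motoneuron drive.
import numpy as np

rng = np.random.default_rng(0)
m_act = np.full(8, 0.8)            # high, uniform activity in 8 M units
w_exc = rng.uniform(0.4, 0.6, 8)   # M -> excitatory interneuron weights
w_inh = w_exc.copy()               # balanced M -> inhibitory interneuron weights

net_drive = m_act @ w_exc - m_act @ w_inh
print(net_drive)                   # 0.0: no movement despite nonzero M activity

# An error signal breaks the balance in specific units, producing net drive:
m_act[:4] += 0.3                   # a new target excites a subpopulation
w_inh[:4] *= 0.5                   # learned weights are no longer symmetric
print(m_act @ w_exc - m_act @ w_inh > 0)  # net excitatory drive appears
```

The cancellation depends only on the symmetry of the two weight vectors, not on the magnitude of the cortical activity, which is why high stationary M activity is compatible with a resting arm.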

The results with directional tuning and rotation come from the fact that large errors, such as those happening after the presentation of a new target (panel D, figures 2, S1-S4), cause a sudden increase in activity from certain M units. This activity quickly returns to baseline level, as the error is reduced.

We believe that the role being ascribed to learning in this system, and the role typically assumed for the cerebellum, have some subtle distinctions. In our model learning is meant to determine the input-output structure of the system: which muscle contracts in response to an error. For illustration purposes we may think of the classic vestibulo-ocular reflex. Rotation of the head in one direction causes rotation of the eyeball in the opposite direction to stabilize gaze. The input-output structure of this system is determined: the head and the eyeball rotate in opposite directions. The degree of this reflex rotation, however, is adaptive, and the cerebellum plays a role here. It is generally thought that the cerebellum creates an internal model of the system being controlled, permitting model-predictive control and thus increasing the performance of the controller. Most cerebellar models following this paradigm assume that a feedback control system with a good-enough configuration is already in place.

The current model does not explain what the authors call the "supra spinal pattern formation" problem, i.e. how the motor commands arise in the neuronal networks in the cortex and elsewhere. The problem of motor commands is conveniently exported to the external parameter "SP" module that is generating the intended movement. This is for a good reason since it is indeed a hard problem to explain how motor programs arise. I do not see a solution to this issue, only that it is a limitation to be aware of, which the authors seem to be. Nevertheless, the model proposed by the authors makes an effort to explain some of the dynamics of the different regions involved in properly learning and executing the motor commands.

This is indeed a difficult problem. We believe that "exporting" the problem to finding a desired set of perceptions "SP" offers an advantage. First, it partially solves the problem of creating the spatiotemporal pattern of neuronal activity that produces meaningful movement: this arises from the feedback control. Second, we believe that finding appropriate "SP" patterns is a task that could be solved with the right hierarchical architecture. More details are in our previous paper (reference 11).

I have some concerns regarding the network, the proportions, and the connectivity that make the model seem less biologically plausible, which should at least be addressed in the text. First, the number of neurons in the populations at various levels seems rather small compared to the immense size of these structures. Why not have more neurons in the networks? This applies to all parts of the model, also to the spinal network. In many species, the ratio of motor neurons to interneurons is about 1:10, yet in this model it is more like 1:2 (the trio). Also, the connectivity has a bias towards the feedforward network, yet local recurrent connectivity seems to be widely present in the nervous system. The proportions in the connectivity from M to C could also be problematic. All the spinal units (c = [c1, …, cN]) receive commands/error signals from M (e = [e1, …, eM]). Is M > N? Presuming M > N, this would represent a converging input from M to C, and that is problematic since the corticospinal projections are rather sparse. More likely there is diverging connectivity from M to C. It is unclear to me if this is an important problem, it may not be, but it would be appropriate to address such proportions in the text.

The reviewer is correct that there is a vast gap between the numbers (and proportions) in the real brain and those in our model. Despite being a "minimal" model, our codebase is rather large, and simulation times are borderline problematic. Simulations including time delays, multiple regions (in a closed loop with a physical plant), and multiple forms of plasticity tend to require care. We don't include more neurons because we don't want to increase complexity without a strong motivation. Given that our firing rate units represent the average activity of a population, for a large number of units the net effect of different population sizes can be reflected by the strength of the connectivity, unless there are significant dynamics beyond those provided by the E-I circuit. Still, this is a limitation of the model, and we expose it in the paper (line 1634).

This observation also applies to our M to C connections. Interestingly, our simulations with plasticity in M to C begin with random full connectivity, but only have a few strong connections at the end. This result is not presented to avoid further bloating of the paper.

There is indeed a bias towards the feedforward network. We also include ascending connections from A to C and M (creating three separate loops), and lateral connections in M, SPA, and C. Our lateral connections in cortex have the role of supporting the emergence of “dual” representations. Surely there are more subtle roles for recurrent and lateral connections, but in this paper we remain agnostic about them. Once more, we don’t introduce elements unrelated to the goals of the model.

Reviewer #3 (Recommendations for the authors):

This study proposes a new computational model for learning upper limb reaching control. In particular, this study develops a model that satisfies biological plausibility, which has not always been satisfied by previous models: (1) a network of the whole sensorimotor loop, (2) a model using neural elements, (3) learning using local information, and (4) continuous online learning. Specifically, their model showed that learning in the synaptic weights of descending and ascending projections between the cortex and spinal cord can reproduce ataxic reaching often observed in cerebellar patients. Furthermore, various phenomena that have been demonstrated in the motor control field emerge with this model. Based on these results, the authors argue that this is a biologically plausible learning model for animals' reaching control.

Strength:

1) This study examines a number of previous motor learning models and defines the constraints to develop a biologically plausible model. Given that there are too many parameters to be considered to build the motor control model, I truly appreciate the authors defining the constraints for a biological motor learning model and clarifying the research goal.

Weakness:

1) As the authors discussed in the manuscript, the model omits many components that are known to be involved in learning and, as a result, the model reproduces impaired motor control rather than the intact state. However, this prevents us from verifying whether the model properly reproduces biological motor control. Specifically, they claim that the model reproduced ataxic reaching resembling that of cerebellar patients. However, it is not surprising that a poorly tuned feedback controller (i.e. a PID controller) shows an oscillation. Therefore, it is necessary for the authors to show that their model significantly resembles biological motor control by comparing it to a model which shares the hierarchical structure but where the error is corrected within the cortex and only motor commands descend to the spinal cord.

The qualitative characteristics of reaching in cerebellar patients have been studied for many years. It is indeed natural to think of tremors as faulty feedback control; Norbert Wiener (of Cybernetics fame) and many others were already thinking this way decades ago. We tried to go beyond this by: (1) the use of a closed-loop neuromusculoskeletal model, and (2) compiling and comparing the (somewhat sparse) ataxia literature relevant to our testing conditions.
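The reviewer's point that a poorly tuned feedback controller oscillates can be made concrete with a minimal simulation (purely illustrative, not the paper's closed-loop neuromusculoskeletal model; the gains, delay, and duration are arbitrary assumptions chosen to show the effect): a proportional controller driving a unit point mass through delayed, undamped feedback overshoots and oscillates around the target, qualitatively resembling intention tremor, while a damped, undelayed controller reaches smoothly.

```python
# Illustrative sketch (hypothetical parameters): delayed, undamped position
# feedback produces sustained oscillation around the target; adding damping
# and removing the delay yields a smooth reach.

def reach(kp, kd, delay_steps, steps=3000, dt=0.001, target=1.0):
    x, v = 0.0, 0.0
    history = [0.0] * (delay_steps + 1)  # buffer of past positions (feedback delay)
    trajectory = []
    for _ in range(steps):
        x_fb = history[0]                # controller only sees stale feedback
        force = kp * (target - x_fb) - kd * v
        v += force * dt                  # unit mass: acceleration = force
        x += v * dt
        history = history[1:] + [x]
        trajectory.append(x)
    return trajectory

badly_tuned = reach(kp=400.0, kd=0.0, delay_steps=50)  # no damping, 50 ms delay
well_tuned = reach(kp=400.0, kd=40.0, delay_steps=0)   # critically damped

# The undamped, delayed trajectory repeatedly crosses the target (tremor-like):
crossings = sum(1 for a, b in zip(badly_tuned, badly_tuned[1:])
                if (a - 1.0) * (b - 1.0) < 0)
print(crossings >= 5)                    # sustained oscillation around the target
print(abs(well_tuned[-1] - 1.0) < 0.05)  # damped controller settles on the target
```

The sketch only reproduces the generic control-theoretic observation; the comparisons in the paper go further by testing a full neuromusculoskeletal loop against the ataxia literature.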

A main point of the paper (emphasized in this new submission) is that feedback control is a powerful hypothesis, often forgotten by people explaining phenomena in motor control. We now include a model where the error is corrected within cortex, but the point still remains.

2) This model has attributed all of the learning effects to the synaptic connection between the cortex and spinal cord (descending and ascending connections). However, it has been repeatedly demonstrated that plastic changes within the cortex occur during motor learning [1 and others]. Therefore, the physiological relevance of attributing everything to descending and ascending pathways is questionable.

[1] Roth, R. H. et al. Cortical Synaptic AMPA Receptor Plasticity during Motor Learning. Neuron 105, 895-908.e5 (2020).

We did not intend to attribute everything to ascending and descending pathways. As mentioned to reviewer 2, we remain open to other types of learning in motor cortex, and as should be evident in this new version of the paper, we also remain open to learning the configuration of the feedback control in motor cortex.

3) Relating to the previous comment, this model only changed the connection with the spinal cord, but the equivalent circuit can be achieved by changing the intracortical synaptic connection. It is necessary to justify why the connection with the spinal cord should be the sole target of learning.

To address this issue we have introduced configuration 2 of the model, which does precisely this.

4) The authors claim that previously known phenomena (convergent force fields and rotational dynamics) "emerge" with this learning model. However, this is not supported by the results. Firstly, when they tested the convergent force field, the spinal cord was separated from the descending and ascending connections with cortex. This means that this observed property is independent of the learning of the cortico-spinal connections. In addition, rotational dynamics are almost exclusively observed in the one-to-one cortical coupling, which the authors call a "less plausible" model. Therefore, these phenomena are not emergent properties resulting from their learning model. It is necessary to specify what the essential factors are for each observed phenomenon.

We did not mean to attribute all results to learning in the cortico-spinal connections, and certainly not the result of convergent force fields, where those connections are not even present. We tested the convergent force fields with an isolated spinal cord because that is the experimental paradigm, and attributed the result to viscoelastic properties of the arm, forces/torques adding linearly, and the balance between agonists and antagonists.

Giving the impression that we attribute every result in the paper to synaptic learning is a fault of our manuscript. We have tried to be very clear about this with the inclusion of configuration 3, where no plasticity is present, but the results appear nonetheless.

The appearance of the plots showing rotational dynamics can change a lot depending on the hyperparameters of the model and on the details of the algorithm, such as the sampling window. These plots are different from those in Churchland et al. 2012: we have few conditions, few neurons, no overtrained model, and a different way to produce the plots. To assess the presence of quasi-oscillations I would advise also paying attention to the numerical measure (percentage of variance in the first jPCA plane). Notice that we have twice as many jPCA planes as the original study, but close to the same variances.

5) Regarding the synergistic model, their model has spinal interneurons recruiting two motoneuron pools, which is very different from the muscle synergies observed in animals [2]. It is necessary to justify why this type of synergies should be used here.

[2] d'Avella A, Portone A, Fernandez L, Lacquaniti F (2006) Control of fast-reaching movements by muscle synergy combinations. The Journal of neuroscience: the official journal of the Society for Neuroscience 26:7791-7810

The synergistic model is a proof of principle, showing that our model is not incompatible with Motor Synergy Encoders (reference 56); this could be an interesting avenue of future research.

The study in d’Avella et al., 2006 takes the electromyograph waveforms and applies an optimization algorithm to find about five time-dependent patterns (the “synergies”) that explain most of the variance (which turned out to be around 80%). This is indeed a different interpretation of synergies from the one we use. In the introduction we clarified that “synergy” has multiple meanings, but we focus on Motor Synergy Encoders. This is because they fit naturally with the model.

This, however, got me thinking (SVF). Who is to say that applying the optimization procedure in d’Avella et al. 2006 would not explain most of the variance through a reduced number of waveforms? I think this would be a very interesting future experiment. Thank you for providing the reference, and for leading me down this path of thought.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

Although the reviewers were satisfied by the new analyses, they were also unanimous in thinking that the manuscript is hard to follow and that this complexity could make it less broadly appealing. Our recommendation is thus for the authors to try to simplify the text, minimising the mental burden to the reader. The main concerns were:

1) The five different "configurations" are not well explained/motivated throughout the paper.

To reduce the confusion caused by the multiple configurations we have taken three steps. First, we found that two of the configurations are not necessary to the main argument in the paper, so they have been relegated to the Appendix. Second, rather than using numbers, each configuration now has a descriptive name. Third, we have further clarified the rationale for each configuration.

2) Figure captions are too short, which makes it hard to understand the results (e.g., what is Figure 8 about?).

We wrote more descriptive figure captions.

3) There are too many abbreviations, which again does not help comprehend the text.

We removed some abbreviations, and sometimes repeated their definitions.

4) The Discussion is hard to follow, foremost because it requires that the readers remember all five configurations, which were presented in the much earlier -and very long- Results section.

The Discussion has been rewritten, once more stating the idea behind each configuration.

5) The overall logic could be potentially streamlined so the study becomes easier to understand.

We have tried to improve the most tedious paragraphs.

In addition, you may want to consider addressing the comment by Reviewer #4, although it is not mandatory.

Reviewer #4 (Recommendations for the authors):

I really appreciate the authors revising the manuscript. The argument is now much clearer. I found the authors to be very open about which neural structures this motor learning occurs in, and it seems too early to pin down the neural implementation, since the same learning could occur either in the spinal cord or in the cerebral cortex, or perhaps in other structures such as the cerebellum, which is omitted by this model.

As a neurophysiologist, I am curious about what insights this computational model will provide into our understanding of the neural mechanisms of motor learning. I understand that the key assumptions of this model are that (1) errors vary monotonically with motor commands and (2) differential Hebbian learning at the output connections reduces the errors. The second assumption has already been discussed for possible implementation in the spinal cord. However, the validity of the first assumption is not yet clear. I would like to suggest the authors elaborate on what neural mechanism could transform errors in multiple sensory systems (e.g. visual systems) into a monotonic error such as muscle length. It is also important to mention where the desired somatosensory activity (SP) comes from.

We found the comment by Reviewer #4 to be very insightful. Dealing with the monotonicity of the errors and with the origin of the SP patterns is crucial to eventually extending this work into a more general system (for example, to include vision). Considering the current complexity of the manuscript, we believe it is best to refer interested readers to our previous work (enclosed with the submission). In particular, our previous paper (first referenced in line 42) develops a proof of concept for how an actor-critic system can be used to deal with non-monotonicity. We have also suggested (line 264) consulting that paper when considering the origin of the SP patterns.

https://doi.org/10.7554/eLife.77216.sa2

Article and author information

Author details

  1. Sergio Oscar Verduzco-Flores

    Computational Neuroscience Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
    Contribution
    Conceptualization, Software, Formal analysis, Validation, Investigation, Visualization, Writing - original draft, Writing - review and editing
    For correspondence
    sergio.verduzco@gmail.com
    Competing interests
    No competing interests declared
ORCID: 0000-0002-0712-145X
  2. Erik De Schutter

    Computational Neuroscience Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
    Contribution
    Conceptualization, Resources, Supervision, Methodology, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
ORCID: 0000-0001-8618-5138

Funding

No external funding was received for this work.

Acknowledgements

The authors wish to thank Prof. Kenji Doya for helping revise early versions of this manuscript.

Senior Editor

  1. Tamar R Makin, University of Cambridge, United Kingdom

Reviewing Editor

  1. Juan Álvaro Gallego, Imperial College London, United Kingdom

Reviewers

  1. Marco Capogrosso, University of Pittsburgh, United States
  2. Rune W Berg, University of Copenhagen, Denmark

Version history

  1. Received: January 20, 2022
  2. Accepted: October 26, 2022
  3. Accepted Manuscript published: November 14, 2022 (version 1)
  4. Version of Record published: November 24, 2022 (version 2)

Copyright

© 2022, Verduzco-Flores and De Schutter

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.



Cite this article

  1. Sergio Oscar Verduzco-Flores
  2. Erik De Schutter
(2022)
Self-configuring feedback loops for sensorimotor control
eLife 11:e77216.
https://doi.org/10.7554/eLife.77216
