When and why does motor preparation arise in recurrent neural network models of motor control?
Abstract
During delayed ballistic reaches, motor areas consistently display movement-specific activity patterns prior to movement onset. It is unclear why these patterns arise: while they have been proposed to seed an initial neural state from which the movement unfolds, recent experiments have uncovered the presence and necessity of ongoing inputs during movement, which may lessen the need for careful initialization. Here, we modeled the motor cortex as an input-driven dynamical system, and we asked what the optimal way to control this system to perform fast delayed reaches is. We find that delay-period inputs consistently arise in an optimally controlled model of M1. By studying a variety of network architectures, we could dissect and predict the situations in which it is beneficial for a network to prepare. Finally, we show that optimal input-driven control of neural dynamics gives rise to multiple phases of preparation during reach sequences, providing a novel explanation for experimentally observed features of monkey M1 activity in double reaching.
eLife assessment
This important study provides a new perspective on why preparatory activity occurs before the onset of movement. The authors report that when there is a cost on the inputs, the optimal inputs should start before the desired network output for a wide variety of recurrent networks. The authors present compelling evidence by combining mathematically tractable analyses in linear networks and numerical simulation in nonlinear networks.
https://doi.org/10.7554/eLife.89131.4.sa0
Introduction
During the production of ballistic movements, the motor cortex is thought to operate as a dynamical system whose state trajectories trace out the appropriate motor commands for downstream effectors (Shenoy et al., 2013; Miri et al., 2017; Russo et al., 2018). The extent to which these cortical dynamics are controlled by exogenous inputs before and/or during movement is the subject of ongoing study.
On the one hand, several experimental and modeling studies point to a potential role for exogenous inputs in motor preparation. First, cortical state trajectories are empirically well described by a low-dimensional dynamical system evolving near-autonomously during movement (Churchland et al., 2012; Pandarinath et al., 2018; Schimel et al., 2022), such that there is a priori no reason to suspect that inputs are required for motor production. Rather, inputs would be required during preparation to bring the state of the cortical network into a suitable initial condition. This input-driven seeding process is corroborated by observations of movement-specific primary motor cortex (M1) activity arising well before movement initiation (Lara et al., 2018; Kaufman et al., 2014; Churchland et al., 2012; Meirhaeghe et al., 2023; Figure 1A), and associated models demonstrate the critical role of preparatory inputs therein (Sussillo et al., 2015; Hennequin et al., 2014; Kao et al., 2021).
On the other hand, recent studies in mice have shown that the motor cortex receives critical pattern-generating input from the thalamus during movement production (Sauerbrei et al., 2020), and recurrent neural network (RNN)-based modeling of the motor feedback loop involved in reaching movements suggests that sensory feedback may also contribute significantly to the observed dynamics of M1 (Kalidindi et al., 2021). Moreover, most published network models of delayed reaches are able to perform the task just as well without preparatory inputs, i.e., with external inputs forcefully confined to the movement epoch – an illustrative example is shown in Figure 1B. Thus, the relative contributions of preparatory vs. movement-epoch inputs to the dynamics implemented by M1 (potentially as part of a broader set of areas) remain unclear.
In addition to the specific form that inputs to cortical dynamics might take, one may ask more broadly about the computational role of motor preparation. Motor preparation is known to benefit behavior (e.g. by shortening reaction times and enabling more accurate execution; Riehle and Requin, 1989; Churchland and Shenoy, 2007; Michaels et al., 2015) and may facilitate motor learning (Sheahan et al., 2016; Sun et al., 2022). However, from the perspective of cortical dynamics, preparation also introduces additional constraints.
Specifically, the high density of M1 neurons projecting directly to the spinal cord (Dum and Strick, 1991) suggests that motor cortical outputs control lower-level effectors with little intermediate processing. For preparatory processes to avoid triggering premature movement, any premovement activity in the motor and dorsal premotor (PMd) cortices must therefore engage the pyramidal tract neurons in a way that ensures their activity patterns will not lead to any movement.
While this can be achieved by constraining neural activity to evolve in a nullspace of the motor output (Kaufman et al., 2014), the question nevertheless arises: what advantage is there to having neural dynamics begin earlier in a constrained manner, rather than unfold freely just in time for movement production?
Here, we sought a normative explanation for motor preparation at the level of motor cortex dynamics: we asked whether preparation arises in RNNs performing delayed-reaching tasks, and what factors lead to more or less preparation.
Such an explanation could not be obtained from previous network models of delayed reaches, as they typically assume from the get-go that the cortical network receives preparatory inputs during a fixed time window preceding the go cue (Sussillo et al., 2015; Kao et al., 2021). In this case, premovement activity is by design a critical determinant of the subsequent behavior (Sussillo et al., 2015; Kao et al., 2021; Zimnik and Churchland, 2021). In this work, we removed this modeling assumption and studied models in which the correct behavior could in principle be obtained without explicit motor preparation.
To study the role of motor preparation, and that of exogenous inputs in this process, we followed an optimal control approach (Harris and Wolpert, 1998; Todorov and Jordan, 2002; Yeo et al., 2016). We considered the dynamics of an RNN model of M1 coupled to a model arm (Todorov and Li, 2003), and used a standard control cost functional to quantify and optimize performance in a delayed-reaching task. We used the iterative linear quadratic regulator (iLQR) algorithm (Li and Todorov, 2004) to find the spatiotemporal patterns of network inputs that minimize this cost functional, for any given network connectivity. Critically, these inputs could arise both before and during movement; thus, our framework allowed for principled selection among a continuum of motor strategies, going from purely autonomous motor generation following preparation, to purely input-driven unprepared dynamics.
We considered an inhibition-stabilized network – which was shown previously to capture prominent aspects of monkey M1 activity (Hennequin et al., 2014; Kao et al., 2021) – and found that optimal control of the model requires preparation, with optimal inputs arising well before movement begins. To understand what features of network connectivity lead to optimal preparatory control strategies, we first turned to low-dimensional models, which could be more easily dissected. We then generalized insights from those systems back to high-dimensional networks using tools from control theory, and found that preparation can be largely explained by two quantities summarizing the dynamical response properties of the network.
Finally, we studied the optimal control of movement sequences. Consistent with recent experimental findings (Zimnik and Churchland, 2021), we observed that optimal control of compound reaches leads to input-driven preparatory activity in a dedicated activity subspace prior to each movement.
Overall, our results show that preparatory neural activity patterns arise from optimal control of reaching movements at the level of motor cortical circuits, thus providing a possible explanation for a number of observed experimental findings.
Model
A model of cortical dynamics for reaching movements
We considered a simple reaching task, in which the hand must move from a resting location to one of eight radially located targets in a 2D plane as fast as possible (Figure 1). The target had to be reached within 600 ms of a go cue that follows a delay period of varying (but known) duration. We modeled the trajectory of the hand via a two-jointed model arm (Li and Todorov, 2004; Kao et al., 2021), driven into motion by a pair of torques $\mathit{m}(t)$ (Methods). We further assumed that these torques arise as a linear readout of the momentary firing rates $\mathit{r}(t)$ of a population of M1 neurons,

$$\mathit{m}(t) = \mathit{C}\,\mathit{r}(t), \tag{1}$$
where $\mathit{C}$ was a randomly generated readout matrix, projecting the neural activity into the output space. We modeled the dynamics of $N=200$ M1 neurons using a standard rate equation,

$$\tau\,\dot{\mathit{x}}(t) = -\mathit{x}(t) + \mathit{W}\,\mathit{r}(t) + \mathit{h} + \mathit{u}(t), \tag{2}$$

$$\mathit{r}(t) = \varphi[\mathit{x}(t)], \tag{3}$$
where the momentary population firing rate vector $\mathit{r}(t)$ was obtained by passing a vector of internal neuronal activations $\mathit{x}(t)$ through a rectified linear function $\varphi [\cdot ]$, elementwise. In Equation 2, $\mathit{h}$ is a constant input that establishes a baseline firing rate of 5 Hz on average, with a standard deviation of 5 Hz across neurons, $\mathit{u}(t)$ is a taskdependent control input (see below), and $\mathit{W}$ denotes the matrix of recurrent connection weights. Throughout most of this work, we considered inhibitionstabilized M1 dynamics (Hennequin et al., 2014; Methods), which have previously been shown to produce activity resembling that of M1 during reaching (Kao et al., 2021).
Thus, our model can be viewed as a two-level controller, with the arm being controlled by M1, and M1 being controlled by external inputs. Note that each instantiation of our model corresponds to a set of $\mathit{W}$, $\mathit{C}$, and $\mathit{h}$, none of which are specifically optimized for the task.
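As a concrete illustration of this two-level setup, the network dynamics and linear torque readout can be integrated with a simple forward-Euler scheme. This is only a sketch: the step size, time constant, and calling convention are illustrative assumptions, not the settings used in this study.

```python
import numpy as np

def simulate(W, C, h, u, dt=1e-3, tau=0.15):
    """Euler-integrate tau * dx/dt = -x + W @ phi(x) + h + u(t),
    with firing rates r = phi(x) = max(x, 0) and torques m = C @ r.
    u has shape (N, T); returns rates (T, N) and torques (T, n_out)."""
    N, T = W.shape[0], u.shape[1]
    x = np.zeros(N)                       # internal activations
    rates, torques = [], []
    for t in range(T):
        r = np.maximum(x, 0.0)            # rectified-linear nonlinearity
        rates.append(r)
        x = x + dt / tau * (-x + W @ r + h + u[:, t])
        torques.append(C @ np.maximum(x, 0.0))
    return np.array(rates), np.array(torques)
```

With zero recurrence and zero control input, the activations simply relax toward the constant input $\mathit{h}$, which is how the baseline firing rates are established in the model.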
To prepare or not to prepare?
Previous experimental (Churchland et al., 2012; Shenoy et al., 2013) and modeling (Hennequin et al., 2014; Sussillo et al., 2015; Pandarinath et al., 2018) work suggests that fast ballistic movements rely on strong dynamics, which are observed in M1 and well modeled as near-autonomous (although the underlying dynamical system may not be anatomically confined to M1, as we discuss later). Network-level models of ballistic control thus rely critically on a preparation phase during which they are driven into a movement-specific state that seeds their subsequent autonomous dynamics (Kao et al., 2021; Sussillo et al., 2015). However, somewhat paradoxically, the same recurrent neural network models can also solve the task in a completely different regime, in which task-related inputs arise during movement only, with no preparatory inputs whatsoever. We illustrate this dichotomy in Figure 1. The same center-out reach can be produced with control inputs to M1 that arise either prior to movement only (solid lines), or during movement only (dashed lines). In the latter case, no reach-specific preparatory activity is observed, making the model inconsistent with experimental findings. But what rationale is there in preparing for upcoming movements, then?
To address this question, we formulated delayed reaching as an optimal control problem, and asked what external inputs are required, and at what time, to drive the hand into the desired position with minimum control effort. Specifically, we sought inputs that were as weak as possible yet accurately drove the hand to the target within an allotted time window. We also penalized inputs that caused premature movement before the go cue.
Thus, we solved for spatiotemporal input trajectories that minimized a cost functional capturing the various task requirements. Our cost was composed of three terms: $\mathcal{J}_{\text{target}}$ penalizes deviations away from the target, with an ‘urgency’ weight that increases quadratically with time, thus capturing the implicit incentive for animals to perform fast reaches in such experiments (which are normally conducted in sessions of fixed duration).
$\mathcal{J}_{\text{null}}$ penalizes premature movement during preparation, as measured by any deviation in position, speed, and acceleration of the hand. Finally, $\mathcal{J}_{\text{effort}}$ penalizes control effort in the form of input magnitude throughout the whole trial, thus promoting energy-efficient control solutions among a typically infinite set of possibilities (Kao et al., 2021; Sterling and Laughlin, 2015). Note that $\mathcal{J}_{\text{effort}}$ can be viewed as a standard regularization term, and must be included to ensure the control problem is well defined. The total objective thus had the following form:

$$\mathcal{J}(\mathit{u}) = \mathcal{J}_{\text{target}} + \alpha_{\text{null}}\,\mathcal{J}_{\text{null}} + \alpha_{\text{effort}}\,\mathcal{J}_{\text{effort}},$$
where $\mathit{\theta}$ and $\dot{\mathit{\theta}}$ denote the position and velocity of the hand in angular space, $\Delta_{\text{prep}}$ was the duration of the delay period, and $T$ that of the movement period. As $\mathcal{J}_{\text{target}}$ and $\mathcal{J}_{\text{null}}$ depend on $\mathit{u}(t)$ implicitly through Equations 1 and 2, $\mathcal{J}$ is a function of $\mathit{u}$ only.
Importantly, we allowed for inputs within a time window beginning $\Delta_{\text{prep}}$ ms before, and ending $T$ ms after the go cue (set at $t=0$). Therefore, both preparation-only and movement-only input strategies (Figure 1) could potentially arise, as well as anything in-between.
Note that this control objective assumes that the delay duration ($\Delta_{\text{prep}}$) is known ahead of time, an assumption that does not hold for many delayed-reaching tasks in monkeys where the delay is uncertain. We make this assumption for computational tractability and later discuss extensions to the uncertain case (Discussion).
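To make the structure of the objective concrete, the three penalty terms can be accumulated over a discretized trial roughly as follows. The quadratic ‘urgency’ ramp, the exact quantities entering $\mathcal{J}_{\text{null}}$, and the default weights are illustrative assumptions, not the precise definitions from Methods.

```python
import numpy as np

def control_cost(hand_pos, hand_vel, hand_acc, u, target, dt, t_go_idx,
                 alpha_null=1.0, alpha_effort=1e-3):
    """Discretized J = J_target + alpha_null * J_null + alpha_effort * J_effort.
    Arrays are indexed by time step; t_go_idx marks the go cue."""
    T_move = hand_pos.shape[0] - t_go_idx
    # J_target: squared distance to target during movement, weighted by
    # a quadratically growing 'urgency' ramp (illustrative choice)
    urgency = (np.arange(T_move) / T_move) ** 2
    j_target = dt * np.sum(
        urgency * np.sum((hand_pos[t_go_idx:] - target) ** 2, axis=1))
    # J_null: penalize deviations in position, speed, and acceleration
    # of the hand during the delay period
    pre = slice(0, t_go_idx)
    j_null = dt * np.sum((hand_pos[pre] - hand_pos[0]) ** 2
                         + hand_vel[pre] ** 2 + hand_acc[pre] ** 2)
    # J_effort: input magnitude over the whole trial
    j_effort = dt * np.sum(u ** 2)
    return j_target + alpha_null * j_null + alpha_effort * j_effort
```

A trajectory that keeps the hand still before the go cue and at the target afterwards, with no control input, incurs zero cost; any input, premature motion, or target error adds a positive penalty.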
Here, we solved for the optimal control inputs using the iLQR (Li and Todorov, 2004), an efficient trajectory optimization algorithm that is well suited for handling the nonlinear nature of both the arm’s and the network’s dynamics. As our primary goal was to assess the role of preparation in a normative way, we did not study the putative circuit dynamics upstream of M1 that might lead to the computation of these optimal inputs.
We balanced the various components of our cost functional by choosing $\alpha_{\text{null}}$ and $\alpha_{\text{effort}}$ to qualitatively match the behavioral requirements of a typical reach-and-hold task. Specifically, we tuned them jointly so as to ensure (i) stillness during preparation and (ii) a reach duration of ∼400 ms, with the hand staying within 0.5 cm of the target for ∼200 ms after the end of the reach. We ensured that the main qualitative features of the solution, i.e., the results presented below, were robust to the choice of hyperparameter values within the fairly large range in which the above soft constraints are satisfied (Appendix 1).
Results
Preparation arises as an optimal control strategy
Using the above control framework, we assessed whether the optimal way of performing a delayed reach involves preparation.
More concretely, does the optimal control strategy of the model described in Equation 2 involve any preparatory inputs during the delay period? For any single optimally performed reach, we found that network activity began changing well before the go cue (Figure 2B, bottom), and that this was driven by inputs that arose early (Figure 2B, middle). Thus, although preparatory network activity cancels in the readout (such that the hand remains still; Figure 2B, top) and therefore does not contribute directly to movement, it still forms an integral part of the optimal reach strategy.
To quantify how much the optimal control strategy relied on inputs prior to movement, we defined the preparation index as the ratio of input magnitude during the delay period to that during the remainder of the trial:

$$\text{prep. index} = \frac{\int_{-\Delta_{\text{prep}}}^{0} \lVert \mathit{u}(t) \rVert \, dt}{\int_{0}^{T} \lVert \mathit{u}(t) \rVert \, dt}.$$
We found that the preparation index rose sharply as we increased the delay period, and eventually plateaued at ∼1.3 for delay periods longer than 300 ms (Figure 2C).
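For reference, the preparation index defined above can be computed from an input trajectory as follows; the choice of norm and of discrete integration is an assumption of this sketch.

```python
import numpy as np

def preparation_index(u, t_go_idx, dt=1e-3):
    """Ratio of total input magnitude before the go cue (delay period)
    to that after it (movement period). u has shape (N, T); the go cue
    occurs at time index t_go_idx."""
    norms = np.linalg.norm(u, axis=0)     # ||u(t)|| at each time step
    prep = norms[:t_go_idx].sum() * dt
    move = norms[t_go_idx:].sum() * dt
    return prep / move
```

An index above 1 indicates that most of the control effort is spent before movement onset, as in the plateau value of ∼1.3 reported above.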
Similarly, the total cost of the task was highest in the absence of preparation, and decreased until it also reached a plateau at $\Delta_{\text{prep}} \sim 300$ ms (Figure 2C, black). This appears somewhat counterintuitive, as having a larger $\Delta_{\text{prep}}$ means that both $\mathcal{J}_{\text{effort}}$ and $\mathcal{J}_{\text{null}}$ are accumulated over a longer period. To resolve this paradox, we examined each component of the cost function. We found that the overall decrease in cost with increasing preparation time was driven by a concurrent decrease in both $\mathcal{J}_{\text{tgt}}$ and $\mathcal{J}_{\text{effort}}$. The former effect was due to the model producing faster reaches (Figure 2C inset; hand position for a reach with [red] and without [blue] preparation), while the latter arose from smaller control inputs being necessary when preparation was allowed. Together, these results suggest that the presence of a delay period changes the optimal control strategy for reaching, and increases performance in the task.
The results above show that delaying the reach beyond ∼300 ms brings little benefit; in particular, all components of the cost stabilize past that point (Figure 2C). We thus wondered what features the optimally controlled dynamics would display as $\Delta_{\text{prep}}$ increased beyond 300 ms. Would the network defer preparation to a last-minute surge, or prepare more gently over the entire preparatory window?
Would the network produce the same neural activity patterns? We found that the optimal controller made little use of preparation time beyond ∼300 ms before the go cue: with longer preparation times, external input continued to arise just a couple of hundred milliseconds before movement initiation, and single-neuron firing rates remained remarkably similar (Figure 3A). This was also seen in PCA projections of the firing rates, which traced out similar trajectories irrespective of the delay period (Figure 3B). We hypothesized that this behavior is due to the network dynamics having a certain maximum characteristic timescale, such that inputs that arrive too early end up being ‘forgotten’ – they increase $\mathcal{J}_{\text{effort}}$ and possibly $\mathcal{J}_{\text{null}}$ without having a chance to influence $\mathcal{J}_{\text{tgt}}$. We confirmed this by varying the characteristic time constant ($\tau$ in Equation 2). For a fixed $\Delta_{\text{prep}}$, we found that for larger (resp. lower) values of $\tau$, the optimal control inputs started rising earlier (resp. later) and thus occupied more (resp. less) of the allotted preparatory period (Appendix 1—figure 3).
Understanding optimal control in simplified models
Having established that the inhibition-stabilized network (ISN) model of M1 relies on preparatory inputs to solve the delayed-reaching task, we next tried to understand why it does so.
To further unravel the interplay between the structure of the network and the optimal control strategy, i.e., what aspects of the dynamics of the network warrant preparation, we turned to simpler, two-dimensional (2D) models of cortical dynamics. These 2D models are small enough to enable detailed analysis (Appendix 1—figure 2), yet rich enough to capture the two dominant dynamical phenomena that arise in ISN dynamics: non-normal amplification (Murphy and Miller, 2009; Goldman, 2009; Hennequin, 2012) and oscillations (Brunel, 2000; Dayan and Abbott, 2001). Specifically, networks of E and I neurons have been shown to embed two main motifs of effective connectivity which are revealed by appropriate orthogonal changes of basis: (i) feedforward (‘non-normal’) connectivity whereby a ‘source mode’ of E-I imbalance feeds into a ‘sink mode’ in which balance is restored, and (ii) antisymmetric connectivity that causes the two populations to oscillate.
To study the impact of each of these prototypical connectivity motifs on movement preparation, we implemented them separately, i.e., as two small networks of two units each, with an overall connectivity scale parameter $w$ which we varied (Figure 4A and D; Methods). As both non-normal and oscillatory dynamics arise from linear algebraic properties of the connectivity matrix, we considered linear network dynamics for this analysis ($\varphi(x)=x$ in Equation 3). Moreover, to preserve the existence of an output nullspace in which preparation could in principle occur without causing premature movement, we reduced the dimensionality of the motor readout from 2D (where there would be no room left for a nullspace) to 1D (leaving a 1D nullspace), and adapted the motor task so that the network now had to move the hand position along a single dimension (Figure 4B and E, top). Analogous to the previous arm model, we assumed that the hand’s acceleration along this axis was directly given by the 1D network readout.
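The two motifs can be written down directly as 2×2 connectivity matrices and integrated under the linearized dynamics ($\varphi(x)=x$). The unit leak and time constant below are illustrative assumptions.

```python
import numpy as np

def linear_flow(W, x0, dt=1e-3, tau=0.15, steps=3000):
    """Euler-integrate tau * dx/dt = (W - I) @ x from x0 (phi(x) = x)."""
    A = (W - np.eye(2)) / tau
    xs = [np.array(x0, dtype=float)]
    for _ in range(steps):
        xs.append(xs[-1] + dt * A @ xs[-1])
    return np.array(xs)

w = 4.0
W_ff = np.array([[0.0, 0.0],
                 [w,   0.0]])     # feedforward motif: source unit -> sink unit
W_osc = np.array([[0.0, -w],
                  [w,  0.0]])     # antisymmetric (oscillatory) motif

# Seeding the feedforward network along the source unit produces a large
# transient in the sink unit before activity decays back to baseline.
traj = linear_flow(W_ff, [1.0, 0.0])
```

This transient amplification of a source-mode initial condition into the sink mode is exactly the flow-field property that the optimal controller exploits when the readout is aligned with the sink mode.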
We found that optimal control of both dynamical motifs generally led to preparatory dynamics, with inputs arising before the go cue (Figure 4B and E, bottom). In the feedforward motif, the amount of preparatory inputs appeared to depend critically on the orientation of the readout. When the readout was aligned with the sink (brown) mode (Figure 4B, left), the controller prepared the network by moving its activity along the source (orange) mode (Figure 4C, left). This placed the network in a position from which it had a natural propensity to generate large activity transients along the readout dimension (c.f. flow field in Figure 4A); here, these transients were exploited to drive the fast upstroke in hand acceleration and throw the hand toward the target location. Note that this strategy reduces the amount of input the controller needs to deliver during the movement, because the network itself does most of the work.
Nevertheless, in this case the network’s own impulse response was not rich enough to accommodate the phase reversal required to subsequently slow the hand down and terminate the movement. Optimal control therefore also involved inputs during the movement epoch, leading to a preparation index of ∼0.54 (Figure 4G, dark blue).
When it was instead the source mode that was read out (Figure 4B, right), the only dimension along which the system could prepare without moving was the sink mode. Preparing this way is of no benefit, because the flow field along the sink mode has no component along the source (readout) mode.
Thus, here the optimal strategy was to defer control to the movement epoch, during which the transient growth of network activity along the readout rested entirely on adequate control inputs. This led to a preparation index of ∼0 (Figure 4G, pale green). Although the network did react with large activity excursions along the sink mode (Figure 4C, right), these were inconsequential for the movement. Importantly, of the two extreme readout configurations discussed above, the first one yielded a smaller overall optimal control cost (by a factor of ∼1.5). Thus, at a meta-control level, ideal downstream effectors would read out the sink mode, not the source mode. Note that while increasing the connectivity strength initially led to more preparation (Figure 4H), a plateau was eventually reached for $w\ge 4$. Indeed, while stronger dynamics initially make preparation more beneficial, they also make it more difficult for preparatory activity to remain in the readout nullspace.
We obtained similar insights for oscillatory network dynamics (Figure 4D–F). A key difference however was that the flow field was rotationally symmetric such that no distinction could be made between ‘source’ and ‘sink’ units – indeed the optimal control strategy yielded the same results (up to a rotation of the state space) irrespective of which of the two units was driving the hand’s acceleration (compare left and right panels in Figure 4D–F). Nevertheless, the optimal controller consistently moved the network’s activity along the output-null axis during preparation, in such a way as to engage the network’s own rotational flow immediately after the go cue (Figure 4F). This rotational flow drove a fast rise and decay of activity in the readout unit, thus providing the initial segment of the required hand acceleration. The hand was subsequently slowed down by modest movement-epoch control inputs which eventually receded, leading to a preparation index of ∼0.58. Interestingly, the preparation index showed a decrease for very large $w$ (Figure 4G), which did not reflect smaller preparatory inputs (Figure 4H) but rather reflected the larger inputs that were required during movement to cancel the fast oscillations naturally generated by the network.
The above results highlight how the optimal control strategy is shaped by the dynamical motifs present in the network. Crucially, we found that the optimal way to control the movement depends not only on the strength and flow of the internal network dynamics, but also on their interactions with the readout.
Controltheoretic properties predict the amount of preparation
Our investigation of preparation in a low-dimensional system allowed us to isolate the impact of core dynamical motifs, and highlighted how preparation depends on the geometry of the flow field, and its alignment to the readout. However, these intuitions remain somewhat qualitative, making them difficult to generalize to our high-dimensional ISN model.
To quantify the key criteria that appear important for preparation, we turned to tools from control theory. We reasoned that, for a network to be able to benefit from preparation and thus exhibit a large preparation index, there must be some advantage to using early inputs that do not immediately cause movement, relative to using later inputs that do. We hypothesized that this advantage could be broken down into two criteria. First, there must exist activity patterns that are momentarily output-null (i.e. do not immediately cause movement) yet seed output-potent dynamics that subsequently move the arm. The necessity of this criterion was obvious in the 2D non-normal network, which did not display any preparation when its nullspace was aligned with its ‘sink’ mode. In the language of control theory, this criterion implies that the nullspace of the readout must be sufficiently ‘observable’ – we captured this in a scalar quantity $\alpha$ (Methods; Kao and Hennequin, 2019; Skogestad, 2007). Second, there must be a sizeable cost to performing the movement in an entirely input-driven manner without relying on preparation. In other words, the network should be hard to steer along the readout direction, i.e., the readout must be of limited ‘controllability’ – we captured this in another scalar quantity $\beta$ (Methods).
We illustrate the meaning of these two metrics in Figure 5A and B for a 2D example network that combines nonnormality and oscillations. We show two extreme choices of readout direction (Figure 5A, dashed black): the one that maximizes α (top) and the one that minimizes it (bottom). In the first case, the readout nullspace (dashed orange) is very observable, i.e., trajectories that begin in the nullspace evolve to produce large transients along the readout (solid orange and inset). In the second case, the opposite is true. For each case, we also assessed the controllability of the readout (β). The controllability of a direction corresponds to how much variance activity trajectories exhibit along that direction, when they are randomly and isotropically initialized (Figure 5B). In other words, a very controllable direction is one along which network trajectories have a natural tendency to evolve.
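These two quantities can be made concrete using the standard Gramians of a stable linear system $\dot{\mathit{x}} = A\mathit{x}$ with readout $C$. The specific normalizations below (total observability of the readout nullspace for $\alpha$, readout variance under isotropic inputs for $\beta$) are illustrative choices and may differ from the exact definitions in Methods.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, null_space

def prep_metrics(A, C):
    """Gramian-based summaries for stable linear dynamics dx/dt = A x
    with readout y = C x. alpha: output energy generated by unit initial
    states lying in null(C); beta: readout variance under isotropic
    inputs. Normalizations here are illustrative."""
    n = A.shape[0]
    # Observability Gramian Q solves A^T Q + Q A = -C^T C
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    # Controllability Gramian P solves A P + P A^T = -I (isotropic input)
    P = solve_continuous_lyapunov(A, -np.eye(n))
    N_basis = null_space(C)                    # orthonormal basis of null(C)
    alpha = np.trace(N_basis.T @ Q @ N_basis)  # nullspace observability
    beta = np.trace(C @ P @ C.T)               # readout controllability
    return alpha, beta
```

On the 2D feedforward motif, reading out the sink mode yields a highly observable nullspace, whereas reading out the source mode yields $\alpha = 0$: nothing flows from the sink back to the source, so preparing in that nullspace can never influence the output.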
We then assessed how well $\alpha$ and $\beta$ could predict the preparation index of individual networks. In 2D networks, we found that a simple function that grows with $\alpha$ and decreases with $\beta$ could accurately predict preparation across thousands of networks (Appendix 1, Section 3, ‘Additional results in the 2D system’). To assess whether these insights carried over to high-dimensional networks, we then generated a range of large ISNs with parametrically varied connectivity strengths and decay timescales (Methods). We then regressed the preparation index against the values of $\alpha$ and $\beta$ computed for each of these networks (as controllability and observability are only defined for linear networks, we set $\varphi(x)=x$ for this investigation). We found that a simple linear mapping, $\text{prep. index}={k}_{0}+{k}_{\alpha}\alpha +{k}_{\beta}\beta$, with parameters fitted to one half of the ISNs, accurately predicted the preparation indices of the other half (Figure 5C; $R^{2}=0.93$, fivefold cross-validated). Interestingly, we observed that although $\alpha$ and $\beta$ (which are both functions of the network connectivity) were highly correlated across different networks, discarding either variable in our linear regression led to a significant drop in $R^{2}$ (Figure 5C, inset). Importantly, it was their difference that best predicted the preparation index (${k}_{\alpha}>0$ and ${k}_{\beta}<0$), consistent with our hypothesis that the preparation index is a relative quantity which increases as the nullspace becomes more observable, but decreases as readout dimensions become more controllable.
We were able to confirm the generality of this predictive model by generating networks with other types of connectivity (oscillatory networks, and networks with unstructured random weights), which displayed dynamics very different from the ISNs (see Appendix 1—figure 6). Interestingly, despite the different distribution of $\alpha$ and $\beta$ parameters in those networks, we could still capture a large fraction of the variance in their preparation index (${R}^{2}=0.69$) using the linear fit obtained from the ISNs alone.
This confirms that $\alpha$ and $\beta$ can capture information about the networks’ dynamics in a universal manner.
Note that we do not make any claims about the specific functional form of the relationship between $\alpha$, $\beta$, and the preparation index. Rather, we claim that there should be a broad trend for the preparation index to increase with $\alpha$ and decrease with $\beta$, and we acknowledge that this relationship could in general be nonlinear. Indeed, in 2D networks, we found that the preparation index was in fact better predicted by the ratio of $\alpha$ over $\beta$ than by their difference (Appendix 1—figure 5).
Finally, as the above results highlight that the amount of preparation depends on the alignment between internal dynamics and readout, one may wonder how much our conclusions depend on our use of a random unstructured readout matrix. First, we note that the effect of the alignment on the preparation index is greatly amplified in the low-dimensional networks (Figure 4G). In high-dimensional networks, the nullspace of a random readout matrix $\mathit{C}$ will have some overlap with the most observable directions of the dynamics, thus encouraging preparation. Second, we performed additional simulations where we meta-optimized the readout so as to minimize the average optimal cost per movement. The resulting system is more observable overall (as it allows the network to solve the task at a lower cost) but relies just as much on preparation (Appendix 1—figure 7).
Modeling movement sequences
Having gained a better understanding of what features lead a network to prepare, we next set out to assess whether optimal control could also explain the neural preparatory processes underlying the generation of movement sequences. We revisited the experimental studies of Zimnik and Churchland, 2021, where monkeys were trained to perform two consecutive reaches. Each trial started with the display of both targets, followed by an explicitly enforced delay period before the onset of the first reach. A distinction was made between ‘double’ reaches in which a pause was enforced between reaches, and ‘compound’ reaches in which no pause was required. This study concluded that, rather than the whole movement sequence unrolling from a single preparatory period, each reach was instead successively seeded by its own preparatory activity.
Here, we asked whether such an independent, successive preparation strategy would arise as an optimal control solution, in the same way that single-reach preparation did. Importantly, we could not answer this question by directly examining network inputs as we did for single reaches. Indeed, any network input observed before the second reach could be contributing either to the end of the first movement or to the preparation of the next. In fact, the issue of teasing apart preparatory vs. movement-related activity patterns also arose in the analysis of the monkey data. To address this, Zimnik and Churchland, 2021, exploited the fact that monkey M1 activity just before and during single reaches is segregated into two distinct subspaces. Thus, momentary activity patterns (during either single or double reaches) can be unambiguously labeled as preparatory or movement-related depending on which of the two subspaces they occupy. We performed a similar analysis (Methods) and verified that preparatory and movement activity patterns in the model were also well segregated in their respective subspaces in the single-reach task (Figure 6A and B). We then assessed the occupancy of the preparatory subspace during double reaching in the model, and took this measure as a signature of preparation.
To model optimal control of a double reach, we modified our cost functional to account for the presence of two consecutive targets (see Methods). We considered the same set of eight targets as in our single-reach task, and modeled all possible combinations of two targets (one example shown in Figure 6). We set the hyperparameters of the cost function such that both targets could be reached by the resulting optimal controller, in a way that matched important qualitative aspects of the monkeys’ behavior (in particular, such that both reaches were performed at similar velocities, with the second reach lasting slightly longer on average; Figure 6B and C, top).
We projected the network activity onto preparatory and movement subspaces identified using single- and double-reach activity (Methods). For double reaches with a long (600 ms) pause, the preparatory subspace was transiently occupied twice, with the two peaks occurring just before the onset of each movement in the sequence (Figure 6B, bottom).
Notably, the occupancy during the ‘compound’ reach (without pause; Figure 6C) also started rising prior to the first movement, before decaying very slightly and peaking again before the second reach, indicating two independent preparatory events. This is somewhat surprising, given that a movement sequence can also be viewed as a single ‘compound’ movement, for which we have previously shown that a single preparatory phase is sufficient (Figure 2). In our model, this behavior can be understood to arise from the requirement that the hand stop briefly at the first target. To produce the second reach, the hand needs to accelerate again, which requires a transient regrowth of activity in the network. Given that the network’s dynamical repertoire exhibits limited timescales, this is most easily achieved by reinjecting inputs into the system.
In summary, our results suggest that the ‘independent’ preparation strategy observed in monkeys is consistent with optimal control of a two-reach sequence. While Zimnik and Churchland, 2021, showed that RNNs trained on this task used this ‘independent’ strategy, this was by design, as the network was only cued for the second reach after the first one had started. In addition to replicating this proof of concept that it is possible to prepare while moving, our model also shows how and why independent preparation might arise as an optimal control solution.
Discussion
In this work, we proposed a model of the dynamics of motor cortex during a delayed-reaching task in non-human primates. Unlike previous work, we treated M1 as an input-driven nonlinear dynamical system, with generic connectivity not specifically optimized for the task, but with external inputs assumed to be optimal for each reach.
Motivated by a large body of evidence suggesting that preparation is useful before delayed reaches (Churchland et al., 2010; Lara et al., 2018; Afshar et al., 2011; Shenoy et al., 2013), but also by evidence that thalamic inputs are necessary for accurate movement execution (Sauerbrei et al., 2020), we used this model to investigate whether and why neural circuits might rely on motor preparation during delayed-reaching tasks. Interestingly, preparation arose as an optimal control strategy in our model, with the optimal solution to the task relying strongly on inputs prior to movement onset. Moreover, the benefits of preparation depended on the network connectivity, with preparation being more prevalent in networks whose rich internal dynamics can be advantageously seeded by early external inputs. We were able to quantify this intuition with a predictive model relating the dynamical response properties of a network to the amount of preparation it exhibits when controlled optimally.
Finally, we found that prominent features of the monkeys’ neural activity during sequential reaches arose naturally from optimal control assumptions. Specifically, optimally controlled networks relied on two phases of preparation when executing sequences of two reaches, corroborating recent experimental observations in monkey M1 (Zimnik and Churchland, 2021). Together, our results provide a normative explanation for the emergence of preparatory activity in both single and sequential reaching movements.
In recent years, task-optimized RNNs have become a very popular tool to model neural circuit dynamics. Classically, these models incorporate only those inputs that directly reflect task-related stimuli (e.g. the motor target or the go cue). This requires assumptions about the form of the inputs, such as modeling them as simple step functions that are active during specific task epochs. However, as local neural circuits are part of a wider network of brain areas, a large portion of their inputs comes from other brain areas at intermediate stages of the computation and may therefore not be directly tied to task stimuli. Thus, it is not always obvious what assumptions can reasonably be made about the inputs that drive a circuit’s dynamics.
Our optimization framework, which does not require us to make any specific assumptions about when and how inputs enter the network (although it does allow us to incorporate prior information in the form of constraints), allows us to bypass this problem and to implicitly model unobserved inputs from other areas. Importantly, our framework allows us to ask questions – such as ‘why prepare’ – that are difficult to phrase in standard input-driven RNN models. We note, however, that in the investigation presented here, the lack of imposed structure on the inputs also implied that the model could not make use of mechanisms known to contribute to certain aspects of preparatory neural activity. For example, our model did not exhibit the usual visually driven response to the target input, nor did it have to use the delay epoch to keep such a transient sensory input in memory (Guo et al., 2014; Li et al., 2015).
The main premise of our approach is that one can somehow delineate the dynamical system that M1 implements, and attribute any activity patterns that it cannot autonomously generate to external inputs. Just where the anatomical boundary of ‘M1 dynamics’ lies – and therefore where ‘external inputs’ originate – is unclear, and our results must be interpreted with this limitation in mind. Operationally, previous studies of reaching monkeys have shown that M1 data can be mathematically well described by a dynamical system that appears largely autonomous during movement. These studies have emphasized that those abstract dynamics, while inferred from M1 data alone, may not be anatomically confined to M1 itself. Instead, they may involve interactions between multiple brain areas, and possibly even parts of the body through delayed sensory feedback. Here, we too tend to think of our M1 models in this way, and therefore attribute external input to brain areas that are one step removed from this potentially broad motor-generating network. Nevertheless, a more detailed multi-area model of the motor-generating circuitry including, e.g., spinal networks (Prut and Fetz, 1999) could enable more detailed comparisons to multi-region neural data. In a similar vein, our model makes no distinction between external inputs that drive movement-specific planning computations and other types of movement-unspecific inputs that might drive the transition from planning to execution (e.g. ‘trigger’ inputs; Kaufman et al., 2016). Incorporating such distinctions (e.g. by temporally modulating the cost on individual input channels depending on specific task events, or by having separate channels for movement-unspecific inputs) might allow us to ask more targeted questions about the role and provenance of external inputs.
A major limitation of our study is the specific choice of a quadratic penalty on the external input in our control objective. While there are possible justifications for such a cost (e.g. regularization of the dynamics to promote robustness of the control solution), its use here is mainly motivated by mathematical tractability. Other costs might conceivably be more relevant and might affect our results. For example, studies of motor cortex have long thought of its dynamics as converting relatively simple inputs, reflecting high-level and temporally stable plans, into detailed, temporally varying motor commands. Thus, a potentially relevant form of penalty on the external inputs would be one on their temporal complexity. Such a penalty would have the advantage of encouraging a clearer separation between the inputs and the RNN activations; indeed, in our current model, we find that the optimal controls themselves have a temporal structure, part of which could be generated by a dynamical system and thus potentially absorbed into our ‘M1 dynamics’. To address this, we note that our optimization framework can be adjusted to penalize the magnitude of the temporal derivative of the external input, $\Vert \dot{\bm{u}}\Vert^{2}$, instead of $\Vert \bm{u}\Vert^{2}$. We experimented with this extension and found qualitatively different optimal inputs and M1 firing rates, which evolved more slowly and plateaued for sufficiently long preparation (Appendix 1—figure 8A–D); this is in fact more consistent with monkey M1 data (e.g. Elsayed et al., 2016). Despite these qualitative differences in the specific form of preparation, our main conclusion stands: input-driven preparation continues to arise as an optimal solution (Appendix 1—figure 8E–F).
Another important assumption we have made is that the optimal controller is aware of the duration of the delay period. While this made solving for the optimal control inputs easier, it made our task more akin to a self-initiated reach (Lara et al., 2018) than to a typical delayed reach with unpredictable, stochastic delay durations. Future work could revisit this assumption. As a first step in this direction, we briefly outline pilot experiments. We used an exponential distribution of delays (with mean 300 ms) and devised two modified versions of our model that dealt with the resulting uncertainty in different ways. In the first strategy, at any time during preparation, the model would estimate the most probable time-to-go-cue given that the cue had not yet arrived (in this case, always 300 ms in the future) and would plan an optimal sequence of inputs accordingly. In the second strategy, the network would prudently assume the earliest possible go cue (i.e. the next time step) and plan accordingly. In both cases, only the first input in the optimal input sequence would be used at each step, followed by complete replanning at the next step as the model reassesses the situation given new information (i.e. whether the actual go cue arrived or not; this is a form of ‘model predictive control’, Rawlings et al., 2017). Preparatory inputs arose in both settings, but we found that only the latter strategy led to activity patterns that plateaued early during preparation (see Appendix 1—figure 9).
Throughout the main text, we have referred to $\Delta_{\text{prep}}$ as the task-enforced delay period. However, a more accurate description may be that it corresponds to a delay period determined by an internally set go signal, which can lag behind the external go cue. While we would not expect a large difference between those two signals, the way in which we define $\Delta_{\text{prep}}$ becomes important as it approaches 0 ms (the limit of a quasi-automatic reach; Lara et al., 2018). Indeed, in this limit, our model exhibits almost no activity in the preparatory subspace (as defined in Figure 6 – see further analyses in Appendix 1—figure 10). In contrast, monkey M1 activity was found to transiently occupy the preparatory subspace even in this case (Lara et al., 2018). Evidence for a delay between the earliest possible response to sensory cues and the trigger of movement was also observed in Kaufman et al., 2016, as well as in human behavioral studies (Haith et al., 2016). Thus, one may wish to explicitly incorporate this additional delay in the model to make it more realistic. Note, however, that Haith et al., 2016, showed that this internal delay could be shortened without affecting movement accuracy, suggesting that part of the processing that empirically occurs in between the internal and external go cues may not be necessary, but rather reflects a decoupling between the end of preparation and the trigger of movement. This may be important to consider when attempting to compare the model to, e.g., reaction times from behavioral experiments.
Dynamical systems have a longstanding history as models of neural populations (Dayan and Abbott, 2001). However, understanding how neural circuits can perform various computations remains a challenging question.
Recently, there has been increased interest in understanding the role of inputs in shaping cortical dynamics. This question has been approached both from a data-driven perspective (Malonis et al., 2021; Soldado-Magraner et al., 2023) and in modeling work, with, e.g., Driscoll et al., 2022, showing how a single network can perform different tasks by reorganizing its dynamics under the effect of external inputs, and Dubreuil et al., 2021, relating network structure to the ability to process contextual inputs. To better understand how our motor system can generate flexible behaviors (Logiaco et al., 2021; Stroud et al., 2018), and to characterize learning on short timescales (Sohn et al., 2021; Heald et al., 2023), it is important to study how network dynamics can be modulated by external signals that allow rapid adaptation to new contexts without requiring extensive modifications of the network’s connectivity. The optimal control approach we proposed here offers a way to systematically perform such evaluations, in a variety of tasks and under different assumptions about how inputs are allowed to impact the dynamics of the local circuit of interest. While our model’s predictions will depend on, e.g., the choice of connectivity or the design of the cost function, an exciting direction for future work will be to obtain those parameters in a data-driven manner, for instance using recently developed methods to infer dynamics from data (Pandarinath et al., 2018; Schimel et al., 2022), and advances in inverse reinforcement learning and differentiable control (Amos et al., 2018) to infer the cost function that behavior optimizes. These could additionally be combined with more biomechanically realistic effectors, such as the differentiable arm models from Codol et al., 2023.
Methods
Experimental model and subject details
In Figure 1, we showed data from two primate datasets that were made available to us by Mark Churchland, Matthew Kaufman, and Krishna Shenoy. Details of animal care, surgery, electrophysiological recordings, and behavioral task have been reported previously in Churchland et al., 2012; Kaufman et al., 2014 (see in particular the details associated with the J and N ‘array’ datasets). The subjects of this study, J and N, were two adult male macaque monkeys (Macaca mulatta). The animal protocols were approved by the Stanford University Institutional Animal Care and Use Committee. Both monkeys were trained to perform a delayed-reaching task on a fronto-parallel screen. At the beginning of each trial, they fixated on the center of the screen for some time, after which a target appeared on the screen. After a variable delay period (0–1000 ms), a go cue appeared, instructing the monkeys to reach toward the target. Recordings were made in dorsal premotor cortex (PMd) and in primary motor cortex (M1) using a pair of implanted 96-electrode arrays. In Figure 6, we also reproduced data from Lara et al., 2018, and Zimnik and Churchland, 2021. Details of animal care, surgery, electrophysiological recordings, and behavioral task for those data can be found in the Methods sections of the respective papers.
Arm model
To simulate reaching movements, we used the planar two-link arm model described in Li and Todorov, 2004. The two links have lengths $L_1$ and $L_2$, masses $M_1$ and $M_2$, and moments of inertia $I_1$ and $I_2$, respectively. The lower arm’s center of mass is located a distance $D_2$ from the elbow. By considering the geometry of the upper and lower limb, the positions of the hand and elbow can be written as vectors $\bm{y}_{\text{h}}(t)$ and $\bm{y}_{\text{e}}(t)$ given by

$$\bm{y}_{\text{h}} = \begin{pmatrix} L_1\cos\theta_1 + L_2\cos(\theta_1+\theta_2)\\ L_1\sin\theta_1 + L_2\sin(\theta_1+\theta_2)\end{pmatrix}, \qquad \bm{y}_{\text{e}} = \begin{pmatrix} L_1\cos\theta_1\\ L_1\sin\theta_1\end{pmatrix}.$$
The joint angles $\bm{\theta}=(\theta_1, \theta_2)^T$ evolve dynamically according to the differential equation

$$\mathcal{M}(\bm{\theta})\,\ddot{\bm{\theta}} + \mathcal{X}(\bm{\theta},\dot{\bm{\theta}}) + \mathcal{B}\,\dot{\bm{\theta}} = \bm{m}(t),$$
where $\bm{m}(t)$ is the momentary torque vector, $\mathcal{M}$ is the matrix of inertia, $\mathcal{X}$ accounts for the centripetal and Coriolis forces, and $\mathcal{B}$ is a damping matrix representing joint friction. These parameters are given by

$$\mathcal{M} = \begin{pmatrix} a_1 + 2a_2\cos\theta_2 & a_3 + a_2\cos\theta_2\\ a_3 + a_2\cos\theta_2 & a_3 \end{pmatrix}, \quad \mathcal{X} = a_2\sin\theta_2\begin{pmatrix} -\dot{\theta}_2(2\dot{\theta}_1+\dot{\theta}_2)\\ \dot{\theta}_1^2 \end{pmatrix}, \quad \mathcal{B} = \begin{pmatrix} 0.05 & 0.025\\ 0.025 & 0.05 \end{pmatrix},$$
with $a_1 = I_1 + I_2 + M_2 L_1^2$, $a_2 = M_2 L_1 D_2$, and $a_3 = I_2$.
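For concreteness, these dynamics can be integrated numerically. The sketch below assumes the parameter values from the table in Methods; the function names and the simple Euler scheme are ours, and the damping values follow Li and Todorov, 2004.

```python
import numpy as np

# Arm parameters (values from the parameter table; Li and Todorov, 2004)
L1, L2 = 0.30, 0.30        # segment lengths (m)
M2, D2 = 1.0, 0.16         # forearm mass (kg), elbow-to-center-of-mass (m)
I1, I2 = 0.025, 0.045      # moments of inertia (kg m^2)

a1 = I1 + I2 + M2 * L1**2
a2 = M2 * L1 * D2
a3 = I2
B_DAMP = np.array([[0.05, 0.025], [0.025, 0.05]])   # joint friction

def step(theta, dtheta, m, dt=1e-3):
    """One Euler step of M(theta) theta'' + X(theta, theta') + B theta' = m."""
    c2, s2 = np.cos(theta[1]), np.sin(theta[1])
    M = np.array([[a1 + 2 * a2 * c2, a3 + a2 * c2],
                  [a3 + a2 * c2,     a3]])
    X = a2 * s2 * np.array([-dtheta[1] * (2 * dtheta[0] + dtheta[1]),
                            dtheta[0]**2])
    ddtheta = np.linalg.solve(M, m - X - B_DAMP @ dtheta)
    return theta + dt * dtheta, dtheta + dt * ddtheta

def hand_position(theta):
    """Planar hand position y_h from the joint angles."""
    return np.array([L1 * np.cos(theta[0]) + L2 * np.cos(theta[0] + theta[1]),
                     L1 * np.sin(theta[0]) + L2 * np.sin(theta[0] + theta[1])])
```

A useful sanity check is that with zero torque and zero initial velocity, the arm stays at rest.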
iLQR algorithm
Throughout this work, we used the iLQR algorithm (Li and Todorov, 2004) to find the locally optimal inputs that minimize our cost function. iLQR is a trajectory optimization algorithm that can handle nonlinear dynamics and non-quadratic costs. It works iteratively, linearizing the dynamics and forming a quadratic approximation of the cost at each iteration, thus turning the control problem into a local linear-quadratic problem whose unique solution is found using LQR (Kalman, 1960). The LQR solver uses a highly efficient dynamic programming approach that exploits the sequential structure of the problem. Our implementation of iLQR (Schimel et al., 2021) followed Li and Todorov, 2004, with the difference that we regularized the local curvature matrix as recommended by Tassa, 2011.
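To make the inner LQR step concrete, the sketch below implements the finite-horizon backward (dynamic-programming) Riccati recursion on a toy discretized double integrator. This illustrates the generic solver, not the authors' implementation; all names and parameter values here are ours.

```python
import numpy as np

def lqr_gains(A, B, Q, R, Qf, T):
    """Finite-horizon discrete LQR via backward dynamic programming.

    Returns feedback gains K_0..K_{T-1} minimizing
    sum_t (x_t^T Q x_t + u_t^T R u_t) + x_T^T Qf x_T
    subject to x_{t+1} = A x_t + B u_t, with u_t = -K_t x_t.
    """
    P = Qf
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]   # re-order so gains[t] applies at time t

# Toy plant: a double integrator discretized with time step dt
dt = 0.01
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
gains = lqr_gains(A, B, Q=np.diag([1.0, 0.1]), R=np.array([[1e-3]]),
                  Qf=np.diag([10.0, 1.0]), T=500)

x = np.array([1.0, 0.0])     # start away from the origin
for K in gains:
    x = A @ x - B @ (K @ x)  # closed-loop rollout drives x toward 0
```

In iLQR, this backward pass would be re-run at every iteration on the locally linearized dynamics and quadratized cost.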
Generation of the high-dimensional readouts and networks
Generation of inhibitorystabilized networks
Simulations in Figures 1, 3, 5, and 6 were conducted using ISNs. Those were generated according to the procedure described in Hennequin et al., 2014, with minor adjustments. In brief, we initialized strongly connected chaotic networks with sparse and log-normally distributed excitatory weights, and stabilized them through progressive $\mathcal{H}_2$-optimal adjustments of the inhibitory weights until the spectral abscissa of the connectivity matrix fell below 0.8. This yielded strongly connected but stable networks with a strong degree of non-normality. When considering a larger range of ISNs (Figure 5), we independently varied both the variance of the distribution of initial excitatory weights and the spectral abscissa below which we stopped optimizing the inhibitory weights.
Generation of additional networks in Figure 5
To assess the generality of our findings in Figure 5, we additionally generated randomly connected networks by sampling each weight from a zero-mean Gaussian distribution with standard deviation $\sigma = R/\sqrt{N}$, where the spectral radius $R$ was varied between 0 and 0.99. We also sampled skew-symmetric networks by drawing a random network $\mathit{S}$ as above and setting $\mathit{W}=(\mathit{S}-{\mathit{S}}^{T})/2$. We varied the radius $R$ of the $\mathit{S}$ matrices between 0 and 5. Moreover, we considered diagonally shifted skew-symmetric networks $\mathit{W}=(\mathit{S}-{\mathit{S}}^{T})/2+\lambda \mathit{I}$, where $\lambda$ sets the real part of all the eigenvalues and was varied between 0 and 0.8.
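These three families can be sampled as follows (a sketch; the function names are ours). A convenient check is that a skew-symmetric matrix has a purely imaginary spectrum, so the diagonally shifted version has all eigenvalues with real part exactly $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_net(N, R):
    """Gaussian random connectivity, spectral radius ~ R for large N."""
    return rng.normal(0.0, R / np.sqrt(N), size=(N, N))

def skew_net(N, R):
    """Skew-symmetric part of a random network: W = (S - S^T)/2."""
    S = random_net(N, R)
    return (S - S.T) / 2

def shifted_skew_net(N, R, lam):
    """Skew-symmetric network shifted so all eigenvalues have real part lam."""
    return skew_net(N, R) + lam * np.eye(N)

W = shifted_skew_net(200, 5.0, 0.5)
eigs = np.linalg.eigvals(W)   # all on the vertical line Re(z) = 0.5
```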
The elements of the readout matrix $\mathit{C}$ mapping neural activity onto torques were drawn from a normal distribution with zero mean and standard deviation $\sigma_C = 0.05/\sqrt{N}$. This was chosen to ensure that firing rates with a standard deviation on the order of 30 Hz would be decoded into torques with a standard deviation of ∼2 N·m, which is the natural range of variation required to produce the reaches we considered.
Details of Figure 4
To more easily dissect the phenomena leading to the presence or absence of preparation, we turned to 2D linear networks in Figure 4. We modeled non-normal networks with connectivity $\mathit{W}=\left[\begin{array}{cc}0& 0\\ w& 0\end{array}\right]$ and oscillatory networks with connectivity $\mathit{W}=\left[\begin{array}{cc}0& -w\\ w& 0\end{array}\right]$. The activity of the two units evolved as
and directly influenced the acceleration of a 1D output $y(t)$ according to
where $\mathit{C}=\left[\begin{array}{cc}\mathrm{cos}{\theta}_{C}& \mathrm{sin}{\theta}_{C}\end{array}\right]$ was a row matrix reading out the activity of the network along an angle $\theta_C$ from the horizontal (first unit). Our setup aimed to mirror the reaching task studied in this work. We thus optimized inputs to minimize the following cost function:
where ${y}^{\star}=20$ was the target position.
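A minimal simulation of this 2D setup can be sketched as below, assuming leaky linear dynamics $\tau\dot{\bm{x}} = -\bm{x} + \mathit{W}\bm{x} + \bm{u}$ and an output whose acceleration is given by the readout, $\ddot{y} = \mathit{C}\bm{x}$; the discretization and all parameter values here are ours.

```python
import numpy as np

tau, dt = 0.15, 1e-3                 # unit time constant (s), Euler step (s)
w = 2.0
W = np.array([[0.0, 0.0], [w, 0.0]])              # non-normal (feedforward) motif
theta_C = np.pi / 2                               # read out the amplified unit
C = np.array([np.cos(theta_C), np.sin(theta_C)])

x = np.array([1.0, 0.0])    # seed the first unit (e.g. by a preparatory input)
u = np.zeros(2)             # no ongoing input after the seed
y = ydot = 0.0
for _ in range(3000):       # 3 s of simulated time
    x = x + dt / tau * (-x + W @ x + u)   # leaky linear dynamics
    ydot += dt * (C @ x)                  # readout drives the acceleration
    y += dt * ydot                        # integrate to the 1D output
```

With this feedforward motif, the seeded first unit transiently drives the second, which the readout converts into output motion; the seed alone produces movement without further input.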
Computing networks’ controllability and observability to predict preparation in Figure 5
As part of our attempt to predict how much a network will prepare given only its intrinsic properties, we computed the prospective potency of the nullspace, $\alpha$, and the controllability of the readout, $\beta$. For a stable linear dynamical system described by

$$\dot{\bm{x}}(t) = \bm{A}\bm{x}(t) + \bm{B}\bm{u}(t),$$
the system’s observability Gramian $\bm{Q}$ can be computed as the unique positive-definite solution of the Lyapunov equation

$$\bm{A}^{T}\bm{Q} + \bm{Q}\bm{A} + \bm{C}^{T}\bm{C} = 0.$$
The prospective potency of the nullspace $\bm{C}^{\perp}$ is then defined as

$$\alpha = \text{Tr}\left(\bm{C}^{\perp}\bm{Q}\,\bm{C}^{\perp T}\right).$$
Note that this measure $\alpha$ is invariant to the specific choice of basis for the nullspace $\bm{C}^{\perp}$. Similarly, to assess the controllability of the readout, we first computed the controllability Gramian $\bm{P}$ of the system, which is the solution of

$$\bm{A}\bm{P} + \bm{P}\bm{A}^{T} + \bm{B}\bm{B}^{T} = 0,$$
with $\bm{B}=\bm{I}$ in our system. We then defined the controllability of the readout as

$$\beta = \text{Tr}\left(\bm{C}\bm{P}\bm{C}^{T}\right).$$
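Both Gramians can be obtained numerically with a standard Lyapunov solver. The sketch below (function name ours) computes $\alpha$ and $\beta$ for a stable $\bm{A}$; for $\bm{A}=-\bm{I}$, one can check analytically that $\bm{Q}=\bm{C}^{T}\bm{C}/2$ (so $\alpha=0$, since the nullspace is orthogonal to the rows of $\bm{C}$) and $\bm{P}=\bm{I}/2$.

```python
import numpy as np
from scipy.linalg import null_space, solve_continuous_lyapunov

def alpha_beta(A, C, B=None):
    """Prospective potency of the nullspace (alpha) and readout
    controllability (beta) for dx/dt = A x + B u with readout C."""
    N = A.shape[0]
    if B is None:
        B = np.eye(N)                 # B = I, as in the main text
    # A^T Q + Q A + C^T C = 0  (observability Gramian)
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    # A P + P A^T + B B^T = 0  (controllability Gramian)
    P = solve_continuous_lyapunov(A, -B @ B.T)
    Cperp = null_space(C).T           # orthonormal basis of the readout nullspace
    return np.trace(Cperp @ Q @ Cperp.T), np.trace(C @ P @ C.T)

rng = np.random.default_rng(1)
C = rng.normal(size=(2, 10))
alpha, beta = alpha_beta(-np.eye(10), C)   # alpha = 0, beta = ||C||_F^2 / 2
```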
Details of Figure 6
Cost function
We modeled sequences of reaches by modifying our cost functional to account for the presence of two targets, as
where $\tau$ describes how long the monkey’s hand had to stay on the intermediate target before the second reach. We used $\tau = 600\ \text{ms}$ and $\alpha_{\text{pause}} = 100$ for the double reaches, in which a pause was explicitly enforced during the experiment. For compound reaches, the experiment did not require monkeys to stop for any specific duration. However, to ensure that the hand stopped on the target in the model (as it does in experiments when monkeys touch the screen) rather than flying through it, we set $\tau = 6\ \text{ms}$ and $\alpha_{\text{pause}} = 100$ when modeling compound reaches.
Preparatory subspace analysis
Lara et al., 2018, proposed an analysis to identify preparatory and movement-related subspaces. This analysis allows one to assess when the neural activity enters those subspaces, independently of whether it is delay-period or post-go-cue activity.
The method identifies a set of preparatory dimensions and a set of movement dimensions, constrained to be orthogonal to one another, as in Elsayed et al., 2016. These are found in the following manner: the trial-averaged neural activity is split between preparatory and movement-related epochs, yielding two matrices of size $N\times MT$, where $N$ is the number of neurons, $T$ is the number of time bins, and $M$ is the number of reaches. One then optimizes $W_{\text{prep}}\in \mathbb{R}^{N\times d_{\text{prep}}}$ and $W_{\text{mov}}\in \mathbb{R}^{N\times d_{\text{mov}}}$ (where $d_{\text{prep}}$ and $d_{\text{mov}}$ are the predefined dimensions of the two subspaces) such that the subspaces respectively capture most of the variance in the preparatory and movement activities, while being orthogonal to one another. This is achieved by maximizing the following objective:

$$J(W_{\text{prep}}, W_{\text{mov}}) = \frac{\text{Tr}\left(W_{\text{prep}}^{T} C_{\text{prep}} W_{\text{prep}}\right)}{Z_{\text{prep}}(d_{\text{prep}})} + \frac{\text{Tr}\left(W_{\text{mov}}^{T} C_{\text{mov}} W_{\text{mov}}\right)}{Z_{\text{mov}}(d_{\text{mov}})},$$
where $C_{\text{prep}/\text{mov}}$ are the covariance matrices of the neural activity during the preparatory and movement epochs, respectively. The normalizing constant $Z_{\text{prep}}(d_{\text{prep}})$ denotes the maximum amount of variance in preparatory activity that can be captured by any subspace of dimension $d_{\text{prep}}$ (found via SVD), and similarly for $Z_{\text{mov}}(d_{\text{mov}})$. The objective is maximized under the constraints $W_{\text{prep}}^{T}W_{\text{mov}}=0$, $W_{\text{prep}}^{T}W_{\text{prep}}=I$, and $W_{\text{mov}}^{T}W_{\text{mov}}=I$. We set the subspace dimensions to $d_{\text{prep}}=d_{\text{mov}}=6$, although our results were robust to this choice.
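As a sketch, this objective can be evaluated for candidate subspaces as follows (function name ours; the constrained optimization over $W_{\text{prep}}$ and $W_{\text{mov}}$ itself is not shown). Each normalized term is at most 1, so the objective is at most 2.

```python
import numpy as np

def subspace_objective(X_prep, X_mov, W_prep, W_mov):
    """Sum of normalized captured variances, Tr(W^T C W) / Z(d), over the
    preparatory and movement epochs. X_* are N x (M*T) activity matrices;
    W_* must have orthonormal columns."""
    total = 0.0
    for X, W in ((X_prep, W_prep), (X_mov, W_mov)):
        Xc = X - X.mean(axis=1, keepdims=True)
        Cov = Xc @ Xc.T / Xc.shape[1]
        d = W.shape[1]
        Z = np.sum(np.linalg.eigvalsh(Cov)[-d:])   # best possible d-dim variance
        total += np.trace(W.T @ Cov @ W) / Z
    return total
```

If each subspace is set to the top principal components of its own epoch (ignoring the cross-orthogonality constraint), both terms reach their maximum of 1.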
The occupancy of the preparatory subspace was defined as
and that of the movement subspace was defined as
For single reaches, we defined preparatory-epoch responses as the activity in a 300 ms window before the end of the delay period, and movement-epoch responses as the activity in a 300 ms window starting 50 ms after the go cue. We normalized all neural activity traces using the same procedure as Churchland et al., 2012; Elsayed et al., 2016. For double reaches, we followed Zimnik and Churchland, 2021, and used neural activity traces from both single reaches and the first reach of double-reach sequences. Note that we did not include any activity from the second reaches in the sequence, or from compound reaches, when defining the subspaces.
Parameter table
Parameters used for the various simulations. Where a row lists several values, different figures used different values (the original table had one column per figure, in the order Figures 1, 2, 3, 5, 4, and 6).

| Symbol | Value(s) | Unit | Description |
|---|---|---|---|
| $L_1$ | 30 | cm | Length of the upper arm in model |
| $L_2$ | 30 | cm | Length of the forearm in model |
| $I_1$ | 0.025 | kg·m² | Inertia of upper arm |
| $I_2$ | 0.045 | kg·m² | Inertia of forearm |
| $M_1$ | 1.4 | kg | Mass of upper arm |
| $M_2$ | 1.0 | kg | Mass of forearm |
| $D_2$ | 16 | cm | Elbow to lower arm center of mass distance |
| $r$ | 12 / 20 / 12 | cm | Radius of the target reach |
| $\mu_{\bm{h}}$ | 20 / 0 | mV | Mean baseline firing rate |
| $\sigma_{\bm{h}}$ | 5 / 0 | mV | s.t.d. of the baseline firing rate |
| $\alpha_{\text{effort}}$ | 5E-7 / 1E-5 / 5E-7 | – | Coeff. of input cost |
| $\alpha_{\text{null}}$ | 1 / 1 / 10 | – | Coeff. of cost of moving during the delay |
| $\alpha_{\text{pause}}$ | 100 | – | Coeff. of cost of moving between reaches |
| $\tau$ | 150 | ms | Single-neuron time constant |
| $\Delta_{\text{move}}^{(1)}$ | 300 | ms | Duration of the first reach |
| $\Delta_{\text{prep}}$ | 500 / 300 / 300 / 500 | ms | Delay period duration |
| $T$ | 1100 / 900 / 900 / 2000–1406 | ms | Total movement duration |
| $N$ | 200 | – | Number of neurons |
| $p_{\text{con}}$ | 0.2 | – | Connection probability (E neurons) |
| $p_{\text{E}}$ | 80 | – | Percentage of E neurons |
| $p_{\text{I}}$ | 20 | – | Percentage of I neurons |
Appendix 1
A1.1 Choice of the hyperparameters of the model
Our cost function for the delayed single-reaching task was composed of three components. The relative weightings of the different terms in the cost, which are hyperparameters of the model, affect the way in which the task is solved. To ensure robustness of our results to hyperparameter changes, we scanned the space of ${\alpha}_{\text{null}}$ and ${\alpha}_{\text{effort}}$ (as the solution is invariant to an overall scaling of the cost, only these relative weightings matter), and evaluated the solutions found across this hyperparameter space for a delayed reach of 300 ms.
Our evaluation was based on multiple criteria. We considered the target to have been successfully reached if the mean distance to the target in the last 200 ms of the movement was lower than 5 mm (for a reach radius of 12 cm). We considered the requirement to stay still during the delay period satisfied if the mean torques during preparation were smaller than 0.02 N·m. We computed the preparation index and total cost as described in Equations 4 and 5. We moreover computed the total input energy per neuron as $\frac{1}{N}\int_{-\Delta_{\text{prep}}}^{T}\Vert \bm{u}\Vert^{2}\,dt$, and the maximum velocity as $\max_{t}\sqrt{\dot{x}(t)^{2}+\dot{y}(t)^{2}}$. These various quantities are shown for a range of hyperparameters in Appendix 1—figure 1, with the choice of hyperparameters used throughout our simulations marked with a red star. This shows that the behavior of the model is consistent across a range of hyperparameter settings around the one we used.
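The two summary statistics above are straightforward to compute from discretized traces. A hypothetical example (all traces, names, and values here are ours, purely to illustrate the formulas):

```python
import numpy as np

dt = 1e-3
t = np.arange(0.0, 1.0, dt)
N = 200

# Hypothetical inputs (T x N): one constant unit-magnitude channel
U = np.zeros((len(t), N))
U[:, 0] = 1.0
# Hypothetical hand-velocity components
xdot = np.sin(2 * np.pi * t)
ydot = np.cos(2 * np.pi * t)

# (1/N) * integral of ||u||^2 dt, approximated as a Riemann sum
energy_per_neuron = np.sum(U**2) * dt / N
# max_t sqrt(xdot^2 + ydot^2)
max_velocity = np.max(np.sqrt(xdot**2 + ydot**2))
```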
In Appendix 1—figure 2, we illustrate the output of the model for several hyperparameter settings. One can notice that for very small values of ${\alpha}_{\text{effort}}$, the reach is successful but executed with larger torques and velocity than necessary (e.g., the red and yellow reaches are equally successful, but the red one is much faster), which comes at the cost of larger inputs. We chose the hyperparameters for our simulations so as to lie in an intermediate regime in which the task is solved successfully, but without requiring more input than necessary.
A1.2 Investigation of the effect of the network decay timescale
Appendix 1—figure 3 highlighted that preparatory inputs tend to consistently arise late during the delay period. We hypothesized that this may be a reflection of the intrinsic tendency of the network dynamics to decay, such that inputs given too early may be ‘lost’. To test this, we changed the characteristic timescale of the dynamics during preparation only, leading to the following dynamics:
with $\tau_{\text{mov}} = 150\ \text{ms}$. This allowed us to evaluate whether dynamics that decay more slowly during preparation lead to inputs starting earlier. Note that we also rescaled the inputs during preparation by $\frac{\tau_{\text{prep}}}{\tau_{\text{mov}}}$, to ensure that the effective cost of the inputs was not affected by the timescale change.
As shown in Appendix 1—figure 3, inputs started rising earlier when the network’s decay timescale was longer. This was consistent with the hypothesis that the length of the window of preparation that the optimal controller uses depends on the network’s intrinsic timescale.
A1.3 Additional results in the 2D system
Our visualization of the behavior of 2D networks in Appendix 1—figure 4 allowed us to identify features of the dynamics that were well suited to predicting preparation. Below, we compute $\alpha$ and $\beta$ numerically and analytically in 2D oscillatory and non-normal networks, to gain insight into how these quantities vary with the networks’ dynamics. We then show how preparation can be predicted with high accuracy across a large number of 2D systems, using only those quantities to summarize the network dynamics.
A1.3.1 Controllability and observability computations
In Appendix 1—figure 4, we computed $\alpha$ and $\beta$ numerically for the non-normal and oscillatory motifs, as a function of the connectivity strength $w$ and the readout angle $\theta_C$.
This highlights the very different behaviors of the two networks, which are to some extent also reflected in higher-dimensional models. In particular, we find a strong effect of the alignment between the readout and the network dynamics in non-normal networks, while $\alpha $ and $\beta $ are independent of ${\theta}_{C}$ in oscillatory networks. Interestingly, we see that $\beta $ is constant across all oscillatory networks, while $\alpha $ increases with $w$.
As the reduced 2D model is more amenable to mathematical analysis than its high-dimensional counterpart, we can gain further insights into the origin of these differences by computing $\alpha (w,{\theta}_{C})$ and $\beta (w,{\theta}_{C})$ analytically.
Recall that the observability Gramian $\bm{Q}$ of a linear input-driven dynamical system satisfies
${\bm{A}}^{T}\bm{Q}+\bm{Q}\bm{A}+{\bm{C}}^{T}\bm{C}=0$ (A1.2)
and the controllability Gramian satisfies
$\bm{A}\bm{P}+\bm{P}{\bm{A}}^{T}+\bm{B}{\bm{B}}^{T}=0$ (A1.3)
and that we defined $\alpha =\text{Tr}({\bm{C}}^{\perp}\bm{Q}{\bm{C}}^{{\perp}^{T}})$ and $\beta =\text{Tr}(\bm{C}\bm{P}{\bm{C}}^{T})$, where ${\bm{C}}^{\perp}$ denotes the nullspace of the readout matrix. Below, we compute these quantities for the 2D oscillatory and non-normal networks, with $\bm{B}=\bm{I}$ and $\bm{C}$ a unit-norm row vector whose direction we parametrize via an angle ${\theta}_{C}$. Note that we ignore the effect of $dt$ and $\tau $ in the mathematical analysis, as those quantities can straightforwardly be included in the final result via a rescaling of $w$ and $\bm{B}$.
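These Gramian-based quantities are straightforward to evaluate numerically. Below is a minimal sketch (our own, not the paper’s code; the function name and conventions are ours) that computes $\alpha $ and $\beta $ for an arbitrary stable linear system using SciPy’s continuous-time Lyapunov solver:

```python
import numpy as np
from scipy.linalg import null_space, solve_continuous_lyapunov

def gramian_alpha_beta(A, B, C):
    """alpha = Tr(C_perp Q C_perp^T) and beta = Tr(C P C^T)
    for the stable system  x' = A x + B u,  y = C x."""
    # Controllability Gramian P solves  A P + P A^T + B B^T = 0
    P = solve_continuous_lyapunov(A, -B @ B.T)
    # Observability Gramian Q solves  A^T Q + Q A + C^T C = 0
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    C_perp = null_space(C).T  # rows span the nullspace of the readout
    alpha = np.trace(C_perp @ Q @ C_perp.T)
    beta = np.trace(C @ P @ C.T)
    return alpha, beta
```

For instance, for the 2D oscillatory network with $w=2$ and readout along the first unit, this returns $\alpha ={w}^{2}/(4(1+{w}^{2}))=0.2$ and $\beta =1/2$, matching the closed-form results derived in this section.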
Oscillatory network
In the case of $\bm{A}=-\bm{I}+\bm{S}$, where $\bm{S}$ is a skew-symmetric matrix (i.e. ${\bm{S}}^{T}=-\bm{S}$), Equation A1.3 is solved by $\bm{P}=\bm{I}/2$ independently of the value of $\bm{S}$. This explains why $\beta =\text{Tr}(\bm{C}\bm{P}{\bm{C}}^{T})=\frac{1}{2}\Vert \bm{C}{\Vert}^{2}$ is independent of both the connectivity strength $w$ and the orientation of the readout ${\theta}_{C}$ for skew-symmetric networks (see Appendix 1—figure 4, bottom right). Practically, this means that skew-symmetric networks are equally controllable in all directions: when driven by random inputs, these networks display isotropic activity of equal variance along all directions. Moreover, as $w$ controls the oscillation frequency of the network but does not change the decay timescale of the eigenmodes, the amount of variance generated by a random stimulation is independent of $w$. Interestingly, we can see in Appendix 1—figure 4 (top right) that $\alpha $ displays a different behavior, and increases with $w$. As highlighted above, skew-symmetric systems are rotationally symmetric. Without loss of generality, we can thus define our 1D readout vector to read out the first unit, i.e., $\bm{C}=\left[\begin{array}{cc}1& 0\end{array}\right]$.
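This invariance is easy to check numerically. A small sanity check (ours, with the leak written explicitly so that $\bm{A}=-\bm{I}+\bm{S}$ is stable) confirms that the controllability Gramian equals $\bm{I}/2$ regardless of the rotation strength:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

for w in (0.5, 2.0, 10.0):
    S = np.array([[0.0, -w], [w, 0.0]])  # skew-symmetric: S^T = -S
    A = -np.eye(2) + S                   # leaky rotational dynamics
    # P solves A P + P A^T + I = 0 (taking B = I)
    P = solve_continuous_lyapunov(A, -np.eye(2))
    assert np.allclose(P, np.eye(2) / 2)  # P = I/2, independent of w
```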
The observability Gramian must satisfy
${\bm{A}}^{T}\bm{Q}+\bm{Q}\bm{A}+{\bm{C}}^{T}\bm{C}=0.$
This can be found in closed form by solving the resulting 2D system of equations, yielding
$\bm{Q}=\frac{1}{4(1+{w}^{2})}\left[\begin{array}{cc}2+{w}^{2}& -w\\ -w& {w}^{2}\end{array}\right].$
From there, we obtain $\alpha =\text{Tr}({\bm{C}}^{\perp}\bm{Q}{\bm{C}}^{{\perp}^{T}})=\frac{{w}^{2}}{4(1+{w}^{2})}$. As found empirically, this quantity will initially increase before plateauing toward $1/4$ as $w$ becomes large.
One might wonder why observability displays such a dependence on the oscillatory frequency of the network, even though the network is rotationally symmetric and $w$ does not affect the decay timescale. As highlighted in Equation A1.2, the controllability and observability Gramians would be identical for a skew-symmetric system if $\bm{C}=\bm{I}$. However, a feature of the systems we consider is the existence of a nullspace, i.e., the fact that the readout $\bm{C}$ only targets a subset of dimensions across the whole space (implying that ${\bm{C}}^{T}\bm{C}$ is a low-rank matrix). Intuitively, the reason why $\alpha $ increases with $w$ while $\beta $ is constant in skew-symmetric networks can be understood as follows: $\alpha $ quantifies how much readout activity a vector initialized in the nullspace of $\bm{C}$ will generate, while $\beta $ quantifies the amount of energy that will be generated across all directions by a vector initialized in the readout space. Thus, assuming once again $\bm{C}=\left[\begin{array}{cc}1& 0\end{array}\right]$ and ${\bm{C}}^{\perp}=\left[\begin{array}{cc}0& 1\end{array}\right]$, the activity of vectors initialized along $\bm{C}$ and ${\bm{C}}^{\perp}$ respectively, and evolving autonomously from there, is given by ${\bm{v}}_{C}(t)=\left[\begin{array}{cc}{e}^{-t}\mathrm{cos}(wt)& {e}^{-t}\mathrm{sin}(wt)\end{array}\right]$ and ${\bm{v}}_{{C}^{\perp}}(t)=\left[\begin{array}{cc}-{e}^{-t}\mathrm{sin}(wt)& {e}^{-t}\mathrm{cos}(wt)\end{array}\right].$
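These autonomous trajectories can be checked directly against the matrix exponential. A small numerical check (ours), using the convention $\bm{S}=\left[\begin{array}{cc}0& -w\\ w& 0\end{array}\right]$ so that $\bm{A}=-\bm{I}+\bm{S}$ is stable:

```python
import numpy as np
from scipy.linalg import expm

w = 3.0
A = -np.eye(2) + np.array([[0.0, -w], [w, 0.0]])  # A = -I + S

for t in np.linspace(0.0, 2.0, 9):
    Phi = expm(A * t)                  # autonomous flow: x(t) = Phi @ x(0)
    v_C = Phi @ np.array([1.0, 0.0])   # initialized along the readout C
    v_Cp = Phi @ np.array([0.0, 1.0])  # initialized in the nullspace of C
    env = np.exp(-t)                   # decaying envelope
    assert np.allclose(v_C, env * np.array([np.cos(w * t), np.sin(w * t)]))
    assert np.allclose(v_Cp, env * np.array([-np.sin(w * t), np.cos(w * t)]))
```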
From there, we can compute $\beta ={\int}_{0}^{\infty}\Vert {\bm{v}}_{C}(t){\Vert}^{2}dt={\int}_{0}^{\infty}{e}^{-2t}dt=\frac{1}{2}$. Thus, as found above, only the decay timescale of the envelope (fixed to 1 here) affects the value of $\beta $.
Importantly, $\alpha $ will instead have a dependence on $w$, arising from the fact that it depends on the size of the activity projected onto the readout, as
$\alpha ={\int}_{0}^{\infty}{(\bm{C}{\bm{v}}_{{C}^{\perp}}(t))}^{2}dt={\int}_{0}^{\infty}{e}^{-2t}{\mathrm{sin}}^{2}(wt)\,dt=\frac{{w}^{2}}{4(1+{w}^{2})}.$
The dependence of this quantity on $w$ can be understood by the fact that activity patterns initialized in the readout nullspace benefit from the existence of rotational dynamics, which allow them to be read out before the activity decays completely.
Nonnormal network
In the non-normal network, we have $\bm{A}=-\bm{I}+\bm{W}=\left[\begin{array}{cc}-1& 0\\ w& -1\end{array}\right]$. The non-normal 2D system, unlike its oscillatory counterpart, does not have rotational symmetry. Thus, to remain general, we will consider $\bm{C}({\theta}_{C})=\left[\begin{array}{cc}\mathrm{cos}\phantom{\rule{0.1667em}{0ex}}{\theta}_{C}& \mathrm{sin}\phantom{\rule{0.1667em}{0ex}}{\theta}_{C}\end{array}\right]$ and ${\bm{C}}^{\perp}({\theta}_{C})=\left[\begin{array}{cc}-\mathrm{sin}\phantom{\rule{0.1667em}{0ex}}{\theta}_{C}& \mathrm{cos}\phantom{\rule{0.1667em}{0ex}}{\theta}_{C}\end{array}\right]$. Solving Equation A1.3 for $\bm{B}=\bm{I}$ leads to an expression for the controllability Gramian of the non-normal system as
$\bm{P}=\left[\begin{array}{cc}\frac{1}{2}& \frac{w}{4}\\ \frac{w}{4}& \frac{1}{2}+\frac{{w}^{2}}{8}\end{array}\right].$
Similarly, computation of the observability Gramian leads to
$\bm{Q}=\left[\begin{array}{cc}\frac{{\mathrm{cos}}^{2}{\theta}_{C}}{2}+\frac{w\mathrm{sin}2{\theta}_{C}}{4}+\frac{{w}^{2}{\mathrm{sin}}^{2}{\theta}_{C}}{4}& \frac{\mathrm{sin}2{\theta}_{C}}{4}+\frac{w{\mathrm{sin}}^{2}{\theta}_{C}}{4}\\ \frac{\mathrm{sin}2{\theta}_{C}}{4}+\frac{w{\mathrm{sin}}^{2}{\theta}_{C}}{4}& \frac{{\mathrm{sin}}^{2}{\theta}_{C}}{2}\end{array}\right].$
We can then compute
$\alpha =\frac{{w}^{2}}{4}{\mathrm{sin}}^{4}{\theta}_{C}$
and
$\beta =\frac{1}{2}+\frac{w}{4}\mathrm{sin}2{\theta}_{C}+\frac{{w}^{2}}{8}{\mathrm{sin}}^{2}{\theta}_{C}.$
This highlights the dependence of $\alpha $ and $\beta $ on ${\theta}_{C}$, which can also be seen in Appendix 1—figure 4 (left). Interestingly, these expressions also make evident the supralinear scaling of $\alpha $ and $\beta $ with $w$ in non-normal networks. Note, however, that we never investigate preparation in the very large $w$ regime, as the simulation of such networks with discretized dynamics is prone to numerical issues.
A1.3.2 Predicting preparation in 2D networks
To assess how well preparation could be predicted from the controltheoretic properties $\alpha $ and $\beta $ (c.f. main text) of 2D networks, we generated 20,000 networks with weight matrix
where $a\sim \mathcal{U}(0,0.8)$, $\omega \sim \mathcal{U}(0,4)$, and ${w}_{\text{ff}}\sim \mathcal{U}(0,4)$. Equation A1.5 implies that $\bm{W}$ has a pair of complex-conjugate eigenvalues $a\pm i\omega $, and also embeds a feedforward coupling of strength ${w}_{\text{ff}}$ from the second to the first dimension. For each network configuration, we computed the corresponding values of $\alpha $ and $\beta $. To confirm our intuition that the preparation index should increase with $\alpha $ and decrease with $\beta $, we first attempted to fit $\text{prep. index}={c}_{0}+{c}_{1}\frac{\alpha}{\beta}$. Interestingly, we found that while this quantity was positively correlated with the preparation index across networks, a substantial fraction of variance remained unexplained (test ${R}^{2}=0.16$). Labeling the preparation index by the rotational frequency of each network revealed that a substantial fraction of the variance across networks came from this timescale of oscillations (Appendix 1—figure 5, left). Indeed, a regression model of the form $\text{prep. index}={c}_{0}+{c}_{1}\omega \frac{\alpha}{\beta}$ captured 80% of the variance in preparation index, yielding an accurate fit across networks with only two free parameters (Appendix 1—figure 5, right).
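The two-parameter fit itself is a simple least-squares problem. Below is an illustrative sketch with synthetic stand-in data – in the actual study the preparation index is obtained by optimizing control inputs for each sampled network, which we do not reproduce here, so the generated numbers are purely for demonstrating the fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: summarize each "network" by its rotational frequency
# omega and the ratio alpha/beta, and generate noisy preparation indices
# from the hypothesized relationship (coefficients here are arbitrary).
n = 1000
omega = rng.uniform(0.0, 4.0, n)
ratio = rng.uniform(0.1, 2.0, n)  # stand-in for alpha / beta
prep = 0.1 + 0.2 * omega * ratio + 0.05 * rng.standard_normal(n)

# Fit  prep ~ c0 + c1 * omega * alpha/beta  by ordinary least squares
X = np.column_stack([np.ones(n), omega * ratio])
(c0, c1), *_ = np.linalg.lstsq(X, prep, rcond=None)

# Fraction of variance explained by the two-parameter model
pred = X @ np.array([c0, c1])
r2 = 1.0 - np.sum((prep - pred) ** 2) / np.sum((prep - prep.mean()) ** 2)
```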
We stress that the predictive power of these simple fits is remarkable given that the preparation index comes out of a complex process of optimization over control inputs. Thus, the controltheoretic quantities $\alpha $ and $\beta $ appear to appropriately summarize the benefits of preparation for individual networks.
The fact that the preparation index also grows with $\omega $ can be understood by considering the alignment between the activity trajectories that the network can autonomously generate and those required to solve the motor task. Indeed, a network that is intrinsically unable to generate outputs with the right oscillatory timescale would have to rely on movement-related inputs, i.e., it would have a low preparation index. As observed here, the network’s characteristic frequency has a strong impact in 2D networks, consistent with $\omega $ determining the only oscillatory pattern that the network can generate on its own. For high-dimensional networks, however, we did not have to incorporate such a measure of compatibility between task requirements and network dynamics (Appendix 1—figure 5). We speculate that this is due to averaging effects: larger networks possess a wide range of intrinsic oscillatory timescales, and the readout matrix – which here was not aligned to the network’s dynamics in any specific way – is expected to read out a little of all frequencies, including task-appropriate ones.
A1.4 Comparison across networks
Our main investigation was largely focused on the behavior of inhibition-stabilized networks (ISNs), which are believed to constitute good models of M1. However, we found that the expression we derived to obtain a network’s preparation index from its control-theoretic properties generalized to other types of networks. Below, we detail the other network families we considered, and show that although their dynamics qualitatively differ from those of the ISN, their preparation can be predicted using the same quantities.
We modeled three additional classes of networks: randomly connected networks with either (i) unstructured or (ii) skew-symmetric connectivities, and (iii) a surrogate network obtained by applying a similarity transformation to the ISN that preserved its eigenvalue spectrum but eliminated any ‘non-normality’ (i.e., we found $\mathcal{T}$ such that $\stackrel{~}{\bm{A}}={\mathcal{T}}^{-1}\bm{A}\mathcal{T}$, where $\stackrel{~}{\bm{A}}$ was a diagonal matrix with the same eigenvalues as $\bm{A}$). Note that we did not apply the transformation to the readout or input matrices, so the transfer function of the system was changed by our transformation. This was deliberate, as we were interested in the effect that transforming the dynamics would have on the input-output response. These networks were chosen for the diversity of dynamical motifs they exhibit: combinations of rapidly and slowly decaying modes, oscillations, and transient dynamics. Moreover, each of these network families could be sampled from in a straightforward manner, allowing us to compute results across many instantiations of each network type. We again used random readout matrices, adjusted neither to the dynamics of the network nor to the motor task.
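The surrogate construction can be sketched as follows (a minimal illustration of the idea, not the paper’s code; a generic stable random matrix stands in for the ISN weights):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stable, generically non-normal stand-in for the recurrent connectivity
n = 8
A = -np.eye(n) + 0.9 * rng.standard_normal((n, n)) / np.sqrt(n)

# Similarity-transform A to a diagonal (hence normal) matrix with the same
# eigenvalues; the readout and input matrices are deliberately left as-is.
evals, T = np.linalg.eig(A)  # A = T diag(evals) T^{-1}
A_tilde = np.diag(evals)

# Same spectrum...
assert np.allclose(np.sort_complex(np.linalg.eigvals(A_tilde)),
                   np.sort_complex(evals))
# ...but A_tilde is normal: it commutes with its conjugate transpose
assert np.allclose(A_tilde @ A_tilde.conj().T, A_tilde.conj().T @ A_tilde)
```

Because $\bm{B}$ and $\bm{C}$ are not transformed alongside $\bm{A}$, the surrogate shares the original’s eigenvalues but not its input-output response, which is exactly the comparison of interest here.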
To get an intuition for how different networks solve the task, we generated one network from each family and qualitatively compared their inputs and internal activations when performing the same delayed reach (Appendix 1—figure 6A). We first considered an unconnected network, i.e., a network whose recurrent weights were all 0. Unsurprisingly, this network had no use for a preparation phase: there is no benefit to giving early inputs, as the network is unable to amplify them. More surprisingly, a random network with much stronger connectivity – as can be seen in its eigenvalue spectrum forming a small ball of radius close to 1 (Appendix 1—figure 6A, top) – also displayed very little preparation. The strong, visually apparent similarity between the inputs to the random and unconnected networks suggests that the optimal way of controlling the random network relies largely on ignoring its internal dynamics and solving the task almost entirely in an input-driven regime. The example skew-symmetric network, which had purely imaginary eigenvalues (ranging between –5.5 and 5.5), displayed considerably more preparation, but still relied on strong inputs during the movement phase that resembled those of the unconnected and random networks. Finally, the ISN relied much more on preparation; the small inputs it received were strongly amplified into large activity patterns owing to its strong, non-normal recurrent connectivity. Interestingly, however, the similarity-transformed ISN lost much of that ability to amplify inputs, instead displaying dynamics resembling those of the skew-symmetric network. This highlights the role of the ISN’s non-normal dynamics in shaping the network’s activity and optimal inputs.
Next, we assessed more directly how beneficial preparation was for the different networks, by evaluating how the total loss and preparation index evolved as a function of delay-period length (Appendix 1—figure 6B). As expected, the control of networks that relied on preparation (skew-symmetric and ISN) benefited more from longer delays. The ISN had a markedly lower control cost and a higher preparation index than the other networks, reflecting the fact that even weak (thus energetically cheap) inputs were sufficient to produce internal activity, and thus output torques, of the desired magnitude (Appendix 1—figure 6A, right).
The above results give a sense of the range of possible dynamics that different types of networks display. Interestingly, despite these differences, we showed in Appendix 1—figure 5 that the preparation index could be predicted with a simple formula across all networks.
Data availability
The current manuscript is a computational study, so no data have been generated for this manuscript. Modelling code and the code used to generate figures and analyses are available at https://github.com/marineschimel/whyprep2 (copy archived at Schimel, 2024).
References
Conference: Differentiable MPC for End-to-End Planning and Control. Advances in Neural Information Processing Systems 31 (NeurIPS 2018), pp. 8289–8300.
Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. Journal of Computational Neuroscience 8:183–208. https://doi.org/10.1023/a:1008925309027
Delay of movement caused by disruption of cortical preparatory activity. Journal of Neurophysiology 97:348–359. https://doi.org/10.1152/jn.00808.2006
The origin of corticospinal projections from the premotor areas in the frontal lobe. The Journal of Neuroscience 11:667–689. https://doi.org/10.1523/JNEUROSCI.11-03-00667.1991
Independence of movement preparation and movement initiation. The Journal of Neuroscience 36:3007–3015. https://doi.org/10.1523/JNEUROSCI.3245-15.2016
The computational and neural bases of context-dependent learning. Annual Review of Neuroscience 46:233–258. https://doi.org/10.1146/annurev-neuro-092322-100402
Non-normal amplification in random balanced neuronal networks. Physical Review E 86:011909. https://doi.org/10.1103/PhysRevE.86.011909
Contributions to the theory of optimal control. Boletín de la Sociedad Matemática Mexicana 5:102–119.
Neuroscience out of control: control-theoretic perspectives on neural circuit dynamics. Current Opinion in Neurobiology 58:122–129. https://doi.org/10.1016/j.conb.2019.09.001
Cortical activity in the null space: permitting preparation without movement. Nature Neuroscience 17:440–448. https://doi.org/10.1038/nn.3643
Conference: Iterative linear quadratic regulator design for nonlinear biological movement systems. ICINCO 2004, Proceedings of the First International Conference on Informatics in Control, Automation and Robotics, pp. 222–229.
Predicting reaction time from the neural state space of the premotor and parietal grasping network. The Journal of Neuroscience 35:11415–11432. https://doi.org/10.1523/JNEUROSCI.1714-15.2015
Book: Model Predictive Control: Theory, Computation, and Design. Madison, WI: Nob Hill Publishing.
Cortical control of arm movements: a dynamical systems perspective. Annual Review of Neuroscience 36:337–359. https://doi.org/10.1146/annurev-neuro-062111-150509
A network perspective on sensorimotor learning. Trends in Neurosciences 44:170–181. https://doi.org/10.1016/j.tins.2020.11.007
Motor primitives in space and time via targeted gain modulation in cortical networks. Nature Neuroscience 21:1774–1783. https://doi.org/10.1038/s41593-018-0276-0
A neural network that finds a naturalistic solution for the production of muscle activity. Nature Neuroscience 18:1025–1033. https://doi.org/10.1038/nn.4042
Book: Theory and Implementation of Biomimetic Motor Controllers. Hebrew University of Jerusalem.
Optimal feedback control as a theory of motor coordination. Nature Neuroscience 5:1226–1235. https://doi.org/10.1038/nn963
Conference: Optimal control methods suitable for biomechanical systems. 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 1758–1761. https://doi.org/10.1109/IEMBS.2003.1279748
Independent generation of sequence elements by motor cortex. Nature Neuroscience 24:412–424. https://doi.org/10.1038/s41593-021-00798-5
Article and author information
Funding
Engineering and Physical Sciences Research Council (RG94782)
 Marine Schimel
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We are grateful to Matthew T Kaufman and Mark M Churchland for sharing data for the monkey experiments. We thank Kristopher Jensen, David Liu, Javier Antorán, and Rory Byrne for helpful comments on the manuscript. MS was funded by an EPSRC DTP studentship, and part of this work was performed using resources provided by the Cambridge Tier2 system operated by the University of Cambridge Research Computing Service (http://www.hpc.cam.ac.uk) funded by EPSRC Tier2 capital grant EP/P020259/1. For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.89131. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2023, Schimel et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.