# Abstract

# Summary

During delayed ballistic reaches, motor areas consistently display movement-specific activity patterns prior to movement onset. It is unclear why these patterns arise: while they have been proposed to seed an initial neural state from which the movement unfolds, recent experiments have uncovered the presence and necessity of ongoing inputs during movement, which may lessen the need for careful initialization. Here, we modelled the motor cortex as an input-driven dynamical system, and we asked what the optimal way to control this system to perform fast delayed reaches is. We find that delay-period inputs consistently arise in an optimally controlled model of M1. By studying a variety of network architectures, we could dissect and predict the situations in which it is beneficial for a network to prepare. Finally, we show that optimal input-driven control of neural dynamics gives rise to multiple phases of preparation during reach sequences, providing a novel explanation for experimentally observed features of monkey M1 activity in double reaching.

**eLife assessment**

This **valuable** study provides a new perspective on preparatory activity occurring before the onset of movement. The authors report that, for a wide variety of recurrent networks, the optimal inputs should start before the desired network output. The authors present **convincing** evidence by combining mathematically tractable analyses in linear networks and numerical simulation in nonlinear networks. One limitation is that the only cost function considered for the input is that of magnitude. While this constraint is reasonable, it is suspected that other cost functions could lead to greater realism of model responses.

During the production of ballistic movements, the motor cortex is thought to operate as a dynamical system whose state trajectories trace out the appropriate motor commands for downstream effectors (Shenoy et al., 2013; Miri et al., 2017; Russo et al., 2018). The extent to which these cortical dynamics are controlled by exogenous inputs before and/or during movement is the subject of ongoing study.

On the one hand, several experimental and modelling studies point to a potential role for exogenous inputs in motor preparation. First, cortical state trajectories are empirically well described by low-dimensional dynamics that evolve near-autonomously during movement (Churchland et al., 2012; Pandarinath et al., 2018; Schimel et al., 2022), such that there is a priori no reason to suspect that inputs are required for motor production. Rather, inputs would be required during preparation to bring the state of the cortical network into a suitable initial condition. This input-driven seeding process is corroborated by observations of movement-specific primary motor cortex (M1) activity arising well before movement initiation (Lara et al., 2018; Kaufman et al., 2014; Churchland et al., 2012; Meirhaeghe et al., 2023; Figure 1A), and associated models demonstrate the critical role of preparatory inputs therein (Sussillo et al., 2015; Hennequin et al., 2014; Kao et al., 2021).

On the other hand, recent studies have shown that the cortex receives critical input from the thalamus during movement production (Sauerbrei et al., 2020), and that sensory feedback may also contribute significantly to the observed dynamics of M1 (Kalidindi et al., 2021).

Moreover, most published network models of delayed reaches are able to perform the task just as well without preparatory inputs, i.e. with external inputs forcefully confined to the movement epoch – an illustratory example is shown in Figure 1B. Thus, the relative contributions of preparatory vs. movement-epoch inputs to motor cortex dynamics remain unclear.

In addition to the specific form that inputs to cortical dynamics might take, one may ask more broadly about the computational role of motor preparation. Motor preparation is known to benefit behaviour (e.g. by shortening reaction times and enabling more accurate execution Riehle and Requin, 1989; Churchland and Shenoy, 2007; Michaels et al., 2015) and may facilitate motor learning (Sheahan et al., 2016; Sun et al., 2022). However, from the perspective of cortical dynamics, preparation also introduces additional constraints. Specifically, the high density of M1 neurons projecting directly to the spinal cord (Dum and Strick, 1991) sug-gests that motor cortical outputs control lower-level effectors with little intermediate processing. For preparatory processes to avoid triggering premature movement, any pre-movement activity in the motor and dorsal premotor (PMd) cortices must carefully exclude those pyramidal tract neurons. While this can be achieved by constraining neural activity to evolve in a nullspace of the motor output (Kaufman et al., 2014), the question nevertheless arises: what advantage is there to having neural dynamics begin earlier in a constrained manner, rather than unfold freely just in time for movement production?

Here we sought a normative explanation for motor preparation at the level of motor cortex dynamics: we asked whether preparation arises in recurrent neural networks (RNNs) performing delayed reaching tasks, and what factors lead to more or less preparation. Such an explanation could not be obtained from previous network models of delayed reaches, as they typically as-sume from the get-go that the cortical network receives preparatory inputs during a fixed time window preceding the go cue (Sussillo et al., 2015; Kao et al., 2021). In this case, pre-movement activity is by design a critical determinant of the subsequent behaviour (Sussillo et al., 2015; Kao et al., 2021; Zimnik and Churchland, 2021). In this work, we removed this modelling assumption and studied models in which the correct behaviour could in principle be obtained without explicit motor preparation.

To study the role of motor preparation, and that of exogenous inputs in this process, we followed an optimal control approach (Harris and Wolpert, 1998; Todorov and Jordan, 2002; Yeo et al., 2016). We considered the dynamics of a recurrent neural network (RNN) model of M1 coupled to a model arm (Todorov and Li, 2003), and used a standard control cost functional to quantify and optimize performance in a delayed-reaching task. We used the iLQR algorithm (Li and Todorov, 2004) to find the spatiotemporal patterns of network inputs that minimize this cost functional, for any given network connectivity. Critically, these inputs could arise both before and during movement; thus, our framework allowed for principled selection amongst a continuum of motor strategies, going from purely autonomous motor generation following preparation, to purely input-driven unprepared dynamics.

We considered an inhibition-stabilized network – which was shown previously to capture prominent aspects of monkey M1 activity (Hennequin et al., 2014; Kao et al., 2021) – and found that optimal control of the model requires preparation, with optimal inputs arising well before movement begins. To understand what features of network connectivity lead to optimal preparatory control strategies, we first turned to low-dimensional models, which could be more easily dissected. We then generalized insights from those systems back to high-dimensional networks using tools from control theory, and found that preparation can be largely explained by two quantities summarizing the dynamical response properties of the network.

Finally, we studied the optimal control of movement *sequences*. Consistent with recent experimental findings (Zimnik and Churchland, 2021), we observed that optimal control of compound reaches leads to input-driven preparatory activity in a dedicated activity subspace prior to each movement chunk.

Overall, our results show that preparatory neural activity patterns arise from optimal control of reaching movements at the level of motor cortical circuits, thus providing a possible explanation for a number of observed experimental findings.

# Model

## A model of cortical dynamics for reaching movements

We considered a simple reaching task, in which the hand must move from a resting location to one of eight radially located targets in a 2D plane as fast as possible (Figure 1). The target had to be reached within 600 ms of a go cue that followed a delay period of varying (but known) duration. We modelled the trajectory of the hand via a two-jointed model arm (Li and Todorov, 2004; Kao et al., 2021), driven into motion by a pair of torques * m*(

*t*) (Methods). We further assumed that these torques arose as a linear readout of the momentary firing rates

*(*

**r***t*) of a population of M1 neurons,

where * C* was a randomly generated readout matrix. We modelled the dynamics of

*N*= 200 M1 neurons using a standard rate equation,

where the momentary population firing rate vector * r*(

*t*) was obtained by passing a vector of internal neuronal activations

*(*

**x***t*) through a rectified linear function

*φ*[·], element-wise. In Equation 2,

*is a constant input that establishes a baseline firing rate of 5 Hz on average, with a standard deviation of 5 Hz across neurons,*

**h***(*

**u***t*) is a task-dependent control input (see below), and

*denotes the matrix of recurrent connection weights. Throughout most of this work, we considered inhibition-stabilized M1 dynamics (Hennequin et al., 2014; Methods), which have previously been shown to produce activity resembling that of M1 during reaching (Kao et al., 2021). Thus, our model can be viewed as a two-level controller, with the arm being controlled by M1, and M1 being controlled by external inputs. Note that each instantiation of our model corresponds to a set of*

**W***,*

**W***, and*

**C***, none of which are specifically optimized for the task.*

**h**## To prepare or not to prepare?

Previous experimental (Churchland et al., 2012; Shenoy et al., 2013) and modelling (Hennequin et al., 2014; Sussillo et al., 2015; Pandarinath et al., 2018) work suggests that fast ballistic movements rely on strong, near-autonomous internal dynamics in M1. Network-level models of ballistic control thus rely critically on a preparation phase during which the motor cortex is driven into a movement-specific state that seeds its subsequent autonomous dynamics (Kao et al., 2021; Sussillo et al., 2015). However, somewhat paradoxically, the same re-curent neural network models can also solve the task in a completely different regime, in which task-related inputs arise during movement only, with no preparatory inputs whatsoever. We illustrate this dichotomy in Fig-ure 1. The same center-out reach can be produced with control inputs to M1 that arise either prior to movement only (full lines), or during movement only (dashed lines). In the latter case, no reach-specific preparatory activity is observed, making the model inconsistent with experimental findings. But what rationale is there in preparing for upcoming movements, then?

To address this question, we formulated delayed reaching as an optimal control problem, and asked what external inputs are required, and at what time, to drive the hand into the desired position with minimum control effort. Specifically, we sought inputs that were as weak as possible yet accurately drove the hand to the target within an alloted time window. We also penalized inputs that caused premature movement before the go cue.

Thus, we solved for spatio-temporal input trajectories that minimized a cost functional capturing the various task requirements. Our cost was composed of three terms: penalizes deviations away from the target, with an “urgency” weight that increases quadratically with time, thus capturing the implicit incentive for animals to perform fast reaches in such experiments (which are normally conducted in sessions of fixed duration). penalizes premature movement during preparation, as measured by any deviation in position, speed and acceleration of the hand. Finally, penalizes control effort in the form of input magnitude throughout the whole trial, thus promoting energy-efficient control solutions amongst a typically infinite set of possibilities (Kao et al., 2021; Sterling and Laughlin, 2015). Note that can be viewed as a standard regularization term, and must be included to ensure the control problem is well defined. The total objective thus had the following form:

where * θ* and denote the position and velocity of the hand in angular space, Δ

_{prep}was the duration of the delay period and

*T*that of the movement period. As and depend on

*(*

**u***t*) implicitly through Equations 1 and 2, is a function of

*only. Importantly, we allowed for inputs within a time window beginning Δ*

**u**_{prep}ms before, and ending

*T*ms after the go cue (set at

*t*= 0). Therefore, both preparation-only and movement-only input strategies (cf. Figure 1) could potentially arise, as well as anything in between.

Here, we solved for the optimal control inputs using the iterative linear quadratic regulator algorithm (iLQR; Li and Todorov, 2004), an efficient trajectory optimization algorithm that is well-suited for handling the nonlinear nature of both the arm’s and the network’s dynamics. As our primary goal was to assess the role of preparation in a normative way, we did not study the putative circuit dynamics upstream of M1 that might lead to the computation of these optimal inputs.

We balanced the various components of our cost functional by choosing α_{null} and α_{effort} to qualitatively match the behavioural requirements of a typical reach- and-hold task. Specifically, we tuned them jointly so as to ensure (i) stillness during preparation, and (ii) reach duration of approximately ~ 400 ms, with the hand staying within 0.5cm of the target for ~ 200 ms after the end of the reach. We ensured that the main qualitative features of the solution, i.e. the results presented below, were robust to the choice of hyperparameter values within the fairly large range in which the above soft-constraints are satisfied (Supplementary Material S1).

# Results

## Preparation arises as an optimal control strategy

Using the above control framework, we assessed whether the optimal way of performing a delayed reach involves preparation. More concretely, does the optimal control strategy of the model described in Equation 2 involve any preparatory inputs during the delay period? For any single optimally performed reach, we found that network activity began changing well before the go cue (Figure 2B, bottom), and that this was driven by inputs that arose early (Figure 2B, middle). Thus, although preparatory network activity cancels in the readout (such that the hand remains still; Figure 2B, top) and therefore does not contribute directly to movement it still forms an integral part of the optimal reach strat egy. To quantify how much the optimal control strat egy relied on inputs prior to movement, we defined the *preparation index* as the ratio of input magnitude during the delay period to that during the remainder of the trial:

We found that the preparation index rose sharply as we increased the delay period, and eventually plateaued at ~ 1.3 for delay periods longer than 300 ms (Figure 2C). Similarly, the total cost of the task was highest in the absence of preparation, and decreased until it also reached a plateau at Δ_{prep} ~ 300 ms (Figure 2C, black). This appears somewhat counterintuitive, as having a larger Δ_{prep} means that both and are accumulated over a longer period. To resolve this paradox, we examined each component of the cost function. We found that the overall decrease in cost with increasing preparation time was driven by a concurrent decrease in both and . The former effect was due to the model producing faster reaches (Figure 2C inset; hand position for a reach with (red) and without (blue) preparation) while the latter arose from the use of smaller control inputs when preparation was allowed. Together, these results suggest that the presence of a delay period changes the optimal control strategy for reaching, and increases performance in the task.

The results above show that delaying the reach beyond ~ 300 ms brings little benefit; in particular, all components of the cost stabilize past that point (Figure 2C). We thus wondered what features the optimally controlled dynamics would display as Δ_{prep} increased beyond 300 ms. Would the network defer preparation to a last minute surge, or prepare more gently over the entire preparatory window? Would the network produce the same neural activity patterns? We found that the optimal controller made very little use of any preparation time available up to 300 ms before the go cue: with longer preparation times, external input continued to arise just a couple of hundred milliseconds before movement initiation, and single neuron firing rates remained remarkably similar (Figure 3A). This was also seen in PCA projections of the firing rates, which traced out similar trajectories irrespective of the delay period (Figure 3B). We hypothesized that this behaviour is due to the network dynamics having a certain maximum characteristic timescale, such that inputs that arrive too early end up being “forgotten” - they increase and possibly without having a chance to influence . We confirmed this by varying the characteristic time constant (*τ* in Equation 2). For a fixed Δ_{prep}, we found that for larger (resp. lower) values of τ, the optimal control inputs started rising earlier (resp. later) and thus occupied more (resp. less) of the alloted preparatory period (Supplementary Material S3).

## Understanding optimal control in simplified models

Having established that the ISN model of M1 relies on preparatory inputs to solve the delayed reaching task, we next tried to understand *why* it does so. To further unravel the interplay between the structure of the network and the optimal control strategy, i.e. what aspects of the dynamics of the network warrant preparation, we turned to simpler, two-dimensional models of cortical dynamics. These 2D models are small enough to enable detailed analysis (Supplementary Material S3), yet rich enough to capture the two dominant dynamical phenomena that arise in ISN dynamics: nonnormal amplification (Murphy and Miller, 2009; Goldman, 2009; Hennequin et al., 2012) and oscillations (Brunel, 2000; Dayan and Abbott, 2001). Specifically, networks of E and I neurons have been shown to embed two main motifs of effective connectivity which are revealed by appropriate orthogonal changes of basis: (i) feedforward (“nonnormal”) connectivity whereby a “source mode” of E-I imbalance feeds into a “sink mode” in which balance is restored, and (ii) anti-symmetric connectivity that causes the two populations to oscillate.

To study the impact of each of these prototypical connectivity motifs on movement preparation, we implemented them separately, i.e. as two small networks of two units each, with an overall connectivity scale parameter *w* which we varied (Figure 4A and D; Methods). As both nonnormal and oscillatory dynamics arise from linear algebraic properties of the connectivity matrix, we considered linear network dynamics for this analysis (*ø*(*x*) = *x* in Equation 3). Moreover, to preserve the existence of an output nullspace in which preparation could in principle occur without causing premature movement, we reduced the dimensionality of the motor readout from 2D (where there would be no room left for a nullspace) to 1D (leaving a 1D nullspace), and adapted the motor task so that the network now had to move the hand position along a single dimension (Figure 4B and E, top). Analogous to the previous arm model, we assumed that the hand’s acceleration along this axis was directly given by the 1D network readout.

We found that optimal control of both dynamical motifs generally led to preparatory dynamics, with inputs arising before the go cue (Figure 4B and E, bottom). In the feedforward motif, the amount of preparatory inputs appeared to depend critically on the orientation of the readout. When the readout was aligned with the sink (brown) mode (Figure 4B, left), the controller prepared the network by moving its activity along the source (orange) mode (Figure 4C, left). This placed the network in a position from which it had a natural propensity to generate large activity transients along the readout dimension (c.f. flow field in Figure 4A); here, these transients were exploited to drive the fast upstroke in hand acceleration and throw the hand towards the target location. Note that this strategy reduces the amount of input the controller needs to deliver during the movement, because the network itself does most of the work. Nevertheless, in this case the network’s own impulse response was not rich enough to accommodate the phase reversal required to subsequently slow the hand down and terminate the movement. Optimal control therefore also involved inputs during the movement epoch, leading to a preparatory index of ~ 0.54 (Figure 4G, dark blue).

When it was instead the source mode that was read out (Figure 4B, right), the only dimension along which the system could prepare without moving was the sink mode. Preparing this way is of no benefit, because the flow field along the sink mode has no component along the source (readout) mode. Thus, here the optimal strategy was to defer control to the movement epoch, during which the transient growth of network activity along the readout rested entirely on adequate control inputs. This led to a preparation index of ~ 0 (Figure 4G, pale green). Although the network did react with large activity excursions along the sink mode (Figure 4C, right), these were inconsequential for the movement. Importantly, of the two extreme readout configurations discussed above, the first one yielded a smaller overall optimal control cost (by a factor of ~ 1.5). Thus, at a meta-control level, ideal downstream effectors would read out the sink mode, not the source mode. Note that while increasing the connectivity strength initially led to more preparation (Figure 4H), a plateau was eventually reached for *w* ≥ 4. Indeed, while stronger dynamics initially make preparation more beneficial, they also make it more difficult for preparatory activity to remain in the readout nullspace.

We obtained similar insights for oscillatory network dynamics (Figure 4D-F). A key difference however was that the flow field was rotationally symmetric such that no distinction could be made between “source” and “sink” units – indeed the optimal control strategy yielded the same results (up to a rotation of the state space) irrespective of which of the two units was driving the hand’s acceleration (compare left and right panels in Figure 4D-F). Nevertheless, the optimal controller consistently moved the network’s activity along the output-null axis during preparation, in such a way as to engage the network’s own rotational flow immediately after the go cue (Figure 4F). This rotational flow drove a fast rise and decay of activity in the readout unit, thus providing the initial segment of the required hand acceleration. The hand was subsequently slowed down by modest movement-epoch control inputs which eventually receded, leading to a preparation index of ~ 0.58. Interestingly, the preparation index showed a decrease for very large *w* (Figure 4G), which did not reflect smaller preparatory inputs (Figure 4H) but rather reflected the larger inputs that were required during movement to cancel the fast oscillations naturally generated by the network.

The above results highlight how the optimal control strategy is shaped by the dynamical motifs present in the network. Crucially, we found that the optimal way to control the movement depends not only on the strength and flow of the internal network dynamics, but also on their interactions with the readout.

## Control-theoretic properties predict the amount of preparation

Our investigation of preparation in a low-dimensional system allowed us to isolate the impact of core dynamical motifs, and highlighted how preparation depends on the geometry of the flow field, and its alignment to the readout. However, these intuitions remain somewhat qualitative, making them difficult to generalize to our high-dimensional ISN model.

To quantify the key criteria that appear important for preparation, we turned to tools from control theory. We reasoned that, for a network to be able to benefit from preparation and thus exhibit a large preparation index, there must be some advantage to using early inputs that do not immediately cause movement, relative to using later inputs that do. We hypothesized that this advantage could be broken down into two criteria. First, there must exist activity patterns that are momentarily output-null (i.e. do not immediately cause movement) yet seed output-potent dynamics that subsequently move the arm. The necessity of this criterion was obvious in the 2D nonnormal network, which did not display any preparation when its nullspace was aligned with its “sink” mode. In the language of control theory, this criterion implies that the nullspace of the readout must be sufficiently “observable” – we captured this in a scalar quantity *α* (Methods; Kao and Hennequin, 2019; Skogestad and Postlethwaite, 2007). Second, there must be a sizeable cost to performing the movement in an entirely input-driven manner without relying on preparation. In other words, the network should be hard to steer along the readout direction, i.e. the readout must be of limited “controllability” – we captured this in another scalar quantity *β* (Methods).

We illustrate the meaning of these two metrics in Figure 5A and B for a 2-dimensional example network that combines nonnormality and oscillations. We show two extreme choices of readout direction (Figure 5A, dashed black): the one that maximizes *α* (top) and the one that minimizes it (bottom). In the first case, the readout nullspace (dashed orange) is very observable, i.e. trajectories that begin in the nullspace evolve to produce large transients along the readout (solid orange & inset). In the second case, the opposite is true. For each case, we also assessed the controllability of the readout (*β*). The controllability of a direction corresponds to how much variance activity trajectories exhibit along that direction, when they are randomly and isotropically initialized (Figure 5B). In other words, a very controllable direction is one along which network trajectories have a natural tendency to evolve.

We then assessed how well *α* and *β* could predict the preparation index of individual networks. In 2D networks, we found that a simple function that grows with *α* and decreases with *β* could accurately predict preparation across thousands of networks (Supplementary Material S3.2). To assess whether these insights carried over to high-dimensional networks, we generated a range of large ISNs with parameterically varied connectivity strengths and decay timescales (Methods). We then regressed the preparation index against the values of *α* and *β* computed for each of these networks (as controllability and observability are only defined for linear networks, we set *φ*(*x*) = *x* for this investigation). We found that a simple linear mapping, prep. index = *k*_{0} + *k _{α}α* +

*k*, with parameters fitted to one half of the ISN networks, accurately predicted the preparation indices of the other half (Figure 5C;

_{β}β*R*

^{2}= 0.93, 5-fold cross-validated). Interestingly, we observed that although

*α*and

*β*(which are both functions of the network connectivity) were highly correlated across different networks, discarding either variable in our linear regression led to a significant drop in

*R*

^{2}(Figure 5C, inset). Importantly, it was their difference that best predicted the preparation index (

*k*> 0 and

_{α}*k*< 0), consistent with our hypothesis that the preparation index is a relative quantity which increases as the nullspace becomes more observable, but decreases as readout dimensions become more controllable.

_{β}We were able to confirm the generality of this predictive model by generating networks with other types of connectivity (oscillatory networks, and networks with unstructured random weights), which displayed dynamics very different from the ISNs (see Supplementary Material S4). Interestingly, despite the different distribution of a and *β* parameters in those networks, we could still capture a large fraction of the variance in their preparation index (*R*^{2} = 0.69) using the linear fit obtained from the ISNs alone. This confirms that *α* and *β* can capture information about the networks’ dynamics in a universal manner.

Note that we do not make any claims about the specific functional form of the relationship between *α*, *β* and the preparation index. Rather, we claim that there should be a broad trend for the preparation index to increase with *α* and decrease with *β*, and we acknowledge that this relationship could in general be nonlinear. Indeed, in 2D networks, we found that the preparation index was in fact better predicted by the ratio of *α* over *β* than by their difference (Supplementary Material S3.2).

## Modelling movement sequences

Having gained a better understanding of what features lead a network to prepare, we next set out to assess whether optimal control could also explain the neural preparatory processes underlying the generation of movement *sequences*. We revisited the experimental studies of Zimnik and Churchland (2021), where monkeys were trained to perform two consecutive reaches. Each trial started with the display of both targets, followed by an explicitly enforced delay period before the onset of the first reach. A distinction was made between “double” reaches in which a pause was enforced between reaches, and “compound” reaches in which no pause was required. This study concluded that, rather than the whole movement sequence unrolling from a single preparatory period, each reach was instead successively seeded by its own preparatory activity.

Here, we asked whether such an independent, successive preparation strategy would arise as an optimal control solution, in the same way that single-reach preparation did. Importantly, we could not answer this question by directly examining network inputs as we did for single reaches. Indeed, any network input observed before the second reach could be contributing either to the end of the first movement, or to the preparation of the next. In fact, the issue of teasing apart preparatory vs. movement-related activity patterns also arose in the analysis of the monkey data. To address this, Zimnik and Churchland (2021) exploited the fact that mon-key M1 activity just before and during single reaches is segregated into two distinct subspaces. Thus, momentary activity patterns (during either single or double reaches) can be unambiguously labelled as preparatory or movement-related depending on which of the two subspaces they occupied. We performed a similar analysis (Methods) and verified that preparatory and movement activity patterns in the model were also well segregated in their respective subspaces in the singlereach task (Figure 6A-B). We then assessed the occupancy of the preparatory subspace during double reaching in the model, and took this measure as a signature of preparation.

To model optimal control of a double reach, we modified our cost functional to account for the presence of two consecutive targets (see Methods). We considered the same set of eight targets as in our single-reach task, and modelled all possible combinations of two targets (one example shown in Figure 6). We set the hyper-parameters of the cost function such that both targets could be reached by the resulting optimal controller, in a way that matched important qualitative aspects of the monkeys’ behaviour (in particular, such that both reaches were performed at similar velocities, with the second reach lasting slightly longer on average; Fig-ure 6B-C; top).

We projected the network activity onto preparatory and movement subspaces identified using single and double reaches activity (Methods). For double reaches with a long (600ms) pause, the preparatory subspace was transiently occupied twice, with the two peaks occurring just before the onset of each movement in the sequence (Figure 6B; bottom).

Notably, the occupancy during the “compound” reach (without pause; Figure 6C) also started rising prior to the first movement before decaying very slightly and peaking again before the second reach, indicating two independent preparatory events. This is somewhat surprising, given that a movement sequence can also be viewed as a single “compound” movement, for which we have shown previously a unique preparatory phase is sufficient (Figure 2). In our model, this behaviour can be understood to arise from the requirement that the hand stop briefly at the first target. To produce the second reach, the hand needs to accelerate again, which requires transient re-growth of activity in the network. Given that the network’s dynamical repertoire exhibits limited timescales, this is most easily achieved by rein-jecting inputs into the system.

In summary, our results suggest that the “independent” preparation strategy observed in monkeys is con-sistent with the optimal control of a two-reach sequence. While Zimnik and Churchland (2021) showed that RNNs trained on this task used this “independent” strategy, this was by design as the network was only cued for the second reach after the first one had started. In addition to replicating this proof of concept that it is possible to prepare whilst moving, our model also shows how and why independent preparation might arise as an optimal control solution.

# Discussion

In this work, we proposed a model for the dynamics of motor cortex during a delayed reaching task in non-human primates. Unlike previous work, we treated M1 as an input-driven nonlinear dynamical system, with generic connectivity not specifically optimized for the task, but with external inputs assumed to be optimal for each reach. Motivated by a large body of evidence suggesting that preparation is useful before delayed reaches (Churchland et al., 2010; Lara et al., 2018; Afshar et al., 2011; Shenoy et al., 2013), but also evidence for thalamic inputs being necessary for accurate movement execution (Sauerbrei et al., 2020), we used this model to investigate whether and why neural circuits might rely on motor preparation during delayed reaching tasks. Interestingly, preparation arose as an optimal control strategy in our model, with the optimal solution to the task relying strongly on inputs prior to movement onset. Moreover, the benefits of preparation were dependent on the network connectivity, with preparation being more prevalent in networks whose rich internal dynamics can be advantageously seeded by early external inputs. We were able to quantify this intuition with a predictive model relating the dynamical response properties of a network to the amount of preparation it exhibits when controlled optimally. Finally, we found that prominent features of the monkeys’ neural activity during sequential reaches arose naturally from optimal control assumptions. Specifically, optimally controlled networks relied on two phases of preparation when executing sequences of two reaches, corroborating recent experimental observations in monkey M1 (Zimnik and Churchland, 2021). Together, our results provide a normative explanation for the emergence of preparatory activity in both single and sequential reaching movements.

In recent years, task-optimized recurrent neural networks have become a very popular tool to model neural circuit dynamics. Classically, those models incorporate only those inputs that directly reflect task-related stimuli (e.g. motor target, go cue, etc). This requires assumptions about the form of the inputs, such as modelling them as simple step functions active during specific task epochs. However, as local neural circuits are part of a wider network of brain areas, a large portion of their inputs come from other brain areas at intermediate stages of the computation and may therefore not be directly tied to task stimuli. Thus, it is not always obvious what assumptions can reasonably be made about the inputs that drive the circuit’s dynamics. Our optimization framework, which does not require us to make any specific assumptions about when and how inputs enter the network (although it does allow to incorporate prior information in the form of constraints), allows to bypass this problem and to implicitly model unobserved inputs from other areas. Importantly, our framework allows to ask questions – such as “why prepare” – that are difficult to phrase in standard input-driven RNN models. We note, however, that in the investigation we have presented here, the lack of imposed structure for the inputs also implied that the model could not make use of mechanisms known to contribute certain aspects of preparatory neural activity. For example, our model did not exhibit the usual visually-driven response to the target input, nor did it have to use the delay epoch to keep such a transient sensory input in memory (Guo et al., 2014; Li et al., 2015). Moreover, while we did not introduce any strong assumptions about the temporal structure of inputs, we did assume for simplicity that the optimal controller was aware of the duration of the delay period. While this made solving for the optimal control inputs easier, it made our task more akin to a self-initiated reach (Lara et al., 2018) than to a typical delayed reach with unpredictable, stochastic delay durations. We note that our current framework could be generalized to the stochastic setting by using model predictive control to optimize the control inputs (Rawlings et al., 2017). Although this would be considerably more expensive computationally, we hypothesize that explicitly modelling such uncertainty over the delay period would not only yield a better model of the typical delayed reaching task, but may also lead to preparatory activity patterns arising immediately after target onset and persisting until the go cue.

Dynamical systems have a longstanding history as models of neural populations (Dayan and Abbott, 2001). However, understanding how neural circuits can perform various computations remains a challenging question. Recently, there has been increased interest in trying to understand the role of inputs in shaping cortical dynamics. This question has been approached both from a data-driven perspective (Malonis et al., 2021; Soldado-Magraner et al., 2023), and in modelling work with e.g Driscoll et al. (2022) showing how a single network can perform different tasks by reorganizing its dynamics under the effect of external inputs and Dubreuil et al. (2021) relating network structure to the ability to process contextual inputs. To better understand how our motor system can generate flexible behaviours (Logiaco et al., 2021; Stroud et al., 2018), and to characterize learning on short timescales (Sohn et al., 2020; Heald et al., 2023), it is important to study how network dynamics can be modulated by external signals that allow rapid adaptation to new contexts without requiring extensive modifications of the network’s connectivity. The optimal control approach we proposed here offers a way to systematically perform such evaluations, in a variety of tasks and under different assumptions regarding how inputs are allowed to impact the dynamics of the local circuit of interest. While our model’s predictions will depend on e.g. the choice of connectivity or the design of the cost function, an exciting direction for future work will be to obtain those parameters in a data-driven manner, for instance using recently developed methods to infer dynamics from data (Pandarinath et al., 2018; Schimel et al., 2022), and advances in inverse reinforcement learning and differentiable control (Amos et al., 2018) to infer the cost function that behaviour optimizes.

# Methods

## Experimental model and subject details

In Figure 1, we showed data from two primate datasets that were made available to us by Mark Churchland, Matthew Kaufman and Krishna Shenoy. Details of animal care, surgery, electrophysiological recordings, and behavioral task have been reported previously in Churchland et al. (2012); Kaufman et al. (2014) (see in particular the details associated with the J and N “array” datasets). The subjects of this study, J and N, were two adult male macaque monkeys (Macaca mulatta). The animal protocols were approved by the Stanford University Institutional Animal Care and Use Committee. Both monkeys were trained to perform a delayed reaching task on a fronto-parallel screen. At the beginning of each trial, they fixated on the center of the screen for some time, after which a target appeared on the screen. After a variable delay period (0–1000 ms), a go cue appeared instructing the monkeys to reach toward the target. Recordings were made in the dorsal premotor cortex and in the primary motor cortex using a pair of implanted 96-electrode arrays. In Figure 6, we also reproduced data from Lara et al. (2018) and Zimnik and Churchland (2021). Details of animal care, surgery, electrophysiological recordings, and behavioral task for those data can be found in the Methods section of the respective papers.

## Arm model

To simulate reaching movements, we used the planar two-link arm model described in Li and Todorov (2004). The two links have lengths *L*_{1} and *L*_{2}, masses *M*_{1} and *M*_{2}, and moments of inertia *I*_{1} and *I*_{2} respectively. The lower arm’s center of mass is located a distance *D*_{2} from the elbow. By considering the geometry of the upper and lower limb, the position of the hand and elbow can be written as vectors **y**_{h}(*t*) and **y**_{e} given by

The joint angles * θ* = (

*θ*

_{1};

*θ*

_{2})

^{T}evolve dynamically according to the differential equation

where * m*(

*t*) is the momentary torque vector, is the matrix of inertia, accounts for the centripetal and Coriolis forces, and is a damping matrix representing joint friction. These parameters are given by

with , *a*_{2} = *M*_{2}*L*_{1}*D*_{2}, and *a*_{3} = *I*_{2}.

## iLQR algorithm

Throughout this work, we used the iLQR algorithm (Li and Todorov, 2004) to find the locally optimal inputs that minimize our cost function. iLQR is a trajectory optimization algorithm that can handle nonlinear dynamics and non-quadratic costs. iLQR works in an iterative manner, by linearizing the dynamics and performing a quadratic approximation of the cost at each iteration, thus turning the control problem into a local linear-quadratic problem whose unique solution is found using LQR (Kalman et al., 1960). The LQR solver uses a highly efficient dynamic programming approach that exploits the sequential structure of the problem. Our implementation of iLQR followed from Li and Todorov (2004), with the difference that we performed regularization of the local curvature matrix as recommended by Tassa (2011).

## Generation of the high-dimensional readouts and networks

### Generation of inhibitory-stabilized networks

Simulations in Figures 1, 3, 5 and 6 were conducted using inhibition-stabilized networks (ISN). Those were generated according to the procedure described in Hennequin et al. (2014) with minor adjustments. In brief, we initialized strongly connected chaotic networks with sparse and log-normally distributed excitatory weights, and stabilized them through progressive adjustments of the inhibitory weights until the spectral abscissa of the connectivity matrix fell below 0.8. This yielded strongly connected but stable networks with a strong degree of non-normality. When considering a larger range of ISNs (Figure 5), we independently varied both the variance of the distribution of initial excitatory weights and the spectral abscissa below which we stopped optimizing the inhibitory weights.

### Generation of additional networks in Figure 5

To assess the generality of our findings in Figure 5, we additionally generated randomly connected networks by sampling each weight from a Gaussian distribution with , where the spectral radius *R* was varied between 0 and 0.99. We also sampled skew-symmetric networks by drawing a random network * S* as above, and setting

*= (*

**W***—*

**S***)/2. We varied the radius*

**S**^{T}*R*of the

*matrices between 0 and 5. Moreover, we considered diagonally shifted skew-symmetric networks*

**S***= (*

**W***—*

**S***)/2 +*

**S**^{T}*λ*where

**I***λ*denotes the real part of all the eigenvalues and was varied between 0 and 0.8.

The elements of the readout matrix * C* mapping neural activity onto torques were drawn from a normal distribution with zero mean and standard deviation . This was chosen to ensure that firing rates of standard deviation on the order of 30Hz would be decoded into torques of standard deviation ~ 2 N/m, which is the natural variation required for the production of the reaches we considered.

## Details of Figure 4

To more easily dissect the phenomena leading to the presence or absence of preparation, we turned to 2D linear networks in Figure 4. We modelled nonnormal networks with a connectivity and oscillatory networks with connectivity . The activity of the two units evolved as

and directly influenced the acceleration of a one-dimensional output *y*(*t*) according to

where * C_{i}* = [cos

*θ*sin

_{C}*θ*] was a row matrix reading the activity of the network along an angle

_{C}*θ*from the horizontal (first unit). Our setup aimed to mirror the reaching task studied in this work. We thus optimized inputs to minimize the following cost function:

_{C}
where *y** = 20 was the target position.

## Computing networks’ controllability and observability to predict preparation in Figure 5

As part of our attempt to predict how much a network will prepare given its intrinsic properties only, we computed the prospective potency of the nullspace *α*, and the controllability of the readout *β*. For a stable linear dynamical system described by

the system’s observability Gramian * Q* can be computed as the unique positive-definite solution of the Lyapunov equation

The prospective potency of the nullspace **C**^{⊥} is then defined as

Note that this measure *α* is invariant to the specific choice of basis for the nullspace **C**^{⊥}. Similarly, to assess the controllability of the readout, we first computed the controllability Gramian of the system * P*, which is the solution of

with * B* =

*in our system. We then defined the controllability of the readout as*

**I**## Details of Figure 6

### Cost function

We modelled sequences of reaches by modifying our cost functional to account for the presence of two targets, as

where *τ* describes how long the monkey’s hands had to stay on the intermediate target before performing its second reach. We used *τ* = 600 ms and *α*_{pause} = 100 for the double reaches in which a pause was explicitely enforced during the experiment. For compound reaches, the experiment did not require monkeys to stop for any specific duration. However, to ensure that the hand stopped on the target in the model (as it does in experiments when monkeys touch the screen) rather than fly through it, we set *τ* = 6 ms and *α*_{pause} = 100 when modelling compound reaches.

### Preparatory subspace analysis

Lara et al. (2018) proposed an analysis to identify preparatory and movement-related subspaces. This analysis allows to assess when the neural activity enters those subspaces, independently of whether it is delay-period or post-go-cue activity.

The method identifies a set of preparatory dimensions and a set of movement dimensions, constrained to be orthogonal to one another, as in Elsayed et al. (2016). These are found in the following manner: the trial-averaged neural activity is split between preparatory and movement-related epochs, yielding two matrices of size *N* × *MT* where *N* is the number of neurons, *T* is the number of time bins and *M* is the number of reaches. One then optimizes the and (where *d*_{prep} and *d*_{mov} are the predefined dimensions of the two subspaces) such that the subspaces respectively capture most variance in the preparatory and movement activities, while being orthogonal to one another. This is achieved by maximizing the following objective:

where *C*_{prep/mov} are the covariance matrices of the neural activity during the preparatory and movement epochs, respectively. The normalizing constant *Z*_{prep}(*d*_{prep}) denotes the maximum amount of variance in preparatory activity that can be captured by any subspace of dimension *d*_{prep} (this is found via SVD), and similarly for *Z*_{mov}(*d*_{mov}). The objective is maximized under the constraints and . We set subspace dimensions *d*_{prep} = *d*_{mov} = 6, although our results were robust to this choice.

The occupancy of the preparatory subspace was defined as

and that of the movement subspace was defined as

For single reaches, we defined preparatory epoch responses as the activity in a 300ms window before the end of the delay period, and movement epoch responses as the activity in a 300ms window starting 50ms after the go cue. We normalized all neural activity traces using the same procedure as Churchland et al. (2012); Elsayed et al. (2016). For double reaches, we followed Zimnik and Churchland (2021) and used neural activity traces from both single reaches and the first reach of double-reach sequences. Note that we did not include any activity from the second reaches in the sequence, or from compound reaches, when defining the subspaces.

## Parameter table

# Acknowledgements

We are grateful to Matthew T. Kaufman and Mark M. Churchland for sharing data for the monkey experiments. We thank Kristopher Jensen, David Liu, Rui Xia, Javier Antorán, and Rory Byrne for helpful comments on the manuscript. M.S. was funded by an EPSRC DTP studentship, grant number RG94782. Part of this work was performed using resources provided by the Cambridge Tier-2 system operated by the University of Cambridge Research Computing Service (http://www.hpc.cam.ac.uk) funded by EPSRC Tier-2 capital grant EP/P020259/1. For the purpose of open access, the authors have applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.

# Competing interests

T-C.K. is currently a research scientist at Meta Reality Labs, but only contributed to this work while studying at the University of Cambridge.

# Supplementary material

## S1 Choice of the hyperparameters of the model

Our cost function for the delayed single-reaching task was composed of 3 components. The relative weighings of the different terms in our cost, which are hyperparameters of the model, affect the way in which the task is solved. To ensure robustness of our results to hyperparameter changes, we scanned the space of *α*_{null} and *α*_{effort} (as the solution is invariant to scaling of the cost, only those relative weighings matter), and evaluated the solutions found across this hyperparameter space for a delayed reach of 300 ms.

Our evaluation was based on multiple criteria. We considered the target to have been successfully reached if the mean distance to the target in the last 200 ms of the movement was lower than 5 mm (for a reach radius of 12 cm). We considered that the requirement to stay still during the delay period was satisfied if the mean torques during preparation were smaller than 0.02 N/m. We computed the preparation index and total cost as described in Equation (5) and Equation (4). We moreover computed the total input energy per neuron as , and the maximum velocity as . These various quantities are shown for a range of hyperparameters in Figure S1, with the choice of hyperparameters used throughout our simulations marked with a red star. This shows that the behaviour of the model is consistent across a range of hyperparameter settings around the one we used.

In Figure S2, we illustrate the output of the model for several hyperparameter settings. One can notice that for very small values of *α*_{effort} the reach is successful, but executed with larger torques and velocity than is necessary – e.g the red and yellow reaches are equally successful but the red one is much faster – which comes at the cost of larger inputs. We chose the set of hyperparameters for our simulations such as to lie in an intermediate regime in which the task is solved successfully, but without requiring more inputs than necessary.

## S2 Investigation of the effect of the network decay timescale

Figure 3 highlighted that preparatory inputs tend to consistently arise late during the delay period. We hypothesized that this may be a reflection of the intrinsic tendency of the network dynamics to decay, such that inputs given too early may be “lost”. To test this, we changed the characteristic timescale of the dynamics *during preparation only*, leading to the following dynamics:

with *τ*_{mov} = 150ms. This allowed us to evaluate whether having dynamics decaying more slowly during preparation led to inputs starting earlier. Note that we also rescaled the inputs during preparation by , to ensure that the effective cost of the inputs was not affected by the timescale change.

As shown in Figure S3, inputs started rising earlier when the network’s decay timescale was longer. This was consistent with the hypothesis that the length of the window of preparation that the optimal controller uses depends on the network’s intrinsic timescale.

## S3 Additional results in the 2D system

Our visualization of the behaviour of 2D networks in Figure 4 allowed us to identify features of the dynamics that were well-suited to predicting preparation. Below, we compute *α* and *β* numerically and analytically in 2D oscillatory and nonnormal networks, to gain insights into how these quantities vary with the networks’ dynamics. We then show how preparation can be predicted highly accurately across a large number of 2D systems, using only those quantities to summarize the network dynamics.

### S3.1 Controllability and observability computations

In Figure S4, we computed *α* and *β* numerically, as a function of the connectivity strength and the choice of readout, for the nonnormal and the oscillatory motifs shown in Figure 4. This highlights the very different behaviours of the two networks, which are to some extent also reflected in higher-dimensional models. In particular, we find a strong effect of the alignment between the readout and the network dynamics in nonnormal networks, while *α* and *β* are independent of in oscillatory networks. Interestingly, we see that *β* is constant across all oscillatory networks, while *α* increases with *w*.

As the reduced 2D model is more amenable to mathematical analysis than its high-dimensional counterpart, we can gain further insights into the origin of these differences by computing *α*(*w*, *θ _{C}*) and

*β*(

*w*,

*θ*) analytically.

_{C}Recall that the observability Gramian Q of a linear input-driven dynamical system satisfies

and the controllability Gramian satisfies

and that we defined Tr(**C**^{⊥}**QC**^{⊥}) and *β* = Tr(* CPC^{T}*) where

**C**^{⊥}denotes the nullspace of the readout matrix. Below, we compute these quantities for the 2D oscillatory and nonnormal networks, with

*=*

**B***and*

**I***a unit-norm vector whose direction we parametrize via a quantity*

**C***θc*. Note that we ignore the effect of dt and

*τ*in the mathematical analysis, as those quantities can straightforwardly be included in the final result via a rescaling of

*w*and

*.*

**B**#### Oscillatory network

In the case of * A* =

*+*

**−I***where*

**S***is a skew-symmetric network (i.e*

**S***= −*

**S**^{T}*), Equation (S3) is solved by*

**S***=*

**P***/2 independently of the value of*

**I***. This explains why is independent of both the connectivity strength*

**S***w*and the orientation of the readout

*θ*for skew-symmetric networks (see Figure S4; bottom right). Practically, this means that skew-symmetric networks are equally controllable in all directions: when driven by random inputs, these networks display isotropic activity of equal variance along all directions. Moreover, as

_{c}*w*controls the oscillation frequency of the network, but does not change the decay timescale of the eigenmodes, the amount of variance generated by a random stimulation is independent of

*w*. Interestingly, we can see in Figure S4 (top right) that

*α*displays a different behaviour, and increases with

*w*. As highlighted above, skew-symmetric systems are rotationally symmetric. Without loss of generality, we can thus define our 1D vector to read out the first unit, i.e

*= [1 ü].*

**C**The observability Gramian must satisfy

This can be found in closed-form by solving the 2D system of equations, yielding

From there, we obtain . As found empirically, this quantity will initially increase before plateauing towards 1/4 as *w* becomes large.

One might wonder why observability displays such a dependence on the oscillatory frequency of the network, even though the network is rotationally symmetric, and *w* does not affect the decay timescale. As highlighted in Equation (S2), controllability and observability Gramian would be identical for a skew-symmetric system if * C* =

*. However, a feature of the systems we consider is the existence of a nullspace, i.e the fact that the readout*

**I***only targets a subset of dimensions across the whole space (implying that*

**C***is a low-rank matrix). Intuitively, the reason why*

**C**^{T}**C***α*increases with

*w*while

*β*is constant in skew-symmetric networks can be understood as follows:

*α*is computing how much

*readout activity*a vector initialized in the nullspace of

*will generate, while*

**C***β*is computing the amount of energy that will be generated

*across all directions*by a vector initialized in the readout space. Thus, assuming once again

*= [1 0 and*

**C**

**C**^{⊥}= [0 1], the activity of vectors initialized along

*and*

**C**

**C**^{⊥}respectively and evolving autonomously from there is given by

*(*

**ν**_{C}*t*) = [

*e*cos(

^{−t}*wt*)

*e*sin(

^{-t}*wt*)] and

**ν**_{C}^{⊥}(

*t*) = [

*e*sin(

^{−t}*wt*) −

*e*cos(

^{-t}*wt*).]

From there, we can compute . Thus, as found above, only the decay timescale of the envelope (fixed to 1 here) affects the value of *β*.

Importantly, *α* will instead have a dependence on *w* arising from the fact that it depends on the size of the activity *projected into the readout*, as

The dependence of this quantity on *w* can be understood by the fact that activity patterns initialized in the readout nullspace benefit from the existence of rotational dynamics, which allows them to be read-out before the activity decays completely.

#### Nonnormal network

In the nonnormal network, we have .

The nonnormal 2D system, unlike its oscillatory counterpart, does not have rotational symmetry. Thus, to remain general, we will consider * C*(

*θ*) = [cos

_{C}*θ*sin

_{C}*θ*], and

_{C}

**C**^{⊥}(

*θ*) = [— sin

_{C}*θ*cos

_{C}*Θ*]. Solving Equation (S2) for

_{C}*=*

**B***leads to an expression for the controllability Gramian of the nonnormal system as*

**I**Similarly, computation of the observability Gramian leads to

We can then compute

and

This highlights the dependence of *α* and *β* on *θ _{C}*, which can also be seen in Figure S4 (left). Interestingly, these expressions also make evident the supralinear scaling of

*α*and

*β*with w in nonnormal networks. Note however that we never investigate preparation in the very large w regime, as the simulation of such networks with discretized dynamics is prone to numerical issues.

### S3.2 Predicting preparation in 2D networks

To assess how well preparation could be predicted from the control-theoretic properties *α* and *β* (c.f. main text) of 2D networks, we generated 20000 networks with weight matrix

where , and . Equation (S15) implies that * W* has a pair of complex-conjugate eigenvalues

*a*±

*iω*, and also embeds a feedforward coupling of strength

*w*

_{ff}from the second to the first dimension. For each network configuration, we computed the corresponding values of

*α*and

*β*. To confirm our intuition that the preparation index should increase with

*α*and decrease with

*β*, we first attempted to fit prep. . Interestingly, we found that while this quantity was positively correlated with the preparation index across networks, a substantial fraction of variance remained unexplained (test

*R*

^{2}= 0.16). Labelling the preparation index by the rotational frequency of the network highlighted that a substantial fraction of the variance across networks came from this timescale of oscillations (Figure S5, left). Indeed, a regression model of the for prep. captured 80% of the variance in preparation index, yielding an accurate fit across networks with only two free parameters (Figure S5, right).

We stress that the predictive power of these simple fits is remarkable given that the preparation index comes out of a complex process of optimization over control inputs. Thus, the control-theoretic quantities *α* and *β* appear to appropriately summarize the benefits of preparation for individual networks.

The fact that the preparation index also grows with *ω* can be understood by considering the alignment between the activity trajectories which the network can autonomously generate and those that are required for solving the motor task. Indeed, a network that is intrinsically unable to generate outputs with the right oscillatory timescale would have to rely on movement-related inputs, i.e. would have a low preparation index. As observed here, the network’s characteristic frequency has a big impact in 2D networks, consistent with *ω* determining the *only* oscillatory pattern that the network can generate on its own. For high-dimensional networks, however, we did not have to incorporate such a measure of compatibility between task requirements and network dynamics (c.f. Figure 5). We speculate that this is due to averaging effects. Indeed, larger networks possess a wide range of intrinsic oscillatory timescales, and the readout matrix - which here was not aligned to the network’s dynamics in any specific way - is expected to read out a little bit of all frequencies, including task-appropriate ones.

## S4 Comparison across networks

Our main investigation was largely focused on on behaviour of inhibition-stabilized networks, which are believed to constitute good models of M1. We however found that the expression we derived to obtain a network’s preparation index from its control-theoretic properties generalized across to other types of networks. Below, we detail the other network families we considered, and show how their dynamics qualitatively differ from the ISN, although their preparation can be predicted using the same quantities.

We modelled three additional classes of networks: randomly connected networks with either (i) unstructured or (ii) skew-symmetric connectivities, (iii) a surrogate network obtained by applying a similarity transformation to the ISN that preserved its eigenvalue spectrum but eliminated any “nonnormality” (i.e, we found such that where was a diagonal matrix with the same eigenvalues as * A*). Note that we did not apply the transformation to the readout or input matrices, such that the transfer function of the system was changed by our transformation. This was voluntary, as we were interested in the effect that transforming the dynamics would have on the input-output response. These networks were chosen for the diversity of dynamical motifs they exhibit: combinations of rapidly and slowly decaying modes, oscillations, and transient dynamics. Moreover, each of these network families could be sampled from in a straightforward manner, allowing to compute results across many instantiations of each network type. We again used random readout matrices not specifically adjusted to the dynamics of the network nor to the motor task. To get an intuition for how different networks solve the task, we generated one network from each family and qualitatively compared their inputs and internal activations when performing the same delayed reach (Figure S6A). We first considered an unconnected network, i.e. a network whose recurrent weights were all 0. Unsurprisingly, this network had no use for a preparation phase. Indeed, there is no benefit to giving early inputs as the network is unable to amplify them. More surprisingly, a random network with a much stronger connectivity – as can be seen in its eigenvalue spectrum forming a small ball of radius close to 1 (Figure S6A(top)) – also displayed very little preparation. The strong, visually apparent similarity between the inputs to the random and unconnected networks suggests that the optimal way of controlling the random network relies largely on ignoring its internal dynamics and solving the task almost entirely in an input-driven regime. The example skew-symmetric network, which had imaginary eigenvalues only (ranging between −5.5 and 5.5), displayed considerably more preparation, but still relied on strong inputs during the movement phase that resembled those of the unconnected and random networks. Finally, the ISN relied much more on preparation; the small inputs it receives are strongly amplified into large activity patterns owing to its strong, nonnormal recurrent connectivity. Interestingly however, the similarity transformed ISN lost much of that ability to amplify inputs, instead displaying dynamics resembling that of the skew-symmetric network. This highlights the effect of the ISN’s nonnormal dynamics in shaping the network’s activity and optimal inputs.

Next, we assessed more directly how beneficial preparation was for the different networks. We evaluated how the total loss and preparation index evolved as a function of the delay period length (Figure S6B). As expected, the control of networks that relied on preparation (skew-symmetric and ISN) benefitted more from longer delays. The ISN has markedly lower control cost and higher preparation index than other networks, reflecting the fact that even weak (thus energetically cheap) inputs were sufficient to produce internal activity and thus output torques of the desired magnitude (c.f. Figure S6A, right).

The above results give a sense of the range of possible dynamics that different types of networks display. Interestingly, despite these differences, we showed in Figure 5 that the preparation index could be predicted with a simple formula across all networks.

# References

- Single-trial neural correlates of arm movement preparation
*Neuron***71**:555–564 - Differentiable mpc for end-to-end planning and control
*Advances in Neural Information Processing Systems*:8289–8300 - Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons
*J. Comput. Neurosci***8**:183–208 - Neural population dynamics during reaching
*Nature***487**:51–56 - Cortical preparatory activity: representation of movement or first cog in a dynamical machine?
*Neuron***68**:387–400 - Delay of movement caused by disruption of cortical preparatory activity
*Journal of neurophysiology***97**:348–359 - Theoretical neuroscience
- Flexible multitask computation in recurrent networks utilizes shared dynamical motifs
*bioRxiv* - The role of population structure in computations through neural dynamics
*bioRxiv*:2020–7 - The origin of corticospinal projections from the premotor areas in the frontal lobe
*Journal of Neuroscience***11**:667–689 - Reorganization between preparatory and movement population responses in motor cortex
*Nature communications***7**:1–15 - Memory without feedback in a neural network
*Neuron***61**:621–634 - Flow of cortical activity underlying a tactile decision in mice
*Neuron***81**:179–194 - Signal-dependent noise determines motor planning
*Nature***394**:780–784 - The computational and neural bases of context-dependent learning
*Ann. Rev. Neurosci*:1–27 - Nonnormal amplification in random balanced neuronal networks
*Physical Review E***86** - Optimal control of transient dynamics in balanced networks supports generation of complex movements
*Neuron***82**:1394–1406 - Rotational dynamics in motor cortex are consistent with a feedback controller
*Elife***10** - Contributions to the theory of optimal control
*Bol. soc. mat. mexicana***5**:102–119 - Neuroscience out of control: control-theoretic perspectives on neural circuit dynamics
*Cur. Op. Neurobiol***58**:122–129 - Optimal anticipatory control as a theory of motor preparation: a thalamo-cortical circuit model
*Neuron***109** - Cortical activity in the null space: permitting preparation without movement
*Nature neuroscience***17**:440–448 - Conservation of preparatory neural events in monkey motor cortex regardless of how movement is initiated
*Elife***7** - A motor cortex circuit for motor planning and movement
*Nature***519**:51–56 - Iterative linear quadratic regulator design for nonlinear biological movement systems
*ICINCO (1)*:222–229 - Thalamic control of cortical dynamics in a model of flexible motor sequencing
*Cell Reports***35** - M1 dynamics share similar inputs for initiating and correcting movement
*bioRxiv* - Parallel movement planning is achieved via an optimal preparatory state in motor cortex
*Cell Reports***42** - Predicting reaction time from the neural state space of the premotor and parietal grasping network
*Journal of Neuroscience***35**:11415–11432 - Behaviorally selective engagement of short-latency effector pathways by motor cortex
*Neuron***95**:683–696 - Balanced amplification: a new mechanism of selective amplification of neural activity patterns
*Neuron***61**:635–648 - Inferring single-trial neural population dynamics using sequential auto-encoders
*Nature methods***15**:805–815 - Model predictive control: theory, computation, and design
- Monkey primary motor and premotor cortex: single-cell activity related to prior information about direction and extent of an intended movement
*Journal of neurophysiology***61**:534–549 - Motor cortex embeds muscle-like commands in an untangled population response
*Neuron***97**:953–966 - Cortical pattern generation during dexterous movement is input-driven
*Nature***577**:386–391 - iLQR-VAE: control-based learning of input-driven dynamics with applications to neural data
*International Conference on Learning Representations* - Motor planning, not execution, separates motor memories
*Neuron***92**:773–779 - Cortical control of arm movements: a dynamical systems perspective
*Annual review of neuroscience***36** - Multivariable feedback control: analysis and design
- A network perspective on sensorimotor learning
*Trends in Neurosciences* - Inferring context-dependent computations through linear approximations of prefrontal cortex dynamics
*bioRxiv* - Principles of neural design
- Motor primitives in space and time via targeted gain modulation in cortical networks
*Nature neuroscience***21**:1774–1783 - Cortical preparatory activity indexes learned motor memories
*Nature*:1–6 - A neural network that finds a naturalistic solution for the production of muscle activity
*Nature neuroscience***18**:1025–1033 - Theory and Implementation of Biomimetic Motor Controllers
- Optimal feedback control as a theory of motor coordination
*Nature neuroscience***5**:1226–1235 - Optimal control methods suitable for biomechanical systems
*Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE Cat. No. 03CH37439)*:1758–1761 - When optimal feedback control is not enough: Feedforward strategies are required for optimal control with active sensing
*PLoS computational biology***12** - Independent generation of sequence elements by motor cortex
*Nature neuroscience***24**:412–424