The re-aiming learning strategy.

a. Re-aiming strategy for BCI learning. If activity evoked by imagining a leftward planar movement moves the BCI cursor right, then the animal learns to use this motor command to move the cursor to the right. Critically, the space of imagined planar movements is low-dimensional.

b. Proposed model of re-aiming. Upstream inputs to motor cortex, uj, depend on a low-dimensional motor command vector, θ (depicted here as two-dimensional). The BCI readout, y, is a 2D linear readout of motor cortical firing rates through a decoding matrix, D. Re-aiming is formalized as identifying the motor command, θ, that ensures the BCI readout gets as close as possible to a given target readout, y* (cf. equation 4).
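The optimization described in panel b can be sketched on a toy problem. Everything below is an illustrative assumption rather than the model used in the paper: the four-neuron rectified-linear `rates` function stands in for the motor cortical network, the penalty `gamma * ||θ||²` stands in for the metabolic cost term (equation 4 is not reproduced here), and a coarse grid search replaces whatever optimizer was actually used. The names `rates`, `readout`, `reaim`, `W`, `D_baseline` and `D_flipped` are hypothetical.

```python
import math

def rates(theta, W):
    """Toy stand-in for motor cortical rates: rectified linear in theta (assumption)."""
    return [max(0.0, sum(w * t for w, t in zip(row, theta))) for row in W]

def readout(r, D):
    """Linear BCI readout y = D r, with one row of D per readout dimension."""
    return [sum(d * ri for d, ri in zip(row, r)) for row in D]

def reaim(y_target, W, D, gamma=0.01, n_angles=72,
          norms=(0.25, 0.5, 0.75, 1.0, 1.25)):
    """Grid-search the 2D command theta minimizing readout error + gamma * ||theta||^2."""
    best = None
    for s in norms:
        for k in range(n_angles):
            a = 2.0 * math.pi * k / n_angles
            theta = (s * math.cos(a), s * math.sin(a))
            y = readout(rates(theta, W), D)
            err = sum((yi - ti) ** 2 for yi, ti in zip(y, y_target))
            cost = err + gamma * s ** 2  # assumed stand-in for the metabolic cost
            if best is None or cost < best[0]:
                best = (cost, theta, y)
    return best[1], best[2]

# Four toy neurons tuned to rightward, upward, leftward and downward commands.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
# Baseline decoder: the readout equals the command in this toy network.
D_baseline = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
# Hypothetical perturbed decoder that mirrors the horizontal readout.
D_flipped = [[-1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]

target_right = (1.0, 0.0)
theta_base, y_base = reaim(target_right, W, D_baseline)
theta_flip, y_flip = reaim(target_right, W, D_flipped)
```

Under the mirrored decoder, the best command in this toy network points left while the readout still lands on the rightward target, which mirrors the scenario in panel a.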

BCI learning task of Sadtler et al. (2014).

a. Schematic of task structure. Subjects first engage in a “calibration task” whereby they passively observe center-out cursor movements on a screen. Recorded neural activity in motor cortex is used to construct the baseline decoder and estimate the intrinsic manifold. Subjects are then instructed to perform center-out cursor movements under BCI control, first using the baseline decoder and then with a perturbed decoder, constructed by perturbing the baseline decoder. This perturbation can either preserve the baseline decoder’s alignment with the intrinsic manifold (a within-manifold perturbation, or WMP) or disrupt it (an outside-manifold perturbation, or OMP).

b. Low-dimensional illustration of the intrinsic manifold and its relationship to the decoders (defined in equation 3) used in this task. Colored dots represent activity patterns recorded during different trials of the calibration task, colored by the cursor velocity presented on that trial. The cursor velocities of these stimuli are depicted by color-matched arrows in the inset in the top right, with the cursor targets used in the subsequent cursor control task depicted by green diamonds. The evoked neural activity patterns reside predominantly within the two-dimensional plane depicted by the gray rectangle, the so-called intrinsic manifold. Three hypothetical one-dimensional decoders are depicted by colored arrows, labelled baseline decoder, WMP, and OMP. The corresponding component of the linear readouts, y1, from these decoders can be visualized by projecting individual activity patterns onto the corresponding decoder vector. This is illustrated for one activity pattern marked in green, whose projections onto each of the three decoders are shown. Because this activity pattern resides close to the intrinsic manifold, it yields a large readout (i.e. far from the origin, at the intersection of the three decoders) from the baseline decoder and WMP, which are both well aligned with the intrinsic manifold. In contrast, this activity pattern’s readout through the OMP is much weaker (i.e. its projection onto this decoder is much closer to the origin), since this decoder is oriented away from the intrinsic manifold. It is important to keep in mind that this illustration is a simplified cartoon of the true task, in which the intrinsic manifold is higher-dimensional (8-12D instead of 2D) and the BCI task depends on two readouts (y1, y2) rather than one.
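The projection argument in panel b can be checked numerically with a minimal sketch. The three-neuron state space, the activity pattern `r` and the three decoder vectors below are made-up illustrative values, not quantities from the experiment; the intrinsic manifold is assumed to be the plane spanned by the first two axes.

```python
import math

def project(r, d):
    """Signed length of the projection of activity pattern r onto decoder vector d."""
    norm = math.sqrt(sum(x * x for x in d))
    return sum(a * b for a, b in zip(r, d)) / norm

# Toy activity pattern lying close to the assumed intrinsic manifold (third axis ~ 0).
r = (0.8, 0.6, 0.05)

# Hypothetical one-dimensional decoders.
baseline = (1.0, 0.2, 0.0)  # lies within the manifold
wmp = (0.2, 1.0, 0.0)       # still within the manifold, but re-oriented
omp = (0.1, 0.0, 1.0)       # mostly orthogonal to the manifold

y_baseline = project(r, baseline)
y_wmp = project(r, wmp)
y_omp = project(r, omp)
```

With these values the readouts through the baseline decoder and the WMP are both several times larger than the readout through the OMP, as in the cartoon.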

Re-aiming with two command variables suffices to learn good solutions for within- but not outside-manifold perturbations.

a. Optimal motor commands for the baseline decoder and one example WMP and OMP, plotted in θ1-θ2 space. The shade of green indexes the target readout, y*, that each motor command is optimized for, corresponding to the target readouts plotted in the adjacent panel (green diamonds in fig. 3b).

b. Readouts generated at time tend by the optimal motor commands shown in the previous panel (fig. 3a). Green diamonds mark the eight target readouts for the center-out cursor control task, set to the directions of the eight radial cursor targets used by Sadtler et al. (2014).

c. Readouts from each of the reachable manifold activity patterns plotted in fig. 3f, with matched marker colors and sizes. The diamonds denote the eight target readouts as in fig. 3b. Note that the reachable readouts closest to the targets do not necessarily match the readouts produced by the optimal motor commands (fig. 3b), as the optimal motor commands are optimized to minimize the metabolic cost of the upstream input as well as the readout error (cf. equation 4).

d. Distribution of mean squared error achieved by the optimal motor commands for 100 randomly sampled WMPs and OMPs. The mean squared error achieved by the optimal motor commands for the baseline decoder from which these perturbations are derived is marked by the vertical dashed black line. Target readouts are unit norm, so a mean squared error of 1.0 is equivalent to producing readouts at the origin.
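The statement that a mean squared error of 1.0 corresponds to readouts at the origin follows directly from the targets having unit norm. A short check, assuming that "mean squared error" means the squared Euclidean distance between readout and target averaged over the eight targets (the function name `mean_squared_error` is illustrative):

```python
import math

# Eight unit-norm target readouts, evenly spaced around the circle.
targets = [(math.cos(2.0 * math.pi * k / 8), math.sin(2.0 * math.pi * k / 8))
           for k in range(8)]

def mean_squared_error(readouts, targets):
    """Squared Euclidean readout error, averaged over targets (assumed definition)."""
    total = sum((y[0] - t[0]) ** 2 + (y[1] - t[1]) ** 2
                for y, t in zip(readouts, targets))
    return total / len(targets)

mse_at_origin = mean_squared_error([(0.0, 0.0)] * 8, targets)  # all readouts at the origin
mse_perfect = mean_squared_error(targets, targets)             # readouts hit every target
```

Each target contributes cos² + sin² = 1 when the readout sits at the origin, so the average is 1.0; errors above 1.0 would mean the readouts are, on average, further from the targets than the origin is.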

e. Motor commands covering a range of angles on the θ1-θ2 plane and 5 norms, ∥θ∥ ∈ {0.1, 0.4, 0.7, 1.0, smax}, with smax = 1.25 (see Methods Section 4.4 for how this was chosen). The motor commands used to simulate the calibration task are indicated by the pink/purple squares. All other command variables, θ3, θ4, …, θK, are fixed to 0.

f. Activity patterns in the reachable manifold at endpoint time tend = 1000 ms. Each ring of activity patterns is generated by the corresponding ring of color- and size-matched motor commands in the previous panel. This ensemble of N-dimensional activity patterns is projected onto its top three principal components. The black line is drawn to facilitate visualization of the 3D structure of this conical manifold. Note that the points in this plot should not be thought of as spatiotemporal trajectories of activity; rather, they depict activity patterns at the same timepoint generated by different motor commands.

g. Purple curve: cumulative variance in reachable manifold activity patterns along each intrinsic manifold dimension (equation 25). Gray curve: cumulative variance in calibration task neural responses. By construction, the intrinsic manifold contains 95% of the total variance of the calibration task neural responses (Methods Section 4.8).
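A cumulative-variance curve of the kind plotted in panel g can be sketched as follows. This assumes the intrinsic manifold dimensions are given as orthonormal vectors and that variance is normalized by the total variance of the same set of activity patterns; equation 25 is not reproduced here and may use a different normalization. The function name and the toy data are illustrative.

```python
def cumulative_variance_fraction(patterns, basis):
    """Fraction of total variance captured by the first 1, 2, ... basis vectors.

    `basis` is assumed to be a list of orthonormal vectors.
    """
    n = len(patterns)
    dim = len(patterns[0])
    mean = [sum(p[i] for p in patterns) / n for i in range(dim)]
    centered = [[p[i] - mean[i] for i in range(dim)] for p in patterns]
    total = sum(x * x for c in centered for x in c) / n
    fractions, running = [], 0.0
    for b in basis:
        # Variance of the centered patterns along this basis vector.
        running += sum(sum(ci * bi for ci, bi in zip(c, b)) ** 2 for c in centered) / n
        fractions.append(running / total)
    return fractions

# Toy activity patterns in a 3-neuron space, with most variance along the first two axes.
patterns = [(2.0, 0.0, 0.0), (-2.0, 0.0, 0.0),
            (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
            (0.0, 0.0, 0.5), (0.0, 0.0, -0.5)]
basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

fractions = cumulative_variance_fraction(patterns, basis)
```

In this toy example the first two dimensions together cross the 95% threshold, so a manifold defined by that criterion would be two-dimensional.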

Re-aiming predicts biases in readouts after short-term learning of within-manifold perturbation (WMP) decoders.

a. Readouts reachable through four representative WMP decoders, using the same color conventions as in fig. 3c. In each case, the four loops correspond to four distinct motor command norms, chosen to aid visualization. The leftmost panel corresponds to the example WMP decoder shown in fig. 3cii. The projection of the reachable manifold centroid is overlaid as an open arrow, arbitrarily rescaled for visibility.

b. Maximal cursor progress in each target direction as a function of angle with the projected reachable manifold centroid, for the four example WMP decoders in panel a.

c. Maximal cursor progress in each target direction as a function of angle with the projected reachable manifold centroid, for all 100 sampled WMPs (Pearson r = −0.66, p < .001, N = 8 target directions × 100 sampled WMPs = 800). As was done for the experimental data in the next panel, the reachable manifold centroid is estimated using simulated mean firing rates during baseline decoder control (see Methods Section 4.6).

d. Maximal cursor progress in each target direction as a function of angle with the projected reachable manifold centroid, for all 46 sessions of WMP learning across three monkeys (Pearson r = −0.41, p < .001, N = 8 target directions × 46 experimental sessions with WMP control = 368). Maximal cursor progress is estimated using the average cursor progress over the 50 contiguous trials with lowest acquisition times. The reachable manifold centroid is estimated using mean firing rates over trials of baseline decoder control (see Methods Section 4.6).
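The correlations reported in panels c and d pair, for each target, an angle with a measure of cursor progress. The sketch below shows only the mechanics of that analysis. The reference vector `reference` and the `toy_progress` values are made-up stand-ins (progress is simply set to fall off with angle), so the resulting coefficient says nothing about the model or the data.

```python
import math

def angle_between(u, v):
    """Unsigned angle in degrees between two 2D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    cos_angle = dot / (math.hypot(u[0], u[1]) * math.hypot(v[0], v[1]))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Eight radial target directions.
targets = [(math.cos(2.0 * math.pi * k / 8), math.sin(2.0 * math.pi * k / 8))
           for k in range(8)]
# Hypothetical reference direction (e.g. a projected centroid); an assumption.
reference = (1.0, 0.0)

angles = [angle_between(t, reference) for t in targets]
# Toy progress values that decrease with angle; NOT simulation output.
toy_progress = [1.0 + 0.5 * math.cos(math.radians(a)) for a in angles]

r = pearson_r(angles, toy_progress)
```

A negative coefficient, as in both panels, indicates that progress tends to be larger toward targets that lie closer in angle to the reference direction.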

Generalized re-aiming produces good solutions for outside-manifold perturbations (OMPs).

a. Mean squared error achieved by generalized re-aiming solutions for all sampled OMP decoders, plotted as a function of the number of command variables used for re-aiming, K. Lighter blue points show the mean squared error for individual OMP decoders; darker open circles show the median over all sampled OMPs. For reference, dotted horizontal lines show the mean squared error achieved by re-aiming solutions with K = 2 for the baseline decoder (black) and for WMP decoders (red); for WMP decoders, the median over all sampled decoders is shown with shading marking the upper and lower quartiles (corresponding to the values plotted in the red histogram in fig. 3d).

b. Participation ratio of the reachable manifold covariance (a measure of the effective dimensionality of the reachable manifold; see Methods Section 4.4, equation 16) as a function of the number of command variables used for re-aiming, K.
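For readers unfamiliar with the participation ratio, a minimal sketch of its standard definition in terms of covariance eigenvalues, (Σλ)² / Σλ², is given below. Equation 16 is not reproduced here, so the assumption is that it matches this standard form; the function name is illustrative.

```python
def participation_ratio(eigenvalues):
    """Standard participation ratio: (sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    total = sum(eigenvalues)
    return total * total / sum(lam * lam for lam in eigenvalues)

pr_flat = participation_ratio([1.0, 1.0, 1.0, 1.0])    # variance spread over 4 dimensions
pr_single = participation_ratio([1.0, 0.0, 0.0, 0.0])  # all variance in 1 dimension
pr_mixed = participation_ratio([4.0, 1.0, 1.0])        # somewhere in between
```

The ratio equals the number of dimensions when variance is spread evenly and falls toward 1 as variance concentrates in a single dimension, which is why it serves as a measure of effective dimensionality.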

c. Convex hull of OMP readouts reachable with different numbers of command variables, K, for the same OMP decoder shown in fig. 3c. The innermost ring corresponds to the convex hull of the reachable readouts plotted in fig. 3ciii.

Generalized re-aiming solutions reproduce motor cortical tuning changes observed under credit assignment rotation perturbations.

a. A linear BCI readout (equation 3) can be interpreted as summing together columns of the decoding matrix D, each weighted by the firing rate of the corresponding neuron (the centering term c has been dropped here for simplicity). These columns are called the neurons’ decoding vectors, and they are plotted on the axes below the equation. Under a credit assignment rotation perturbation, the decoding vectors of a subset of neurons (marked in purple) are rotated by a fixed angle (in this case 75° counter-clockwise). The neurons’ decoding vectors under this perturbed decoder are shown by dashed green arrows. The neurons whose decoding vectors are rotated are termed “rotated” neurons (in purple), the rest of the neurons that are recorded by the BCI are termed “non-rotated” neurons (in pink). Neurons that are not recorded by the BCI (i.e. whose decoding vectors are just a vector of 0’s, not depicted here) are termed “indirect” neurons.
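The decoding-vector view of the readout and the credit assignment rotation perturbation can be sketched in a few lines. The five toy neurons, their decoding vectors and firing rates are illustrative assumptions; as in the caption, the centering term c is dropped.

```python
import math

def rotate(v, degrees):
    """Rotate a 2D vector counter-clockwise by the given angle."""
    a = math.radians(degrees)
    return (math.cos(a) * v[0] - math.sin(a) * v[1],
            math.sin(a) * v[0] + math.cos(a) * v[1])

def readout(decoding_vectors, rates):
    """Readout as the sum of decoding vectors (columns of D) weighted by firing rates."""
    return (sum(d[0] * r for d, r in zip(decoding_vectors, rates)),
            sum(d[1] * r for d, r in zip(decoding_vectors, rates)))

# Toy decoder: four recorded neurons plus one indirect neuron (zero decoding vector).
baseline_vectors = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (0.0, 0.0)]
rotated_neurons = {0, 1}  # hypothetical "rotated" subset

# Perturbed decoder: rotate only the decoding vectors of the rotated neurons.
perturbed_vectors = [rotate(d, 75.0) if i in rotated_neurons else d
                     for i, d in enumerate(baseline_vectors)]

rates = [1.0, 0.0, 0.0, 0.0, 3.0]  # only neuron 0 and the indirect neuron are active
y_baseline = readout(baseline_vectors, rates)
y_perturbed = readout(perturbed_vectors, rates)
perturbed_angle = math.degrees(math.atan2(y_perturbed[1], y_perturbed[0]))
```

The same activity pattern that drives the cursor to the right under the baseline decoder drives it 75° counter-clockwise from rightward under the perturbed decoder, and the indirect neuron contributes nothing to either readout regardless of its rate.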

b. Tuning curve of a representative example rotated neuron of our model, during cursor control with the baseline decoder (black) and with a credit assignment rotation perturbation (green). The dots show the time-averaged activity over tend = 1000ms while the motor cortical network is driven by the re-aiming solutions for each respective decoder, using for the baseline decoder and for the perturbed decoder. Curves show tuning curves fit to these responses (Methods Section 4.9). The vertical dotted gray lines mark the preferred direction under each decoder, with an arrow labeling the change in preferred direction.

c. Tuning curve of a representative example non-rotated neuron of our model, under the same two decoders. All conventions exactly as in the previous panel. Note that this neuron’s preferred direction changes less than that of the rotated neuron in the previous panel.

d. Mean squared error achieved by generalized re-aiming solutions for 100 random credit assignment rotation perturbations, plotted as a function of the number of command variables used for re-aiming, K. Light green dots denote individual decoder perturbations; overlaid darker open circles denote medians over all 100 sampled decoder perturbations. Black dotted horizontal line shows the mean squared error achieved by re-aiming solutions to the unperturbed baseline decoder with .

e. Average change in preferred direction of rotated, non-rotated, and indirect neurons between simulated cursor control with the baseline decoder and each perturbed decoder, plotted as a function of the number of command variables used for re-aiming, K. For each decoder perturbation, the changes in preferred direction are averaged over all neurons in each sub-population, and the median over all sampled perturbations is plotted. Error bars mark the upper and lower quartiles. Positive angles indicate a counter-clockwise rotation, consistent with the direction of rotation of the decoding vectors of the rotated neurons.
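A signed change in preferred direction of the kind plotted in panel e can be computed from fitted tuning curves as sketched below. This assumes cosine tuning, r(φ) = b + m cos(φ − φ_pd), which may differ from the fitting procedure in Methods Section 4.9. For evenly spaced target directions, the closed-form expressions used here coincide with the least-squares fit. The toy responses are synthetic and the function names are illustrative.

```python
import math

def fit_cosine_tuning(angles_deg, responses):
    """Fit r = b + m*cos(phi - pd) for evenly spaced angles; returns (b, m, pd in degrees)."""
    n = len(angles_deg)
    baseline = sum(responses) / n
    c = 2.0 / n * sum(r * math.cos(math.radians(a)) for a, r in zip(angles_deg, responses))
    s = 2.0 / n * sum(r * math.sin(math.radians(a)) for a, r in zip(angles_deg, responses))
    return baseline, math.hypot(c, s), math.degrees(math.atan2(s, c))

def pd_change(pd_before, pd_after):
    """Signed change in preferred direction, wrapped to [-180, 180); positive = counter-clockwise."""
    return (pd_after - pd_before + 180.0) % 360.0 - 180.0

angles = [45.0 * k for k in range(8)]  # eight radial target directions

def toy_responses(pd_deg):
    """Synthetic cosine-tuned responses with baseline 5 and modulation depth 2 (assumption)."""
    return [5.0 + 2.0 * math.cos(math.radians(a - pd_deg)) for a in angles]

_, depth_before, pd_before = fit_cosine_tuning(angles, toy_responses(30.0))
_, depth_after, pd_after = fit_cosine_tuning(angles, toy_responses(80.0))
delta_pd = pd_change(pd_before, pd_after)
```

The wrapping step matters when preferred directions straddle 0°: a shift from 350° to 10° is reported as +20° rather than −340°. The fitted amplitude `m` is one common definition of modulation depth.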

Operant conditioning of individual neurons by re-aiming.

a. Activity of two model neurons at various points on the reachable manifold, at tend = 1000 ms with , following the same conventions as fig. 3f. As in that figure, each ring of activity patterns is generated by the corresponding ring of color- and size-matched motor commands in fig. 3e. Activity patterns below the diagonal are ones where neuron a is more active than neuron b, satisfying the task demands when neuron a is the target neuron; the reverse holds for activity patterns above the diagonal. The green and orange open circles denote the activity patterns produced by the optimal re-aiming solutions for the two respective target assignments. Note that, due to the metabolic cost term in the re-aiming objective function, these do not necessarily correspond to the points on the reachable manifold that are furthest away from the diagonal.

b. Difference in activity produced by re-aiming solutions for each target assignment (at time tend = 1000 ms, the endpoint time the re-aiming solutions were optimized for), for 500 random pairs of neurons from the same model motor cortical network used in previous simulations. Each dot corresponds to one pair of neurons. The pair of neurons shown in the previous panel is marked by an open circle. Two additional examples are marked by open circles. Insets show the activity of those neuron pairs at various points on the reachable manifold, following the same conventions as the previous panel.

c. Difference in activity (at tend = 1000 ms) produced by the re-aiming solutions optimized for neuron a being the target, for the same 500 random pairs of neurons, plotted as a function of the correlation between the two neurons during simulated spontaneous behavior. The three example pairs of neurons highlighted in the previous panel are again highlighted here with open circles. A quantitatively similar trend is observed for rb − ra with re-aiming solutions optimized for neuron b being the target (data not shown).

d. Activity of indirect neurons (at tend = 1000 ms) for the same neuron pairs and re-aiming solutions in the previous panel, plotted as a function of correlation with neuron a during simulated spontaneous behavior. A quantitatively similar trend is observed for correlations with neuron b with re-aiming solutions optimized for neuron b being the target (data not shown).

Each column shows simulation results for a different network: (1) the randomly connected network used in the main text; (2) another randomly connected network, with weights sampled in exactly the same way; (3) a network with random E/I connectivity; (4) a network with inhibition-optimized E/I connectivity; (5) a network with connectivity optimized for delayed center-out reaching.

a. Eigenspectra of recurrent weight matrices of each network, plotted on the complex plane. Note that the optimized network has low-rank connectivity: almost all eigenvalues are clustered at 0. Dashed vertical line marks the linear stability boundary at a real part of 1.0.

b. Mean squared error achieved by re-aiming solutions for different endpoint times, tend, for each decoder. Lighter markers correspond to individual decoders, darker open markers (connected by lines) show medians over all decoders. Note that the metabolic cost weight, γ, is fixed to the same value, which was picked to guarantee low error under the baseline decoder for tend = 1000ms only (see Methods Section 4.3). Thus, the error rises as the endpoint time decreases below this, as higher magnitude motor commands are necessary to achieve low error faster. However, note that the difference between baseline decoder, WMPs, and OMPs remains the same even at these lower values of tend.

c. Calibration task response and reachable manifold variance cumulatively accounted for by each dimension of the intrinsic manifold, as in fig. 3g. The intrinsic manifold was found to be about 12-dimensional with inhibition-optimized connectivity and about 6-dimensional with connectivity optimized for delayed center-out reaching.

d. Maximal cursor progress for each target and WMP as a function of target direction angle with the projected reachable manifold centroid, as in fig. 4c.

e. Mean squared error achieved for each OMP by re-aiming with different numbers of command variables, K, as in fig. 5a.

Analysis of WMP readout bias in the direction of the projected reachable manifold centroid.

a. Simulated motor cortical responses in the calibration task, color-matched to the motor commands in fig. 3e driving these responses. These are plotted together with the same reachable manifold activity patterns from fig. 3f, projected onto the same three principal components. The open circles in the interior of this conical structure show the calibration task mean, c, in light purple and the reachable manifold centroid in dark purple, each connected to the origin by a line for visualization purposes. Note that, by definition, the calibration task neural responses at time tend = 1000 ms (the last point in each trajectory) lie almost exactly on the reachable manifold, slightly offset only because of noise in the response dynamics (see Methods Section 4.7).

b. Norm of for each sampled WMP decoder, D, for each simulated model motor cortical network (see Supplementary Materials Section S.1.1). Overlaid with an open black square is the norm of for the baseline decoder of each corresponding simulation.

Removing the non-negativity constraint fails to reproduce experimentally observed biases in WMP learning.

a. Maximal cursor progress in each target direction as a function of angle with the projected reachable manifold centroid, for 20 sampled WMPs. Only 20 WMPs were used because the intrinsic manifold of the linear network was only 5-dimensional, so we correspondingly adjusted the criteria for subsampling WMPs and OMPs (cf. Supplementary Materials Section S.3.3) and found that only 20 WMPs and 60 OMPs satisfied them. As was done in fig. 4c, the reachable manifold centroid is estimated using simulated mean firing rates during baseline decoder control (see Methods Section 4.6). Maximal cursor progress was calculated exactly as in equation 6, following the same procedure as in the main text for selecting smax (cf. Methods Section 4.4). A total of 8 target directions × 20 sampled WMPs = 160 points are plotted.

b. Activity patterns in the reachable manifold at endpoint time tend = 1000 ms, with non-zero command variables. Calibration task responses to each of the eight radial reach stimuli are overlaid in shades of pink, following exactly the same conventions as in Supplementary Figure S2a. These N-dimensional activity patterns are projected onto the top two principal components of the reachable manifold (PC1 and PC2) and the orthogonal dimension capturing the most calibration task response variance (PC3). Because the network dynamics are linear, the reachable manifold is exactly two-dimensional, so PC1 and PC2 capture 100% of the variance in activity patterns within it. The small open black circle at the center marks the origin of the state space. The light and dark purple open circles at the origin mark the calibration task mean, c, and the reachable manifold centroid, respectively. The latter is barely visible because they overlap almost completely.

c. Readouts reachable through an example WMP, following the same color conventions as in fig. 3c. The green diamonds show the eight target readouts from the radial cursor reaching task.

Operant conditioning of ensembles of 10 neurons by re-aiming.

a. Difference in ensembles’ summed firing rates (at tend = 1000 ms) produced by the re-aiming solutions optimized for ensemble a being the target, for 500 randomly sampled ensembles of 10 neurons, plotted as a function of the correlation between the two ensembles’ summed firing rates during simulated spontaneous behavior. Here, and denote the summed firing rates of neurons in ensemble a and b, respectively.

b. Activity of indirect neurons (at tend = 1000 ms) for the same ensembles and re-aiming solutions in the previous panel, plotted as a function of mean correlation coefficient with neurons in ensemble a during simulated spontaneous behavior.

Changes in modulation depth under generalized re-aiming.

a. Average change in modulation depth of rotated, non-rotated, and indirect neurons between simulated reaches with the baseline decoder and each perturbed decoder, plotted as a function of the number of command variables used for re-aiming, K. As in fig. 6e, the changes in modulation depth are averaged over all neurons in each sub-population, and the median over all 100 sampled credit assignment rotation perturbations is plotted. Error bars mark the upper and lower quartiles.

b. Average modulation depth of direct neurons (neurons recorded by the BCI, with non-zero decoding weights in D) and indirect neurons (neurons not recorded by the BCI) under generalized re-aiming solutions for OMPs. Modulation depths are averaged over all neurons in each sub-population, and the median over all 100 sampled OMPs is plotted. Error bars mark the upper and lower quartiles.

Closed-loop re-aiming reproduces the differences in WMP and OMP learning.

a. Mean squared error (mean over target readouts) achieved by closed-loop control with command variables, as a function of time. Each line corresponds to performance on a different decoder, with a correspondingly optimized feedback controller (equation 41).

b. Mean squared error (mean over target readouts and over time) achieved by error feedback controllers with command variables. Each point corresponds to a different decoder, with medians over all decoders in each class marked by the height of the bars.

c. Mean squared error (mean over target readouts and over time) achieved by error feedback controllers optimized for each OMP, with and 10 command variables. Light blue points denote this quantity for individual OMP’s, larger open circles on top show the median. For reference, dotted horizontal lines show the mean squared error achieved by optimized error feedback with command variables for the baseline decoder (black) and WMP’s (red); the red dotted line shows the median over all sampled WMP’s with shading marking the upper and lower quartiles.

Differences between sampled decoder perturbations and the baseline decoder.

a. Distribution of mean principal angle between row space of baseline decoder and row space of each perturbed decoder.

b. Distribution of mean squared error achieved by mean calibration task responses under each perturbed decoder.

c. Distribution of minimal absolute change in preferred direction needed to produce the same readouts with each perturbed decoder as with the baseline decoder.