An emerging view of neural geometry in motor cortex supports high-performance decoding

  1. Sean M Perkins
  2. Elom A Amematsro
  3. John Cunningham
  4. Qi Wang
  5. Mark M Churchland (corresponding author)
  1. Department of Biomedical Engineering, Columbia University, United States
  2. Zuckerman Institute, Columbia University, United States
  3. Department of Neuroscience, Columbia University Medical Center, United States
  4. Department of Statistics, Columbia University, United States
  5. Center for Theoretical Neuroscience, Columbia University Medical Center, United States
  6. Grossman Center for the Statistics of Mind, Columbia University, United States
  7. Kavli Institute for Brain Science, Columbia University Medical Center, United States
8 figures, 9 tables and 1 additional file

Figures

Two perspectives on the structure of neural activity during motor tasks.

(a) Illustration of a traditional perspective. Each neural state (colored dots) is an N-dimensional vector of firing rates. States are limited to a manifold that can be usefully approximated by a subspace. In this illustration, neural states are largely limited to a two-dimensional subspace within the full three-dimensional firing-rate space. Within that restricted subspace, there exist neural dimensions where neural activity correlates with to-be-decoded variables. BCI decoding leverages knowledge of the subspace and of those correlations. If two tasks have similar motor outputs at two particular moments, the corresponding neural states will also be similar, aiding generalization. (b) An emerging perspective. Neural activity is summarized by neural factors (one per axis), with each neuron’s firing rate being a simple function of the factors. Factors may be numerous (many dozens or even hundreds), resulting in a high-dimensional subspace of neural activity. Yet most of that space is unoccupied; factor-trajectories are heavily constrained by the underlying dynamics (gray arrows), such that most states are never visited. The order in which states are visited is also heavily sculpted by dynamics; e.g. trajectories rarely ‘swim upstream’. The primary constraint is thus not a subspace, but a sparse manifold where activity largely flows in specific directions. Different tasks may often (but not always) employ different dynamics and thus different regions of the manifold. Because most aspects of neural trajectories fall in the null space of outgoing commands, there are no directions in neural state space that consistently correlate with kinematic variables, and distant states may correspond to similar motor outputs.

Properties of neural trajectories in motor cortex, illustrated for data recorded during the cycling task (MC_Cycle dataset).

(a) Low tangling implies separation between trajectories that might otherwise be close. Top. Neural trajectories for forward (purple) and backward (orange) cycling. Trajectories begin 600 ms before movement onset and end 600 ms after movement offset. Trajectories are based on trial-averaged firing rates, projected onto two dimensions. Dimensions were selected to highlight apparent crossings of the cyclic trajectories during forward and backward cycling, while also capturing considerable variance (11.5%, comparable to the 11.6% captured by PCs 3 and 4). Gray region highlights one set of apparent crossings. Bottom. Trajectories during the restricted range of times in the gray region, but projected onto different dimensions. The same scale is used in top and bottom subpanels. (b) Examples of well-separated neural states (main panel) corresponding to similar behavioral states (inset). Colored trajectory tails indicate the previous 150 ms of states. Data from 7-cycle forward condition. (c) Joint distribution of pairwise distances for muscle and neural trajectories. Analysis considered all states across all conditions. For both muscle and neural trajectories, we computed all possible pairwise distances across all states. Each muscle state has a corresponding neural state, from the same time within the same condition. Thus, each pairwise muscle-state distance has a corresponding neural-state distance. The color of each pixel indicates how common it was to observe a particular combination of muscle-state distance and neural-state distance. Muscle trajectories are based on 7 z-scored intramuscular EMG recordings. Correspondence between neural and muscle state pairs included a 50 ms lag to account for physiological latency. Results were not sensitive to the presence or size of this lag. Neural and muscle distances were normalized (separately) by average pairwise distance. (d) Same analysis for neural and kinematic distances (based on phase and angular velocity). Correspondence between neural and kinematic state pairs included a 150 ms lag. Results were not sensitive to the presence or size of this lag. (e) Control analysis to assess the impact of sampling error. If two sets of trajectories (e.g. neural and kinematic) are isometric and can be estimated perfectly, their joint distribution should fall along the diagonal. To estimate the impact of sampling error, we repeated the above analysis comparing neural distances across two data partitions, each containing 15–18 trials/condition.
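
For concreteness, here is a minimal Python sketch of the pairwise-distance comparison in panels (c-e); the array shapes, lag handling, and all function/variable names are illustrative assumptions, not the analysis code itself.

```python
import numpy as np
from scipy.spatial.distance import pdist

def distance_joint_histogram(neural, muscle, lag_samples=50, n_bins=100):
    """neural: (T, N) firing rates; muscle: (T, M) z-scored EMG, concatenated across
    conditions on a common 1 ms time base. Pairs the neural state at t + lag with the
    muscle state at t, then compares all pairwise state distances after normalizing
    each set by its average pairwise distance. (In practice, subsample time points
    before calling pdist to keep memory manageable.)"""
    T = min(len(neural) - lag_samples, len(muscle))
    d_neural = pdist(neural[lag_samples:lag_samples + T])   # all pairwise neural distances
    d_muscle = pdist(muscle[:T])                            # corresponding muscle distances
    d_neural /= d_neural.mean()
    d_muscle /= d_muscle.mean()
    # 2D histogram: how often each muscle-distance / neural-distance combination occurs
    return np.histogram2d(d_muscle, d_neural, bins=n_bins)
```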

Example training (top panel) and decoding (bottom panels) procedures for MINT, illustrated using four conditions from a reaching task.

(a) Libraries of neural and behavioral trajectories are learned such that each neural state x̄_k^c corresponds to a behavioral state z̄_k^c. (b) Spiking observations are binned spike counts (20 ms bins for all main analyses). s̄_{t−τ:t} contains the spike count of each neuron for the present bin, t, and τ bins in the past. (c) At each time step during decoding, the log-likelihood of observing s̄_{t−τ:t} is computed for each and every state in the library of neural trajectories. Log-likelihoods decompose across neurons and time bins into Poisson log-likelihoods that can be queried from a precomputed lookup table. A recursive procedure (not depicted) further improves computational efficiency. (d) Two candidate neural states (purple and blue) are identified. The first is the state within the library of trajectories that maximizes the log-likelihood of the observed spikes. The second state similarly maximizes that log-likelihood, with the restriction that the second state must not come from the same trajectory as the first (i.e. must be from a different condition). Interpolation identifies an intermediate state that maximizes log-likelihood. (e) The optimal interpolation is applied to candidate neural states – yielding the final neural-state estimate – and their corresponding behavioral states – yielding decoded behavior. Despite utilizing binned spiking observations, neural and behavioral states can be updated at millisecond resolution (Methods).
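
A rough Python sketch of the likelihood computation in panel (c), under assumed shapes and names; it omits the recursion and, for simplicity, scores every bin in the window against a single library state rather than the corresponding trajectory segment.

```python
import numpy as np

def build_loglik_table(rate_library, bin_width_s=0.02, max_count=20):
    """rate_library: (n_states, n_neurons) firing rates (spikes/s) for every library state.
    Returns a lookup table of Poisson log-likelihoods, log P(count | state, neuron),
    for counts 0..max_count."""
    lam = rate_library * bin_width_s + 1e-12                # expected spike count per bin
    counts = np.arange(max_count + 1)
    log_fact = np.cumsum(np.log(np.maximum(counts, 1)))     # log(k!) for k = 0..max_count
    return (counts[None, None, :] * np.log(lam)[..., None]
            - lam[..., None] - log_fact[None, None, :])     # (n_states, n_neurons, max_count+1)

def state_log_likelihoods(table, spike_counts):
    """spike_counts: (n_neurons, n_bins) observed counts for the current history window.
    Sums lookup-table entries across neurons and bins for every library state; the argmax
    is the most likely library state."""
    n_states, n_neurons, _ = table.shape
    idx = np.clip(spike_counts, 0, table.shape[2] - 1)
    ll = np.empty(n_states)
    for s in range(n_states):
        ll[s] = table[s, np.arange(n_neurons)[:, None], idx].sum()
    return ll
```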

Figure 4 with 1 supplement
Examples of behavioral decoding provided by MINT for one simulated dataset and four empirically recorded datasets.

All decoding is causal; only spikes from before the decoded moment are used. (a) MINT was applied to spiking data from an artificial spiking network. That network was trained to generate posterior deltoid activity and to switch between reaching and cycling tasks. Based on spiking observations, MINT approximately decoded the true network output at each moment. ‘R3’, ‘R4’, and ‘R6’ indicate three different reach conditions. ‘C’ indicates a cycling bout. MINT used no explicit task-switching, but simply tracked neural trajectories across tasks as if they were conditions within a task. (b) Illustration of the challenging nature, from a decoding perspective, of network trajectories. Trajectories are shown for two dimensions that are strongly occupied during cycling. Trajectories for the 8 reaching conditions (pink) are all nearly orthogonal to the trajectory for cycling (brown) and thus appear compressed in this projection. (c) Decoded behavioral variables (green) compared to actual behavioral variables (black) across four empirical datasets. MC_Cycle and MC_RTT show 10 seconds of continuous decoding. MC_Maze and Area2_Bump show randomly selected trials, demarcated by vertical dashed lines. (d) Distribution of distances between each decoded neural state and the nearest state in the library Ω (green), for the MC_Cycle dataset. To provide a reference, we drew neural states from a Gaussian distribution whose mean and covariance matched the library states, and computed distances to the library states (purple). This emulates what would occur if the manifold were defined by the data covariance and associated neural subspace. The green distribution mostly contains non-zero values, but is much closer to zero than the purple distribution. Thus, MINT rarely decodes a neural state that exactly matches a library state, but most decoded states are close to the scaffolding provided by the library states. Neural distances are normalized by the average pairwise distance between library states.
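
A hedged sketch of the panel (d) control, assuming the decoded states and library states are available as arrays; names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def nearest_library_distances(decoded_states, library_states, seed=0):
    """decoded_states: (T, N) states decoded by MINT; library_states: (K, N) library states.
    Returns normalized nearest-library distances for the decoded states and for a
    Gaussian reference with matched mean and covariance (the subspace-only control)."""
    rng = np.random.default_rng(seed)
    norm = cdist(library_states, library_states).mean()     # average pairwise library distance
    d_decoded = cdist(decoded_states, library_states).min(axis=1) / norm
    mu = library_states.mean(axis=0)
    cov = np.cov(library_states, rowvar=False)
    gauss = rng.multivariate_normal(mu, cov, size=len(decoded_states))
    d_gauss = cdist(gauss, library_states).min(axis=1) / norm
    return d_decoded, d_gauss
```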

Figure 4—video 1
Video demonstrating causal neural state estimation and behavioral decoding from MINT on the MC_Cycle dataset.

In this dataset, a monkey moved a hand pedal cyclically forward or backward in response to visual cues. The raster plot of spiking activity (112 neurons, bottom right subpanel) and the actual and decoded angular velocities of the pedal (top right subpanel) are animated with 10 seconds of trailing history. Decoding was causal; the decode of the present angular velocity (right hand edge of scrolling traces) was based only on present and past spiking. Cycling speed was ∼2 Hz. The underlying neural state estimate (green sphere in left subpanel) is plotted in a 3D neural subspace, with a 2D projection below. The present neural state is superimposed on top of the library of 8 neural trajectories used by MINT. The state estimate always remained on or near (via state interpolation) the neural trajectories. Purple and orange trajectories correspond to forward and backward cycling conditions, respectively. The lighter-to-darker color gradients differentiate between trajectories corresponding to 1-, 2-, 4-, and 7-cycle conditions for each cycling direction. Neural state estimate corresponds in time to the right edge of the scrolling raster/velocity plot. The three-dimensional neural subspace was hand-selected to capture a large amount of neural variance (59.6%; close to the 62.9% captured by the top 3 PCs) while highlighting the dominant translational and rotational structure in the trajectories.

Figure 5 with 2 supplements
Comparison of decoding performance, for MINT and four additional algorithms, for four datasets and multiple decoded variables.

On a given trial, MINT decodes all behavioral variables in a unified way based on the same inferred neural state. For non-MINT algorithms, separate decoders were trained for each behavioral group (with the exception of the Kalman filter, which used the same model to decode position and velocity). E.g. separate GRUs were trained to output position and velocity in MC_Maze. Parentheticals indicate the number of behavioral variables within a group. E.g. ‘position (2)’ has two components: x- and y-position. R2 is averaged across behavioral variables within a group. ‘Overall’ plots performance averaged across all behavioral groups. R2 values for feedforward networks and GRUs are additionally averaged across runs for 10 random seeds. The Kalman filter is traditionally utilized for position- and velocity-based decoding and was therefore only used to predict these behavioral groups. Accordingly, the ‘overall’ category excludes the Kalman filter for datasets in which the Kalman filter did not contribute predictions for every behavioral group. Results are based on the following numbers of training / test trials: MC_Cycle (174 train, 99 test), MC_Maze (1721 train, 574 test), Area2_Bump (272 train, 92 test), MC_RTT (810 train, 268 test). Vertical offsets and vertical ticks are used to increase visibility of data when symbols overlap.
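
A small sketch of how these summary values could be computed (per-variable R2 averaged within a behavioral group, then across groups for 'overall'); function and variable names are assumptions.

```python
import numpy as np

def r2_per_variable(y_true, y_pred):
    """y_true, y_pred: (T, n_vars) actual and decoded behavior. One R^2 per variable."""
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1 - ss_res / ss_tot

def grouped_r2(groups):
    """groups: dict mapping a group name, e.g. 'position (2)', to a (y_true, y_pred) pair."""
    per_group = {name: r2_per_variable(yt, yp).mean() for name, (yt, yp) in groups.items()}
    overall = float(np.mean(list(per_group.values())))
    return per_group, overall
```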

Figure 5—figure supplement 1
MINT’s decoding performance is robust to the choice of hyperparameters.

MINT was run on the MC_Maze dataset with systematic perturbations to two of MINT’s hyperparameters: bin width (ms) and window length (ms). Bin width is the size of the bin in which spikes are counted: 20 ms for all analyses in the main figures. Window length is the length, in milliseconds, of spiking history that is considered. For all main analyses of the MC_Maze dataset, this was 300 ms, i.e. MINT considered the spike count in the present bin and in 14 previous bins. Perturbations were also made to two hyperparameters related to learning neural trajectories: temporal smoothing width (standard deviation of Gaussian filter applied to spikes) and condition-smoothing dimensionality (see Methods). These two hyperparameters describe how aggressively the trial-averaged data are smoothed (across time and conditions, respectively) when learning rates. Baseline decoding performance (black circles) was computed using the same hyperparameters that were used with the MC_Maze dataset in the analyses from Figures 4 and 5. Then, decoding performance was computed by perturbing each of the four hyperparameters twice (colored circles): once to ∼50% of the hyperparameter’s baseline value and once to ∼150%. Trials were bootstrapped (1000 resamples) to generate 95% confidence intervals (error bars). Perturbations of hyperparameters had little impact on performance. Altering bin width had essentially no impact, nor did altering temporal smoothing. Shortening window length had a negative impact, presumably because MINT had to estimate the neural state using fewer observations. However, the drop in performance was minimal: R2 dropped by .011 for position decoding only. Reducing the number of dimensions used for across-condition smoothing, and consequently over-smoothing the data, had a negative impact on both position and velocity decoding. Yet again this was small: e.g. velocity R2 dropped by .010. These results demonstrate that MINT can achieve high performance using hyperparameter values that span a large range. Thus, they do not need to be meticulously optimized to ensure good performance. In general, optimization may not be needed at all, as MINT’s hyperparameters can often be set based on first principles. For example, in this study, bin width was never optimized either for specific datasets or in general. We chose to always count spikes in 20 ms bins (except in the perturbations shown here) because this duration is long enough to reduce computation time yet short relative to the timescales over which rates change. Additionally, window length can be optimized (as we did for decoding analyses), but it could also simply be chosen to roughly match the timescale over which past behavior tends to predict future behavior. Temporal smoothing of trajectories when building the library can simply use the same values commonly used when analyzing such data. For example, in prior studies, we have used smoothing kernels of width 20–30 ms when computing trial-averaged rates, and these values also support excellent decoding. Condition smoothing is optional and need not be applied at all, but may be useful if one wishes to record fewer trials for more conditions. For example, rather than record 15 trials for 8 reach directions, one might wish to record 5 trials for 24 conditions, then use condition smoothing to reduce sampling error.
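
A minimal sketch of the trial-bootstrap used for the 95% confidence intervals, assuming per-trial predictions are available; names are illustrative.

```python
import numpy as np

def bootstrap_r2_ci(trial_true, trial_pred, n_resamples=1000, alpha=0.05, seed=0):
    """trial_true, trial_pred: lists of (T_i, n_vars) arrays, one pair per test trial.
    Resamples trials with replacement and returns a (lower, upper) confidence interval for R^2."""
    rng = np.random.default_rng(seed)
    n_trials = len(trial_true)
    r2s = []
    for _ in range(n_resamples):
        idx = rng.integers(0, n_trials, size=n_trials)     # resample trials with replacement
        yt = np.concatenate([trial_true[i] for i in idx])
        yp = np.concatenate([trial_pred[i] for i in idx])
        ss_res = np.sum((yt - yp) ** 2, axis=0)
        ss_tot = np.sum((yt - yt.mean(axis=0)) ** 2, axis=0)
        r2s.append(np.mean(1 - ss_res / ss_tot))           # average R^2 across variables
    return tuple(np.percentile(r2s, [100 * alpha / 2, 100 * (1 - alpha / 2)]))
```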

Figure 5—figure supplement 2
Impact of different modeling and preprocessing choices on performance of MINT.

For most applications we anticipate MINT will employ direct decoding that leverages the correspondence between neural and behavioral trajectories, but one could also choose to linearly decode behavior from the estimated neural state. For most applications, we anticipate MINT will use interpolation amongst candidate neural states on neighboring trajectories, but one could also restrict decoding to states within the trajectory library. For real-time applications we anticipate MINT will be run causally, but acausal decoding (using both past and future spiking observations) could be used offline or even online by introducing a lag. We anticipate MINT may be used both in situations where spike events have been sorted based on neuron identity, and situations where decoding simply uses channel-specific unsorted threshold crossings. Panels (a-c) explore the first three choices. Performance was quantified for all eight combinations of: direct MINT readout vs. linear MINT readout, interpolation vs. no interpolation, causal decoding vs. acausal decoding. This was done for 121 behavioral variables across four datasets for a total of 964 R2 values. The ‘phase’ behavioral variable in MC_Cycle was excluded from ‘linear MINT readout’ variants because its circularity makes it a trivially poor target for linear decoding. (a) MINT’s direct neural-to-behavioral-state association outperforms a linear readout based on MINT’s estimated neural state. Performance was significantly higher using the direct readout (ΔR2=0.061 ± 0.002 SE; p<0.001, paired t-test). Note that the linear readout still benefits from the ability of MINT to estimate the neural state using all neural dimensions, not just those that correlate with kinematics. (b) Decoding with interpolation significantly outperformed decoding without interpolation (ΔR2=0.018 ± 0.001 SE; p<0.001, paired t-test). (c) Running acausally significantly improved performance relative to causal decoding (ΔR2=0.051 ± 0.002 SE; p<0.001, paired t-test). Although causal decoding is required for real-time applications, this result suggests that (when tolerable) introducing a small decoding lag could improve performance. For example, a decoder using 200 ms of spiking activity could introduce a 50 ms lag such that the decode at time t is rendered by generating the best estimate of the behavioral state at time t−50 using spiking data from t−200 through t. (d) Decoding performance for 13 behavioral variables in the MC_Cycle dataset when sorted spikes were used (112 neurons, pink) versus ‘good’ threshold crossings from electrodes for which the signal-to-noise ratio of the firing rates exceeded a threshold (93 electrodes, SNR >2, cyan). The loss in performance when using threshold crossings was small (ΔR2=–0.014 ± 0.002 SE). SE refers to standard error of the mean.

Figure 6 with 1 supplement
Implications of neural geometry for decoder generalization, in principle and in practice.

(a) Under a traditional perspective, training that explores the full range of to-be-decoded variables will also explore the relevant neural manifold, allowing future generalization. Gray dots indicate neural states observed during training. Colored traces indicate newly observed trajectories. Because these trajectories traverse the previously defined manifold, they can be decoded by leveraging the established correlations between manifold dimensions and kinematics. (b) Under an emerging perspective, generalization is possible when new trajectories lie within a similar region of factor-space as training-set trajectories. Gray-dashed traces indicate training-set neural trajectories. Colored traces indicate newly observed trajectories. Green and blue trajectories largely follow the flow-field implied by training-set trajectories, allowing generalization. In contrast, the orange trajectory evolves far from any previously explored region, challenging generalization. This may occur even when behavioral outputs overlap with those observed during training. (c) Basic decoding performance on the MC_PacMan task for MINT (green) and a Wiener filter (red). Test sets used held-out trials from the same conditions as training sets. Force-decoding was excellent both when considering all conditions and when considering only ramps (fast and slow, increasing and decreasing). Results are based on the following numbers of training and test trials: All (234 train, 128 test), Ramps (91 train, 57 test). (d) Generalization when training employed the four ramp conditions and testing employed a sine or chirp. Results are based on 91 training trials (all cases) and the following numbers of test trials: Ramps (57), 0.25 Hz Sine (19), 1 Hz Sine (12), 3 Hz Sine (15), Chirp (25). (e) Data likelihoods indicate strained generalization. We computed the log-likelihood of the spiking observations for each neural state decoded by MINT. Rightward values indicate that those observations were unlikely even for the state selected as most likely to have generated those observations. Training employed the four ramp conditions. Distributions are shown when test data employed ramps, a 0.25 Hz sine, or a 3 Hz sine. (f) Examples of force decoding traces (for held-out trials) when decoding using the Wiener filter. Top row: all conditions were used during training. Bottom row: training employed the four ramp conditions. Horizontal and vertical scale bars correspond to 2 s and 16 Newtons, respectively. (g) As above, but when decoding using MINT. Trials and scale bars are matched across (f) and (g). Representative trials were selected as those that achieved median decoding performance by MINT within each condition.
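
A rough sketch of the panel (e) likelihood diagnostic, assuming Poisson spiking and access to the rates of the decoded (most likely) state at each time; names and shapes are assumptions.

```python
import numpy as np
from scipy.stats import poisson

def best_state_loglik(observed_counts, best_state_rates, bin_width_s=0.02):
    """observed_counts: (T, n_neurons) spike counts; best_state_rates: (T, n_neurons)
    firing rates (spikes/s) of the state MINT judged most likely at each time.
    Returns one log-likelihood per time step; very low values mean the observations
    were unlikely even under the best available state."""
    lam = np.maximum(best_state_rates * bin_width_s, 1e-9)
    return poisson.logpmf(observed_counts, lam).sum(axis=1)
```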

Figure 6—figure supplement 1
Illustration of why decoding will typically fail to generalize across tasks when neural trajectories occupy orthogonal subspaces.

In theory, learning the output-potent dimensions in motor cortex would be an effective strategy for biomimetic decoding that generalizes to novel tasks. In practice, it is difficult (and often impossible) to statistically infer these dimensions without observing the subject’s full behavioral repertoire (at which point generalization is no longer needed because all the relevant behaviors were directly observed). (a) In this toy example, two neural trajectories occupy fully orthogonal subspaces. In the ‘blue task’, the neural trajectory occupies dimensions 1 and 2. In the ‘red task’, the neural trajectory occupies dimensions 3 and 4. Trajectories in both subspaces have a non-zero projection onto w_out, enabling neural activity from each task to drive the appropriate output. The output at the right is simply the projection of the blue-task and red-task trajectories onto w_out. (b) Illustration of the difficulty of inferring w_out from only one task. In this example, w_out is estimated using data from the blue task, by linearly regressing the observed output against the neural trajectory for the blue task. The resulting estimate, ŵ_1, correctly translates neural activity in the blue task into the appropriate time-varying output, but fails to generalize to the red task. This failure to generalize is a straightforward consequence of the fact that ŵ_1 was learned from data that didn’t explore dimensions 3 and 4. Note that this same phenomenon would occur if ŵ_1 were estimated by regressing intended output versus neural activity (as might occur in a paralyzed patient). (c) Estimating the output-potent dimension based only on the red task yields an estimate, ŵ_2, that fails to generalize to the blue task. This phenomenon is illustrated here for a linear readout, but would apply to most nonlinear methods as well, unless some additional form of knowledge allows inferences regarding neural trajectories in previously unseen dimensions.
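
A toy numerical version of this illustration (an assumed construction, not the paper's own simulation): two trajectories confined to orthogonal subspaces both drive the same output, yet a readout regressed from one task alone fails on the other.

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 500)
blue = np.stack([np.cos(t), np.sin(t), 0 * t, 0 * t], axis=1)          # occupies dims 1-2
red = np.stack([0 * t, 0 * t, np.sin(2 * t), np.cos(2 * t)], axis=1)   # occupies dims 3-4
w_out = np.array([1.0, 0.0, 1.0, 0.0])        # output-potent direction hits both subspaces
y_blue, y_red = blue @ w_out, red @ w_out     # desired outputs in each task

# Estimate the readout from the blue task only (least-squares regression).
w_hat1, *_ = np.linalg.lstsq(blue, y_blue, rcond=None)

print(np.allclose(blue @ w_hat1, y_blue))     # True: reproduces the trained task's output
print(np.allclose(red @ w_hat1, y_red))       # False: no generalization to the red task
```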

Evaluation of neural state estimates for seven datasets.

(a) Performance quantified using bits per spike. The benchmark’s baseline methods have colored markers. All other submissions have gray markers. Vertical offsets and vertical ticks are used to increase visibility of data when symbols are close due to similar values. Results with negative values are designated by markers to the left of zero, but their locations don’t reflect the magnitude of the negative values. Data from Neural Latents Benchmark (https://neurallatents.github.io/). (b) Performance quantified using PSTH R2. (c) Performance quantified using velocity R2, after velocity was decoded linearly from the neural state estimate. A linear decode is used simply as a way of evaluating the quality of neural state estimates, especially in dimensions relevant to behavior.
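
For orientation, a rough sketch of the bits-per-spike metric (Poisson log-likelihood of held-out spikes under predicted rates, relative to a flat mean-rate model, normalized by spike count); the benchmark's own code is the authoritative definition.

```python
import numpy as np
from scipy.special import gammaln

def poisson_loglik(counts, rates):
    """Summed Poisson log-likelihood of spike counts given per-bin expected counts."""
    rates = np.maximum(rates, 1e-9)
    return np.sum(counts * np.log(rates) - rates - gammaln(counts + 1))

def bits_per_spike(counts, pred_rates):
    """counts, pred_rates: (T, n_neurons) held-out spike counts and predicted per-bin rates."""
    null_rates = np.broadcast_to(counts.mean(axis=0, keepdims=True), counts.shape)
    improvement = poisson_loglik(counts, pred_rates) - poisson_loglik(counts, null_rates)
    return improvement / (counts.sum() * np.log(2))   # nats -> bits, per spike
```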

Figure 8 with 2 supplements
Decoding robustness in the face of small training sets and reduced neuron counts.

(a) R2 values for MINT and two neural network decoders (GRU and the feedforward network) when decoding position and velocity for four maze datasets with progressively fewer training trials per condition. Results are based on the following numbers of training and testing trials: MC_Maze-S (75 train, 25 test), MC_Maze-M (188 train, 62 test), MC_Maze-L (375 train, 125 test), MC_Maze (1721 train, 574 test). MC_Maze contains 108 conditions and the other maze datasets each contain 27 conditions. (b) Decoding performance in the face of undetected (dashed traces) and known (solid traces) loss of neurons. R2 values are shown for MINT (green) and the Wiener filter (red) when decoding position and velocity from the MC_Maze-L dataset (same train/test trials as in a). To simulate undetected neuron loss, we chose k neurons (randomly, with k being the number of lost neurons out of 162 total) and eliminated their spikes without adjusting the decoders. This procedure was repeated 50 times for each value of k. Traces show the mean and standard deviation of the sampling distribution (equivalent to the standard error). To simulate known neuron loss, we followed the above procedure but altered the decoder to account for the known loss. For MINT, this simply involved ignoring the lost neurons when computing data likelihoods. In contrast, the Wiener filter had to be retrained using training data with those neurons eliminated (as would be true for most methods). This was done separately for each set of lost neurons.
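
A minimal sketch of the undetected-loss simulation, with decode_and_score standing in for running the fixed, already-trained decoder and returning R2; names are illustrative.

```python
import numpy as np

def undetected_loss_curve(spikes, decode_and_score, ks, n_repeats=50, seed=0):
    """spikes: (T, n_neurons) test-set spike counts. For each k (number of lost neurons),
    silence k randomly chosen neurons without adjusting the decoder, repeat n_repeats times,
    and return the mean and SD of R^2 across repeats."""
    rng = np.random.default_rng(seed)
    n_neurons = spikes.shape[1]
    means, sds = [], []
    for k in ks:
        scores = []
        for _ in range(n_repeats):
            lost = rng.choice(n_neurons, size=k, replace=False)
            silenced = spikes.copy()
            silenced[:, lost] = 0          # spikes eliminated; decoder left untouched
            scores.append(decode_and_score(silenced))
        means.append(np.mean(scores))
        sds.append(np.std(scores))         # SD of the sampling distribution (standard error)
    return np.array(means), np.array(sds)
```

The known-loss variant would differ only in what decode_and_score does: MINT would ignore the lost neurons when computing likelihoods, whereas the Wiener filter would be retrained without them.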

Figure 8—figure supplement 1
The Kalman filter’s relative performance would improve if neural data had different statistical properties.

We compared MINT to the Kalman filter across one empirical dataset (MC_Maze) and four simulated datasets (same behavior as MC_Maze, but simulated spikes). Simulated firing rates were linear functions of hand position, velocity, and acceleration (as is assumed by the Kalman filter). The means and standard deviations (across time and conditions) of the simulated firing rates were matched to actual neural data (with additional rate scaling in two cases). The Kalman filter assumes that observation noise is stationary and Gaussian. Although spiking variability cannot be Gaussian (spike counts must be positive integers), spiking variability can be made more stationary by letting that variability depend less on rate. Thus, although the first simulation generated spikes via a Poisson process, subsequent simulations utilized gamma-interval spiking (gamma distribution with α=2 and β=2λ, where λ is firing rate). Gamma-interval spiking variability of this form is closer to stationary at higher rates. Thus, the third and fourth simulations scaled up firing rates to further push spiking variability into a more stationary regime at the expense of highly unrealistic rates (in the fourth simulation, a firing rate briefly exceeded 1800 Hz). Overall, as the simulated neural data better accorded with the assumptions of the Kalman filter, decoding performance for the Kalman filter improved. Interestingly, MINT continued to perform well even on the simulated data, likely because MINT can exploit a linear relationship between neural and behavioral states when the data argue for it and higher rates benefit both algorithms. These results demonstrate that algorithms like MINT and the Kalman filter are not intrinsically good or bad. Rather, they are best suited to data that match their assumptions. When simulated data approximate the assumptions of the Kalman filter, both methods perform similarly. However, MINT shows much higher performance for the empirical data, suggesting that its assumptions are a better match for the statistical properties of the data.
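
A small sketch of gamma-interval spike generation as described here (shape α=2, rate β=2λ, so the mean inter-spike interval is 1/λ), shown for a constant rate for simplicity; the simulations themselves used time-varying rates.

```python
import numpy as np

def gamma_interval_spike_times(rate_hz, duration_s, alpha=2.0, seed=0):
    """Draw spike times whose inter-spike intervals are gamma-distributed with
    shape alpha and rate beta = alpha * rate_hz, so the mean interval is 1/rate_hz.
    rate_hz must be > 0; a constant rate is assumed here for simplicity."""
    rng = np.random.default_rng(seed)
    beta = alpha * rate_hz
    times, t = [], 0.0
    while True:
        t += rng.gamma(shape=alpha, scale=1.0 / beta)   # numpy parameterizes by scale = 1/rate
        if t > duration_s:
            break
        times.append(t)
    return np.array(times)
```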

Figure 8—figure supplement 2
MINT is a modular algorithm amenable to a variety of modifications and extensions.

The flowchart illustrates the standard MINT algorithm in black and lists potential changes to the algorithm in red. For example, the library of neural trajectories is typically learned via standard trial-averaging or a single-trial rate estimation technique like AutoLFADS. However, transfer learning could be utilized to learn the library of trajectories based on trajectories from a different subject or session. The trajectories could also be modified online while MINT is running to reflect changes in spiking activity that relate to recording instabilities. A potential modification to the method also occurs when likelihoods are converted into posterior probabilities. Typically, we assume a uniform prior over states in the library. However, that prior could be set to reflect the relative frequency of different behaviors and could even incorporate time-varying information from external sensors (e.g. state of a prosthetic limb, eye tracking, etc.) that include information about how probable each behavior is at a given moment. Another potential extension of MINT occurs at the stage where candidate neural states are selected. Typically, these states are selected based solely on spike count likelihoods. However, one could use a utility function that reflects a user’s values (e.g. the user may dislike some decoding mistakes more than others) in conjunction with the likelihoods to maximize expected utility. Lastly, the behavioral estimate that MINT returns could be post-processed (e.g. temporally smoothed). MINT’s modularity largely derives from the fact that the library of neural trajectories is finite. This assumption enables posterior probabilities to be directly computed for each state, rather than analytically derived. Thus, choices like how to learn the library of trajectories, which observation model to use (e.g. Poisson vs. generalized Poisson), and which (if any) state priors to use, can be made independently from one another. These choices all interact to impact performance. Yet they will not interact to impact tractability, as would have been the case if one analytically derived a continuous posterior probability distribution.
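
As one concrete example of the prior-related modification listed above, here is a hedged sketch of converting per-state log-likelihoods into a posterior under a non-uniform (possibly time-varying) prior; names are illustrative.

```python
import numpy as np

def posterior_over_states(log_likelihoods, log_prior=None):
    """log_likelihoods: (n_states,) spike-count log-likelihoods for every library state.
    log_prior: optional (n_states,) log prior reflecting how probable each behavior is
    at this moment (e.g. informed by external sensors); None recovers the uniform default."""
    log_post = log_likelihoods if log_prior is None else log_likelihoods + log_prior
    log_post = log_post - log_post.max()       # subtract max for numerical stability
    post = np.exp(log_post)
    return post / post.sum()
```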

Tables

Table 1
MINT training times and average execution times (average time it took to decode a 20 ms bin of spiking observations).

To be appropriate for real-time applications, execution times will ideally be shorter than the bin width. Note that the 20 ms bin width does not prevent MINT from decoding every millisecond; MINT updates the inferred neural state and associated behavioral state between bins, using neural and behavioral trajectories that are sampled every millisecond. For all table entries (except MC_RTT training time), means and standard deviations were computed across 10 train/test runs for each dataset. Training times exclude loading datasets into memory and any hyperparameter optimization. Timing measurements taken on a MacBook Pro (on CPU) with 32GB RAM and a 2.3 GHz 8-Core Intel Core i9 processor. Training and execution code used for timing measurements was written in MATLAB (with the core recursion implemented as a MEX file). For MC_RTT, training involved running AutoLFADS twice (and averaging the resulting rates) to generate neural trajectories. This training procedure utilized 10 GPUs and took ∼1.6 hr per run. For AutoLFADS, hyperparameter optimization and model fitting procedures are intertwined. Thus, the training time reported includes time spent optimizing hyperparameters.

Dataset | Training Time (s) | Execution Time (ms)
Area2_Bump | 4.8 ± 0.2 | 0.31 ± 0.03
MC_Cycle | 20.3 ± 2.9 | 0.55 ± 0.03
MC_Maze | 42.8 ± 0.6 | 2.26 ± 0.09
MC_Maze-L | 10.8 ± 0.4 | 0.83 ± 0.02
MC_Maze-M | 8.0 ± 0.4 | 0.80 ± 0.04
MC_Maze-S | 5.8 ± 0.2 | 0.59 ± 0.07
MC_RTT | ∼3.2 hr | 6.99 ± 0.28
Table 2
Neuron, condition, and trial counts for each dataset.

For some datasets, there are additional trials that are excluded from these counts. These trials are excluded because they were only usable for Neural Latents Benchmark submissions due to hidden behavioral data and partially hidden spiking data (see “Neural Latents Benchmark” section).

Dataset | Neurons | Conditions | Trials
Area2_Bump | 65 | 16 | 364
DMFC_RSG | 54 | 40 | 1006
MC_Cycle | 112 | 8 | 273
MC_Maze | 182 | 108 | 2295
MC_Maze-L | 162 | 27 | 500
MC_Maze-M | 152 | 27 | 250
MC_Maze-S | 142 | 27 | 100
MC_RTT | 130 | N/A | 1080
MC_PacMan | 128 | 8 | 362
Maze Simulations | 182 | 108 | 2295
Multitask Network | 1200 | 9 | 270
Table 3
Hyperparameters for learning neural trajectories via standard trial-averaging.

σ is the standard deviation of the Gaussian kernel used to temporally filter spikes. Trial-averaging Type I and Type II procedures are described in the “Averaging across trials” section. Dneural and Dcondition are the neural- and condition-dimensionalities described in the “Smoothing across neurons and/or conditions” section. ‘Full’ means that no dimensionality reduction was performed. Condition smoothing could not be performed for MC_Cycle or the multitask network because different conditions in these datasets are of different lengths (i.e. Kc is not the same for all c). (C) and (R) refer to the cycling and reaching trajectories, respectively, in the multitask network. tmove, tstop, and tgo correspond to movement onset, movement offset, and the ‘go’ time in the ready-set-go task, respectively. In MC_PacMan, each force profile is padded with static target forces at the beginning and end. tprof_start and tprof_end mark the beginning and end of the non-padded force profile on each trial.

Dataset | Trajectory Start | Trajectory End | σ | Trial Averaging | Dneural | Dcondition
Area2_Bump | tmove – 350 | tmove + 750 | 25 | Type II | Full | Full
DMFC_RSG | tgo – 1950 | tgo + 750 | 55 | Type I | 49 | 17
MC_Cycle | tmove – 600 | tstop + 600 | 30 | Type I | Full | N/A
MC_Maze | tmove – 500 | tmove + 700 | 30 | Type II | Full | 21
MC_Maze-L | tmove – 500 | tmove + 700 | 35 | Type II | Full | 16
MC_Maze-M | tmove – 500 | tmove + 700 | 60 | Type II | Full | 20
MC_Maze-S | tmove – 500 | tmove + 700 | 85 | Type II | 120 | 10
MC_PacMan | tprof_start – 1000 | tprof_end + 1000 | 35 | Type II | Full | Full
Maze Simulations | tmove – 500 | tmove + 700 | 30 | Type II | Full | 21
Multitask Network | (C) tmove – 1000; (R) tmove – 800 | (C) tstop + 1000; (R) tmove + 1000 | 15 | Type I | Full | N/A
Table 4
Details for decoding analyses.

The number of training and testing trials for each dataset are provided along with the evaluation period over which performance was computed. Generalization analyses used subsets of these training and testing trials. The window lengths refer to the amount of spiking history MINT used for decoding (e.g. when τ=14 and Δ=20 the window length is (τ+1)Δ=300 ms). tmove corresponds to movement onset, tstop corresponds to movement offset, tstart refers to the beginning of a trial, and tend refers to the end of a trial. There is no defined condition structure for MC_RTT to use for defining trial boundaries. Thus, each trial is simply a 600 ms segment of data, with no alignment to movement. Although 270 of these segments were available for testing, the first 2 segments lacked sufficient spiking history for all decoders to be evaluated and were therefore excluded, leaving 268 test trials. In MC_PacMan, each force profile is padded with static target forces at the beginning and end. tprof_start and tprof_end mark the beginning and end of the non-padded force profile on each trial. For the multitask network, performance was evaluated on a continuous stretch of 135 trials spanning ∼7.1 minutes.

Dataset | Training Trials | Test Trials | Evaluation Start | Evaluation End | Window Length (ms)
Area2_Bump | 272 | 92 | tmove – 100 | tmove + 500 | 240
MC_Cycle | 174 | 99 | tmove – 250 | tstop + 250 | 200
MC_Maze | 1721 | 574 | tmove – 250 | tmove + 450 | 300
MC_Maze-L | 375 | 125 | tmove – 250 | tmove + 450 | 300
MC_Maze-M | 188 | 62 | tmove – 250 | tmove + 450 | 300
MC_Maze-S | 75 | 25 | tmove – 250 | tmove + 450 | 340
MC_RTT | 810 | 268 | tstart | tstart + 599 | 480
MC_PacMan | 234 | 128 | tprof_start – 500 | tprof_end + 1000 | 300
Maze Simulations | 1721 | 574 | tmove – 250 | tmove + 450 | 300
Multitask Network | 135 | 135 | tstart (first trial) | tend (last trial) | 300
Table 5
Details for neural state estimation results.

Note that the training trial counts match the total number of trials reported in Table 2. This reflects that the Neural Latents Benchmark utilized an additional set of test trials not reflected in the Table 2 trial counts. The test trials used for this analysis have ground truth behavior hidden by the benchmark creators and are therefore only suitable for this analysis.

Dataset | Training Trials | Test Trials | Held-in Neurons | Held-out Neurons | Window Length (ms)
Area2_Bump | 364 | 98 | 49 | 16 | 500
DMFC_RSG | 1006 | 283 | 40 | 14 | 1500
MC_Maze | 2295 | 574 | 137 | 45 | 500
MC_Maze-L | 500 | 100 | 122 | 40 | 500
MC_Maze-M | 250 | 100 | 114 | 38 | 500
MC_Maze-S | 100 | 100 | 107 | 35 | 500
MC_RTT | 1080 | 271 | 98 | 32 | 500
Appendix 1—table 1
Hyperparameters used for the Wiener filter.

The L2 regularization term λ was optimized in the range [0, 2000], with the optimized values rounded to the closest multiple of 10. Window lengths were optimized (in 20 ms increments) in the range [200, 600] for Area2_Bump, [200, 1000] for MC_Cycle, [200, 1200] for MC_RTT, [200, 500] for MC_PacMan, and [200, 700] for MC_Maze, MC_Maze-L, MC_Maze-M, and MC_Maze-S. These ranges were determined by the structure of each dataset (e.g. Area2_Bump couldn’t look back more than 600 ms from the beginning of the evaluation epoch without entering the previous trial). Window lengths are directly related to κ via Δ (e.g. κ=14 would correspond to a window length of Δ(κ+1)=20(14+1)=300 ms).

Dataset | Behavioral Group | Window Length (ms) | L2 Regularization (λ)
Area2_Bump | Position | 560 | 350
Area2_Bump | Velocity | 320 | 320
Area2_Bump | Force | 600 | 900
Area2_Bump | Joint Angles | 300 | 250
Area2_Bump | Joint Velocities | 400 | 1410
Area2_Bump | Muscle Lengths | 380 | 0
Area2_Bump | Muscle Velocities | 460 | 940
MC_Cycle | Phase | 1000 | 1730
MC_Cycle | Angular Velocity | 960 | 1470
MC_Cycle | Position | 920 | 1710
MC_Cycle | Velocity | 420 | 600
MC_Cycle | EMG | 560 | 1990
MC_Maze | Position | 660 | 530
MC_Maze | Velocity | 540 | 210
MC_Maze-L | Position | 540 | 510
MC_Maze-L | Velocity | 540 | 440
MC_Maze-M | Position | 660 | 440
MC_Maze-M | Velocity | 480 | 310
MC_Maze-S | Position | 680 | 130
MC_Maze-S | Velocity | 420 | 270
MC_RTT | Velocity | 1180 | 1610
MC_PacMan | Force | 480 | 470
Appendix 1—table 2
Hyperparameters used for the Kalman filter.

The lag (in increments of 20 ms time bins) between neural activity and behavior was optimized in the range [2, 8], corresponding to 40–160 ms, for all datasets except Area2_Bump. For Area2_Bump the lag was not optimized and was simply set to 0 due to the fact that, in a sensory area, movement precedes sensory feedback. Given that x_k aggregates spikes across the whole time bin, but y_k corresponds to the behavioral variables at the end of the time bin, the effective lag is actually half a bin (10 ms) longer; i.e. the effective range of lags considered for the non-sensory datasets was 50–170 ms.

Dataset | Behavioral Group | Lag (bins)
Area2_Bump | Position & Velocity | 0
MC_Cycle | Position & Velocity | 2
MC_Maze | Position & Velocity | 4
MC_Maze-L | Position & Velocity | 4
MC_Maze-M | Position & Velocity | 4
MC_Maze-S | Position & Velocity | 4
MC_RTT | Position & Velocity | 2
Appendix 1—table 3
Hyperparameters used for the feedforward neural network.

The number of hidden layers (L) was optimized in the range [1, 15]. The number of units per hidden layer (D) was optimized in the range [50, 1000], with the optimized values rounded to the closest multiple of 10. The dropout rate was optimized in the range [0,0.5] and the number of training epochs was optimized in the range [2, 100]. Window lengths were optimized (in 20 ms increments) in the range [200, 600] for Area2_Bump, [200, 1000] for MC_Cycle, [200, 1200] for MC_RTT, and [200, 700] for MC_Maze, MC_Maze-L, MC_Maze-M, and MC_Maze-S.

Dataset | Behavioral Group | Window Length (ms) | Hidden Layers (L) | Units/Layer (D) | Dropout Rate | Epochs
Area2_Bump | Position | 440 | 4 | 690 | 0.20 | 58
Area2_Bump | Velocity | 520 | 5 | 260 | 0 | 62
Area2_Bump | Force | 240 | 4 | 560 | 0.01 | 54
Area2_Bump | Joint Angles | 320 | 7 | 480 | 0.15 | 59
Area2_Bump | Joint Velocities | 460 | 5 | 660 | 0.12 | 36
Area2_Bump | Muscle Lengths | 240 | 10 | 840 | 0.19 | 81
Area2_Bump | Muscle Velocities | 500 | 6 | 480 | 0.02 | 24
MC_Cycle | Phase | 740 | 8 | 180 | 0.11 | 24
MC_Cycle | Angular Velocity | 700 | 8 | 710 | 0.01 | 43
MC_Cycle | Position | 360 | 9 | 470 | 0 | 45
MC_Cycle | Velocity | 460 | 5 | 330 | 0.25 | 55
MC_Cycle | EMG | 840 | 5 | 970 | 0.07 | 43
MC_Maze | Position | 680 | 7 | 340 | 0.15 | 65
MC_Maze | Velocity | 700 | 14 | 760 | 0 | 89
MC_Maze-L | Position | 380 | 6 | 160 | 0.05 | 44
MC_Maze-L | Velocity | 600 | 8 | 380 | 0.13 | 43
MC_Maze-M | Position | 520 | 8 | 860 | 0.26 | 79
MC_Maze-M | Velocity | 420 | 2 | 360 | 0.09 | 59
MC_Maze-S | Position | 580 | 8 | 340 | 0.02 | 74
MC_Maze-S | Velocity | 520 | 4 | 210 | 0.07 | 94
MC_RTT | Velocity | 1040 | 2 | 700 | 0.15 | 17
Appendix 1—table 4
Hyperparameters used for the GRU network.

The number of units (D) was optimized in the range [50, 1000], with the optimized values rounded to the closest multiple of 10. The dropout rate was optimized in the range [0,0.5] and the number of training epochs was optimized in the range [2, 50]. Window lengths were optimized (in 20 ms increments) in the range [200, 600] for Area2_Bump, [200, 1000] for MC_Cycle, [200, 1200] for MC_RTT, and [200, 700] for MC_Maze, MC_Maze-L, MC_Maze-M, and MC_Maze-S.

Dataset | Behavioral Group | Window Length (ms) | Units (D) | Dropout Rate | Epochs
Area2_Bump | Position | 560 | 700 | 0.06 | 12
Area2_Bump | Velocity | 340 | 820 | 0.32 | 3
Area2_Bump | Force | 380 | 380 | 0.27 | 26
Area2_Bump | Joint Angles | 480 | 630 | 0.31 | 8
Area2_Bump | Joint Velocities | 260 | 390 | 0.27 | 9
Area2_Bump | Muscle Lengths | 460 | 990 | 0.34 | 45
Area2_Bump | Muscle Velocities | 420 | 870 | 0.19 | 47
MC_Cycle | Phase | 460 | 580 | 0.18 | 49
MC_Cycle | Angular Velocity | 480 | 390 | 0.12 | 27
MC_Cycle | Position | 920 | 760 | 0.06 | 42
MC_Cycle | Velocity | 520 | 170 | 0.43 | 45
MC_Cycle | EMG | 840 | 250 | 0.40 | 36
MC_Maze | Position | 640 | 740 | 0.27 | 4
MC_Maze | Velocity | 520 | 800 | 0.34 | 11
MC_Maze-L | Position | 620 | 640 | 0.40 | 49
MC_Maze-L | Velocity | 580 | 420 | 0.36 | 31
MC_Maze-M | Position | 620 | 820 | 0.41 | 48
MC_Maze-M | Velocity | 660 | 840 | 0.29 | 30
MC_Maze-S | Position | 460 | 310 | 0.39 | 27
MC_Maze-S | Velocity | 680 | 500 | 0.44 | 45
MC_RTT | Velocity | 540 | 890 | 0.40 | 8

Additional files

Cite this article

  1. Sean M Perkins
  2. Elom A Amematsro
  3. John Cunningham
  4. Qi Wang
  5. Mark M Churchland
(2025)
An emerging view of neural geometry in motor cortex supports high-performance decoding
eLife 12:RP89421.
https://doi.org/10.7554/eLife.89421.3