A theory of brain-computer interface learning via low-dimensional control

  1. Gatsby Computational Neuroscience Unit, University College London, London, United Kingdom
  2. Harvard University, Cambridge, United States
  3. University of Washington, Seattle, United States
  4. University of Pittsburgh, Pittsburgh, United States
  5. Carnegie Mellon University, Pittsburgh, United States

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.

Editors

  • Reviewing Editor
    Srdjan Ostojic
    École Normale Supérieure - PSL, Paris, France
  • Senior Editor
    Panayiota Poirazi
    FORTH Institute of Molecular Biology and Biotechnology, Heraklion, Greece

Reviewer #1 (Public review):

Summary:

This study considers learning with brain-computer interfaces (BCIs) in nonhuman primates, and in particular, the high speed and flexibility with which subjects learn to control these BCIs.

The authors put forward the hypothesis that such learning is based on controlling a small number of input or control variables, rather than on directly adapting neural connectivity within the network of neurons that drive the BCI. Adapting a small number of input variables would circumvent the issue of credit assignment in high dimensions and allow for quick learning, potentially using cognitive strategies ("re-aiming"). Based on a computational model, the authors show that such a strategy is viable in a number of experimental settings and reproduces previous experimental observations:

(1) Differences in learning with decoders either within or outside of the neural manifold (the space spanned by the dominant modes of neural activity).

(2) A novel, theory-based prediction of biases in BCI learning due to the positivity of neural firing rates, which is then confirmed in data from previous experiments.

(3) An example of "illusory credit assignment": changes in neurons' tuning curves depending on whether these neurons are affected by changes in the BCI decoder, even though learning only happens at the level of low-dimensional control variables.

(4) A reproduction of results from operant conditioning of individual neurons, in particular the observation that it is difficult to drive the firing rates of neurons that were strongly correlated before learning in opposite directions (up vs. down).

Taken together, these observations provide strong evidence that subjects plausibly use such a learning strategy, at least during short-term learning.

Strengths:

Text and figures are clearly structured and allow readers to understand the main concepts well. The study presents a very clear and simple model that explains a number of seemingly disparate or even contradictory observations (neuron-specific credit assignment vs. low-dimensional, cognitive control). The predicted and tested bias due to the positivity of firing rates provides a neat example of how such a theory can help understand experimental results. The idea that subjects first use a small number of command variables (those sufficient in the calibration task) and later, during learning, add more variables nicely illustrates the notion that learning takes place on multiple time scales, potentially with different mechanisms at play. On a more detailed level, the study is a nice example of closely matching the theory to the experiment, in particular regarding the modeling of BCI perturbations.

Weaknesses:

Overall, I find only two minor weaknesses. First, the insights of this study are, first and foremost, of a feed-forward nature, and a feed-forward network would have been enough (and the more parsimonious model) to illustrate the results. While using a recurrent neural network (RNN) shows that the results are, in general, compatible with recurrent dynamics, the specific limitations imposed by RNNs (e.g., dynamical stability, low-dimensional internal dynamics) are not the focus of this study. Indeed, the additional RNN models in the supplementary material show that under more constrained conditions for the RNN (low-dimensional dynamics), using input control alone runs into difficulties.

Second, explaining the quantitative differences between the model and data for shifts in tuning curves seems to take the model a bit too literally. The model serves well for qualitative observations. I assume, however, that many of the unconstrained aspects of the model would yield quantitatively different results.

Reviewer #2 (Public review):

Summary:

The paper proposes a model to explain the learning that occurs in brain-computer interface (BCI) tasks when animals need to adapt to novel BCI decoders. The model consists of a network formulation of the "re-aiming" learning strategy, which assumes that BCI learning does not modify the underlying neural circuitry, but instead occurs through a reorganization of existing neural activity patterns.

The authors formalize this in a recurrent neural network (RNN) model, driven by upstream inputs that live in a low-dimensional space.

They show that modelling BCI learning as a reorganization of these upstream inputs can explain several experimental findings, such as the difference in the ability of animals to adapt to within- versus outside-manifold perturbations, biases in the decoded behaviour after within-manifold perturbations, and qualitative changes in the neural responses observed during credit assignment rotation perturbations or operant conditioning of individual neurons.

Overall, while the idea of re-aiming as a learning strategy has previously been proposed in the literature, the authors show how it can be formalized in a network model, which allows for more direct comparisons to experimental data.

Strengths:

The paper is very well written. The presentation of the model is clear, and the use of vanilla RNN dynamics driven by upstream inputs that are constant in time is consistent with the broader RNN modeling literature.

The main value of the paper lies in the fact that it proposes a network implementation for a learning strategy that had been proposed previously. The network model has a simple form, but the optimization is performed in the space of inputs, which requires the authors to solve a nonlinear optimization problem in that space.
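To make this concrete, the following is a minimal sketch of what such an input-space optimization could look like: a vanilla rate RNN with fixed recurrent and input weights, driven by K time-constant command variables and read out by a fixed linear decoder, with re-aiming implemented as a search over the command variables alone. All dimensions, the nonlinearity, and the choice of optimizer here are illustrative assumptions, not the authors' exact implementation.

```python
# Re-aiming sketch: optimize K static command variables while all synaptic
# weights stay fixed (illustrative parameters, not the authors' exact model).
import numpy as np
from scipy.optimize import minimize

N, K, T, dt = 200, 2, 50, 0.1                     # neurons, commands, time steps
rng = np.random.default_rng(0)
W_rec = rng.normal(0, 1.2 / np.sqrt(N), (N, N))   # recurrent weights (fixed)
W_in = rng.normal(0, 1.0, (N, K))                 # input weights (fixed)
D = rng.normal(0, 1.0 / N, (2, N))                # linear BCI decoder (fixed)
phi = lambda x: np.maximum(np.tanh(x), 0.0)       # nonnegative firing rates

def final_rates(v):
    """Simulate the RNN driven by a static command vector v; return final rates."""
    r = np.zeros(N)
    for _ in range(T):
        r = r + dt * (-r + phi(W_rec @ r + W_in @ v))
    return r

def loss(v, target_vel):
    """Squared error between decoded and target cursor velocity."""
    return np.sum((D @ final_rates(v) - target_vel) ** 2)

# Re-aiming = nonlinear optimization in the K-dimensional command space only.
target = np.array([1.0, 0.0])                     # desired cursor velocity
res = minimize(loss, x0=np.zeros(K), args=(target,), method="Nelder-Mead")
print("re-aimed command variables:", res.x)
```

Adapting to a decoder perturbation then amounts to re-running the same search with a perturbed D, which is what keeps the learning problem K-dimensional rather than scaling with the number of synapses.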

While some of the results (e.g., the fact that the model can adapt to within- but not outside-manifold perturbations) are to be expected given the model assumptions, having a network model makes it possible to draw more direct and quantitative comparisons to experiments, to investigate analytically how much the dimension of the output is constrained by the input, and to make predictions that can be tested in data.

The authors perform such comparisons across three different experiments. The results are clearly presented, and the authors show that they hold for various RNN connectivities.

Weaknesses:

The authors mention alternative models (e.g., based on synaptic plasticity in the RNN and/or input weights) that can explain the same experimental data, but they do not provide any direct comparisons to those models.

Thus, the main argument the authors offer in favor of their model is plausibility: the optimization is performed in a low-dimensional space. It would be nice to see more quantitative arguments for why the re-aiming strategy may be more plausible than synaptic plasticity (either by showing that it explains the data better, or by explaining why it may be more optimal in the context of fast learning).

In particular, the authors model the adaptation to outside-manifold perturbations (OMPs) through a "generalized re-aiming strategy". This assumes the existence of additional command variables, which are not used in the original decoding task but can then be exploited to adapt to these OMPs. While this model is meant to capture the fact that optimization occurs in a low-dimensional subspace, the fact that animals take longer to adapt to OMPs suggests that within-manifold perturbations (WMPs) and OMPs may rely on different learning mechanisms, and that synaptic plasticity may actually be a better model of adaptation to OMPs. It would be important to discuss how exactly generalized re-aiming would differ from allowing plasticity in the input weights, or in all weights in the network. Do those models make different predictions, and could they be differentiated in future experiments?

Author response:

Reviewer #1 (Public Review):

Overall, I find only two minor weaknesses. First, the insights of this study are, first and foremost, of a feed-forward nature, and a feed-forward network would have been enough (and the more parsimonious model) to illustrate the results. While using a recurrent neural network (RNN) shows that the results are, in general, compatible with recurrent dynamics, the specific limitations imposed by RNNs (e.g., dynamical stability, low-dimensional internal dynamics) are not the focus of this study. Indeed, the additional RNN models in the supplementary material show that under more constrained conditions for the RNN (low-dimensional dynamics), using input control alone runs into difficulties.

We thank the reviewer for raising this important point. While we agree that recurrent dynamics were not the focus of this study, we would like to point out that 1) dynamics of some kind are necessary to simulate the decoder fitting process, and 2) recurrent neural networks (RNNs) are valuable for obtaining general insights into how biological constraints shape the reachable manifold:

(1) To simulate the decoder fitting process, we had to simulate neural activity during the so-called "calibration task". Some dynamics in these responses are necessary to produce a population response with dimensionality resembling what was found in experiments (10 dimensions). Moreover, dynamics are necessary to create a common direction of high variance across population responses to the calibration task stimuli (see Supplementary Figure 2a and surrounding discussion), which is necessary to reproduce the biases in readouts demonstrated in Figure 4 (as many within-manifold decoder perturbations are aligned with it; Supplementary Figure 2b).

Because feed-forward networks lack dynamics, reproducing our results with a feed-forward network would require using an input with dynamics. Rather than making an arbitrary choice for these input dynamics, we chose to keep the input static and instead generate the dynamics with an RNN, in line with recent models of motor cortex.
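As a concrete illustration of this point (ours; the original experiments used factor-analysis-based Kalman-filter decoders, and the paper's exact procedure may differ), the calibration phase can be simulated by driving an RNN with static inputs for a set of instructed cursor directions and fitting a linear decoder to the evoked dynamical responses by least squares:

```python
# Hedged sketch of decoder calibration: the RNN's intrinsic dynamics, not the
# static inputs, supply the temporal structure that the decoder is fit to.
import numpy as np

N, K, T, dt = 200, 2, 50, 0.1
rng = np.random.default_rng(1)
W_rec = rng.normal(0, 1.2 / np.sqrt(N), (N, N))
W_in = rng.normal(0, 1.0, (N, K))
phi = lambda x: np.maximum(np.tanh(x), 0.0)       # nonnegative firing rates

def calibration_responses(v):
    """Time course of rates in response to a static calibration input v."""
    r, rates = np.zeros(N), []
    for _ in range(T):
        r = r + dt * (-r + phi(W_rec @ r + W_in @ v))
        rates.append(r.copy())
    return np.array(rates)                                       # (T, N)

# Calibration task: inputs for 8 instructed cursor directions.
angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
targets = np.stack([np.cos(angles), np.sin(angles)], axis=1)     # (8, 2)
R = np.concatenate([calibration_responses(v) for v in targets])  # (8*T, N)
V = np.repeat(targets, T, axis=0)                                # (8*T, 2)

# Fit decoder D mapping rates to cursor velocity: V ~ R @ D.T
D = np.linalg.lstsq(R, V, rcond=None)[0].T                       # (2, N)
```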

We agree, however, that this is an important point worth clarifying in the manuscript. In our revision we will aim to add a demonstration of how to reproduce a subset of our results with a feed-forward network and a dynamic input.

(2) While we agree that RNNs impose certain limitations relative to feed-forward networks, we see these limitations as an advantage because they provide a framework for understanding the structure of the reachable manifold in terms of biological constraints. For example, our simulations in Supplementary Figure 1 show that the dimensionality of the reachable manifold is highly dependent on recurrent connectivity: inhibition-stabilized connectivity makes it higher-dimensional, whereas connectivity optimized for a specific task makes it lower-dimensional. Such insights are valuable for understanding the broader implications and experimental predictions of the re-aiming strategy.

Because feed-forward networks are divorced from the reality of recurrent cortical circuitry, they cannot be characterized in terms of such biological constraints. For instance, as the reviewer points out, dynamical stability is not a well-defined property of feed-forward networks. Such models therefore cannot provide any insight into how the biological constraint of dynamical stability could influence the reachable manifold (which we show it does in Figure 5b). Relatedly, feed-forward networks cannot be optimized to solve complex spatiotemporal tasks like the ballistic reaching task we used for our task-optimized RNN (Supplementary Figure 1, right column), and so cannot be used to understand how such behavioral constraints would influence the reachable manifold.

We agree that these reasons for using RNNs are subtle and left implicit in the current text. We will add a discussion point clarifying them in our revision.

Second, explaining the quantitative differences between the model and data for shifts in tuning curves seems to take the model a bit too literally. The model serves well for qualitative observations. I assume, however, that many of the unconstrained aspects of the model would yield quantitatively different results.

We completely agree: our model is best used to provide a qualitative description of the capabilities of the re-aiming strategy. We will be sure to revise our manuscript to keep such quantitative comparisons to a minimum.

Reviewer #2 (Public Review):

The authors mention alternative models (e.g., based on synaptic plasticity in the RNN and/or input weights) that can explain the same experimental data, but they do not provide any direct comparisons to those models. Thus, the main argument the authors offer in favor of their model is plausibility: the optimization is performed in a low-dimensional space. It would be nice to see more quantitative arguments for why the re-aiming strategy may be more plausible than synaptic plasticity (either by showing that it explains the data better, or by explaining why it may be more optimal in the context of fast learning).

We agree this remains a limitation of our study. To contrast our re-aiming model with models of synaptic plasticity (in the input and/or recurrent weights), we have included substantial discussion of these alternative models in two sections of the manuscript:

  • Introduction, where we elaborate on the argument that synaptic plasticity requires solving an exceptionally difficult optimization problem in high dimensions

  • Discussion section “The role of synaptic plasticity in BCI learning”, where we review a number of synaptic plasticity models and experimental results they can account for

We fully agree that more quantitative comparisons remain an important follow-up to this line of research. However, it is worth noting that there are many such models in the literature. Moreover, as is the case with many computational models, the results one can achieve with any given model can be highly sensitive to a number of hyperparameters (e.g., learning rates). We therefore feel that a more rigorous comparison requires deeper study and is beyond the scope of this manuscript.

In particular, the authors model the adaptation to outside-manifold perturbations (OMPs) through a "generalized re-aiming strategy". This assumes the existence of additional command variables, which are not used in the original decoding task but can then be exploited to adapt to these OMPs. While this model is meant to capture the fact that optimization occurs in a low-dimensional subspace, the fact that animals take longer to adapt to OMPs suggests that within-manifold perturbations (WMPs) and OMPs may rely on different learning mechanisms, and that synaptic plasticity may actually be a better model of adaptation to OMPs.

We thank the reviewer for raising this question. We agree that the fact that animals take longer to adapt to OMPs suggests that the underlying learning strategy is somehow different. The argument we make in this section of the paper, however, is that learning OMPs does not in fact require an entirely different mechanism. Our simulations show that the same mechanism of re-aiming can suffice to learn OMPs; it simply requires re-aiming in the larger space of all command variables available to the motor system (rather than just the two command variables evoked by the calibration task). Because this is a much higher-dimensional search space (10-20 vs. 2 dimensions, a substantial difference due to the curse of dimensionality), we argue that learning should be slower, even though the mechanism (i.e., re-aiming) is the same.
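A toy calculation (ours, purely illustrative of the scaling, not a model of the animals' actual search process) makes the point: covering a search space at a fixed per-axis resolution requires a number of probes that grows exponentially with the number of command variables K.

```python
# Curse-of-dimensionality illustration: grid points needed for a 10-point
# grid per axis, as a function of the number of command variables K.
for K in (2, 10, 20):
    print(f"K = {K:2d}: {10**K:.1e} grid points")
```

Even under far more efficient search strategies than a grid, the gap between a 2-dimensional and a 10-20-dimensional search space is consistent with substantially slower learning.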

This is an important and somewhat surprising takeaway from these simulations, which we will try to bring up more explicitly and clearly in the revision.

It would be important to discuss how exactly generalized re-aiming would differ from allowing plasticity in the input weights, or in all weights in the network. Do those models make different predictions, and could they be differentiated in future experiments?

They do in fact make different predictions, and we thank the reviewer for asking and for pointing out the lack of discussion of this point. The key difference between these two learning mechanisms is demonstrated in Figure 5b: under generalized re-aiming, there is a fundamental limit to the set of activity patterns one can learn to produce in the brain-computer interface (BCI) learning task. This is quantified in that analysis by the asymptotic participation ratio of the reachable manifold as K increases, which indicates that there is a limited ~12-dimensional subspace that the reachable manifold can occupy. The specific orientation of this subspace is determined by the (recurrent and input) connectivity of the recurrent neural network. With synaptic plasticity in any of the weight matrices (Wrec, Win, U), this subspace could be re-oriented in any arbitrary direction. Our theory of "generalized re-aiming" therefore predicts that the reachable manifold is 1) constrained to a low-dimensional subspace and 2) not modified when learning BCIs with outside-manifold perturbations.
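For reference, the participation ratio used in that analysis is a standard dimensionality measure: the squared sum of the covariance eigenvalues divided by the sum of their squares. A minimal implementation over sampled activity patterns might look as follows (the authors' exact estimator and preprocessing may differ):

```python
# Participation ratio: (sum of eigenvalues)^2 / (sum of squared eigenvalues)
# of the activity covariance; equals d for activity spread evenly over d dims.
import numpy as np

def participation_ratio(X):
    """X: (samples, neurons) activity patterns sampled from the reachable manifold."""
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))  # covariance eigenvalues
    return lam.sum() ** 2 / np.sum(lam ** 2)
```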

Experimentally testing this would require a within-/outside-manifold perturbation BCI learning task akin to that of Sadtler et al., but where the "intrinsic manifold" is measured from population responses evoked by every possible motor command, so as to entirely contain the full reachable manifold at maximal K. This would require measuring motor cortical activity during naturalistic behavior under a wide range of conditions, rather than just in response to the 2D cursor movements on the screen used in the calibration task of the original study. In this case, learning outside-manifold perturbations would require re-orienting the reachable manifold, so a pure generalized re-aiming strategy would fail to learn them. Synaptic plasticity, on the other hand, would not.

We will be sure to elaborate further on this claim in the revised manuscript.
