Indistinguishable network dynamics can emerge from unalike plasticity rules

  1. University of Tübingen, Germany
  2. Institute of Science and Technology, Austria
  3. VIB-Neuroelectronics Research Flanders (NERF), Belgium
  4. imec, Belgium
  5. Max Planck Institute for Intelligent Systems, Tübingen, Germany

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Editors

  • Reviewing Editor
    Julijana Gjorgjieva
    Technical University of Munich, Freising, Germany
  • Senior Editor
    Panayiota Poirazi
    FORTH Institute of Molecular Biology and Biotechnology, Heraklion, Greece

Reviewer #1 (Public Review):

Summary:

The pioneering work of Eve Marder on central pattern generators in the stomatogastric ganglion (STG) has made a strong case for redundancy as a biological mechanism for ensuring functional robustness, where multiple configurations of biophysical parameters are equivalent in their ability to generate desired patterns of periodic circuit activity. In parallel, normative theories of synaptic plasticity have argued for functional equivalences between learning objectives and corresponding plasticity rules in implementing simple unsupervised learning (see Brito & Gerstner 2016, although similar arguments were made long before, e.g., in Aapo Hyvarinen's ICA book). This manuscript argues that similar notions of redundancy need to be taken into account in the study of synaptic plasticity rules in the brain, more specifically in the context of data-driven approaches that aim to extract the "true" synaptic plasticity rule operating in a neural circuit from neural activity recordings. Concretely, the modeling approach takes a set of empirical measurements of the evolution of neural activity and trains a flexibly parametrized model to match it in statistical terms. Instead of being predefined by the experimenter, the features that determine this match are themselves extracted from data using a generative adversarial network (GAN) framework. The authors show that the flexible models manage to reproduce the neural activity to a reasonable degree (though not perfectly), yet lead to very different synaptic trajectories.
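For concreteness, a minimal sketch of the kind of adversarial fitting loop described above, written in PyTorch with Oja's rule standing in for the "recorded" circuit. All architecture sizes and hyperparameters here are illustrative assumptions, not the authors' actual implementation:

```python
import torch
import torch.nn as nn

# Hypothetical sizes and hyperparameters, chosen for illustration only.
N, T, batch, eta = 10, 50, 32, 0.05

rule = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 1))  # candidate dw = g(x, y, w)
disc = nn.Sequential(nn.Linear(T, 32), nn.ReLU(), nn.Linear(32, 1))  # critic on activity trajectories

def simulate(plasticity):
    """Roll out a linear neuron for T steps; return its postsynaptic trace."""
    w = 0.1 * torch.randn(batch, N)
    ys = []
    for _ in range(T):
        x = torch.randn(batch, N)                 # presynaptic input sample
        y = (w * x).sum(-1, keepdim=True)         # linear neuron: y = w . x
        w = w + eta * plasticity(x, y, w)         # apply the plasticity rule
        ys.append(y)
    return torch.cat(ys, dim=-1)                  # shape (batch, T)

oja = lambda x, y, w: y * x - y**2 * w            # stand-in "ground truth" rule
mlp = lambda x, y, w: rule(torch.stack([x, y.expand_as(x), w], -1)).squeeze(-1)

opt_g = torch.optim.Adam(rule.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real, fake = simulate(oja), simulate(mlp)
    # Discriminator: label "recorded" trajectories 1, generated ones 0.
    d_loss = bce(disc(real), torch.ones(batch, 1)) + \
             bce(disc(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator: update the plasticity MLP to fool the discriminator.
    g_loss = bce(disc(simulate(mlp)), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In such a setup the discriminator itself defines the features by which generated and recorded activity are compared, which is the sense in which the matching features are "extracted from data" rather than predefined by the experimenter.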

Strengths:

The idea of learning-rule redundancy is a good one, and the use of GANs for learning-rule estimation is a good complement to other data-driven approaches that extract synaptic plasticity rules from neural data.

Weaknesses:

(1) The numerics provide only partial support for the statements describing the results.

(2) Even if one believes the results, I don't necessarily agree with the interpretation. First, unlike the Marder example, where there is complementary evidence to argue that the parameter variations actually reflect across-animal biophysical variation, the statements here are really about the uncertainty that the experimenter has about what is going on in a circuit observed through a certain measurement lens. Second, while taking this uncertainty into account when using the outcomes of this analysis for subsequent scientific goals is certainly sensible, the biggest punchline for me is that simply observing neural activity in a simple and very restricted context does not provide enough information about the underlying learning mechanism, especially when the hypothesis space is very large (as is the case for the MLP). So it seems more useful to use this framework to think about how to enrich the experimental design, the learning paradigms, or the measurements themselves so as to make the set of hypotheses more discriminable (in the spirit of the work by Jacob Portes et al., 2022, for instance). Conversely, one should perhaps think about other ways in which other forms of experimental data could reasonably constrain the hypothesis space in the first place.

Reviewer #2 (Public Review):

Summary:

This paper poses the interesting and important question of whether plasticity rules are mathematically degenerate, which would mean that multiple plasticity rules can give rise to the same changes in neural activity. The authors claim that the answer is "yes," which would have major implications for many researchers studying the biological mechanisms of learning and memory. Unfortunately, I found the evidence for the claim to be weak and confusing, and I don't think that readers can currently infer much beyond the results of the specific numerical experiments reported in the paper.

Strengths:

I love the premise of the paper. I agree with the authors that neuroscientists often under-emphasize the range of possible models that are consistent with empirical findings and/or theoretical demands. I like their proposal that the field is shifting its thinking towards characterizing the space of plasticity rules. I do not doubt the accuracy of most reported numerical results, just their meaning and interpretation. I therefore think that readers can safely use most of the numerical results to revise their thinking about plasticity mechanisms and draw their own conclusions.

Weaknesses:

Unfortunately, I found many aspects of the paper to be problematic. As a result, I did not find the overarching conclusions drawn by the authors to be convincing.

First, the authors aren't consistent in how they mathematically define and conceptually interpret the "degeneracy" of plasticity mechanisms. In practice, they say that two plasticity mechanisms are "degenerate" if they can't build a neural network to distinguish between a set of neural trajectories generated by them. Their interpretation extrapolates far beyond this, and they seem to conclude that such plasticity rules are in principle indistinguishable. I think that this conclusion is wrong. Plasticity rules are simply mathematical functions that specify how the magnitude of a synaptic weight changes due to other factors, here presynaptic activity (x), postsynaptic activity (y), and the current value of the weight (w). Centuries-old mathematics proves that very broad classes of functions can be parameterized in a variety of non-degenerate ways (e.g., by their Taylor series or Fourier series). It seems unlikely to me that biology has developed plasticity rules that fall outside this broad class.

Moreover, the paper's numerical results are all for Oja's plasticity rule, which is a third-order polynomial function of x, y, and w. That polynomial functions cannot be represented by any other Taylor series is a textbook result from calculus. One might wonder if this unique parameterization is somehow lost when many synapses combine to produce neural activity, but the neuron model used in this work is linear, so the function that specifies how the postsynaptic activity changes is simply a fourth-order polynomial in 3N+1 variables (i.e., the presynaptic activities of N neurons prior to the plasticity event, the weights of N synapses prior to the plasticity event, the postsynaptic activity prior to the plasticity event, and the presynaptic activities of N neurons after the plasticity event). The same fundamental results from calculus apply to the weight trajectories and the activity trajectories, and a non-degenerate plasticity rule could in principle be inferred from either.

What the authors instead show is that their simulated datasets, chosen parameterizations for the plasticity rule, and fitting procedures fail to reveal a non-degenerate representation of the plasticity rule. To what extent this failure is due to the nature of the simulated datasets (e.g., their limited size), the chosen parameterization (e.g., an overparameterized multi-layer perceptron), or their fitting procedure (e.g., their generative adversarial network framework) is unclear. I suspect that all three aspects contribute.
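To make the uniqueness argument concrete, here is a small numerical sketch (with an arbitrary learning rate eta = 0.1) showing that Oja's rule, as a third-order polynomial in (x, y, w), has monomial coefficients that least squares recovers exactly from noiseless samples:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
eta = 0.1  # arbitrary learning rate, for illustration only

def oja(x, y, w):
    # Oja's rule: dw = eta * (x*y - y**2 * w), a degree-3 polynomial
    return eta * (x * y - y**2 * w)

# Sample random (x, y, w) triplets and evaluate the rule.
X = rng.normal(size=(1000, 3))
dw = oja(X[:, 0], X[:, 1], X[:, 2])

# Build all monomials x^a * y^b * w^c up to total degree 3.
exps = [(a, b, c) for a, b, c in product(range(4), repeat=3) if a + b + c <= 3]
Phi = np.stack([X[:, 0]**a * X[:, 1]**b * X[:, 2]**c for a, b, c in exps], axis=1)

# Least squares recovers the unique coefficients: +eta on x*y, -eta on y^2*w.
coef, *_ = np.linalg.lstsq(Phi, dw, rcond=None)
for (a, b, c), k in zip(exps, coef):
    if abs(k) > 1e-8:
        print(f"x^{a} y^{b} w^{c}: {k:+.3f}")
```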

Second, I am concerned by the authors' decision to use a generative adversarial network (GAN) to fit the plasticity rule. Practically speaking, the quality of the fits shown in the figures seems unimpressive to me, and I am left wondering whether the authors could have gotten better fits with other fitting routines. For example, other authors fit plasticity rules through gradient descent learning and claimed to accurately recover Oja's rule and other plasticity rules (Mehta et al., "Model-based inference of synaptic plasticity rules," bioRxiv, 2023). Whether this difference is one of author interpretation or method accuracy is not currently clear. The authors do include some panels in Figure 3A and Figure 8 that explore more standard gradient descent learning, but their networks don't seem to be well-trained.

Theoretically speaking, Eqn. (7) in Section 4.4 indicates that the authors only try to match p(\vec y) between the data and the generator network, rather than p(\vec x, \vec y). If this equation is an accurate representation of the authors' method, then the claimed "degeneracy" of the learning rule may simply mean that many different joint distributions for \vec x and \vec y can produce the same marginal distribution for \vec y. This is true, but then the "degeneracy" reported in the paper is due to hidden presynaptic variables. I don't think that most readers would expect that learning rules could be inferred by measuring postsynaptic activity alone.
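The last point can be illustrated in a few lines: two very different joint distributions over (x, y) can share an identical marginal over y, so an objective that matches only p(\vec y) cannot tell them apart. A minimal sketch, with arbitrary illustrative distributions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Joint A: the output y is fully determined by the input x.
x_a = rng.normal(size=n)
y_a = x_a.copy()

# Joint B: y is independent of x.
x_b = rng.normal(size=n)
y_b = rng.normal(size=n)

# Same marginal p(y): standard normal in both cases ...
print(np.mean(y_a), np.std(y_a))    # ~0, ~1
print(np.mean(y_b), np.std(y_b))    # ~0, ~1

# ... but completely different joints p(x, y).
print(np.corrcoef(x_a, y_a)[0, 1])  # ~1.0
print(np.corrcoef(x_b, y_b)[0, 1])  # ~0.0
```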

Third, it's important for readers to note that the 2-dimensional dynamical-systems representations shown in figures like Figure 2E are incomplete. Learning rules are N-dimensional nonlinear dynamical systems. The learning rule of any individual synapse depends only on the current presynaptic activity, the current postsynaptic activity, and the current weight magnitude, and slices through this function are shown in figures like Figure 2D. However, the postsynaptic activity is itself a dynamical variable that depends on all N synaptic weights. It's therefore unclear how one is supposed to interpret figures like Figure 2E, because the change in y is not a function of y and any single w. My best guess is that figures like Figure 2E are generated for the case of a single presynaptic neuron, but the degeneracies observed in this reduced system need not match those found when fitting the larger network.
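For reference, the single-presynaptic-neuron reduction suggested above can be written out explicitly. With N = 1 and y = w·x, Oja's rule collapses to a one-dimensional system in w; a minimal sketch (learning rate and grid chosen arbitrarily):

```python
import numpy as np

eta = 0.1
rng = np.random.default_rng(2)
x = rng.normal(size=100_000)          # presynaptic samples, <x^2> ~= 1

# With one input, y = w*x, so Oja's rule collapses to a 1-D system:
# dw = eta*(x*y - y**2*w) = eta * x^2 * w * (1 - w^2)
for w in np.linspace(-1.5, 1.5, 7):
    dw = eta * np.mean(x**2) * w * (1 - w**2)
    print(f"w = {w:+.2f}  ->  mean dw = {dw:+.4f}")

# Fixed points at w = 0 and w = +-1: the weight converges to unit norm,
# the (single-input) principal-component solution. Degeneracies in this
# reduced system need not carry over to the N-dimensional network.
```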

Reviewer #3 (Public Review):

Summary:

The authors show that, by learning a plasticity rule, a GAN can reproduce the distribution of outputs of a neuron endowed with Oja's plasticity rule throughout its learning process. The GAN does not, however, learn Oja's rule. Indeed, the plasticity dynamics it infers can differ dramatically from the true dynamics. The authors propose this approach as a way to uncover families of putative plasticity rules consistent with observed activity patterns in biological systems.

Oja's rule was a great choice for the comparison because it makes explicit, I think, the limitations of this approach. As is well known, Oja's rule allows a (linear) neuron to learn the first principal component of its inputs; the synaptic weights converge to the first eigenvector of the input covariance. After this learning process, the response of a neuron to a particular input sample measures the weighted angle between that input and that principal component.
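This textbook behavior is easy to reproduce; a minimal simulation (dimensions, learning rate, and step count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N, eta, steps = 5, 0.01, 20_000

# Inputs with an anisotropic covariance.
A = rng.normal(size=(N, N))
cov = (A @ A.T) / N
L = np.linalg.cholesky(cov)

w = rng.normal(size=N)
w /= np.linalg.norm(w)
for _ in range(steps):
    x = L @ rng.normal(size=N)       # sample an input with covariance `cov`
    y = w @ x                        # linear neuron
    w += eta * y * (x - y * w)       # Oja's rule

# Compare the learned weights to the leading eigenvector of the covariance.
eigvals, eigvecs = np.linalg.eigh(cov)
pc1 = eigvecs[:, -1]
print(abs(w @ pc1) / np.linalg.norm(w))   # ~1.0 once converged
```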

The other, meta-learned plasticity rules that the authors' GAN uncovers notably do not learn the same computation as Oja's rule (Figure 2). This is, to me, the central finding of the paper, and it is fleshed out nicely. It seems to me that this may be because the objective of the GAN is only to reproduce the marginal output statistics of the neuron. It is, if I understand correctly, blind to the input samples, to the inputs' marginal statistics, and to correlations between input and output. I wonder if a GAN that also had some knowledge of the correlation between inputs and outputs might be more successful at learning the underlying true dynamics.

The focus on reproducing output statistics has some similarity to certain types of experiments (e.g., in vivo recordings) but also seems willfully blind to other aspects of those experiments. In my experience, experimentalists are well aware that the circuits they record receive external inputs, and those inputs are often recorded (perhaps in separate experiments or studies). My point is that I'm not sure this is an entirely fair comparison to the field.

Finally, the plasticity models studied by theoreticians are not constructed only by intuition and hand-tuning; they also draw, often heavily, on biological data and principles. Oja's rule, for example, is simply the combination of Hebbian learning with a homeostatic constraint on the total synaptic weight amplitude (under the choice of a Euclidean norm).
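For completeness, that combination can be made explicit. A standard sketch of the derivation, assuming a unit-norm weight vector (which Oja's rule itself maintains at convergence):

```latex
% Hebbian update followed by Euclidean-norm renormalization, expanded to
% first order in the learning rate \eta. The expansion assumes the weight
% vector is already normalized, \lVert\mathbf{w}\rVert = 1, so that
% \mathbf{w}\cdot\mathbf{x} = y.
\mathbf{w}' \;=\;
\frac{\mathbf{w} + \eta\, y\, \mathbf{x}}
     {\lVert \mathbf{w} + \eta\, y\, \mathbf{x} \rVert}
\;\approx\;
\mathbf{w} + \eta\, y\,(\mathbf{x} - y\,\mathbf{w}) + \mathcal{O}(\eta^{2})
```

The first-order term is exactly Oja's rule, dw = η y (x − y w): Hebbian growth plus a homeostatic decay proportional to y².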

To me, this study very nicely exposes the caveats and risks associated with a blind machine-learning approach to model specification in biology and highlights the need for understanding underlying biological mechanisms and principles. I agree with the authors that heterogeneity and degeneracy in plasticity rules deserve much more attention in the field.
