Laboratory of Computational Neuroscience, Brain Mind Institute, School of Computer and Communication Sciences and School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Laboratory of Sensory Processing, Brain Mind Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Machine Learning Research Unit, Technical University of Vienna (TU Wien), Vienna, Austria
This important study demonstrates the significance of incorporating biological constraints in training neural networks to develop models that make accurate predictions under novel conditions. By comparing standard sigmoid recurrent neural networks (RNNs) with biologically constrained RNNs, the manuscript offers compelling evidence that biologically grounded inductive biases enhance generalization to perturbed conditions. This manuscript will appeal to a wide audience in systems and computational neuroscience.
Abstract
A recurrent neural network fitted to large electrophysiological datasets may help us understand the chain of cortical information transmission. In particular, successful network reconstruction methods should enable a model to predict the response to optogenetic perturbations. We test recurrent neural networks (RNNs) fitted to electrophysiological datasets on unseen optogenetic interventions and find that the generic RNNs used predominantly in the field generalize poorly to these perturbations. Our alternative RNN model adds biologically informed inductive biases, such as the structured connectivity of excitatory and inhibitory neurons and spiking neuron dynamics. We find that some of these biological inductive biases improve the model's predictions under perturbation, both in a simulated dataset and in a dataset recorded in mice in vivo. Furthermore, we show in simulations that gradients of the fitted RNN can predict the effect of micro-perturbations in the recorded circuits, and we discuss the potential for measuring brain gradients or for using gradient-targeted stimulation to bias an animal's behavior.
1 Introduction
A fundamental question in neuroscience is how cortical circuit mechanisms drive perception and behavior. To tackle this question, the field has been collecting large-scale electrophysiology datasets under reproducible experimental settings (Siegle et al., 2021; Esmaeili et al., 2021; Urai et al., 2022; International Brain Laboratory et al., 2023). However, the field lacks data-grounded modeling approaches to generate and test hypotheses on the causal role of neuronal and circuit-level mechanisms. To leverage the high information density of contemporary recordings, we need both (1) modeling approaches that scale well with data, and (2) metrics to quantify when the models provide a plausible mechanism for the observed phenomena.
An important question is whether these data-constrained RNNs can reveal a truthful mechanism of neuronal activity and behavior. By construction, the RNNs can generate brain-like network activity, but how can we measure whether the reconstructed network faithfully represents the biophysical mechanism? To answer this question, we submit a range of RNN reconstruction methods to a difficult perturbation test: we measure the similarity between the responses of the RNN and of the recorded biological circuit to unseen perturbations.
Optogenetics is a powerful tool to induce precise causal perturbations in vivo (Esmaeili et al., 2021; Guo et al., 2014). It involves the expression of light-sensitive ion channels (Aravanis et al., 2007), such as channelrhodopsins, in specific populations of neurons (e.g., excitatory/pyramidal or inhibitory/parvalbumin-expressing). In this paper, we use datasets combining dense electrophysiological recordings with optogenetic perturbations to evaluate RNN reconstruction methods. Since the neurons in our RNNs are mapped one-to-one to the recorded cells, we can model optogenetic perturbations targeting the same cell-types and areas as done in vivo. Yet, we observe that the similarity between the simulated and recorded perturbations varies greatly depending on the reconstruction methods.
Most prominently, we study two opposite types of RNN specifications. First, as a control model, we consider a traditional sigmoidal RNN (σRNN) which is arguably the most common choice for contemporary data-constrained RNNs (Perich et al., 2020; Arthur et al., 2023; Pals et al., 2024); and second, we develop a model with biologically informed inductive biases (bioRNN): (1) neuronal dynamics follow a simplified spiking neuron model, and (2) neurons associated with fast-spiking inhibitory cells have short-distance inhibitory projections (other neurons are excitatory with both local and long-range interareal connectivity). Following Neftci et al. (2019); Bellec et al. (2018b, 2021); Sourmpis et al. (2023), we adapt gradient descent techniques to optimize the bioRNN parameters of neurons and synapses to explain the recorded neural activity and behavior.
Strikingly, we find that the bioRNN is more robust to perturbations than the σRNN. This is nontrivial because it is in direct contradiction with other metrics often used in the field: the σRNN simulation achieves higher similarity with unseen recorded trials before perturbation, but lower than the bioRNN on perturbed trials. This contradiction is confirmed both on synthetic and in vivo datasets. To analyze this result, we submit a spectrum of intermediate bioRNN models to the same perturbation tests, and identify two bioRNN model features that are most important to improve robustness to perturbation: (1) Dale’s law (the cell type constrains the sign of the connections (Eccles, 1976)), and (2) local-only inhibition (inhibitory neurons do not project to other cortical areas). We also find that other biological inductive biases like spiking neuron dynamics and a sparsity prior improve the robustness to perturbation but to a lesser extent.
Furthermore, a perturbation-robust bioRNN enables the prediction of the causal effect of perturbations in the recorded circuit. This becomes particularly interesting with micro-perturbations (µ-perturbations) targeting dozens of neurons in a small time window. We show in silico that the causal effect of µ-perturbations can be well approximated by the RNN gradients, which has two important implications for experimental neuroscience: (1) in a closed-loop experimental setup, we can use RNN gradients to target a µ-perturbation that optimally increases (or decreases) movement in a simulated mouse (demonstrated in silico); (2) our RNN reconstruction methodology enables the estimation of gradients of the recorded circuit. Fitting RNNs thus becomes a tool to measure “brain gradients” and potentially relate contemporary in vivo measurements to decades of theoretical results from machine learning, where the gradient is a foundational concept (LeCun et al., 2015; Richards and Kording, 2023).
2 Results
2.1 Reconstructed Networks: biological inductive biases strengthen robustness to perturbations
Synthetic dataset for challenging causal inference
We build a toy synthetic dataset to formalize how we intend to reverse engineer the mechanism of a recorded circuit using optogenetic perturbations and RNN reconstruction methods. It also serves as the first dataset to evaluate our network reconstruction methods. This toy example represents a simplified version of large-scale cortical recordings from multiple brain areas during a low-dimensional instructed behavior (Steinmetz et al., 2019; Esmaeili et al., 2021; International Brain Laboratory et al., 2023), similarly to the in vivo dataset of a Go/No-Go task (Esmaeili et al., 2021) analyzed in the next section. Let's consider two areas A and B which are either transiently active together (“hit trial”, occurring with frequency p) or quiescent together (“miss trial”, occurring with probability 1 − p). Since the two areas are active or inactive together, it is hard to infer whether they are connected in a feedforward or recurrent fashion. In Methods 4.1, we describe a theoretical example where it is impossible to decide between opposing mechanistic hypotheses (feedforward or recurrent) when recording only the macroscopic activations of areas A and B. In this case, performing an optogenetic inactivation of one area is decisive to distinguish between the feedforward and recurrent hypotheses.
To generate artificial spike train recordings that capture this problem, we design two reference circuits (RefCircs) from which we can record the spike trains. Each RefCirc consists of two populations of 250 spiking neurons (80% excitatory) representing areas A and B. To highlight the importance of optogenetic perturbations as in Methods 4.1, the first circuit, RefCirc1, is feedforward and the second, RefCirc2, is recurrent: RefCirc1 (and not RefCirc2) has strictly zero feedback connections from B to A. Yet, the two RefCircs are almost identical without optogenetic perturbations: each neuron in RefCirc1 has been constructed to have an almost identical trial-averaged activity as the corresponding neuron in RefCirc2, and in response to a stimulus, the circuits display a similar bi-modal hit-or-miss response with a hit trial frequency p ≈ 50%. We consider that a trial is a hit if area A is active (averaged firing rate above 8 Hz; defining a “hit” trial based on area A is equivalent to saying that both areas need to be active during unperturbed trials with this dataset, but excluding B from this definition avoids that the hit rate is trivially impacted when manipulating the activity of B with optogenetic perturbations). To simulate optogenetic inactivations of an area in the RefCircs, we inject a transient current into the inhibitory neurons, modeling the opening of light-sensitive ion channels (symmetrically, an optogenetic activation is simulated as a positive current injected into excitatory cells). Figure S1 shows that optogenetic perturbations in area B reflect the presence or absence of feedback connections, which differs between RefCirc1 and 2. Methods 4.4 provides more details on the construction of the artificial circuits. Our perturbation test will be to compare the effect of optogenetic perturbations in the reconstructed RNN and in the reference circuits RefCirc1 and 2.
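To make the perturbation protocol concrete, the sketch below shows how such a light-induced current can be constructed as an extra input to the simulation. The array names, layout of areas, and amplitude are illustrative assumptions, not taken from our code.

```python
import numpy as np

def optogenetic_current(n_neurons, is_inhibitory, t_start, t_stop, n_steps,
                        amplitude, inactivation=True):
    """Build a transient current (time x neurons) modeling the light stimulus.

    Inactivation of an area is modeled as a positive current into its
    inhibitory neurons; activation as a positive current into excitatory cells.
    `is_inhibitory` is a boolean array of shape (n_neurons,).
    """
    current = np.zeros((n_steps, n_neurons))
    target = is_inhibitory if inactivation else ~is_inhibitory
    current[t_start:t_stop, target] = amplitude
    return current

# Example: inactivate area B (here neurons 250-499, 20% inhibitory) for 40 ms at 1 ms steps
is_inh = np.zeros(500, dtype=bool)
is_inh[450:500] = True          # inhibitory neurons of area B in this toy layout
I_opto = optogenetic_current(500, is_inh, t_start=80, t_stop=120,
                             n_steps=400, amplitude=2.0)
```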
To reconstruct the recorded circuits with an RNN, we record activity from the spiking RefCircs, and optimize the parameters of an RNN to generate highly similar network activity. The whole reconstruction method is summarized graphically in panel A of Figure 1. In the simplest cases, the RNN is specified as a sigmoidal network model (Rosenblatt, 1960; Elman, 1990): σRNN1 and σRNN2 are optimized to reproduce the recording from RefCirc1 and RefCirc2 respectively. In this synthetic dataset, the reconstructed σRNNs have the same size as the RefCircs (500 neurons) and sigmoidal neurons are mapped one-to-one with RefCirc neurons (20% are mapped to inhibitory RefCirc neurons). They are initialized with all-to-all connectivity and are therefore blind to the structural difference of the RefCirc1 and 2 (feed-forward or recurrent). From each of the two RefCircs, we store 2,000 trials of simultaneous spike train recordings of all 500 neurons (step 1 in Fig. 1A). Half of the trials are used as training set and will be the basis for our data-driven RNN optimization. The second half of the recorded trials form the testing set and are used to evaluate the quality of the reconstruction before perturbations.
Network reconstruction and perturbation tests.
A. The three steps to reconstruct the reference circuit (RefCirc) using a biologically informed RNN (bioRNN) or a sigmoidal RNN (σRNN) and evaluate the reconstruction based on perturbation tests. B. Summary of the differences between a bioRNN and a σRNN. C. Example raster plots for bioRNN1 and σRNN1; neurons fitted to spike trains from inhibitory cells in RefCirc are shown in red. D. Trial-averaged activity of area A of the two circuits during hit (black-dashed: RefCirc1; blue: bioRNN1; pink: σRNN1) and miss (grey-dashed: RefCirc1; light blue: bioRNN1; light pink: σRNN1) trials. All models display a hit rate of p ≈ 50%. E. Same as D during inactivation of area B. The recorded change of hit rate for the feedforward circuit RefCirc1 is zero by design, so a successful reconstruction achieves a change of hit rate close to zero.
We optimize the synaptic “weights” of the σRNN to minimize the difference between its activity and that of the RefCirc (step 2 in Fig 1A, see Methods). The optimization combines three loss functions defined mathematically in Methods 4.5: (i) the neuron-specific loss function ℒneuron is the mean-square error of the trial-averaged neural activity (e.g. the PSTH) between σRNN and RefCirc neurons. (ii) To account for fluctuations of the single-trial network activity, we use a trial-specific loss function ℒtrial, which is the distance between the distribution of single trial population-averaged activity of σRNN and RefCirc (see Sourmpis et al. (2023)). (iii) Finally, we add a regularization loss function ℒreg to penalize unnecessary large weights.
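For illustration, the three terms can be combined as in the following PyTorch sketch operating on tensors of firing rates. The trial term shown here is a simplified stand-in for the trial-matching loss of Sourmpis et al. (2023) defined in Methods 4.5, and all names and coefficients are illustrative.

```python
import torch

def reconstruction_loss(rates_model, rates_data, weights, lambda_reg=1e-4):
    """Sketch of the three loss terms (illustrative, not the training code).

    rates_model / rates_data: tensors of shape (trials, time, neurons).
    """
    # (i) neuron-specific loss: mean-square error of trial-averaged activity (PSTH)
    psth_model = rates_model.mean(dim=0)
    psth_data = rates_data.mean(dim=0)
    loss_neuron = ((psth_model - psth_data) ** 2).mean()

    # (ii) trial-specific loss: simple placeholder comparing the sorted
    # distributions of population-averaged single-trial activity
    pop_model, _ = rates_model.mean(dim=2).sort(dim=0)
    pop_data, _ = rates_data.mean(dim=2).sort(dim=0)
    loss_trial = ((pop_model - pop_data) ** 2).mean()

    # (iii) regularization penalizing unnecessarily large weights
    loss_reg = lambda_reg * sum((w ** 2).sum() for w in weights)

    return loss_neuron + loss_trial + loss_reg
```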
We also developed a biologically informed RNN model (bioRNN) for which we have designed a successful optimization technique. The main differences between σRNNs and bioRNNs consist in the following biological inductive biases. Firstly, the bioRNN neuron model follows simplified leaky integrate-and-fire dynamics (see Methods 4.2), yielding strictly binary spiking activity. Secondly, we constrain the recurrent weight matrix to describe cell-type-specific connectivity constraints: following Dale's law, neurons have either non-negative or non-positive outgoing connections; moreover, since cortical inhibitory neurons rarely project across areas, we assume that inhibitory neurons project only locally within the same area. Thirdly, we add a term to the regularization loss ℒreg to implement the prior knowledge that cross-area connections are sparser than connections within an area. Adding these biological features into the model requires an adapted gradient descent algorithm and matrix initialization strategies (Methods 4.5). The reconstruction method with σRNNs and bioRNNs is otherwise identical: the models have the same size, and are optimized on the same data, for the same number of steps and using the same loss functions. The two models bioRNN1 and bioRNN2 are optimized to explain recordings from RefCirc1 and RefCirc2, respectively. Importantly, the structural difference between RefCirc1 (feedforward) and RefCirc2 (feedback) is assumed to be unknown during parameter optimization: at initialization, excitatory neurons in bioRNN1 or bioRNN2 project to any neuron in the network with transmission efficacies (also known as synaptic weights) initialized randomly.
After parameter optimization, we have four models, σRNN1, σRNN2, bioRNN1 and bioRNN2, that we call “reconstructed” models. To validate the reconstructed models, we verify that the network trajectories closely match the data on the test set in terms of (i) the “behavioral” hit-trial frequency, (ii) the peristimulus time histogram (PSTH) mean-square error of single neurons as evaluated by ℒ neuron, and (iii) the distance between single-trial network dynamics as evaluated by ℒ trial (see Suppl. Fig. S2 and Suppl. Table 1). At first sight, the σRNN displays a better performance when comparing the activity simulated by the RNN and recordings held out in the testing set: ℒ trial is for instance lower with σRNN (see Suppl. Table 1). This is expected considering that the optimization of bioRNNs is less flexible and efficient because of the sign-constrained weight matrix and the imperfect surrogate gradient approximation through spiking activity. However, the two bioRNNs are drastically more robust when evaluating the models with perturbation tests.
Perturbation test
To test which of the reconstructed RNNs capture the causal mechanisms of the RefCircs, we simulate optogenetic activations and inactivations of area B (step 3 in Fig. 1A). We first compare the change of hit probability after perturbations in the reconstructed RNN and recorded in RefCirc in Figure 2. For the σRNN, the activation or inactivation of area B drastically changes the peak firing rate in area A: all trials become a hit during inactivation of area B. This drastic increase of hit rate is not consistent with the reference, where the effect of the optogenetic inactivations is mild: the distribution of network responses remains bi-modal (hit versus miss) with only a moderate change of hit frequency for RefCirc2. For RefCirc1 we even expect no change of hit rate, by design, because of the absence of feedback connections from B to A. In contrast, the bioRNN models capture these changes more accurately (see Fig. 1 and 2). Quantitative results are summarized in Fig. 2C: the error of hit probability changes is 7% with bioRNNs when averaged over all conditions (bioRNN1 and bioRNN2, with optogenetic inactivations and activations). The corresponding error is 48.5% on average for σRNNs. In this sense, we argue that the bioRNN provides a better prediction of the perturbed hit frequency than the σRNN. We also performed spike train recordings in the area that is not directly targeted by the light to compare the perturbed network dynamics in the fitted RNNs and the RefCirc. The perturbed dynamics are displayed in Fig. 2B. The quantity ℒtrial is a distance between the network dynamics (RNN versus reference) and is reported in Fig. 2D-E and Supplementary Table 1. Again, the perturbed dynamics of the bioRNN are more similar to those of the reference circuits than those of the σRNN (t-test p-value is 0.0003).
A. Parameters of bioRNN1, bioRNN2, σRNN1 and σRNN2 are optimized to reconstruct RefCirc1 or RefCirc2 from spiking recordings. The bioRNNs and σRNNs are blind to the structural difference of RefCirc1 and 2 and must infer this from the spiking data alone. BioRNN variants are defined by removing one of the biologically inspired features, for instance “No Dale's law” refers to a bioRNN without sign constraints in the weight matrix, or removing the trial-matching loss function (“No TM”). B. Trial-averaged activity in area A under activation/inactivation of area B. Dashed black lines indicate the activity of RefCirc1 (thick dashed) and RefCirc2 (thin dashed). All the RNNs are tested with the same reference circuit and training data; each bioRNN model variant is shown with a different color. C. Error between the change of hit probability after perturbations in the RNN and in the RefCirc. D. The distance of network dynamics ℒtrial between each RNN and RefCirc, as a function of the perturbation strength applied to area B (horizontal axis: light power in arbitrary units). E. Same quantity as D but averaged for each RNN under the strongest light-power condition of the perturbation of area B (activation/inactivation at the strongest power level). Statistical significance is computed using the mean over multiple network initializations and compared with the full bioRNN method; significance is indicated with 0 to 4 stars corresponding to the p-value thresholds 0.05, 0.01, 0.001 and 0.0001.
To analyze which features of bioRNN explain this robustness to perturbation, we then derive a family of models where only one feature of the reconstruction is omitted. Namely, the “No Dale’s law” model does not have excitatory and inhibitory weight constraints, the “Non-local inhibition” model allows inhibitory neurons to project outside of their areas, the “No Spike” model replaces the spiking dynamics with a sigmoidal neuron model, and the “No Sparsity” model omits the cross-area sparsity penalty in ℒ reg. Omitting all these features in bioRNN would be equivalent to using a σRNN. The accuracy metrics on the testing sets before perturbation are reported for all RNN variants on the Suppl. Fig. 2C and F. For reference, we also include the model “No TM” (trial-matching) which omits the loss function ℒ trial during training.
The strongest effect measured in this analysis is that Dale's law and local inhibition explain most of the improved robustness of bioRNNs. This is visible in Fig. 2, as the perturbed trajectories of “No Dale's law” and “Non-local inhibition” are the most distant from the reference in Fig. 2B. This is confirmed numerically: both the hit-rate error and the distance of network dynamics increase the most when lifting these constraints (Fig. 2C-E and Supplementary Table 1). We explain this result as follows: the mono-synaptic effects of a cell stimulated by the light are always correct in bioRNN (according to Dale's law and inhibition locality), but often wrong in the alternative models (see Fig. 2A). For instance, a simple explanation may justify the failure of the “Non-local inhibition” model: the stimulation of inhibitory neurons in B induces (via the erroneous mono-synaptic inhibition) a reduction in the baseline activity in area A (see the green trace during inactivation in Fig. 2B). More generally for perturbation testing, we speculate that these features are measured to be important here because they are central to the biophysical nature of the perturbation considered: optogenetic perturbation targets specific cell types, and these features incorporate biophysical connectivity priors which are hard to infer from unperturbed data alone.
In comparison, it is less clear how spiking is relevant to model optogenetic perturbations. Considering that training spiking networks is a long-standing open question (Bohte et al., 2002; Neftci et al., 2019; Wunderlich and Pehle, 2021), it is nevertheless technologically nontrivial that the spiking bioRNN matches the performance of the “No Spike” model on both perturbation test metrics (hit frequency in Figure 2C and distance of network dynamics ℒtrial in Figure 2E). The spiking bioRNN is expected to achieve worse predictions because surrogate gradients (Neftci et al., 2019) through the binarized spiking dynamics are an approximation. Indeed, the σRNN models achieve lower loss functions on unperturbed data (see Suppl. Table 1), but the fitting does not generalize to perturbations. On perturbed data, it is the spiking bioRNN which achieves slightly better performance (spiking is better on all metrics, but without significant margins; the t-test p-value is 0.31 for ℒtrial, see Suppl. Table 1 and Fig. 2E). Also, we speculate that simulating spikes will have a stronger predictive advantage in other perturbation test experiments where timing matters more than long-lasting area-wide activations.
So far, we have identified biological inductive biases of the bioRNN that improve its robustness to perturbation. But the prediction error in Fig. 2B on the perturbed trajectory is still too large for our method to unequivocally reveal the characteristic differences between the feedforward RefCirc1 and the recurrent RefCirc2. To reveal this difference in the optimized bioRNN1 and bioRNN2 models, we find that adding a sparse connectivity prior across areas is necessary. The error of the “No Sparsity” model is visible in Fig. 2B (red curves): the PSTH traces for both RefCirc1 and 2 display the early increase in area A characteristic of feedback from area B to A, which should not exist for RefCirc1. This is corrected with the sparsity prior (bioRNN, blue curves). Altogether on this dataset, the model with constrained weights, spiking activity, and sparsity prior predicted the perturbed trials best. It was also able to capture the characteristic difference between the feedforward and recurrent circuits RefCirc1 and RefCirc2. We will now investigate whether the same approach can predict optogenetic perturbations with data recorded in vivo.
2.2 Predicting perturbations on in vivo electrophysiology data
To test whether our reconstruction method with biological inductive biases can predict optogenetic perturbations in large-scale recordings, we used the dataset from Esmaeili et al. (2021). In this study, water-deprived mice were trained to perform a whisker tactile detection task. In 50% of the trials (Go trials), a whisker is deflected, followed by a 1-second delay, after which an auditory cue signals that the mice can lick a water spout to receive a water reward. In the other 50% of trials (No-Go trials), no whisker deflection occurs, and licking after the auditory cue results in a penalty with an extended time-out period. While the mice performed the task, experimenters recorded 6,182 units from 12 areas across 18 mice. Using this dataset, we focused on the 6 most relevant areas for executing this task (Esmaeili et al., 2021). From each area, we randomly selected 250 neurons (200 putative excitatory and 50 putative inhibitory), corresponding to 1,500 neurons in total. These areas, all shown to be causally involved in the task (Esmaeili et al., 2021), include the primary and secondary whisker sensory cortex (wS1, wS2), the primary and secondary whisker motor cortex (wM1, wM2), the anterior lateral motor cortex (ALM), and the primary tongue-jaw motor cortex (tjM1). We fit the neuronal activity using the same reconstruction method as for the synthetic dataset. In the model, we simulate the jaw movement of the mouse as a linear readout driven by the model's neural activity. This readout is regressed against the real jaw movement extracted from video footage. The parameter optimization of the behavioral readout is performed jointly with fitting the synaptic weights to the neuronal recordings, see Methods 4.5. After training, our reconstructed model can generate neural activity with firing rate distribution, trial-averaged activity, single-trial network dynamics and behavioral outcome which are all consistent with the recordings (see Suppl. Fig. S5). Before perturbations, we observe again that the σRNN model fits the testing set data better than the bioRNN model (see Table 2 and Fig. S5).
Trial-matching test loss ℒtrial of the different reconstruction methods on the real recordings from Esmaeili et al. (2021). ± indicates the 95% confidence interval.
We then submit the reconstructed σRNN and bioRNN models to perturbation tests. In the dataset, the behavior of the animal is tracked during inactivation of an area in a given time window (stimulus, delay, or choice period). For each of the six areas and time windows, we extract the averaged hit frequency under optogenetic inactivation, and attempt to predict this perturbed behavior by inducing the same inactivations in the fitted RNNs. These perturbations are acute spatiotemporal optogenetic inactivations of each area during different time periods (see Figure 3B). As an example, we show the effect of an inactivation of wS1 during the whisker period in the model in Fig. 3. In panel C, we display a simulated trial of a fitted bioRNN with and without perturbation side by side. The two trials are simulated with the same random seed, and this example shows that an early perturbation in wS1 can change a lick decision from hit to miss in the model (Fig. 3C).
Predicting optogenetic perturbations for in vivo electrophysiology data
A. During a delayed whisker detection task, the mouse reports a whisker stimulation by licking to obtain a water reward. Jaw movements are recorded by a camera. Our model simulates the jaw movements and the neural activity from six areas. B. The experimentalists performed optogenetic inactivations of cortical areas (one area at a time) in three temporal windows. C. Example hit trial of a reconstructed network (left). Using the same random seed, the trial turns into a miss trial if we inactivate area wS1 (right, light stimulus indicated by blue shading) during the whisker period by stimulation of inhibitory neurons (red dots). D. Error between the change in lick probability caused by the perturbation as predicted by the model and as recorded in mice (Δp𝒟). Light-shaded circles show individual reconstructed networks with different initializations. The whiskers indicate the standard error of the mean. E. Examples of hit rate changes under perturbation for wS1 (top) and tjM1 (bottom). See Suppl. Fig. S6 for the other areas.
We denote by Δp𝒟 the in vivo change in lick probability across Go trials in response to optogenetic perturbations. The perturbations were performed in different periods for each area in Esmaeili et al. (2021) (stimulation, delay, or choice periods). For all areas and time windows, we measure the corresponding change in lick probability in the model. On average, the error in the change of lick probability obtained with the σRNN model is significantly worse than the bioRNN model's 16% (t-test p-value is 0.014, see Figure 3D). As in the synthetic dataset, we find this to be consistent over multiple bioRNN model variants, and we find that imposing Dale's law and local inhibition best explains the improvement in perturbation robustness. We also measure that the spiking bioRNN predicts the change in lick probability slightly better than the “No Spike” bioRNN model. Conversely, adding the sparsity prior does not seem to improve the perturbed hit-rate prediction on the real data, as was the case in the recurrent artificial dataset (RefCirc2) but not in the feedforward case (RefCirc1), as shown in Suppl. Fig. S4.
To further analyze the consistency of the perturbations in the model, we can compare the perturbation map showing changes in lick probability obtained from acute inactivation in the data and the model. The Suppl. Fig. S6 summarizes visually which area has a critical role at specific time points. The change of lick probability in area wS1, ALM and tjM1 are accurately predicted by the bioRNN. In contrast, our model tends to underestimate the causal effect induced by the inactivations of wS2, wM1 and wM2 (Suppl Fig S6). Overall, our model is consistent with a causal chain of interaction from wS1 to ALM and continuing to tjM1.
2.3 Applications for experimental electrophysiology
With future progress in recording technology and reconstruction methods, network reconstruction may soon predict the effect of optogenetic perturbations with even higher accuracy. In this section, we explore possible consequences and applications for experimental electrophysiology. We demonstrate in the following that (1) perturbation-robust bioRNNs enable us to estimate gradients of the recorded circuits, and (2) these gradients in turn enable us to target µ-perturbations in the recorded circuit and optimally increase (or decrease) induced movements in our simulated mouse. The so-called “recorded circuit” is a bioRNN trained on the in vivo dataset that we use as a proxy experimental preparation. Its mathematical underpinnings enable rigorous theoretical considerations and the design of forward-looking in silico experiments.
µ-perturbations measure brain gradients
We first prove a mathematical relationship between gradients in the recorded circuit and µ-perturbations. We define the integrated movement as Y = Σ_t y_t, where y_t is the jaw movement at time t generated by the model, and we denote by ΔY the change of movement caused by the µ-perturbation. If the circuit has well-defined gradients (e.g., a “No Spike” bioRNN model trained on the in vivo recordings in the previous section), using a Taylor expansion, we find that:

ΔY ≈ Σ_{(j,t) ∈ ℐ} (∂Y/∂I_{j,t}) ΔI_{j,t} + ϵ,    (1)
where ℐ is the set of neuron and time indices selected for the optogenetic intervention and I_{j,t} denotes the light-induced current into neuron j at time t. The error term ϵ is negligible when the current induced by the light is small. We first confirm this approximation with a numerical visualization in Fig. 4A: we display movement perturbations ⟨ΔY⟩ in the circuit with time windows of decreasing sizes (⟨·⟩ indicates a trial average). When the time window is small, and the perturbation is only applied to excitatory or inhibitory cells in Fig. 4A, one can appreciate visually the similarity with the binned gradient in Fig. 4B. Proceeding to a quantitative verification of equation (1), we now compare the effect of small perturbations targeting only 20 neurons on a single trial. We use the gradient ∂Y/∂I_{j,t} (see Fig. 4C) to predict the outcome of a µ-perturbation as follows: for each trial, and each 100 ms time window, we identify the 20 neurons in the model with the highest (or lowest) gradients. We then re-simulate the exact same trial with an identical random seed, but induce a µ-perturbation on the selected neurons (see rectangles in Figure 4). If we target neurons with strongly positive gradients, the perturbed jaw movements are strongly amplified (ΔY > 0); conversely, if we target neurons with negative gradients, the jaw movements are suppressed (ΔY < 0). Although equation (1) is only rigorously valid for models with well-defined gradients like the “No Spike” model, we also confirm in Fig. 4D that this numerical verification holds in a spiking circuit model where the gradients are replaced with surrogate gradients (Neftci et al., 2019).
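A minimal sketch of this gradient-based targeting is shown below. It assumes a differentiable model object with a hypothetical call signature `model(inputs, extra_current)` returning the jaw trace, so that ∂Y/∂I can be obtained by automatic differentiation; for the spiking bioRNN the same code would return surrogate gradients instead.

```python
import torch

def gradient_targets(model, inputs, window, k=20):
    """Pick the k neurons with the largest dY/dI in a given time window (illustrative)."""
    t0, t1 = window
    # zero current through which we differentiate; shape (time, neurons)
    extra_current = torch.zeros(inputs.shape[0], model.n_neurons, requires_grad=True)
    y = model(inputs, extra_current)          # jaw movement trace, shape (time,)
    Y = y.sum()                               # integrated movement
    (grad,) = torch.autograd.grad(Y, extra_current)
    saliency = grad[t0:t1].sum(dim=0)         # gradients summed over the window
    return torch.topk(saliency, k).indices    # neurons predicted to increase Y most
```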
Measuring circuit gradients with µ-perturbations
A-B. Numerical verification of equation (1). A shows the change of jaw movement ΔY following inactivations in a “No Spike” bioRNN. From left to right, we reduce the size of the spatiotemporal window of the optogenetic stimulation. B. Gradient values that approximate ΔY from A using equation (1). C-D. Verification that gradients predict the change of movement on single trials. In C, we display the gradients and jaw movement for three different trials; the neurons targeted by the µ-perturbation are boxed and the perturbed jaw movement is shown in blue. Results averaged over all 100 ms stimulation windows are shown in D: positively (resp. negatively) modulated means that the 20 neurons with the highest (resp. lowest) gradients are targeted; random neurons are selected for the shuffled case.
An implication of equation (1) is that the measurements ⟨ΔY⟩ that can be recorded in vivo are estimates of the gradients in the recorded circuit. Yet, measuring detailed gradient maps (or perturbation maps) as displayed in Fig. 4 would be costly in vivo, as it requires averaging ΔY over dozens of trials for each spatio-temporal window. Instead, gradient calculation in a bioRNN model (that was fitted to the experimental preparation) is a rapid mathematical exercise. If the extracted model is valid, then the gradients in the bioRNN approximate (1) the effect of µ-perturbations ΔY in the experimental preparation, and (2) the gradient in the recorded circuit.
Targeting in vivo µ-perturbations with bioRNN gradients
Building on this theoretical finding, we build a speculative experimental setup where the bioRNN gradients are used to target a µ-perturbation and increase (or decrease) the movements Y in the experimental preparation in real time. We show a schematic of this speculative closed-loop experiment in Fig. 5C, extending contemporary read-write electrophysiology setups (Packer et al., 2015; Adesnik and Abdeladim, 2021; Grosenick et al., 2015; Papagiakoumou et al., 2020). We demonstrate in silico in Fig. 5A-B how this experiment could use bioRNN gradients to bias the simulated mouse movement Y. As a preparation step, and before applying perturbations, we assume that the bioRNN is well fitted to the recorded circuit and we collect a large databank ℬ of simulated trials from the fitted bioRNN. Then, in real time, we record the activity from the experimental preparation until the time t∗ at which the stimulation will be delivered (Step 1 in Fig. 5A; t∗ is 100 ms before the decision period). Rapidly, we find the trial with the closest spike trains in the databank of simulated trials (Step 2) and use the corresponding gradient maps to target neurons with the highest gradient in the model (Step 3). The targeted stimulation is then delivered immediately at t∗ to the experimental preparation (Step 4). When testing this in silico on our artificial experimental preparation, we show in Fig. 5B that this approach can bias the quantity of jaw movement Y driven by the circuit in a predictable way: the amount of movement is consistently augmented if we target neurons with the highest gradients (or diminished if we target neurons with the lowest gradients).
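For concreteness, the retrieval and targeting steps (Steps 2-3) could look like the following sketch, where `databank` is assumed to store, for each simulated trial, its spike trains and the corresponding gradient map; all names are illustrative.

```python
import numpy as np

def closed_loop_target(ongoing_spikes, databank, t_star, k=20, maximize=True):
    """Retrieve the closest simulated trial and read out which neurons to stimulate.

    `databank` is a list of dicts with keys 'spikes' (time x neurons) and
    'gradient' (time x neurons).
    """
    # Step 2: nearest simulated trial, compared on the activity recorded so far
    dists = [np.linalg.norm(trial['spikes'][:t_star] - ongoing_spikes[:t_star])
             for trial in databank]
    best = databank[int(np.argmin(dists))]

    # Step 3: neurons with the highest (or lowest) gradient at the stimulation time
    saliency = best['gradient'][t_star]
    order = np.argsort(saliency)
    return order[-k:] if maximize else order[:k]   # indices to stimulate at t_star
```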
Gradient targeted µ-perturbations could precisely bias an animal behavior
A. Protocol to deliver an optimal µ-perturbation on the experimental preparation based on jaw gradients. (Step 1) The circuit is recorded until the stimulation time t∗. (Step 2) The closest bioRNN trial to the ongoing recorded trial is retrieved from the databank ℬ. (Step 3) We select the neurons with the highest (or lowest) gradient value for the µ-perturbation. (Step 4) The µ-perturbation is delivered at t∗. B. Effect of the µ-perturbation using the artificial setup in A under different light protocols. Practically, for “High gradient” we keep step 3 as it is, for “Low gradient” we change the sign of the gradient, and for “Zero gradient” we pick the 40 neurons with the lowest gradient norm. C. Speculative schematic of a closed-loop setup implementing the protocol in A, inspired by the all-optical “read-write” setup from Packer et al. (2015).
3 Discussion
Finding the right level of detail to model recorded phenomena has sparked intensive debates in computational neuroscience. Our results show that the co-development of perturbation tests and biological modeling provides a quantitative measurement to address this problem. Perturbation tests can characterize two types of incomplete brain models: (1) physiologically detailed models that are intended to explain brain mechanisms but do not enable quantitative predictions; and (2) machine learning models with high predictive power that capture a wrong causal mechanism and therefore generalize erroneously.
Our reconstruction method uses data-constrained RNNs with informed inductive biases, and is validated on perturbation tests. We achieve a reconstruction of the sensory-motor pathway in the mouse cortex during a sensory detection task from electrophysiology data. The model is optimized to explain electrophysiological recordings and generalizes better than standard models to in vivo optogenetic interventions. We found unambiguously that anatomically informed sign and connectivity constraints for the dominant excitatory and inhibitory cell types improve the model's robustness to optogenetic perturbations. In hindsight, we conclude that adding biological constraints becomes beneficial when (1) they model the interaction between the circuit and the perturbation mechanism, and (2) their implementation does not impair the efficiency of the optimization process. Broadly speaking, this insight is also supported by other results in neuroscience. For instance, biologically inspired topological maps improve convolutional models of the monkey's visual system (Schrimpf et al., 2024), and detailed cell-type distributions and connectomes improve vision models of the fly brain (Lappalainen et al., 2023; Cowley et al., 2024). For future work, there is a dense body of unexploited physiological knowledge at the connectivity, laminar, or cell-type level that could be added to improve a cortical model like ours (Harris et al., 2019; Liu et al., 2022; Udvary et al., 2022; Staiger and Petersen, 2021; Rimehaug et al., 2023). By submitting the extended models to the relevant perturbation tests, it becomes possible to measure quantitatively the goodness of their biological mechanism implementations. We do not rule out that significant improvements on perturbation tests can also be achieved by other means (e.g., by training deep learning architectures (Azabou et al., 2024; Pandarinath et al., 2018; Ye et al., 2023) on larger datasets to enable generalization, or with generic regularization techniques like low-rank connectivity (Dubreuil et al., 2022; Valente et al., 2022)). However, in the same way that the σRNN was a priori an excellent predictor on our initial test set, any powerful brain model will likely have failure modes that can be characterized and measured with an appropriate perturbation test. So perturbation tests could become a central component of an iterative loop identifying the data collection or model improvements needed to obtain robust brain models.
To highlight the importance of perturbation-robust circuit models, we have discussed possible implications for experimental neuroscience in section 2.3. We have shown that network reconstructions generate detailed perturbation sensitivity maps from non-perturbed recordings only. These sensitivity maps correspond to the RNN gradients and can be used to make targeted micro-stimulations in vivo. As a result, we could design a hypothetical closed-loop setup combining read-write electrophysiology with a brain model to influence brain activity or behavior, with potentially important practical and ethical consequences. More conceptually, we have shown theoretically that the gradients of a perturbation-robust RNN are consistent with the gradients of the true recorded circuit. Given the foundational role of gradients in machine learning theory (LeCun et al., 2015; Richards and Kording, 2023), this enables the measurement of “brain gradients” and lays a computational link between in vivo experimental research and decades of theoretical results on artificial learning and cognition.
4 Methods
4.1 Mathematical toy model of the difficult causal inference between H1 and H2
Let's consider two simplistic mathematical models that both depend on two binary random variables A and B, indicating whether putative area A is active (A = 1) and whether area B is active (B = 1). With this notation, we can construct two hypothetical causal mechanisms, H1 (a “feedforward hypothesis”) and H2 (a “recurrent hypothesis”), which are radically different. The empirical frequency p(A, B) of the outcomes does not allow us to differentiate whether the data were generated by the feedforward mechanism H1 or the recurrent mechanism H2. Schematically, we can represent the two mechanism hypotheses as follows:
For hypothesis H1, we assume that external inputs drive the activity of area A such that A = 1 with probability p0, and that strong feed-forward connections from A to B systematically cause B = 1 as soon as A = 1. Alternatively, in H2, we assume that areas A and B receive independent external inputs a and b, each active with probability p1. Each of these two inputs is sufficient to cause A = 1 or B = 1, and the two areas are also strongly connected, so A = 1 always causes B = 1 and vice versa. Under these hypothetical mechanisms H1 and H2, one finds that the empirical probability table p(A, B) is identical: p(A = 1, B = 1) = p0 (“Hit trial”, both areas are active) and p(A = 0, B = 0) = 1 − p0 (“Miss trial”, both areas are quiescent). (To prove this, use a and b to denote the binary external inputs into A and B, so that p_{H2}(A = 1, B = 1) = Σ_{a,b} p(A = 1, B = 1 | a, b) p(a, b), where we used that p(A = 1, B = 1 | a, b) is 0 or 1; then, using p(a = 1) = p(b = 1) = p1 and the independence between a and b, we find p_{H2}(A = 1, B = 1) = 1 − (1 − p1)², which equals p0 when p1 = 1 − √(1 − p0).) In both cases, the possibility that only one area is active is excluded by construction. So for any A and B, p_{H1}(A, B) = p_{H2}(A, B); in other words, even if we observe an infinite number of trials and compute any statistic of the binary activations A and B, discriminating between the two possible causal interactions (H1 versus H2) is impossible.
A solution to discriminate between hypotheses H1 and H2 is to induce a causal perturbation. We can discriminate between our two hypotheses if we can impose a perturbation that forces the inactivation of area B in both mathematical models; in mathematical terms, this corresponds to the do-operator from causality theory. Under the feedforward mechanism H1 and inactivation of B, A is not affected: p_{H1}(A = 1 | do(B = 0)) = p0. Under the recurrent hypothesis H2 and inactivation of B, A is activated only by its external input, such that p_{H2}(A = 1 | do(B = 0)) = p1 ≠ p0. So the measurement of the frequency of activation of area A under inactivation of B can discriminate between H1 and H2, which illustrates mathematically how a causal perturbation can be decisive to discriminate between these two hypothetical mechanisms.
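The argument can be checked numerically in a few lines of code. The sampling below is only an illustration of the two hypothetical mechanisms, with p1 chosen as 1 − √(1 − p0) so that the unperturbed statistics match.

```python
import numpy as np

rng = np.random.default_rng(0)
p0 = 0.5
p1 = 1 - np.sqrt(1 - p0)   # chosen so that both hypotheses match without perturbation
n = 100_000

# H1 (feedforward): external input drives A with probability p0, and A = 1 forces B = 1
A1 = rng.random(n) < p0
B1 = A1.copy()

# H2 (recurrent): independent inputs a, b with probability p1; either input activates both areas
a, b = rng.random(n) < p1, rng.random(n) < p1
A2 = B2 = a | b

# Unperturbed statistics are indistinguishable: p(A=1, B=1) ~ p0 under both hypotheses
print(np.mean(A1 & B1), np.mean(A2 & B2))
# do(B = 0): under H1, A is unchanged (~p0); under H2, A is driven only by its own input (~p1)
print(A1.mean(), a.mean())
```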
4.2 Neuron and jaw movement model
We model neurons as leaky integrate-and-fire (LIF) neurons. The output of every neuron j at time t is a binary outcome z_j^t (spike if z_j^t = 1, no spike if z_j^t = 0) generated from its membrane voltage v_j^t. The dynamics of the membrane voltage are given by:

v_j^{t+1} = α_j v_j^t + Σ_{i,d} W_{ji}^{rec,d} z_i^{t−d} + Σ_i W_{ji}^{in} x_i^t + ξ_j^t − v_{thr,j} z_j^t,

where W^{rec,d} and W^{in} are the recurrent and input weight matrices. The timestep of the simulation δt is 2 ms when we simulate the real dataset and 1 ms otherwise. The superscript d denotes the synaptic delay; every synapse has one synaptic delay of either 2 or 3 ms. The decay factor α_j = exp(−δt/τ_m) defines the integration speed of the membrane voltage, where τ_m = 30 ms for excitatory and τ_m = 10 ms for inhibitory neurons. The noise source ξ_j^t is a Gaussian random variable with zero mean and a standard deviation that is typically initialized at 0.14. The input x^t is a binary pulse signal with a duration of 10 ms. For the real dataset, we have two binary pulse input signals, one for the whisker deflection and one for the auditory cue. The spikes z_j^t are sampled from a Bernoulli distribution whose probability is an exponential function of the distance between v_j^t and the effective membrane threshold v_{thr,j}, where v_0 is the temperature of the exponential function. After each spike, the neuron receives a reset current with an amplitude of v_{thr,j} and enters an absolute refractory period of 4 ms, during which it cannot fire.
For networks fitted to the real dataset, we also simulate the jaw movement. The jaw movement trace y is controlled by a linear readout from the spiking activity of all excitatory neurons. Specifically, y^t = b · (W^{jaw} ẑ^t), where b is a scaling parameter and ẑ^t is a low-pass-filtered version of the excitatory spike trains, given by ẑ^t = α_{jaw} ẑ^{t−1} + z^t. Here, W^{jaw} is the output weight matrix (linear readout) for the jaw, and τ_{jaw} = 5 ms defines α_{jaw} = exp(−δt/τ_{jaw}), which controls the integration velocity of the jaw trace.
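For concreteness, a single simulation step of these dynamics could be sketched as follows. Synaptic delays, the refractory period and the exact exponential sampling temperature are omitted, and the logistic spike probability used here is an illustrative stand-in for the sampling rule described above.

```python
import numpy as np

def lif_step(v, z_prev, W_rec, W_in, x, noise_std, v_thr, tau_m, dt=2.0, v0=0.1, rng=None):
    """One discrete-time step of a simplified LIF network (illustrative sketch).

    Shapes: v, z_prev, v_thr are (n_neurons,); W_rec is (n_neurons, n_neurons);
    W_in is (n_neurons, n_inputs); x is (n_inputs,).
    """
    if rng is None:
        rng = np.random.default_rng()
    alpha = np.exp(-dt / tau_m)                        # membrane decay per step
    noise = rng.normal(0.0, noise_std, size=v.shape)
    # leaky integration of recurrent, input and noise currents, with reset after a spike
    v = alpha * v + W_rec @ z_prev + W_in @ x + noise - v_thr * z_prev
    p_spike = 1.0 / (1.0 + np.exp(-(v - v_thr) / v0))  # smooth spike probability (stand-in)
    z = (rng.random(v.shape) < p_spike).astype(float)  # Bernoulli spike sampling
    return v, z
```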
4.3 Session-stitching and network structure
As in (Sourmpis et al., 2023), we simulate multi-area cortical neuronal activity fitted to electrophysiology neural recordings. Before we start the optimization, we define and fix each neuron’s area and cell type in the model by uniquely assigning them to a neuron from the recordings. For the real dataset from Esmaeili et al. (2021), the cell type is inferred from the cell’s action potential waveform (with fast-spiking neurons classified as inhibitory and regular-spiking neurons as excitatory). Most electrophysiology datasets include recordings from multiple sessions, and our method would typically require simultaneous recordings of all neurons. To address this challenge, similarly to (Sourmpis et al., 2023) we use the technique called “session-stitching” which allows neighboring modeled neurons to be mapped with neurons recorded across multiple sessions. This effectively creates a “collage” that integrates data from multiple sessions within our model. This approach has practical implications for our optimization process. Specifically, the trial-matching loss includes a term for each session, with the overall loss calculated as the average across all sessions (see 4.5).
For both the real and the synthetic datasets, we simulate each area with 250 LIF neurons and impose that each area has 200 excitatory and 50 inhibitory neurons. Respecting the observation that inhibitory neurons mostly project within the area to which they belong (Tamamaki and Tomioka, 2010; Markram et al., 2004), we do not allow across-area inhibitory connections. The “thalamic” input is available to every neuron of the circuit, and the “motor” output for the real dataset, i.e., the jaw movement, is extracted with a trained linear readout from all the excitatory neurons of the network, see Methods 4.2.
4.4 Reference circuits for hypotheses 1 and 2
To build a synthetic dataset that illustrates the difficulty of separating the feedforward (H1) and recurrent (H2) hypotheses, we construct two reference spiking circuit models, RefCirc1 and RefCirc2. The two networks consist of two areas A and B, and their activity follows the hard causal inference problem from Methods 4.1, making it hard to distinguish H1 from H2 when recording only the co-activation of A and B. Moreover, to make the problem even harder, the two networks are constructed so that it is almost impossible to distinguish between H1 and H2 even with dense recordings: the two circuits are designed to have the same PSTH and single-trial network dynamics despite their structural difference (one is feedforward and the other is recurrent).
To do so, RefCirc1 and 2 are circuit models that start from random network initializations following the specifications described in Methods 4.2 and 4.3. The only difference is that we do not allow feedback connections from B to A in RefCirc1; the construction below is otherwise identical. The synaptic weights of the two circuits are optimized with the losses described in Methods 4.5 to fit identical target statistics in all areas: the same PSTH activity for each neuron and the same distribution of single-trial network dynamics. The target statistics are chosen so that the activity in RefCirc1 and 2 resembles the kinematics and statistics of a primary and a secondary sensory area. The baseline firing rates of the neurons are dictated by the target PSTH distribution and follow a log-normal distribution, with excitatory neurons having a mean of 2.9 Hz and a standard deviation of 1.25 Hz and inhibitory neurons having a mean of 4.47 Hz and a standard deviation of 1.31 Hz. The distribution of single-trial activity is given by the targeted single-trial dynamics: in RefCirc1 and 2, the areas A and B respond to the input 50% of the time with a transient population-averaged response following a double-exponential kernel characterized by τrise = 5 ms and τfall = 20 ms. Mimicking a short signal propagation between areas, these transients have a 4 ms delay in area A and a 12 ms delay in B (relative to the onset time of the stimulus). To impose a “behavioral” hit-versus-miss distribution that could emerge from a feedforward or a recurrent hypothesis (see Methods 4.1), the targeted population-averaged response of each trial is either a double-exponential transient in both areas A and B (“Hit trials”) or remains at a baseline level in both areas (“Miss trials”). At the end of the training, we verified that RefCirc1 and RefCirc2 generate very similar network activity in the absence of perturbation (see Figure S1). The circuits are then frozen and used to generate the synthetic dataset. We generate 2000 trials from these RefCircs, 1000 of which are used for the training set and 1000 for the testing set.
4.5 Optimization and loss function
The optimization method we use to fit our models is back-propagation through time (BPTT). To overcome the non-differentiability of the spiking function, we use surrogate gradients (Neftci et al., 2019); in particular, we use the piece-wise linear surrogate derivative from Bellec et al. (2018b), so for the derivative calculations we use this piece-wise linear function rather than the exponential. We use sample-and-measure loss functions that rely on summary statistics, as in (Bellec et al., 2021; Sourmpis et al., 2023), to fit the networks to the data. Our loss function has two main terms: one to fit the trial-averaged activity of every neuron (ℒneuron) and one to fit the single-trial population-averaged activity (ℒtrial), so that ℒ = ℒneuron + ℒtrial. The two terms of the loss function are reweighted with a parameter-free multi-task method (Défossez et al., 2023) that enables the gradients to have comparable scales.
As in Sourmpis et al. (2023): (1) To calculate the trial-averaged loss, we first filter the trial-averaged spiking activity using a rolling average window f of 8 ms. We then normalize it using the time average ⟨.⟩_t and the standard deviation over time σ_t of the trial-averaged filtered data activity (z^D denotes the recorded spike trains), yielding normalized PSTHs ẑ_j and ẑ_j^D for the model and the data. The trial-averaged loss function is defined as the mean-square error between them:

ℒ_neuron = (1/(T N)) Σ_{j=1}^{N} Σ_{t=1}^{T} ( ẑ_j^t − ẑ_j^{D,t} )²,

where T is the number of time points in a trial and N is the number of neurons. For the real dataset, where we also want to fit the jaw movement, we add an analogous term for the trial-averaged, filtered and normalized jaw trace, where y is the simulated jaw movement and y^D the recorded jaw movement.
(2) To calculate the trial-matching loss, we first filter the population-averaged activity of each area using a rolling average window of 32 ms. We then normalize it using the trial average ⟨.⟩_k and the standard deviation over trials σ_k of the population-averaged filtered activity of the same area from the recordings, and we concatenate all the areas that were simultaneously recorded into a single vector 𝒯_k for each trial k. The trial-matching loss is defined as:

ℒ_trial = min_π (1/K) Σ_{k=1}^{K} ‖ 𝒯_k^D − 𝒯_{π(k)} ‖²,

where π is an assignment between pairs of recorded and generated trials, π : {1, …, K} → {1, …, K}. Note that the minimum over π is a combinatorial optimization that needs to be calculated for every evaluation of the loss function. For the real dataset, we consider the jaw movement as an additional area and concatenate it to the vector 𝒯_k.
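For illustration, the assignment π and the resulting loss can be computed with the Hungarian algorithm; the sketch below uses NumPy/SciPy on precomputed trial summaries 𝒯 and is not the training implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def trial_matching_loss(T_model, T_data):
    """Find the assignment pi between generated and recorded trials that
    minimizes the summed squared distance, then return the mean matched distance.

    T_model, T_data: arrays of shape (K, D) holding, for each trial, the
    concatenated filtered and normalized population-averaged activity.
    """
    # pairwise squared distances between generated and recorded trials
    cost = ((T_model[:, None, :] - T_data[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)        # optimal assignment pi
    return cost[rows, cols].mean()
```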
Based on this loss function, we optimize the following parameters for the RefCircs: the recurrent weights W^{rec,d}, the input weights W^{in}, and β. For the reconstructed RNNs, we optimize only the recurrent connectivity W^{rec,d}; the rest are fixed from the RefCircs. For the real dataset, in addition to the parameters optimized in the RefCircs, we also optimize the jaw's linear readout and its scaling parameter b.
Implementing Dale’s law and local inhibition
In our network, the recurrent weights W^{rec} are computed as the elementwise product of two matrices: one that encodes the strength of the synaptic efficacies and is always non-negative, and one with a fixed sign determined by the neurotransmitter type of the presynaptic neuron (positive for excitatory, negative for inhibitory neurons).
To enforce Dale's law during optimization, we set any negative values of the efficacy matrix to zero at each iteration, as in Bellec et al. (2018a). Similarly, to keep inhibitory connections local during optimization, we zero out any changes in the synaptic efficacies of across-area inhibitory connections at each iteration. In the simplified model variants, Dale's law or the local-inhibition constraint is lifted by omitting the corresponding correction step.
The success of the network optimization depends highly on the initialization of the recurrent weight matrices. To initialize the signed matrices, we follow the theoretical (Rajan and Abbott, 2006) and practical (Bellec et al., 2018b; Cornford et al., 2020) insights developed previously. After defining the constraints on the weight signs, the initialization amplitudes are adjusted so that each target neuron receives zero-summed input weights (the sum of incoming excitatory weights equals the sum of incoming inhibitory weights). The weight amplitudes are then re-normalized by the modulus of the largest eigenvalue of W^{rec}, so that all eigenvalues of W^{rec} have modulus 1 or smaller.
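A sketch of the two operations described above (the per-iteration correction step and the spectral re-normalization) is given below. Matrix names are illustrative, and the sketch zeroes the across-area inhibitory efficacies directly, which is equivalent to zeroing their updates when they are initialized at zero.

```python
import numpy as np

def project_dale_and_local(efficacies, sign, local_mask):
    """Correction step applied after each gradient update (illustrative).

    efficacies: non-negative synaptic strengths, shape (post, pre);
    sign: +1/-1 per presynaptic neuron (Dale's law), broadcast over columns;
    local_mask: 0 for across-area connections of inhibitory neurons, 1 otherwise.
    """
    efficacies = np.clip(efficacies, 0.0, None)      # keep efficacies non-negative
    efficacies = efficacies * local_mask             # no long-range inhibition
    return efficacies * sign                         # signed recurrent weights

def normalize_spectral_radius(W_rec):
    """Rescale the weights so the largest eigenvalue modulus is at most 1."""
    radius = np.max(np.abs(np.linalg.eigvals(W_rec)))
    return W_rec / max(radius, 1.0)
```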
Stopping criterion for the optimization
For the synthetic dataset, we train the models for 4000 gradient descent steps. For the real dataset, due to limited data and a noisy test set, we select the final model based on the optimization step that yields the best trial-type accuracy (closest to the trial-type accuracy from the data), derived from the jaw trace and whisker stimulus, along with the highest trial-matched Pearson correlation between the model and the recordings.
Sparsity regularization
There is a plethora of ways to enforce sparsity. In this work, we use weight regularization; in particular, we use the L_{1/2} norm of the recurrent and input weights, which promotes a high level of sparsity (Xu et al., 2012). To avoid numerical instabilities, we apply this regularization only to synaptic weights above a threshold α and prune all synapses below α (we set α = 10⁻⁷). The regularized loss function adds this penalty to ℒ, with a dedicated coefficient λ₃ weighting the across-area connections W^{across,d} (the connections from one area to the other).
For the synthetic dataset, we choose the level of across-area sparsity by performing a small grid search over λ₃. In particular, we pick the largest value of λ₃ for which the performance remains as good as without the sparsity penalty (see Suppl. Fig. S3). For the real dataset, we use the same value of λ₃ as the one found for the full reconstruction method of bioRNN1.
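A sketch of the pruning and penalty computation on the across-area weights, with illustrative names; the penalty follows the L_{1/2}-style regularization referenced above.

```python
import numpy as np

def sparsity_penalty(W_across, lambda3, alpha=1e-7):
    """Illustrative L1/2-style penalty restricted to across-area weights.

    Weights below the threshold alpha are pruned (set to zero) and excluded
    from the penalty to avoid numerical instabilities of the square root.
    """
    W = np.where(np.abs(W_across) < alpha, 0.0, W_across)
    penalty = lambda3 * np.sqrt(np.abs(W[np.abs(W) >= alpha])).sum()
    return W, penalty
```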
4.6 Perturbation test of in silico optogenetics
In systems neuroscience, a method to test causal interactions between brain regions uses spatially and temporally precise optogenetic activations or inactivations (Esmaeili et al., 2021; Guo et al., 2014). For cortical areas, inactivations usually refer to the strong activation of inhibitory neurons; these inhibitory neurons have strong intra-area connections that effectively “silence” their local neighborhood (Helmstaedter et al., 2009).
Our model can simulate these perturbations, which allows us to compare the causal mechanisms of two networks through their responses to optogenetic perturbations. We implement activations and inactivations as a strong input current to all neurons in the excitatory or inhibitory population of one area. For the RefCircs and the reconstructed RNNs, we use a transient current that lasts 40 ms, from 20 ms before to 20 ms after the input stimulus. We vary the strength of the current (light power) until it produces an effect in the full reconstruction method bioRNN1. For the synthetic dataset in Figure 2 (except panel D), we inject a current into excitatory neurons for activations and into inhibitory neurons for inactivations. For the real dataset, we perform optogenetic inactivations in three different periods. As in Esmaeili et al. (2021), we silence the cortical circuit during the whisker presentation, during the delay between the whisker and auditory stimuli, or while the animal was licking for the reward. Specifically, we apply transient currents to the inhibitory neurons (i) from 100 ms before to 100 ms after the whisker presentation, (ii) from 100 ms after the whisker presentation until 100 ms before the onset of the auditory cue, and (iii) from the auditory cue until the end of the simulation. For cases (i) and (ii), we linearly decrease the strength of the current to avoid rebound excitation. The light power is chosen so that the model best reproduces the lick probability of the recordings. Importantly, the perturbation data are not used to optimize the network; they only test whether the resulting network has the same causal interactions as the recordings.
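A minimal sketch of how such a perturbation current could be constructed is given below; the function name, time base, and amplitudes are illustrative, not the values used in the paper.

```python
import numpy as np

def opto_current(n_neurons, target_ids, t_on_ms, t_off_ms, amplitude,
                 T_ms=400, dt_ms=1.0, ramp_down=False):
    """Transient "light" current injected into a target population
    (excitatory neurons for activation, inhibitory neurons for inactivation)."""
    n_steps = int(T_ms / dt_ms)
    I = np.zeros((n_steps, n_neurons))
    on, off = int(t_on_ms / dt_ms), int(t_off_ms / dt_ms)
    profile = np.full(off - on, float(amplitude))
    if ramp_down:
        # Linearly decrease the current to avoid rebound excitation.
        profile = amplitude * np.linspace(1.0, 0.0, off - on)
    I[on:off, target_ids] = profile[:, None]
    return I

# Example: "inactivate" area A by driving its inhibitory neurons for 40 ms,
# from 20 ms before to 20 ms after a stimulus assumed to arrive at 200 ms.
# I_ext = opto_current(500, inh_ids_area_A, t_on_ms=180, t_off_ms=220, amplitude=2.0)
```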
For the RefCircs and bioRNNs, we evaluate the effect of the perturbations directly from the neural activity, using the distance of network dynamics ℒtrial to compare the two perturbed networks. For the real dataset, we compare the effect of the inactivations on behavior, i.e., whether the mouse/model licked. We classify the licking action with a multilayer perceptron with two hidden layers of 128 neurons each. The classifier is trained on the jaw movement of the real dataset (extracted from video filming with custom software) to predict the lick action (extracted from a piezo sensor placed on the spout); it predicts licks correctly 94% of the time. We then apply the same classifier to the jaw movement generated by the model to determine whether there was a “lick” or not.
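For concreteness, a sketch of such a classifier is shown below; the input dimensionality and training details are assumptions.

```python
import torch
import torch.nn as nn

class LickClassifier(nn.Module):
    """MLP with two hidden layers of 128 units, mapping a jaw-movement trace
    to the probability that the trial contains a lick (illustrative sketch)."""
    def __init__(self, n_timebins=200):   # length of the jaw trace (assumed)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_timebins, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, jaw):               # jaw: (batch, n_timebins)
        return torch.sigmoid(self.net(jaw)).squeeze(-1)

# Train on recorded jaw traces with piezo-derived lick labels (binary cross-entropy),
# then apply the same frozen classifier to jaw traces generated by the model.
```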
The loss ℒ trial measures the distance between the network dynamics in single trials.
For the synthetic dataset, we report the distance between activity statistics in area A during stimulation of area B. The column “light” reports the distance between RNN and RefCirc dynamics during the perturbation with the highest light amplitude; the column “no light” reports the same quantity when no stimulation is delivered. ± indicates the 95% confidence interval.
Modeling “optogenetic” perturbations.
A. Two different network hypotheses for implementing a detection task. In RefCirc1, area A projects to area B but not vice versa. In RefCirc2, the areas are recurrently connected. B. Raster plots of all neurons in RefCirc1 during a single hit trial under normal conditions (control, left) and under optogenetic perturbation of excitatory (middle) and inhibitory (right) neurons. The duration of the light stimulus is shown with blue shading. C. Same as B for RefCirc2. D. Trial-averaged activity of the two circuits during Hit (blue: RefCirc1; green: RefCirc2) and Miss (yellow: RefCirc1; red: RefCirc2) trials. A trial is classified as “Hit” if area A reaches a transient firing rate above 8 Hz, and as “Miss” otherwise. In the control case, the maximal difference between the trial-averaged activity of the two networks is below 0.51 Hz (zoom inset).
Fitting Reconstructed networks to the synthetic dataset.
A. Schematic representation of RefCirc1 and bioRNN1, and probability of hit trials. B. Histogram of the firing rate distribution of RefCirc1 and all the RNN1 versions. All RNN1 versions fit RefCirc1 well. C. Left: neuron loss of the different RNN1 variants. Right: trial-matching loss of the different RNN1 variants. The model trained without the trial-matching loss function performs considerably worse. The whiskers show the 95% confidence interval of the mean across trials. D-F. Same as A-C for RefCirc2 and the RNN2 variants.
Picking the sparsity level.
A. Grid search for the largest regularization strength λ3 that does not cause a drop in performance. As a performance measure, we use the trial-matching loss ℒtrial.
B. In contrast to Figure 2C, here we show the change of hit probability separately for RefCirc1 (left) and RefCirc2 (right).
Reconstruction of the real recordings.
A. Probability of hit trials for the different model variants. B. Histogram of the firing rate distribution from the real recordings and from all the variants. C. Top: neuron loss of the different model variants; all variants reach a similar loss value. Bottom: trial-matching loss of the different model variants; the model trained without the trial-matching loss function performs considerably worse. The whiskers show the 95% confidence interval.
A. Change of lick probability under inactivation of all areas in all the different temporal windows. We show Δp for the data and for the reconstruction model variants.
Acknowledgements
We thank Alireza Modirshanechi, Shuqi Wang, and Tâm Nguyen for their valuable feedback on the manuscript. We are grateful to Vahid Esmaeili for collecting the dataset and ongoing support throughout this project. This research is supported by the Sinergia project CR-SII5 198612, the Swiss National Science Foundation (SNSF) project 200020 207426 awarded to WG, SNSF projects TMAG-3 209271 and 31003A 182010 awarded to CP, and the Vienna Science and Technology Fund (WWTF) project VRG24-018 awarded to GB.
References
Adesnik H., Abdeladim L. (2021). Probing neural codes with two-photon holographic optogenetics. Nature Neuroscience 24:1356–1366.
Aravanis A. M., Wang L.-P., Zhang F., Meltzer L. A., Mogri M. Z., Schneider M. B., Deisseroth K. (2007). An optical neural interface: in vivo control of rodent motor cortex with integrated fiberoptic and optogenetic technology. Journal of Neural Engineering 4:S143.
Arthur B. J., Kim C. M., Chen S., Preibisch S., Darshan R. (2023). A scalable implementation of the recursive least-squares algorithm for training spiking neural networks. Frontiers in Neuroinformatics 17:1099510.
Azabou M., Arora V., Ganesh V., Mao X., Nachimuthu S., Mendelson M., Richards B., Perich M., Lajoie G., Dyer E. (2024). A unified, scalable framework for neural population decoding. Advances in Neural Information Processing Systems 36.
Bellec G., Kappel D., Maass W., Legenstein R. (2018a). Deep rewiring: Training very sparse deep networks. In: ICLR.
Bellec G., Salaj D., Subramoney A., Legenstein R., Maass W. (2018b). Long short-term memory and learning-to-learn in networks of spiking neurons. Advances in Neural Information Processing Systems 31.
Bellec G., Wang S., Modirshanechi A., Brea J., Gerstner W. (2021). Fitting summary statistics of neural data with a differentiable spiking network simulator. Advances in Neural Information Processing Systems 34.
Billeh Y. N., Cai B., Gratiy S. L., Dai K., Iyer R., Gouwens N. W., Abbasi-Asl R., Jia X., Siegle J. H., Olsen S. R., et al. (2020). Systematic integration of structural and functional data into multi-scale models of mouse primary visual cortex. Neuron 106:388–403.
Bohte S. M., Kok J. N., La Poutre H. (2002). Error-backpropagation in temporally encoded networks of spiking neurons. Neurocomputing 48:17–37.
Chen G., Scherr F., Maass W. (2022). A data-based large-scale model for primary visual cortex enables brain-like robust and versatile visual processing. Science Advances 8:eabq7592.
Cornford J., Kalajdzievski D., Leite M., Lamarquette A., Kullmann D. M., Richards B. (2020). Learning to live with Dale's principle: ANNs with separate excitatory and inhibitory units. bioRxiv:2020.11.02.364968.
Cowley B. R., Calhoun A. J., Rangarajan N., Ireland E., Turner M. H., Pillow J. W., Murthy M. (2024). Mapping model units to visual neurons reveals population code for social behaviour. Nature 629:1100–1108.
(2017). Multiplexed computations in retinal ganglion cells of a single type. Nature Communications 8:1964.
Dinc F., Shai A., Schnitzer M., Tanaka H. (2023). CORNN: Convex optimization of recurrent neural networks for rapid inference of neural dynamics. Advances in Neural Information Processing Systems 36.
Dubreuil A., Valente A., Beiran M., Mastrogiuseppe F., Ostojic S. (2022). The role of population structure in computations through neural dynamics. Nature Neuroscience 25:783–794.
Eccles J. C. (1976). From Electrical to Chemical Transmission in the Central Nervous System: The Closing Address of the Sir Henry Dale Centennial Symposium. Springer.
Elman J. L. (1990). Finding structure in time. Cognitive Science 14:179–211.
Esmaeili V., Tamura K., Muscinelli S. P., Modirshanechi A., Boscaglia M., Lee A. B., Oryshchuk A., Foustoukos G., Liu Y., Crochet S., et al. (2021). Rapid suppression and sustained activation of distinct cortical regions for a delayed sensory-triggered motor response. Neuron 109:2183–2201.
Fraile J. G., Scherr F., Ramasco J. J., Arkhipov A., Maass W., Mirasso C. R. (2023). Competition between bottom-up visual input and internal inhibition generates error neurons in a model of the mouse primary visual cortex. bioRxiv:2023.01.27.525984.
Grosenick L., Marshel J. H., Deisseroth K. (2015). Closed-loop and activity-guided optogenetic control. Neuron 86:106–139.
Guo Z. V., Li N., Huber D., Ophir E., Gutnisky D., Ting J. T., Feng G., Svoboda K. (2014). Flow of cortical activity underlying a tactile decision in mice. Neuron 81:179–194.
Harris J. A., Mihalas S., Hirokawa K. E., Whitesell J. D., Choi H., Bernard A., Bohn P., Caldejon S., Casal L., Cho A., et al. (2019). Hierarchical organization of cortical and thalamic connectivity. Nature 575:195–202.
Helmstaedter M., Sakmann B., Feldmeyer D. (2009). L2/3 interneuron groups defined by multiparameter analysis of axonal projection, dendritic geometry, and electrical excitability. Cerebral Cortex 19:951–962.
Hodgkin A. L. (1958). The Croonian Lecture: Ionic movements and electrical activity in giant nerve fibres. Proceedings of the Royal Society of London, Series B, Biological Sciences 148:1–37.
International Brain Laboratory, Benson B., Benson J., Birman D., Bonacchi N., Carandini M., Catarino J. A., Chapuis G. A., Churchland A. K., Dan Y., et al. (2023). A brain-wide map of neural activity during complex behaviour. bioRxiv:2023.07.04.547681.
Isbister J. B., Ecker A., Pokorny C., Bolaños-Puchet S., Santander D. E., Arnaudon A., Awile O., Barros-Zulaica N., Alonso J. B., Boci E., et al. (2023). Modeling and simulation of neocortical micro- and mesocircuitry. Part II: Physiology and experimentation. bioRxiv:2023.05.17.541168.
Kim C. M., Finkelstein A., Chow C. C., Svoboda K., Darshan R. (2023). Distributing task-related neural activity across a cortical network through task-independent connections. Nature Communications 14:2851.
Lappalainen J. K., Tschopp F. D., Prakhya S., McGill M., Nern A., Shinomiya K., Takemura S.-y., Gruntman E., Macke J. H., Turaga S. C. (2023). Connectome-constrained deep mechanistic networks predict neural responses across the fly visual system at single-neuron resolution. bioRxiv:2023.03.11.532232.
(2022). Axonal and dendritic morphology of excitatory neurons in layer 2/3 mouse barrel cortex imaged through whole-brain two-photon tomography and registered to a digital brain atlas. Frontiers in Neuroanatomy 15:791015.
Mahuas G., Isacchini G., Marre O., Ferrari U., Mora T. (2020). A new inference approach for training shallow and deep generalized linear models of noisy interacting neurons. Advances in Neural Information Processing Systems 33.
Markram H., Muller E., Ramaswamy S., Reimann M. W., Abdellah M., Sanchez C. A., Ailamaki A., Alonso-Nanclares L., Antille N., Arsever S., et al. (2015). Reconstruction and simulation of neocortical microcircuitry. Cell 163:456–492.
Markram H., Toledo-Rodriguez M., Wang Y., Gupta A., Silberberg G., Wu C. (2004). Interneurons of the neocortical inhibitory system. Nature Reviews Neuroscience 5:793–807.
Neftci E. O., Mostafa H., Zenke F. (2019). Surrogate gradient learning in spiking neural networks: Bringing the power of gradient-based optimization to spiking neural networks. IEEE Signal Processing Magazine 36:51–63.
Packer A. M., Russell L. E., Dalgleish H. W., Häusser M. (2015). Simultaneous all-optical manipulation and recording of neural circuit activity with cellular resolution in vivo. Nature Methods 12:140–146.
Pals M., Sağtekin A. E., Pei F., Gloeckler M., Macke J. H. (2024). Inferring stochastic low-rank recurrent neural networks from neural data. arXiv.
Pandarinath C., O'Shea D. J., Collins J., Jozefowicz R., Stavisky S. D., Kao J. C., Trautmann E. M., Kaufman M. T., Ryu S. I., Hochberg L. R., et al. (2018). Inferring single-trial neural population dynamics using sequential auto-encoders. Nature Methods 15:805–815.
Papagiakoumou E., Ronzitti E., Emiliani V. (2020). Scanless two-photon excitation with temporal focusing. Nature Methods 17:571–581.
(2008). Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature 454:995–999.
Pozzorini C., Mensi S., Hagens O., Naud R., Koch C., Gerstner W. (2015). Automated high-throughput characterization of single neurons by means of simplified spiking models. PLoS Computational Biology 11:e1004275.
Rajan K., Abbott L. F. (2006). Eigenvalue spectra of random matrices for neural networks. Physical Review Letters 97:188104.
Richards B. A., Kording K. P. (2023). The study of plasticity has always been about gradients. The Journal of Physiology 601:3141–3149.
(1960). Perceptual generalization over transformation groups. In: Self Organizing Systems. NY: Pergamon Press, pp. 63–96.
Schrimpf M., McGrath P., Margalit E., DiCarlo J. J. (2024). Do topographic deep ANN models of the primate ventral stream predict the perceptual effects of direct IT cortical interventions? bioRxiv:2024.01.09.572970.
Siegle J. H., Jia X., Durand S., Gale S., Bennett C., Graddis N., Heller G., Ramirez T. K., Choi H., Luviano J. A., et al. (2021). Survey of spiking in the mouse visual system reveals functional hierarchy. Nature 592:86–92.
Sourmpis C., Petersen C., Gerstner W., Bellec G. (2023). Trial matching: capturing variability with data-constrained spiking neural networks. Advances in Neural Information Processing Systems 36.
Spieler A., Rahaman N., Martius G., Schölkopf B., Levina A. (2023). The expressive leaky memory neuron: an efficient and expressive phenomenological neuron model can solve long-horizon tasks. In: The Twelfth International Conference on Learning Representations.
Staiger J. F., Petersen C. C. (2021). Neuronal circuits in barrel cortex for whisker sensory perception. Physiological Reviews 101:353–415.
Steinmetz N. A., Zatka-Haas P., Carandini M., Harris K. D. (2019). Distributed coding of choice, action and engagement across the mouse brain. Nature 576:266–273.
Tamamaki N., Tomioka R. (2010). Long-range GABAergic connections distributed throughout the neocortex and their possible function. Frontiers in Neuroscience 4:202.
(2022). The impact of neuron morphology on cortical network architecture. Cell Reports 39.
Urai A. E., Doiron B., Leifer A. M., Churchland A. K. (2022). Large-scale neural recordings call for new insights to link brain and behavior. Nature Neuroscience 25:11–19.
Valente A., Pillow J. W., Ostojic S. (2022). Extracting computational mechanisms from neural data using low-rank RNNs. Advances in Neural Information Processing Systems 35.
Wunderlich T. C., Pehle C. (2021). Event-based backpropagation can compute exact gradients for spiking neural networks. Scientific Reports 11:12829.
Xu Z., Chang X., Xu F., Zhang H. (2012). L1/2 regularization: A thresholding representation theory and a fast solver. IEEE Transactions on Neural Networks and Learning Systems 23:1013–1027.
Ye J., Collinger J., Wehbe L., Gaunt R. (2023). Neural Data Transformer 2: multi-context pretraining for neural spiking activity. Advances in Neural Information Processing Systems 36:80352–80374.
Article and author information
Author information
Christos Sourmpis
Laboratory of Computational Neuroscience, Brain Mind Institute, School of Computer and Communication Sciences and School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Laboratory of Sensory Processing, Brain Mind Institute, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Laboratory of Computational Neuroscience, Brain Mind Institute, School of Computer and Communication Sciences and School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Laboratory of Computational Neuroscience, Brain Mind Institute, School of Computer and Communication Sciences and School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, Machine Learning Research Unit, Technical University of Vienna (TU Wien), Vienna, Austria
You can cite all versions using the DOI https://doi.org/10.7554/eLife.106827. This DOI represents all versions, and will always resolve to the latest one.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.