# Abstract

Studying and understanding the code of large neural populations hinge on accurate statistical models of population activity. A novel class of models, based on learning to weigh sparse nonlinear Random Projections (RP) of the population, has demonstrated high accuracy, efficiency, and scalability. Importantly, these RP models have a clear and biologically-plausible implementation as shallow neural networks. We present a new class of RP models that are learned by optimizing the randomly selected sparse projections themselves. This “reshaping” of projections is akin to changing synaptic connections in just one layer of the corresponding neural circuit model. We show that Reshaped RP models are more accurate and efficient than the standard RP models in recapitulating the code of tens of cortical neurons from behaving monkeys. Incorporating more biological features and utilizing synaptic normalization in the learning process results in even more efficient and accurate models. Remarkably, these models exhibit homeostasis in firing rates and total synaptic weights of projection neurons. We further show that these sparse homeostatic reshaped RP models outperform fully connected neural network models. Thus, our new scalable, efficient, and highly accurate population code models are not only biologically-plausible but are actually optimized due to their biological features. These findings suggest a dual functional role of synaptic normalization in neural circuits: maintaining spiking and synaptic homeostasis while concurrently optimizing network performance and efficiency in encoding information and learning.

**eLife assessment**

This work is an **important** contribution to the development of a biologically plausible theory of statistical modeling of spiking activity. The authors **convincingly** implemented the statistical inference of input likelihood in a simple neural circuit, demonstrating the relationship between synaptic homeostasis, neural representations, and computational accuracy. This work will be of interest to neuroscientists, both theoretical and experimental, who are exploring how statistical computation is implemented in neural networks. There are questions about the performance of the methods in the case where other biologically significant parameters, such as firing rate and thresholds, are optimized together with the synaptic weights.

# Introduction

The potential “vocabulary” of spiking patterns of a population of neurons scales exponentially with the size of the population, and so mapping the rules of neural population codes and their semantic organization cannot rely on direct sampling of the vocabulary for more than a handful of neurons. Moreover, the stochastic nature of neural activity implies that the characterization of neural codes must rely on probability distributions over population activity patterns. Therefore, to describe and analyze the structure and content of the code with which neural circuits respond to stimuli, process information, and direct action – we must learn statistical models of their activity. Such models have been used to study neural population codes in different systems: Models of the directional coupling between neurons, such as Generalized Linear Models, have been used to replicate the stimulus-dependent rates of populations of tens of neurons (1–4). Maximum entropy models have accurately captured the joint activity patterns of more than 100 neurons, using simple statistical features of the population, like firing rates, pairwise correlations, synchrony, and other low-order statistics (5–13). These models have further been used to characterize the semantic organization of population codes (14, 15). Autoencoder models have been employed to replicate the detailed structure of population activity – yielding generative models that can be used to study the code, but their design is difficult to interpret (16–18). Importantly, scaling of these models to hundreds of neurons is computationally challenging (9, 19, 20), which has been a major challenge in modeling large neural systems.

While statistical models are invaluable for describing and studying neural codes, it is not clear whether the brain relies on such models or implements them when representing or processing information (21, 22). Consequently, much of the analysis of neural codes has focused on decoding population activity, typically using simple decoders (2, 23–27), or metrics over the structure of population activity patterns (14, 28, 29). Yet, if neural circuits do implement such statistical models, and in particular, ones that compute the likelihood of their inputs – this would present a realizable mechanism for real neural circuits to carry out Bayesian computation and decision making (30–32). Such network models are, therefore, of interest not only as a way to study neural codes, but also as a potential way for biological neural networks to implement efficient learning and overcome the credit assignment problem. In addition, they may be useful for improving learning in artificial neural networks using biological features (33–38).

Both structured architectural features of neural circuits and random connectivity patterns have been suggested to shape the computation carried out by neural circuits (30, 39–43). These computations rely on the nature of synaptic connectivity and the coupling between synapses in terms of how they change during learning. Competition mechanisms between synapses or other regularization mechanisms have also been suggested to be important components of computation and learning in artificial neural networks as well as in cortical circuits (44, 45). One such mechanism is the homeostatic scaling of synaptic plasticity, which has been observed *in vitro* and *in vivo* at the level of incoming synapses to a neuron and outgoing ones (46–49). This mechanism has been commonly attributed to the regulation of firing rates, while its computational implications remain mostly unclear, although they are of interest both computationally and mechanistically (50–54). A related computational feature has been presented by network models that include divisive normalization, suggested as an important component of computations performed by cortical circuits (55).

Here, we bring these ideas together to present a biologically inspired variant of a new family of statistical models for large neural population codes. Adding biological features to these population models enabled us to improve the models, and to explore designs that real neural circuits could employ to implement such models. Specifically, we expand the Random Projections (RP) model (30), which was shown to be highly accurate in recapitulating the detailed spiking patterns of more than 100 neurons in different neural systems. Importantly, in addition to being accurate and requiring small amounts of training data, these RP models can be readily implemented by a simple neural circuit model – suggesting how real neural circuits can learn a statistical model of their own inputs and compute the likelihood of the inputs. We show that we can make these models better by “reshaping” the randomly chosen sparse non-linear projections that they rely on, achieving highly accurate models using significantly fewer projections. We further show that reshaping of projections that incorporates normalization of synaptic weights during learning results in more accurate models that are also more efficient, and makes the models homeostatic in terms of neural activity and total synaptic weights. Thus, we present a new class of accurate and efficient statistical models for large neural population codes that also suggests a clear computational benefit of homeostatic synaptic normalization and its potential role in biological neural networks and artificial ones.

# Results

The Random Projections (RP) model is a class of highly accurate, scalable, and efficient statistical models of the joint activity patterns of large populations of neurons (30, 31). These models are based on random and sparse nonlinear functions, or “projections”, of the population: Given a recording of the spiking activity of a population of neurons, the model is a probability distribution over discrete activity patterns (quantized into small time bins, e.g. 10-20 ms), that relies on a set of random non-linear functions of the population activity,

$$f_i(\mathbf{x}) = \sigma\Big(\sum_j a_{ij} x_j - \theta_i\Big), \tag{1}$$

where *a*_{ij} are randomly sampled coefficients such that most of them for any *i* are zero (i.e., the set is sparse), *θ*_{i} are thresholds, and *σ*(*·*) is a nonlinear function (e.g., the Heaviside step function). The RP model is the maximum entropy distribution (56), which is consistent with the observed average values of the random projections *⟨f*_{i}*⟩*_{p} = *⟨f*_{i}*⟩*_{data} (see Methods). Thus, it is the least structured distribution that retains the average values of the projections, is mathematically unique, and is given by

$$p_{RP}(\mathbf{x}) = \frac{1}{Z} \exp\Big(\sum_i \lambda_i f_i(\mathbf{x})\Big), \tag{2}$$

where *λ*_{i} are Lagrange multipliers, and *Z* is a normalization factor or the “partition function”, which can be found numerically. Applied to cortical data from multiple areas (see, e.g., Figure 1A), this model proved to be highly accurate in predicting individual activity patterns, using small amounts of training data (30). Importantly, unlike many other statistical models of population activity, RP models have a simple, biologically plausible neural circuit that can implement them (30): Figure 1B shows such a feed-forward circuit with one intermediate layer and an output neuron, where the random coefficients of the sparse projections, *a*_{ij}, are the synaptic weights connecting the input neurons to an intermediate layer of neurons *f*_{i}. Each intermediate neuron implements one projection of the input population. The Lagrange multipliers, *λ*_{i}, are the synaptic weights connecting the intermediate layer to the output neuron, whose membrane potential or output gives the log-likelihood of the activity pattern **x**, up to a normalization factor.
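To make the circuit reading concrete, the following is a minimal NumPy sketch of an RP model for a very small population, where the partition function can be computed exactly by enumerating all 2^n patterns. It assumes the form *p*(**x**) ∝ exp(Σ_i *λ*_{i} *f*_{i}(**x**)) with Heaviside projections; the function names, the connection probability of 0.4, and the untrained random *λ*_{i} are illustrative assumptions, not the authors' code.

```python
import itertools
import numpy as np

def heaviside_projections(X, A, theta):
    """f_i(x) = H(sum_j a_ij x_j - theta_i) for every pattern (row) in X."""
    return (X @ A.T - theta > 0).astype(float)

def rp_log_prob(X, A, theta, lam):
    """Exact log p(x); Z is summed over all 2^n patterns, so keep n small."""
    n = A.shape[1]
    patterns = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    log_Z = np.logaddexp.reduce(heaviside_projections(patterns, A, theta) @ lam)
    return heaviside_projections(X, A, theta) @ lam - log_Z

rng = np.random.default_rng(0)
n_neurons, n_proj = 8, 12
mask = rng.random((n_proj, n_neurons)) < 0.4           # sparse connectivity (assumed density)
A = np.where(mask, rng.normal(1.0, 1.0, mask.shape), 0.0)
theta = np.ones(n_proj)
lam = rng.normal(0.0, 0.5, n_proj)                     # untrained weights, for illustration only

patterns = np.array(list(itertools.product([0, 1], repeat=n_neurons)), dtype=float)
log_p = rp_log_prob(patterns, A, theta, lam)
total_prob = np.exp(log_p).sum()                       # probabilities over all patterns sum to one
```

The quantity `heaviside_projections(X, A, theta) @ lam` plays the role of the output neuron's input in the circuit of Figure 1B. Exact enumeration is only feasible for a handful of neurons; for larger populations the normalization has to be estimated numerically.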

The model in eq. 2 harbors a duality between the projections, *f*_{i}, and their coefficients, *λ*_{i}: In the maximum entropy formalism of the model, the projections are randomly sampled and then fixed, and their corresponding weights, *λ*_{i}’s, are tuned to maximize the entropy and satisfy the constraints. Alternatively, we may consider the case of training the model by keeping the *λ*_{i}’s fixed and changing or tuning the projections *f*_{i} to maximize the likelihood. In the corresponding neural circuit, this would imply that we would learn a circuit that implements the statistical model by training the sparse set of synaptic connections, *a*_{ij}, which define the projections, instead of training the synapses that weigh the projections, *λ*_{i} (Figure 1B).

Notably, a variant of the RP model in which projections that were weighted by a low value of *λ*_{i} are pruned and replaced with new random projections proved to be more accurate than the original RP model, while using fewer projections (30). This procedure of pruning and replacement is a crude form of learning of the model through changing the projections, and finding more efficient ones. We, therefore, asked here whether instead of the heuristic pruning and replacement, we can directly learn more accurate and efficient models by tuning the projections.

## Reshaping random projections gives more accurate and compact models

We first learned a new class of RP models for populations of tens of cortical neurons from the prefrontal cortex of monkeys performing a visual classification task (57) by tuning their randomly selected projections. Specifically, given an initial draw of sparse projections, the random weights that define the projections, *a*_{ij}, are then changed to maximize the log-likelihood of the model, *L* (see Methods):

$$a_{ij} \leftarrow a_{ij} + \eta \frac{\partial L}{\partial a_{ij}}, \tag{3}$$

where *η* is the learning rate. We note that unlike the RP model presented in (30), here we used a sigmoid function for the nonlinearity of the projections,

$$\sigma(z) = \frac{1}{1 + e^{-\beta z}}, \tag{4}$$

where *β* sets the slope of the sigmoid. In this formulation, the model ranges from an independent model of the population for *β →* 0, to the original RP model of (30) for *β → ∞*. The rule for changing the projections (eq. 3) means that the specific set of inputs to each projection neuron is retained, but their relative weights are changed, and so the projections are “reshaped”. We focus henceforth on the case of all *λ*_{i} = 1, and so the Reshaped Random Projections (Reshaped RP) model is given by

$$p_{\text{Reshaped}}(\mathbf{x}) = \frac{1}{Z} \exp\Big(\sum_i \sigma\Big(\sum_j a_{ij} x_j - \theta_i\Big)\Big). \tag{5}$$

We compared the RP and the Reshaped RP models by quantifying their performance on the same set of initial projections. We first learned the RP model as in (30), using a Heaviside non-linearity for the projections, and RP models that used a sigmoid non-linearity, where both models used the same set of random projections, and found the latter models to be more accurate (see supp. Figure 1). We then learned Reshaped RP models in which we optimize the same initial projections while keeping *λ*_{i} = 1. We note that while in its maximum entropy formulation, the RP model is the unique solution to a convex optimization problem, the Reshaped RP models are not guaranteed to reach a global optimum. We also considered yet another class of models, in which the projections and the Lagrange multipliers *λ*_{i} are optimized simultaneously, similar to backpropagation-based learning used to train feed-forward neural networks (see Methods). Figure 1C shows an example of the accuracy of the sigmoid RP models, Reshaped Random Projections models, and backpropagation-based models in predicting the probability of individual activity patterns for one group of 20 neurons, recorded from the cortex of behaving monkeys (57). The activity patterns are predicted by the reshaped RP model to an accuracy that is within the sampling noise (denoted by the 99% confidence interval funnel), and is similar to the performance of the full backpropagation model. The standard RP model, in comparison, has many more patterns that are outside the 99% confidence interval funnel. We quantified the performance of the three classes of models by calculating the mean log-likelihood of the models over 100 groups of 50 neurons on held-out datasets, as a function of the number of projections that we used (Fig. 1D). The reshaped models outperform the RP ones for a low number of projections, whereas the performances of all three models converge to a similar value for a large number of projections.
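The reshaping procedure with all *λ*_{i} = 1 can be sketched for a toy population as plain gradient ascent on the mean log-likelihood, with the model expectation computed exactly by enumeration. This is a hedged illustration rather than the authors' implementation: it assumes the model form *p*(**x**) ∝ exp(Σ_i *f*_{i}(**x**)) with sigmoid projections, uses synthetic toy data, and omits momentum; all function names and parameter values are hypothetical.

```python
import itertools
import numpy as np

def sigmoid_projections(X, A, theta, beta):
    """f_i(x) = 1 / (1 + exp(-beta * (sum_j a_ij x_j - theta_i)))."""
    return 1.0 / (1.0 + np.exp(-beta * (X @ A.T - theta)))

def mean_log_likelihood(X, A, theta, beta, patterns):
    """Mean of log p(x) = sum_i f_i(x) - log Z over the rows of X (all lambda_i = 1)."""
    log_Z = np.logaddexp.reduce(sigmoid_projections(patterns, A, theta, beta).sum(axis=1))
    return sigmoid_projections(X, A, theta, beta).sum(axis=1).mean() - log_Z

def reshape_gradient(X, A, theta, beta, patterns, mask):
    """dL/da_ij = <df_i/da_ij>_data - <df_i/da_ij>_model, restricted to existing synapses."""
    F = sigmoid_projections(X, A, theta, beta)
    data_term = (beta * F * (1.0 - F)).T @ X / len(X)
    G = sigmoid_projections(patterns, A, theta, beta)
    log_w = G.sum(axis=1)
    p = np.exp(log_w - np.logaddexp.reduce(log_w))      # exact model probabilities
    model_term = (beta * G * (1.0 - G) * p[:, None]).T @ patterns
    return (data_term - model_term) * mask              # zero weights stay zero

rng = np.random.default_rng(1)
n_neurons, n_proj, beta, eta = 6, 10, 2.0, 0.05
patterns = np.array(list(itertools.product([0, 1], repeat=n_neurons)), dtype=float)
common = rng.random((2000, 1)) < 0.3                    # shared drive makes the toy data correlated
X = (rng.random((2000, n_neurons)) < 0.1 + 0.5 * common).astype(float)
mask = rng.random((n_proj, n_neurons)) < 0.5
A0 = np.where(mask, rng.normal(1.0, 1.0, mask.shape), 0.0)
theta = np.ones(n_proj)

A = A0.copy()
ll_before = mean_log_likelihood(X, A, theta, beta, patterns)
for _ in range(300):
    A += eta * reshape_gradient(X, A, theta, beta, patterns, mask)
ll_after = mean_log_likelihood(X, A, theta, beta, patterns)
```

Multiplying the gradient by `mask` is what keeps the set of inputs to each projection fixed while their relative weights change. Because the objective is not convex in *a*_{ij}, a run like this may end in a local optimum.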

To compare the “mechanistic” nature of these different models, we calculated the mean correlation between the projections within each model class, and the average values of each projection (where the average is over the population activity patterns), which correspond to the mean firing rates of the neurons in the intermediate layer. Interestingly, the firing rates of the neurons in the intermediate layer are considerably lower for the reshaped models, and this sparseness in activity becomes more pronounced as a function of the number of projections (Figure 1E). We further find that the correlations between the projections in the reshaped models are considerably lower compared to RP and backpropagation models (Figure 1F).
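The two summary statistics used here can be computed from a matrix of projection activations. The sketch below assumes the averages are taken over a set of activity patterns (rows) and that the correlation is the mean off-diagonal Pearson coefficient; the function name and the tiny example matrix are illustrative only.

```python
import numpy as np

def projection_statistics(F):
    """F has one row per activity pattern and one column per projection neuron."""
    rates = F.mean(axis=0)                               # mean "firing rate" of each projection
    C = np.corrcoef(F, rowvar=False)                     # pairwise Pearson correlations
    off_diagonal = C[~np.eye(C.shape[0], dtype=bool)]
    return rates, off_diagonal.mean()

# Two perfectly anti-correlated projections, each active half of the time.
F_example = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
rates, mean_corr = projection_statistics(F_example)
```

A projection that never changes its value has zero variance, so its correlations are undefined (`np.corrcoef` returns NaN for it); such projections would need to be excluded before averaging.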

Thus, the reshaped projection models suggest a way to learn more accurate models of population activity, by tuning of projections. These models are also more efficient, requiring fewer projections. These projections also have lower firing rates (i.e., reshaped projections use fewer spikes), and they are less correlated. Given their accuracy and efficiency, we next asked how adding biological features or constraints to a Reshaped RP circuit may affect its performance and efficiency.

## Normalized reshaping of random projections gives more accurate and efficient models

We studied the effect of adding two classes of biological features or constraints on the performance and nature of the Reshaped RP circuit model. The first constraint stems from the biophysical limits on individual synapses, and so we bound the maximal strength of individual synapses such that all synaptic weights are smaller in magnitude than a “ceiling” value: |*a*_{ij}| *< θ*. The other is a normalization of the synaptic weights during the reshaping, inspired by the synaptic re-scaling that has been observed experimentally (49), and divisive normalization of synaptic weights (44). We consider multiple mechanisms of this kind later, but begin here with fixing the total sum of the incoming synaptic strength of each projection such that Σ_{j} |*a*_{ij}| = *ϕ*. Thus, when the strength of one synapse increases (decreases), the strength of the rest of the incoming synapses decreases (increases) such that the total synaptic weight incoming into the projection is kept constant. We term this constraint “homeostatic synaptic normalization”. We emphasize that the notion of homeostatic mechanisms is commonly reserved for designating regulation processes that retain a functional property of neurons, whereas normalization of synaptic weights might seem more mechanistic than functional. But, as we show later, learning with synaptic normalization also regulates the firing rate of the projection neurons, and so, we use this name henceforth.
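The two constraints can be written as simple operations on the weight matrix. The sketch below is one plausible way to impose them, under the assumption that the constraint is re-applied after each reshaping step (the text does not specify how it is enforced); the function names are hypothetical, and clipping yields |*a*_{ij}| ≤ ceiling rather than a strict inequality.

```python
import numpy as np

def normalize_incoming(A, phi):
    """Rescale each projection (row) so that sum_j |a_ij| = phi."""
    row_sums = np.abs(A).sum(axis=1, keepdims=True)
    # Rows without any synapse are left untouched to avoid dividing by zero.
    scale = np.divide(phi, row_sums, out=np.ones_like(row_sums), where=row_sums > 0)
    return A * scale

def bound_weights(A, ceiling):
    """Clip every individual synapse to the range [-ceiling, ceiling]."""
    return np.clip(A, -ceiling, ceiling)

rng = np.random.default_rng(3)
mask = rng.random((4, 10)) < 0.5
A = np.where(mask, rng.normal(1.0, 1.0, mask.shape), 0.0)

A_homeostatic = normalize_incoming(A, phi=2.0)
A_bounded = bound_weights(A, ceiling=0.5)
```

Rescaling a row couples its synapses: if a gradient step strengthens one weight, re-normalizing shrinks all the others in that row, which is the redistribution described above. Clipping, in contrast, acts on each synapse independently.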

To compare the effect of these constraints, we used the same set of initial random projections, and then learned by reshaping them, each time with a different value of their corresponding parameters, *ϕ* or *θ*. We estimated the likelihood of each of the models on 100 groups of 50 neurons, over 100 random sets of 150 projections. To quantify the “synaptic budget” of each model, we measured the total sum of the absolute values of synaptic weights available to each model in units of the total synaptic strength of the initial set of projections (this is equivalent to defining the total sum of the synaptic weights of the initial set of projections as “1”, and then measuring total synaptic weights in these units). For the models with bounded synapses, the total available synaptic budget is given by the number of synapses times *θ*, whereas for the homeostatic constraint, it equals *ϕ* times the number of projections in the model. Figure 2B shows the log-likelihood of each model class vs. the total available synaptic budget of the different models: For a wide range of synaptic budgets, the homeostatic models outperform the bounded models, and only for very high values of available synaptic budget, the performance of the bounded models is on par with the homeostatic models.

The differences between the homeostatic normalization models and the bounded synaptic strength models are further reflected in Figure 2C, which shows the performance of each model class as a function of the total sum of synaptic weights that is used by that model at the end of the training, Σ_{ij} |*a*_{ij}|. We note that the curve of the homeostatic model is identical to the one from Figure 2B by definition; the curve of the bounded models shows that at a certain value of *θ* the sum of the synaptic weights starts to decrease and converges to the unconstrained reshaped model. The poor performance of the bounded models compared to the homeostatic ones suggests that the coupled changes in the synaptic weights improve learning. Specifically, during reshaping, the homeostatic models move synaptic “mass” from less important synapses to more important ones. This redistribution of resources results in accurate models even for relatively low values of synaptic weights – making them more efficient in terms of the total synaptic weight needed.

The dominance of the homeostatic learning over the bounded synaptic weights is clear not just for the average over models, but also at the level of individual models: Figure 2D shows the performance of the homeostatic and bounded models that are initialized with the same set of random projections; all bounded constraint models are inferior to the corresponding RP ones, whereas all the homeostatic constraint models are superior to the RP models (and clearly all the homeostatic models are superior to the corresponding bounded models).

We further find that the mean firing rates of the reshaped projection neurons, as well as the correlations between them, are lower in the homeostatic models compared to the bounded models (Figure 2E-F), making them more energetically efficient (in terms of spiking activity). We recall that this is consistent with the notions of efficient coding by decorrelated neural populations (58, 59).

## Normalized reshaping of random projections results in more efficient codes and homeostasis of firing rates

The experimental characterization of synaptic rescaling has shown it to be a homeostatic mechanism that regulates the firing rates of neurons (49). We therefore asked whether the synaptic normalization we employ for the Reshaped RP models has a similar effect. Figure 3A shows that the overall performance of the model in terms of capturing the population codebook is similar between the “free” reshape model and different values of synaptic normalization. Similarly, reshaping with normalization or without it drives the projection neurons to converge to similar average firing rate values (Figure 3B). However, the distribution of firing rates over the different neurons becomes narrower with tighter normalization values (Figure 3C). Importantly, while different normalization values imply very different initial firing rates of the projection neurons, after reshaping the values converge to similar average values (Figure 3D). Moreover, reshaping with normalization implies lower firing rates as well as smaller changes in the reshaping process (Figure 3E). Thus, normalized reshaping results in homeostatic regulation of the firing rates, which validates the naming of these models as homeostatic normalization reshaping of random projections.

Having established the computational benefits and efficiency of the homeostatic reshaped projection models that rely on synaptic normalization, we turned to ask how the connectivity itself, rather than the synaptic weights, may affect the performance of the models.

## Optimal sparseness of Reshaped Projections models under homeostatic constraints

The benefits of reshaping a given set of projections, reflected in the figures above, raise the question of the importance of the nature of the random projections we choose (which are then reshaped). We, therefore, asked how the initial random “wiring” of the projections affects the performance of the model, and whether non-random projections would result in even better models. To quantify the effects of the projections’ connectivity on the performance and efficiency of reshaped models, we used simulated population activity that we generated using RP models that were trained on real data. By using synthetic data that was generated by a known model, we can compare the learned models to the “ground truth” in terms of connectivity, as well as extensively sample activity patterns from the model.

We learned homeostatic reshaped models for the synthetic data, using different initial connectivity structures (Figure 4A-B): (i) A “true” connectivity model in which we reshaped a random projections model that has the same connectivity as the projections of the model that generated the data. (ii) A random connectivity model in which we reshaped projections with sparse connectivity that is randomly sampled and is independent of the model that generated the synthetic data. (iii) A full connectivity model in which we reshaped random projections with full connectivity, i.e., all input neurons are connected to all the projections, but with random initial weights. We carried out homeostatic reshaping of the projections in all three models with different values of *ϕ*. Surprisingly, the true and random connectivity models performed very similarly (Figure 4C). Although the full connectivity model contains the “ground truth” connectivity, and could recreate the true connectivity by canceling out unnecessary synapses during reshaping – we find that the full connectivity models are inferior to the other models, except for the case of high model costs.

The mean correlations between projections at the end of reshaping and the mean firing rates of the models that use the true and random connectivity were also very similar (Figure 4D-E), whereas the full connectivity models showed, again, very different behavior. These results reflect another computational benefit of homeostatic reshaping: there is no need to know the optimal circuit connectivity, and there is no apparent benefit to all-to-all connectivity, which would be expensive in terms of the energetic cost, the space needed, and the biological construction. Thus, starting from random connectivity and optimizing the circuit under homeostatic constraints seems to provide optimal results.

Given the inefficiency of the fully connected reshaped projections model, we also quantified the effect of the sparseness of the projections on reshaped RP models. We recall that for the standard RP model, sparse projections were optimal for a wide range of network sizes (30), and so we measured the performance of homeostatic reshaped RP models for different values of in-degree of the projections, while keeping the total synaptic budget of the models fixed. We found that different synaptic budgets have a different optimal in-degree (Figure 4F), and that the value of the optimal in-degree seems to grow with the total synaptic budget.

We further estimated the efficiency of the models by the synaptic cost per connection in the projections (Figure 4G). We find that curves for different total synaptic costs seem to coincide and have a similar peak value – suggesting an optimal ratio between the total available resources and the number of synapses.

## Different homeostatic mechanisms for reshaping random projections models result in different projection sets

We explored two other forms of synaptic normalization rules for the reshaping of projections (Figure 5A). In the first, we fixed and normalized the outgoing synapses from each neuron, such that Σ_{i} |*a*_{ij}| = *ϕ*. In the second, we kept the total synaptic weight of the whole circuit fixed, namely, Σ_{ij} |*a*_{ij}| = *ϕ*. Figure 5B shows that the performance of the models that use these other homeostatic mechanisms is surprisingly similar in terms of the model’s likelihood over the test data, as well as the firing rates of the projection neurons (Supp. Figure 2A), and correlations between them (Supp. Figure 2B).
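The two alternative rules differ from the incoming-synapse rule only in which sum of absolute weights is held fixed. A minimal sketch, assuming the weight matrix has one row per projection and one column per input neuron (function names are hypothetical):

```python
import numpy as np

def normalize_outgoing(A, phi):
    """Fix the outgoing weights of each input neuron (column): sum_i |a_ij| = phi."""
    col_sums = np.abs(A).sum(axis=0, keepdims=True)
    scale = np.divide(phi, col_sums, out=np.ones_like(col_sums), where=col_sums > 0)
    return A * scale

def normalize_total(A, phi):
    """Fix the total synaptic weight of the whole circuit: sum_ij |a_ij| = phi."""
    total = np.abs(A).sum()
    return A * (phi / total) if total > 0 else A

A = np.array([[1.0, -2.0, 0.0],
              [3.0,  0.0, 0.0],
              [0.0,  2.0, 0.0]])
A_out = normalize_outgoing(A, phi=1.0)   # columns 0 and 1 are rescaled; the empty column is unchanged
A_tot = normalize_total(A, phi=4.0)
```

Under the global rule, synaptic weight can move between projections as well as within them, whereas the row-wise and column-wise rules only couple synapses that share a postsynaptic or presynaptic neuron, respectively.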

As the homeostatic reshaping of random projections proved to be similarly accurate and efficient for the three homeostatic model variants, we asked which features of normalized reshaping might differentiate between homeostatic models in terms of their performance. Since each projection defines a hyperplane in the space of population activity patterns, reshaping can be interpreted as a rotation or a change of the angle of these hyperplanes, depicted schematically in Figure 5C. We, therefore, compared the different homeostatic variants of the reshaped projections models by initializing them from the same set of random projections, and evaluating the corresponding rotation angles, *θ*, of all of the projections due to the reshaping. Figure 5D shows an example of the rotations of the same initial projections for one model under different reshaping constraints, highlighting the substantial differences between them.
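One way to quantify such a rotation is the angle between the weight vector of each projection before and after reshaping, since that vector is the normal of the corresponding hyperplane. The sketch below takes this definition as an assumption (the text does not give the exact formula), and the function name is illustrative.

```python
import numpy as np

def rotation_angles(A_initial, A_reshaped):
    """Angle in degrees between each row of A_initial and the same row of A_reshaped."""
    dots = (A_initial * A_reshaped).sum(axis=1)
    norms = np.linalg.norm(A_initial, axis=1) * np.linalg.norm(A_reshaped, axis=1)
    cosines = np.clip(dots / norms, -1.0, 1.0)           # guard against rounding outside [-1, 1]
    return np.degrees(np.arccos(cosines))

A_initial = np.array([[1.0, 0.0], [1.0, 0.0], [2.0, 2.0]])
A_reshaped = np.array([[0.0, 3.0], [1.0, 1.0], [1.0, 1.0]])
angles = rotation_angles(A_initial, A_reshaped)          # approximately 90, 45 and 0 degrees
```

The angle is invariant to rescaling a projection's weights, so it isolates the change in the relative weights from any change in total synaptic strength.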

Figure 5E shows the mean rotation angle over 100 homeostatic models as a function of synaptic cost – reflecting that the different forms of homeostatic regulation result in different reshaped projections. Supp. Figure 2C-D shows the histogram of the rotation angles of several homeostatic models, as well as the unconstrained reshape model.

# Discussion

We presented a new family of statistical models for large neural populations that is based on sparse and random non-linear projections of the population, which are adapted during learning. This new family of models proved to be more accurate than the highly accurate Random Projections class of models, using fewer projections and incurring a lower “synaptic cost” in terms of the total sum of synaptic weights of the model. Moreover, we found that reshaping of the projections under homeostatic synaptic normalization gave even more accurate and efficient models in terms of synaptic weights of the neural circuit that implements the model, and was optimal for random and sparse initial connectivity, surpassing fully connected network models. The synaptic normalization mechanism resulted in homeostatic regulation of the firing rates of neurons in the model.

Our results suggest a computational role for the experimentally observed scaling or normalization of synapses during learning: In addition to “regularizing” the firing rates in neural circuits, in our Reshaped RP models, homeostatic plasticity optimizes the performance of network models and their efficiency in scenarios of limited resources and random connectivity. Moreover, the similarity of the performance of models that use different homeostatic synaptic mechanisms suggests a possible universal role for homeostatic mechanisms in computation.

We note that while homeostatic synaptic scaling regulates the firing rates of neurons (49), it is not immediately clear what “sets” the desired firing rate of each neuron. The synaptic normalization constraints we used here offer a simple solution: a universal value of the total incoming synaptic weights for the neurons in the circuit (or outgoing ones) results in widely distributed firing rates of neurons (which may change considerably during learning) that nevertheless converge to a similar average value. Thus, rather than requiring some mechanism to define and balance the firing rates of individual neurons, our model suggests a single global synaptic feature that would set this for the random projections.

The shallowness of the circuit implementation of the Reshaped Random Projections model implies that the learning of these models does not require the backpropagation of information over many layers, which distinguishes deep artificial networks from biological ones. Moreover, the locality of the reshaping process itself points to the feasibility of this model in terms of real biological circuits. The biological plausibility is further supported by the robustness of the model to the specific connectivity used for the reshaped models, and to the specific choice of the homeostatic mechanism we used.

A key remaining issue for the biological feasibility of the RP family of models is the feedback signal from the readout neuron to the intermediate neurons. The noise-dependent learning mechanism for RP models presented in (30), and other local feedback and synaptic learning mechanisms that approximate backpropagation (35), offer clear directions for future study. Our results may also be relevant for learning in artificial neural networks, whose training relies on nonconvex approaches that necessitate different regularization techniques (60). The homeostatic mechanism we focused on here is a form of “hard” L1 regularization, but on the sum of the weights. This approach limits the search space, compared to regularization over the weights themselves, but defines coupled changes in weights, in a manner highly effective for the cortical data we studied. We, therefore, hypothesize that homeostatic normalization may be beneficial for artificial architectures (see, e.g., (37)).

# Materials and Methods

## Experimental Data

Extra-cellular recordings were performed using Utah arrays from populations of neurons in the prefrontal cortex of macaque monkeys performing a direction discrimination task with random dots. For more details see (57).

## Data Pre-processing

Neural activity was discretized using 20 ms bins, such that in each time bin a neuron was active (‘1’) if it emitted a spike in that bin and silent (‘0’) if not. Recorded data was split randomly into training sets and held-out test sets: 100 different random splits were generated for each model setup, consisting of 160,000 samples in the training set and 40,000 in the test set.
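The binning and random splitting can be sketched as follows. The input format (one array of spike times in milliseconds per neuron), the function names, and the toy numbers are assumptions made for illustration; only the 20 ms bin size and the binary coding come from the text.

```python
import numpy as np

def binarize_spikes(spike_times_ms, duration_ms, bin_ms=20):
    """Return a (time bins x neurons) array with 1 where a neuron spiked in that bin."""
    n_bins = int(duration_ms // bin_ms)
    X = np.zeros((n_bins, len(spike_times_ms)), dtype=np.uint8)
    for j, times in enumerate(spike_times_ms):
        idx = (np.asarray(times, dtype=float) // bin_ms).astype(int)
        idx = idx[(idx >= 0) & (idx < n_bins)]           # drop spikes outside the recording window
        X[idx, j] = 1                                    # several spikes in one bin still give '1'
    return X

def random_split(X, n_train, rng):
    """Randomly assign time bins to a training set and a held-out test set."""
    order = rng.permutation(len(X))
    return X[order[:n_train]], X[order[n_train:]]

spikes = [[3.0, 7.5, 41.0], [25.0], []]                  # toy spike times for three neurons
X = binarize_spikes(spikes, duration_ms=100)
train, test = random_split(X, n_train=4, rng=np.random.default_rng(0))
```

Calling `random_split` with different seeds would give the different random splits; for the dataset sizes quoted above, `n_train` would be 160,000 out of 200,000 bins.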

## Constructing Sparse Random Projections

Following (30), the coefficients *a*_{ij} of the random projections were set using a two-stage process. First, the connectivity of the projections was set such that the average in-degree of the projections matched a predetermined sparsity value: each input neuron was connected to each projection with probability *p* = *indegree/n*, where *n* is the number of neurons in the input layer. The corresponding *a*_{ij} coefficients were then sampled from a Gaussian distribution, *a*_{ij} *∼ N* (1, 1), and the remaining *a*_{ij} values were set to zero. The threshold of each projection, *θ*_{i}, was set to 1.

The average in-degree of sparse models used here was 5, unless specified otherwise in the text. For the fully connected models *indegree* = *n* (i.e., sparsity=0).
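The two-stage construction above can be sketched as follows. This is an illustrative sketch under the stated sampling rules; the function name and the choice to return the connectivity mask are assumptions made for the example.

```python
import numpy as np

def sparse_random_projections(n_proj, n_neurons, indegree, rng):
    """Sample sparse projection weights a_ij and thresholds theta_i.

    Stage 1: each input neuron connects to each projection with
             probability p = indegree / n_neurons.
    Stage 2: connected weights are drawn from N(1, 1); all others are zero.
    """
    p = indegree / n_neurons
    mask = rng.random((n_proj, n_neurons)) < p          # connectivity pattern
    weights = rng.normal(loc=1.0, scale=1.0, size=mask.shape)
    a = np.where(mask, weights, 0.0)
    theta = np.ones(n_proj)                             # all thresholds set to 1
    return a, theta, mask
```

Setting `indegree = n_neurons` gives `p = 1` and therefore a fully connected model; the returned `mask` records which synapses exist, which is useful when only existing connections are later optimized.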

## Training RP models

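Given empirical data **X** and a set of projections defined by *a*_{ij}, we train the RP models by searching for the parameters *λ*_{i} that maximize the log-likelihood of the model given the data, arg max_{λi} (*L*(**X**)), where *L*(**X**) = *⟨*log *p*_{RP}(**x**)*⟩*_{X}. This is a convex optimization problem (the log-likelihood is concave in the *λ*_{i}), and the gradient is given by

∂*L*/∂*λ*_{i} = *⟨f*_{i}*⟩*_{X} − *⟨f*_{i}*⟩*_{pRP}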

We found the values *λ*_{i} that maximize the log-likelihood using gradient descent with momentum or the ADAM algorithm. We computed the empirical expectation *⟨f*_{i}*⟩*_{X} by averaging over the training data, and the expectation over the probability model, *⟨f*_{i}*⟩*_{pRP}, by averaging over synthetic data generated from *p*_{RP} using Metropolis–Hastings sampling.
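To make the training loop concrete, the sketch below fits the *λ*_{i} for a very small population. It deliberately swaps in simpler ingredients than those described above, and all of them are assumptions made for illustration: the model expectation is computed by exact enumeration of all 2^n patterns instead of Metropolis–Hastings sampling, plain gradient ascent replaces momentum or ADAM, the projection nonlinearity is taken to be a hard threshold, and the model is written as *p*(**x**) ∝ exp(Σ_i *λ*_{i} *f*_{i}(**x**)).

```python
import itertools
import numpy as np

def features(X, a, theta):
    """Assumed nonlinearity: f_i(x) = 1 if sum_j a_ij x_j > theta_i, else 0."""
    return (X @ a.T > theta).astype(float)

def model_marginals(lam, F_all):
    """<f_i> under p(x) proportional to exp(sum_i lam_i f_i(x)), by enumeration."""
    logits = F_all @ lam
    p = np.exp(logits - logits.max())     # subtract max for numerical stability
    p /= p.sum()
    return p @ F_all

def fit_lambdas(X_train, a, theta, lr=0.5, n_steps=5000):
    """Gradient ascent on the log-likelihood; feasible only for small n."""
    n = X_train.shape[1]
    all_x = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    F_all = features(all_x, a, theta)                    # features of every pattern
    f_data = features(np.asarray(X_train, dtype=float), a, theta).mean(axis=0)
    lam = np.zeros(a.shape[0])
    for _ in range(n_steps):
        # gradient of the mean log-likelihood: <f_i>_X - <f_i>_p
        lam += lr * (f_data - model_marginals(lam, F_all))
    return lam, f_data, model_marginals(lam, F_all)
```

For populations of tens of neurons, enumeration is infeasible and the model expectation has to be estimated from samples, as described in the text.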

For each of the empirical marginals *⟨f*_{i}*⟩*_{X}, we used the Clopper–Pearson method to estimate the distribution of possible values of the true marginal given the empirical observation. We set the convergence threshold of the numerical solver such that each marginal of the model distribution falls within a confidence interval of one SD, under this distribution, around its empirical marginal.
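A Clopper–Pearson interval for a single marginal can be computed from the binomial tail probabilities, as in the following standard-library sketch. Reading "one SD" as the central 68.27% coverage of a Gaussian is an assumption of this example, as are the function names; the source does not specify the exact convention.

```python
import math

ONE_SD_COVERAGE = math.erf(1 / math.sqrt(2))   # about 0.6827 (assumed convention)

def binom_cdf(k, n, p):
    """P(K <= k) for K ~ Binomial(n, p), with each term evaluated in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def _bisect(f, iters=60):
    """Root of an increasing function f on [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clopper_pearson(k, n, coverage=ONE_SD_COVERAGE):
    """Exact interval for a proportion, given k active bins out of n."""
    alpha = 1.0 - coverage
    # lower bound solves P(K >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _bisect(
        lambda p: (1.0 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # upper bound solves P(K <= k | p) = alpha / 2
    upper = 1.0 if k == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper
```

A convergence check in this spirit would stop the solver once every model marginal lies inside the interval returned for its empirical count. The explicit sum in `binom_cdf` is kept for transparency and is slow for training sets of the size used here.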

## Reshaping RP models

Given empirical data **X**, we optimize the RP models by modifying the coefficients *a*_{ij} such that the log-likelihood of the model is maximized, arg max_{aij} (*L*(**X**)). Starting from an initial set of projections and using the update rule of equation 3, we optimize the projections by applying the gradient descent with momentum algorithm. Importantly, only the non-zero elements of the initial projections are optimized.
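The restriction to existing connections can be expressed with a fixed connectivity mask, as in the sketch below. The gradient of the log-likelihood with respect to *a*_{ij} is taken as an input (`grad`) and is not derived here; the function name, learning rate, and momentum value are assumptions made for illustration.

```python
import numpy as np

def reshape_step(a, grad, velocity, mask, lr=0.01, momentum=0.9):
    """One momentum step that increases the log-likelihood.

    grad: dL/da_ij, supplied by the caller (hypothetical input).
    mask: boolean array, True where a synapse existed in the initial
          projections; entries outside the mask are never updated.
    """
    velocity = momentum * velocity + lr * grad * mask
    return a + velocity, velocity
```

The mask is computed once from the initial projections, for example `mask = a0 != 0`, and the velocity is initialized to zeros, so absent synapses stay at exactly zero throughout the optimization.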

## Optimizing backpropagation models

Full backpropagation models are optimized using the learning rules of the trained RP models and the reshaped models simultaneously in each gradient descent step, i.e., eqs. 3 and 6.

## Homeostatic reshaping of RP models

The homeostatic RP models are reshaped as follows: we first define a set of unconstrained projections whose coefficients *ã*_{ij} are randomly sampled. Each of the projections is then normalized homeostatically, such that the *a*_{ij} are a function of this unconstrained set: *a*_{ij} = *ϕ · ã*_{ij}*/Σ*_{k} |*ã*_{ik}|, where *ϕ* is the available synaptic budget for each projection. We then optimize *ã*_{ij} to maximize the log-likelihood of the model given the empirical data **X**: arg max_{ãij} (*L*(**X**)). The computed constrained projections *a*_{ij} are then used in the resulting homeostatic RP model.
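The normalization and the chain rule it implies can be sketched as follows. The second function is a worked derivation added for illustration, not an equation from the text: differentiating *a*_{ik} = *ϕ ã*_{ik}/*S*_{i}, with *S*_{i} = *Σ*_{k} |*ã*_{ik}|, gives ∂*L*/∂*ã*_{ij} = (*ϕ*/*S*_{i}) [*G*_{ij} − sign(*ã*_{ij}) *Σ*_{k} *G*_{ik} *a*_{ik}/*ϕ*], where *G*_{ij} = ∂*L*/∂*a*_{ij} is supplied by the caller. Function names are hypothetical.

```python
import numpy as np

def _row_budget(a_tilde):
    """Sum of absolute weights per projection (rows of zeros are left as is)."""
    S = np.abs(a_tilde).sum(axis=1, keepdims=True)
    return np.where(S > 0, S, 1.0)

def homeostatic_normalize(a_tilde, phi):
    """a_ij = phi * a~_ij / sum_k |a~_ik|: each projection's weights sum to phi."""
    return phi * a_tilde / _row_budget(a_tilde)

def grad_wrt_unconstrained(a_tilde, grad_a, phi):
    """Map dL/da (grad_a) to dL/da~ through the normalization (chain rule)."""
    S = _row_budget(a_tilde)
    a = phi * a_tilde / S
    inner = (grad_a * a).sum(axis=1, keepdims=True) / phi
    return (phi / S) * (grad_a - np.sign(a_tilde) * inner)
```

Because rescaling a row of *ã* leaves *a* unchanged, the resulting gradient is orthogonal to each row of *ã*, so weight changes within a projection are coupled: increasing one synapse necessarily reduces the relative share of the others. For sparse projections, the result should still be multiplied by the connectivity mask so that absent synapses are not created.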

## Bounded reshaping of RP models

Similar to reshaping homeostatic RP models, we define a set of unconstrained projections *ã*_{ij}, where the projections are a function of this unconstrained set: *a*_{ij} = min (max (*ã*_{ij}, *−θ*), *θ*), where *θ* is the “ceiling” value of each synapse.
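The bounded variant amounts to clipping each unconstrained weight, as in this short sketch. The parameter is named `ceiling` here to keep it distinct from the projection thresholds; the gradient rule in the second function (no gradient through saturated synapses) follows from differentiating the clip and is an illustration rather than a rule stated in the text.

```python
import numpy as np

def bounded_projections(a_tilde, ceiling):
    """a_ij = min(max(a~_ij, -ceiling), ceiling)."""
    return np.clip(a_tilde, -ceiling, ceiling)

def grad_wrt_unconstrained_bounded(a_tilde, grad_a, ceiling):
    """Pass dL/da only through synapses that are strictly inside the bounds."""
    return grad_a * (np.abs(a_tilde) < ceiling)
```

Unlike the homeostatic normalization, each synapse is constrained independently here, so a change in one weight does not affect the others.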

## Generating synthetic data from RP models with known connectivity

Synthetic neural activity patterns were obtained by training RP models on real neural recordings as described above and then generating data from these models using Metropolis-Hastings sampling.
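A single-bit-flip Metropolis–Hastings sampler for a model of this form can be sketched as follows. The hard-threshold nonlinearity, the sign convention *p*(**x**) ∝ exp(Σ_i *λ*_{i} *f*_{i}(**x**)), and the burn-in and thinning values are assumptions made for the example, not details taken from the text.

```python
import numpy as np

def log_weight(x, a, theta, lam):
    """Unnormalized log-probability: sum_i lam_i f_i(x), f_i a hard threshold."""
    return float(lam @ (a @ x > theta))

def metropolis_hastings(a, theta, lam, n_samples, rng, burn_in=1000, thin=10):
    """Draw binary activity patterns by proposing single-neuron flips."""
    n = a.shape[1]
    x = rng.integers(0, 2, size=n).astype(float)
    logp = log_weight(x, a, theta, lam)
    samples = []
    for t in range(burn_in + n_samples * thin):
        j = rng.integers(n)                      # pick one neuron at random
        x_new = x.copy()
        x_new[j] = 1.0 - x_new[j]                # flip its state
        logp_new = log_weight(x_new, a, theta, lam)
        # symmetric proposal: accept with probability min(1, p_new / p_old)
        if rng.random() < np.exp(min(0.0, logp_new - logp)):
            x, logp = x_new, logp_new
        if t >= burn_in and (t - burn_in) % thin == 0:
            samples.append(x.copy())
    return np.array(samples, dtype=np.uint8)
```

Because the flip proposal is symmetric, the acceptance ratio depends only on the difference in unnormalized log-probability, so the partition function never needs to be computed.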

# Acknowledgements

We thank Adam Haber, Tal Tamir, Udi Karpas, and the rest of the Schneidman lab members for discussions, comments, and ideas. This work was supported by Simons Collaboration on the Global Brain grant 542997 (ES), Israel Science Foundation grant 137628 (ES), Israeli Council for Higher Education/Weizmann Data Science Research Center (ES), Martin Kushner Schnur, and Mr. & Mrs. Lawrence Feis. ES is the incumbent of the Joseph and Bessie Feinberg Chair.

# References

- 1. A Point Process Framework for Relating Neural Spiking Activity to Spiking History, Neural Ensemble, and Extrinsic Covariate Effects. *Journal of Neurophysiology* **93**:1074–1089. https://doi.org/10.1152/jn.00697.2004
- 2. Spatio-temporal correlations and visual signalling in a complete neuronal population. *Nature* **454**:995–999. https://doi.org/10.1038/nature07140
- 3. A Generalized Linear Model for Estimating Spectrotemporal Receptive Fields from Responses to Natural Sounds. *PLoS ONE* **6**. https://doi.org/10.1371/journal.pone.0016104
- 4. Disentangling the functional consequences of the connectivity between optic-flow processing neurons. *Nature Neuroscience* **15**:441–448. https://doi.org/10.1038/nn.3044
- 5. Weak pairwise correlations imply strongly correlated network states in a neural population. *Nature* **440**:1007–1012. https://doi.org/10.1038/nature04701
- 6. The Structure of Multi-Neuron Firing Patterns in Primate Retina. *Journal of Neuroscience* **26**:8254–8266. https://doi.org/10.1523/JNEUROSCI.1282-06.2006
- 7. A Maximum Entropy Model Applied to Spatial and Temporal Correlations from Cortical Networks In Vitro. *Journal of Neuroscience* **28**:505–518. https://doi.org/10.1523/JNEUROSCI.3359-07.2008
- 8. Searching for Collective Behavior in a Large Network of Sensory Neurons. *PLoS Computational Biology* **10**. https://doi.org/10.1371/journal.pcbi.1003408
- 9. Sparse low-order interaction network underlies a highly correlated and learnable neural population code. *Proceedings of the National Academy of Sciences* **108**:9679–9684. https://doi.org/10.1073/pnas.1019641108
- 10. Prediction of Spatiotemporal Patterns of Neural Activity from Pairwise Correlations. *Physical Review Letters* **102**. https://doi.org/10.1103/PhysRevLett.102.138101
- 11. Sparse coding and high-order correlations in fine-scale cortical networks. *Nature* **466**:617–621. https://doi.org/10.1038/nature09178
- 12. Stimulus-dependent Maximum Entropy Models of Neural Population Codes. *PLOS Computational Biology* **9**. https://doi.org/10.1371/journal.pcbi.1002922
- 13. Collective Behavior of Place and Non-place Neurons in the Hippocampal Network. *Neuron* **96**:1178–1191. https://doi.org/10.1016/j.neuron.2017.10.027
- 14. A thesaurus for a neural population code. *eLife* **4**. https://doi.org/10.7554/eLife.06134
- 15. Retinal Metric: A Stimulus Distance Measure Derived from Population Neural Responses. *Physical Review Letters* **110**. https://doi.org/10.1103/PhysRevLett.110.058104
- 16. Inferring single-trial neural population dynamics using sequential auto-encoders. *Nature Methods* **15**:805–815. https://doi.org/10.1038/s41592-018-0109-9
- 17. Analyzing biological and artificial neural networks: challenges with opportunities for synergy? *Current Opinion in Neurobiology* **55**:55–64. https://doi.org/10.1016/j.conb.2019.01.007
- 18. Training deep neural density estimators to identify mechanistic models of neural dynamics. *eLife* **9**. https://doi.org/10.7554/eLife.56261
- 19. Thermodynamics and signatures of criticality in a network of neurons. *Proceedings of the National Academy of Sciences* **112**:11508–11513. https://doi.org/10.1073/pnas.1514188112
- 20. Coarse Graining, Fixed Points, and Scaling in a Large Population of Neurons. *Physical Review Letters* **123**. https://doi.org/10.1103/PhysRevLett.123.178103
- 21. Towards the design principles of neural population codes. *Current Opinion in Neurobiology* **37**:133–140. https://doi.org/10.1016/j.conb.2016.03.001
- 22. Strongly correlated spatiotemporal encoding and simple decoding in the prefrontal cortex. *bioRxiv*. https://doi.org/10.1101/693192
- 23. Cracking the Neural Code for Sensory Perception by Combining Statistics, Intervention, and Behavior. *Neuron* **93**:491–507. https://doi.org/10.1016/j.neuron.2016.12.036
- 24. The simplest maximum entropy model for collective behavior in a neural network. *Journal of Statistical Mechanics: Theory and Experiment* **2013**. https://doi.org/10.1088/1742-5468/2013/03/P03011
- 25. Nonlinear decoding of a complex movie from the mammalian retina. *PLOS Computational Biology* **14**. https://doi.org/10.1371/journal.pcbi.1006057
- 26. Functional characterization of retinal ganglion cells using tailored nonlinear modeling. *Scientific Reports* **9**. https://doi.org/10.1038/s41598-019-45048-8
- 27. A latent variable approach to decoding neural population activity. *bioRxiv*
- 28. Long-term stability of cortical population dynamics underlying consistent behavior. *Nature Neuroscience* **23**:260–270. https://doi.org/10.1038/s41593-019-0555-4
- 29. The intrinsic attractor manifold and population dynamics of a canonical cognitive circuit across waking and sleep. *Nature Neuroscience* **22**:1512–1520. https://doi.org/10.1038/s41593-019-0460-x
- 30. Learning probabilistic neural representations with randomly connected circuits. *Proceedings of the National Academy of Sciences* **117**:25066–25073. https://doi.org/10.1073/pnas.1912804117
- 31. Flexible and accurate inference and learning for deep generative models. *arXiv*
- 32. Probabilistic Interpretation of Population Codes. *Neural Computation* **10**:403–430. https://doi.org/10.1162/089976698300017818
- 33. Towards Biologically Plausible Deep Learning. *arXiv*
- 34. Using goal-driven deep learning models to understand sensory cortex. *Nature Neuroscience* **19**:356–365. https://doi.org/10.1038/nn.4244
- 35. Pyramidal Neuron as Two-Layer Neural Network. *Neuron* **37**:989–999. https://doi.org/10.1016/S0896-6273(03)00149-1
- 36. A deep learning framework for neuroscience. *Nature Neuroscience* **22**:1761–1770. https://doi.org/10.1038/s41593-019-0520-2
- 37. A theory of weight distribution-constrained learning. *Advances in Neural Information Processing Systems*
- 38. Drawing inspiration from biological dendrites to empower artificial neural networks. *Current Opinion in Neurobiology* **70**:1–10. https://doi.org/10.1016/j.conb.2021.04.007
- 39. Optimal Degrees of Synaptic Connectivity. *Neuron* **93**:1153–1164. https://doi.org/10.1016/j.neuron.2017.01.030
- 40. Learning the Architectural Features That Predict Functional Similarity of Neural Networks. *Physical Review X* **12**. https://doi.org/10.1103/PhysRevX.12.021051
- 41. Generation of stable heading representations in diverse visual scenes. *Nature* **576**:126–131. https://doi.org/10.1038/s41586-019-1767-1
- 42. Reprogramming the topology of the nociceptive circuit in C. elegans reshapes sexual behavior. *Current Biology* **32**:4372–4385. https://doi.org/10.1016/j.cub.2022.08.038
- 43. The computational and learning benefits of Daleian neural networks. *arXiv*
- 44. Normalization of cell responses in cat striate cortex. *Visual Neuroscience* **9**:181–197. https://doi.org/10.1017/S0952523800009640
- 45. Normalization as a canonical neural computation. *Nature Reviews Neuroscience* **13**:51–62. https://doi.org/10.1038/nrn3136
- 46. Activity-dependent scaling of quantal amplitude in neocortical neurons. *Nature* **391**:892–896. https://doi.org/10.1038/36103
- 47. Synaptic Scaling and Homeostatic Plasticity in the Mouse Visual Cortex In Vivo. *Neuron* **80**:327–334. https://doi.org/10.1016/j.neuron.2013.08.018
- 48. Firing Rate Homeostasis in Visual Cortex of Freely Behaving Rodents. *Neuron* **80**:335–342. https://doi.org/10.1016/j.neuron.2013.08.038
- 49. The Self-Tuning Neuron: Synaptic Scaling of Excitatory Synapses. *Cell* **135**:422–435. https://doi.org/10.1016/j.cell.2008.10.008
- 50. Locally coordinated synaptic plasticity of visual cortex neurons in vivo. *Science* **360**:1349–1354. https://doi.org/10.1126/science.aao0862
- 51. Homeostatic mechanisms regulate distinct aspects of cortical circuit dynamics. *Proceedings of the National Academy of Sciences* **117**:24514–24525. https://doi.org/10.1073/pnas.1918368117
- 52. Integrating Hebbian and homeostatic plasticity: the current state of the field and future research directions. *Philosophical Transactions of the Royal Society B: Biological Sciences* **372**. https://doi.org/10.1098/rstb.2016.0158
- 53. Hebbian plasticity requires compensatory processes on multiple timescales. *Philosophical Transactions of the Royal Society B: Biological Sciences* **372**. https://doi.org/10.1098/rstb.2016.0259
- 54. Modeling the Dynamic Interaction of Hebbian and Homeostatic Plasticity. *Neuron* **84**:497–510
- 55. A model of neuronal responses in visual area MT. *Vision Research* **38**:743–761. https://doi.org/10.1016/S0042-6989(97)00183-1
- 56. Information Theory and Statistical Mechanics. *Physical Review* **106**:620–630. https://doi.org/10.1103/PhysRev.106.620
- 57. Dynamics of Neural Population Responses in Prefrontal Cortex Indicate Changes of Mind on Single Trials. *Current Biology* **24**:1542–1547. https://doi.org/10.1016/j.cub.2014.05.049
- 58. Possible principles underlying the transformation of sensory messages. *Sensory communication* **1**
- 59. Sparse coding with an overcomplete basis set: A strategy employed by V1? *Vision Research* **37**:3311–3325. https://doi.org/10.1016/S0042-6989(97)00169-7
- 60. Deep Learning

# Article and author information

## Copyright

© 2024, Jonathan Mayzel & Elad Schneidman

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
