Introduction

The potential “vocabulary” of spiking patterns of a population of neurons scales exponentially with the size of the population, and so, the mapping the rules of neural population codes and their semantic organization, cannot rely on direct sampling of the vocabulary for more than a handful of neurons. Moreover, the stochastic nature of neural activity implies that the characterization of neural codes must rely on probability distributions over population activity patterns. Therefore, to describe and analyze the structure and content of the code with which neural circuits respond to stimuli, process information, and direct action – we must learn statistical models of their activity. Such models have been used to study neural population codes in different systems: Models of the directional coupling between neurons, such as Generalized Linear Models, have been used to replicate the stimulusdependent rates of populations of tens of neurons (14). Maximum entropy models have accurately captured the joint activity patterns of more than 100 neurons, using simple statistical features of the population, like firing rates, pairwise correlations, synchrony, and other low-order statistics (5– 13). These models have further been used to characterize the semantic organization of population codes (14, 15). Autoencoder models have been employed to replicate the detailed structure of population activity – yielding generative models that can be used to study the code, but their design is difficult to interpret (1618). Importantly, scaling of these models to hundreds of neurons is computationally challenging (9, 19, 20), which has been a major challenge in modeling large neural systems.

While statistical models are invaluable for describing and studying neural codes, it is not clear whether the brain relies on such models or implements them when representing or processing information (21, 22). Consequently, much of the analysis of neural codes has focused on decoding population activity, typically using simple decoders (2, 23– 27), or metrics over the structure of population activity patterns (14, 28, 29). Yet, if neural circuits do implement such statistical models, and in particular, ones that compute the likelihood of their inputs – this would present a realizable mechanism for real neural circuits to carry Bayesian computation and decision making (3032). Such network models are, therefore, of interest not only as a way to study neural codes, but also as a potential way for biological neural networks to implement efficient learning and overcome the credit assignment problem. In addition, they may be useful for improving learning in artificial neural networks using biological features (3338).

Both structured architectural features of neural circuits and random connectivity patterns have been suggested to shape the computation carried out by neural circuits (30, 3943). These computations rely on the nature of synaptic connectivity and the coupling between synapses in terms of how they change during learning. Competition mechanisms between synapses or other regularization mechanisms have also been suggested to be important components of computation and learning in artificial neural networks as well as in cortical circuits (44, 45). One such mechanism is the homeostatic scaling of synaptic plasticity, which has been observed in vitro and in vivo at the level of incoming synapses to a neuron and outgoing ones (4649). This mechanism has been commonly attributed to the regulation of firing rates, while its computational implications remain mostly unclear, but of interest computationally and mechanistically (5054). A related computational feature has been presented by network models that include divisive normalization, suggested as an important component of computations performed by cortical circuits (55).

Here, we bring these ideas together to present a biologicallyinspired variant of a new family of statistical models for large neural population codes. Adding biological features to these population models enabled us to improve the models, and to explore designs that real neural circuits could employ to implement such models. Specifically, we expand the Random Projections (RP) model (30), which was shown to be highly accurate in recapitulating the detailed spiking patterns of more than 100 neurons in different neural systems. Importantly, in addition to being accurate and requiring little amounts of training data, these RP models can be readily implemented by a simple neural circuit model – suggesting how real neural circuits can learn a statistical model of their own inputs and compute the likelihood of the inputs. We show that we can make these models better by “reshaping” the randomly chosen sparse non-linear projections that they rely on, achieving highly accurate models using significantly fewer projections. We further show that reshaping of projections that incorporates normalization of synaptic weights during learning, results in more accurate models that are also more efficient, and makes the models homeostatic in terms of neural activity and total synaptic weights. Thus, we present a new class of accurate and efficient statistical models for large neural population codes that also suggests a clear computational benefit of homeostatic synaptic normalization and its potential role in biological neural networks and artificial ones.

Results

The Random Projections (RP) model is a class of highly accurate, scalabale, and efficient statistical models of the joint activity patterns of large populations of neurons (30, 31). These models are based on random and sparse nonlinear functions, or “projections”, of the population: Given a recording of the spiking activity of a population of neuorns, the model is a probability distribution over discrete activity patterns (quantized into small time bins, e.g. 10-20 ms), that relies on a set of random non-linear functions of the population activity,

where aij are randomly sampled coefficients such that most of them for any i are zero (i.e., the set is sparse), θj are thresholds, and σ(·) are nonlinear functions, (e.g., the Heaviside step function). The RP model is the maximum entropy distribution (56), which is consistent with the observed average values of the random projections ⟨fip = ⟨fidata (See Methods). Thus, it is the least structured distribution that retains the average values of the projections, is mathematically unique, and is given by

where λi are Lagrange multipliers, and Z is a normalization factor or the “partition function”, which can be found numerically. Applied to cortical data from multiple areas (see, e.g., Figure 1A), this model proved to be highly accurate in predicting individual activity patterns, using small amounts of training data (30). Importantly, unlike many other statistical models of population activity, RP models have a simple, biologically plausible neural circuit that can implement them (30): Figure 1B shows such a feed-forward circuit with one intermediate layer and an output neuron, where the random coefficients of the sparse projections, aij, are the synaptic weights connecting the input neurons to an intermediate layer of neurons fi. Each intermediate neuron implements one projection of the input population. The Lagrange multipliers, λi, are the synaptic weights connecting the intermediate layer to the output neuron, whose membrane potential or output gives the log-likelihood of the activity pattern of _x, up to a normalization factor.

Reshaped Random Projections models outperform Random Projections models.

(A) A short segment of the spiking activity of 100 cortical neurons used for the analysis and comparison of different statistical models of population activity (see Methods). (B) Schematics of the neural circuits that implement the different random projections models we compared: The “standard” Random Projections model, where the coefficients aij that define the projections are randomly selected and fixed whereas the factors λi are learned (see text). The Reshaped Random Projections model, where the coefficients aij that define the projections are tuned and the factors λis are fixed. The backpropagation model, where we tune both the aij s and λis. (C) The predicted probability of individual populations’ activity patterns as a function of the observed probability for RP, Reshaped and Backpropagation models. Gray funnels denote 99% confidence interval. (D) Average performance of models of the three classes is shown as a function of the number of projections, measured by log-likelihood, over 100 sets of randomly selected groups of 50 neurons. Reshaped models outperform RP models and are on par with backpropagation; the shaded area denotes the standard error over 100 models. (E-F) Mean firing rates of projection neurons and mean correlation between projections. Reshaped models show lower correlations and lower firing rates compared to RP and backpropagation models. Standard errors are smaller than marker’s size, hence invisible.

The model in eq. 2 harbors a duality between the projections, fi, and their coefficients, λi: In the maximum entropy formalism of the model, the projections are randomly sampled and then fixed, and their corresponding weights, λi’s, are tuned to maximize the entropy and satisfy the constraints. Alternatively, we may consider the case of training the model by keeping the λi’s fixed and changing or tuning the projections fi to maximize the likelihood. In the corresponding neural circuit, this would imply that we would learn a circuit that implements the statistical model by training the sparse set of synaptic connections, aij, which define the projections, instead of training the synapses that weigh the projections, λi (Figure 1B).

Notably, a variant of the RP model in which projections that were weighted by a low value of λi are pruned and replaced with new random projections proved to be more accurate than the original RP model, while using fewer projections (30). This procedure of pruning and replacement is a crude form of learning of the model through changing the projections, and finding more efficient ones. We, therefore, asked here whether instead of the heuristic pruning and replacement, we can directly learn more accurate and efficient models by tuning the projections.

Reshaping random projections gives more accurate and compact models

We first learned a new class of RP models for populations of tens of cortical neurons from the prefrontal cortex of monkeys performing a visual classification task (57) by tuning their randomly selected projections. Specifically, given an initial draw of sparse projections, the random weights that define the projections, aij, are then changed to maximize the likelihood of the model:

where η is the learning rate. We note that unlike the RP model presented in (30), here we used a sigmoid function for the nonlinearity of the projections,

where β sets the slope of the sigmoid. In this formulation, the model ranges from an independent model of the popula-tion for β → 0, to the original RP model of (30) for β → ∞. The rule for changing the projections (eq. 3) means that the specific set of inputs to each projection neuron is retained, but their relative weights are changed, and so the projections are “reshaped”. We focus henceforth on the case of all λi = 1, and so the Reshaped Random Projections (Reshaped RP) model, is given by

We compared the RP and the Reshaped RP models by quantifying their performance on the same set of initial projections. We first learned the RP model as in (30), using a Heaviside non-linearity for the projections, and RP models that used a sigmoid non-linearity, where both models used the same set of random projections, and found the latter models to be be more accurate (see supp. Figure 1). We then learned Reshaped RP models in which we optimize the same initial projections while keeping λi = 1. We note that while in its maximum entropy formulation, the RP model is the unique so-lution to a convex optimization problem, the Reshaped RP models are not guaranteed to reach a global optimum. We also considered yet another class of models, in which the projections and the Lagrange multipliers λi are optimized simultaneously, similar to backpropagation-based learning used to train feed-forward neural networks (see Methods). Figure 1C shows an example of the accuracy of the sigmoid RP models, Reshaped Random Projections models, and backpropagation-based models in predicting the probability of individual activity patterns for one group of 20 neurons, recorded from the cortex of behaving monkeys (57). The activity patterns are predicted by the reshaped RP model to an accuracy that is within the sampling noise (denoted by the 99% confidence interval funnel), and is similar to the performance of the full backpropagation model. The standard RP model, in comparison, has many more patterns that are outside the 99% confidence interval funnel. We quantified the performance of the three classes of models by calculating the mean log-likelihood of the models over 100 groups of 50 neurons on held out datasets, as a function of the number of projections that we used (Fig. 1D). The reshaped models outperform the RP ones for a low number of projections, whereas the performances of all three models converge to a similar value for large number of projections.

To compare the “mechanistic” nature of these different models, we calculated the mean correlation between the projections within each model class, and the average values of each projection (where the average is over the population activity patterns), which correspond to the mean firing rates of the neurons in the intermediate layer. Interestingly, the firing rates of the neurons in the intermediate layer are considerably lower for the reshaped models, and this sparseness in activity becomes more pronounced as a function of the number of projections (Figure 1E). We further find that the correlations between the projections in the reshaped models are considerably lower compared to RP and backpropagation models (Figure 1F).

Thus, the reshaped projection models suggest a way to learn more accurate models of population activity, by tuning of projections. These models are also more efficient, requiring fewer projections. These projections also have lower firing rates (i.e., reshaped projections use fewer spikes), and they are less correlated. Given their accuracy and efficiency, we next asked how adding biological features or constraints to a Reshaped RP circuit may affect its performance and efficiency.

Normalized reshaping of random projections gives more accurate and efficient models

We studied the effect of adding two classes of biological features or constraints on the performance and nature of the Reshaped RP circuit model. The first constraint stems from the biophysical limits on individual synapses, and so we bound the maximal strength of individual synapses such that the strength of all synaptic weights are smaller than a “ceiling” value: |aij| < θ. The other is a normalization of the synaptic weights during the reshaping, inspired by the synaptic re-scaling that has been observed experimentally (49), and divisive normalization of synaptic weights (44).We consider multiple mechanisms of this kind later, but begin here with fixing the total sum of the incoming synaptic strength of each projection such that Σ j |aij| = ϕ. Thus, when the strength of one synapse increases (decreases), the strength of the rest of the incoming synapses decreases (increases) such that the total synaptic weight incoming into the projection is kept constant. We term this constraint “homeostatic synaptic normalization”. We emphasize that the notion of homeostatic mechanisms is commonly reserved for designating regulation processes that retain a functional property of neurons, whereas normalization of synaptic weights might seem more mechanistic than functional. But, as we show later, learning with synaptic normalization also regulates the firing rate of the projection neurons, and so, we use this name henceforth.

To compare the effect of these constraints, we used the same set of initial random projections, and then learn by reshaping them, each time with a different value of their corresponding parameters, ϕ or θ. We estimated the likelihood of each of the models on 100 groups of 50 neurons, over 100 random sets of 150 projections. To quantify the “synaptic budget” of each model, we measured the total sum of the absolute values of synaptic weights available to each model in units of the total synaptic strength of the initial set of projections (this is equivalent to defining the total sum of the synaptic weights of the initial set of projections as “1”, and then measuring total synaptic weights in these units). For the models with bounded synapses, the total available synaptic budget is given by the number of synapses times θ, whereas for the homeostatic constraint, it equals ϕ times the number of pro-jections in the model. Figure 2B shows the log-likelihood of each model class vs. the total available synaptic budget of the different models: For a wide range of synaptic budgets, the homeostatic models outperform the bounded models, and only for very high values of available synaptic budget, the performance of the bounded models is on par with the homeostatic models.

Reshaped RP models that use homeostatic synaptic normalization outperform RP models and bounded RP models.

(A) Schematic drawing of the different models we studied: standard RP model, unconstrained reshaped model, and two types of constrained reshaped models: Bounded models in which each synapse separately obeys Σ ij |aij | < θ during learning, and normalized input reshaped models, where we fix the total synaptic weight of incoming synapses Σ ij |aij | = ϕ. (B) The mean log-likelihood of the models is shown as a function of the total available budget. The normalized input reshaped RP models give optimal results for a wide range of values of available synaptic budget, outperforming the bounded models and the RP model. (C) The mean log-likelihood of the models is shown as a function of the total used budget. Aside from the bounded models, all other models are the same as in (B) by construction. For high available budget values, bounded models show better performance while utilizing a lower synaptic budget, similar to the unconstrained reshape model. (D) Comparison of the performance of 100 individual examples of each model class and their corresponding RP model, where all models relied on the same set of initial projections. Normalized input models outperformed the RP models in all cases (all points are above the diagonal), while all bounded models were worse (points below the diagonal). (E-F) The mean correlation and firing rates of projections as a function of the model’s cost. Normalized reshape models show low correlations and mean firing rates, similar to unconstrained reshaped models. Note that in panels B, C, E, and F, the standard errors are smaller than the marker size, and are therefore invisible.

The differences between the homeostatic normalization models and the bounded synaptic strength models are further reflected in Figure 2C, which shows the performance of each model class as a function of the total sum of synaptic weights that is used by that model at the end of the training Σ ij |aij|. We note that the curve of the homeostatic model is identi-cal to the one from Figure 2B by definition; the curve of the bounded models shows that at a certain value of θ the sum of the synaptic weights starts to decrease and converges to the unconstrained reshaped model. The poor performance of the bounded models compared to the homeostatic ones suggests that the coupled changes in the synaptic weights improve learning. Specifically, during reshaping, the homeostatic models move synaptic “mass” from less important synapses to more important ones. This redistribution of resources results in accurate models even for relatively low values of synaptic weights – making them more efficient in terms of the total synaptic weight needed.

The dominance of the homeostatic learning over the bounded synaptic weights is clear not just for the average over models, but also at the level of individual models: Figure 2D shows the performance of the homeostatic and bounded models that are initialized with the same set of random projections; all bounded constraint models are inferior to the corresponding RP ones, whereas all the homeostatic constraint models are superior to the RP models (and clearly all the homeostatic models are superior to the corresponding bounded models).

We further find that the mean firing rates of the reshaped projection neurons, as well as the correlations between them, are lower in the homeostatic models compared to the bounded models (Figure 2E-F), making them more energetically efficient (in terms of spiking activity). We recall that this is consistent with the notions of efficient coding by decorrelated neural populations (58, 59).

Normalized reshaping of random projections results in more efficient codes and homeostasis of firing rates

The experimental characterization of synaptic rescaling has shown it to be a homeostatic mechanism that regulates the firing rates of neurons (49). We therefore asked whether the synaptic normalization we employ for the Reshaped RP models has a similar effect. Figure 3A shows that the overall performance of the model in terms of capturing the population codebook is similar between the “free” reshape model and different values of synaptic normalization. Similarly, reshaping with normalization or without it drives the projection neurons to converge to similar average firing rate values (Figure 3B). However, the distribution of firing rates over the different neurons becomes narrower with tighter normalization values (Figure 3C). Importantly, while different normalization values imply very different initial firing rates of the projection neurons, after reshaping the values converge to similar average values (Figure 3D). Moreover, reshaping with normalization implies lower firing rates as well as smaller changes in the reshaping process (Figure 3E). Thus, normalized reshaping results in homeostatic regulation of the firing rates, which validates the naming of these models as homeostatic normalization reshaping of random projections.

Homeostatic models firing rates.

(A) Likelihood values of the different models we consider here. Error bars denote standard error over 100 different models. (B) Projections mean firing rates during reshape. Standard error is smaller than marker size, hence cannot be seen (C) A histogram of the projections firing rates of unconstrained models and different homeostatic input models. (D) Projections mean firing rates after reshaping vs. the projection initial firing rate, for different homeostatic models and unconstrained model. Error bars denote standard deviation over 100 different models. (E) The cumulative density function of the projections firing rate relative change during reshape, |FRfinal − FRinitial| /F Rinitial.

Having established the computational benefits and efficiency of the homeostatic reshaped projection models that rely on synaptic normalization, we turned to ask how the connectivity itself, rather than the synaptic weights, may affect the performance of the models.

Optimal sparseness of Reshaped Projections models under homeostatic constraints

The benefits of reshaping a given set of projections, reflected in the figures above, raise the question of the importance of the nature of the random projections we choose (which are then reshaped). We, therefore, asked how the initial random “wiring” of the projections affects the performance of the model, and whether non-random projections would result in even better models. To quantify the effects of the projections’ connectivity on the performance and efficiency of reshaped models, we used simulated population activity that we generated using RP models that were trained on real data. By using synthetic data that was generated by a known model, we can compare the learned models to the “ground truth” in terms of connectivity, as well as extensively sampling of activity patterns from the model.

We learned homeostatic reshaped models for the synthetic data, using different initial connectivity structures (Figure 4A-B): (i) A “true” connectivity model in which we reshaped a random projections model that has the same connectivity as the projections of the model that generated the data. (ii) A Random connectivity model in which we reshaped projections with sparse and connectivity that is randomly sampled and is independent of the model that generated the synthetic data. (iii) A full connectivity model in which we reshaped random projections with full connectivity, i.e., all input neurons are connected to all the projections, but with random initial weights. We carried out homeostatic reshaping of the projections in all three models with different values of θ. Surprisingly, the true and random connectivity models performed very similarly (Figure 4C). Although the full connectivity model contains the “ground truth” connectivity, and could recreate the true connectivity by canceling out unnecessary synapses during reshaping – we find that the full connectivity models are inferior to the other models, except for the case of high model costs.

Models that rely on projections that use random connectivity show similar performance to models that use the correct connectivity.

(A) Synthetic population activity data is sampled from an RP model with known connectivity (i.e., the “ground truth” model; see Methods). (B) Homeostatic reshaped random projections models that differ in their connectivity are learned to fit the synthetic data. The “True connectivity” model uses projections whose connectivity is identical to the “ground truth” model. The “Random connectivity” model uses projections that are randomly sampled using sparse random connectivity. The “Full connectivity” model is a homeostatic reshaped model that uses projections with full connectivity. (C) The mean log-likelihood of the models is shown as a function of the model’s cost. The true connectivity model is only slightly better than the random connectivity model, with both outperforming the full connectivity model for low model budget values. (D) The mean correlation between the activity of the projection neurons, shown as a function of the model cost. We note that true and random connectivity models are indistinguishable. (E) The firing rates of the projection neurons, shown as a function of the model cost. (F) The performance of homeostatic reshaped RP models, shown as a function of their normalized in-degree of the projections (0–disconnected, 1–fully connected), for different normalization values, shown by the model’s synaptic cost. (G) The performance of the homeostatic reshaped RP models, shown as a function of the synaptic cost normalized by the in-degree of the projections. Curves of different cost values coincide, suggesting a fixed optimal cost/activity ratio. Note that in panels D-G the standard errors over 100 models are smaller than the size of markers and are, therefore, invisible.

The mean correlations between projections at the end of reshaping and the mean firing rates of the models that use the true and random connectivity were also very similar (Figure 4D-E), whereas the full connectivity models showed, again, very different behavior. These results reflect another computational benefit of homeostatic reshaping: there is no need to know the optimal circuit connectivity, and there is no apparent benefit to all-to-all connectivity, which would be expensive in terms of the energetic cost, the space needed, and the biological construction. Thus, starting from random connectivity and optimizing the circuit under homeostatic constraints seems to provide optimal results.

Given the inefficiency of the fully connected reshaped projections model, we also quantified the effect of the sparseness of the projections on reshaped RP models. We recall that for the standard RP model, sparse projections were optimal for a wide range of network sizes (30)), and so we measured the performance of homeostatic reshaped RP models for different values of in-degree of the projections, while keeping the total synaptic budget of the models fixed. We found that different synaptic budgets have a different optimal in-degree (Figure 4F), and that the value of the optimal in-degree seems to grow with the total synaptic budget.

We further estimated the efficiency of the models by the synaptic cost per connection in the projections (Figure 4G). We find that curves for different total synaptic costs seem to coincide and have a similar peak value – suggesting an optimal ratio between the total available resources and the number of synapses.

Different homeostatic mechanisms for reshaping random projections models result in different projection sets

We explored two other forms of synaptic normalization rules for the reshaping of projections (Figure 5A). In the first, we fixed and normalized the outgoing synapses from each neuron, such that Σ i |aij| = ϕ. In the second, we kept the total synaptic weight of the whole circuit fixed, namely, Σ ij |aij| = ϕ. Figure 5B shows that the performance of the models that use these other homeostatic mechanisms is surprisingly similar in terms of the model’s likelihood over the test data, as well as the firing rates of the projection neurons (Supp. Figure 2A), and correlations between them (Supp. Figure 2B).

A variety of homeostatic synaptic normalization mechanisms produce similar model behavior using a wide range of random projection parameters.

(A) Schematic drawings of the different homeostatic models we compared: Homeostatic input models in which we fixed the total synaptic weight of the incoming synapses Σj |aij | = ϕ; Homeostatic output models in which we fixed the total synaptic weight of the outgoing synapses Σi |aij | = ϕ; and Homeostatic circuit models in which we fixed the total synaptic weight of the whole synaptic circuit Σ ij |aij | = ϕ. (B) The mean log-likelihood of models, shown as a function of the total used synaptic cost. All three homeostatic model variants show similar behavior. (C) Schematic drawing of how projections rotate during reshaping: starting from the initial projections (grey lines), they rotate to their reshaped orientation (black lines) by angle θi. (D) Rotation angles after reshaping, shown for different pairs of models. All four panels show models that initialized with the same set of projections. The different labels specify the constraint type and strength, namely, the specific value of ϕ and θ. (E) The mean rotation angle of the projections due to reshaping, shown as a function of the model synaptic budget.

As the homeostatic reshaping of random projections proved to be similarly accurate and efficient for the three homeostatic model variants, we asked which features of normalized reshaping might differentiate between homeostatic models in terms of their performance. Since each projection defines a hyperplane in the space of population activity patterns, reshaping can be interpreted as a rotation or a change of the angle of these hyperplanes, depicted schematically in Figure 5C. We, therefore, compared the different homeostatic variants of the reshaped projections models by initializing them from the same set of random projections, and evaluating the corresponding rotation angles, θ, of all of the projections due to the reshaping. Figure 5D shows an example of the rotations of the same initial projections for one model under different reshaping constraints, highlighting the substantial differences between them.

Figure 5E shows the mean rotation angle over 100 homeostatic models as a function of synaptic cost – reflecting that the different forms of homeostatic regulation results in different reshaped projections. Supp. Figure 2C-D shows the histogram of the rotation angles of several homeostatic models, as well as the unconstrained reshape model.

Discussion

We presented a new family of statistical models for large neural populations that is based on sparse and random non-linear projections of the population, which are adapted during learning. This new family of models proved to be more accurate than the highly accurate Random Projections class of models, using fewer projections and incurring a lower “synaptic cost” in terms of the total sum of synaptic weights of the model. Moreover, we found that reshaping of the projections gave even more accurate and efficient models in terms of synaptic weights of the neural circuit that implements the model, and was optimal for random and sparse initial connectivity, surpassing fully connected network models. The synaptic normalization mechanism resulted in homeostatic regulation of the firing rates of neurons in the model.

Our results suggest a computational role for the experimentally observed scaling or normalization of synapses during learning: In addition to “regularizing” the firing rates in neural circuits, in our Reshaped RP models, homeostatic plasticity optimizes the performance of network models and their efficiency in scenarios of limited resources and random connectivity. Moreover, the similarity of the performance of models that use different homeostatic synaptic mechanisms suggests a possible universal role for homeostatic mechanisms in computation.

We note that while homeostatic synaptic scaling regulates the firing rates of neurons (49), it is not immediately clear what “sets” the desired firing rate of each neuron. The synaptic normalization constraints we used here offer a simple solution: a universal value of the total incoming synaptic weights for the neurons in the circuit (or outgoing ones), results in a widely distributed firing rates of neurons (which may change considerably during the learning), but converge to a similar average value. Thus, rather than requiring some mechanism to define and balance the firing rates of individual neurons, our model suggest a single global synaptic feature that would set this for the random projections.

The shallowness of the circuit implementation of the Reshaped Random Projections model implies that the learning of these models does not require the backpropagation of information over many layers, which distinguishes deep artificial networks from biological ones. Moreover, the locality of the reshaping process itself points to the feasibility of this model in terms of real biological circuits. The biological plausibility is further supported by the robustness of the model to the specific connectivity used for the reshaped models, and to the specific choice of the homeostatic mechanism we used.

A key remaining issue for the biological feasibility of the RP family of models is the feedback signal from the readout neuron to the intermediate neurons. The noise-dependent learning mechanism for RP models presented in (30) and for other local feedback and synaptic learning mechanisms that approximate backprogapation (35) offers clear directions for future study. Our results may also be relevant for learning in artificial neural networks, whose training relies on nonconvex approaches that necessitate different regularization techniques (60). The homeostatic mechanism we focused on here is a form of “hard” L1 regularization, but on the sum of the weights. This approach limits the search space, compared to regularization over the weights themselves, but defines coupled changes in weights, in a manner highly effective for the cortical data we studied. We, therefore, hypothesize that homeostatic normalization may be beneficial for artificial architectures (see, e.g., (37)).

Materials and Methods

Experimental Data

Extra-cellular recordings were performed using Utah arrays from populations of neurons in the prefrontal cortex of macaque monkeys performing a direction discrimination task with random dots. For more details see (57).

Data Pre-processing

Neural activity was discretized using 20 ms bins, such that in each time bin a neuron was active (‘1’) if it emitted a spike in that bin and silent (‘0’) if not. Recorded data was split randomly into training sets and heldout test sets: 100 different random splits were generated for each model setup, consisting of 160,000 samples in the training set and 40,000 in the test set.

Constructing Sparse Random Projections

Following (30), the coefficients aij of the random projections are set using a two stage process. First, the connectivity of the projections is set such that the average in-degree of the projections matches a predetermined sparsity value: each input neuron connects to each projection with a probability p = indegree/n, where n is the number of neurons in the input layer. The corresponding aij coefficients are then sampled from a Gaussian distribution, aij ∼ N (1, 1), and the remaining aij values are set to zero. The threshold of each projection, θi, was set to 1.

The average in-degree of sparse models used here was 5, unless specified otherwise in the text. For the fully connected models indegree = n (i.e., sparsity=0).

Training RP models

Given empirical data X and a set of projections defined by aij, we train the RP models by searching for the parameters λi that maximize the log-likelihood of the model given the data, arg maxλi (L(X)), where . This is a convex function whose gradient is given by

We found the values λi that maximize the log-likelihood by gradient descent with momentum or ADAM algorithms. We computed the empirical expectation in ⟨fiX by summing over the training data, and the expectation over the probability model ⟨fi⟩pRP by summing over synthetic data generated from pRP using Metropolis–Hasting sampling.

For each of the empirical marginals ⟨fiX, we used the Clopper–Pearson method to estimate the distribution of possible values for the real marginal given the empirical observation. We set the convergence threshold of the numerical solver such that each of the marginals in the model distribution falls within a CI of one SD under this distribution, from its empirical marginal.

Reshaping RP models

Given empirical data X, we optimize the RP models by modifying the coefficients aij such that the log-likelihood of the model is maximized, arg maxaij (L(X)). Starting from an initial set of projec-tions, , using the update rule of equation 3, we optimize the projections by applying the gradient descent with momentum algorithm. Importantly, only non-zero elements of are optimized.

Optimizing backpropagation models

Full backpropagation models are optimized using the learning rules of the trained RP models and the reshaped models simultaneously in each gradient descent step, i.e., eqs. 3 and 6.

Homeostatic reshaping of RP models

The homeostatic RP models are reshaped as follows: We first define a set of unconstrained projections where the coefficients ãij are randomly sampled. Each of the projections is then normalized homeostatically, such that aij are a function of this unconstrained set: aij = ϕ · ãijk |ãik|, where ϕ is the available synaptic budget for each projection. We then optimize ãij to maximize the log-likelihood of the model given the empirical data X: arg maxãij (L(X)). The computed constrained projections aij are then used in the resulting homeostatic RP model.

Bounded reshaping of RP models

Similar to reshaping homeostatic RP models, we define a set of unconstrained projections ãij, where the projections are a function of this unconstrained set: aij = min (max (ãij, −θ), θ), where θ is the “ceiling” value of each synapse.

Generating synthetic data from RP models with known connectivity

Synthetic neural activity patterns were obtained by training RP models on real neural recordings as described above and then generating data from these models using Metropolis-Hastings sampling.

Acknowledgements

We thank Adam Haber, Tal Tamir, Udi Karpas, and the rest of the Schneidman lab members for discussions, comments, and ideas. This work was supported by Simons Collaboration on the Global Brain grant 542997 (ES), Israel Science Foundation grant 137628 (ES), Israeli Council for Higher Education/Weizmann Data Science Research Center (ES), Martin Kushner Schnur, and Mr. & Mrs. Lawrence Feis. ES is the incumbent of the Joseph and Bessie Feinberg Chair.

Supplementary Figures

Sigmoid Random Projections models outperform Step Random Projections models.

(A) Average performance of models as a function of the number of projections, measured by log-likelihood. Sigmoid RP models outperform step RP models. (B-C) Mean firing rates of projection neurons and mean correlation between projections, step RP models show slightly lower correlations and firing rates compared to sigmoid RP models. In all panels shaded area denotes standard error over 100 models.

Homeostatic model variants show similar firing rates and correlation, but different rotation angles.

(A) and (B) Mean correlation between the projection neurons and firing rates as a function of the model cost of the three different homeostatic models we compared. All three homeostatic models show a similar behavior. Note that the standard errors over 100 models are smaller than the marker’s size and are, therefore, invisible. (C) and (D) Histograms of rotation angles of the projections after learning in the case of the unconstrained reshape models and different homeostatic input (C) and homeostatic circuit (D) models. The legend specifies the constraint type and strength, namely, the specific value of ϕ.