Extracting grid cell characteristics from place cell inputs using nonnegative principal component analysis
Abstract
Many recent models study the downstream projection from grid cells to place cells, while recent data have pointed out the importance of the feedback projection. We thus asked how grid cells are affected by the nature of the input from the place cells. We propose a single-layer neural network with feedforward weights connecting place-like input cells to grid cell outputs. Place-to-grid weights are learned via a generalized Hebbian rule. The architecture of this network highly resembles neural networks used to perform Principal Component Analysis (PCA). Both numerical results and analytic considerations indicate that if the components of the feedforward neural network are nonnegative, the output converges to a hexagonal lattice. Without the nonnegativity constraint, the output converges to a square lattice. Consistent with experiments, the grid spacing ratio between the first two consecutive modules is ~1.4. Our results express a possible linkage between place cell to grid cell interactions and PCA.
https://doi.org/10.7554/eLife.10094.001
eLife digest
Long before the invention of GPS systems, ships used a technique called dead reckoning to navigate at sea. By tracking the ship’s speed and direction of movement away from a starting point, the crew could estimate their position at any given time. Many believe that some animals, including rats and humans, can use a similar process to navigate in the absence of external landmarks. This process is referred to as “path integration”.
It is commonly believed that the brain’s navigation system is based on such path integration in two key regions: the entorhinal cortex and the hippocampus. Most models of navigation assume that a network of grid cells in the entorhinal cortex processes information about an animal’s speed and direction of movement. The grid cell network estimates the animal’s future position and relays this information to cells in the hippocampus called place cells. Individual place cells then fire whenever the animal reaches a specific location.
However, recent work has shown that information also flows from place cells back to grid cells. Further experiments have suggested that place cells develop before grid cells. Also, inactivating place cells eliminates the hexagonal patterns that normally appear in the activity of the grid cells.
Using a computational model, Dordek, Soudry et al. now show that place cell activity could in principle trigger the formation of the grid cell network, rather than vice versa. This is achieved using a process that resembles a common statistical algorithm called principal component analysis (PCA). However, this only works if place cells only excite grid cells and never inhibit their activity, similar to what is known from the anatomy of these brain regions. Under these circumstances, the model shows hexagonal patterns emerging in the activity of the grid cells, with similar properties to those patterns observed experimentally.
These results suggest that navigation may not depend solely on grid cells processing information about speed and direction of movement, as assumed by path integration models. Instead grid cells may rely on position-based input from place cells. The next step is to create a single model that combines the flow of information from place cells to grid cells and vice versa.
https://doi.org/10.7554/eLife.10094.002
Introduction
The system of spatial navigation in the brain has recently received much attention (Burgess, 2014; Morris, 2015; Eichenbaum, 2015). This system involves many regions, which seem to divide into two major classes: regions such as CA1 and CA3 of the hippocampus, which contain place cells (O'Keefe and Dostrovsky, 1971; O'Keefe and Nadel, 1978), vs. regions, such as the medial entorhinal cortex (MEC), the presubiculum and the parasubiculum, which contain grid cells, head-direction cells and border cells (Hafting et al., 2005; Boccara et al., 2010; Sargolini et al., 2006; Solstad et al., 2008; Savelli et al., 2008). While the phenomenology of those cells is described in many studies (Derdikman and Knierim, 2014; Tocker et al., 2015), the manner in which grid cells are formed is quite enigmatic. Many mechanisms have been proposed. The details of these mechanisms differ; however, they mostly share the assumption that the animal’s velocity is the main input to the system (Derdikman and Knierim, 2014; Zilli, 2012; Giocomo et al., 2011), such that positional information is generated by the integration of this input in time. This process is termed 'path integration' (PI) (Mittelstaedt and Mittelstaedt, 1980). A notable exception to this class of models was suggested in a previous paper by Kropff and Treves (2008) and in a sequel to that paper (Si and Treves, 2013), in which they demonstrated the emergence of grid cells from place cell inputs without using the rat's velocity as an input signal.
We note here that generating grid cells from place cells may seem at odds with the architecture of the network, since it is known that place cells reside at least one synapse downstream of grid cells (Witter and Amaral, 2004). Nonetheless, there is current evidence that the feedback from place cells to grid cells is of great functional importance. Specifically, there is evidence that inactivation of place cells causes grid cells to disappear (Bonnevie et al., 2013), and furthermore, it seems that, in development, place cells emerge before grid cells do (Langston et al., 2010; Wills et al., 2010). Thus, there is good motivation for trying to understand how the feedback from hippocampal place cells may contribute to grid cell formation.
In the present paper, we thus investigated a model of grid cell development from place cell inputs. We showed that a feedforward network from place cells to grid cells resembles a neural network architecture previously used to implement the PCA algorithm (Oja, 1982). We demonstrated, both analytically and through simulations, that the formation of grid cells from place cells using such a neural network could occur given specific assumptions on the input (i.e. zero mean) and on the nature of the feedforward connections (specifically, nonnegative, or excitatory).
Results
Comparing neural-network results to PCA
We initially considered the output of a single-layer neural network and of the PCA algorithm in response to the same inputs. These consisted of the temporal activity of a simulated agent moving around in a two-dimensional (2D) space (Figure 1A; see Materials and methods for details). In order to mimic place cell activity, the simulated virtual space was covered by multiple 2D Gaussian functions uniformly distributed at random (Figure 1B), which constituted the input. In order to calculate the principal components, we used a [Neuron x Time] matrix (Figure 1C) after subtracting the temporal mean, generated from the trajectory of the agent as it moved through the place fields. Thus, we displayed a one-dimensional mapping of the two-dimensional activity, transforming the 2D activity into a 1D vector per input neuron. This resulted in the [Neuron x Neuron] covariance matrix (Figure 1D), on which PCA was performed by evaluating the appropriate eigenvalues and eigenvectors.
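The construction of the input covariance matrix described above can be sketched as follows. This is a minimal illustration and not the authors' MATLAB code: it assumes a one-dimensional periodic track instead of the 2D arena, evenly spaced rather than randomly placed field centres, and hypothetical names and parameter values (`place_cell_activity`, `covariance_from_trajectory`, `sigma`, `box`, the random-walk step size).

```python
import math
import random

def place_cell_activity(pos, centers, sigma, box):
    """Gaussian place-field responses at position `pos` on a periodic track."""
    acts = []
    for c in centers:
        d = abs(pos - c)
        d = min(d, box - d)  # wrap-around (periodic) distance
        acts.append(math.exp(-d * d / (2.0 * sigma * sigma)))
    return acts

def covariance_from_trajectory(n_cells=8, n_steps=5000, sigma=1.0, box=8.0, seed=0):
    """Return the [Neuron x Neuron] covariance of place-cell activity."""
    rng = random.Random(seed)
    centers = [box * i / n_cells for i in range(n_cells)]  # assumption: even spacing
    pos = 0.0
    rows = []  # one row of n_cells activities per time step
    for _ in range(n_steps):
        pos = (pos + rng.gauss(0.0, 0.5)) % box  # random walk, periodic boundary
        rows.append(place_cell_activity(pos, centers, sigma, box))
    # Subtract the temporal mean of every input neuron
    means = [sum(r[i] for r in rows) / n_steps for i in range(n_cells)]
    centered = [[r[i] - means[i] for i in range(n_cells)] for r in rows]
    # Covariance between every pair of neurons, averaged over time
    return [[sum(r[i] * r[j] for r in centered) / n_steps
             for j in range(n_cells)] for i in range(n_cells)]

cov = covariance_from_trajectory()
```

In this sketch, neighbouring fields overlap and are positively correlated, while fields on opposite sides of the track are anti-correlated once the mean is removed; the covariance therefore depends mainly on the distance between field centres.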
To learn the grid cells, based on the place cell inputs, we implemented a single-layer neural network with a single output (Figure 2). Input to output weights were governed by a Hebbian-like learning rule. As described in the Introduction (see also analytical treatment in the Methods section), this type of architecture induces the output’s weights to converge to the leading principal component of the input data.
The agent explored the environment for a sufficiently long time allowing the weights to converge to the first principal component of the temporal input data. In order to establish a spatial interpretation of the eigenvectors (from PCA) or the weights (from the converged network) we projected both the PCA eigenvectors and the network weights onto the place cells space, producing corresponding spatial activity maps. The leading eigenvectors of the PCA and the network’s weights converged to square-like periodic spatial solutions (Figure 3A–B).
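The claim that a Hebbian-like rule drives the weights toward the leading principal component can be illustrated with the standard Oja update, w ← w + η·y·(x − y·w) with y = w·x. The sketch below is an assumption-laden toy, not the simulation used in the paper: the two-dimensional input, the helper `correlated_samples`, the learning rate `eta` and the seeds are all hypothetical choices made so that the leading eigenvector, (1, 1)/√2, is known in advance.

```python
import math
import random

def correlated_samples(n, noise=0.3, seed=1):
    """Zero-mean 2D samples whose leading principal axis is (1, 1)/sqrt(2)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)  # shared signal
        samples.append([s + noise * rng.gauss(0.0, 1.0),
                        s + noise * rng.gauss(0.0, 1.0)])
    return samples

def oja_leading_component(samples, eta=0.01, seed=0):
    """Run Oja's rule once over the samples and return the weight vector."""
    rng = random.Random(seed)
    dim = len(samples[0])
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for x in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))  # output activity
        # Hebbian term y*x with the self-normalizing decay -y^2*w
        w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

w = oja_leading_component(correlated_samples(20000))
alignment = abs(w[0] + w[1]) / math.sqrt(2.0)  # |cosine| with (1, 1)/sqrt(2)
```

The weight norm settles close to 1 without explicit renormalization, which is the self-normalizing property of the rule.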
Because the network implements a PCA algorithm, the spatial projections of the weights were periodic in space, owing to the Toeplitz structure of the input covariance matrix (Dai et al., 2009) (a Toeplitz matrix has constant elements along each diagonal). Intuitively, the Toeplitz structure arises due to the spatial stationarity of the input. In fact, since we used periodic boundary conditions for the agent’s motion, the covariance matrix was a circulant matrix, and the eigenvectors were sinusoidal functions, with length constants determined by the scale of the box (Gray, 2006) [a circulant matrix is defined by a single row (or column), and the remaining rows (or columns) are obtained by cyclic permutations. It is a special case of a Toeplitz matrix; see, for example, Figure 1D]. The covariance matrix was heavily degenerate, with approximately 90% of the variance accounted for by the first 15% of the eigenvectors (Figure 4B). The solution demonstrated a fourfold redundancy. This was apparent in the plotted eigenvalues (from the largest to the smallest eigenvalue, Figure 4A and C), which demonstrated a fourfold grouping pattern. The fourfold redundancy can be explained analytically by the symmetries of the system – see analytical treatment of PCA in Methods section (specifically Figure 15C).
In summary, both the direct PCA algorithm and the neural network solutions developed periodic structure. However, this periodic structure was not hexagonal but rather had a square-like form.
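The link between a circulant covariance matrix and sinusoidal eigenvectors can be checked numerically. The following sketch assumes a one-dimensional ring of `n` inputs whose covariance depends only on periodic distance (a Gaussian profile of hypothetical width `width`); the function names are illustrative. It verifies that a cosine and a sine of the same spatial frequency are both eigenvectors with the same eigenvalue, which is the kind of degeneracy that underlies the grouping of eigenvalues.

```python
import math

def circulant(first_row):
    """Circulant matrix: row i is the first row cyclically shifted by i."""
    n = len(first_row)
    return [[first_row[(j - i) % n] for j in range(n)] for i in range(n)]

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def sinusoid_residual(n=16, k=3, width=2.0):
    """Return (eigenvalue, worst residual) for cos/sin modes of frequency k."""
    # Symmetric first row: covariance as a function of periodic distance
    first_row = [math.exp(-min(j, n - j) ** 2 / (2.0 * width ** 2))
                 for j in range(n)]
    cov = circulant(first_row)
    # Predicted eigenvalue: cosine transform of the first row at frequency k
    lam = sum(c * math.cos(2.0 * math.pi * k * j / n)
              for j, c in enumerate(first_row))
    worst = 0.0
    for trig in (math.cos, math.sin):
        v = [trig(2.0 * math.pi * k * j / n) for j in range(n)]
        cv = matvec(cov, v)
        worst = max(worst, max(abs(a - lam * b) for a, b in zip(cv, v)))
    return lam, worst

lam, worst = sinusoid_residual()
```

Here `worst` is the largest entry of C·v − λ·v over both modes and should vanish up to floating-point error.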
Adding a nonnegativity constraint to the PCA
It is known that most synapses from the hippocampus to the MEC are excitatory (Witter and Amaral, 2004). We thus investigated how a nonnegativity constraint, applied to the projections from place cells to grid cells, affected our simulations. As demonstrated in the analytical treatment in the Methods section, we could expect to find hexagons when imposing the nonnegativity constraint. Indeed, when adding this constraint, the outputs behaved in a different manner and converged to a hexagonal grid, similar to real grid cells. While it was straightforward to constrain the neural network, calculating nonnegative PCA directly was a more complicated task due to the nonconvex nature of the problem (Montanari and Richard, 2014; Kushner and Clark, 1978).
In the network domain, we used a simple rectification rule for the learned feedforward weights, which constrained their values to be nonnegative. For the direct nonnegative PCA calculation, we used the raw place cells activity (after spatial or temporal mean normalization), as inputs to three different iterative numerical methods: NSPCA (Nonnegative Sparse PCA), AMP (Approximate Message Passing) and FISTA (Fast Iterative Threshold and Shrinkage) based algorithms (see Materials and methods section).
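A rectification rule of the kind described for the network can be sketched by clipping negative weights to zero after each Oja update. The exact form used in the authors' code is not given in this section, so the clipping step, the toy anti-correlated input (`anticorrelated_samples`) and all parameter values below are assumptions made for illustration only.

```python
import math
import random

def anticorrelated_samples(n, noise=0.3, seed=1):
    """Zero-mean 2D samples whose leading principal axis is (1, -1)/sqrt(2)."""
    rng = random.Random(seed)
    return [[s + noise * rng.gauss(0.0, 1.0), -s + noise * rng.gauss(0.0, 1.0)]
            for s in (rng.gauss(0.0, 1.0) for _ in range(n))]

def oja_weights(samples, nonneg, eta=0.01, seed=0):
    """Oja's rule, optionally followed by rectification of the weights."""
    rng = random.Random(seed)
    dim = len(samples[0])
    w = [rng.random() for _ in range(dim)]  # nonnegative random start
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    for x in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
        if nonneg:
            w = [max(0.0, wi) for wi in w]  # keep weights excitatory
    return w

data = anticorrelated_samples(20000)
w_free = oja_weights(data, nonneg=False)         # has one negative weight
w_constrained = oja_weights(data, nonneg=True)   # all weights >= 0
```

In this toy case the unconstrained weights approach the leading eigenvector, which has mixed signs, whereas the rectified weights settle on a different, all-nonnegative solution. The sketch only shows that the constraint changes the equilibrium; it does not reproduce the hexagonal patterns, which require the spatial input structure.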
In both cases, we found that hexagonal grid cells emerged in the output layer (plotted as spatial projection of weights and eigenvectors: Figure 5A–B, Figure 6A–B, Video 1, Video 2). When we repeated the process over many simulations (i.e. new trajectories and random initializations of weights) we found that the population as a whole consistently converged to hexagonal grid-like responses, while similar simulations with the unconstrained version did not (compare Figure 3 to Figure 5–Figure 6).
In order to further assess the hexagonal grid emerging in the output, we calculated the mean (hexagonal) Gridness scores (Sargolini et al., 2006), which measure the degree to which the solution resembles a hexagonal grid (see Materials and methods). We ran about 1500 simulations of the network (in each simulation, the network consisted of 625 place cell-like inputs and a single grid cell-like output), and found noticeable differences between the constrained and unconstrained cases. Namely, the Gridness score in the nonnegatively constrained-weight simulations was significantly higher than in the unconstrained-weight case (Gridness = 1.07 ± 0.003 in the constrained case vs. 0.302 ± 0.003 in the unconstrained case; see Figure 7). A similar difference was observed with the direct nonnegative PCA methods (1500 simulations, each with different trajectories, Gridness = 1.13 ± 0.0022 in the constrained case vs. 0.27 ± 0.0023 in the unconstrained case).
Another score we tested was a 'Square Gridness' score (see Materials and methods) where we measured the 'Squareness' of the solutions (as opposed to 'Hexagonality'). We found that the unconstrained network had a higher square-Gridness score while the constrained network had a lower square-Gridness score (Figure 7), for both the direct-PCA calculation (square-Gridness = 0.89 ± 0.0074 in the unconstrained case vs. 0.1 ± 0.006 in the constrained case) and the neural network (square-Gridness = 0.73 ± 0.008 in the unconstrained case vs. 0.073 ± 0.006 in the constrained case).
All in all, these results suggest that when direct PCA eigenvectors and neural network weights were unconstrained they converged to periodic square solutions. However, when constrained to be nonnegative, the direct PCA, and the corresponding neural network weights, both converged to a hexagonal solution.
Dependence of the result on the structure of the input
We investigated the effect of different inputs on the emergence of the grid structure in the networks' output. We found that some manipulation of the input was necessary in order to enable the implementation of PCA in the neural network. Specifically, PCA requires a zero-mean input, while simple Gaussian-like place cells do not possess this property. In order to obtain input with zero-mean, we either performed differentiation of the place cells’ activity in time, or used a Mexican-hat-like (Laplacian) shape (see Materials and methods for more details on the different types of inputs). Another option we explored was the usage of positive-negative disks with a total sum of zero activity in space (Figure 8). The motivation for the use of Mexican-hat-like transformations is their abundance in the nervous system (Wiesel and Hubel, 1963; Enroth-Cugell and Robson, 1966; Derdikman et al., 2003).
We found that usage of simple 2D Gaussian functions as inputs did not generate hexagonal grid cells as outputs (Figure 9). On the other hand, time-differentiated inputs, positive-negative disks or Laplacian inputs did generate grid-like output cells, both when running the nonnegative PCA directly (Figure 6), or by simulating the nonnegatively constrained Neural Network (Figure 5). Another approach we used for obtaining zero-mean was to subtract the mean dynamically from every output individually (see Materials and methods). The latter approach, related to adaptation of the firing rate, was adopted from Kropff and Treves (2008), who used it to control various aspects of the grid cell's activity. In addition to controlling the firing rate of the grid cells, if applied correctly, the adaptation could be exploited to keep the output's activity stable, with zero-mean rates. We applied this method in our system and in this case the outputs converged to hexagonal grid cells as well, similarly to the previous cases (e.g. derivative in time, or Mexican hats as inputs; data not shown).
In summary, two conditions were required for the neural network to converge to spatial solutions resembling hexagonal grid cells: (1) nonnegativity of the feedforward weights and (2) an effective zero-mean of the inputs (in time or space).
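One way to obtain a spatially zero-mean, Mexican-hat-like input is a difference of two Gaussians that each sum to one over the arena. The sketch below is illustrative only: the grid size, the centre and surround widths (`sigma`, `surround_ratio`) and the function names are assumptions, and the paper's Methods may define the Laplacian input differently.

```python
import math

def periodic_gaussian(size, cx, cy, sigma):
    """Unit-sum Gaussian bump on a size x size grid with wrap-around distances."""
    grid = []
    for ix in range(size):
        row = []
        for iy in range(size):
            dx = min(abs(ix - cx), size - abs(ix - cx))
            dy = min(abs(iy - cy), size - abs(iy - cy))
            row.append(math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma)))
        grid.append(row)
    total = sum(sum(r) for r in grid)
    return [[v / total for v in r] for r in grid]

def mexican_hat(size, cx, cy, sigma, surround_ratio=2.0):
    """Narrow excitatory centre minus a wider surround; sums to zero in space."""
    centre = periodic_gaussian(size, cx, cy, sigma)
    surround = periodic_gaussian(size, cx, cy, sigma * surround_ratio)
    return [[c - s for c, s in zip(rc, rs)] for rc, rs in zip(centre, surround)]

hat = mexican_hat(size=21, cx=10, cy=10, sigma=1.5)
spatial_sum = sum(sum(r) for r in hat)  # zero up to rounding error
```

The resulting field is positive at its centre and negative in a surrounding ring, so its total activity over the arena vanishes, unlike a plain Gaussian field.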
Stability analysis
Convergence to hexagons from various initial spatial conditions
In order to numerically test the stability of the hexagonal solution, we initialized the network in different ways: randomly, or using linear stripes, squares, rhomboids (squares on hexagonal lattice) and noisy hexagons. In all cases, the network converged to a hexagonal pattern (Figure 10; for squares and stripes, other shapes not shown here).
We also ran the converged weights in a new simulation with novel trajectories and tested the Gridness scores, and the intertrial stability in comparison to previous simulations. We found that the hexagonal solutions of the network remained stable although the trajectories varied drastically (data not shown).
Asymptotic stability of the equilibria
Under certain conditions (e.g., decaying learning rates and independent and identically distributed (i.i.d.) inputs), it was previously proved (Hornik and Kuan, 1992), using techniques from the theory of stochastic approximation, that the system described here can be asymptotically analyzed in terms of (deterministic) Ordinary Differential Equations (ODE), rather than in terms of the stochastic recurrence equations. Since the ODE defining the converged weights is nonlinear, we solved the ODEs numerically (see Materials and methods), by randomly initializing the weight vector. The asymptotic equilibria were reached much faster, compared to the outcome of the recurrence equations. Similarly to the recurrence equations, constraining the weights to be nonnegative induced them to converge into a hexagonal shape while an unconstrained system produced square-like outcomes (Figure 11).
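The deterministic counterpart of the learning rule can be sketched with the standard averaged Oja dynamics, dw/dt = C·w − (wᵀC·w)·w, where C is the input covariance. The ODE actually used is specified in Materials and methods, so the form below, the forward-Euler integrator, the clipping used to impose nonnegativity, and the 2×2 toy covariance are all assumptions chosen to keep the example small.

```python
def integrate_oja_ode(cov, w0, nonneg, dt=0.01, steps=5000):
    """Forward-Euler integration of dw/dt = C w - (w^T C w) w."""
    w = list(w0)
    n = len(w)
    for _ in range(steps):
        cw = [sum(cov[i][j] * w[j] for j in range(n)) for i in range(n)]
        rayleigh = sum(wi * ci for wi, ci in zip(w, cw))  # w^T C w
        w = [wi + dt * (ci - rayleigh * wi) for wi, ci in zip(w, cw)]
        if nonneg:
            w = [max(0.0, wi) for wi in w]  # project onto nonnegative weights
    return w

# Toy covariance whose leading eigenvector, (1, -1)/sqrt(2), has mixed signs
cov = [[2.0, -1.0], [-1.0, 2.0]]
w_free = integrate_oja_ode(cov, [0.6, 0.4], nonneg=False)
w_constrained = integrate_oja_ode(cov, [0.6, 0.4], nonneg=True)
```

From the same starting point, the unconstrained flow reaches the leading eigenvector, while the constrained flow reaches a different equilibrium with one weight pinned at zero. As with the stochastic rule, this toy example only shows that the constraint selects a different fixed point.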
Simulation was run 60 times, with 400 outputs per run. 60° Gridness score mean was 1.1 ± 0.0006 when weights were constrained and 0.29 ± 0.0005 when weights were unconstrained. 90° Gridness score mean was 0.006 ± 0.002 when weights were constrained and 0.8 ± 0.0017 when weights were unconstrained.
Effect of place cell parameters on grid structure
A more detailed view of the resulting grid spacing showed that it was heavily dependent on the field widths of the place cell inputs. When the environment size was fixed and the output calculated per input size, the grid spacing (distance between neighboring peaks) increased for larger place cell field widths.
To enable a fast parameter sweep over many place cell field widths (and large environment sizes), we took the steady state limit, and the limit of a high density of place cell locations, and used the fast FISTA algorithm to solve the nonnegative PCA problem (see Materials and methods section).
We performed multiple simulations, and found that there was a simple linear dependency between the place field size and the output grid scale. For the case of periodic boundary conditions, we found that grid scale was S = 7.5σ + 0.85, where σ was the width of the place cell field (Figure 12A). For a different set of simulations with zero boundary conditions, we achieved a similar relation: S = 7.54σ + 0.62 (figure not shown). Grid scale was more dependent on place field size and less on box size (Figure 12H). We note that for very large environments, the effects of boundary conditions diminish. At this limit, this linear relation between place field size and grid scale can be explained from analytical considerations (see Materials and methods section). Intuitively, this follows from dimensional analysis: given an infinite environment, at steady state the length scale of the place cell field width is the only length scale in the model, so any other length scale must be proportional to this scale. More precisely, we can provide a lower bound for the linear fit (Figure 12A), which depends only on the tuning curve of the place cells (see Materials and methods section). This lower bound was derived for periodic boundary conditions, but works well even with zero boundary conditions (not shown).
Furthermore, we found that the grid orientation varied substantially for different place cell field widths, in the possible range of 0–15 degrees (Figure 12C,D). For small environments, the orientation strongly depended on the boundary conditions. However, as described in the Methods section, analytical considerations suggest that as the environment grows, the distribution of grid orientations becomes uniform in the range of 0–15 degrees, with a mean at 7.5°. Intuitively, this can be explained by rotational symmetry – when the environment size is infinite, all directions in the model are equivalent, and so we should get all orientations with equal probability, if we start the model from a uniformly random initialization. In addition, grid orientation was not a clear function of the gridness of the obtained grid cells (Figure 12B). For large enough place cells, gridness was larger than 1 (Figure 12E–G).
Modules of grid cells
It is known that in reality grid cells form in modules of multiple spacings (Barry et al., 2007; Stensola et al., 2012). We tried to address this question of modules in several ways. First, we used different widths for the Gaussian/Laplacian input functions: Initially, we placed a heterogeneous population of widths in a given environment (i.e., uniformly random widths) and ran the single-output network 100 times. The distribution of grid spacings was nearly the same as that obtained with the largest width applied alone, and did not exhibit module-like behavior. This result is not surprising when thinking about a small place cell overlapping in space with a large place cell. Whenever the agent passes next to the small one, it activates both weights via synaptic learning. This causes the large firing field to overshadow the smaller one. Additionally, when using populations of only two widths of place fields, the grid spacings were dictated by the size of the larger place field (data not shown).
The second option we considered was to use a multi-output neural network, capable of computing all 'eigenvectors' rather than only the principal 'eigenvector' (where by 'eigenvector' we mean here the vectors achieved under the positivity constraint, and not the exact eigenvectors themselves). We used a hierarchical network implementation introduced by Sanger, 1989 (see Materials and methods). Since the 1st output’s weights converged to the 1st 'eigenvector', the network (Figure 13A–B) provided to the subsequent outputs (2nd, 3rd, and so forth) a reduced version of the data from which the projection of the 1st 'eigenvector' has been subtracted out. This process, reminiscent of Gram-Schmidt orthogonalization, was capable of computing all 'eigenvectors' (in the modified sense) of the input's covariance matrix. It is important to note though that, due to the nonnegativity constraint, the vectors achieved in this way were not orthogonal, and thus it cannot be considered a real orthogonalization process, although, as explained in the Methods section, the process does aim for maximum difference between the vectors.
When constrained to be nonnegative, and using the same homogeneous 'place cells' as in the previous network, the networks' weights converged to hexagonal shapes. Here, however, we found that the smaller the 'eigenvalue' was (or the higher the principal component number) the denser the grid became. We were able to identify two main populations of grid-distance 'modules' among the hexagonal spatial solutions with high Gridness scores (>0.7, Figure 14A–B). In addition, we found that the ratio between the distances of the modules was ~1.4, close to the value of 1.42 found by Stensola et al. (Stensola et al., 2012). Although we searched for additional such jumps, we could only identify this single jump, suggesting that our model can yield up to two 'modules' and not more. The same process was repeated using the direct PCA method, utilizing the covariance matrix of the data after simulation as input for the nonnegative PCA algorithms, and considering their ability to calculate only the 1st 'eigenvector'. By iteratively projecting the 1st 'eigenvector' on the simulation data and subtracting the outcome from the original data, we applied the nonnegative PCA algorithm to the residual data obtaining the 2nd 'eigenvector' of the original data. This 'eigenvector' now constituted the 1st 'eigenvector' of the new residual data (see Materials and methods). Applying this process to as many 'outputs' as needed, we obtained very similar results to the ones presented above using the neural network (data not shown).
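The deflation procedure described above (find the leading vector, subtract its projection from the data, repeat on the residual) can be sketched as follows. A projected power iteration is used here as a simple stand-in for the NSPCA, AMP and FISTA solvers named in the text; it is not one of those algorithms. The four-sample data set, the starting vector and the function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def leading_component(data, nonneg, iters=200):
    """Power iteration on X^T X, optionally clipping to nonnegative entries."""
    dim = len(data[0])
    w = [1.0 / (i + 1) for i in range(dim)]  # arbitrary positive start (assumption)
    for _ in range(iters):
        proj = [dot(x, w) for x in data]  # projections of the samples onto w
        w = [sum(p * x[i] for p, x in zip(proj, data)) for i in range(dim)]
        if nonneg:
            w = [max(0.0, v) for v in w]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            raise ValueError("iteration collapsed to the zero vector")
        w = [v / norm for v in w]
    return w

def deflate(data, w):
    """Subtract from every sample its projection onto the unit vector w."""
    return [[xi - dot(x, w) * wi for xi, wi in zip(x, w)] for x in data]

data = [[2.0, 2.0], [-2.0, -2.0], [1.0, -1.0], [-1.0, 1.0]]
w1 = leading_component(data, nonneg=True)        # close to (1, 1)/sqrt(2)
residual = deflate(data, w1)
w2_free = leading_component(residual, nonneg=False)   # orthogonal to w1
w2_nonneg = leading_component(residual, nonneg=True)  # nonnegative, not orthogonal
```

In this toy case the unconstrained second vector is orthogonal to the first, whereas the nonnegative second vector is not, consistent with the remark that the constrained procedure is not a true orthogonalization.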
Discussion
In our work, we explored the nature and behavior of the feedback projections from place cells to grid cells. We shed light on the importance of this relation and showed, both analytically and in simulation, how a simple single-layer neural network could produce hexagonal grid cells when subjected to place cell-like temporal input from a randomly roaming agent. We found that the network resembled a neural network performing PCA (Oja, 1982), with the constraint that the weights were nonnegative. Under these conditions, and also under the requirements that place cells have a zero mean in time or space, the first principal component in the 2D arena had a firing pattern resembling a hexagonal grid cell. Furthermore, we found that in the limit of very large arenas, grid orientation converged to a uniform distribution in the range of 0–15°. When looking at additional components, grid scale tends to be discretely clustered, such that two modules emerge. This is partially consistent with current experimental findings (Stensola et al., 2012; 2015). Furthermore, the inhibitory connectivity between multiple grid cells is consistent with the known functional anatomy in this network (Couey et al., 2013).
Place-to-Grid as a PCA network
As a consequence of the requirements for PCA to hold, we found that the place cell input needed to have a zero-mean, otherwise the output was not periodic. Due to the lack of the zero-mean property in 2D Gaussians, we used various approaches to impose zero-mean on the input data. The first, in the time domain, was to differentiate the input and use the derivatives (a random walk produces zero-mean derivatives) as inputs. Another approach was to dynamically subtract the mean in all iterations of the simulation. This approach was reminiscent of the adaptation procedure suggested in the Kropff and Treves paper (Kropff and Treves, 2008). A third approach, applied in the spatial domain, was to use inputs with a zero spatial mean such as Laplacians of Gaussians (Mexican hats in 2D, or differences-of-Gaussians) or negative-positive disks. Such Mexican-hat inputs are quite typical in the nervous system (Wiesel and Hubel, 1963; Enroth-Cugell and Robson, 1966; Derdikman et al., 2003), although in the case of place cells it is not completely known how they are formed. They could be a result of interaction between place cells and the vast number of inhibitory interneurons in the local hippocampal network (Freund and Buzsáki, 1996).
Another condition we found crucial, which was not part of the original PCA network, was a nonnegativity constraint on the place-to-grid learned weights. While rather easy to implement in the network, adding this constraint to the nonconvex PCA problem was harder to implement. Since the problem is NP-hard (Montanari and Richard, 2014), we turned to numerical methods. We used three different algorithms (Montanari and Richard, 2014; Zass and Shashua, 2006; Beck and Teboulle, 2009) to find the leading 'eigenvector' of every given temporal based input. As shown in the results section, both processes (i.e. direct PCA and the neural network) resulted in hexagonal outcomes when the nonnegativity and zero-mean criteria were met. Note that the ease of use of the neural network for solving the positive PCA problem is a nice feature of the neural network implementation, and should be investigated further.
We also note that while our network focused on the projection from place cells to grid cells, we cannot preclude the importance of the reciprocal projection from grid cells to place cells. Further study will be needed to ‘close the loop’ and simultaneously consider both of these projections at once.
Similar studies
We note that similar work has noticed the relation between place-cell-to-grid-cell transformation and PCA. Notably, Stachenfeld et al. (2014) have demonstrated, from considerations related to reinforcement learning, that grid cells could be related to place cells through a PCA transformation. However, due to the unconstrained nature of their transformation, the resulting grid cells were square-like. Furthermore, there has been an endeavor to model the transformation from place cells to grid cells using independent component analysis (Franzius et al., 2007).
We also note that there is now a surge of interest in the feedback projection from place cells to grid cells, which is inverse to the anatomical downstream direction from grid cells to place cells (Witter and Amaral, 2004) that has guided most of the models to date (Zilli, 2012; Giocomo et al., 2011). In addition to several papers from the Treves group, in which the projection from place cells to grid cells is studied (Kropff and Treves, 2008; Si and Treves, 2013), there has also been recent work from other groups exploring this direction (Castro and Aguiar, 2014; Stepanyuk, 2015). As far as we are aware, none of the previous studies noted the importance of the nonnegativity constraint and the requirement of zero mean input. Additionally, to the best of our knowledge, the analytic results and insights provided in this work (see Materials and methods) are novel, and provide a mathematically consistent explanation for the emergence of hexagonally spaced grid cells.
Predictions of our model
Based on the findings of this work, it is possible to make several predictions. First, the grid cells must receive zero-mean input over time to produce hexagonally shaped firing patterns. With all feedback projections from place cells being excitatory, the lateral inhibition from other neighboring grid cells might be the balancing parameter to achieve the temporal zero-mean (Couey et al., 2013). Alternatively, an adaptation method, such as the one suggested in Kropff and Treves (2008), may be applied. Second, if indeed the grid cells are a lower dimensional representation of the place cells in a PCA form, the place-to-grid neural weights distribution should be similar across identically spaced grid cell populations. This is because all grid cells with similar spacing would have maximized the variance over the same input, resulting in similar spatial solutions. As an aside, we note that such a projection may be a source of phase-related correlations in grid cells (Tocker et al., 2015). Third, we found a linear relation between the size of the place cells and the spacing between grid cells. Furthermore, the spacing of the grid cells is mostly determined by the size of the largest place cell – predicting that the feedback from large place cells is not connected to grid cells with small spacing. Fourth, we found modules of different grid spacings in a hierarchical network with the ratio of distances between successive units close to √2. This result is in accordance with the ratio reported in Stensola et al. (2012). However, we note that there is a difference between our results and experimental results because the analysis predicts that there should only be two modules, while the data show at least 5 modules, with a range of scales, the smallest and most numerous having approximately the scale of the smaller place fields found in the dorsal hippocampus (25–30 cm).
Fifth, for large enough environments our model suggests that, from mathematical considerations, the grid orientation should approach a uniform distribution in the possible range of 0–15°. This is at odds with experimental results, which measure a peak at 7.5° and not a uniform distribution (Stensola et al., 2015). As noted, the discrepancies between our results and reality may relate to the fact that a more advanced model will have to take into account both the downstream projection from grid cells to place cells together with the upstream projection from place cells to grid cells discussed in this paper. Furthermore, such a model will have to take into account the nonuniform distribution of place-cell widths (Kjelstrup et al., 2008).
Why hexagons?
In light of our results, we further asked what is special about the hexagonal shape which renders it a stable solution. Past works have demonstrated that hexagonality is optimal in terms of efficient coding. Two recent papers have addressed the potential benefit of encoding by grid cells. Mathis et al. (2015) considered the decoding of spatial information based on a grid-like periodic representation. Using lower bounds on the reconstruction error based on a Fisher information criterion, they demonstrated that hexagonal grids lead to the highest spatial resolution in two dimensions (extensions to higher dimensions were also provided). The solution is obtained by mapping the problem onto a circle packing problem. The work of Wei et al. (2013) also took a decoding perspective, and showed that hexagonal grids minimize the number of neurons required to encode location with a given resolution. Both papers offer insights into the possible information-theoretic benefits of the hexagonal grid solution. In the present paper, we were mainly concerned with a specific biologically motivated learning (development) mechanism that may yield such a solution. Our analysis suggests that the hexagonal patterns can arise as a solution that maximizes the grid cell output variance, under nonnegativity constraints. In Fourier space, the solution is a hexagonal lattice with lattice constant near the peak of the Fourier transform of the place cell tuning curve (Figures 15 and 16; see Materials and methods).
To conclude, this work demonstrates how grid cells could be formed from a simple Hebbian neural network with place cells as inputs, without needing to rely on path-integration mechanisms.
Materials and methods
All code was written in MATLAB, and can be obtained on https://github.com/derdikman/Dordeketal.Matlabcode.git or on request from authors.
Neural network architecture
We implemented a single-layer neural network with feedforward connections that was capable of producing a hexagonal-like output (Figure 2). The feedforward connections were updated according to a self-normalizing version of a Hebbian learning rule referred to as the Oja rule (Oja, 1982),
where ${\epsilon}^{t}$ denotes the learning rate, ${J}_{i}^{t}$ is the ${i}^{th}$ weight and ${\psi}^{t},{r}_{i}^{t}$ are the output and the ${i}^{th}$ input of the network, respectively (all at time $t$). The weights were initialized randomly according to a uniform distribution and then normalized to have norm 1. The output ${\psi}^{t}$ was calculated every iteration by summing up all presynaptic activity from the entire input neuron population. The activity of each output was processed through a sigmoidal function (e.g., $\text{tanh}$) or a simple linear function. Formally,
where $n$ is the number of input place cells. Since we were initially only concerned with the eigenvector associated with the largest eigenvalue, we did not implement a multiple-output architecture. In this formulation, in which no lateral weights were used, multiple outputs were equivalent to running the same setting with one output several times.
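As an illustration of the update described above, the following minimal Python sketch applies the standard textbook form of Oja's rule, $J_i^{t+1}=J_i^{t}+\epsilon^{t}\psi^{t}\left(r_i^{t}-\psi^{t}J_i^{t}\right)$ with a linear output $\psi^{t}=\sum_i J_i^{t}r_i^{t}$, to a toy two-dimensional zero-mean input. The toy input, the constant learning rate, the helper name `oja_train` and the optional rectification flag are assumptions made for illustration; they are not taken from the MATLAB code.

```python
import math
import random

def oja_train(samples, eps=0.005, nonneg=False, seed=0):
    """Train a single linear output with Oja's rule; return the weight vector."""
    rng = random.Random(seed)
    n = len(samples[0])
    # Random uniform initial weights, normalized to unit norm.
    J = [rng.random() for _ in range(n)]
    norm = math.sqrt(sum(w * w for w in J))
    J = [w / norm for w in J]
    for r in samples:
        psi = sum(w * x for w, x in zip(J, r))            # linear output
        J = [w + eps * psi * (x - psi * w) for w, x in zip(J, r)]
        if nonneg:
            J = [max(w, 0.0) for w in J]                   # rectify negative weights
    return J

# Toy zero-mean input: large variance along u1, small variance along u2.
rng = random.Random(1)
u1, u2 = (0.6, 0.8), (-0.8, 0.6)
samples = []
for _ in range(20000):
    a, b = rng.gauss(0.0, 2.0), rng.gauss(0.0, 0.5)
    samples.append((a * u1[0] + b * u2[0], a * u1[1] + b * u2[1]))

J = oja_train(samples)
alignment = abs(J[0] * u1[0] + J[1] * u1[1])   # |cosine| of the angle to u1
weight_norm = math.sqrt(J[0] ** 2 + J[1] ** 2)
```

In this toy setting the learned weights should align with the high-variance direction `u1` while keeping a norm close to one, which is the self-normalizing behavior the rule is known for.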
As discussed in the introduction, this kind of simple feedforward neural network with linear activation and a local weight update in the form of Oja’s rule (1) is known to perform Principal Components Analysis (PCA) (Oja, 1982; Sanger, 1989; Weingessel and Hornik, 2000). In the case of a single output the feedforward weights converge to the principal eigenvector of the input's covariance matrix. With several outputs, and lateral weights, as described in the section on modules, the weights converge to the leading principal eigenvectors of the covariance matrix, or, in certain cases (Weingessel and Hornik, 2000), to the subspace spanned by the principal eigenvectors. We can thus compare the results of the neural network to those of the mathematical procedure of PCA. Hence, in our simulation, we (1) let the neural networks' weights develop in real time based on the current place cell inputs. In addition, we (2) saved the input activity for every time step to calculate the input covariance matrix and perform (batch) PCA directly.
It is worth mentioning that the PCA solution described in this section can be interpreted differently based on the Singular Value Decomposition (SVD). Denoting by $R$ the $T\times d$ spatiotemporal pattern of place cell activities (after setting the mean to zero), where $T$ is the time duration and $d$ is the number of place cells, the SVD (see Jolliffe, 2002; sec. 3.5) of $R$ is $R=ULA'$. For a matrix $R$ of rank $r$, $L$ is an $r\times r$ diagonal matrix whose $k$th element is equal to $l_k^{1/2}$, the square root of the $k$th eigenvalue of the covariance matrix $R'R$ (computed in the PCA analysis), $A$ is the $d\times r$ matrix with $k$th column equal to the $k$th eigenvector $a_k$ of $R'R$, and $U$ is the $T\times r$ matrix whose $k$th column is $l_k^{-1/2}Ra_k$. Note that the $k$th column of $U$ represents the temporal dynamics of the $k$th grid cell. In other words, the SVD provides a decomposition of the place cell activity in terms of the grid cell activity, as opposed to the grid cell representation in terms of place cell activity we discussed so far. The network learns the spatial weights over place cells (the eigenvectors) as the connection weights from the place cells, and the 'projection onto place cell space' ($l_k^{-1/2}Ra_k$) is simply the firing rate of the output neuron plotted against the location of the agent.
The question we therefore asked was under what conditions, when using place-cell-like inputs, a solution resembling hexagonal grid cells emerges. To answer this we used both the neural-network implementation and the direct calculation of the PCA coefficients.
Simulation
We simulated an agent moving in a 2D virtual environment consisting of a square arena covered by $n$ uniformly distributed 2D Gaussian-shaped place cells, organized on a grid, given by
where $\mathit{X}(t)$ represents the location of the agent. The variables $r_i^t$ constitute the temporal input from place cell $i$ at time $t$, and $\mathit{C}_i,\sigma_i$ are the $i^{th}$ place cell’s field center and width, respectively (see variations on this input structure below). In order to eliminate boundary effects, periodic boundary conditions were assumed. The virtual agent moved about in a random walk scheme (see Appendix) and explored the environment (Figure 1A). The place cell centers were assumed to be uniformly distributed (Figure 1B) and shared the same standard deviation $\sigma$. The activity of all place cells as a function of time, $(r_1(t),r_2(t),\dots,r_n(t))$, was dependent on the stochastic movement of the agent, and formed a [Neuron × Time] matrix ($r\in\mathbb{R}^{n\times T}$, with $T$ being the time dimension, see Figure 1C).
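The input construction described above can be sketched as follows. The sketch assumes the usual isotropic Gaussian form $r_i=\exp\left(-\Vert X-C_i\Vert^2/(2\sigma_i^2)\right)$ with distances measured on a periodic box; the function names, the unit box and the 4 × 4 grid of centers are hypothetical choices for illustration only.

```python
import math

def periodic_delta(a, b, box):
    """Shortest signed separation between coordinates a and b on a ring of length box."""
    d = (a - b) % box
    return d - box if d > box / 2 else d

def place_cell_rate(pos, center, sigma, box):
    """Gaussian place-field activity at 2D position pos, with periodic boundaries."""
    dx = periodic_delta(pos[0], center[0], box)
    dy = periodic_delta(pos[1], center[1], box)
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))

def population_activity(pos, centers, sigma, box):
    """Input vector (r_1, ..., r_n) for a single agent position."""
    return [place_cell_rate(pos, c, sigma, box) for c in centers]

# Centers on a regular 4 x 4 grid in a unit box (hypothetical sizes).
box = 1.0
centers = [(i / 4, j / 4) for i in range(4) for j in range(4)]
rates = population_activity((0.0, 0.0), centers, sigma=0.1, box=box)

# Two points on opposite sides of the box edge are close under periodic wrapping.
wrapped = place_cell_rate((0.95, 0.0), (0.05, 0.0), sigma=0.1, box=box)
```

Stacking such vectors over the time steps of a random walk would give the [Neuron × Time] input matrix referred to in the text.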
The simulation was run several times with different input arguments (see Table 1). The agent was simulated for $T$ time steps, allowing the neural network's weights to develop and reach a steady state by using the learning rule (Equations 1,2) and the input (Equation 3) data. The simulation parameters are listed below and include parameters related to the environment, simulation, agent and network variables.
To calculate the PCA directly, we used the MATLAB function princomp in order to evaluate the $n$ principal eigenvectors ${\left\{{\overrightarrow{q}}_{k}\right\}}_{k=1}^{n}$ and corresponding eigenvalues of the input covariance matrix. As mentioned in the Results section, there exists a near fourfold redundancy in the eigenvectors (X-Y axis and in phase). Figure 3 demonstrates this redundancy by plotting the eigenvalues of the covariance matrix. The output response of each eigenvector ${\overrightarrow{q}}_{k}$ corresponding to a 2D input location $(x,y)$ is
where ${c}_{x}^{j}$ and ${c}_{y}^{j}$ are the $x,y$ components of the centers of the individual place cell fields. Unless otherwise mentioned, we used place cells in a rectangular grid, such that a place cell is centered at each pixel of the image (that is – number of place cells equals the number of image pixels).
Nonnegativity constraint
Projections between place cells and grid cells are known to be primarily excitatory (Witter and Amaral, 2004), thus if we aim to mimic the biological circuit, a nonnegativity constraint should be added to the feedforward weights in the neural network. While implementing a nonnegativity constraint in the neural network is rather easy (a simple rectification rule in the weight dynamics, such that weights which are smaller than 0 are set to 0), the equivalent condition for calculating nonnegative Principal Components is more intricate. Since this problem is nonconvex and, in general, NP-hard (Montanari and Richard, 2014), a numerical procedure was imperative. We used three different algorithms for this purpose.
The first (Zass and Shashua, 2006), named NSPCA (Nonnegative Sparse PCA), is based on coordinate descent. The algorithm computes a nonnegative version of the covariance matrix's eigenvectors and relies on solving a numerical optimization problem, converging to a local maximum starting from a random initial point. The local nature of the algorithm did not guarantee a convergence to a global optimum (recall that the problem is nonconvex). The algorithm's inputs consisted of the place cell activities’ covariance matrix; α, a balancing parameter between reconstruction and orthonormality; β, a variable which controls the amount of sparseness required; and an initial solution vector. For the sake of generality, we set the initial vector to be uniformly random (and normalized), α was set to a relatively high value ($10^{4}$) and, since no sparseness was needed, β was set to zero.
The second algorithm (Montanari and Richard, 2014) does not require any simulation parameters except an arbitrary initialization. It works directly on the inputs and uses a message passing algorithm to define an iterative algorithm to approximately solve the optimization problem. Under specific assumptions it can be shown that the algorithm asymptotically solves the problem (for large input dimensions).
The third algorithm we used is the parameter-free Fast Iterative Shrinkage-Thresholding Algorithm, FISTA (Beck and Teboulle, 2009). As described later in this section, this algorithm is the fastest of the three, and allowed us rapid screening of parameter space.
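To make the nonnegative PCA problem concrete, the sketch below uses a simple projected power iteration: multiply by the covariance matrix, set negative entries to zero, renormalize. This is a generic heuristic chosen for brevity and is not one of the three algorithms named above; the function names and the 3 × 3 toy covariance matrix are hypothetical.

```python
import math

def nonneg_power_iteration(cov, j0, n_iter=100):
    """Heuristic for maximizing J' C J subject to ||J|| = 1 and J >= 0."""
    J = list(j0)
    for _ in range(n_iter):
        # Power-iteration step: multiply by the covariance matrix.
        v = [sum(c * w for c, w in zip(row, J)) for row in cov]
        # Project onto the nonnegative orthant, then renormalize.
        v = [max(x, 0.0) for x in v]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break  # projection removed everything; keep the previous iterate
        J = [x / norm for x in v]
    return J

def objective(cov, J):
    """Variance captured by J, i.e. J' C J."""
    n = len(J)
    return sum(J[i] * cov[i][k] * J[k] for i in range(n) for k in range(n))

# Toy covariance whose unconstrained principal eigenvector, (1, -1, 0)/sqrt(2)
# with eigenvalue 3, has mixed signs and is therefore infeasible.
cov = [[2.0, -1.0, 0.0],
       [-1.0, 2.0, 0.0],
       [0.0, 0.0, 1.0]]
J = nonneg_power_iteration(cov, j0=[1.0, 0.5, 0.5])
value = objective(cov, J)
```

For this toy matrix the iteration settles on a single positive coordinate with objective value 2, below the unconstrained eigenvalue of 3, illustrating how the constraint changes the solution. On general inputs such a heuristic may only reach a local optimum.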
Different variants of input structure
Performing PCA on raw data requires the subtraction of the data mean. Some thought was required in order to determine how to perform this subtraction in the case of the neural network.
One way to perform the subtraction in the time domain was to dynamically subtract the mean during simulation by using the discrete first or second derivatives of the inputs in time [i.e., from Equation 3, $\Delta r(t+1)=r(t+1)-r(t)$]. Under conditions of an isotropic random walk (namely, given any starting position, motion in all directions is equally likely) it is clear that $E\left[\Delta r(t)\right]=0$. Another option for subtracting the mean in the time domain was the use of an adaptation variable, as was initially introduced by Kropff and Treves (2008). Although originally exploited for control over the firing rate, it can be viewed as a variable that represents subtraction of a weighted sum of the firing rate history. Instead of using the inputs $r_i^t$ directly in Equation 2 to compute the activation $\psi^t$, an intermediate adaptation variable $\psi_{adp}^t(\delta)$ was used ($\delta$ being the relative significance of the present temporal sample) as
It is not hard to see that for i.i.d. variables $\psi_{adp}^{t}$, the sequence $\overline{\psi}^{t}$ converges for large $t$ to the mean of $\psi^{t}$. Thus, when $t\to\infty$ we find that $E\left[\psi_{adp}^{t}\right]\to 0$; that is, the adaptation variable has zero asymptotic mean.
The second method we used to enforce a zero-mean input was simply to create it in advance. Rather than using 2D Gaussian functions (i.e., Equation 3) as inputs, we used 2D difference-of-Gaussians (all $\sigma$ are equal in the x and y axes):
where the constants ${c}_{1}$ and ${c}_{2}$ are set so the integral of the given Laplacian function is zero (if the environment size is not too small, then ${c}_{1,i}/{c}_{2,i}\approx {\sigma}_{2,i}/{\sigma}_{1,i}$). Therefore, if we assume a random walk that covers the entire environment uniformly, the temporal mean of the input would be zero as well. Such input data can be inspired by similar behavior of neurons in the retina and the lateral geniculate nucleus (Wiesel and Hubel, 1963; Enroth-Cugell and Robson, 1966). Finally, we implemented another input data type: positive-negative disks (see Appendix). Analogously to the difference-of-Gaussians function, the integral over the input is zero, so the same goal (zero mean) was achieved. It is worthwhile noting that subtracting a constant from a simple Gaussian function is not sufficient since at infinity it does not reach zero.
Quality of solution and Gridness
In order to test the hexagonality of the results we used a hexagonal Gridness score (Sargolini et al., 2006). The Gridness score of the spatial fields was calculated from a cropped ring of their autocorrelogram including the six maxima closest to the center. The ring was rotated in steps of ${30}^{\circ}$, reaching angles of ${30}^{\circ},{60}^{\circ},{90}^{\circ},{120}^{\circ}$ and ${150}^{\circ}$. For every rotation angle the Pearson correlation with the original unrotated map was obtained. Denoting by ${C}_{\gamma}$ the correlation for a specific rotation angle $\gamma $, the final Gridness score was (Kropff and Treves, 2008):
In addition to this 'traditional' score we used a Squareness Gridness score in order to examine how square-like the results are spatially. The special reference to the square shape was driven by the tendency of the spatial solution to converge to a rectangular shape when no constraints were applied. The Squareness Gridness score is similar to the hexagonal one, but now the cropped ring of the autocorrelogram is rotated ${45}^{\circ}$ every iteration to reach angles of ${45}^{\circ},{90}^{\circ}$ and ${135}^{\circ}$. As before, denoting ${C}_{\gamma}$ as the correlation for a specific rotation angle $\gamma $, the new Gridness score was calculated as:
All errors calculated in gridness measures are SEM (Standard Error of the Mean).
Hierarchical networks and modules
As described in the Results section, we were interested to check whether a hierarchy of outputs could explain the module phenomenon described for real grid cells. We replaced the single-output network with a hierarchical, multiple-output network, which is capable of computing all 'principal components' of the input data while maintaining the nonnegativity constraint as before. The network, introduced by Sanger (1989), computes each output as a linear summation of the weighted inputs similar to Equation 2. However, the weights are now calculated according to:
The first term in the parentheses when $k=1$ was the regular Hebb-Oja derived rule. In other words, the first output calculated the first nonnegative 'principal component' (in inverted commas due to the nonnegativity) of the data. Following the first one, the weights of each output received a back projection from the previous outputs. This learning rule applied to the data in a similar manner to the Gram-Schmidt process, subtracting the 'influence' of the previous 'principal components' on the data and recalculating the appropriate 'principal components' of the updated input data.
In a comparable manner, we applied this technique to the input data $\mathit{X}$ in order to obtain nonnegative 'eigenvectors' from the direct nonnegative-PCA algorithms. We found ${\mathit{V}}_{2}$ by first subtracting from the data the projection of ${\mathit{V}}_{1}$ on it,
Next, we computed ${\mathit{V}}_{2}$, the first nonnegative 'principal component' of $\stackrel{~}{\mathit{X}}$, and similarly the subsequent ones.
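The deflation step described above can be sketched as follows. The sketch assumes the data are stored with one time sample per row and one place cell per column, and that the projection is removed as $\stackrel{~}{X}=X-XVV^{\top}$ for a unit vector $V$; this storage convention and the helper name `deflate` are assumptions for illustration.

```python
import math

def deflate(X, v):
    """Remove from each row of X its component along the direction v."""
    norm = math.sqrt(sum(a * a for a in v))
    u = [a / norm for a in v]                        # unit vector along v
    deflated = []
    for row in X:
        proj = sum(a * b for a, b in zip(row, u))    # coefficient of row along u
        deflated.append([a - proj * b for a, b in zip(row, u)])
    return deflated

# Hypothetical data: three time samples of two place cells.
X = [[1.0, 2.0], [3.0, 4.0], [0.0, -1.0]]
V1 = [3.0, 4.0]
X_tilde = deflate(X, V1)
# Every deflated sample is orthogonal to V1, so V1 carries no remaining variance.
residual = [sum(a * b for a, b in zip(row, V1)) for row in X_tilde]
```

A nonnegative-PCA routine would then be run on `X_tilde` to obtain the next component.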
Stability of hexagonal solutions
In order to test the stability of the solutions we obtained under all types of conditions, we applied the ODE method (Kushner and Clark, 1978; Hornik and Kuan, 1992; Weingessel and Hornik, 2000) to the PCA feature extraction algorithm introduced in previous sections. This method allows one to asymptotically replace the stochastic update equations describing the neural dynamics by smooth differential equations describing the average asymptotic behavior. Under appropriate conditions, the stochastic dynamics converge with probability one to the solution of the ODEs. Although originally this approach was designed for a more general architecture (including lateral connections and asymmetric updating rules), we used a restricted version for our system. In addition, the following analysis is accurate solely for linear output functions. However, since our architecture works well with either linear or nonlinear output functions, the conclusions are valid.
We can rewrite the relevant updating equations of the linear neural network (in matrix form), (see [Weingessel and Hornik, 2000] Equations 15–19):
In our case we set
Consider the following assumptions
The input sequence ${r}^{t}$ consists of independent identically distributed, bounded random variables with zeromean.
$\left\{{\epsilon}^{t}\right\}$ is a positive number sequence satisfying: $\sum _{t}{\epsilon}^{t}=\infty$, $\sum _{t}{\left({\epsilon}^{t}\right)}^{2}<\infty $.
A typical suitable sequence is ${\epsilon}^{t}=\frac{1}{t},t=1,2\dots $.
For long times, we denote
The penultimate equalities in these equations used the fact that the weights converge with probability one to their average value, resulting from the solution of the ODEs. Following Weingessel and Hornik (2000), we can analyze Equations 12,13 under the above assumptions, via their asymptotically equivalent associated ODEs
with equilibria at
We solved it numerically by exploiting the same covariance matrix and initializing with random weights $J$. In line with our previous findings, we found that constraining $J$ to be nonnegative (by a simple cutoff rule) resulted in a hexagonal shape (in the projection of $J$ onto the place cells space; Figure 11). In contrast, when the weights were not constrained they converged to squarelike results.
Steady state analysis
From this point onwards, we focus on the case of a single output, in which $J$ is a row vector, unless stated otherwise. In the unconstrained case, from Equation 17 any $J$ which is a normalized eigenvector of $\Sigma $ would be a fixed point. However, from Equation 16, only the principal eigenvector, which is the solution to the following optimization problem
would correspond to a stable fixed point. This is the standard PCA problem. By adding the constraint $J\ge 0$ we get the nonnegative PCA problem.
To speed up simulation and simplify analysis we make further simplifications.
First, we assume that the agent’s random movement is ergodic (e.g., an isotropic random walk in a finite box as we used in our simulation), uniform and covering the entire environment, so that
where $x$ denotes the location vector (in contrast to $X\left(t\right)$, which is the random process corresponding to the location of the agent), $S$ is the entire environment, and $\left|S\right|$ is the size of the environment.
Second, we assume that the environment $S$ is uniformly and densely covered by identical place cells, each of which has the same tuning curve $r\left(\mathbf{x}\right)$ (which integrates to zero). In this case, the activity of the linear grid cell becomes a convolution operation
where $J\left(\mathbf{\text{x}}\right)$ is the synaptic weight connecting to the place cell at location $\text{x}$.
Thus, we can write our objective as
under the constraint that the weights are normalized
where either $J\left(\mathbf{\text{x}}\right)\in \mathrm{\mathbb{R}}$ (PCA) or $J\left(\mathbf{\text{x}}\right)\ge 0$ ('nonnegative PCA').
Since we expressed the objective using a convolution operation (different boundary conditions can be assumed), it can be solved numerically considerably faster. In the nonnegative case, we used the parameter-free Fast Iterative Shrinkage-Thresholding Algorithm [FISTA (Beck and Teboulle, 2009); in which we do not use shrinkage, since we only have hard constraints], where the gradient was calculated efficiently using convolutions.
Moreover, as we show in the following sections, if we assume periodic boundary conditions and use Fourier analysis, we can analytically find the PCA solutions, and obtain important insight on the nonnegative PCA solutions.
Fourier notation
Any continuously differentiable function $f\left(\mathbf{\text{x}}\right)$, defined over $S\triangleq {[0,L]}^{D}$, a 'box' region in $D$ dimensions, with periodic boundary conditions, can be written using a Fourier series
where $\left|S\right|={L}^{D}$ is the volume of the box and
is the reciprocal lattice of $S$ in $k$-space (frequency space).
PCA solution
Assuming periodic boundary conditions, we use Parseval’s identity, and the properties of the convolution, to transform the steady state objective (Equation 21) to its simpler form in the Fourier domain,
Similarly, the normalization constraint can also be written in the Fourier domain,
Maximizing the objective Equation 24 under this constraint in the Fourier domain, we immediately get that any solution is a linear combination of the Fourier components,
where
and $\hat{J}\left(\mathbf{\text{k}}\right)$ satisfies the normalization constraint. In the original space, the Fourier components are
where $\varphi \in \left[0,2\pi \right)$ is a free parameter that determines the phase. Also, since $J\left(\mathbf{\text{x}}\right)$ should assume real values, it is composed of real Fourier components
This is a valid solution, since $r\left(\mathbf{x}\right)$ is a real-valued function, $\hat{r}\left(\mathbf{k}\right)=\hat{r}\left(-\mathbf{k}\right)$ and therefore $-\mathbf{k}_{*}\in \mathrm{argmax}_{\mathbf{k}\in \hat{S}}\,\hat{r}\left(\mathbf{k}\right)$.
PCA solution for a difference of Gaussians tuning curve
In this paper we focused on the case where $r\left(\mathbf{\text{x}}\right)$ has the shape of a difference of Gaussians (Equation 7),
where ${c}_{1}$ and ${c}_{2}$ are some positive normalization constants, set so that ${\int}_{S}r\left(\mathbf{\text{x}}\right)d\mathbf{\text{x}}=0$ (see appendix). The Fourier transform of $r\left(\mathbf{\text{x}}\right)$ is also a difference of Gaussians
$\forall \mathbf{k}\in \hat{S}$, as we show in the appendix. Therefore the value of the Fourier domain objective only depends on the radius $\Vert \mathbf{k}\Vert $, and all solutions $\mathbf{k}_{*}$ have the same radius $\Vert \mathbf{k}_{*}\Vert $. If $L\to \infty $, then the $k$-lattice $\widehat{S}$ becomes dense ($\widehat{S}\to {\mathbb{R}}^{D}$) and this radius is equal to
which is a unique maximizer, that can be easily obtained numerically.
Notice that if we multiply the place cell field width by some positive constant $c$, then the solution ${k}_{\dagger}$ will be divided by $c$. The grid spacing, proportional to $\frac{1}{{k}_{\dagger}}$, would therefore also be multiplied by $c$. This entails a linear dependency between the place cell field width and the grid cell spacing, in the limit of a large box size $\left(L\to \infty \right)$. When the box has a finite size, $k$-lattice discretization also has a (usually small) effect on the grid spacing.
In that case, all solutions $\mathbf{k}_{*}$ are restricted to be on the finite lattice $\widehat{S}$. Therefore, the solutions $\mathbf{k}_{*}$ are the points on the lattice $\widehat{S}$ for which the radius $\Vert \mathbf{k}_{*}\Vert $ is closest to ${k}_{\dagger}$ (see Figure 15B,C).
The degeneracy of the PCA solution
The number of real-valued PCA solutions (degeneracy) in 1D is two, as there are exactly two maxima, ${k}_{*}$ and $-{k}_{*}$. The phase $\varphi $ determines how the components at ${k}_{*}$ and $-{k}_{*}$ are linearly combined.
However, there are more maxima in the 2D case. Specifically, given a maximum $\mathbf{k}_{*}$, we can write $\left(m,n\right)=\frac{L}{2\pi}\mathbf{k}_{*}$, where $(m,n)\in {\mathbb{Z}}^{2}$. Usually there are 7 other different points with the same radius: $\left(-m,n\right)$, $\left(m,-n\right)$, $\left(-m,-n\right)$, $\left(n,m\right)$, $\left(-n,m\right)$, $\left(n,-m\right)$ and $(-n,-m)$, so we will have a degeneracy of eight (corresponding to the symmetries of a square box). This is the case for the points in group B, shown in Figure 15C.
However, we can also get a different degeneracy. First, if either $m=\pm n$, $n=0$ or $m=0$ we will have a degeneracy of 4, since then some of the original eight points will coincide (groups A,C and D in Figure 15C). Second, additional points $\left(k,r\right)$ can exist such that ${k}^{2}+{r}^{2}={m}^{2}+{n}^{2}$, (Pythagorean triplets with the same hypotenuse) – for example, ${15}^{2}+{20}^{2}={25}^{2}={7}^{2}+{24}^{2}$. These points will also appear in groups of four or eight.
Therefore, we will always have a degeneracy which is some multiple of 4. Note that in the full network simulation, the degeneracy is not exact. This is due to the perturbation noise from the agent’s random walk as well as the nonuniform sampling of the place cells.
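The counting argument above can be checked by brute force. The sketch below enumerates all integer lattice points $(m,n)$ at a given squared radius and reports how many there are; the function names are hypothetical, and the enumeration is only meant for small radii.

```python
def lattice_points_on_circle(radius_sq):
    """All integer pairs (m, n) with m^2 + n^2 == radius_sq."""
    limit = int(radius_sq ** 0.5) + 1
    return [(m, n)
            for m in range(-limit, limit + 1)
            for n in range(-limit, limit + 1)
            if m * m + n * n == radius_sq]

def degeneracy(m, n):
    """Number of lattice points sharing the radius of the point (m, n)."""
    return len(lattice_points_on_circle(m * m + n * n))

generic = degeneracy(2, 1)       # generic point: eight symmetric copies
on_axis = degeneracy(3, 0)       # n = 0: copies coincide, leaving four
diagonal = degeneracy(2, 2)      # m = n: again four
triplets = degeneracy(15, 20)    # radius 25 is shared with (7, 24) and (25, 0)
```

In each of these cases the count is a multiple of four, in line with the statement above; the last case combines two groups of eight with one group of four.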
The PCA solution with a nonnegative constraint
Next, we add the nonnegativity constraint $J\left(\mathbf{x}\right)\ge 0$. As mentioned earlier, this constraint renders the optimization problem NP-hard, and prevents us from obtaining a complete analytical solution. We therefore combine numerical and mathematical analysis, in order to gain intuition as to why
Locally optimal 2D solutions are hexagonal.
These solutions have a grid spacing near $4\pi /\left(\sqrt{3}{k}_{\dagger}\right)$ (${k}_{\dagger}$ is the peak of $\hat{r}\left(k\right)$).
The average grid alignment is approximately 7.5°, for large environments.
Why grid cells have modules, and what is their spacing.
1D Solutions
Our numerical results indicate that the Fourier components of any locally optimal 1D solution of nonnegative PCA have the following structure:
There is a nonnegative 'DC component' (k = 0).
The maximal non-DC component ($k\ne 0$) is ${k}_{*}$, where ${k}_{*}$ is 'close' (more details below) to ${k}_{\dagger}$, the peak of $\hat{r}\left(k\right)$.
All other nonzero Fourier components are ${\left\{m{k}_{*}\right\}}_{m=1}^{\infty}$, weaker harmonics of ${k}_{*}$.
This structure suggests that the component at ${k}_{*}$ aims to maximize the objective, while the other components guarantee the nonnegativity of the solution $J\left(x\right)$. In order to gain some analytical intuition as to why this is the case, we first examine the limit case that $L\to \infty $ and $\widehat{r}\left(k\right)$ is highly peaked at ${k}_{\dagger}$. In that case the Fourier objective (Equation 24) simply becomes $2{\left|\widehat{r}\left({k}_{\dagger}\right)\right|}^{2}{\left|\widehat{J}\left({k}_{\dagger}\right)\right|}^{2}$. For simplicity, we will rescale our units so that ${\left|\widehat{r}\left({k}_{\dagger}\right)\right|}^{2}=1/2$, and the objective becomes ${\left|\widehat{J}\left({k}_{\dagger}\right)\right|}^{2}$. Therefore, the solution must include a Fourier component at ${k}_{\dagger}$ or the objective would be zero. The other components exist only to maintain the nonnegativity constraint, since if they increase in magnitude, then the objective, which is proportional to ${\left|\widehat{J}\left({k}_{\dagger}\right)\right|}^{2}$, must decrease to compensate (due to the normalization constraint, Equation 25). Note that these components must include a positive 'DC component' at $k=0$, or else $\int_{S}J\left(x\right)dx\propto \widehat{J}\left(0\right)\le 0$, which contradicts the constraints. To find all the Fourier components, we examine a solution composed of only a few ($M$) components
Clearly, we can set ${k}_{1}={k}_{\dagger}$, or otherwise, the objective would be zero. Also, we must have
Otherwise, the solution would be either (1) negative or (2) non-optimal, since we can decrease $\widehat{J}\left(0\right)$ and increase $\left|{J}_{1}\right|$.
For $M=1$, we immediately get that, in the optimal solution, $2{\widehat{J}}_{1}=\widehat{J}\left(0\right)=\sqrt{2/3}$ (${\varphi}_{m}$ does not matter). For $M=2,3$ and $4$ a solution is harder to find directly, so we performed a parameter grid search over all the free parameters (${k}_{m}$, ${\widehat{J}}_{m}$ and ${\varphi}_{m}$) in those components. We found that the optimal solution (which maximizes the objective ${\left|\widehat{J}\left({k}_{\dagger}\right)\right|}^{2}$) had the following form
where ${x}_{0}$ is a free parameter. This form results from a parameter grid search for $M=1,2,3$ and $4$, under the assumption that $L\to \infty $ and $\widehat{r}\left(k\right)$ is highly peaked. However, our numerical results in the general case (Figure 16A), using the FISTA algorithm, indicate that the locally optimal solution does not change much even if $L$ is finite, and $\widehat{r}\left(k\right)$ is not highly peaked. Specifically, it has a similar form
Since $\widehat{J}\left(m{k}_{*}\right)$ is rapidly decaying (Figure 16A), effectively only the first few components are nonnegligible, as in Equation 33. This can also be seen in the value of the objective obtained in the parameter scan
where the contribution of additional high frequency components to the objective quickly becomes negligible. In fact, the value of the objective cannot increase above $0.25$, as we explain in the next section.
And so, the main difference between Equations 33 and 34 is the base frequency, ${k}_{*}$, which is slightly different from ${k}_{\dagger}$. As explained in the appendix, the relation between ${k}_{*}$ and ${k}_{\dagger}$ depends on the $k$-lattice discretization, as well as on the properties of $\widehat{r}\left(k\right)$.
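The $M=1$ value quoted in this section can be verified by hand. The sketch below assumes that the truncated solution has the form $J(x)=\widehat{J}(0)+2{\widehat{J}}_{1}\cos({k}_{\dagger}x+{\varphi}_{1})$ and that, in the rescaled units, the normalization constraint reads $\widehat{J}(0)^{2}+2{\widehat{J}}_{1}^{2}=1$. Both forms are assumptions inferred from the quoted result.

$$
\begin{aligned}
J(x)\ge 0\ \ \forall x \quad &\Longleftrightarrow\quad \widehat{J}(0)\ \ge\ 2\,|\widehat{J}_{1}|,\\
\text{constraint tight: } \widehat{J}(0)=2\widehat{J}_{1} \quad &\Longrightarrow\quad (2\widehat{J}_{1})^{2}+2\widehat{J}_{1}^{2}=6\,\widehat{J}_{1}^{2}=1,\\
\widehat{J}_{1}=\tfrac{1}{\sqrt{6}},\qquad \widehat{J}(0)=\tfrac{2}{\sqrt{6}}=\sqrt{2/3},\qquad &|\widehat{J}(k_{\dagger})|^{2}=\widehat{J}_{1}^{2}=\tfrac{1}{6}\approx 0.167 .
\end{aligned}
$$

Under these assumptions the single-harmonic solution reaches an objective of $1/6$, which lies below the upper bound of $0.25$ stated for the 1D case and leaves room for the higher harmonics to improve it.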
2D Solutions
The 1D properties, described in the previous section, generalize to the 2D case in the following manner:
There is a nonnegative DC component ($\mathbf{k}=(0,0)$).
A small 'basis set' of components ${\left\{{\mathbf{k}}_{*}^{\left(i\right)}\right\}}_{i=1}^{B}$ with similar amplitudes, and with similar radii $\Vert {\mathbf{k}}_{*}^{\left(i\right)}\Vert $ which are all 'close' to ${k}_{\dagger}$ (details below).
All other nonzero Fourier components are weaker, and restricted to the lattice
Interestingly, given these properties of the solution we already get hexagonal patterns, as we explain next.
Similarly to the 1D case, the difference between $\Vert {\mathbf{k}}_{*}^{\left(i\right)}\Vert $ and ${k}_{\dagger}$ is affected by lattice discretization, and the curvature of $\widehat{r}\left(k\right)$ near ${k}_{\dagger}$. To simplify matters, we focus first on the simple case that $L\to \infty $ and $\widehat{r}\left(k\right)$ is sharply peaked around ${k}_{\dagger}$. Therefore, the Fourier objective becomes $\sum _{i=1}^{B}{\left|\widehat{J}\left({\mathbf{k}}_{*}^{\left(i\right)}\right)\right|}^{2}$, so the only Fourier components that appear in the objective are ${\left\{{\mathbf{k}}_{*}^{\left(i\right)}\right\}}_{i=1}^{B}$, which have radius ${k}_{\dagger}$. We examine the values this objective can have.
All the base components have the same radius. This implies, according to the Crystallographic restriction theorem in 2D, that the only allowed lattice angles (in the range between 0 and 90 degrees) are 0, 60 and 90 degrees. Therefore, there are only three possible lattice types in 2D. Next, we examine the value of the objective for each of these lattice types:
1) Square lattice, in which ${\mathbf{k}}_{*}^{\left(1\right)}={k}_{\dagger}(1,0)$, ${\mathbf{k}}_{*}^{\left(2\right)}={k}_{\dagger}(0,1)$, up to a rotation. In this case,
and the value of the objective is bounded above by $0.25$ (see proof in appendix).
2) 1D lattice, in which ${\mathbf{k}}_{*}^{\left(1\right)}={k}_{\dagger}(1,0)$, up to a rotation. This is a special case of the square lattice, with a subset of the ${\widehat{J}}_{{m}_{x},{m}_{y}}$ equal to zero, so we can write, as we did in the 1D case
Therefore, the same objective upper bound, $0.25$, holds. Note that some of the solutions we found numerically are close to this bound (Equation 35).
3) Hexagonal lattice, in which the base components are
up to a rotation by some angle $\alpha $. Our parameter scans indicate that the objective value cannot surpass $0.2$ in any solution composed of only the base hexagonal components ${\left\{{\mathbf{k}}_{*}^{\left(m\right)}\right\}}_{m=1}^{3}$ and a DC component. However, by also taking into account some higher-order lattice components, we can find a better solution, with an objective value of $0.2558$. Though this is not necessarily the optimal solution, it surpasses any possible solution on the other lattice types (whose objective is bounded above by $0.25$, as we prove in the appendix). Specifically, this solution is composed of the base vectors ${\left\{{\mathbf{k}}_{*}^{\left(m\right)}\right\}}_{m=1}^{3}$ and their harmonics
with ${\mathbf{k}}_{*}^{\left(4\right)}=2{\mathbf{k}}_{*}^{\left(1\right)}$, ${\mathbf{k}}_{*}^{\left(5\right)}=2{\mathbf{k}}_{*}^{\left(2\right)}$, ${\mathbf{k}}_{*}^{\left(6\right)}=2{\mathbf{k}}_{*}^{\left(3\right)}$, ${\mathbf{k}}_{*}^{\left(7\right)}={\mathbf{k}}_{*}^{\left(1\right)}+{\mathbf{k}}_{*}^{\left(2\right)}$, ${\mathbf{k}}_{*}^{\left(8\right)}={\mathbf{k}}_{*}^{\left(1\right)}+{\mathbf{k}}_{*}^{\left(3\right)}$. Also, ${\widehat{J}}_{0}=0.6449$, ${\widehat{J}}_{1}={\widehat{J}}_{2}={\widehat{J}}_{3}=0.292$, ${\widehat{J}}_{4}={\widehat{J}}_{5}={\widehat{J}}_{6}=0.0101$ and ${\widehat{J}}_{7}={\widehat{J}}_{8}=0.134$.
Thus, any optimal solution must be on the hexagonal lattice, given our approximations. In practice, the hexagonal basis vectors of the lattice do not have exactly the same radius and, as in the 1D case, this radius is somewhat smaller than ${k}_{\dagger}$, because of the lattice discretization and because $\widehat{r}\left(k\right)$ is not sharply peaked. However, the resulting solution lattice is still approximately hexagonal in $k$-space. For example, this can be seen in the numerically obtained solution in Figure 16B, where the strongest non-DC Fourier components (group A, defined in Figure 17) form an approximate hexagon near ${k}_{\dagger}$.
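The amplitudes quoted for the hexagonal solution can be checked with a few lines of arithmetic. The sketch below assumes that the objective sums the squared amplitudes of the three base components only, and that the normalization counts every nonzero component together with its partner at $-\mathbf{k}$; both conventions are our reading of the text, and the variable names are purely illustrative.

```python
# Amplitudes quoted in the text for the hexagonal solution.
J0 = 0.6449               # DC component
J_base = [0.292] * 3      # base components k*(1), k*(2), k*(3)
J_double = [0.0101] * 3   # harmonics 2k*(1), 2k*(2), 2k*(3)
J_sum = [0.134] * 2       # k*(1)+k*(2) and k*(1)+k*(3)

# Objective: squared amplitudes of the base components only (assumed convention).
objective = sum(a ** 2 for a in J_base)

# Assumed normalization: each nonzero k is paired with -k,
# so every non-DC amplitude is counted twice.
non_dc = J_base + J_double + J_sum
norm = J0 ** 2 + 2 * sum(a ** 2 for a in non_dc)

print(f"objective     = {objective:.4f}")
print(f"normalization = {norm:.4f}")
```

Under these assumptions the objective evaluates to about 0.2558, above the 0.25 bound of the square and 1D lattices, while the normalization comes out close to 1.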
Grid spacing
In general, we get a hexagonal grid pattern in $x$-space. If all base Fourier components have a radius of ${k}_{\dagger}$, then the grid spacing in $x$-space is $4\pi /\left(\sqrt{3}{k}_{\dagger}\right)$. Since the radius of the basis vectors can be smaller than ${k}_{\dagger}$, the value $4\pi /\left(\sqrt{3}{k}_{\dagger}\right)$ is a lower bound on the actual grid spacing (as demonstrated in Figure 12A), up to lattice discretization effects.
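The spacing formula follows from the geometry of reciprocal lattices, and can be verified numerically. The following is a minimal sketch: the base radius `k` is an arbitrary illustrative value, and the helper names `hex_grid_spacing` and `pattern` are ours, not taken from the simulation code.

```python
import numpy as np

def hex_grid_spacing(k_radius):
    """Spacing of a hexagonal grid whose base Fourier components have radius k_radius."""
    return 4 * np.pi / (np.sqrt(3) * k_radius)

k = 1.3  # hypothetical base radius, for illustration only

# Two base wave vectors, 60 degrees apart (rows of B).
B = k * np.array([[1.0, 0.0],
                  [0.5, np.sqrt(3) / 2]])

# Real-space lattice vectors a_i (rows of A) satisfy a_i . b_j = 2*pi*delta_ij.
A = 2 * np.pi * np.linalg.inv(B).T
spacing_from_lattice = np.linalg.norm(A[0])

def pattern(x):
    """Sum of three plane waves whose wave vectors are 60 degrees apart."""
    b3 = B[1] - B[0]
    return np.cos(B[0] @ x) + np.cos(B[1] @ x) + np.cos(b3 @ x)

print(spacing_from_lattice, hex_grid_spacing(k))
```

The length of the real-space lattice vector agrees with $4\pi /(\sqrt{3}k)$, and the three-wave pattern returns to its maximum when shifted by that vector.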
Grid alignment
The angle of the hexagonal grid, $\alpha $, is determined by the directions of the hexagonal vectors. An angle $\alpha $ is possible if there exists a $k$-lattice point $\mathbf{k}=\frac{2\pi}{L}(m,n)$ (with $m,n$ integers) for which $\frac{\pi}{L}>\sqrt{\frac{2\pi}{L}\left({m}^{2}+{n}^{2}\right)-{k}_{*}^{2}}$, and then $\alpha =\mathrm{arctan}\frac{n}{m}$. Since the hexagonal lattice has a rotational symmetry of ${60}^{\circ}$, we can restrict $\alpha $ to the range $-{30}^{\circ}\le \alpha \le {30}^{\circ}$. The grid alignment, which is the minimal angle of the grid with the box boundaries, is given by
which is limited to the range $\left[{0}^{\circ},{15}^{\circ}\right]$, since $-{30}^{\circ}\le \alpha \le {30}^{\circ}$. There are usually several possible grid alignments, which are (approximately) rotated versions of each other (i.e., different $\alpha $). Note that, due to the $k$-lattice discretization, different alignments can result in slightly different objective values. However, the numerical algorithms we used to solve the optimization problem reached many possible grid alignments with positive probability (Figure 12C), since we started from a random initialization and converged to a local optimum.
In the limit $L\to \infty $, the grid alignment becomes uniformly distributed over the range $\left[{0}^{\circ},{15}^{\circ}\right]$, so that the average grid alignment is ${7.5}^{\circ}$.
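The limiting average can be illustrated numerically. The sketch below assumes a specific form for the alignment, $\mathrm{min}\left(|\alpha |,{30}^{\circ}-|\alpha |\right)$, which follows from the grid axes lying at $\alpha $ and $\alpha \pm {60}^{\circ}$ and the box walls lying at multiples of ${90}^{\circ}$; this form and the function name `grid_alignment` are our assumptions for illustration.

```python
import numpy as np

def grid_alignment(alpha_deg):
    """Smallest angle (degrees) between a hexagonal grid axis and a box wall.

    Assumed form: reduce alpha to [-30, 30) using the 60-degree symmetry,
    then take min(|alpha|, 30 - |alpha|).
    """
    a = (np.asarray(alpha_deg, dtype=float) + 30.0) % 60.0 - 30.0
    return np.minimum(np.abs(a), 30.0 - np.abs(a))

# Grid angles spread evenly over the allowed range, mimicking the L -> infinity limit.
alphas = np.linspace(-30.0, 30.0, 6001)
alignments = grid_alignment(alphas)
mean_alignment = alignments.mean()

print(alignments.min(), alignments.max(), mean_alignment)
```

With evenly spread grid angles, the alignment stays within 0 to 15 degrees and its mean is close to 7.5 degrees.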
Hierarchical networks and modules
There are multiple routes to generalize nonnegative PCA to multiple vectors. In this paper we chose to do so using a 'Gram-Schmidt'-like process, which can be written in the following way. First we define,
and then, recursively, this process recovers nonnegative 'eigenvectors' by subtracting out the previous components, similarly to Sanger’s multiple PCA algorithm (Sanger, 1989), and enforcing the nonnegativity constraint.
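As an illustration of the idea (subtract the previously found components, then solve a nonnegative problem again), here is a minimal sketch. It is not the algorithm used in the paper: the inner solver is a simple projected power iteration that we substitute for illustration, the deflation step is one possible choice, and the names `nonneg_pc` and `nonneg_components` are hypothetical.

```python
import numpy as np

def nonneg_pc(C, n_iter=500, seed=0):
    """Projected power iteration: a heuristic for a leading nonnegative unit vector of C."""
    rng = np.random.default_rng(seed)
    J = rng.random(C.shape[0])            # nonnegative starting point
    J /= np.linalg.norm(J)
    for _ in range(n_iter):
        J_new = np.maximum(C @ J, 0.0)    # power step, then projection onto J >= 0
        norm = np.linalg.norm(J_new)
        if norm == 0.0:                   # degenerate case: keep the last iterate
            break
        J = J_new / norm
    return J

def nonneg_components(X, n_components):
    """Extract nonnegative 'eigenvectors' one at a time, deflating the inputs in between."""
    X = X - X.mean(axis=0)                # zero-mean inputs
    components = []
    for _ in range(n_components):
        C = X.T @ X / X.shape[0]          # covariance of the (deflated) inputs
        J = nonneg_pc(C)
        components.append(J)
        X = X - np.outer(X @ J, J)        # subtract the part of the input explained by J
    return np.array(components)

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 8))         # synthetic stand-in for place cell activity
W = nonneg_components(X, 3)
print(W.round(3))
```

Each returned row is nonnegative and has unit norm, but the rows are not guaranteed to be mutually orthogonal, since the nonnegativity projection breaks exact orthogonality.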
To analyze this, we write the objectives we maximize in the Fourier domain, using Parseval’s Theorem.
For $n=1$, we recover the old objective (Equation 24):
For $n=2$, we get
where ${\hat{J}}^{*}$ is the complex conjugate of $\hat{J}$. This objective is similar to the original one, except that it penalizes $\widehat{J}\left(k\right)$ if its components are similar to those of ${\widehat{J}}_{1}\left(k\right)$. As n increases the objective becomes more and more complicated, but as before, it contains terms which penalize ${\widehat{J}}_{n}\left(k\right)$ if its components are similar to any of the previous solutions (i.e., ${\widehat{J}}_{m}\left(k\right)$ for $m<n$). This form suggests that each new 'eigenvector' tends to occupy new points in the Fourier lattice (similarly to unconstrained PCA solutions).
For example, the numerical solution shown in Figure 16B is composed of the Fourier lattice components in group A, defined in Figure 17. A completely equivalent solution would be in group B (it is just a 90-degree rotation of the first). The next 'eigenvectors' should then include other Fourier-lattice components outside groups A and B. Note that components with smaller $k$-radius cannot be arranged to be hexagonal (not even approximately), so they will have a low gridness score. In contrast, the next components with higher $k$-radius (e.g., group C) can form an approximately hexagonal shape together, and would appear as an additional grid cell 'module'. The grid spacing of this new module will decrease by a factor of $\sqrt{2}$, since the new $k$-radius is about $\sqrt{2}$ times larger than the $k$-radius of groups A and B.
Appendix
Movement schema of agent and environment data
The relevant simulation data used:
Size of arena: 10 × 10
Place cell field width: 0.75
Place cell distribution: uniform
Velocity: 0.25 (linear), 0.1-6.3 (angular)
Number of place cells: 625
Learning rate: 1/(t+1e5)
The agent was moved around the virtual environment according to:
where ${D}^{t}$ is the current direction angle, $\omega $ is the angular velocity, $Z\sim N(0,1)$ where $N$ is the standard normal Gaussian distribution, $\nu $ is the linear velocity, and $\left({x}^{t},{y}^{t}\right)$ is the current position of the agent. Edges were treated as periodic: when the agent arrived at one side of the box, it was teleported to the other side.
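A movement rule of this kind can be sketched as follows. The update order (perturb the direction by $\omega Z$, then step a distance $\nu $ along it, then wrap around the box) is our assumption about the scheme, the value `omega=0.5` is an arbitrary illustrative choice, and `simulate_walk` is a hypothetical helper name.

```python
import numpy as np

def simulate_walk(n_steps, box=10.0, nu=0.25, omega=0.5, seed=0):
    """Random walk with a noisy heading and periodic (wrap-around) edges."""
    rng = np.random.default_rng(seed)
    positions = np.empty((n_steps, 2))
    x, y = box / 2, box / 2                   # start in the middle of the arena
    direction = 0.0
    for t in range(n_steps):
        # Perturb the heading with Gaussian noise scaled by the angular velocity.
        direction = (direction + omega * rng.standard_normal()) % (2 * np.pi)
        # Step forward and wrap around the box edges.
        x = (x + nu * np.cos(direction)) % box
        y = (y + nu * np.sin(direction)) % box
        positions[t] = x, y
    return positions

trajectory = simulate_walk(10_000)
print(trajectory[:3])
```

The resulting trajectory can then be used to drive the place cell inputs at each time step.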
Positivenegative disks
Positivenegative disks are used with the following activity rules:
where ${c}_{x},{c}_{y}$ are the coordinates of the disk center, and ${\rho}_{1},{\rho}_{2}$ are the radii of the inner circle and the outer ring, respectively. The constant value in the negative ring was chosen to yield zero integral over the disk.
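The zero-integral condition fixes the ring value once the inner value is chosen. The sketch below assumes an activity of 1 inside the inner circle, so that the ring value is $-{\rho}_{1}^{2}/\left({\rho}_{2}^{2}-{\rho}_{1}^{2}\right)$; it also ignores the periodic boundaries for simplicity. The function name `pos_neg_disk` is hypothetical.

```python
import numpy as np

def pos_neg_disk(x, y, cx, cy, rho1, rho2):
    """Positive inner disk (value 1, assumed) surrounded by a constant negative ring."""
    d = np.hypot(x - cx, y - cy)
    # Ring value chosen so that the integral over the whole disk is zero:
    # pi*rho1^2 * 1 + pi*(rho2^2 - rho1^2) * ring_value = 0
    ring_value = -rho1 ** 2 / (rho2 ** 2 - rho1 ** 2)
    return np.where(d <= rho1, 1.0, np.where(d <= rho2, ring_value, 0.0))

rho1, rho2 = 1.0, 2.0                          # illustrative radii
inner = pos_neg_disk(0.0, 0.0, 0.0, 0.0, rho1, rho2)
ring = pos_neg_disk(1.5, 0.0, 0.0, 0.0, rho1, rho2)
outside = pos_neg_disk(3.0, 0.0, 0.0, 0.0, rho1, rho2)
total = np.pi * rho1 ** 2 * inner + np.pi * (rho2 ** 2 - rho1 ** 2) * ring
print(inner, ring, outside, total)
```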
Fourier transform of the difference of Gaussians function
Here we prove that if we are given a difference of Gaussians function,
in which the appropriate normalization constants for a bounded box are
${c}_{i}={\left[{\int}_{-L}^{L}\mathrm{exp}\left(-\frac{{z}^{2}}{2{\sigma}_{i}^{2}}\right)dz\right]}^{-1}={\left[\sqrt{2\pi {\sigma}_{i}^{2}}\left[2F\left(L/{\sigma}_{i}\right)-1\right]\right]}^{-D}$,
with $F\left(x\right)=\frac{1}{\sqrt{2\pi {\sigma}_{i}^{2}}}{\int}_{-\infty}^{x}\mathrm{exp}\left(-\frac{{z}^{2}}{2{\sigma}_{i}^{2}}\right)dz$ being the cumulative normal distribution, then $\forall \mathbf{k}\in \hat{S}={\left\{\left(\frac{2\pi {m}_{1}}{L},\dots ,\frac{2\pi {m}_{D}}{L}\right)\right\}}_{\left({m}_{1},\dots ,{m}_{D}\right)\in {\mathbb{Z}}^{D}}$, we have
Note that the normalization constants ${c}_{i}$ vanish after the Fourier transform, and that in the limit $L\gg {\sigma}_{1},{\sigma}_{2}$ we obtain the standard result for the Fourier transform of an unbounded Gaussian distribution.
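To make the role of this transform concrete, the sketch below locates the peak of its squared magnitude, which is what defines the 'unconstrained' base frequency. It assumes the standard unbounded-space form $\mathrm{exp}\left(-{\sigma}_{1}^{2}{k}^{2}/2\right)-\mathrm{exp}\left(-{\sigma}_{2}^{2}{k}^{2}/2\right)$; the widths used are illustrative (in particular `sigma2` is a hypothetical choice), and the function names are ours.

```python
import numpy as np

def dog_fourier(k, sigma1, sigma2):
    """Assumed unbounded-space transform of a normalized difference of Gaussians."""
    return np.exp(-0.5 * (sigma1 * k) ** 2) - np.exp(-0.5 * (sigma2 * k) ** 2)

def k_peak(sigma1, sigma2):
    """Radius at which dog_fourier peaks, obtained by setting its derivative to zero."""
    return np.sqrt(4 * np.log(sigma2 / sigma1) / (sigma2 ** 2 - sigma1 ** 2))

sigma1, sigma2 = 0.75, 1.5                     # sigma2 is a hypothetical choice
k = np.linspace(0.0, 10.0, 100001)
k_numeric = k[np.argmax(dog_fourier(k, sigma1, sigma2) ** 2)]

print(k_numeric, k_peak(sigma1, sigma2))
```

The numerically located maximum agrees with the closed-form expression, which depends only on the two widths.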
Proof: For simplicity of notation, we assume $D=2$. However, the calculation is identical for any $D$.
To solve this integral, we define
Its derivative is
where in the last equality we used the fact that $aL=2\pi n$ (where $n$ is an integer) if $a\in \widehat{S}$. Solving this differential equation, we obtain
where
substituting this into our last expression for $r\left(\mathbf{k}\right)$ we obtain
which is what we wanted to prove.
Why the 'unconstrained' ${k}_{\dagger}$ is an upper bound on the 'constrained' ${k}_{*}$
In the section 'The PCA solution with a nonnegative constraint – 1D Solutions', we mention that the main difference between Equations 33 and 34 is the base frequency, ${k}_{*}$, which is slightly different from ${k}_{\dagger}$.
Here we explain how the relation between ${k}_{*}$ and ${k}_{\dagger}$ depends on the $k$-lattice discretization, as well as on the properties of $\widehat{r}\left(k\right)$. The discretization effect is similar to the unconstrained case, and can cause a difference of at most $\pi /L$ between ${k}_{*}$ and ${k}_{\dagger}$. However, even if $L\to \infty $, we can expect ${k}_{*}$ to be slightly smaller than ${k}_{\dagger}$. To see this, suppose we have a solution as described above, with ${k}_{*}={k}_{\dagger}+\delta k$, where $\delta k$ is a small perturbation (which does not affect the nonnegativity constraint). We write the perturbed $k$-space objective of this solution
Since ${k}_{\dagger}$ is the peak of ${\left|\widehat{r}\left(k\right)\right|}^{2}$, if ${\left|\widehat{r}\left(k\right)\right|}^{2}$ is monotonically decreasing for $k>{k}_{\dagger}$ (as is the case for the difference of Gaussians function), then any positive perturbation $\delta k>0$ would decrease the objective.
However, a sufficiently small negative perturbation $\delta k<0$ would improve the objective. We can see this from the Taylor expansion of the objective,
in which the derivative ${\left.\frac{d{\left|\widehat{r}\left(k\right)\right|}^{2}}{dk}\right|}_{k=m{k}_{\dagger}}$ is zero for $m=1$ (since ${k}_{\dagger}$ is the peak of ${\left|\widehat{r}\left(k\right)\right|}^{2}$) and negative for $m>1$, if ${\left|\widehat{r}\left(k\right)\right|}^{2}$ is monotonically decreasing for $k>{k}_{\dagger}$. If we gradually increase the magnitude of this negative perturbation $\delta k<0$, at some point the objective will stop increasing and start decreasing, because, for the difference of Gaussians function, ${\left|\widehat{r}\left(k\right)\right|}^{2}$ increases more sharply for $k<{k}_{\dagger}$ than it decreases for $k>{k}_{\dagger}$. We can thus treat the 'unconstrained' ${k}_{\dagger}$ as an upper bound for the 'constrained' ${k}_{*}$, if we ignore discretization effects (i.e., the limit $L\to \infty $).
Upper bound on the constrained objective – 2D square lattice
In the Methods section 'The PCA solution with a nonnegative constraint – 2D Solutions' we examine different 2D lattice-based solutions, including a square lattice, which is a solution of the following constrained maximization problem
Here we prove that the objective is bounded above by $\frac{1}{4}$, i.e., ${\widehat{J}}_{1,0}^{2}+{\widehat{J}}_{0,1}^{2}\le \frac{1}{4}$.
Without loss of generality we assume ${\varphi}_{1,0}={\varphi}_{0,1}=0$ and ${k}_{\dagger}=1$ (we can always shift and scale $x,y$). We denote
and
We examine the domain ${\left[0,2\pi \right]}^{2}$. We denote by ${C}^{\pm}$ the regions in which $P\left(x,y\right)$ is positive or negative, respectively. Note that
so both regions have the same area, and
From the nonnegativity constraint (2), we must have
Also, note that $P\left(x,y\right)$ and $A\left(x,y\right)$ do not share any common Fourier (cosine) components. Therefore, they are orthogonal, with the inner product being the integral of their product on the region ${\left[0,2\pi \right]}^{2}$
$\begin{array}{lll}0& =& {\displaystyle \underset{{[0,2\pi ]}^{2}}{\int}}P(x,y)A(x,y)dxdy\\ & =& {\displaystyle \underset{{C}^{+}}{\int}}P(x,y)A(x,y)dxdy+{\displaystyle \underset{{C}^{-}}{\int}}P(x,y)A(x,y)dxdy\\ & \le & {\displaystyle \underset{{C}^{+}}{\int}}P(x,y)A(x,y)dxdy-{\displaystyle \underset{{C}^{+}}{\int}}{P}^{2}(x,y)dxdy\end{array}$,
where in the last line we used the bound from Equation (A2), and then Equation (A1). Using this result, together with the Cauchy-Schwarz inequality, we have
So,
Summing this equation with Equation (A2), squared and integrated over the region ${C}^{-}$, and dividing by ${\left(2\pi \right)}^{2}$, we obtain
using the orthogonality of the Fourier components (i.e., Parseval’s theorem) to perform the integrals over ${P}^{2}\left(x,y\right)$ and ${A}^{2}(x,y)$, we get
Lastly, plugging in the normalization constraint $\sum _{{m}_{x}=-\infty}^{\infty}\sum _{{m}_{y}=-\infty}^{\infty}{\widehat{J}}_{{m}_{x},{m}_{y}}^{2}=1$, we find
as required.
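The bound can be illustrated on simple nonnegative test functions. The sketch below samples a function on ${\left[0,2\pi \right)}^{2}$ with ${k}_{\dagger}=1$, takes the Fourier coefficients with an FFT, normalizes their total squared magnitude to 1, and evaluates the objective. Treating ${\widehat{J}}_{{m}_{x},{m}_{y}}$ as complex-exponential Fourier coefficients is our assumption, and the two test functions and the helper name are chosen only for illustration.

```python
import numpy as np

def square_lattice_objective(J):
    """|J_hat[1,0]|^2 + |J_hat[0,1]|^2 after normalizing sum |J_hat|^2 to 1."""
    J_hat = np.fft.fft2(J) / J.size      # Fourier-series coefficients
    power = np.abs(J_hat) ** 2
    return (power[1, 0] + power[0, 1]) / power.sum()

n = 64
x = 2 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")

# Two nonnegative test functions living on the square lattice.
obj_stripes = square_lattice_objective(1 + np.cos(X))
obj_square = square_lattice_objective((1 + np.cos(X)) * (1 + np.cos(Y)))

print(obj_stripes, obj_square)
```

The two examples give 1/6 and 2/9, respectively, both below the bound of 1/4.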
References

1. Experience-dependent rescaling of entorhinal grids. Nature Neuroscience 10:682–684. https://doi.org/10.1038/nn1905
2. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences 2:183–202. https://doi.org/10.1137/080716542
3. Grid cells in pre- and parasubiculum. Nature Neuroscience 13:987–994. https://doi.org/10.1038/nn.2602
4. Grid cells require excitatory drive from the hippocampus. Nature Neuroscience 16:309–317. https://doi.org/10.1038/nn.3311
7. Recurrent inhibitory circuitry as a mechanism for grid formation. Nature Neuroscience 16:318–324. https://doi.org/10.1038/nn.3310
8. Asymptotics of eigenvalues and eigenvectors of Toeplitz matrices. Journal of Statistical Mechanics: Theory and Experiment 2009:P05012. https://doi.org/10.1088/17425468/2009/05/P05012
9. Imaging spatiotemporal dynamics of surround inhibition in the barrels somatosensory cortex. Journal of Neuroscience 23:3100–3105.
10. Space, Time and Memory in the Hippocampal Formation. Vienna: Springer Vienna. https://doi.org/10.1007/9783709112922
12. The contrast sensitivity of retinal ganglion cells of the cat. The Journal of Physiology 187:517–552.
13. Slowness and sparseness lead to place, head-direction, and spatial-view cells. PLoS Computational Biology 3:e166. https://doi.org/10.1371/journal.pcbi.0030166
22. Stochastic Approximation Methods for Constrained and Unconstrained Systems. Applied Mathematical Sciences, Vol. 26. New York, NY: Springer New York. 10.1007/9781468493528
25. Homing by path integration in a mammal. Naturwissenschaften 67:566–567. https://doi.org/10.1007/BF00450672
27. The mantle of the heavens: Reflections on the 2014 Nobel Prize for medicine or physiology. Hippocampus 25. 10.1002/hipo.22455
28. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Research 34:171–175.
29. The hippocampus as a cognitive map. Oxford: Oxford University Press.
30. A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology 15:267–273.
39. Self-organization of grid fields under supervision of place cells in a neuron model with associative plasticity. Biologically Inspired Cognitive Architectures 13:48–62. https://doi.org/10.1016/j.bica.2015.06.006
43. Effects of visual deprivation on morphology and physiology of cells in the cat's lateral geniculate body. Journal of Neurophysiology 26:978–993.
45. Hippocampal Formation. In: G Paxinos, editor. The Rat Nervous System (3rd ed). San Diego, CA: Elsevier. pp. 635–704. https://doi.org/10.1016/B9780125476386/500225
47. Models of grid cell spatial firing published 2005-2011. Frontiers in Neural Circuits 6. 10.3389/fncir.2012.00016
Decision letter

Michael J Frank, Reviewing Editor; Brown University, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your work entitled "Extracting grid characteristics from spatially distributed place cell inputs using nonnegative PCA" for peer review at eLife. Your submission has been favorably evaluated by Timothy Behrens (Senior editor), Reviewing editor Michael Frank, and two reviewers.
The reviewers have discussed the reviews with one another and the Reviewing editor has drafted this decision to help you prepare a revised submission.
Summary:
The authors present analytical and neural network simulations which demonstrate that, under certain constraints, the first principal component of place cell firing patterns in a 2D arena resembles hexagonal grid cell firing patterns. Specifically, these constraints require that the principal components be constrained to nonnegativity, and that the place cell inputs have a zero mean in time or space. In addition, they suggest that this accounts for interesting features of grid cell firing, namely the tendency for grid scale to be discretely clustered, and for the grid to align at approximately 7 degrees to the walls of a square arena. This is an important result which adds to a growing body of recent studies that have emphasised the importance of the projection from place cells to grid cells, as opposed to the grid cell to place cell projection that has been the subject of several previous theoretical studies.
As you will see however, the Reviewers felt that it was important to provide some intuition and cogent explanation about how the results were obtained, and several questions arose along these lines. It is not clear how the dual constraints of (eigenvector) nonnegativity and orthogonality are actually satisfied, nor is it clear why they get 7-degree alignment or a ratio of 1.4 for the jump in grid scale.
In addition to the revisions articulated below, the following suggestion arose from discussion amongst the Reviewers/editors.
Figure 13 of the manuscript tracks the evolution of the network from different initial conditions.
One might go some way in clarifying the questions of how and why the observed phenomena occur by tracking individual objective measures, in a manner similar to Figure 13.
To wit, one of the algorithms you use trades off two objectives:
(i) maximize the variance of the projection of the data onto the [nonnegative] principal component
(ii) enforce approximate orthonormality by minimizing ‖I – e^{T} e‖, where I is the identity matrix, and e is the normalized "eigenvector," subject to the nonnegativity constraint.
In addition, one ought to quantify the degree of nonnegativity. There are several ways one might go about this, but one approach would be to normalize the eigenvector to unit length, then take the norm of the vector composed of only the positive entries and subtract the norm of the vector of negative entries.
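The measure suggested here can be written down directly. The following is a minimal sketch of that one approach; the function name `nonnegativity_degree` is hypothetical and the example vectors are illustrative.

```python
import numpy as np

def nonnegativity_degree(e):
    """Norm of the positive entries minus norm of the negative entries, after unit normalization."""
    e = np.asarray(e, dtype=float)
    e = e / np.linalg.norm(e)             # normalize to unit length
    pos_norm = np.linalg.norm(e[e > 0])   # norm of the positive part
    neg_norm = np.linalg.norm(e[e < 0])   # norm of the negative part
    return pos_norm - neg_norm

print(nonnegativity_degree([1.0, 2.0, 2.0]))
print(nonnegativity_degree([3.0, -4.0]))
```

A fully nonnegative vector scores 1, a fully negative one scores -1, and mixed-sign vectors fall in between.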
Essential revisions:
1) This manuscript is of sufficient general interest for publication in eLife, although there is a general lack of detail when describing the methodology. There is also significant overlap with a recent paper at NIPS 2014 (Stachenfeld et al., 2014). I find both papers (Dordek et al., submitted, and Stachenfeld et al.) extremely interesting, and they both have different focusses. But I do think that there needs to be some link to/discussion of Stachenfeld et al.
2) Perhaps the most important issue is that they provide little intuition into the interesting subsidiary results. There is a section on "Why Hexagons?" but none on why other aspects of the results occur. This becomes more important given the fact that Stachenfeld also see the basic result. Why do grid scales have a ratio of 1.4 (indeed is this figure reliable)? Why do they see a nonzero alignment of grids to borders (and is this figure reliable)? Alignment angle appears to depend on place field size, but how does overall place cell firing coverage (and covariance) vary with place field size? How does the alignment angle covary with gridness (the very nice gridlike patterns they show tend to look aligned)?
3) "Due to the isotropic nature of the data (generated by the agent's motion), there was a 2D redundancy in the XY plane in conjunction with a dual phase redundancy in 1D, resulting in a fourfold total redundancy of every solution" – this statement requires unpacking. Presumably the dual phase redundancy is caused by the periodic boundary conditions? What are the implications for nonperiodic boundary conditions? This sentence needs to be explained to the nonexpert reader.
4) Similarly, Figure 4A is cryptic – how does the figure show fourfold redundancy? The authors also use a reduced number of place cells to obtain this result (225 vs. 625 in other simulations) – what are the reasons for this change? This should be explained and justified in the text.
5) More generally, when changing place field size, what happens to the number of place cells – presumably the density of firing overlap (covariance) needs to be maintained as this will strongly affect the results?
6) It is not clear how the nonnegative weights are implemented alongside zero mean inputs, and we could not gain any additional insight from the Methods section. This issue needs to be elaborated upon in the text, otherwise it seems implicitly contradictory. Does it imply that the output GC can have negative firing rates? i.e. are there positive and negative place cell firing rates (with a mean of zero), coupled with positive place cell to grid cell synaptic weights, to give both negative and positive inputs to the output grid cell? Presumably the time step to time step changes in grid cell firing rate between negative and positive values will have a large impact on the direction of synaptic weight change – is this the case?
7) Figure 7C, D; also 14B – the analytical PCA results for constrained weights appear to exhibit a bimodal distribution of 90 degree gridness scores – what in the data accounts for this? Can the authors provide any comment or insight?
"The dependency was almost linear" – presumably there is a mathematical reason for this, based on the relationship between the covariance matrix and eigenvectors? Can the authors provide any comment or insight into this relationship?
8) The methods are generally lacking in detail that would allow these simulations to be replicated, based on details provided in the manuscript (see specific details below). The authors should ensure that sufficient detail to allow replication is incorporated. They may also like to consider uploading their code to ModelDB, or a similar online repository, for the sake of transparency and to allow independent replication.
9) The representation of space lies on a torus in this model, which means that the eigenvectors should be periodic and the spatial frequencies quantized. We fail to grasp why, in Figure 3, the first spatial frequency should be 2 and not 1 (i.e., we have two peaks in each direction), and why the eigenvectors 3B is periodic, but in 3A they are not.
10) The eigenvectors of a correlation matrix are orthogonal. If we normalize these eigenvectors and furthermore enforce an additional constraint that the entries of each eigenvector must be positive semidefinite, then the "synaptic weights" of different eigenvectors should not overlap. In other words, if an entry in the first eigenvector is nonzero, it ought to be zero in all other eigenvectors.
11) While the algorithms the authors use relax this last constraint, it is not clear how skewing or shearing a square pattern that results from PCA helps satisfy the dual constraints of nonnegativity and orthogonality.
This is a key point in the paper, and begs for an intuitive explanation, or at least more analysis or quantification of the degree to which the constraints are satisfied.
12) The finding that the spacings corresponding to successive eigenvectors obey ~ $\sqrt{2}$ also calls for an explanation. Why should this ratio of spacings best satisfy nonnegativity, orthogonality, and periodicity?
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Extracting grid characteristics from spatially distributed place cell inputs using nonnegative PCA" for further consideration at eLife. Your revised article has been favorably evaluated by Timothy Behrens (Senior editor), Reviewing editor Michael Frank, and two reviewers, one of whom is Neil Burgess. The manuscript has been improved but both reviewers have identified some remaining issues that need to be addressed before acceptance, as articulated in the comments below:
Reviewer #1:
The authors have worked hard to provide further intuition and interpretation to their results, with much success, although there are still some points that should be clarified, see comments below.
There is also a plain error that should be removed. The authors state "the distribution of grid orientations becomes uniform in the range of 0-15 degrees, with a mean (and median) at 7.5 degrees, similar to experimental data [Stensola et al., 2015]". On the contrary the experimental data shows a very nonuniform distribution clearly peaked at about 7.5 degrees, which is quite different. Also, as a circular variable limited to [0,15) – albeit one for which 16 degrees corresponds to 14 degrees rather than 1 degree – the 'mean' and 'median' they refer to are undefined if the distribution is uniform. Thus the authors should acknowledge that they do not provide an explanation for this aspect of the experimental data, and some of the text/ results on this point could be removed, most importantly in the Abstract.
At the start of describing the main result (Results, first paragraph) is not clear that the "1d mapping of the 2d activity" is caused by the trajectory of the rat moving through the place fields.
The next part of the description of their PCA analysis as Eigenvectors and their spatial interpretation could also be improved in terms of intuition. It would be worth mentioning that the PCA, in addition to being calculated as the Eigenvectors of the matrix of temporal covariance between inputs, can also be thought of in terms of singular value decomposition of the (appropriately normalised) input activity. I.e. the input to the network (temporal sequences of place cell activity) can be decomposed into an outer product of (spatial) weights over place cell inputs (their Eigenvectors) and temporally varying weights over these Eigenvectors. The network learns the spatial weights over place cells (the Eigenvectors) as the connections weights from the place cells, and "projection onto place cell space" is simply the firing rates of the output neuron plotted against the location of the rat.
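The equivalence mentioned here between the eigenvectors of the covariance matrix and the singular value decomposition of the mean-normalized input can be illustrated on synthetic data. This is a sketch with random stand-in activity, not place cell data; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 1000, 6                                    # time steps, input cells
R = rng.standard_normal((T, N)) @ rng.standard_normal((N, N))
R = R - R.mean(axis=0)                            # temporal mean normalization

# Route 1: eigenvectors of the covariance matrix between inputs.
C = R.T @ R / T
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]                 # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the normalized input, R = U diag(s) Vt.
# Rows of Vt are weights over inputs; columns of U are the time courses.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

overlap = abs(eigvecs[:, 0] @ Vt[0])              # equal up to sign
print(overlap, eigvals[0], s[0] ** 2 / T)
```

The leading eigenvector matches the first right singular vector up to sign, and the eigenvalues equal the squared singular values divided by the number of time steps.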
In the second paragraph of the subsection “Adding a non-negativity constraint to the PCA” it is important to remind the reader that the "raw place cells activity" is not the final input to the PCA numerical methods – spatial or temporal mean normalization has to happen first.
It would be useful to have a surface plot showing Gridness vs. PC Density vs. PC Width. I have found it hard to replicate the Gridness distribution and wonder how general it is to the precise place cell inputs used? The reference to Figure 10A in the third paragraph of the subsection “Effect of place cell parameters on grid structure” is incorrect (I think) – not showing the dependence on the width of the place fields.
Regarding the presence of "modules" of grid cells (subsection “Modules of grid cells”), it is not clear if the interpretation is that different modules correspond to different grid scales, or different Eigenvectors (even if having the same scale). Assuming the latter, it seems that the analysis predicts that there should only be two modules, with the scale of the first module approximately 7.5x the largest place field inputs, and the second module (~$\sqrt{2}$ smaller) existing only to enable nonnegative PCA solutions. This deserves discussion, given that the data show at least 5 modules, with a range of scales, the smallest and most numerous having approximately the scale of the smaller place fields found in the dorsal hippocampus (25-30 cm).
Reviewer #2:
The new material, starting with the subsection “Steady state analysis”, is great, and just the thing I was asking for, namely an understanding as to why hexagonal patterns result under a nonnegative PCA learning rule.
There are a few things I'd like to understand better, though:
The last equation in the section “2D Solutions”: should read ϕ_{1} = ϕ_{2} = ϕ_{3}n 2 π/3, n ϵ Z. e.g. ϕ_{1} = ϕ_{2} = ϕ_{3}= 0 would be a solution.
As written, ϕ_{1} = ϕ_{2} = ϕ_{3}, is not the maximum. Besides, this more general case would allow solutions that are unlike the ones observed in experiments.
This brings me to
In the subsection “2D solutions”:
"To do this [maximize the Fourier components inside the basis set,] we must maximize the minimal value of the cosine components for the basis set"
I am afraid this argument escapes me, even though its role is crucial.
While I think I understand the Fourier domain argument up to this point in the text, how exactly does maximizing the minimum value in real space of the function guarantee that the Fourier components in Eq. 24 are maximized?
Eq. 34: why is the upper limit here ∞, when it was M in Eq. 33?
I'd like some clarification from the authors before I can make a judgment call as to whether the argument is likely to be correct.
https://doi.org/10.7554/eLife.10094.027
Author response
As you will see however, the Reviewers felt that it was important to provide some intuition and cogent explanation about how the results were obtained, and several questions arose along these lines. It is not clear how the dual constraints of (eigenvector) nonnegativity and orthogonality are actually satisfied, nor is it clear why they get 7-degree alignment or a ratio of 1.4 for the jump in grid scale.
Reanalysis demonstrates that we tend to get values which are on average a little lower than 7 degrees (Author response images 3 and 4). However, the range of possible values is from 0 to 15 degrees, and we demonstrate analytically (in the new extended Methods section) that in the case of a very large box, the distribution of orientation angles becomes uniform in the interval 0-15 degrees, with a mean at 7.5 degrees, consistent with experimental data.
The jump of 1.4 results from the change in the circular components of the solution in Fourier space (see especially the new Figure 17 in the Methods section). From running more simulations and also from analytical considerations, we believe that within our theoretical framework there is only one jump, and thus this model can really only explain 2 modules and not more (see the analytical derivation for more intuition on this point).
In addition to the new extended Methods section, we have updated the Abstract and Results (throughout) to account for these changes.
In addition to the revisions articulated below, the following suggestion arose from discussion amongst the Reviewers/editors. […]
In addition, one ought to quantify the degree of nonnegativity. There are several ways one might go about this, but one approach would be to normalize the eigenvector to unit length, then take the norm of the vector composed of only the positive entries and subtract the norm of the vector of negative entries.
We thank the editors and reviewers for raising this essential issue. In our understanding, the main issue is to comprehend to what extent we can combine maximal variance, orthogonality, and positivity together, when looking at multiple outputs (such as in old Figures 11 and 12, or new Figures 13 and 14). As we demonstrate now, the outputs from the method are not orthogonal (Author response image 1), and this we believe solves the conundrum. We now added a short comment about this into the Results section: “It is important to note though that, due to the nonnegativity constraint, the vectors achieved in this way were not orthogonal, and thus it cannot be considered a real orthogonalization process, although as explained in the Methods section, the process does aim for maximum difference between the vectors”
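The nonnegativity measure suggested by the reviewers, together with a simple check of orthogonality, can be sketched as follows. This is a minimal illustration and not the authors' MATLAB code: the function names `nonnegativity_index` and `cosine_similarity` are hypothetical, and the index follows the reviewers' verbal recipe (normalize to unit length, then subtract the norm of the negative part from the norm of the positive part).

```python
import math


def nonnegativity_index(vec):
    """Norm of the positive entries minus norm of the negative entries,
    computed after scaling the vector to unit length.
    +1 means all entries are nonnegative, -1 means all are nonpositive."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        raise ValueError("the zero vector has no direction")
    unit = [v / norm for v in vec]
    pos = math.sqrt(sum(v * v for v in unit if v > 0))
    neg = math.sqrt(sum(v * v for v in unit if v < 0))
    return pos - neg


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors (0 means orthogonal)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Two nonnegative vectors with overlapping support always have a strictly positive cosine, which is one way to see why the constrained outputs cannot in general be orthogonal.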
Essential revisions: 1) This manuscript is of sufficient general interest for publication in eLife, although there is a general lack of detail when describing the methodology.
Additional detail has been added to the Methods. Furthermore, extensive sections have been added to the Methods in order to provide an analytical motivation to the whole paper (from the subsection “Steady state analysis”). See detailed points below in the section related to Methods. Furthermore, we have uploaded our MATLAB code to a public server, as described below.
There is also significant overlap with a recent paper at NIPS 2014 (Stachenfeld et al., 2014). I find both papers (Dordek et al., submitted, and Stachenfeld et al) extremely interesting, and they both have different focusses. But I do think that there needs to be some link to/discussion of Stachenfeld et al.
Stachenfeld et al. arrive at similar ideas in a very nice paper based on a reinforcement learning model of place cells. Unlike our paper, their paper does not contain the positivity constraint, and thus the “grid cells” that result from the PCA procedure they use are square rather than hexagonal. Furthermore, they did not provide a theoretical analysis of the nature provided in the revision. We now discuss their paper in the Discussion, as follows: “We note that similar work has noticed the relation between place-cell-to-grid-cell transformation and PCA.[…] However, due to the unconstrained nature of their transformation, the outputted grid cells were square-like.”
2) Perhaps the most important issue is that they provide little intuition into the interesting subsidiary results. There is a section on "Why Hexagons?" but none on why other aspects of the results occur. This becomes more important given the fact that Stachenfeld also see the basic result. Why do grid scales have a ratio of 1.4 (indeed is this figure reliable)?
Following the new analytical derivation in the Methods section (from the subsection “Steady state analysis”), we can now explain the jump of scale as a hop on the Fourier-space lattice (new Figure 17 in paper).
We have now rerun the multiple solutions in order to demonstrate a jump between two consecutive scales. However, we find that it is harder to see the jump to the next smaller third scale, and thus we conclude that the method only partially accounts for the effect of modules. This is now noted in the Results as follows: “[…]we found that the ratio between the distances of the modules was ~1.4, close to the value of 1.42 found by Stensola et al. [Stensola et al., 2012]. Although we searched for additional such jumps, we could only identify this single jump, suggesting that our model can yield up to two “modules” and not more.”
Why do they see a nonzero alignment of grids to borders (and is this figure reliable)?
As we now show in our analytical derivation, the nonzero alignment can arise from the relation between the lattice points in the Fourier domain and the optimal-solution circle. This is now explained thoroughly in the new sections added to the Methods from the subsection “Steady state analysis” and in Figures 15 and 16 in paper.
Alignment angle appears to depend on place field size, but how does overall place cell firing coverage (and covariance) vary with place field size?
In our previous submission we manipulated the place field size while keeping the size of the arena and number of place cells constant. Thus the coverage increased with place field size. Here (Author response image 2) we show an example of increasing both the place field size and the size of the arena, while keeping the ratio of place cell size to arena size constant, such that the total coverage increases (a place cell on every pixel).
How does the alignment angle covary with gridness (the very nice grid-like patterns they show tend to look aligned)?
In Author response image 2b we plot grid orientation as a function of gridness. As can be seen the two quantities are not strongly related:
3) "Due to the isotropic nature of the data (generated by the agent's motion), there was a 2D redundancy in the XY plane in conjunction with a dual phase redundancy in 1D, resulting in a fourfold total redundancy of every solution" – this statement requires unpacking. Presumably the dual phase redundancy is caused by the periodic boundary conditions?
We have now devoted a large section in the Methods to analytical considerations, which can explain the fourfold redundancy. Thus we changed the wording of the above paragraph to the following: “The solution demonstrated a fourfold redundancy (Figure 4C). This was apparent in the plotted eigenvalues (from largest to the smallest eigenvalue, Figure 4A and 4C), which demonstrated a fourfold grouping pattern. The fourfold redundancy can be explained analytically by the symmetries of the system – see PCA analysis in Methods section.”
What are the implications for nonperiodic boundary conditions? This sentence needs to be explained to the nonexpert reader.
In the case of very large arenas, it does not matter what the boundary conditions are. In smaller arenas, the nature of the boundary conditions (i.e. how the place cells behave at the boundary) has an effect. It seems from our simulations that the boundary conditions matter less for the scale of the grid and more for the grid orientation (Compare Author response images 3 and 4). We now note in the Results section: “[…]for very large environments the effects of boundary conditions diminish.”.
4) Similarly, Figure 4A is cryptic – how does the figure show fourfold redundancy? The authors also use a reduced number of place cells to obtain this result (225 vs. 625 in other simulations) – what are the reasons for this change? This should be explained and justified in the text.
We have improved the figure by increasing the resolution and using the same number of points for both subfigures. Now the fourfold redundancy is quite clear (Figure 4C). We updated Figure 4 to contain all 625 values, and have also added an additional panel to emphasize the fourfold redundancy.
5) More generally, when changing place field size, what happens to the number of place cells – presumably the density of firing overlap (covariance) needs to be maintained as this will strongly affect the results?
Author response image 2 above demonstrates that results are not strongly dependent on the place cell overlap.
6) It is not clear how the nonnegative weights are implemented alongside zero mean inputs, and we could not gain any additional insight from the Methods section. This issue needs to be elaborated upon in the text, otherwise it seems implicitly contradictory. Does it imply that the output GC can have negative firing rates? i.e. are there positive and negative place cell firing rates (with a mean of zero), coupled with positive place cell to grid cell synaptic weights, to give both negative and positive inputs to the output grid cell?
The grid cells were not clipped to zero so they did obtain negative values. In order to realize this in real neurons we will need to assume a general baseline firing which will account for the fact that real neurons do not have negative firing rates.
Presumably the time step to time step changes in grid cell firing rate between negative and positive values will have a large impact on the direction of synaptic weight change – is this the case?
Yes, this is indeed the case, as far as we understand the comment.
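The exchange in point 6 can be made concrete with a toy calculation. The sketch below is illustrative only: the firing rates, the weights, and the helper names `center_columns` and `grid_output` are made up, and temporal mean subtraction is assumed as the normalization. It shows that nonnegative weights applied to zero-mean inputs give an output that takes both signs, and that a constant baseline can shift it back to nonnegative rates.

```python
def center_columns(rates):
    """Subtract each place cell's temporal mean (rows = time steps, columns = cells)."""
    n_steps = len(rates)
    n_cells = len(rates[0])
    means = [sum(row[j] for row in rates) / n_steps for j in range(n_cells)]
    return [[row[j] - means[j] for j in range(n_cells)] for row in rates]


def grid_output(weights, inputs):
    """Linear output unit: weighted sum of the (centered) place cell inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


# Hypothetical raw place cell rates: 4 time steps, 3 place cells.
rates = [
    [1.0, 0.2, 0.0],
    [0.3, 1.0, 0.1],
    [0.0, 0.4, 1.0],
    [0.0, 0.0, 0.2],
]
weights = [0.5, 0.1, 0.8]  # all nonnegative, as required by the constraint

centered = center_columns(rates)
outputs = [grid_output(weights, row) for row in centered]

# The output has zero temporal mean, so it must take both signs.
# Adding a baseline equal to the most negative value makes every rate nonnegative.
baseline = -min(outputs)
shifted = [o + baseline for o in outputs]
```

Because the output is a linear function of zero-mean inputs, its own temporal mean is zero regardless of the sign of the weights.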
7) Figure 7C, D; also 14B – the analytical PCA results for constrained weights appear to exhibit a bimodal distribution of 90 degree gridness scores – what in the data accounts for this? Can the authors provide any comment or insight?
The bimodal distribution was a result of the formation of two populations, one with small orientations to one of the walls and another with small orientation to the other wall. We believe this is not an essential result, but cannot provide a full explanation for this phenomenon.
"The dependency was almost linear" – presumably there is a mathematical reason for this, based on the relationship between the covariance matrix and eigenvectors? Can the authors provide any comment or insight into this relationship?
This relation is now explained by the added analytical model. The predicted value provides a tight lower bound on the actual relation, as can be seen from the simulation results in Author response images 3 and 4 above, where the grid scale was about 7.5 times the width of the place field.
Intuitively, the linear dependency between the place cell width and the grid cell spacing must be true in the limit of infinite box size, from dimensional analysis: the length units of the grid cell spacing must be the same as the length units of the place cell width, since these are the only length units in the model. This is explained mathematically in the unconstrained case by Equation 32, as we detail below that equation:
"Notice that if we multiply the place cell width by some positive constant c, then the solution 2pi/k† will be divided by c. […] When the box has finite size, k-lattice discretization also has a (usually small) effect on the grid spacing."
A similar reasoning can be applied to the constrained case.
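The dimensional-analysis argument for a linear relation between place field width and grid spacing can be checked numerically in a toy setting. The sketch below makes several assumptions that are not taken from the paper: it works in 1D, it models the zero-mean tuning curve as a difference of two unit-area Gaussians with widths `sigma` and `ratio * sigma`, and it takes the spacing as 2π divided by the wavenumber at which the Fourier transform of that curve peaks. Equation 32 itself is not reproduced here.

```python
import math


def dog_spectrum(k, sigma, ratio=2.0):
    """Fourier transform of a 1D difference of two unit-area Gaussians
    with widths sigma and ratio * sigma (an assumed tuning curve)."""
    return math.exp(-0.5 * (k * sigma) ** 2) - math.exp(-0.5 * (k * ratio * sigma) ** 2)


def peak_wavenumber(sigma, ratio=2.0, n=20000):
    """Grid search for the wavenumber that maximizes the spectrum."""
    k_max = 10.0 / sigma  # search range scales with 1 / sigma
    best_k, best_val = 0.0, float("-inf")
    for i in range(1, n + 1):
        k = k_max * i / n
        val = dog_spectrum(k, sigma, ratio)
        if val > best_val:
            best_k, best_val = k, val
    return best_k


def predicted_spacing(sigma, ratio=2.0):
    """Spatial period associated with the spectral peak."""
    return 2.0 * math.pi / peak_wavenumber(sigma, ratio)
```

Under these assumptions, doubling `sigma` halves the peak wavenumber and doubles the spacing, because `sigma` is the only length scale in the toy model.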
8) The methods are generally lacking in detail that would allow these simulations to be replicated, based on details provided in the manuscript (see specific details below). The authors should ensure that sufficient detail to allow replication is incorporated. They may also like to consider uploading their code to ModelDB, or a similar online repository, for the sake of transparency and to allow independent replication.
We have followed the advice given and have now uploaded our code to t
This is also noted at the beginning of the Methods section.
9) The representation of space lies on a torus in this model, which means that the eigenvectors should be periodic and the spatial frequencies quantized. We fail to grasp why, in Figure 3, the first spatial frequency should be 2 and not 1 (i.e., we have two peaks in each direction),
Indeed the spatial frequencies are quantized by the box dimensions. However, the eigenvectors' periodicity is not determined by the box, as we now explain mathematically in the extended Methods section. Even in the case that the box size is infinite, we get a finite periodic solution, with spatial frequency equal to the peak of the Fourier transform of the place cell tuning curve – given in Equation 32. When the box size is finite this spatial frequency is also affected by the quantization due to the box dimensions (Figure 15).
and why the eigenvectors 3B is periodic, but in 3A they are not.
In Figure 3B we show multiple examples of the first principal component as output from the network, while in Figure 3A we show the first 16 components of the PCA algorithm. Thus the two figures do not show the same thing. Generally, the PCA components need not appear periodic (i.e., repeat more than once in some directions), though they do obey periodic boundary conditions. For example, PCA components 5 to 12 in Figure 3A do not appear periodic since they are composed of the Fourier components B18 in Figure 15C.
10) The eigenvectors of a correlation matrix are orthogonal. If we normalize these eigenvectors and furthermore enforce an additional constraint that the entries of each eigenvector must be positive semidefinite, then the "synaptic weights" of different eigenvectors should not overlap. In other words, if an entry in the first eigenvector is nonzero, it ought to be zero in all other eigenvectors.
In the constrained case, the solutions maximizing the variance subject to the nonnegativity constraint are no longer eigenvectors of any matrix. Thus, the second such solution described in the section “Hierarchical networks and modules” need not be orthogonal to the first, and the issue raised by the reviewers does not arise. We have added a note on this issue in the Results section: “It is important to note though that due to the nonnegativity constraint the vectors achieved this way were not orthogonal, and thus it cannot be considered a real orthogonalization process.” See Author response image 1, demonstrating that the different vectors outputted from the simulation are indeed not orthogonal.
11) While the algorithms the authors use relax this last constraint, it is not clear how skewing or shearing a square pattern that results from PCA helps satisfy the dual constraints of nonnegativity and orthogonality. This is a key point in the paper, and begs for an intuitive explanation, or at least more analysis or quantification of the degree to which the constraints are satisfied.
We believe the above reply addresses this issue. See also Author response image 1.
12) The finding that the spacings corresponding to successive eigenvectors obey ~ $\sqrt{2}$ also calls for an explanation. Why should this ratio of spacings best satisfy nonnegativity, orthogonality, and periodicity?
We now provide an analytical motivation for this in the new revised Methods section of the paper. In short, the change of scale occurs when “jumping” from the first optimal circle in the Fourier space to the next available Fourier lattice points which can form a hexagonal shape (Figure 17 in paper).
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Reviewer #1: The authors have worked hard to provide further intuition and interpretation to their results, with much success, although there are still some points that should be clarified, see comments below. There is also a plain error that should be removed. The authors state "the distribution of grid orientations becomes uniform in the range of 0 to 15 degrees, with a mean (and median) at 7.5 degrees, similar to experimental data [Stensola et al., 2015]". On the contrary the experimental data shows a very nonuniform distribution clearly peaked at about 7.5 degrees, which is quite different. Also, as a circular variable limited to [0,15) – albeit one for which 16 degrees corresponds to 14 degrees rather than 1 degree – the 'mean' and 'median' they refer to are undefined if the distribution is uniform. Thus the authors should acknowledge that they do not provide an explanation for this aspect of the experimental data, and some of the text/ results on this point could be removed, most importantly in the Abstract.
We agree that the experimental data is nonuniform, and clearly peaked around 7.5°. We have thus erased this point from the abstract, and removed it throughout the paper.
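The reviewer's remark that 16 degrees corresponds to 14 degrees can be illustrated with a small folding function. This is a sketch under a stated assumption, not a procedure from the paper: it assumes an ideal hexagonal grid (60 degree rotational symmetry) in a square box (90 degree symmetry), so that the angle between a grid axis and the nearest wall repeats every 30 degrees and is mirror-symmetric about 15 degrees. The name `fold_orientation` is hypothetical.

```python
def fold_orientation(angle_deg):
    """Fold a grid orientation (in degrees) onto the range [0, 15],
    assuming a hexagonal grid in a square box."""
    a = angle_deg % 30.0       # orientations repeat every 30 degrees
    return min(a, 30.0 - a)    # reflect about 15 degrees


# Example: an orientation of 16 degrees is equivalent to 14 degrees.
folded = fold_orientation(16.0)
```

Because of the reflection at both ends of the range, the folded variable does not wrap around like an ordinary circular angle, which is the source of the reviewer's caveat about its mean and median.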
At the start of describing the main result (Results, first paragraph) is not clear that the "1d mapping of the 2d activity" is caused by the trajectory of the rat moving through the place fields.
This point is now clarified in the second paragraph of the subsection “Comparing Neural-Network results to PCA”.
The next part of the description of their PCA analysis as Eigenvectors and their spatial interpretation could also be improved in terms of intuition. It would be worth mentioning that the PCA, in addition to being calculated as the Eigenvectors of the matrix of temporal covariance between inputs, can also be thought of in terms of singular value decomposition of the (appropriately normalised) input activity. I.e. the input to the network (temporal sequences of place cell activity) can be decomposed into an outer product of (spatial) weights over place cell inputs (their Eigenvectors) and temporally varying weights over these Eigenvectors. The network learns the spatial weights over place cells (the Eigenvectors) as the connections weights from the place cells, and "projection onto place cell space" is simply the firing rates of the output neuron plotted against the location of the rat.
Thanks for this good point. This alternative interpretation using SVD has now been added to the Methods section.
In the second paragraph of the subsection “Adding a non-Negativity constraint to the PCA”, it is important to remind the reader that the "raw place cells activity" is not the final input to the PCA numerical methods – spatial or temporal mean normalization has to happen first.
This reminder has now been added in the third paragraph of the section “Adding a non-Negativity constraint to the PCA”.
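The equivalence the reviewer describes, between PCA as an eigendecomposition of the temporal covariance matrix and PCA as a singular value decomposition of the mean-normalized input, can be checked on synthetic data. The sketch below uses NumPy and random correlated inputs in place of place cell activity; the function names and the data are illustrative assumptions, not the authors' code.

```python
import numpy as np


def make_inputs(n_steps=500, n_cells=6, seed=0):
    """Synthetic correlated inputs (rows = time steps, columns = input cells),
    with each cell's temporal mean removed."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_steps, n_cells)) @ rng.normal(size=(n_cells, n_cells))
    return X - X.mean(axis=0)


def pca_by_covariance(X):
    """Eigenvalues and eigenvectors of the covariance matrix, largest first."""
    cov = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    return eigvals[::-1], eigvecs[:, ::-1]


def pca_by_svd(X):
    """SVD of the normalized input: X / sqrt(T) = U diag(S) Vt."""
    U, S, Vt = np.linalg.svd(X / np.sqrt(X.shape[0]), full_matrices=False)
    return S, Vt, U
```

Here the rows of `Vt` play the role of the spatial weights over the inputs, and the columns of `U`, scaled by the singular values, give the temporally varying output: `X @ Vt[0]` is the time course of the first component.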
It would be useful to have a surface plot showing Gridness vs. PC Density vs. PC Width. I have found it hard to replicate the Gridness distribution and wonder how general it is to the precise place cell inputs used?
We have now added the following subpanels to Figure 12, in which we vary the ratio between the box size and the size of the input place fields. As can be seen, as long as the box and place field sizes are large enough, the Gridness is > 1 (left panel). Furthermore, the scale of the grid is not highly dependent on the size of the box (right panel).
The reference to Figure 10A in the third paragraph of the subsection “Effect of place cell parameters on grid structure” is incorrect (I think) – not showing the dependence on the width of the place fields.
Typo has been corrected to “Figure 12A”.
Regarding the presence of "modules" of grid cells (subsection “Modules of grid cells”), it is not clear if the interpretation is that different modules correspond to different grid scales, or different Eigenvectors (even if having the same scale). Assuming the latter, it seems that the analysis predicts that there should only be two modules, with the scale of the first module approximately 7.5x the largest place field inputs, and the second module (~$\sqrt{2}$ smaller) existing only to enable nonnegative PCA solutions. This deserves discussion, given that the data show at least 5 modules, with a range of scales, the smallest and most numerous having approximately the scale of the smaller place fields found in the dorsal hippocampus (25 to 30 cm).
We agree. We believe that a thorough understanding of the module phenomenon will require further study. Mainly two points need to be added to a future model: (a) Looking at a nonuniform distribution of place-cell sizes, similar to reality and (b) adding the feedforward projection from grid cells to place cells, thus “closing the loop”. This point has now been added to the Discussion.
Reviewer #2: The last equation in the section “2D Solutions” should read $\phi_1 = \phi_2 = \phi_3 = 2\pi n/3$, $n \in \mathbb{Z}$; e.g. $\phi_1 = \phi_2 = \phi_3 = 0$ would be a solution. As written, $\phi_1 = \phi_2 = \phi_3$ is not the maximum. Besides, this more general case would allow solutions that are unlike the ones observed in experiments.
This part was removed due to our revision, in response to the next comment. In any case, this was a typo, which happened due to an inaccurate change of notation from $\cos(k_i \cdot (x - x_0))$ to $\cos(k_i \cdot x + \phi_i)$. It should have been “$\phi_i = k_i \cdot x_0$, for some $x_0$”.
This brings me to the subsection “2D solutions”: "To do this [maximize the Fourier components inside the basis set,] we must maximize the minimal value of the cosine components for the basis set". I am afraid this argument escapes me, even though its role is crucial.
While I think I understand the Fourier domain argument up to this point in the text, how exactly does maximizing the minimum value in real space of the function guarantee that the Fourier components in Eq. 24 are maximized?
This argument was indeed incorrect (it is true if we can neglect the higher order harmonics in each lattice; unfortunately this cannot be done, as we verified numerically). We thank the reviewer for noting this. We changed the subsection “2D solutions” (in section “The PCA solution with a nonnegative constraint”) to correct this. First, we prove (in the appendix section “Upper bound on the constrained objective – 2D square lattice”) that the objective is bounded from above by 0.25 in any nonhexagonal 2D lattice with a single grid length (i.e., a square or 1D lattice). Note this remains true even if we consider higher order harmonics. Second, we give an example for a hexagonal lattice (found via a numerical scan) which achieves an objective value higher than 0.25. Thus, given our approximations, this shows that any optimal solution must be a hexagonal lattice.
Eq. 34: why is the upper limit here ∞, when it was M in Eq. 33?
The two equations result from two different numerical procedures and assumptions. We revised the text near the two equations to explain this point:
“This form (Eq. 33) results from a parameter grid search for M = 1, 2, 3 and 4, under the assumption that $L \to \infty$ and $\hat{r}(k)$ is highly peaked. […] Specifically, it has similar form (Eq. 34).”
The main point is to show that both give very similar results. In other words, that our approximate description, using the simplifying assumptions and finite M, is closely related to the solution without approximations. Additional details were added to the text to clarify this important point. Also, to improve focus, we relegated to the appendix some less important details from that section (on the origin of the difference between k_{†} and k_{*}).
https://doi.org/10.7554/eLife.10094.028
Article and author information
Author details
Funding
Ollendorff Center of the Department of Electrical Engineering, Technion (Research fund)
 Yedidyah Dordek
 Ron Meir
Gruss Lipper Charitable Foundation
 Daniel Soudry
Intelligence Advanced Research Projects Activity
 Daniel Soudry
Israel Science Foundation (Personal Research Grant, 955/13)
 Dori Derdikman
Israel Science Foundation (New Faculty Equipment Grant, 1882/13)
 Dori Derdikman
Rappaport Institute (Personal Research Grant)
 Dori Derdikman
Allen and Jewel Prince Center for Neurodegenerative Disorders (Research Grant)
 Dori Derdikman
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We would like to thank Alexander Mathis and Omri Barak for comments on the manuscript, and Gilad Tocker and Matan Sela for helpful discussions and advice. The research was supported by the Israel Science Foundation grants 955/13 and 1882/13, by a Rappaport Institute grant, and by the Allen and Jewel Prince Center for Neurodegenerative Disorders of the Brain. The work of RM and YD was partially supported by the Ollendorff Center of the Department of Electrical Engineering, Technion. DD is a David and Inez Myers Career Advancement Chair in Life Sciences fellow. The work of DS was partially supported by the Gruss Lipper Charitable Foundation, and by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DoI/IBC) contract number D16PC00003. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DoI/IBC, or the U.S. Government.
Reviewing Editor
 Michael J Frank, Brown University, United States
Publication history
 Received: July 15, 2015
 Accepted: March 8, 2016
 Accepted Manuscript published: March 8, 2016 (version 1)
 Version of Record published: April 13, 2016 (version 2)
Copyright
© 2016, Dordek et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
 Page views: 3,603
 Downloads: 772
 Citations: 11
Article citation count generated by polling the highest count across the following sources: Crossref, Scopus, PubMed Central.