Place-cell capacity and volatility with grid-like inputs
Abstract
What factors constrain the arrangement of the multiple fields of a place cell? By modeling place cells as perceptrons that act on multiscale periodic grid-cell inputs, we analytically enumerate a place cell’s repertoire – how many field arrangements it can realize without external cues while its grid inputs are unique – and derive its capacity – the spatial range over which it can achieve any field arrangement. We show that the repertoire is very large and relatively noise-robust. However, the repertoire is a vanishing fraction of all arrangements, while capacity scales only as the sum of the grid periods, so field arrangements are constrained over larger distances. Thus, grid-driven place-field arrangements define a large response scaffold that is strongly constrained by its structured inputs. Finally, we show that altering grid-to-place weights to generate an arbitrary new place field strongly affects existing arrangements, which could explain the volatility of the place code.
Introduction
As animals run around in a small familiar environment, hippocampal place cells exhibit localized firing fields at reproducible positions, with each cell typically displaying at most a single firing field (O’Keefe and Dostrovsky, 1971; Wilson and McNaughton, 1993). However, a place cell generates multiple fields when recorded in single large environments (Fenton et al., 2008; Park et al., 2011; Rich et al., 2014) or across multiple environments (Muller et al., 1987; Colgin et al., 2008), including different physical and non-physical spaces (Aronov et al., 2017).
Within large spaces, the field locations seem to be well-described by a random process (Rich et al., 2014; Cheng and Frank, 2011), and across spaces the place-cell codes appear to be independent or orthogonal (Muller et al., 1987; Colgin et al., 2008; Alme et al., 2014), also potentially consistent with a random process. However, a more detailed characterization of possible structure in these responses is both experimentally and theoretically lacking, and we hypothesize that there might be structure imposed by grid cells in place field arrangements, especially when spatial cues are sparse or unavailable.
Our motivation for this hypothesis arises from the following reasoning: grid cells (Hafting et al., 2005) are a critical spatially tuned population that provides inputs to place cells. Their codes are unique over very large ranges due to their modular, multi-periodic structure (Fiete et al., 2008; Sreenivasan and Fiete, 2011; Mathis et al., 2012). They appear to integrate motion cues to update their states and thus reliably generate fields even in the absence of external spatial cues (Hafting et al., 2005; McNaughton et al., 2006; Burak and Fiete, 2006; Burak and Fiete, 2009). Thus, it is possible that in the absence of external cues, spatially reliable place fields are strongly influenced by grid-cell inputs.
To generate theoretical predictions under this hypothesis, we examine here the nature and strength of potential constraints on the arrangements of multiple place fields driven by grid cells. On the one hand, the grid inputs are non-repeating (unique) over a very large range that scales exponentially with the number of grid modules (given roughly by the product of the grid periods), and thus rich (Fiete et al., 2008; Sreenivasan and Fiete, 2011; Mathis et al., 2012); are these unique inputs sufficient to enable arbitrary place field arrangements? On the other hand, this vast library of unique coding states lies on a highly nonlinear, folded manifold that simple readouts might not be able to discriminate (Sreenivasan and Fiete, 2011). This nonlinear structure is a result of the geometric, periodically repeating structure of individual modules (Stensola et al., 2012); should we expect place field arrangements to be constrained by this structure?
These questions are important for the following reason: a likely role of place cells, and the view we espouse here, is to build consistent and faithful associations (maps) between external sensory cues and an internal scaffold of motion-based positional estimates, which we hypothesize is derived from grid inputs. This perspective is consistent with the classic ideas of cognitive maps (O’Keefe and Nadel, 1978; Tolman, 1948; McNaughton et al., 2006) and also relates neural circuitry to the computational framework of the simultaneous localization and mapping (SLAM) problem for robots and autonomously navigating vehicles (Leonard and Durrant-Whyte, 1991; Milford et al., 2004; Cadena et al., 2016; Cheung et al., 2012; Widloski and Fiete, 2014; Kanitscheider and Fiete, 2017a; Kanitscheider and Fiete, 2017b; Kanitscheider and Fiete, 2017c). We can view the formation of a map as ‘decorating’ the internal scaffold with external cues. For this to work across many large spaces, the internal scaffold must be sufficiently large, with enough unique states and resolution to build appropriate maps.
A self-consistent place-cell map that associates a sufficiently rich internal scaffold with external cues can enable three distinct inferences: (1) allow external cues to correct errors in motion-based location estimation (Welinder et al., 2008; Burgess, 2008; Sreenivasan and Fiete, 2011; Hardcastle et al., 2014), through cue-based updating; (2) predict upcoming external cues over novel trajectories through familiar spaces by exploiting motion-based updating (Sanders et al., 2020; Whittington et al., 2020); and (3) drive fully intrinsic error correction and location inference, when external spatial cues go missing and motion cues are unreliable, by imposing self-consistency (Sreenivasan and Fiete, 2011).
In what follows, we characterize which arrangements of place fields are realizable based on grid-like inputs in a simple perceptron model, in which place cells combine their multiple inputs and make a decision on whether to generate a field (‘1’ output) or not (‘0’ output) by selecting input weights and a firing threshold (Figure 1A,B). However, in contrast to the classical perceptron results, which are derived under the assumption of random inputs that are in general position (a property related to the linear independence of the inputs), grid inputs to place cells are structured, which adds substantial complexity to our derivations.
We show analytically that each place cell can realize a large repertoire of arrangements across all possible space where the grid inputs are unique. However, these realizable arrangements are a special and vanishing subset of all arrangements over the same space, suggesting a constrained structure. We show that the capacity of a place cell, or the spatial range over which all field arrangements can be realized, equals the sum of the distinct grid periods, a small fraction of the range of positions uniquely encoded by grid-like inputs. Overall, we show that field arrangements generated from grid-like inputs are more robust to noise than those driven by random inputs or shuffled grid inputs.
Together, our results imply that grid-like inputs endow place cells with rich and robust spatial scaffolds, but that these are also constrained by grid-cell geometry. Rigorous proofs supporting all our mathematical results are provided in Appendix 1. Portions of this work have appeared previously in conference abstract form (Yim et al., 2019).
Modeling framework
Place cells as perceptrons
The perceptron model (Rosenblatt, 1958) idealizes a neuron as computing a weighted sum of its inputs ($\mathit{x}_{j}\in \mathbb{R}^{N}$) based on learned input weights ($\mathit{w}\in \mathbb{R}^{N}$) and applying a threshold ($\theta$) to generate a binary response that is above or below threshold. A perceptron may be viewed as separating its high-dimensional input patterns into two output categories ($y\in \{0,1\}$) (Figure 2A), with the categorization depending on the weights and threshold, so that sufficiently weight-aligned input patterns fall into category 1 and the rest into category 0:
$$y_{j}=\begin{cases}1 & \text{if } \mathit{w}\cdot \mathit{x}_{j}\ge \theta,\\ 0 & \text{otherwise.}\end{cases}$$
If each partitioning of inputs into the $\{0,1\}$ categories is called a dichotomy, then the only dichotomies ‘realizable’ by a perceptron are those in which the inputs are linearly separable – that is, the set of inputs in category 0 can be separated from those in category 1 by some linear hyperplane (Figure 2). Cover’s counting theorem (Cover, 1965; Vapnik, 1998) counts how many dichotomies a perceptron can realize if the input patterns are random (more specifically, in general position). A set of patterns $\{\mathit{x}_{1},\dots ,\mathit{x}_{P}\}$ in an $N$-dimensional space is in general position if no subset of size smaller than $N+1$ is affinely dependent; in other words, no subset of $n+1$ points lies in an $(n-1)$-dimensional plane for any $n\le N$ (Figure 2B). The theorem establishes that for $P\le N$ patterns, every dichotomy is realizable by a perceptron – this is the perceptron capacity (Figure 2C). For $P=2N$, exactly half of the ${2}^{P}$ possible dichotomies are realizable; when $P\gg N$ for fixed $N$, the realizable dichotomies become a vanishing fraction of the total (Figure 2C).
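The counts quoted above follow directly from Cover’s formula. A minimal sketch (ours, not from the paper; the function name `cover_count` is illustrative) evaluates the standard expression $C(P,N)=2\sum_{k=0}^{N-1}\binom{P-1}{k}$ and checks the two regimes:

```python
from math import comb

def cover_count(P: int, N: int) -> int:
    # Cover (1965): number of linearly separable dichotomies of P points
    # in general position in N dimensions.
    return 2 * sum(comb(P - 1, k) for k in range(N))

# P <= N: every one of the 2^P dichotomies is realizable.
all_realizable = cover_count(10, 10) == 2 ** 10
# P = 2N: exactly half of the 2^P dichotomies are realizable.
half_realizable = cover_count(20, 10) == 2 ** 20 // 2
```

For $P \gg N$ the same formula makes the realizable fraction collapse toward zero, the regime relevant to the long spatial ranges considered below.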
Here, to characterize the place-cell scaffold, we model a place cell as a perceptron receiving grid-like inputs (Figure 1B). Across space, a particular ‘field arrangement’ is realizable by the place cell if there is some set of input weights and a threshold (Lee et al., 2020) for which its summed inputs are above threshold at only those locations and below it at all others (Figure 1A,B). We call an arrangement of exactly $K$ fields a ‘$K$-field arrangement’.
In the following, we answer two distinct but related questions: (1) out of all potential field arrangements over the entire set of unique grid inputs, how many are realizable, and how does the realizable fraction differ for grid-like inputs compared to inputs with matched dimension but different structure? This is akin to perceptron function counting (Cover, 1965) with structured rather than general-position inputs and covers constraints within and across environments. We consider all arrangements regardless of sparsity, on one extreme, and $K$-field (highly sparse) arrangements on the other; these cases are analytically tractable. We expect the regime of sparse firing to interpolate between these two extremes. (2) Over what range of positions is any field arrangement realizable? This is analogous to computing the perceptron separating capacity (Cover, 1965) for structured rather than general-position inputs.
Although the structured rather than random nature of the grid code adds complexity to our problem, the symmetries present in the code also allow us to compute some quantities in more detail than is typical for random inputs, including capacity computations for dichotomies with a prescribed number of positive labels ($K$-field arrangements).
Results
Our approach, summarized in Figure 3, is as follows: we define a mapping from space to grid-like input codes (Figure 3A,B), and a generalization to what we call modular-one-hot codes (Figure 3B). We explore the geometric structure and symmetries of these codes (Figure 3C). Next, we show how separating hyperplanes placed on these structured inputs by place-cell perceptrons permit the realization of some dichotomies (Figure 3D), and thus some spatial field arrangements (Figure 3E), but not others, and we obtain mathematical results on the number of realizable arrangements and the separating capacity.
The structure of grid-like input patterns
Grid cells have spatially periodic responses (Figure 1A,B). Cells in one grid module exhibit a common spatial period but cover all possible spatial phases. The dynamics of each module are low-dimensional (Fyhn et al., 2007; Yoon et al., 2013), with the dynamics within a module supporting and stabilizing a periodic phase code for position. Thus, we use the following simple model to describe the spatial coding of grid cells and modules: a module with spatial period ${\lambda}_{m}$ (in units of the spatial discretization) consists of ${\lambda}_{m}$ cells that tile all possible phases in the discretized space while maintaining their phase relationships with each other. Each grid cell’s response is a $\{0,1\}$-valued periodic function of a discretized 1D location variable (indexed by $j$); cell $i$ in module $m$ fires (has response 1) whenever $(j-i)\bmod {\lambda}_{m}=0$, and is off (has response 0) otherwise (Figure 1B). The encoding of location $j$ across all $M$ modules is thus an $N$-dimensional vector $\mathit{x}_{j}$, where $N={\sum}_{m=1}^{M}{\lambda}_{m}$. Nonzero entries correspond to coactive grid cells at position $j$. The total number of unique grid patterns is $L=\mathrm{LCM}(\{{\lambda}_{1},\dots ,{\lambda}_{M}\})$, which grows exponentially with $M$ for generic choices of the periods $\{{\lambda}_{m}\}$ (Fiete et al., 2008). We refer to $L$ as the ‘full range’ of the code. We call the full ordered set of unique coding states $\{\mathit{x}_{j}\}$ the grid-like ‘codebook’ ${X}_{\mathrm{g}}$.
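This encoding is easy to instantiate directly. A minimal sketch (ours; the helper name `grid_codebook` is hypothetical) builds the codebook for the periods $\{2,3\}$ used in the worked example below, and checks that $N=\sum_m \lambda_m = 5$, that the full range is $L=\mathrm{LCM}(2,3)=6$, and that the $L$ patterns are unique:

```python
from math import lcm

def grid_codebook(periods):
    # Cell i of module m is active at location j iff (j - i) % lambda_m == 0;
    # the codebook stacks one one-hot block per module, over the full range L.
    L = lcm(*periods)
    book = []
    for j in range(L):
        pattern = []
        for lam in periods:
            pattern += [1 if (j - i) % lam == 0 else 0 for i in range(lam)]
        book.append(tuple(pattern))
    return book

X_g = grid_codebook([2, 3])
N = len(X_g[0])            # total number of cells: sum of the periods
L = len(X_g)               # full range: LCM of the periods
n_unique = len(set(X_g))   # patterns are unique over the full range
```

Each pattern has exactly one active cell per module, which is the modular-one-hot property discussed next.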
Because ${X}_{\mathrm{g}}$ includes all unique grid-like coding states across modules, it includes all possible relative phase shifts or ‘remappings’ between grid modules (Fiete et al., 2008; Monaco et al., 2011). Thus, this full-range codebook may be viewed as the union of all grid-cell responses across all possible space and environments. We assume implicitly that 2D grid modules do not rotate relative to each other across space or environments. Permitting grid modules to differentially rotate would lead to more input pattern diversity, more realizable place patterns, and a larger separating capacity than in our present computations.
The grid-like code belongs to a more general class that we call ‘modular-one-hot’ codes. In a modular-one-hot code, cells are divided into modules; within each module, only one cell is allowed to be active (the within-module code is one-hot), but there are no other constraints on the code. With $m=1,\dots ,M$ modules of sizes ${\lambda}_{m}$, the modular-one-hot codebook ${X}_{\mathrm{mo}}$ contains $P={\prod}_{m=1}^{M}{\lambda}_{m}$ unique patterns, with $P\ge L$ for a corresponding grid-like code. When $\{{\lambda}_{1},\dots ,{\lambda}_{M}\}$ are pairwise coprime, $P=L$ and the grid-like and modular-one-hot codebooks contain identical patterns. However, even in this case, modular-one-hot codes may be viewed as a generalization of grid-like codes, as there is no notion of a spatial ordering in the modular-one-hot codes, and they are defined without reference to a spatial variable.
Of our two primary questions introduced earlier, question (1), on counting the size of the place-cell repertoire (the number of realizable field arrangements), depends only on the geometry of the grid coding states, and not on their detailed spatial embedding (i.e., it depends on the mappings in Figure 3B–D, but not on the mappings between Figure 3A and B or between Figure 3D and E). In other words, it does not depend on the spatial ordering of the grid-like coding states and can equivalently be studied with the corresponding modular-one-hot code instead, which turns out to be easier. Question (2), on place-cell capacity (the spatial range $l\le L$ over which any place field arrangement is realizable), depends on the spatial embedding of the grid and place codes (and on the full chain of Figure 3A–E). For $l<L$, this corresponds to a particular rather than a random subset of ${X}_{\mathrm{mo}}$; thus, we cannot use the general properties of this generalized version of the grid-like code.
Alternative codes
In what follows, we contrast place field arrangements that can be obtained with grid-like or modular-one-hot codes with arrangements driven by alternatively coded inputs. To this end, we briefly define some key alternative codes, commonly encountered in neuroscience, machine learning, or the classical theory of perceptrons. For these alternative codes, we match the input dimension (number of cells) to the modular-one-hot inputs (unless stated otherwise).
Random codes ${X}_{\mathrm{r}}$, used in the standard perceptron results, consist of real-valued random vectors. These are quite different from the grid-like code and all the other codes we will consider, in that their entries are real-valued rather than $\{0,1\}$-valued like the rest. A set of up to $N$ random input patterns in $N$ dimensions is linearly independent; thus, these codes have no structure up to this number of patterns.
Define the one-hot code ${X}_{\mathrm{oh}}$ as the set of vectors with a single nonzero element whose value is 1. It is a single-module version of the modular-one-hot code, or may be viewed as a binarized version of the random patterns, since its $N$ patterns in $N$ dimensions are linearly independent. In the one-hot code, all neurons are equivalent, and there is no modularity or hierarchy.
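A useful consequence, anticipated in Table 1 below, is that every dichotomy of a one-hot code is realizable: setting each weight equal to its pattern’s desired label, with threshold 0.5, reproduces any labeling. A small sketch (ours) verifies this exhaustively for $N=4$:

```python
from itertools import product

N = 4
one_hot = [tuple(1 if i == j else 0 for i in range(N)) for j in range(N)]

def realizable(labels):
    # Choose w_j = labels[j] and theta = 0.5: the summed input at pattern j
    # is exactly labels[j], so thresholding reproduces the desired labeling.
    theta = 0.5
    out = tuple(1 if sum(w * x for w, x in zip(labels, p)) >= theta else 0
                for p in one_hot)
    return out == labels

all_dichotomies_realizable = all(realizable(lab)
                                 for lab in product((0, 1), repeat=N))
```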
Define the ‘binary’ code ${X}_{\mathrm{b}}$ as the set of all possible binary activity patterns of $N$ neurons (Figure 4B, right). We distinguish $\{0,1\}$-valued codes from binary codes. In the binary code, each cell represents a specific position (register) according to the binary number system. Thus, each cell represents numbers at a different resolution, differing in powers of 2, and the code has no neuron-permutation invariance since each cell is its own module; thus, it is both highly hierarchical and modular.
The grid-like and modular-one-hot codes exhibit an intermediate degree of modularity (multiple cells make up a module). If the modules are of similar size, the code has little hierarchy.
The geometry of grid-like input patterns
We first explore question (1). The modular-one-hot codebook ${X}_{\mathrm{mo}}$ is invariant to permutations of neurons (input matrix rows) within modules, but rows cannot be swapped across modules, as this would destroy the modular structure. It is also invariant to permutations of patterns (input matrix columns $\mathit{x}_{j}$). Further, the codebook includes all possible combinations of states across modules, so that modules function as independent encoders. These symmetries are sufficient to define the geometric arrangement of patterns in ${X}_{\mathrm{mo}}$, and the geometry in turn will allow us to count the number of field arrangements that are realizable by separating hyperplanes.
To make these ideas concrete, consider a simple example with module sizes $\{2,3\}$ (corresponding to the periods in the grid-like code), as in Figure 1B and Figure 3B. Independence across modules gives the code a product structure: the codebook consists of six states that can be obtained as products of the within-module states: $\{10100,10010,10001,01100,01010,01001\}$ = $\{10,01\}\times \{100,010,001\}$, where $\{10,01\}$ and $\{100,010,001\}$ are the coding states within the size-2 and size-3 modules, respectively. We represent the two states in the size-2 module by two vertices connected by an edge, which shows allowed state transitions within the module (Figure 4A, right). Similarly, the three states in the size-3 module and transitions between them are represented by a triangular graph (Figure 4A, right). The product of this edge graph and the triangle graph yields the full codebook ${X}_{\mathrm{mo}}$. The resulting product graph (Figure 4A, left) is an orthogonal triangular prism with vertices representing the combined patterns.
This geometric construction generalizes to an arbitrary number of modules $M$ and to arbitrary module sizes (periods) ${\lambda}_{m}$, $1\le m\le M$: by permutation invariance of neurons within modules, and independence of modules, the patterns of the codebook ${X}_{\mathrm{mo}}$, and thus of the corresponding grid-like codebook ${X}_{\mathrm{g}}$, always lie on the vertices of some convex polytope (e.g., the triangular prism), given by an orthogonal product of $M$ simplices (e.g., the line and triangle graphs). Each simplex represents one of the modules, with simplex dimension ${\lambda}_{m}-1$ for module size (period) ${\lambda}_{m}$.
This geometric construction provides some immediate counting results: in a convex polytope, any vertex can be separated from all the rest by a hyperplane; thus, all one-field arrangements are realizable. Pairs of vertices can be separated from the rest by a hyperplane if and only if the pair is directly connected by an edge (Figure 3D). Thus, we can count the set of all realizable two-field arrangements as the number of pairs of adjacent vertices in the polytope. Unrealizable two-field arrangements, which consist geometrically of positive labels assigned to non-adjacent vertices, correspond algebraically to firing fields that are not separated by integer multiples of either of the grid periods (Figure 3D,E).
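Applying this adjacency criterion to the $\{2,3\}$ example gives a quick count. A short sketch (ours, simply encoding the criterion stated above) finds that of the $\binom{6}{2}=15$ possible two-field arrangements over the full range, only those whose fields are separated by a multiple of one of the periods are realizable:

```python
from itertools import combinations
from math import lcm

periods = (2, 3)
L = lcm(*periods)  # full range of the example code

def pair_realizable(i, j):
    # Two fields at i and j are jointly realizable iff their grid patterns are
    # adjacent vertices, i.e. iff their separation is a multiple of a period.
    return any((i - j) % lam == 0 for lam in periods)

pairs = list(combinations(range(L), 2))
n_realizable = sum(pair_realizable(i, j) for i, j in pairs)
```

Here separations of 2, 3, or 4 positions are realizable, while neighboring fields (separation 1 or 5) are not, illustrating how the grid geometry forbids specific arrangements.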
Moreover, note that the convex polytopes obtained for the grid-like code remain qualitatively unchanged in their geometry if the nonzero activations within each module are replaced by graded tuning curves, as follows: convert all neural responses within a module into graded values by convolution along the spatial dimension with a kernel that has no periodicity over distances smaller than the module period (thus, the kernel cannot, for instance, be flat or contain multiple bumps within one module period). This convolution can be written as a matrix product with a circulant matrix of full rank and dimension equal to the full range $L$. Thus, the rank of the convolved matrix ${\tilde{X}}_{\mathrm{g}}$ remains equal to the rank of ${X}_{\mathrm{g}}$. Moreover, ${\tilde{X}}_{\mathrm{g}}$ maintains the modular structure of ${X}_{\mathrm{g}}$: it has the same within-module permutation invariance and across-module independence. Thus, the resulting geometry of the code – that it consists of convex polytopes constructed from orthogonal products of simplices – remains unchanged. As a result, all counting derivations, which are based on these geometric graphs, can be carried out for $\{0,1\}$-valued codes without any loss of generality relative to graded tuning curves. (However, the conversion to graded tuning will modify the distances between vertices and thus affect the quantitative noise robustness of different field arrangements, as we will investigate later.) Later, we will also show that the counting results generalize to higher dimensions and higher-resolution phase representations within each module.
Given this geometric characterization of the grid-like and modular-one-hot codes, we can now compute the number of field arrangements realizable with separating hyperplanes.
Counting realizable place field arrangements
For modular-one-hot codes (but not for random codes), it is possible to specify any separating hyperplane using only nonnegative weights and an appropriate threshold. This is an interesting property in the neurobiological context because it means that the finding that projections from entorhinal cortex to hippocampus are excitatory (Steward and Scoville, 1976; Witter et al., 2000; Shepard, 1998) does not further constrain realizable field arrangements.
It is also an interesting property mathematically, as we explore below: combined with the within-module permutation invariance of modular-one-hot codes, the nonnegative-weight observation allows us to map the problem onto Young diagrams (Figure 5), which enables two things: (1) to move from considering separating hyperplanes geometrically, where infinitesimal variations represent distinct hyperplanes even if they do not change any pattern classifications, to considering them topologically, where hyperplane variations are considered distinct only if they change the classification of some pattern; and (2) to use counting results previously established for Young diagrams.
Let us consider the field arrangements permitted by combining grid-like inputs from two modules, of periods ${\lambda}_{1}$ and ${\lambda}_{2}$ (Figure 5A). The total number of distinct grid-cell modules is estimated to be between 5 and 8 (Stensola et al., 2012). Further, there is a spatial topography in the projection of grid cells to the hippocampus, such that each local patch of the hippocampus likely receives inputs from 2, and likely no more than 3, grid modules (Witter and Groenewegen, 1984; Amaral and Witter, 1989; Witter and Amaral, 1991; Honda et al., 2012; Witter et al., 2000). We denote cells by their outgoing weights (${w}_{ij}$ is the weight from cell $j$ in module $i$) and arrange the weights along the axes of a coordinate space, one axis per module, in order of increasing size (Figure 5B). Since modular-one-hot codes are invariant to permutations of the cells within a module, we can assume a fixed ordering of cells and weights in counting all realizable arrangements, without loss of generality. The threshold (dark purple line) sets which combinations of summed weights can contribute to a place field arrangement: cell combinations below the boundary (purple region) have too small a summed weight and cannot contribute, while all cell combinations with larger summed weights (white region) can (Figure 5B). Decreasing the threshold (from Figure 5B to C) or increasing weights (from Figure 5B,C to D) by an amount sufficient for some cells to cross the threshold increases the number of contributing combinations, but changes that do not cause cells to move past the threshold leave the combinations unchanged (Figure 5B, solid versus dashed gray lines).
Young diagrams extract this topological information, stripping away geometric information about analog weights (Figure 5E). A Young diagram consists of stacks of blocks in rows of nonincreasing width, with maximum width and height given in this case by the two module periods, respectively. The number of realizable field arrangements turns out to be equivalent to the total number of Young diagrams that can be built of the given maximum height and width (see Appendix 3). With this mapping, we can leverage combinatorial results on Young diagrams (Fulton and Fulton, 1997; Postnikov, 2006) (commonly used to count the number of ways an integer can be written as a sum of nonnegative integers).
As a result, the total number of separating hyperplanes ($K$-field arrangements for all $K$) across the full range $L$ can be written exactly as (see Appendix 3)
$$\mathcal{N}=\sum_{k=0}^{\min({\lambda}_{1},{\lambda}_{2})}(k!)^{2}\,S_{k+1}^{({\lambda}_{1}+1)}\,S_{k+1}^{({\lambda}_{2}+1)}=B_{{\lambda}_{1}}^{(-{\lambda}_{2})},$$
where ${S}_{k}^{(n)}$ are Stirling numbers of the second kind and ${B}_{k}^{(n)}$ are the poly-Bernoulli numbers (Postnikov, 2006; Kaneko, 1997). Assuming that the two periods have a similar size (${\lambda}_{1}\approx {\lambda}_{2}\equiv \lambda$), this number scales asymptotically nearly as fast as ${\lambda}^{2\lambda}$ (de Andrade et al., 2015).
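The poly-Bernoulli numbers of negative index admit the standard expansion $B_{n}^{(-k)}=\sum_{j}(j!)^{2}S_{j+1}^{(n+1)}S_{j+1}^{(k+1)}$ in Stirling numbers, and are also known (a combinatorial fact not stated in the text, which we use here only as an independent check) to count ‘lonesum’ 0/1 matrices – those containing neither $2\times 2$ permutation submatrix. A sketch (ours) cross-checks the two counts at small sizes:

```python
from itertools import product
from math import factorial

def stirling2(n, k):
    # Stirling numbers of the second kind S(n, k).
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def poly_bernoulli_neg(n, k):
    # B_n^{(-k)} = sum_j (j!)^2 S(n+1, j+1) S(k+1, j+1).
    return sum(factorial(j) ** 2 * stirling2(n + 1, j + 1) * stirling2(k + 1, j + 1)
               for j in range(min(n, k) + 1))

def is_lonesum(mat, n, k):
    # A 0/1 matrix is lonesum iff no 2x2 submatrix is a permutation matrix.
    for r1 in range(n):
        for r2 in range(r1 + 1, n):
            for c1 in range(k):
                for c2 in range(c1 + 1, k):
                    sub = (mat[r1][c1], mat[r1][c2], mat[r2][c1], mat[r2][c2])
                    if sub in ((1, 0, 0, 1), (0, 1, 1, 0)):
                        return False
    return True

def count_lonesum(n, k):
    # Brute-force enumeration of all 2^(n*k) matrices; feasible at small sizes.
    return sum(is_lonesum([bits[r * k:(r + 1) * k] for r in range(n)], n, k)
               for bits in product((0, 1), repeat=n * k))

formula_matches_bruteforce = all(
    poly_bernoulli_neg(n, k) == count_lonesum(n, k)
    for n, k in [(1, 1), (2, 2), (2, 3), (3, 3)])
```

For instance, $B_{2}^{(-2)}=14$ and $B_{3}^{(-3)}=230$, already far exceeding the corresponding numbers of plain box-bounded Young diagrams, which conveys how quickly the count grows with the periods.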
Thus, the number of realizable field arrangements with $\sim {\lambda}^{2}$ distinct modular-one-hot input patterns in a $2\lambda$-dimensional space grows nearly as fast as ${\lambda}^{2\lambda}$ (Table 1, row 2, columns 1–3). The total number of dichotomies over these input patterns scales as ${2}^{{\lambda}^{2}}$. Thus, while the number of realizable arrangements over the full range is very large, it is a vanishing fraction of all potential arrangements (Table 1, row 2, column 4).
If $M\ge 3$ modules were to contribute to each place field’s response, then all realizable field arrangements would still correspond to Young diagrams; however, not all diagrams would correspond to realizable arrangements. Thus, counting Young diagrams would yield an upper bound on the number of realizable field arrangements but not an exact count (see Appendix 3). The latter limitation is not a surprise: due to the structure of the grid-like code (a product of simplices), the enumeration of realizable dichotomies with arbitrarily many input modules is expected to be at least as challenging as counting linearly separable Boolean functions of arbitrary input dimension, which is a hard problem (Peled and Simeone, 1985; Hegedüs and Megiddo, 1996).
Nevertheless, we can provide an exact count of the number of realizable $K$-dichotomies for arbitrarily many input modules $M$ if $K$ is small ($K=1,2,3$, and 4). This may be biologically relevant since place cells tend to fire sparsely even on long tracks and across environments. In this case, assuming periods of similar size (${\lambda}_{m}\approx \lambda$), the number ${\mathcal{N}}_{K}$ of realizable small-$K$ field arrangements scales as
$${\mathcal{N}}_{K}\sim M^{K-1}{\lambda}^{M+K-1}$$
(the exact expression is derived analytically in Appendix 3).
The scaling approximation becomes more accurate for periods that are large relative to the spatial discretization (see Appendix 3). Since the total number of $K$-dichotomies scales as ${\lambda}^{MK}$, the fraction of realizable $K$-dichotomies scales as $(M/{\lambda}^{M-1})^{K-1}$, which for $\lambda \gg 1$, $\lambda > M$ vanishes as a power law as soon as $M>1$.
We can compare this result with the number of $K$-field arrangements realizable by one-hot codes. Since any arrangement is realizable with one-hot codes, it suffices to simply count all $K$-field arrangements. The full range of a one-hot code with $M\lambda$ cells is $M\lambda$; thus, the number of realizable $K$-field arrangements is ${\mathcal{N}}_{K}=\binom{M\lambda}{K}\sim (M\lambda)^{K}$, where the last scaling holds for $K\ll M\lambda$. In short, a one-hot code enables $\sim {M}^{K}{\lambda}^{K}$ arrangements, while the corresponding modular-one-hot code with $M\lambda$ cells enables $\sim {M}^{K-1}{\lambda}^{K+M-1}$ field arrangements, for a ratio ${\lambda}^{M-1}/M\gg 1$ of realizable fields with modular-one-hot versus one-hot codes. Once again, as in the case where we counted arrangements without regard to sparseness, the grid-like code enables far more realizable $K$-field arrangements than one-hot codes.
In summary, place cells driven by grid inputs can achieve a very large number of unique coding states that grows exponentially with the number of modules. We have derived this result for $M=2$ and all $K$-field arrangements, on the one hand, and for arbitrary $M$ but ultra-sparse (small-$K$) field arrangements, on the other. It is difficult to obtain an exact result for sparse field arrangements in which $K$ is a small but finite fraction of $L$; however, we expect that regime to interpolate between the two we have solved, and it will be interesting and important for future work to shed light on it. In all cases, the number of realizable arrangements is large but a vanishingly small fraction of all arrangements, and thus forms a highly structured subset. This suggests that place cells, when driven by grid-cell inputs, can form a very large number of field arrangements that seem essentially unrestricted, but individual cells actually have little freedom in where to place their fields.
Comparison with other input patterns
How does the number of realizable place field arrangements differ for input codes with different levels of modularity and hierarchy? We directly compare codes with the same neuron budget (input dimension $N$) by taking $N=M\lambda$, where for simplicity we set ${\lambda}_{i}=\lambda$ for all modules in the modular-one-hot codes. Because the modular-one-hot codes include all combinations of states across modules, the number of unique input states with equal-sized modules still equals the product of the module sizes, $L={(N/M)}^{M}={\lambda}^{M}$, as when the periods are different and coprime. The one-hot code generates far fewer distinct input patterns ($L=N=M\lambda$) than the modular-one-hot code, which in turn generates fewer input patterns than the binary code ($L={2}^{N}={2}^{M\lambda}$) (Table 1, column 2). This is due to the greater expressive power afforded by modularity and hierarchy.
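The full-range comparison in this paragraph is simple arithmetic; a sketch (ours) for a matched budget of $N=M\lambda$ cells, with the illustrative values $M=2$ and $\lambda=10$:

```python
M, lam = 2, 10
N = M * lam  # matched neuron budget across the three codes

L_one_hot = N                  # one unique pattern per cell
L_modular_one_hot = lam ** M   # product of equal module sizes, (N/M)^M
L_binary = 2 ** N              # all binary activity patterns of N cells
```

Even at this modest size, the ordering one-hot < modular-one-hot < binary spans many orders of magnitude (20 versus 100 versus over a million patterns).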
Next, we compare results across codes for $M=2$, the case for which we have an explicit formula counting the total number of realizable field arrangements for any $K$, and which is also best supported by the biology.
How many dichotomies are realizable with these inputs? As for the modular-one-hot codes, the patterns of ${X}_{\mathrm{oh}}$ and ${X}_{\mathrm{b}}$ fall on the vertices of a convex polytope. For ${X}_{\mathrm{oh}}$, that polytope is just an $(N-1)$-dimensional simplex (Figure 4C, left); thus, any subset of $K$ vertices ($1\le K\le N$) lies on a $(K-1)$-dimensional face of the simplex and is therefore a linearly separable dichotomy. Thus, all ${2}^{N}$ dichotomies of ${X}_{\mathrm{oh}}$ are realizable, and the fraction of realizable dichotomies is 1 (Table 1, columns 3 and 4). For ${X}_{\mathrm{b}}$, the polytope is a hypercube; it therefore contains square faces, a prototypical configuration of points not in general position (not linearly separable, Figure 2B and Figure 4, right), even when the number of patterns is small relative to the input dimension (number of cells). Counting the number of linearly separable dichotomies on the vertices of a hypercube (also called linearly separable Boolean functions) has attracted much interest (Peled and Simeone, 1985; Hegedüs and Megiddo, 1996). It is an NP-hard combinatorial problem, and no closed-form count is known. However, in the limit of large dimension ($N\to \infty$), the number of linearly separable dichotomies scales as ${2}^{{N}^{2}/2}$ (Zuev, 1989), a much larger number than for one-hot inputs (Table 1, column 3). However, this number is a strongly vanishing fraction of all ${2}^{{2}^{N}}$ hypercube dichotomies (Table 1, column 4).
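For intuition at the smallest nontrivial size, the linearly separable Boolean functions of $N=2$ inputs can be enumerated by brute force over a small grid of integer weights and half-integer thresholds (a sketch, ours; the grid ranges are ad hoc but sufficient at this size): 14 of the 16 dichotomies of the square’s vertices are realizable, with XOR and XNOR the exceptions.

```python
from itertools import product

corners = list(product((0, 1), repeat=2))  # vertices of the square {0,1}^2

def threshold_functions(weights, thetas):
    # Enumerate the Boolean functions realizable as 1[w1*x1 + w2*x2 >= theta].
    funcs = set()
    for w1, w2 in product(weights, repeat=2):
        for theta in thetas:
            funcs.add(tuple(1 if w1 * x1 + w2 * x2 >= theta else 0
                            for x1, x2 in corners))
    return funcs

separable = threshold_functions(range(-2, 3), [t / 2 for t in range(-5, 6)])
xor = (0, 1, 1, 0)  # outputs at corners (0,0), (0,1), (1,0), (1,1)
```

XOR assigns positive labels to the two ends of a diagonal of a square face, exactly the non-general-position configuration the hypercube geometry forbids.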
For modular-one-hot codes with $M$ modules, the polytopes contain $M$-dimensional hypercubes, and thus not all patterns are in general position. We determined earlier that the total number of realizable dichotomies with $M=2$ modules scales as ${\lambda}^{2\lambda}$, permitting a direct comparison with the one-hot and binary codes (Table 1, row 2).
Finally, we may compare grid-like codes with random (real-valued) codes, which are the standard inputs for the classical perceptron results. For a fixed input dimension, it is possible to generate infinitely many real-valued patterns, unlike the finite number achievable by $\{0,1\}$-valued codes. We thus construct a random codebook ${X}_{\mathrm{r}}$ with the same number, $P={\lambda}^{2}$, of input patterns as the modular-one-hot code. We then determine the input dimension $N$ required to obtain the same number of realizable field arrangements as the grid-like code. The number of realizable dichotomies of the random code with $P\gg N$ patterns scales as ${P}^{N}\sim {\lambda}^{2N}$ according to an asymptotic expansion of Cover’s function counting theorem (Cover, 1965). Matching this number to $\sim {\lambda}^{2\lambda}$, the number of realizable field arrangements with a modular-one-hot code (of two modules of size $\sim \lambda $ each), requires $N\sim \lambda $. Thus the two codes require a comparable number of input cells, an interesting result because, unlike for random codes, the grid-like input patterns are not in general position, the states are confined to be $\{0,1\}$-valued, and the grid input weights can be confined to be nonnegative.
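The counting theorem invoked here has a simple closed form. The sketch below (the function name `cover_count` is ours) evaluates Cover’s count of linearly separable dichotomies of $P$ points in general position in $N$ dimensions, $C(P,N)=2\sum_{k=0}^{N-1}\binom{P-1}{k}$; at or below capacity ($P\le N$) every dichotomy is realizable, while for $P\gg N$ the leading term grows polynomially in $P$ (degree $N-1$, or degree $N$ once a free threshold is included):

```python
from math import comb

def cover_count(P, N):
    """Cover (1965): number of linearly separable dichotomies of P points in
    general position in N dimensions (homogeneous perceptron, zero threshold;
    use N + 1 to allow a nonzero threshold)."""
    return 2 * sum(comb(P - 1, k) for k in range(N))

print(cover_count(4, 4))    # 16 = 2^4: at or below capacity, all realizable
print(cover_count(10, 3))   # 2 * (1 + 9 + 36) = 92 of the 2^10 dichotomies
```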
In sum, the more modular and hierarchical a code, the larger the set of realizable field arrangements; but these arrangements are also increasingly special subsets of all possible arrangements, strongly structured by the inputs and far from random or arbitrary configurations. Modular-one-hot codes are intermediate in modularity. Therefore, grid-driven place-cell responses occupy a middle ground between pattern richness and constrained structure.
Place-cell-separating capacity
We now turn to question (2) from above: what is the maximal range of locations, ${l}^{*}$, over which all field arrangements are realizable? Once we reference a spatial range, the mapping of coding states to spatial locations matters (specifically, the fact that locations in the range are spatially contiguous matters; but because the code is translationally invariant [Fiete et al., 2008], the origin of this range does not). We thus call ${l}^{*}$ the ‘contiguous-separating capacity’ of a place cell (though we will refer to it as separating capacity, for short); it is the analogue of Cover’s separating capacity (Cover, 1965), but for grid-like inputs with the addition of a spatial contiguity constraint.
We provide three primary results on this question. (1) We establish that for grid-structured inputs, the separating capacity ${l}^{*}$ equals the rank $R$ of the input matrix. (2) We establish analytically a formula for the rank $R$ of grid-like input matrices with integer periods and generalize the result to real-valued periods. (3) We show that this rank, and thus the separating capacity for generic real-valued module periods, asymptotically approaches the sum $\mathrm{\Sigma}\equiv {\sum}_{m=1}^{M}{\lambda}_{m}$. Our results are verified by numerical simulation and counting (proofs provided in Supporting Information Appendix).
We begin with a numerical example, using periods {3,4} (Figure 6A): the full range is $L=12$, while we see numerically that the contiguous-separating capacity is ${l}^{*}=6$. Although the separating capacity with grid-structured inputs is smaller than with random inputs, it is notably not much smaller (Figure 6B, black versus cyan curves), and it is actually larger than for random inputs if the readout weights are constrained to be nonnegative (Figure 6B, pink curves). Later, we will further show that the larger random-input capacity of place cells with unrestricted weights comes at the price of less robustness: the realizable fields have smaller margins. Next, we analytically characterize the separating capacity of place cells with grid-like inputs.
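The {3,4} example can be reproduced in a few lines. The sketch below (plain Python; `grid_matrix` and `rank` are illustrative helper names, not code from this study) builds the modular-one-hot codewords for periods 3 and 4 over the full range $L=12$ and computes the exact rank of the resulting pattern matrix, recovering the separating capacity ${l}^{*}=6$:

```python
from fractions import Fraction

def grid_matrix(periods):
    """Modular-one-hot codewords for locations 0..L-1 (L = product of the
    integer periods): one row per location, one one-hot block per module."""
    L = 1
    for p in periods:
        L *= p
    rows = []
    for t in range(L):
        row = []
        for p in periods:
            row += [1 if t % p == i else 0 for i in range(p)]
        rows.append(row)
    return rows   # L codewords, each of length sum(periods)

def rank(rows):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c]:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

X = grid_matrix((3, 4))     # full range L = 12
print(len(X), rank(X))      # 12 6  -> capacity l* = R = 6
```

The rank is 6 rather than 7 (the number of input cells) because the rows of each module sum to the all-ones vector, one shared linear dependency across modules.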
Separating capacity equals rank of grid-like inputs
For inputs in general position, the separating capacity equals the rank of the input matrix (plus 1 when the threshold is allowed to be nonzero), and the rank equals the dimension (number of cells) of the input patterns – the input matrix is full rank. When inputs are in general position, all input subsets of size equaling the separating capacity have the same rank. But when input patterns are not in general position, some subsets can have smaller ranks than others even when they have the same size. Thus, when input patterns are not in general position the separating capacity is only upper bounded by the rank of the full input matrix. In turn, the rank is only upper bounded by the number of cells (the input matrix need not be full rank).
For the grid-like code, all codewords can be generated by iterated application of a linear operator $J$ to a single codeword: a simultaneous one-unit phase shift by a cyclic permutation in each grid module is such an operator $J$, which can be represented by a block-form permutation matrix. The sequence $\mathit{x},J\mathit{x},{J}^{2}\mathit{x},\dots ,{J}^{m}\mathit{x}$ of patterns generated by applying $J$ to a grid-like codeword $\mathit{x}$ with the same module structure represents $m$ contiguous locations (Figure 6C).
The separating capacity for inputs generated by iterated application of the same linear operation saturates its bound by equaling the rank of the input pattern matrix. Since a code $\mathit{x},J\mathit{x},{J}^{2}\mathit{x},{J}^{3}\mathit{x},\dots$, generated by some linear operator $J$ with starting codeword $\mathit{x}$ is translation invariant, the number of dimensions spanned by these patterns strictly increases until some value $l$, after which the dimension remains constant. By definition, $l$ is therefore the rank $R$ of the input pattern matrix. It follows that any contiguous set of $l=R$ patterns is linearly independent, and thus in general position, which means that the separating capacity of such a pattern matrix is $R$.
For place cells, it follows that whenever $l\le R$, with $R$ the rank of the grid-like input matrix, all field arrangements are realizable, while for any $l>R$, there will be non-realizable field arrangements (Supporting Information Appendix). Therefore, the contiguous-separating capacity for place cells is ${l}^{*}=R$. This is an interesting finding: the separating capacity of a place cell fed with structured grid-like inputs approaches the capacity it would have if fed with general-position inputs of the same rank. Next, we compute the rank $R$ for grid-like inputs under increasingly general assumptions.
Grid input rank converges to sum of grid module periods
Integer periods
For integer-valued periods ${\lambda}_{m}$ ($1\le m\le M$), the rank of the matrix consisting of the multiperiodic grid-like inputs can be determined through the inclusion-exclusion principle (see Section B.4):
$${R}_{\mathrm{int}}({\lambda}_{1},\mathrm{\dots},{\lambda}_{M})=\sum _{k=1}^{M}{(-1)}^{k+1}\sum _{i}\text{GCD}({S}_{k}^{i}),$$ (5)
where ${S}_{k}^{i}$ is the $i$th of the $k$-element subsets of $\{{\lambda}_{1},\mathrm{\dots},{\lambda}_{M}\}$ (and the GCD of a one-element subset is the element itself). To gain some intuition for this expression, note that if the periods were pairwise coprime, all the GCDs of multi-element subsets would be 1 and the formula would simply produce ${R}_{\mathrm{copr}}({\lambda}_{1},\mathrm{\dots},{\lambda}_{M})=\mathrm{\Sigma}-M+1$, where $\mathrm{\Sigma}$ is defined as the sum of the module periods. If the periods are not pairwise coprime, the rank is reduced based on the set of common factors, as in (5), which satisfies the following inequality: $\mathrm{\Sigma}-\sum _{i<j}\text{GCD}({\lambda}_{i},{\lambda}_{j})\le {R}_{\mathrm{int}}({\lambda}_{1},\cdots ,{\lambda}_{M})\le \mathrm{\Sigma}$. When the periods are large ($\lambda \gg 1$), the rank approaches $\mathrm{\Sigma}$: large integers evenly spaced or uniformly randomly distributed over some range tend not to have large common factors (Cesaro, 1881). As a result, even for non-coprime periods, the rank scales like, and approaches, $\mathrm{\Sigma}$ (see below for further elaboration).
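The inclusion-exclusion computation is short enough to state as code. A minimal sketch (the function name `R_int` is ours) evaluates the rank formula over subset GCDs and reproduces the coprime special case $\mathrm{\Sigma}-M+1$:

```python
from itertools import combinations
from math import gcd
from functools import reduce

def R_int(periods):
    """Rank of the grid-like input matrix for integer periods, via the
    inclusion-exclusion sum of subset GCDs with alternating signs."""
    M = len(periods)
    total = 0
    for k in range(1, M + 1):
        sign = (-1) ** (k + 1)
        for S in combinations(periods, k):
            total += sign * reduce(gcd, S)
    return total

print(R_int((3, 4)))      # 6: the separating capacity of the {3,4} example
print(R_int((4, 6)))      # 4 + 6 - GCD(4,6) = 8
print(R_int((2, 3, 5)))   # coprime: sum - M + 1 = 10 - 3 + 1 = 8
```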
Real-valued periods
Actual grid periods are real- rather than integer-valued, but with some finite resolution. To obtain an expression for this case, consider the sequence of ranks ${R}_{\mathrm{re}}^{q}$ defined as
$${R}_{\mathrm{re}}^{q}({\lambda}_{1},\mathrm{\cdots},{\lambda}_{M})\equiv {R}_{\mathrm{int}}(\lfloor q{\lambda}_{1}\rfloor ,\mathrm{\cdots},\lfloor q{\lambda}_{M}\rfloor ),$$
where $\lfloor \cdot \rfloor $ denotes the floor operation, $q$ is an effective resolution parameter that takes integer values (the larger $q$, the finer the resolution of the approximation to a real-valued period), and the periods $0<{\lambda}_{1}<\dots <{\lambda}_{M}$ are real numbers. The rank of the grid-like input matrix with real-valued periods is given by ${lim}_{q\to \mathrm{\infty}}{R}_{\mathrm{re}}^{q}({\lambda}_{1},\mathrm{\cdots},{\lambda}_{M})/q$, if this limit exists. A finer resolution (higher $q$) corresponds to representing phases with higher resolution within each module, and thus intuitively to scaling the number of grid cells in each module by $q$.
Suppose that the periods are drawn uniformly from an interval of the reals, which we take without loss of generality to be $(0,1)$. Then the values $\lfloor q{\lambda}_{1}\rfloor ,\mathrm{\cdots},\lfloor q{\lambda}_{M}\rfloor $ are integers in $\{1,\mathrm{\dots},q\}$ and, as above, we have that $0\le q\mathrm{\Sigma}-{R}_{\mathrm{re}}^{q}({\lambda}_{1},\cdots ,{\lambda}_{M})\le \sum _{i<j}\text{GCD}(\lfloor {\lambda}_{i}q\rfloor ,\lfloor {\lambda}_{j}q\rfloor )$. In the infinite-resolution limit ($q\to \mathrm{\infty}$), the probability that $\text{GCD}(\lfloor {\lambda}_{i}q\rfloor ,\lfloor {\lambda}_{j}q\rfloor )=g$ scales asymptotically as $1/{g}^{2}$, independent of $q$ (Cesaro, 1881): randomly chosen large integers tend not to have large common factors. This implies that with probability 1, the limit ${lim}_{q\to \mathrm{\infty}}{R}_{\mathrm{re}}^{q}({\lambda}_{1},\mathrm{\cdots},{\lambda}_{M})/q$ is well-defined and equals $\mathrm{\Sigma}$, the sum of the input grid module periods.
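Both ingredients of this argument can be checked numerically. The sketch below (with two example periods of our choosing, one rational and one irrational) first estimates the Cesàro density of coprime pairs, then tracks the finite-resolution rank ratio ${R}_{\mathrm{re}}^{q}/q$ as $q$ grows, using the $M=2$ rank formula ${\lambda}_{1}+{\lambda}_{2}-\text{GCD}({\lambda}_{1},{\lambda}_{2})$ applied to the floored periods:

```python
from math import gcd, pi

# Cesàro: the density of coprime integer pairs approaches 6/pi^2 ≈ 0.608,
# i.e., large random integers rarely share large common factors.
n = 200
coprime = sum(gcd(a, b) == 1 for a in range(1, n + 1)
              for b in range(1, n + 1)) / n ** 2
print(round(coprime, 3))                 # close to 6/pi**2 ≈ 0.608

# Finite-resolution rank ratio for two periods in (0, 1): the ratio
# R_re^q / q approaches the sum of the periods as resolution q grows.
l1, l2 = 0.31, 2 ** 0.5 - 1              # example periods (one irrational)
for q in (10, 100, 10_000):
    a, b = int(q * l1), int(q * l2)
    print(q, (a + b - gcd(a, b)) / q)    # -> l1 + l2 ≈ 0.7242
```

With a pair of rational periods instead, the ratio would plateau strictly below $\mathrm{\Sigma}$, which is why the almost-sure statement is for generically chosen real periods.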
When assessed numerically at different resolutions ($q$), the approach of the finite-resolution rank to the real-valued grid-period rank is quite rapid (Figure 6D). Thus, the separating capacity does not depend sensitively on the precision of the grid periods. It is also invariant to the resolution with which phases are represented within each module.
In summary, the place-cell-separating capacity with real-valued grid periods and high-resolution phase representations within each module equals the rank of the grid-like input matrix, which itself approaches $\mathrm{\Sigma}$, the sum of the module periods. Thus, a place cell can realize any arrangement of fields over a spatial range given by the sum of the module periods of its grid inputs.
It is interesting that the contiguous-separating capacity of a place cell fed with grid-like inputs not in general position approaches the capacity it would have if fed with general-position inputs of the same rank. On the other hand, the contiguous-separating capacity is very small compared to the total range over which the input grid patterns are unique: since each local region of hippocampus receives input from 2 to 3 modules (Witter and Groenewegen, 1984; Amaral and Witter, 1989; Witter and Amaral, 1991; Witter et al., 2000; Honda et al., 2012), the range over which any field arrangement is realizable is at most 2–3 times the typical grid period. By contrast, the total range $L$ of locations over which the grid inputs provide unique codes scales as the product of the periods. The result implies that once field arrangements are freely chosen in a small region, they impose strong constraints over a much larger region and across environments. We explore this implication in more detail below.
Generalization to higher dimensions
We have already argued that our counting arguments hold for realistic tuning curve shapes with graded activity profiles. This follows from the fact that convolution of the grid-like codes with appropriate smoothing kernels does not change the general geometric arrangement of codewords relative to each other, as these convolution operations preserve within-module permutation symmetries and across-module independence in the code. We have also shown that the contiguous-separating capacity results apply to real-valued grid periods with dense phase encodings within each module.
Here, we describe the generalization to different spatial dimensions. Consider a $d$-dimensional grid-like code consisting of ${({\lambda}_{m})}^{d}$ cells in the $m$th module to produce a one-hot phase code for ${\lambda}_{m}$ (discrete) positions along each dimension (Figure 6E). Since the counting results rely only on the existence of a modular-one-hot code and not on any mapping from real spaces to coding states, this code across multiple modules $m=1,\mathrm{\dots},M$ is equivalent to a modular-one-hot coding for ${\prod}_{m=1}^{M}{({\lambda}_{m})}^{d}$ states, with modules of size ${({\lambda}_{m})}^{d}$ each. All the counting results from before therefore hold, with the simple substitution ${\lambda}_{m}\to {({\lambda}_{m})}^{d}$ in the various formulae.
The contiguous-separating capacity in $d$ dimensions is defined as the maximum volume over which all field arrangements are realizable. Like the 1D separating capacity results, this volume depends upon the mapping of physical space to grid-like codes. We are able to show that for grid modules with periods ${\lambda}_{1},\mathrm{\dots},{\lambda}_{M}$, the generalized separating capacity is ${l}_{d}^{*}={\mathrm{\Sigma}}_{d}={\sum}_{m=1}^{M}{\lambda}_{m}^{d}$ (see Section B.4; Figure 6F). This result follows from essentially the same reasoning as for 1D environments, but with the use of $d$-dimensional phase-shift operators.
Robustness of field arrangements to noise and non-grid inputs
An important quality of field arrangements that is neglected when merely counting the number of realizable arrangements or determining the separating capacity is robustness: these computations consider all realizable field arrangements, but field arrangements are practically useful only if they are robust, so that small amounts of perturbation or noise in the inputs or weights do not render them unrealizable. Above, we showed that grid-like codes enable many dichotomies despite being structurally constrained, but that random analog-valued codes, as well as more hierarchical codes, permit even more dichotomies. Here, we show that the dichotomies realized by grid codes are substantially more robust to noise and thus more stable.
The robustness of a realizable dichotomy in a perceptron is given by its margin: for a given linear decision boundary, the margin is the smallest data-point-to-boundary distance in each class, summed over the two classes. The maximum margin is the largest achievable margin for that dataset. The larger the maximum margin, the more robust the classification. We thus compare maximum margins (herein simply referred to as margins) across place field arrangements, when the inputs are grid-like or not.
Perceptron margins can be computed using quadratic programming on linear support vector machines (Platt, 1998). We numerically solve this problem for three types of input codes (permitting a nonzero threshold and imposing no weight constraints): the grid-like code ${X}_{\mathrm{g}}$; the shuffled grid-like code ${X}_{\mathrm{gs}}$ – a row- and column-shuffled version of the grid-like code that breaks its modular structure; and the random code ${X}_{\mathrm{r}}$ of uniformly distributed random inputs (Figure 7). To make distance comparisons meaningful across codes, (1) all patterns (columns) involve the same number of neurons (dimension), (2) all patterns have the same total activity level (unit ${L}_{1}$ norm), and (3) the number of input patterns is the same across codes, chosen to equal $L$, the full range of the corresponding grid-like code. To compute margins, we consider only the realizable dichotomies on these patterns.
The margins of all realizable place field arrangements with grid-like inputs are shown in Figure 7A (black); the margin values for all arrangements are discretized because of the geometric arrangement of the inputs, and each black bar has a very high multiplicity. The grid-like code produces much larger-margin field arrangements than shuffled versions of the same code and than random codes (Figure 7A, pink and blue). The higher margins of the grid-like code compared to the shuffled grid-like code show that it is the structured geometry and modular nature of the code that produce well-separated patterns in the input space (Figure 4B) and create wide margins and field stability. In other words, place field arrangements formed by grid inputs, though smaller in number than arrangements with differently coded inputs, should be more robust and stable against potential noise in neural activations or weights.
Next, we directly consider how different kinds of non-grid inputs, driving place cells in conjunction with grid-like inputs, affect our results on place field robustness. We examine two distinct types of added non-grid input: (1) spatially dense noise, meant to model sources of uncontrolled variation in inputs to the cell, and (2) spatially sparse and reliable cues, meant to model spatial information from external landmarks.
After the addition of dense noise, previously realizable grid-driven place field arrangements remain realizable and their margins, though somewhat lowered, remain relatively large (Figure 7B, empty green violins). In other words, grid-driven place field arrangements are robust to small, dense, and spatially unreliable inputs, as expected given their large margins. Note that because the addition of dense i.i.d. noise to grid-like input patterns pushes them toward general position, and general-position inputs enable more realizable arrangements, the noise-added versions of grid-like inputs also give rise to some newly realizable field arrangements (Figure 7B, full green violins). However, as with arrangements driven purely by random inputs, these new arrangements have small margins and are not very robust. Moreover, since by definition noise inputs are assumed to be spatially unreliable, the newly realizable arrangements will not persist across trials.
Next, the addition of sparse spatial inputs (similar to the one-hot codes of Table 1, though the sparse inputs here are nearly but not strictly orthogonal) leaves previous field arrangements largely unchanged and their margins substantially unmodified (Figure 7C, empty green violins). In addition, a few more field arrangements become realizable, and these new arrangements also have large margins (Figure 7C, full green violins). Thus, sufficiently sparse spatial cues can drive additional stable place fields that augment the grid-driven scaffold without substantially modifying its structure. Plasticity in the weights from these sparse cue inputs can drive the learning of new fields without destabilizing existing field arrangements.
In sum, grid-driven place field arrangements are highly robust to noise. Combining grid-cell drive with cue-driven inputs can produce robust maps that combine internal scaffolds with external cues.
High volatility of field arrangements with grid input plasticity
Our results on the fraction of realizable place field arrangements and on place-cell-separating capacity with grid-like inputs imply that place cells have highly restricted flexibility in laying down place fields (without direct drive from external spatially informative cues) over distances greater than $\mathrm{\Sigma}$, the sum of the input grid module periods. Selecting an arrangement of fields over this range then constrains the choices that can be made over all remaining space in the same environment and across environments. Conversely, changing the field arrangement in any space by altering the grid-place weights should affect field arrangements everywhere.
We examine this question quantitatively by constructing realizable $K$-field arrangements (with grid-like responses generated as 1D slices through 2D grids [Yoon et al., 2016]), then attempting to insert one or a few new fields (Figure 8A,B). Inserting even a single field at a randomly chosen location through Hebbian plasticity in the grid-place weights tends to produce new additional fields at uncontrolled locations, and also leads to the disappearance of existing fields (Figure 8A,B).
Interestingly, though field insertion affects existing arrangements through the uncontrolled appearance or disappearance of other fields, it does not tend to produce local horizontal displacements of existing fields (Figure 8C): fields that persist retain their firing locations or they disappear entirely, consistent with the surprising finding of a similar effect in experiments (Ziv et al., 2013).
The locations of fields, including of uncontrolled field additions, are well predicted by the structure (autocorrelation) of that cell’s grid inputs (Figure 8D). This multi-peaked autocorrelation function, with large separations between the tallest peaks, reflects the multiperiodic nature of the grid code and explains why fields tend to appear or disappear at remote locations rather than shifting locally: modest weight changes in the grid-like inputs modestly alter the heights of the peaks, so that some of the well-separated tall peaks fall below the threshold for activation while others rise above it.
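This peak logic can be illustrated with a deliberately minimal toy computation (a stand-in for the full simulation, with hypothetical small periods {3,4}): a one-shot Hebbian increment of the weights toward the grid pattern at a target location $t_0$ raises the summed drive at every location that shares a grid phase with $t_0$, and raises it maximally at locations congruent to $t_0$ modulo every period, that is, at multiples of the least common multiple of the periods:

```python
# Toy illustration of why Hebbian field insertion recurs at remote spots:
# the drive added by the update w += x(t0) at location t counts the modules
# whose phase at t matches the phase at t0.
periods = (3, 4)
t0 = 5
span = 36                      # look beyond the unique range L = 12

def drive(t):
    # Inner product of codeword x(t) with the Hebbian increment x(t0).
    return sum(t % p == t0 % p for p in periods)

peaks = [t for t in range(span) if drive(t) == len(periods)]
print(peaks)                   # [5, 17, 29]: t0 plus multiples of lcm = 12
partial = [t for t in range(span) if drive(t) == 1]
print(len(partial))            # 15 locations where drive is partially raised
```

A threshold set between the partial and full drive levels admits the intended field at $t_0$ but also its images one full period away, and small pre-existing weights can push the partially raised locations above threshold, mirroring the uncontrolled additions seen in Figure 8.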
Quantitatively, insertion of a single field at an arbitrary location in a 20 m span through grid-place weight plasticity results, on average, in the insertion or deletion of $\sim 0.2$ uncontrolled fields per meter. The insertion of four fields anywhere over 20 m results in an average of one uncontrolled field per meter (Figure 8E).
Thus, if a place cell were to add a field in a new environment or within a large single environment by modifying the grid-place weights, our results imply that this learning would very likely alter the original grid-cell-driven field arrangements (scaffold). By contrast, adding fields that are driven by spatially specific external cues, through plasticity in the cue-input-to-place-cell synapses, may not affect field arrangements elsewhere if the cues are sufficiently sparse (unique); in this case, the added field would be a ‘sensory’ field rather than an internally generated or ‘mnemonic’ one.
In sum, the small separating capacity of place cells according to our model may provide one explanation for the high volatility of the place code across tens of days (Ziv et al., 2013), if grid-place weights are subject to any plasticity over this timescale. Alternatively, to account for the stability of spatial representations over shorter timescales, our results suggest that external cue-driven inputs to place cells can be plastic while the grid-place weights, and correspondingly the internal scaffold, may be fixed rather than plastic (Figure 8F). In experiments that induce the formation of a new place field through intracellular current injection (Bittner et al., 2015), it is notable that the precise location of the new field was not under experimental control: potentially, an induced field might only be able to form where an underlying (near-threshold) grid scaffold peak already exists to help support it, and the observed long plasticity window could enable place cells to associate a plasticity-inducing cue with a nearby scaffold peak.
This alternative is consistent with the finding that entorhinal-hippocampal connections stabilize long-term spatial and temporal memory (Brun et al., 2008; Brun et al., 2002; Suh et al., 2011).
Finally, we note that the robustness of place field arrangements obtained with grid-like inputs is not inconsistent with the volatility of field arrangements under the addition or deletion of fields through grid-place weight plasticity. Grid-driven place field arrangements are robust to random i.i.d. noise in the inputs and weights, as well as to the addition of non-grid sparse inputs. The volatility results, on the other hand, involve associative plasticity that induces highly non-random weight changes, large enough to drive constructive interference in the inputs and add a new field at a specific location. This non-random perturbation, applied to the distributed and globally active grid inputs, results in global output changes.
Discussion
Grid-driven hippocampal scaffolds provide a large representational space for spatial mapping
We showed that when driven by grid-like inputs, place cells can generate a spatial response scaffold that is shaped by the structural constraints of those inputs. Because of the richness of their grid-like inputs, individual place cells can generate a large library of spatial responses; however, these responses are also strongly structured, so that the realizable spatial responses are a vanishingly small fraction of all spatial responses over the range where the grid inputs are unique. Realizable spatial field arrangements are nevertheless robust, and place cells can then ‘hang’ external sensory cues onto the spatial scaffold by associative learning to form distinct spatial maps for multiple environments. Note that our results apply equally well to the situation where grid states are incremented based on motion through arbitrary Euclidean spaces, not just spatial ones (Killian et al., 2012; Constantinescu et al., 2016; Aronov et al., 2017; Klukas et al., 2020).
Summary of mathematical results
Mathematically, formulating the problem of place field arrangements as a perceptron problem led us to examine the realizable (linearly separable) dichotomies of patterns that lie not in general position but on the vertices of convex regular polytopes, thus extending Cover’s results to define capacity for a case with geometrically structured inputs (Cover, 1965). Input configurations not in general position complicate the counting of linearly separable dichotomies. For instance, counting the number of linearly separable Boolean functions, which is precisely the problem of counting the linearly separable dichotomies on the hypercube, is NP-hard (Peled and Simeone, 1985; Hegedüs and Megiddo, 1996).
We showed that the geometry of grid-cell inputs is a convex polytope, given by the orthogonal product of simplices whose dimensions are set by the period of each grid module divided by the resolution. Grid-like codes are a special case of modular-one-hot codes, consisting of a population divided into modules with only one active cell (or group of cells) at a time per module.
Exploiting the symmetries of modular-one-hot codes allowed us to characterize and enumerate the realizable $K$-field arrangements for small fixed $K$. Our analyses relied on combinatorial objects called Young diagrams (Fulton, 1997). For the special case of $M=2$ modules, we expressed the number of realizable field arrangements exactly as a poly-Bernoulli number (Kaneko, 1997). Note that with random inputs, by contrast, counting the number of realizable $K$-field arrangements for fixed $K$ is not well-posed, since the answer depends on the specific configuration of input patterns. While we have considered two extreme cases analytically (one with no constraints on place field sparsity, the other with very few fields), it remains an outstanding question of interest to examine the case of sparse but not ultra-sparse field arrangements, in which the number of fields is proportional to the full range with a constant small prefactor (Itskov and Abbott, 2008). Finding results in this regime would involve restricting our count of all possible Young diagrams to a subset with a fixed filled-in area (purple area in Figure 5). This constraint makes the counting problem significantly harder.
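For concreteness, poly-Bernoulli numbers ${B}_{n}^{(-k)}$ can be evaluated from Kaneko’s closed form in terms of Stirling numbers of the second kind; the sketch below (illustrative only, and not tied to specific $\lambda$ values from the text) also checks the known duality ${B}_{n}^{(-k)}={B}_{k}^{(-n)}$:

```python
from math import factorial

def stirling2(n, k):
    """Stirling numbers of the second kind, S(n, k), via the recurrence
    S(n, k) = k*S(n-1, k) + S(n-1, k-1)."""
    S = [[0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S[n][k]

def poly_bernoulli(n, k):
    """B_n^(-k) via Kaneko's closed form:
    sum_j (j!)^2 * S(n+1, j+1) * S(k+1, j+1)."""
    return sum(factorial(j) ** 2 * stirling2(n + 1, j + 1) * stirling2(k + 1, j + 1)
               for j in range(min(n, k) + 1))

print(poly_bernoulli(1, 1))   # 2
print(poly_bernoulli(2, 2))   # 14
```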
We showed using analytical arguments that our results generalize to analog or graded tuning curves, real-valued periods, and dense phase representations per module. We also showed numerically that our qualitative results hold under deviations from the ideal, such as the addition of noise to inputs and weights. The relatively large margins of the place field arrangements obtained with grid-like inputs make the code resistant to noise. In future work, it will be interesting to further explore the dependence of margins, and thus the robustness of the place field arrangements, on graded tuning curve shapes and the phase resolution per module.
Robustness, plasticity, and volatility
As described in the section on separating capacity, once grid-place weights are set over a relatively small space (about the size of the sum of the grid module periods), they set up a scaffold also outside of that space (within and across environments). Associating an external cue with this scaffold would involve updating the weights from the external sensory inputs to place cells that are close to or above threshold based on the existing scaffold. This does not require relearning grid-place weights and does not cause interference with previously learned maps.
By contrast, relearning the grid-place weights to insert another grid-driven field rearranges the overall scaffold, degrading previously learned maps (volatility; Ziv et al., 2013). If we consider a realizable field arrangement in a small local region of space and then impose some desired field arrangement in a different local region through Hebbian learning, we might ask what the effect would be in the first region. Our results on field volatility provide an answer: if the first local region is of a size comparable to the sum of the place cell’s input grid periods, then any attempt to choose field locations in a different local region of space (e.g., a different environment) will almost surely have a global effect that will likely alter the arrangement of fields in the first region. A similar result might hold true if the first region is actually a disjoint set of local regions whose individual side lengths add up to the sum of the input grid periods. This prediction might be consistent with the observed volatility of place fields over time, even in familiar environments (Ziv et al., 2013).
Our volatility results alternatively raise the intriguing possibility that grid-place weights, and thus the scaffold, might be largely fixed and not especially plastic, with plasticity confined to the non-grid sensory cue-driven inputs and to the return projections from place to grid cells. The experiments of Rich et al., 2014 – in which place cells were recorded on a long track, the animal was then exposed to an extended version of the track, and the original fields did not shift – might be consistent with this alternative possibility. These are two rather strong and competing predictions that emerge from our model, each consistent with different pieces of data. It will be very interesting to characterize the nature of plasticity in the grid-to-place weights in the future.
Alternative models of spatial tuning in hippocampus
This work models place cells as feedforward-driven conjunctions between (sparse) external sensory cues and (dense) motion-based internal position estimates computed in grid cells and represented by multiperiodic spatial tuning curves. In considering place-cell responses as thresholded versions of their feedforward inputs, including from grid cells, our model follows others in the literature that make similar assumptions (Hartley et al., 2000; Solstad et al., 2006; Sreenivasan and Fiete, 2011; Monaco et al., 2011; Cheng and Frank, 2011; Whittington et al., 2020). These models do not preclude the possibility that place cells feed back to correct grid-cell states, and some indeed incorporate such return projections (Sreenivasan and Fiete, 2011; Whittington et al., 2020; Agmon and Burak, 2020). It will be interesting in future work to analyze how such return projections affect the capacity of the combined system.
Our assumptions and model architecture are quite different from those of a complementary set of models, which take the view that grid-cell activity is derived from place cells (Kropff and Treves, 2008; Dordek et al., 2016; Stachenfeld et al., 2017). Our assumptions also contrast with a third set of models in which place-cell responses are assumed to emerge largely from locally recurrent weights within hippocampus (Tsodyks et al., 1996; Samsonovich and McNaughton, 1997; Battista and Monasson, 2020; Battaglia and Treves, 1998). One challenge for those models is explaining how to generate stable place fields through velocity integration across multiple large environments: the capacity (number of fixed points) of many fully connected neural integrator models in the style of Hopfield networks tends to be small – scaling as $\sim N$ states with $N$ neurons (Amit et al., 1985; Gardner, 1988; Abu-Mostafa and Jacques, 1985; Sompolinsky and Kanter, 1986; Samsonovich and McNaughton, 1997; Battaglia and Treves, 1998; Battista and Monasson, 2020; Monasson and Rosay, 2013) – because of the absence of modular structure (Fiete et al., 2014; Sreenivasan and Fiete, 2011; Chaudhuri and Fiete, 2019; Mosheiff and Burak, 2019). There are at least two reasons why a capacity roughly equal to the number of place cells might be too small, even though the number of hippocampal cells is large: (1) a capacity equal to the number of place cells would be quickly saturated if used to tile 2D spaces: $10^6$ states from $10^6$ cells supply $10^3$ states per dimension. Assuming, conservatively, a spatial resolution of 10 cm per state, this means no more than 100 m of coding capacity per linear dimension, with no excess coding states for error correction (Fiete et al., 2008; Sreenivasan and Fiete, 2011). (2) The hippocampus sits atop all sensory processing cortical hierarchies and is believed to play a key role in episodic memory in addition to spatial representation and memory.
The number of potential cortical coding states is vastly larger than the number of place cells, suggesting that the number of hippocampal coding states should grow more rapidly than linearly in the number of neurons, which is possible with our grid-driven model but not with non-modular Hopfield-like network models with pairwise weights between neurons.
Even if our assumption that place cells primarily derive their responses from grid-like inputs combined with external cue-derived non-grid inputs is correct, place cells may nevertheless deviate from our simple perceptron model if the place response involves additional layers of nonlinear processing. There are many ways in which this can happen: place cells are likely not entirely independent of each other, interacting through population-level competition and other recurrent interactions. Dendritic nonlinearities in place cells act as a hidden layer between grid-cell input and place-cell firing (Poirazi and Mel, 2001; Polsky et al., 2004; Larkum et al., 2007; Spruston, 2008; Larkum et al., 2009; Harnett et al., 2012; Harnett et al., 2013; Stuart et al., 2016). Or, if we identify our model place cells as residing in CA1, then CA3 would serve as an intermediate and locally recurrent processing layer. In principle, hidden layers that generated a one-hot encoding for space from the grid-like inputs and then drove place cells as perceptrons would make all place field arrangements realizable. However, such an encoding would require a very large number of hidden units (equal to the full range of the grid code, whereas the grid code itself requires only the logarithm of this number). Additionally, place cells may exhibit richer input-output transformations than a simple pointwise nonlinearity, for instance, through cellular temporal dynamics including adaptation or persistent firing. Finding ways to include these effects in the analysis of place field arrangements is a promising and important direction for future study.
In sum, combining modular grid-like inputs produces a rich spatial scaffold of place fields, on which to associate external cues, much larger than is possible with non-modular recurrent dynamics within hippocampus. Nevertheless, the allowed states are strongly constrained by the geometry of the grid-cell drive. Further, our results suggest either high volatility in the place scaffold if grid-to-place-cell weights exhibit synaptic plasticity, or the possibility that grid-to-place-cell weights are random and fixed.
Numerical methods
Random, weight-constrained random, and shuffled inputs
Entries of the random input matrix are independent random variables, uniformly distributed in $[0,1]$. To compare the separating capacity (Figure 4) of random codes with that of the grid-like code, we consider matrices of the same input dimension (number of neurons) as the grid-cell matrix, or alternatively of the same rank as the grid-cell matrix, then use Cover’s theorem to count the realizable dichotomies (Cover, 1965). Weight-constrained random inputs (Figure 4B–D) are random inputs with nonnegative weights imposed during training.
To compare margins (Figure 7), we use matrices with the same input dimension and number of patterns. As margins scale linearly with the norm of the patterns, to keep comparisons fair the input columns (patterns) are normalized to have unit $L_1$ norm.
Non-grid inputs
To test how non-grid inputs affect our results (Figure 7C,D), the ${\lambda}_{1}+{\lambda}_{2}$ grid-like inputs from two modules with periods ${\lambda}_{1}=31$ and ${\lambda}_{2}=43$ are augmented by 100 additional inputs. In Figure 7C, each non-grid dense noisy input is drawn independently at each location from the uniform interval $[0,2\mu ]$, where $\mu =0.2{\mu}_{g}$ and ${\mu}_{g}=2/({\lambda}_{1}+{\lambda}_{2})$ is the population mean of the grid inputs. In Figure 7D, each non-grid sparse input is a $\{0,1\}$ random variable with $Q$ nonzero responses across the full range $L={\lambda}_{1}{\lambda}_{2}$; we set $Q=0.2L{\mu}_{g}$. In all cases, input columns (patterns with grid and non-grid inputs combined) are finally normalized to have unit $L_1$ norm. Results are based on 1000 realizations (samples) of the non-grid inputs.
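The augmentation procedure above can be sketched in a few lines (a minimal sketch; the array names and use of NumPy are ours, not from the original code):

```python
import numpy as np

rng = np.random.default_rng(0)

lam1, lam2 = 31, 43
L = lam1 * lam2              # full spatial range
mu_g = 2.0 / (lam1 + lam2)   # population mean of the grid inputs

# Grid-like inputs: one active cell per module at each location.
grid = np.zeros((lam1 + lam2, L))
for x in range(L):
    grid[x % lam1, x] = 1.0
    grid[lam1 + x % lam2, x] = 1.0

# Figure 7C-style dense noisy inputs: i.i.d. uniform on [0, 2*mu], mu = 0.2*mu_g.
mu = 0.2 * mu_g
dense = rng.uniform(0.0, 2.0 * mu, size=(100, L))

# Figure 7D-style sparse inputs: {0,1} responses with Q nonzero entries each.
Q = int(round(0.2 * L * mu_g))
sparse = np.zeros((100, L))
for row in sparse:
    row[rng.choice(L, size=Q, replace=False)] = 1.0

# Combine grid and non-grid inputs; normalize each column to unit L1 norm.
def combine(nongrid):
    patterns = np.vstack([grid, nongrid])
    return patterns / patterns.sum(axis=0, keepdims=True)

patterns_dense, patterns_sparse = combine(dense), combine(sparse)
```

With these periods, $L = 1333$ and $Q$ rounds to 7 nonzero responses per sparse input.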
Grid-like inputs with graded tuning curves
We generate periodic grid-like activity with graded tuning curves as a function of 1D position $x$ in cell $i$ of module $m$ with period ${\lambda}_{m}$ as follows (Sreenivasan and Fiete, 2011):
$$r_{m,i}(x)=\exp \left(-\frac{d{\left({\varphi}_{m}(x),{\phi}_{i}\right)}^{2}}{2{\sigma}_{g}^{2}}\right),$$
with $d(\cdot ,\cdot )$ the circular distance between phases,
where the phase of module $m$ is ${\varphi}_{m}(x)=(x/{\lambda}_{m})\bmod 1$. The $i$th cell in a module has a preferred activity phase ${\phi}_{i}$ drawn randomly and uniformly from $(0,1)$. The tuning width ${\sigma}_{g}$ is defined in terms of phase; thus, in real space the width of the activity bump grows linearly with the module period. We set ${\sigma}_{g}=0.16$ (thus the full-width at half-max of the phase tuning curve equals 3/8 of the period, similar to grid cells).
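A minimal sketch of such a tuning curve, assuming a circular Gaussian profile in the phase variable (consistent with the stated relation between ${\sigma}_{g}=0.16$ and a full-width at half-max of about $3/8$ of the period); the function name is ours:

```python
import numpy as np

def grid_response(x, lam, phi_pref, sigma_g=0.16):
    """Graded periodic tuning curve: circular Gaussian in the phase variable.

    x        : 1D position(s)
    lam      : module period lambda_m
    phi_pref : the cell's preferred phase in [0, 1)
    sigma_g  : tuning width, in phase units
    """
    phi = np.mod(np.asarray(x, dtype=float) / lam, 1.0)  # module phase
    d = np.abs(phi - phi_pref)
    d = np.minimum(d, 1.0 - d)                           # circular distance
    return np.exp(-d**2 / (2.0 * sigma_g**2))
```

By construction the response is ${\lambda}_{m}$-periodic in $x$, peaks at the preferred phase, and drops to half its maximum at a phase distance of ${\sigma}_{g}\sqrt{2\ln 2}\approx 0.19$.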
Finally, to simulate quasiperiodic grid responses in 1D, we first generate 2D responses with Gaussian tuning on a hexagonal lattice, with the same field width as above. 1D responses of grid cells from the same module are then generated as parallel 1D slices of this lattice as in Yoon et al., 2016, with phases uniformly drawn at random.
Appendix 1
The geometry of the grid code
In this Appendix, we introduce the geometrical framework for the study of place cells modeled as perceptrons reading out the activity of grid cells. First, we define the space of grid-like inputs via symmetry considerations, without explicitly considering their relation to spatial locations. Second, we discuss linearly separable dichotomies in the space of grid-like inputs, whose geometric arrangement is not in general position. Third, we show that the geometry of grid-like inputs is that of a polytope that can be decomposed as an orthogonal product of simplices.
The space of grid-like inputs
We model grid-cell activity via $\{0,1\}$ spatial patterns $\mathit{r}$ that take value 1 whenever the cell is active and value 0 otherwise (Fyhn et al., 2004; Fiete et al., 2008). To model the periodic spatial response of grid cells, we assume that the activity pattern of a grid cell defines a periodic lattice with integer period $\lambda $. For simplicity, we consider a 1D model for which the spatial patterns $\mathit{r}$ are $\lambda $-periodic vectors and for which the set of activity patterns is given by the lattices $i+\lambda \mathbb{Z}$, $1\le i\le \lambda $. We refer to the index $i$ as the phase index of the grid-cell spatial pattern. Our key results generalize to lattices of arbitrary dimension $n$, for which the set of spatial patterns is given by the hypercube lattices $\mathit{i}+{\left(\lambda \mathbb{Z}\right)}^{n}$, with phase indices $\mathit{i}$ in ${\{1,\mathrm{\dots},\lambda \}}^{n}$.
Within a population, grid cells can have distinct periods and arbitrary phases. To model this heterogeneity, we consider a population of grid cells with $M$ possible integer spatial periods $\mathit{\lambda}=({\lambda}_{1},\dots ,{\lambda}_{M})$, thereby defining $M$ modules of grid cells. We assume that each module comprises all possible grid-cell activity patterns, that is, ${\lambda}_{m}$ grid cells labeled by the phase indices $i$, $1\le i\le {\lambda}_{m}$. For convenience, we index each cell by its module index $m$ and its phase index $i$, $1\le i\le {\lambda}_{m}$, so that the actual component index of cell $(m,i)$ is $\sum _{n<m}{\lambda}_{n}+i$. By construction of our model, at every spatial position, each module has a single active cell. Thus, at each spatial position, the grid-like input is specified by a $\{0,1\}$ column vector $\mathit{c}$ of dimension $N={\sum}_{m=1}^{M}{\lambda}_{m}$, the total number of grid cells.
In principle, the inputs to place cells are defined as spatial locations. Here, by contrast, we consider grid-like inputs as the inputs to place cells, without requiring these patterns to be spatial encodings. This approach is mathematically convenient as it allows us to exploit the many symmetries of the set of grid-like inputs, denoted by ${\mathcal{C}}_{\mathit{\lambda}}$. The set ${\mathcal{C}}_{\mathit{\lambda}}$ contains as many grid-like inputs $\mathit{c}$ as there are choices of phase indices in each module, that is, $\mathrm{\Lambda}={\prod}_{m=1}^{M}{\lambda}_{m}$.
Here follow two examples of sets of grid-like inputs ${\mathcal{C}}_{\mathit{\lambda}}$, enumerated in lexicographical order, for $\mathit{\lambda}=(2,3)$ and $\mathit{\lambda}=(2,2,2)$.
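The enumeration can be made concrete with a short sketch (function name ours) that builds the matrix whose columns are all grid-like inputs, in lexicographic phase order:

```python
import itertools
import numpy as np

def grid_inputs(periods):
    """Enumerate the set C_lambda of grid-like inputs: one {0,1} column per
    choice of phase indices, with one active cell per module."""
    N = sum(periods)                        # total number of grid cells
    offsets = np.cumsum([0] + list(periods[:-1]))
    cols = []
    for phases in itertools.product(*(range(lam) for lam in periods)):
        c = np.zeros(N, dtype=int)
        for off, i in zip(offsets, phases):
            c[off + i] = 1                  # active cell (m, i)
        cols.append(c)
    return np.array(cols).T                 # N x Lambda matrix

A = grid_inputs((2, 3))
```

For $\mathit{\lambda}=(2,3)$ this yields a $5\times 6$ matrix with exactly $M=2$ ones per column; for $\mathit{\lambda}=(2,2,2)$, a $6\times 8$ matrix with 3 ones per column.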
Observe that, albeit inspired by the spatial activity of grid cells, the set of patterns ${\mathcal{C}}_{\mathit{\lambda}}$ has broader relevance than suggested by its use for modeling grid-like inputs. In fact, the set of patterns ${\mathcal{C}}_{\mathit{\lambda}}$ describes any modular winner-take-all activity, whereby cells are pooled in modules with only one cell active at a time – the winner of the module.
In the following, we consider that linear readouts of grid-like inputs determine the activity of downstream cells, called place cells (O’Keefe and Dostrovsky, 1971). The set of these linear readouts is the vector space ${V}_{\mathit{\lambda}}$ spanned by the grid-like inputs ${\mathcal{C}}_{\mathit{\lambda}}$. The dimension of the vector space ${V}_{\mathit{\lambda}}$ specifies the dimensionality of the grid code. The following proposition characterizes ${V}_{\mathit{\lambda}}$ and shows that its dimension is simply related to the periods $\mathit{\lambda}$.
Proposition 1
The set of grid-like inputs ${\mathcal{C}}_{\mathit{\lambda}}$ specified by $M$ grid modules with integer periods $\mathit{\lambda}=({\lambda}_{1},\dots ,{\lambda}_{M})$ spans the vector space
$${V}_{\mathit{\lambda}}=\left\{\mathit{y}\in {\mathbb{R}}^{{\lambda}_{1}}\times \dots \times {\mathbb{R}}^{{\lambda}_{M}}\;:\;\sum _{i=1}^{{\lambda}_{m}}{y}_{m,i}=\sum _{i=1}^{{\lambda}_{M}}{y}_{M,i}\,,\ 1\le m<M\right\}.$$
In particular, the embedding dimension of the grid code is $\mathrm{dim}\,{V}_{\mathit{\lambda}}=\sum _{m=1}^{M}{\lambda}_{m}-M+1$.
Proof. Let us denote by ${A}_{\mathit{\lambda}}$ a matrix formed by collecting all the column vectors from ${\mathcal{C}}_{\mathit{\lambda}}$. The vector space ${V}_{\mathit{\lambda}}$ is the range of the matrix ${A}_{\mathit{\lambda}}$, which is also the orthogonal complement of $\mathrm{ker}\,{A}_{\mathit{\lambda}}^{T}$. A vector $\mathit{x}=({x}_{1,1},\dots ,{x}_{1,{\lambda}_{1}},\dots ,{x}_{M,1},\dots ,{x}_{M,{\lambda}_{M}})$ in ${\mathbb{R}}^{{\lambda}_{1}}\times \mathrm{\dots}\times {\mathbb{R}}^{{\lambda}_{M}}$ belongs to $\mathrm{ker}\,{A}_{\mathit{\lambda}}^{T}$ if and only if ${\mathit{x}}^{T}{A}_{\mathit{\lambda}}=0$. By construction of the matrix ${A}_{\mathit{\lambda}}$, this reads
$$\sum _{m=1}^{M}{x}_{m,{i}_{m}}=0\quad \text{for every choice of phase indices}\ ({i}_{1},\dots ,{i}_{M}),$$
where ${i}_{m}$ refers to the index of the active cell in module $m$. The latter characterization implies that $\mathrm{ker}\,{A}_{\mathit{\lambda}}^{T}$ consists of the vectors that are constant within each module, ${x}_{m,i}={a}_{m}$ for all $i$, with $\sum _{m=1}^{M}{a}_{m}=0$.
In turn, a vector $\mathit{y}=({y}_{1,1},\dots ,{y}_{1,{\lambda}_{1}},\dots ,{y}_{M,1},\dots ,{y}_{M,{\lambda}_{M}})$ of the orthogonal complement of $\mathrm{ker}\,{A}_{\mathit{\lambda}}^{T}$, that is, in the range of ${A}_{\mathit{\lambda}}$, is determined by ${\mathit{x}}^{T}\mathit{y}=0$ for all $\mathit{x}$ in $\mathrm{ker}\,{A}_{\mathit{\lambda}}^{T}$. From the above characterization of $\mathrm{ker}\,{A}_{\mathit{\lambda}}^{T}$, this means that $\mathit{y}$ is in the range of ${A}_{\mathit{\lambda}}$, that is, in ${V}_{\mathit{\lambda}}$, if and only if for all ${a}_{1},\mathrm{\dots},{a}_{M}$ such that ${\sum}_{m=1}^{M}{a}_{m}=0$, we have
$$\sum _{m=1}^{M}{a}_{m}\sum _{i=1}^{{\lambda}_{m}}{y}_{m,i}=0.$$
Substituting ${a}_{M}=-{\sum}_{m=1}^{M-1}{a}_{m}$ in the above relation, we have that for all ${a}_{1},\mathrm{\dots},{a}_{M-1}$ in ${\mathbb{R}}^{M-1}$,
$$\sum _{m=1}^{M-1}{a}_{m}\left(\sum _{i=1}^{{\lambda}_{m}}{y}_{m,i}-\sum _{i=1}^{{\lambda}_{M}}{y}_{M,i}\right)=0,$$
which is equivalent to ${\sum}_{i=1}^{{\lambda}_{m}}{y}_{m,i}={\sum}_{i=1}^{{\lambda}_{M}}{y}_{M,i}$ for all $m$, $1\le m<M$. The above relation entirely specifies the range of the activity matrix ${A}_{\mathit{\lambda}}$, that is, ${V}_{\mathit{\lambda}}$, as a vector space of dimension ${\sum}_{m=1}^{M}{\lambda}_{m}-M+1$.
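Proposition 1 is easy to check numerically: build the matrix of all grid-like inputs and compare its rank with $\sum _{m}{\lambda}_{m}-M+1$ (a self-contained sketch; the helper name is ours):

```python
import itertools
import numpy as np

def rank_of_grid_code(periods):
    """Rank of the matrix A_lambda whose columns are all grid-like inputs."""
    N = sum(periods)
    offsets = np.cumsum([0] + list(periods[:-1]))
    cols = []
    for phases in itertools.product(*(range(lam) for lam in periods)):
        c = np.zeros(N)
        for off, i in zip(offsets, phases):
            c[off + i] = 1.0
        cols.append(c)
    return np.linalg.matrix_rank(np.array(cols).T)

# Proposition 1: dim V_lambda = sum(lambda_m) - M + 1.
for periods in [(2, 3), (3, 4), (2, 3, 5)]:
    assert rank_of_grid_code(periods) == sum(periods) - len(periods) + 1
```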
Linear readouts of gridlike inputs
We model the response of a place cell as that of a perceptron that takes grid-like inputs $\mathit{c}$ in ${\mathcal{C}}_{\mathit{\lambda}}$ as inputs (Rosenblatt, 1958). Such a perceptron is parametrized by a decision threshold $\theta $ and by a vector of readout weights $\mathit{w}=({w}_{1,1},\dots ,{w}_{1,{\lambda}_{1}}\,|\,\dots \,|\,{w}_{M,1},\dots ,{w}_{M,{\lambda}_{M}})$, where the vertical separators delineate the grid-cell modules with periods ${\lambda}_{m}$, $1\le m\le M$. By convention, we consider that a place cell is active for grid-like inputs $\mathit{c}$ such that ${\mathit{w}}^{T}\mathit{c}>\theta$ and inactive otherwise. Thus, in the perceptron framework, a place cell has a multifield structure if it is active on a set of several grid-like inputs $\mathcal{S}\subset {\mathcal{C}}_{\mathit{\lambda}}$, with $|\mathcal{S}|>1$ (Rich et al., 2014). Considering grid-like inputs as inputs allows one to restrict the class of perceptrons under consideration.
Proposition 2
Every realizable multifield structure can be implemented by a perceptron with (i) nonnegative weights, or (ii) with zero threshold.
Proof. $(i)$ Let $\mathbf{1}$ denote the $N$-dimensional column vector of ones. Since each of the $M$ modules contributes a single active cell, for all grid-like inputs $\mathit{c}$ in ${\mathcal{C}}_{\mathit{\lambda}}$ we have ${\mathbf{1}}^{T}\mathit{c}=M$. Thus, for every perceptron $(\mathit{w},\theta )$ and every real $\mu$, we have
$${(\mathit{w}+\mu \mathbf{1})}^{T}\mathit{c}={\mathit{w}}^{T}\mathit{c}+\mu M.$$
Consequently, setting $\mu \ge {\mathrm{max}}_{1\le i\le N}|{w}_{i}|$, ${\mathit{w}}^{\mathrm{\prime}}=\mathit{w}+\mu \mathbf{1}$ and ${\theta}^{\prime}=\theta +\mu M$ defines a new perceptron $({\mathit{w}}^{\mathrm{\prime}},{\theta}^{\mathrm{\prime}})$ with nonnegative weights that operates the same classification as the perceptron $(\mathit{w},\theta )$, since ${{\mathit{w}}^{\mathrm{\prime}}}^{T}\mathit{c}>{\theta}^{\mathrm{\prime}}$ is equivalent to ${\mathit{w}}^{T}\mathit{c}>\theta$. $(ii)$ The result directly follows from a similar argument by observing that for all grid-like inputs $\mathit{c}$ in ${\mathcal{C}}_{\mathit{\lambda}}$,
$${\left(\mathit{w}-(\theta /M)\mathbf{1}\right)}^{T}\mathit{c}={\mathit{w}}^{T}\mathit{c}-\theta ,$$
which implies that the perceptrons $(\mathit{w},\theta )$ and $(\mathit{w}-(\theta /M)\mathbf{1},0)$ achieve the same linear classification.
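The two reductions in the proof of Proposition 2 can be checked numerically on a small grid code (a sketch; variable names ours):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

periods = (2, 3)
M, N = len(periods), sum(periods)
offsets = np.cumsum([0] + list(periods[:-1]))

# All grid-like inputs as columns (one active cell per module).
C = []
for phases in itertools.product(*(range(lam) for lam in periods)):
    c = np.zeros(N)
    for off, i in zip(offsets, phases):
        c[off + i] = 1.0
    C.append(c)
C = np.array(C).T

w = rng.normal(size=N)       # arbitrary signed weights
theta = rng.normal()
labels = C.T @ w > theta     # original classification

# (i) Shift to nonnegative weights: w' = w + mu*1, theta' = theta + mu*M.
mu = np.max(np.abs(w))
w1, theta1 = w + mu, theta + mu * M

# (ii) Absorb the threshold: w'' = w - (theta/M)*1 with zero threshold.
w2 = w - theta / M
```

Both transformed perceptrons produce exactly the original labels because every grid-like input satisfies $\mathbf{1}^{T}\mathit{c}=M$.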
Our goal is to study the multifield structure of place-cell perceptrons, which amounts to characterizing the two-class linear classifications of grid-like inputs ${\mathcal{C}}_{\mathit{\lambda}}$. The study of linear binary classifications has a long history in machine learning. Given a collection of $\mathrm{\Lambda}$ input patterns, there are ${2}^{\mathrm{\Lambda}}$ possible assignments of binary labels to these patterns, also referred to as dichotomies. In general, not all dichotomies can be linearly classified by a perceptron. Those dichotomies that can be classified are called linearly separable. An important question is to compute the number of linearly separable dichotomies, which depends on the geometrical arrangement of the inputs presented to the perceptron. Remarkably, Cover’s function counting theorem specifies the exact number of linearly separable dichotomies for $P$ inputs represented as points in an $N$-dimensional space (Cover, 1965). For inputs in general position, the number of dichotomies realizable by a zero-threshold perceptron is given by
$$C(P,N)=2\sum _{k=0}^{N-1}\binom{P-1}{k},$$
which shows that all dichotomies are possible as long as $P\le N$. A collection of points $\{{\mathit{x}}_{1},\dots ,{\mathit{x}}_{P}\}$ in an $N$-dimensional space is in general position if no subset of $n+1$ points lies on an $(n-1)$-dimensional plane for all $n\le N$. In our modeling framework, the inputs are collections of points representing grid-like inputs ${\mathcal{C}}_{\mathit{\lambda}}$. As opposed to Cover’s theorem assumptions, these grid-like inputs are not in general position as soon as we consider a grid code with more than one module. For instance, it is not hard to see that for $\mathit{\lambda}=(2,3)$, the patterns $(1,0\,|\,1,0,0)$, $(1,0\,|\,0,1,0)$, $(0,1\,|\,1,0,0)$ and $(0,1\,|\,0,1,0)$ are not in general position, being the vertices of a square and therefore lying in a 2D plane. Non-generic arrangements of grid-like inputs are due to symmetries that are inherent to the modular structure of the grid code. We expect such symmetries to heavily restrict the set of linearly separable dichotomies, therefore constraining the multifield structure of a place-cell perceptron.
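Cover's count is straightforward to implement (a sketch; the function name is ours):

```python
from math import comb

def cover_count(P, N):
    """Cover's function-counting theorem: number of linearly separable
    dichotomies of P points in general position in N dimensions
    (zero-threshold perceptron): C(P, N) = 2 * sum_{k<N} binom(P-1, k)."""
    return 2 * sum(comb(P - 1, k) for k in range(min(N, P)))
```

For $P\le N$ the sum saturates at $2^{P-1}$, so $C(P,N)={2}^{P}$ and every dichotomy is realizable; for $P>N$ the count falls short of ${2}^{P}$.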
We justify the above expectation by discussing the problem of linear separability for two codes that are related to the grid code. These two codes are the ‘one-hot’ code, whereby a single cell is active in each input pattern, and the ‘binary’ code, whereby the set of input patterns enumerates all possible binary vectors of activity. Exemplars of input patterns for the one-hot code and the binary code are given for $N=3$ input cells by
$${C}_{\mathrm{oh}}=\begin{pmatrix}1&0&0\\ 0&1&0\\ 0&0&1\end{pmatrix},\qquad {C}_{\mathrm{b}}=\begin{pmatrix}0&0&0&0&1&1&1&1\\ 0&0&1&1&0&0&1&1\\ 0&1&0&1&0&1&0&1\end{pmatrix}.$$
From a geometrical point of view, a set of points representing grid-like inputs $\mathcal{S}\subset {\mathcal{C}}_{\mathit{\lambda}}$ is linearly separable if there is a hyperplane separating the points of $\mathcal{S}$ from the other points ${\mathcal{C}}_{\mathit{\lambda}}\setminus \mathcal{S}$. The existence of a hyperplane separating a single point from all other points is straightforward when the set of patterns corresponds to the vertices of a convex polytope. Then, every vertex can be linearly separated from the other points, being an extreme point. It turns out that the population patterns of both the one-hot code and the binary code represent the vertices of a convex polytope: a simplex for the one-hot code and a hypercube for the binary code. However, because these vertices are in general position for the one-hot code but not for the binary code, the fraction of linearly separable dichotomies drastically differs between the two codes.
Let us first consider the $N$ points whose coordinates are given by ${C}_{\mathrm{oh}}$. The convex hull of ${C}_{\mathrm{oh}}$ is the canonical $(N-1)$-dimensional simplex. Thus, any set of $k$ vertices, $1\le k\le N$, specifies a $(k-1)$-dimensional face of the simplex, and as such, is a linearly separable $k$-dichotomy. This immediately shows that all dichotomies are linearly separable, a result that follows from the fact that the $N$ points in ${C}_{\mathrm{oh}}$ are in general position. Let us then consider the ${2}^{N}$ points whose coordinates are given by ${C}_{\mathrm{b}}$. The convex hull of ${C}_{\mathrm{b}}$ is the canonical $N$-dimensional hypercube. Thus, by contrast with ${C}_{\mathrm{oh}}$, the points in ${C}_{\mathrm{b}}$ are not in general position. As a result, there are dichotomies that are not linearly separable: for instance, neither the pair $\{(1,0,0),(0,1,0)\}$ nor the pair $\{(0,0,0),(1,1,1)\}$ can be linearly separated from the other points of the hypercube. Determining the number of linearly separable sets of hypercube vertices is a hard combinatorial problem that has attracted a lot of interest (Peled and Simeone, 1985; Hegedüs and Megiddo, 1996). Unfortunately, there is no efficient characterization of that number as a function of the dimension $N$. However, it is known that out of the ${2}^{{2}^{N}}$ possible dichotomies, the total number of linearly separable dichotomies scales as ${2}^{{N}^{2}}$ in the limit of large dimension $N\to \mathrm{\infty}$ (Irmatov, 1993). This shows that only a vanishingly small fraction of hypercube dichotomies are also linearly separable.
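For $N=3$, the separable hypercube dichotomies can be enumerated by brute force. The sketch below scans a grid of integer weights and half-integer thresholds, which is ample for three inputs (threshold functions of three variables are known to need only small integer weights); 104 of the $2^8=256$ dichotomies of the 3-cube turn out to be linearly separable:

```python
import itertools

# Vertices of the 3-dimensional hypercube (the binary code C_b for N = 3).
cube = list(itertools.product((0, 1), repeat=3))

# Enumerate the label patterns [w.c > theta] over the weight/threshold grid.
separable = set()
for w in itertools.product(range(-4, 5), repeat=3):
    for two_theta in range(-25, 27, 2):   # theta = -12.5, -11.5, ..., 12.5
        theta = two_theta / 2.0
        labels = tuple(sum(wi * ci for wi, ci in zip(w, c)) > theta
                       for c in cube)
        separable.add(labels)

n_separable = len(separable)

# Neither {(1,0,0),(0,1,0)} nor {(0,0,0),(1,1,1)} can be separated out,
# whereas any single vertex can (it is an extreme point of the cube).
xor_like = tuple(c in {(1, 0, 0), (0, 1, 0)} for c in cube)
diagonal = tuple(c in {(0, 0, 0), (1, 1, 1)} for c in cube)
one_vertex = tuple(c == (1, 0, 0) for c in cube)
```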
Grid code convex polytope
It is beneficial to gain geometric intuition about grid-like inputs in order to characterize their linearly separable dichotomies. As binary vectors of length $N$, grid-like inputs form a subset of the ${2}^{N}$ vertices of the $N$-dimensional hypercube. Just as for the one-hot and binary codes, linear separability of sets of grid-like inputs can be seen as a geometric problem about polytopes. To clarify this point, let us denote by ${H}_{\mathit{\lambda}}$ the convex hull of the grid-like inputs ${\mathcal{C}}_{\mathit{\lambda}}$. By definition, we have
$${H}_{\mathit{\lambda}}=\left\{\sum _{i=1}^{\mathrm{\Lambda}}{\alpha}_{i}{\mathit{c}}_{i}\;:\;{\alpha}_{i}\ge 0\,,\ \sum _{i=1}^{\mathrm{\Lambda}}{\alpha}_{i}=1\right\},$$
where ${\mathit{c}}_{i}$ in ${\mathcal{C}}_{\mathit{\lambda}}$ denotes the $i$th column of ${A}_{\mathit{\lambda}}$. The convex hull ${H}_{\mathit{\lambda}}$ turns out to have a simple geometric structure.
Proposition 3
For integer periods $\mathit{\lambda}=({\lambda}_{1},\dots ,{\lambda}_{M})$, the convex hull generated by ${\mathcal{C}}_{\mathit{\lambda}}$, the set of grid-cell population patterns, determines a $d$-dimensional polytope ${H}_{\mathit{\lambda}}$, with $d=\sum _{m=1}^{M}{\lambda}_{m}-M$, defined as ${H}_{\mathit{\lambda}}={\mathrm{\Delta}}^{{\lambda}_{1}}\times \dots \times {\mathrm{\Delta}}^{{\lambda}_{M}}$, where ${\mathrm{\Delta}}^{{\lambda}_{m}}$, $1\le m\le M$, denotes the $({\lambda}_{m}-1)$-simplex specified by the ${\lambda}_{m}$ points $(1,0,\dots ,0),(0,1,0,\dots ,0),\dots ,(0,\dots ,0,1)$.
Before proving the product decomposition of ${H}_{\mathit{\lambda}}$, let us make a couple of observations. First, observe that all the vectors $\mathit{c}$ in ${\mathcal{C}}_{\mathit{\lambda}}$ satisfy ${\mathbf{1}}^{T}\mathit{c}=M$, so that all edges $\mathit{c}{\mathit{c}}^{\mathrm{\prime}}$, with $\mathit{c}$, ${\mathit{c}}^{\mathrm{\prime}}$ in ${\mathcal{C}}_{\mathit{\lambda}}$, lie in the same hyperplane of the vector space ${V}_{\mathit{\lambda}}$. By Proposition 1, ${V}_{\mathit{\lambda}}$ has dimension ${\sum}_{m}{\lambda}_{m}-M+1$; this implies that the dimension of the polytope ${H}_{\mathit{\lambda}}$ is at most $d={\sum}_{m}{\lambda}_{m}-M$. Second, observe that the set ${\mathcal{C}}_{\mathit{\lambda}}$ is left unchanged by the symmetry operators ${J}_{{\lambda}_{m}}$, $1\le m\le M$, where ${J}_{{\lambda}_{m}}$ cyclically shifts downward the $m$th module coordinates of the vectors in ${\mathcal{C}}_{\mathit{\lambda}}$. The operators ${J}_{{\lambda}_{m}}$ admit the block-diagonal matrix representation
$${J}_{{\lambda}_{m}}=\mathrm{diag}\left({I}_{{\lambda}_{1}},\dots ,{I}_{{\lambda}_{m-1}},{P}_{{\lambda}_{m}},{I}_{{\lambda}_{m+1}},\dots ,{I}_{{\lambda}_{M}}\right),\quad \text{with}\ {P}_{{\lambda}_{m}}\ \text{the cyclic permutation matrix of size}\ {\lambda}_{m},$$
where ${I}_{n}$ denotes the identity matrix in ${\mathbb{R}}^{n\times n}$. Notice that the matrices ${J}_{{\lambda}_{m}}$ satisfy ${J}_{{\lambda}_{m}}^{T}{J}_{{\lambda}_{m}}={I}_{{\lambda}_{1}+\mathrm{\dots}+{\lambda}_{M}}$, showing that the operators ${J}_{{\lambda}_{m}}$ are isometries of ${V}_{\mathit{\lambda}}$. Moreover, observe that for all $\mathit{c}$, ${\mathit{c}}^{\mathrm{\prime}}$ in ${\mathcal{C}}_{\mathit{\lambda}}$, there are integers ${k}_{1},\mathrm{\dots},{k}_{M}$ such that ${J}_{{\lambda}_{1}}^{{k}_{1}}\dots {J}_{{\lambda}_{M}}^{{k}_{M}}\mathit{c}={\mathit{c}}^{\mathrm{\prime}}$. This shows that each vector in ${\mathcal{C}}_{\mathit{\lambda}}$ plays the same role in defining the geometry of ${H}_{\mathit{\lambda}}$, and thus ${H}_{\mathit{\lambda}}$ is vertex-transitive. In particular, every vector in ${\mathcal{C}}_{\mathit{\lambda}}$ represents an extreme point of the convex hull ${H}_{\mathit{\lambda}}$. As a result, ${H}_{\mathit{\lambda}}$ is a polytope with as many vertices as the cardinality of ${\mathcal{C}}_{\mathit{\lambda}}$, that is, $\mathrm{\Lambda}={\prod}_{m=1}^{M}{\lambda}_{m}$. The product decomposition of the polytope ${H}_{\mathit{\lambda}}$ then follows from a simple recurrence argument over the number of modules $M$.
Proof. In order to relate the geometrical structure of ${H}_{\mathit{\lambda}}$ to that of simplices, let us introduce ${\mathit{e}}_{i}$, $1\le i\le {\lambda}_{M}$, the elementary unit vector corresponding to the $i$th coordinate of ${\mathbb{R}}^{{\lambda}_{M}}$. The set ${\mathcal{C}}_{\mathit{\lambda}}$ has the following product structure:
$${\mathcal{C}}_{\mathit{\lambda}}={\mathcal{C}}_{{\mathit{\lambda}}^{\mathrm{\prime}}}\times \{{\mathit{e}}_{1},\dots ,{\mathit{e}}_{{\lambda}_{M}}\},$$
where ${\mathcal{C}}_{{\mathit{\lambda}}^{\mathrm{\prime}}}$ is the set of vectors for $M-1$ modules with periods ${\mathit{\lambda}}^{\mathrm{\prime}}=({\lambda}_{1},\dots ,{\lambda}_{M-1})$. The product structure of the set ${\mathcal{C}}_{\mathit{\lambda}}$ transfers to the convex hull ${H}_{\mathit{\lambda}}$ it generates. Specifically, we have
$${H}_{\mathit{\lambda}}=\mathrm{conv}\left({\mathcal{C}}_{{\mathit{\lambda}}^{\mathrm{\prime}}}\right)\times \mathrm{conv}\left(\{{\mathit{e}}_{1},\dots ,{\mathit{e}}_{{\lambda}_{M}}\}\right),$$
where we have recognized that the convex hull of the set of elementary basis vectors ${\mathit{e}}_{i}$, $1\le i\le {\lambda}_{M}$, is precisely the canonical $({\lambda}_{M}-1)$-simplex. Thus, we have shown that ${H}_{\mathit{\lambda}}={H}_{{\mathit{\lambda}}^{\mathrm{\prime}}}\times {\mathrm{\Delta}}^{{\lambda}_{M}}$. Proceeding by recurrence on the number of modules, one obtains the announced decomposition of the convex hull as a product ${H}_{\mathit{\lambda}}={\mathrm{\Delta}}^{{\lambda}_{1}}\times \mathrm{\dots}\times {\mathrm{\Delta}}^{{\lambda}_{M}}$, where ${\mathrm{\Delta}}^{{\lambda}_{m}}$, $1\le m\le M$, is the canonical $({\lambda}_{m}-1)$-simplex.
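These structural facts are easy to verify numerically for a small code (a sketch; names ours): every module block of a grid-like input is a vertex of its simplex, the number of vertices equals $\mathrm{\Lambda}={\prod}_{m}{\lambda}_{m}$, and the affine dimension of the polytope is $d={\sum}_{m}{\lambda}_{m}-M$.

```python
import itertools
import numpy as np

periods = (2, 3, 4)
M = len(periods)
offsets = np.cumsum([0] + list(periods[:-1]))

# All grid-like inputs (vertices of H_lambda), as rows.
vertices = []
for phases in itertools.product(*(range(lam) for lam in periods)):
    c = np.zeros(sum(periods))
    for off, i in zip(offsets, phases):
        c[off + i] = 1.0
    vertices.append(c)
V = np.array(vertices)

# Each module block is a vertex of the (lambda_m - 1)-simplex:
# nonnegative entries summing to one.
for off, lam in zip(offsets, periods):
    assert np.all(V[:, off:off + lam].sum(axis=1) == 1.0)

# Affine dimension of the polytope: rank of the differences to one vertex.
d = np.linalg.matrix_rank(V - V[0])
```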
The above orthogonal decomposition suggests that the problem of determining the linearly separable dichotomies of grid-like inputs is related to that of determining the linearly separable Boolean functions. Indeed, the polytope defined by grid-like inputs with $M$ modules contains $M$-dimensional hypercubes, for which many dichotomies are not linearly separable. As counting the linearly separable Boolean functions is a notoriously hard combinatorial problem, it is unlikely that one can find a general characterization of the linearly separable dichotomies of grid-like inputs. However, it is possible to give some explicit results for the case of two modules ($M=2$) or for the case of $k$-dichotomies of small cardinality $k$.
Appendix 2
Combinatorics of linearly separable dichotomies
In this Appendix, we establish combinatorial results about the properties and the cardinality of linearly separable dichotomies of grid-like inputs. First, we show that linearly separable dichotomies can be partitioned into classes, each indexed by a combinatorial object called a Young diagram. Second, we exploit related combinatorial objects, called Young tableaux, to show that not all Young diagrams correspond to linearly separable dichotomies. Third, we utilize Young diagrams to characterize dichotomies for which one class of labeled patterns has small cardinality $k=1,\dots ,4$. Fourth, we count the exact number of linearly separable dichotomies for grid-like inputs with two modules.
Relation to Young diagrams
To count linearly separable dichotomies, we first show that these dichotomies can be partitioned into classes that are indexed by Young diagrams. Young diagrams are useful combinatorial objects that have been used to study, for example, the properties of the group representations of the symmetric group and of the general linear group. Young diagrams are formally defined as follows:
Definition 1
A $d$-dimensional Young diagram is a subset $D$ of lattice points in the positive orthant of a $d$-dimensional integral lattice, which satisfies the following:
If $({n}_{1},\dots ,{n}_{i},\dots ,{n}_{d})\in D$ and ${n}_{i}>0$, then $({n}_{1},\dots ,{n}_{i}-1,\dots ,{n}_{d})\in D$.
For any positive integer $i\le d$, and any nonnegative integers $m$, $p$ with $m>p$, the restriction of $D$ to the hyperplane ${n}_{i}=m$ is a $(d-1)$-dimensional Young diagram that is covered by the $(d-1)$-dimensional Young diagram formed by the restriction of $D$ to the hyperplane ${n}_{i}=p$.
Moreover, the size of the diagram $D$, denoted by $|D|$, is defined as the number of lattice points in $D$.
Young diagrams have been primarily studied for $d=2$ because their use allows one to conveniently enumerate the partitions of the integers. For $d=2$, there are different conventions for representing Young diagrams pictorially. Hereafter, we follow the French notation, where Young diagrams are left-justified lattice rows whose lengths decrease with height. For the sake of clarity, Fig. 1a depicts the 5 Young diagrams associated with the partitions of 4: $4$, $3+1$, $2+2$, $2+1+1$, and $1+1+1+1$. Young diagrams have been less studied for dimensions $d\ge 3$, and only a few of their combinatorial properties are known. Fig. 1b represents a 3-dimensional diagram, together with two 2-dimensional restrictions (red edges for ${n}_{3}=1$ and yellow edges for ${n}_{3}=3$). Observe that these restrictions are 2-dimensional Young diagrams, and that the restriction corresponding to ${n}_{3}=1$ covers the restriction corresponding to ${n}_{3}=3$. Young diagrams can equivalently be viewed as arrays of boxes rather than lattice points in the positive orthant. This corresponds to identifying each lattice point $({n}_{1},\dots ,{n}_{d})\in D$ with the unit cube $({n}_{1}-1,{n}_{1})\times \dots \times ({n}_{d}-1,{n}_{d})$.
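The defining downward-closure property is simple to check computationally, and for $d=2$ it recovers the count of integer partitions (a sketch; function names ours, with coordinates starting at 1):

```python
import itertools

def is_young_diagram(points):
    """Check that a finite set of lattice points (coordinates >= 1) is
    downward closed: decrementing any coordinate above 1 stays inside."""
    pts = set(points)
    for p in pts:
        for i, n in enumerate(p):
            if n > 1:
                if p[:i] + (n - 1,) + p[i + 1:] not in pts:
                    return False
    return True

def diagrams_of_size(n, d=2, bound=4):
    """Enumerate d-dimensional Young diagrams with n lattice points."""
    cells = list(itertools.product(range(1, bound + 1), repeat=d))
    found = set()
    for subset in itertools.combinations(cells, n):
        if is_young_diagram(subset):
            found.add(frozenset(subset))
    return found
```

As expected, `diagrams_of_size(4)` returns 5 diagrams, one per partition of 4.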
Before motivating the use of Young diagrams, let us make a few remarks about the set of dichotomies that can be realized by a perceptron with fixed parameters $(\mathit{w},\theta )$. First, recall that with no loss of generality we can restrict the weight vector $\mathit{w}$ to be nonnegative, by Proposition 2. Second, by permutation invariance, there is no loss of generality in considering a perceptron $(\mathit{w},\theta )$ whose weights are ordered within each module: ${w}_{m,1}<\dots <{w}_{m,{\lambda}_{m}}$ for all $m$, $1\le m\le M$. We refer to weight vectors having this module-specific, increasing-order property as modularly ordered weight vectors. Bearing these observations in mind, the following proposition establishes the link between Young diagrams and perceptrons.
Proposition 4
Given integer periods $\mathit{\lambda}=({\lambda}_{1},\dots ,{\lambda}_{M})$, for all modularly ordered, nonnegative weight vectors $\mathit{w}$ and for all thresholds $\theta $, the lattice set
$$\mathcal{D}(\mathit{w},\theta )=\left\{({i}_{1},\dots ,{i}_{M})\;:\;\sum _{m=1}^{M}{w}_{m,{i}_{m}}\le \theta \right\}$$
is an $M$-dimensional Young diagram in $\{1,\dots ,{\lambda}_{1}\}\times \dots \times \{1,\dots ,{\lambda}_{M}\}$.
In other words, under assumption of modularly ordered, nonnegative weights, the phase indices of inactive grid cells form a Young diagram.
Proof. The Young diagram properties directly follow from the ordering of weights within modules. For instance, it is easy to see that if $({i}_{1},\mathrm{\dots},{i}_{M})\le ({j}_{1},\mathrm{\dots},{j}_{M})$ for the componentwise partial order in $\{1,\mathrm{\dots},{\lambda}_{1}\}\times \mathrm{\dots}\times \{1,\mathrm{\dots},{\lambda}_{M}\}$, then $({j}_{1},\dots ,{j}_{M})\in \mathcal{D}(\mathit{w},\theta )$ implies $({i}_{1},\dots ,{i}_{M})\in \mathcal{D}(\mathit{w},\theta )$. Indeed, we necessarily have
$$\sum _{m=1}^{M}{w}_{m,{i}_{m}}\le \sum _{m=1}^{M}{w}_{m,{j}_{m}}\le \theta .$$
By the above proposition, given a grid code with $M$ modules, every perceptron $(\mathit{w},\theta)$ acting on that grid code can be associated with a unique $M$-dimensional Young diagram $\mathcal{D}(\mathit{w},\theta)$ after ordering the components of $\mathit{w}$ within each module. Conversely, if an $M$-dimensional Young diagram $\mathcal{D}'$ can be associated with a perceptron $(\mathit{w},\theta)$ with modularly ordered, nonnegative weights, we say that $\mathcal{D}'=\mathcal{D}(\mathit{w},\theta)$ is realizable. A natural question to ask is then: are all $M$-dimensional Young diagrams realizable by perceptrons? It turns out that perceptrons exhaustively enumerate all $M$-dimensional Young diagrams if $M\le 2$, but there are unrealizable Young diagrams as soon as $M>2$.
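Proposition 4 can be checked numerically. In the sketch below (the periods, seed, and threshold are arbitrary choices of ours, not values from the paper), we draw modularly ordered, nonnegative weights and verify that the inactive phase indices form a downward-closed lattice set, i.e. a Young diagram:

```python
# Numerical check of Proposition 4 (sketch; periods, seed, and threshold are
# arbitrary choices): with modularly ordered, non-negative weights, the phase
# indices of the inactive grid-like inputs form a downward-closed lattice set.
import itertools, random

random.seed(0)
periods = (3, 4, 5)                       # example module periods
weights = [sorted(random.random() for _ in range(lam)) for lam in periods]
theta = 1.2                               # example threshold

# a pattern activates one phase per module; it is inactive when the summed
# weights of its active phases do not exceed the threshold
D = {idx for idx in itertools.product(*(range(lam) for lam in periods))
     if sum(w[i] for w, i in zip(weights, idx)) <= theta}

def is_young_diagram(D, periods):
    """Check downward-closure: decreasing any coordinate stays inside D."""
    for idx in D:
        for m in range(len(periods)):
            if idx[m] > 0 and idx[:m] + (idx[m] - 1,) + idx[m + 1:] not in D:
                return False
    return True

assert is_young_diagram(D, periods)
print(len(D), "inactive patterns form a Young diagram")
```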
Relation to Young tableaux
Understanding why there are unrealizable Young diagrams as soon as $M>2$ involves using combinatorial objects that are closely related to Young diagrams, called Young tableaux.
Definition 2
Given a Young diagram $\mathcal{D}$, a Young tableau $\mathcal{T}$ is obtained by labeling the lattice points – or filling in the boxes – of $\mathcal{D}$ with the integers $1,2,\dots,|\mathcal{D}|$, such that each number occurs exactly once and such that the entries increase across each row (to the right) and across each column (to the top).
Here are two examples of Young tableaux that are distinct labelings of the same Young diagram:
Just as Young diagrams, Young tableaux are naturally associated with perceptrons. The following arguments specify the correspondence between perceptrons and Young tableaux. Given a perceptron $(\mathit{w},\theta)$ with modularly ordered, nonnegative weights, let us order all patterns in $\mathcal{C}_{\mathit{\lambda}}$ by increasing level of perceptron activity. Specifically, set $\mathcal{J}_0=\mathcal{C}_{\mathit{\lambda}}$ and define iteratively, for $k$, $0\le k<\Lambda$,
$$\mathit{c}^{\star}_{k+1}(\mathit{w})=\underset{\mathit{c}\in\mathcal{J}_k(\mathit{w})}{\arg\min}\,\mathit{w}^T\mathit{c}\,,\qquad \mathcal{J}_{k+1}(\mathit{w})=\mathcal{J}_k(\mathit{w})\setminus\{\mathit{c}^{\star}_{k+1}(\mathit{w})\}\,.$$
With no loss of generality, we can assume that all patterns achieve distinct levels of activity, so that there is a unique minimizer for all $k$, $0\le k<\Lambda$. With that assumption, the sequence $\mathit{c}^{\star}_k(\mathit{w})$, $1\le k\le\Lambda$, enumerates unambiguously all patterns in $\mathcal{C}_{\mathit{\lambda}}$ by increasing level of activity. The Young tableau associated with the perceptron $(\mathit{w},\theta)$, denoted by $\mathcal{T}(\mathit{w},\theta)$, is then obtained by labeling lattice points of the Young diagram $\mathcal{D}(\mathit{w},\theta)$ by increasing level of activity as in the sequence $\mathit{c}^{\star}_k(\mathit{w})$, $1\le k\le|\mathcal{D}(\mathit{w},\theta)|$. One can check that such a labeling yields a tableau, as the resulting labels increase along each row (to the right) and each column (to the top). Within this framework, we say that a Young tableau $\mathcal{T}'$ is realizable if there is a perceptron $(\mathit{w},\theta)$ such that $\mathcal{T}'=\mathcal{T}(\mathit{w},\theta)$. Finally, let us define the sequence of thresholds $\theta_k(\mathit{w})$, $0\le k\le\Lambda+1$, such that $\theta_0=-\infty$, $\theta_{\Lambda+1}(\mathit{w})=+\infty$, and for $0<k\le\Lambda$,
$$\theta_k(\mathit{w})=\mathit{w}^T\mathit{c}^{\star}_k(\mathit{w})\,.$$
Then, observe that for all $k$, $0\le k\le \mathrm{\Lambda}$, the set of active patterns ${\mathcal{J}}_{k}(\mathit{w})$ is linearly separable for threshold $\theta $ satisfying ${\theta}_{k}(\mathit{w})\le \theta <{\theta}_{k+1}(\mathit{w})$. In fact, the sequence $\{{\mathcal{J}}_{k}(\mathit{w}){\}}_{0\le k\le \mathrm{\Lambda}}$ represents all the linearly separable dichotomies realizable by changing the threshold of a perceptron with weight vector $\mathit{w}$. This fact will be useful to prove the following proposition, which justifies considering Young tableaux.
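The construction above, ranking the lattice points by perceptron activity, can be illustrated with a small sketch (the periods and seed are arbitrary choices of ours). With sorted weights in each module, the activity ranks automatically satisfy the tableau property of Definition 2:

```python
# Sketch of the activity-ordering construction for M = 2: ranking the lattice
# points by w^T c with modularly ordered weights fills the lattice as a Young
# tableau (labels increase to the right and upward). Periods/seed arbitrary.
import itertools, random

random.seed(1)
l1, l2 = 3, 4                                    # example periods
w1 = sorted(random.random() for _ in range(l1))  # modularly ordered weights
w2 = sorted(random.random() for _ in range(l2))

pts = list(itertools.product(range(l1), range(l2)))
activity = {p: w1[p[0]] + w2[p[1]] for p in pts}
order = sorted(pts, key=activity.get)            # c*_1, ..., c*_Lambda
rank = {p: k + 1 for k, p in enumerate(order)}   # tableau labels 1..Lambda

# tableau property: labels increase to the right and upward
for i, j in pts:
    assert i + 1 == l1 or rank[(i, j)] < rank[(i + 1, j)]
    assert j + 1 == l2 or rank[(i, j)] < rank[(i, j + 1)]
print("activity ranks fill the lattice as a Young tableau")
```

Sweeping the threshold through the sorted activities then peels off the nested active sets $\mathcal{J}_k$ one pattern at a time.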
Proposition 5
All $M$-dimensional Young diagrams are realizable if and only if all $(M-1)$-dimensional Young tableaux are realizable.
Observe that the above proposition does not mention the periods $\lambda_1,\dots,\lambda_M$. This is because the proposition deals with the correspondence between $M$-dimensional Young diagrams and $(M-1)$-dimensional Young tableaux for all possible assignments of periods.
Proof. In this proof, we use prime notations for quantities relating to $M-1$ modules and regular notations for quantities relating to $M$ modules. For instance, $\mathit{\lambda}$ denotes an arbitrary assignment of $M$ periods $\{\lambda_1,\dots,\lambda_M\}$ and $\mathit{\lambda}'$ denotes its $M-1$ first components $\{\lambda_1,\dots,\lambda_{M-1}\}$. With this preamble, we give the ‘if’ part of the proof in $(i)$ and the ‘only if’ part in $(ii)$.
(i) Given a $(M-1)$-dimensional Young tableau $\mathcal{T}'$ with diagram $\mathcal{D}'$, let us consider the smallest periods $\mathit{\lambda}'$ such that $\mathcal{D}'\subset\{1,\dots,\lambda_1\}\times\dots\times\{1,\dots,\lambda_{M-1}\}$. The ‘if’ part of the proof will follow from showing that if all $(M-1)$-dimensional tableaux $\mathcal{T}'$ with Young diagram $\mathcal{D}'$ are realizable, then all $M$-dimensional Young diagrams whose restriction to $\{1,\dots,\lambda_1\}\times\dots\times\{1,\dots,\lambda_{M-1}\}\times\{1\}$ is $\mathcal{D}'$ are realizable. To prove this property, observe that all the $M$-dimensional Young diagrams with restriction $\mathcal{D}'$ are obtained as finite sequences of $(M-1)$-dimensional Young diagrams $\mathcal{D}'=\mathcal{D}'_1\supset\mathcal{D}'_2\supset\dots\supset\mathcal{D}'_{\lambda_M}$, for some $\lambda_M$ specifying the minimum period in the $M$th dimension. For all such sequences, consider a tableau $\mathcal{T}'$ labeling $\mathcal{D}'$ such that for all $i$, $1\le i\le\lambda_M-1$, the labels of $\mathcal{D}'_{i+1}$ are smaller than the labels of $\mathcal{D}'_i\setminus\mathcal{D}'_{i+1}$. Such a tableau always exists because of the nested property of the sequence of diagrams $\mathcal{D}'_i$, $1\le i\le\lambda_M$. Now, suppose that the Young tableau $\mathcal{T}'$ is realizable. This means that there is a perceptron $(\mathit{w}',\theta')$ acting on the grid-like inputs in $\mathcal{C}_{\mathit{\lambda}'}$ such that $\mathcal{T}'=\mathcal{T}(\mathit{w}',\theta')$.
With no loss of generality, the weight vector $\mathit{w}'$ specifies a sequence of patterns $\mathit{c}^{\star}_k(\mathit{w}')$, $1\le k\le\Lambda'$, and a sequence of thresholds $\theta_k(\mathit{w}')$, $1\le k\le\Lambda'$, such that $(1)$ the former enumerates the elements of $\mathcal{C}_{\mathit{\lambda}'}$ by increasing level of activity and $(2)$ for all $0\le k\le|\mathcal{D}'|$, the set of active patterns $\mathcal{J}_k(\mathit{w}')$ defined in (29) is linearly separable if and only if $\theta_k(\mathit{w}')\le\theta<\theta_{k+1}(\mathit{w}')$. Then by construction, the diagrams $\mathcal{D}'_i$, $1\le i\le\lambda_M$, are realized by perceptrons $(\mathit{w}',\theta'_i)$, where every $\theta'_i\ge\theta'$ is such that $\theta_{\Lambda'-|\mathcal{D}'_i|}(\mathit{w}')<\theta'_i<\theta_{\Lambda'-|\mathcal{D}'_i|+1}(\mathit{w}')$. We are now in a position to construct an $M$-module perceptron $(\mathit{w},\theta')$ realizing the sequence $\mathcal{D}'=\mathcal{D}'_1\supset\mathcal{D}'_2\supset\dots\supset\mathcal{D}'_{\lambda_M}$. To do so, it is enough to specify the components $w_{M,1},\dots,w_{M,\lambda_M}$ of the $M$th module of a weight vector $\mathit{w}$, since the other components will coincide with $\mathit{w}'$. One can check that choosing $w_{M,i}=\theta'_i-\theta'$ defines an admissible increasing sequence of nonnegative weights.
(ii) For the ‘only if’ part, let us consider an arbitrary $(M-1)$-dimensional Young tableau $\mathcal{T}'$, with diagram $\mathcal{D}'$ such that $|\mathcal{D}'|=p$. Then let us consider the $M$-dimensional Young diagram $\mathcal{D}$ obtained via the sequence of $(M-1)$-dimensional diagrams $\mathcal{D}'=\mathcal{D}'_1\supset\mathcal{D}'_2\supset\dots\supset\mathcal{D}'_p$, where for all $q$, $1\le q<p$, $\mathcal{D}'_q\setminus\mathcal{D}'_{q+1}$ is a singleton containing the lattice point labeled by $p-q+1$. Moreover, let us consider the smallest periods $\mathit{\lambda}$ such that $\mathcal{D}\subset\{1,\dots,\lambda_1\}\times\dots\times\{1,\dots,\lambda_M\}$. Now, suppose that all $M$-dimensional Young diagrams are realizable. Then, there is a perceptron $(\mathit{w},\theta)$ acting on $\mathcal{C}_{\mathit{\lambda}}$ with modularly ordered, nonnegative weights such that $\mathcal{D}=\mathcal{D}(\mathit{w},\theta)$. This means that for all $q$, $1\le q\le p$, the diagram $\mathcal{D}'_q$ is realized by the perceptron $(\mathit{w}',\theta-w_{M,q})$, where $\mathit{w}'$ collects the components of $\mathit{w}$ that correspond to the $M-1$ first modules. Then, let us consider the pattern $\mathit{c}_q$ represented by the lattice point in the singleton $\mathcal{D}'_q\setminus\mathcal{D}'_{q+1}$. Remember that a pattern $\mathit{c}$ is identified with the lattice point $(i_1,\dots,i_M)$ whose coordinates are given by the phase of the active neuron within each module.
Then, by the increasing property of the weights, we necessarily have $\theta-w_{M,q+1}\le\mathit{w}'^{T}\mathit{c}_q<\theta-w_{M,q}$, which implies that the Young tableau $\mathcal{T}'$ is realized by the perceptron $(\mathit{w}',\theta-w_{M,1})$.
It is straightforward to check that all 1D Young tableaux are realizable, so that all 2D Young diagrams are realizable. However, the following counterexample shows that not all 2D Young tableaux are realizable, so that $M$-dimensional Young diagrams with $M>2$ are not all realizable.
Counterexample 1. The 2D Young tableau defined as
is not realizable.
Proof. Suppose there is a perceptron with modularly ordered, nonnegative weight vector $\mathit{w}=(w_{1,1},w_{1,2},w_{1,3},w_{2,1},w_{2,2},w_{2,3})$ realizing $\mathcal{T}$. By convention, we consider that the first module corresponds to the horizontal axis and the second module corresponds to the vertical axis. The labeling of $\mathcal{T}$ implies order relations among readout activities via $\mathit{w}$. Specifically, the activities can be listed in increasing order as $w_{1,1}+w_{2,1}<w_{1,2}+w_{2,1}<w_{1,1}+w_{2,2}<w_{1,1}+w_{2,3}<\dots$. We are going to show by contradiction that such an order is impossible. To do so, let us introduce the weight differences $u_1=w_{1,2}-w_{1,1}$, $u_2=w_{1,3}-w_{1,2}$ associated with the first module and the weight differences $v_1=w_{2,2}-w_{2,1}$, $v_2=w_{2,3}-w_{2,2}$ associated with the second module. These differences satisfy incompatible order relations. Specifically: $(1)$ the sequence $2\to 3$ in $\mathcal{T}$ implies that the cost to go right, that is, $u_1=w_{1,2}-w_{1,1}$, is less than the cost to go up, that is, $v_1=w_{2,2}-w_{2,1}$; otherwise, the label 2 would be on top of the label 1. Thus, we necessarily have $u_1<v_1$. The same reasoning for the sequence $4\to 5$ implies $v_2<u_1$, so that we have $v_2<v_1$. The sequence $5\to 6$ implies $v_1<u_2$, and the sequence $7\to 8$ implies $u_2<v_2$, so that we have $v_1<v_2$. Thus, assuming that $\mathcal{T}$ is realizable leads to considering weights for which $v_2<v_1$ and $v_1<v_2$, a contradiction.
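The contradiction in the proof can be checked mechanically: the four strict inequalities derived from the tableau form a directed cycle in the "strictly less than" relation, so no real weights can satisfy them all. A minimal sketch (variable names follow the proof):

```python
# Mechanical check of Counterexample 1 (sketch): the order relations extracted
# from the tableau contain a cycle, so the inequality system is infeasible.
edges = [("u1", "v1"),   # sequence 2 -> 3: u1 < v1
         ("v2", "u1"),   # sequence 4 -> 5: v2 < u1
         ("v1", "u2"),   # sequence 5 -> 6: v1 < u2
         ("u2", "v2")]   # sequence 7 -> 8: u2 < v2

def has_cycle(edges):
    """Depth-first search for a directed cycle in the relation graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    def reachable(start, target, seen):
        for nxt in graph.get(start, []):
            if nxt == target or (nxt not in seen and
                                 reachable(nxt, target, seen | {nxt})):
                return True
        return False
    return any(reachable(a, a, {a}) for a in graph)

assert has_cycle(edges)   # v2 < u1 < v1 < u2 < v2: contradiction
print("the inequality system is infeasible; the tableau is unrealizable")
```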
Linearly separable dichotomies for realizable Young diagrams
Consider an $M$-dimensional Young diagram $\mathcal{D}\subset\{1,\dots,\lambda_1\}\times\dots\times\{1,\dots,\lambda_M\}$ that can be realized by a perceptron with modularly ordered, nonnegative weights. Such a Young diagram $\mathcal{D}$ is the lattice set whose points represent the phase indices of inactive grid-like inputs. Indeed, if $(i_1,\dots,i_M)\in\mathcal{D}$, we have $\sum_{m=1}^{M}w_{m,i_m}\le\theta$, which means that the perceptron is inactive for the grid-like input $\mathit{c}$ in $\mathcal{C}_{\mathit{\lambda}}$ obtained by setting $c_{m,i_m}=1$ for all $1\le m\le M$. Thus, the perceptron implements the dichotomy for which the inactive grid-like inputs are exactly represented by $\mathcal{D}$. Are there more dichotomies associated with $\mathcal{D}$? Answering this question requires revisiting the correspondence between perceptrons and Young diagrams. The key property in establishing this correspondence is the assumption of modularly ordered weights. In Section B.1, we justified that such an assumption incurs no loss of generality by permutation invariance of the grid cells within each module. Thus, each Young diagram $\mathcal{D}$ is in fact associated with the class of perceptrons
$$\{(P\mathit{w},\theta)\mid P\in\Pi_{\mathit{\lambda}}\}\,,$$
where $\Pi_{\mathit{\lambda}}$ denotes the set of permutation matrices stabilizing the modules of periods $\mathit{\lambda}$. Clearly, for $P\ne P'$, the perceptron $(P\mathit{w},\theta)$ generally implements a distinct dichotomy from that of $(P'\mathit{w},\theta)$. As a result, there is a class of dichotomies indexed by the Young diagram $\mathcal{D}$, which we denote by $C(\mathcal{D})$.
Evaluating the cardinality of $C(\mathcal{D})$ via simple combinatorial arguments first requires a crude description of the geometry of $\mathcal{D}$, and specifically of its degenerate symmetries. For all $1\le m\le M$, $1\le i\le\lambda_m$, let us denote the restriction of $\mathcal{D}$ to the hyperplane $i_m=i$ by
$$R_{m,i}(\mathcal{D})=\{(i_1,\dots,i_{m-1},i_{m+1},\dots,i_M)\mid(i_1,\dots,i_{m-1},i,i_{m+1},\dots,i_M)\in\mathcal{D}\}\,.$$
By definition of Young diagrams, we have $R_{m,i}(\mathcal{D})\supset R_{m,i+1}(\mathcal{D})$ for all $1\le i<\lambda_m$. We say that a Young diagram exhibits a degenerate symmetry along the $m$th dimension whenever two consecutive restrictions coincide: $R_{m,i}(\mathcal{D})=R_{m,i+1}(\mathcal{D})$. To make the notion of degeneracy more precise, let us consider the equivalence relation on $\{1,\dots,\lambda_m\}$ defined by $i\sim j\iff R_{m,i}(\mathcal{D})=R_{m,j}(\mathcal{D})$. Given $i$ in $\{1,\dots,\lambda_m\}$, the equivalence class of $i$ is then $\{j\in\{1,\dots,\lambda_m\}\mid R_{m,i}(\mathcal{D})=R_{m,j}(\mathcal{D})\}$. Let us denote the total number of such equivalence classes by $k_m$, $1\le k_m\le\lambda_m$. Then, the set $\{1,\dots,\lambda_m\}$ can be partitioned into $k_m$ classes, $C_{m,1},\dots,C_{m,k_m}$, where the classes are listed by decreasing order of their restricted Young diagrams. For instance, $C_{m,1}$ comprises all the indices for which the restriction along the $m$th dimension yields the same Young diagram as $R_{m,1}(\mathcal{D})$. We denote the cardinality of the thus-ordered equivalence classes by $\sigma_{m,k}=|C_{m,k}|$, $1\le k\le k_m$, so that we have $\lambda_m=\sigma_{m,1}+\dots+\sigma_{m,k_m}$. We refer to the $\sigma_{m,k}$ as the degeneracy indices. Degenerate symmetries correspond to degeneracy indices $\sigma_{m,k}>1$. We are now in a position to determine the cardinality of $C(\mathcal{D})$:
Proposition 6
For integer periods $\lambda_1,\dots,\lambda_M$, let us consider a realizable Young diagram $\mathcal{D}$ in $\{1,\dots,\lambda_1\}\times\dots\times\{1,\dots,\lambda_M\}$. Then, the class of linearly separable dichotomies with Young diagram $\mathcal{D}$, denoted by $C(\mathcal{D})$, has cardinality
$$|C(\mathcal{D})|=\prod_{m=1}^{M}\frac{\lambda_m!}{\sigma_{m,1}!\cdots\sigma_{m,k_m}!}\,,$$
where $\sigma_{m,k}$, $1\le k\le k_m$, are the degeneracy indices of the Young diagram along the $m$th dimension.
Proof. A dichotomy is specified by enumerating the set of inactive grid-like inputs $\mathit{c}$ in $\mathcal{C}_{\mathit{\lambda}}$. Each pattern $\mathit{c}$ can be conveniently represented as a lattice point in $\{1,\dots,\lambda_1\}\times\dots\times\{1,\dots,\lambda_M\}$ by considering the phase indices of the active cell in the $M$ modules of pattern $\mathit{c}$. Thus, a generic dichotomy is just a configuration of lattice points in $\{1,\dots,\lambda_1\}\times\dots\times\{1,\dots,\lambda_M\}$. The class of dichotomies $C(\mathcal{D})$ comprises all lattice-point configurations in $\{1,\dots,\lambda_1\}\times\dots\times\{1,\dots,\lambda_M\}$ obtained by permutations of the indices along the $M$ dimensions:
$$C(\mathcal{D})=\{\pi_1\cdots\pi_M\mathcal{D}\mid\pi_m\in\mathbb{S}_{\lambda_m},\ 1\le m\le M\}\,,$$
where we define the action of a permutation $\pi_m$ in $\mathbb{S}_{\lambda_m}$ on a lattice-point configuration $\mathcal{S}$ by $\pi_m\mathcal{S}=\{(i_1,\dots,\pi_m(i_m),\dots,i_M)\mid(i_1,\dots,i_M)\in\mathcal{S}\}$,
and where $\mathbb{S}_{\lambda_m}$ denotes the set of permutations of $\{1,\dots,\lambda_m\}$. Let us denote a generic lattice-point configuration in $\{1,\dots,\lambda_1\}\times\dots\times\{1,\dots,\lambda_M\}$ by $\mathcal{S}$. By permuting the indices of the points in $\mathcal{S}$, each transformation $\pi_m$ is actually permuting $R_{m,i}(\mathcal{S})$, $1\le i\le\lambda_m$, the restrictions of the lattice-point configuration along the $m$th dimension. The partial order defined by inclusion is preserved by permutations in the sense that given $\pi_m$ in $\mathbb{S}_{\lambda_m}$, $1\le m\le M$, we have $R_{m,\pi_m(i)}(\pi_1\cdots\pi_M\mathcal{S})\subset R_{m,\pi_m(j)}(\pi_1\cdots\pi_M\mathcal{S})$ if and only if $R_{m,i}(\mathcal{S})\subset R_{m,j}(\mathcal{S})$. In particular, $k_m$, the number of restriction classes induced by the relation $i\sim j\iff R_{m,i}(\mathcal{S})=R_{m,j}(\mathcal{S})$, is invariant to permutations, and so are their cardinalities. These cardinalities specify the degeneracy indices $\sigma_{m,1},\dots,\sigma_{m,k_m}$ of $\mathcal{S}$ along the $m$th dimension. Thus, all configurations $\mathcal{S}$ obtained via permutation of $\mathcal{D}$ have the same degeneracy indices as $\mathcal{D}$. Moreover, for a Young diagram $\mathcal{D}$, these degeneracy indices simply count the equivalence classes formed by restrictions of identical size along the same dimension. Thus, the number of dichotomies in $C(\mathcal{D})$ is determined as the number of ways to independently assign the indices $\{1,\dots,\lambda_m\}$ to $k_m$ restriction classes of sizes $\sigma_{m,1},\dots,\sigma_{m,k_m}$ for all $m$, $1\le m\le M$. For each $m$, this number is given by the multinomial coefficient $\lambda_m!/(\sigma_{m,1}!\cdots\sigma_{m,k_m}!)$.
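To make the counting argument concrete for $M=2$, the following sketch (the example diagram and periods are our own choice) computes the multinomial product of degeneracy indices for one diagram and confirms it against a brute-force enumeration of all row and column permutations:

```python
# Sketch illustrating the proof above for M = 2: the class C(D) has as many
# elements as the product of multinomial coefficients over the degeneracy
# indices. Example diagram and periods are arbitrary choices.
import itertools
from math import factorial

l1, l2 = 3, 3
D = frozenset({(0, 0), (1, 0), (0, 1), (1, 1), (0, 2)})  # row sizes 2, 2, 1

def degeneracy_count(D, length, axis):
    """Multinomial lambda!/(sigma_1! ... sigma_k!) along one dimension."""
    restrictions = [frozenset(p[1 - axis] for p in D if p[axis] == i)
                    for i in range(length)]
    class_sizes = [restrictions.count(r) for r in set(restrictions)]
    out = factorial(length)
    for s in class_sizes:
        out //= factorial(s)
    return out

predicted = degeneracy_count(D, l1, 0) * degeneracy_count(D, l2, 1)

# brute force: apply every row/column permutation, count distinct images
images = {frozenset((pi[i], pj[j]) for i, j in D)
          for pi in itertools.permutations(range(l1))
          for pj in itertools.permutations(range(l2))}
assert predicted == len(images)
print(predicted, "distinct dichotomies in C(D)")
```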
As opposed to the case of random configurations in general position, the many symmetries of the grid-like inputs in $\mathcal{C}_{\mathit{\lambda}}$ allow one to enumerate dichotomies of specific cardinalities. We define the cardinality of a dichotomy as the size of the set of active patterns it separates. Thus, a perceptron $(\mathit{w},\theta)$ realizing a $k$-dichotomy is one for which exactly $k$ patterns $\mathit{c}$ in $\mathcal{C}_{\mathit{\lambda}}$ are such that $\mathit{w}^T\mathit{c}>\theta$. Proposition 7 reduces the problem of counting realizable $k$-dichotomies to that of enumerating realizable Young diagrams $\mathcal{D}$ of size $|\mathcal{D}|=k$. Such an enumeration depends on the number of modules $M$, which sets the dimensionality of the Young diagrams, as well as the periods $\lambda_m$, $1\le m\le M$. Unfortunately, even without considering the constraint of being a realizable Young diagram, there is no convenient way to enumerate Young diagrams of fixed size for general dimension $M$. However, for low cardinality, for example, $k\le 5$, there are only a few Young diagrams such that $|\mathcal{D}|=k$, and it turns out that all of them are realizable. In the following, and without aiming at exhaustiveness, we exploit the latter fact to characterize the sets of $k$-dichotomies for $k\le 5$ and to compute their cardinalities.
There are $M$ possible $M$-dimensional Young diagrams of size 2, according to the dimension along which the two lattice points are positioned. The Young diagram extending along the $m$th dimension, $1\le m\le M$, has degeneracy indices $\sigma_{m,1}=2$ and $\sigma_{m,2}=\lambda_m-2$, and $\sigma_{n,1}=1$ and $\sigma_{n,2}=\lambda_n-1$ for $n\ne m$. As a result, the number of 2-dichotomies of grid-like inputs is given by
$$\mathcal{N}_2=\sum_{m=1}^{M}\binom{\lambda_m}{2}\prod_{n\ne m}\lambda_n\,.$$
There are two types of Young diagrams of size 3: type $(3a)$, for which the three lattice points span one dimension, and type $(3b)$, for which the lattice points span two dimensions. There are $M$ possible $M$-dimensional Young diagrams of type $(3a)$. The degeneracy indices for the Young diagram extending along the $m$th dimension, $1\le m\le M$, are $\sigma_{m,1}=3$ and $\sigma_{m,2}=\lambda_m-3$, and $\sigma_{n,1}=1$ and $\sigma_{n,2}=\lambda_n-1$ for $n\ne m$, yielding
$$\mathcal{N}_3^{a}=\sum_{m=1}^{M}\binom{\lambda_m}{3}\prod_{n\ne m}\lambda_n\,.$$
There are $M(M-1)/2$ possible $M$-dimensional Young diagrams of type $(3b)$, as many as choices of two dimensions among $M$. The degeneracy indices of the Young diagram extending along dimensions $m$ and $n$, $1\le m<n\le M$, are $\sigma_{m,1}=\sigma_{m,2}=1$ and $\sigma_{m,3}=\lambda_m-2$, $\sigma_{n,1}=\sigma_{n,2}=1$ and $\sigma_{n,3}=\lambda_n-2$, and $\sigma_{k,1}=1$ and $\sigma_{k,2}=\lambda_k-1$ for $k\ne m,n$, yielding
$$\mathcal{N}_3^{b}=\sum_{1\le m<n\le M}\lambda_m(\lambda_m-1)\,\lambda_n(\lambda_n-1)\prod_{k\ne m,n}\lambda_k\,.$$
As a result, the number of 3-dichotomies of grid-like inputs is given by $\mathcal{N}_3=\mathcal{N}_3^{a}+\mathcal{N}_3^{b}$.
A similar analysis reveals that there are four types of Young diagrams of size 4, which span up to three dimensions when $M\ge 3$. These Young diagrams, denoted by $(4a)$, $(4b)$, $(4c)$, and $(4d)$, are represented in Figure 6, where the degeneracy indices can be read off graphically. As a result, the number of 4-dichotomies of grid-like inputs is given by $\mathcal{N}_4=\mathcal{N}_4^{a}+\mathcal{N}_4^{b}+\mathcal{N}_4^{c}+\mathcal{N}_4^{d}$, where the numbers of type-specific dichotomies are given by
The classification of dichotomies via Young diagrams also illuminates the geometrical structure of linearly separable $k$-dichotomies, at least for small $k$. In particular, 2-dichotomies are linearly separable if and only if they involve two lattice points forming an edge of the convex polytope, that is, if these points correspond to patterns in $\mathcal{C}_{\mathit{\lambda}}$ whose coordinates only differ in one module. Similarly, 3-dichotomies are linearly separable if and only if $(3a)$ they involve three lattice points representing patterns in $\mathcal{C}_{\mathit{\lambda}}$ whose coordinates only differ in one module, or $(3b)$ they involve two pairs of lattice points, each representing patterns in $\mathcal{C}_{\mathit{\lambda}}$ whose coordinates only differ in one module. Thus, $(3a)$ corresponds to the case of three lattice points specifying a clique of convex-polytope edges, while $(3b)$ corresponds to the case of three lattice points specifying two convex-polytope edges. We illustrate the four geometrical structures of the linearly separable 4-dichotomies in Figure 6.
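The small-$k$ counts can be cross-checked by brute force on a toy grid code. In the sketch below (periods $(2,2,2)$ are an arbitrary choice of ours), we use "the inactive complement can be permuted, module by module, into a Young diagram" as a proxy for linear separability, which is valid here because all diagrams of these small sizes are realizable. For these periods the counts reduce to $\mathcal{N}_2=\sum_m\binom{2}{2}\cdot 2\cdot 2=12$ and $\mathcal{N}_3=\mathcal{N}_3^{a}+\mathcal{N}_3^{b}=0+24$:

```python
# Brute-force cross-check of the k-dichotomy counts for periods (2, 2, 2),
# a sketch using permutability-to-Young-diagram as a separability proxy
# (valid for these small sizes, where all Young diagrams are realizable).
import itertools

periods = (2, 2, 2)
M = len(periods)
lattice = list(itertools.product(*(range(l) for l in periods)))

def is_young(D):
    """Downward-closed lattice set."""
    return all(p[:m] + (p[m] - 1,) + p[m + 1:] in D
               for p in D for m in range(M) if p[m] > 0)

def separable(active):
    """Proxy: the inactive complement permutes into a Young diagram."""
    inactive = set(lattice) - set(active)
    all_perms = [list(itertools.permutations(range(l))) for l in periods]
    return any(is_young({tuple(pi[m][x[m]] for m in range(M))
                         for x in inactive})
               for pi in itertools.product(*all_perms))

def count_k_dichotomies(k):
    return sum(separable(s) for s in itertools.combinations(lattice, k))

assert count_k_dichotomies(2) == 12   # sum_m C(2,2) * 2 * 2
assert count_k_dichotomies(3) == 24   # N_3^a = 0, N_3^b = 3 * 8
print("N_2 = 12 and N_3 = 24 for periods (2, 2, 2)")
```

Consistent with the geometric picture above, the separable 2-sets are exactly the pairs differing in a single module, and the separable 3-sets are the two-edge paths.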
Numbers of dichotomies for two modules
For two modules of periods $\lambda_1$ and $\lambda_2$, recall that each grid pattern in $\mathcal{C}_{\mathit{\lambda}}$ is a $(\lambda_1+\lambda_2)$-dimensional vector, which is entirely specified by the indices of its two active neurons: $(i,j)$, $1\le i\le\lambda_1$, $1\le j\le\lambda_2$. Thus, it is convenient to consider a set of grid patterns as a collection of points in the discrete lattice $\{1,\dots,\lambda_1\}\times\{1,\dots,\lambda_2\}$. From Proposition 4, we know that linearly separable dichotomies are made of those sets of grid patterns in $\mathcal{C}_{\mathit{\lambda}}$ for which a Young diagram can be formed via permutations of rows and columns in the lattice (see Figure 7). By convention, we consider that the marked lattice points forming a Young diagram define the set of active grid patterns. The remaining unmarked lattice points define the set of inactive grid patterns. To each 2D Young diagram in the lattice $\{1,\dots,\lambda_1\}\times\{1,\dots,\lambda_2\}$ corresponds a class of linearly separable dichotomies. Counting the total number of linearly separable dichotomies when $M=2$ will proceed in two steps: (i) we first give a slightly stronger result than Proposition 6 about the cardinality of the classes of dichotomies associated with a Young diagram, and (ii) we evaluate the total number of dichotomies by summing class cardinalities over the set of Young diagrams.
Proposition 7
For two integer periods $\lambda_1$ and $\lambda_2$, let us consider a Young diagram $\mathcal{D}$ in the lattice $\{1,\dots,\lambda_1\}\times\{1,\dots,\lambda_2\}$. Without loss of generality, $\mathcal{D}$ can be specified via the degeneracy indices $\sigma_{1,1},\dots,\sigma_{1,k}$ and $\sigma_{2,1},\dots,\sigma_{2,k}$, chosen such that
Then, the class of linearly separable dichotomies with Young diagram $\mathcal{D}$, denoted by $C(\mathcal{D})$, has cardinality
$$|C(\mathcal{D})|=\frac{\lambda_1!}{\sigma_{1,1}!\cdots\sigma_{1,k+1}!}\,\frac{\lambda_2!}{\sigma_{2,1}!\cdots\sigma_{2,k+1}!}\,,$$
where we have $\sigma_{1,1}+\dots+\sigma_{1,k+1}=\lambda_1$ and $\sigma_{2,1}+\dots+\sigma_{2,k+1}=\lambda_2$.
Proof. Consider a Young diagram $\mathcal{D}$ in $\{1,\dots,\lambda_1\}\times\{1,\dots,\lambda_2\}$ with $p$ inactive patterns. The diagram $\mathcal{D}$ is uniquely defined by the row partition $p=r_1+\dots+r_{\lambda_1}$, $r_1\ge\dots\ge r_{\lambda_1}$, where $r_i$ denotes the occupancy of row $i$, or equivalently by the column partition $p=s_1+\dots+s_{\lambda_2}$, $s_1\ge\dots\ge s_{\lambda_2}$, where $s_j$ denotes the occupancy of column $j$. The occupancies $\{r_1,\dots,r_{\lambda_1}\}$ and $\{s_1,\dots,s_{\lambda_2}\}$ entirely define the restrictions along each dimension, and each set of occupancies along a dimension is invariant to row and column permutations. The corresponding degeneracy indices can be determined straightforwardly by counting the number of rows or columns with a given occupancy, that is, within a given equivalence class. Denoting the necessarily identical numbers of row classes and column classes by $k\le\min(\lambda_1,\lambda_2)$, Proposition 6 directly yields the announced result.
Proposition 8
For two integer periods $\lambda_1$ and $\lambda_2$, the number of linearly separable dichotomies in $\mathcal{C}_{(\lambda_1,\lambda_2)}$ is
$$\mathcal{N}_{\lambda_1,\lambda_2}=\sum_{k=0}^{\min(\lambda_1,\lambda_2)}(k!)^2\,S(\lambda_1+1,k+1)\,S(\lambda_2+1,k+1)=B_{\lambda_1}^{(-\lambda_2)}\,,$$
where $S(n,k)$ denotes the Stirling numbers of the second kind and $B_k^{(n)}$ denotes the poly-Bernoulli numbers.
Proof. Our goal is to evaluate the total number of dichotomies $\mathcal{N}_{\lambda_1,\lambda_2}$. To achieve this goal, we will exploit the combinatorics of 2D Young diagrams to specify $\mathcal{N}_{\lambda_1,\lambda_2}$ as
$$\mathcal{N}_{\lambda_1,\lambda_2}=\sum_{\mathcal{D}}|C(\mathcal{D})|\,,$$
where $\mathcal{D}$ runs over all possible Young diagrams. Because of the multinomial nature of the cardinalities $|C(\mathcal{D})|$, it is advantageous to adopt an alternative representation for Young diagrams. This alternative representation will require utilizing the frontier of a Young diagram. Given a Young diagram $\mathcal{D}$ with $k$ distinct nonempty row occupancies and $k$ distinct nonempty column occupancies, we define its frontier as the path joining the lattice points $(0,\lambda_2)$ and $(\lambda_1,0)$ via lattice positions in $\mathcal{D}$ separating the active region from the inactive region (see Figure 7). Such a path is uniquely defined via $k+1$ downward steps of sizes $\sigma_{1,k+1},\dots,\sigma_{1,1}$ and $k+1$ rightward steps of sizes $\sigma_{2,1},\dots,\sigma_{2,k+1}$, which satisfy $\sigma_{1,1}+\dots+\sigma_{1,k+1}=\lambda_1$ and $\sigma_{2,1}+\dots+\sigma_{2,k+1}=\lambda_2$. Clearly, the frontier of $\mathcal{D}$ determines the cardinality of $C(\mathcal{D})$ via (46). To evaluate $\mathcal{N}_{\lambda_1,\lambda_2}$ in (48), we partition Young diagrams based on $k$, the number of distinct row and column sizes. For $k=0$, we have $\sigma_{1,1}=\lambda_1$ and $\sigma_{2,1}=\lambda_2$, corresponding to $N_{\lambda_1,\lambda_2}(0)=1$ Young diagram, the empty diagram, for which all patterns are inactive. For $k=1$, there is a single row and column size, corresponding to Young diagrams where the active patterns are arranged in a rectangle with edge lengths $\sigma_{1,1}$ and $\sigma_{2,1}$. Nonempty rectangular diagrams correspond to $\sigma_{1,1}>0$ and $\sigma_{2,1}>0$, and thus contribute
$$N_{\lambda_1,\lambda_2}(1)=(2^{\lambda_1}-1)(2^{\lambda_2}-1)$$
to the sum (48). The contribution of diagrams with general $k$frontier, denoted by ${N}_{{\lambda}_{1},{\lambda}_{2}}(k)$, follows from the multinomial theorem, where one ensures that frontiers with less than $k+1$ downward and rightward steps do not get repeated. These $k$frontiers correspond to $k+1$ sequences of downward and rightward steps for which no step has zero size, except possibly for the first downward step emanating from $(0,{\lambda}_{2})$ and the last rightward step arriving at $({\lambda}_{1},0)$. Under these conditions, the downward and rightward steps can be chosen independently, so that we can write ${N}_{{\lambda}_{1},{\lambda}_{2}}(k)={f}_{k}({\lambda}_{1}){f}_{k}({\lambda}_{2})$, where the factors ${f}_{k}({\lambda}_{1})$ and ${f}_{k}({\lambda}_{2})$ only depend on the downward steps and rightward steps, respectively. Let us focus on the downward steps alone, that is, on the term ${f}_{k}({\lambda}_{1})$. The admissible sequences of steps satisfy ${\sigma}_{1,1}+\mathrm{\dots}+{\sigma}_{1,k+1}={\lambda}_{1}$, with ${\sigma}_{1,1},\mathrm{\dots},{\sigma}_{1,k}\ne 0$. From the multinomial theorem, we have
where the first term of the right-hand side is $f_k(\lambda_1)$ and the second term of the right-hand side collects the contribution of sequences that are not $k$-frontiers. The latter term can be evaluated explicitly via the inclusion-exclusion principle, yielding
where we have used the multinomial theorem for the last equality. Together with (51), the above equation allows one to specify $f_k(\lambda)$ in terms of the Stirling numbers of the second kind, denoted by $S(n,k)$, as
$$f_k(\lambda)=k!\,S(\lambda,k)+(k+1)!\,S(\lambda,k+1)=k!\,S(\lambda+1,k+1)\,,$$
where the last equality follows from a well-known identity about Stirling numbers of the second kind. Then, the overall number of dichotomies follows from the fact that the frontier has at most $\min(\lambda_1,\lambda_2)$ distinct values of row/column sizes, which implies
$$\mathcal{N}_{\lambda_1,\lambda_2}=\sum_{k=0}^{\min(\lambda_1,\lambda_2)}f_k(\lambda_1)\,f_k(\lambda_2)=\sum_{k=0}^{\min(\lambda_1,\lambda_2)}(k!)^2\,S(\lambda_1+1,k+1)\,S(\lambda_2+1,k+1)=B_{\lambda_1}^{(-\lambda_2)}\,,$$
where we have recognized the definition of the poly-Bernoulli numbers ${B}_{n}^{(-k)}$. These numbers are defined, for any integer $k$, via the generating function
$$\sum_{n=0}^{\mathrm{\infty}}{B}_{n}^{(k)}\frac{{x}^{n}}{n!}=\frac{{\mathrm{Li}}_{k}(1-{e}^{-x})}{1-{e}^{-x}},$$
where ${\mathrm{Li}}_{k}(z)={\sum}_{j=1}^{\mathrm{\infty}}{z}^{j}/{j}^{k}$ denotes the polylogarithm.
Poly-Bernoulli numbers were originally introduced by Kaneko to enumerate the set of binary $k$-by-$n$ matrices that are uniquely reconstructible from their row and column sums (Kaneko, 1997). The use of poly-Bernoulli numbers to enumerate permutations of Young tableaux was pioneered by Postnikov while investigating cells of the totally nonnegative Grassmannian (Postnikov, 2006). While studying the asymptotics of the extremal excedance set statistic, de Andrade et al., 2015 obtained the asymptotics of the poly-Bernoulli numbers along the diagonal.
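The numbers involved are straightforward to check numerically. The following sketch (Python) uses the standard closed form $B_n^{(-k)}=\sum_m (m!)^2 S(n+1,m+1)\,S(k+1,m+1)$ and reproduces the small diagonal values, which also count Kaneko's lonesum matrices:

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind S(n, k)."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def poly_bernoulli(n, k):
    """Poly-Bernoulli number B_n^{(-k)} via the Stirling closed form."""
    return sum(factorial(m) ** 2 * stirling2(n + 1, m + 1) * stirling2(k + 1, m + 1)
               for m in range(min(n, k) + 1))

# Diagonal values, matching the lonesum-matrix counts of Kaneko (1997):
print([poly_bernoulli(n, n) for n in range(1, 5)])  # → [2, 14, 230, 6902]
```

Note the symmetry $B_n^{(-k)}=B_k^{(-n)}$, which the closed form makes manifest.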
Appendix 3
Spatial embedding of the grid code
In this Appendix, we address the limitations entailed by spatially embedding grid-like inputs. First, we define the grid-cell-activity matrix that specifies the spatial assignment of grid-like inputs for 1D space. Second, we show that the contiguous-separating capacity, defined as the maximum spatial extent over which all possible dichotomies are linearly separable, is determined by the rank of the grid-cell-activity matrix. Third, we generalize our results about the separating capacity to spaces of arbitrary dimensions.
Grid-cell-activity matrix for 1D space
The fundamental object of our combinatorial analysis is the polytope whose vertices have all possible grid-cell patterns as coordinates. Thanks to the many symmetries of this polytope, we can enumerate linearly separable dichotomies of grid-like inputs. However, such an approach makes no explicit reference to the actual physical space that these grid-like inputs encode. Making this reference explicit consists in specifying a mapping between spatial positions and grid-like inputs. Unfortunately, doing so generally breaks many of the polytope symmetries, precluding any combinatorial analysis. This is especially true if one considers spaces encoded by a subset of grid-cell patterns, as opposed to the full set $\mathcal{C}_{\mathit{\lambda}}$, a situation that leads to considering nonsymmetric polytopes.
Let us explain this point by considering the case of a discrete 1D space where each position is marked by an integer in $\mathbb{Z}$. In this setting, positional information about $\mathbb{Z}$ is encoded by $M$ modules of grid cells with integer periods $\mathit{\lambda}=({\lambda}_{1},\dots,{\lambda}_{M})$. Recall that each module comprises ${\lambda}_{m}$ cells, each active at a distinct phase within the period ${\lambda}_{m}$, and that the corresponding repertoire of grid-like inputs $\mathcal{C}_{\mathit{\lambda}}$ has cardinality $\mathrm{\Lambda}={\prod}_{m=1}^{M}{\lambda}_{m}$. Because the spatial activity of grid cells is periodic and because we consider a finite number of grid cells, the mappings between spatial positions and grid-like inputs are necessarily periodic functions ${A}_{\mathit{\lambda}}:\mathbb{Z}\to\mathcal{C}_{\mathit{\lambda}}$. Let us denote by $L$ the period of ${A}_{\mathit{\lambda}}$. It is then convenient to consider the functions ${A}_{\mathit{\lambda}}:\mathbb{Z}/L\mathbb{Z}\to\mathcal{C}_{\mathit{\lambda}}$ as matrices, called grid-cell-activity matrices, whose $j$th column is the pattern in $\mathcal{C}_{\mathit{\lambda}}$ that encodes the $j$th spatial position in $\{1,\dots,L\}$, seen as the element $j$ in $\mathbb{Z}/L\mathbb{Z}$. In particular, the matrices ${A}_{\mathit{\lambda}}$ have $N={\sum}_{m=1}^{M}{\lambda}_{m}$ rows, each row corresponding to the periodic activity of a grid cell. Moreover, at every position $j$, $1\le j\le L$, each module has a single active cell. For the sake of clarity, here follows a concrete example of a grid-cell-activity matrix for $\mathit{\lambda}=(2,3,5)$:
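As a concrete sketch of this construction (in Python; the helper name `activity_matrix` is ours), the matrix for $\mathit{\lambda}=(2,3,5)$ has $N=10$ rows and $L=30$ columns, one active cell per module in every column, and pairwise distinct columns:

```python
from math import lcm

def activity_matrix(periods):
    """Grid-cell-activity matrix as a list of rows: one row per cell
    (module m, phase i), one column per position j in {0, ..., L-1}."""
    L = lcm(*periods)
    return [[1 if j % lam == phase else 0 for j in range(L)]
            for lam in periods for phase in range(lam)]

A = activity_matrix((2, 3, 5))
L = len(A[0])
print(len(A), "rows x", L, "columns")  # 10 rows x 30 columns
columns = [tuple(row[j] for row in A) for j in range(L)]
assert all(sum(c) == 3 for c in columns)  # one active cell per module
assert len(set(columns)) == L             # all columns distinct (CRT)
```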
As the labelling of grid cells is arbitrary within a module, grid-population activity is actually represented by a class of matrices, which is invariant to permutations of the grid cells $(m,i)$, $1\le i\le {\lambda}_{m}$, within a module $m$. Here, with no loss of generality, we consider the class representatives obtained by ordering the grid cells by increasing phase within each module. This convention allows us to define the activity matrix ${A}_{\mathit{\lambda}}$ simply via the introduction of a spatial shift operator ${J}_{\mathit{\lambda}}$. We define the shift operator ${J}_{\mathit{\lambda}}$ as the linear operator that cyclically increments the phases by one unit within each module, that is,
where ${J}_{{\lambda}_{m}}$ is the canonical circulant permutation matrix of order ${\lambda}_{m}$. We refer to ${J}_{\mathit{\lambda}}$ as a shift operator because its action on any column of ${A}_{\mathit{\lambda}}$ corresponds to a positional shift by one unit of space: if ${\mathit{c}}_{j}$, $1\le j\le L$, denotes the $j$th column of ${A}_{\mathit{\lambda}}$, then ${J}_{\mathit{\lambda}}{\mathit{c}}_{j}={\mathit{c}}_{j+1}$ if $j<L$, and ${J}_{\mathit{\lambda}}{\mathit{c}}_{L}={\mathit{c}}_{1}$. Thus, we can define the grid-cell-activity matrix as the matrix obtained by enumerating in order the grid-cell patterns ${J}_{\mathit{\lambda}}^{k}{\mathit{c}}_{1}$, $k\in\mathbb{N}$, up to redundancies. Such a definition of the grid-cell-activity matrix prominently features the relation between the symmetries of the grid code and those of the actual physical space. In particular, it clearly shows that the formulation of our problem is invariant to rotations of the discretized space $1,2,\dots,L$, that is, to shifts in $\mathbb{Z}/L\mathbb{Z}$. We show in Section C.3 that grid-cell-activity matrices can be similarly defined for lattice spaces of higher dimensions, including the relevant case of the 2D hexagonal lattice.
A key observation is that the periodicity $L$, that is, the number of positions uniquely tagged by grid-like inputs, is directly related to the periods $\mathit{\lambda}$ via the Chinese remainder theorem. Indeed, by the Chinese remainder theorem, the first redundant grid-like input occurs for $L=\mathrm{lcm}({\lambda}_{1},\dots,{\lambda}_{M})$, therefore specifying the number of columns of the activity matrix. Thus, for pairwise coprime periods ${\lambda}_{m}$, $1\le m\le M$, we have $L=\mathrm{\Lambda}$ and the columns of the activity matrix ${A}_{\mathit{\lambda}}$ exhaustively enumerate all grid-like inputs in $\mathcal{C}_{\mathit{\lambda}}$. As a result, all the combinatorial results obtained for the full set of patterns $\mathcal{C}_{\mathit{\lambda}}$ directly apply over the full linear space $\{1,\dots,L\}$ for pairwise coprime periods. In particular, for pairwise coprime periods, we have $\mathrm{rank}\,{A}_{\mathit{\lambda}}=\sum_{i=1}^{M}{\lambda}_{i}-M+1$ by Proposition 1.
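The coprime rank prediction of Proposition 1 can be verified numerically. The sketch below (our helper names) rebuilds the activity matrix and computes its exact rank over the rationals:

```python
from fractions import Fraction
from math import lcm

def rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def activity_matrix(periods):
    """One row per grid cell (module, phase); one column per position."""
    L = lcm(*periods)
    return [[1 if j % lam == phase else 0 for j in range(L)]
            for lam in periods for phase in range(lam)]

# For pairwise coprime periods, rank A = sum(periods) - M + 1.
for periods in [(2, 3), (2, 3, 5), (3, 4, 5, 7)]:
    assert rank(activity_matrix(periods)) == sum(periods) - len(periods) + 1
    print(periods, "rank =", sum(periods) - len(periods) + 1)
```

The $M-1$ rank deficiencies come from each module's rows summing to the all-ones vector.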
Unfortunately, our combinatorial results do not directly extend to a spatial context for integer periods that are not pairwise coprime or for incomplete spaces $\{1,\dots,{L}^{\prime}\}$, ${L}^{\prime}<L$. For noncoprime periods, we have $L<\mathrm{\Lambda}$, as exemplified by the grid-cell-activity matrix for $\mathit{\lambda}=(2,4)$ given by
which comprises only four of the eight patterns of $\mathcal{C}_{2,4}$. Independent of the coprimality of the periods, the grid-cell-activity matrix over incomplete spaces is simply obtained by deleting the columns corresponding to the missing positions. In particular, we clearly have ${L}^{\prime}<L\le\mathrm{\Lambda}$. Excluding some grid-like inputs has two opposite implications: (i) the total number of dichotomies is reduced, in keeping with considering a smaller space, but (ii) some dichotomies that were previously not linearly separable can become realizable. Disentangling these opposite implications is obscured by the many broken symmetries of the polytope formed by the subset of patterns under consideration. For this reason, we essentially resort to studying spatial embeddings of the grid code numerically. Such numerical analysis reveals, perhaps not surprisingly, that a key role is played by the embedding dimension of the grid code, especially in relation to the concept of contiguous-separating capacity.
Contiguous-separating capacity
We define the contiguous-separating capacity of a grid code as the maximum physical extent over which all possible dichotomies are linearly separable. Classically, for $N$-dimensional inputs in general position, the separating capacity is defined as the maximum number of patterns for which all possible dichotomies are linearly separable, without any reference to contiguity. Within this context, Cover's counting function theorem implies that the separating capacity equals the dimension of the input space. Should the grid-like inputs be in general position in the input space, the separating capacity would thus be equal to $\mathrm{rank}\,{A}_{\mathit{\lambda}}$. However, being in general position requires that any submatrix formed by $r$ columns of ${A}_{\mathit{\lambda}}$ be of rank $r$ for $r\le\mathrm{rank}\,{A}_{\mathit{\lambda}}$. This property does not hold for grid-cell-activity matrices. Moreover, we are interested in a stronger notion of separating capacity, as we require that the grid-like inputs achieving separating capacity represent contiguous spatial positions. Thankfully, the spatial symmetry of the grid-cell-activity matrices allows us to show that even under these restrictions the separating capacity is indeed $\mathrm{rank}\,{A}_{\mathit{\lambda}}$.
Proposition 9
The contiguous-separating capacity of the generic grid-cell-activity matrix ${A}_{\mathit{\lambda}}$ is equal to $\mathrm{rank}\,{A}_{\mathit{\lambda}}$.
Proof.
The proof proceeds in two steps. With no loss of generality, we only consider linear classification via a perceptron with zero threshold.
(i) By permutation and shift invariance, it is enough to consider contiguous columns of ${A}_{\mathit{\lambda}}$ starting from the first column ${\mathit{c}}_{1}$. From the definition of ${A}_{\mathit{\lambda}}$, the first $k$ contiguous columns can be generated in terms of the shift operator ${J}_{\mathit{\lambda}}$ as the sequence ${\mathit{c}}_{1},{J}_{\mathit{\lambda}}{\mathit{c}}_{1},\dots,{J}_{\mathit{\lambda}}^{k-1}{\mathit{c}}_{1}$. Let us consider the sequence ${\{{d}_{k}\}}_{k\in\mathbb{N}}$ defined by ${d}_{k}=\dim\mathrm{span}\{{\mathit{c}}_{1},{J}_{\mathit{\lambda}}{\mathit{c}}_{1},\dots,{J}_{\mathit{\lambda}}^{k-1}{\mathit{c}}_{1}\}$, and posit $r=\mathrm{rank}\,{A}_{\mathit{\lambda}}$. If there is an integer $n$ such that ${d}_{n}={d}_{n+1}$, then necessarily ${d}_{k}$ is constant for $k\ge n$ and equal to $\lim_{k\to\mathrm{\infty}}{d}_{k}={d}_{L}=r$. As ${d}_{1}=1$ and ${d}_{k+1}-{d}_{k}\in\{0,1\}$, the preceding observation implies that ${d}_{k+1}-{d}_{k}=1$ for $1\le k<r$. This shows that the contiguous columns ${\mathit{c}}_{i}$, $1\le i\le r$, are linearly independent, and thus are in general position in the input space. By Cover's counting function theorem, all dichotomies obtained by labeling the positions $1\le i\le r$ with $r=\mathrm{rank}\,{A}_{\mathit{\lambda}}$ are linearly separable.
(ii) Considering an extra position, that is, including the column ${\mathit{c}}_{r+1}$, produces at least one dichotomy that is not linearly separable. We proceed by contradiction. Assume that all dichotomies of the $r+1$ positions, that is, of the columns ${\mathit{c}}_{i}$ with $1\le i\le r+1$, are linearly separable. By Cover's counting function theorem, this is equivalent to assuming that all dichotomies of the first $r$ positions, that is, of the columns ${\mathit{c}}_{i}$ with $1\le i\le r$, can be achieved by an $(r-1)$-dimensional hyperplane passing through ${\mathit{c}}_{r+1}$. In other words, for all $r$-dichotomies $\mathit{y}$ in $\{-1,1{\}}^{r}$, there is a weight vector $\mathit{w}$ such that ${y}_{i}({\mathit{w}}^{T}{\mathit{c}}_{i})>0$ for $1\le i\le r$ and such that ${\mathit{w}}^{T}{\mathit{c}}_{r+1}=0$. However, by linear dependence, there are coefficients ${a}_{i}$, not all zero, such that ${\mathit{c}}_{r+1}=\sum_{i=1}^{r}{a}_{i}{\mathit{c}}_{i}$, so that for any $r$-dichotomy, we can find $\mathit{w}$ achieving that dichotomy and such that
$${\mathit{w}}^{T}{\mathit{c}}_{r+1}=\sum_{i=1}^{r}{a}_{i}\,{\mathit{w}}^{T}{\mathit{c}}_{i}=0.\qquad(66)$$
Considering a dichotomy for which ${y}_{i}={a}_{i}/|{a}_{i}|$ for the nonzero coefficients yields
$${\mathit{w}}^{T}{\mathit{c}}_{r+1}=\sum_{i=1}^{r}{a}_{i}\,{\mathit{w}}^{T}{\mathit{c}}_{i}>0,$$
which is a contradiction with (66).
The above proposition specifies $\mathrm{rank}\,{A}_{\mathit{\lambda}}$ as the contiguous-separating capacity for the 1D spatial model. This rank also specifies the dimension of the space containing the subset of grid-like inputs to be linearly classified. For pairwise coprime periods $\mathit{\lambda}$, Proposition 1 shows that $\mathrm{rank}\,{A}_{\mathit{\lambda}}=\sum_{m=1}^{M}{\lambda}_{m}-M+1$. The following proposition generalizes this result to generic integer periods.
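The saturation behavior established in step (i) of the proof, namely that the dimension of the span of contiguous columns grows by unit steps until it reaches the rank, can be illustrated numerically. Here is a sketch (our helper names) for $\mathit{\lambda}=(2,3,5)$, where the capacity is $r=8$:

```python
from fractions import Fraction
from math import lcm

def rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

periods = (2, 3, 5)
L = lcm(*periods)
# Column j of the activity matrix, written as a vector of length sum(periods):
cols = [[1 if j % lam == phase else 0 for lam in periods for phase in range(lam)]
        for j in range(L)]
r_full = rank(cols)  # = sum(periods) - M + 1 = 8
d = [rank(cols[:k]) for k in range(1, L + 1)]
assert d == [min(k, r_full) for k in range(1, L + 1)]
print("capacity r =", r_full, "; d_k saturates at r, as in Proposition 9")
```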
Proposition 10
Let ${A}_{\mathit{\lambda}}$ denote the grid-cell-activity matrix specified by $M$ grid modules with integer periods $\mathit{\lambda}=({\lambda}_{1},\dots,{\lambda}_{M})$. The rank of the activity matrix ${A}_{\mathit{\lambda}}$ is given by
$$\mathrm{rank}\,{A}_{\mathit{\lambda}}=\sum_{\varnothing\ne S\subseteq\{{\lambda}_{1},\dots,{\lambda}_{M}\}}{(-1)}^{|S|+1}\mathrm{gcd}(S),\qquad(68)$$
where $S$ runs over the nonempty subsets of integer periods and $|S|$ denotes the cardinality of the set $S$. If the periods are pairwise coprime, the above formula yields $\mathrm{rank}\,{A}_{\mathit{\lambda}}=\sum_{i=1}^{M}{\lambda}_{i}-M+1$.
Proof. The proof will proceed in three steps.
(i) The first step is to realize that $\mathrm{rank}\,{A}_{\mathit{\lambda}}=\mathrm{rank}\,{A}_{\mathit{\lambda}}^{T}=\dim\left({V}_{1}+\dots+{V}_{M}\right)$, where the vector spaces ${V}_{m}$, $1\le m\le M$, are generated by the rows of the $m$th module of the activity matrix. Then, the inclusion-exclusion principle applied to the sum ${V}_{1}+\dots+{V}_{M}$ yields an expression for $\mathrm{rank}\,{A}_{\mathit{\lambda}}$ as the alternating sum
$$\mathrm{rank}\,{A}_{\mathit{\lambda}}=\sum_{k=1}^{M}{(-1)}^{k+1}\sum_{1\le {m}_{1}<\dots<{m}_{k}\le M}\dim\left({V}_{{m}_{1}}\cap\dots\cap{V}_{{m}_{k}}\right),\qquad(69)$$
which is valid here because, as shown in step (ii), all the spaces ${V}_{m}$ are spanned by subsets of a common basis.
By definition of the activity matrix, the space ${V}_{m}$ is generated by ${\lambda}_{m}$ row vectors, which are cyclically permuted versions of the ${\lambda}_{m}$-periodic vector ${\mathit{r}}_{{\lambda}_{m}}=(1,0,\dots,0,1,0,\dots,0,1,0,\dots)$. In particular, these ${\lambda}_{m}$ row vectors can be enumerated by iterated application of $J$, the canonical $L$-dimensional circulant permutation operator. The resulting sequence ${\mathit{r}}_{{\lambda}_{m}},J{\mathit{r}}_{{\lambda}_{m}},\dots,{J}^{{\lambda}_{m}-1}{\mathit{r}}_{{\lambda}_{m}}$ actually forms a basis of ${V}_{m}$, identified with the space of ${\lambda}_{m}$-periodic vectors of length $L$, and thus $\dim {V}_{m}={\lambda}_{m}$. The announced formula will follow from evaluating the dimensions of the intersections of the vector spaces ${V}_{m}$.
(ii) The second step is to observe that one can specify the set of spaces ${V}_{m}$, $1\le m\le M$, as the span of vectors chosen from a common basis of ${\mathbb{R}}^{L}$, where we recall that $L=\mathrm{lcm}({\lambda}_{1},\dots,{\lambda}_{M})$. We identify such a common basis by considering the action of the operator $J$ on $L$-dimensional periodic vectors. As a circulant permutation operator, $J$ admits a diagonal matrix representation in the basis of eigenvectors $\{{\mathit{e}}_{j}\}$, $1\le j\le L$, where ${\mathit{e}}_{j}$ is associated to the eigenvalue ${\omega}_{j}={e}^{\mathrm{i}\frac{2\pi j}{L}}$, with ${\mathrm{i}}^{2}=-1$. Moreover, $J$ clearly preserves periodicity when acting on row vectors in ${\mathbb{R}}^{L}$, so that the spaces ${V}_{m}$, $1\le m\le M$, are stable under $J$. As a consequence, each space ${V}_{m}$ can be represented as the span of a subset of the eigenvectors of $J$. In principle, the existence of a basis spanning the spaces ${V}_{m}$, $1\le m\le M$, allows one to compute the dimension of the intersections of these spaces by counting the number of common basis elements in their spans.
(iii) The last step is to show that counting the number of common basis elements ${\mathit{e}}_{j}$ in the subsets of ${\{{V}_{m}\}}_{1\le m\le M}$ yields the announced formula. Proving this point relies on elementary results from the theory of cyclic groups. Let us first consider the basis elements generating ${V}_{m}$, which are the elements ${\mathit{e}}_{j}$ that are ${\lambda}_{m}$-periodic. These basis elements are precisely those for which ${\omega}_{j}^{{\lambda}_{m}}=1$, that is, ${\lambda}_{m}j=0$ in the cyclic group $\mathbb{Z}/L\mathbb{Z}$. Considering the integers $j$ as elements of $\mathbb{Z}/L\mathbb{Z}$, we can then specify the basis vectors generating ${V}_{m}$ by invoking the subgroup structure of cyclic groups. Specifically, the basis elements generating ${V}_{m}$ are indexed by the elements of the unique subgroup of order ${\lambda}_{m}$ in $\mathbb{Z}/L\mathbb{Z}$. Thus, as expected, the number of basis elements matches the otherwise known dimension of ${V}_{m}$. Let us then consider the basis elements generating the intersection space ${V}_{m}\cap {V}_{n}$, $m\ne n$, which are the elements ${\mathit{e}}_{j}$ that are both ${\lambda}_{m}$-periodic and ${\lambda}_{n}$-periodic. These basis elements correspond to those indices $j$ for which we have ${\lambda}_{m}j=0$ and ${\lambda}_{n}j=0$ in the cyclic group $\mathbb{Z}/L\mathbb{Z}$, that is, for which $\mathrm{gcd}({\lambda}_{m},{\lambda}_{n})j=0$ in $\mathbb{Z}/L\mathbb{Z}$. By the subgroup structure of cyclic groups, the basis elements generating ${V}_{m}\cap {V}_{n}$ are thus indexed by the elements of the unique subgroup of order $\mathrm{gcd}({\lambda}_{n},{\lambda}_{m})$ in $\mathbb{Z}/L\mathbb{Z}$. Thus, we have $\dim\left({V}_{m}\cap {V}_{n}\right)=\mathrm{gcd}({\lambda}_{m},{\lambda}_{n})$. The above reasoning generalizes straightforwardly to any set of indices $1\le {m}_{1}<\dots<{m}_{k}\le M$, $1\le k\le M$, leading to
$$\dim\left({V}_{{m}_{1}}\cap\dots\cap{V}_{{m}_{k}}\right)=\mathrm{gcd}({\lambda}_{{m}_{1}},\dots,{\lambda}_{{m}_{k}}).$$
Substituting the dimensions of the intersection spaces into the expansion (69) derived from the inclusion-exclusion principle yields the rank formula given in (68).
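The rank formula can be checked for noncoprime periods by comparing the inclusion-exclusion sum with an exact rank computation (a sketch; `rank_formula` is our name for the alternating gcd sum):

```python
from fractions import Fraction
from itertools import combinations
from math import gcd, lcm

def rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def activity_matrix(periods):
    L = lcm(*periods)
    return [[1 if j % lam == phase else 0 for j in range(L)]
            for lam in periods for phase in range(lam)]

def rank_formula(periods):
    """Sum over nonempty subsets S of (-1)^{|S|+1} gcd(S)."""
    return sum((-1) ** (len(S) + 1) * gcd(*S)
               for k in range(1, len(periods) + 1)
               for S in combinations(periods, k))

for periods in [(2, 4), (4, 6), (2, 3, 4), (6, 10, 15)]:
    assert rank(activity_matrix(periods)) == rank_formula(periods)
    print(periods, "rank =", rank_formula(periods))
```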
Generalization to higher-dimensional lattices
Our two results, about (i) the number of dichotomies for a grid code with two modules and (ii) the separating capacity for an arbitrary number of modules, generalize to an arbitrary number of dimensions. The generalization of (i) is straightforward as our results bear on the set of grid-like inputs with no reference to physical space. The only caveat has to do with the fact that for a $d$-dimensional lattice, each module $m$, $1\le m\le M$, contains ${\lambda}_{m}^{d}$ cells, so that ${\lambda}_{m}^{d}$ has to be substituted for ${\lambda}_{m}$ in formula (47). It turns out that the generalization of (ii) proceeds in the exact same way, albeit in a less direct fashion. In the following, we prove that the separating capacity for a $d$-dimensional lattice model, including the 2D hexagonal lattice, is still given by the rank of the corresponding activity matrix.
A couple of remarks are in order before justifying the generalization of (ii):
First, let us specify how to construct activity matrices in $d$-dimensional space by considering a simple example. Consider the hexagonal-lattice model for two modules with $\mathit{\lambda}=(2,3)$. As illustrated in Figure 1, there are four possible 2-periodic lattices and nine possible 3-periodic lattices, each lattice representing the spatial activity pattern of a grid cell. Combining the encoding of the two modules yields a periodic lattice, with a lattice mesh comprising $\mathrm{lcm}({\lambda}_{1},{\lambda}_{2})^{2}=36$ positions. Every position within the mesh is uniquely labeled by its grid-like input, and any subset of positions with larger cardinality has redundancy. Observe moreover that the lattice mesh is equivalent to that of a (2, 3)-square lattice, and in fact, the activity matrix for a (2, 3)-hexagonal-lattice model is the same as that for a (2, 3)-square lattice. As a result, the spatial dependence of the grid-cell population is described by a matrix in ${\mathbb{R}}^{13\times 36}$ with the following block structure:
In the above matrix ${A}_{(2,3)}^{(2)}$, the top two block rows represent the activity of 2-periodic cells, while the bottom three block rows represent the activity of 3-periodic cells. By convention, the blocks ${B}_{(2)}$ and ${B}_{(3)}$, comprising two and three cells respectively, represent the activity of grid cells along the horizontal $x$-axis. There are two rows of blocks ${B}_{(2)}$ and three rows of blocks ${B}_{(3)}$ to encode 2-periodicity and 3-periodicity, respectively, along the vertical $y$-axis. It is straightforward to generalize this hierarchical block structure to construct an activity matrix ${A}_{\mathit{\lambda}}^{(d)}$ for arbitrary periods ${\lambda}_{m}$ and arbitrary square-lattice dimension $d$. In particular, the matrix ${A}_{\mathit{\lambda}}^{(d)}$ has ${\sum}_{m=1}^{M}{\lambda}_{m}^{d}$ rows and $\mathrm{lcm}({\lambda}_{1},\dots,{\lambda}_{M})^{d}$ columns.
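This block construction is easy to check numerically. The following sketch (our helper names) builds the $(2,3)$ square-lattice activity matrix and verifies that its rank equals the inclusion-exclusion sum of squared gcds, anticipating Proposition 11:

```python
from fractions import Fraction
from itertools import combinations
from math import gcd, lcm

def rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def activity_matrix_2d(periods):
    """Square-lattice analogue: one row per cell (module m, phases (px, py)),
    one column per position (x, y) on the (L x L) mesh, L = lcm(periods)."""
    L = lcm(*periods)
    return [[1 if (x % lam, y % lam) == (px, py) else 0
             for x in range(L) for y in range(L)]
            for lam in periods for px in range(lam) for py in range(lam)]

def rank_formula_2d(periods):
    """Sum over nonempty subsets S of (-1)^{|S|+1} gcd(S)^2."""
    return sum((-1) ** (len(S) + 1) * gcd(*S) ** 2
               for k in range(1, len(periods) + 1)
               for S in combinations(periods, k))

A = activity_matrix_2d((2, 3))
assert (len(A), len(A[0])) == (13, 36)  # 2^2 + 3^2 rows, lcm(2,3)^2 columns
r = rank(A)
assert r == rank_formula_2d((2, 3)) == 12  # 4 + 9 - 1
print("rank =", r)
```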
Second, let us define the notion of contiguous-separating capacity for a $d$-dimensional lattice with $d>1$. In one dimension, we define the contiguous-separating capacity as the maximum spatial extent for which all dichotomies involving its discrete set of positions are linearly separable. We generalize this notion to arbitrary dimensions $d$ by defining the contiguous-separating capacity as the maximum size of a connected component of $d$-dimensional positions for which all dichotomies are possible. Observe that, thus defined, the notion is agnostic about the geometric arrangement of this connected component. This is because in dimension $d>1$, the contiguous-separating capacity can be achieved by many distinct arrangements.
After these preliminary remarks, we can now prove the following proposition.
Proposition 11
The contiguous-separating capacity of the generic grid-cell-activity matrix ${A}_{\mathit{\lambda}}^{(d)}$ is equal to $\mathrm{rank}\,{A}_{\mathit{\lambda}}^{(d)}$, where we have
$$\mathrm{rank}\,{A}_{\mathit{\lambda}}^{(d)}=\sum_{\varnothing\ne S\subseteq\{{\lambda}_{1},\dots,{\lambda}_{M}\}}{(-1)}^{|S|+1}\mathrm{gcd}{(S)}^{d}.$$
Proof. We only justify the formula for the case $d=2$ as similar arguments apply for all integers $d>1$ (see the Remark after this proof). The proof will proceed in two steps: (i) we justify the formula for $\mathrm{rank}\,{A}_{\mathit{\lambda}}^{(d)}$ and (ii) we justify that the contiguous-separating capacity equals $\mathrm{rank}\,{A}_{\mathit{\lambda}}^{(d)}$.
(i) We follow the same strategy as for dimension 1 to establish the rank formula for $d=2$ via the inclusion-exclusion principle. The key point is to exhibit a basis of ${\mathbb{R}}^{L\times L}$, with $L=\mathrm{lcm}({\lambda}_{1},\dots,{\lambda}_{M})$, which spans all the vector spaces ${V}_{m}$, $1\le m\le M$, where ${V}_{m}$ denotes the space of ${\lambda}_{m}$-periodic functions on the $(L\times L)$-lattice mesh. To specify such a basis, we consider the two operators ${J}_{x}$ and ${J}_{y}$ acting on the grid-like inputs and representing the one-unit shift along the horizontal $x$-axis and the vertical $y$-axis, respectively. A basis of the space of ${\lambda}_{m}$-periodic functions on the $(L\times L)$-lattice mesh is generated by iterated action of ${J}_{x}$ and ${J}_{y}$ on the activity lattice of a ${\lambda}_{m}$-periodic cell, that is, on a $\{0,1\}$-row vector ${\mathit{r}}_{{\lambda}_{m}}$ of the $m$th module of ${A}_{\mathit{\lambda}}^{(d)}$. Specifically, a basis of ${V}_{m}$ is given by the ${\lambda}_{m}^{2}$ vectors ${J}_{x}^{k}{J}_{y}^{l}{\mathit{r}}_{{\lambda}_{m}}$, with $0\le k<{\lambda}_{m}$ and $0\le l<{\lambda}_{m}$. Moreover, the operators ${J}_{x}$ and ${J}_{y}$ commute on ${\mathbb{R}}^{L\times L}$, as by construction, shifting lattices by ${J}_{x}{J}_{y}$ yields the same lattice as shifting the original lattice by ${J}_{y}{J}_{x}$. Thus, as ${J}_{x}$ and ${J}_{y}$ are diagonalizable, they can be diagonalized in the same basis ${\mathit{\epsilon}}_{ij}$, $1\le i,j\le L$. Close inspection of the operators ${J}_{x}$ and ${J}_{y}$ reveals that they admit matrix representations that are closely related to the canonical $L$-dimensional circulant matrix ${J}_{L}$:
Concretely, the operator ${J}_{x}$ cyclically shifts the columns within each block ${B}_{({\lambda}_{m})}$, whereas the operator ${J}_{y}$ cyclically shifts the blocks within ${A}_{\mathit{\lambda}}^{(d)}$. Considering the basis of eigenvectors ${\mathit{e}}_{i}$, $1\le i\le L$, of ${J}_{L}$, we define the basis ${\mathit{\epsilon}}_{ij}$, $1\le i,j\le L$, as ${\mathit{\epsilon}}_{ij}=\left({\mathit{e}}_{j}\;{\omega}_{i}{\mathit{e}}_{j}\;\dots\;{\omega}_{i}^{L-1}{\mathit{e}}_{j}\right)$, where ${\omega}_{i}$ is the eigenvalue associated to ${\mathit{e}}_{i}$. We have
$${J}_{x}{\mathit{\epsilon}}_{ij}={\omega}_{j}{\mathit{\epsilon}}_{ij},\qquad {J}_{y}{\mathit{\epsilon}}_{ij}={\omega}_{i}{\mathit{\epsilon}}_{ij},$$
which shows that ${\mathit{\epsilon}}_{ij}$ is indeed a basis diagonalizing ${J}_{x}$ and ${J}_{y}$. Moreover, as ${J}_{x}$ and ${J}_{y}$ stabilize the space ${V}_{m}$, the basis ${\mathit{\epsilon}}_{ij}$ spans the space ${V}_{m}$, as well as all the spaces defined as intersections of subsets of ${\{{V}_{m}\}}_{1\le m\le M}$. Consider the set of indices $1\le {m}_{1}<\dots<{m}_{k}\le M$, $1\le k\le M$, specifying the intersection ${V}_{{m}_{1}}\cap\dots\cap{V}_{{m}_{k}}$. By the same reasoning as for dimension 1, the basis elements spanning ${V}_{{m}_{1}}\cap\dots\cap{V}_{{m}_{k}}$ are those eigenvectors ${\mathit{\epsilon}}_{ij}$ that are $\mathrm{gcd}({\lambda}_{{m}_{1}},\dots,{\lambda}_{{m}_{k}})$-periodic in both the $x$-direction and the $y$-direction. As ${J}_{x}{\mathit{\epsilon}}_{ij}={\omega}_{j}{\mathit{\epsilon}}_{ij}$ and ${J}_{y}{\mathit{\epsilon}}_{ij}={\omega}_{i}{\mathit{\epsilon}}_{ij}$, setting $g=\mathrm{gcd}({\lambda}_{{m}_{1}},\dots,{\lambda}_{{m}_{k}})$, this is equivalent to $(gi,gj)=(0,0)$ in $\mathbb{Z}/L\mathbb{Z}\times\mathbb{Z}/L\mathbb{Z}$. By the subgroup structure of cyclic groups, the basis elements ${\mathit{\epsilon}}_{ij}$ generating ${V}_{{m}_{1}}\cap\dots\cap{V}_{{m}_{k}}$ are thus indexed by pairs $(i,j)$ where $i$ and $j$ are elements of the unique subgroup of order $g$ in $\mathbb{Z}/L\mathbb{Z}$. There are ${g}^{2}$ such basis elements, showing that
$$\dim\left({V}_{{m}_{1}}\cap\dots\cap{V}_{{m}_{k}}\right)=\mathrm{gcd}{({\lambda}_{{m}_{1}},\dots,{\lambda}_{{m}_{k}})}^{2}.$$
The rank formula follows immediately from expressing $\mathrm{rank}\,{A}_{\mathit{\lambda}}^{(d)}=\dim\left({V}_{1}+\dots+{V}_{M}\right)$ via the inclusion-exclusion principle.
(ii) Just as for (i), we follow the same strategy as for dimension 1 to show that the contiguous-separating capacity equals $\mathrm{rank}\,{A}_{\mathit{\lambda}}^{(d)}$. The only caveat to address is that the grid-like inputs, that is, the columns of ${A}_{\mathit{\lambda}}^{(d)}$, are generated by the action of two shift operators instead of one. Specifically, starting from the first column ${\mathit{c}}_{1}$ of ${A}_{\mathit{\lambda}}^{(d)}$, we can generate all subsequent columns by action of the operators ${J}_{\mathit{\lambda},x}$ and ${J}_{\mathit{\lambda},y}$, whose matrix representations are given by
Notice that ${J}_{\mathit{\lambda},x}$ and ${J}_{\mathit{\lambda},y}$ commute. By the same reasoning as for dimension 1, we know that the separating capacity cannot exceed $\mathrm{rank}\,{A}_{\mathit{\lambda}}^{(d)}$. Then, to prove that the separating capacity equals $\mathrm{rank}\,{A}_{\mathit{\lambda}}^{(d)}$, it is enough to exhibit a linearly independent set of contiguous positions with cardinality $\mathrm{rank}\,{A}_{\mathit{\lambda}}^{(d)}$. Let us exhibit such positions. Mirroring the 1D case, let us consider the sequence ${d}_{l}^{(1)}$ defined by
The above sequence is strictly increasing by unit steps until some ${l}_{1}$, after which it remains constant at the value
Let us then consider the sequence
The above sequence is also strictly increasing by unit steps until some ${l}_{2}$, after which it remains constant at the value
Moreover, V_{2} admits for basis the vectors $1\le i\le {l}_{2}$, and $J}_{\lambda ,y}^{i}{c}_{1},1\le i\le {l}_{1$, $J}_{\lambda ,y}^{i}{J}_{\lambda ,y}{c}_{1},1\le i\le {l}_{2$. We can iterate this construction by repeated action of the operator $J}_{\mathit{\lambda},x$, yielding a sequence of number l_{k} and a sequence of space $V}_{k}={V}_{k1}+{J}_{\mathit{\lambda},x}{V}_{k$. Necessarily, the sequence l_{k} becomes eventually zero as
Let us consider the smallest $k>1$ for which ${l}_{k}=0$; then the set of vectors
is linearly independent by construction and generates the range of ${A}_{\mathit{\lambda}}^{(d)}$. In particular, we necessarily have ${l}_{1}+\dots+{l}_{k-1}=\mathrm{rank}\,{A}_{\mathit{\lambda}}^{(d)}$. Observing that these vectors correspond to a connected component of positions concludes the proof.
Remark
Although we do not give the proof for arbitrary spatial dimension $d>2$, let us briefly comment on extending the above arguments to higher dimensions. Such a generalization is straightforward but requires the use of tensor calculus. For integer periods $\mathit{\lambda}$ and generic dimension $d$, the activity tensor can be defined as
where ${y}_{{i}_{1}}^{m}\otimes\dots\otimes {y}_{{i}_{d}}^{m}$ is the canonical basis vector associated to the $({i}_{1},\dots,{i}_{d})$ coordinate in ${\mathbb{R}}^{{\lambda}_{m}^{d}}$, with $({i}_{1},\dots,{i}_{d})$ considered as an element of ${\left(\mathbb{Z}/{\lambda}_{m}\mathbb{Z}\right)}^{d}$, and where ${x}_{{i}_{1}}^{\star}\otimes\dots\otimes {x}_{{i}_{d}}^{\star}$ is the linear form associated to the $({i}_{1},\dots,{i}_{d})$ coordinate in ${\mathbb{R}}^{{L}^{d}}$. In tensorial form, the operators ${J}_{k}$, $1\le k\le d$, representing a unit shift along the $k$th dimension, have the simple form ${J}_{k}={I}_{L}\otimes\dots\otimes {J}_{L}\otimes\dots\otimes {I}_{L}$, with ${J}_{L}$ in the $k$th factor, such that
where ${i}_{k}+1$ is considered as an element of $\mathbb{Z}/L\mathbb{Z}$. The generalization to arbitrary dimension $d$ follows from realizing that the vectors ${\mathit{\epsilon}}_{{i}_{1},\dots,{i}_{d}}={\mathit{e}}_{{i}_{1}}\otimes\dots\otimes {\mathit{e}}_{{i}_{d}}$, ${i}_{1},\dots,{i}_{d}\in\{1,\dots,L\}$, where ${\mathit{e}}_{i}$ is the eigenvector of ${J}_{L}$ associated to ${\omega}_{i}$, form a basis diagonalizing all the operators ${J}_{k}$, $1\le k\le d$, with ${J}_{k}{\mathit{\epsilon}}_{{i}_{1},\dots,{i}_{d}}={\omega}_{{i}_{k}}{\mathit{\epsilon}}_{{i}_{1},\dots,{i}_{d}}$.
Data availability
The authors confirm that the data supporting the findings of this study are available within the article. Implementation details and code are available at https://github.com/myyim/placecellperceptron, copy archived at https://archive.softwareheritage.org/swh:1:rev:8e03b880f47a1f0b7934afd91afb167f669ceeab.
References

Information capacity of the Hopfield model. IEEE Trans Inform Theory 31:461–464. https://doi.org/10.1109/TIT.1985.1057069

Storing Infinite Numbers of Patterns in a Spin-Glass Model of Neural Networks. Physical Review Letters 55:1530–1533. https://doi.org/10.1103/PhysRevLett.55.1530

Conjunctive input processing drives feature selectivity in hippocampal CA1 neurons. Nature Neuroscience 18:1133–1142. https://doi.org/10.1038/nn.4062

Do We Understand the Emergent Dynamics of Grid Cell Activity? Journal of Neuroscience 26:9352–9354. https://doi.org/10.1523/JNEUROSCI.2857-06.2006

Accurate Path Integration in Continuous Attractor Network Models of Grid Cells. PLOS Computational Biology 5:e1000291. https://doi.org/10.1371/journal.pcbi.1000291

Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE Transactions on Robotics 32:1309–1332. https://doi.org/10.1109/TRO.2016.2624754

Book: Bipartite expander Hopfield networks as self-decoding high-capacity error correcting codes. In: Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E, Garnett R, editors. Advances in Neural Information Processing Systems 32. Curran Associates. pp. 7686–7697.

Maintaining a cognitive map in darkness: The need to fuse boundary knowledge with path integration. PLOS Computational Biology 8:e1002651. https://doi.org/10.1371/journal.pcbi.1002651

Understanding memory through hippocampal remapping. Trends in Neurosciences 31:469–477. https://doi.org/10.1016/j.tins.2008.06.008

Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers EC-14:326–334. https://doi.org/10.1109/PGEC.1965.264137

Asymptotics of the extremal excedance set statistic. European Journal of Combinatorics 46:75–88. https://doi.org/10.1016/j.ejc.2014.11.008

What Grid Cells Convey about Rat Location. The Journal of Neuroscience 28:6858–6871. https://doi.org/10.1523/JNEUROSCI.5684-07.2008

Book: Young Tableaux: With Applications to Representation Theory and Geometry. Cambridge University Press. https://doi.org/10.1017/CBO9780511626241

Conference: Error accumulation and landmark-based error correction in grid cells. Neuroscience 2014.

On the geometric separability of Boolean functions. Discrete Applied Mathematics 66:205–218. https://doi.org/10.1016/0166-218X(94)00161-6

Pattern capacity of a perceptron for sparse discrimination. Physical Review Letters 101:018101. https://doi.org/10.1103/PhysRevLett.101.018101

Making our way through the world: Towards a functional understanding of the brain’s spatial circuits. Current Opinion in Systems Biology 3:186–194. https://doi.org/10.1016/j.coisb.2017.04.008

Conference: Training recurrent networks to generate hypotheses about how the brain solves hard navigation problems. NIPS. pp. 4529–4538.

Efficient and flexible representation of higher-dimensional cognitive variables with grid cells. PLOS Computational Biology 16:e1007796. https://doi.org/10.1371/journal.pcbi.1007796

Dendritic Spikes in Apical Dendrites of Neocortical Layer 2/3 Pyramidal Neurons. The Journal of Neuroscience 27:8999–9008. https://doi.org/10.1523/JNEUROSCI.1717-07.2007

Mobile robot localization by tracking geometric beacons. IEEE Trans Robot Autom 7:376–382. https://doi.org/10.1109/70.88147

Optimal Population Codes for Space: Grid Cells Outperform Place Cells. Neural Computation 24:2280–2317. https://doi.org/10.1162/NECO_a_00319

Path integration and the neural basis of the 'cognitive map'. Nature Reviews Neuroscience 7:663–678. https://doi.org/10.1038/nrn1932

Modular realignment of entorhinal grid cell activity as a basis for hippocampal remapping. The Journal of Neuroscience 31:9414–9425. https://doi.org/10.1523/JNEUROSCI.1433-11.2011

Spatial firing patterns of hippocampal complex-spike cells in a fixed environment. Journal of Neuroscience 7:1935–1950. https://doi.org/10.1523/JNEUROSCI.07-07-01935.1987

Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12:2825–2830.

Polynomial-time algorithms for regular set-covering and threshold synthesis. Discrete Applied Mathematics 12:57–69. https://doi.org/10.1016/0166-218X(85)90040-X

Book: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Microsoft Research Technical Report MSR-TR-98-14.

Computational subunits in thin dendrites of pyramidal cells. Nature Neuroscience 7:621–627. https://doi.org/10.1038/nn1253

The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review 65:386–408. https://doi.org/10.1037/h0042519

Path integration and cognitive mapping in a continuous attractor neural network model. The Journal of Neuroscience 17:5900–5920. https://doi.org/10.1523/JNEUROSCI.17-15-05900.1997

From grid cells to place cells: A mathematical model. Hippocampus 16:1026–1031. https://doi.org/10.1002/hipo.20244

Temporal Association in Asymmetric Neural Networks. Physical Review Letters 57:2861–2864. https://doi.org/10.1103/PhysRevLett.57.2861

Pyramidal neurons: dendritic structure and synaptic integration. Nature Reviews Neuroscience 9:206–221. https://doi.org/10.1038/nrn2286

Grid cells generate an analog error-correcting code for singularly precise neural computation. Nature Neuroscience 14:1330–1337. https://doi.org/10.1038/nn.2901

The hippocampus as a predictive map. Nature Neuroscience 20:1643–1653. https://doi.org/10.1038/nn.4650

Cells of origin of entorhinal cortical afferents to the hippocampus and fascia dentata of the rat. The Journal of Comparative Neurology 169:347–370. https://doi.org/10.1002/cne.901690306

Book: Dendrites (Third edn). Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198745273.001.0001

Book: How does the brain solve the computational problems of spatial navigation? In: Derdikman D, Knierim JJ, editors. Space, Time and Memory in the Hippocampal Formation. Springer. pp. 373–407. https://doi.org/10.1007/978-3-7091-1292-2_14

Laminar origin and septotemporal distribution of entorhinal and perirhinal projections to the hippocampus in the cat. The Journal of Comparative Neurology 224:371–385. https://doi.org/10.1002/cne.902240305

Entorhinal cortex of the monkey: V. Projections to the dentate gyrus, hippocampus, and subicular complex. The Journal of Comparative Neurology 307:437–459. https://doi.org/10.1002/cne.903070308

Anatomical organization of the parahippocampal-hippocampal network. Annals of the New York Academy of Sciences 911:1–24. https://doi.org/10.1111/j.1749-6632.2000.tb06716.x

Specific evidence of low-dimensional continuous attractor dynamics in grid cells. Nature Neuroscience 16:1077–1084. https://doi.org/10.1038/nn.3450

Long-term dynamics of CA1 hippocampal place codes. Nature Neuroscience 16:264–266. https://doi.org/10.1038/nn.3329

Book: Asymptotics of the Logarithm of the Number of Threshold Functions of the Algebra of Logic. Walter de Gruyter.
Decision letter

Gordon J BermanReviewing Editor; Emory University, United States

Michael J FrankSenior Editor; Brown University, United States

Nicolas BrunelReviewer
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
Hippocampal place cells and entorhinal grid cells are crucial elements of the spatial representation system of the brain, but the mechanisms underlying their emergence are still poorly understood. A longstanding hypothesis in the field is that the properties of place cells can be well described as a nonlinear function of a weighted sum of inputs coming from entorhinal grid cells. In this paper, the authors explore the implications of this scenario, in a simplified model with discretized space, where grid cells are part of a discrete set of modules, and each cell has a perfectly periodic firing in space with a period that depends on the module. They compute analytically the number of possible place field arrangements, and the separating capacity, in this scenario, through a very nice extension of the classic Cover calculation for inputs in general position. These calculations show that the number of possible arrangements is much smaller than when inputs are in general position, but that they are more robust.
Decision letter after peer review:
Thank you for submitting your article "Where can a place cell put its fields? Let us count the ways" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Michael Frank as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Nicolas Brunel, PhD (Reviewer #2).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. The reviewers were generally positive about the manuscript, finding that it significantly expands our understanding of the constraints on place field arrangements arising from grid cell inputs, but they would like to see several revisions and clarifications before being able to recommend it for publication. Please see the list of essential revisions below.
Summary:
Hippocampal place cells and entorhinal grid cells are crucial elements of the spatial representation system of the brain, but the mechanisms underlying their emergence are still poorly understood. A long-standing hypothesis in the field is that the properties of place cells can be well described as a nonlinear function of a weighted sum of inputs coming from entorhinal grid cells. In this paper, the authors explore the implications of this scenario in a simplified model with discretized space, where grid cells are part of a discrete set of modules, and each cell has perfectly periodic firing in space with a period that depends on the module. They compute analytically the number of possible place field arrangements, and the separating capacity, in this scenario, through a very nice extension of the classic Cover calculation for inputs in general position. These calculations show that the number of possible arrangements is much smaller than when inputs are in general position, but that the realizable arrangements are more robust.
Essential revisions:
1. The questions that are addressed in the manuscript are interesting mathematically but do not map directly to realistic properties of place cells. The reviewers were concerned that many readers won't understand the limitations. Therefore, the limitations of the approach should be acknowledged and spelled out more clearly. The first question, whether grid cell inputs can produce all possible patterns of place cell activity, is quite detached from biological reality because in the vast majority of these patterns the place cell would fluctuate wildly between on and off states as a function of position, whereas in reality place cells fire sparsely. Importantly, the sparseness is not a conclusion or a prediction of the theory because any degree of sparseness can be easily achieved by varying the threshold. Instead, from the point of view of biological realism, sparseness must be imposed.
The work does consider also patterns that are sparse, having K fields over the whole range of input patterns, where K is small. This question, too, is detached from the reality of place cell firing because place cells would clearly exhibit many firing fields (not just a handful of fields) over the vast range of positions that correspond to all input patterns. Place cells can have multiple firing fields in large continuous environments, and each place cell may have a different field in a significant fraction of small environments. Thus, it is important to consider sparse patterns where the number of firing fields is proportional to the range of positions that are represented by the input patterns. In addition, ideally, it would be interesting to consider this question on a large set of disjoint sets of input patterns, each corresponding individually to a continuous stretch of positions (one environment) instead of one long stretch (or the full range). The two cases considered in the work, of arbitrary (dense) patterns and of extremely sparse patterns, can be thought of as two extremes where it was possible to derive precise results. These results are suggestive of what might happen with more biologically relevant activity patterns, but the limitations should be acknowledged.
2. The reviewers found the discussion on graded receptive fields (lines 429–438) to be unconvincing, and it may convey an incorrect message about graded receptive fields once noise is taken into account. The argument is based on the observation that graded receptive fields can be related to narrow ones by a linear transformation. If this linear transformation is invertible, it does not alter the set of linearly separable patterns. However, the transformation under consideration is a low-pass filter. For all practical purposes, this transformation, which suppresses high-frequency components of the input, is non-invertible. The slightest amount of high-frequency noise in the grid cell inputs would be dramatically amplified by applying the inverse transformation, and would destroy the correspondence with the case of the narrow input vectors. It is perhaps possible to conduct a more thorough analysis with graded receptive fields, either analytically or numerically. If this is beyond what the authors wish to do in this work, the best course of action might be to acknowledge the limitation of the theory and to leave the question of graded receptive fields open for future study.
3. In the model proposed by the authors, all inputs to a hippocampal place cell are grid cells with perfectly periodic firing in space. This is a very idealized setting that is far from reality – many cells in entorhinal cortex are far from having spatially periodic firing, and even in those that exhibit strong periodicity, there are often significant variations in the average firing rate from peak to peak. This leads to the concern that purely periodic inputs might not represent the relevant scenario for hippocampal place cells. While the authors discuss briefly the addition of noisy non-periodic inputs at the end of the Results section (Figures 7C and D), they only discuss the robustness of place fields generated by periodic grid inputs to such noisy inputs, but not the number of arrangements or the separating capacity. The reader is left to wonder how the other results presented in the paper (number of arrangements, separating capacity) are affected by such non-periodic inputs. Are these results still relevant in the presence of realistic heterogeneities?
4. The authors show that beyond the scale of the separating capacity, not all place field arrangements are realizable. Could the authors characterize nonrealizable place field arrangements? It would be nice in particular to see specific examples in simple situations like the (3,4) case discussed in Figure 5. It would be even nicer if one could derive general results on such nonrealizable arrangements, possibly leading to experimental predictions (see also points 3 and 4 below). In addition, it would be nice if the authors could provide nontrivial predictions about the statistical structure of place cells that are due to the fact that place cell responses are formed from a sum of spatially periodic inputs. An obvious prediction is that periodicity should appear at a sufficiently large spatial scale, but can one say something about this spatial scale given current data on grid cell periods? Are these, or other, predictions testable experimentally?
5. Currently available recordings of place cells in large scale environments suggest the statistics of place cells are indistinguishable from a spatial Poisson process (see for instance papers from Albert Lee's lab, in Science (2014) and Cell (2020)). The authors should discuss how their results fit with this picture. It seems in particular that in their model, place fields are consistent with Poisson (in the sense that all possible configurations are possible) on short spatial scales (below the separating capacity), but not on larger spatial scales. Is it possible to characterize deviations from Poissoniality induced by the spatial structure in the inputs?
6. The manuscript (especially the first half) is not particularly easy to read even for a computational neuroscientist, and the general conclusion was that for an audience composed mainly of nontheoreticians, it is rather inaccessible. The results (and the ideas behind the analyses) can potentially be understood by a broader audience, but the authors need to make a substantial communication effort. For example, even the abstract, which should be readily understood by all neuroscientists, takes for granted the meaning of "separating capacity" or "unique input coding range". The abstract should be comprehensible before reading the whole paper (not after). We ask the authors to take care to make sure that their manuscript speaks to a broader audience than those well-versed in the theory behind grid and place cells.
https://doi.org/10.7554/eLife.62702.sa1

Author response
Essential revisions:
1. The questions that are addressed in the manuscript are interesting mathematically but do not map directly to realistic properties of place cells. The reviewers were concerned that many readers won't understand the limitations. Therefore, the limitations of the approach should be acknowledged and spelled out more clearly. The first question, whether grid cell inputs can produce all possible patterns of place cell activity, is quite detached from biological reality because in the vast majority of these patterns the place cell would fluctuate wildly between on and off states as a function of position, whereas in reality place cells fire sparsely. Importantly, the sparseness is not a conclusion or a prediction of the theory because any degree of sparseness can be easily achieved by varying the threshold. Instead, from the point of view of biological realism, sparseness must be imposed.
The work does consider also patterns that are sparse, having K fields over the whole range of input patterns, where K is small. This question, too, is detached from the reality of place cell firing because place cells would clearly exhibit many firing fields (not just a handful of fields) over the vast range of positions that correspond to all input patterns. Place cells can have multiple firing fields in large continuous environments, and each place cell may have a different field in a significant fraction of small environments. Thus, it is important to consider sparse patterns where the number of firing fields is proportional to the range of positions that are represented by the input patterns. In addition, ideally, it would be interesting to consider this question on a large set of disjoint sets of input patterns, each corresponding individually to a continuous stretch of positions (one environment) instead of one long stretch (or the full range). The two cases considered in the work, of arbitrary (dense) patterns and of extremely sparse patterns, can be thought of as two extremes where it was possible to derive precise results. These results are suggestive of what might happen with more biologically relevant activity patterns, but the limitations should be acknowledged.
Thank you for this comment. Indeed, as noted by the reviewer, we have covered two regimes in characterizing the field arrangements realizable by place cells driven by grid-like inputs: in one regime we do so without regard to sparseness of the arrangements (Table 1), and in the other, we consider "ultra-sparse" arrangements (K-sparse, or K fields per cell, where K is a small fixed number), with a small number of fields that does not scale with the number of modules or module periods (and thus with the full range of the code).
We would very much like to generate results in the intermediate regime where place fields are sparse but scale in number proportionally with the full range, as the reviewers note might be the most biologically relevant case. Mathematically, this involves a constraint that is difficult to implement: in the case of counting arrangements, it involves counting the number of Young diagrams with a fixed area.
However, for both non-sparse field arrangements and ultra-sparse field arrangements (K=1,2,3,…), we find that the grid code enables a large number of field arrangements (e.g. relative to one-hot input codes; we have now added a comparison of K-field arrangement counting for grid-like inputs versus one-hot inputs, which we did not have earlier) that are nevertheless a vanishingly small fraction of all arrangements. This leads to our conclusion that the grid code's modular structure enables the formation of many arrangements but simultaneously imposes strong structure on the place field arrangements. Thus, as the reviewer notes, given similar conclusions at the two extremes, we may expect similar qualitative results on structure and richness in the intermediate regime of sparse but not ultra-sparse field arrangements. This will be the basis of future work.
In both our Results and Discussion sections, we now explicitly comment that we consider dense and ultra-sparse field arrangements but do not have analytical results for the intermediate sparse case.
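The reviewer's observation that any degree of sparseness can be imposed by the threshold can be illustrated with a toy sketch (ours, not from the paper's released code; the periods and random weights are arbitrary choices): raising the threshold on a fixed summed grid drive only ever removes fields, so every field count is reachable along a nested family of arrangements.

```python
import numpy as np

periods = (3, 4, 5)
L = int(np.lcm.reduce(periods))            # 60 distinct positions
rng = np.random.default_rng(0)
weights = [rng.random(p) for p in periods]  # fixed grid-to-place weights

# Summed periodic grid drive at each position
s = np.array([sum(w[t % p] for w, p in zip(weights, periods))
              for t in range(L)])

# Sweeping the threshold upward through the drive values: fields can
# only disappear, never appear, so the field count is non-increasing
counts = [(s > theta).sum() for theta in np.sort(s)]
assert all(a >= b for a, b in zip(counts, counts[1:]))
assert counts[-1] == 0                     # above the max drive: no fields
```

This is why sparseness must be imposed rather than predicted: the same weights realize every sparseness level in this nested family.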
2. The reviewers found the discussion on graded receptive fields (lines 429–438) to be unconvincing, and it may convey an incorrect message about graded receptive fields once noise is taken into account. The argument is based on the observation that graded receptive fields can be related to narrow ones by a linear transformation. If this linear transformation is invertible, it does not alter the set of linearly separable patterns. However, the transformation under consideration is a low-pass filter. For all practical purposes, this transformation, which suppresses high-frequency components of the input, is non-invertible. The slightest amount of high-frequency noise in the grid cell inputs would be dramatically amplified by applying the inverse transformation, and would destroy the correspondence with the case of the narrow input vectors. It is perhaps possible to conduct a more thorough analysis with graded receptive fields, either analytically or numerically. If this is beyond what the authors wish to do in this work, the best course of action might be to acknowledge the limitation of the theory and to leave the question of graded receptive fields open for future study.
We thank the reviewers for this comment, which has allowed us to improve our argument for the generalization of our results to graded receptive fields. In particular, we discussed that an invertible convolution applied to the {0,1} codewords would generate graded tuning curves, and because the transformation is invertible, the linear separability of the original {0,1} codewords would remain unchanged post-convolution. The reviewer notes that if, after convolution, the codewords were perturbed by noise, an inverse convolution would produce very different states than the original codewords. First, note that in going from binary to smoothed tuning, there is no sense in which the system is "allowed" to add high-frequency noise to the smoothed tuning curves: low-dimensional continuous attractor dynamics keep the tuning curve shapes fixed to a canonical set of translationally shifted smooth shapes, and perturbations to the shape count as off-manifold perturbations that are rapidly erased. Any high-dimensional, high-frequency, shape-altering noise is projected onto the nearest point on the low-dimensional manifold, resulting at worst in small shifts in the encoded phases of each grid module (the attractor dynamics also collectively maintain the relative phases of all cells within a module). Thus, we should think of the convolved codewords and their relative phases as not subject to noise; the only noise is in collective shifts of the full module phase relative to the actual spatial position. The mapping from internal coding states to positions is not used for the counting arguments, and thus this type of noise is not relevant to our discussion.
Second, the argument for why the convolved codewords possess the same geometry as the unconvolved {0,1} codewords can be made without reference to invertibility of the convolution. If the convolution kernel maintains the sufficient statistic of position phase within each cell and module (and it will do so if the kernel exhibits no periodicity on the scale of the period of each module: thus, it cannot be doubly bumped within a period, or be constant in amplitude across the period), then: (1) the sufficient statistic of each codeword, the phase encoding of position, is maintained; (2) the cells within each module are still equivalent and can be permuted; (3) the code retains its modular structure, lacking permutation invariance of cells across modules; and (4) the module states can be described as updating independently of each other. These properties mean that the qualitative geometry of the convolved code is again the orthogonal product of simplices, with the individual simplices having the same geometry as the original {0,1} codeword simplices. Thus, the counting arguments go through unchanged.
Finally, the effect of the convolution is a rescaling of the sides of the convex polytopes, which will affect the robustness (margins) of the codewords to noise relative to the original {0,1} codewords. We discuss this in the section on margins.
In sum, the counting arguments are not affected by convolution of codewords by kernels that convert {0,1} activations into graded phaseencoding activation profiles. Different encodings of phase will affect the margins and noiserobustness of the resulting field arrangements.
We have replaced our previous argument on the structure of graded gridlike codewords based on invertibility, with the second argument above.
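To illustrate the point numerically, the following toy sketch (ours, not from the paper's code; the smoothing kernel and periods are arbitrary illustrative choices) checks that a realizable single-field arrangement for module periods (3,4) remains linearly separable after each module's one-hot phase code is circularly convolved with a smooth, aperiodic kernel:

```python
import numpy as np

l1, l2, L = 3, 4, 12  # two modules, full range = lcm(3,4)

def circ_smooth(v, kernel):
    # Circular convolution within one module: graded, phase-preserving tuning
    return sum(a * np.roll(v, s) for s, a in enumerate(kernel))

def code(t, kernel=None):
    m1, m2 = np.eye(l1)[t % l1], np.eye(l2)[t % l2]
    if kernel is not None:
        m1, m2 = circ_smooth(m1, kernel), circ_smooth(m2, kernel)
    return np.concatenate((m1, m2))

def perceptron_separable(X, y, epochs=20000):
    # Plain perceptron: finds a separating (w, b) iff one exists
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errs = 0
        for xi, ti in zip(X, y):
            if ti * (w @ xi + b) <= 0:
                w, b, errs = w + ti * xi, b + ti, errs + 1
        if errs == 0:
            return True
    return False

y = np.where(np.arange(L) == 0, 1.0, -1.0)  # realizable single-field arrangement
kernel = (0.6, 0.3, 0.1)                    # smooth, aperiodic within each module
X_sharp = np.array([code(t) for t in range(L)])
X_graded = np.array([code(t, kernel) for t in range(L)])
assert perceptron_separable(X_sharp, y)
assert perceptron_separable(X_graded, y)
```

The kernel here has no zero Fourier component at either period, so it preserves the phase information; a kernel that is constant across a period would collapse the module simplex and break separability.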
3. In the model proposed by the authors, all inputs to a hippocampal place cell are grid cells with perfectly periodic firing in space. This is a very idealized setting that is far from reality – many cells in entorhinal cortex are far from having spatially periodic firing, and even in those that exhibit strong periodicity, there are often significant variations in the average firing rate from peak to peak. This leads to the concern that purely periodic inputs might not represent the relevant scenario for hippocampal place cells. While the authors discuss briefly the addition of noisy non-periodic inputs at the end of the Results section (Figures 7C and D), they only discuss the robustness of place fields generated by periodic grid inputs to such noisy inputs, but not the number of arrangements or the separating capacity. The reader is left to wonder how the other results presented in the paper (number of arrangements, separating capacity) are affected by such non-periodic inputs. Are these results still relevant in the presence of realistic heterogeneities?
Thank you for the opportunity to clarify.
We have shown that the field arrangements that are realizable with grid inputs have bigger margins than if driven by shuffled grid codes or random codes, and thus are more robust to noise (Figure 7a,b). Thus, the existing counting and capacity results will be robust to the addition of noise up to the size of the margins: existing field arrangements will not be destabilized by any noise smaller than these broad margins, and the number of realizable arrangements will therefore not decrease.
Moreover, we have shown (Figure 7cd, filled green violins) that the addition of noise or sparse spatial inputs, in addition to mostly not destroying existing field arrangements, creates new realizable field arrangements: this is because the addition of random inputs to the grid inputs moves the overall input vectors towards more general position. At the same time, however, these additional field arrangements are not stable/robust: their margins are much smaller. We have clarified these points in the manuscript.
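A minimal sketch of this mechanism (ours, not from the paper's code; the noise dimensionality and seed are arbitrary): the arrangement with fields only at locations 1 and 2 for periods (3,4) is not realizable from pure grid inputs, but becomes realizable once random inputs are appended, because the inputs then sit in general position.

```python
import numpy as np

rng = np.random.default_rng(2)
l1, l2, L = 3, 4, 12
ones = np.ones((L, 1))
# Pure grid code for periods (3,4): one-hot phase per module, concatenated
X = np.array([np.concatenate((np.eye(l1)[t % l1], np.eye(l2)[t % l2]))
              for t in range(L)])
y = np.where(np.isin(np.arange(L), [1, 2]), 1.0, -1.0)  # fields at 1 and 2 only

def perceptron_separable(X, y, epochs=5000):
    # Plain perceptron: converges iff a separating (w, b) exists
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errs = 0
        for xi, ti in zip(X, y):
            if ti * (w @ xi + b) <= 0:
                w, b, errs = w + ti * xi, b + ti, errs + 1
        if errs == 0:
            return True
    return False

assert not perceptron_separable(X, y)  # pure grid inputs: not realizable

# Appending six random input dimensions makes the 12 inputs affinely
# independent, so an exact linear fit (hence a separating plane) exists
Xn = np.hstack([X, rng.normal(size=(L, 6))])
wb = np.linalg.lstsq(np.hstack([Xn, ones]), y, rcond=None)[0]
assert (np.sign(np.hstack([Xn, ones]) @ wb) == y).all()
```

The new arrangement is realizable but fragile: the separating plane found this way has no margin guarantee, matching the observation that the extra arrangements have much smaller margins.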
4. The authors show that beyond the scale of the separating capacity, not all place field arrangements are realizable. Could the authors characterize nonrealizable place field arrangements? It would be nice in particular to see specific examples in simple situations like the (3,4) case discussed in Figure 5.
Thank you for this suggestion. We have added examples of nonrealizable place field arrangements in the caption of Figure 3. Geometrically, a 2-field arrangement with positive labels for a pair of vertices that are not adjacent (directly connected by an edge) and negative labels for all the rest is not realizable. Conceptually, there are many unrealizable field arrangements (we know most are unrealizable because realizable ones are a vanishing fraction), including some obvious ones: for the two-module case with coprime periods, one cannot have a field arrangement with fields only at every other multiple of lambda_1 (i.e. a periodic arrangement with period 2*lambda_1). One cannot have a field arrangement with fields only at locations 1 and 2 (two adjacent locations) and nowhere else. This is because for the chosen locations to be above threshold, the periodic nature of the grid drive means that other locations, shifted by multiples of the module periods, will also be above threshold. Given the very large set of unrealizable field arrangements, it is actually more tractable to characterize the structure expected in realizable arrangements – please see the next response.
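The unrealizability of the adjacent-pair example can be made concrete with a short certificate (a sketch we add here, following the (3,4) case): positions 1, 2 and 5, 10 activate the same multiset of grid phases, so no weights can put the first pair above threshold and the second pair below it.

```python
import numpy as np

l1, l2 = 3, 4
L = l1 * l2  # unique coding range = 12

def drive(u, v, t):
    # Perceptron drive at position t through module weights u (period 3), v (period 4)
    return u[t % l1] + v[t % l2]

# Certificate: positions {1, 2} and {5, 10} use the same multiset of grid
# phases (mod 3: {1,2}; mod 4: {1,2}), so their summed drives are equal
# for ANY choice of weights.
rng = np.random.default_rng(0)
for _ in range(100):
    u, v = rng.normal(size=l1), rng.normal(size=l2)
    assert np.isclose(drive(u, v, 1) + drive(u, v, 2),
                      drive(u, v, 5) + drive(u, v, 10))
# Hence drive(1), drive(2) > theta would force drive(5) + drive(10) > 2*theta,
# contradicting drive(5), drive(10) < theta: fields at {1, 2} only are impossible.

# By contrast, a single field at position 0 is realizable:
u = np.array([1.0, 0.0, 0.0])        # boost phase 0 of module 1
v = np.array([1.0, 0.0, 0.0, 0.0])   # boost phase 0 of module 2
theta = 1.5                          # both phases must coincide to cross threshold
fields = [t for t in range(L) if drive(u, v, t) > theta]
assert fields == [0]
```

The same phase-multiset argument generates whole families of unrealizable arrangements for any pair of coprime periods.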
It would be even nicer if one could derive general results on such nonrealizable arrangements, possibly leading to experimental predictions (see also points 3 and 4 below). In addition, it would be nice if the authors could provide nontrivial predictions about the statistical structure of place cells that are due to the fact that place cell responses are formed from a sum of spatially periodic inputs. An obvious prediction is that periodicity should appear at a sufficiently large spatial scale, but can one say something about this spatial scale given current data on grid cell periods? Are these, or other, predictions testable experimentally?
This is a very good question – quantification of what structures are present within the special set of realizable arrangements, which we have counted in this work.
We are in the middle of a separate collaborative theoryexperimental work on this question, and to deal extensively with it is beyond the scope of this already very full paper.
We have seen that grid-driven place field arrangements are highly constrained such that only a tiny fraction of potential field arrangements within or across environments are realizable. Realizable arrangements can be understood intuitively with a simple picture: A place cell could choose its input weights and threshold to produce a field at one location. But because grid-cell inputs are multiply peaked and nonlocal, strengthening weights from grid cells with certain phases and periods to obtain a field at one location means that the place cell will also be strongly driven wherever a similar pattern of inputs recurs in the grid input. This will happen periodically at multiples of the full range L, but given that the separating capacity is set by a much smaller range, $\Sigma$, it follows that there should also be visible structure on this scale.
Specifically, we expect to see echoes of the grid structure in both grid-place relationships and in relationships between place fields: (i) Grid-place relationships: A place cell strongly driven by a grid cell of a certain phase at one location will be more likely to also be driven by that cell at other locations. Thus, we expect an elevation in the conditional probability, given that a place and grid cell have a coincident field, that the next field of that place cell will also coincide with a field from that grid cell. (ii) Place field relationships: The combined drive of multiple grid periods and phases to a place cell makes its responses appear random (Figure, panel B). However, these realizable arrangements will be geometrically constrained in a scaffold, with more regularity in field spacing over the scale of the summed grid module periods than expected from purely random placement. The inter-field interval (IFI) distributions of place fields, if tested along sufficiently long linear tracks with motion and orienting cues but without many spatially localized landmarks, should exhibit peaks that reflect the combination of inter-field intervals [Yoon et al., 2016] in the underlying periodic grid inputs (Figure, panel C).
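A toy simulation (ours, not from the paper's code; the periods are illustrative, not fitted to grid-cell data) shows both effects at once: field placement from a thresholded sum of periodic drives looks irregular locally, yet repeats exactly at the full range and draws its inter-field intervals from a small discrete set.

```python
import numpy as np

periods = (3, 4, 5)                  # toy module periods; real grid periods are larger
L = int(np.lcm.reduce(periods))      # full unique-coding range = 60

rng = np.random.default_rng(1)
weights = [rng.random(p) for p in periods]  # one weight per grid phase

def drive(t):
    return sum(w[t % p] for w, p in zip(weights, periods))

s = np.array([drive(t) for t in range(3 * L)])
theta = np.quantile(s, 0.9)              # threshold for ~10% field coverage
fields = np.flatnonzero(s > theta)       # place-field locations
ifis = np.diff(fields)                   # inter-field intervals

# Field placement repeats exactly at the full range L ...
assert np.allclose(s[:L], s[L:2 * L])
# ... so the IFI multiset is far from generic: only a few distinct values recur
print("fields:", fields[:10], "distinct IFIs:", sorted(set(ifis.tolist())))
```

Locally the field positions look Poisson-like, but the IFI histogram is concentrated on a handful of values set by the module periods, which is the kind of deviation from Poissoniality the response describes.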
5. Currently available recordings of place cells in large scale environments suggest the statistics of place cells are indistinguishable from a spatial Poisson process (see for instance papers from Albert Lee's lab, in Science (2014) and Cell (2020)). The authors should discuss how their results fit with this picture. It seems in particular that in their model, place fields are consistent with Poisson (in the sense that all possible configurations are possible) on short spatial scales (below the separating capacity), but not on larger spatial scales. Is it possible to characterize deviations from Poissoniality induced by the spatial structure in the inputs?
This question is very closely tied to question (4); please see our response above, showing that Poisson-like field distributions can be consistent with periodic input drive, even though structure in the inter-field intervals is visible over similarly short scales. We also show in the proposed new Figure that the inter-field interval distribution quantifies deviations from Poissoniality induced by the structure of the inputs.
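One simple statistic that quantifies such deviations, sketched here as a hedged NumPy illustration with made-up numbers rather than the analysis in the paper: the coefficient of variation (CV) of the inter-field intervals is near 1 for a Poisson process (exponential IFIs) and exactly 0 for a perfectly periodic field comb, so residual periodic structure from the grid inputs would pull the CV below the Poisson value.

```python
import numpy as np

def ifi_cv(ifis):
    """Coefficient of variation of inter-field intervals:
    ~1 for Poisson field placement, 0 for a perfectly periodic field comb."""
    ifis = np.asarray(ifis, dtype=float)
    return ifis.std() / ifis.mean()

rng = np.random.default_rng(1)

# Poisson (random) field placement: IFIs are exponential, so CV is near 1.
poisson_cv = ifi_cv(rng.exponential(scale=20.0, size=5000))

# Perfectly periodic placement: all IFIs identical, so CV is exactly 0.
periodic_cv = ifi_cv(np.full(5000, 20.0))

# Grid-driven arrangements with residual periodicity should fall in between.
assert abs(poisson_cv - 1.0) < 0.1 and periodic_cv == 0.0
```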
6. The manuscript (especially the first half) is not particularly easy to read even for a computational neuroscientist, and the general conclusion was that for an audience composed mainly of non-theoreticians, it is rather inaccessible. The results (and the ideas behind the analyses) can potentially be understood by a broader audience, but the authors need to make a substantial communication effort. For example, even the abstract, which should be readily understood by all neuroscientists, takes for granted the meaning of "separating capacity" or "unique input coding range". The abstract should be comprehensible before reading the whole paper (not after). We ask the authors to take care to make sure that their manuscript speaks to a broader audience than those well-versed in the theory behind grid and place cells.
Thank you very much for this comment. We have significantly edited the full manuscript for clarity, including by improving definitions. We have also: (1) edited the full manuscript, including text, figures, and captions, to make it more accessible and clear; this includes the addition of more conceptual and high-level overviews and interpretive descriptions; (2) added a new figure (Figure 3) early in the Results showing the overall approach of the mathematical computations to follow, to guide readers at a high level through the conceptual steps; (3) added a note about assumptions and limitations as suggested by the reviewers, including about place field sparseness.
https://doi.org/10.7554/eLife.62702.sa2

Article and author information
Author details
Funding
Simons Foundation (Simons Collaboration on the Global Brain)
 Man Yi Yim
 Ila R Fiete
Howard Hughes Medical Institute (Faculty Scholars Program)
 Ila R Fiete
Alfred P. Sloan Foundation (Alfred P. Sloan Research Fellowship FG-2017-9554)
 Thibaud Taillefumier
Office of Naval Research (S&T BAA Award N00014-19-1-2584)
 Ila R Fiete
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This work was supported by the Simons Foundation through the Simons Collaboration on the Global Brain, the ONR, the Howard Hughes Medical Institute through the Faculty Scholars Program to IRF, and the Alfred P. Sloan Research Fellowship FG-2017-9554 to TT. We thank Sugandha Sharma, Leenoy Meshulam, and Luyan Yu for comments on the manuscript.
Senior Editor
 Michael J Frank, Brown University, United States
Reviewing Editor
 Gordon J Berman, Emory University, United States
Reviewer
 Nicolas Brunel
Version history
 Received: September 2, 2020
 Accepted: April 28, 2021
 Accepted Manuscript published: May 24, 2021 (version 1)
 Accepted Manuscript updated: May 26, 2021 (version 2)
 Version of Record published: July 21, 2021 (version 3)
Copyright
© 2021, Yim et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.