Place-cell capacity and volatility with grid-like inputs

Abstract

What factors constrain the arrangement of the multiple fields of a place cell? By modeling place cells as perceptrons that act on multiscale periodic grid-cell inputs, we analytically enumerate a place cell’s repertoire – how many field arrangements it can realize without external cues while its grid inputs are unique – and derive its capacity – the spatial range over which it can achieve any field arrangement. We show that the repertoire is very large and relatively noise-robust. However, the repertoire is a vanishing fraction of all arrangements, while capacity scales only as the sum of the grid periods so field arrangements are constrained over larger distances. Thus, grid-driven place field arrangements define a large response scaffold that is strongly constrained by its structured inputs. Finally, we show that altering grid-place weights to generate an arbitrary new place field strongly affects existing arrangements, which could explain the volatility of the place code.

Introduction

As animals run around in a small familiar environment, hippocampal place cells exhibit localized firing fields at reproducible positions, with each cell typically displaying at most a single firing field (O’Keefe and Dostrovsky, 1971; Wilson and McNaughton, 1993). However, a place cell generates multiple fields when recorded in single large environments (Fenton et al., 2008; Park et al., 2011; Rich et al., 2014) or across multiple environments (Muller et al., 1987; Colgin et al., 2008), including different physical and nonphysical spaces (Aronov et al., 2017).

Within large spaces, the locations seem to be well-described by a random process (Rich et al., 2014; Cheng and Frank, 2011), and across spaces the place-cell codes appear to be independent or orthogonal (Muller et al., 1987; Colgin et al., 2008; Alme et al., 2014), also potentially consistent with a random process. However, a more detailed characterization of possible structure in these responses is both experimentally and theoretically lacking, and we hypothesize that there might be structure imposed by grid cells in place field arrangements, especially when spatial cues are sparse or unavailable.

Our motivation for this hypothesis arises from the following reasoning: grid cells (Hafting et al., 2005) are a critical spatially tuned population that provides inputs to place cells. Their codes are unique over very large ranges due to their modular, multi-periodic structure (Fiete et al., 2008; Sreenivasan and Fiete, 2011; Mathis et al., 2012). They appear to integrate motion cues to update their states and thus reliably generate fields even in the absence of external spatial cues (Hafting et al., 2005; McNaughton et al., 2006; Burak and Fiete, 2006; Burak and Fiete, 2009). Thus, it is possible that in the absence of external cues spatially reliable place fields are strongly influenced by grid-cell inputs.

To generate theoretical predictions under this hypothesis, we examine here the nature and strength of potential constraints on the arrangements of multiple place fields driven by grid cells. On the one hand, the grid inputs are nonrepeating (unique) over a very large range that scales exponentially with the number of grid modules (given roughly by the product of the grid periods), and thus rich (Fiete et al., 2008; Sreenivasan and Fiete, 2011; Mathis et al., 2012); are these unique inputs sufficient to enable arbitrary place field arrangements? On the other hand, this vast library of unique coding states lies on a highly nonlinear, folded manifold that simple read-outs might not be able to discriminate (Sreenivasan and Fiete, 2011). This nonlinear structure is a result of the geometric, periodically repeating structure of individual modules (Stensola et al., 2012); should we expect place field arrangements to be constrained by this structure?

These questions are important for the following reason: a likely role of place cells, and the view we espouse here, is to build consistent and faithful associations (maps) between external sensory cues and an internal scaffold of motion-based positional estimates, which we hypothesize is derived from grid inputs. This perspective is consistent with the classic ideas of cognitive maps (O’Keefe and Nadel, 1978; Tolman, 1948; McNaughton et al., 2006) and also relates neural circuitry to the computational framework of the simultaneous localization and mapping (SLAM) problem for robots and autonomously navigating vehicles (Leonard and Durrant-Whyte, 1991; Milford et al., 2004; Cadena et al., 2016; Cheung et al., 2012; Widloski and Fiete, 2014; Kanitscheider and Fiete, 2017a; Kanitscheider and Fiete, 2017b; Kanitscheider and Fiete, 2017c). We can view the formation of a map as ‘decorating’ the internal scaffold with external cues. For this to work across many large spaces, the internal scaffold must be sufficiently large, with enough unique states and resolution to build appropriate maps.

A self-consistent place-cell map that associates a sufficiently rich internal scaffold with external cues can enable three distinct inferences: (1) allow external cues to correct errors in motion-based location estimation (Welinder et al., 2008; Burgess, 2008; Sreenivasan and Fiete, 2011; Hardcastle et al., 2014), through cue-based updating; (2) predict upcoming external cues over novel trajectories through familiar spaces by exploiting motion-based updating (Sanders et al., 2020; Whittington et al., 2020); and (3) drive fully intrinsic error correction and location inference when external spatial cues go missing and motion cues are unreliable by imposing self-consistency (Sreenivasan and Fiete, 2011).

In what follows, we characterize which arrangements of place fields are realizable based on grid-like inputs in a simple perceptron model, in which place cells combine their multiple inputs and make a decision on whether to generate a field (‘1’ output) or not (‘0’ output) by selecting input weights and a firing threshold (Figure 1A,B). However, in contrast to the classical perceptron results, which are derived under the assumption of random inputs that are in general position (a property related to the linear independence of the inputs), grid inputs to place cells are structured, which adds substantial complexity to our derivations.

Figure 1

The grid-like code and modeling place cells as perceptrons.

(A) Grid-like inputs and a conceptual view of a place cell as a perceptron: each place cell combines its feedforward inputs, including periodic drive from grid cells (responses simplified here to one spatial dimension) of various periods and phases (blue and red cells are from modules with different periods) to generate location-specific activity that might be multiply peaked across large spaces. Can these place fields be arranged arbitrarily? (B) Idealization of a place cell as a perceptron: in discretized 1-D space, the grid-like inputs are discrete patterns that for simplicity we consider to be binary; place fields are assigned at locations where the weighted input sum exceeds a threshold $θ$. A place field arrangement can be considered as a set of binarized output labels (1 for each field, 0 for non-field locations) for the set of input patterns. We count field arrangements over the range of locations where the grid-like inputs have unique states; for two modules with periods $\{2, 3\}$, this range is 6 (the LCM of the grid periods). LCM = least common multiple; GCD = greatest common divisor.

We show analytically that each place cell can realize a large repertoire of arrangements across all possible space where the grid inputs are unique. However, these realizable arrangements are a special and vanishing subset of all arrangements over the same space, suggesting a constrained structure. We show that the capacity of a place cell or spatial range over which all field arrangements can be realized equals the sum of distinct grid periods, a small fraction of the range of positions uniquely encoded by grid-like inputs. Overall, we show that field arrangements generated from grid-like inputs are more robust to noise than those driven by random inputs or shuffled grid inputs.

Together, our results imply that grid-like inputs endow place cells with rich and robust spatial scaffolds, but that these are also constrained by grid-cell geometry. Rigorous proofs supporting all our mathematical results are provided in Appendix 1. Portions of this work have appeared previously in conference abstract form (Yim et al., 2019).

Modeling framework

Place cells as perceptrons

The perceptron model (Rosenblatt, 1958) idealizes a neuron as computing a weighted sum of its inputs ($x_{j} \in \mathbb{R}^{N}$) based on learned input weights ($w \in \mathbb{R}^{N}$) and applying a threshold ($θ$) to generate a binary response that indicates whether the sum is above or below threshold. A perceptron may be viewed as separating its high-dimensional input patterns into two output categories ($y \in \{0, 1\}$) (Figure 2A), with the categorization depending on the weights and threshold so that sufficiently weight-aligned input patterns fall into category 1 and the rest into category 0:

y(x_{j}) = \begin{cases} 1 & \text{if } w \cdot x_{j} - θ > 0, \\ 0 & \text{otherwise}. \end{cases}
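As a minimal sketch of the decision rule above, the thresholded weighted sum can be written in a few lines of Python. The function name `perceptron_output`, the example weights, and the threshold value are illustrative assumptions and do not come from the original text.

```python
def perceptron_output(w, x, theta):
    """Return 1 if the weighted input sum exceeds the threshold, else 0."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum - theta > 0 else 0


# Assumed example: five input cells, hand-picked weights and threshold.
w = [1.0, 0.0, 1.0, 0.0, 0.0]
theta = 1.5

# Weighted sum is 2.0, above threshold: the pattern falls in category 1.
print(perceptron_output(w, [1, 0, 1, 0, 0], theta))
# Weighted sum is 1.0, below threshold: the pattern falls in category 0.
print(perceptron_output(w, [1, 0, 0, 1, 0], theta))
```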

Figure 2

Linear separability, counting dichotomies, and separating capacity for perceptrons.

(A) A set of patterns (locations given by circles) that are assigned positive and negative labels (filled versus open), called a dichotomy of the patterns, is realizable by a perceptron if positive examples can be linearly separated (by a hyperplane) from the rest. The perceptron weights $w$ encode the direction normal to the separating hyperplane, and the threshold sets its distance from the origin. (B) An example with input dimension $N = 3$ (the input dimension is the length of each input pattern vector, which equals the number of input neurons). When placed randomly, $P = 4$ random real-valued patterns optimally occupy space and are said to be in general position (left); these patterns define a tetrahedron and all dichotomies are linearly separable. By contrast, structured inputs may occupy a lower-dimensional subspace and thus not lie in general position (right). This square configuration exhibits unrealizable dichotomies (as in A, bottom). (C) Cover’s results (Cover, 1965): for patterns in general position, the number of realizable dichotomies is $2^{P}$, and thus the fraction of realizable dichotomies relative to all dichotomies is 1, when the number of patterns is smaller than the input dimension ($P < N$). The fraction drops rapidly to zero when the number of patterns exceeds twice the input dimension (the separating capacity).

If each partitioning of inputs into the $\{0, 1\}$ categories is called a dichotomy, then the only dichotomies ‘realizable’ by a perceptron are those in which the inputs are linearly separable – that is, the set of inputs in category 0 can be separated from those in category 1 by some linear hyperplane (Figure 2). Cover’s counting theorem (Cover, 1965; Vapnik, 1998) provides a count of how many dichotomies a perceptron can realize if input patterns are random (more specifically, in general position; Figure 2B). A set of patterns $\{x_{1}, \dots, x_{P}\}$ in an $N$-dimensional space is in general position if no subset of size smaller than $N + 1$ is affinely dependent; in other words, for all $n \leq N$, no subset of $n + 1$ points lies in an $(n - 1)$-dimensional plane. The theorem establishes that for $P \leq N$ patterns, every dichotomy is realizable by a perceptron – this is the perceptron capacity (Figure 2C). For $P = 2 N$, exactly half of the $2^{P}$ possible dichotomies are realizable; when $P ≫ N$ for fixed $N$, the realizable dichotomies become a vanishing fraction of the total (Figure 2C).
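The three regimes described above can be checked numerically with the standard closed form of Cover's count, $C(P, N) = 2 \sum_{k = 0}^{N - 1} \binom{P - 1}{k}$. The sketch below assumes the homogeneous form of the theorem (hyperplanes through the origin); including a threshold adds one effective input dimension, so $N$ would be replaced by $N + 1$. The function names are illustrative.

```python
from math import comb


def cover_count(P, N):
    """Dichotomies of P points in general position in N dimensions that are
    realizable by a hyperplane through the origin (Cover's counting function)."""
    return 2 * sum(comb(P - 1, k) for k in range(N))


def realizable_fraction(P, N):
    """Fraction of all 2**P dichotomies that are realizable."""
    return cover_count(P, N) / 2 ** P


N = 3
for P in (2, 3, 6, 12, 30):
    # Fraction is 1 for P <= N, exactly 1/2 at P = 2N, and shrinks for P >> N.
    print(P, realizable_fraction(P, N))
```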

Here, to characterize the place-cell scaffold, we model a place cell as a perceptron receiving grid-like inputs (Figure 1B). Across space, a particular ‘field arrangement’ is realizable by the place cell if there is some set of input weights and a threshold (Lee et al., 2020) for which its summed inputs are above threshold at only those locations and below it at all others (Figure 1A,B). We call an arrangement of exactly $K$ fields a ‘K-field arrangement.’

In the following, we answer two distinct but related questions: (1) out of all potential field arrangements over the entire set of unique grid inputs, how many are realizable, and how does the realizable fraction differ for grid-like inputs compared to inputs with matched dimension but different structure? This is akin to perceptron function counting (Cover, 1965) with structured rather than general-position inputs and covers constraints within and across environments. We consider all arrangements regardless of sparsity, on one extreme, and $K$ -field (highly sparse) arrangements on the other; these cases are analytically tractable. We expect the regime of sparse firing to interpolate between these two regimes. (2) Over what range of positions is any field arrangement realizable? This is analogous to computing the perceptron-separating capacity (Cover, 1965) for structured rather than general-position inputs.

Although the structured rather than random nature of the grid code adds complexity to our problem, the symmetries present in the code also allow for the computation of some more detailed quantities than typically done for random inputs, including capacity computations for dichotomies with a prescribed number of positive labels (K-field arrangements).

Results

Our approach, summarized in Figure 3, is as follows: we define a mapping from space to grid-like input codes (Figure 3A,B), and a generalization to what we call modular-one-hot codes (Figure 3B). We explore the geometric structure and symmetries of these codes (Figure 3C). Next, we show how separating hyperplanes placed on these structured inputs by place-cell perceptrons permit the realization of some dichotomies (Figure 3D) and thus some spatial field arrangements (Figure 3E), but not others, and obtain mathematical results on the number of realizable arrangements and the separating capacity.

Figure 3

Our overall approach.

(A, B) Locations (indexed by $j$) map onto grid-like coding states ($\{x_{j}\}$, defining the grid-like codebook) through the assignment of spatially periodic responses to grid cells, with different cells in a module having different phases and different modules having different periods. (This example: periods 2, 3.) (C) The patterns in the grid-like codebook form some nonrandom, geometric structure. (D) The geometric structure defines which dichotomies are realizable by separating hyperplanes. (E) A realizable dichotomy in the abstract codebook pattern space, when mapped back to spatial locations, corresponds to a realizable field arrangement. Shown is a place field arrangement realized by the separating hyperplane from (D). Similarly, an unrealizable field arrangement can be constructed by examination of (D): it would consist of, for instance, fields at locations $j = 1, 2$ only (or, e.g., at $j = 3, 4, 6$ only): vertices that cannot be grouped together by a single hyperplane.

The structure of grid-like input patterns

Grid cells have spatially periodic responses (Figure 1A,B). Cells in one grid module exhibit a common spatial period but cover all possible spatial phases. The dynamics of each module are low-dimensional (Fyhn et al., 2007; Yoon et al., 2013), with the dynamics within a module supporting and stabilizing a periodic phase code for position. Thus, we use the following simple model to describe the spatial coding of grid cells and modules: a module with spatial period $λ_{m}$ (in units of the spatial discretization) consists of $λ_{m}$ cells that tile all possible phases in the discretized space while maintaining their phase relationships with each other. Each grid cell’s response is a $\{0, 1\}$-valued periodic function of a discretized 1D location variable (indexed by $j$); cell $i$ in module $m$ fires (has response 1) whenever $(j - i) \bmod λ_{m} = 0$, and is off (has response 0) otherwise (Figure 1B). The encoding of location $j$ across all $M$ modules is thus an $N$-dimensional vector $x_{j}$, where $N = \sum_{m = 1}^{M} λ_{m}$. Nonzero entries correspond to co-active grid cells at position $j$. The total number of unique grid patterns is $L = \mathrm{LCM}(\{λ_{1}, \dots, λ_{M}\})$, which grows exponentially with $M$ for generic choices of the periods $\{λ_{m}\}$ (Fiete et al., 2008). We refer to $L$ as the ‘full range’ of the code. We call the full ordered set of unique coding states $\{x_{j}\}$ the grid-like ‘codebook’ $X_{g}$.
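The codebook construction described above can be sketched directly in Python. This is an illustration under stated assumptions: locations $j$ and cell indices $i$ are zero-based here (the figures appear to index locations from 1), and the names `grid_codebook` and `lcm` are not from the original text.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def grid_codebook(periods):
    """Return the L unique grid-like patterns x_j as tuples of 0/1 entries."""
    L = reduce(lcm, periods, 1)  # full range of the code
    codebook = []
    for j in range(L):
        x_j = []
        for lam in periods:
            # Cell i of this module fires whenever (j - i) mod lam == 0.
            x_j.extend(1 if (j - i) % lam == 0 else 0 for i in range(lam))
        codebook.append(tuple(x_j))
    return codebook


X_g = grid_codebook([2, 3])
print("N =", len(X_g[0]), "L =", len(X_g))
for j, x in enumerate(X_g):
    print(j, "".join(map(str, x)))
```

For periods 2 and 3 this gives $N = 5$ input cells and $L = 6$ distinct patterns, each with exactly one active cell per module.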

Because $X_{g}$ includes all unique grid-like coding states across modules, it includes all possible relative phase shifts or ‘remappings’ between grid modules (Fiete et al., 2008; Monaco et al., 2011). Thus, this full-range codebook may be viewed as the union of all grid-cell responses across all possible space and environments. We assume implicitly that 2D grid modules do not rotate relative to each other across space or environments. Permitting grid modules to differentially rotate would lead to more input pattern diversity, more realizable place patterns, and bigger separating capacity than in our present computations.

The grid-like code belongs to a more general class that we call ‘modular-one-hot’ codes. In a modular-one-hot code, cells are divided into modules; within each module only one cell is allowed to be active (the within-module code is one-hot), but there are no other constraints on the code. With $m = 1, \dots, M$ modules of sizes $λ_{m}$, the modular-one-hot codebook $X_{mo}$ contains $P = \prod_{m = 1}^{M} λ_{m}$ unique patterns, with $P \geq L$ for a corresponding grid-like code. When $\{λ_{1}, \dots, λ_{M}\}$ are pairwise coprime, $P = L$ and the grid-like and modular-one-hot codebooks contain identical patterns. However, even in this case, modular-one-hot codes may be viewed as a generalization of grid-like codes as there is no notion of a spatial ordering in the modular-one-hot codes, and they are defined without referring to a spatial variable.
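The relationship between the two codebooks can be illustrated with a short, self-contained sketch. It compares the pattern sets for a pairwise coprime pair of periods (2, 3) and a non-coprime pair (2, 4). The helper names and the zero-based indexing are assumptions made for illustration.

```python
from functools import reduce
from itertools import product
from math import gcd


def one_hot(size):
    """All one-hot vectors of a given length."""
    return [tuple(1 if i == k else 0 for i in range(size)) for k in range(size)]


def modular_one_hot_codebook(sizes):
    """Every combination of one active cell per module (P = product of sizes)."""
    return [sum(combo, ()) for combo in product(*(one_hot(s) for s in sizes))]


def grid_codebook(periods):
    """Grid-like patterns over the full range L = LCM of the periods."""
    L = reduce(lambda a, b: a * b // gcd(a, b), periods, 1)
    return [tuple(1 if (j - i) % lam == 0 else 0
                  for lam in periods for i in range(lam))
            for j in range(L)]


for periods in ([2, 3], [2, 4]):
    X_g = set(grid_codebook(periods))
    X_mo = set(modular_one_hot_codebook(periods))
    # Grid patterns are always a subset; the sets coincide for coprime periods.
    print(periods, "L =", len(X_g), "P =", len(X_mo), "identical:", X_g == X_mo)
```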

Of our two primary questions introduced earlier, question (1) on counting the size of the place-cell repertoire (the number of realizable field arrangements) depends only on the geometry of the grid coding states, and not on their detailed spatial embedding (i.e., it depends on the mappings in Figure 3B–D, but not on the mapping between Figure 3A,B,D,E). In other words, it does not depend on the spatial ordering of the grid-like coding states and can equivalently be studied with the corresponding modular-one-hot code instead, which turns out to be easier. Question (2), on place-cell capacity (the spatial range $l \leq L$ over which any place field arrangement is realizable), depends on the spatial embedding of the grid and place codes (and on the full chain of Figure 3A–E). For $l < L$, this would correspond to a particular rather than random subset of $X_{mo}$; thus, we cannot use the general properties of this generalized version of the grid-like code.

Alternative codes

In what follows, we will contrast place field arrangements that can be obtained with grid-like or modular-one-hot codes with arrangements driven by alternatively coded inputs. To this end, we briefly define some key alternative codes, commonly encountered in neuroscience, machine learning, or in the classical theory of perceptrons. For these alternative codes, we match the input dimension (number of cells) to the modular-one-hot inputs (unless stated otherwise).

Random codes $X_{r}$, used in the standard perceptron results, consist of real-valued random vectors. These are quite different from the grid-like code and all the other codes we will consider, in that the entries are real-valued rather than $\{0, 1\}$-valued like the rest. A set of up to $N$ random input patterns in $N$ dimensions is linearly independent; thus, they have no structure up to this number.

Define the one-hot code $X_{oh}$ as the set of vectors with a single nonzero element whose value is 1. It is a single-module version of the modular-one-hot code or may be viewed as a binarized version of the random patterns since $N$ patterns in $N$ dimensions are linearly independent. In the one-hot code, all neurons are equivalent, and there is no modularity or hierarchy.

Define the ‘binary’ code $X_{b}$ as all possible binary activity patterns of $N$ neurons (Figure 4B, right). We distinguish ${0, 1}$ -valued codes from binary codes. In the binary code, each cell represents a specific position (register) according to the binary number system. Thus, each cell represents numbers at a different resolution, differing in powers of 2, and the code has no neuron permutation invariance since each cell is its own module; thus, it is both highly hierarchical and modular.

Figure 4

The geometry of structured inputs.

(A) Though the grid-like input patterns in the example Figure 1B are 5D, they have a simplified structure that can be embedded as a 3D triangular prism given by the product of a 2-graph (blue, middle) and 3-graph (red, right) because of the independently updating modular structure of the code. (B) Different codebooks and their geometries. At one end of the spectrum (left), one-hot codes consist of a single module; they are not hierarchical, and their geometry is always an elementary simplex. Grid cells and modular-one-hot codes (middle) have an intermediate level of hierarchy and consist of an orthogonal product of simplices. At the opposite end, the binary code (right) is the most hierarchical, consisting of as many modules as cells; the code has a hypercube geometry: vertices (codewords or patterns) on each face of the hypercube are far from being in general position.

The grid-like and modular-one-hot codes exhibit an intermediate degree of modularity (multiple cells make up a module). If the modules are of a similar size, the code has little hierarchy.

The geometry of grid-like input patterns

We first explore question (1). The modular-one-hot codebook $X_{mo}$ is invariant to permutations of neurons (input matrix rows) within modules, but rows cannot be swapped across modules as this would destroy the modular structure. It is also invariant to permutations of patterns (input matrix columns $x_{j}$). Further, the codebook includes all possible combinations of states across modules, so that modules function as independent encoders. These symmetries are sufficient to define the geometric arrangement of patterns in $X_{mo}$, and the geometry in turn will allow us to count the number of field arrangements that are realizable by separating hyperplanes.

To make these ideas concrete, consider a simple example with module sizes $\{2, 3\}$ (corresponding to the periods in the grid-like code), as in Figure 1B and Figure 3B. Independence across modules gives the code a product structure: the codebook consists of six states that can be obtained as products of the within-module states: $\{10100, 10010, 10001, 01100, 01010, 01001\} = \{10, 01\} \times \{100, 010, 001\}$, where $\{10, 01\}$ and $\{100, 010, 001\}$ are the coding states within the size-2 and size-3 modules, respectively. We represent the two states in the size-2 module by two vertices, connected by an edge, which shows allowed state transitions within the module (Figure 4A, middle). Similarly, the three states in the size-3 module and transitions between them are represented by a triangular graph (Figure 4A, right). The product of this edge graph and the triangle graph yields the full codebook $X_{mo}$. The resulting product graph (Figure 4A, left) is an orthogonal triangular prism with vertices representing the combined patterns.

This geometric construction generalizes to an arbitrary number of modules $M$ and to arbitrary module sizes (periods) $λ_{m}$, $1 \leq m \leq M$: by permutation invariance of neurons within modules, and independence of modules, the patterns of the codebook $X_{mo}$ and thus of the corresponding grid-like codebook $X_{g}$ always lie on the vertices of some convex polytope (e.g., the triangular prism), given by an orthogonal product of $M$ simplices (e.g., the line and triangle graphs). Each simplex represents one of the modules, with simplex dimension $λ_{m} - 1$ for module size (period) $λ_{m}$ (see Place-cell capacity and volatility with grid-like inputs).

This geometric construction provides some immediate results on counting: in a convex polytope, any vertex can be separated from all the rest by a hyperplane; thus, all one-field arrangements are realizable. Pairs of vertices can be separated from the rest by a hyperplane if and only if the pair is directly connected by an edge (Figure 3D). Thus, we can now count the set of all realizable two-field arrangements as the number of pairs of adjacent vertices (i.e., the number of edges) in the polytope. Unrealizable two-field arrangements, which consist geometrically of positive labels assigned to nonadjacent vertices, correspond algebraically to firing fields that are not separated by integer multiples of either of the grid periods (Figure 3D,E).
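Taking the edge criterion above as given, the number of realizable two-field arrangements can be counted in two equivalent ways for pairwise coprime periods: geometrically, as pairs of codewords that differ in exactly one module, and spatially, as pairs of locations whose separation is a multiple of at least one period. The sketch below is illustrative; the function names and the representation of a codeword by its active-cell index in each module are assumptions.

```python
from functools import reduce
from itertools import combinations, product
from math import gcd


def adjacent_pairs(periods):
    """Count polytope edges: codeword pairs that differ in exactly one module."""
    vertices = list(product(*(range(lam) for lam in periods)))
    return sum(1 for a, b in combinations(vertices, 2)
               if sum(s != t for s, t in zip(a, b)) == 1)


def spatial_pairs(periods):
    """Count location pairs separated by a multiple of at least one period."""
    L = reduce(lambda a, b: a * b // gcd(a, b), periods, 1)
    return sum(1 for j, k in combinations(range(L), 2)
               if any((k - j) % lam == 0 for lam in periods))


periods = [2, 3]
total_pairs = 6 * 5 // 2  # all two-field arrangements over the full range L = 6
print(adjacent_pairs(periods), spatial_pairs(periods), total_pairs)
```

For periods 2 and 3, both counts give 9 realizable two-field arrangements out of 15 possible pairs, matching the 9 edges of the triangular prism.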

Moreover, note that the convex polytopes obtained for the grid-like code remain qualitatively unchanged in their geometry if the nonzero activations within each module are replaced by graded tuning curves as follows: convert all neural responses within a module into graded values by convolution along the spatial dimension by a kernel that has no periodicity over distances smaller than the module period (thus, the kernel cannot, for instance, be flat or contain multiple bumps within one module period). This convolution can be written as a matrix product with a circulant matrix of full rank and dimension equal to the full range $L$. Thus, the rank of the convolved matrix ${\tilde{X}}_{g}$ remains equal to the rank of $X_{g}$. Moreover, ${\tilde{X}}_{g}$ maintains the modular structure of $X_{g}$: it has the same within-module permutation invariance and across-module independence. Thus, the resulting geometry of the code – that it consists of convex polytopes constructed from orthogonal products of simplices – remains unchanged. As a result, all counting derivations, which are based on these geometric graphs, can be carried out for $\{0, 1\}$-valued codes without any loss of generality relative to graded tuning curves. (However, the conversion to graded tuning will modify the distances between vertices and thus affect the quantitative noise robustness of different field arrangements, as we will investigate later.) Later, we will also show that the counting results generalize to higher dimensions and higher-resolution phase representations within each module.

Given this geometric characterization of the grid-like and modular-one-hot codes, we can now compute the number of realizable field arrangements it is possible to obtain with separating hyperplanes.

Counting realizable place field arrangements

For modular-one-hot codes (but not for random codes), it is possible to specify any separating hyperplane using only non-negative weights and an appropriate threshold. This is an interesting property in the neurobiological context because it means that the finding that projections from entorhinal cortex to hippocampus are excitatory (Steward and Scoville, 1976; Witter et al., 2000; Shepard, 1998) does not further constrain realizable field arrangements.

It is also an interesting property mathematically, as we explore below: combined with the within-module permutation invariance property of modular-one-hot codes, the non-negative weight observation allows us to map the problem onto Young diagrams (Figure 5), which enables two things: (1) to move from considering separating hyperplanes geometrically, where infinitesimal variations represent distinct hyperplanes even if they do not change any pattern classifications, to considering them topologically, where hyperplane variations are considered as distinct only if they change the classification of any patterns, and (2) to use counting results previously established for Young diagrams.

Figure 5

Counting realizable place field arrangements.

(A) Geometric structure of a modular-one-hot code with two modules of periods $λ_{1} = 5$ and $λ_{2} = 7$. (B–D) Because cells within a module can be freely permuted, we can arrange the cells in order of increasing weights and keep this ordering fixed during counting, without loss of generality. We arrange the cells in modules 1 and 2 along the ordinate and abscissa in increasing weight order (solid blue and red lines, respectively). Because the weights can all be assumed to be non-negative for modular-one-hot codes, the threshold can be interpreted as setting a summed-weight budget: no cell (weight) combinations (purple regions with purple-white circles) below the threshold (diagonal purple line) can contribute to a place field arrangement, while all cell combinations with larger summed weights (unmarked regions) can. Increasing the threshold (from B to C) decreases the number of permitted combinations, as does decreasing the weights (B to D). Weight changes (B, from solid to dashed lines) and threshold changes (C, solid to dashed line), so long as they do not change which lines are to the bottom-left of the threshold, do not affect the number of permitted combinations, reflecting the topological structure of the counting problem. (E) With Young diagrams (each corresponding to B–D above), we extract the purely topological part of the problem, stripping away analog weights to simplify counting. A Young diagram consists of stacks of blocks in rows of nonincreasing width within a grid of a maximum width and height. The number of realizable field arrangements is simply the total number and multiplicity of distinct Young diagrams that can be built of the given height and width (see Appendix 3), which in our case is given by the periods of the two modules.

Let us consider the field arrangements permitted by combining grid-like inputs from two modules, of periods $λ_{1}$ and $λ_{2}$ (Figure 5A). The total number of distinct grid-cell modules is estimated to be between 5 and 8 (Stensola et al., 2012). Further, there is a spatial topography in the projection of grid cells to the hippocampus, such that each local patch of the hippocampus likely receives inputs from 2, and likely no more than 3, grid modules (Witter and Groenewegen, 1984; Amaral and Witter, 1989; Witter and Amaral, 1991; Honda et al., 2012; Witter et al., 2000). We denote cells by their outgoing weights ($w_{i j}$ is the weight from cell $j$ in module $i$) and arrange the weights along the axes of a coordinate space, one axis per module, in order of increasing size (Figure 5B). Since modular-one-hot codes are invariant to permutation of the cells within a module, we can assume a fixed ordering of cells and weights in counting all realizable arrangements, without loss of generality. The threshold (dark purple line) sets which combinations of summed weights can contribute to a place field arrangement: cell combinations below the boundary (purple region) have too small a summed weight and cannot contribute, while all cell combinations with larger summed weights (white region) can (Figure 5B). Decreasing the threshold (from Figure 5B to C) or increasing weights (from Figure 5B,C to D) by a sufficient amount that some cells cross the threshold increases the number of combinations. But changes that do not cause cells to move past the threshold do not change the combinations (Figure 5B, solid versus dashed gray lines).

Young diagrams extract this topological information, stripping away geometric information about analog weights (Figure 5E). A Young diagram consists of stacks of blocks in rows of nonincreasing width, with maximum width and height given in this case by the two module periods, respectively. The number of realizable field arrangements turns out to be equivalent to the total number of Young diagrams that can be built of the given maximum height and width (see Appendix 3). With this mapping, we can leverage combinatorial results on Young diagrams (Fulton and Fulton, 1997; Postnikov, 2006) (commonly used to count the number of ways an integer can be written as a sum of non-negative integers).
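The mapping from weights and threshold to a Young diagram can be illustrated with a minimal sketch. The weights and thresholds below are hypothetical values chosen only for illustration (periods 3 and 4), and the helper name `blocked_row_widths` is ours, not the paper's.

```python
def blocked_row_widths(w1, w2, theta):
    """For each module-1 cell (in increasing weight order), count the module-2
    cells whose summed weight with it stays at or below the threshold."""
    w1, w2 = sorted(w1), sorted(w2)
    return [sum(1 for b in w2 if a + b <= theta) for a in w1]


# Hypothetical integer weights for two modules of periods 3 and 4.
w1 = [0, 3, 6]
w2 = [0, 2, 4, 6]

widths = blocked_row_widths(w1, w2, theta=9)
print(widths)  # [4, 4, 2]

# Rows of the blocked region never widen as the module-1 weight grows:
# this nonincreasing profile is what makes the region a Young diagram.
assert all(a >= b for a, b in zip(widths, widths[1:]))

# A threshold shift that moves no cell combination across the boundary
# leaves the diagram (and hence the field arrangement) unchanged ...
assert blocked_row_widths(w1, w2, theta=9.5) == widths
# ... whereas a shift that does cross a combination changes it.
print(blocked_row_widths(w1, w2, theta=8))  # [4, 3, 2]
```

Only the row-width profile matters for counting, which is why the analog weight values can be discarded.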

As a result, the total number of separating hyperplanes (K-field arrangements for all $K$ ) across the full range $L$ can be written exactly as (see Appendix 3):

N_{λ_{1}, λ_{2}} = \sum_{k = 0}^{\min (λ_{1}, λ_{2})} (k!)^{2} S_{k + 1}^{(λ_{1} + 1)} S_{k + 1}^{(λ_{2} + 1)} = B_{λ_{2}}^{(- λ_{1})},

where $S_{k}^{(n)}$ are Stirling numbers of the second kind and $B_{k}^{(n)}$ are the poly-Bernoulli numbers (Postnikov, 2006; Kaneko, 1997). Assuming that the two periods have a similar size $(λ_{1} \approx λ_{2} \equiv λ)$ , this number scales asymptotically as (de Andrade et al., 2015):

N_{λ, λ} = B_{λ}^{(- λ)} = (\frac{1}{\log 2 \sqrt{1 - \log 2}} + o (1)) \frac{(2 λ)!}{(2 \log 2)^{2 λ}} \sim λ^{2 λ} .

Thus, the number of realizable field arrangements with $\sim λ^{2}$ distinct modular-one-hot input patterns in a $2 λ$ -dimensional space grows nearly as fast as $λ^{2 λ}$ (Table 1, row 2, columns 1–3). The total number of dichotomies over these input patterns scales as $2^{λ^{2}}$. Thus, while the number of realizable arrangements over the full range is very large, it is a vanishing fraction of all potential arrangements (Table 1, row 2, column 4).
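For small periods, the exact two-module count can be evaluated directly from the Stirling-number sum. The sketch below does this and cross-checks it against a brute-force enumeration. The brute-force criterion (rows of the binary table must be nested, giving a staircase shape) is our reading of the Young-diagram picture and should be treated as an assumption; the function names are ours.

```python
from itertools import product
from math import factorial


def stirling2(n, k):
    """Stirling number of the second kind S(n, k), via the standard recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def count_arrangements(l1, l2):
    """Evaluate the Stirling-number sum for two modules of periods l1 and l2."""
    return sum(
        factorial(k) ** 2 * stirling2(l1 + 1, k + 1) * stirling2(l2 + 1, k + 1)
        for k in range(min(l1, l2) + 1)
    )


def brute_force(l1, l2):
    """Count l1 x l2 binary tables whose rows are pairwise nested.
    ASSUMPTION: this staircase criterion is our interpretation, not the paper's code."""
    total = 0
    for rows in product(range(2 ** l2), repeat=l1):
        # Rows are bitmasks; a & b equals a or b exactly when one contains the other.
        if all((a & b) in (a, b) for a in rows for b in rows):
            total += 1
    return total


print(count_arrangements(3, 4))  # 1066, out of 2**12 = 4096 dichotomies
print(count_arrangements(3, 3), brute_force(3, 3))  # 230 230
```

For periods {3, 4} this gives 1066 realizable arrangements over the full range of 12 locations, already only about a quarter of all dichotomies.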

Table 1

Number and fraction of realizable dichotomies with binary, modular-one-hot ( $M = 2$ modules) and one-hot input codes with the same input cell budget ( $N = 2 λ$ ).

	# cells	# input patterns (L)	# linearly separable dichotomies	Fraction linearly separable
Binary	$2 λ$	$2^{2 λ}$	$2^{2 λ^{2}}$	$2^{2 λ^{2} - 2^{2 λ}}$
(binary vs. modular-one-hot)	=	≫	≫	≪
Modular-one-hot	$2 λ$	$λ^{2}$	${(\frac{λ}{e \log (2)})}^{2 λ}$	$2^{2 λ \log (λ) - λ^{2}}$
(modular-one-hot vs. one-hot)	=	≫	≫	≪
One-hot	$2 λ$	$2 λ$	$2^{2 λ}$	1

If $M \geq 3$ modules were to contribute to each place field’s response, then all realizable field arrangements still would correspond to Young diagrams; however, not all diagrams would correspond to realizable arrangements. Thus, counting Young diagrams would yield an upper bound on the number of realizable field arrangements but not an exact count (see Appendix 3). The latter limitation is not a surprise: Due to the structure of the grid-like code (a product of simplices), the enumeration of realizable dichotomies with arbitrarily many input modules is expected to be at least as challenging as that of Boolean functions. Counting the number of linearly separable Boolean functions of arbitrary (input) dimension (Peled and Simeone, 1985; Hegedüs and Megiddo, 1996) is hard.

Nevertheless, we can provide an exact count of the number of realizable $K$ -dichotomies for arbitrarily many input modules $M$ if $K$ is small ( $K = 1, 2, 3$ and 4). This may be biologically relevant since place fields tend to fire sparsely even on long tracks and across environments. In this case, the number $N_{K}$ of realizable small- $K$ field arrangements scales as (the exact expression is derived analytically in Appendix 3)

N_{K} \sim M^{K - 1} λ^{M + K - 1} .

The scaling approximation becomes more accurate for periods that are large relative to the spatial discretization (see Appendix 3). Since the total number of K-dichotomies scales as $λ^{M K}$ , the fraction of realizable K-dichotomies scales as ${(M / λ)}^{K - 1} λ^{- (M - 1)}$ , which for $λ ≫ 1, λ > M$ vanishes as a power law as soon as $M > 1$ .

We can compare this result with the number of K-field arrangements realizable by one-hot codes. Since any arrangement is realizable with one-hot codes, it suffices to simply count all K-field arrangements. The full range of a one-hot code with $M λ$ cells is $M λ$ , thus the number of realizable K-field arrangements is $N_{K} = \binom{M λ}{K} \sim (M λ)^{K}$ , where the last scaling holds for $K ≪ M λ$ . In short, a one-hot code enables $\sim M^{K} λ^{K}$ arrangements, while the corresponding modular-one-hot code with $M λ$ cells enables $\sim M^{K - 1} λ^{K + M - 1}$ field arrangements, for a ratio $λ^{M - 1} / M ≫ 1$ of realizable field arrangements with modular-one-hot versus one-hot codes. Once again, as in the case where we counted arrangements without regard to sparseness, the grid-like code enables far more realizable K-field arrangements than one-hot codes.
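As a quick numerical illustration of this ratio, take the values $M = 2$, $λ = 10$, and $K = 2$ (our choice, purely for concreteness) and substitute them into the two scaling expressions:

```latex
% Illustrative values (assumed, not from the text): M = 2, \lambda = 10, K = 2.
N_{K}^{\text{modular}} \sim M^{K-1} \lambda^{M+K-1} = 2 \cdot 10^{3} = 2000,
\qquad
N_{K}^{\text{one-hot}} \sim (M\lambda)^{K} = 20^{2} = 400,
\qquad
\frac{N_{K}^{\text{modular}}}{N_{K}^{\text{one-hot}}} = \frac{\lambda^{M-1}}{M} = \frac{10}{2} = 5 .
```

These are leading-order scalings rather than exact counts, so the numbers indicate orders of magnitude only.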

In summary, place cells driven by grid inputs can achieve a very large number of unique coding states that grows exponentially with the number of modules. We have derived this result for $M = 2$ and all K-field arrangements, on the one hand, and for arbitrary $M$ but ultra-sparse (small- $K$ ) field arrangements, on the other. It is difficult to obtain an exact result for sparse field arrangements for which $K$ is a small but finite fraction of $L$ ; however, we expect that this regime should interpolate between the other two, and it will be interesting and important for future work to shed light on it. In all cases, the number of realizable arrangements is large but a vanishingly small fraction of all arrangements, and thus forms a highly structured subset. This suggests that place cells, when driven by grid-cell inputs, can form a very large number of field arrangements that seem essentially unrestricted, but individual cells actually have little freedom in where to place their fields.

Comparison with other input patterns

How does the number of realizable place field arrangements differ for input codes with different levels of modularity and hierarchy? We directly compare codes with the same neuron budget (input dimension $N$ ) by taking $N = M λ$ , where for simplicity, we set $λ_{i} = λ$ for all modules in the modular-one-hot codes. Because the modular-one-hot codes include all permutations of states in each module, the number of unique input states with equal-sized modules still equals the product of periods $L = {(N / M)}^{M} = λ^{M}$ , as when the periods are different and coprime. The one-hot code generates far fewer distinct input patterns ( $L = N = M λ$ ) than the modular-one-hot code, which in turn generates fewer input patterns than the binary code ( $L = 2^{N} = 2^{M λ}$ ) (Table 1, column 2). This is due to the greater expressive power afforded by modularity and hierarchy.

Next, we compare results across codes for $M = 2$ , the case for which we have an explicit formula counting the total number of realizable field arrangements for any $K$ , and which is also best supported by the biology.

How many dichotomies are realizable with these inputs? As for the modular-one-hot codes, the patterns of $X_{oh}$ and $X_{b}$ fall on the vertices of a convex polytope. For $X_{oh}$ , that polytope is just a $(N - 1)$ -dimensional simplex (Figure 4C, left), thus any subset of $K$ vertices ( $1 \leq K \leq N$ ) lies on a $(K - 1)$ -dimensional face of the simplex and is therefore a linearly separable dichotomy. Thus, all $2^{N}$ dichotomies of $X_{oh}$ are realizable and the fraction of realizable dichotomies is 1 (Table 1, columns 3 and 4). For $X_{b}$ , the polytope is a hypercube; it therefore consists of square faces, a prototypical configuration of points not in general position (not linearly separable, Figure 2B and Figure 4, right) even when the number of patterns is small relative to the input dimension (number of cells). Counting the number of linearly separable dichotomies on vertices of a hypercube (also called linear Boolean functions) has attracted much interest (Peled and Simeone, 1985; Hegedüs and Megiddo, 1996). It is an NP-hard combinatorial problem, and no general closed-form count is known. However, in the limit of large dimension ( $N \to \infty$ ), the number of linearly separable dichotomies scales as $2^{N^{2} / 2}$ (Zuev, 1989), a much larger number than for one-hot inputs (Table 1, column 3). Even so, this number is a strongly vanishing fraction of all $2^{2^{N}}$ hypercube dichotomies (Table 1, column 4).

For modular-one-hot codes with $M$ modules, the polytopes contain $M$ -dimensional hypercubes, and thus not all patterns are in general position. We determined earlier that the total number of realizable dichotomies with $M = 2$ modules scales as $λ^{2 λ}$ , permitting a direct comparison with the one-hot and binary codes (Table 1, row 2).
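To get a feel for the magnitudes behind Table 1, the leading-order expressions can be evaluated in log2 units for a concrete period. The sketch below is illustrative only: the value $λ = 10$ and the function name `log2_counts` are our assumptions, and the formulas are the asymptotic scalings from the table, not exact counts.

```python
from math import e, log, log2


def log2_counts(lam):
    """Return {code: (log2 #patterns, log2 #realizable, log2 fraction)} from the
    leading-order expressions of Table 1 with N = 2 * lam input cells."""
    n_cells = 2 * lam
    rows = {
        # Binary: 2^N patterns, about 2^(N^2 / 2) separable dichotomies.
        "binary": (n_cells, n_cells ** 2 / 2),
        # Modular-one-hot (M = 2): lam^2 patterns, (lam / (e log 2))^(2 lam) dichotomies.
        "modular-one-hot": (2 * log2(lam), 2 * lam * log2(lam / (e * log(2)))),
        # One-hot: N patterns, all 2^N dichotomies realizable.
        "one-hot": (log2(n_cells), n_cells),
    }
    # log2(fraction) = log2(#realizable) - #patterns, since there are 2^(#patterns) dichotomies.
    return {name: (lp, lr, lr - 2 ** lp) for name, (lp, lr) in rows.items()}


for name, (lp, lr, lf) in log2_counts(10).items():
    print(f"{name:16s} log2(patterns)={lp:8.2f}  log2(realizable)={lr:8.2f}  log2(fraction)={lf:12.2f}")
```

Working in log2 avoids overflow for the binary code, whose total number of dichotomies is $2^{2^{N}}$.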

Finally, we may compare grid-like codes with random (real-valued) codes, which are the standard inputs for the classical perceptron results. For a fixed input dimension, it is possible to generate infinitely many real-valued patterns, unlike the finite number achievable by $\{0, 1\}$ -valued codes. We thus construct a random codebook $X_{r}$ with the same number, $P = λ^{2}$ , of input patterns as the modular-one-hot code. We then determine the input dimension $N$ required to obtain the same number of realizable field arrangements as the grid-like code. The number of realizable dichotomies of the random code with $P ≫ N$ patterns scales as $P^{N} \sim λ^{2 N}$ according to an asymptotic expansion of Cover’s function counting theorem (Cover, 1965). Matching this number to $\sim λ^{2 λ}$ , the number of realizable field arrangements with a modular-one-hot code (of two modules of size $\sim λ$ each), requires $N \sim λ$ . This is a comparable number of input cells in both codes, which is an interesting result because, unlike for random codes, the grid-like input patterns are not in general position, the states are confined to be $\{0, 1\}$ -valued, and the grid input weights can be confined to be non-negative.
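The matching argument can be tried on a small case using the exact form of Cover's counting function, $C(P, N) = 2 \sum_{k = 0}^{N - 1} \binom{P - 1}{k}$ (zero threshold, points in general position). The sketch below is a hedged illustration: the helper names are ours, and the target value 230 is the two-module count for periods (3, 3) obtained from the Stirling-number formula.

```python
from math import comb


def cover_count(p, n):
    """Cover's count of linearly separable dichotomies of p points in general
    position in n dimensions (zero threshold)."""
    return 2 * sum(comb(p - 1, k) for k in range(n))


def matching_dimension(p, target):
    """Smallest input dimension n at which the random-code count reaches `target`.
    Requires target <= 2**p, otherwise the count can never reach it."""
    if target > 2 ** p:
        raise ValueError("target exceeds the total number of dichotomies")
    n = 1
    while cover_count(p, n) < target:
        n += 1
    return n


# lam = 3: P = lam**2 = 9 patterns; grid-like count for periods (3, 3) is 230.
print(matching_dimension(9, 230))  # 5, comparable to the 2 * lam = 6 grid cells
```

For such small periods the asymptotic statement $N \sim λ$ holds only roughly, but the required dimension is indeed of the same order as the number of grid cells.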

In sum, the more modular a code, the larger the set of realizable field arrangements, but these are also increasingly special subsets of all possible arrangements and are strongly structured by the inputs, with far from random or arbitrary configurations. Modular-one-hot codes are intermediate in modularity. Therefore, grid-driven place-cell responses occupy a middle ground between pattern richness and constrained structure.

Place-cell-separating capacity

We now turn to question (2) from above: what is the maximal range of locations, $l^{*}$ , over which all field arrangements are realizable? Once we reference a spatial range, the mapping of coding states to spatial locations matters (specifically, the fact that locations in the range are spatially contiguous matters, but given the fact that the code is translationally invariant [Fiete et al., 2008], the origin of this range does not). We thus call $l^{*}$ the ‘contiguous-separating capacity’ of a place cell (though we will refer to it as separating capacity, for short); it is the analogue of Cover’s separating capacity (Cover, 1965), but for grid-like inputs with the addition of a spatial contiguity constraint.

We provide three primary results on this question. (1) We establish that for grid-structured inputs, the separating capacity $l^{*}$ equals the rank $R$ of the input matrix. (2) We establish analytically a formula for the rank $R$ of grid-like input matrices with integer periods and generalize the result to real-valued periods. (3) We show that this rank, and thus the separating capacity for generic real-valued module periods, asymptotically approaches the sum $Σ \equiv \sum_{m = 1}^{M} λ_{m}$ . Our results are verified by numerical simulation and counting (proofs provided in Supporting Information Appendix).

We begin with a numerical example, using periods {3,4} (Figure 6A): the full range is $L = 12$ , while we see numerically that the contiguous-separating capacity is $l^{*} = 6$ . Although the separating capacity with grid-structured inputs is smaller than with random inputs, it is notably not much smaller (Figure 6B, black versus cyan curves), and it is actually larger than for random inputs if the read-out weights are constrained to be non-negative (Figure 6B, pink curves). Later, we will further show that the larger random-input capacity of place cells with unrestricted weights comes at the price of less robustness: the realizable fields have smaller margins. Next, we analytically characterize the separating capacity of place cells with grid-like inputs.

Figure 6

Place-cell-separating capacity.

(A) Fraction of K-field arrangements that are realizable with grid-like inputs as a function of range ( $L$ indicates the full range; in this example, grid periods are $\{3, 4\}$ and $L = 12$ ). (B) Fraction of realizable field arrangements (summed over $K$ ) as a function of range for grid cells (black); for random inputs, range refers to number of input patterns (solid cyan: random with matching input dimension; open/dashed cyan: random with input dimension equal to rank of the grid-like input matrix; dark teal: same as open cyan, but with weights constrained to be non-negative, as for grid-like inputs). With the non-negative weight constraint for random inputs, different specific input configurations produce quite different results, introducing considerable variability in separating capacity (unlike the unconstrained random input case or the grid code case for which results are exact rather than statistical). (C) The grid code is generated by iterated application of a phase-shift operator as a function of one-step updates in position over a contiguous 1D range. This feature of the code leads to a separating capacity that achieves its optimal value, given by the rank of the input matrix. (D) Separating capacity as a function of the sum of module periods for real-valued periods (randomly drawn from $λ_{i} \in [3, 20]$ with $M \in \{2, 3, 4, 5, 6\}$ , 100 realizations), showing the quality of the integer approximation at different resolutions. Integer approximations to the real-value periods at successively finer resolutions quickly converge, with results from $q = 2$ and $q = 4$ nearly indistinguishable from each other. Inset: ratio of separating capacity to sum of periods ( $R_{re}^{q} / Σ$ as a function of resolution $q$ quickly approaches 1 from below as $q$ increases).
(**E, F**) Capacity results generalize to multidimensional spatial settings: (E) in 2D, grid-cell-activity patterns lie on a hexagonal lattice (all circles of one color mark the activity locations of one grid cell). For grid periods $\{2, 3\}$ , this code utilizes 4 two-periodic cells and 9 three-periodic cells, respectively. (F) Full range of the 2D grid-like code from (E). The set of contiguous locations over which any place field arrangement is realizable (the 2D separating capacity) is shown in gray.

Separating capacity equals rank of grid-like inputs

For inputs in general position, the separating capacity equals the rank of the input matrix (plus 1 when the threshold is allowed to be nonzero), and the rank equals the dimension (number of cells) of the input patterns – the input matrix is full rank. When inputs are in general position, all input subsets of size equaling the separating capacity have the same rank. But when input patterns are not in general position, some subsets can have smaller ranks than others even when they have the same size. Thus, when input patterns are not in general position the separating capacity is only upper bounded by the rank of the full input matrix. In turn, the rank is only upper bounded by the number of cells (the input matrix need not be full rank).

For the grid-like code, all codewords can be generated by the iterated application of a linear operator $J$ to a single codeword: a simultaneous one-unit phase shift by a cyclic permutation in each grid module is such an operator $J$ , which can be represented by a block-form permutation matrix. The sequence $x, J x, J^{2} x, \dots J^{m} x$ of patterns generated by applying $J$ to a grid-like codeword $x$ with the same module structure represents $m$ contiguous locations (Figure 6C).

The separating capacity for inputs generated by iterated application of the same linear operation saturates its bound by equaling the rank of the input pattern matrix. Since a code $x, J x, J^{2} x, J^{3} x, \dots$ , generated by some linear operator $J$ with starting codeword $x$ is translation invariant, the number of dimensions spanned by these patterns strictly increases until some value $l$ , after which the dimension remains constant. By definition, $l$ is therefore the rank $R$ of the input pattern matrix. It follows that any contiguous set of $l = R$ patterns is linearly independent, and thus in general position, which means that the separating capacity of such a pattern matrix is $R$ .
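The saturation of the spanned dimension can be checked numerically for the periods {3, 4} example. The sketch below is a minimal illustration with helper names of our own choosing: it builds the modular-one-hot patterns for contiguous positions and computes their exact rank by Gaussian elimination over the rationals, so no floating-point tolerance is involved.

```python
from fractions import Fraction


def grid_code(periods, n_positions):
    """Modular-one-hot patterns for contiguous positions 0..n_positions-1
    (one active cell per module, at phase x mod period)."""
    return [
        [int(x % lam == phase) for lam in periods for phase in range(lam)]
        for x in range(n_positions)
    ]


def rank(vectors):
    """Exact rank of a list of vectors by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


periods = (3, 4)
ranks = [rank(grid_code(periods, l)) for l in range(1, 13)]
print(ranks)  # [1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6]
```

The dimension grows by one per added location up to 6 and then stays constant, in line with the numerically observed capacity of 6 for these periods.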

For place cells, it follows that whenever $l \leq R$ , with $R$ the rank of the grid-like input matrix, all field arrangements are realizable, while for any $l > R$ , there will be nonrealizable field arrangements (Supporting Information Appendix). Therefore, the contiguous-separating capacity for place cells is $l^{*} = R$ . This is an interesting finding: the separating capacity of a place cell fed with structured grid-like inputs approaches the same capacity as if fed with general-position inputs of the same rank. Next, we compute the rank $R$ for grid-like inputs under increasingly general assumptions.

Grid input rank converges to sum of grid module periods

Integer periods

For integer-valued periods $λ_{m} (1 \leq m \leq M)$ , the rank of the matrix consisting of the multi-periodic grid-like inputs can be determined through the inclusion-exclusion principle (see Section B.4):

R_{int} (λ_{1}, \dots, λ_{M}) = \sum_{i = 1}^{M} λ_{i} + \sum_{k = 2}^{M} {(- 1)}^{k - 1} \sum_{i = 1}^{\binom{M}{k}} GCD (S_{k}^{i}),

where $S_{k}^{i}$ is the ith of the k-element subsets of $\{λ_{1}, \dots, λ_{M}\}$ . To gain some intuition for this expression, note that if the periods were pairwise coprime, all the GCDs would be 1 and this formula would quite simply produce $R_{copr} (λ_{1}, \dots, λ_{M}) = Σ - M + 1$ , where $Σ$ is defined as the sum of the module periods. If the periods are not pairwise coprime, the rank is reduced based on the set of common factors, as in the formula above, which satisfies the following inequality: $Σ - \sum_{i < j} GCD (λ_{i}, λ_{j}) \leq R_{int} (λ_{1}, \dots, λ_{M}) \leq Σ$ . When the periods are large ( $λ ≫ 1$ ), the rank approaches $Σ$ . Large integers ( $λ ≫ 1$ ) evenly spaced or uniformly randomly distributed over some range tend not to have large common factors (Cesaro, 1881). As a result, even for non-coprime periods, the rank scales like and approaches $Σ$ (see below for more elaboration).
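The inclusion-exclusion formula is straightforward to evaluate. The following sketch (function name `rank_int` is ours) implements it with the standard library and tries it on a few period sets, including coprime and non-coprime cases.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def rank_int(periods):
    """Inclusion-exclusion rank formula for integer grid periods:
    sum of periods, minus pairwise GCDs, plus triple GCDs, and so on."""
    total = sum(periods)
    for k in range(2, len(periods) + 1):
        sign = (-1) ** (k - 1)
        total += sign * sum(reduce(gcd, subset) for subset in combinations(periods, k))
    return total


print(rank_int((3, 4)))     # 6  = 7 - 1 (coprime pair)
print(rank_int((3, 5, 7)))  # 13 = 15 - 3 + 1 (pairwise coprime: sum - M + 1)
print(rank_int((4, 6)))     # 8  = 10 - 2 (shared factor 2)
print(rank_int((31, 43)))   # 73 = 74 - 1
```

The first value agrees with the capacity of 6 found numerically for periods {3, 4}.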

Real-valued periods

Actual grid periods are real- rather than integer-valued, but with some finite resolution. To obtain an expression for this case, consider the sequence of ranks $R_{re}^{q}$ defined as

R_{re}^{q} (λ_{1}, \dots, λ_{M}) = R_{int} (⌊ q λ_{1} ⌋, \dots, ⌊ q λ_{M} ⌋),

where $⌊ \cdot ⌋$ denotes the floor operation, $q$ is an effective resolution parameter that takes integer values (the larger $q$ , the finer the resolution of the approximation to a real-valued period), and the periods $0 < λ_{1} < \dots < λ_{M}$ are real numbers. The rank of the grid-like input matrix with real-valued periods is given by $\lim_{q \to \infty} R_{re}^{q} (λ_{1}, \dots, λ_{M}) / q$ , if this limit exists. A finer resolution (higher $q$ ) corresponds to representing phases with higher resolution within each module, and thus intuitively to scaling the number of grid cells in each module by $q$ .

Suppose that the periods are drawn uniformly from an interval of the reals, which we take without loss of generality to be $(0, 1)$ . Then the values $⌊ q λ_{1} ⌋, \dots, ⌊ q λ_{M} ⌋$ are integers in $\{1, \dots, q\}$ and as above we have that $0 \leq q Σ - R_{re}^{q} (λ_{1}, \dots, λ_{M}) \leq \sum_{i < j} GCD (⌊ λ_{i} q ⌋, ⌊ λ_{j} q ⌋)$ . In the infinite resolution limit ( $q \to \infty$ ), the probability that $GCD (⌊ λ_{i} q ⌋, ⌊ λ_{j} q ⌋) = g$ scales asymptotically as $1 / g^{2}$ , independent of $q$ (Cesaro, 1881), which means that large randomly chosen integers tend not to have large common factors. This implies that with probability 1, the limit $\lim_{q \to \infty} R_{re}^{q} (λ_{1}, \dots, λ_{M}) / q$ is well-defined and equals $Σ$ , the sum of the input grid module periods.

When assessed numerically at different resolutions ( $q$ ), the approach of the finite-resolution rank to the real-valued grid period rank is quite rapid (Figure 6D). Thus, the separating capacity does not depend sensitively on the precision of the grid periods. It is also invariant to the resolution with which phases are represented within each module.
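The convergence with resolution can be reproduced in a few lines. In this sketch the periods $e$ and $π$ are hypothetical values chosen only because they are generic real numbers, and the helper names are ours; the rank formula is the integer inclusion-exclusion expression applied to the floored, rescaled periods.

```python
from functools import reduce
from itertools import combinations
from math import e, floor, gcd, pi


def rank_int(periods):
    """Inclusion-exclusion rank formula for integer periods."""
    total = sum(periods)
    for k in range(2, len(periods) + 1):
        sign = (-1) ** (k - 1)
        total += sign * sum(reduce(gcd, subset) for subset in combinations(periods, k))
    return total


def rank_real(periods, q):
    """Finite-resolution rank R_re^q, rescaled by q for comparison with the sum of periods."""
    return rank_int([floor(q * lam) for lam in periods]) / q


periods = (e, pi)  # hypothetical real-valued periods, for illustration only
for q in (1, 10, 100, 1000):
    print(q, rank_real(periods, q), round(sum(periods), 4))
```

The rescaled rank stays below the sum of the periods (about 5.86 here) and closes most of the gap within the first few resolution steps.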

In summary, the place-cell-separating capacity with real-valued grid periods and high-resolution phase representations within each module equals the rank of the grid-like input matrix, which itself approaches $Σ$ , the sum of the module periods. Thus, a place cell can realize any arrangement of fields over a spatial range given by the sum of module periods of its grid inputs.

It is interesting that the contiguous-separating capacity of a place cell fed with grid-like inputs not in general position approaches the same capacity as if fed with general-position inputs of the same rank. On the other hand, the contiguous-separating capacity is very small compared to the total range over which the input grid patterns are unique: since each local region of hippocampus receives input from 2 to 3 modules (Witter and Groenewegen, 1984; Amaral and Witter, 1989; Witter and Amaral, 1991; Witter et al., 2000; Honda et al., 2012), the range over which any field arrangement is realizable is at most 2–3 times the typical grid period. By contrast, the total range $L$ of locations over which the grid inputs provide unique codes scales as the product of the periods. The result implies that once field arrangements are freely chosen in a small region, they impose strong constraints on a much larger overall region and across environments. We explore this implication in more detail below.

Generalization to higher dimensions

We have already argued that our counting arguments hold for realistic tuning curve shapes with graded activity profiles. This follows from the fact that convolution of the grid-like codes with appropriate smoothing kernels does not change the general geometric arrangement of codewords relative to each other as these convolution operations preserve within-module permutation symmetries and across-module independence in the code. We have also shown that the contiguous-separating capacity results apply to real-valued grid periods with dense phase encodings within each module.

Here, we describe the generalization to different spatial dimensions. Consider a $d$ -dimensional grid-like code consisting of ${(λ_{m})}^{d}$ cells in the mth module to produce a one-hot phase code for $λ_{m}$ (discrete) positions along each dimension (Figure 6E). Since the counting results rely only on the existence of a modular-one-hot code and not any mapping from real spaces to coding states, this code across multiple modules $m = 1, \dots, M$ is equivalent to a modular-one-hot coding for $\prod_{m = 1}^{M} {(λ_{m})}^{d}$ states, with modules of size ${(λ_{m})}^{d}$ each. All the counting results from before therefore hold, with the simple substitution $λ_{m} \to {(λ_{m})}^{d}$ in the various formulae.

The contiguous-separating capacity in $d$ -dimensions is defined as the maximum volume over which all field arrangements are realizable. Like the 1D separating capacity results, this volume depends upon the mapping of physical space to grid-like codes. We are able to show that for grid modules with periods $λ_{1}, \dots, λ_{M}$ the generalized separating capacity is $l_{d}^{⋆} = Σ_{d} = \sum_{m = 1}^{M} λ_{m}^{d}$ (see Section B.4; Figure 6F). This result follows from essentially the same reasoning as for 1D environments, but with the use of $d$ -dimensional phase-shift operators.

Robustness of field arrangements to noise and nongrid inputs

An important quality of field arrangements that is neglected when merely counting the number of realizable arrangements or determining the separating capacity is robustness: these computations consider all realizable field arrangements, but field arrangements are practically useful only if they are robust so that small amounts of perturbation or noise in the inputs or weights do not render them unrealizable. Above, we showed that grid-like codes enable many dichotomies despite being structurally constrained, but that random analog-valued codes as well as more hierarchical codes permit even more dichotomies. Here, we show that the dichotomies realized by grid codes are substantially more robust to noise and thus more stable.

The robustness of a realizable dichotomy in a perceptron is given by its margin: for a given linear decision boundary, the margin is the smallest datapoint-boundary distance for each class, summed for the two classes. The maximum margin is the largest achievable margin for that dataset. The larger the maximum margin, the more robust the classification. We thus compare maximum margins (herein simply referred to as margins) across place field arrangements, when the inputs are grid-like or not.

Perceptron margins can be computed using quadratic programming on linear support vector machines (Platt, 1998). We numerically solve this problem for three types of input codes (permitting a nonzero threshold and imposing no weight constraints): the grid-like code $X_{g}$ ; the shuffled grid-like code $X_{gs}$ – a row- and column-shuffled version of the grid-like code that breaks its modular structure; and the random code $X_{r}$ of uniformly distributed random inputs (Figure 7). To make distance comparisons meaningful across codes, (1) all patterns (columns) involve the same number of neurons (dimension), (2) all patterns have the same total activity level (unity L₁ norm), and (3) the number of input patterns is the same across codes, and chosen to equal $L$ , the full range of the corresponding grid-like code. To compute margins, we consider only the realizable dichotomies on these patterns.
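For reference, the quadratic program in question can be written in the standard hard-margin form. The notation below is ours and is assumed, not taken from the paper: $x_{i}$ is the input pattern at location $i$, and $y_{i} = +1$ if location $i$ carries a field and $-1$ otherwise.

```latex
% Standard hard-margin SVM formulation (textbook form; notation assumed).
\min_{w, \theta} \ \tfrac{1}{2} \lVert w \rVert^{2}
\quad \text{subject to} \quad
y_{i} \, (w^{\top} x_{i} - \theta) \geq 1 \quad \text{for all } i,
\qquad
\text{maximum margin} = \frac{2}{\lVert w^{\ast} \rVert} .
```

Here $w^{\ast}$ is the minimizer, and $2 / \lVert w^{\ast} \rVert$ is the distance from the boundary to the nearest pattern of each class, summed over the two classes, matching the definition of the margin given above.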

Figure 7

Robustness of place field arrangements to noise and nongrid inputs.

In (**A–C**), grid periods are $\{31, 43\}$ ; the number of input patterns is set to $1333 = LCM (31, 43)$ for all input codes. Input patterns are normalized to have unity L₁ norm in all cases. Maximum margins are determined by using SVC in scikit-learn (Pedregosa et al., 2011) (with thresholds and no weight constraints). (A) Black bars: the maximum margins of all realizable arrangements with grid-like inputs (bars have high multiplicity: across the very large number of realizable field arrangements, the set of distinct maximum margins is small and discrete because of the regular geometric structure of the grid-like code). Pink: margins for shuffled grid inputs that break the code’s modularity (shuffling neurons across modules for each pattern; 10 shuffles per $K$ and sampling 1000 realizable field arrangements per shuffle). Blue: margins for random inputs in general position (inputs sampled i.i.d. uniformly from $[0, 1]$ ; 10 realizations of a random matrix per $K$ , 1000 realizable field arrangements sampled per realization). (B) Effect of noise on margins. We added dense noise inputs (100 non-negative i.i.d. random inputs at each location) to the place cell, in addition to the 74 grid-like inputs. (The expected value of each random input was 20% of the population mean of the grid inputs; thus, the summed random input was on average $(0.2 \times 100 / 74)$ the size of the summed grid input.) Black: noise-free margins as in (A). Empty green violins: margins of existing field arrangements modestly shrink in size. Solid green violins: margins of some newly created field arrangements: these are small and thus unstable. (C) Effect of sparse spatial inputs (plots as in B). (We added 100 sparse $\{0, 1\}$ inputs per location; each sparse input had $0.2 \times 2 L / 74$ fields placed randomly across the full range $L$ , so that the summed sparse input was on average $(0.2 \times 100 / 74)$ the size of the summed grid input. The combined grid and nongrid input at each location was normalized to 1.)

The margins of all realizable place field arrangements with grid-like inputs are shown in Figure 7A (black); the margin values for all arrangements are discretized because of the geometric arrangements of the inputs, and each black bar has a very high multiplicity. The grid-like code produces much larger-margin field arrangements than shuffled versions of the same code and random codes (Figure 7A, pink and blue). The higher margins of the grid-like compared to the shuffled grid-like code show that it is the structured geometry and modular nature of the code that produce well-separated patterns in the input space (Figure 4B) and create wide margins and field stability. In other words, place field arrangements formed by grid inputs, though smaller in number than arrangements with differently coded inputs, should be more robust and stable against potential noise in neural activations or weights.

Next, we directly consider how different kinds of nongrid inputs, driving place cells in conjunction with grid-like inputs, affect our results on place field robustness. We examine two distinct types of added nongrid input: (1) spatially dense noise that is meant to model sources of uncontrolled variation in inputs to the cell and (2) spatially sparse and reliable cues meant to model spatial information from external landmarks.

After the addition of dense noise, previously realizable grid-driven place field arrangements remain realizable and their margins, though somewhat lowered, remain relatively large (Figure 7B, empty green violins). In other words, grid-driven place field arrangements are robust to small, dense, and spatially unreliable inputs, as expected given their large margins. Note that because the addition of dense i.i.d. noise to grid-like input patterns pushes them toward general position, and general-position inputs enable more realizable arrangements, the noise-added versions of grid-like inputs also give rise to some newly realizable field arrangements (Figure 7B, full green violins). However, as with arrangements driven purely by random inputs, these new arrangements have small margins and are not very robust. Moreover, since by definition noise inputs are assumed to be spatially unreliable, the newly realizable arrangements will not persist across trials.

Next, the addition of sparse spatial inputs (similar to the one-hot codes of Table 1, though the sparse inputs here are nearly but not strictly orthogonal) leaves previous field arrangements largely unchanged and their margins substantially unmodified (Figure 7C, empty green violins). In addition, a few more field arrangements become realizable and these new arrangements also have large margins (Figure 7C, full green violins). Thus, sufficiently sparse spatial cues can drive additional stable place fields that augment the grid-driven scaffold without substantially modifying its structure. Plasticity in weights from these sparse cue inputs can drive the learning of new fields without destabilizing existing field arrangements.

In sum, grid-driven place arrangements are highly robust to noise. Combining grid-cell drive with cue-driven inputs can produce robust maps that combine internal scaffolds with external cues.

High volatility of field arrangements with grid input plasticity

Our results on the fraction of realizable place field arrangements and on place-cell-separating capacity with grid-like inputs imply that place cells have highly restricted flexibility in laying down place fields (without direct drive from external spatially informative cues) over distances greater than $Σ$ , the sum of the input grid module periods. Selecting an arrangement of fields over this range then constrains the choices that can be made over all remaining space in the same environment and across environments. Conversely, changing the field arrangement in any space by altering the grid-place weights should affect field arrangements everywhere.

We examine this question quantitatively by constructing realizable K-field arrangements (with grid-like responses generated as 1D slices through 2D grids [Yoon et al., 2016]), then attempting to insert one or a few new fields (Figure 8A,B). Inserting even a single field at a randomly chosen location through Hebbian plasticity in the grid-place weights tends to produce new additional fields at uncontrolled locations, and also leads to the disappearance of existing fields (Figure 8A,B).

Figure 8

Predicted volatility of place field arrangements.

(A) Top: original field arrangement over a 20 m space (gray line: summed inputs to place cell; purple stars: original field locations; green arrow: location where new field will be induced by Hebbian plasticity in grid-place weights). Bottom: after induction of the new field (green star), two new uncontrolled fields appear (red stars). (B) Similar to (A): the insertion of a new field at a random location (green star) leads to one uncontrolled new field (red star) and the loss of two original fields (empty red stars). (C) Histogram of changes, after single-field insertion, in pairwise inter-field intervals (spacings): the primary off-target effect of field insertion is for other fields to appear or disappear, but existing fields do not tend to move. (D) A spatially extended version of (C) (purple), together with the (vertically rescaled) autocorrelation of the grid inputs to the cell (gray): new fields tend to appear at spacings corresponding to peaks in the input autocorrelation function. (E) Sum of uncontrolled field insertions or deletions per meter, in response to inserted fields when starting with a K-field arrangement over 20 m. (F) High place field volatility resulting from plasticity in the grid-to-place synapses suggests the possibility that grid-place weights might be relatively rigid (nonplastic).

Interestingly, though field insertion affects existing arrangements through the uncontrolled appearance or disappearance of other fields, it does not tend to produce local horizontal displacements of existing fields (Figure 8C): fields that persist retain their firing locations or they disappear entirely, consistent with the surprising finding of a similar effect in experiments (Ziv et al., 2013).

The locations of fields, including of uncontrolled field additions, are well-predicted by the structure (autocorrelation) of that cell’s grid inputs (Figure 8D). This multi-peaked autocorrelation function, with large separations between the tallest peaks, reflects the multi-periodic nature of the grid code and explains why fields tend to appear or disappear at remote locations rather than shifting locally: modest weight changes in the grid-like inputs modestly alter the heights of the peaks, so that some of the well-separated tall peaks fall below threshold for activation while others rise above.

Quantitatively, insertion of a single field at an arbitrary location in a 20 m span through grid-place weight plasticity results in the insertion or deletion, on average, of $\sim 0.2$ uncontrolled fields per meter. The insertion of four fields anywhere over 20 m results in an average of one uncontrolled field per meter (Figure 8E).

Thus, if a place cell were to add a field in a new environment or within a large single environment by modifying the grid-place weights, our results imply that it is extremely likely that this learning will alter the original grid-cell-driven field arrangements (scaffold). By contrast, adding fields that are driven by spatially specific external cues, through plasticity in the cue input-to-place cell synapses, may not affect field arrangements elsewhere if the cues are sufficiently sparse (unique); in this case, the added field would be a ‘sensory’ field rather than an internally generated or ‘mnemonic’ one.

In sum, the small separating capacity of place cells according to our model may provide one explanation for the high volatility of the place code across tens of days (Ziv et al., 2013) if grid-place weights are subject to any plasticity over this timescale. Alternatively, to account for the stability of spatial representations over shorter timescales, our results suggest that external cue-driven inputs to place cells can be plastic but the grid-place weights, and correspondingly, the internal scaffold, may be fixed rather than plastic (Figure 8F). In experiments that induce the formation of a new place field through intracellular current injection (Bittner et al., 2015), it is notable that the precise location of the new field was not under experimental control: potentially, an induced field might only be able to form where an underlying (near-threshold) grid scaffold peak already exists to help support it, and the observed long plasticity window could enable place cells to associate a plasticity-inducing cue with a nearby scaffold peak.

This alternative is consistent with the finding that entorhinal-hippocampal connections stabilize long-term spatial and temporal memory (Brun et al., 2008; Brun et al., 2002; Suh et al., 2011).

Finally, we note that the robustness of place field arrangements obtained with grid-like inputs is not inconsistent with the volatility of field arrangements to the addition or deletion of new fields through grid-place weight plasticity. Grid-driven place field arrangements are robust to random i.i.d. noise in the inputs and weights, as well as the addition of nongrid sparse inputs. On the other hand, the volatility results involve associative plasticity that induces highly nonrandom weight changes that are large enough to drive constructive interference in the inputs to add a new field at a specific location. This nonrandom perturbation, applied to the distributed and globally active grid inputs, results in global output changes.

Discussion

Grid-driven hippocampal scaffolds provide a large representational space for spatial mapping

We showed that when driven by grid-like inputs, place cells can generate a spatial response scaffold that is influenced by the structural constraints of the grid-like inputs. Because of the richness of their grid-like inputs, individual place cells can generate a large library of spatial responses; however, these responses are also strongly structured so that the realizable spatial responses are a vanishingly small fraction of all spatial responses over the range where the grid inputs are unique. Nevertheless, realizable spatial field arrangements are robust, and place cells can then ‘hang’ external sensory cues onto the spatial scaffold by associative learning to form distinct spatial maps for multiple environments. Note that our results apply equally well to the situation where grid states are incremented based on motion through arbitrary Euclidean spaces, not just spatial ones (Killian et al., 2012; Constantinescu et al., 2016; Aronov et al., 2017; Klukas et al., 2020).

Summary of mathematical results

Mathematically, formulating the problem of place field arrangements as a perceptron problem led us to examine the realizable (linearly separable) dichotomies of patterns that lie not in general position but on the vertices of convex regular polytopes, thus extending Cover’s results to define capacity for a case with geometrically structured inputs (Cover, 1965). Input configurations not in general position complicate the counting of linearly separable dichotomies. For instance, counting the number of linearly separable Boolean functions, which is precisely the problem of counting the linearly separable dichotomies on the hypercube, is NP-hard (Peled and Simeone, 1985; Hegedüs and Megiddo, 1996).

We showed that the geometry of grid-cell inputs is a convex polytope, given by the orthogonal product of simplices whose dimensions are set by the period of each grid module divided by the resolution. Grid-like codes are a special case of modular-one-hot codes, consisting of a population divided into modules with only one active cell (group) at a time per module.

Exploiting the symmetries of modular-one-hot codes allowed us to characterize and enumerate the realizable K-field arrangements for small fixed $K$ . Our analyses relied on combinatorial objects called Young diagrams (Fulton and Fulton, 1997). For the special case of $M = 2$ modules, we expressed the number of realizable field arrangements exactly as a poly-Bernoulli number (Kaneko, 1997). Note that with random inputs, by contrast, it is not well-posed to count the number of realizable K-field arrangements when $K$ is fixed since the solution will depend on the specific configuration of input patterns. While we have considered two extreme cases analytically, one with no constraints on place field sparsity and the other with very few fields, it remains an outstanding question of interest to examine the case of sparse but not ultra-sparse field arrangements in which the number of fields is proportional to the full range, with a constant small prefactor (Itskov and Abbott, 2008). Finding results in this regime would involve restricting our count of all possible Young diagrams to a subset with a fixed filled-in area (purple area in Figure 5). This constraint makes the counting problem significantly harder.

We showed using analytical arguments that our results generalize to analog or graded tuning curves, real-valued periods, and dense phase representations per module. We also showed numerically that our qualitative results hold when considering deviations from the ideal, like the addition of noise in inputs and weights. The relatively large margins of the place field arrangements obtained with grid-like inputs make the code resistant to noise. In future work, it will be interesting to further explore the dependence of margins, and thus the robustness of the place field arrangements, on graded tuning curve shapes and the phase resolution per module.

Robustness, plasticity, and volatility

As described in the section on separating capacity, once grid-place weights are set over a relatively small space (about the size of the sum of the grid module periods), they set up a scaffold also outside of that space (within and across environments). Associating an external cue with this scaffold would involve updating the weights from the external sensory inputs to place cells that are close to or above threshold based on the existing scaffold. This does not require relearning grid-place weights and does not cause interference with previously learned maps.

By contrast, relearning the grid-place weights for insertion of another grid-driven field rearranges the overall scaffold, degrading previously learned maps (volatility: Ziv et al., 2013). If we consider a realizable field arrangement in a small local region of space then impose some desired field arrangement in a different local region of space through Hebbian learning, we might ask what the effect would be in the first region. Our results on field volatility provide an answer: if the first local region is of a size comparable to the sum of the place cell’s input grid periods, then any attempt to choose field locations in a different local region of space (e.g., a different environment) will almost surely have a global effect that will likely affect the arrangement of fields in the first region. A similar result might hold true if the first region is actually a disjoint set of local regions whose individual side lengths add up to the sum of input grid periods. This prediction might be consistent with the observed volatility of place fields over time even in familiar environments (Ziv et al., 2013).

Our volatility results alternatively raise the intriguing possibility that grid-place weights, and thus the scaffold, might be largely fixed and not especially plastic, with plasticity confined to the nongrid sensory cue-driven inputs and in the return projections from place to grid cells. The experiments of Rich et al., 2014 – in which place cells are recorded on a long track, the animal is then exposed to an extended version of the track, but the original fields do not shift – might be consistent with this alternative possibility. These are two rather strong and competing predictions that emerge from our model, each consistent with different pieces of data. It will be very interesting to characterize the nature of plasticity in the grid-to-place weights in the future.

Alternative models of spatial tuning in hippocampus

This work models place cells as feedforward-driven conjunctions between (sparse) external sensory cues and (dense) motion-based internal position estimates computed in grid cells and represented by multi-periodic spatial tuning curves. In considering place-cell responses as thresholded versions of their feedforward inputs including from grid cells, our model follows others in the literature that make similar assumptions (Hartley et al., 2000; Solstad et al., 2006; Sreenivasan and Fiete, 2011; Monaco et al., 2011; Cheng and Frank, 2011; Whittington et al., 2020). These models do not preclude the possibility that place cells feed back to correct grid-cell states, and some indeed incorporate such return projections (Sreenivasan and Fiete, 2011; Whittington et al., 2020; Agmon and Burak, 2020). It will be interesting in future work to analyze how such return projections affect the capacity of the combined system.

Our assumptions and model architecture are quite different from those of a complementary set of models, which take the view that grid-cell activity is derived from place cells (Kropff and Treves, 2008; Dordek et al., 2016; Stachenfeld et al., 2017). Our assumptions also contrast with a third set of models in which place-cell responses are assumed to emerge largely from locally recurrent weights within hippocampus (Tsodyks et al., 1996; Samsonovich and McNaughton, 1997; Battista and Monasson, 2020; Battaglia and Treves, 1998). One challenge for those models is in explaining how to generate stable place fields through velocity integration across multiple large environments: the capacity (number of fixed points) of many fully connected neural integrator models in the style of Hopfield networks tends to be small – scaling as $\sim N$ states with $N$ neurons (Amit et al., 1985; Gardner, 1988; Abu-Mostafa and Jacques, 1985; Sompolinsky and Kanter, 1986; Samsonovich and McNaughton, 1997; Battaglia and Treves, 1998; Battista and Monasson, 2020; Monasson and Rosay, 2013) because of the absence of modular structures (Fiete et al., 2014; Sreenivasan and Fiete, 2011; Chaudhuri and Fiete, 2019; Mosheiff and Burak, 2019). There are at least two reasons why a capacity roughly equal to the number of place cells might be too small, even though the number of hippocampal cells is large: (1) A capacity equal to the number of place cells would be quickly saturated if used to tile 2D spaces: 10⁶ states from 10⁶ cells supply 10³ states per dimension. Assuming conservatively a spatial resolution of 10 cm per state, this means no more than 100 m of coding capacity per linear dimension, with no excess coding states for error correction (Fiete et al., 2008; Sreenivasan and Fiete, 2011). (2) The hippocampus sits atop all sensory processing cortical hierarchies and is believed to play a key role in episodic memory in addition to spatial representation and memory. The number of potential cortical coding states is vastly larger than the number of place cells, suggesting that the number of hippocampal coding states should grow more rapidly than linearly in the number of neurons, which is possible with our grid-driven model but not with nonmodular Hopfield-like network models with pairwise weights between neurons.

Even if our assumption that place cells primarily derive their responses from grid-like inputs combined with external cue-derived nongrid inputs is correct, place cells may nevertheless deviate from our simple perceptron model if the place response involves additional layers of nonlinear processing. There are many ways in which this can happen: place cells are likely not entirely independent of each other, interacting through population-level competition and other recurrent interactions. Dendritic nonlinearities in place cells act as a hidden layer between grid-cell input and place cell firing (Poirazi and Mel, 2001; Polsky et al., 2004; Larkum et al., 2007; Spruston, 2008; Larkum et al., 2009; Harnett et al., 2012; Harnett et al., 2013; Stuart et al., 2016). Or, if we identify our model place cells as residing in CA1, then CA3 would serve as an intermediate and locally recurrent processing layer. In principle, hidden layers that generated a one-hot encoding for space from the grid-like inputs and then drove place cells as perceptrons would make all place field arrangements realizable. However, such an encoding would require a very large number of hidden units (equal to the full range of the grid code, while the grid code itself requires only the logarithm of this number). Additionally, place cells may exhibit richer input-output transformations than a simple pointwise nonlinearity, for instance, through cellular temporal dynamics including adaptation or persistent firing. Finding ways to include these effects in the analysis of place field arrangements is a promising and important direction for future study.

In sum, combining modular grid-like inputs produces a rich spatial scaffold of place fields, on which to associate external cues, much larger than possible with nonmodular recurrent dynamics within hippocampus. Nevertheless, the allowed states are strongly constrained by the geometry of the grid-cell drive. Further, our results suggest either high volatility in the place scaffold if grid-to-place-cell weights exhibit synaptic plasticity, or suggest the possibility that grid-to-place-cell weights might be random and fixed.

Numerical methods

Random, weight-constrained random, and shuffled inputs

Entries of the random input matrix are uniformly distributed variables in $[0, 1]$ . To compare separating capacity (Figure 4) of random codes with the grid-like code, we consider matrices of the same input dimension (number of neurons) as the grid-cell matrix, or alternatively of the same rank as the grid-cell matrix, then use Cover’s theorem to count the realizable dichotomies (Cover, 1965). Weight-constrained random inputs (Figure 4B–D) are random inputs with non-negative weights imposed during training.

To compare margins (Figure 7), we use matrices with the same input dimension and number of patterns. As margins scale linearly with the norm of the patterns, to keep comparisons fair the input columns (patterns) are normalized to have unity L₁ norm.

Nongrid inputs

To test how nongrid inputs affect our results (Figure 7C,D), the $λ_{1} + λ_{2}$ grid-like inputs from two modules with periods $λ_{1} = 31$ and $λ_{2} = 43$ are augmented by 100 additional inputs. In Figure 7C, each nongrid dense noisy input is a random variable drawn independently and identically at each location from the uniform distribution on $[0, 2 μ]$, where $μ = 0.2 μ_{g}$, and $μ_{g} = 2 / (λ_{1} + λ_{2})$ is the population mean of the grid inputs. In Figure 7D, each nongrid sparse input is a $\{0, 1\}$ random variable with $Q$ nonzero responses across the full range $L = λ_{1} λ_{2}$. We set $Q = 0.2 L μ_{g}$. In all cases, input columns (patterns with grid and nongrid inputs combined) are finally normalized to have unity L₁ norm. Results are based on 1000 realizations (samples) of the nongrid inputs.
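As a rough illustration of these parameter choices, the following Python sketch builds one augmented input pattern of each kind. It assumes binary grid inputs with one active cell per module (so that the population mean is $2 / (λ_{1} + λ_{2})$), rounds $Q$ to the nearest integer (no rounding rule is specified in the text), and uses hypothetical helper names throughout; it is not the simulation code used for the figures.

```python
import random

LAM1, LAM2 = 31, 43            # grid periods from the text
L = LAM1 * LAM2                # full range of the two-module code
N_EXTRA = 100                  # number of additional nongrid inputs
MU_G = 2 / (LAM1 + LAM2)       # population mean of the grid inputs
MU = 0.2 * MU_G                # mean of each dense noisy input
Q = round(0.2 * L * MU_G)      # assumption: Q rounded to the nearest integer


def grid_column(x):
    """Binary grid pattern at position x: one active cell per module (assumed)."""
    col = [0.0] * (LAM1 + LAM2)
    col[x % LAM1] = 1.0
    col[LAM1 + x % LAM2] = 1.0
    return col


def dense_noise_column(rng):
    """Dense nongrid inputs at one location, i.i.d. uniform on [0, 2 * MU]."""
    return [rng.uniform(0.0, 2.0 * MU) for _ in range(N_EXTRA)]


def sparse_active_sets(rng):
    """For each sparse input, the Q locations (out of L) where it equals 1."""
    return [set(rng.sample(range(L), Q)) for _ in range(N_EXTRA)]


def l1_normalize(col):
    """Scale a pattern (column) to unit L1 norm."""
    total = sum(abs(v) for v in col)
    return [v / total for v in col]


rng = random.Random(0)
x = 17  # an arbitrary location, chosen only for illustration
dense_pattern = l1_normalize(grid_column(x) + dense_noise_column(rng))
active_sets = sparse_active_sets(rng)
sparse_pattern = l1_normalize(
    grid_column(x) + [1.0 if x in s else 0.0 for s in active_sets]
)
```

With these periods, $0.2 L μ_{g} \approx 7.2$, so under the rounding assumption each sparse input is active at 7 of the 1333 locations.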

Grid-like inputs with graded tuning curves

We generate periodic grid-like activity with graded tuning curves as a function of 1D space $x$ in cell $i$ of module $m$ with period $λ_{m}$ as follows (Sreenivasan and Fiete, 2011):

g(ϕ_{m}(x), φ_{i}) = \exp\left( - \frac{‖ ϕ_{m}(x) - φ_{i} ‖^{2}}{2 σ_{g}^{2}} \right), \quad ‖ α ‖ = \min(| α |, 1 - | α |),

where the phase of module $m$ is $ϕ_{m}(x) = (x / λ_{m}) \bmod 1$. The $i$th cell in a module has a preferred activity phase $φ_{i}$ drawn randomly and uniformly from $(0, 1)$. The tuning width $σ_{g}$ is defined in terms of phase, thus in real space the width of the activity bump grows linearly with the module period. We set $σ_{g} = 0.16$ (thus the full-width at half-max of the phase tuning curve equals approximately 3/8 of the period, similar to grid cells).
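The tuning curve defined above can be sketched in a few lines of Python. The function names are hypothetical and the snippet is only meant to make the circular phase distance and the stated width concrete, not to reproduce the original simulation code.

```python
import math

SIGMA_G = 0.16  # tuning width in units of phase, as stated in the text


def phase_distance(alpha):
    """Circular distance min(|a|, 1 - |a|) for a phase difference in (-1, 1)."""
    a = abs(alpha) % 1.0
    return min(a, 1.0 - a)


def grid_response(x, period, preferred_phase, sigma_g=SIGMA_G):
    """Graded response of one grid cell at 1D position x."""
    phase = (x / period) % 1.0                      # module phase at x
    d = phase_distance(phase - preferred_phase)
    return math.exp(-d * d / (2.0 * sigma_g ** 2))


# Full width at half maximum of a Gaussian with standard deviation SIGMA_G,
# valid here because the half-width is well below half a period.
fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * SIGMA_G
```

Here `fwhm` evaluates to roughly 0.377 in units of phase, close to the 3/8 quoted above, and `grid_response` repeats with the module period by construction.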

Finally, to simulate quasi-periodic grid responses in 1D, we first generate 2D responses with Gaussian tuning on a hexagonal lattice, with the same field width as above. 1D responses of grid cells from the same module are then generated as parallel 1D slices of this lattice as in Yoon et al., 2016, with phases uniformly drawn at random.

Appendix 1

The geometry of the grid code

In this Appendix, we introduce the geometrical framework for the study of place cells modeled as perceptrons reading out the activity of grid cells. First, we define the space of grid-like inputs via symmetry considerations and without considering explicitly their relation to spatial locations. Second, we discuss linearly separable dichotomies in the space of grid-like inputs, whose geometric arrangements are not in general position. Third, we show that the geometry of grid-like inputs is that of a polytope that can be decomposed as an orthogonal product of simplices.

The space of grid-like inputs

We model grid-cell activity via $\{0, 1\}$ spatial patterns $r$ that take value 1 whenever the cell is active and take value 0 otherwise (Fyhn et al., 2004; Fiete et al., 2008). To model the periodic spatial response of grid cells, we assume that the activity pattern of a grid cell defines a periodic lattice with integer period $λ$. For simplicity, we consider a 1D model for which the spatial patterns $r$ are $λ$-periodic vectors and for which the set of activity patterns is given by the lattices $i + λ ℤ$, $1 \leq i \leq λ$. We refer to the index $i$ as the phase index of the grid-cell spatial pattern. Our key results will generalize to lattices of arbitrary dimension $n$, for which the set of spatial patterns is given by the hypercube lattices $i + (λ ℤ)^{n}$, with phase indices $i$ in $\{1, \dots, λ\}^{n}$.

Within a population, grid cells can have distinct periods and arbitrary phases. To model this heterogeneity, we consider a population of grid cells with $M$ possible integer spatial periods $λ = (λ_{1}, \dots, λ_{M})$, thereby defining $M$ modules of grid cells. We assume that each module comprises all possible grid-cell-activity patterns, that is, $λ_{m}$ grid cells labeled by the phase indices $i$, $1 \leq i \leq λ_{m}$. For convenience, we index each cell by its module index $m$ and its phase index $i$, so that the actual component index of cell $(m, i)$ is $\sum_{n < m} λ_{n} + i$. By construction of our model, at every spatial position, each module has a single active cell. Thus, at each spatial position, the grid-like input is specified by a $\{0, 1\}$ column vector $c_{λ}$ of dimension $N = \sum_{m = 1}^{M} λ_{m}$, the total number of grid cells.

In principle, the inputs to place cells are defined as spatial locations. Here, by contrast, we consider grid-like inputs as the inputs to place cells, without requiring these patterns to be spatial encodings. This approach is mathematically convenient as it allows us to exploit the many symmetries of the set of grid-like inputs denoted by $C_{λ}$ . The set $C_{λ}$ contains as many grid-like inputs $c$ as there are choices of phase indices in each module, that is, $Λ = \prod_{m = 1}^{M} λ_{m}$ :

C_{λ} = \{ c = (c_{1}, \dots, c_{M}) \in \{0, 1\}^{λ_{1}} \times \dots \times \{0, 1\}^{λ_{M}} \mid \sum_{i = 1}^{λ_{m}} c_{m, i} = 1, \; 1 \leq m \leq M \} .

Here follow two examples of grid-like inputs $C_{λ}$ enumerated in lexicographical order for $λ = (2, 3)$ and $λ = (2, 2, 2)$ .

C_{(2, 3)} = \left[ \begin{array}{cccccc} 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{array} \right], \quad C_{(2, 2, 2)} = \left[ \begin{array}{cccccccc} 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{array} \right] .
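For readers who wish to generate such matrices for other periods, a minimal Python sketch follows. The function name `grid_inputs` is hypothetical; patterns are returned as tuples (the columns of the matrices above), ordered lexicographically by phase index, with phases counted from 0 rather than 1.

```python
from itertools import product


def grid_inputs(periods):
    """Enumerate the modular one-hot patterns for the given integer periods.

    Each pattern concatenates one one-hot block per module; patterns are
    listed in lexicographic order of the per-module phase indices.
    """
    patterns = []
    for phases in product(*(range(lam) for lam in periods)):
        pattern = []
        for lam, phase in zip(periods, phases):
            pattern.extend(1 if j == phase else 0 for j in range(lam))
        patterns.append(tuple(pattern))
    return patterns


# Columns of the matrix for periods (2, 3); transposing gives its rows.
columns = grid_inputs((2, 3))
rows = list(zip(*columns))
```

Transposing the list of patterns, as in `rows`, recovers the row layout used in the displayed matrices.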

Observe that, albeit inspired by the spatial activity of grid cells, the set of patterns $C_{λ}$ has broader relevance than suggested by its use for modeling grid-like inputs. In fact, the set of patterns $C_{λ}$ describes any modular winner-take-all activity, whereby cells are pooled in modules with only one cell active at a time – the winner of the module.

In the following, we consider that linear read-outs of grid-like inputs determine the activity of downstream cells, called place cells (O’Keefe and Dostrovsky, 1971). The set of these linear read-outs is the vector space $V_{λ}$ spanned by the grid-like inputs $C_{λ}$ . The dimension of the vector space $V_{λ}$ specifies the dimensionality of the grid code. The following proposition characterizes $V_{λ}$ and shows that its dimension is simply related to the periods $λ$ .

Proposition 1

The set of grid-like inputs $C_{λ}$ specified by $M$ grid modules with integer periods $λ = (λ_{1}, \dots, λ_{M})$ spans the vector space

V_{λ} = \operatorname{span} C_{λ} = \{ y = (y_{1}, \dots, y_{M}) \in ℝ^{λ_{1}} \times \dots \times ℝ^{λ_{M}} \mid \sum_{i = 1}^{λ_{1}} y_{1, i} = \dots = \sum_{i = 1}^{λ_{M}} y_{M, i} \} .

In particular, the embedding dimension of the grid code is $\dim V_{λ} = \sum_{m = 1}^{M} λ_{m} - M + 1$ .

Proof. Let us denote by $A_{λ}$ a matrix formed by collecting all the column vectors from $C_{λ}$ . The vector space $V_{λ}$ is the range of the matrix $A_{λ}$ , which is also the orthogonal complement of $\ker A_{λ}^{T}$ . A vector $x = (x_{1, 1}, \dots, x_{1, λ_{1}} | \dots \dots | x_{M, 1}, \dots, x_{M, λ_{M}})$ in $ℝ^{λ_{1}} \times \dots \times ℝ^{λ_{M}}$ belongs to $\ker A_{λ}^{T}$ if and only if $x^{T} A_{λ} = 0$ . By construction of the matrix $A_{λ}$ :

x^{T} A_{λ} = 0 ⟺ \sum_{m = 1}^{M} x_{m, i_{m}} = 0 \quad \text{for all } 1 \leq i_{m} \leq λ_{m},

where $i_{m}$ refers to the index of the active cell in module $m$. The latter characterization implies that

\ker A_{λ}^{T} = \{ x = (a_{1}, \dots, a_{1} | \dots \dots | a_{M}, \dots, a_{M}) \in ℝ^{λ_{1}} \times \dots \times ℝ^{λ_{M}} \mid \sum_{m = 1}^{M} a_{m} = 0 \} .

In turn, a vector $y = (y_{1, 1}, \dots, y_{1, λ_{1}} | \dots \dots | y_{M, 1}, \dots, y_{M, λ_{M}})$ of the orthogonal complement of $\ker A_{λ}^{T}$ , that is, in the range of $A_{λ}$ , is determined by $x^{T} y = 0$ for all $x$ in $\ker A_{λ}^{T}$ . From the above characterization of $\ker A_{λ}^{T}$ , this means that $y$ is in the range of $A_{λ}$ , that is, in $V_{λ}$ , if and only if for all $a_{1}, \dots, a_{M}$ such that $\sum_{m = 1}^{M} a_{m} = 0$ , we have

\sum_{m = 1}^{M} a_{m} \sum_{i = 1}^{λ_{m}} y_{m, i} = 0 .

Substituting $a_{M} = - \sum_{m = 1}^{M - 1} a_{m}$ in the above relation, we have that for all $(a_{1}, \dots, a_{M - 1})$ in $ℝ^{M - 1}$,

\sum_{m = 1}^{M - 1} a_{m} \left( \sum_{i = 1}^{λ_{m}} y_{m, i} - \sum_{i = 1}^{λ_{M}} y_{M, i} \right) = 0,

which is equivalent to $\sum_{i = 1}^{λ_{m}} y_{m, i} = \sum_{i = 1}^{λ_{M}} y_{M, i}$ for all $m$, $1 \leq m < M$. These $M - 1$ independent linear constraints entirely specify the range of the activity matrix $A_{λ}$, that is, $V_{λ}$, as a vector space of dimension $\sum_{m = 1}^{M} λ_{m} - M + 1$.
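The dimension formula of Proposition 1 can be checked numerically for small examples. The sketch below is one way to do so, assuming exact rational Gaussian elimination to compute the rank; the helper names `grid_inputs` and `matrix_rank` are hypothetical, and a numerical linear algebra library would serve equally well.

```python
from fractions import Fraction
from itertools import product


def grid_inputs(periods):
    """Enumerate modular one-hot patterns (one active cell per module)."""
    patterns = []
    for phases in product(*(range(lam) for lam in periods)):
        pattern = []
        for lam, phase in zip(periods, phases):
            pattern.extend(1 if j == phase else 0 for j in range(lam))
        patterns.append(tuple(pattern))
    return patterns


def matrix_rank(rows):
    """Rank of an integer matrix via exact Gauss-Jordan elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a pivot row at or below the current rank.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def predicted_dimension(periods):
    """Dimension of the span predicted by Proposition 1."""
    return sum(periods) - len(periods) + 1
```

For instance, `matrix_rank(grid_inputs((2, 3)))` and `predicted_dimension((2, 3))` both give 4, one less than the number of cells because the two module sums are constrained to be equal.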

Linear read-outs of grid-like inputs

We model the response of a place cell as that of a perceptron, which takes grid-like inputs $c$ in $C_{λ}$ as inputs (Rosenblatt, 1958). Such a perceptron is parametrized by a decision threshold $θ$ and by a vector of read-out weights $w = (w_{1, 1}, \dots, w_{1, λ_{1}} | \dots \dots | w_{M, 1}, \dots, w_{M, λ_{M}})$, where the vertical separators delineate the grid-cell modules with periods $λ_{m}$, $1 \leq m \leq M$. By convention, we consider that a place cell is active for grid-like inputs $c$ such that $w^{T} c > θ$ and inactive otherwise. Thus, in the perceptron framework, a place cell has a multi-field structure if it is active on a set of several grid-like inputs $S \subset C_{λ}$, with $| S | > 1$ (Rich et al., 2014). Considering grid-like inputs as inputs allows one to restrict the class of perceptrons under consideration.

Proposition 2

Every realizable multi-field structure can be implemented by a perceptron with $(i)$ non-negative weights, or $(ii)$ zero threshold.

Proof. $(i)$ If $M$ is the total number of modules and $\mathbf{1}$ is the $N$-dimensional column vector of ones, then for all grid-like inputs $c$ in $C_{λ}$ we have $\mathbf{1}^{T} c = (1, \dots, 1) c = M$. Thus, for every perceptron $(w, θ)$ and for all real $μ$, we have

(w + μ \mathbf{1})^{T} c = w^{T} c + μ \mathbf{1}^{T} c = p + μ M,

where $p = w^{T} c$ is the place-cell-activity level for grid-cell pattern $c$ in $C_{λ}$. Consequently, setting $μ \geq \max_{1 \leq i \leq N} | w_{i} |$, $w^{'} = w + μ \mathbf{1}$ and $θ^{'} = θ + μ M$ defines a new perceptron $(w^{'}, θ^{'})$ with non-negative weights, which operates the same classification as the perceptron $(w, θ)$ because $p + μ M > θ + μ M$ is equivalent to $p > θ$. $(ii)$ The result directly follows from a similar argument by observing that, since $\mathbf{1}^{T} c = M$, for every grid-population pattern $c$ in $C_{λ}$

w^{T} c - θ = \left( w - \frac{θ}{M} \mathbf{1} \right)^{T} c,

which implies that the perceptron models $(w, θ)$ and $(w - \frac{θ}{M} \mathbf{1}, 0)$ achieve the same linear classification.
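Both parts of Proposition 2 rest on the fact that every grid-like input has exactly $M$ active cells, so adding a constant to all weights shifts every summed input by the same amount. The following Python sketch checks this on a small example, using exact rational arithmetic to avoid rounding ties; the periods, the random weights, and the helper names are arbitrary illustrative choices. For the zero-threshold case, the threshold $θ$ is absorbed by subtracting $θ / M$ from every weight.

```python
from fractions import Fraction
from itertools import product
import random


def grid_inputs(periods):
    """Enumerate modular one-hot patterns (one active cell per module)."""
    patterns = []
    for phases in product(*(range(lam) for lam in periods)):
        pattern = []
        for lam, phase in zip(periods, phases):
            pattern.extend(1 if j == phase else 0 for j in range(lam))
        patterns.append(tuple(pattern))
    return patterns


def active_set(w, theta, patterns):
    """Perceptron output (True = field) for every pattern."""
    return [sum(wi * ci for wi, ci in zip(w, c)) > theta for c in patterns]


periods = (3, 4)                       # illustrative choice
M = len(periods)
patterns = grid_inputs(periods)

rng = random.Random(1)
w = [Fraction(rng.randint(-20, 20), 7) for _ in range(sum(periods))]
theta = Fraction(3, 10)
reference = active_set(w, theta, patterns)

# (i) Shift all weights by mu >= max |w_i| and the threshold by mu * M.
mu = max(abs(wi) for wi in w)
w_nonneg = [wi + mu for wi in w]
theta_nonneg = theta + mu * M

# (ii) Absorb the threshold into the weights: subtract theta / M from each.
w_zero = [wi - theta / M for wi in w]
```

Evaluating `active_set(w_nonneg, theta_nonneg, patterns)` and `active_set(w_zero, 0, patterns)` reproduces `reference` exactly.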

Our goal is to study the multi-field structure of place-cell perceptrons, which amounts to characterizing the two-class linear classifications of grid-like inputs $C_{λ}$. The study of linear binary classifications has a long history in machine learning. Given a collection of $Λ$ input patterns, there are $2^{Λ}$ possible assignments of binary labels to these patterns, also referred to as dichotomies. In general, not all dichotomies can be linearly classified by a perceptron. Those dichotomies that can be classified are called linearly separable. An important question is to compute the number of linearly separable dichotomies, which depends on the geometrical arrangement of the inputs presented to the perceptron. Remarkably, Cover’s function counting theorem specifies the exact number of linearly separable dichotomies for $P$ inputs represented as points in an $N$-dimensional space (Cover, 1965). For inputs in general position, the number of dichotomies realizable by a zero-threshold perceptron is given by

N_{P, N} = 2 \sum_{k = 0}^{N - 1} \binom{P - 1}{k},

which shows that all dichotomies are possible as long as $P \leq N$. A collection of points $\{x_{1}, \dots, x_{P}\}$ in an $N$-dimensional space is in general position if no subset of $n + 1$ points lies on a $(n - 1)$-dimensional plane for all $n \leq N$. In our modeling framework, the inputs are collections of points representing grid-like inputs $C_{λ}$. Contrary to the assumptions of Cover’s theorem, these grid-like inputs are not in general position as soon as we consider a grid code with more than one module. For instance, it is not hard to see that for $λ = (2, 3)$, the patterns $(1, 0 | 1, 0, 0)$, $(1, 0 | 0, 1, 0)$, $(0, 1 | 1, 0, 0)$ and $(0, 1 | 0, 1, 0)$ are not in general position because they are the vertices of a square and therefore lie in a 2D plane. Nongeneric arrangements of grid-like inputs are due to symmetries that are inherent to the modular structure of the grid code. We expect such symmetries to heavily restrict the set of linearly separable dichotomies, therefore constraining the multi-field structure of a place cell perceptron.
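To make these two points concrete, the sketch below evaluates Cover’s count for inputs in general position and verifies the planar dependence among the four $λ = (2, 3)$ patterns listed above. The function name `cover_count` is hypothetical, and the snippet only illustrates the formula and the example rather than any computation from the original analysis.

```python
from math import comb


def cover_count(P, N):
    """Cover's count of dichotomies of P points in general position in N
    dimensions that a zero-threshold perceptron can realize."""
    return 2 * sum(comb(P - 1, k) for k in range(N))


# The four patterns from the text for periods (2, 3), written without the
# module separator: they form the vertices of a square.
c1 = (1, 0, 1, 0, 0)
c2 = (1, 0, 0, 1, 0)
c3 = (0, 1, 1, 0, 0)
c4 = (0, 1, 0, 1, 0)

# The two diagonals of the square share the same midpoint, so c1 + c4 = c2 + c3.
diagonal_a = tuple(a + b for a, b in zip(c1, c4))
diagonal_b = tuple(a + b for a, b in zip(c2, c3))
```

Because `diagonal_a` equals `diagonal_b`, any weight vector $w$ satisfies $w^{T} c_{1} + w^{T} c_{4} = w^{T} c_{2} + w^{T} c_{3}$, so no perceptron can place fields on one diagonal pair while keeping the other pair silent; this is one of the dichotomies that Cover’s general-position count would otherwise include.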

We justify the above expectation by discussing the problem of linear separability for two codes that are related to the grid code. These two codes are the ‘one-hot’ code, whereby a single cell is active for each input pattern, and the ‘binary’ code, whereby the set of input patterns enumerates all possible binary vectors of activity. Exemplars of inputs for the one-hot code and the binary code are given for $N = 3$ input cells by

C_{oh} = \left\{ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right\} \quad \text{and} \quad C_{b} = \left\{ \begin{array}{cccccccc} 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{array} \right\} .

From a geometrical point of view, a set of points $S \subset C$ representing input patterns is linearly separable if there is a hyperplane separating the points in $S$ from the other points $C ∖ S$. The existence of a hyperplane separating a single point from all other points is straightforward when the set of patterns corresponds to the vertices of a convex polytope: every vertex is an extreme point and can therefore be linearly separated from the other points. It turns out that both the population patterns of the one-hot code and of the binary code represent the vertices of a convex polytope: a simplex for the one-hot code and a hypercube for the binary code. However, because these vertices are in general position for the one-hot code but not for the binary code, the fraction of linearly separable dichotomies drastically differs for the two codes.

Let us first consider the $N$ points whose coordinates are given by $C_{oh}$. The convex hull of $C_{oh}$ is the canonical $(N - 1)$-dimensional simplex. Thus, any set of $k$ vertices, $1 \leq k \leq N$, specifies a $(k - 1)$-dimensional face of the simplex and, as such, is a linearly separable $k$-dichotomy. This immediately shows that all dichotomies are linearly separable, a result that follows from the fact that the $N$ points in $C_{oh}$ are in general position. Let us then consider the $2^{N}$ points whose coordinates are given by $C_{b}$. The convex hull of $C_{b}$ is the canonical $N$-dimensional hypercube. By contrast with $C_{oh}$, the points in $C_{b}$ are not in general position. As a result, there are dichotomies that are not linearly separable. For instance, for $N = 3$, neither the pair $\{(1, 0, 0), (0, 1, 0)\}$ nor the pair $\{(0, 0, 0), (1, 1, 1)\}$ can be linearly separated from the other points of the hypercube. Determining the number of linearly separable sets of hypercube vertices is a hard combinatorial problem that has attracted a lot of interest (Peled and Simeone, 1985; Hegedüs and Megiddo, 1996). Unfortunately, there is no efficient characterization of that number as a function of the dimension $N$. However, it is known that out of the $2^{2^{N}}$ possible dichotomies, the total number of linearly separable dichotomies scales as $2^{N^{2}}$ in the limit of large dimension $N \to \infty$ (Irmatov, 1993). This shows that only a vanishingly small fraction of hypercube dichotomies are linearly separable.
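
For very small $N$, the separable vertex sets of the hypercube can be enumerated directly. The sketch below is illustrative only: it scans integer weights in a bounded range with half-integer thresholds, under the assumption (ours, stated in the code) that this range is wide enough for $N \leq 3$. It also tests the two vertex pairs discussed above.

```python
from itertools import product

def separable_sets(N: int, wmax: int = 3):
    """Vertex sets {x : w.x > theta} of the N-cube reachable with integer
    weights |w_i| <= wmax and half-integer thresholds theta = t/2, t odd.
    Assumption: this range reaches every separable set for N <= 3 only."""
    vertices = list(product((0, 1), repeat=N))
    found = set()
    for w in product(range(-wmax, wmax + 1), repeat=N):
        # Comparing 2*w.x with an odd integer t avoids ties at the threshold.
        for t in range(-2 * N * wmax - 1, 2 * N * wmax + 2, 2):
            active = frozenset(
                x for x in vertices
                if 2 * sum(wi * xi for wi, xi in zip(w, x)) > t
            )
            found.add(active)
    return found

sets3 = separable_sets(3)
pair_a = frozenset({(1, 0, 0), (0, 1, 0)})
pair_b = frozenset({(0, 0, 0), (1, 1, 1)})
print(len(sets3), "separable sets out of", 2 ** 2 ** 3)
print("pair_a separable:", pair_a in sets3, "| pair_b separable:", pair_b in sets3)
```

The count includes the two trivial dichotomies (no active vertex, all vertices active). For larger $N$ the required weight range grows and this brute-force scan quickly becomes impractical, in line with the hardness of the general problem.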

Grid code convex polytope

It is beneficial to gain geometric intuition about grid-like inputs to characterize their linearly separable dichotomies. As binary vectors of length $N$ , grid-like inputs form a subset of the $2^{N}$ vertices of the $N$ -dimensional hypercube. Just as for the one-hot and binary codes, linear separability of sets of grid-like inputs can be seen as a geometric problem about polytopes. To clarify this point, let us denote by $H_{λ}$ the convex hull of grid-like inputs $C_{λ}$ . By definition, we have

H_{λ} = \{\sum_{i = 1}^{L} α_{i} c_{i} \mid α_{i} \geq 0, \sum_{i = 1}^{L} α_{i} = 1\},

where $c_{i}$ in $C_{λ}$ denotes the ith column of $A_{λ}$ . The convex hull $H_{λ}$ turns out to have a simple geometric structure.

Proposition 3

For integer periods $λ = (λ_{1}, \dots, λ_{M})$ , the convex hull generated by $C_{λ}$ , the set of grid-cell-population patterns, determines a $d$ -dimensional polytope $H_{λ}$ , with $d = \sum_{m = 1}^{M} λ_{m} - M$ , defined as $H_{λ} = Δ^{λ_{1}} \times \dots \times Δ^{λ_{M}}$ where $Δ^{λ_{m}}$ , $1 \leq m \leq M$ , denotes the $(λ_{m} - 1)$ -simplex specified by the $λ_{m}$ points: $(1, 0, \dots, 0), (0, 1, 0 \dots, 0), \dots, (0, \dots, 0, 1)$ .

Before proving the product decomposition of $H_{λ}$, let us make a couple of observations. First, observe that all the vectors $c$ in $C_{λ}$ satisfy $1^{T} c = M$, so that all edges $c - c^{'}$, with $c$, $c^{'}$ in $C_{λ}$, lie in the same hyperplane of the vector space $V_{λ}$. Since $V_{λ}$ has dimension $N = \sum_{m} λ_{m} - M + 1$ by Proposition 1, this implies that the dimension of the polytope $H_{λ}$ is at most $d = N - 1$. Second, observe that the set $C_{λ}$ is left unchanged by the symmetry operators $J_{λ_{m}}$, $1 \leq m \leq M$, where $J_{λ_{m}}$ cyclically shifts downward the mth module coordinates of the vectors in $C_{λ}$. The operators $J_{λ_{m}}$ admit the matrix representation

J_{λ_{m}} = \begin{pmatrix} I_{λ_{1} + \dots + λ_{m - 1}} & & \\ & J_{λ_{m}} & \\ & & I_{λ_{m + 1} + \dots + λ_{M}} \end{pmatrix} \quad \text{with} \quad J_{n} = \begin{pmatrix} 0 & 1 & 0 & 0 & \dots \\ 0 & 0 & 1 & 0 & \dots \\ \vdots & \ddots & \ddots & \ddots & \\ 1 & 0 & \dots & & \end{pmatrix} \in ℝ^{n} \times ℝ^{n},

where $I_{n}$ denotes the identity matrix in $ℝ^{n} \times ℝ^{n}$. Notice that the matrices $J_{λ_{m}}$ satisfy $J_{λ_{m}}^{T} J_{λ_{m}} = I_{λ_{1} + \dots + λ_{M}}$, showing that the operators $J_{λ_{m}}$ are isometries in $V_{λ}$. Moreover, observe that for all $c$, $c^{'}$ in $C_{λ}$, there are integers $k_{1}, \dots, k_{M}$ such that $J_{λ_{1}}^{k_{1}} \dots J_{λ_{M}}^{k_{M}} c = c^{'}$. This shows that each vector in $C_{λ}$ plays the same role in defining the geometry of $H_{λ}$, and thus $H_{λ}$ is vertex-transitive. In particular, every vector in $C_{λ}$ represents an extreme point of the convex hull $H_{λ}$. As a result, $H_{λ}$ is a polytope with as many vertices as the cardinality of $C_{λ}$, that is, $Λ = \prod_{m = 1}^{M} λ_{m}$. The product decomposition of the polytope $H_{λ}$ then follows from a simple recurrence argument over the number of modules $M$.
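
The observations above can be checked on a small example. The sketch below is illustrative: the helper names are ours, and $C_{λ}$ is taken to contain every combination of one active phase per module. For $λ = (2, 3)$ it verifies that each pattern sums to $M$, that the patterns span a space of dimension $\sum_{m} λ_{m} - M + 1$, and that cyclic shifts within a module map $C_{λ}$ onto itself.

```python
from fractions import Fraction
from itertools import product

def grid_patterns(periods):
    """All patterns with exactly one active cell per module."""
    patterns = []
    for phases in product(*(range(p) for p in periods)):
        c = []
        for p, i in zip(periods, phases):
            c.extend(1 if j == i else 0 for j in range(p))
        patterns.append(tuple(c))
    return patterns

def rank(rows):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def shift_module(c, periods, m):
    """Cyclically shift the coordinates of module m by one position."""
    start = sum(periods[:m])
    block = c[start:start + periods[m]]
    return c[:start] + block[-1:] + block[:-1] + c[start + periods[m]:]

periods = (2, 3)
M = len(periods)
C = grid_patterns(periods)

assert all(sum(c) == M for c in C)                    # 1^T c = M for every pattern
assert rank(C) == sum(periods) - M + 1                # dimension of the spanned space
for m in range(M):                                    # shifts permute the pattern set
    assert {shift_module(c, periods, m) for c in C} == set(C)
print(len(C), "patterns, rank", rank(C))
```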

Appendix 1—figure 1

Simplicial decomposition.

The convex hull generated by the grid code activity patterns is a product of simplices.

Proof. In order to relate the geometrical structure of $H_{λ}$ to that of simplices, let us introduce $e_{i}$ , $1 \leq i \leq λ_{M}$ , the elementary unit vector corresponding to the $i$ -th coordinate of $ℝ^{λ_{M}}$ . The set $C_{λ}$ has the following product structure

C_{λ} = \{c = (c^{'}, e_{i}) \mid c^{'} \in C_{λ^{'}}, 0 < i \leq λ_{M}\},

where $C_{λ^{'}}$ is the set of vectors for $M - 1$ modules with periods $λ^{'} = (λ_{1}, \dots, λ_{M - 1})$. The product structure of the set $C_{λ}$ transfers to the convex hull $H_{λ}$ it generates. Specifically, we have

H_{λ} = \{\sum_{i = 1}^{λ_{M}} \sum_{j = 1}^{L / λ_{M}} α_{i j} (c_{j}, e_{i}) \mid \sum_{i = 1}^{λ_{M}} \sum_{j = 1}^{L / λ_{M}} α_{i j} = 1\},

= \{(\sum_{j = 1}^{L / λ_{M}} (\sum_{i = 1}^{λ_{M}} α_{i j}) c_{j}, \sum_{i = 1}^{λ_{M}} (\sum_{j = 1}^{L / λ_{M}} α_{i j}) e_{i}) \mid \sum_{i = 1}^{λ_{M}} \sum_{j = 1}^{L / λ_{M}} α_{i j} = 1\},

= \{(\sum_{j = 1}^{L / λ_{M}} β_{j} c_{j}, \sum_{i = 1}^{λ_{M}} γ_{i} e_{i}) \mid \sum_{j = 1}^{L / λ_{M}} β_{j} = 1, \sum_{i = 1}^{λ_{M}} γ_{i} = 1\},

= \{(c^{'}, δ) \mid c^{'} \in H_{λ^{'}}, δ \in Δ^{λ_{M}}\},

where we have recognized that the convex hull of the set of elementary basis vectors $e_{i}$, $1 \leq i \leq λ_{M}$, is precisely the canonical $(λ_{M} - 1)$-simplex. Thus, we have shown that $H_{λ} = H_{λ^{'}} \times Δ^{λ_{M}}$. Proceeding by recurrence on the number of modules, one obtains the announced decomposition of the convex hull as a product $H_{λ} = Δ^{λ_{1}} \times \dots \times Δ^{λ_{M}}$, where $Δ^{λ_{m}}$, $1 \leq m \leq M$, is the canonical $(λ_{m} - 1)$-simplex.

The above product decomposition suggests that the problem of determining the linearly separable dichotomies of grid-like inputs is related to that of determining the linearly separable Boolean functions. Indeed, the polytope defined by grid-like inputs with $M$ modules contains $M$-dimensional hypercubes, for which many dichotomies are not linearly separable. As counting the linearly separable Boolean functions is a notoriously hard combinatorial problem, it is unlikely that one can find a general characterization of the linearly separable dichotomies of grid-like inputs. However, it is possible to give some explicit results for the case of two modules ($M = 2$) or for the case of $k$-dichotomies of small cardinality $k$.

Appendix 2

Combinatorics of linearly separable dichotomies

In this Appendix, we establish combinatorial results about the properties and the cardinality of linearly separable dichotomies of grid-like inputs. First, we show that linearly separable dichotomies can be partitioned into classes, each indexed by a combinatorial object called a Young diagram. Second, we exploit related combinatorial objects, called Young tableaux, to show that not all Young diagrams correspond to linearly separable dichotomies. Third, we utilize Young diagrams to characterize dichotomies for which one class of labeled patterns has small cardinality $k = 1, \dots, 4$. Fourth, we count the exact number of linearly separable dichotomies for grid-like inputs with two modules.

Relation to Young diagrams

To count linearly separable dichotomies, we first show that these dichotomies can be partitioned in classes that are indexed by Young diagrams. Young diagrams are useful combinatorial objects that have been used to study, e.g., the properties of the group representations of the symmetric group and of the general linear group. Young diagrams are formally defined as follows:

Definition 1

A d-dimensional Young diagram is a subset D of lattice points in the positive orthant of a d-dimensional integral lattice, which satisfies the following:

If $(n_{1}, \dots, n_{i}, \dots, n_{d}) \in D$ and $n_{i} > 0$, then $(n_{1}, \dots, n_{i} - 1, \dots, n_{d}) \in D$.
For any positive integer i ≤ d, and any non-negative integers m, p with m > p, the restriction of D to the hyperplane n_i = m is a (d − 1)-dimensional Young diagram that covers the (d − 1)-dimensional Young diagram formed by the restriction of D to the hyperplane n_i = p.

Moreover, the size of the diagram D, denoted by |D|, is defined as the number of lattice points in D.

Young diagrams have been primarily studied for d = 2 because their use allows one to conveniently enumerate the partitions of the integers. For d = 2, there are different conventions for representing Young diagrams pictorially. Hereafter, we follow the French notation, where Young diagrams are left-justified lattice rows whose length decreases with height. For the sake of clarity, Fig. 1a depicts the 5 Young diagrams associated with the partitions of 4: 4, 3 + 1, 2 + 2, 2 + 1 + 1 and 1 + 1 + 1 + 1. Young diagrams have been less studied for dimensions d ≥ 3 and only a few of their combinatorial properties are known. Fig. 1b represents a 3-dimensional diagram, together with two 2-dimensional restrictions (red edges for n₃ = 1 and yellow edges for n₃ = 3). Observe that these restrictions are 2-dimensional Young diagrams, and that the restriction corresponding to n₃ = 1 covers the restriction corresponding to n₃ = 3. Young diagrams can equivalently be viewed as arrays of boxes rather than lattice points in the positive orthant. This corresponds to identifying each lattice point $(n_{1}, \dots, n_{d}) \in D$ with the unit cube $(n_{1} - 1, n_{1}) \times \dots \times (n_{d} - 1, n_{d})$.

Before motivating the use of Young diagrams, let us make a few remarks about the set of dichotomies that can be realized by a perceptron $(w, θ)$ with fixed weight vector $w$. First, recall that with no loss of generality we can restrict the weight vectors $w$ to be non-negative by Proposition 2. Second, by permutation invariance, there is no loss of generality in considering a perceptron $(w, θ)$ for which the weight vector

w = (w_{1, 1}, \dots, w_{1, λ_{1}} \mid \dots \mid w_{M, 1}, \dots, w_{M, λ_{M}})

is such that the weights are ordered within each module: $w_{m, 1} < \dots < w_{m, λ_{m}}$, for all $m$, $1 \leq m \leq M$. We refer to weight vectors having this module-specific, increasing order property as modularly ordered weight vectors. Bearing these observations in mind, the following proposition establishes the link between Young diagrams and perceptrons.

Proposition 4

Given integer periods $λ = (λ_{1}, \dots, λ_{M})$ , for all modularly ordered, non-negative, weight vectors $w$ and for all thresholds $θ$ , the lattice set

D (w, θ) = \{(i_{1}, \dots, i_{M}) \in \{1, \dots, λ_{1}\} \times \dots \times \{1, \dots, λ_{M}\} \mid \sum_{m = 1}^{M} w_{m, i_{m}} \leq θ\}

is an $M$-dimensional Young diagram in $\{1, \dots, λ_{1}\} \times \dots \times \{1, \dots, λ_{M}\}$.

In other words, under the assumption of modularly ordered, non-negative weights, the phase indices of the grid-like inputs for which the perceptron is inactive form a Young diagram.

Proof. The Young diagram properties directly follow from the ordering of weights within modules. For instance, it is easy to see that if $(i_{1}, \dots, i_{M}) \leq (j_{1}, \dots, j_{M})$ for the component-wise partial order in $\{1, \dots, λ_{1}\} \times \dots \times \{1, \dots, λ_{M}\}$, then $(j_{1}, \dots, j_{M}) \in D (w, θ)$ implies $(i_{1}, \dots, i_{M}) \in D (w, θ)$. Indeed, we necessarily have

\sum_{m = 1}^{M} w_{m, i_{m}} \leq \sum_{m = 1}^{M} w_{m, j_{m}} \leq θ .

By the above proposition, given a grid code with $M$ modules, every perceptron $(w, θ)$ acting on that grid code can be associated with a unique $M$-dimensional Young diagram $D (w, θ)$ after ordering the components of $w$ within each module. Conversely, if an $M$-dimensional Young diagram $D^{'}$ can be associated with a perceptron $(w, θ)$ with modularly ordered, non-negative weights, that is, if $D^{'} = D (w, θ)$, we say that $D^{'}$ is realizable. Then a natural question to ask is: are all $M$-dimensional Young diagrams realizable by perceptrons? It turns out that perceptrons exhaustively enumerate all $M$-dimensional Young diagrams if $M \leq 2$, but there are unrealizable Young diagrams as soon as $M > 2$.
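
The correspondence of Proposition 4 can be illustrated numerically. In the minimal sketch below, the weights and the threshold are hypothetical values chosen only for illustration, and the function names are ours. The code builds $D (w, θ)$ for two modules and checks that it is closed under lowering any phase index, which is the defining property of a Young diagram for 1-based indices.

```python
from itertools import product

def diagram(w, theta):
    """D(w, theta): 1-based phase-index tuples whose summed weights are <= theta.
    `w` is a list of per-module weight lists, assumed non-negative and increasing."""
    ranges = [range(1, len(wm) + 1) for wm in w]
    return {
        idx for idx in product(*ranges)
        if sum(wm[i - 1] for wm, i in zip(w, idx)) <= theta
    }

def is_young_diagram(D):
    """Down-closed check: lowering any coordinate larger than 1 stays inside D."""
    return all(
        idx[:m] + (idx[m] - 1,) + idx[m + 1:] in D
        for idx in D for m in range(len(idx)) if idx[m] > 1
    )

# Hypothetical modularly ordered weights for periods lambda = (3, 3).
w = [[0, 10, 25], [0, 7, 30]]
D = diagram(w, theta=18)
print(sorted(D), is_young_diagram(D))
```

Integer weights are used here so that comparisons with the threshold are exact.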

Relation to Young tableaux

Understanding why there are unrealizable Young diagrams as soon as $M > 2$ involves using combinatorial objects that are closely related to Young diagrams, called Young tableaux.

Definition 2

Given a Young diagram $D$ , a Young tableau $T$ is obtained by labeling the lattice points – or filling in the boxes – of $D$ with the integers $1, 2, \dots, | D |,$ such that each number occurs exactly once and such that the entries are increasing across each row (to the right) and across each column (to the top).

Here are two examples of Young tableaux that are distinct labeling of the same Young diagram:

Appendix 2—scheme 1

Just like Young diagrams, Young tableaux are naturally associated with perceptrons. The following arguments specify the correspondence between perceptrons and Young tableaux. Given a perceptron $(w, θ)$ with modularly ordered, non-negative weights, let us order all patterns in $C_{λ}$ by increasing level of perceptron activity. Specifically, set $J_{0} = C_{λ}$ and define iteratively for $k$, $0 \leq k < Λ$,

c_{k + 1}^{⋆} (w) = \arg\min_{c \in J_{k} (w)} w^{T} c, \quad J_{k + 1} (w) = J_{k} (w) ∖ \{c_{k + 1}^{⋆} (w)\} .

With no loss of generality, we can assume that all patterns achieve distinct levels of activity, so that there is a unique minimizer for all $k$, $0 \leq k < Λ$. With that assumption, the sequence $c_{k}^{⋆} (w)$, $1 \leq k \leq Λ$, enumerates unambiguously all patterns in $C_{λ}$ by increasing level of activity. The Young tableau associated with the perceptron $(w, θ)$, denoted by $T (w, θ)$, is then obtained by labeling the lattice points of the Young diagram $D (w, θ)$ by increasing level of activity as in the sequence $c_{k}^{⋆} (w)$, $1 \leq k \leq | D (w, θ) |$. One can check that such a labeling yields a tableau, as the resulting labels increase along each row (to the right) and each column (to the top). Within this framework, we say that a Young tableau $T^{'}$ is realizable if there is a perceptron $(w, θ)$ such that $T^{'} = T (w, θ)$. Finally, let us define the sequence of thresholds $θ_{k} (w)$, $0 \leq k \leq Λ + 1$, such that $θ_{0} (w) = - \infty$, $θ_{Λ + 1} (w) = \infty$, and for $0 < k \leq Λ$

θ_{k} (w) = \min_{c \in J_{k - 1} (w)} w^{T} c = w^{T} c_{k}^{⋆} (w) .

Then, observe that for all $k$, $0 \leq k \leq Λ$, the set of active patterns $J_{k} (w)$ is linearly separable for any threshold $θ$ satisfying $θ_{k} (w) \leq θ < θ_{k + 1} (w)$. In fact, the sequence $\{J_{k} (w)\}_{0 \leq k \leq Λ}$ represents all the linearly separable dichotomies realizable by changing the threshold of a perceptron with weight vector $w$. This fact will be useful to prove the following proposition, which justifies considering Young tableaux.
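
The threshold sweep described above can be sketched in a few lines. The weights below are hypothetical and the function names are ours. Patterns are represented by their 1-based phase indices, sorted by increasing activity, and the nested sets $J_{0} \supset J_{1} \supset \dots \supset J_{Λ}$ are obtained by removing one pattern at a time.

```python
from itertools import product

def activity_order(w):
    """Patterns (1-based phase tuples) sorted by increasing activity w^T c.
    Assumes all activities are distinct, as in the text."""
    ranges = [range(1, len(wm) + 1) for wm in w]
    act = {idx: sum(wm[i - 1] for wm, i in zip(w, idx)) for idx in product(*ranges)}
    assert len(set(act.values())) == len(act), "activities must be distinct"
    order = sorted(act, key=act.get)
    return order, [act[idx] for idx in order]

def threshold_sweep(w):
    """Sets J_0, ..., J_Lambda of active patterns obtained by raising the threshold."""
    order, _ = activity_order(w)
    return [set(order[k:]) for k in range(len(order) + 1)]

# Hypothetical modularly ordered weights for periods lambda = (3, 3).
w = [[0, 10, 25], [0, 7, 30]]
order, thresholds = activity_order(w)
J = threshold_sweep(w)
print(order[:4], thresholds[:4])
```

For a fixed weight vector, the sweep yields exactly $Λ + 1$ dichotomies, from all patterns active ($J_{0}$) to none ($J_{Λ}$).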

Proposition 5

All $M$ -dimensional Young diagrams are realizable if and only if all $(M - 1)$ -dimensional Young tableaux are realizable.

Observe that the above proposition does not mention the periods $λ_{1}, \dots, λ_{M}$. This is because the proposition deals with the correspondence between $M$-dimensional Young diagrams and $(M - 1)$-dimensional Young tableaux for all possible assignments of periods.

Proof. In this proof, we use primed notation for quantities relating to $M - 1$ modules and regular notation for quantities relating to $M$ modules. For instance, $λ$ denotes an arbitrary assignment of $M$ periods $(λ_{1}, \dots, λ_{M})$ and $λ^{'}$ denotes its first $M - 1$ components $(λ_{1}, \dots, λ_{M - 1})$. With this preamble, we give the ‘if’ part of the proof in $(i)$ and the ‘only if’ part in $(ii)$.

(i) Given an $(M - 1)$-dimensional Young tableau $T^{'}$ with diagram $D^{'}$, let us consider the smallest periods $λ^{'}$ such that $D^{'} \subset \{1, \dots, λ_{1}\} \times \dots \times \{1, \dots, λ_{M - 1}\}$. The ‘if’ part of the proof will follow from showing that if all $(M - 1)$-dimensional tableaux $T^{'}$ with Young diagram $D^{'}$ are realizable, then all $M$-dimensional Young diagrams whose restriction to $\{1, \dots, λ_{1}\} \times \dots \times \{1, \dots, λ_{M - 1}\} \times \{1\}$ is $D^{'}$ are realizable. To prove this property, observe that all the $M$-dimensional Young diagrams with restriction $D^{'}$ are obtained as finite sequences of $(M - 1)$-dimensional Young diagrams $D^{'} = D_{1}^{'} \supset D_{2}^{'} \supset \dots \supset D_{λ_{M}}^{'}$, for some $λ_{M}$ specifying the minimum period in the Mth dimension. For all such sequences, consider a tableau $T^{'}$ labeling $D^{'}$ such that for all $i$, $1 \leq i \leq λ_{M} - 1$, the labels of $D_{i + 1}^{'}$ are smaller than the labels of $D_{i}^{'} ∖ D_{i + 1}^{'}$. Such a tableau always exists because of the nested property of the sequence of diagrams $D_{i}^{'}$, $1 \leq i \leq λ_{M}$. Now, suppose that the Young tableau $T^{'}$ is realizable. This means that there is a perceptron $(w^{'}, θ^{'})$ acting on the grid-like inputs in $C_{λ^{'}}$ such that $T^{'} = T (w^{'}, θ^{'})$. With no loss of generality, the weight vector $w^{'}$ specifies a sequence of patterns $c_{k}^{⋆} (w^{'})$, $1 \leq k \leq Λ^{'}$, and a sequence of thresholds $θ_{k} (w^{'})$, $1 \leq k \leq Λ^{'}$, such that $(1)$ the sequence $c_{k}^{⋆} (w^{'})$ enumerates the elements of $C_{λ^{'}}$ by increasing level of activity and $(2)$ for all $0 \leq k \leq | D^{'} |$, the set of active patterns $J_{k} (w^{'})$ defined in (29) is linearly separable if and only if $θ_{k} (w^{'}) \leq θ < θ_{k + 1} (w^{'})$. Then by construction, the diagrams $D_{i}^{'}$, $1 \leq i \leq λ_{M}$, are realized by a perceptron $(w^{'}, θ_{i}^{'})$, where every $θ_{i}^{'} \geq θ^{'}$ is such that $θ_{Λ - | D_{i}^{'} |} (w^{'}) < θ_{i}^{'} < θ_{Λ - | D_{i}^{'} | + 1} (w^{'})$. We are now in a position to construct an $M$-module perceptron $(w, θ^{'})$ realizing the sequence $D^{'} = D_{1}^{'} \supset D_{2}^{'} \supset \dots \supset D_{λ_{M}}^{'}$. To do so, it is enough to specify the components $w_{M, 1}, \dots, w_{M, λ_{M}}$ of the Mth module of a weight vector $w$ since the other components will coincide with $w^{'}$. One can check that choosing $w_{M, i} = θ_{i}^{'} - θ^{'}$ defines an admissible increasing sequence of non-negative weights.

(ii) For the ‘only if’ part, let us consider an arbitrary $(M - 1)$-dimensional Young tableau $T^{'}$, with diagram $D^{'}$ such that $| D^{'} | = p$. Then let us consider the $M$-dimensional Young diagram $D$ obtained via the sequence of $(M - 1)$-dimensional diagrams $D^{'} = D_{1}^{'} \supset D_{2}^{'} \supset \dots \supset D_{p}^{'}$, where for all $q$, $1 \leq q < p$, $D_{q}^{'} ∖ D_{q + 1}^{'}$ is a singleton containing the lattice point labeled by $p - q + 1$. Moreover, let us consider the smallest periods $λ$ such that $D \subset \{1, \dots, λ_{1}\} \times \dots \times \{1, \dots, λ_{M}\}$. Now, suppose that all $M$-dimensional Young diagrams are realizable. Then, there is a perceptron $(w, θ)$ acting on $C_{λ}$ with modularly ordered, non-negative weights such that $D = D (w, θ)$. This means that for all $q$, $1 \leq q \leq p$, the diagram $D_{q}^{'}$ is realized by the perceptron $(w^{'}, θ - w_{M, q})$, where $w^{'}$ collects the components of $w$ that correspond to the first $M - 1$ modules. Then, let us consider the pattern $c_{q}$ represented by the lattice point in the singleton $D_{q}^{'} ∖ D_{q + 1}^{'}$. Remember that a pattern $c$ is identified with the lattice point $(i_{1}, \dots, i_{M})$, whose coordinates are given by the phase of the active neuron within each module. Then, by the increasing property of the weights, we necessarily have $θ - w_{M, q + 1} \leq w^{' T} c_{q} < θ - w_{M, q}$, which implies that the Young tableau $T^{'}$ is realized by the perceptron $(w^{'}, θ - w_{M, 1})$.

It is straightforward to check that all 1D Young tableaux are realizable, so that all 2D Young diagrams are realizable. However, the following counterexample shows that not all 2D Young tableaux are realizable, so that $M$-dimensional Young diagrams with $M > 2$ are not all realizable.

Counterexample 1. The 2D Young tableau defined as

T = \begin{array}{ccc} 4 & 8 & 9 \\ 3 & 5 & 7 \\ 1 & 2 & 6 \end{array}

is not realizable.

Proof. Suppose there is a perceptron with modularly ordered, non-negative weight vector $w = (w_{1, 1}, w_{1, 2}, w_{1, 3}, w_{2, 1}, w_{2, 2}, w_{2, 3})$ realizing $T$. By convention, we consider that the first module corresponds to the horizontal axis and the second module corresponds to the vertical axis. The labeling of $T$ implies order relations among read-out activities via $w$. Specifically, the activities can be listed by increasing order as $w_{1, 1} + w_{2, 1} < w_{1, 2} + w_{2, 1} < w_{1, 1} + w_{2, 2} < w_{1, 1} + w_{2, 3} < \dots$. We are going to show by contradiction that such an order is impossible. To do so, let us introduce the weight differences $u_{1} = w_{1, 2} - w_{1, 1}$, $u_{2} = w_{1, 3} - w_{1, 2}$ associated with the first module and the weight differences $v_{1} = w_{2, 2} - w_{2, 1}$, $v_{2} = w_{2, 3} - w_{2, 2}$ associated with the second module. These differences satisfy incompatible order relations. Specifically, the sequence $2 \to 3$ in $T$ implies that the cost to go right from label 1, that is, $u_{1} = w_{1, 2} - w_{1, 1}$, is less than the cost to go up, that is, $v_{1} = w_{2, 2} - w_{2, 1}$. Otherwise, the label 2 would be on top of the label 1. Thus, we necessarily have $u_{1} < v_{1}$. The same reasoning for the sequence $4 \to 5$ implies $v_{2} < u_{1}$, so that we have $v_{2} < v_{1}$. The sequence $5 \to 6$ implies $v_{1} < u_{2}$, and the sequence $7 \to 8$ implies $u_{2} < v_{2}$, so that we have $v_{1} < v_{2}$. Thus, assuming that $T$ is realizable leads to considering weights for which $v_{2} < v_{1}$ and $v_{1} < v_{2}$, a contradiction.
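
As a sanity check of Counterexample 1 (not a substitute for the proof), one can sample modularly ordered weights at random and record which tableaux occur. In the sketch below, the sampling scheme, the function names, and the number of trials are our own choices. Integer weights keep all comparisons exact.

```python
import random

def tableau(w1, w2):
    """Label the 3 x 3 lattice by increasing activity w1[i] + w2[j].
    Rows are returned from top to bottom (French convention); the first
    module indexes the horizontal axis, the second the vertical axis."""
    cells = sorted((w1[i] + w2[j], i, j) for i in range(3) for j in range(3))
    label = {(i, j): k + 1 for k, (_, i, j) in enumerate(cells)}
    return tuple(tuple(label[(i, j)] for i in range(3)) for j in (2, 1, 0))

COUNTEREXAMPLE = ((4, 8, 9), (3, 5, 7), (1, 2, 6))

def sample_tableaux(trials=20000, seed=0):
    """Tableaux realized by randomly drawn, modularly ordered integer weights."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(trials):
        w1 = sorted(rng.sample(range(1000), 3))
        w2 = sorted(rng.sample(range(1000), 3))
        if len({a + b for a in w1 for b in w2}) < 9:
            continue  # skip ties so that the ordering is unambiguous
        seen.add(tableau(w1, w2))
    return seen

seen = sample_tableaux()
print(len(seen), "distinct tableaux sampled;",
      "counterexample found:", COUNTEREXAMPLE in seen)
```

Random sampling can only ever exhibit realizable tableaux, so the absence of the counterexample from the sample is consistent with, but does not establish, its unrealizability.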

Linearly separable dichotomies for realizable Young diagrams

Consider an $M$-dimensional Young diagram $D \subset \{1, \dots, λ_{1}\} \times \dots \times \{1, \dots, λ_{M}\}$ that can be realized by a perceptron with modularly ordered, non-negative weights. Such a Young diagram $D$ is the lattice set whose points represent the phase indices of inactive grid-like inputs. Indeed, if $(i_{1}, \dots, i_{M}) \in D$, we have $\sum_{m = 1}^{M} w_{m, i_{m}} \leq θ$, which means that the perceptron is inactive for the grid-like input $c$ in $C_{λ}$ obtained by setting $c_{m, i_{m}} = 1$ for all $1 \leq m \leq M$. Thus, the perceptron implements the dichotomy for which the inactive grid-like inputs are exactly represented by $D$. Are there more dichotomies associated with $D$? Answering this question requires revisiting the correspondence between perceptrons and Young diagrams. The key property in establishing this correspondence is the assumption of modularly ordered weights. In Section B.1, we justified that such an assumption incurs no loss of generality by permutation invariance of the grid cells within each module. Thus, each Young diagram $D$ is in fact associated with the class of perceptrons

\{(P w, θ) \mid D = D (w, θ), P \in Π_{λ}\},

where $Π_{λ}$ denotes the set of permutation matrices stabilizing the modules of periods $λ$. Clearly, for $P \neq P^{'}$, the perceptron $(P w, θ)$ generally implements a dichotomy distinct from that of $(P^{'} w, θ)$. As a result, there is a class of dichotomies indexed by the Young diagram $D$, which we denote by $C (D)$.

Evaluating the cardinality of $C (D)$ via simple combinatorial arguments first requires a crude description of the geometry of $D$ , and specifically of its degenerate symmetries. For all $1 \leq m \leq M$ , $1 \leq i \leq λ_{m}$ , let us denote the restriction of $D$ to the hyperplane $i_{m} = i$ by

R_{m, i} (D) = \{(i_{1}, \dots, i_{M}) \in D \mid i_{m} = i\} .

By definition of Young diagrams, we have $R_{m, i} (D) \supset R_{m, i + 1} (D)$ for all $1 \leq i < λ_{m}$. We say that a Young diagram exhibits a degenerate symmetry along the mth dimension whenever two consecutive restrictions coincide: $R_{m, i} (D) = R_{m, i + 1} (D)$. To make the notion of degeneracy more precise, let us consider the equivalence relation on $\{1, \dots, λ_{m}\}$ defined by $i \sim j \Leftrightarrow R_{m, i} (D) = R_{m, j} (D)$. Given $i$ in $\{1, \dots, λ_{m}\}$, the equivalence class of $i$ is then $\{j \in \{1, \dots, λ_{m}\} \mid R_{m, i} (D) = R_{m, j} (D)\}$. Let us denote the total number of such equivalence classes by $k_{m}$, $1 \leq k_{m} \leq λ_{m}$. Then, the set $\{1, \dots, λ_{m}\}$ can be partitioned into $k_{m}$ classes, $C_{m, 1}, \dots, C_{m, k_{m}}$, where the classes are listed by decreasing order of Young diagrams. For instance, $C_{m, 1}$ comprises all the indices for which the restriction along the mth dimension yields the same Young diagram as $R_{m, 1} (D)$. We denote the cardinality of the thus-ordered equivalence classes by $σ_{m, k} = | C_{m, k} |$, $1 \leq k \leq k_{m}$, so that we have $λ_{m} = σ_{m, 1} + \dots + σ_{m, k_{m}}$. We refer to the $σ_{m, k}$ as the degeneracy indices. Degenerate symmetries correspond to degeneracy indices $σ_{m, k} > 1$. We are now in a position to determine the cardinality of $C (D)$:

Proposition 6

For integer periods $λ_{1}, \dots, λ_{M}$ , let us consider a realizable Young diagram $D$ in ${1, \dots, λ_{1}} \times \dots \times {1, \dots, λ_{M}}$ . Then, the class of linearly separable dichotomies with Young diagram $D$ , denoted by $C (D)$ , has cardinality

| C (D) | = \prod_{m = 1}^{M} \frac{λ_{m}!}{σ_{m, 1}! \dots σ_{m, k_{m}}!} ,

where $σ_{m, k}$, $1 \leq k \leq k_{m}$, are the degeneracy indices of the Young diagram along the mth dimension.

Proof. A dichotomy is specified by enumerating the set of inactive grid-like inputs $c$ in $C_{λ}$. Each pattern $c$ can be conveniently represented as a lattice point in $\{1, \dots, λ_{1}\} \times \dots \times \{1, \dots, λ_{M}\}$ by considering the phase indices of the active cell in the $M$ modules of pattern $c$. Thus, a generic dichotomy is just a configuration of lattice points in $\{1, \dots, λ_{1}\} \times \dots \times \{1, \dots, λ_{M}\}$. The class of dichotomies $C (D)$ comprises all lattice-point configurations in $\{1, \dots, λ_{1}\} \times \dots \times \{1, \dots, λ_{M}\}$ obtained by permutations of the indices along the $M$ dimensions:

C (D) = \{π_{1} \dots π_{M} D \mid π_{1} \in S_{λ_{1}}, \dots, π_{M} \in S_{λ_{M}}\},

where we define

π_{1} \dots π_{M} D = \{(π_{1} (i_{1}), \dots, π_{M} (i_{M})) \mid (i_{1}, \dots, i_{M}) \in D\},

and where $S_{λ_{m}}$ denotes the set of permutations of $\{1, \dots, λ_{m}\}$. Let us denote a generic lattice-point configuration in $\{1, \dots, λ_{1}\} \times \dots \times \{1, \dots, λ_{M}\}$ by $S$. By permuting the indices of the points in $S$, each transformation $π_{m}$ is actually permuting $R_{m, i} (S)$, $1 \leq i \leq λ_{m}$, the restrictions of the lattice-point configuration along the mth dimension. The partial order defined by inclusion is preserved by permutations in the sense that given $π_{m}$ in $S_{λ_{m}}$, $1 \leq m \leq M$, we have $R_{m, π_{m} (i)} (π_{1} \dots π_{M} S) \subset R_{m, π_{m} (j)} (π_{1} \dots π_{M} S)$ if and only if $R_{m, i} (S) \subset R_{m, j} (S)$. In particular, $k_{m}$, the number of restriction classes induced by the relation $i \sim j \Leftrightarrow R_{m, i} (S) = R_{m, j} (S)$, is invariant to permutations, and so are their cardinalities. These cardinalities specify the degeneracy indices $σ_{m, 1}, \dots, σ_{m, k_{m}}$ of $S$ along the mth dimension. Thus, all configurations $S$ obtained via permutation of $D$ have the same degeneracy indices as $D$. Moreover, for a Young diagram $D$, these degeneracy indices simply give the sizes of the equivalence classes formed by identical restrictions along the same dimension. Thus, the number of dichotomies in $C (D)$ is determined as the number of ways to independently assign the indices $\{1, \dots, λ_{m}\}$ to $k_{m}$ restriction classes of size $σ_{m, 1}, \dots, σ_{m, k_{m}}$ for all $m$, $1 \leq m \leq M$. For each $m$, this number is given by the multinomial coefficient $λ_{m}! / (σ_{m, 1}! \dots σ_{m, k_{m}}!)$.
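
The counting formula of Proposition 6 can be compared with a brute-force enumeration on small examples. The sketch below is illustrative: the function names are ours, and restrictions along a dimension are compared after dropping that coordinate, which is our reading of the equality $R_{m, i} (D) = R_{m, j} (D)$.

```python
from collections import Counter
from itertools import permutations, product
from math import factorial

def degeneracy_indices(D, periods, m):
    """Sizes of the classes of indices i with identical restrictions R_{m,i}(D).
    Restrictions are compared after dropping coordinate m; empty ones form a class."""
    def restriction(i):
        return frozenset(p[:m] + p[m + 1:] for p in D if p[m] == i)
    return list(Counter(restriction(i) for i in range(1, periods[m] + 1)).values())

def class_size(D, periods):
    """|C(D)| as a product over modules of multinomial coefficients."""
    total = 1
    for m, lam in enumerate(periods):
        denom = 1
        for s in degeneracy_indices(D, periods, m):
            denom *= factorial(s)
        total *= factorial(lam) // denom
    return total

def class_size_brute_force(D, periods):
    """Number of distinct images of D under independent permutations of each axis."""
    axes = [list(permutations(range(1, lam + 1))) for lam in periods]
    images = set()
    for perms in product(*axes):
        images.add(frozenset(tuple(pi[x - 1] for pi, x in zip(perms, p)) for p in D))
    return len(images)

# L-shaped diagram of size 3 for periods (3, 3): 0-based axis index m, 1-based phases.
D = {(1, 1), (2, 1), (1, 2)}
print(class_size(D, (3, 3)), class_size_brute_force(D, (3, 3)))
```

The brute-force count is only practical for small periods, since it visits $\prod_{m} λ_{m}!$ permutations.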

As opposed to the case of random configurations in general position, the many symmetries of the grid-like inputs in $C_{λ}$ allow one to enumerate dichotomies of specific cardinalities. We define the cardinality of a dichotomy as the size of the set of active patterns it separates. Thus, a perceptron $(w, θ)$ realizing a $k$-dichotomy is one for which exactly $k$ patterns $c$ in $C_{λ}$ are such that $w^{T} c > θ$. Proposition 7 reduces the problem of counting realizable $k$-dichotomies to that of enumerating realizable Young diagrams $D$ of size $| D | = k$. Such an enumeration depends on the number of modules $M$, which sets the dimensionality of the Young diagrams, as well as on the periods $λ_{m}$, $1 \leq m \leq M$. Unfortunately, even without considering the constraint of being a realizable Young diagram, there is no convenient way to enumerate Young diagrams of fixed size for general dimension $M$. However, for low cardinality, for example, $k \leq 5$, there are only a few Young diagrams such that $| D | = k$, and it turns out that all of them are realizable. In the following, and without aiming at exhaustiveness, we exploit the latter fact to characterize the sets of $k$-dichotomies for $k \leq 5$ and to compute their cardinalities.

There are $M$ possible $M$-dimensional Young diagrams of size 2, according to the dimension along which the two lattice points are positioned. The Young diagram extending along the mth dimension, $1 \leq m \leq M$, has degeneracy indices $σ_{m, 1} = 2$ and $σ_{m, 2} = λ_{m} - 2$, and $σ_{n, 1} = 1$ and $σ_{n, 2} = λ_{n} - 1$ for $n \neq m$. As a result, the number of 2-dichotomies of grid-like inputs is given by

N_{2} = \sum_{m = 1}^{M} (\prod_{n \neq m} \frac{λ_{n}!}{1! (λ_{n} - 1)!}) \frac{λ_{m}!}{2! (λ_{m} - 2)!} = \frac{1}{2} \sum_{m = 1}^{M} λ_{m} (λ_{m} - 1) (\prod_{n \neq m} λ_{n}) .

There are two types of Young diagrams of size 3: type $(3a)$, for which the three lattice points span one dimension, and type $(3b)$, for which the lattice points span two dimensions. There are $M$ possible $M$-dimensional Young diagrams of type $(3a)$. The degeneracy indices for the Young diagram extending along the mth dimension, $1 \leq m \leq M$, are $σ_{m, 1} = 3$ and $σ_{m, 2} = λ_{m} - 3$, and $σ_{n, 1} = 1$ and $σ_{n, 2} = λ_{n} - 1$ for $n \neq m$, yielding

N_{3 a} = \sum_{m = 1}^{M} (\prod_{n \neq m} \frac{λ_{n}!}{1! (λ_{n} - 1)!}) \frac{λ_{m}!}{3! (λ_{m} - 3)!} = \frac{1}{6} \sum_{m} (\prod_{n \neq m} λ_{n}) λ_{m} (λ_{m} - 1) (λ_{m} - 2) .

There are $M (M - 1) / 2$ possible $M$ -dimensional Young diagrams of type $(3 b)$ , as many as choices of two dimensions among $M$ . The degeneracy indices of the Young diagram extending along dimensions $m$ and $n$ , $1 \leq m < n \leq M$ , are $σ_{m, 1} = σ_{m, 2} = 1$ and $σ_{m, 3} = λ_{m} - 2$ , $σ_{n, 1} = σ_{n, 2} = 1$ and $σ_{n, 3} = λ_{n} - 2$ , and $σ_{k, 1} = 1$ and $σ_{k, 2} = λ_{k} - 1$ for $k \neq m, n$ , yielding

N_{3 b} = \sum_{1 \leq m < n \leq M} (\prod_{k \neq m, n} \frac{λ_{k}!}{1! (λ_{k} - 1)!}) \frac{λ_{m}!}{1! 1! (λ_{m} - 2)!} \frac{λ_{n}!}{1! 1! (λ_{n} - 2)!}

= \frac{1}{2} \sum_{n \neq m} (\prod_{k \neq m, n} λ_{k}) λ_{m} (λ_{m} - 1) λ_{n} (λ_{n} - 1) .

As a result, the number of 3-dichotomies of grid-like inputs is given by

N_{3} = N_{3 a} + N_{3 b} = \frac{1}{2} \prod_{m} λ_{m} (\sum_{n} (λ_{n} - 1) (\frac{λ_{n} - 2}{3} + \sum_{k \neq n} λ_{k} - M + 1)) .
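
As a sanity check on the counts above, the following Python sketch evaluates $N_{2}$ , $N_{3 a}$ , and $N_{3 b}$ from their binomial forms and compares $N_{3 a} + N_{3 b}$ with the closed form for $N_{3}$ . The function names are illustrative and not part of the original analysis, and the closed form is assumed to carry the constant $- M + 1$ inside the bracket, which is what summing the two type-specific counts gives.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, prod


def n2(periods):
    """2-dichotomies: two active patterns differing in a single module."""
    total = prod(periods)
    return sum((total // lam) * comb(lam, 2) for lam in periods)


def n3a(periods):
    """Type (3a): three active patterns differing in a single module."""
    total = prod(periods)
    return sum((total // lam) * comb(lam, 3) for lam in periods)


def n3b(periods):
    """Type (3b): an L-shaped triple spanning two modules (unordered pair)."""
    total = prod(periods)
    return sum(
        (total // (a * b)) * a * (a - 1) * b * (b - 1)
        for a, b in combinations(periods, 2)
    )


def n3_closed(periods):
    """Closed form for N_3, evaluated in exact rational arithmetic."""
    n_modules, period_sum = len(periods), sum(periods)
    bracket = sum(
        (lam - 1) * (Fraction(lam - 2, 3) + (period_sum - lam) - n_modules + 1)
        for lam in periods
    )
    return Fraction(prod(periods), 2) * bracket


for periods in [(2, 3), (2, 3, 5), (3, 4, 5, 7)]:
    assert n3a(periods) + n3b(periods) == n3_closed(periods)
```

For $λ = (2, 3)$ , this gives $N_{2} = 9$ and $N_{3} = 2 + 12 = 14$ .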

Appendix 2—figure 1

Multidimensional Young diagrams.

a. Lattice representations of the 2-dimensional Young diagrams of size 4, depicting the integer partitions of 4. b. Lattice representation of a 3-dimensional Young diagram with two 2-dimensional Young diagrams defined as horizontal restrictions.

Appendix 2—figure 2

Linearly separable 4-dichotomies.

Top: there are four possible Young diagrams a, b, c, and d, of size 4, spanning at most three dimensions. Lattice points lying along the mth dimension represent grid-like inputs in $C_{λ}$ whose coordinates only differ in the mth module. Bottom: Graphical edge structure arising from embedding a Young diagram within $H (C_{λ})$ , the convex polytope defined by grid-like inputs.

A similar analysis reveals that there are four types of Young diagrams of size 4, which span up to three dimensions if $M \geq 3$ . These Young diagrams, denoted by $(4 a)$ , $(4 b)$ , $(4 c)$ , and $(4 d)$ , are represented in Figure 6, where degeneracy indices can be read graphically. As a result, the number of 4-dichotomies of grid-like inputs is given by $N_{4} = N_{4 a} + N_{4 b} + N_{4 c} + N_{4 d}$ , where the numbers of type-specific dichotomies are given by

N_{4 a} = \frac{1}{24} \sum_{m = 1}^{M} (\prod_{n \neq m} λ_{n}) λ_{m} (λ_{m} - 1) (λ_{m} - 2) (λ_{m} - 3),

N_{4 b} = \frac{1}{2} \sum_{m \neq n} (\prod_{k \neq m, n} λ_{k}) λ_{m} (λ_{m} - 1) (λ_{m} - 2) λ_{n} (λ_{n} - 1),

N_{4 c} = \frac{1}{4} \sum_{1 \leq m < n \leq M} (\prod_{k \neq m, n} λ_{k}) λ_{m} (λ_{m} - 1) λ_{n} (λ_{n} - 1),

N_{4 d} = \sum_{1 \leq m < n < k \leq M} (\prod_{l \neq m, n, k} λ_{l}) λ_{m} (λ_{m} - 1) λ_{n} (λ_{n} - 1) λ_{k} (λ_{k} - 1) .
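
The four type-specific counts can be evaluated numerically with a short Python sketch. It assumes that the sum for $(4 b)$ runs over ordered pairs of distinct modules (three points along $m$ , one extra point along $n$ ), the sum for $(4 c)$ over unordered pairs, and the sum for $(4 d)$ over unordered triples; the function name `n4` is illustrative.

```python
from itertools import combinations, permutations
from math import comb, prod


def n4(periods):
    """Sum of the four type-specific counts of 4-dichotomies."""
    total = prod(periods)
    # (4a): four active patterns differing in a single module.
    n4a = sum((total // a) * comb(a, 4) for a in periods)
    # (4b): three patterns along module m plus one along module n (ordered pair).
    n4b = sum(
        (total // (a * b)) * a * (a - 1) * (a - 2) * b * (b - 1)
        for a, b in permutations(periods, 2)
    ) // 2
    # (4c): a 2-by-2 square spanning two modules (unordered pair).
    n4c = sum(
        (total // (a * b)) * a * (a - 1) * b * (b - 1)
        for a, b in combinations(periods, 2)
    ) // 4
    # (4d): a corner with one arm along each of three modules (unordered triple).
    n4d = sum(
        (total // (a * b * c)) * a * (a - 1) * b * (b - 1) * c * (c - 1)
        for a, b, c in combinations(periods, 3)
    )
    return n4a + n4b + n4c + n4d
```

For $λ = (2, 3)$ there are six patterns, so the complement of a 4-dichotomy is a 2-dichotomy, and `n4((2, 3))` indeed returns 9, the value of $N_{2}$ for these periods.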

The classification of dichotomies via Young diagrams also illuminates the geometrical structure of linearly separable $k$ -dichotomies, at least for small $k$ . In particular, 2-dichotomies are linearly separable if they involve two lattice points forming an edge of the convex polytope, that is, if these points correspond to patterns in $C_{λ}$ whose coordinates only differ in one module. Similarly, 3-dichotomies are linearly separable if and only if $(3 a)$ they involve three lattice points representing patterns in $C_{λ}$ whose coordinates only differ in one module or $(3 b)$ they involve two pairs of lattice points representing patterns in $C_{λ}$ whose coordinates only differ in one module. Thus, $(3 a)$ corresponds to the case of three lattice points specifying a clique of convex-polytope edges, while $(3 b)$ corresponds to the case of three lattice points specifying two convex-polytope edges. We illustrate the four geometrical structures of the linearly separable 4-dichotomies in Figure 6.

Numbers of dichotomies for two modules

For two modules with periods $λ_{1}$ and $λ_{2}$ , recall that each grid pattern in $C_{λ}$ is a $(λ_{1} + λ_{2})$ -dimensional vector, which is entirely specified by the indices of its two active neurons: $(i, j)$ , $1 \leq i \leq λ_{1}$ , $1 \leq j \leq λ_{2}$ . Thus, it is convenient to consider a set of grid patterns as a collection of points in the discrete lattice $\{1, \dots, λ_{1}\} \times \{1, \dots, λ_{2}\}$ . From Proposition 4, we know that linearly separable dichotomies are made of those sets of grid patterns in $C_{λ}$ for which a Young diagram can be formed via permutations of rows and columns in the lattice (see Figure 7). By convention, we consider that the marked lattice points forming a Young diagram define the set of active grid patterns. The remaining unmarked lattice points define the set of inactive grid patterns. To each 2D Young diagram in the lattice $\{1, \dots, λ_{1}\} \times \{1, \dots, λ_{2}\}$ corresponds a class of linearly separable dichotomies. Counting the total number of linearly separable dichotomies when $M = 2$ will proceed in two steps: (i) we first give a slightly stronger result than Proposition about the cardinality of the classes of dichotomies associated with a Young diagram, and (ii) we evaluate the total number of dichotomies by summing class cardinalities over the set of Young diagrams.

Proposition 7

For two integer periods $λ_{1}$ and $λ_{2}$ , let us consider a Young diagram $D$ in the lattice $\{1, \dots, λ_{1}\} \times \{1, \dots, λ_{2}\}$ . Without loss of generality, $D$ can be specified via the degeneracy indices $σ_{1, 1}, \dots, σ_{1, k}$ and $σ_{2, 1}, \dots, σ_{2, k}$ , chosen such that

D \text{ has } σ_{1, i} \text{ rows of length } \sum_{j = 1}^{k + 1 - i} σ_{2, j} \iff D \text{ has } σ_{2, j} \text{ columns of length } \sum_{i = 1}^{k + 1 - j} σ_{1, i} .

Then, the class of linearly separable dichotomies with Young diagram $D$ , denoted by $C (D)$ , has cardinality

| C (D) | = \frac{λ_{1}!}{σ_{1, 1}! \dots σ_{1, k + 1}!} \frac{λ_{2}!}{σ_{2, 1}! \dots σ_{2, k + 1}!},

where we have $σ_{1, 1} + \dots + σ_{1, k + 1} = λ_{1}$ and $σ_{2, 1} + \dots + σ_{2, k + 1} = λ_{2}$ .

Appendix 2—figure 3

Counting 2-module Young diagrams.

Linearly separable dichotomies (left panel) can be associated to a unique Young diagram (middle panel). These Young diagrams are entirely specified by their frontier path, separating active positions from inactive ones. Enumerating all possible frontier paths allows one to count all the linearly separable dichotomies for two modules.

Proof. Consider a Young diagram $D$ in $\{1, \dots, λ_{1}\} \times \{1, \dots, λ_{2}\}$ with $p$ inactive patterns. The diagram $D$ is uniquely defined by the row partition $p = r_{1} + \dots + r_{λ_{1}}$ , $r_{1} \geq \dots \geq r_{λ_{1}}$ , where $r_{i}$ denotes the occupancy of row $i$ , or equivalently by the column partition $p = s_{1} + \dots + s_{λ_{2}}$ , $s_{1} \geq \dots \geq s_{λ_{2}}$ , where $s_{j}$ denotes the occupancy of column $j$ . The occupancies $\{r_{1}, \dots, r_{λ_{1}}\}$ and $\{s_{1}, \dots, s_{λ_{2}}\}$ entirely define restrictions along each dimension, and each set of occupancies along a dimension is invariant to row and column permutations. The corresponding degeneracy indices can be determined straightforwardly by counting the number of rows or columns with a given occupancy, that is, within a given equivalence class. Denoting the necessarily identical number of row classes and column classes by $k \leq \min (λ_{1}, λ_{2})$ , Proposition yields directly the announced result.
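
The class cardinality $| C (D) |$ can be computed directly from the row lengths of a diagram. The Python sketch below is a minimal illustration under our own conventions: a diagram is passed as a nonincreasing tuple of row lengths, the degeneracy indices are obtained as multiplicities of equal row (or column) lengths, and the helper `diagrams` (a hypothetical name) enumerates every Young diagram fitting in the lattice.

```python
from collections import Counter
from math import factorial


def class_size(row_lengths, lam1, lam2):
    """|C(D)| for the diagram D with the given nonincreasing row lengths.

    D lives in a lam1-by-lam2 lattice; rows that are not listed are empty.
    """
    rows = list(row_lengths) + [0] * (lam1 - len(row_lengths))
    # Column lengths are given by the conjugate partition.
    cols = [sum(1 for r in rows if r > j) for j in range(lam2)]
    size = factorial(lam1) * factorial(lam2)
    # Degeneracy indices: multiplicities of each distinct row/column length.
    multiplicities = list(Counter(rows).values()) + list(Counter(cols).values())
    for multiplicity in multiplicities:
        size //= factorial(multiplicity)
    return size


def diagrams(lam1, lam2, cap=None):
    """Yield every Young diagram in the lattice as a tuple of lam1 row lengths."""
    cap = lam2 if cap is None else cap
    if lam1 == 0:
        yield ()
        return
    for first in range(cap, -1, -1):
        for rest in diagrams(lam1 - 1, lam2, first):
            yield (first,) + rest
```

For example, `class_size((2, 1), 2, 3)` returns 12, the number of L-shaped 3-dichotomies for $λ = (2, 3)$ , and summing `class_size` over `diagrams(2, 3)` gives 46 linearly separable dichotomies in total.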

Proposition 8

For two integer periods $λ_{1}$ and $λ_{2}$ , the number of linearly separable dichotomies in $C_{(λ_{1}, λ_{2})}$ is

N_{λ_{1}, λ_{2}} = \sum_{k = 0}^{min (λ_{1}, λ_{2})} (k!)^{2} S (λ_{1} + 1, k + 1) S (λ_{2} + 1, k + 1) = B_{λ_{2}}^{(- λ_{1})},

where $S (n, k)$ denotes the Stirling numbers of the second kind and where $B_{n}^{(k)}$ denotes the poly-Bernoulli numbers.

Proof. Our goal is to evaluate the total number of dichotomies $N_{λ_{1}, λ_{2}}$ . To achieve this goal, we will exploit the combinatorics of 2D Young diagrams to specify $N_{λ_{1}, λ_{2}}$ as

N_{λ_{1}, λ_{2}} = \sum_{D \subset \{1, \dots, λ_{1}\} \times \{1, \dots, λ_{2}\}} | C (D) |,

where $D$ runs over all possible Young diagrams. Because of the multinomial nature of the cardinalities $| C (D) |$ , it is advantageous to adopt an alternative representation for Young diagrams. This alternative representation will require utilizing the frontier of a Young diagram. Given a Young diagram $D$ with $k$ distinct nonempty rows and $k$ distinct nonempty columns, we define its frontier as the path joining the lattice points $(0, λ_{2})$ and $(λ_{1}, 0)$ , via lattice positions in $D$ separating the active region from the inactive region (see Figure 7). Such a path is uniquely defined via $k + 1$ downward steps of sizes $σ_{1, k + 1}, \dots, σ_{1, 1}$ and $k + 1$ rightward steps of sizes $σ_{2, 1}, \dots, σ_{2, k + 1}$ , which satisfy $σ_{1, 1} + \dots + σ_{1, k + 1} = λ_{1}$ and $σ_{2, 1} + \dots + σ_{2, k + 1} = λ_{2}$ . Clearly, the frontier of $D$ determines the cardinality of $C (D)$ via (46). To evaluate $N_{λ_{1}, λ_{2}}$ in (48), we partition Young diagrams based on $k$ , the number of distinct row and column sizes. For $k = 0$ , we have $σ_{1, 1} = λ_{1}$ and $σ_{2, 1} = λ_{2}$ , corresponding to $N_{λ_{1}, λ_{2}} (0) = 1$ Young diagram, the empty diagram, where all patterns are inactive. For $k = 1$ , there is a single row and column size, corresponding to Young diagrams where the active patterns are arranged in a rectangle, with edge lengths $σ_{1, 1}$ and $σ_{2, 1}$ . Nonempty rectangular diagrams correspond to $σ_{1, 1} > 0$ and $σ_{2, 1} > 0$ , and thus contribute

N_{λ_{1}, λ_{2}} (1) = \sum_{σ_{1, 1} = 1}^{λ_{1}} \sum_{σ_{2, 1} = 1}^{λ_{2}} \frac{λ_{1}!}{σ_{1, 1}! (λ_{1} - σ_{1, 1})!} \frac{λ_{2}!}{σ_{2, 1}! (λ_{2} - σ_{2, 1})!}

= (\sum_{σ_{1, 1} = 0}^{λ_{1}} \frac{λ_{1}!}{σ_{1, 1}! (λ_{1} - σ_{1, 1})!} - 1) (\sum_{σ_{2, 1} = 0}^{λ_{2}} \frac{λ_{2}!}{σ_{2, 1}! (λ_{2} - σ_{2, 1})!} - 1) = (2^{λ_{1}} - 1) (2^{λ_{2}} - 1),

to the sum (48). The contribution of diagrams with general $k$ -frontier, denoted by $N_{λ_{1}, λ_{2}} (k)$ , follows from the multinomial theorem, where one ensures that frontiers with less than $k + 1$ downward and rightward steps do not get repeated. These $k$ -frontiers correspond to $k + 1$ sequences of downward and rightward steps for which no step has zero size, except possibly for the first downward step emanating from $(0, λ_{2})$ and the last rightward step arriving at $(λ_{1}, 0)$ . Under these conditions, the downward and rightward steps can be chosen independently, so that we can write $N_{λ_{1}, λ_{2}} (k) = f_{k} (λ_{1}) f_{k} (λ_{2})$ , where the factors $f_{k} (λ_{1})$ and $f_{k} (λ_{2})$ only depend on the downward steps and rightward steps, respectively. Let us focus on the downward steps alone, that is, on the term $f_{k} (λ_{1})$ . The admissible sequences of steps satisfy $σ_{1, 1} + \dots + σ_{1, k + 1} = λ_{1}$ , with $σ_{1, 1}, \dots, σ_{1, k} \neq 0$ . From the multinomial theorem, we have

(k + 1)^{λ_{1}} = \sum_{\begin{matrix} σ_{1, 1} + \dots + σ_{1, k + 1} = λ_{1} \\ σ_{1, 1} \dots σ_{1, k} \neq 0 \end{matrix}} \frac{λ_{1}!}{σ_{1, 1}! \dots σ_{1, k + 1}!} + \sum_{\begin{matrix} σ_{1, 1} + \dots + σ_{1, k + 1} = λ_{1} \\ σ_{1, 1} \dots σ_{1, k} = 0 \end{matrix}} \frac{λ_{1}!}{σ_{1, 1}! \dots σ_{1, k + 1}!},

where the first term of the right-hand side is $f_{k} (λ_{1})$ and the second term of the right-hand side collects the contribution of sequences that are not $k$ -frontiers. The latter term can be evaluated explicitly via the inclusion-exclusion principle, yielding

\sum_{\begin{matrix} σ_{1, 1} + \dots + σ_{1, k + 1} = λ_{1} \\ σ_{1, 1} \dots σ_{1, k} = 0 \end{matrix}} \frac{λ_{1}!}{σ_{1, 1}! \dots σ_{1, k + 1}!} = \sum_{i = 1}^{k} (- 1)^{i - 1} (\binom{k}{i}) \sum_{\begin{matrix} σ_{1, 1} + \dots + σ_{1, k + 1} = λ_{1} \\ σ_{1, 1} = 0, \dots, σ_{1, i} = 0 \end{matrix}} \frac{λ_{1}!}{σ_{1, 1}! \dots σ_{1, k + 1}!},

= \sum_{i = 1}^{k} {(- 1)}^{i - 1} (\binom{k}{i}) \sum_{\begin{matrix} σ_{1, i + 1} + \dots + σ_{1, k + 1} = λ_{1} \end{matrix}} \frac{λ_{1}!}{σ_{1, i + 1}! \dots σ_{1, k + 1}!}

= \sum_{i = 1}^{k} (- 1)^{i - 1} (\binom{k}{i}) (k + 1 - i)^{λ_{1}},

where we have used the multinomial theorem for the last equality. Together with (51), the above equation allows one to specify $f_{k} (λ)$ in terms of the Stirling numbers of the second kind, denoted by $S (n, k)$ , as

f_{k} (λ) = \sum_{i = 0}^{k} {(- 1)}^{i} (\binom{k}{i}) {(k + 1 - i)}^{λ},

= \sum_{i = 0}^{k} (- 1)^{i} (\binom{k}{i}) \sum_{j = 0}^{λ} (\binom{λ}{j}) (k - i)^{j},

= \sum_{j = 0}^{λ} (\binom{λ}{j}) \sum_{i = 0}^{k} {(- 1)}^{i} (\binom{k}{i}) {(k - i)}^{j},

= k! \sum_{j = 0}^{λ} (\binom{λ}{j}) S (j, k),

= k! S (λ + 1, k + 1),

where the last equality follows from a well-known identity about Stirling numbers of the second kind. Then, the overall number of dichotomies follows from the fact that the frontier has at most $\min (λ_{1}, λ_{2})$ distinct values of row/column sizes, which implies

N_{λ_{1}, λ_{2}} = \sum_{k = 0}^{min (λ_{1}, λ_{2})} N_{λ_{1}, λ_{2}} (k) = \sum_{k = 0}^{min (λ_{1}, λ_{2})} (k!)^{2} S (λ_{1} + 1, k + 1) S (λ_{2} + 1, k + 1) = B_{λ_{2}}^{(- λ_{1})} .

where we have recognized the definition of the poly-Bernoulli numbers $B_{n}^{(k)}$ . These numbers are defined via the generating function

\frac{{Li}_{k} (1 - e^{- x})}{1 - e^{- x}} = \sum_{n = 0}^{\infty} B_{n}^{(k)} \frac{x^{n}}{n!},

where ${Li}_{k}$ denotes the poly-logarithm.
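
The two closed forms appearing in the proof can be checked numerically. The following Python sketch is an illustration only: `stirling2`, `f_k`, and `n_dichotomies` are hypothetical helper names, and the Stirling numbers are computed from their standard recurrence rather than from any library routine.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind S(n, k), via the standard recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def f_k(k, lam):
    """Inclusion-exclusion form of f_k(lambda)."""
    return sum((-1) ** i * comb(k, i) * (k + 1 - i) ** lam for i in range(k + 1))


def n_dichotomies(lam1, lam2):
    """Number of linearly separable dichotomies for two modules."""
    return sum(
        factorial(k) ** 2 * stirling2(lam1 + 1, k + 1) * stirling2(lam2 + 1, k + 1)
        for k in range(min(lam1, lam2) + 1)
    )


# The identity f_k(lambda) = k! S(lambda + 1, k + 1) holds on a small grid of values.
for lam in range(1, 8):
    for k in range(0, lam + 1):
        assert f_k(k, lam) == factorial(k) * stirling2(lam + 1, k + 1)
```

For instance, `n_dichotomies(2, 3)` returns 46 and `n_dichotomies(3, 3)` returns 230, and the function is symmetric in its two arguments, as expected of poly-Bernoulli numbers with negative upper index.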

Poly-Bernoulli numbers were originally introduced by Kaneko to enumerate the set of binary $k$ -by- $n$ matrices that are uniquely reconstructible from their row and column sums (Kaneko, 1997). The use of poly-Bernoulli numbers to enumerate permutations of Young tableaux was pioneered by Postnikov while investigating totally Grassmannian cells (Postnikov, 2006). While studying the asymptotics of the extremal excedance set statistic, de Andrade et al., 2015 obtained the asymptotics of the poly-Bernoulli numbers along the diagonal:

N_{λ, λ} = B_{λ}^{(- λ)} = (\frac{1}{\log 2 \sqrt{1 - \log 2}} + o (1)) \frac{(2 λ)!}{(2 \log 2)^{2 λ}} .

Appendix 3

Spatial embedding of the grid code

In this Appendix, we address the limitations entailed by spatially embedding grid-like inputs. First, we define the grid-cell-activity matrix that specifies the spatial assignment of grid-like inputs for 1D space. Second, we show that the contiguous-separating capacity, defined as the maximum spatial extent over which all possible dichotomies are linearly separable, is determined by the rank of the grid-cell-activity matrix. Third, we generalize our results about the separating capacity to spaces of arbitrary dimensions.

Grid-cell-activity matrix for 1D space

The fundamental object of our combinatorial analysis is the polytope whose vertices have all possible grid-cell patterns as coordinates. Thanks to the many symmetries of this polytope, we can enumerate linearly separable dichotomies of grid-like inputs. However, such an approach makes no explicit reference to the actual physical space that these grid-like inputs encode. Making this reference consists in specifying a mapping between spatial positions and grid-like inputs. Unfortunately, this generally involves breaking many of the polytope symmetries, precluding any combinatorial analysis. This is especially true if one considers spaces encoded by a subset of grid-cell patterns, as opposed to the full set $C_{λ}$ , a situation that leads to considering nonsymmetrical polytopes.

Let us explain this point by considering the case of a discrete 1D space where each position is marked by an integer in $ℤ$ . In this setting, positional information about $ℤ$ is encoded by $M$ modules of grid cells with integer periods $λ = (λ_{1}, \dots, λ_{M})$ . Recall that each module comprises $λ_{m}$ cells, each active at a distinct phase within the period $λ_{m}$ , and that the corresponding repertoire of grid-like inputs $C_{λ}$ has cardinality $Λ = \prod_{m = 1}^{M} λ_{m}$ . Because the spatial activity of grid cells is periodic and because we consider a finite number of grid cells, the mappings between spatial positions and grid-like inputs are necessarily periodic functions $A_{λ} : ℤ \to C_{λ}$ . Let us denote by $L$ the period of $A_{λ}$ . It is then convenient to consider the functions $A_{λ} : ℤ / L ℤ \to C_{λ}$ as matrices, called grid-cell-activity matrices, whose jth column is the pattern in $C_{λ}$ that encodes the jth spatial position in $\{1, \dots, L\}$ , seen as the element $j$ in $ℤ / L ℤ$ . In particular, the matrices $A_{λ}$ have $N = \sum_{m = 1}^{M} λ_{m}$ rows, each row corresponding to the periodic activity of a grid cell. Moreover, at every position $j$ , $1 \leq j \leq L$ , each module has a single active cell. For the sake of clarity, here follows a concrete example of a grid-cell-activity matrix for $λ = (2, 3, 5)$ :

A_{(2, 3, 5)} = (\begin{array}{ccccccccccccccccccccccccccccccc} 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & \dots & \dots & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & \dots & \dots & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & \dots & \dots & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & \dots & \dots & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & \dots & \dots & 0 & 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & \dots & \dots & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & \dots & \dots & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & \dots & \dots & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & \dots & \dots & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & \dots & \dots & 0 & 0 & 0 & 0 & 1 \end{array})

As the labelling of grid cells is arbitrary within a module, grid-population activity is actually represented by a class of matrices, which is invariant to permutation of the grid cells $(m, i)$ , $1 \leq i \leq λ_{m}$ , within a module $m$ . Here, with no loss of generality, we consider the class representatives obtained by ordering the grid cells by increasing phase within each module. This convention allows us to simply define the activity matrix $A_{λ}$ via the introduction of a spatial shift operator $J_{λ}$ . We define the shift operator $J_{λ}$ as the linear operator that cyclically increments the phases by one unit within each module, that is,

J_{λ} = (\begin{array}{cccc} J_{λ_{1}} & & & \\ & J_{λ_{2}} & & \\ & & ⋱ & \\ & & & J_{λ_{M}} \end{array}) \quad \text{with} \quad J_{λ_{m}} = (\begin{array}{ccccc} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ ⋮ & & ⋱ & ⋱ & ⋮ \\ 0 & 0 & \dots & 0 & 1 \\ 1 & 0 & \dots & 0 & 0 \end{array}),

where $J_{λ_{m}}$ is the canonical circulant permutation matrix of order $λ_{m}$ . We refer to $J_{λ}$ as a shift operator because its action on any column of $A_{λ}$ corresponds to a positional shift by one unit of space: if $c_{j}$ , $1 \leq j \leq L$ , denotes the jth column of $A_{λ}$ , then $J_{λ} c_{j} = c_{j + 1}$ if $j < L$ , and $J_{λ} c_{L} = c_{1}$ . Thus, we can define the grid-cell-activity matrix as the matrix obtained by enumerating in order the grid-cell patterns $J_{λ}^{k} c_{1}$ , $k \in ℕ$ , up to redundancies. Such a definition of the grid-cell-activity matrix prominently features the relation between the symmetries of the grid code and those of the actual physical space. In particular, it clearly shows that the formulation of our problem is invariant to rotation of the discretized space $1, 2, \dots, L$ , that is, to shifts in $ℤ / L ℤ$ . We show that grid-cell-activity matrices can be similarly defined for lattice spaces of higher dimensions in Section C.3, including the relevant case of the 2D hexagonal lattice.

A key observation is that the periodicity $L$ , that is, the number of positions univocally tagged by grid-like inputs, is directly related to the periods $λ$ via the Chinese remainder theorem. Indeed, by the Chinese remainder theorem, the first redundant grid-like input occurs for $L = \operatorname{lcm} (λ_{1}, \dots, λ_{M})$ , therefore specifying the number of columns of the activity matrix. Thus, for pairwise coprime periods $λ_{m}$ , $1 \leq m \leq M$ , we have $L = Λ$ and the columns of the activity matrix $A_{λ}$ exhaustively enumerate all grid-like inputs in $C_{λ}$ . As a result, all the combinatorial results obtained for the full set of patterns $C_{λ}$ directly apply over the full linear space $\{1, \dots, L\}$ for pairwise coprime periods. In particular, for pairwise coprime periods, we have $\operatorname{rank} A_{λ} = \sum_{i = 1}^{M} λ_{i} - M + 1$ by Proposition 1.
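
To make the construction concrete, the following Python sketch builds the grid-cell-activity matrix for given integer periods and computes its rank exactly. It is a minimal illustration under our own conventions: positions are indexed from 0 rather than 1, cells are ordered by increasing phase within each module, and the helper names `lcm`, `activity_matrix`, and `rank` are not part of the original analysis.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lcm(values):
    """Least common multiple of a sequence of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values, 1)


def activity_matrix(periods):
    """Rows: one per grid cell (module by module); columns: the L positions.

    Position j (0-based) activates, in every module m, the cell whose phase
    equals j mod lambda_m.
    """
    length = lcm(periods)
    return [
        [1 if j % lam == phase else 0 for j in range(length)]
        for lam in periods
        for phase in range(lam)
    ]


def rank(matrix):
    """Rank by Gaussian elimination over the rationals (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            if factor:
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r
```

For $λ = (2, 3, 5)$ , `activity_matrix` returns a 10-by-30 matrix whose 30 columns are all distinct, and `rank` returns 8, in agreement with $\sum_{i} λ_{i} - M + 1 = 10 - 3 + 1$ .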

Unfortunately, our combinatorial results do not directly extend to a spatial context for integer periods that are not pairwise coprime or for incomplete spaces $\{1, \dots, L^{'}\}$ , $L^{'} < L$ . For non-coprime periods, we have $L < Λ$ , as exemplified by the grid-cell-activity matrix for $λ = (2, 4)$ given by

A_{(2, 4)} = (\begin{array}{cccc} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array})

which comprises only four of the eight patterns of $C_{(2, 4)}$ . Independent of the coprimality of the periods, the grid-cell-activity matrix over incomplete spaces is simply obtained by deleting the columns corresponding to the missing positions. In particular, we clearly have $L^{'} < L \leq Λ$ . Excluding some grid-like inputs has two opposite implications: (i) the total number of dichotomies is reduced in keeping with considering a smaller space but (ii) some dichotomies that were previously not linearly separable can become realizable. Disentangling these opposite implications is obscured by the many broken symmetries of the polytope formed by the subset of patterns under consideration. For this reason, we essentially resort to studying the spatial embedding of the grid code numerically. Such numerical analysis reveals, perhaps not surprisingly, that a key role is played by the embedding dimension of the grid code, especially in relation to the concept of contiguous-separating capacity.

Contiguous-separating capacity

We define the contiguous-separating capacity of a grid code as the maximum physical extent over which all possible dichotomies are linearly separable. Classically, for $N$ -dimensional inputs in general position, the separating capacity is defined as the maximum number of patterns for which all possible dichotomies are linearly separable, without any reference to contiguity. Within this context, Cover’s counting function theorem implies that the separating capacity equals the dimension of the input space. Should the grid-like inputs be in general position in the input space, the separating capacity would thus be equal to $\operatorname{rank} A_{λ}$ . However, being in general position requires that any submatrix formed by $r$ columns of $A_{λ}$ be of rank $r$ for $r \leq \operatorname{rank} A_{λ}$ . This property does not hold for grid-cell-activity matrices. Moreover, we are interested in a stronger notion of separating capacity, as we require that the grid-like inputs achieving separating capacity represent contiguous spatial positions. Thankfully, the spatial symmetry of the grid-cell-activity matrices allows us to show that even under these restrictions the separating capacity is indeed $\operatorname{rank} A_{λ}$ .

Proposition 9

The contiguous-separating capacity of the generic grid-cell-activity matrix $A_{λ}$ is equal to $\operatorname{rank} A_{λ}$ .

Proof.

The proof proceeds in two steps. With no loss of generality, we only consider linear classification via perceptron with zero threshold.

(i) By permutation and shift invariance, it is enough to consider contiguous columns of $A_{λ}$ starting from the first column $c_{1}$ . From the definition of $A_{λ}$ , $k$ contiguous columns can be generated in terms of the shift operator $J_{λ}$ as the sequence $c_{1}, J_{λ} c_{1}, \dots, J_{λ}^{k - 1} c_{1}$ . Let us consider the sequence $\{d_{k}\}_{k \in ℕ}$ defined by $d_{k} = \dim \operatorname{span} \{c_{1}, J_{λ} c_{1}, \dots, J_{λ}^{k - 1} c_{1}\}$ . Posit $r = \operatorname{rank} A_{λ}$ . If there is an integer $n$ such that $d_{n} = d_{n + 1}$ , then the span is stable under $J_{λ}$ , so that necessarily $d_{k}$ is constant for $k \geq n$ , and is equal to $\lim_{k \to \infty} d_{k} = d_{L} = r$ . As $d_{1} = 1$ and $d_{k + 1} - d_{k} \in \{0, 1\}$ , the preceding observation implies that $d_{k + 1} - d_{k} = 1$ for $1 \leq k < r$ . This shows that the contiguous columns $c_{i}$ , $1 \leq i \leq r$ , are linearly independent, and thus are in general position in the input space. By Cover’s counting function theorem, all dichotomies obtained by labeling the positions $1 \leq i \leq r$ with $r = \operatorname{rank} A_{λ}$ are linearly separable.

(ii) Considering an extra position, that is, including the column $c_{r + 1}$ , produces at least one dichotomy that is not linearly separable. We proceed by contradiction. Assume that all dichotomies of the $r + 1$ positions, that is, of the columns $c_{i}$ with $1 \leq i \leq r + 1$ , are linearly separable. By Cover’s counting function theorem, this is equivalent to assuming that all dichotomies of the first $r$ positions, that is, of the columns $c_{i}$ with $1 \leq i \leq r$ , can be achieved by an $(r - 1)$ -dimensional hyperplane passing through $c_{r + 1}$ . In other words, for all $r$ -dichotomies $y$ in $\{- 1, 1\}^{r}$ , there is a weight vector $w$ such that $y_{i} (w^{T} c_{i}) > 0$ for $1 \leq i \leq r$ and such that $w^{T} c_{r + 1} = 0$ . However, by linear dependence, there are coefficients $a_{i}$ , not all zero, such that $c_{r + 1} = \sum_{i = 1}^{r} a_{i} c_{i}$ , so that for any $r$ -dichotomy, we can find $w$ achieving that dichotomy and such that

\sum_{i = 1}^{r} a_{i} (w^{T} c_{i}) = 0 .

Considering a dichotomy for which $y_{i} = a_{i} / | a_{i} |$ for nonzero coefficients yields

\sum_{i = 1}^{r} a_{i} (w^{T} c_{i}) = \sum_{i = 1}^{r} | a_{i} | | w^{T} c_{i} | > 0 .

which is a contradiction with (66).
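
Step (i) of the proof can be illustrated numerically by tracking the dimension spanned by the first $k$ contiguous columns. The Python sketch below is only a check on small examples, not a proof: `columns` and `span_dimensions` are hypothetical helper names, positions are indexed from 0, and the dimensions are computed by exact incremental elimination over the rationals.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def columns(periods):
    """Grid patterns c_1, ..., c_L as tuples (one active phase per module)."""
    length = reduce(lambda a, b: a * b // gcd(a, b), periods, 1)
    return [
        tuple(1 if j % lam == phase else 0 for lam in periods for phase in range(lam))
        for j in range(length)
    ]


def span_dimensions(vectors):
    """Return [d_1, ..., d_n], where d_k = dim span(v_1, ..., v_k)."""
    basis = {}  # leading index -> reduced vector whose first nonzero entry is there
    dims = []
    for vec in vectors:
        v = [Fraction(x) for x in vec]
        # Eliminate in increasing pivot order so cleared entries stay cleared.
        for pivot in sorted(basis):
            if v[pivot] != 0:
                b = basis[pivot]
                factor = v[pivot] / b[pivot]
                v = [x - factor * y for x, y in zip(v, b)]
        lead = next((i for i, x in enumerate(v) if x != 0), None)
        if lead is not None:
            basis[lead] = v
        dims.append(len(basis))
    return dims
```

For $λ = (2, 3, 5)$ , `span_dimensions(columns((2, 3, 5)))` increases by one at each position up to the rank 8 and then stays constant over the remaining positions, as the argument in step (i) predicts.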

The above proposition specifies $\operatorname{rank} A_{λ}$ as the contiguous-separating capacity for the 1D spatial model. This rank also specifies the dimension of the space containing the subset of grid-like inputs to be linearly classified. For pairwise coprime periods $λ$ , Proposition 1 shows that $\operatorname{rank} A_{λ} = \sum_{m = 1}^{M} λ_{m} - M + 1$ . The following proposition generalizes this result to generic integer periods.

Proposition 10

Let $A_{λ}$ denote the grid-cell-activity matrix specified by $M$ grid modules with integer periods $λ = (λ_{1}, \dots, λ_{M})$ . The rank of the activity matrix $A_{λ}$ is given by

\operatorname{rank} A_{λ} = \sum_{i = 1}^{M} λ_{i} - \sum_{1 \leq i < j \leq M} \gcd (λ_{i}, λ_{j}) + \sum_{1 \leq i < j < k \leq M} \gcd (λ_{i}, λ_{j}, λ_{k}) - \dots + (- 1)^{M - 1} \gcd (λ_{1}, \dots, λ_{M}) = \sum_{k = 1}^{M} (- 1)^{k - 1} \sum_{S \subset λ, | S | = k} \gcd (S)

where $S$ is a subset of integer periods and $| S |$ denotes the cardinality of the set $S$ . If the periods are pairwise coprime, the above formula yields $\operatorname{rank} A_{λ} = \sum_{i = 1}^{M} λ_{i} - M + 1$ .

Proof. The proof will proceed in three steps.

(i) The first step is to realize that $\operatorname{rank} A_{λ} = \operatorname{rank} A_{λ}^{T} = \dim (V_{1} + \dots + V_{M})$ , where the vector spaces $V_{m}$ , $1 \leq m \leq M$ , are generated by the rows of the mth module of the activity matrix. Then, the inclusion-exclusion principle applied to the sum $V_{1} + \dots + V_{M}$ yields an expression for $\operatorname{rank} A_{λ}$ as the alternating sum:

\operatorname{rank} A_{λ} = \dim (V_{1} + \dots + V_{M})

= \sum_{i = 1}^{M} \dim V_{i} - \sum_{1 \leq i < j \leq M} \dim V_{i} \cap V_{j} + \sum_{1 \leq i < j < k \leq M} \dim V_{i} \cap V_{j} \cap V_{k} - \dots .

By definition of the activity matrix, the space $V_{m}$ is generated by $λ_{m}$ row vectors, which are cyclically permuted versions of the $λ_{m}$ -periodic vector $r_{λ_{m}} = (1, 0, \dots, 0 | 1, 0, \dots, 0 | 1, 0, \dots)$ . In particular, these $λ_{m}$ row vectors can be enumerated by iterated application of $J$ , the canonical $L$ -dimensional circulant permutation operator. The resulting sequence $r_{λ_{m}}, J r_{λ_{m}}, \dots, J^{λ_{m} - 1} r_{λ_{m}}$ actually forms a basis of $V_{m}$ , identified with the space of $λ_{m}$ -periodic vectors of length $L$ , and thus $\dim V_{m} = λ_{m}$ . The announced formula will follow from evaluating the dimension of the intersections of the vector spaces $V_{m}$ .

(ii) The second step is to observe that one can specify the set of spaces $V_{m}$ , $1 \leq m \leq M$ , as the span of vectors chosen from a common basis of $ℝ^{L}$ , where we recall that $L = \operatorname{lcm} (λ_{1}, \dots, λ_{M})$ . We identify such a common basis by considering the action of the operator $J$ on $L$ -dimensional periodic vectors. As a circulant permutation operator, $J$ admits a diagonal matrix representation in the basis of eigenvectors $\{e_{j}\}$ , $1 \leq j \leq L$ ,

e_{j} = (1, ω_{j}, ω_{j}^{2}, \dots, ω_{j}^{L - 1}),

associated to the eigenvalue $ω_{j} = e^{i \frac{2 π j}{L}}$ , where $i^{2} = - 1$ . Moreover, $J$ clearly preserves periodicity when acting on row vectors in $ℝ^{L}$ , so that the spaces $V_{m}$ , $1 \leq m \leq M$ , are stable by $J$ . As a consequence, each space $V_{m}$ can be represented as the span of a subset of the eigenvectors of $J$ . In principle, the existence of a basis spanning the spaces $V_{m}$ , $1 \leq m \leq M$ , allows one to compute the dimension of the intersections of these spaces by counting the number of common basis elements in their span.

(iii) The last step is to show that counting the number of common basis elements $e_{j}$ in the subsets of $\{V_{m}\}_{1 \leq m \leq M}$ yields the announced formula. Proving this point relies on elementary results from the theory of cyclic groups. Let us first consider the basis elements generating $V_{m}$ , which are the elements $e_{j}$ that are $λ_{m}$ -periodic. These basis elements are precisely those for which $ω_{j}^{λ_{m}} = 1$ , that is, $λ_{m} j = 0$ in the cyclic group $ℤ / L ℤ$ . Considering the integers $j$ as elements of $ℤ / L ℤ$ , we can then specify the basis vectors generating $V_{m}$ by invoking the subgroup structure of the cyclic groups. Specifically, the basis elements generating $V_{m}$ are indexed by the elements of the unique subgroup of order $λ_{m}$ in $ℤ / L ℤ$ . Thus, as expected, the number of basis elements equals the otherwise known dimension of $V_{m}$ . Let us then consider the basis elements generating the intersection space $V_{m} \cap V_{n}$ , $m \neq n$ , which are the elements $e_{j}$ that are both $λ_{m}$ -periodic and $λ_{n}$ -periodic. These basis elements correspond to those indices $j$ for which we have $λ_{m} j = 0$ and $λ_{n} j = 0$ in the cyclic group $ℤ / L ℤ$ , that is, for which $\gcd (λ_{m}, λ_{n}) j = 0$ in $ℤ / L ℤ$ . By the subgroup structure of cyclic groups, the basis elements generating $V_{m} \cap V_{n}$ are thus indexed by the elements of the unique subgroup of order $\gcd (λ_{m}, λ_{n})$ in $ℤ / L ℤ$ . Thus, we have $\dim V_{m} \cap V_{n} = \gcd (λ_{m}, λ_{n})$ . The above reasoning generalizes straightforwardly to any set of indices $1 \leq m_{1} < \dots < m_{k} \leq M$ , $1 \leq k \leq M$ , leading to

\dim V_{m_{1}} \cap \dots \cap V_{m_{k}} = \gcd (λ_{m_{1}}, \dots, λ_{m_{k}}) .

Specifying the dimension of the intersection spaces in (69) derived from the inclusion-exclusion principle yields the rank formula given in (68).
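
The rank formula of Proposition 10 is easy to compare against a direct computation for small periods. The Python sketch below is an illustrative check under our own conventions (0-based positions, cells ordered by phase); `rank_formula` and `rank_direct` are hypothetical names, and the direct rank is obtained by exact Gaussian elimination over the rationals.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations
from math import gcd


def rank_formula(periods):
    """Alternating sum of gcds over all nonempty subsets of the periods."""
    return sum(
        (-1) ** (k - 1) * reduce(gcd, subset)
        for k in range(1, len(periods) + 1)
        for subset in combinations(periods, k)
    )


def rank_direct(periods):
    """Exact rank of the 1D activity matrix built from the given periods."""
    length = reduce(lambda a, b: a * b // gcd(a, b), periods, 1)
    rows = [
        [Fraction(1 if j % lam == phase else 0) for j in range(length)]
        for lam in periods
        for phase in range(lam)
    ]
    r = 0
    for col in range(length):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r
```

For example, both functions return 8 for $λ = (2, 3, 5)$ , 4 for $λ = (2, 4)$ , and 22 for the non-coprime triple $λ = (6, 10, 15)$ .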

Generalization to higher dimensional lattices

Our two results about (i) the number of dichotomies for a grid code with two modules and (ii) the separating capacity for an arbitrary number of modules generalize to an arbitrary number of dimensions. The generalization of (i) is straightforward, as our results bear on the set of grid-like inputs with no reference to physical space. The only caveat has to do with the fact that for a $d$ -dimensional lattice, each module $m$ , $1 \leq m \leq M$ , contains $λ_{m}^{d}$ cells so that $λ_{m}^{d}$ has to be substituted for $λ_{m}$ in formula (47). It turns out that the generalization of (ii) proceeds in the exact same way, albeit in a less direct fashion. In the following, we prove that the separating capacity for a $d$ -dimensional lattice model, including the 2D hexagonal lattice, is still given by the rank of the corresponding activity matrix.

A couple of remarks are in order before justifying the generalization of (ii):

First, let us specify how to construct activity matrices in $d$-dimensional space by considering a simple example. Consider the hexagonal-lattice model for two modules with $λ = (2, 3)$. As illustrated in Figure 1, there are four possible 2-periodic lattices and nine possible 3-periodic lattices, each lattice representing the spatial activity pattern of a grid cell. Combining the encoding of the two modules yields a periodic lattice, with a lattice mesh comprising $\operatorname{lcm}(λ_{1}, λ_{2})^{2} = 36$ positions. Every position within the mesh is uniquely labeled by the grid-like input, and any subset of positions with larger cardinality has redundancy. Observe moreover that the lattice mesh is equivalent to that of a (2, 3)-square lattice, and in fact, the activity matrix for a (2, 3)-hexagonal lattice model is the same as that for a (2, 3)-square lattice. As a result, the spatial dependence of the grid-cell population is described by a matrix in $ℝ^{13 \times 36}$ with the following block structure:

A_{(2, 3)}^{(2)} = \begin{pmatrix} B_{(2)} & 0 & B_{(2)} & 0 & B_{(2)} & 0 \\ 0 & B_{(2)} & 0 & B_{(2)} & 0 & B_{(2)} \\ B_{(3)} & 0 & 0 & B_{(3)} & 0 & 0 \\ 0 & B_{(3)} & 0 & 0 & B_{(3)} & 0 \\ 0 & 0 & B_{(3)} & 0 & 0 & B_{(3)} \end{pmatrix} \quad \text{with} \quad B_{(2)} = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix}, \quad B_{(3)} = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{pmatrix} .

In the above matrix $A_{(2, 3)}^{(2)}$, the top two block rows represent the activity of 2-periodic cells, while the bottom three block rows represent the activity of 3-periodic cells. By convention, the blocks $B_{(2)}$ and $B_{(3)}$, comprising respectively two and three cells, represent the activity of grid cells along the horizontal $x$-axis. There are two rows of blocks $B_{(2)}$ and three rows of blocks $B_{(3)}$ to encode 2-periodicity and 3-periodicity, respectively, along the vertical $y$-axis. It is straightforward to generalize this hierarchical block structure to construct an activity matrix $A_{λ}^{(d)}$ for arbitrary periods $λ_{m}$ and arbitrary square-lattice dimension $d$. In particular, the matrix $A_{λ}^{(d)}$ has $\sum_{m = 1}^{M} λ_{m}^{d}$ rows and $\operatorname{lcm}(λ_{1}, \dots, λ_{M})^{d}$ columns.
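The hierarchical block structure can be generated with Kronecker products. The following is a minimal sketch under stated assumptions: the function names are hypothetical, columns are ordered with $x$ varying fastest, and rows within a module are ordered by ($y$-phase, $x$-phase), which is one ordering consistent with the block layout shown above.

```python
from functools import reduce
from math import gcd

import numpy as np


def lcm(a, b):
    return a * b // gcd(a, b)


def phase_matrix(lam, L):
    """1D block: row p is active at the positions x with x % lam == p."""
    return (np.arange(L)[None, :] % lam == np.arange(lam)[:, None]).astype(float)


def activity_matrix(periods, d):
    """Square-lattice activity matrix: sum(lam**d) rows, lcm(periods)**d columns."""
    L = reduce(lcm, periods)
    blocks = []
    for lam in periods:
        P = phase_matrix(lam, L)
        block = P
        for _ in range(d - 1):
            # The outer factor indexes the slower axis, the inner one the faster axis.
            block = np.kron(P, block)
        blocks.append(block)
    return np.vstack(blocks)


A = activity_matrix((2, 3), d=2)
print(A.shape)    # (13, 36)
print(A[:2, :6])  # top-left block, equal to B_(2)
```

Each column of `A` contains exactly one active cell per module, and the first six columns of the first two rows reproduce the block $B_{(2)}$.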

Appendix 3—figure 1

Hexagonal and square lattice in two dimensions.

(a) In two dimensions, 2-periodic and 3-periodic modules comprise respectively four and nine possible grid-cell-activity patterns. For instance, the red, green, blue, and yellow patterns in the leftmost lattice correspond to the four possible patterns of activity that a 2-periodic cell can exhibit on a hexagonal lattice. (b) The maximum lattice mesh over which each position is uniquely encoded by the grid-like code is given as $6 \times 6 = 2^{2} \times 3^{2}$. Moreover, the hexagonal symmetry plays no role in our capacity calculations and one can consider a square lattice of positions instead.

Second, let us define the notion of contiguous-separating capacity for a $d$-dimensional lattice with $d > 1$. In one dimension, we define the contiguous-separating capacity as the maximum spatial extent for which all dichotomies involving its discrete set of positions are linearly separable. We generalize this notion to arbitrary dimension $d$ by defining the contiguous-separating capacity as the size of the largest connected component of $d$-dimensional positions for which all dichotomies are possible. Observe that, thus defined, the capacity is oblivious to the geometric arrangement of this connected component. This is because in dimension $d > 1$, the contiguous-separating capacity can be achieved by many distinct arrangements.

After these preliminary remarks, we can now prove the following proposition.

Proposition 11

The contiguous-separating capacity of the generic grid-cell-activity matrix $A_{λ}^{(d)}$ is equal to $\operatorname{rank} A_{λ}^{(d)}$, where we have

\operatorname{rank} A_{λ}^{(d)} = \sum_{k = 1}^{M} (- 1)^{k - 1} \sum_{S \subset λ, | S | = k} \gcd (S)^{d}
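As a quick illustration of the formula in two dimensions (the periods below are chosen only as examples), the alternating sum reduces to three terms for two modules:

```latex
\operatorname{rank} A_{(2,3)}^{(2)} = 2^{2} + 3^{2} - \gcd(2,3)^{2} = 4 + 9 - 1 = 12,
\qquad
\operatorname{rank} A_{(4,6)}^{(2)} = 4^{2} + 6^{2} - \gcd(4,6)^{2} = 16 + 36 - 4 = 48 .
```

In the first case the rank is one less than the 13 rows of the matrix and well below the 36 positions of the lattice mesh. In the second case it is well below the $12^{2} = 144$ positions of the mesh.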

Proof. We only justify the formula for the case $d = 2$, as similar arguments apply for all integers $d > 1$ (see Remark after this proof). The proof proceeds in two steps: (i) we justify the formula for $\operatorname{rank} A_{λ}^{(d)}$ and (ii) we justify that the contiguous-separating capacity equals $\operatorname{rank} A_{λ}^{(d)}$.

(i) We follow the same strategy as for dimension 1 to establish the rank formula for $d = 2$ via the exclusion-inclusion principle. The key point is to exhibit a basis of vectors $ϵ_{i j}$, $1 \leq i, j \leq L$, in $ℝ^{L \times L}$, with $L = \operatorname{lcm} (λ_{1}, \dots, λ_{M})$, subsets of which span each of the vector spaces $V_{m}$, $1 \leq m \leq M$, where $V_{m}$ denotes the space of $λ_{m}$-periodic functions on the $(L \times L)$-lattice mesh. To specify such a basis, we consider the two operators $J_{x}$ and $J_{y}$ acting on the grid-like inputs and representing the one-unit shift along the horizontal $x$-axis and along the vertical $y$-axis, respectively. A basis of the space of $λ_{m}$-periodic functions on the $(L \times L)$-lattice mesh is generated by iterated action of $J_{x}$ and $J_{y}$ on the activity lattice of a $λ_{m}$-periodic cell, that is, on a $\{0, 1\}$-valued row vector $r_{λ_{m}}$ of the $m$th module of $A_{λ}^{(d)}$. Specifically, a basis of $V_{m}$ is given by the $λ_{m}^{2}$ vectors $J_{x}^{k} J_{y}^{l} r_{λ_{m}}$, with $0 \leq k < λ_{m}$ and $0 \leq l < λ_{m}$. Moreover, the operators $J_{x}$ and $J_{y}$ commute on $ℝ^{L \times L}$, as by construction, shifting lattices by $J_{x} J_{y}$ yields the same lattice as the one obtained by shifting the original lattice by $J_{y} J_{x}$. Thus, if $J_{x}$ and $J_{y}$ are diagonalizable, they can be diagonalized in the same basis $ϵ_{i j}$, $1 \leq i, j \leq L$. Close inspection of the operators $J_{x}$ and $J_{y}$ reveals that they admit matrix representations that are closely related to the canonical $L$-dimensional circulant matrix $J_{L}$:

J_{x} = \begin{pmatrix} J_{L} & & \\ & \ddots & \\ & & J_{L} \end{pmatrix}, \quad J_{y} = \begin{pmatrix} 0 & I_{L} & & \\ & \ddots & \ddots & \\ & & 0 & I_{L} \\ I_{L} & & & 0 \end{pmatrix} \quad \text{and} \quad J_{x} J_{y} = J_{y} J_{x} = \begin{pmatrix} 0 & J_{L} & & \\ & \ddots & \ddots & \\ & & 0 & J_{L} \\ J_{L} & & & 0 \end{pmatrix} .

Concretely, the operator $J_{x}$ cyclically shifts columns within each block $B_{(λ_{m})}$, whereas the operator $J_{y}$ cyclically shifts the blocks within $A_{λ}^{(d)}$. Considering the basis of eigenvectors $e_{i}$, $1 \leq i \leq L$, of $J_{L}$, we define the basis $ϵ_{i j}$, $1 \leq i, j \leq L$, as $ϵ_{i j} = (e_{j} | w_{i} e_{j} | \dots | w_{i}^{L - 1} e_{j})$, where $w_{i} = ω_{i}$ is the eigenvalue associated to $e_{i}$. We have

J_{x} ϵ_{i j} = (J_{L} e_{j} | w_{i} J_{L} e_{j} | \dots | w_{i}^{L - 1} J_{L} e_{j}) = ω_{j} (e_{j} | w_{i} e_{j} | \dots | w_{i}^{L - 1} e_{j}) = ω_{j} ϵ_{i j},

J_{y} ϵ_{i j} = (w_{i} e_{j} | \dots | w_{i}^{L - 1} e_{j} | e_{j}) = ω_{i} (e_{j} | w_{i} e_{j} | \dots | w_{i}^{L - 1} e_{j}) = ω_{i} ϵ_{i j},

which shows that $ϵ_{i j}$ is indeed a basis diagonalizing $J_{x}$ and $J_{y}$. Moreover, as $J_{x}$ and $J_{y}$ stabilize the space $V_{m}$, a subset of the basis $ϵ_{i j}$ spans the space $V_{m}$, and the same holds for all the spaces defined as intersections of subsets of $\{V_{m}\}_{1 \leq m \leq M}$. Consider the set of indices $1 \leq m_{1} < \dots < m_{k} \leq M$, $1 \leq k \leq M$, specifying the intersection $V_{m_{1}} \cap \dots \cap V_{m_{k}}$. By the same reasoning as for dimension 1, the basis elements spanning $V_{m_{1}} \cap \dots \cap V_{m_{k}}$ are those eigenvectors $ϵ_{i j}$ that are $\gcd (λ_{m_{1}}, \dots, λ_{m_{k}})$-periodic in both the $x$-direction and the $y$-direction. As $J_{x} ϵ_{i j} = ω_{j} ϵ_{i j}$ and $J_{y} ϵ_{i j} = ω_{i} ϵ_{i j}$, setting $g = \gcd (λ_{m_{1}}, \dots, λ_{m_{k}})$, this is equivalent to $(g i, g j) = (0, 0)$ in $ℤ / L ℤ \times ℤ / L ℤ$. By the subgroup structure of cyclic groups, the basis elements $ϵ_{i j}$ generating $V_{m_{1}} \cap \dots \cap V_{m_{k}}$ are thus indexed by $(i, j)$, where $i$ and $j$ are elements of the unique subgroup of order $g$ in $ℤ / L ℤ$. There are $g^{2}$ such basis elements, showing that

dim V_{m_{1}} \cap \dots \cap V_{m_{k}} = g^{2} = \gcd {(λ_{m_{1}}, \dots, λ_{m_{k}})}^{2} .
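The count of $g^{2}$ jointly periodic eigenvectors can be illustrated numerically. In the sketch below, the eigenvectors are taken to be the two-dimensional Fourier modes on the $(L \times L)$-mesh, an assumption consistent with the eigenvectors of a cyclic shift, and the values $L = 12$, $g = 2$ (for instance periods 4 and 6) are chosen only for illustration. The helper names are hypothetical.

```python
import numpy as np


def fourier_mode(L, i, j):
    """L x L array eps_ij[y, x] = w**(i*y + j*x), with w = exp(2*pi*1j / L)."""
    y, x = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    return np.exp(2j * np.pi * (i * y + j * x) / L)


def is_g_periodic(mode, g):
    """True if shifting by g along either axis leaves the mode unchanged."""
    return (np.allclose(np.roll(mode, -g, axis=0), mode)
            and np.allclose(np.roll(mode, -g, axis=1), mode))


L, g = 12, 2  # illustrative values: lcm(4, 6) = 12 and gcd(4, 6) = 2

# Index condition from the text: g*i = 0 and g*j = 0 in Z/LZ.
by_index = {(i, j) for i in range(L) for j in range(L)
            if (g * i) % L == 0 and (g * j) % L == 0}

# Direct check: which modes are invariant under shifts by g in both directions.
by_shift = {(i, j) for i in range(L) for j in range(L)
            if is_g_periodic(fourier_mode(L, i, j), g)}

print(len(by_index), by_index == by_shift)
```

Both selections pick out the same index pairs, and their number equals $g^{2}$.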

The rank formula follows immediately from expressing $\operatorname{rank} A_{λ}^{(d)} = \dim (V_{1} + \dots + V_{M})$ via the exclusion-inclusion principle.
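The resulting rank formula can be compared against a direct numerical rank computation. This is a sketch rather than the authors' implementation: the function names are hypothetical, the square-lattice activity matrix is assembled from Kronecker products of one-dimensional phase matrices (an assumed but equivalent construction up to row and column ordering), and the period sets are illustrative.

```python
from functools import reduce
from itertools import combinations
from math import gcd

import numpy as np


def lcm(a, b):
    return a * b // gcd(a, b)


def activity_matrix(periods, d):
    """Square-lattice activity matrix built from d-fold Kronecker products."""
    L = reduce(lcm, periods)
    blocks = []
    for lam in periods:
        P = (np.arange(L)[None, :] % lam == np.arange(lam)[:, None]).astype(float)
        block = P
        for _ in range(d - 1):
            block = np.kron(P, block)
        blocks.append(block)
    return np.vstack(blocks)


def rank_formula(periods, d):
    """Alternating sum over non-empty subsets S of the periods of gcd(S)**d."""
    return sum((-1) ** (k - 1) * reduce(gcd, subset) ** d
               for k in range(1, len(periods) + 1)
               for subset in combinations(periods, k))


cases = [((2, 3), 2), ((4, 6), 2), ((2, 3, 4), 2), ((2, 3), 3)]
for periods, d in cases:
    A = activity_matrix(periods, d)
    print(periods, d, np.linalg.matrix_rank(A), rank_formula(periods, d))
```

For every case listed, the numerical rank and the formula should coincide.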

(ii) Just as for (i), we follow the same strategy as for dimension 1 to show that the contiguous-separating capacity equals $\operatorname{rank} A_{λ}^{(d)}$. The only caveat to address is that the grid-like inputs, that is, the columns of $A_{λ}^{(d)}$, are generated by the action of two shift operators instead of one. Specifically, starting from the first column $c_{1}$ of $A_{λ}^{(d)}$, we can generate all subsequent columns by action of the operators $J_{λ, x}$ and $J_{λ, y}$, whose matrix representations are given by

J_{λ, x} = \begin{pmatrix} J_{λ_{1} \times λ_{1}} & & & \\ & J_{λ_{2} \times λ_{2}} & & \\ & & \ddots & \\ & & & J_{λ_{M} \times λ_{M}} \end{pmatrix}, \quad \text{with} \quad J_{λ_{m} \times λ_{m}} = \begin{pmatrix} J_{λ_{m}} & & & \\ & J_{λ_{m}} & & \\ & & \ddots & \\ & & & J_{λ_{m}} \end{pmatrix},

J_{λ, y} = \begin{pmatrix} J_{λ_{1} \times λ_{1}}^{'} & & & \\ & J_{λ_{2} \times λ_{2}}^{'} & & \\ & & \ddots & \\ & & & J_{λ_{M} \times λ_{M}}^{'} \end{pmatrix}, \quad \text{with} \quad J_{λ_{m} \times λ_{m}}^{'} = \begin{pmatrix} 0 & I_{λ_{m}} & & \\ & \ddots & \ddots & \\ & & 0 & I_{λ_{m}} \\ I_{λ_{m}} & & & 0 \end{pmatrix} .

Notice that $J_{λ, x}$ and $J_{λ, y}$ commute. By the same reasoning as for dimension 1, we know that the separating capacity cannot exceed $\operatorname{rank} A_{λ}^{(d)}$. Then, to prove that the separating capacity equals $\operatorname{rank} A_{λ}^{(d)}$, it is enough to exhibit a linearly independent set of contiguous positions with cardinality $\operatorname{rank} A_{λ}^{(d)}$. Let us exhibit such positions. Mirroring the 1D case, let us consider the sequence $d_{l}^{(1)}$ defined by

d_{l}^{(1)} = \dim \operatorname{span} \{ J_{λ, y}^{i} c_{1} \mid 1 \leq i \leq l \} .

The above sequence is strictly increasing by unit step until some $l_{1}$, after which it remains constant at value

d_{l_{1}}^{(1)} = \dim V_{1}, \quad \text{with} \quad V_{1} = \operatorname{span} \{ J_{λ, y}^{i} c_{1} \mid 1 \leq i \leq L \} .

Let us then consider the sequence

d_{l}^{(2)} = \dim \left( V_{1} + \operatorname{span} \{ J_{λ, y}^{i} J_{λ, x} c_{1} \mid 1 \leq i \leq l \} \right) .

The above sequence is also strictly increasing by unit step until some $l_{2}$, after which it remains constant at value

d_{l_{2}}^{(2)} = \dim V_{2}, \quad \text{with} \quad V_{2} = V_{1} + J_{λ, x} V_{1} .

Moreover, $V_{2}$ admits for basis the vectors $J_{λ, y}^{i} c_{1}$, $1 \leq i \leq l_{1}$, and $J_{λ, y}^{i} J_{λ, x} c_{1}$, $1 \leq i \leq l_{2}$. We can iterate this construction by repeated action of the operator $J_{λ, x}$, yielding a sequence of numbers $l_{k}$ and a sequence of spaces $V_{k} = V_{k - 1} + J_{λ, x} V_{k - 1}$. Necessarily, the sequence $l_{k}$ eventually becomes zero, as

\dim V_{L} = \dim (V_{1} + J_{λ, x} V_{1} + \dots + J_{λ, x}^{L} V_{1}) = \dim \operatorname{span} \{ J_{λ, y}^{i} J_{λ, x}^{j} c_{1} \mid 1 \leq i, j \leq L \} = \operatorname{rank} A_{λ}^{(d)} .

Let us consider the smallest $k > 1$ for which $l_{k} = 0$. Then the set of vectors

\{ J_{λ, y}^{i} J_{λ, x}^{j} c_{1} \mid 1 \leq j < k, 1 \leq i \leq l_{j} \}

is linearly independent by construction and generates the range of $A_{λ}^{(d)}$. In particular, we necessarily have $l_{1} + \dots + l_{k - 1} = \operatorname{rank} A_{λ}^{(d)}$. Observing that these vectors correspond to a connected component of positions concludes the proof.
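The layer-by-layer construction used in the proof can be mimicked numerically on a small example. The sketch below is an illustration under assumptions: positions are indexed from 0 rather than 1, the function names are hypothetical, and each layer is closed at the first position along $y$ whose grid-like input depends linearly on those already chosen.

```python
from functools import reduce
from math import gcd

import numpy as np


def lcm(a, b):
    return a * b // gcd(a, b)


def column(periods, x, y):
    """Grid-like input at position (x, y): one active cell per module."""
    parts = []
    for lam in periods:
        cell = np.zeros((lam, lam))
        cell[y % lam, x % lam] = 1.0
        parts.append(cell.ravel())
    return np.concatenate(parts)


def greedy_layers(periods):
    """For x = 0, 1, ..., add positions (x, 0), (x, 1), ... while the rank grows."""
    L = reduce(lcm, periods)
    chosen, layers = [], []
    for x in range(L):
        added = 0
        for y in range(L):
            trial = chosen + [column(periods, x, y)]
            if np.linalg.matrix_rank(np.array(trial)) < len(trial):
                break  # the first dependent position closes this layer
            chosen = trial
            added += 1
        if added == 0:
            break  # an empty layer ends the construction
        layers.append(added)
    return layers


layers = greedy_layers((2, 3))
print(layers, sum(layers))
```

For periods $(2, 3)$ the layer sizes sum to 12, the value given by the rank formula for $d = 2$. Since every non-empty layer starts at $y = 0$ and the layers occupy consecutive values of $x$, the selected positions form a connected set.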

Remark

Although we do not give the proof for arbitrary spatial dimension $d > 2$, let us briefly comment on extending the above arguments to higher dimensions. Such a generalization is straightforward but requires the use of tensor calculus. For integer periods $λ$ and generic dimension $d$, the activity tensor can be defined as

A_{λ}^{(d)} = \sum_{i_{1}, \dots, i_{d} \in L^{d}} \sum_{m = 1}^{M} (y_{i_{1}}^{m} \otimes \dots \otimes y_{i_{d}}^{m}) \otimes (x_{i_{1}}^{⋆} \otimes \dots \otimes x_{i_{d}}^{⋆})

where $y_{i_{1}}^{m} \otimes \dots \otimes y_{i_{d}}^{m}$ is the canonical basis vector associated to the $(i_{1}, \dots, i_{d})$ coordinate in $ℝ^{λ_{m}^{d}}$, with $(i_{1}, \dots, i_{d})$ considered as an element of ${(ℤ / λ_{m} ℤ)}^{d}$, and where $x_{i_{1}}^{⋆} \otimes \dots \otimes x_{i_{d}}^{⋆}$ is the linear form associated to the $(i_{1}, \dots, i_{d})$ coordinate in $ℝ^{L^{d}}$. In tensorial form, the operators $J_{k}$, $1 \leq k \leq d$, representing the unit shift along the $k$th dimension, have the simple form $J_{k} = I_{L} \otimes \dots \otimes J_{L} \otimes \dots \otimes I_{L}$ such that

J_{k} (x_{i_{1}} \otimes \dots \otimes x_{i_{k}} \otimes \dots \otimes x_{i_{d}}) = (I_{L} \otimes \dots \otimes J_{L} \otimes \dots \otimes I_{L}) (x_{i_{1}} \otimes \dots \otimes x_{i_{k}} \otimes \dots \otimes x_{i_{d}}) = x_{i_{1}} \otimes \dots \otimes x_{i_{k} + 1} \otimes \dots \otimes x_{i_{d}}

where $i_{k} + 1$ is considered as an element of $ℤ / L ℤ$. The generalization to arbitrary dimension $d$ follows from realizing that the vectors $ϵ_{i_{1}, \dots, i_{d}} = e_{i_{1}} \otimes \dots \otimes e_{i_{d}}$, $i_{1}, \dots, i_{d} \in L^{d}$, where $e_{i}$ is the eigenvector of $J_{L}$ associated to $ω_{i}$, form a basis diagonalizing all the operators $J_{k}$, $1 \leq k \leq d$, with $J_{k} ϵ_{i_{1}, \dots, i_{d}} = ω_{i_{k}} ϵ_{i_{1}, \dots, i_{d}}$.
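The tensor-product form of the shift operators and of their common eigenvectors can be checked on a small case. In this sketch the function names, the sign convention for the cyclic shift, and the values $L = 6$, $d = 3$ are assumptions made for illustration. Axes are numbered from 0 in the code.

```python
from functools import reduce

import numpy as np


def shift_matrix(L):
    """Cyclic shift J_L acting as (J_L v)[n] = v[(n + 1) % L]."""
    return np.roll(np.eye(L), 1, axis=1)


def shift_along(L, d, k):
    """J_k: Kronecker product of d factors, J_L at axis k and identities elsewhere."""
    factors = [shift_matrix(L) if axis == k else np.eye(L) for axis in range(d)]
    return reduce(np.kron, factors)


def eigenvector(L, indices):
    """Kronecker product of the vectors e_i[n] = omega**(i*n), omega = exp(2*pi*1j/L)."""
    n = np.arange(L)
    vectors = [np.exp(2j * np.pi * i * n / L) for i in indices]
    return reduce(np.kron, vectors)


L, d = 6, 3
indices = (1, 4, 5)  # arbitrary illustrative choice of (i_1, i_2, i_3)
eps = eigenvector(L, indices)

checks = []
for k in range(d):
    omega_k = np.exp(2j * np.pi * indices[k] / L)
    checks.append(np.allclose(shift_along(L, d, k) @ eps, omega_k * eps))

commute = np.allclose(shift_along(L, d, 0) @ shift_along(L, d, 1),
                      shift_along(L, d, 1) @ shift_along(L, d, 0))
print(checks, commute)
```

Each shift operator multiplies the product vector by the eigenvalue of the corresponding factor, and shifts along different axes commute.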

Data availability

The authors confirm that the data supporting the findings of this study are available within the article. Implementation details and code are available at https://github.com/myyim/placecellperceptron (copy archived at https://archive.softwareheritage.org/swh:1:rev:8e03b880f47a1f0b7934afd91afb167f669ceeab).

References

1. Abu-Mostafa Y
2. Jacques JS
(1985) Information capacity of the Hopfield model
IEEE Trans Inform Theory 31:461–464.

https://doi.org/10.1109/TIT.1985.1057069
- Google Scholar
1. Agmon H
2. Burak Y
(2020) A theory of joint attractor dynamics in the hippocampus and the entorhinal cortex accounts for artificial remapping and grid cell field-to-field variability
eLife 08:9.

https://doi.org/10.7554/eLife.56894
- Google Scholar
1. Alme CB
2. Miao C
3. Jezek K
4. Treves A
5. Moser EI
6. Moser MB
(2014) Place cells in the hippocampus: Eleven maps for eleven rooms
PNAS 111:18428–18435.

https://doi.org/10.1073/pnas.1421056111
- PubMed
- Google Scholar
1. Amaral DG
2. Witter MP
(1989) The three-dimensional organization of the hippocampal formation: A review of anatomical data
Neuroscience 31:571–591.

https://doi.org/10.1016/0306-4522(89)90424-7
- PubMed
- Google Scholar
(1985) Storing Infinite Numbers of Patterns in a Spin-Glass Model of Neural Networks
Physical Review Letters 55:1530–1533.

https://doi.org/10.1103/PhysRevLett.55.1530
- PubMed
- Google Scholar
(2017) Mapping of a non-spatial dimension by the hippocampal–entorhinal circuit
Nature 543:719–722.

https://doi.org/10.1038/nature21692
- PubMed
- Google Scholar
1. Battaglia FP
2. Treves A
(1998) Attractor neural networks storing multiple space representations: A model for hippocampal place fields
Physical Review. E 58:7738–7753.

https://doi.org/10.1103/PhysRevE.58.7738
- Google Scholar
1. Battista A
2. Monasson R
(2020) Capacity-resolution trade-off in the optimal learning of multiple low-dimensional manifolds by attractor neural networks
Physical Review Letters 124:48302.

https://doi.org/10.1103/PhysRevLett.124.048302
- PubMed
- Google Scholar
1. Bittner KC
2. Grienberger C
3. Vaidya SP
4. Milstein AD
5. Macklin JJ
6. Suh J
7. Tonegawa S
8. Magee JC
(2015) Conjunctive input processing drives feature selectivity in hippocampal CA1 neurons
Nature Neuroscience 18:1133–1142.

https://doi.org/10.1038/nn.4062
- PubMed
- Google Scholar
1. Brun VH
2. Otnass MK
3. Molden S
4. Steffenach HA
5. Witter MP
6. Moser MB
7. Moser EI
(2002) Place cells and place recognition maintained by direct entorhinal-hippocampal circuitry
Science 296:2243–2246.

https://doi.org/10.1126/science.1071089
- PubMed
- Google Scholar
1. Brun VH
2. Solstad T
3. Kjelstrup KB
4. Fyhn M
5. Witter MP
6. Moser EI
7. Moser MB
(2008) Progressive increase in grid scale from dorsal to ventral medial entorhinal cortex
Hippocampus 18:1200–1212.

https://doi.org/10.1002/hipo.20504
- PubMed
- Google Scholar
1. Burak Y
2. Fiete I
(2006) Do We Understand the Emergent Dynamics of Grid Cell Activity
Journal of Neuroscience 26:9352–9354.

https://doi.org/10.1523/JNEUROSCI.2857-06.2006
- PubMed
- Google Scholar
1. Burak Y
2. Fiete IR
(2009) Accurate Path Integration in Continuous Attractor Network Models of Grid Cells
PLOS Computational Biology 5:e1000291.

https://doi.org/10.1371/journal.pcbi.1000291
- PubMed
- Google Scholar
1. Burgess N
(2008) Grid cells and theta as oscillatory interference: Theory and predictions
Hippocampus 18:1157–1174.

https://doi.org/10.1002/hipo.20518
- PubMed
- Google Scholar
1. Cadena C
2. Carlone L
3. Carrillo H
4. Latif Y
5. Scaramuzza D
6. Neira J
7. Reid I
8. Leonard JJ
(2016) Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age
IEEE Transactions on Robotics 32:1309–1332.

https://doi.org/10.1109/TRO.2016.2624754
- Google Scholar
1. Cesaro E
(1881) Démonstration Élémentaire et Généralisation de Quelques Théoremes de M Berger
Mathesis 1:99–102.

https://doi.org/10.1007/978-94-015-7842-4_13
- Google Scholar
Book
1. Chaudhuri R
2. Fiete I
(2019)
Bipartite expander Hopfield networks as self-decoding high-capacity error correcting codes

In: Wallach H, Larochelle H, Beygelzimer A, Alché-Buc F, Fox E, Garnett R, editors. Advances in Neural Information Processing Systems 32. Curran Associates. pp. 7686–7697.
- Google Scholar
1. Cheng S
2. Frank LM
(2011) The structure of networks that produce the transformation from grid cells to place cells
Neuroscience 197:293–306.

https://doi.org/10.1016/j.neuroscience.2011.09.002
- PubMed
- Google Scholar
1. Cheung A
2. Ball D
3. Milford M
4. Wyeth G
5. Wiles J
(2012) Maintaining a cognitive map in darkness: The need to fuse boundary knowledge with path integration
PLOS Computational Biology 8:e1002651.

https://doi.org/10.1371/journal.pcbi.1002651
- PubMed
- Google Scholar
(2008) Understanding memory through hippocampal remapping
Trends in Neurosciences 31:469–477.

https://doi.org/10.1016/j.tins.2008.06.008
- PubMed
- Google Scholar
(2016) Organizing conceptual knowledge in humans with a gridlike code
Science 352:1464–1468.

https://doi.org/10.1126/science.aaf0941
- PubMed
- Google Scholar
1. Cover TM
(1965) Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition
IEEE Transactions on Electronic Computers EC-14:326–334.

https://doi.org/10.1109/PGEC.1965.264137
- Google Scholar
(2015) Asymptotics of the extremal excedance set statistic
European Journal of Combinatorics 46:75–88.

https://doi.org/10.1016/j.ejc.2014.11.008
- Google Scholar
1. Dordek Y
2. Soudry D
3. Meir R
4. Derdikman D
(2016) Extracting grid cell characteristics from place cell inputs using non-negative principal component analysis
eLife 5:e10094.

https://doi.org/10.7554/eLife.10094
- PubMed
- Google Scholar
1. Fenton AA
2. Kao HY
3. Neymotin SA
4. Olypher A
5. Vayntrub Y
6. Lytton WW
7. Ludvig N
(2008) Unmasking the CA1 ensemble place code by exposures to small and large environments: More place cells and multiple, irregularly arranged, and expanded place fields in the larger space
The Journal of Neuroscience 28:11250–11262.

https://doi.org/10.1523/JNEUROSCI.2862-08.2008
- PubMed
- Google Scholar
(2008) What Grid Cells Convey about Rat Location
The Journal of Neuroscience 28:6858–6871.

https://doi.org/10.1523/JNEUROSCI.5684-07.2008
- PubMed
- Google Scholar
Preprint
(2014) A Binary Hopfield Network with Information Rate and Applications to Grid Cell Decoding
arXiv.

https://arxiv.org/pdf/1407.6029.pdf
- Google Scholar
Book
1. Fulton W
2. Fulton MW
(1997) Young Tableaux: With Applications to Representation Theory and Geometry
Cambridge University Press.

https://doi.org/10.1017/CBO9780511626241
- Google Scholar
1. Fyhn M
2. Molden S
3. Witter MP
4. Moser EI
5. Moser MB
(2004) Spatial representation in the entorhinal cortex
Science 305:1258–1264.

https://doi.org/10.1126/science.1099901
- PubMed
- Google Scholar
1. Fyhn M
2. Hafting T
3. Treves A
4. Moser MB
5. Moser EI
(2007) Hippocampal remapping and grid realignment in entorhinal cortex
Nature 446:190–194.

https://doi.org/10.1038/nature05601
- PubMed
- Google Scholar
1. Gardner E
(1988) The space of interactions in neural network models
J Phys A 21:257–270.

https://doi.org/10.1088/0305-4470/21/1/030
- Google Scholar
1. Hafting T
2. Fyhn M
3. Molden S
4. Moser MB
5. Moser EI
(2005) Microstructure of a spatial map in the entorhinal cortex
Nature 436:801–806.

https://doi.org/10.1038/nature03721
- PubMed
- Google Scholar
Conference
(2014)
Error accumulation and landmark-based error correction in grid cells

Neuroscience 2014.
- Google Scholar
(2012) Nonlinear dendritic integration of sensory and motor input during an active sensing task
Nature 492:247–251.

https://doi.org/10.1038/nature11601
- PubMed
- Google Scholar
1. Harnett MT
2. Xu NL
3. Magee JC
4. Williams SR
(2013) Potassium Channels Control the Interaction between Active Dendritic Integration Compartments in Layer 5 Cortical Pyramidal Neurons
Neuron 79:516–529.

https://doi.org/10.1016/j.neuron.2013.06.005
- Google Scholar
1. Hartley T
2. Burgess N
3. Lever C
4. Cacucci F
5. O’Keefe J
(2000) Modeling place fields in terms of the cortical inputs to the hippocampus
Hippocampus 10:369–379.

https://doi.org/10.1002/1098-1063(2000)
- PubMed
- Google Scholar
1. Hegedüs T
2. Megiddo N
(1996) On the geometric separability of Boolean functions
Discrete Applied Mathematics 66:205–218.

https://doi.org/10.1016/0166-218X(94)00161-6
- Google Scholar
1. Honda Y
2. Sasaki H
3. Umitsu Y
4. Ishizuka N
(2012) Zonal distribution of perforant path cells in layer III of the entorhinal area projecting to CA1 and subiculum in the rat
Neuroscience Research 74:200–209.

https://doi.org/10.1016/j.neures.2012.10.005
- PubMed
- Google Scholar
1. Irmatov AA
(1993)
On the number of threshold functions

Diskretnaya Matematika 5:40–43.
- Google Scholar
1. Itskov V
2. Abbott LF
(2008) Pattern capacity of a perceptron for sparse discrimination
Physical Review Letters 101:018101.

https://doi.org/10.1103/PhysRevLett.101.018101
- PubMed
- Google Scholar
1. Kaneko M
(1997) Poly-bernoulli numbers
J Théor Nr Bordx 9:221–228.

https://doi.org/10.5802/jtnb.197
- Google Scholar
Preprint
1. Kanitscheider I
2. Fiete I
(2017a) Emergence of Dynamically Reconfigurable Hippocampal Responses by Learning to Perform Probabilistic Spatial Reasoning
bioRxiv.

https://doi.org/10.1101/231159
- Google Scholar
1. Kanitscheider I
2. Fiete I
(2017b) Making our way through the world: Towards a functional understanding of the brain’s spatial circuits
Current Opinion in Systems Biology 3:186–194.

https://doi.org/10.1016/j.coisb.2017.04.008
- Google Scholar
Conference
1. Kanitscheider I
2. Fiete I
(2017c)
Training recurrent networks to generate hypotheses about how the brain solves hard navigation problems

NIPS. pp. 4529–4538.
- Google Scholar
(2012) A map of visual space in the primate entorhinal cortex
Nature 491:761–764.

https://doi.org/10.1038/nature11587
- PubMed
- Google Scholar
1. Klukas M
2. Lewis M
3. Fiete I
(2020) Efficient and flexible representation of higher-dimensional cognitive variables with grid cells
PLOS Computational Biology 16:e1007796.

https://doi.org/10.1371/journal.pcbi.1007796
- PubMed
- Google Scholar
1. Kropff E
2. Treves A
(2008) The emergence of grid cells: Intelligent design or just adaptation?
Hippocampus 18:1256–1269.

https://doi.org/10.1002/hipo.20520
- PubMed
- Google Scholar
(2007) Dendritic Spikes in Apical Dendrites of Neocortical Layer 2/3 Pyramidal Neurons
The Journal of Neuroscience 27:8999–9008.

https://doi.org/10.1523/JNEUROSCI.1717-07.2007
- PubMed
- Google Scholar
1. Larkum ME
2. Nevian T
3. Sandler M
4. Polsky A
5. Schiller J
(2009) Synaptic Integration in Tuft Dendrites of Layer 5 Pyramidal Neurons: A New Unifying Principle
Science 325:756–760.

https://doi.org/10.1126/science.1171958
- Google Scholar
1. Lee JS
2. Briguglio JJ
3. Cohen JD
4. Romani S
5. Lee AK
(2020) The Statistical Structure of the Hippocampal Code for Space as a Function of Time, Context, and Value
Cell 183:620–635.

https://doi.org/10.1016/j.cell.2020.09.024
- PubMed
- Google Scholar
1. Leonard JJ
2. Durrant-Whyte HF
(1991) Mobile robot localization by tracking geometric beacons
IEEE Trans Robot Autom 7:376–382.

https://doi.org/10.1109/70.88147
- Google Scholar
(2012) Optimal Population Codes for Space: Grid Cells Outperform Place Cells
Neural Computation 24:2280–2317.

https://doi.org/10.1162/NECO00319
- PubMed
- Google Scholar
(2006) Path integration and the neural basis of the ’cognitive map
Nature Reviews. Neuroscience 7:663–678.

https://doi.org/10.1038/nrn1932
- PubMed
- Google Scholar
(2004) RatSLAM: a hippocampal model for simultaneous localization and mapping
In: ICRA 1:403–408.

https://doi.org/10.1109/ROBOT.2004.1307183
- Google Scholar
(2011) Modular realignment of entorhinal grid cell activity as a basis for hippocampal remapping
The Journal of Neuroscience 31:9414–9425.

https://doi.org/10.1523/JNEUROSCI.1433-11.2011
- PubMed
- Google Scholar
1. Monasson R
2. Rosay S
(2013) Crosstalk and transitions between multiple spatial maps in an attractor neural network model of the hippocampus: Phase diagram
Physical Review. E 87:62813.

https://doi.org/10.1103/PhysRevE.87.062813
- Google Scholar
1. Mosheiff N
2. Burak Y
(2019) Velocity coupling of grid cell modules enables stable embedding of a low dimensional variable in a high dimensional neural attractor
eLife 8:e48494.

https://doi.org/10.7554/eLife.48494
- PubMed
- Google Scholar
1. Muller R
2. Kubie J
3. Ranck J
(1987) Spatial firing patterns of hippocampal complex-spike cells in a fixed environment
Journal of Neuroscience 7:1935–1950.

https://doi.org/10.1523/JNEUROSCI.07-07-01935.1987
- PubMed
- Google Scholar
1. O’Keefe J
2. Dostrovsky J
(1971) The hippocampus as a spatial map Preliminary evidence from unit activity in the freely-moving rat
Brain Research 34:171–175.

https://doi.org/10.1016/0006-8993(71)90358-1
- PubMed
- Google Scholar
Book
1. O’Keefe J
2. Nadel L
(1978)
The Hippocampus as a Cognitive Map

Clarendon Press.
- Google Scholar
(2011) Ensemble place codes in hippocampus: Ca1, ca3, and dentate gyrus place cells have multiple place fields in large environments
PLOS ONE 6:e22349.

https://doi.org/10.1371/journal.pone.0022349
- PubMed
- Google Scholar
1. Pedregosa F
2. Varoquaux G
3. Gramfort A
4. Michel V
5. Thirion B
6. Grisel O
7. Blondel M
8. Prettenhofer P
9. Weiss R
10. Dubourg V
11. Vanderplas J
12. Passos A
13. Cournapeau D
14. Brucher M
15. Perrot M
16. Duchesnay E
(2011)
Scikit-learn: Machine Learning in Python

Journal of Machine Learning Research 12:2825–2830.
- Google Scholar
1. Peled UN
2. Simeone B
(1985) Polynomial-time algorithms for regular set-covering and threshold synthesis
Discrete Applied Mathematics 12:57–69.

https://doi.org/10.1016/0166-218X(85)90040-X
- Google Scholar
Book
1. Platt J
(1998)
Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines

Microsoft Research Technical Report MSR-TR-98-14.
- Google Scholar
1. Poirazi P
2. Mel BW
(2001) Impact of Active Dendrites and Structural Plasticity on the Memory Capacity of Neural Tissue
Neuron 29:779–796.

https://doi.org/10.1016/S0896-6273(01)00252-5
- PubMed
- Google Scholar
(2004) Computational subunits in thin dendrites of pyramidal cells
Nature Neuroscience 7:621–627.

https://doi.org/10.1038/nn1253
- PubMed
- Google Scholar
Preprint
1. Postnikov A
(2006) Total Positivity, Grassmannians, and Networks
arXiv.

https://arxiv.org/abs/math/0609764
- Google Scholar
1. Rich PD
2. Liaw HP
3. Lee AK
(2014) Place cells Large environments reveal the statistical structure governing hippocampal representations
Science 345:814–817.

https://doi.org/10.1126/science.1255635
- PubMed
- Google Scholar
1. Rosenblatt F
(1958) The perceptron: a probabilistic model for information storage and organization in the brain
Psychological Review 65:386–408.

https://doi.org/10.1037/h0042519
- PubMed
- Google Scholar
1. Samsonovich A
2. McNaughton BL
(1997) Path integration and cognitive mapping in a continuous attractor neural network model
The Journal of Neuroscience 17:5900–5920.

https://doi.org/10.1523/JNEUROSCI.17-15-05900.1997
- PubMed
- Google Scholar
(2020) Hippocampal remapping as hidden state inference
eLife 06:9.

https://doi.org/10.7554/eLife.51140
- Google Scholar
Book
1. Shepard G
(1998)
The Synaptic Organization of the Brain

New York: Oxford Univ Press Inc.
- Google Scholar
(2006) From grid cells to place cells: A mathematical model
Hippocampus 16:1026–1031.

https://doi.org/10.1002/hipo.20244
- PubMed
- Google Scholar
1. Sompolinsky H
2. Kanter I
(1986) Temporal Association in Asymmetric Neural Networks
Physical Review Letters 57:2861–2864.

https://doi.org/10.1103/PhysRevLett.57.2861
- PubMed
- Google Scholar
1. Spruston N
(2008) Pyramidal neurons: dendritic structure and synaptic integration
Nature Reviews Neuroscience 9:206–221.

https://doi.org/10.1038/nrn2286
- PubMed
- Google Scholar
1. Sreenivasan S
2. Fiete I
(2011) Grid cells generate an analog error-correcting code for singularly precise neural computation
Nature Neuroscience 14:1330–1337.

https://doi.org/10.1038/nn.2901
- PubMed
- Google Scholar
(2017) The hippocampus as a predictive map
Nature Neuroscience 20:1643–1653.

https://doi.org/10.1038/nn.4650
- PubMed
- Google Scholar
1. Stensola H
2. Stensola T
3. Solstad T
4. Frøland K
5. Moser MB
6. Moser EI
(2012) The entorhinal grid map is discretized
Nature 492:72–78.

https://doi.org/10.1038/nature11649
- PubMed
- Google Scholar
1. Steward O
2. Scoville SA
(1976) Cells of origin of entorhinal cortical afferents to the hippocampus and fascia dentata of the rat
The Journal of Comparative Neurology 169:347–370.

https://doi.org/10.1002/cne.901690306
- PubMed
- Google Scholar
Book
(2016) Dendrites (Third edn)
Oxford University Press.

https://doi.org/10.1093/acprof:oso/9780198745273.001.0001
- Google Scholar
1. Suh J
2. Rivest AJ
3. Nakashiba T
4. Tominaga T
5. Tonegawa S
(2011) Entorhinal cortex layer III input to the hippocampus is crucial for temporal association memory
Science 334:1415–1420.

https://doi.org/10.1126/science.1210125
- PubMed
- Google Scholar
1. Tolman EC
(1948) Cognitive maps in rats and men
Psychological Review 55:189–208.

https://doi.org/10.1037/h0061626
- PubMed
- Google Scholar
(1996) Population dynamics and theta rhythm phase precession of hippocampal place cell firing: a spiking neuron model
Hippocampus 6:271–280.

https://doi.org/10.1002/(SICI)1098-1063(1996)6:33.0.CO;2-Q
- PubMed
- Google Scholar
Book
1. Vapnik VN
(1998)
Statistical Learning Theory

Wiley.
- Google Scholar
(2008) Grid cells: The position code, neural network models of activity, and the problem of learning
Hippocampus 18:1283–1300.

https://doi.org/10.1002/hipo.20519
- PubMed
- Google Scholar
1. Whittington JCR
2. Muller TH
3. Mark S
4. Chen G
5. Barry C
6. Burgess N
7. Behrens TEJ
(2020) The Tolman-Eichenbaum Machine: Unifying Space and Relational Memory through Generalization in the Hippocampal Formation
Cell 183:1249–1263.

https://doi.org/10.1016/j.cell.2020.10.024
- PubMed
- Google Scholar
Book
1. Widloski J
2. Fiete I
(2014) How does the brain solve the computational problems of spatial navigation?
In: Derdikman D, Knierim JJ, editors. Space, and Timeand Thememinipthermation Shippocampaformation. Springer. pp. 373–407.

https://doi.org/10.1007/978-3-7091-1292-214
- Google Scholar
1. Wilson M
2. McNaughton B
(1993) Dynamics of the hippocampal ensemble code for space
Science 261:1055–1058.

https://doi.org/10.1126/science.8351520
1. Witter MP
2. Groenewegen HJ
(1984) Laminar origin and septotemporal distribution of entorhinal and perirhinal projections to the hippocampus in the cat
The Journal of Comparative Neurology 224:371–385.

https://doi.org/10.1002/cne.902240305
1. Witter MP
2. Amaral DG
(1991) Entorhinal cortex of the monkey: V. Projections to the dentate gyrus, hippocampus, and subicular complex
The Journal of Comparative Neurology 307:437–459.

https://doi.org/10.1002/cne.903070308
(2000) Anatomical organization of the parahippocampal-hippocampal network
Annals of the New York Academy of Sciences 911:1–24.

https://doi.org/10.1111/j.1749-6632.2000.tb06716.x
Book
(2019) Mechanistic Models of Place Cell Statistics in Large Environments
SfN Abstract.
1. Yoon K
2. Buice MA
3. Barry C
4. Hayman R
5. Burgess N
6. Fiete IR
(2013) Specific evidence of low-dimensional continuous attractor dynamics in grid cells
Nature Neuroscience 16:1077–1084.

https://doi.org/10.1038/nn.3450
1. Yoon K
2. Lewallen S
3. Kinkhabwala AA
4. Tank DW
5. Fiete IR
(2016) Grid Cell Responses in 1D Environments Assessed as Slices through a 2D Lattice
Neuron 89:1086–1099.

https://doi.org/10.1016/j.neuron.2016.01.039
1. Ziv Y
2. Burns LD
3. Cocker ED
4. Hamel EO
5. Ghosh KK
6. Kitch LJ
7. Gamal AE
8. Schnitzer MJ
(2013) Long-term dynamics of CA1 hippocampal place codes
Nature Neuroscience 16:264–266.

https://doi.org/10.1038/nn.3329
Book
1. Zuev YA
(1989) Asymptotics of the Logarithm of the Number of Threshold Functions of the Algebra of Logic
Walter de Gruyter.

Article and author information

Author details

Man Yi Yim
1. Center for Theoretical and Computational Neuroscience, University of Texas, Austin, United States
2. Department of Neuroscience, University of Texas, Austin, United States
3. Department of Brain and Cognitive Sciences and McGovern Institute, MIT, Cambridge, United States
Contribution
Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing - original draft

Competing interests
None
Lorenzo A Sadun

Department of Mathematics and Neuroscience, The University of Texas, Austin, United States

Contribution
Formal analysis, Investigation

Competing interests
None

ORCID iD: 0000-0002-2518-573X
Ila R Fiete
1. Center for Theoretical and Computational Neuroscience, University of Texas, Austin, United States
2. Department of Brain and Cognitive Sciences and McGovern Institute, MIT, Cambridge, United States
Contribution
Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing - original draft, Writing - review and editing

For correspondence
fiete@mit.edu

Competing interests
None

ORCID iD: 0000-0003-4738-2539
Thibaud Taillefumier
1. Center for Theoretical and Computational Neuroscience, University of Texas, Austin, United States
2. Department of Neuroscience, University of Texas, Austin, United States
3. Department of Mathematics and Neuroscience, The University of Texas, Austin, United States
Contribution
Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing - original draft, Writing - review and editing

For correspondence
ttaillef@austin.utexas.edu

Competing interests
None

ORCID iD: 0000-0003-3538-6882

Funding

Simons Foundation (Simons Collaboration on the Global Brain)

Man Yi Yim
Ila R Fiete

Howard Hughes Medical Institute (Faculty Scholars Program)

Ila R Fiete

Alfred P. Sloan Foundation (Alfred P. Sloan Research Fellowship FG-2017-9554)

Thibaud Taillefumier

Office of Naval Research (S&T BAA Award N00014-19-1-2584)

Ila R Fiete

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by the Simons Foundation through the Simons Collaboration on the Global Brain, the ONR, the Howard Hughes Medical Institute through the Faculty Scholars Program to IRF, and the Alfred P Sloan Research Fellowship FG-2017-9554 to TT. We thank Sugandha Sharma, Leenoy Meshulam, and Luyan Yu for comments on the manuscript.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.