Velocity coupling of grid cell modules enables stable embedding of a low dimensional variable in a high dimensional neural attractor
Abstract
Grid cells in the medial entorhinal cortex (MEC) encode position using a distributed representation across multiple neural populations (modules), each possessing a distinct spatial scale. The modular structure of the representation confers the grid cell neural code with large capacity. Yet, the modularity poses significant challenges for the neural circuitry that maintains the representation, and updates it based on self motion. Small incompatible drifts in different modules, driven by noise, can rapidly lead to large, abrupt shifts in the represented position, resulting in catastrophic readout errors. Here, we propose a theoretical model of coupled modules. The coupling suppresses incompatible drifts, allowing for a stable embedding of a twodimensional variable (position) in a higher dimensional neural attractor, while preserving the large capacity. We propose that coupling of this type may be implemented by recurrent synaptic connectivity within the MEC with a relatively simple and biologically plausible structure.
https://doi.org/10.7554/eLife.48494.001Introduction
Much of the research on neural coding in the brain is focused on sensory representations, which are driven by external inputs that can be precisely controlled experimentally. In comparison, less is known about neural coding in deep brain structures, in which the dynamics reflect the outcome of internal computations. A notable exception is the hippocampal formation, where neural activity has been linked to high level cognitive variables such as an animal's estimate of its position within its environment (O’Keefe and Nadel, 1978; Moser et al., 2008; Taube et al., 1990), or its estimate of elapsed time within a trial of a trained behavioral task (Manns et al., 2007; Pastalkova et al., 2008; Itskov et al., 2011; Eichenbaum, 2014).
Specifically, the representation of position by grid cells (Hafting et al., 2005) in the medial entorhinal cortex (MEC) has led to new, unexpected insights on the neural coding of such quantities: even though position is a low dimensional variable, it is jointly encoded by several distinct populations of cells (modules: Stensola et al., 2012), exhibiting periodic spatial responses with varying spatial scales. The spatial responses of all grid cells within a module are characterized by the same grid spacing and orientation, while differing from each other only by a rigid translation. The representation of position by each module is ambiguous, but taken together, the joint activity in several modules constitutes a highly efficient and unambiguous neural code (see Burak, 2014). Due to its distributed organization, the grid cell code possesses a high dynamic range (ratio between the range of unambiguous representation and resolution; Burak, 2014), greatly exceeding the performance of unimodal coding schemes such as the representation of position by place cells in the hippocampus (Fiete et al., 2008; Sreenivasan and Fiete, 2011; Mathis et al., 2012b; Wei et al., 2015; Mosheiff et al., 2017).
Alongside the potential benefits arising from the combinatorial nature of the grid cell code, the distributed representation of position over several modules and spatial scales poses a mechanistic challenge to the underlying neural circuitry. The difficulty lies in the hypothesized role of the hippocampal formation, and specifically the MEC, in maintenance of shortterm memory and idiothetic path integration, as opposed to pure representation. When grid cells update their activity, for example based on self motion, they must do so in a coordinated manner, in order for them to coherently represent a position in twodimensional space, a variable of much lower dimensionality than the joint activity of all cells.
Neurons within a module maintain strict relationships in their joint activity (Yoon et al., 2013). These relationships are maintained across environments (Fyhn et al., 2007); under abrupt distortions of the environment (Barry et al., 2007; Yoon et al., 2013); in novel environments (Barry et al., 2012; Yoon et al., 2013), in which stable place fields are absent; during hippocampal inactivation (Almog et al., 2019); under genetic perturbations that disrupt the spatial periodicity in the response of individual cells (Allen et al., 2014); and in sleep (Trettel et al., 2019; Gardner et al., 2019). The rigidity of the correlation structure strongly suggests that the neural activity within a module is tightly coordinated by recurrent connectivity, consistent with attractor models of grid cell activity (McNaughton et al., 2006; Fuhs and Touretzky, 2006; Guanella et al., 2007; Burak and Fiete, 2009), which propose that synaptic connectivity restricts the joint activity within a module to lie on a twodimensional manifold. Additional support for attractor models has been recently obtained by imaging activity of multiple grid cells using calcium imaging in rats running on a virtual onedimensional track (Heys et al., 2014; Gu et al., 2018). These studies revealed a relationship between position on the cortical sheet and the preferred firing locations of grid cells, as predicted by the variants of attractor models that rely on distancedependent connectivity.
In contrast to the strong correlation in the activity of neurons within a module, much less is known about coupling of neurons that belong to different modules. A network of grid cells organized in $m$ modules, each independently structured as a twodimensional continuous attractor, possesses a $2m$ dimensional space of accessible steady states. Yet at any given time, continuous motion of the animal corresponds to a twodimensional subspace of the possible local changes in the state of the $m$ modules. Considering that noise may corrupt the representation of position in each module separately, the maintenance of a coherent representation of position across modules necessitates some form of coupling between them (Welinder et al., 2008; Sreenivasan and Fiete, 2011; Burak, 2014). Figure 1A demonstrates the need for this coupling: incoherent drifts in the positions represented by different modules, accrued due to noise, can rapidly produce a joint representation of position that is incompatible with any position in the close vicinity of the animal (Fiete et al., 2008; Welinder et al., 2008; Burak, 2014; Vágó and Ujfalussy, 2018). The desired coupling across modules is more subtle than the one observed within a module: the coupling should restrict changes in the states of different modules to lie within the twodimensional subspace that corresponds to smooth movement of the animal within its local environment. However, to preserve the high dynamic range of the code, the coupling should not restrict the dimensionality of the accessible steady states.
To further illustrate this point, it is instructive to consider an analogy of grid cell coding to the representation of a onedimensional position by the rotation angles ${\theta}_{1}$ and ${\theta}_{2}$ of two meshing gears (Figure 1B). We imagine that motion along the onedimensional axis corresponds to coordinated rotation of the two gears (Figure 1B, bottom). If the radii ${R}_{1}$ and ${R}_{2}$ of the two gears are incommensurate, a large distance is traversed before the two meshing gears come close to a previously visited state, thus allowing for a large range of positions to be unambiguously represented. However, it is crucial in this scheme that during continuous motion, the gears rotate in a coordinated manner: ${\dot{\theta}}_{1}{R}_{1}={\dot{\theta}}_{2}{R}_{2}$. This relationship between the phase velocities ${\dot{\theta}}_{1}$ and ${\dot{\theta}}_{2}$ is enforced by the meshing cogs along the circumference of the two gears. In the absence of this mechanical constraint, small movement of one gear relative to the other can abruptly transport the represented position to a distant location, unrelated to the original position. Note that the absolute angles of the two meshing gears are not constrained: in fact, the large capacity of the representation relies on the fact that any combination of the two angles is accessible (compare panels BC in Figure 1).
Motivated by this analogy, we ask whether synaptic connectivity between grid cell modules can enforce a similar dynamic relationship between their states. Below, we identify a simple form of synaptic connectivity between grid cells that can implement this desired form of coupling. Next, we show that the recurrent connectivity confers the joint representation of position with resilience to two types of noise: First, noise in the velocity inputs projecting to different modules. These may differ in different modules, for example, due to synaptic noise. Second, noise arising from the stochastic dynamics of spiking neural activity within each module. The outcome is a continuousattractor representation of position that achieves two goals: First, the representation is distributed across several modules with different spatial scales, allowing for combinatorial capacity by preserving the high dimensionality of the accessible steady states. Second, the neural circuitry that supports this representation is highly robust to noise when updating the representation based on selfmotion, or while maintaining a persistent representation in shortterm memory.
Alongside the recurrent connectivity, it is plausible that feedforward synaptic projections from the hippocampus to the entorhinal cortex play a role in shaping the grid cell response (Kropff and Treves, 2008; Dordek et al., 2016; D'Albis and Kempter, 2017; Weber and Sprekeler, 2018). Thus, hippocampal inputs may aid in coupling the states of different grid cell modules (Welinder et al., 2008; Sreenivasan and Fiete, 2011). In addition, sensory inputs that carry information about the animal’s position may contribute as well, through synaptic projections to the MEC from other cortical areas. However, there are situations in which these types of inputs to the MEC cannot ensure appropriate coordination between grid cells modules. First, under conditions in which sensory inputs are absent or weak, the brain must rely on idiothetic path integration in order to update its estimate of position. Second, in novel environments, and following global remapping in the hippocampus (Muller and Kubie, 1987), it is highly unlikely that specific connections between place cells and grid cells, that couple the two spatial representations, are immediately established. Hence, coupling modules via hippocampal inputs would be ineffective in a novel environment. Thus, in this study, we focus on the ability of recurrent connectivity within the entorhinal cortex to maintain a coherent representation of position across gridcell modules.
Results
Theoretical framework
In laying out the principles underlying our proposed synaptic connectivity, we consider first a one dimensional analogue of the grid cell representation, inspired by the analogy to meshing gears discussed above: we imagine that an animal moves in one dimension, and neurons in each grid cell module $\mu $ jointly represent the modulus of position relative to the grid spacing ${\lambda}_{\mu}$ (for simplicity, from here on, we define the phases ${\theta}_{\mu}$ such that they are in the range $[0,1]$):
We hypothesize that the joint dynamics of all grid cells within a module are restricted to lie in a one dimensional attractor, which we model as a ring attractor (BenYishai et al., 1995; Zhang, 1996). More specifically, we consider the doublering architecture (Xie et al., 2002), which includes a mechanism for updates based on velocity inputs, and was proposed as a model for integration of angular velocity inputs by head direction cells in rodents. Recent discoveries in the Drosophila melanogaster central complex point to a representation of heading that is maintained by neural circuitry with close similarity to this architecture (Seelig and Jayaraman, 2015; TurnerEvans et al., 2017; Kim et al., 2017; Green et al., 2017). Attractor models of grid cells in the entorhinal cortex (Fuhs and Touretzky, 2006; Burak and Fiete, 2009) generalize the doublering attractor model to motion in two dimensions.
Within the doublering attractor model (Xie et al., 2002), a module consists of two recurrently connected neural subpopulations, each comprising $N$ neurons organized on a ring (left ring and right ring, Figure 2A). We denote by
the vector of synaptic activations, where ${\overrightarrow{s}}_{R}$ and ${\overrightarrow{s}}_{L}$ represent the synaptic activation of the right and left subpopulations respectively. The synaptic activation ${s}_{i}$ of neuron $i$ follows the dynamics:
where $\tau $ is the synaptic time constant, ${r}_{i}$ is the firing rate of neuron $i$, $\varphi $ is a nonlinear transfer function, $W$ is the connectivity matrix (Equation 8; Equation 9), ${I}_{0}$ is a constant input, and $+dI$ ($dI$) is the velocity input to a neuron in the right (left) subpopulation. Synaptic weights projecting from neurons in each ring are shifted clockwise (right) or anticlockwise (left). When both subpopulations receive identical feedforward inputs, activity in the network settles on a stationary bump of activity. However, selective activation of the right (left) subpopulation via the feedforward inputs, induces clockwise (anticlockwise) motion of the activity bump at a phase velocity proportional to the velocity input $dI$ (Figure 2B). Hence, in a noisefree network, the position of the activity bump is an integral of the velocity input.
Our goal is to couple several such modules such that they will update their states in a coordinated manner in the presence of noisy inputs. It is essential to couple the modules based on their phase velocities ${\dot{\theta}}_{\mu}$ and not directly by their phases ${\theta}_{\mu}$, as we want to allow all phase combinations of the different modules to be possible steady states of the population neural dynamics. Our proposed coupling requires two ingredients: reading out the phase velocity of each module, and inducing corresponding phase velocities in the other modules. The double ring model already contains a mechanism for integration of velocity inputs, and therefore, our main challenge is in reading out the phases velocities.
Simple neural readout of velocity
Our first goal is to read out the phase velocity of a single module in our system. It is possible to compute the phase velocity by projecting Equation 3 on the eigenvector with zero eigenvalue of the dynamics (see Appendix 1 and Burak and Fiete, 2012). However, this projection cannot be evaluated linearly from the neural activity, since the projection coefficients depend on the location of the activity bump. Instead, we seek a simple estimate of the phase velocity that can be implemented in a neural circuit with relatively simple architecture in a biologically plausible manner.
Intuitively, in the described framework, most of the motion arises from the differences in activity between the right and left subpopulations Figure 2B. Therefore, this difference might be close to the phase velocity $\dot{\theta}$. We find, indeed, that the difference between the synaptic activities of the right and left subpopulations,
provides a good approximation for the phase velocity (Figure 2C), where $\beta $ is a proportionality factor. In Appendix 1 we show mathematically that $\omega \approx \dot{\theta}$.
Coupling modules by synaptic connectivity
In order to couple the motion of different modules, we use the readout signal ${\omega}_{\mu}$ of each module $\mu $ (Equation 4) as a velocity input to all other modules (Figure 2D–E, green arrows). In addition, we include negative self coupling within each module using the same readout signal ${\omega}_{\mu}$ (necessary, as shown below, in order to prevent instabilities that otherwise arise from the positive feedback generated by the intermodule couplings), Figure 2E (orange arrows).
Note that ${\omega}_{\mu}$ is a linear function of synaptic activities within the ring network, with coefficients that do not depend on the position of the activity bump. Thus, the coupling can be implemented by recurrent connectivity within the MEC, between modules and within a single module. The resulting synaptic connectivity between any two coupled modules is alltoall in the sense that every neuron in one module is connected to every neuron in the other module, with synaptic weights whose magnitudes are uniform (see Materials and methods, Equation 12). The sign of each synaptic weight depends only on the subpopulation (left or right) of the pre and postsynaptic neurons. This connectivity is completely symmetric to rotation in the two modules, thus preserving the ability to obtain a combinatorially large manifold of steady states in which activity bumps can be placed in any combination of positions.
To understand how the couplings influence the joint dynamics of the coupled modules, we analyze the response of a network, consisting of $m$ coupled modules, to external velocity inputs, $\overrightarrow{b}(t)$. The position of the bump in each module can be represented by a phase ${\theta}_{\mu}$. We find that the dynamics of these phases are governed by the following set of coupled differential equations
where $C$ is an $m\times m$ matrix whose element ${C}_{\mu \rho}$ represents the coupling strength from module $\rho $ to module $\mu $, $f*\dot{\overrightarrow{\theta}}$ is the convolution of $\dot{\overrightarrow{\theta}}$ with an exponential filter $f$ with the synaptic time scale $\tau $ (Equation 36), and $\alpha $ is a constant factor (see full derivation of Equation 5 in Appendix 2). Thus, the phase of each module is updated in response to two signals: the external velocity input projecting into the module (first term in Equation 5), and the recent history of changes in the phases of the other modules, conveyed by the coupling signal (second term in Equation 5).
Much of the system’s response to external velocity inputs can be understood by considering its dynamics under the assumption that the motion of the animal is sufficiently slow, such that the components of $\dot{\overrightarrow{\theta}}$ vary slowly compared to the synaptic time constant. Under this assumption, we obtain from Equation 5
where
is the linear response tensor.
For simplicity, let us consider first only two coupled modules (each of them one dimensional), with identical self coupling strength ${C}_{s}$ for both modules. The eigenvalues of $X$, denoted by ${X}_{+}$ and ${X}_{}$ (Figure 3A–C and Appendix 2), indicate how strongly the coupled modules respond to velocity inputs that drive coordinated and relative motion, respectively. If ${X}_{}$ is small, the modules respond weakly to velocity inputs that attempt to update the phases in an uncoordinated manner. Thus, if ${X}_{}$ is much smaller than ${X}_{+}$, we expect the motion of the two modules to remain coordinated, even if the velocity inputs to the two modules differ.
We choose coupling parameters ${C}_{1}$, ${C}_{2}$ and ${C}_{s}$ such that three requirements are fulfilled (see Appendix 2): First, the modules should respond significantly to inputs that drive coordinated motion (large ${X}_{+}$, Figure 3A,C). The response to such inputs should not be suppressed since the system must be able to update its state based on velocity inputs, to correctly represent the animal’s position in its environment. Second, the modules should respond very weakly to inputs that drive anticorrelated motion (small relative motion ${X}_{}$, Figure 3B,C). The self negative coupling ${C}_{s}$ enables us to achieve these two requirements while preserving stability (Figure 3A–C and Appendices 23). Our last requirement is the maintenance of a specific ratio between the module phase velocities, $\lambda $, that corresponds to the grid spacing ratio between successive modules (we set $\lambda =\sqrt{2}$ for all modules; Stensola et al., 2012).
Figure 3E–F demonstrates the response of two modules to an external velocity input, representing an animal’s trajectory (shown in Figure 3D). The input is given only to module 1. In the the case of uncoupled modules (${C}_{1}={C}_{2}={C}_{s}=0$), only module 1 follows the trajectory, as expected (Figure 3E). In the case of coupled modules, both of the modules follow the trajectory quite accurately, with the desired phase velocity ratio $\lambda $. Hence, under these conditions, the two coupled modules shift in a coordinated manner, even if they receive incompatible velocity inputs. Next, we generalize these results to multiple modules, and to grid cells in two dimensions.
Generalization to two dimensions and several modules
The coupling of modules, described so far, can be easily extended to grid cell responses in two dimensions. In accordance with grid cell responses in twodimensional arenas, each module is structured as a twodimensional attractor, whose state is determined by two periodic phases. Thus, the steady states of the attractor are arranged on a torus instead of a ring. To obtain this topology, we use a network architecture in which neurons are arranged on a parallelogram, corresponding to a unit cell of the hexagonal grid (in similarity to Guanella et al., 2007). The synaptic connection between any two neurons depends on their distance on the parallelogram, defined using periodic boundary conditions (Figure 4A and Materials and methods). The position of the activity bump must be able to shift in response to a twodimensional velocity input, in any direction in the plane. To implement this velocity response, each module contains four subpopulations (right, left, up, and down). The synaptic weights projecting from neurons in each subpopulation are shifted in a corresponding direction within the neural population (Burak and Fiete, 2009).
In addition, we generalize our network to $m$ grid cell modules. The coupling strengths ${C}_{\mu \rho}$ thus comprise ${m}^{2}$ parameters that we are free to adjust to fulfill a set of requirements, similar to those applied in the case $m=2$. Our most important goal is that the motion of all modules should be coordinated, even if the velocity inputs are not identical. To achieve this goal, we define a joint motion vector $\overrightarrow{u}$, such that ${u}_{\mu}/{u}_{\rho}$ is the ratio of grid spacings of modules $\rho $ and $\mu $. We require that this vector is an eigenvector of the linear response tensor, and minimize the eigenvalues corresponding to all other eigenvectors. If we were able to obtain eigenvalues that precisely vanish, the response tensor would be a rank one matrix whose columns are all proportional to $\overrightarrow{u}$. Under this idealized outcome, any velocity input, regardless of its direction in the $2m$ dimensional input space would result in coordinated motion of the modules. However, the couplings ${C}_{\mu \rho}$ that precisely achieve this goal diverge, in similarity to the twomodule case (Appendix 2). Thus, we impose a constraint on the strength of the synaptic connections. Another constraint is that all eigenvalues of $C$ must be smaller than unity. Otherwise, the system exhibits dynamic instability (Appendix 3). We optimize an appropriate target function under these constraints (Appendix 3).
For $m=2$ the optimization results in the same solution of coupling parameters ${C}_{\mu \rho}$ that we found previously. For $m>2$, we find that there is considerable freedom in choosing combinations of ${C}_{\mu \rho}$ that achieve satisfactory coupling (See Appendix 3). One principled way to reduce this freedom, is to require that there is connectivity only between successive modules. This choice is compatible with recent observations (Fu et al., 2018) that excitatory synaptic connectivity within the MEC is relatively short ranged. In our numerical results, we use this assumption to constrain the structure of the connectivity matrix ${C}_{\mu \rho}$, but other choices that include broader connectivity between modules lead to similar coupling between the modules.
To demonstrate how our proposed coupling affects the response of the modules to velocity inputs, we simulate the described network in two dimensions, with and without coupling, and with three modules. The velocity input is a measured rat trajectory from Stensola et al. (2012), with the addition of white Gaussian noise, drawn independently in the three modules. In each module, we set the proportionality coefficient that tunes the modulation of activity by the velocity input (${\gamma}_{\mu}$ in Equation 21) to achieve the desired grid spacing, even in the absence of intermodule coupling. In the simulation, we assume that only velocity inputs are responsible for the update of the neural representation of position, thus mimicking a situation in which sensory cues, such as those arising from visual inputs and encounters with the walls (Hardcastle et al., 2015; Keinath et al., 2018), are absent. In a noisefree simulation, the single cell firing rates form a hexagonal grid pattern as a function of the animal’s location (Figure 4B), as expected from the network structure, while the spacing ratio between modules is close to $\lambda $.
The trajectories of the 2d phases, in response to noisy velocity inputs, are shown, for each of the three modules, in Figure 4D (uncoupled modules) and Figure 4E (coupled modules). In both cases, the phases follow the animal’s trajectory (Figure 4C) quite closely, but the phases are much more similar to each other, and to the original trajectory, in the coupled case. Since panels DE show results only from a single simulation, we repeat the analysis for 100 realizations of the noise in the velocity inputs, to obtain statistical measures on the coupled vs. uncoupled dynamics. The coupling substantially reduces the mismatch accrued between the trajectories of the different modules, compared to the uncoupled case, Figure 4F. For comparison, Figure 4G shows the mismatch between module trajectories and the true trajectory. In the uncoupled case, all modules exhibit similar accumulation of error, which arises from their independent responses to the noise in the velocity inputs. In the coupled case, only the projection of the noise on the direction of joint motion contributes to the accumulation of errors, leading to a reduction by a factor of $m$ (in our case 3) in the slope of the MSE curve.
The lack of deviations between the phase trajectories, seen in the coupled case (Figure 4E–F), is an essential difference between the dynamics of the coupled and uncoupled modules. As discussed in the Introduction, we expect this difference to strongly impact the stability of the grid cell code. In the following section, we substantiate this point.
Consequences for spatial representation and readout
We next aim to validate our hypothesis that the coupling of modules stabilizes the grid cell code, and more specifically, prevents catastrophic errors that can be caused by uncoupled drift in the phases of different modules (Figure 1A). We simulate the dynamics of our three module network with noisy velocity inputs based on a measured rat trajectory from Stensola et al. (2012), as in Figure 4. We then generate Poisson spikes from the instantaneous firing rates of the neurons, and read out the animal’s trajectory from the simulated spikes: we do so both for coupled modules (Figure 5A) and for uncoupled modules (Figure 5B). The readout is accomplished using a decoder that sums spikes from the recent history, with an exponentially decaying temporal kernel (see Materials and methods and Mosheiff et al., 2017). In Figure 5 the spikes are involved only in the readout process and not in the intrinsic dynamics of the neural network.
In the case of coupled modules the decoded trajectory is similar to the input (Figure 5A), but due to the noise in the inputs, it gradually accrues an error relative to the true trajectory. Without coupling, the position read out from the network activity diverges sharply from the true trajectory (Figure 5B). Moreover, the readout trajectory is often discontinuous in time, and thus cannot be a good approximation to any reasonable path of the animal. The discontinuity arises from uncorrelated drifts of the modules which, combined with the periodic nature of the grid pattern, cause catastrophic readout errors, much larger than the errors accrued in the phases of each module separately (Figure 1A).
In order to quantitatively substantiate the relationship between the large deviation of the decoded trajectory from the true trajectory and the occurrence of catastrophic readout errors, we repeat the decoding process a hundred times with and without coupling, for a 20s input trajectory. In all realizations with coupling, the readout is coordinated with the input trajectory (Figure 5C, blue). In contrast, without coupling almost all realizations exhibit discontinuities within a time interval of 20s (Figure 5C, red). The mean square error (MSE) of the decoder increases as a function of time in the coupled as well as the uncoupled systems (Figure 5D), as expected due to noise in the input (as the coupling and inputs are of velocity and not location, there is no correcting mechanism that can correct coordinated shifts in the phases of all the modules). However, a few seconds after the start of the simulation, the MSE grows sharply in the uncoupled system (dashed line in Figure 5D). The time at which this starts to happen coincides with the first appearance of discontinuities in the decoded position (compare Figure 5 panels C and D). (Note that below this time the probability for occurrence of a readout discontinuity does not vanish, but can be inferred roughly to be small compared to 0.01 since we performed 100 simulations.) Thus, the dramatic reduction achieved by the coupling between modules arises primarily from the elimination of catastrophic readout errors. This conclusion is insensitive to the choice of the time scale of temporal integration used in the decoding process (Figure 5—figure supplement 1).
Qualitatively similar conclusions, as demonstrated above, are obtained also when the number of modules $m$ is increased (Figure 5—figure supplement 2). With larger $m$, the rate at which readout discontinuities occur in a given environment diminishes. Note, however, that additional modules enable unambiguous representation of larger environments (Fiete et al., 2008; Mathis et al., 2012a; Wei et al., 2015; Vágó and Ujfalussy, 2018), and that the rate of readout discontinuities increases with the size of the environment (red traces in Figure 5—figure supplement 2E–J).
Intrinsic neural noise
Up to this point we presented a theory of several grid cell modules, coupled to each other by synaptic connectivity within the MEC, such that the coupling significantly suppresses incompatible drifts caused by noisy inputs to the system. Next, we wish to address another important source of noise, arising from the variability in the spiking of individual neurons within the grid cell network (Softky and Koch, 1993; Shadlen and Newsome, 1994; Burak and Fiete, 2009). In similarity to noise in the inputs, stochasticity of the neurons participating in the attractor network drives errors that accumulate over time with diffusive dynamics (Burak and Fiete, 2009; Burak and Fiete, 2012). To model this process, we replace the firing rate of each neuron in Equation 3 by a Poisson spike train (see Equation 22).
Since we designed our network to be resilient to noisy inputs, it is not obvious that the same architecture can also provide resilience to intrinsic noise. To address this question, we revisit first the simple case of two coupled modules in one dimension. In Appendix A.2 we show that the simple readout of velocity used to couple the modules (Equation 4), is a good approximation for the phase velocity driven by intrinsic neural noise, suggesting that the coupling introduced previously can help suppress uncoordinated drifts. To quantify the impact of coupling on coordination of the modules, we compute the diffusion tensor of their phases, using Equation 25 (the calculation is based on the theoretical framework laid out in Burak and Fiete, 2012; see specifically Eq. S24). In the uncoupled case, the diffusion tensor is isotropic as expected (Figure 6A, blue line). When the modules are coupled, with the same coupling strengths as in Figure 3, the diffusion of the two modules is highly anisotropic (Figure 6A). The first principal axis of the diffusion tensor (red ellipse in Figure 6A) closely matches the direction of coordinated motion (dashed line in Figure 6A). The diffusion coefficient ${D}_{}$, associated with motion in the orthogonal direction, is much smaller than the diffusion coefficient ${D}_{+}$, associated with coordinated motion: ${D}_{+}/{D}_{}\sim {X}_{+}/{X}_{}\sim 40$ (compare Figures Figure 3C and Figure 6A). Thus, the coupling strongly suppresses incompatible diffusion of the two modules.
Next, we evaluate the consequences for representation and readout, arising from the suppression of incompatible diffusion arising from intrinsic neural noise. We repeat the simulation of three coupled modules in two dimensions, this time with stochastic (Poisson) neurons. Discontinuities in the decoded trajectory occur both in the uncoupled and coupled networks, but they are much more rare in the coupled network (Figure 6B). Accordingly, the readout MSE is reduced dramatically in the coupled network (Figure 6C, note the logarithmic scale). Thus, the coupling is effective not only in stabilizing the neural representation in response to noisy inputs, but also with respect to internal stochasticity within the grid cell network.
In principle, one could seek coupling parameters such that diffusion would be suppressed in all directions. However, recall our first requirement from the section Coupling modules by synaptic connectivity, that the network must respond with sufficient gain to external inputs, to follow an animal’s trajectory. As we keep the joint response strong, we cannot reduce the joint diffusion simultaneously, and we are satisfied with coupling of the diffusive drift, without eliminating coordinated diffusion.
Discussion
Previous works (Fiete et al., 2008; Mathis et al., 2012b; Wei et al., 2015; Mosheiff et al., 2017) have shown that grid cell activity, viewed as a neural code for position, achieves a high dynamic range due to the splitting of the representation across multiple modules. In this work, we addressed a key difficulty with this idea: the combinatorial nature of the representation, arising from the existence of multiple modules, leads to high vulnerability to noise. Small uncoordinated errors in the phases of the different modules can shift the represented position to a far away location. As a possible solution to this difficulty, we proposed a simple architecture of synaptic connectivity between grid cell modules, that can suppress incompatible drifts. The functional coupling between modules, arising from our proposed synaptic connectivity involves velocities, but is completely insensitive to their phases. Consequently, the coupling does not limit the combinations of possible phases of the different modules, and thus does not affect the capacity of the code.
Similar principles may apply to storage in working memory and coding of other continuous, low dimensional variables in the brain. Thus, the main contribution of our work from the theoretical perspective, is that it identifies a way to couple several low dimensional continuous attractors of dimension $d$ (in the case of grid cells, $d=2$), to produce a persistent neural representation of a single, $d$ dimensional variable with high dynamic range. The dynamics of the network are characterized by two seemingly contradictory features: first, the steady states of the system span a space of dimension $md$, where $m$ is the number of modules. Second, during maintenance and continuous update of the stored memory, the joint state of the modules is dynamically restricted to lie in a much smaller, $d$dimensional subspace. This enables the continuous embedding of a $d$ dimensional variable in the larger, $md$ dimensional space, without allowing for noise to shift the state of the system outside the appropriate, $d$ dimensional local subspace.
In the simulations of the coupled and uncoupled grid cell networks (Figure 5; Figure 6), our main goal was to demonstrate that with a reasonable choice of parameters, catastrophic readout errors are highly detrimental, and that the coupling mechanism greatly reduces the rate at which they occur. The rate of catastrophic readout errors, quantified in Figure 5C for specific choices of parameters, depends also on the noise sources, and on the probability that a set of shifted phases might match an alternative position in the environment. The latter quantity is influenced by the size of the environment, the number of modules, and the specific grid spacings and orientations (Fiete et al., 2008; Welinder et al., 2008; Burak and Fiete, 2009; Sreenivasan and Fiete, 2011; Burak, 2014; Vágó and Ujfalussy, 2018) (see also Figure 5—figure supplement 2).
The number of grid cell modules in the entorhinal cortices of rats and mice is unknown. So far there is direct evidence for the existence of four modules, but the number may be larger (Stensola et al., 2012; Rowland et al., 2016). A theoretical attempt to compare grid cell systems with different number of modules in terms of the rate of readout discontinuities, would require additional assumptions on the range of positions that are represented in each system: increasing the number of modules reduces the rate of readout discontinuities within the range of a given environment, but it offers the possibility to unambiguously represent larger environments (Fiete et al., 2008; Mathis et al., 2012a; Wei et al., 2015; Vágó and Ujfalussy, 2018), for which the error rate is higher Figure 5—figure supplement 2E–J. Furtheremore, the gain in capacity of the grid cell code, obtained with addition of modules, may be harnessed by the entorhinalhippocampal system to generate unique representations of different environments (Fyhn et al., 2007). Thus, incoherent phase errors may lead to confusion between different spatial maps, in addition to the confusion between two positions in any given environment. Accordingly, the rate of catastrophic readout errors may be influenced by the number of spatial maps represented in the brain.
Our proposed mechanism for coupling modules is complementary to another possible mechanism, of coupling grid cell modules through the reciprocal synaptic connectivity between the entorhinal cortex and the hippocampus (Welinder et al., 2008; Sreenivasan and Fiete, 2011; Burak, 2014). Since biological systems often harness multiple mechanisms to achieve the same function, both mechanisms might act in parallel to stabilize the grid cell code against catastrophic readout errors. As discussed in the introduction, it is highly unlikely that coupling via the hippocampus could work in a novel environment, following global remapping. On the other hand, it is of particular importance for the brain to establish a geometric representation of position, aided by idiothetic path integration, under this scenario. Thus, the velocity coupling mechanism proposed in this work may play an especially important role in generating a cognitive map of a novel environment.
Inputs from cells within the MEC may play a role in stabilizing the grid cell representation, alongside inputs from the hippocampus or other areas. These may include inputs to grid cells from border cells (Solstad et al., 2008) or objectvector cells (Høydal et al., 2019). Experimentally, it has been demonstrated that phase resets occur in the grid cell representation upon encounters with enviornmental boundaries (Hardcastle et al., 2015; Keinath et al., 2018; Ocko et al., 2018), and it has been argued theoretically that such resets can be implemented in attractor models of grid cells by inputs to grid cells from border cells (Hardcastle et al., 2015; Keinath et al., 2018; Ocko et al., 2018; Pollock et al., 2018). The origin of spatial specificity of border cells and objectvector cells is not yet identified, but since both types of cells are active even when an animal is not facing the features associated with their activation, their role in stabilizing the grid cell representation may be similar to the hypothesized role of place cell inputs in stabilizing the grid cell code, perhaps more so than the role of direct sensory inputs.
A model that involves synaptic coupling between modules, of a different architecture than the one considered here, has been recently proposed in Kang and Balasubramanian (2019). This model does not explore the consequences of noise on coding stability, and its primary goal is to explain the ratios between grid spacings, and the emergence of modularity (see also Urdapilleta et al., 2017). Hence, Kang and Balasubramanian (2019) address different questions from those studied in the present work. Nevertheless, it is plausible that the synaptic connectivity proposed by Kang and Balasubramanian stabilizes the dynamics against incompatible motion of the modules. An important difference between the network architecture explored in Kang and Balasubramanian (2019) and the architecture explored here, is that we consider connectivity between modules which is alltoall (every grid cell in one module projects to every grid cell in the other module), and is designed to be invariant to any static, relative shift in the module phases. Hence, all combinations of phases are steady states of the dynamics. In contrast, the synaptic connectivity considered in Kang and Balasubramanian (2019) is spatially local. Consequently, it tends to produce interlocked patterns of activity in adjacent modules, with shared spatial periodicity, and preferred relative spatial phases. These properties of the activity patterns are expected to limit the representational capacity of the code. Here, we addressed a different computational goal, of stabilizing a distributed representation of position over multiple modules, without compromising the dynamic range of the neural coding scheme.
Our focus in this work was on the suppression of relative motion across modules, but noise in the inputs, or in the intrinsic activity within the network, drives also coordinated motion. It is possible to suppress the latter type of random motion by increasing the negative feedback in the system (Figure 3A,C). In choosing our optimization goal for the coupling parameters, we did not attempt to suppress coordinated drift for two reasons. First, coordinated drift is much less detrimental from the coding perspective than relative drifts, as discussed in the introduction. Second, suppressing the coordinated motion comes with an inevitable cost: a reduction in the gain of the system’s velocity response. Nevertheless, it is interesting to consider also the suppression of coordinated drift. Next, we briefly discuss the possible implementation of this goal.
For simplicity, consider a single onedimensional module, structured as a ring attractor: in this situation, there is only coordinated motion. As in any continuous attractor network, stochasticity of neural activity within the network drives diffusive motion of the bump’s position. This diffusive motion gradually degrades the fidelity of the stored memory (Burak and Fiete, 2012). In the double ring architecture (Xie et al., 2002), much of the drift arises from fluctuations in the difference of activity between the two subpopulations that drive left and right motion. Using the negative selfcoupling of velocities, introduced in this work, it is possible to suppress these fluctuations to substantially reduce the noisedriven diffusion and stabilize the represented memory. It is interesting to compare this mechanism with another proposal (Lim and Goldman, 2014) for stabilization of the memory stored in a single ring attractor, using negative derivative feedback (Lim and Goldman, 2013). In (Lim and Goldman, 2014) the stabilization slows down the dynamics of all neurons in the network, thereby slowing down the relaxation of any deformation in the shape of the activity bump – not only the position of the activity bump on the ring attractor. In contrast, within the architecture considered here, the unimodal shape of the activity bump is maintained, while the velocity feedback mechanism slows down only noise driven diffusion of its position. Thus, the velocity coupling mechanism identified in this work may be relevant to the stabilization of shortterm memory in head directions cells of rodents (Taube, 2007) and insects (Seelig and Jayaraman, 2015), where there is no evidence for slowly decaying deformations in the shape of the activity bump.
Experimental predictions
The grid spacing of a single module is determined by the coefficient that tunes how strongly activity is modulated by velocity (${\gamma}_{\mu}$ in Equation 21): larger values of ${\gamma}_{\mu}$ lead to smaller grid spacing. Thus, in an uncoupled network the grid spacing ratios are determined by the coefficients ${\gamma}_{\mu}$. However, in the network of coupled modules the spacing ratios are determined primarily by the intermodule coupling parameters. Each one of the coefficients ${\gamma}_{\mu}$ influences all the grid spacings, but has little effect on the spacing ratios. For example, even if the ${\gamma}_{\mu}$s are identical in all modules, or if only one module receives a velocity input, all modules shift their states with a velocity ratio that matches the desired grid spacing ratio. An interesting prediction arises under a scenario in which one of the modules is disconnected from the others. This removes positive couplings from the other modules, but leaves the negative self coupling within the module intact. Hence, the disconnected module is expected to weaken its response to velocity inputs, and increase its grid spacing. Similarly, other grid spacings, of modules that were originally connected to the disconnected module, are expected to increase as well (see Figure 4—figure supplement 1).
The joint activity of grid cells can be expected to lie within a twodimensional space when salient sensory cues are available to the animal, regardless of the existence of an intermodule coupling mechanism. The existence of a coupling mechanism must therefore be tested under conditions in which external sensory cues are weak (Burak, 2014). It is instructive to compare this goal with what has been learned about population activity within a single module (Yoon et al., 2013; Fyhn et al., 2007; Allen et al., 2014; Trettel et al., 2019; Gardner et al., 2019). In that context, simultaneous recordings from pairs of grid cells were highly informative, since grid cells from a single module exhibit strong correlations (or anticorrelations) in their joint spiking activity. The preservation of these correlations, under conditions in which the animal’s sense of position is disrupted, supports an interpretation that the correlations are maintained by recurrent connectivity within each module. In contrast, cells belonging to different modules are expected to fire together in some positions in space, and refrain from firing together in other parts of the environment. Averaged over motion in a large environment, cell pairs from different modules are expected to exhibit weak correlations in their activity, even if the updates of module phases are fully coordinated.
Analysis of spike correlations in grid cells from different modules (Trettel et al., 2019; Gardner et al., 2019) confirms this expectation. During free running, spike correlation functions of grid cells from different modules are much weaker than those observed within a module. Despite being weak, these correlations can be statistically significant. Their existence originates from the fact that in any specific environment, and especially in small enclosures, the firing fields of two grid cells with different spatial scales slightly favor correlated or uncorrelated firing, depending on the precise overlap between the spatial receptive fields of the two cells. During sleep, these weak correlations are significantly reduced (Trettel et al., 2019; Gardner et al., 2019). A possible interpretation of this result is that there is no coupling between modules during sleep. However, an alternative explanation is that the reduction in correlations reflects an increase in the repertoire of positions and environments represented in the sleep state: regions of joint firing and regions of disjoint firing are expected to average out more evenly under such circumstances, leading to weaker correlations.
In order to test for the existence of a velocity coupling mechanism, it is desirable to test for correlations in the updates of phases of different modules, instead of directly testing for correlations in their phases. This will require simultaneous recordings from multiple grid cells, of sufficient numbers that will enable reliable tracking of the module phases. Appropriate recordings are not yet available, but techniques that enable simultaneous monitoring of large neural populations of the MEC (Jun et al., 2017; Gu et al., 2018; Obenhaus et al., 2018) are likely to enable their acquisition in the coming years. In similarity to the experiments that provided insights on the low dimensionality of activity within each module, it will be necessary to test for intermodule coupling under conditions in which the animal’s internal sense of position is not anchored to salient external cues.
Our model assumes that grid cells in the MEC are involved in idiothetic path integration, and harnesses ingredients from models of path integration in the grid cell system to generate the coupling between modules. It is widely hypothesized that grid cells are indeed involved in path integration (Hafting et al., 2005; McNaughton et al., 2006; Moser et al., 2008; Burak, 2014), but this involvement is not experimentally established (see, however, Gil et al., 2018). Accordingly, a specific role of any particular cell type within the MEC in idiothetic path integration is not yet identified. A specific population of cells that may provide the substrate for the connectivity proposed in our model are the conjunctive cells observed mostly in layer III and deeper layers of the MEC (Sargolini et al., 2006), which play a pivotal role in models of path integration in the grid cell system. We note that these cells are tuned to head direction more closely than heading (Raudies et al., 2015), a difficulty that faces all models of path integration within the MEC. The resolution of this difficulty may involve computational elements within the entorhinal circuitry that have not yet been identified. Thus, future experimental findings concerned with the mechanisms underlying path integration may call for (and enable) corresponding refinements of our model.
Very little is known about synaptic connectivity between grid cells in the MEC, especially for cells belonging functionally to different modules. An important conclusion of our work is that synaptic connectivity between different modules may be beneficial for dynamically stabilizing the grid cell representation during path integration and memory maintenance. The specific form of connectivity that we identify is appealing for several reasons: first, it involves broad, relatively unstructured connectivity between grid cells, that depends only on their preferred heading preference. A second appealing feature of our proposed architecture is that it is sufficient to couple grid cells from modules with adjacent spacings, to achieve the desired stabilization of the grid cell representation. Since there is a relationship between grid spacing and position along the dorsalventral axis (Hafting et al., 2005; Stensola et al., 2012), alltoall couplings between modules would require longrange connectivity within the MEC. Recent evidence (Fu et al., 2018) hints that synaptic connections between excitatory cells in the MEC may be more limited in range, but of sufficient spatial extent to allow for coupling of adjacent modules.
Materials and methods
Model and simulations details
The dynamics of the network are described by Equation 3 (or by Equation 22 for the Poisson spiking neuron case). The synaptic time constant $\tau =10\phantom{\rule{thinmathspace}{0ex}}\mathrm{m}\mathrm{s}$, the transfer function $\varphi (x)={\tau}^{1}\mathrm{max}(x,0)$, and ${I}_{0}=3$. All simulations were done using the Euler method for integration, with a time step $dt=0.1\phantom{\rule{thinmathspace}{0ex}}\mathrm{m}\mathrm{s}$.
Onedimensional module
Request a detailed protocolIn the onedimensional simulations the synaptic activation vector (Equation 2) includes synapses of the right and left subpopulations, each comprising $N=1000$ neurons. Each neuron has a preferred phase ${\theta}_{i}\in [0,1]$, uniformly arranged on a ring. The connectivity matrix $W$ is defined by
where
${W}^{\pm}$ is a $N\times N$ matrix, $\phi =0.2$, $A=200$, ${\sigma}^{2}=0.1$, and ${{x}_{1}{x}_{2}}_{\mathrm{P}}$ is the minimal distance between two points $\{{x}_{1},{x}_{2}\}\in [0,1]$ with periodic boundary conditions on $[0,1]$, namely
Coupled modules
Request a detailed protocolConsider $m$ coupled modules. The firing rate (Equation 3) of neuron $i$ from module $\mu $ is (Figure 2E):
where ${\omega}_{\mu}$ is the velocity estimation of module μ (Equation 4) , ${C}_{\mu \rho}$ is the coupling strength from module $\rho $ to module μ (${C}_{\mu \mu}$ is the self coupling strength of module μ), and the sign ± is equal to + (−) if the neuron $i$ belongs to a right (left) subpopulation. The proportionality factor $a={\left[\beta {\sum}_{i=1}^{2N}{\varphi}^{\prime}\left({\sum}_{j=1}^{2N}{W}_{ij}{\overline{s}}_{j}+{I}_{0}\right)\right]}^{1}$ is included to simplify the units of the coupling strengths ${C}_{\mu \rho}$. Note that $a\cdot {\omega}_{\mu}$ does not depend on the parameter β. Thus, the choice of β is of no consequence for the dynamics, and this parameter is included only for the sake of derivation convenience.
The coupling between the modules can be interpreted as arising from synaptic connectivity. This can be seen by rewriting Equation 11 as:
where the $2N\times 2N$ coupling connectivity matrix is:
Thus, the synapses responsible for the coupling of any two neurons belonging to modules $\mu $ and $\rho $ are of the same magnitude, $a\beta {C}_{\mu \rho}/\tau $ , and their signs depend only on the subpopulations (left or right) of the pre and postsynaptic neurons. In the case of $m=2$, we use the coupling parameters: $Cs=20$, ${C}_{12}={C}_{1}\approx 14.14$ and ${C}_{21}={C}_{2}\approx 28.3$. In the case of $m=3$, we use the parameters: $Cs=20$, ${C}_{12}\approx 14.14$, ${C}_{21}={C}_{23}\approx 9.4$, ${C}_{32}\approx 28.3$, and ${C}_{13}={C}_{31}=0$. Additional details on the choice of coupling parameters are provided in Appendices 2 and 3.
Twodimensional modules
Request a detailed protocolIn two dimensions, each module contains four subpopulations, and the synaptic activation vector is:
Each subpopulation contains ${N}^{2}={64}^{2}$ neurons, arranged on a parallelogram. The preferred phase of the $i$’th neuron in each subpopulation is:
where ${x}_{i}$ and ${y}_{i}$ are distributed uniformly in the interval $[0,1]$. The connectivity matrix is now:
where
and
The distance measure $\cdot {}_{{\mathrm{P}}_{2}}$ is defined using periodic boundary conditions on the parallelogram (see Figure 4A): ${{\overrightarrow{x}}_{1}{\overrightarrow{x}}_{2}}_{{P}_{2}}$ is the minimal distance between the two points ${\overrightarrow{x}}_{1}$ and ${\overrightarrow{x}}_{2}$ on the torus that is created by gluing the opposite edges of the parallelogram defined by the vertices $(0,0),(1,0),(\frac{1}{2},\frac{\sqrt{3}}{2}),(\frac{3}{2},\frac{\sqrt{3}}{2})$ (Figure 4A, compare with Equation 10, used in the onedimensional case).
The firing rate of neuron $i\in$ {subpopulation R or L} from module $\mu $ is:
where
Firing rates of neurons from the up and down subpopulations are obtained from Equation 19Equation 20 by replacing $x\to y$, , $R\to U$, and $L\to D$.
In each module, responses to horizontal and vertical velocity inputs are independent: the right and left subpopulations respond to the horizontal velocity inputs, and affect ${\theta}_{x}$, while the up and down subpopulations respond to the vertical velocity inputs and affect ${\theta}_{y}$. Hence, the twodimensional response tensor separates into independent, horizontal and vertical components with the same structure as in the onedimensional case. Since the linear response tensor (in each direction) is identical to that of the onedimensional case, the coupling parameters are chosen in the same way in one and two dimensions.
External velocity input
Request a detailed protocolThe external velocity input to module $\mu $ (in two dimensions) in $q\in \{x,y\}$ direction is (Equation 19):
${\gamma}_{\mu}$ is a proportionality factor that depends on the module, ${V}_{q}(t)$ is a the component of the animal’s velocity in the $q$ direction, and ${\eta}_{\mu ,q}(t)$ is a white noise process with $\u27e8{\eta}_{\mu ,q}(t){\eta}_{\rho ,{q}^{\prime}}({t}^{\prime})\u27e9={\eta}^{2}{\delta}_{q{q}^{\prime}}{\delta}_{\mu \rho}\delta (t{t}^{\prime})$. In Figure 4B and Figure 6B–C the external input is not noisy, so $\eta =0$. In Figure 4D–E and Figure 5, $\eta =0.02\mathrm{m}\cdot {\mathrm{s}}^{0.5}$.
In Figure 3E–F (one dimension) ${\gamma}_{1}=0.06$ and ${\gamma}_{2}=0$. In the simulations of Figure 4; Figure 6 (two dimensions) ${\gamma}_{1}=0.06$ , ${\gamma}_{2}=\lambda 0.06$ , ${\gamma}_{3}={\lambda}^{2}0.06$. Thus, even without coupling of the modules, the spacing ratio $\lambda $ is achieved by the ratios of the inputs strengths ${\gamma}_{\mu}$, and therefore, we can compare between the coupled and uncoupled phases and readout (Figure 4; Figure 6).
Spiking network
Request a detailed protocolIn the case of spiking Poisson neurons Equation 3 is replaced by:
where
is the spike train of neuron $i$, and ${t}_{i}^{\chi}$ are the spike times. Each neuron $i$ generates spikes sampled from a Poisson distribution with a firing rate ${r}_{i}(t)$, as defined in Equation 3 (Stevens and Zador, 1996; Gerstner and Kistler, 2002; Burak and Fiete, 2012).
Decoding
Request a detailed protocolDecoding of the animal’s trajectory, based on spike trains, is performed in Figure 5; Figure 6 using a decoder that sums spikes from recent history with an exponential temporal kernel (Mosheiff et al., 2017). In Figure 5, we simulate a spike train for each neuron (Equation 23), sampled from an inhomogeneous Poisson process with a firing rate ${r}_{i}(t)$ (note that the the network dynamics are deterministic and spikes are used only in the readout process). In Figure 6, the stochastic spike train is part of of the dynamics (Equation 22). In both cases, the decoded location of the animal in time $t$ is (Eqs. S10, S11 and S13 in Mosheiff et al., 2017):
The summation is over all neurons. Here, $\tau}_{d}=10\phantom{\rule{thinmathspace}{0ex}}\mathrm{m}\mathrm{s$ for all modules. The integral in Equation 24 yields an effective spike count of neuron $i$, weighted in time using a decaying exponential kernel, and ${\overline{r}}_{i}(\overrightarrow{x})$ is the receptive field of neuron $i$ at location $\overrightarrow{x}$, measured separately from the firing rate of each neuron in the steady state of the dynamics.
Diffusion tensor
Request a detailed protocolConsider a system of spiking Poisson neurons, and $m$ onedimensional modules. The internal noise introduces a diffusive drift. The network is now a single continuous attractor with dimension $m$ (see Appendix 5). Hence, the diffusion tensor is a $m\times m$ matrix, that can be calculated using Eq. S24 in Burak and Fiete (2012), for the dynamics of the $m$ dimensional attractor (Equation 77, Equation 78 and Equation 79 in Appendix 5):
where the summation is over all neurons, ${\overline{r}}_{i}(\overrightarrow{\theta})$ is the firing rate of neuron $i$ in the steady state of the system, and ${\nu}_{\mu}(\overrightarrow{\theta})$ is the left null eigenvector of the dynamics (Equation 78) corresponding to a phase shift in the direction of module $\mu $ (calculated numerically).
Appendix 1
Readout of the phase velocity
In this Appendix, we derive the relationship between $\omega $ (Equation 4) and the phase velocity $\dot{\theta}$ (Equation 1) . To simplify the presentation, we consider separately two situations: first, a noise free network, in which the phase velocity is driven by velocity inputs. Second, a module consisting of Poisson spiking neurons where, for simplicity, we set the velocity inputs to be zero. In the latter case the phase velocity is driven by the stochasticity of the neurons participating in the network dynamics. From the derivation it is easily seen that in the general case, and as long as the linearization of the dynamics (forming the basis of the calculation in both cases) is valid, the phase velocity, as well as $\omega $ can be expressed as a sum of independent contributions arising from the two sources considered below.
Phase velocity due to velocity inputs
Consider a single one dimensional module, whose dynamics follow Equation 3, and whose synaptic connectivity is described by the double ring model, as specified by Equations 810. We expand the dynamic equation around a steady state $\overline{s}(\theta )$, so that ${s}_{i}={\overline{s}}_{i}(\theta )+\delta {s}_{i}$, and assume small velocity input $dI(t)$ (see similar expansion in Burak and Fiete, 2012). After linearization:
where
and ${\overline{g}}_{i}(\theta )={\sum}_{j=1}^{2N}{W}_{ij}{\overline{s}}_{j}+{I}_{0}$ is the synaptic input in the steady state ${\overline{s}}_{i}(\theta )$. The velocity of the phase $\theta $ is obtained by projection of Equation 26 on the left eigenvector of $K$ with zero eigenvalue, $\overrightarrow{v}(\theta )$ (Burak and Fiete, 2012). After some algebra we obtain
where
Hence, $\dot{\theta}$ is proportional to the velocity input $dI(t)$. Note that both $\overrightarrow{v}$ and $\overline{g}$ rotate together with the position $\theta $ of the activity bump. Thus, $\alpha $ is independent of $\theta $.
Let us define the $2N$ dimensional vector:
where $\beta $ is a constant, whose value is determined below. The projection of Equation 26 on ${\overrightarrow{v}}_{0}$ results in
The second term on the right hand side of Equation 31 vanishes due to the symmetry of ${\varphi}^{\prime}({\overline{g}}_{i})$ to exchange of the right and left subpopulations, the antisymmetry of ${v}_{0}$ to this exchange, and the structure of the matrix $W$. Using the definition of $\omega $ (Equation 4), we can write:
Thus, Equation 31 becomes:
where
In Equation 34 we use again the symmetry of ${\varphi}^{\prime}({\overline{g}}_{i})$ to exchange of the right and left subpopulations. We set $\beta $ (Equation 30) such that $\stackrel{~}{\alpha}=\alpha $. Using Equation 29 and Equation 34:
From Equation 33 we see that the readout velocity $\omega $ is a convolution of $\alpha dI(t)$, with the filter
Combining Equation 28 and Equation 33 we conclude that
Hence, $\omega $ is equal to the phase velocity, smoothed over a relatively short time scale set by the synaptic time constant ($\tau =10\phantom{\rule{thinmathspace}{0ex}}\mathrm{m}\mathrm{s}$).
Phase velocity due to internal noise
Consider next a single one dimensional module with Poisson spiking neurons, following the dynamics of Equation 22. In the limit of large firing rates, Equation 22 can be replaced by
where ${\xi}_{i}$ is a white noise process with $\u27e8{\xi}_{i}(t){\xi}_{j}({t}^{\prime})\u27e9={r}_{i}(t){\delta}_{ij}\delta (t{t}^{\prime})$ (Burak and Fiete, 2012). We use a similar expansion as in section Phase velocity due to velocity inputs in this Appendix , and project the linearized dynamics on $\overrightarrow{v}(\theta )$, the left null eigenvector of $K$, to obtain:
The first term on the right hand side of Equation 39 is the contribution to the phase velocity arising from the velocity inputs, discussed above. From here on we assume that the velocity input vanishes. The only contribution to the phase velocity is then the second term of Equation 39, originating from the neural stochasticity. In order to relate this term to $\omega $, let us define:
where ${\overrightarrow{v}}_{R}$ and ${\overrightarrow{v}}_{L}$ are the first or last $N$ coordinates of $\overrightarrow{v}(\theta )$, respectively (Appendix 1—figure 1). Equation 39 becomes:
Next, we examine the structure of ${\overrightarrow{v}}_{+}$ and ${\overrightarrow{v}}_{}$. The left and right components of ${v}_{+}$ are spatially antisymmetric with respect to the peak of the activity bump, and precisely vanish at the center of the bump (yellow trace in Appendix 1—figure 1B). The components of ${v}_{}$ are symmetric (blue and red traces). We note that the Poisson noise is expressed only within the relatively narrow extent of the activity bump (dashed line). Two consequences follow: first, ${\overrightarrow{v}}_{+}^{T}\cdot \overrightarrow{\xi}$ is very small due to the nearly vanishing values of ${v}_{+}$ on the activity bump. Second, the components of ${v}_{}$ are nearly constant in magnitude along the narrow extent of the bump, but have opposite signs in the left and right subpopulations. This magnitude can be evaluated by rewriting Equation 35 as:
As the magnitude of ${v}_{}$ is nearly constant along the bump, we conclude from Equation 43 that this magnitude equals approximately $\beta $, and ${\overrightarrow{v}}_{}^{T}\cdot \overrightarrow{\xi}\approx {\overrightarrow{v}}_{0}^{T}\cdot \overrightarrow{\xi}$. Hence, Equation 42 becomes:
We now project the linearized dynamics of Equation 38 on the vector ${\overrightarrow{v}}_{0}$ to obtain:
Thus, $\omega $ is approximately a convolution of the velocity phase with the exponential filer $f$, with the synaptic time integration $\tau $. Therefore:
as in the case of phase velocity due to external inputs.
Appendix 2
Linear response tensor
We next wish to understand how our system of coupled modules responds to an external velocity input $\overrightarrow{b}(t)$. First, consider two modules, each in one dimension, that follow the dynamics of Equation 3. The synaptic activation vector is now $4N$ dimensional and has the form:
The firing rate of neuron $i$ from module 1 is (Figure 2E):
where the total velocity input is $dI={b}_{1}+a{C}_{1}{\omega}_{2}+a{C}_{s}{\omega}_{1}$. A similar equation for the rate ${r}_{2,i}$ is obtained by switching the indices $1\leftrightarrow 2$ in Equation 48. The constant
is a proportionality factor that could in principle be absorbed into the definition of the coupling strength parameters, but makes the units of the coupling strengths ${C}_{\mu}$ more convenient. Substitution of $dI$ and Equation 37 in Equation 28 yields:
More generally, for $m$ modules:
Thus, the phase velocities are proportional to the external velocity input, plus the velocities of all other modules filtered in time, and coupled by the matrix C. If we assume that the velocity is sufficiently small, that the position can be regarded as fixed within the synaptic time scale, the convolution with the filter $f$ can be omitted (note that the integral of $f$, Equation 36 , is equal to unity), and we obtain:
where $X$ is the linear response tensor.
B.1 Two modules
In the case of two modules and identical self coupling strength ${C}_{s}$ for both modules, we can easily find the eigenvalues ${X}_{\pm}$ and eigenvectors of $X$. The coupling matrix in this case is:
Using Equation 7 we obtain:
The eigenvectors of $X$ are:
Thus, the phase motion of the two modules can be represented as:
where
and
Here, ${\theta}_{\pm}$ are the joint or relative phases of the modules, respectively. We choose parameters such that ${X}_{+}$ is large and ${X}_{}$ is small in order to obtain large joint motion and small relative motion between modules for any external input. Note that ${X}_{}\approx 0$ implies that ${\dot{\theta}}_{}\approx 0$ (Equation 56), and in this case (Equation 57):
Thus, we can set the velocity ratio between modules (namely, the spacing ratio $\lambda $), by the ratio of the coupling parameters, using Equation 59. We choose
We choose ${X}_{+}=\alpha $ to maintain the same response to coordinated velocity inputs, as in the case of no coupling. Using Equation 54 we see that this requires ${C}_{s}+\sqrt{{C}_{1}{C}_{2}}=0$, which implies
and
To obtain small ${X}_{}$, ${C}_{s}$ should be negative, with a large absolute magnitude. Note that a large positive magnitude of ${C}_{s}$, although seems here as a suitable choice, would cause instability of the system (discussed in Appendix D).
Appendix 3
Optimization of the coupling strengths for m modules
We now wish to find appropriate coupling strengths, ${C}_{\mu \rho}$, in the general case of $m$ modules, while allowing for different self couplings in each module. We optimize ${C}_{\mu \rho}$ as follows.
Let $\overrightarrow{u}$ be a vector in the direction of the mutual motion of $\overrightarrow{\theta}$. The ratios of the components of $\overrightarrow{u}$ are the ratios between corresponding grid spacings. We demand that $\overrightarrow{u}$ is an eigenvector of $X$ (Equation 7) with an eigenvalue $\alpha $. Under this choice, the motion response in the mutual direction is the same as without coupling (where $\dot{\overrightarrow{\theta}}=\alpha \overrightarrow{b}$, Equation 52). Any other eigenvalue of $X$ should be close to zero, such that the motion in any direction, other than the mutual direction, is small.
Equivalent requirements are that $\overrightarrow{u}$ is an eigenvector of $C$ with eigenvalue , and any other eigenvalue of $C$ has a large magnitude (see Equation 52). In addition, as discussed below (Stability conditions) all the eigenvalues of $C$ must be smaller than unity, which excludes the possibility that they take large positive values. In order to find the optimal coupling parameters we apply the KarushKuhnTucker theorem with the Lagrangian:
The first term of the Lagrangian favors negative eigenvalues with large absolute magnitude, whereas the second term, involving $m$ Lagrange multipliers ${\u03f5}_{i}$, enforces the existence of the eigenvector $\overrightarrow{u}$ with eigenvalue . As the coupling magnitudes cannot be infinitely large, we constrain the self couplings by requiring that ${C}_{\mu \mu}^{2}<Q$ , where $Q>0$ is a constant. This constraint is enforced by the third term in Equation 63, where ${\zeta}_{\mu}\ge 0$’s are KarushKuhnTucker multipliers. We thus obtain the equations:
which results in:
For $m=2$ and $\overrightarrow{u}=\left(\begin{array}{c}\hfill 1\hfill \\ \hfill \lambda \hfill \end{array}\right)$ we obtain a single solution:
which is identical to the solution we found in Appendix B.1.
For $m>2$ and a given vector $\overrightarrow{u}$ there are multiple solutions. The general solution follows the equations:
for every $\mu $. As there are ${m}^{2}$ parameters and $2m$ equations to follow, there are ${m}^{2}2m$ degrees of freedom in the solution.
For example, for $m=3$ and ${u}_{i}={\lambda}^{i1}$, the solutions are the all ${C}_{\mu \rho}$’s that obey:
Making the additional choice that only successive modules are coupled yields the solution:
where there is a freedom in choosing ${C}_{23}$. In Figure 4, Figure 5 and Figure 6 we chose ${C}_{23}={C}_{21}$.
For general number of modules $m$, coupling only successive modules leads to a degeneracy of solutions with $m2$ degrees of freedom, since the matrix ${C}_{\mu \rho}$ includes $3m2$ nonvanishing entries and we solve $2m$ equations. The remaining degeneracy can be resolved by requiring identical coupling strengths to each module ($1<\mu <m$) from its neighbors: ${C}_{\mu ,\mu +1}={C}_{\mu ,\mu 1}$. The solution is then:
where ${C}_{0}=\lambda {C}_{s}/(1+{\lambda}^{2})$.
In principle, other constraints on the coupling strengths can be used instead of the requirement that ${C}_{\mu \mu}^{2}<Q$ for all $\mu $. For example, limiting the magnitude of all the coupling strengths (${C}_{\mu \rho}^{2}<Q$), instead of only the magnitude of the self couplings, yields solutions with smaller range of allowed parameters (in the case of multiple solutions), but otherwise identical structure. A different possible constraint is of the form ${\sum}_{\mu ,\rho =1}^{m}{C}_{\mu \rho}^{2}<Q$. This constraint results in a symmetric solution: ${C}_{\mu \rho}\propto \frac{{u}_{\mu}{u}_{\rho}}{{\sum}_{k}{u}_{k}^{2}}{\delta}_{\mu \rho}$. We prefer the constraint in Equation 63 as we do not see a compelling reason to limit all of the coupling parameters together instead of limiting them separately.
Appendix 4
Stability conditions
To map the stability conditions of the coupled network, we consider the dynamics of the velocity estimation $\overrightarrow{\omega}$ (Equation 33). As Equation 33 is obtained by projecting the full dynamics of the network on the vector ${\overrightarrow{v}}_{0}$, stable network would result in finite $\overrightarrow{\omega}$. We substitute the total velocity input $d\phantom{\rule{negativethinmathspace}{0ex}}\overrightarrow{I}=aC\cdot \overrightarrow{\omega}+\overrightarrow{b}(t)$ in Equation 33 and obtain:
Thus, in order to maintain the stability of $\overrightarrow{\omega}$, as well as the stability of the full network, all eigenvalues of $C$ must be smaller than unity, and all eigenvalues of $X$ (Equation 7) must be positive.
Appendix 5
The coupled network as a single attractor
In this Appendix we consider the system as a single neural network with recurrent connectivity that includes both the inter and intramodular synaptic connections, instead of thinking of the system as composed of $m$ coupled attractors (modules). The derivation of the system dynamics presented here, is necessary in order to calculate the diffusion tensor (Equation 25). In addition, it offers an alternative approach for obtaining the linear response tensor $X$, and provides some additional insights on the behaviour of the network.
We expand the dynamics of Equation 3 around the steady state $\overline{s}(\overrightarrow{\theta})$ (see similar expansion in Appendix A). Here $\overrightarrow{\theta}$ is the vector of represented phases in all modules, where ${\theta}_{\mu}$ is the phase of module $\mu $. The dynamics of neuron $i$ from module $\mu $ (for simplicity, we assume one dimensional modules) can be written as:
where the sign ± is + () if neuron $i$ belongs to a right (left) subpopulation. Here, ${K}_{m}$ is a $2Nm\times 2Nm$ matrix:
where $K(\theta )$ is defined in Equation 27, and
As the system is a continuous attractor of dimension $m$, there exist $m$ left eigenvectors of ${K}_{m}$ with zero eigenvalue, ${\overrightarrow{\nu}}_{\mu}(\overrightarrow{\theta})$, which we compute numerically. The diffusion tensor is computed in Equation 25, using these vectors ${\overrightarrow{\nu}}_{\mu}(\overrightarrow{\theta})$.
By projecting Equation 77 on ${\overrightarrow{\nu}}_{\mu}(\overrightarrow{\theta})$ we obtain the linear response tensor $X$, such that Equation 6 is valid. Explicitly:
where,
and $i\in \rho $ means that the $i$th neuron belongs to module $\rho $. Using this approach numerically, yields nearly identical results as Equation 52.
References
 1
 2

3
Experiencedependent rescaling of entorhinal gridsNature Neuroscience 10:682–684.https://doi.org/10.1038/nn1905
 4
 5

6
Spatial coding and attractor dynamics of grid cells in the entorhinal cortexCurrent Opinion in Neurobiology 25:169–175.https://doi.org/10.1016/j.conb.2014.01.013

7
Accurate path integration in continuous attractor network models of grid cellsPLOS Computational Biology 5:e1000291.https://doi.org/10.1371/journal.pcbi.1000291
 8

9
A singlecell spiking model for the origin of gridcell patternsPLOS Computational Biology 13:e1005782.https://doi.org/10.1371/journal.pcbi.1005782
 10

11
Time cells in the Hippocampus: a new dimension for mapping memoriesNature Reviews Neuroscience 15:732–744.https://doi.org/10.1038/nrn3827

12
What grid cells convey about rat locationJournal of Neuroscience 28:6858–6871.https://doi.org/10.1523/JNEUROSCI.568407.2008

13
Program No. 510.07. 2018 Neuroscience Meeting PlannerLayer specific characterization of local projections within the medial entorhinal cortex, Program No. 510.07. 2018 Neuroscience Meeting Planner, San Diego, Society for Neuroscience.

14
A spin glass model of path integration in rat medial entorhinal cortexJournal of Neuroscience 26:4266–4276.https://doi.org/10.1523/JNEUROSCI.435305.2006
 15

16
Correlation structure of grid cells is preserved during sleepNature Neuroscience 22:598–608.https://doi.org/10.1038/s4159301903600

17
Spiking Neuron Models: Single Neurons, Populations, PlasticityCambridge university press.

18
Impaired path integration in mice with disrupted grid cell firingNature Neuroscience 21:81–91.https://doi.org/10.1038/s4159301700393
 19
 20

21
A model of grid cells based on a twisted torus topologyInternational Journal of Neural Systems 17:231–240.https://doi.org/10.1142/S0129065707001093
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31

32
Balanced cortical microcircuitry for maintaining information in working memoryNature Neuroscience 16:1306–1314.https://doi.org/10.1038/nn.3492

33
Balanced cortical microcircuitry for spatial working memory based on corrective feedback controlJournal of Neuroscience 34:6790–6806.https://doi.org/10.1523/JNEUROSCI.460213.2014
 34

35
Optimal population codes for space: grid cells outperform place cellsNeural Computation 24:2280–2317.https://doi.org/10.1162/NECO_a_00319

36
Resolution of nested neuronal representations can be exponential in the number of neuronsPhysical Review Letters 109:018103.https://doi.org/10.1103/PhysRevLett.109.018103

37
Path integration and the neural basis of the 'cognitive map'Nature Reviews Neuroscience 7:663–678.https://doi.org/10.1038/nrn1932

38
Place cells, grid cells, and the brain's spatial representation systemAnnual Review of Neuroscience 31:69–89.https://doi.org/10.1146/annurev.neuro.31.061307.090723
 39

40
The effects of changes in the environment on the spatial firing of hippocampal complexspike cellsThe Journal of Neuroscience 7:1951–1968.https://doi.org/10.1523/JNEUROSCI.070701951.1987

41
Program No. 689.06. 2018 Neuroscience Meeting PlannerMiniaturized twophoton microscopy enables the study of functional network topography in the medial entorhinal cortex, Program No. 689.06. 2018 Neuroscience Meeting Planner, San Diego, Society for Neuroscience.

42
Emergent elasticity in the neural code for spacePNAS 115:E11798–E11806.https://doi.org/10.1073/pnas.1805959115
 43
 44
 45
 46

47
Ten years of grid cellsAnnual Review of Neuroscience 39:19–40.https://doi.org/10.1146/annurevneuro070815013824
 48
 49

50
Noise, neural codes and cortical organizationCurrent Opinion in Neurobiology 4:569–579.https://doi.org/10.1016/09594388(94)900590
 51
 52

53
Grid cells generate an analog errorcorrecting code for singularly precise neural computationNature Neuroscience 14:1330–1337.https://doi.org/10.1038/nn.2901
 54

55
When is an integrateandfire neuron like a poisson neuron?In: D. S Touretzky, M. C Mozer, M. E Hasselmo, editors. Advances in Neural Information Processing Systems, 8. MIT Press. pp. 103–109.
 56

57
The head direction signal: origins and sensorymotor integrationAnnual Review of Neuroscience 30:181–207.https://doi.org/10.1146/annurev.neuro.29.051605.112854
 58
 59

60
Selforganization of modular activity of grid cellsHippocampus 27:1204–1213.https://doi.org/10.1002/hipo.22765

61
Robust and efficient coding with grid cellsPLOS Computational Biology 14:e1005922–e1005928.https://doi.org/10.1371/journal.pcbi.1005922
 62
 63
 64

65
Doublering network model of the headdirection systemPhysical Review E 66:041902.https://doi.org/10.1103/PhysRevE.66.041902

66
Specific evidence of lowdimensional continuous attractor dynamics in grid cellsNature Neuroscience 16:1077–1084.https://doi.org/10.1038/nn.3450

67
Representation of spatial orientation by the intrinsic dynamics of the headdirection cell ensemble: a theoryThe Journal of Neuroscience 16:2112–2126.https://doi.org/10.1523/JNEUROSCI.160602112.1996
Decision letter

Stephanie PalmerReviewing Editor; University of Chicago, United States

Laura L ColginSenior Editor; University of Texas at Austin, United States

Ann M HermundstadReviewer; Howard Hughes Medical Institute, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Velocity coupling of grid modules enables stable embedding of a low dimensional variable in a high dimensional attractor" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Laura Colgin as the Senior Editor. The following individual involved in review of your submission has agreed to reveal her identity: Ann M Hermundstad (Reviewer #2).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary:
This paper develops a model in which grid modules are formed by a standard attractor mechanism, and are also coupled together by recurrent interactions to coordinate stochastic drift (caused by noisy velocity inputs and/or spiking noise) between the modules. Independent drift in grid modules would severely impair the fidelity of their spatial representation (as is shown elegantly in this work), so that if grid cells represent space, it is critical to coordinate the drift in modules. The authors show that a simple mechanism – excitatory interaction between modules – can solve this problem, and suggest experimental tests of the theory. In particular, the theory makes interesting predictions arising from the inactivation of a subset of the modules (e.g. selfmotion integration in the remaining modules should be affected in a specific way).
Essential revisions:
1) The choice of readout mechanisms might affect the results – hence this key aspect should be better explored.
2) The system's performance and error should be better characterized – as these are the main observables used to draw conclusions about the effectiveness of the proposed model.
Specific points concerning these first two essential revisions (followed by more essential revisions below):
a) It would be good to provide further intuition on the structure of the C_ij that achieved "satisfactory" performance. In particular, even if one constrains the coupling to lie between neighboring modules, the structure of the C_ij was not entirely clear. Is it alltoall? Does this coupling vary in any interesting way with separation on the neural sheet? Can this coupling be constrained to be somewhat local on the neural sheet and still achieve good results?
b) The manuscript's main argument is that small differences in the phases of individual modules can lead to large errors in the estimate of position that is read out from these modules; by choosing the appropriate coupling between modules, one can reduce deviations in the phases between different modules, and in turn reduce the frequency of large readout errors. It appears that the existence and degree of these discontinuities is due at least in part to the choice of readout. As argued in the framing of the paper and shown in Figures 4D, 5B, these discontinuities are not present in the phases of individual modules, but rather arise in the readout stage. These discontinuities occur within a single timestep (at least as far as one can tell from Figure 5B), suggesting that a readout with a slightly longer timescale could potentially alleviate some of these errors. The form of the readout was derived by the same authors in a previous paper; in that paper, it was shown that the optimal readout should have different timescales for grid modules with different spacings, ranging from 1ms to 600ms. However, here a readout with a single timescale of 10ms was chosen for all modules. Based on this previous work, it's not clear why this choice was made, and what implications this has for the extent of readout errors that are shown in Figure 5B. To make the strongest argument possible, an optimal version of the readout should be used (with appropriatelychosen timescales for different modules) in order to construct the strongest null model in the absence of coupling. As it is now, it's unclear whether the observed readout errors arise from the construction of readout itself, rather than the lack of couplings. Given that this is the primary argument for necessitating couplings between modules, this should be addressed.
c) The paper would be stronger if the analyses provided a better characterization of performance and error. As it's written now, it's difficult to gain an intuition as to why catastrophic errors are occurring when they do, and how much gradual error is accumulated between these catastrophic errors. There are some small fixes that could help (see minor points). Beyond these, there are some larger points that could be made more clear. Why does the first catastrophic error occur when it does (i.e., what is the significance of the timescale marked by the dotted line in Figure 5CD)? This timescale might have been expected to relate to the distance that must be traveled before the three grid spacings are likely to overlap, given the scale of the noise; is this the case? If so, one would have expected a sharper drop off in the percentage of successes in Figure 5C. If not, is there an intuition as to why the errors occur when they do? Does gradual drift accumulate much more rapidly in the uncoupled model that in the coupled model? This itself is interesting and would be valuable to quantify.
d) Finally, how does performance scale with the number of modules? A general theoretical model was derived for m modules but results are shown for only 2 and 3 modules (and because these two cases were illustrated in one and two dimensions, respectively, it's not possible to compare directly between them). Larger numbers of modules could presumably lead to a couple of different scenarios that might impact the conclusions in different ways: i) more modules could lead to a reduction in readout errors (even in the absence of coupling), because the readout is averaging over inconsistencies among modules; this would suggest that some of observed error could be alleviated with additional modules, or ii) more modules could lead to an increase in readout errors via the additional intermodule recurrence (e.g. in cases where assumptions about linearity or small input velocity break down). It would be useful to report more on these type of analyses and show how they depend on the source/magnitude of noise. Can these findings be interpreted in terms of the actual numbers of modules in mEC (which would be helpful to report somewhere)?
3) Provide, in the main text, more details/intuition about the choice of inter and intramodule couplings, which is indeed a fundamental aspect of the work. (See also point 1 and 2a above.)
A key point of the theory is that modules should be coupled in a phaseindependent manner, and that the coupling should only depend on the rate of change of the phase. There are two things that could be shown, perhaps in Figure 1 and incorporated into Figure 2:
– show the detrimental effect of phasedependent coupling for position encoding
– recap the properties of selfmotion integration networks (e.g. Seung's model used throughout the paper), showing the velocity dependent tuning of the left and right network and how a simple set of synaptic weights between modules can extract this information.
4) Discuss the role of landmarks (e.g. boundaries) in path integration – this important aspect did not receive enough attention throughout the manuscript.
For example, the section on "Intrinsic neural noise" shows how the coupling between modules coordinates their drift. An alternate idea might have been to largely shut down the drift via interaction with border cells (or more generally with boundaries and landmarks). A comment on this alternative would be in order in the Results and in the Discussion since there has been a good bit of work on it lately. These kinds of error correction mechanisms that people have considered so far still work separately on each module, and so a coordination mechanism such as the one in the present paper will still be necessary because even a small amount of independent drift between modules can cause problems with the spatial representation.
5) An additional figure should be added at the end of the paper, illustrating the nontrivial prediction related to the inactivation of one or more grid cell modules. This would help make the work for broadly accessible, and would strengthen the discussion.
6) Discuss the possibility that other (published) mechanisms might be at work for the selforganized emergence of modular structure (which is otherwise assumed in the current manuscript).
In particular, a potential overlap is with recent work on the role of interactions with borders and landmarks in anchoring and reducing drift in the grid system (e.g. Keinath et al., eLife; Ocko et al., PNAS; Hardcastle et al., Neuron; Pollock et al., bioRxiv). The present paper is complementary in that it suggests a way of coordinating drift between modules, rather than reducing the overall amount of the drift. Indeed the border interaction mechanisms may not reduce drift enough on their own, and the coordination mechanism described in the present paper may be necessary also – both mechanisms could be in play. Some commentary would be worthwhile in the Discussion as an interesting direction for the future is to ask how border interactions and intermodule interactions could work together.
https://doi.org/10.7554/eLife.48494.019Author response
Essential revisions:
1) The choice of readout mechanisms might affect the results – hence this key aspect should be better explored.
2) The system's performance and error should be better characterized – as these are the main observables used to draw conclusions about the effectiveness of the proposed model.
Specific points concerning these first two essential revisions (followed by more essential revisions below):
a) It would be good to provide further intuition on the structure of the C_ij that achieved "satisfactory" performance. In particular, even if one constrains the coupling to lie between neighboring modules, the structure of the C_ij was not entirely clear. Is it alltoall? Does this coupling vary in any interesting way with separation on the neural sheet? Can this coupling be constrained to be somewhat local on the neural sheet and still achieve good results?
There are two different ways in which the term ‘alltoall connectivity’ can be used in relation to our work: one context is when considering how two modules are coupled: every neuron in one module is coupled to every neuron in the other module. A second context has to do with the coupling strengths between modules, quantified byCμρ. We focus first on. Cμρ
The optimization goal that we formulated (Equation 63 and surrounding text) involves only the diagonal elements of the matrix C, so it does not determine the nondiagonal elements. Consequently, the optimization yields m equations, one for each element on the diagonal. The nondiagonal elements, together with the diagonal elements, are constrained by another set of m equations, stating that u→ is a null eigenvector of C. Since the number of parameters in C is quadratic in m, whereas the number of equations is linear, the solution is highly degenerate for anym>2. This degeneracy means that our optimization procedure cannot determine the structure ofCμρ, but we can seek solutions with additional properties. General solutions involve alltoall connectivity between modules, but we have chosen to work with a solution in which modules are only coupled to their adjacent modules. This is motivated by the fact that there is evidence for topographic organization of modules on the cortical sheet, and for a limited range of excitatory connections within the MEC (see Discussion). Thus, a solution in which the connectivity is only between adjacent modules may be more compatible (compared to other solutions) with what is currently known about the anatomy. However, the data is not yet conclusive on these questions.
We now write in detail (Appendix 3) the solution for m modules, and explicitly discuss the degeneracy of the solution. We then explicitly provide the matrix C obtained under the additional restriction that only adjacent modules are coupled (Equation 75).
When considering the coupling between neurons in any two modules, the connectivity is alltoall in the sense that every neuron in one module is synaptically coupled to every neuron in the other module (assuming that the corresponding Cμρ is nonvanishing). This form of connectivity is central to our proposal, as it entails complete symmetry to shifts in positions of the activity bumps in the two modules. The symmetry ensures that the set of steady states will involve any combination of bump positions, thereby preserving the large capacity of the code. This is now clarified in subsection “Coupling modules by synaptic connectivity”.
b) The manuscript's main argument is that small differences in the phases of individual modules can lead to large errors in the estimate of position that is read out from these modules; by choosing the appropriate coupling between modules, one can reduce deviations in the phases between different modules, and in turn reduce the frequency of large readout errors. It appears that the existence and degree of these discontinuities is due at least in part to the choice of readout. As argued in the framing of the paper and shown in Figures 4D, 5B, these discontinuities are not present in the phases of individual modules, but rather arise in the readout stage. These discontinuities occur within a single timestep (at least as far as one can tell from Figure 5B), suggesting that a readout with a slightly longer timescale could potentially alleviate some of these errors. The form of the readout was derived by the same authors in a previous paper; in that paper, it was shown that the optimal readout should have different timescales for grid modules with different spacings, ranging from 1ms to 600ms. However, here a readout with a single timescale of 10ms was chosen for all modules. Based on this previous work, it's not clear why this choice was made, and what implications this has for the extent of readout errors that are shown in Figure 5B. To make the strongest argument possible, an optimal version of the readout should be used (with appropriatelychosen timescales for different modules) in order to construct the strongest null model in the absence of coupling. As it is now, it's unclear whether the observed readout errors arise from the construction of readout itself, rather than the lack of couplings. Given that this is the primary argument for necessitating couplings between modules, this should be addressed.
The decoder that we use in the present manuscript was developed in Mosheiff et al., 2017, and its MSE in each module was quantified as a function of the time scale of readout. The reason why we used a time scale of 10 ms is that with the parameters that we use in the present work (see below), the optimal time scale for readout is very short (smaller than 1ms in all three modules), whereas readout on a time scale of 10 ms seems more easy to implement biologically.
The comment above raises an interesting concern, that the time scale of readout might influence the rate at which catastrophic readout errors occur, and thereby significantly influence the MSE of readout. To address this question, we repeated the analysis with different readout time scales, both shorter and longer than 10ms (1ms, 10ms, and 100ms). The results are shown in a new supporting figure (Figure 5—figure supplement 1), and discussed briefly in paragraph three of subsection “Consequences for spatial representation and readout”. We find that the time scale of readout has very little influence on occurrence of catastrophic readout errors (Figure 5—figure supplement 1A), or on the qualitative behavior of the MSE curves (Figure 5—figure supplement 1B). Therefore, the choice of the readout time scale is not essential in the context of the present work.
The main goal of Mosheiff et al., 2017 was to propose a normative explanation for a trend seen in the experimental data (Stensola et al., 2012), according to which modules with larger spacing appear to include a smaller number of neurons. However, it is not yet clear whether this trend is representative of the true distribution of cells in the MEC, or whether it was a result of an experimental bias. We think that in the present work, that addresses different questions, it makes more sense to make the simple choice of identical number of neurons in each module, which is why the optimal time scales for readout are small in all modules.
c) The paper would be stronger if the analyses provided a better characterization of performance and error. As it's written now, it's difficult to gain an intuition as to why catastrophic errors are occurring when they do, and how much gradual error is accumulated between these catastrophic errors. There are some small fixes that could help (see minor points). Beyond these, there are some larger points that could be made more clear. Why does the first catastrophic error occur when it does (i.e., what is the significance of the timescale marked by the dotted line in Figure 5CD)? This timescale might have been expected to relate to the distance that must be traveled before the three grid spacings are likely to overlap, given the scale of the noise; is this the case? If so, one would have expected a sharper drop off in the percentage of successes in Figure 5C. If not, is there an intuition as to why the errors occur when they do? Does gradual drift accumulate much more rapidly in the uncoupled model that in the coupled model? This itself is interesting and would be valuable to quantify.
We added a panel in Figure 4F, quantifying how relative errors between modules accumulate over time, averaged over multiple trials. The parameters are the same as those used in Figure 5. This panel is discussed in paragraph five of subsection “Generalization to two dimensions and several modules”.
The role of the vertical dotted line in Figures 5CD is to point out the strong relationship between large MSE and the occurrence of catastrophic readout errors. However, the time at which the first catastrophic error occurs depends not only on dynamic properties of the system, but also on the number of simulations performed: in our case, it roughly corresponds to the time at which the probability for occurrence of a catastrophic error is of order 1%, since we ran 100 simulations. To the left of the dashed line the probability to have a catastrophic error does not vanish, it’s simply very small. This is now clarified in the text (paragraph three subsection “Consequences for spatial representation and readout”).
Catastrophic errors due to relative phase shifts have been extensively discussed in the literature by us and others (Fiete et al., 2008, Wellinder et al., 2008, Sreenivasan and Fiete, 2011, Burak, 2014, Vágó and Ujfalussy, 2018). However these works focused on the capacity of the code and its resolution. A quantitative analysis of the probability for occurrence of catastrophic readout errors, and how this probability scales with various parameters, has not been achieved. We feel that this question is interesting but lies outside the scope of our manuscript for two main reasons:
1. There is no question that these errors are highly detrimental (see citations mentioned above).
2. Our goal is to propose a mechanism that mitigates these errors, and we demonstrate clearly that our proposed mechanism achieves this goal.
As a basic and rough argumentation, it seems reasonable that in order for a catastrophic readout error to occur, relative phase shifts between modules should become significant compared to unity. Based on the new panel (Figure 4F), it is possible to see that when catastrophic errors started to appear in Figure 5C, the standard deviation of phase shifts was of order 0.15 (MSD ~ 0.025), and there was already a small but appreciable probability for relative phase shifts to become of order unity.
However, based on previous works (Fiete et al., 2008, Vágó and Ujfalussy, 2018), it should be clear that the probability for a catastrophic readout must depend in a complicated manner on other factors such as the precise grid spacings, the number of modules and the size of the arena. The new supporting figure (Figure 5—figure supplement 2) demonstrates some of this complexity.
We added a paragraph in the Discussion section (paragraph three) addressing some of the points raised above.
Regarding the last question: we added a quantification of the mean square error relative to the true trajectory, Figure 4G. In the coupled system, Noise in the three modules is effectively projected on the direction corresponding to joint motion, which is equivalent to averaging the noise injected to each module individually. The mean square error accumulated in each module individually is thus expected to be smaller by a factor of m in the case of coupled modules, compared to the uncoupled case, in agreement with our numerical results. We explain this in paragraph five of subsection “Generalization to two dimensions and several modules”.
d) Finally, how does performance scale with the number of modules? A general theoretical model was derived for m modules but results are shown for only 2 and 3 modules (and because these two cases were illustrated in one and two dimensions, respectively, it's not possible to compare directly between them). Larger numbers of modules could presumably lead to a couple of different scenarios that might impact the conclusions in different ways: i) more modules could lead to a reduction in readout errors (even in the absence of coupling), because the readout is averaging over inconsistencies among modules; this would suggest that some of observed error could be alleviated with additional modules, or ii) more modules could lead to an increase in readout errors via the additional intermodule recurrence (e.g. in cases where assumptions about linearity or small input velocity break down). It would be useful to report more on these type of analyses and show how they depend on the source/magnitude of noise. Can these findings be interpreted in terms of the actual numbers of modules in mEC (which would be helpful to report somewhere)?
We agree that this is an interesting question. The actual number of grid cell modules in the brains of rats and mice is unknown, and so far there is direct evidence for the existence of four modules.
We added a supporting figure (Figure 5—figure supplement 2) with several panels that show how the probability for catastrophic errors, and the MSE, evolve in time when the number of modules is increased. We find that increasing the number of modules reduces the probability for catastrophic errors, in accordance with the intuition described above under scenario i). In interpreting these figures, however, it is important to keep in mind that the size of the environment is important. Increasing the number of modules leads to an increase in the range of positions that can be represented unambiguously. Therefore, when comparing the rate of catastrophic readout errors across systems that differ in the number of modules, one has to decide whether to keep the size of environment fixed, or increase it to reflect the increased capacity. Increasing the size of the environment goes handinhand with an increase in the probability for catastrophic errors, since the larger environment includes a larger set of positions whose representation by the grid phases might be confused with the true position. This is demonstrated in panels EJ in Figure 5—figure supplement 2.
In fact, spatial coding and memory maintenance in the entorhinalhippocampal system may require resilience to an even broader range of errors than those associated with confusion between two positions in any given environment. An error in module phases, accumulated during foraging in the absence of sensory cues, might lead to a combination of phases that matches a position in a different spatial map (representing a different environment), more than any position in the present map.
The main goal of the examples shown in Figure 5 and the new Figure 5—figure supplement 2 is not to systematically analyze the effect of catastrophic errors. The goal is to demonstrate that these effects are highly detrimental over a broad range of parameters, and that the coupling mechanism proposed in our work dramatically reduces these detrimental effects.
We briefly discuss Figure 5—figure supplement 2 in the text (paragraph four of subsection “Consequences for spatial representation and readout”), and have added a paragraph in the Discussion section addressing some of the issues raised above (paragraph four).
3) Provide, in the main text, more details/intuition about the choice of inter and intramodule couplings, which is indeed a fundamental aspect of the work. (See also point 1 and 2a above.)
A key point of the theory is that modules should be coupled in a phaseindependent manner, and that the coupling should only depend on the rate of change of the phase. There are two things that could be shown, perhaps in Figure 1 and incorporated into Figure 2:
– show the detrimental effect of phasedependent coupling for position encoding
– recap the properties of selfmotion integration networks (e.g. Seung's model used throughout the paper), showing the velocity dependent tuning of the left and right network and how a simple set of synaptic weights between modules can extract this information.
We added to Figure 1 a schematic illustration of two phasecoupled gears (panel C), and included a short explanation inside the figure caption on why this scheme is unfavorable from the point of view of capacity. We refer to this point also in paragraph six of the Introduction.
We also added a panel to Figure 2 (panel B), illustrating the difference in activity in the left and right bumps as the source of motion along the attractor, and expanded the text in the caption to explain how the double ring model works. Note that the section after Equation 3 in the main text also explain the mechanism of velocity integration in the double ring model.
The readout mechanism that we propose to each module’s velocity is derived and discussed extensively in Appendix 1. We now also explain in the main text (paragraph two of subsection “Coupling modules by synaptic connectivity”) and in the Materials and methods section (after Equation 13) in more detail the structure of synaptic connectivity that injects this readout velocity (as read out from each module) as input to the other modules.
4) Discuss the role of landmarks (e.g. boundaries) in path integration – this important aspect did not receive enough attention throughout the manuscript.
For example, the section on "Intrinsic neural noise" shows how the coupling between modules coordinates their drift. An alternate idea might have been to largely shut down the drift via interaction with border cells (or more generally with boundaries and landmarks). A comment on this alternative would be in order in the Results and in the Discussion since there has been a good bit of work on it lately. These kinds of error correction mechanisms that people have considered so far still work separately on each module, and so a coordination mechanism such as the one in the present paper will still be necessary because even a small amount of independent drift between modules can cause problems with the spatial representation.
This is an important point which was addressed in our initial submission only indirectly, by highlighting the significance of coupling under conditions in which sensory cues are absent. We added a paragraph in the Discussion section (paragraph six), in which we comment on the role of boundaries in this context, and cite recent related work. We also briefly mention the possible role of encounters with the walls as a means for updating the spatial representation in paragraph four of subsection “Generalization to two dimensions and several modules”.
5) An additional figure should be added at the end of the paper, illustrating the nontrivial prediction related to the inactivation of one or more grid cell modules. This would help make the work for broadly accessible, and would strengthen the discussion.
Thanks for this suggestion. We added a figure (Figure 4—figure supplement 1) showing the rate maps of individual neurons from three modules after disconnecting module 1 from the other modules. We refer to this figure in subsection “Experimental predictions”.
6) Discuss the possibility that other (published) mechanisms might be at work for the selforganized emergence of modular structure (which is otherwise assumed in the current manuscript).
See added text and citation in paragraph seven of the Discussion.
In particular, a potential overlap is with recent work on the role of interactions with borders and landmarks in anchoring and reducing drift in the grid system (e.g. Keinath et al., eLife; Ocko et al., PNAS; Hardcastle et al., Neuron; Pollock et al., bioRxiv). The present paper is complementary in that it suggests a way of coordinating drift between modules, rather than reducing the overall amount of the drift. Indeed the border interaction mechanisms may not reduce drift enough on their own, and the coordination mechanism described in the present paper may be necessary also – both mechanisms could be in play. Some commentary would be worthwhile in the Discussion as an interesting direction for the future is to ask how border interactions and intermodule interactions could work together.
See the paragraph six in the Discussion.
https://doi.org/10.7554/eLife.48494.020Article and author information
Author details
Funding
Israel Science Foundation (1745/18)
 Yoram Burak
Israel Science Foundation (1978/13)
 Yoram Burak
Gatsby Charitable Foundation
 Yoram Burak
Dalia and Dan Maydan Fellowship
 Noga Mosheiff
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This research was supported by the Israel Science Foundation grant No. 1745/18 and (in part) by grant No. 1978/13. We acknowledge support from the Gatsby Charitable Foundation, and from the Dalia and Dan Maydan Fellowship (NM).
Senior Editor
 Laura L Colgin, University of Texas at Austin, United States
Reviewing Editor
 Stephanie Palmer, University of Chicago, United States
Reviewer
 Ann M Hermundstad, Howard Hughes Medical Institute, United States
Publication history
 Received: May 15, 2019
 Accepted: August 29, 2019
 Accepted Manuscript published: August 30, 2019 (version 1)
 Version of Record published: September 23, 2019 (version 2)
Copyright
© 2019, Mosheiff and Burak
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 692
 Page views

 131
 Downloads

 1
 Citations
Article citation count generated by polling the highest count across the following sources: PubMed Central, Crossref, Scopus.