Learning place cells, grid cells and invariances with excitatory and inhibitory plasticity
Abstract
Neurons in the hippocampus and adjacent brain areas show a large diversity in their tuning to location and head direction, and the underlying circuit mechanisms are not yet resolved. In particular, it is unclear why certain cell types are selective to one spatial variable, but invariant to another. For example, place cells are typically invariant to head direction. We propose that all observed spatial tuning patterns – in both their selectivity and their invariance – arise from the same mechanism: Excitatory and inhibitory synaptic plasticity driven by the spatial tuning statistics of synaptic inputs. Using simulations and a mathematical analysis, we show that combined excitatory and inhibitory plasticity can lead to localized, grid-like or invariant activity. Combinations of different input statistics along different spatial dimensions reproduce all major spatial tuning patterns observed in rodents. Our proposed model is robust to changes in parameters, develops patterns on behavioral timescales and makes distinctive experimental predictions.
https://doi.org/10.7554/eLife.34560.001
eLife digest
Knowing where you are never hurts, be it during a holiday in New York or on a hiking trip in the Alps. Our sense of location seems to depend on a structure deep within the brain called the hippocampus, and its neighbor, the entorhinal cortex. Studies in rodents have shown that these areas act a little like an inbuilt GPS for the brain. They contain different types of neurons that help the animal to work out where it is and where it is going. Among those are place cells, present within the hippocampus, and grid cells and head direction cells, found within the entorhinal cortex and other areas.
Place cells fire whenever an animal occupies a specific location in its environment, with each place cell firing at a different spot. Grid cells generate virtual maps of the surroundings that resemble grids of repeating triangles. Whenever an animal steps onto a corner of one of these virtual triangles, the grid cell that generated that map starts to fire. Head direction cells increase their firing whenever an animal’s head is pointing in a specific direction. These cell types thus provide animals with complementary information about their location. But how do the cells first become selective for specific places or head directions?
Weber and Sprekeler propose that a single mechanism gives rise to the spatial characteristics of all these different types of cells. Like all neurons, these cells communicate with their neighbors at junctions called synapses. These may be either excitatory or inhibitory. Cells at excitatory synapses activate their neighbors, whereas cells at inhibitory synapses deactivate them. Weber and Sprekeler used a computer to simulate changes in excitatory and inhibitory synapses in a virtual rat exploring an environment. Interactions between the two types of synapses gave rise to virtual cells that behaved like place, grid or head direction cells. Which cell type emerged depended on whether the excitatory or the inhibitory synapses were more sensitive to the virtual rat’s location.
This idea adds to a range of others proposed to explain how the brain codes for locations. Whether any of these ideas or a combination of them is correct remains to be determined. Further pieces are needed if we are to solve the puzzle of how the brain supports navigation.
https://doi.org/10.7554/eLife.34560.002
Introduction
Neurons in the hippocampus and the adjacent regions exhibit a broad variety of spatial activation patterns that are tuned to position, head direction or both. Common observations in these spatial dimensions are localized, bell-shaped tuning curves (O'Keefe, 1976; Taube et al., 1990), periodically repeating activity (Fyhn et al., 2004; Hafting et al., 2005) and invariances (Muller et al., 1994; Burgess et al., 2005), as well as combinations of these along different spatial dimensions (Sargolini et al., 2006a; Krupic et al., 2012). For example, head direction cells are often invariant to location (Burgess et al., 2005), and place cells are commonly invariant to head direction (Muller et al., 1994). The cellular and network mechanisms that give rise to each of these firing patterns are subject to extensive experimental and theoretical research. Several computational models have been suggested to explain the emergence of grid cells (Fuhs and Touretzky, 2006; McNaughton et al., 2006; Franzius et al., 2007a; Burak and Fiete, 2009; Couey et al., 2013; Burgess et al., 2007; Kropff and Treves, 2008; Bush and Burgess, 2014; Castro and Aguiar, 2014; Dordek et al., 2016; Stepanyuk, 2015; Giocomo et al., 2011; Zilli, 2012; D'Albis and Kempter, 2017; Monsalve-Mercado and Leibold, 2017), place cells (Tsodyks and Sejnowski, 1995; Battaglia and Treves, 1998; Arleo and Gerstner, 2000; Solstad et al., 2006; Franzius et al., 2007b; Burgess and O'Keefe, 2011; Franzius et al., 2007a) and head direction cells (McNaughton et al., 1991; Redish et al., 1996; Zhang, 1996; Franzius et al., 2007a). Most of these models are designed to explain the spatial selectivity of one particular cell type and do not consider invariances along other dimensions, although the formation of invariant representations is a nontrivial problem (DiCarlo and Cox, 2007).
In view of the variety of spatial tuning patterns, the question arises of whether differences in tuning of different cells in different areas reflect differences in microcircuit connectivity, single cell properties or plasticity rules, or whether there is a unifying principle. In this paper we suggest that both the observed spatial selectivities and invariances can be explained by a common mechanism – interacting excitatory and inhibitory synaptic plasticity – and that the observed differences in the response profiles of grid, place and head direction cells result from differences in the spatial tuning of excitatory and inhibitory synaptic afferents. Here, we explore this hypothesis in a computational model of a feedforward network of rate-based neurons. Simulations as well as a mathematical analysis indicate that the model reproduces the large variety of response patterns of neurons in the hippocampal formation and adjacent areas and can be used to make predictions for the input statistics of each cell type.
Results
We study the development of spatial representations in a network of rate-based neurons with interacting excitatory and inhibitory plasticity. A single model neuron that represents a cell in the hippocampal formation or adjacent areas receives feedforward input from excitatory and inhibitory synaptic afferents. As a simulated rat moves through an environment, these synaptic afferents are weakly modulated by spatial location and in later sections also by head direction. This modulation is irregular and non-localized with multiple maxima (Buetfering et al., 2014); see Figure 1a and Materials and methods. Importantly, different inputs show different modulation profiles and each profile is temporally stable. We also show results for localized, that is, place cell-like, input (O'Keefe and Dostrovsky, 1971; Marshall et al., 2002; Wilent and Nitz, 2007). The output rate is given by a weighted sum of the excitatory and inhibitory inputs.
In our model, both excitatory and inhibitory synaptic weights are subject to plasticity. The excitatory weights change according to a Hebbian plasticity rule (Hebb, 1949) that potentiates the weights in response to simultaneous pre- and postsynaptic activity. The inhibitory synapses evolve according to a plasticity rule that changes their weights in proportion to presynaptic activity and the difference between postsynaptic activity and a target rate (1 Hz in all simulations). This rule has previously been shown to balance excitation and inhibition such that the firing rate of the output neuron approaches the target rate (Vogels et al., 2011; D'Amour and Froemke, 2015). We assume that the inhibitory plasticity acts fast enough to track changes of excitatory weights, so that excitation and inhibition are approximately balanced at all times.
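The two learning rules described above can be written as a single update step. The following is a minimal sketch, not the implementation used for the simulations: the function name, the learning rates, the rectification of the output rate and the clipping of weights at zero are assumptions made for illustration, and the sketch omits any bound on the excitatory weights, which plain Hebbian growth would need in a long simulation.

```python
import numpy as np

def plasticity_step(w_exc, w_inh, r_exc, r_inh,
                    eta_exc=1e-3, eta_inh=1e-2, target_rate=1.0):
    """One illustrative weight update for a single output neuron.

    w_exc, w_inh : current excitatory / inhibitory weights (1D arrays)
    r_exc, r_inh : presynaptic rates at the current position (1D arrays)
    eta_exc, eta_inh : hypothetical learning rates; eta_inh > eta_exc mirrors
        the assumption that inhibitory plasticity tracks excitatory changes
    target_rate : target output rate (1 Hz in the text)
    """
    # Output rate: weighted excitation minus weighted inhibition.
    # The rectification at zero is an assumption of this sketch.
    rate = max(0.0, float(w_exc @ r_exc - w_inh @ r_inh))

    # Hebbian rule: potentiation for coincident pre- and postsynaptic activity.
    new_w_exc = w_exc + eta_exc * r_exc * rate

    # Inhibitory rule: presynaptic activity times the deviation of the
    # postsynaptic rate from the target rate.
    new_w_inh = w_inh + eta_inh * r_inh * (rate - target_rate)

    # Keep all weights non-negative (assumption).
    return np.maximum(new_w_exc, 0.0), np.maximum(new_w_inh, 0.0), rate
```

When the output fires above the target rate, the inhibitory weights of all active inhibitory inputs grow, and when it fires below the target rate they shrink, which is the balancing effect referred to in the text.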
Relative spatial smoothness of the excitatory and inhibitory input determines the firing pattern of the output neuron
We first simulate a rat that explores a linear track (Figure 1). The spatial tuning of each input neuron is stable in time and depends smoothly on the location of the animal, but is otherwise random (e.g. Figure 1a). As a measure of smoothness, we use the spatial autocorrelation length. In the following, this is the central parameter of the input statistics, which is chosen separately for excitation and inhibition. In short, we assume that temporally stable spatial information is presynaptically present but we have minimal requirements on its format, aside from the spatial autocorrelation length.
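One way to generate input tuning of this kind is to smooth white noise with a Gaussian kernel, so that the kernel width sets the spatial autocorrelation length. The sketch below is an illustration under that assumption; the function names, the track discretization, the rescaling of each profile to the range [0, 1] and the 1/e criterion for the autocorrelation length are all choices made here, not details taken from the text.

```python
import numpy as np

def smooth_random_tuning(n_inputs, sigma, length=1.0, n_points=400, seed=0):
    """Random, temporally stable tuning curves on a linear track.

    Each curve is white noise convolved with a Gaussian of width `sigma`
    (an assumed construction), then rescaled to lie in [0, 1].
    """
    rng = np.random.default_rng(seed)
    dx = length / n_points
    half = int(np.ceil(4 * sigma / dx))            # kernel support: +-4 sigma
    kernel_x = np.arange(-half, half + 1) * dx
    kernel = np.exp(-kernel_x**2 / (2 * sigma**2))
    # Pad the noise so that a 'valid' convolution returns n_points samples.
    noise = rng.standard_normal((n_inputs, n_points + 2 * half))
    tuning = np.array([np.convolve(row, kernel, mode="valid") for row in noise])
    tuning -= tuning.min(axis=1, keepdims=True)
    tuning /= tuning.max(axis=1, keepdims=True)
    return tuning

def autocorrelation_length(profile, dx):
    """Lag at which the autocorrelation of a profile first drops below 1/e."""
    centered = profile - profile.mean()
    ac = np.correlate(centered, centered, mode="full")[len(centered) - 1:]
    ac = ac / ac[0]
    below = np.nonzero(ac < np.exp(-1))[0]
    return below[0] * dx if below.size else np.nan
```

Calling `smooth_random_tuning` once with a small `sigma` for the excitatory population and once with a larger `sigma` for the inhibitory population would give the "inhibition smoother than excitation" setting discussed next.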
At the beginning of each simulation, all synaptic weights are random. As the animal explores the track, the excitatory and inhibitory weights change in response to pre- and postsynaptic activity, and the output cell gradually develops a spatial activity pattern. We find that this pattern is primarily determined by whether the excitatory or inhibitory inputs are smoother in space. If the inhibitory tuning is smoother than the excitatory tuning (Figure 1b), the output neuron develops equidistant firing fields, reminiscent of grid cells on a linear track (Hafting et al., 2008). If instead the excitatory tuning is smoother, the output neuron fires close to the target rate of 1 Hz everywhere (Figure 1c); it develops a spatial invariance. For spatially untuned inhibitory afferents (Grienberger et al., 2017), the output neuron develops a single firing field, reminiscent of a one-dimensional place cell (Figure 1d; cf. Clopath et al., 2016).
The emergence of these firing patterns can be best explained in the simplified scenario of place field-like input tuning (Figure 1e,f). The spatial smoothness is then given by the size of the place fields. Let us assume that the output neuron fires at the target rate everywhere (see Materials and methods). From this homogeneous state, a small potentiation of one excitatory weight leads to an increased firing rate of the output neuron at the location of the associated place field (highlighted red curve in Figure 1e). To bring the output neuron back to the target rate, the inhibitory learning rule increases the synaptic weight of inhibitory inputs that are tuned to the same location (highlighted blue curve in Figure 1e). If these inhibitory inputs have smaller place fields than the excitatory inputs (Figure 1c), this restores the target rate everywhere (Vogels et al., 2011). Hence, inhibitory plasticity can stabilize spatial invariance if the inhibitory inputs are sufficiently precise (i.e. not too smooth) in space. In contrast, if the spatial tuning of the inhibitory inputs is smoother than that of the excitatory inputs, the target firing rate cannot be restored everywhere. Instead, the compensatory potentiation of inhibitory weights increases the inhibition in a spatial region at least the size of the inhibitory place fields. This leads to a corona of inhibition, in which the output neuron cannot fire (Figure 1e, blue region). Outside of this inhibitory surround the output neuron can fire again and the next firing field develops. Iterated, this results in a periodic arrangement of firing fields (Figure 1f; see Figure 7b for a depiction of the input currents). Spatially untuned inhibition corresponds to a large inhibitory corona that exceeds the length of the linear track, so that only a single place field remains.
From a different perspective, spatially untuned input can also be understood as a limit case of vanishing spatial variation in the firing rate rather than a limit of infinite smoothness. Consistent with this view, a development of grid patterns or invariance requires a sufficiently strong spatial modulation of the inhibitory inputs (Materials and methods).
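The compensation argument for place field-like input can be checked numerically. The sketch below fixes an excitatory drive that consists of a uniform baseline plus one potentiated place field, and lets only the inhibitory weights learn with the position-averaged inhibitory rule. Everything specific here is an assumption for illustration: the function name, the Gaussian field shapes, the track discretization, the step size, the omission of output rectification (rates stay positive in this setting) and the clipping of weights at zero.

```python
import numpy as np

def rebalance_inhibition(sigma_exc, sigma_inh, target_rate=1.0, n_steps=5000):
    """Let inhibitory weights compensate a local excitatory potentiation.

    Returns the track positions and the output rate after learning.
    """
    x = np.linspace(0.0, 1.0, 201)                 # positions on the track
    dx = x[1] - x[0]
    centers = np.linspace(-1.0, 2.0, 151)          # inhibitory field centers
    # Inhibitory place fields, shape (positions, inputs).
    inh = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * sigma_inh**2))
    # Excitatory drive: uniform baseline plus one potentiated place field.
    exc_drive = 2.0 * target_rate + np.exp(-(x - 0.5)**2 / (2 * sigma_exc**2))

    w_inh = np.zeros(centers.size)
    # Conservative step size from the largest row sum of the overlap matrix.
    overlap = inh.T @ inh * dx
    eta = 1.0 / overlap.sum(axis=1).max()

    for _ in range(n_steps):
        rate = exc_drive - inh @ w_inh
        # Inhibitory rule averaged over positions:
        # presynaptic rate times (postsynaptic rate - target rate).
        w_inh += eta * dx * (inh.T @ (rate - target_rate))
        np.maximum(w_inh, 0.0, out=w_inh)          # weights stay non-negative

    return x, exc_drive - inh @ w_inh
```

With narrow inhibitory fields, for example `rebalance_inhibition(sigma_exc=0.1, sigma_inh=0.03)`, the learned inhibition should bring the rate close to the target along the whole track. With broad inhibitory fields, for example `rebalance_inhibition(sigma_exc=0.03, sigma_inh=0.15)`, a residual peak is expected to remain at the potentiated location because the inhibition cannot follow the narrow excitatory field.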
The argument given above for place field-like input can be extended to the scenario where input is irregularly modulated by space. For non-localized input tuning (Figure 1b,c,d), any weight change that increases synaptic input in one location will also increase it in a surround that is given by the smoothness of the input tuning (see Materials and methods for a mathematical analysis). In the simulations, the randomness manifests itself in occasional defects in the emerging firing pattern (Figure 1h, bottom, and Figure 1—figure supplement 1). The above reasoning suggests that the width of individual firing fields is determined by the smoothness of the excitatory input tuning, while the distance between grid fields, that is, the grid spacing, is set by the smoothness of the inhibitory input tuning. Indeed, both simulations and a mathematical analysis (Materials and methods) confirm that the grid spacing scales linearly with the inhibitory smoothness in a large range, both for localized (Figure 1g) and non-localized input tuning (Figure 1h). The analysis also reveals a weak logarithmic dependence of the grid spacing on the ratio of the learning rates, the mean firing rates and the number of afferents of the excitatory and inhibitory population (Equation 78 and Figure 8b).
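The qualitative form of this scaling can be illustrated with a generic toy calculation that is not Equation 78. Assume that the growth rate of a spatial perturbation with wavenumber k is a difference of two Gaussians, a_E exp(-k² σ_E² / 2) - a_I exp(-k² σ_I² / 2), where σ_E and σ_I are the excitatory and inhibitory smoothness and a_E, a_I are hypothetical amplitudes that lump together learning rates, mean rates and numbers of afferents. The fastest-growing wavenumber then sets a spacing 2π/k*. All names and the form of the spectrum are assumptions of this sketch.

```python
import numpy as np

def most_unstable_spacing(sigma_exc, sigma_inh, amp_exc=1.0, amp_inh=1.0):
    """Spacing 2*pi/k* of the fastest-growing mode of a toy spectrum.

    Setting the derivative of the difference of Gaussians to zero gives
    k*^2 = 2 ln(a_I sigma_I^2 / (a_E sigma_E^2)) / (sigma_I^2 - sigma_E^2).
    """
    ratio = (amp_inh * sigma_inh**2) / (amp_exc * sigma_exc**2)
    if sigma_inh <= sigma_exc or ratio <= 1.0:
        raise ValueError("toy spectrum has no maximum at finite wavenumber")
    k_star = np.sqrt(2.0 * np.log(ratio) / (sigma_inh**2 - sigma_exc**2))
    return 2.0 * np.pi / k_star

def numerical_spacing(sigma_exc, sigma_inh, amp_exc=1.0, amp_inh=1.0):
    """Same quantity, found by locating the spectrum maximum on a grid."""
    k = np.linspace(0.01, 200.0, 200_000)
    spectrum = (amp_exc * np.exp(-(k * sigma_exc)**2 / 2)
                - amp_inh * np.exp(-(k * sigma_inh)**2 / 2))
    return 2.0 * np.pi / k[np.argmax(spectrum)]
```

In this toy setting the spacing grows close to linearly with `sigma_inh` when `sigma_inh` is much larger than `sigma_exc`, and the amplitudes enter only through a logarithm, which is the kind of weak dependence described in the text.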
In summary, the interaction of excitatory and inhibitory plasticity can lead to spatial invariance, spatially periodic activity patterns or single place fields depending on the spatial statistics of the excitatory and inhibitory input.
Emergence of hexagonal firing patterns
When a rat navigates in a two-dimensional arena, the spatial firing maps of grid cells in the medial entorhinal cortex (mEC) show pronounced hexagonal symmetry (Hafting et al., 2005; Fyhn et al., 2004) with different grid spacings and spatial phases. To study whether a hexagonal firing pattern can emerge from interacting excitatory and inhibitory plasticity, we simulate a rat in a square arena. The rat explores the arena for 10 hr, following trajectories extracted from behavioral data (Sargolini et al., 2006b; see Materials and methods). To investigate the role of the input statistics, we consider three different classes of input tuning: (i) place cell-like input (Figure 2a), (ii) sparse non-localized input, in which the tuning of each input neuron is given by the sum of 100 randomly located place fields (Figure 2b) and (iii) dense non-localized input, in which the tuning of each input is a random function with fixed spatial smoothness (Figure 2c). For all input classes, the spatial tuning of the inhibitory inputs is smoother than that of the excitatory inputs.
Initially, all synaptic weights are random and the activity of the output neuron shows no spatial symmetry. While the rat forages through the environment, the output cell develops a periodic firing pattern for all three input classes, reminiscent of grid cells in the mEC (Fyhn et al., 2004; Hafting et al., 2005) and typically with the same hexagonal symmetry. This hexagonal arrangement is again a result of smoother inhibitory input tuning, which generates a spherical inhibitory corona around each firing field (compare Figure 1e). These center-surround fields are arranged in a hexagonal pattern – the closest packing of spheres in two dimensions (cf. Turing, 1952). We find that the spacing of this pattern is determined by the inhibitory smoothness. The similarity between cells in terms of orientation and phase of the grid depends – in decreasing order – on whether they receive the same inputs, on the trajectories on which the tuning was learned and on the initial synaptic weights (Figure 2—figure supplement 1). Two grid cells can thus have different phase and orientation, even if they share a large fraction or all of their inputs.
For the linear track, the randomness of the non-localized inputs leads to defects in the periodicity of the grid pattern. In two dimensions, we find that the randomness leads to distortions of the hexagonal grid. To quantify this effect, we simulated 500 random trials for each of the three input scenarios and plotted the grid score histogram (Appendix 1) before and after 10 hr of spatial exploration (Figure 2d,e,f). Different trials have different trajectories, different initial synaptic weights and different random locations of the input place fields (for sparse input) or different random input functions (for dense input). For place cell-like input, most of the output cells develop a positive grid score during 10 hr of spatial exploration (33% before to 86% after learning, Figure 2d). Even for low grid scores, the firing rate maps look grid-like after learning but exhibit a distorted symmetry (Figure 2d). For sparse non-localized input, the fraction of output cells with a positive grid score increases from 35% to 87% and for dense non-localized input from 16% to 68% within 10 hr of spatial exploration (Figure 2e,f). The excitatory and inhibitory inputs are not required to have the same tuning statistics. Grid patterns also emerge when excitation is localized and inhibition is non-localized (Figure 2—figure supplement 2).
In summary, the interaction of excitatory and inhibitory plasticity leads to grid-like firing patterns in the output neuron for all three input scenarios. The grids are typically less distorted for sparser input (Figure 2g).
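The first two input classes can be sketched in a few lines. The code below is an illustration only: the function names, the Gaussian shape of the place fields, the unit-square arena, the uniform sampling of field centers and the normalization of the summed tuning to a peak of 1 are assumptions, and only the number of 100 fields per input is taken from the text.

```python
import numpy as np

def place_field_2d(xx, yy, center, sigma):
    """Class (i): a single Gaussian place field on a 2D grid of positions."""
    return np.exp(-((xx - center[0])**2 + (yy - center[1])**2) / (2 * sigma**2))

def sparse_nonlocalized_input(xx, yy, sigma, n_fields=100, rng=None):
    """Class (ii): sum of `n_fields` randomly located place fields."""
    rng = np.random.default_rng() if rng is None else rng
    # Field centers drawn uniformly from the unit square (assumption).
    centers = rng.uniform(0.0, 1.0, size=(n_fields, 2))
    tuning = np.zeros_like(xx, dtype=float)
    for center in centers:
        tuning += place_field_2d(xx, yy, center, sigma)
    return tuning / tuning.max()                   # peak normalized to 1
```

To respect the condition stated above, the inhibitory population would be generated with a larger `sigma` than the excitatory population in each class.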
Rapid appearance of grid cells and their reaction to modifications of the environment
In unfamiliar environments, neurons in the mEC exhibit grid-like firing patterns within minutes (Hafting et al., 2005). Moreover, grid cells react quickly to changes in the environment (Fyhn et al., 2007; Savelli et al., 2008; Barry et al., 2012). These observations challenge models for grid cells that require gradual synaptic changes during spatial exploration. In principle, the time scale of plasticity-based models can be shortened arbitrarily by increasing the synaptic learning rates. For stable patterns to emerge, however, significant weight changes must occur only after the animal has visited most of the environment. To explore the edge of this tradeoff between speed and stability, we increased the learning rates to a point where the grids are still stable but where further increase would reduce the stability (Figure 3—figure supplement 1). For place cell-like input, periodic patterns can be discerned within 10 min of spatial exploration, starting with random initial weights (Figure 3a,b). The pattern becomes more pronounced over time and remains stable for many hours (Figure 3c and Figure 3—figure supplement 2).
To investigate the robustness of this phenomenon, we ran 500 realizations with different trajectories, initial synaptic weights and locations of input place fields. In all simulations, a periodic pattern emerged within the first 30 min, and a majority of patterns exhibited hexagonal symmetry after 3 hr (increasing from 33% to 81%, Figure 3c,d). For non-localized input, the emergence of the final grids typically takes longer, but the first grid fields are also observed within minutes and are still present in the final grid, as observed in experiments (Hafting et al., 2005; Figure 3—figure supplement 3).
Above, we modeled the exploration of a previously unknown room by assuming the initial synaptic weights to be randomly distributed. If the rat had previous exposure to the room or to a similar room, a structure might already have formed in some of the synaptic weights. This structure could aid the development of the grid in similar rooms or hinder it in a novel room. To study this, we simulate a network that first learns the synaptic weights in one room. We then introduce a graded modification of the room by remapping the firing fields of a fraction of input neurons to random locations. We find that the output firing pattern is robust to such perturbations, even if more than half of the inputs are remapped (Figure 3—figure supplement 2). If all inputs are changed, corresponding to a novel room, a grid pattern is learned anew. The strong initial pattern in the weights does not hinder this development (Figure 3—figure supplement 2).
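The graded modification of the room amounts to a simple operation on the input place field centers. The following is a minimal sketch under stated assumptions: the function name, the representation of inputs by an array of field centers and the uniform redrawing of centers within a unit square are illustrative choices, not details given in the text.

```python
import numpy as np

def remap_input_fields(centers, fraction, rng, low=0.0, high=1.0):
    """Move the place fields of a random fraction of inputs to new locations.

    centers  : array of shape (n_inputs, n_dims) with field centers
    fraction : fraction of inputs to remap (0 = same room, 1 = novel room)
    Returns the new centers and the indices of the remapped inputs.
    """
    new_centers = centers.copy()
    n_remap = int(round(fraction * len(centers)))
    remapped = rng.choice(len(centers), size=n_remap, replace=False)
    new_centers[remapped] = rng.uniform(low, high,
                                        size=(n_remap, centers.shape[1]))
    return new_centers, remapped
```

Weights learned with the original centers would be kept and learning continued with the remapped inputs; `fraction=1.0` corresponds to the novel-room case in which the grid is learned anew.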
Recently, Wernle et al. (2018) discovered that in an arena separated by a wall, single grid cells form two independent grid patterns, one on each side of the wall, that coalesce once the wall is removed. They find that grid fields close to the partition wall move to establish a more coherent pattern. In contrast, fields far away from the partition wall do not change their locations. Rosay et al. reproduced this experimental finding by simulating grid fields as interacting particles (Rosay et al., in preparation). They also demonstrated how it could be reproduced by a feedforward model for grid cells based on firing rate adaptation (Rosay et al., in preparation; Kropff and Treves, 2008). Inspired by these experiments and simulations, we simulate a rat that first explores one half of a square arena and then the other half, for 2.5 hr each (Figure 4a). A grid pattern emerges in each compartment (Figure 4b,c). We then remove the partition wall and the rat explores the entire arena for another 5 hr (Figure 4a). As observed experimentally, grid fields close to the former partition line rearrange to make the two grids more coherent, whereas grid fields far away from the partition line largely stay where they were (Figure 4d).
In summary, periodic patterns emerge rapidly in our model and the associated time scale is limited primarily by how quickly the animal visits its surroundings, that is, by the same time scale that limits the experimental recognition of the grids.
Place cells, band cells and stretched grids
In addition to grids, the mEC and adjacent brain areas exhibit a plethora of other spatial activity patterns including spatially invariant (Burgess et al., 2005), band-like (periodic along one direction and invariant along the other) (Krupic et al., 2012), and spatially periodic but non-hexagonal patterns (Krupic et al., 2012; Hardcastle et al., 2017; Diehl et al., 2017). Note that it is currently debated whether or not some of the observed spatially periodic but non-hexagonal firing patterns are artifacts of poorly isolated single cell data in multielectrode recordings (Navratilova et al., 2016; Krupic et al., 2015b). In contrast to spatially periodic tuning, place cells in the hippocampus proper are typically only tuned to a single or few locations in a given environment (O'Keefe and Dostrovsky, 1971; Moser et al., 2008; Leutgeb et al., 2005). If the animal traversed the environment along a straight line, all of these cells would be classified as periodic, localized or invariant (Figure 1), although the classification could vary depending on the direction of the line. Based on this observation, we hypothesized that all of these patterns could be the result of an input autocorrelation structure that differs along different spatial directions.
We first verified that also in a two-dimensional arena, place cells emerge from a very smooth inhibitory input tuning (Figure 5a,b). The emergence of place cells is independent of the exact shape of the excitatory input. Non-localized inputs (Figure 5a) lead to similar results as those from grid cell-like inputs of different orientation and grid spacing (Figure 5b, Materials and methods); for other models for the emergence of place cells from grid cells, see (Solstad et al., 2006; Franzius et al., 2007b; Rolls et al., 2006; Molter and Yamaguchi, 2008; Ujfalussy et al., 2009; Savelli and Knierim, 2010). Next we verified that also in two dimensions, spatial invariance results when excitation is broader than inhibition (Figure 5c). We then varied the smoothness of the inhibitory inputs independently along two spatial directions. If the spatial tuning of inhibitory inputs is smoother than the tuning of the excitatory inputs along one dimension but less smooth along the other, the output neuron develops band cell-like firing patterns (Figure 5d). If inhibitory input is smoother than excitatory input, but not isotropic, the output cell develops stretched grids with different spacing along two axes (Figure 5e). For these anisotropic cases, stretched hexagonal grids and rectangular arrangements of firing fields appear similarly favorable (compare Figure 5e, second row and column). A hexagonal arrangement is favored by a dense packing of inhibitory coronas, whereas a rectangular arrangement would maximize the proximity of the excitatory centers, given the inhibitory corona (Figure 5—figure supplement 1).
In summary, the relative spatial smoothness of inhibitory and excitatory input determines the symmetry of the spatial firing pattern of the output neuron. The requirements for the input tuning that support invariance, periodicity and localization apply individually to each spatial dimension, opening up a combinatorial variety of spatial tuning patterns.
Spatially tuned input combined with head direction selectivity leads to grid, conjunctive and head direction cells
Many cells in and around the hippocampus are tuned to the head direction of the animal (Taube et al., 1990; Taube, 1995; Chen et al., 1994). These head direction cells are typically tuned to a single head direction, just like place cells are typically tuned to a single location. Moreover, head direction cells are often invariant to location (Burgess et al., 2005), just like place cells are commonly invariant to head direction (Muller et al., 1994). There are also cell types with conjoined spatial and head direction tuning. Conjunctive cells in the mEC fire like grid cells in space, but only in a particular head direction (Sargolini et al., 2006a), and many place cells in the hippocampus of crawling bats also exhibit head direction tuning (Rubin et al., 2014). To investigate whether these tuning properties could also result in our model, we simulated a rat that moves in a square box, whose head direction is constrained by the direction of motion (Appendix 1). Each input neuron is tuned to both space and head direction (see Figure 6 for localized and Figure 6—figure supplement 1 for nonlocalized input).
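A compact way to express the two ingredients of this setup, head direction tied to the direction of motion and inputs tuned jointly to space and head direction, is sketched below. The function names are hypothetical, and the product of a Gaussian spatial field with a von Mises-shaped directional tuning curve is an assumption made for illustration; the exact input functions are specified in the appendix, not here.

```python
import numpy as np

def head_direction_from_trajectory(positions):
    """Head direction (radians) taken as the direction of each movement step.

    positions : array of shape (n_steps + 1, 2) with x, y coordinates
    """
    steps = np.diff(positions, axis=0)
    return np.arctan2(steps[:, 1], steps[:, 0])

def conjunctive_input(position, head_direction, center, sigma,
                      preferred_direction, kappa):
    """Input rate tuned to both location and head direction (peak value 1).

    Spatial part: Gaussian field of width `sigma` around `center`.
    Directional part: von Mises-shaped bump around `preferred_direction`;
    a larger `kappa` gives sharper (less smooth) head direction tuning.
    """
    offset = np.asarray(position, dtype=float) - np.asarray(center, dtype=float)
    spatial = np.exp(-np.sum(offset**2, axis=-1) / (2 * sigma**2))
    directional = np.exp(kappa * (np.cos(head_direction - preferred_direction) - 1.0))
    return spatial * directional
```

In this parameterization the relative smoothness of the two populations along the head direction axis is controlled by `kappa`, in the same way that `sigma` controls the relative smoothness in space.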
In line with the previous observations, we find that the spatial tuning of the output neuron is determined by the relative spatial smoothness of the excitatory and inhibitory inputs, and the head direction tuning of the output neuron is determined by the relative smoothness of the head direction tuning of the inputs from the two populations. If the head direction tuning of excitatory input neurons is smoother than that of inhibitory input neurons, the output neuron becomes invariant to head direction (Figure 6a). If instead only the excitatory input is tuned to head direction, the output neuron develops a single activity bump at a particular head direction (Figure 6b,c). The concurrent spatial tuning of the inhibitory input neurons determines the spatial tuning of the output neuron. For spatially smooth inhibitory input, the output neuron develops a hexagonal firing pattern (Figure 6a,b), and for less smooth inhibitory input the firing of the output neuron is invariant to the location of the animal (Figure 6c).
In summary, the relative smoothness of inhibitory and excitatory input neurons in space and in head direction determines whether the output cell fires like a pure grid cell, a conjunctive cell or a pure head direction cell (Figure 6d).
We find that the overall head direction tuning of conjunctive cells is broader than that of individual grid fields (Figure 6e). This results from variations in the preferred head direction of different grid fields. Typically, however, these variations remain small enough to preserve an overall head direction tuning of the cell, because individual grid fields tend to align their head direction tuning (compare with Figure 5—figure supplement 1, but in three dimensions). Whether or not a narrower head direction of individual grid fields or a different preferred direction for different grid fields is present also in rodents is not resolved (Figure 6—figure supplement 2).
Discussion
We presented a self-organization model that reproduces the experimentally observed spatial and head direction tuning patterns in the hippocampus and adjacent brain regions. Its core mechanism is an interaction of Hebbian plasticity in excitatory synapses and homeostatic Hebbian plasticity in inhibitory synapses (Vogels et al., 2011; D'Amour and Froemke, 2015). The main prediction of the model is that the spatial autocorrelation structure of excitatory and inhibitory inputs determines – and should thus be predictable from – the output pattern of the cell. Investigations of the tuning of individual cells (Wertz et al., 2015) or even synapses (Wilson et al., 2016) that project to spatially tuned cells would thus be a litmus test for the proposed mechanism.
Origin of spatially tuned synaptic input
The origin of synaptic input to spatially tuned cells is not fully resolved (van Strien et al., 2009). Given that our model is robust to the precise properties of the input, it is consistent with input from higher sensory areas (Tanaka, 1996; Quiroga et al., 2005) that could inherit spatial tuning from their sensory tuning in a stable environment (Arleo and Gerstner, 2000; Franzius et al., 2007a). This is in line with the observation that grid cells lose their firing profiles in darkness (Chen et al., 2016; Pérez-Escobar et al., 2016) and that the hexagonal pattern rotates when a visual cue card is rotated (Pérez-Escobar et al., 2016).
The input could also stem from within the hippocampal formation, where spatial tuning has been observed in both excitatory (O'Keefe, 1976) and inhibitory (Marshall et al., 2002; Wilent and Nitz, 2007; Hangya et al., 2010) neurons. For example, the notion that mEC neurons receive input from hippocampal place cells is supported by several studies: Place cells in the hippocampus emerge earlier during development than grid cells in the mEC (Langston et al., 2010; Wills et al., 2010), grid cells lose their tuning pattern when the hippocampus is deactivated (Bonnevie et al., 2013) and both the firing fields of place cells and the spacing and field size of grid cells increase along the dorsoventral axis (Jung et al., 1994; Brun et al., 2008b; Stensola et al., 2012). Moreover, entorhinal stellate cells, which often exhibit grid-like firing patterns, receive a large fraction of their input from the hippocampal CA2 region (Rowland et al., 2013), where many cells are tuned to the location of the animal (Martig and Mizumori, 2011).
Inhibition is usually thought to arise from local interneurons – but see (Melzer et al., 2012) – suggesting that spatially tuned inhibitory input to mEC neurons originates from the entorhinal cortex itself. Interneurons in mEC display spatial tuning (Buetfering et al., 2014; Savelli et al., 2008; Frank et al., 2001) that could be inherited from hippocampal place cells, other grid cells (Couey et al., 2013; Pastoll et al., 2013; Winterer et al., 2017) or from entorhinal cells with non-grid spatial tuning (Diehl et al., 2017; Hardcastle et al., 2017). The broader spatial tuning required for the emergence of spatial selectivity could be established, for example, by pooling over cells with similar tuning or through a nonlinear input-output transformation in the inhibitory circuitry. If inhibitory input is indeed local, the increase in grid spacing along the dorsoventral axis (Brun et al., 2008b) suggests that the tuning of inhibitory interneurons gets smoother along this axis. For smoother tuning functions, fewer neurons are needed to cover the whole environment, in accordance with the decrease in interneuron density along the dorsoventral axis (Beed et al., 2013).
The excitatory input to hippocampal place cells could originate from grid cells in entorhinal cortex (Figure 5b), which is supported by anatomical (van Strien et al., 2009) and lesion studies (Brun et al., 2008a). The required untuned inhibition could arrive from interneurons in the hippocampus proper that often show very weak spatial tuning (Marshall et al., 2002). In addition to grid cell input, place cells are also thought to receive inputs from other cell types, such as border cells (Muessig et al., 2015) and other brain regions such as the medial septum (Wang et al., 2015).
Dissociation from continuous attractor network models
The observed spatial tuning patterns have also been explained by other models. In continuous attractor networks (CAN), each cell type could emerge from a specific recurrent connectivity pattern, combined with a mechanism that translates the motion of the animal into shifts of neural activity on an attractor. How the required connectivity patterns – which lie at the core of any CAN model – could emerge is subject to debate (Widloski and Fiete, 2014). Our model is qualitatively different in that it does not rely on attractor dynamics in a recurrent neural network, but on experience-dependent plasticity of spatially modulated afferents to an individual output neuron (Mehta et al., 2000). A measurable distinction of our model from CAN models is its response to a rapid global reduction of inhibition. While a modification of inhibition typically changes the grid spacing in CAN models of grid cells (Couey et al., 2013; Widloski and Fiete, 2015), the grid field locations generally remain untouched in our model. The grid fields merely change in size, until inhibition is recovered by inhibitory plasticity (Figure 7a). This can be understood by the colocalization of the grid fields and the peaks in the excitatory membrane current (Figure 7b,c). A reduction of inhibition leads to an increased protrusion of these excitatory peaks and thus to wider firing fields. Grid patterns in mEC are temporally stable in spite of dopaminergic modulations of GABAergic transmission (Cilz et al., 2014) and the spacing of mEC grid cells remains constant during the silencing of inhibitory interneurons (Miao et al., 2017). Both observations are in line with our model. Moreover, we found that for localized input tuning, the inhibitory membrane current typically also peaks at the locations of the grid fields. This co-tuning breaks down for nonlocalized input (Figure 7b). In contrast, CAN models predict that the inhibitory membrane current has the same periodicity as the grid (Schmidt-Hieber and Häusser, 2013), but possibly phase shifted.
The grid patterns of topologically nearby grid cells in the mEC typically have the same orientation and spacing but different phases (Hafting et al., 2005). Moreover, the coupling between anatomically nearby grid cells – for example their difference in spatial phase – is more stable to changes of the environment than the firing pattern of individual grid cells (Yoon et al., 2013). These properties are inherent to CAN models. In contrast, single cell models (Burgess et al., 2007; Kropff and Treves, 2008; Castro and Aguiar, 2014; Stepanyuk, 2015; Dordek et al., 2016; D'Albis and Kempter, 2017; Monsalve-Mercado and Leibold, 2017) require additional mechanisms to develop a coordination of neighboring grid cells. The challenge for any mechanism is to correlate the grid orientations, but leave the grid phases uncorrelated. The most obvious candidate, recurrent connections among different grid cells (Si et al., 2012), requires an intricate combination of mechanisms to perform this balancing act. We assume that an appropriate recurrent connectivity would not be simpler in our model.
CAN models predict that all grid fields in a conjunctive (grid × head direction) cell have the same head direction tuning, whereas our model predicts that there could be differences between different grid fields (Figure 6e). Our preliminary analysis suggests that an in-depth evaluation would require data for central grid fields without trajectory biases (Figure 6—figure supplement 2), which are at present not publicly available.
In addition, CAN models require that conjunctive (grid × head direction) cells are positively modulated by running speed. Such modulation has been observed in experiments (Kropff et al., 2015). In our model, we could introduce a running speed dependence, for example as a global modulation of the input signals. We expect that in this case, the output neuron would inherit speed tuning from the input but would otherwise develop similar spatial tuning patterns.
A recent analysis has shown that periodic firing of entorhinal cells in rats that move on a linear track can be assessed as slices through a hexagonal grid (Yoon et al., 2016), which arises naturally in a two-dimensional CAN model. In our model, we would obtain slices through a hexagonal grid if the rat learns the output pattern in two dimensions and afterwards is constrained to move on a linear track that is part of the same arena. If the rat learns the firing pattern on the linear track from scratch, the firing fields would be periodic.
Rapid appearance and rearrangement of grids
Models that learn grid cells from spatially tuned input do not have to assume a preexisting connectivity pattern or specific mechanisms for path integration (Burgess et al., 2007), but are challenged by the fast emergence of hexagonal firing patterns in unfamiliar environments (Hafting et al., 2005). Most plasticity-based models require slow learning, such that the animal explores the whole arena before significant synaptic changes occur. Therefore, grid patterns typically emerge more slowly than experimentally observed (Dordek et al., 2016). This delay is particularly pronounced in models that require an extensive exploration of both space and movement direction (Kropff and Treves, 2008; Franzius et al., 2007a; D'Albis and Kempter, 2017). In contrast to these models, which give center stage to the temporal statistics of the animal’s movement, our approach relies purely on the spatial statistics of the input and is hence insensitive to running speed.
For the mechanism we suggested, the self-organization was very robust and allowed rapid pattern formation on short time scales, similar to those observed in rodents (Figure 3). This speed could be further increased by accelerated reactivation of previous experiences during periods of rest (Lee and Wilson, 2002). By this means, the exploration time and the time it takes to activate all input patterns could be decoupled, leading to a much faster emergence of grid cells in all trajectory-independent models with associative learning. Other models that explain the emergence of grid patterns from place cell input through synaptic depression and potentiation also develop grid cells in realistic times (Castro and Aguiar, 2014; Stepanyuk, 2015; Monsalve-Mercado and Leibold, 2017). These models differ from ours in that they do not require inhibition, but instead specific forms of rate-dependent synaptic depression and potentiation that change the synaptic weights such that place cell-like input leads to grid cell-like output. How these models generalize to potentially nonlocalized input is yet to be shown.
Learning the required connectivity in CAN models can take a long time (Widloski and Fiete, 2014). However, as soon as the required connectivity and translation mechanism is established, a grid pattern would be observed immediately, even in a novel room. For different rooms this pattern could have different phases and orientations, but similar grid spacing (Fyhn et al., 2007). Similarly, we found that room switches in our model lead to grid patterns of the same grid spacing but different phases and orientations. The pattern emerges rapidly, but is not instantaneously present (Figure 3—figure supplement 2). It would be interesting to study whether rotation of a fraction of the input would lead to a bimodal distribution of grid rotations: No rotation and corotation with the rotated input, as recently observed in experiments where distal cues were rotated but proximal cues stayed fixed (Savelli et al., 2017).
Recently, it was discovered that in an arena separated by a wall, single grid cells form two independent grid patterns – one on each side – that coalesce once the wall is removed (Wernle et al., 2018; Rosay et al., in preparation). This coalescence is local, that is, grid fields close to the partition wall readjust, whereas grid fields far away do not change their locations. Feedforward models like ours can explain such a local rearrangement (Figure 4; Rosay et al., in preparation).
Boundary effects
Experiments show that the pattern and the orientation of grid cells is influenced by the geometry of the environment. In a quadratic arena, the orientation of grid cells tends to align – with a small offset – to one of the box axes (Stensola et al., 2015). In trapezoidal arenas, the hexagonality of grids is distorted (Krupic et al., 2015a). We considered quadratic and circular arenas with rat trajectories from behavioral experiments and found that the boundaries also distort the grid pattern in our simulations, particularly for localized inputs (Figure 2—figure supplement 3). In trapezoidal geometries, we expect this to lead to nonhexagonal grids. However, we did not observe a pronounced alignment to quadratic boundaries if the input place fields were randomly located (Figure 2—figure supplement 3).
Conclusion
We found that interacting excitatory and inhibitory plasticity serves as a simple and robust mechanism for rapid self-organization of stable and symmetric patterns from spatially modulated feedforward input. The suggested mechanism ports the robust pattern formation of attractor models from the neural to the spatial domain and increases the speed of self-organization of plasticity-based mechanisms to time scales on which the spatial tuning of neurons is typically measured. It will be interesting to explore how recurrent connections between output cells can help to understand the role of local inhibitory (Couey et al., 2013; Pastoll et al., 2013) and excitatory connections (Winterer et al., 2017) and the presence or absence of topographic arrangements of spatially tuned cells (O'Keefe et al., 1998; Stensola et al., 2012; Giocomo et al., 2014). We illustrated the properties and requirements of the model in the realm of spatial representations. As invariance and selectivity are ubiquitous properties of receptive fields in the brain, the interaction of excitatory and inhibitory synaptic plasticity could also be essential to form stable representations from sensory input in other brain areas (Constantinescu et al., 2016; Clopath et al., 2016).
Materials and methods
Code availability
The code for reproducing the essential findings of this article is available at https://github.com/simweb/spatial_patterns (Weber, 2018) under the GNU General Public License v3.0. A copy is archived at https://github.com/elifesciencespublications/spatial_patterns.
Network architecture and neuron model
We study a feedforward network where a single output neuron receives synaptic input from ${N}_{\mathrm{E}}$ excitatory and ${N}_{\mathrm{I}}$ inhibitory neurons (Figure 1a) with synaptic weight vectors ${\mathbf{w}}^{\mathrm{E}}\in {\mathbb{R}}^{{N}_{\mathrm{E}}}$, ${\mathbf{w}}^{\mathrm{I}}\in {\mathbb{R}}^{{N}_{\mathrm{I}}}$ and spatially tuned input rates ${\mathbf{r}}^{\mathrm{E}}(\mathbf{x})\in {\mathbb{R}}^{{N}_{\mathrm{E}}}$, ${\mathbf{r}}^{\mathrm{I}}(\mathbf{x})\in {\mathbb{R}}^{{N}_{\mathrm{I}}}$, respectively. Here $\mathbf{x}\in {\mathbb{R}}^{\mathrm{dimensions}}$ denotes the location and later also the head direction of the animal. For simplicity and to allow a mathematical analysis we use a rate-based description for all neurons. The firing rate of the output neuron is given by the rectified sum of weighted excitatory and inhibitory inputs:
$${r}^{\mathrm{out}}(\mathbf{x})={\left[\sum_{i=1}^{{N}_{\mathrm{E}}}{w}_{i}^{\mathrm{E}}{r}_{i}^{\mathrm{E}}(\mathbf{x})-\sum_{j=1}^{{N}_{\mathrm{I}}}{w}_{j}^{\mathrm{I}}{r}_{j}^{\mathrm{I}}(\mathbf{x})\right]}_{+},$$
where ${[\cdot ]}_{+}$ denotes a rectification that sets negative firing rates to zero. To comply with the notion of excitation and inhibition, all weights are constrained to be positive. In most simulations we use ${N}_{\mathrm{E}}=4{N}_{\mathrm{I}}$. Simulation parameters are shown in Tables 1–3 for the main figures and in Tables 4–6 for the supplementary figures.
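The rectified sum described above can be sketched in a few lines. This is a minimal illustration and is not taken from the published code; the function name `output_rate` and the plain-list representation of weights and rates are assumptions made here for readability.

```python
def output_rate(w_exc, w_inh, r_exc, r_inh):
    """Rectified difference of weighted excitatory and inhibitory input rates.

    w_exc, w_inh: synaptic weights (assumed non-negative) of the two populations.
    r_exc, r_inh: input firing rates at the current location.
    """
    excitation = sum(w * r for w, r in zip(w_exc, r_exc))
    inhibition = sum(w * r for w, r in zip(w_inh, r_inh))
    # Rectification: negative net input is mapped to a firing rate of zero.
    return max(excitation - inhibition, 0.0)
```

When inhibition exceeds excitation at a location, the function returns zero, which is what allows firing fields to be separated by silent regions.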
Excitatory and inhibitory plasticity
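In each unit time step ($\mathrm{\Delta}t=1$), the excitatory weights are updated according to a Hebbian rule:
$$\mathrm{\Delta}{\mathbf{w}}^{\mathrm{E}}={\eta}_{\mathrm{E}}\,{\mathbf{r}}^{\mathrm{E}}(\mathbf{x})\,{r}^{\mathrm{out}}(\mathbf{x}).$$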
The excitatory learning rate ${\eta}_{\mathrm{E}}$ is a constant that we chose individually for each simulation. To avoid unbounded weight growth, we use a quadratic multiplicative normalization, that is, we keep the sum of the squared weights of the excitatory population ${\sum}_{i=1}^{{N}_{\mathrm{E}}}{({w}_{i}^{\mathrm{E}})}^{2}$ constant at its initial value, by rescaling the weights after each unit time step. However, synaptic weight normalization is not a necessary ingredient for the emergence of firing patterns (Figure 2—figure supplement 4). We model inhibitory synaptic plasticity using a previously suggested learning rule (Vogels et al., 2011):
$$\mathrm{\Delta}{\mathbf{w}}^{\mathrm{I}}={\eta}_{\mathrm{I}}\,{\mathbf{r}}^{\mathrm{I}}(\mathbf{x})\left({r}^{\mathrm{out}}(\mathbf{x})-{\rho}_{0}\right),$$
with inhibitory learning rate ${\eta}_{\mathrm{I}}$ and target rate ${\rho}_{0}=1$ Hz. Negative inhibitory weights are set to zero.
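One unit time step of the two learning rules could look as follows. This is a sketch under stated assumptions and not the published implementation: the function name `plasticity_step` is hypothetical, and the squared norm of the excitatory weights before the update is used as a stand-in for its initial value (the two coincide if the norm is restored after every step).

```python
import math

def plasticity_step(w_exc, w_inh, r_exc, r_inh, eta_exc, eta_inh, rho0=1.0):
    """Apply one Hebbian excitatory and one inhibitory update; return new weights and rate."""
    # Output rate: rectified sum of weighted excitatory and inhibitory inputs.
    r_out = max(sum(w * r for w, r in zip(w_exc, r_exc))
                - sum(w * r for w, r in zip(w_inh, r_inh)), 0.0)

    # Hebbian update of the excitatory weights.
    norm_before = math.sqrt(sum(w * w for w in w_exc))
    new_exc = [w + eta_exc * r * r_out for w, r in zip(w_exc, r_exc)]

    # Quadratic multiplicative normalization: rescale to the previous norm.
    norm_after = math.sqrt(sum(w * w for w in new_exc))
    if norm_after > 0.0:
        new_exc = [w * norm_before / norm_after for w in new_exc]

    # Inhibitory rule: weights grow when the output exceeds the target rate,
    # shrink otherwise, and are clipped at zero.
    new_inh = [max(w + eta_inh * r * (r_out - rho0), 0.0)
               for w, r in zip(w_inh, r_inh)]
    return new_exc, new_inh, r_out
```

Calling this function once per position sample along a trajectory gives the basic simulation loop; the learning rates passed in are free parameters of the sketch.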
Rat trajectory
In the linear track model (one dimension, Figures 1 and 7), we create artificial run-and-tumble trajectories $x(t)$ constrained on a line of length $L$ with constant velocity $v$ = 1 cm per unit time step and persistence length $L/2$ (Appendix 1).
In the open arena model (two dimensions, Figures 2, 3, 5 and 7), we use trajectories $\mathbf{x}(t)$ from behavioral data (Sargolini et al., 2006b) of a rat that moved in a 1 m × 1 m quadratic enclosure (Appendix 1). In the simulations with a separation wall (Figure 4), we create trajectories as a two-dimensional persistent random walk (Appendix 1). In the model for neurons with head direction tuning (three dimensions, Figure 6), we use the same behavioral trajectories as in two dimensions and model the head direction as noisily aligned to the direction of motion (Appendix 1).
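For the linear track, a run-and-tumble trajectory could be generated along the following lines. The details are given in Appendix 1 and are not reproduced here, so this sketch rests on assumptions: the track is taken to span $[0, L]$, a tumble reverses the running direction with probability $v$ divided by the persistence length per step, and the walls reflect the animal. The function name `run_and_tumble` is hypothetical.

```python
import random

def run_and_tumble(n_steps, length, velocity=1.0, persistence=None, seed=0):
    """Return positions of a 1D run-and-tumble walk confined to [0, length]."""
    rng = random.Random(seed)
    if persistence is None:
        persistence = length / 2.0          # persistence length L/2, as in the text
    p_tumble = velocity / persistence       # assumed tumble probability per step
    x = rng.uniform(0.0, length)
    direction = rng.choice([-1, 1])
    positions = []
    for _ in range(n_steps):
        if rng.random() < p_tumble:
            direction = -direction          # tumble: reverse the running direction
        x += direction * velocity
        if x < 0.0:                         # reflect at the left wall
            x, direction = -x, 1
        elif x > length:                    # reflect at the right wall
            x, direction = 2.0 * length - x, -1
        positions.append(x)
    return positions
```

With this choice the mean run length between tumbles equals the persistence length, so the animal typically crosses a substantial part of the track before turning.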
Spatially tuned inputs
The firing rates of excitatory and inhibitory synaptic inputs ${r}_{i}^{\mathrm{E}},{r}_{j}^{\mathrm{I}}$ are tuned to the location $\mathbf{x}$ of the animal. In the following, we use $x$ and $y$ for the first and second spatial dimension and $z$ for the head direction.
For place field-like input, we use Gaussian tuning functions with standard deviation ${\sigma}_{\mathrm{E}}$, ${\sigma}_{\mathrm{I}}$ for the excitatory and inhibitory population, respectively. In Figure 5 the standard deviation is chosen independently along the $x$ and $y$ direction. The centers of the Gaussians are drawn randomly from a distorted lattice (Figure 2—figure supplement 5). This way we ensure random but spatially dense tuning. The lattice contains locations outside the box to reduce boundary effects.
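A one-dimensional sketch of such place field-like input is given below. How the lattice is distorted is specified in the figure supplement and not in the text above, so the uniform jitter of up to half a lattice spacing used here is an assumption, as are the function names and the `margin` parameter that extends the lattice beyond the box.

```python
import math
import random

def distorted_lattice_centers(n_fields, length, margin, jitter=0.5, seed=0):
    """Field centers on a regular lattice over [-margin, length + margin], randomly displaced."""
    rng = random.Random(seed)
    spacing = (length + 2.0 * margin) / n_fields
    return [-margin + (i + 0.5 + rng.uniform(-jitter, jitter)) * spacing
            for i in range(n_fields)]

def gaussian_rates(x, centers, sigma, height=1.0):
    """Firing rates of all input neurons of one population at location x."""
    return [height * math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) for mu in centers]
```

Because every center stays within its own lattice cell, the coverage remains dense while the exact field positions are random.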
For sparse nonlocalized input with ${N}_{\mathrm{P}}^{\mathrm{f}}$ fields per neuron of population $\mathrm{P}$, we first create ${N}_{\mathrm{P}}^{\mathrm{f}}$ distorted lattices, each with ${N}_{\mathrm{P}}$ locations. We then assign ${N}_{\mathrm{P}}^{\mathrm{f}}$ of the resulting ${N}_{\mathrm{P}}^{\mathrm{f}}{N}_{\mathrm{P}}$ locations at random and without replacement to each input neuron (see also Appendix 1).
For dense nonlocalized input, we convolve Gaussians with white noise and increase the resulting signal-to-noise ratio by setting the minimum to zero and the mean to 0.5 (Appendix 1). The Gaussian convolution kernels have different standard deviations for different populations. For each input neuron we use a different realization of white noise. This results in arbitrary tuning functions of the same autocorrelation length as the – potentially asymmetric – Gaussian convolution kernel. For grid cell-like input, we place Gaussians of standard deviation ${\sigma}_{\mathrm{E}}$ on the nodes of perfect hexagonal grids whose spacing and orientation is variable. In Figure 5b we draw the grid spacing of each input from a normal distribution of mean $6{\sigma}_{\mathrm{E}}$ and standard deviation ${\sigma}_{\mathrm{E}}/6$. The grid orientation was drawn from a uniform distribution between $-30$ and $30$ degrees.
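The construction of a dense nonlocalized tuning function can be illustrated in one dimension. The sketch below assumes a discretized track, uniform white noise between $-0.5$ and $0.5$, and a kernel truncated at four standard deviations; the function name `random_field_input` and the discretization parameters are hypothetical.

```python
import math
import random

def random_field_input(n_bins, dx, sigma, seed=0):
    """Gaussian-filtered white noise, rescaled to minimum 0 and mean 0.5."""
    rng = random.Random(seed)
    half = int(math.ceil(4.0 * sigma / dx))             # kernel half-width in bins
    kernel = [math.exp(-(j * dx) ** 2 / (2.0 * sigma ** 2))
              for j in range(-half, half + 1)]
    # Noise is padded on both sides so that every bin sees a full kernel.
    noise = [rng.uniform(-0.5, 0.5) for _ in range(n_bins + 2 * half)]
    field = [sum(kernel[j] * noise[i + j] for j in range(2 * half + 1))
             for i in range(n_bins)]
    lowest = min(field)
    shifted = [g - lowest for g in field]                # minimum becomes 0
    mean = sum(shifted) / n_bins
    return [0.5 * s / mean for s in shifted]             # mean becomes 0.5
```

Each input neuron would use a different seed, and the two populations would differ only in `sigma`, which sets the autocorrelation length of the resulting tuning function.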
For input with combined spatial and head direction tuning, we use the Gaussian tuning curves described above for the spatial tuning and von Mises distributions along the head direction dimension (Appendix 1).
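A conjunctive tuning function of this kind might be written as below. The exact parametrization is given in Appendix 1 and is not reproduced here, so several choices are assumptions: the von Mises factor is normalized to a peak of 1, spatial and directional tuning are combined as a product, and the concentration parameter `kappa` and the function names are hypothetical.

```python
import math

def von_mises_tuning(z, z0, kappa):
    """Von Mises shaped head direction tuning with preferred direction z0 (radians), peak 1."""
    return math.exp(kappa * (math.cos(z - z0) - 1.0))

def conjunctive_rate(x, y, z, center, sigma, z0, kappa, height=1.0):
    """Gaussian place tuning multiplied by von Mises head direction tuning."""
    cx, cy = center
    spatial = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
    return height * spatial * von_mises_tuning(z, z0, kappa)
```

The von Mises factor is periodic in $z$, which a Gaussian in the head direction would not be; a small `kappa` gives broad directional tuning and a large one gives sharp tuning.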
For all input tunings, the standard deviation of the firing rate is of the same order of magnitude as the mean firing rate (Appendix 1).
Initial synaptic weights and global reduction of inhibition
We specify a mean for the initial excitatory and inhibitory weights, respectively, and randomly draw each synaptic weight from the corresponding mean $\pm 5\%$. The excitatory mean is chosen such that the output neuron would fire above the target rate everywhere in the absence of inhibition; we typically take this mean to be 1 (Table 1 and Appendix 1). The mean inhibitory weight is then determined such that the output neuron would fire close to the target rate, if all the weights were at their mean value (Table 2 and Appendix 1). Choosing the weights this way ensures that initial firing rates are random, but neither zero everywhere, nor inappropriately high. We model a global reduction of inhibition by scaling all inhibitory weights by a constant factor, after the grid has been learned.
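The initialization described above can be sketched as follows. The sketch assumes that the ±5% spread is drawn uniformly and that the summed input rate of each population is approximately independent of location, so that a single number per population suffices; all function names are hypothetical.

```python
import random

def initial_weights(n, mean, spread=0.05, seed=0):
    """Draw n weights uniformly from mean * (1 - spread) to mean * (1 + spread)."""
    rng = random.Random(seed)
    return [mean * (1.0 + rng.uniform(-spread, spread)) for _ in range(n)]

def balanced_inhibitory_mean(mean_exc, summed_exc_rate, summed_inh_rate, rho0=1.0):
    """Mean inhibitory weight for which the output fires at rho0 when all weights are at their mean.

    summed_exc_rate, summed_inh_rate: sum over input neurons of the firing rates
    of the respective population at a typical location.
    """
    return (mean_exc * summed_exc_rate - rho0) / summed_inh_rate

def reduce_inhibition(w_inh, factor):
    """Global reduction of inhibition: scale every inhibitory weight by the same factor."""
    return [factor * w for w in w_inh]
```

With these values the mean excitatory drive minus the mean inhibitory drive equals the target rate, and the random spread makes the initial firing rate vary across the arena.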
Mathematical analysis of the learning rules
In the following, we derive the spacing of periodic firing patterns as a function of the simulation parameters for the linear track.
We first show that homogeneous weights, chosen such that the output neuron fires at the target rate, are a fixed point for the time evolution of excitatory and inhibitory weights under the assumption of slow learning. We then perturb this fixed point and study the time evolution of the perturbations in Fourier space. The translational invariance of the input overlap leads to decoupling of spatial frequencies and leaves a two-dimensional dynamical system for each spatial frequency. For smoother spatial tuning of inhibitory input than excitatory input, the eigenvalue spectrum of the dynamical system has a unique maximum, which indicates the most unstable spatial frequency. This frequency accurately predicts the grid spacing. We first consider place cell-like input (Gaussians) and then nonlocalized input (Gaussians convolved with white noise).
At the end of the analysis, you will find a glossary of the notation. Whenever we use P as a sub- or superscript instead of E or I, this implies that the equation holds for neurons of the excitatory and the inhibitory population.
The analysis is written as a detailed and comprehensible walkthrough. The reader who is interested only in the result can jump to Equations 78 and 104.
Assumption of slow learning
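The firing rate of the output neuron is the rectified weighted sum of excitatory and inhibitory input rates:
$${r}^{\mathrm{out}}={\left[{\mathbf{w}}^{\mathrm{E}}\cdot {\mathbf{r}}^{\mathrm{E}}-{\mathbf{w}}^{\mathrm{I}}\cdot {\mathbf{r}}^{\mathrm{I}}\right]}_{+},$$
where ${\left[\dots \right]}_{+}$ indicates that negative firing rates are set to zero.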
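Written as a differential equation, the excitatory learning rule with quadratic multiplicative normalization is given by:
$$\frac{\mathrm{d}{\mathbf{w}}^{\mathrm{E}}}{\mathrm{d}t}={\eta}_{\mathrm{E}}\left(\mathbb{1}-\frac{{\mathbf{w}}^{\mathrm{E}}{{\mathbf{w}}^{\mathrm{E}}}^{\mathrm{T}}}{\Vert {\mathbf{w}}^{\mathrm{E}}{\Vert}^{2}}\right){\mathbf{r}}^{\mathrm{E}}\,{r}^{\mathrm{out}},$$
where $\mathbb{1}$ is the ${N}_{\mathrm{E}}\times {N}_{\mathrm{E}}$ identity matrix. The projection operator $\frac{{\mathbf{w}}^{\mathrm{E}}{{\mathbf{w}}^{\mathrm{E}}}^{\mathrm{T}}}{\Vert {\mathbf{w}}^{\mathrm{E}}{\Vert}^{2}}$ ensures that the weights are constrained to remain on the hypersphere whose radius is determined by the initial value of the sum of the squares of all excitatory weights (Miller and MacKay, 1994). The inhibitory learning rule is given by:
$$\frac{\mathrm{d}{\mathbf{w}}^{\mathrm{I}}}{\mathrm{d}t}={\eta}_{\mathrm{I}}\,{\mathbf{r}}^{\mathrm{I}}\left({r}^{\mathrm{out}}-{\rho}_{0}\right).$$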
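We assume that the rat will learn slowly, such that it forages through the environment before significant learning (i.e. weight change) occurs. Therefore we can coarsen the time scale and rewrite Equations 5 and 6 as
$$\frac{\mathrm{d}{\mathbf{w}}^{\mathrm{E}}}{\mathrm{d}t}={\eta}_{\mathrm{E}}\left(\mathbb{1}-\frac{{\mathbf{w}}^{\mathrm{E}}{{\mathbf{w}}^{\mathrm{E}}}^{\mathrm{T}}}{\Vert {\mathbf{w}}^{\mathrm{E}}{\Vert}^{2}}\right){\langle {\mathbf{r}}^{\mathrm{E}}(x)\,{r}^{\mathrm{out}}(x)\rangle}_{x}$$
and
$$\frac{\mathrm{d}{\mathbf{w}}^{\mathrm{I}}}{\mathrm{d}t}={\eta}_{\mathrm{I}}{\langle {\mathbf{r}}^{\mathrm{I}}(x)\left({r}^{\mathrm{out}}(x)-{\rho}_{0}\right)\rangle}_{x},$$
respectively, where the spatial average, ${\langle \dots \rangle}_{x}$, is defined as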
and $L$ is the length of the linear track.
High density assumption and continuum limit for place cell-like input
We assume a high density of input neurons and formulate the system in continuous variables. More precisely, we assume the distance between two neighboring firing fields to be much smaller than the width of the firing fields, that is, $L/{N}_{\mathrm{P}}\ll {\sigma}_{\mathrm{P}}$. Furthermore, we assume that the linear track is very long compared with the width of the firing fields, that is, ${\sigma}_{\mathrm{P}}\ll L$.
We replace the neuron index with the continuous variable $\mu $ and denote the weight ${w}_{\mu}^{\mathrm{P}}$ and the tuning function ${r}^{\mathrm{P}}(\mu ,x)$ associated with a place field that is centered at $\mu $ in the continuum limit as:
The distance between two neighboring place fields is given by $\mathrm{\Delta}\mu =L/{N}_{\mathrm{P}}$. Thus, for sums over all neurons we get the following integral in the continuum limit:
We will switch between the discrete and continuous formulations, using whatever is more convenient.
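For place cell-like input we take Gaussian tuning curves:
$${r}_{i}^{\mathrm{P}}(x)={\alpha}_{\mathrm{P}}\,\mathrm{exp}\left(-\frac{{(x-{\mu}_{i})}^{2}}{2{\sigma}_{\mathrm{P}}^{2}}\right),$$
with center ${\mu}_{i}$, height ${\alpha}_{\mathrm{P}}$ and standard deviation ${\sigma}_{\mathrm{P}}$. Thus, in the continuum limit we get:
$${r}^{\mathrm{P}}(\mu ,x)={\alpha}_{\mathrm{P}}\,\mathrm{exp}\left(-\frac{{(x-\mu )}^{2}}{2{\sigma}_{\mathrm{P}}^{2}}\right).$$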
Because of the translational invariance of ${r}^{\mathrm{P}}(\mu ,x)$, integration over space gives the same result as integration over all center locations and the mean of all inputs is the same:
where we introduced ${M}_{\mathrm{P}}:={\alpha}_{\mathrm{P}}\sqrt{2\pi {\sigma}_{\mathrm{P}}^{2}}$ for the area under the tuning curves. Accordingly, we get a summarized input activity that is independent of location:
Equal weights form a fixed point
In the following, we will show that equal weights ${w}^{\mathrm{E}}(\mu )={w}_{0}^{\mathrm{E}}$ and ${w}^{\mathrm{I}}({\mu}^{\prime})={w}_{0}^{\mathrm{I}}$, $\forall \mu ,{\mu}^{\prime}$ form a fixed point if ${w}_{0}^{\mathrm{I}}$ is chosen such that the output neuron fires at the target rate, ${\rho}_{0}$, throughout the arena. With equal weights we get a constant firing rate ${r}_{0}^{\mathrm{out}}$,
which according to Equation 17 does not depend on $x$. Furthermore, according to Equation 14, ${\langle {r}_{i}^{\mathrm{P}}(x)\rangle}_{x}$ does not depend on the neuron index $i$. Now the stationarity of the excitatory weight evolution follows from Equation 7:
that is, excitatory weights are stationary for all values of ${w}_{0}^{\mathrm{E}}$ and ${w}_{0}^{\mathrm{I}}$ (here ${\delta}_{ij}$ denotes the Kronecker delta which is 1 if $i=j$ and 0 otherwise). This holds for all input functions for which ${\langle {r}_{j}^{\mathrm{E}}(x)\rangle}_{x}$ is independent of $j$. If ${r}^{\mathrm{out}}={\rho}_{0}$, it immediately follows from Equation 6 that $\frac{\mathrm{d}{w}^{\mathrm{I}}}{\mathrm{d}t}=0$, so the inhibitory weights are stationary if
which is fulfilled if
Linear stability analysis
In the following, we will show that the fixed point of equal weights, the homogeneous steady state, is unstable when the spatial tuning of inhibitory inputs is broader than that of the excitatory inputs. In this case, perturbations of the fixed point will grow and one particular spatial frequency will grow fastest. We will show that this spatial frequency predicts the spacing of the resulting periodic pattern (Figure 1g).
We perturb the fixed point
and look at the time evolution of the perturbations $\frac{\mathrm{d}\delta {w}^{\mathrm{E}}}{\mathrm{d}t}$ and $\frac{\mathrm{d}\delta {w}^{\mathrm{I}}}{\mathrm{d}t}$ of the excitatory and inhibitory weights around the fixed point.
Close to the fixed point the output neuron fires around the target rate ${\rho}_{0}$. We thus ignore the rectification in Equation 4, that is, ${r}^{\mathrm{out}}={\rho}_{0}+\delta {r}^{\mathrm{out}}$, with $\delta {r}^{\mathrm{out}}={\sum}_{k}\delta {w}_{k}^{\mathrm{E}}{r}_{k}^{\mathrm{E}}-{\sum}_{{k}^{\prime}}\delta {w}_{{k}^{\prime}}^{\mathrm{I}}{r}_{{k}^{\prime}}^{\mathrm{I}}$.
Time evolution of perturbations of the inhibitory weights
We start with the time evolution of the inhibitory weight perturbations:
where only the rates ${\mathbf{\mathbf{r}}}^{\mathrm{P}}$ depend on $x$. Intuitively, the first term in Equation 29 means that the rate of change of the inhibitory weight perturbation of the weight associated with one location depends on the excitatory perturbations of the weights associated with every other location, weighted with the overlap (the cross correlation) of the two associated tuning functions (analogous for inhibitory weight perturbations). In the continuum limit, the sums are:
where we introduce overlap kernels
The overlap ${\langle {r}^{\mathrm{P}}(\mu ){r}^{{\mathrm{P}}^{\prime}}({\mu}^{\prime})\rangle}_{x}$ depends only on the distance of the Gaussian fields, that is,
Taking $L\to \mathrm{\infty}$, the time evolution of the perturbations of the inhibitory weights can thus be written as convolutions:
where $*$ denotes a convolution.
Time evolution of perturbations of the excitatory weights
To derive the time evolution of the excitatory weights, we first show that the weight normalization term in Equation 7, expressed through the projection operator ${P}_{ij}=\frac{{w}_{i}{w}_{j}}{{\sum}_{k}{w}_{k}^{2}}$, leads to a term that balances homogeneous weight perturbations and a term that can be neglected in the continuum limit.
Let $P$ be the projection operator responsible for the normalization of the excitatory weights by projecting a weight update onto a vector that is orthogonal to the hypersphere of constant ${\sum}_{i=1}^{{N}_{\mathrm{E}}}{({w}_{i}^{\mathrm{E}})}^{2}$. We now determine the projection operator around the fixed point (we drop the index ‘E’ in the following, to improve readability):
Using Taylor’s theorem
and ${w}_{l}={w}_{0}\mathrm{\forall}l$, we get
In summary this gives:
Using the perturbed projection operator Equation 39 with Equation 7, we obtain the time evolution of the excitatory weight perturbation to linear order:
Term $(1)$ in Equation 44 has a similar structure as in the inhibitory case (Equation 27), and will lead to analogous convolutions. The second term is given by
and the third term by
where $\delta (\mu -{\mu}^{\prime})$ denotes the Dirac delta function. Together, this leads to the time evolution of the excitatory weight perturbations:
We now assume $L\gg {\sigma}_{\mathrm{P}}$ and write everything as convolutions, also trivial ones:
Decoupling of spatial frequencies
The convolutions in Equations 34 and 60 show how the excitatory and inhibitory weight perturbations at one location influence the time evolution of weights at every other location. Transforming the system to frequency space leads to a drastic simplification: The time evolution of a perturbation of a particular spatial frequency depends only on the excitatory and inhibitory perturbation of the same spatial frequency, that is, the Fourier components decouple. We define the Fourier transform $f(k)\equiv \mathcal{F}[f(\mu )]$ with wavevector $k$ of a function $f(\mu )$ of location $\mu $ as:
and note that
Using the Convolution theorem and the linearity of the Fourier transform we get
and
The $\delta (k)$ term in Equation 63 balances homogeneous perturbations in such a way that the output neuron would still fire at the target rate, if not for perturbations at other frequencies. In the following, we drop this term, because we are not interested in spatially homogeneous perturbations. Moreover, the continuum limit is valid only for high densities: ${N}_{\mathrm{P}}/L\to \mathrm{\infty}$. We can thus drop terms of lower order than ${N}_{\mathrm{P}}/L$, which eliminates the $\frac{{\eta}_{\mathrm{E}}{\rho}_{0}{M}_{\mathrm{E}}}{{w}_{0}^{\mathrm{E}}L}$ term. Writing the remaining terms of Equations 63 and 64 as a matrix leads to:
which no longer contains terms from the weight normalization. The characteristic polynomial of the above matrix is:
The difference, ${K}^{\mathrm{E}\mathrm{I}}{K}^{\mathrm{I}\mathrm{E}}-{K}^{\mathrm{E}\mathrm{E}}{K}^{\mathrm{I}\mathrm{I}}$, vanishes for Gaussian input, because:
where we completed the square and used ${\int}_{-\mathrm{\infty}}^{+\mathrm{\infty}}{e}^{-a{x}^{2}}\,\mathrm{d}x=\sqrt{\frac{\pi}{a}}$. Taking the Fourier transform and completing the square again gives
and thus ${K}^{\mathrm{E}\mathrm{I}}{K}^{\mathrm{I}\mathrm{E}}-{K}^{\mathrm{E}\mathrm{E}}{K}^{\mathrm{I}\mathrm{I}}=0$.
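The cancellation can be made explicit with a short worked calculation. The following is a sketch under assumptions rather than a reproduction of the numbered equations: the Fourier convention $\int \cdot \,{e}^{-ik\mu}\,\mathrm{d}\mu$ is assumed, $L\to \mathrm{\infty}$ is taken in the integrals, and the prefactors ${C}_{\mathrm{P}^{\prime}}$ stand for whatever constants (such as density factors) multiply the overlap, assumed to depend only on the second population index.

```latex
% Overlap of two Gaussian tuning curves with centers \mu and \mu':
\int_{-\infty}^{+\infty}
  \alpha_{P}\, e^{-\frac{(x-\mu)^2}{2\sigma_{P}^2}}\,
  \alpha_{P'}\, e^{-\frac{(x-\mu')^2}{2\sigma_{P'}^2}}\,\mathrm{d}x
= \alpha_{P}\alpha_{P'}
  \sqrt{\frac{2\pi\,\sigma_{P}^2\sigma_{P'}^2}{\sigma_{P}^2+\sigma_{P'}^2}}\;
  e^{-\frac{(\mu-\mu')^2}{2(\sigma_{P}^2+\sigma_{P'}^2)}}

% Fourier transform with respect to the distance \mu-\mu',
% using M_P = \alpha_P \sqrt{2\pi\sigma_P^2}:
K^{PP'}(k) = C_{P'}\, M_{P} M_{P'}\;
  e^{-\frac{k^2(\sigma_{P}^2+\sigma_{P'}^2)}{2}}

% Both products then share the same prefactor and the same exponent:
K^{EI}K^{IE} = C_{I}C_{E}\, M_{E}^2 M_{I}^2\, e^{-k^2(\sigma_{E}^2+\sigma_{I}^2)}
             = K^{EE}K^{II}
```

The determinant of the two-dimensional system therefore vanishes at every $k$, so one eigenvalue is zero and the other equals the trace of the matrix.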
For $\mathrm{P}={\mathrm{P}}^{\prime}$, Equation 70 simplifies to:
This leads to the eigenvalues:
which are shown in Figure 8a. Perturbations with spatial frequencies for which ${\lambda}_{1}(k)$ is positive will grow. Setting $\frac{\mathrm{d}{\lambda}_{1}(k)}{\mathrm{d}k}=0$ gives the wavevector ${k}_{\mathrm{max}}$ of the Fourier component that grows fastest:
Assuming that the fastest-growing spatial frequency from the linearized system will prevail, the final spacing of the periodic pattern, $\mathrm{\ell}$, is determined by:
Equation 78 is in exact agreement with the grid spacing obtained in simulations (Figure 1g). Moreover, it indicates the bifurcation point: When excitation is as smooth as inhibition (${\sigma}_{\mathrm{E}}={\sigma}_{\mathrm{I}}$), there is no unstable spatial frequency anymore and every perturbation gets balanced (Figure 1g; compare Equation 103). The grid spacing also depends on the ratio of the inhibitory and excitatory parameters ${\eta}^{\mathrm{P}},{N}_{\mathrm{P}},{\alpha}_{\mathrm{P}}$ (logarithmic term in Equation 78). We confirm this dependence with simulations on the linear track where we increase either ${\eta}_{\mathrm{I}}$ or ${N}_{\mathrm{I}}$ or ${\alpha}_{\mathrm{I}}^{2}$ such that the product $\gamma ={\eta}_{\mathrm{I}}{N}_{\mathrm{I}}{\alpha}_{\mathrm{I}}^{2}$ increases with respect to the initial product ${\gamma}_{0}$. We find a good agreement with the theoretical prediction for all three variations (Figure 8b).
Note that the term ${\eta}^{\mathrm{P}}{M}_{\mathrm{P}}^{2}{N}_{\mathrm{P}}$ in the logarithm in Equation 78 is essentially a factor that determines the rate of weight change of population $\mathrm{P}$: ${\eta}^{\mathrm{P}}$ is just the scaling factor; ${M}_{\mathrm{P}}$ is the mass under a tuning function (with quadratic influence: once directly through the firing rate of the input, once through the increased firing rate of the output neuron); ${N}_{\mathrm{P}}$ is the number of tuning functions. The remaining ${\sigma}_{\mathrm{P}}^{2}$ originates specifically from the Gaussian shape of the tuning functions.
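How the fastest-growing wavevector translates into a grid spacing can be checked numerically. The sketch below rests on an assumption about the form of the leading eigenvalue for Gaussian input, namely ${\lambda}_{1}(k)\propto {\eta}^{\mathrm{E}}{N}_{\mathrm{E}}{M}_{\mathrm{E}}^{2}{e}^{-{k}^{2}{\sigma}_{\mathrm{E}}^{2}}-{\eta}^{\mathrm{I}}{N}_{\mathrm{I}}{M}_{\mathrm{I}}^{2}{e}^{-{k}^{2}{\sigma}_{\mathrm{I}}^{2}}$, which is consistent with the description of the logarithmic term above but is not copied from the numbered equations. The parameter values and function names are illustrative only.

```python
import math

def population_factor(eta, n, alpha, sigma):
    """eta * N * M^2, with M = alpha * sqrt(2 pi sigma^2) the area under one tuning curve."""
    return eta * n * alpha ** 2 * 2.0 * math.pi * sigma ** 2

def leading_eigenvalue(k, exc, inh):
    """Assumed growth rate of a perturbation with wavevector k, up to a common positive factor."""
    return (population_factor(**exc) * math.exp(-(k * exc["sigma"]) ** 2)
            - population_factor(**inh) * math.exp(-(k * inh["sigma"]) ** 2))

def fastest_growing_wavevector(exc, inh):
    """Solve d(lambda_1)/dk = 0; requires sigma_I > sigma_E and a ratio larger than 1."""
    ratio = ((population_factor(**inh) * inh["sigma"] ** 2)
             / (population_factor(**exc) * exc["sigma"] ** 2))
    return math.sqrt(math.log(ratio) / (inh["sigma"] ** 2 - exc["sigma"] ** 2))

def predicted_spacing(exc, inh):
    """Spacing of the periodic pattern, 2 pi / k_max."""
    return 2.0 * math.pi / fastest_growing_wavevector(exc, inh)

# Illustrative parameters (lengths in meters); not taken from the parameter tables.
exc = {"eta": 1.0, "n": 4, "alpha": 1.0, "sigma": 0.03}
inh = {"eta": 10.0, "n": 1, "alpha": 1.0, "sigma": 0.10}

# Brute-force maximum of the assumed eigenvalue on a grid of wavevectors.
k_grid = [0.01 * i for i in range(1, 10001)]
k_numeric = max(k_grid, key=lambda k: leading_eigenvalue(k, exc, inh))
```

The brute-force maximum and the closed-form wavevector should coincide up to the grid resolution. In this form the spacing grows with the difference ${\sigma}_{\mathrm{I}}^{2}-{\sigma}_{\mathrm{E}}^{2}$ and depends only logarithmically on the learning rates, input numbers and heights.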
Analysis for nonlocalized input (Gaussian random fields)
Above, we derived the time evolution of perturbations of excitatory and inhibitory weights for place field-like input, that is, Gaussian tuning curves. In the following we conduct a similar analysis, using nonlocalized input, that is, random functions with a given spatial autocorrelation length. We show that the grid spacing is predicted by an equation that is equivalent to Equation 78.
The nonlocalized input ${r}_{i}^{\mathrm{P}}$ for input neuron $i$ of population $\mathrm{P}$ was obtained by rescaling a Gaussian random field (GRF) ${g}_{i}^{\mathrm{P}}$ to mean $1/2$ and minimum 0:
where ${\mathrm{min}}_{x}$ denotes the minimum over all locations and the GRF ${g}_{i}^{\mathrm{P}}$ is obtained by convolving a Gaussian ${\mathcal{G}}^{\mathrm{P}}(x)=\mathrm{exp}(-{x}^{2}/2{\sigma}_{\mathrm{P}}^{2})$ with white noise ${\xi}_{i}$ from a uniform distribution between $-0.5$ and $0.5$:
As the white noise has zero mean, the spatial average of a GRF is also 0 in expectation:
The individual minima ${\mathrm{min}}_{x}{g}_{i}^{\mathrm{P}}(x)$ in Equation 79 would complicate the subsequent analysis. If we again consider infinitely large systems $L\to \mathrm{\infty}$ with infinite density ${N}_{\mathrm{P}}/L\to \mathrm{\infty}$, Equation 79 simplifies. The mean of the distribution of GRF minima over different input neurons scales logarithmically with the number of samples (Bovier, 2005). Here the number of samples corresponds to the number of minima in a GRF, which scales inversely with the width of the convolution kernel that was used to obtain the GRF:
In the continuum limit the variance of the minima distribution over cells decreases and the relative difference between the mean minimum value of excitation and inhibition vanishes (Figure 8c):
NB: For the argument it does not matter whether the minimum scales purely logarithmically or with ${\mathrm{log}}^{\gamma}$, where $\gamma$ is any exponent.
Thus, we take the minimum value as a constant $m$, which depends neither on the population nor on the input neuron. This leads to a simplified expression of the input tuning functions:
As $\langle {r}_{i}^{\mathrm{P}}\rangle =0.5$ is independent of $i$, equal excitatory weights are a fixed point for the excitatory learning rule Equation 7 as described in Equation 19. Moreover, the sum over all input neurons does not depend on the location:
Therefore, given constant excitatory weights, all inhibitory weights can be set to a value ${w}_{0}^{\mathrm{I}}$ such that the output neuron fires at the target rate, that is, homogeneous weights are a fixed point of the learning rules, as in the scenario with Gaussian input. Moreover, Equation 29 holds also for GRF input. The analysis of the projection operator (see above) of the weight normalization lead to a term of homogeneous weight perturbations and a term that could be neglected in the high density limit. We now omit these terms a priori. The time evolution of excitatory and inhibitory weight perturbations can thus be summarized as (compare Equations 29 and 44):
The above equation describes the time evolution of each synaptic weight. For the Gaussian input of the earlier sections, each synaptic weight is associated with one location. In the continuum limit we thus identified the synaptic weight associated with location $\mu $ with ${w}^{\mathrm{P}}(\mu )$. An increase of ${w}^{\mathrm{E}}(\mu )$ corresponded to an increase in firing at location $\mu $ (and in the surrounding, given by the width of the Gaussian of the excitatory tuning). Analogously, an increase of ${w}^{\mathrm{I}}(\mu )$ caused a decrease in firing at location $\mu $ (and in the surrounding, given by the width of the Gaussian of the inhibitory tuning). Because of the nonlocalized tuning of GRF input, each synaptic weight has an influence on the firing rate at many locations. The influence of neuron $i$ of population $\mathrm{P}$ at location $\mu $ is expressed by ${\xi}_{i}^{\mathrm{P}}(\mu )$. If one wanted to increase the firing rate at a specific location $\mu $ – and not just everywhere – one would thus increase all excitatory weights with high ${\xi}_{i}^{\mathrm{P}}(\mu )$ and decrease all excitatory weights with low ${\xi}_{i}^{\mathrm{P}}(\mu )$ (note that ${\xi}^{\mathrm{P}}$ can also be negative). The ‘weight’ that corresponds to location $\mu $ is thus expressed as:
where we weight each synaptic weight with the value of the corresponding white noise at location $\mu $. This corresponds to expressing the weights in a basis that is associated with the location and not with the individual input neurons. Combining Equation 88 and Equation 87 gives the time evolution of the weight perturbations associated with location $\mu $:
We now look at the first term of the above equation, the second term will be treated analogously:
The sum containing the white noise can be simplified using the zero mean property and the expression for the variance of uniform white noise:
where $\beta $ is a proportionality constant that does not depend on the population type $\mathrm{P}$. The Dirac delta $\delta ({x}^{\prime}-\mu )$ occurs because the white noise at different locations is uncorrelated. The sum of the product of weight perturbations and input rates can be rewritten as:
The first term is independent of location $x$ and thus will lead only to spatially homogeneous perturbations, which we do not consider in the following. Inserting Equations 94 and 95 and the analogous terms for inhibition in Equation 91 leads to:
where we introduce kernels for the translation invariant overlap between two Gaussians with different centers (similar to Equation 32):
Equation 89 can thus be written as:
which leads to a dynamical system for the Fourier components of the weight perturbations that is equivalent to Equation 65 with eigenvalues:
Thus, we get the same expression for the grid spacing as in the scenario of Gaussian input (with ${\alpha}_{\mathrm{E}}={\alpha}_{\mathrm{I}}=1$):
Glossary
A summary of notation:
Appendix 1
Rat trajectory
In the linear track model (one dimension, Figure 1), we create artificial trajectories $x(t)$. The rat moves along a line of length $L$ with constant velocity $v$ = 1 cm per unit time step $\mathrm{\Delta}t$ = 1. The rat always inverts its direction of motion when it hits either end of the enclosure at $-L/2$ or $L/2$. Additionally, in each unit time step it inverts its direction with a probability of $2v\mathrm{\Delta}t/L$, resulting in a typical persistence length of $L/2$.
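The linear track trajectory described above can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function name, the default track length of 100 cm, the starting position at the center, and the seed are assumptions made for the example.

```python
import numpy as np

def linear_track_trajectory(n_steps, L=100.0, v=1.0, dt=1.0, seed=0):
    """Persistent 1D random walk on [-L/2, L/2] with reflecting ends.

    L (cm), the start position and the seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    p_flip = 2.0 * v * dt / L      # per-step probability of inverting direction
    x = np.empty(n_steps)
    pos, direction = 0.0, 1.0      # assumed start: center, moving right
    for t in range(n_steps):
        if rng.random() < p_flip:
            direction = -direction
        pos += direction * v * dt
        # Invert the direction of motion at either end of the track.
        if pos > L / 2:
            pos = L - pos
            direction = -1.0
        elif pos < -L / 2:
            pos = -L - pos
            direction = 1.0
        x[t] = pos
    return x
```

With a flip probability of $2v\mathrm{\Delta}t/L$ per step, the expected distance between spontaneous reversals is $L/2$, which matches the persistence length stated in the text.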
In the open arena model (two dimensions, Figures 2, 3 and 5), we take trajectories $\mathbf{x}(t)$ from behavioral data (Sargolini et al., 2006b) of a rat that moved in a 1 m × 1 m square enclosure. The data provide coherent trajectories in intervals of 10 min. To get a 10 hr trajectory, we concatenate 60 individual trajectories. Different trajectories in our simulations correspond to different random orders of concatenation. A 10 min trajectory contains 30,000 locations. We update the location in every unit time step. A time step thus corresponds to 20 ms. For simulations with a separation wall (Figure 4), we use a persistent random walk to constrain the motion of the rat to either side of the arena (see below).
In the model for neurons with head direction tuning (three dimensions, Figure 6), we use the same behavioral trajectories as in two dimensions. To account for the experimental observation that the head direction of the animal is only roughly aligned with the direction of motion, we model the head direction as the direction of motion plus a random angle that is drawn in each unit time step from a normal distribution with standard deviation $\pi /6$.
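As a rough sketch of this step, the head direction can be derived from a sequence of 2D positions by adding Gaussian angular noise to the direction of motion. The function name, the seed, and the wrapping of angles to $[-\pi, \pi)$ are assumptions for illustration; the paper maps head direction onto an interval of length $L$ instead.

```python
import numpy as np

def head_direction(xy, sigma=np.pi / 6, seed=0):
    """Noisy head direction from a trajectory xy of shape (n_steps, 2).

    Returns n_steps - 1 angles, wrapped to [-pi, pi) (an assumed convention).
    """
    rng = np.random.default_rng(seed)
    dxy = np.diff(xy, axis=0)
    motion_angle = np.arctan2(dxy[:, 1], dxy[:, 0])   # direction of motion
    noisy = motion_angle + rng.normal(0.0, sigma, size=motion_angle.shape)
    return (noisy + np.pi) % (2.0 * np.pi) - np.pi
```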
In all dimensions and for the learning rates under consideration, we find that the precise trajectory of the rat has only a small influence on the results (see also Figure 2—figure supplement 1).
Spatial tuning of input neurons
The firing rates of excitatory and inhibitory synaptic inputs ${r}_{i}^{\mathrm{E}}$, ${r}_{j}^{\mathrm{I}}$ are tuned to the location $x$ of the animal. In the following, we use $x$ and $y$ for the first and second spatial dimensions and $z$ for the head direction. The values of $x,y,z$ are in the range $[-L/2,L/2]$. Note that we take the interval of length $L$ even for the dimension of head direction to have spatial and head direction input at the same scale. In the interpretation of head direction input, the periodic interval is to be understood as the full circle of 360 degrees.
We analyzed three different kinds of input tuning functions: place cells (single Gaussians), several place fields (sum of multiple Gaussians) and nonlocalized input (Gaussians convolved with white noise). We summarize the tuning functions of neurons from the excitatory and the inhibitory population by referring to them as population $\mathrm{P}$, where $\mathrm{P}\in \{\mathrm{E},\mathrm{I}\}$.
For readability, we define a Gaussian of height 1 with standard deviation ${\sigma}_{\mathrm{P}}$:
The input function of the $i$th neuron of population $\mathrm{P}$ with ${N}_{\mathrm{P}}^{\mathrm{f}}$ place fields per input neuron in one dimension is then given by:
where ${\mu}_{i,\beta}^{\mathrm{P}}$ denotes the center location of field number $\beta$ of input neuron $i$ of population $\mathrm{P}$. The scenario of place cell-like inputs is obtained by setting ${N}_{\mathrm{P}}^{\mathrm{f}}=1$.
For higher dimensions we define the center components as ${\mathit{\mu}}_{i,\beta}^{\mathrm{P}}=({\mu}_{i,\beta ,x}^{\mathrm{P}},{\mu}_{i,\beta ,y}^{\mathrm{P}},{\mu}_{i,\beta ,z}^{\mathrm{P}})$. In two dimensions, the tuning of the $i$th neuron of population $\mathrm{P}$ with ${N}_{\mathrm{P}}^{\mathrm{f}}$ place fields per input neuron is thus given by:
Here, the two one-dimensional Gaussians can have different standard deviations along different axes, ${\sigma}_{\mathrm{P},x}$ and ${\sigma}_{\mathrm{P},y}$, respectively. For simplicity, we constrain the resulting elliptic bell-shaped curve to be aligned with the $x$ or $y$ axis.
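The Gaussian place field input described above (a sum of axis-aligned Gaussians of height 1, one per field) can be sketched as follows. The function name and array layout are assumptions for illustration; the same function covers one dimension when the last axis has length 1.

```python
import numpy as np

def place_field_input(pos, centers, sigma):
    """Firing rates of input neurons with axis-aligned Gaussian fields.

    pos:     (n_pos, d) locations
    centers: (n_neurons, n_fields, d) field centers
    sigma:   (d,) standard deviation along each axis
    Returns an array of shape (n_pos, n_neurons).
    """
    pos = np.atleast_2d(np.asarray(pos, dtype=float))
    centers = np.asarray(centers, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    diff = pos[:, None, None, :] - centers[None, :, :, :]
    exponent = -0.5 * np.sum((diff / sigma) ** 2, axis=-1)
    return np.exp(exponent).sum(axis=-1)   # sum over the fields of each neuron
```

Because the exponent is a sum over axes, the 2D tuning is the product of two one-dimensional Gaussians with possibly different widths, as stated in the text.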
In three dimensions we also consider bell-shaped tuning functions along the $z$-direction. However, as the head direction component is periodic, we take von Mises functions that are periodic in the interval $[-L/2,L/2]$:
In the interpretation of head direction input, the periodic interval is to be understood as the full circle of 360 degrees. In three dimensions, the tuning of the $i$th neuron of population $\mathrm{P}$ with ${N}_{\mathrm{P}}^{\mathrm{f}}$ place fields per input neuron is thus given by:
The center locations ${\mu}^{\mathrm{P}}$ for neurons of type $\mathrm{P}$ in an enclosure of side length $L$ are drawn from a randomly distorted lattice (Figure 2—figure supplement 5). First, the total number of input neurons is factorized into its dimensional components ${N}_{\mathrm{P}}={N}_{\mathrm{P},x}{N}_{\mathrm{P},y}{N}_{\mathrm{P},z}$. Then, for example along the $x$ dimension, center locations of neurons of population $\mathrm{P}$ are placed equidistantly in $[-\frac{L}{2}-3{\sigma}_{\mathrm{P},x},\frac{L}{2}+3{\sigma}_{\mathrm{P},x}]$. Allowing the field centers to lie a multiple of their standard deviation outside the box reduces boundary effects. Each point on the equidistant lattice is subsequently distorted with noise drawn from a uniform distribution whose range is given by the distance between two points on the undistorted lattice, that is, $[-\frac{L}{2({N}_{\mathrm{P},x}-1)},\frac{L}{2({N}_{\mathrm{P},x}-1)}]$; see Figure 2—figure supplement 5. Other dimensions are treated analogously. This procedure ensures a random, but still dense, coverage of the arena with few place fields. A truly random distribution of centers leads to similar results (not shown), but requires more input neurons to cover the arena densely. We create ${N}_{\mathrm{P}}^{\mathrm{f}}$ such distorted lattices. To each input neuron we assign one center location from each of the ${N}_{\mathrm{P}}^{\mathrm{f}}$ lattices at random and without replacement. This guarantees that each input neuron has ${N}_{\mathrm{P}}^{\mathrm{f}}$ randomly located fields that together cover the arena densely.
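The lattice construction can be sketched in two dimensions as follows. This is an illustrative reading of the procedure, not the authors' code; the function name, argument names, and seed are assumptions. Each lattice is distorted independently, and a random permutation of its rows assigns one center per neuron without replacement.

```python
import numpy as np

def field_centers(n_x, n_y, n_fields, L, sigma_x, sigma_y, seed=0):
    """Field centers on randomly distorted lattices (2D sketch).

    Returns an array of shape (n_x * n_y, n_fields, 2): one center per
    neuron from each of the n_fields lattices.
    """
    rng = np.random.default_rng(seed)
    n = n_x * n_y
    half_x = L / (2.0 * (n_x - 1))    # half-range of the uniform distortion
    half_y = L / (2.0 * (n_y - 1))
    centers = np.empty((n, n_fields, 2))
    for f in range(n_fields):
        # Equidistant lattice extending 3 sigma beyond the box on each side.
        gx, gy = np.meshgrid(
            np.linspace(-L / 2 - 3 * sigma_x, L / 2 + 3 * sigma_x, n_x),
            np.linspace(-L / 2 - 3 * sigma_y, L / 2 + 3 * sigma_y, n_y),
            indexing="ij",
        )
        lattice = np.stack([gx.ravel(), gy.ravel()], axis=1)
        lattice[:, 0] += rng.uniform(-half_x, half_x, size=n)
        lattice[:, 1] += rng.uniform(-half_y, half_y, size=n)
        # Shuffle rows: each neuron receives one center, without replacement.
        centers[:, f, :] = rng.permutation(lattice)
    return centers
```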
We obtain dense nonlocalized input by convolving Gaussians as in Equations 105 and 107 (with ${N}_{\mathrm{P}}^{\mathrm{f}}=1$) with uniform white noise between −0.5 and 0.5. For the discretization we choose ${\sigma}_{\mathrm{P}}/20$ and center the Gaussian convolution kernel on an array of eight times its standard deviation. We convolve this array with a sufficiently large array of white noise such that we keep only the values where the array of the convolution kernel is inside the array of the white noise. This way we avoid boundary effects at the edges. From the resulting function we subtract its minimum and then divide by twice the mean of the difference between the function and its minimum. This increases the signal to noise ratio and ensures that all of the inputs have a mean value of 0.5 across the arena and a minimum at 0. For each input neuron we take a different realization of white noise. This results in arbitrary tuning functions of the same autocorrelation length as the Gaussian convolution kernel. We define the autocorrelation length as the distance at which the autocorrelation has decayed to $1/e$ of its maximum, where $e$ is Euler’s number. The above mentioned also holds for circular enclosures, but we drop all field centers outside of a circle of radius $L/2+3{\sigma}_{\mathrm{P}}$ because they never get activated. This is not necessary but it reduces simulation time.
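A one-dimensional sketch of this construction is given below. The function name, the default values of `L` and `sigma`, and the seed are assumptions for illustration. The discretization step of ${\sigma}_{\mathrm{P}}/20$, the kernel array spanning eight standard deviations, and the rescaling follow the description above.

```python
import numpy as np

def grf_input_1d(L=2.0, sigma=0.1, seed=0):
    """Nonlocalized input: Gaussian kernel convolved with uniform white noise.

    Returns locations x in [-L/2, L/2] and a tuning function r with
    minimum 0 and mean 0.5.
    """
    rng = np.random.default_rng(seed)
    dx = sigma / 20.0
    # Gaussian kernel on an array spanning eight standard deviations.
    k = np.arange(-4.0 * sigma, 4.0 * sigma + dx / 2.0, dx)
    kernel = np.exp(-k ** 2 / (2.0 * sigma ** 2))
    n_out = int(round(L / dx)) + 1
    # The noise array is larger than the arena, so "valid" mode keeps only
    # positions where the kernel lies fully inside the noise (no edge effects).
    noise = rng.uniform(-0.5, 0.5, size=n_out + kernel.size - 1)
    g = np.convolve(noise, kernel, mode="valid")
    shifted = g - g.min()                      # minimum at 0
    r = shifted / (2.0 * shifted.mean())       # mean of 0.5
    x = np.linspace(-L / 2.0, L / 2.0, n_out)
    return x, r
```

Calling the function with a different seed for each input neuron gives a different realization of white noise per neuron, as described in the text.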
Learning two sides of a room independently
In Figure 4 we simulated a rat that explores each half of an arena that is divided by a wall. Then the wall is removed and the animal explores the entire arena. This setup was inspired by recent experiments (Wernle et al., 2018) and simulations (Rosay et al., in preparation; Mégevand, 2013). To simulate the two separated compartments, we use two independent sets of inputs, that is, place cells that are randomly distributed around the entire arena (AB). One set is active when the rat explores the first compartment, the other set is active when the rat explores the second compartment. Both sets are active when the wall is removed. If we used a single set of inputs, the grids would be merged, even before the wall was removed. The excitatory synaptic weights of the two sets are normalized independently. This is important, because otherwise the synaptic weights of inputs that are only active when the rat is in compartment A would die out while the rat explored compartment B.
To constrain the motion of the rat to one side of the arena, we create artificial rat trajectories as a persistent random walk with velocity $v$ along a direction vector $(\mathrm{cos}\varphi ,\mathrm{sin}\varphi )$, with polar angle $\varphi $. In each time step, $\mathrm{\Delta}t$, a random number drawn from a normal distribution with mean 0 and standard deviation $\sqrt{4v\mathrm{\Delta}t/L}$ is added to $\varphi $, resulting in a two-dimensional random walk of persistence length $L/2$. Whenever the rat hits one of the boundaries, the direction vector is modified such that the angle of incidence equals the angle of reflection. We relate the trajectory to behavioral times by assuming an average rat velocity of 20 cm/s.
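A minimal sketch of such a persistent random walk in a rectangular compartment follows. The function name, the default step size `v * dt`, the starting point at the compartment center, and the seed are assumptions for illustration. The sketch also assumes that a single step is much shorter than the compartment, so that one reflection per axis suffices.

```python
import numpy as np

def persistent_walk_2d(n_steps, x_min, x_max, y_min, y_max,
                       L=1.0, v=0.01, dt=1.0, seed=0):
    """Persistent 2D random walk with specular reflection at the walls."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(4.0 * v * dt / L)     # std of the angular increment per step
    pos = np.array([(x_min + x_max) / 2.0, (y_min + y_max) / 2.0])
    phi = rng.uniform(0.0, 2.0 * np.pi)
    traj = np.empty((n_steps, 2))
    for t in range(n_steps):
        phi += rng.normal(0.0, sd)
        pos = pos + v * dt * np.array([np.cos(phi), np.sin(phi)])
        # Angle of incidence equals angle of reflection.
        if pos[0] < x_min or pos[0] > x_max:
            wall = x_min if pos[0] < x_min else x_max
            pos[0] = 2.0 * wall - pos[0]
            phi = np.pi - phi          # flip the x component of the direction
        if pos[1] < y_min or pos[1] > y_max:
            wall = y_min if pos[1] < y_min else y_max
            pos[1] = 2.0 * wall - pos[1]
            phi = -phi                 # flip the y component of the direction
        traj[t] = pos
    return traj
```

For example, `persistent_walk_2d(20000, -0.5, 0.0, -0.5, 0.5)` keeps the simulated animal in the left half of a 1 m × 1 m arena.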
Boundary effects and stability of grids
The motion of the rat is not periodic. We constrain it to either a square or a circular box. The input tuning is not periodic either. Consequently, input neurons with tuning fields that lie partially outside the boundary receive less activation. This leads to boundary effects: Excitatory weights associated with fields at the boundaries grow less, because the Hebbian learning scales with the presynaptic activation. This leads to smaller firing rates at the boundary. According to the inhibitory learning rule, the inhibitory weights of neurons that are tuned to boundary locations then also grow less. At a distance given by the width of the excitatory firing fields, the excitatory weights grow as fast as those that are far away from the boundary. However, if inhibition is more broadly tuned than excitation, the inhibitory input is still reduced at these locations. Firing fields are thus favored at a distance from the boundary that is determined by the width of the excitatory tuning, because at this location the excitation will exceed the inhibition. This preference of firing at a certain distance from the boundary competes with the preference for hexagonal firing that is induced by the interaction of excitatory and inhibitory plasticity. For place field-like input arranged on a symmetric lattice, the alignment to the boundary can be seen in the alignment of one grid axis to the boundary in a square box (Figure 2—figure supplement 3a). This alignment is not an artifact of the symmetric distribution of input fields, because it is not present in a circular arena (Figure 2—figure supplement 3b). The tendency to align with the boundary can be overcome using a random distribution of input fields (Figure 2—figure supplement 3c), and in particular by using input with more than one place field per neuron, that is, nonlocalized input. Nonetheless, we observe boundary effects in all simulations when simulating for very long times.
Distribution of initial synaptic weights
To start with reasonable firing rates, we take the initial weights close to the values that would correspond to the fixed point weights (see also the mathematical analysis). More precisely, initially all synaptic weights are chosen from a uniform distribution. For the spreading of the distribution we take $\pm 5\%$ of the mean value. For the mean value of the excitatory weights, ${w}_{0}^{\mathrm{E}}$, we typically take ${w}_{0}^{\mathrm{E}}=1$, see Table 1. We then determine the mean of the initial inhibitory weights, ${w}_{0}^{\mathrm{I}}$, such that the output neuron fires on average around the target rate:
so
The sums are given by:
where ${N}_{\mathrm{P}}$ is the number of input neurons, ${M}_{\mathrm{P}}$ is the area under a tuning function and ${A}_{\mathrm{P}}$ is the area in which the centers of the input tuning function can lie. For the fixed point weight relation Equation 111 this leads to
The values for ${A}_{\mathrm{P}}$ and ${M}_{\mathrm{P}}$ depend on the dimensionality of the system.
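The initialization can be sketched for one-dimensional Gaussian input as follows. Several ingredients are assumptions made for this illustration rather than statements from the text: the output rate is taken as the weighted excitatory input minus the weighted inhibitory input, the summed input of population $\mathrm{P}$ is approximated by ${N}_{\mathrm{P}}{M}_{\mathrm{P}}/{A}_{\mathrm{P}}$, the mass under a Gaussian is ${M}_{\mathrm{P}}=\sqrt{2\pi}{\sigma}_{\mathrm{P}}$, and the centers are assumed to lie in an interval of length ${A}_{\mathrm{P}}=L+6{\sigma}_{\mathrm{P}}$. The function and argument names are hypothetical.

```python
import numpy as np

def initial_weights_1d(n_e, n_i, sigma_e, sigma_i, L, rho_0,
                       w0_e=1.0, spread=0.05, seed=0):
    """Initial weights close to the homogeneous fixed point (1D sketch)."""
    rng = np.random.default_rng(seed)
    # Assumed: mass under a Gaussian of height 1, and interval for the centers.
    m_e, m_i = np.sqrt(2.0 * np.pi) * sigma_e, np.sqrt(2.0 * np.pi) * sigma_i
    a_e, a_i = L + 6.0 * sigma_e, L + 6.0 * sigma_i
    drive_e = n_e * m_e / a_e          # summed excitatory input rate
    drive_i = n_i * m_i / a_i          # summed inhibitory input rate
    # Assumed balance: rho_0 = w0_e * drive_e - w0_i * drive_i.
    w0_i = (w0_e * drive_e - rho_0) / drive_i
    # Uniform distribution with a spread of +-5% around the mean values.
    w_e = rng.uniform((1 - spread) * w0_e, (1 + spread) * w0_e, size=n_e)
    w_i = rng.uniform((1 - spread) * w0_i, (1 + spread) * w0_i, size=n_i)
    return w_e, w_i, w0_i
```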
One dimension
For Gaussian input we have:
For Gaussian random field input we have:
Two dimensions
For Gaussian input we have:
For Gaussian random field input we have:
Three dimensions
In three dimensions we use a von Mises distribution along the third dimension to account for the periodicity of the head direction angle. We thus get
where ${I}_{0}$ is the modified Bessel function. The area in which the function centers can lie is given by:
Grid score measure
We use the grid score suggested by Langston et al. (2010). More precisely, we determine the grid score of a spatial autocorrelogram – the Pearson correlation coefficients for all spatial shifts of the firing rate map against itself – in the following way: We crop a centered doughnut shape from the correlogram. To get the inner radius of the doughnut, we set all values in the correlogram that are smaller than 0.1 to 0. We obtain the resulting clusters of values larger than 0.1 using scipy.ndimage.measurements.label from the SciPy package for Python with a square filter structure, $((1,1,1),(1,1,1),(1,1,1))$, for a correlogram with $51\times 51$ pixels. We use the distance from the center to the outermost pixel of the innermost cluster as the inner radius of the doughnut. For the outer radius we try 50 values, linearly increasing from the inner radius to the corner of the square arena. For each of the resulting 50 doughnuts, we rotate the doughnut around the center and correlate it with the unrotated doughnut. We determine the correlation for 30, 60, 90, 120 and 150 degrees. We define the grid score as the minimum of the correlation values at 60 and 120 degrees minus the maximum of the correlation values at 30, 90 and 150 degrees. After trying all 50 doughnuts, we take the highest resulting grid score as the grid score of the cell. A hexagonal symmetry thus leads to positive values, whereas a square symmetry leads to negative values.
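The procedure can be sketched as below. This is an illustrative reimplementation under stated assumptions, not the authors' code: the function name is hypothetical, the rotation uses `scipy.ndimage.rotate` with linear interpolation, doughnuts with fewer than three pixels are skipped, and `scipy.ndimage.label` is used as the current name of the labeling function mentioned above.

```python
import numpy as np
from scipy import ndimage

def grid_score(correlogram, n_radii=50, threshold=0.1):
    """Grid score of a square spatial autocorrelogram (sketch)."""
    c = np.asarray(correlogram, dtype=float)
    cy, cx = (np.array(c.shape) - 1) / 2.0
    yy, xx = np.indices(c.shape)
    dist = np.hypot(yy - cy, xx - cx)

    # Inner radius: extent of the central cluster of values >= threshold.
    clipped = np.where(c < threshold, 0.0, c)
    labels, _ = ndimage.label(clipped > 0, structure=np.ones((3, 3)))
    central = labels[int(round(cy)), int(round(cx))]
    if central == 0:
        return np.nan
    inner = dist[labels == central].max()

    angles = (30, 60, 90, 120, 150)
    rotated = {a: ndimage.rotate(c, a, reshape=False, order=1) for a in angles}

    best = -np.inf
    # Outer radius increases linearly from the inner radius to the corner.
    for outer in np.linspace(inner, dist.max(), n_radii):
        mask = (dist > inner) & (dist <= outer)
        if mask.sum() < 3:
            continue
        with np.errstate(invalid="ignore", divide="ignore"):
            corr = {a: np.corrcoef(c[mask], rotated[a][mask])[0, 1]
                    for a in angles}
        score = min(corr[60], corr[120]) - max(corr[30], corr[90], corr[150])
        if np.isfinite(score):
            best = max(best, score)
    return best if np.isfinite(best) else np.nan
```

Applied to an idealized hexagonal pattern (a sum of three plane waves at 60 degree angles) the score is positive, and applied to a square pattern (two orthogonal plane waves) it is negative, in line with the last sentence of the paragraph above.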
Measure for head direction tuning
To quantify the head direction tuning of a cell, we compare the head direction tuning to a uniform circular tuning, using Watson’s ${U}^{2}$ measure. We adopted the code from Mégevand, 2013. We drew 10,000 samples, s_HD, from a probability distribution created from the head direction tuning array, and 10,000 samples, s_uniform, from a uniform distribution, and used watson_u2(s_uniform, s_HD) from Mégevand, 2013 to quantify the degree of noncircularity. The sharper the head direction tuning, the higher the resulting value.
Measure for grid spacing on the linear track
We define the grid spacing of one-dimensional grids as the location of the first non-centered peak in the autocorrelogram of the firing pattern (Figure 1g). For place cell-like input, we obtain the grid spacing from a single simulation.
For nonlocalized input the grids show defects, which results in misleading peaks in the correlogram. In this case, we used the first peak of the average of 50 correlograms to get the grid spacing (Figure 1h). The 50 correlograms were obtained from 50 realizations that differ only in the randomness of the input function. To avoid taking a fluctuation in the correlogram as the first peak – and thus obtaining misleading grid spacing – we take the maximum between $3{\sigma}_{\mathrm{E}}$ (to cut out the center of the correlogram) and 1 (a value larger than the largest grid spacing in Figure 1h).
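For the simple case of a single simulation, the spacing measure can be sketched as follows. The function names and the choice of the maximal shift are assumptions for illustration. The sketch takes the first local maximum at a positive shift and does not include the averaging over 50 correlograms or the restricted search window used for nonlocalized input.

```python
import numpy as np

def autocorrelogram_1d(rate, max_shift=None):
    """Pearson correlation of a 1D firing pattern with shifted copies of itself."""
    rate = np.asarray(rate, dtype=float)
    if max_shift is None:
        max_shift = rate.size // 2     # assumed maximal shift
    corr = [1.0]
    for s in range(1, max_shift + 1):
        corr.append(np.corrcoef(rate[:-s], rate[s:])[0, 1])
    return np.array(corr)

def grid_spacing(rate, dx):
    """Location of the first non-centered peak of the autocorrelogram."""
    ac = autocorrelogram_1d(rate)
    # Interior local maxima; index 0 (the central peak) is never included.
    is_peak = (ac[1:-1] > ac[:-2]) & (ac[1:-1] >= ac[2:])
    idx = np.flatnonzero(is_peak)
    return (idx[0] + 1) * dx if idx.size else np.nan
```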
For high values of the spatial smoothness of inhibition, ${\sigma}_{\mathrm{I}}$, the simulation results deviate from the analytical solution. This is because for high ${\sigma}_{\mathrm{I}}$ but small ${\sigma}_{\mathrm{E}}$, the output neuron fires very sparsely, which impedes the learning. This can be readily overcome by increasing the tuning width, ${\sigma}_{\mathrm{E}}$, of the excitatory input.
References
1. Spatial cognition and neuromimetic navigation: a model of hippocampal place cell activity. Biological Cybernetics 83:287–299. https://doi.org/10.1007/s004220000171
5. Grid cells require excitatory drive from the hippocampus. Nature Neuroscience 16:309–317. https://doi.org/10.1038/nn.3311
6. Extreme Values of Random Processes: Lecture Notes. Accessed February 8, 2017.
10. Accurate path integration in continuous attractor network models of grid cells. PLoS Computational Biology 5:e1000291. https://doi.org/10.1371/journal.pcbi.1000291
11. An oscillatory interference model of grid cell firing. Hippocampus 17:801–812. https://doi.org/10.1002/hipo.20327
13. Models of place and grid cell firing and theta rhythmicity. Current Opinion in Neurobiology 21:734–744. https://doi.org/10.1016/j.conb.2011.07.002
14. A hybrid oscillatory interference/continuous attractor network model of grid cell firing. Journal of Neuroscience 34:5065–5079. https://doi.org/10.1523/JNEUROSCI.401713.2014
21. Recurrent inhibitory circuitry as a mechanism for grid formation. Nature Neuroscience 16:318–324. https://doi.org/10.1038/nn.3310
22. A single-cell spiking model for the origin of grid-cell patterns. PLoS Computational Biology 13:e1005782. https://doi.org/10.1371/journal.pcbi.1005782
24. Untangling invariant object recognition. Trends in Cognitive Sciences 11:333–341. https://doi.org/10.1016/j.tics.2007.06.010
28. Slowness and sparseness lead to place, head-direction, and spatial-view cells. PLoS Computational Biology 3:e166. https://doi.org/10.1371/journal.pcbi.0030166
29. From grids to places. Journal of Computational Neuroscience 22:297–299. https://doi.org/10.1007/s1082700600137
30. A spin glass model of path integration in rat medial entorhinal cortex. Journal of Neuroscience 26:4266–4276. https://doi.org/10.1523/JNEUROSCI.435305.2006
34. Topography of head direction cells in medial entorhinal cortex. Current Biology 24:252–262. https://doi.org/10.1016/j.cub.2013.12.002
38. Complementary spatial firing in place cell-interneuron pairs. The Journal of Physiology 588:4165–4175. https://doi.org/10.1113/jphysiol.2010.194274
45. Spatially Periodic Cells Are Neither Formed From Grids Nor Poor Isolation.
50. Hippocampal pyramidal cell-interneuron spike transmission is frequency dependent and responsible for place modulation of interneuron discharge. Journal of Neuroscience 22:RC197.
52. Path integration and the neural basis of the 'cognitive map'. Nature Reviews Neuroscience 7:663–678. https://doi.org/10.1038/nrn1932
53. "Dead reckoning," landmark learning, and the sense of direction: a neurophysiological and computational hypothesis. Journal of Cognitive Neuroscience 3:190–202. https://doi.org/10.1162/jocn.1991.3.2.190
56. watsons_u2, version 09f649a. GitHub.
58. The role of constraints in hebbian learning. Neural Computation 6:100–126. https://doi.org/10.1162/neco.1994.6.1.100
60. Hippocampal spike-timing correlations lead to hexagonal grid fields. Physical Review Letters 119:038101. https://doi.org/10.1103/PhysRevLett.119.038101
61. Place cells, grid cells, and the brain's spatial representation system. Annual Review of Neuroscience 31:69. https://doi.org/10.1146/annurev.neuro.31.061307.090723
63. On the directional firing properties of hippocampal place cells. Journal of Neuroscience 14:7235–7251. https://doi.org/10.1523/JNEUROSCI.141207235.1994
65. Place cells, navigational accuracy, and the human hippocampus. Philosophical Transactions of the Royal Society B: Biological Sciences 353:1333–1340. https://doi.org/10.1098/rstb.1998.0287
67. Place units in the hippocampus of the freely moving rat. Experimental Neurology 51:78–109. https://doi.org/10.1016/00144886(76)900558
71. A coupled attractor model of the rodent head direction system. Network: Computation in Neural Systems 7:671–685. https://doi.org/10.1088/0954898X_7_4_004
72. Entorhinal cortex grid cells can map to hippocampal place cells by competitive learning. Network: Computation in Neural Systems 17:447–465. https://doi.org/10.1080/09548980601064846
74. Encoding of head direction by hippocampal place cells in bats. The Journal of Neuroscience 34:1067–1080. https://doi.org/10.1523/JNEUROSCI.539312.2014
76. Grid cell – raw data. http://www.ntnu.edu/kavli/research/gridcelldata
77. Hebbian analysis of the transformation of medial entorhinal grid-cell inputs to hippocampal place fields. Journal of Neurophysiology 103:3167–3183. https://doi.org/10.1152/jn.00932.2009
80. Cellular mechanisms of spatial navigation in the medial entorhinal cortex. Nature Neuroscience 16:325–331. https://doi.org/10.1038/nn.3340
81. Grid alignment in entorhinal cortex. Biological Cybernetics 106:483–506. https://doi.org/10.1007/s0042201205137
82. From grid cells to place cells: a mathematical model. Hippocampus 16:1026–1031. https://doi.org/10.1002/hipo.20244
85. Self-organization of grid fields under supervision of place cells in a neuron model with associative plasticity. Biologically Inspired Cognitive Architectures 13:48–62. https://doi.org/10.1016/j.bica.2015.06.006
86. Inferotemporal cortex and object vision. Annual Review of Neuroscience 19:109–139. https://doi.org/10.1146/annurev.ne.19.030196.000545
87. Head-direction cells recorded from the postsubiculum in freely moving rats. II. Effects of environmental manipulations. Journal of Neuroscience 10:436–447.
89. Associative memory and hippocampal place cells. International Journal of Neural Systems 6:81–86.
90. The chemical basis of morphogenesis. Philosophical Transactions of the Royal Society B: Biological Sciences 237:37–72. https://doi.org/10.1098/rstb.1952.0012
91. Parallel computational subunits in dentate granule cells generate multiple place fields. PLoS Computational Biology 5:e1000500. https://doi.org/10.1371/journal.pcbi.1000500
92. The anatomy of memory: an interactive overview of the parahippocampal-hippocampal network. Nature Reviews Neuroscience 10:272–282. https://doi.org/10.1038/nrn2614
94. Theta sequences are essential for internally generated hippocampal firing fields. Nature Neuroscience 18:282–288. https://doi.org/10.1038/nn.3904
96. Integration of grid maps in merged environments. Nature Neuroscience 21:92–101. https://doi.org/10.1038/s4159301700366
100. Discrete place fields of hippocampal formation interneurons. Journal of Neurophysiology 97:4152–4161. https://doi.org/10.1152/jn.01200.2006
104. Specific evidence of low-dimensional continuous attractor dynamics in grid cells. Nature Neuroscience 16:1077–1084. https://doi.org/10.1038/nn.3450
107. Models of grid cell spatial firing published 2005–2011. Frontiers in Neural Circuits 6:16. https://doi.org/10.3389/fncir.2012.00016
Decision letter

David Foster, Reviewing Editor; University of California, Berkeley, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
[Editors’ note: a previous version of this study was rejected after peer review, but the authors submitted for reconsideration. The first decision letter after peer review is shown below.]
Thank you for submitting your work entitled "Learning place cells, grid cells and invariances with excitatory and inhibitory plasticity" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Senior Editor and a Reviewing Editor. The reviewers have opted to remain anonymous.
The reviewers independently prepared their comments and then engaged in a discussion, coordinated by the Reviewing Editor, leading to the final decision. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.
While all three reviewers found merit in the work, there were significant questions raised about the generality of the model. In particular, there was the concern that the model relies too heavily on assumed values for certain parameters (inputs, and learning rates), so that the contribution of the model is hard to interpret. For this reason, it was felt that the work, while potentially valuable, would be better directed to a more specialized readership than the general neuroscience readership of eLife.
Reviewer #1:
The authors present a mathematical model with the ability to generate grid cells, as well as place cells. In this model, each neuron receives excitatory and inhibitory input, which can exhibit varying degrees of smoothness across space. The neural activity evolves according to classic Hebbian plasticity rules, and the relationship between the smoothness of the excitatory versus inhibitory input dictates whether a place field, grid fields, or no spatial tuning at all, is exhibited by the neuron after learning. The authors showed that this model can learn somewhat rapidly, and extended their results to include head direction tuning as well. I think while it is quite bold to present a non-attractor network model of grid cells, especially when there's a lot of evidence that points to grid cells as an attractor network (e.g. Yoon et al., 2013), this model is also interesting and may help explain some of the other non-grid spatial patterns, and conjunctive coding, found in MEC and other parahippocampal areas. The paper is also clear, well-presented, and appears to be very transparent, which is appreciated. However, I think this paper could be further improved by comparing to experimental results, and by fully detailing the caveats and differences with the attractor network model.
1) I don't know if this is possible, but it would be nice if the authors could connect more of the predictions of the model back to data. For example, the authors show that in their conjunctive grid-head direction cells, the head direction preference is also a function of location. The authors state that whether or not this is the case in the data is 'unresolved', but it would be really nice if the authors could dig into this a bit further. This is another prediction that could potentially allow the field to distinguish between this model and the attractor network, and with the amount of grid cell data that is publicly available, it seems possible to at least demonstrate what is currently known.
2) Additionally, it is not completely clear what bells and whistles would have to be added to this model to reproduce the experimental results that the attractor network captures (e.g. that nearby grid cells have a similar orientation and spacing, and may preferentially functionally connect to each other (Dunn et al., 2015)). The authors should expand upon this part so the two models can be compared on a more equal footing.
3) The timescale of learning seems to be a bit of an issue to me. I appreciate that this model does better than others, but I still think that this part of the paper could be expanded upon a bit. When an animal is in a novel environment, the grid pattern appears to be there almost immediately (Hafting et al., 2005). Perhaps this is difficult to actually see, since the animal needs to cover the environment at least once over to see a robust grid pattern. It might be nice to see some examples of what the firing pattern would look like after 35 minutes of exploration (perhaps with behavioral trajectories from the Sargolini data), so the model can be mapped directly back to what has been observed experimentally.
Alternatively – I suppose one way around this is that the spatial inputs to the cell don't change with environment, but then this input can't be place cells, which can remap across environments. Perhaps another alternative is that the input comes from non-grid spatial cells in MEC, although these also might not be stable enough across environments (Diehl et al., 2017). Have the authors considered whether the inputs do (or don't) change across environments, and what that might do to the stability of the grid pattern?
4) It is a little unclear to me what the spatial inputs are – biologically speaking – to the place cells. The authors state that this is not fully resolved, but I think this should be fleshed out a bit more, given that the entire basis of the model is on these cell types. The input to the grid cells (combinations of place cell, or place cell like cells) seems a bit more grounded, but the authors should expand upon what the inputs to the place cells might actually be. Otherwise, in some parts of the paper, it seems a bit like a place cell is generated from the combined inputs of other place cells (which could also be true, but then as presented feels a bit circular and not as exciting or novel).
5) Is it possible to get non-grid spatial cells, and non-grid spatial cells that also encode head direction and/or running speed, like that seen in Diehl et al., 2017?
Reviewer #2:
This paper presents a neural network model of the development of the spatial tunings of different cell types in the hippocampal formation, with special emphasis on grid cells. The authors use rate-based excitatory and inhibitory neurons, in conjunction with simple learning rules, to show that based on the relative smoothness of the spatial profiles of the inhibitory and excitatory inputs, the same learning rule can yield grid-like, place-like, or spatially invariant tunings in the output neurons. In particular, grid cell period is determined by the width of the inhibitory inputs, as shown through simulation and analytics. The authors demonstrate their finding using both simplistic unimodal spatial inputs as well as more realistic non-local, multimodal inputs.
I found the paper to be easy to understand, in particular because the learning rules (Hebbian excitatory and homeostatic Hebbian inhibitory) are straightforward, and their effects on the self-organization of the overall tuning by the rearrangement of activity clusters is intuitive and well described. Further, I found the extension of the model to non-local inputs to be an important result, as too often overly simplistic inputs are used for training. I have a few major comments:
1) As the authors point out, it is well known that grids belonging to the same module share a similar orientation, and that this is an issue for single-cell feedforward models. The authors cite Si et al., 2012 as a potential mechanism for orienting the grids, who show that grid formation and alignment emerge simultaneously. Given that a major thrust of the paper is to show that grids can be learned on the time scale at which they are experimentally ascertained, and therefore that the learning rule is biologically plausible, it is incumbent on the authors to demonstrate that learning to align the grids will not interfere with grid self-organization as proposed in their model, and that it is possible to do so within a reasonable time frame. A supplementary figure should be sufficient.
2) Along similar lines, the authors criticize continuous attractor network (CAN) developmental models of grid cells (in particular, Widloski and Fiete 2014) for their slowness of learning. The authors miss the point here, for while in these models grids are slow to develop during exposure to the first environment, during which the recurrent weights develop (trained on unimodal spatial inputs similar to those used in the authors' work), they are rapidly expressed in any other environment afterwards, unfamiliar or not. This is because, once the continuous attractor is established (and this only needs to be done once), grid field expression is simply a matter of network path integration. Thus, grids appear instantly, even in the absence of localized information (e.g., darkness), and shared grid orientation comes for free.
3) In Figure 3—figure supplement 3, the authors state that "early firing fields are still present in the final grid", and say that this is consistent with Hafting et al., 2005. In this figure, the considerable drift in the fields over time makes it hard to relate fields expressed at the end of learning with those at the beginning. I would like to see this quantified in some way other than gridness score, maybe through measuring rate map correlations between the final mature map and the map as it develops in time. To what extent is there phase/scale/orientation drift through time? Further, it is interesting that, according to Figure 3C, the gridness develops rather abruptly, consistent with papers from Wills et al. (Wills et al., 2010, 2012). Does phase/scale/orientation also develop/stabilize with a similar time course?
4) The distinction between feedforward models (for example in this paper) and CAN models is clear, but less so between different feedforward models that self-organize in similar ways. I would like to see these fleshed out a bit more in the Discussion, in particular with regards to Castro and Aguiar, 2014 and Stepanyuk, 2015, as many might not be aware of them.
5) It is well known that grid cells recorded in 1d exhibit tuning curves that are strikingly non-periodic. Recently, it was shown (Yoon et al., 2016) that for cells belonging to the same module, these 1d responses are consistent with slices through the same 2d lattice. The authors propose a model that can potentially develop aperiodic grids when using non-local inputs. However, getting the right aperiodicities across cells so that they correspond to slices through the same 2d lattice would seem difficult.
Reviewer #3:
This study proposes a general mechanism for the emergence of the diverse spatial correlates found in the hippocampal formation. Through formal analysis and computational simulations, the authors show that the firing patterns of place cells, grid cells, head direction cells, and related conjunctive correlates can all be obtained by particular forms of synaptic plasticity operating on spatially modulated, excitatory and inhibitory inputs. This framework is interesting for at least two reasons. First, it offers insights that complement those derived from models based on path integration, especially in regard to the sensory-dependent characteristics of these correlates. Second, it suggests a specific role for inhibitory inputs and circuits, which have received increasing attention in the context of the entorhinal cortical architecture and grid cells. Below I summarize my main concerns before providing more detailed comments.
The success of this model appears to rest on the ad hoc design of spatial inputs (both excitatory and inhibitory) that often seem unrealistic, combined with equally ad hoc choices for the learning rates of the synaptic plasticity operating on these inputs (examples discussed below). Moreover, while I appreciate the theoretical appeal of reducing the emergence of the many diverse types of spatial correlates to a single unifying mechanism, this approach is very likely to overlook the functional significance of the complex and varied architecture of the circuits of the hippocampal formation, and to oversimplify the computational interaction of the spatial correlates in these circuits. Thus the proposed framework seems to shift the complexity of the spatial tunings that it aims to explain one synapse upstream, by assuming a perplexing and yet undocumented degree of specificity and organization in the modulation of inhibitory input patterns and their plasticity.
As to the organization of the paper, the mathematical analysis that is described in the Materials and methods section appears an important part of the results in this study. Checking all the mathematical derivations thoroughly is a task better fitting the peer-review style and timeframe of more theoretically oriented journals, and I have only read this part superficially. Still, I think that a summary of the main results of this analysis should be included in the Results section and accompanied by an intuitive explanation. Full derivations should still be provided separately (as they are now, but possibly in an appendix, rather than in the Materials and methods section – again this is because I see the mathematical analysis as one of the contributions/results of this paper, rather than a method). I think such a reorganization would benefit the dissemination of the complete results of this paper to a general audience.
https://doi.org/10.7554/eLife.34560.036
Author response
[Editors’ note: the author responses to the first round of peer review follow.]
Reviewer #1:
The authors present a mathematical model with the ability to generate grid cells, as well as place cells. In this model, each neuron receives excitatory and inhibitory input, which can exhibit varying degrees of smoothness across space. The neural activity evolves according to classic Hebbian plasticity rules, and the relationship between the smoothness of the excitatory versus inhibitory input dictates whether a place field, grid fields, or no spatial tuning at all, is exhibited by the neuron after learning. The authors showed that this model can learn somewhat rapidly, and extended their results to include head direction tuning as well. I think while it is quite bold to present a non-attractor network model of grid cells, especially when there's a lot of evidence that points to grid cells as an attractor network (e.g. Yoon et al., 2013), this model is also interesting and may help explain some of the other non-grid spatial patterns, and conjunctive coding, found in MEC and other parahippocampal areas. The paper is also clear, well-presented, and appears to be very transparent, which is appreciated. However, I think this paper could be further improved by comparing to experimental results, and by fully detailing the caveats and differences with the attractor network model.
1) I don't know if this is possible, but it would be nice if the authors could connect more of the predictions of the model back to data. For example, the authors show that in their conjunctive grid-head direction cells, the head direction preference is also a function of location. The authors state that whether or not this is the case in the data is 'unresolved', but it would be really nice if the authors could dig into this a bit further. This is another prediction that could potentially allow the field to distinguish between this model and the attractor network, and with the amount of grid cell data that is publicly available, it seems possible to at least demonstrate what is currently known.
We have now extended our manuscript to discuss more data. For example, we reproduce a recent experiment of coalescing grids in contiguous environments (now Figure 4). We also looked at publicly available data of the head direction tuning of individual grid fields (Figure 6—figure supplement 1, former Figure 5). Unfortunately, this analysis is inconclusive because of a substantial trajectory bias in most of the grid fields, i.e., the distribution of head directions was too uneven in most individual grid fields. This problem tends to be less prominent for grid fields in the center of the arena. A thorough analysis would hence require grid cell recordings with several central firing fields, i.e., smaller grid spacing. Such recordings exist (e.g., Stensola et al. 2015), but we had no success in obtaining these data.
2) Additionally, it is not completely clear what bells and whistles would have to be added to this model to reproduce the experimental results that the attractor network captures (e.g. that nearby grid cells have a similar orientation and spacing, and may preferentially functionally connect to each other (Dunn et al., 2015)). The authors should expand upon this part so the two models can be compared on a more equal footing.
We appreciate that this is the Achilles heel of all single cell models. Si et al., 2012 achieved a co-orientation of grid cells using recurrent connectivity, but required a set of intricate mechanisms to simultaneously obtain uncorrelated grid phases. In general, we strongly suspect that obtaining a co-orientation and phase dispersion together – while learning on all synapses – is challenging. In a separate project, a student is investigating to what extent the suggested mechanism can be implemented in a recurrent and fully plastic network. Even on a linear track – hence ignoring orientation of the grid – the phenomenology is very rich and clearly beyond the scope of the present manuscript. CAN models had 10+ years of refinement to accommodate an increasing number of experimental observations, and it is hard to exceed the resulting high bar with a new model.
3) The timescale of learning seems to be a bit of an issue to me. I appreciate that this model does better than others, but I still think that this part of the paper could be expanded upon a bit. When an animal is in a novel environment, the grid pattern appears to be there almost immediately (Hafting et al., 2005). Perhaps this is difficult to actually see, since the animal needs to cover the environment at least once over to see a robust grid pattern. It might be nice to see some examples of what the firing pattern would look like after 35 minutes of exploration (perhaps with behavioral trajectories from the Sargolini data), so the model can be mapped directly back to what has been observed experimentally.
The color coding in Figure 3A, B is showing the activity during learning, not after learning. Thus, the whole learning process is in principle visible in the figure. We now highlight this in the revised figure caption.
Alternatively – I suppose one way around this is that the spatial inputs to the cell don't change with environment, but then this input can't be place cells, which can remap across environments. Perhaps another alternative is that the input comes from non-grid spatial cells in MEC, although these also might not be stable enough across environments (Diehl et al., 2017). Have the authors considered whether the inputs do (or don't) change across environments, and what that might do to the stability of the grid pattern?
Thank you very much for this interesting suggestion. In the revised manuscript we studied this and added a small section to the main text and a new figure as a supplement to Figure 3 (Figure 3—figure supplement 2). In short: the output firing pattern is robust to changes of a substantial fraction of the inputs. If all inputs are remapped, a grid is learned anew.
New text in Results paragraph “Rapid appearance of grid cells and their reaction to modifications of the environment”:
“Above, we modeled the exploration of a previously unknown room by assuming the initial synaptic weights to be randomly distributed. […] The strong initial pattern in the weights does not hinder this development (Figure 3—figure supplement 2).”
New text in the Discussion:
“Similarly, we found that room switches in our model lead to grid patterns of the same grid spacing but different phases and orientations. […] It would be interesting to study if a rotation of a fraction of the input would lead to a bimodal distribution of grid rotations: No rotation and co-rotation with the rotated input, as recently observed in experiments where distal cues were rotated but proximal cues stayed fixed (Savelli et al., 2017).”
4) It is a little unclear to me what the spatial inputs are – biologically speaking – to the place cells. The authors state that this is not fully resolved, but I think this should be fleshed out a bit more, given that the entire basis of the model is on these cell types. The input to the grid cells (combinations of place cell, or place cell like cells) seems a bit more grounded, but the authors should expand upon what the inputs to the place cells might actually be. Otherwise, in some parts of the paper, it seems a bit like a place cell is generated from the combined inputs of other place cells (which could also be true, but then as presented feels a bit circular and not as exciting or novel).
The requirement to obtain place cells is that the inhibitory input tuning is very smooth or not tuned to location at all, whereas the excitatory inputs show some kind of tuning to location. The type of the excitatory tuning is not crucial, be it place cells, grid cells or some form of non-localized, non-periodic cells. In the revised manuscript, we emphasized this finding in the paragraph “Place cells, band cells and stretched grids” and added a simulation to Figure 5 (former Figure 4), where we obtain a place cell from excitatory grid cell input from EC and untuned inhibition.
New text in Results paragraph “Place cells, band cells and stretched grids”:
“The emergence of place cells is independent of the exact shape of the excitatory input. Non-localized inputs (Figure 5A) lead to similar results as grid cell-like inputs of different orientation and grid spacing (Figure 5B, Materials and methods); for other models for the emergence of place cells from grid cells see (Solstad et al., 2006; Franzius et al., 2007b; Rolls et al., 2006; Molter and Yamaguchi, 2008; Ujfalussy et al., 2009; Savelli and Knierim, 2010).”
5) Is it possible to get non-grid spatial cells, and non-grid spatial cells that also encode head direction and/or running speed, like that seen in Diehl et al., 2017?
Cells that would not be classified as grid cells occur naturally in our model. This can be seen, e.g., in the broad distribution of grid scores in Figure 2. Most of these cells show distorted grids, however. It is not difficult to obtain non-grid spatial cells, e.g., by perturbing the smoothness conditions in the input signals locally. Since there are many ways of introducing irregularities, any particular solution would appear arbitrary. For this reason, we did not include simulations in the manuscript. We now cite Diehl et al., 2017 in the section “Place cells, band cells and stretched grids”, where we introduce the zoo of firing patterns.
We added a small paragraph on running speed modulation to our Discussion:
“In addition, CAN models require that conjunctive (grid x head direction) cells are positively modulated by running speed. […] We expect that in this case, the output neuron would inherit a speed tuning from the input but would otherwise develop similar spatial tuning patterns.”
Reviewer #2:
This paper presents a neural network model of the development of the spatial tunings of different cell types in the hippocampal formation, with special emphasis on grid cells. The authors use rate-based excitatory and inhibitory neurons, in conjunction with simple learning rules, to show that based on the relative smoothness of the spatial profiles of the inhibitory and excitatory inputs, the same learning rule can yield grid-like, place-like, or spatially invariant tunings in the output neurons. In particular, grid cell period is determined by the width of the inhibitory inputs, as shown through simulation and analytics. The authors demonstrate their finding using both simplistic unimodal spatial inputs as well as more realistic non-local, multimodal inputs.
I found the paper to be easy to understand, in particular because the learning rules (Hebbian excitatory and homeostatic Hebbian inhibitory) are straightforward, and their effects on the self-organization of the overall tuning by the rearrangement of activity clusters is intuitive and well described. Further, I found the extension of the model to non-local inputs to be an important result, as too often overly simplistic inputs are used for training. I have a few major comments:
1) As the authors point out, it is well known that grids belonging to the same module share a similar orientation, and that this is an issue for single-cell feedforward models. The authors cite Si et al., 2012 as a potential mechanism for orienting the grids, who show that grid formation and alignment emerge simultaneously. Given that a major thrust of the paper is to show that grids can be learned on the time scale at which they are experimentally ascertained, and therefore that the learning rule is biologically plausible, it is incumbent on the authors to demonstrate that learning to align the grids will not interfere with grid self-organization as proposed in their model, and that it is possible to do so within a reasonable time frame. A supplementary figure should be sufficient.
See reply to reviewer #1, point 2 above.
2) Along similar lines, the authors criticize continuous attractor network (CAN) developmental models of grid cells (in particular, Widloski and Fiete 2014) for their slowness of learning. The authors miss the point here, for while in these models grids are slow to develop during exposure to the first environment, during which the recurrent weights develop (trained on unimodal spatial inputs similar to those used in the authors' work), they are rapidly expressed in any other environment afterwards, unfamiliar or not. This is because, once the continuous attractor is established (and this only needs to be done once), grid field expression is simply a matter of network path integration. Thus, grids appear instantly, even in the absence of localized information (e.g., darkness), and shared grid orientation comes for free.
Thanks for spotting this imprecise phrasing. In the revised manuscript, we removed the citation at this particular point in the text and explicitly comment on this matter in more detail in our Discussion:
“Learning the required connectivity in CAN models can take a long time (Widloski and Fiete, 2014). […] The pattern emerges rapidly, but is not instantaneously present (Figure 3—figure supplement 2).”
3) In Figure 3—figure supplement 3, the authors state that "early firing fields are still present in the final grid", and say that this is consistent with Hafting et al., 2005. In this figure, the considerable drift in the fields over time makes it hard to relate fields expressed at the end of learning with those at the beginning. I would like to see this quantified in some way other than gridness score, maybe through measuring rate map correlations between the final mature map and the map as it develops in time. To what extent is there phase/scale/orientation drift through time? Further, it is interesting that, according to Figure 3C, the gridness develops rather abruptly, consistent with papers from Wills et al. (Wills et al., 2010, 2012). Does phase/scale/orientation also develop/stabilize with a similar time course?
The abruptness in Figure 3C is an artifact of the grid score, which is highly sensitive to displaced firing fields. On reasonable time scales (many tens of hours), we observed no grid drift, i.e., phase, spacing and orientation are rather stable. This can be seen, e.g., in the newly added Figure 3—figure supplement 2C (with remapping fraction 0), which shows the correlation coefficient of the grid pattern after 5h of learning with the grid patterns at earlier and later times. The correlation of the grid at 5 hours with the grid at 10 hours is very high considering the small size of the grid fields, suggesting that fields drift only very little once they are learned. The same figure also illustrates the claimed stability of the firing fields that arise early during learning, as the correlation with the grid pattern after 5h shows a very steep rise at learning onset. We refer to this figure also when we talk about grid stability: See response to major comment #3 of reviewer #1.
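The correlation analysis described in this reply can be illustrated with a minimal sketch. It assumes rate maps are stored as equally shaped nested lists of firing rates and uses a plain Pearson correlation coefficient; the function names `rate_map_correlation` and `correlation_with_reference` are hypothetical and not taken from the authors' code, and the preprocessing used for the published figure may differ.

```python
import math


def rate_map_correlation(map_a, map_b):
    """Pearson correlation between two equally shaped 2D rate maps.

    Each map is a list of rows; both are flattened bin by bin.
    Returns NaN if either map has zero variance.
    """
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    if not a or len(a) != len(b):
        raise ValueError("rate maps must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


def correlation_with_reference(maps_over_time, reference_index):
    """Correlate every snapshot with one reference snapshot (e.g. the map at 5 h)."""
    reference = maps_over_time[reference_index]
    return [rate_map_correlation(m, reference) for m in maps_over_time]
```

Plotting the returned list against the snapshot times gives a curve of the kind the reply describes: values close to 1 at times after the reference would indicate little drift, and a steep early rise would indicate that early firing fields persist.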
4) The distinction between feedforward models (for example in this paper) and CAN models is clear, but less so between different feedforward models that self-organize in similar ways. I would like to see these fleshed out a bit more in the Discussion, in particular with regards to Castro and Aguiar, 2014 and Stepanyuk, 2015, as many might not be aware of them.
In the revised version we now highlight the differences to our model when we cite these papers:
“Other models that explain the emergence of grid patterns from place cell input through synaptic depression and potentiation also develop grid cells in realistic times (Castro and Aguiar, 2014; Stepanyuk, 2015). […] How these models generalize to potentially non-localized input is yet to be shown.”
5) It is well known that grid cells recorded in 1d exhibit tuning curves that are strikingly non-periodic. Recently, it was shown (Yoon et al., 2016) that for cells belonging to the same module, these 1d responses are consistent with slices through the same 2d lattice. The authors propose a model that can potentially develop aperiodic grids when using non-local inputs. However, getting the right aperiodicities across cells so that they correspond to slices through the same 2d lattice would seem difficult.
The aperiodicities from general inputs would indeed not have to be consistent with slices through a 2d grid cell. We added a paragraph to our Discussion:
“A recent analysis has shown that the periodic firing of entorhinal cells in rats that move on a linear track can be assessed as slices through a hexagonal grid (Yoon et al., 2016), which arises naturally in a two dimensional CAN. In our model, we would obtain slices through a hexagonal grid if the rat learns the output pattern in two dimensions and afterwards is constrained to move on a linear track that is part of the same arena. If the rat learns the firing pattern on the linear track from scratch, the firing fields would be periodic.”
Reviewer #3:
This study proposes a general mechanism for the emergence of the diverse spatial correlates found in the hippocampal formation. Through formal analysis and computational simulations, the authors show that the firing patterns of place cells, grid cells, head direction cells, and related conjunctive correlates can all be obtained by particular forms of synaptic plasticity operating on spatially modulated, excitatory and inhibitory inputs. This framework is interesting for at least two reasons. First, it offers insights that complement those derived from models based on path integration, especially in regard to the sensory-dependent characteristics of these correlates. Second, it suggests a specific role for inhibitory inputs and circuits, which have received increasing attention in the context of the entorhinal cortical architecture and grid cells. Below I summarize my main concerns before providing more detailed comments.
The success of this model appears to rest on the ad hoc design of spatial inputs (both excitatory and inhibitory) that often seem unrealistic, combined with equally ad hoc choices for the learning rates of the synaptic plasticity operating on these inputs (examples discussed below). Moreover, while I appreciate the theoretical appeal of reducing the emergence of the many diverse types of spatial correlates to a single unifying mechanism, this approach is very likely to overlook the functional significance of the complex and varied architecture of the circuits of the hippocampal formation, and to oversimplify the computational interaction of the spatial correlates in these circuits. Thus the proposed framework seems to shift the complexity of the spatial tunings that it aims to explain one synapse upstream, by assuming a perplexing and yet undocumented degree of specificity and organization in the modulation of inhibitory input patterns and their plasticity.
We are a bit surprised by the impression of the reviewer that we chose our input patterns ad hoc, and interpret this as a signal that we presented our case poorly. The core idea of the model is that it is only statistical properties of the input tuning functions that shape the output patterns (their smoothness, i.e., autocorrelation length), and that details of these patterns do not change the results. Of course, any computational model has to commit to a particular input tuning to run analyses. Since we do not know enough about the actual inputs to the various cell types, we resorted to illustrating that the suggested mechanism produces the same results for various different input tunings, including input tuning functions that minimize any further assumptions (the Gaussian random field inputs in Figures 1, 2, 5 are samples from a maximum-entropy distribution for a given autocorrelation function). We were trying to make the point that the requirements of the model are relatively mild and leave significant flexibility regarding the actual input tuning functions, with the exception of the smoothness assumptions, which we regard as testable predictions. To further strengthen this point, we now added simulations with grid cells as inputs, and a simulation with place-field-like excitation and non-localized inhibition. If the reviewer has a suggestion for how to make this point even clearer, we would appreciate it.
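To make the notion of smoothness as autocorrelation length concrete, here is a minimal one-dimensional sketch. It is an illustration under stated assumptions, not the construction used in the paper: a tuning function is generated by circularly smoothing Gaussian white noise with a Gaussian kernel, so that a wider kernel gives a longer autocorrelation length. The names `smoothed_noise_tuning` and `autocorrelation`, the periodic boundary, and all parameter values are hypothetical.

```python
import math
import random


def smoothed_noise_tuning(n_bins, sigma_bins, seed=0):
    """Sample a 1D tuning function by smoothing white noise (assumed construction).

    Gaussian white noise on a periodic track is convolved with a Gaussian
    kernel of width sigma_bins; larger sigma_bins gives smoother tuning.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_bins)]
    half = int(4 * sigma_bins)  # truncate the kernel at 4 standard deviations
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    norm = sum(kernel)
    field = []
    for i in range(n_bins):
        acc = 0.0
        for weight, k in zip(kernel, offsets):
            acc += weight * noise[(i + k) % n_bins]  # circular boundary
        field.append(acc / norm)
    return field


def autocorrelation(values, lag):
    """Normalized circular autocorrelation of a sequence at a given lag."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    cov = sum((values[i] - mean) * (values[(i + lag) % n] - mean) for i in range(n))
    return cov / var
```

In the language of the reply, a smoothly tuned input corresponds to a large `sigma_bins` and a sharply tuned input to a small one. Comparing `autocorrelation(field, lag)` for the two cases at the same lag shows the difference in autocorrelation length that the model treats as the relevant statistic.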
As to the organization of the paper, the mathematical analysis that is described in the Materials and methods section appears an important part of the results in this study. Checking all the mathematical derivations thoroughly is a task better fitting the peer-review style and timeframe of more theoretically oriented journals, and I have only read this part superficially. Still, I think that a summary of the main results of this analysis should be included in the Results section and accompanied by an intuitive explanation. Full derivations should still be provided separately (as they are now, but possibly in an appendix, rather than in the Materials and methods section – again this is because I see the mathematical analysis as one of the contributions/results of this paper, rather than a method). I think such a reorganization would benefit the dissemination of the complete results of this paper to a general audience.
We also regard the mathematical analysis as a strong point and result of the paper. For this reason, we decided to keep it in the Materials and methods section, to make sure it remains part of the main manuscript. Either option is fine for us, and we would leave this decision to the editors.
https://doi.org/10.7554/eLife.34560.037
Article and author information
Author details
Funding
Bundesministerium für Bildung und Forschung (FKZ 01GQ1201)
 Henning Sprekeler
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
 David Foster, University of California, Berkeley, United States
Publication history
 Received: December 21, 2017
 Accepted: February 19, 2018
 Accepted Manuscript published: February 21, 2018 (version 1)
 Version of Record published: April 30, 2018 (version 2)
Copyright
© 2018, Weber et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.