Abstract
Summary
Discoveries of functional cell types, exemplified by the cataloging of spatial cells in the hippocampal formation, are heralded as scientific breakthroughs. We question whether the identification of cell types based on human intuitions has scientific merit and suggest that “spatial cells” may arise in non-spatial computations of sufficient complexity. We show that deep neural networks (DNNs) for object recognition, which lack spatial grounding, contain numerous units resembling place, border, and head-direction cells. Strikingly, even untrained DNNs with randomized weights contained such units and support decoding of spatial information. Moreover, when these “spatial” units are excluded, spatial information can be decoded from the remaining DNN units, which highlights the superfluousness of cell types to spatial cognition. Now that large-scale simulations are feasible, the complexity of the brain should be respected and intuitive notions of cell type, which can be misleading and arise in any complex network, should be relegated to history.
1 Introduction
Spatial cognition encompasses our cognitive abilities to know where we are and navigate in complex environments, which includes self-localization, comprehension of environmental layouts, and efficient navigation toward distant goal locations. The prevailing view in the field attributes these functions to neural machinery within the hippocampal formation, which contains a diverse array of “spatial” cell types including place cells, head-direction cells, border cells, and grid cells (e.g., Hartley et al. 2014; Moser et al. 2008; Grieves and Jeffery 2017), and holds that these cells collectively form the foundation of an internal cognitive map of space that underlies our spatial abilities (O’Keefe and Dostrovsky, 1971; Taube et al., 1990; Lever et al., 2009; Hafting et al., 2005). As such, a substantial proportion of the field focuses on characterizing particular functional “cell types” based on neural activity patterns in these regions. However, these “spatial cells” were identified due to their intriguing firing patterns that piqued the interest of neuroscientists and were later defined based on subjective criteria: place cells appear to encode particular locations, border cells respond to borders, head-direction cells are sensitive to heading direction, and grid cells exhibit firing fields in a grid-like pattern. In practice, the field’s initial fascination with these cells has led to a stubborn tendency to focus on “cell type” classification and to overlook the fact that the criteria for distinct cell types seldom align seamlessly with empirical data and that cells often exhibit mixed selectivity across task and environmental variables (e.g., Hollup et al. 2001; McKenzie et al. 2013; Dupret et al. 2010; Eichenbaum 2015; Grieves and Jeffery 2017; Latuske et al. 2015; Tang et al. 2015), raising questions about their precise roles, if any.
Could these cells simply arise in any system with complex computations, unrelated to processes specific to space or navigation? There are strong hints that “spatial” cells may arise simply from domain-general learning algorithms that can lead to both concept and spatial neural representations (Mok and Love, 2019, 2023; Stachenfeld et al., 2017; Whittington et al., 2020). Indeed, models show that spatial firing patterns can arise from factors entirely unrelated to spatial cognition, such as sparseness constraints (Kropff and Treves, 2008; Franzius et al., 2007a,b).
Another question is whether these cells are particularly important for spatial cognition. Research has shown that “spatial” cells may not be functionally more critical than other cells with multiplexed or hard-to-interpret firing patterns for spatial localization (Diehl et al., 2017) or for explaining neuronal response profiles in the medial entorhinal cortex (Nayebi et al., 2021). Even lesions of the hippocampal formation do not consistently result in specific navigation impairments (Hales et al., 2014; Whishaw and Tomie, 1997; Whishaw et al., 1997).
Considering that both biological and artificial “cells” rarely align perfectly with human-defined criteria, with many cells exhibiting “uninterpretable” firing patterns, one unsettling possibility is that neuroscience may have inadvertently constrained its pursuit by fixating on the search for interpretable “cell types”. By adopting this top-down, perhaps naïve approach, the field may be investing a large proportion of its resources in a potentially fruitless quest, ignoring other explanations for the emergence of spatial cells and the foundations of spatial cognition.
A harsh appraisal of the field is that it looks for whatever cell type superficially reflects the answer sought. For example, how do animals localize themselves? The naive answer is that there must be place cells. Setting aside that this answer provides zero insight into how such a cell came to be, there is no reason for the brain to align perfectly with our intuitions and, equally problematic, cells with these properties might arise but not serve the top-down function neuroscientists ascribe to them. Ascribing interpretable functions to cells and naming them may be good for neuroscience careers, but is it good for neuroscience?
In this contribution, we offer an alternative to the prevailing notion that cells with readily interpretable receptive fields in regions like hippocampus underlie spatial systems. We propose that what we commonly refer to as “spatial cells” might be inevitable derivatives of general computational mechanisms, and are in no way intrinsically spatial. Our results demonstrate that these cells could manifest within any sufficiently rich information processing system, such as perception. Remarkably, not only do we find spatial signals within a non-spatial system but also demonstrate that these purported signals, ostensibly representing spatial knowledge, do not appear to have a privileged role for tangible downstream tasks related to spatial cognition as previously claimed.
To evaluate these possibilities, we turned to deep neural networks (DNNs) that were information processing systems devoid of components dedicated to spatial processing. Specifically, we analyzed deep learning models of perception with hundreds of millions of parameters. DNNs trained on natural images have achieved human-level performance in recognizing real-world objects in photographs (Krizhevsky et al., 2012; Simonyan and Zisserman, 2015). These models exhibit strong parallels with representations in the primate ventral visual stream (Khaligh-Razavi and Kriegeskorte, 2014; Yamins et al., 2014; Güçlü and van Gerven, 2015; Eickenberg et al., 2017; Zeman et al., 2020). Our investigation aims to determine whether processing egocentric visual information within such complex – yet non-spatial – models can lead to brain-like representations of allocentric spatial environments, and whether these representations are essential for spatial cognition. Additionally, we considered whether untrained DNNs (absent any experiences) can account for the emergence of spatial cells and spatial cognition.
To evaluate functional spatial information in perceptual DNN models, we began by creating a three-dimensional virtual environment reminiscent of a laboratory used in animal studies in neuroscience (Fig. 1). In this virtual setting, an agent randomly forages in a two-dimensional square area, much like an animal would explore an enclosure. The agent “sees” images of the room within its field of view over many locations and heading directions. Images from the first-person perspective are processed by a DNN, and we assess whether the internal representations of the model contain various kinds of spatial knowledge. We adopted a decoding framework which parallels experimental techniques employed by neuroscientists. Specifically, we trained linear regression models using various levels of representations (unit activations from visual inputs) generated by DNNs of different architectures to decode navigation-relevant variables, including self-localization, heading direction, and distance to the closest wall on locations and heading directions they have not encountered before. To test whether DNN units carry information like “spatial” cells in the brain, we classified and visualized model units based on standard criteria for place, direction, and border cells (Tanni et al., 2022; Banino et al., 2018). To evaluate whether these cells play a privileged role in spatial cognition, we excluded units classified as spatial units based on traditional spatial cell criteria and assessed their spatial knowledge.
To foreshadow our findings, we discovered that DNNs, regardless of their architectural variations, representation levels, or training states, possess noteworthy proficiency in allocentric spatial understanding. Spatial variables including location, heading direction and distance to boundaries can be accurately decoded from the DNN models. Additionally, a substantial number of DNN units exhibit spatial firing patterns that align with the criteria used in neuroscience to classify place, head-direction, and border cells. Notably, a significant portion of these units employs mixed-selectivity and conjunctive coding, akin to characteristics found in hippocampal neurons (Eichenbaum, 2015). We further revealed that excluding units meeting the classical criteria of spatial cell types has minimal impact on spatial decoding performance, reinforcing the perspective that the code underpinning spatial cognition is more distributed and diverse than previously thought, relying on neural population coding rather than individual and specialized cell types (Hebb, 1949; Eichenbaum, 2018; Ebitz and Hayden, 2021). In summary, our analyses suggest that “spatial” cell types may be inevitable and superfluous in complex information processing systems.
2 Results
2.1 Spatial Knowledge through Non-spatial Systems
We predict that spatial knowledge can arise in computational systems with sufficient complexity absent a spatial grounding. We test this hypothesis by analyzing the percepts of an agent freely moving in a virtual environment. The percepts take the form of viewpoints fed into a deep neural network with its millions of weights either randomly initialized or trained for object recognition, a non-spatial task. From various network layers, we attempted to decode spatial information, such as location, from this complex computational system.
To assess information latent in complex networks, we trained linear regression models to decode the agent’s location, head-direction and distance to the closest border using fixed unit activity extracted at various levels of the networks. While sequential information is important for spatial systems, our decoding framework does not utilize sequential information of sampled views during training, making it a stricter test for the central claim that spatial knowledge can form out of general information processing. We expect that one’s location, heading direction and distance from border may be decodable from visual information alone due to the inherent continuity of perception across spatial locations and/or directions. All results reported here are tested on out-of-sample locations and views using the perception model VGG-16 (Simonyan et al., 2014), unless noted otherwise. For details of the training and testing procedure, see Methods. For results of other models such as ResNet-50 (He et al., 2016) and Vision Transformers (ViT; Dosovitskiy et al. 2020), see Appendix.
We find that spatial knowledge, namely one’s own location (spatial coordinates; Fig. 2B, left panel), heading direction (Fig. 2B, middle panel) and distance to the nearest wall (Fig. 2B, right panel) can be successfully decoded from unit activity in a perception model. We quantified the degree of spatial knowledge by the decoding error, that is, the deviation of the model’s prediction of its own location, heading direction, or distance to the nearest wall relative to the ground truth (i.e., the physical unit of space or angle in the virtual environment). We normalized decoding error with respect to the maximal unit length of the moving area (see Methods). For example, a normalized error of 0.05 for location prediction means the decoded location is 5% off from the true location relative to the maximal width/depth of the entire moving area. As more locations and views are sampled for training the agent (x-axis), decoding performance improves. The agent demonstrates impressive decoding performance across all spatial knowledge tests, even when trained on just 30% of randomly selected independent locations and associated viewpoints within the entire space. The moving area encompasses 289 distinct locations, each featuring 24 evenly spaced views covering the full 360-degree spectrum (for more details, please see Methods).
Decoding performance varied across layers of the network where deeper layers typically achieved lower decoding error (apart from the penultimate layer in VGG-16, i.e., fc2). Crucially, decoders across all sampling rates and levels of representation show substantially better performance than chance (green and blue lines), determined by strategies invariant to visual inputs (i.e., random or predicting the center of possible choices; see Methods). Our results demonstrate there is sufficient spatial information to achieve remarkably low decoding error in a perception model based on a few visual snapshots of the virtual space, which could be further improved through the integration of information across views in a perception-based navigation system.
To test the generality of our claim that spatial knowledge can arise from complex computational systems irrespective of their architectural variations or training states, we examined several computational systems including deep convolutional neural networks (DCNN) and Vision Transformers (ViT). We evaluated these models in both their pre-trained form (optimized for visual object recognition) and in an untrained state (randomly initialized parameters), following the same decoding approach described earlier. Here, we present the results for the penultimate layer representation with 30% of all locations and their views uniformly sampled for training the decoder, but similar results were observed across layers (for full results across various models, layers, and sampling rates, see Appendix). We found that all models possessed latent spatial knowledge, exhibiting errors far below chance (Figure 2C). Notably, this was true even in untrained networks. While one might contend that deep convolutional neural networks, with their reliance on local convolution operations, inherently possess a predisposition for extracting spatial knowledge from visual information, the finding that an untrained ViT, comprising nothing more than a hierarchy of fully-connected layers (i.e., self-attention) and non-linear operations, can achieve superior decoding performance illustrates the inevitability of computational complexity in facilitating the extraction of spatial information necessary for cognitive spatial processes, even in non-spatial systems.
2.2 “Spatial Cells” Inevitably Arise in Complex Computational Systems
How was it possible to decode spatial information from non-spatial networks? The field’s common conception is that cells exhibiting spatial firing profiles play a pivotal role in supporting spatial cognition, as these profiles appear intuitively useful to spatial tasks such as navigation. Could spatial cells be responsible for the impressive decoding of spatial information within a perception system like we have shown? If a perception system devoid of a spatial component demonstrates classically spatially-tuned unit representations, such as place, head-direction, and border cells, can “spatial cells” truly be regarded as “spatial”? Might they be inevitable derivatives of any complex system? If so, do they drive spatial knowledge, or are they in fact superfluous in spatial cognition?
To determine if the model’s spatial knowledge is primarily supported by spatially-tuned units like those found in the brain, we classified every hidden unit in each model based on criteria used to identify spatial cells in neuroscience for place cells (Tanni et al., 2022), head-direction cells (Banino et al., 2018), and border cells (Banino et al., 2018), respectively (see Methods). The composition of cell types across layers is shown in Fig. 3A. Contrary to the predominant view that “spatial cells” are unique properties of spatial systems and are grounded in space-related tasks, we found many units in the non-spatial model VGG-16 that satisfied the criteria of place, head-direction, and border cells (see Methods for criteria; for other models, see Appendix). The majority of units show mixed selectivity irrespective of layer depth. Notably, a significant proportion of units show both place and directional tuning (P+D), matching a common observation in the literature (e.g., McNaughton et al. 1991; Tang et al. 2015; Grieves and Jeffery 2017). We selected example units and plotted their spatial activation patterns in the two-dimensional virtual space (irrespective of direction), and their direction selectivity in polar plots (Fig. 3B-3E; for more examples, see Appendix). These examples show spatially-tuned units without directional tuning that match hippocampal place cells (Fig. 3B), units with strong direction selectivity accompanied by minor spatial selectivity that match head-direction cells (Fig. 3C), and units with border-cell-like tuning (Fig. 3D). There were also many units that exhibited both strong place and directional tuning (Fig. 3E). These results support our theory that “spatial cells” might arise in any computational system, even in systems designed for non-spatial tasks (e.g., object recognition). Consistent with our view, we found no clear relationship between cell type distribution and spatial information in each layer.
This raises the possibility that “spatial cells” do not play a pivotal role in spatial tasks as is broadly assumed.
2.3 “Spatial Cells” Are Superfluous in Spatial Cognition
We have established that “spatial cells” can arise in non-spatial systems. Here, we consider whether units with spatial properties are necessary to decode spatial information. It may very well be that spatial units, akin to place cells, not only arise in complex (including random) networks but are also superfluous to spatial cognition. To test this possibility, we performed a systematic exclusion analysis to assess whether model units that exhibit the strongest spatial firing properties contribute to spatial knowledge. First, we scored and ranked each unit classified as a place, head-direction, or border cell unit (see Methods), and then re-trained linear decoders without the top n units of a specific cell type and evaluated the model’s corresponding spatial knowledge of the environment. That is, we tested the model’s ability to decode location, heading direction and distance to the closest border after excluding place, head-direction and border cell units, respectively. We repeated this procedure with a progressively higher exclusion ratio (see Methods).
In line with our hypothesis, excluding the spatial units in the models that scored highest on each of the corresponding criteria (place field activity, number of place fields, directional tuning, and border tuning) had minimal effect on decoding performance, even when a large proportion of the highest-ranked spatial units was excluded (Fig. 4B, top panel). To assess whether any detriment observed was significant and specific to excluding highly-ranked spatial units, we randomly excluded the same number of units without regard to the spatial score (Fig. 4B, bottom panel) and observed a similar pattern across the four exclusion scenarios, meaning that highly-tuned spatial units contributed no more to spatial cognition than a random selection of units.
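The pairing of a score-ranked exclusion with a size-matched random control can be sketched as follows. This is a minimal illustration rather than the procedure used for the reported results: the scores are hypothetical, the function names are ours, and the sketch ranks all units, whereas the analysis above ranks only units already classified as a given cell type.

```python
import random

def top_ranked_units(scores, ratio):
    """Indices of the highest-scoring units for a given exclusion ratio."""
    n_exclude = int(round(ratio * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n_exclude]

def random_matched_units(n_units, n_exclude, seed=0):
    """Random control: the same number of units, chosen without regard to score."""
    return random.Random(seed).sample(range(n_units), n_exclude)

# Hypothetical spatial scores for ten units (illustrative values only).
scores = [0.9, 0.1, 0.7, 0.3, 0.5, 0.2, 0.8, 0.4, 0.6, 0.0]
ranked = top_ranked_units(scores, ratio=0.3)
control = random_matched_units(len(scores), len(ranked))
print(ranked, control)
```

In both conditions the decoder would then be re-trained on the remaining units, so that any difference in error reflects which units were removed rather than how many.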
Finally, we performed an additional analysis where we excluded model units that contributed most to each of the tasks. Specifically, we identified each unit’s importance for each task, based on the magnitudes of the coefficients learned by the linear decoders (trained in Section 2.1), excluded the top n units, and re-trained linear decoders for each respective task using the remaining units. This approach had a greater impact on decoding error compared to spatial unit criteria or random exclusion, with performance suffering more as the exclusion ratio rose (see Figure 4C). Nevertheless, decoding was still possible at high exclusion ratios, suggesting that spatial information is widely distributed across network units.
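A minimal sketch of the coefficient-based ranking is given below. It assumes that a unit's importance is the summed absolute value of its decoder coefficients across targets; the text specifies only that coefficient magnitudes were used, so the aggregation rule and the function names are our assumptions.

```python
import numpy as np

def top_units_by_coefficient(W, n):
    """Indices of the n units with the largest summed absolute coefficients.

    W has shape (n_units, n_targets). Summing over targets is an assumption.
    """
    importance = np.abs(W).sum(axis=1)
    return np.argsort(importance)[::-1][:n]

def exclude_units(X, unit_idx):
    """Drop the given unit columns from the activation matrix X."""
    keep = np.setdiff1d(np.arange(X.shape[1]), unit_idx)
    return X[:, keep]

# Hypothetical decoder coefficients for four units and two targets.
W = np.array([[0.1, 0.0], [2.0, -1.0], [0.5, 0.5], [0.0, 0.0]])
X = np.arange(12).reshape(3, 4)          # placeholder activations
top = top_units_by_coefficient(W, n=2)
X_reduced = exclude_units(X, top)        # a decoder would be re-trained on this
print(top, X_reduced.shape)
```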
3 Discussion
The prevailing perspective in neuroscience conflates spatial knowledge and the variety of spatial cognitive abilities in animals with “spatial cells” in the hippocampal formation. In this work, we proposed an alternative perspective where “spatial cells” can simply arise in non-spatial computational systems with sufficient complexity, and that these units may, in fact, be by-products of such systems absent any leading role in spatial cognition.
Deep neural networks (DNNs) for object recognition served as an ideal testbed for our hypotheses. Object recognition DNNs, designed to generate translation-invariant visual features for successful categorization, lack inherent spatial elements. Unlike spatial navigation systems (e.g., Banino et al. 2018; Cueva and Wei 2018), DNNs in object recognition do not purposefully incorporate information about velocity, heading direction, or event sequences. Exploring various DNN architectures for object recognition allows a comprehensive examination of the connection between spatial representations and non-spatial complex systems.
We applied DNNs to a three-dimensional virtual environment to simulate an agent foraging in a two-dimensional space, akin to how an animal explores an enclosure. Detailed analyses revealed that the DNNs’ representations contained sufficient spatial information to decode key variables, despite these DNNs being non-spatial perceptual systems. We found that many DNN units passed established criteria for classification as place, head-direction, and border cells, suggesting such units can be easily found in various non-spatial computational systems. Furthermore, excluding the “spatial” units had minimal impact on spatial decoding tasks, suggesting that these units play a non-essential role in spatial cognition. These results call into question the utility of labelling these cell “types”.
We considered object recognition models from two prominent DNN families: deep convolutional networks, drawing inspiration from the mammalian visual system (Fukushima, 1980), and visual transformers, adapted from the Transformer architecture originally designed for natural language processing (Vaswani et al., 2017). Spatial information was readily decoded from models from both families, including untrained models with randomly initialized weights. The decoding results from the untrained visual transformer are particularly striking because transformers incorporate minimal inductive biases, such as constraints that respect spatial proximity (e.g., convolutions). In summary, complex networks that are not spatial systems, coupled with environmental input, appear sufficient to decode spatial information.
Our work raises the question of whether focusing on single cells and their “type” is the correct approach to understanding spatial cognition, or indeed any cognitive function. The field has maintained a persistent inclination towards identifying cell types based on firing profiles that are subjectively interpreted to be useful for the behavior of interest (Hartley et al., 2014; Moser et al., 2008; Sarel et al., 2017; Tsao et al., 2018; Høydal et al., 2019; Grieves et al., 2020; Ormond and O’Keefe, 2022). This is likely due to the historical backdrop in neuroscientific research where pioneers in visual neuroscience discovered and named individual cells in V1 by observing their firing profiles (Hubel and Wiesel, 1959), work that won the Nobel Prize (1981) and marked a foundational era for the field. This paradigm led to discoveries of place and grid cells (Nobel Prize in Physiology or Medicine, 2014), and the “Jennifer Aniston” neuron or “concept cells” (Quiroga et al., 2005), which were intuitively compelling and attractive as an apparent explanation. These cells may in fact play a role, but our work suggests that this must be critically assessed rather than assumed, as they may not play a privileged role compared to other cells in the population, and could even be superfluous for the context at hand. The general assumption that the simplicity and interpretability of firing profiles somehow lends support to a crucial role of these cells should be questioned. With a growing recognition of the imperative to move beyond mere cell type categorization, as underscored by Olshausen and Field (2006) on the insufficient characterization of a majority of V1 cells, there is a new shift to focus on a broader perspective where neural assemblies or populations form the fundamental computational unit (Hebb, 1949; Eichenbaum, 2018; Ebitz and Hayden, 2021).
We pose a broader question: do our preconceptions of how complex systems should work hinder our aim to understand the brain’s inner workings? Notably, spatial representations conventionally linked to the hippocampal formation have been found in sensory areas (Saleem et al., 2018; Long et al., 2021; Long and Zhang, 2021). While prior attributions pointed to modulatory signals from the hippocampus, our study raises concerns about potential biases stemming from preconceived notions and an underestimation of the complexity inherent in the sensory system; it may be shouldering more extensive responsibilities than initially presumed. Work in other domains is reaching similar conclusions (cf. McMahon and Isik 2023).
One upshot of our contribution is that the pursuit of identifying and cataloging cell types aligned with subjective notions of how the brain works might impede scientific progress. While searching for cell types was a reasonable strategy in an era before it was straightforward to conduct large-scale simulations, this quest should now be relegated to history. Otherwise, apparent discoveries of cell types are likely to reflect the activity of broad classes of complex networks that implement a variety of functions or none at all in the case of the random networks we considered. It is far too easy for neuroscientists, including computational neuroscientists who design their models to manifest a variety of cell types, to fool themselves when the underlying complexity of the systems considered is not taken into account.
4 Methods
4.1 Virtual environment
We set up our virtual environment using the Unity3D Engine (editor version: 2021.3.16f1; Silicon). The three-dimensional laboratory environment was adapted from an existing template purchased on the Unity Store (3DEverything, 2022). Our virtual agent’s movement is constrained to a two-dimensional square placed inside the three-dimensional space, much like a moving area for an animal in real-world experiments. The agent randomly moves around the square area and captures first-person pictures of the environment across locations and heading directions.
Environment specifications
The specifications for the environment, measured in Unity units, are as follows: the lab room has a height of 3.17 units, a width of 6.29 units, and a depth of 10.81 units. The moving area, which is a distinct part of the environment, has a height of 0.25 units, a width of 2 units, and a depth of 2 units. Additionally, the moving area is located at a specific location whose relative distances from the center to the lab room boundaries are: 3.59 units to the left wall, 5.35 units to the right wall, 5.55 units to the bottom wall, and 2.80 units to the front wall.
The agent is only allowed to move within the square moving area. For simplicity, we discretize the locations reachable by the agent into a 2D grid, where the agent can only move between points on the grid. We use a 17×17 grid (289 unique locations).
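As an illustration, the grid of reachable locations could be enumerated as in the sketch below. It assumes that grid points include the edges of the moving area and that the origin sits at one corner; neither detail is stated in the text, and the function name is ours.

```python
GRID_N = 17        # grid points per side (from the text)
AREA_SIZE = 2.0    # side length of the moving area in Unity units (from the text)

def grid_locations(n=GRID_N, size=AREA_SIZE):
    """Return (x, y) coordinates row by row.

    Assumption: points span the area edge to edge, origin at one corner.
    """
    step = size / (n - 1)  # spacing between neighbouring grid points
    return [(ix * step, iy * step) for iy in range(n) for ix in range(n)]

locations = grid_locations()
print(len(locations))   # number of unique locations
```

Under these assumptions, neighbouring grid points are 0.125 units apart.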
Visual input
To collect visual inputs from the virtual environment, the agent randomly moves along the specified n-by-n grid in the square area. We denote by l the total number of unique locations on the grid. The grid is evenly spaced. At each location, the agent turns around and captures m frames, each separated by a fixed angle (e.g., 15 degrees per frame). Each frame obtained by the agent is processed by a DNN (up to a given layer). Resulting outputs at the same location are considered independent data-points, each forming a long 1D vector. Repeating this procedure for all locations yields an M-by-F data matrix X, where M = l × m and F is the number of feature variables. Each feature variable represents a unit in the output layer of the DNN.
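The assembly of the data matrix can be sketched as follows. The function `fake_dnn_features` is a hypothetical stand-in for rendering a frame and passing it through a DNN layer, and the value of F is a small placeholder. Only the grid size and the number of views are taken from the text.

```python
import numpy as np

N_GRID = 17       # grid points per side (from the text)
N_VIEWS = 24      # frames per location, 15 degrees apart (from the text)
N_FEATURES = 8    # F: tiny placeholder; real layers have far more units

def fake_dnn_features(x, y, angle_deg, rng):
    """Hypothetical stand-in for a DNN layer output flattened to a 1D vector."""
    return rng.standard_normal(N_FEATURES)

rng = np.random.default_rng(0)
coords = np.linspace(0.0, 2.0, N_GRID)            # evenly spaced grid (assumed edge to edge)
angles = np.arange(N_VIEWS) * (360.0 / N_VIEWS)   # 0, 15, ..., 345 degrees

rows, targets = [], []
for y in coords:
    for x in coords:
        for a in angles:
            rows.append(fake_dnn_features(x, y, a, rng))
            targets.append((x, y, a))

X = np.stack(rows)       # shape (M, F) with M = l * m
T = np.array(targets)    # location and heading for each row
print(X.shape, T.shape)
```

With l = 289 locations and m = 24 views, the matrix has M = 6936 rows.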
4.2 Spatial decoding
To evaluate whether representations in the deep neural networks (DNN) optimized for object recognition on natural images encode spatial information, we set up a series of spatial decoding tasks where we use fixed network representations across various hidden layers to decode spatial location (i.e., coordinates), head-direction (i.e., rotation degrees) and distance (i.e., Euclidean) to the nearest border using visual inputs of views from unvisited locations and/or directions. We formulate spatial decoding as a prediction problem where we fit a set of linear regression models on representations of visual views to predict location, direction and distance to nearest borders. We define the nearest border to the agent as the border that has the shortest Euclidean distance to the agent’s location regardless of the agent’s heading direction.
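The nearest-border target can be computed directly from the agent's coordinates. The sketch below assumes that the borders are the four edges of the square moving area and that the origin is at one corner of that area; both are our assumptions for illustration.

```python
AREA_SIZE = 2.0  # side of the square moving area in Unity units (from the text)

def distance_to_nearest_border(x, y, size=AREA_SIZE):
    """Shortest distance from (x, y) to any of the four borders.

    Assumption: borders are the edges of the moving area, origin at a corner.
    The result does not depend on heading direction.
    """
    return min(x, size - x, y, size - y)

print(distance_to_nearest_border(1.0, 1.0))    # centre of the area
print(distance_to_nearest_border(0.25, 1.5))   # close to one edge
```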
Decoder training and testing
To fit and test spatial decoders, we randomly sample locations in the environment (subject to a sampling rate). When a location is sampled for training, we consider views of different rotations as independent data-points. For example, given a view, a decoder is fit to predict a target vector ti = [tx, ty, tr, tb], where tx, ty are the x, y coordinates, tr is the direction, and tb is the shortest distance to a border. For the entire training set (Xi, Ti) where Ti = {t1, t2, …, ti}, we can write the decoder as

$$\hat{T}_i = X_i W_i + b_i$$

where \hat{T}_i are the predicted targets, Wi are the coefficients and bi are the intercepts. Decoder weights (both coefficients and intercepts) are optimized to minimize the mean squared error between the true and predicted targets.
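A linear decoder of this kind can be fit in closed form. The sketch below uses ridge regression on synthetic data, with an arbitrary penalty strength and an unpenalized intercept; these choices, and all names in the code, are assumptions for illustration rather than the settings used for the reported results. Fitting the four targets jointly with a shared penalty is equivalent to fitting one ridge model per target.

```python
import numpy as np

def fit_ridge(X, T, alpha=1.0):
    """Closed-form ridge regression; the intercept is not penalised (assumption)."""
    x_mean, t_mean = X.mean(axis=0), T.mean(axis=0)
    Xc, Tc = X - x_mean, T - t_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    W = np.linalg.solve(A, Xc.T @ Tc)    # coefficients, shape (F, n_targets)
    b = t_mean - x_mean @ W              # intercepts, shape (n_targets,)
    return W, b

def predict(X, W, b):
    """Apply the linear decoder to new activations."""
    return X @ W + b

# Synthetic, noise-free placeholder data standing in for unit activations.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
W_true = rng.standard_normal((10, 4))    # 4 targets: x, y, direction, border distance
T = X @ W_true + 0.5

W, b = fit_ridge(X[:150], T[:150], alpha=1e-6)
mse = np.mean((predict(X[150:], W, b) - T[150:]) ** 2)
print(W.shape, b.shape, mse)
```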
Feature representation and selection
We consider a number of representations for visual inputs. Given a DNN model, we train separate decoders using representations across different layers. We apply L2 regularization on the decoder weights to reduce overfitting. Additionally, we consider different exclusion strategies (see Sec. 4.4) where unit representations are selectively removed based on their firing profiles (e.g., the number of place fields a unit contains).
Performance measure
We report the normalized decoding error on the heldout set of location, head-direction and nearest border decodings for each representation at a given sampling rate. The normalized decoding error reflects the deviation of the model’s prediction on location, heading direction or distance to the nearest wall relative to the ground truth (i.e., in terms of physical unit of space or angle in the virtual environment) and normalized with respect to the maximal length of the moving area (2 units in both depth and width). To reflect variance of the average error, we compute a two-sided 95% confidence interval of the average using a bootstrapping approach over locations.
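For location decoding, the normalized error and its confidence interval could be computed along the following lines. The percentile bootstrap, the number of resamples, and the function names are our assumptions; the text specifies only that a bootstrapping approach over locations was used.

```python
import math
import random

AREA_SIZE = 2.0  # maximal length of the moving area in Unity units (from the text)

def normalized_location_error(pred, true, size=AREA_SIZE):
    """Euclidean error between predicted and true (x, y), divided by the area size."""
    return math.dist(pred, true) / size

def bootstrap_ci(per_location_errors, n_boot=2000, seed=0):
    """Two-sided 95% percentile interval of the mean, resampling locations."""
    rng = random.Random(seed)
    n = len(per_location_errors)
    means = sorted(
        sum(rng.choices(per_location_errors, k=n)) / n for _ in range(n_boot)
    )
    return means[int(0.025 * n_boot)], means[int(0.975 * n_boot) - 1]

err = normalized_location_error((1.1, 1.0), (1.0, 1.0))
errors = [0.02, 0.04, 0.06, 0.08] * 10   # hypothetical per-location errors
low, high = bootstrap_ci(errors)
print(err, low, high)
```

A prediction that is 0.1 units away from the true location in the 2-unit area gives a normalized error of about 0.05, matching the 5% example in the Results.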
Baselines
We compare decoding performance achieved by our agent to baselines where the agent decodes location, head-direction and distance to border based on fixed policies independent of visual inputs. Specifically, we establish two baselines where the agent either predicts randomly within the legal bounds or predicts the centre position of the room (when predicting location), 90 degrees (when predicting rotation), and the distance from the centre of the room (when predicting nearest border distance).
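The two location baselines can be sketched as below. The sketch assumes that the centre prediction refers to the centre of the 2 × 2 moving area and that random predictions are drawn uniformly within that area; both are our assumptions, as are the function names.

```python
import math
import random

AREA_SIZE = 2.0   # side of the moving area in Unity units (from the text)
GRID_N = 17       # grid points per side (from the text)

def centre_baseline():
    """Always predict the centre of the moving area (assumed reference frame)."""
    return (AREA_SIZE / 2, AREA_SIZE / 2)

def random_baseline(rng):
    """Predict a uniformly random location within the legal bounds."""
    return (rng.uniform(0, AREA_SIZE), rng.uniform(0, AREA_SIZE))

step = AREA_SIZE / (GRID_N - 1)
grid = [(ix * step, iy * step) for iy in range(GRID_N) for ix in range(GRID_N)]

rng = random.Random(0)
centre_err = sum(math.dist(centre_baseline(), p) for p in grid) / len(grid) / AREA_SIZE
random_err = sum(math.dist(random_baseline(rng), p) for p in grid) / len(grid) / AREA_SIZE
print(centre_err, random_err)
```

Both strategies ignore the visual input, so any decoder that uses the DNN representations meaningfully should fall below these error levels.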
4.3 Profiling spatial properties of model units
To gain a more intuitive understanding of the kinds of model units that are most useful for spatial decoding tasks, we profile all model units from the layers we consider based on their spatial patterns of firing. We consider the most common cell types that are found to encode spatial information as defined in the neuroscience literature, which include place cells, border cells and head-direction cells. For each cell type, we track a number of measures defined below.
Place cells
In our investigation, the identification of place-cell-like model units hinges on several key characteristics of unit firing. First, we examine the number of place fields exhibited by each unit, defining a qualifying field as one whose 2D extent ranges from 10 pixels to half the size of the environment, as outlined in Tanni et al. (2022). Additionally, we consider the maximal activation of each field, which indicates the intensity of unit firing.
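One way to count qualifying fields is to label connected regions of above-threshold activation and keep those within the size limits. The sketch below makes several assumptions not stated in the text: the activation threshold, the use of 4-connectivity, and the reading of "size" as the number of bins in a region. The function name and the toy activity map are hypothetical.

```python
from collections import deque


def count_place_fields(activity, threshold, min_size=10, max_size=None):
    """Count connected above-threshold regions whose area lies in [min_size, max_size].

    Returns (number_of_fields, peak_activation_of_each_field).
    """
    n_rows, n_cols = len(activity), len(activity[0])
    if max_size is None:
        max_size = (n_rows * n_cols) // 2  # half the size of the environment
    seen = [[False] * n_cols for _ in range(n_rows)]
    peaks = []
    for r in range(n_rows):
        for c in range(n_cols):
            if seen[r][c] or activity[r][c] <= threshold:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            queue = deque([(r, c)])
            seen[r][c] = True
            size, peak = 0, activity[r][c]
            while queue:
                i, j = queue.popleft()
                size += 1
                peak = max(peak, activity[i][j])
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < n_rows and 0 <= nj < n_cols
                            and not seen[ni][nj] and activity[ni][nj] > threshold):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            if min_size <= size <= max_size:
                peaks.append(peak)
    return len(peaks), peaks


# Toy 12 x 12 map: one 16-bin field (qualifies) and one 4-bin blob (too small).
activity = [[0.0] * 12 for _ in range(12)]
for r in range(1, 5):
    for c in range(1, 5):
        activity[r][c] = 1.0
activity[2][2] = 2.0
for r in range(8, 10):
    for c in range(8, 10):
        activity[r][c] = 1.0

n_fields, peaks = count_place_fields(activity, threshold=0.5)
```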
Head-direction cells
Following Banino et al. (2018), the degree of directional tuning of each unit was assessed using the length of the resultant vector of its directional activity map. In our case, there is an activity map spanning the entire 2D space for each direction. A vector was created for each direction of the activity map:
$$\vec{v}_i = \left[\beta_i \cos \alpha_i,\; \beta_i \sin \alpha_i\right]$$
where $\alpha_i$ and $\beta_i$ are, respectively, the angle and average intensity (irrespective of spatial location) of direction $i$ in the activity map. These vectors were averaged to generate a mean resultant vector:
$$\vec{r} = \frac{\sum_{i=1}^{N} \vec{v}_i}{\sum_{i=1}^{N} \beta_i}$$
and the length of the resultant vector was calculated as the magnitude of $\vec{r}$. We used $N = 24$ uniformly spaced angular directions.
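The resultant-vector length can be computed in a few lines. This sketch assumes the intensity-weighted normalization of Banino et al. (2018) as we read it (summed direction vectors divided by summed intensities) and non-negative intensities. The function name and the two example tuning profiles are hypothetical.

```python
import math


def resultant_vector_length(intensities):
    """Length of the mean resultant vector for uniformly spaced directions.

    intensities[i] is the average activation (over all locations) for direction i.
    Assumes non-negative values, so the result lies in [0, 1].
    """
    n = len(intensities)
    total = sum(intensities)
    if total == 0:
        return 0.0  # a silent unit has no directional tuning
    x = sum(b * math.cos(2 * math.pi * i / n) for i, b in enumerate(intensities))
    y = sum(b * math.sin(2 * math.pi * i / n) for i, b in enumerate(intensities))
    return math.hypot(x, y) / total


# 24 directions, as in the text.
untuned = [1.0] * 24   # equal activation in every direction
tuned = [0.0] * 24
tuned[6] = 3.0         # active for a single direction only

untuned_length = resultant_vector_length(untuned)
tuned_length = resultant_vector_length(tuned)
```

A unit that responds equally in all directions scores near 0, and a unit that responds in a single direction scores 1.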
Border cells
The border score follows Banino et al. (2018): for each of the four walls of the square enclosure, the average activation for that wall, $b_i$, was compared to the average centre activity $c$ to obtain a border score for that wall, and the maximum was used as the border score for the unit:
$$b_s = \max_{i \in \{1,2,3,4\}} \frac{b_i - c}{b_i + c}$$
where $b_i$ is the mean activation for bins within distance $d_b$ of the $i$-th wall and $c$ is the average activity for bins further than $d_b$ bins from any wall ($d_b = 3$). Units with a border score > 0.5 are considered border-like.
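As an illustration, the border score can be computed from a 2D activity map as below. This is a sketch under assumptions: the score for each wall is taken as (wall mean minus centre mean) divided by (wall mean plus centre mean), per our reading of Banino et al. (2018); "within distance" is treated as the first or last 3 rows or columns of bins; and activations are non-negative. The function name and the toy map are hypothetical.

```python
def border_score(activity, d_b=3):
    """Maximum over the four walls of (b_i - c) / (b_i + c)."""
    n_rows, n_cols = len(activity), len(activity[0])
    if n_rows <= 2 * d_b or n_cols <= 2 * d_b:
        raise ValueError("activity map too small to define a centre region")

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    # Mean activation of the bins adjacent to each wall.
    walls = [
        mean(activity[r][c] for r in range(d_b) for c in range(n_cols)),                   # top
        mean(activity[r][c] for r in range(n_rows - d_b, n_rows) for c in range(n_cols)),  # bottom
        mean(activity[r][c] for r in range(n_rows) for c in range(d_b)),                   # left
        mean(activity[r][c] for r in range(n_rows) for c in range(n_cols - d_b, n_cols)),  # right
    ]
    # Mean activation of bins further than d_b bins from every wall.
    centre = mean(activity[r][c]
                  for r in range(d_b, n_rows - d_b)
                  for c in range(d_b, n_cols - d_b))
    scores = [(b - centre) / (b + centre) for b in walls if b + centre != 0]
    return max(scores) if scores else 0.0


# Toy 12 x 12 map: strong activation along the left wall, weak elsewhere.
activity = [[1.0 if c < 3 else 0.1 for c in range(12)] for r in range(12)]
score = border_score(activity)
is_border_like = score > 0.5
```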
4.4 Spatial decoding with unit exclusion
To delve deeper into how different units within the DNN models contribute to the linear decoding of spatial information, we employ two complementary analyses. Model units are selectively excluded according to the criteria detailed below. Spatial decoders are then retrained on the remaining model units following the same procedure as before (Sec. 4.2).
Excluding units by spatial profile
In this first analysis, we selectively exclude units based on their firing profiles, which fall into the spatial unit categories described earlier. We first rank all units in descending order of a given spatial measure (e.g., field count) and then systematically increase the exclusion rate, assessing the impact on the corresponding downstream task as we go. For example, we progressively exclude the top n% of units with the most place-like characteristics for varying values of n, examining the relationship between exclusion and location-decoding performance. This process is carried out separately for each unit type and its associated task, and we also include a control condition in which an equivalent number of units is excluded at random.
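The ranking-and-exclusion step, with its random control, can be sketched as follows. Function names, the rounding of the exclusion count, the tie-breaking order, and the example field counts are assumptions for illustration. Retraining the decoder on the kept units is not shown.

```python
import random


def exclude_top_fraction(scores, fraction):
    """Indices of units kept after dropping the top `fraction` ranked by score (descending)."""
    n_exclude = int(round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda u: scores[u], reverse=True)
    excluded = set(ranked[:n_exclude])
    return [u for u in range(len(scores)) if u not in excluded]


def exclude_random(n_units, fraction, seed=0):
    """Control: indices kept after dropping the same number of units at random."""
    rng = random.Random(seed)
    n_exclude = int(round(fraction * n_units))
    excluded = set(rng.sample(range(n_units), n_exclude))
    return [u for u in range(n_units) if u not in excluded]


# Hypothetical place-field counts for ten units.
field_counts = [0, 3, 1, 0, 2, 5, 0, 1, 4, 0]

kept_by_profile = exclude_top_fraction(field_counts, 0.3)
kept_at_random = exclude_random(len(field_counts), 0.3)
```

In this toy case the three units with the most fields (units 5, 8, and 1) are removed, and the control removes three units chosen at random.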
Excluding units by task-relevance
Our second exclusion analysis focuses on identifying the task-relevance of each model unit by utilizing previously learned decoders and their coefficient magnitudes. Essentially, units with larger coefficients are deemed more crucial for decoding, while those with smaller or zero coefficients are considered less relevant. Similar to the exclusion by spatial profile, we gradually increase the exclusion rate by selectively removing units based on the magnitude of their decoder coefficients. Additionally, we conduct a control analysis where an equal number of units are randomly chosen for exclusion. We then retrain linear decoders with the remaining units and assess their performance on downstream spatial tasks.
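A coefficient-based ranking might look like the sketch below. The text does not specify how magnitudes are aggregated across decoder outputs or in which order units are removed. Summing absolute coefficients over targets and removing the largest-magnitude units first are both assumptions here, as are the function names and example weights.

```python
def unit_relevance(coefficients):
    """Relevance of each unit: sum of absolute decoder coefficients across targets.

    coefficients[k][u] is the weight of unit u for target k.
    """
    n_units = len(coefficients[0])
    return [sum(abs(row[u]) for row in coefficients) for u in range(n_units)]


def exclude_most_relevant(coefficients, fraction):
    """Indices of units kept after dropping the top `fraction` by relevance."""
    relevance = unit_relevance(coefficients)
    n_exclude = int(round(fraction * len(relevance)))
    ranked = sorted(range(len(relevance)), key=lambda u: relevance[u], reverse=True)
    excluded = set(ranked[:n_exclude])
    return [u for u in range(len(relevance)) if u not in excluded]


# Hypothetical coefficients of a location decoder (targets x and y) over four units.
coefficients = [
    [0.9, -0.1, 0.0, 0.2],    # weights for x
    [-0.8, 0.05, 0.0, 0.3],   # weights for y
]

relevance = unit_relevance(coefficients)
kept_units = exclude_most_relevant(coefficients, 0.5)
```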
Data and code availability
The code for simulations and analyses is publicly available at https://github.com/don-tpanic/Space.
Acknowledgements
This work was supported by ESRC (ES/W007347/1), Wellcome Trust (WT106931MA), and a Royal Society Wolfson Fellowship (18302) to B.C.L., and the Medical Research Council UK (MC UU 00030/7) and a Leverhulme Trust Early Career Fellowship (Leverhulme Trust, Isaac Newton Trust: SUAI/053 G100773, SUAI/056 G105620, ECF-2019-110) to R.M.M.
References
- 1. Hospital Laboratory | 3D Interior | Unity Asset Store. https://assetstore.unity.com/packages/3d/props/interior/hospital-laboratory-54382
- 2. Vector-based navigation using grid-like representations in artificial agents. Nature 557:429–433. https://doi.org/10.1038/s41586-018-0102-6
- 3. Emergence of grid-like representations by training recurrent neural networks to perform spatial localization. arXiv. http://arxiv.org/abs/1803.07770
- 4. Grid and Nongrid Cells in Medial Entorhinal Cortex Represent Spatial Location and Environmental Features with Complementary Coding Schemes. Neuron 94:83–92. https://doi.org/10.1016/j.neuron.2017.03.004
- 5. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv. https://doi.org/10.48550/arxiv.2010.11929
- 6. The reorganization and reactivation of hippocampal maps predict spatial memory performance. Nature Neuroscience 13:995–1002. https://doi.org/10.1038/nn.2599
- 7. The population doctrine in cognitive neuroscience. Neuron 109:3055–3068. https://doi.org/10.1016/j.neuron.2021.07.011
- 8. Perspectives on 2014 Nobel Prize. Hippocampus 25:679–681. https://doi.org/10.1002/hipo.22445
- 9. Barlow versus Hebb: When is it time to abandon the notion of feature detectors and adopt the cell assembly as the unit of cognition? Neuroscience Letters 680:88–93. https://doi.org/10.1016/j.neulet.2017.04.006
- 10. Seeing it all: Convolutional network layers map the function of the human visual system. NeuroImage 152:184–194. https://doi.org/10.1016/J.NEUROIMAGE.2016.10.001
- 11. Slowness and Sparseness Lead to Place, Head-Direction, and Spatial-View Cells. PLoS Computational Biology 3. https://doi.org/10.1371/journal.pcbi.0030166
- 12. From grids to places. Journal of Computational Neuroscience 22:297–299. https://doi.org/10.1007/s10827-006-0013-7
- 13. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36:193–202. https://doi.org/10.1007/BF00344251
- 14. The representation of space in the brain. Behavioural Processes 135:113–131. https://doi.org/10.1016/j.beproc.2016.12.012
- 15. The place-cell representation of volumetric space in rats. Nature Communications 11. https://doi.org/10.1038/s41467-020-14611-7
- 16. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience 35:10005–10014. https://doi.org/10.1523/JNEUROSCI.5023-14.2015
- 17. Microstructure of a spatial map in the entorhinal cortex. Nature 436:801–806. https://www.nature.com/articles/nature03721
- 18. Medial Entorhinal Cortex Lesions Only Partially Disrupt Hippocampal Place Cells and Hippocampus-Dependent Place Memory. Cell Reports 9:893–901. https://doi.org/10.1016/j.celrep.2014.10.009
- 19. Space in the brain: how the hippocampal formation supports spatial cognition. Philosophical Transactions of the Royal Society B: Biological Sciences 369. https://doi.org/10.1098/rstb.2012.0510
- 20. Deep residual learning for image recognition. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society. https://doi.org/10.1109/CVPR.2016.90
- 21. The Organization of Behavior: A Neuropsychological Theory, 2002 edn. New York: Psychology Press
- 22. Accumulation of Hippocampal Place Fields at the Goal Location in an Annular Watermaze Task. The Journal of Neuroscience 21:1635–1644. https://doi.org/10.1523/JNEUROSCI.21-05-01635.2001
- 23. Receptive fields of single neurones in the cat’s striate cortex. The Journal of Physiology 148:574–591. https://doi.org/10.1113/JPHYSIOL.1959.SP006308
- 24. Object-vector coding in the medial entorhinal cortex. Nature 568:400–404. https://doi.org/10.1038/s41586-019-1077-7
- 25. Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation. PLoS Computational Biology 10. https://doi.org/10.1371/journal.pcbi.1003915
- 26. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems
- 27. The emergence of grid cells: Intelligent design or just adaptation? Hippocampus 18:1256–1269. https://doi.org/10.1002/hipo.20520
- 28. Interspike Intervals Reveal Functionally Distinct Cell Populations in the Medial Entorhinal Cortex. Journal of Neuroscience 35:10963–10976. https://doi.org/10.1523/JNEUROSCI.0276-15.2015
- 29. Boundary Vector Cells in the Subiculum of the Hippocampal Formation. The Journal of Neuroscience 29:9771–9777. https://doi.org/10.1523/JNEUROSCI.1319-09.2009
- 30. A novel somatosensory spatial navigation system outside the hippocampal formation. Cell Research 31:649–663. https://doi.org/10.1038/s41422-020-00448-8
- 31. A compact spatial map in V2 visual cortex. Preprint, Neuroscience. https://doi.org/10.1101/2021.02.11.430687
- 32. Learning Causes Reorganization of Neuronal Firing Patterns to Represent Related Experiences within a Hippocampal Schema. Journal of Neuroscience 33:10243–10256. https://doi.org/10.1523/JNEUROSCI.0879-13.2013
- 33. Seeing social interactions. Trends in Cognitive Sciences 27:1165–1179. https://doi.org/10.1016/j.tics.2023.09.001
- 34. Dead Reckoning, Landmark Learning, and the Sense of Direction: A Neurophysiological and Computational Hypothesis. Journal of Cognitive Neuroscience 3:190–202
- 35. A non-spatial account of place and grid cells based on clustering models of concept learning. Nature Communications 10. https://doi.org/10.1038/S41467-019-13760-8
- 36. A multilevel account of hippocampal function in spatial and concept learning: Bridging models of behavior and neural assemblies. Science Advances 9. https://doi.org/10.1126/sciadv.ade6903
- 37. Place Cells, Grid Cells, and the Brain’s Spatial Representation System. Annual Review of Neuroscience 31:69–89. https://doi.org/10.1146/annurev.neuro.31.061307.090723
- 38. Explaining heterogeneity in medial entorhinal cortex with task-driven neural networks. In: Advances in Neural Information Processing Systems. Neuroscience. https://doi.org/10.1101/2021.10.30.466617
- 39. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Research 34:171–175. https://doi.org/10.1016/0006-8993(71)90358-1
- 40. What Is the Other 85 Percent of V1 Doing? In: 23 Problems in Systems Neuroscience. New York: Oxford University Press, 182–212. https://doi.org/10.1093/acprof:oso/9780195148220.003.0010
- 41. Hippocampal place cells have goal-oriented vector fields during navigation. Nature 607:741–746. https://doi.org/10.1038/s41586-022-04913-9
- 42. Invariant visual representation by single neurons in the human brain. Nature 435:1102–1107. https://doi.org/10.1038/nature03687
- 43. Coherent encoding of subjective spatial position in visual cortex and hippocampus. Nature 562:124–127. https://doi.org/10.1038/s41586-018-0516-1
- 44. Vectorial representation of spatial goals in the hippocampus of bats. Science 355:176–180. https://doi.org/10.1126/science.aak9589
- 45. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv. http://arxiv.org/abs/1409.1556
- 46. Deep inside convolutional networks: Visualising image classification models and saliency maps. 2nd International Conference on Learning Representations, ICLR 2014 Workshop Track Proceedings, pp 1–8
- 47. The hippocampus as a predictive map. Nature Neuroscience 20:1643–1653. https://doi.org/10.1038/nn.4650
- 48. Anatomical Organization and Spatiotemporal Firing Patterns of Layer 3 Neurons in the Rat Medial Entorhinal Cortex. Journal of Neuroscience 35:12346–12354. https://doi.org/10.1523/JNEUROSCI.0696-15.2015
- 49. State transitions in the statistically stable place cell population correspond to rate of perceptual change. Current Biology 32:3505–3514. https://doi.org/10.1016/j.cub.2022.06.046
- 50. Head-direction cells recorded from the postsubiculum in freely moving rats. II. Effects of environmental manipulations. The Journal of Neuroscience 10:436–447. https://doi.org/10.1523/JNEUROSCI.10-02-00436.1990
- 51. Integrating time from experience in the lateral entorhinal cortex. Nature 561:57–62. https://doi.org/10.1038/s41586-018-0459-6
- 52. Attention is all you need. In: Advances in Neural Information Processing Systems
- 53. Perseveration on place reversals in spatial swimming pool tasks: Further evidence for place learning in hippocampal rats. Hippocampus 7:361–370. https://doi.org/10.1002/(SICI)1098-1063(1997)7:4<361::AID-HIPO2>3.0.CO;2-M
- 54. Hippocampal lesions and path integration. Current Opinion in Neurobiology 7:228–234. https://doi.org/10.1016/S0959-4388(97)80011-6
- 55. The Tolman-Eichenbaum Machine: Unifying Space and Relational Memory through Generalization in the Hippocampal Formation. Cell 183:1249–1263. https://doi.org/10.1016/J.CELL.2020.10.024
- 56. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences of the United States of America 111:8619–8624. https://doi.org/10.1073/pnas.1403112111
- 57. Orthogonal Representations of Object Shape and Category in Deep Convolutional Neural Networks and Human Visual Cortex. Scientific Reports 10:1–12. https://doi.org/10.1038/s41598-020-59175-0
Article and author information
Copyright
© 2024, Luo et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.