1 Introduction

Spatial cognition encompasses our cognitive abilities to know where we are and navigate in complex environments, which includes self-localization, comprehension of environmental layouts, and efficient navigation toward distant goal locations. The prevailing view of the field attributes these functions to neural machinery within the hippocampal formation, which contains a diverse array of “spatial” cell types including place cells, head-direction cells, border cells, and grid cells (e.g., Hartley et al. 2014; Moser et al. 2008; Grieves and Jeffery 2017), and holds that these cells collectively form the foundation of an internal cognitive map of space that underlies our spatial abilities (O’Keefe and Dostrovsky, 1971; Taube et al., 1990; Lever et al., 2009; Hafting et al., 2005). As such, a substantial proportion of the field focuses on characterizing particular functional “cell types” based on neural activity patterns in these regions. However, these “spatial cells” were identified because their intriguing firing patterns piqued the interest of neuroscientists, and they were later defined based on subjective criteria – place cells appear to encode particular locations, border cells respond to borders, head-direction cells are sensitive to heading direction, grid cells exhibit firing fields in a grid-like pattern. In practice, the field’s initial fascination with these cells has led to a stubborn tendency to focus on “cell type” classification and to overlook the fact that the criteria for distinct cell types seldom align seamlessly with empirical data, with cells often exhibiting mixed selectivity across task and environmental variables (e.g., Hollup et al. 2001; McKenzie et al. 2013; Dupret et al. 2010; Eichenbaum 2015; Grieves and Jeffery 2017; Latuske et al. 2015; Tang et al. 2015), raising questions about their precise roles, if any.

Could these cells simply arise from any system performing complex computations, unrelated to processes specific to space or navigation? There are strong hints that “spatial” cells may arise simply from domain-general learning algorithms that can give rise to both conceptual and spatial neural representations (Mok and Love, 2019, 2023; Stachenfeld et al., 2017; Whittington et al., 2020). Indeed, models show that spatial firing patterns can arise from factors entirely unrelated to spatial cognition, such as sparseness constraints (Kropff and Treves, 2008; Franzius et al., 2007a,b).

Another question is whether these cells are particularly important for spatial cognition. Research has shown that “spatial” cells may not be functionally more critical than cells with multiplexed or hard-to-interpret firing patterns, whether for spatial localization (Diehl et al., 2017) or for explaining neuronal response profiles in the medial entorhinal cortex (Nayebi et al., 2021). Even lesion studies of the hippocampal formation do not consistently reveal specific navigation impairments (Hales et al., 2014; Whishaw and Tomie, 1997; Whishaw et al., 1997).

Considering that both biological and artificial “cells” rarely align perfectly with human-defined criteria, with many cells exhibiting “uninterpretable” firing patterns, one unsettling possibility is that neuroscience may have inadvertently constrained its pursuit by fixating on the search for interpretable “cell types”. By adopting this top-down, perhaps naïve approach, the field may be investing a large proportion of its resources in a potentially fruitless quest, ignoring other explanations for the emergence of spatial cells and the foundations of spatial cognition.

A harsh appraisal of the field is that it looks for whatever cell type superficially reflects the answer sought. For example, how do animals localize themselves? The naive answer is that there must be place cells. Setting aside that this answer provides zero insight into how such a cell came to be, there is no reason for the brain to perfectly align with our intuitions and, equally problematic, cells with these properties might arise yet not serve the top-down function neuroscientists ascribe to them. Ascribing interpretable functions to cells and naming them may be good for neuroscience careers, but is it good for neuroscience?

In this contribution, we offer an alternative to the prevailing notion that cells with readily interpretable receptive fields in regions like the hippocampus underlie spatial systems. We propose that what we commonly refer to as “spatial cells” might be inevitable derivatives of general computational mechanisms, and are in no way intrinsically spatial. Our results demonstrate that these cells could manifest within any sufficiently rich information processing system, such as perception. Remarkably, not only do we find spatial signals within a non-spatial system, but we also demonstrate that these purported signals, ostensibly representing spatial knowledge, do not appear to have a privileged role for tangible downstream tasks related to spatial cognition as previously claimed.

To evaluate these possibilities, we turned to deep neural networks (DNNs), which are information processing systems devoid of components dedicated to spatial processing. Specifically, we analyzed deep learning models of perception with hundreds of millions of parameters. DNNs trained on natural images have achieved human-level performance in recognizing real-world objects in photographs (Krizhevsky et al., 2012; Simonyan and Zisserman, 2015). These models exhibit strong parallels with representations in the primate ventral visual stream (Khaligh-Razavi and Kriegeskorte, 2014; Yamins et al., 2014; Güçlü and van Gerven, 2015; Eickenberg et al., 2017; Zeman et al., 2020). Our investigation aims to determine whether processing egocentric visual information within such complex – yet non-spatial – models can lead to brain-like representations of allocentric spatial environments, and whether these representations are essential for spatial cognition. Additionally, we considered whether untrained DNNs (absent any experience) can account for the emergence of spatial cells and spatial cognition.

To evaluate functional spatial information in perceptual DNN models, we began by creating a three-dimensional virtual environment reminiscent of a laboratory used in animal studies in neuroscience (Fig. 1). In this virtual setting, an agent randomly forages in a two-dimensional square area, much like an animal would explore an enclosure. The agent “sees” images of the room within its field of view over many locations and heading directions. Images from the first-person perspective are processed by a DNN, and we assess whether the internal representations of the model contain various kinds of spatial knowledge. We adopted a decoding framework that parallels experimental techniques employed by neuroscientists. Specifically, we trained linear regression models using various levels of representations (unit activations from visual inputs) generated by DNNs of different architectures to decode navigation-relevant variables, including self-location, heading direction, and distance to the closest wall, at locations and heading directions not encountered during training. To test whether DNN units carry information like “spatial” cells in the brain, we classified and visualized model units based on standard criteria for place, direction, and border cells (Tanni et al., 2022; Banino et al., 2018). To evaluate whether these cells play a privileged role in spatial cognition, we excluded units classified as spatial based on traditional spatial cell criteria and assessed the spatial knowledge that remained.

Assessing spatial knowledge in non-spatial perception systems using a linear decoding approach in a virtual environment.

(A) A three-dimensional virtual space is created to resemble a realistic laboratory environment with a variety of visual features. An agent moves randomly in a two-dimensional area within the three-dimensional space and processes first-person views of the environment. The central image shows the three-dimensional environment. Surrounding images are example views taken by the agent at different locations and heading directions. (B) Top-down view of the area the agent can explore. We define the spatial knowledge of the agent with four values. The agent’s location is denoted by the Cartesian coordinates $t_x, t_y$. The agent’s heading direction is denoted by the angle $t_r$. The distance between the agent and the nearest wall is $t_b$. (C) Individual views are processed by perception models (deep neural networks for object recognition). We train linear regression models with various levels of internal representations from these networks to assess spatial knowledge related to self-location ($t_x, t_y$), heading direction ($t_r$) and distance to the closest wall ($t_b$).

To foreshadow our findings, we discovered that DNNs, regardless of their architectural variations, representation levels, or training states, possess noteworthy proficiency in allocentric spatial understanding. Spatial variables including location, heading direction and distance to boundaries can be accurately decoded from the DNN models. Additionally, a substantial number of DNN units exhibit spatial firing patterns that align with the criteria used in neuroscience to classify place, head-direction, and border cells. Notably, a significant portion of these units exhibits mixed selectivity and conjunctive coding, akin to characteristics found in hippocampal neurons (Eichenbaum, 2015). We further revealed that excluding units meeting the classical criteria of spatial cell types has minimal impact on spatial decoding performance, reinforcing the perspective that the code underpinning spatial cognition is more distributed and diverse than previously thought, relying on neural population coding rather than individual and specialized cell types (Hebb, 1949; Eichenbaum, 2018; Ebitz and Hayden, 2021). In summary, our analyses suggest that “spatial” cell types may be inevitable and superfluous in complex information processing systems.

2 Results

2.1 Spatial Knowledge through Non-spatial Systems

We predict that spatial knowledge can arise in computational systems of sufficient complexity absent a spatial grounding. We test this hypothesis by analyzing the percepts of an agent freely moving in a virtual environment. The percepts take the form of viewpoints fed into a deep neural network whose millions of weights are either randomly initialized or trained for object recognition, a non-spatial task. From various layers of this complex computational system, we attempted to decode spatial information such as location.

To assess information latent in complex networks, we trained linear regression models to decode the agent’s location, heading direction and distance to the closest border using fixed unit activity extracted at various levels of the networks. While sequential information is important for spatial systems, our decoding framework does not utilize sequential information from sampled views during training, making it a stricter test of the central claim that spatial knowledge can form out of general information processing. We expect that one’s location, heading direction and distance from a border may be decodable from visual information alone due to the inherent continuity of perception across spatial locations and/or directions. All results reported here are tested on out-of-sample locations and views using the perception model VGG-16 (Simonyan and Zisserman, 2015), unless noted otherwise. For details of the training and testing procedures, see Methods. For results of other models such as ResNet-50 (He et al., 2016) and Vision Transformers (ViT; Dosovitskiy et al. 2020), see Appendix.
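As a rough illustration of this decoding setup, consider the following minimal sketch. It is not the exact pipeline used here: the torchvision model loading, the chosen layer index, the ridge penalty, and the view-level train/test split are all assumptions (the analyses reported here hold out whole locations and views).

    # Minimal sketch of the decoding framework (assumed details noted above).
    import numpy as np
    import torch
    from torchvision import models, transforms
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    vgg = models.vgg16(weights="IMAGENET1K_V1").eval()      # frozen perception model
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    def layer_activations(images, layer_idx=24):
        """Flattened activations of one VGG-16 convolutional stage for a list of PIL images."""
        x = torch.stack([preprocess(im) for im in images])
        with torch.no_grad():
            feats = vgg.features[:layer_idx](x)
        return feats.flatten(start_dim=1).numpy()            # shape: (n_views, n_units)

    def fit_and_evaluate(X, T, train_frac=0.3):
        """X: unit activations per view; T: targets [t_x, t_y, t_r, t_b] per view."""
        X_tr, X_te, T_tr, T_te = train_test_split(X, T, train_size=train_frac)
        decoder = Ridge(alpha=1.0).fit(X_tr, T_tr)           # L2-regularized linear decoder
        pred = decoder.predict(X_te)
        return np.abs(pred - T_te).mean(axis=0)              # mean absolute error per target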

We find that spatial knowledge, namely one’s own location (spatial coordinates; Fig. 2B, left panel), heading direction (Fig. 2B, middle panel) and distance to the nearest wall (Fig. 2B, right panel) can be successfully decoded from unit activity in a perception model. We quantified the degree of spatial knowledge by the decoding error, that is, the deviation of the model’s prediction of its own location, heading direction, or distance to the nearest wall relative to the ground truth (i.e., the physical unit of space or angle in the virtual environment). We normalized decoding error with respect to the maximal unit length of the moving area (see Methods). For example, a normalized error of 0.05 for location prediction means the decoded location is 5% off from the true location relative to the maximal width/depth of the entire moving area. As more locations and views are sampled for training the decoders (x-axis), decoding performance improves. The agent demonstrates impressive decoding performance across all spatial knowledge tests, even when trained on just 30% of randomly selected independent locations and associated viewpoints within the entire space. The moving area encompasses 289 distinct locations, each featuring 24 evenly spaced views covering the full 360-degree spectrum (for more details, please see Methods).

Perception models absent a spatial basis possess extensive spatial knowledge.

(B) Decoding performance across tasks and model layers. Across three tasks, mid-to-late layers exhibited lower decoding errors compared to early layers and the penultimate layer of VGG-16 (fc2). All layers outperformed chance (green and blue lines). (C) Decoding performance across a number of deep neural network architectures (penultimate layer; see Appendix for full results), including convolutional networks and vision transformers. All pre-trained and untrained models outperformed baseline measures. Error is in normalized virtual environment units (see Methods). See full results across models, layers and sampling rates in the Appendix. Shaded areas in B and error bars in C represent 95% confidence intervals of the mean decoding error (bootstrapped across locations).

Decoding performance varied across layers of the network, with deeper layers typically achieving lower decoding error (apart from the penultimate layer in VGG-16, i.e., fc2). Crucially, decoders across all sampling rates and levels of representation show substantially better performance than chance (green and blue lines), determined by strategies invariant to visual inputs (i.e., random prediction or predicting the center of possible choices; see Methods). Our results demonstrate that there is sufficient spatial information in a perception model to achieve remarkably low decoding error based on a few visual snapshots of the virtual space, which could be further improved through the integration of information across views in a perception-based navigation system.

To test the generality of our claim that spatial knowledge can arise from complex computational systems irrespective of their architectural variations or training states, we examined several computational systems including deep convolutional neural networks (DCNNs) and Vision Transformers (ViTs). We evaluated these models both in their pre-trained form (optimized for visual object recognition) and in an untrained state (randomly initialized parameters), following the same decoding approach described earlier. Here, we present results for the penultimate layer representation, with 30% of all locations and their views uniformly sampled for training the decoder, but similar results were observed across layers (for full results across various models, layers, and sampling rates, see Appendix). We found that all models possessed latent spatial knowledge, exhibiting errors far below chance (Figure 2C). Notably, this was true even in untrained networks. While one might contend that deep convolutional neural networks, with their reliance on local convolution operations, inherently possess a predisposition for extracting spatial knowledge from visual information, the finding that an untrained ViT – comprising nothing more than a hierarchy of fully-connected layers (i.e., self-attention) and non-linear operations – achieves strong decoding performance illustrates that sufficient computational complexity, even in a non-spatial system, supports the extraction of spatial information relevant to spatial cognition.
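For reference, trained and untrained variants of these backbones might be instantiated as follows (a sketch using torchvision model and weight identifiers; the exact checkpoints used are an assumption):

    # Sketch: instantiating trained vs. untrained (randomly initialized) backbones.
    from torchvision import models

    backbones = {
        "vgg16_trained":      models.vgg16(weights="IMAGENET1K_V1"),
        "vgg16_untrained":    models.vgg16(weights=None),       # random initialization
        "resnet50_trained":   models.resnet50(weights="IMAGENET1K_V2"),
        "resnet50_untrained": models.resnet50(weights=None),
        "vit_b16_trained":    models.vit_b_16(weights="IMAGENET1K_V1"),
        "vit_b16_untrained":  models.vit_b_16(weights=None),    # self-attention only, no convolutional prior
    }
    for name, net in backbones.items():
        net.eval()   # decoding uses fixed (frozen) representations throughout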

2.2 “Spatial Cells” Inevitably Arise in Complex Computational Systems

How was it possible to decode spatial information from non-spatial networks? The field’s common conception is that cells exhibiting spatial firing profiles play a pivotal role in supporting spatial cognition, as these profiles appear intuitively useful for spatial tasks such as navigation. Could spatial cells be responsible for the impressive decoding of spatial information that we have demonstrated within a perception system? If a perception system devoid of a spatial component demonstrates classically spatially-tuned unit representations, such as place, head-direction, and border cells, can “spatial cells” truly be regarded as “spatial”? Might they be inevitable derivatives of any complex system? If so, do they drive spatial knowledge, or are they in fact superfluous in spatial cognition?

To determine if the model’s spatial knowledge is primarily supported by spatially-tuned units like those found in the brain, we classified every hidden unit in each model based on criteria used in neuroscience to identify place cells (Tanni et al., 2022), head-direction cells (Banino et al., 2018), and border cells (Banino et al., 2018), respectively (see Methods). The composition of cell types across layers is shown in Fig. 3A. Contrary to the predominant view that “spatial cells” are unique properties of spatial systems and are grounded in space-related tasks, we found many units in the non-spatial model VGG-16 that satisfied the criteria of place, head-direction, and border cells (see Methods for criteria; for other models, see Appendix). The majority of units show mixed selectivity irrespective of layer depth. Notably, a significant proportion of units show both place and directional tuning (P+D), matching a common observation in the literature (e.g., McNaughton et al. 1991; Tang et al. 2015; Grieves and Jeffery 2017). We selected example units and plotted their spatial activation patterns in the two-dimensional virtual space (irrespective of direction), and their direction selectivity in polar plots (Fig. 3B-3E; for more examples, see Appendix). These examples show spatially-tuned units without directional tuning that match hippocampal place cells (Fig. 3B), units with strong direction selectivity accompanied by minor spatial selectivity that match head-direction cells (Fig. 3C), and units with boundary cell-like tuning (Fig. 3D). There were also many units that exhibited both strong place and directional tuning (Fig. 3E). These results support our theory that “spatial cells” might arise in any computational system, even in systems designed for nonspatial tasks (e.g., object recognition). Consistent with our view, we found no clear relationship between the cell type distribution and the spatial information in each layer. This raises the possibility that “spatial cells” do not play a pivotal role in spatial tasks as is broadly assumed.

Representations of perception models developed for object recognition exhibit typical spatial cell-like firing profiles.

Active units were classified across model layers based on standard criteria used to identify place cells (P), head-direction cells (D), and border cells (B). Example units are from VGG-16 (see Appendix for more examples from other models). (A) Pie charts illustrating the proportion of “spatial” cell types identified in DNNs, including units that are inactive. Many units satisfied the criteria for place, head-direction and border cells irrespective of layer depth. Many units exhibited mixed selectivity, with a significant number of units displaying strong place and directional tuning. (B-E) Spatial firing profiles of model units shown as spatial activation maps and polar plots. For activation maps, each unit’s activation was plotted at each location in the two-dimensional area irrespective of heading direction. For direction selectivity, polar plots show the average activity of the activity map across locations at a given angle, which reflects the tuning magnitude for each heading direction. (B) Example place cell units that show strong spatial selectivity with little direction selectivity. (C) Examples of head-direction cell units that show strong direction selectivity but weak location selectivity. (D) Examples of border cell units that respond strongly to boundaries of the environment. (E) Examples of mixed-selective place and head-direction cell units with strong spatial and directional tuning.

2.3 “Spatial Cells” Are Superfluous in Spatial Cognition

We have established that “spatial cells” can arise in non-spatial systems. Here, we consider whether units with spatial properties are necessary to decode spatial information. It may very well be that spatial units, akin to place cells, not only arise in complex (including random) networks but are also superfluous to spatial cognition. To test this possibility, we performed a systematic exclusion analysis to assess whether model units that exhibit the strongest spatial firing properties drive spatial knowledge. First, we scored and ranked each unit classified as a place, head-direction, or border cell unit (see Methods), and then re-trained linear decoders without the top n units of a specific cell type and evaluated the model’s corresponding spatial knowledge of the environment. That is, we tested the model’s ability to decode location, heading direction and distance to the closest border after excluding place, head-direction and border cell units, respectively. We repeated this procedure with progressively higher exclusion ratios (see Methods).

In line with our hypothesis, excluding the spatial units that scored highest on each of the corresponding criteria (place field activity, number of place fields, directional tuning, and border tuning) had minimal effect on decoding performance, even when a large proportion of the highest-ranked spatial units was excluded (Fig. 4B, top panel). To assess whether any observed detriment was significant and specific to excluding highly-ranked spatial units, we randomly excluded the same number of units without regard to the spatial score (Fig. 4B, bottom panel) and observed a similar pattern across all four exclusion scenarios, meaning that highly-tuned spatial units contributed no more to spatial cognition than a random selection of units.

“Spatial” cells do not play a privileged role in spatial cognition.

Exclusion analyses showed that units exhibiting traditional spatial firing profiles do not form the basis of the model’s spatial knowledge. (B) Units in each layer were ranked based on standard criteria for place (maximum place field activity, number of place fields), head-direction (strength of directional tuning), and border (strength of border tuning) cells. As more highly-ranked spatial units were excluded, the overall decoding performance remained relatively stable across all tasks (top). Excluding an equivalent number of units randomly yielded similar performance (bottom). (C) Model units were ranked by their contribution to decoding performance (magnitude of regression coefficients). Excluding more highly-ranked task-relevant units resulted in a marked deterioration of decoding performance for decoders trained on the remaining units (top). Randomly excluding an equivalent number of units had minimal impact on performance (bottom). Results here are from VGG-16. For other models, please refer to the Appendix.

Finally, we performed an additional analysis where we excluded model units that contributed most to each of the tasks. Specifically, we identified each unit’s importance for each task, based on the magnitudes of the coefficients learned by the linear decoders (trained in Section 2.1), excluded the top n units, and re-trained linear decoders for each respective task using the remaining units. This approach had a greater impact on decoding error compared to spatial unit criteria or random exclusion, with performance suffering more as the exclusion ratio rose (see Figure 4C). Nevertheless, decoding was still possible at high exclusion ratios, suggesting that spatial information is widely distributed across network units.

3 Discussion

The prevailing perspective in neuroscience conflates spatial knowledge and the variety of spatial cognitive abilities in animals with “spatial cells” in the hippocampal formation. In this work, we proposed an alternative perspective: “spatial cells” can simply arise in non-spatial computational systems of sufficient complexity, and these units may, in fact, be by-products of such systems, absent any leading role in spatial cognition.

Deep neural networks (DNNs) for object recognition served as an ideal testbed for our hypotheses. Object recognition DNNs, designed to generate translation-invariant visual features for successful categorization, lack inherent spatial elements. Unlike spatial navigation systems (e.g., Banino et al. 2018; Cueva and Wei 2018), DNNs in object recognition do not purposefully incorporate information about velocity, heading direction, or event sequences. Exploring various DNN architectures for object recognition allows a comprehensive examination of the connection between spatial representations and non-spatial complex systems.

We applied DNNs to a three-dimensional virtual environment to simulate an agent foraging in a two-dimensional space, akin to how an animal explores an enclosure. Detailed analyses revealed that the DNNs’ representations contained sufficient spatial information to decode key variables, despite these DNNs being non-spatial perceptual systems. We found that many DNN units passed established criteria for classification as place, head-direction, and border cells, suggesting such units can be easily found in various non-spatial computational systems. Furthermore, excluding the “spatial” units had minimal impact on spatial decoding tasks, suggesting a non-essential role in spatial cognition. These results call into question the utility of labelling these cell “types”.

We considered object recognition models from two prominent DNN families: deep convolutional networks, drawing inspiration from the mammalian visual system (Fukushima, 1980), and visual transformers, adapted from the Transformer architecture originally designed for natural language processing (Vaswani et al., 2017). Spatial information was readily decoded from models from both families, including untrained models with randomly initialized weights. The decoding results from the untrained visual transformer are particularly striking because transformers incorporate minimal inductive biases, such as constraints that respect spatial proximity (e.g., convolutions). In summary, complex networks that are not spatial systems, coupled with environmental input, appear sufficient to decode spatial information.

Our work raises the question of whether focusing on single cells and their “type” is the correct approach to understanding spatial cognition, or indeed any cognitive function. The field has maintained a persistent inclination towards identifying cell types based on firing profiles that are subjectively interpreted to be useful for the behavior of interest (Hartley et al., 2014; Moser et al., 2008; Sarel et al., 2017; Tsao et al., 2018; Høydal et al., 2019; Grieves et al., 2020; Ormond and O’Keefe, 2022). This is likely due to the historical backdrop in neuroscientific research in which pioneers in visual neuroscience discovered and named individual cells in V1 by observing their firing profiles (Hubel and Wiesel, 1959), work that won the Nobel Prize (1981) and marked a foundational era for the field. This paradigm led to discoveries of place and grid cells (Nobel Prize in Physiology or Medicine, 2014), and the “Jennifer Aniston” neuron or “concept cells” (Quiroga et al., 2005), which were intuitively compelling and attractive as an apparent explanation. These cells may in fact play a role, but our work suggests that this must be critically assessed rather than assumed, as they may not play a privileged role compared to other cells in the population, and could even be superfluous for the context at hand. The general assumption that the simplicity and interpretability of firing profiles somehow lends support to a crucial role of these cells should be questioned. With a growing recognition of the imperative to move beyond mere cell type categorization, as underscored by Olshausen and Field (2006) on the insufficient characterization of a majority of V1 cells, there is a shift toward a broader perspective in which neural assemblies or populations form the fundamental computational unit (Hebb, 1949; Eichenbaum, 2018; Ebitz and Hayden, 2021).

We pose a broader question: do our preconceptions of how complex systems should work hinder our aim to understand the brain’s inner workings? Notably, spatial representations conventionally linked to the hippocampal formation have been unveiled in sensory areas (Saleem et al., 2018; Long et al., 2021; Long and Zhang, 2021). While prior attributions pointed to modulatory signals from the hippocampus, our study raises concerns about potential biases stemming from preconceived notions and an underestimation of the complexity inherent in the sensory system – it may be shouldering more extensive responsibilities than initially presumed. Work in other domains is reaching similar conclusions (cf. McMahon and Isik 2023).

One upshot of our contribution is that the pursuit of identifying and cataloging cell types aligned with subjective notions of how the brain works might impede scientific progress. While searching for cell types was a reasonable strategy in an era before it was straightforward to conduct large-scale simulations, this quest should now be relegated to history. Otherwise, apparent discoveries of cell types are likely to reflect the activity of broad classes of complex networks that implement a variety of functions, or none at all in the case of the random networks we considered. It is far too easy for neuroscientists, including computational neuroscientists who design their models to manifest a variety of cell types, to fool themselves when the underlying complexity of the systems considered is not taken into account.

4 Methods

4.1 Virtual environment

We set up our virtual environment using the Unity3D Engine (editor version: 2021.3.16f1; Silicon). The three-dimensional laboratory environment was adapted from an existing template purchased on the Unity Store (3DEverything, 2022). Our virtual agent’s movement is constrained to a two-dimensional square placed inside the three-dimensional space, much like a moving area for an animal in real-world experiments. The agent randomly moves around the square area and captures first-person pictures of the environment across locations and heading directions.

Environment specifications

The specifications for the environment, measured in Unity units, are as follows: the lab room has a height of 3.17 units, a width of 6.29 units, and a depth of 10.81 units. The moving area, which is a distinct part of the environment, has a height of 0.25 units, a width of 2 units, and a depth of 2 units. Additionally, the moving area is positioned such that the distances from its center to the lab room boundaries are: 3.59 units to the left wall, 5.35 units to the right wall, 5.55 units to the bottom wall, and 2.80 units to the front wall.

The agent is only allowed to move within the square moving area. For simplicity, we discretize the locations reachable by the agent into a 2D grid, and the agent can only move between points on the grid. We use a 17×17 grid (289 unique locations).

Visual input

To collect visual inputs from the virtual environment, the agent randomly moves along the specified n-by-n grid in the square area. We denote by l the total number of unique locations on the grid. The grid is evenly spaced. At each location, the agent turns around and captures m frames, each separated by a fixed angle (e.g., 15 degrees per frame). Each frame obtained by the agent is processed by a DNN (up to a given layer). The resulting outputs at the same location are considered independent data-points, each forming a long 1D vector. Repeating this procedure for all locations yields an M-by-F data matrix X where M = l × m and F is the number of feature variables. Each feature variable corresponds to a unit in the DNN layer whose activations are extracted.
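A sketch of how this data matrix might be assembled follows. The 17×17 grid, 24 views per location, and 2×2-unit moving area follow the text; render_view and dnn_features are hypothetical placeholders standing in for the Unity renderer and the network's activation extraction, and the choice of the moving-area walls as the nearest boundary is an assumption.

    # Sketch: assembling the M-by-F data matrix X and the target matrix T.
    import numpy as np

    def render_view(tx, ty, tr):
        """Placeholder for a first-person Unity screenshot at (tx, ty), heading tr degrees."""
        return np.random.rand(224, 224, 3)

    def dnn_features(frame):
        """Placeholder for the flattened activations (F features) of a chosen DNN layer."""
        return np.random.rand(4096)

    n, m = 17, 24                          # 17 x 17 grid (l = 289 locations), 24 views each
    xs = np.linspace(0.0, 2.0, n)          # moving area spans 2 x 2 Unity units (assumed origin)
    ys = np.linspace(0.0, 2.0, n)

    rows, targets = [], []
    for tx in xs:
        for ty in ys:
            for k in range(m):
                tr = k * (360.0 / m)                     # 15 degrees per frame
                tb = min(tx, 2.0 - tx, ty, 2.0 - ty)     # distance to nearest wall (assumed: moving-area walls)
                rows.append(dnn_features(render_view(tx, ty, tr)))
                targets.append([tx, ty, tr, tb])

    X = np.stack(rows)                     # shape (M, F), M = l * m = 289 * 24
    T = np.array(targets)                  # shape (M, 4): [t_x, t_y, t_r, t_b]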

4.2 Spatial decoding

To evaluate whether representations in deep neural networks (DNNs) optimized for object recognition on natural images encode spatial information, we set up a series of spatial decoding tasks in which we use fixed network representations across various hidden layers to decode spatial location (i.e., coordinates), head-direction (i.e., rotation in degrees) and Euclidean distance to the nearest border, using visual inputs from unvisited locations and/or directions. We formulate spatial decoding as a prediction problem where we fit a set of linear regression models on representations of visual views to predict location, direction and distance to the nearest border. We define the nearest border to the agent as the border that has the shortest Euclidean distance to the agent’s location regardless of the agent’s heading direction.

Decoder training and testing

To fit and test spatial decoders, we randomly sample locations in the environment (subject to a sampling rate). When a location is sampled for training, we consider views of different rotations as independent data-points. For example, a decoder is fit to predict a target vector $t_i = [t_x, t_y, t_r, t_b]$ given a view, where $t_x, t_y$ are the x, y coordinates, $t_r$ is the heading direction, and $t_b$ is the shortest distance to a border. For the entire training set $(X_i, T_i)$ where $T_i = \{t_1, t_2, \ldots, t_i\}$, we can write the decoder as

$$\hat{T}_i = X_i W_i + b_i,$$

where $W_i$ are the coefficients and $b_i$ are the intercepts. Decoder weights (both coefficients and intercepts) are optimized to minimize the mean squared error between the true and predicted targets.

Feature representation and selection

We consider a number of representations for visual inputs. Given a DNN model, we train separate decoders using representations across different layers. We apply L2 regularization on the decoder weights to reduce overfitting. Additionally, we consider different exclusion strategies (see Sec. 4.4) where unit representations are selectively removed based on their firing profiles (e.g., the number of place fields a unit contains).

Performance measure

We report the normalized decoding error on the held-out set for location, head-direction and nearest-border decoding for each representation at a given sampling rate. The normalized decoding error reflects the deviation of the model’s prediction of location, heading direction or distance to the nearest wall relative to the ground truth (i.e., in terms of the physical unit of space or angle in the virtual environment), normalized with respect to the maximal length of the moving area (2 units in both depth and width). To reflect the variance of the average error, we compute a two-sided 95% confidence interval of the mean using a bootstrapping approach over locations.
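A sketch of this error measure and its bootstrap confidence interval follows (the 2-unit normalization and resampling over locations follow the text; other details, such as using Euclidean error over the target dimensions, are assumptions):

    # Sketch: normalized decoding error and a two-sided 95% bootstrap CI over locations.
    import numpy as np

    def normalized_error(pred, true, max_extent=2.0):
        """pred, true: arrays of shape (n_views, d). Per-view Euclidean error divided by
        the maximal extent of the moving area (2 Unity units)."""
        return np.linalg.norm(pred - true, axis=1) / max_extent

    def bootstrap_mean_ci(errors, location_ids, n_boot=1000, alpha=0.05, seed=0):
        """CI of the mean error, resampling whole locations (not individual views)."""
        rng = np.random.default_rng(seed)
        unique_locs = np.unique(location_ids)
        loc_rows = {loc: np.where(location_ids == loc)[0] for loc in unique_locs}
        means = []
        for _ in range(n_boot):
            sample = rng.choice(unique_locs, size=len(unique_locs), replace=True)
            rows = np.concatenate([loc_rows[loc] for loc in sample])
            means.append(errors[rows].mean())
        return np.percentile(means, [100 * alpha / 2, 100 * (1 - alpha / 2)])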

Baselines

We compare decoding performance achieved by our agent to baselines where the agent decodes location, head-direction and distance to the border based on fixed policies independent of visual inputs. Specifically, we establish two baselines in which the agent either predicts randomly within the legal bounds, or always predicts the centre position of the room (when predicting location), 90 degrees (when predicting rotation), and the nearest-border distance as measured from the centre of the room (when predicting nearest border distance).
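A sketch of these two input-invariant baselines (the centre position, the 90-degree heading, and the centre-based border distance follow the text; the uniform bounds for the random baseline are assumptions):

    # Sketch: baseline predictions that ignore visual input entirely.
    import numpy as np

    def random_baseline(n, seed=0):
        """Uniform random guesses within the legal bounds of each target (assumed bounds)."""
        rng = np.random.default_rng(seed)
        tx = rng.uniform(0.0, 2.0, n)
        ty = rng.uniform(0.0, 2.0, n)
        tr = rng.uniform(0.0, 360.0, n)
        tb = rng.uniform(0.0, 1.0, n)        # nearest-wall distance is at most 1 unit in a 2 x 2 area
        return np.stack([tx, ty, tr, tb], axis=1)

    def centre_baseline(n):
        """Always predict the centre location, a 90-degree heading, and the centre's
        distance to the nearest wall (1 unit)."""
        return np.tile([1.0, 1.0, 90.0, 1.0], (n, 1))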

4.3 Profiling spatial properties of model units

To gain a more intuitive understanding of the kinds of model units that are most useful for spatial decoding tasks, we profile all model units from the layers we consider based on their spatial patterns of firing. We consider the most common cell types that are found to encode spatial information as defined in the neuroscience literature, which include place cells, border cells and head-direction cells. For each cell type, we track a number of measures defined below.

Place cells

In our investigation, the identification of place-cell-like model units hinges on several key characteristics of unit firing. First, we examine the number of place fields exhibited by each unit, defining a qualified field as one that spans a region of 2D space ranging from 10 pixels to half the size of the environment, as outlined in Tanni et al. (2022). Additionally, we consider the maximal activation of each field, providing insight into the intensity and significance of unit firing.
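A sketch of how place-field counting on a unit's 2D activation map might look (the field-size range follows the criterion above; the half-maximum threshold and the use of connected-component labelling are assumptions):

    # Sketch: counting qualified place fields of one unit from its 2D activation map.
    import numpy as np
    from scipy import ndimage

    def place_fields(activation_map, threshold_frac=0.5, min_size=10, max_size_frac=0.5):
        """Return (number of qualified fields, peak activation) for one unit."""
        if activation_map.max() <= 0:
            return 0, 0.0                                     # inactive unit
        binary = activation_map >= threshold_frac * activation_map.max()
        labels, n_components = ndimage.label(binary)          # contiguous candidate fields
        max_size = max_size_frac * activation_map.size        # half the environment
        n_fields = sum(
            1 for lab in range(1, n_components + 1)
            if min_size <= (labels == lab).sum() <= max_size  # field-size criterion
        )
        return n_fields, float(activation_map.max())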

Head-direction cells

Following Banino et al. (2018), the degree of directional tuning exhibited by each unit was assessed using the length of the resultant vector of the directional activity map. In our case, there is an activity map spanning the entire 2D space for each direction. Vectors corresponding to each direction of an activity map were created:

$$\vec{v}_i = \beta_i \, [\cos\alpha_i, \; \sin\alpha_i],$$

where $\alpha_i$ and $\beta_i$ are, respectively, the angle and average intensity (irrespective of spatial locations) of direction $i$ in the activity map. These vectors were averaged to generate a mean resultant vector:

$$\vec{r} = \frac{1}{n}\sum_{i=1}^{n} \vec{v}_i,$$

and the length of the resultant vector was calculated as the magnitude $\|\vec{r}\|$. We used $n = 24$ uniformly spaced angular directions.
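A sketch of this directional-tuning measure, following the equations above (the input is a unit's average activity per heading direction):

    # Sketch: length of the mean resultant vector as the directional-tuning score.
    import numpy as np

    def resultant_vector_length(direction_activity):
        """direction_activity: average activation per heading (length n = 24 here)."""
        beta = np.asarray(direction_activity, dtype=float)
        n = len(beta)
        alpha = np.deg2rad(np.arange(n) * 360.0 / n)           # angle of each direction
        vectors = beta[:, None] * np.stack([np.cos(alpha), np.sin(alpha)], axis=1)
        r = vectors.mean(axis=0)                               # mean resultant vector
        return float(np.linalg.norm(r))                        # tuning strength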

Border cells

The border score definition is based on Banino et al. (2018): for each of the four walls in the square enclosure, the average activation for that wall, $b_i$, was compared to the average centre activity $c$ to obtain a border score for that wall, and the maximum was used as the border score for the unit:

$$\text{border score} = \max_{i \in \{1,\dots,4\}} \frac{b_i - c}{b_i + c},$$

where $b_i$ is the mean activation for bins within $d_b$ bins of the $i$-th wall and $c$ is the average activity for bins further than $d_b$ bins from any wall ($d_b = 3$). Units with a border score $> 0.5$ are considered border-like.
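A sketch of this border score on a unit's 2D activation map, following the definition above (the map orientation and the handling of empty or zero denominators are assumptions):

    # Sketch: border score of one unit (max over walls of (b_i - c) / (b_i + c), d_b = 3).
    import numpy as np

    def border_score(activation_map, d_b=3):
        h, w = activation_map.shape
        rows, cols = np.indices((h, w))
        dist_to_wall = np.minimum.reduce([rows, h - 1 - rows, cols, w - 1 - cols])
        c = activation_map[dist_to_wall >= d_b].mean()        # centre bins, far from all walls
        wall_means = [
            activation_map[:d_b, :].mean(),                   # band along the top wall
            activation_map[-d_b:, :].mean(),                  # bottom wall
            activation_map[:, :d_b].mean(),                   # left wall
            activation_map[:, -d_b:].mean(),                  # right wall
        ]
        scores = [(b - c) / (b + c) for b in wall_means if (b + c) != 0]
        return max(scores) if scores else 0.0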

4.4 Spatial decoding with unit exclusion

To delve deeper into how different units within the DNN models contribute to the linear decoding of spatial information, we employ two complementary analyses. Model units are selectively excluded subject to the criteria detailed below. Spatial decoders are re-trained on the remaining model units following the same procedure as before (Sec. 4.2).

Excluding units by spatial profile

In this first analysis, we selectively exclude units based on their firing profiles, which fall into the various spatial unit categories mentioned earlier. We initially rank all units in descending order according to their specific spatial measures (e.g., field count) and then systematically increase the exclusion rate, assessing the impact on the corresponding downstream tasks as we go. For example, we progressively exclude the top n% of units with the most place-like characteristics for varying values of n, examining the relationship between exclusion and location-decoding performance. This process is carried out separately for each unit type and its associated task, and we also include a control group in which an equivalent number of units is excluded at random.
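A sketch of this profile-based exclusion, with a matched random-exclusion control (the unit scores come from measures like those sketched above; the ridge decoder mirrors the earlier decoding setup):

    # Sketch: exclude the top fraction of units by a spatial-profile score (or at random),
    # retrain the decoder on the remaining units, and report the held-out error.
    import numpy as np
    from sklearn.linear_model import Ridge

    def decode_without_top_units(X_tr, T_tr, X_te, T_te, unit_scores,
                                 exclusion_rate, random_control=False, seed=0):
        n_units = X_tr.shape[1]
        n_excluded = int(exclusion_rate * n_units)
        if random_control:
            rng = np.random.default_rng(seed)
            excluded = rng.choice(n_units, size=n_excluded, replace=False)
        else:
            excluded = np.argsort(unit_scores)[::-1][:n_excluded]    # highest-scoring units first
        keep = np.setdiff1d(np.arange(n_units), excluded)
        decoder = Ridge(alpha=1.0).fit(X_tr[:, keep], T_tr)          # retrain on the rest
        return np.abs(decoder.predict(X_te[:, keep]) - T_te).mean()  # mean decoding error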

Excluding units by task-relevance

Our second exclusion analysis focuses on identifying the task-relevance of each model unit by utilizing previously learned decoders and their coefficient magnitudes. Essentially, units with larger coefficients are deemed more crucial for decoding, while those with smaller or zero coefficients are considered less relevant. Similar to the exclusion by spatial profile, we gradually increase the exclusion rate by selectively removing units based on the magnitude of their decoder coefficients. Additionally, we conduct a control analysis where an equal number of units are randomly chosen for exclusion. We then retrain linear decoders with the remaining units and assess their performance on downstream spatial tasks.
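The same exclusion-and-retrain routine can be reused here, with each unit scored by the magnitude of its coefficient in the decoder previously trained for that task (a sketch; variable names follow the earlier sketches and are assumptions):

    # Sketch: per-unit task-relevance from the magnitudes of learned decoder coefficients.
    import numpy as np
    from sklearn.linear_model import Ridge

    def task_relevance_scores(X_tr, y_tr, alpha=1.0):
        """Importance of each unit for one task: |coefficient| of the decoder for target y_tr."""
        coef = Ridge(alpha=alpha).fit(X_tr, y_tr).coef_
        return np.abs(np.ravel(coef))

    # e.g., rank units by their relevance to x-coordinate decoding, then exclude the top 30%
    # using decode_without_top_units from the previous sketch:
    #   scores = task_relevance_scores(X_tr, T_tr[:, 0])
    #   error = decode_without_top_units(X_tr, T_tr, X_te, T_te, scores, exclusion_rate=0.3)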

Data and code availability

The code for simulations and analyses is publicly available at https://github.com/don-tpanic/Space.

Acknowledgements

This work was supported by ESRC (ES/W007347/1), Wellcome Trust (WT106931MA), and a Royal Society Wolfson Fellowship (18302) to B.C.L., and the Medical Research Council UK (MC UU 00030/7) and a Leverhulme Trust Early Career Fellowship (Leverhulme Trust, Isaac Newton Trust: SUAI/053 G100773, SUAI/056 G105620, ECF-2019-110) to R.M.M.

VGG-16 (untrained).

Linear decoders trained with representations from various layers of the untrained VGG-16 achieve low errors across sampling rates, though not as low as for its trained counterpart. Mid-to-late layers show better performance than early layers. As more locations are sampled for training the linear decoders, overall decoding performance improves. All model layers decode better than the two visual-input-invariant baselines.

ResNet-50 (trained).

Linear decoders trained with representations from various layers of the ResNet-50 pretrained on object recognition achieve low errors across sampling rates. Mid-to-late layers show better performance than early layers. The penultimate layer's decoding performance did not improve as much as that of the intermediate layers as more locations were sampled for training the linear decoders. All model layers decode better than the two visual-input-invariant baselines.

ResNet-50 (untrained).

Similar to ResNet-50 pretrained on images, the untrained counterpart can effectively decode spatial knowledge related to location, heading direction and distance to borders. All layers decode better than the baseline decoders, which do not rely on visual signals from the environment.

ViT-B/16 (trained).

Linear decoders trained on different layers of the pretrained ViT model show very similar decoding performance. Overall the decoding performance is much better than the two baseline decoders which do not incorporate visual signals.

ViT-B/16 (untrained).

Linear decoders trained on the untrained ViT model also achieve accurate decoding performance on location, heading direction and distance to the nearest border. Similar to the trained version, all layers considered in our analysis achieve comparable performance and outperform the baselines.

Distribution of different spatial unit types across layers of perceptual models of object recognition.

Examples of model units exhibiting spatial characteristics.

Excluding units with the strongest spatial profiles had minimal impact on spatial knowledge (ResNet-50).

Excluding units by task-relevance affects spatial decoding performance (ResNet-50).

Excluding units with the strongest spatial profiles had minimal impact on spatial knowledge (ViT-B/16).

Excluding units by task-relevance affects spatial decoding performance (ViT-B/16).