Human EEG and artificial neural networks reveal disentangled representations and processing timelines of object real-world size and depth in natural images

  1. Zitong Lu (corresponding author)
  2. Julie Golomb
  1. Department of Psychology, The Ohio State University, United States
  2. McGovern Institute for Brain Research, Massachusetts Institute of Technology, United States
5 figures and 2 additional files

Figures

Figure 1 with 2 supplements
Overview of our analysis pipeline: constructing three types of RDMs and conducting comparisons between them.

We computed RDMs from three sources: neural data (EEG), hypothesized object features (real-world size, retinal size, and real-world depth), and artificial models (ResNet, CLIP, and Word2Vec). We then conducted cross-modal representational similarity analyses between: EEG × HYP (partial correlation, controlling for the other two HYP features), ANN (or W2V) × HYP (partial correlation, controlling for the other two HYP features), and EEG × ANN (correlation).
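All three comparisons reduce to (partial) Spearman correlations between the off-diagonal entries of two RDMs. The sketch below is our minimal illustration of that step, not the authors' released code; the helper names and variable layout are assumptions, while the 200 × 200 RDM shape and the partialling scheme follow the caption.

```python
import numpy as np
from scipy.stats import rankdata, pearsonr

def upper_tri(rdm):
    """Flatten the off-diagonal upper triangle of a square RDM."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]

def partial_spearman(rdm_a, rdm_b, control_rdms):
    """Partial Spearman correlation between two RDMs.

    Rank-transform each flattened RDM, regress the control ranks out of
    both targets, and Pearson-correlate the residuals (equivalent to a
    partial Spearman correlation).
    """
    a = rankdata(upper_tri(rdm_a))
    b = rankdata(upper_tri(rdm_b))
    Z = np.column_stack([np.ones(len(a))] +
                        [rankdata(upper_tri(c)) for c in control_rdms])
    resid = lambda y: y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return pearsonr(resid(a), resid(b))  # (r, p)

# e.g. EEG at time t vs. real-world size, controlling retinal size and depth:
# r, p = partial_spearman(eeg_rdm_t, size_rdm, [retinal_rdm, depth_rdm])
```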

Figure 1—figure supplement 1
Four ANN RDMs (ResNet early layer, ResNet late layer, CLIP early layer, and CLIP late layer).
Figure 1—figure supplement 2
Word2Vec RDM.
Figure 2
Methods for calculating neural (EEG), hypothesis-based (HYP), and artificial neural network (ANN) & semantic language processing (Word2Vec, W2V) model-based representational dissimilarity matrices (RDMs).

(A) Steps of computing the neural RDMs from EEG data. EEG analyses were performed in a time-resolved manner on 17 channels as features. For each time point t, we conducted pairwise cross-validated SVM classification. The classification accuracy values across all image pairs yielded a 200×200 RDM at each time point. (B) Calculating the three hypothesis-based RDMs: Real-World Size RDM, Retinal Size RDM, and Real-World Depth RDM. Real-world size, retinal size, and real-world depth were calculated for the object in each of the 200 stimulus images. The number in brackets is the object's rank (out of 200, in ascending order) on each feature (e.g. ‘ferry’ ranks 197th out of 200 objects in real-world size, from small to big). The connection graph to the right of each RDM represents the relative representational distances of three stimuli in the corresponding feature space. (C) Steps of computing the ANN and Word2Vec RDMs. For ANNs, the inputs were the resized images; for Word2Vec, the inputs were the words of the object concepts. For clearer visualization, the RDMs shown were separately histogram-equalized (percentile units).
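As a hedged sketch of the decoding step in panel A (the pairwise cross-validated SVM and the 200 × 200 accuracy RDM follow the caption; the data shapes, fold count, and function name are our assumptions):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def neural_rdm_at_t(epochs, labels, t, n_folds=5):
    """Build one 200 x 200 neural RDM for time point t.

    epochs : array (n_trials, n_channels, n_times), single-trial EEG
    labels : array (n_trials,), stimulus image index in 0..199
    Cell (i, j) holds the cross-validated accuracy of a linear SVM
    decoding image i vs. image j from the channel values at time t.
    """
    n_images = 200
    rdm = np.zeros((n_images, n_images))
    for i in range(n_images):
        for j in range(i + 1, n_images):
            mask = np.isin(labels, [i, j])
            X = epochs[mask, :, t]            # channels as features
            y = labels[mask]
            acc = cross_val_score(SVC(kernel='linear'), X, y,
                                  cv=n_folds).mean()
            rdm[i, j] = rdm[j, i] = acc
    return rdm
```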

Figure 3 with 5 supplements
Cross-modal RSA results.

(A) Similarities (Spearman correlations) between the three hypothesis-based RDMs. Asterisks indicate a significant similarity, p<0.05. (B) Representational similarity time courses (full Spearman correlations) between EEG neural RDMs and hypothesis-based RDMs. (C) Temporal latencies of peak similarity (partial Spearman correlations) between EEG and the three types of object information. Error bars indicate ± SEM. Asterisks indicate significant differences across conditions (p<0.05). (D) Representational similarity time courses (partial Spearman correlations) between EEG neural RDMs and hypothesis-based RDMs. (E) Representational similarities (partial Spearman correlations) between the four ANN RDMs and hypothesis-based RDMs of real-world depth, retinal size, and real-world size. Asterisks indicate significant partial correlations (bootstrap test, p<0.05). (F) Representational similarity time courses (Spearman correlations) between EEG neural RDMs and ANN RDMs. Color-coded small dots at the top indicate significant timepoints (cluster-based permutation test, p<0.05). Shaded area reflects ± SEM across the 10 subjects.
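The significance dots in panels B, D, and F come from cluster-based permutation tests across subjects. Below is a generic sign-flip implementation for a one-sample test over a time course; the cluster-forming threshold, one-tailed direction, and permutation count are illustrative defaults, not necessarily the authors' exact settings.

```python
import numpy as np
from scipy.stats import t as t_dist, ttest_1samp

def _cluster_masses(tvals, thresh):
    """Summed t-values of each contiguous suprathreshold cluster."""
    masses, mass = [], 0.0
    for tv in tvals:
        if tv > thresh:
            mass += tv
        elif mass:
            masses.append(mass)
            mass = 0.0
    if mass:
        masses.append(mass)
    return masses

def cluster_permutation_1samp(data, n_perm=1000, alpha=0.05, seed=0):
    """data: (n_subjects, n_times) similarity values, tested against 0.

    Returns a boolean mask of time points in significant clusters.
    """
    rng = np.random.default_rng(seed)
    n_sub, n_times = data.shape
    thresh = t_dist.ppf(1 - alpha, df=n_sub - 1)   # cluster-forming threshold
    t_obs = ttest_1samp(data, 0, axis=0).statistic
    # Null distribution of the maximum cluster mass under sign flips
    null_max = np.empty(n_perm)
    for p in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(n_sub, 1))
        masses = _cluster_masses(
            ttest_1samp(data * flips, 0, axis=0).statistic, thresh)
        null_max[p] = max(masses) if masses else 0.0
    # Keep observed clusters whose mass beats the permutation null
    sig = np.zeros(n_times, dtype=bool)
    start = None
    for i in range(n_times + 1):
        if i < n_times and t_obs[i] > thresh:
            start = i if start is None else start
        elif start is not None:
            if (null_max >= t_obs[start:i].sum()).mean() < alpha:
                sig[start:i] = True
            start = None
    return sig
```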

Figure 3—figure supplement 1
Representational similarities (partial Spearman correlations) between ANN RDMs from multiple layers in ResNet and CLIP and hypothesis-based RDMs of real-world depth, retinal size, and real-world size (as extended results of Figure 3E).

Asterisks indicate significant partial correlations (bootstrap test, p<0.05).

Figure 3—figure supplement 2
Representational similarity time courses (Spearman correlations) between EEG neural RDMs and ANN RDMs from multiple layers (as extended results of Figure 3F).

(A) Similarity between EEG and multiple ResNet layers (from early to late: ResNet.maxpool, ResNet.layer1, ResNet.layer2, ResNet.layer3, ResNet.layer4, ResNet.avgpool). (B) Similarity between EEG and multiple CLIP layers (from early to late: CLIP.visual.avgpool, CLIP.visual.layer1, CLIP.visual.layer2, CLIP.visual.layer3, CLIP.visual.layer4, CLIP.visual.attnpool). Color-coded small dots at the top indicate significant timepoints (cluster-based permutation test, p<0.05). Shaded area reflects ± SEM across the 10 subjects.
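Layer-wise RDMs like these can be extracted with forward hooks on the named modules. The sketch below uses torchvision's pretrained ResNet-50 as a stand-in and 1 − Pearson correlation as the dissimilarity; the paper's exact weights, preprocessing, and distance metric are assumptions here.

```python
import numpy as np
import torch
from torchvision.models import resnet50

model = resnet50(weights='IMAGENET1K_V1').eval()

acts = {}
def grab(name):
    def hook(module, inp, out):
        acts[name] = out.detach().flatten(1)   # (n_images, n_features)
    return hook

# Module names follow the caption: maxpool, layer1..layer4, avgpool
for name in ['maxpool', 'layer1', 'layer2', 'layer3', 'layer4', 'avgpool']:
    getattr(model, name).register_forward_hook(grab(name))

images = torch.rand(200, 3, 224, 224)   # stand-in for the 200 preprocessed stimuli
with torch.no_grad():
    model(images)

def ann_rdm(features):
    """1 - Pearson correlation across images (illustrative metric)."""
    return 1 - np.corrcoef(features.numpy())

rdms = {name: ann_rdm(feat) for name, feat in acts.items()}
```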

Figure 3—figure supplement 3
Representational similarity time courses (partial Spearman correlations) between EEG neural RDMs and ANN RDMs controlling for the three hypothesis-based RDMs.

Color-coded small dots at the top indicate significant timepoints (cluster-based permutation test, p<0.05). Shaded area reflects ± SEM across the 10 subjects.

Figure 3—figure supplement 4
Noise ceiling of representational similarity analysis based on EEG neural RDMs.

Shaded area reflects the lower and upper bounds.
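The caption does not state the estimator, but a standard subject-level noise ceiling uses leave-one-out correlations; a minimal sketch under that assumption:

```python
import numpy as np
from scipy.stats import spearmanr

def noise_ceiling(subject_rdm_vecs):
    """Lower/upper noise-ceiling bounds from flattened subject RDMs.

    subject_rdm_vecs : array (n_subjects, n_pairs)
    Lower bound: each subject vs. the mean RDM of the other subjects.
    Upper bound: each subject vs. the mean RDM including themselves.
    """
    grand = subject_rdm_vecs.mean(axis=0)
    lower, upper = [], []
    for s in range(len(subject_rdm_vecs)):
        others = np.delete(subject_rdm_vecs, s, axis=0).mean(axis=0)
        lower.append(spearmanr(subject_rdm_vecs[s], others)[0])
        upper.append(spearmanr(subject_rdm_vecs[s], grand)[0])
    return np.mean(lower), np.mean(upper)
```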

Figure 3—figure supplement 5
RSA results based on CORnet.

(A) Representational similarities (partial Spearman correlations) between the five CORnet RDMs from multiple layers (from early to late: CORnet.V1, CORnet.V2, CORnet.V4, CORnet.IT, and CORnet.decoder.avgpool) and hypothesis-based RDMs of real-world depth, retinal size, and real-world size. Asterisks indicate significant partial correlations (bootstrap test, p<0.05). (B) Representational similarity time courses (Spearman correlations) between EEG neural RDMs and the five CORnet RDMs. Color-coded small dots at the top indicate significant timepoints (cluster-based permutation test, p<0.05). Shaded area reflects ± SEM across the 10 subjects.

Figure 4 with 1 supplement
Contribution of image backgrounds to object size and depth representations.

Representational similarity results (partial Spearman correlations) between ANNs fed cropped object images without backgrounds and the hypothesis-based RDMs. Asterisks above bars indicate significant partial correlations (bootstrap test, p<0.05).

Figure 4—figure supplement 1
Four ANN RDMs (ResNet early layer, ResNet late layer, CLIP early layer, and CLIP late layer) computed from inputs of cropped object images without backgrounds.
Figure 5 with 2 supplements
Representational similarity with a non-visual semantic language processing model (Word2Vec) fed word inputs corresponding to the images’ object concepts.

(A) Representational similarity results (partial Spearman correlations) between the Word2Vec RDM and hypothesis-based RDMs. Asterisks above bars indicate significant partial correlations (bootstrap test, p<0.05). (B) Representational similarity time course (Spearman correlations) between EEG RDMs (neural activity while viewing images) and the Word2Vec RDM (fed corresponding word inputs). Color-coded small dots at the top indicate significant timepoints (cluster-based permutation test, p<0.05). Line width reflects ± SEM across the 10 subjects.
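A Word2Vec RDM of this kind can be built from pretrained word vectors. In the sketch below, the google-news vectors and cosine dissimilarity are our assumptions, not necessarily the authors' exact choices.

```python
import numpy as np
import gensim.downloader

# Pretrained vectors; the specific Word2Vec model is an assumption here
w2v = gensim.downloader.load('word2vec-google-news-300')

def word2vec_rdm(concepts):
    """RDM of pairwise cosine dissimilarities between concept words."""
    vecs = np.stack([w2v[c] for c in concepts])
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    return 1 - vecs @ vecs.T

# rdm = word2vec_rdm(object_concepts)  # list of the study's 200 concept words, e.g. 'ferry'
```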

Figure 5—figure supplement 1
Representational similarity time courses (partial Spearman correlations) between EEG neural RDMs and the Word2Vec RDM controlling for four ANN RDMs (ResNet early/late and CLIP early/late layers).

Color-coded small dots at the top indicate significant timepoints (cluster-based permutation test, p<0.05). Shaded area reflects ± SEM across the 10 subjects.

Figure 5—figure supplement 2
Unique and shared variance of real-world size and semantic information (from Word2Vec) in EEG neural representations, from a variance partitioning analysis controlling for retinal size and real-world depth.

Color-coded small dots at the top indicate significant timepoints (cluster-based permutation test, p<0.05). Shaded area reflects ± SEM across the 10 subjects.
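With two predictors of interest (real-world size and Word2Vec) plus the two controls, the partitioning follows from comparing R² across nested regressions at each time point; the OLS estimator below is our illustrative choice, not necessarily the authors' exact implementation.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS fit of y on X (intercept added)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

def variance_partition(eeg_vec, size_vec, w2v_vec, controls):
    """Unique/shared variance of size and Word2Vec in the EEG RDM.

    All inputs are flattened RDM vectors; `controls` is a list holding
    the retinal-size and real-world-depth vectors, always included.
    """
    full   = r_squared(eeg_vec, np.column_stack([size_vec, w2v_vec, *controls]))
    no_sz  = r_squared(eeg_vec, np.column_stack([w2v_vec, *controls]))
    no_w2v = r_squared(eeg_vec, np.column_stack([size_vec, *controls]))
    ctrl   = r_squared(eeg_vec, np.column_stack(controls))
    unique_size = full - no_sz
    unique_w2v  = full - no_w2v
    shared      = full - ctrl - unique_size - unique_w2v
    return unique_size, unique_w2v, shared
```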

Cite this article

Zitong Lu, Julie Golomb (2025) Human EEG and artificial neural networks reveal disentangled representations and processing timelines of object real-world size and depth in natural images. eLife 13:RP98117. https://doi.org/10.7554/eLife.98117.3