Measuring and modeling the perceptual similarity of geometric shapes.

(A) The eleven quadrilaterals used throughout the experiments (colors are consistent across all other figures). (B) Sample displays from the behavioral visual search task used to estimate the 11×11 shape similarity matrix; participants had to locate the deviant shape among 9 items. The right inset shows two example trials. (C) Multidimensional scaling (MDS) of human dissimilarity judgments; the grey arrow indicates the projection onto the MDS space of the number of geometric primitives in each shape. (D) The behavioral dissimilarity matrix (left) was better captured by a geometric feature coding model (middle) than by a convolutional neural network (right). (E) The graph at right shows the GLM coefficients for each participant.
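For readers who want to see how an embedding like the one in panel C can be obtained, classical (Torgerson) MDS of a dissimilarity matrix can be sketched in a few lines of numpy. The 11×11 matrix below is a random stand-in, not the actual behavioral data, and the function name is ours:

```python
import numpy as np

def classical_mds(D, n_dims=2):
    """Classical (Torgerson) MDS: embed a symmetric dissimilarity
    matrix D into n_dims dimensions via eigendecomposition of the
    double-centered squared-distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims] # keep the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale           # (n, n_dims) coordinates

# Toy 11x11 dissimilarity matrix standing in for the behavioral RDM:
# pairwise distances between 11 random points in the plane.
rng = np.random.default_rng(0)
pts = rng.normal(size=(11, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
coords = classical_mds(D, n_dims=2)            # coords.shape == (11, 2)
```

Because the toy dissimilarities are genuine 2-D Euclidean distances, the two-dimensional embedding recovers them exactly (up to rotation); real behavioral RDMs are only approximately low-dimensional.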

Localizing the brain systems involved in geometric shape perception

(A) Visual categories used in the localizer. Here and in the rest of this document, faces have been masked to comply with bioRxiv’s policy, but participants were shown unmasked faces. (B) Task: passive presentation of consistent visual categories in miniblocks. In some miniblocks, among a series of 6 pictures from a given category, participants had to detect a rare target star. (C) Statistical map associated with the contrast “single geometric shape > faces, houses and tools”, projected on an inflated brain (top: adults; bottom: children; clusters significant at cluster-corrected p<.05 with a nonparametric two-tailed bootstrap test, as reported in the text). (D) BOLD response amplitude (regression weights, arbitrary units) within each significant cluster, with subject-specific localization. Geometric shapes activate the intraparietal sulcus (IPS) and the posterior inferior temporal gyrus (pITG), while causing reduced activation in broad bilateral ventral areas compared to other stimuli; see Fig. S4 for an analysis of subject-specific ventral subregions.

Dissociating two neural pathways for the perception of geometric shape.

(A) fMRI intruder task. Participants indicated the location of a deviant shape via button press (left or right). Deviants were generated by moving a corner by a fixed amount in four different directions. (B) Performance inside the fMRI scanner: both populations tested displayed error rates that increased with geometric shape complexity and correlated significantly with previous data collected online. (C) Whole-brain correlation of the BOLD signal with geometric regularity in adults, as measured by the error rate in a previous online intruder detection task (Sablé-Meyer et al., 2021). Positive correlations are shown in red and negative ones in blue. Voxel threshold p<.001, cluster-corrected by permutation at p<.05. Side panels show the activation in two significant ROIs whose coordinates were identified in adults and where the correlation was also found in children (one-tailed test, corrected for the number of ROIs tested this way). (D) Whole-brain searchlight RSA analysis in adults (same statistical thresholds). Colors indicate the model that elicited the cluster: purple for the CNN encoding model, orange for the geometric feature model, green for their overlap.
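The core operation of an RSA analysis like the one in panel D is a rank correlation between the vectorized upper triangles of an empirical RDM and a model RDM. A minimal numpy-only sketch, with random matrices standing in for the neural and model RDMs (the helper names are ours):

```python
import numpy as np

def upper_tri(rdm):
    """Vectorize the upper triangle (excluding the diagonal) of an RDM."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]

def spearman(a, b):
    """Spearman correlation as Pearson on ranks (assumes no ties)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

# Toy stand-ins for an 11x11 model RDM and a noisy "neural" RDM.
rng = np.random.default_rng(1)
model = rng.random((11, 11))
model = (model + model.T) / 2
np.fill_diagonal(model, 0)
neural = model + 0.1 * rng.normal(size=model.shape)
neural = (neural + neural.T) / 2
np.fill_diagonal(neural, 0)

r = spearman(upper_tri(neural), upper_tri(model))
```

In a searchlight variant, the same correlation is simply recomputed at every cortical location, using the local voxel pattern to build the neural RDM.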

Using MEG to time the two successive neural codes for geometric shapes

(A) Task structure: participants passively watch a continuous stream of geometric shapes, one per second (presentation time 800 ms). The stimuli are presented in blocks of 30 shapes, identical up to scaling and rotation, with 4 occasional deviant shapes. Participants have no task to perform besides fixating. (B,C) Performance of a classifier using MEG signals to predict whether the stimulus is a regular shape or an oddball. Left: performance for each shape; middle: correlation with geometric regularity (same x axis as in Fig. 3C); right: visualization of the average decoding performance over the cluster. In B, the classifier was trained on MEG signals from all 11 shapes; in C, eleven classifiers were trained separately, one per shape. (D) Sensor-level temporal RSA analysis. At each time point, the 11×11 dissimilarity matrix of MEG signals was modeled by the two model RDMs in Fig. 1D, and the graph shows the time course of the corresponding whitened correlation coefficients. Below the timecourses, we display the average empirical dissimilarity matrix across participants at the two notable timepoints when the correlation with each model is maximal (CNN: t=84 ms; geometric features: t=232 ms). (E) Source-level spatiotemporal searchlight RSA. Same analysis as in D, but after reconstruction of cortical source activity.
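Time-resolved decoding of the kind shown in panels B,C fits one classifier per timepoint and reports cross-validated accuracy as a function of time. The sketch below uses synthetic data and a deliberately simple nearest-class-mean classifier rather than the pipeline actually used in the paper; the function name and data dimensions are ours:

```python
import numpy as np

def timepoint_decode(X, y, n_folds=5):
    """Cross-validated decoding accuracy at each timepoint with a
    nearest-class-mean classifier.
    X: (trials, sensors, timepoints), y: binary labels (0/1)."""
    n_trials, _, n_times = X.shape
    rng = np.random.default_rng(0)
    folds = rng.permutation(n_trials) % n_folds
    acc = np.zeros(n_times)
    for t in range(n_times):
        correct = 0
        for f in range(n_folds):
            tr, te = folds != f, folds == f
            m0 = X[tr & (y == 0), :, t].mean(axis=0)  # class means on train set
            m1 = X[tr & (y == 1), :, t].mean(axis=0)
            d0 = np.linalg.norm(X[te, :, t] - m0, axis=1)
            d1 = np.linalg.norm(X[te, :, t] - m1, axis=1)
            correct += np.sum((d1 < d0) == y[te])     # predict nearer class
        acc[t] = correct / n_trials
    return acc

# Synthetic data: a class difference appears only after "timepoint" 50,
# so accuracy should rise from chance (0.5) to near ceiling there.
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 20, 100))
y = np.repeat([0, 1], 40)
X[y == 1, :, 50:] += 1.0
acc = timepoint_decode(X, y)
```

The onset latency of above-chance accuracy is what licenses statements about when a given code becomes available in the MEG signal.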

Additional CNN encoding models of human behavior.

A, Correlation matrix between all pairs of RDMs generated by each CNN and layer considered. B, Replication of Fig. 1D with different CNNs. Stars indicate a significant difference between the geometric feature model and the respective CNN encoding model (p<0.001). t and p values also indicate whether the CNN encoding model is a significant predictor of participants’ behavior (the geometric feature model is always highly significant). C, Replication of B using different layers of CORnet, organized from early to late layers, left to right. Note that the late layers are much better predictors of human behavior than the early ones, although still far inferior to the geometric feature model. D, Replication of our GLM analysis including only shapes for which there is no obvious name in English (though we gave them names in this manuscript to refer to them: “kite”, “rightKite”, “hinge”, “rustedHinge” and “random”).
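The GLM analysis referred to in panel D amounts to regressing the vectorized behavioral RDM on the vectorized model RDMs and comparing coefficients. A minimal sketch with synthetic stand-in RDMs (the true behavioral data are not reproduced here, and the function name is ours):

```python
import numpy as np

def rdm_glm(behavior, predictors):
    """Regress the vectorized behavioral RDM on z-scored model-RDM
    predictors; returns one coefficient per model (intercept dropped)."""
    i, j = np.triu_indices(behavior.shape[0], k=1)
    y = behavior[i, j]
    cols = [(p[i, j] - p[i, j].mean()) / p[i, j].std() for p in predictors]
    X = np.column_stack([np.ones_like(y)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

def toy_rdm(rng, n=11):
    m = rng.random((n, n))
    m = (m + m.T) / 2
    np.fill_diagonal(m, 0)
    return m

# Synthetic behavior built to weight the "geometric" RDM more heavily.
rng = np.random.default_rng(3)
geo, cnn = toy_rdm(rng), toy_rdm(rng)
behavior = 2.0 * geo + 0.5 * cnn + 0.05 * rng.normal(size=geo.shape)
behavior = (behavior + behavior.T) / 2
betas = rdm_glm(behavior, [geo, cnn])  # betas[0]: geometric, betas[1]: CNN
```

Because the predictors are z-scored, the coefficients are directly comparable, which is what allows the per-participant bar plots of panel D (and Fig. 1E).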

Overview of the stimuli used for the category localizer.

A, Average pixel value (left) and average standard deviation across pixels (right) for stimuli within each category (y axis). An ANOVA indicated no significant effect of the stimulus category on either the average or the standard deviation across pixels. B, Average (top) and max (bottom) pixel value at each location across the eight possible visual categories used in the localizer.
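The low-level check in panel A is a one-way ANOVA on per-image statistics across categories. The F statistic can be computed directly with numpy; the group data below are synthetic stand-ins for the per-image mean pixel values of eight matched categories:

```python
import numpy as np

def one_way_anova_F(groups):
    """One-way ANOVA F statistic: between-group mean square over
    within-group mean square, for a list of 1-D sample arrays."""
    all_vals = np.concatenate(groups)
    grand = all_vals.mean()
    k, N = len(groups), all_vals.size
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (N - k))

# Toy stand-in: mean pixel value of each image in 8 luminance-matched
# categories (24 images each); a null effect should give a small F.
rng = np.random.default_rng(4)
groups = [rng.normal(loc=0.5, scale=0.1, size=24) for _ in range(8)]
F = one_way_anova_F(groups)
```

A non-significant F here is what supports the claim that category effects in the localizer are not driven by mean luminance or contrast.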

Details of the fMRI results in children (complement to Fig.2 and Fig.3 in the main text).

A: Statistical map associated with the contrast “single geometric shape > faces, houses and tools”, projected on an inflated brain (top: adults; bottom: children; for illustration purposes, we display the uncorrected statistical map at the p<.01 level). Notice how similar the activations are in both age groups. B: Same as A, but for the contrast “single geometric shape > all single-object visual categories (faces, houses, tools, Chinese characters)”. The activation maps are very similar to the previous contrast, and very similar across age groups. C: Whole-brain correlation of the BOLD signal with geometric regularity in children, as measured by the error rate in a previous online intruder detection task (Sablé-Meyer et al., 2021). Positive correlations are shown in red and negative ones in blue. Voxel threshold p<.001, no correction for multiple comparisons, but the p-value indicates the only cluster that was significant at the cluster-level corrected p<.05 threshold. D: Results of the RSA analysis in children. No cluster was significant at the p<.05 level for the geometric feature model; one right-lateralized occipital cluster reached significance for the CNN encoding model (cluster-level corrected p=.019), and its symmetrical counterpart was close to the significance threshold (cluster-level corrected p=.062).

fMRI response of subject-specific voxels in the ventral visual pathway to geometric shapes and other visual stimuli.

The brain slices show the group-level clusters associated with various contrasts known to elicit a selective response in the ventral visual pathway, in both age groups: VWFA (words > others; green), FFA (faces > others; purple), tool-selective ROIs (tools > others; red) and PPA (houses > others; light blue). Within each ROI, plots show the mean beta coefficients for the BOLD effect within a subject-specific selection of the 10% best voxels, using independent runs for selection and plotting to avoid circularity (“double-dipping”).
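The logic of the independent-runs split can be demonstrated numerically: selecting the top 10% of voxels and estimating their response on the same data inflates the estimate, whereas held-out runs give an unbiased one. A self-contained sketch with simulated voxel betas (all names and dimensions are ours, for illustration only):

```python
import numpy as np

def top_voxels_mean(select_betas, test_betas, frac=0.10):
    """Pick the top `frac` voxels by contrast in the selection data,
    then average the test-data betas in those voxels."""
    n_keep = max(1, int(frac * select_betas.size))
    idx = np.argsort(select_betas)[::-1][:n_keep]
    return test_betas[idx].mean()

# Simulated betas: 10% of 500 voxels carry a true effect of 1.0;
# two independent "runs" add separate measurement noise.
rng = np.random.default_rng(5)
n_vox = 500
true_effect = np.zeros(n_vox)
true_effect[:50] = 1.0
run_A = true_effect + rng.normal(size=n_vox)   # selection runs
run_B = true_effect + rng.normal(size=n_vox)   # independent test runs

biased = top_voxels_mean(run_A, run_A)     # circular: select and test on A
unbiased = top_voxels_mean(run_A, run_B)   # independent: select on A, test on B
```

The circular estimate exceeds the independent one because selecting on run A also selects voxels with favorable noise in run A, which is exactly what the split avoids.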

Coordinates and characteristics of significant fMRI clusters responding to geometric shapes in localizer runs.

For each age group, each line gives the peak coordinates, volume and statistics of a cluster with p<.05 (whole brain, permutation test) for the contrast “single shape > other single visual categories”. The sign of the peak t-value and the shading indicate whether the contrast was positive (white background) or negative (grey background). Coordinates are given in MNI space.

A, Overlap (red) between our geometry contrast (green; shape > other single objects) and our number contrast (orange; numbers > words), in three slices: two intersecting the bilateral IPS areas (z=60 and z=52) and one intersecting the rITG (z=2), in both populations. To help visualize areas that coincide between populations but did not reach significance in one or the other, the maps shown here are uncorrected, p<.01. B, Statistical map from Pinel et al. (2001) showing areas where activation correlated with numerical distance in a number comparison task; slice at z=48 (p<.001, uncorrected). C, Statistical map from Amalric and Dehaene (2016) showing the overlap between three math-related tasks, including high-level mathematical judgments in mathematicians; slice at z=52. D, Statistical tests for the “shape > other categories” contrast in the ROIs identified in independent work (Amalric and Dehaene, 2016), in both populations; all ROIs showed a significant effect at the p<.05 level except the LpITG.

Coordinates and characteristics of significant fMRI clusters in the RSA analysis.

Same organization as Table S1, for the RSA analysis.

Additional Models: Behavior and MEG.

A. Correlation between several models and the average RDM across participants. In particular, we added the last two layers of DINOv2 (Oquab et al., 2023) as well as two different implementations of distances in skeletal spaces (Ayzenberg and Lourenco, 2019; Morfoisse and Izard, 2021). B. Fig. 1D with the symbolic model replaced by the empirical RDM obtained from the last layer of DINOv2 using Euclidean distance. C. Fig. 4C with the same replacement of the symbolic model with the last layer of DINOv2. D. Fig. 4D with the same replacement. E. Timecourse of the similarity between empirical RDMs and two additional neural networks: a vision transformer (ViT, top; Dosovitskiy et al., 2020) and a large CNN (ConvNeXt, bottom; Liu et al., 2022), both with many parameters (∼1 billion and ∼800 million, respectively) and trained on 2 billion images (Cherti et al., 2023; Schuhmann et al., 2022).
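Building an empirical RDM from a network layer, as in panel B, amounts to taking pairwise Euclidean distances between one embedding vector per shape. A minimal sketch with random vectors standing in for the actual DINOv2 last-layer activations (the embedding dimension below is arbitrary):

```python
import numpy as np

def rdm_from_embeddings(emb):
    """Pairwise Euclidean-distance RDM from one embedding vector per
    stimulus. emb: (n_stimuli, n_features) -> (n_stimuli, n_stimuli)."""
    return np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)

# Toy stand-in: one "last-layer" vector per quadrilateral.
rng = np.random.default_rng(6)
emb = rng.normal(size=(11, 768))
rdm = rdm_from_embeddings(emb)
```

The resulting 11×11 matrix is symmetric with a zero diagonal and can be dropped into the same GLM and RSA pipelines as any other model RDM.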

Additional Models: fMRI.

t-values inside clusters significant at the p<.05 level for four models: geometric features (top left), CNN encoding (top right), DINOv2 last layer (bottom left) and skeletal representations from Morfoisse and Izard (2021) (bottom right). Skeletal representations from Ayzenberg and Lourenco (2019) did not yield any significant clusters in adults. In children (bottom), only the DINOv2 model elicited a significant cluster.