Concurrent category-selective neural activity across the ventral occipito-temporal cortex supports a non-hierarchical view of human visual recognition

  1. Université de Lorraine, CNRS, IMoPA, Nancy, France
  2. Université de Lorraine, CHRU-Nancy, Service de Neurologie, Nancy, France
  3. Université de Lorraine, CHRU-Nancy, Service de Neurochirurgie, Nancy, France

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.


Editors

  • Reviewing Editor
    Iris Groen
    University of Amsterdam, Amsterdam, Netherlands
  • Senior Editor
    Michael Frank
    Brown University, Providence, United States of America

Reviewer #1 (Public review):

Summary:

This manuscript aims to test the idea that visual recognition (of faces) is hierarchically organized in the human ventral occipito-temporal cortex (VOTC). The paper proposes that if the VOTC has a hierarchical organization, this should be seen in two independent features of the VOTC signal. First, hierarchy assumes that signals increase in representational complexity along the hierarchy. Second, hierarchy assumes a progressive increase in the onset time of the earliest neural response at each level of the hierarchy. To test these predictions, the authors extract high-frequency broadband signals from iEEG electrodes in a very large sample of patients (N=140). They find that face selectivity in these signals is distributed across the VOTC and increases along the posterior-anterior axis, providing evidence for the first prediction. However, they also find that broadband activity occurs concurrently across regions, thereby challenging the view of a serial hierarchy.

Strengths:

(1) The hypothesis (that VOTC is hierarchically organized) and predictions (that hierarchy predicts increases in representational complexity and increases in onset time) were clearly described.

(2) The number of subjects sampled (140) is extremely large for iEEG studies, which typically involve <10 subjects. In addition, the 444 face-selective recording contacts provide a very nice sampling of the areas of interest.

Weaknesses:

(1) A control analysis where areas have known differences in response onset should be performed to increase confidence that the proposed analyses would reveal expected results when a difference in response onset was present across areas. From Figure 3, it can be seen that many electrodes are placed in earlier visual areas (V1-V3) that have previously been shown to have earlier broadband responses to visual images compared to VOTC (e.g. Martin et al., 2019, JNeurosci https://doi.org/10.1523/JNEUROSCI.1889-18.2018). The same analyses as in Figures 4 and 5 should be used comparing VOTC to early visual areas to confirm that the analyses would detect that V1-V3 have earlier onsets compared to VOTC.

(2) It is unclear why correlating mean timeseries helps understand how much variance is shared between regions (Figure 4). Any variance between images is lost when averaging time series across all images, and this metric thus overestimates the variance shared between areas. Moreover, the finding that correlating time domain signals across VOTC areas does not differ from correlating signals within an area could be driven by this averaging. For example, if the same analysis was done on electrodes in left and right V1 when half of the images had contrast in the left hemifield and the other half had contrast in the right hemifield, the average signals may correlate extremely well, while this correlation falls apart on a trial-by-trial basis. These analyses therefore need to be evaluated on a trial-by-trial basis.
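To make this concrete, the following minimal simulation (with entirely hypothetical signals, not the authors' data) illustrates how correlating trial-averaged time series can report high shared variance even when the trial-by-trial variability of two regions is independent:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_time = 100, 200

# Two "regions" share only a common evoked shape; their trial-by-trial
# variability is fully independent (hypothetical data, for illustration only).
evoked = np.exp(-((np.arange(n_time) - 80) / 20.0) ** 2)
region_a = evoked + rng.normal(0, 1.0, (n_trials, n_time))
region_b = evoked + rng.normal(0, 1.0, (n_trials, n_time))

# Correlation of trial-averaged time series: dominated by the shared evoked shape.
r_avg = np.corrcoef(region_a.mean(axis=0), region_b.mean(axis=0))[0, 1]

# Mean single-trial correlation: reflects how much trial-specific variance is shared.
r_trial = np.mean([np.corrcoef(a, b)[0, 1] for a, b in zip(region_a, region_b)])

print(f"averaged time series: r = {r_avg:.2f}")   # high
print(f"single-trial mean:    r = {r_trial:.2f}") # much lower
```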

(3) Previous studies on visual processing in VOTC have shown that evoked potentials are more predictive of the onset of visual stimuli than broadband activity (e.g. Miller et al., 2016, PLOS CB, https://doi.org/10.1371/journal.pcbi.1004660). Testing the prediction from a hierarchical representation that signals along the VOTC increase in onset time should therefore include an evaluation of evoked potential onsets in addition to broadband signals.

(4) Testing the second prediction, that the onset time of processing increases along the VOTC posterior to anterior path, is difficult using the iEEG broadband signal, because from a signal processing perspective, broadband signals are inherently temporally inaccurate, given that they are filtered. Any filtering in the signal introduces a certain level of temporal smoothing. The manuscript should clearly describe the level of temporal smoothing for the filter settings used.
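One way to report this explicitly is to pass an impulse through the band-pass/envelope pipeline and measure the width of the resulting envelope. The sketch below uses hypothetical filter settings (a 4th-order 70-150 Hz Butterworth at 1000 Hz sampling), not the parameters used in the manuscript:

```python
import numpy as np
from scipy import signal

fs = 1000.0  # Hz (assumed sampling rate)
# Hypothetical broadband settings; substitute the filter actually used in the pipeline.
b, a = signal.butter(4, [70, 150], btype="bandpass", fs=fs)

# Impulse response of the filter, then its amplitude envelope,
# mimicking a band-pass + envelope extraction pipeline.
impulse = np.zeros(2000)
impulse[1000] = 1.0
filtered = signal.filtfilt(b, a, impulse)      # zero-phase filtering
envelope = np.abs(signal.hilbert(filtered))

# Full width at half maximum of the envelope = effective temporal smoothing.
half_max = envelope.max() / 2.0
above = np.where(envelope >= half_max)[0]
fwhm_ms = (above[-1] - above[0]) / fs * 1000.0
print(f"Envelope FWHM: {fwhm_ms:.1f} ms")
```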

(5) The onsets of neural activity in VOTC are surprisingly early: around 80-100 ms. This is earlier than what has previously been reported. For example, the cited Quian Quiroga et al. (2023) found single-neuron responses to have their earliest onset around 125 ms (their Figure 3). Similarly, the cited Jacques et al., 2016b and Kadipasaoglu et al., 2017 papers also observe broadband onsets in VOTC after 100 ms. Understanding the temporal smoothing in the broadband signal, as well as showing that typical evoked potentials have latencies comparable to those reported in other work, would increase confidence that latencies are not underestimated due to factors in the analysis pipeline.

(6) Understanding the extent to which neural processing in the VOTC is hierarchical is essential for building models of vision that capture processing in the human brain, and the data provides novel insight into these processes.

For additional context, a schematic figure of the hierarchical view and a more parallel system described in the paragraph on models of visual recognition (lines 553) would help the reader interpret and understand the implications of the paper.

Reviewer #2 (Public review):

Summary:

This very ambitious project addresses one of the core questions in visual processing related to the underlying anatomical and functional architecture. Using a large sample of rare and high-quality intracranial EEG recordings in humans, the authors assess whether face selectivity is organised along a posterior-anterior gradient, with selectivity and timing increasing from posterior to anterior regions. The evidence suggests that this is the case for selectivity, but the data are more mixed about the temporal organisation, which the authors use to conclude that the classic temporal hierarchy described in textbooks might be questioned, at least when it comes to face processing.

Strengths:

A huge amount of work went into collecting this highly valuable dataset of rare intracranial EEG recordings in humans. The data alone are valuable, assuming they are shared in an easily accessible and documented format. Currently, the OSF repository linked in the article is empty, so no assessment of the data can be made. The topic is important, and a key question in the field is addressed. The EEG methodology is strong, relying on a well-established and high SNR SSVEP method. The method is particularly well-suited to clinical populations, leading to interpretable data in a few minutes of recordings. The authors have attempted to quantify the data in many different ways and provided various estimates of selectivity and timing, with matching measures of uncertainty. Non-parametric confidence intervals and comparisons are provided. Collectively, the various analyses and rich illustrations provide superficially convincing evidence in favour of the conclusions.

Weaknesses:

(1) The work was not pre-registered, and there is no sample size justification, whether for participants or trials/sequences. So a statistical reviewer should assess the sensitivity of the analyses to different approaches.

(2) Frequentist NHST is used to claim a lack of effects, which is inappropriate; see, for instance:

Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337-350. https://doi.org/10.1007/s10654-016-0149-3

Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E.-J. (2016). Is There a Free Lunch in Inference? Topics in Cognitive Science, 8(3), 520-547. https://doi.org/10.1111/tops.12214

(3) In the frequentist realm, demonstrating similar effects between groups requires equivalence testing, with bounds (minimum effect sizes of interest) that should be pre-registered:

Campbell, H., & Gustafson, P. (2024). The Bayes factor, HDI-ROPE, and frequentist equivalence tests can all be reverse engineered-Almost exactly-From one another: Reply to Linde et al. (2021). Psychological Methods, 29(3), 613-623. https://doi.org/10.1037/met0000507

Riesthuis, P. (2024). Simulation-Based Power Analyses for the Smallest Effect Size of Interest: A Confidence-Interval Approach for Minimum-Effect and Equivalence Testing. Advances in Methods and Practices in Psychological Science, 7(2), 25152459241240722. https://doi.org/10.1177/25152459241240722
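As a concrete illustration of point (3), here is a minimal sketch of the two one-sided tests (TOST) procedure for two independent groups of onset estimates, using illustrative equivalence bounds of ±10 ms (a hypothetical smallest effect size of interest, not a recommended value):

```python
import numpy as np
from scipy import stats

def tost_ind(x, y, low, high):
    """Two one-sided tests for equivalence of two independent means.
    low/high are the equivalence bounds (smallest effect sizes of interest)."""
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    # Pooled standard error (equal-variance Student's t, for brevity).
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    df = nx + ny - 2
    p_lower = stats.t.sf((diff - low) / se, df)    # H0: diff <= low
    p_upper = stats.t.cdf((diff - high) / se, df)  # H0: diff >= high
    return diff, max(p_lower, p_upper)             # both must be rejected to claim equivalence

rng = np.random.default_rng(1)
onsets_posterior = rng.normal(90, 15, 40)  # hypothetical onsets (ms)
onsets_anterior = rng.normal(92, 15, 35)   # hypothetical onsets (ms)
diff, p = tost_ind(onsets_posterior, onsets_anterior, low=-10, high=10)
print(f"mean difference = {diff:.1f} ms, TOST p = {p:.3f}")
```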

(4) The lack of consideration for sample sizes, the lack of pre-registration, and the lack of a method to support the null (a cornerstone of this project, which aims to demonstrate equivalent onsets between areas) suggest that the work is exploratory. This is a strength: we need rich datasets to explore, test tools and generate new hypotheses. I strongly recommend embracing the exploration philosophy, and removing all inferential statistics: instead, provide even more detailed graphical representations (including onset distributions) and share the data immediately with all the pre-processing and analysis code.

(5) Even if the work was pre-registered, it would be very difficult to calculate p-values conditional on all the uncertainty around the number of participants, the number of contacts and the number of trials, as they are random variables, and sampling distributions of key inferences should be integrated over these unknown sources of variability. The difficulty of calculating/interpreting p-values that are conditional on so many pre-processing stages and sources of uncertainty is traditionally swept under the rug, but nevertheless well documented:

Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142, 573-603. https://pubmed.ncbi.nlm.nih.gov/22774788/

Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779-804. https://doi.org/10.3758/BF03194105

(6) Currently, there is no convincing evidence in the article to clearly support the main claims.

Bootstrap confidence intervals were used to provide measures of uncertainty. However, the bootstrapping did not take the structure of the data into account, collapsing across important dependencies in that nested structure: participants > hemispheres > contacts > conditions > trials.

Ignoring data dependencies and the uncertainty from trials could lead to a distorted CI. Sampling contacts with replacement is inappropriate because it breaks the structure of the data, mixing degrees of freedom across different levels of analysis. The key rule of the bootstrap is to follow the data acquisition process; therefore, sampling participants with replacement should come first. In a hierarchical bootstrap, the process is repeated at nested levels: for each resampled participant, contacts are resampled (if treated as a random variable), then trials/sequences are resampled, keeping paired measurements together (hemispheres, and typically contacts in a standard EEG experiment with a fixed montage). The same hierarchical resampling should be applied to all measurements and inferences to capture all sources of variability. Selectivity and timing should be quantified at each contact after resampling of trials/sequences, before integrating across hemispheres and participants using appropriate and justified summary measures.
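A minimal sketch of such a hierarchical bootstrap, assuming a hypothetical nested data structure (participants containing contacts containing trial-level values) and the grand mean as a placeholder summary measure:

```python
import numpy as np

rng = np.random.default_rng(42)

def hierarchical_bootstrap(data, n_boot=2000):
    """data: dict {participant: {contact: 1-D array of trial-level values}}.
    Resamples participants, then contacts within each resampled participant,
    then trials within each resampled contact, and returns the bootstrap
    distribution of the grand mean of contact-level means."""
    participants = list(data)
    boot_stats = np.empty(n_boot)
    for i in range(n_boot):
        contact_means = []
        # 1) resample participants with replacement
        for p in rng.choice(participants, size=len(participants), replace=True):
            contacts = list(data[p])
            # 2) resample contacts within this participant
            for c in rng.choice(contacts, size=len(contacts), replace=True):
                trials = data[p][c]
                # 3) resample trials/sequences within this contact
                resampled = rng.choice(trials, size=len(trials), replace=True)
                contact_means.append(resampled.mean())
        boot_stats[i] = np.mean(contact_means)
    return boot_stats

# Hypothetical onset data (ms): 5 participants x 3 contacts x 20 trials.
toy = {p: {c: rng.normal(100, 15, 20) for c in range(3)} for p in range(5)}
boot = hierarchical_bootstrap(toy)
print("95% percentile interval:", np.percentile(boot, [2.5, 97.5]))
```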

The authors already recognise part of the problem, as they provide within-participant analyses. This is a very good step, inasmuch as it addresses the issue of mixing up degrees of freedom across levels, but unfortunately these analyses are plagued by small sample sizes, making claims about the lack of differences even more problematic (the classic fallacy of treating lack of evidence as evidence of absence). In addition, there seem to be discrepancies between the mean and CI in some cases: 15 [-20, 20]; 8 [-24, 24].

(7) Three other issues related to onsets:

(a) FDR correction, like cluster-based inference, typically does not license localisation claims:

Winkler, A. M., Taylor, P. A., Nichols, T. E., & Rorden, C. (2024). False Discovery Rate and Localizing Power (No. arXiv:2401.03554). arXiv. https://doi.org/10.48550/arXiv.2401.03554

Rousselet, G. A. (2025). Using cluster-based permutation tests to estimate MEG/EEG onsets: How bad is it? European Journal of Neuroscience, 61(1), e16618. https://doi.org/10.1111/ejn.16618

(b) Percentile bootstrap confidence intervals are inaccurate when applied to means. Alternatively, use a bootstrap-t method, or use the percentile bootstrap in conjunction with a robust measure of central tendency, such as a trimmed mean (a minimal sketch follows the reference below).

Rousselet, G. A., Pernet, C. R., & Wilcox, R. R. (2021). The Percentile Bootstrap: A Primer With Step-by-Step Instructions in R. Advances in Methods and Practices in Psychological Science, 4(1), 2515245920911881. https://doi.org/10.1177/2515245920911881
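For example, a percentile bootstrap interval built on a 20% trimmed mean (a minimal sketch using simulated onset values):

```python
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(7)
onsets = rng.normal(100, 20, 60)  # hypothetical onset estimates (ms)

# Percentile bootstrap of the 20% trimmed mean.
n_boot = 5000
boot = np.array([
    trim_mean(rng.choice(onsets, size=onsets.size, replace=True), 0.2)
    for _ in range(n_boot)
])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
print(f"20% trimmed mean = {trim_mean(onsets, 0.2):.1f} ms, "
      f"95% CI = [{ci_low:.1f}, {ci_high:.1f}]")
```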

(c) Defining onsets based on an arbitrary "at least 30 ms" rule is not recommended:

Piai, V., Dahlslätt, K., & Maris, E. (2015). Statistically comparing EEG/MEG waveforms through successive significant univariate tests: How bad can it be? Psychophysiology, 52(3), 440-443. https://doi.org/10.1111/psyp.12335

(8) Figure 5 and matching analyses: There are much better tools than correlations to estimate connectivity and directionality. See for instance:

Ince, R. A. A., Giordano, B. L., Kayser, C., Rousselet, G. A., Gross, J., & Schyns, P. G. (2017). A statistical framework for neuroimaging data analysis based on mutual information estimated via a Gaussian copula. Human Brain Mapping, 38(3), 1541-1573. https://doi.org/10.1002/hbm.23471
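A minimal, bias-uncorrected sketch of the Gaussian-copula idea for two 1-D signals (rank-transform each variable to a standard normal, then apply the closed-form Gaussian mutual information); this is a simplified stand-in for the full estimator described by Ince et al. (2017), shown only to illustrate the approach:

```python
import numpy as np
from scipy.stats import rankdata, norm

def copnorm(x):
    """Copula normalisation: rank-transform to uniform, then map to a standard normal."""
    u = rankdata(x) / (len(x) + 1.0)
    return norm.ppf(u)

def gaussian_copula_mi(x, y):
    """Lower-bound estimate of mutual information (bits) between two 1-D variables."""
    cx, cy = copnorm(x), copnorm(y)
    r = np.corrcoef(cx, cy)[0, 1]
    return -0.5 * np.log2(1.0 - r ** 2)

rng = np.random.default_rng(3)
a = rng.normal(size=500)
b = 0.6 * a + rng.normal(scale=0.8, size=500)

# The estimate is invariant to monotonic transformations of the marginals,
# unlike the raw Pearson correlation.
print(f"GC-MI(a, b)      = {gaussian_copula_mi(a, b):.2f} bits")
print(f"GC-MI(exp(a), b) = {gaussian_copula_mi(np.exp(a), b):.2f} bits")
```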

(9) Pearson correlation is sensitive to features of the data other than the association of interest, and it is maximally sensitive to linear associations. Interpretation is difficult without seeing the matching scatterplots and getting confirmation from alternative robust methods.
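To illustrate, a single extreme point can dominate Pearson's r while leaving a rank-based alternative (Spearman, used here as one simple robust check) largely unchanged; inspecting the scatterplot makes such cases obvious:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(5)
x = rng.normal(size=50)
y = rng.normal(size=50)  # no true association

# One extreme point (e.g., an artifactual contact) added to both variables.
x_out = np.append(x, 8.0)
y_out = np.append(y, 8.0)

print("Pearson without outlier:", round(pearsonr(x, y)[0], 2))
print("Pearson with outlier:   ", round(pearsonr(x_out, y_out)[0], 2))   # strongly inflated
print("Spearman with outlier:  ", round(spearmanr(x_out, y_out)[0], 2))  # much less affected
```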
