Layer-specific spatiotemporal dynamics of feedforward and feedback in human visual object perception

Tony Carricarte; Siying Xie; Johannes Singer; Robert Trampel; Laurentius Huber; Zejin Lu; Tim C Kietzmann; Nikolaus Weiskopf; Radoslaw M Cichy

doi:10.7554/eLife.111186.1

eLife Assessment

This valuable study combines sub-millimeter 7T fMRI, EEG, representational similarity analysis, and deep neural network modeling to investigate layer-specific spatiotemporal dynamics underlying human object processing in early visual cortex and lateral occipital cortex; the authors report temporally distinct signatures in superficial layers of LOC that are interpreted as reflecting sequential feedforward and feedback processing during visual recognition. The multimodal methodological approach and empirical dataset are substantial and will be of broad interest to researchers in visual neuroscience, layer-fMRI methodology, and computational vision. However, the evidence supporting the central interpretation of interareal feedback remains incomplete, as the observed dynamics could also be explained by alternative mechanisms such as within-area recurrent processing, and there are additional concerns regarding several methodological and modeling choices underlying claims about increasing representational complexity at later time points. Overall, the study provides solid evidence for layer- and time-specific neural dynamics during object processing, while the interpretation of these signals as feedback-related remains provisional.

https://doi.org/10.7554/eLife.111186.1.sa3

Significance of findings

valuable: Findings that have theoretical or practical implications for a subfield

landmark
fundamental
important
valuable
useful

Strength of evidence

incomplete: Main claims are only partially supported

solid: Methods, data and analyses broadly support the claims with only minor weaknesses

exceptional
compelling
convincing
solid
incomplete
inadequate

During the peer-review process the editor and reviewers write an eLife assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate). Learn more about eLife assessments

Abstract

Visual object perception is mediated by information flow between regions of the ventral visual stream along feedforward and feedback anatomical connections. However, feedforward and feedback signals during naturalistic vision are rapid and overlapping, complicating their identification and precise functional specification. Here we recorded human layer-specific fMRI responses to naturalistic object images in early visual cortex (EVC) and lateral occipital complex (LOC) to isolate feedforward and feedback information signals spatially by their cortical layer-specific termination pattern. We combined these layer-specific fMRI responses with electroencephalography (EEG) responses for the same images to segregate feedforward and feedback signals in both time and space. Feedforward signals emerge early in the middle layers of EVC and LOC, followed by feedback signals in the superficial layer of LOC. Comparing the identified dynamics in LOC to a visual deep neural network (DNN), revealed that early feedforward signals in LOC encode medium-to-high complexity features, whereas later feedback signals increase the representational format to high complexity features. Together this specifies the spatiotemporal dynamics and functional role of feedforward and feedback information flow mediating visual object perception.

Introduction

Human object vision relies on anatomical bidirectional connections along the ventral visual stream^1,2, spanning the visual hierarchy from early visual cortex (EVC) to the lateral occipital complex (LOC)³. These connections mediate visual computations via feedforward and feedback information flows, with complex overlapping spatiotemporal dynamics^4–6. While the feedforward flow carries sensory information up the visual processing hierarchy^7,8, downstream feedback concurrently carries information down the hierarchy, refining and shaping feedforward neural dynamics⁹. Identifying the distinct contributions of these information flows to visual computation in space and time is therefore crucial for understanding the mechanism of human object vision.

Previous efforts to disentangle the specific signatures of feedforward from feedback have typically used one of two main approaches. The first involves invasive manipulations on the anatomically-⁸ and functionally-defined^6,10,11 neural circuits in non-human primates, selectively targeting bottom-up vs top-down processes. While these interventions offer precise circuit-level control, their invasive nature makes them largely impractical for human research. The second approach comprises contrasting experimental conditions where the feedforward and feedback contributions vary, such as perception vs mental imagery¹², attention vs non-attention¹³, early vs late backward masking⁵, and expectation vs sensory input^14,15. However, this method does not directly assess information flow during naturalistic vision and has conceptual limitations, as it relies on indirect comparisons of experimental conditions that may be confounded by incompletely controlled differences between them^16–18.

Instead, here we capitalized on the layer-specific anatomical connectivity found in the primate visual cortex^7,8,19, to dissect feedforward from feedback signals: while feedforward connections terminate primarily in the middle layer²⁰, feedback connections target superficial and deep layers^21,22.

Based on this three-compartment model of cortical depth, we characterized the layer-specific spatiotemporal dynamics of feedforward and feedback signals in object perception. Using sub-millimeter resolution fMRI at 7T, we recorded layer-specific brain activity in human EVC, the initial stage of visual processing²³, and in LOC, a high-level key region for object representations^3,24, while participants viewed naturalistic object images. We then combined these layer-specific fMRI responses with time-resolved EEG responses for the same images within the framework of representational similarity analysis (RSA)^25–27 to resolve feedforward and feedback processing in millisecond resolution. Based on this, we then characterized the representational format in terms of visual feature complexity of feedforward and feedback processing using artificial deep neural networks (DNNs).

Results

Using sub-millimeter resolution 7T MRI, we recorded gradient-echo blood oxygenation level-dependent (GE-BOLD) signals in EVC and LOC of 10 participants in response to 24 different naturalistic object images (Fig. 1A). During the acquisition participants viewed naturalistic object images on real-world backgrounds in random order (Fig 1B). We estimated the neural response to each condition, i.e., object image, by fitting a general linear model.

Stimuli, experimental design and multivariate pattern classification
**(A)** Stimulus set. The stimuli consisted of 24 different naturalistic object images. **(B)** fMRI experimental design. On each trial, participants viewed images for 1 s followed by a 4-s baseline interval. Participants were required to perform a color-change detection task on the fixation cross that occurred randomly throughout the experiment. **(C)** Extraction of voxel values to form condition-specific pattern vectors from region of interest (example here: EVC) and pairwise-object classification using a support vector machine. ROIs are depicted for visualization purposes only **(D)** Object-pairwise multivariate decoding output. Robust object-specific information was reliably decoded from EVC (71,16%, P = 0.0039) and LOC (60,69%, P = 0.0092) using one-sample permutation tests. Error bars indicate the standard error of the mean across participants.

Robust object information in both EVC and LOC is a precondition for further dissecting feedback and feedforward-related aspects. To ascertain this, we extracted voxel values from EVC and LOC separately to form condition-specific pattern vectors (Fig. 1C). Based on these pattern vectors we trained and tested a support vector machine (SVM) to perform pairwise-object classification of objects. We found robust decoding accuracy in both EVC (71,16%, P = 0.0039, one-sample permutation test) and LOC (60,69%, P = 0.0092), ascertaining reliable object information in those regions (Fig. 1D).

We then proceeded to identify and examine the spatiotemporal neural dynamics of visual representations in two steps that build upon each other. First, we assessed the macroscale of cortical regions (i.e., EVC and LOC) to establish and thus validate representational EEG-fMRI fusion at 7T, and to characterize the representational format of the identified dynamics. Based on this validation, we dissect feedforward from feedback neural processing at the finer mesoscale level of cortical layers.

Spatiotemporal neural dynamics of object representations in EVC and LOC at the macroscale of brain regions

To establish the time course of object processing in EVC and LOC at the macroscale, we employed representational EEG-fMRI fusion^26–28, which integrates time-resolved EEG with spatially resolved fMRI measurements for a combined spatiotemporally resolved view. For MRI, we used the dataset collected in this study, complemented with EEG responses from an existing dataset to the same set of 24 images⁵ as in the MRI experiment. We assessed the representational geometry of EVC and LOC with region-specific representational dissimilarity matrices (RDMs; Fig. 2A), and the representational geometry of EEG signals with RDMs in a time-resolved way from −200 to 600 ms with respect to image onset (Fig. 2B). We related EVC and LOC RDMs to the time-resolved EEG-RDMs, yielding a time course of representational similarity for EVC and LOC each, for which we report peak latency and peak latency differences with 95% confidence intervals in parenthesis as assessed by bootstrapping (1,000 iterations).

Macroscale representational EEG-fMRI fusion
**(A)** Representational similarity analysis. For each condition, we extracted the neural pattern from the region of interest (example here: EVC). We assessed the extent of pattern dissimilarity by calculating 1 - Pearson’s correlation for all combinations of experimental conditions (i, j) and assigned the dissimilarity values to an fMRI representational dissimilarity matrix (RDM) indexed by the conditions in rows and columns, at entry (i, j). ROIs are depicted for visualization purposes only **(B)** Representational EEG-fMRI fusion. For each time point t, we correlated the EEG-RDM to the fMRI-RDMs of EVC and LOC using Spearman’s rank order correlation. **(C)** Spatiotemporal neural dynamics at the macroscale level. Time course in EVC peaked earlier than in LOC. **(D)** Difference between EVC and LOC curves in C. EEG signals correlated first more with EVC than with LOC and later more with LOC than with EVC. **(E)** Commonality analysis. For each DNN layer, we correlated its RDM to each ROI-specific fMRI-RDM and the mean EEG-RDM at the time interval with significant ROI-specific temporal dynamics. **(F)** Format of representation (≈ feature complexity) in EVC and LOC. Visual representations of low-to-mid-level complexity emerge early in EVC, while mid-to-high-level object representations emerge later in LOC. Shaded regions and error bars indicate the standard error of the mean across participants. Colored circles denote significant time points (N = 32; cluster-defining threshold P < 0.05; cluster threshold P < 0.05); uncolored circles and horizontal lines indicate peak latency means and 95% confidence intervals, respectively. Colored asterisks indicate significant correlations (right-tailed permutation tests, FDR-corrected; *P < 0.05; **P < 0.01; ***P < 0.001). Colored triangles represent model layers with the highest occurrence proportion, determined through 1,000-iteration bootstraps.

As expected from the well-established hierarchical organization of the visual system^7,8,29, object processing in EVC preceded that in LOC (Fig. 2C): The EVC time course peaked at 120 ms (100 – 120 ms), whereas the LOC time course peaked later at 180 ms (170 – 370 ms), with a significant difference between peak times of 60 ms (50 – 250 ms; P < 0.001). Furthermore, in direct comparison by subtraction of the correlation curves of LOC from EVC, early representations correlated stronger with EVC than with LOC at 90 ms (90 – 120 ms; Fig. 2D), whereas late representations correlated stronger with LOC than with EVC at 300 ms (180 – 510 ms). A supplementary EEG-fMRI fusion analysis accounting for similarities in representational geometry between LOC and EVC by partial correlations confirmed these observations (Suppl. Fig. 1A and B), as also expected from the low correlation between the LOC and ECS RDMs (R = 0.05). Our results extend EEG-fMRI fusion^5,26,28,30 from 3T to 7T fMRI and provide additional support to the procedure, thus warranting a spatially finer investigation of the spatiotemporal neural dynamics at the mesoscale level of cortical layers.

The format of object representations in EVC and LOC at the macroscale of brain regions

The visual cortex represents objects in formats of increasing feature complexity from low to high along the ventral visual stream^31–34. To confirm this progression here, we related the neural dynamics of EVC and LOC to the Vision Transformer (ViT) architecture³⁵ from the DeiT family³⁶ trained on object categorization³⁷. The underlying rationale is that the feature complexity of visual representations progressively increases from lower to higher layers across the network’s layers, so that assessing the fit of the model layers to brain data reveals feature complexity of the neural representations^34,38. We conducted commonality analysis, linking each of the 4 layer-specific DNN-RDMs to the ROI-specific fMRI-RDMs from EVC and LOC, and to the mean EEG-RDM at corresponding time intervals, defined by the significant time points of the raw curves in Fig. 2C. We report peak layers with 95% confidence intervals in parenthesis (1,000 bootstraps).

We found commonality with predominantly low to mid model layers, with a peak at model layer 2 (1 – 2) in EVC. and with predominantly middle to high model layers with a peak at model layer 4 (3 – 4) in LOC (Fig. 2F). This indicates representations of primarily low- to mid-complexity in EVC and mid- to high-complexity in LOC. This pattern of results remained consistent when using an alternative experimental choice based on significant time points from the difference between EVC and LOC curves in Fig. 2D (Suppl. Fig. 1C). Our results thus confirm a transition from representations of low-to-mid complexity processed early in EVC into representations of higher complexity processed later in LOC²⁸, further providing additional support to the analytic approach at 7T and warranting a finer investigation of representational format at the mesoscale level.

Spatiotemporal neural dynamics of object representations in EVC and LOC at the mesoscale of cortical layers

To investigate the spatiotemporal neural dynamics at the mesoscale level, we applied the research procedure validated above to the finer level of cortical layers (Fig. 3A). To this end, we segmented the cortical ribbon into three equidistant layers³⁹: deep, middle and superficial. We then applied representational EEG-fMRI fusion as established above, but now at the level of layers, to yield time-resolved and layer-specific visual object processing time courses in EVC and LOC. To control for non-specific macrovascular responses⁴⁰ that affect layer specificity⁴¹ and are reflected as increased GE-BOLD responses towards superficial layers (Suppl. Fig. 1D), we conducted EEG-fMRI fusion analysis partialing out for each layer the effect of the layers beneath.

Mesoscale representational EEG-fMRI fusion
(A). For each cortical layer — deep, middle, and superficial — (example here: for EVC) we computed the partial Spearman’s rank-order correlation between its layer-specific fMRI-RDM and the EEG-RDM at each time point t. **(B)** Layer-specific spatiotemporal neural dynamics in EVC. EEG signals correlated early across layers, with the time course in the middle layer peaking earlier than in the deep and superficial layers. **(C)** Difference between EVC layer curves in B. EEG signals correlated lately more with deep and superficial layers than with the middle layer **(D)** Layer-specific spatiotemporal neural dynamics in LOC. EEG signals correlated early in the middle layer and later in the superficial layer. **(E)** Difference between LOC layer curves in D. EEG signals correlated early more with the middle layer than with the superficial layer, and later more with the superficial layer than with the middle layer. Shaded regions indicate the standard error of the mean across participants. Colored circles denote significant time points (N = 32, cluster-defining threshold *P <* 0.05, cluster threshold *P <* 0.05); uncolored circles and horizontal lines indicate peak latency means and 95% confidence intervals, respectively.

In EVC we observed a correlation pattern suggesting two processing stages with a common rise (Fig. 3B, C) indexed by different profiles across layers and in timing. The first stage is marked by a peak in the middle layer at 100 ms (90 – 120 ms; Fig. 3B). The second stage is characterized by peaks at 140 ms in both the deep (120 –180 ms) and superficial (120 – 190 ms) layers, with significant latency differences of 40 ms (10 – 50 ms; P = 0.004) between the middle layer and the deep layer and 40 ms (20 – 80 ms; P = 0.002) between the middle layer and superficial layer, but not between the deep layer and the superficial layer 0 ms (−20 – 40 ms; P = 0.814). This pattern was further substantiated by subtracting the layer-specific time courses which showed significant effects (Fig. 3C). We observed stronger correlations of late representations at 140 ms with the deep (140–150 ms) and the superficial (140–140 ms) layers than with the middle layer. This suggests initial feedforward processing in the middle layer, followed by later signals in the deep and superficial layers that could reflect either feedback processing or the sequential propagation of feedforward activity along the canonical microcircuit^8,42 in EVC.

In LOC, the results pattern also indicated two stages with a distinctive layer and temporal profile (Fig. 3D, E). We observe earlier processing in the middle layer, followed by processing in the superficial layer later (Fig. 3D). In detail, time courses of object processing peaked first in the middle layer at 160 ms (160 – 180 ms), and later in the superficial layer at 400 ms (350 – 440 ms), with a significant difference in peak latency of 240 ms (110 – 290 ms; P < 0.012). A direct comparison of time courses by subtraction (Fig. 3E) substantiated the observation in that representations correlated stronger with the middle layer early than the superficial layer at 160 ms (160 – 170 ms), and late representations correlated stronger with the superficial layer than the middle layer later, at 400 ms (90 – 440 ms). This suggests an initial feedforward processing in the middle layer and a later emerging feedback processing with a distinctive layer profile in the superficial layer in LOC.

An analogous results pattern in EVC (Suppl. Fig. 2A, C) and LOC (Suppl. Fig. 2B, D) was observed when assessing layers directly without partialing out the effect of deeper layers. To ensure that differences in the number of voxels across cortical layers did not bias the representational dissimilarities, we repeated the analyses using an equal number of voxels per layer by randomly subsampling to match the layer with the lowest voxel number. This control analysis yielded comparable results in EVC (Suppl. Fig. 2E) and LOC (Suppl. Fig. 2F), confirming the robustness of the results to particular analysis choices.

Together, this resolves the spatiotemporal dynamics of feedforward and feedback information flow during visual object processing across EVC and LOC through cortical layer-specific and temporally distinct response profiles.

The format of object representations in EVC and LOC at the mesoscale of cortical layers

One interpretation of the observed layer-specific spatiotemporal dynamics, being guided by the functional pattern of cortical layer connectivity^43,44, is that they indicate interareal communication via feedforward and feedback connections^42,45. However, an alternative explanation is that they arise intra-areal communication via lateral connections⁴⁶ or from superficial bias in the fMRI measurements⁴⁷.

To disambiguate between these options, we characterized the format of representations by assessing feature complexity as present in DNN layers, from low to high, applying the research approach as used at the macroscale but now at the level of cortical layers.

Feedforward processing within a cortical region involves convolution-like computations from simple to complex and hypercomplex cells^48,49, progressively transforming representations to achieve some tolerance. We conceptualize this as a transformation that changes the tolerance of a feature, not the intrinsic feature complexity itself. Instead, an increase in intrinsic feature complexity requires the integration of different features into new, emergent, and more elaborated representations, as those reflected across different DNN layers. Feedback processing, in contrast, is likely to play an active role in this feature complexity increase^5,50 by transferring higher-complexity representations. Accordingly, if the observed layer-specific spatiotemporal dynamics reflect an interplay of feedforward and feedback information flow, we would expect the flow to carry information of varying complexity^51,52 across different levels of the visual hierarchy^53,54, resulting in distinct representational formats across cortical layers.

We conducted commonality analysis, linking each model layer-specific DNN-RDM to the deep, middle and superficial layer-specific fMRI-RDM and mean EEG-RDM at corresponding time intervals, defined by the significant time points of the raw curves in Fig. 3B and D indicating periods of layer-specific neural dynamics (Fig. 4A).

Mesoscale commonality analysis between fMRI, EEG and ViT
**(A)** Procedure. For each transformer layer, we correlated its RDM to each layer-specific fMRI-RDM in EVC and LOC and the mean EEG-RDM at the time interval with significant layer-specific temporal dynamics. **(B)** Format of representation (≈ feature complexity) across cortical layers in EVC. Low model layers correlated strongly across layers in EVC. **(C)** Format of representation (≈ feature complexity) across cortical layers in LOC. Middle model layers correlated strongly with the middle layer in LOC, while high model layers correlated primarily with the superficial layer. Colored asterisks indicate significant correlations (N = 32, right-tailed permutation tests, FDR-corrected; **P <* 0.05; ***P <* 0.01; ****P <* 0.001). Colored triangles represent model layers with the highest occurrence proportion, determined through 1,000-iteration bootstraps.

In EVC (Fig. 4B) we observed a uniform results pattern across cortical layers: object representations shared variance with all DNN layers, but most strongly with model layer 2 (2 – 2), indicating mainly representations of low-to-mid level complexity. This suggests that the observed layer-specific spatiotemporal dynamics with a common rise in EVC reflect feedforward propagation, lateral connections modulating the neural gain^55–59 or superficial bias⁴⁷, rather than feedback.

In contrast, in LOC (Fig. 4C) we observed a shift from mid- to high feature complexity across cortical layers. The early emerging representations in the middle cortical layer (Fig. 3D) were of mid-to-high-complexity, as indicated by a peak at model layer 3 (2 – 3). In contrast, the late emerging representations in the superficial layers were of high complexity, with a peak at model layer 4 (4 – 4). A direct comparison of feature complexity between the middle and superficial layers, using bootstrapped peak layer differences, confirmed that the superficial layer representations encoded higher-level features than the middle layer (1 model layer difference; P < 0.001).

These results proved robust across multiple experimental choices. These included using an equal number of voxels per layer (Suppl. Fig. 2G and H), testing a different network architecture, AlexNet³⁷ (Suppl. Fig. 2I and J) to account for DNN-specific biases, averaging over significant time intervals from the layer-curve differences from the layer-curve differences in Fig. 3C and E (Suppl. Fig. 2K and L), and analyses across individual time points spanning the full time course (Suppl. Fig. 3).

Together, these findings suggest a dynamic shift in representational format in LOC, transitioning from early representations of mid-level feature complexity in the feedforward flow to later representations of high feature complexity through interareal feedback.

Discussion

We leveraged layer-resolved fMRI, time-resolved EEG data and DNNs to identify and characterize the spatiotemporal neural dynamics of feedforward and feedback information flow underlying object perception.

We validated our methods using 7T fMRI at the macroscale of cortical regions by replicating the temporal dynamics and representational format in EVC and LOC observed in 3T fMRI studies^5,26,30,34, allowing us to proceed to the mesoscale of cortical layers. There we made two key observations. First, we observed distinct layer-specific temporal profiles for EVC and LOC. Visual representations in the middle layers emerged earlier than in the deep and superficial layers of EVC and the superficial layers of LOC. Second, the identified layer-specific dynamics in LOC had distinctive visual feature complexity profiles: the early emerging middle layer representations were of mid-to-high complexity and the later emerging superficial layer representations were of high complexity.

Layer-specific EEG-fMRI fusion reveals sequential feedforward and feedback processing in visual object perception

Visual object perception unfolds through intricate spatiotemporal neural dynamics^4–6,60,61, mediating feedforward and feedback information flow^26,62 rapidly and through temporally overlapping responses. Here, we disentangle the temporal dynamics of feedforward from feedback signals by leveraging the anatomical canonical microcircuit of cortex^7,8 at the input and output stage of cortical object processing^3,63,64 – EVC and LOC. We find that feedforward signals emerge early in the middle layer of both EVC at 100 ms and in LOC at 160 ms, while feedback appear relatively later in superficial layer of LOC at 400 ms, supporting the idea of sequential processing of visual information first through feedforward, then through feedback processing^44,65. Our results provide a functional temporal characterization for the anatomical canonical cortical microcircuit model of feedforward and feedback connectivity in the human ventral visual stream.

Although the observed differences in feedforward peak latency between EVC and LOC indicate a consistent temporal hierarchy, their interpretation warrants caution due to the indirect nature of the fusion-derived measures. Note that there is no trivial way to relate the exact peaks reported in the fusion analysis to other measures of temporal dynamics in cortex, such as spike timing^11,66. They reflect the relation mass signals in the EEG with BOLD signals in the fMRI, involving the foveal confluence and thus aggregated across V1 to V3. Thus, while the timing and in particular its differences across regions and layers as determined by fusion are robust and define a clear order, the relationship to neural activity measured invasively in the brain needs further confirmation.

While previous studies have consistently reported feedback modulation in the EVC^{11,54,67–69}, our results suggest that the observed layer-specific spatiotemporal dynamics instead reflect feedforward propagation, lateral interactions modulating the neural gain^55–59 or superficial bias⁴⁷, with no evidence of interareal feedback. This apparent lack of feedback signals could reflect the limited laminar specificity of our measurements, influenced by vascular contamination and partial volume effects^47,70 (see limitation section for more details), effectively masking feedback signals under the prevailing feedforward activity in the EVC.

Our findings have theoretical implications. For example, they specify in spatiotemporal terms the dynamical communication model in predictive coding theory⁷¹ which posits distinct neural channels for the transmission of sensory and predictive information^72–74. Our results indicate that whereas sensory signals are conveyed early in middle layers, subsequent predictive signals are transmitted later as feedback in superficial layers. This specification opens a new path for further testing predictions of predictive coding by investigating the content and integration of predictive feedback and sensory feedforward signals^75–77. For example, our results invite investigating layer-specific neural dynamics at distinct frequency bands for feedforward and feedback processing as predicted from human^78,79 and non-human invasive studies^72,80–82, or recently the spatiotemporal dynamics of false percepts^73,74.

Our approach facilitates resolving the interplay of feedforward and feedback processing in a variety of visual research contexts. It may allow dissecting the distinctive roles of feedforward and feedback information flow crucial to cognitive functions such as attention, expectation and memory by identifying and characterizing their distinct spatiotemporal dynamics. Similarly, using multi-rather than univariate methods^12,72,83 to assess layer-specific fMRI extends content-sensitive analysis^84–86 from the macro- to the mesoscale of cortical organization, allowing for the assessment of the representational contents of feedforward and feedback information flow underlying human cognition.

Layer-specific EEG-fMRI fusion clarifies neural dynamics of high-level visual cortex regions

Neural dynamics in response to faces²⁶, scenes⁸⁷ and objects^88,89 assessed at the macroscale of cortical regions commonly display a double peak pattern (see also Fig. 2C), with a sharp, early peak around 160-200 ms followed by a wide second peak around 240-450 ms, suggesting distinct contributing neural circuits and kinds of processing. Our layer-specific results clarify the neural circuits²⁶ and functional nature of the components underlying this pattern: it results from mixing the early, narrow-peaked and neural dynamics at the middle layer ∼160 ms of feedforward information flow with the later, wide-peaked and thus more persistent dynamics at the superficial layer ∼400 ms of feedback information flow. Whereas the early peak latency matches that of sensory feedforward signals⁹⁰, the late peak dynamics match that of and attention-⁹¹, consciousness-⁹² or task-related⁹³ feedback, originating from higher-order brain regions, such as the frontal eye fields¹⁰ or the prefrontal cortex⁹⁴, outside the visual cortex. Thus, our results provide circuit-level mechanistic as well as functional interpretative guidance for standard 3T human neuroimaging studies that typically cannot resolve visual information flow at this fine spatiotemporal level.

We did not observe sustained activity in EVC following the middle-layer peak observed in LOC. This difference may reflect variations in temporal processing across the visual hierarchy during object perception, potentially arising from distinct contributions of feedforward and feedback information flow. Specifically, neural dynamics in EVC are likely dominated by rapid, sensory-driven feedforward input, whereas LOC exhibits both feedforward and more sustained feedback components. Such prolonged activity has been reported previously in higher-level visual areas^26,88,89, but not in EVC. We therefore do not interpret the absence of sustained EVC activity as a lack of feedback per se, but rather as a limitation of the spatial and temporal sensitivity of our measurements, which likely capture the dominant population-level dynamics, consistent with patterns observed in univariate analyses⁹⁵.

Differential representational format across layer-specific dynamics in LOC implies interareal feedback mediating high-complexity visual features

The analysis of the representational format of the neural dynamics in LOC suggests that early feedforward signals primarily convey mid-complexity features, while feedback signals convey high-complexity features. This is consistent with the idea that feedback does not merely modulate the gain of pre-existing features but is actively involved in the emergence of additional, more complex features that are absent in feedforward processing⁵. This finding supports the notion that interareal feedback is integral to core object recognition^96–98. A potential source of this feedback might be the dorsolateral prefrontal cortex (DLPFC)⁹⁹, whose silencing decreases feature complexity in monkey inferior temporal cortex⁵⁰, the homologue of human LOC. Based on our results we predict that disrupting processing in human DLPFC, e.g., through transcranial magnetic stimulation^100,101, will yield analogous effects specifically on superficial, late cortical layer responses in LOC.

Limitations of the study and future research

We based our analyses on the GE-BOLD signal that is affected by locally nonspecific responses from macrovasculature⁴⁷, compromising the estimation of the laminar response. To mitigate this effect while assessing the relationship between neural patterns, we applied a two-steps approach. First, we used Pearson correlation, which is invariant to signal scaling and thus less sensitive to pattern amplification arising from differences in vascular density across the cortical ribbon¹⁰². Second, we applied partial correlations to remove residual baseline effects originating from underlying layers. Although this approach enabled us to assess pattern relationships reducing unwanted blood contamination effects, non-linear signal leakage, particularly from draining veins toward the pial surface might still persist¹⁰³.

To ensure sufficient signal to noise ratio (SNR) while covering both EVC and LOC, we used an isotropic voxel size of 0.9 mm. However, given that the cortical thickness in the visual cortex often does not exceed three times this resolution¹⁰⁴, partial volume effects might exist, resulting in signal mixing across cortical layers.

Despite these limitations, our methodological framework holds exciting potential for noninvasive, in vivo resolution of the spatiotemporal dynamics of feedforward and feedback processing in the human brain. Here, we combined EEG and fMRI data to capture common and stable neural effects across modalities using a condition-by-condition approach, whereas other studies have employed trial-by-trial methods¹⁰⁵. Exploring how these two approaches could be integrated offers an interesting avenue for future research, as it may provide complementary insights into the coupling between electrophysiological and hemodynamic signals.

Key challenges remain, including expanding spatial coverage without compromising signal quality, which would enable the study of the spatiotemporal dynamics of object information flow across the entire ventral visual stream and beyond. Future human fMRI studies using acquisitions with higher spatial specificity^106–108, optimized for condition-rich experimental designs²⁵, will be essential to confirm and extend our findings.

Conclusion

Understanding how the abundant feedforward and feedback connections in visual cortex mediate object vision requires specifying their functional role. Here we provided two key advances in this regard. First, leveraging layer-specific fMRI with EEG-fMRI fusion we dissociated temporally overlapping feedforward and feedback processing in LOC and EVC, specifying their unique temporal profile. Second, by assessing feature complexity, we showed that feedback in LOC might actively increase the complexity of LOC’s feature format.

Materials and Methods

fMRI participants

10 adult volunteers (mean age 29.4 years; age range 20-37 years; 5 female) participated in the study and provided written informed consent. The sample size was based on previous conventional layer fMRI studies conducted at 7T^83,106,109. All participants had normal or corrected-to-normal vision and no history of neurological disorders. All participants received a monetary reward at the end of the study. The study was approved by the Ethics Committee of the Faculty of Medicine at Leipzig University, Germany, and conducted in accordance with the ethical principles of the Declaration of Helsinki, except for study preregistration, which was not performed.

Stimulus set

The stimulus set consisted of 24 naturalistic color images of everyday objects on real-world backgrounds, each of a different category from the ImageNet image database (i.e., animals, tools, vehicles, foods and others)³⁴.

fMRI experimental design

The study was composed of a main experimental part and one localizer run.

The main experimental part consisted of 6 to 14 runs (mean ± SD: 9.9 ± 3.2), each lasting 621 s. Each run started and ended with a 10.5-s baseline period and included all 24 images, each repeated five times, for a total of 120 trials presented in random order. Each trial began with a 1-s stimulus-on interval, during which a centrally displayed object image (5° visual angle) appeared on a grey background with a pink fixation cross. We used a short stimulus-off interval of 4 s and randomization of condition ordering, consistent with previous RSA studies^5,25,26,28, to allow for both a high number of conditions and trial repetitions with the goal of high reliability of the estimated activity patterns (Suppl. Fig. 4A, B and C). During this interval, only the fixation cross was displayed. Participants were instructed to maintain their gaze on the fixation cross throughout the experiment and to perform a color-change detection task unrelated to the object images, thereby, minimizing task-dependent effects during object processing¹¹⁰. Participants responded by pressing a button as soon as the fixation cross turned blue for 300 ms.

The functional localizer run was intended to define regions of interest (ROIs) EVC and LOC. Participants completed it at the beginning of the recording session. The localizer consisted of 15-s blocks displaying objects (not included in the main experiment) and scrambled objects overlayed with a fixation cross, interleaved with 7.5-s baseline blocks showing only a fixation cross on a grey background. In each block images were centrally presented (12° visual angle) for 400 ms, followed by a 350-ms display of the fixation cross. Participants were instructed to maintain their gaze on the fixation cross and to press a button if the same image appeared in consecutive trials. The localizer run included 12 blocks of each image type, resulting in a total duration of 465 s. Block order was pseudo-randomized to avoid immediate repetition of the same block type.

MRI Procedure

MRI data were acquired at the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig, Germany. Four participants completed the experiment in one scanning session. Six participants completed the experiment in two scanning sessions on two separate days. During the first scanning session, we acquired a T1-weighted anatomical image, the functional localizer and 5 to 8 runs of the main experiment. During the second scanning session, participants completed 6 to 8 runs of the main experiment. Additionally, to enable distortion correction, five volumes with reversed phase-encoding polarity were acquired following the first run of each main experimental session. To ensure that participants were familiar with the experimental tasks, we provided them with verbal and written instructions prior to the scanning, and the participants completed a 2-min training for both the localizer and the main tasks.

MRI acquisition parameters

We acquired MR images on a Siemens Magnetom Terra 7T whole-body system (Siemens Healthineers, Erlangen, Germany) with a single-channel-transmit and a 32-channel radio-frequency (RF) receive head coil (Nova Medical Inc, Wilmington, USA). We acquired the functional data using a 2D Gradient-echo (GE) echo planar imaging (EPI) sequence¹¹¹ (voxel size = 0.9 mm isotropic resolution, TE/TR = 26.2/3500 ms, in-plane field of view (FoV) 192 × 192 mm², 48 axial slices, flip angle = 75°, echo spacing = 1.0 ms, GRAPPA factor = 3, partial Fourier = 6/8, phase encoding direction anterior-posterior). We recorded anatomical data using an MP2RAGE sequence¹¹² (voxel size = 0.7 mm isotropic resolution, TE/TR = 2.01/5590 ms, in-plane FoV 224 × 224 mm, GRAPPA factor = 2) yielding two inversion contrasts (TI1 = 900 ms, flip angle 1 = 5°; TI2 = 2750 ms, flip angle 2 = 3°). The two inversion contrasts were combined to produce T1-weighted MP2RAGE uniform (UNI) images with high contrast to noise ratio.

MRI preprocessing

For each recording session, we spatially realigned the functional volumes to their mean volume using SPM12 (http://www.fil.ion.ucl.ac.uk/spm). To correct for geometric distortions in the phase encoding direction, we calculated a deformation field based on reverse gradient estimation, using the Advanced Normalization Tools (ANTs) software package (http://stnava.github.io/ANTs/). In detail, we combined the mean functional volume (forward image) with the mean volume acquired with opposing phase encoding direction to generate a distortion-corrected template. Next, we estimated the deformation map by registering the forward image to the corrected template reference using non-linear (SyN) transformations. Finally, the deformation map was used to produce a distortion-corrected mean functional volume.

To co-register the anatomical and the distortion-corrected mean functional volumes, we initially referred to the Glasser’s atlas¹¹³ and the Kanwisher’s atlas¹¹⁴ to identify the approximate location of EVC and LOC, respectively. Then we outlined a volume containing these regions in the occipito-temporal cortex of both hemispheres on the individual participant’s native space (from here on referred to as manual mask) using ITK-SNAP with the 3D paintbrush tool (v.3.8)¹¹⁵. We then estimated a second deformation map in ANTs by registering the distortion-corrected mean volume to the T1-weighted volume, applying nonlinear (SyN) transformations within the manual mask. We visually inspected the fixed and registered volumes in ITK-SNAP for each participant. If the volume was not correctly registered within the region of the manual mask, we repeated the registration, adding linear transformations (rigid and affine) with stricter convergence criteria and increased iterations, until an accurate alignment was achieved. To minimize spatial resolution loss during resampling, we combined both deformation maps and resampled the functional images to the anatomical reference in a single interpolation step using a fifth-order spline function.

Finally, we spatially smoothed the functional localizer images using a 6-mm full width at half maximum (FWHM) Gaussian kernel. Functional images of the main runs were not smoothed to preserve spatial specificity.

Reliable layer analyses require accurate tissue segmentation. Because current automatic algorithms typically fail to capture fine cortical boundaries, we employed a manual segmentation approach to define the gray matter boundaries. Specifically, we delineated the outer and inner grey matter borders within the regions of interest (ROIs; see section “Definition of regions of interest” for details) on the T1-weighted UNI volume, following the procedure in ITK-SNAP outlined here (https://www.youtube.com/watch?v=tSA77mFTwcg&t=1042s). Prior to delineation, we corrected for bias field effects with a customized script¹¹⁶. The manually delineated gray matter surfaces were then used as input for automatic segmentation of the cortical ribbon into laminar and columnar compartments using the LN2_LAYERS algorithm from LAYNII (v2.2.1)¹¹⁷. In detail, we segmented the gray matter into three cortical depths within each ROI: deep, middle and superficial (Fig. 3A) applying the equi-distant model³⁹. We chose the equi-distant model over the equi-volume model, as it provides more accurate layer segmentation for coarser resolutions (>0.3 mm), such as that of our fMRI data.

fMRI univariate analysis

To estimate neural responses, we ran separate General Linear Model (GLM) analyses in SPM12 for each pre-processed functional run, i.e., all main experimental runs and the localizer run. All analyses were conducted in each participant’s native anatomical space. Specifically, we modelled 25 regressors (i.e., 24 object images + baseline) for each main experimental run, and 3 regressors (i.e., objects, scrambled objects, and baseline) for the localizer run. We created the regressors by convolving a boxcar function representing the onsets and durations of the corresponding condition with the canonical (2 Gamma) hemodynamic response function (HRF). We incorporated the motion estimates into the model as nuisance regressors. By fitting a GLM, we obtained beta weight estimates for each condition (i.e., regressor) for each run, which were subsequently used in further analyses.

Definition of regions of interest

At the macroscale, we defined two ROIs for each participant: EVC, comprising areas V1, V2, and V3, and LOC in a two-step procedure. First, we used anatomical masks from the above-mentioned brain atlases for EVC¹¹³ and for LOC¹¹⁴. These masks were resampled from the MNI152 space into each participant’s individual space. Second, we identified the overlap between the participant-specific anatomical masks and the corresponding functional contrast T-statistic map from the localizer experiment, retaining the top 2,000 voxels. Specifically, we ranked the voxels according to the objects + scrambled > baseline contrast T-statistic for EVC, and the objects > scrambled contrast T-statistic for LOC. Due to lack of activation in the lateral occipitotemporal cortex for one participant (3), we used the objects + scrambled > baseline contrast to define LOC. Any voxels overlapping across ROIs were excluded. This process resulted in one final EVC and LOC mask for each participant.

At the mesoscale, we defined six ROIs based on the two brain regions – EVC and LOC – and the three cortical layers – deep, middle and superficial (see above for definition of cortical layers). Here, the ROI definition was analogous, with the following deviations to address issues arising specifically at the mesoscale level. Voxels with higher spatial resolution exhibit a reduced SNR and increased susceptibility to noise sources⁴⁷, compromising the quality of the recorded signal. Additionally, signal reliability gradually decreases along the ventral visual cortex from lower to higher visual areas¹¹⁸, likely due to signal loss and susceptibility-induced distortions¹¹⁹. To address this, we chose a higher number of voxels for EVC compared to LOC. This resulted in 3000 and 1500 voxels with the highest T-statistic within the cortical ribbon of EVC and LOC, respectively. We then assigned each voxel to one of the three layers and selected those forming a complete column. For example, a voxel with column index 5 in the deep layer was only included if the corresponding voxels with column index 5 in the middle and superficial layers were also included.

EEG data – paradigm, acquisition and analysis

We used a subset of the EEG data (N = 32) collected by Xie et al., 2024⁵ for the same images as in the fMRI experiment. Below is a summary of the relevant experimental procedures and acquisition steps.

The EEG experiment employed and backward masking paradigm consisting of 2,544 trials. In each trial, an object image was briefly presented for 17 ms and followed by a dynamic mask under two conditions: an early mask condition with inter-stimulus interval (ISI) of 17 ms or a late mask condition with an ISI of 600 ms. Object images and masks were randomly paired on each trial. All stimuli were centrally displayed on a gray background, with a size of 5° visual angle, and overlaid with a bull’s-eye fixation symbol. Participants were instructed to maintain fixation and refrain from blinking during trials, except during designated task trials (i.e., two-alternative forced-choice task, 25% of total trials) where blinking was allowed after making a response. The inter-trial interval (ITI) ranged from 900 – 1,100 ms; following the task trials, the ITI was extended to 2,000 ms to reduce motor artifacts.

For the present analyses, we focused on data from the late mask condition (ISI = 600 ms), yielding approximately 53 trials per object image. We analyzed the time window from 200 ms pre-stimulus to 600 ms post-stimulus, before mask onset.

EEG data were recorded with a 64-electrode ActiCap system and a Brainvision actiChamp amplifier. Electrodes were positioned according to the international 10-10 system, with a ground electrode and a reference electrode placed on the scalp. The signals were sampled at 1,000 Hz and online filtered between 0.03 and 100 Hz. EEG data were preprocessed using Brainstorm-3¹²⁰. Noisy channels (mean = 2.2, SD = 1.8) were removed, and the data were low-pass filtered at 40 Hz. Independent component analysis was applied to remove eye movement and other artifact components (mean = 2.7, SD = 0.9). Data were then segmented into epochs from −200 ms to 600 ms relative to stimulus onset, baseline-corrected, and multivariate noise normalization¹²¹ was applied for decoding analyses.

Multivariate analysis was performed on a participant-specific basis using SVMs as implemented in the LIBSVM toolbox in MATLAB (2021a). To determine when the brain processes object information, time-resolved decoding analysis was conducted from −200 ms to 600 ms relative to target image onset in 10 ms intervals. At each time point, trial-specific EEG channel activations were extracted and arranged into 64-dimensional pattern vectors for each of the 24 object image conditions. For each condition, trials were randomly grouped into four equally sized bins and averaged to create four pseudo-trials, repeated over 100 permutations. Pseudo-trials were then divided into a training set (three pseudo-trials) and a testing set (one pseudo-trial) for pairwise object identity decoding. This process was repeated for all pairwise combinations of object conditions.

Multivariate pattern analysis of fMRI data

To decode object information from voxel activation patterns in EVC and LOC, we used SVMs as implemented in the scikit-learn library in Python. Analyses were conducted separately for each participant and for EVC and LOC at the macroscale. We aggregated run-specific voxel-wise beta estimates derived from the GLM into pattern vectors for each of the 24 conditions. To enhance the signal to noise ratio, we averaged beta weights across two randomly selected subsets of trials into two pseudo-trials for each condition. We repeated this process 300 times, each time performing pairwise object decoding, for all object condition combinations, using a two-fold cross-validation approach. We averaged the results across all iterations and all object condition combinations to yield one grand-average decoding accuracy for each ROI and participant.

Representational similarity analysis

To relate object representations across different signal spaces (i.e., voxel activation patterns in fMRI, sensor activation patterns in EEG, embeddings in DNNs), we used representational similarity analysis²⁵. This approach is based on the rationale that if two images elicit similar neural representations, their corresponding fMRI, EEG or DNN signal patterns should also be similar. We used two variants of RSA: (i) representational similarity analysis-based fusion^25,27,28, relating spatially localized object representations (from fMRI) to specific temporal dynamics (from EEG) to resolve the spatiotemporal dynamics with which visual representations emerge, and (ii) representational similarity analysis-based commonality analysis^122,123 to determine the visual feature complexity (from layer-specific DNN embeddings) of spatiotemporally identified dynamics. For this we calculated the common variance between spatially localized object representations (from fMRI), specific temporal dynamics (from EEG), and layer embeddings (from DNNs). We detail both approaches below after describing the specifics of summarizing representational similarity for each signal space in representational dissimilarity matrices (RDMs).

Construction of fMRI representational dissimilarity matrices

To construct the fMRI-RDMs, we first extracted and vectorized beta weight estimates for each of the 24 conditions to form fMRI neural patterns for a given ROI (area or cortical layer). To account for voxel-wise differences in noise and signal variance across layers, we applied multivariate noise normalization^124,125. To quantify pattern relationship while mitigating scaling effects introduced by draining veins⁴⁰, we used Pearson’s correlation as a scale-invariant measure for all pairwise combinations of experimental conditions. Output correlations were transformed into dissimilarity values using 1 – Pearson’s correlation, and organized into a 24×24 fMRI representational dissimilarity matrix (RDM), indexed in rows and columns by the compared conditions. For each participant, this yielded two fMRI-RDMs at the macroscale level of brain areas: the EVC RDM and LOC RDM; and six layer-specific fMRI-RDMs at the mesoscale level of cortical layers: Deep EVC RDM, Middle EVC RDM, Superficial EVC RDM, and Deep LOC RDM, Middle LOC RDM and Superficial LOC RDM. To contextualize the model performance with respect to noise in the fMRI measurement, we computed the noise ceiling (Suppl. Fig. 4D, E and F). Given the higher noise levels in fMRI, particularly in layer-resolved measurements, compared to EEG, we followed previous EEG-fMRI fusion studies^5,26,30 and averaged the fMRI-RDMs across participants to optimize signal quality.

Construction of EEG representational dissimilarity matrices

A detailed description of the creation of the EEG-RDMs is provided by Xie et. al., (2025)⁵. Briefly, for each participant and each time point in the epoch from −200 to 600 ms, we pairwise decoded object information using EEG channel activation patterns, via SVMs as implemented in the LIBSVM toolbox¹²⁶ in MATLAB (2021a). The decoding accuracies for each pair of object images were assembled into 24×24 EEG-RDM for each time point (801 in total), with rows and columns indexed by the conditions. Each matrix was symmetric across the diagonal, with diagonal entries left undefined.

Construction of DNN representational dissimilarity matrices

To compute the DNN-RDMs, we used the Vision Transformer architecture³⁵ from the DeiT family³⁶, specifically the deit_small_patch16_224 model trained on the ImageNet dataset¹²⁷ for visual object categorization. In detail, we presented the same set of 24 object images to the pretrained network and extracted the activation patterns for each object condition from the final output embeddings of the first, seventh, and twelfth transformer encoder blocks, as well as from the final classification head. Next, we quantified the pairwise pattern dissimilarity using 1 – Pearson’s correlation for all pairs of image combinations and organized the outputs into 24×24 DNN-RDMs, with rows and columns representing Pearson’s based dissimilarity measure between object conditions. This resulted in a total of 7 layer-specific DNN RDMs.

Representational similarity analysis-based EEG-fMRI fusion

We used representational similarity analysis-based fusion^25,27,28, relating fMRI-RDMs to EEG-RDMs using Spearman rank order correlation. We applied this analysis at the macroscale and at the mesoscale, for each participant separately. At the macroscale, we related region-specific RDMs (i.e., EEG, LOC) to EEG-RDMs, yielding one time course for each ROI for each participant. At the mesoscale, to control for non-specific macrovascular responses^40,41, we performed EEG-fMRI fusion analyses in which, for each cortical layer, we partialed out the influence of the underlying layers using partial Spearman rank order correlation. Specifically, for the superficial layer, we partialed out the effect of the middle layer. Similarly, for the middle layer, we partialed out the effect of the deep layer. The deep layer remained unaffected by this approach. By relating region- and layers-specific ROIs to EEG-RDMs, we obtained one time course for each of the six ROI-layer combinations for each participant.

Representational similarity analysis-based commonality analysis

To characterize the level of feature complexity of the neural representations identified above in space and time, we related them to DNN embeddings across model layers. Feature complexity increases progressively across DNN layers: early model layers are more sensitive to low complexity features, whereas later layers are tuned to high-complexity features^128,129, analogous to the ventral visual hierarchy in humans and non-human primates^34,38. Thus, by linking the neural representations to those from the DNN layers, we can determine the feature complexity of spatially localized object representations to particular time points. We did this using representational similarity analysis-based commonality analysis¹¹⁰. In detail, we calculated the commonality coefficient corresponding to the shared variance among fMRI-RDMs for each brain region and cortical layer, EEG-RDMs at time points with significant effects in EEG-fMRI fusion (averaged across time), and DNN RDMs for each model layer.

Quantification and statistical analysis

To assess statistical significance, we performed non-parametric statistical analyses that do not rely on assumptions about the data distribution. The empirical estimations, including decoding accuracy as well as correlation values from the representational similarity analysis-based fusion and from the commonality analysis, were tested against a null distribution created by sign permutation (1,000 permutations). We randomly multiplied the participant-specific data by ±1 to generate permutation samples, recomputed the statistic for each sample, and derived P values. All permutation tests were one-sided (right-tailed), consistent with the directional hypothesis that the observed effects were greater than zero.

We controlled the familywise error rate across time points using cluster-based statistics. First, P-value maps were thresholded at P < 0.05 to define temporally contiguous suprathreshold clusters. These clusters were then used to construct an empirical null distribution of maximum cluster weights, and a corrected threshold was determined at the 95th percentile of the right tail of this distribution. For the commonality analyses we corrected P values for multiple comparisons by FDR-correction.

We estimated the 95% confidence intervals for the peak latencies in the RSA-derived time courses using bootstrapping (1,000 paired bootstrap resamples with replacement). For each bootstrap we calculated the statistic, resulting in a bootstrap estimate of peak latencies from which we derived the confidence intervals.

To estimate confidence intervals for peak latency differences, we used an analogous bootstrapping approach, resampling the mean peak-to-peak latency difference for each resample. This generated a distribution of mean differences, from which we derived the 95% confidence interval. We set P < 0.05, i.e., if the 95% confidence interval did not include 0, we rejected the null hypothesis of no peak-to-peak latency differences.

Supplementary information

Macroscale representational EEG-fMRI fusion and commonality analysis using partial correlations, and laminar profiles of the GE-BOLD response
(A) Spatiotemporal neural dynamics at the macroscale level. For EVC we partialed out the effect of LOC and for LOC we partialed out the effect of EVC. Early representations correlated more strongly with EVC than with LOC and later representations correlated more strongly with LOC than with EVC. (B) Difference between EVC and LOC curves in (A). (C) Macroscale commonality analysis for time intervals showing significant EVC–LOC differences. Visual representations of low-to-mid-complexity emerge primarily in EVC, while mid-to-high-level object representations emerge in LOC. (D) Laminar profiles of the GE-BOLD response to objects in EVC and LOC. Laminar responses derived from GE-BOLD signals are strongly affected by non-specific macrovascular signals, leading to higher activation in superficial layers. Data were averaged across 24 object conditions and 10 fMRI participants. Shaded area represents the standard error of the mean across EEG participants. Shaded regions denote the standard error of the mean across participants; colored circles indicate significant time points (N = 32, cluster-defining threshold *P <* 0.05, cluster threshold *P <* 0.05); uncolored circles and horizontal lines indicate peak latency means and 95% confidence intervals, respectively. Colored asterisks indicate significant correlations (N = 32, right-tailed permutation tests, FDR-corrected; **P <* 0.05; ***P <* 0.01; ****P <* 0.001); colored triangles represent model layers with the highest occurrence proportion, determined through 1,000-iteration bootstraps.

Mesoscale representational EEG–fMRI fusion and commonality analyses under alternative experimental choices
(A–D) Representational EEG–fMRI fusion without partial correlations. (A) Layer-specific spatiotemporal neural dynamics in early visual cortex (EVC). (B) Differences between EVC layer-specific curves shown in (A). (C) Layer-specific spatiotemporal neural dynamics in LOC. (D) Differences between LOC layer-specific curves shown in (C). (E–H) Representational EEG–fMRI fusion and commonality analyses with matched voxel counts across cortical layers. For each layer, voxels were randomly subsampled to match the layer with the fewest voxels; this procedure was repeated ten times and results were averaged across permutations. (E) Layer-specific spatiotemporal neural dynamics in EVC. (F) Layer-specific spatiotemporal neural dynamics in LOC. (G,H) Representational format (≈ feature complexity) across cortical layers in EVC and LOC, respectively. (I,J) Representational format (≈ feature complexity) across cortical layers in EVC and LOC using AlexNet, respectively. (K,L) Representational format (≈ feature complexity) across cortical layers in EVC and LOC for time intervals showing significant between-layer differences. Two temporal clusters (labeled 1 and 2) were defined from these intervals. For each cluster, commonality analysis linked model layer-specific DNN-RDMs to layer-specific fMRI-RDMs and the mean EEG-RDM within the corresponding time window in EVC (K) and LOC (L). Shaded regions and error bars indicate the standard error of the mean across participants. Colored circles denote significant time points (N = 32; cluster-defining threshold P < 0.05; cluster threshold P < 0.05); uncolored circles and horizontal lines indicate peak latency means and 95% confidence intervals, respectively. Colored asterisks indicate significant correlations (right-tailed permutation tests, FDR-corrected; *P < 0.05; **P < 0.01; ***P < 0.001). Colored triangles represent model layers with the highest occurrence proportion, determined through 1,000-iteration bootstraps.

Mesoscale commonality analysis across individual time points
Representational format (≈ feature complexity) in deep layer (A, D), middle (B, E), and superficial (C, F) layers in EVC (**A–C**) and LOC (**D–F**). Shaded regions denote the standard error of the mean across participants. Colored circles indicate significant time points (N = 32, cluster-defining threshold *P <* 0.05, cluster threshold *P <* 0.05).

Reliability and noise ceiling of the fMRI data at macroscale and mesoscale
RDM reliability at the macroscale across brain regions (A) and at the mesoscale across cortical layers in EVC (B) and LOC (C). Noise ceiling estimates at the macroscale (D) and mesoscale in EVC (E) and LOC (F). Error bars indicate the standard error of the mean across participants (N = 10).

Data availability

The study generated a new laminar fMRI dataset. The raw data as well as the analysis code and scripts used to generate the results and figures are not publicly available at the time of submission.

Acknowledgements

This work was funded by the German Research Foundation (DFG, CI241/1-1, CI241/3-1, INST 272/297-1 to R.M.C.), by the European Research Council (ERC) starting grant (ERC-2018-STG 803370 to R.M.C.), supported by the NIH Intramural Program of NIMH/NINDS (#ZIC MH002884 to R. H.) and by the Einstein Center for Neuroscience (to T.C.). Computing resources were provided by the high-performance computing facilities at ZEDAT, Freie Universität Berlin, Germany.

Additional information

Funding

Deutsche Forschungsgemeinschaft (DFG)

Radoslaw Martin Cichy

EC | European Research Council (ERC)

Radoslaw Martin Cichy

Einstein Center for Neurosciences Berlin (ECN)

Tony Carricarte

References

1.
1. Mishkin M.
2. Ungerleider L. G.
3. Macko K. A.
1983Object vision and spatial vision: two cortical pathwaysTrends in Neurosciences 6:414–417https://doi.org/10.1016/0166-2236(83)90190-x Google Scholar
2.
1. Kravitz D. J.
2. Saleem K. S.
3. Baker C. I.
4. Ungerleider L. G.
5. Mishkin M.
2013The ventral visual pathway: an expanded neural framework for the processing of object qualityTrends in Cognitive Sciences 17:26–49https://doi.org/10.1016/j.tics.2012.10.011 PubMed Google Scholar
3.
1. Grill-Spector K.
2. Kourtzi Z.
3. Kanwisher N.
2001The lateral occipital complex and its role in object recognitionVision Research 41:1409–1422https://doi.org/10.1016/s0042-6989(01)00073-6 PubMed Google Scholar
4.
1. Lamme V. A.
2. Supèr H.
3. Spekreijse H.
1998Feedforward, horizontal, and feedback processing in the visual cortexCurrent Opinion in Neurobiology 8:529–535https://doi.org/10.1016/s0959-4388(98)80042-1 PubMed Google Scholar
5.
1. Xie S.
2. Singer J.
3. Yilmaz B.
4. Kaiser D.
5. Cichy R. M.
2025Recurrence affects the geometry of visual representations across the ventral visual stream in the human brainPLOS Biology 23:e3003354https://doi.org/10.1371/journal.pbio.3003354 PubMed Google Scholar
6.
1. Lamme V. A. F.
2. Zipser K.
3. Spekreijse H.
1998Figure-ground activity in primary visual cortex is suppressed by anesthesiaProceedings of the National Academy of Sciences 95:3263–3268https://doi.org/10.1073/pnas.95.6.3263 PubMed Google Scholar
7.
1. Markov N. T.
2. et al.
2014Anatomy of hierarchy: Feedforward and feedback pathways in macaque visual cortexJournal of Comparative Neurology 522:225–259https://doi.org/10.1002/cne.23458 PubMed Google Scholar
8.
1. Felleman D.
2. Van Essen D.
1991Distributed hierarchical processing in the primate cerebral cortexCereb Cortex 1:1–47https://doi.org/10.1093/cercor/1.1.1-a PubMed Google Scholar
9.
1. Kreiman G.
2. Serre T.
2020Beyond the feedforward sweep: feedback computations in the visual cortexAnnals of the New York Academy of Sciences 1464:222–241https://doi.org/10.1111/nyas.14320 PubMed Google Scholar
10.
1. Hüer J.
2. Saxena P.
3. Treue S.
2024Pathway-selective optogenetics reveals the functional anatomy of top–down attentional modulation in the macaque visual cortexProc. Natl. Acad. Sci. U.S.A 121:e2304511121https://doi.org/10.1073/pnas.2304511121 PubMed Google Scholar
11.
1. Nurminen L.
2. Merlin S.
3. Bijanzadeh M.
4. Federer F.
5. Angelucci A.
2018Top-down feedback controls spatial summation and response amplitude in primate visual cortexNature Communications 9:2281https://doi.org/10.1038/s41467-018-04500-5 PubMed Google Scholar
12.
1. Bergmann J.
2. et al.
2024Cortical depth profiles in primary visual cortex for illusory and imaginary experiencesNature Communications 15:1002https://doi.org/10.1038/s41467-024-45065-w PubMed Google Scholar
13.
1. Lawrence S. J.
2. Norris D. G.
3. de Lange F. P.
2019Dissociable laminar profiles of concurrent bottom-up and top-down modulation in the human visual cortexeLife 8:e44422https://doi.org/10.7554/eLife.44422 PubMed Google Scholar
14.
1. Aitken F.
2. et al.
2020Prior expectations evoke stimulus-specific activity in the deep layers of the primary visual cortexPLOS Biology 18:e3001023https://doi.org/10.1371/journal.pbio.3001023 PubMed Google Scholar
15.
1. Thomas E. R.
2. et al.
2024Predictions and errors are distinctly represented across V1 layersCurrent Biology 34:2265–2271https://doi.org/10.1016/j.cub.2024.04.036 PubMed Google Scholar
16.
1. Carricarte T.
2. et al.
2024Laminar dissociation of feedforward and feedback in high-level ventral visual cortex during imagery and perceptioniScience 27:110229https://doi.org/10.1016/j.isci.2024.110229 PubMed Google Scholar
17.
1. Dowdle L.
2. et al.
2023Characterizing top-down microcircuitry of complex human behavior across different levels of the visual hierarchybioRxiv https://doi.org/10.1101/2022.12.03.518973 Google Scholar
18.
1. Liu C.
2. et al.
2021Layer-dependent multiplicative effects of spatial attention on contrast responses in human early visual cortexProgress in Neurobiology 207:101897https://doi.org/10.1016/j.pneurobio.2020.101897 PubMed Google Scholar
19.
1. Rockland K. S.
2. Pandya D. N.
1979Laminar origins and terminations of cortical connections of the occipital lobe in the rhesus monkeyBrain Research 179:3–20https://doi.org/10.1016/0006-8993(79)90485-2 PubMed Google Scholar
20.
1. Rockland K. S.
2019What do we know about laminar connectivity?NeuroImage 197:772–784https://doi.org/10.1016/j.neuroimage.2017.07.032 PubMed Google Scholar
21.
1. Siu C.
2. Balsor J.
3. Merlin S.
4. Federer F.
5. Angelucci A.
2021A direct interareal feedback-to-feedforward circuit in primate visual cortexNat Commun 12:4911https://doi.org/10.1038/s41467-021-24928-6 PubMed Google Scholar
22.
1. Federer F.
2. Ta’afua S.
3. Merlin S.
4. Hassanpour M. S.
5. Angelucci A.
2021Stream-specific feedback inputs to the primate primary visual cortexNature Communications 12:228https://doi.org/10.1038/s41467-020-20505-5 PubMed Google Scholar
23.
1. Hubel D. H.
2. Wiesel T. N.
1959Receptive fields of single neurones in the cat’s striate cortexJ Physiol 148:574–591https://doi.org/10.1113/jphysiol.1959.sp006308 PubMed Google Scholar
24.
1. Grill-Spector K.
2. et al.
1999Differential Processing of Objects under Various Viewing Conditions in the Human Lateral Occipital ComplexNeuron 24:187–203https://doi.org/10.1016/s0896-6273(00)80832-6 PubMed Google Scholar
25.
1. Kriegeskorte N.
2. Mur M.
3. Bandettini P. A.
2008Representational similarity analysis - connecting the branches of systems neuroscienceFrontiers in Systems Neuroscience 2https://doi.org/10.3389/neuro.06.004.2008 PubMed Google Scholar
26.
1. Cichy R. M.
2. Pantazis D.
3. Oliva A.
2014Resolving human object recognition in space and timeNature Neuroscience 17:455–462https://doi.org/10.1038/nn.3635 PubMed Google Scholar
27.
1. Cichy R. M.
2. Oliva A.
2020A M/EEG-fMRI Fusion Primer: Resolving Human Brain Responses in Space and TimeNeuron 107:772–781https://doi.org/10.1016/j.neuron.2020.07.001 PubMed Google Scholar
28.
1. Cichy R. M.
2. Pantazis D.
3. Oliva A.
2016Similarity-Based Fusion of MEG and fMRI Reveals Spatio-Temporal Dynamics in Human Cortex During Visual Object RecognitionCerebral Cortex 26:3563–3579https://doi.org/10.1093/cercor/bhw135 PubMed Google Scholar
29.
1. Essen D. C. V.
2. Maunsell J. H. R.
1983Hierarchical organization and functional streams in the visual cortexTrends in Neurosciences 6:370–375https://doi.org/10.1016/0166-2236(83)90167-4 Google Scholar
30.
1. Singer J. J. D.
2. Karapetian A.
3. Hebart M. N.
4. Cichy R. M.
2025Identifying and characterizing scene representations relevant for categorization behaviorImaging Neuroscience 3:imag_a_00449https://doi.org/10.1162/imag_a_00449 PubMed Google Scholar
31.
1. DiCarlo J. J.
2. Zoccolan D.
3. Rust N. C.
2012How Does the Brain Solve Visual Object Recognition?Neuron 73:415–434https://doi.org/10.1016/j.neuron.2012.01.010 PubMed Google Scholar
32.
1. Kourtzi Z.
2. Connor C. E.
2011Neural Representations for Object Perception: Structure, Category, and Adaptive CodingAnnual Review of Neuroscience 34:45–67https://doi.org/10.1146/annurev-neuro-060909-153218 PubMed Google Scholar
33.
1. Muukkonen I.
2. Ölander K.
3. Numminen J.
4. Salmela V. R.
2020Spatio-temporal dynamics of face perceptionNeuroImage 209:116531https://doi.org/10.1016/j.neuroimage.2020.116531 PubMed Google Scholar
34.
1. Cichy R. M.
2. Khosla A.
3. Pantazis D.
4. Torralba A.
5. Oliva A.
2016Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondenceScientific Reports 6:27755https://doi.org/10.1038/srep27755 PubMed Google Scholar
35.
1. Dosovitskiy A.
2020An image is worth 16×16 words: Transformers for image recognition at scalearXiv https://doi.org/10.48550/arXiv.2010.11929 Google Scholar
36.
1. Touvron H.
2. et al.
2021Training data-efficient image transformers & distillation through attentionIn: Proceedings of the 38th International Conference on Machine Learning pp. 10347–10357Google Scholar
37.
1. Krizhevsky A.
2. Sutskever I.
3. Hinton G. E.
2012ImageNet Classification with Deep Convolutional Neural NetworksIn: Advances in Neural Information Processing Systems Google Scholar
38.
1. Güçlü U.
2. van Gerven M. A. J.
2015Deep Neural Networks Reveal a Gradient in the Complexity of Neural Representations across the Ventral StreamJ. Neurosci 35:10005https://doi.org/10.1523/jneurosci.5023-14.2015 PubMed Google Scholar
39.
1. Waehnert M. D.
2. et al.
2014Anatomically motivated modeling of cortical laminaeNeuroImage 93:210–220https://doi.org/10.1016/j.neuroimage.2013.03.078 PubMed Google Scholar
40.
1. Barth M.
2. Norris D. G.
2007Very high-resolution three-dimensional functional MRI of the human visual cortex with elimination of large venous vesselsNMR in Biomedicine 20:477–484https://doi.org/10.1002/nbm.1158 PubMed Google Scholar
41.
1. Markuerkiaga I.
2. Barth M.
3. Norris D. G.
2016A cortical vascular model for examining the specificity of the laminar BOLD signalNeuroImage 132:491–498https://doi.org/10.1016/j.neuroimage.2016.02.073 PubMed Google Scholar
42.
1. Grossberg S.
2001Linking the laminar circuits of visual cortex to visual perception: Development, grouping, and attentionNeuroscience & Biobehavioral Reviews 25:513–526https://doi.org/10.1016/s0149-7634(01)00030-6 PubMed Google Scholar
43.
1. van Kerkoerle T.
2. Self M. W.
3. Roelfsema P. R.
2017Layer-specificity in the effects of attention and working memory on activity in primary visual cortexNature Communications 8:13804https://doi.org/10.1038/ncomms13804 PubMed Google Scholar
44.
1. Self M. W.
2. van Kerkoerle T.
3. Supèr H.
4. Roelfsema P. R.
2013Distinct Roles of the Cortical Layers of Area V1 in Figure-Ground SegregationCurrent Biology 23:2121–2129https://doi.org/10.1016/j.cub.2013.09.013 PubMed Google Scholar
45.
1. Barzegaran E.
2. Plomp G.
2022Four concurrent feedforward and feedback networks with different roles in the visual cortical hierarchyPLOS Biology 20:e3001534https://doi.org/10.1371/journal.pbio.3001534 PubMed Google Scholar
46.
1. Smith M. A.
2. Jia X.
3. Zandvakili A.
4. Kohn A.
2013Laminar dependence of neuronal correlations in visual cortexJournal of Neurophysiology 109:940–947https://doi.org/10.1152/jn.00846.2012 PubMed Google Scholar
47.
1. Polimeni J. R.
2. Fischl B.
3. Greve D. N.
4. Wald L. L.
2010Laminar analysis of 7T BOLD using an imposed spatial activation pattern in human V1NeuroImage 52:1334–1346https://doi.org/10.1016/j.neuroimage.2010.05.005 PubMed Google Scholar
48.
1. Hubel D. H.
2. Wiesel T. N.
1968Receptive fields and functional architecture of monkey striate cortexThe Journal of Physiology 195:215–243https://doi.org/10.1113/jphysiol.1968.sp008455 PubMed Google Scholar
49.
1. Wiesel T. N.
2. Hubel D. H.
1966Spatial and chromatic interactions in the lateral geniculate body of the rhesus monkeyJournal of Neurophysiology 29:1115–1156https://doi.org/10.1152/jn.1966.29.6.1115 PubMed Google Scholar
50.
1. Kar K.
2. DiCarlo J. J.
2021Fast Recurrent Processing via Ventrolateral Prefrontal Cortex Is Needed by the Primate Ventral Stream for Robust Core Visual Object RecognitionNeuron 109:164–176https://doi.org/10.1016/j.neuron.2020.09.035 PubMed Google Scholar
51.
1. Schwiedrzik C. M.
2. Freiwald W. A.
2017High-Level Prediction Signals in a Low-Level Area of the Macaque Face-Processing HierarchyNeuron 96:89–97https://doi.org/10.1016/j.neuron.2017.09.007 PubMed Google Scholar
52.
1. Stecher R.
2. Kaiser D.
2024Representations of imaginary scenes and their properties in cortical alpha activityScientific Reports 14:12796https://doi.org/10.1038/s41598-024-63320-4 PubMed Google Scholar
53.
1. Morgan A. T.
2. Petro L. S.
3. Muckli L.
2019Scene Representations Conveyed by Cortical Feedback to Early Visual Cortex Can Be Described by Line DrawingsJ. Neurosci 39:9410https://doi.org/10.1523/jneurosci.0852-19.2019 PubMed Google Scholar
54.
1. Papale P.
2. et al.
2023The representation of occluded image regions in area V1 of monkeys and humansCurrent Biology 33:3865–3871https://doi.org/10.1016/j.cub.2023.08.010 PubMed Google Scholar
55.
1. Del Rosario J.
2. et al.
2025Lateral inhibition in V1 controls neural and perceptual contrast sensitivityNature Neuroscience :836–847https://doi.org/10.1038/s41593-025-01888-4 PubMed Google Scholar
56.
1. Gilbert C. D.
2. Wiesel T. N.
1979Morphology and intracortical projections of functionally characterised neurones in the cat visual cortexNature 280:120–125https://doi.org/10.1038/280120a0 PubMed Google Scholar
57.
1. Rockland K. S.
2. Lund J. S.
1982Widespread Periodic Intrinsic Connections in the Tree Shrew Visual CortexScience 215:1532–1534https://doi.org/10.1126/science.7063863 PubMed Google Scholar
58.
1. Gilbert C.
2. Wiesel T.
1983Clustered intrinsic connections in cat visual cortexJ. Neurosci 3:1116https://doi.org/10.1523/jneurosci.03-05-01116.1983 PubMed Google Scholar
59.
1. Stettler D. D.
2. Das A.
3. Bennett J.
4. Gilbert C. D.
2002Lateral Connectivity and Contextual Interactions in Macaque Primary Visual CortexNeuron 36:739–750https://doi.org/10.1016/s0896-6273(02)01029-2 PubMed Google Scholar
60.
1. Grootswagers T.
2. Robinson A. K.
3. Shatek S. M.
4. Carlson T. A.
2024Mapping the dynamics of visual feature coding: Insights into perception and integrationPLOS Computational Biology 20:e1011760https://doi.org/10.1371/journal.pcbi.1011760 PubMed Google Scholar
61.
1. Gallimore C. G.
2. Ricci D. A.
3. Hamm J. P.
2023Spatiotemporal dynamics across visual cortical laminae support a predictive coding framework for interpreting mismatch responsesCerebral Cortex 33:9417–9428https://doi.org/10.1093/cercor/bhad215 PubMed Google Scholar
62.
1. Dobs K.
2. Isik L.
3. Pantazis D.
4. Kanwisher N.
2019How face perception unfolds over timeNature Communications 10:1258https://doi.org/10.1038/s41467-019-09239-1 PubMed Google Scholar
63.
1. Grill-Spector K.
2. Malach R.
2004The human visual cortexAnnual Review of Neuroscience 27:649–677https://doi.org/10.1146/annurev.neuro.27.070203.144220 PubMed Google Scholar
64.
1. Grill-Spector K.
2. et al.
1998A sequence of object-processing stages revealed by fMRI in the human occipital lobeHuman Brain Mapping 6:316–328https://doi.org/10.1002/(sici)1097-0193(1998)6:4<316::aid-hbm9>3.0.co;2-6 PubMed Google Scholar
65.
1. Callaway E. M.
2004Feedforward, feedback and inhibitory connections in primate visual cortexNeural Networks 17:625–632https://doi.org/10.1016/j.neunet.2004.04.004 PubMed Google Scholar
66.
1. Schmolesky M. T.
2. et al.
1998Signal Timing Across the Macaque Visual SystemJournal of Neurophysiology 79:3272–3278https://doi.org/10.1152/jn.1998.79.6.3272 PubMed Google Scholar
67.
1. Hupé J. M.
2. et al.
1998Cortical feedback improves discrimination between figure and background by V1, V2 and V3 neuronsNature 394:784–787https://doi.org/10.1038/29537 PubMed Google Scholar
68.
1. Hupé J.-M.
2. et al.
2001Feedback Connections Act on the Early Part of the Responses in Monkey Visual CortexJournal of Neurophysiology 85:134–145https://doi.org/10.1152/jn.2001.85.1.134 PubMed Google Scholar
69.
1. Vetter P.
2. Smith F. W.
3. Muckli L.
2014Decoding Sound and Imagery Content in Early Visual CortexCurrent Biology 24:1256–1262https://doi.org/10.1016/j.cub.2014.04.020 PubMed Google Scholar
70.
1. Goense J.
2. Merkle H.
3. Logothetis N. K.
2012High-Resolution fMRI Reveals Laminar Differences in Neurovascular Coupling between Positive and Negative BOLD ResponsesNeuron 76:629–639https://doi.org/10.1016/j.neuron.2012.09.019 PubMed Google Scholar
71.
1. Rao R. P. N.
2. Ballard D. H.
1999Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effectsNat Neurosci 2:79–87https://doi.org/10.1038/4580 PubMed Google Scholar
72.
1. Scheeringa R.
2. Fries P.
2019Cortical layers, rhythms and BOLD signalsNeuroImage 197:689–698https://doi.org/10.1016/j.neuroimage.2017.11.002 PubMed Google Scholar
73.
1. Haarsma J.
2. Deveci N.
3. Corbin N.
4. Callaghan M. F.
5. Kok P.
2023Expectation Cues and False Percepts Generate Stimulus-Specific Activity in Distinct Layers of the Early Visual CortexJournal of Neuroscience 43:7946–7957https://doi.org/10.1523/jneurosci.0998-23.2023 PubMed Google Scholar
74.
1. Haarsma J.
2. Hetenyi D.
3. Kok P.
2025Shared and Diverging Neural Dynamics Underlying False and Veridical PerceptionJ. Neurosci 45https://doi.org/10.1523/jneurosci.1479-24.2025 PubMed Google Scholar
75.
1. Larkum M. E.
2. Zhu J. J.
3. Sakmann B.
1999A new cellular mechanism for coupling inputs arriving at different cortical layersNature 398:338–341https://doi.org/10.1038/18686 PubMed Google Scholar
76.
1. Aru J.
2. Suzuki M.
3. Larkum M. E.
2020Cellular Mechanisms of Conscious ProcessingTrends in Cognitive Sciences 24:814–825https://doi.org/10.1016/j.tics.2020.07.006 PubMed Google Scholar
77.
1. Schuman B.
2. Dellal S.
3. Prönneke A.
4. Machold R.
5. Rudy B.
2021Neocortical Layer 1: An Elegant Solution to Top-Down and Bottom-Up IntegrationAnnual Review of Neuroscience 44:221–252https://doi.org/10.1146/annurev-neuro-100520-012117 PubMed Google Scholar
78.
1. Chen L.
2. Cichy R. M.
3. Kaiser D.
2023Alpha-frequency feedback to early visual cortex orchestrates coherent naturalistic visionScience Advances 9:eadi2321https://doi.org/10.1126/sciadv.adi2321 PubMed Google Scholar
79.
1. Xie S.
2. Kaiser D.
3. Cichy R. M.
2020Visual Imagery and Perception Share Neural Representations in the Alpha Frequency BandCurrent Biology 30:2621–2627https://doi.org/10.1016/j.cub.2020.04.074 PubMed Google Scholar
80.
1. Bastos A. M.
2. et al.
2015Visual Areas Exert Feedforward and Feedback Influences through Distinct Frequency ChannelsNeuron 85:390–401https://doi.org/10.1016/j.neuron.2014.12.018 PubMed Google Scholar
81.
1. van Kerkoerle T.
2. et al.
2014Alpha and gamma oscillations characterize feedback and feedforward processing in monkey visual cortexProceedings of the National Academy of Sciences 111:14332–14341https://doi.org/10.1073/pnas.1402773111 PubMed Google Scholar
82.
1. Michalareas G.
2. et al.
2016Alpha-Beta and Gamma Rhythms Subserve Feedback and Feedforward Influences among Human Visual Cortical AreasNeuron 89:384–397https://doi.org/10.1016/j.neuron.2015.12.018 PubMed Google Scholar
83.
1. Muckli L.
2. et al.
2015Contextual Feedback to Superficial Layers of V1Current Biology 25:2690–2695https://doi.org/10.1016/j.cub.2015.08.057 PubMed Google Scholar
84.
1. Haynes J.-D.
2. Rees G.
2006Decoding mental states from brain activity in humansNature Reviews Neuroscience 7:523–534https://doi.org/10.1038/nrn1931 PubMed Google Scholar
85.
1. Kriegeskorte N.
2. Kievit R. A.
2013Representational geometry: integrating cognition, computation, and the brainTrends in Cognitive Sciences 17:401–412https://doi.org/10.1016/j.tics.2013.06.007 PubMed Google Scholar
86.
1. Mur M.
2. Bandettini P. A.
3. Kriegeskorte N.
2009Revealing representational content with pattern-information fMRI—an introductory guideSocial Cognitive and Affective Neuroscience 4:101–109https://doi.org/10.1093/scan/nsn044 PubMed Google Scholar
87.
1. Henriksson L.
2. Mur M.
3. Kriegeskorte N.
2019Rapid Invariant Encoding of Scene Layout in Human OPANeuron 103:161–171https://doi.org/10.1016/j.neuron.2019.04.014 PubMed Google Scholar
88.
1. Mohsenzadeh Y.
2. Qin S.
3. Cichy R. M.
4. Pantazis D.
2018Ultra-Rapid serial visual presentation reveals dynamics of feedforward and feedback processes in the ventral visual pathwayeLife 7:e36329https://doi.org/10.7554/eLife.36329 PubMed Google Scholar
89.
1. Motlagh S. C.
2. Joanisse M.
3. Wang B.
4. Mohsenzadeh Y.
2024Unveiling the neural dynamics of conscious perception in rapid object recognitionNeuroImage 296:120668https://doi.org/10.1016/j.neuroimage.2024.120668 PubMed Google Scholar
90.
1. Hung C. P.
2. Kreiman G.
3. Poggio T.
4. DiCarlo J. J.
2005Fast readout of object identity from macaque inferior temporal cortexScience 310:863–866https://doi.org/10.1126/science.1117593 PubMed Google Scholar
91.
1. Hopf J.-M.
2. et al.
2000Neural Sources of Focused Attention in Visual SearchCerebral Cortex 10:1233–1241https://doi.org/10.1093/cercor/10.12.1233 PubMed Google Scholar
92.
1. Cul A. D.
2. Baillet S.
3. Dehaene S.
2007Brain Dynamics Underlying the Nonlinear Threshold for Access to ConsciousnessPLOS Biology 5:e260https://doi.org/10.1371/journal.pbio.0050260 PubMed Google Scholar
93.
1. Duan Y.
2. Zhan J.
3. Gross J.
4. Ince R. A. A.
5. Schyns P. G.
2024Pre-frontal cortex guides dimension-reducing transformations in the occipito-ventral pathway for categorization behaviorsCurrent Biology 34:3392–3404https://doi.org/10.1016/j.cub.2024.06.050 PubMed Google Scholar
94.
1. Bar M.
2003A Cortical Mechanism for Triggering Top-Down Facilitation in Visual Object RecognitionJournal of Cognitive Neuroscience 15:600–609https://doi.org/10.1162/089892903321662976 PubMed Google Scholar
95.
1. Schendan H. E.
2. Lucia L. C.
2010Object-sensitive activity reflects earlier perceptual and later cognitive processing of visual objects between 95 and 500msBrain Research 1329:124–141https://doi.org/10.1016/j.brainres.2010.01.062 PubMed Google Scholar
96.
1. Kietzmann T. C.
2. et al.
2019Recurrence is required to capture the representational dynamics of the human visual systemProceedings of the National Academy of Sciences 116:21854–21863https://doi.org/10.1073/pnas.1905544116 PubMed Google Scholar
97.
1. Kar K.
2. Kubilius J.
3. Schmidt K.
4. Issa E. B.
5. DiCarlo J. J.
2019Evidence that recurrent circuits are critical to the ventral stream’s execution of core object recognition behaviorNature Neuroscience 22:974–983https://doi.org/10.1038/s41593-019-0392-5 PubMed Google Scholar
98.
1. von Seth J.
2. Nicholls V. I.
3. Tyler L. K.
4. Clarke A.
2023Recurrent connectivity supports higher-level visual and semantic object representations in the brainCommun Biol 6:1–15https://doi.org/10.1038/s42003-023-05565-9 PubMed Google Scholar
99.
1. Hamm J. P.
2. Shymkiv Y.
3. Han S.
4. Yang W.
5. Yuste R.
2021Cortical ensembles selective for contextProceedings of the National Academy of Sciences 118:e2026179118https://doi.org/10.1073/pnas.2026179118 PubMed Google Scholar
100.
1. Hallett M.
2007Transcranial Magnetic Stimulation: A PrimerNeuron 55:187–199https://doi.org/10.1016/j.neuron.2007.06.026 PubMed Google Scholar
101.
1. Bergmann T. O.
2. et al.
2021Concurrent TMS-fMRI for causal network perturbation and proof of target engagementNeuroImage 237:118093https://doi.org/10.1016/j.neuroimage.2021.118093 PubMed Google Scholar
102.
1. Park S.-H.
2. Masamoto K.
3. Hendrich K.
4. Kanno I.
5. Kim S.-G.
2008Imaging brain vasculature with BOLD 3D microscopy: MR detection limits determined by in vivo two-photon microscopyMagn Reson Med 59:855–865https://doi.org/10.1002/mrm.21573 PubMed Google Scholar
103.
1. Markuerkiaga I.
2. Marques J. P.
3. Gallagher T. E.
4. Norris D. G.
2021Estimation of laminar BOLD activation profiles using deconvolution with a physiological point spread functionJ Neurosci Methods 353:109095https://doi.org/10.1016/j.jneumeth.2021.109095 PubMed Google Scholar
104.
1. Fischl B.
2. Dale A. M.
2000Measuring the thickness of the human cerebral cortex from magnetic resonance imagesProceedings of the National Academy of Sciences 97:11050–11055https://doi.org/10.1073/pnas.200033797 PubMed Google Scholar
105.
1. Scheeringa R.
2. Koopmans P. J.
3. van Mourik T.
4. Jensen O.
5. Norris D. G.
2016The relationship between oscillatory EEG activity and the laminar-specific BOLD signalProceedings of the National Academy of Sciences 113:6761–6766https://doi.org/10.1073/pnas.1522577113 PubMed Google Scholar
106.
1. Huber L.
2. et al.
2017High-Resolution CBV-fMRI Allows Mapping of Laminar Activity and Connectivity of Cortical Input and Output in Human M1Neuron 96:1253–1263https://doi.org/10.1016/j.neuron.2017.11.005 PubMed Google Scholar
107.
1. Huber L.
2. et al.
2018Techniques for blood volume fMRI with VASO: From low-resolution mapping towards sub-millimeter layer-dependent applicationsNeuroImage 164:131–143https://doi.org/10.1016/j.neuroimage.2016.11.039 PubMed Google Scholar
108.
1. Pizzuti A.
2. et al.
2023Imaging the columnar functional organization of human area MT+ to axis-of-motion stimuli using VASO at 7 TeslaCerebral Cortex 33:8693–8711https://doi.org/10.1093/cercor/bhad151 PubMed Google Scholar
109.
1. Kok P.
2. Bains L. J.
3. van Mourik T.
4. Norris D. G.
5. de Lange F. P.
2016Selective Activation of the Deep Layers of the Human Primary Visual Cortex by Top-Down FeedbackCurrent Biology 26:371–376https://doi.org/10.1016/j.cub.2015.12.038 PubMed Google Scholar
110.
1. Hebart M. N.
2. Bankson B. B.
3. Harel A.
4. Baker C. I.
5. Cichy R. M.
2018The representational dynamics of task and object processing in humanseLife 7:e32816https://doi.org/10.7554/eLife.32816 PubMed Google Scholar
111.
1. Moeller S.
2. et al.
2010Multiband multislice GE-EPI at 7 tesla, with 16-fold acceleration using partial parallel imaging with application to high spatial and temporal whole-brain fMRIMagnetic Resonance in Medicine 63:1144–1153https://doi.org/10.1002/mrm.22361 PubMed Google Scholar
112.
1. Marques J. P.
2. et al.
2010MP2RAGE, a self bias-field corrected sequence for improved segmentation and T1-mapping at high fieldNeuroImage 49:1271–1281https://doi.org/10.1016/j.neuroimage.2009.10.002 PubMed Google Scholar
113.
1. Glasser M. F.
2. et al.
2016A multi-modal parcellation of human cerebral cortexNature 536:171–178https://doi.org/10.1038/nature18933 PubMed Google Scholar
114.
1. Julian J. B.
2. Fedorenko E.
3. Webster J.
4. Kanwisher N.
2012An algorithmic method for functionally defining regions of interest in the ventral visual pathwayNeuroImage 60:2357–2364https://doi.org/10.1016/j.neuroimage.2012.02.055 PubMed Google Scholar
115.
1. Yushkevich P. A.
2. et al.
2019User-Guided Segmentation of Multi-modality Medical Imaging Datasets with ITK-SNAPNeuroinformatics 17:83–102https://doi.org/10.1007/s12021-018-9385-x PubMed Google Scholar
116.
1. Lüsebrink F.
2. Sciarra A.
3. Mattern H.
4. Yakupov R.
5. Speck O.
2017T1-weighted in vivo human whole brain MRI dataset with an ultrahigh isotropic resolution of 250 μmScientific Data 4:170032https://doi.org/10.1038/sdata.2017.32 PubMed Google Scholar
117.
1. Huber L. (Renzo)
2. et al.
2021LayNii: A software suite for layer-fMRINeuroImage 237:118091https://doi.org/10.1016/j.neuroimage.2021.118091 PubMed Google Scholar
118.
1. Badwal M. W.
2. Bergmann J.
3. Roth J.
4. Doeller C. F.
5. Hebart M. N.
2025The scope and limits of fine-grained image and category information in the ventral visual pathwayJ. Neurosci 45:e0936242024https://doi.org/10.1523/JNEUROSCI.0936-24.2024 PubMed Google Scholar
119.
1. Malekian V.
2. et al.
2023Mitigating susceptibility-induced distortions in high-resolution 3DEPI fMRI at 7TNeuroImage 279:120294https://doi.org/10.1016/j.neuroimage.2023.120294 PubMed Google Scholar
120.
1. Tadel F.
2. Baillet S.
3. Mosher J. C.
4. Pantazis D.
5. Leahy R. M.
2011Brainstorm: A User-Friendly Application for MEG/EEG AnalysisComputational Intelligence and Neuroscience 2011:879716https://doi.org/10.1155/2011/879716 PubMed Google Scholar
121.
1. Guggenmos M.
2. Sterzer P.
3. Cichy R. M.
2018Multivariate pattern analysis for MEG: A comparison of dissimilarity measuresNeuroImage 173:434–447https://doi.org/10.1016/j.neuroimage.2018.02.044 PubMed Google Scholar
122.
1. Mood A. M.
1971Partitioning Variance in Multiple Regression Analyses as a Tool For Developing Learning ModelsAmerican Educational Research Journal 8:191–202https://doi.org/10.3102/00028312008002191 Google Scholar
123.
1. Reichwein Zientek L.
2. Thompson B.
2006Commonality Analysis: Partitioning Variance to Facilitate Better Understanding of DataJournal of Early Intervention 28:299–307https://doi.org/10.1177/105381510602800405 Google Scholar
124.
1. Diedrichsen J.
2. Kriegeskorte N.
2017Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysisPLoS Comput Biol 13:e1005508https://doi.org/10.1371/journal.pcbi.1005508 PubMed Google Scholar
125.
1. Walther A.
2. et al.
2016Reliability of dissimilarity measures for multi-voxel pattern analysisNeuroImage 137:188–200https://doi.org/10.1016/j.neuroimage.2015.12.012 PubMed Google Scholar
126.
1. Chang C.-C.
2. Lin C.-J.
2011LIBSVM: A library for support vector machinesACM Trans Intell Syst Technol 2:27–27https://doi.org/10.1145/1961189.1961199 Google Scholar
127.
1. Deng J.
2. et al.
2009ImageNet: A large-scale hierarchical image databaseIn: 2009 IEEE Conference on Computer Vision and Pattern Recognition pp. 248–255https://doi.org/10.1109/CVPR.2009.5206848 Google Scholar
128.
1. Zeiler M. D.
2. Fergus R.
2013Visualizing and Understanding Convolutional NetworksarXiv https://doi.org/10.48550/arXiv.1311.2901 Google Scholar
129.
1. Zhou B.
2. Khosla A.
3. Lapedriza A.
4. Oliva A.
5. Torralba A.
2015Object Detectors Emerge in Deep Scene CNNsarXiv https://doi.org/10.48550/arXiv.1412.6856 Google Scholar

Article and author information

Author information

Tony Carricarte
Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany, Einstein Center for Neurosciences Berlin, Charité - Universitätsmedizin Berlin, Berlin, Germany, Bernstein Center for Computational Neuroscience Berlin, Humboldt-Universität zu Berlin, Berlin, Germany
ORCID iD: 0009-0001-3534-3738
- For correspondence: tcarricarte@gmail.com
Siying Xie
Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- These authors contributed equally
Johannes Singer
Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany
- These authors contributed equally
Robert Trampel
Department of Neurophysics, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
Laurentius Huber
National Institutes of Health, Bethesda, United States
Zejin Lu
Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany, Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
Tim C Kietzmann
Institute of Cognitive Science, University of Osnabrück, Osnabrück, Germany
Nikolaus Weiskopf
Department of Neurophysics, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany, Felix Bloch Institute for Solid State Physics, Faculty of Physics and Earth Sciences, Universität Leipzig, Leipzig, Germany, Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, University College London, London, United Kingdom
ORCID iD: 0000-0001-5239-1881
Radoslaw M Cichy
Department of Education and Psychology, Freie Universität Berlin, Berlin, Germany, Einstein Center for Neurosciences Berlin, Charité - Universitätsmedizin Berlin, Berlin, Germany, Bernstein Center for Computational Neuroscience Berlin, Humboldt-Universität zu Berlin, Berlin, Germany, Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Berlin, Germany

Author Notes

Competing interests: No competing interests declared

Version history

Sent for peer review: March 27, 2026
Preprint posted: March 31, 2026
Reviewed Preprint version 1: June 23, 2026

Cite all versions

You can cite all versions using the DOI https://doi.org/10.7554/eLife.111186. This DOI represents all versions, and will always resolve to the latest one.

Copyright

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Metrics

views: 175
downloads: 8
citations: 0

Views, downloads and citations are aggregated across all versions of this paper published by eLife.