The spatial frequency representation predicts category coding in the inferior temporal cortex
Figures

Experimental design and spatial frequency (SF) coding.
(a) Experimental design. Responses of inferior temporal (IT) neurons were recorded to 15 stimuli (six faces, three non-faces, and six selective stimuli; see Materials and methods) in six SF bands (intact and R1–R5; see Materials and methods) and two versions (scrambled and unscrambled) during a passive presentation task. A block started once the monkey had maintained fixation for 200 ms. Each block consisted of a 33-ms stimulus presentation followed by a 465-ms blank screen with a fixation point, and each stimulus was presented 15 times. The recorded signals were sorted, and visually responsive neurons were selected (N = 266, see Materials and methods). (b) A sample of the fixed stimulus set. This panel shows three (out of six) faces, three non-faces, and one scrambled sample stimulus. Each row corresponds to an SF range, starting with intact and followed by R1–R5 (low to high SF). (c) A sample neuron. The peristimulus time histogram (PSTH) of a sample neuron (N = 151, M1) for scrambled stimuli is depicted. To generate a response vector for a given stimulus or trial, the responses of each neuron were averaged in a 50-ms time window centered on the relevant time point. The PSTH was smoothed using a Gaussian kernel with a standard deviation of 20 ms. Only three SF bands (R1, R3, and R5) are shown for clarity. (d) SF coding exists in the IT cortex. The decoding performance of SF ranges using scrambled stimuli is shown over time. Single- and population-level representations were fed into a linear discriminant analysis (LDA) algorithm to predict the SF range of the scrambled stimuli. Shaded areas show the SEM for the single level and the STD for the population level. This figure highlights the presence of SF coding in both individual and population neural activity. (e) Low SF (LSF)-preferred nature of SF coding. The population recall of each SF band in response to scrambled stimuli, determined using the LDA method, is presented. The error bars indicate the STD. Recall decreases as SF moves toward higher frequencies, suggesting a coarse-to-fine decoding preference.
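The smoothing and windowing steps described in panel (c) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes spike trains stored as a binary array with 1-ms bins, and the function names (`gaussian_kernel`, `smoothed_psth`, `window_response`) are hypothetical.

```python
import numpy as np

def gaussian_kernel(sd_ms: float, dt_ms: float = 1.0) -> np.ndarray:
    """Unit-sum Gaussian kernel truncated at +/- 4 SD (truncation is an assumption)."""
    half = int(np.ceil(4 * sd_ms / dt_ms))
    t = np.arange(-half, half + 1) * dt_ms
    k = np.exp(-0.5 * (t / sd_ms) ** 2)
    return k / k.sum()

def smoothed_psth(spikes: np.ndarray, sd_ms: float = 20.0, dt_ms: float = 1.0) -> np.ndarray:
    """spikes: (n_trials, n_bins) array of 0/1 spike counts. Returns the rate in Hz."""
    rate = spikes.mean(axis=0) * (1000.0 / dt_ms)   # trial-averaged rate per bin
    return np.convolve(rate, gaussian_kernel(sd_ms, dt_ms), mode="same")

def window_response(spikes: np.ndarray, center_ms: float, width_ms: float = 50.0,
                    t0_ms: float = 0.0, dt_ms: float = 1.0) -> np.ndarray:
    """Per-trial mean rate (Hz) in a window of width_ms centered on center_ms.

    t0_ms is the time of the first bin (hypothetical convention).
    """
    start = int(round((center_ms - width_ms / 2 - t0_ms) / dt_ms))
    stop = int(round((center_ms + width_ms / 2 - t0_ms) / dt_ms))
    return spikes[:, start:stop].sum(axis=1) / (width_ms / 1000.0)
```

Stacking `window_response` across neurons gives one population response vector per trial, which is the kind of input a decoder such as LDA would receive.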
-
Figure 1—source data 1
Source data for neuronal firing rates, SF decoding accuracy at single-unit and population levels, and recall of SF decoding per SF.
- https://cdn.elifesciences.org/articles/93589/elife-93589-fig1-data1-v1.xlsx

The temporal dynamics of spatial frequency (SF) representation.
(a) Coarse-to-fine nature of SF coding. The onset time of the recall of each SF range in scrambled stimuli is illustrated, with error bars indicating the STD. The results suggest that the onset time of decoding increases as SF increases. (b) SF preference shifts toward higher frequencies over time. The time course of the average preferred SF (see Materials and methods) across neurons is illustrated. The average preferred SF of inferior temporal (IT) neurons moves toward higher frequencies from 170 ms after stimulus onset, reaching its highest value at 220 ms. A second peak emerges at 320 ms after stimulus onset. The SF preference shows a monotonic increase followed by a decrease over time. The shaded area shows the SEM. (c, d) Shift in neural response toward high SF (HSF). The average response of all neurons within the two time intervals (T1 and T2 in panel b) is shown, with error bars indicating the SEM. (c) In T1, from 70 to 170 ms after stimulus onset, the neural response decreases as the SF content shifts toward higher frequencies. The inset (top) shows the relative percentage of neurons that respond most strongly to each SF range (R1–R5) in T1. R1 is the most effective SF for roughly 40% of the neurons. (d) In the following interval (T2, 170–270 ms), the response increases from R2 to R5, with R5 eliciting the highest firing rates. Furthermore, in T2, the percentage of neurons responding most strongly to R5 is roughly three times that in T1, indicating a shift in the neurons’ responses toward HSF (top panel).
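The inset percentages in panels (c) and (d) amount to counting, for each SF band, the neurons whose mean response is largest in that band. A minimal sketch, assuming a hypothetical array of mean responses with one row per neuron and one column per band (R1–R5); the toy numbers are illustrative only and are not the recorded data:

```python
import numpy as np

def preferred_band_percent(resp: np.ndarray) -> np.ndarray:
    """resp: (n_neurons, n_bands) mean response per SF band.

    Returns the percentage of neurons whose largest response falls in each band.
    """
    best = resp.argmax(axis=1)                              # preferred band per neuron
    counts = np.bincount(best, minlength=resp.shape[1])
    return 100.0 * counts / resp.shape[0]

# Toy responses for four hypothetical neurons in two time windows.
t1 = np.array([[5, 1, 1, 1, 1], [4, 3, 2, 1, 0], [3, 2, 2, 1, 1], [1, 2, 3, 4, 5]], float)
t2 = np.array([[1, 1, 1, 1, 5], [4, 3, 2, 1, 0], [0, 0, 0, 0, 9], [1, 2, 3, 4, 5]], float)
pct_t1 = preferred_band_percent(t1)
pct_t2 = preferred_band_percent(t2)
```

Comparing the last entries of `pct_t1` and `pct_t2` gives the change in the share of R5-preferring neurons between the two windows.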
-
Figure 2—source data 1
Source data for temporal dynamics of SF coding, including onset of recall, preferred SF, and neuronal responses across time intervals.
- https://cdn.elifesciences.org/articles/93589/elife-93589-fig2-data1-v1.xlsx

Spatial frequency (SF) profile predicts category coding.
(a, b) SF profile predicts category selectivity. (a) The responses of each neuron were standardized by subtracting the mean and dividing by the standard deviation of the baseline period. Neurons were then categorized into four groups based on the fit of a quadratic function to their responses (see Materials and methods). Each panel presents the average neuron responses within each category for SF ranges R1–R5, with error bars indicating the SEM of the response values. The percentage of the neurons in each category is displayed at the top of each panel. The ‘flat’ category, in which no SF elicited a higher response than the others, was excluded from this analysis. (b) Separability index (SI) of face/non-face vs. scrambled stimuli is illustrated (see Materials and methods). The error bars show the STD. The SI value and SF profile are determined within the time window of 70–170 ms after stimulus onset. The high SF (HSF)-preferred population exhibited significantly higher face SI than the other groups. The low SF (LSF)-preferred population displayed a significant difference between face and non-face SI. In contrast, the inverse U-shaped (IU) profile showed a significantly higher SI for non-faces than for faces. The U-shaped profile did not show any significant difference between faces and non-faces. These results suggest that a neuron’s response to the various SF bands can predict its category decoding capability. (c, d) The relation between SF and category coding in sub-populations. First, linear discriminant analysis (LDA) was used to calculate each neuron’s single-neuron-level performance in category and SF coding. Next, neurons were sorted by SF (panel c) and category (panel d) coding performance to create sub-populations of neurons with similar capabilities (see Materials and methods). The scatter plot of the category and SF coding accuracy of these sub-populations shows a strong positive correlation between SF and category accuracies in the inferior temporal (IT) cortex.
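The baseline standardization and quadratic-fit grouping in panel (a) can be illustrated with the sketch below. The z-scoring follows the caption; the labeling rule in `sf_profile` is only an assumption made for illustration (the actual criteria are given in Materials and methods), and it ignores the 'flat' category.

```python
import numpy as np

def baseline_zscore(resp, baseline):
    """Standardize responses with the mean and SD of the baseline period."""
    baseline = np.asarray(baseline, dtype=float)
    return (np.asarray(resp, dtype=float) - baseline.mean()) / baseline.std(ddof=1)

def sf_profile(band_resp) -> str:
    """Label a response curve over R1..R5 from a quadratic fit.

    Hypothetical rule: if the parabola's vertex lies between the second and
    second-to-last band, call the curve U or inverse-U (IU) by the sign of the
    quadratic term; otherwise call it LSF- or HSF-preferred by the fit's endpoints.
    """
    y = np.asarray(band_resp, dtype=float)
    x = np.arange(1, len(y) + 1, dtype=float)
    a, b, c = np.polyfit(x, y, deg=2)
    vertex = -b / (2 * a) if a != 0 else np.inf
    if x[1] <= vertex <= x[-2]:
        return "U" if a > 0 else "IU"
    fit = np.polyval([a, b, c], x)
    return "HSF-preferred" if fit[-1] > fit[0] else "LSF-preferred"
```

For example, a curve that falls steadily from R1 to R5 is labeled `"LSF-preferred"`, while one that peaks at R3 is labeled `"IU"`.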
-
Figure 3—source data 1
Source data for neuronal responses, SF profiles, and category selectivity, including separability indices and decoding accuracy for sub-populations of inferior temporal neurons.
- https://cdn.elifesciences.org/articles/93589/elife-93589-fig3-data1-v1.xlsx

Uncorrelated mechanisms for spatial frequency (SF) and category coding.
(a) Uncorrelated SF and category coding at the single-neuron level. The scatter plot of category vs. SF accuracy does not reveal a significant correlation between SF and category coding capabilities within the inferior temporal (IT) cortex at the single-neuron level. The error bars show the STD for SF and category decoding accuracies. (b) Uncorrelated neuron contributions to SF and category coding at the population level. The linear discriminant analysis (LDA) weight of each neuron is taken as that neuron’s contribution to the population coding of SF or category (see Materials and methods). The scatter plot shows a near-zero correlation between the neuron weights in SF coding and the neuron weights in category coding.
-
Figure 4—source data 1
Source data for single-neuron and population-level contributions to SF and category coding.
- https://cdn.elifesciences.org/articles/93589/elife-93589-fig4-data1-v1.xlsx

Sparse spatial frequency (SF) coding compared to category coding.
(a, b) Sparse mechanism for SF coding. (a) The contribution of each neuron to SF and category (face vs. non-face) decoding is evaluated by removing it from the feature set fed to the linear discriminant analysis (LDA) within the time window of 70–170 ms after stimulus onset. The histogram of the single-neuron contribution (SNC) value (see Materials and methods) is presented, indicating the accuracy loss when a neuron is removed. The bar plot displays the average SNC values for SF and category, with error bars representing the SEM. The SNC value for SF is significantly higher than for category. (b) Furthermore, the conditional mutual information (CMI) of each neuron pair, conditioned on the label (category or SF), is illustrated. CMI reflects the information redundancy between neuron pairs during SF or category decoding. A lower CMI value for SF indicates that individual neurons carry more independent SF-related information than category information. (c) Sparse neuron contribution to SF coding in the early phase of the response. To investigate the contribution of the neurons to population decoding, the sparseness of the LDA weights assigned to each neuron is calculated. Higher sparseness indicates a greater contribution of a smaller group of neurons to the decoding process. The time course of weight sparseness is depicted for SF and category (face vs. non-face) decoding, with shaded areas representing the STD. During the early phase of the response, the sparseness of SF-related weights is higher than that of category-related weights, while this relationship is reversed during the late phase of the response.
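The leave-one-neuron-out idea behind the SNC value in panel (a) can be sketched as below. This is an illustration under assumptions rather than the authors' pipeline: the LDA here is a small NumPy implementation with a shared, ridge-regularized covariance, the train/test split is a single hypothetical split, and the data are synthetic.

```python
import numpy as np

def lda_fit(X, y, ridge=1e-3):
    """Fit LDA with a pooled covariance; ridge is an assumed regularizer."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(X) - len(classes))
    cov += ridge * np.eye(X.shape[1])
    W = np.linalg.solve(cov, means.T)                    # (n_features, n_classes)
    priors = np.array([(y == c).mean() for c in classes])
    b = -0.5 * np.sum(means.T * W, axis=0) + np.log(priors)
    return classes, W, b

def lda_accuracy(Xtr, ytr, Xte, yte):
    classes, W, b = lda_fit(Xtr, ytr)
    pred = classes[np.argmax(Xte @ W + b, axis=1)]
    return float((pred == yte).mean())

def single_neuron_contribution(Xtr, ytr, Xte, yte):
    """Accuracy loss when each neuron (column) is removed from the feature set."""
    full = lda_accuracy(Xtr, ytr, Xte, yte)
    n = Xtr.shape[1]
    snc = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        snc[i] = full - lda_accuracy(Xtr[:, keep], ytr, Xte[:, keep], yte)
    return snc

# Synthetic population: only neuron 0 carries label information.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 200)
X = rng.normal(size=(400, 5))
X[:, 0] += 6.0 * y
idx = rng.permutation(400)
tr, te = idx[:300], idx[300:]
snc = single_neuron_contribution(X[tr], y[tr], X[te], y[te])
```

In this toy case the accuracy loss is concentrated in a single neuron, which is the pattern a sparse code would produce; a more distributed code would spread small losses across many neurons.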
-
Figure 5—source data 1
Source data for single-neuron and population contributions to SF and category decoding, including single-neuron contribution values, conditional mutual information, and temporal sparseness of LDA weights.
- https://cdn.elifesciences.org/articles/93589/elife-93589-fig5-data1-v1.xlsx

Spatial frequency (SF) representation in convolutional neural networks (CNNs).
(a) SF coding capabilities. We assessed the SF coding capabilities of popular CNN architectures (ResNet18, ResNet34, VGG11, VGG16, InceptionV3, EfficientNet-B0, CORNet-S, CORNet-RT, and CORNet-Z) using both randomly initialized (R) weights and weights pre-trained (P) on ImageNet. A linear discriminant analysis (LDA) model was trained using feature maps from the last four layers of each CNN to classify the SF content of input images. The SF decoding accuracy for each CNN on our dataset is presented with error bars indicating the STD. (b) Low SF (LSF)-preferred recall performance. The recall performance of two sample networks (CORNet-Z and ResNet18) is presented. STD values are illustrated with error bars. The recall values for LSF content were higher than those for high SF (HSF) content in most CNNs, resembling the trends observed in the inferior temporal (IT) cortex. (c) The profiles (left) and face/non-face separability index (SI) value (right) of a sample network (ResNet18). Error bars show the STD. Profiles are calculated in the same way as for the IT cortex. CNNs did not replicate the SF-based profiles observed in the IT cortex.
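The per-band recall reported in panel (b) is the fraction of images of a given SF band that the decoder assigns to that band. A minimal sketch, assuming integer band labels (0 for R1 up to 4 for R5) and a hypothetical helper name:

```python
import numpy as np

def per_class_recall(y_true, y_pred, n_classes: int) -> np.ndarray:
    """Recall for each class: correct predictions / samples of that class.

    Classes with no samples get NaN.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recall = np.full(n_classes, np.nan)
    for k in range(n_classes):
        mask = y_true == k
        if mask.any():
            recall[k] = (y_pred[mask] == k).mean()
    return recall

# Toy labels for three classes (illustrative only).
example = per_class_recall([0, 0, 1, 1, 2, 2], [0, 0, 1, 0, 1, 1], n_classes=3)
```

The same function applies to decoder outputs from neural data or from CNN feature maps, since it only needs true and predicted labels.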
-
Figure 6—source data 1
Source data for SF decoding and recall performance in multiple CNN architectures, SF preference profiles and face/non-face separability indices.
- https://cdn.elifesciences.org/articles/93589/elife-93589-fig6-data1-v1.xlsx

Strength of SF selectivity.
To assess the strength of SF selectivity in IT responses, we first ranked the SF content of each neuron by firing rate using half of the trials. The other half was then used to calculate the firing rate of each rank. The firing rate of rank 5 is significantly higher than that of rank 1 (p-value = 4 × 10⁻⁴). The error bars show the STD.
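The split-half procedure described above can be sketched as follows. This is an illustration under assumptions: the array layout and function name are hypothetical, ranks are taken in ascending order so that rank 5 is the band with the highest firing rate in the ranking half, and the random split of trials is one possible choice.

```python
import numpy as np

def cross_validated_rank_response(rates: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """rates: (n_neurons, n_bands, n_trials) firing rates.

    Rank bands per neuron on one half of the trials, then read out the mean
    rate of each rank on the held-out half and average across neurons.
    """
    n_trials = rates.shape[2]
    perm = rng.permutation(n_trials)
    half_a, half_b = perm[: n_trials // 2], perm[n_trials // 2:]
    mean_a = rates[:, :, half_a].mean(axis=2)    # used only for ranking
    mean_b = rates[:, :, half_b].mean(axis=2)    # used only for readout
    order = np.argsort(mean_a, axis=1)           # ascending: last rank = strongest
    ranked = np.take_along_axis(mean_b, order, axis=1)
    return ranked.mean(axis=0)
```

Because ranking and readout use disjoint trials, a rank-5 response that exceeds the rank-1 response cannot be explained by noise alone, which is the point of the cross-validation.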

SF response distribution.
To check the SF response strength, this figure shows the histogram of IT neuron responses to scrambled, face, and non-face stimuli. A Gamma distribution is also fitted to each histogram. To calculate the histogram, the response of each neuron to each unique stimulus is calculated in spikes per second (Hz). In the early phase, T1, the average firing rate to scrambled stimuli is 26.3 Hz, which is significantly higher than the baseline response of 23.4 Hz measured from −50 to 50 ms. In comparison, the mean response to intact face stimuli is 30.5 Hz, while non-face stimuli elicit an average response of 28.8 Hz. The average net responses to the scrambled, face, and non-face stimuli are 2.9, 7.1, and 5.4 Hz, respectively. In the late phase, T2, the responses to scrambled, face, and non-face stimuli are 19.5, 19.4, and 22.4 Hz, respectively. The corresponding average net responses are 3.9, 4.0, and 1.0 Hz below the baseline response. While the firing rates and net responses to scrambled stimuli were modest (e.g., 2.9 Hz in T1), the differences across spatial frequency (SF) bands were statistically significant (p ≈ 1 × 10⁻⁵) and led to a classification accuracy 24.68% above chance. This demonstrates the robustness of SF modulation in IT neurons despite low firing rates. The modest responses align with expectations for noise-like stimuli, which are less effective in driving IT neurons, yet the observed SF selectivity highlights a fundamental property of IT encoding.

SF profile robustness.
Profiles are calculated using half of the trials. Then, the average of the neuron responses in each profile is calculated with the remaining half. STD is illustrated with error bars.

LSF-preferred responses with extended stimulus duration.
We repeated the analyses in Appendix 1—figure 1c and Appendix 1—figure 2a with a stimulus duration of 200 ms, using the same method and the 70–170 ms window after stimulus onset. (a) The recall of each SF band in the population, as elicited by scrambled stimuli and determined through the LDA method, is presented. The error bars denote the STD. The findings support the LSF-preferred nature of SF decoding observed with the 33-ms stimulus duration. (b) The onset time of recall for each SF band in response to scrambled stimuli is depicted, with error bars representing the STD. The results imply that the onset time of decoding increases as SF rises, as observed with the 33-ms stimulus duration.

The SF and category selectivity of the recorded locations.
The accuracy of single neurons for SF prediction (a) and category prediction (b) is illustrated for each recorded location. The x- and y-axes show the anterior–posterior (A/P) or medial–lateral (M/L) hole location and the depth of the electrode in millimeters. A/P ranges from 5 mm (hole number 1) to 30 mm (hole number 18), and M/L ranges from 0 mm (hole number 1) to 23 mm (hole number 18).
Tables
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
---|---|---|---|---|
Biological sample (Macaca mulatta, male) | IT cortex neurons; Neurons; Monkey | This paper | | From two adult male macaques (10 and 11 kg); see Materials and methods |
Software, algorithm | MATLAB | MathWorks | | Used for stimulus presentation, control, and analysis |
Software, algorithm | MonkeyLogic toolbox | MonkeyLogic website | | For experimental control in MATLAB |
Software, algorithm | Python | Python Software Foundation | 3.10 | Used for data analysis and machine learning workflows |
Software, algorithm | PyTorch | pytorch.org | 2.0 | Deep learning framework used for neural modeling |
Additional files
-
MDAR checklist
- https://cdn.elifesciences.org/articles/93589/elife-93589-mdarchecklist1-v1.pdf
-
Source code 1
Source code for controlling stimulus presentation and juice delivery using MATLAB and the MonkeyLogic toolbox.
- https://cdn.elifesciences.org/articles/93589/elife-93589-code1-v1.zip