Experimental design and SF coding.

a Experimental design. Responses of IT neurons were collected to 15 stimuli (six faces, three non-faces, and six selective stimuli, see Materials and methods) in six SF bands (intact and R1 to R5, see Materials and methods) and two versions (scrambled and unscrambled) using a passive presentation task. A block started once the monkey had maintained fixation for 200ms. Each block consisted of a 33ms stimulus presentation followed by a 465ms blank screen with a fixation point, and each stimulus was presented 15 times. The recorded signals were sorted, and visually responsive neurons were selected (N = 266, see Materials and methods). b A sample of the fixed stimulus set. This panel shows three (out of six) faces, three non-faces, and one scrambled sample stimulus. Each row corresponds to an SF range, starting with intact, followed by R1 to R5 (low to high SF). c A sample neuron. The PSTH of a sample neuron (N = 151, M1) for scrambled stimuli is depicted. To generate a response vector for a given stimulus or trial, the responses of each neuron were averaged in a 50ms time window centered on the relevant time point. The PSTH was smoothed using a Gaussian kernel with a standard deviation of 20ms. For clarity, only the responses to three SF bands (R1, R3, and R5) are shown. d SF coding exists in the IT cortex. The decoding performance of SF ranges using scrambled stimuli is shown over time. Single-level and population-level representations were fed into an LDA algorithm to predict the SF range of the scrambled stimuli. Shaded areas illustrate the SEM and STD for the single and population levels, respectively. This panel highlights the presence of SF coding in both individual and population neural activity. e LSF-preferred nature of SF coding. The population recall of each SF band in response to scrambled stimuli, determined using the LDA method, is presented. The error bars indicate the STD. The recall decreases as SF moves towards higher frequencies, suggesting a coarse-to-fine decoding preference.
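
The response-vector and smoothing steps described in panel c can be sketched as follows. This is a minimal illustration, not the authors' code: the 1ms bin size, the kernel truncation at four standard deviations, and all function names are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(sigma_ms, bin_ms=1.0, truncate=4.0):
    """Normalized Gaussian kernel sampled every bin_ms (truncation is an assumption)."""
    half = int(np.ceil(truncate * sigma_ms / bin_ms))
    t = np.arange(-half, half + 1) * bin_ms
    k = np.exp(-0.5 * (t / sigma_ms) ** 2)
    return k / k.sum()

def smooth_psth(spike_counts, sigma_ms=20.0, bin_ms=1.0):
    """spike_counts: (n_trials, n_bins) spike counts. Returns the smoothed rate in Hz."""
    rate = spike_counts.mean(axis=0) * (1000.0 / bin_ms)
    # mode="same" keeps the original length; values near the edges are attenuated.
    return np.convolve(rate, gaussian_kernel(sigma_ms, bin_ms), mode="same")

def window_response(spike_counts, center_bin, width_ms=50.0, bin_ms=1.0):
    """Per-trial mean rate (Hz) in a window of width_ms centered on center_bin."""
    half = int(round(width_ms / (2 * bin_ms)))
    lo = max(center_bin - half, 0)
    hi = min(center_bin + half, spike_counts.shape[1])
    return spike_counts[:, lo:hi].mean(axis=1) * (1000.0 / bin_ms)
```

Stacking the output of `window_response` across neurons gives one response vector per trial, which could then be passed to an LDA classifier.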

The temporal dynamics of SF representation.

a Coarse-to-fine nature of SF coding. The onset time of the recall of each SF range in scrambled stimuli is illustrated, with error bars indicating the STD. The results suggest that the onset time of decoding increases as SF increases. b SF preference shifts toward higher frequencies over time. The time course of the average preferred SF (see Materials and methods) across neurons is illustrated. The average preferred SF of IT neurons moves towards higher frequencies from 170ms after stimulus onset, reaching its highest value at 220ms. A second peak emerges at 320ms after stimulus onset. The SF preference shows a monotonic increase followed by a decrease in time. c,d Shift in neural response towards HSF. The average response of all neurons within the two time intervals (T1 and T2 in panel b) is shown, with error bars indicating the SEM. c In T1, from 70ms to 170ms after stimulus onset, the neurons' response decreases as the SF content shifts towards higher frequencies. The relative percentage of neurons responding most strongly to each SF range (R1 to R5) in T1 is depicted in the inner top panel. R1 is the most responsive SF for roughly 40% of the neurons. d In the following interval (T2, 170ms to 270ms), the response increases from R2 to R5, with R5 eliciting the highest firing rates. Furthermore, in T2, the percentage of neurons responding most strongly to R5 is roughly three times that in T1, indicating a shift in the neurons’ responses towards HSF (top panel).

SF profile predicts category coding.

a,b SF profile predicts category selectivity. a The responses of each neuron were standardized by subtracting the mean and dividing by the standard deviation of the baseline period. Neurons were then categorized into four groups by fitting a quadratic function to their responses (see Materials and methods). Each panel presents the average neuron responses within each category for SF ranges R1 to R5, with error bars indicating the SEM of the response values. The percentage of the neurons in each category is displayed at the top of each panel. The “flat” category, in which no SF band elicited a higher response than the others, was excluded from this analysis. b The SI of face and non-face vs. scrambled stimuli is illustrated (see Materials and methods). The SI value and SF profile are determined within the time window of 70ms to 170ms after stimulus onset. The HSF-preferred population exhibited significantly higher face SI than the other groups. The LSF-preferred population displayed a significant difference between face and non-face SI. In contrast, the IU profile showed a significantly higher SI for non-faces than for faces. The U-shaped profile did not show any significant difference between face and non-face SI. These results suggest that a neuron’s response to the various SF bands can predict its decoding capability. c,d The relation between SF and category coding in sub-populations. First, the LDA method was used to calculate each neuron’s single-level category and SF coding performance. Next, neurons were sorted by SF (panel c) or category (panel d) coding performance to create sub-populations of neurons with similar capabilities (see Materials and methods). The scatter plot of the category and SF coding accuracy of these sub-populations shows a high positive correlation between SF and category accuracies in the IT cortex.
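
The baseline standardization and quadratic-fit grouping in panel a can be sketched as below. The exact decision rule is given in Materials and methods and is not reproduced here, so the rule in `sf_profile` (vertex position and curvature of the fitted parabola) is a hypothetical stand-in; the "flat" category is not handled, and the function names are invented for illustration.

```python
import numpy as np

def standardize(resp, baseline):
    """Z-score responses using the mean and standard deviation of the baseline period."""
    return (resp - np.mean(baseline)) / np.std(baseline)

def sf_profile(resp_r1_to_r5):
    """Assign a profile label from a quadratic fit over R1..R5.

    Assumed rule (not taken from the source): if the vertex of the fitted
    parabola lies strictly inside the R1..R5 range, the profile is U-shaped
    (positive curvature) or inverse-U (negative curvature); otherwise the
    sign of the slope at R3 decides between HSF- and LSF-preferred.
    """
    x = np.arange(1, 6, dtype=float)
    a, b, _ = np.polyfit(x, np.asarray(resp_r1_to_r5, dtype=float), deg=2)
    if abs(a) > 1e-12:
        vertex = -b / (2 * a)
        if 1 < vertex < 5:
            return "U" if a > 0 else "IU"
    slope_mid = 2 * a * 3 + b  # derivative of the fit at R3
    return "HSF" if slope_mid > 0 else "LSF"
```

For example, a response that falls steadily from R1 to R5 is labeled LSF-preferred under this rule, while a response that peaks at R3 is labeled IU.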

Uncorrelated mechanisms for SF and category coding.

a Uncorrelated SF and category coding at the single level. The scatter plot of category and SF accuracies does not reveal a significant correlation between SF and category coding capabilities within the IT cortex at the single-neuron level. The error bars show the STD of the SF and category decoding accuracies. b Uncorrelated neuron contribution to SF and category coding in the population. The LDA weight of each neuron is taken as that neuron's contribution to the population coding of SF or category (see Materials and methods). The scatter plot shows a near-zero correlation between the neuron weights in SF coding and the neuron weights in category coding.

Sparse SF coding compared to category coding.

a,b Sparse mechanism for SF coding. a The contribution of each neuron to SF and category (face vs. non-face) decoding is evaluated by removing it from the feature set fed to the LDA within the time window of 70ms to 170ms after stimulus onset. The histogram of the SNC value (see Materials and methods) is presented, indicating the accuracy loss when a neuron is removed. The bar plot displays the average SNC values for SF and category, with error bars representing the SEM. The SNC value for SF is significantly higher than for category. b The CMI of each neuron pair, conditioned on the label (category or SF), is illustrated. CMI reflects the information redundancy between neuron pairs during SF or category decoding. A lower CMI value for SF indicates that individual neurons carry more independent SF-related information than category information. c Sparse neuron contribution in SF coding at the early phase of the response. To investigate the contribution of the neurons to population decoding, the sparseness of the LDA weights assigned to the neurons is calculated. Higher sparseness indicates a greater contribution of a smaller group of neurons to the decoding process. The time course of weight sparseness is depicted for SF and category (face vs. non-face) decoding, with shaded areas representing the STD. During the early phase of the response, the sparseness of SF-related weights is higher than that of category-related weights, while this relationship is reversed during the late phase of the response.
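
The two quantities in panels a and c can be sketched as follows. The SNC function follows the description above (accuracy loss when one neuron is removed) but takes a generic, hypothetical `accuracy_fn` in place of the LDA pipeline. The sparseness formula is not stated in this caption; the Treves-Rolls/Vinje-Gallant style measure used here is an assumption and may differ from the one in Materials and methods.

```python
import numpy as np

def single_neuron_contribution(X, y, accuracy_fn):
    """Accuracy loss when each neuron (column of X) is removed in turn.

    X: (n_trials, n_neurons) response matrix; y: labels.
    accuracy_fn(X, y) -> float is a placeholder for the decoder
    (for example, cross-validated LDA accuracy).
    """
    full = accuracy_fn(X, y)
    return np.array([
        full - accuracy_fn(np.delete(X, j, axis=1), y)
        for j in range(X.shape[1])
    ])

def weight_sparseness(w):
    """Assumed sparseness measure: 0 for uniform weights, 1 for a single nonzero weight.

    Undefined (division by zero) if all weights are zero.
    """
    w = np.abs(np.asarray(w, dtype=float))
    n = w.size
    a = (w.mean() ** 2) / np.mean(w ** 2)
    return (1.0 - a) / (1.0 - 1.0 / n)
```

Under this measure, a weight vector in which every neuron contributes equally has sparseness 0, and one in which a single neuron carries all the weight has sparseness 1.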

SF representation in CNNs.

a SF coding capabilities. We assessed the SF coding capabilities of popular CNN architectures (ResNet18, ResNet34, VGG11, VGG16, InceptionV3, EfficientNetB0, CORNet-S, CORNet-RT, and CORNet-Z) using both randomly initialized weights (R) and weights pre-trained on ImageNet (P). An LDA model was trained using feature maps from the last four layers of each CNN to classify the SF content of input images. The SF decoding accuracy for each CNN on our dataset is presented, with error bars indicating the STD. b LSF-preferred recall performance. The recall performance of two sample networks (CORNet-Z and ResNet18) is presented. STD values are illustrated with error bars. The recall values for LSF content were higher than for HSF content in most CNNs, resembling the trends observed in the IT cortex. c The profiles (left) and face/non-face SI values (right) of a sample network (ResNet18). Profiles are calculated in the same way as for the IT cortex. CNNs did not replicate the SF-based profiles observed in the IT cortex.

Strength of SF selectivity

To assess the strength of SF selectivity in IT responses, we first ranked the SF content of each neuron by firing rate using half of the trials. The other half was then used to calculate the firing rate of each rank. The results show that the firing rate of rank 5 is significantly higher than that of rank 1 (p-value = 4 × 10⁻⁴).
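
The split-half procedure above can be sketched for a single neuron as follows. The random trial split, the convention that rank 1 is the lowest-rate condition, and the function name are assumptions made for illustration.

```python
import numpy as np

def split_half_rank_response(rates, seed=None):
    """Rank conditions on one half of the trials, read out rates on the other half.

    rates: (n_conditions, n_trials) firing rates of one neuron.
    Returns the held-out mean rate per rank, ordered from rank 1 (lowest rate
    in the ranking half, an assumed convention) to the highest rank.
    """
    rng = np.random.default_rng(seed)
    n_trials = rates.shape[1]
    perm = rng.permutation(n_trials)
    rank_half, test_half = perm[: n_trials // 2], perm[n_trials // 2:]
    order = np.argsort(rates[:, rank_half].mean(axis=1))
    return rates[:, test_half].mean(axis=1)[order]
```

Because the ranking and the readout use disjoint trials, a difference between the highest and lowest ranks in the held-out half cannot be produced by trial-to-trial noise alone.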

SF response distribution

To check the SF response strength, the histograms of IT neuron responses to scrambled, face, and non-face stimuli are illustrated in this figure, with a Gamma distribution fitted to each histogram. For the histograms, the response of each neuron to each unique stimulus is calculated in spikes per second (Hz). In the early phase, T1, the average firing rate to scrambled stimuli is 26.3 Hz, which is significantly higher than the 23.4 Hz response in the -50 to 50ms window. In comparison, the mean response to intact face stimuli is 30.5 Hz, while non-face stimuli elicit an average response of 28.8 Hz. In the late phase, T2, the responses to scrambled, face, and object stimuli are 19.5, 19.4, and 22.4 Hz, respectively.

SF profile robustness

Profiles are calculated using half of the trials. Then, the average of the neuron responses in each profile is calculated with the remaining half.

LSF-preferred responses with extended stimulus duration

We repeated the analyses of Appendix 1—Figure 1(e) and Appendix 1—Figure 2(a) with a stimulus duration of 200ms, using the same method and the same time window of 70ms to 170ms after stimulus onset. a The recall of each SF band in the population, as elicited by scrambled stimuli and determined through the LDA method, is presented. The error bars denote the STD. The findings support the LSF-preferred nature of SF decoding observed with the 33ms stimulus duration. b The onset time of recall for each SF band in response to scrambled stimuli is depicted, with error bars representing the STD. The results imply that the onset time of decoding increases with SF, as observed with the 33ms stimulus duration.

The SF and category selectivity of the recorded locations

The accuracy of single neurons for SF prediction (a) and category prediction (b) is illustrated for each recorded location. The x-axis shows the anterior-posterior (A/P) or medial-lateral (M/L) hole location, and the y-axis shows the depth of the electrode in millimeters. A/P ranges from 5 mm (hole number 1) to 30 mm (hole number 18), and M/L ranges from 0 mm (hole number 1) to 23 mm (hole number 18).

Main results for the two monkeys

The recall (a), onset of recall (b), and SI of each profile (c) are illustrated for M1 and M2, respectively. The results are consistent with our observations in the Results section.