Neuroscience

When Do Visual Category Representations Emerge in Infants’ Brains?

Xiaoqian Yan
Sarah Tung
Bella Fascendini
Yulan Diana Chen
Anthony M Norcia
Kalanit Grill-Spector

Department of Psychology, Stanford University, Stanford, USA
Wu Tsai Neurosciences Institute, Stanford University, Stanford, USA
Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
Neurosciences Program, Stanford University, Stanford, USA

https://doi.org/10.7554/eLife.100260.1

Open access

Figures and data

Experimental design and stimuli analysis.
(A) Example segments of presentation sequences in which faces (top panel) and limbs (bottom panel) were the target category. Gray-level images of items from different categories, spanning 12° and shown on a phase-scrambled background, appeared for 233 ms each (frequency: 4.286 Hz). A different exemplar from a single category appeared every 5th image (frequency: 0.857 Hz). Between the target category images, randomly drawn images from the other four categories were presented. Sequences consisted of 20% images from each of the five categories and no images were repeated. Each category condition lasted 14 s and contained 12 such cycles. Participants viewed the 5 category conditions (faces, limbs, corridors, characters, and cars) in random order, forming a 70-s presentation sequence. (B) Images were controlled for several low-level properties using the SHINE toolbox as explained in ⁸. Metrics are colored by category (see legend). Contrast: standard deviation of gray-level values in each image, averaged across the 144 images of a category. Luminance: mean gray level of each image, averaged across the 144 images of a category. Similarity: mean pixel-wise similarity between all pairs of images in a category. For all 3 metrics, boxplots indicate median, 25% and 75% percentiles, range, and outliers. For contrast and luminance, significant differences between categories are indicated by asterisks (non-parametric permutation t-test, p < .05, Bonferroni corrected); for image similarity, all categories differ significantly from one another (non-parametric permutation testing, p < .05, Bonferroni corrected), except for corridors vs. cars (p = .24). Spatial Frequency: Solid lines: distribution of spectral amplitude in each frequency, averaged across the 144 images in each category. Shaded area: standard deviation. Spatial frequency distributions are similar across categories.
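The timing values in panel (A) follow from one another, which a few lines of arithmetic can confirm. This is only a sketch; the variable names are illustrative and not taken from the paper.

```python
# Presentation timing reported in the caption (names are illustrative).
IMAGE_HZ = 4.286              # image-update frequency
IMAGES_PER_CYCLE = 5          # a target-category image appears every 5th image
CONDITION_DURATION_S = 14.0   # duration of one category condition
N_CONDITIONS = 5              # faces, limbs, corridors, characters, cars

image_period_ms = 1000.0 / IMAGE_HZ                        # about 233 ms per image
category_hz = IMAGE_HZ / IMAGES_PER_CYCLE                  # about 0.857 Hz
cycles_per_condition = category_hz * CONDITION_DURATION_S  # about 12 cycles
images_per_condition = IMAGE_HZ * CONDITION_DURATION_S     # about 60 images
sequence_duration_s = CONDITION_DURATION_S * N_CONDITIONS  # 70 s

print(round(image_period_ms), round(category_hz, 3),
      round(cycles_per_condition), round(images_per_condition),
      sequence_duration_s)
```

With 5 categories at 20% each, about 12 of the roughly 60 images in a condition come from the target category, matching the 12 cycles per condition.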

Strong visual responses over occipital cortex at the image-update frequency and its harmonics in all age groups.
Each panel shows mean responses across infants in an age group. (A) 3-4-month-olds, n=17; (B) 4-6-month-olds, n=14; (C) 6-8-month-olds, n=15; (D) 12-15-month-olds, n=15. Left panels in each row: spatial distribution of the visual response at the image update frequency and its first three harmonics. Middle panels in each row: mean Fourier Amplitude Spectrum across 9 occipital electrodes of the occipital ROI showing high activity at harmonics of the image update frequency marked out by thicker lines. Data are first averaged in each participant and each condition and then across participants. Error bars: standard error of the mean across participants. Asterisks: Response amplitudes significantly larger than zero, p < .01, FDR corrected. Colored bars: amplitude of response at category frequency and its harmonics. White bars: amplitude of response at noise frequencies. (E) Noise amplitudes in the frequency range up to 8.571 Hz (except for the visual response frequencies and visual category response frequencies) from the amplitude spectra in (A) for each age group (white bars on the spectra). Error bars: standard error of the mean across participants. (F) Mean image-update response over occipital electrodes for each age group. Waveforms are cycle averages over the period of the individual image presentation time (233 ms). Lines: mean response. Shaded areas: standard error of the mean across participants of each group. Horizontal lines colored by age group: significant responses vs. zero (p < .05 with a cluster-based analysis, see Methods). (G) Peak latency for the first peak in the 60-90 ms interval after stimulus onset. Each dot is a participant; dots are colored by age group. Line: linear mixed model (LMM) estimate of peak latency as a function of log10(age). Shaded area: 95% confidence interval (CI). (H) Same as (G) but for the second peak in the 90-160 ms interval for 3-4-month-olds, and 90-110 ms for older infants.
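Reading the amplitude at the image-update frequency and its harmonics off a Fourier spectrum can be illustrated on synthetic data. This is a minimal sketch, not the authors' pipeline: the 420 Hz sampling rate, the 2-unit sinusoid, and the function name `amplitude_at` are assumptions chosen so that a 14-s sequence holds exactly 60 image cycles. In practice an FFT routine from a numerical library would be typical.

```python
import cmath
import math

def amplitude_at(signal, sample_rate_hz, freq_hz):
    """Amplitude of the freq_hz component via a single-frequency DFT projection."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
              for k, x in enumerate(signal))
    return 2.0 * abs(acc) / n

# Synthetic response: a pure sinusoid at the image-update frequency.
sample_rate_hz = 420.0        # hypothetical sampling rate
duration_s = 14.0             # one category condition
image_hz = 60 / duration_s    # 60 images in 14 s, about 4.286 Hz
n_samples = int(sample_rate_hz * duration_s)
signal = [2.0 * math.sin(2 * math.pi * image_hz * k / sample_rate_hz)
          for k in range(n_samples)]

# Amplitude at the fundamental and at the next multiple of the image frequency.
amps = [amplitude_at(signal, sample_rate_hz, image_hz * h) for h in (1, 2)]
print(amps)  # close to [2.0, 0.0] for this pure sinusoid
```

Because the sequence contains a whole number of cycles, the energy falls on a single frequency bin and the recovered amplitude matches the simulated one.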

Face responses emerge over occipitotemporal electrodes after 4 months of age.
Each panel shows mean responses at the category frequency and its harmonics across infants in an age group. (A) 3-4-month-olds, n=17; (B) 4-6-month-olds, n=14; (C) 6-8-month-olds, n=15; (D) 12-15-month-olds, n=15. Left panels in each row: spatial distribution of the response at the category frequency (0.857 Hz) and its first harmonic. Harmonic frequencies are indicated on the top. Right two panels in each row: mean Fourier amplitude spectrum across two ROIs: 7 left and 7 right occipitotemporal electrodes (shown in black on the left panel). Data are first averaged across electrodes in each participant and then across participants. Error bars: standard error of the mean across participants of an age group. Asterisks: significant amplitude vs. zero (p < .05, FDR corrected at 2 levels). Black bars: image frequency and harmonics; Colored bars: category frequency and harmonics. White bars: noise frequencies. Responses for the other categories (limbs, corridors, characters and cars) are shown in Supplementary Figures S5-S8.
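The caption reports FDR-corrected significance but does not name the procedure. As an illustration only, the sketch below uses the Benjamini-Hochberg step-up rule, which is one common way to control the false discovery rate. Its use here is an assumption, and the p-values are made up.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose sorted p-value satisfies p_(k) <= k/m * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for four harmonics of the category frequency.
print(benjamini_hochberg([0.001, 0.02, 0.03, 0.20]))  # [True, True, True, False]
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.20]))  # [True, False, False, False]
```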

Temporal dynamics of category-selective responses as a function of age.
Category-selective responses to (A) faces, (B) limbs, (C) corridors, (D) characters, and (E) cars over the left and right occipitotemporal (OT) ROIs. Data are averaged across electrodes of an ROI and across individuals. Left four panels in each row show the responses in the time domain for the four age groups. Colored lines: mean responses in the right occipitotemporal ROI. Gray lines: mean responses in the left occipitotemporal ROI. Colored horizontal lines above x-axis: significant responses relative to zero for the right OT ROI. Gray horizontal lines above x-axis: significant responses relative to zero for the left OT ROI. Top: 3-D topographies of the spatial distribution of the response to target category stimuli in a 483-500 ms time window after stimulus onset. Right panel in each row: amplitude of the peak deflection defined in a 400-700 ms time interval after stimulus onset. Each dot is a participant; dots are colored by age group. Red line: linear mixed model (LMM) estimate of peak amplitude as a function of log10(age). Shaded area: 95% CI.
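Extracting a peak deflection within the 400-700 ms window can be sketched as follows. The caption does not say whether the peak is signed or absolute; taking the largest absolute deflection is an assumption here, as are the function name and the toy waveform.

```python
def peak_in_window(times_ms, amplitudes, start_ms=400.0, end_ms=700.0):
    """Return (time_ms, amplitude) of the largest absolute deflection in the window."""
    window = [(t, a) for t, a in zip(times_ms, amplitudes)
              if start_ms <= t <= end_ms]
    if not window:
        raise ValueError("no samples fall inside the window")
    return max(window, key=lambda pair: abs(pair[1]))

# Toy waveform sampled every 100 ms (values are made up).
times_ms = list(range(0, 1200, 100))
amplitudes = [0.0, 0.5, -0.2, 1.0, 1.5, -3.0, 2.0, 0.5, 4.0, 0.0, 0.0, 0.0]

peak_time, peak_amp = peak_in_window(times_ms, amplitudes)
print(peak_time, peak_amp)  # 500 -3.0 (the 4.0 at 800 ms lies outside the window)
```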

Successful decoding of faces from mean spatiotemporal responses starting from 4 months of age.
(A) An illustration of the winner-takes-all (WTA) leave-one-out cross-validation (LOOCV) classifier applied to the mean spatiotemporal response patterns of each category. Spatiotemporal patterns of response for each category are generated by concatenating the mean time courses from N-1 infants from three ROIs: left occipitotemporal (LOT), occipital (OCC), and right occipitotemporal (ROT). At each iteration, we train the classifier with the mean spatiotemporal patterns of each category from N-1 infants, and test how well it predicts the category the left-out infant is viewing from their spatiotemporal brain response. The WTA classifier assigns the category whose training vector has the highest correlation with the test vector. (B) Mean decoding accuracies across all five categories in each age group. Asterisks: significant decoding above chance level (p < .01, Bonferroni corrected, one-tailed). (C) Percentage of infants in each age group we could successfully decode for each category. Dashed lines: chance level.
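The leave-one-out, winner-takes-all scheme in (A) can be sketched on synthetic data. This is an illustration under stated assumptions, not the authors' code: the function names, the sinusoidal category templates, the noise level, and the four simulated infants are all hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(vals) / len(vals) for vals in zip(*vectors)]

def loocv_wta(data):
    """data[infant][category] -> pattern vector. Returns decoding accuracy per infant."""
    infants = list(data)
    categories = list(data[infants[0]])
    accuracies = {}
    for held_out in infants:
        # Training patterns: mean over the N-1 remaining infants, per category.
        train = {c: mean_vector([data[i][c] for i in infants if i != held_out])
                 for c in categories}
        correct = 0
        for true_cat in categories:
            test_vec = data[held_out][true_cat]
            # Winner-takes-all: pick the category with the highest correlation.
            predicted = max(categories, key=lambda c: pearson(train[c], test_vec))
            if predicted == true_cat:
                correct += 1
        accuracies[held_out] = correct / len(categories)
    return accuracies

# Synthetic demo: each category has a distinct template plus small noise.
rng = random.Random(0)
categories = ["faces", "limbs", "corridors", "characters", "cars"]
n_samples = 40
templates = {c: [math.sin(2 * math.pi * (i + 1) * k / n_samples)
                 for k in range(n_samples)]
             for i, c in enumerate(categories)}
data = {f"infant{j}": {c: [v + rng.gauss(0, 0.1) for v in templates[c]]
                       for c in categories}
        for j in range(4)}

accuracies = loocv_wta(data)
mean_accuracy = sum(accuracies.values()) / len(accuracies)
chance_level = 1 / len(categories)  # 0.2 for five categories
print(mean_accuracy, chance_level)
```

With five categories, chance is 20%, which is the reference against which decoding accuracy is judged.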

Individual split-half spatiotemporal pattern analyses reveal category information slowly emerges in the visual cortex after 6 months of age.
(A) Representation similarity matrices (RSM) generated from odd/even split-halves of the spatiotemporal patterns of responses in individual infants. Spatiotemporal patterns for each category are generated by concatenating the mean time courses of each of 23 electrodes across left occipitotemporal (LOT), occipital (OCC), and right occipitotemporal (ROT). (B) Top panel: Category distinctiveness calculated for each infant and category by subtracting the mean between-category correlation values from the within-category correlation value. Bottom panel: Distinctiveness as a function of age; panels by category. Each dot is a participant; dots are colored by age group. Red line: linear mixed model (LMM) estimates of distinctiveness as a function of log10(age). Shaded area: 95% CI.
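The split-half RSM and the distinctiveness measure in (B) can be sketched as below. The function names and the noise-free toy patterns are hypothetical. Averaging the between-category correlations over both the row and the column of the RSM is an assumption, since the caption does not specify which off-diagonal cells enter the mean.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_rsm(odd, even, categories):
    """rsm[a][b]: correlation of the odd-half pattern of a with the even-half pattern of b."""
    return {a: {b: pearson(odd[a], even[b]) for b in categories}
            for a in categories}

def distinctiveness(rsm, categories):
    """Within-category correlation minus the mean between-category correlation."""
    result = {}
    for c in categories:
        within = rsm[c][c]
        between = ([rsm[c][o] for o in categories if o != c]
                   + [rsm[o][c] for o in categories if o != c])
        result[c] = within - sum(between) / len(between)
    return result

# Toy patterns: identical odd/even halves, mutually uncorrelated across categories.
categories = ["faces", "limbs", "cars"]
n_samples = 20
odd = {c: [math.sin(2 * math.pi * (i + 1) * k / n_samples) for k in range(n_samples)]
       for i, c in enumerate(categories)}
even = {c: list(odd[c]) for c in categories}

rsm = split_half_rsm(odd, even, categories)
scores = distinctiveness(rsm, categories)
print(scores)  # each value is close to 1.0 for these idealized patterns
```

A distinctiveness near zero would instead indicate that a category's pattern is no more reproducible across halves than it is similar to other categories.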

Selective responses to items of a category and distinctiveness in distributed patterns develop at different times during the first year of life.
Blue arrows: presence of significant mean ROI category-selective responses in lateral occipital ROIs, combining results of analyses in the frequency and time domains. Yellow arrows: presence of significantly above zero distinctiveness in the distributed spatiotemporal response patterns across occipital and lateral occipital electrodes.

Linear mixed models (LMMs)

Demographic Information.

Average number of valid epochs summed across all five categories for each age group before and after data pre-processing.

Mean (±SD) values of contrast, luminance and similarity metrics across images within each of the five stimulus categories.

Robust visual and categorical responses recorded over occipitotemporal and occipital cortex in 20 adults.
(A) Left panel: spatial distribution of visual response at 4.286 Hz and harmonic. Harmonic frequencies are indicated on the top. Right panel: mean Fourier amplitude spectrum across 14 electrodes in the occipitotemporal and 9 electrodes in the occipital ROIs. Error bars: standard error of the mean across participants. Black bars: image frequency and harmonics; Purple bars: category frequency and harmonics; Asterisks: response amplitude significantly different from zero at pFDR < .05. (B) Left: spatial distribution of visual responses in the 145-155 ms time window. Right: Mean visual responses over two ROIs in the time domain. Waveforms are shown for a time window of 233 ms, during which one image is shown. Shaded area: standard error of the mean across participants. Black line at around y = -1.5: stimulus onset duration. To define time windows in which amplitudes were significantly different from zero, we used a cluster-based nonparametric permutation t-test (1000 permutations, with a threshold of p < 0.05, two-tailed) on the post-stimulus onset time-points (0-1167 ms) ⁶,⁷.
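A cluster-based permutation test of a waveform against zero can be sketched as follows. This illustrates one common variant (sign-flipping each participant's waveform and comparing summed t-values of contiguous supra-threshold clusters), which is an assumption and not necessarily the procedure in the cited references. The function names, the synthetic data, and the critical t of 2.093 (two-tailed p < .05 at 19 degrees of freedom) are likewise illustrative.

```python
import math
import random

def t_values(data):
    """One-sample t statistic at each time point; data[participant][time]."""
    n = len(data)
    out = []
    for samples in zip(*data):
        m = sum(samples) / n
        var = sum((s - m) ** 2 for s in samples) / (n - 1)
        out.append(m / math.sqrt(var / n) if var > 0 else 0.0)
    return out

def clusters(tvals, t_crit):
    """Contiguous same-sign runs with |t| > t_crit, as (start, end, summed t)."""
    found, start = [], None
    for i, t in enumerate(tvals + [0.0]):  # trailing sentinel closes an open run
        inside = abs(t) > t_crit
        if inside and start is not None and (t > 0) != (tvals[start] > 0):
            found.append((start, i, sum(tvals[start:i])))  # sign change: split
            start = i
        elif inside and start is None:
            start = i
        elif not inside and start is not None:
            found.append((start, i, sum(tvals[start:i])))
            start = None
    return found

def cluster_permutation_test(data, t_crit, n_perm=1000, alpha=0.05, seed=0):
    """Return observed clusters whose mass exceeds the sign-flip null, as (start, end, p)."""
    rng = random.Random(seed)
    observed = clusters(t_values(data), t_crit)
    null_max = []
    for _ in range(n_perm):
        flipped = []
        for row in data:
            sign = rng.choice((-1.0, 1.0))  # flip a whole participant's waveform
            flipped.append([sign * s for s in row])
        masses = [abs(m) for _, _, m in clusters(t_values(flipped), t_crit)]
        null_max.append(max(masses, default=0.0))
    significant = []
    for start, end, mass in observed:
        p = sum(nm >= abs(mass) for nm in null_max) / n_perm
        if p < alpha:
            significant.append((start, end, p))
    return significant

# Synthetic demo: 20 participants, 30 time points, an effect at samples 10-19.
rng = random.Random(1)
data = [[(1.0 if 10 <= t < 20 else 0.0) + rng.gauss(0, 0.5) for t in range(30)]
        for _ in range(20)]
result = cluster_permutation_test(data, t_crit=2.093)
print(result)
```

Testing cluster mass against the permutation distribution of the maximum cluster mass is what controls for multiple comparisons across time points.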

Adult control: using the same amount of data as infants reveals strong category-selective responses in adults’ occipitotemporal cortex.
(A) Mean Fourier Amplitude Spectrum across 7 (left OT: 57, 58, 59, 63, 64, 65, 68; right OT: 90, 91, 94, 95, 96, 99) electrodes in bilateral occipitotemporal ROIs. Data are first averaged in each participant and then across 20 participants. Error bars: standard error of the mean across participants. Black bars: visual response at image frequency and harmonics; Colored bars: categorical response at category frequency and harmonics. Asterisks: response amplitude significantly different from zero at pFDR < .05 for category harmonics. Crosses: response amplitude significantly different from zero at p < .05 with no FDR correction. Black dots: ROI channels used in analysis. (B) Mean category-selective responses in the time domain. Data are averaged across electrodes of each of the left and right occipitotemporal ROIs in each participant and then across participants. Colored lines along x-axis at y = -1.5: significant deflections against zero (calculated with a cluster-based method, see Methods). Black line above x-axis: stimulus onset duration. Bottom panel: spatial distribution of category-selective responses in the 200-217 ms time window.

Adult control: using the same amount of data as infants reveals distributed category-selective responses in adults’ occipitotemporal cortex.
(A) Left: An illustration of the winner-takes-all (WTA) leave-one-out cross-validated (LOOCV) classifier using the spatiotemporal response patterns of each category. Spatiotemporal patterns of response for each category are generated by concatenating the mean time courses from each of the three ROIs: left occipitotemporal (LOT), occipital (OCC), and right occipitotemporal (ROT). At each iteration, we train the classifier with the mean spatiotemporal patterns of each category from N-1 participants, and test how well it predicts the category the left-out participant is viewing from their spatiotemporal brain response. The WTA classifier predicts the category based on the highest pairwise correlation between the training and testing vectors. Right: White: mean decoding accuracy across all five categories. In adults, this is significantly above chance level (t(19) = 15.4, p < .001). Colored: decoding accuracy per category. (B) Left: Average adult representation similarity matrix (RSM) for odd/even splits of spatiotemporal patterns of categorical responses over 23 electrodes in the LOT, OCC, and ROT ROIs. RSMs were generated in each participant and then averaged across all participants. Diagonal: correlation of distributed responses across different exemplars of the same category; Off-diagonal: correlations across different exemplars from different categories. Acronyms: F: faces; L: limbs; Corr: corridors; Char: characters; Car: cars.

Grating acuity as a function of age measured with a swept spatial frequency technique combined with EEG.
(A) Acuity growth functions are similar across studies, with acuity increasing from 5-8 cycles per degree (cpd) in 3-month-olds to around 10-16 cpd in 6-month-olds. This figure is adapted from ¹⁰.

Peak latency of visual responses by age and time-window.
(Window 1: 60-90 ms; window 2: 90-160 ms for 3-4-month-olds, and 90-110 ms for other groups.)
Formula: Peak latency ∼ 1 + log10(Age) × time window + (1|Participant)
Significant effects are indicated by asterisks.
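Written out, the model formula above corresponds to a linear mixed model with a random intercept per participant. The expansion below is a sketch under assumptions not stated in the source: that "×" denotes both main effects plus their interaction, that the time window is dummy-coded as W_j (0 for window 1, 1 for window 2), and that the random intercept and the residual follow the usual Gaussian assumptions. Here i indexes participants and j indexes time windows.

```latex
\text{Peak latency}_{ij} = \beta_0
  + \beta_1 \log_{10}(\text{Age}_i)
  + \beta_2 W_j
  + \beta_3 \log_{10}(\text{Age}_i)\, W_j
  + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under this reading, the interaction coefficient β₃ captures whether the change in latency with age differs between the two time windows.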

Peak latency of visual responses by age at each of the two time-windows
Formula: Peak latency ∼ 1 + log10(Age) + (1|Participant) Significant effects are indicated by asterisks.

Analysis of peak amplitude of visual responses by age and time-window
Formula: Peak amplitude ∼ 1 + log10(Age) × time window + (1|Participant) Significant effects are indicated by an asterisk.

Peak amplitude of visual responses by age at each of the two time-windows
Formula: Peak amplitude ∼ 1 + log10(Age) + (1|Participant) Significant effects are indicated by asterisks.

Limb responses emerge over occipitotemporal electrodes after 6 months of age.
Each panel shows mean responses at the category frequency (0.857 Hz) and its harmonics across infants in an age group. (A) 3-4-month-olds, n=17; (B) 4-6-month-olds, n=14; (C) 6-8-month-olds, n=15; (D) 12-15-month-olds, n=15. Left panels in each row: spatial distribution of categorical response at 0.857 Hz and its first harmonic. Harmonic frequencies are indicated on the top. Right two panels in each row: mean Fourier Amplitude Spectrum across 7 left occipitotemporal electrodes and 7 right occipitotemporal electrodes (shown in black on the left panel). Data are first averaged in each participant and then across participants. Error bars: standard error of the mean across participants in an age group. Asterisk: significant amplitude vs. zero (p < .05, FDR corrected). Cross: significant amplitude vs. zero (p < .05, with no FDR correction). Black bars: image frequency and harmonics; Colored bars: category frequency and harmonics.

Corridor responses emerge over occipitotemporal electrodes after 6 months of age.
Each panel shows mean responses at the category frequency and its harmonics across infants in an age group. (A) 3-4-month-olds, n=17; (B) 4-6-month-olds, n=14; (C) 6-8-month-olds, n=15; (D) 12-15-month-olds, n=15. Left panels in each row: spatial distribution of categorical response at 0.857 Hz and its first harmonic. Harmonic frequencies are indicated on the top. Right two panels in each row: mean Fourier Amplitude Spectrum across 7 left occipitotemporal electrodes and 7 right occipitotemporal electrodes (shown in black on the left panel). Data are first averaged in each participant and then across participants. Error bars: standard error of the mean across participants in an age group. Asterisks: significant amplitude vs. zero (p < .05, FDR corrected). Crosses: significant amplitude vs. zero (p < .05, with no FDR correction). Black bars: image frequency and harmonics; Colored bars: category frequency and harmonics.

Significant character responses found over occipitotemporal electrodes at 12-15 months of age.
Each panel shows mean responses at the category frequency (0.857 Hz) and its harmonics across infants in an age group. (A) 3-4-month-olds, n=17; (B) 4-6-month-olds, n=14; (C) 6-8-month-olds, n=15; (D) 12-15-month-olds, n=15. Left panels in each row: spatial distribution of categorical response at 0.857 Hz and its first harmonic. Harmonic frequencies are indicated on the top. Right two panels in each row: mean Fourier Amplitude Spectrum across 7 left occipitotemporal electrodes and 7 right occipitotemporal electrodes (shown in black on the left panel). Data are first averaged in each participant and then across participants. Error bars: standard error of the mean across participants in an age group. Asterisks: significant amplitude vs. zero (p < .05, FDR corrected). Black bars: image frequency and harmonics; Colored bars: category frequency and harmonics.

Significant car responses found over occipitotemporal electrodes at 3-4 months of age.
Each panel shows mean responses at the category frequency and its harmonics across infants in an age group. (A) 3-4-month-olds, n=17; (B) 4-6-month-olds, n=14; (C) 6-8-month-olds, n=15; (D) 12-15-month-olds, n=15. Left panels in each row: spatial distribution of categorical response at 0.857 Hz and its first harmonic. Harmonic frequencies are indicated on the top. Right two panels in each row: mean Fourier Amplitude Spectrum across 7 left occipitotemporal electrodes and 7 right occipitotemporal electrodes (shown in black on the left panel). Data are first averaged in each participant and then across participants. Error bars: standard error of the mean across participants in an age group. Asterisk: significant amplitude vs. zero (p < .05, FDR corrected). Cross: significant amplitude vs. zero (p < .05, with no FDR correction). Black bars: image frequency and harmonics; Colored bars: category frequency and harmonics.

Analysis of peak amplitude of waveforms of category responses by age and category.
Separate LMMs were fit for the left occipitotemporal (OT) and right OT ROIs.
Formula: Peak amplitude ∼ 1 + log10(Age) × category + (1|Participant); Peak latency ∼ 1 + log10(Age) × category + (1|Participant).
Significant effects are indicated by an asterisk.

Analysis of peak amplitude of waveforms of category responses for each category in the right OT ROI.
Formula: Peak amplitude ∼ age + (1|participant) Significant effects are indicated by an asterisk.

Illustration of the winner-takes-all (WTA) classifier.
In each individual, the time series data are split into odd and even trials. We concatenate the time series data from 23 electrodes in the left occipitotemporal, occipital, and right occipitotemporal ROIs into a pattern vector for each split half and each condition. The classifier is trained on one half of the data (i.e., odd or even trials) and tested on how well it predicts the other half (i.e., even or odd trials) for each individual. The bottom shows the representation similarity matrix (RSM) in an example infant. Each cell indicates the correlation between distributed responses to different images of the same (on diagonal) or different (off diagonal) categories. F: Faces; L: Limbs; Corr: Corridors; Char: Characters; Car: Cars.
