Abstract
Organizing the continuous stream of visual input into categories like places or faces is important for everyday function and social interactions. However, it is unknown when neural representations of these and other visual categories emerge. Here we used steady state evoked potential electroencephalography to measure cortical responses in infants at 3-4 months, 4-6 months, 6-8 months, and 12-15 months, when they viewed controlled, gray-level images of faces, limbs, corridors, characters, and cars. We found that distinct responses to these categories emerge at different ages. Reliable brain responses to faces emerge first, at 4-6 months, followed by limbs and places around 6-8 months. Between 6-15 months response patterns become more distinct, such that a classifier can decode what an infant is looking at from their brain responses. These findings have important implications for assessing typical and atypical cortical development as they not only suggest that category representations are learned, but also that representations of categories that may have innate substrates emerge at different times during infancy.
Introduction
Visual categorization is important for everyday activities and is amazingly rapid: adults categorize the visual input in about one-tenth of a second1,2. In adults and school-age children, this key behavior is supported by both clustered and distributed responses to visual categories in high-level visual cortex in ventral and lateral temporal cortex (VTC and LOTC, respectively)3,4. A visual category consists of items that share common visual features and configurations2,5–8; e.g., corridors share features of floors, walls, and ceilings, with a typical spatial relationship. Clustered regions in VTC and LOTC4,9,10 respond more strongly to items of ecologically-relevant categories (faces, bodies, places, words) than other stimuli5,11–16 and distributed neural responses across VTC and LOTC4,9,10 are reliable across items of a category but distinct across items of different categories. However, it is unknown when these visual category representations emerge in infants’ brains.
Behaviorally, infants can perform some level of visual categorization within the first year of life. Measurements of infants’ looking preferences and looking times suggest that visual saliency impacts young infants’ viewing patterns17: between 4-10 months of age, infants can behaviorally distinguish between faces and objects17,18 and between different animals like cats and dogs19,20. Later on, between 10-19 months, infants behaviorally distinguish broader-level animate versus inanimate categories17. Neurally, electroencephalographic (EEG) studies have found stronger responses to images of faces vs. objects or textures in 4-12-month-olds21–23 and that stimulus category can be decoded from distributed responses slightly but significantly above chance in 6-15-month-olds24,25. Functional magnetic resonance imaging (fMRI) studies have found stronger responses to videos of faces26,27, bodies27, and places26,27 vs. objects in clustered regions in VTC and LOTC of 2-10-month-olds. However, because prior studies used different types of stimuli and age ranges, it is unknown when representations to various categories emerge during the first year of life. To address this key open question, we examined when neural representations to different visual categories emerge during infancy using EEG in infants of 4 age groups spanning 3-15 months of age.
We considered two main hypotheses regarding the developmental trajectories of category representations. One possibility is that representations to multiple categories emerge together because infants need to organize the barrage of visual input to understand what they see. Supporting this hypothesis are findings of (i) selective responses to faces, places, and body parts in VTC and LOTC of 2-10-month-olds27, and (ii) above chance classification of distributed EEG responses to toys, bodies, faces, and houses in 6-8-month-olds25 as well as animals and body parts in 12-15-month-olds24.
Another possibility is that representations of different categories may emerge at different times during infancy. This may be due to two reasons. First, representations of ecologically-relevant categories like faces, body parts, and places may be innate because of their evolutionary importance28–31, whereas representations for other categories may develop later only with learning5,10,32. Supporting this hypothesis are findings that newborns and young infants tend to orient to faces33 and face-like stimuli34, as well as have cortical responses to face-like stimuli35, but word representations only emerge in childhood with the onset of reading instruction5,10,36. Second, even if visual experience is necessary for the development of category representations (including faces37–40), categories that are seen more frequently earlier in infancy may develop before others. Measurements using head mounted cameras suggest that infants’ visual diet (composition of visual input) varies across categories and age: The visual diet of 0-3-month-olds contains ∼25% faces and <10% hands, that of 12-15-month-olds contains ∼20% faces and ∼20% hands41,42, and that of 24-month-olds contains ∼10% faces and ∼25% hands. Thus, looking behavior in infants predicts that representations of faces may emerge before those of limbs.
Results
Forty-five infants from four age groups: 3-4 months (n = 17, 7 females), 4-6 months (n = 14, 7 females), 6-8 months (n = 15, 6 females), and 12-15 months (n = 15, 4 females) participated in EEG experiments. Twelve participants were part of an ongoing longitudinal study and came for several sessions spanning at least 3 months apart. Infants viewed gray-scale images from five visual categories present in infants’ environments (faces, limbs, corridors, characters, and cars) while EEG was recorded. Different from prior infant studies21,23–27, we used images that have been widely used in fMRI studies5,7,43–45 and are largely controlled for low-level properties such as luminance, contrast, similarity, and spatial frequency (Fig 1B and Supplementary Table S3). We used a Steady-State Visual Evoked Potential23,46–48 (SSVEP) paradigm: In each 70-s sequence, images from five categories were shown every 0.233 s; one of the categories was the target, so different images from that category appeared every 1.167 s, and the rest of the images were drawn from the other four categories in a random order (Fig 1A). Images of all categories appeared at equal probability and no images were repeated8. Infants participated in 5 conditions, which varied by the target category. We used the EEG-SSVEP approach because: (i) it affords a high signal-to-noise ratio with short acquisitions, making it effective for infants23,46, (ii) it has been successfully used to study responses to faces in infants23,46,49, and (iii) it enables measuring both general visual response to images by examining responses at the image presentation frequency (4.286 Hz), as well as category-selective responses by examining responses at the category frequency (0.857 Hz, Fig 1A).
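The timing of the paradigm fixes the two frequencies of interest. As a worked check of the arithmetic, the sketch below derives the image and category frequencies, their harmonics, and the number of images per sequence. It assumes that the rounded 0.233-s image duration corresponds to exactly 7/30 s (which reproduces the reported 4.286 Hz); the variable and function names are illustrative and not taken from the study's code.

```python
from fractions import Fraction

# Assumption: the rounded 0.233 s in the text is exactly 7/30 s.
IMAGE_PERIOD_S = Fraction(7, 30)
IMAGES_PER_CYCLE = 5      # the target category appears every fifth image
SEQUENCE_S = 70           # duration of one sequence in seconds

image_freq_hz = 1 / IMAGE_PERIOD_S                      # 30/7 Hz
category_freq_hz = image_freq_hz / IMAGES_PER_CYCLE     # 6/7 Hz
images_per_sequence = SEQUENCE_S / IMAGE_PERIOD_S       # images shown in 70 s
category_cycles = SEQUENCE_S * category_freq_hz         # target presentations in 70 s


def harmonics(fundamental_hz, n):
    """Return the fundamental followed by its next n - 1 integer multiples."""
    return [fundamental_hz * k for k in range(1, n + 1)]


# Fundamental plus first three harmonics, rounded as in the text.
image_harmonics = [round(float(f), 3) for f in harmonics(image_freq_hz, 4)]
category_harmonics = [round(float(f), 3) for f in harmonics(category_freq_hz, 4)]

print(image_harmonics)      # [4.286, 8.571, 12.857, 17.143]
print(category_harmonics)   # [0.857, 1.714, 2.571, 3.429]
print(images_per_sequence, category_cycles)   # 300 60
```

Note that every fifth multiple of the category frequency coincides with a multiple of the image frequency, which is why the two sets of harmonics need to be kept apart when quantifying responses.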
As the EEG-SSVEP paradigm is novel and we are restricted in the amount of data we can obtain in infants, we first tested if we can use this paradigm and a similar amount of data to detect category-selective responses in adults. Results show that in adults we can (i) reliably measure category-selective responses (Supplementary Figs S1–S2) and (ii) decode category information from distributed spatiotemporal response patterns (Supplementary Fig S3) with the same amount of data that we obtained in infants. This validates the SSVEP paradigm for measuring category-selectivity. As infants have lower cortical visual acuity, we also tested if the stimuli are distinguishable to infants. Thus, we simulated how they may look to infants by filtering the images to match the cortical acuity of 3-month-olds (Supplementary Fig S4). Despite being blurry, images of different categories are readily distinguishable by adults (Supplementary Movies 1-5), suggesting that there is sufficient visual information in the lower spatial frequencies of the stimuli for infants to distinguish visual categories.
Robust visual responses in occipital regions to visual stimuli in all infant age groups
We first tested if there are significant visual responses to our stimuli in infants’ brains by evaluating the amplitude of responses at the image presentation frequency (4.286 Hz) and its first three harmonics. We found that in all age groups, visual responses were concentrated spatially over occipital electrodes (Fig 2A-D, left panel, Figure S1A). Quantification of the mean visual response amplitude over a region of interest (ROI) spanning 9 electrodes over early visual cortex revealed significant responses in all infant age groups at the image frequency and its first 3 harmonics (response amplitudes significantly above zero with FDR corrected at 4 levels; except for the first harmonic at 8.571 Hz in 6-8-month-olds; Fig 2A-D, right panel). We also tested if experimental noise varied across age groups. Noise level was estimated in the occipital electrodes by measuring the amplitude of response in frequencies up to 8.571 Hz excluding image presentation frequencies (4.286 Hz and harmonics) and category frequencies (0.857 Hz and harmonics) as this frequency range includes the relevant main harmonics. We found no significant difference in noise across age groups (Fig 2E). These analyses indicate that infants were looking at the stimuli, as there are significant visual responses even in the youngest 3-4-month-old infants, and there are no significant differences in noise levels across infants of different ages.
Prior EEG data21,50 suggest that the timing and waveform of visual responses may vary across development. To complement the frequency domain analysis, we transformed the responses at image frequency and its harmonics to the time domain using an inverse Fourier transformation for two reasons. First, the time-domain provides access to information about response timing and waveform that is not directly accessible from an analysis of responses of individual harmonics. Second, the total visual response is better reflected in the time-domain as the individual harmonic amplitudes can sum constructively.
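To illustrate the logic of this transformation, the sketch below extracts complex Fourier coefficients at the image frequency and its first three harmonics from a synthetic signal and sums only those components back into a time-domain waveform spanning one image period. It is a minimal illustration rather than the analysis pipeline used here: the sampling rate (420 Hz), the 7-s duration, the synthetic signal, and all function names are assumptions chosen so that the harmonics complete an integer number of cycles.

```python
import cmath
import math

SAMPLE_RATE_HZ = 420              # hypothetical sampling rate
IMAGE_FREQ_HZ = 30 / 7            # ~4.286 Hz, as in the text
N_SAMPLES = 7 * SAMPLE_RATE_HZ    # 7 s of data, i.e. 30 full image cycles


def fourier_coefficient(signal, freq_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Complex Fourier coefficient of `signal` at a single frequency."""
    n = len(signal)
    return sum(
        x * cmath.exp(-2j * math.pi * freq_hz * t / sample_rate_hz)
        for t, x in enumerate(signal)
    ) / n


def reconstruct(coefficients, n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Sum the selected frequency components into a real waveform."""
    waveform = []
    for t in range(n_samples):
        value = 0.0
        for freq_hz, coef in coefficients.items():
            phasor = cmath.exp(2j * math.pi * freq_hz * t / sample_rate_hz)
            # Factor 2 accounts for the matching negative-frequency term.
            value += 2 * (coef * phasor).real
        waveform.append(value)
    return waveform


# Synthetic signal: two image-rate components plus a category-rate component.
signal = [
    2.0 * math.cos(2 * math.pi * IMAGE_FREQ_HZ * t / SAMPLE_RATE_HZ)
    + 1.0 * math.cos(2 * math.pi * 2 * IMAGE_FREQ_HZ * t / SAMPLE_RATE_HZ + 0.5)
    + 0.8 * math.cos(2 * math.pi * (IMAGE_FREQ_HZ / 5) * t / SAMPLE_RATE_HZ)
    for t in range(N_SAMPLES)
]

image_harmonics = [IMAGE_FREQ_HZ * k for k in range(1, 5)]
coefficients = {f: fourier_coefficient(signal, f) for f in image_harmonics}
amplitudes = [2 * abs(coefficients[f]) for f in image_harmonics]

# One image period (7/30 s) is 98 samples at the assumed sampling rate.
waveform = reconstruct(coefficients, n_samples=98)
```

Because only the image-rate components are retained, the category-rate component of the synthetic signal does not contribute to the reconstructed waveform, and the two retained components add at time points where their phases align.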
We observed that during the 233-ms image presentation, temporal waveforms had two deflections in 3-4-month-olds (one negativity and one positivity, Fig 2F) and four deflections for infants older than 4 months (two minima and two maxima, Fig 2F). To evaluate developmental effects, we examined the latency and amplitude of the peak visual response during two time windows related to the first deflection (60-90 ms), and second deflection (90-160 ms for 3-4-month-olds, and 90-110 ms for other age groups). In general, we find that the latency of the peak deflection decreased from 3 to 15 months (Fig 2G, H). As data includes both cross-sectional and longitudinal measurements and we observed larger development earlier than later in infancy, we used a linear mixed model (LMM) to model peak latency as a function of the logarithm of age (see Methods). Results reveal that the latency of the peak deflection significantly and differentially decreased with age in the two time-windows (βage x time window = -45.78, 95% CI: -58.39 – -33.17, t(118) = -7.19, p = 6.39 × 10⁻¹¹; LMM with age and time window as fixed effects, and participant as a random effect, all stats in Supplementary Tables S4-S5). There were larger decreases in the peak latency in the second than first time window (Fig 2G, H, first: βage = -7.44, 95% CI: -13.82 – -1.06, t(118) = -2.33, pFDR < .05; second: βage = -46.91, 95% CI: -56.56 – -37.27, t(59) = -9.73, pFDR < .001). Peak amplitude also differentially develops across the two windows (βage x time window = -4.90, 95% CI: -8.66 – -1.14, t(118) = -2.58, p = .01, Supplementary Tables S6-S7). The decrease in peak amplitude with age was significant only for the second deflection (βage = -3.59, 95% CI: -6.38 – -0.81, t(59) = -2.58, p = .01, LMM). These data suggest that the temporal dynamics of visual responses over occipital cortex develop from 3 to 15 months of age.
What is the nature of category-selective responses in infants?
We next examined if, in addition to visual responses to the rapid image stream, there are also category-selective responses in infants, by evaluating the amplitude of responses at the category frequency (0.857 Hz) and its harmonics. This is a selective response as it reflects the relative response to images of a category above the general visual response. Fig 3 shows the spatial distribution and amplitude of the mean category response for faces and its harmonics in each age group. Mean category-selective responses to limbs, cars, corridors, and characters are shown in Supplementary Figs S5-S8. We analyzed mean responses over two ROIs spanning 7 electrodes each over left and right occipitotemporal cortex where high-level visual regions are located51.
We found significant group-level category responses to some but not all categories and a differential development of category-selective responses during infancy. The largest and earliest developing category-selective responses were to faces. In contrast to visual responses, which were centered over occipital electrodes (Fig 2A-D, left panel), significant categorical responses to faces (at 0.857 Hz and its first harmonic, 1.714 Hz) were observed over lateral occipitotemporal electrodes (Fig 3A-D, left panel). Notably, there were significant responses to faces over bilateral occipitotemporal electrodes in 4-6-month-olds at 0.857 Hz (Fig 3B, response amplitudes significantly above zero with Hotelling’s T² statistic, pFDR < .05, FDR corrected over 2 levels: the category frequency and its first harmonic), as well as 6-8-month-olds and 12-15-month-olds at the category frequency and its first harmonic (Fig 3C,D, both pFDR < .05). However, there were no significant responses to faces in 3-4-month-olds at either the category frequency or its harmonics (Fig 3A, right panel). These data suggest that face-selective responses start to reliably emerge over lateral occipitotemporal cortex between 4 and 6 months of age.
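As an illustration of how a response at one frequency can be tested against zero with Hotelling’s T², the sketch below treats the real and imaginary parts of each participant’s complex Fourier coefficient as a bivariate observation and tests whether the group mean differs from the origin. This bivariate treatment is an assumption about how the statistic is applied, and the function name and example values are hypothetical. The p-value uses the closed-form survival function of the F distribution with 2 numerator degrees of freedom.

```python
def hotelling_t2_vs_zero(coefficients):
    """One-sample Hotelling's T2 test that the mean complex value is zero.

    coefficients: list of complex numbers, one per participant.
    Returns (t2, f_stat, p_value).
    """
    n = len(coefficients)
    if n < 3:
        raise ValueError("need at least 3 participants")
    xs = [c.real for c in coefficients]
    ys = [c.imag for c in coefficients]
    mx, my = sum(xs) / n, sum(ys) / n

    # Unbiased sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")

    # T2 = n * m' S^-1 m, with the 2x2 inverse written out explicitly.
    t2 = n * (syy * mx * mx - 2 * sxy * mx * my + sxx * my * my) / det

    p_dims = 2
    f_stat = (n - p_dims) / (p_dims * (n - 1)) * t2
    # Survival function of F(2, n - 2): (1 + 2F / (n - 2)) ** (-(n - 2) / 2).
    p_value = (1 + 2 * f_stat / (n - 2)) ** (-(n - 2) / 2)
    return t2, f_stat, p_value


# Hypothetical coefficients: scattered around zero vs. shifted away from zero.
centered = [1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
shifted = [c + (5 + 5j) for c in centered]

t2_centered, _, p_centered = hotelling_t2_vs_zero(centered)
t2_shifted, f_shifted, p_shifted = hotelling_t2_vs_zero(shifted)
```

In this toy example the centered coefficients give T² = 0 (p = 1), whereas the shifted coefficients, which share a consistent amplitude and phase across participants, give T² = 150 and p = 1/51.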
We did not find significant group-level category-selective responses that survived FDR correction to any of the other categories before 6 months of age (Supplementary Fig S8, except for a weak but statistically significant response for cars in the right occipitotemporal ROI in 3-4-month-olds). Instead, we found significant category-selective responses that survived FDR correction for (i) limbs in 6-8-month-olds in the right occipitotemporal ROI (Supplementary Fig S5), (ii) corridors in 6-8-month-olds and 12-15-month-olds in the left occipitotemporal ROI (Supplementary Fig S6), and (iii) characters in 6-8-month-olds in the right occipitotemporal ROI, and in 12-15-month-olds in bilateral occipitotemporal ROIs (Supplementary Fig S7).
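For readers unfamiliar with FDR correction, the sketch below shows the Benjamini-Hochberg step-up adjustment of a set of p-values. The text states only that p-values were FDR corrected; the choice of the Benjamini-Hochberg procedure, the function name, and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical uncorrected p-values for four tests.
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adjusted_p]
```

With these example values, only the first test remains below .05 after adjustment (0.04), while the second and third are both adjusted to about .053.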
We evaluated the temporal dynamics of category-selective waveforms by transforming the data at the category frequency and its harmonics to the time domain. This analysis was done separately for each of the left and right occipitotemporal ROIs for each category and age group. Consistent with frequency domain analyses, average temporal waveforms over lateral-occipital ROIs show significant responses to faces that emerge at ∼4 months of age (Fig 4A, significant responses relative to zero, cluster-based nonparametric permutation test with 10,000 permutations, two-tailed t-test, at p < .05). The temporal waveforms of responses to faces in infants show an initial positive deflection peaking ∼500 ms after stimulus onset followed by a negative deflection peaking at ∼900 ms. Notably, mean waveforms associated with limbs, corridors, and characters in lateral occipital ROIs are different from faces: there is only a single negative deflection, which peaks at ∼500 ms after stimulus onset and is significant only in 6-8 and 12-15-month-olds (Fig 4B-D). There was no significant category response to cars in infants except for a late (∼1000 ms) positive response in 4-6-month-olds (Fig 4E). These results show that both the timing and waveform differ across categories, which suggests that there might be additional category information in the distributed spatiotemporal response.
We next examined the development of the peak response and latency of the category waveforms separately for the right and left lateral occipitotemporal ROIs. We found significant development of the peak response in the right lateral occipitotemporal ROI which varied by category (βcategory x age = -1.09, 95% CI: -2.00 – -0.14, t(301) = -2.26, pFDR < .05, LMM as a function of log(age) and category; participant: random effect). Post-hoc analyses revealed that the peak response for faces significantly increased from 3 to 15 months (Fig 4A-right, βage = 7.27, 95% CI: 4.03 – 10.51, t(59) = 4.50, pFDR < .05, LMM as a function of log(age); participant: random effect) and the peak response for limbs significantly decreased (Fig 4B-right, βage = -2.90, 95% CI: -5.41 – -0.38, t(59) = -2.31, p = .02, not significant after FDR correction over 5 category levels). There were no other significant developments of peak amplitude (Supplementary Tables S8-S9).
Additionally, for all categories, the latency of the peak response in the right occipitotemporal ROI significantly decreased from 3 to 15 months of age (βage = -173.17, 95% CI: -284.73 – -61.61, t(301) = -3.05, p = .002, LMM as a function of log(age) and category; participant: random effect). We found no significant development of peak latency in the left occipitotemporal ROI (Supplementary Tables S8-S9).
Are spatiotemporal patterns of responses to visual categories consistent across infants?
As we observed different mean waveforms over the lateral occipital ROIs for the five categories (Fig 4), we asked whether the distributed spatiotemporal patterns of brain responses evoked by each category are unique and reliable across infants. We reasoned that if different categories generated consistent distributed spatiotemporal responses, an independent classifier would be able to predict the category an infant was viewing from their distributed spatiotemporal pattern of response. Thus, we used a leave-one-out cross-validation (LOOCV) approach (Fig 5A) and tested if a classifier can decode the category a left-out infant viewed based on the similarity of their distributed spatiotemporal response to the mean response to each of the categories in the remaining N-1 infants. We calculated for each infant the mean category-waveform (same as Fig 4) across the occipital and lateral occipitotemporal ROIs and concatenated the waveforms across the three ROIs to generate the distributed spatiotemporal response to a category (Fig 5A). The classifier was trained and tested separately for each age group.
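The leave-one-out logic can be sketched as follows. This is a minimal illustration under stated assumptions, not the classifier used in the study: similarity is taken to be the Pearson correlation, the predicted category is the one whose group-mean pattern correlates most strongly with the left-out infant’s pattern, and the data are synthetic waveforms with hypothetical names and sizes.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def mean_pattern(patterns):
    """Element-wise mean of several equal-length patterns."""
    return [sum(column) / len(patterns) for column in zip(*patterns)]


def loocv_scores(data):
    """data[infant][category] -> pattern; returns per-infant accuracy."""
    infants = list(data)
    categories = list(data[infants[0]])
    scores = {}
    for left_out in infants:
        training = [i for i in infants if i != left_out]
        # Templates are built only from the remaining N-1 infants.
        templates = {
            c: mean_pattern([data[i][c] for i in training]) for c in categories
        }
        n_correct = 0
        for true_category in categories:
            pattern = data[left_out][true_category]
            predicted = max(categories, key=lambda c: pearson(pattern, templates[c]))
            n_correct += predicted == true_category
        scores[left_out] = n_correct / len(categories)
    return scores


# Synthetic data: each category has its own prototype waveform plus noise.
random.seed(0)
CATEGORIES = ["faces", "limbs", "corridors", "characters", "cars"]
N_TIMEPOINTS = 60
prototypes = {
    c: [math.sin(2 * math.pi * (k + 1) * t / N_TIMEPOINTS) for t in range(N_TIMEPOINTS)]
    for k, c in enumerate(CATEGORIES)
}
data = {
    f"infant_{i}": {
        c: [v + random.gauss(0, 0.3) for v in prototypes[c]] for c in CATEGORIES
    }
    for i in range(6)
}
scores = loocv_scores(data)
mean_accuracy = sum(scores.values()) / len(scores)
```

With five categories, chance performance is 20%. Because the templates never include the left-out infant’s data, above-chance accuracy indicates that the patterns are consistent across infants rather than idiosyncratic.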
Results reveal two main findings. First, the LOOCV classifier decoded category information from brain responses significantly above the 20% chance level in infants aged 6 months and older but not in younger infants (Fig 5B, 6-8-month-olds, significant above chance: t(14) = 4.1, pFDR < .01, one-tailed, FDR corrected over 4 age groups; 12-15-month-olds, t(14) = 3.4, pFDR < .01). This suggests that spatiotemporal patterns of responses to different categories become reliable across infants after 6 months of age. Second, examination of classification by category shows that the LOOCV classifier successfully determined from spatiotemporal responses when infants were viewing faces in 64% of 4-6-month-olds, in 93% of 6-8-month-olds, and 87% of 12-15-month-olds (Fig 5C). In contrast, classification performance was numerically lower for the other categories (successful classification in less than 40% of the infants). This suggests that a reliable spatiotemporal response to faces that is consistent across infants develops after 4 months of age and dominates classification performance.
What is the nature of categorical spatiotemporal patterns in individual infants?
While the prior analyses leverage the power of averaging across electrodes and infants, this averaging does not provide insight into fine-grained neural representations within individual infants. To examine the finer-grain representation of category information within each infant’s brain, we examined the distributed spatiotemporal responses to each category across the 23 electrodes spanning the left and right occipitotemporal cortex in each infant. We tested: (i) if categorical representations in an infant’s brain are reliable across different images of a category, and (ii) if category representations become more distinct during the first year of life. We predicted that if representations become more similar across items of a category and more dissimilar between items of different categories, then category distinctiveness (defined as the difference between mean within-category and between-category similarity) would increase from 3 to 15 months of age.
To examine the representational structure, we calculated representational similarity matrices (RSMs) across odd/even split-halves of the data in each infant. Each cell in the RSM quantifies the similarity between two spatiotemporal patterns: On-diagonal cells of the RSM quantify the similarity of distributed spatiotemporal responses to different images from the same category and off-diagonal cells quantify the similarity of spatiotemporal responses to images from different categories. Categorical structure will manifest in RSMs as positive on-diagonal values, indicating reliable within-category spatiotemporal responses, which are higher than off-diagonal between-category similarity (Fig 6 and Supplementary Fig S3B).
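The split-half RSM and the distinctiveness measure derived from it can be sketched as follows. The sketch assumes that similarity is the Pearson correlation between patterns and that between-category similarity for a category is the mean of the off-diagonal cells in its row and column; both are assumptions for illustration, as are the function names and the toy patterns.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def split_half_rsm(odd, even):
    """rsm[a][b]: similarity of the odd-half pattern for category a
    to the even-half pattern for category b."""
    categories = list(odd)
    return {a: {b: pearson(odd[a], even[b]) for b in categories} for a in categories}


def distinctiveness(rsm, category):
    """Within-category similarity minus mean between-category similarity."""
    others = [c for c in rsm if c != category]
    within = rsm[category][category]
    between = sum(rsm[category][c] + rsm[c][category] for c in others) / (2 * len(others))
    return within - between


# Toy patterns for one hypothetical infant (odd and even halves of the data).
odd = {"faces": [1, 2, 3, 4, 5], "cars": [5, 4, 3, 2, 1], "limbs": [1, 3, 1, 3, 1]}
even = {"faces": [1, 2, 3, 4, 6], "cars": [6, 4, 3, 2, 1], "limbs": [1, 3, 1, 3, 2]}

rsm = split_half_rsm(odd, even)
face_distinctiveness = distinctiveness(rsm, "faces")
```

A positive distinctiveness value means that the response pattern to a category replicates across independent halves of the data more strongly than it resembles the patterns evoked by other categories.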
Examination of mean RSMs in each age group reveals no reliable category information in individual 3-4-month-olds or 4-6-month-olds, as within-category similarity is not significantly above zero (Fig 6A, 3-4-month-olds: on-diagonal, -0.03 ± 0.06, p = .96, one-tailed; 4-6-month-olds: on-diagonal: 0.009 ± 0.11, p = .38). However, starting around 6 months some category structure emerges in the RSMs. In particular, distributed responses to faces become reliable as within-category similarity for faces is significantly above zero in 6-8-month-olds (Fig 6A, 0.31 ± 0.24, t(14) = 5.1, pFDR < .05, FDR corrected over 5 category levels), and stays reliable in 12-15-month-olds (Fig 6A, 0.26 ± 0.24, t(14) = 4.18, pFDR < .05). Distributed responses to limbs become reliable later on as within-category similarity for limbs is significantly above zero in 12-15-month-olds (Fig 6A, 0.11 ± 0.21, t(14) = 1.98, p = 0.03, but not surviving FDR correction at 5 levels).
Next, we evaluated the development of category distinctiveness, which was calculated for each category and infant. Individual infants’ category distinctiveness is shown in Fig 6B (infants ordered by age) and in the scatterplots in Fig 6C. In infants younger than 4 months (120 days) category distinctiveness is largely close to zero or even negative, suggesting no differences between spatiotemporal responses to one category vs another. Category distinctiveness increases with age and becomes more positive from 84 to 445 days of age (Fig 6B,C). The biggest increase is for faces where after ∼6 months of age (194 days) face distinctiveness is consistently positive in individual infants (13/15 infants aged 194-364 days and 12/15 infants aged 365-445 days). The increase in distinctiveness is more modest for other categories and appears later in development. For example, positive distinctiveness for limbs and cars in individual infants is consistently observed after 12 months of age (Fig 6B,C; limbs: 9/15 infants aged 365-445 days vs 5/15 infants aged 194-364 days; cars: 12/15 365-445 days vs 7/15 194-364 days).
Using LMMs we determined if distinctiveness significantly changed with age (log transformed) and category (participant, random factor). Results indicate that category distinctiveness significantly increased from 3 to 15 months of age (βage = 0.77, 95% CI: 0.54 – 1.00, t(301) = 6.62, p = 1.67×10⁻¹⁰), and further that development significantly varies across categories (βage x category = -0.13, 95% CI: -0.2 – -0.06, t(301) = -3.61, p = 3.5×10⁻⁴; main effect of category, βcategory = 0.27, 95% CI: 0.11 – 0.43, t(301) = 3.38, p = 8.2×10⁻⁴). Post-hoc analyses for each category (Fig 6C) reveal that distinctiveness significantly increased with age for faces (βage = 0.9, 95% CI: 0.6 – 1.1, t(59) = 6.8, pFDR < .001), limbs (βage = 0.4, 95% CI: 0.2 – 0.6, t(59) = 5.0, pFDR < .001), characters (βage = 0.2, 95% CI: 0.02 – 0.3, t(59) = 2.2, pFDR < .05), and cars (βage = 0.4, 95% CI: 0.2 – 0.5, t(59) = 3.7, pFDR < .001). Post-hoc t-tests show that for faces, the category distinctiveness is significantly above zero after 6 months of age (6-8-month-olds, t(14) = 6.73, pFDR < .05; 12-15-month-olds, t(14) = 5.30, pFDR < .05) and for limbs and cars at 12-15 months of age (limbs: t(14) = 2.19, pFDR < .05; cars: t(14) = 4.53, pFDR < .05).
This suggests that category distinctiveness slowly emerges in the visual cortex of infants from 3 to 15 months of age, with the largest and fastest development for faces.
Discussion
We find that both selective responses to items of a category over others across lateral occipital ROIs and the distinctiveness of distributed visual category representations progressively develop from 3 to 15 months of age. Notably, we find a differential development of category-selective responses (Fig 7), whereby responses to faces emerge the earliest, at 4-6 months of age, and continue to develop through the first year of life. Category-selective responses to limbs, corridors, and characters follow, emerging at 6-8 months of age. Our analysis of the distinctiveness of the distributed spatiotemporal patterns to each category also finds that distributed representations to faces become more robust in 6-8-month-olds and remain robust in 12-15-month-olds, whereas the distinctiveness of distributed patterns to limbs and cars only becomes reliable at 12-15 months of age. Together these data suggest a rethinking of the development of category representations during infancy as they not only suggest that category representations are learned, but also that representations of categories that may have innate substrates such as faces, bodies, and places emerge at different times during infancy.
Reliable category representations start to emerge at 4 months of age
While 3-4-month-old infants have significant and reliable evoked visual responses over early visual cortex, we find no reliable category representations of faces, limbs, corridors, or characters in these young infants. Neither analyses of mean responses across lateral occipital ROIs nor analyses of distributed spatiotemporal responses across visual cortex reveal reliable category representations in 3-4-month-olds. The earliest categorical responses we find are for faces, and they emerge at 4-6 months of age.
Is it possible that there are some category representations in 3-4-month-olds, but we lack the sensitivity to measure them? We believe this is unlikely, because (i) we can measure significant visual responses from the same 3-4-month-olds, and (ii) with the same amount of data, we can measure category-selective responses and decode category information from distributed spatiotemporal responses in infants older than 4 months and in adults.
Our findings together with a recent fMRI study in 2-10-month-olds27 provide accumulating evidence for multiple visual category representations in infants’ brains before the age of one. However, there are also differences across studies. The earliest we could find reliable group-level category-selective responses for faces was 4-6-month-olds and for limbs and corridors only after 6 months of age. In contrast, Kosakowski and colleagues27 report category-selective responses to faces, bodies, and scenes in example 4-5-month-olds. Group average data in their study found significant face- and place-selective responses in infants’ ventral temporal cortex (VTC) but not in lateral occipital cortex (LOTC), and significant body-selective responses in LOTC, but not VTC. Because Kosakowski et al.27 report group-averaged data across infants spanning 8 months, their study does not provide insight into the time course of this development. We note that the studies differ in both measurement modalities (EEG/fMRI) and in the types of stimuli infants viewed. In that study27, infants viewed isolated, colored, and moving stimuli, but in our study, infants viewed still, gray-level images on phase-scrambled backgrounds, which were controlled for several low-level properties. Thus, future research is necessary to determine whether differences between findings are due to differences in measurement modalities or differences in stimulus format.
Face representations emerge around 4-6 months of age
Recognizing faces (e.g., a caregiver’s face) is crucial for infants’ daily lives. Converging evidence from many studies suggests that infants have significant and reliable face-selective neural responses at 4-6 months of age22,23,26,52,53. While some studies report responses to face-like (high contrast paddle-like) stimuli in newborns34,35,54 and significant visual evoked responses to faces in 3-month-olds55–58, these studies have largely compared responses to an isolated face vs. another isolated object. In contrast, we do not find reliable face-selective responses (Fig 3 & 4) or reliable distributed representations (Fig 5 & 6) to faces in 3-4-month-olds when responses to faces are contrasted to many other items and when stimuli are shown on a background rather than in isolation. Our findings are consistent with longitudinal research in macaques showing that robust cortical selectivity to faces takes several months to emerge39 and support the hypothesis that experience with faces is necessary for the development of cortical face selectivity37,39,40.
Our data also reveal that face-selective responses and distributed representations to faces become more robust in 6-8-month-olds and remain robust in 12-15-month-olds. For example, successful decoding of faces at the group level was observed in 80% of individual infants based on several minutes of EEG data. Reliable distributed spatiotemporal responses to different images of faces become significantly different from responses to images from different categories. This robust decoding has important clinical ramifications as it may serve as an early biomarker for cortical face processing, which is important for early detection of social and cognitive developmental disorders such as Autism59,60 and Williams Syndrome61. Future research is necessary for elucidating the relationship between the development of brain responses to faces and infant behavior. For example, it is interesting that at 6 months of age, when we find robust face representations, infants also start to exhibit recognition of familiar faces (like parents) and stranger anxiety62.
One fascinating aspect of the development of cortical face selectivity is that among the categories we tested, selectivity to faces seems to emerge the earliest at around 4 months of age, yet the development of selectivity and distributed representations to faces is protracted compared to objects and places14,15,63. Indeed, in both our data and prior work, face-selective responses and distributed representations to faces in infants are immature compared to adults21,25, and a large body of work has shown that face selectivity5,14,15,50,63–65 and distributed representations to faces10 continue to develop during childhood and adolescence. This suggests that not only experience during infancy but also life-long experience with faces sculpts cortical face selectivity. We speculate that the extended cortical plasticity for faces may be due to both the expansion of social circles (family, friends and acquaintances) across the lifespan and also the changing statistics of faces we socialize with (e.g., child and adult faces have different appearance).
New insight about development: different category representations emerge at different times
To our knowledge, our study is the first to examine the development of both ROI level and spatiotemporal distributed responses in infants across the first year of life. We note that both analyses find that category information to faces develops before other categories. However, there are also some differences across analyses (Fig 7). For example, for limbs and corridors we find significant category-selective responses at the ROI level in lateral occipitotemporal ROIs starting at 6-8 months but no reliable distinct distributed responses across visual cortex at this age. In contrast, for cars, we find the opposite pattern: there is a distinct spatiotemporal pattern in 12-15-month-olds even though there is no significant car-selective response at the ROI level. As these approaches have different sensitivities, they reveal insights into the nature of the underlying representations. For example, as visible in Fig 4, limbs and corridors have a clear category-selective waveform in both 6-8- and 12-15-month-olds, but the waveform of limbs and its spatial distribution are not that different from those of corridors, which may explain why distinctiveness of spatiotemporal patterns for limbs is low in 6-8-month-olds (Fig 6). Likewise, even though there is no significant response for cars (Fig 4e), its spatiotemporal pattern is consistently different from that of other categories, giving rise to a distinctive spatiotemporal response by 12 months (Fig 6).
In sum, the key finding from our study is that the development of category selectivity during infancy is non-uniform: face-selective responses and representations of distributed patterns develop before representations to limbs and other categories. We hypothesize that this differential development of visual category representations may be due to differential visual experience with these categories during infancy. This hypothesis is consistent with behavioral research using head-mounted cameras that revealed that the visual input during early infancy is dense with faces, while hands become more prevalent in the visual input later in development, especially when in contact with objects41,42. Additionally, a large body of research has suggested that young infants preferentially look at faces and face-like stimuli17,18,33,34, and look longer at faces than at other objects41, indicating that not only the prevalence of faces in babies’ environments but also longer looking times may drive the early development of face representations. Future studies that examine the visual diet, looking behavior, and brain development using additional behaviorally relevant categories such as food66–68 can test how environmental and experiential differences may influence infants’ developmental trajectories.
Together our findings not only suggest that visual experience is necessary for the development of visual category representations, including faces, but also necessitate a rethinking of how visual category representations develop in infancy. Moreover, this differential development during infancy is evident even for categories that have evolutionary importance and may have innate substrates such as faces, bodies, and places28–31. Finally, our findings have important ramifications for theoretical and computational models of visual development as well as for the assessment of atypical infant development.
Methods
Participants
Ethical permission for the study was obtained from the Institutional Review Board of Stanford University. Parents of the infant participants provided written informed consent prior to their first visit and also prior to each session if they came for multiple sessions. Participants were paid $20/hour for participation. Participants were recruited via ads on social media (Facebook and Instagram).
Sixty-two full-term, typically developing infants were recruited. Twelve participants were part of an ongoing longitudinal study and came for several sessions spanning ∼3 months apart (seven 3-4-month-olds, three 4-6-month-olds, eight 6-8-month-olds, and twelve 12-15-month-olds). Data from nineteen infants (nine 3-4-month-olds, six 4-6-month-olds, and eight 6-8-month-olds; among whom seven were longitudinal) were acquired in two visits within a two-week span to obtain a sufficient number of valid data epochs. Supplementary Table S1 contains participants’ demographic information (sex and race). The youngest infants were 3 months old, as the EEG setup requires the infants to be able to hold their head and look at the screen in front of them. 23 adults (14 females) also participated in the study. All participants had normal/corrected-to-normal vision and provided written informed consent.
Data exclusion criteria: We excluded participants who had fewer than 20 valid epochs (1.1667-s/epoch) per category, had noise/muscular artifacts during the EEG recordings, had data that could not be recorded because of technical issues, or had no visual responses over the occipital electrodes. As such, we excluded (1) five infants due to an insufficient number of epochs, (2) two infants who had no visual responses, (3) ten infants due to technical issues during data collection, and (4) three adults due to excessive noise/muscular artifacts during EEG. In total, we report data from 45 infants (Supplementary Table S1) and 20 adults (13 females, 19-38 years) that met inclusion criteria.
Visual stimuli
Natural grayscale images of adult faces, limbs, corridors, characters, and cars were used as stimuli, with 144 images per category from the fLoc image database (https://github.com/VPNL/fLoc)8. The size, view, and retinal position of the items varied, and the items were overlaid on phase-scrambled backgrounds that were generated from a randomly drawn image in the stimulus set. The images were also controlled for multiple low-level differences between stimuli of different categories, including their luminance, contrast, similarity, and spatial frequency power distributions, using the SHINE toolbox69. As only five of the ten categories from ref. 8 were used, we evaluated the stimuli used in our experiments to test if they differed in (i) contrast, (ii) luminance, (iii) similarity, and (iv) spatial frequency. Results show that categories were largely matched on most metrics (Fig 1B and Supplementary Materials). The stimuli were presented on a gamma-corrected OLED monitor screen (SONY PVM-2451; SONY Corporation, Tokyo, Japan) at a screen resolution of 1920 × 1080 pixels and a monitor refresh rate of 60 Hz. When viewed from 70 cm away, each image subtended a visual angle of approximately 12°.
EEG Protocol
The experiments were conducted in a calm, dimly illuminated lab room. Stimuli were presented using custom stimulus presentation software with millisecond timing precision. During testing, infant participants were seated on their parent’s lap in front of the screen at a distance of 70 cm. One experimenter stood behind the presentation screen to monitor where the infant was looking. The visual presentation was paused if the infant looked away from the screen and was resumed when the infant looked again at the center of the screen. To motivate infants to fixate and look at the screen, we presented small (∼1°) colored cartoon images such as butterflies, flowers, and ladybugs at the center of the screen. For adults, we used a fixation cross of the same size instead of the cartoons and asked the participants to fixate and indicate when the fixation’s color changed by pressing the space bar on a keyboard. EEG measurements for infant participants continued until the infant no longer attended to the screen, and we obtained between 2 and 12 different 70-s sequences per individual. For adult participants, we acquired 12 sequences per individual.
A frequency-tagging paradigm23,46 was used to measure brain responses. In the experiment, randomly selected images from 5 categories were presented sequentially at a rate of 4.286 Hz (∼233 ms per image) with no inter-stimulus interval during each 70-s sequence. For each condition, one category was designated as the target category; a randomly selected image from the target category was presented first, followed by four images randomly drawn from the other four categories in no regular order (Fig 1A). The target images were therefore presented periodically at 0.857 Hz (i.e., 4.286 Hz/5), but the intervals between sequential presentations of images from the other 4 categories were not periodic. The probability of image occurrences across categories was equal at 20%. The experiment had five conditions, one for each of the following target categories: faces, limbs, corridors, characters, and cars. Each 70-s experimental sequence was composed of five 14-s long conditions, each of which included a 1.1667-s stimulus fade-in and a 1.1667-s stimulus fade-out. The order of the category conditions was randomized within each 70-s sequence. No image was repeated within a sequence. Two presentation frequencies were embedded in the experiment: (i) the image frequency (4.286 Hz), which is predicted to elicit visual responses to all stimuli over occipital visual cortex, and (ii) the category frequency (0.857 Hz), which is predicted to elicit a category-selective response over lateral occipital-temporal electrodes.
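The timing relationships described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' presentation software. It assumes that each image lasted 14 monitor frames (an inference from the 60 Hz refresh rate and the 4.286 Hz image rate) and, as a simplification, that each cycle contained exactly one image from each non-target category in shuffled order. The names `FRAMES_PER_IMAGE` and `condition_labels` are hypothetical.

```python
import random
from fractions import Fraction

REFRESH_HZ = 60            # monitor refresh rate (from the text)
FRAMES_PER_IMAGE = 14      # assumption: 60 / 14 = 4.286 Hz
CATEGORIES = ["faces", "limbs", "corridors", "characters", "cars"]

image_hz = Fraction(REFRESH_HZ, FRAMES_PER_IMAGE)   # 30/7 Hz, about 4.286 Hz
category_hz = image_hz / len(CATEGORIES)            # 6/7 Hz, about 0.857 Hz
epoch_s = 1 / category_hz                           # 7/6 s, about 1.1667 s
cycles_per_condition = Fraction(14) / epoch_s       # cycles in a 14-s condition


def condition_labels(target, n_cycles, rng):
    """Return the category label of every image in one condition."""
    others = [c for c in CATEGORIES if c != target]
    labels = []
    for _ in range(n_cycles):
        # Assumption: one image from each non-target category per cycle,
        # in a freshly shuffled (non-periodic) order.
        rng.shuffle(others)
        labels.append(target)      # the target always opens the cycle
        labels.extend(others)
    return labels


rng = random.Random(0)
labels = condition_labels("faces", int(cycles_per_condition), rng)
```

Under these assumptions every fifth image is a target, so the target recurs at exactly one fifth of the image rate, while each category still accounts for 20% of the images.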
EEG Acquisition
EEG data were recorded at 500 Hz from a 128-channel EGI High-Density Geodesic Sensor Net. For infants, the net was connected to a NetAmps 300 amplifier (Electrical Geodesics, Inc., Eugene, OR, USA). For the adults, the net was connected to a NetAmps400 amplifier. The EEG recording was referenced online to a single vertex (electrode Cz) and the channel impedance was kept below 50 KΩ.
Pre-processing
EEG recordings were down-sampled to 420 Hz and were filtered using a 0.03-50 Hz bandpass filter with custom signal processing software. For infants, channels with more than 20% of samples exceeding a 100-150 μV amplitude threshold were replaced with the average amplitude of their six nearest-neighbor channels. The continuous EEG signals were then re-referenced to the common average of all channels and segmented into 1166.7-ms epochs (i.e., the duration of five stimuli, starting with a target category image followed by four images drawn from the other four categories). Epochs with more than 15% of time samples exceeding threshold (150-200 μV) were further excluded on a channel-by-channel basis 70. For adults, the two-step artifact rejection was performed with different criteria as EEG response amplitudes are lower in adults than infants 70. EEG channels with more than 15% of samples exceeding a 30 μV amplitude threshold were replaced by the average value of their neighboring channels. Then the EEG signals were re-referenced to the common average of all channels and segmented into 1.1667-s epochs. Epochs with more than 10% of time samples exceeding threshold (30-80 μV) were excluded on a channel-by-channel basis 71.
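The first two pre-processing steps (replacing noisy channels and re-referencing to the common average) can be sketched as follows. This is an illustrative outline only, not the custom signal processing software used in the study. The function names, the dictionary layout, and the choice to skip neighbors that are themselves noisy are all assumptions.

```python
def noisy_channels(eeg, threshold_uv, max_fraction):
    """Channels whose fraction of supra-threshold samples exceeds max_fraction.

    eeg maps a channel name to its list of samples in microvolts.
    """
    return [ch for ch, samples in eeg.items()
            if sum(abs(v) > threshold_uv for v in samples) / len(samples)
            > max_fraction]


def replace_with_neighbors(eeg, bad, neighbors):
    """Replace each bad channel by the sample-wise mean of its neighbors."""
    cleaned = dict(eeg)
    for ch in bad:
        # Assumption: neighbors that are themselves bad are left out.
        good = [eeg[n] for n in neighbors[ch] if n not in bad]
        cleaned[ch] = [sum(vals) / len(vals) for vals in zip(*good)]
    return cleaned


def common_average_reference(eeg):
    """Subtract the across-channel mean from every channel at each sample."""
    channels = list(eeg)
    means = [sum(vals) / len(vals)
             for vals in zip(*(eeg[ch] for ch in channels))]
    return {ch: [v - m for v, m in zip(eeg[ch], means)] for ch in channels}


# Toy example with three hypothetical channels and four samples each.
toy = {"A": [0.0, 0.0, 0.0, 0.0],
       "B": [200.0, 200.0, 0.0, 0.0],
       "C": [2.0, 2.0, 2.0, 2.0]}
bad = noisy_channels(toy, threshold_uv=150.0, max_fraction=0.2)
cleaned = replace_with_neighbors(toy, bad, {"B": ["A", "C"]})
referenced = common_average_reference(cleaned)
```

In the toy example, channel B has half of its samples above threshold, so it is replaced by the mean of A and C before re-referencing.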
Supplementary Table S2 shows the number of epochs (1.1667-s each) we acquired before and after data pre-processing, summing across all five categories. We used data after pre-processing for further analyses. There was no significant difference in the number of pre-processed epochs across infant age groups (F(3,57) = 1.5, p = .2).
Univariate EEG analyses
Both image update and categorical EEG visual responses are reported in the frequency and time domain over three regions-of-interest (ROIs): two occipito-temporal ROIs (left occipitotemporal (LOT): channels 57, 58, 59, 63, 64, 65 and 68; right occipitotemporal (ROT): channels 90, 91, 94, 95, 96, 99, and 100) and one occipital ROI (channels 69, 70, 71, 74, 75, 76, 82, 83 and 89). These ROIs were selected a priori based on a previously published study 51. We further removed several channels from these ROIs for two reasons: (1) Three outer rim channels (i.e., 73, 81, and 88) were not included in the occipital ROI for further data analysis for both infant and adult participants because they were consistently noisy. (2) Three channels (66, 72, and 84) in the occipital ROI, one channel (50) in the LOT ROI, and one channel (101) in the ROT ROI were removed because they did not show substantial responses in the group-level analyses.
Frequency domain analysis
Each participant’s preprocessed EEG signals for each stimulus condition were averaged over two consecutive epochs (2.3334-s). The averaged time courses for each participant were then converted to the frequency domain at a frequency resolution of 0.4286 Hz via a Discrete Fourier Transform (DFT). The frequency bins of interest are at exactly every other bin in the frequency spectrum. The real and imaginary Fourier coefficients for each of the categorical and image update responses for each condition were averaged across participants (vector averaging) to obtain group-level estimates. The amplitudes of response were then computed from the coherently averaged vector. Hotelling’s T2 statistic 72 was used to test whether response amplitudes were significantly different from zero. We used Benjamini and Hochberg’s false discovery rate (FDR) procedure to correct for multiple comparisons.
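A short numerical sketch may help illustrate the frequency resolution and the vector averaging described above. It is not the authors' analysis code: it computes single DFT bins directly with the standard library, uses simulated unit-amplitude sinusoids in place of EEG data, and all function names are hypothetical. With 2.3334-s averages sampled at 420 Hz (980 samples), the resolution is 420/980 ≈ 0.4286 Hz, so the category frequency falls in bin 2 and the image frequency in bin 10.

```python
import cmath
import math

FS_HZ = 420.0                       # sampling rate after down-sampling
N_SAMPLES = 980                     # 2.3334 s x 420 Hz (two 1.1667-s epochs)
RESOLUTION_HZ = FS_HZ / N_SAMPLES   # about 0.4286 Hz per bin
CATEGORY_BIN = round((6 / 7) / RESOLUTION_HZ)   # 0.857 Hz
IMAGE_BIN = round((30 / 7) / RESOLUTION_HZ)     # 4.286 Hz


def dft_coefficient(signal, k):
    """Complex Fourier coefficient at bin k, scaled to sinusoid amplitude."""
    n = len(signal)
    total = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                for i, v in enumerate(signal))
    return 2 * total / n


def vector_average_amplitude(signals, k):
    """Average the complex coefficients across participants, then take |.|."""
    coefs = [dft_coefficient(s, k) for s in signals]
    return abs(sum(coefs) / len(coefs))


def simulated_participant(phase):
    """Hypothetical unit-amplitude response at the category frequency."""
    return [math.cos(2 * math.pi * CATEGORY_BIN * i / N_SAMPLES + phase)
            for i in range(N_SAMPLES)]


in_phase = vector_average_amplitude(
    [simulated_participant(0.0), simulated_participant(0.0)], CATEGORY_BIN)
anti_phase = vector_average_amplitude(
    [simulated_participant(0.0), simulated_participant(math.pi)], CATEGORY_BIN)
```

Because the complex coefficients are averaged before the amplitude is taken, responses with consistent phase across participants survive (`in_phase` is close to 1), whereas responses with inconsistent phase cancel (`anti_phase` is close to 0).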
Image-update visual responses (image frequency): The amplitude and phase of the evoked response at the image presentation rate and its first 3 harmonics (4.286 Hz, 8.571 Hz, 12.857 Hz, and 17.143 Hz).
Categorical responses (category frequency): The amplitude and phase of the response at the category repetition frequency and its second harmonic (0.857 Hz, 1.714 Hz) for each category condition.
Time domain analyses
Preprocessed time domain EEG signals of each participant were low-pass filtered with a 30 Hz cut-off. The raw EEG signals contain many components, including categorical responses (0.857 Hz and harmonics), general visual responses (4.286 Hz and harmonics), and noise. To separate out the temporal waveforms of these two responses, we first transformed the epoch-averaged (2.3334-s) time series of each condition into the frequency domain using a DFT. Then, we used an inverse DFT to transform back to the time domain, keeping only the responses at the category frequency and its harmonics and zeroing the other frequencies. The same approach was used to separate the visual responses by using an inverse DFT of the responses at 4.286 Hz and harmonics.
Categorical responses
We kept responses at frequencies of interest (0.857 Hz and its harmonics up to 30 Hz, excluding the harmonic frequencies that overlapped with the image frequency and its harmonics) and zeroed responses in other frequencies. Then we applied an inverse Fourier transform to transform the data to the time domain. We further segmented the time series into 1.1667-s epochs and averaged across these epochs for each condition and individual. The mean and standard error across participants were computed for each condition at each time-point.
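The bin selection and inverse transform described above can be sketched as follows. This is a minimal standard-library illustration on a simulated signal, assuming the 980-sample (2.3334-s, 420 Hz) averages described in the frequency domain analysis, in which the category frequency is bin 2, the image frequency is bin 10, and 30 Hz is bin 70. The helper names are hypothetical, and only the retained bins are computed rather than a full spectrum.

```python
import cmath
import math

N_SAMPLES = 980      # 2.3334 s at 420 Hz
EPOCH_LEN = 490      # 1.1667 s at 420 Hz
CATEGORY_BIN = 2     # 0.857 Hz
IMAGE_BIN = 10       # 4.286 Hz
MAX_BIN = 70         # 30 Hz

# Category harmonics up to 30 Hz, skipping those shared with the image rate.
category_bins = [k for k in range(CATEGORY_BIN, MAX_BIN + 1, CATEGORY_BIN)
                 if k % IMAGE_BIN != 0]


def keep_bins(signal, bins):
    """Zero every frequency except `bins` and return the time-domain signal."""
    n = len(signal)
    out = [0.0] * n
    for k in bins:
        coef = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                   for i, v in enumerate(signal)) / n
        for i in range(n):
            # Bin k and its mirror n - k together give 2 * Re(coef * e^{+i...}).
            out[i] += 2 * (coef * cmath.exp(2j * math.pi * k * i / n)).real
    return out


def average_epochs(signal, epoch_len):
    """Cut the signal into epochs of epoch_len samples and average them."""
    n_epochs = len(signal) // epoch_len
    return [sum(signal[e * epoch_len + i] for e in range(n_epochs)) / n_epochs
            for i in range(epoch_len)]


def cosine(k, amplitude):
    return [amplitude * math.cos(2 * math.pi * k * i / N_SAMPLES)
            for i in range(N_SAMPLES)]


# Simulated signal: a category response (bins 2 and 4) plus an image response.
category_part = [a + b for a, b in zip(cosine(2, 1.0), cosine(4, 0.5))]
mixed = [c + v for c, v in zip(category_part, cosine(IMAGE_BIN, 3.0))]
recovered = keep_bins(mixed, category_bins)
waveform = average_epochs(recovered, EPOCH_LEN)
```

In this simulation the recovered signal matches the category component and contains none of the larger image-rate component, which is the separation the procedure is meant to achieve.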
Image update visual responses
A similar procedure was performed except that the frequencies of interest were 4.286 Hz and its harmonics, and the rest were zeroed. As temporal waveforms for image-update responses were similar across different category conditions, we averaged waveforms across all five conditions and report the mean response (Fig 2).
Statistical analyses
To determine time windows in which amplitudes were significantly different from zero for each condition, we used a cluster-based nonparametric permutation t-test (10000 permutations, with a threshold of p < 0.05, two-tailed) on the post-stimulus onset time points (0-1167 ms) 73,74. The null hypothesis is that the evoked waveforms are not different from zero at any time point. For each permutation, we assigned random signs to the data of individual participants and computed the group-level difference against zero using a t-test. We then calculated the cluster-level statistic as the sum of t-values in the consecutive time points with p-values less than 0.05 75. We calculated the maximum cluster-level statistic for each permutation to generate a non-parametric reference distribution of cluster-level statistics. We rejected the null hypothesis if the cluster-level statistic for any consecutive time points in the original data was larger than 97.5% or smaller than 2.5% of the values in the null distribution.
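The sign-flip permutation procedure above can be outlined in code. The following is a simplified sketch on simulated data, not the analysis code used in the study. Instead of computing p-values, it takes the critical t value as an argument (2.201 is the two-tailed 0.05 critical value for 11 degrees of freedom). It also assumes that clusters are formed separately for positive and negative t values and that the cluster with the largest absolute sum is kept per permutation. All function names are hypothetical.

```python
import math
import random


def t_values(data):
    """One-sample t statistic against zero at each time point.

    data is a list of participants, each a list of samples over time.
    """
    n = len(data)
    out = []
    for samples in zip(*data):
        mean = sum(samples) / n
        var = sum((s - mean) ** 2 for s in samples) / (n - 1)
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out


def cluster_sums(t, t_crit):
    """Sum t over runs of consecutive supra-threshold points of equal sign."""
    sums, current, sign = [], 0.0, 0
    for v in t:
        s = 1 if v > t_crit else -1 if v < -t_crit else 0
        if sign != 0 and s != sign:      # the running cluster ends here
            sums.append(current)
            current = 0.0
        if s != 0:
            current += v
        sign = s
    if sign != 0:
        sums.append(current)
    return sums


def cluster_permutation_test(data, t_crit, n_perm=500, seed=0):
    """Return (cluster sum, is_significant) for each observed cluster."""
    rng = random.Random(seed)
    observed = cluster_sums(t_values(data), t_crit)
    null = []
    for _ in range(n_perm):
        flipped = []
        for participant in data:
            sign = rng.choice((-1, 1))   # one random sign per participant
            flipped.append([sign * s for s in participant])
        sums = cluster_sums(t_values(flipped), t_crit)
        null.append(max(sums, key=abs, default=0.0))
    null.sort()
    low = null[(n_perm * 25) // 1000]
    high = null[min(n_perm - 1, (n_perm * 975) // 1000)]
    return [(c, c > high or c < low) for c in observed]


# Simulated data: 12 participants, 40 time points, an effect at points 10-19.
rng = random.Random(1)
data = [[(1.0 if 10 <= t < 20 else 0.0) + rng.gauss(0, 0.3)
         for t in range(40)] for _ in range(12)]
results = cluster_permutation_test(data, t_crit=2.201)
```

Flipping the sign of a whole participant, rather than of individual samples, preserves the temporal autocorrelation of each waveform, which is what makes the cluster-level null distribution meaningful.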
Decoding analyses
Group-level
We used a leave-one-out-cross-validation (LOOCV) classifier to test if spatiotemporal response patterns to each of the five categories were reliable across participants. The classifier was trained on averaged data from N-1 participants and tested on how well it predicted the category the left-out participant was viewing from their brain activations. This procedure was repeated for each left-out participant. We calculated the averaged category temporal waveform for each category across channels of our three ROIs (7 LOT, 9 occipital, and 7 ROT), as the exact location of the channels varies across individuals. Then we concatenated the waveforms from the three ROIs to form a spatiotemporal response vector. At each iteration, the LOOCV classifier computed the correlation between each of the five category vectors from the left-out participant (test data, for an unknown stimulus) and each of the mean spatiotemporal vectors across the N-1 participants (training data, labeled data). The winner-take-all (WTA) classifier assigns the test vector to the category of the training vector that yields the highest correlation with the test vector. We computed group mean decoding performance across all N iterations for each category, and the group mean decoding accuracy across the five categories.
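The leave-one-out, winner-take-all scheme can be written compactly. The sketch below is illustrative only and runs on simulated vectors rather than EEG waveforms. It assumes each participant contributes one spatiotemporal vector per category, and the function names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den


def loocv_wta(data):
    """Fraction of correct winner-take-all classifications.

    data[p][c] is the spatiotemporal vector of participant p for category c.
    """
    n_part, n_cat = len(data), len(data[0])
    correct = 0
    for left_out in range(n_part):
        train = [p for i, p in enumerate(data) if i != left_out]
        # Training templates: mean vector per category over N-1 participants.
        templates = [[sum(p[c][j] for p in train) / len(train)
                      for j in range(len(train[0][c]))]
                     for c in range(n_cat)]
        for c in range(n_cat):
            r = [pearson(data[left_out][c], t) for t in templates]
            correct += r.index(max(r)) == c   # winner-take-all decision
    return correct / (n_part * n_cat)


# Simulated data: 6 participants, 5 categories, 30-sample vectors.
rng = random.Random(0)
patterns = [[rng.gauss(0, 1) for _ in range(30)] for _ in range(5)]
data = [[[v + rng.gauss(0, 0.1) for v in pattern] for pattern in patterns]
        for _ in range(6)]
accuracy = loocv_wta(data)
```

With five categories, chance performance is 20%. The simulated patterns are nearly noise-free, so decoding here is perfect; real data would fall between these extremes.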
Individual level
Similar to group level with 2 differences: (i) All analyses were done within an individual using a split-half approach. That is, the classifier was trained on one half of the data (i.e., odd or even trials) and tested on the other half of the data. (ii) Spatiotemporal patterns for each category used the concatenated waveforms across 23 channels spanning the occipital and bilateral occipitotemporal ROIs.
Category distinctiveness10
Category distinctiveness is defined as the difference between the similarity (correlation coefficient) of spatiotemporal responses within a category across odd and even splits using different images and the average between-category similarity of spatiotemporal responses across odd and even splits 10. Distinctiveness is higher when the within-category similarity is positive and the between-category similarity is negative, and it varies from -2 (no category information) to 2 (maximal category information). We computed category distinctiveness for each of the 5 categories in each infant and determined if it varied from 3 to 15 months of age.
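As a concrete illustration of this definition, the following sketch computes distinctiveness from odd and even split-half vectors. It is a toy example, not the study's code. It assumes that the between-category term averages the correlations of the odd half of one category with the even half of each other category and vice versa, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den


def distinctiveness(odd, even, c):
    """Within-category minus mean between-category split-half similarity.

    odd[k] and even[k] are the spatiotemporal vectors of category k.
    """
    within = pearson(odd[c], even[c])
    # Assumption: both odd-even pairings contribute to the between term.
    between = [0.5 * (pearson(odd[c], even[d]) + pearson(odd[d], even[c]))
               for d in range(len(odd)) if d != c]
    return within - sum(between) / len(between)


# Toy example: two categories with perfectly anti-correlated patterns.
odd = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
even = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
score = distinctiveness(odd, even, 0)
```

In the toy example the within-category correlation is 1 and the between-category correlation is -1, so the score reaches its upper bound of 2.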
Statistical Analyses of Developmental Effects
To quantify developmental effects, we used linear mixed models (LMMs) 76 with the ‘fitlme’ function in MATLAB version 2021b (MathWorks, Inc.). LMMs allow explicit modeling of both within-subject effects (e.g., longitudinal measurements) and between-subject effects (e.g., cross-sectional data) with unequal numbers of points per participant, as well as examination of main and interactive effects of both continuous (age) and categorical (e.g., stimulus category) variables. We used random-intercept models that allow the intercept to vary across participants (term: 1|participant). In all LMMs, we measured development as a function of log10(age in days), as development is typically faster earlier on. Indirect evidence comes from neuroimaging and post-mortem studies showing that the structural development of infants’ brains is nonlinear, with development in the first two years being rapid, especially in the first year 77.
We report slope (rate of development), interaction effects, and their significance. Table 1 summarizes LMMs used in this study.
Analysis of Noise
To test whether EEG noise levels vary with age, for example whether noise in the EEG data is larger in the younger infants than older ones, we quantified the response amplitudes in the occipital ROI in the frequency domain, at frequency bins next to the category and image frequency bins (white bars in Fig 2A-D, right panel). The noise level was quantified as the amplitude of response up to 8.571 Hz excluding image presentation frequencies (4.286 Hz and harmonics) and category frequencies (0.857 Hz and harmonics), as this frequency range includes the relevant main harmonics (Fig 2E). We used an LMM to test if noise varies with age, with participant as a random effect:
Noise amplitude ∼ 1 + log10(age in days) + (1|participant)
We found no significant differences in noise amplitude across infant age groups (Fig 2E, mean amplitude across the first five noise bins: βage = -.005, 95% CI: -.12 – .11, t(59) = -.09, p = .93; mean noise across the first 10 bins: βage = .04, 95% CI: -.03 – .12, t(59) = 1.16, p = .25, linear mixed models, LMMs, with age (log transformed) as fixed effects and participant as a random effect).
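For readers less familiar with the model formula notation, the random-intercept model above can be written out explicitly. This is a sketch under the standard LMM assumptions of a normally distributed random intercept and residual, which are not stated in the text. The indices i (participant) and j (session) and the symbols u_i and ε_ij are introduced here for illustration.

```latex
\text{Noise}_{ij} = \beta_0 + \beta_{\text{age}} \, \log_{10}(\text{age}_{ij}) + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here β_age is the slope reported above, and u_i is the participant-specific offset corresponding to the (1|participant) term.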
Data availability
Individual EEG data, and code for all analyses are available on Github: https://github.com/VPNL/InfantObjectCategorization.
Acknowledgements
This work was supported by grants from the Wu Tsai Neurosciences Institute of Stanford University and the Human Centered Artificial Intelligence Institute of Stanford University to KGS and AMN.
Competing interests
The authors declare no competing interests.
Supplementary Materials
1. Demographic information
2. Low-level image properties analyses
As images of items of visual categories vary both on low-level and high-level properties, and both the EEG signal and babies’ attention may be affected by low-level properties like contrast or luminance, it is important to control for several low-level factors such as luminance, contrast, similarity, and spatial frequency, as well as high-level properties such as familiarity across categories. Here we used a subset of categories from ref. 1 that did not contain stimuli familiar to our participants and that were controlled for several low-level factors using the SHINE toolbox 2. To validate the effect of employing SHINE, we evaluated several low-level metrics and tested if they differed across categories: (i) Luminance: mean gray level of each image; (ii) Contrast: mean standard deviation of gray level values of each image; (iii) Similarity: mean pixel-wise similarity between gray levels across pairs of images of a category; and (iv) Spatial frequency distribution: we transformed each image to the Fourier domain using an FFT and calculated its circular amplitude spectrum. We calculated these metrics for each image, then tested if the contrast, luminance, and similarity metrics were significantly different across categories using pair-wise non-parametric permutation t-tests (10,000 times, with Bonferroni correction). While it is difficult to completely control multiple low-level metrics across categories2, our analyses show that images are largely controlled for several low-level metrics across categories, and that medians, means, and ranges are matched across categories (Fig 1B; Supplementary Table 3). With respect to differences in contrast and luminance, there are no significant differences across categories except that limbs have slightly but significantly lower luminance (ps < .05, Bonferroni corrected), lower pairwise similarity (ps < .01), and higher contrast (ps < .05, except for corridors) than other categories.
For similarity, the medians and ranges are similar across categories, but the outliers vary, producing significant differences across categories (permutation tests, p < .05, Bonferroni corrected). For spatial frequency distribution, the means and standard deviations are similar across categories (Fig 1B-right). We further ran Kolmogorov-Smirnov tests for each pair of categories with Bonferroni correction; our results revealed significantly different distributions across category pairs (ps < .001, except for limbs and corridors).
3. Analyses of adult data
In the study, one concern is that the amount of empirical data typically collected in infants is less than in adults, which may compromise the experimental power to detect responses in infants as it may affect the noise-bandwidth (sensitivity) of the frequency tagging analysis. Thus, we tested whether it is possible to measure category-selective responses in 20 adults using the same experiment (Fig 1) and the same amount of data as collected in infants using three types of analyses. By using the same number of trials in adults as those obtained from our infant participants for data analyses, our goal was to test that we had sufficient power to detect categorical responses from infants using the experimental paradigm. We expect temporal and amplitude differences between adults and infants3 as infants have immature brains and skulls. First, cortical gyrification, which determines the orientation on the scalp of the electrical fields generated in a given brain region, still undergoes development during the first two years of infants’ lives4. Second, adults’ skulls are thicker and have lower conductivity than infants’ skulls, thus electrical signals on their scalp are lower than in infants5. Nonetheless, we tested if we could in principle detect category information in adults with the same amount of data as infants. We reasoned that if category information can be detected in adults and signals are stronger in infants then we should have the power to detect category information in infants.
Visual Responses
We averaged the preprocessed data across two consecutive epochs (2.3334-s) and across conditions for each individual to measure visual responses to all images that were presented at 4.286 Hz and then transformed the data to the frequency domain. Supplementary Fig S1A shows the three-D topographies of group-averaged responses at 4.286 Hz and its first 3 harmonics (left panel) and the group-averaged Fourier amplitude spectrum over middle occipital and occipitotemporal ROIs (right panel). We used Hotelling’s T2 statistic to test the amplitude significance and found significant visual responses in adults at 4.286 Hz over the occipital ROI containing 9 electrodes over early visual cortex (p < .05, FDR corrected at 4 levels) and significant visual responses at 4.286 Hz and its first two harmonics over the occipitotemporal ROI (14 electrodes) (ps < .05).
Examining the temporal waveform of the visual response (filtered at the image frequency and its harmonics), we found a relatively slow waveform with one peak, after 100 ms since stimulus onset (Supplementary Fig S1B).
Visual Category responses
As adults have mature category-selective regions, we next tested if, using our SSVEP paradigm and the same amount of data as in infants, we can identify significant category-selective responses in adults. First, we analyzed the mean response at the category frequency and its harmonics in lateral occipitotemporal regions of interest (ROIs). This is a selective response as it reflects the relative response to the target category (generalized across exemplars) above the general visual response to images of other categories. Despite lower response amplitudes in adults than infants, using the same amount of data as infants, adults show significant category-selective responses to each of these five categories (Supplementary Fig S2A for all categories). Supplementary Fig S2A shows the group-averaged Fourier amplitude spectrum over left and right occipitotemporal ROIs (top panels) and the three-D topographies of group-averaged responses at 0.857 Hz and its first harmonic (bottom panel). We used Hotelling’s T2 statistic to test the amplitude significance and found significant above-zero category-selective responses in adults at 0.857 Hz and its multiple harmonics over bilateral occipitotemporal ROIs containing 7 electrodes each (ps < .05, FDR corrected over 8 levels).
We then transformed the filtered data at the category frequency and its harmonics to the time domain to examine the temporal waveform of each category. Notably, each category generates a unique topography (bottom panels) at 200-217 ms after stimulus onset (Supplementary Fig 2B). For faces, we found an early negative deflection peaking ∼200 ms after stimulus onset in both the left and right occipitotemporal ROIs, with the right occipitotemporal ROI showing a numerically larger mean response amplitude than the left (Supplementary Fig S2B-faces). Similarly, we found an early negative peak at around 200 ms for characters, and a left hemisphere dominance at the 162-248 ms and 398-474 ms time windows (Supplementary Fig S2B-characters). For both limbs and corridors, there was an early positive waveform peaking at around 200 ms in both hemispheres with no hemispheric differences.
Second, we examined in the same 20 adults whether the spatiotemporal patterns of brain responses evoked by different visual categories are distinct from one another and are reliable across participants, using a leave-one-out-cross-validation (LOOCV) classifier approach with spatiotemporal time series formed by concatenating the mean temporal waveforms from 3 ROIs (Supplementary Fig 3A-left). Mean LOOCV classification performance was around 80% and significantly above chance in adults (Supplementary Fig 3A-right). This classification was associated with correct decoding of all 5 categories from distributed responses in the majority of participants.
Third, we examined in the same 20 adults if there is reliable category information in spatiotemporal distributed responses in each individual. Distributed spatiotemporal responses were measured in individual participants over 23 channels spanning the occipital and lateral occipitotemporal ROIs using split halves of the data. We computed the representational similarity matrix (RSM) across split halves and then calculated category distinctiveness by subtracting the mean between-category similarity from the within-category similarity for each category.
Mean RSM across 20 adults shows that spatiotemporal patterns are more similar across items of a category than across items of different categories (Supplementary Fig 3B-left), and the category distinctiveness scores are significantly above zero for all five categories: faces, t(19) = 11.18, pFDR < .05; limbs, t(19) = 5.01, pFDR < .05; corridors, t(19) = 5.58, pFDR < .05; characters, t(19) = 6.56, pFDR < .05; and cars, t(19) = 10.59, pFDR < .05.
Together, these analyses suggest that the experimental paradigm has sufficient power to identify category representations both at the group and individual levels.
4. Validation of experimental paradigm
As visual acuity develops during the first year of life8,9 (Supplementary Fig S4), one concern is that our controlled natural, gray-level stimuli may not be distinguishable to infants. Measurements of visual evoked potentials8,9 suggest that visual acuity in 3-month-olds is around 5–8 cycles per degree (cpd) and in 6-month-olds around 10–16 cpd (Supplementary Fig S4). Thus, to simulate how our images may appear to infants, we filtered all images at 5 cpd. Despite being blurry, images of different categories are distinguishable and individual items retain their identity by visualization (Supplementary Movies 1-5).
5. Linear mixed model (LMM) analyses of visual responses (associated with Figure 2)
6. Frequency domain analyses of infants’ categorical responses to limbs, corridors, characters and cars (associated with Fig 3)
Fig 3 shows the group-averaged categorical response to faces. Supplementary Figs S5-8 show group-averaged categorical responses to the remaining four conditions in the four age groups. We found significant responses in 6-8-month-olds for limbs, corridors, and characters, and in 12-15-month-olds for corridors and characters.
7. Linear mixed model (LMM) analyses of category-selective responses
8. Individual level decoding analysis
References
- 1. Speed of processing in the human visual system. Nature 381:520–522
- 2. Visual Recognition: As Soon as You Know It Is there, You Know What It Is. Psychological Science 16
- 3. The functional architecture of the ventral temporal cortex and its role in categorization. Nat Rev Neurosci 15:536–548
- 4. Task alters category representations in prefrontal but not high-level visual cortex. Neuroimage 155:437–449
- 5. Cortical recycling in high-level visual cortex during childhood development. Nat Hum Behav 5:1686–1697
- 6. Ultra-high-resolution fMRI of Human Ventral Temporal Cortex Reveals Differential Representation of Categories and Domains. J. Neurosci 40:3008–3024
- 7. Microstructural proliferation in human cortex is coupled with the development of face processing. Science 355:68–71
- 8. Temporal Processing Capacity in High-Level Visual Cortex Is Domain Specific. Journal of Neuroscience 35:12412–12424
- 9. Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex. Science 293:2425–2430
- 10. Longitudinal development of category representations in ventral temporal cortex predicts word and face recognition. Nat Commun 14
- 11. The Fusiform Face Area: A Module in Human Extrastriate Cortex Specialized for Face Perception. The Journal of Neuroscience 17:4302–4311
- 12. A cortical representation of the local visual environment. Nature 392:598–601
- 13. A Cortical Area Selective for Visual Processing of the Human Body. Science 293:2470–2473
- 14. Differential development of high-level visual cortex correlates with category-specific recognition memory. Nature Neuroscience 10:512–522
- 15. Visual category-selectivity for faces, places and objects emerges along different developmental trajectories. Developmental Science 10:F15–F30
- 16. The emergence of the visual word form: Longitudinal evolution of category-specific ventral visual areas during reading acquisition. PLOS Biology 16
- 17. Visual object categorization in infancy. Proc. Natl. Acad. Sci. U.S.A 119
- 18. Face Perception During Early Infancy. Psychol Sci 10:419–422
- 19. Evidence for Representations of Perceptually Similar Natural Categories by 3-Month-Old and 4-Month-Old Infants. Perception 22:463–475
- 20. Parsing Items into Separate Categories: Developmental Change in Infant Categorization. Child Development 70:291–303
- 21. Face-sensitive brain responses in the first year of life. NeuroImage 211
- 22. Piecing it together: Infants’ neural responses to face and object structure. Journal of Vision 12
- 23. Rapid categorization of natural face images in the infant right hemisphere. eLife 4
- 24. Temporal dynamics of visual representations in the infant brain. Developmental Cognitive Neuroscience 45
- 25. Visual category representations in the infant brain. Current Biology 32:5422–5432
- 26. Organization of high-level visual cortex in human infants. Nat Commun 8
- 27. Selective responses to faces, scenes, and bodies in the ventral visual pathway of infants. Current Biology 32:265–274
- 28. Functional specificity in the human brain: A window into the functional architecture of the mind. Proceedings of the National Academy of Sciences 107:11163–11170
- 29. Innate face processing. Current Opinion in Neurobiology 19:39–44
- 30. What drives the organization of object knowledge in the brain? Trends in Cognitive Sciences 15:97–103
- 31. Object Domain and Modality in the Ventral Visual Pathway. Trends in Cognitive Sciences 20:282–290
- 32. A vision of graded hemispheric specialization. Ann N Y Acad Sci :30–46
- 33. Mother’s face recognition by neonates: A replication and an extension. Infant Behavior & Development 18:79–85
- 34. Newborns’ preferential tracking of face-like stimuli and its subsequent decline. Cognition 40:1–19
- 35. Cortical route for facelike pattern processing in human newborns. Proceedings of the National Academy of Sciences 116:4625–4630
- 36. How Learning to Read Changes the Cortical Networks for Vision and Language. Science 330:1359–1364
- 37. Universal Mechanisms and the Development of the Face Network: What You See Is What You Get. Annu Rev Vis Sci 5:341–372
- 38. A domain-relevant framework for the development of face processing. Nat Rev Psychol 2:183–195
- 39. Development of the macaque face-patch system. Nat Commun 8
- 40. Seeing faces is necessary for face-domain formation. Nat Neurosci 20:1404–1412
- 41. From faces to hands: Changing visual input in the first two years. Cognition 152:101–107
- 42. Why are faces denser in the visual experiences of younger than older infants? Developmental Psychology 53:38–49
- 43. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat Neurosci 25:116–126
- 44. Converging evidence for functional and structural segregation within the left ventral occipitotemporal cortex in reading. Proceedings of the National Academy of Sciences 115:E9981–E9990
- 45. Texture-like representation of objects in human visual cortex. Proceedings of the National Academy of Sciences 119
- 46. Piecing it together: Infants’ neural responses to face and object structure. Journal of Vision 12
- 47. Frequency-domain analysis of fast oddball responses to visual stimuli: A feasibility study. International Journal of Psychophysiology 73:287–293
- 48. An objective index of individual face discrimination in the right occipito-temporal cortex by means of fast periodic oddball stimulation. Neuropsychologia 52:57–72
- 49. Odor-driven face-like categorization in the human infant brain. Proceedings of the National Academy of Sciences 118
- 50. ERP evidence of developmental changes in processing of faces. Clinical Neurophysiology 110:910–915
- 51. Neural Correlates of Facial Emotion Processing in Infancy. Dev Sci 22
- 52. The Cortical Development of Specialized Face Processing in Infancy. Child Dev 87:1581–1600
- 53. Face-sensitive cortical processing in early infancy. Journal of Child Psychology and Psychiatry 45:1228–1234
- 54. Visual following and pattern discrimination of face-like stimuli by newborn infants. Pediatrics 56:544–549
- 55. Cortical specialisation for face processing: face-sensitive event-related potential components in 3- and 12-month-old infants. NeuroImage 19:1180–1193
- 56. A behavioural and ERP investigation of 3-month-olds’ face preferences. Neuropsychologia 44:2113–2125
- 57. Frequency tagging with infants: The visual oddball paradigm. Front. Psychol 13
- 58. Neural Correlates of Woman Face Processing by 2-Month-Old Infants. NeuroImage 15:454–461
- 59. Biomarkers of Face Perception in Autism Spectrum Disorder: Time to Shift to Fast Periodic Visual Stimulation With Electroencephalography? Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 5:258–260
- 60. Reduced neural sensitivity to rapid individual face discrimination in autism spectrum disorder. NeuroImage: Clinical 21
- 61. Characterizing the neural signature of face processing in Williams syndrome via multivariate pattern analysis and event related potentials. Neuropsychologia 142
- 62. Infants’ recognition of their mothers’ faces in facial drawings. Dev Psychobiol 62:1011–1020
- 63. Neural mechanisms of rapid natural scene categorization in human visual cortex. Nature 460:94–97
- 64. Cortical Representations of Symbols, Objects, and Faces Are Pruned Back during Early Childhood. Cerebral Cortex 21:191–199
- 65. Representational similarity precedes category selectivity in the developing ventral visual pathway. NeuroImage 197:565–574
- 66. Visual cortex: Big data analysis uncovers food specificity. Current Biology 32:R1012–R1015
- 67. Selectivity for food in human ventral visual cortex. Communications Biology 6
- 68. Color-biased regions in the ventral visual pathway are food selective. Current Biology 33:134–146
- 69. Controlling low-level image properties: The SHINE toolbox. Behavior Research Methods 42:671–684
- 70. Evidence for long-range spatiotemporal interactions in infant and adult visual cortex. Journal of Vision 17
- 71. Parietal Contributions to Abstract Numerosity Measured with Steady State Visual Evoked Potentials. bioRxiv https://doi.org/10.1101/2020.08.06.239889
- 72. A new statistic for steady-state evoked potentials. Electroencephalography and Clinical Neurophysiology 78:378–388
- 73. Cue-Invariant Networks for Figure and Background Processing in Human Visual Cortex. J. Neurosci 26:11695–11708
- 74. An alternative method for significance testing of waveform difference potentials. Psychophysiology 30:518–524
- 75. Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods 164:177–190
- 76. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling :1–368
- 77. Imaging structural and functional brain development in early childhood. Nat Rev Neurosci 19:123–137
- 1. Temporal Processing Capacity in High-Level Visual Cortex Is Domain Specific. Journal of Neuroscience 35:12412–12424
- 2. Controlling low-level image properties: The SHINE toolbox. Behavior Research Methods 42:671–684
- 3. Visual category representations in the infant brain. Current Biology 32:5422–5432
- 4. Mapping longitudinal development of local cortical gyrification in infants from birth to 2 years of age. J Neurosci 34:4228–4238
- 5. Spatial correlation of the infant and adult electroencephalogram. Clinical Neurophysiology 114:1594–1608
- 6. Cue-Invariant Networks for Figure and Background Processing in Human Visual Cortex. J. Neurosci 26:11695–11708
- 7. An alternative method for significance testing of waveform difference potentials. Psychophysiology 30:518–524
- 8. Spatial frequency sweep VEP: Visual acuity during the first year of life. Vision Research 25:1399–1408
- 9. Development of contrast sensitivity in the human infant. Vision Research 30:1475–1486
- 10. Development of vision in infancy. Adler’s Physiology of the Eye, Elsevier Health Sciences :531–551
Copyright
© 2024, Yan et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.