The emergence of visual category representations in infants’ brains
eLife Assessment
This valuable study investigates the development of high-level visual responses in infants, finding that neural responses specific to faces are present by 4-6 months but not earlier. The study is methodologically convincing, using state-of-the-art experimental design and analysis approaches. The findings would be of broad interest to the cognitive neuroscience and developmental psychology research communities.
https://doi.org/10.7554/eLife.100260.3.sa0
Valuable: Findings that have theoretical or practical implications for a subfield
- Landmark
- Fundamental
- Important
- Valuable
- Useful
Convincing: Appropriate and validated methodology in line with current state-of-the-art
- Exceptional
- Compelling
- Convincing
- Solid
- Incomplete
- Inadequate
During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).
Abstract
Organizing the continuous stream of visual input into categories like places or faces is important for everyday function and social interactions. However, it is unknown when neural representations of these and other visual categories emerge. Here, we used steady-state evoked potential electroencephalography to measure cortical responses in infants at 3–4 months, 4–6 months, 6–8 months, and 12–15 months, when they viewed controlled, gray-level images of faces, limbs, corridors, characters, and cars. We found that distinct responses to these categories emerge at different ages. Reliable brain responses to faces emerge first, at 4–6 months, followed by limbs and places around 6–8 months. Between 6 and 15 months response patterns become more distinct, such that a classifier can decode what an infant is looking at from their brain responses. These findings have important implications for assessing typical and atypical cortical development as they not only suggest that category representations are learned, but also that representations of categories that may have innate substrates emerge at different times during infancy.
Introduction
Visual categorization is important for everyday activities and is amazingly rapid: adults categorize the visual input in about one-tenth of a second (Thorpe et al., 1996; Grill-Spector and Kanwisher, 2005). In adults and school-age children, this key behavior is supported by both clustered and distributed responses to visual categories in high-level visual cortex in ventral temporal and lateral occipitotemporal cortex (VTC and LOTC, respectively) (Grill-Spector and Weiner, 2014; Bugatus et al., 2017). A visual category consists of items that share common visual features and configurations (Grill-Spector and Kanwisher, 2005; Nordt et al., 2021; Margalit et al., 2020; Gomez et al., 2017; Stigliani et al., 2015); e.g., corridors share features of floors, walls, and ceilings, with a typical spatial relationship. Clustered regions in VTC and LOTC (Bugatus et al., 2017; Haxby et al., 2001; Nordt et al., 2023) respond more strongly to items of ecologically relevant categories (faces, bodies, places, words) than other stimuli (Nordt et al., 2021; Kanwisher et al., 1997; Epstein and Kanwisher, 1998; Downing et al., 2001; Golarai et al., 2007; Scherf et al., 2007; Dehaene-Lambertz et al., 2018) and distributed neural responses across VTC and LOTC (Bugatus et al., 2017; Haxby et al., 2001; Nordt et al., 2023) are reliable across items of a category but distinct across items of different categories. However, it is unknown when these visual category representations emerge in infants’ brains.
Behaviorally, infants can perform some level of visual categorization within the first year of life. Measurements of infants’ looking preferences and looking times suggest that visual saliency impacts young infants’ viewing patterns (Spriet et al., 2022): between 4 and 10 months of age, infants can behaviorally distinguish between faces and objects (Spriet et al., 2022; Mondloch et al., 1999) and between different animals like cats and dogs (Quinn et al., 1993; Younger and Fearing, 1999). Later on, between 10 and 19 months, infants behaviorally distinguish broader-level animate vs. inanimate categories (Spriet et al., 2022). Neurally, electroencephalographic (EEG) studies have found stronger responses to images of faces vs. objects or textures in 4- to 12-month-olds (Conte et al., 2020; Farzin et al., 2012; de Heering and Rossion, 2015) and that stimulus category can be decoded from distributed responses slightly but significantly above chance in 6- to 15-month-olds (Bayet et al., 2020; Xie et al., 2022). Functional magnetic resonance imaging (fMRI) studies have found stronger responses to videos of faces (Deen et al., 2017; Kosakowski et al., 2022), bodies (Kosakowski et al., 2022), and places (Deen et al., 2017; Kosakowski et al., 2022) vs. objects in clustered regions in VTC and LOTC of 2- to 10-month-olds. However, because prior studies used different types of stimuli and age ranges, it is unknown when representations to various categories emerge during the first year of life. To address this key open question, we examined when neural representations to different visual categories emerge during infancy using EEG in infants of four age groups spanning 3–15 months of age.
We considered two main hypotheses regarding the developmental trajectories of category representations. One possibility is that representations of multiple categories emerge together because infants need to organize the barrage of visual input to understand what they see. Supporting this hypothesis are findings of (i) selective responses to faces, places, and body parts in VTC and LOTC of 2- to 10-month-olds (Kosakowski et al., 2022), and (ii) above-chance classification of distributed EEG responses to toys, bodies, faces, and houses in 6- to 8-month-olds (Xie et al., 2022) as well as animals and body parts in 12- to 15-month-olds (Bayet et al., 2020).
Another possibility is that representations of different categories may emerge at different times during infancy. This may be due to two reasons. First, representations of ecologically relevant categories like faces, body parts, and places may be innate because of their evolutionary importance (Kanwisher, 2010; Sugita, 2009; Mahon and Caramazza, 2011; Bi et al., 2016), whereas representations for other categories may develop later only with learning (Nordt et al., 2021; Nordt et al., 2023; Behrmann and Plaut, 2015). Supporting this hypothesis are findings that newborns and young infants tend to orient to faces (Pascalis et al., 1995) and face-like stimuli (Johnson et al., 1991), as well as have cortical responses to face-like stimuli (Buiatti et al., 2019), but word representations only emerge in childhood with the onset of reading instruction (Nordt et al., 2021; Nordt et al., 2023; Dehaene et al., 2010). Second, even if visual experience is necessary for the development of category representations (including faces; Arcaro et al., 2019; Scott and Arcaro, 2023; Livingstone et al., 2017; Arcaro et al., 2017), categories that are seen more frequently earlier in infancy may develop before others. Measurements using head-mounted cameras suggest that infants’ visual diet (composition of visual input) varies across categories and age: the visual diet of 0- to 3-month-olds contains ~25% faces and <10% hands, that of 12- to 15-month-olds contains ~20% faces and ~20% hands (Fausey et al., 2016; Jayaraman et al., 2017), and that of 24-month-olds contains ~10% faces and ~25% hands. Thus, looking behavior in infants predicts that representations of faces may emerge before those of limbs.
Results
Forty-five infants from four age groups: 3–4 months (n=17, 7 females), 4–6 months (n=14, 7 females), 6–8 months (n=15, 6 females), and 12–15 months (n=15, 4 females) participated in EEG experiments. Twelve participants were part of an ongoing longitudinal study and came for several sessions spaced at least 3 months apart. Infants viewed gray-scale images from five visual categories present in infants’ environments (faces, limbs, corridors, characters, and cars) while EEG was recorded. Different from prior infant studies (Conte et al., 2020; de Heering and Rossion, 2015; Bayet et al., 2020; Xie et al., 2022; Deen et al., 2017; Kosakowski et al., 2022), we used images that have been widely used in fMRI studies (Nordt et al., 2021; Gomez et al., 2017; Allen et al., 2022; Lerma-Usabiaga et al., 2018; Jagadeesh and Gardner, 2022) and are largely controlled for low-level properties such as luminance, contrast, similarity, and spatial frequency (Figure 1B and Appendix 1—table 3). We used a steady-state visual evoked potential (SSVEP) paradigm (de Heering and Rossion, 2015; Farzin et al., 2012; Heinrich et al., 2009; Liu-Shuang et al., 2014): in each 70 s sequence, images from five categories were shown every 0.233 s; one of the categories was the target, so different images from that category appeared every 1.167 s, and the rest of the images were drawn from the other four categories in a random order (Figure 1A). Images of all categories appeared with equal probability and no images were repeated (Stigliani et al., 2015). Infants participated in five conditions, which varied by the target category.
We used the EEG-SSVEP approach because: (i) it affords a high signal-to-noise ratio with short acquisitions making it effective for infants (de Heering and Rossion, 2015; Farzin et al., 2012), (ii) it has been successfully used to study responses to faces in infants (de Heering and Rossion, 2015; Farzin et al., 2012; Rekow et al., 2021), and (iii) it enables measuring both general visual response to images by examining responses at the image presentation frequency (4.286 Hz) and category-selective responses by examining responses at the category frequency (0.857 Hz, Figure 1A).
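The relationship between the presentation timing and the two analysis frequencies can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the exact image rate is 30/7 Hz (which rounds to the reported 4.286 Hz and 0.233 s), and the function and variable names are hypothetical rather than taken from the study's code.

```python
from fractions import Fraction

# Assumption: the exact image rate is 30/7 Hz, which rounds to the reported 4.286 Hz.
IMAGE_FREQ_HZ = Fraction(30, 7)
IMAGES_PER_CATEGORY_CYCLE = 5   # one target-category image in every five images
SEQUENCE_DURATION_S = 70


def ssvep_timing(image_freq_hz, images_per_cycle, duration_s):
    """Derive periods, the category frequency, and cycle counts for one sequence."""
    category_freq_hz = image_freq_hz / images_per_cycle
    return {
        "image_period_s": 1 / image_freq_hz,
        "category_freq_hz": category_freq_hz,
        "category_period_s": 1 / category_freq_hz,
        "n_images": image_freq_hz * duration_s,
        "n_category_cycles": category_freq_hz * duration_s,
    }


timing = ssvep_timing(IMAGE_FREQ_HZ, IMAGES_PER_CATEGORY_CYCLE, SEQUENCE_DURATION_S)
for name, value in timing.items():
    print(f"{name}: {float(value):.3f}")
```

Under this assumption a 70 s sequence holds a whole number of image and category cycles, so both frequencies fall exactly on frequency bins of the sequence's spectrum.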
As the EEG-SSVEP paradigm is novel and we are restricted in the amount of data we can obtain in infants, we first tested if we can use this paradigm and a similar amount of data to detect category-selective responses in adults. Results in adults validate the SSVEP paradigm for measuring category selectivity, as they show that (i) category-selective responses can be reliably measured using EEG-SSVEP with the same amount of data as in infants (Appendix 1—figure 1, Appendix 1—figure 2), and that (ii) category information from distributed spatiotemporal response patterns can be decoded with the same amount of data as in infants (Appendix 1—figure 3).
As infants have lower cortical visual acuity, we also tested if the stimuli are distinguishable to infants. Thus, we simulated how they may look to infants by filtering the images to match the cortical acuity of 3-month-olds (Appendix 1—figure 4). Despite being blurry, images of different categories are readily distinguishable by adults (Videos 1–5), suggesting that there is sufficient visual information in the lower spatial frequencies of the stimuli for infants to distinguish visual categories.
Robust visual responses in occipital regions to visual stimuli in all infant age groups
We first tested if there are significant visual responses to our stimuli in infants’ brains by evaluating the amplitude of responses at the image presentation frequency (4.286 Hz) and its first three harmonics. We found that in all age groups, visual responses were concentrated spatially over occipital electrodes (Figure 2A–D, left panel, Appendix 1—figure 1). Quantification of the mean visual response amplitude over a region of interest (ROI) spanning nine electrodes over early visual cortex (occipital ROI) revealed significant responses in all infant age groups at the image frequency and its first three harmonics (response amplitudes significantly above zero with false discovery rate [FDR] corrected at four levels; except for the first harmonic at 8.571 Hz in 6- to 8-month-olds; Figure 2A–D, right panel). Analysis of visual responses separately by category condition revealed that visual responses were not significantly different across category conditions (Appendix 1—figure 5; no significant main effect of category, βcategory = 0.08, 95% CI: –0.08–0.24, t(301) = 0.97, p=0.33, or category by age interaction, βcategory × age = -0.04, 95% CI: –0.11–0.03, t(301) = –1.09, p=0.28, linear mixed model (LMM) on response amplitude to 4.286 Hz and its first three harmonics). We also tested if experimental noise varied across age groups. Noise level was estimated in the occipital electrodes by measuring the amplitude of response in frequencies up to 8.571 Hz excluding image presentation frequencies (4.286 Hz and harmonics) and category frequencies (0.857 Hz and harmonics) as this frequency range includes the relevant main harmonics. We found no significant difference in noise across age groups (Figure 2E). These analyses indicate that infants were looking at the stimuli, as there are significant visual responses even in the youngest 3- to 4-month-old infants, and there are no significant differences in noise levels across infants of different ages.
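To make the frequency-domain measurement concrete, the following minimal sketch reads out the amplitude of a signal at the image frequency and its first three harmonics using a one-bin discrete Fourier transform. It is not the study's pipeline: the sampling rate, window length, and synthetic signal are hypothetical, and the helper name `amplitude_at` is introduced here only for illustration.

```python
import cmath
import math


def amplitude_at(signal, sample_rate_hz, freq_hz):
    """Single-sided amplitude of a real signal at one frequency (one-bin DFT)."""
    n = len(signal)
    total = 0j
    for k, x in enumerate(signal):
        total += x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
    return 2.0 * abs(total) / n


# Hypothetical recording parameters, chosen so the window holds whole cycles.
SAMPLE_RATE_HZ = 210
DURATION_S = 7
IMAGE_FREQ_HZ = 30 / 7   # ~4.286 Hz

# Synthetic "occipital" signal: a response at the image frequency (2.0 units),
# a weaker one at twice that frequency (0.5 units), and nothing else.
signal = [
    2.0 * math.sin(2 * math.pi * IMAGE_FREQ_HZ * k / SAMPLE_RATE_HZ)
    + 0.5 * math.sin(2 * math.pi * 2 * IMAGE_FREQ_HZ * k / SAMPLE_RATE_HZ)
    for k in range(SAMPLE_RATE_HZ * DURATION_S)
]

# Image frequency plus its first three harmonics, as in the analysis above.
frequencies = [m * IMAGE_FREQ_HZ for m in range(1, 5)]
amplitudes = [amplitude_at(signal, SAMPLE_RATE_HZ, f) for f in frequencies]
for f, a in zip(frequencies, amplitudes):
    print(f"{f:.3f} Hz: amplitude {a:.3f}")
```

The same readout applied at 0.857 Hz and its harmonics would give the category response, and applied at neighboring frequencies it would give a noise estimate.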
Prior EEG data (Conte et al., 2020; Taylor et al., 1999) suggest that the timing and waveform of visual responses may vary across development. To complement the frequency domain analysis, we transformed the responses at image frequency and its harmonics to the time domain using an inverse Fourier transformation for two reasons. First, the time domain provides access to information about response timing and waveform that is not directly accessible from an analysis of responses of individual harmonics. Second, the total visual response is better reflected in the time domain as the individual harmonic amplitudes can sum constructively.
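A minimal sketch of this kind of time-domain reconstruction is given below, assuming a real-valued signal and a recording window that holds whole cycles. Only the image frequency and its multiples are kept, so activity at other frequencies drops out of the rebuilt waveform. The sampling rate, the synthetic signal, and the function names are hypothetical.

```python
import cmath
import math


def fourier_coefficient(signal, sample_rate_hz, freq_hz):
    """Complex Fourier coefficient of a real signal at one frequency."""
    n = len(signal)
    return sum(
        x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
        for k, x in enumerate(signal)
    ) / n


def reconstruct_cycle(signal, sample_rate_hz, base_freq_hz, n_components, n_points):
    """Rebuild one cycle from base_freq_hz and its multiples (inverse Fourier sum)."""
    coeffs = [
        fourier_coefficient(signal, sample_rate_hz, m * base_freq_hz)
        for m in range(1, n_components + 1)
    ]
    period_s = 1.0 / base_freq_hz
    waveform = []
    for i in range(n_points):
        t = i * period_s / n_points
        value = sum(
            2.0 * (c * cmath.exp(2j * math.pi * m * base_freq_hz * t)).real
            for m, c in enumerate(coeffs, start=1)
        )
        waveform.append(value)
    return waveform


SAMPLE_RATE_HZ = 210     # hypothetical
DURATION_S = 7           # hypothetical window holding whole cycles
IMAGE_FREQ_HZ = 30 / 7   # ~4.286 Hz

# Two image-locked components plus an unrelated 1 Hz oscillation.
signal = [
    1.0 * math.cos(2 * math.pi * IMAGE_FREQ_HZ * k / SAMPLE_RATE_HZ)
    + 0.8 * math.cos(2 * math.pi * 2 * IMAGE_FREQ_HZ * k / SAMPLE_RATE_HZ)
    + 0.5 * math.cos(2 * math.pi * 1.0 * k / SAMPLE_RATE_HZ)
    for k in range(SAMPLE_RATE_HZ * DURATION_S)
]

waveform = reconstruct_cycle(signal, SAMPLE_RATE_HZ, IMAGE_FREQ_HZ, n_components=4, n_points=49)
print(f"peak of reconstructed waveform: {max(waveform):.3f}")
```

In this synthetic case the two image-locked components are in phase at the start of the cycle, so the peak of the waveform exceeds the amplitude of either component alone, which illustrates how harmonics can sum constructively in the time domain.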
We observed that during the 233 ms image presentation, temporal waveforms had two deflections in 3- to 4-month-olds (one negativity and one positivity, Figure 2F) and four deflections for infants older than 4 months (two minima and two maxima, Figure 2F). To evaluate developmental effects, we examined the latency and amplitude of the peak visual response during two time windows related to the first deflection (60–90 ms), and the second deflection (90–160 ms for 3- to 4-month-olds, and 90–110 ms for other age groups). In general, we find that the latency of the peak deflection decreased from 3 to 15 months (Figure 2G and H). As data includes both cross-sectional and longitudinal measurements and we observed larger development earlier than later in infancy, we used an LMM to model peak latency as a function of the logarithm of age (see Methods). Results reveal that the latency of the peak deflection significantly and differentially decreased with age in the two time windows (βage × time window = –45.78, 95% CI: –58.39 to –33.17, t(118) = –7.19, p=6.39 × 10⁻¹¹; LMM with age and time window as fixed effects, and participant as a random effect, all stats in Appendix 1—table 4, Appendix 1—table 5). There were larger decreases in the peak latency in the second than first time window (Figure 2G and H, first: βage = –7.44, 95% CI: –13.82 to –1.06, t(118) = –2.33, pFDR<0.05; second: βage = –46.91, 95% CI: –56.56 to –37.27, t(59) = –9.73, pFDR<0.001). Peak amplitude also differentially develops across the two windows (βage × time window = –4.90, 95% CI: –8.66 to –1.14, t(118) = –2.58, p=0.01, Appendix 1—table 6, Appendix 1—table 7). The decrease in peak amplitude with age was significant only for the second deflection (βage = –3.59, 95% CI: –6.38 to –0.81, t(59) = –2.58, p=0.01, LMM). These data suggest that the temporal dynamics of visual responses over occipital cortex develop from 3 to 15 months of age.
What is the nature of category-selective responses in infants?
We next examined if in addition to visual responses to the rapid image stream, there are also category-selective responses in infants, by evaluating the amplitude of responses at the category frequency (0.857 Hz) and its harmonics. This is a selective response as it reflects the relative response to images of a category above the general visual response. Figure 3 shows the spatial distribution and amplitude of the mean category response for faces and its harmonics in each age group. Mean category-selective responses to limbs, cars, corridors, and characters are shown in Appendix 1—figures 6–9. We analyzed mean responses over two ROIs spanning seven electrodes each over the left (LOT) and right occipitotemporal (ROT) cortex where high-level visual regions are located (Xie et al., 2019).
We found significant group-level category responses to some but not all categories and a differential development of category-selective responses during infancy. The largest and earliest developing category-selective responses were to faces. In contrast to visual responses, which were centered over occipital electrodes (Figure 2A–D, left panel), significant categorical responses to faces (at 0.857 Hz and its first harmonic, 1.714 Hz) were observed over lateral occipitotemporal electrodes (Figure 3A–D, left panel). Notably, there were significant responses to faces over bilateral occipitotemporal electrodes in 4- to 6-month-olds at 0.857 Hz (Figure 3B, response amplitudes significantly above zero with Hotelling’s T2 statistic, pFDR<0.05, FDR corrected over two levels: the category frequency and its first harmonic), as well as 6- to 8-month-olds and 12- to 15-month-olds at the category frequency and its first harmonic (Figure 3C and D, both pFDR<0.05). However, there were no significant responses to faces in 3- to 4-month-olds at either the category frequency or its harmonics (Figure 3A, right panel). These data suggest that face-selective responses start to reliably emerge over lateral occipitotemporal cortex between 4 and 6 months of age.
We did not find significant group-level category-selective responses that survived FDR correction to any of the other categories before 6 months of age (Appendix 1—figures 6–9, except for a weak but statistically significant response for cars in the ROT ROI in 3- to 4-month-olds). Instead, we found significant category-selective responses that survived FDR correction for (i) limbs in 6- to 8-month-olds in the ROT ROI (Appendix 1—figure 6), (ii) corridors in 6- to 8-month-olds and 12- to 15-month-olds in the left occipitotemporal (LOT) ROI (Appendix 1—figure 7), and (iii) characters in 6- to 8-month-olds in the ROT ROI, and in 12- to 15-month-olds in bilateral occipitotemporal ROIs (Appendix 1—figure 8).
We next examined the development of the category-selective responses separately for the right and left lateral occipitotemporal ROIs. The response amplitude was quantified by the root mean square (RMS) amplitude value of the responses at the category frequency (0.857 Hz) and its first harmonic (1.714 Hz) for each category condition and infant. With an LMM analysis, we found significant development of response amplitudes in both occipitotemporal ROIs, which varied by category (LOT ROI: βcategory × age = –0.21, 95% CI: –0.39 to –0.04, t(301) = –2.40, pFDR<0.05; ROT ROI: βcategory × age = –0.26, 95% CI: –0.48 to –0.03, t(301) = –2.26, pFDR<0.05, LMM as a function of log(age) and category; participant: random effect).
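As a small worked example of the RMS summary, the sketch below combines two amplitudes into one value. It assumes the RMS is taken over the amplitudes at the category frequency and its first harmonic; the amplitude values are made up for illustration and the function name is hypothetical.

```python
import math


def rms_amplitude(amplitudes):
    """Root mean square of a list of response amplitudes."""
    return math.sqrt(sum(a * a for a in amplitudes) / len(amplitudes))


# Hypothetical amplitudes (arbitrary units) at 0.857 Hz and 1.714 Hz.
response = rms_amplitude([0.6, 0.8])
print(f"RMS amplitude: {response:.3f}")
```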
We evaluated the temporal dynamics of category-selective waveforms by transforming the data at the category frequency and its harmonics to the time domain. This analysis was done separately for each of the LOT and ROT ROIs for each category and age group. Consistent with frequency domain analyses, average temporal waveforms over lateral-occipital ROIs show significant responses to faces that emerge at ~4 months of age (Figure 4A, significant responses relative to zero, cluster-based nonparametric permutation 10,000 times, two-tailed t-test, at p<0.05). The temporal waveforms of responses to faces in infants show an initial positive deflection peaking ~500 ms after stimulus onset followed by a negative deflection peaking at ~900 ms. Notably, mean waveforms associated with limbs, corridors, and characters in lateral occipital ROIs are different from faces: there is only a single negative deflection that peaks at ~500 ms after stimulus onset, which is significant only in 6- to 8- and 12- to 15-month-olds (Figure 4B–D). There was no significant category response to cars in infants except for a late (~1000 ms) positive response in 4- to 6-month-olds (Figure 4E). These results show that both the timing and waveform differ across categories, which suggests that there might be additional category information in the distributed spatiotemporal response.
We next examined the development of the peak response and latency of the category waveforms separately for the right and left lateral occipitotemporal ROIs. We found significant development of the peak response in the right lateral occipitotemporal ROI which varied by category (βcategory × age = –1.09, 95% CI: –2.00 to –0.14, t(301) = –2.26, pFDR<0.05, LMM as a function of log(age) and category; participant: random effect). Post hoc analyses revealed that the peak response for faces significantly increased from 3 to 15 months (Figure 4A right, βage = 7.27, 95% CI: 4.03–10.51, t(59) = 4.50, pFDR<0.05, LMM as a function of log(age); participant: random effect) and the peak response for limbs significantly decreased (Figure 4B right, βage = –2.90, 95% CI: –5.41 to –0.38, t(59) = –2.31, p=0.02, not significant after FDR correction over five category levels). There were no other significant developments of peak amplitude (Appendix 1—table 8, Appendix 1—table 9).
Additionally, for all categories, the latency of the peak response in the ROT ROI significantly decreased from 3 to 15 months of age (βage = –173.17, 95% CI: –284.73 to –61.61, t(301) = –3.05, p=0.002, LMM as a function of log(age) and category; participant: random effect). We found no significant development of peak latency in the LOT ROI (Appendix 1—table 8, Appendix 1—table 9).
Are spatiotemporal patterns of responses to visual categories consistent across infants?
As we observed different mean waveforms over the lateral occipital ROIs for the five categories (Figure 4), we asked whether the distributed spatiotemporal patterns of brain responses evoked by each category are unique and reliable across infants. We reasoned that if different categories generated consistent distributed spatiotemporal responses, an independent classifier would be able to predict the category an infant was viewing from their distributed spatiotemporal pattern of response. Thus, we used a leave-one-out cross-validation (LOOCV) approach (Figure 5A) and tested if a classifier can decode the category a left-out infant viewed based on the similarity of their distributed spatiotemporal response to the mean response to each of the categories in the remaining N–1 infants. We calculated for each infant the mean category waveform (same as Figure 4) across the occipital and lateral occipitotemporal ROIs and concatenated the waveforms across the three ROIs to generate the distributed spatiotemporal response to a category (Figure 5A). The classifier was trained and tested separately for each age group.
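The leave-one-out scheme described above can be sketched as a nearest-template classifier. This is a minimal illustration rather than the study's implementation: it assumes Pearson correlation as the similarity measure (the text does not name one), runs on synthetic patterns, and uses hypothetical function and variable names.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def loocv_accuracy(data):
    """data[infant][category] is one concatenated spatiotemporal pattern.

    Returns the fraction of categories decoded correctly for each held-out infant.
    """
    infants = list(data)
    categories = list(data[infants[0]])
    accuracy = {}
    for held_out in infants:
        training = [i for i in infants if i != held_out]
        # Mean pattern per category across the remaining N-1 infants.
        templates = {
            c: [sum(v) / len(v) for v in zip(*(data[i][c] for i in training))]
            for c in categories
        }
        hits = 0
        for true_category in categories:
            pattern = data[held_out][true_category]
            # Assumed decision rule: pick the most correlated template.
            predicted = max(categories, key=lambda c: pearson(pattern, templates[c]))
            hits += predicted == true_category
        accuracy[held_out] = hits / len(categories)
    return accuracy


# Synthetic demo: 4 hypothetical infants, 5 categories, 60 samples per pattern.
rng = random.Random(0)
CATEGORIES = ["faces", "limbs", "corridors", "characters", "cars"]
data = {
    f"infant_{i}": {
        c: [math.sin(2 * math.pi * (t / 60 + j / 5)) + rng.gauss(0, 0.1) for t in range(60)]
        for j, c in enumerate(CATEGORIES)
    }
    for i in range(4)
}
accuracy = loocv_accuracy(data)
print(accuracy)
```

With five categories, chance performance for such a classifier is 20%; the synthetic patterns here are deliberately well separated, so real infant data would be expected to decode far less cleanly.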
Results reveal two main findings. First, the LOOCV classifier decoded category information from brain responses significantly above the 20% chance level in infants aged 6 months and older but not in younger infants (Figure 5B, 6- to 8-month-olds, significant above chance: t(14) = 4.1, pFDR<0.01, one-tailed, FDR corrected over four age groups; 12- to 15-month-olds, t(14) = 3.4, pFDR<0.01). This suggests that spatiotemporal patterns of responses to different categories become reliable across infants after 6 months of age. Second, examination of classification by category shows that the LOOCV classifier successfully determined from spatiotemporal responses when infants were viewing faces in 64% of 4- to 6-month-olds, in 93% of 6- to 8-month-olds, and 87% of 12- to 15-month-olds (Figure 5C). In contrast, classification performance was numerically lower for the other categories (successful classification in less than 40% of the infants). This suggests that a reliable spatiotemporal response to faces that is consistent across infants develops after 4 months of age and dominates classification performance.
What is the nature of categorical spatiotemporal patterns in individual infants?
While the prior analyses leverage the power of averaging across electrodes and infants, this averaging does not provide insight into fine-grained neural representations within individual infants. To examine the finer-grain representation of category information within each infant’s brain, we examined the distributed spatiotemporal responses to each category across the 23 electrodes spanning the LOT and ROT cortex in each infant. We tested: (i) if categorical representations in an infant’s brain are reliable across different images of a category, and (ii) if category representations become more distinct during the first year of life. We predicted that if representations become more similar across items of a category and more dissimilar between items of different categories, then category distinctiveness (defined as the difference between mean within- and between-category similarity) would increase from 3 to 15 months of age.
To examine the representational structure, we calculated representation similarity matrices (RSMs) across odd/even split-halves of the data in each infant. Each cell in the RSM quantifies the similarity between two spatiotemporal patterns: On-diagonal cells of the RSM quantify the similarity of distributed spatiotemporal responses to different images from the same category and off-diagonal cells quantify the similarity of spatiotemporal responses to images from different categories. Categorical structure will manifest in RSMs as positive on diagonal values indicating reliable within-category spatiotemporal responses which are higher than off-diagonal between category similarity (Figure 6, Appendix 1—figure 3B, and Appendix 1—figure 10).
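The split-half RSM and the distinctiveness measure can be sketched as follows. This is an illustrative example on synthetic patterns, not the study's code: it assumes Pearson correlation as the similarity measure and assumes that the between-category term for a given category averages the off-diagonal cells in that category's row and column. All names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def split_half_rsm(odd, even, categories):
    """rsm[a][b]: similarity of category a (odd half) to category b (even half)."""
    return [[pearson(odd[a], even[b]) for b in categories] for a in categories]


def category_distinctiveness(rsm):
    """Within-category similarity minus mean between-category similarity."""
    n = len(rsm)
    scores = []
    for i in range(n):
        # Assumption: use the off-diagonal cells in row i and column i.
        between = [rsm[i][j] for j in range(n) if j != i]
        between += [rsm[j][i] for j in range(n) if j != i]
        scores.append(rsm[i][i] - sum(between) / len(between))
    return scores


rng = random.Random(1)
CATEGORIES = ["faces", "limbs", "cars"]


def noisy_pattern(phase):
    """Synthetic spatiotemporal pattern: a phase-shifted sinusoid plus noise."""
    return [math.sin(2 * math.pi * (t / 60 + phase)) + rng.gauss(0, 0.1) for t in range(60)]


odd = {c: noisy_pattern(j / 3) for j, c in enumerate(CATEGORIES)}
even = {c: noisy_pattern(j / 3) for j, c in enumerate(CATEGORIES)}
rsm = split_half_rsm(odd, even, CATEGORIES)
distinctiveness = category_distinctiveness(rsm)
for c, d in zip(CATEGORIES, distinctiveness):
    print(f"{c}: distinctiveness {d:.2f}")
```

A positive score means that responses to a category are more similar across the two halves of the data than they are to responses to other categories; scores near zero or below indicate no categorical structure.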
Examination of mean RSMs in each age group reveals no reliable category information in individual 3- to 4-month-olds or 4- to 6-month-olds, as within-category similarity is not significantly above zero (Figure 6A; 3- to 4-month-olds: on-diagonal, –0.03 ± 0.06, p=0.96, one-tailed; 4- to 6-month-olds: on-diagonal, 0.009 ± 0.11, p=0.38). However, starting around 6 months some category structure emerges in the RSMs. In particular, distributed responses to faces become reliable as within-category similarity for faces is significantly above zero in 6- to 8-month-olds (Figure 6A, 0.31 ± 0.24, t(14) = 5.1, pFDR<0.05, FDR corrected over five category levels), and stays reliable in 12- to 15-month-olds (Figure 6A, 0.26 ± 0.24, t(14) = 4.18, pFDR<0.05). Distributed responses to limbs become reliable later on as within-category similarity for limbs is significantly above zero in 12- to 15-month-olds (Figure 6A, 0.11 ± 0.21, t(14) = 1.98, p=0.03, but not surviving FDR correction at five levels).
Next, we evaluated the development of category distinctiveness, which was calculated for each category and infant. Individual infants’ category distinctiveness is shown in Figure 6B (infants ordered by age) and in the scatterplots in Figure 6C. In infants younger than 4 months (120 days) category distinctiveness is largely close to zero or even negative, suggesting no differences between spatiotemporal responses to one category vs. another. Category distinctiveness increases with age and becomes more positive from 84 to 445 days of age (Figure 6B and C). The biggest increase is for faces where after ~6 months of age (194 days) face distinctiveness is consistently positive in individual infants (13/15 infants aged 194–364 days and 12/15 infants aged 365–445 days). The increase in distinctiveness is more modest for other categories and appears later in development. For example, positive distinctiveness for limbs and cars in individual infants is consistently observed after 12 months of age (Figure 6B and C; limbs: 9/15 infants aged 365–445 days vs. 5/15 infants aged 194–364 days; cars: 12/15 365–445 days vs. 7/15 194–364 days).
Using LMMs we determined if distinctiveness significantly changed with age (log transformed) and category (participant, random factor). Results indicate that category distinctiveness significantly increased from 3 to 15 months of age (βage = 0.77, 95% CI: 0.54–1.00, t(301) = 6.62, p=1.67×10⁻¹⁰), and further that development significantly varies across categories (βage × category = –0.13, 95% CI: –0.2 to –0.06, t(301) = –3.61, p=3.5×10⁻⁴; main effect of category, βcategory = 0.27, 95% CI: 0.11–0.43, t(301) = 3.38, p=8.2×10⁻⁴). Post hoc analyses for each category (Figure 6C) reveal that distinctiveness significantly increased with age for faces (βage = 0.9, 95% CI: 0.6–1.1, t(59) = 6.8, pFDR<0.001), limbs (βage = 0.4, 95% CI: 0.2–0.6, t(59) = 5.0, pFDR<0.001), characters (βage = 0.2, 95% CI: 0.02–0.3, t(59) = 2.2, pFDR<0.05), and cars (βage = 0.4, 95% CI: 0.2–0.5, t(59) = 3.7, pFDR<0.001). Post hoc t-tests show that for faces, the category distinctiveness is significantly above zero after 6 months of age (6- to 8-month-olds, t(14) = 6.73, pFDR<0.05; 12- to 15-month-olds, t(14) = 5.30, pFDR<0.05) and for limbs and cars at 12–15 months of age (limbs: t(14) = 2.19, pFDR<0.05; cars: t(14) = 4.53, pFDR<0.05). This suggests that category distinctiveness slowly emerges in the visual cortex of infants from 3 to 15 months of age, with the largest and fastest development for faces.
Discussion
We find that both selective responses to items of a category over others across lateral occipital ROIs and the distinctiveness of distributed visual category representations progressively develop from 3 to 15 months of age. Notably, we find a differential development of category-selective responses (Figure 7), whereby responses to faces emerge the earliest, at 4–6 months of age, and continue to develop through the first year of life. Category-selective responses to limbs, corridors, and characters follow, emerging at 6–8 months of age. Our analysis of the distinctiveness of the distributed spatiotemporal patterns to each category also finds that distributed representations to faces become more robust in 6- to 8-month-olds and remain robust in 12- to 15-month-olds, whereas the distinctiveness of distributed patterns to limbs and cars only becomes reliable at 12–15 months of age. Together these data suggest a rethinking of the development of category representations during infancy as they not only suggest that category representations are learned, but also that representations of categories that may have innate substrates such as faces, bodies, and places emerge at different times during infancy.
Reliable category representations start to emerge at 4 months of age
While 3- to 4-month-old infants have significant and reliable evoked visual responses over early visual cortex, we find no reliable category representations of faces, limbs, corridors, or characters in these young infants. Neither analyses of mean responses across lateral occipital ROIs nor analyses of distributed spatiotemporal patterns across visual cortex reveal reliable category representations in 3- to 4-month-olds. The earliest categorical responses we find are for faces, and they emerge at 4–6 months of age.
Is it possible that there are some category representations in 3- to 4-month-olds, but we lack the sensitivity to measure them? We believe this is unlikely, because (i) we can measure significant visual responses from the same 3- to 4-month-olds, and (ii) with the same amount of data, we can measure category-selective responses and decode category information from distributed spatiotemporal responses in infants older than 4 months and in adults. As using SSVEP to study high-level representations is a nascent field (Gentile and Rossion, 2014; Retter et al., 2020; Peykarjou, 2022), future work can further examine how SSVEP parameters such as stimulus and target category presentation rate may affect the sensitivity of measurements in infants (see review by Peykarjou, 2022).
Our findings together with a recent fMRI study in 2- to 10-month-olds (Kosakowski et al., 2022) provide accumulating evidence for multiple visual category representations in infants’ brains before the age of one. However, there are also differences across studies. The earliest we could find reliable group-level category-selective responses for faces was in 4- to 6-month-olds and for limbs and corridors only after 6 months of age. In contrast, Kosakowski et al., 2022, report category-selective responses to faces, bodies, and scenes in example 4- to 5-month-olds. Group average data in their study found significant face- and place-selective responses in infants’ VTC but not in LOTC, and significant body-selective responses in LOTC, but not VTC. Because Kosakowski et al., 2022, report group-averaged data across infants spanning 8 months, their study does not provide insights into the time course of this development. We note that the studies differ in several ways: (i) measurement modalities (fMRI in Kosakowski et al., 2022, and EEG here), (ii) the types of stimuli infants viewed: in Kosakowski et al., 2022, infants viewed isolated, colored, and moving stimuli, but in our study, infants viewed still, gray-level images on phase-scrambled backgrounds, which were controlled for several low-level properties, and (iii) the contrasts used to detect category-selective responses: in Kosakowski et al., 2022, the researchers identified, within predefined parcels, the top 5% of voxels that responded to the category of interest vs. objects, whereas here we contrasted the category of interest vs. all other categories the infant viewed. Thus, future research is necessary to determine whether differences between findings are due to differences in measurement modalities, stimulus format, or data analysis choices.
Face representations emerge around 4–6 months of age
Recognizing faces (e.g. a caregiver’s face) is crucial for infants’ daily lives. Converging evidence from many studies suggests that infants have significant and reliable face-selective neural responses at 4–6 months of age (Farzin et al., 2012; de Heering and Rossion, 2015; Deen et al., 2017; Guy et al., 2016; Halit et al., 2004). While some studies report responses to face-like (high-contrast paddle-like) stimuli in newborns (Johnson et al., 1991; Buiatti et al., 2019; Goren et al., 1975) and significant visual evoked responses to faces in 3-month-olds (Halit et al., 2003; Cassia et al., 2006; Peykarjou, 2022; Tzourio-Mazoyer et al., 2002), these studies have largely compared responses to an isolated face vs. another isolated object. In contrast, we do not find reliable face-selective responses (Figures 3–4) or reliable distributed representations (Figures 5–6) to faces in 3- to 4-month-olds when responses to faces are contrasted to many other items and when stimuli are shown on a background rather than in isolation. Our findings are consistent with longitudinal research in macaques showing that robust cortical selectivity to faces takes several months to emerge (Livingstone et al., 2017) and support the hypothesis that experience with faces is necessary for the development of cortical face selectivity (Arcaro et al., 2019; Livingstone et al., 2017; Arcaro et al., 2017).
Our data also reveal that face-selective responses and distributed representations to faces become more robust in 6- to 8-month-olds and remain robust in 12- to 15-month-olds. For example, successful decoding of faces at the group level was observed in 80% of individual infants based on several minutes of EEG data. Reliable distributed spatiotemporal responses to different images of faces become significantly different from responses to images from different categories. This robust decoding has important clinical ramifications as it may serve as an early biomarker for cortical face processing, which is important for early detection of social and cognitive developmental disorders such as autism (Rossion, 2020; Vettori et al., 2019) and Williams syndrome (Farran et al., 2020). Future research is necessary for elucidating the relationship between the development of brain responses to faces and infant behavior. For example, it is interesting that at 6 months of age, when we find robust face representations, infants also start to exhibit recognition of familiar faces (like parents) and stranger anxiety (Kobayashi et al., 2020).
One fascinating aspect of the development of cortical face selectivity is that among the categories we tested, selectivity to faces seems to emerge the earliest, at around 4 months of age, yet the development of selectivity and distributed representations to faces is protracted compared to objects and places (Golarai et al., 2007; Peelen et al., 2009; Scherf et al., 2007). Indeed, in both our data and prior work, face-selective responses and distributed representations to faces in infants are immature compared to adults (Conte et al., 2020; Xie et al., 2022), and a large body of work has shown that face selectivity (Nordt et al., 2021; Golarai et al., 2007; Scherf et al., 2007; Taylor et al., 1999; Peelen et al., 2009; Cantlon et al., 2011; Cohen et al., 2019) and distributed representations to faces (Nordt et al., 2023) continue to develop during childhood and adolescence. This suggests that not only experience during infancy but also life-long experience with faces sculpts cortical face selectivity. We speculate that the extended cortical plasticity for faces may be due to both the expansion of social circles (family, friends, and acquaintances) across the lifespan and the changing statistics of the faces we socialize with (e.g. child and adult faces have different appearances).
A new insight about cortical development: different category representations emerge at different times during infancy
To our knowledge, our study is the first to examine the development of both ROI-level and spatiotemporal distributed responses in infants across the first year of life. We note that both analyses find that category information to faces develops before other categories. However, there are also some differences across analyses (Figure 7). For example, for limbs and corridors we find significant category-selective responses at the ROI level in lateral occipitotemporal ROIs starting at 6–8 months but no reliable distinct distributed responses across visual cortex at this age. In contrast, for cars, we find the opposite pattern, where there is a distinct spatiotemporal pattern in 12- to 15-month-olds even though there is no significant car-selective response at the ROI level. As these approaches have different sensitivities, they reveal insights into the nature of the underlying representations. For example, as visible in Figure 4, limbs and corridors have a clear category-selective waveform in both 6- to 8- and 12- to 15-month-olds, but the waveform of limbs and its spatial distribution are not that different from those of corridors, which may explain why distinctiveness of spatiotemporal patterns for limbs is low in 6- to 8-month-olds (Figure 6). Likewise, even though there is no significant response for cars (Figure 4e), its spatiotemporal pattern is consistently different from that of other categories, giving rise to a distinctive spatiotemporal response by 12 months (Figure 6).
In sum, the key finding from our study is that the development of category selectivity during infancy is non-uniform: face-selective responses and representations of distributed patterns develop before representations to limbs and other categories. We hypothesize that this differential development of visual category representations may be due to differential visual experience with these categories during infancy. This hypothesis is consistent with behavioral research using head-mounted cameras that revealed that the visual input during early infancy is dense with faces, while hands become more prevalent in the visual input later in development and especially when in contact with objects (Fausey et al., 2016; Jayaraman et al., 2017). Additionally, a large body of research has suggested that young infants preferentially look at faces and face-like stimuli (Spriet et al., 2022; Mondloch et al., 1999; Pascalis et al., 1995; Johnson et al., 1991), as well as look longer at faces than other objects (Fausey et al., 2016), indicating that not only the prevalence of faces in babies’ environments but also longer looking times may drive the early development of face representations. Further supporting the role of visual experience in the formation of category selectivity is a study that found that infant macaques that are reared without seeing faces do not develop face selectivity but develop selectivity to other categories in their environment like body parts (Arcaro et al., 2017). An alternative hypothesis is that differential development of category representations is of maturational origin. For example, we found differences in the temporal dynamics of visual responses among four infant age groups, which suggests that the infant’s visual system is still developing during the first year of life. 
While the underlying maturational mechanisms are as yet unknown, they may include myelination and cortical tissue maturation (Grotheer et al., 2022; Natu et al., 2021; Tooley et al., 2021; Adibpour et al., 2024; Lebenberg et al., 2019; Gilmore et al., 2018). Future studies can test experience-driven vs. maturational alternatives by examining infants’ visual diet, looking behavior, and anatomical brain development, and can examine responses using additional behaviorally relevant categories such as food (Bannert and Bartels, 2022; Jain et al., 2023; Pennock et al., 2023). These measurements can test how environmental and individual differences in visual experiences may impact infants’ developmental trajectories. Specifically, a visual experience account predicts that differences in visual experience would translate into differences in brain development, but a maturational account predicts that visual experience will have no impact on the development of category representations.
Together our findings not only suggest that visual experience is necessary for the development of visual category representations, including faces, but also necessitate a rethinking of how visual category representations develop in infancy. Moreover, this differential development during infancy is evident even for categories that have evolutionary importance and may have innate substrates such as faces, bodies, and places (Kanwisher, 2010; Sugita, 2009; Mahon and Caramazza, 2011; Bi et al., 2016). Finally, our findings have important ramifications for theoretical and computational models of visual development as well as for the assessment of atypical infant development.
Methods
Participants
Ethical permission for the study was obtained from the Institutional Review Board of Stanford University. Parents of the infant participants provided written informed consent prior to their first visit and also prior to each session if they came for multiple sessions. Participants were paid $20/hr for participation. Participants were recruited via ads on social media (Facebook and Instagram).
Sixty-two full-term, typically developing infants were recruited. Twelve participants were part of an ongoing longitudinal study that obtained both anatomical MRI and EEG data in infants. Some of the infants participated in both studies and some only in one of the studies. Infants were recruited as newborns and at around 3 months, 6 months, and 12 months of age. We did not recruit infants between 8 and 12 months of age because around 9 months there is little contrast between gray and white matter in anatomical MRI scans that were necessary for the MRI study. Infants came for several sessions spaced ~3 months apart (seven 3- to 4-month-olds, three 4- to 6-month-olds, eight 6- to 8-month-olds, and twelve 12- to 15-month-olds). Data from 19 infants (nine 3- to 4-month-olds, six 4- to 6-month-olds, and eight 6- to 8-month-olds; among whom seven were longitudinal) were acquired in two visits within a 2-week span to obtain a sufficient number of valid data epochs. Appendix 1—table 1 contains participants’ demographic information (sex and race). The youngest infants were 3 months of age, as the EEG setup requires the infants to be able to hold their head and look at the screen in front of them. Twenty-three adults (14 females) also participated in the study. All participants had normal/corrected-to-normal vision and provided written informed consent.
Data exclusion criteria: We excluded participants who had fewer than 20 valid epochs (1.1667 s/epoch) per category, had noise/muscular artifacts during the EEG recordings, had no recorded data because of technical issues, or had no visual responses over the occipital electrodes. As such, we excluded (i) five infants due to an insufficient number of epochs, (ii) two infants who had no visual responses, (iii) ten infants due to technical issues during data collection, and (iv) three adults due to excessive noise/muscular artifacts during EEG. In total, we report data from 45 infants (Appendix 1—table 1) and 20 adults (13 females, 19–38 years) that met inclusion criteria.
Visual stimuli
Natural grayscale images of adult faces, limbs, corridors, characters, and cars were used as stimuli, with 144 images per category from the fLoc image database (https://github.com/VPNL/fLoc, copy archived at Yan, 2024; Stigliani et al., 2015). The size, view, and retinal position of the items varied, and the items were overlaid on phase-scrambled backgrounds that were generated from a randomly drawn image in the stimulus set. The images were also controlled for multiple low-level differences between stimuli of different categories including their luminance, contrast, similarity, and spatial frequency power distributions using the SHINE toolbox (Willenbockel et al., 2010). As only five of ten categories from Stigliani et al., 2015, were used, we evaluated the stimuli used in our experiments to test if they differed in (i) contrast, (ii) luminance, (iii) similarity, and (iv) spatial frequency. Results show that categories were largely matched on most metrics (Figure 1B and Appendix). The stimuli were presented on a gamma-corrected OLED monitor screen (SONY PVM-2451; SONY Corporation, Tokyo, Japan) at a screen resolution of 1920 × 1080 pixels and a monitor refresh rate of 60 Hz. When viewed from 70 cm away, each image subtended a visual angle of approximately 12°.
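The relation between viewing distance, visual angle, and on-screen stimulus size can be illustrated with a short calculation. This is only a sketch using the standard visual-angle formula; the function name is hypothetical and the resulting size is derived here, not reported in the study.

```python
import math

def stimulus_size_cm(visual_angle_deg, distance_cm):
    """Physical extent of a stimulus that subtends a given visual angle."""
    return 2.0 * distance_cm * math.tan(math.radians(visual_angle_deg) / 2.0)

# Values from the text: ~12 degrees viewed from 70 cm.
size_cm = stimulus_size_cm(12.0, 70.0)  # roughly 14.7 cm on the screen
```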
EEG protocol
The experiments were conducted in a calm, dimly illuminated lab room. Stimuli were presented using custom stimulus presentation software with millisecond timing precision. During testing, infant participants were seated on their parent’s lap in front of the screen at a distance of 70 cm. One experimenter stood behind the presentation screen to monitor where the infant was looking. The visual presentation was paused if the infant looked away from the screen and was continued when the infant looked again at the center of the screen. To motivate infants to fixate and look at the screen, we presented at the center of the screen small (~1°) colored cartoon images such as butterflies, flowers, and ladybugs. They were presented in random order with durations uniformly distributed between 1 and 1.5 s. For adults, we used a fixation cross of the same size instead of the cartoons and asked the participants to fixate and indicate when the fixation’s color changed by pressing the space bar on a keyboard. EEG measurements for infant participants continued until the infant no longer attended to the screen, and we obtained between 2 and 12 different 70 s sequences per individual. For adult participants, we acquired 12 sequences per individual.
A frequency-tagging paradigm (de Heering and Rossion, 2015; Farzin et al., 2012) was used to measure brain responses. In the experiment, randomly selected images from five categories were presented sequentially at a rate of 4.286 Hz (~233 ms per image) with no inter-stimulus interval during each 70 s sequence. For each condition, one category was designated as the target category: each cycle of five images began with a randomly selected image from the target category, followed by four images randomly drawn from the other four categories in no regular order (Figure 1A). The target images were therefore presented periodically at 0.857 Hz (i.e. 4.286 Hz/5), but the intervals between sequential presentations of images from the other four categories were not periodic. The probability of image occurrences across categories was equal at 20%. The experiment had five conditions, one for each of the following target categories: faces, limbs, corridors, characters, and cars. Each 70 s experimental sequence was composed of five 14 s long conditions, each of which included a 1.1667 s stimulus fade-in and a 1.1667 s stimulus fade-out. The order of the category conditions was randomized within each 70 s sequence. No image was repeated within a sequence. Two presentation frequencies were embedded in the experiment: (i) the image frequency (4.286 Hz), which is predicted to elicit visual responses to all stimuli over occipital visual cortex, and (ii) the category frequency (0.857 Hz), which is predicted to elicit a category-selective response over lateral occipital-temporal electrodes.
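To make the structure of a sequence concrete, the following minimal sketch generates the category label of every image in one 70 s sequence. It is an illustration under stated assumptions, not the stimulus presentation software used in the study: the function names are hypothetical, the image rate is taken as exactly 30/7 Hz (~4.286 Hz), and we assume that each five-image cycle contains one image from each non-target category in shuffled order, which is one way to satisfy the equal 20% occurrence probability. Image selection and fade-in/fade-out are not modeled.

```python
import random

CATEGORIES = ["faces", "limbs", "corridors", "characters", "cars"]
IMAGE_RATE_HZ = 30 / 7                 # ~4.286 Hz, i.e. ~233 ms per image (assumed exact value)
CATEGORY_RATE_HZ = IMAGE_RATE_HZ / 5   # ~0.857 Hz, the target category frequency
CONDITION_S = 14.0                     # duration of one condition

def build_condition(target, rng):
    """Category label of each image in one 14 s condition."""
    n_images = round(CONDITION_S * IMAGE_RATE_HZ)  # 60 images, i.e. 12 cycles of 5
    others = [c for c in CATEGORIES if c != target]
    labels = []
    for _ in range(n_images // 5):
        rng.shuffle(others)       # non-target categories appear in no regular order
        labels.append(target)     # the target opens every cycle, hence it is periodic
        labels.extend(others)
    return labels

def build_sequence(seed=0):
    """One 70 s sequence: five conditions in random order."""
    rng = random.Random(seed)
    order = CATEGORIES[:]
    rng.shuffle(order)
    return [(target, build_condition(target, rng)) for target in order]
```

In this sketch only the target category recurs at a fixed position in every cycle, so only its response is tagged at the category frequency, while every image contributes to the response at the image frequency.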
EEG acquisition
EEG data were recorded at 500 Hz from a 128-channel EGI High-Density Geodesic Sensor Net. For infants, the net was connected to a NetAmps 300 amplifier (Electrical Geodesics, Inc, Eugene, OR, USA). For the adults, the net was connected to a NetAmps400 amplifier. The EEG recording was referenced online to a single vertex (electrode Cz) and the channel impedance was kept below 50 kΩ.
Pre-processing
EEG recordings were down-sampled to 420 Hz and were filtered using a 0.03–50 Hz bandpass filter with custom signal processing software. Artifact rejection was performed in two steps. For infants, first, channels with more than 20% of samples exceeding a 100–150 μV amplitude threshold were replaced with the average amplitude of their six nearest-neighbor channels. The continuous EEG signals were then re-referenced to the common average of all channels and segmented into 1166.7 ms epochs (i.e. the duration of five stimuli, starting with a target category image followed by four images drawn from the other four categories). Epochs with more than 15% of time samples exceeding threshold (150–200 μV) were excluded further on a channel-by-channel basis (Norcia et al., 2017). For adults, the two-step artifact rejection was performed with different criteria as EEG response amplitudes are lower in adults than infants (Norcia et al., 2017). EEG channels with more than 15% of samples exceeding a 30 μV amplitude threshold were replaced by the average value of their neighboring channels. Then the EEG signals were re-referenced to the common average of all channels and segmented into 1.1667 s epochs. Epochs with more than 10% of time samples exceeding threshold (30–80 μV) were excluded on a channel-by-channel basis (Kohler et al., 2020).
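The two-step artifact rejection for infants can be sketched as follows. This is a simplified illustration, not the custom software used in the study: the function name and the neighbor map are hypothetical, the thresholds are single assumed values (150 μV and 200 μV) picked from the reported ranges, and filtering is omitted.

```python
import numpy as np

FS = 420              # sampling rate after down-sampling (Hz)
EPOCH_SAMPLES = 490   # 1.1667 s at 420 Hz

def reject_artifacts(eeg, neighbors, chan_uv=150.0, epoch_uv=200.0):
    """eeg: (n_channels, n_samples) array in microvolts.
    neighbors: dict mapping a channel index to its nearest-neighbor indices
    (hypothetical layout). Returns the epochs, shaped
    (n_channels, n_epochs, EPOCH_SAMPLES), and a boolean mask of valid epochs."""
    raw = np.asarray(eeg, dtype=float)
    clean = raw.copy()
    # Step 1: replace channels with >20% of samples above threshold
    # by the average of their neighbors.
    bad = np.mean(np.abs(raw) > chan_uv, axis=1) > 0.20
    for ch in np.flatnonzero(bad):
        clean[ch] = raw[neighbors[int(ch)]].mean(axis=0)
    # Re-reference to the common average of all channels.
    clean -= clean.mean(axis=0, keepdims=True)
    # Segment into epochs of five stimuli each.
    n_epochs = clean.shape[1] // EPOCH_SAMPLES
    epochs = clean[:, :n_epochs * EPOCH_SAMPLES].reshape(
        clean.shape[0], n_epochs, EPOCH_SAMPLES)
    # Step 2: per channel, flag epochs with >15% of samples above threshold.
    valid = np.mean(np.abs(epochs) > epoch_uv, axis=2) <= 0.15
    return epochs, valid
```

Keeping the validity mask per channel and epoch, rather than dropping whole epochs, mirrors the channel-by-channel exclusion described above.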
Appendix 1—table 2 shows the number of epochs (1.1667 s each) we acquired before and after data pre-processing, summed across all five categories. We used data after pre-processing for further analyses. There was no significant difference in the number of pre-processed epochs across infant age groups (F(3,57) = 1.5, p=0.2). The number of interpolated electrodes was 10.0 ± 4.8 for 3- to 4-month-olds, 9.9 ± 3.7 for 4- to 6-month-olds, 9.9 ± 3.9 for 6- to 8-month-olds, and 7.7 ± 4.7 for 12- to 15-month-olds. There was no significant difference in the number of interpolated electrodes across infant age groups (F(3,55) = 0.78, p=0.51).
Univariate EEG analyses
Both image-update and categorical EEG visual responses are reported in the frequency and time domain over three ROIs: two occipito-temporal ROIs (LOT: channels 57, 58, 59, 63, 64, 65, and 68; ROT: channels 90, 91, 94, 95, 96, 99, and 100) and one occipital ROI (channels 69, 70, 71, 74, 75, 76, 82, 83, and 89). These ROIs were selected a priori based on a previously published study (Xie et al., 2019). We further removed several channels in these ROIs for two reasons: (i) Three outer rim channels (i.e. 73, 81, and 88) were not included in the occipital ROI for further data analysis for both infant and adult participants because they were consistently noisy. (ii) Three channels (66, 72, and 84) in the occipital ROI, one channel (50) in the LOT ROI, and one channel (101) in the ROT ROI were removed because they did not show substantial responses in the group-level analyses.
Frequency domain analysis
Each participant’s pre-processed EEG signals for each stimulus condition were averaged over two consecutive epochs (2.3334 s). The averaged time courses for each participant were then converted to the frequency domain at a frequency resolution of 0.4286 Hz via a discrete Fourier transform (DFT). The frequency bins of interest are at exactly every other bin in the frequency spectrum. The real and imaginary Fourier coefficients for each of the categorical and image-update responses for each condition were averaged across participants (vector averaging) to obtain a group-level estimate. The amplitudes of response were then computed from the coherently averaged vector. Hotelling’s T2 statistic (Victor and Mast, 1991) was used to test whether response amplitudes were significantly different from zero. We used the Benjamini-Hochberg FDR procedure to correct for multiple comparisons.
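The link between the 2.3334 s analysis window, the 0.4286 Hz frequency resolution, and vector averaging can be illustrated with a short sketch. It assumes that pairs of consecutive 490-sample epochs are concatenated into 980-sample segments and then averaged, which is one reading of the description above; the function names and the amplitude scaling are illustrative choices, and the significance test is not included.

```python
import numpy as np

FS = 420              # sampling rate (Hz)
EPOCH_SAMPLES = 490   # one 1.1667 s epoch

def participant_spectrum(epochs):
    """epochs: (n_epochs, EPOCH_SAMPLES) for one channel and condition.
    Returns frequencies and complex Fourier coefficients of the mean 2.3334 s segment."""
    epochs = np.asarray(epochs, dtype=float)
    n_pairs = epochs.shape[0] // 2
    segments = epochs[:2 * n_pairs].reshape(n_pairs, 2 * EPOCH_SAMPLES)
    mean_segment = segments.mean(axis=0)
    # Scale so that a cosine of amplitude A yields a coefficient of magnitude A.
    coefs = 2.0 * np.fft.rfft(mean_segment) / mean_segment.size
    freqs = np.fft.rfftfreq(mean_segment.size, d=1.0 / FS)
    return freqs, coefs

def group_amplitude(coefs_per_participant):
    """Vector averaging: average complex coefficients, then take the magnitude."""
    return np.abs(np.mean(np.stack(coefs_per_participant), axis=0))
```

With a 980-sample window the bin spacing is 420/980 ≈ 0.4286 Hz, so the category frequency (0.857 Hz) falls in bin 2 and its harmonics in every other bin, while the image frequency (4.286 Hz) falls in bin 10. Because the average is taken on complex coefficients, responses with inconsistent phase across participants cancel out.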
Image-update visual responses (image frequency)
The amplitude and phase of the evoked response at the image presentation rate and its first three harmonics (4.286 Hz, 8.571 Hz, 12.857 Hz, and 17.143 Hz).
Categorical responses (category frequency)
The amplitude and phase of the response at the category repetition frequency and its second harmonic (0.857 Hz, 1.714 Hz) for each category condition.
Time domain analyses
Pre-processed time domain EEG signals of each participant were low-pass filtered with a 30 Hz cut-off. The raw EEG signals contain many components including categorical responses (0.857 Hz and harmonics), general visual responses (4.286 Hz and harmonics), and noise. To separate out the temporal waveforms of these two responses, we first transformed the epoch-averaged (2.3334 s) time series of each condition into the frequency domain using a DFT. Then, we used an inverse DFT to transform back to the time domain keeping only the responses at the category frequency and its harmonics, zeroing the other frequencies. The same approach was used to separate the visual responses by using an inverse DFT transform of the responses at 4.286 Hz and harmonics.
Categorical responses
We kept responses at frequencies of interest (0.857 Hz and its harmonics up to 30 Hz, excluding the harmonic frequencies that overlapped with the image frequency and its harmonics) and zeroed responses in other frequencies. Then we applied an inverse Fourier transform to transform the data to the time domain. We further segmented the time series into 1.1667 s epochs and averaged across these epochs for each condition and individual. The mean and standard error across participants were computed for each condition at each time point.
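The frequency-domain filtering described above can be sketched as follows. This is an illustration under assumptions: the function name and parameters are hypothetical, the frequencies are taken as exactly 6/7 Hz and 30/7 Hz, and the input is a single 980-sample (2.3334 s) averaged time series so that both frequencies fall on exact DFT bins.

```python
import numpy as np

def keep_harmonics(ts, which="category", fs=420.0,
                   category_hz=6 / 7, image_hz=30 / 7, max_hz=30.0):
    """Zero all DFT bins except the harmonics of interest and return the
    reconstructed time series. which="category" keeps category harmonics that
    do not coincide with image harmonics; which="image" keeps image harmonics."""
    ts = np.asarray(ts, dtype=float)
    n = ts.size
    spec = np.fft.rfft(ts)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    bins = np.arange(freqs.size)
    category_bin = int(round(category_hz * n / fs))  # bin 2 for n = 980
    image_bin = int(round(image_hz * n / fs))        # bin 10 for n = 980
    is_image = (bins > 0) & (bins % image_bin == 0)
    if which == "category":
        keep = (bins > 0) & (bins % category_bin == 0) & ~is_image
    else:
        keep = is_image
    keep &= freqs <= max_hz
    return np.fft.irfft(np.where(keep, spec, 0.0), n=n)
```

The reconstructed 980-sample series can then be split into two 490-sample epochs and averaged, as described in the text.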
Image-update visual responses
A similar procedure was performed except that frequencies of interest are 4.286 Hz and its harmonics, and the rest were zeroed. As temporal waveforms for image-update responses were similar across different category conditions, we averaged waveforms across all five conditions and report the mean response (Figure 2).
Statistical analyses
To determine time windows in which amplitudes were significantly different from zero for each condition, we used a cluster-based nonparametric permutation t-test, 10,000 permutations, with a threshold of p<0.05, two-tailed on the post-stimulus onset time points (0–1167 ms) (Appelbaum et al., 2006; Blair and Karniski, 1993). The null hypothesis is that the evoked waveforms are not different from zero at any time point. For each permutation, we assigned random signs to the data of individual participants and computed the group-level difference against zero using a t-test. We then calculated the cluster-level statistic as the sum of t-values in the consecutive time points with p-values less than 0.05 (Maris and Oostenveld, 2007). We calculated the maximum cluster-level statistic for each permutation to generate a nonparametric reference distribution of cluster-level statistics. We rejected the null hypothesis if the cluster-level statistic for any consecutive time points in the original data was larger than 97.5% or smaller than 2.5% of the values in the null distribution.
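A minimal sketch of a sign-flip cluster permutation test in this spirit is given below. It is not the analysis code used in the study: the function names are hypothetical, the pointwise threshold is passed in as an assumed critical t-value `t_crit` (for example 2.093 for a two-tailed p<0.05 with 19 degrees of freedom), and for each permutation the cluster with the largest absolute sum is retained with its sign so that both tails of the reference distribution can be used.

```python
import numpy as np

def t_values(data):
    """One-sample t-value against zero at each time point; data: (n_participants, n_times)."""
    n = data.shape[0]
    return data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n))

def find_clusters(t, t_crit):
    """Runs of consecutive same-sign supra-threshold points as (start, stop, sum of t)."""
    sign = np.where(t > t_crit, 1, np.where(t < -t_crit, -1, 0))
    out, start = [], None
    for i in range(t.size + 1):
        s = sign[i] if i < t.size else 0
        if start is not None and s != sign[start]:
            out.append((start, i, float(t[start:i].sum())))
            start = None
        if start is None and s != 0:
            start = i
    return out

def cluster_permutation_test(data, t_crit, n_perm=10000, seed=0):
    """Return the observed clusters whose summed t exceeds the permutation null."""
    rng = np.random.default_rng(seed)
    observed = find_clusters(t_values(data), t_crit)
    null = np.zeros(n_perm)
    for p in range(n_perm):
        # Randomly flip the sign of each participant's whole waveform.
        signs = rng.choice([-1.0, 1.0], size=(data.shape[0], 1))
        perm = find_clusters(t_values(data * signs), t_crit)
        if perm:
            null[p] = max((c[2] for c in perm), key=abs)
    lo, hi = np.percentile(null, [2.5, 97.5])
    return [(a, b, s) for a, b, s in observed if s > hi or s < lo]
```

Flipping the sign of a participant's entire waveform preserves the temporal autocorrelation of the data, which is what allows the cluster statistic to control for multiple comparisons across time points.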
Decoding analyses
Group level
We used a leave-one-out cross-validation (LOOCV) classifier to test if spatiotemporal response patterns to each of the five categories were reliable across participants. The classifier was trained on averaged data from N–1 participants and tested on how well it predicted the category the left-out participant was viewing from their brain activations. This procedure was repeated for each left-out participant. We calculated the averaged category temporal waveform for each category across channels of our three ROIs: seven LOT, nine occipital, and seven ROT, as the exact location of the channels varies across individuals. Then, we concatenated the waveforms from the three ROIs to form a spatiotemporal response vector (Figure 5A). At each iteration, the LOOCV classifier computed the correlation between each of the five category vectors from the left-out participant (test data, for an unknown stimulus) and each of the mean spatiotemporal vectors across the N–1 participants (training data, labeled data). The winner-take-all classifier assigns the test vector to the category of the training vector that yields the highest correlation with the test vector (Figure 5A). For a given test pattern, a correct classification yields a score of 1 and an incorrect classification yields a score of 0. For each left-out infant, we computed the percentage correct across all categories, and then the mean decoding performance across all participants in an age group (Figure 5B).
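The group-level decoding procedure can be sketched in a few lines. This is an illustrative implementation under assumptions, with a hypothetical function name and input layout: each participant contributes one spatiotemporal vector per category, already concatenated across the three ROIs.

```python
import numpy as np

def loocv_decode(patterns):
    """patterns: (n_participants, n_categories, n_features) array.
    Returns the proportion of correctly classified categories per left-out participant."""
    n_sub, n_cat, _ = patterns.shape
    accuracy = np.zeros(n_sub)
    for s in range(n_sub):
        # Training data: mean pattern per category across the other N-1 participants.
        train = np.delete(patterns, s, axis=0).mean(axis=0)
        test = patterns[s]
        # r[i, j]: correlation between test pattern i and training pattern j.
        r = np.corrcoef(test, train)[:n_cat, n_cat:]
        # Winner-take-all: pick the training category with the highest correlation.
        predicted = r.argmax(axis=1)
        accuracy[s] = np.mean(predicted == np.arange(n_cat))
    return accuracy
```

With five categories, chance performance of this classifier is 20%.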
Individual level
Similar to group level with two differences: (i) All analyses were done within an individual using a split-half approach. That is, the classifier was trained on one half of the data (i.e. odd or even trials) and tested on the other half of the data. (ii) Spatiotemporal patterns for each category used the concatenated waveforms across 23 channels spanning the occipital and bilateral occipitotemporal ROIs.
Category distinctiveness (Nordt et al., 2023)
Category distinctiveness is defined as the difference between the similarity (correlation coefficient) of spatiotemporal responses within a category across odd and even splits using different images and the average between-category similarity of spatiotemporal responses across odd and even splits (Nordt et al., 2023). Distinctiveness is higher when the within-category similarity is positive and the between-category similarity is negative and varies from –2 (no category information) to 2 (maximal category information). We computed category distinctiveness for each of the five categories in each infant and determined if it varied from 3 to 15 months of age.
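As an illustration, category distinctiveness can be computed from odd and even split patterns as sketched below. The function name is hypothetical, and we assume that the between-category similarity for a category averages the correlations in both split directions (its odd pattern with the other categories' even patterns, and its even pattern with the other categories' odd patterns).

```python
import numpy as np

def category_distinctiveness(odd, even):
    """odd, even: (n_categories, n_features) spatiotemporal patterns from the two splits.
    Returns within-category minus mean between-category correlation, per category."""
    n_cat = odd.shape[0]
    # r[i, j]: correlation between odd-split category i and even-split category j.
    r = np.corrcoef(odd, even)[:n_cat, n_cat:]
    within = np.diag(r)
    # Row and column sums each include the within-category term once.
    between = (r.sum(axis=1) + r.sum(axis=0) - 2.0 * within) / (2.0 * (n_cat - 1))
    return within - between
```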
Statistical analyses of developmental effects
To quantify developmental effects, we used linear mixed models (LMMs; Bosker and Snijders, 2011), with the ‘fitlme’ function in MATLAB version 2021b (MathWorks, Inc). LMMs allow explicit modeling of both within-subject effects (e.g. longitudinal measurements) and between-subject effects (e.g. cross-sectional data) with unequal numbers of data points per participant, as well as examination of main and interactive effects of both continuous (age) and categorical (e.g. stimulus category) variables. We used random-intercept models that allow the intercept to vary across participants (term: 1|participant). In all LMMs, we measured development as a function of log10(age in days) as development is typically faster earlier on. Indirect evidence comes from neuroimaging and post-mortem studies showing that the structural development of infants’ brains is nonlinear, with development in the first 2 years being rapid, especially the first year (Gilmore et al., 2018).
We report slope (rate of development), interaction effects, and their significance. Table 1 summarizes LMMs used in this study.
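For orientation, a random-intercept model with age as the only fixed effect can be written as below. This is a generic textbook form given as a sketch, not a reproduction of the specific models in Table 1; the symbols (response y for participant i at measurement j, random intercept u) are notation introduced here.

```latex
y_{ij} = \beta_0 + \beta_{\mathrm{age}} \, \log_{10}(\mathrm{age}_{ij}) + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here the slope on log10(age) corresponds to the rate of development, and the participant-specific intercept u_i corresponds to the (1|participant) term.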
Analysis of noise
To test whether EEG noise levels vary with age, e.g. whether noise in the EEG data is larger in the younger infants than older ones, we quantified the response amplitudes in the occipital ROI in the frequency domain, at frequency bins next to the category and image frequency bins (white bars in Figure 2A–D, right panel). The noise level was quantified as the amplitude of response up to 8.571 Hz excluding image presentation frequencies (4.286 Hz and harmonics) and category frequencies (0.857 Hz and harmonics) as this frequency range includes the relevant main harmonics (Figure 2E). We used an LMM to test if noise varies with age, with log10(age in days) as a fixed effect and participant as a random effect (noise amplitude ~ 1 + log10(age in days) + (1|participant)).
We found no significant differences in noise amplitude across infant age groups (Figure 2E, mean amplitude across the first five noise bins: βage = –0.005, 95% CI: –0.12 to 0.11, t(59) = –0.09, p=0.93; mean noise across the first 10 bins: βage = 0.04, 95% CI: –0.03 to 0.12, t(59) = 1.16, p=0.25, LMMs with age (log transformed) as a fixed effect and participant as a random effect).
Appendix 1
Demographic information
Low-level image properties analyses
As images of items of visual categories vary on both low-level and high-level properties, and both the EEG signal and babies’ attention may be affected by low-level properties like contrast or luminance, it is important to control for several low-level factors such as luminance, contrast, similarity, and spatial frequency, as well as high-level properties such as familiarity across categories. Here, we used a subset of categories from Stigliani et al., 2015, that did not contain stimuli familiar to our participants and were controlled for several low-level factors using the SHINE toolbox (Willenbockel et al., 2010). To validate the effect of employing SHINE, we evaluated several low-level metrics and tested if they differed across categories: (i) luminance: mean gray level of each image; (ii) contrast: mean standard deviation of gray-level values of each image; (iii) similarity: mean pixel-wise similarity between gray levels across pairs of images of a category; and (iv) spatial frequency distribution: we transformed each image to the Fourier domain using FFT and calculated its circular amplitude spectrum. We calculated these metrics for each image, then tested if the contrast, luminance, and similarity metrics were significantly different across categories using pairwise nonparametric permutation t-tests (10,000 permutations, with Bonferroni correction). While it is difficult to completely control multiple low-level metrics across categories (Willenbockel et al., 2010), our analyses show that images are largely controlled for several low-level metrics across categories, and that medians, means, and ranges are matched across categories (Figure 1B; Appendix 1—table 3). With respect to differences in contrast and luminance, there are no significant differences across categories except that limbs have slightly but significantly lower luminance (ps<0.05, Bonferroni corrected), lower pairwise similarity (ps<0.01), and higher contrast (ps<0.05, except for corridors) than other categories.
For similarity, the medians and ranges are similar across categories, but the outliers vary, producing significant differences across categories (permutation tests, p<0.05, Bonferroni corrected). For spatial frequency distribution, the means and standard deviations are similar across categories (Figure 1B, right). We further ran Kolmogorov-Smirnov tests for each pair of categories with Bonferroni correction; our results revealed significantly different distributions across category pairs (ps<0.001, except for limbs and corridors).
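The luminance and contrast metrics and a pairwise permutation comparison can be sketched as follows. This is a simplified illustration with hypothetical function names: it uses the absolute difference in means as the test statistic rather than a t-statistic, and applies Bonferroni correction by multiplying each p-value by the number of comparisons (10 pairs for five categories).

```python
import numpy as np

def luminance(img):
    """Mean gray level of an image array."""
    return float(np.mean(img))

def contrast(img):
    """Standard deviation of the gray levels of an image array."""
    return float(np.std(img))

def permutation_pvalue(a, b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in means between two groups."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled = np.concatenate([a, b])
    observed = abs(a.mean() - b.mean())
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffle category labels
        count += int(abs(perm[:a.size].mean() - perm[a.size:].mean()) >= observed)
    return (count + 1) / (n_perm + 1)

def bonferroni(p, n_comparisons):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_comparisons)
```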
Analyses of adult data
One concern is that the amount of empirical data typically collected in infants is less than in adults, which may compromise the experimental power to detect responses in infants, as it may affect the noise bandwidth (sensitivity) of the frequency tagging analysis. Thus, we tested whether it is possible to measure category-selective responses in 20 adults using the same experiment (Figure 1) and the same amount of data as collected in infants, using three types of analyses. By using the same number of trials in adults as those obtained from our infant participants, our goal was to test whether we had sufficient power to detect categorical responses from infants using the experimental paradigm. We expect temporal and amplitude differences between adults and infants (Xie et al., 2022) as infants have immature brains and skulls. First, cortical gyrification, which determines the orientation of the electrical fields that a given brain region generates on the scalp, still undergoes development during the first 2 years of infants’ lives (Li et al., 2014). Second, adults’ skulls are thicker and have lower conductivity than infants’ skulls; thus, electrical signals on their scalp are lower than in infants (Grieve et al., 2003). Nonetheless, we tested if we could in principle detect category information in adults with the same amount of data as in infants. We reasoned that if category information can be detected in adults and signals are stronger in infants, then we should have the power to detect category information in infants.
Visual responses
We averaged the pre-processed data across two consecutive epochs (2.3334 s) and across conditions for each individual to measure visual responses to all images that were presented at 4.286 Hz and then transformed the data to the frequency domain. Appendix 1—figure 1A shows the 3D topographies of group-averaged responses at 4.286 Hz and its first three harmonics (left panel) and the group-averaged Fourier amplitude spectrum over middle occipital and occipitotemporal ROIs (right panel). We used Hotelling’s T2 statistic to test the amplitude significance and found significant visual responses in adults at 4.286 Hz over the occipital ROI containing nine electrodes over early visual cortex (p<0.05, FDR corrected at four levels) and significant visual responses at 4.286 Hz and its first two harmonics over the occipitotemporal ROI (14 electrodes) (ps<0.05).
Examining the temporal waveform of the visual response (filtered at the image frequency and its harmonics), we found a relatively slow waveform with a single peak later than 100 ms after stimulus onset (Appendix 1—figure 1B).
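To make the frequency-domain step concrete, the sketch below computes a single-sided amplitude spectrum for one 2.3334 s segment and locates the image-update frequency. The sampling rate of 420 Hz, the synthetic signal, and all names are assumptions for illustration only; this section does not state the sampling rate.

```python
import numpy as np

FS = 420.0             # sampling rate in Hz (assumption, not given in this section)
N_SAMPLES = 980        # 980 samples / 420 Hz = 2.3333 s (two consecutive epochs)
IMAGE_HZ = 30.0 / 7.0  # image-update frequency, approximately 4.286 Hz

def amplitude_spectrum(x, fs):
    """Single-sided Fourier amplitude spectrum of a real-valued signal."""
    x = np.asarray(x, dtype=float)
    n = x.size
    amps = 2.0 * np.abs(np.fft.rfft(x)) / n
    amps[0] /= 2.0          # the DC term is not doubled
    if n % 2 == 0:
        amps[-1] /= 2.0     # neither is the Nyquist term
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amps

# Synthetic stand-in for one electrode: a 1.5 uV response at the image
# frequency plus white noise (hypothetical values).
t = np.arange(N_SAMPLES) / FS
rng = np.random.default_rng(0)
eeg = 1.5 * np.sin(2 * np.pi * IMAGE_HZ * t) + 0.2 * rng.standard_normal(N_SAMPLES)

freqs, amps = amplitude_spectrum(eeg, FS)
image_bin = int(np.argmin(np.abs(freqs - IMAGE_HZ)))
```

Because the segment spans a whole number of stimulus cycles (ten cycles at 4.286 Hz in 2.3334 s), the tagged frequency falls exactly on a Fourier bin, and the frequency resolution is 1/2.3334 s, or about 0.43 Hz.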
Visual category responses
As adults have mature category-selective regions, we next tested whether, using our SSVEP paradigm and the same amount of data as in infants, we could identify significant category-selective responses in adults. First, we analyzed the mean response at the category frequency and its harmonics in lateral occipitotemporal ROIs. This is a selective response as it reflects the relative response to the target category (generalized across exemplars) above the general visual response to images of other categories. Despite lower response amplitudes in adults than infants, using the same amount of data as infants, adults show significant category-selective responses to each of the five categories (Appendix 1—figure 2A for all categories). Appendix 1—figure 2A shows the group-averaged Fourier amplitude spectrum over LOT and ROT ROIs (top panels) and the 3D topographies of group-averaged responses at 0.857 Hz and its first harmonic (bottom panel). We used Hotelling’s T2 statistic to test the amplitude significance and found significant above-zero category-selective responses in adults at 0.857 Hz and multiple harmonics over bilateral occipitotemporal ROIs containing seven electrodes each (ps<0.05, FDR corrected over eight levels).
We then transformed the data filtered at the category frequency and its harmonics back to the time domain to examine the temporal waveform of each category. Notably, each category generates a unique topography (bottom panels) at 200–217 ms after stimulus onset (Appendix 1—figure 2B). For faces, we found an early negative deflection peaking ~200 ms after stimulus onset in both the LOT and ROT ROIs, with the ROT ROI showing a numerically larger mean response amplitude than the LOT ROI (Appendix 1—figure 2B, faces). Similarly, we found an early negative peak at around 200 ms for characters, with a left hemisphere dominance (at the 162–248 ms and 398–474 ms time windows; Appendix 1—figure 2B, characters). For both limbs and corridors, there was an early positive waveform peaking at around 200 ms in both hemispheres with no hemispheric differences.
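A one-sample Hotelling's T2 test on Fourier coefficients can be sketched as below: each participant contributes one complex coefficient at the frequency of interest, and the test asks whether the mean of the (real, imaginary) pairs differs from zero. This is a textbook formulation written for illustration; the function name and the synthetic data are assumptions, and the authors' implementation may differ (for example, in using a circular variant of the statistic).

```python
import numpy as np

def hotelling_t2(coeffs):
    """One-sample Hotelling's T2 test that the mean complex coefficient is zero.

    coeffs: one complex Fourier coefficient per participant.
    Returns (T2, F, p), where F follows an F(2, n - 2) distribution
    under the null hypothesis.
    """
    z = np.asarray(coeffs, dtype=complex)
    n = z.size
    x = np.column_stack([z.real, z.imag])        # shape (n, 2)
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)                # 2 x 2 sample covariance
    t2 = float(n * mean @ np.linalg.solve(cov, mean))
    dim = 2
    dof2 = n - dim
    f_stat = dof2 / (dim * (n - 1)) * t2
    # For an F distribution with 2 numerator degrees of freedom, the
    # upper-tail probability has the closed form (1 + 2F/dof2)^(-dof2/2).
    p_value = (1.0 + 2.0 * f_stat / dof2) ** (-dof2 / 2.0)
    return t2, f_stat, p_value

# Hypothetical example: 20 participants with a consistent response phase.
rng = np.random.default_rng(0)
noise = rng.standard_normal(20) + 1j * rng.standard_normal(20)
strong = (1.0 + 0.5j) + 0.3 * noise
t2_strong, f_strong, p_strong = hotelling_t2(strong)
```

The resulting p-values, one per harmonic, would then be corrected for multiple comparisons (FDR in the text).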
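The frequency-domain filtering behind these waveforms can be sketched as follows: keep only the Fourier coefficients at the category frequency and its harmonics, zero the rest, and invert the transform. The 420 Hz sampling rate, the number of harmonics kept, and the synthetic signal are assumptions for illustration, and the sketch does not handle any special treatment of harmonics that coincide with the image-update frequency.

```python
import numpy as np

def harmonic_filter(signal, fs, base_hz, n_harmonics):
    """Keep only base_hz and its first n_harmonics multiples, then return to time."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    keep = np.zeros(freqs.size, dtype=bool)
    for k in range(1, n_harmonics + 1):
        # Nearest Fourier bin to the k-th multiple of the base frequency.
        keep[int(np.argmin(np.abs(freqs - k * base_hz)))] = True
    return np.fft.irfft(np.where(keep, spectrum, 0.0), n=n)

# Hypothetical example: a category response at 0.857 Hz (6/7 Hz) and its
# second multiple, contaminated by an unrelated 3 Hz component.
fs, n = 420.0, 980
t = np.arange(n) / fs
category_hz = 6.0 / 7.0
category = 2.0 * np.sin(2 * np.pi * category_hz * t) + np.sin(2 * np.pi * 2 * category_hz * t)
nuisance = 3.0 * np.sin(2 * np.pi * 3.0 * t)
waveform = harmonic_filter(category + nuisance, fs, category_hz, n_harmonics=3)
```

In this example the recovered `waveform` matches the category component because every retained frequency completes a whole number of cycles within the segment.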
Second, we examined in the same 20 adults whether the spatiotemporal patterns of brain responses evoked by different visual categories are distinct from one another and reliable across participants, using an LOOCV classifier approach with spatiotemporal time series concatenated from the mean temporal waveforms of three ROIs (Appendix 1—figure 3A, left). Mean LOOCV classification performance was around 80% and significantly above chance in adults (Appendix 1—figure 3A, right). This classification was associated with correct decoding of all five categories from distributed responses in the majority of participants.
Third, we examined in the same 20 adults if there is reliable category information in spatiotemporal distributed responses in each individual. Distributed spatiotemporal responses were measured in individual participants over 23 occipital and lateral occipital ROIs using split halves of the data. We computed the RSM across split-halves and then calculated category distinctiveness by subtracting the mean between-category similarity from the within-category similarity for each category. The mean RSM across 20 adults shows that spatiotemporal patterns are more similar across items of a category than across items of different categories (Appendix 1—figure 3B, left), and the category distinctiveness scores are significantly above zero for all five categories: faces, t(19) = 11.18, pFDR<0.05; limbs, t(19) = 5.01, pFDR<0.05; corridors, t(19) = 5.58, pFDR<0.05; characters, t(19) = 6.56, pFDR<0.05; and cars, t(19) = 10.59, pFDR<0.05.
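A leave-one-participant-out classification of this kind can be sketched as below. The classifier here is a correlation-based nearest-template rule, which is an assumption chosen for simplicity; this section does not specify the classifier, and the array layout and names are hypothetical.

```python
import numpy as np

def loocv_accuracy(data):
    """Leave-one-participant-out decoding accuracy.

    data: array of shape (n_participants, n_categories, n_features), where
    each feature vector is a concatenated spatiotemporal response.
    """
    data = np.asarray(data, dtype=float)
    n_sub, n_cat, _ = data.shape
    correct = 0
    for s in range(n_sub):
        # Category templates from all participants except the held-out one.
        templates = np.delete(data, s, axis=0).mean(axis=0)
        held_out = data[s]
        # Correlate each held-out pattern with each template.
        corr = np.corrcoef(held_out, templates)[:n_cat, n_cat:]
        predicted = corr.argmax(axis=1)
        correct += int(np.sum(predicted == np.arange(n_cat)))
    return correct / (n_sub * n_cat)

# Hypothetical example: 20 participants, 5 categories, 50 features.
rng = np.random.default_rng(0)
true_patterns = rng.standard_normal((5, 50))
simulated = true_patterns + 0.1 * rng.standard_normal((20, 5, 50))
accuracy = loocv_accuracy(simulated)
```

With five categories, chance level for this scheme is 20%, which is the baseline against which the reported performance of around 80% would be compared.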
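The split-half RSM and the distinctiveness score can be sketched as follows for one participant. Pearson correlation as the similarity measure and the averaging of both the row and the column of off-diagonal cells are assumptions for illustration; the names and synthetic data are hypothetical.

```python
import numpy as np

def category_distinctiveness(half1, half2):
    """Split-half RSM and per-category distinctiveness for one participant.

    half1, half2: arrays of shape (n_categories, n_features) holding the
    distributed response to each category in each half of the data.
    """
    half1 = np.asarray(half1, dtype=float)
    half2 = np.asarray(half2, dtype=float)
    n_cat = half1.shape[0]
    # Correlations between half-1 patterns (rows) and half-2 patterns (columns).
    rsm = np.corrcoef(half1, half2)[:n_cat, n_cat:]
    within = np.diag(rsm)
    # Mean of the off-diagonal cells in each category's row and column.
    between = (rsm.sum(axis=1) + rsm.sum(axis=0) - 2 * within) / (2 * (n_cat - 1))
    return rsm, within - between

def one_sample_t(values):
    """t statistic for testing that the mean of values differs from zero."""
    v = np.asarray(values, dtype=float)
    return float(v.mean() / (v.std(ddof=1) / np.sqrt(v.size)))

# Hypothetical example: 5 categories, 60 features, noisy split halves.
rng = np.random.default_rng(0)
patterns = rng.standard_normal((5, 60))
rsm, scores = category_distinctiveness(
    patterns + 0.5 * rng.standard_normal((5, 60)),
    patterns + 0.5 * rng.standard_normal((5, 60)),
)
```

Collecting one score per participant for a given category and passing the 20 scores to `one_sample_t` yields a statistic with 19 degrees of freedom, matching the form of the values reported above.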
Together, these analyses suggest that the experimental paradigm has sufficient power to identify category representations both at the group and individual levels.
Validation of experimental paradigm
As visual acuity develops during the first year of life (Norcia and Tyler, 1985; Norcia et al., 1990; Appendix 1—figure 4), one concern is that our controlled, natural, gray-level stimuli may not be distinguishable to infants. Measurements of visual evoked potentials (Norcia and Tyler, 1985; Norcia et al., 1990) suggest that visual acuity in 3-month-olds is around 5–8 cycles per degree (cpd) and in 6-month-olds around 10–16 cpd (Appendix 1—figure 4). Thus, to simulate how our images may appear to infants, we low-pass filtered all images at 5 cpd. Despite being blurry, images of different categories are distinguishable and individual items retain their identity on visual inspection (Appendix 1 – Videos 1–5).
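Filtering an image at a cutoff given in cycles per degree requires converting pixel frequencies to visual angle. The sketch below uses a hard circular cutoff in the Fourier domain; the filter shape, the pixels-per-degree value, and the function name are assumptions, since this section does not report the filter type or the image size in degrees.

```python
import numpy as np

def lowpass_cpd(img, cutoff_cpd, pixels_per_degree):
    """Remove spatial frequencies above cutoff_cpd (cycles per degree)."""
    img = np.asarray(img, dtype=float)
    # fftfreq returns cycles per pixel; multiplying by pixels per degree
    # converts to cycles per degree of visual angle.
    fy = np.fft.fftfreq(img.shape[0]) * pixels_per_degree
    fx = np.fft.fftfreq(img.shape[1]) * pixels_per_degree
    radius = np.hypot(fy[:, None], fx[None, :])
    mask = radius <= cutoff_cpd
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))

# Hypothetical example: a 64 x 64 image spanning 2 degrees (32 pixels per
# degree) that contains a 2 cpd grating and an 8 cpd grating.
ppd = 32.0
x_deg = np.arange(64) / ppd
low = 0.2 * np.sin(2 * np.pi * 2.0 * x_deg)
high = 0.2 * np.sin(2 * np.pi * 8.0 * x_deg)
image = np.tile(0.5 + low + high, (64, 1))
blurred = lowpass_cpd(image, cutoff_cpd=5.0, pixels_per_degree=ppd)
```

In this example the 8 cpd grating is removed while the mean gray level and the 2 cpd grating pass through unchanged. A smoother filter (for example, Gaussian) would avoid the ringing that a hard cutoff can introduce in natural images.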
Visual responses over occipital cortex per condition for all age groups
In the main analysis, we averaged the image-update visual responses across the five conditions for each infant, as the same visual stimuli from all five stimulus categories were viewed by the infant. Here, we show the mean Fourier amplitude spectrum over the occipital cortex for each condition for all age groups (Appendix 1—figure 5). The response patterns across conditions are similar at each age group. To examine whether visual response amplitudes differ between conditions across age groups, we quantified the RMS amplitude value of the responses at the image-update frequency (4.286 Hz) and its first three harmonics (8.571 Hz, 12.857 Hz, and 17.143 Hz) for each category condition and infant. Then, we used an LMM to test for an age by category interaction. The LMM was conducted over the posterior occipital ROI. This analysis found no significant main effect of category (βcategory = 0.08, 95% CI: –0.08 to 0.24, t(301) = 0.97, p=0.33) or category by age interaction (βcategory × age = –0.04, 95% CI: –0.11 to 0.03, t(301) = –1.09, p=0.28), which means that the visual response amplitudes are consistent across category conditions.
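The RMS summary across the image-update frequency and its harmonics amounts to the following computation. The helper name and the example amplitudes are hypothetical; only the frequencies come from the text.

```python
import numpy as np

IMAGE_HZ = 30.0 / 7.0  # image-update frequency, approximately 4.286 Hz

# The image-update frequency and the three harmonics listed in the text.
harmonic_freqs = IMAGE_HZ * np.arange(1, 5)

def rms_amplitude(amplitudes):
    """Root mean square of the response amplitudes at the selected frequencies."""
    a = np.asarray(amplitudes, dtype=float)
    return float(np.sqrt(np.mean(a ** 2)))

# Hypothetical amplitudes (in microvolts) for one infant and one condition.
example_rms = rms_amplitude([3.0, 4.0])
```

One such RMS value per infant and category condition would form the dependent variable of the linear mixed model.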
LMM analyses of visual responses (associated with Figure 2)
Frequency domain analyses of infants’ categorical responses to limbs, corridors, characters, and cars (associated with Figure 3)
Figure 3 shows the group-averaged categorical response to faces. Appendix 1—figures 6–9 show group-averaged categorical responses to the other four categories in the four age groups. We found significant responses in 6- to 8-month-olds for limbs, corridors, and characters, and in 12- to 15-month-olds for corridors and characters.
LMM analyses of category-selective responses
Individual-level decoding analysis
Data availability
Individual preprocessed EEG data and code for all analyses are available on GitHub: https://github.com/VPNL/InfantObjectCategorization (copy archived at Yan, 2023). Individual EEG data are also available on OSF (https://doi.org/10.17605/OSF.IO/G5FTA).
References
- Cue-invariant networks for figure and background processing in human visual cortex. The Journal of Neuroscience 26:11695–11708. https://doi.org/10.1523/JNEUROSCI.2741-06.2006
- Seeing faces is necessary for face-domain formation. Nature Neuroscience 20:1404–1412. https://doi.org/10.1038/nn.4635
- Universal mechanisms and the development of the face network: what you see is what you get. Annual Review of Vision Science 5:341–372. https://doi.org/10.1146/annurev-vision-091718-014917
- Visual cortex: big data analysis uncovers food specificity. Current Biology 32:R1012–R1015. https://doi.org/10.1016/j.cub.2022.08.068
- Temporal dynamics of visual representations in the infant brain. Developmental Cognitive Neuroscience 45:100860. https://doi.org/10.1016/j.dcn.2020.100860
- A vision of graded hemispheric specialization. Annals of the New York Academy of Sciences 1359:30–46. https://doi.org/10.1111/nyas.12833
- Object domain and modality in the ventral visual pathway. Trends in Cognitive Sciences 20:282–290. https://doi.org/10.1016/j.tics.2016.02.002
- Book: Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. Sage.
- Organization of high-level visual cortex in human infants. Nature Communications 8:13995. https://doi.org/10.1038/ncomms13995
- Imaging structural and functional brain development in early childhood. Nature Reviews. Neuroscience 19:123–137. https://doi.org/10.1038/nrn.2018.1
- Visual following and pattern discrimination of face-like stimuli by newborn infants. Pediatrics 56:544–549.
- Spatial correlation of the infant and adult electroencephalogram. Clinical Neurophysiology 114:1594–1608. https://doi.org/10.1016/s1388-2457(03)00122-6
- Visual recognition: as soon as you know it is there, you know what it is. Psychological Science 16:152–160. https://doi.org/10.1111/j.0956-7976.2005.00796.x
- The functional architecture of the ventral temporal cortex and its role in categorization. Nature Reviews. Neuroscience 15:536–548. https://doi.org/10.1038/nrn3747
- The cortical development of specialized face processing in infancy. Child Development 87:1581–1600. https://doi.org/10.1111/cdev.12543
- Face-sensitive cortical processing in early infancy. Journal of Child Psychology and Psychiatry, and Allied Disciplines 45:1228–1234. https://doi.org/10.1111/j.1469-7610.2004.00321.x
- Frequency-domain analysis of fast oddball responses to visual stimuli: A feasibility study. International Journal of Psychophysiology 73:287–293. https://doi.org/10.1016/j.ijpsycho.2009.04.011
- Selectivity for food in human ventral visual cortex. Communications Biology 6:175. https://doi.org/10.1038/s42003-023-04546-2
- Why are faces denser in the visual experiences of younger than older infants? Developmental Psychology 53:38–49. https://doi.org/10.1037/dev0000230
- The fusiform face area: a module in human extrastriate cortex specialized for face perception. The Journal of Neuroscience 17:4302–4311. https://doi.org/10.1523/JNEUROSCI.17-11-04302.1997
- Infants’ recognition of their mothers’ faces in facial drawings. Developmental Psychobiology 62:1011–1020. https://doi.org/10.1002/dev.21972
- Mapping longitudinal development of local cortical gyrification in infants from birth to 2 years of age. The Journal of Neuroscience 34:4228–4238. https://doi.org/10.1523/JNEUROSCI.3976-13.2014
- Development of the macaque face-patch system. Nature Communications 8:14897. https://doi.org/10.1038/ncomms14897
- What drives the organization of object knowledge in the brain? Trends in Cognitive Sciences 15:97–103. https://doi.org/10.1016/j.tics.2011.01.004
- Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods 164:177–190. https://doi.org/10.1016/j.jneumeth.2007.03.024
- Face perception during early infancy. Psychological Science 10:419–422. https://doi.org/10.1111/1467-9280.00179
- Development of contrast sensitivity in the human infant. Vision Research 30:1475–1486. https://doi.org/10.1016/0042-6989(90)90028-j
- Book: Development of vision in infancy. In: Norcia AM, editors. Adler’s Physiology of the Eye. Elsevier Health Sciences. pp. 531–551. https://doi.org/10.1016/B978-0-323-05714-1.00038-8
- Cortical recycling in high-level visual cortex during childhood development. Nature Human Behaviour 5:1686–1697. https://doi.org/10.1038/s41562-021-01141-5
- Mother’s face recognition by neonates: a replication and an extension. Infant Behavior and Development 18:79–85. https://doi.org/10.1016/0163-6383(95)90009-8
- Frequency tagging with infants: the visual oddball paradigm. Frontiers in Psychology 13:1015611. https://doi.org/10.3389/fpsyg.2022.1015611
- Biomarkers of face perception in autism spectrum disorder: time to shift to fast periodic visual stimulation with electroencephalography? Biological Psychiatry. Cognitive Neuroscience and Neuroimaging 5:258–260. https://doi.org/10.1016/j.bpsc.2020.01.008
- A domain-relevant framework for the development of face processing. Nature Reviews Psychology 2:183–195. https://doi.org/10.1038/s44159-023-00152-5
- Visual object categorization in infancy. PNAS 119:e2105866119. https://doi.org/10.1073/pnas.2105866119
- Temporal processing capacity in high-level visual cortex is domain specific. The Journal of Neuroscience 35:12412–12424. https://doi.org/10.1523/JNEUROSCI.4822-14.2015
- Innate face processing. Current Opinion in Neurobiology 19:39–44. https://doi.org/10.1016/j.conb.2009.03.001
- ERP evidence of developmental changes in processing of faces. Clinical Neurophysiology 110:910–915. https://doi.org/10.1016/s1388-2457(99)00006-1
- Environmental influences on the pace of brain development. Nature Reviews. Neuroscience 22:372–384. https://doi.org/10.1038/s41583-021-00457-5
- A new statistic for steady-state evoked potentials. Electroencephalography and Clinical Neurophysiology 78:378–388. https://doi.org/10.1016/0013-4694(91)90099-p
- Controlling low-level image properties: the SHINE toolbox. Behavior Research Methods 42:671–684. https://doi.org/10.3758/BRM.42.3.671
- Neural correlates of facial emotion processing in infancy. Developmental Science 22:e12758. https://doi.org/10.1111/desc.12758
- Visual category representations in the infant brain. Current Biology 32:5422–5432. https://doi.org/10.1016/j.cub.2022.11.016
- Software: InfantObjectCategorization, version swh:1:rev:299e90b1d435701db7b66e690a922a14062ebb5f. Software Heritage.
Article and author information
Author details
Funding
Wu Tsai Neurosciences Institute, Stanford University (139471)
- Anthony M Norcia
- Kalanit Grill-Spector
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This work was supported by grants from the Wu Tsai Neurosciences Institute of Stanford University and the Human Centered Artificial Intelligence Institute of Stanford University to KGS and AMN.
Ethics
Ethical permission (eprotocol number: 48634) for the study was obtained from the Institutional Review Board of Stanford University. Parents of the infant participants provided written informed consent prior to their first visit and also prior to each session if they came for multiple sessions.
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.100260. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2024, Yan et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.