Abstract
Studying infant minds with movies is a promising way to increase engagement relative to traditional tasks. However, the spatial specificity and functional significance of movie-evoked activity in infants remain unclear. Here we investigated what movies can reveal about the organization of the infant visual system. We collected fMRI data from 15 awake infants and toddlers aged 5–23 months who attentively watched a movie. The activity evoked by the movie reflected the functional profile of visual areas. Namely, homotopic areas from the two hemispheres responded similarly to the movie, whereas distinct areas responded dissimilarly, especially across dorsal and ventral visual cortex. Moreover, visual maps that typically require time-intensive and complicated retinotopic mapping could be predicted, albeit imprecisely, from movie-evoked activity, both with data-driven analyses (i.e., independent components analysis) at the individual level and with functional alignment into a common low-dimensional embedding to generalize across participants. These results suggest that the infant visual system is already structured to process dynamic, naturalistic information and that fine-grained cortical organization can be discovered from movie data.
Studying the function and organization of the youngest human brains remains a challenge. Despite the recent growth in infant fMRI1–6, one of the most important obstacles facing this research is that infants are unable to maintain focus for long periods of time and struggle to complete traditional cognitive tasks7. Movies can be a useful tool for studying the developing mind8, as has been shown in older children9–11. The dynamic, continuous, and content-rich nature of movie stimuli12, 13 makes them effective at capturing infant attention14, 15. Here, we examine what can be revealed about the functional organization of the infant brain during movie-watching.
We focus on visual cortex because its organization at multiple spatial scales is well understood from traditional, task-based fMRI. The mammalian visual cortex is divided into multiple areas with partially distinct functional roles between areas16–18. Within visual areas, there are orderly, topographic representations, or maps, of visual space19, 20. These maps capture information about the location and spatial extent of visual stimuli with respect to fixation. Thus, maps reflect sensitivity to polar angle, measured via alternations between horizontal and vertical meridians that define area boundaries21, 22, and sensitivity to spatial frequency, reflected in gradients of sensitivity to high and low spatial frequencies from foveal to peripheral vision, respectively23. Previously, we reported that these maps could be revealed by a retinotopy task in infants as young as 5 months of age24. However, it remains unclear whether these maps are evoked by more naturalistic task-designs.
The primary goal of the current study is to investigate whether movie-watching data recapitulates the organization of visual cortex. Movies drive strong and naturalistic responses in sensory regions while minimizing task demands12, 13, 25 and are thus a proxy for typical experience. In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion26–28. Movies have been useful in awake infant fMRI for studying event segmentation29, functional alignment30, and brain networks31. However, this past work did not address the granularity and specificity of cortical organization that movies evoke. For example, movies evoke similar activity across infants in anatomically aligned visual areas29, but it remains unclear whether responses to movie content differ between visual areas (e.g., is there more similarity of function within visual areas than between32). Moreover, it is unknown whether structure within visual areas, namely visual maps, contributes substantially to visual evoked activity. Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity – rather than anatomy – and in adults can improve signal-to-noise, enhance across participant prediction, and enable unique analyses28, 33–35.
Nonetheless, there are several reasons for skepticism that movies could evoke detailed, retinotopic organization: Movies may not fully sample the stimulus parameters (e.g., spatial frequencies) or visual functions needed to find topographic maps and areas in visual cortex. Even if movies contain the necessary visual properties, they may unfold at a faster rate than can be detected by fMRI. Additionally, naturalistic stimuli may not drive visual responses as robustly as experimenter-defined stimuli that are designed for retinotopic mapping with discrete onsets and high contrast. Finally, the complexity of movie stimuli may result in variable attention between participants, impeding discovery of reliable visual structure across individuals. If movies do show the fine-grained organization of the infant visual cortex, this suggests that this structure (e.g., visual maps) scaffolds the processing of ongoing visual information.
We conducted several analyses to probe different kinds of visual granularity in infant movie-watching fMRI data. First, we asked whether distinct areas of the infant visual cortex have different functional profiles. Second, we asked whether the topographic organization of visual areas can be recovered within participants. Third, we asked whether this within-area organization is aligned across participants. These three analyses assess key indicators of the mature visual system: functional specialization between areas, organization within areas, and consistency between individuals.
Results
We performed fMRI in awake, behaving infants and toddlers using a protocol described previously7. The dataset consisted of 15 sessions of infant participants (4.8–23.1 months old) who had both movie-watching data and retinotopic mapping data collected in the same session (Table S1). All available movies from each session were included (Table S2), with an average duration of 540.7s (range: 186–1116s).
The retinotopic-mapping data from the same infants24 allowed us to generate infant-specific meridian maps (horizontal versus vertical stimulation) and spatial frequency maps (high versus low stimulation). The meridian maps were used to define regions of interest (ROIs) for visual areas V1, V2, V3, V4, and V3A/B.
As a proof of concept that the analyses we use with infants can identify fine-grained visual organization, we ran the main analyses on an adult sample. These adults (8 participants) had both retinotopic mapping data and movie-watching data. Figures S1, S2, S3, and S4 demonstrate that applying these analyses to adult movie data reveals similar structure to what we find in infants.
Evidence of area organization with homotopic similarity
To determine what movies can reveal about the organization of areas in visual cortex, we compared activity across left and right hemispheres. Although these analyses cannot define visual maps, they test whether visual areas have different functional signatures. Namely, we correlated timecourses of movie-related BOLD activity between retinotopically defined, participant-specific ROIs (7.3 regions per participant per hemisphere, range: 6–8)32, 36, 37. Higher correlations between the same (i.e., homotopic) areas than between different areas indicate differentiation of function between areas. Moreover, other than V1, homotopic visual areas are anatomically separated across the hemispheres, so similar responses are unlikely to be attributable to spatial autocorrelation.
Homotopic areas (e.g., left ventral V1 and right ventral V1; diagonal of Figure 1A) were highly correlated (Mean [M]=0.88, range of area means: 0.85–0.90), and more correlated than non-homotopic areas, such as the same visual area across streams (e.g., left ventral V1 and right dorsal V1; Figure 1B; ΔFisher Z M=0.42, p<0.001). To clarify, we use the term ‘stream’ to liberally distinguish visual regions that are more dorsal or more ventral, as opposed to the functional definition used in reference to the ‘what’ and ‘where’ streams18. We found no evidence that the variability in movie duration per participant correlated with this difference (r=0.08, p=.700). Within stream (Figure 1C), homotopic areas were more correlated than adjacent areas in the visual hierarchy (e.g., left ventral V1 and right ventral V2; ΔFisher Z M=0.09, p<0.001), and adjacent areas were more correlated than distal areas (e.g., left ventral V1 and right ventral V4; ΔFisher Z M=0.20, p<0.001). There was no correlation between movie duration and effect (Same > Adjacent: r=-0.01, p=.965; Adjacent > Distal: r=-0.09, p=.740). Additionally, if we control for motion in the correlation between areas (in case motion transients drive consistent activity across areas), then the effects described here are negligibly different (Figure S5). Hence, movies elicit distinct processing dynamics across areas of infant visual cortex defined independently using retinotopic mapping.
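The core of the homotopy comparison can be illustrated with a minimal sketch. It assumes ROI timecourses are already extracted as arrays with matching row order in each hemisphere; the function names and the synthetic data are hypothetical, and the comparison is simplified to diagonal versus off-diagonal entries rather than the stream- and hierarchy-specific contrasts reported above.

```python
import numpy as np

def fisher_z(r):
    """Fisher transform; clip to avoid infinities at |r| = 1."""
    return np.arctanh(np.clip(r, -0.999999, 0.999999))

def cross_hemisphere_z(left, right):
    """Fisher-Z correlation of every left ROI with every right ROI.

    left, right: (n_rois, n_timepoints) arrays, rows in the same ROI order.
    Entry [i, j] of the result compares left ROI i with right ROI j.
    """
    n = left.shape[0]
    r = np.corrcoef(left, right)[:n, n:]  # keep only the left-vs-right block
    return fisher_z(r)

def homotopic_advantage(z):
    """Mean Fisher Z for homotopic pairs (diagonal) minus all other pairs."""
    same = np.eye(z.shape[0], dtype=bool)
    return z[same].mean() - z[~same].mean()

# Synthetic demo (not real data): each area has its own signal, shared
# across hemispheres, plus independent noise in each hemisphere.
rng = np.random.default_rng(0)
n_rois, n_trs = 6, 270
area_signal = rng.standard_normal((n_rois, n_trs))
left = area_signal + 0.5 * rng.standard_normal((n_rois, n_trs))
right = area_signal + 0.5 * rng.standard_normal((n_rois, n_trs))

z = cross_hemisphere_z(left, right)
delta = homotopic_advantage(z)  # positive when homotopic pairs are more similar
```

A positive difference computed per participant is the kind of quantity that would then be tested across participants.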
We previously found24 that an anatomical segmentation of visual cortex38 could identify these same areas reasonably well. Indeed, the results above were replicated when using visual areas defined anatomically (Figure S6). However, a key advantage of anatomical segmentation is that it can define visual areas not mapped by a functional retinotopy task. This could help address limitations of the analyses above, namely that there was a variable number of retinotopic areas identified across infants and these areas covered only part of visually responsive cortex. Focusing on broader areas that include portions of the ventral and dorsal stream in the adult visual cortex18, 38, we tested for functional differentiation of these streams in infants. We applied multi-dimensional scaling (MDS) — a data-driven method for assessing the clustering of data — to the average cross-correlation matrix across participants (Figure S6)36, 39. The stress of fitting these data with a two-dimensional MDS was in the acceptable range (0.076). Clear organization was present (Figure 2): areas in the adult-defined ventral stream (e.g., VO, PHC) differentiated from areas in the adult-defined dorsal stream (e.g., V3A/B). Indeed, we see a slight separation between canonical dorsal areas and the recently defined lateral pathway40 (e.g., LO1, hMT), although more evidence is needed to substantiate this distinction. This separation between streams is striking when considering that it happens despite differences in visual field representations across areas: while dorsal V1 and ventral V1 represent the lower and upper visual field, respectively, V3A/B and hV4 both have full visual field maps. These visual field representations can be detected in adults41; however, they are often not the primary driver of function39. We see that in infants too: hV4 and V3A/B represent the same visual space yet have distinct functional profiles. 
Again, this organization cannot be attributed to mere spatial autocorrelation within stream because analyses were conducted across hemispheres (at significant anatomical distance) and this pattern is preserved when accounting for motion (Figure S5). These results thus provide evidence of a dissociation in the functional profile of anatomically defined ventral and dorsal streams during infant movie-watching.
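As an illustration of how a correlation matrix between areas can be embedded in two dimensions, the sketch below uses classical (Torgerson) MDS written in NumPy. This is a stand-in for whichever MDS variant was actually used, so its stress values are not comparable to the 0.076 reported above. Converting correlations to dissimilarities as 1 − r, symmetrizing the matrix, and zeroing the diagonal are all assumptions, and the two synthetic "streams" are invented for the demo.

```python
import numpy as np

def correlation_to_dissimilarity(r):
    """Assumed conversion: symmetrize, take 1 - r, and zero the diagonal."""
    d = 1.0 - (r + r.T) / 2.0
    np.fill_diagonal(d, 0.0)
    return d

def classical_mds(dissim, n_dims=2):
    """Classical MDS: eigendecompose the double-centered squared dissimilarities."""
    n = dissim.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dissim ** 2) @ centering
    evals, evecs = np.linalg.eigh(gram)
    top = np.argsort(evals)[::-1][:n_dims]
    return evecs[:, top] * np.sqrt(np.clip(evals[top], 0, None))

def kruskal_stress(dissim, coords):
    """Kruskal stress-1 between input dissimilarities and embedded distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices_from(dissim, k=1)
    return np.sqrt(((dissim[upper] - dist[upper]) ** 2).sum() / (dist[upper] ** 2).sum())

# Synthetic demo: 4 areas driven by one stream signal, 4 by another.
rng = np.random.default_rng(1)
n_trs = 300
streams = rng.standard_normal((2, n_trs))
timecourses = np.repeat(streams, 4, axis=0) + 0.5 * rng.standard_normal((8, n_trs))

dissim = correlation_to_dissimilarity(np.corrcoef(timecourses))
coords = classical_mds(dissim)          # (8, 2) embedding
stress = kruskal_stress(dissim, coords)
```

In this toy case the first embedding dimension separates the two groups of areas, analogous to the separation between streams described above.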
Evidence of within-area organization with independent components analysis
We next explored whether movies can reveal fine-grained organization within visual areas by using independent components analysis (ICA) to propose visual maps in individual infant brains26, 27, 36, 42, 43. ICA is a method for decomposing a source into constituent signals by finding components that account for independent variance. When applied to fMRI data (using MELODIC in FSL), these components have spatial structure that varies in strength over time. Many of these components reflect noise (e.g., motion, breathing) or task-related signals (e.g., face responses), while other components reflect the functional architecture of the brain (e.g., topographic maps)26, 27, 36, 42, 43. We visually inspected each component and categorized it as a potential spatial frequency map, a potential meridian map, or neither. This process was blind to the ground truth of what the visual maps look like for that participant from the retinotopic mapping task, simulating what would be possible if retinotopy data from the participants were unavailable. Success in this process requires that 1) retinotopic organization accounts for sufficient variance in visual activity to be identified by ICA and 2) experimenters can accurately identify these components.
Multiple maps could be identified per participant because there was sometimes more than one candidate that the experimenter judged to be a suitable map. Across infant participants, we identified an average of 2.4 (range: 0–5) components as potential spatial frequency maps and 1.1 (range: 0–4) components as potential meridian maps. To evaluate the quality of these maps, we compared them to the ground truth of that participant’s task-evoked maps (Figure 3). Spatial frequency and meridian maps are defined by their systematic gradients of intensity across the cortical surface44. Lines drawn parallel to area boundaries show monotonic gradients on spatial frequency maps, with stronger responses to high spatial frequency at the fovea, and stronger responses to low spatial frequencies in the periphery (Figure S7). By contrast, lines drawn perpendicular to the area boundaries show oscillations in sensitivity to horizontal and vertical meridians on meridian maps (Figure S8). Using the same manually traced lines from the retinotopy task, we measured the intensity gradients in each component from the movie-watching data. We can then use the gradients of intensity in the retinotopy task-defined maps as a benchmark for comparison with the ICA-derived maps.
To assess the selected component maps, we correlated the gradients (described above) of the task-evoked and component maps. This test uses independent data: the components were defined based on movie data and validated against task-evoked retinotopic maps. Figure 4A shows the absolute correlations between the task-evoked maps and the manually identified spatial frequency components (M=0.52, range: 0.23–0.85). To evaluate whether movies are a viable method for defining retinotopic maps, we tested whether the task-evoked retinotopic maps were more similar to manually identified components than other components. We identified the best component in 6 of 13 participants (Figure 4B). The percentile of the average manually identified component was high (M=63.8 percentile, range: 26.7–98.1) and significantly above chance (ΔM=13.8, CI=[3.3–23.9], p=.010). This illustrates that the manually identified components derived from movie-watching data are similar to the spatial frequency maps derived from retinotopic mapping. The fact that this can work also indicates the underlying architecture of the infant visual system influences how movies are processed.
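The comparison of a selected component against all other components can be sketched as follows. The sketch assumes each map has already been reduced to an intensity profile sampled along the traced lines. The percentile definition used here (percentage of components with a lower absolute correlation) is an assumption, as are the function names and the synthetic profiles. Absolute correlations are used because the sign of an ICA component is arbitrary.

```python
import numpy as np

def abs_gradient_correlation(task_gradient, component_gradient):
    """Absolute Pearson correlation between two profiles sampled along the same lines."""
    return abs(np.corrcoef(task_gradient, component_gradient)[0, 1])

def percentile_of_selected(task_gradient, component_gradients, selected):
    """Mean percentile rank of the manually selected components among all components.

    component_gradients: (n_components, n_points) array.
    selected: indices of the components chosen by the experimenter.
    Percentile (assumed definition): percent of components scoring strictly lower.
    """
    scores = np.array([abs_gradient_correlation(task_gradient, g)
                       for g in component_gradients])
    ranks = [100.0 * np.mean(scores < scores[i]) for i in selected]
    return float(np.mean(ranks)), scores

# Synthetic demo: component 0 is a sign-flipped, noisy copy of the task gradient;
# the remaining 19 components are pure noise.
rng = np.random.default_rng(2)
task = np.linspace(-1.0, 1.0, 50)                    # monotonic, like a spatial frequency gradient
components = rng.standard_normal((20, 50))
components[0] = -task + 0.1 * rng.standard_normal(50)

pct, scores = percentile_of_selected(task, components, selected=[0])
best_component = int(np.argmax(scores))              # index of the best-matching component
```

Under this definition chance is 50, so a value well above 50 means the experimenter's choice resembled the task-evoked map more than a typical component did.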
We performed the same analyses on the meridian maps. As noted above, the lines were now traced perpendicular to the boundaries. Figure 4C shows the correlation between the task-evoked meridian maps and the manually identified components (M=0.46, range: 0.03–0.81). Compared to all possible components identified by ICA, the best possible component was identified for 1 out of 9 participants (Figure 4D). Although the percentile of the average manually identified component was numerically high (M=67.6 percentile, range: 3.0–100.0), it was not significantly above chance (ΔM=17.6, CI=[-1.8–33.0], p=.074). This difference in performance compared to spatial frequency is also evident in the fact that fewer components were identified as potential meridian maps, and that several participants had no such maps. Even so, some participants have components that are highly similar to the meridian maps (e.g., s8037 1 2 or s6687 1 5 in Figure S8). Because it is possible, albeit less likely, to identify meridian maps from ICA, the structure may be present in the data but more susceptible to noise or gaze variability. Spatial frequency maps have a coarser structure than meridian maps, and are more invariant to fixation, which may explain why they are easier to identify. Equivalent analyses of adult data (Figure S3) support this conclusion: meridian maps are found in fewer adult participants.
Evidence of within-area organization with shared response modeling
Finally, we investigated whether the organization of visual cortex in one infant can be predicted from movie-watching data in other participants using functional alignment28. For such functional alignment to work, stimulus-driven responses to the movie must be shared across participants. These analyses also benefit from greater amounts of data, so we expanded the sample in two ways (Table S2): First, we added 71 movie-watching datasets from additional infants who saw the same movies but did not have usable retinotopy data (and thus were not included in the analyses above that compared movie and retinotopy data within participant). Second, we used data from adult participants, including 8 participants who completed the retinotopy task and saw a subset of the movies we showed infants, and 41 datasets from adults who had seen the movies shown to infants but did not have retinotopy data.
With this expanded dataset, we used shared response modeling (SRM)34 to predict visual maps from other participants (Figure 5). Specifically, we held out one participant for testing purposes and used SRM to learn a low-dimensional, shared feature space from the movie-watching data of the remaining participants in a mask of occipital cortex. This shared space represented the responses to that movie in visual cortex that were shared across participants, agnostic to the precise localization of these responses across voxels in each individual (Figure 5A). The number of features in the shared space (K=10) was determined via a cross-validation procedure on movie-watching data in adults (Figure S9). The task-evoked retinotopic maps from all but the held-out participant were transformed into this shared space and averaged, separately for each map type (Figure 5B). We then mapped the held-out participant’s movie data into the learned shared space without changing the shared space (Figure 5C). In other words, the shared response model was learned and frozen before the held-out participant’s data was considered. This approach has been used and validated in prior SRM studies45. Taking the inverse of the held-out participant’s mapping allowed us to transform the averaged shared space representation of visual maps into the held-out participant’s brain space (Figure 5D).
This predicted visual organization was compared to the participant’s actual visual map from the retinotopy task using the same methods as for ICA. In other words, the manually traced lines were used to measure the intensity gradients in the predicted maps, and these gradients were compared to the ground truth. Critically, predicting the retinotopic maps used no retinotopy data from the held-out participant. Moreover, it is completely unconstrained anatomically (except for a liberal occipital lobe mask). Hence, the similarity of the SRM-predicted map to the task-evoked map is due to representations of visual space in other participants being mapped into the shared space.
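The leave-one-out logic described above can be sketched with a simplified, deterministic variant of shared response modeling written in NumPy. This is not the BrainIAK implementation used for the reported analyses. The function names, the alternating-update scheme, the iteration count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def orthonormal_fit(x, s):
    """Orthonormal W (voxels x k) minimising ||x - W s|| (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(x @ s.T, full_matrices=False)
    return u @ vt

def fit_srm(train_movies, k, n_iter=20, seed=0):
    """Alternate between per-participant weights W_i and the shared response S (k x T)."""
    rng = np.random.default_rng(seed)
    n_trs = train_movies[0].shape[1]
    s = np.linalg.qr(rng.standard_normal((n_trs, k)))[0].T  # random k x T start
    for _ in range(n_iter):
        weights = [orthonormal_fit(x, s) for x in train_movies]
        s = np.mean([w.T @ x for w, x in zip(weights, train_movies)], axis=0)
    return weights, s

def predict_held_out_map(train_movies, train_maps, test_movie, k=10):
    """Predict a held-out participant's map without using their retinotopy data."""
    weights, s = fit_srm(train_movies, k)
    # Project each training map into the shared space and average (a k-vector).
    shared_map = np.mean([w.T @ m for w, m in zip(weights, train_maps)], axis=0)
    # Fit the held-out participant to the frozen shared response.
    w_test = orthonormal_fit(test_movie, s)
    # W has orthonormal columns, so W maps shared space back to voxels.
    return w_test @ shared_map

# Synthetic demo: every participant expresses the same k-dimensional response
# through their own random orthonormal voxel weights.
rng = np.random.default_rng(0)
n_vox, n_trs, k = 50, 100, 3
s_true = rng.standard_normal((k, n_trs))
map_shared = rng.standard_normal(k)

def random_participant():
    w = np.linalg.qr(rng.standard_normal((n_vox, k)))[0]
    movie = w @ s_true + 0.02 * rng.standard_normal((n_vox, n_trs))
    return movie, w @ map_shared

people = [random_participant() for _ in range(6)]
train, (test_movie, test_map) = people[:-1], people[-1]
predicted = predict_held_out_map([m for m, _ in train], [t for _, t in train],
                                 test_movie, k=k)
r = np.corrcoef(predicted, test_map)[0, 1]  # similarity of predicted and true map
```

Because the held-out weights are estimated from movie data alone, any similarity between the predicted and true map has to come through the shared movie-evoked response.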
We trained SRMs on two populations to predict a held-out infant’s maps: (1) other infants and (2) adults. There may be advantages to either approach: infants are likely more similar to each other than adults in terms of how they respond to the movie; however, their data is more contaminated by motion. When using the infants to predict a held-out infant, the spatial frequency map (Figure 6A) and meridian map (Figure 6C) predictions are moderately correlated with task-evoked retinotopy data (spatial frequency: M=0.46, range: -0.06–0.78; meridian: M=0.24, range: -0.12–0.78). Some participants were fit well using SRM (e.g., s2077 1 1, and s6687 1 5 for Figures S10, S11).
To evaluate whether success was due to fitting the shared response, we flipped the held-out participant’s movie data (i.e., the first timepoint became the last timepoint and vice versa) so that an appropriate fit would not be learnable. The vertical lines for each movie in Figure 6 indicate the change in performance for this baseline. Indeed, flipping significantly worsened prediction of the spatial frequency map (ΔFisher Z M=0.52, CI=[0.24–0.80], p<.001) and the meridian map (ΔFisher Z M=0.24, CI=[0.02–0.49], p=.034). Hence, the movie-evoked response enables the mapping of other infants’ retinotopic maps into a held-out infant.
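The time-flipped baseline and the across-participant comparison can be sketched as below. The statistical procedure is not described in this section, so the bootstrap over participants shown here is an assumption about how a confidence interval and p-value for the mean ΔFisher Z might be obtained. The correlation values are made up for the demo and are not results from this study.

```python
import numpy as np

def flip_time(movie):
    """Reverse the time axis of a (voxels, timepoints) array."""
    return movie[:, ::-1]

def bootstrap_mean_difference(z_real, z_flipped, n_boot=10000, seed=0):
    """Resample participants with replacement to get a 95% CI and a two-sided
    p-value for the mean of (z_real - z_flipped)."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(z_real) - np.asarray(z_flipped)
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_means = diff[idx].mean(axis=1)
    ci = np.percentile(boot_means, [2.5, 97.5])
    p = 2 * min(np.mean(boot_means <= 0), np.mean(boot_means >= 0))
    return diff.mean(), ci, min(p, 1.0)

# Illustrative (invented) correlations between predicted and task-evoked maps.
r_real = np.array([0.60, 0.50, 0.70, 0.30, 0.55, 0.45, 0.65, 0.40])
r_flip = np.array([0.10, -0.05, 0.20, 0.00, 0.10, -0.10, 0.15, 0.05])

mean_diff, ci, p = bootstrap_mean_difference(np.arctanh(r_real), np.arctanh(r_flip))
flipped = flip_time(np.arange(6).reshape(2, 3))  # [[2, 1, 0], [5, 4, 3]]
```

Flipping preserves each voxel's overall statistics while breaking the temporal correspondence with the shared response, which is why it serves as a baseline.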
Using adult data to predict infant data also results in maps similar to task-evoked spatial frequency maps (Figure 6B; M=0.56, range: 0.17–0.79) and meridian maps (Figure 6D; M=0.34, range: -0.27–0.64). Some participants were well predicted by these methods (e.g., s8037 1 2, and s6687 1 4 for Figures S12, S13). Again, flipping the held-out participant’s movie data significantly worsened prediction of the held-out participant’s spatial frequency map (ΔFisher Z M=0.40, CI=[0.17–0.65], p<.001) and meridian map (ΔFisher Z M=0.33, CI=[0.12–0.55], p=.002). There was no significant difference in SRM performance when using adults versus infants as the training set (spatial frequency: ΔFisher Z M=0.14, CI=[-0.00–0.27], p=.054; meridian: ΔFisher Z M=0.11, CI=[-0.05–0.28], p=.179). In sum, SRM could be used to predict visual maps with moderate accuracy. This indicates that functional alignment methods like SRM can partially capture the retinotopic organization of visual cortex from infant movie-watching data.
We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment > functional alignment: ΔFisher Z M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment > functional alignment: ΔFisher Z M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment > functional alignment: ΔFisher Z M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment > functional alignment: ΔFisher Z M=0.49, CI=[0.39–0.60], p<.001). This suggests that even though SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant’s task-evoked maps, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.
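The anatomical benchmark amounts to a leave-one-out average. A minimal sketch follows, assuming every participant's map has already been resampled to the same standard-surface vertices (the alignment itself is outside the sketch). The function name and toy values are hypothetical.

```python
import numpy as np

def leave_one_out_average(maps):
    """For each participant, average the maps of all other participants.

    maps: (n_participants, n_vertices) array on a common surface template.
    Row i of the result is the prediction for participant i.
    """
    maps = np.asarray(maps, dtype=float)
    total = maps.sum(axis=0)
    return (total - maps) / (maps.shape[0] - 1)

# Toy example with three participants and two vertices.
toy_maps = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
predictions = leave_one_out_average(toy_maps)  # first row is the mean of rows 2 and 3
```

Each predicted row can then be evaluated in the same way as the SRM predictions, by correlating gradients along the traced lines with the task-evoked map.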
Discussion
We present evidence that movies can reveal the organization of infant visual cortex at different spatial scales. We found that movies evoke differential function across areas and topographic organization of function within areas, and that this topographic organization is shared across participants.
We show that the movie-evoked response in a visual area is more similar to the same area in the other hemisphere than to different areas in the other hemisphere. This suggests that visual areas are functionally differentiated in infancy and that this function is shared across hemispheres32. By comparing across anatomically distant hemispheres, we reduced the impact of spatial auto-correlation and isolated the stimulus-driven signals in the brain activity32, 36, 46. The greater across-hemisphere similarity for same versus different areas provides some of the first evidence that visual areas and streams are functionally differentiated in infants as young as 5 months old. Previous work suggests that functions of the dorsal and ventral streams are detectable in young infants47 but that the localization of these functions is immature48. Despite this, we find that the areas of infant visual cortex that will mature into the dorsal and ventral streams have distinct activity profiles during movie watching.
Not only do movies evoke differentiated activity in the infant visual cortex between areas, but movies also evoke fine-grained information about the organization of maps within areas. We used a data-driven approach (ICA) to discover maps that are similar to retinotopic maps in the infant visual cortex. We observed components that were highly similar to a spatial frequency map obtained from the same infant in a retinotopy task. This was also true for the meridian maps, to a lesser degree. This means that the retinotopic organization of the infant brain accounts for a detectable amount of variance in visual activity, otherwise components resembling these maps would not be discoverable. Importantly, the components could be identified without knowledge of these ground-truth maps; however, their moderate similarity to the task-defined maps makes them a poor replacement. One caveat for interpreting these results is that although some of the components are similar to a spatial frequency map or meridian map, they could reflect a different kind of visual map. For instance, the spatial frequency map is highly correlated with the eccentricity map23, 49–51 (which itself is related to receptive field size). This means it is inappropriate to make strong claims about the underlying function of the components based on their similarity to visual maps alone. Another limitation is that ICA does not provide a scale to the variation: although we find a correlation between gradients of spatial frequency in the ground truth and the selected component, we cannot use the component alone to infer the spatial frequency selectivity of any part of cortex. In other words, we cannot infer units of spatial frequency sensitivity from the components alone. Nonetheless, these results do show that it is possible to discover approximations of visual maps in infants and toddlers with movie-watching data and ICA.
We also asked whether functional alignment30, 34 could be used to detect visual maps in infants. Using a shared response model34 trained on movie-watching data of infants or adults, we transformed the visual maps of other individuals into a held-out infant’s brain to evaluate the fit to visual maps from a retinotopy task28. Like ICA, this was more successful for the spatial frequency maps, but it was still possible in some cases with the meridian maps. This is remarkable because the complex pattern of brain activity underlying these visual maps could be ‘compressed’ by SRM into only 10 dimensions in the shared space (i.e., the visual maps were summarized by a vector of 10 values). The weight matrix that ‘decompressed’ visual maps from this low-dimensional space into the held-out infant was learned from their movie-watching data alone. Hence, success with this approach means that visual maps are engaged during infant movie-watching. Furthermore, this result shows that functional alignment is practical for studies in awake infants that produce small amounts of data52. This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults28, 33, 34, or revealing changing function over development45, which may prove especially useful for infant fMRI52. In sum, movies evoke sufficiently reliable activity across infants and adults to find a shared response, and this shared response contains information about the organization of infant visual cortex.
To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously24 found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses28, here we find that functional alignment is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.
In conclusion, movies evoke activity in infants and toddlers that recapitulates the organization of the visual cortex. This activity is differentiated across visual areas and contains information about the visual maps at the foundation of visual processing. The work presented here is another demonstration of the power of content-rich, dynamic, and naturalistic stimuli to reveal insights in cognitive neuroscience.
Methods
Participants
Infant participants with retinotopy data were previously reported in another study24. Of those 17 original sessions, 15 had usable movie data collected in the same session and thus could be included in the current study. In this subsample, the age range was 4.8–23.1 months (M=13.0; 12 female; Table S1). The combinations of movies that infants saw were inconsistent, so the types of comparisons vary across analyses reported here. In brief, all possible infant participant sessions (15) were used in the Homotopy analyses and ICA, whereas two of these sessions (ages = 18.5, 23.1 months) could not be used in the SRM analyses. Table S1 reports demographic information for the infant participants. Table S2 reports participant information about each of the movies. It also reports the number and age of participants that were used to bolster the SRM analyses.
An adult sample was collected (N=8, 3 females) and used for validating the analyses and for supporting SRM analyses in infants. Each participant had both retinotopy and movie watching data. The adult participants saw the five most common movies that were seen by infants in our retinotopy sample. To support the SRM analyses, we also utilized any other available adult data from sessions in which we had shown the main movies in otherwise identical circumstances (Table S2).
Participants were recruited through fliers, word of mouth, or the Yale Baby School. This study was approved by the Human Subjects Committee at Yale University. Adults provided informed consent for themselves or their child.
Materials
Our experiment display code can be found here: https://github.com/ntblab/experiment_menu/tree/Movies/ and https://github.com/ntblab/experiment_menu/tree/retinotopy/. The code used to perform the data analyses is available at https://github.com/ntblab/infant_neuropipe/tree/predict_retinotopy/; this code uses tools from the Brain Imaging Analysis Kit53 (https://brainiak.org/docs/). Raw and preprocessed functional and anatomical data will be available here: https://datadryad.org/PENDING_URL but are available from this temporary link while under review: https://drive.google.com/drive/folders/1zKWLluNhUz48MZMAS8-I-xEINpLWYVo_?usp=sharing.
Data acquisition
Data were collected at the Brain Imaging Center (BIC) in the Faculty of Arts and Sciences at Yale University. We used a Siemens Prisma (3T) MRI and only the bottom half of the 20-channel head coil. Functional images were acquired with a whole-brain T2* gradient-echo EPI sequence (TR=2s, TE=30ms, flip angle=71°, matrix=64x64, slices=34, resolution=3mm iso, interleaved slice acquisition). Anatomical images were acquired with a T1 PETRA sequence for infants (TR1=3.32ms, TR2=2250ms, TE=0.07ms, flip angle=6°, matrix=320x320, slices=320, resolution=0.94mm iso, radial slices=30000) and a T1 MPRAGE sequence for adults, acquired with the top of the head coil attached (TR=2300ms, TE=2.96ms, TI=900ms, flip angle=9°, iPAT=2, slices=176, matrix=256x256, resolution=1.0mm iso).
Procedure
Our approach for collecting fMRI data from awake infants has been described in a previous methods paper7, with important details repeated below. Infants were first brought in for a mock scanning session to acclimate them and their parent to the scanning environment. Scans were scheduled when the infants were typically calm and happy. Participants were carefully screened for metal. We applied hearing protection in three layers for the infants: silicone inner ear putty, over-ear adhesive covers, and ear muffs. For the infants that were played sound (see below), Optoacoustics noise cancelling headphones were used instead of the ear muffs. The infant was placed on a vacuum pillow on the bed that comfortably reduced their movement. The top of the head coil was not placed over the infant in order to maintain comfort. Stimuli were projected directly onto the surface of the bore. A video camera (High Resolution camera, MRC systems) recorded the infant’s face during scanning. Adult participants underwent the same procedure with the following exceptions: they did not attend a mock scanning session, hearing protection was only two layers (earplugs and Optoacoustics headphones), and they were not on a vacuum pillow. Some infants participated in additional tasks during their scanning session.
When the infant was focused, experimental stimuli were shown using Psychtoolbox54 for MATLAB. The details for the retinotopy task are explained fully elsewhere24. In short, we showed two types of blocks. For the meridian mapping blocks, a bow tie cut-out of a colorful, large, flickering checkerboard was presented in either a vertical or horizontal orientation55. For the spatial frequency mapping blocks, the stimuli were grayscale Gaussian random fields of high (1.5 cycles per visual degree) or low (0.05 cycles per visual degree) spatial frequency36. For all blocks, a smaller (1.5 visual degree) grayscale movie was played at center to encourage fixation. Each block type contained two phases of stimulation. The first phase consisted of one of the conditions (e.g., horizontal or high) for 20s, followed immediately by the second phase with the other condition of the same block type (e.g., vertical or low, respectively) for 20s. At the end of each block there was at least 6s rest before the start of the next block. Infant participants saw up to 12 blocks of this stimulus, resulting in 24 epochs of stimuli. Adults all saw 12 blocks.
Participants saw a broad range of movies in this study (Table S3), some of which have been reported previously29, 31. The movie titled ‘Child Play’ comprises the concatenation of four silent videos that range in duration from 64–143s and were shown in the same order (with 6s in-between). They extended 40.8° wide by 25.5° high on the screen. The other movies were stylistically similar, computer-generated animations that each lasted 180s. These movies extended 45.0° wide by 25.5° high. Some of the movies were collected as part of an unpublished experiment in which we either played the full movie or inserted drops every 10s (i.e., the screen went blank while the audio continued). We included the ‘Dropped’ movies in the Homotopy analyses and ICA (average number of ‘Dropped’ movies per participant: 0.9, range: 0–3); however, we did not include them in the SRM analyses. Moreover, we only included 4 (out of 17) of these movies in the SRM analyses because there were insufficient numbers of infant participants to enable the training of the SRM.
Gaze Coding
The infant gaze coding procedure for the retinotopy data was the same as reported previously24. The gaze coding for the movies was also the same as reported previously29, 31. Participants looked at the screen for an average of 93.7% of the time (range: 78–99) for the movies used in the homotopy and ICA analyses, and 94.5% of the time (range: 82–99) for the movies used in the SRM analyses (Table S1). Adult participants were not gaze coded, but they were monitored online for inattentiveness. One adult participant was drowsy, so their gaze was manually coded. This resulted in the removal of four out of the 24 epochs of retinotopy.
Preprocessing
We used FSL’s FEAT analyses with modifications in order to implement infant-specific preprocessing of the data7. If infants participated in other experiments during the same functional run (14 sessions), the data were split to create a pseudorun. Three burn-in volumes were discarded from the beginning of each run/pseudorun when available. To determine the reference volume for alignment and motion correction, the Euclidean distance between all volumes was calculated and the volume that minimized the distance to all other volumes was chosen as reference (the ‘centroid volume’). Adjacent timepoints with greater than 3mm of movement were interpolated. To create the brain mask we calculated the SFNR56 for each voxel in the centroid volume. This produced a bimodal distribution reflecting the signal properties of brain and non-brain voxels. We thresholded the brain voxels at the trough between these two peaks. We performed Gaussian smoothing (FWHM=5mm). Motion correction with 6 degrees of freedom was performed using the centroid volume. AFNI’s despiking algorithm attenuated voxels with aberrant timepoints. The data for each movie were z-scored in time.
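The centroid-volume selection described above can be illustrated with a minimal NumPy sketch. It assumes that "minimizing the distance" means the smallest summed Euclidean distance to every other volume; the function name `centroid_volume_index` and the simulated data are hypothetical and are not taken from the authors' pipeline.

```python
import numpy as np

def centroid_volume_index(run):
    """Return the index of the volume closest, in total, to all other volumes.

    run: array of shape (n_volumes, n_voxels), one flattened volume per row.
    Assumption: 'closest' means the smallest summed Euclidean distance.
    """
    sq_norms = np.sum(run ** 2, axis=1)
    # Pairwise squared distances from the Gram matrix; clip rounding negatives.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * run @ run.T
    dist = np.sqrt(np.clip(sq_dist, 0.0, None))
    return int(np.argmin(dist.sum(axis=1)))

# Toy run: 20 volumes of 500 voxels, with volume 7 made central by construction.
rng = np.random.default_rng(0)
run = rng.normal(size=(20, 500))
run[7] = run.mean(axis=0)
idx = centroid_volume_index(run)
```

Choosing a reference this way avoids anchoring registration to a volume acquired during a large head movement, which is a particular concern with awake infants.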
We registered the centroid volume to a homogenized and skull-stripped anatomical volume from each participant. Initial alignment was performed using FLIRT with a normalized mutual information cost function. This automatic registration was manually inspected and then corrected if necessary using mrAlign from mrTools57.
The final step common across analyses created a transformation into surface space. Surfaces were reconstructed from iBEAT v2.058. These surfaces were then aligned into the Buckner40 standard surface space59 using FreeSurfer59.
Additional preprocessing steps were taken for the SRM analyses. For each individual movie (including each movie that makes up ‘Child Play’), the fMRI data were time-shifted by 4s and the break after the movie finished was cropped. This was done to account for hemodynamic lag, so that the first TR and last TR of the data approximately60 corresponded to the brain’s response to the first and last 2s of the movie, respectively.
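As a rough illustration of the shift-and-crop step, the sketch below assumes the data start at movie onset and uses the 2s TR from the acquisition parameters, so a 4s shift corresponds to dropping two volumes. The helper name `shift_and_crop` and the toy array are hypothetical.

```python
import numpy as np

TR_S = 2.0    # repetition time in seconds (from the acquisition parameters)
LAG_S = 4.0   # shift applied to account for hemodynamic lag

def shift_and_crop(data, movie_duration_s, tr_s=TR_S, lag_s=LAG_S):
    """Keep only the TRs that correspond to the movie after shifting by the lag.

    data: array of shape (n_voxels, n_trs), assumed to start at movie onset
    and to include the rest period that follows the movie.
    """
    lag_trs = int(round(lag_s / tr_s))
    n_movie_trs = int(round(movie_duration_s / tr_s))
    return data[:, lag_trs:lag_trs + n_movie_trs]

# Toy example: 100 TRs acquired for a 180s movie (90 TRs) plus trailing rest.
data = np.arange(2 * 100).reshape(2, 100)
aligned = shift_and_crop(data, movie_duration_s=180.0)
```

After this step every participant who saw the same movie has the same number of TRs, which is what a shared response model needs in order to line up time points across participants.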
Occipital masks were aligned to the participant’s native space for the SRM analyses. To produce these, a mapping from native functional space to standard space was determined. This was enabled using non-linear alignment of the anatomical image to standard space using ANTs61. For infants, an initial linear alignment with 12 DOF was used to align anatomical data to the age-specific infant template62, followed by non-linear warping using diffeomorphic symmetric normalization. Then, we used a predefined transformation (12 DOF) to linearly align between the infant template and adult standard. For adults, we used the same alignment procedure, except participants were directly aligned to adult standard. We used the occipital mask from the MNI structural atlas63 in standard space – defined liberally to include any voxel with an above zero probability of being labelled as the occipital lobe – and used the inverted transform to put it into native functional space.
Analysis
Retinotopy
For our measure of task-evoked retinotopy in infants, we used the outputs of the retinotopy analyses from our previous paper24 that are publicly released. In brief, we performed separate univariate contrasts between conditions in the study (horizontal>vertical, high spatial frequency>low spatial frequency). We then mapped these contrasts into surface space. Then, in surface space rendered by AFNI64, we demarcated the visual areas V1, V2, V3, V4, and V3A/B using traditional protocols based on the meridian map contrast65. We traced lines perpendicular and parallel to the area boundaries to quantify gradients in the visual areas. The anatomically-defined areas of interest38 used in Figure 2 were available in this standard surface space. The adult data were also traced using the same methods as infants (described previously24) by one of the original infant coders (CE).
Homotopy
The homotopy analyses compared the time course of functional activity across visual areas in different hemispheres of each infant. For the participants that had more than one movie in a session (N=9), all the movies were concatenated along with burn out time between the movies (mean number of movies per participant=2.7, range: 1–6; mean duration of movies=540.7s, range: 186–1116). For the areas that were defined with the retinotopy task (average number of areas traced in each hemisphere = 7.3, range: 6.0–8.0), the functional activity was averaged within area and then Pearson correlated with all other areas. The resulting cross-correlation matrix was Fisher Z transformed before different cells were averaged or compared. If an infant did not have an area traced then that area was ignored in the analyses. We grouped visual areas according to stream, where areas that are more dorsal of V1 were called ‘dorsal’ stream and areas more ventral were called ‘ventral’ stream. To assess the functional similarity of visual areas, Fisher Z correlations between the same areas in the same stream were averaged, and compared to the correlations of approximately equivalent areas from different streams (e.g., dorsal V2 compared with ventral V2). The averages for each of the two conditions (same stream vs. different stream) were evaluated statistically using bootstrap resampling66. Specifically, we computed the mean difference between conditions in a pseudosample, generated by sampling participants with replacement. We created 10,000 such pseudosamples and took the proportion of differences that showed a different sign than the true mean, multiplied by two to get the two-tailed p-value. To evaluate how distance affects similarity, we additionally used bootstrap resampling to compare the Fisher Z correlations of areas across hemispheres in the same stream, comparing the same area with adjacent areas (e.g., ventral V1 with ventral V2) and with distal areas (e.g., ventral V1 with ventral V3).
Before reporting the results in the figures, the Fisher Z values were converted back into Pearson correlation values.
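The correlation and bootstrap steps of the homotopy analysis can be sketched as follows. This is an illustrative NumPy version under simplifying assumptions (every area present in both hemispheres, one difference score per participant); the function names and the simulated values are hypothetical.

```python
import numpy as np

def fisher_z_matrix(left, right):
    """Fisher-Z correlations between area time courses of the two hemispheres.

    left, right: arrays of shape (n_areas, n_trs) holding area-averaged activity.
    Returns an (n_areas, n_areas) matrix; rows index left areas, columns right.
    """
    n = left.shape[0]
    r = np.corrcoef(left, right)[:n, n:]
    return np.arctanh(np.clip(r, -0.999999, 0.999999))

def bootstrap_p(diffs, n_boot=10000, seed=0):
    """Two-tailed p-value from resampling participants with replacement."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(diffs, dtype=float)
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    boot_means = diffs[idx].mean(axis=1)
    # Proportion of pseudosample means whose sign differs from the true mean.
    opposite = np.mean(np.sign(boot_means) != np.sign(diffs.mean()))
    return min(1.0, 2.0 * opposite)

# Toy hemispheres: three areas whose right-hemisphere signal tracks the left.
rng = np.random.default_rng(1)
left = rng.normal(size=(3, 200))
right = left + 0.5 * rng.normal(size=(3, 200))
z = fisher_z_matrix(left, right)
r_for_plotting = np.tanh(z)  # convert back to Pearson r before reporting

# Toy per-participant differences (same stream minus different stream).
diffs = [0.30, 0.25, 0.41, 0.18, 0.33, 0.27, 0.22, 0.36]
p_value = bootstrap_p(diffs)
```

Averaging in Fisher Z space and converting back with `np.tanh` keeps the averages from being biased by the bounded range of Pearson correlations.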
As an additional analysis to the one described above, we used an atlas of anatomically-defined visual areas from adults38 to define both early and later visual areas. Specifically, we used the areas labeled as part of the ventral and dorsal stream (excluding the intraparietal sulcus and frontal eye fields since they often cluster separately39), and then averaged the functional response within each area. The functional responses were then correlated across hemispheres, as in the main analysis. Multi-dimensional scaling was then performed on the cross-correlation matrix, and the dimensionality that fell below the threshold for stress (0.2) was chosen. In this case, that was a dimensionality of 2 (stress=0.076). We then visualized the resulting output of the data in these two dimensions.
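The dimensionality selection can be sketched under several assumptions that the text does not specify: the input is a symmetric dissimilarity matrix (for example, one minus the correlation), classical (Torgerson) MDS stands in for whichever MDS variant was used, stress is Kruskal's stress-1, and the lowest dimensionality under the threshold is selected. All function names are hypothetical.

```python
import numpy as np

def classical_mds(dissim, n_dims):
    """Classical (Torgerson) MDS embedding of a symmetric dissimilarity matrix."""
    n = dissim.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dissim ** 2) @ centering
    evals, evecs = np.linalg.eigh(gram)
    order = np.argsort(evals)[::-1][:n_dims]
    return evecs[:, order] * np.sqrt(np.clip(evals[order], 0.0, None))

def stress1(dissim, coords):
    """Kruskal's stress-1 between the dissimilarities and embedded distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    fitted = np.sqrt((diff ** 2).sum(axis=-1))
    iu = np.triu_indices_from(dissim, k=1)
    num = ((dissim[iu] - fitted[iu]) ** 2).sum()
    return float(np.sqrt(num / (dissim[iu] ** 2).sum()))

def lowest_dim_below(dissim, threshold=0.2, max_dims=10):
    """Return (n_dims, stress, coords) for the first dimensionality under threshold."""
    for k in range(1, max_dims + 1):
        coords = classical_mds(dissim, k)
        s = stress1(dissim, coords)
        if s < threshold:
            return k, s, coords
    raise ValueError("no dimensionality met the stress threshold")

# Toy dissimilarities: eight 'areas' arranged on a circle, so two dimensions suffice.
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
points = np.column_stack([np.cos(angles), np.sin(angles)])
dissim = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
n_dims, stress, coords = lowest_dim_below(dissim)
```

In the toy example a one-dimensional embedding cannot represent the circle, so the search moves on to two dimensions, where the stress is essentially zero.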
Independent Components Analysis (ICA)
To conduct ICA, we provided the preprocessed movie data to FSL’s MELODIC42. Like in the homotopy analyses, we used all of the movie data available per session. The algorithm found a range of components across participants (M=76.4 components, range: 31–167). With this large number of possible components, an individual coder (CE) sorted through them to determine whether each one looked like a meridian map, spatial frequency map, or neither (critically, without referring to the ground truth from the retinotopy task). We initially visually inspected each component in volumetric space, looking for the following features: First, we searched for whether there was a strong weighting of the component in visual cortex. Second, we looked for components that had a symmetrical pattern in visual cortex between the two hemispheres. To identify the spatial frequency maps, we looked for a continuous gradient emanating out from the early visual cortex. For meridian maps, we looked for sharp alternations in the sign of the component, particularly near the midline of the two hemispheres. Based on these criteria, we then chose a small set of components that were further scrutinized in surface space. On the surface, we looked for features that clearly define a visual map topography. Again, this selection process was blind to the task-evoked retinotopic maps, so that a person without retinotopy data could take the same steps and potentially find maps. For the adult participants who were analyzed, the components were selected before those participants were retinotopically traced, in order to minimize the potential contamination that could occur when performing these manual steps close in time.
These components were then tested against that participant’s task-evoked retinotopic maps. If the component was labeled as a potential spatial frequency map, we tested whether there was a monotonic gradient from fovea to periphery. Specifically, we measured the component response along lines drawn parallel to the area boundaries, averaged across these lines, and then correlated this pattern with the same response in the actual map. The absolute correlation was used because the sign of ICA components is arbitrary. For each participant, we then asked whether the chosen components were the best ones possible out of all those derived from MELODIC. To test whether the identified components were better than the non-identified components, we ranked all the components by their correlation to the task-evoked maps. This ranking was converted into a percentile, where 100% means it is the best possible component. We took the identified component’s percentile (or averaged the percentiles if there were multiple components chosen) and compared it to chance (50%). This difference from chance was used for bootstrap resampling to evaluate whether the identified components were significantly better than chance. We performed the same kind of analysis for meridian maps, except in this case the lines used for testing were those drawn perpendicular to the areas. In this case, we were testing whether the components showed oscillations in the sign of the intensity.
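The percentile ranking of components can be sketched as below. The exact percentile formula is an assumption (rank divided by the number of components minus one, so the best component scores 100 and the worst scores 0), and the function name and simulated gradients are hypothetical.

```python
import numpy as np

def component_percentiles(component_gradients, task_gradient):
    """Rank components by absolute correlation with the task-evoked gradient.

    component_gradients: (n_components, n_points) responses sampled along lines.
    task_gradient: (n_points,) response of the task-evoked map along the same lines.
    Returns (percentiles, abs_r); percentile 100 marks the best-matching component.
    """
    abs_r = np.array([
        abs(np.corrcoef(g, task_gradient)[0, 1]) for g in component_gradients
    ])
    ranks = abs_r.argsort().argsort()  # 0 = weakest match
    return 100.0 * ranks / (len(abs_r) - 1), abs_r

# Toy data: 20 noise components, with component 5 a sign-flipped copy of the map.
rng = np.random.default_rng(3)
task_gradient = np.linspace(0.0, 1.0, 30)
components = rng.normal(size=(20, 30))
components[5] = -task_gradient + 0.05 * rng.normal(size=30)
percentiles, abs_r = component_percentiles(components, task_gradient)
difference_from_chance = percentiles[5] - 50.0
```

Taking the absolute correlation means that the sign-flipped component is still recognized as the best match, which mirrors the arbitrary sign of ICA components.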
Shared Response Modeling (SRM)
We based our SRM analyses on previous approaches using hyperalignment28 and adapted them for our sample. SRM embeds the brain activity of multiple individuals viewing a common stimulus into a shared space with a small number of feature dimensions. Each voxel of each participant is assigned a weight for each feature. The weight reflects how much the voxel loads onto that feature. For our study, the SRM was trained on either infant or adult movie-watching data to learn the shared response and the mapping of the training participants into this shared space. For the infant SRM, we used a leave-one-out approach. We took a movie that the held-out infant saw (e.g., ‘Aeronaut’) and considered all other infant participants that saw that movie (including additional participants without any retinotopy data). We fit an SRM model on all of the participants except the held-out one. This model had 10 features, as determined by cross-validation with adult data (Figure S9). We used an occipital anatomical mask to fit the SRM. Using the learned individual weight matrices, the retinotopic maps from the infants in the training set were then transformed into the shared space and averaged across participants. The held-out participant’s movie data were used to learn a mapping to the learned SRM features. By applying the inverse of this mapping, we transformed the averaged visual maps of the training set in shared space into the brain space of the held-out participant to predict their visual maps. Using the same methods as described for ICA above, we compared the task-evoked and predicted gradient responses. These analysis steps were also followed for the adult SRM, with the difference being that the group of participants used to create the SRM model and to create the averaged visual maps were adults. As with the infant SRM, additional adult participants without retinotopy data were used for training.
Across both types of analysis, the held-out participant was completely ignored when fitting the SRM, and no retinotopy data went into training the SRM.
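The leave-one-out prediction can be illustrated with a stand-alone NumPy sketch of a deterministic shared response model. This is not the authors' BrainIAK-based implementation: the alternating fit, the function names, the three-feature toy model, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def procrustes_weights(data, shared):
    """Orthonormal weights W (voxels x features) that best fit data ~ W @ shared."""
    u, _, vt = np.linalg.svd(data @ shared.T, full_matrices=False)
    return u @ vt

def fit_srm(datasets, n_features, n_iter=20, seed=0):
    """Alternate between updating the shared response and per-participant weights."""
    rng = np.random.default_rng(seed)
    weights = [np.linalg.qr(rng.normal(size=(d.shape[0], n_features)))[0]
               for d in datasets]
    for _ in range(n_iter):
        shared = np.mean([w.T @ d for w, d in zip(weights, datasets)], axis=0)
        weights = [procrustes_weights(d, shared) for d in datasets]
    shared = np.mean([w.T @ d for w, d in zip(weights, datasets)], axis=0)
    return weights, shared

def predict_heldout_map(train_data, train_maps, heldout_data, n_features):
    """Predict a held-out map; no held-out data or maps enter the SRM fit."""
    weights, shared = fit_srm(train_data, n_features)
    # Project each training map into shared space and average across participants.
    shared_map = np.mean([w.T @ m for w, m in zip(weights, train_maps)], axis=0)
    # Learn the held-out mapping from movie data only, then project back out.
    w_held = procrustes_weights(heldout_data, shared)
    return w_held @ shared_map

# Toy simulation: 6 participants, 50 voxels, 200 TRs, 3 latent features.
rng = np.random.default_rng(1)
n_features, n_trs, n_voxels = 3, 200, 50
latent = rng.normal(size=(n_features, n_trs))
latent_map = rng.normal(size=n_features)
movies, maps = [], []
for _ in range(6):
    w_true = np.linalg.qr(rng.normal(size=(n_voxels, n_features)))[0]
    movies.append(w_true @ latent + 0.1 * rng.normal(size=(n_voxels, n_trs)))
    maps.append(w_true @ latent_map)

predicted = predict_heldout_map(movies[:-1], maps[:-1], movies[-1], n_features)
r_heldout = np.corrcoef(predicted, maps[-1])[0, 1]
```

Because the weights are constrained to have orthonormal columns, the transpose of a weight matrix maps voxel patterns into the shared space and the weight matrix itself maps them back, which is how the averaged map is carried into the held-out participant's voxel space.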
To test the benefit of SRM, we performed a control analysis in which we scrambled the movie data from the held-out participant before learning their mapping into the shared space. Specifically, we flipped the timecourse of the data so that the first timepoint became the last, and vice versa. By creating a mismatch in the movie sequence across participants, this procedure should result in meaningless weights for the held-out participant and, in turn, the prediction of visual maps using SRM will fail. We compared ‘real’ and ‘flipped’ SRM procedures by computing the difference in fit (transformed into Fisher Z) for each movie, and then averaging that difference within participant across movies. Those differences were then bootstrap resampled to evaluate significance. We also performed bootstrap resampling to compare the ‘real’ SRM accuracy when using infants versus adults for training.
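The time-flipping control can be sketched in the same simplified setting. The example assumes a shared response and a shared-space map are already available, learns the held-out weights once from the real time course and once from the time-reversed one, and compares the Fisher Z fits. The names and the simulated data are hypothetical.

```python
import numpy as np

def procrustes_weights(data, shared):
    """Orthonormal weights W (voxels x features) that best fit data ~ W @ shared."""
    u, _, vt = np.linalg.svd(data @ shared.T, full_matrices=False)
    return u @ vt

# Toy held-out participant: 200 voxels, 90 TRs, 10 shared features.
rng = np.random.default_rng(2)
n_features, n_trs, n_voxels = 10, 90, 200
shared = rng.normal(size=(n_features, n_trs))   # stands in for a fitted SRM
shared_map = rng.normal(size=n_features)        # averaged map in shared space
w_true = np.linalg.qr(rng.normal(size=(n_voxels, n_features)))[0]
heldout = w_true @ shared + 0.05 * rng.normal(size=(n_voxels, n_trs))
true_map = w_true @ shared_map

def map_fit(data):
    """Fisher Z correlation between the predicted and true held-out maps."""
    predicted = procrustes_weights(data, shared) @ shared_map
    r = np.corrcoef(predicted, true_map)[0, 1]
    return np.arctanh(np.clip(r, -0.999999, 0.999999))

z_real = map_fit(heldout)
z_flipped = map_fit(heldout[:, ::-1])  # first timepoint becomes the last
difference = z_real - z_flipped
```

Reversing the time course preserves the overall variance and temporal autocorrelation of each voxel's signal while breaking its alignment with the shared response, so any remaining fit reflects chance rather than stimulus-locked structure.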
Anatomical alignment test
We performed a second type of between-participant analysis in addition to SRM. Specifically, we anatomically aligned the retinotopic maps from other participants to make a prediction of the map in a held-out participant. To achieve this, we first aligned all spatial frequency and meridian maps from infant and adult participants with retinotopy into the Buckner40 standard surface space59. For each infant participant, we composed a map from the average of the other participants. The other participants were either all the other infants or all the adult participants. We then used the lines traced parallel to the area boundaries (for spatial frequency) or perpendicular to the area boundaries (for meridian) to extract gradients of response in the average maps. These gradients were then correlated with the ground-truth gradients (i.e., the alternations in sensitivity in the held-out infant using lines traced from that participant). These correlations were then compared to SRM results within participants using bootstrap resampling. If a participant had multiple movies’ worth of data, the SRM results were averaged across movies prior to this comparison.
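The anatomical-alignment baseline amounts to a leave-one-out average, which can be sketched as follows. The sketch makes a simplifying assumption that every participant's gradient has already been sampled at the same points in standard space, whereas the analysis described above samples the average map along lines traced in the held-out participant. The function name and simulated gradients are hypothetical.

```python
import numpy as np

def leave_one_out_gradient_r(gradients):
    """Correlate each participant's gradient with the mean of all the others.

    gradients: (n_participants, n_points) responses sampled at matched points.
    """
    gradients = np.asarray(gradients, dtype=float)
    n = gradients.shape[0]
    total = gradients.sum(axis=0)
    rs = []
    for own in gradients:
        others_mean = (total - own) / (n - 1)  # average map excluding this participant
        rs.append(np.corrcoef(others_mean, own)[0, 1])
    return np.array(rs)

# Toy meridian-like gradients: a shared oscillation plus participant noise.
rng = np.random.default_rng(4)
common = np.sin(np.linspace(0.0, 3.0 * np.pi, 40))
gradients = common + 0.3 * rng.normal(size=(8, 40))
rs = leave_one_out_gradient_r(gradients)
```

The resulting per-participant correlations are the quantities that would then be compared against the SRM-based correlations with bootstrap resampling.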
Contributions
C.T.E. and M.J.A. conceived of the analyses. C.T.E., T.S.Y., & N.B.T-B. collected the data. C.T.E. & T.S.Y. preprocessed the data. C.T.E. performed the analyses. All authors contributed to the drafting of the manuscript.
Acknowledgements
We are thankful to the families of infants who participated. We also acknowledge the hard work of the Yale Baby School team, including L. Rait, J. Daniels, A. Letrou, and K. Armstrong for recruitment, scheduling, and administration, and L. Skalaban, A. Bracher, D. Choi, and J. Trach for help in infant fMRI data collection. Thank you to J. Wu, J. Fel, and A. Klein for help with gaze coding, and R. Watts for technical support. We are grateful for internal funding from the Department of Psychology and Faculty of Arts and Sciences at Yale University. N.B.T-B. was further supported by the Canadian Institute for Advanced Research and the James S. McDonnell Foundation (https://doi.org/10.37717/2020-1208).
Supplementary materials
References
- 1. BOLD response selective to flow-motion in very young infants. PLoS Biology 13
- 2. Development of BOLD response to motion in human infants. Journal of Neuroscience 43:3825–3837
- 3. Anatomical correlates of category-selective visual regions have distinctive signatures of connectivity in neonates. Developmental Cognitive Neuroscience 58
- 4. Organization of high-level visual cortex in human infants. Nature Communications 8
- 5. Selective responses to faces, scenes, and bodies in the ventral visual pathway of infants.
- 6. The development of intrinsic timescales: A comparison between the neonate and adult brain. NeuroImage 275
- 7. Re-imagining fMRI for awake behaving infants. Nature Communications 11
- 8. Movies in the magnet: Naturalistic paradigms in developmental functional neuroimaging. Developmental Cognitive Neuroscience 36
- 9. Inscapes: A movie paradigm to improve compliance in functional magnetic resonance imaging. NeuroImage 122:222–232
- 10. Development of the social brain from age three to twelve years. Nature Communications 9
- 11. An open resource for transdiagnostic research in pediatric mental health and learning disorders. Scientific Data 4:1–26
- 12. Keep it real: rethinking the primacy of experimental control in cognitive neuroscience. NeuroImage 222
- 13. Naturalistic imaging: The use of ecologically valid conditions to study brain function. NeuroImage 247
- 14. Free viewing gaze behavior in infants and adults. Infancy 21:262–287
- 15. Online recruitment and testing of infants with Mechanical Turk. Journal of Experimental Child Psychology 156:168–178
- 16. Vergleichende Lokalisationslehre der Grosshirnrinde in ihren Prinzipien dargestellt auf Grund des Zellenbaues.
- 17. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex (New York, NY: 1991) 1:1–47
- 18. Two cortical visual systems. Analysis of visual behavior. Cambridge: MIT Press :549–586
- 19. Topographic maps are fundamental to sensory processing. Brain Research Bulletin 44:107–112
- 20. Vision and cortical map development. Neuron 56:327–338
- 21. Retinotopic organization of human visual cortex mapped with positron-emission tomography. Journal of Neuroscience 7:913–922
- 22. Functional topographic mapping of the cortical ribbon in human vision with conventional MRI scanners. Nature 365:150–153
- 23. Spatial frequency tuning in human retinotopic visual areas. Journal of Vision 8:1–13
- 24. Retinotopic organization of visual cortex in human infants. Neuron 109:2616–2626
- 25. Naturalistic audio-movies and narrative synchronize “visual” cortices across congenitally blind but not sighted individuals. Journal of Neuroscience 39:8940–8948
- 26. Topographic connectivity reveals task-dependent retinotopic processing throughout the human brain. Proceedings of the National Academy of Sciences 118
- 27. Spontaneous activity in the visual cortex is organized by visual streams. Human Brain Mapping 38:4613–4630
- 28. A model of representational spaces in human cortex. Cerebral Cortex 26:2919–2934
- 29. Neural event segmentation of continuous experience in human infants. Proceedings of the National Academy of Sciences 119
- 30. Capturing shared and individual information in fMRI data. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE :826–830
- 31. Functional networks in the infant brain during sleep and wake states. bioRxiv :2023–2
- 32. Development of visual cortex in human neonates is selectively modified by postnatal experience. eLife 11
- 33. Hybrid hyperalignment: A single high-dimensional model of shared information embedded in cortical patterns of response and functional connectivity. NeuroImage 233
- 34. A reduced-dimension fMRI shared response model. NIPS 28:460–468
- 35. Searching through functional space reveals distributed visual, auditory, and semantic coding in the human brain. PLOS Computational Biology 16
- 36. A hierarchical, retinotopic proto-organization of the primate visual system at birth. eLife 6
- 37. Hierarchical and homotopic correlations of spontaneous neural activity within the visual cortex of the sighted and blind. Frontiers in Human Neuroscience 9
- 38. Probabilistic maps of visual topography in human cortex. Cerebral Cortex 25:3911–3931
- 39. Objective analysis of the topological organization of the human cortical visual connectome suggests three visual pathways. Cortex 98:73–83
- 40. Third visual pathway, anatomy, and cognition across species. Trends in Cognitive Sciences 25:548–549
- 41. Connective field modeling. NeuroImage 66:376–384
- 42. Investigations into resting-state connectivity using independent component analysis. Philosophical Transactions of the Royal Society B: Biological Sciences 360:1001–1013
- 43. Functional connectivity of the macaque brain across stimulus and arousal states. Journal of Neuroscience 29:5897–5909
- 44. Retinotopic organization of human ventral visual cortex. Journal of Neuroscience 29:10638–10652
- 45. Emergence and organization of adult brain function throughout child development. NeuroImage 226
- 46. Longitudinal analysis of neural network development in preterm infants. Cerebral Cortex 20:2852–2862
- 47. Reorganization of global form and motion processing during human visual development. Current Biology 20:411–415
- 48. Development of human visual function. Vision Research 51:1588–1609
- 49. Estimating receptive field size from fMRI data in human striate and extrastriate visual cortex. Cerebral Cortex 11:1182–1190
- 50. Novel domain formation reveals proto-architecture in inferotemporal cortex. Nature Neuroscience 17:1776–1783
- 51. On the variety of spatial frequency selectivities shown by neurons in area 17 of the cat. Proceedings of the Royal Society of London. Series B. Biological Sciences 213:183–199
- 52. Infant fMRI: A model system for cognitive neuroscience. Trends in Cognitive Sciences 22:375–387
- 53. BrainIAK tutorials: User-friendly learning materials for advanced fMRI analysis. PLOS Computational Biology 16
- 54. What’s new in Psychtoolbox-3. Perception 36
- 55. Functional analysis of human MT and related visual cortical areas using magnetic resonance imaging. Journal of Neuroscience 15:3215–3230
- 56. Report on a multicenter fMRI quality assurance protocol. Journal of Magnetic Resonance Imaging 23:827–839
- 57. mrTools: Analysis and visualization package for functional magnetic resonance imaging data. Zenodo 10
- 58. iBEAT V2.0: a multisite-applicable, deep learning-based pipeline for infant cerebral cortical surface reconstruction. Nature Protocols :1–32
- 59. Cortical surface-based analysis: I. segmentation and surface reconstruction. NeuroImage 9:179–194
- 60. Individual focused studies of functional brain development in early human infancy. Current Opinion in Behavioral Sciences 40:137–143
- 61. A reproducible evaluation of ANTs similarity metric performance in brain image registration. NeuroImage 54:2033–2044
- 62. Unbiased average age-appropriate atlases for pediatric studies. NeuroImage 54:313–327
- 63. A probabilistic atlas and reference system for the human brain: International consortium for brain mapping (ICBM). Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 356:1293–1322
- 64. AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Computers and Biomedical Research 29:162–173
- 65. Visual field maps in human cortex. Neuron 56:366–383
- 66. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science :54–75
Copyright
© 2023, Ellis et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.