Author response:
The following is the authors’ response to the original reviews.
Public Reviews:
Reviewer #1 (Public review):
Summary:
This study aims to investigate the development of infants' responses to music by examining neural activity via EEG and spontaneous body kinematics using video-based analysis. The authors also explore the role of musical pitch in eliciting neural and motor responses, comparing infants at 3, 6, and 12 months of age.
Strengths:
A key strength of the study lies in its analysis of body kinematics and modeling of stimulus-motor coupling, demonstrating how the amplitude envelope of music predicts infant movement, and how higher musical pitch may enhance auditory-motor synchronization.
Weaknesses:
The neural data analysis is currently limited to auditory evoked potentials aligned with beat timing. A more comprehensive approach is needed to robustly support the proposed developmental trajectory of neural responses to music.
We thank the reviewer for this comment and would like to clarify that there has been a misunderstanding: our EEG analyses were time-locked to actual tone onsets, not to expected beat positions. For both music and shuffled conditions, ERPs were computed by epoching around all real auditory events present in each stimulus. This approach ensures that the AEPs reflect neural responses to actual auditory events rather than to predicted or expected events that do not exist in the shuffled stimuli. We have now clarified this further in the revised manuscript (p. 9).
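To make the distinction concrete, the following is a minimal sketch of epoching around actual tone onsets. It is not our analysis code: the function name `epoch_average`, the window limits, the sampling rate, and the toy data are all assumptions chosen for illustration.

```python
def epoch_average(signal, sfreq, onsets_s, tmin=-0.1, tmax=0.5):
    """Average fixed-length windows of `signal` around each onset (in seconds)."""
    start_off = round(tmin * sfreq)
    stop_off = round(tmax * sfreq)
    epochs = []
    for onset in onsets_s:
        center = round(onset * sfreq)
        lo, hi = center + start_off, center + stop_off
        if lo < 0 or hi > len(signal):
            continue  # skip windows that run past the recording
        epochs.append(signal[lo:hi])
    if not epochs:
        raise ValueError("no complete epochs")
    n = len(epochs)
    # Average sample by sample across epochs.
    return [sum(samples) / n for samples in zip(*epochs)]


# Toy data (made up): a 100 Hz "recording" with a unit spike 50 ms after
# each real onset. Onsets are irregular, as in a shuffled sequence.
sfreq = 100
onsets = [0.50, 1.13, 1.40, 2.21]
signal = [0.0] * 300
for t in onsets:
    signal[round(t * sfreq) + 5] = 1.0

erp = epoch_average(signal, sfreq, onsets, tmin=-0.1, tmax=0.3)
```

Because every window is anchored to a tone that was actually played, the evoked deflection survives averaging even when the inter-onset intervals are irregular. Anchoring to a regular beat grid would smear it.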
Reviewer #2 (Public review):
Summary:
Infants' auditory brain responses reveal processing of music (clearly different from shuffled music patterns) from the age of 3 months; however, they do not show a related increase in spontaneous movement activity to music until the age of 12 months.
Strengths:
This is a nice paper, well designed, with sophisticated analyses and presenting clear results that make a lot of sense to this reviewer. The additions of EEG recordings in response to music presentations at 3 different infant ages are interesting, and the manipulation of the music stimuli into shuffled, high, and low pitch to capture differences in brain response and spontaneous movements is good. I really enjoyed reading this work and the well-written manuscript.
Weaknesses:
I only have two comments. The first is a change to the title. Maybe the title should refer to the first "postnatal" year, rather than the first year of life. There are controversies about when life really starts; it could be in the womb, so using postnatal to refer to the period after birth resolves that debate.
Thank you very much for your thoughtful suggestion regarding the title. To ensure clarity and to unambiguously indicate that our study focuses on the period after birth, we agree that specifying “first postnatal year” in the title is appropriate. We have revised the title accordingly.
The other comment relates to the 10 Principal Movements (PMs) identified. I was wondering about the rationale for identifying these different PMs and to what extent many PMs entered in the analyses may hinder more general pattern differences. Infants' spontaneous movements are very variable and poorly differentiated in early development. Maybe, instead of starting with 10 distinct PMs, a first analysis could be run using the combined Quantity of Movements (QoM) without PM distinctions to capture an overall motor response to music. Maybe only 2 PMs could be entered in the analysis, for the arms and for the legs, regardless of the patterns generated. Maybe the authors have done such an analysis already, but describing an overall motor response, before going into specific patterns of motor activation, could be useful to describe the level of motor response. Again, infants provide extremely variable patterns of response, and such variability may potentially hinder an overall effect if the QoM were treated as a cumulated measure rather than one with differentiated patterns.
We agree that due to the high variability and limited differentiation of infant motor responses at this age, it is important to consider an overall measure of movement in addition to specific PMs. To address exactly this, we had included an analysis in which we combined all 10 PMs into a single global QoM metric. This ‘All PMs’ measure reflects the overall motor response to the different auditory stimuli. For clarity, this result is presented in Figure 5, where we show the denoised global QoM signal and highlight the observed Condition × Age interaction (which averaged QoM for all PMs and is therefore equivalent to QoM without PM distinction). We now emphasize this analysis more clearly in the Results section (p. 16).
Reviewer #3 (Public review):
Summary:
This study provides a detailed investigation of neural auditory responses and spontaneous movements in infants listening to music. Analyses of EEG data (event-related potentials and steady-state responses) first highlighted that infants at 3, 6, and 12 months of age and adults showed enhanced auditory responses to music than shuffled music. 6-month-olds also exhibited enhanced P1 response to high-pitch vs low-pitch stimuli, but not the other groups. Besides, whole body spontaneous movements of infants were decomposed into 10 principal components. Kinematic analyses revealed that the quantity of movement was higher in response to music than shuffled music only at 12 months of age. Although Granger causality analysis suggested that infants' movement was related to the music intensity changes, particularly in the high-pitch condition, infants did not exhibit phase-locked movement responses to musical events, and the low movement periodicity was not coordinated with music.
Strengths:
This study investigates an important topic on the development of music perception and translation to action and dance. It targets a crucial developmental period that is difficult to explore. It evaluates two modalities by measuring neural auditory responses and kinematics, while cross-modal development is rarely evaluated. Overall, the study fills a clear gap in the literature.
Besides, the study uses state-of-the-art analyses. All steps are clearly detailed. The manuscript is very clear, well-written, and pleasant to read. Figures are well-designed and informative.
Weaknesses:
(1) Differences in neural responses to high-pitch vs low-pitch stimuli between 6-month-olds and other infants are difficult to interpret.
We agree with the reviewer that the differences in neural responses to high-pitch versus low-pitch stimuli between 6-month-olds and other infants are difficult to interpret. We have offered several possible explanations for these findings, including developmental changes in auditory plasticity, social interaction effects, maturation of the auditory system, and arousal or exposure differences. If the reviewer has additional perspectives or alternative explanations, we would be very pleased to incorporate them into the revised manuscript.
(2) Making some links between the neural and movement responses that are described in this manuscript could be expected, given the study goal. Although kinematic analyses suggested that movement responses are not phase-locked to the music stimuli, analyses of Granger causality between motion velocity and neural responses could be relevant.
We appreciate the suggestion that exploring links between neural and movement responses would be valuable, especially given the study's goals. We were initially cautious about interpreting potential Granger-causal relations between neural and motor activity, as temporal scale differences between the two measures can easily bias directionality estimates. Neural responses typically occur on the scale of milliseconds, whereas movement unfolds over seconds. As a result, an apparent directional relation might emerge simply due to these intrinsic timescale differences rather than reflecting genuine causal influence.
Nevertheless, we agree that this relationship warrants further investigation, and we have added the following exploratory analyses to the supplements (p. 9). We examined whether ERP amplitudes were associated with movement measures, using participant-averaged data (not single trials). For neural measures, we extracted mean ERP amplitudes in the post-tone-onset time window encompassing the P1 component, as derived from the cluster-based analyses. For movement measures, we used (1) total movement quantity (mean velocity across the entire trial) and (2) Granger causality F-values reflecting music-to-movement coupling strength. We ran two linear mixed-effects models, with ERP amplitudes as response variables and either QoM or Granger causality F-values as fixed effects; infants were modelled as random intercepts. These analyses included comparisons between music and shuffled music conditions, as well as between high- and low-pitch conditions. Across all age groups, we found no significant associations between ERP amplitudes and movement quantity, whether irrespective of condition (p>.124), when comparing music vs shuffled music (p>.111), or when comparing high vs low pitch (p>.071). Likewise, we found no significant associations between ERP amplitudes and Granger causality F-values, whether irrespective of condition (p>.164), when comparing music vs shuffled music (p>.494), or when comparing high vs low pitch (p>.175). The absence of robust associations suggests that neural sensitivity to musical structure (as indexed by ERPs) and motor responsiveness to music (as indexed by movement quantity or coupling strength) develop somewhat independently during the first year of life.
This dissociation aligns with broader developmental theories proposing that perceptual sensitivity often precedes and enables later motor coordination, rather than developing together.
(3) The study considers groups of infants at different ages, but infants within each group might be at different stages of motor development. Was this assessed behaviorally? Would it be possible to explore or take into account this possible inter-individual variability?
We agree this is important. Infants in each age group were within a quite narrow age range (3 months: M=113.04 days, SD=5.68 days, range=98-120 days; 6 months: M=195.88 days, SD=9.46 days, range=182-211 days; 12-13 months: M=380.44 days, SD=14.93 days, range=361-413 days), as detailed in the sample description on p. 37. Nevertheless, we asked parents to report on infants' major motor milestones, specifically their ability to sit and/or walk. At 6 months, 25% of infants were able to sit (N = 20), and at 12 months, 50% of infants were able to walk (N = 18). Given the relatively small group sizes for these milestones, we are concerned that conducting detailed analyses could yield unstable or misleading results that may not generalize beyond our sample. Therefore, we chose to focus on broader analyses that are more robust given our current dataset. We fully support your suggestion that future studies with larger samples and more comprehensive motor assessments will better clarify these developmental trajectories.
Recommendations for the authors:
Reviewer #1 (Recommendations for the authors):
While the analysis and findings on auditory-evoked spontaneous movement are highly interesting, the results from the neural data raise questions about the genuine role of music in the observed evoked and induced responses.
General comments on the findings related to neural data
(1) The main neural finding is a larger response in the Music condition compared to the Shuffled Music condition. To address their hypothesis, the authors computed the AEP to tones at the beat position and compared responses between the Music and Shuffled Music conditions, aligning the onset to the expected beat position. However, given that inter-onset intervals were permuted in the Shuffled condition, an AEP time-locked to the expected beat position is not meaningful, as no tone is expected at that time. Therefore, it is expected to have a relatively flat AEP in response to the shuffled condition. Furthermore, given the reduced regularity in the Shuffled condition, the observed difference in ASSR at the beat frequency is expected. Similar results could be obtained using an isochronous sequence of pure tones and a shuffled version of the same sequence. Therefore, these two analyses do not strongly support the conclusion of infants' enhanced neural responses to music.
The authors could consider comparing AEPs by aligning onsets in the Shuffled condition to the actual tone positions, potentially focusing only on tones with sufficiently long preceding and following IOIs to avoid confounds from short intervals. The two conditions could then be compared with correction for the number of tones. Potential differences in this case could have suggested an impact beyond the auditory evoked responses.
We agree that ASSR analyses at the beat frequency are not sufficient on their own to demonstrate enhanced neural responses to music. However, we would like to clarify that for the AEP analyses, the EEG data were epoched to all actual tone onsets rather than to expected beat positions; the AEP analyses therefore complement the ASSR analysis. Thus, for the shuffled music condition, the EEG was aligned with the real tone onsets present in that sequence, not with hypothetical beat positions derived from a regular rhythm. This approach ensures that the AEPs reflect neural responses to actual auditory events rather than to predicted or expected events that do not exist in the shuffled stimuli.
We further clarify this in the Results section on p. 9:
“Figure 2 shows the average ERPs to the bassline notes in the auditory stimuli, with EEG data time-locked to actual tone onsets (see Methods for details).”
Finally, following the reviewer’s suggestion, we carried out three control analyses: 1) including only epochs corresponding to bassline tones whose prior inter-onset interval (IOI) exceeded the median IOI duration, 2) including only epochs corresponding to bassline tones whose subsequent IOI exceeded the median IOI duration, and 3) including only epochs corresponding to both melody and bassline tones whose prior and subsequent IOI exceeded the median IOI duration. These analyses yielded event-related potentials in the shuffled music condition that were highly similar to those obtained when all epochs were included (see Figure S1). Therefore, the greater neural response to music compared with shuffled music likely reflects an effect of predictability in the musical condition or, more generally, infants’ disengagement with the shuffled stimuli.
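The epoch selection used in these control analyses can be sketched as follows. This is an illustrative reconstruction, not our analysis code: the function name `select_long_ioi_onsets`, the onset times, and the handling of the first and last tone (excluded whenever the relevant interval is undefined) are assumptions.

```python
from statistics import median


def select_long_ioi_onsets(onsets, use_prior=True, use_next=True):
    """Keep onsets whose neighbouring inter-onset intervals exceed the median IOI."""
    if len(onsets) < 2:
        raise ValueError("need at least two onsets to compute an IOI")
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    threshold = median(iois)
    kept = []
    for i, onset in enumerate(onsets):
        prior = iois[i - 1] if i > 0 else None        # interval before this tone
        nxt = iois[i] if i < len(iois) else None      # interval after this tone
        if use_prior and (prior is None or prior <= threshold):
            continue
        if use_next and (nxt is None or nxt <= threshold):
            continue
        kept.append(onset)
    return kept


# Made-up onset times in milliseconds (IOIs: 200, 700, 100, 800, 800, 100).
onsets_ms = [0, 200, 900, 1000, 1800, 2600, 2700]

long_prior = select_long_ioi_onsets(onsets_ms, use_next=False)   # control 1
long_next = select_long_ioi_onsets(onsets_ms, use_prior=False)   # control 2
long_both = select_long_ioi_onsets(onsets_ms)                    # control 3
```

Only the retained onsets would then be passed on to epoching, so that tones closely preceded or followed by another tone do not contribute to the average.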
It would also be helpful to see whether the authors explored other approaches for evaluating neural responses across conditions, such as brain-stimulus synchronization, coherence measures, or temporal response functions (TRF), and whether these yielded comparable results.
Thank you for this question. We have not explored these approaches, but we agree that alternative methods for evaluating neural responses, such as brain-stimulus synchronization, coherence measures, or temporal response functions (TRF), could offer complementary insights. Given the scope and focus of the present work, and the already extensive set of neural and behavioral measures reported, we chose to prioritize analyses most directly relevant to our initial research questions. Incorporating further methods might risk complicating the narrative and obscuring the key findings. We appreciate the value of these additional methods and consider them promising avenues for future investigations.
(2) Another important finding concerns the difference in AEPs between the High Pitch and Low Pitch conditions in 6-month-old infants, a pattern not observed in the younger (3-month) or older (12-month and adult) groups. The authors interpret this as heightened sensitivity to high-pitch sounds, typical of infant-directed speech. However, the absence of this effect at 12 months raises questions. It would be helpful to consider whether this pattern may be influenced by data quality differences across age groups. Additionally, the authors could discuss this observation in relation to studies showing stronger neural tracking of rhythms in infants, particularly for low-frequency sounds (e.g., Lenc et al., Developmental Science, 2022).
This is an interesting consideration that we investigated further. Regarding data quality differences, we considered different measures and now report these in the methods section (p. 30) and supplements (p. 1).
“We conducted two analyses to compare the EEG data quality across age groups. First, we compared the number of trials that were included in the final analysis per age group. The trial number did not differ significantly across age groups (p > .361). Second, we calculated the SNR by dividing the EEG power at the frequency of interest (i.e., 2.25 Hz, matching the musical beat) by the background noise in surrounding bins (3rd to 5th bin, see ASSR methodology for further details; c.f., Christodoulou et al., 2018; Cirelli et al., 2014). This division yields a signal-to-noise ratio that can be averaged across conditions and compared across age groups to assess variations in signal quality (especially when focusing on the pitch conditions with the same beat frequency). Here, we find that all three age groups show considerable SNR above 1 (3m: M = 2.569, SD = 1.104; 6m: M = 2.743, SD = 1.001; 12m: M = 1.907, SD = 0.749), with no statistically significant differences (three t-tests, FDR-corrected, p > .134). Importantly, our key comparison of High vs. Low Pitch was performed within each age group, thus controlling for any overall differences in signal quality across groups. Together, these two analyses indicate that signal quality was comparable across age groups.”
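As a rough illustration of the SNR measure described above, the sketch below divides the power in a target frequency bin by the mean power of nearby bins. It assumes that the "3rd to 5th bin" refers to bins three to five steps away on both sides of the target; the function name `snr_at_bin` and the toy spectrum are hypothetical.

```python
def snr_at_bin(power, target, near=3, far=5):
    """Power at `target` divided by mean power of bins `near`..`far` steps away."""
    neighbours = []
    for offset in range(near, far + 1):
        for idx in (target - offset, target + offset):
            if 0 <= idx < len(power):
                neighbours.append(power[idx])
    if not neighbours:
        raise ValueError("no neighbouring bins available")
    noise = sum(neighbours) / len(neighbours)
    return power[target] / noise


# Made-up power spectrum: flat background of 1.0 with a peak at bin 10,
# standing in for the beat frequency.
spectrum = [1.0] * 21
spectrum[10] = 2.5

snr = snr_at_bin(spectrum, target=10)
flat = snr_at_bin([1.0] * 21, target=10)
```

An SNR of 1 means the target bin is indistinguishable from its surroundings; values above 1 indicate a response at the frequency of interest.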
Overall, these control analyses seem to support the observed high-pitch sensitivity in the neural response of 6-month-olds specifically, in line with previous research on this age range (Trainor & Zacharias, 1998; Fernald & Kuhl, 1987). Moreover, there may be particular changes towards the end of the first year that mark a widening of infants’ attention towards others (beyond their primary caregivers) and objects in their environment (Cooper et al., 1997; Newman & Hussain, 2006), as well as a decrease in exposure to face-to-face interactions with their primary caregivers (Jayaraman et al., 2015). Taken together, research shows that infants' preference for infant-directed speech decreases significantly between 4.5 and 9 months, coinciding with developmental changes in attentional systems and social interaction patterns. This might explain the absence of high-pitch sensitivity in 12-month-olds. However, further research is needed to determine whether, and in which contexts, high-pitch sensitivity to music changes throughout infancy.
We also edited the discussion in order to compare our results to those of Lenc et al., 2023, p. 23: “It should also be noted that our musical stimuli comprised polyphonic (two-voice) music, carrying sound frequencies falling within the typical range of infant-directed song (~200-400 Hz, Cirelli et al., 2020; Nguyen, Reisner, et al., 2023b; Trainor & Zacharias, 1998). As such, our results might specifically speak for infants’ ability to separate (and prioritize among) simultaneous communicative auditory streams (Marie & Trainor, 2013; Trainor, 2015). Indeed, other studies presenting one-voice pure tone sequences (single isochronous and isotonous tones) with high vs. low pitch - notably at frequencies outside our range (130 vs. 1237 Hz) - have reported stronger neural responses to relatively low frequencies (Lenc et al., 2023). Together, these contrasting observations suggest that pitch prioritization changes not only throughout development but also depends on the polyphonic complexity and spectral characteristics of the perceived stimuli. Further research might investigate this interesting issue further.”
(3) It would also be helpful if the authors provided more detailed information on the stimuli, including both temporal/rhythmic and spectral content, for the original music, high-pitch and low-pitch variations, and shuffled versions.
Absolutely. We agree that this is important to report. We have added a table to the Results (Table 1) and a supplementary table (Table S1) with the M, SD and range of the envelope to further describe the temporal and spectral features of the stimuli.
General comments on the findings related to body kinematics
(4) Quantification of movement based on the PMs did not lead to any differences between the High Pitch and Low Pitch conditions. However, Granger causality showed high prediction strength for the High Pitch condition. In the discussion, the authors proposed that high-pitch music might have led to higher arousal. If this were the case, one might expect to observe increased movement in the High Pitch condition relative to the Low Pitch condition in the PM analyses. I propose that the authors revise the discussion to address the misalignment between different findings.
We thank the reviewer for highlighting this important point and welcome the suggestion to clarify the relationship between movement quantification based on principal movements (PMs) and the Granger causality results. We agree that the apparent discrepancy between these measures merits further clarification. We note that the discrepancy suggests that Granger causality may capture subtler temporal coordination between movements and the music, rather than gross movement magnitude. We have incorporated this reasoning into the revised discussion paragraph (pp. 23-24), which now reads:
“If increased arousal were to result in greater overall movement, we would expect higher movement levels in the high pitch condition; however, this was not observed. QoM analyses based on the PMs did not reveal significant differences between the high pitch and low pitch conditions. This discrepancy may arise because Granger causality captures subtler temporal coordination between movement and music rather than gross movement quantity. Thus, high-pitch music may modulate the timing and coordination of motor responses without necessarily increasing the overall amount of movement. In line with prior work (e.g., Bigand et al., 2024), this interpretation emphasizes that musical coordination often involves changes in coupling strength rather than movement quantity per se.”
(5) The authors report a lack of periodicity and phase-locked movement in infants. Considering the developmental stage, I assume that spontaneous movements to music have emerged over short periods during each exposition period. Probably to further investigate movement periodicity, which has been previously suggested, the authors can first automatically extract periods of periodic movement and further evaluate the tempo/frequency and synchronization with the stimulus during these specific periods.
We thank the reviewer for this thoughtful suggestion. We conducted similar analyses prior to submission, using methods comparable to previous studies (Fujii et al., 2014). These analyses did not yield additional insights beyond those already presented in the manuscript, so we opted not to include them initially. For completeness, we briefly mention these results on p. 19:
“Robustness analyses based on thresholding of variation in the time series to identify movement burst epochs (similar to Fujii et al., 2014) yielded consistent results. No significant movement-to-music synchronization was found across age groups (all ps > .563).”
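One simple way to identify movement burst epochs by thresholding is sketched below. It is an illustration only and not the procedure of Fujii et al. (2014) or our exact implementation: the threshold rule (mean plus k standard deviations), the minimum burst length, and the function name `find_bursts` are assumptions.

```python
from statistics import mean, pstdev


def find_bursts(speed, k=1.0, min_len=2):
    """Return (start, stop) index pairs where speed stays above mean + k * SD."""
    threshold = mean(speed) + k * pstdev(speed)
    bursts, start = [], None
    for i, value in enumerate(speed):
        if value > threshold and start is None:
            start = i                      # burst begins
        elif value <= threshold and start is not None:
            if i - start >= min_len:       # keep only sufficiently long bursts
                bursts.append((start, i))
            start = None
    # Close a burst that is still open at the end of the series.
    if start is not None and len(speed) - start >= min_len:
        bursts.append((start, len(speed)))
    return bursts


# Made-up movement speed series (arbitrary units).
speed = [0, 0, 0, 5, 6, 5, 0, 0, 7, 0, 0, 0, 4, 5, 0, 0]

bursts = find_bursts(speed)
all_bursts = find_bursts(speed, min_len=1)
```

Tempo and synchronization with the stimulus would then be evaluated only within the returned index ranges, rather than over the whole trial.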
It is important to clarify that while movement periodicity in infants listening to music has been previously suggested, the evidence for actual synchronization to musical beats remains limited and has been frequently misinterpreted in the literature. The seminal study by Zentner and Eerola (2010) is often cited as evidence for infant rhythmic entrainment, but their findings actually demonstrated tempo flexibility rather than synchronization, i.e., infants moved faster when the music was faster. Similarly, Fujii et al. (2014) found that while individual infants showed some movement-to-music coordination, this occurred in only 2 out of 11 tested infants (18%), and the authors emphasized that "movement-to-music synchronization is rare in infants and observed at an individual level".
(6) A last general comment is that the authors try to explain the findings of the current study, providing hypotheses, for instance, on the origin of differences in the neural response to high and low pitch only at 6 months. It would be helpful if the authors also consider the misalignment of results with previous findings.
We thank the reviewer for this comment and acknowledge the importance of placing our findings in the context of prior research on infant pitch perception, including some apparent inconsistencies such as those noted for Lenc et al. (2023), which we have addressed in our response to comment 2. We agree that results inevitably vary across studies due to differences in methods, stimuli, and participant samples—all factors that contribute to some variability in developmental trajectories observed in the literature.
Importantly, our observation of a transient difference in neural responses to high versus low pitch emerging at 6 months aligns with existing evidence indicating significant neural reorganization occurring around this age (Carr et al., 2022) and continuing toward 12 months (Kuhl et al., 2014). This may reflect a sensitive developmental window during which infants show heightened sensitivity to prosodic features important for early social and communicative interactions. After this window, attentional and auditory processing priorities shift, which could explain the subsequent decline in pitch sensitivity.
We emphasize that these interpretations are preliminary, and further systematic investigations—preferably longitudinal studies incorporating diverse pitch ranges and multimodal attentional and neural measures—are needed to delineate the developmental course of pitch sensitivity comprehensively.
Reviewer #2 (Recommendations for the authors):
Thank you for the opportunity to read this interesting work.
Thank you for the constructive comments.
Reviewer #3 (Recommendations for the authors):
(1) I would suggest replacing "first year of life" with "first post-natal year".
Thank you for the suggestion. In line with your and Reviewer #2’s comments, we have revised the title to “first postnatal year”.
(2) Precising the music paradigm and the stimuli nature/timing would be useful at the beginning of the Results section.
We agree and have added two tables with further information about the paradigm and stimuli to the beginning of the Results section (p. 8): Table 1 and Table S1, the latter providing additional information on the envelope.
In addition, the stimuli are also shared on a repository: https://doi.org/10.48557/DCSCFO.
(3) Since the infants moved during the experiment, EEG data might show movement artefacts. Was the approach used to correct these artefacts satisfactory, even in 12-month-olds who moved more?
We appreciate the reviewer’s important question regarding artifact correction in infant EEG data, especially given increased movement in older infants. We recognize that movement-related artifacts are an inherent challenge in EEG recordings with infants, and complete elimination of such artifacts is technically difficult (if not impossible). However, several points support the robustness of our ERP findings despite spontaneous movement:
First, we used a two-stage pipeline to maximize artifact removal without bias. In the first stage, Artifact Subspace Reconstruction (ASR) repaired brief, high-variance artifacts by reconstructing contaminated channels from clean data. In the second stage, Independent Component Analysis (ICA, as implemented in ICLabel) decomposed the ASR-cleaned EEG into independent components, allowing us to remove residual non-neural artifacts (e.g., eye movements) based on their spatial and spectral features. Both ASR and ICA operate automatically and agnostically to condition and age group, without subjective decisions, ensuring unbiased cleaning and reliable ERP comparisons.
Second, as noted in the response to R1 Comment (2), we compared the EEG data quality across age groups and conditions. The number of trials included in the final analysis did not differ significantly across age groups (p > .361). We also calculated the SNR by dividing the EEG power at the frequency of interest by the background noise in surrounding bins and found no statistically significant differences across age groups (three t-tests, FDR-corrected, p > .134). Together, these two analyses indicate that signal quality was comparable across age groups.
Third, infant movements during the session were sporadic and, most importantly, not time-locked to tone onsets (see Fig S2). Because artifact rejection (namely, Artifact Subspace Reconstruction and Independent Component Analysis) discarded only those epochs containing large, transient artifacts irrespective of condition, residual movement-related noise would not systematically inflate ERPs.
(4) The timing of the P200 response peak could be specified in adults as for infants.
The timing of the P200 in adults is mentioned on p. 9: “[…] a second positivity peaking at 158 ms post-stimulus (so-called “P200”, here reaching an amplitude of 0.85 µV).” The timing of the infant P2 is specified on pp. 10 and 11: “The P2 ranged between 307 and 325 ms post-stimulus and peaked at 316 ms, reaching an average amplitude of 1.026 µV.”
(5) In infants, the evocation of "peaking at 212ms" is not completely clear: does this timing correspond to the P1 peak at 3 months of age or to the time when the response to music was enhanced compared to shuffled music?
Thank you for highlighting the need for greater clarity regarding the timing of the P1 peak and its relation to the observed enhancement. We have revised the text to explicitly state that 212 ms corresponds to the P1 peak in 3-month-old infants within the window where the response to music was significantly enhanced compared to shuffled music.
p.9: “Importantly, and in line with the adults’ data, all infant groups exhibited enhanced P1 amplitudes in response to music compared to shuffled music. Cluster-based permutation (nPerm=1000) testing revealed that 3-month-old infants’ P1 amplitude was enhanced between 177 and 305 ms post-stimulus (cluster-t=1111.90, p=.002). Within this window, the P1 peaked at 212 ms and reached an amplitude of 1.8 µV.”
(6) It might be useful to put the results of this study into perspective with other studies of infant motor development (e.g., Hinnekens et al, eLife 2023).
Thank you for pointing out this study. We have integrated the Hinnekens et al. (2023) findings into our discussion of infant motor development toward dance-like behaviors. p.22 “Taking a broader perspective on infants’ motor development, our findings align with research on locomotion across the first 14 months of life, which shows that as the number of motor primitives increases, their intrinsic variability decreases (Hinnekens et al., 2023). Viewed together, these patterns point toward a gradual refinement of motor control: the human motor system first develops the capacity to control individual muscles, and gradually to integrate them into motor synergies that support complex, coordinated behaviours, such as locomotion, musical synchronization, and dance.”
(7) Regarding the progressive maturation of the auditory/linguistic pathways during infancy, the authors might also refer to (Dubois et al, Cerebral Cortex 2016).
Thank you for the suggestion. We added the study to the discussion on page 22: “This developmental trajectory aligns with neuroimaging evidence showing that while the ventral linguistic pathway (connecting temporal and frontal regions via the extreme capsule) is well-established at birth, the dorsal pathway—particularly the arcuate fasciculus connecting temporal regions to inferior frontal areas—continues maturing throughout the first postnatal months, with different maturational timelines for dorsal versus ventral connections (Dubois et al., 2016).“