Abstract
Delayed onset of canonical babbling and first words is often reported in infants later diagnosed with autism spectrum disorder (ASD). Identifying the neural mechanisms underlying language acquisition in ASD is therefore critical to inform early diagnosis, prognosis, and intervention strategies. In this study, we investigated two speech processing mechanisms previously identified as atypical in children and adults with ASD: the neural ability to track syllables, and statistical learning (SL), the capacity to detect speech regularities beneath surface variability. We recorded 83 longitudinal high-density EEGs from 44 infants (2.5–22.6 months) at high (HL) and low (LL) likelihood for ASD and assessed their verbal outcomes at 20 months. Neural entrainment was measured at syllable and word frequencies during exposure to a multi-speaker stream of concatenated tri-syllabic words, followed by a word recognition test using ERP recording. Our findings revealed reduced tracking abilities at the syllabic level in HL infants, a measure that correlated with verbal outcomes. While HL infants did not exhibit deficits in SL itself, they displayed reduced novelty orientation during the word recognition test, indicated by a reduced late ERP. By contrast, multi-talker variability temporarily disrupted word segmentation around 12 months in LL infants, but not in HL infants, potentially reflecting decreased sensitivity to human voices in the HL group. These results emphasize the importance of longitudinal protocols employing online, implicit measures to track the hierarchical stages of speech processing in both HL and LL infants.
Introduction
Autism Spectrum Disorder (ASD) is a neurodevelopmental condition marked by early and pervasive challenges in social interaction and communication, coupled with repetitive behaviors and restricted interests (1). The prevalence of ASD has risen over the past decades to an estimated 3.2% (2,3). It is frequently associated with language difficulties that greatly vary across individuals and age (4,5). Atypical babbling and reduced word recognition are among the earliest indicators of ASD (6,7). Identifying and addressing these early difficulties is crucial to mitigating their long-term detrimental cascading effects on later verbal and non-verbal abilities (8–10). Achieving this, however, requires a deeper understanding of the neural mechanisms underlying language acquisition in infants who eventually develop ASD compared to typically developing peers.
A key step in language acquisition is the ability to discover the deep linguistic structure that lies beneath a variable surface. In particular, a primary challenge is to perceive the chain of discrete words embedded in a continuous speech flow, a difficult problem for autistic individuals vividly described by Donna Williams: “the way my brain had broken down sentences into words left me with a strange and sometimes unintelligible message” (11). This capacity relies on the correct identification of the successive phonemes in the spoken flow and their attribution to a given word because, unlike written text, where spaces delineate words, spoken language lacks clear perceptual boundaries. Successful word segmentation relies on a complex and hierarchical integration of prosodic, phonetic, lexical, and contextual cues (12). Among these cues, Saffran and colleagues (1996) highlighted the role of statistical regularities in the speech stream, showing that after just two minutes of exposure to a stream of four randomly concatenated tri-syllabic non-words, 8-month-old infants could distinguish these words from syllable combinations spanning word boundaries (13). This ability was attributed to the learning of transition probabilities (TP) between adjacent syllables, that is, the likelihood of syllable B coming after syllable A in the artificial speech, P(B|A). In their experiment, where each word was followed by one of three other words, TPs were 1.0 within a word and 0.33 across word boundaries. This finding brought to the fore the contribution of statistical learning (SL) to speech segmentation, notably in preverbal infants. It was later shown that this capacity is already available in sleeping neonates (14,15). Nowadays, SL is recognized as a general learning mechanism, available at any age (16) and observed in a wide range of perceptual domains (17–19). It operates largely automatically, as it has been observed even in sleeping and comatose states (15,20).
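The transition-probability computation described above can be illustrated with a minimal sketch. The syllable inventory, stream length, and function names below are hypothetical and chosen only for illustration; they are not the stimuli of Saffran et al. or of the present study. The sketch builds a stream of four tri-syllabic non-words concatenated in random order without immediate repetition, then estimates P(B|A) from adjacent-syllable counts.

```python
import random
from collections import Counter

# Hypothetical inventory of four tri-syllabic non-words (illustrative only).
WORDS = [("pa", "bi", "ku"), ("ti", "bu", "do"), ("go", "la", "tu"), ("da", "ro", "pi")]

def build_stream(n_words, seed=0):
    """Concatenate words in random order, never repeating a word back to back."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(word)
        previous = word
    return stream

def transition_probabilities(stream):
    """Estimate P(B|A) as count(A followed by B) / count(A followed by anything)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

stream = build_stream(3000)
tp = transition_probabilities(stream)

# Within-word transition: "pa" is always followed by "bi".
within = tp[("pa", "bi")]
# Across-boundary transitions: "ku" ends a word and can precede three other words.
across = [p for (a, b), p in tp.items() if a == "ku"]
```

In this toy stream, `within` equals 1.0 and each value in `across` lies close to 1/3, matching the contrast between within-word and boundary-spanning transitions that a statistical learner could exploit.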
Importantly, a causal role of SL in language acquisition difficulties has been proposed in several studies of developmental disorders (21–24).
Considering this background, infants who eventually develop ASD may face challenges at two levels of auditory processing. First, they may struggle to follow the rhythm of speech and to robustly encode syllables. Second, SL itself might be impaired. Prospective longitudinal studies of infants at high likelihood for ASD (HL) have become the standard methodology to explore how neural processing takes place before the emergence of a reliable diagnosis (25,26). HL status usually stems from a family history of ASD and/or a condition strongly associated with ASD, such as certain genetic syndromes (27).
Regarding the first level, several EEG studies have reported decreased cortical synchronization to speech syllabic modulations in both autistic children and infants at high likelihood for autism (28–31). Specifically, these studies identified a negative relationship between theta-range (4-7 Hz) power in response to naturalistic speech stimuli and verbal abilities in autistic children. This diminished synchronization may stem from an excitatory/inhibitory neuronal imbalance in auditory cortices (32–34), as well as from anomalies in early auditory perception. For instance, studies have observed jittering or distorted frequency encoding in auditory brainstem responses in ASD, which could disrupt subsequent processing stages, such as a correct identification of the phonemes (35–37).
A growing body of literature also emphasizes atypical SL in individuals with ASD, particularly when using linguistic stimuli (38). After repeated exposures to a syllable stream, 10-year-old autistic children showed a magneto-encephalographic (MEG) neural response that was not modulated by the statistical properties of the syllable sequences, indicating a deficit in SL (39). Exposing autistic children and HL infants to similar stimuli, two independent fMRI studies found that the left temporo-parietal cortex, left amygdala, and basal ganglia were less activated by statistical cues (40,41). However, the difference between groups was mainly due to a lack of activation in autistic children and might be related to low-level sensory difficulties classically described in ASD, amplified by the scanner noise, rather than to a genuine lack of SL capacities, as discussed by the authors themselves. Importantly, most studies exploring either cortical synchronization or SL have focused on autistic children rather than HL infants, i.e., well beyond the age at which those mechanisms crucially serve language acquisition.
Building on the evidence linking SL to early language development and its potential disruptions in ASD, we sought to investigate the preverbal developmental trajectory of this ability in infants at low and high likelihood of ASD. Using high-density EEG, we conducted a prospective longitudinal study to evaluate how infants process an artificial speech stream similar to the one used by Saffran et al. (1996) during their first two years of life. Our study had four primary goals: first, to map the infant developmental trajectory of auditory SL, which remains incompletely understood even in typically developing infants (42); second, to identify the specific levels of word learning at which HL infants may show difficulties; third, to determine whether these difficulties were stable, worsened, or improved over time compared to LL infants; and fourth, to assess whether any of these early indicators could predict later verbal difficulties.
We recorded 83 high-density EEG sessions from 44 infants (19 LL, 25 HL) aged 2.5 to 22.6 months, using an experimental paradigm originally designed for sleeping newborns (15,43,44). During the experiment, infants were exposed to an artificial speech stream consisting of syllables of constant duration that formed tri-syllabic non-words (Figure 1A). Following this learning phase, they listened to isolated triplets corresponding to words from the stream and to part-words, which spanned word boundaries. This design allowed us to obtain several neural measures, which should help identify the specific difficulties faced by children with ASD. First, the regular presentation of syllables at a fixed duration elicits increased power at the syllable frequency (4 Hz), providing a measure of the efficiency of neural synchronization with the speech signal. Second, previous research has shown that if the regular word structure is detected, neural entrainment at the word frequency emerges, resulting in increased power at the word frequency (4 Hz / 3 syllables = 1.33 Hz). This measure reflects the ability to segment the stream into words through statistical computations (15,16,45,46). Finally, comparing ERPs to isolated words and part-words enabled us to assess subsequent word recognition through familiarity/surprise responses (15). For each electrophysiological measure, we assessed its association with participants’ receptive and expressive verbal outcome collected at 18-21 months of age.

A. Experimental procedure and multivariate statistical analyses. The learning part was sandwiched by a silent resting state (RS) and a random stream (RND) with even transition probabilities between syllables. This design accounted for the potential effect of time during the experiment and changes in vigilance state on neural entrainment measures. The learning segment consisted of a long structured (STR) stream where syllables were organized into four three-syllable words presented in random order with no repetition. Following this, six test-blocks were presented, each comprising 8 triplets from the words and part-words conditions with 2-second silences interleaved between items. To sustain learning, 30-second short STR streams were interspersed between test blocks. A 4.5s fade-in/out at the borders of each stream was included to minimize any perceptual anchor effect. The full procedure lasted ∼17 minutes. Arrows’ width schematically represents the transition probability (TP) magnitude. B. Pipeline for longitudinal partial least square correlation (PLS-C) analysis. Details are provided in the Methods section. ERP: Event-Related Potential.
To increase the cognitive demands of our task and elicit potentially larger differences between LL and HL participants, we introduced random speaker changes across syllables, adding acoustic variability that may pose a particular challenge for autistic individuals to filter out (47,48). Despite this variability, LL participants were expected to disregard voice changes, as numerous studies have shown that infants can normalize phonetic representations across voices (49), enabling them to focus on the transition probabilities between syllables and extract the statistical structure of the stream. Notably, SL has been demonstrated in neonates under these conditions of voice variation (44).
However, the relationship between voices and language may evolve with age, particularly because priors about what constitutes a word strengthen with increased speech exposure (42). For instance, native phonotactic rules modulate the computation of transition probabilities in adults (50). It is furthermore highly unnatural for a word to be split across multiple voices. Voice recognition also becomes increasingly important in the infant social environment, and multi-talker stimuli have been shown to impair word recognition in infants (51) and delay speech processing in adults (52), seemingly contradicting the process of normalizing voice variations described earlier. These performances may stem from the attentional cost associated with reorienting after a voice change (52,53) or from more precise contextual memories due to hippocampal maturation (54). The developmental trajectories of LL and HL participants in the task might thus differ depending on acquired knowledge and/or atypical attentional focus on speech features, native language, or voice-specific characteristics (55,56).
To account for the multiple variables of the design, data were analyzed using longitudinal partial least square correlations (PLS-c, Figure 1B). We included participants’ age, group (HL/LL), and verbal outcome to explain the EEG signal (57,58). Our analyses aim not only to provide a detailed characterization of syllable tracking and SL in ASD HL infants compared to typical infants, but also to track the evolution of these fundamental skills over the first two years of life in both groups of infants.
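For readers unfamiliar with the method, the generic form of PLS correlation can be summarized as follows. This is the standard textbook formulation, given here as an orienting sketch only; the longitudinal variant used in this study is specified in the Methods section and may differ in its preprocessing and design coding. The symbols X, Y, R, U, V, and Δ are notation introduced for this summary.

```latex
% Generic PLS-C (not necessarily the exact longitudinal pipeline of this study).
% X: n x p brain matrix (e.g., one value per electrode), columns z-scored.
% Y: n x q design matrix (e.g., group, age terms, verbal outcome), columns z-scored.
\begin{aligned}
R &= Y^{\top} X
  && \text{(cross-covariance between design and brain variables)} \\
R &= U \Delta V^{\top}
  && \text{(singular value decomposition)} \\
L_X &= X V, \qquad L_Y = Y U
  && \text{(brain and design scores)} \\
\text{explained covariance of LC}_{\ell} &= \frac{\delta_{\ell}^{2}}{\sum_{k} \delta_{k}^{2}}
  && \text{(with } \delta_{\ell} \text{ the } \ell\text{-th singular value)}
\end{aligned}
```

In this formulation, the columns of U and V are the design and brain saliences of each latent component (LC). Significance of an LC is typically assessed by permutation, and the stability of each salience by bootstrapping.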
Results
Neural entrainment
Analyses of the entrainment at the syllabic rate (4 Hz) and the word rate (1.3 Hz) followed an identical logic. We first examined whether neural entrainment was present in the different conditions across all participants while tracing the developmental trajectories of these responses across ages. To achieve this, we used PLS-c to compare the target frequency with the average of the two adjacent frequency bins, incorporating age-related terms as outlined in the Methods section. The analysis aimed to confirm neural entrainment at the syllabic rate during both the RND and STR streams and, more importantly, at the word rate during the STR stream. Second, we investigated group differences (LL vs. HL) in entrainment by using a PLS-c with the entrainment at a given frequency as brain data and a behavioral design matrix including the group contrast, the age terms, and the verbal outcome variable. Finally, we assessed the dynamics of SL by tracking changes in neural entrainment throughout the experiment.
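The logic of contrasting a target frequency with its two adjacent frequency bins can be sketched on simulated data. The sketch below is an illustration under stated assumptions, not the analysis pipeline of this study: the sampling rate, epoch length, noise level, and all function names are hypothetical, and the phase-locking value (PLV) is computed in its simplest form, as the length of the mean unit phase vector across epochs. The epoch length is chosen so that both 4 Hz and 4/3 Hz fall exactly on a frequency bin.

```python
import cmath
import math
import random

FS = 50.0        # hypothetical sampling rate in Hz (not from the study)
EPOCH_S = 7.5    # hypothetical epoch length: 10 words of 750 ms each
N_SAMPLES = int(FS * EPOCH_S)

def fourier_coefficient(epoch, k):
    """Complex DFT coefficient of one epoch at bin k, i.e. at k / EPOCH_S Hz."""
    n = len(epoch)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(epoch))

def plv(epochs, k):
    """Phase-locking value: length of the mean unit phase vector across epochs."""
    total = 0j
    for epoch in epochs:
        c = fourier_coefficient(epoch, k)
        total += c / abs(c)
    return abs(total) / len(epochs)

def entrainment_contrast(epochs, freq_hz):
    """PLV at the target bin minus the mean PLV of its two adjacent bins."""
    k = round(freq_hz * EPOCH_S)
    neighbours = 0.5 * (plv(epochs, k - 1) + plv(epochs, k + 1))
    return plv(epochs, k) - neighbours

def simulate_epochs(n_epochs=40, seed=1):
    """Synthetic signal: phase-locked 4 Hz and 4/3 Hz components plus Gaussian noise."""
    rng = random.Random(seed)
    epochs = []
    for _ in range(n_epochs):
        epoch = []
        for i in range(N_SAMPLES):
            t = i / FS
            syllable = math.sin(2 * math.pi * 4.0 * t)
            word = 0.5 * math.sin(2 * math.pi * (4.0 / 3.0) * t)
            epoch.append(syllable + word + rng.gauss(0.0, 2.0))
        epochs.append(epoch)
    return epochs

epochs = simulate_epochs()
syllable_contrast = entrainment_contrast(epochs, 4.0)        # syllable rate
word_contrast = entrainment_contrast(epochs, 4.0 / 3.0)      # word rate
control_contrast = entrainment_contrast(epochs, 3.2)         # no simulated component
```

With these simulated data, the contrast is large at the syllable and word rates and close to zero at the control frequency, where no phase-locked component was injected. Subtracting the neighbouring bins removes the PLV floor that arises from a finite number of epochs, which is why a contrast rather than a raw value is informative.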
Neural entrainment to syllables
The PLS-c analysis testing for syllabic entrainment in the RND and STR streams recovered one latent component (p <.001, r =.75, 93.1% explained covariance, Figure 2A-B). There was a significant contrast effect (entrainment at 4Hz vs. adjacent frequency bins, bootstrap ratio, BSR: 88.1), confirming syllable rate entrainment at the whole sample level. Interestingly, we observed a quadratic age effect (age² term) on the contrast, with a convex trajectory peaking around 12 months (BSR for contrast*age²:-4.0). Separate PLS-C analyses for each stream (i.e., RND and STR) yielded similar results, showing a comparable convex age trajectory in both streams (Supplementary Material S3).

Neural entrainment to syllable rate (4 Hz): Main effect across all participants (Top row); Group differences (bottom row).
A. Design salience (left) and brain salience (right topography) derived from the significant latent component (LC) for neural entrainment to the syllable rate (4 Hz) using the targeted frequency (4 Hz) versus adjacent frequencies as a contrast. Significance (i.e., salience) was established through bootstrapping. Bars represent the mean of 500 random salience samples with replacement bootstrapping, and error bars indicate the 95% confidence interval. Yellow shading highlights variables that significantly contribute to the LC, defined by a Bootstrap Ratio (BSR; mean of bootstrapping divided by standard deviation) > 2.3. The topography of the BSR values shows electrodes significantly contributing to the LC (indicated by black dots, BSR > 2.3). B. Individual raw Phase Locking Values (PLVs) extracted from the salient electrodes identified by the LC (black dots in the topography on 2A) are displayed. A fitted curve is included for visualization purposes only, produced by a mixed-effects model with a 95% confidence interval. This curve is intended solely to aid visualization, as the statistical relationships between EEG and behavioral variables are determined by the PLS-C analysis. C. PLS-C analysis of the differences between HL and LL groups is presented following the same format as in A. D. Individual raw PLVs extracted from the salient electrodes (black dots in subFigure 2C) are shown for visualization purposes only. At every age, a verbal developmental quotient (DQ) of 100 is expected in the general population.
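The bootstrap ratio defined in this caption (bootstrap mean divided by bootstrap standard deviation, with a 2.3 threshold and 500 resamples) can be sketched as follows. This is a simplified illustration: it bootstraps a generic statistic (here the mean of hypothetical per-participant values) rather than re-estimating PLS-C saliences on each resample, and the function name and example data are assumptions.

```python
import random
import statistics

def bootstrap_ratio(values, statistic=statistics.fmean, n_boot=500, seed=0):
    """Mean of the bootstrapped statistic divided by its standard deviation."""
    rng = random.Random(seed)
    n = len(values)
    boots = []
    for _ in range(n_boot):
        # Resample participants with replacement and recompute the statistic.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        boots.append(statistic(resample))
    return statistics.fmean(boots) / statistics.stdev(boots)

# Hypothetical values with a consistent positive effect across participants.
consistent = [0.8 + 0.1 * ((i % 5) - 2) for i in range(20)]
# Hypothetical values with no net effect (symmetric around zero).
null_like = [-1.0, 1.0] * 10

bsr_consistent = bootstrap_ratio(consistent)
bsr_null = bootstrap_ratio(null_like)
```

A value whose bootstrap distribution sits far from zero relative to its spread yields a large ratio (here `bsr_consistent` exceeds 2.3), whereas a value that fluctuates around zero does not (`bsr_null` stays well below the threshold).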
Regarding syllable entrainment differences between groups, we identified one significant latent component (LC) (p<.001; r=.50; 61.5% explained covariance, Figure 2C-D). This LC was characterized by a significant effect of group (BSR: 3.1). LL infants showed overall stronger syllable entrainment compared to HL participants. There was no significant age or age*group effect, indicating that syllable entrainment age-trajectories followed a similar convex shape in both groups. Moreover, the LC comprised a positive effect of verbal outcome (BSR: 10.0), as well as a negative group*verbal outcome interaction effect (BSR:-5.9). This indicates that lower syllable entrainment was associated with poorer verbal performances at 18-21 months of age, predominantly in the HL group.
Neural entrainment to words
The PLS-c on 1.33Hz entrainment vs. the entrainment at adjacent frequency bins during the STR stream resulted in one significant LC (p =.006, r =.33, 30.7% explained covariance). There was a significant contrast effect (BSR: 29.6; Figure 3A-B), confirming that at the whole sample level, infants’ brains phase-locked to the word rate and thus learned the regularities. This analysis also revealed a quadratic effect of age in the opposite direction of that observed for syllable entrainment, which showed a convex pattern (contrast × age² BSR: –4.0; Figure 2A-B). By contrast, the age-related trajectory for word entrainment was concave, reaching a nadir at 12 months (contrast*age² BSR: 3.6). To confirm the specificity of the word effect, we performed a PLS-c analysis at 1.3 Hz during the RND condition, in which no entrainment was expected. This analysis did not reveal any significant LC.

Neural entrainment to word rate (1.3Hz): Main effect across all participants (Top row); Group differences (bottom row).
A. Design salience (left) and brain salience (right) derived (through bootstrapping) from the significant latent component (LC) for neural entrainment to word rate. B. Individual raw PLVs extracted from the salient electrodes given by the LC (black dots on subfigure A) with a fitted curve, for visualization purposes only. C. PLS-c applied on PLV at 1.3Hz with adjacent frequencies subtracted, using group as contrast, and verbal outcome (verbal developmental quotient [DQ] collected at 18-21 months) added as a design variable. D. Raw individual data extracted from the salient electrodes of the LC (black dots on subfigure C) with a linear regression curve fitted for illustration purposes only.
Using group as the contrast variable (LL vs HL), we identified a significant latent component (p=.002; r=.50; 27.7% explained covariance) (Figure 3C-D), but no reliable group effect (BSR<2.3). This indicates globally homogeneous neural tracking of words across infancy in both groups. There was also no effect of verbal outcome, nor a significant verbal outcome*group interaction.
Given previous evidence of impaired SL in ASD, we conducted a supplementary PLS-c analysis restricted to the HL group to confirm that HL participants exhibited SL abilities. Using the contrast (1.3Hz vs. adjacent frequencies) and the same age-related design variables, we confirmed significant word entrainment within the HL group (one significant LC with p=.002, r=.50 and 38.2% explained covariance; contrast BSR=12.9, see supplementary material S4). Moreover, the salient electrodes contributing to the LC showed a spatial distribution closely matching that observed in Figures 3A and 3C, confirming that the core neural mechanisms supporting SL were consistent across groups. However, in the group analysis (Figure 3C-D), the significant age*group interaction, along with two additional effects, mean-age*group (BSR: 14.2), capturing cross-sectional age differences, and delta-age*group (BSR: –13.0), suggests distinct developmental trajectories between the two groups. Word entrainment in LL participants followed a more concave (U-shaped) age-trajectory than in HL participants (Figure 3D). This interpretation was supported by a supplementary model comparison using the Akaike Information Criterion (AIC) to evaluate constant, linear, and quadratic mixed-effects models on raw PLV values extracted from salient electrodes (black dots in Figure 3C) within each group (59). For HL participants, the constant model had the lowest AIC (106.4) with a significant intercept above zero (estimate .27, p=.021). In contrast, for LL participants, the quadratic model was best (AIC=93.4) with significant effects for the intercept (estimate 1.33, p=.025), age (estimate -.24, p=.036) and age² (estimate .01, p=.049). A PLS-c analysis restricted to the HL group (contrast: 1.33 Hz vs. adjacent frequency bins) further confirmed the absence of a U-shaped trajectory, showing no interaction with age² (BSR: <2.3, Supplementary material S4).
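The AIC-based comparison of constant, linear, and quadratic age models can be sketched as follows. This is a deliberately simplified illustration: it uses ordinary least squares on synthetic data rather than the mixed-effects models applied to the longitudinal recordings, and the AIC is computed in its Gaussian least-squares form, n·ln(RSS/n) + 2k, with k counting the regression coefficients plus the residual variance. All names and simulated values are hypothetical and unrelated to the estimates reported above.

```python
import math
import random

def fit_polynomial(x, y, degree):
    """Ordinary least squares fit of y on 1, x, ..., x**degree; returns (coefs, rss)."""
    m = degree + 1
    # Normal equations A c = b.
    A = [[sum(xi ** (r + c) for xi in x) for c in range(m)] for r in range(m)]
    b = [sum((xi ** r) * yi for xi, yi in zip(x, y)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coefs = [0.0] * m
    for r in reversed(range(m)):
        tail = sum(A[r][c] * coefs[c] for c in range(r + 1, m))
        coefs[r] = (b[r] - tail) / A[r][r]
    rss = sum(
        (yi - sum(c * xi ** p for p, c in enumerate(coefs))) ** 2
        for xi, yi in zip(x, y)
    )
    return coefs, rss

def aic(n, rss, n_coefs):
    """Gaussian least-squares AIC; the +1 accounts for the residual variance."""
    return n * math.log(rss / n) + 2 * (n_coefs + 1)

# Synthetic U-shaped trajectory with a minimum at 12 months (illustrative only).
rng = random.Random(0)
ages = [3.0 + 0.5 * i for i in range(40)]
values = [0.02 * (a - 12.0) ** 2 + rng.gauss(0.0, 0.1) for a in ages]

scores = {}
for name, degree in {"constant": 0, "linear": 1, "quadratic": 2}.items():
    _, rss = fit_polynomial(ages, values, degree)
    scores[name] = aic(len(ages), rss, degree + 1)
best = min(scores, key=scores.get)
quadratic_coefs, _ = fit_polynomial(ages, values, 2)
```

On this synthetic U-shaped trajectory, the quadratic model obtains the lowest AIC, which mirrors the reasoning used to characterize the LL trajectory. With repeated measures per infant, a mixed-effects model with a random intercept per participant, as used in the study, is the appropriate choice.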
In summary, SL was reliably present and stable in HL participants, while LL participants showed a transient dip in performance, pointing to divergent developmental trajectories rather than differences in learning capacity.
Time course of the entrainment along the experiment
As a post-hoc analysis, we explored the temporal dynamics of SL across the session using PLV data at each 1.5-second timeframe within a 120-second time window, averaged across the salient electrodes identified in the previous global analysis. Mixed-effects models fitted at each time frame comparing entrainment at 1.33Hz versus the adjacent frequency bins revealed that word-level entrainment emerged approximately 90 seconds after the onset of the structured stream (Supplementary material, S5).
To gain further insight into the learning dynamics over time in the two groups, we conducted a PLS-c analysis comparing the syllabic and word entrainment time courses between LL and HL participants. HL/LL group was used as a contrast, while the brain data matrix (X) used PLV in both time and space dimensions, i.e., PLV in each electrode at each timeframe. For syllable entrainment, the analysis revealed a significant latent component (p <.001; see Supplementary Material S6). The group, age, and interaction BSRs were similar to those observed in the previous PLS-c presented in Figure 2C. The group difference in syllable tracking became significant (>2.3) approximately 90 seconds after the onset of the STR stream, coinciding with the time participants began tracking word boundaries (Supplementary material, S5). This result suggests that the group difference in syllable entrainment shown in Figure 2C-D, as well as its association with verbal outcome, becomes more pronounced from the moment word segmentation comes online. In contrast, a PLS-c analysis of the word-level entrainment time course revealed no significant latent components when comparing the two groups.
ERP analyses
Visual inspection of the grand-average ERP across all recordings showed an early response characterized by a frontal positivity accompanied by a posterior negativity, corresponding to the auditory response and developing over approximately the first second (word duration=750 ms). It was followed by a late component displaying a reversed spatial pattern from 1500 to 3000 ms (see Supplementary Material S7). ERP analyses using either condition or group as contrast variables were conducted in two time windows: an early window sensitive to auditory processing [0-1000ms], and a later window [1500-3000ms] typically associated with higher-order cognitive responses, such as novelty detection and surprise-related activity (60).
Early ERP
We first tested whether a significant difference between word and part-word conditions was observed across all participants in the early [0-1000ms] time window (grand average topographies are displayed in Figure S8). We performed a longitudinal PLS-c contrasting part-word and word conditions. This analysis identified one significant LC (p<.001; R=.54; 63.0% explained covariance, Figure 4). However, this component did not show any significant contribution from the condition contrast or its interaction with age variables (BSR<2.3). Instead, only the age-related variables (BSRs for age:-12.6; delta-age:-29.5; age²: 11.1) significantly contributed to the LC. This indicates age-related decline in the amplitude of both the frontal positivity and the posterior negativity, following a slightly quadratic concave shape (Figure 4B). There was no evidence for any modulatory effect of the part-word/word condition on this developmental pattern.

Early event-related potential (ERP) to part-words compared to words across all participants.
A. Design and brain saliences derived from the significant latent component. Brain topographies of bootstrap ratios (BSR) are displayed at 250ms intervals. Black dots indicate BSR > 2.3. B. Participants’ brain scores for part-word and word conditions, as a function of age. Brain scores are participants’ raw voltage data projected onto electrode saliencies. Brain scores illustrate how individual EEG data fit the saliences derived from the latent component. Linear fitting is used for illustrative purposes only. C. Voltage grand averaged (left) and differential (right) responses to part-word and word conditions at each age bin.
A PLS-c analysis of the differential ERP to part-words vs. words using group as contrast (HL versus LL) with age-related and verbal outcome parameters did not reveal any significant LC. This indicates that the absence of an early condition effect in the full sample (Figure 4) was not due to opposing response patterns between the LL and HL groups.
Late ERP
The PLS-C analysis in the late time window (1500 to 3000 ms), contrasting words vs. part-words, resulted in one significant LC (p<.001, R=.49, 47.4% explained covariance) (Figure 5). The part-word vs. word contrast showed a significant contribution (BSR: 6.9), with a late frontal negativity and posterior positivity more pronounced in the part-word condition than in the word condition (see Figure S8 for grand average topographies). All age variables contributed significantly to the LC (BSRs for age:-10.9; delta-age:-21.6; age²: 8.6), with a similar pattern to that observed in the early time window: a decrease in grand average amplitude with age, following a slightly concave trajectory. Although the contrast × age² interaction reached statistical significance (BSR = –3.2), the corresponding effect (i.e., reduced condition differences at the youngest and oldest age) was modest and not clearly visible in the data (Figure 5B-C). Additional data will be needed to assess the robustness and replicability of this pattern.

Late event-related potential (ERP) to part-words compared to words across all participants.
A. Design and brain saliences derived from the significant latent component. Brain topographies of bootstrap ratios (BSR) are displayed at 250ms intervals. Black dots indicate BSR > 2.3. B. Participants’ brain scores for part-word and word conditions, as a function of age. For details, see Figure 4. C. Voltage averaged (left) and differential (right) responses to part-word and word conditions at each age bin.
A PLS-c analysis of the difference (part-words minus words) using group as contrast revealed one significant LC (p=.002; r=.65; 29.3% explained covariance, Figure 6), with a topography and timing similar to those of the LC identified using words vs. part-words as contrast (Figure S9 shows grand average topographies per group and condition). This LC included a significant group effect, with stronger frontal negativity and posterior positivity in LL participants compared to HL (BSR: 15.7). Additionally, significant negative age (BSR:-5.8), age² (BSR:-3.7), and age²*group interaction (BSR:-12.1) effects were observed. Furthermore, the LC included a positive effect of both verbal outcome (BSR: 16.1) and verbal outcome*group interaction (BSR: 3.0). This indicates that the magnitude of the late response to novelty predicted verbal outcomes, particularly in LL participants.

Late event-related potential (ERP) to novelty in infants at high and low likelihood for autism.
A. Design and brain saliences derived from the significant latent component. Brain topographies of bootstrap ratios (BSR) are displayed at 250ms intervals. Black dots indicate BSR > 2.3. B. Participants’ brain scores for part-word and word conditions, as a function of age. For details, see Figure 4. C. Voltage differential responses to part-word and word conditions at each age bin and within each group. HL: high likelihood for autism; LL: low likelihood for autism.
Given that PLS-c analyses in the late window revealed both condition effects (words vs. part-words) and group differences on overlapping LCs, we asked whether the observed effect was present in both groups or driven primarily by LL participants. To test this, we ran separate supplementary PLS-c analyses within each group using the part-word vs. word contrast (see Supplementary Material S10–S11). Both analyses revealed a significant LC (p <.001), with similar spatial topographies and temporal profiles. Crucially, the condition contrast contributed significantly only in the LL group (BSR = 13.0), but not in the HL group (BSR < 2.3), suggesting that the group difference observed in Figure 6 reflects the absence of a detectable novelty response in HL participants with the current analysis.
Overall, we found no difference between conditions during the early time window, which was aligned with the temporal progression of syllables within the triplet ([0–1000] ms; Figure 4). However, in a later time window ([1500–3000] ms) following triplet onset, the ERP exhibited a differential response to part-words and words (Figure 5). This late effect was positively associated with verbal outcomes at 20 months and, notably, was not detected in the HL group.
Discussion
We exposed 44 infants at low (LL) and high (HL) likelihood for Autism Spectrum Disorder (ASD) to an artificial speech stream composed of syllabic triplets (words) with random speaker changes at every syllable to assess their ability to track both syllables and word-level structure under conditions of acoustic variability. We focused on three electrophysiological measures: 1) neural entrainment to syllables; 2) neural entrainment to words; and 3) ERP responses to isolated words and part-words, analyzed in early [0-1000ms] and late [1500-3000ms] windows. These analyses allowed us to trace the sequential stages of speech processing, from online word segmentation to later word recognition, and to identify the levels at which HL infants may encounter difficulties. While the core mechanism of statistical learning (SL) appeared intact in HL infants, as evidenced by word-level entrainment in both groups, several group differences emerged in the dynamics of syllable tracking and in the late ERP responses. Notably, some of these effects were significantly related to verbal outcomes at 18-21 months.
Syllable neural entrainment
HL infants exhibited significantly weaker syllable-level neural entrainment than their LL peers (Figure 2), despite following a similar convex developmental trajectory across the first two years of life. This persistent lag in rhythmic tracking was not only stable over time but also strongly predictive of verbal outcome at 20 months, highlighting its potential as an early neurophysiological marker of language development. This result refines previous observations regarding reduced theta power in HL infants and autistic children exposed to natural speech (28,29). Unlike these studies, which estimated speech-brain coherence across the theta band, our analysis specifically targeted a particular frequency (4 Hz) and isolated phase-locked activity by subtracting adjacent frequency components. This more selective approach points to a precise impairment in synchronization with syllabic units, rather than a general reduction in low-frequency neural activity, indicating a targeted disruption of syllabic tracking.
Interestingly, the divergence between HL and LL infants emerged specifically during the structured stream and coincided with the time window in which infants began to segment the continuous speech flow into word-like units (Supplementary Material S6). This temporal overlap suggests that successful segmentation may enhance syllable tracking via top-down predictions of the next syllable, improving alignment to syllable onsets in LL infants.
While this top-down interpretation highlights the role of word segmentation in enhancing syllable tracking, it does not exclude additional contributions from lower-level sensory mechanisms. According to the oscillatory framework of speech perception, syllable tracking is thought to be supported by endogenous neural oscillations in the theta band, which align with the natural rhythm of speech (61). Disruptions in this mechanism have been proposed in ASD (28,29). However, recent evidence suggests that this model may not fully apply to early infancy. In typical 3-month-old infants, we observed robust neural entrainment to amplitude-modulated sounds across a wide frequency range (2–45 Hz), with more accurate tracking below 12 Hz — but no selective enhancement in the theta range (Kabdebon et al., submitted), challenging the idea of particular sensitivity to entrainment in this frequency range, at least in infants.
Beyond oscillatory alignment, another possibility is that weaker syllable tracking in HL infants stems from temporally and/or frequency-imprecise responses along the auditory pathway. Impaired temporal or spectral fidelity, particularly at the brainstem level, has been reported in ASD and other language-related developmental disorders (37,62). In this context, decreased cortical power at specific frequencies observed in dyslexic adults (63) and ASD (64) might reflect the long-term consequences of lower-level auditory processing difficulties rather than their root cause. Supporting this view, recent findings show that atypical brainstem responses can be detected even in first-degree relatives of autistic individuals, and that lower brainstem response consistency is associated with poorer pragmatic language skills in these individuals (65). These results highlight the cascading effect of low-level auditory processing anomalies on higher-level verbal and communication abilities. Such a cascade could explain the observed relationship between syllabic entrainment and verbal outcomes at 18–21 months. Future studies should directly assess whether degraded brainstem responses underlie reduced entrainment in HL infants using higher EEG sampling rates.
In sum, HL infants show a specific and developmentally stable deficit in syllable tracking, which may reflect reduced top-down support from word segmentation, impaired oscillatory alignment, and/or imprecise encoding in the brainstem. Crucially, this neural signature was predictive of later verbal outcomes, underscoring its potential relevance as both a mechanistic insight and an early clinical marker. We now turn to word entrainment results, which offer a complementary window into how infants begin to extract and consolidate word units from speech.
Word segmentation: neural entrainment
Neural entrainment at the word rate (1.3 Hz) provides a robust index of speech segmentation: it can only emerge once infants begin grouping syllables into coherent word-like units. Despite the acoustic variability introduced by speaker changes, both HL and LL infants showed significant word-level entrainment after 90 seconds of exposure to the structured stream (Supplementary Material S5), replicating previous findings with single speakers (15,16). Although HL infants exhibited reduced syllable tracking, no overall group difference was observed for word entrainment (Figure 3). However, the low frequency of the word rhythm, coupled with high power in the same range in infant EEG, may reduce the signal-to-noise ratio, limiting the sensitivity of this comparison.
Our longitudinal design, however, revealed a striking divergence in developmental trajectories. LL infants displayed a U-shaped curve in word entrainment, with strong responses at 3 and 21 months and a dip around 12 months (Figure 3D). In contrast, HL infants showed a flatter trajectory, with consistent performance across ages. Given the scarcity of longitudinal studies on SL in infancy (42), these data provide novel insight into the developmental dynamics of word segmentation. Importantly, previous studies have shown that SL is present from birth (14,15,44), and that word learning trajectories in 6-month-olds resemble those observed in adults (16), suggesting that this 12-month dip does not reflect a lack of SL ability, but rather a transient disruption of an otherwise early and robust mechanism.
The dip in LL infants around 12 months coincided with an increase in syllable entrainment (Figure 2B), making it unlikely to result from reduced data quality or measurement artefacts. We propose, as a post-hoc hypothesis, that the random changes in speaker identity between syllables disrupted word segmentation. In typically developing infants, voice processing performance improves toward the end of the first year (51,66), alongside the accumulation of implicit knowledge about how speech is typically structured — notably, that words are usually produced by a single speaker and that they do not overlap prosodic boundaries. These emerging priors may constrain the segmentation process by restricting it to plausible word units, as has been shown for prosodic boundaries (67). In our paradigm, where speaker identity changed randomly between each syllable, this conflict may have temporarily disrupted this process, leading to a reduction in power at the word frequency.
Simultaneously, growing attention to voice-related features may have improved alignment to syllables, consistent with the convex shape of the syllable entrainment trajectory. The 12-month dip in word tracking may also reflect the attentional cost of reorienting to each new voice (52,53), during a period when infants are intensely engaged in learning social cues. This interference appears to be transient: word segmentation recovers at later ages, potentially as infants become more efficient at managing speaker variability and are less disrupted by, or less interested in, voice changes. This trajectory aligns with adult findings showing that word segmentation in continuous speech is context-sensitive, shaped not only by input statistics but also by the relative weight of different priors (12).
HL infants, on the other hand, did not show this transient disruption. In this group, word entrainment remained stable over time, possibly due to reduced sensitivity to social and vocal cues (68). Diminished voice processing in HL infants may have spared segmentation abilities by limiting the interference introduced by speaker variability. Supporting this interpretation, group differences in syllable tracking became significant specifically during the structured stream and aligned with the timing of segmentation onset (Supplementary Materials S5–S6). This suggests that LL and HL infants differ not in their ability to learn statistical regularities per se, but in how they integrate or suppress competing cues — such as speaker changes — during segmentation.
While HL infants may benefit from reduced sensitivity to conflicting social cues — as suggested in our paradigm — this same attenuation could become a disadvantage in more naturalistic settings, where voice identity supports word recognition (e.g., attributing speech to the correct speaker in conversation). Nonetheless, this interpretation remains speculative in the absence of a control condition with a consistent speaker. Future studies manipulating speaker variability will be essential to determine its causal role in shaping the divergent developmental trajectories observed between groups.
Word recognition: ERP
In the test phase, we observed a robust late ERP response to novel pseudo-words compared to familiar triplets, emerging around 750 ms after word offset (Figure 5). This response reveals participants’ ability to recognize the words they have been exposed to during the stream. It belongs to the family of late slow-wave components observed in infants, which have been associated with higher-order novelty detection and orientation mechanisms (60). Crucially, the amplitude of this late response correlated positively with verbal outcomes at 20 months, suggesting that the ability to track and later recognize newly learned words may serve as an early predictor of language development. This association was particularly evident in LL infants.
Despite their transient drop in word tracking at 12 months, LL infants still exhibited a difference between words and part-words at test, indicating that they had successfully stored the transitional probabilities between syllables. In other words, voice-related interference during the learning phase seems not to prevent participants from recognizing a familiar word form in test. SL can thus remain intact even when evidence for online segmentation is weak or absent — a dissociation previously reported in neonates and adults (43).
In contrast, HL infants showed no clear ERP difference between novel and familiar triplets (Figure 6; Supplementary Material S11). Given that both groups showed similar word neural entrainment during learning, this absence is unlikely to reflect a failure in SL itself. Instead, it may point either to a deficit in novelty orientation, a phenomenon previously described in ASD and HL infants (69,70), or to a failure to explicitly recognize the novel part-word. A recent study in adults showed that implicit and explicit traces of SL rely on separate consolidation mechanisms, with implicit learning being more robust (71). Future studies should explore whether this dissociation also exists in early development, and whether it differs between HL and LL infants.
Finally, we found no early ERP distinction between part-words and words in the [0–1000 ms] window following word onset (Figure 4). This contrasts with findings in neonates, who rely on the first syllable to recognize recently learned words (15). This result suggests that by 3 months of age, infants may shift from relying on recognition of initial syllables to encoding entire word forms, a more mature form of lexical storage and retrieval. Alternatively, the topography of early responses might change during development, making these patterns difficult to identify given the relatively small number of participants tested at each age.
Implications for early detection and intervention
Our results highlight several neurophysiological markers that may prove useful for early identification of infants at heightened likelihood for language difficulties or neurodevelopmental disorders. Importantly, these measures rely on passive EEG paradigms, making them accessible, non-invasive, and feasible even in very young or at-risk populations.
Syllable neural entrainment, which was consistently weaker in HL infants and predictive of later verbal outcomes, may serve as an early indicator of atypical speech tracking. This deficit may disrupt the ability to process and integrate phonetic and phonotactic cues critical for internalizing the rules of the native language. Moreover, interventions known to enhance auditory encoding — such as music-based training — have shown benefits for brainstem precision and language outcomes (72,73), and may prove particularly valuable in at-risk populations.
Likewise, the absence of a late ERP orientation response in HL participants may represent an early neural signature of altered attention to novelty that can be used both as a non-invasive biomarker for heightened ASD likelihood and as a potential target for early intervention.
Limitations
Several limitations of this study should be acknowledged. First, given the multidimensional nature of the EEG data and the longitudinal design, we relied on Partial Least Squares Correlation (PLS-c) analyses to identify latent components linking neural responses with experimental and developmental variables. While this multivariate approach was necessary to reduce data dimensionality and handle collinearity, it captures only shared variance across participants. As a result, PLS-c may miss more localized or subtle effects that do not align with the dominant latent structures. Second, although our longitudinal design offers valuable insights into developmental trajectories, the uneven age distribution and limited number of repeated measures per infant constrain the interpretation of individual growth curves. More densely sampled longitudinal data would be needed to precisely model intra-individual changes.
Finally, while we interpret reduced syllable entrainment and diminished novelty responses in HL infants as potential early markers of later verbal outcomes, the predictive validity and ASD-specificity of these neural indices should be confirmed in larger cohorts and across more diverse developmental profiles.
Conclusion
Our study underscores the value of longitudinal neuroimaging—both online and offline—for unpacking the hierarchical processes underlying infant speech processing in both HL and LL populations. Even in typical development, these mechanisms remain poorly characterized. By mapping their developmental trajectories across the first two years of life, we revealed how early speech processing abilities are not static, but dynamically shaped by concurrent maturational changes.
Importantly, our results illustrate how subtle low-level differences—such as reduced syllable tracking or diminished novelty responses—can cascade into broader developmental outcomes. This supports a neuroconstructivist perspective (74), where early neural variability may help explain later divergence in language and communication, both in typical and atypical pathways.
Materials and Methods
Participants
Forty-four infants from the ongoing Geneva autism cohort were included during longitudinal visits, contributing 83 EEG recordings (4,75). The open longitudinal design comprised four visits: (1) at 3 months, (2) between 6 and 9 months, (3) between 12 and 15 months and (4) between 18 and 21 months. Verbal outcome was collected at the last visit (18-21 months). Longitudinal neuroimaging designs reduce within-subject variability by distributing the information across time (76,77). All participants were exposed to the French language in their environment. The standardized developmental assessment Mullen Scales of Early Learning (MSEL) was administered at the last visit to estimate participants’ verbal developmental quotient (DQ), our outcome measure (78). Participants’ developmental ages at the 18-21 months visit were divided by their chronological age and multiplied by 100 to obtain DQs centered on 100. DQs avoid floor effects by not truncating the scores of very low-performing participants (79,80). The current study was approved by the University of Geneva’s Ethics Committee, and parents provided written informed consent. Sample characteristics are provided in Table 1.

Table 1. Sample characteristics. Statistical comparison between LL and HL samples. For categorical variables, the chi-square (χ2) test was applied. For continuous variables, we used two-tailed independent t-tests. P values <.05 are highlighted in bold.
The 19 LL infants were healthy full-terms (>37 weeks of pregnancy) without any reported ASD in their first-degree relatives. Of the 25 HL infants, 18 had an older autistic sibling, which is known to be associated with an 18% prevalence of ASD (81). The 7 other HL infants presented with early parental concerns for ASD. At their 18-21 months visit, their Autism Parent Screen for Infants (APSI) total score was greater than 8 (15.6±6.4, [8-22] range), reflecting a 63% positive predictive value for ASD in HL populations (82). One of the HL participants with early parental concerns had a PACS1 mutation, a condition associated with a 37% ASD prevalence (83). Moreover, 4 HL infants were born preterm (range: [31-36] gestational weeks), a condition associated with a 7% ASD prevalence (84). For these infants, corrected age was computed by subtracting the number of weeks of prematurity from the chronological age (78,85).
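The two age-related computations described in this section (the developmental quotient and the correction for prematurity) can be illustrated with a minimal sketch. The function names, the 40-week full-term reference and the weeks-to-months conversion are assumptions made for illustration; the text does not specify them, nor whether the DQ of preterm infants was based on corrected age.

```python
# Assumptions (not stated in the text): a 40-week full-term reference and
# an average month length of 365.25 / 12 days.
FULL_TERM_WEEKS = 40
WEEKS_PER_MONTH = 365.25 / 12 / 7


def corrected_age_months(chronological_age_months, gestational_weeks):
    """Subtract the weeks of prematurity from the chronological age."""
    weeks_premature = max(0, FULL_TERM_WEEKS - gestational_weeks)
    return chronological_age_months - weeks_premature / WEEKS_PER_MONTH


def developmental_quotient(developmental_age_months, age_months):
    """Developmental age over age, scaled so that 100 means age-appropriate."""
    return 100.0 * developmental_age_months / age_months


# Hypothetical infant: verbal developmental age of 18 months at 20 months.
dq = developmental_quotient(18, 20)
age_corr = corrected_age_months(20, 32)
```

Because the quotient is a ratio rather than a truncated standard score, very low developmental ages still map to distinct DQ values.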
There was no statistically significant difference (chi-square, p>.05) between HL and LL in biological sex, distribution of visits, or age (Table 1). Verbal outcome was missing for 4 participants (3 LL and 1 HL, 4 recordings) because of drop-out before the 18-21 months outcome visit. Those participants were excluded from analyses using verbal outcome as a parameter.
Of the initial sample (103 EEG recordings), 20 acquisitions from 16 participants (5 LL and 11 HL) were not included (19.4% data loss), due either to excessive crying and/or motion artifacts identified by visual inspection of the data (3 LL and 13 HL recordings) or to examiner acquisition errors (2 LL and 2 HL recordings).
Stimuli
Stimuli and procedure were based on Fló et al.’s paradigm (15,44). Twelve syllables were synthesized with MBROLA (86) using French phonemes and phonotactic rules. Each syllable lasted 250 ms, had a flat intonation, and was produced by 3 male and 3 female MBROLA speakers. We built two different stream conditions, a structured stream (STR) and a random control stream (RND), by concatenating the syllables with speakers varying randomly from one syllable to the next. This created a rhythmic syllabic presentation at 4 Hz in both streams. In the RND stream, syllables were presented randomly, maintaining a flat transition probability (TP) of 1/11 between syllables. In STR, syllables were organized into 4 tri-syllabic words presented in random order but without repetition, resulting in TPs equal to 1 within words and a drop in TP to 1/3 at word boundaries. Words were paced at 1.33 Hz (1/(3 × 0.25 s)). We built two RND streams lasting 90 s each, one long STR stream lasting 180 s, and 6 short STR streams lasting 30 s each. Note that the voice dimension was totally orthogonal to the present design, each syllable being randomly produced by one of the six speakers. Thus, 1) the voice was not constant within a word and 2) there was acoustic variability between words (i.e. different voices produced the same syllables). Recognizing the same syllables and words across different occurrences therefore requires a voice normalization process. More details about the stimuli are provided in the Supplementary Material, Stimuli section.
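To make the transition probabilities concrete, the following sketch builds a structured and a random syllable sequence and estimates TPs empirically. It is only an illustration under simplifying assumptions: the word list is the first set given in the Supplementary Material, the function names are hypothetical, only immediate repetitions are excluded (the additional no-alternation rule is ignored), and speakers are not modelled.

```python
import random
from collections import Counter

# First word set from the Supplementary Material; each word is three syllables.
WORDS = [("ta", "do", "vɛ"), ("fi", "za", "pu"),
         ("ge", "kɛ", "so"), ("ʒu", "ʃe", "bi")]
SYLLABLES = [s for w in WORDS for s in w]


def structured_stream(n_words, rng):
    """Concatenate words in random order without immediate repetition."""
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(word)
        previous = word
    return stream


def random_stream(n_syllables, rng):
    """Concatenate syllables in random order without immediate repetition."""
    stream = []
    for _ in range(n_syllables):
        options = [s for s in SYLLABLES if not stream or s != stream[-1]]
        stream.append(rng.choice(options))
    return stream


def transition_probabilities(stream):
    """Estimate P(next syllable | current syllable) from pair counts."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


tp_str = transition_probabilities(structured_stream(6000, random.Random(1)))
tp_rnd = transition_probabilities(random_stream(60000, random.Random(2)))
syllable_rate_hz = 1 / 0.25        # 4 Hz
word_rate_hz = 1 / (3 * 0.25)      # about 1.33 Hz
```

In the structured sequence, within-word pairs such as ("ta", "do") have a TP of 1, while pairs spanning a word boundary converge towards 1/3; in the random sequence, every allowed pair converges towards 1/11.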
To explore infants’ word recognition after familiarization with the STR stream, we constructed isolated triplets corresponding to words and part-words. Words exactly matched the triplets used to build the STR streams (AiBiCi structure, letters being syllables from the ith learned word). Part-words included the last two syllables of a word i, followed by the first syllable of another, randomly chosen word k (BiCiAk structure). In a part-word, the TP between the first two syllables was 1, but these syllables were not in the correct position in the word. Furthermore, the TP between the second and the last syllable was 0.33 instead of 1. While a late ERP difference would signal that infants were sensitive to any of this information, an early difference would signal participants’ recall of the ordinal position of the syllables within the learned words (Fló et al., 2022).
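The BiCiAk construction can be sketched as follows. This is a minimal illustration, not the authors' stimulus script: the helper name is hypothetical, words are represented as syllable tuples taken from the first word set of the Supplementary Material, and the other word k is drawn uniformly at random.

```python
import random

# First word set from the Supplementary Material (A_i, B_i, C_i syllables).
WORDS = [("ta", "do", "vɛ"), ("fi", "za", "pu"),
         ("ge", "kɛ", "so"), ("ʒu", "ʃe", "bi")]


def make_part_words(words, rng):
    """Build one B_i C_i A_k part-word per word i, with k different from i."""
    part_words = []
    for i, (_, b_i, c_i) in enumerate(words):
        k = rng.choice([j for j in range(len(words)) if j != i])
        part_words.append((b_i, c_i, words[k][0]))
    return part_words


part_words = make_part_words(WORDS, random.Random(0))
```

Each part-word keeps a within-word transition (B to C, TP = 1) in first position and ends with a boundary transition (C to A, TP = 1/3), so it was heard during the stream but never as a unit delimited by TP drops.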
Data Collection
We collected high-density electroencephalograms (EEG) with a 128-electrode net (Electrical Geodesics, Inc) referenced to the vertex. The sampling frequency was 250 Hz. Participants were sitting on their parent’s lap, and a silent cartoon was presented to keep them quiet and still. Stimuli were played on a Bose® Companion 2 Series III at a 50cm distance with a standardized sound volume. The cartoon was not time-locked to the auditory stimuli and varied across participants. The same examiner (MG) collected all data. When participants were restless and/or not interested in the cartoon, the examiner engaged them in quiet activities (e.g., staring at bubbles/toys). Some EEGs were recorded while participants were asleep (61.1% at 3 months, 40% at 6-9 months, 20% at 12-15 months and 4% at 18-21 months). There was no significant difference (p=.797) in sleeping status between HL and LL (linear mixed-effect model with repeated measures). Neural entrainment and statistical learning have been shown to be present in sleep in adults and neonates (15,16). The experimental procedure is detailed in Figure 1A.
Data pre-processing
Data were first resampled to 300 Hz to obtain an integer number of samples in each tri-syllabic item, then band-pass filtered (0.2-40 Hz). The APICE preprocessing pipeline (https://github.com/neurokidslab/eeg_preprocessing) was applied using the EEGLAB toolbox 2020.0 in Matlab® R2018b (87,88). In brief, bad segments of data were identified using algorithms detecting low correlation with other channels (usually due to non-functional channels) and outlier values for the signal amplitude or for changes in the signal amplitude (typically due to motion artifacts), with a threshold of 2 (outliers are values more than two interquartile ranges away from the third quartile). A sample was considered to contain motion artifacts if more than 30% of the working electrodes were rejected. An electrode was considered bad if more than 30% of the artifact-free samples were rejected. Short rejected periods (less than 100 ms) were corrected using target PCA. In segments in which less than 30% of the electrodes were marked as bad, the artifacted data were spatially interpolated using spherical spline interpolation. The data were also carefully inspected visually, and remaining bad electrodes and motion artifacts were removed manually (by MG). Independent Component Analysis (ICA) and the iMARA algorithm for component classification (89) were used to remove physiological noise. Finally, bad electrodes were interpolated using spherical splines.
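Two of the numerical choices described above can be checked with a short sketch: why 300 Hz gives an integer number of samples per tri-syllabic item, and how an interquartile-range rule with a threshold of 2 flags outliers. The function names are hypothetical, and the outlier rule is a simplified stand-in for the APICE algorithms, which apply it to several signal features and are not reproduced here.

```python
import numpy as np


def samples_per_item(item_duration_s, sfreq_hz):
    """Number of samples spanned by one item at a given sampling rate."""
    return item_duration_s * sfreq_hz


def upper_outlier_mask(values, n_iqr=2.0):
    """Flag values more than n_iqr interquartile ranges above the third quartile."""
    q1, q3 = np.percentile(values, [25, 75])
    return values > q3 + n_iqr * (q3 - q1)


# A tri-syllabic item lasts 3 x 0.25 s = 0.75 s.
at_250 = samples_per_item(0.75, 250)   # non-integer at the acquisition rate
at_300 = samples_per_item(0.75, 300)   # integer after resampling

# Toy amplitude measure with one extreme value.
amplitudes = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
mask = upper_outlier_mask(amplitudes)
```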
A) Neural entrainment analyses
Given the construction of the streams with regular stimuli, we expected neural entrainment at the stimulus-specific frequencies (syllabic rate for the STR and RND streams and word rate when words were discovered in STR). This specific entrainment should not be observed during rest. We thus measured neural entrainment at 4 Hz (syllabic rate) and 1.3 Hz (word rate) in each period and each recording. We did so by considering non-overlapping epochs of 7.5 seconds respecting the chronological order of presentation (Benjamin, Dehaene-Lambertz, and Fló 2021). Epochs contaminated by artifacts were excluded from the analysis. On average, we kept 86% of data during RS (range of included epochs: [11-24]), 88% during RND ([12-24]), and 91% during STR ([32-48]). The number of included epochs did not differ between LL and HL groups, nor did it vary with age, suggesting an even data quality across groups and age (see Data quality section in Supplementary Material, S1).
Each epoch was average-referenced and normalized by dividing by the standard deviation computed across all electrodes and samples of each epoch. Denoising source separation (DSS) was applied to remove stimulus-unrelated activity using spatial filtering (90). Briefly, a PCA was first applied to the epoched data. Afterward, a DSS filter was applied to the first 30 components, and its first 6 components were kept. Finally, a fast Fourier transform (FFT) was applied to the denoised epochs to estimate their phase-locking value (PLV). PLV estimates how much EEG activity is synchronized in phase with a specific frequency (1):
$$\mathrm{PLV}(f) = \left| \frac{1}{N} \sum_{k=1}^{N} e^{i\varphi(f,k)} \right| \qquad (1)$$
With N being the number of trials from one stream and φ(f, k) the phase at frequency f and trial k. PLV ranges from 0 (desynchronized activity) to 1 (phase-locked activity). PLV was computed in 31 frequency bins (0.933 - 4.933 Hz, with 0.133 Hz increments). In each stream, the PLV of each frequency bin was Z-scored over the 12 adjacent frequency bins (six on each side) (2):
$$Z(f_a) = \frac{\mathrm{PLV}(f_a) - \operatorname{mean}_{x}\,\mathrm{PLV}(f_x)}{\operatorname{std}_{x}\,\mathrm{PLV}(f_x)} \qquad (2)$$
Where f_a is the targeted frequency bin, and PLV(f_x) is the PLV over the 12 adjacent frequency bins. From now on, PLV refers to Z-scored PLVs. Additionally, in each electrode, PLV values from immediately adjacent frequencies (in which no entrainment is expected) were subtracted to get a cleaner signal (e.g., 1.2-1.47 Hz for the 1.3 Hz word rate and 3.87-4.13 Hz for the 4 Hz syllable rate).
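The PLV computation and its Z-scoring over neighbouring frequency bins, as described above, can be sketched on synthetic single-channel data. This is an illustration only: the function names are hypothetical, the DSS step is omitted, the sample standard deviation (ddof=1) is an assumption, and the simulated signal is not infant EEG.

```python
import numpy as np


def plv_spectrum(epochs, sfreq):
    """PLV per frequency bin for one channel; epochs is (n_epochs, n_samples)."""
    spectrum = np.fft.rfft(epochs, axis=1)
    freqs = np.fft.rfftfreq(epochs.shape[1], d=1.0 / sfreq)
    unit_phasors = np.exp(1j * np.angle(spectrum))   # keep phase, drop amplitude
    return freqs, np.abs(unit_phasors.mean(axis=0))


def zscore_against_neighbours(plv, n_side=6):
    """Z-score each bin against its n_side neighbours on each side."""
    z = np.full(plv.shape, np.nan)
    for i in range(n_side, len(plv) - n_side):
        neighbours = np.r_[plv[i - n_side:i], plv[i + 1:i + n_side + 1]]
        z[i] = (plv[i] - neighbours.mean()) / neighbours.std(ddof=1)
    return z


# Synthetic data: 40 epochs of 7.5 s at 300 Hz, with a phase-locked 4 Hz
# component buried in Gaussian noise.
sfreq, duration, n_epochs = 300, 7.5, 40
rng = np.random.default_rng(0)
t = np.arange(int(sfreq * duration)) / sfreq
epochs = np.sin(2 * np.pi * 4.0 * t) + rng.normal(size=(n_epochs, t.size))

freqs, plv = plv_spectrum(epochs, sfreq)
z = zscore_against_neighbours(plv)
syll_bin = int(np.argmin(np.abs(freqs - 4.0)))
resolution_hz = freqs[1] - freqs[0]   # 1 / 7.5 s, about 0.133 Hz
```

With 7.5 s epochs the frequency resolution is 1/7.5 ≈ 0.133 Hz, so both the syllable rate (4 Hz) and the word rate (1.33 Hz) fall exactly on a frequency bin.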
Longitudinal Partial Least Square Correlations
Partial Least Square Correlation (PLS-c) is a multivariate statistical approach that has been successfully implemented for EEG data (57,91,92). PLS-c can also be applied to longitudinal neuroimaging datasets that, like ours, include a variable number of timepoints per participant (58,93). In a fully data-driven approach, a single longitudinal PLS-c can detect age trajectories of electrophysiological measures (here, PLVs), indicate the electrode clusters in which those trajectories take place, and find their association with behavioral parameters (here, HL/LL group and verbal outcome). We applied longitudinal PLS-c based on the pipeline described in Delavari et al. (2021) using the myPLS toolbox (94,95; https://github.com/danizoeller/myPLS). Briefly (Figure 1B): for each participant’s visit i, we built a behavior design matrix (Y) with 9 variables: 1) group (a binary contrast corresponding to LL or HL group), 2-3) two orthogonalized variables to capture the cross-sectional and longitudinal effects of age (mean age = averaged age across all visits of the participant, and delta age = difference between the participant’s age at visit i and the participant’s mean age), 4) another age variable to capture convex/concave trajectories (age squared = delta age * [mean age averaged across all participants]), 5) one outcome variable (verbal outcome), 6-9) four interaction variables (between the group contrast and the other design variables). Design variables were z-scored across all participants. We also built a brain data matrix (X) for each visit i including the PLVs of the 128 electrodes (1.3 Hz in STR when testing entrainment to word rate, and 4 Hz in both STR and RND when exploring entrainment to syllable rate). We then computed the cross-covariance matrix (R) as R = YᵀX. R underwent singular value decomposition (R = USVᵀ) to derive 9 latent components (LCs), each associated with a singular value.
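The core algebra of this step, a cross-covariance matrix followed by a singular value decomposition, can be sketched with NumPy on random placeholder data. This is not the myPLS implementation: the function names are hypothetical, z-scoring the brain matrix is an assumption (the text only states it for the design variables), and the matrices below are noise with the dimensions of the study (83 recordings, 9 design variables, 128 electrodes).

```python
import numpy as np


def zscore_columns(a):
    """Z-score each column across recordings."""
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def pls_correlation(X, Y):
    """Return R = Y^T X and its SVD (behavior saliences U, singular values s,
    brain saliences V), after z-scoring the columns of X and Y."""
    Xz, Yz = zscore_columns(X), zscore_columns(Y)
    R = Yz.T @ Xz
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return R, U, s, Vt.T


rng = np.random.default_rng(0)
Y = rng.normal(size=(83, 9))      # placeholder design matrix
X = rng.normal(size=(83, 128))    # placeholder electrode PLVs

R, U, s, V = pls_correlation(X, Y)
explained = s ** 2 / np.sum(s ** 2)   # share of covariance per latent component
```

Each column of U weights the design variables and the matching column of V weights the electrodes, so one latent component pairs a behavioral profile with a spatial pattern.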
Permutation testing (1000 permutations shuffling behavioral data across participants) estimated whether each LC statistically significantly explained the correlation. Bonferroni correction was applied (9 components tested, alpha = .006). In each significant LC, bootstrapping (500 random samples with replacement) was applied to evaluate the saliency of each behavior and brain (electrodes’ PLV) variable, reflecting the stability of its contribution to the LC. Saliencies are summarized as bootstrap ratios (BSR): the mean of the bootstrap results divided by their standard deviation. We considered BSR > 2.3 as stable (99% confidence interval not crossing zero). To respect the intra-participant longitudinal dependencies, the bootstrap samples were randomly selected across participants and not across EEG recordings. To help readers interpret the complex output of the PLS-c, we created plots of the individual raw EEG data extracted from the salient electrodes as a function of key behavioral variables (e.g., age, age-squared, and verbal outcome). We overlaid a polynomial curve (fitted using a least-squares mixed-effects model) on these raw data plots for visualization purposes only. This curve is strictly intended to assist with interpretation and is not meant as an additional statistical analysis, since the statistical relationship between EEG and behavioral variables was already established by the PLS-c.
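The three quantities mentioned in this paragraph (the Bonferroni-corrected alpha, permutation p-values for the singular values, and the bootstrap ratio) can be sketched as follows. This is a simplified illustration with hypothetical function names: rows are shuffled individually rather than by participant, no alignment of resampled saliences is performed, and the data are synthetic.

```python
import numpy as np


def singular_values(X, Y):
    """Singular values of the cross-covariance matrix Y^T X."""
    return np.linalg.svd(Y.T @ X, compute_uv=False)


def permutation_pvalues(X, Y, n_perm=1000, seed=0):
    """P-value per latent component from shuffling the rows of Y."""
    rng = np.random.default_rng(seed)
    observed = singular_values(X, Y)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        shuffled = Y[rng.permutation(Y.shape[0])]
        exceed += singular_values(X, shuffled) >= observed
    return (exceed + 1) / (n_perm + 1)


def bootstrap_ratio(samples):
    """Mean over bootstrap samples divided by their standard deviation."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(axis=0) / samples.std(axis=0, ddof=1)


alpha = 0.05 / 9   # Bonferroni correction for 9 latent components

# Synthetic example: the first design variable drives all 10 brain variables.
rng = np.random.default_rng(1)
Y = rng.normal(size=(60, 3))
X = np.outer(Y[:, 0], np.ones(10)) + 0.5 * rng.normal(size=(60, 10))
pvals = permutation_pvalues(X, Y, n_perm=200)

# Toy bootstrap saliences for two variables across three resamples.
bsr = bootstrap_ratio([[2.0, 0.1], [4.0, -0.1], [3.0, 0.0]])
```

In the toy bootstrap example, the first variable has a ratio of 3 and would count as stable under the 2.3 criterion, whereas the second has a ratio of 0.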
It is important to note that PLS-c identifies EEG spatial patterns shared across ages and groups, allowing us to measure group and age effects within these shared spatial maps. Consequently, this method may overlook subtle spatial differences that could exist between specific groups and age categories.
B) Analyses of ERPs to test items
The preprocessed data were low-pass filtered (20 Hz) and epoched between [−1.75, 3.25] s from the triplets’ onset. Epochs containing artifacts were excluded. On average, we kept 83% of data for Words (range of included epochs: [23-48]), and 83% for Part-Words ([19-48]). The number of included epochs tended to decrease with age, but did not differ between LL and HL groups, and there was no age*group interaction, suggesting a similar data quality across groups (Supplementary Material, S2). Each participant’s data were average-referenced and normalized by dividing by the standard deviation computed across all electrodes and samples of each epoch. Trials were averaged by condition (Words and Part-Words).
ERP longitudinal Partial Least Square Correlations
We applied longitudinal PLS-C on ERP data to investigate the trajectories of word recognition between LL and HL, using the same method as above with the following differences: brain matrices (X) included voltage measures from each electrode at each sample (3.33 ms time frame) instead of PLVs at each electrode. Consequently, X were time*space matrices (57). Moreover, to summarize the ERPs’ relationship with age and squared age, we computed brain scores. This metric reflects the projection of participants’ raw voltage values onto the electrode saliencies obtained from the PLS. Brain scores thus provide summary values reflecting how well each EEG acquisition fits the brain saliencies obtained for a given LC. Fitting a mixed-effect model on brain scores allowed an illustration of the ERP trajectory with age. The same limitations outlined earlier apply to PLS-C when using ERP data: it identifies spatial patterns shared between groups and age bins, which may lead to missing subtle, group- or age-specific differences.
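The construction of a time*space brain matrix and of brain scores can be sketched as below. This is an illustrative reading of the description, with hypothetical function names, small placeholder dimensions and random data; the exact normalization used by the toolbox is not reproduced.

```python
import numpy as np


def flatten_erps(erps):
    """Reshape (n_recordings, n_times, n_electrodes) into a 2-D time*space matrix."""
    return erps.reshape(erps.shape[0], -1)


def brain_scores(X, V, component=0):
    """Project each recording onto the brain saliences of one latent component."""
    return X @ V[:, component]


rng = np.random.default_rng(0)
erps = rng.normal(size=(12, 50, 8))   # placeholder: 12 recordings, 50 samples, 8 electrodes
Y = rng.normal(size=(12, 3))          # placeholder design matrix

X = flatten_erps(erps)
U, s, Vt = np.linalg.svd(Y.T @ X, full_matrices=False)
scores = brain_scores(X, Vt.T, component=0)   # one summary value per recording
```

The resulting vector contains one value per EEG recording, which can then be related to age in a separate trajectory model for illustration.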
Supplementary Material
1) Stimuli
Using the MBROLA diphone synthesizer, we synthesized 12 syllables (250 ms duration) using French phonemes: 6 vowels with 160 ms duration ([i], [e], [ɛ], [a], [u], [o]) and 12 consonants with 90 ms duration, 6 voiced ([b], [d], [g], [v], [z], [ʒ]) and 6 unvoiced ([p], [t], [k], [f], [s], [ʃ]).
Syllables were [ta], [do], [vɛ], [fi], [za], [pu], [ge], [kɛ], [so], [ʒu], [ʃe], and [bi]. All syllable combinations followed French phonotactic rules. Each syllable was produced by six selected speakers in MBROLA: 3 male adult speakers (fr3 with low pitch, fr1 with middle pitch, fr7 with high pitch) and 3 female adult speakers (fr2 with low pitch, it4 with middle pitch and fr4 with high pitch). Random (RND) and structured (STR) streams were constructed by concatenating the syllables without coarticulation and with a random choice of speaker, the only constraints being that the same speaker was never repeated in a row and that two speakers never alternated more than once (if A and B are two speakers, then ABAB is forbidden). The same rule was applied to syllable identity in the RND stream and to words in the STR stream (i.e. no repetition, nor alternation more than once of a syllable or a word). Three STRs were built, each one using different words, and randomized across participants. The words in stream 1 were [tadovɛ], [fizapu], [gekɛso], and [ʒuʃebi], and part-words (including the 2 last syllables of a word with a random first syllable of another word) were [dovɛfi], [ʃebita], [bitado], and [soʒuʃe]. In the second stream words were [dovɛfi], [zapuge], [kɛsoʒu], and [ʃebita]. Part-words were [vɛfikɛ], [soʒuʃe], [tadovɛ], and [gekɛso]. In the third stream, words were [vɛfikɛ], [soʒuʃe], [pugeza], and [bitado]. Part-words were [tadovɛ], [ʒuʃebi], [kɛsoʒu], and [zapuge].
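The two sequencing constraints (no immediate repetition, no ABAB alternation) can be sketched with a simple sequential sampler. This is a hypothetical illustration rather than the script used to build the stimuli; the function name is an assumption and the speaker labels are the MBROLA voice names listed above.

```python
import random

SPEAKERS = ["fr3", "fr1", "fr7", "fr2", "it4", "fr4"]


def constrained_sequence(items, length, rng):
    """Draw items at random, forbidding AA repetitions and ABAB alternations."""
    seq = []
    for _ in range(length):
        allowed = [
            item for item in items
            # no immediate repetition of the previous item
            if not (seq and item == seq[-1])
            # no ABAB: after A B A, the next item cannot be B
            and not (len(seq) >= 3 and seq[-3] == seq[-1] and item == seq[-2])
        ]
        seq.append(rng.choice(allowed))
    return seq


speaker_sequence = constrained_sequence(SPEAKERS, 2000, random.Random(0))
```

The same function could be applied to syllables (RND stream) or words (STR stream), since the text states that the same rule was used for them. With four or more items, at least two candidates always remain, so the sampler never runs out of options.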
2) Data quality
A) Data quality for neural entrainment measures
We applied a linear mixed-effect model (MyMixedModelsTrajectories toolbox in Matlab R2018: https://github.com/danizoeller/myMixedModelsTrajectories) to each individual’s number of included epochs in RS, RND and STR (total of included epochs, the maximum possible value being 96), with fixed effects for age and group (HL versus LL), and a random slope that varied by participant. The number of included epochs for each participant was used as a proxy for data quality. Three models with random slopes (constant, linear, quadratic) were fitted using the following formula:
(1) Good-epochs ∼ age * group + (1+age|subject)
The following equation was used for the linear model:
(2) $\text{Good-epochs}_{im} = \beta_0 + \beta_1\,\text{group}_i + \beta_2\,\text{age}_{im} + \beta_3\,\text{group}_i \cdot \text{age}_{im} + b_{1m}\,\text{age}_i + b_{0m} + \varepsilon_{im}$
for participant $i$ at timepoint $m$, where $\beta_1$ to $\beta_3$ are the fixed-effect coefficients for group (coded as a dummy variable), age, and the group*age interaction. The $b_{1m}$ term is the random slope varying by participant, $b_{0m}$ is the normally distributed random effect, and $\varepsilon_{im}$ is the normally distributed observation error. The best-fitting model was selected using the Bayesian information criterion (BIC). Here, the constant model was selected (BIC=584), suggesting that age had no effect on data quality. We found no significant group effect (p=.748, beta=85.5). Consequently, we considered that data quality for neural entrainment measures was broadly similar between HL and LL and across ages. This analysis is illustrated in Figure S1.
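To make the model-order selection step concrete, here is a minimal Python sketch of BIC-based selection among constant, linear and quadratic fits. It is a deliberate simplification: it uses ordinary least squares on a single predictor (age) and omits the group term and the random effects of the mixed model described above, so it does not reproduce the toolbox output. The function names and the toy data are hypothetical.

```python
import math


def fit_polynomial(x, y, order):
    """Ordinary least-squares fit of y on 1, x, ..., x**order via normal equations."""
    n_coef = order + 1
    # Augmented matrix [X'X | X'y].
    a = [[sum(xi ** (r + c) for xi in x) for c in range(n_coef)]
         + [sum((xi ** r) * yi for xi, yi in zip(x, y))]
         for r in range(n_coef)]
    # Gaussian elimination with partial pivoting.
    for col in range(n_coef):
        pivot = max(range(col, n_coef), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n_coef):
            factor = a[r][col] / a[col][col]
            for c in range(col, n_coef + 1):
                a[r][c] -= factor * a[col][c]
    # Back-substitution.
    coef = [0.0] * n_coef
    for r in reversed(range(n_coef)):
        known = sum(a[r][c] * coef[c] for c in range(r + 1, n_coef))
        coef[r] = (a[r][n_coef] - known) / a[r][r]
    return coef


def bic(x, y, order):
    """BIC = k*ln(n) - 2*ln(L) for a Gaussian model with ML variance estimate."""
    coef = fit_polynomial(x, y, order)
    n = len(x)
    rss = sum((yi - sum(b * xi ** p for p, b in enumerate(coef))) ** 2
              for xi, yi in zip(x, y))
    sigma2 = rss / n
    log_lik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    k = len(coef) + 1  # polynomial coefficients plus the residual variance
    return k * math.log(n) - 2 * log_lik


def select_order(x, y, max_order=2):
    """Return the polynomial order (0 = constant) with the lowest BIC."""
    return min(range(max_order + 1), key=lambda order: bic(x, y, order))


if __name__ == "__main__":
    # Hypothetical ages (months) and included-epoch counts, not study data.
    ages = [3, 6, 9, 12, 15, 18, 21, 24]
    flat = [80, 82, 80, 82, 82, 80, 82, 80]
    print("selected order:", select_order(ages, flat))
```

A lower BIC indicates a better trade-off between fit and complexity, which is why a constant model can be preferred when adding an age term does not reduce the residual error enough to offset the extra parameter.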

Neural entrainment data quality.
LL mean trajectory is in red and HL in blue, with their respective 95% confidence intervals. Included epochs in absolute number.
B) ERP data quality
The same analysis was run on the ERP epochs (total number of included epochs, the largest possible value being 96: 48 Word items and 48 Part-Word items). The model of order 1 (linear) was selected (BIC=569). We found no significant group effect (p=.656, beta=-1.6), group*age interaction effect (p=.436, beta=-0.4), or age effect (p=.149, beta=-0.6). This analysis is illustrated in Figure S2.

ERP data quality.
LL mean trajectory is in red and HL in blue, with their respective 95% confidence intervals.
3) Supplementary analyses for neural entrainment
As a quality control, we checked for the absence of spurious frequency peaks at 1.3 Hz and 4 Hz during the resting state, in which no entrainment is expected. We ran two PLS-C analyses on the resting-state data, using the same age variables as in the other analyses and the targeted frequency versus adjacent frequencies as contrast. Neither control analysis revealed a significant latent component.
Follow-up analyses restricted to each stream for syllable entrainment: For RND, the LC was significant (p<.001; r=.67; 87.3% explained covariance, Figure S3) with the following BSRs: contrast: 70.3*; mean age: -1.6; contrast*mean age: -0.7; delta age: 3.6*; contrast*delta age: 8.0*; age2: 1.2; contrast*age2: -4.1*. For STR, the LC was significant (p<.001; r=.73; 91.1% explained covariance) with the following BSRs: contrast: 91.4*; mean age: -0.9; contrast*mean age: 0.7; delta age: 4.2*; contrast*delta age: 4.7*; age2: -2.4; contrast*age2: -3.8*.
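For readers unfamiliar with bootstrap ratios (BSRs), the following Python sketch shows the usual way such a ratio is formed in PLS analyses: the salience estimated on the full sample is divided by the standard deviation of the same salience across bootstrap resamples. The 2.3 threshold is the one stated in this supplement; applying it to the absolute value is an assumption inferred from the negative values reported as significant. The function names and numbers are hypothetical, and the resampling and decomposition steps that would produce the saliences are not shown.

```python
import statistics

BSR_THRESHOLD = 2.3  # threshold stated in the supplement


def bootstrap_ratios(original, bootstrap_samples):
    """Divide each original salience by its bootstrap standard error.

    original: list of saliences from the full-sample decomposition.
    bootstrap_samples: list of salience lists, one per bootstrap resample.
    """
    ratios = []
    for j, salience in enumerate(original):
        resampled = [sample[j] for sample in bootstrap_samples]
        standard_error = statistics.stdev(resampled)
        ratios.append(salience / standard_error)
    return ratios


def is_robust(ratio, threshold=BSR_THRESHOLD):
    """Assumption: robustness is judged on the magnitude of the ratio."""
    return abs(ratio) > threshold


if __name__ == "__main__":
    # Hypothetical saliences for two design variables (not study data).
    original = [0.60, 0.05]
    bootstrap = [[0.58, 0.20], [0.62, -0.15], [0.61, 0.10], [0.59, -0.05]]
    for name, ratio in zip(["contrast", "age"], bootstrap_ratios(original, bootstrap)):
        print(name, round(ratio, 2), is_robust(ratio))
```

A salience that is large and stable across resamples yields a high ratio, whereas a salience that changes sign or size from one resample to the next yields a ratio near zero.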

Neural entrainment at syllable rate (4 Hz) in STR and RND separately, PLS-C.
Yellow shading on left panels and black dots on middle panels indicate BSR > 2.3, which is considered significant.

Word entrainment within HL participants.
Supplementary analysis comparing 1.33 Hz entrainment with neighboring frequencies within the HL population using PLS-C. The analysis confirmed the presence of a significant LC (p=.002; r=.50; 38.2% explained covariance) with the following BSRs: contrast: 12.9*; mean age: -3.2*; contrast*mean age: -5.3*; delta age: 5.9*; contrast*delta age: 6.7*; age2: 3.6*; contrast*age2: 1.9.
Learning time course across all participants: Additionally, we analyzed neural entrainment over the course of stimulus presentation to illustrate how it evolved over the experimental session, as done for instance in Smalle et al. (2022) and Flò et al. (2022). We concatenated the epochs in chronological order (180 s of random stream, 360 s of structured stream, and 180 s of random stream again). For each targeted frequency (word at 1.33 Hz and syllable at 4 Hz), we averaged PLVs over the salient electrodes obtained from the PLS-C (black dots in Figure 3C for 1.3 Hz entrainment and in Figure 2C for 4 Hz). For both frequencies, we fitted a mixed-effect model with the Matlab fitlme function (PLV∼1+(1|participant)) in 120-second sliding windows with a 1.5-second increment, testing for differences from zero (Figure S5).
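As an illustration of the windowing described above, the Python sketch below lays out 120-second windows with a 1.5-second increment over the 720-second concatenated session and computes a phase-locking value using its standard definition (the modulus of the mean unit phasor). The stream durations, window length and increment are taken from the text; the function names, the choice to keep only windows that fit entirely within the session, and the example phases are assumptions, and the authors' Matlab implementation may differ.

```python
import cmath
import math

# Durations (seconds) of the concatenated streams, as given in the text.
STREAMS = [("RND", 180.0), ("STR", 360.0), ("RND", 180.0)]


def window_starts(total_s, window_s=120.0, step_s=1.5):
    """Start times of all windows that fit entirely within the session."""
    n_windows = int((total_s - window_s) / step_s) + 1
    return [i * step_s for i in range(n_windows)]


def stream_at(time_s):
    """Label of the stream being played at a given time."""
    end = 0.0
    for label, duration in STREAMS:
        end += duration
        if time_s < end:
            return label
    return STREAMS[-1][0]


def plv(phases):
    """Phase-locking value: modulus of the mean unit phasor (0 = none, 1 = perfect)."""
    return abs(sum(cmath.exp(1j * phase) for phase in phases) / len(phases))


if __name__ == "__main__":
    total = sum(duration for _, duration in STREAMS)
    starts = window_starts(total)
    print(len(starts), "windows; last one starts at", starts[-1], "s")
    # Hypothetical phases (radians) at the target frequency across epochs.
    print("PLV:", plv([0.1, 0.2, 0.15, 0.05]))
```

Under these assumptions the session yields 401 overlapping windows, the last of which starts at 600 s, so windows that straddle a stream transition mix data from both streams.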

Neural entrainment time course over the experimental session considering all participants.
The plain squares under the plots correspond to the time samples with PLVs significantly greater than 0 (p<.05).
Group differences in the time course of neural entrainment: We ran two PLS-C analyses along the time dimension. Brain matrices included PLVs at each time frame (120 s time windows with a 1.5 s increment), averaged across the salient electrodes obtained from the spatial PLS-C (black dots in Figure 3C for word entrainment and in Figure 2C for syllable entrainment). Design matrices were the same as in the previous analyses (including group LL>HL as contrast, and age-related variables and verbal outcome as factors).
The PLS-C for 4 Hz syllable entrainment resulted in one significant LC (p=.001; r=.66; 32.0% explained covariance) with the following BSRs: contrast (LL > HL): 7.2*; mean age: -12.3*; contrast*mean age: -1.4; delta age: 5.9*; contrast*delta age: -2.7*; age2: -1.8; contrast*age2: 2.2; verbal outcome: 20.5*; contrast*verbal outcome: -12.8* (Figure S6). The PLS-C for 1.3 Hz word entrainment yielded no significant LC (most significant LC: p=.103), providing no evidence of a difference between HL and LL participants.

Group differences in the time course of syllable neural entrainment (4Hz).
Gray shading on right panel indicates BSR < 2.3; BSR >2.3 is considered significant. Vertical dashed lines indicate the transitions between random and structured streams.
4) Supplementary analyses for ERP
First, ERP topographies were inspected across ages in the whole sample to rule out qualitative maturational changes that would have prevented the inclusion of all recordings in the same longitudinal PLS-C model. Figure S7 shows the ERP topographies averaged across all participants at each visit for each condition (Part-Word and Word), as well as the difference between conditions (Part-Word minus Word). No salient qualitative maturational change (e.g., polarity inversion) emerged from visual inspection, supporting the use of a single multivariate linear model (PLS-C) that includes all visits.

ERP topographies across age bins and participants.
Figures S8 and S9 further divide ERP topographies by group, showing the response in both LL and HL at each age bin and for each condition.

ERP topographies in each group at 3mo and 6-9mo.

ERP topographies in each group at 12-15mo and 18-21mo
Figures S10 and S11 display the PLS-C conducted within LL and HL groups separately, for the late time-window [1500-3000ms].

PLS-C conducted within LL for late response to novelty.

PLS-C conducted within HL for late response to novelty.
Acknowledgements
The authors would like to thank all the families who participated in the study, as well as the many collaborators who contributed to data collection over the years, especially Lylia Ben Hadid, Nada Kojovic, Kenza Latrèche, Sara Maglio, Irène Pittet, and Stefania Solazzo. We also would like to thank Farnaz Delavari for the invaluable pieces of advice provided for the statistical analyses.
Additional information
Data availability
The corresponding author will make available the anonymised data used to support the conclusion of this work on reasonable request.
Contribution
A.F., G.D.L., M.G. and M.S. conceptualized the experimental protocol and analytic strategy.
M.G. acquired and preprocessed the data. M.G. analyzed the data under the supervision of A.F. M.G. wrote the manuscript with the input of all authors. All authors read and approved the final manuscript.
Funding
This research was supported by the Swiss National Foundation Synapsy (Grant No. 51NF40–185897), the Swiss National Foundation for Scientific Research (Grant Nos. #191227 to M.G. and #163859, #190084, #202235, #212653 to M.S.), the Fondation Privée des Hôpitaux Universitaires de Genève (https://www.fondationhug.org) and by the Fondation Pôle Autisme (https://www.pole-autisme.ch). The funders were not involved in this study and had no role other than to provide financial support.
Funding
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (SNF) (51NF40-185897)
Marie Schaer
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (SNF) (#191227)
Michel Godel
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (SNF) (#163859)
Michel Godel
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (SNF) (#190084)
Marie Schaer
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (SNF) (#202235)
Marie Schaer
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (SNF) (#212653)
Marie Schaer
Fondation privée des Hôpitaux universitaires de Genève (HUG)
Marie Schaer
Fondation Pôle Autisme (FPA)
Marie Schaer
References
- 1. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders. American Psychiatric Association. https://psychiatryonline.org/doi/book/10.1176/appi.books.9780890425596
- 2. Prevalence and Early Identification of Autism Spectrum Disorder Among Children Aged 4 and 8 Years – Autism and Developmental Disabilities Monitoring Network, 16 Sites, United States, 2022. MMWR Surveill Summ 74:1–22
- 3. Etiology of Autism Spectrum Disorders and Autistic Traits Over Time. JAMA Psychiatry 77:936
- 4. Early trajectories and moderators of autistic language profiles: A longitudinal study in preschoolers. Autism 13623613241253015
- 5. Language in autism: domains, profiles and co-occurring conditions. J Neural Transm 130:433–57
- 6. Early Language Profiles in Infants at High-Risk for Autism Spectrum Disorders. J Autism Dev Disord 44:154–67
- 7. Language Differences at 12 Months in Infants Who Develop Autism Spectrum Disorder. J Autism Dev Disord 46:899–909
- 8. Naturalistic Developmental Behavioral Interventions: Empirically Validated Treatments for Autism Spectrum Disorder. J Autism Dev Disord 45:2411–28
- 9. Childhood language skills as predictors of social, adaptive and behavior outcomes of adolescents with autism spectrum disorder. Research in Autism Spectrum Disorders 103
- 10. Effect of Preemptive Intervention on Developmental Outcomes Among Infants Showing Early Signs of Autism: A Randomized Clinical Trial of Outcomes to Diagnosis. JAMA Pediatr 175:e213298
- 11. Nobody nowhere: the remarkable autobiography of an autistic girl. London: Kingsley :192
- 12. Integration of Multiple Speech Segmentation Cues: A Hierarchical Framework. Journal of Experimental Psychology: General 134:477–500
- 13. Statistical Learning by 8-Month-Old Infants. Science :274
- 14. Newborns are sensitive to multiple cues for word segmentation in continuous speech. Dev Sci 22:e12802
- 15. Sleeping neonates track transitional probabilities in speech but only retain the first syllable of words. Sci Rep 12:4391
- 16. Preverbal Infants Discover Statistical Word Patterns at Similar Rates as Adults: Evidence From Neural Entrainment. Psychol Sci 31:1161–73
- 17. Visual statistical learning in the newborn infant. Cognition 121:127–32
- 18. Statistical learning of new visual feature combinations by infants. Proc Natl Acad Sci USA 99:15822–6
- 19. Modality-Constrained Statistical Learning of Tactile, Visual, and Auditory Sequences. Journal of Experimental Psychology: Learning, Memory, and Cognition 31:24–39
- 20. The role of conscious attention in auditory statistical learning: Evidence from patients with impaired consciousness. iScience 28:111591
- 21. Statistical Learning in Children With Specific Language Impairment. J Speech Lang Hear Res 52:321–35
- 22. Statistical Learning in Specific Language Impairment: A Meta-Analysis. J Speech Lang Hear Res 60:3474–86
- 23. Statistical learning as a window into developmental disabilities. J Neurodevelop Disord 10:35
- 24. Impaired Statistical Learning in Developmental Dyslexia. J Speech Lang Hear Res 58:934–45
- 25. Prospective Longitudinal Studies of Infant Siblings of Children With Autism: Lessons Learned and Future Directions. J Am Acad Child Adolesc Psychiatry 55:179–87
- 26. Use of Longitudinal EEG Measures in Estimating Language Development in Infants With and Without Familial Risk for Autism Spectrum Disorder. Neurobiology of Language 1:33–53
- 27. Beyond Baby Siblings – Expanding the Definition of “High-Risk Infants” in Autism Research. Curr Psychiatry Rep 23:34
- 28. Atypical coordination of cortical oscillations in response to speech in autism. Front Hum Neurosci. http://www.frontiersin.org/Human_Neuroscience/10.3389/fnhum.2015.00171/abstract
- 29. Neural Tracking in Infancy Predicts Language Development in Children With and Without Family History of Autism. Neurobiology of Language 3:495–514
- 30. Speech Reception in Young Children with Autism Is Selectively Indexed by a Neural Oscillation Coupling Anomaly. J Neurosci 43:6779–95
- 31. Early-and Late-Stage Auditory Processing of Speech Versus Non-Speech Sounds in Children With Autism Spectrum Disorder: An ERP and Oscillatory Activity Study. Developmental Psychobiology. https://onlinelibrary.wiley.com/doi/10.1002/dev.22552
- 32. Measurement of excitation-inhibition ratio in autism spectrum disorder using critical brain dynamics. Sci Rep 10:9195
- 33. Shank3 deletion in PV neurons is associated with abnormal behaviors and neuronal functions that are rescued by increasing GABAergic signaling. Mol Autism 14:28
- 34. Excitatory/inhibitory imbalance in autism: the role of glutamate and GABA gene-sets in symptoms and cortical brain structure. Transl Psychiatry 13:18
- 35. Effects of Background Noise on Cortical Encoding of Speech in Autism Spectrum Disorders. J Autism Dev Disord 39:1185–96
- 36. On the Relationship between Speech-and Nonspeech-Evoked Auditory Brainstem Responses. Audiol Neurotol 11:233–41
- 37. Phonetic discrimination mediates the relationship between auditory brainstem response stability and syntactic performance. Brain and Language 208
- 38. Dissociation Between Linguistic and Nonlinguistic Statistical Learning in Children with Autism. J Autism Dev Disord 54:1912–27
- 39. Predictive processing during a naturalistic statistical learning task in ASD. eNeuro
- 40. No Neural Evidence of Statistical Learning During Exposure to Artificial Languages in Children with Autism Spectrum Disorders. Biological Psychiatry 68:345–51
- 41. Lack of Neural Evidence for Implicit Language Learning in 9-Month-Old Infants at High Risk for Autism. Developmental Science. https://onlinelibrary.wiley.com/doi/10.1111/desc.13078
- 42. Changes in statistical learning across development. Nat Rev Psychol 2:205–19
- 43. Tracking transitional probabilities and segmenting auditory sequences are dissociable processes in adults and neonates. Developmental Science 26. https://onlinelibrary.wiley.com/doi/10.1111/desc.13300
- 44. Statistical learning beyond words in human neonates. eLife 13:RP101802
- 45. Investigating the neural correlates of continuous speech computation with frequency-tagged neuroelectric responses. NeuroImage 44:509–19
- 46. Electrophysiological evidence of statistical learning of long-distance dependencies in 8-month-old preterm and full-term infants. Brain and Language 148:25–36
- 47. The Weak Coherence Account: Detail-focused Cognitive Style in Autism Spectrum Disorders. J Autism Dev Disord 36:5–25
- 48. High internal noise and poor external noise filtering characterize perception in autism spectrum disorder. Sci Rep 7:17584
- 49. Orthogonal neural codes for speech in the infant brain. Proc Natl Acad Sci USA 118:e2020410118
- 50. Transitional probabilities and positional frequency phonotactics in a hierarchical model of speech segmentation. Mem Cogn 39:1085–93
- 51. The role of talker-specific information in word segmentation by infants. Journal of Experimental Psychology: Human Perception and Performance 26:1570–82
- 52. Distinct mechanisms for talker adaptation operate in parallel on different timescales. Psychon Bull Rev 29:627–34
- 53. Why are listeners hindered by talker variability? Psychon Bull Rev 31:104–21
- 54. Fuzzy-trace theory and memory development. Developmental Review 24:396–439
- 55. Underconnectivity between voice-selective cortex and reward circuitry in children with autism. Proc Natl Acad Sci USA 110:12060–5
- 56. Failure to attune to language predicts autism in high risk infants. Brain and Language 194:109–20
- 57. Partial Least Squares (PLS) methods for neuroimaging: A tutorial and review. NeuroImage 56:455–75
- 58. Dysmaturation Observed as Altered Hippocampal Functional Connectivity at Rest Is Associated With the Emergence of Positive Psychotic Symptoms in Patients With 22q11 Deletion Syndrome. Biological Psychiatry 90:58–68
- 59. Model selection in linear mixed effect models. Journal of Multivariate Analysis 109:109–29
- 60. Electrophysiological Methods in Studying Infant Cognitive Development. In: Cohen Kadosh K
- 61. Cortical oscillations and speech processing: emerging computational principles and operations. Nat Neurosci 15:511–7
- 62. Brainstem transcription of speech is disrupted in children with autism spectrum disorders. Developmental Science 12:557–67
- 63. Altered Low-Gamma Sampling in Auditory Cortex Accounts for the Three Main Facets of Dyslexia. Neuron 72:1080–90
- 64. Reduced auditory steady state responses in autism spectrum disorder. Molecular Autism 11:56
- 65. Neural Processing of Speech Sounds in ASD and First-Degree Relatives. J Autism Dev Disord 53:3257–71
- 66. Unveiling the development of human voice perception: Neurobiological mechanisms and pathophysiology. Current Research in Neurobiology 6
- 67. Prosody guides the rapid mapping of auditory word forms onto visual objects in 6-mo-old infants. Proceedings of the National Academy of Sciences 108:6038–43
- 68. A Prospective Study of Autistic-Like Traits in Unaffected Siblings of Probands With Autism Spectrum Disorder. JAMA Psychiatry 70:42
- 69. Memory in autism spectrum disorder: A meta-analysis of experimental studies. Psychological Bulletin 146:377–410
- 70. Attention to novelty versus repetition: Contrasting habituation profiles in Autism and Williams syndrome. Developmental Cognitive Neuroscience 29:54–60
- 71. What sticks after statistical learning: The persistence of implicit versus explicit memory traces. Cognition 236
- 72. Musical training heightens auditory brainstem function during sensitive periods in development. Front Psychol. http://journal.frontiersin.org/article/10.3389/fpsyg.2013.00622/abstract
- 73. Specialization among the specialized: Auditory brainstem function is tuned in to timbre. Cortex 48:360–2
- 74. Development itself is the key to understanding developmental disorders. Trends in Cognitive Sciences 2:389–98
- 75. Early Adaptive Functioning Trajectories in Preschoolers With Autism Spectrum Disorders. Journal of Pediatric Psychology 43:800–13
- 76. Within-subject reproducibility varies in multi-modal, longitudinal brain networks. Sci Rep 13:6699
- 77. Within-subject template estimation for unbiased longitudinal image analysis. NeuroImage 61:1402–18
- 78. Mullen Scales of Early Learning. Circle Pines, MN: American Guidance Service
- 79. Autism From 2 to 9 Years of Age. Arch Gen Psychiatry 63:694
- 80. Early brain enlargement and elevated extra-axial fluid in infants who develop autism spectrum disorder. Brain 136:2825–35
- 81. Recurrence risk for autism spectrum disorders: a Baby Siblings Research Consortium study. Pediatrics 128:e488–495
- 82. The Autism Parent Screen for Infants: Predicting risk of autism spectrum disorder based on parent-reported behavior observed at 6–24 months of age. Autism 22:322–34
- 83. PACS1-Neurodevelopmental disorder: clinical features and trial readiness. Orphanet J Rare Dis 16:386
- 84. Prevalence of Autism Spectrum Disorder in Preterm Infants: A Meta-analysis. Pediatrics 142:e20180134
- 85. BSID-II: Bayley Scales of infant development (Bsid-II). San Antonio, TX: Psychological Corporation
- 86. The MBROLA project: towards a set of high quality speech synthesizers free of use for non commercial purposes. In: Proceeding of Fourth International Conference on Spoken Language Processing ICSLP’96. IEEE. http://ieeexplore.ieee.org/document/607874/
- 87. Automated Pipeline for Infants Continuous EEG (APICE): A flexible pipeline for developmental cognitive studies. Developmental Cognitive Neuroscience 54
- 88. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods 134:9–21
- 89. Automatic classification of ICA components from infant EEG using MARA. Developmental Cognitive Neuroscience 52
- 90. Denoising based on spatial filtering. Journal of Neuroscience Methods 171:331–9
- 91. Partial least squares analysis of neuroimaging data: applications and advances. NeuroImage 23:S250–63
- 92. Spatiotemporal analysis of experimental differences in event-related potential data with partial least squares. Psychophysiology 38:517–30
- 93. Amygdala subdivisions exhibit aberrant whole-brain functional connectivity in relation to stress intolerance and psychotic symptoms in 22q11.2DS. Transl Psychiatry 13:145
- 94. Somatosensory-Motor Dysconnectivity Spans Multiple Transdiagnostic Dimensions of Psychopathology. Biological Psychiatry 86:779–91
- 95. Large-Scale Brain Network Dynamics Provide a Measure of Psychosis and Anxiety in 22q11.2 Deletion Syndrome. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 4:881–92
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.109901. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2025, Godel et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.