Splitting speech

Newborn babies can recognise patterns in how voices alternate, as well as detect word-like motifs even when individual syllables are pronounced by various speakers.

Image credit: Ana Fló (CC BY 4.0)

Imagine listening to a language you don't know. When does one word end, and another begin? Human infants face a similar challenge, yet remarkably, they grasp the structure of their mother tongue naturally, without any explicit instruction. By six months, they recognise some common nouns, and by one year, they start saying their first words. This learning begins from birth, with newborns already sensitive to speech patterns.

Previous studies have shown that the likelihood of certain syllables appearing after others allows infants to detect regularity and separate speech into chunks. This is because some syllables are more predictive of what comes next than others. For example, in English, many different syllables can follow ‘the’. However, it is highly likely that ‘brocco’ will be followed by ‘li’. The ability to detect these regularities is known as statistical learning. However, whether this relies on a general mechanism or is restricted to a specific speech component, such as the sequence of syllables, remained unknown.
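The idea of one syllable predicting the next can be made concrete with a small calculation. The sketch below is only an illustration: the toy stream and the function name `transitional_probabilities` are invented for this example and are not the stimuli or analysis used in the study. It counts how often each unit is followed by each other unit and divides by how often the first unit occurs with a successor.

```python
from collections import Counter

def transitional_probabilities(units):
    """Return P(next | current) for every adjacent pair in the stream."""
    # Count each adjacent pair, e.g. ("brocco", "li").
    pair_counts = Counter(zip(units, units[1:]))
    # Count how often each unit appears with something after it.
    first_counts = Counter(units[:-1])
    return {
        (current, nxt): count / first_counts[current]
        for (current, nxt), count in pair_counts.items()
    }

# Hypothetical toy stream, loosely based on the 'the' / 'brocco' / 'li' example.
stream = ["the", "brocco", "li", "the", "dog",
          "the", "brocco", "li", "the", "cat"]

tp = transitional_probabilities(stream)
print(tp[("brocco", "li")])   # 'li' always follows 'brocco' in this stream
print(tp[("the", "brocco")])  # 'the' is followed by several different units
```

In this toy stream, the probability of 'li' after 'brocco' is 1, whereas the probabilities after 'the' are spread over several continuations. A drop in predictability of this kind is the sort of cue that could mark a boundary between chunks.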

To investigate, Fló et al. measured the electrical brain activity of newborns up to 4 days old in response to speech specifically designed to contain certain patterns of syllables or voices. In one experiment, the speech had regular patterns in the syllables, while in a second experiment, the pattern was in the voices, and each voice could utter each syllable. Unlike tracking syllable variation, which can help with learning words, voice changes within a word are unnatural, and predicting them is not relevant to real-life speech processing. Therefore, if statistical learning in speech is shaped to promote language acquisition, learning should be restricted to syllable patterns. If, instead, statistical learning is a general mechanism, newborns should also detect the patterns in voice.

Analysis revealed that newborns were equally capable of discerning regular patterns in syllables despite voice changes, and in voices regardless of which syllable was pronounced. This suggests that statistical learning is a general learning mechanism that can operate across multiple features. Additionally, pseudowords (those which resemble a real word but don’t exist in the language) were presented to the newborns after they had been familiarised with speech containing either similar syllable or voice patterns. The researchers observed a specific neural response to the pseudowords only when they were related to syllable patterns. This neural component suggests that only syllabic structures are considered word candidates and processed by a dedicated neural network from birth.

Taken together, the findings of Fló et al. reveal insights into how humans process speech when experience with language is minimal, suggesting that statistical learning may have a broader role in early language acquisition than previously thought.