Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Read more about eLife’s peer review process.Editors
- Reviewing EditorMaria ChaitUniversity College London, London, United Kingdom
- Senior EditorBarbara Shinn-CunninghamCarnegie Mellon University, Pittsburgh, United States of America
Reviewer #1 (Public Review):
Summary:
In this work, the authors study whether the human brain uses long-term priors (acquired during our lifetime) regarding the statistics of auditory stimuli to make predictions respecting auditory stimuli. This is an important open question in the field of predictive processing.
To address this question, the authors cleverly profit from the naturally existing differences between two linguistic groups. While speakers of Spanish use phrases in which function words (short words like articles and prepositions) are followed by content words (longer words like nouns, adjectives, and verbs), speakers of Basque use phrases in the opposite order. Because of this, speakers of Spanish usually hear phrases in which short words are followed by longer words, and speakers of Basque experience the opposite. This difference in the order of short and longer words is hypothesized to result in a long-term duration prior that is used to make predictions regarding the likely durations of incoming sounds, even if they are not linguistic in nature.
To test this, the authors used MEG to measure the mismatch responses (MMN) elicited by the omission of short and long tones that were presented in alternation. The authors report an interaction between the language background of the participants (Spanish, Basque) and the type of omission MMN (short, long), which goes in line with their predictions. They supplement these results with a source-level analysis.
Unfortunately, serious concerns regarding the predictions put forward by the authors, and the interaction effect found, make the interpretation of these results difficult.
Strengths:
This work has many strengths. To test the main question, the authors profit from naturally occurring differences in the everyday auditory experiences of two linguistic groups, which allows them to test the effect of putative auditory priors consolidated over years. This is a direct way of testing the effect of long-term priors.
The fact that the priors in question are linguistic, and that the experiment was conducted using non-linguistic stimuli (i.e. simple tones), allows for testing of whether these long-term priors generalize across auditory domains.
The experimental design is elegant and the analysis pipeline is appropriate. This work is very well written. In particular, the introduction and discussion sections are clear and engaging. The literature review is complete.
Weaknesses:
There are two main issues in this work. The first one pertains to the predictions put forward by the researchers, and the second with the interaction effect reported.
- With respect to the predictions, the authors propose that the subjects, depending on their linguistic background and the length of the tone in a trial, can put forward one or two predictions. The first is a short-term prediction based on the statistics of the previous stimuli and identical for both groups (i.e. short tones are expected after long tones and vice versa). The second is a long-term prediction based on their linguistic background. According to the authors, after a short tone, Basque speakers will predict the beginning of a new phrasal chunk, and Spanish speakers will predict it after a long tone.
In this way, when a short tone is omitted, Basque speakers would experience the violation of only one prediction (i.e. the short-term prediction), but Spanish speakers will experience the violation of two predictions (i.e. the short-term and long-term predictions), resulting in a higher amplitude MMN. The opposite would occur when a long tone is omitted. So, to recap, the authors propose that subjects will predict the alternation of tone durations (short-term predictions) and the beginning of new phrasal chunks (long-term predictions).
The problem with this is that subjects are also likely to predict the completion of the current phrasal chunk. In speech, phrases are seldom left incomplete. In Spanish is very unlikely to hear a function-word that is not followed by a content-word (and the opposite happens in Basque). On the contrary, after the completion of a phrasal chunk, a speaker might stop talking and a silence might follow, instead of the beginning of a new phrasal chunk.
Considering that the completion of a phrasal chunk is more likely than the beginning of a new one, the prior endowed to the participants by their linguistic background should make us expect a pattern of results actually opposite to the one reported here.
- The authors report an interaction effect that modulates the amplitude of the omission response, but caveats make the interpretation of this effect somewhat uncertain. The authors report a widespread omission response, which resembles the classical mismatch response (in MEG) with strong activations in sensors over temporal regions. Instead, the interaction found is circumscribed to four sensors that do not overlap with the peaks of activation of the omission response. Furthermore, the boxplot in Figure 2E suggests that part of the interaction effect might be due to the presence of two outliers (if removed, the effect is no longer significant). Overall, it is possible that the reported interaction is driven by a main effect of omission type which the authors report, and find consistently only in the Basque group (showing a higher amplitude omission response for long tones than for short tones).
Because of these points, it is difficult to interpret this interaction as a modulation of the omission response. It should also be noted that in the source analysis, the interaction only showed a trend in the left auditory cortex, but in its current version the manuscript does not report the statistics of such a trend.
Reviewer #2 (Public Review):
Summary:
Morucci et al. tested the influence of linguistic prosody long-term priors in forming predictions about simple acoustic rhythmic tone sequences composed of alternating tone duration, by violating context-dependent short-term priors formed during sequence listening. Spanish and Basque participants were selected due to the different rhythmic prosody of the two languages (functor-initial vs. Functor final, respectively), despite a common cultural background. The authors found that neuromagnetic responses to casual tone omissions reflected the linguistic prosody pattern of the participant's dominant language: in Spanish speakers, omission responses were larger to short tones, whereas in Basque speakers, omission responses were larger to long tones. Source localization of these responses revealed this interaction pattern in the left auditory cortex, which the authors interpret as reflecting a perceptual bias due to acoustic cues (inherent linguistic rhythms, rather than linguistic content). Importantly, this pattern was not found when the rhythmic sequence entailed pitch, rather than duration, cues. To my knowledge, this is the first study providing neural signatures of a known behavioral effect linking ambiguous rhythmic tone sequence perceptual organization to linguistic experience.
The conclusions of the study are well supported by the data, albeit weakly by the source analysis, but I have the impression that the rationale of the study and the analyses performed may be missing an important aspect of rhythmic sequence perception, namely the involvement of entrained oscillatory activity to the perceived rhythm, particularly phase alignment to pattern onsets. This view would not change the impact of the results but add depth to their interpretation.
Strengths:
The choice of participants. The bilingual population of the Basque country is perfect for performing studies that need to control for cultural and socio-economic background while having profound linguistic differences. In this sense, having dominant Basque speakers as a sample equates that in Molnar et al. (2016), and thus overcomes the lack of direct behavioral evidence for a difference in rhythmic grouping across linguistic groups. Molnar et al. (2016)'s evidence on the behavioral effect is compelling, and the evidence on neural signatures provided by the present study aligns with it.
The experimental paradigm. It is a well-designed acoustic sequence, that considers aspects such as gap length insertion, to be able to analyze omission responses free from subsequent stimulus-driven responses, and which includes a control sequence that uses pitch instead of duration as a cue to rhythmic grouping, which provides a stronger case for the differences found between groups to be due to prosodic duration cues.
Data analyses. Sound, state-of-the-art methodology in the event-related field analyses at the sensor level.
Weaknesses:
Despite the evidence provided on neural responses, the main conclusion of the study reflects a known behavioral effect on rhythmic sequence perceptual organization driven by linguistic background (Molnar et al. 2016, particularly). Also, the authors themselves provide a good review of the literature that evidences the influence of long-term priors in neural responses related to predictive activity. Thus, in my opinion, the strength of the statements the authors make on the novelty of the findings may be a bit far-fetched in some instances.
Albeit the paradigm is well designed, I fail to see the grounding of the hypotheses laid by the authors as framed under the predictive coding perspective. The study assumes that responses to an omission at the beginning of a perceptual rhythmic pattern will be stronger than at the end. I feel this is unjustified. If anything, omission responses should be larger when the gap occurs at the end of the pattern, as that would be where stronger expectations are placed: if in my language a short sound occurs after a long one, and I perceptually group tone sequences of alternating tone duration accordingly, when I hear a short sound I will expect a long one following; but after a long one, I don't necessarily need to expect a short one, as something else might occur.
In this regard, it is my opinion that what is reflected in the data may be better accounted for (or at least, additionally) by a different neural response to an omission depending on the phase of an underlying attentional rhythm (in terms of Large and Jones rhythmic attention theory, for instance) and putative underlying entrained oscillatory neural activity (in terms of Lakatos' studies, for instance). Certainly, the fact that the aligned phase may differ depending on linguistic background is very interesting and would reflect the known behavioral effect.
Source localization is performed on sensor-level significant data. The lack of source-level statistics weakens the conclusions that can be extracted. Furthermore, only the source reflecting the interaction pattern is taken into account in detail as supporting their hypotheses, overlooking other sources. Also, the right IFG source activity is not depicted, but looking at whole brain maps seems even stronger than the left. To sum up, source localization data, as informative as it could be, does not strongly support the author's claims in its current state.
Reviewer #3 (Public Review):
Summary:
The paper investigates the effects of long-term linguistic experience on early auditory processing, a subject that has been relatively less studied compared to short-term influences. Using MEG, the study examines brain responses to auditory stimuli in speakers of Spanish and Basque, whose syntactic rules provide different degrees of exposure to durational patterns (long-short vs short-long). The findings suggest that both long-term language experience, as well as short-term transitional probabilities, can shape auditory predictive coding for non-linguistic sound sequences, evidenced by differences in mismatch negativity amplitudes localised to the left auditory cortex.
Strengths:
The study integrates linguistics and auditory neuroscience in an interesting interdisciplinary way that may interest linguists as well as neuroscientists. The fact that long-term language experience affects early auditory predictive coding is important for understanding group and individual differences in domain-general auditory perception. It has importance for neurocognitive models of auditory perception (e.g. inclusion of long-term priors), and will be of interest to researchers in linguistics, auditory neuroscience, and the relationship between language and perception. The inclusion of a control condition based on pitch is also a strength.
Weaknesses:
The main weaknesses are the strength of the effects and generalisability. The sample size is also relatively small by today's standards, with N=20 in each group. Furthermore, the crucial effects are all mostly in the .01>P<.05 range, such as the crucial interaction P=.03. It would be nice to see it replicated in the future, with more participants and other languages. It would also have been nice to see behavioural data that could be correlated with neural data to better understand the real-world consequences of the effect.