Quantification of individual prediction tendency and the multi-speaker paradigm

A) Participants passively listened to sequences of pure tones in different conditions of entropy (ordered vs. random). Four tones of different fundamental frequencies were presented at a fixed stimulation rate of 3 Hz; their transitional probabilities varied according to the respective condition. B) Expected classifier decision values contrasting the brain's prestimulus tendency to predict a forward transition (ordered vs. random). The purple shaded area represents values that were considered to reflect prediction tendency. C) Exemplary excerpt of a tone sequence in the ordered condition. An LDA classifier was trained on forward-transition trials of the ordered condition (75% probability) and tested on all repetition trials to decode sound frequency from brain activity across time. D) Participants either attended to a story in clear speech (0-distractor condition) or to a target speaker with a simultaneously presented distractor (blue; 1-distractor condition). E) The speech envelope was used to estimate neural and ocular speech tracking in the respective conditions with temporal response functions (TRFs). F) The last noun of some sentences was randomly replaced with an improbable candidate to measure the effect of envelope encoding on the processing of semantic violations. Adapted from Schubert et al., 2023.
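The time-resolved decoding scheme in panel C can be sketched as follows: a classifier is trained on forward-transition trials and evaluated on repetition trials separately at every time point. This is a minimal illustration with scikit-learn and synthetic data; the array sizes, the injected signal, and all variable names are assumptions for the sketch, not the study's actual pipeline.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic MEG-like data: trials x channels x time points
n_train, n_test, n_chan, n_times = 200, 80, 30, 50
freq_train = rng.integers(0, 4, n_train)  # 4 tone frequencies
freq_test = rng.integers(0, 4, n_test)

X_train = rng.normal(size=(n_train, n_chan, n_times))
X_test = rng.normal(size=(n_test, n_chan, n_times))
# Inject a weak frequency-specific signal so decoding is possible
X_train += freq_train[:, None, None] * 0.5
X_test += freq_test[:, None, None] * 0.5

# Train on forward-transition trials (ordered condition), test on
# repetition trials, independently at every time point
decision_values = np.empty((n_test, n_times))
for t in range(n_times):
    clf = LinearDiscriminantAnalysis()
    clf.fit(X_train[:, :, t], freq_train)
    # decision value for the tone actually presented on each test trial
    proba = clf.predict_proba(X_test[:, :, t])
    decision_values[:, t] = proba[np.arange(n_test), freq_test]
```

Averaging such decision values over test trials, and contrasting ordered against random sequences in the prestimulus window, yields a curve like the one schematized in panel B.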

Neural speech tracking is related to prediction tendency and word surprisal independent of selective attention

A) In the single-speaker condition, neural tracking of the speech envelope was significant over widespread areas, most pronounced over auditory processing regions. B) The condition effect indicates a decrease in neural speech tracking with increasing noise (1 distractor). C) Stronger prediction tendency was associated with increased neural speech tracking over left frontal areas. D) However, there was no interaction between prediction tendency and conditions of selective attention. E) Neural tracking of semantic violations was increased over left temporal areas. F) There was no interaction between word surprisal and speaker condition, suggesting that surprising words are represented independently of background noise. Statistics were performed using Bayesian regression models. Marked sensors show 'significant' clusters in which at least two neighbouring channels showed a significant result. N = 29.
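The TRF approach underlying these encoding results amounts to a lagged linear regression from the speech envelope onto each neural (or ocular) channel. A minimal ridge-regression sketch is shown below; the lag range, the regularization strength, and the simulated data are assumptions for illustration, not the parameters used in the study.

```python
import numpy as np

def trf_ridge(envelope, response, lags, alpha=1.0):
    """Estimate a temporal response function mapping the speech
    envelope onto one response channel via ridge regression.

    envelope : (n_samples,) stimulus envelope
    response : (n_samples,) neural or ocular channel
    lags     : iterable of integer sample lags (positive = response
               follows stimulus)
    """
    n = len(envelope)
    # Build the lagged design matrix: one column per time lag
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = envelope[: n - lag]
        else:
            X[: n + lag, j] = envelope[-lag:]
    # Closed-form ridge solution: w = (X'X + alpha*I)^-1 X'y
    XtX = X.T @ X + alpha * np.eye(len(lags))
    return np.linalg.solve(XtX, X.T @ response)

# Usage: recover a known kernel from simulated data
rng = np.random.default_rng(1)
env = rng.normal(size=5000)
true_kernel = np.array([0.0, 0.8, 0.3, -0.2])
resp = np.convolve(env, true_kernel, mode="full")[:5000]
resp += rng.normal(scale=0.1, size=5000)
w = trf_ridge(env, resp, lags=range(4), alpha=1.0)
```

The resulting weight vector over lags is the TRF whose temporal profiles are plotted in the ocular-tracking figures.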

Ocular speech tracking is dependent on selective attention

A) Vertical eye movements 'significantly' track attended clear speech, but not in the multi-speaker condition. Temporal profiles of this effect show a downward pattern (negative TRF weights). B) Horizontal eye movements 'significantly' track attended speech in the multi-speaker condition. Temporal profiles of this effect show a left-to-rightward pattern (negative to positive TRF weights). Statistics were performed using Bayesian regression models. A '*' within a posterior distribution indicates a significant difference from zero (i.e. the 94% HDI does not include zero). Shaded areas in TRF weights represent 95% confidence intervals. N = 29.
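The 'significant difference from zero' criterion (94% HDI excluding zero) can be checked numerically from posterior samples. Below is a minimal sketch; the interval-finding routine and the simulated posterior are illustrative assumptions, not the study's model output.

```python
import numpy as np

def hdi(samples, prob=0.94):
    """Narrowest interval containing `prob` of the posterior samples
    (highest density interval for a unimodal posterior)."""
    s = np.sort(samples)
    n_in = int(np.ceil(prob * len(s)))
    # Width of every candidate interval holding n_in sorted samples
    widths = s[n_in - 1:] - s[: len(s) - n_in + 1]
    i = int(np.argmin(widths))
    return s[i], s[i + n_in - 1]

rng = np.random.default_rng(2)
# Posterior clearly above zero -> a 'significant' tracking effect
post = rng.normal(loc=0.5, scale=0.1, size=10_000)
low, high = hdi(post)
excludes_zero = not (low <= 0.0 <= high)
```

In practice such intervals are typically read off the fitted Bayesian regression model's posterior draws rather than computed by hand.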

Ocular speech tracking and selective attention to speech share underlying neural computations

A) Vertical eye movements significantly mediate neural clear-speech tracking throughout the time lags from −0.3 to 0.7 s for principal component 1 (PC1) over right-lateralized auditory regions. This mediation effect propagates to more left-lateralized auditory areas at later time lags for PC2 and PC3. B) Horizontal eye movements similarly contribute to neural tracking of a target speaker in the multi-speaker condition over right-lateralized auditory processing regions for PC1, also with significant anticipatory contributions and a clear peak at ∼0.18 s. PC2 shows a clear left-lateralization, not only over auditory but also over parietal areas, almost entirely throughout the time window of interest, with a clear anticipatory effect starting at −0.3 s. For PC3, a small anticipatory cluster remained at ∼−0.2 s, again over mostly left-lateralized auditory regions. Colour bars represent PCA weights for the group-averaged mediation effect. Shaded areas on time-resolved model weights represent the region of practical equivalence (ROPE) according to Kruschke (2018). Solid lines show 'significant' clusters in which at least two neighbouring time points showed a significant mediation effect. Statistics were performed using Bayesian regression models. N = 29.
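The two decision rules named in this caption, the Kruschke-style ROPE criterion and the requirement of at least two neighbouring significant time points, can be combined as sketched below. The ROPE bounds, number of lags, and simulated posteriors are assumptions for illustration only.

```python
import numpy as np

def rope_decision(samples, rope=(-0.01, 0.01), prob=0.94):
    """Kruschke-style ROPE decision for one time lag's effect:
    'significant' if the HDI falls fully outside the ROPE,
    'null' if fully inside, 'undecided' otherwise."""
    s = np.sort(samples)
    n_in = int(np.ceil(prob * len(s)))
    widths = s[n_in - 1:] - s[: len(s) - n_in + 1]
    i = int(np.argmin(widths))
    lo, hi = s[i], s[i + n_in - 1]
    if hi < rope[0] or lo > rope[1]:
        return "significant"
    if rope[0] <= lo and hi <= rope[1]:
        return "null"
    return "undecided"

rng = np.random.default_rng(3)
# Simulated posteriors at 10 time lags; lags 3-6 carry a real effect
effects = np.zeros(10)
effects[3:7] = 0.3
posts = effects[:, None] + rng.normal(scale=0.05, size=(10, 4000))

flags = [rope_decision(p) == "significant" for p in posts]
# Cluster rule from the caption: keep only time points that are
# significant together with at least one significant neighbour
clustered = [
    flags[i] and ((i > 0 and flags[i - 1]) or (i < 9 and flags[i + 1]))
    for i in range(10)
]
```

Runs surviving this neighbour rule correspond to the solid-line clusters drawn on the time-resolved model weights.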

Ocular, but not neural speech tracking is related to semantic speech comprehension

A) There was no significant relationship between neural speech tracking (the 10% of sensors with the strongest encoding effect) and comprehension; however, a condition effect indicated that comprehension was generally decreased in the multi-speaker condition. B & C) A 'significant' negative relationship between comprehension and both vertical and horizontal ocular speech tracking shows that participants with weaker comprehension engaged more strongly in ocular speech tracking. Statistics were performed using Bayesian regression models. Shaded areas represent 94% HDIs. N = 29.

A schematic illustration of the framework

Anticipatory predictions help to interpret auditory information at different levels of the perceptual hierarchy (purple), with high feature specificity but low temporal precision. Active sensing (green) increases temporal precision already at early stages of the auditory system, facilitating bottom-up processing of selectively attended input (blue). We suggest that auditory inflow, on a basic level, and speech segmentation, on a more complex level, are temporally modulated via active ocular sensing, and that incoming information is interpreted based on probabilistic assumptions.