Enhanced neural speech tracking through noise indicates stochastic resonance in humans

Björn Herrmann (corresponding author)
  1. Rotman Research Institute, Baycrest Academy for Research and Education, Canada
  2. Department of Psychology, University of Toronto, Canada
8 figures, 1 table and 1 additional file

Figures

Figure 1 with 2 supplements
Results for Experiment 1 (N=22).

(A) Accuracy of story comprehension (left) and gist ratings (right). (B) Electroencephalography (EEG) prediction accuracy. (C) Temporal response functions (TRFs). (D) P1-N1 and P2-N1 amplitude difference for different speech-clarity conditions. Topographical distributions reflect the average across all speech-clarity conditions. The black asterisk close to the x-axis indicates a significant difference from a paired t-test relative to the clear condition (pFDR < 0.05; false discovery rate [FDR]-thresholded). The absence of an asterisk indicates that there was no significant difference. Error bars reflect the standard error of the mean.
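The caption's two statistical conventions, FDR thresholding of the paired comparisons against the clear condition and error bars as the standard error of the mean, can be illustrated with a minimal sketch. The caption does not say which FDR procedure was used; the Benjamini-Hochberg step-up procedure is assumed here, and the function names `sem` and `fdr_bh` are hypothetical.

```python
import math

def sem(values):
    """Standard error of the mean: sample SD (n - 1 denominator) / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(var / n)

def fdr_bh(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (an assumption; the caption
    only says 'FDR-thresholded'). Returns one boolean per input p-value,
    True where the comparison survives the threshold at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Every p-value ranked at or below k_max is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            significant[idx] = True
    return significant
```

In this sketch, each entry of `p_values` would be the p-value of one paired t-test (one speech-clarity condition versus clear speech), and a `True` entry corresponds to an asterisk in the figure.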

Figure 1—figure supplement 1
P1-N1 amplitude from temporal response function (TRF) analyses using the amplitude envelope of speech.

An asterisk close to the x-axis indicates a significant difference from a paired t-test relative to the clear condition (pFDR < 0.05; false discovery rate [FDR]-thresholded). The absence of an asterisk indicates that there was no significant difference. Error bars reflect the standard error of the mean. For additional details, see the respective figure captions in the main article.

Figure 1—figure supplement 2
P1-N1 amplitude from cross-correlation analyses.

An asterisk close to the x-axis indicates a significant difference from a paired t-test relative to the clear condition (pFDR < 0.05; false discovery rate [FDR]-thresholded). The absence of an asterisk indicates that there was no significant difference. Error bars reflect the standard error of the mean. For additional details, see the respective figure captions in the main article.
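As a rough illustration of what a cross-correlation analysis between a stimulus representation and the EEG involves, the sketch below computes a Pearson correlation at each non-negative lag. The lag range, the normalization, and the function name `cross_correlation` are assumptions for illustration, not details taken from the article.

```python
import math

def cross_correlation(stimulus, eeg, max_lag):
    """Pearson correlation between stimulus[t] and eeg[t + lag]
    for lag = 0 .. max_lag (in samples). Returns a list indexed by lag."""
    out = []
    for lag in range(max_lag + 1):
        # Align the two series so that the EEG trails the stimulus by `lag`.
        x = stimulus[: len(stimulus) - lag]
        y = eeg[lag:]
        n = min(len(x), len(y))
        x, y = x[:n], y[:n]
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        out.append(sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0)
    return out
```

The resulting lag-by-correlation function plays a role comparable to a TRF: P1 and N1 would be read off as positive and negative deflections at their respective lags.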

Figure 2 with 1 supplement
Results for Experiment 2 (N=22).

(A) Hit rate (left) and response times (right) for the visual 1-back task. (B) Electroencephalography (EEG) prediction accuracy. (C) Temporal response functions (TRFs). (D) P1-N1 and P2-N1 amplitude difference for different speech-clarity conditions. Topographical distributions reflect the average across all speech-clarity conditions. The black asterisk close to the x-axis indicates a significant difference from a paired t-test relative to the clear condition (pFDR < 0.05; false discovery rate [FDR]-thresholded). The absence of an asterisk indicates that there was no significant difference. Error bars reflect the standard error of the mean.

Figure 2—figure supplement 1
Relationship between visual performance and the noise-related enhancement of the P1-N1 amplitude in Experiment 2.

(A) P1-N1 amplitude for speech in babble (averaged across all signal-to-noise ratios [SNRs] above 15 dB, for which speech was highly intelligible) and clear speech (paired t-test for statistical comparison; *p < 0.05). (B) Same as in Panel A, but restricted to individuals performing below 0.9 in the visual task (to control for potential influences of high performers, who could have attended to the speech). (C) Correlation between visual-task performance and the difference in the P1-N1 amplitude between speech in babble and clear speech. The relationship was not significant. n.s. – not significant.

Depiction of stimulus samples.

(A) Time courses for clear speech and speech to which background babble or speech-matched noise was added at 20 dB signal-to-noise ratio (SNR; all sound mixtures were normalized to the same root-mean-square amplitude). The first 6 s of a story are shown. (B) Spectrograms of the samples in Panel A. (C) Power spectra for clear speech, babble, and speech-matched noise. In C, only background babble/noise is displayed, without added speech.
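The stimulus construction described in Panel A (noise added at a given SNR, then the mixture normalized to a common root-mean-square amplitude) can be sketched as follows. This is a minimal illustration on plain lists of samples; the function names and the target RMS value are hypothetical, and the article's actual processing code may differ.

```python
import math

def rms(x):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def noise_gain_for_snr(speech, noise, snr_db):
    """Gain g such that 20 * log10(rms(speech) / rms(g * noise)) == snr_db."""
    return rms(speech) / (rms(noise) * 10 ** (snr_db / 20))

def mix_at_snr(speech, noise, snr_db, target_rms):
    """Add noise to speech at snr_db, then rescale the whole mixture so
    that its RMS equals target_rms (hypothetical common level)."""
    gain = noise_gain_for_snr(speech, noise, snr_db)
    mixture = [s + gain * n for s, n in zip(speech, noise)]
    scale = target_rms / rms(mixture)
    return [scale * m for m in mixture]
```

If speech and noise are uncorrelated, adding noise at 20 dB SNR raises the mixture power by about 1%, so rescaling the mixture to the clear-speech RMS lowers the speech component by roughly 0.04 dB.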

Results for Experiment 3 (N=23).

(A) Accuracy of story comprehension (left) and gist ratings (right). Higher versus lower intensity refers to the two sound-level normalization types, one resulting in a slightly lower intensity of the speech signal in the sound mixture than the other. (B) Electroencephalography (EEG) prediction accuracy. (C) Temporal response functions (TRFs). (D) P1-N1 (left) and P2-N1 (right) amplitude difference for clear speech and different speech-masking and sound normalization conditions. In panels A, B, and D, a colored asterisk close to the x-axis indicates a significant difference from a paired t-test relative to the clear condition (pFDR < 0.05; false discovery rate [FDR]-thresholded). The specific color of the asterisk – blue versus red – indicates the normalization type (higher vs. lower speech level, respectively). The absence of an asterisk indicates that there was no significant difference relative to clear speech. Error bars reflect the standard error of the mean.

Spectra for clear speech and different background noises.

Results for Experiment 4 (N=20).

(A) Accuracy of story comprehension (left) and gist ratings (right). (B) Electroencephalography (EEG) prediction accuracy. (C) Temporal response functions (TRFs). (D) P1-N1 and P2-N1 amplitude difference for clear speech and different speech-masking conditions. Topographical distributions reflect the average across all conditions. In panels A, B, and D, the black asterisk close to the x-axis indicates a significant difference from a paired t-test relative to the clear condition (pFDR < 0.05; false discovery rate [FDR]-thresholded). The absence of an asterisk indicates that there was no significant difference relative to clear speech. Error bars reflect the standard error of the mean.

Results for Experiment 5 (N=22).

(A) Accuracy of story comprehension (left) and gist ratings (right). (B) Electroencephalography (EEG) prediction accuracy. (C) Temporal response functions (TRFs). (D) P1-N1 and P2-N1 amplitude difference for clear speech and different speech-masking and sound-delivery conditions. In panels A, B, and D, a colored asterisk close to the x-axis indicates a significant difference from a paired t-test relative to the clear condition (pFDR < 0.05; false discovery rate [FDR]-thresholded). The specific color of the asterisk – blue versus red – indicates the sound-delivery type. The absence of an asterisk indicates that there was no significant difference relative to clear speech. Error bars reflect the standard error of the mean.

Author response image 1
P1-minus-N1 amplitude for Experiments 1 and 2, using amplitudes centered on individually estimated peak latencies.

The asterisk indicates a significant difference from the clear speech condition (FDR-thresholded).
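One way to read "amplitudes centered on individually estimated peak latencies" is sketched below: for each participant's TRF, locate the P1 maximum and the N1 minimum within a search window, average the amplitude in a short window around each peak, and take the difference. The search windows, the averaging half-width, and the function names are all hypothetical values chosen for illustration; the caption does not give them.

```python
def peak_centered_amplitude(times, trf, search, polarity, half_width):
    """Mean TRF amplitude in a window centered on the peak found in `search`.

    times      : sample times (e.g. in ms), same length as trf
    search     : (start, end) window in which to look for the peak
    polarity   : +1 to find a maximum (P1), -1 to find a minimum (N1)
    half_width : half-width of the averaging window around the peak
    """
    idx = [i for i, t in enumerate(times) if search[0] <= t <= search[1]]
    pick = max if polarity > 0 else min
    peak_i = pick(idx, key=lambda i: trf[i])
    peak_t = times[peak_i]
    win = [trf[i] for i, t in enumerate(times) if abs(t - peak_t) <= half_width]
    return peak_t, sum(win) / len(win)

def p1_minus_n1(times, trf, p1_search=(30, 90), n1_search=(80, 180), half_width=10):
    """P1-minus-N1 amplitude; the default windows (ms) are assumptions."""
    _, p1 = peak_centered_amplitude(times, trf, p1_search, +1, half_width)
    _, n1 = peak_centered_amplitude(times, trf, n1_search, -1, half_width)
    return p1 - n1
```

Centering the averaging window on each participant's own peak avoids underestimating amplitudes when peak latencies differ across individuals or conditions.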

Tables

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Software, algorithm | MATLAB | MATLAB | RRID:SCR_001622 |
Software, algorithm | JASP | JASP | RRID:SCR_015823 |
Software, algorithm | PsychToolbox | PsychToolbox | RRID:SCR_002881 |
Software, algorithm | OpenAI | ChatGPT | RRID:SCR_023775 |
Software, algorithm | FieldTrip | FieldTrip | RRID:SCR_004849 |

Björn Herrmann (2025) Enhanced neural speech tracking through noise indicates stochastic resonance in humans. eLife 13:RP100830. https://doi.org/10.7554/eLife.100830.3