
Exposing distinct subcortical components of the auditory brainstem response evoked by continuous naturalistic speech

  1. Melissa J Polonenko
  2. Ross K Maddox (corresponding author)
  1. Department of Neuroscience, University of Rochester, United States
  2. Del Monte Institute for Neuroscience, University of Rochester, United States
  3. Center for Visual Science, University of Rochester, United States
  4. Department of Biomedical Engineering, University of Rochester, United States
Tools and Resources
Cite this article as: eLife 2021;10:e62329 doi: 10.7554/eLife.62329
19 figures, 1 table and 8 additional files

Figures

Single-subject and group average (bottom right) weighted-average auditory brainstem responses (ABRs) to ~43 min of broadband peaky speech.

Areas for the group average show ±1 SEM. Responses were high-pass filtered at 150 Hz using a first-order Butterworth filter. Waves I, III, and V of the canonical ABR are evident in most of the single-subject responses (N = 22, 16, and 22, respectively) and are marked by the average peak latencies on the average response.
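The first-order 150 Hz high-pass Butterworth filter named in this caption can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the 10 kHz sampling rate, the causal `lfilter` call, and the synthetic `response` array are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import butter, freqz, lfilter

fs = 10_000          # sampling rate in Hz (assumed for this sketch)
cutoff_hz = 150.0    # high-pass cutoff named in the caption

# First-order Butterworth high-pass as transfer-function coefficients.
b, a = butter(1, cutoff_hz, btype="highpass", fs=fs)

# Synthetic stand-in for a response: slow 5 Hz drift plus a 500 Hz component.
t = np.arange(fs) / fs
response = np.sin(2 * np.pi * 5 * t) + 0.1 * np.sin(2 * np.pi * 500 * t)
filtered = lfilter(b, a, response)  # causal filtering (an assumption)

# Gain at the cutoff frequency; a Butterworth filter is about -3 dB there.
_, h = freqz(b, a, worN=[cutoff_hz], fs=fs)
gain_at_cutoff_db = 20 * np.log10(np.abs(h[0]))
```

A first-order filter rolls off gently (about 6 dB per octave), so the slow drift is attenuated while the faster component passes largely intact.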

Comparison of auditory brainstem response (ABR) and middle latency response (MLR) to ~43 min each of unaltered speech and broadband peaky speech.

(A) The average waveform to broadband peaky speech (blue) shows additional, and sharper, waves of the canonical ABR and MLR than the broader average waveform to unaltered speech (black). Responses were high-pass filtered at 30 Hz with a first-order Butterworth filter. Areas show ±1 SEM. (B) Comparison of peak latencies for ABR wave V (circles) and MLR waves Na (downward triangles) and Pa (upward triangles) that were common between responses to broadband peaky and unaltered speech. Blue symbols depict individual subjects, and black symbols depict the mean.

Figure 3 with 1 supplement
Comparison of grand average (N = 22) measured and modeled responses to ~43 min of broadband peaky speech.

Amplitudes of the linear (dashed line) and auditory nerve (AN; dotted line) modeled responses were in arbitrary units, and thus scaled to match the amplitude of the measured response (solid line) over the 0–20 ms lags. The pre-stimulus component was present in all three responses using a first-order 30 Hz high-pass Butterworth filter (left column), but was minimized by aggressive high-pass filtering with a second-order 200 Hz high-pass Butterworth filter (right column).

Figure 3—figure supplement 1
Auditory brainstem response (ABR) kernel used for the simple linear deconvolution model.

The average broadband peaky speech ABR (N = 22) from 0 to 16 ms was zero-padded from 0 ms to the beginning of wave I (1.6 ms), windowed with a Hann function, normalized, and then zero-padded from −16 ms to 0 ms to center the kernel.
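The kernel construction steps listed in this caption can be sketched roughly as below. The sampling rate, the span of the Hann window (taken here as the whole 0 to 16 ms segment), the unit-energy normalization, and the toy input waveform are assumptions; the caption does not specify them.

```python
import numpy as np

def make_abr_kernel(abr_0_to_16ms, fs, wave_i_onset_s=1.6e-3):
    """Build a lag-centered kernel from an average ABR covering lags 0-16 ms."""
    kernel = np.asarray(abr_0_to_16ms, dtype=float).copy()
    # Zero the samples before the start of wave I (1.6 ms in the caption).
    kernel[: int(round(wave_i_onset_s * fs))] = 0.0
    # Taper with a Hann window spanning the whole segment (assumed span).
    kernel *= np.hanning(kernel.size)
    # Normalize to unit energy (assumed; the caption only says "normalized").
    kernel /= np.sqrt(np.sum(kernel ** 2))
    # Prepend zeros for lags -16 to 0 ms so that lag 0 sits in the middle.
    return np.concatenate([np.zeros(kernel.size), kernel])

fs = 10_000                                         # Hz, assumed
lags_s = np.arange(int(round(0.016 * fs))) / fs     # 0 to just under 16 ms
toy_abr = np.exp(-((lags_s - 0.007) / 0.001) ** 2)  # toy "wave V" bump
kernel = make_abr_kernel(toy_abr, fs)
```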

Figure 4 with 1 supplement
Comparison of responses derived by using the same type of regressor in the deconvolution.

Average waveforms (areas show ±1 SEM) are shown for ~43 min each of unaltered speech (black) and broadband peaky speech (blue). EEG was regressed with the (A) half-wave rectified audio and (B) pulse train. Responses were high-pass filtered at 30 Hz using a first-order Butterworth filter.

Figure 4—figure supplement 1
Comparison of responses derived by using the half-wave rectified audio as the regressor in the deconvolution with electroencephalography (EEG) recorded in response to ~43 min of unaltered speech and multiband peaky speech.

Average waveforms (areas show ±1 SEM) are shown for EEG recorded to unaltered speech (black) relative to EEG recorded to multiband peaky speech (red). Responses were high-pass filtered at 30 Hz using a first-order Butterworth filter.

Comparison of responses to 32 min each of male- (dark blue) and female-narrated (light blue) re-synthesized broadband peaky speech.

(A) Average waveforms across subjects (areas show ±1 SEM) are shown for auditory brainstem response (ABR) time lags with high-pass filtering at 150 Hz (top), and both ABR and middle latency response (MLR) time lags with a lower high-pass filtering cutoff of 30 Hz (bottom). (B) Histograms of the correlation coefficients between responses evoked by male- and female-narrated broadband peaky speech during ABR (top) and ABR/MLR (bottom) time lags. Solid lines denote the median and dotted lines the interquartile range. (C) Comparison of ABR (top) and MLR (bottom) wave peak latencies for individual subjects (gray) and the group mean (black). ABR and MLR responses were similar for both narrators but were smaller for female-narrated speech, which has a higher glottal pulse rate. Peak latencies for female-narrated speech were later at ABR time lags but earlier at early MLR time lags.

Figure 6 with 1 supplement
Comparison of responses to ~43 min of male-narrated multiband peaky speech.

(A) Average waveforms across subjects (areas show ±1 SEM) are shown for each band (colored solid lines) and common component (dot-dash gray line, same waveform replicated as a reference for each band), which was calculated using six false pulse trains. (B) The common component was subtracted from each band’s response to give the frequency-specific waveforms (areas show ±1 SEM), which are shown with high-pass filtering at 30 Hz (solid lines) and 150 Hz (dashed lines). (C) Mean ± SEM peak latencies for each wave decreased with increasing band frequency. Numbers of subjects with an identifiable wave are given for each wave and band. Details of the mixed effects models for (C) are provided in Supplementary file 1A.
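The subtraction described in panel (B) can be illustrated with a short sketch. It assumes the common component is the plain average of the responses to the false pulse trains, which is how the caption reads; the array shapes and synthetic data are hypothetical.

```python
import numpy as np

def frequency_specific(band_responses, fake_responses):
    """Subtract the common component from each band's response.

    band_responses: (n_bands, n_lags) responses to the true band pulse trains.
    fake_responses: (n_fake, n_lags) responses to the false pulse trains.
    """
    common = np.mean(fake_responses, axis=0)
    return band_responses - common, common

# Synthetic demonstration: four bands that share one common waveform.
rng = np.random.default_rng(0)
n_lags = 300
shared = np.sin(np.linspace(0, 4 * np.pi, n_lags))
band_only = rng.standard_normal((4, n_lags))
band_responses = band_only + shared
fake_responses = np.tile(shared, (6, 1))       # six false pulse trains

specific, common = frequency_specific(band_responses, fake_responses)
```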

Figure 6—figure supplement 1
Comparison of responses to 64 min each of male- (left) and female-narrated (right) multiband peaky speech created with the dynamic random frequency shift method.

(A) Weighted-average waveforms for one subject are shown for each band (colored solid lines) and common component (dot-dashed gray line, same waveform replicated as a reference for each band), which was calculated using six false pulse trains. (B) The common component was subtracted from each band’s response to give the frequency-specific waveforms, which are shown with high-pass filtering at 30 Hz (solid lines) and 150 Hz (dashed lines). Responses from all four bands show more consistent resemblance to the common component, indicating that this method is effective at reducing stimulus-related bias. However, differences still remain in the lowest frequency band for latencies >30 ms, suggesting that this new method reveals true underlying low-frequency neural activity that is unique.

Comparison of responses to ~43 min of male-narrated peaky speech in the same subjects.

Average waveforms across subjects (areas show ±1 SEM) are shown for broadband peaky speech (blue) and summed frequency-specific responses to multiband peaky speech with the common component added (red), high-pass filtered at 150 Hz (left) and 30 Hz (right). Regressors in the deconvolution were pulse trains.

Comparison of responses to 32 min each of male- and female-narrated re-synthesized multiband peaky speech.

(A) Average frequency-specific waveforms across subjects (areas show ±1 SEM; common component removed) are shown for each band in response to male- (dark red lines) and female-narrated (light red lines) speech. Responses were high-pass filtered at 30 Hz (left) and 150 Hz (right) to highlight the middle latency response (MLR) and auditory brainstem response (ABR), respectively. (B) Correlation coefficients between responses evoked by male- and female-narrated multiband peaky speech during ABR/MLR (left) and ABR (right) time lags for each frequency band. Black lines denote the median. (C) Mean ± SEM peak latencies for male- (dark) and female-narrated (light) speech for each wave decreased with increasing frequency band. Numbers of subjects with an identifiable wave are given for each wave, band, and narrator. Lines are given a slight horizontal offset to make the error bars easier to see. Details of the mixed effects models for (C) are provided in Supplementary file 1B.

Comparison of responses to ~60 min each of male- and female-narrated dichotic multiband peaky speech with standard audiological frequency bands.

(A) Average frequency-specific waveforms across subjects (areas show ±1 SEM; common component removed) are shown for each band for the left ear (dotted lines) and right ear (solid lines). Responses were high-pass filtered at 30 Hz. (B) Left–right ear correlation coefficients (top, averaged across gender) and male–female correlation coefficients (bottom, averaged across ear) during auditory brainstem response time lags (0–15 ms) for each frequency band. Black lines denote the median. (C) Mean ± SEM wave V latencies for male- (dark red) and female-narrated (light red) speech for the left (dotted line, cross symbol) and right ear (solid line, circle symbol) decreased with increasing frequency band. Lines are given a slight horizontal offset to make the error bars easier to see. Details of the mixed effects model for (C) are provided in Supplementary file 1C.

Cumulative proportion of subjects who have responses with ≥0 dB signal-to-noise ratio (SNR) as a function of recording time.

The left plot shows the time required for unaltered (black) and broadband peaky speech (dark blue) from a male narrator in 22 subjects; the right plot shows the time required for male- (dark blue) and female-narrated (light blue) broadband peaky speech in 11 subjects. Solid lines denote SNRs calculated using variance of the signal high-pass filtered at 30 Hz over the auditory brainstem response (ABR)/middle latency response (MLR) interval 0–30 ms, and dashed lines denote SNRs calculated using variance of the signal high-pass filtered at 150 Hz over the ABR interval 0–15 ms. Noise variance was calculated in the pre-stimulus interval −480 to −20 ms.
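A variance-based SNR of this kind can be sketched as follows. The caption gives the intervals but not the formula, so the expression used here, 10·log10((σ²(S+N) − σ²(N)) / σ²(N)), is an assumption, as are the function name and the toy response.

```python
import numpy as np

def snr_db(response, lags_ms, signal_win=(0.0, 30.0), noise_win=(-480.0, -20.0)):
    """SNR in dB from variance in a response window and a pre-stimulus window."""
    in_signal = (lags_ms >= signal_win[0]) & (lags_ms <= signal_win[1])
    in_noise = (lags_ms >= noise_win[0]) & (lags_ms <= noise_win[1])
    var_sn = np.var(response[in_signal])   # signal plus noise
    var_n = np.var(response[in_noise])     # noise only
    if var_sn <= var_n:
        return -np.inf                     # no measurable signal variance
    return 10.0 * np.log10((var_sn - var_n) / var_n)

# Toy response: unit-variance "noise" everywhere, doubled variance after 0 ms.
lags_ms = np.arange(-5000, 1001) / 10.0
response = np.where(np.arange(lags_ms.size) % 2 == 0, 1.0, -1.0)
response[lags_ms >= 0] *= np.sqrt(2.0)
toy_snr = snr_db(response, lags_ms)                        # ABR/MLR interval
toy_snr_abr = snr_db(response, lags_ms, signal_win=(0.0, 15.0))
```

In the toy case the response window carries twice the noise variance, so the estimated signal variance equals the noise variance and the SNR sits near the 0 dB criterion used in the figure.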

Cumulative proportion of subjects who have frequency-specific responses (common component subtracted) with ≥0 dB signal-to-noise ratio (SNR) as a function of recording time.

Acquisition time was faster for male- (left) than female-narrated (right) multiband peaky speech with (A) four frequency bands presented diotically and (B) five frequency bands presented dichotically (total of 10 responses, five bands in each ear). SNR was calculated by comparing variance of signals high-pass filtered at 150 Hz across the auditory brainstem response interval of 0–15 ms to variance of noise in the pre-stimulus interval −480 to −20 ms.

The range of lags can be extended to allow early, middle, and late latency responses to be analyzed from the same recording to broadband peaky speech.

Average waveforms across subjects (areas show ±1 SEM) are shown for responses measured to 32 min of broadband peaky speech narrated by a male (dark blue) and female (light blue). Responses were high-pass filtered at 1 Hz using a first-order Butterworth filter, but different filter parameters can be used to focus on each stage of processing. Canonical waves of the auditory brainstem response, middle latency response, and late latency response are labeled for the male-narrated speech. Due to adaptation, amplitudes of the late potentials are smaller than typically seen with other stimuli that are shorter in duration with longer inter-stimulus intervals than our continuous speech. Waves I and III become more clearly visible by applying a 150 Hz high-pass cutoff.

Unaltered speech waveform (top left) and spectrogram (top right) compared to re-synthesized broadband peaky speech (middle left and right) and multiband peaky speech (bottom left and right).

Comparing waveforms shows that the peaky speech is as ‘click-like’ as possible, while comparing the spectrograms shows that the overall spectrotemporal content that defines speech is basically unchanged by the re-synthesis. A naïve listener is unlikely to notice that any modification has been performed, and subjective listening confirms the similarity. Yellow/lighter colors represent larger amplitudes than purple/darker colors in the spectrogram. See supplementary files for audio examples of each stimulus type for both narrators.

Relative mean-squared magnitude in decibels of multiband peaky speech with four filter bands (left) and five filter bands (right) for male- (dark red circles) and female-narrated (light red triangles) speech.

The full audio, which was presented to the subjects during the experiments, comprises unvoiced and re-synthesized voiced sections. The other entries show the relative magnitude of the voiced sections alone (voiced only) and of each filtered frequency band.

Figure 15 with 2 supplements
Spectral coherence of pulse trains for multiband peaky speech narrated by a male (left) and female (right).

Spectral coherence was computed across 1 s slices from 60 unique 64 s multiband peaky speech segments (3840 total slices) for each combination of bands. Each light gray line represents the coherence for one band comparison. (A) There were 45 comparisons across the 10-band (audiological) speech used in experiment 3 (5 frequency bands × 2 ears). The lowest band was unshifted, and the other nine bands had static frequency shifts. (B) There were six comparisons across four pulse trains of the bands in the pilot experiment, which all had dynamic random frequency shifts. Pulse trains (i.e., the input stimuli, or regressors, for the deconvolution) were frequency-dependent (coherent) below 72 Hz for the male multiband speech and 126 Hz for the female multiband speech.
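Coherence across fixed-length slices, as described here, might be computed along these lines. The exact definition is not given in the caption; this sketch assumes the magnitude of the slice-averaged cross-spectrum divided by the square root of the product of the slice-averaged power spectra, and the sampling rate and noise signals are stand-ins for real pulse trains.

```python
import numpy as np

def spectral_coherence(x, y, fs, slice_s=1.0):
    """Magnitude coherence between x and y, averaged over slices."""
    n = int(round(slice_s * fs))
    n_slices = min(x.size, y.size) // n
    X = np.fft.rfft(x[: n_slices * n].reshape(n_slices, n), axis=1)
    Y = np.fft.rfft(y[: n_slices * n].reshape(n_slices, n), axis=1)
    cross = np.mean(np.conj(X) * Y, axis=0)
    power = np.mean(np.abs(X) ** 2, axis=0) * np.mean(np.abs(Y) ** 2, axis=0)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, np.abs(cross) / np.sqrt(power)

# Toy check with noise in place of pulse trains (fs is hypothetical).
rng = np.random.default_rng(1)
fs = 100
x = rng.standard_normal(200 * fs)
y = rng.standard_normal(200 * fs)
freqs, coh_same = spectral_coherence(x, x, fs)    # identical inputs
_, coh_indep = spectral_coherence(x, y, fs)       # independent inputs
```

Identical inputs give a coherence of 1 at every frequency, while independent inputs give values that shrink toward 0 as more slices are averaged.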

Figure 15—figure supplement 1
Comparison of the common component derived from the average response to six fake pulse trains that were created using static frequency shifts (solid, darker lines; used in the paper) or dynamic random frequency shifts (dashed, lighter lines; used for pilot data and suggested in 'Materials and methods').

Responses were to 32 min each of male- (left) and female-narrated (right) re-synthesized diotic multiband peaky speech. Areas show ±1 SEM. The electroencephalography to diotic multiband peaky speech (four-bands) was regressed with six fake pulse trains created using the static shifts (used in the paper; the same common component displayed in Figure 4), as well as six fake pulse trains created using the dynamic random frequency shift method (used to create the common component in Figure 15—figure supplement 2).

Figure 15—figure supplement 2
Multiband stimuli responses for male (left) and female (right) derived by deconvolving the absolute value of the dichotic (stereo) multiband peaky audio (from experiment 3) with the 10 associated pulse trains – five pulse trains were used for each band in each ear (‘correct’ pulse trains, top row).

Each pulse train acted as a ‘wrong’ pulse train for the associated band in the other ear (bottom row). Left ear responses are shown by dotted lines and right ear responses by solid lines. Higher frequency bands are shown by lighter red colors. The non-zero responses only occurred when the correct pulse train was paired with the correct audio. Both male- and female-narrated speech responses symmetrically surrounded 0 ms and were largest for the first (lowest frequency) band when the correct, but not fake, pulse trains were used as the regressor in the deconvolution.

Octave band filters used to create re-synthesized broadband peaky speech (left, blue), diotic multiband peaky speech with four bands (middle, red), and dichotic multiband peaky speech using five bands with audiological center frequencies (right, red).

The last band (2nd, 5th, and 6th, respectively, black line) was used to filter the high frequencies of unaltered speech during mixing to improve the quality of voiced consonants. The designed frequency response using trapezoids (top) was converted into time-domain using IFFT, shifted, and Nuttall windowed to create impulse responses (middle), which were then used to assess the actual frequency response by converting into the frequency domain using FFT (bottom).
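The design chain in this caption (trapezoid, IFFT, shift, Nuttall window, FFT) can be sketched as below. The sampling rate, number of taps, and band corners are hypothetical, and the window uses the common four-term Nuttall coefficients in periodic form, which may differ from the variant the authors used.

```python
import numpy as np

def trapezoid_fir(fs, n_taps, edges_hz):
    """edges_hz = (stop_lo, pass_lo, pass_hi, stop_hi) corners of the trapezoid."""
    freqs = np.fft.rfftfreq(n_taps, d=1.0 / fs)
    # Designed response: 0 outside the corners, linear ramps, flat passband.
    designed = np.interp(freqs, edges_hz, [0.0, 1.0, 1.0, 0.0])
    # Convert to the time domain and shift the peak to the middle.
    impulse = np.fft.fftshift(np.fft.irfft(designed, n=n_taps))
    # Four-term Nuttall window (periodic form, peak at n_taps // 2).
    k = np.arange(n_taps)
    c = (0.3635819, 0.4891775, 0.1365995, 0.0106411)
    window = (c[0] - c[1] * np.cos(2 * np.pi * k / n_taps)
              + c[2] * np.cos(4 * np.pi * k / n_taps)
              - c[3] * np.cos(6 * np.pi * k / n_taps))
    impulse = impulse * window
    # Actual magnitude response of the windowed impulse response.
    actual = np.abs(np.fft.rfft(impulse))
    return freqs, designed, impulse, actual

fs, n_taps = 44_100, 1024                       # hypothetical values
edges_hz = (707.0, 1000.0, 2000.0, 2828.0)      # hypothetical band corners
freqs, designed, impulse, actual = trapezoid_fir(fs, n_taps, edges_hz)
in_band = np.argmin(np.abs(freqs - 1500.0))
out_of_band = np.argmin(np.abs(freqs - 8000.0))
```

Windowing smooths the corners of the designed trapezoid, which is why the actual response is checked after the fact rather than assumed to match the design.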

Example stimulus, regressor, and deconvolved response.

Left: A segment of broadband peaky speech stimulus (top) and the corresponding glottal pulse train (bottom) used in calculating the broadband peaky speech response. Right: An example broadband peaky speech response from a single subject. The response shows auditory brainstem response waves I, III, and V at ~3, 5, and 7 ms, respectively. It also shows later peaks corresponding to thalamic and cortical activity at ~17 and 27 ms, respectively.
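The relationship between regressor, recording, and response can be illustrated with a toy frequency-domain deconvolution. This is a sketch under stated assumptions, not the authors' implementation: the cross-spectrum is summed over epochs and divided by the summed regressor power, the pulse trains are random rather than glottal, the sampling rate is hypothetical, and the "EEG" is simulated without noise.

```python
import numpy as np

def deconvolve(regressors, eeg):
    """Response vs. circular lag; inputs are (n_epochs, n_samples) arrays."""
    X = np.fft.rfft(regressors, axis=-1)
    Y = np.fft.rfft(eeg, axis=-1)
    numerator = np.sum(np.conj(X) * Y, axis=0)     # cross-spectrum
    denominator = np.sum(np.abs(X) ** 2, axis=0)   # regressor power
    return np.fft.irfft(numerator / denominator, n=regressors.shape[-1])

rng = np.random.default_rng(0)
fs = 1000                                   # Hz, hypothetical (1 sample = 1 ms)
n_epochs, n_samples = 8, 2000
pulses = (rng.random((n_epochs, n_samples)) < 0.1).astype(float)

# Toy kernel with peaks at 7, 17, and 27 ms, loosely echoing the caption.
true_kernel = np.zeros(n_samples)
true_kernel[[7, 17, 27]] = [1.0, -0.5, 0.3]

# Simulated recording: each pulse train circularly convolved with the kernel.
eeg = np.fft.irfft(np.fft.rfft(pulses, axis=-1) * np.fft.rfft(true_kernel),
                   n=n_samples, axis=-1)
response = deconvolve(pulses, eeg)
```

Because the convolution is circular, negative (pre-stimulus) lags appear at the end of the returned array.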

Author response image 1
SNR (dB) calculations based on the pre-stimulus baseline of the response, as used in the paper, versus calculations based on the pre-stimulus baseline and stimulus response window of responses calculated to the wrong stimulus.

The same EEG was used for each calculation. SNRs cluster around the unity line. The exception is responses with poor SNR (< −5 dB SNR), for which similar numbers of responses were better or worse than with the paper's SNR calculation.

Author response image 2
Replication of ABR responses to broadband peaky speech for the 22 subjects in experiment 1.

Equal numbers of epochs for the first (black line) and second (gray line) halves of the recording were included in each replication. Correlation coefficients are provided in the top right corner for individual subjects. The median and interquartile range (IQR) are provided in the top right corner for the grand mean responses in the bottom right subplot.

Tables

Table 1
Parameter estimates and SEM for power law fits to the multiband peaky speech auditory brainstem response wave V data in the three experiments*.
| Type | Narrator | High-pass cutoff | a | b | d |
|---|---|---|---|---|---|
| Norms† | | | 4.70–5.00 ms | 3.46–5.39 ms | 0.22–0.50 |
| A. Experiment 1 | | | | | |
| Diotic four-bands | Male | 30 Hz | 5.13 ± 0.08 ms | 3.95 ± 1.03 ms | 0.41 ± 0.02 |
| | | 150 Hz | 4.80 ± 0.08 ms | 3.95 ± 1.03 ms | 0.37 ± 0.02 |
| B. Experiment 2 | | | | | |
| Diotic four-bands | Male | 30 Hz | 5.06 ± 0.14 ms | 4.42 ± 1.04 ms | 0.45 ± 0.03 |
| | Female | 30 Hz | 5.58 ± 0.12 ms | 3.94 ± 1.08 ms | 0.44 ± 0.05 |
| C. Experiment 3 | | | | | |
| Dichotic five-bands | Male | 30 Hz | 5.06 ± 0.14 ms‡ | 4.13 ± 1.04 ms§ | 0.36 ± 0.02 |
| | Female | 30 Hz | 5.58 ± 0.12 ms‡ | 3.75 ± 1.07 ms§ | 0.41 ± 0.03 |

*Power model: $\tau(f) = a + b f^{-d}$, where $a = \tau_{\text{synaptic}} + \tau_{\text{I-V}}$ and $\tau_{\text{synaptic}} = 0.8$ ms. See 'Statistical analyses' section in 'Materials and methods' for more detail.

†Norms for tone pips and derived bands were calculated for 65 dB ppeSPL using the model's level-dependent parameter when appropriate (Neely et al., 1988; Rasetshwane et al., 2013; Strelcyk et al., 2009).

‡Estimates from experiment 2 were used.

§Estimates given for the left ear; there was not a significant difference for the right ear.
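As a worked example of the power-law model in the table footnote, the sketch below evaluates τ(f) = a + b·f^(−d) with the experiment 1 estimates for the 30 Hz high-pass cutoff. Two things are assumptions here: that f is expressed in kHz, and the band center frequencies, which are hypothetical values chosen only to show the trend.

```python
def wave_v_latency_ms(f_khz, a_ms, b_ms, d):
    """Power-law latency model: tau(f) = a + b * f**(-d), with f in kHz (assumed)."""
    return a_ms + b_ms * f_khz ** (-d)

# Experiment 1 estimates (30 Hz high-pass cutoff) from Table 1.
a_ms, b_ms, d = 5.13, 3.95, 0.41

band_centers_khz = [0.5, 1.0, 2.0, 4.0]   # hypothetical band centers
latencies_ms = [wave_v_latency_ms(f, a_ms, b_ms, d) for f in band_centers_khz]
```

At f = 1 kHz the power term equals 1, so the predicted latency is simply a + b = 9.08 ms; with positive b and d, the predicted latency falls as frequency rises, matching the decrease in wave V latency with increasing band frequency reported in the figures.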

Additional files

Supplementary file 1

Details of statistical models.

(A) LMER model formula and summary output for multiband peaky speech in experiment 1. (B) LMER model formula and summary output for multiband peaky speech in experiment 2. (C) LMER model formula and summary output for multiband peaky speech in experiment 3.

https://cdn.elifesciences.org/articles/62329/elife-62329-supp1-v2.docx
Audio file 1

Unaltered speech sample from the male narrator (The Alchemyst; Scott, 2007).

https://cdn.elifesciences.org/articles/62329/elife-62329-fig2-v2.zip
Audio file 2

Broadband peaky speech sample from the male narrator (The Alchemyst; Scott, 2007).

https://cdn.elifesciences.org/articles/62329/elife-62329-fig3-v2.zip
Audio file 3

Multiband peaky speech sample from the male narrator (The Alchemyst; Scott, 2007).

https://cdn.elifesciences.org/articles/62329/elife-62329-fig4-v2.zip
Audio file 4

Unaltered speech sample from the female narrator (A Wrinkle in Time; L’Engle, 2012).

https://cdn.elifesciences.org/articles/62329/elife-62329-fig5-v2.zip
Audio file 5

Broadband peaky speech sample from the female narrator (A Wrinkle in Time; L’Engle, 2012).

https://cdn.elifesciences.org/articles/62329/elife-62329-fig6-v2.zip
Audio file 6

Multiband peaky speech sample from the female narrator (A Wrinkle in Time; L’Engle, 2012).

https://cdn.elifesciences.org/articles/62329/elife-62329-fig7-v2.zip
Transparent reporting form
https://cdn.elifesciences.org/articles/62329/elife-62329-transrepform-v2.pdf
