Large-scale electrophysiology and deep learning reveal distorted neural signal dynamics after hearing loss

  1. Shievanie Sabesan
  2. Andreas Fragner
  3. Ciaran Bench
  4. Fotios Drakopoulos
  5. Nicholas A Lesica (corresponding author)
  1. Ear Institute, University College London, United Kingdom
  2. Perceptual Technologies, United Kingdom

Abstract

Listeners with hearing loss often struggle to understand speech in noise, even with a hearing aid. To better understand the auditory processing deficits that underlie this problem, we made large-scale brain recordings from gerbils, a common animal model for human hearing, while presenting a large database of speech and noise sounds. We first used manifold learning to identify the neural subspace in which speech is encoded and found that it is low-dimensional and that the dynamics within it are profoundly distorted by hearing loss. We then trained a deep neural network (DNN) to replicate the neural coding of speech with and without hearing loss and analyzed the underlying network dynamics. We found that hearing loss primarily impacts spectral processing, creating nonlinear distortions in cross-frequency interactions that result in a hypersensitivity to background noise that persists even after amplification with a hearing aid. Our results identify a new focus for efforts to design improved hearing aids and demonstrate the power of DNNs as a tool for the study of central brain structures.

Editor's evaluation

This fundamental work uses deep neural networks to simulate activity evoked by a wide range of stimuli and demonstrates systematic differences in latent population representations between hearing-impaired and normal-hearing animals that are consistent with impaired representations of speech in noise. The evidence supporting the conclusions is compelling, and the neural-network approach is novel with potential future applications. The research will be of interest to auditory neuroscientists and computational scientists.

https://doi.org/10.7554/eLife.85108.sa0

Introduction

Hearing loss is a widespread problem with far-reaching consequences ranging from lost productivity and social isolation to decreases in quality of life and mental health (Wilson et al., 2017). It also imposes a growing societal burden with associated costs approaching $1 trillion annually (World Health Organization, 2021). Hearing aids are the only widely available treatment for hearing loss, but, unfortunately, current devices provide limited benefit in many workplace and social settings (Lesica, 2018).

The term ‘hearing loss’ does not capture the full spectrum of the effects of cochlear damage on auditory processing. One common consequence of cochlear damage is a loss of sensitivity that renders low-intensity sounds inaudible. This is the ‘hearing loss’ that is assessed in standard clinical tests and is addressed through amplification with hearing aids. But cochlear damage has other consequences beyond lost sensitivity that cause many people to struggle with high-intensity sounds that are well above their audibility threshold (Moore, 2007). For example, people with hearing loss often have difficulties understanding speech in noisy settings, with or without a hearing aid (Larson et al., 2000). The pathophysiology underlying these high-intensity deficits remains poorly understood, and, as a result, they are largely ignored by clinicians and hearing aid designers.

The consequences of cochlear damage for the processing of both low- and high-intensity sounds have been well described at the level of the auditory nerve (AN; Young, 2008). In addition to the general decrease in neural activity resulting from lost sensitivity, there are also complex changes in the spatiotemporal structure of the neural activity patterns that encode acoustic information, such as lost synchrony capture (Miller et al., 1997) and distorted tonotopy (Henry et al., 2016). Many theories have attempted to explain how the peripheral changes associated with hearing loss might lead to perceptual deficits (Humes and Dubno, 2010; Plomp, 1986). But, with few explicit studies comparing neural coding in central auditory areas before and after hearing loss, it has been difficult to differentiate between competing theories or to identify which peripheral changes are most important to address.

One recent study of individual neurons in the inferior colliculus (IC) with and without hearing loss and hearing aids found that some properties, such as phoneme selectivity, were impacted while others, such as frequency selectivity and trial-to-trial variability, were not (Armstrong et al., 2022). But, while characterizing the impact of hearing loss on neural coding in individual neurons may be sufficient at the level of the AN (a set of largely homogeneous fibers driven by a few thousand inner hair cells), neural coding in downstream areas such as the IC, which are much larger and more complex, likely involves emergent network-level properties that are not readily apparent in the activity of individual neurons.

One of the challenges in characterizing neural coding at the network level is the high dimensionality of the activity patterns. If the goal is to gain insight into the underlying computations that the activity reflects, it can be useful to find more compact representations of the full activity that retain its important features. This process, often termed ‘manifold learning’ (Mitchell-Heggs et al., 2023; Williamson et al., 2019), typically utilizes techniques such as principal component analysis (PCA) that identify a projection of the full activity into a lower dimensional space that retains as much of its variance as possible. In this study, we use manifold learning to investigate the impact of hearing loss on the neural coding of speech in gerbils, a common animal model for the study of human hearing. We employ large-scale intracranial recordings that allow us to achieve comprehensive sampling of activity from individual animals at the fine spatial and temporal scales that are critical for encoding speech (Garcia-Lazaro et al., 2013).

We begin with the traditional approach to manifold learning using PCA to identify and analyze the low-dimensional subspace in which most of the variance in the full network activity patterns resides. We focus on the signal manifold, which captures the features of neural activity that are sound-evoked, first establishing that our recordings are sufficient to identify the signal manifold in individual animals and then that the changes in signal dynamics with hearing loss are fundamental, that is, that the dynamics within the signal manifold are not simply attenuated by hearing loss, but instead are truly distorted. We then continue our analysis using deep neural networks (DNNs) to perform manifold learning within the framework of a stimulus encoding model, which allows us to investigate the impact of hearing loss on the coding of novel sounds.

We demonstrate that training DNNs on our recordings allows for accurate prediction of neural activity in conjunction with identification of the signal manifold. We use the trained DNNs to probe the processing of basic acoustic features and show that hearing loss predominantly affects spectral, rather than temporal, processing. We then probe the processing of speech and find that this impaired spectral processing creates a hypersensitivity to background noise that persists even with a hearing aid and appears to arise from aberrant cross-frequency interactions. Our results demonstrate the power of DNNs to provide new insights into neural coding at the network level and suggest that new approaches to hearing aid design are required to address the highly nonlinear nature of the effects of hearing loss on spectral processing.

Results

We recorded neural activity from the IC of anesthetized gerbils using electrode arrays with a total of 512 channels (Armstrong et al., 2022), allowing us to sample widely from neurons that were sensitive to the full range of speech frequencies (Figure 1a). We recorded activity in response to more than 10 hr of speech in each animal, presenting the entire TIMIT speech database (Garofolo, 1993) twice – once in quiet and once in background noise, with the overall intensity, speech-to-noise ratio, and noise type varied from sentence to sentence (with a small number of sentences repeated multiple times under identical conditions to assess trial-to-trial variability). We processed the recordings to extract multi-unit activity (MUA) spike counts for each recording channel, using 1.3 ms time bins to account for the fact that neurons in the IC can encode information about speech with millisecond temporal precision (Garcia-Lazaro et al., 2013).
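
As an illustration of this preprocessing step, the sketch below bins a list of spike times into 1.3 ms counts. The function and variable names are hypothetical, and the spike-detection stage that precedes binning is not shown; this is not the study's actual pipeline.

```python
import numpy as np

def bin_spike_counts(spike_times_s, duration_s, bin_width_s=0.0013):
    """Bin spike times (in seconds) into spike counts per 1.3 ms bin.

    A minimal sketch of the multi-unit activity (MUA) extraction described
    above; the actual preprocessing is more involved.
    """
    n_bins = int(np.ceil(duration_s / bin_width_s))
    edges = np.arange(n_bins + 1) * bin_width_s
    counts, _ = np.histogram(spike_times_s, bins=edges)
    return counts  # shape: (n_bins,)

# Hypothetical usage: one channel, 2 s of activity
counts = bin_spike_counts(np.array([0.0101, 0.0115, 0.520]), duration_s=2.0)
```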

Neural signal and noise in the gerbil inferior colliculus (IC).

(a) Schematic diagram showing the geometry of custom-designed electrode arrays for large-scale recordings in relation to the gerbil IC (center), along with the speech syllable ‘sa’ (left) and the neural activity that it elicited during an example recording (right). Each image of the neural activity corresponds to one hemisphere, with each row showing the average multi-unit activity recorded on one electrode over repeated presentations of the syllable, with the units arranged according to their location within the IC. The activity of three units with different center frequencies (CF; the frequency for which sensitivity to pure tones is maximal) is shown in detail. (b) Schematic diagram showing the method for separating signal and noise in neural activity. The signal is obtained by averaging responses across repeated presentations of identical sounds. The noise is the residual activity that remains after subtracting the signal from the response to each individual presentation. (c) Signal and noise in neural activity. Left: total, signal, and noise variance in neural activity for units recorded from normal hearing animals (horizontal line indicates median, thick vertical line indicates 25th through 75th percentile, thin vertical line indicates 5th through 95th percentile; n = 2556). Right: total, signal, and noise correlation in neural activity for pairs of units recorded from normal hearing animals (n = 544,362).

The neural signal manifold is low dimensional

We began by analyzing activity from animals with normal hearing. As a first step toward characterizing the neural code for speech at the network level, we determined whether there was shared variance across units that would allow us to reduce the dimensionality of the activity patterns. Previous work has shown that the correlations in IC activity are dominated by ‘signal’ (features of activity that are reproducible across repeated trials and, thus, convey acoustic information) rather than ‘noise’ (features of activity that vary from trial-to-trial and reflect intrinsic noise or fluctuations in brain state) (Figure 1b; Garcia-Lazaro et al., 2013).

Signal correlations were also dominant in our recordings: although signal variance (the covariance in activity across repeated trials) accounted for only 40% of the overall variance in activity, signal correlations accounted for 95% of the total correlation between units (Figure 1c). For a network operating in such a regime, with each neuron having more than half of its variance uncorrelated with that of its neighbors, there is limited scope for reducing the dimensionality of the full activity patterns. However, given the large signal correlations, it may be possible to identify a low-dimensional subspace in which the acoustic information represented by the signal is embedded.
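
The signal/noise decomposition described above can be sketched as follows. The estimators shown are simplified illustrations (two repeats, trial-averaged signal, residual noise) with hypothetical variable names, not the exact procedures used in the study.

```python
import numpy as np

def signal_and_noise_variance(resp_trial1, resp_trial2):
    """Estimate signal, noise, and total variance for one unit from two repeats.

    resp_trial1, resp_trial2: binned spike counts (n_bins,) recorded during two
    presentations of the same sound. Signal variance is estimated as the
    covariance of the responses across repeats (the reproducible part); the
    remainder of the total variance is attributed to noise.
    """
    total_var = 0.5 * (np.var(resp_trial1) + np.var(resp_trial2))
    signal_var = np.cov(resp_trial1, resp_trial2)[0, 1]
    noise_var = total_var - signal_var
    return signal_var, noise_var, total_var

def signal_and_noise_correlation(u1_trials, u2_trials):
    """Signal and noise correlation for a pair of units.

    u1_trials, u2_trials: (n_trials, n_bins) responses to repeated
    presentations of the same sound. Signal = trial average; noise = residual.
    """
    s1, s2 = u1_trials.mean(0), u2_trials.mean(0)
    n1, n2 = (u1_trials - s1).ravel(), (u2_trials - s2).ravel()
    sig_corr = np.corrcoef(s1, s2)[0, 1]
    noise_corr = np.corrcoef(n1, n2)[0, 1]
    return sig_corr, noise_corr
```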

We partitioned the recordings from each animal into two sets: a training set that was used to identify the principal components (PCs) of the activity (Figure 2a, step 1) and a test set with two repeated trials that was used to measure the variance that could be explained by each PC. To measure the total variance explained (Figure 2a, steps 2a–c), we projected the activity from test trial 1 onto the PCs, then reconstructed the original activity from the same trial using the PC projection and compared the reconstruction to the original activity. The overall dimensionality of the neural activity was high, as expected, with a large number of PCs required for the reconstruction to explain 95% of the total variance in the original activity (Figure 2b).
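
A minimal sketch of steps 1 and 2a–c, assuming the binned activity is arranged as time-by-units matrices and using scikit-learn's PCA; the array names and settings are placeholders for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

def total_variance_explained(X_train, X_test1, n_pcs):
    """Fit PCs on training data (step 1), project test trial 1 onto them
    (step 2a), reconstruct it (step 2b), and measure the fraction of its
    variance explained (step 2c).

    X_train, X_test1: (n_timebins, n_units) binned activity.
    """
    pca = PCA(n_components=n_pcs).fit(X_train)
    X_proj = pca.transform(X_test1)           # step 2a: project onto PCs
    X_recon = pca.inverse_transform(X_proj)   # step 2b: reconstruct
    resid = X_test1 - X_recon                 # step 2c: compare to original
    return 1 - resid.var(axis=0).sum() / X_test1.var(axis=0).sum()
```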

Figure 2 with 1 supplement
Neural signal dynamics identified via classical methods.

(a) Schematic diagram showing the method for identifying the neural signal manifold from recordings of neural activity. Step 1: principal component analysis (PCA) is performed on a subset of the recordings allocated for training. Step 2a: a subset of recordings allocated for testing are projected onto the principal components (PCs) from step 1. Step 2b: The projections from step 2a are used to reconstruct the test recordings. Step 2c: the reconstructions from step 2b are compared to the test recordings from step 2a to determine the total variance explained. Step 3: the reconstructions from step 2b are compared to another set of test recordings (made during a second presentation of the same sounds) to determine the signal variance explained. Step 4: the projections from step 2a for one animal are compared to the projections for another animal to determine the similarity of the signal dynamics between animals. (b) Total variance explained in step 2c as a function of the number of PCs used for the reconstruction. Each thick line shows the results for one normal hearing animal (n = 6). The thin line denotes 95% variance explained. (c) Signal variance explained in step 3 as a function of the number of PCs used for the reconstruction. (d) Percent of variance explained by each PC that corresponds to neural signal (rather than neural noise) for an example animal. (e) Variance explained in step 4 for each pair of normal hearing animals. (f, g) Total variance explained in step 2c and signal variance explained in step 3 for animals with hearing loss (n = 6). (h) Variance explained in step 4 for each pair of animals with hearing loss and each pair of animals with different hearing status. (i) Distributions of variance explained in step 4 for each pair of normal hearing animals (n = 15), each pair of animals with hearing loss (n = 15), and each pair of animals with different hearing status (n = 36). Median values were compared via Kruskal–Wallis one-way ANOVA and Tukey–Kramer post hoc tests, ***p<0.001, **p<0.01, *p<0.05, ns indicates not significant. For full details of statistical tests, see Table 1.

Table 1
Details of statistical analyses.

This table provides the details for the statistical analyses in this study, including sampling unit, sample sizes, and p-values. All comparisons were made using Kruskal–Wallis one-way ANOVA with post hoc Tukey–Kramer tests to compute pairwise p-values.

Figure 2i (sampling unit: pairs of animals)
Groups: 1. NH\NH (n = 15); 2. HL\HL (n = 15); 3. NH\HL (n = 36)
Comparisons: 1 vs. 2, p=0.08; 1 vs. 3, p<1e-7; 2 vs. 3, p<1e-7

Figure 3e (sampling unit: pairs of animals)
Groups: 1. NH\NH (n = 15); 2. HL\HL (n = 15); 3. NH\HL (n = 36)
Comparisons: 1 vs. 2, p=0.71; 1 vs. 3, p<1e-10; 2 vs. 3, p<1e-10

Figure 4f (sampling unit: pairs of animals)
Groups: 1. NH\NH (n = 15); 2. NH\HL (n = 36); 3. NH\HL* (n = 36)
Comparisons: 1 vs. 2, p<1e-7; 1 vs. 3, p<1e-3; 2 vs. 3, p<1e-7

Figure 4h (sampling unit: pairs of animals)
Groups: 1. NH\NH (n = 15); 2. NH\HL (n = 36); 3. NH\HL* (n = 36)
Comparisons: 1 vs. 2, p<1e-7; 1 vs. 3, p<1e-4; 2 vs. 3, p<1e-7

Figure 5f (sampling unit: pairs of animals)
Groups: 1. NH\NH (n = 15); 2. NH\HL (n = 36); 3. NH\HL* (n = 36)
Comparisons: 1 vs. 2, p<1e-7; 1 vs. 3, p=0.99; 2 vs. 3, p<1e-7

Figure 5g (sampling unit: pairs of animals)
Groups: 1. NH\NH (n = 15); 2. NH\HL (n = 36); 3. NH\HL* (n = 36)
Comparisons: 1 vs. 2, p<1e-7; 1 vs. 3, p=0.64; 2 vs. 3, p<1e-7

Figure 5j (sampling unit: pairs of animals)
Groups: 1. NH\NH (n = 15); 2. NH\HL (n = 36); 3. NH\HL* (n = 36)
Comparisons: 1 vs. 2, p<1e-7; 1 vs. 3, p=0.89; 2 vs. 3, p<1e-7

Figure 5k (sampling unit: pairs of animals)
Groups: 1. NH\NH (n = 15); 2. NH\HL (n = 36); 3. NH\HL* (n = 36)
Comparisons: 1 vs. 2, p<1e-7; 1 vs. 3, p=0.97; 2 vs. 3, p<1e-7

Figure 6e (sampling unit: pairs of animals)
Groups: 1. NH\NH (n = 15); 2. NH\HL (n = 36); 3. NH\HL* (n = 36); 4. NH\HA (n = 36)
Comparisons: 1 vs. 2, p<1e-7; 1 vs. 3, p=0.007; 1 vs. 4, p=0.23; 2 vs. 3, p<1e-7; 2 vs. 4, p<1e-7; 3 vs. 4, p=0.29

Figure 6f (sampling unit: pairs of animals)
Groups: 1. NH\NH (n = 15); 2. NH\HL (n = 36); 3. NH\HL* (n = 36); 4. NH\HA (n = 36)
Comparisons: 1 vs. 2, p<1e-7; 1 vs. 3, p<1e-7; 1 vs. 4, p=0.002; 2 vs. 3, p<1e-7; 2 vs. 4, p<1e-7; 3 vs. 4, p<1e-4

Figure 6g (sampling unit: animals)
Groups: 1. NH (n = 6); 2. HL (n = 6); 3. HL* (n = 6); 4. HA (n = 6)
Comparisons: 1 vs. 2, p<1e-6; 1 vs. 3, p=0.011; 1 vs. 4, p=0.057; 2 vs. 3, p<1e-3; 2 vs. 4, p<1e-4; 3 vs. 4, p=0.86

Figure 7e (sampling unit: pairs of animals)
Groups: 1. NH\NH (n = 15); 2. NH\HL (n = 36); 3. NH\HL* (n = 36); 4. NH\HA (n = 36)
Comparisons: 1 vs. 2, p<1e-7; 1 vs. 3, p<1e-4; 1 vs. 4, p<1e-7; 2 vs. 3, p<1e-7; 2 vs. 4, p<1e-7; 3 vs. 4, p=0.056

Figure 7f (sampling unit: pairs of animals)
Groups: 1. NH\NH (n = 15); 2. NH\HL (n = 36); 3. NH\HL* (n = 36); 4. NH\HA (n = 36)
Comparisons: 1 vs. 2, p<1e-7; 1 vs. 3, p<1e-7; 1 vs. 4, p<1e-7; 2 vs. 3, p<1e-7; 2 vs. 4, p<1e-7; 3 vs. 4, p=0.99

Figure 7g (sampling unit: animals)
Groups: 1. NH (n = 6); 2. HL (n = 6); 3. HL* (n = 6); 4. HA (n = 6)
Comparisons: 1 vs. 2, p<1e-9; 1 vs. 3, p<1e-5; 1 vs. 4, p<1e-6; 2 vs. 3, p<1e-4; 2 vs. 4, p<1e-3; 3 vs. 4, p=0.59

NH: normal hearing; HL: hearing loss; HL*: hearing loss at best intensity; HA: hearing aid.

To measure the signal variance explained (Figure 2a, step 3), we compared the same reconstructed activity to the original activity from test trial 2 (when using activity from one trial to reconstruct activity on another, only those features that reliably encode acoustic information across trials can be successfully reconstructed and, thus, only signal variance can be explained). The signal variance explained saturated quickly, indicating that the signal dimensionality was much lower than the overall dimensionality, with only a small number of PCs (between 5 and 10) required to explain 95% of the signal variance in the original activity (Figure 2c).

These results suggest that the acoustic information in IC activity is restricted to a low-dimensional subspace, which we term the neural signal manifold, and that PCA is able to identify the dimensions that define this manifold. To confirm that PCA preferentially identified the signal manifold, we computed the fraction of the variance explained by each PC that was signal rather than noise (measured as the covariance between the activity on the two test trials when projected onto each PC relative to the overall variance after the same projection) and verified that it decreased with each successive PC (Figure 2d).
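
Step 3 and the per-PC signal fraction (Figure 2d) can be sketched in the same framework. The normalizations shown here are illustrative; the study's exact definitions are given in its Methods, and `pca` is assumed to be a fitted scikit-learn PCA as in the previous sketch.

```python
import numpy as np

def signal_variance_explained(pca, X_test1, X_test2):
    """Step 3 sketch: reconstruct trial 1 from its PC projection and ask how
    much of the trial 2 response it explains; only features that are
    reproducible across trials (i.e., signal) can be captured this way."""
    X_recon = pca.inverse_transform(pca.transform(X_test1))
    resid = X_test2 - X_recon
    return 1 - resid.var(axis=0).sum() / X_test2.var(axis=0).sum()

def per_pc_signal_fraction(pca, X_test1, X_test2):
    """Figure 2d sketch: for each PC, the covariance between the two
    test-trial projections relative to their overall variance."""
    p1, p2 = pca.transform(X_test1), pca.transform(X_test2)
    fractions = []
    for k in range(p1.shape[1]):
        cov = np.cov(p1[:, k], p2[:, k])[0, 1]
        var = 0.5 * (p1[:, k].var() + p2[:, k].var())
        fractions.append(cov / var)
    return np.array(fractions)
```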

If the signal manifold reflects something fundamental about auditory processing, then we should expect the activity within it, which we term the signal dynamics, to be similar across animals with the same hearing status. To measure the similarity of the signal dynamics across animals (Figure 2a, step 4), we projected the original activity for each animal onto its respective signal manifold and then determined how much of the variance in the activity from one animal could be explained by the activity from another (allowing for additional linear transformation). We found that the signal dynamics for different animals were remarkably similar, with the signal dynamics from one animal accounting for, on average, 96% of the variance in the signal dynamics from other animals (Figure 2e). This result gives us confidence that the signal manifold is indeed fundamental, and that our methods are sufficient to identify it robustly in individual animals.
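
A sketch of the cross-animal comparison in step 4, implementing "allowing for additional linear transformation" as an ordinary least-squares mapping from one animal's signal dynamics to another's; the study's fitting and cross-validation details may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def cross_animal_similarity(proj_a, proj_b):
    """How much of animal A's signal dynamics can be explained by animal B's,
    allowing an arbitrary linear transformation.

    proj_a, proj_b: (n_timebins, n_signal_pcs) projections of each animal's
    activity (recorded during the same sounds) onto its own signal manifold.
    """
    pred_a = LinearRegression().fit(proj_b, proj_a).predict(proj_b)
    resid = proj_a - pred_a
    return 1 - resid.var(axis=0).sum() / proj_a.var(axis=0).sum()
```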

Hearing loss distorts neural signal dynamics

We next sought to use analysis of the signal manifold to better understand the impact of hearing loss on the neural code for speech at the network level. We induced sloping mild-to-moderate sensorineural hearing loss (Figure 2—figure supplement 1) by exposing gerbils (n = 6) to broadband noise using established protocols (Armstrong et al., 2022; Suberman et al., 2011). After waiting at least 1 mo for the effects of the hearing loss to stabilize, we made neural recordings while presenting the same speech and noise sounds and then performed the same manifold learning.

The results for animals with hearing loss were similar to those for animals with normal hearing: the overall dimensionality of the neural activity was high (Figure 2f); the dimensionality of the signal manifold was low (Figure 2g; between 4 and 7 PCs required to explain 95% of the signal variance); and the signal dynamics were similar across animals (Figure 2h; 95% average variance explained), demonstrating again that the signal manifold is fundamental and robust. But the similarity between the signal dynamics of normal hearing animals and animals with hearing loss was much lower than that between animals with the same hearing status (Figure 2h and i; 78% average variance explained). This result indicates that the activity within the signal manifold of an animal with hearing loss is not linearly predictable from the activity within the signal manifold of a normal hearing animal and, thus, that the impact of hearing loss at the network level is a true nonlinear distortion that reshapes the neural code in a complex way.

DNNs enable accurate simulation of neural signal dynamics

To develop an understanding of exactly how hearing loss impacts signal dynamics, further investigation is required. However, traditional approaches to manifold learning such as PCA are limited by the fact that they can only be applied to existing recordings. To overcome this limitation, we designed a DNN that allowed us to identify the signal manifold within the framework of an encoding model that maps sound to neural activity (Figure 3a). If the DNN can be trained to replicate neural activity with high accuracy for a wide range of sounds, it can then be used to probe the effects of hearing loss on signal dynamics using new sounds as needed.

Neural signal dynamics identified via deep learning.

(a) Schematic diagram of the deep neural network (DNN) used to predict inferior colliculus (IC) activity. (b) Example images of neural activity elicited by a speech syllable from an example recording (left), reconstructed as in step 2b of Figure 2 (center), and predicted by the DNN (right). (c) Left: predictive power of the DNN for each unit from one animal. The value on the horizontal axis is the fraction of the variance in the neural activity that is explainable (i.e., that is consistent across trials of identical speech). The value on the vertical axis is the percent of this explainable variance that was captured by the DNN. Analogous values for linear–nonlinear (LN) models trained and tested on the same activity are shown for comparison. Right: average predictive power for all units as a function of the number of channels in the bottleneck layer. Each line shows the results for one animal. (d) Fraction of the variance in one set of bottleneck activations explained by another set for each pair of normal hearing animals (left), each pair of animals with hearing loss (center), and each pair of animals with different hearing status (right). (e) Distributions of variance in one set of bottleneck activations explained by another set for each pair of normal hearing animals (n = 15), each pair of animals with hearing loss (n = 15), and each pair of animals with different hearing status (n = 36). Median values were compared via Kruskal–Wallis one-way ANOVA and Tukey–Kramer post hoc tests, ***p<0.001, **p<0.01, *p<0.05, ns indicates not significant. For full details of statistical tests, see Table 1. (f) Top: example images of neural activity elicited by pure tones from an example recording (left) and predicted by the DNN (right). Each subplot shows the frequency response area for one unit (the average activity recorded during the presentation of tones with different frequencies and sound levels). The colormap for each plot is normalized to the minimum and maximum activity level across all frequencies and intensities. Bottom: predictive power of the DNN for tone responses across all frequencies at two different intensities. Each point indicates the average percent of the explainable variance that was captured by the DNN for all units from one animal (with horizontal jitter added for visibility). (g) Top left: schematic diagram of transfer learning for new animals. Bottom: example images of neural activity elicited by sinusoidally amplitude modulated (SAM) noise with two different modulation frequencies from an example recording (left; average over 128 repeated trials) and predicted by the DNN (right). Top right: predictive power of the DNN for SAM noise responses across all modulation frequencies and modulation depths at two different intensities. Each point indicates the average percent of the explainable variance that was captured by the DNN for all units from one of four new animals after transfer learning using a frozen encoder from one of six animals in the original training set.

The DNN first projects sound into a high-dimensional feature space using an encoder with a cascade of convolutional layers. It then reduces the dimensionality of its feature space through a bottleneck layer and uses a simple (i.e., not convolutional) linear readout to transform the activations in the bottleneck into the neural activity for each recorded unit. During training, the DNN learns to use the bottleneck to identify the low-dimensional feature space that captures as much of the explainable variance in the recorded neural activity as possible, that is, the signal manifold. (Note that the structure of the DNN is not meant to reflect the anatomy of the auditory system; it is simply a tool for identifying latent dynamics and predicting neural activity.)
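
The sketch below illustrates this encoder–bottleneck–readout organization in PyTorch. The layer counts, kernel sizes, and input representation are placeholders chosen for illustration, not the architecture or hyperparameters used in the study.

```python
import torch
import torch.nn as nn

class ICEncoderModel(nn.Module):
    """Sketch of the encoder-bottleneck-readout organization described above."""

    def __init__(self, n_input_channels=64, n_bottleneck=8, n_units=512):
        super().__init__()
        # Encoder: cascade of convolutions over time applied to a
        # multi-channel (e.g., spectrogram-like) representation of sound.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_input_channels, 128, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=7, padding=3),
            nn.ReLU(),
        )
        # Bottleneck: reduces the feature space to a few channels whose
        # activations play the role of the signal dynamics.
        self.bottleneck = nn.Conv1d(128, n_bottleneck, kernel_size=1)
        # Readout: a simple linear map from bottleneck activations to the
        # predicted activity of each recorded unit, implemented here as a
        # 1x1 convolution (i.e., a per-time-bin linear transform).
        self.readout = nn.Conv1d(n_bottleneck, n_units, kernel_size=1)

    def forward(self, x):  # x: (batch, n_input_channels, n_timebins)
        z = self.bottleneck(self.encoder(x))  # (batch, n_bottleneck, n_timebins)
        return self.readout(z), z             # predicted activity, signal dynamics
```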

Once trained, the DNN can be used to simulate neural activity for any sound, whether it was presented during neural recordings or not, with the activations in the bottleneck layer reflecting the underlying signal dynamics. This supervised approach to identifying the signal manifold also has the added advantage that it eliminates the residual noise that is inevitable with unsupervised methods such as PCA (Figure 3b; see the decreasing SNR with successive PCs in Figure 2d).

The utility of the DNN rests, of course, on its ability to faithfully reproduce the recorded neural activity. We trained and tested a separate DNN for each animal (after partitioning the recordings into training and test sets as described above) and found that they performed with remarkable accuracy. The explainable variance explained for activity in the test set approached 100% for the units with highest explainable variance and was far beyond that achieved by a standard single-layer linear–nonlinear model (Figure 3c). We varied the size of the bottleneck layer and found that performance plateaued with more than eight channels for both normal hearing animals and those with hearing loss, consistent with the dimensionality of the signal manifold identified through PCA (Figure 3c).

We also assessed the similarity of the DNN-derived signal dynamics across animals by measuring how much of the variance in the bottleneck activations from one animal could be explained by the bottleneck activations from another (Figure 3d and e; allowing for additional linear transformation). We found that the signal dynamics for animals with the same hearing status were nearly identical (97% average variance explained for both normal hearing and hearing loss), while the similarity between the signal dynamics for animals with normal hearing and those with hearing loss was much lower (87% average variance explained). Thus, the signal manifold as identified by the DNN appears to have similar properties to that identified through PCA, with the added advantages of reduced residual noise and the ability to probe the signal dynamics using novel sounds.

To examine the degree to which the DNNs trained on speech were capable of predicting responses to other sounds, we compared recorded and DNN-generated responses to pure tones with different frequencies and intensities (Figure 3f). The DNNs performed well, explaining an average of 83% of the explainable variance in the recorded activity across animals. To further test the generality of the DNN models, we used transfer learning to assess their ability to predict responses to new sounds for a new set of animals. If the DNN encoder really does capture transformations that are common to all animals with the same hearing status, then it should be possible to use a trained encoder from one animal to predict responses for a new animal after learning only a new linear readout (Figure 3g). For each of the DNN models trained on activity from one of the six normal hearing animals in our original dataset, we froze the encoder and retrained the linear readout for each of four new normal hearing animals. We initialized the readout weights for each unit in a new animal using the readout weights for a random unit from the original animal, and then optimized the weights using a relatively small sample (between 2 and 3.5 hr) of activity recorded from the new animal during the presentation of speech and moving ripples. We then used the new DNN model (the frozen encoder and the optimized readout) to predict responses from the new animal to sinusoidally amplitude modulated (SAM) broadband noise sounds with different modulation frequencies, modulation depths, and intensities. The new DNN models performed well, explaining an average of 85% of the explainable variance in the recorded activity across animals. While pure tones and SAM noise are only two of many possible sounds, these results provide encouraging evidence of the generality of the DNN models.
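
A sketch of the transfer-learning step, assuming a model organized like the earlier architecture sketch: the encoder and bottleneck are frozen, a new per-unit linear readout is created, and its weights are initialized from randomly chosen units of the original readout before optimization. Function names and optimizer settings are placeholders, not the study's configuration.

```python
import torch
import torch.nn as nn

def transfer_to_new_animal(model, n_new_units, init_from=None):
    """Freeze the trained encoder and bottleneck and fit only a new linear
    readout for the new animal's units. `model` is an ICEncoderModel-style
    network (see the earlier sketch)."""
    for p in model.encoder.parameters():
        p.requires_grad = False
    for p in model.bottleneck.parameters():
        p.requires_grad = False
    n_bottleneck = model.readout.in_channels
    new_readout = nn.Conv1d(n_bottleneck, n_new_units, kernel_size=1)
    if init_from is not None:
        # Initialize each new unit's weights from a randomly chosen unit of
        # the original readout, as described above.
        idx = torch.randint(0, init_from.out_channels, (n_new_units,))
        with torch.no_grad():
            new_readout.weight.copy_(init_from.weight[idx])
            new_readout.bias.copy_(init_from.bias[idx])
    model.readout = new_readout
    optimizer = torch.optim.Adam(model.readout.parameters(), lr=1e-3)
    return model, optimizer

# Hypothetical usage, starting from a model trained on the original animal:
# model, opt = transfer_to_new_animal(model, n_new_units=480, init_from=model.readout)
```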

Hearing loss distorts spectral processing

Before continuing our investigation of the neural coding of speech, we first used the DNN to examine the impact of hearing loss on the processing of basic acoustic features. To assess spectral processing, we presented the DNN for each animal with a stream of pure tones with different frequencies and intensities and extracted the activations from the bottleneck layer (Figure 4a; we set the dimensionality of the bottleneck layer to 8 for this and all subsequent analyses). The frequency response areas (FRAs) for individual bottleneck channels resembled those that are typically observed for individual neurons in the IC: some exhibited a clear preferred frequency at low intensities and broader tuning at high intensities, while others had more complex shapes (Figure 4b). For animals with hearing loss, elevated intensity thresholds were also evident.

Figure 4 with 1 supplement
Neural signal dynamics for pure tones.

(a) Schematic diagram showing pure tone sounds with different frequencies and intensities and corresponding bottleneck activations. (b) Frequency response areas (FRAs) for the eight bottleneck channels from a normal hearing animal (top) and an animal with hearing loss (bottom). Each subplot shows the average activity for one channel in response to tones with different frequencies and intensities. The colormap for each plot is normalized to the minimum and maximum activity level across all frequencies and intensities. (c) Dimensionality reduction of bottleneck activations via principal component analysis (PCA). Each line shows the variance explained by the top two principal components (PCs) for one animal as a function of the intensity of the tones. (d) Signal dynamics for pure tones for a normal hearing animal (left) and an animal with hearing loss (right). The top two rows show the projection of the bottleneck activations onto each of the top two PCs as a function of time. The inset value indicates the percent of the variance in the bottleneck activations explained by each PC. The bottom row shows the projections from the top two rows plotted against one another. Each line shows the dynamics for a different tone frequency. Each column shows the dynamics for a different tone intensity. (e) Representational dissimilarity matrices (RDMs) computed from bottleneck activations. The left image shows the average RDM for normal hearing animals for tones at 55 dB SPL. The value of each pixel is proportional to the point-by-point correlation between the activations for a pair of tones with different frequencies. The center image shows the same lower half of the RDM for normal hearing animals along with the upper half of the RDM for animals with hearing loss at the same intensity. The right image shows the same lower half of the RDM for normal hearing animals along with the upper half of the RDM for animals with hearing loss at the best intensity (that which produced the highest point-by-point correlation between the normal hearing and hearing loss RDMs). (f) The point-by-point correlation between RDMs for each pair of normal hearing animals (n = 15), and each pair of animals with different hearing status compared at either the same intensity or the best intensity (n = 36). Median values were compared via Kruskal–Wallis one-way ANOVA and Tukey–Kramer post hoc tests, ***p<0.001, **p<0.01, * p<0.05, ns indicates not significant. For full details of statistical tests, see Table 1. (g) Average signal dynamics for pure tones for normal hearing animals (left) and animals with hearing loss (right) after alignment via multiway canonical correlation analysis (MCCA). (h) The similarity between dynamics after alignment via pairwise canonical correlation analysis (CCA) (see ‘Methods’) for each pair of normal hearing animals, and each pair of animals with different hearing status compared at either the same intensity or the best intensity (that which produced the highest similarity between the dynamics).

To visualize the signal dynamics, we applied PCA to the bottleneck activations (for each intensity separately) and projected the full dynamics onto the top two PCs, which explained more than 90% of the variance for these simple sounds (Figure 4c). For normal hearing animals, the paths traced by the dynamics within the signal manifold for different sounds, which we term trajectories, were distinct and formed an orderly arrangement, but with a clear change in geometry across intensities (Figure 4d). At low intensities, the trajectories for different frequencies were distinct across both PCs, each of which accounted for substantial variance (the percent of the variance in the signal dynamics explained by each PC at each intensity is indicated on each panel). But at high intensities, the trajectories for all frequencies were similar along the first PC, which accounted for the vast majority of the variance, and varied only along the second PC. These intensity-dependent changes in the geometry of the signal dynamics are consistent with the known effects of intensity on spectral tuning in IC neurons. At low intensities, tuning is narrow and, thus, different tone frequencies elicit distinct population activity patterns. But at high intensities, because tuning is broader, the population activity patterns elicited by different frequencies are less distinct.

For animals with hearing loss, the signal dynamics were dramatically different. At moderate intensity, only the lowest frequencies elicited any activation (as expected, given the larger elevation in intensity thresholds at high frequencies; Figure 2—figure supplement 1). At higher intensity, all tones elicited activation, but rather than forming an orderly arrangement within the manifold, the trajectories for different frequencies clustered into two groups, one for low frequencies and one for high frequencies. At the highest intensity, the trajectories for different frequencies became more distinct, but clear clustering remained. The increased similarity in the signal trajectories for different frequencies is consistent with the increased similarity of spectral tuning between individual IC neurons in animals with hearing loss (Barsz et al., 2007; Willott, 1986), but the clustering of dynamics within the signal manifold is an emergent network-level phenomenon that has not previously been observed.

To further analyze the dynamics and allow for direct comparisons across animals, we first turned to representational similarity analysis (RSA) (Kriegeskorte et al., 2008). RSA uses only the relative distances between trajectories within a manifold and, thus, does not require the dynamics for different animals to be aligned. The first step in RSA is to form a representational dissimilarity matrix (RDM) for each set of trajectories, with each entry in the RDM equal to one minus the point-by-point correlation between a pair of trajectories for two different sounds (Figure 4e, left).
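
A sketch of the RDM construction from a set of signal trajectories; the array shapes and names are assumptions for illustration, and the comparison between animals is shown as a simple correlation of RDM entries.

```python
import numpy as np

def representational_dissimilarity_matrix(trajectories):
    """Build an RDM: one minus the point-by-point correlation between the
    signal trajectories elicited by each pair of sounds.

    trajectories: (n_sounds, n_timebins, n_dims) bottleneck activations
    (or PC projections) for each sound.
    """
    n_sounds = trajectories.shape[0]
    flat = trajectories.reshape(n_sounds, -1)  # concatenate time x dimensions
    rdm = np.zeros((n_sounds, n_sounds))
    for i in range(n_sounds):
        for j in range(n_sounds):
            rdm[i, j] = 1 - np.corrcoef(flat[i], flat[j])[0, 1]
    return rdm

# RDMs from two animals can then be compared via the correlation of their
# off-diagonal entries, e.g.:
# iu = np.triu_indices_from(rdm_a, k=1)
# similarity = np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]
```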

The structure of the RDMs was consistent with the observations about the dynamics made above. For normal hearing animals, the RDMs had a diagonal symmetry with a smooth gradient, indicating that the similarity of the trajectories for two frequencies decreased gradually as the difference between the frequencies increased. For animals with hearing loss, the RDMs had a block structure, indicating that the trajectories formed two clusters (Figure 4e, center; note that RDMs are symmetric, so the lower half of the normal hearing RDM is shown with the upper half of the hearing loss RDM for comparison).

Because we were interested in identifying the effects of hearing loss beyond those related to audibility, we also compared the normal hearing RDMs at a given intensity to the hearing loss RDMs at the best intensity, that is, at whatever intensity resulted in the highest similarity to the normal hearing RDM for each pair of animals (measured as the point-by-point correlation between the RDMs). Amplification to the best intensity softened the block structure and shifted the transition between clusters to a lower frequency, but did not fully restore the diagonal structure present in the normal hearing RDMs (Figure 4e, right). Overall, the similarity between the RDMs for different normal hearing animals at moderate intensity (55 dB SPL) was high (0.91 ± 0.01; mean ± SEM; n = 15 pairwise comparisons; Figure 4f; for full details of all statistical tests, see Table 1). The similarity between the normal hearing and hearing loss RDMs at the same moderate intensity was much lower (0.59 ± 0.02; n = 36) and remained relatively low even at the best intensity (0.78 ± 0.02; n = 36).

The results of RSA are easy to interpret, but, because it uses only the relative distances between trajectories, it can be insensitive to distortions that impact the overall structure of the dynamics (e.g., a change in temporal dynamics that is common across all sound frequencies). To allow for direct comparisons of overall structure, we used canonical correlation analysis (CCA) (Dabagia et al., 2022). CCA identifies a series of linear projections, known as canonical components (CCs), that attempt to align two sets of dynamics such that the point-by-point correlation between trajectories after projection onto their respective CCs is maximized. The set of CCs for a given set of dynamics are required to be orthogonal to each other and to explain all of the variance in the original trajectories. CCA can also be extended to simultaneously align dynamics for an entire group of animals through multiway canonical correlation analysis (MCCA) (de Cheveigné et al., 2019).
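
A sketch of pairwise CCA alignment using scikit-learn's CCA implementation; the study's implementation (and its MCCA extension) may differ in details such as the number of components and how the canonical components are constrained.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def align_dynamics(dyn_a, dyn_b, n_components=2):
    """Find linear projections (canonical components) of each animal's signal
    dynamics that are maximally correlated point-by-point.

    dyn_a, dyn_b: (n_timebins, n_dims) signal dynamics for two animals in
    response to the same sounds.
    """
    cca = CCA(n_components=n_components).fit(dyn_a, dyn_b)
    a_cc, b_cc = cca.transform(dyn_a, dyn_b)
    corrs = np.array([np.corrcoef(a_cc[:, k], b_cc[:, k])[0, 1]
                      for k in range(n_components)])
    return a_cc, b_cc, corrs
```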

The average dynamics after alignment via CCA exhibited phenomena that were similar to those that were evident for individual animals (Figure 4g). For normal hearing animals, the trajectories for different tone frequencies were distinct and formed an orderly arrangement with frequency-dependent variation across two dimensions at low intensities, while at high intensities the variation across frequencies was largely confined to the second CC. For animals with hearing loss, the trajectories for different frequencies clustered into two groups at moderate intensities and remained clustered, albeit more weakly, at high intensities. This clustering was also evident when a similar analysis was performed directly on recorded neural activity (Figure 4—figure supplement 1).

To measure the similarity between the dynamics for different animals after alignment via CCA, we used a weighted sum of the point-by-point correlations between the two sets of dynamics after projection onto each pair of CCs, with the weight for the correlation associated with each pair of CCs given by the average variance in the original dynamics that those CCs explained (see ‘Methods’ for equation). Overall, the similarity between the dynamics for different normal hearing animals at moderate intensity after alignment via CCA was extremely high (0.98 ± 0.01; n = 15; Figure 4h). The similarity between the aligned dynamics for normal hearing and hearing loss animals at the same moderate intensity was much lower (0.37 ± 0.02; n = 36) and remained below normal even when compared at the best intensity (0.88 ± 0.01; n = 36). Taken together, the RSA and CCA results suggest that hearing loss results in a fundamental disruption of spectral processing at the network level.
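
The verbal description above can be formalized roughly as follows; the exact equation is given in the study's Methods, so this should be read only as a plausible sketch of the weighting scheme, continuing from the CCA sketch above.

```python
import numpy as np

def weighted_cca_similarity(a_cc, b_cc, dyn_a, dyn_b):
    """Weighted sum of per-CC correlations, with each weight set to the
    average fraction of the original variance that the CC accounts for in
    the two animals. Illustrative only; the normalization may differ from
    the one defined in the paper's Methods."""
    total_a, total_b = dyn_a.var(axis=0).sum(), dyn_b.var(axis=0).sum()
    sim = 0.0
    for k in range(a_cc.shape[1]):
        r = np.corrcoef(a_cc[:, k], b_cc[:, k])[0, 1]
        w = 0.5 * (a_cc[:, k].var() / total_a + b_cc[:, k].var() / total_b)
        sim += w * r
    return sim
```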

Hearing loss does not distort temporal processing

We next assessed temporal processing by performing a similar analysis on the bottleneck activations elicited by a stream of SAM broadband noise sounds with different modulation frequencies and intensities (Figure 5a). For these sounds, two dimensions were again enough to capture almost all of the variance in the full signal dynamics across all intensities (Figure 5b). For both normal hearing animals and those with hearing loss, the explicit tracking of envelope modulations in the signal dynamics decreased with increasing modulation frequency and increasing intensity (Figure 5c). But when compared at the same intensity, the dynamics for animals with hearing loss clearly differed from those for animals with normal hearing (Figure 5d and e).

Neural signal dynamics for amplitude modulated noise.

(a) Schematic diagram showing amplitude-modulated noise sounds with different intensities and modulation frequencies. (b) Variance in bottleneck activations explained by top two principal components (PCs) for each animal as function of sound intensity. (c) Envelope tracking in signal dynamics measured as the coefficient of variation of the bottleneck activations for sounds with different modulation frequencies at an intensity of 55 dB SPL (left) and sounds at different intensities with a modulation frequency of 100 Hz (right). Values shown are the average across animals. (d) Signal dynamics for a normal hearing animal (left) and an animal with hearing loss (right). Each line shows the dynamics for a different modulation frequency. (e) Signal dynamics for a modulation frequency of 100 Hz after projection onto the top PC. The top panel shows the dynamics for a normal hearing animal and an animal with hearing loss at 55 dB SPL. The bottom panel shows the same dynamics for the normal hearing animal along with the dynamics for the animal with hearing loss at the best intensity. (f) Representational dissimilarity matrices (RDMs) computed from bottleneck activations and the point-by-point correlation between RDMs for different pairs of animals at 55 dB SPL or best intensity. Median values were compared via Kruskal–Wallis one-way ANOVA and Tukey–Kramer post hoc tests, ***p<0.001, **p<0.01, *p<0.05, ns indicates not significant. For full details of statistical tests, see Table 1. (g) The similarity between dynamics after alignment via pairwise canonical correlation analysis (CCA) for different pairs of animals at 55 dB SPL or best intensity. (h) Schematic diagram showing amplitude modulated noise sounds with different intensities and modulation depths. (i) Signal dynamics for a normal hearing animal at 55 dB SPL (left) and an animal with hearing loss at the best intensity (right). Each line shows the dynamics for a different modulation depth. (j) The point-by-point correlation between RDMs for different pairs of animals at 55 dB SPL or best intensity. (k) The similarity between dynamics after alignment via pairwise CCA for different pairs of animals at 55 dB SPL or best intensity.

This was confirmed by RSA (Figure 5f), which indicated that while the similarity between normal hearing RDMs at moderate intensity (55 dB SPL) was extremely high (0.99 ± 0.01; n = 15), the similarity between normal hearing and hearing loss RDMs was lower (0.76 ± 0.01; n = 36). But when compared at the best intensity to eliminate differences related to audibility, the dynamics for animals with hearing loss became nearly identical to those for animals with normal hearing (Figure 5e), and this was reflected in the similarity between RDMs (0.99 ± 0.01; n = 36).

Comparing the similarity of the dynamics after alignment via CCA yielded similar results (Figure 5g). The similarity between the dynamics for different normal hearing animals at moderate intensity after alignment via CCA was high (0.97 ± 0.01; n = 15). The similarity between the aligned dynamics for normal hearing and hearing loss animals was much lower when compared at the same moderate intensity (0.44 ± 0.02; n = 36) but increased to normal levels when the comparison was made at the best intensity (0.95 ± 0.01; n = 36). Thus, it appears that hearing loss has little impact on temporal processing beyond that which results from decreased audibility.

We verified that this was also true for the processing of sounds with different modulation depths. We performed the same analysis on the bottleneck activations elicited by a stream of SAM noise sounds with different modulation depths and intensities (and a fixed modulation frequency of 30 Hz; Figure 5h). When compared at the best intensity, the signal dynamics for normal hearing animals and animals with hearing loss were nearly identical (Figure 5i), with the explicit tracking of envelope modulations decreasing with decreasing modulation depth. The overall similarity measured at the best intensity both by RSA (0.99 ± 0.01; n = 36; Figure 5j) and after alignment via CCA (0.96 ± 0.01; n = 36; Figure 5k) confirmed that the impact of hearing loss on temporal processing beyond that which results from decreased audibility was negligible.

Distortions in the neural code for speech in quiet are largely corrected by amplification

Having established that the distortions in neural signal dynamics caused by hearing loss affect primarily spectral, rather than temporal, processing for simple sounds, we next returned to speech. We focused on consonants, which vary widely in their spectral properties and are the primary contributor to the perceptual deficits exhibited by people with hearing loss when listening to ongoing speech (Fogerty et al., 2012). We presented the DNN with a stream of isolated consonants (diphone syllables with the vowel removed), each uttered multiple times by multiple talkers (Figure 6a). The consonants can be divided into three broad groups: the vowel-like consonants (nasals and approximants), which are dominated by low frequencies; the plosives, which are broadband; and the fricatives, which are dominated by high frequencies (Figure 6b).

Neural signal dynamics for speech in quiet.

(a) Schematic diagram showing different consonants. (b) Average power spectra for three classes of consonants. The individual consonants are shown in the inset in panel (d). (c) Variance in bottleneck activations explained by top two principal components (PCs) for each animal as a function of sound intensity. (d) Signal dynamics for a normal hearing animal (left) and an animal with hearing loss (right). Each line shows the dynamics for a different consonant, averaged over all instances. (e) Representational dissimilarity matrices (RDMs) computed from bottleneck activations and the point-by-point correlation between RDMs for different pairs of animals at 60 dB SPL, best intensity, or with a hearing aid. Median values were compared via Kruskal–Wallis one-way ANOVA and Tukey–Kramer post hoc tests, ***p<0.001, **p<0.01, *p<0.05, ns indicates not significant. For full details of statistical tests, see Table 1. (f) The similarity between dynamics after alignment via pairwise canonical correlation analysis (CCA) for different pairs of animals at 60 dB SPL, best intensity, or with a hearing aid. (g) Performance of a support vector machine classifier trained to identify consonants based on bottleneck activations for each normal hearing animal (n = 6) at 60 dB SPL and each animal with hearing loss (n = 6) at either 60 dB SPL, best intensity, or with a hearing aid.

For both normal hearing animals and animals with hearing loss, two dimensions were again sufficient to explain nearly all of the variance in the bottleneck activations (Figure 6c). For normal hearing animals, the dynamics elicited by different consonants followed distinct trajectories that were organized within the signal manifold according to consonant type (Figure 6d; the dynamics shown are the average across all instances of each consonant). For animals with hearing loss, only the vowel-like consonants elicited responses at moderate intensity (as expected, given the larger elevation in intensity thresholds at high frequencies; Figure 2—figure supplement 1). At higher intensities, all consonants elicited responses but the trajectories were not as distinct as with normal hearing and exhibited a clustering similar to that observed for pure tones (Figure 4d), which softened at the highest intensity.

The differences in the dynamics for normal hearing animals and those with hearing loss were evident in the RDMs (Figure 6e). When compared at a typical conversational intensity (60 dB SPL), the similarity between normal hearing RDMs was high (0.94 ± 0.01; n = 15), but the similarity between normal hearing and hearing loss RDMs was low (0.23 ± 0.02; n = 36). The normal hearing and hearing loss RDMs were much more similar when compared at the best intensity, though some differences remained (0.87 ± 0.01; n = 36).

Comparing the similarity of the dynamics after alignment via CCA yielded similar results (Figure 6f). The similarity between the dynamics for different normal hearing animals after alignment via CCA was high (0.96 ± 0.01; n = 15). The similarity between normal hearing and hearing loss animals when compared at the same conversational intensity was much lower (0.31 ± 0.02; n = 36) and increased when the comparison was made at the best intensity, but not to normal levels (0.77 ± 0.01; n = 36).

Given that hearing loss seems to impact primarily spectral processing, we investigated whether the distortions in the neural code for speech could be corrected by providing frequency-dependent amplification using a simulated hearing aid (Armstrong et al., 2022; Alexander and Masterson, 2015). We used the measured ABR threshold shifts for each animal to set the parameters for the amplification, which resulted in a gain of approximately 10 dB at low frequencies and 30 dB at high frequencies for speech at conversational intensity (Figure 2—figure supplement 1), and presented the same stream of consonants again after processing with the hearing aid. The frequency-dependent amplification was effective in reducing the distortion in the dynamics for animals with hearing loss. The overall similarity between normal hearing and hearing loss animals as measured by RSA was restored to normal levels (0.91 ± 0.01; n = 36; Figure 6e), and the similarity measured after alignment via CCA was also increased, though some residual distortion remained (0.86 ± 0.01; n = 36; Figure 6f). (Note that we did not record neural activity in response to speech through the simulated hearing aid; while we have no specific reason to doubt the accuracy of the DNN model for this class of sounds, the fact that it has not been explicitly validated should be considered.)

To evaluate the functional consequences of the remaining distortion, we turned to decoding. We trained a support vector machine to classify consonants based on the signal dynamics for each animal. For normal hearing animals, the decoder identified 55% of consonants correctly (±1%; n = 6; chance = 4.5%) when the consonants were presented at conversational intensity (Figure 6g). For animals with hearing loss, performance at the same intensity was lower (41 ± 1%; n = 6) but increased substantially at best intensity (47 ± 1%; n = 6), and increased further still with frequency-dependent amplification by the hearing aid (49 ± 1%; n = 6). Taken together, these results suggest that while amplification cannot completely restore the neural code for speech in quiet to normal, the residual distortions are relatively minor.
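
A sketch of the decoding analysis: a support vector machine classifying consonant identity from flattened bottleneck trajectories. The classifier settings, feature layout, and cross-validation scheme are placeholders, not the study's exact configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def consonant_decoding_accuracy(bottleneck_trajs, labels):
    """Classify consonant identity from the bottleneck activations elicited
    by each consonant instance.

    bottleneck_trajs: (n_instances, n_timebins, n_channels) activations
    labels: (n_instances,) consonant identity for each instance
    """
    X = bottleneck_trajs.reshape(len(labels), -1)  # flatten time x channels
    clf = make_pipeline(StandardScaler(), SVC(kernel='linear'))
    scores = cross_val_score(clf, X, labels, cv=5)
    return scores.mean()  # chance is ~1 / number of consonant classes
```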

Distortions in the neural code for speech in noise persist even after frequency-weighted amplification

Given that the perceptual difficulties experienced by listeners with hearing loss are most pronounced in noisy environments, we expected that the addition of background noise to the speech would create larger distortions in the neural code. We presented the same consonant stream with added speech babble (background noise formed by adding together the voices of many different talkers; Figure 7a and b) at a speech-to-noise ratio of 3 dB, which is typical of real-world settings experienced by hearing aid users (Christensen et al., 2021). The addition of background noise increased the dimensionality of the signal dynamics relative to simple sounds or speech in quiet, especially at high overall intensities; three PCs were often required to capture more than 90% of the variance (Figure 7c). (Note that both the speech and the background noise contribute to the signal dynamics, which encode all incoming sounds without distinction.)

Neural signal dynamics for speech in noise.

(a) Schematic diagram showing different consonants in speech babble. (b) Average power spectra for three classes of consonants and speech babble. (c) Variance in bottleneck activations explained by top two (left) or three (right) principal components (PCs) for each animal as a function of sound intensity. (d) Signal dynamics for a normal hearing animal (top) and an animal with hearing loss (bottom). Each line shows the dynamics for a different consonant, averaged over all instances. Dynamics for speech in quiet are shown alongside those for speech in noise for comparison. (e) Point-by-point correlation between representational dissimilarity matrices (RDMs) for different pairs of animals at 70 dB SPL, best intensity, or with a hearing aid. Median values were compared via Kruskal–Wallis one-way ANOVA and Tukey–Kramer post hoc tests, ***p<0.001, ** p<0.01, *p<0.05, ns indicates not significant. For full details of statistical tests, see Table 1. (f) The similarity between dynamics after alignment via pairwise canonical correlation analysis (CCA) for different pairs of animals at 70 dB SPL, best intensity, or with a hearing aid. (g) Performance of a support vector machine classifier trained to identify consonants based on bottleneck activations for each normal hearing animal at 70 dB SPL, and each animal with hearing loss at 70 dB SPL, best intensity, or with a hearing aid.

For normal hearing animals, the signal dynamics for speech-in-noise and speech-in-quiet were nearly identical at the lowest intensities, but differed strongly at higher intensities (Figure 7d; the dynamics shown are the average across all instances of each consonant). For speech-in-noise at the highest intensity, there was a clear distinction between the first PC, which provided a clean reflection of each consonant (though not the same as for speech in quiet), and the other PCs, which were dominated by the background noise (despite averaging across all instances of each consonant with independent noise). The trends were similar for animals with hearing loss, though the background noise was reflected in the signal dynamics even more strongly.

When compared at the same high intensity (70 dB SPL), typical of a social setting, both RSA and CCA indicated strong effects of hearing loss on the signal dynamics for speech in noise (Figure 7e and f). The similarity between normal hearing RDMs was high (0.89 ± 0.02; n = 15), but the similarity between normal hearing and hearing loss RDMs was much lower (0.15 ± 0.03; n = 36). Amplification to best intensity increased the similarity between normal hearing and hearing loss RDMs (0.62 ± 0.02; n = 36), as did the frequency-weighted amplification provided by the hearing aid (0.56 ± 0.03; n = 36), but neither was sufficient to bring the similarity close to normal levels. For both forms of amplification, the similarity of the signal dynamics for speech in noise to those with normal hearing was much lower than for speech in quiet (best intensity: 0.62 vs. 0.87, n = 36; p<1e-10, paired t-test; hearing aid: 0.56 vs. 0.91, n = 36, p<1e-10, paired t-test). Comparing the similarity of the dynamics after alignment via CCA yielded similar results. The similarity between the dynamics for different normal hearing animals after alignment via CCA was high (0.92 ± 0.01; n = 15). The similarity between normal hearing and hearing loss animals when compared at the same intensity was much lower (0.49 ± 0.02; n = 36) and increased when the comparison was made at the best intensity (0.68 ± 0.01; n = 36) or after processing with the hearing aid (0.70 ± 0.02; n = 36), but remained well below normal levels. Again, for both forms of amplification, the similarity of the signal dynamics for speech in noise to those with normal hearing was much lower than for speech in quiet (best intensity: 0.68 vs. 0.77, n = 36; p<1e-6, paired t-test; hearing aid: 0.70 vs. 0.86, n = 36, p<1e-10, paired t-test).

Decoding the signal dynamics for each animal suggested that the distortions for speech in noise had functional consequences (Figure 7g). For normal hearing animals, the decoder identified 32% of consonants correctly (±1%; n = 6). For animals with hearing loss, performance at the same intensity was lower (15 ± 1%; n = 6) and remained well below normal levels both at the best intensity (23 ± 1%; n = 6) and after processing with the hearing aid (22 ± 1%; n = 6).

Hearing loss causes hypersensitivity to background noise

To gain a better understanding of the differential impact of the background noise with and without hearing loss, we used MCCA to jointly align the signal dynamics for all animals with normal hearing and hearing loss so that we could make direct comparisons. We first analyzed the results for speech in quiet. When compared at best intensity (Figure 8a), there was good alignment between the dynamics for animals with normal hearing and hearing loss. The correlation for pairs of animals after projection onto the first CC, which accounted for 88 ± 2% and 71 ± 7% of the variance in animals with normal hearing (n = 6) and hearing loss (n = 6), respectively, was 0.94 ± 0.01 (n = 36). The correlation after projection onto the second CC, which accounted for the remaining variance, was lower (0.44 ± 0.03; n = 36).

Hypersensitivity to background noise after hearing loss.

(a) Average signal dynamics for speech in quiet at 60 dB SPL for normal hearing animals (left) and at best intensity for animals with hearing loss (right) after joint alignment via multiway canonical correlation analysis (MCCA). The inset value within each panel indicates the average percent of the variance in the bottleneck activations explained by each canonical component (CC). The inset value between columns indicates the average correlation between the dynamics after projection onto each pair of CCs. (b) Average signal dynamics for speech in quiet at 60 dB SPL for normal hearing animals (left) and with a hearing aid for animals with hearing loss (right) after joint alignment via MCCA. (c) Average signal dynamics for speech in noise at 70 dB SPL for normal hearing animals (left) and at best intensity for animals with hearing loss (right) after joint alignment via MCCA. (d) Average signal dynamics for speech in noise at 70 dB SPL for normal hearing animals (left) and with a hearing aid for animals with hearing loss (right) after joint alignment via MCCA. (e) Schematic diagram showing amplitude modulated narrowband noise target and masker sounds. (f) Correlation between bottleneck activations for target noise with and without masker noise at 70 dB SPL. Each line shows the correlation for a different target center frequency as a function of the masker center frequency, averaged across individual animals. (g) Change in correlation between bottleneck activations for target noise with and without masker noise at 70 dB SPL for animals with hearing loss relative to animals with normal hearing. (h) Change in correlation for animals with hearing loss at best intensity relative to animals with normal hearing at 70 dB SPL. (i) Change in correlation for animals with hearing loss with a hearing aid relative to animals with normal hearing at 70 dB SPL.

With the hearing aid (Figure 8b), the alignment between the dynamics for animals with normal hearing and hearing loss was even better. The correlation after projection onto the first CC remained high (0.94 ± 0.01; n = 36) and the correlation after projection onto the second CC rose substantially (0.73 ± 0.01; n = 36). These results are consistent with the analysis for speech in quiet above that suggested that only minimal distortions in the signal dynamics for animals with hearing loss remain after frequency-weighted amplification is provided (Figure 6).

For speech in noise, the alignment was generally worse overall. When compared at best intensity (Figure 8c), the correlations after projection onto the CCs were 0.79 ± 0.01, 0.74 ± 0.03, and 0.18 ± 0.03, respectively (n = 36). But even for the first two CCs, for which the alignment was relatively good, the percent of the variance accounted for by each was substantially different for normal hearing animals and those with hearing loss. The first CC, which provided a clean reflection of each consonant, accounted for 56 ± 1% of the variance for normal hearing animals (n = 6), but only 38 ± 7% of the variance for animals with hearing loss (n = 6). Conversely, the second CC, which was dominated by the background noise, accounted for 41 ± 6% of the variance for animals with hearing loss (n = 6), but only 28 ± 1% of the variance for animals with normal hearing. Thus, while the neural subspaces encoding the speech and the background noise seem to be similar for all animals, the balance of variance between these subspaces is tilted much more toward background noise in animals with hearing loss.

Given the nature of the hearing loss (larger elevation in intensity thresholds at high frequencies; Figure 2—figure supplement 1) and the spectral properties of speech babble (higher power at low frequencies; Figure 7b), the hypersensitivity of animals with hearing loss to the background noise is somewhat expected. However, the problem was, if anything, exacerbated by the selective amplification of high frequencies provided by the hearing aid (Figure 8d). The CC that was dominated by the background noise (now the first CC, since it produced the highest correlation) accounted for 46 ± 6% of the variance for animals with hearing loss (n = 6), but only 31 ± 1% of the variance for animals with normal hearing, while the CC that provided a clean reflection of each consonant (now the second CC) accounted for 56 ± 1% of the variance for normal hearing animals (n = 6), but only 33 ± 6% of the variance for animals with hearing loss (n = 6). Thus, it appears that hearing loss allows the signal dynamics to be captured by background noise at the expense of foreground speech in a way that cannot be corrected by simple frequency-dependent amplification.

To characterize the distortions in spectral processing that underlie this effect, we examined how the processing of one narrowband sound is impacted by the presence of another. We used narrowband noise modulated by a 20 Hz sinusoidal envelope as the target sound and narrowband noise modulated by a pink noise envelope as the masker (Figure 8e). We varied the center frequency of the target and masker separately and the intensity of the target and masker together (to maintain a constant target-to-masker ratio). We presented the target sounds to the DNN for each animal with and without the masker sounds and measured the differences between the signal dynamics across the two conditions by computing their correlation.

For animals with normal hearing, the correlation between the target only and target plus masker dynamics ranged from 0.25 to 0.75, with the masker having the largest impact when its center frequency was similar to that of the target (Figure 8f). When compared at the same high intensity (70 dB SPL), the correlation for animals with hearing loss was typically lower than normal when the masker center frequency was low, especially when the target center frequency was high (Figure 8g). This regime (high-frequency target, low-frequency masker) is comparable to the speech-in-noise scenario analyzed above (many consonants contain substantial high-frequency content, while speech babble is dominated by low frequencies, see Figure 7b), and the increased impact of the masker with hearing loss is consistent with observed hypersensitivity to background noise at the expense of foreground speech. For high-frequency maskers, the correlation for animals with hearing loss was often higher than normal, especially for low-frequency targets. This is unsurprising given the sloping nature of the hearing loss, but this regime (low-frequency target, high-frequency masker) is uncommon in real-world speech.

When the correlation for animals with normal hearing and hearing loss was compared either at the best intensity (Figure 8h) or after frequency-weighted amplification with the hearing aid (Figure 8i), the pattern of results was similar. In the regime that is most comparable to consonants in speech babble (low-frequency masker, high-frequency target), the correlation for animals with hearing loss was lower than normal. Thus, hearing loss appears to create deficits in spectral processing that manifest as highly nonlinear distortions in cross-frequency interactions that are particularly problematic for real-world speech in noise and cannot be corrected by simply compensating for lost sensitivity.

Discussion

In this study, we took advantage of recently developed tools for large-scale neural recordings and nonlinear modeling that allowed us to gain new insights into the impact of hearing loss on auditory processing at the network level. We first used a traditional approach to manifold learning to establish that the neural code for speech in the IC can be well described by low-dimensional latent signal dynamics that are common across animals with similar hearing status but fundamentally altered by hearing loss. We then trained a DNN to replicate neural coding in the IC with high accuracy using a framework that also facilitated manifold learning. The DNN exhibited dynamics in response to speech that were similar to those identified directly from IC recordings, and further probing of the DNN dynamics with novel sounds allowed us to identify changes in cross-frequency interactions as a key contributor to the distorted neural coding of speech-in-noise with hearing loss.

Suprathreshold effects of hearing loss

The effects of hearing loss beyond increased detection thresholds are often ignored in clinical assessment and treatment. But these suprathreshold effects are, in fact, the main problem for many people in real-world settings, such as busy workplaces or social gatherings, where sound intensities are high and amplification via a hearing aid provides little benefit. The clinical neglect of suprathreshold effects is not due to a lack of awareness, but rather to a lack of effective treatments. And the lack of effective treatments stems from a lack of understanding of how the many physiological changes that accompany hearing loss contribute to complex perceptual deficits.

Many specific suprathreshold impairments with plausible links to speech-in-noise perception have been identified, such as decreased frequency selectivity, dynamic range, or temporal resolution (Moore, 2007). But the extent to which each of these specific impairments contributes to real-world hearing problems has been difficult to determine. Our results suggest that changes in spectral processing, particularly in the interactions between different frequencies, are the primary problem. While impaired spectral processing was evident for simple tones as a clustering of dynamical trajectories within the signal manifold, this impairment does not appear to be a major problem for the neural coding of speech per se as suitable amplification corrected many of the distortions caused by hearing loss for speech in quiet. For speech in noise, however, there was a hypersensitivity to background noise with hearing loss that amplification (simple or frequency-weighted) did little to alleviate, and this was consistent with observed interactions between narrowband sounds, which revealed an increased disruption of high-frequency targets by low-frequency maskers both with and without amplification.

These results are consistent with a recent study that found that hearing loss caused the IC activity patterns elicited by different phonemes to become less distinct, and that a hearing aid failed to correct this problem for speech in noise (Armstrong et al., 2022). They are also consistent with a body of work demonstrating that listeners with hearing loss struggle to combine temporal envelope cues across frequency channels (Healy and Bacon, 2002; Healy and Carson, 2010; Souza and Boike, 2006; Grant et al., 2007). When speech is reduced to a single amplitude-modulated band, speech recognition performance is similar for all listeners, independent of their hearing status, suggesting that temporal processing of the speech envelope per se is unaffected by hearing loss. But as additional amplitude-modulated bands are added, performance increases more for normal hearing listeners than for those with hearing loss, suggesting that the latter group are less able to make use of complementary temporal information across multiple frequency channels. This difference is most pronounced when comparing the ability to make use of temporal modulations in high-frequency channels (4–6 kHz) in the presence of temporal modulations in lower frequency channels (1–2 kHz) (Grant et al., 2007), and it does not appear to be a simple consequence of broadened frequency tuning but rather a specific deficit in cross-frequency interactions (Healy and Carson, 2010).

Distorted spectral processing from cochlea to cortex

Understanding exactly what is going wrong with spectral processing after hearing loss at a mechanistic level remains a challenge. The effects of hearing loss on spectral processing in the cochlea have been well described in terms of the observed changes in the frequency tuning curves of individual AN fibers. After hearing loss, the tuning curve ‘tip’ (corresponding to the characteristic frequency [CF] to which the fiber is most sensitive) becomes less sensitive and may shift toward lower frequencies while the ‘tail’ (corresponding to frequencies below CF) may become more sensitive (Young, 2012). It is difficult to know the degree to which these changes distort the basic tonotopic map in the cochlea (i.e., the relationship between CF and cochlear position) because few studies have identified the cochlear position from which recorded fibers originate. The limited data that exist suggest that the effect of hearing loss on CF tonotopy is modest (Liberman and Kiang, 1984), but the effect on the tonotopic map of best frequency (BF; the frequency that elicits the strongest response from a fiber at higher intensities) can be much larger (Henry et al., 2016), and can be accompanied by more complex changes in spectral processing such as decreased synchrony capture (Young, 2012).

One recent study has provided insight into how the complex spectral distortions in the cochlea impact the neural coding of speech in noise in the AN (Parida and Heinz, 2022). In animals with mild-to-moderate sensorineural hearing loss, fibers with high CFs responded excessively to low-frequency sounds and it was this effect (rather than changes in frequency selectivity per se or temporal processing) that appeared to be most responsible for the disrupted neural coding of speech in noise. These results are consistent with our observations of hypersensitivity to background noise and increased disruption of high-frequency targets by low-frequency maskers in the IC. The distortion of spectral processing that we observe as clustering of network dynamics for simple tones in the IC, however, does not appear to be present in the AN. It is difficult to be certain since no directly comparable analysis has been performed on experimental AN responses, but our analysis of simulated AN responses suggests that hearing loss has more complex effects on the signal dynamics in the AN than simply causing clustering (Figure 4—figure supplement 1). This transformation from complex distortions in signal dynamics for tones in the AN to the simple clustering observed in the IC could arise from normal central processing of distorted peripheral inputs, but it is also possible that plastic changes in central processing that follow hearing loss facilitate a shift into a new dynamical regime.

Distorted spectral processing has also recently been observed in the auditory cortex after mild-to-moderate sloping sensorineural hearing loss (McGill et al., 2022). Animals displayed behavioral hypersensitivity for detection of tones at the edge frequencies around which the hearing loss increased from mild to moderate as well as an overrepresentation of these frequencies in the cortical tonotopic map of BF. The mechanisms underlying these phenomena are not entirely clear. Some cortical neurons tuned to these edge frequencies exhibited increased neural gain and synchrony, and direct stimulation of thalamocortical afferents demonstrated that hearing loss caused an increase in gain within the local cortical circuit. But the frequency tuning of the stimulated afferents was unknown and, thus, it is difficult to separate the effects that were cortical in origin from those that were inherited from lower levels. It is possible that the altered neural representation of spectral features that we observed in the IC results in changes in the coactivation patterns across the cortical network, prompting the plastic reorganization in the cortex. Future research should be focused on developing a coherent model of how peripheral and central changes combine to create auditory processing deficits, perhaps through coordinated experiments across many brain areas in a single species.

A new focus for hearing aid design

The complex suprathreshold effects of hearing loss that are evident in the distorted neural signal dynamics observed in this study present a difficult challenge for hearing aid designers. Current hearing aids compensate for changes in the threshold and dynamic range of auditory processing using a framework built around a bank of bandpass filters with automatic gain control. The signal processing that can be performed by such a framework is highly constrained, and it is difficult to imagine how it could be used to compensate for problems such as hypersensitivity to background noise that involve highly nonlinear interactions across frequency bands. It is possible that with a better understanding of exactly how different frequencies are interacting, new signal processing frameworks can be designed to offset the more complex effects of hearing loss. But engineering such a framework that is flexible enough to provide benefit in a wide range of real-world settings will require conceptual advances that may not be forthcoming in the near term.

One alternative approach to improving the perception of speech in noise that is already showing promise is speech enhancement, that is, the suppression of background noise. Hearing aids have offered noise suppression for many years, but the simple filters based on low-order statistics that are used by current devices often provide little real-world benefit (Brons et al., 2014; Cox et al., 2014). Recent research on speech enhancement via DNNs has demonstrated the potential to yield dramatic improvements in performance (Wang, 2017; Luo and Mesgarani, 2019). This ‘deep denoising’ may prove effective in situations with a single talker where it is obvious which sound is of interest. But in others, for example, with multiple talkers, a sound that is of primary interest one minute may become a distraction the next. It may be possible to implement cognitive control to direct enhancement toward the sound of interest but there are many significant technical challenges that must be overcome before this approach can be widely applied (Geirnaert et al., 2021).

A more flexible alternative is to identify optimal processing algorithms for hearing aids empirically by providing DNNs with the data they need to learn how best to transform sounds in order to elicit normal neural activity from an impaired auditory system (Lesica, 2018; Drakopoulos and Verhulst, 2022). By taking advantage of the nonlinear capacity of DNNs with minimal assumptions, it should be possible to identify novel general-purpose algorithms that go well beyond the hand-designed processing in current devices. Such algorithms would be especially valuable in important contexts such as listening to music – a major problem for hearing aid users (Madsen and Moore, 2014) – in which denoising cannot help. There are, of course, limits to the degree of hearing restoration that any hearing aid can provide in cases of severe hearing loss. But the vast majority of people with hearing loss have only mild-to-moderate cochlear damage (Wilson et al., 2017), and there should be sufficient functionality remaining within the auditory system for a hearing aid to leverage when attempting to elicit the necessary patterns of neural activity.

Modeling biological neural networks with DNNs

Building computational models of sensory processing has been a long-standing goal in systems neuroscience. Current models of the sensory periphery can be highly accurate. For example, there are numerous models of the cochlea that faithfully capture the transformation of incoming sound into basilar membrane motion and AN activity (Saremi et al., 2016; Verhulst et al., 2018). Models of sensory processing in the brain, however, have generally been much less accurate, with even the best models missing out on a significant fraction of the explainable variance in subcortical and cortical neural activity (Williamson et al., 2016; Rahman et al., 2020; McFarland et al., 2013; Vintch et al., 2015).

Until recently, efforts to model central sensory processing were constrained by the difficulties of fitting high-capacity models with limited experimental data. But deep learning has provided a new approach to fitting models that are well matched to sensory processing, and new methods for large-scale electrophysiology can provide the required big data. Initial efforts to use DNNs to model the sensory periphery have shown that they can be as accurate as hand-designed biophysical models. In one recent study, a DNN trained to replicate an established model of the cochlea provided accurate and generalizable predictions of the model’s response to speech and tones (Baby et al., 2021). In another study, a DNN trained on retinal ganglion cell activity predicted nearly all of the explainable variance in responses to natural images (Maheswaranathan et al., 2019).

DNN models of sensory processing in the brain have also been shown to outperform traditional models. DNN models of V1 responses to natural images explained 50 and 80% of the explainable variance in single-unit spike counts and calcium activity, respectively (Cadena et al., 2019; Walker et al., 2019), while DNN models of V4 explained 90% of the explainable variance in multi-unit spike counts (Bashivan et al., 2019). DNN models of A1 responses to speech and other natural sounds perform similarly well, explaining much of the explainable variance in high-gamma activity, fMRI voxel responses, or time-varying spike rates (Keshishian et al., 2020; Kell et al., 2018; Pennington and David, 2022).

Our results improve on these initial successes in several important ways. Firstly, our models simulate neural activity with full temporal resolution, that is, spike times with millisecond precision. While lower temporal resolution may be sufficient to describe sensory processing in some contexts, precise spike timing carries critical information about speech (Garcia-Lazaro et al., 2013). Secondly, the activity produced by our models is nearly indistinguishable from that recorded experimentally, capturing more than 95% of the explainable variance in many cases. This is especially remarkable considering the full temporal resolution (with lower resolution, variance at fine time scales, which is typically the most difficult to capture, is simply ignored). Finally, our use of a low-dimensional bottleneck allows us to achieve this performance within a framework that also provides a compact and interpretable description of the latent dynamics that underlie the full network activity patterns.

With these advances, it should now be possible to use computational models of the brain for exploratory basic research, with benefits that are both scientific (studies are no longer data limited) and ethical (animal experiments can be limited to confirmatory studies), as well as for technology development (such as improved hearing aids, as described above). With further effort, it may be possible to build models that are not only black-box simulators but also mirror the underlying structure of biological systems (Jazayeri and Ostojic, 2021; Chung and Abbott, 2021). Such models would provide powerful platforms for testing mechanistic hypotheses and developing new ways to address complex network-level dysfunctions that remain difficult to treat (such as tinnitus).

Methods

Experimental protocol

Experiments were performed on 12 young-adult gerbils of both sexes that were born and raised in standard laboratory conditions. Six of the animals were exposed to noise when they were 16–18 weeks old. (These six were chosen from among many that were noise exposed based on the pattern of hearing loss that they exhibited: sloping mild-to-moderate in both ears.) The number of animals used was not predetermined. Because of the investigative nature of the study, the key outcome measures were not known in advance and, thus, a pre-study power analysis based on anticipated effect sizes was not possible. The duration of the data collection from each animal was predetermined based on the results of preliminary experiments in which the amount of neural activity required for manifold analysis and deep learning to yield stable results was assessed. Assignment to the control and hearing loss groups was random on a per-animal basis (i.e., animals from the same litter were often assigned to different groups). Investigators were not blinded during data collection or analysis (since the difference between animals with normal hearing and hearing loss is immediately apparent upon the observation of sound-evoked neural activity), but all analyses were fully automated and objective. ABR recordings and large-scale IC recordings were made from all animals when they were 20–24 weeks old. All experimental protocols were approved by the UK Home Office (PPL P56840C21).

Noise exposure

Mild-to-moderate sensorineural hearing loss was induced by exposing anesthetized gerbils to high-pass filtered noise with a 3 dB/octave roll-off below 2 kHz at 118 dB SPL for 3 hr (Armstrong et al., 2022; Suberman et al., 2011). For anesthesia, an initial injection of 0.2 ml per 100 g body weight was given with fentanyl (0.05 mg per ml), medetomidine (1 mg per ml), and midazolam (5 mg per ml) in a ratio of 4:1:10. A supplemental injection of approximately 1/3 of the initial dose was given after 90 min. Internal temperature was monitored and maintained at 38.7°C.

Preparation for large-scale IC recordings

Animals were placed in a sound-attenuated chamber and anesthetized for surgery with an initial injection of 1 ml per 100 g body weight of ketamine (100 mg per ml), xylazine (20 mg per ml), and saline in a ratio of 5:1:19. The same solution was infused continuously during recording at a rate of approximately 2.2 μl per min. Internal temperature was monitored and maintained at 38.7°C. A small metal rod was mounted on the skull and used to secure the head of the animal in a stereotaxic device. The pinnae were removed and speakers (Etymotic ER-2) coupled to tubes were inserted into both ear canals along with microphones (Etymotic ER-10B+) for calibration. The frequency response of these speakers measured at the entrance of the ear canal was flat (±5 dB) between 0.2 and 8 kHz. Two craniotomies were made along with incisions in the dura mater, and a 256-channel multi-electrode array was inserted into the central nucleus of the IC in each hemisphere (Armstrong et al., 2022). The arrays were custom-designed to maximize coverage of the portion of the gerbil IC that is sensitive to the frequencies that are present in speech.

Auditory brainstem responses

Before beginning the IC recordings, ABRs were measured. Subdermal needles were used as electrodes with the active electrodes placed behind the ear over the bulla (one on each side), the reference placed over the nose, and the ground placed in a rear leg. Recordings were bandpass-filtered between 300 and 3000 Hz. The parallel ABR method (Polonenko and Maddox, 2019) was used, with randomly timed tones at multiple frequencies presented simultaneously and independently to each ear. The tone frequencies were 500, 1000, 2000, 4000, and 8000 Hz. Each tone was five cycles long and multiplied by a Blackman window of the same duration. Tones were presented at a rate of 40 per s per frequency with alternating polarity for 100 s at each intensity. The activity recorded in the 30 ms following each tone was extracted and thresholds for each frequency were defined as the lowest intensity at which the root mean square (RMS) of the median response across presentations was more than twice the RMS of the median activity recorded in the absence of sound.
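
The threshold criterion above is simple enough to express directly. The following Python sketch illustrates it under stated assumptions: the per-intensity arrays of evoked and baseline epochs, their organization as dictionaries keyed by intensity, and the function name are all hypothetical and not taken from the original analysis code.

```python
import numpy as np

def abr_threshold(evoked_by_intensity, baseline_by_intensity, intensities):
    """Lowest intensity at which the RMS of the median evoked response
    exceeds twice the RMS of the median baseline activity (sketch)."""
    for level in sorted(intensities):
        evoked = evoked_by_intensity[level]      # (n_presentations, n_samples)
        silent = baseline_by_intensity[level]    # (n_presentations, n_samples)
        rms_evoked = np.sqrt(np.mean(np.median(evoked, axis=0) ** 2))
        rms_silent = np.sqrt(np.mean(np.median(silent, axis=0) ** 2))
        if rms_evoked > 2 * rms_silent:
            return level
    return None  # no response detected within the tested intensity range
```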

Sounds presented during IC recordings

Speech

Sentences were taken from the TIMIT corpus (Garofolo, 1993) that contains speech read by a wide range of American English speakers. The entire corpus excluding ‘SA’ sentences was used (approximately 4.5 hr) and split into training and test sets (4.25 hr and 0.25 hr, respectively; not to be confused with the suggested training/test subdivisions in the TIMIT documentation). The training set was presented twice, once on its own and once with background noise. The test set was presented four times, twice in quiet and twice with the same background noise. The intensity for each sentence was chosen at random from 55, 65, 75, or 85 dB SPL. The speech-to-noise ratio (SNR) was chosen at random from either 0 or 10 dB when the speech intensity was 55 or 65 dB SPL (as is typical of a quiet setting such as a home or an office) or –10 or 0 dB when the speech intensity was 75 or 85 dB SPL (as is typical of a noisy setting such as a pub). The intensities of the sentences for the two presentations of the test set in quiet were identical, as were the intensity, SNR, and specific noise used for the two presentations of the test set with background noise.

Noise

Background noise sounds were taken from the Microsoft Scalable Noisy Speech Dataset (Reddy et al., 2019), which includes recordings of environmental sounds from a large number of different settings (e.g., café, office, roadside) and specific noises (e.g., washer-dryer, copy machine, public address announcements). A total of 4.5 hr of unique noises were used to match the duration of the presented speech. The intensity of the noise presented with each sentence was determined by the intensity of the speech and the SNR as described above.

Multi-unit activity

MUA was measured from recordings on each channel of the electrode array as follows: (1) a bandpass filter was applied with cutoff frequencies of 700 and 5000 Hz; (2) the standard deviation of the background noise in the bandpass-filtered signal was estimated as the median absolute deviation/0.6745 (this estimate is more robust to outlier values, e.g., neural spikes, than direct calculation); (3) times at which the bandpass-filtered signal made a positive crossing of a threshold of 3.5 standard deviations were identified and grouped into bins with a width of 1.3 ms. Only units with a signal correlation (across repeated trials of the speech in the test set) of 0.2 or higher were used for manifold learning and DNN training (420 ± 24 [mean ± SD] units from each animal out of 512 total channels).
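
A minimal Python sketch of this three-step procedure for a single channel is shown below. The filter order, the use of scipy's butter/filtfilt, and the function name are assumptions rather than details taken from the original MATLAB pipeline.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def multiunit_activity(raw, fs, bin_width_s=1.3e-3):
    """Threshold-crossing MUA from one recording channel (sketch)."""
    # 1) Bandpass filter between 700 and 5000 Hz (filter order assumed).
    b, a = butter(2, [700, 5000], btype='bandpass', fs=fs)
    x = filtfilt(b, a, raw)
    # 2) Robust estimate of the background-noise standard deviation.
    sigma = np.median(np.abs(x - np.median(x))) / 0.6745
    # 3) Positive crossings of a 3.5-sigma threshold.
    above = x > 3.5 * sigma
    crossings = np.flatnonzero(np.diff(above.astype(int)) == 1) + 1
    # 4) Count crossings in 1.3 ms bins.
    n_bins = int(np.ceil(len(x) / fs / bin_width_s))
    edges = np.arange(n_bins + 1) * bin_width_s
    counts, _ = np.histogram(crossings / fs, bins=edges)
    return counts
```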

Analysis of recorded neural activity

For each animal, the MUA was represented as an $M \times T$ matrix, where $M$ is the number of units and $T$ is the number of time bins. Separate matrices $R_{\mathrm{train}}$, $R_{\mathrm{test1}}$, and $R_{\mathrm{test2}}$ were formed for the training set and each repetition of the test set (see ‘Speech’ above).

Dimensionality of signal manifold

We applied PCA to $R_{\mathrm{train}}$ (after subtracting the mean from each row) to obtain the PCs, ranked in order of the amount of neural variance they explain. We projected the activity in $R_{\mathrm{test1}}$ onto a chosen number of PCs to obtain the latent dynamics within the manifold spanned by those PCs, yielding a new $D \times T$ matrix $X_{\mathrm{test1}} = Z R_{\mathrm{test1}}$, where $Z$ is the $D \times M$ matrix containing the first $D$ PCs. We reconstructed the activity in $R_{\mathrm{test1}}$ from the latent dynamics as $\hat{R}_{\mathrm{test1}} = Z^{T} X_{\mathrm{test1}}$ (plus the originally subtracted means) and measured the total variance explained as the ratio of the covariance between $R_{\mathrm{test1}}$ and $\hat{R}_{\mathrm{test1}}$ and the square root of the product of their variances. We reconstructed the activity in $R_{\mathrm{test2}}$ from the same latent dynamics as $\hat{R}_{\mathrm{test2}} = Z^{T} X_{\mathrm{test1}}$ (plus the means of the rows of $R_{\mathrm{test2}}$) and measured the signal variance explained as the ratio of the covariance between $R_{\mathrm{test2}}$ and $\hat{R}_{\mathrm{test2}}$ and the square root of the product of their variances. We defined the dimensionality of the signal manifold for each animal as the number of PCs required to explain 95% of the signal variance.
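
The sketch below illustrates this calculation in Python using an SVD-based PCA. The handling of the test-set means (subtracted before projection and added back after reconstruction) is an assumption where the text leaves it implicit, and the function name is hypothetical.

```python
import numpy as np

def manifold_variance_explained(R_train, R_test1, R_test2, n_pcs):
    """Total and signal variance explained by the top n_pcs PCs of the
    training activity (sketch). R_* are units x time-bins matrices."""
    # PCs (spatial patterns) of the mean-subtracted training activity.
    Rc = R_train - R_train.mean(axis=1, keepdims=True)
    U, S, Vt = np.linalg.svd(Rc, full_matrices=False)
    Z = U[:, :n_pcs].T                               # D x M projection matrix

    def corr(a, b):
        # cov(a, b) / sqrt(var(a) * var(b)), i.e., the Pearson correlation.
        return np.corrcoef(a.ravel(), b.ravel())[0, 1]

    # Latent dynamics from test repetition 1 and reconstructions.
    mu1 = R_test1.mean(axis=1, keepdims=True)
    X1 = Z @ (R_test1 - mu1)                         # D x T latent dynamics
    total = corr(R_test1, Z.T @ X1 + mu1)            # total variance explained
    mu2 = R_test2.mean(axis=1, keepdims=True)
    signal = corr(R_test2, Z.T @ X1 + mu2)           # signal variance explained
    return total, signal
```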

Similarity of signal dynamics

We measured the similarity between the signal dynamics for different animals as the variance explained after linear regression of one set of dynamics $X_{\mathrm{test1}}$ onto another, $Y_{\mathrm{test1}} = X_{\mathrm{test1}}\beta + \varepsilon$, where $\beta$ is a matrix of regression coefficients and $\varepsilon$ is a vector of error terms.
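
A minimal sketch of this similarity measure, assuming time bins are treated as samples (rows) and an intercept is included (both assumptions; the function name is hypothetical):

```python
import numpy as np

def dynamics_similarity(X, Y):
    """Variance in Y explained after linearly regressing Y onto X (sketch).
    X and Y are time-bins x dimensions matrices of latent dynamics."""
    X1 = np.column_stack([X, np.ones(len(X))])   # include an intercept term
    beta, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    residual = Y - X1 @ beta
    return 1.0 - residual.var() / Y.var()        # pooled R^2 across dimensions
```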

Deep neural network models

DNNs were used to transform sound input into neural activity across four stages: (1) a SincNet layer (Ravanelli and Bengio, 2018) with 48 bandpass filters of length 32 samples, each with two learnable parameters (center frequency, bandwidth), followed by symmetric log activations $y = \mathrm{sgn}(x)\log(|x| + 1)$; (2) a stack of five 1-D convolutional layers, each with 128 filters of length 32 samples and stride 2, followed by PReLU activations; (3) a 1-D bottleneck convolutional layer with a specified number of filters of length 32 and stride 1, followed by PReLU activations; and (4) a linear readout layer followed by exponential activations. The only hyperparameter that was varied was the number of filters in the bottleneck layer. For comparison with a linear–nonlinear (LN) model, we used a network with the same stages 1 and 4 and a single convolutional layer between them with 128 filters of length 256 samples and stride 1, followed by PReLU activations.
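
For illustration only, the four stages can be sketched as follows in PyTorch (the original models were implemented in MATLAB). The learnable SincNet front end is replaced here by an ordinary Conv1d for brevity, and the padding choices, default bottleneck size, and class name are assumptions.

```python
import torch
import torch.nn as nn

class SymLog(nn.Module):
    """Symmetric log activation, y = sign(x) * log(|x| + 1)."""
    def forward(self, x):
        return torch.sign(x) * torch.log1p(torch.abs(x))

class ICResponseModel(nn.Module):
    """Sound-to-IC-activity network in the spirit of the four stages above
    (sketch; the SincNet front end is replaced by a plain Conv1d)."""
    def __init__(self, n_units, n_bottleneck=8):
        super().__init__()
        self.frontend = nn.Sequential(           # stage 1 (SincNet stand-in)
            nn.Conv1d(1, 48, kernel_size=32, padding=16), SymLog())
        enc, in_ch = [], 48                       # stage 2: five strided convs
        for _ in range(5):
            enc += [nn.Conv1d(in_ch, 128, kernel_size=32, stride=2, padding=16),
                    nn.PReLU()]
            in_ch = 128
        self.encoder = nn.Sequential(*enc)
        self.bottleneck = nn.Sequential(          # stage 3
            nn.Conv1d(128, n_bottleneck, kernel_size=32, stride=1, padding=16),
            nn.PReLU())
        self.readout = nn.Conv1d(n_bottleneck, n_units, kernel_size=1)  # stage 4

    def forward(self, sound):                     # sound: (batch, 1, samples)
        z = self.bottleneck(self.encoder(self.frontend(sound)))
        return torch.exp(self.readout(z))         # exponential output nonlinearity
```

The bottleneck output in this sketch corresponds to the layer whose activations are analyzed as signal dynamics in the Results.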

Training

Models were trained to transform 24,414.0625 Hz sound input frames of length 8192 samples into 762.9395 Hz neural activity frames of length 192 samples (corresponding to temporal decimation by a factor of $2^5 = 32$ via the five strided convolutions in the encoder block, plus a final cropping layer that removed 32 samples at the start and end of each frame to eliminate convolutional edge effects). Sound inputs were scaled such that an RMS of 1 corresponded to a level of 94 dB SPL. Training was performed in MATLAB on a local PC with GPUs (2x NVIDIA RTX 3080) with a batch size of 64 for 10 epochs and took about 8 hr for a typical model. The Adam optimizer was used with a learning rate of 0.0001. The optimization was framed as Poisson regression with the loss function $\sum_{M,T}\left(\hat{R} - R\log\hat{R}\right)$, where $R$ is the recorded neural activity, $\hat{R}$ is the network output, $M$ is the number of units, and $T$ is the number of time bins.
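
A minimal sketch of the summed Poisson loss is shown below; the small epsilon guard is an assumption added for numerical safety. PyTorch's built-in `PoissonNLLLoss(log_input=False, reduction='sum')` computes essentially the same quantity.

```python
import torch

def poisson_loss(r_hat, r):
    """Poisson regression loss summed over units and time bins,
    sum(r_hat - r * log(r_hat)), as in the text (sketch)."""
    eps = 1e-8                       # guard against log(0); assumed detail
    return (r_hat - r * torch.log(r_hat + eps)).sum()
```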

Validation

For each animal, data were split into training and test sets (see ‘Speech’ above). The training set was used to learn the optimal values of the DNN parameters. The final performance of the optimized network was measured on the test set by calculating the percent of the explainable variance in the recorded responses that was explained by the network outputs, based on the ratio of the covariance between $R_{\mathrm{test1}}$ and $\hat{R}$ and the covariance between $R_{\mathrm{test1}}$ and $R_{\mathrm{test2}}$, where $\hat{R}$ is the network output and $R_{\mathrm{test1}}$ and $R_{\mathrm{test2}}$ are the recorded responses to the two presentations of the test speech.
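
A minimal sketch of this metric, assuming the covariances are computed over all units and time bins flattened together (the text does not specify the pooling), with hypothetical variable names:

```python
import numpy as np

def fraction_explainable_variance(R_test1, R_test2, R_hat):
    """Percent of explainable variance captured by the model output,
    cov(R_test1, R_hat) / cov(R_test1, R_test2) (sketch)."""
    def cov(a, b):
        return np.cov(a.ravel(), b.ravel())[0, 1]
    return 100 * cov(R_test1, R_hat) / cov(R_test1, R_test2)
```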

Analysis of bottleneck activations

For each animal, the activations in the bottleneck layer for all sounds from a given class (e.g., all pure tones or all consonants in noise) were extracted to form the $D_b \times T$ signal dynamics matrix $X$, where $D_b$ is the number of bottleneck channels and $T$ is the number of time bins. For visualization, we applied PCA to the dynamics in $X$ and projected them onto a chosen number of PCs.

Sounds presented to trained DNNs

Pure tones

100 ms tones with frequencies ranging from 500 Hz to 8000 Hz in 0.2 octave steps; intensities ranging from 25 dB SPL to 100 dB SPL in 5 dB steps; 10 ms cosine on and off ramps; and a 100 ms pause between tones.

SAM noise, fixed modulation depth

100 ms bursts of bandpass noise with cutoff frequencies of 500 and 8000 Hz; a sinusoidal envelope with frequencies ranging from 10 Hz to 240 Hz in 10 Hz steps and a modulation depth of 1; intensities ranging from 25 dB SPL to 100 dB SPL in 5 dB steps; 10 ms cosine on and off ramps; and a 100 ms pause between tones.

SAM noise, fixed modulation frequency

100 ms bursts of bandpass noise with cutoff frequencies of 500 and 8000 Hz; a sinusoidal envelope with a modulation depth ranging from 0.1 to 1 in 20 logarithmic steps and a frequency of 30 Hz; intensities ranging from 25 dB SPL to 100 dB SPL in 5 dB steps; 10 ms cosine on and off ramps; and a 100 ms pause between tones.

Isolated consonants

Speech utterances were taken from the Articulation Index LSCP (LDC Cat# LDC2015S12). Utterances were from 10 American English speakers (five males, five females). Each speaker pronounced consonant-vowel syllables made from all possible combinations of 22 consonants and 13 vowels. For each utterance, the border between the consonant and vowel was identified in a semi-automated manner (a clustering algorithm [MATLAB linkage] was applied to the spectrogram time bins to identify two clusters based on a correlation metric and the border between them was inspected and corrected if needed), values after the end of the consonant were set to zero (with a 2 ms linear ramp), and the utterance was truncated to 200 ms. Utterances were presented in random order with a 175 ms pause between sounds at intensities ranging from 25 dB SPL to 100 dB SPL in 5 dB steps.

Multi-talker speech babble noise

Continuous speech from 16 different British English speakers from the UCL Scribe database (https://www.phon.ucl.ac.uk/resource/scribe) was summed to create speech babble. The intensity of the babble was set based on the intensity of the isolated consonants to achieve a speech-to-noise ratio of 3 dB.

Narrowband target

100 ms bursts of bandpass noise with center frequencies ranging from 500 Hz to 8000 Hz in 0.5 octave steps and a bandwidth of 0.5 octaves; a sinusoidal envelope with a modulation depth of 1 and a frequency of 20 Hz; intensities ranging from 25 dB SPL to 100 dB SPL in 5 dB steps; 10 ms cosine on and off ramps; and a 100 ms pause between tones.

Narrowband noise

100 ms bursts of bandpass noise with center frequencies ranging from 500 Hz to 8000 Hz in 0.5 octave steps and a bandwidth of 0.5 octaves; a pink noise envelope (power scaled as inverse of frequency) with a modulation depth ((peak – trough)/peak) of 1; an intensity matched to that of the narrowband target; 10 ms cosine on and off ramps; and a 100 ms pause between tones.

Hearing aid simulation

A 10-channel wide-dynamic range compression hearing aid was simulated using a program provided by Prof. Joshua Alexander (Purdue University) (Alexander and Masterson, 2015). The crossover frequencies between channels were 200, 500, 1000, 1750, 2750, 4000, 5500, 7000, and 8500 Hz. The intensity thresholds below which amplification was linear for each channel were 45, 43, 40, 38, 35, 33, 28, 30, 36, and 44 dB SPL. The attack and release times (the time constants of the changes in gain following an increase or decrease in the intensity of the incoming sound, respectively) for all channels were 5 and 40 ms, respectively. The gain and compression ratio for each channel were fit individually for each ear of each gerbil using the Cam2B.v2 software provided by Prof. Brian Moore (Cambridge University) (Moore et al., 2010). The gain before compression typically ranged from 10 dB at low frequencies to 30 dB at high frequencies. The compression ratios typically ranged from 1 to 2.5, that is, when compression was engaged, an increase in sound intensity of between 1 dB and 2.5 dB was required to elicit a 1 dB increase in the hearing aid output.
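
To make the compression scheme concrete, here is a deliberately simplified single-band sketch with attack/release envelope smoothing. It is not the Alexander simulator: the 10-channel filter bank and the Cam2B fitting are omitted, and the envelope-follower form and dB reference are assumptions; only the 5/40 ms time constants mirror the values quoted above.

```python
import numpy as np

def wdrc_band(x, fs, threshold_db, gain_db, ratio,
              attack_s=0.005, release_s=0.040):
    """Simplified single-band wide-dynamic-range compressor (sketch).
    x is a band-limited signal; threshold_db is the knee point below
    which amplification is linear (same dB reference as level_db)."""
    # Envelope follower with separate attack and release time constants.
    a_att = np.exp(-1.0 / (attack_s * fs))
    a_rel = np.exp(-1.0 / (release_s * fs))
    env = np.zeros_like(x, dtype=float)
    level = 1e-6
    for i, s in enumerate(np.abs(x)):
        a = a_att if s > level else a_rel
        level = a * level + (1 - a) * s
        env[i] = level
    level_db = 20 * np.log10(env + 1e-12)     # level re. an assumed reference
    # Fixed gain below threshold; above it, output grows 1/ratio dB per dB.
    over = np.maximum(level_db - threshold_db, 0.0)
    gain = gain_db - over * (1.0 - 1.0 / ratio)
    return x * 10 ** (gain / 20)
```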

Representational similarity analysis

For each animal, the signal dynamics matrix $X$ was reshaped to yield $\tilde{X}$, an $S \times (D_b \cdot T_S)$ matrix, where $S$ is the number of sounds from a given class and $T_S$ is the number of time bins associated with an individual sound. An $S \times S$ representational dissimilarity matrix (RDM) was formed, with each entry equal to one minus the correlation between a pair of rows in $\tilde{X}$. To compute overall representational similarity, the upper triangular values (excluding the diagonal) from two $\tilde{X}$ matrices were reshaped into vectors and the correlation between them was computed. For speech, dynamics were averaged across all instances of each consonant before RDMs were computed.
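
A compact sketch of the RDM construction and comparison; the function names are hypothetical.

```python
import numpy as np

def rdm(X_tilde):
    """Representational dissimilarity matrix: one minus the correlation
    between the dynamics for each pair of sounds (sketch).
    X_tilde is sounds x features (bottleneck channels x time bins)."""
    return 1.0 - np.corrcoef(X_tilde)

def rdm_similarity(rdm_a, rdm_b):
    """Correlation between the upper-triangular entries of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]
```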

Canonical correlation analysis

To align two sets of signal dynamics, $U = XA$ and $V = YB$ were computed using QR factorization and singular value decomposition (MATLAB canoncorr), where $X$ and $Y$ are the matrices containing the original dynamics, $A$ and $B$ are the matrices containing the canonical components, and $U$ and $V$ are the aligned dynamics. Overall similarity after alignment was computed as $\sum_{d=1}^{D_b} \rho(U_d, V_d)\,\big(\rho(\hat{X}_d, X)^2 + \rho(\hat{Y}_d, Y)^2\big)/2$, with the second term in the product acting as the ‘weight for the correlation associated with each pair of CCs’ that is referred to in the ‘Results.’ $U_d$ and $V_d$ are the projections of $X$ and $Y$ onto the $d$th pair of canonical components, $\hat{X}_d = X a_d (a_d^{T} a_d)^{-1} a_d^{T}$ and $\hat{Y}_d = Y b_d (b_d^{T} b_d)^{-1} b_d^{T}$ are the reconstructions of $X$ and $Y$ from the $d$th pair of canonical components, and $\rho$ denotes point-by-point correlation. To jointly align more than two sets of dynamics, multiway CCA was used (de Cheveigné et al., 2019; MATLAB NoiseTools nt_mcca).
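
The sketch below illustrates the pairwise alignment and weighted similarity in Python, following the same QR/SVD route used by MATLAB's canoncorr. It assumes the dynamics are arranged as time bins x bottleneck channels with more time bins than channels and full-rank dynamics; the function name is hypothetical.

```python
import numpy as np

def cca_similarity(X, Y):
    """Pairwise CCA alignment and weighted similarity (sketch).
    X and Y are time-bins x bottleneck-channels matrices of dynamics."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # CCA via QR factorization and SVD.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    Uu, s, Vt = np.linalg.svd(Qx.T @ Qy)
    A = np.linalg.solve(Rx, Uu)          # canonical components for X
    B = np.linalg.solve(Ry, Vt.T)        # canonical components for Y
    U, V = Xc @ A, Yc @ B                # aligned dynamics

    def rho(a, b):
        return np.corrcoef(a.ravel(), b.ravel())[0, 1]

    total = 0.0
    for d in range(U.shape[1]):
        a_d, b_d = A[:, [d]], B[:, [d]]
        # Reconstructions of X and Y from the d-th pair of components.
        X_hat = Xc @ (a_d @ np.linalg.pinv(a_d))
        Y_hat = Yc @ (b_d @ np.linalg.pinv(b_d))
        weight = (rho(X_hat, Xc) ** 2 + rho(Y_hat, Yc) ** 2) / 2
        total += rho(U[:, d], V[:, d]) * weight
    return total
```

The weight term downplays canonical components that explain little of either animal's dynamics, so the summary similarity is dominated by the components that matter.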

Decoding signal dynamics

For each animal, the signal dynamics matrix X was reshaped such that each row contained the dynamics for one consonant instance. A support vector machine was trained (MATLAB fitcecoc) to identify consonants from signal dynamics with a max-wins voting strategy based on all possible combinations of binary classifiers and tenfold cross-validation.
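
An approximate re-expression of the decoding analysis using scikit-learn (the original used MATLAB's fitcecoc); the linear kernel and feature standardization are assumptions, and SVC's built-in one-vs-one multiclass voting stands in for the max-wins strategy described above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def decode_consonants(X, labels):
    """Tenfold cross-validated consonant identification from signal
    dynamics (sketch). X is instances x features; labels gives the
    consonant identity of each row."""
    clf = make_pipeline(StandardScaler(), SVC(kernel='linear'))
    scores = cross_val_score(clf, X, np.asarray(labels), cv=10)
    return scores.mean()
```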

Data availability

The metadata, ABR recordings, and a subset of the IC recordings analyzed in this study are available on figshare (DOI:10.6084/m9.figshare.845654). We have made only a subset of the IC recordings available because they are also being used for commercial purposes. These purposes (to develop improved assistive listening technologies) are distinct from the purpose for which the recordings are used in this manuscript (to better understand the fundamentals of hearing loss). Researchers seeking access to the full set of neural recordings for research purposes should contact the corresponding author via e-mail to set up a material transfer agreement. The custom code used for training the deep neural network models for this study is available at https://github.com/nicklesica/dnn, (copy archived at Lesica, 2023).

The following data sets were generated
    1. Lesica N
    (2023) figshare
    Data from Sabesan et al., 2023.
    https://doi.org/10.6084/m9.figshare.845654

References

  1. Software
    1. Garofolo JS
    (1993)
    TIMIT acoustic-phonetic continuous speech corpus
    Linguistic Data Consortium.
  2. Book
    1. Humes LE
    2. Dubno JR
    (2010) Factors affecting speech understanding in older adults
    In: Gordon-Salant S, Frisina RD, Popper AN, Fay RR, editors. The Aging Auditory System. Springer. pp. 211–257.
    https://doi.org/10.1007/978-1-4419-0993-0_8
  3. Software
    1. World Health Organization
    (2021)
    World report on hearing
    WHO.
  4. Book
    1. Young ED
    (2012) Neural coding of sound with cochlear damage
    In: Noise-Induced Hearing Loss: Scientific Advances. Springer. pp. 87–135.
    https://doi.org/10.1007/978-1-4419-9523-0

Decision letter

  1. Björn Herrmann
    Reviewing Editor; Baycrest, Canada
  2. Barbara G Shinn-Cunningham
    Senior Editor; Carnegie Mellon University, United States
  3. Stephen V David
    Reviewer; Oregon Health and Science University, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Distorted neural signal dynamics create hypersensitivity to background noise after hearing loss" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Barbara Shinn-Cunningham as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Stephen V David (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) The reviewers raised concerns about the generalizability of the study's approach. The approach hinges on the deep neural network getting things right, such that it generalizes across sounds. The study's claims would be substantially more convincing if the authors included data that validates their model's predictions about coding of speech in noise, tones in noise, and SAM noise in NH vs. HL animals. It appears that they have data in hand for speech and speech-in-noise from the same animals that they could analyze using methods already in the manuscript. If they are unable to validate any of these predictions, the authors should revise the manuscript to emphasize that they remain predictions until they can be validated with additional data in a different study.

2) Several labs have studied changes in inferior colliculus and cortex, but their work is not acknowledged in this manuscript. For example, the work by the Sanes lab at NYU and Polley lab at Harvard have advanced theories around decreased inhibition to accommodate reduced peripheral input. This work has also implicated deficits in temporal processing that do not at the surface appear consistent with the current study (e.g., see Sanes and Yao eLife 2018). The authors would want to place their work in the context of these and other works more clearly.

3) Details about some statistical tests were hard to find (e.g., only in Table S1), but it also appears that the authors still make important statements without statistical justification, for example, related to NH/HL+noise vs. NH/HL+quiet (Figures 6 vs. Figures 7). There are several other cases where conclusions, e.g., about dimensionality, are not supported by a statistical test. The authors would want to make sure that all their conclusions are supported quantitatively.

4) The authors would also want to flesh out the argument for why the same effects would not be present in the nerve.

Reviewer #1 (Recommendations for the authors):

Suggestions for authors:

-at the end of intro, I think you could make it a little more explicit that the DNN is being trained to predict PC responses from sound.

For people who will not be familiar with the practical constraints that necessitate a design with separate groups of normal-hearing and hearing-impaired gerbils, you might state explicitly early on that you are comparing separate groups of NH and HI gerbils.

Figure 1 caption – why are there two sample sizes given in the last sentence?

Line 83 – you might give some flavor as to the noise types that were used

Figure 2A is really well done -- you made a pretty complicated list of comparisons quite straightforward to follow

Line 111 and Figure 2 – I must be missing something, but I don't see how you can approach 100% explained signal variance given the way that I think it is calculated. Doesn't the noise variance show up in the denominator?

Line 130 – I suggest motivating/justifying the additional linear transformation for the reader

Figure 3 – can anything be said about how the left panel of Figure 3d looks pretty different from Figure 2i?

I don't understand what constitutes a "recorded unit". The methods refer to multi-unit activity. Is a unit just an electrode from the array of 512? Or is there spike sorting being performed? How is any non-stationarity in the recordings dealt with (e.g. if neurons die, or the brain moves a little w.r.t. the electrode array)?

Line 177 – I found myself wondering how the results would compare to a more conventional model with a STRF for each neuron. I suspect lots of people will wonder how the DNN compares to something like that.

Lines 183-185 – give numbers for the similarity, to be parallel to earlier sections

Lines 219-220 – the "clustering of dynamics" referred to here was not all that evident to this reviewer from eyeballing the figure – please make what you mean more explicit, and clarify how this is different from refs 18 and 19

Lines 232-233 – I recommend making the RDMs more conventional and just having more of them in the figure – I think people will find the asymmetry confusing when they page through the paper

Line 241 – are the numbers mean and SD? Please specify.

Line 246 – I didn't completely understand what would constitute a distortion to the "overall structure of the dynamics" – could you give an example?

Multiple figures – I don't think the asterisks are defined clearly, and I believe the mean different things in different figures. Please label more explicitly, and/or mention in each caption.

Line 287 – I found myself wondering about the possible effect of phase shifts or increases in response latency, which one might imagine could occur with hearing loss. I think the analysis would be highly vulnerable to this, especially given that the encoding of modulation is partly synchrony-based. The fact that the modulation analysis shows pretty similar results for NH and HI suggests there is not much of this, but some readers may wonder about this.

At several points throughout the paper, I found myself wondering about the effects of compression. I would have been greatly interested to see an analysis that separately manipulated compression (e.g., turning it off), to see how much benefit it produces on restoring normal-like responses. I also would have liked to see some discussion of its effects.

Line 344 – for this analysis, I was hoping to have chance levels explicitly stated, and ideally labeled on the graph.

Figure 7d – this panel is pretty confusing, partly because the SPL numbers are inside particular plots, so it is not completely clear what they apply to, and partly because the little numbers in the plots are not labeled or defined anywhere.

Line 404-406 – how does this jive with the findings of distorted tonotopy from the Heinz lab?

Line 408 and onwards – the text refers to CCs but the figure is labeled as PCs

Line 438 – why is this coherence rather than correlation?

Line 457-459 – these lines state the conclusions of the paper, but I think they could be more explicitly linked to the results that come earlier. Explain why the distortions are nonlinear, and explain why the effects involve cross-frequency interactions.

Lines 472-473 – the statement here (and the earlier one on line 33) seems a little too strong given the widespread prevalence of noise reduction, and the widespread use of speech in noise diagnostics in audiometric evaluations

Line 484 and earlier – can the clustering be explained merely by audibility (e.g., all the stimuli that are inaudible cluster together, for the uninteresting reason that they do not evoke a response)?

Line 496 – the claim here needs a reference

Line 511 – I wanted to know more about the absence of evidence of clustering in nerve responses. This seems critical.

Line 586 and onwards – I think the conclusions/suggestions here should be tempered given that there are almost surely going to be limits to how well DNN models trained in this way will generalize to arbitrary stimuli. And you might acknowledge some of these limitations.

Line 606 – I think it might be helpful to specify what a material transfer agreement would involve – does this just mean someone agrees not to share it with anyone else?

Line 691 – why is "/ 0.6745" here? Is this a typo?

Line 697 – what is a "unit"?

Line 768 – I wondered whether setting values to 0 without any windowing might induce artifacts…

Line 784 – it seems plausible that the hearing aid settings are suboptimal. In particular, the extent of compression is based on some informal optimization in humans. Could this partly explain the less than complete restoration of normal responses?

Line 810 – it would help to link this to the weights that are described in the Results section. It took me a couple reads to make the connection.

Overall, the statistical tests and quantitative comparisons are somewhat buried. There are a lot of statistical comparisons via color map (i.e., Figures 2H-I and 3D) where a scatter or bar plot with error bars might be more helpful.

Reviewer #2 (Recommendations for the authors):

1. While there are many open questions around central deficits following hearing loss, several labs have studied changes in IC and cortex, but their work is not acknowledged in this manuscript. In particular the Sanes lab at NYU and Polley lab at Harvard have advanced theories around decreased inhibitory tone to accommodate diminished bottom-up drive. Relevant to the current study, this work has implicated deficits in temporal processing that do not at the surface appear consistent with the current study (eg, see Sanes and Yao eLife 2018). Hearing loss and the neural coding of sound are complex, so the concern is not about the validity of the current results as much as how they fit in with the existing literature. Currently, the manuscript reads as if this previous work was never completed, and that issue should be addressed.

2. In general, the results of fairly sophisticated analyses are presented clearly, which is great. After some hunting, it was possible to find important details about some statistics in Table S1, but it appears that the authors still make important statements without statistical justification. Of particular importance to the main conclusions, the increased dissimilarity for NH/HL+noise vs. NH/HL+quiet (Figure 6 vs. Figure 7) needs to be demonstrated by a quantitative comparison between them. Table S1 doesn't appear to contain anything about comparisons between data in the different figures. Please provide quantitative support for the statement that "… neither was sufficient to bring the similarity close to normal levels" (Line 379). There are several other cases where conclusions, eg, about dimensionality, are not supported by a statistical test. The authors should make sure that all their conclusions are supported quantitatively. It would also

3. The performance of the DNN is impressive, providing a reasonable motivation for the subsequent analysis of "bottleneck PCs" for activity simulated by the model. However, one worries that since the models were not fit to stimuli tested in the simulation, that the results may not actually be reciprocated in actual neural activity. One contrast, in particular (speech in quiet vs. speech in noise), was actually collected experimentally, and it seems like the authors could validate their decoding analysis with the actual neural data. Can't the neural responses be projected back into the bottleneck space and be used to decode the same way as the DNN simulations? Such an analysis would substantially strengthen the study. Alternatively, the authors should include a caveat in the Discussion that the DNN simulations may not actually generalize to actual neural activity. The authors may wish to argue that this is a small concern, but the finding of such low-dimensional PC bottleneck is quite surprising, and it's not clear if dimensionality would be as small if the actual stimuli (pure tones, SAM noise) were included in the fit set.

https://doi.org/10.7554/eLife.85108.sa1

Author response

Essential revisions:

1) The reviewers raised concerns about the generalizability of the study's approach. The approach hinges on the deep neural network getting things right, such that it generalizes across sounds. The study's claims would be substantially more convincing if the authors included data that validates their model's predictions about coding of speech in noise, tones in noise, and SAM noise in NH vs. HL animals. It appears that they have data in hand for speech and speech-in-noise from the same animals that they could analyze using methods already in the manuscript. If they are unable to validate any of these predictions, the authors should revise the manuscript to emphasize that they remain predictions until they can be validated with additional data in a different study.

We understand the concern about generalization to out-of-sample inputs. A model trained on one set of inputs may not necessarily produce accurate simulations of responses to a different set of inputs. Thus, in order for our analysis of model latent representations of non-speech sounds to be compelling, we must demonstrate that the model responses to such sounds are an accurate simulation of the true responses.

(Some of the reviewer comments seem to suggest that we also need to demonstrate the model’s accuracy for speech and speech-in-noise sounds, but this is already shown in the original Figure 3c; the performance shown is for speech and speech-in-noise sounds that were not part of the training set.)

The non-speech sounds that we used in the study were pure tones and SAM noise. For pure tones, we were able to verify that the models generalized well by using recordings from the same animals on which the models were originally trained (without including the tone responses in training). For SAM noise, we did not have recordings from the original animals. However, this provided an opportunity to further test the generality of the model by using transfer learning to predict responses from new animals for which we had responses to SAM noise as well as a small sample of speech. We froze the DNN encoder after training on animals from the original dataset and retrained only the linear readout using the speech responses for the new animals. We then tested the ability of the updated models to predict SAM noise responses for the new animals, and they performed well. This new work is described in the revised Results (see new Figure 3f,g and associated text).
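To make the transfer-learning procedure described above concrete, here is a minimal PyTorch-style sketch. It is an illustration only, not the code used in the study: the module names, tensor shapes, loss, and optimizer settings are all assumptions.

```python
import torch
import torch.nn as nn

# Load a previously trained encoder (assumed to have been saved as a full
# nn.Module); freeze it so that only the new linear readout is trained.
encoder = torch.load("encoder_trained_on_original_animals.pt")
for p in encoder.parameters():
    p.requires_grad = False
encoder.eval()

n_latent, n_units = 8, 512                 # assumed bottleneck size and channel count
readout = nn.Linear(n_latent, n_units)     # new per-animal linear readout
optimizer = torch.optim.Adam(readout.parameters(), lr=1e-3)
loss_fn = nn.PoissonNLLLoss(log_input=False)

def train_step(speech_input, speech_response):
    """One update of the readout on speech responses from a new animal.

    speech_input:    (batch, time, n_features) sound representation
    speech_response: (batch, time, n_units) recorded multi-unit activity
    """
    with torch.no_grad():
        latent = encoder(speech_input)     # fixed latent representation
    pred = torch.relu(readout(latent))     # non-negative rate prediction
    loss = loss_fn(pred, speech_response)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# After training the readout on the small speech sample, the same frozen
# encoder + new readout is evaluated on the animal's SAM noise responses.
```

The key point of the procedure is that the shared encoder is never updated; only the animal-specific linear mapping from the latent space to the recorded units is refit.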

The one class of sounds for which we were not explicitly able to validate the models is hearing aid-amplified speech. While these sounds are not qualitatively different from standard speech and we have no particular reason to believe the model predictions for these sounds would be inaccurate, we have added a note to the text to indicate the lack of validation.

2) Several labs have studied changes in inferior colliculus and cortex, but their work is not acknowledged in this manuscript. For example, the work by the Sanes lab at NYU and Polley lab at Harvard have advanced theories around decreased inhibition to accommodate reduced peripheral input. This work has also implicated deficits in temporal processing that do not at the surface appear consistent with the current study (e.g., see Sanes and Yao eLife 2018). The authors would want to place their work in the context of these and other works more clearly.

We have tried to do more to place our results within the context of other related studies. The study from the Polley lab that is most closely related to ours is McGill et al. (2022), in which they study the downstream effects of mild-to-moderate sensorineural hearing loss. (Other studies from the Polley lab use a model of extreme neuropathy, which is too different from mild-to-moderate sensorineural hearing loss to allow for meaningful comparisons.) The key findings of McGill et al. that relate to our study are (1) that hearing loss induces behavioral hypersensitivity for detection of tones at the frequencies around which the hearing loss increases from mild to moderate; (2) that these frequencies are overrepresented after rearrangement of the cortical tonotopic map; and (3) that (some) cortical neurons located in this region of the tonotopic map exhibit increased gain and synchrony in their responses.

Our work does not investigate the circuit-level mechanisms that underlie the observed effects of hearing loss (e.g., bottom-up drive vs. local E-I balance). The work from the Sanes lab is focused on these mechanisms, and it is difficult for us to see how further consideration of our results in conjunction with theirs can lead to additional insights. The specific study suggested by Reviewer 2, Yao and Sanes (2018), is focused on developmental hearing loss, which makes it even more difficult to compare with our work. Also, the reviewer suggests that their results are somehow inconsistent with ours, but they are not. Their abstract states “We found that developmental HL … did not alter brainstem temporal processing.” Our results also suggest that HL does not alter brainstem temporal processing, and this is consistent with many other studies that have found that HL does not impact temporal processing in the early auditory pathway (see Parida and Heinz (2022) for another recent example). Understanding how temporal processing deficits arise at the level of the cortex after hearing loss is not something that our work can help with; for that we must continue to look to the Sanes lab and others who are focused on such questions.

What we can do is try to synthesize our results with others related to mild-to-moderate sensorineural hearing loss from the auditory nerve and cortex in order to better understand the transformation that takes place along the way. We have added a new section to the Discussion “Distorted spectral processing from cochlea to cortex” along these lines. Perhaps the most salient point we can take from this exercise is the recognition that coordinated studies are needed to develop a coherent picture.

3) Details about some statistical tests were hard to find (e.g., only in Table S1), but it also appears that the authors still make important statements without statistical justification, for example, related to NH/HL+noise vs. NH/HL+quiet (Figures 6 vs. Figures 7). There are several other cases where conclusions, e.g., about dimensionality, are not supported by a statistical test. The authors would want to make sure that all their conclusions are supported quantitatively.

We have added statistical tests to support our assertion that the distortions in signal dynamics caused by hearing loss are more pronounced for speech in noise than for speech in quiet (Figures 6 and 7). Whether compared at best intensity or after amplification with a hearing aid, and whether measured via RSA or CCA, the distortions were much smaller for speech in quiet than for speech in noise, with all differences highly significant (the largest p-value was less than 1e-6). We have included this information in the revised Results.

We have also added distribution plots and statistical tests to support our assertion that the signal dynamics differ between pairs of animals with normal hearing and hearing loss more than between pairs of animals with the same hearing status (new Figures 2i and 3e). Whether based on the signal manifold as identified via PCA or via DNN, signal dynamics were much more similar for pairs of animals with the same hearing status than for pairs of animals with different hearing status.

Reviewer 2 also suggested statistical tests in two instances where we did not make explicit comparisons between groups because we did not feel that these comparisons would be informative. But we include the statistical tests here for completeness:

Figure 2c,g. The dimensionality of the signal manifold

We assert only that the signal manifold is low dimensional with both normal hearing and hearing loss. We have added the range of values for the dimensionality of the signal manifold for each group of animals to the revised text. A t-test indicates that the average dimensionality with hearing loss is significantly lower than with normal hearing in the statistical sense (p = 0.04, mean = 4.8 for HL and 6.8 for NH). But since we cannot say whether or not this difference is significant in the functional sense in and of itself (as opposed to the many detailed differences in the signal manifold with and without hearing loss that we go on to analyze in the rest of the study), we did not include it in the Results.

Figure 3c. The predictive power of the DNN model

We assert only that the model performs well for both normal hearing and hearing loss. In fact, the predictive power was generally higher for hearing loss than for normal hearing: separate t-tests for each bottleneck dimensionality (see Figure 3c) yielded p-values of 0.08, 4e-4, 1e-3, 3e-3, 1e-3, and 1e-3; all but the first of these indicate significantly better performance for hearing loss, even after Bonferroni correction. But, inasmuch as we do not follow up on this difference to understand how it arises, we don't think it is appropriate to include it in the Results.
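For readers who want to see the structure of this comparison, a small sketch of per-dimensionality t-tests with a Bonferroni correction is given below. The arrays are synthetic placeholders, not the study's data, and the test choice (unpaired t-test) is an assumption.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder values for predictive power per animal, at each
# bottleneck dimensionality; the real values are those plotted in Figure 3c.
rng = np.random.default_rng(0)
bottleneck_dims = [1, 2, 4, 8, 16, 32]     # assumed dimensionalities
power_nh = {d: rng.normal(0.60, 0.05, size=9) for d in bottleneck_dims}
power_hl = {d: rng.normal(0.66, 0.05, size=9) for d in bottleneck_dims}

alpha = 0.05
n_tests = len(bottleneck_dims)
for d in bottleneck_dims:
    t, p = stats.ttest_ind(power_hl[d], power_nh[d])
    # Bonferroni correction: each p-value is compared against alpha / n_tests
    print(f"dim={d:2d}  t={t:+.2f}  p={p:.1e}  "
          f"significant after correction: {p < alpha / n_tests}")
```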

4) The authors would also want to flesh out the argument for why the same effects would not be present in the nerve.

Broadly speaking, many of the effects that we see (e.g., hypersensitivity to background noise) are present in the AN. What does not appear to be present in the AN is the specific form of distorted spectral processing that we observe in the IC as a clustering of signal dynamical trajectories within the latent representation of the DNN model.

Providing a definitive answer to the question of why these effects are present in the IC and not the AN is beyond us (though we added some speculative ideas to the new section in the revised Discussion). We can, however, provide more explicit evidence that the distortions in spectral processing in the AN and the IC are, in fact, different. To do this, we simulated AN responses to pure tone sounds with and without hearing loss, as suggested by Reviewer 1, and performed the same analyses of the signal dynamics as we did for the IC.

We found no evidence in the AN of the clustering of dynamical trajectories that is present in the IC. (In fact, the effects of hearing loss on spectral processing in the AN as revealed through this analysis appear to be much more complex than in the IC). We also demonstrated that the clustering of dynamical trajectories that we observed in the latent representation of the IC DNN model was also evident in experimental IC responses. These new analyses are described in Figure 4 —figure supplement 1 of the revised manuscript.

Reviewer #1 (Recommendations for the authors):

Suggestions for authors:

Figure 1 caption – why are there two sample sizes given in the last sentence?

It is just one (large) sample size: n = 544,362.

Line 83 – you might give some flavor as to the noise types that were used

This information is in the Methods.

Line 111 and Figure 2 – I must be missing something, but I don't see how you can approach 100% explained signal variance given the way that I think it is calculated. Doesn't the noise variance show up in the denominator?

With the full complement of PCs, all of the variance in a dataset can be fully explained. So, in general, it should be no surprise that a subset of that variance can also be fully explained; in fact, it would be impossible for this not to be the case. It is, however, potentially surprising that so much of the signal variance can be explained with so few PCs.
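As a simple illustration of this point, the sketch below applies PCA to a synthetic "signal" matrix (time bins × units) built with a low intrinsic dimensionality. The data, shapes, and the use of a plain trial-averaged signal are assumptions; the study's noise-corrected estimate of explained signal variance may differ in detail.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for trial-averaged ("signal") activity, constructed with a
# low intrinsic dimensionality so that a handful of PCs captures most variance.
rng = np.random.default_rng(1)
n_time, n_units, n_latent = 5000, 512, 4
signal = rng.normal(size=(n_time, n_latent)) @ rng.normal(size=(n_latent, n_units))
signal += 0.05 * rng.normal(size=(n_time, n_units))   # small residual variability

pca = PCA().fit(signal)
cum_var = np.cumsum(pca.explained_variance_ratio_)
for k in (1, 2, 4, 8):
    print(f"{k} PCs explain {100 * cum_var[k - 1]:.1f}% of the variance")
# With the full complement of PCs (min(n_time, n_units) of them), the cumulative
# explained variance reaches 100% by construction.
```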

I don't understand what constitutes a "recorded unit". The methods refer to multi-unit activity. Is a unit just an electrode from the array of 512? Or is there spike sorting being performed? How is any non-stationarity in the recordings dealt with (e.g. if neurons die, or the brain moves a little w.r.t. the electrode array)?

Spike sorting was not performed, but the extraction of multi-unit activity involved some processing. This is described in the Methods. Non-stationarity was ignored.

Line 287 – I found myself wondering about the possible effect of phase shifts or increases in response latency, which one might imagine could occur with hearing loss. I think the analysis would be highly vulnerable to this, especially given that the encoding of modulation is partly synchrony-based. The fact that the modulation analysis shows pretty similar results for NH and HI suggests there is not much of this, but some readers may wonder about this.

RSA is insensitive to phase shifts; CCA is not. This difference is part of the motivation for using both methods rather than just one or the other.

At several points throughout the paper, I found myself wondering about the effects of compression. I would have been greatly interested to see an analysis that separately manipulated compression (e.g., turning it off), to see how much benefit it produces on restoring normal-like responses. I also would have liked to see some discussion of its effects.

We did an extensive analysis of the effects of compression on the neural coding of speech in a previous paper (Armstrong et al., Nat Biomed Eng., 2022).

Figure 7d – this panel is pretty confusing, partly because the SPL numbers are inside particular plots, so it is not completely clear what they apply to, and partly because the little numbers in the plots are not labeled or defined anywhere.

We understand that this figure can be confusing, but the formatting and labelling are exactly the same as in all of the previous figures, e.g., 4d and 5d. We tried several other designs for this figure, but none were judged to be better.

Line 404-406 – how does this jive with the findings of distorted tonotopy from the Heinz lab?

The relationship between our findings and the distorted tonotopy that has been observed in the auditory nerve is considered in detail in the Discussion. To our knowledge, the potential for (frequency-weighted) amplification to mitigate the effects of distorted tonotopy on speech coding at the level of the auditory nerve has not been tested.

Line 438 – why is this coherence rather than correlation?

As used here, the two are equivalent. We have relabeled it as correlation since that is likely to be more familiar to readers.

Line 484 and earlier – can the clustering be explained merely by audibility (e.g., all the stimuli that are inaudible cluster together, for the uninteresting reason that they do not evoke a response)?

That would be possible, but (1) there is also clustering of responses to low-frequency tones that evoke strong responses and (2) the high-frequency tones do, in fact, elicit a response (see Figure 4d).

Line 496 – the claim here needs a reference

The relevant papers are cited throughout the paragraph.

Line 691 – why is "/ 0.6745" here? Is this a typo?

No. That is the scaling factor required to transform an estimate of the median absolute deviation into an estimate of the standard deviation.

(See https://en.wikipedia.org/wiki/Median_absolute_deviation)
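A short numerical check of this relationship (assuming normally distributed samples; the variable names are arbitrary):

```python
import numpy as np
from scipy.stats import norm

# For a normal distribution, sigma ≈ MAD / 0.6745, where 0.6745 ≈ norm.ppf(0.75),
# i.e., the 75th percentile of the standard normal distribution.
print(norm.ppf(0.75))                       # ~0.6745

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, scale=3.0, size=100_000)
mad = np.median(np.abs(x - np.median(x)))
print(mad / 0.6745)                         # ~3.0, recovering the true sigma
```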

Line 697 – what is a "unit"?

The MUA-processed signal from one recording channel.

Line 784 – it seems plausible that the hearing aid settings are suboptimal. In particular, the extent of compression is based on some informal optimization in humans. Could this partly explain the less than complete restoration of normal responses?

We agree that it is plausible that the hearing aid settings are suboptimal. But we think it is unlikely that this suboptimal fitting is the main reason why hearing aids are unable to restore neural responses to normal. We did an extensive analysis of the effects of hearing aids on the neural coding of speech in a previous paper (Armstrong et al., Nat Biomed Eng., 2022).

https://doi.org/10.7554/eLife.85108.sa2

Article and author information

Author details

  1. Shievanie Sabesan

    Ear Institute, University College London, London, United Kingdom
    Contribution
    Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  2. Andreas Fragner

    Perceptual Technologies, London, United Kingdom
    Contribution
    Conceptualization, Software, Supervision, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
  3. Ciaran Bench

    Ear Institute, University College London, London, United Kingdom
    Contribution
    Software, Investigation, Methodology
    Competing interests
    No competing interests declared
  4. Fotios Drakopoulos

    Ear Institute, University College London, London, United Kingdom
    Contribution
    Software, Investigation, Methodology
    Competing interests
    No competing interests declared
  5. Nicholas A Lesica

    Ear Institute, University College London, London, United Kingdom
    Contribution
    Conceptualization, Software, Supervision, Funding acquisition, Investigation, Methodology, Writing – original draft, Writing – review and editing
    For correspondence
    n.lesica@ucl.ac.uk
    Competing interests
    is a co-founder of Perceptual Technologies
    ORCID iD: 0000-0001-5238-4462

Funding

Wellcome Trust (200942/Z/16/Z)

  • Shievanie Sabesan
  • Nicholas A Lesica

Engineering and Physical Sciences Research Council (EP/W004275/1)

  • Ciaran Bench
  • Fotios Drakopoulos
  • Nicholas A Lesica

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. For the purpose of Open Access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

Acknowledgements

The authors thank A de Cheveigné, S Ostojic, JÁ Gallego, and F Bruford for their advice. This work was supported by a Wellcome Trust Senior Research Fellowship (200942/Z/16/Z) and a grant from the UK Engineering and Physical Sciences Research Council (EP/W004275/1). The funding sources were not involved in study design, data collection and interpretation, or the decision to submit the work for publication.

Ethics

All experimental protocols were approved by the UK Home Office (PPL P56840C21). Every effort was made to minimize suffering.

Senior Editor

  1. Barbara G Shinn-Cunningham, Carnegie Mellon University, United States

Reviewing Editor

  1. Björn Herrmann, Baycrest, Canada

Reviewer

  1. Stephen V David, Oregon Health and Science University, United States

Publication history

  1. Preprint posted: October 7, 2022
  2. Received: November 22, 2022
  3. Accepted: April 27, 2023
  4. Accepted Manuscript published: May 10, 2023 (version 1)
  5. Version of Record published: May 22, 2023 (version 2)

Copyright

© 2023, Sabesan et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

Shievanie Sabesan, Andreas Fragner, Ciaran Bench, Fotios Drakopoulos, Nicholas A Lesica (2023) Large-scale electrophysiology and deep learning reveal distorted neural signal dynamics after hearing loss. eLife 12:e85108. https://doi.org/10.7554/eLife.85108
