
Signal integration at spherical bushy cells enhances representation of temporal structure but limits its range

  1. Christian Keine (corresponding author)
  2. Rudolf Rübsamen
  3. Bernhard Englitz
  1. University of Iowa, United States
  2. University of Leipzig, Germany
  3. Radboud University, Netherlands
Research Advance
Cite this article as: eLife 2017;6:e29639 doi: 10.7554/eLife.29639

Abstract

Neuronal inhibition is crucial for temporally precise and reproducible signaling in the auditory brainstem. Previously we showed that for various synthetic stimuli, spherical bushy cell (SBC) activity in the Mongolian gerbil is rendered sparser and more reliable by subtractive inhibition (Keine et al., 2016). Here, employing environmental stimuli, we demonstrate that the inhibitory gain control becomes even more effective, keeping stimulated response rates equal to spontaneous ones. However, what are the costs of this modulation? We performed dynamic stimulus reconstructions based on neural population responses for auditory nerve (ANF) input and SBC output to assess the influence of inhibition on acoustic signal representation. Compared to ANFs, reconstructions of natural stimuli based on SBC responses were temporally more precise, but the match between acoustic and represented signal decreased. Hence, for natural sounds, inhibition at SBCs plays an even stronger role in achieving sparse and reproducible neuronal activity, while compromising general signal representation.

https://doi.org/10.7554/eLife.29639.001

Introduction

Acoustically evoked inhibition plays a crucial role in shaping neuronal activity already at the first central station of the auditory system (Caspary et al., 1994; Kopp-Scheinpflug et al., 2002; Dehmel et al., 2010; Kuenzel et al., 2011; Keine and Rübsamen, 2015; Keine et al., 2016). In a previous study, we demonstrated that inhibition on spherical bushy cells (SBCs) renders their output sparser and more reproducible than their auditory nerve fiber (ANF) input (Keine et al., 2016). These transformations persist over a wide range of acoustic stimuli and sound intensities, and can be approximated by an inhibition which we modelled as a scaled subtraction. Functionally, this inhibition emphasizes reliable events and controls the response gain across a wide range of sound levels.

Since in most previous studies the neuronal activity was recorded during either simple or complex, but synthetic, acoustic stimuli, it remains unknown whether the inhibitory effect on sound encoding also persists in natural acoustic environments. Here, we extend the range of tested stimuli to natural sounds approximating a gerbil’s environment. In such a natural context, the inhibitory influence on the SBC output activity proves even stronger than under the complex, but non-natural, stimulus conditions tested before. In particular, while for most synthetic stimuli the SBC firing rates generally increase, they remain constant under natural acoustic stimulation.

While transformations of this kind can emphasize certain aspects of the sensory input, they may also deemphasize others. We study this trade-off directly by performing stimulus reconstruction from the population of cells, a technique which has already been successfully applied in cortical recordings (Stanley et al., 1999; Mesgarani et al., 2009). In contrast to single-cell analysis, this population-based technique provides an estimate of the overall stimulus information available in a group of neurons. Reconstructions of this type have previously been successfully employed to identify the effect of attention on the neural response (Mesgarani and Chang, 2012). We employ the respective reconstructions to analyze the effect of inhibition in the represented stimulus spectrogram.

We find stimulus reconstructions based on the ANF input to be overall more accurate than those from the SBC output. Surprisingly, the share of ANF inputs which are not transmitted to the SBC output (partly blocked by inhibition, Kuenzel et al., 2011; Keine and Rübsamen, 2015; Keine et al., 2016) delivered the most accurate reconstructions, but were temporally more variable. We find that the overall fidelity of representing the stimulus is reduced in SBCs due to the inhibitory gain control, resulting in a lower variance in stimulus reconstruction. Consistent with their role in sound localization, the SBCs’ inhibition-shaped output appears to be more focused on restricted, short-term signal representation during variable stimulus conditions, while the overall information about the stimulus is reduced.

Results

We recorded from spherical bushy cells (SBCs) in the AVCN of anesthetized gerbils in vivo to understand the influence of neuronal inhibition on the encoding of complex acoustic sounds. For this purpose, we presented real environmental sounds (e.g. walking on gravel, singing birds), and performed an explicit population decoding which allows the analysis of the stimulus representation in its original form.

Modulation of SBC responses under natural acoustic stimulation

To directly investigate the transformation at the ANF-SBC synapse, the neuronal response was separately analyzed for EPSPs which trigger an output spike (EPSPsucc) and EPSPs which fail to trigger SBC activity (EPSPfail) as described previously (Figure 1A) (Keine et al., 2016). As shown before, these failures of transmission are mostly caused by inhibition at the ANF-SBC junction (Kuenzel et al., 2011; Keine and Rübsamen, 2015; Keine et al., 2016). The acoustic stimulus was composed of seven different segments of varying spectral breadth and featuring rapid amplitude modulations (Figure 1B, C). First, we recorded the change in firing activity of ANF input and SBC output for simple pure tones at the units’ characteristic frequency. Both ANF input and SBC output rates increased during pure tone stimulation (ANFspont = 71.3 ± 31.8 Hz vs. ANFtone = 234.3 ± 53.9 Hz; Δ = 163 ± 59.3 Hz, p<0.001; SBCspont = 43.4 ± 18.3 Hz vs. SBCtone = 97.3 ± 33.9 Hz, Δ = 53.9 ± 27.9 Hz, p<0.001, see Table 1 for additional details of statistical tests, and Figure 1—source data 1, Figure 1Di). During natural acoustic stimulation, the ANF input firing rates also increased (ANFspont = 71.3 ± 31.8 Hz vs. ANFnatural = 129.7 ± 38.4 Hz, Δ = 58.4 ± 25.5 Hz, p<0.001); however, contrary to pure tone stimulation, the SBC output activity remained unchanged (SBCspont = 43.4 ± 18.3 Hz vs. SBCnatural = 43.1 ± 22.1 Hz, Δ = 0.3 ± 15.5 Hz, p=0.9, Figure 1C for representative trace and Figure 1Dii for population data). This effect persisted when the different stimulus segments were analyzed separately (Figure 1Diii). While the increase in SBC firing rates for tones is consistent with previous studies using various synthetic stimuli (Kopp-Scheinpflug et al., 2002; Kuenzel et al., 2011; Keine and Rübsamen, 2015; Keine et al., 2016), the constancy for natural stimulation has not been demonstrated before.

Acoustic stimulation with environmental sounds increases auditory nerve firing while leaving the SBC firing rates constant.

(A) Representative voltage trace of SBC recording during acoustic stimulation. Voltage signals could be divided into EPSP followed by postsynaptic AP (blue dots) and EPSPs which fail to trigger an AP (gray dots). The sum of both types of events comprised the ANF input. (B) The acoustic stimulus was composed of a series of environmental sounds (i.e. rain falling, walking on gravel, bird singing, shoveling sand, ripping grass, walking through forest, walking on fallen leaves), containing naturally occurring frequency and amplitude modulations. Top: amplitude profile; bottom: spectrogram. The stimulus was presented at a mean sound pressure level of 40 dB SPL. PSD = power spectral density. (C) Firing rates of one representative cell during acoustic stimulation (CF = 2.6 kHz, bin size 0.1 s). Note that the firing rate fluctuations at the ANF level (orange) are higher than at the SBC level (blue). Arrows at the right indicate spontaneous firing rates in the absence of sound. (D) Population data on firing rate changes for ANF input and SBC output during pure tone and natural sound stimulation. (i) Pure tone stimulation at the units’ characteristic frequency resulted in a firing rate increase in both ANF input and SBC output. (ii) While during stimulation with natural sounds, the average ANF firing increased comparably to pure tone stimulation, the average SBC firing remains at the level of spontaneous activity. (iii) This effect was consistently observed throughout different stimulus segments of the natural sound stimulus. Horizontal lines indicate the average spontaneous firing rate in the absence of sound.

https://doi.org/10.7554/eLife.29639.002
Table 1
Summary of statistical comparisons.
https://doi.org/10.7554/eLife.29639.004
Source | Parameter | Group 1 (mean ± SD) | Group 2 (mean ± SD) | Test statistics | Statistical test
Figure 1, Panel Di (pure tones) | Firing Rate ANF | Spont = 71.3 ± 31.8 Hz | Stim = 234.3 ± 53.9 Hz | t(df = 31) = 15.5, p=3.5e-16, U1=0.87 | paired t test
Figure 1, Panel Di (pure tones) | Firing Rate SBC | Spont = 43.4 ± 18.3 Hz | Stim = 97.3 ± 33.9 Hz | t(31) = –10.9, p=3.5e-12, U1=0.59 | paired t test
Figure 1, Panel Dii (natural sounds) | Firing Rate ANF | Spont = 71.3 ± 31.8 Hz | Stim = 129.7 ± 38.4 Hz | t(31) = 12.9, p=4.9e-14, U1=0.3 | paired t test
Figure 1, Panel Dii (natural sounds) | Firing Rate SBC | Spont = 43.4 ± 18.3 Hz | Stim = 43.1 ± 22.1 Hz | t(31) = 0.12, p=0.9, U1=0.03 | paired t test
Figure 2, Panel A | Threshold EPSP | Spont = 7.5 ± 2.4 V/s | Stim = 9 ± 2.8 V/s | t(31) = 10.2, p=1.8e-11, U1=0.1 | paired t test
Figure 2, Panel B | Failure Fraction | Spont = 0.36 ± 0.2 | Stim = 0.65 ± 0.17 | t(31) = 16.3, p=9.1e-17, U1=0.28 | paired t test
Figure 2, Panel Ci | Sparsity | ANF input = 0.09 ± 0.05 | SBC output = 0.17 ± 0.07 | t(31) = 6.9, p=8.2e-8, U1=0.17 | paired t test
Figure 2, Panel Di | Reproducibility | ANF input = 0.13 ± 0.09 | SBC output = 0.28 ± 0.18 | Z(N=32) = 4.9, p=8.8e-7, U1=0.2 | Wilcox. sig.-rank
Figure 3, Panel K | Corr. Section | ANF = 0.66 ± 0.14 | EPSP = 0.71 ± 0.11 | Z(N=10) = 2.8, p=0.006*, U1=0.1 | Wilcox. sig.-rank
Figure 3, Panel K | Corr. Section | ANF = 0.66 ± 0.14 | SBC = 0.56 ± 0.16 | Z(N=10) = 2.8, p=0.006*, U1=0.1 | Wilcox. sig.-rank
Figure 3, Panel K | Corr. Section | EPSP = 0.71 ± 0.11 | SBC = 0.56 ± 0.16 | Z(N=10) = 2.8, p=0.006*, U1=0.35 | Wilcox. sig.-rank
Figure 3, Panel K | Corr. Time Section | ANF = 0.68 ± 0.14 | EPSP = 0.71 ± 0.14 | Z(N=10) = 2.7, p=0.0117*, U1=0.1 | Wilcox. sig.-rank
Figure 3, Panel K | Corr. Time Section | ANF = 0.68 ± 0.14 | SBC = 0.58 ± 0.15 | Z(N=10) = 2.7, p=0.0117*, U1=0.15 | Wilcox. sig.-rank
Figure 3, Panel K | Corr. Time Section | EPSP = 0.71 ± 0.14 | SBC = 0.58 ± 0.15 | Z(N=10) = 2.8, p=0.006*, U1=0.25 | Wilcox. sig.-rank
Figure 3, Panel K | Corr. Freq. Section | ANF = 0.45 ± 0.042 | EPSP = 0.44 ± 0.071 | Z(N=10) = 1.6, p=0.39*, U1=0.2 | Wilcox. sig.-rank
Figure 3, Panel K | Corr. Freq. Section | ANF = 0.45 ± 0.042 | SBC = 0.34 ± 0.058 | Z(N=10) = 2.8, p=0.006*, U1=0.85 | Wilcox. sig.-rank
Figure 3, Panel K | Corr. Freq. Section | EPSP = 0.44 ± 0.071 | SBC = 0.34 ± 0.058 | Z(N=10) = 2.8, p=0.006*, U1=0.5 | Wilcox. sig.-rank

Source | Parameter | Factor 1 | Factor 2 | Statistical test
Figure 4, Panel E | Autocorrelation | Height: p=2.9e-25 | Response Type: p=0.002 | ANOVA, 2-factor
Figure 4, Panel F | Spectra | Mod. Rate: p=1.7e-6 | Response Type: p=2.1e-51 | ANOVA, 2-factor
Figure 4, Panel G | Freq. XCorr | Delta Freq: p=1.1e-14 | Response Type: p=0.023 | ANOVA, 2-factor

*p-values Bonferroni-corrected for N = 3 comparisons.

The unchanged average firing rates of the SBC output during natural acoustic stimulation were accompanied by an increase in threshold EPSP (Spont = 7.5 ± 2.4 V/s vs. Stim = 9 ± 2.8 V/s, Δ = 1.4 ± 0.8 V/s, p<0.001, see Figure 2—source data 1) and failure fraction (Spont = 0.36 ± 0.2 vs. Stim = 0.65 ± 0.17, Δ = 0.29 ± 0.1, p<0.001, Figure 2A) (see also Keine et al., 2016). Together with previous results, this indicates a strong influence of inhibition, which limits the increase in average SBC firing and thereby effectively regulates the SBC output gain. The quality of the neuronal response was assessed by calculating the sparsity (Figure 2B) and response reproducibility (Figure 2C) for both the complete natural stimulus and the different stimulus segments individually. Consistent with our previous results using synthetic sounds, both sparsity and reproducibility of the SBC output were increased compared to the ANF input (sparsity: ANF input = 0.09 ± 0.05 vs. SBC output = 0.17 ± 0.07, Δ = 0.08 ± 0.06, p<0.001; reproducibility: ANF input = 0.13 ± 0.09 vs. SBC output = 0.28 ± 0.18, Δ = 0.14 ± 0.12, p<0.001). Notably, while the absolute values of sparsity and reproducibility were lower for natural sounds than for the complex, but synthetic, stimuli used previously (Keine et al., 2016), the relative increase of both metrics at the ANF-SBC junction was larger during natural acoustic stimulation (sparsity: natural = 105 ± 98% vs. synthetic = 48 ± 49%, p=0.004; reproducibility: natural = 118 ± 74% vs. synthetic = 80 ± 59%, p=0.024, t-test).
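As a rough plausibility check, the failure fraction can be related to the firing rates reported above. The sketch below assumes that the failure fraction is the share of ANF input events that do not trigger an SBC spike, and it uses the population mean rates rather than per-cell values; the function name is hypothetical. The reported failure fractions (0.36 and 0.65) are averages over per-cell fractions, so the values computed from mean rates are only expected to be close, not identical.

```python
def failure_fraction(anf_rate_hz, sbc_rate_hz):
    """Fraction of ANF input events (EPSPs) that did not trigger an SBC spike.

    Assumption: ANF input = successful EPSPs + failed EPSPs, so the
    failed share is 1 - (SBC output rate / ANF input rate).
    """
    if anf_rate_hz <= 0:
        raise ValueError("ANF input rate must be positive")
    return 1.0 - sbc_rate_hz / anf_rate_hz

# Population mean rates taken from the text (Hz).
spont = failure_fraction(71.3, 43.4)     # spontaneous activity
natural = failure_fraction(129.7, 43.1)  # natural sound stimulation
print(round(spont, 2), round(natural, 2))
```

With these mean rates the sketch yields approximately 0.39 and 0.67, in line with the reported increase in failure fraction during natural stimulation.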

Figure 2 (with 1 supplement)
SBC output exhibits increased sparsity and reproducibility compared to ANF input which can be attributed to activity-dependent subtractive inhibition.

(A) During acoustic stimulation the threshold EPSP for AP generation (left) was increased and consequently so was the failure fraction (right), indicating strong inhibition during acoustic stimulation with natural sounds. (B) (i) The sparsity of the neuronal response was separately calculated for the ANF input and the SBC output. The SBC output showed consistently higher sparsity than the ANF input. (ii) The increase in sparsity from the ANF input (orange) to the SBC output (blue) was consistently observed for the different stimulus segments. Simulated (subtractive) inhibition (ANF +SI, green) resulted in similar increases in sparsity. Notably, for conditions in which the sparsity of the ANF input was high (i.e. 'sand'), the SBC output did not increase further. (C) (i) Similar to sparsity, the reproducibility of the neuronal response increased at the SBC level. (ii) Again, this effect was consistent for the different stimulus segments and well approximated by simulating subtractive inhibition.

https://doi.org/10.7554/eLife.29639.005

These results corroborate that acoustically evoked inhibition significantly shapes the SBC output activity and results in sparser and more reproducible SBC firing compared to ANF input. Next, we simulated a rate-dependent subtractive inhibition (see Materials and methods and Figure 2—figure supplement 1 for details) and compared the simulation results to the experimental data. We found that the simulated response (Figure 2Bii, green) showed changes similar to the measured SBC output, suggesting that activity-dependent subtractive inhibition is a possible candidate mechanism in shaping SBC output activity during acoustic stimulation.
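To illustrate what a rate-dependent subtractive inhibition can look like in principle, the following minimal sketch subtracts a scaled, low-pass-filtered copy of the input rate from the input rate and rectifies the result. This is not the model used in the study, which is described in Materials and methods; the function name, the scaling factor `gain`, the smoothing time constant `tau_ms`, and the bin width `dt_ms` are all assumptions chosen for illustration.

```python
import numpy as np

def subtractive_inhibition(rate, gain=0.5, tau_ms=10.0, dt_ms=1.0):
    """Hypothetical sketch: output = max(input - gain * smoothed(input), 0).

    rate: 1-D array of input firing rates (Hz), one value per time bin.
    """
    rate = np.asarray(rate, dtype=float)
    alpha = dt_ms / tau_ms  # step size of a first-order low-pass filter
    inhibition = np.zeros_like(rate)
    for t in range(1, rate.size):
        # Inhibition follows the recent input rate with a delay,
        # which makes it activity-dependent.
        inhibition[t] = inhibition[t - 1] + alpha * (rate[t - 1] - inhibition[t - 1])
    # Subtract the scaled inhibition and clip negative rates to zero.
    return np.maximum(rate - gain * inhibition, 0.0)
```

In this sketch a sustained input is passed at full strength at onset and is then reduced towards `(1 - gain)` times the input rate, so transient events are emphasized relative to sustained activity.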

Dynamic stimulus reconstruction from population responses

While the gain control at the SBC output seems beneficial for the input to coincidence detector neurons in the MSO, this increase in precision and reproducibility might be achieved at the expense of overall stimulus representation. This is already indicated by the flattened rate-level curves in SBCs compared to ANFs (Keine et al., 2016, Figure 3). We therefore estimated how well the neuronal responses of ANFs, SBCs and also the failed EPSPs represent the acoustic stimulus.

Figure 3 (with 1 supplement)
Stimulus representation in the SBC responses is less accurate than in the ANF activity or in the inhibited EPSPs (EPSPfail).

(A) Stimulus reconstruction was performed by estimating linear reconstruction kernels (2) for ANF-, EPSPfail- and SBC responses (1), separately. The respective kernels were then used to compute stimulus reconstructions (3), which were then compared to the real stimulus used in the estimation (4). (B) Spectrogram of the real stimulus. The frequency range was reduced from its original range of 16 kHz (see Figure 1B) to the range represented in the ANFs’ receptive fields (0.3–4.4 kHz, CFs shown as blue dots on the left). Color scales are identical for all spectrograms. PSD = power spectral density. (C) The population response of the ANFs sorted by CF (top) and the ANF-reconstructed stimulus (bottom). The global structure and even the envelope fine-structure are preserved in the reconstruction. For more finely resolved spectrograms see Figure 4. (D) The rate of EPSPfail sorted as above (top) and the EPSPfail-reconstructed stimulus (bottom). Again, global structure and envelope fine-structure are preserved with inaccuracies in the overall range. (E) The SBC AP population response (top) and the SBC-reconstructed stimulus (bottom). While the envelope fine-structure again appears preserved, the range of the reconstruction is much more limited, i.e. relatively faint parts appear louder in the reconstruction than expected (e.g. around 9 s), and vice versa loud parts appear fainter (e.g. after 6 s). (F) The joint histogram across levels between real (abscissa) and reconstructed (ordinate) for ANF responses. The correlation between the two is evident (compare to grey diagonal representing x = y), with a slight deviation below −23 dB, where the reconstructed stimulus did not cover low enough levels (yellow line indicates linear regression). (G) The EPSPfail-based joint histogram with true level exhibits an overall similar shape as the SBCs (see panel J for detailed comparison). 
(H) The SBC-based joint histogram is more widely distributed around the diagonal and limited in range (see panel I for detailed comparison). (I) Subtraction of the SBC-based from the ANF-based histogram indicates an increase in width apparent by the negative (blue) margins and the positive (red) spine. (J) Subtracting instead the EPSPfail-based from the ANF-based histogram, leads to a much smaller difference with an even better correlation around the diagonal for EPSPfail (blue parts on diagonal) for low and high levels. (K) The correlation between the real and the reconstructed stimulus was significantly worse for SBCs compared with either ANF or EPSPfail. Correlation was mostly governed by temporal (middle), rather than spectral (right) variations for all three signals (n = 10 cross-validation sections, based on 32 neurons, *p<0.05, **p<0.01, see Table 1 for exact p-values).

https://doi.org/10.7554/eLife.29639.008

Responses of sensory neurons covary with certain aspects of an externally presented stimulus. For single neurons, this covariation is thus often quantified in relation to certain stimulus properties (frequency, sound level, lag, etc.) establishing different types of receptive fields (as in Keine et al., 2016). While informative about single-cell properties, these analyses fail to provide a more complete understanding of the representation on the population level, which assesses the different stimulus aspects jointly. For this purpose, it is convenient to combine the population responses and relate them to the original stimulus representation. One general approach of this kind is stimulus reconstruction (Stanley et al., 1999; Mesgarani et al., 2009; Mesgarani and Chang, 2012), which performs a prediction of the stimulus based on a multitude of neural responses (see Figure 3A for illustration and Materials and methods for details).
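The general idea of linear stimulus reconstruction can be sketched as follows. This is a minimal illustration and not the analysis code of the study: it assumes binned population responses and a spectrogram on the same time grid, uses ridge-regularized least squares to estimate the kernels, and evaluates the reconstruction on held-out data by correlation. The function names, the number of lags `n_lags`, and the regularization strength `ridge` are assumptions.

```python
import numpy as np

def lagged_design(responses, n_lags):
    """Stack each neuron's response at lags 0..n_lags-1 after each stimulus bin.

    responses: array (n_time, n_neurons); returns (n_time, n_neurons * n_lags).
    """
    n_time, n_neurons = responses.shape
    X = np.zeros((n_time, n_neurons * n_lags))
    for lag in range(n_lags):
        # The response at time t + lag helps to reconstruct the stimulus at time t;
        # rows near the end have no later response and stay zero.
        X[: n_time - lag, lag * n_neurons:(lag + 1) * n_neurons] = responses[lag:]
    return X

def fit_kernels(responses, spectrogram, n_lags=5, ridge=1e-3):
    """Estimate linear reconstruction kernels by ridge regression."""
    X = lagged_design(responses, n_lags)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ spectrogram)

def reconstruct(responses, kernels):
    """Apply fitted kernels to (possibly held-out) population responses."""
    n_lags = kernels.shape[0] // responses.shape[1]
    return lagged_design(responses, n_lags) @ kernels

def correlation(a, b):
    """Pearson correlation between two spectrograms, pooled over all bins."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]
```

In a cross-validated setting, `fit_kernels` would be applied to one part of the recording and `reconstruct` plus `correlation` to a held-out section, repeated over sections.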

Stimulus reconstruction based on the ANF responses (Figure 3C) provided a faithful estimate of the original stimulus (Figure 3B, reconstructed frequency range was restricted to encompass the cells’ receptive fields, see Figure 4 for zoomed samples), in particular representing large fluctuations in sound level. EPSPfail-based reconstructions (Figure 3D) appeared similar with even more pronounced representation of sound level, apparent in the population dynamics (top, grey). The SBC-based reconstruction (Figure 3E), on the other hand, showed decreased overall representation of level dynamics with both high and low levels closer to the average stimulus level (average gray level, see Discussion for a mechanistic interpretation).

Envelope fine-structure properties of the reconstructed stimulus differ between ANF and SBC responses.

(A) A 250 ms snippet of the real spectrogram (Figure 3B) zoomed in. The yellow box indicates a region for comparing the across-frequency correlations between the different spectrograms (see G). (B) The reconstructed stimulus from the ANFs shows that temporal features of the stimulus can be reconstructed down to approximately 5–10 ms (visual estimate), which is coarser than the resolution of the spectrogram (2 ms). (C) The EPSPfail reconstruction appears similar to the ANF reconstruction, with an overall smoother appearance. (D) The SBC reconstruction appears more ‘vertical’, that is, with more correlation across frequencies (see G) and less modulation, but otherwise temporally sharp (see E). (E) The temporal precision of the different reconstructions was assessed by the width of the autocorrelation (inset: width of peak), resolved at multiple heights (inset: horizontal lines, black to red) relative to the correlation at Δt = 0 (inset: maximum). The SBC (blue) reconstruction was most precise, while the EPSPfail (grey) was least precise (2-way ANOVA with factors ‘relative height’ and ‘response type’, see panel for p-values, n = 10 stimulus sections). (F) The emphasis of the temporal modulations was overall similar with a significant overrepresentation of the 100–160 Hz range in the SBC reconstruction (PSD = power spectral density, 2 SEM shown, however, very small variation, see panel for p-values, black dots indicate regions of significant deviation with False-Discovery-Rates at p<0.001, Benjamini and Hochberg, 1995). (G) The spectral correlation of SBC was larger than for ANFs and EPSPfail for large frequency separations (>2 kHz, see panel for p-values). Correlations were computed for different frequency separations (abscissa), but within each time-bin.

https://doi.org/10.7554/eLife.29639.011

We compared the dynamic range of the individual reconstructions by computing conditional level densities (CLD, see Materials and methods for details). Each entry is a conditional probability Pcond (shown in grayscale) of a level in the reconstruction (ordinate), given that the same spectrotemporal bin in the real stimulus had a given level (abscissa). The CLDs (Figure 3F–H) reflected the differences in level representation (described above) by steeper slopes (yellow lines) for ANFs and EPSPfail. All three slopes differed from each other; that is, the 95% confidence intervals of the slopes were non-overlapping (ANF: [0.54, 0.542], EPSPfail: [0.6, 0.602], SBC: [0.427, 0.429]). To directly compare the CLDs, we subtracted pairs of CLDs for two response types (Figure 3I/J). Differences in level representation were particularly salient at the high- and low-level ends.
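A conditional level density of this kind can be sketched as a column-normalized joint histogram of real and reconstructed levels. The code below is an illustration under assumptions, not the procedure from Materials and methods: the function names and the level bins are hypothetical, and the slope is obtained here with an ordinary least-squares line fit.

```python
import numpy as np

def conditional_level_density(real_db, recon_db, edges):
    """P(reconstructed level | real level) on a common grid of level bins.

    Returns an array (n_bins_recon, n_bins_real): rows are reconstructed
    levels (ordinate), columns are real levels (abscissa). Every column
    that contains data sums to 1; empty columns stay 0.
    """
    joint, _, _ = np.histogram2d(recon_db.ravel(), real_db.ravel(),
                                 bins=[edges, edges])
    col_sums = joint.sum(axis=0, keepdims=True)
    return np.divide(joint, col_sums, out=np.zeros_like(joint),
                     where=col_sums > 0)

def level_slope(real_db, recon_db):
    """Slope of a least-squares line of reconstructed vs. real level."""
    slope, _intercept = np.polyfit(real_db.ravel(), recon_db.ravel(), 1)
    return slope
```

A slope of 1 would indicate that the full dynamic range of the stimulus is reproduced, whereas slopes below 1, as reported for all three response types, indicate a compressed level representation.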

The overall reconstruction quality was quantified by cross-validated correlation (Figure 3K, Materials and methods) and was lower for SBCs (orange) than for ANFs (blue, p=0.017, Wilcoxon signed rank test, n = 10 stimulus sections, Bonferroni-corrected, same tests and n below, see Figure 3—source data 1). Surprisingly, EPSPfail-based reconstruction quality was better than for ANFs (Figure 3K, left, grey, p=0.006) despite a lower number of spikes compared to the ANF input. Temporal dynamics contributed most to the reconstruction quality (Figure 3K, middle, p<0.02 for all pairwise comparisons), while the frequency representation was comparably inaccurate (Figure 3K, right, p<0.006 for SBC vs. ANF/EPSPfail, but p=0.39 for EPSPfail vs ANF). Importantly, both temporal and frequency correlations were decreased for SBC-based reconstructions (p<0.02 for both). This was at least partially caused by the inhibitory gain control, reflected by a reduction in the variance of the reconstructed stimulus (ANF: 21.1 ± 3.3 dB, EPSPfail: 22.3 ± 4.1 dB, SBC: 15.9 ± 2.0 dB, SBC vs. ANF: p<0.005 and EPSPfail: p<0.005, EPSPfail vs. ANF: p=0.32). While the reconstruction quality could potentially be improved with a larger sample size, notably, 32 ANFs lead to a comparable reconstruction quality as >250 neurons in the primary auditory cortex (Mesgarani et al., 2009). Lastly, reconstructions from SBC responses simulated as ANF responses subjected to subtractive inhibition (as in Figure 2, ANF + SI, see Methods for details and Figure 2—figure supplement 1) were statistically comparable to real SBC reconstructions (Figure 3—figure supplement 1).

While the SBC-based reconstructions indicate an overall less faithful representation of the acoustic stimulus, they also reflect the envelope fine-structure improvement (Figure 4A–D) demonstrated previously in SBC responses (Dehmel et al., 2010; Keine et al., 2016). The autocorrelation of the SBC reconstruction within a frequency band was sharper compared to ANFs and EPSPfail (Figure 4E, p<0.001 for relative height, p<0.002 for response type, 2-way ANOVA), where the autocorrelation width was assessed at different levels relative to the peak of the autocorrelation (abscissa in Figure 4E). This sharpening is probably due to a highlighted frequency range between 100–150 Hz (corresponding to a period of 7–10 ms) in their power-spectrum (Figure 4F, p<0.001 for modulation rate, p<0.001 for response type, 2-way ANOVA on the range of 100–150 Hz), which may correspond to the inhibitory time-constant measured in vivo (~10 ms, Nerlich et al., 2014; Keine and Rübsamen, 2015; Keine et al., 2016). Finally, the correlation across frequencies (within a given time bin) was increased for the SBC output compared to the ANF input (Figure 4G, p<0.001 for frequency distance, p<0.02 for response type, 2-way ANOVA). We interpret this as a focus on temporal events in any frequency location, with a corresponding loss in representing frequency fine-structure (compare also Figure 4B/D).
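The width of the autocorrelation at a given relative height can be estimated, for one frequency band of a reconstruction, roughly as in the sketch below. The function name is hypothetical, the default bin width of 2 ms is taken from the spectrogram resolution stated in the Figure 4 caption, and the width is approximated without interpolation as twice the first lag at which the normalized autocorrelation falls below the chosen height.

```python
import numpy as np

def autocorr_width(signal, rel_height, dt_ms=2.0):
    """Approximate full width (ms) of the central autocorrelation peak.

    The width is taken as twice the first positive lag at which the
    normalized autocorrelation drops below rel_height (0 < rel_height < 1).
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")
    ac = ac / ac.max()            # zero-lag value becomes 1
    centre = x.size - 1           # index of lag 0 in the 'full' output
    right = ac[centre:]           # lags 0, 1, 2, ...
    below = np.nonzero(right < rel_height)[0]
    half_width = below[0] if below.size else right.size
    return 2 * half_width * dt_ms
```

Evaluating the function at several values of `rel_height` gives one width per height, so that narrower widths at all heights correspond to a temporally sharper reconstruction.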

Taken together, we found that during natural acoustic stimulation, SBC output activity exhibited increased sparsity and reproducibility with SBC firing rates unchanged compared to spontaneous activity. While the SBC output encoded temporal features of the acoustic signal with higher fidelity, the overall stimulus was represented less faithfully compared to the ANF input.

Discussion

The auditory system faces the challenge of localizing and identifying sounds in complex acoustic environments. We find that already at an initial stage of the central auditory system, the neural representation of natural sounds is conditioned to be sparser and more reproducible at the SBCs than in their ANF input. This effect was even larger than for other complex sounds, consistent with theoretical predictions (Lewicki, 2002; Smith and Lewicki, 2006). While signal integration at SBCs thus supports the extraction of temporal features, we found that their ability to represent the stimulus across all stimulus levels is limited in comparison to the ANF input.

The reduction in stimulus representation appears to be largely caused by a compression in the representation, leading to a reduced dynamic range (Figure 3F–H, in particular 3I) which missed out on the low and high levels in the stimulus (Figure 3C–E). This reduction in represented stimulus range resulted in an overall reduced variance in the SBC stimulus reconstruction. We hypothesize that this limitation in reconstruction gain is a consequence of the inhibitory control on the SBCs’ output gain, which flattens their rate-level curves (Keine et al., 2016, Figure 3). The inhibitory modulation at SBCs appears to specifically remove these extreme levels, as indicated by the improved representation based on the blocked ANF inputs only (Figure 3D). Remarkably, the latter representation even improves in comparison with the overall ANF input, indicating that the reduced SBC representation is not due to their lower firing rate in comparison with the ANFs.

All stimulus information available to the auditory system is encoded by the ANF responses and the processing in downstream nuclei will entail emphasizing different subsets of this information. Typically, this has detrimental effects on the non-emphasized part. In the present case, the SBC’s focus on temporal information limits the representation of sound level information. The latter is relevant for loudness-based localization (in the azimuth via ILDs, Galambos et al., 1959; Sanes and Rubel, 1988; Joris and Yin, 1995; Batra et al., 1997), estimation of elevations using spectral cues introduced via head-related transfer functions (Blauert, 1997; Grothe et al., 2010), and potentially for sound identification. The ascending input to the LSO (high-frequency SBCs, Warr, 1966; Glendenning et al., 1985; Shneiderman and Henkel, 1985; Cant and Casseday, 1986; Smith et al., 1993) as well as signal processing in other regions of the cochlear nucleus (e.g. cells in the DCN, Nelken and Young, 1994) may emphasize other features of the acoustic stimulus by integrating information differently. For the processing of stimulus envelope, relevant in ILD-based sound localization in the LSO (for review see Tollin, 2003 and Grothe et al., 2010), the optimal processing strategy appears less clear: improving the temporal precision in representing the envelope may be relevant in addition to accurately representing stimulus level.

The increase in sparsity from ANF to SBC for natural stimuli exceeded the increase for synthetic stimuli (Keine et al., 2016) and the inhibitory gain control was more prominent for natural stimuli, resulting in SBC firing rates comparable to spontaneous activity. Intrinsic properties of natural acoustic (Rieke et al., 1995; Attias and Schreiner, 1997; Nelken et al., 1999; Lewicki, 2002; Hsu et al., 2004; Chechik and Nelken, 2012, reviewed in Theunissen and Elie, 2014) and visual (e.g. Reinagel and Laughlin, 2001) stimuli have been highlighted before to provide specific properties of the neural response (typically efficient coding), potentially through evolutionary adaptation. Mechanistically, we think that the differences in spectrotemporal structure between the stimuli cause this effect in multiple ways.

First, the natural stimuli exceed the artificial stimuli in spectral width in all sections. Hence, both the excitatory and the inhibitory inputs represent integration over larger frequency ranges. In addition, since activation across frequencies is less coordinated than in the more local, synthetic stimuli, the ANF is driven more diversely for natural stimuli, and its response is thus less sparse and reproducible for almost all sections (Figure 2, compared to Figure 8, Keine et al., 2016). Therefore, a similar absolute increase in sparsity, and a smaller absolute increase in reproducibility, lead to a larger relative increase in both cases. It remains to be investigated whether absolute or relative increases are more important for neurons receiving input from SBCs.

Second, the natural stimuli showed a different modulation profile, ranging from little in rain, to nearly complete modulation for the segment sand. Again, this influences the sparsity and reproducibility already at the level of the ANF input, but will also interact with the inhibitory integration properties. Finally, correlations in the spectrogram may contribute to the differences to synthetic stimuli. Similar findings have been reported for the visual cortex (Froudarakis et al., 2014), however, for the auditory brainstem this remains speculation at the current stage.

The integration performed by inhibition could be viewed from the perspective of predictive coding (Rao and Ballard, 1999), where the prediction of expected information is subtracted from the current stimulus representation, in order to minimize the number of transmitted spikes for a certain set of statistics, e.g. natural statistics. Based on the progression of time scales represented along the auditory pathway, it could be speculated that the inhibition provides short-term contextual information against which a change in stimulus statistics could be compared. Hence, the integration may serve to detect low-level changes in the stimulus statistics, similar in principle to the integration of evidence on the cortical level (Boubenec et al., 2017).

In summary, the present study using natural stimuli suggests a possible specific adaptation of the inhibitory gain control, leading to unchanged firing rates and larger increases in sparsity and reproducibility compared to synthetic stimuli. These results also indicate that using acoustic stimuli resembling natural sounds might be necessary to fully understand properties of synaptic integration and signal processing already at initial stages of the auditory system. However, the detailed relation between natural and synthetic stimuli needs to be further explored using hybrid stimuli which isolate specific natural properties and combine them with synthetic stimuli (e.g. SPORCs in David et al., 2009).

Materials and methods

Animals and surgical procedure

All experiments were performed at the Neurobiology Laboratories of the Faculty of Bioscience, Pharmacy and Psychology of the University of Leipzig (Germany), approved by the Saxonian District Government Leipzig (TVV 06/09) and conducted according to the European Communities Council Directive (86/609/EEC). Animals were housed in the animal facility of the Institute of Biology with 12 hr light/dark cycle and access to food and water ad libitum. 

In vivo loose-patch recordings were conducted as described previously (Keine et al., 2016). In brief, young adult Mongolian gerbils aged 6–8 weeks were anesthetized by an intraperitoneal injection of a mixture of ketamine (140 μg/g body weight, Ketamin-Ratiopharm, Ratiopharm, Ulm, Germany) and xylazine hydrochloride (3 μg/g body weight, Rompun, Bayer, Leverkusen, Germany). The animal’s skull was exposed and a brass head post was glued to the skull to fix the animal in a custom-built stereotactic apparatus in a prone position. Recording electrodes were pulled from borosilicate glass (GB150F-10, Science Products, Hofheim, Germany) to an impedance of 3–5 MΩ when filled with the pipette solution (in mM): 135 NaCl, 5.4 KCl, 1 MgCl2, 1.8 CaCl2, 5 HEPES, pH adjusted to 7.3 with NaOH. The recording electrode was lowered through a hole in the skull into the anterior portion of the ventral cochlear nucleus (AVCN). High positive pressure (200 mbar) was applied when passing through non-auditory tissue and reduced to 30 mbar when entering the AVCN. When approaching a cell, the pressure was equalized or slight negative pressure (−5 mbar) was applied. Single units were recorded when they exhibited a positive AP amplitude of at least 2 mV and showed the characteristic complex waveform identifying them as large spherical bushy cells of the rostral AVCN (Pfeiffer, 1966; Winter and Palmer, 1990; Englitz et al., 2009; Typlt et al., 2010).

Acoustic stimulation

Recordings were performed in a sound-attenuating and electrically isolated chamber (Type 400, Industrial Acoustics, Niederkrüchten, Germany). Acoustic stimuli were generated by custom-written Matlab (RRID:SCR_001622) functions and delivered via a custom-built earphone (DT48, Beyerdynamic, Heilbronn, Germany) positioned just in front of the ear canal. Acoustic stimuli were composed of environmental sounds and consisted of seven segments of length 3.46 s with cos² amplitude transitions of 460 ms between consecutive segments to prevent unexpected transients (see Supplementary file 1 for the audio file containing the stimulus). The stimulus had a total length of 18.7 s and was presented at least 20 times for each cell at 40 dB SPL with maximal sound intensities of 85 dB SPL.

Data analysis

Recorded voltage signals were digitized at 97.7 kHz (24 bit, RP2.1, Tucker-Davis Technologies) and filtered between 5 Hz and 7.5 kHz using a zero-phase digital IIR filter. Neuronal signals were detected by the fast upward stroke of the excitatory postsynaptic potential (EPSP) and separated into events which successfully trigger a postsynaptic AP (EPSPsucc) and events that fail to trigger a postsynaptic action potential (EPSPfail). Sparsity and reproducibility of the neuronal response were separately computed for the ANF input (EPSPsucc + EPSPfail) and the SBC output (EPSPsucc).

The sparsity of the response rate was computed using the variance-based method described previously (Rolls and Tovee, 1995; Willmore and Tolhurst, 2001), with sparsity defined as

$$S = 1 - \frac{\langle r(t) \rangle_t^2}{\langle r(t)^2 \rangle_t}$$

where $r(t)$ is the response rate and $\langle \cdot \rangle_t$ indicates an average over time. Its values range between 0 (maximally dense) and 1 (maximally sparse).
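As a minimal sketch of this definition, assuming the response is available as a time-binned rate vector, the sparsity can be computed as follows. The function name `sparsity` and the two toy inputs are illustrative and not taken from the original analysis code.

```python
import numpy as np

def sparsity(rate):
    """Variance-based sparsity S = 1 - <r>^2 / <r^2> of a time-binned rate."""
    r = np.asarray(rate, dtype=float)
    mean_sq = np.mean(r ** 2)
    if mean_sq == 0:
        return np.nan  # a silent response has no defined sparsity
    return 1.0 - np.mean(r) ** 2 / mean_sq

# Toy examples (hypothetical data): a constant rate and a single active bin
dense = sparsity(np.ones(100))
sparse = sparsity(np.eye(1, 100).ravel())
```

A constant rate yields a sparsity of 0, whereas concentrating all activity in one of 100 bins yields 0.99, so the upper bound of 1 is only approached as the response becomes confined to a vanishing fraction of bins.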

The reproducibility of the neural response was computed in three steps. First, the raw cross-correlation was computed between two trials and divided by $S_1 S_2 / N_{bins}$, that is, the product of the numbers of spikes in the two trials, divided by the number of time bins. Second, the cross-correlation was averaged across all pairs of non-identical trials, that is, normalized by the number of such pairs, $N(N-1)/2$, where $N$ is the total number of trials. Finally, the cross-correlation at time 0 was chosen, and ‘1’ subtracted from it, in order to obtain a measure which equals 0 for a random process with fixed rate (aside from the centering around 0, it is thus very similar to the correlation index by Joris et al., 2006). Values > 0 indicate above-chance correlation in the response between trials, beyond what would be expected based on correlation in rate alone. Importantly, reproducibility can attain values >1, with an upper limit determined by the firing rates in the different trials.
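The steps above can be sketched as follows for the zero-lag value only, assuming spikes are stored as a trials-by-bins count matrix. The function name `reproducibility` and the handling of empty trials (such pairs are skipped) are assumptions of this sketch, not details given in the text.

```python
import numpy as np
from itertools import combinations

def reproducibility(trials):
    """Zero-lag reproducibility for a (n_trials, n_bins) array of spike counts."""
    x = np.asarray(trials, dtype=float)
    n_trials, n_bins = x.shape
    values = []
    for i, j in combinations(range(n_trials), 2):  # all N(N-1)/2 trial pairs
        s_i, s_j = x[i].sum(), x[j].sum()
        if s_i == 0 or s_j == 0:
            continue  # assumption: pairs containing an empty trial are skipped
        raw = np.dot(x[i], x[j])  # raw cross-correlation at lag 0
        values.append(raw / (s_i * s_j / n_bins))
    return np.mean(values) - 1.0 if values else np.nan

# Hypothetical data: perfectly repeated spikes in every 10th bin, 5 trials
locked = np.zeros((5, 100))
locked[:, ::10] = 1
r_locked = reproducibility(locked)
```

For the perfectly repeated toy response, each pair gives 10 / (10 × 10 / 100) = 10, so the measure equals 9, which illustrates that values above 1 are possible and that the upper limit depends on the firing rate.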

Stimulus reconstruction

The set of responses from all recorded neurons was used to re-estimate the stimulus spectrogram. A linear reconstruction approach was implemented, which amounts to a linear regression between the neural responses as predictors, and the spectrogram of the stimulus, computed as the absolute value of the short-term Fourier transform, as dependent variables (Mesgarani et al., 2009; Mesgarani and Chang, 2012). This kind of population-based approach allows a combination of the overall information available in the neural response for accurately re-estimating the stimulus. Explicit stimulus reconstruction provides a standardized way to compare stimulus representations along stages of a sensory system, here between the ANF, SBC, and failed EPSPs. Clearly, the brain may use different strategies internally to decode or transform this information, although the present approach has proven useful for approximating the resulting percept (Mesgarani et al., 2009; Mesgarani and Chang, 2012).

For this purpose, both the response and the stimulus spectrogram were resampled at 2 kHz. For the stimulus, this was achieved by allowing neighboring time bins to overlap by multiple samples. Concretely, the stimulus was divided into overlapping sections of 512 samples, starting at $\mathrm{round}(i_t \times SR_{sound} / SR_{spectrogram})$ for each time step $i_t$, where $SR_{sound}$ = 97.65625 kHz is the sampling rate of the acoustic stimulus and $SR_{spectrogram}$ = 2 kHz is the desired spectrogram sampling rate. Neighboring stimulus bins are thus not independent, but the match between stimulus and response sampling rate is required for the general estimation procedure. Based on the chosen stimulus representation, no phase-based fine-structure predictions are possible; all analyses relating to temporal precision therefore relate to envelope fine-structure. All displayed spectrograms are shown on the same scale, in dB (10 log10).
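A minimal sketch of this spectrogram computation is given below, assuming a mono waveform as input. The function name `magnitude_spectrogram` is hypothetical, and the Hann taper is an assumption of this sketch, since the text does not specify a window function.

```python
import numpy as np

def magnitude_spectrogram(sound, sr_sound=97656.25, sr_spec=2000.0, n_win=512):
    """Magnitude of the short-term Fourier transform at a 2 kHz frame rate."""
    sound = np.asarray(sound, dtype=float)
    hop = sr_sound / sr_spec  # about 48.8 samples, so 512-sample windows overlap
    n_steps = int((len(sound) - n_win) // hop) + 1
    starts = np.round(np.arange(n_steps) * hop).astype(int)
    window = np.hanning(n_win)  # assumption: taper is not specified in the text
    frames = np.stack([sound[s:s + n_win] * window for s in starts])
    spec = np.abs(np.fft.rfft(frames, axis=1)).T  # shape (n_freqs, n_steps)
    freqs = np.fft.rfftfreq(n_win, d=1.0 / sr_sound)
    return spec, freqs

# Hypothetical test signal: 0.1 s of a 2 kHz tone
sr = 97656.25
t = np.arange(int(0.1 * sr)) / sr
spec, freqs = magnitude_spectrogram(np.sin(2 * np.pi * 2000 * t))
```

With 512-sample windows the frequency resolution is roughly 191 Hz, which is why only envelope structure, and not phase-based fine structure, can be recovered from this representation.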

The reconstruction procedure was carried out as described in detail before (Mesgarani et al., 2009). Briefly, with the stimulus spectrogram denoted by $S(t,f)$, the responses by $R(t,n)$, and the reconstruction kernels per frequency band by $g_f(\tau,n)$, the assumed response-stimulus relation can be written as

$$\hat{S}(t,f) = \sum_{\tau} \sum_{n} g_f(\tau,n)\, R(t-\tau,n) \tag{1}$$

The kernels $g_f$ can be estimated via the classical normal equation

$$g_f = \left(C_{RR} + \lambda I\right)^{-1} C_{RS_f} \tag{2}$$

with the cross-correlation term $C_{RS_f} = R S_f^T$ and the response correlation matrix $C_{RR} = R R^T$, including a ridge regression term $\lambda I$ to avoid degeneracies in the inversion of $C_{RR}$ (see Mesgarani et al., 2009 for a more detailed derivation), with $\lambda = 0.1$. During crossvalidation, Equation 1 is then used to reconstruct the stimulus from estimates of $g_f$ on the non-predicted stimulus sections. Each $g_f$ is based on the activity of all neurons $n$ at a range of delays $\tau$, and is thus given by a matrix (see Figure 3A middle bottom for an example). The stimulus was constrained to 300–4400 Hz, slightly extending the range of the characteristic frequencies (CFs, [1.18, 3.05] kHz) of the present sample (see Figure 3B, blue dots indicate the CFs of all cells).
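The estimation in Equations 1 and 2 can be sketched as a ridge regression on a lagged response matrix. This is an illustration under stated assumptions rather than the original implementation: the function names, the set of lags, and the synthetic Poisson responses are all hypothetical, and cross-validation is omitted for brevity.

```python
import numpy as np

def lagged_responses(resp, lags):
    """Stack R(t - tau, n) over all lags tau; resp has shape (n_neurons, T)."""
    n_neurons, T = resp.shape
    rows = []
    for tau in lags:
        shifted = np.zeros_like(resp)
        if tau >= 0:
            shifted[:, tau:] = resp[:, :T - tau]
        else:
            shifted[:, :T + tau] = resp[:, -tau:]
        rows.append(shifted)
    return np.vstack(rows)  # shape (n_lags * n_neurons, T)

def fit_kernels(R, S, lam=0.1):
    """Equation 2 for all frequency bands at once: (C_RR + lam I)^-1 C_RS."""
    C_RR = R @ R.T
    C_RS = R @ S.T  # one column per frequency band
    return np.linalg.solve(C_RR + lam * np.eye(C_RR.shape[0]), C_RS)

def reconstruct(R, G):
    """Equation 1: sum over lags and neurons of kernel times response."""
    return G.T @ R  # shape (n_freqs, T)

# Synthetic demonstration (hypothetical data, not recorded responses)
rng = np.random.default_rng(0)
resp = rng.poisson(1.0, size=(4, 500)).astype(float)  # 4 neurons, 500 bins
R = lagged_responses(resp, lags=[-2, -1, 0, 1])
G_true = rng.normal(size=(R.shape[0], 3))             # 3 frequency bands
S = G_true.T @ R                                      # noiseless toy spectrogram
G = fit_kernels(R, S, lam=0.1)
S_hat = reconstruct(R, G)
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse; the ridge term keeps the system well conditioned when responses of different neurons or lags are strongly correlated.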

The quality of reconstruction was assessed using correlation coefficients between the original and the reconstructed stimulus. The displayed values (Figure 3) are averaged between the cross-validated (divided into 10 segments) and the in-sample estimates, based on considerations regarding the influence of noise on estimating model performance (Ahrens et al., 2008). Further, the original stimulus and the reconstruction were compared using conditional level densities (see below, Figure 3), the width of the autocorrelation (Figure 4E), the frequency spectrum of the activations in the spectrogram (Figure 4F), and the spectral autocorrelation (Figure 4G).

The conditional level densities (CLD) were computed as the joint histogram across levels between the real and the reconstructed spectrogram, normalized by the probability of the level in the real stimulus. The normalization was introduced to highlight the differences that occur mostly at the stimulus extremes, which are less probable in the distribution of levels. Hence, a column in a CLD sums to one, and constitutes the (empirical) conditional probability density of the estimated spectrogram, relative to the original spectrogram. We also computed differences of two CLDs to highlight changes between reconstructions (e.g. Figure 4I,J): the difference here indicates a relative (i.e. between the reconstructions) prevalence/scarcity of levels in the reconstruction in relation to the real stimulus.
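As a rough sketch of the CLD computation, assuming both spectrograms are given in dB on a common grid, the joint histogram can be normalized column-wise as follows. The function name, the number of level bins, and the synthetic spectrograms are assumptions made for illustration.

```python
import numpy as np

def conditional_level_density(real_db, recon_db, n_bins=20):
    """Joint level histogram, normalized per level of the real stimulus."""
    real, recon = np.ravel(real_db), np.ravel(recon_db)
    lo = min(real.min(), recon.min())
    hi = max(real.max(), recon.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    # joint[i, j]: number of samples with reconstructed level i and real level j
    joint, _, _ = np.histogram2d(recon, real, bins=[edges, edges])
    col_sums = joint.sum(axis=0, keepdims=True)
    cld = np.divide(joint, col_sums, out=np.zeros_like(joint),
                    where=col_sums > 0)
    return cld, edges

# Hypothetical spectrograms: a noisy copy stands in for a reconstruction
rng = np.random.default_rng(1)
real = rng.normal(size=(10, 200))
recon = real + 0.1 * rng.normal(size=real.shape)
cld, edges = conditional_level_density(real, recon)
```

Each occupied column sums to one, so rare levels at the stimulus extremes receive the same visual weight as common ones; a perfect reconstruction would place all mass on the diagonal.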

The spectral autocorrelation was computed as a cross-correlation across frequencies at every timestep, averaged over the length of the stimulus (Figure 4G). It indicates how predictable the envelope level is across frequencies for the different reconstructions.
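One possible reading of this computation is sketched below. The text does not state the normalization, so subtracting the mean level at each time step and scaling each autocorrelation to 1 at zero shift are assumptions of this sketch, as is the function name.

```python
import numpy as np

def spectral_autocorrelation(spec):
    """Time-averaged autocorrelation across frequency; spec is (n_freqs, n_steps)."""
    spec = np.asarray(spec, dtype=float)
    n_freqs, n_steps = spec.shape
    acc = np.zeros(2 * n_freqs - 1)
    used = 0
    for t in range(n_steps):
        col = spec[:, t] - spec[:, t].mean()  # assumption: remove the mean level
        energy = np.dot(col, col)
        if energy == 0:
            continue  # a flat spectrum carries no structure across frequency
        acc += np.correlate(col, col, mode="full") / energy
        used += 1
    lags = np.arange(-(n_freqs - 1), n_freqs)  # frequency shift in bins
    return lags, acc / max(used, 1)

# Hypothetical spectrogram with independent levels across frequency
rng = np.random.default_rng(2)
lags, sac = spectral_autocorrelation(rng.random((32, 100)))
```

A broad peak around zero shift would indicate that envelope levels are predictable across neighboring frequencies, whereas a narrow peak indicates largely independent frequency channels.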

Simulation of subtractive inhibition

We extended our simulation of subtractive inhibition to include an estimate of previous activity in order to match it more closely with the experimentally measured delay and temporal dynamics. Inhibition was modelled to depend on the recent history of ANF firing activity in this CF range: first, the PSTH of a given ANF input was shifted in time by $t_c$ and then integrated with an exponential kernel with time constant $\tau_I$ to obtain an intermediate representation $A_I(t)$. The resulting signal was passed through a static, sigmoidal nonlinearity given by

$$S_I(t) = \frac{I_0}{1 + e^{-S\,(A_I(t) - O)}}$$

The sigmoid’s shape is controlled by $I_0$, $S$ and $O$, where $I_0$ is the instantaneous rate of inhibition that is maximally subtracted, $S$ is the (inverse) slope of activation, and $O$ is the value of the integrated signal $A_I$ at which the sigmoid reaches 50% of $I_0$, hence controlling the horizontal offset. These three steps account for the conduction delay ($t_c$), the integration ($\tau_I$) and the nonlinear spike elicitation ($S$, $I_0$, $O$). We manually estimated a set of parameters that accounted best for each SBC’s failure rates under spontaneous and natural stimulation, as well as the overall reconstruction quality and variance (see Figure 2—figure supplement 1 and Figure 3—figure supplement 1): the first three parameters were fixed across all cells to $t_c$ = 0.5 ms, $\tau_I$ = 3 ms, $S$ = 5. The last two parameters were set individually per cell to match the experimentally observed failure rates (Figure 2—figure supplement 1). Automatic fitting was infeasible since there appeared to be many plateaus due to the discrete nature of the spikes.
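The three steps (delay, exponential integration, sigmoid) can be sketched as below. The fixed parameters follow the values stated above; the per-cell values for `i0` and `offset`, the unit normalization of the exponential kernel, its truncation at five time constants, and the function name are assumptions made for illustration only.

```python
import numpy as np

def inhibition_rate(psth, dt, t_c=0.5e-3, tau_i=3e-3, slope=5.0,
                    i0=50.0, offset=1.0):
    """Delayed, exponentially integrated and sigmoid-transformed ANF activity."""
    psth = np.asarray(psth, dtype=float)
    # Step 1: conduction delay t_c, implemented as a shift by whole bins
    shift = int(round(t_c / dt))
    delayed = np.concatenate([np.zeros(shift), psth])[:len(psth)]
    # Step 2: causal exponential integration (assumption: unit-area kernel)
    t_k = np.arange(0, 5 * tau_i, dt)
    kernel = np.exp(-t_k / tau_i)
    kernel /= kernel.sum()
    a_i = np.convolve(delayed, kernel)[:len(psth)]
    # Step 3: static sigmoid with maximum i0, slope parameter and offset
    return i0 / (1.0 + np.exp(-slope * (a_i - offset)))

# Hypothetical input: constant ANF activity equal to the offset value
flat = inhibition_rate(np.ones(200), dt=0.5e-3)
silent = inhibition_rate(np.zeros(200), dt=0.5e-3)
```

Once the integrated activity has settled at the offset value, the output equals half of `i0`, which matches the role of $O$ as the 50% point of the sigmoid.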

The resulting output of the model constitutes a rate of spikes and was then subtracted from the neuron's instantaneous activity by deleting the corresponding number of spikes (randomly across trials) in this time bin. If the number of existing spikes per bin was lower than the number of spikes allocated to subtraction, the individual bin was set to 0.

Prior to applying the subtractive inhibition, the spontaneous failure rate was accounted for by removing a random set of spikes, whose size was matched to the spontaneous failure rate of the cell.
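The two deletion steps can be sketched as follows for binary spike rasters. How the inhibitory rate is converted into a number of spikes per bin is not spelled out in the text; multiplying by the bin width and the number of trials is an assumption of this sketch, and both function names are hypothetical.

```python
import numpy as np

def remove_spontaneous_failures(spikes, failure_fraction, rng):
    """Delete a random subset of spikes matching the spontaneous failure rate."""
    out = spikes.copy()
    trial_idx, bin_idx = np.nonzero(out)
    n_remove = int(round(failure_fraction * len(trial_idx)))
    drop = rng.choice(len(trial_idx), size=n_remove, replace=False)
    out[trial_idx[drop], bin_idx[drop]] = 0
    return out

def subtract_inhibition(spikes, inhib_rate, dt, rng):
    """spikes: binary (n_trials, n_bins); inhib_rate: spikes/s in each bin."""
    out = spikes.copy()
    n_trials = out.shape[0]
    # assumption: number of spikes to delete per bin, pooled over all trials
    n_delete = np.round(np.asarray(inhib_rate) * dt * n_trials).astype(int)
    for b in range(out.shape[1]):
        active = np.flatnonzero(out[:, b])
        if n_delete[b] >= len(active):
            out[:, b] = 0  # fewer spikes than allocated: the bin is emptied
        elif n_delete[b] > 0:
            out[rng.choice(active, size=n_delete[b], replace=False), b] = 0
    return out

# Hypothetical raster: 20 trials, 50 bins of 1 ms
rng = np.random.default_rng(3)
raster = (rng.random((20, 50)) < 0.3).astype(int)
thinned = remove_spontaneous_failures(raster, 0.25, rng)
output = subtract_inhibition(thinned, np.full(50, 100.0), dt=1e-3, rng=rng)
```

In this toy case the inhibitory rate of 100 spikes/s corresponds to two deleted spikes per bin across the 20 trials, so each bin of the output holds the thinned count minus two, floored at zero.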

Statistics

Data sets were tested for Gaussianity using the Shapiro-Wilk test (Shapiro and Wilk, 1965). Within-subject comparisons were performed by paired t-test (normally distributed) or Wilcoxon signed rank test (otherwise). For interpretation of all results, a p-value less than 0.05 was deemed significant, where p-values < $10^{-5}$ are reported as p<0.001 in the text and figures. The effect size was calculated using the MES toolbox in Matlab (Hentschke and Stüttgen, 2011) and reported as Cohen’s $U_1$. No statistical methods were used to pre-determine sample size. Exact p-values and test statistics are summarized in Table 1.
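The test selection described above can be sketched with SciPy as follows. The original analysis was carried out in Matlab, so this is only an illustrative equivalent; applying the Shapiro-Wilk test to the paired differences, the function name, and the synthetic data are assumptions of this sketch.

```python
import numpy as np
from scipy import stats

def paired_comparison(a, b, alpha=0.05):
    """Paired t-test if differences look Gaussian, else Wilcoxon signed rank."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # assumption: normality is assessed on the paired differences
    _, p_norm = stats.shapiro(a - b)
    if p_norm > alpha:
        name = "paired t-test"
        stat, p = stats.ttest_rel(a, b)
    else:
        name = "Wilcoxon signed rank"
        stat, p = stats.wilcoxon(a, b)
    return name, float(stat), float(p)

# Hypothetical paired measurements for 30 units (e.g. input vs. output values)
rng = np.random.default_rng(4)
anf = rng.normal(0.0, 1.0, 30)
sbc = anf + 1.0 + rng.normal(0.0, 0.2, 30)
test_name, statistic, p_value = paired_comparison(anf, sbc)
```

Both branches are within-subject tests, which matches the paired structure of input and output measures obtained from the same unit.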

References

  1. 1
  2. 2
    Temporal low-order statistics of natural sounds
    1. H Attias
    2. CE Schreiner
    (1997)
    Advances in Neural Information Processing Systems 9:27–33.
  3. 3
    Sensitivity to interaural temporal disparities of low- and high-frequency neurons in the superior olivary complex. II. Coincidence detection
    1. R Batra
    2. S Kuwada
    3. DC Fitzpatrick
    (1997)
    Journal of Neurophysiology 78:1237–1247.
  4. 4
  5. 5
    Spatial Hearing: The Psychophysics of Human Sound Localization
    1. J Blauert
    (1997)
    36–200, Spatial hearing with one sound source, Spatial Hearing: The Psychophysics of Human Sound Localization, Cambridge, MIT Press.
  6. 6
  7. 7
  8. 8
    Inhibitory inputs modulate discharge rate within frequency receptive fields of anteroventral cochlear nucleus neurons.
    1. DM Caspary
    2. PM Backoff
    3. PG Finlayson
    4. PS Palombi
    (1994)
    Journal of Neurophysiology 72:2124–2157.
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
    Microelectrode study of superior olivary nuclei
    1. R Galambos
    2. J Schwartzkopff
    3. A Rupert
    (1959)
    The American Journal of Physiology 197:527–536.
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
    Envelope coding in the lateral superior olive. I. Sensitivity to interaural time differences
    1. PX Joris
    2. TC Yin
    (1995)
    Journal of Neurophysiology 73:1043–1062.
  21. 21
  22. 22
  23. 23
    Interaction of excitation and inhibition in anteroventral cochlear nucleus neurons that receive large endbulb synaptic endings
    1. C Kopp-Scheinpflug
    2. S Dehmel
    3. GJ Dörrscheidt
    4. R Rübsamen
    (2002)
    Journal of Neuroscience 22:11004–11018.
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
    Two separate inhibitory mechanisms shape the responses of dorsal cochlear nucleus type IV units to narrowband and wideband stimuli
    1. I Nelken
    2. ED Young
    (1994)
    Journal of Neurophysiology 71:2446–2462.
  30. 30
  31. 31
  32. 32
  33. 33
    Natural stimulus statistics
    1. P Reinagel
    2. S Laughlin
    (2001)
    Network 12:237–240.
  34. 34
  35. 35
    Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex
    1. ET Rolls
    2. MJ Tovee
    (1995)
    Journal of Neurophysiology 73:713–726.
  36. 36
    The ontogeny of inhibition and excitation in the gerbil lateral superior olive
    1. DH Sanes
    2. EW Rubel
    (1988)
    Journal of Neuroscience 8:682–700.
  37. 37
    An analysis of variance test for normality
    1. S Shapiro
    2. M Wilk
    (1965)
    Biometrika 52:591–611.
  38. 38
  39. 39
  40. 40
  41. 41
    Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus
    1. GB Stanley
    2. FF Li
    3. Y Dan
    (1999)
    Journal of Neuroscience 19:8036–8042.
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47

Decision letter

  1. Ian Winter
    Reviewing Editor; University of Cambridge, Cambridge, United Kingdom

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Signal representation is degraded by temporal sharpening through inhibition in the auditory brainstem" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Andrew King as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This communication follows up on a previous report (Keine et al., 2016) regarding the "sparsity" of information represented in spike trains in the bushy cells of the gerbil cochlear nucleus. The previous report demonstrated that inhibition was part of the mechanism involved in improving the temporal representation of certain sound features using amplitude and frequency modulated tones, and using randomized gammatone sequences; one of the observations with these experiments was that the output firing rate of the bushy cells was significantly lower than the firing rate of the inputs (as defined by pre-potential "spikes"). The previous study also focused on determination of synaptic mechanisms in structuring the spiking responses of the bushy cells. The present study extends this by using a set of "natural" environmental stimuli that have varying spectral and temporal structure, although most (with the exception of the bird song) appear as slowly modulated wideband noises with varying slow spectral slopes. The results are consistent with, and build on, the previous study, and are an interesting contribution to the literature, given that naturalistic sounds or their lab-constructed synthetic components, have been relatively sparsely studied in the auditory brainstem nuclei (primarily as human speech sounds and various reduced versions of speech), and may provide important clues as to how sensory perceptions are constructed from the somewhat deconstructive processes of the auditory periphery.

There are two really interesting results in this communication. First, the average firing rate of spherical bushy cells (SBCs) is not increased during stimulation with these sound sets, whereas the average rate in the auditory nerve fibers (deduced from the pre-potential) does increase, and this was fairly independent of the stimulus category. This is remarkably different from the responses to narrow band (tonal) stimuli, and to unmodulated wide band noise stimuli. The increase in the sparsity and reproducibility of spiking in the SBCs confirms what should be predicted from the previous study, but provides generality by extending it to a wider class of stimuli, and it is very interesting that the reproducibility and sparsity measures are higher for the natural stimuli than the previous synthetic stimuli. The second interesting result is that reconstruction of the spectro-temporal pattern of the stimulus is somewhat degraded in the SBCs, although some (but maybe not all?) of the temporal aspects are "enhanced".

Essential revisions:

1) The terms "sparsity" and "response reproducibility" should be defined clearly, so they can stand independently of the earlier paper. A reader would benefit from some context on these numbers to know what is a "good" or "bad" value.

2) Value-laden terms such as "improvements" (Results section) should be avoided. One might benefit from more nuance than is conveyed by terms such as "reduced" fidelity or accuracy, and "impoverished" correlations (Abstract, Introduction and Results section). A change in information from auditory nerve to bushy cell is expected, and could be considered selection, rather than degradation, as the title suggests. The consequence of this is not well captured in the Discussion section. There is a lot more to the auditory system than just bushy cells, after all, so hearing should be just fine if one cell type disregards a certain type of information in favor of another.

3) It would be useful to know the responses of other unit types in these studies. It would also be useful to know if the authors found any primary-like units without a prepotential. Did all cells show spike failures? How does this finding fit with other studies on primary-like units which have used both simple and complex sounds (e.g. AM, FM, steady-state vowels) and found increases in response rate? Did you record the responses to single tones to construct a PSTH and/or receptive field? If so, did the units discharge above their spontaneous rate? What did the receptive field look like? What was the distribution of BFs in these units? How should we think about specificity in the output of SBCs vs. generality of the inhibitory input?

4) Subtractive inhibition (Figure 2). It seems the goal here is to show how inhibition enhances timing precision among bushy cells with respect to auditory nerve fibers. This is an important idea and should be fleshed out more, where alternatives are clearly delineated and tested. The hypothesis seems to be that temporal precision is enhanced when inhibition tracks activity with some delay. This is reasonable based on the measured EPSP(fail) rates. How these translate into the parameters for the inhibitory model should be clearly depicted. But additional analyses are needed to conclude that the pattern of inhibition is responsible for enhanced temporal precision. For example, how is temporal precision influenced by uniformly deleted spikes (tonic inhibition), randomly deleted spikes, using an opposite pattern of spike deletion (heavier following sparse activity), activity-dependent decay of different depths (I(0)? S?) and timecourses (tau1)? Also, to make the approach clear, it would be helpful to show examples of original spike train, modeled inhibition, modified spike train, and sparsity/reproducibility before and after such manipulation. Please plot on a timescale that shows the differences to best advantage. What is the variance referred to in the Results section?

5) Population vs. paired analysis. The approach taken here was to use the entire population of auditory nerve activity and the entire population of bushy cell activity to try to reconstruct the sound signal. Please explain the reasoning behind this approach. It seems to introduce some problems, the most obvious being frequency sensitivity of individual units. That is, one does not expect a single cell to be able to represent much of a sound, and a correction for this was confusingly mentioned in passing (subsection “Stimulus reconstruction”). The critical issue is how completely the individual bushy cell represents the stimulus compared to its inputs. Doing this comparison cell by cell would be more meaningful.

6) The temporal enhancement would be expected based on the mechanisms (potassium conductances, inhibition), following observations by Joris et al. (and others, including work by the present authors) on the improvements in temporal representation of simple sounds in the bushy cell pathways. However, here the sharpened representation of temporal features of the stimulus likely does not derive from the ability of bushy cells to carry the timing of stimulus "fine structure", but it appears to be in carrying timing of envelope fluctuations, at least at the scale of analysis used here. Although this is interesting, it should be clarified, as the term "fine-structure" as used in the manuscript is not always consistent with its use in other parts of the literature, where it refers more closely to the instantaneous waveform than the waveform over a several msec period.

7) The discussion should touch more carefully on the improvement in sparsity and reproducibility for the natural stimuli compared to the synthetic stimuli used in the previous study, both in terms of potential mechanisms and the importance (if there is one) of the representation. How do the authors think this change came about, and why or how is it important?

8) The paper would benefit from discussion of other studies that have considered natural vs. artificial stimuli (e.g. Rieke, Bodnar and Bialek, (1995), Reinagel, (2001)). Finally, you should address how these observations relate to localization (or not) or other aspects of parsing and/or reconstructing the acoustic environment. Some concepts related to the framework of recent work from the last author's group (Boubenec et al., 2017) might present an interesting starting point.

https://doi.org/10.7554/eLife.29639.020

Author response

Essential revisions:

1) The terms "sparsity" and "response reproducibility" should be defined clearly, so they can stand independently of the earlier paper. A reader would benefit from some context on these numbers to know what is a "good" or "bad" value.

We have added the full definitions now in the Methods section of the revised manuscript, as well as providing an aid for interpretation there. In this process, we have noted that in the original publication the normalization was not fully spelled out, which may lead to some confusion. We propose to amend the definition there as well to align the descriptions and avoid potential confusion.

2) Value-laden terms such as "improvements" (Results section) should be avoided. One might benefit from more nuance than is conveyed by terms such as "reduced" fidelity or accuracy, and "impoverished" correlations (Abstract, Introduction and Results section). A change in information from auditory nerve to bushy cell is expected, and could be considered selection, rather than degradation, as the title suggests. The consequence of this is not well captured in the Discussion section. There is a lot more to the auditory system than just bushy cells, after all, so hearing should be just fine if one cell type disregards a certain type of information in favor of another.

We think that the by now typical attempt to squeeze too much information into a short title has created an impression that was not intended: we fully agree with the reviewer that this selection of information at the SBC junction will not be mirrored by most cells in the auditory brainstem. Other cells will likely select for different information, in line with the processing needs of their subsystem. In fact, we wanted to make exactly this point (both in the manuscript as a whole and in the Discussion section): given the 'task' of the SBCs in the auditory system, they appear to be selective for temporal information and, on the flipside, miss out on some sound-level information. While we agree that this conclusion is not entirely unexpected (given the limited amount of coding capacity available to a single neuron), the stimulus reconstruction analysis makes this notion explicit.

Another level of quantification could, for example, be an information-theoretic account that tries to specifically account for level and timing information and explicitly analyses the degree to which these are represented in the response. In accordance with the reviewer’s comment, we a) now emphasize more strongly in many places that this result is (at this point) only established for the SBC (in particular in the Discussion section, but also in the Introduction), and b) have reduced/removed the value aspects from the terminology. 'Improvements' are now only reported for the stimulus reconstructions, where the quality in fact varies between 0 (poor) and 1 (excellent).

3) It would be useful to know the responses of other unit types in these studies.

The present study specifically focused on the ANF-to-SBC synapse and the integration of excitation and inhibition on processing of natural sounds. We therefore did not record from other unit types in the CN. It is difficult to predict what their response would be like, since not many studies have compared synthetic and natural stimuli in this respect. From the present study, we can at least conclude that ANFs show an increase in firing rates to both tonal and natural stimuli.

It would also be useful to know if the authors found any primary-like units without a prepotential. Did all cells show spike failures?

Units in this study were exclusively recorded from the rostral AVCN, the location of large, low-frequency spherical bushy cells (Bazwinsky et al., 2008) in line with previous studies investigating the integration of excitation with acoustically evoked inhibition in SBCs (Kuenzel et al., 2011; Keine and Rübsamen, 2015; Keine et al., 2016).

During data collection, we did not encounter primary-like units without a prepotential, although in some units the prepotential was very small once the loose-patch configuration was established (see also the first point of the figure comments for further clarification of this issue). However, the complex waveform and the prominent EPSP are characteristic for SBCs that receive large endbulb inputs and markedly different compared to the biphasic waveform of other cell types in the CN, e.g. stellate cells.

All recorded units showed spike failures, albeit to different degrees, ranging from 2% to 73% in the present sample, consistent with previous observations (Kuenzel et al., 2011; Keine and Rübsamen, 2015; Keine et al., 2016). All units exhibited increased failure fractions during sound stimulation, regardless of their spontaneous failure fraction (see Author response image 1 below). As expected, units with high spontaneous failure rates showed lower increases in failures during stimulation, as a higher number of spontaneous failures will leave fewer transmission events that can be suppressed during acoustic stimulation. Hence, all recorded SBCs were qualitatively of a similar phenotype with some quantitative differences between them.

Author response image 1
Relation between spontaneous and driven failures.

(A) Acoustically driven failure rates always exceeded spontaneous failures, but correlated well with each other. This suggests that the effects of spontaneous failures and stimulus-driven failures (by inhibition or otherwise) are 'additive', although we cannot discern whether this is true addition (i.e. FF(s + d) = FF(s) + FF(d)) or just a 'greater than' relation (i.e. FF(s + d) > FF(s)). (B) The additional failures during stimulation (i.e. Driven – Spontaneous) showed a slight negative correlation. We think this dependence is expected, since in units with higher spontaneous failure fractions, only a smaller fraction of successful transmission events remains that can still fail during stimulation.

How does this finding fit with other studies on primary-like units which have used both simple and complex sounds (e.g. AM, FM, steady-state vowels) and found increases in response rate? Did you record the responses to single tones to construct a PSTH and/or receptive field? If so, did the units discharge above their spontaneous rate? What did the receptive field look like? What was the distribution of BFs in these units?

Yes, for every unit in this study, a pure tone tuning curve was recorded to estimate the units’ frequency response area. The response areas were similar to the ones reported in previous studies (both in our studies Keine and Rübsamen, 2015; Keine et al., 2016, and others before, e.g. Winter and Palmer, 1990; Caspary et al., 1994; Kopp-Scheinpflug et al., 2002; Dehmel et al., 2010; Kuenzel et al., 2011; Typlt et al., 2012). Given the short format of the Research Advance, and the idea that it builds upon a previous publication, we did not describe these properties again here. The BFs of the units are shown in Figure 3B as blue dots, and the receptive fields were V-shaped with high-frequency inhibitory sidebands, with inhibition found throughout the receptive field (see Keine et al., 2016 for examples and averages). Thus, the sound-driven firing rates at BF stimulation were also increased, resulting in firing rates above the spontaneous rate. This is in accordance with results from previous studies: For tonal stimuli, SBCs show a reduced response gain (Kopp-Scheinpflug et al., 2002; Kuenzel et al., 2011, 2015; Keine and Rübsamen, 2015; Keine et al., 2016) and partly non-monotonic rate level functions (Winter and Palmer, 1990; Kopp-Scheinpflug et al., 2002; Kuenzel et al., 2011, 2015; Keine and Rübsamen, 2015; Keine et al., 2016). This non-monotonicity could render sound-driven firing rates close to spontaneous ones, but for most experimental stimuli tested, the acoustically-evoked SBC firing rates are still well above the spontaneous rate, including pure tones, amplitude-modulated, frequency-modulated and complex synthetic sounds (Kopp-Scheinpflug et al., 2002; Kuenzel et al., 2011, 2015; Keine and Rübsamen, 2015; Keine et al., 2016).
To emphasize the finding of constant SBC output rates during natural stimulation, we now added an additional panel to Figure 1 (Figure 1Di) depicting the change in firing rates for both ANF input and SBC output during pure tone stimulation. We have added this information to the manuscript in the Results section when describing Figure 1D.

How should we think about specificity in the output of SBCs vs. generality of the inhibitory input?

Considering the broad on-CF inhibition, we think that inhibitory inputs are recruited over a broad spectral range and integrated over a limited amount of time (the latter is captured in our current inhibition model) whenever the acoustic stimulus is close to the unit’s CF. We think that these two factors define the instantaneous strength of the inhibition. In our previous study, we also observed increased SBC output rates in response to various (complex) synthetic sounds. However, natural stimuli differ from the synthetic stimuli used there both in their instantaneous spectral width and in other spectrotemporal properties. We hypothesize that the inhibition integrates these properties, which then leads to a stronger gain control (keeping the SBC output rates constant) and (relatively) higher increases in sparsity and reproducibility. Whether this is an adaptation to natural statistics specifically, to spectrally broad stimuli in general, or to stimuli with a particular autocorrelation is a question that remains to be addressed in future studies. We have added a whole section on this question in the Discussion.

4) Subtractive inhibition (Figure 2). It seems the goal here is to show how inhibition enhances timing precision among bushy cells with respect to auditory nerve fibers. This is an important idea and should be fleshed out more, where alternatives are clearly delineated and tested. The hypothesis seems to be that temporal precision is enhanced when inhibition tracks activity with some delay. This is reasonable based on the measured EPSP(fail) rates. How these translate into the parameters for the inhibitory model should be clearly depicted. But additional analyses are needed to conclude that the pattern of inhibition is responsible for enhanced temporal precision. For example, how is temporal precision influenced by uniformly deleted spikes (tonic inhibition), randomly deleted spikes, using an opposite pattern of spike deletion (heavier following sparse activity), activity-dependent decay of different depths (I(0)? S?) and timecourses (tau1)? Also, to make the approach clear, it would be helpful to show examples of original spike train, modeled inhibition, modified spike train, and sparsity/reproducibility before and after such manipulation. Please plot on a timescale that shows the differences to best advantage. What is the variance referred to in the Results section?

In the present study, we attempt to describe the increase in sparsity and reproducibility from the ANF input to the SBC output by a phenomenological model based on previous observations.

As suggested by the reviewer, we compared the activity-dependent subtractive inhibition with three other models of inhibition: (i) pure subtractive inhibition, i.e. removal of a certain number of events, (ii) pure divisive inhibition, i.e. removal of a certain fraction of events, and (iii) inverted activity-dependent inhibition, i.e. stronger inhibition after low ANF activity. We evaluated the results on failure fraction, sparsity and reproducibility. All parameters were chosen to best match the experimentally observed failure rates for each unit. A pure subtractive inhibition resulted in generally higher values of sparsity and reproducibility than observed in the experiments, which is likely caused by the “holes” in the PSTH for sections with low ANF activity. Divisive inhibition, i.e. scaling of the ANF response, could be well-matched to the experimental failure fractions, but failed to increase sparsity and reproducibility. The inverted activity-dependent subtractive inhibition resulted in slightly increased sparsity and reproducibility, but failed to generate failure rates in the range of experimentally observed ones, particularly missing out on units with failure rates >50%. The activity-dependent subtractive inhibition showed the best match with the experimental data in failure rates, sparsity and reproducibility. While the increase in failure fraction and sparsity could be well modeled, some spread in reproducibility was observed in some cells, which might be caused by the sensitivity of this measure to spontaneously occurring spikes. While we believe that this simple model provides a fairly good match to the observed data, it is arguably not perfect. We have now added a comparison of these different models to Figure 2 – supporting figure 1.

The parameters Io, S, and O have now been described in more detail in the Methods section alongside the introduction of the model. The chosen values for the parameters are now explained in more detail and a schematic is provided in Figure 2—figure supplement 1 to aid the reader on how these values were derived.
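The model comparison described above can be illustrated with a toy, rate-level sketch. This is not the implementation used in the study: it operates on a made-up PSTH rather than on individual events, the function names and the parameters `offset`, `fraction`, `scale` and `tau` are hypothetical, and the sparsity index is one common lifetime-sparseness measure that is assumed here for illustration only.

```python
import math

def sparsity(rate):
    """Lifetime-sparseness index: 0 for a flat PSTH, 1 for a single active bin.
    Assumed measure for illustration; not necessarily the one used in the study."""
    n = len(rate)
    mean = sum(rate) / n
    mean_sq = sum(r * r for r in rate) / n
    if mean_sq == 0:
        return 0.0
    return (1 - mean * mean / mean_sq) / (1 - 1 / n)

def subtractive(rate, offset):
    """Pure subtractive inhibition: remove a fixed amount, rectify at zero."""
    return [max(r - offset, 0.0) for r in rate]

def divisive(rate, fraction):
    """Pure divisive inhibition: remove a fixed fraction of the response."""
    return [r * (1 - fraction) for r in rate]

def activity_trace(rate, tau):
    """Leaky average of preceding activity (causal, excludes the current bin)."""
    decay = math.exp(-1.0 / tau)
    trace, level = [], 0.0
    for r in rate:
        trace.append(level)
        level = decay * level + (1 - decay) * r
    return trace

def activity_dependent(rate, scale, tau, inverted=False):
    """Subtractive inhibition whose strength follows recent activity.
    With inverted=True, inhibition is strongest after low activity."""
    trace = activity_trace(rate, tau)
    peak = max(rate)
    out = []
    for r, a in zip(rate, trace):
        drive = (peak - a) if inverted else a
        out.append(max(r - scale * drive, 0.0))
    return out

# Hypothetical ANF PSTH (spikes/s per time bin), invented for this example.
anf = [20, 20, 20, 120, 200, 150, 60, 20, 20, 180, 90, 20]

s_anf = sparsity(anf)
s_sub = sparsity(subtractive(anf, 20))
s_div = sparsity(divisive(anf, 0.4))
out_act = activity_dependent(anf, scale=0.5, tau=2.0)
out_inv = activity_dependent(anf, scale=0.5, tau=2.0, inverted=True)
```

In this sketch, scaling the PSTH leaves the sparsity index unchanged because the index is scale-invariant, whereas subtracting a constant and rectifying raises it. This is consistent with the qualitative pattern reported above for divisive versus subtractive inhibition, but the numbers themselves carry no experimental meaning.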

5) Population vs. paired analysis. The approach taken here was to use the entire population of auditory nerve activity and the entire population of bushy cell activity to try to reconstruct the sound signal. Please explain the reasoning behind this approach. It seems to introduce some problems, the most obvious being frequency sensitivity of individual units. That is, one does not expect a single cell to be able to represent much of a sound, and a correction for this was confusingly mentioned in passing (subsection “Stimulus reconstruction”). The critical issue is how completely the individual bushy cell represents the stimulus compared to its inputs. Doing this comparison cell by cell would be more meaningful.

We acknowledge that the reasoning for the choice of analysis was a bit brief in the manuscript, following the length restrictions of the 'Research Advance' format. We agree that single cell analysis has a lot of merits in its own right: the distribution of cellular characteristics can be analyzed and differences between cells can be detected. This is the aim of most sensory processing studies, for example also in our previous study (Keine et al., 2016), where SBCs were characterized by their rate-level functions, spectrotemporal receptive fields, modulation transmission, phase locking, sparsity and reproducibility of response.

However, in the present manuscript, our aim was different: we wanted to quantify the representation of the stimulus on the population level, i.e. in particular focusing on the combined 'information' available in the population. Importantly, if multiple neurons have similar BFs, their responses can be used together to reconstruct the stimulus, which could for example be relevant if firing rate limitations allow features of the stimulus to be encoded only partially by each neuron (even in one frequency location).

As the reviewer points out, this overcomes the limitation a single cell representation has, namely a very limited 'field of view' in frequency, hence, the impossibility to represent the entire stimulus. If one performed single cell reconstruction, the result would be very limited, and the results obtained are already captured by more traditional methods of analysis, which quantify the temporal precision of response.

In essence the population analysis accounts for the well-documented strategy of population coding, employed throughout the mammalian nervous system. Population decoding, as performed here in an explicit manner, attempts to estimate the ensemble representation of the stimulus (thus mimicking what other neurons in the system could decode from it).
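The argument that neurons with similar BFs can jointly represent what each encodes only partially can be sketched with a deliberately simplified example. The names `rate_limited_response` and `pearson`, the envelope values, and the rule that each neuron can respond only in every third time bin are all assumptions made for illustration; this is not the reconstruction method used in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rate_limited_response(envelope, n_neurons, index):
    """Hypothetical neuron that follows the envelope only in its own time bins
    (every n_neurons-th bin), mimicking a firing rate limitation."""
    return [e if t % n_neurons == index else 0 for t, e in enumerate(envelope)]

# Made-up stimulus envelope at a single frequency location.
envelope = [0, 2, 5, 3, 0, 1, 4, 6, 2, 0, 0, 3]
n_neurons = 3

# Three neurons with the same BF, each encoding a different subset of bins.
responses = [rate_limited_response(envelope, n_neurons, i) for i in range(n_neurons)]

# Simplest possible population readout: sum the responses per time bin.
population = [sum(bins) for bins in zip(*responses)]

single_corr = pearson(responses[0], envelope)
population_corr = pearson(population, envelope)
```

In this toy case the summed population response recovers the envelope, while any single neuron correlates less well with it. Real reconstructions would additionally have to handle noise, response latencies and multiple frequency channels, none of which is modeled here.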

Clearly, the stimulus reconstruction performed here with a limited sample of cells only constitutes a coarse estimate (maybe a lower bound) on the representation by the entire population. We would, however, hypothesize that the present analysis remains meaningful, since we perform the reconstructions for all three ANF, SBC, and EPSPfail components for paired synapses. Hence, differences should reflect those introduced specifically at the endbulb junction.

Regarding the description of this process in the Materials and methods section: "The stimulus was restricted to 200-4500 Hz, slightly extending the range of the characteristic frequencies (CFs, [1.18, 3.05] kHz) of the present sample (…)." This description only indicates that the stimulus, which in principle ranges up to 16 kHz, was limited for the reconstruction to the frequencies encoded by the sampled cells. As the reviewer points out, trying to reconstruct beyond this range would be as futile as trying to reconstruct a wide range of frequencies from a single neuron.

We hope to have convinced the reviewer of the differences and utility of the present analysis. Reiterating that the single cell analysis is largely covered in the forward sense in the previous publication, and in Figures 1 and 2 of the present manuscript, we provide a shortened version of the above argument to motivate the analysis in the introduction, methods and results of the present manuscript.

6) The temporal enhancement would be expected based on the mechanisms (potassium conductances, inhibition), following observations by Joris et al. (and others, including work by the present authors) on the improvements in temporal representation of simple sounds in the bushy cell pathways. However, here the sharpened representation of temporal features of the stimulus likely does not derive from the ability of bushy cells to carry the timing of stimulus "fine structure", but it appears to be in carrying timing of envelope fluctuations, at least at the scale of analysis used here. Although this is interesting, it should be clarified, as the term "fine-structure" as used in the manuscript is not always consistent with its use in other parts of the literature, where it refers more closely to the instantaneous waveform than the waveform over a several msec period.

We agree that our terminology was not appropriate and could lead to confusion with the existing literature. We now refer to it as “envelope fine-structure” to better distinguish it from the pure-tone phase-related fine-structure. The term has been replaced throughout the manuscript.

7) The discussion should touch more carefully on the improvement in sparsity and reproducibility for the natural stimuli compared to the synthetic stimuli used in the previous study, both in terms of potential mechanisms and the importance (if there is one) of the representation. How do the authors think this change came about, and why or how is it important?

We welcome the opportunity to add an interpretation of the respective findings. Overall, we hypothesize that the relatively higher increase in sparsity and reproducibility for natural stimuli is in fact relevant, although we would not claim that it pertains solely to natural stimuli; rather, it should extend to other, synthetic stimuli that match the natural stimuli in some of their properties. What these subsets precisely are is an interesting question. In studying the cortex, researchers have previously attempted to combine natural speech stimuli with synthetic, general stimuli (SPORC = Speech + TORC, where TORCs are band-limited white-noise-type stimuli) (David et al., 2009). There the purpose was to estimate neural tuning in the context of naturalistic temporal modulations. In the SBCs, further experiments using similar kinds of hybrid stimuli would be required to tease apart what drives the increase in sparsity and reproducibility. We predict that the high spectral density of natural stimuli is a main contributor to the level of increase (compare e.g. Figure 7A in Keine et al. 2016 with Figure 1B here), as the tuning of the inhibitory input is quite broad and will thus be driven strongly by broad-band stimulation. Another difference between artificial and natural stimuli is that the latter will to some degree contain correlations of second order (for textures, such as sand, but even higher for bird song), while the placement of the sounds in the RGS stimulus was chosen randomly. However, we have no evidence at this point to suspect that this would have an influence on sparsity and reproducibility. While there could in addition be a dependence on the level of stimulation, this is unlikely to have contributed here, as the stimuli were presented at the same average level.
A briefer version of this argument has been added to the Discussion, combined with the response to Essential Revision #8 below, which also addressed the relation between natural and artificial stimuli.

8) The paper would benefit from discussion of other studies that have considered natural vs. artificial stimuli (e.g. Rieke, Bodnar and Bialek, (1995), Reinagel, (2001)). Finally, you should address how these observations relate to localization (or not) or other aspects of parsing and/or reconstructing the acoustic environment. Some concepts related to the framework of recent work from the last author's group (Boubenec et al., 2017) might present an interesting starting point.

We agree that a number of important thematic relationships were not appropriately addressed in the previous version, largely due to word limitations. Indeed, processing and encoding of natural sounds has been a recurring topic in (auditory) neuroscience, which has in fact often highlighted an apparent match between (certain) natural stimuli and the neural activity/system. Before, we only mentioned the classical findings by Lewicki, (2002), but we have now added related findings by Attias and Schreiner, (1997); Nelken et al., (1999); Hsu et al., (2004); Chechik and Nelken, (2012), reviewed in Theunissen and Elie, (2014) as well as your excellent suggestion of Rieke et al., (1995). We have also taken up your suggestion of interpreting the function of the inhibitory gain control from the perspective of predictive/contextual encoding and have dedicated the second-to-last section in the Discussion to this topic.

https://doi.org/10.7554/eLife.29639.021

Article and author information

Author details

  1. Christian Keine

    1. Carver College of Medicine, Department of Anatomy and Cell Biology, University of Iowa, Iowa City, United States
    2. Faculty of Bioscience, Pharmacy and Psychology, University of Leipzig, Leipzig, Germany
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    christian.keine@gmail.com
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8953-2593
  2. Rudolf Rübsamen

    Faculty of Bioscience, Pharmacy and Psychology, University of Leipzig, Leipzig, Germany
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Writing—review and editing
    Competing interests
    No competing interests declared
  3. Bernhard Englitz

    Donders Center for Neuroscience, Department of Neurophysiology, Radboud University, Nijmegen, Netherlands
    Contribution
    Conceptualization, Software, Formal analysis, Validation, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9106-0356

Funding

Deutsche Forschungsgemeinschaft (RU 390/19-1)

  • Rudolf Rübsamen

Deutsche Forschungsgemeinschaft (RU 390/20-1)

  • Rudolf Rübsamen

European Commission (Marie Sklodowska Curie Fellowship 660328)

  • Bernhard Englitz

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by the German Research Foundation (DFG Priority Program 1608 ‘Ultrafast and temporally precise information processing: Normal and dysfunctional hearing’ [RU390/19–1, RU390/20–1]), and Marie Sklodowska Curie Fellowship 660328. The authors thank the three anonymous reviewers for their constructive feedback which substantially improved the manuscript. The authors declare no competing financial interests.

Ethics

Animal experimentation: All experiments were approved by the Saxonian District Government, Leipzig (TVV 06/09), and conducted according to the European Communities Council Directive (86/609/EEC).

Reviewing Editor

  1. Ian Winter, University of Cambridge, Cambridge, United Kingdom

Publication history

  1. Received: June 23, 2017
  2. Accepted: September 25, 2017
  3. Accepted Manuscript published: September 25, 2017 (version 1)
  4. Version of Record published: October 3, 2017 (version 2)

Copyright

© 2017, Keine et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 477
    Page views
  • 78
    Downloads
  • 5
    Citations

Article citation count generated by polling the highest count across the following sources: Scopus, Crossref, PubMed Central.
