Signal integration at spherical bushy cells enhances representation of temporal structure but limits its range

  1. Christian Keine  Is a corresponding author
  2. Rudolf Rübsamen
  3. Bernhard Englitz
  1. University of Iowa, United States
  2. University of Leipzig, Germany
  3. Radboud University, Netherlands


Neuronal inhibition is crucial for temporally precise and reproducible signaling in the auditory brainstem. Previously we showed that for various synthetic stimuli, spherical bushy cell (SBC) activity in the Mongolian gerbil is rendered sparser and more reliable by subtractive inhibition (Keine et al., 2016). Here, employing environmental stimuli, we demonstrate that the inhibitory gain control becomes even more effective, keeping stimulated response rates equal to spontaneous ones. However, what are the costs of this modulation? We performed dynamic stimulus reconstructions based on neural population responses for auditory nerve (ANF) input and SBC output to assess the influence of inhibition on acoustic signal representation. Compared to ANFs, reconstructions of natural stimuli based on SBC responses were temporally more precise, but the match between acoustic and represented signal decreased. Hence, for natural sounds, inhibition at SBCs plays an even stronger role in achieving sparse and reproducible neuronal activity, while compromising general signal representation.


Acoustically evoked inhibition plays a crucial role in shaping the neuronal activity already at the first central station of the auditory system (Caspary et al., 1994Kopp-Scheinpflug et al., 2002; Dehmel et al., 2010Kuenzel et al., 2011; Keine and Rübsamen, 2015; Keine et al., 2016). In a previous study, we demonstrated that inhibition on spherical bushy cells (SBCs) renders their output sparser and more reproducible than their auditory nerve fiber (ANF) input (Keine et al., 2016). These transformations persist over a wide range of acoustic stimuli and sound intensities, and can be approximated by an inhibition which we modelled as a scaled subtraction. Functionally, this inhibition emphasizes reliable events and controls the response gain across a wide range of sound levels.

Since in most previous studies, the neuronal activity was recorded either during simple or complex, but synthetic acoustic stimuli, it remains unknown if the inhibitory effect on sound encoding persists also in natural acoustic environments. Here, we extend the range of tested stimuli to natural sounds approximating a gerbil’s environment. In such a natural context, the inhibitory influence on the SBC output activity proves even stronger than under complex, but non-natural stimulus conditions tested before. In particular, while for most synthetic stimuli the SBC firing rates generally increase, they remained constant under natural acoustic stimulation.

While transformations of this kind can emphasize certain aspects of the sensory input, they may also deemphasize others. We study this trade-off directly by performing stimulus reconstruction from the population of cells, a technique which has already been successfully applied in cortical recordings (Stanley et al., 1999; Mesgarani et al., 2009). In contrast to single-cell analysis, this population-based technique provides an estimate of the overall stimulus information available in a group of neurons. Reconstructions of this type have previously been successfully employed to identify the effect of attention on the neural response (Mesgarani and Chang, 2012). We employ the respective reconstructions to analyze the effect of inhibition in the represented stimulus spectrogram.

We find stimulus reconstructions based on the ANF input to be overall more accurate than those from the SBC output. Surprisingly, the share of ANF inputs which are not transmitted to the SBC output (partly blocked by inhibition, Kuenzel et al., 2011; Keine and Rübsamen, 2015; Keine et al., 2016) delivered the most accurate reconstructions, but were temporally more variable. We find that the overall fidelity of representing the stimulus is reduced in SBCs due to the inhibitory gain control, resulting in a lower variance in stimulus reconstruction. Consistent with their role in sound localization, the SBCs’ inhibition-shaped output appears to be more focused on restricted, short-term signal representation during variable stimulus conditions, while the overall information about the stimulus is reduced.


We recorded from spherical bushy cells (SBCs) in the AVCN of anesthetized gerbils in vivo to understand the influence of neuronal inhibition on the encoding of complex acoustic sounds. For this purpose, we presented real environmental sounds (e.g. walking on gravel, singing birds), and performed an explicit population decoding which allows the analysis of the stimulus representation in its original form.

Modulation of SBC responses under natural acoustic stimulation

To directly investigate the transformation at the ANF-SBC synapse, the neuronal response was separately analyzed for EPSPs which trigger an output spike (EPSPsucc) and EPSPs which fail to trigger SBC activity (EPSPfail) as described previously (Figure 1A) (Keine et al., 2016). As shown before, these failures of transmission are mostly caused by inhibition at the ANF-SBC junction (Kuenzel et al., 2011; Keine and Rübsamen, 2015; Keine et al., 2016). The acoustic stimulus was composed of seven different segments of varying spectral breadth and featuring rapid amplitude modulations (Figure 1B+C). First, we recorded the change in firing activity of ANF input and SBC output for simple pure tones at the units’ characteristic frequency. Both ANF input and SBC output rates increased during pure tone stimulation (ANFspont = 71.3 ± 31.8 Hz vs. ANFtone = 234.3 ± 53.9 Hz; Δ = 163 ± 59.3 Hz, p<0.001; SBCspont = 43.4 ± 18.3 Hz vs. SBCtone = 97.3 ± 33.9 Hz, Δ = 53.9 ± 27.9 Hz, p<0.001, see Table 1 for additional details of statistical tests, and Figure 1—source data 1, Figure 1Di). During natural acoustic stimulation, the ANF input firing rates also increased (ANFspont = 71.3 ± 31.8 Hz vs. ANFnatural = 129.7 ± 38.4 Hz, Δ = 58.4 ± 25.5 Hz, p<0.001), however, contrary to pure tone stimulation, the SBC output activity remained unchanged (SBCspont = 43.4 ± 18.3 Hz vs. SBCnatural = 43.1 ± 22.1 Hz, Δ = 0.3 ± 15.5 Hz, p=0.9, Figure 1C for representative trace and Figure 1Dii for population data). This effect was persisted when the different stimulus segments were analyzed separately (Figure 1Diii). While the increase in SBC firing rates for tones is consistent with previous studies using various synthetic stimuli (Kopp-Scheinpflug et al., 2002; Kuenzel et al., 2011; Keine and Rübsamen, 2015; Keine et al., 2016), the constancy for natural stimulation has not been demonstrated before.

Acoustic stimulation with environmental sounds increases auditory nerve firing while leaving the SBC firing rates constant.

(A) Representative voltage trace of SBC recording during acoustic stimulation. Voltage signals could be divided into EPSP followed by postsynaptic AP (blue dots) and EPSPs which fail to trigger an AP (gray dots). The sum of both types of events comprised the ANF input. (B) The acoustic stimulus was composed of a series of environmental sounds (i.e. rain falling, walking on gravel, bird singing, shoveling sand, ripping grass, walking through forest, walking on fallen leaves), containing naturally occurring frequency and amplitude modulations. Top: amplitude profile; bottom: spectrogram. The stimulus was presented at a mean sound pressure level of 40 dB SPL. PSD = power spectral density. (C) Firing rates of one representative cell during acoustic stimulation (CF = 2.6 kHz, bin size 0.1 s). Note that the firing rate fluctuations at the ANF level (orange) are higher than at the SBC level (blue). Arrows at the right indicate spontaneous firing rates in the absence of sound. (D) Population data on firing rate changes for ANF input and SBC output during pure tone and natural sound stimulation. (i) Pure tone stimulation at the units’ characteristic frequency resulted in a firing rate increase in both ANF input and SBC output. (ii) While during stimulation with natural sounds, the average ANF firing increased comparably to pure tone stimulation, the average SBC firing remains at the level of spontaneous activity. (iii) This effect was consistently observed throughout different stimulus segments of the natural sound stimulus. Horizontal lines indicate the average spontaneous firing rate in the absence of sound.
Figure 1—source data 1

Firing rates of auditory nerve (ANF) input and Spherical Bushy Cell (SBC) output during spontaneous activity, pure tone and natural acoustic stimulation.

Data are in Hz.
Table 1
Summary of statistical comparisons.
SourceParameterGroup 1 (mean ± SD)Group 2 (mean ± SD)Test statisticsStatistical test
Figure 1
Panel Di
(pure tones)
Firing Rate ANFSpont = 71.3 ± 31.8 HzStim = 234.3 ± 53.9 Hzt(df = 31) = 15.5, p=3.5e-16, U1=0.87paired t test
Firing Rate SBCSpont = 43.4 ± 18.3Stim = 97.3 ± 33.9t(31) = –10.9, p=3.5e-12, U1=0.59paired t test
Panel Dii
(natural sounds)
Firing Rate ANFSpont = 71.3 ± 31.8 HzStim = 129.7 ± 38.4 Hzt(31) = 12.9, p=4.9e-14, U1=0.3paired t test
Firing Rate SBCSpont = 43.4 ± 18.3Stim = 43.1 ± 22.1t(31) = 0.12, p=0.9, U1=0.03paired t test
Figure 2
Panel AThreshold EPSPSpont = 7.5 ± 2.4 V/sStim = 9 ± 2.8 V/st(31) = 10.2, p=1.8e-11, U1=0.1paired t test
Panel BFailure FractionSpont = 0.36 ± 0.2Stim = 0.65 ± 0.17t(31) = 16.3, p=9.1e-17, U1=0.28paired t test
Panel CiSparsityANF input = 0.09 ± 0.05SBC output = 0.17 ± 0.07t(31) = 6.9, p=8.2e-8, U1=0.17paired t test
Panel DiReproducibilityANF input = 0.13 ± 0.09SBC output = 0.28 ± 0.18Z(N=32) = 4.9, p=8.8e-7, U1=0.2Wilcox. sig.-rank
Figure 3
Panel K

Corr. SectionANF = 0.66 ± 0.14EPSP = 0.71 ± 0.11Z(N=10) = 2.8, p=0.006*, U1=0.1Wilcox. sig.-rank
Corr. SectionANF = 0.66 ± 0.14SBC = 0.56 ± 0.16Z(N=10) = 2.8, p=0.006*, U1=0.1Wilcox. sig.-rank
Corr. SectionEPSP = 0.71 ± 0.11SBC = 0.56 ± 0.16Z(N=10) = 2.8, p=0.006*, U1=0.35Wilcox. sig.-rank
Corr. Time SectionANF = 0.68 ± 0.14EPSP = 0.71 ± 0.14Z(N=10) = 2.7, p=0.0117*, U1=0.1Wilcox. sig.-rank
Corr. Time SectionANF = 0.68 ± 0.14SBC = 0.58 ± 0.15Z(N=10) = 2.7, p=0.0117*, U1=0.15Wilcox. sig.-rank
Corr. Time SectionEPSP = 0.71 ± 0.14SBC = 0.58 ± 0.15Z(N=10) = 2.8, p=0.006*, U1=0.25Wilcox. sig.-rank
Corr. Freq. SectionANF = 0.45 ± 0.042EPSP = 0.44 ± 0.071Z(N=10) = 1.6, p=0.39*, U1=0.2Wilcox. sig.-rank
Corr. Freq. SectionANF = 0.45 ± 0.042SBC = 0.34 ± 0.058Z(N=10) = 2.8, p=0.006*, U1=0.85Wilcox. sig.-rank
Corr. Freq. SectionEPSP = 0.44 ± 0.071SBC = 0.34 ± 0.058Z(N=10) = 2.8, p=0.006*, U1=0.5Wilcox. sig.-rank
Factor 1Factor 2
Figure 4
Panel EAutocorrelationHeight: p=2.9e-25Response Type: p=0.002ANOVA, 2-factor
Panel FSpectraMod. Rate: p=1.7e-6Response Type: p=2.1e-51ANOVA, 2-factor
Panel GFreq. XCorrDelta Freq: p=1.1e-14Response Type p=0.023ANOVA, 2-factor
  1. *p-values Bonferroni-corrected for N = 3 comparisons.

The unchanged average firing rates of the SBC output during natural acoustic stimulation was accompanied by an increase in threshold EPSP (Spont = 7.5 ± 2.4 V/s vs. Stim = 9 ± 2.8 V/s, Δ = 1.4 ± 0.8 V/s, p<0.001, see Figure 2—source data 1) and failure fraction (Spont = 0.36 ± 0.2 vs. Stim = 0.65 ± 0.17, Δ = 0.29 ± 0.1, p<0.001, Figure 2A) (see also Keine et al., 2016). Together with previous results, this indicates a strong influence of inhibition, which limits the increase in average SBC firing and thereby effectively regulates the SBC output gain. The quality of the neuronal response was assessed by calculating the sparsity (Figure 2B) and response reproducibility (Figure 2C) for both the complete natural stimulus and the different stimulus segments individually. Consistent with our previous results using synthetic sounds, both sparsity and reproducibility of the SBC output were increased compared to the ANF input (sparsity: ANF input = 0.09 ± 0.05 vs. SBC output = 0.17 ± 0.07, Δ = 0.08 ± 0.06, p<0.001; reproducibility: ANF input = 0.13 ± 0.09 vs. SBC output = 0.28 ± 0.18, Δ = 0.14 ± 0.12, p<0.001). Notably, while the absolute values of sparsity and reproducibility were lower for natural sounds compared to the complex, but synthetic stimuli used previously (Keine et al., 2016), the relative increase of both metrics at the ANF-SBC junction was larger during natural acoustic stimulation (sparsity: natural = 105 ± 98% vs. synthetic = 48 ± 49%, p=0.004; reproducibility: natural = 118 ± 74% vs. synthetic = 80 ± 59%, p=0.024, t-test) (Keine et al., 2016).

Figure 2 with 1 supplement see all
SBC output exhibits increased sparsity and reproducibility compared to ANF input which can be attributed to activity-dependent subtractive inhibition.

(A) During acoustic stimulation the threshold EPSP for AP generation (left) was increased and consequently so was the failure fraction (right), indicating strong inhibition during acoustic stimulation with natural sounds. (B) (i) The sparsity of the neuronal response was separately calculated for the ANF input and the SBC output. The SBC output showed consistently higher sparsity than the ANF input. (ii) The increase in sparsity from the ANF input (orange) to the SBC output (blue) was consistently observed for the different stimulus segments. Simulated (subtractive) inhibition (ANF +SI, green) resulted in similar increases in sparsity. Notably, for conditions in which the sparsity of the ANF input was high (i.e. 'sand'), the SBC output did not increase further. (C) (i) Similar to sparsity, the reproducibility of the neuronal response increased at the SBC level. (ii) Again, this effect was consistent for the different stimulus segments and well approximated by simulating subtractive inhibition.
Figure 2—source data 1

Threshold EPSP (in V/s) and failure fraction during acoustic spontaneous activity and natural acoustic stimulation.

Values of sparsity and reproducibility for ANF input and SBC output for both the complete stimulus and separated in stimulus segments.

These results corroborate that acoustically evoked inhibition significantly shapes the SBC output activity and results in sparser and more reproducible SBC firing compared to ANF input. Next, we simulated a rate-dependent subtractive inhibition (see Materials and methods and Figure 2 – supporting Figure 1 for details) and compared the simulation results to the experimental data. We found that the simulated response (Figure 2Bii, green) showed changes similar to the measured SBC output, suggesting that activity-dependent subtractive inhibition is a possible candidate mechanism in shaping SBC output activity during acoustic stimulation.

Dynamic stimulus reconstruction from population responses

While the gain control at the SBC output seems beneficial for the input to coincidence detector neurons in the MSO, this increase in precision and reproducibility might be achieved at the expense of overall stimulus representation. This is already indicated by the flattened rate-level curves in SBCs compared to ANFs (Keine et al., 2016, Figure 3). We therefore estimated how well the neuronal responses of ANFs, SBCs and also the failed EPSPs represent the acoustic stimulus.

Figure 3 with 1 supplement see all
Stimulus representation in the SBC responses is less accurate than in the ANF activity or in the inhibited EPSPs (EPSPfail).

(A) Stimulus reconstruction was performed by estimating linear reconstruction kernels (2) for ANF-, EPSPfail- and SBC responses (1), separately. The respective reconstructions were then used to predict stimulus reconstructions (3), which were then compared to the real stimulus used in the estimation (4). (B) Spectrogram of the real stimulus. The frequency range was reduced from its original range of 16 kHz (see Figure 1B) to the range represented in the ANFs’ receptive fields (0.3–4.4 kHz, CFs shown as blue dots on the left). Color scales are identical for all spectrograms. PSD = power spectral density. (C) The population response of the ANFs sorted by CF (top) and the ANF-reconstructed stimulus (bottom). The global structure and even the envelope fine-structure is preserved in the reconstruction. For more finely resolved spectrograms see Figure 4. (D) The rate of EPSPfail sorted as above (top) and the EPSPfail-reconstructed stimulus (bottom). Again, global structure and envelope fine-structure are preserved with inaccuracies in the overall range. (E) The SBC AP population response (top) and the SBC-reconstructed stimulus (bottom). While the envelope fine-structure appears again preserved, the range of the reconstruction is much more limited, i.e. relatively faint parts appear louder in the reconstruction louder than expected (e.g. around 9 s), and vice versa loud parts appear fainter (e.g. after 6 s). (F) The joint histogram across levels between real (abscissa) and reconstructed (ordinate) for ANF responses. The correlation between the two is evident (compare to grey diagonal representing x = y), with a slight deviation below −23 dB, were the reconstructed stimulus did not cover low enough levels (yellow line indicates linear regression). (G) The EPSPfail-based joint histogram with true level exhibits an overall similar shape as the SBCs (see panel J for detailed comparison). (H) The SBC-based joint histogram is more widely distributed around the diagonal and limited in range (see panel I for detailed comparison). (I) Subtraction of the SBC-based from the ANF-based histogram indicates an increase in width apparent by the negative (blue) margins and the positive (red) spine. (J) Subtracting instead the EPSPfail-based from the ANF-based histogram, leads to a much smaller difference with an even better correlation around the diagonal for EPSPfail (blue parts on diagonal) for low and high levels. (K) The correlation between the real and the reconstructed stimulus was significantly worse for SBCs compared with either ANF or EPSPfail. Correlation was mostly governed by temporal (middle), rather than spectral (right) variations for all three signals (n = 10 cross-validation sections, based on 32 neurons, *p<0.05, **p<0.01, see Table 1 for exact p-values).
Figure 3—source data 1

Correlation between real acoustic and reconstructed stimulus based on ANF input, SBC output and EPSPfail signals.

Responses of sensory neurons covary with certain aspects of an externally presented stimulus. For single neurons, this covariation is thus often quantified in relation to certain stimulus properties (frequency, sound level, lag, etc.) establishing different types of receptive fields (as in Keine et al., 2016). While informative about single-cell properties, these analyses fail to provide a more complete understanding of the representation on the population level, which assesses the different stimulus aspects jointly. For this purpose, it is convenient to combine the population responses and relate them to the original stimulus representation. One general approach of this kind is stimulus reconstruction (Stanley et al., 1999; Mesgarani et al., 2009; Mesgarani and Chang, 2012), which performs a prediction of the stimulus based on a multitude of neural responses (see Figure 3A for illustration and Materials and methods for details).

Stimulus reconstruction based on the ANF responses (Figure 3C) provided a faithful estimate of the original stimulus (Figure 3B, reconstructed frequency range was restricted to encompass the cell’s receptive fields, see Figure 4 for zoomed samples), in particular representing large fluctuations in sound level. EPSPfail-based reconstructions (Figure 3D) appeared similar with even more pronounced representation of sound level, apparent in the population dynamics (top, grey). The SBC-based reconstruction (Figure 3E), on the other hand, showed decreased overall representation of level dynamics with both high and low levels closer to the average stimulus level (average gray level, see Discussion for a mechanistic interpretation).

Envelope fine-structure properties of the reconstructed stimulus differ between ANF and SBC responses.

(A) A 250 ms snippet of the real spectrogram (Figure 3B) zoomed in. The yellow box indicates a region for comparing the across frequency correlations across the different spectrograms (see G). (B) The reconstructed stimulus from the ANFs shows that temporal features of the stimulus can be reconstructed down to approximately 5–10 ms (visual estimate), which is coarser than the resolution of the spectrogram (2 ms). (C) The EPSPfail reconstruction appears similar to the ANF reconstruction, with an overall smoother appearance. (D) The SBC reconstruction appears more ‘vertical’, that is, with more correlation across frequencies (see G) and less modulation, but otherwise temporally sharp (see E). (E) The temporal precision of the different reconstructions was assessed by the width of the autocorrelation (inset: width of peak), resolved at multiple heights (inset: horizontal lines, black to red) relative to the correlation at Δt = 0 (inset: maximum). The SBC (blue) reconstruction was most precise, while the EPSPfail (grey) was least precise (2-way ANOVA with factors ‘relative height’ and ‘response type’, see panel for p-values, n = 10 stimulus sections). (F) The emphasis of the temporal modulations was overall similar with a significant overrepresentation of the 100–160 Hz in the SBC reconstruction (PSD = power spectral density, 2 SEM shown, however, very small variation, see panel for p-values, black dots indicate regions of significant deviation with False-Discovery-Rates at p<0.001, Benjamini and Hochberg, 1995). (G) The spectral correlation of SBC was larger than for ANFs and EPSPfail for large frequency separations (>2 kHz, see panel for p-values). Correlations were computed for different frequency separations (abscissa), but within each time-bin.
Figure 4—source data 1

Properties of the reconstructed stimulus based on ANF, SBC and EPSPfail activity.

We compared the dynamic range of the individual reconstructions by computing conditional level densities (CLD, see Methods for details). Each entry is a conditional probability Pcond (shown in grayscale) of a level in the reconstruction (ordinate), given that the same spectrotemporal bin in the real stimulus had a given level (abscissa). The CLDs (Figure 3F–H) reflected the differences in level representation (described above) by steeper slopes (yellow lines) for ANFs and EPSPfail. All three slopes were differed from each other, that is the 95% confidence intervals of the slopes are non-overlapping (ANF: [0.54 ,0.542], EPSPfail: [0.6 ,0.602], SBC: [0.427, 0.429]). To directly compare the CLDs, we subtracted pairs of CLDs for two response types (Figure 3I/J). Differences in level representation were particularly salient at the high and low-level edges (high and low level ends).

The overall reconstruction quality was quantified by cross-validated correlation (Figure 3K, Materials and methods) and was lower for SBCs (orange) than for ANFs (blue, p=0.017, Wilcoxon signed rank test, n = 10 stimulus sections, Bonferroni-corrected, same tests and n below, see Figure 3—source data 1). Surprisingly, EPSPfail-based reconstruction quality was better than for ANFs (Figure 3K, left, grey, p=0.006) despite a lower number of spikes compared to the ANF input. Temporal dynamics contributed most to the reconstruction quality (Figure 3K, middle, p<0.02 for all pairwise comparisons), while the frequency representation was comparably inaccurate (Figure 3K, right, p<0.006 for SBC vs. ANF/EPSPfail, but p=0.39 for EPSPfail vs ANF). Importantly, both temporal and frequency correlations were decreased for SBC-based reconstructions (p<0.02 for both). This was at least partially caused by the inhibitory gain control, reflected by a reduction in the variance of the reconstructed stimulus (ANF: 21.1 ± 3.3 dB, EPSPfail: 22.3 ± 4.1 dB, SBC: 15.9 ± 2.0 dB, SBC vs. ANF: p<0.005 and EPSPfail: p<0.005, EPSPfail vs. ANF: p=0.32). While the reconstruction quality could potentially be improved with a larger sample size, notably, 32 ANFs lead to a comparable reconstruction quality as >250 neurons in the primary auditory cortex (Mesgarani et al., 2009). Lastly, reconstructions from SBC responses simulated as ANF responses subjected to subtractive inhibition (as in Figure 2, ANF + SI, see Methods for details and Figure 2—figure supplement 1) were statistically comparable to real SBC reconstructions (Figure 3—figure supplement 1).

While the SBC-based reconstruction indicates an overall less faithful representation of the acoustic stimulus, they also reflect the envelope fine-structure improvement (Figure 4A–D) demonstrated previously in SBC responses (Dehmel et al., 2010; Keine et al., 2016). The autocorrelation of the SBC reconstruction within a frequency band was sharper compared to ANFs and EPSPfail (Figure 4E, p<0.001 for relative height, p<0.002 for response type, 2-way ANOVA), where the autocorrelation width was assessed at different levels relative to the peak of the autocorrelation (abscissa in Figure 4E). This sharpening is probably due to a highlighted frequency range between 100–150 Hz (corresponding to a period of 7–10 ms) in their power-spectrum (Figure 4F, p<0.001 for modulation rate, p<0.001 for response type, 2-way ANOVA on the range of 100–150 Hz), which may correspond to the inhibitory time-constant measured in vivo (~10 ms, Nerlich et al., 2014; Keine and Rübsamen, 2015; Keine et al., 2016). Finally, the correlation across frequencies (within a given time bin) was increased for the SBC output compared to the ANF input (Figure 4G, p<0.001 for frequency distance, p<0.02 for response type, 2-way ANOVA). We interpret this as a focus on temporal events in any frequency location, with a corresponding loss in representing frequency fine-structure (compare also Figure 4B/D).

Taken together, we found that during natural acoustic stimulation, SBC output activity exhibited increased sparsity and reproducibility with SBC firing rates unchanged compared to spontaneous activity. While the SBC output encoded temporal features of the acoustic signal with higher fidelity, the overall stimulus was represented less faithfully compared to the ANF input.


The auditory system faces the challenge to localize and identify sounds in complex acoustic environments. We find that already at an initial stage of the central auditory system, the neural representation of natural sounds is conditioned to be sparser and more reproducible at the SBCs compared to its ANF input. This effect was even larger compared to other complex sounds, consistent with theoretical predictions (Lewicki, 2002; Smith and Lewicki, 2006). While signal integration at SBCs thus supports the extraction of temporal features, we found that their ability to represent the stimulus across all stimulus levels is limited in comparison to the ANF input.

The reduction in stimulus representation appears to be largely caused by a compression in the representation, leading to a reduced dynamic range (Figure 3F–H, in particular 3I) which missed out on the low and high levels in the stimulus (Figure 3C–E). This reduction in represented stimulus range resulted in an overall reduced variance in the SBC stimulus reconstruction. We hypothesize that this limitation in reconstruction gain is a consequence of the inhibitory control on the SBCs’ output gain, which flattens their rate-level curves (Keine et al., 2016, Figure 3). The inhibitory modulation at SBCs appears to specifically remove these extreme levels, as indicated by the improved representation based on the blocked ANF inputs only (Figure 3D). Remarkably, the latter representation even improves in comparison with the overall ANF input, indicating that the reduced SBC representation is not due to their lower firing rate in comparison with the ANFs.

All stimulus information available to the auditory system is encoded by the ANF responses and the processing in downstream nuclei will entail emphasizing different subsets of this information. Typically, this has detrimental effects on the non-emphasized part. In the present case, the SBC’s focus on temporal information limits the representation of sound level information. The latter is relevant for loudness-based localization (in the azimuth via ILDs, Galambos et al., 1959; Sanes and Rubel, 1988; Joris and Yin, 1995; Batra et al., 1997), estimation of elevations using spectral cues introduced via head-related transfer functions (Blauert, 1997; Grothe et al., 2010), and potentially for sound identification. The ascending input to the LSO (high-frequency SBCs, Warr, 1966; Glendenning et al., 1985; Shneiderman and Henkel, 1985; Cant and Casseday, 1986; Smith et al., 1993) as well as signal processing in other regions of the cochlear nucleus (e.g. cells in the DCN, Nelken and Young, 1994) may emphasize other features of the acoustic stimulus by integrating information differently. For the processing of stimulus envelope, relevant in ILD-based sound localization in the LSO (for review see Tollin, 2003 and Grothe et al., 2010), the optimal processing strategy appears less clear: improving the temporal precision in representing the envelope may be relevant in addition to accurately representing stimulus level.

The increase in sparsity from ANF to SBC for natural stimuli exceeded the increase for synthetic stimuli (Keine et al., 2016) and the inhibitory gain control was more prominent for natural stimuli, resulting in SBC firing rates comparable to spontaneous activity. Intrinsic properties of natural acoustic (Rieke et al., 1995; Attias and Schreiner, 1997; Nelken et al., 1999; Lewicki, 2002; Hsu et al., 2004; Chechik and Nelken, 2012, reviewed in Theunissen and Elie, 2014) and visual (e.g. Reinagel and Laughlin, 2001) stimuli have been highlighted before to provide specific properties of the neural response (typically efficient coding), potentially through evolutionary adaptation. Mechanistically, we think that the differences in spectrotemporal structure between the stimuli cause this effect in multiple ways.

First, the natural stimuli exceed the artificial stimuli in spectral width in all sections. Hence, both the excitatory and the inhibitory inputs represent integration over larger frequency ranges. In addition, since activation across frequencies is less coordinated than in the more local, synthetic stimuli, the ANF is driven more diversely for natural stimuli, and its response is thus less sparse and reproducible for almost all sections (Figure 2, compared to Figure 8, Keine et al., 2016). Therefore, a similar absolute increase in sparsity, and a smaller absolute increase in reproducibility lead to a larger relative increase in both cases. It remains to be investigated, whether absolute or relative increases are more important for neurons receiving input from SBCs.

Second, the natural stimuli showed a different modulation profile, ranging from little in rain, to nearly complete modulation for the segment sand. Again, this influences the sparsity and reproducibility already at the level of the ANF input, but will also interact with the inhibitory integration properties. Finally, correlations in the spectrogram may contribute to the differences to synthetic stimuli. Similar findings have been reported for the visual cortex (Froudarakis et al., 2014), however, for the auditory brainstem this remains speculation at the current stage.

The integration performed by inhibition could be viewed from the perspective of predictive coding (Rao and Ballard, 1999), where the prediction of expected information is subtracted from the current stimulus representation, in order to minimize the number of transmitted spikes for a certain set of statistics, e.g. natural statistics. Based on the progression of time scales represented along the auditory pathway, it could be speculated that the inhibition provides short-term contextual information against which a change in stimulus statistics could be compared. Hence, the integration may serve to detect low-level changes in the stimulus statistics, similar in principle to the integration of evidence on the cortical level (Boubenec et al., 2017).

In summary, the present study using natural stimuli suggests a possible specific adaptation of the inhibitory gain control, leading to unchanged firing rates and larger increases in sparsity and reproducibility compared to synthetic stimuli. These results also indicate, that using acoustic stimuli resembling natural sounds might be necessary to fully understand properties of synaptic integration and signal processing already at initial stages of the auditory system. However, the detailed relation between natural and synthetic stimuli needs to be further explored using hybrid stimuli which isolate specific natural properties and combine them with synthetic stimuli (e.g. SPORCs in David et al., 2009).

Materials and methods

Animals and surgical procedure

Request a detailed protocol

All experiments were performed at the Neurobiology Laboratories of the Faculty of Bioscience, Pharmacy and Psychology of the University of Leipzig (Germany), approved by the Saxonian District Government Leipzig (TVV 06/09) and conducted according to the European Communities Council Directive (86/609/EEC). Animals were housed in the animal facility of the Institute of Biology with 12 hr light/dark cycle and access to food and water ad libitum. 

In vivo loose-patch recordings were conducted as described previously (Keine et al., 2016). In brief, young adult Mongolian Gerbils aged 6–8 weeks were anesthetized by an intraperitoneal injection of a mixture of ketamine (140 μg/g body weight, Ketamin-Ratiopharm, Ratiopharm, Ulm, Germany) and xylazine hydrochloride (3 μg/g body weight, Rompun, Beyer, Leverkusen, Germany). The animal’s skull was exposed and a brass head post glued to the skull to fix the animal in a custom-built stereotactic apparatus in a prone position. Recording electrodes were pulled from borosilicate glass (GB150F-10, Science Products, Hofheim, Germany) to have impedance of 3–5 MΩ when filled with the pipette solution (in mM): 135 NaCl, 5.4 KCl, 1 MgCl2, 1.8 CaCl2, 5 HEPES, pH adjusted to 7.3 with NaOH. The recording electrode was lowered through a hole in the skull into the anterior portion of the ventral cochlear nucleus (AVCN). High-positive pressure was applied (200 mbar) when passing through non-auditory tissue and reduced to 30 mbar when entering the AVCN. When approaching a cell, the pressure was equalized or slight negative pressure (−5 mbar) applied. Single-units were recorded when exhibiting a positive AP amplitude of at least 2 mV and showing the characteristic complex waveform identifying them as large spherical bushy cells of the rostral AVCN (Pfeiffer, 1966; Winter and Palmer, 1990; Englitz et al., 2009; Typlt et al., 2010).

Acoustic stimulation

Request a detailed protocol

Recordings were performed in a sound-attenuating and electrically isolated chamber (Type 400, Industrial Acoustics, Niederkrüchten, Germany). Acoustic stimuli were generated by custom-written Matlab (RRID:SCR_001622) functions and delivered via a custom-built earphone (DT48, Beyerdynamic, Heilbronn, Germany) positioned just in front of the ear canal. Acoustic stimuli were composed of environmental sounds and consisted of seven segments of length 3.46 s with cos2 amplitude transitions of 460 ms between consecutive segments to prevent unexpected transients (see Supplementary file 1 for the audio file containing the stimulus). The stimulus had a total length of 18.7 s and was presented at least 20 times for each cell at 40 dB SPL with maximal sound intensities of 85 dB SPL.

Data analysis

Request a detailed protocol

Recorded voltage signals were digitized at 97.7 kHz (24 bit, RP2.1, Tucker-Davis Technologies) and filtered between 5 Hz and 7.5 kHz using a zero-phase digital IIR filter. Neuronal signals were detected by the fast upward stroke of the excitatory postsynaptic potential (EPSP) and separated into events which successfully trigger a postsynaptic AP (EPSPsucc) and events that fail to trigger a postsynaptic action potential (EPSPfail). Sparsity and reproducibility of the neuronal response were separately computed for the ANF input (EPSPsucc + EPSPfail) and the SBC output (EPSPsucc).

The sparsity of the response rate was computed as the variance-based method described previously (Rolls and Tovee, 1995; Willmore and Tolhurst, 2001) with sparsity defined as

S=1-rtt2 / r(t)2t

where .t indicates an average over time. Its values range between 0 (maximally dense) and 1 (maximally sparse).

The reproducibility of the neural response was computed in two steps. First, the raw cross-correlation was computed between two trials, divided by (S1 S2/Nbins), that is, the product of the number of spikes in each trial, divided by the number of time bins. Second, the cross-correlation was averaged across all pairs of non-identical trials, and then normalized by the number of such pairs, equal to N (N-1)/2, where N is the total number of trials. Finally, the cross-correlation at time 0 was chosen, and ‘1’ subtracted from it, in order to obtain a measure which equals 0 for a random process with fixed rate (aside from the centering around 0, it is thus very similar to the correlation index by Joris et al., 2006). Values > 0 indicate above chance correlation in the response between trials, beyond what would be expected based on correlation in rate alone. Importantly, reproducibility can attain values >1, with an upper limit determined by the firing rates in the different trials.

Stimulus reconstruction

Request a detailed protocol

The set of responses from all recorded neurons was used to re-estimate the stimulus spectrogram. A linear reconstruction approach was implemented, which amounts to a linear regression between the neural responses as predictors, and the spectrogram of the stimulus, computed as the absolute value of the short-term Fourier transform, as dependent variables (Mesgarani et al., 2009; Mesgarani and Chang, 2012). This kind of population-based approach allows a combination of the overall information available in the neural response for accurately re-estimating the stimulus. Explicit stimulus reconstruction provides a standardized way to compare stimulus representations along stages of a sensory system, here between the ANF, SBC, and failed EPSPs. Clearly, the brain may use different strategies internally to decode or transform this information, although the present approach has proven useful for approximating the resulting percept (Mesgarani et al., 2009; Mesgarani and Chang, 2012).

For this purpose, both the response and the stimulus spectrogram were resampled at 2 kHz. For the stimulus this was achieved by allowing neighboring time-bins to overlap by multiple samples, concretely the stimulus was divided into overlapping sections of 512 samples, starting at round (it×SRsound/SRspectrogram) for each time step it, the acoustic stimulus’ sampling rate SRsound = 97.65625 kHz, and the desired spectrogram sampling rate of 2 kHz. Neighboring stimulus bins are thus not independent, but the match between stimulus and response sampling rate is required for the general estimation procedure. Based on the chosen stimulus representation, no phase-based fine-structure predictions are possible, therefore, all analyses relating to temporal precision thus relate to envelope fine-structure. All displayed spectrograms are shown in the same scale, dB scaled (10 log10).

The reconstruction procedure was carried out as described in detail before (Mesgarani et al., 2009). Briefly, with the stimulus spectrogram denoted by S(t,f), the responses by R(t,n), and the reconstruction kernels per frequency band by gf(τ,n), the assumed response-stimulus relation can be written as

(1) S^(t,f)= τngf(τ,n)R(tτ,n)

The kernels gf can be estimated via the classical normal equation

(2) gf=CRR-1+λ ICRSf

with the cross-correlation term CRSf=RSfT, and the response correlation matrix CRR=RRT, including a ridge regression term to avoid degeneracies in the inversion of CRR (see Mesgarani et al., 2009 for a more detailed derivation) with λ=0.1. During crossvalidation, Equation 1 is then used to reconstruct the stimulus from estimates of gf on the non-predicted stimulus sections. Each gf is based on the activity of all neurons n at a range of delays τ, and thus given by a matrix (see Figure 3A middle bottom for an example). The stimulus was constrained to 300–4400 Hz, slightly extending the range of the characteristic frequencies (CFs, [1.18, 3.05] kHz) of the present sample (see Figure 3B, blue dots indicate the CFs of all cells).

The quality of reconstruction was assessed using correlation coefficients between the original and the reconstructed stimulus. The displayed values (Figure 3) are averaged between the cross-validated (divided into 10 segments) and the insample estimates, based on considerations regarding the influence of noise on estimating model-performance (Ahrens et al., 2008). Further, the original stimulus and the reconstruction were compared using conditional level densities (see below, Figure 3), the width of the autocorrelation (Figure 4E), the frequency spectrum of the activations in the spectrogram (Figure 4F), and the spectral auto-correlation (Figure 4G).

The conditional level densities (CLD) were computed as the joint histogram across levels between the real and the reconstructed spectrogram, normalized by the probability of the level in the real stimulus. The normalization was introduced to highlight the differences that occur mostly at the stimulus extremes, which are less probable in the distribution of levels. Hence, a column in a CLD sums to one, and constitutes the (empirical) conditional probability density of the estimated spectrogram, relative to the original spectrogram. We also computed differences of two CLDs to highlight changes between reconstructions (e.g. Figure 4I,J): the difference here indicates a relative (i.e. between the reconstructions) prevalence/scarcity of levels in the reconstruction in relation to the real stimulus.

The spectral autocorrelation was computed as a cross-correlation across frequencies at every timestep, averaged over the length of the stimulus (Figure 4G). It indicates how predictable the envelope level is across frequencies for the different reconstructions.

Simulation of subtractive inhibition

Request a detailed protocol

We extended our simulation of subtractive inhibition to include an estimate of previous activity in order to match it more closely with the experimentally measured delay and temporal dynamics. Inhibition was modelled to depend on the recent history of ANF firing activity in this CF range: First the PSTH of a given ANF input was time shifted by a time tc, and then integrated with an exponential kernel with a time-constant τI to obtain an intermediate representation AI(t). The resulting signal was passed through a static, sigmoidal nonlinearity given by

SIt=I01+e-S (AIt-O)

The sigmoid’s shape is controlled by I0, S and O, where I0 is the instantaneous rate of inhibition that is maximally subtracted, S is the (inverse) slope of activation, and O the value of the integrated signal AI, where the sigmoid reached 50% of I0, hence, controlling the horizontal offset. These three steps account for the integration (τI), nonlinear spike-elicitation (S, I0, O) and the conduction delay (tc). We manually estimated a set of parameters that accounted best for each SBC’s failure rates under spontaneous and natural stimulation, as well as the overall reconstruction quality and variance (see Figure 2—figure supplement 1 and Figure 3—figure supplement 1): The first three parameters were set fixed across all cells to tc=0.5 ms, τI=3 ms, S=5. The last two parameters were set individually per cell to match the experimentally observed failures rates (Figure 2—figure supplement 1). Automatic fitting was infeasible since there appeared to be many plateaus due to the discrete nature of the spikes.

The resulting output of the model constitutes a rate of spikes and was then subtracted from the neurons instantaneous activity by deleting the corresponding number of spikes (randomly across trials) in this time bin. If the number of existing spikes per bin was lower than the number of spikes allocated to subtraction, the individual bin was set to 0.

Prior to applying the subtractive inhibition, the spontaneous failure rate was accounted for by removing a random set of spikes, whose size was matched to the spontaneous failure rate of the cell.


Data sets were tested for Gaussianity using the Shapiro-Wilk test (Shapiro and Wilk, 1965). Within-subject comparisons were performed by paired t-test (normally distributed) or Wilcoxon signed rank test (otherwise). For interpretation of all results, a p-value less than 0.05 was deemed significant, where p-values<10−5 are reported as p<0.001 in the text and figures. The effect size was calculated using the MES toolbox in Matlab (Hentschke and Stüttgen, 2011) and reported as Cohen’s U1. No statistical methods were used to pre-determine sample size. Exact p-values and test statistics are summarized in Table 1.


    1. Attias H
    2. Schreiner CE
    Temporal low-order statistics of natural sounds
    Advances in Neural Information Processing Systems 9:27–33.
    1. Batra R
    2. Kuwada S
    3. Fitzpatrick DC
    Sensitivity to interaural temporal disparities of low- and high-frequency neurons in the superior olivary complex. II. Coincidence detection
    Journal of Neurophysiology 78:1237–1247.
    1. Blauert J
    Spatial Hearing: The Psychophysics of Human Sound Localization
    36–200, Spatial hearing with one sound source, Spatial Hearing: The Psychophysics of Human Sound Localization, Cambridge, MIT Press.
    1. Caspary DM
    2. Backoff PM
    3. Finlayson PG
    4. Palombi PS
    Inhibitory inputs modulate discharge rate within frequency receptive fields of anteroventral cochlear nucleus neurons.
    Journal of Neurophysiology 72:2124–2157.
    1. Galambos R
    2. Schwartzkopff J
    3. Rupert A
    Microelectrode study of superior olivary nuclei
    The American Journal of Physiology 197:527–536.
    1. Joris PX
    2. Yin TC
    Envelope coding in the lateral superior olive. I. Sensitivity to interaural time differences
    Journal of Neurophysiology 73:1043–1062.
    1. Kopp-Scheinpflug C
    2. Dehmel S
    3. Dörrscheidt GJ
    4. Rübsamen R
    Interaction of excitation and inhibition in anteroventral cochlear nucleus neurons that receive large endbulb synaptic endings
    Journal of Neuroscience 22:11004–11018.
    1. Nelken I
    2. Young ED
    Two separate inhibitory mechanisms shape the responses of dorsal cochlear nucleus type IV units to narrowband and wideband stimuli
    Journal of Neurophysiology 71:2446–2462.
    1. Reinagel P
    2. Laughlin S
    Natural stimulus statistics
    Network 12:237–240.
    1. Rolls ET
    2. Tovee MJ
    Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex
    Journal of Neurophysiology 73:713–726.
    1. Sanes DH
    2. Rubel EW
    The ontogeny of inhibition and excitation in the gerbil lateral superior olive
    Journal of Neuroscience 8:682–700.
    1. Shapiro S
    2. Wilk M
    An analysis of variance test for normality
    Biometrika 52:591–611.
    1. Stanley GB
    2. Li FF
    3. Dan Y
    Reconstruction of natural scenes from ensemble responses in the lateral geniculate nucleus
    Journal of Neuroscience 19:8036–8042.

Article and author information

Author details

  1. Christian Keine

    1. Carver College of Medicine, Department of Anatomy and Cell Biology, University of Iowa, Iowa City, United States
    2. Faculty of Bioscience, Pharmacy and Psychology, University of Leipzig, Leipzig, Germany
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8953-2593
  2. Rudolf Rübsamen

    Faculty of Bioscience, Pharmacy and Psychology, University of Leipzig, Leipzig, Germany
    Conceptualization, Resources, Supervision, Funding acquisition, Writing—review and editing
    Competing interests
    No competing interests declared
  3. Bernhard Englitz

    Donders Center for Neuroscience, Department of Neurophysiology, Radboud University, Nijmegen, Netherlands
    Conceptualization, Software, Formal analysis, Validation, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9106-0356


Deutsche Forschungsgemeinschaft (RU 390/19-1)

  • Rudolf Rübsamen

Deutsche Forschungsgemeinschaft (RU 390/20-1)

  • Rudolf Rübsamen

European Commission (Marie Sklodowska Curie Fellowship 660328)

  • Bernhard Englitz

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.


This work was supported by the German Research Foundation (DFG Priority Program 1608 ‘Ultrafast and temporally precise information processing: Normal and dysfunctional hearing’ [RU390/19–1, RU390/20–1]), and Marie Sklodowska Curie Fellowship 660328. The authors thank the three anonymous reviewers for their constructive feedback which substantially improved the manuscript. The authors declare no competing financial interests.


Animal experimentation: Animal experimentation: All experiments were approved by the Saxonian District Government, Leipzig (TVV 06/09), and conducted according to the European Communities Council Directive (86/609/ EEC).

Version history

  1. Received: June 23, 2017
  2. Accepted: September 25, 2017
  3. Accepted Manuscript published: September 25, 2017 (version 1)
  4. Version of Record published: October 3, 2017 (version 2)


© 2017, Keine et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


  • 792
  • 109
  • 13

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Christian Keine
  2. Rudolf Rübsamen
  3. Bernhard Englitz
Signal integration at spherical bushy cells enhances representation of temporal structure but limits its range
eLife 6:e29639.

Share this article

Further reading

    1. Neuroscience
    Ya-Hui Lin, Li-Wen Wang ... Li-An Chu
    Research Article

    Tissue-clearing and labeling techniques have revolutionized brain-wide imaging and analysis, yet their application to clinical formalin-fixed paraffin-embedded (FFPE) blocks remains challenging. We introduce HIF-Clear, a novel method for efficiently clearing and labeling centimeter-thick FFPE specimens using elevated temperature and concentrated detergents. HIF-Clear with multi-round immunolabeling reveals neuron circuitry regulating multiple neurotransmitter systems in a whole FFPE mouse brain and is able to be used as the evaluation of disease treatment efficiency. HIF-Clear also supports expansion microscopy and can be performed on a non-sectioned 15-year-old FFPE specimen, as well as a 3-month formalin-fixed mouse brain. Thus, HIF-Clear represents a feasible approach for researching archived FFPE specimens for future neuroscientific and 3D neuropathological analyses.

    1. Neuroscience
    Amanda Chu, Nicholas T Gordon ... Michael A McDannald
    Research Article

    Pavlovian fear conditioning has been extensively used to study the behavioral and neural basis of defensive systems. In a typical procedure, a cue is paired with foot shock, and subsequent cue presentation elicits freezing, a behavior theoretically linked to predator detection. Studies have since shown a fear conditioned cue can elicit locomotion, a behavior that - in addition to jumping, and rearing - is theoretically linked to imminent or occurring predation. A criticism of studies observing fear conditioned cue-elicited locomotion is that responding is non-associative. We gave rats Pavlovian fear discrimination over a baseline of reward seeking. TTL-triggered cameras captured 5 behavior frames/s around cue presentation. Experiment 1 examined the emergence of danger-specific behaviors over fear acquisition. Experiment 2 examined the expression of danger-specific behaviors in fear extinction. In total, we scored 112,000 frames for nine discrete behavior categories. Temporal ethograms show that during acquisition, a fear conditioned cue suppresses reward seeking and elicits freezing, but also elicits locomotion, jumping, and rearing - all of which are maximal when foot shock is imminent. During extinction, a fear conditioned cue most prominently suppresses reward seeking, and elicits locomotion that is timed to shock delivery. The independent expression of these behaviors in both experiments reveal a fear conditioned cue to orchestrate a temporally organized suite of behaviors.