Inhibition in the auditory brainstem enhances signal representation and regulates gain in complex acoustic environments

  1. Christian Keine (corresponding author)
  2. Rudolf Rübsamen
  3. Bernhard Englitz
  1. University of Leipzig, Germany
  2. Radboud University, Netherlands
Research Article
Cite this article as: eLife 2016;5:e19295 doi: 10.7554/eLife.19295

Abstract

Inhibition plays a crucial role in neural signal processing, shaping and limiting responses. In the auditory system, inhibition already modulates second order neurons in the cochlear nucleus, e.g. spherical bushy cells (SBCs). While the physiological basis of inhibition and excitation is well described, their functional interaction in signal processing remains elusive. Using a combination of in vivo loose-patch recordings, iontophoretic drug application, and detailed signal analysis in the Mongolian Gerbil, we demonstrate that inhibition is widely co-tuned with excitation, and leads only to minor sharpening of the spectral response properties. Combinations of complex stimuli and neuronal input-output analysis based on spectrotemporal receptive fields revealed inhibition to render the neuronal output temporally sparser and more reproducible than the input. Overall, inhibition plays a central role in improving the temporal response fidelity of SBCs across a wide range of input intensities and thereby provides the basis for high-fidelity signal processing.

https://doi.org/10.7554/eLife.19295.001

eLife digest

In humans and other animals, small differences in the time at which a sound arrives at each ear are crucial for determining the location of the sound. Neurons in the first processing station of the brain – the cochlear nucleus – receive information about sounds (or “inputs”) from the ears. They then produce electrical signals that relay this information to other areas of the brain. Some of these inputs increase the activity of the neurons and so are known as “excitatory” inputs, while other “inhibitory” inputs decrease the activity of the neurons. The balance between these two inputs determines what information is passed to other parts of the brain, but it is not clear how these inputs interact.

Keine, Rübsamen and Englitz studied electrical activity in the brains of Mongolian gerbils while the animals were exposed to sounds with more natural properties than those used in previous studies. The experiments reveal that inhibitory inputs play an important role in controlling the activity of neurons in the cochlear nucleus. By decreasing the neurons’ activity, inhibitory inputs allow these cells to respond to many different levels of sound, from very loud to very quiet. The experiments also show that excitatory and inhibitory inputs are triggered by similar sounds so that the two processes quickly balance each other. This means that the brain is equally able to work out where a sound is coming from regardless of whether it is loud or quiet.

Further work is now needed to understand responses to natural sounds and to determine how experimentally removing the inhibitory inputs affects hearing.

https://doi.org/10.7554/eLife.19295.002

Introduction

Dynamic processing in neural networks is controlled by an interplay of excitation and inhibition. In cortical processing, the dominant excitatory neurons interact reciprocally with inhibitory neurons, which serve key functions in shaping the responses (reviewed in Isaacson and Scanziani, 2011). In the auditory cortex, recent work has emphasized the role of inhibition in dynamically balancing excitation via a high degree of co-tuning (e.g. Wehr and Zador, 2003; Renart et al., 2010) that serves to shape and accelerate network dynamics. Similarly, in other modalities, inhibition was found to be co-tuned with excitation in the cortex, typically with a wider tuning, generating the well-described inhibitory sidebands (auditory: Wang et al., 2002; Wu et al., 2008, visual: Sohya et al., 2007; Niell and Stryker, 2008; Liu et al., 2009, 2011; Katzner et al., 2011, olfactory: Poo and Isaacson, 2009). Temporally, inhibition often follows excitation closely (auditory: Wehr and Zador, 2003, somatosensory: Wilent and Contreras, 2004).

In the auditory brainstem, the role of inhibition has also been studied, but from a more fundamental perspective, without a focus on its functional role during complex stimulation. Various studies have shown prominent inhibitory influences on signal processing in the cochlear nucleus (Caspary et al., 1994; Kopp-Scheinpflug et al., 2002; Gai and Carney, 2008), in the medial and lateral superior olive (Grothe and Sanes, 1993; Brand et al., 2002; Myoga et al., 2014), and in the dorsal and ventral nuclei of the lateral lemniscus (Yang and Pollak, 1994, 1998; Burger and Pollak, 2001; Nayagam et al., 2005; Pecka et al., 2007; Spencer et al., 2015).

The cochlear nucleus (CN), the first stage of the central auditory system, is the starting point of distinct neuronal circuits involved in sound source localization. Spherical bushy cells (SBC) in the anteroventral division of the CN (AVCN) provide the temporally precise excitatory inputs to binaural neurons in the medial superior olive (MSO), where interaural time differences are computed (Yin and Chan, 1990). These SBCs receive suprathreshold excitatory input from auditory nerve fibers (ANF) via large axosomatic terminals, the endbulbs of Held (Brawer and Morest, 1975; Schwartz and Gulley, 1978; Ryugo and Sento, 1991; Nicol and Walmsley, 2002). In addition, inhibitory inputs on SBCs have been reported, which provide surprisingly slow acoustically evoked inhibition mediated by glycine and GABA (Wu and Oertel, 1986; Kolston et al., 1992; Juiz et al., 1996; Lim et al., 2000; Mahendrasingam et al., 2004; Xie and Manis, 2013), with glycine dominating (Nerlich et al., 2014b).

Due to the requirements of high-fidelity acoustic processing underlying sound localization, many studies focused on the fast and temporally precise signal transmission in auditory brainstem circuits. With respect to changes in temporal precision from ANF to neurons in the CN some studies – comparing population data – reported a general increase in temporal precision (Joris et al., 1994a, 1994b), while others found no change (Bourk, 1976; Blackburn and Sachs, 1989; Winter and Palmer, 1990), or reported decreased temporal precision at certain stimulation frequencies (Paolini et al., 2001; Fukui et al., 2006).

More recent studies advanced the analysis to the single-cell level, by comparing the endbulb of Held evoked excitatory postsynaptic potentials (EPSP) with the action potentials (AP) of the SBCs allowing for a direct comparison of ANF input and SBC output (Typlt et al., 2010). This enabled a direct assessment of the input-output function under the condition of acoustic stimulation, also in combination with pharmacological manipulations. The respective experiments revealed a slight increase in temporal precision of signal coding, attributed to the influence of acoustically evoked inhibition (Dehmel et al., 2010; Kuenzel et al., 2011; Keine and Rübsamen, 2015). It may be argued that the stimulus conditions employed were rather static and did not adequately reflect the challenge of processing the dynamics of spectrotemporal complex acoustic signals. While fast inhibition in T stellate cells has been attributed to a role in comodulation masking release (Pressnitzer et al., 2001), the inhibitory dynamics in SBCs seem to be too slow for such an effect (Xie and Manis, 2013). Previous studies investigated inhibition using pure tone stimulation, and this is why the functional role of inhibition in signal processing at the ANF-SBC synapse during complex acoustic stimulation has not been fully resolved.

In the present study, we set out to elucidate the functional role of acoustically evoked inhibition at the ANF-SBC synapse using combined in vivo loose-patch recordings with direct iontophoretic manipulation of inhibitory receptors and a detailed input-output signal analysis based on spectrotemporal receptive fields in responses to complex acoustic stimulation. Our results indicate a reliable co-tuning of inhibition with the main excitatory input. While we observed some sharpening of the response in time and frequency, our results suggest that inhibition functions as a gain control that renders the postsynaptic response sparser in time and more reproducible across trials. Temporal sparsity, i.e. a response restricted to fewer time-points, can increase the information per spike while reducing the energy expenditure. Reproducibility, i.e. a more consistent response to identical stimuli, can provide reliable stimulus encoding.

These improvements are a consequence of the combined subtractive/divisive action of glycine (Kuenzel et al., 2011, 2015): The subtractive component enhances the temporal sparsity by raising the threshold for spiking. The divisive component acts primarily as a gain control, which - in conjunction with the co-tuning - maintains the SBC output rate in a smaller range across different stimulus levels. Together these two effects focus the SBC output onto well-timed stimulus events across a wide range of stimulus levels. Thus, inhibition improves the basis for the high-fidelity signal processing in downstream nuclei crucial for sound localization irrespective of the prevailing stimulus levels.

Results

The interaction between acoustically evoked excitation and inhibition is a key constituent at the initial stages of signal processing in the auditory brainstem (Kopp-Scheinpflug et al., 2002; Dehmel et al., 2010; Kuenzel et al., 2011; Keine and Rübsamen, 2015). This study aimed to investigate the influence of sound-evoked inhibition on the processing of complex structured signals (mimicking broadband acoustic conditions) at the auditory nerve-to-spherical bushy cell synapse (ANF-SBC). A total of 85 units were recorded from the rostral pole of the anteroventral cochlear nucleus (AVCN), the location of large, low-frequency coding SBCs (Bazwinsky et al., 2008). The identification of SBCs was based on the following physiological properties: a discernible prepotential in addition to the complex waveform (Pfeiffer, 1966; Englitz et al., 2009; Typlt et al., 2010), short AP duration (Typlt et al., 2012), high spontaneous firing rates (Smith et al., 1993), and the primary-like response pattern to pure-tone stimulation (Blackburn and Sachs, 1989). Of these 85 cells, 23 were recorded while simultaneously applying glycine receptor agonists and antagonists. Units had a characteristic frequency (CF) of (mean ± standard deviation) 2.1 ± 0.6 kHz and a minimal threshold of (median [first quartile, third quartile]) 7.5 [0.8, 14.9] dB SPL.

To understand how acoustically evoked inhibition shapes SBC output, the present report focuses on the differential analysis between SBC EPSPs that trigger a postsynaptic AP, i.e. EPSPsucc and EPSPs that fail to trigger an AP, i.e. EPSPfail. Previous studies showed that during spontaneous activity, EPSP amplitudes are close to threshold, such that not all ANF input spikes trigger an SBC output spike. Also, acoustically evoked inhibition interacts dynamically with the EPSPs and prevents output spikes (Kuenzel et al., 2011; Keine and Rübsamen, 2015).

The respective differences between ANF input and SBC output can be analyzed from the complex waveform of SBC signals consisting of the presynaptic action potential (prepotential, PP) and the excitatory postsynaptic potential (EPSP) which may or may not be followed by an AP (Figure 1A). The fast EPSP rising slope served for the detection of both types of signals, while the dynamics of the signals’ falling slopes reliably distinguished between the two: (i) EPSPsucc, i.e. EPSPs that successfully trigger postsynaptic APs, and (ii) EPSPfail, i.e. EPSPs that fail to trigger APs. The maximum falling slope was consistently higher in EPSPsucc than in EPSPfail (EPSPsucc = 21.1 ± 4.9 vs. EPSPfail = 4.4 ± 1.2 V/s, difference [Δ] = 16.8 ± 4.3 V/s, p<0.001, paired t-test, n = 62, U1 = 1, Figure 1B left, see also Figure 1—source data 1). The sum of EPSPfail and EPSPsucc was defined as the ANF input to the SBC, while the subset of EPSPsucc indicated the output ascending to the next level of processing, i.e. the superior olivary complex.

Separation and attribution of pre- and postsynaptic neuronal response components.

(A) Left: Representative trace of an in-vivo loose-patch recording of a spherical bushy cell (SBC) showing both EPSPs followed by an action potential (arrows, blue dots) and EPSPs which fail to trigger an AP (arrowheads, gray dots). Right: Superimposing the events (50 events of each type) shows that both signal types share the presence of a prepotential (PP) and an EPSP, but may (EPSPsucc) or may not (EPSPfail) trigger a postsynaptic AP. (B) Left: Both types of events are clearly separable by the maximum falling slope, with APs showing much steeper falling slopes (blue, EPSPsucc) than EPSPs that fail to trigger an AP (gray, EPSPfail). Middle: EPSP rising slopes of EPSPsucc (blue) and EPSPfail (gray) show considerable overlap, with EPSPfail having consistently smaller rising slopes than EPSPsucc. Note the mono-modal, Gaussian distribution of all EPSP inputs (orange), suggesting that both types of events originate from the same source. Right: Population data of 62 units: EPSP falling slopes show completely different value ranges (right, p<0.001) which made it possible to clearly separate the two types of events. The respective EPSP rising slopes show considerable overlap (left), but still, the rising slopes of EPSPsucc were consistently higher than for EPSPfail (p<0.001). Triangles indicate the respective values of the representative cell on the left. Box plots show medians, interquartile and minimum/maximum values. (C) Left: During spontaneous activity, not all EPSPs trigger a postsynaptic AP, gray dots indicate EPSPfail, blue dots indicate EPSPsucc. Middle: When stimulated at CF, the discharge rate increases, but the ANF-SBC synapse becomes increasingly unreliable indicated by a high proportion of EPSPfail. Right: Population data show the considerable variance of failure fraction during spontaneous activity, and a consistent increase in failure fraction during acoustic stimulation. CF = characteristic frequency. Dots indicate values > 1.5 interquartile range.

https://doi.org/10.7554/eLife.19295.003

Unlike the falling slopes of the signals, the maximal EPSP rising slopes showed considerable overlap between EPSPfail and EPSPsucc (Figure 1B middle and right). Still, EPSPsucc had higher average EPSP rising slopes than EPSPfail (EPSPsucc = 9.9 ± 2.2 V/s vs. EPSPfail = 7.5 ± 2.2 V/s, Δ = 2.4 ± 1.5 V/s, p<0.001, paired t-test, n = 62, U1 = 0.1). Considering this difference, the EPSP rising slope can – to some degree – predict the probability of AP generation. During spontaneous activity, the failure fraction, defined as the proportion of EPSPfail of the ANF input (EPSPsucc + EPSPfail) amounted to 0.28 [0.11, 0.54] with considerable variability between cells (range: 0.01 to 0.91). Acoustic stimulation at the unit’s CF at 50 dB SPL, i.e. within the excitatory response area, increased the failure fraction to 0.49 [0.41, 0.59] rendering the ANF-SBC synapse less reliable during acoustic stimulation (Δ = 0.18 ± 0.29, p<0.001, paired t-test, n = 62, U1 = 0.2, Figure 1C).
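The event classification and the failure fraction described above can be illustrated with a minimal Python sketch. The cut-off of 10 V/s, the example slopes, and the function names are hypothetical and chosen only for illustration; they are not values or code from the study.

```python
def classify_events(falling_slopes, cutoff=10.0):
    """Label each event as 'succ' (EPSP followed by an AP) or 'fail'.

    falling_slopes: maximum falling slope of each event in V/s.
    cutoff: hypothetical separation value in V/s (assumed for illustration).
    """
    return ["succ" if s > cutoff else "fail" for s in falling_slopes]


def failure_fraction(labels):
    """Proportion of EPSPfail among all ANF inputs (EPSPsucc + EPSPfail)."""
    if not labels:
        raise ValueError("no events")
    return labels.count("fail") / len(labels)


# Made-up maximum falling slopes (V/s) for eight events
slopes = [21.0, 4.5, 19.8, 3.9, 22.4, 20.1, 4.1, 23.0]
labels = classify_events(slopes)
print(failure_fraction(labels))
```

In this toy example, three of the eight events fall below the cut-off, so the failure fraction is 3/8.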

Synaptic depression alone fails to account for increased failure rates

The increased incidence of failures during acoustic stimulation has been attributed to the activation of inhibitory inputs (Kopp-Scheinpflug et al., 2002; Kuenzel et al., 2011, 2015; Keine and Rübsamen, 2015). However, in vitro experiments also need to be considered; these showed strong depression at the ANF-SBC synapse (Wang and Manis, 2008; Yang and Xu-Friedman, 2008, 2009; Wang et al., 2010) affecting SBC responsiveness for up to tens of milliseconds (Yang and Xu-Friedman, 2015). Such depression might also suppress SBC spiking in vivo and result in an increased failure fraction during acoustic stimulation. Still, in vivo the impact of depression was shown to be smaller, since ongoing spontaneous activity – completely absent in slice recordings – seems to keep the synapse in a chronically depressed state (Hermann et al., 2007; Lorteije et al., 2009; Yang and Xu-Friedman, 2015). Also, the in vivo calcium concentration was reported to be lower than in the artificial cerebrospinal fluid usually used in slice studies, resulting in lower vesicle release probabilities and thus smaller depression (Borst, 2010; Kuenzel et al., 2011; Friauf et al., 2015).

To determine the cause of altered reliability of synaptic transmission at the ANF-SBC synapse, and to dissect the effect of acoustically evoked inhibition from synaptic depression, we first quantified the dependence of the EPSP rising slope on the preceding spontaneous activity. As indicated above, the rising slopes of EPSPfail and EPSPsucc differ, but still show a considerable range of overlap. For each unit, the EPSP rising slopes were pooled for EPSPsucc and EPSPfail and binned. Then, the fraction of EPSPsucc was calculated for each bin, and a Boltzmann function was fitted to the EPSPsucc probability distribution (Figure 2A left). The symmetric inflection point of this function indicates the threshold EPSP, i.e. the EPSP slope necessary to trigger an AP with >50% probability. EPSP rising slopes showed strong depression for inter-event intervals (IEI) < 2 ms, resulting in high AP failure rates. For IEIs > 5 ms, however, the preceding activity had only a minor influence on EPSP rising slopes (Figure 2A, middle). Averaging the normalized EPSP slopes across cells showed a facilitating effect for IEI between 2 ms and 20 ms (green markers, Figure 2A right). The threshold EPSP was increased for IEIs < 2 ms, but not for longer IEIs (black line) and the increase in threshold EPSP resulted in an increased failure fraction for IEIs < 2 ms (orange histogram). While IEIs up to 20 ms resulted in increased EPSP slopes, the effect of IEIs on threshold EPSP and failure fraction was limited to short IEIs < 2 ms. These data are consistent with previous studies, suggesting the presence of short-term facilitation rather than depression of synaptic events. Considering only the last preceding IEI, however, disregards the potential impact of previous medium- and short-term afferent activity. Also, in vitro studies showed that the influence of short-term depression at the ANF-SBC synapse extends well beyond the last IEI (Yang and Xu-Friedman, 2015).
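The threshold EPSP estimation described above (binning the rising slopes, computing the fraction of EPSPsucc per bin, and fitting a Boltzmann function) can be sketched in Python as follows. This is an illustration under stated assumptions: the two-parameter sigmoid, the least-squares grid search used in place of a proper curve fit, the synthetic data, and all function names are hypothetical and not taken from the authors' analysis code.

```python
import math


def boltzmann(s, s_half, k):
    """Boltzmann (sigmoid) function: AP probability for EPSP rising slope s."""
    return 1.0 / (1.0 + math.exp(-(s - s_half) / k))


def success_fraction_per_bin(slopes, success, bin_size=0.5):
    """Bin EPSP rising slopes; return (bin centre, fraction EPSPsucc) pairs."""
    counts = {}
    for s, ok in zip(slopes, success):
        idx = math.floor(s / bin_size)
        n, n_succ = counts.get(idx, (0, 0))
        counts[idx] = (n + 1, n_succ + (1 if ok else 0))
    return [((idx + 0.5) * bin_size, n_succ / n)
            for idx, (n, n_succ) in sorted(counts.items())]


def fit_threshold_epsp(points, s_grid, k_grid):
    """Least-squares grid search (a simple stand-in for a curve fit).

    Returns (s_half, k); s_half is the inflection point of the sigmoid,
    i.e. the slope at which an AP is triggered with 50% probability.
    """
    best = None
    for s_half in s_grid:
        for k in k_grid:
            err = sum((p - boltzmann(c, s_half, k)) ** 2 for c, p in points)
            if best is None or err < best[0]:
                best = (err, s_half, k)
    return best[1], best[2]


# Synthetic data: 100 events per bin, success counts taken from a Boltzmann
# curve with an assumed threshold of 7.0 V/s and slope factor 0.5 V/s
slopes, success = [], []
for i in range(12):
    centre = 4.25 + 0.5 * i
    n_succ = round(100 * boltzmann(centre, 7.0, 0.5))
    slopes += [centre] * 100
    success += [True] * n_succ + [False] * (100 - n_succ)

points = success_fraction_per_bin(slopes, success)
s_grid = [4.0 + 0.1 * i for i in range(61)]   # candidate thresholds, V/s
k_grid = [0.1 * j for j in range(1, 21)]      # candidate slope factors, V/s
threshold, k = fit_threshold_epsp(points, s_grid, k_grid)
print(round(threshold, 1), round(k, 1))
```

The grid search keeps the sketch dependency-free; a nonlinear least-squares routine would normally be preferable for real data.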

Preceding activity has only a minor, facilitating influence on EPSP rising slopes.

(A) Left: Estimation of threshold EPSP for a representative cell: The EPSP rising slopes were binned (0.5 V/s bin size) and the proportion of EPSPsucc calculated for each bin. A Boltzmann function was fit to these data. The symmetric inflection point of this function was considered the threshold EPSP and indicates the EPSP rising slope necessary to generate an AP with >50% probability. Middle: The inter-event-interval (IEI) between synaptic inputs had only a small influence on the EPSP rising slope, with small IEIs being correlated with moderately increased EPSP rising slopes. A more prominent difference was observed between EPSPfail (gray) which showed consistently smaller rising slopes than EPSPsucc (blue) and these differences prevailed over a wide range of IEIs. For IEI < 2 ms the SBC's relative refractoriness renders virtually all EPSPs unsuccessful in triggering a postsynaptic AP. The black line indicates the threshold EPSP. Right: Grand average of normalized EPSP slope, threshold EPSP (left ordinate), and failure fraction (right ordinate) as a function of the preceding IEI, pooled for EPSPsucc and EPSPfail (n = 62 cells). The average EPSP slope (green, left ordinate) showed facilitation for IEIs between 2–20 ms (error bars indicate standard deviation). The median threshold EPSP (black line, left ordinate) was elevated only for IEIs < 2 ms and well below average EPSP size for larger IEIs (shaded area indicates first and third quartile). The elevated threshold EPSP resulted in an increased failure fraction for IEIs < 2 ms, while for longer IEIs the reliability of AP generation seemed not to be affected (orange, right ordinate). (B) Consideration of a wider time span of preceding activity: Sketch of the quantification of preceding activity by exponentially weighting [W] all preceding EPSP rising slopes [S] (ANF activity) or AP amplitudes (SBC activity) depending on the distance to the event under investigation. (C) Left: EPSP rising slopes for the representative cell showed only minor dependence on previous ANF activity levels. Note that the EPSPfail (gray) showed consistently lower EPSP rising slopes (histogram on the left); still, the EPSP slopes tend to increase during periods of high activity. The threshold EPSP (black line) increased as a function of ANF activity. Threshold EPSP was calculated for different levels of ANF activity (bin size = 0.5). Right: signal amplitudes of EPSPfail (gray) and APs (blue) as a function of preceding SBC activity showed decreasing AP amplitudes but increasing EPSP amplitudes. (D) Population data (n = 62 units): While EPSP slopes tend to be elevated after periods of high activity (left), AP amplitudes showed a negative correlation with preceding SBC activity (p<0.001, one-sample t-test against zero). Triangles indicate the data of the representative cell. Organization of the graph as described above.

https://doi.org/10.7554/eLife.19295.005

To determine the impact of preceding activity on EPSP strength and AP generation in vivo, the preceding activity of each event was quantified as a weighted sum of all previous events, using an exponentially decaying kernel with a time constant of 60 ms, emphasizing temporally closer events over more distant ones (Figure 2B). The analysis yielded only minor influences of preceding activity on EPSP rising slopes on both EPSPfail (gray) and EPSPsucc (blue) as shown in a representative cell in Figure 2C (left and middle) and also evidenced for the population of recorded units (Spearman’s rho EPSPfail = 0.22 ± 0.16 vs. EPSPsucc = 0.24 ± 0.11, Δ = 0.02 ± 0.12, p=0.27, paired t-test, n = 62, U1 = 0.09, Figure 2D, see also Figure 2—source data 1). A small but consistent effect, seen in 61/62 recorded units (98%), was a positive correlation (p<0.001) between preceding ANF activity and EPSP rising slopes indicating a facilitating rather than a depressive influence of higher activity levels in vivo.
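The weighted sum of preceding events can be sketched as follows. The 60 ms time constant is taken from the text; whether the sum is normalized, the choice of per-event weights, and the function name are assumptions made for illustration only.

```python
import math


def preceding_activity(event_times_ms, weights=None, tau_ms=60.0):
    """Exponentially weighted sum of all earlier events, for each event.

    event_times_ms: sorted event times in ms.
    weights: per-event magnitudes (e.g. EPSP rising slopes for ANF activity
             or AP amplitudes for SBC activity); defaults to 1 per event.
    tau_ms: decay time constant of the kernel (60 ms as stated in the text).
    """
    if weights is None:
        weights = [1.0] * len(event_times_ms)
    activity = []
    for i, t in enumerate(event_times_ms):
        total = 0.0
        for j in range(i):  # only events that occurred before event i
            dt = t - event_times_ms[j]
            total += weights[j] * math.exp(-dt / tau_ms)
        activity.append(total)
    return activity


# Three unit-weight events spaced 60 ms apart
act = preceding_activity([0.0, 60.0, 120.0])
print(act)
```

With unit weights, the second event sees a preceding activity of exp(-1) and the third sees exp(-1) + exp(-2), which shows how temporally closer events dominate the measure.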

Postsynaptic spike depression may also contribute to the increase in postsynaptic spike failures (EPSPfail). When analyzing the dependence of AP amplitude on preceding SBC spiking activity (exemplary unit shown in Figure 2C right) a significant negative correlation was observed in 92% of the cells (57/62) indicating smaller AP amplitudes after periods of higher SBC activity (Figure 2D). The representative unit shown in Figure 2C (right) shows the respective change in AP amplitude and an inverse effect on the amplitudes of EPSPfail, consistent with the facilitating influence on EPSP rising slopes. These results are in agreement with previous reports (Kuenzel et al., 2011, 2015) suggesting that endbulbs are mostly in a close-to-threshold state and show low synaptic depression in vivo.

These results suggest that the increased failure fraction during acoustic stimulation in vivo is not explainable by endbulb depression evoked by high firing rates, highlighting the role of acoustically evoked inhibition on the input-output relationship at the ANF-SBC synapse.

Broadband on-CF inhibition shapes SBC tuning

Frequency response areas (FRA) of SBCs show prominent inhibitory sidebands and reduced firing activity in the excitatory field compared to the ANF input (Kopp-Scheinpflug et al., 2002; Kuenzel et al., 2011; Keine and Rübsamen, 2015). Also, about half of the SBCs show pronounced non-monotonic rate-level functions pointing to an impact of inhibition (Kopp-Scheinpflug et al., 2002; Keine and Rübsamen, 2015; Kuenzel et al., 2015) which has been further classified as ‘on-CF inhibition’ and ‘broadband inhibition’ (Winter and Palmer, 1990; Caspary et al., 1994; Kopp-Scheinpflug et al., 2002). In vitro and modeling studies showed that glycinergic inhibition can elevate the threshold EPSP for AP initiation (Xie and Manis, 2013; Kuenzel et al., 2015). Thus the threshold EPSP can serve as a suitable indicator for the activation of inhibitory inputs.

In the present loose-patch recordings, elevation in threshold EPSP (Figure 3Aii/iii) was observed throughout the FRAs (Figure 3Ai/iii) accompanied by an increase in failure fraction (Figure 3Bii). The frequency profile of threshold EPSP elevation closely matched the one of increased failure fraction (Figure 3Aii/Bii, Spearman correlation rs = 0.7 [0.4, 0.76], p<0.001, Wilcoxon signed rank test, n = 62, U1 = 0.98, population data not shown). The FRA of threshold EPSP elevation was used to quantify the inhibitory influence, which was then compared to the SBC’s excitatory FRA. Both FRAs had similar CFs, defined as the stimulus frequency at which the lowest sound intensity resulted in a significant increase in ANF firing rate (excitatory) or threshold EPSP (inhibitory) (2.2 ± 0.6 kHz vs. 2.2 ± 0.9 kHz, respectively, Δ = 0.03 ± 0.83 kHz, p=0.77, paired t-test, n = 62, U1 = 0.05, Figure 3Ci), but inhibitory FRAs exhibited higher thresholds (excitatory = 4.8 ± 6.1 dB SPL vs. inhibitory = 19.8 ± 16.6 dB SPL, Δ = 15 ± 15.5 dB SPL, p<0.001, paired t-test, n = 62, U1 = 0.2, Figure 3Cii, see also Figure 3—source data 1).

Inhibition at SBC is co-tuned with excitation and broadband, not off-CF and narrowband.

(A) i: Representative frequency response area (FRA) of the excitatory ANF input (EPSPfail and EPSPsucc) characterized by a well-defined CF, the typical steep high-frequency flank, the formation of a low-frequency tail, and the absence of frequency-intensity domains of inhibition. ii: The same recording showed elevated threshold EPSPs throughout most of the excitatory response area and extending up to two octaves above CF. The frequency at which the lowest relative intensity caused elevated threshold EPSP matched the unit’s CF. iii: For the same unit, comparison of excitatory (ANF, gray) response area and frequency-intensity domain of inhibition (threshold EPSP elevation, red). The inhibitory domain was symmetrically arranged around the unit’s CF. (B) i: FRA of the SBC output (EPSPsucc) shows a considerable reduction in firing activity compared to the ANF input. ii: Failure fraction, i.e. the proportion of EPSPfail. The increase in failure fraction is most prominent around the unit’s CF. Note the similarity of the frequency-intensity domains of EPSP threshold increase and the respective domains with increased EPSPfail in Aii. iii: Rate-level functions of ANF input (gray line, left ordinate) and SBC output (solid black line, left ordinate) compared to threshold EPSP (red, solid line, right ordinate). Increasing sound pressure levels result in a monotonic increase in ANF firing and correspondingly the threshold EPSP shows a monotonic increase. The SBC output is maximal at 20 dB SPL and declines towards higher stimulus intensities. (C) Population data: comparison of excitatory (ANF, gray) and inhibitory (threshold EPSP, red) FRA indicates (i) on-CF inhibition although (ii) with higher thresholds (p<0.001, paired t-test), which is (iii) broadly tuned (Q10: p<0.01, Q40: p<0.001, two-way RM ANOVA), and (iv) shows a more symmetric tuning (p<0.001, paired t-test; the schematic drawing on the right indicates FRA shapes for different asymmetry indices). (D) Finally, the rate-level functions were shallower and showed a reduced gain in firing rate in the output compared to the input. SBC = spherical bushy cell, CF = characteristic frequency, EPSP = excitatory postsynaptic potential, ANF = auditory nerve fibers, FRA = frequency response area.

https://doi.org/10.7554/eLife.19295.007

The width of inhibitory and excitatory FRA was determined by calculating Q10 and Q40 values. Both measures were smaller for the inhibitory FRA compared to the excitatory FRA, i.e. inhibition showed reduced frequency selectivity compared to excitation (Q10: excitatory = 2.5 ± 0.6 vs. inhibitory = 1.9 ± 1.1, Δ = 0.5 ± 1.2, p<0.01; Q40: excitatory = 0.96 ± 0.1 vs. inhibitory = 0.6 ± 0.3, Δ = 0.4 ± 0.3, p<0.001, two-way RM ANOVA, Bonferroni-adjusted, n = 62, η2 = 0.06, Figure 3Ciii).
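As an illustration of how such Q values relate to tuning width, the sketch below assumes the conventional definition, Q = CF divided by the bandwidth of the tuning curve at 10 dB (Q10) or 40 dB (Q40) above the minimum threshold. The tuning curve is made up, the bandwidth is read off the sampled frequencies without interpolation, and the function name is hypothetical.

```python
def q_value(freqs_khz, thresholds_db, db_above=10.0):
    """Q = CF / bandwidth, measured db_above dB above the minimum threshold.

    freqs_khz: sampled stimulus frequencies (kHz), ascending.
    thresholds_db: response threshold (dB SPL) at each frequency.
    Simplification: band edges are taken at sampled frequencies only.
    """
    min_thr = min(thresholds_db)
    cf = freqs_khz[thresholds_db.index(min_thr)]
    level = min_thr + db_above
    inside = [f for f, thr in zip(freqs_khz, thresholds_db) if thr <= level]
    bandwidth = max(inside) - min(inside)
    if bandwidth <= 0:
        raise ValueError("frequency grid too coarse to resolve the bandwidth")
    return cf / bandwidth


# Made-up tuning curve with CF = 2.0 kHz and a threshold of 5 dB SPL
freqs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
thresholds = [50.0, 40.0, 15.0, 5.0, 14.0, 60.0]
q10 = q_value(freqs, thresholds, db_above=10.0)
q40 = q_value(freqs, thresholds, db_above=40.0)
print(q10, q40)
```

Under this definition a smaller Q means a wider band relative to CF, which is why lower Q10 and Q40 values correspond to reduced frequency selectivity.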

Q-values provide information about the sharpness of tuning, but not about the actual shape of the FRA. The tuning shape was evaluated using the asymmetry index (AI, see Materials and methods), with values of 0 indicating symmetric, <0 for low-frequency extended and >0 for high frequency extended tuning curves. While excitatory FRAs showed distinct low-frequency tails, typical for ANF, inhibitory FRAs were mostly symmetrically arranged around CF, partly covering high-frequency ranges above the excitatory response area (AI excitation = –0.96 ± 0.42 vs AI inhibition = –0.26 ± 0.88, Δ = 0.7 ± 0.97, p<0.001, paired t-test, n = 62, U1 = 0.21, Figure 3Civ).

The rate-level function (RLF) of the SBC output was markedly flatter and thus less variable with respect to level than the rate-level function of the excitatory ANF input. The gain of the neuronal response across stimulus level was quantified as rate level gain (RLG), defined as $\mathrm{RLG} = \log_{10}\left(\frac{FR_{max} - FR_{min}}{FR_{spont}}\right)$, with $FR_{max}$ and $FR_{min}$ being the maximal and minimal firing rate in the RLF and $FR_{spont}$, the spontaneous firing rate in the absence of acoustic stimulation (see also supplementary Matlab code). This way, overall changes in firing rates are taken into account (e.g. due to spontaneous failures). The output’s rate level function had a gain of 1 ± 0.4 which was significantly less than the input’s (1.4 ± 0.4, Δ = 0.35 ± 0.3, p<0.001, paired t-test, n = 62, U1 = 0.1, Figure 3D).
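A minimal Python sketch of the rate level gain follows, under the assumption that RLG is the base-10 logarithm of the firing-rate range (FRmax − FRmin) divided by FRspont. The firing rates are made up for illustration, and the function is hypothetical rather than a port of the supplementary Matlab code.

```python
import math


def rate_level_gain(rates, spont_rate):
    """Assumed definition: log10((FRmax - FRmin) / FRspont).

    rates: firing rates (spikes/s) of the rate-level function.
    spont_rate: spontaneous firing rate (spikes/s) without stimulation.
    """
    fr_max, fr_min = max(rates), min(rates)
    if spont_rate <= 0 or fr_max <= fr_min:
        raise ValueError("RLG is undefined for these rates")
    return math.log10((fr_max - fr_min) / spont_rate)


# Made-up rate-level functions (spikes/s) at increasing sound levels
anf_input = [60.0, 150.0, 300.0, 450.0, 560.0]   # monotonic input
sbc_output = [40.0, 110.0, 190.0, 160.0, 140.0]  # non-monotonic output

rlg_in = rate_level_gain(anf_input, spont_rate=20.0)
rlg_out = rate_level_gain(sbc_output, spont_rate=15.0)
print(round(rlg_in, 2), round(rlg_out, 2))
```

Normalizing by the spontaneous rate of the respective signal means that a uniform scaling of all rates leaves the gain unchanged, while a flatter rate-level function lowers it.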

Taken together, these data demonstrate that inhibition in SBC is co-tuned with excitation and shows a broader and more symmetric frequency profile, which results in flatter rate-level functions and high-frequency inhibitory sidebands at the fringes of the tuning curve (frequently observed in SBC output activity). These findings are consistent with previous reports (Caspary et al., 1994; Kopp-Scheinpflug et al., 2002; Kuenzel et al., 2011). The lower rate-level dependence suggests a gain-normalization function of inhibition, discussed in detail below.

Acoustically evoked inhibition elevates threshold for AP generation

As shown above, the tuning of the inhibitory input on SBCs largely matches the ANF excitation. In vitro studies and modeling suggested inhibition to prevent AP generation in SBC by elevating the threshold EPSP (Xie and Manis, 2013; Kuenzel et al., 2015). Slice studies reported a predominantly glycinergic inhibition with a smaller GABAergic contribution (Nerlich et al., 2014a, 2014b), and in vivo studies showed an effective, dose-dependent block of SBC spiking by iontophoretic application of glycine (Keine and Rübsamen, 2015). Consequently, we tested if the activation of glycinergic inputs can directly cause the observed elevation of threshold EPSP. Glycine was applied iontophoretically, mimicking the putative role of glycinergic inhibition, while monitoring the SBC’s spontaneous activity. Indeed, glycine caused an increase in the number of EPSPs that failed to trigger APs, and this specific effect could be blocked by simultaneous application of the glycine receptor antagonist strychnine (Figure 4A). The iontophoretic current for glycine application was adjusted to cause an increase in the spontaneous failure fraction from 0.3 ± 0.17 to 0.64 ± 0.18 (Δ = 0.34 ± 0.12, p<0.001, paired t-test, n = 11, U1 = 0.5) to match the range observed under acoustic stimulation. This increase in failure fraction was accompanied by an elevation in threshold EPSP (threshold EPSP spont = 6.1 ± 2.2 V/s vs. threshold EPSP glycine = 8.4 ± 1.8 V/s, Δ = 2.3 ± 0.9 V/s, p<0.001, paired t-test, n = 11, U1 = 0.32, Figure 4B, see also Figure 4—source data 1). The application of the carrier alone had no effect on either threshold EPSPs (threshold EPSP spont = 6.8 ± 1.6 V/s vs. threshold EPSP carrier = 6.8 ± 1.7 V/s, Δ = 0 ± 0.3 V/s, p=0.89, paired t-test, n = 9, U1 = 0.1, data not shown) or failure fraction (failure fraction spont = 0.29 ± 0.17 vs. failure fraction carrier = 0.27 ± 0.16, Δ = 0.03 ± 0.06, p=0.79, paired t-test, n = 9, U1 = 0.11, data not shown).

Glycinergic inhibition elevates threshold EPSP and becomes activated during acoustic stimulation.

(A) Representative recording of spontaneous activity with iontophoretically applied glycine to block SBC spiking (red bar). This effect is suspended by strychnine application (green bar). (B) (Bi) Left: During spontaneous activity, small EPSPs fail to generate APs (gray = EPSPfail, blue = EPSPsucc, black line = threshold EPSP). Right: Iontophoretic application of glycine elevates the threshold EPSP (solid red line) for spike generation resulting in an increased failure fraction (dashed black line shows threshold EPSP from control condition). (Bii): Population data for 11 units showing the effect of glycinergic inhibition on the increase of threshold EPSP. (C) and (D): Acoustically evoked FRAs while blocking glycinergic inhibition. (Ci) No effect on input FRA was observed when inhibition was blocked. (Cii) Population data confirming the lack of glycine effect on the input activity. (Di) SBC output FRA shows increased firing rates during the blockade of glycinergic inhibition. Note the absence of the inhibitory sideband after inhibition block. (Dii) Population data show a considerable increase in SBC firing after block of inhibition (p<0.001, paired t-test). (Ei) Left: Under control conditions, the threshold EPSP is elevated during acoustic stimulation, indicating the presence of acoustically evoked inhibition. Right: This threshold elevation is absent when the glycinergic inhibition is blocked. (Eii) Population data showing the threshold EPSP during spontaneous activity and acoustic stimulation at the units’ CF for control condition (gray, p<0.001, two-way RM ANOVA) and under inhibition block (green). Note the absence of threshold EPSP elevation during acoustic stimulation under the inhibition block. Blocking glycinergic inhibition had no effect on threshold EPSP during spontaneous activity. Triangles in Bii–Eii denote representative cells from Bi–Ei.

https://doi.org/10.7554/eLife.19295.009

Next, the contribution of inhibition-mediated threshold EPSP elevation on spike failures during acoustic stimulation was tested. The specific glycine receptor antagonist strychnine was iontophoretically applied to block the acoustically evoked glycinergic inhibition. The effectiveness of the glycine block was tested before sound stimulation by simultaneously applying glycine and strychnine, with the application current for strychnine adjusted to block the effect of iontophoretically applied glycine. The ANF input firing rates were not influenced by the block of inhibition (control = 282 ± 48 Hz vs. strychnine = 283 ± 50 Hz, Δ = 1.2 ± 26.7 Hz, p=0.88, paired t-test, n = 11, U1 = 0.18, Figure 4C). The SBC output rates in the excitatory field, however, were substantially increased under glycine block (control = 122 ± 49 Hz vs. strychnine = 192 ± 39 Hz, Δ = 70 ± 47.9 Hz, p<0.001, paired t-test, n = 11, U1 = 0.5, Figure 4D). We next tested if the block of glycinergic inhibition differentially affects the threshold EPSP during spontaneous activity and during acoustic stimulation. When glycinergic inhibition was blocked, the threshold EPSP was only affected during acoustic stimulation, but not during spontaneous activity (interaction drug × stimulus condition, p<0.01, η² = 0.13, two-way RM ANOVA, Greenhouse-Geisser corrected, n = 11, Figure 4E). Acoustic stimulation at CF under control condition resulted in a significant threshold EPSP elevation (threshold EPSP spont = 5.4 ± 1.6 V/s vs. stim = 8.8 ± 3.1 V/s, Δ = 3.5 ± 2.9 V/s, p<0.01, two-way RM ANOVA, Bonferroni-adjusted, n = 11, U1 = 0.41) and this shift was absent when the inhibition was blocked (threshold EPSP spont = 5.9 ± 2 V/s vs. stim = 5.8 ± 1.8 V/s, Δ = 0.1 ± 1.3 V/s, p=0.82, two-way RM ANOVA, Bonferroni-adjusted, n = 11, U1 = 0.09, Figure 4D). The effects observed under acoustic stimulation were very different from the respective manipulations performed during spontaneous activity. 
In the absence of acoustic stimulation, the block of glycinergic inhibition had no effect on output rates (control = 51 ± 26 Hz vs. strychnine = 49 ± 20 Hz, Δ = 1.6 ± 18.5 Hz, p=0.79, two-way RM ANOVA, Bonferroni-adjusted, n = 11, U1 = 0.14, data not shown), and threshold EPSP (control = 5.5 ± 2.5 V/s vs strychnine = 5.6 ± 2.6 V/s, Δ = 0.03 ± 0.16 V/s, p=0.53, two-way RM ANOVA, Bonferroni-adjusted, n = 11, U1 = 0.09, Figure 4Eii). Similar to acoustic stimulation, the input rates were not altered during inhibition block (control = 85 ± 26 Hz vs. strychnine = 86 ± 22 Hz, Δ = 0.4 ± 16.6 Hz, p=0.94, two-way RM ANOVA, Bonferroni-adjusted, n = 11, U1 = 0.09, data not shown).

These data suggest a major role of glycinergic inhibition in acoustically evoked signal processing, but a negligible impact during spontaneous activity. Taken together, the data confirm previous reports of broadly tuned, predominantly glycinergic inhibition (Kopp-Scheinpflug et al., 2002; Kuenzel et al., 2011), which decreases and potentially normalizes SBC output firing across different stimulus conditions by an increase in threshold EPSP for spike generation.

Temporal precision improves from ANF to SBC during amplitude and frequency-modulated tones

The results above suggest that acoustically evoked inhibition can considerably influence SBC spiking by increasing the threshold for AP generation. Previous studies directly comparing the ANF input and SBC output showed an increase in temporal precision, which has been attributed to the impact of inhibition (Dehmel et al., 2010; Kuenzel et al., 2011; Keine and Rübsamen, 2015). These studies focused on the responses to static pure-tone stimulation, leaving open the question of whether acoustically evoked inhibition influences signal transmission in a more complex, i.e. more naturalistic, acoustic environment.

Considering this issue, we first tested the responses of SBCs to sinusoidal amplitude-modulated (SAM) and frequency-modulated (SFM) acoustic stimuli. SAM stimuli were presented at the respective units’ CF 30 dB above the excitatory threshold with modulation frequencies between 50 Hz and 400 Hz (modulation depth = 100%, Figure 5). The discharge activity of the units showed different degrees of modulation congruent with the SAM for both the ANF input and the SBC output (Figure 5B,C). The AP failure fraction increased from 0.27 ± 0.22 in the absence of acoustic stimulation to 0.43 ± 0.18 during SAM stimulation (Δ = 0.16 ± 0.18, p<0.01, two-way RM ANOVA, Greenhouse-Geisser corrected, n = 14, η² = 0.11, data not shown) and was independent of modulation frequency (factor frequency: p=0.19, η² < 0.01; interaction stimulus type × frequency: p=0.27, two-way RM ANOVA, Greenhouse-Geisser corrected, η² < 0.01, n = 14). The temporal precision of ANF input and SBC output to SAM stimulation was estimated by calculating the vector strength (VS) at different modulation frequencies. The SBC output exhibited consistently higher VS compared to its ANF input (Δ = 0.06 ± 0.04, p<0.001, two-way RM ANOVA, Greenhouse-Geisser corrected, n = 14, η² = 0.09, Figure 5D, see also Figure 5—source data 1), and decreased for modulation frequencies above 200 Hz (factor frequency: p<0.001, η² = 0.09; interaction signal type × frequency: p<0.05, η² < 0.01, two-way RM ANOVA, Greenhouse-Geisser corrected, n = 14). To estimate the degree of modulation of the neural response, the modulation depth was estimated by calculating the standard deviation of the first cycle of the normalized cross-correlation function. Modulation depth was considerably higher at the SBC output (Δ = 0.04 ± 0.04, p<0.001, η² = 0.09) and decreased with modulation frequency (factor frequency: p<0.001, η² = 0.08; interaction signal type × frequency: p=0.12, η² < 0.01, two-way RM ANOVA, Greenhouse-Geisser corrected, n = 14).
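Vector strength, as used above to estimate temporal precision, is conventionally the length of the mean resultant vector of spike phases relative to the modulation cycle. The following is a minimal sketch of that standard definition, not the authors' analysis code; the function name and the synthetic spike trains are illustrative assumptions.

```python
import cmath
import math


def vector_strength(spike_times_s, mod_freq_hz):
    """Length of the mean resultant vector of spike phases.

    0 means no phase locking to the modulation cycle, 1 means perfect locking.
    """
    if not spike_times_s:
        return float("nan")
    resultant = sum(cmath.exp(2j * math.pi * mod_freq_hz * t) for t in spike_times_s)
    return abs(resultant) / len(spike_times_s)


# Synthetic examples (assumed data): one spike at the same phase of every 100 Hz cycle,
# versus spikes distributed evenly over four phases of the cycle.
locked = [k / 100.0 for k in range(50)]
spread = [k / 100.0 + (k % 4) * 0.0025 for k in range(48)]
vs_locked = vector_strength(locked, 100.0)
vs_spread = vector_strength(spread, 100.0)
```

Applied separately to the EPSP times (ANF input) and the AP times (SBC output) of one unit, the same function would give the input and output values that are compared in the text.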

Tone bursts with sinusoidal amplitude modulations (SAM) of different modulation frequencies were used to investigate the input-output function under the condition of dynamically altered amplitude profiles.

Overall, SAM testing revealed higher temporal precision and reproducibility from ANF input to SBC output. (A) The upper panel (black) shows the stimulus and the lower panel the dot-raster plot of the discharges of a representative SBC to 200 stimulus presentations with a differentiation between EPSPsucc (blue) and EPSPfail (gray). (B) Histogram of the discharge activity shown in A. Upper panel: blue = EPSPsucc, orange = ANF input, i.e. EPSPsucc+EPSPfail. Lower panel: The EPSPfail is also locked to the SAM, following the ANF input dynamics. (C) Period histograms of ANF input (orange) and SBC output (blue) to increasing modulation frequencies. For comparison, all histograms are centered to the maximum of the ANF input. Gray background indicates the stimulus modulation. (D) Trial-to-trial reproducibility and modulation depth were calculated from the cross-correlation between trials. Reproducibility was defined as the peak of the normalized cross-correlation and modulation depth as the standard deviation of the first cycle. (E) Population data for 14 SBCs. Different measures of temporal precision and trial-to-trial reproducibility all revealed higher accuracy for the SBC output compared to its ANF input: The SBC output showed consistently higher vector strength (left, p<0.001, two-way RM ANOVA), increased modulation depth (middle left, p<0.001, two-way RM ANOVA), higher reproducibility (middle right, p<0.01, two-way RM ANOVA) and higher representation of the stimulus envelope (right, p<0.05, two-way RM ANOVA) throughout all modulation frequencies. Markers indicate mean ± standard deviation.

https://doi.org/10.7554/eLife.19295.011

The neuronal response to a given stimulus can vary between identical stimulus presentations. This trial-to-trial variability was quantified by calculating the within-cell, across-trial cross-correlations separately for the ANF input and SBC output. The peak height of the cross-correlation was termed reproducibility (Joris et al., 2006). It provides a measure of how repeatable the neural response is across trials, given identical stimulus presentations. If the reproducible features of the response encode stimulus properties, e.g. certain salient events, then an increased reproducibility corresponds to more reliable encoding of stimulus information across trials.

The analysis revealed higher reproducibility in the SBC output compared to the ANF input (Δ = 0.4 ± 0.25, p<0.001, η² = 0.09) and also showed a systematic decrease with increasing modulation frequency (factor frequency: p<0.001, η² = 0.04; interaction signal type × frequency: p<0.01, η² < 0.01, two-way RM ANOVA, Greenhouse-Geisser corrected, n = 14). To obtain a better understanding of how precisely the neuronal response reproduces the stimulus envelopes, the delay-adjusted period histograms were correlated to the stimulus envelope resulting in a CorrNorm between 0.82 and 1 (see Materials and methods for explanation). The analysis revealed higher CorrNorm for the SBC output compared to the ANF input (Δ = 0.01 ± 0.01, p<0.05, η² = 0.02), which for both signal types increased with modulation frequency (factor frequency: p<0.05, η² = 0.08; interaction signal type × frequency: p=0.19, η² < 0.01, two-way RM ANOVA, Greenhouse-Geisser corrected, n = 14). These analyses show that the increase in temporal precision observed during pure-tone stimulation is maintained during amplitude-modulated sounds across a wide range of modulation frequencies.

In a next step, the modulation of unit discharges by periodic frequency modulations (SFM) was explored. For that purpose, the stimulus intensity was fixed at 30–40 dB above the unit’s threshold and the stimulus frequency was modulated between one octave below and two octaves above the unit’s CF. This frequency range covers the whole excitatory area as well as the inhibitory sideband. Similar to the SAM stimulation, the SFM resulted in prominent modulations of the units’ firing rates (Figure 6A) and increased failure fractions (Figure 6B) (spont = 0.34 ± 0.25 vs. stim = 0.6 ± 0.13, Δ = 0.26 ± 0.23, p<0.001, η² = 0.29, data not shown). In contrast to SAM stimulation, SFM led to increased failure rates at higher modulation frequencies (e.g. 0.52 ± 0.13 at 20 Hz vs. 0.65 ± 0.12 at 400 Hz modulation frequency, p<0.001, two-way RM ANOVA, Greenhouse-Geisser corrected, η² = 0.02, n = 19, data not shown). For SFM stimulation – same as for SAM – the SBC output showed higher VS compared to its ANF input (Δ = 0.14 ± 0.09, p<0.001, η² = 0.21, see also Figure 6—source data 1) (Figure 6D left) (factor frequency: p<0.001, η² = 0.48; interaction signal type × frequency: p<0.001, η² = 0.03, two-way RM ANOVA, Greenhouse-Geisser corrected, n = 19). Still, overall the VS of ANF input and SBC output decreased with increasing modulation frequency (Figure 6D left). Notably, the VS of the SBC output deteriorated to a lesser degree than that of the ANF input. At modulation frequencies of 20 Hz, the VS was not significantly different between ANF input and SBC output (ANF input = 0.61 ± 0.06 vs. SBC output = 0.63 ± 0.1, Δ = 0.04 ± 0.07, p=0.17, two-way RM ANOVA, Bonferroni-adjusted, n = 19, U1 = 0.16). For modulation frequencies of 400 Hz, however, the VS of the SBC output was considerably higher than that of the ANF input (ANF input = 0.25 ± 0.08 vs. SBC output = 0.4 ± 0.09, Δ = 0.15 ± 0.06, p<0.001, two-way RM ANOVA, Bonferroni-adjusted, n = 19, U1 = 0.5).
It should be noted that the interpretation of VS values is difficult when the period histogram of the neuronal response shows multiple peaks (Figure 6C). We, therefore, used a set of additional measures to describe the neuronal response to SFM stimuli when comparing ANF input and SBC output. The modulation depth was considerably higher for the SBC output than the ANF input (Figure 6D mid left; Δ = 0.24 ± 0.16, p<0.001, η² = 0.28) and strongly depended on the modulation frequency (factor frequency: p<0.001, η² = 0.37; interaction signal type × frequency: p<0.001, η² = 0.03, two-way RM ANOVA, Greenhouse-Geisser corrected, n = 19). The same holds for signal reproducibility (Figure 6D mid right; Δ = 1.9 ± 1.2, p<0.001, η² = 0.38) which also showed prominent frequency dependency (factor frequency: p<0.001, η² = 0.26; interaction signal type × frequency: p<0.001, η² = 0.02, two-way RM ANOVA, Greenhouse-Geisser corrected, n = 19). Unlike the previous measures, the normalized correlation between SFM stimulus envelope and neural response (CorrNorm) revealed a lower stimulus reproduction for the SBC output compared to the ANF input (Figure 6D right; Δ = 0.02 ± 0.03, p<0.001, η² = 0.06), and the difference also holds with respect to the effect of modulation frequency (factor frequency: p<0.001, η² = 0.43; interaction signal type × frequency: p<0.001, η² = 0.03, two-way RM ANOVA, Greenhouse-Geisser corrected, n = 19). Overall, these data suggest that – same as for the SAM stimuli – the SBC output shows increased temporal precision and higher response reproducibility during SFM stimulation.

Tone bursts with sinusoidal frequency modulations (SFM) of different modulation frequencies were used to investigate the input-output function under the condition of dynamically altered frequency profiles.

Overall, SFM testing revealed improved temporal precision and across-trial reproducibility across the ANF-SBC synapse. (A) The upper panel (black) shows the SFM stimulus with a detail enlargement visualizing the dynamic frequency modulation. The dot-raster plot (lower panel) shows the activity of a representative SBC (CF = 1.8 kHz) to 200 stimulus repetitions with a differentiation between EPSPsucc (blue) and EPSPfail (gray). (B) Histogram of the discharge activity shown in A. Upper panel: blue = EPSPsucc, gray = EPSPfail, orange = ANF input, i.e. EPSPsucc+EPSPfail. Lower panel: The EPSPfail is also locked to the SFM but showed reduced fine structure compared to the ANF input. (C) Period histograms for the same cell as in A and B at different modulation frequencies (orange = ANF input, blue = SBC output). Design of the graph is identical to Figure 5C. Note the multiple peaks of the response in the period histogram. (D) Population data for 19 cells: Across all frequencies tested, the SBC output shows increased vector strength (left; p<0.001, two-way RM ANOVA), higher modulation depths (mid left; p<0.001, two-way RM ANOVA), and better across-trial reproducibility (mid right; p<0.001, two-way RM ANOVA) compared to its ANF input. The stimulus reproduction (CorrNorm) was consistently lower at the SBC level (right; p<0.001, two-way RM ANOVA). Markers indicate mean ± standard deviation.

https://doi.org/10.7554/eLife.19295.013

Spectrotemporal input-output comparison indicates broad, co-tuned, long-lasting inhibition

Above, we demonstrated an improvement in temporal precision and reproducibility in response to SAM and SFM acoustic stimuli. In natural environments, however, the auditory system has to cope with simultaneous dynamic changes in both frequency and amplitude embedded in ambient background noise. To mimic such conditions, while preserving the possibility for a quantitative data analysis, dynamic acoustic stimuli composed of gamma-tones randomly placed in the spectrogram were used (Figure 7A top, randomized gamma-tone sequence, RGS, see Materials and methods for details). The SBC activity can then be characterized using spectrotemporal receptive fields (STRFs). In the present context, STRFs can also be used to quantify the spectrotemporal transformation of response properties across the ANF-SBC synapse, since the respective analysis can be performed for both the ANF input and SBC output. SBC activity was recorded while presenting 20–30 repetitions of identically structured RGS sequences of 30 s duration each. In Figure 7A, the upper panel shows a 200 ms-section of an RGS stimulus used for stimulation of an SBC with a CF of 2 kHz; the middle panel depicts the spike raster plot differentiating between EPSPsucc and EPSPfail.

Input-output comparison of spectrotemporal receptive fields (STRF) indicates minor spectral sharpening and confirms broad, slow inhibitory action.

(A) Top panel: Randomized gamma-tone sequences (RGS, scaling of red color indicates stimulus levels with a maximum of 70 dB SPL, see Materials and methods for stimulus details) were used to estimate STRFs of SBC output and its ANF input. The RGS spanned one octave below and two octaves above the unit’s CF; in the present example 2 kHz. Second panel: Dot raster of discharges of an exemplary SBC evoked by 30 repetitive RGS presentations (blue = EPSPsucc, gray = EPSPfail). Third panel: PSTH of the recording shown above; the graph differentiates between the total of the ANF input (EPSPfail + EPSPsucc, orange) and SBC output (SBC APs, blue). Fourth panel: From the same recording the histogram of the EPSPs that fail to trigger an SBC AP (EPSPfail, gray). Note that EPSPs that elicited APs tended to be more prominent at the onset of excitatory response components. (B) STRF of the unit shown in A. Upper panel: Sketch of the two signal types, i.e. the totality of all EPSPs were considered to indicate the ANF input (orange), while EPSPs that generate an AP defined the SBC output (blue). Middle panel: corresponding STRFs. Note that there are clearly delineated areas of increased activity 2–3 ms after response-evoking stimulus components (red) which are distinct from areas with reduced activity. The spectrotemporal shape of the modulation at the ANF-SBC junction was quantified by the averaged difference-STRF. The STRFs of both ANF (left) and SBC (right) were computed separately and then subtracted (bottom panel). Relative temporal alignment was achieved by time-locking both ANF input activity and SBC output on the respective timing of maximum EPSP slope. The difference reveals changes in stimulus responsiveness in spectrotemporal coordinates. Negative values indicate a reduction in responsiveness, most likely caused by local inhibition. (C–G) Population data for all recorded SBCs (n = 34); triangles in the graphs indicate the respective values of the unit shown in A and B.
(C) Stimulus-driven excitation was significantly reduced from the ANF input to SBC output, measured as the sum of all positive STRF bins (p<0.001, Wilcoxon signed rank test). (D) Stimulus-driven inhibition was significantly increased, measured by the negative sum of all negative STRF bins from the ANF input to SBC output (p<0.001, Wilcoxon signed rank test). (E) Spectral precision improved at the ANF-SBC junction, indicated by a reduced spectral half-width of the excitation (p<0.001 Wilcoxon signed rank test). (F) Temporal precision, estimated as the temporal half-width, was not changed between ANF input and SBC output (Wilcoxon signed rank test, p=0.16). (G) The average difference-STRF (n = 34 cells) exhibited a prominent and broad ( > 2 octaves) reduction around CF, which remained effective for ~10 ms (black line indicates significant deviation, adjusted for a false discovery rate < 0.01).

https://doi.org/10.7554/eLife.19295.015

The neural responses to gamma-tones of both the ANF input and SBC output were temporally structured (Figure 7A second and third panel). Failures of signal transmission (Figure 7A gray dots in second panel and histogram in fourth panel) were found to be increased following sequences of activation, suggesting a long-lasting action of inhibition (e.g. Figure 7A fourth panel, where EPSPfail shows a considerable increase in the responses to the second of the first two peaks).

Separate STRFs were computed for the ANF input (Figure 7B middle left) and the SBC output (Figure 7B middle right). As expected, both STRFs showed common features, e.g. frequency domain of excitation above and below the unit’s CF (in the present example 2 kHz, estimated from single tone tunings) and also the response latency (here 2.5 ms). Importantly, the reduction in the responses establishing a high-frequency sideband was already present in the ANF input to the SBCs and did not become more pronounced in SBC output. Since the ANF activity is not affected by acoustically evoked inhibition, the respective frequency-specific reduction observed in the ANF input to the SBCs likely reflects mechanical interactions in the cochlea, previously described as two-tone suppression (Engebretson and Eldredge, 1968; Sachs and Kiang, 1968; Sellick and Russell, 1979).

To evaluate the signal processing at the ANF-SBC junction, the two STRFs were subtracted from each other after normalizing each by its standard deviation (to compensate for overall firing rate differences, see Materials and methods for details; Figure 7B bottom). This normalization allows a quantification of changes in the tuning shape. The increase in EPSPfail in the STRF of the SBC output manifests itself as a broad field of negativity in the difference-STRF around CF extending up to ~10 ms after the onset of the effective signal components around 2 kHz. The respective differences between ANF input and SBC output were quantified in all recorded SBCs (n = 34) separately for the positive (red) and negative (blue) regions in the STRF (corresponding to influential spectrotemporal locations in the stimulus prior to the response). Summing all the positive regions revealed a significant reduction from ANFs to SBCs (ANF = 0.38 [0.37, 0.41] vs. SBC = 0.33 [0.31, 0.36], Δ = 0.05 [0.03, 0.07], p<0.001, Wilcoxon signed rank test, n = 34, U1 = 0.16, Figure 7C, see also Figure 7—source data 1). Similarly, the summed negative region in the SBC output was significantly larger in magnitude than the ANF input (ANF = 0.18 [0.15, 0.23] vs. SBC = 0.25 [0.18, 0.29], Δ = 0.04 [0.02, 0.07], p<0.001, Wilcoxon signed rank test, n = 34, U1 = 0.1, Figure 7D). Together, this suggests an inhibitory influence acting broadly with respect to the neuron’s tuning.
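The normalize-then-subtract step described above can be sketched as follows. This is an illustration only: the STRFs are assumed to be plain frequency × time grids, the population standard deviation is used for normalization, and the sign convention (output minus input, so that negative values mark reduced responsiveness) is an assumption chosen to match the figure description. The exact procedure is given in the Materials and methods, which are not reproduced here.

```python
import statistics


def normalize_by_sd(strf):
    """Divide every bin of a 2-D STRF (list of rows) by the SD over all bins."""
    flat = [value for row in strf for value in row]
    sd = statistics.pstdev(flat)
    if sd == 0:
        raise ValueError("STRF has no variance")
    return [[value / sd for value in row] for row in strf]


def difference_strf(strf_input, strf_output):
    """SD-normalized output STRF minus SD-normalized input STRF, bin by bin."""
    norm_in = normalize_by_sd(strf_input)
    norm_out = normalize_by_sd(strf_output)
    return [
        [v_out - v_in for v_in, v_out in zip(row_in, row_out)]
        for row_in, row_out in zip(norm_in, norm_out)
    ]


# Toy grids (assumed values): a pure change in overall gain leaves no difference,
# whereas a change in tuning shape does.
anf = [[0.0, 2.0], [0.0, 2.0]]
scaled = [[0.0, 4.0], [0.0, 4.0]]
reshaped = [[2.0, 0.0], [0.0, 2.0]]
diff_scaled = difference_strf(anf, scaled)
diff_reshaped = difference_strf(anf, reshaped)
```

The toy case makes the purpose of the normalization concrete: a uniformly scaled STRF yields a difference of zero, so only changes in tuning shape survive the subtraction.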

We further quantified changes in the shape of the main excitatory peak. The spectral tuning, measured as half-width of the excitatory region, was reduced at the SBC output compared to ANF input, suggesting a spectrally sharper tuning at the SBC output (ANF = 1.2 [0.9, 1.4] octaves vs. SBC = 1 [0.8, 1.2] octaves, Δ = 0.1 [0, 0.2], p<0.001, Wilcoxon signed rank test, n = 34, U1 = 0.01, Figure 7E). Temporal precision, measured correspondingly as the half-width of the excitatory region, was somewhat higher for the SBC output, but did not reach statistical significance (ANF = 2.2 [1.8, 3.4] ms vs. SBC = 2.0 [1.8, 3.1] ms, Δ = 0.08 [-0.1, 0.18], p=0.16, Wilcoxon signed rank test, n = 34, U1 = 0.03, Figure 7F).

The overall shape of the difference-STRF of all units was studied by aligning all STRFs to the peak excitation and averaging them (Figure 7G). As mentioned above, the reduction in the STRF outlasted the excitatory region for up to ~10 ms relative to the onset of the excitatory signal component. Significance was assessed point-wise using t-tests, followed by the Benjamini and Hochberg (1995) algorithm for multiple comparisons applied to the p-values of the t-tests. At a false discovery rate of 0.01, the gray line shows the region of significant deviation.
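For readers unfamiliar with the Benjamini and Hochberg (1995) step-up procedure mentioned above, a minimal sketch is given here. It takes the point-wise p-values as a flat list and returns which of them are declared significant at the chosen false discovery rate; the function name and the example p-values are illustrative assumptions, not data from the study.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Step-up rule: find the largest rank k with p_(k) <= fdr * k / m and
    reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            largest_rank = rank
    reject = [False] * m
    for idx in order[:largest_rank]:
        reject[idx] = True
    return reject


# Assumed example p-values, evaluated at a false discovery rate of 0.05.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
example_reject = benjamini_hochberg(example_p, fdr=0.05)
```

In the STRF setting, each spectrotemporal bin would contribute one p-value, and the resulting mask marks the region of significant deviation.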

Overall, the STRF analysis confirmed the presence of inhibition co-tuned with excitation, exhibiting a longer-lasting time-course of about 10 ms with respect to the onset of the excitatory signal component. Consequently, also under dynamic broadband stimulation, inhibition is confirmed to only marginally act above or below the neuron’s excitatory receptive field and results in only a slight spectral sharpening of the SBC output. Next, we addressed the functional consequences of this co-tuned, prolonged inhibition.

Glycinergic inhibition renders SBC responses sparser, more reliable and temporally more precise

The functional consequences of co-tuned inhibition appear less evident than those of narrow, sideband inhibition. The latter can diversely shape the response properties, by reducing responses only for small, off-CF regions. Co-tuned inhibition, on the other hand, has been proposed to contribute to a precisely timed balancing of excitation to keep neurons within their dynamic ranges (Renart et al., 2010). To test for such a mechanism, we quantified properties of the SBC output in comparison to the ANF input with respect to the temporal sparsity of the response and reproducibility across trials. Efficient neural codes have been proposed to show high sparsity, i.e. neurons respond only rarely but then with high firing activity (Field, 1994). Again, the RGS stimulus was used to test the effect of acoustically evoked inhibition under complex acoustic conditions. The results yielded reduced mean firing rates of the SBC output compared to the ANF input (Figure 8Ai, data from an exemplary SBC) and increased sparsity in 28/32 cells (units above line of equality, Figure 8Aii). The population analysis revealed significantly larger temporal sparsity in the SBC output than in its ANF input (ANF = 0.22 ± 0.08 vs. SBC = 0.31 ± 0.13, Δ = 0.09 ± 0.08, p<0.001, paired t-test, n = 34, U1 = 0.15, Figure 8Aiii, see also Figure 8–source data). Sparsity was calculated by relating the variance of the neuronal response to its mean firing rate (Rolls and Tovee, 1995; Willmore and Tolhurst, 2001), but other measures for sparsity yielded qualitatively similar results (see Materials and methods and Supporting Figure 8).
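One common way of relating response variance to mean firing rate is a Treves-Rolls style index, 1 − (mean r)² / mean(r²), which equals the variance divided by the mean squared rate. The sketch below uses that form purely as an assumption for illustration; the exact normalization used in the study is defined in the Materials and methods and may differ.

```python
def sparsity(rates):
    """Temporal sparsity of a binned firing-rate trace, between 0 and 1.

    Computed as 1 - mean(r)^2 / mean(r^2), i.e. variance / mean(r^2).
    0 = constant firing; values near 1 = rare, strong responses.
    (Assumed formula; see lead-in.)
    """
    n = len(rates)
    mean_rate = sum(rates) / n
    mean_square = sum(r * r for r in rates) / n
    if mean_square == 0:
        return 0.0  # a silent trace carries no temporal structure
    return 1.0 - (mean_rate * mean_rate) / mean_square


# Assumed toy traces with the same number of bins: steady firing vs. one brief burst.
steady = [5.0, 5.0, 5.0, 5.0]
bursty = [0.0, 0.0, 0.0, 8.0]
sparsity_steady = sparsity(steady)
sparsity_bursty = sparsity(bursty)
```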

Figure 8 with 1 supplement
Inhibition renders the SBC responses sparse and increases across-trial reproducibility.

(A) (i) Representative recording during RGS stimulation (2 s-section displayed) shows significantly sparser SBC output activity (blue) than the ANF input activity (orange). Marks on the right indicate the mean firing rate for ANF input and SBC output. (ii) Population data for all recorded SBCs (n = 34). Quantification of sparsity as the variance of the normalized firing rates shows that this relation holds for almost all units (dots above line of equality; red mark indicates representative unit on the left) and (iii) results in highly significant input-output differences (p<0.001, Wilcoxon signed rank test; triangle indicates the representative unit on the left). (iv) Blocking glycinergic inhibition in vivo by strychnine (n = 12) deteriorated the improved sparsity of the SBC output and rendered it similar to the ANF input (p<0.01, Wilcoxon signed rank test), while the ANF input remained unchanged. (B) The reproducibility of the response improved from ANF input to SBC output. Reproducibility was calculated as the time-aligned correlation between the neuron’s responses to identical stimulus trials. High reproducibility indicates that the neural response is more constant across trials. (i) In the representative unit, higher reproducibility is seen for the SBC output (blue) compared to the ANF input (orange). (ii) Population data (n = 34) show that the same relation holds for almost all units (data point marked in red indicates the unit shown on the left), and (iii) the statistical analysis yielded a highly significant input-output difference (p<0.001, Wilcoxon signed rank test; triangles indicate the respective values from the exemplary unit). (iv) Application of strychnine significantly reduced reproducibility in the SBC output (light blue) compared to the control condition (dark blue; p<0.01, Wilcoxon signed rank test). The reproducibility of the ANF input (orange) was not influenced by blocking the inhibition (light orange).
(C) The temporal dispersion for repetitive acoustic stimulation decreased from the ANF input to SBC output. The temporal dispersion was quantified as the half-width of the cross-correlation within each signal across trials (i). Population analysis showed improved temporal precision, i.e. reduced half-width/dispersion in the SBC output compared to the ANF input in most of the tested cells (ii, iii, same color coding as above, p<0.01, Wilcoxon signed rank test). As above, blocking inhibition increases temporal dispersion of the SBC output to the level of the ANF input (iv, p<0.01, Wilcoxon signed rank test).

https://doi.org/10.7554/eLife.19295.017

The reproducibility of the temporal response pattern was quantified by computing across-trial cross-correlations (Figure 8B). For this analysis, the obtained correlograms were divided by the product of the individual firing rates, rendering the results independent of absolute firing rates. Reproducibility was then calculated as the peak of the correlograms measured at 0 ms lag (Figure 8Bi). In 97% (33/34) of all recorded cells, the SBC output exhibited a higher level of reproducibility (units above line of equality, Figure 8Bii). Also, the population analysis yielded a significantly higher reproducibility of the SBC output than the ANF input (ANF = 0.46 [0.37, 0.65] vs. SBC = 0.83 [0.48, 1.3], Δ = 0.35 [0.12, 0.73], p<0.001, Wilcoxon signed rank test, n = 34, U1 = 0.18, Figure 8Biii).
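The zero-lag measure described above can be sketched for binned spike counts as follows. This is a simplified illustration, assuming equal-length trials, averaging over all distinct trial pairs, and no baseline subtraction; the function name and toy data are hypothetical, and the authors' implementation may differ in these details.

```python
from itertools import combinations


def reproducibility(trials):
    """Mean zero-lag across-trial correlation, divided by the product of mean rates.

    trials: list of equal-length lists of binned spike counts, one list per trial.
    Independent, unstructured firing gives values near 1; responses that repeat
    the same temporal pattern on every trial give larger values.
    """
    values = []
    for trial_a, trial_b in combinations(trials, 2):
        mean_a = sum(trial_a) / len(trial_a)
        mean_b = sum(trial_b) / len(trial_b)
        if mean_a > 0 and mean_b > 0:
            coincidence = sum(x * y for x, y in zip(trial_a, trial_b)) / len(trial_a)
            values.append(coincidence / (mean_a * mean_b))
    return sum(values) / len(values) if values else float("nan")


# Assumed toy data: three trials repeating the same sparse pattern vs. constant firing.
patterned = [[1, 0, 0, 0] * 5 for _ in range(3)]
constant = [[1] * 20 for _ in range(3)]
repro_patterned = reproducibility(patterned)
repro_constant = reproducibility(constant)
```

Dividing by the product of the mean rates is what makes the measure independent of absolute firing rate, so the lower-rate SBC output and the higher-rate ANF input can be compared directly.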

To estimate the temporal precision across trials, the temporal dispersion was quantified as the half-width of the across-trial cross-correlation (Figure 8Ci). Temporal precision across trials improved from ANF input to SBC output in two-thirds of the recorded SBCs (24/34, units below line of equality, Figure 8Cii). Still, population analysis yielded a significant improvement in temporal precision (temporal dispersion: ANF = 7.04 [5.25, 7.93] ms vs SBC = 4.96 [3.47, 6.9] ms, Δ = 1.1 [0, 1.98] ms, p<0.01, Wilcoxon signed rank test, n = 34, U1 = 0.03, Figure 8Ciii). Response reproducibility across trials renders the response more identifiable for downstream processing stages which rely on precisely timed inputs. The increased sparsity reduces the energy expense by removing spikes which reflect the constant part of the response. Temporal precision of encoding also improved, although this was only observed in about 70% of the cells.
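A half-width of this kind can be read off a correlogram as the full width of its main peak at half height. The sketch below is one possible way to do this, assuming the baseline is the smallest correlogram value and using linear interpolation between samples; both choices, like the function name and toy correlograms, are assumptions for illustration rather than the study's procedure.

```python
def half_width(lags_ms, correlogram):
    """Full width at half maximum of the main correlogram peak, in ms."""
    baseline = min(correlogram)  # assumption: baseline = smallest value
    peak = max(range(len(correlogram)), key=lambda i: correlogram[i])
    half = baseline + 0.5 * (correlogram[peak] - baseline)

    def crossing(inside, outside):
        # Linearly interpolate the lag at which the curve drops to `half`.
        y_in, y_out = correlogram[inside], correlogram[outside]
        frac = (y_in - half) / (y_in - y_out)
        return lags_ms[inside] + frac * (lags_ms[outside] - lags_ms[inside])

    # Walk outward from the peak while the curve stays at or above half height.
    left = peak
    while left > 0 and correlogram[left - 1] >= half:
        left -= 1
    right = peak
    while right < len(correlogram) - 1 and correlogram[right + 1] >= half:
        right += 1

    left_lag = crossing(left, left - 1) if left > 0 else lags_ms[0]
    right_lag = crossing(right, right + 1) if right < len(correlogram) - 1 else lags_ms[-1]
    return right_lag - left_lag


# Assumed toy correlograms on a 1 ms lag grid: a broad and a narrow triangular peak.
lags = list(range(-4, 5))
broad = [0, 1, 2, 3, 4, 3, 2, 1, 0]
narrow = [0, 0, 0, 2, 4, 2, 0, 0, 0]
width_broad = half_width(lags, broad)
width_narrow = half_width(lags, narrow)
```

A narrower peak, as in the second toy example, corresponds to lower temporal dispersion and hence higher temporal precision across trials.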

Finally, we directly tested whether the observed changes in response properties were indeed caused by acoustically evoked, glycinergic inhibition. Another set of 12 units was recorded under RGS stimulation, and glycinergic inhibition was blocked by iontophoretic application of strychnine (Figure 8Aiv, Biv, Civ). As in the experiments reported above, the analysis differentiated between the ANF input to SBCs and the respective SBC output. Under control conditions, cells showed the above-described increase in sparsity and reproducibility at the ANF-to-SBC transition. Blocking the glycinergic inhibition resulted in decreased sparsity of the SBC output (Figure 8Aiv; control = 0.34 ± 0.1 vs. strychnine = 0.27 ± 0.07, Δ = 0.07 ± 0.06, p<0.01, two-way RM ANOVA, Bonferroni-adjusted, n = 12, U1 = 0.33). Also, the block of inhibition caused a decrease in response reproducibility (Figure 8Biv; control = 0.79 ± 0.39 vs. strychnine = 0.53 ± 0.19, Δ = 0.26 ± 0.23, p<0.05, two-way RM ANOVA, Bonferroni-adjusted, n = 12, U1 = 0.21) and an increase in temporal dispersion at the SBC output (Figure 8Civ; control = 5.6 ± 1.6 ms vs. strychnine = 7.5 ± 1.7 ms, Δ = 1.9 ± 1.6 ms, p<0.01, two-way RM ANOVA, Bonferroni-adjusted, n = 12, U1 = 0.21). In summary, the block of inhibition reduced the observed improvements from the ANF input to the SBC output, rendering both more similar. Importantly, the ANF input was not affected by the block of inhibition (Figure 8Aiv, Biv, Civ) (sparsity: control = 0.18 ± 0.05 vs. strychnine = 0.18 ± 0.05, Δ = 0 ± 0.01, p=0.5, U1 = 0.13; reproducibility: control = 0.32 ± 0.11 vs. strychnine = 0.32 ± 0.1, Δ = 0 ± 0.03, p=0.8, U1 = 0.08; temporal dispersion: control = 8.5 ± 0.9 ms vs. strychnine = 8.5 ± 1 ms, Δ = 0 ± 0.6 ms, p=0.99, U1 = 0.08, n = 12, two-way RM ANOVA, Bonferroni-adjusted).
In comparison with the pre-post data, the pharmacological dataset shows smaller variability across cells, which may be due to the lack of outliers in the latter, smaller dataset. In summary, these data directly show that glycinergic inhibition is a critical factor for the observed improvements from ANF input to the SBC output during complex acoustic stimulation.

Subtractive inhibition suffices to explain the improvement in sparsity, reproducibility, and temporal precision

SBCs have been shown to be influenced by both hyperpolarizing and shunting effects of inhibition (Kuenzel et al., 2011, 2015; Nerlich et al., 2014a). While hyperpolarization has been attributed to a subtractive effect on firing rates (Doiron et al., 2001; Silver, 2010), shunting inhibition has mainly divisive effects (Mitchell and Silver, 2003; Prescott and De Koninck, 2003; Capaday and van Vreeswijk, 2006; Ly and Doiron, 2009). We investigated the functional effect of either type on the response via a simple simulation: either a fixed fraction (divisive, relative to the instantaneous firing rate) or a fixed number (subtractive) of spikes was removed from the ANF spike trains, matching the experimentally observed SBC output rates. Purely divisive inhibition, corresponding to a scaling of the PSTH, does not improve sparsity, reproducibility or temporal precision (Figure 9, purple, Δsparsity = 0 ± 0, p=0.99, Δreproducibility = 0.01 ± 0.02, p=0.25, Δtemporal dispersion = 0.05 ± 0.41, p=0.89, n = 34, one-way RM ANOVA, Bonferroni-adjusted, see also Figure 9—source data 1). On the other hand, a purely subtractive inhibition matches the qualitative effects in the data well, i.e. improves all three properties (Figure 9, green, Δsparsity = 0.24 ± 0.08, p<0.001, Δreproducibility = 1.06 ± 0.73, p<0.001, Δtemporal dispersion = 1.8 ± 1.3 ms, p<0.001, n = 34, one-way RM ANOVA, Bonferroni-adjusted). Quantitatively, the simulated subtractive inhibition leads to larger improvements in sparsity and reproducibility than observed in the experimental data (Figure 9, blue, sparsity: data = 0.31 ± 0.13 vs. subtractive inhibition = 0.46 ± 0.12, Δ = 0.15 ± 0.07, p<0.001; reproducibility: data = 0.97 ± 0.61 vs. subtractive inhibition = 1.59 ± 0.9, Δ = 0.62 ± 0.6, p<0.001, n = 34, one-way RM ANOVA, Bonferroni-adjusted). We verified that similar relations hold for the SAM and SFM stimulation and the measures used in their analyses (Figure 9, Figure 9—figure supplements 1 and 2, respectively). 
A temporally unspecific, subtractive effect of inhibition might, therefore, be sufficient to explain the improvement in sparsity, reproducibility, and temporal precision. When combined with the divisive, co-tuned gain control, this improvement generalizes to a wide range of stimulus levels.
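
The contrast between the two types of inhibition can be sketched at the PSTH level. The simulation described above removed spikes from the ANF spike trains; the simplified version below instead scales (divisive) or offsets and rectifies (subtractive) a firing-rate profile until it reaches a target mean rate. The function names and the sparseness index used here (a scale-invariant lifetime-sparseness measure) are assumptions for illustration and not necessarily the measure used in this study.

```python
def sparseness(rates):
    """Scale-invariant sparseness index in [0, 1] (assumed measure)."""
    n = len(rates)
    mean = sum(rates) / n
    mean_sq = sum(r * r for r in rates) / n
    if mean_sq == 0:
        return 0.0
    return (1.0 - mean * mean / mean_sq) / (1.0 - 1.0 / n)


def divisive(rates, target_mean):
    """Scale the PSTH so that its mean equals target_mean."""
    scale = target_mean / (sum(rates) / len(rates))
    return [r * scale for r in rates]


def subtractive(rates, target_mean):
    """Subtract a constant offset, rectify at zero, and match target_mean."""
    lo, hi = 0.0, max(rates)
    for _ in range(100):  # bisection on the offset
        mid = (lo + hi) / 2.0
        mean = sum(max(r - mid, 0.0) for r in rates) / len(rates)
        if mean > target_mean:
            lo = mid
        else:
            hi = mid
    offset = (lo + hi) / 2.0
    return [max(r - offset, 0.0) for r in rates]
```

Because the index is invariant to scaling, the divisive variant leaves it unchanged by construction, whereas the subtractive variant removes the unmodulated plateau and raises it.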

Figure 9 with 2 supplements
Subtractive inhibition, but not divisive inhibition, can account for the improvement in sparsity, reproducibility, and temporal precision.

(A) In response to the RGS stimulus, the SBC output (blue) showed a consistent increase in sparsity (left), reproducibility (middle) and decreased temporal dispersion (right). The simulated subtractive inhibition (green) showed similar improvements as the experimental data, while divisive inhibition (purple) had no effect on sparsity, reproducibility, and temporal dispersion. (B) These relations are also reflected in the population data, with significant changes in both the experimental data and the simulated subtractive inhibition (p<0.001, one-way RM ANOVA).

https://doi.org/10.7554/eLife.19295.020

Discussion

In the present study, we demonstrate that glycinergic inhibition shapes SBC responses to become sparser and more reproducible for a broad range of stimulation conditions. As a consequence, many temporal measures, such as vector strength and across-trial temporal precision, improve. We find inhibition to act largely co-tuned with excitation, although its latency and duration exceed those of the excitatory input, similar to the respective relationship found in the cortex. Therefore, we propose glycinergic inhibition to take a functional role as a gain control and a signal quality enhancer, which optimizes the SBC output for the subsequent high-fidelity integration for sound localization in the MSO and LSO (see below).

Signal analysis and iontophoretic modulation confirm local inhibitory influence

The endbulb synapse depresses considerably during high-frequency firing (Bellingham and Walmsley, 1999; Wang and Manis, 2008; Yang and Xu-Friedman, 2008) despite the large size of the presynaptic terminal and the reliable, suprathreshold excitation observed in slice recordings. The present in vivo recordings showed that the increased failure fraction during acoustic stimulation cannot be explained by synaptic depression alone. This was evidenced by an analysis of EPSP thresholds and furthermore confirmed by iontophoretic application of a glycine receptor agonist and antagonist. In conclusion, the elevation of the EPSP threshold has proven to be a reliable indicator for inhibitory action, leading to an increased failure fraction. These data are consistent with previous in vivo studies (Kuenzel et al., 2011), as well as with slice and model studies demonstrating that an increase in inhibitory conductance can elevate threshold EPSP in bushy cells (Xie and Manis, 2013; Kuenzel et al., 2015). In summary, the endbulb of Held–SBC synapse seems to operate close to AP threshold and shows variable reliability which is strongly influenced by acoustically evoked inhibition (see also Kopp-Scheinpflug et al., 2002; Kuenzel et al., 2011; Keine and Rübsamen, 2015). While the observed frequency response areas are consistent with previous reports, we did not observe two distinct types of inhibition as reported earlier, i.e. broadband vs. on-CF inhibition (Caspary et al., 1994; Kopp-Scheinpflug et al., 2002). Instead, the present data suggest that inhibition at SBCs is broadband and on-CF.

Iontophoretic application of glycine covers the physiologically relevant conditions

SBCs receive inhibitory inputs both on their somata and dendrites (Gómez-Nieto and Rubio, 2009). Both glycine and GABA receptors were shown to be present, with the latter playing a secondary role as demonstrated in slice experiments (Nerlich et al., 2014a, 2014b). Therefore, we focused on the modulation of glycinergic inhibition. The applied dose was adjusted to match the acoustically evoked level of AP failures, keeping inhibition in the physiologically relevant range. This cautious approach will tend to underestimate the in vivo effect of glycine since the block of glycine receptors by local application of strychnine might be incomplete. Consistent with the slice data, the lack of threshold EPSP elevation during the block of glycinergic inhibition suggests only a minor influence of the GABAergic component during tone burst stimulation. The GABAergic inhibition might have an additional modulatory function or may only be activated during periods of high activity, as has been suggested for the glycinergic inhibition in the bird’s nucleus magnocellularis (Fischl et al., 2014), the avian homolog of the AVCN. Overall, we find glycine to have a substantial influence in shaping transmission at the SBC junction. However, the increased SBC output rates during block of glycinergic inhibition might increase the influence of spike depression (Lorteije et al., 2009; Kuenzel et al., 2011), and other factors such as spike threshold adaptation have to be taken into account (Fontaine et al., 2014; Huang et al., 2016).

Inhibitory mechanism for improving sparsity and reproducibility of the neural response

For a broad range of acoustic stimuli, we observed a consistently sparser and more reproducible response in the SBC output compared to the ANF input. Can a simple inhibition achieve these changes in signal representation? Glycinergic inhibition has previously been demonstrated to be neither purely subtractive nor purely divisive (Kuenzel et al., 2011; 2015; Nerlich et al., 2014b) and to act with a short delay of ~3 ms on time scales of ~10–15 ms. These properties may be sufficient for increasing sparsity and reproducibility under the assumption that large deviations of firing rate are the consequence of stimulus-elicited, high firing probabilities rather than noise (see Figure 9). The latter would be temporally unrelated to the stimulus, and its transmission would reduce the reproducibility, and probably also the usefulness of the transmitted information for further processing.

In SBCs, this would translate to multiple, closely timed spikes for an individual input or across multiple excitatory inputs. The partially subtractive glycinergic inhibition would be strongly triggered by large instantaneous firing rates, and weaken subsequent inputs which do not occur close to other inputs. On the PSTH level, a subtractive reduction thus almost inevitably increases sparsity (see Figure 9). Reproducibility will also be increased if the average temporal precision of high peaks is greater than that of the bulk of spikes at lower firing rates. Divisive inhibition typically leaves sparsity and reproducibility unchanged (Figure 9), although the influence on sparsity will partially depend on the measure used.

Due to the inhibitory/excitatory co-modulation with level, the enhancement in sparsity and reproducibility can extend over a wide range of levels. These considerations do not rule out additional enhancements of temporal precision via additional excitatory inputs; however, these would be influenced similarly by the inhibition. The considerations above are challenging to study since precise temporal control over multiple inputs would be required.

Generally, if the information in the response is largely maintained, an increase in temporal sparsity can be advantageous for three main reasons. First, the information per spike is increased (Barlow, 1972, 2012; Olshausen and Field, 2004). This is relevant for downstream cells, which can then combine information more efficiently with other inputs to achieve higher precision, e.g. for estimating interaural time delays. Second, since the SBC’s response occurs on a lower plateau of unmodulated firing than the ANF’s, changes in firing rate will lead to larger relative changes in firing rate, which should improve their detection in the target neurons. Avoiding saturating or strongly adapting postsynaptic responses also supports this improvement in change detectability (Mitchison and Durbin, 1989; Graham and Willshaw, 1997). Third, a reduction in the number of spikes reduces the energetic load on the system (Levy and Baxter, 1996; Baddeley et al., 1997; Attwell and Laughlin, 2001; Olshausen and Field, 2004; Graham and Field, 2007). In the present case the failures of transmission may, therefore, more appropriately be contrasted with the spikes selected for transmission.

Inhibition at the endbulb synapse in the context of sound processing

SBCs provide the indispensable temporally precise excitatory inputs to the interaural time difference-based sound localization in the MSO. Both physiological and modeling studies suggest that the neurons in the MSO act as coincidence detectors by primarily relying on the precise spike timing of binaural inputs (Goldberg and Brown, 1969; Yin and Chan, 1990; Franken et al., 2014). While sparsity has predominantly been advocated as an advantageous coding principle due to its reduced energy demand (Levy and Baxter, 1996; Graham and Field, 2007), in the MSO there might be a different justification for sparsity: With the MSO neurons acting as coincidence detectors (Couchman et al., 2010; Plauška et al., 2016), sparse excitatory inputs would lead to an improved signal-to-noise ratio of the correlation, i.e. of its peak in relation to the common floor of correlation. Presently, this is shown by the reproducibility measure across trials, which, when just considering signal processing, can be regarded as the correlation between the activity of independent neurons from both sides (Joris, 2003). Equally important, the effective gain control achieved through the acoustically evoked inhibition will allow the MSO to retain a similar level of binaural sensitivity across a wide range of sound levels and stimuli. Importantly, this sparsening and increase in reproducibility were observed for the wide range of stimuli presently tested.

The observed increase in reproducibility in SBC output compared to the ANF input is consistent with previously reported population data. Studies comparing ANF and AVCN neurons (most likely spherical or globular bushy cells) in the cat reported increased reproducibility in neurons of the AVCN (Louage et al., 2005; van der Heijden et al., 2011). Here, we directly studied the increase across trials within single ANF-SBC junctions and not across cells. We conjecture that reproducibility across cells always increases if both cells increase their reproducibility individually. However, this can only be assessed with the peak of the cross-correlation, if the cells are matched in their tuning, a condition which is typically assumed for bilateral inputs into the MSO. For non-matched cells, reproducibility should increase but would have to be assessed with a different measure, which measures the set of responses to a given stimulus (for example Mutual Information).

Presently, we show that this increase is directly achieved at individual ANF-SBC junctions through the postsynaptic interaction of acoustically evoked excitation and inhibition. Also, during both narrowband and broadband stimulation, the block of inhibition resulted in a considerable decrease in reproducibility, sparsity, and temporal precision compared to the control condition (Figure 8). Hence, our results show that acoustically evoked inhibition plays a major role in shaping the SBC response. Most models of pitch perception are based on or related to an autocorrelation of neuronal activity and rely on temporal precision and reproducibility (Licklider, 1951; Meddis and Hewitt, 1991; Cariani and Delgutte, 1996; Yost, 1996; de Cheveigné, 1998; Denham, 2005; Joris, 2016). The improvement of both characteristics at the ANF-SBC synapse might support the neuronal processing of pitch, but experimental evidence is lacking. Other transmitters such as acetylcholine (Fujino and Oertel, 2001; Goyer et al., 2016) or norepinephrine (Kössl and Vater, 1989; Rothman and Manis, 2003) have modulatory effects on the SBC activity, but their effect on precisely-timed signal processing is not yet fully understood.

The output of SBCs is also relevant for general processing of sounds, i.e. their representation for later analysis, e.g. auditory recognition, a process distinct from sound localization (Clarke et al., 1998; Maeder et al., 2001). The increase in reproducibility might be beneficial in this part of the pathway as well. On the other hand, sparsity may have adverse effects, if the overall level is unavailable or if low level sound information in the ANF response is not represented anymore in the SBC response.

Inhibition under spectrotemporally broad and dynamic stimulation

Natural stimuli tend to depart from the classical laboratory stimuli, in being spectrotemporally broad and diverse. We approximated this condition using the RGS stimulus in combination with distinct STRF estimation of both the pre- and the postsynaptic activity. STRFs were first introduced to study midbrain neurons in the grass frog (Aertsen and Johannesma, 1980; Aertsen et al., 1980, 1981), and have since been an essential tool for auditory neuroscience along many stations of the auditory system, ranging from the auditory nerve (Kim and Young, 1994) and the MNTB (Englitz et al., 2010) to the inferior colliculus (Escabi and Schreiner, 2002), and the auditory cortex (Kowalski et al., 1996; David et al., 2012). Recent developments in STRF estimation (Theunissen et al., 2001) make them a versatile tool to study the combined spectrotemporal stimulus selectivity of neurons for a wide range of acoustic stimuli. The present extension to pre- and postsynaptic activity is unique and provides a direct estimate of the spectrotemporal response modification occurring at a single endbulb synapse. It reveals inhibition to be co-tuned with excitation, but outlasting the latter, which – in total – leads to a slightly improved excitatory response precision. This response profile directly confirms the spectral properties gained from the pure-tone tuning curves and is in agreement with findings from our previous studies on inhibition at the SBC (Keine and Rübsamen, 2015). The pre-post STRF analysis could provide a generally applicable tool for the investigation of a wider range of modulations of signal processing at the giant synapses in both MNTB and AVCN, for example the functional differences in synaptic transmission under the stimulation with natural acoustic stimuli or removal/application of specific neurotransmitters, e.g. GABA (Nerlich et al., 2014b) or neuromodulators, e.g. acetylcholine (Goyer et al., 2016).

The RGS stimulus allows a robust estimation of STRFs while keeping the spectrum sparse (unlike dense stimuli such as the TORC stimulus; Klein et al., 2000). A sparse spectrum is advantageous for the present study of the potentially long-lasting inhibition (Nerlich et al., 2014b) since it prevents a continuous activation of inhibitory inputs causing saturation. It remains to be addressed whether the RGS is a sufficient model for natural stimuli, or whether the natural statistics lead to a more specific activation of the inhibition arriving at the SBCs.

Overall level-dependence of STRF estimation was beyond the scope of this study and not investigated here in more detail. While the RGS stimulus contains different sound levels at different times and frequencies (via the randomized placement and shape of each gamma-tone), the overall, average sound level was kept constant. We predict that SBC STRFs will exhibit a greater robustness to changes in level compared to ANF STRFs, based on the gain-modulating inhibitory input (Figure 3).

Source of inhibition

Several nuclei have been suggested to provide the inhibitory inputs to SBCs. The present data suggest the inhibitory source to feature a broad, symmetrically shaped tuning, consistent with an integration over a wide set of primary tuned inhibitory cells. The integration would have to be weighted by the distance to the postsynaptic CF, in order to achieve the symmetry in inhibitory modulation. In a recent study, Campagnola and Manis (2014) showed directly that bushy cells receive symmetric inhibition from within the CN. Further, the ~1 ms delay of the onset of acoustically evoked inhibition compared to the onset of excitation (Kuenzel et al., 2011; Keine and Rübsamen, 2015) suggests a single additional synaptic relay. Together, both the broadly tuned D-stellate cells in the AVCN and the tuberculoventral cells in the DCN are candidate sources (Wickesberg and Oertel, 1990; Saint Marie et al., 1991; Campagnola and Manis, 2014), as well as cells in the lateral nucleus of the trapezoid body (Smith et al., 1991; Schofield and Cant, 1992). The data of Campagnola and Manis (2014) directly demonstrate the CN as a source of inhibition, but neurons in other areas may contribute in addition. While we considered here the effect of the inhibition on stimulus representation at the SBC, this broad integration provides the opportunity for additional integration in the source areas.

In summary, while acoustically evoked inhibition on SBCs renders the ANF-SBC junction less reliable with respect to signal transmission, it enables sparser and more reproducible SBC sound encoding, which might be of relevance for subsequent localization of sound sources.

Materials and methods

Animals and surgical procedure

All experiments were performed at the Neurobiology Laboratories of the Faculty of Bioscience, Pharmacy and Psychology of the University of Leipzig (Germany), approved by the Saxonian District Government, Leipzig (TVV 06/09), and conducted according to the European Communities Council Directive (86/609/EEC). Animals were housed in the animal facility of the Institute of Biology with a 12 hr light/dark cycle and with access to food and water ad libitum. Data were collected from 42 Mongolian gerbils (Meriones unguiculatus) of either sex aged 4 to 12 weeks (P43 ± 13).

Before the experiment, animals were anesthetized by an intraperitoneal injection of a mixture of ketamine hydrochloride (140 µg/g body weight, Ketamin-Ratiopharm, Ratiopharm, Ulm, Germany) and xylazine hydrochloride (3 µg/g body weight, Rompun, Bayer, Leverkusen, Germany). The surgical procedure was performed as described previously (Keine and Rübsamen, 2015). For multi- and single-unit recordings, the animal was tilted laterally by 12–18°. The recording electrode was lowered vertically by a step motor system into the anteroventral cochlear nucleus (AVCN). Glass micropipettes (GB150F-10 and GB150F-8P, Science Products, Hofheim, Germany) were fabricated with a PC-10 vertical puller (Narishige, Japan) to have impedances of 3–5 MΩ when filled with the pipette solution (in mM) 135 NaCl, 5.4 KCl, 1 MgCl2, 1.8 CaCl2, 5 HEPES, pH adjusted to 7.3 with NaOH. At the beginning of each recording session, multiunit recordings were performed with low-impedance electrodes (1–3 MΩ) to corroborate the stereotaxic coordinates of the rostral, low-frequency pole of the AVCN.

The activity of SBCs was then acquired by loose-patch recordings (Lorteije et al., 2009; Kuenzel et al., 2011). For that, the recording electrode was lowered through the cerebellum aiming at the AVCN at a depth of about 5000 µm. When passing through non-auditory brain regions high positive pressure (200 mbar) was applied to prevent the electrode from clogging, and the electrode was advanced at a speed of 50 µm/s. On entering the target region, indicated by multiunit activity triggered by broadband noise search-stimuli, the pressure was reduced to 30 mbar, and the electrode then advanced in 1 µm-steps. When approaching a neuron, indicated by a gradual increase in series resistance, the pressure was equalized or slightly negative pressure (–5 mbar) applied. To minimize the mechanical stress on the recorded neuron, the seal resistance was kept <40 MΩ (Alcami et al., 2012). Single-units were recorded only when exhibiting a positive signal amplitude of more than 2 mV (dataset: 4.2 ± 1 mV) and a signal-to-noise ratio of at least 40 (mean amplitude of the positive AP peak divided by the standard deviation of the baseline, dataset: 68.2 ± 14.9).

Iontophoretic application

To study the impact of inhibition on signal processing at the ANF-SBC synapse, loose-patch recordings were combined with iontophoretic drug application of the glycine receptor agonist glycine (Sigma-Aldrich, 100 mM, prepared in 0.9% NaCl, pH 6, buffered with 10 mM HEPES) and the glycine receptor antagonist strychnine (strychnine hydrochloride, Sigma-Aldrich, 5 mM, same formula). Three-barreled piggy-back electrodes (Havey and Caspary, 1980) were glued to the recording electrode and had the following steric configuration: tip diameter 4–8 µm, recording electrode protruding 20–40 µm (3GB120F-10, Science Products). The iontophoretic current was applied using an iontophoresis amplifier (EPMS-H-7 equipped with MVCS and MVCC modules, npi electronics) with increasing current steps (+5 to +100 nA). To reduce potential unspecific effects of strychnine, the application current was adjusted for each cell: First, the minimum current necessary to block spontaneous activity with glycine application was determined. Then, the iontophoretic current for strychnine application was set to block the glycine effect. Control experiments were performed by iontophoretic application of the carrier alone (0.9% NaCl, pH 6, buffered with 10 mM HEPES). The holding current for each barrel was set to –20 nA, and a channel filled with 0.9% NaCl was used for automatic capacitance compensation.

Acoustic stimulation

All recordings were performed in a sound-attenuating and electrically isolated chamber (Type 400, Industrial Acoustics, Niederkrüchten, Germany) on a vibration-cushioned table. Acoustic stimuli were generated by custom-written Matlab software (MathWorks, Natick) and digitized at a rate of 97.7 kHz. Signals were presented via a custom-made earphone (DT48, beyerdynamic, Heilbronn, Germany) and delivered through a metal funnel ending just in front of the ear canal. The loudspeakers were calibrated using a condenser microphone (Bruel and Kjaer 4133) and custom-written Matlab software. Total harmonic distortions (ratio of the root-mean-squared (RMS) amplitude of higher harmonic frequencies to the RMS amplitude of the fundamental) were below 0.02% across all frequencies tested (0.1–40 kHz). All acoustic stimuli were corrected for the loudspeaker’s impulse response prior to presentation.

Data acquisition

Frequency response areas (FRA) were obtained by pseudorandom presentation of pure tones (100 ms in duration, 5 ms cos² ramps, 300 ms interstimulus interval) derived from a predefined matrix consisting of 20 different frequencies equally spaced on a log scale and 10 different sound pressure levels (SPL) equally spaced on a linear scale. Each of these 200 frequency/intensity pairs was presented 5–10 times while continuously recording the unit’s discharge activity. The FRAs were used to detail each unit’s CF (the frequency at which the neuron is most sensitive), response thresholds, and – if present – the frequency-intensity domain of an inhibitory sideband.
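
A stimulus matrix of this kind can be sketched as follows. The frequency and level ranges, the function name, and the fixed random seed are hypothetical placeholders chosen for illustration, since the ranges used per unit are not stated here.

```python
import random


def fra_matrix(f_min, f_max, n_freqs=20, spl_min=0.0, spl_max=90.0,
               n_levels=10, n_reps=5, seed=0):
    """Return (freqs, levels, shuffled list of (freq, level) presentations)."""
    # Frequencies equally spaced on a logarithmic axis.
    ratio = (f_max / f_min) ** (1.0 / (n_freqs - 1))
    freqs = [f_min * ratio ** i for i in range(n_freqs)]
    # Levels equally spaced on a linear (dB) axis.
    step = (spl_max - spl_min) / (n_levels - 1)
    levels = [spl_min + step * i for i in range(n_levels)]
    # Every frequency/level pair repeated n_reps times, in pseudorandom order.
    pairs = [(f, l) for f in freqs for l in levels] * n_reps
    random.Random(seed).shuffle(pairs)
    return freqs, levels, pairs
```

For example, `fra_matrix(500.0, 16000.0)` yields 200 distinct frequency/level pairs and 1000 presentations in total.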

Sinusoidal amplitude-modulated (SAM) tones

Pure tones at the units' CF were amplitude-modulated at frequencies 50 Hz, 100 Hz, 200 Hz, and 400 Hz (200 ms duration, 500 ms interstimulus interval, modulation depth: 100%, starting at a phase angle of –90°).
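
A minimal sketch of such a SAM tone is given below. It assumes a sinusoidal modulator whose starting phase of –90° places the envelope at its minimum at stimulus onset, a peak-normalized envelope, and a nominal sampling rate of 97.7 kHz; the function names are hypothetical.

```python
import math


def sam_envelope(t, mod_hz, depth=1.0, start_phase_deg=-90.0):
    """Envelope in [0, 1]; with -90 deg start phase it begins at its minimum."""
    phi = math.radians(start_phase_deg)
    return (1.0 + depth * math.sin(2.0 * math.pi * mod_hz * t + phi)) / (1.0 + depth)


def sam_tone(carrier_hz, mod_hz, duration_s=0.2, fs=97700.0,
             depth=1.0, start_phase_deg=-90.0):
    """Amplitude-modulated sine carrier, returned as a list of samples."""
    n = int(round(duration_s * fs))
    samples = []
    for k in range(n):
        t = k / fs
        env = sam_envelope(t, mod_hz, depth, start_phase_deg)
        samples.append(env * math.sin(2.0 * math.pi * carrier_hz * t))
    return samples
```

With a 100 Hz modulator, the envelope under these assumptions rises from zero at onset to its maximum after half a modulation period (5 ms).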

Sinusoidal frequency-modulated (SFM) tones

Tones were frequency-modulated in the range of one octave below to two octaves above the unit’s CF at modulation frequencies 20 Hz, 50 Hz, 100 Hz, 200 Hz, and 400 Hz (duration: 200 ms, interstimulus interval: 500 ms).

Randomized gamma-tone sequence (RGS)

Spectrotemporal receptive fields, sparsity, and reproducibility were estimated in response to a spectrotemporally broad and varied stimulus, a variant of the dynamic random chord stimuli (DRC) as described in Ahrens et al. (2008). Briefly, the present DRC was generated by a randomized placement of equal-level gamma-tones in the spectrotemporal domain. Frequency locations were drawn independently according to a uniform distribution, relative to the CF of the cell, encompassing two octaves above and one octave below. The temporal separation between two adjacent gamma-tones followed an exponential distribution with a time constant of 5 or 10 ms (see Figure 7A for an example). The bandwidth of the gamma-tones was varied along the frequency axis according to the model of Zhang et al. (2001), consistent with gamma-tone measurements along the gerbil basilar membrane. The RGS was separately computed for each recorded unit and 20–30 identical repetitions of the 30 s long stimulus were presented while simultaneously recording the ANF input and SBC output.
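
The randomized placement of gamma-tones can be sketched as drawing onset times and center frequencies. The sketch assumes that the stated time constant is the mean of the exponential interval distribution and that frequencies are uniform on an octave (logarithmic) axis; the function name and seed are hypothetical, and the gamma-tone waveforms and bandwidth model are not included.

```python
import random


def rgs_events(cf_hz, duration_s=30.0, tau_s=0.005,
               octaves_below=1.0, octaves_above=2.0, seed=0):
    """Return a list of (onset time in s, center frequency in Hz) pairs."""
    rng = random.Random(seed)
    events = []
    # Exponentially distributed separations with mean tau_s.
    t = rng.expovariate(1.0 / tau_s)
    while t < duration_s:
        # Uniform on an octave axis relative to CF (assumption).
        octs = rng.uniform(-octaves_below, octaves_above)
        events.append((t, cf_hz * 2.0 ** octs))
        t += rng.expovariate(1.0 / tau_s)
    return events
```

For a 30 s stimulus and a 5 ms mean separation, this produces on the order of 6000 gamma-tone placements.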

Data analysis

The rostral pole of the AVCN was targeted considering its tonotopic organization described previously (Kopp-Scheinpflug et al., 2002; Dehmel et al., 2010). Spherical bushy cells were recognized by their characteristic complex waveform (Pfeiffer, 1966; Winter and Palmer, 1990; Englitz et al., 2009; Typlt et al., 2010) and their primary-like PSTH pattern (Blackburn and Sachs, 1989).

The neurons’ voltage signals were pre-amplified and impedance-converted (Neuroprobe 1600, A-M Systems, Sequim, USA), noise-eliminated (HumBug, Quest Scientific, North Vancouver, Canada), further amplified (PC1, Tucker-Davis Technologies), and digitized at a sampling rate of 97.7 kHz (24 bit, RP2.1, Tucker-Davis Technologies). Signals were band-pass filtered between 5 Hz and 7.5 kHz using a zero-phase forward and reverse digital IIR filter and stored for offline analysis using custom-written Matlab software.

Extracellularly recorded voltage signals of SBCs are typically composed of two (PP-EPSP) or three components (PP-EPSP-AP, Figure 1) reflecting the respective discharge of the presynaptic endbulb of Held (PP, prepotential), the postsynaptic EPSP, and the postsynaptically triggered AP (Englitz et al., 2009; Typlt et al., 2010). Signals were detected using a slope threshold for the rising flank of the EPSP. In this report, EPSPs that failed to trigger a postsynaptic AP were termed EPSPfail while EPSPs that successfully triggered an AP were termed EPSPsucc. The separation between EPSPsucc and EPSPfail was based on the maximum falling slope following the detection time point. The APs following EPSPsucc exhibited a considerably faster-falling flank than EPSPfail enabling a clear separation of both signal types (Figure 1). The detection thresholds were kept fixed for each recorded unit, but varied between units to account for different signal-to-noise ratios. For comparison of timing between EPSPsucc and EPSPfail, all events were time-stamped on their respective maximum EPSP slope. Previous studies showed that the maximum slope of the EPSP is a reliable measure of EPSP strength (Kuenzel et al., 2011). To determine the threshold EPSP, the maximum EPSP slopes of EPSPsucc and EPSPfail were binned (bin size = 0.5 V/s), and the fraction of EPSPsucc calculated for each bin. Then, a Boltzmann function of the form $\phi(x) = 1/(1 + e^{(d - x)/a})$ was fitted to the data with each bin weighted relative to the number of events in that bin. The symmetric inflection point $d$ indicates the threshold EPSP, i.e. the EPSP slope necessary to yield a > 50% probability of triggering a postsynaptic AP (see also supplementary Matlab code). Earlier studies showed the influence of preceding neural activity on EPSC/EPSP and AP amplitude (Englitz et al., 2009; Lorteije et al., 2009; Yang and Xu-Friedman, 2015).
While usually the preceding inter-event-interval (IEI) is used as a measure of previous activity, recent studies showed that short-term plasticity can extend well beyond the last IEI in vitro (Yang and Xu-Friedman, 2015). Therefore, a weighted average of all preceding events was used, with the impact dependent on the distance and EPSP slope of the respective events. The weighting was implemented as a single-exponentially decaying kernel, emphasizing temporally close events over more distant ones (Sonntag et al., 2011). For an EPSP slope at $t_0$ the preceding activity was computed as $\mathrm{PreAct}(t_0) = \frac{1}{\mathrm{median}(S_i)} \sum_{i \geq 1} S_i \, e^{(t_i - t_0)/\tau}$, with $S_i$ indicating the EPSP slopes and $t_i$ the time of the $i$-th preceding event ($t_i < t_0$). The time constant $\tau$ was set to 60 ms, as this was shown to be the time window of influence in slice studies (Yang and Xu-Friedman, 2015). In addition, the calculation was also performed with time constants of 10 ms and 100 ms, yielding qualitatively the same results. The influence of preceding spiking activity on AP amplitude was analyzed in a similar manner, but limiting the preceding events to successful APs.
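The preceding-activity measure can be illustrated with a minimal Python sketch. This is not the authors' supplementary Matlab code; the function name `pre_act`, the argument names, and the choice to take the median over all slopes passed in are assumptions made for illustration.

```python
import math
from statistics import median


def pre_act(event_times_ms, epsp_slopes, t0_ms, tau_ms=60.0):
    """Exponentially weighted preceding activity for an event at t0_ms.

    Every earlier event i contributes S_i * exp((t_i - t0) / tau), so
    recent and strong EPSPs weigh more than distant or weak ones.
    """
    preceding = [(t, s) for t, s in zip(event_times_ms, epsp_slopes) if t < t0_ms]
    if not preceding:
        return 0.0
    # Assumption: the normalizing median is taken over all supplied slopes.
    norm = median(epsp_slopes)
    return sum(s * math.exp((t - t0_ms) / tau_ms) for t, s in preceding) / norm


# Three events of equal slope, 60 ms apart; evaluate at the last one.
value = pre_act([0.0, 60.0, 120.0], [2.0, 2.0, 2.0], t0_ms=120.0)
```

With equal slopes the normalization cancels, so the value reduces to the sum of the two kernel weights, exp(-1) + exp(-2).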

To evaluate the shapes of the inhibitory and the excitatory FRAs, asymmetry indices (AI) were calculated for both. AI was defined as $\mathrm{AI} = \ln \frac{\log_2(F_U/CF)}{\log_2(CF/F_L)}$, where $CF$ indicates the neuron’s characteristic frequency and $F_L$ and $F_U$ the respective low and high border-frequency of the FRA 40 dB above threshold (see also supplementary Matlab code). An AI of 0 indicates symmetric tuning curves, whereas negative and positive values describe asymmetric FRAs extending to lower or higher frequencies, respectively.
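As a worked illustration of the asymmetry index, the following Python sketch evaluates the definition for a symmetric and an asymmetric tuning curve. The function name and the example frequencies are hypothetical and not taken from the recorded data.

```python
import math


def asymmetry_index(f_low_hz, cf_hz, f_high_hz):
    """AI = ln( log2(F_U / CF) / log2(CF / F_L) )."""
    upper_octaves = math.log2(f_high_hz / cf_hz)  # FRA extent above CF
    lower_octaves = math.log2(cf_hz / f_low_hz)   # FRA extent below CF
    return math.log(upper_octaves / lower_octaves)


# One octave on either side of CF: symmetric tuning, AI = 0.
ai_symmetric = asymmetry_index(1000.0, 2000.0, 4000.0)
# Two octaves below and one above CF: FRA extends to lower frequencies, AI < 0.
ai_low_skewed = asymmetry_index(500.0, 2000.0, 4000.0)
```

Because the border frequencies enter as octave distances from CF, the index depends only on the shape of the FRA and not on the absolute frequency range.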

For sinusoidally modulated tones (SAM and SFM) the first 20 ms of every repetition were discarded from analysis to reduce the influence of onset effects, and analysis was constrained to complete periods to avoid unequal sampling. The temporal precision of spikes throughout the stimulus period was assessed by calculating the vector strength (Goldberg and Brown, 1969). The significance of phase-locking was tested based on the Rayleigh approximation (Mardia, 1972). Vector strength is an inadequate measure if the unit's firing rate reproduces the stimulus modulation. Therefore, we calculated the stimulus reproduction (CorrNorm) as the normalized cross-correlation between the stimulus modulation and the respective response PSTH, adjusted for the latency of the neural response (see Tolnai et al., 2008 for a detailed explanation). Note that CorrNorm is constrained to the positive range [0, 1] and an unmodulated response would yield a value of 0.82. High CorrNorm values (>0.9) are only obtained when the response shape follows the stimulus envelope. The reproducibility of the neuronal responses was estimated by measuring the central peak of the shuffled autocorrelation across identical stimulus presentations (Joris et al., 2006). The modulation depth of the neural response to the 100% amplitude modulation was estimated by calculating the standard deviation of the first cycle of the normalized cross-correlation function.
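The vector strength computation can be sketched in Python as follows. The function name is hypothetical, and the p-value uses the common first-order Rayleigh approximation p ≈ exp(-n·VS²); the text above does not state which form of the approximation was used, so this is an assumption.

```python
import math


def vector_strength(spike_times_s, mod_freq_hz):
    """Return (VS, approximate Rayleigh p-value) for one modulation frequency."""
    n = len(spike_times_s)
    if n == 0:
        raise ValueError("at least one spike is required")
    # Map every spike time onto a phase of the modulation cycle.
    phases = [2.0 * math.pi * mod_freq_hz * t for t in spike_times_s]
    cos_sum = sum(math.cos(p) for p in phases)
    sin_sum = sum(math.sin(p) for p in phases)
    vs = math.hypot(cos_sum, sin_sum) / n  # length of the mean resultant vector
    # Assumption: first-order Rayleigh approximation of the p-value.
    p_value = math.exp(-n * vs ** 2)
    return vs, p_value


# Spikes locked to one phase of a 100 Hz modulation give VS close to 1.
vs_locked, p_locked = vector_strength([0.00, 0.01, 0.02, 0.03], 100.0)
# Spikes at opposite phases cancel and give VS close to 0.
vs_opposed, _ = vector_strength([0.000, 0.005], 100.0)
```

The second example also shows why the measure can mislead for multi-peaked period histograms: two tight but opposite response peaks cancel in the vector sum.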

Estimation of spectrotemporal receptive fields

Spectrotemporal receptive fields (STRFs) represent the neural tuning in the dimensions of the spectrogram (time and frequency) and help to identify stimulus properties that control spiking at high temporal resolution. Specifically, we used STRFs to study the time course of inhibition during ongoing, spectrally dispersed stimulation. Estimation of STRFs was performed using generalized reverse correlation, as described elsewhere (Theunissen et al., 2000; Englitz et al., 2010). STRFs were estimated for both the input to the SBCs (EPSPfail + EPSPsucc) and the SBC output (EPSPsucc). Both input and output were aligned to their maximum EPSP slope, to enable a subtraction of the two STRFs using congruent reference points. STRFs were individually normalized to their standard deviation (normalization to the positive peak led to very similar results), in order to allow an evaluation of tuning shape, removing the gain from overall firing rate. Without normalization, the result remains qualitatively the same. However, the inhibitory effect is then dominated by the difference at the STRF peak, which stems from the overall firing rate difference between ANF and SBC. The resulting difference-STRF indicates the translation in spectrotemporal sensitivity from ANF input to the SBC output (see Figure 7). We interpret this difference to be a consequence of three factors: (i) local inhibition, together with (ii) postsynaptic processes of spike-frequency adaptation and (iii) Na-channel inactivation.

Temporal sparsity, temporal precision and reproducibility of the neural response

We evaluated multiple measures of synaptic responsiveness to quantify the input-to-output signal processing at these second order neurons of the ascending central auditory pathway. All measures were computed separately for the ANF input and SBC output and then compared.

Sparsity

The temporal sparsity of the neural response was calculated with three different methods, two classical and a third simple and intuitive one. First, we calculated the variance-based method introduced by Rolls and Tovee (1995) and Willmore and Tolhurst (2001), with sparsity defined as

$S = 1 - \langle r(t) \rangle_t^2 \, / \, \langle r(t)^2 \rangle_t$

where $\langle \cdot \rangle_t$ indicates an average over time. S is an index ranging between 0 and 1. Second, we calculated the kurtosis of the firing rate distribution introduced by Field (1994), which quantifies the peakedness of the firing rate distribution. Note that this classical definition may lead to counterintuitive interpretations, e.g. if a neuron has dominantly high firing rates and is only rarely silent (also leading to high kurtosis). This measure is hence provided for historical reference.

Third, we calculated a simple and intuitive measure, which we term ‘Close-to-Silence-Index’ (CSI). The CSI is defined as the fraction of PSTH bins with a firing rate below a threshold F, i.e. $S = |\{ r(t) < F \}_{t<T}| \, / \, T$, where F is a firing rate threshold, chosen close to 0, and T the total time of the PSTH. Different thresholds lead to different CSI values, but allow one to define what a ‘non-response’ or ‘small-rate’ is. We here chose F = 15 Hz, although other sensible values (5–20 Hz) gave qualitatively similar results. The CSI is a useful estimator if the firing rate distribution is monomodal. Sparsity analysis was performed on PSTHs sampled at 100 Hz (see also supplementary Matlab code).
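The variance-based sparsity index and the CSI can be illustrated with a short Python sketch operating on a binned PSTH. This is a sketch of the two definitions given above, not the supplementary Matlab code; the function names and the toy PSTHs are hypothetical.

```python
def sparsity_index(rates_hz):
    """S = 1 - <r>^2 / <r^2>, with averages taken over PSTH bins."""
    n = len(rates_hz)
    mean_rate = sum(rates_hz) / n
    mean_square = sum(r * r for r in rates_hz) / n
    if mean_square == 0.0:
        raise ValueError("sparsity is undefined for an all-zero PSTH")
    return 1.0 - mean_rate ** 2 / mean_square


def close_to_silence_index(rates_hz, threshold_hz=15.0):
    """Fraction of PSTH bins with a firing rate below threshold_hz."""
    return sum(1 for r in rates_hz if r < threshold_hz) / len(rates_hz)


# A response confined to one of four bins is sparse by both measures.
sparse_psth = [0.0, 0.0, 0.0, 100.0]
s_sparse = sparsity_index(sparse_psth)            # 0.75
csi_sparse = close_to_silence_index(sparse_psth)  # 0.75

# A constant firing rate is not sparse at all.
flat_psth = [50.0, 50.0, 50.0, 50.0]
s_flat = sparsity_index(flat_psth)                # 0.0
csi_flat = close_to_silence_index(flat_psth)      # 0.0
```

Scaling all rates by a common factor leaves the variance-based index unchanged, whereas the CSI depends on the absolute threshold F.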

Reproducibility

Across multiple repetitions of a stimulus, the neural response can repeat reliably or exhibit a high trial-to-trial variability. In the present experiment, variability on the stimulation side is marginal. Thus, the observed variability is solely due to the neural processing. We calculated a measure of reliability by computing the cross-correlation across different trials (same trials always excluded) in response to the same stimulus, similar to the correlation index (Joris et al., 2006). The height of the central peak of correlation was termed reproducibility (see Figure 8C1 for an illustration and supplementary Matlab code).
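A minimal Python sketch of such a cross-trial measure is given below. It averages the zero-lag correlation over all pairs of different trials. The use of the Pearson coefficient as normalization and the function names are assumptions for illustration; the exact normalization is not specified here.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long binned responses."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat trial carries no temporal structure
    return cov / (var_x * var_y) ** 0.5


def reproducibility(trials):
    """Mean zero-lag correlation across all pairs of different trials."""
    if len(trials) < 2:
        raise ValueError("at least two trials are required")
    # combinations() never pairs a trial with itself.
    return mean([pearson(a, b) for a, b in combinations(trials, 2)])


identical = reproducibility([[0, 1, 0, 2], [0, 1, 0, 2], [0, 1, 0, 2]])
alternating = reproducibility([[1, 0], [0, 1]])
```

Identical trials give a value of 1, whereas trials with complementary spike patterns give a negative value.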

Temporal precision

Lastly, the temporal precision of SBC AP generation was quantified by the half-maximum width of the central peak of the cross-correlation function across identical stimulus presentations. The width is termed dispersion, measured in milliseconds (see Figure 8B1 for an illustration). The slimmer the central peak, the higher the temporal precision of neural activity in representing complex acoustic stimuli.
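The half-maximum width of a sampled correlation function can be estimated as in the following Python sketch. It assumes that the central peak is the global maximum, that the baseline is zero, and that the crossing points are found by linear interpolation; these choices and the function names are illustrative assumptions.

```python
def _crossing(x0, y0, x1, y1, level):
    """Linearly interpolate the x position at which the segment reaches level."""
    return x0 + (level - y0) * (x1 - x0) / (y1 - y0)


def half_max_width(lags_ms, corr):
    """Full width at half maximum of the highest peak in corr."""
    peak = max(range(len(corr)), key=lambda i: corr[i])
    half = corr[peak] / 2.0  # assumption: baseline of the correlogram is zero

    i = peak  # walk left until the next sample falls below half maximum
    while i > 0 and corr[i - 1] >= half:
        i -= 1
    j = peak  # walk right in the same way
    while j < len(corr) - 1 and corr[j + 1] >= half:
        j += 1
    if i == 0 or j == len(corr) - 1:
        raise ValueError("peak does not fall below half maximum within the lags")

    left = _crossing(lags_ms[i - 1], corr[i - 1], lags_ms[i], corr[i], half)
    right = _crossing(lags_ms[j], corr[j], lags_ms[j + 1], corr[j + 1], half)
    return right - left


# A triangular peak with a 4 ms base has a half-maximum width of 2 ms.
dispersion_ms = half_max_width([-2.0, -1.0, 0.0, 1.0, 2.0],
                               [0.0, 0.5, 1.0, 0.5, 0.0])
```

A narrower central peak yields a smaller dispersion value, corresponding to higher temporal precision.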

Statistics

Data sets were tested for Gaussianity and equality of variance using the Shapiro-Wilk test (Shapiro and Wilk, 1965) and Levene’s test (Levene, 1960), respectively. Comparison between two independent groups was performed using Student's two-tailed t-test or Wilcoxon rank sum test as appropriate. Aggregated data are reported as mean ± standard deviation or median [first quartile, third quartile], respectively. Within-subject comparisons were performed by paired t-test, Wilcoxon signed rank test, or by multi-factorial repeated-measures (RM) ANOVA after testing for sphericity using the Mauchly test (Mauchly, 1940). If the assumption of sphericity was violated, Greenhouse-Geisser correction was applied. Bonferroni correction was applied to all multiple comparisons (Bonferroni, 1936). Correlation between quantities was assessed by Spearman’s rank correlation (Spearman, 1904) to cover linear and nonlinear relationships. For interpretation of all results, a p-value less than 0.05 was deemed significant. Significance thresholds are abbreviated in figure panels as asterisks, with *, **, ***, corresponding to p<0.05, p<0.01, p<0.001, respectively. The effect size was calculated using the MES toolbox in Matlab (Hentschke and Stüttgen, 2011) and reported as eta-squared (η²) for RM ANOVA and Cohen’s U1 for two-sample comparisons. No statistical methods were used to pre-determine sample sizes.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
    Sensory Communication
    1. HB Barlow
    (2012)
    216–234, Possible principles underlying the transformations of sensory messages, Sensory Communication, The MIT Press, 10.7551/mitpress/9780262518420.003.0013.
  10. 10
  11. 11
  12. 12
    Controlling the false discovery rate: a practical and powerful approach to multiple testing
    1. Y Benjamini
    2. Y Hochberg
    (1995)
    Journal of the Royal Statistical Society Series B 57:289–300.
  13. 13
    Classification of unit types in the anteroventral cochlear nucleus: PST histograms and regularity analysis
    1. CC Blackburn
    2. MB Sachs
    (1989)
    Journal of Neurophysiology 62:1303–1329.
  14. 14
    Teoria statistica delle classi e calcolo delle probabilità
    1. CE Bonferroni
    (1936)
    Pubbl Del R Ist Super Di Sci Econ E Commer Di Firenze 8:3–62.
  15. 15
  16. 16
    Electrical responses of neural units in the anteroventral cochlear nucleus of the cat
    1. TR Bourk
    (1976)
    Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology.
  17. 17
  18. 18
  19. 19
    Reversible inactivation of the dorsal nucleus of the lateral lemniscus reveals its role in the processing of multiple sound sources in the inferior colliculus of bats
    1. RM Burger
    2. GD Pollak
    (2001)
    Journal of Neuroscience 21:4830–4843.
  20. 20
  21. 21
  22. 22
    Neural correlates of the pitch of complex tones. I. Pitch and pitch salience
    1. PA Cariani
    2. B Delgutte
    (1996)
    Journal of Neurophysiology 76:1698–1716.
  23. 23
    Inhibitory inputs modulate discharge rate within frequency receptive fields of anteroventral cochlear nucleus neurons
    1. DM Caspary
    2. PM Backoff
    3. PG Finlayson
    4. PS Palombi
    (1994)
    Journal of Neurophysiology 72:2124–2133.
  24. 24
  25. 25
  26. 26
  27. 27
    Cancellation model of pitch perception
    1. A de Cheveigné
    (1998)
    The Journal of the Acoustical Society of America 103:1261–1271.
    https://doi.org/10.1121/1.423232
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
    Nonlinear spectrotemporal sound analysis by neurons in the auditory midbrain
    1. MA Escabi
    2. CE Schreiner
    (2002)
    Journal of Neuroscience 22:4114–4131.
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
    Cholinergic modulation of stellate cells in the mammalian ventral cochlear nucleus
    1. K Fujino
    2. D Oertel
    (2001)
    Journal of Neuroscience 21:7372–7383.
  41. 41
  42. 42
  43. 43
    Response of binaural neurons of dog superior olivary complex to dichotic tonal stimuli: some physiological mechanisms of sound localization
    1. JM Goldberg
    2. PB Brown
    (1969)
    Journal of Neurophysiology 32:613–636.
  44. 44
    Slow Cholinergic Modulation of Spike Probability in Ultra-Fast Time-Coding Sensory Neurons
    1. D Goyer
    2. S Kurth
    3. C Gillet
    4. C Keine
    5. R Rübsamen
    6. T Kuenzel
    (2016)
    eNeuro, 3, 10.1523/ENEURO.0186-16.2016, 27699207.
  45. 45
  46. 46
    Evolution of Nervous Systems
    1. DJ Graham
    2. DJ Field
    (2007)
    181–187, Sparse coding in the neocortex, Evolution of Nervous Systems, Elsevier, 10.1016/b0-12-370878-8/00064-1.
  47. 47
    Bilateral inhibition by glycinergic afferents in the medial superior olive
    1. B Grothe
    2. DH Sanes
    (1993)
    Journal of Neurophysiology 69:1192–1196.
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
    Enhancement of neural synchronization in the anteroventral cochlear nucleus. I. Responses to tones at the characteristic frequency
    1. PX Joris
    2. LH Carney
    3. PH Smith
    4. TC Yin
    (1994a)
    Journal of Neurophysiology 71:1022–1036.
  55. 55
  56. 56
    Enhancement of neural synchronization in the anteroventral cochlear nucleus. II. Responses in the tuning curve tail
    1. PX Joris
    2. PH Smith
    3. TC Yin
    4. LH Carney
    (1994b)
    Journal of Neurophysiology 71:1037–1051.
  57. 57
    Interaural time sensitivity dominated by cochlea-induced envelope patterns
    1. PX Joris
    (2003)
    Journal of Neuroscience 23:6345–6350.
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
    Interaction of excitation and inhibition in anteroventral cochlear nucleus neurons that receive large endbulb synaptic endings
    1. C Kopp-Scheinpflug
    2. S Dehmel
    3. GJ Dörrscheidt
    4. R Rübsamen
    (2002)
    Journal of Neuroscience 22:11004–11018.
  66. 66
    Analysis of dynamic spectra in ferret primary auditory cortex. I. Characteristics of single-unit responses to moving ripple spectra
    1. N Kowalski
    2. DA Depireux
    3. SA Shamma
    (1996)
    Journal of Neurophysiology 76:3503–3523.
  67. 67
  68. 68
  69. 69
    Noradrenaline enhances temporal auditory contrast and neuronal timing precision in the cochlear nucleus of the mustached bat
    1. M Kössl
    2. M Vater
    (1989)
    Journal of Neuroscience 9:4169–4178.
  70. 70
    Contributions to Probability and Statistics
    1. H Levene
    (1960)
    278–292, Robust tests for equality of variances, Contributions to Probability and Statistics, Stanford University Press.
  71. 71
  72. 72
    A Duplex Theory of Pitch Perception
    1. JCR Licklider
    (1951)
    The Journal of the Acoustical Society of America 23:147.
    https://doi.org/10.1121/1.1917296
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
  79. 79
  80. 80
  81. 81
    Statistics of Directional Data
    1. KV Mardia
    (1972)
    London: Academic Press.
  82. 82
  83. 83
  84. 84
  85. 85
  86. 86
  87. 87
  88. 88
  89. 89
  90. 90
  91. 91
  92. 92
  93. 93
  94. 94
  95. 95
  96. 96
  97. 97
  98. 98
  99. 99
    Physiological correlates of comodulation masking release in the mammalian ventral cochlear nucleus
    1. D Pressnitzer
    2. R Meddis
    3. R Delahaye
    4. IM Winter
    (2001)
    Journal of Neuroscience 21:6377–6386.
  100. 100
  101. 101
    Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex
    1. ET Rolls
    2. MJ Tovee
    (1995)
    Journal of Neurophysiology 73:713–726.
  102. 102
  103. 103
  104. 104
    Two-tone inhibition in auditory-nerve fibers
    1. MB Sachs
    2. NY Kiang
    (1968)
    The Journal of the Acoustical Society of America 43:1120–1128.
    https://doi.org/10.1121/1.1910947
  105. 105
  106. 106
  107. 107
  108. 108
  109. 109
  110. 110
    Neuronal arithmetic
    1. RA Silver
    (2010)
    Nature Reviews. Neuroscience 11:474–489.
    https://doi.org/10.1038/nrn2864
  111. 111
  112. 112
  113. 113
  114. 114
  115. 115
  116. 116
  117. 117
  118. 118
    Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds
    1. FE Theunissen
    2. K Sen
    3. AJ Doupe
    (2000)
    Journal of Neuroscience 20:2315–2331.
  119. 119
  120. 120
  121. 121
  122. 122
  123. 123
  124. 124
  125. 125
  126. 126
  127. 127
    Delayed, frequency-specific inhibition in the cochlear nuclei of mice: a mechanism for monaural echo suppression
    1. RE Wickesberg
    2. D Oertel
    (1990)
    Journal of Neuroscience 10:1762–1768.
  128. 128
  129. 129
  130. 130
  131. 131
  132. 132
    Inhibitory circuitry in the ventral cochlear nucleus is probably mediated by glycine
    1. SH Wu
    2. D Oertel
    (1986)
    Journal of Neuroscience 6:2691–2706.
  133. 133
  134. 134
  135. 135
  136. 136
  137. 137
    The roles of GABAergic and glycinergic inhibition on binaural processing in the dorsal nucleus of the lateral lemniscus of the mustache bat
    1. L Yang
    2. GD Pollak
    (1994)
    Journal of Neurophysiology 71:1999–2013.
  138. 138
  139. 139
    Interaural time sensitivity in medial superior olive of cat
    1. TC Yin
    2. JC Chan
    (1990)
    Journal of Neurophysiology 64:465–488.
  140. 140
    Pitch strength of iterated rippled noise
    1. WA Yost
    (1996)
    The Journal of the Acoustical Society of America 100:3329.
    https://doi.org/10.1121/1.416973
  141. 141

Decision letter

  1. Ian Winter
    Reviewing Editor; University of Cambridge, United Kingdom

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Inhibition in the auditory brainstem enhances signal representation and regulates gain in complex acoustic environments" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors and the evaluation has been overseen by Andrew King as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

All reviewers agreed that this paper was an interesting study that enhanced our understanding of the transformation of information between the auditory nerve and the cochlear nucleus. The data were generally well analysed (but see below for suggestions for improvement) and no further experiments were deemed necessary. While some of the results in the early part of the paper have been reported by others (including their own group), the latter part of the paper, looking at the coding of more natural stimuli i.e. SAM, SFM and RGS, contains new and interesting data.

There are quite a few comments, but the reviewers feel these can all be satisfactorily addressed by the authors. In particular, the reviewers would appreciate a clearer explanation of the STRF stimulus and its analysis e.g. concepts such as sparsity and reproduction need to be made more accessible. In addition, the authors should address the issues of data analysis and presentation outlined in the comments below.

Essential comments

1) It is important to set out more clearly what this study is addressing. To say simply that "the functional role [of inhibition] is not entirely understood" (Introduction section) is inadequate. In the Abstract and Introduction you suggest that one possibility is to compensate for stimulus level. The passage at the beginning of subsection “Inhibition at the endbulb in the context of sound localization” seems important to this argument, but it is difficult to understand how sparsity enhances coincidence detection in MSO. If two inputs simply fire less, then there are fewer opportunities for coincidence, so it seems coincidence detection could get worse if the spikes are deleted at random. The authors posit that reproducibility also occurs across cells, not just within. Do the present data support that? How does level tolerance arise out of this?

2) The concepts of sparsity and reproduction need to be more clearly explained. This is especially true in subsection “Glycinergic inhibition renders SBC responses sparser, more reliable and temporally more precise”, but even in the Abstract, these terms should be clearly defined.

a) Why is kurtosis a measure of sparsity? Kurtosis seems to capture both tails of a distribution, but the term "sparsity" implies decreased firing generally, which would include a shift in the mean firing rate, or a skew towards longer intervals. Kurtosis, as the 4th moment, seems to be chosen to avoid both those features. Why is that? It should also be clearly explained why sparsity might be important. The argument from saving energy is not very compelling, if there is not also a decrease in firing rate.

b) The term reproduction is not clear. Its definition is hidden in the Methods and while the values are compared in Figure 5, an example is not shown until Figure 8. "Reproduction" has a connotation of being able to recreate something (e.g. using a response to reproduce the stimulus). However, the quantification seems to be using cross-correlation of responses. Perhaps "reproducibility" would be a better term for this measure, though it is important to consider that just because a response is reproducible does not mean it is correct or efficient or useful.

3) Do all primary-like units show failures? Did you just select cells that showed failures? Please comment on cell type. How is it determined that these are spherical bushy cells? Are recordings targeted towards cells with particularly large synaptic inputs? How many synaptic inputs does each cell receive? Are the electrical signals assignable to activity of one input? Does the analysis of Figure 2 assume these are single inputs? If they are multiple synaptic inputs, how is this approach justified?

4) Spike depression. Lorteije et al., 2009 suggested that "spike depression" (presumably the refractory period) contributed to EPSP failures at the calyx of Held. The refractory period would likely appear as an increase in threshold during periods of high activity, similar to Figure 2A right, and a decrease in spike amplitude with high activity, similar to Figure 2C right. This possibility should at least be acknowledged, and the strychnine results interpreted with this in mind as well.

5) Second paragraph subsection “Synaptic depression alone fails to fully account for increased failure rates”: Mathematically, what you have done here is correct, but it will likely be confusing to many readers. It is unclear why an "error function" (Boltzmann?) is used to fit this data, as the points clearly cluster into two clouds, and the "threshold" point is not near any of the data points. Why were the data not binned along the ordinate of rising slope (in 0.5 or 1 V/s bins), and the probability of success or failure computed in each bin, based on the total number of succ+fail. This would give the type of representation that would then make sense in terms of fitting to a Boltzmann to obtain a "threshold". Maybe do this as an inset?

6) The summary plots can be very difficult to read. Each plot shows median, quartiles and outliers, as well as averages. This gets very busy, and winds up obscuring or minimizing the important differences. For example, in Figure 3Ciii the averages are squashed down in the lower part of the panel, and most of the panel is taken up with outliers. It would be better to maximize the plot area to make the data as visible as possible. Averages and standard errors are sufficient when testing with a t-test, as are box-and-whisker plots when a non-parametric test. This applies to summary plots throughout, with Figures 5D & 6D particularly hard to read.

7) Many plots compare paired values, and report the original values (+/- SE), but not their differences. For example, 3Ci reports average CFs of ANF and SBC. The average values of these are not very meaningful. What is important is the difference between CFs for input vs. output, which is not reported. This applies in many other places: in Figure 3, differences in threshold, Q, asymmetry; in Figure 4 right panels, effects of glycine, strychnine, stimulation; in Figure 5D and 6D, differences between input and output; Figure 7C-F, differences between input and output, Figure 8iii, iv, differences between input and output, effects of strychnine.

8) ANOVAs in Figure 5, 6. It is very difficult to understand the ANOVA carried out here. In Figure 5D, left, the ANOVA results suggest there is "a systematic variation with modulation frequencies". This is not visible in the plot, and furthermore, the nature of the "systematic variation" is obscured by an ANOVA. Did it go up or down? In the FM analysis in Figure 6, the extreme frequencies show little effect, but 100 Hz modulation shows a strong effect. Why is this behavior so non-linear? Did the results match some sort of expected outcome?

9) Flatness of the RLF (Figure 3D). Using standard deviation to quantify flatness of the RLF is confusing. A standard deviation is a comparison to a mean, and the mean here is not meaningful because it depends on which intensities were sampled. My sense is that the standard deviation would depend on whether the RLF is sampled symmetrically about a cell's dynamic range (the SD would be maximal), or asymmetrically (the SD would be lower). Please justify this measure more carefully. A more intuitive measure of flatness may be the difference in firing rate between maximum and minimum (normalized appropriately).

10) Effects of inhibition. The analysis of the effects of inhibition is very thorough, and beautifully shown, but does not adequately acknowledge similar results from earlier work. This is particularly noticeable in the final paragraph of subsection “Acoustically evoked inhibition elevates threshold for AP generation”, which does not acknowledge that Kopp-Scheinpflug, 2002 and Kuenzel, 2011 also showed similar tuning of inhibition. Figures 5–7 investigate different types of stimuli, but it would be helpful to compare the findings against others' results using tuning curves. The period histograms in Figure 5C and 6C seem to show that the output is simply shifted lower compared to input. Is such a simple overall shift sufficient to explain the changes in vector strength, modulation depth, reproduction, and correlation? Or, are the temporal characteristics of inhibition also important?

11) FM data period histograms. The FM stimuli appear to yield 2 or 3 peaks in the representative example of Figure 6. Multi-peaked firing can undermine the utility of the standard VS calculation. That is, each peak may be very tight and repeatable, but by having more than one, the VS decreases.

12) STRF analysis: This analysis is very nice, but some clarification would be helpful.

a) Why is the inhibitory STRF calculated by subtracting the ANF and SBC plots? Please explain how and why firing rates are normalized by the standard deviation. Is it normalized by subtraction or division? What if this normalization is not done? Please add a scale bar to these plots. Can this analysis be done instead by using only the EPSP(fail) for the kernel estimation? How would this approach differ from the subtraction approach?

b) The size of the excitatory region is described as sharper in the SBC compared to ANF. This is not obvious in the example. Can the spectral width be indicated on the plot in Figure 7B somehow? Is there a similar effect in the tuning curves in Figure 3? It is not clear how sound level influences the STRF. Would that affect this width measurement?

13) Figure 8: This is a very important figure, because it addresses how inhibition influences the information carried by SBCs compared to ANFs. It would be very helpful if the authors could take a stab at explaining how inhibition may enhance sparsity, reproduction, and temporal dispersion. In the case of sparsity, it seems like it cannot be simply a change in the mean firing rate, because kurtosis should not include that (unless there are weird effects of rectification at firing rate of 0).

https://doi.org/10.7554/eLife.19295.028

Author response

[…] Essential comments

1) It is important to set out more clearly what this study is addressing. To say simply that "the functional role [of inhibition] is not entirely understood" (Introduction section) is inadequate. In the Abstract and Introduction you suggest that one possibility is to compensate for stimulus level. The passage at the beginning of subsection “Inhibition at the endbulb in the context of sound localization” seems important to this argument, but it is difficult to understand how sparsity enhances coincidence detection in MSO. If two inputs simply fire less, then there are fewer opportunities for coincidence, so it seems coincidence detection could get worse if the spikes are deleted at random. The authors posit that reproducibility also occurs across cells, not just within. Do the present data support that? How does level tolerance arise out of this?

We acknowledge that the aims and insights of the study could have been more clearly described early on. The review process has motivated us to investigate the mechanisms more deeply than before (see below), which leads to a clearer and more insightful bottom line:

" These improvements are a consequence of the combined subtractive/divisive action of glycine (Kuenzel et al., 2011, 2015): The subtractive component enhances the temporal sparsity by raising the threshold for spiking. Therefore, mostly high instantaneous firing-rate events pass the junction, which often correspond to salient events in the cell's frequency range. The divisive component acts primarily as a gain control, which – in conjunction with the co-tuning – maintains the SBC output rate in a smaller range across different stimulus levels. Together these two effects focus the SBC output onto well-timed stimulus events across a wider range of stimulus levels. Thus, inhibition improves the basis for the high-fidelity signal processing in downstream nuclei crucial for sound localization irrespective of the prevailing stimulus levels."

We have added this passage in the Introduction section, in addition to a number of other clarifications.

Regarding the other questions raised in this point:

The reviewers correctly point out that the step from improved sparsity and reproducibility to improved coincidence detection is not a trivial one, and should be addressed more directly. As the reviewers indicate, a random deletion of spikes would not lead to improved coding. Since a random deletion would correspond to a divisive scaling of the PSTH, it would also not lead to an increase in sparsity (although this would depend to some degree on the measure, see the next point). The effect of inhibition in this case is at least partially subtractive (see also #10 and #13 of the essential comments below for more details), which means that the spikes/parts of the PSTH that remain are those occurring during phases of high firing rate or high firing probabilities. These are the phases when the stimulus drives a lot of spikes in the current frequency band/neural tuning (directly shown e.g. by the STRF tuning). For a given excitatory frequency these phases will correspond to high stimulus levels (given the monotonic rate-level functions of ANFs), which are likely to be present in both the left and the right ear. We propose that, in this way, a given frequency channel transmits the most salient events in its frequency range, which will also have the best timing (based on the general level-latency dependence; Heil and Irvine, 1997; Heil et al., 2008; Neubauer and Heil, 2008). Such an increase in temporal precision was also demonstrated before (e.g. Joris et al., 1994a, 1994b; Dehmel et al., 2010; Kuenzel et al., 2011; see also our Figures 5, 6 and 8). Note that, due to the co-tuning of the inhibition with excitation, the strength of inhibition varies across level. Thus, across all levels the neuron’s responses to the relatively loudest peaks are kept, which is what we refer to as gain control.

Next, the outputs from both ears are combined in the MSO (and LSO) for sound localization: In order to precisely estimate sound source locations, MSO neurons have to detect “correct” coincidences, i.e. coincident spikes from both ears while neglecting coincident spikes from one ear or coincidences that are purely random. MSO neurons possess a number of anatomical (Stotler, 1953; Smith, 1995; Agmon-Snir et al., 1998) and biophysical specializations (Scott et al., 2007; Mathews et al., 2010) to distinguish between monaural and binaural coincidences. The increase of ANF firing rates with sound pressure level would – at high sound intensities – likely result in a large number of coinciding inputs at the MSO level and create MSO activity at unfavorable ITDs, i.e. coincident spikes just by chance alone. The above-described focus on the relatively highest-level events may allow the MSO to focus on binaural spikes with high temporal fidelity, and thus avoid inaccurate contributions to the ITD estimation. The increase of inhibition with level maintains a more limited range of firing rates across level. Given the presumably strong biophysical basis of coincidence detection in the MSO, it seems plausible that precise ITD processing can only function over a limited range of input firing frequencies. However, as intuitive as this reasoning may seem, it remains partly speculative at this point. We have added considerations along the lines above in the Discussion, adding to Discussion sections that have been added in response to other points raised by the reviewers (see section: Inhibitory mechanism for improving sparsity and reproducibility of the neural response).

Next, the reviewers inquired whether reproducibility increases across cells as well. Before answering this point, we would like to make our claim more precise: If reproducibility is measured with cross-correlation at time-shift 0, an increase in reproducibility in a single cell, does not necessarily lead to an increase in reproducibility across multiple cells. Since different cells will have different tuning properties, their cross-correlation may well reduce (at time-shift 0), even if both cells individually increase their cross-correlation at time-shift 0. If cells with matched tuning properties are compared, however, an increase in reproducibility in each cell, should translate to an increase across cells. We have tested this directly for a subset of cells, identified by clearly peaked ANF cross-correlations (Author response image 1). We selected cells where identical stimulus parameters were used and in all tested cases (14 pairs of cells) the cross-correlation at time-shift 0 increased, as predicted. However, although the increase across cells was positive (0.15+/-0.18, Wilcoxon signed ranks test, p<<0.001) it was slightly smaller (Wilcoxon signed ranks test, p=0.03) compared to the increase within the same set of cells (0.53+/-0.41), which could be a consequence of the residual tuning mismatch between these cells.

Author response image 1
Reproducibility of the SBC output also improves across cells, if these are well matched in their CF and stimulus.

Top/right: Cross-correlograms between ANF input (orange) and SBC output (blue) across pairs of cells (columns and rows enumerate the cells; the diagonal shows the same cell (gray background) for comparison). Our sample contained 9 cells (one group of 3, and one of 6 cells), which were stimulated with the identical RGS stimulus, giving a total of 18 across-cell pairs. One of the cells did not show an improvement in reproducibility (with itself) and correspondingly also did not show an improvement with other cells (blue background). Since the reviewers asked about improvements in reproducibility, this cell is excluded from the population analysis below. Bottom left: Reproducibility between ANF input and SBC output within (purple) and between (black) cells. Reproducibility also increased between different cells with similar tuning properties (dots above line of equality). Bottom middle: Population data indicate a significant increase in reproducibility of the SBC output compared to the ANF input between cells (p<0.001, n=14, Wilcoxon signed rank test).

https://doi.org/10.7554/eLife.19295.024

On the other hand, if reproducibility were measured with a more general measure, e.g. the entropy of the response given a stimulus (i.e. H(R|S)), or – more typically – mutual information (I(R,S) = H(R)-H(R|S)), then we would predict that an increase in reproducibility on single cells would always lead to an increase in reproducibility on the population level. While the current number of repetitions would make entropy estimation inaccurate (Panzeri and Treves, 1996), the general argument for an overall increase is as follows: consider the extreme case where every cell produces an individual but unique spike train for a given stimulus. The neural population would then also emit a unique population spike train and thus be highly reproducible. Similarly, if a single cell becomes more reproducible, this reduces the set of responses emitted by the entire population. We have added the following abbreviated version of this argument to the Discussion:

“Here, we only directly studied the increase across trials within a cell, but not across cells. We conjecture that reproducibility across cells always increases, if both cells increase their reproducibility individually. However, this can only be assessed with the peak of the cross-correlation, if the cells are matched in their tuning, a condition which is typically assumed for bilateral inputs into the MSO. For non-matched cells, reproducibility should increase, but would have to be assessed with a different measure, that measures the set of responses to a given stimulus (for example Mutual Information).”

2) The concepts of sparsity and reproduction need to be more clearly explained. This is especially true in subsection “Glycinergic inhibition renders SBC responses sparser, more reliable and temporally more precise”, but even in the Abstract, these terms should be clearly defined.

a) Why is kurtosis a measure of sparsity? Kurtosis seems to capture both tails of a distribution, but the term "sparsity" implies decreased firing generally, which would include a shift in the mean firing rate, or a skew towards longer intervals. Kurtosis, as the 4th moment, seems to be chosen to avoid both those features. Why is that? It should also be clearly explained why sparsity might be important. The argument from saving energy is not very compelling, if there is not also a decrease in firing rate.

We added additional information on sparsity and reproducibility to the Introduction and Results section at their first appearance. The word limit in the Abstract prevented us from explaining the concepts in more detail there.

We used the kurtosis of the firing rate distribution to measure the sparsity of the neural response, since it had been classically suggested as a measure of sparseness (Field, 1994) with higher values of kurtosis related to higher sparseness (see Figure 6 of Field, 1994). The rationale for the use of kurtosis in the original work was to assess the 'peakedness' of the firing rate distribution, which led Field to consider kurtosis as a "useful measure", although not necessarily the best measure for sparseness.

We, however, agree with the reviewer's observation, that the use of kurtosis is too general in the present context, since both tails above and below the average contribute, hence, there could be cases with identical kurtosis but very different average firing rate. We, therefore, apply another method of sparseness quantification, which does not suffer from this problem. This measure has been suggested by Rolls and Tovee, (1995) and later taken up by Willmore and Tolhurst (2001), as well as several subsequent studies in neuroscience (~200, according to citation counts on Google Scholar).

This measure is defined as: $S = 1 - \langle r(t) \rangle_t^2 / \langle r(t)^2 \rangle_t$, where $\langle \cdot \rangle_t$ denotes the time average. S is close to 1, i.e. maximal sparseness, if the square of the average rate ($\langle r(t) \rangle_t^2$) is very small compared to the average squared deviation of the rate from zero ($\langle r(t)^2 \rangle_t$).

On the other hand, S is close to 0 if the average rate is high and the variations in rate are comparatively low.

Note that from the definition of the variance, $\mathrm{Var}(r(t)) = \langle r(t)^2 \rangle_t - \langle r(t) \rangle_t^2 \geq 0$, it can be deduced that the denominator in S is always greater than or equal to the numerator. Hence, S varies only between 0 and 1, and therefore empirical differences between the S of ANF and SBC responses can be compared against 1.
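
As a minimal sketch of this sparseness measure (not the analysis code used in the study), S can be computed from a PSTH given as firing rates per time bin. The function name `sparseness` and the example PSTHs are hypothetical and chosen only for illustration.

```python
import numpy as np

def sparseness(rate):
    """Sparseness S = 1 - <r>^2 / <r^2>, with <.> the average over time bins."""
    rate = np.asarray(rate, dtype=float)
    mean_of_squares = np.mean(rate ** 2)
    if mean_of_squares == 0.0:
        return float("nan")  # S is undefined for an all-zero PSTH
    return 1.0 - np.mean(rate) ** 2 / mean_of_squares

# Hypothetical PSTHs in Hz (illustration only, not data from the study)
flat = np.full(100, 50.0)      # constant rate
peaky = np.zeros(100)
peaky[::20] = 400.0            # a few brief, high-rate events
mixed = flat + peaky           # sustained rate plus brief events

# Divisive scaling (as for random spike deletion) versus a subtractive shift
scaled = 0.5 * mixed
shifted = np.clip(mixed - 40.0, 0.0, None)
```

In this toy example, `sparseness(scaled)` equals `sparseness(mixed)`, because a purely divisive scaling cancels in the ratio, whereas `sparseness(shifted)` is larger, which illustrates why a subtractive component is needed for this measure to increase.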

As an alternative 'safe & clean' measure, we introduce another measure of sparseness, here termed 'Close-to-Silence-Index' (CSI). CSI is defined as the fraction of PSTH bins that are below a certain threshold (here chosen to be 15 Hz, quite low compared to the range of up to 400 Hz firing rate, see Figure 8—figure supplement 1B), i.e. $\mathrm{CSI} = \#\mathrm{Bins}_t(r(t) < \mathrm{Threshold}) / \#\mathrm{Bins}_t$.

CSI is also an index, thus ranges only between 0 and 1. CSI essentially directly evaluates how close a cell is to spiking at low rate for most of the time. Assuming monomodal firing rate distributions, which are limited from below by 0, it should be a reliable measure for sparsity as well, especially if a conservative threshold is chosen.
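
The CSI can be sketched in a few lines, assuming the PSTH is available as an array of firing rates in Hz. The function name `close_to_silence_index` and the example values are hypothetical; only the 15 Hz default threshold is taken from the text.

```python
import numpy as np

def close_to_silence_index(rate, threshold_hz=15.0):
    """Fraction of PSTH bins whose firing rate is below the threshold."""
    rate = np.asarray(rate, dtype=float)
    if rate.size == 0:
        raise ValueError("PSTH must contain at least one bin")
    return float(np.mean(rate < threshold_hz))

# Hypothetical PSTH in Hz: two of the four bins are below 15 Hz
example_csi = close_to_silence_index([0.0, 10.0, 20.0, 400.0])
```

Because the index counts bins rather than weighting them by rate, it is insensitive to how high the remaining peaks are, which is one reason to report it alongside S rather than instead of it.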

Overall, the results of the three sparseness measures do not differ qualitatively, all indicating the SBC responses to be temporally sparser than the ANF responses. Since kurtosis has historically been used, we keep it in the supplement figure together with the CSI, and in Figure 8 we replace it with the, to our knowledge, most standard definition of sparseness (Rolls and Tovee, 1995).

Secondly, as suggested by the reviewers, we now address the virtues of sparse coding in greater detail in the Introduction (briefly) and the Discussion (at greater length), but not in the Results, since this remains an interpretation. Concretely, we focus on the information per spike and the energy efficiency: One main advantage of sparsity can be the increased information per spike, e.g. about temporal information, or the improved detection of changes in firing rate/stimulus properties, if sparsity is achieved through gain normalization. Since the firing rate does indeed decrease in all cells (after all, we are studying failures of transmission), we also keep the argument of more energy-efficient coding, but now provide a more detailed argument.

b) The term reproduction is not clear. Its definition is hidden in the Methods and while the values are compared in Figure 5, an example is not shown until Figure 8. "Reproduction" has a connotation of being able to recreate something (e.g. using a response to reproduce the stimulus). However, the quantification seems to be using cross-correlation of responses. Perhaps "reproducibility" would be a better term for this measure, though it is important to consider that just because a response is reproducible does not mean it is correct or efficient or useful.

We agree that “reproducibility” is more in line with our definition (and maybe more grammatical) than “reproduction”, hence, we replaced it throughout the revised manuscript. To introduce the term an explanation has been added to the Results section and a subplot has been added to its first mention in Figure 5 which illustrates its definition.

Further, we also agree that increased reproducibility of the neural response does not necessarily imply a more efficient or useful coding of acoustic information (for 'correct', see below). However, reproducibility across trials, which we measure here, provides a basis for the consistency in extracting information (e.g. interaural time difference, but also stimulus envelope) to repeated presentations of the same stimulus. This is the minimal requirement one should have for a consistent processing of stimuli.

According to Joris, 2003, comparing each pair of spike trains from recordings of a single cell can be considered as providing the input to a simple binaural coincidence detector. While we think the quantification on the basis of the cross-correlation is generally sensible, it is particularly fitting in the context of subsequent computations for extracting interaural time-delays in subsequent processing stations, which likely relies on operations similar to cross-correlation (Colburn, 1996; van der Heijden et al., 2013; Franken et al., 2014; Plauška et al., 2016). For example, MSO neurons are thought to perform an approximate cross-correlation between their bilateral inputs, hence, an increase in cross-correlation translates into a more precise firing of MSO neurons, i.e. more precise in interaural time.
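
A minimal sketch of such a pairwise quantification is given below. It assumes that reproducibility is taken as the correlation coefficient at time-shift 0 between binned spike trains, averaged over all pairs of trials; the exact normalization used in the manuscript is not restated here, and the function name `reproducibility` is hypothetical.

```python
from itertools import combinations

import numpy as np

def reproducibility(trials):
    """Mean zero-lag correlation coefficient over all pairs of trials.

    trials: array of shape (n_trials, n_bins) with binned spike counts.
    """
    trials = np.asarray(trials, dtype=float)
    values = []
    for i, j in combinations(range(trials.shape[0]), 2):
        a = trials[i] - trials[i].mean()
        b = trials[j] - trials[j].mean()
        denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
        if denom > 0:  # skip pairs that contain a constant (e.g. empty) trial
            values.append(np.sum(a * b) / denom)
    return float(np.mean(values)) if values else float("nan")

# Hypothetical binned responses (illustration only)
identical = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 0, 1, 0]]
alternating = [[1, 0, 1, 0], [0, 1, 0, 1]]
```

Applying the same function to the ANF input and to the SBC output of one cell, or to trials taken from two different cells, gives the within-cell and across-cell comparisons discussed in this response.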

The last point raised by the reviewers, whether a more reproducible response is more correct, is hard to answer without relating it to the subsequent computation. For example, if only interaural time difference were computed, then the combination of increased reproducibility and temporal precision can be taken as a signature of a 'more correct' response for this purpose. However, this could also render the response less correct for other purposes. We have added these important considerations to the Discussion. They are embedded into the section dealing with the functional origin of the increase in sparsity and reproducibility (see below):

"These properties may be sufficient for increasing sparsity and reproducibility under the assumption that large deviations of firing rate are the consequence of stimulus-elicited, high firing probabilities rather than noise (see Figure 9). The latter would be temporally unrelated to the stimulus, and its transmission would reduce the reproducibility, and probably also the usefulness of the transmitted information for further processing […]."

3) Do all primary-like units show failures? Did you just select cells that showed failures? Please comment on cell type. How is it determined that these are spherical bushy cells? Are recordings targeted towards cells with particularly large synaptic inputs? How many synaptic inputs does each cell receive? Are the electrical signals assignable to activity of one input? Does the analysis of Figure 2 assume these are single inputs? If they are multiple synaptic inputs, how is this approach justified?

From our recordings at the rostral pole of the AVCN, all cells showed failures but to very different degrees (as shown in Figure 1C). We did not encounter a cell that was completely fail-safe, and even for cells which showed low failure rates during spontaneous activity, the failure rate generally increased under acoustic stimulation. The SBCs were identified by a number of physiological properties such as high spontaneous rates (Smith et al., 1993), localization at the rostral pole of the AVCN (Bazwinsky et al., 2008), the presence of a discernible prepotential in addition to a complex waveform (Pfeiffer, 1966; Englitz et al., 2009), short AP duration (Typlt et al., 2012), and primary-like response patterns to pure-tone stimulation (Blackburn and Sachs, 1989). Based on these properties the SBCs can be reliably differentiated from the second type of principal neurons in the AVCN, the stellate cells. By considering these physiological properties, we are confident that the recorded cells were large SBCs, which have been shown to project to the MSO in cats (Osen, 1969). We added this information at the beginning of the Results section.

Each SBC receives between 1-3 endbulb inputs in mice (Cao and Oertel, 2010), and the majority of rostral SBCs seem to receive a single endbulb in cats (Ryugo and Sento, 1991), whereas GBCs are reported to receive between 10-70 endbulb inputs (Liberman, 1991; Spirou et al., 2005). Our dataset and previous studies recording from low-CF SBCs in the Mongolian Gerbil suggest that the majority of cells receive a single functional endbulb input (Kuenzel et al., 2011, 2015). This conclusion is supported by the following findings:

(i) Multiple endbulb inputs to SBCs would be indicated by a larger variation of EPSP slopes reflecting differences in the convergence of different inputs which – during in vivo recording of spontaneous activity – will not be completely synchronized. The consequently expected variations in the shapes (and amplitudes) of EPSPs were not observed in our recordings.

(ii) Multiple endbulb inputs onto SBCs would be indicated by violation of the refractory period and temporal overlap of two or more EPSPs. Our data on EPSP slopes show a unimodal distribution and we did not observe an overlap of multiple EPSPs in the recordings.

(iii) Multiple endbulb inputs would likely be reflected by ANF input rates above the rates of single auditory nerve fibers, since converging ANF inputs on bushy cells originate from the same SR type (Ryugo and Sento, 1991).

To further address this issue we compared the distribution of ANF input rates of our recordings with published data from recordings of single AN fibers from the Mongolian Gerbil (Schmiedt, 1989; Müller, 1996; see Author response image 2). We extracted the data from Figure 7 in Schmiedt (1989) and from Figure 3 in Müller (1996), constructed histograms and compared these to the histogram of ANF input rates of our recordings. In our recordings we did not observe cells with low spontaneous rates, which was expected, since SBCs are predominantly contacted by high-SR ANFs (Smith et al., 1993). The distribution of spontaneous ANF input rates observed in the present study matches the distribution of spontaneous rates obtained from single auditory nerve fibers, suggesting that a single ANF is sufficient to produce the observed spontaneous rates. The largest endbulbs were reported at CFs between 1-4 kHz, the CF range of this study (Rouiller et al., 1986; Sento and Ryugo, 1989). Thus, our data are naturally biased towards large SBCs that receive large endbulb endings.

Furthermore, in our recordings we observed an activity-dependent facilitation of EPSPs, which might be caused by residual calcium accumulation in the endbulb of Held (Borst, 2010). If our recordings contained EPSPs from multiple independent endbulbs, we would not expect to see such facilitation, since only a fraction of EPSPs should be affected by previous activity. Also, facilitation rather than depression is unlikely to be observed with multiple endbulb inputs.

Author response image 2
The experimentally determined ANF input (top) showed spontaneous rates consistent with previously published auditory nerve recordings of the Mongolian Gerbil (middle & bottom; data extracted from Figure 7 in Schmiedt (Hearing Research, 1989) and from Figure 3 in Müller (Hearing Research, 1996), respectively).
https://doi.org/10.7554/eLife.19295.025

4) Spike depression. Lorteije et al., 2009 suggested that "spike depression" (presumably the refractory period) contributed to EPSP failures at the calyx of Held. The refractory period would likely appear as an increase in threshold during periods of high activity, similar to Figure 2A right, and a decrease in spike amplitude with high activity, similar to Figure 2C right. This possibility should at least be acknowledged, and the strychnine results interpreted with this in mind as well.

We agree that spike depression and the influence of refractory period could potentially contribute to failures in AP generation in SBCs. This effect was also addressed in a previous study by Kuenzel and colleagues (2011) who showed that shortly after an SBC spike the failure fraction is increased, similarly to the results reported in the study at hand. Both Lorteije et al., 2009 and Kuenzel et al., 2011 reported short recovery times of 0.8 ms and 2.1 ms, respectively, indicating that the influence of spike depression on signal transmission is limited to periods shortly after the last SBC spike.

We attempt to address the influence of spike depression on AP generation in vivo by recording spontaneous activity in the absence of sound stimulation, when the inhibitory influence is likely small (note that the block of glycinergic inhibition did not influence the input-output function in the absence of sound stimulation, Figure 4).

During acoustic stimulation, the block of glycinergic inhibition counteracts the increase in threshold EPSP, but does not render the synapse fail safe. We do not deny that spike depression will contribute to the input-output function at the ANF-SBC synapse, however argue that this effect is constrained to a short time window after the last spike. Additionally, the observed general increase in threshold EPSP during acoustic stimulation was absent or reduced when blocking glycinergic inhibition, indicating that the effect of spike depression on failure rate is limited. However, we now acknowledge the influence of spike depression more clearly in the Discussion section.

5) Second paragraph subsection “Synaptic depression alone fails to fully account for increased failure rates”: Mathematically, what you have done here is correct, but it will likely be confusing to many readers. It is unclear why an "error function" (Boltzmann?) is used to fit this data, as the points clearly cluster into two clouds, and the "threshold" point is not near any of the data points. Why were the data not binned along the ordinate of rising slope (in 0.5 or 1 V/s bins), and the probability of success or failure computed in each bin, based on the total number of succ+fail. This would give the type of representation that would then make sense in terms of fitting to a Boltzmann to obtain a "threshold". Maybe do this as an inset?

We agree with the reviewer that it is not intuitively clear why the original approach is a valid method for estimating the threshold EPSP. The criticism is well taken, and we would like both to justify the approach in this letter and to explain it in more detail in the main text of the manuscript. The choice of a Gaussian error function was based on a previous study (Kuenzel et al., 2011). Fitting different kinds of sigmoid functions (e.g. Boltzmann function, logistic function) resulted in different slope parameters but identical inflection points, thus providing the same estimates for the threshold EPSP. In the original submission, we refrained from binning the rising slopes to avoid potential effects of bin size on threshold EPSP estimation. Also, in the method applied in the original manuscript, the fit is inherently weighted by the number of events. This is not the case for binned data, where all bins are weighted equally regardless of the number of events in each bin, i.e. a probability calculated from e.g. 3 events contributes as much to the fit as a probability calculated from 1000 events. This is why we thought the use of raw, unbinned data would provide more reliable estimates. However, we agree that the approach suggested by the reviewers is a more intuitive method of threshold calculation. We therefore recalculated the threshold EPSPs throughout the manuscript using the suggested method (bin size=0.5 V/s). We added a linear weighting parameter to each bin based on the number of events in each bin. The results are virtually identical to the previous approach (Pearson’s r=0.9996, Author response image 3). We changed the respective passages in the Methods and Results sections and modified Figure 2A accordingly, which now shows the binned fraction of EPSPsucc across the EPSP rising slopes.

Author response image 3
Comparison of threshold EPSP estimation methods: Left: Representative cell.

The version in the revised manuscript is based on the probability distribution of EPSPsucc as a function of binned EPSP slope (top). The original version was based on an unbinned binary distribution between the EPSPsucc and EPSPfail (bottom). Black arrow and red line indicate estimated threshold EPSP. Right: Population data for 62 cells show that both methods provide almost identical estimates of the threshold EPSP.

https://doi.org/10.7554/eLife.19295.026

The interchangeability between both types of analysis can be seen in Author response image 3 which we would like to share with the reviewers: Both methods have been applied to the representative dataset in Figure 2 (upper left: method using bins, lower left: original method without binning), and the threshold EPSPs have been recalculated for all units in the manuscript (right). Both methods produce virtually identical results (dots close to line of equality, gray).
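
The binned procedure can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of the study: a logistic function stands in for the Gaussian error function, the fit is a simple grid search over hypothetical parameter ranges, and each bin is weighted linearly by its number of events. All function names are hypothetical; only the 0.5 V/s bin size comes from the text.

```python
import numpy as np

def binned_success_fraction(slopes, success, bin_width=0.5):
    """Fraction of successful EPSPs per bin of EPSP rising slope (V/s)."""
    slopes = np.asarray(slopes, dtype=float)
    success = np.asarray(success, dtype=bool)
    bin_index = np.floor(slopes / bin_width).astype(int)
    centers, fractions, counts = [], [], []
    for k in np.unique(bin_index):
        in_bin = bin_index == k
        centers.append((k + 0.5) * bin_width)
        fractions.append(success[in_bin].mean())
        counts.append(in_bin.sum())
    return np.array(centers), np.array(fractions), np.array(counts)

def fit_threshold(centers, fractions, counts, n_thresholds=201):
    """Inflection point of a logistic fit, bins weighted by event count."""
    thresholds = np.linspace(centers.min(), centers.max(), n_thresholds)
    widths = np.linspace(0.05, 5.0, 100)  # assumed search range for the slope
    best_error, best_threshold = np.inf, None
    for th in thresholds:
        for w in widths:
            z = np.clip((centers - th) / w, -50.0, 50.0)
            predicted = 1.0 / (1.0 + np.exp(-z))
            error = np.sum(counts * (predicted - fractions) ** 2)
            if error < best_error:
                best_error, best_threshold = error, th
    return float(best_threshold)

# Hypothetical data: EPSPs succeed whenever the slope exceeds 5 V/s
slopes = np.tile(np.arange(0.25, 10.0, 0.5), 20)
success = slopes > 5.0
centers, fractions, counts = binned_success_fraction(slopes, success)
threshold = fit_threshold(centers, fractions, counts)
```

For this idealized step-like data set the estimated threshold falls between the last all-failure bin and the first all-success bin; with real data, the count weighting keeps sparsely populated bins from dominating the fit.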

6) The summary plots can be very difficult to read. Each plot shows median, quartiles and outliers, as well as averages. This gets very busy, and winds up obscuring or minimizing the important differences. For example, in Figure 3Ciii the averages are squashed down in the lower part of the panel, and most of the panel is taken up with outliers. It would be better to maximize the plot area to make the data as visible as possible. Averages and standard errors are sufficient when testing with a t-test, as are box-and-whisker plots when a non-parametric test. This applies to summary plots throughout, with Figures 5D & 6D particularly hard to read.

The reviewer correctly points out that the way the data were presented in the graphs made it difficult to single out the important effects. We chose this approach to show all observed values and scaled the axes to cover all theoretically possible values (as in Figure 2D, where the correlation coefficient is bounded between -1 and 1). We added the marker for the representative cell to show transparently how its values compare to the rest of the population. In the revised manuscript we followed the reviewers’ suggestion and increased the plot area in the summary plots accordingly. In Figure 3Ciii we added a second y-axis to spread the data for the Q40 values, and in Figures 5D and 6D we replaced the boxplots with markers indicating the mean +/- standard deviation, which makes the graphs much easier to read.

7) Many plots compare paired values, and report the original values (+/- SE), but not their differences. For example, 3Ci reports average CFs of ANF and SBC. The average values of these are not very meaningful. What is important is the difference between CFs for input vs. output, which is not reported. This applies in many other places: in Figure 3, differences in threshold, Q, asymmetry; in Figure 4 right panels, effects of glycine, strychnine, stimulation; in Figure 5D and 6D, differences between input and output; Figure 7C-F, differences between input and output, Figure 8iii, iv, differences between input and output, effects of strychnine.

We thank the reviewer for pointing out that relevant information was missing from the way the data were presented. Our motivation had been to provide a transparent presentation of the distribution of the raw data. We have now added the mean ± standard deviation of the differences in the paired comparisons to provide this previously missing information to the reader.

8) ANOVAs in Figure 5, 6. It is very difficult to understand the ANOVA carried out here. In Figure 5D, left, the ANOVA results suggest there is "a systematic variation with modulation frequencies". This is not visible in the plot, and furthermore, the nature of the "systematic variation" is obscured by an ANOVA. Did it go up or down? In the FM analysis in Figure 6, the extreme frequencies show little effect, but 100 Hz modulation shows a strong effect. Why is this behavior so non-linear? Did the results match some sort of expected outcome?

We apologize for the misleading term “systematic variation” when describing the results of the VS across stimulation frequencies. Our focus was on the comparison between ANF input and SBC output rather than changes across modulation frequencies. Regarding the VS values, for both ANF input and SBC output the VS decreased with increasing stimulation frequency (VS at 50 Hz: ANF=0.26 ± 0.09 vs. SBC=0.31 ± 0.07; at 400 Hz: ANF=0.2 ± 0.1 vs. SBC=0.25 ± 0.1). While relatively stable VS values are obtained up to 200 Hz modulation frequency, the VS was significantly reduced at 400 Hz for both ANF input and SBC output. Importantly, the VS of the SBC output was larger than that of the ANF input across all frequencies tested.

The reviewer correctly observed the effect of the modulation frequencies, and we would like to explain this finding as follows: The non-linearity of the frequency effect could be explained by the time constant of the impact of inhibition. The FM stimulus spanned 3 octaves around the unit’s CF, thus covering frequency areas with strong inhibition and areas lacking any inhibitory effect. The time constant of the inhibitory effect is around 20-50 ms in slice recordings (Nerlich et al., 2014), but somewhat shorter in vivo (10–15 ms; Nerlich et al., 2014; Keine and Rübsamen, 2015). During frequency-modulated stimulation, the inhibitory conductance should therefore also be modulated and essentially act as a band-pass filter. The lowest and highest modulation frequencies might result in saturation of inhibition or insufficient activation (if the inhibitory conductance does not summate), thus reducing the inhibitory effect on the ANF-to-SBC synapse. However, the exact inhibitory dynamics in vivo are not well understood, and the combination of cell types exerting this inhibition on SBCs is also unclear. Hence, the temporal dynamics of inhibitory inputs during acoustic stimulation remain elusive. At this point, we can only speculate why the effect is smaller at the highest and lowest frequencies tested, but it is tempting to argue that the inhibitory dynamics capture the modulation rates of natural acoustic stimuli. Further studies employing natural sounds would be necessary to evaluate the inhibitory effect on different time scales.

9) Flatness of the RLF (Figure 3D). Using standard deviation to quantify flatness of the RLF is confusing. A standard deviation is a comparison to a mean, and the mean here is not meaningful because it depends on which intensities were sampled. My sense is that the standard deviation would depend on whether the RLF is sampled symmetrically about a cell's dynamic range (the SD would be maximal), or asymmetrically (the SD would be lower). Please justify this measure more carefully. A more intuitive measure of flatness may be the difference in firing rate between maximum and minimum (normalized appropriately).

The reviewer is right in pointing out that the measure of rate-level variability (RLV) as the standard deviation of the rate-level function (RLF) will depend on the intensities that were actually sampled. We calculated the standard deviation across all of the sampled intensities, identically for the ANF input and the SBC output. In most cells, the RLF of the ANF input shows a monotonic increase over a relatively wide range of firing rates, while the RLFs of the output were flatter or even non-monotonic. A measure of monotonicity was previously suggested by Kuenzel and colleagues (2015), which relates the maximum firing rate throughout the RLF to the firing rate obtained at the highest sound intensities. While this measure is suitable for detecting non-monotonic behavior in the SBC RLF, it does not quantify the gain of the neuronal response, i.e. the change in firing rate across sound intensity. Since non-monotonic behavior of the RLF is only present in about half of the SBCs (Winter and Palmer, 1990; Kopp-Scheinpflug et al., 2002; Keine and Rübsamen, 2015), we opted to use another measure. The original measure was normalized to the spontaneous activity and sampled identically for all cells, and thus quantifies the deviation of firing rates relative to the spontaneous activity. Still, the reviewer is right in pointing out that in the original analysis, the RLV will depend on the threshold of the neuron, i.e. for identical changes in firing rates, different measures of RLV might be obtained. While this should not pose a major problem for the comparison of ANF input and SBC output (which are both driven by the same ANF), we agree that the way the data were analyzed might be somewhat puzzling for the reader.
We therefore calculated the RLV as suggested by the reviewer, i.e. as the difference between minimal and maximal firing rate normalized to the spontaneous activity of ANF input and SBC output. This provides a measure of the change in firing rates relative to spontaneous activity that is independent of the sampled sound intensities and of the unit’s threshold. Consequently, the RLV was renamed rate-level gain (RLG) throughout the manuscript. The new analysis is shown in the revised Figure 3. Although the two measures quantify somewhat different properties of the RLF, they were significantly positively correlated (input: rs = 0.36, p < 0.01, output: rs = 0.35, p < 0.01). Therefore, using the method suggested by the reviewer allows us to draw the same conclusion about the reduction in gain, but is more intuitive to the reader.
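The range-based measure described above can be illustrated with a minimal sketch. It is not the analysis code used for the manuscript: the function name `rate_level_gain`, the reading of "normalized to the spontaneous activity" as a division by the spontaneous rate, and all firing rates below are assumptions made for illustration only.

```python
def rate_level_gain(rates_hz, spontaneous_hz):
    """Range of a rate-level function relative to spontaneous activity.

    Assumption: 'normalized to the spontaneous activity' is read here as
    division by the spontaneous firing rate.
    """
    if spontaneous_hz <= 0:
        raise ValueError("spontaneous rate must be positive for this normalization")
    return (max(rates_hz) - min(rates_hz)) / spontaneous_hz


# Hypothetical rate-level functions (spikes/s) at increasing sound levels.
anf_input = [60.0, 80.0, 140.0, 220.0, 260.0]   # monotonic increase
sbc_output = [40.0, 50.0, 70.0, 80.0, 75.0]     # flatter, slightly non-monotonic

rlg_in = rate_level_gain(anf_input, spontaneous_hz=60.0)
rlg_out = rate_level_gain(sbc_output, spontaneous_hz=40.0)
print(f"RLG input: {rlg_in:.2f}, RLG output: {rlg_out:.2f}")
```

Because only the extreme rates enter the calculation, adding further sampled intensities between the minimum and the maximum leaves the value unchanged, which is the property the reviewer asked for.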

10) Effects of inhibition. The analysis of the effects of inhibition is very thorough, and beautifully shown, but does not adequately acknowledge similar results from earlier work. This is particularly noticeable in the final paragraph of subsection “Acoustically evoked inhibition elevates threshold for AP generation”, which does not acknowledge that Kopp-Scheinpflug, 2002 and Kuenzel, 2011 also showed similar tuning of inhibition. Figures 5–7 investigate different types of stimuli, but it would be helpful to compare the findings against others' results using tuning curves. The period histograms in Figure 5C and 6C seem to show that the output is simply shifted lower compared to input. Is such a simple overall shift sufficient to explain the changes in vector strength, modulation depth, reproduction, and correlation? Or, are the temporal characteristics of inhibition also important?

As the reviewer correctly pointed out, the analysis of tuning curves showed similar characteristics of inhibition as reported earlier. While we did not observe two distinct types of inhibition, consistent with Kuenzel et al. (2011), we found inhibition to be broader than excitation, which is consistent with earlier reports (Caspary et al., 1994; Kopp-Scheinpflug et al., 2002). We now acknowledge and cite these earlier findings in the revised manuscript.

One of the main functions of inhibition indeed seems to be the control of gain, i.e. an overall shift in the firing rates and less dependence of firing rate on sound intensity. However, various studies showed that the temporal precision also improves from the ANF to the SBC on a single-unit level (Dehmel et al., 2010; Kuenzel et al., 2011; Keine and Rübsamen, 2015). Recent studies analyzing the EPSP size showed that large EPSPs are better timed than small EPSPs (Kuenzel et al., 2011). Modeling studies showed that an increase in temporal precision can be achieved by a sufficiently large number of converging inputs (Rothman et al., 1993; Kuenzel et al., 2015), and such non-endbulb inputs have been reported in the rat VCN (Gómez-Nieto and Rubio, 2009). A recent modeling study showed that inhibition in combination with excitatory inputs at the dendrites could increase the temporal precision, but inhibition alone cannot (Kuenzel et al., 2015). Additionally, as documented by Nerlich et al. (2014) (and already mentioned above), the inhibitory inputs are rather slow and thus not suited to act on a spike-to-spike basis.

To answer the question of the reviewer, we tested whether the increase in vector strength, temporal precision, reproducibility and reproduction can be achieved by different mechanisms of inhibitory action. First, we simulated a pure divisive effect of inhibition, i.e. a random loss of output spikes. For that, we used the input spike times of our recordings and pseudorandomly removed spike times to match the failure fractions observed in our recordings. The simulated SBC output then reflects a pure (divisive) scaling of the ANF input with failure rates identical to the experimental data. This simulation showed that a random reduction in postsynaptic firing rates is not sufficient to cause the observed increase in temporal precision and trial-to-trial reproducibility for either SAM or SFM stimulation (see Figure 9—figure supplements 1 and 2). Next, we simulated a purely subtractive effect of inhibition by removing a fixed number of output spikes per bin, again matched to the experimentally observed failure rates. This results in an overall shift of the histogram, without affecting the gain of the response. We observed an increase in vector strength, modulation depth, reproducibility and sparsity for both SAM and SFM stimulation. Notably, the simulated increase during SAM stimulation even exceeded the experimental data, while during SFM stimulation, the simulated subtractive inhibitory effect matched the data well.
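The two simulated mechanisms can be sketched on a binned period histogram as follows. This is a simplified illustration under stated assumptions, not the simulation code behind Figure 9: the helper names, the histogram counts, the failure fraction of 0.4, and the definition of modulation depth as (max − min)/(max + min) are all hypothetical choices made here.

```python
import random


def divisive_thinning(counts, failure_fraction, rng):
    """Divisive sketch: drop each spike independently with a fixed probability."""
    return [sum(1 for _ in range(c) if rng.random() >= failure_fraction)
            for c in counts]


def subtractive_shift(counts, failure_fraction):
    """Subtractive sketch: remove the same number k of spikes from every bin.

    Bins are floored at zero; k is the smallest integer for which the number
    of removed spikes reaches the target failure fraction.
    """
    target = round(failure_fraction * sum(counts))
    k = 0
    while k < max(counts) and sum(min(c, k) for c in counts) < target:
        k += 1
    return [max(c - k, 0) for c in counts]


def modulation_depth(counts):
    """Assumed definition for this sketch: (max - min) / (max + min)."""
    hi, lo = max(counts), min(counts)
    return (hi - lo) / (hi + lo) if hi + lo > 0 else 0.0


# Hypothetical input period histogram (spike counts per phase bin).
period_hist = [20, 40, 100, 160, 100, 40, 20, 20]
failure_fraction = 0.4  # hypothetical fraction of input events that fail

rng = random.Random(0)
div_hist = divisive_thinning(period_hist, failure_fraction, rng)
sub_hist = subtractive_shift(period_hist, failure_fraction)

print("input      ", period_hist, round(modulation_depth(period_hist), 2))
print("divisive   ", div_hist, round(modulation_depth(div_hist), 2))
print("subtractive", sub_hist, round(modulation_depth(sub_hist), 2))
```

In this toy case both manipulations remove roughly the same share of spikes, but only the subtractive shift empties the low-rate bins and thereby deepens the modulation, while random thinning rescales all bins by about the same factor.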

Finally, we applied the same simulations to the data obtained during RGS stimulation. Again, we find that purely divisive inhibition cannot explain the increase in sparsity, reproducibility and temporal precision, while pure subtractive inhibition matches the data well. The simulation of purely subtractive inhibition shows a larger improvement in sparsity and reproducibility than observed in the data.

We conclude that the subtractive shift in SBC output rates compared to the ANF input can explain the increase in most metrics. A subtractive effect on the firing rates has been associated with hyperpolarizing inhibition with slow temporal dynamics (Doiron et al., 2001), consistent with previous slice recordings in SBCs (Nerlich et al., 2014). However, we also observed a reduced response gain of the SBC output, which is generally attributed to shunting inhibition (Rose, 1977; Koch and Poggio, 1992). Previous studies showed that both types of inhibition might be present at the SBC synapse and our data suggest that these two mechanisms result in a two-fold effect of increased temporal precision and reduced sound level dependence. While during SFM stimulation with constant stimulus level, the simulated subtractive inhibition matches the observed data well, during SAM stimulation, the simulated subtractive inhibition overestimates the effect on the SBC output. This might be caused by the inherent changes in stimulus level and the additional effect of inhibition on response gain in the experimental data.

Regarding the temporal dynamics of inhibition: Considering the slow dynamics of inhibitory currents, the temporal relation between inhibition and excitation might play only a minor role. It seems conceivable that the slow inhibitory currents will summate during acoustic stimulation, resulting in a functionally tonic inhibitory conductance (Nerlich et al., 2014; Kuenzel et al., 2015). However, it has been shown that well-timed EPSPs are more likely to generate an SBC output than poorly timed EPSPs. Also, experimental data provided evidence that the EPSP size, and with it the probability of SBC AP generation, partially depends on the timing of the event (Keine and Rübsamen, 2015; Kuenzel et al., 2015), and it has been suggested that non-endbulb excitatory SBC inputs might facilitate AP generation of well-timed endbulb EPSPs (Ryugo and Sento, 1991; Gómez-Nieto and Rubio, 2009). The degree to which non-endbulb excitatory inputs converge on SBCs and their activation pattern during acoustic stimulation remain to be investigated before conclusions can be drawn about their importance for signal processing.

11) FM data period histograms. The FM stimuli appear to yield 2 or 3 peaks in the representative example of Figure 6. Multi-peaked firing can undermine the utility of the standard VS calculation. That is, each peak may be very tight and repeatable, but by having more than one, the VS decreases.

We agree with the reviewer that, for distributions with multiple peaks, VS does not provide a sufficient measure of the temporal precision or reproducibility of the neuronal response. We included the VS for SFM stimuli to use a traditional, standardized measure for comparison with the SAM data and previous publications. We complemented VS with other measures, namely modulation depth (which, for example, deals well with multiple peaks), reproducibility and CorrNorm, to provide a more comprehensive description of the temporal response properties. Also, the main focus of this analysis was a direct comparison of ANF input and SBC output rather than changes across modulation frequencies.

However, to inform the reader about the potential problems in VS interpretation for the observed phase histograms, we added a paragraph to the Results section which reads:

“It has to be considered that the interpretation of VS values is difficult when the phase histogram of the neuronal response shows multiple peaks (Figure 6C). We, therefore, used a set of additional measures to describe the neuronal response to SFM stimuli when comparing ANF input and SBC output.”
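The reviewer's concern about multi-peaked histograms can be made concrete with the standard definition of vector strength as the length of the mean resultant vector of spike phases. The spike phases below are hypothetical and chosen as an extreme case; the sketch is illustrative and not taken from the manuscript's analysis code.

```python
import cmath
import math


def vector_strength(phases):
    """Length of the mean resultant vector of spike phases (in radians)."""
    if not phases:
        return 0.0
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


# Hypothetical spike phases: every spike is perfectly locked in both cases.
one_peak = [math.pi / 2] * 100                            # a single phase
two_peaks = [math.pi / 2] * 50 + [3 * math.pi / 2] * 50   # two phases, half a cycle apart

vs_one = vector_strength(one_peak)
vs_two = vector_strength(two_peaks)
print(f"one peak: {vs_one:.3f}, two peaks: {vs_two:.3f}")
```

Both responses are equally precise within each peak, yet the two opposing peaks cancel in the vector sum, which is why VS alone understates the temporal precision of multi-peaked responses.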

12) STRF analysis: This analysis is very nice, but some clarification would be helpful.

a) Why is the inhibitory STRF calculated by subtracting the ANF and SBC plots? Please explain how and why firing rates are normalized by the standard deviation. Is it normalized by subtraction or division? What if this normalization is not done? Please add a scale bar to these plots. Can this analysis be done instead by using only the EPSP(fail) for the kernel estimation? How would this approach differ from the subtraction approach?

b) The size of the excitatory region is described as sharper in the SBC compared to ANF. This is not obvious in the example. Can the spectral width be indicated on the plot in Figure 7B somehow? Is there a similar effect in the tuning curves in Figure 3? It is not clear how sound level influences the STRF. Would that affect this width measurement?

We are happy to provide the information the reviewer asked for. We realize that some necessary information was missing from the original manuscript, an omission that resulted from trying to keep the text succinct.

Strictly speaking, we computed the difference STRF by subtracting the STRF of the ANF from the STRF of the SBC. In principle, this difference STRF could have positive and negative parts, but in the recorded datasets it was dominated by negative parts, i.e. indicative of a reduction in response. This could be described as an 'inhibitory STRF' (i.e. wherever the SBC STRF is smaller than the ANF STRF, some spikes were missing, and this could be due to inhibition), although not all missing spikes are necessarily an indication of inhibition. We therefore (as far as we know) do not refer to it as "inhibitory STRF", but more generally as difference-STRF.

For us, the normalization seems to be an appropriate procedure, since otherwise, the change in gain (i.e. overall firing rate which in part was also due to unspecific failures of spike transmission) dominates the change in shape of the STRF. We originally computed it on non-normalized STRFs, which leads to a similar result after the excitatory peak. However, one obtains a dominating, large peak at BF x Best Latency, which visually masks the changes thereafter. Author response image 4 documents the observed difference without normalization. The units of the STRF are Hz/dB SPL, as they map from the dB scaled spectrogram to firing rate.

As suggested by the reviewer, we added color scale bars to two of the plots in the revised Figure 7B (top two plots are on the same scale, normalization is performed according to the standard deviation of the STRF, and then commonly normalized to peak close to 1).
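Why normalization matters for the difference STRF can be shown with a toy example. The sketch below assumes that "normalized by the standard deviation" means dividing each STRF by the standard deviation of its own entries; it omits the subsequent common peak normalization, and the tiny 3 × 3 arrays are hypothetical stand-ins for real frequency × latency kernels.

```python
import statistics


def normalize_by_sd(strf):
    """Divide every STRF entry by the standard deviation across all entries."""
    flat = [v for row in strf for v in row]
    sd = statistics.pstdev(flat)
    if sd == 0:
        raise ValueError("a completely flat STRF cannot be normalized")
    return [[v / sd for v in row] for row in strf]


def difference_strf(sbc_strf, anf_strf, normalize=True):
    """SBC STRF minus ANF STRF, entry by entry."""
    if normalize:
        sbc_strf = normalize_by_sd(sbc_strf)
        anf_strf = normalize_by_sd(anf_strf)
    return [[s - a for s, a in zip(s_row, a_row)]
            for s_row, a_row in zip(sbc_strf, anf_strf)]


# Hypothetical kernels: rows are frequency channels, columns are latencies.
anf = [[0.0, 0.0, 0.0],
       [0.0, 8.0, 2.0],
       [0.0, 0.0, 0.0]]
# SBC kernel with the same shape but half the gain (pure scaling).
sbc = [[0.5 * v for v in row] for row in anf]

raw_diff = difference_strf(sbc, anf, normalize=False)
norm_diff = difference_strf(sbc, anf, normalize=True)
print("raw peak difference:       ", raw_diff[1][1])
print("normalized peak difference:", norm_diff[1][1])
```

For a pure change in gain, the raw difference shows a large negative value at the excitatory peak, whereas the normalized difference vanishes, so that any remaining structure after normalization reflects a change in STRF shape rather than in overall rate.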

The difference STRF could also be computed using the EPSPfail, which would mathematically be equivalent (if the appropriate weighting is included) to the difference without normalization. We chose to perform it as the difference since it offers the possibility to normalize (essentially to focus on the shape, rather than overall rate) in between the responses, and we can visualize each of the STRFs in comparison.

Regarding spectral width: Indeed, the spectral half-width is only slightly reduced, and this is not the case in all units. We now indicate the spectral half-width in the example cell visually, demonstrating that the unit shown is a representative example (see also subpanel E).

The potential influence of sound level on the shape of the units’ response areas is an interesting suggestion: It would be interesting to test the level dependence of ANF and SBC output separately with the prediction that output shape would be more constant than input shape. Given the inhibitory influence, this could hold up to high sound levels, but may deteriorate at low sound levels.

In previous studies in the MNTB (Englitz et al., 2010), we experimented with different choices of sound level, essentially finding results similar to those of Rabinowitz et al. (2011) in the auditory cortex (testing for sound contrast processing), namely that the shape of the STRF is mostly invariant, but that the scaling (e.g. via an output nonlinearity) is changed. Therefore, it could also be that changes in gain would be hidden in the STRF structure. But, at this point, this is speculation since in our stimuli overall loudness was not modulated. A consideration of possible sound level influence has been added to the Discussion.

The computational aspects regarding the benefit of normalization are now explained in more detail in the Methods section, right where the STRF is introduced, and furthermore are also briefly mentioned in the Results section (with reference to the Methods).

13) Figure 8: This is a very important figure, because it addresses how inhibition influences the information carried by SBCs compared to ANFs. It would be very helpful if the authors could take a stab at explaining how inhibition may enhance sparsity, reproduction, and temporal dispersion. In the case of sparsity, it seems like it cannot be simply a change in the mean firing rate, because kurtosis should not include that (unless there are weird effects of rectification at firing rate of 0).

Thank you for encouraging us to address this topic directly. Since reviewer point #10 asked a similar question, we have partially dealt with the mechanism above.

We think that the main effect of inhibition is a general shift of the response rates closer to quiescence (due to combined divisive/subtractive effects). This shift is triggered by sound evoked glycinergic input, which covaries with excitation, but arrives slightly delayed (~3 ms; Keine and Rübsamen, 2015). It prevents ANF inputs from evoking SBC output spikes, a mechanism especially affecting poorly timed endbulb inputs (Kuenzel et al., 2011, 2015; Keine and Rübsamen, 2015). As a consequence, sparsity increases, as in many cases mainly the previously high peaks in firing rate remain, whereas a bulk of PSTH bins is shifted to low or zero firing rate bins. As indicated by the reviewer, this will lead kurtosis to increase, as well as the other two measures of sparsity (see Figure 8—figure supplement 1 for a comparison of sparsity).
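The reviewer's remark that kurtosis should be insensitive to a mere change in firing rate, except through rectification at zero, can be checked with a small numerical sketch. The PSTH values and the shift of 40 spikes/s are hypothetical, and kurtosis is computed here as the plain fourth standardized moment, which may differ in detail from the estimator used in the manuscript.

```python
def kurtosis(values):
    """Fourth standardized moment (non-excess kurtosis) of a list of values."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined for constant values")
    return m4 / m2 ** 2


# Hypothetical PSTH (spikes/s per time bin): graded baseline plus two high peaks.
psth = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 80.0, 120.0]

scaled = [0.5 * r for r in psth]                # pure rescaling of all rates
shifted = [max(r - 40.0, 0.0) for r in psth]    # subtractive shift, rectified at zero

k_in = kurtosis(psth)
k_scaled = kurtosis(scaled)
k_shifted = kurtosis(shifted)
print(f"input {k_in:.2f}, scaled {k_scaled:.2f}, shifted {k_shifted:.2f}")
```

Rescaling leaves the kurtosis unchanged, whereas the rectified subtractive shift pushes most bins to zero while the peaks survive, making the rate distribution heavier-tailed and therefore sparser by this measure.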

More generally, we think that the subtractive part of the glycinergic inhibition, in combination with the well-timed, high firing rate events, can explain the improvement of sparsity, reproducibility and temporal precision. We tested this hypothesis by simulating the effects of simple divisive (remove fixed fraction of spikes – relative to the firing rate – from a time bin in a spiketrain) and subtractive (remove fixed number of spikes from each time bin in a spiketrain) inhibition (see Figure 9). The removal of events was matched to the experimentally observed failure fractions. It appears the subtractive part of the inhibition can achieve these improvements, whereas the divisive part leaves them unchanged. On the other hand, the divisive part, together with the co-tuning of excitation and inhibition may preserve the SBC’s ability to respond in a limited range of firing rates (Figure 3Biii/D). It should be noted that the simulated subtractive inhibition resulted in greater improvements than observed in the experimental data, additionally supporting a combined subtractive/divisive inhibitory action. A further, theoretical exploration of this combination appears beyond the scope of this manuscript.

From the literature it appears that the effect of glycinergic inhibition is not clearly subtractive or divisive (Kuenzel et al., 2011, 2015; Nerlich et al., 2014). Hence, both a general increase in membrane permeability (leading to divisive firing rate effects) as well as shifts of membrane potential by hyperpolarization (during a phase of depolarization, leading to subtractive effects on the firing rate level) seem to shape the SBC output (see Ayaz and Chance (2009) for more details). Subtractive and divisive inhibition have been shown to affect the neuron’s output and tuning properties differently. While purely subtractive inhibition can sharpen the tuning and shift the stimulus-response function without altering the gain, divisive inhibition preserves the tuning and influences the response gain (Wilson et al., 2012).

We think that such a – potentially temporally unspecific – effect is nonetheless compatible with an increase in temporal precision, if the high peaks of the ANF firing rate are linked to specific acoustic events. In the cross-correlations underlying the computations of reproducibility and temporal precision, a removal of the bulk of spikes, and a focus on the peak rates, would lead to higher, tighter correlations. These precisely timed spikes are then preserved and transmitted to the MSO (see #1 for more detail). Assuming that the same occurs at the other AVCN, this would hence lead to a sharpening of interaural temporal correlation.

These arguments were added to the Discussion of the revised manuscript as a new paragraph “Inhibitory mechanism for improving sparsity and reproducibility of the neural response”

https://doi.org/10.7554/eLife.19295.029

Article and author information

Author details

  1. Christian Keine

    Faculty of Bioscience, Pharmacy and Psychology, University of Leipzig, Leipzig, Germany
    Present address
    Molecular Mechanisms of Synaptic Function, Max Planck Florida Institute for Neuroscience, Jupiter, United States
    Contribution
    CK, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    christian.keine@gmail.com
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-8953-2593
  2. Rudolf Rübsamen

    Faculty of Bioscience, Pharmacy and Psychology, University of Leipzig, Leipzig, Germany
    Contribution
    RR, Conception and design, Drafting or revising the article
    Contributed equally with
    Bernhard Englitz
    Competing interests
    The authors declare that no competing interests exist.
  3. Bernhard Englitz

    Department of Neurophysiology, Donders Center for Neuroscience, Radboud University, Nijmegen, Netherlands
    Contribution
    BE, Conception and design, Analysis and interpretation of data, Drafting or revising the article
    Contributed equally with
    Rudolf Rübsamen
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-9106-0356

Funding

Deutsche Forschungsgemeinschaft (GRK 1097)

  • Christian Keine

Deutsche Forschungsgemeinschaft (RU 390/19-1)

  • Rudolf Rübsamen

Deutsche Forschungsgemeinschaft (RU 390/20-1)

  • Rudolf Rübsamen

European Commission (Marie Sklodowska Curie Fellowship 660328)

  • Bernhard Englitz

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by the DFG grants GRK 1097 (CK), RU 390/19–1, RU 390/20–1 (RR), and Marie Sklodowska Curie Fellowship 660328 (BE). We thank Jörg Encke for helpful discussion and Sebastian Maass and Ingo Kannetzky for technical support. We thank the three anonymous reviewers for their helpful suggestions which substantially improved the manuscript. The authors declare no competing financial interests.

Ethics

Animal experimentation: All experiments were approved by the Saxonian District Government, Leipzig (TVV 06/09), and conducted according to the European Communities Council Directive (86/609/EEC).

Reviewing Editor

  1. Ian Winter, University of Cambridge, United Kingdom

Publication history

  1. Received: July 3, 2016
  2. Accepted: November 17, 2016
  3. Accepted Manuscript published: November 18, 2016 (version 1)
  4. Version of Record published: December 8, 2016 (version 2)

Copyright

© 2016, Keine et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,160
    Page views
  • 180
    Downloads
  • 12
    Citations

Article citation count generated by polling the highest count across the following sources: Scopus, Crossref, PubMed Central.
