Natural ITD statistics predict human auditory spatial perception

  1. Rodrigo Pavão  Is a corresponding author
  2. Elyse S Sussman
  3. Brian J Fischer
  4. José L Peña
  1. Dominick P. Purpura Department of Neuroscience - Albert Einstein College of Medicine, United States
  2. Centro de Matemática, Computação e Cognição - Universidade Federal do ABC, Brazil
  3. Department of Mathematics - Seattle University, United States
Research Article
Cite this article as: eLife 2020;9:e51927 doi: 10.7554/eLife.51927

Abstract

A neural code adapted to the statistical structure of sensory cues may optimize perception. We investigated whether interaural time difference (ITD) statistics inherent in natural acoustic scenes are parameters determining spatial discriminability. The natural ITD rate of change across azimuth (ITDrc) and ITD variability over time (ITDv) were combined in a Fisher information statistic to assess the amount of azimuthal information conveyed by this sensory cue. We hypothesized that natural ITD statistics underlie the neural code for ITD and thus influence spatial perception. To test this hypothesis, sounds with invariant statistics were presented to measure human spatial discriminability and spatial novelty detection. Human auditory spatial perception showed correlation with natural ITD statistics, supporting our hypothesis. Further analysis showed that these results are consistent with classic models of ITD coding and can explain the ITD tuning distribution observed in the mammalian brainstem.

eLife digest

When a person hears a sound, how do they work out where it is coming from? A sound coming from your right will reach your right ear a few fractions of a millisecond earlier than your left. The brain uses this difference, known as the interaural time difference or ITD, to locate the sound.

But humans are also much better at localizing sounds that come from sources in front of them than from sources by their sides. This may be due in part to differences in the number of neurons available to detect sounds from these different locations. It may also reflect differences in the rates at which those neurons fire in response to sounds. But these factors alone cannot explain why humans are so much better at localizing sounds in front of them.

Pavão et al. showed that the brain has evolved the ability to detect natural patterns that exist in sounds as a result of their location, and to use those patterns to optimize the spatial perception of sounds. They found that the way in which the head and inner ear filter incoming sounds has two consequences for how we perceive them. First, the change in ITD for sounds coming from different sources in front of a person is greater than for sounds coming from their sides. Second, the ITD for sounds that originate in front of a person varies more over time than the ITD for sounds coming from the periphery. By playing sounds to healthy volunteers while removing these differences, Pavão et al. found that natural ITD statistics were correlated with a person’s ability to tell where a sound was coming from.

By revealing the features the brain uses to determine the location of sounds, the work of Pavão et al. could ultimately lead to the development of more effective hearing aids. The results also provide clues to how other senses, including vision, may have evolved to respond optimally to the environment.

Introduction

Humans and other species localize sound sources in the horizontal plane using sub-millisecond interaural time difference (ITD) between signals arriving at the two ears (Middlebrooks and Green, 1991). ITD is detected by auditory brainstem neurons within narrow frequency bands (Goldberg and Brown, 1969; Yin and Chan, 1990; Carr and Konishi, 1990; McAlpine et al., 2001).

Classical psychophysical studies demonstrated that humans detect sound location better in the front than in the periphery (Mills, 1958; Yost, 1974; Makous and Middlebrooks, 1990). Enhanced performance at frontal locations could be efficient for hunting and foraging, as proposed for vision (Collins and Opthalmological Society of the United Kingdom, 1922; Cartmill, 1974; Changizi and Shimojo, 2008). Physiological evidence indicates that coding of binaural spatial cues could support finer discriminability in the front (van Bergeijk, 1962; Feddersen et al., 1957; McAlpine et al., 2001; Grothe et al., 2010). Better sound discrimination and localization in frontal locations can also be predicted from the geometry of the head and the placing of the ears, causing higher ITD rate of change as a function of azimuth in the front (Woodworth, 1938; Feddersen et al., 1957; Gelfand, 2016).

For azimuth detection based on ITD, sound diffraction by structures surrounding the ear can affect the ITD-azimuth relationship (Aaronson and Hartmann, 2014; Roth et al., 1980). In addition, because the brain computes ITD in narrow frequency bands, the interaction of nearby frequencies within a given cochlear filter may also be a source of ITD variability over time (ITDv). Sensory variability is known to be fundamental in change detection. It has been shown that stimulus discrimination depends not only on the mean difference between stimuli but also on the variability of the sensory evidence (Green and Swets, 1966).

This study tested the hypothesis that natural ITD statistics are encoded by the brain and determine ITD perception. Using human HRTF databases and models of cochlear filters, we estimated ITD rate of change and ITDv, and tested whether these statistics combined in a Fisher information metric predicted spatial discrimination thresholds and deviance detection better than ITD rate of change alone. We presented sounds through insert earphones, removing ITD statistics, to determine whether sound location discriminability and spatial deviance detection were predicted by natural ITD statistics independently from the actual stimulus properties. We found that natural ITD statistics were correlated with auditory spatial discriminability and spatial deviance detection. Analysis of classic models of ITD coding (Stern and Colburn, 1978; Harper and McAlpine, 2004) supports the idea that ITD statistics influence the density distribution of ITD tuning, which may be genetically encoded and conserved across human subjects. Thus, our results are consistent with the hypothesis that human brain evolution has incorporated natural statistics of spatial cues into the neural code underlying auditory spatial perception.

Results

ITD statistics, specifically, the derivative (rate of change) of the mean ITD over azimuth (ITDrc) and the standard deviation of ITD over time (ITDv) were estimated from human HRTFs and models of cochlear filters. We first tested whether these ITD statistics predict human spatial discrimination thresholds measured under free-field sound stimulation from previously published datasets (Mills, 1958) and from data collected using tests specifically designed for measuring ITD discrimination through sounds delivered by earphones. Next, we used EEG and mismatch negativity signals (MMN) to address the question of whether these natural ITD statistics influence ITD deviance detection. Finally, we evaluated the compatibility of ITD statistics with the classic neural models for coding ITD.

ITD statistics estimated from human HRTFs and properties of cochlear filters

To test the hypothesis that natural ITD statistics influence the neural code underlying sound localization, we estimated ITDrc and ITDv in sounds reaching the ears of human subjects. The method for estimating the ITD mean and standard deviation (Figure 1A), which was applied across locations and frequencies, included: (1) Impulse responses obtained from publicly available human HRTF databases (Listen HRTF database; 51 subjects) were convolved with acoustic signals, which results in modulation of ongoing phase and gain that depends on both frequency and sound direction; (2) Sound signals were filtered using models of human cochlear filters (Glasberg and Moore, 1990); (3) Instantaneous phase and interaural phase difference (IPD) were extracted from the resulting signals; (4) The mean and standard deviation of instantaneous IPD were computed and converted to ITD to estimate ITD mean and ITD standard deviation over time, at each azimuth and frequency, across subjects (Figure 1B); and (5) ITDrc was calculated as the derivative of mean ITD over azimuth and ITDv was calculated as the standard deviation of ITD over time.
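Steps (3) and (4) can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes the left and right signals have already been convolved with HRIRs and passed through a cochlear filter, obtains instantaneous phase from an FFT-based analytic signal, and uses circular statistics for the IPD mean and standard deviation. The function names, the circular-statistics choice and the test tone are all assumptions made for this example.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real 1-D array using the FFT."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0                     # keep the DC term once
    if n % 2 == 0:
        gain[1:n // 2] = 2.0          # double the positive frequencies
        gain[n // 2] = 1.0            # keep the Nyquist term once
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)


def itd_mean_and_sd(left, right, center_freq_hz):
    """Circular mean and SD of instantaneous IPD, converted to ITD (microseconds).

    Sign convention (an assumption of this sketch): positive ITD means the
    left signal leads the right signal.
    """
    # Instantaneous IPD = phase(left) - phase(right), wrapped to (-pi, pi].
    ipd = np.angle(analytic_signal(left) * np.conj(analytic_signal(right)))
    resultant = np.mean(np.exp(1j * ipd))
    mean_ipd = np.angle(resultant)
    # Clip to 1.0 so rounding error cannot produce the log of a value above 1.
    length = min(np.abs(resultant), 1.0)
    sd_ipd = np.sqrt(-2.0 * np.log(length))
    to_microseconds = 1e6 / (2.0 * np.pi * center_freq_hz)
    return mean_ipd * to_microseconds, sd_ipd * to_microseconds


# Hypothetical check: a 500 Hz tone delayed by 100 microseconds at the right ear.
fs = 48000
t = np.arange(4800) / fs
freq = 500.0
delay_s = 100e-6
left = np.sin(2 * np.pi * freq * t)
right = np.sin(2 * np.pi * freq * (t - delay_s))
mean_us, sd_us = itd_mean_and_sd(left, right, freq)
```

For a pure tone the instantaneous IPD is constant, so the estimated standard deviation is essentially zero; a broadband signal inside one cochlear filter would instead give a nonzero spread over time.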

Figure 1. ITD statistics of natural stimulus.

(A) Estimation of ITD mean and standard deviation over time in broadband signals filtered by human head-related impulse responses (HRIRs) and modeled cochlear filters. (1) Example HRIRs from sound emitted from speakers located at −15 degrees and recorded with microphones positioned in each ear (obtained from a publicly available LISTEN dataset). Traces show example impulse responses in the right (red) and left (blue). (2) A broadband signal was convolved with HRIRs from right (red) and left (blue) ears for each direction. (3) Convolved signals were then filtered using parameters analogous to human cochlear filters. Example of signal passed through a cochlear filter with a frequency band centered on 1000 Hz for the left (blue) and right (red) ears. (4) The instantaneous phase of the resulting signals on the left and right ears was computed. Top, instantaneous phase over time for the left (blue) and right (red) signals shown in 3. Bottom, instantaneous phase differences (IPD, in radians) and instantaneous time differences (ITD, in microseconds) between left and right signals. (5) Histogram of instantaneous IPD and ITD, illustrating their variability over time for the example signal shown in 3. (B) ITD mean (left) and standard deviation (right) over time, as a function of frequency and azimuth. Plots represent median values across subjects (N = 51), fit by spline curves, and color coded for each frequency. The derivative of the curves on the left was used to calculate ITD rate of change (ITDrc) across azimuth. The ITD variability (ITDv) was computed as the standard deviation of the ITD distribution over time. (C) Left, information of ITD cues as a function of frequency and azimuth, quantified by the median square root of ITD Fisher information (√FIITD) across subjects (azimuth was converted to ITD to obtain the estimate of the ITD statistics as a function of frequency and ITD, matching the stimulus metrics and model parameters used in our study). 
The √FIITD statistic closely approximates ITDrc/ITDv. Right, the interquartile range of √FIITD across subjects shows low inter-individual variability. Black lines on each panel indicate the π-limit across frequency, beyond which ITD cues become ambiguous for narrowband sounds. (D) This study tests the hypothesis that over evolutionary and/or ontogenetic time scales the human brain became adapted to natural ITD statistics, such that stimuli that are more informative about sound source location would be distinctively encoded.

The ITDrc and ITDv statistics were combined to compute the Fisher information in ITD at each location and frequency. Estimation theory has shown that the square root of Fisher information relates to discrimination thresholds (Abbott and Dayan, 1999; Brown et al., 2018). Thus, the square root of ITD Fisher information (√FIITD) was the ITD statistic used in this study (see Methods section for details), which closely approximates the ITDrc/ITDv ratio, computed at each location and frequency (Figure 1C-left). √FIITD displayed low variability across individuals (Figure 1C-right), indicating it constitutes a statistic that is largely invariant across human subjects.
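One way to see why √FIITD stays close to ITDrc/ITDv is to assume that instantaneous ITD at each azimuth is Gaussian with mean μ(θ) and standard deviation σ(θ). Under that assumption the Fisher information about azimuth is (μ′² + 2σ′²)/σ², so √FI reduces to |μ′|/σ whenever σ changes slowly with azimuth. The sketch below illustrates this with a toy sine-shaped ITD curve; the Gaussian assumption, the function name and the numbers are illustrative and may differ from the computation in the Methods.

```python
import numpy as np


def sqrt_fisher_information(azimuth_deg, itd_mean_us, itd_sd_us):
    """Return (sqrt(FI), ITDrc/ITDv) for a Gaussian ITD model along azimuth."""
    azimuth_deg = np.asarray(azimuth_deg, dtype=float)
    mean = np.asarray(itd_mean_us, dtype=float)
    sd = np.asarray(itd_sd_us, dtype=float)
    itd_rc = np.gradient(mean, azimuth_deg)   # rate of change of mean ITD
    sd_rc = np.gradient(sd, azimuth_deg)      # rate of change of ITD SD
    fisher = (itd_rc ** 2 + 2.0 * sd_rc ** 2) / sd ** 2
    return np.sqrt(fisher), np.abs(itd_rc) / sd


# Toy curves (not HRTF data): sine-shaped mean ITD and constant variability.
azimuth = np.linspace(-90.0, 90.0, 37)
toy_mean = 700.0 * np.sin(np.radians(azimuth))
toy_sd = np.full_like(azimuth, 50.0)
sqrt_fi, ratio = sqrt_fisher_information(azimuth, toy_mean, toy_sd)
```

With constant variability the two returned arrays coincide, and both peak at the midline, where the toy ITD curve is steepest.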

ITDrc is determined by the shape and filtering properties of the head, including diffractions (Woodworth, 1938; Aaronson and Hartmann, 2014; Roth et al., 1980). These features should affect ITDv as well; however, ITDv also depends on phase and gain modulations induced by interaction between neighboring frequencies, within the critical band of single cochlear filters (Figure 1—figure supplement 1A). Consistently, the correlation between ITDrc and ITDv was not strong (rSpearman = −0.41). These ITD statistics were consistent across broadband signals with different frequency spectra (Figure 1—figure supplement 1B). Additionally, we tested the consistency of ITD statistics across environments, comparing statistics estimated from HRTFs recorded in anechoic and reverberant rooms (database available in http://medi.uni-oldenburg.de/hrir; Kayser et al., 2009). Echoes significantly disrupted ITD statistics; however, the precedence effect is expected to segregate leading signals from their lagging echoes (Wallach et al., 1949; Brown et al., 2015). Accordingly, ITD statistics estimated in anechoic and reverberant environments were similar when signal transients were considered (Figure 1—figure supplement 1C). Finally, the estimated ITDv was equivalent to the variability over trials (obtained from one instantaneous sample of 200 different broadband signals; rSpearman = 0.99), indicating that ITDv exhibits ergodicity. Invariance across contexts is a premise motivating the question of whether ITD statistics are represented in the brain, influencing auditory spatial perception.

Previous studies have examined different auditory cue statistics and how accurately they predict human auditory spatial discriminability. Higher ITDrc in the midline supports the better spatial discrimination observed in frontal locations (Mills, 1972; Gelfand, 2016; Brown et al., 2018). Consistent with previous reports (Woodworth, 1938; Feddersen et al., 1957; Gelfand, 2016), we found that ITDrc was higher in the midline for most frequencies. Additionally, the highest ITDrc occurred at locations distant from the midline at some frequencies, which was also observable in previous studies (Kuhn, 1977; Benichoux et al., 2016).

Across-trial ITD variability induced by concurrent sounds was reported by Cazettes et al., 2014 as a metric relevant to the owl’s auditory spatial perception. When this method was applied to human HRTFs and human cochlear filters, for the range of frequencies of interest in human ITD detection, we found ITD variability values weakly correlated to those obtained with the method used in the present study. However, a stronger correlation between both metrics was observed for the range of frequencies most relevant to owls’ sound localization (above 2000 Hz), suggesting that the effect of concurrent sounds on ITDv may not represent a significant source of ITDv in humans. Młynarski and Jost, 2014 also estimated auditory cue statistics across environments. However, they did so without reporting the location of sound sources, which restricted the use of their estimated statistics for testing the prediction of sound discrimination across locations.

In the current study, we hypothesized that the neural representation of ITD is influenced by natural ITD statistics so that ITD perception is predicted by natural √FIITD (Figure 1D). To test this hypothesis, we investigated the √FIITD prediction accuracy of ITD discrimination performance and novelty detection. Finally, we evaluated the consistency between ITD statistics and frameworks proposed in two classical neural models for ITD coding.

Prediction of spatial discrimination thresholds from ITD statistics

A central hypothesis tested by this study was that a neural code adapted to natural ITD statistics influences ITD-change discriminability (dITD) thresholds even under conditions where ongoing stimulus statistics are constant across frequency and locations (Figure 2A).

ITD statistics predict human ITD-change detection thresholds.

(A) Hypothesis (top) and null hypothesis (bottom) of an adapted neural code underlying human ITD discrimination. (B) Classic study by Mills, 1958 estimated the minimum azimuth change detection across frequency and locations for sounds in free-field averaged across subjects; these measures were converted to threshold dITD as a function of reference ITD (left). Scatter plots on the middle and right show free-field dITD thresholds as a function of ITDrc and √FIITD. (C) Test conducted in the present study to specifically assess dITD thresholds for tonal sounds delivered through headphones (dichotic stimulation). Left, mean dichotic dITD thresholds over subjects as a function of reference ITD across frequency. Middle, dichotic dITD thresholds as a function of ITDrc. Right, dichotic dITD thresholds as a function of √FIITD. Bars indicate 50% confidence intervals of mean dITD thresholds. Black lines represent power functions fit to all the analyzed frequencies (solid) and excluding 250 Hz frequency from the analysis (dotted).

Free-field dITD thresholds as a function of ITD and frequency as reported by a classic study of human sound localization (Mills, 1958) were used to test the hypothesis. In addition, a test measuring dITD thresholds through dichotic (earphone) sound delivery was conducted. Neither of these datasets delivered stimuli carrying natural ITD statistics: they both used tones, which abolishes ITDv, and disables ITDrc estimation by either fixing the head of subjects (in free-field) or by decoupling head movement and ITD input (in dichotic). Thus, the effect of natural ITD statistics influencing the neural representation of ITD could be assessed in both approaches.

We first tested whether natural ITD statistics predicted the free-field dITD thresholds estimated from the previously reported dataset (Mills, 1958). Figure 2B shows the free-field dITD thresholds reported by Mills, 1958 as a function of ITDrc and √FIITD estimated in our study. √FIITD displayed higher correlation with dITD thresholds than ITDrc. These results are consistent with the hypothesis that selectivity for √FIITD statistic (which combines ITDv and ITDrc) may underlie the evolution of the neural code supporting discrimination thresholds.

Additionally, in the current study we presented sounds through earphones instead of free-field stimulation to measure ITD-change discrimination thresholds and evaluate the prediction accuracy of ITD statistics. This avoids potential effects of ongoing ITD statistics and the influence of other sound localization cues. The 24 normal-hearing adults who participated in the testing were instructed to detect a change in ITD within a pair of tonal sounds (Methods), which allowed us to obtain dichotic dITD thresholds across reference ITD and frequency, for the range of interest (Figure 2C-left). The dichotic dITD thresholds averaged across subjects correlated with free-field thresholds reported by Mills, 1958 (rSpearman = 0.64). However, dITD thresholds were higher in the present study than in Mills, 1958; other studies using dichotic stimuli (e.g. Brughera et al., 2013) also reported higher dITD thresholds than Mills, 1958. This may be due to differences in approach, such as presentation of sounds through earphones vs. free-field stimulation, and testing untrained subjects in the present study rather than highly trained individuals as was done in Mills, 1958. Using free-field stimulation leaves open the possibility that listeners rely on cues other than ITD to detect sound location in azimuth, which may have lowered the thresholds found in Mills, 1958. Additionally, training in ITD detection may affect threshold levels relative to those of untrained individuals.

We computed the average dichotic dITD thresholds across participants and quantified the Spearman correlation between them and ITD statistics estimated in our study. When all frequency conditions were analyzed, average dichotic dITD thresholds showed moderate correlation with ITDrc (Figure 2C-middle) and √FIITD (Figure 2C-right). This was particularly influenced by low correlation for dITD thresholds for 250 Hz tones. Higher thresholds for this frequency have previously been reported (Brughera et al., 2013). Exclusion of this frequency substantially improved √FIITD’s prediction accuracy (Figure 2C-right), suggesting that dITD thresholds at 250 Hz may be determined by additional parameters not addressed by the ITD statistics investigated in our study. Comparing the prediction accuracy of these statistics using linear mixed-effects models (Materials and methods) resulted in the same outcome as the Spearman correlation analysis. This provides further support for the hypothesis that both ITDrc and ITDv statistics, combined in √FIITD, influence the ITD neural code underlying discrimination thresholds.
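The two analyses referred to here, a Spearman rank correlation and a power-function fit of thresholds against an ITD statistic (the black lines in Figure 2), can be sketched as follows. This is a simplified illustration on synthetic numbers, not the study's data: the rank function ignores ties, and fitting the power function by least squares on log-log axes is an assumption about the fitting procedure.

```python
import numpy as np


def rank(values):
    """Ranks starting at 1 (ties are not averaged in this simplified version)."""
    order = np.argsort(values)
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return np.corrcoef(rank(x), rank(y))[0, 1]


def fit_power_function(x, y):
    """Fit y = a * x**b by linear least squares on log-log axes."""
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(log_a), b


# Synthetic example: thresholds that fall as the ITD statistic grows.
toy_sqrt_fi = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
toy_thresholds = 40.0 * toy_sqrt_fi ** -0.5
rho = spearman(toy_sqrt_fi, toy_thresholds)
a, b = fit_power_function(toy_sqrt_fi, toy_thresholds)
```

Because Spearman correlation depends only on rank order, any monotonically decreasing relationship gives a coefficient of −1, which is why the power-function fit is a useful complement for describing the shape of the relationship.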

Neural code underlying deviance detection is adapted to ITD statistics

To further test the idea that ITD statistics are correlated with auditory spatial perception, we assessed the ability to detect spatial deviants from a standard sound location. We measured the mismatch negativity (MMN) component of event-related brain potentials (ERPs) (Näätänen et al., 1978) for sounds coming from standard (repeated) and deviant (sporadic) spatial locations. MMNs are observable when a sequence of standard stimuli is unexpectedly interrupted by a low probability stimulus, without the listener making an overt response (Näätänen et al., 1978; Pakarinen et al., 2007; Sussman, 2007; Sussman et al., 2014; Figure 3B-left). Thus, the MMN signals provide a direct brain measure of discriminability that does not require training subjects to perform behavioral tasks. The MMN signal is obtained by subtracting the mean ERP response elicited by the standard stimuli from the mean ERP elicited by deviant stimuli. The amplitude of the MMN indexes discriminability between standard and deviant sounds. The larger the separation in tone features between standard and deviant stimuli, or the larger the perceived difference between standard and deviant, the more negative the MMN amplitude (Deouell et al., 2006; Sams et al., 1985; Pakarinen et al., 2007; Tiitinen et al., 1994). Thus, MMN was used to test whether natural ITD statistics influence the magnitude of ITD deviance detection (Figure 3A).
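The subtraction that defines the MMN can be written compactly. The sketch below is a hypothetical illustration on synthetic epochs: it averages trials for each stimulus type, forms the deviant-minus-standard difference wave, and takes the most negative value in a 100–200 ms window (the latency range shown in Figure 3). The array layout, the function name and the peak-picking rule are assumptions of this example.

```python
import numpy as np


def mmn_peak(times_ms, standard_epochs, deviant_epochs, window_ms=(100.0, 200.0)):
    """Return the difference wave and its most negative value in the window.

    Epoch arrays are assumed to have shape (n_trials, n_samples).
    """
    difference = deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)
    in_window = (times_ms >= window_ms[0]) & (times_ms <= window_ms[1])
    return difference, difference[in_window].min()


# Synthetic epochs: flat standards, deviants with a 2 microvolt dip at 150 ms.
times = np.arange(-100.0, 400.0, 2.0)
dip = -2.0 * np.exp(-((times - 150.0) / 25.0) ** 2)
standard = np.zeros((40, times.size))
deviant = np.tile(dip, (10, 1))
difference, peak = mmn_peak(times, standard, deviant)
```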

ITD statistics predict discriminability of spatial deviants indexed by MMN responses.

(A) Hypothesis (top) and null hypothesis (bottom) of an adapted neural code underlying MMN responses to spatial deviants tested in this study. Under a neural code relying on natural ITD statistics, the correlation between amplitude of MMN responses and difference between deviant and standard ITD is expected to show a synergistic effect of ITD statistics. (B) Left, passive oddball sequence protocol, in which subjects listened to frequent ‘standard’ stimuli embedded with rare spatial ‘deviants’. In each condition, two tones were presented with the same frequency and distinct ITDs. Right, MMN response within the 100–200 ms latency range of the deviant-minus-standard trace (black line) is shown for the midline frontal electrode (FZ) along with standard (green) and deviant (purple) event related potential traces, averaged across conditions and subjects. Inset on the bottom-right shows the topography of the MMN response. (C) Left, coefficients of correlations between MMN amplitude and different predictor equations adjusting ITD difference between standard and deviant by ITD statistics, as a function of the relative weight of the standard stimulus (ws), relative to the weight of the deviant (wd). Middle, best prediction of MMN amplitude in the model relying on √FIITD, weighting standard more than deviant (80%:20%). Right panel, changes in MMN peak amplitude as a function of the difference between ITD of deviant and standard show stronger negative linear slopes for conditions where the weighted average of √FIITD was higher, compared to conditions with lower √FIITD.

A set of ITD and frequency conditions was selected and presented to 33 normal-hearing adults in order to sample critical ranges drawn from the HRTF analysis (Materials and methods). Frequencies of 400, 550, 600 and 650 Hz were chosen because ITDrc and ITDv changed as a function of azimuth in a manner that could maximize the difference in prediction accuracy across ITD statistics. Frequencies lower than 400 Hz were not tested because of observed distortion in the sound stimulation system, while frequencies above 650 Hz were excluded to avoid phase ambiguity confounds. MMN signals were measured separately across participants and conditions. The averaged peak amplitude of MMN was used to quantify the subject’s capacity to discriminate between ITDs of the standard and deviant. The characteristic fronto-central scalp topography of the MMN responses was observed (Giard et al., 1990; Figure 3B-right).

We then examined the prediction accuracy of MMN amplitude of model equations relying on the absolute difference between the ITDs of standard and deviant stimuli adjusted by the weighted sum of ITD statistics of standard and deviant stimuli. The equation we used to test the prediction of MMN amplitude by √FIITD was:

$$\mathrm{MMN}_{peak} \sim |\mathrm{ITD}_d - \mathrm{ITD}_s| \left( w_s \sqrt{FI_{ITD_s}} + (1 - w_s) \sqrt{FI_{ITD_d}} \right),$$

where ITDs and ITDd are the ITD of standard and deviant, ws and 1-ws are the relative weights of the standard and deviant, and √FIITDs and √FIITDd are the estimated √FIITD values corresponding to the frequency and ITD of the standard and deviant stimuli.
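As a concrete reading of this predictor, the absolute ITD difference between deviant and standard is scaled by a weighted average of the √FIITD values at the two stimuli. The short sketch below evaluates it for one hypothetical condition at several standard weights; the function name and all numeric values are made up for illustration.

```python
import numpy as np


def mmn_predictor(itd_s, itd_d, sqrt_fi_s, sqrt_fi_d, w_s):
    """|ITD_d - ITD_s| scaled by the weighted average of sqrt(FI) values.

    w_s is the relative weight of the standard; the deviant gets 1 - w_s.
    """
    weighted_fi = w_s * sqrt_fi_s + (1.0 - w_s) * sqrt_fi_d
    return np.abs(itd_d - itd_s) * weighted_fi


# Hypothetical condition: standard at 0 us, deviant at 200 us,
# with sqrt(FI) of 2.0 at the standard and 1.0 at the deviant.
weights = np.array([0.0, 0.5, 0.8, 1.0])
predictions = mmn_predictor(0.0, 200.0, 2.0, 1.0, weights)
```

Sweeping `w_s` from 0 to 1 in this way, and correlating the predictor with measured MMN amplitudes at each weight, is the kind of scan summarized in Figure 3C-left.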

Figure 3C-left shows the Spearman correlation between each of the predictors’ output and the amplitude of MMN peaks (averaged across subjects) as a function of the weight of the standard. The highest correlation was found when multiplying the ITD difference between standard and deviant by √FIITD, and assigning 80% weight to the standard and 20% to the deviant (Figure 3C-middle). Comparing the prediction accuracy of model equations using linear mixed-effects models (Materials and methods) yielded the same results as the Spearman correlation analysis. Figure 3C-right shows that conditions with higher weighted √FIITD display larger changes in MMN amplitude as a function of the difference between ITD of deviant and standard than conditions with low weighted √FIITD. The good prediction of MMN by the model relying on √FIITD further supports the idea that combined ITDrc and ITDv are critical in auditory spatial perception.

Classic neural models of ITD discriminability are consistent with a representation of ITD statistics

Two classic models of neural coding underlying discriminability of azimuth positions in acoustic space based on ITD (Stern and Colburn, 1978; Harper and McAlpine, 2004) were used to examine the potential link between the brain representation of sensory statistics and perceptual functions. The model by Stern and Colburn, 1978 postulated an increased density of pairs of fibers underlying tuning to ITDs near the midline, under a labeled-line code framework, as the basis for increased ITD discriminability in the front (Figure 4A). This density distribution showed high correlation with √FIITD (Figure 4A) and high accuracy in predicting the experimental data on dITD thresholds and ITD deviance detection. Additionally, the density distribution of the model was adjusted to match ITD statistics (Figure 4B) by defining the density of cells tuned to each ITD as a linear transformation of √FIITD. The Stern and Colburn, 1978 model required only minor changes to the originally proposed density distribution to represent the √FIITD pattern. This indicates that this seminal model, which explains multiple experimental findings, is consistent with a density distribution of ITD tuning influenced by the natural ITD statistics.

Classic models of neural properties underlying ITD discriminability and their potential for explaining encoding of ITD statistics.

(A) Distribution of internal delays replotted from Stern and Colburn, 1978, Figure 2b, which proposes a higher density of pairs of fibers encoding frontal ITDs. (B) The density of pairs of fibers proposed by Stern and Colburn, 1978 as the mechanism underlying ITD discriminability could effectively achieve the representation of ITD statistics: the density was adjusted to match the pattern of √FIITD. Note that the adjusted distribution largely preserves the shape of the distribution of the original model. (C) Distribution of IPD-tuning maximizing coding across the physiological range of ITD, as proposed by Harper and McAlpine, 2004. Top, single-neuron Fisher information as a function of IPD. Bottom-left, distribution of best IPDs across frequency expected for humans under the framework proposed by the authors; white straight lines indicate the physiological range determined by the distance between ears. Bottom-right, reconstructed neuron population Fisher information, converted from IPD to ITD for each frequency to obtain the predicted ITD discriminability; black curved lines indicate the π-limit, beyond which ITD cues become ambiguous within narrow frequency bands. Spearman correlation coefficients for the relationship between population Fisher information and ITDrc and √FIITD are indicated above. (D) The IPD-tuning distribution proposed by Harper and McAlpine, 2004 as a mechanism underlying ITD discriminability was adjusted to match the neuron population Fisher information to √FIITD. Top, the neuron distributions matching ITD statistics depict best IPDs away from midline across frequency, consistent with a coding strategy based on two clustered subpopulations tuned to IPDs away from the front (McAlpine et al., 2001; Harper and McAlpine, 2004; Hancock and Delgutte, 2004; Pecka et al., 2008). Bottom, the neuron population Fisher information was highly correlated with the ITD statistics.

On the other hand, the model postulated by Harper and McAlpine, 2004 relies on the maximization of Fisher information of firing rate within the physiological ITD range to explain the optimal IPD-tuning distribution in a population of brainstem neurons under a rate code framework. This model provides an explanation for the tuning to peripheral ITDs reported in brainstem recordings of mammals (McAlpine et al., 2001; Hancock and Delgutte, 2004; Pecka et al., 2008). We followed the model’s method of calculating Fisher information of individual neurons’ IPD-tuning curves (Figure 4C, top) and their proposed distribution of best IPDs of the neuronal population in humans (Figure 4C, bottom-left) for computing the population’s Fisher information of firing rate (Figure 4C, bottom-right). The population’s Fisher information of firing rate across ITD and frequency, which would correspond to the ITD discriminability predicted by this neural model, showed low correlation with both ITDrc and √FIITD (Figure 4C, bottom-right). This is expected from the premise of the Harper and McAlpine, 2004 model of a uniform maximization of Fisher information by firing rate across the physiological ITD range, as it differs from the higher frontal information predicted by the current study. The Harper and McAlpine, 2004 model is, however, consistent with the low prediction accuracy of dITD thresholds and ITD deviance detection results (Figures 2 and 3).
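The population computation described above can be illustrated under simplifying assumptions. If each neuron fires with independent Poisson variability and mean rate f(IPD), its Fisher information is f′(IPD)²/f(IPD), and the population information is the sum over neurons. The sketch below uses cosine-shaped tuning curves and two hypothetical subpopulations; the tuning shape, rates, best IPDs and function name are illustrative choices, not the parameters of Harper and McAlpine, 2004.

```python
import numpy as np


def population_fisher_information(ipd, best_ipds, peak_rate=50.0, baseline=1.0):
    """Sum over neurons of f'(ipd)**2 / f(ipd), assuming Poisson spiking.

    Each neuron has a cosine tuning curve centered on its best IPD (radians).
    """
    ipd = np.asarray(ipd, dtype=float)
    total = np.zeros_like(ipd)
    for best in best_ipds:
        rate = baseline + 0.5 * peak_rate * (1.0 + np.cos(ipd - best))
        slope = -0.5 * peak_rate * np.sin(ipd - best)   # derivative of rate
        total += slope ** 2 / rate
    return total


# Two hypothetical subpopulations tuned symmetrically away from the midline.
ipd_axis = np.linspace(-np.pi / 2, np.pi / 2, 181)
best_ipds = [-np.pi / 4, np.pi / 4]
population_fi = population_fisher_information(ipd_axis, best_ipds)
```

A neuron contributes no information at its own best IPD, where its tuning curve is flat, and most on the slopes of the curve; this is why the placement of best IPDs shapes where the population information is concentrated.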

Furthermore, we tested whether a model relying on the firing rate Fisher information could also match ITD statistics (Figure 4D). Towards this goal, the density of neurons tuned to each IPD of the model was changed in order to make the neural information correlated with √FIITD. This consisted of designing neural populations with a density of preferred ITDs resulting in Fisher information of their ITD tuning curves being equal to √FIITD. The IPD-tuning distribution that generates a neural population Fisher information of firing rate matching √FIITD differs from the IPD-tuning distribution originally proposed by Harper and McAlpine, 2004. However, matching ITD statistics under this coding framework predicted neurons’ IPD-tuning clustered within peripheral IPDs across the frequency range tested, consistent with reports of brainstem recordings in mammals (McAlpine et al., 2001; Hancock and Delgutte, 2004; Pecka et al., 2008).

These results suggest the mechanisms underlying ITD discriminability proposed by Stern and Colburn, 1978 and Harper and McAlpine, 2004 are consistent with coding frameworks adapted to natural ITD statistics, providing a plausible biological connection between the coding of sensory statistics and perceptual functions.

Discussion

Different explanations for the greater discriminability of sound locations in the frontal region have been proposed, including an uneven density of brainstem ITD-sensitive neurons under a labeled-line code framework (Colburn, 1977; Stern and Colburn, 1978) and greater change in the firing rate of these neurons as a function of azimuth in the front compared to the periphery under a rate code framework (McAlpine et al., 2001; Harper and McAlpine, 2004). Mechanisms based on the spatial information carried by auditory stimuli have also been invoked, such as the rate of change of ITD as a function of azimuth (ITDrc) (Mills, 1972; Gelfand, 2016). Our study proposes a new factor influencing the amount of spatial information carried by auditory stimuli, the ITDv. These statistics combined in the square root of ITD Fisher information (√FIITD) were good predictors of ITD discriminability and spatial novelty detection, supporting the hypothesis that natural ITD statistics determine the neural code underlying human sound localization. Finally, we showed that the models of Stern and Colburn, 1978 and Harper and McAlpine, 2004 can reflect the encoding of ITD statistics, thereby providing a functional connection between neural coding frameworks proposed by these models and experimental data on ITD perception.

Previous reports proposed connections between neural network properties and natural stimulus statistics by investigating the selectivity of midbrain neurons to the variability of spatial cues in the owl’s auditory system (Cazettes et al., 2014; Fischer and Peña, 2017). These studies provided evidence of how sensory reliability could be represented (Fischer and Peña, 2011; Rich et al., 2015; Cazettes et al., 2016) and integrated into an adaptive behavioral command (Cazettes et al., 2018). Although properties of the neural mechanisms underlying human and owl sound localization differ in frequency range and putative ITD coding schemes (Schnupp and Carr, 2009), studies in both species support the concept that natural ITD statistics guide ITD processing.

This study specifically investigated whether ITDrc and ITDv, inherent in natural acoustic scenes, are relevant parameters determining ITD discriminability. We tested this hypothesis using discrimination thresholds obtained through free-field (Mills, 1958) and dichotic stimulation protocols that disabled natural ITDrc and ITDv statistics. We found that the integration of these ITD statistics based on Fisher information (√FIITD) was the best predictor of discrimination thresholds of spatial changes across frequency and location. This suggests that the neural code is adapted to the combination of ITDrc and ITDv statistics. However, the higher dichotic dITD thresholds for 250 Hz tonal stimuli (also reported by Brughera et al., 2013) constituted an outlier, indicating limited predictive power of ITD discriminability at this particular frequency by the ITD statistics investigated in this study. Although additional factors may determine discriminability in lower frequencies, our results overall are consistent with the notion that natural √FIITD statistics can modulate the neural code underlying human sound localization.

We also implemented an MMN paradigm to obtain converging evidence that natural ITD statistics influence spatial perception. Although both ITD-change detection thresholds and the MMN novelty detection paradigms required discriminating a change in ITD, novelty detection also involves identification of a repetitive (standard) pattern. Furthermore, the combination of ITDrc and ITDv (√FIITD) was a better predictor of the deviance detection response, the MMN component, than ITDrc alone, which is also consistent with results of psychophysical discriminability thresholds.

Parras et al., 2017 developed an approach designed to isolate the relative contribution of prediction errors and repetition suppression in novelty detection. In the present study, natural ITD statistics might have modulated novelty detection at both levels. However, the model that best described the MMN signals in the current study weighted the difference between the ITDs of standard and deviant stimuli primarily by the ITD statistics of the standard. Other factors with potential influence on detecting ITD changes in a reference location are attention and training. The novelty detection protocol controlled for these factors because the MMN indexes a brain response to detected deviations irrespective of attention or training. Our finding that prediction of novelty detection signals was based primarily on the ITD statistics of the standard stimulus is consistent with the interpretation that natural ITD statistics are critical for pattern detection. Our results indicate that when the standard stimulus is in a location of higher statistical discriminability, a ‘stronger’ standard is built, which makes deviance detection easier. We speculate that the different weights for standards and deviants are the result of the mechanism underlying the building up of a standard, which requires repetitive stimulation.

Finally, our results support classic neural models of ITD coding. The compatibility of ITD statistics with classic neural models of ITD coding suggests that ITD statistics provide a potential mechanism influencing the density distribution of ITD tuning. The critical parameter of the density distribution of fiber pairs encoding interaural delays that the Stern and Colburn, 1978 model relied on to explain ITD discriminability was correlated with our ITD statistics. Thus, this model prediction matches the experimental data. Additionally, the coding scheme of the model proposed by Harper and McAlpine, 2004 is a plausible framework for our results, in which ITD discriminability is predicted by the neural population Fisher information. When the neural population Fisher information was modeled to match the ITD Fisher information, the predicted distribution of ITD tuning resembled experimental observations in the brainstem of mammalian species (McAlpine et al., 2001; Harper and McAlpine, 2004; Hancock and Delgutte, 2004; Pecka et al., 2008).

In sum, we found evidence that natural ITD statistics are correlated with auditory spatial perception, supporting the idea that these statistics may determine the density distribution of ITD tuning in the auditory system and influence auditory spatial perception. The consistency across subjects indicates that this information may be genetically encoded and conserved, and serve as a potentially adaptive evolutionary mechanism for approaching optimal performance. Such a mechanism would be useful where larger ITD changes are required for detecting shifts in location for regions of space and frequency levels at which ITD discriminability is naturally weaker. These results have clinical implications in identifying stimulus parameters that are relevant to spatial discrimination and novelty detection that may lead to the development of more efficient hearing-aid devices.

Materials and methods

HRTF measurement

The dataset used in this study consisted of head-related impulse responses collected at the Institute for Research and Coordination in Acoustics/Music (IRCAM) from 2002 to 2003, available to the public at the LISTEN HRTF website http://recherche.ircam.fr/equipes/salles/listen. The procedure was performed inside an anechoic room with walls that absorbed sound waves above 75 Hz. The pulse signals were played by TANNOY 600 speakers facing the subjects, at a distance of 1.95 m from the center of the head. Subjects were seated on an adjustable rotating chair with a position sensor placed on the top of the head, allowing recording only when the position was correct. The impulse sounds were collected with a pair of small Knowles FG3329 microphones, calibrated using a B&K 4149 microphone. These small microphones were mounted on a silicon structure, which occluded the ear canals of the subjects, avoiding resonance and placing the microphone sensor at the entrance of the ear canal. The signal captured by the microphones was driven to a custom-made amplifier with 40 dB gain, and recorded using an RME sound card interface with a Max/MSP real-time library, which deconvolved the microphone signal.

HRTF analysis

HRTF data from 51 subjects were included in the analysis (Figure 1). Head-related impulse responses (h) for the left (L) and right (R) ears corresponding to speaker locations at 0 degrees in elevation and −90 to 90 degrees in azimuth (θ) were denoted as a function of time, hL,θ(t) and hR,θ(t). The azimuth in the database was sampled in 15-degree steps. Impulse responses were convolved with a white noise signal of 1 s duration s(t) to model the signals (x) received at the left and right ears:

$$x_L(t) = h_{L,\theta}(t) * s(t)$$
$$x_R(t) = h_{R,\theta}(t) * s(t)$$

where * denotes convolution.

This procedure transfers temporal and level effects of the filtering properties of the head to the convolved signal. These convolved signals were filtered by narrow-band filters modeling cochlear processing, using the gamma-tone filter bank from Malcolm Slaney’s Auditory Toolbox, available at https://engineering.purdue.edu/~malcolm/interval/1998-010. Gamma-tone filters are described by the following cochlear impulse response equation,

$$g(t; f_k) = t^{3}\, e^{-t/\tau_k} \cos(2\pi f_k t)\, U(t),$$

where U(t) is the unit step function and the center frequencies of the filters (fk) ranged from 250 to 1250 Hz in 5 Hz steps. These center frequencies are within the range where ITD is a primary spatial binaural cue (Rayleigh, 1907) and also correspond with the frequency range of thresholds estimated by Mills, 1958. The time constants (τk) were chosen such that the bandwidth of these filters matched the estimated bandwidth of human cochlear filters (Glasberg and Moore, 1990).
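As an illustration of the impulse response above, the following minimal sketch evaluates a gamma-tone filter for one center frequency. It is not the Auditory Toolbox implementation. The equivalent rectangular bandwidth formula is the one commonly attributed to Glasberg and Moore, 1990; the mapping from bandwidth to time constant (including the factor 1.019), the sampling rate, and all function names are assumptions made for this example.

```python
import math

def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth (Glasberg and Moore, 1990), in Hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def time_constant_s(fc_hz, bandwidth_scale=1.019):
    """Assumed mapping from ERB to the envelope time constant tau_k."""
    return 1.0 / (2.0 * math.pi * bandwidth_scale * erb_hz(fc_hz))

def gammatone(t_s, fc_hz):
    """g(t; f_k) = t^3 exp(-t / tau_k) cos(2 pi f_k t) U(t)."""
    if t_s < 0.0:  # unit step U(t): the filter is causal
        return 0.0
    tau = time_constant_s(fc_hz)
    return t_s ** 3 * math.exp(-t_s / tau) * math.cos(2.0 * math.pi * fc_hz * t_s)

# Center frequencies from 250 to 1250 Hz in 5 Hz steps, as in the text.
center_freqs_hz = list(range(250, 1251, 5))

fs_hz = 44100  # assumed sampling rate, for illustration only
impulse_500 = [gammatone(n / fs_hz, 500.0) for n in range(int(0.05 * fs_hz))]
```

Convolving `impulse_500` with an ear signal would give the output of the 500 Hz channel; repeating this for every entry of `center_freqs_hz` yields the full filter bank.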

The outputs of the gamma-tone filter bank on the left (yL(t;fk)) and right (yR(t;fk)) sides were computed by convolving left- and right-ear input signals with gamma-tone filters,

$$y_L(t; f_k) = g(t; f_k) * x_L(t)$$
$$y_R(t; f_k) = g(t; f_k) * x_R(t).$$

Instantaneous phase was then computed for these output signals using the Signal Processing Toolbox (Mathworks). The instantaneous phase was computed as the phase (argument; arg) of the analytic representation of the signal,

$$\theta_L(t; f_k) = \arg\{\, y_L(t; f_k) + i\,\hat{y}_L(t; f_k) \,\}$$
$$\theta_R(t; f_k) = \arg\{\, y_R(t; f_k) + i\,\hat{y}_R(t; f_k) \,\}$$

where y is the signal and ŷ is its Hilbert transform.

For each azimuth (θ) and frequency range (fk), we then calculated the instantaneous interaural phase difference (IPD) over time,

$$\mathrm{IPD}(t; f_k) = \theta_R(t; f_k) - \theta_L(t; f_k)$$

where IPD(t;fk) is in radians.
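The phase and IPD computation can be sketched as follows. This is a dependency-free illustration, not the Signal Processing Toolbox routine used in the study: the analytic signal is built with a naive discrete Fourier transform, which is only practical for short demonstration signals, and the final wrapping of the IPD to the interval from −π to π is an assumption added here. All function names are hypothetical.

```python
import cmath
import math

def analytic_signal(y):
    """Return y + i*H[y] (H = Hilbert transform) using a naive DFT."""
    n = len(y)
    spectrum = [sum(y[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
                for m in range(n)]
    # Keep DC (and Nyquist for even n), double positive bins, zero negative bins.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive_bins = range(1, n // 2)
    else:
        positive_bins = range(1, (n + 1) // 2)
    for m in positive_bins:
        weights[m] = 2.0
    return [sum(weights[m] * spectrum[m] * cmath.exp(2j * math.pi * m * k / n)
                for m in range(n)) / n
            for k in range(n)]

def instantaneous_phase(y):
    """Phase (argument) of the analytic representation of y."""
    return [cmath.phase(z) for z in analytic_signal(y)]

def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def instantaneous_ipd(y_left, y_right):
    """IPD(t) = theta_R(t) - theta_L(t), in radians."""
    theta_l = instantaneous_phase(y_left)
    theta_r = instantaneous_phase(y_right)
    return [wrap(r - l) for l, r in zip(theta_l, theta_r)]

# Demo: two tones with 8 full cycles in 128 samples; the right leads by 0.5 rad.
n_samples, cycles, lead_rad = 128, 8, 0.5
y_left = [math.cos(2.0 * math.pi * cycles * k / n_samples) for k in range(n_samples)]
y_right = [math.cos(2.0 * math.pi * cycles * k / n_samples + lead_rad)
           for k in range(n_samples)]
ipd = instantaneous_ipd(y_left, y_right)
```

In this demo every element of `ipd` should be close to the imposed 0.5 rad phase lead; with real filter-bank outputs the IPD fluctuates over time, which is what ITDv quantifies.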

The circular mean and standard deviation of the instantaneous IPD over time were then computed. To avoid the ITD rate of change being corrupted by artificial values caused by phase ambiguity, we unwrapped (MATLAB function) the mean IPD over azimuth, and subtracted the value 2π repeatedly (from all IPD values jointly) until the IPDs corresponding to midline locations returned to the value before the shift. Finally, the circular mean and standard deviation of IPD were converted to ITD (in µs) using the following equation:

$$\mathrm{ITD} = 10^{6}\,\frac{\mathrm{IPD}}{2\pi f}.$$

All the HRTF analysis steps described above are shown in Figure 1A. The mean ITD across azimuth was interpolated using a cubic spline (Figure 1B-left), and the rate of change of ITD across azimuth (ITDrc) was calculated as the derivative of this curve. The standard deviation of ITD (ITDv) was interpolated using the same method (Figure 1B-right) and the derivative of ITDv was calculated as the derivative of this curve.

We next combined the ITDrc and ITDv in a single quantity that is related to the discriminability of sound locations using ITD. The discriminability of a stimulus θ based on a measurement m(θ) is often described in terms of the Fisher information

$$FI(\theta) = E\!\left[\,-\frac{\partial^{2}}{\partial\theta^{2}} \log p(m \mid \theta)\;\middle|\;\theta\right],$$

where p(m|θ) is the conditional probability of the measurement given the stimulus. In our analysis, θ refers to azimuth location, while m(θ) is the ITD computed from the output of the left and right cochlear filters at a given frequency which is used to infer the stimulus.

We also assume that the conditional probability of ITD given azimuth p(ITD|θ) is a Gaussian distribution with mean μ(θ) and standard deviation σ(θ). Substituting the Gaussian conditional probability p(ITD|θ) into the definition of the Fisher information (Abbott and Dayan, 1999), the formula reduces to

$$FI(\theta) = \left(\frac{\mu'(\theta)}{\sigma(\theta)}\right)^{2} + 2\left(\frac{\sigma'(\theta)}{\sigma(\theta)}\right)^{2},$$

where μ'(θ) is the ITD rate of change (ITDrc), σ(θ) is the standard deviation of ITD (ITDv) and σ'(θ) is the derivative of the standard deviation of ITD (ITDv') with respect to azimuth. Discriminability has been shown to be proportional to the square root of the Fisher information, so that discrimination thresholds are inversely proportional to it (Abbott and Dayan, 1999); we therefore computed the square root of the ITD Fisher information (√FIITD).
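For readers who want the intermediate steps, the reduction to this expression can be sketched as follows. The sketch uses the equivalent form of the Fisher information as the expected squared derivative of the log-likelihood, and the standardized measurement z, a symbol introduced here only for illustration. For a Gaussian, E[z] = 0, E[z²] = 1, E[z³] = 0 and E[z⁴] = 3.

```latex
\begin{aligned}
\log p(m \mid \theta) &= -\log \sigma(\theta) - \tfrac{1}{2}\log 2\pi - \tfrac{1}{2} z^{2},
  \qquad z = \frac{m - \mu(\theta)}{\sigma(\theta)}, \\
\frac{\partial}{\partial\theta} \log p(m \mid \theta)
  &= \frac{\mu'(\theta)}{\sigma(\theta)}\, z
   + \frac{\sigma'(\theta)}{\sigma(\theta)}\,\bigl(z^{2} - 1\bigr), \\
FI(\theta)
  &= E\!\left[\left(\frac{\partial}{\partial\theta} \log p(m \mid \theta)\right)^{2} \,\middle|\, \theta\right] \\
  &= \left(\frac{\mu'}{\sigma}\right)^{2} E[z^{2}]
   + 2\,\frac{\mu'\sigma'}{\sigma^{2}}\, E\bigl[z\,(z^{2} - 1)\bigr]
   + \left(\frac{\sigma'}{\sigma}\right)^{2} E\bigl[(z^{2} - 1)^{2}\bigr] \\
  &= \left(\frac{\mu'(\theta)}{\sigma(\theta)}\right)^{2}
   + 2\left(\frac{\sigma'(\theta)}{\sigma(\theta)}\right)^{2}.
\end{aligned}
```

The cross term vanishes because E[z³] = E[z] = 0, and the factor 2 in the last term comes from E[(z² − 1)²] = 3 − 2 + 1 = 2.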

The second term in the equation, which is often absent in calculations of Fisher information, is included in our analysis because the standard deviation of ITD changes with direction. Note that when the derivative of standard deviation σ'(θ) is zero, the square root of the Fisher information simplifies to μ'(θ)/σ(θ), the same as ITDrc/ITDv. This first term in the equation is conceptually similar to the d-prime metric; however, while d-prime is the subtraction of two means divided by the standard deviation, this part of the equation is the derivative of the mean divided by the standard deviation.
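The combined statistic can be computed directly from the three quantities defined above. The following is a minimal sketch; the function name, argument names and numeric values are hypothetical and chosen only to show that the expression reduces to ITDrc/ITDv when the derivative of ITDv is zero.

```python
import math

def sqrt_fisher_information(itd_rc, itd_v, itd_v_slope=0.0):
    """sqrt(FI) = sqrt((mu'/sigma)^2 + 2 (sigma'/sigma)^2).

    itd_rc      -- mu'(theta), ITD rate of change across azimuth
    itd_v       -- sigma(theta), standard deviation of ITD over time
    itd_v_slope -- sigma'(theta), derivative of ITDv across azimuth
    """
    if itd_v <= 0.0:
        raise ValueError("ITDv (sigma) must be positive")
    return math.sqrt((itd_rc / itd_v) ** 2 + 2.0 * (itd_v_slope / itd_v) ** 2)

# Hypothetical values: ITDrc = 8 us/degree, ITDv = 20 us.
flat_variability = sqrt_fisher_information(8.0, 20.0)          # sigma' = 0
changing_variability = sqrt_fisher_information(8.0, 20.0, 1.0)  # sigma' = 1 us/degree
```

With σ' = 0 the result equals |ITDrc|/ITDv; a nonzero σ' can only increase the statistic, because the second term is never negative.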

Finally, azimuth was converted to ITD (using the relationship between azimuth and ITD determined from the HRTFs), obtaining an estimate of the ITD statistics across frequency and ITD. The ITD statistics were computed for each subject, and their median and interquartile range across subjects were then computed for each combination of azimuth and frequency (Figure 1C).

Estimation of spatial discriminability thresholds from previously published datasets (Mills, 1958)

Human spatial discriminability thresholds were estimated in the classic Mills, 1958 study. Data collection for this study was performed inside an anechoic room with a movable speaker, which delivered tones of different frequencies. The three participants were blindfolded and had their heads fixed by a clamp mounted on the chair on which they were sitting. In each trial, a 1 s duration ‘reference azimuth’ tone was played first, and 1 s after, the same signal played again after the speaker was moved slightly to the left or to the right. Subjects reported the left or right change using an interface box. Psychometric functions were obtained plotting the proportion of judgments to the left and to the right against the angle between reference and test locations. The azimuth changes leading to 75% responses to the right and to the left were estimated by linear interpolation. The threshold angle for discriminating a change was estimated by dividing the distance between these values by 2.
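The threshold estimate described above can be sketched as follows, assuming the psychometric function is expressed as the proportion of ‘right’ judgments at each angular offset, so that 75% ‘left’ responses correspond to 25% ‘right’ responses. The data points and function names are hypothetical and are not taken from Mills, 1958.

```python
def interpolate_crossing(angles_deg, proportions, target):
    """Angle at which an increasing psychometric function reaches target,
    found by linear interpolation between neighboring data points."""
    points = list(zip(angles_deg, proportions))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= target <= p1 and p1 > p0:
            return x0 + (target - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("target proportion is not bracketed by the data")

def threshold_angle(angles_deg, prop_right):
    """Half the distance between the 75% 'right' and 75% 'left' points."""
    right_75 = interpolate_crossing(angles_deg, prop_right, 0.75)
    left_75 = interpolate_crossing(angles_deg, prop_right, 0.25)  # 75% 'left'
    return (right_75 - left_75) / 2.0

# Hypothetical data: offsets relative to the reference azimuth (degrees)
# and the proportion of 'right' judgments at each offset.
angles = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
prop_right = [0.05, 0.10, 0.30, 0.50, 0.70, 0.90, 0.95]
threshold_deg = threshold_angle(angles, prop_right)
```

For these made-up data the 75% points fall at +1.25 and −1.25 degrees, giving a threshold angle of 1.25 degrees.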

To convert azimuth to ITD, Mills, 1958 used binaural microphones placed inside the ears of a dummy head. The ITD corresponding to the reference azimuth and the IPD corresponding to the threshold azimuth change were measured visually from signal traces using an oscilloscope. Threshold IPDs vs. reference ITDs were plotted on a logarithmic Y-axis vs. linear X-axis scale and linear curves were fit to the data. For the current study, we extracted data points of threshold dIPD across reference ITD and frequency from Mills, 1958. Threshold dIPD was converted to threshold dITD (using the same equation described in the HRTF analysis section).

Estimation of spatial ITD discriminability thresholds

A test was designed to estimate detection thresholds of changes in stimulus ITD (dITD) across specific frequencies of interest for this study. Healthy adult subjects were included in the sample (N = 24; 12 females and 12 males; mean age 28.0 ± 8.6; five left-handed and 19 right-handed; 19 from São Paulo and five from New York). After the procedure was described to the subjects, they provided written informed consent. The protocol was approved by the Ethics Committee of Universidade Federal do ABC and by the Institutional Review Board of the Albert Einstein College of Medicine, where the study was conducted. There were no distinct groups in the experiment. No subject reported a history of neurological disorders or hearing impairments.

Pilot measurements of dITD thresholds were initially conducted in 10 subjects, using the same combination of frequencies and ITDs across subjects. This pilot experiment, which lasted approximately 150 min, was performed in up to five sessions per subject. Results from these measurements already showed that the ratio between ITDrc and ITDv leads to good prediction for frequencies above 250 Hz. Based on feedback from subjects undergoing pilot measurements, a shorter protocol lasting about 60 min was designed, which was conducted in 24 subjects, leading to the reported dITD thresholds results.

In this computer-based test, subjects were presented with 65 dB (A scale) tones within a range of frequencies through headphones calibrated with an Instrutherm DEC-460 decibel meter or a B&K 4947 microphone with an artificial ear. Trials started by pressing the spacebar key. In each trial, two binaural tones were presented in sequence. The ITD of both sounds started at the same value (reference ITD) and changed by different amounts (dITD) in the second half of either the first or the second sound in the sequence. Subjects were instructed to press the key ‘1’ or ‘2’ to indicate in which sound of the pair they perceived a shift in location, pressing the chosen key twice if confident, or alternating both keys (1 and 2) if unable to perceive a shift or unsure about it. Trials could be repeated as many times as needed by pressing the spacebar key. Feedback sounds indicated whether each of the pressed keys was correct or wrong.

The range of reference ITDs spanned from frontal (0 µs) to peripheral locations within the unambiguous range of ITD for each frequency. ITD change (dITD) varied from 1 µs to 200 µs towards the periphery to cover a range from unequivocally detectable to undetectable changes for each frequency. The direction of dITDs relative to the reference ITD was always away from the front, to avoid direction-dependent biases affecting threshold measurements. Each condition (a given combination of frequency and reference ITD) was presented 29 times: four training trials with a dITD of 200 µs, which were not analyzed, and 25 testing trials, which were used for estimating the dITD threshold. An initial training block used 750 Hz tones and a −50 µs reference ITD for all subjects and was not included in the analysis. The following 20 blocks presented tones of frequencies 250, 500, 750, 1000 or 1250 Hz, and reference ITDs from −500 to 0 µs, varied in steps of 100 µs. Only reference IPDs of absolute values smaller than π/2 radians were included, to avoid phase ambiguity. The sequence of conditions was randomly varied across subjects.

Ongoing estimation of dITD thresholds was conducted from trial 1 of each block, to optimize the test estimate. A psychometric sigmoid curve was progressively fit to a plot of correct responses (assigned 1) and incorrect or unsure responses (assigned 0) as a function of dITD; the dITD corresponding to 0.5 accuracy was selected as the estimated dITD threshold. Preliminary dITD thresholds were computed from subsets of trials within varying dITD ranges. The first six trials ranged from dITDs 10 to 190 µs, spaced by 36 µs. From these trials, a first preliminary dITD threshold was estimated. In the following six trials, dITD was varied from −50 to 50 µs in steps of 20 µs, centered on the first preliminary dITD threshold; at the end of trial 12, a second preliminary dITD threshold was estimated from trials 1 to 12 with the same sigmoid fitting procedure. A third preliminary dITD threshold was then estimated using trials 1–18. The final seven trials ranged from −21 to 21 µs spaced by 7 µs, centered on the third preliminary dITD threshold. When a set of dITDs centered on a preliminary dITD threshold extended beyond the 1 to 200 µs range, dITDs were adjusted to fall within this range. While this procedure permitted an efficient estimate of dITD thresholds, it did not yield plausible dITD threshold estimates in cases where subjects provided largely random responses across dITDs. To address this limitation, a nonparametric receiver operating characteristic (ROC) classifier was applied offline to independently verify the validity of estimated dITD thresholds.
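The sets of dITD values presented at each stage can be sketched as below. The sigmoid fit itself is omitted, so the preliminary thresholds passed in are hypothetical inputs. The text does not specify how out-of-range values were adjusted; clamping to the 1 to 200 µs range is an assumption made here, and the function names are hypothetical.

```python
def clamp(value_us, low_us=1.0, high_us=200.0):
    """Keep a dITD within the tested range (assumed adjustment rule)."""
    return max(low_us, min(high_us, value_us))

def initial_ditds():
    """Trials 1 to 6: dITDs from 10 to 190 us, spaced by 36 us."""
    return [10.0 + 36.0 * i for i in range(6)]

def centered_ditds(center_us, half_range_us, step_us):
    """dITDs spaced by step_us around a preliminary threshold estimate."""
    count = int(round(2.0 * half_range_us / step_us)) + 1
    return [clamp(center_us - half_range_us + step_us * i) for i in range(count)]

# Hypothetical preliminary thresholds of 100 us and 10 us.
first_stage = initial_ditds()
second_stage = centered_ditds(100.0, 50.0, 20.0)  # -50 to 50 us in 20 us steps
final_stage = centered_ditds(100.0, 21.0, 7.0)    # -21 to 21 us in 7 us steps
near_floor = centered_ditds(10.0, 21.0, 7.0)      # values below 1 us are clamped
```

The stage sizes (six, six and seven values) match the trial counts given in the text for the first, second and final sets.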

For estimating threshold dITDs, an ROC classifier was computed over the 25 trials of each condition for each subject. The threshold was estimated by averaging the subset of possible thresholds within the 1 to 200 µs range that jointly maximized the number of correct (hit) responses and minimized false positive (type I error) ones. This optimization was obtained by selecting the candidate dITD thresholds at the minimum Euclidean distance from perfect discrimination (i.e. 100% hit rate and 0% false positive rate) in the ROC analysis. The ROC classifier was robust enough to estimate consistent dITD thresholds for all conditions from all subjects (Figure 2C-left). The thresholds within the 1–200 µs range estimated by the sigmoid fitting method were significantly correlated with those estimated by the ROC classifier (rPearson = 0.85).
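One possible reading of this ROC procedure is sketched below; it is an interpretation, not the authors' code. It assumes that a candidate threshold classifies a trial as ‘detected’ when its dITD is at least the candidate value, that correct responses are the positive class, and that candidates are taken on an integer grid from 1 to 200 µs. The example data and function name are hypothetical.

```python
import math

def roc_threshold(ditds_us, correct):
    """Average of the candidate thresholds closest to perfect discrimination
    (hit rate 1, false positive rate 0) in ROC space."""
    n_pos = sum(1 for c in correct if c)
    n_neg = len(correct) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both correct and incorrect trials")
    best_distance, best_candidates = math.inf, []
    for candidate in range(1, 201):  # assumed grid: 1 to 200 us in 1 us steps
        hits = sum(1 for d, c in zip(ditds_us, correct) if c and d >= candidate)
        false_pos = sum(1 for d, c in zip(ditds_us, correct)
                        if not c and d >= candidate)
        distance = math.hypot(1.0 - hits / n_pos, false_pos / n_neg)
        if distance < best_distance - 1e-12:
            best_distance, best_candidates = distance, [candidate]
        elif abs(distance - best_distance) <= 1e-12:
            best_candidates.append(candidate)  # tie: keep every best candidate
    return sum(best_candidates) / len(best_candidates)

# Hypothetical block: changes of 50 us or more were detected, smaller ones not.
ditds = [10, 20, 30, 40, 50, 60, 70, 80]
responses = [False, False, False, False, True, True, True, True]
estimated_threshold_us = roc_threshold(ditds, responses)
```

In this perfectly separable example every candidate from 41 to 50 µs gives a distance of zero, so the estimate is their average, 45.5 µs.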

Prediction of spatial discriminability thresholds by ITD statistics

An initial analysis of the ranked (Spearman’s) correlation coefficients was performed for the relationship between the threshold dITD averaged across subjects and the ITD statistics of the reference ITD (middle and right plots of Figure 2C). Spearman’s correlations, which were computed from averages of dITD thresholds over subjects across multiple conditions (N = number of combinations of reference ITD and frequency), were used to assess the monotonicity of this relationship. Since the N in this analysis reflects the number of conditions, not the number of subjects, the standard statistical power analysis does not apply. Accordingly, p-values were not computed for this correlation analysis.

Additional analysis for the selection of the best predictors of dITD thresholds was performed using linear mixed-effect models (LMM; Magezi, 2015), classifying ITD statistics across stimulus conditions as ‘fixed factor’ and participants as ‘random factor’. LMM analysis assumes linearity between measures and predictors, so we inspected whether the relationship between dITD thresholds and each of the predictors was linear. Although relationships were mostly linear, some were best fitted by a power function. In these cases, we applied the standard method for achieving linearity by log-transforming both predictors and dITD thresholds. Linear regressions of the relationship between the multiple log-transformed dITD thresholds collected from each subject and log-transformed ITD statistics were performed, and the Akaike Information Criterion (AIC) was computed. The AIC analysis was used to compare the performance of each model, relying on both the number of model parameters and sample size (i.e. number of subjects) as a metric of goodness of fit; the lowest AIC corresponds to the best model. Since AIC is a relative quantity (i.e. the actual value brings no information alone), we normalized the AIC values by subtracting the minimum AIC observed for that behavioral measure (dAIC), which corresponds to the information loss compared to the best available model. Models with dAIC between 0 and 2 were considered equally good in their prediction capacity. Sample sizes were made several times higher than the number of parameters of our models to ensure samples were sufficiently large.
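The dAIC normalization and the ‘equally good’ criterion can be sketched as follows. The AIC values below are hypothetical placeholders, not results from this study, and the function names are introduced for illustration.

```python
def delta_aic(aic_by_model):
    """dAIC: each model's AIC minus the lowest AIC in the comparison."""
    best = min(aic_by_model.values())
    return {name: aic - best for name, aic in aic_by_model.items()}

def equally_good(aic_by_model, tolerance=2.0):
    """Models whose dAIC lies between 0 and the tolerance (2 in the text)."""
    deltas = delta_aic(aic_by_model)
    return sorted(name for name, d in deltas.items() if d <= tolerance)

# Hypothetical AIC values for three candidate predictors.
aic_values = {"ITDrc": 112.4, "ITDrc/ITDv": 101.3, "sqrt(FI_ITD)": 100.1}
deltas = delta_aic(aic_values)
best_models = equally_good(aic_values)
```

With these placeholder numbers the second and third predictors would be treated as equally good, because their dAIC values differ by less than 2.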

Collection and analysis of the mismatch negativity (MMN) component

Healthy adult participants were included in the sample (N = 33, 16 females; 17 males; mean age 29.5 ± 4.8; all right-handed). After the procedure was described to the subjects, they provided written informed consent. The protocol was approved by the Institutional Review Board of the Albert Einstein College of Medicine, where the study was conducted. All subjects had no reported history of neurological disorders and passed a bilateral hearing screening (20 dB HL or better at 500, 1000, 2000, and 4000 Hz).

A statistical power analysis for the MMN component using a stringent minimum MMN amplitude of −0.5 µV (SD 0.7 µV) revealed substantial power (1-β=0.87) with an alpha level of 0.05 in 30 adult subjects. We ran the first set of conditions (1–10) in 17 subjects and found that adjusting the difference between the ITDs of standard and deviant by a weighted average of the ratio between ITDrc and ITDv of the stimulus ITDs provided a good prediction of the MMN; 16 additional subjects were recruited for a second set of conditions (11–20), replicating the initial findings. No significant difference was found between groups and therefore the analysis reported in the manuscript was performed on the pooled data of all 33 participants. Participants were seated inside a sound-attenuated booth (IAC Acoustics, Bronx, NY) in front of a TV screen where a subtitled muted movie was presented. Sound was delivered by a Neuroscan StimAudio system through insert earphones (3M Eartone 3A) calibrated to 53 dB (A-weighted) using a B&K 4947 microphone with an artificial ear. Sound signals were 50 ms duration tones (25 ms rise-fall time) of different frequencies and ITDs synthesized with MATLAB software.

Participants listened to oddball sequences presenting repetitive (‘standard’) tones embedded with sporadic (‘deviant’, 15% of the trials) tones with a 300 ms inter-stimulus interval (Figure 3A-left). Each subject was presented with 10 conditions, which differed in frequency and ITD. Subjects 1 to 17 performed the following conditions (‘ITD standard’ vs. ‘ITD deviant’ at ‘tone frequency’): (1) −590 vs. −295 µs at 400 Hz; (2) −295 vs. 0 µs at 400 Hz; (3) 0 vs. −295 µs at 400 Hz; (4) −295 vs. 295 µs at 400 Hz; (5) 295 vs. −295 µs at 400 Hz; (6) −295 vs. 590 µs at 400 Hz; (7) 590 vs. −295 µs at 400 Hz; (8) −590 vs. −295 µs at 600 Hz; (9) −295 vs. 0 µs at 600 Hz; and (10) 0 vs. −295 µs at 600 Hz. Subjects 18 to 33 performed the conditions (11) 0 vs. −499 µs at 400 Hz; (12) −499 vs. 0 µs at 400 Hz; (13) 0 vs. −159 µs at 400 Hz; (14) −159 vs. 0 µs at 400 Hz; (15) 0 vs. −499 µs at 550 Hz; (16) −499 vs. 0 µs at 550 Hz; (17) 0 vs. −499 µs at 650 Hz; (18) −499 vs. 0 µs at 650 Hz; (19) 0 vs. −159 µs at 650 Hz; and (20) −159 vs. 0 µs at 650 Hz. Each condition was presented in 3 blocks of 474 trials; the block order was randomized for each subject. The first 19 trials of each block (18 standards and one deviant) were used for training and not included in the analysis. The reported results included the remaining trials of each block (385 standards and 70 deviants). MMN values for each condition were estimated by subtracting the mean ERP of the 1155 standard trials from the mean ERP of the 210 deviant trials, after removal of trials with artifacts (see below). Sessions lasted approximately 2.5 hr, including placement of electrode caps and breaks during the EEG recording.

EEG was recorded with Neuroscan SynAmps and a 32-channel electrode cap following the modified international 10–20 System, including electrodes on the nose (reference), P09 (ground) and left and right mastoids (LM and RM, used for offline analysis). A bipolar configuration was used between an external electrode below the left eye and the FP1-electrode position for measuring vertical electro-oculogram (EOG). The signal was recorded at 500 Hz sampling rate using a band-pass from 0.05 to 100 Hz. Impedances were maintained below 5 kOhms.

To measure the MMN, EEG signals from each subject were processed as follows: (1) 20 Hz low-pass filtering; (2) pooling (by concatenating) all EEG signals obtained during sound stimulation; (3) removal of eye-blink and saccade artifacts by performing Independent Component Analysis and reconstructing the signal without components correlated to the EOG; (4) selection of 600-millisecond epochs around sound presentation (−100 to 500 ms from sound onset); (5) removal of linear portions of EEG signals and exclusion of trials containing voltage deflections larger than 75 µV; (6) re-reference of the EEG recordings to the mastoid recordings by subtracting the average mastoid signal from the other channels; (7) ERP measurement, averaging separately signals evoked by standard and deviant trials (the first 19 trials were used for subject training and excluded from the analysis); (8) subtraction of the standard from the deviant ERPs to compute the MMN; (9) identification of the time bin containing the MMN peak in signals from the FZ channel (frontal EEG electrode, in the middle of the forehead) averaged across subjects, for each condition; (10) measurement of the MMN peak for each subject and condition within this time bin.
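Steps (7), (8) and (10) above can be sketched on already cleaned epochs as follows. Filtering, artifact removal and re-referencing are not shown. The epochs, the time axis and the search window are hypothetical toy values, and the function names are introduced for illustration.

```python
def mean_erp(trials):
    """Average equal-length epochs sample by sample."""
    return [sum(samples) / len(trials) for samples in zip(*trials)]

def mmn_wave(standard_trials, deviant_trials):
    """Difference wave: deviant ERP minus standard ERP."""
    standard = mean_erp(standard_trials)
    deviant = mean_erp(deviant_trials)
    return [d - s for d, s in zip(deviant, standard)]

def peak_in_window(wave, times_ms, start_ms, stop_ms):
    """Most negative value (the MMN is a negativity) and its latency
    within the time bin [start_ms, stop_ms]."""
    window = [(v, t) for v, t in zip(wave, times_ms) if start_ms <= t <= stop_ms]
    return min(window)

# Toy epochs (microvolts) sampled at five hypothetical time points.
times_ms = [-100, 0, 100, 200, 300]
standards = [[0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0]]
deviants = [[0.0, 0.0, 1.0, -2.0, 0.0], [0.0, 0.0, 1.0, -1.0, 0.0]]

wave = mmn_wave(standards, deviants)
peak_uv, latency_ms = peak_in_window(wave, times_ms, 100, 250)
```

Here the shared positive deflection at 100 ms cancels in the subtraction, leaving only the deviant-specific negativity at 200 ms.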

Grand-averages of ERPs recorded at FZ electrodes were computed for standard and deviant trials across all subjects and conditions; for estimating the MMN topography, the signal from each electrode in the time bin of the peak MMN was averaged across subjects and conditions (Figure 3A-right).

Prediction of MMN by ITD statistics

An initial analysis of the ranked (Spearman) correlation coefficients was performed for the relationship between the MMN peak amplitude averaged across subjects and the absolute ITD difference between standard and deviant stimuli, multiplied by the weighted sum of ITD statistics estimated for both standard and deviant stimuli (Figure 3C-left). Additional LMM analysis (described in the “Prediction of spatial discriminability thresholds by ITD statistics” section) was used to compare performance across predictors of MMN peak amplitude. LMM analysis was conducted with the MMN peak as the dependent variable, the absolute ITD difference multiplied by the weighted sum of ITD statistics as ‘fixed’ factor and participant as ‘random’ factor. Since the relationship between MMN data and predictors followed a power function, it was linearized by log-transforming both measures. No outliers were detected or excluded. The AIC method was used for comparing the models (described in the “Prediction of spatial discriminability thresholds by ITD statistics” section).

Neural models

Two seminal models (Stern and Colburn, 1978; Harper and McAlpine, 2004) addressing discriminability of azimuth positions in acoustic space based on ITD were used to examine the potential link between the brain representation of sensory statistics and perceptual functions.

The relative number of fiber pairs encoding interaural delays, p(τ), used as predictors of ITD discriminability (Figure 4A), was extracted from Stern and Colburn, 1978. To test whether ITD statistics could be represented by this model, the p(τ) parameter was adjusted to match statistics, by normalizing ITD statistics from 0 to 1 and scaling the resulting data to obtain a probability distribution where the sum of all probabilities was equal to 1 (Figure 4B).
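The adjustment of p(τ) described above amounts to two rescaling steps, sketched below under the assumption that ‘normalizing from 0 to 1’ means min-max scaling. The function name and input values are hypothetical.

```python
def to_probability_distribution(itd_statistics):
    """Rescale values to [0, 1], then divide by their sum so they add up to 1."""
    low, high = min(itd_statistics), max(itd_statistics)
    if high == low:
        raise ValueError("statistics must not be constant")
    scaled = [(v - low) / (high - low) for v in itd_statistics]
    total = sum(scaled)
    return [s / total for s in scaled]

# Hypothetical ITD statistic sampled at three interaural delays.
p_tau = to_probability_distribution([1.0, 2.0, 3.0])
```

Note that min-max scaling assigns a probability of zero to the delay with the lowest statistic, which is a consequence of this particular reading of the normalization.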

Fisher information from single-neuron IPD-tuning curves and the optimal population distribution of IPD-tuning estimated for humans were extracted from data reported in Harper and McAlpine, 2004. The M-shaped Fisher information curve was positioned at the best IPD of each neuron within the distribution, to obtain an estimate of the population Fisher information across IPD and frequency. The neuron population Fisher information across IPD was converted to ITD, obtaining the prediction of ITD discriminability induced by the neuron distribution (Figure 4C). To test whether ITD statistics could be represented by parameters of this model, for each frequency, we generated two midline-mirrored Gaussian neural population distributions with random mean IPD tunings and standard deviations from 0 to π, and selected the distribution that displayed the highest Pearson correlation between Fisher information and ITD statistics. Constants were then added to and multiplied with the density values in order to obtain one in slope and intercept of a linear fit. Finally, the densities across all frequencies were normalized so that the total probability was one.

All data processing was performed in MATLAB (Mathworks) using built-in or custom-made routines. The datasets generated and analyzed in the current study are available in https://doi.org/10.5061/dryad.h70rxwdf9.

References

  14. 14
    Arboreal Life and the Evolution of the Human Eye: A Revised Publication of the Bowman Lecture Delivered Before the Ophthalmological Society of the United Kingdom in May, 1921
    1. ET Collins
    2. Ophthalmological Society of the United Kingdom
    (1922)
    Sagwan Press.
  23. 23
    Signal Detection Theory and Psychophysics
    1. DM Green
    2. JA Swets
    (1966)
    Peninsula Publishing.
  33. 33
    On the Minimum Audible Angle
    1. AW Mills
    (1958)
    The Journal of the Acoustical Society of America 30:237–246.
    https://doi.org/10.1121/1.1909553
  40. 40
    XII. On our perception of sound direction
    1. L Rayleigh
    (1907)
    The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 13:214–232.
    https://doi.org/10.1080/14786440709463595
  43. 43
    Auditory frequency discrimination and event-related potentials
    1. M Sams
    2. P Paavilainen
    3. K Alho
    4. R Näätänen
    (1985)
    Electroencephalography and Clinical Neurophysiology/Evoked Potentials Section 62:437–448.
    https://doi.org/10.1016/0168-5597(85)90054-1
  51. 51
    Experimental Psychology
    1. R Woodworth
    (1938)
    Holt.
  53. 53
    Discriminations of interaural phase differences
    1. WA Yost
    (1974)
    The Journal of the Acoustical Society of America 55:1299–1303.
    https://doi.org/10.1121/1.1914701

Decision letter

  1. Catherine Emily Carr
    Reviewing Editor; University of Maryland, United States
  2. Andrew J King
    Senior Editor; University of Oxford, United Kingdom
  3. Catherine Emily Carr
    Reviewer; University of Maryland, United States
  4. Michael Pecka
    Reviewer; Ludwig-Maximilians Universitaet Muenchen, Germany

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The natural statistics of spatial cues vary with sound source location, and this study supports the hypothesis that these statistics are represented in the neural code for interaural time differences and influence spatial perception.

Decision letter after peer review:

Thank you for submitting your article "Anticipated ITD statistics are built into human sound localization" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Catherine Emily Carr as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Andrew King as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Michael Pecka (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This manuscript investigates whether the human brain exploits invariant spatial statistics to optimize sound source location discriminability. The authors first establish the presence of subject-invariant interaural time difference (ITD) statistics. After identifying potential physiological mechanisms that would render the neural ITD processing sensitive to these statistics, they then demonstrate the ability of those ITD statistics to predict human discrimination thresholds. Finally, an oddball sequence paradigm and EEG measurements are used to test to what extent ITD statistics processing can be detected in the human brain. The statistical characterization that is presented is compelling and interesting. However, several major concerns were raised that will have to be addressed satisfactorily for the paper to be deemed acceptable for publication. The presentation is opaque, and we had difficulty understanding how the authors computed ITDv. We were also not convinced that the data presented can distinguish whether ITDv and ITDrc simply influence ITD processing or whether the brain anticipates the location-specific invariance of these statistics and exploits them to optimize ITD discrimination.

Essential revisions:

1) The title and main conclusion are an overstatement. To justify these statements (anticipation etc), the authors would have to show that "unnatural" ITDv (i.e. scrambled higher-order stats) deteriorate ITD discrimination. Alternatively, the statements should be modified / toned down to more accurately reflect the significance of the findings.

2) It is not clear how the authors computed either the ITD rate of change with azimuth (ITDrc) or ITD variability (ITDv) (they mention across-frequency interference that changes over time). Perhaps an example and equations in the methods would make this clearer.

The critical contribution made by this paper is the introduction of ITD_v. The data for ITD_{rc} and ITD_v come from the LISTEN data base measured at IRCAM long ago. This database consists of physical measurements, interaural differences as a function of sound source location in free field for more than 50 individual heads. Unfortunately, it is not clear how ITD_v is determined. It is said to come from a variability in head related transfer functions (HRTFs) over time. But HRTFs do not normally depend on time. HRTFs describe wave filtering from a source in space to some chosen point in the ear. Why should they depend on time? Does ITD_v come from some estimate of the motion of listeners during the action of sound localization? Does it represent variability seen at IRCAM from one experimental session to the next for a given listener? The basis for the calculations presented here is a total mystery.

The critical variable ITD_v is shown in Figure 1D. There, the variability appears to be small for small ITD values (source near the geometrical midline) and it appears to be large for large ITD values (source out to the side). However, if the temporal variation of ITD is caused by inadvertent listener rotation, then the sensitivity pattern ought to be the opposite of what appears in Figure 1D. This is so because the sensitivity of ITD to azimuth is steepest near the midline and becomes quite flat at large source azimuths.

3) The manuscript is very hard to follow. The reasoning is obscure, with mathematical operations described in words instead of equations. Thus, the substance of the paper hides within a forest of details. We counted more than 120 figures. This is too many.

4) The evidence for some of the central claims of the study is either not presented sufficiently clearly or tested only indirectly. We understand that the manuscript addresses an important question, and we understand the need to explain many rather complicated physical details and methodological constraints. One reviewer commended the authors for their ability to summarize and explain their data in a manner that is accessible to a broader readership. Nevertheless, the "anticipation of characteristic ITDv / statistics built into the brain" argument is not directly shown by the data that the authors provided (see next point).

5) We are not convinced that data that are presented can distinguish whether ITDv and ITDrc simply influence ITD processing or whether the brain anticipates the location-specific invariance of these statistics and exploits them to optimize ITD discrimination. I.e. the reported increase in variability at lateralized positions will certainly partially account for deterioration of JNDs (as will the decrease in ITDrc), and a large amount of data in support of this notion is presented.

Does changing higher-order statistics indeed deteriorate ITD JNDs (or dITD)? The experiment summarized in Figure 4 touches on this idea (loss of ITDv due to the use of in-ear earphones results in elevated thresholds); however, as the authors noted, this threshold shift could also be (partially) due to the use of naïve listeners or other factors and thus is inconclusive. Likewise, the MMN paradigm (Figure 5) is unconvincing in this respect as well. The authors first state that the paradigm design avoids the presence of ITDv. They then show that the recorded MMN data are best explained by a model including D-stat (i.e. ITDrc and ITDv) and argue that this finding is evidence for the anticipation of these statistics. While we agree that this finding potentially suggests "anticipatory processing", it is very difficult to judge the significance of the model performances without a control condition (including naturalistic free-field statistics). Some sort of comparative analysis (i.e. the manipulation of the higher-order statistics) to directly show whether neuronal processing is adapted to a specific statistical distribution would be required to justify the strong statements made in the Title, Abstract, and Introduction (last paragraph) about anticipation.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Natural ITD statistics are built into human sound localization" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Catherine Emily Carr as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Andrew King as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Michael Pecka (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you revise the manuscript. Please note, however, that we wish to see a point-by-point response to the main comments set out below, as this will determine whether we will be willing to consider another revision. We have summarized our extensive discussion in the following, and we have also appended the original (pre-consensus) reviews, for your edification.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, when editors judge that a submitted work as a whole belongs in eLife but that some conclusions require a modest amount of additional new data, as they do with your paper, we are asking that the manuscript be revised to either limit claims to those supported by data in hand, or to explicitly state that the relevant conclusions require additional supporting data.

Our expectation is that the authors will eventually carry out the additional experiments and report on how they affect the relevant conclusions either in a preprint on bioRxiv or medRxiv, or if appropriate, as a Research Advance in eLife, either of which would be linked to the original paper. This would be relevant if you wished to shorten the paper, and publish some experiments in a second publication.

Essential revisions:

1) We all regard the use of ITD statistics, specifically, the derivative (rate of change) of the mean ITD over azimuth (ITDrc) and the standard deviation of ITD (ITDv) over time as a major strength of the paper. The finding that the Stern and Colburn, (1978) and Harper and McAlpine, (2004) models can reflect a representation of ITD statistics provides a strong connection between the neural coding and psychophysics. Thus, neural density may reflect the Fisher Information for azimuthal angle changes, and the paper could be motivated by explaining why the ITD sensitivity is not equally good within the entire physiologically relevant ITD range.

2) Our major concern is the wording, which often seems vague. To excerpt from a review: "At the moment, the manuscript reads as if the ITD statistic is actually a detection cue, like 'Oh, I now perceive the variance! The sound must come from this direction.'" We think you mean that fewer neurons code sound directions where there is less Fisher information (large ITDs, typically generated from large azimuthal angles) than where there is more information from the ears (small ITDs from mid-line directions). If this is what you mean, please try to be clearer and avoid phrases like "statistic is built in" and "the brain anticipates".

We have extensively discussed what you mean with respect to ITD sensitivity. You imply that ITD sensitivity (and the underlying neural processing) is improved by the naturally occurring ITDv, but you don't explain how this might work, i.e. you show that the brain uses the additional variance information for ITD calculation, but whether this information necessarily follows natural statistics is not clear. Furthermore, the wording of "anticipation" implies that the brain has learned what statistics exactly define naturalistic ITDv, which is not reflected in your data. Our uncertainty about what you mean motivates point 3 below.

A third point of confusion relates to D-stat. This has the highest correlation with the Mills data, and the way we understand (or interpret) your results and discussion is that even though no ITDv is present in the stimuli, the perception is explained by neural processing that incorporates the ITDv that would normally be present for these stimuli. And since the ITDv is not present, the brain must "know" (i.e. have learned) what these stats are. Please clarify if this is what you mean.

3) As you can tell from the above questions, our major concern is that the writing is still unclear. We have spent two weeks discussing what we think you mean. Therefore, we believe that there is potential to further clarify the wording and explanations. For example, with respect to understanding what is meant by "Natural ITD statistics are built into [the brain]", we would like to clarify whether you agree or disagree with the interpretation that "neural density may reflect the Fisher Information for azimuthal angle changes, etc" before you start revising the manuscript for a third time. Please clarify this along with your responses to the other points in a reply letter. Once we understand what you mean, we could suggest how/if to streamline the manuscript.

4) The reviewers have discussed a recommendation to omit some of your experiments, on the grounds that the additional information doesn't substantially add to the main message (see review #4), but rather raises questions that potentially distract from it. We would prefer to resolve the issues in Points 1-3 first.

Reviewer #1:

This is my second review of this interesting paper. The basic idea is very compelling, that natural ITD statistics are built into the neural circuits that mediate sound localization. I also like the combination of modeling and psychophysics.

Nevertheless, the presentation is quite complex. The authors have written the paper three times. Once, when first sent to the journal, prior to review, and next after the prior reviews. This version is much clearer and has less than half the number of figures.

Strengths – the use of ITD statistics, specifically, the derivative (rate of change) of the mean ITD over azimuth (ITDrc) and the standard deviation of ITD (ITDv) over time are both important insights. The authors' finding, that the Stern and Colburn, (1978) and Harper and McAlpine, (2004) models can implement a representation of ITD statistics, provided a strong connection between the neural coding and psychophysics. I feel the field has been arguing about this for decades, and that this represents a major insight.

Reviewer #3:

The authors substantially revised the manuscript by adding more detailed explanations for the calculations they made. They also employ more measured wording throughout the paper, now. I remain a bit skeptical about the suitability of their use of the terms "built in" and "anticipated" with regard to the conclusions of their findings.

Reviewer #4:

I was initially in full agreement with the critique made in the first review (reviewer #2 was not available for re-review) and did not understand the rebuttal by the authors. But it struck me that the authors are by no means new to this research area so that the apparent absence of sense in the manuscript can only be explained by a gross misunderstanding. It took a couple of days until it clicked, and I believe I understand now what the authors are trying to convey. My take on their research question is this:

Why is ITD discrimination not equally good within the entire ecologically relevant range but is better for midline directions? Because Mills' data are already converted into milliseconds of ITD, it cannot just be the geometry (e.g., Kuhn, 1977). The authors argue: It is because in real life, with broadband stimulation, the interaural correlation arriving from the left and right ear at the coincidence-detecting neurons is reduced due to differing HRTF filtering. IPD and ILD fluctuations are also introduced if ITD is not perfectly compensated by internal delay line differences (e.g. differences in left and right axonal travel times). Thus, they say, why waste a large number of ITD-sensitive neurons to reduce neural noise on spatial locations where spatial cues are degraded anyway (ITDv)? In addition, ITD changes per angular location change (ITDrc) are lower at these off-centre directions relative to midline directions. In other words, one could argue that the Fisher information (change of mean divided by variance; they call it D-stat) for angular changes in source direction from these directions is low and so requires fewer neurons to be coded than the far higher information from frontal directions. The density distribution of binaural detectors thus predicted, as a function of their best ITD, can be validated with ITD discrimination experiments using single tones, as these do not acquire ITD (and ILD) fluctuations by going through HRTFs, auditory filters and delay lines. Because their D-stat derived with broadband stimuli fits Mills' data very well, but D-stat itself cannot actually explain Mills' ITD discrimination thresholds because they were obtained with pure tones and are expressed in milliseconds (not azimuthal angle), one might conclude that whatever limits ITD discrimination with tones has been influenced by D-stat through experience with everyday broadband sounds. I assume the authors imply that it is neural noise that became somehow adjusted to match D-stat. Neural noise per ITD is determined by the number of neurons coding this ITD (e.g. Stern and Colburn, 1978; Harper and McAlpine, 2004).

If this is indeed what the authors intend to say, I believe their idea is well worth disseminating. I agree with the first review that the current presentation is opaque and the description vague and misleading. It lacks a logical line of thought that tells the reader this story. The choice of wording itself is confusing and should become more precise and descriptive in order to convey the message. The narrative can and should be kept simple and specific as it is actually a straightforward story, which is certainly novel and worth exploring further.

I would like to give further advice: The basic idea is in my opinion sufficient for an initial publication and I would not "dilute" it with unnecessary details. This story does not need further experimental data than the well-accepted ITD threshold data by Mills, (1958). They are sufficient for the purpose. Showing their own, partly contradictory experimental data just adds confusion and distracts the reader from the main message. Similarly, the EEG data are irrelevant for the narrative and just lengthen the manuscript unnecessarily.

I agree that one can also base the neural density on the distribution of ITD tuning curve slopes, assuming neurons are most sensitive to changes in ITD there (Harper and McAlpine, 2004). This is indeed worth mentioning because this thinking is becoming popular. But the fact that an ITD tuning curve has two slopes just complicates the matter. For the sake of keeping the argument simple, I recommend not going into detail with the slope model, but illustrating the basic idea using the classic Jeffress-based model by Stern and Colburn, (1978).

https://doi.org/10.7554/eLife.51927.sa1

Author response

Summary:

This manuscript investigates whether the human brain exploits invariant spatial statistics to optimize sound source location discriminability. The authors first establish the presence of subject-invariant interaural time difference (ITD) statistics. After identifying potential physiological mechanisms that would render the neural ITD processing sensitive to these statistics, they then demonstrate the ability of those ITD statistics to predict human discrimination thresholds. Finally, an oddball sequence paradigm and EEG measurements are used to test to what extent ITD statistics processing can be detected in the human brain. The statistical characterization that is presented is compelling and interesting. However, several major concerns were raised that will have to be addressed satisfactorily for the paper to be deemed acceptable for publication. The presentation is opaque, and we had difficulty understanding how the authors computed ITDv. We were also not convinced that the data presented can distinguish whether ITDv and ITDrc simply influence ITD processing or whether the brain anticipates the location-specific invariance of these statistics and exploits them to optimize ITD discrimination.

We have revised the manuscript focusing on the clarification of general concepts and the methods to compute ITDv and analyze the data; we also reduced the number of figures. As mentioned in the summary above, the main take-home message of this work is that the brain anticipates location-specific ITD statistics invariant across contexts, and exploits them to optimize ITD discrimination. We assume that ITDv and ITDrc statistics influence ITD processing, as postulated in previous studies. However, the focus of our study was not to address the direct effect of ongoing or recent statistics on ITD processing. We believe that the claim suggested above, “The brain anticipates the frequency- and location-specific invariance of these statistics [ITDv and ITDrc across contexts] and exploits them to optimize ITD discrimination”, is a valid and accurate description, and added this statement to the abstract and conclusions.

Essential revisions:

1) The title and main conclusion are an overstatement. To justify these statements (anticipation etc), the authors would have to show that "unnatural" ITDv (i.e. scrambled higher-order stats) deteriorate ITD discrimination. Alternatively, the statements should be modified / toned down to more accurately reflect the significance of the findings.

One way of testing whether changes in ITDv affect ITD discrimination would be to manipulate interaural correlation. It is already known that decreasing interaural correlation deteriorates sound localization in humans. Previous work by co-authors of this manuscript has shown that behavioral biases observed in barn owls when interaural correlation is manipulated are consistent with a representation of ITDv, but whether this is also observed in humans has not been tested yet. However, as we noted in the response to the editor’s summary, the main goal of our study was not to test the effect of recently modified statistics. We have revised the text to make the goal of the study clearer and to make statements reflect the significant findings more accurately. We had chosen the word ‘anticipate’ in an attempt to make it clear that this study is not about the processing of ongoing ITD statistics but about a built-in representation of them, affecting perception. However, we have come to understand that the word ‘anticipation’ creates confusion. We have thus revised the title to “Natural ITD statistics are built into human sound localization” and revised conclusion statements to clarify this issue across the paper.

2) It is not clear how the authors computed either the ITD rate of change with azimuth (ITDrc) or ITD variability (ITDv) (they mention across-frequency interference that changes over time). Perhaps an example and equations in the methods would make this clearer.

The critical contribution made by this paper is the introduction of ITD_v. The data for ITD_{rc} and ITD_v come from the LISTEN data base measured at IRCAM long ago. This database consists of physical measurements, interaural differences as a function of sound source location in free field for more than 50 individual heads. Unfortunately, it is not clear how ITD_v is determined. It is said to come from a variability in head related transfer functions (HRTFs) over time. But HRTFs do not normally depend on time. HRTFs describe wave filtering from a source in space to some chosen point in the ear. Why should they depend on time? Does ITD_v come from some estimate of the motion of listeners during the action of sound localization? Does it represent variability seen at IRCAM from one experimental session to the next for a given listener? The basis for the calculations presented here is a total mystery.

The critical variable ITD_v is shown in Figure 1D. There, the variability appears to be small for small ITD values (source near the geometrical midline) and it appears to be large for large ITD values (source out to the side). However, if the temporal variation of ITD is caused by inadvertent listener rotation, then the sensitivity pattern ought to be the opposite of what appears in Figure 1D. This is so because the sensitivity of ITD to azimuth is steepest near the midline and becomes quite flat at large source azimuths.

We have made a substantial revision of the Materials and methods section explaining details of how ITD statistics were computed. We also revised Figure 1 to clarify the methods. We would like to note that ITDv was estimated as ITD variability over time but not assumed to be induced by subjects’ motion.

The basic idea is that natural acoustic signals reaching the ears are distorted in a location- and frequency-dependent manner, leading to changes in the instantaneous ITD over the duration of these signals. These changes in instantaneous ITD induce variability of ITD within short time windows along sounds and across trials, affecting the natural reliability of the ITD cue. To replicate this process, we used a broadband signal, head-related impulse responses (HRIRs, the impulse responses of the HRTFs) and digital filters that reproduced cochlear filtering, as follows: (1) we convolved a broadband signal with HRIRs across azimuth locations to model spatial sounds, (2) we approximated the frequency bands that reach hair cells by filtering the acoustic signals using previously reported parameters of human cochlear filters, and (3) we estimated ITD variability across frequency and location by computing the standard deviation of instantaneous ITD over the stimulus duration.

The revised Figure 1 now provides a visual description of this method, showing that ITDv estimation for a signal from a given position in azimuth requires convolving the signal with the corresponding left- and right-ear HRIRs for each subject and passing them through model cochlear filters to estimate the standard deviation of the differences in instantaneous phase between the left and right band-filtered signals over time.
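The final step of this procedure (spread of the interaural phase difference over time, expressed as time) can be sketched as follows. This is a minimal illustration that assumes the instantaneous phases of the left and right band-filtered signals are already available, uses circular statistics for the phase difference, and converts phase to time at the band's centre frequency. The use of the circular standard deviation, the function name and the toy phase values are assumptions for illustration, not the manuscript's exact implementation.

```python
import math

def itd_mean_and_variability(phase_left, phase_right, freq_hz):
    """Return (mean ITD, ITD variability) in seconds from instantaneous phases.

    phase_left / phase_right: instantaneous phases (radians) of the left and
    right band-filtered signals, sampled at the same time points.
    freq_hz: centre frequency of the band, used to convert phase to time.
    """
    ipd = [l - r for l, r in zip(phase_left, phase_right)]
    n = len(ipd)
    c = sum(math.cos(p) for p in ipd) / n
    s = sum(math.sin(p) for p in ipd) / n
    resultant = min(1.0, math.hypot(c, s))   # clamp rounding above 1
    mean_ipd = math.atan2(s, c)              # circular mean of the IPD
    if resultant == 0.0:
        std_ipd = math.inf                   # phases uniformly spread
    else:
        # Circular standard deviation (assumed measure of spread).
        std_ipd = math.sqrt(max(0.0, -2.0 * math.log(resultant)))
    to_seconds = 1.0 / (2.0 * math.pi * freq_hz)
    return mean_ipd * to_seconds, std_ipd * to_seconds

# Toy phases: a steady IPD versus an IPD that fluctuates over time.
n = 200
left = [0.5] * n
right_steady = [0.0] * n
right_jittered = [0.2 if i % 2 else -0.2 for i in range(n)]
mean_steady, itdv_steady = itd_mean_and_variability(left, right_steady, 500.0)
mean_jittered, itdv_jittered = itd_mean_and_variability(left, right_jittered, 500.0)
```

In the toy example both signals have the same mean ITD, but only the fluctuating one yields a non-negligible variability, which is the property the ITDv statistic is meant to capture.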

3) The manuscript is very hard to follow. The reasoning is obscure, with mathematical operations described in words instead of equations. Thus, the substance of the paper hides within a forest of details. We counted more than 120 figures. This is too many.

We are aware this manuscript addresses a complex issue, and our intention was to provide evidence of rigor of analysis with a complete reporting of data and details. However, we understand the reviewers’ concern and have streamlined the manuscript accordingly. We have added equations to the description of the methods and revised the manuscript to simplify the language. When possible, we made analysis steps more straightforward and explicative (results and conclusions did not significantly change). We have reduced the number of figures, now proposing only 4 main and 2 supplementary ones, and simplified their layout.

4) The evidence for some of the central claims of the study is either not presented sufficiently clearly or tested only indirectly. We understand that the manuscript addresses an important question, and we understand the need to explain many rather complicated physical details and methodological constraints. One reviewer commended the authors for their ability to summarize and explain their data in a manner that is accessible to a broader readership. Nevertheless, the "anticipation of characteristic ITDv / statistics built into the brain" argument is not directly shown by the data that the authors provided (see next point).

We thank the reviewers for their considerate comment, which guided us in conducting a vast revision that substantially improved the manuscript. As stated above, we have revised the title and conclusions to clarify the notion of ‘anticipation’ and avoided using this wording too often. We provide specific details of revisions addressing this issue below.

5) We are not convinced that data that are presented can distinguish whether ITDv and ITDrc simply influence ITD processing or whether the brain anticipates the location-specific invariance of these statistics and exploits them to optimize ITD discrimination. I.e. the reported increase in variability at lateralized positions will certainly partially account for deterioration of JNDs (as will the decrease in ITDrc), and a large amount of data in support of this notion is presented.

Does changing higher-order statistics indeed deteriorate ITD JNDs (or dITD)? The experiment summarized in Figure 4 touches on this idea (loss of ITDv due to the use of in-ear earphones results in elevated thresholds); however, as the authors noted, this threshold shift could also be (partially) due to the use of naïve listeners or other factors and thus is inconclusive. Likewise, the MMN paradigm (Figure 5) is unconvincing in this respect as well. The authors first state that the paradigm design avoids the presence of ITDv. They then show that the recorded MMN data are best explained by a model including D-stat (i.e. ITDrc and ITDv) and argue that this finding is evidence for the anticipation of these statistics. While we agree that this finding potentially suggests "anticipatory processing", it is very difficult to judge the significance of the model performances without a control condition (including naturalistic free-field statistics). Some sort of comparative analysis (i.e. the manipulation of the higher-order statistics) to directly show whether neuronal processing is adapted to a specific statistical distribution would be required to justify the strong statements made in the Title, Abstract, and Introduction (last paragraph) about anticipation.

We thank the reviewers for raising these critical issues. It is true that the dITD thresholds we obtained from our earphone testing are higher than those reported by Mills (1958). An important difference, which motivated the decision to conduct our own measurements, is that Mills (1958) used free-field stimulation and estimated ITD from azimuth values. Free-field stimulation, however, allows subjects to use spatial cues other than ITD to detect sound location in azimuth, so lower thresholds are expected. This argument is supported by the fact that ITD thresholds measured with dichotic stimulation in other previous studies (e.g., Brughera et al., 2013) are also higher than those reported by Mills. In addition, we believe our reference to using naïve listeners is important, not only to explain differences with the Mills data but also to avoid the effect of training on spontaneous ITD thresholds. We have revised our justification of the differences between our data and the Mills thresholds, adding this evidence. We would like to emphasize that our study is not about the effect of ongoing statistics of experimental stimuli on ITD thresholds, which is why we did not attempt to manipulate them. At this time, it is difficult to predict the interplay of anticipated and actual statistics on ITD processing. Our group is currently conducting experiments examining this interaction, but we consider this project to be beyond the scope of the present study; adding it would make the study even more complex than it already is.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Essential revisions:

1) We all regard the use of ITD statistics, specifically, the derivative (rate of change) of the mean ITD over azimuth (ITDrc) and the standard deviation of ITD (ITDv) over time as a major strength of the paper. The finding that the Stern and Colburn, (1978) and Harper and McAlpine, (2004) models can reflect a representation of ITD statistics provides a strong connection between the neural coding and psychophysics. Thus, neural density may reflect the Fisher Information for azimuthal angle changes, and the paper could be motivated by explaining why the ITD sensitivity is not equally good within the entire physiological relevant ITD range.

Thank you for this constructive suggestion. We agree with the idea that Fisher information (FI) may be a better way to represent ITDrc and ITDv statistics. We have revised the manuscript showing FI’s predictive power of our psychophysical data and its consistency with Stern and Colburn, (1978) and Harper and McAlpine, (2004) models. As recognized by reviewers, in fact the D-stat metric (ITDrc/ITDv) used in the previous submissions was closely related to Fisher information; specifically, D-stat was approximately equal to the square root of Fisher information (√FIITD), and therefore conclusions were unchanged after revising the manuscript as suggested.
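
The stated relationship between D-stat and Fisher information can be illustrated with a minimal Python sketch. It assumes, for illustration only, that at a given azimuth and frequency the ITD is Gaussian, with a mean that changes across azimuth at rate ITDrc and a standard deviation ITDv that does not itself depend on azimuth. Under that assumption the Fisher information about azimuth is (ITDrc/ITDv)², so its square root equals D-stat; if ITDv also varied with azimuth an extra term would appear, which is one reason the equality may only be approximate. The function names and numeric values are hypothetical and are not taken from the manuscript.

```python
import math


def fisher_information_itd(itd_rc, itd_v):
    """Fisher information about azimuth carried by ITD.

    Assumes a Gaussian ITD whose mean changes at rate itd_rc per degree
    and whose standard deviation itd_v is constant across azimuth.
    """
    if itd_v <= 0:
        raise ValueError("itd_v must be positive")
    return (itd_rc / itd_v) ** 2


def d_stat(itd_rc, itd_v):
    """D-stat as described in the response: ITDrc divided by ITDv.

    The absolute value is an assumption made here so that the result
    is comparable to the (non-negative) square root of Fisher information.
    """
    if itd_v <= 0:
        raise ValueError("itd_v must be positive")
    return abs(itd_rc) / itd_v


# Hypothetical values for illustration (not measurements from the study).
itd_rc = 8.0   # microseconds of ITD per degree of azimuth
itd_v = 20.0   # microseconds, standard deviation of ITD over time

fi = fisher_information_itd(itd_rc, itd_v)
print("sqrt(FI):", math.sqrt(fi))
print("D-stat:  ", d_stat(itd_rc, itd_v))
```

Under the constant-variance assumption the two printed values agree up to floating-point rounding, which is the sense in which replacing D-stat with √FI leaves the conclusions unchanged.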

2) Our major concern is the wording, which often seems vague. To excerpt from a review: "At the moment, the manuscript reads as if the ITD statistic is actually a detection cue, like 'Oh, I now perceive the variance! The sound must come from this direction.'" We think you mean that fewer neurons code sound directions where there is less Fisher information (large ITDs, typically generated from large azimuthal angles) than where there is more information from the ears (small ITDs from mid-line directions). If this is what you mean, please try to be clearer and avoid phrases like "statistic is built in" and "the brain anticipates".

We are painfully aware that the wording did not convey our message clearly and have revised the whole manuscript to address this issue. Regarding the specific excerpt mentioned above, we must note that our paper is not conveying the idea of 'perceiving' variance and inferring from it where a sound is coming from. Rather, we are suggesting the brain possesses a prior representation of ITD statistics (ITDv and ITDrc) across frequency and location of sounds, and that this representation affects the perception and discriminability of ITD. We support the interpretation that the number of neurons coding ITD may match the natural level of Fisher information. Our paper tests that hypothesis through the Stern and Colburn and Harper and McAlpine models, showing that the distribution of neurons in these classic models of ITD discriminability is consistent with a representation of the stimulus Fisher information, such that there are fewer neurons where Fisher information is low. In this way, we argue that the human brain has a representation of ITD statistics and show that classic models of ITD coding provide a possible way that the statistics are represented.

We have extensively discussed what you mean with respect to ITD sensitivity. You imply that ITD sensitivity (and the underlying neural processing) is improved by the naturally occurring ITDv, but you don't explain how this might work, i.e. you show that the brain uses the additional variance information for ITD calculation, but whether this information necessarily follows natural statistics is not clear. Furthermore, the wording of "anticipation" implies that the brain has learned what statistics exactly define naturalistic ITDv, which is not reflected in your data. Our uncertainty about what you mean motivates point 3 below.

Thank you for the effort of helping us to improve our manuscript. We are not implying ITD sensitivity is ‘improved by the naturally occurring ITDv’ but that the brain contains a representation of ITDv that a priori affects ITD sensitivity. We assume that this is adaptive and creates a more efficient code for ITD because it makes ITD sensitivity finer for stimuli that are more informative about azimuth (stimuli that carry cues that vary less over stimulus duration and across trials and differ more across neighboring locations). Nor do we aim to imply that ITD ‘variance’ is ‘used’ for ITD calculation, but that ITD calculation is more accurate when ITDv is lower in natural stimuli. By using the word ‘anticipation’ in the previous version, we implied the brain has learned ITDv (and ITDrc) or contains an innate representation of it, which affects ITD perception for stimuli of different frequencies and location without needing to compute these statistics during stimulus presentation. However, our paper is not about how the statistics are learned. Our intention is to convey the idea that the brain undergoes early-life learning or, more likely, innately contains a representation of ITD statistics, including ITDv, and this is why we used the wording ‘built-in’ in earlier submissions. However, based on the critiques, we have abandoned this terminology and now refer to it as a “representation” of ITD statistics. Whether this representation of ITD statistics is genetically determined or learned is an interesting and still unanswered question, which we are currently studying. Our assumption is that the statistics that are captured over brain evolution are within a group of statistics determining Fisher information (ITDrc and ITDv).

A third point of confusion relates to D-stat. This has the highest correlation with the Mills data, and the way we understand (or interpret) your results and discussion is that even though no ITDv is present in the stimuli, the perception is explained by neural processing that incorporates the ITDv that would normally be present for these stimuli. And since the ITDv is not present, the brain must "know" (i.e. have learned) what these stats are. Please clarify if this is what you mean.

We find this paragraph quite close to what we are trying to suggest. The brain must ‘know’ and therefore has learned early in life, or has innately received, the information of the statistics underlying D-stat (ITDrc/ITDv). Here again, this is the reason why we used the ‘built-in’ terminology in the previous version of the manuscript, which was discarded in the current revision.

3) As you can tell from the above questions, our major concern is that the writing is still unclear. We have spent two weeks discussing what we think you mean. Therefore, we believe that there is potential to further clarify the wording and explanations. For example, with respect to understanding what is meant by "Natural ITD statistics are built into [the brain])", we would like to clarify whether you agree or disagree with the interpretation that "neural density may reflect the Fisher Information for azimuthal angle changes, etc" before you start revising the manuscript for a third time. Please clarify this along with your responses to the other points in a reply letter. Once we understand what you mean, we could suggest how/if to streamline the manuscript.

We regret the time our paper has taken from you and deeply appreciate the invaluable help. We agree with the interpretation that 'neural density may reflect the Fisher information for azimuthal angle changes', based on the assumption that Fisher information integrates both rate of change and variability of ITD, and this is what we are suggesting by showing that the Stern and Colburn and Harper and McAlpine papers could be consistent with our findings. By showing that the coding mechanisms underlying ITD sensitivity proposed by these papers reflect trends in natural statistics, our intention is to show how a representation of natural statistics could cause the observed effects on ITD discriminability.

Reviewer #1:

This is my second review of this interesting paper. The basic idea is very compelling, that natural ITD statistics are built into the neural circuits that mediate sound localization. I also like the combination of modeling and psychophysics.

Nevertheless, the presentation is quite complex. The authors have written the paper three times. Once, when first sent to the journal, prior to review, and next after the prior reviews. This version is much clearer and has less than half the number of figures.

Strengths – the use of ITD statistics, specifically, the derivative (rate of change) of the mean ITD over azimuth (ITDrc) and the standard deviation of ITD (ITDv) over time are both important insights. The authors' finding, that the Stern and Colburn, (1978) and Harper and McAlpine, (2004) models can implement a representation of ITD statistics, provided a strong connection between the neural coding and psychophysics. I feel the field has been arguing about this for decades, and that this represents a major insight.

We thank reviewer 1 for the positive feedback. We have revised the manuscript extensively, further simplifying figures and reducing their number. We have also revised the text to make the main message clearer.

Reviewer #3:

The authors substantially revised the manuscript by adding more detailed explanations for the calculations they made. They also employ more measured wording throughout the paper, now. I remain a bit skeptical about the suitability of their use of the terms "built in" and "anticipated" with regard to the conclusions of their findings.

We thank reviewer 3 for acknowledging the revisions in the previous submission. We have now revised the title and text across the manuscript, removing completely the terminology of ‘built in’ and ‘anticipated’ natural ITD statistics, and replacing them by saying that natural ITD statistics are ‘represented’ in the brain.

Reviewer #4:

[…] Why is ITD discrimination not equally good within the entire ecologically relevant range but is better for midline directions? Because Mills' data are already converted into milliseconds of ITD, it cannot just be the geometry (e.g. Kuhn, 1977). The authors argue: It is because in real life, with broadband stimulation, the interaural correlation arriving from the left and right ear at the coincidence-detecting neurons is reduced due to differing HRTF filtering. IPD and ILD fluctuations are also introduced if ITD is not perfectly compensated by internal delay line differences (e.g. differences in left and right axonal travel times). Thus, they say, why waste a large number of ITD-sensitive neurons to reduce neural noise on spatial locations where spatial cues are degraded anyway (ITDv)? In addition, ITD changes per angular location change (ITDrc) are lower at these off-centre directions relative to midline directions. In other words, one could argue that the Fisher information (change of mean divided by variance; they call it D-stat) for angular changes in source direction from these directions is low and so requires fewer neurons to be coded than the far higher information from frontal directions. The density distribution of binaural detectors so predicted, as a function of their best ITD, can be validated with ITD discrimination experiments using single tones, as these do not get ITD (and ILD) fluctuations by going through HRTFs, auditory filters and delay lines. Because their D-stat derived with broadband stimuli fits Mills' data very well, but D-stat itself cannot actually explain Mills' ITD discrimination thresholds because they were obtained with pure tones and are expressed in milliseconds (not azimuthal angle), one might conclude that whatever limits ITD discrimination with tones has been influenced by D-stat through experiencing everyday broadband sounds. I assume the authors imply that it is neural noise that became somehow adjusted to match D-stat. Neural noise per ITD is determined by the number of neurons coding this ITD (e.g. Stern and Colburn, 1978; Harper and McAlpine, 2004).
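
The reviewer's reading, that fewer neurons are devoted to ITDs carrying less Fisher information and that neural noise per ITD follows from that neuron count, can be sketched as a toy calculation. This is only an illustration under stated assumptions, not the Stern and Colburn (1978) or Harper and McAlpine (2004) model: it assumes a fixed neuron budget split across ITD channels in proportion to Fisher information, and independent neurons with equal noise whose responses are averaged, so pooled noise falls as one over the square root of the neuron count. All names and numbers are hypothetical.

```python
import math


def allocate_neurons(fisher_info, total_neurons):
    """Split a neuron budget across ITD channels in proportion to Fisher information."""
    total_fi = sum(fisher_info)
    if total_fi <= 0:
        raise ValueError("Fisher information values must sum to a positive number")
    return [total_neurons * fi / total_fi for fi in fisher_info]


def pooled_noise(single_neuron_noise, n_neurons):
    """Noise of the average of n independent neurons with equal noise (assumption)."""
    return single_neuron_noise / math.sqrt(n_neurons)


# Hypothetical Fisher information for three ITD channels, midline to lateral.
fisher_info = [4.0, 1.0, 0.25]

counts = allocate_neurons(fisher_info, total_neurons=2100)
noise = [pooled_noise(1.0, n) for n in counts]

for fi, n, s in zip(fisher_info, counts, noise):
    print(f"FI={fi:5.2f}  neurons={n:7.1f}  pooled noise={s:.3f}")
```

In this toy allocation the pooled noise is proportional to 1/√FI, so channels near the midline (high Fisher information) end up with the lowest noise and hence the finest predicted discrimination, even for stimuli that carry no ITD variability themselves.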

In the case that this is indeed what the authors intend to say, I believe their idea is well worth disseminating. I agree with the first review that the current presentation is opaque and the description vague and misleading. It is lacking a logical line of thought that tells the reader this story. The choice of wording itself is confusing and should become more precise and descriptive in order to convey the message. The narrative can and should be kept simple and specific as it is actually a straightforward story, which is certainly novel and worth exploring further.

We thank reviewer 4 for the very insightful comments and the suggested connection with Fisher information. We have replaced our D-stat metric by a Fisher information index and we have revised the terminology referring to the brain representing statistics through experience and evolution. We also extensively revised the manuscript, trying to clarify the take home message and use wording easier to understand.

I would like to give further advice: The basic idea is in my opinion sufficient for an initial publication and I would not "dilute" it with unnecessary details. This story does not need further experimental data than those well-accepted ITD threshold data by Mills, (1958). They are sufficient for the purpose. Showing their own, partly contradicting experimental data just adds confusion and distracts the reader from the main message. Similarly, the EEG data are irrelevant for the narrative and just lengthen the manuscript unnecessarily.

We agree with the reviewer that the Mills, (1958) data are powerful evidence supporting the goals of our study. We would like to argue that our experimental data are useful and provide more specific evidence regarding the effect of representing ITD statistics. We think showing that novelty detection is also influenced by ITD statistics is important. In addition, our measurements of dITD thresholds are relevant because they test the hypothesis more specifically: Mills' data are from free-field stimulation with tones, not from stimulation through earphones. In fact, our data support our hypothesis rather than contradicting it. We believe that the revised figures, with unnecessary details removed, are able to show the main point of our study, supported by evidence from multiple datasets and approaches.

I agree that one can also base the neural density on the distribution of ITD tuning curve slopes, assuming neurons are most sensitive to changes in ITD there (Harper and McAlpine, 2004). This is indeed worth mentioning because this thinking is becoming popular. But the fact that an ITD tuning curve has two slopes just complicates the matter. For the sake of keeping the argument simple, I recommend not going into detail with the slope model, but illustrating the basic idea using the classic Jeffress-based model by Stern and Colburn (1978).

We agree with the notion that the two slopes of tuning curves add complexity to the prediction of Harper and McAlpine's model, which may affect Fisher information. We believe that showing that the coding framework postulated by Harper and McAlpine, (2004) could permit a representation of ITD statistics is a strong point in our manuscript, as was pointed out by other reviewers; moreover, the current analysis shows an interesting match between ITD and modeled neural population Fisher information. The models proposed in Stern and Colburn and Harper and McAlpine rely on two different frameworks explaining, respectively, psychophysics and reported tuning properties of ITD-selective neurons. In our work we show the models are complementary. Furthermore, our analysis showing that adjusting the Harper and McAlpine model to match ITD statistics predicts observed neural responses suggests a potential functional connection between tuning properties and psychophysics for this model.

https://doi.org/10.7554/eLife.51927.sa2

Article and author information

Author details

  1. Rodrigo Pavão

    1. Dominick P. Purpura Department of Neuroscience - Albert Einstein College of Medicine, New York, United States
    2. Centro de Matemática, Computação e Cognição - Universidade Federal do ABC, Santo André, Brazil
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    rodrigo.pavao@ufabc.edu.br
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6857-8963
  2. Elyse S Sussman

    Dominick P. Purpura Department of Neuroscience - Albert Einstein College of Medicine, New York, United States
    Contribution
    Conceptualization, Resources, Data curation, Supervision, Funding acquisition, Methodology, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-1013-0621
  3. Brian J Fischer

    Department of Mathematics - Seattle University, Seattle, United States
    Contribution
    Conceptualization, Software, Funding acquisition, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-5786-0544
  4. José L Peña

    Dominick P. Purpura Department of Neuroscience - Albert Einstein College of Medicine, New York, United States
    Contribution
    Conceptualization, Resources, Data curation, Supervision, Funding acquisition, Investigation, Writing - original draft, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-6773-5640

Funding

National Institutes of Health (NS104911)

  • José L Peña
  • Brian J Fischer

National Institute on Deafness and Other Communication Disorders (DC004263)

  • Elyse S Sussman

National Institute on Deafness and Other Communication Disorders (DC007690)

  • José L Peña

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by the NIH BRAIN Initiative (Grant NS104911) and by the NIDCD (Grants DC004263 and DC007690). We thank André Cravo, Boris Marin, Fanny Cazettes, Gervasio Batista, Michael Beckert, Peter Claessens and Ruben Coen Cagli for useful discussions and comments on the manuscript, and Renee Symonds, Kelin Brace and Huizhen (Joann) Tang for support with troubleshooting during early stages of data collection. We also thank eLife journal editors and reviewers for their thoughtful and constructive comments.

Ethics

Human subjects: This study was performed in accordance with the NIH Human Subjects Policies and Guidance and with the Brazilian National Health Council, and it was approved by the Internal Review Board of the Albert Einstein College of Medicine (#1999-023) and Ethics Committee of Universidade Federal do ABC (#2968291).

Senior Editor

  1. Andrew J King, University of Oxford, United Kingdom

Reviewing Editor

  1. Catherine Emily Carr, University of Maryland, United States

Reviewers

  1. Catherine Emily Carr, University of Maryland, United States
  2. Michael Pecka, Ludwig-Maximilians Universitaet Muenchen, Germany

Publication history

  1. Received: September 17, 2019
  2. Accepted: October 9, 2020
  3. Accepted Manuscript published: October 12, 2020 (version 1)
  4. Version of Record published: November 12, 2020 (version 2)

Copyright

© 2020, Pavão et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • Page views: 481
  • Downloads: 64
  • Citations: 0

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
