
Abstract rules drive adaptation in the subcortical sensory pathway

  1. Alejandro Tabas (corresponding author)
  2. Glad Mihai
  3. Stefan Kiebel
  4. Robert Trampel
  5. Katharina von Kriegstein
  1. Faculty of Psychology, Technische Universität Dresden, Germany
  2. Max Planck Research Group Neural Mechanism of Human Communication, Max Planck Institute for Human Cognitive and Brain Sciences, Germany
  3. Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische Universität Dresden, Germany
  4. Department of Neurophysics, Max Planck Institute for Human Cognitive and Brain Sciences, Germany
Research Article
Cite this article as: eLife 2020;9:e64501 doi: 10.7554/eLife.64501

Abstract

The subcortical sensory pathways are the fundamental channels for mapping the outside world to our minds. Sensory pathways efficiently transmit information by adapting neural responses to the local statistics of the sensory input. The long-standing mechanistic explanation for this adaptive behaviour is that neural activity decreases with increasing regularities in the local statistics of the stimuli. An alternative account is that neural coding is directly driven by expectations of the sensory input. Here, we used abstract rules to manipulate expectations independently of local stimulus statistics. The ultra-high-field functional-MRI data show that abstract expectations can drive the response amplitude to tones in the human auditory pathway. These results provide the first unambiguous evidence of abstract processing in a subcortical sensory pathway. They indicate that the neural representation of the outside world is altered by our prior beliefs even at initial points of the processing hierarchy.

Introduction

Expectations have measurable effects on human perception; for instance, when disambiguating ambivalent stimuli like an object in the dark or spoken sentences in a noisy pub (de Lange et al., 2018). The predictive coding theoretical framework (Rao and Ballard, 1999; Friston, 2005) formalises the active role of expectations on perception by suggesting that sensory neurons constantly match the incoming stimuli against an internal prediction derived from a generative model of the sensory input. This strategy increases the efficiency of encoding and naturally boosts the salience of unexpected events that often have strong relevance for behaviour and survival. Although predictive coding has been shown for sensory processing in the cerebral cortex (see Kok and de Lange, 2015 for a review), the role of predictability in subcortical sensory coding is unclear (Malmierca et al., 2019; Carbajal and Malmierca, 2018; Parras et al., 2017; Malmierca et al., 2015). If coding at the subcortical pathway was based on expectations on the incoming stimuli, that would mean that the brain does not hold a veridical representation of the environment even at the very early points of the processing hierarchy.

Several studies in non-human mammals (Parras et al., 2017; Robinson et al., 2016; Ayala et al., 2015; Gao et al., 2014; Pérez-González et al., 2012; Zhao et al., 2011; Bäuerle et al., 2011; Antunes et al., 2010; Anderson et al., 2009; Malmierca et al., 2009) as well as in humans (Font-Alaminos et al., 2020; Cacciaglia et al., 2015; Cornella et al., 2015; Escera and Malmierca, 2014; Grimm et al., 2011) have shown that single neurons and neuronal ensembles of subcortical sensory pathway nuclei exhibit stimulus-specific adaptation (SSA). Neurons and neural populations showing SSA adapt to so-called standards (frequently occurring stimuli) yet show restored responses to so-called deviants (rarely occurring stimuli) (Ulanovsky et al., 2003; Antunes et al., 2010; Zhao et al., 2011). In the auditory modality, SSA is typically elicited using sequences consisting of repetitions of a standard sound (typically a pure tone of a given frequency) incorporating a single, randomly located, deviant (a pure tone of the same duration and loudness but with a different frequency). Although SSA is often taken to support the view of predictive coding (Font-Alaminos et al., 2020; Carbajal and Malmierca, 2018; Malmierca et al., 2015; Cacciaglia et al., 2015), it can also be explained in terms of habituation (Malmierca et al., 2014), where neurons show decreased responsiveness to increased regularities in their local statistics independently of their predictability (see Grill-Spector et al., 2006; Kok and de Lange, 2015 for reviews). These local effects have been proposed to be caused by synaptic fatigue (Wang et al., 2014), network habituation (Eytan et al., 2003; Mill et al., 2011), or sharpening of the receptive fields after stimulus repetition (Grill-Spector et al., 2006); they occur even at the level of the retina (Hosoya et al., 2005) and the cochlea (Yates et al., 1990).

Habituation optimises information transmission locally by reducing responsiveness to redundant information at each stage of the processing hierarchy (Chechik et al., 2006). In contrast, the predictive coding framework (Rao and Ballard, 1999; Friston, 2005) suggests that neural activity represents prediction error and that such prediction error is minimal for predictable stimuli independently of their local statistics (Malmierca et al., 2015). It has been previously speculated that predictive coding optimises the neural code globally; that is, that expectations formed in high-level stages of the processing hierarchy are used to adapt neural representations even at lower level stages (Kiebel et al., 2008).

Distinguishing between these two scenarios requires manipulating abstract predictability orthogonally to the local statistics of the stimulus (Summerfield et al., 2008). One way to do this is to control for behavioural expectations using abstract rules, an unresolved technical challenge for previous studies that mostly considered SSA in (often anaesthetised) animal models. Here, we used a novel paradigm in combination with ultra-high-field fMRI in human subjects to dissociate the habituation and predictive coding views of redundancy reduction in the auditory subcortical sensory pathway. We focused on the nuclei of the thalamus (medial geniculate body, MGB) and midbrain (inferior colliculus, IC) as they are the key nuclei of the ascending subcortical pathway that can be reliably investigated in human participants in vivo (Sitek et al., 2019).

Results

Experimental design and hypotheses

We measured blood-oxygenation-level-dependent (BOLD) responses in the human subcortical auditory pathway using 7 Tesla fMRI with a spatial resolution of 1.5 mm isotropic. We recorded a slab comprising the MGB and the IC. Nineteen subjects listened to sequences of eight pure tones (seven repetitions of a standard and one deviant tone; see Figure 1A–B). Tones were taken from a pool of three tones and used equally often as standards and as deviants. Subjects reported the position of the deviant for each sequence by pressing one button of a response box as quickly as possible.

Figure 1 (with 1 supplement). Experimental design and hypotheses.

(A) Example of a trial, consisting of a sequence of seven pure tones of a standard frequency (blue waveform) and one pure tone of a deviant frequency (fourth tone in the example; red waveform), that could be located in positions 4, 5, or 6. Subjects had to report, in each trial, the position of the deviant. Each subject completed 240 trials in total, 80 per deviant position. All tones had a duration of 50 ms and were separated by 700 ms inter-stimulus-intervals (ISIs). (B) Schematic view of the expected underlying responses in the auditory pathway for the sequence shown in A, together with the definition of the experimental variables (std0: first standard; std1: repeated standards preceding the deviant; std2: standards following the deviant; devx: deviant in position x). (C) Expected responses in the auditory pathway nuclei corresponding to the habituation (h1) and predictive coding (h2) hypotheses. Since the posterior probability of finding a deviant at locations 4, 5, or 6 after hearing 3, 4 or 5 standards is 1/3, 1/2, and 1, respectively, predictive coding predicts different BOLD responses to different deviant locations.

Expectations for each of the deviant positions were manipulated by two abstract rules that were disclosed to the subjects: (1) all sequences have a deviant, and (2) the deviant is always located in positions 4, 5, or 6. Note that, although the three deviant positions were equally likely at the beginning of the sequence, due to the two abstract rules the probability of finding a deviant in position 4 after hearing three standards is 1/3, the probability of finding a deviant in position 5 after hearing four standards is 1/2, and the probability of finding a deviant in position 6 after hearing five standards is 1. This means that participants expected deviants at all positions, but with different expectations of the probability of finding the deviant. Therefore, habituation and predictive coding make opposing predictions for the responses at the different deviant positions (Figure 1C). According to the habituation hypothesis (Figure 1C, left), deviants will elicit roughly similar responses independently of their position. Conversely, under the predictive coding view the response is hypothesised to scale inversely with the probability of finding a deviant in the target position (Figure 1C, right), rendering responses to earlier deviants stronger than responses to later deviants.
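The posterior probabilities quoted above follow directly from the two rules. A minimal sketch, assuming (as stated in the text) that the three allowed positions are equally likely at sequence onset; the function name `deviant_posterior` is illustrative and not part of the study's code:

```python
from fractions import Fraction


def deviant_posterior(position, allowed=(4, 5, 6)):
    """P(deviant at `position` | only standards were heard before it).

    Assumes a uniform prior over the allowed deviant positions and that
    every sequence contains exactly one deviant.
    """
    # Positions still possible once `position - 1` standards have been heard.
    remaining = [p for p in allowed if p >= position]
    if position not in remaining:
        return Fraction(0)
    return Fraction(1, len(remaining))


for pos in (4, 5, 6):
    print(pos, deviant_posterior(pos))
```

Under these assumptions the function returns 1/3, 1/2, and 1 for positions 4, 5, and 6, matching the values given in the text.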

Behavioural responses

All subjects showed ceiling performances to all deviant positions (90 ± 3%, 95 ± 1%, and 94 ± 2%; mean accuracies ± standard error of the mean, for deviants in positions 4, 5 and 6, respectively), indicating that subjects were attentive. Reaction times (RT=541±43 ms, RT=447±32 ms, RT=197±40 ms; for deviants at positions 4, 5, and 6, respectively) were shorter for the more expected deviants, indicating a behavioural benefit of predictability. RTs were significantly shorter for deviants at position 6 than for deviants at positions 4 and 5 (Cohen’s d=-1.9 and d=-1.6, respectively; p<0.0001; statistical significance assessed with two-tailed Ranksum tests with N=19 samples, Holm-Bonferroni corrected for three comparisons). The RT difference between deviants 4 and 5 did not reach significance (p=0.1, uncorrected; same test as above, Cohen’s d=0.22).
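For readers unfamiliar with the Holm-Bonferroni procedure used throughout the statistics, the following is a generic sketch of the step-down adjustment. It is not the authors' analysis code, and the input p-values are hypothetical:

```python
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical uncorrected p-values for three comparisons.
raw = [0.04, 0.01, 0.03]
print(holm_bonferroni(raw))
```

A comparison is declared significant at level α when its adjusted p-value is below α.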

SSA in IC and MGB

We estimated BOLD responses to the different stimuli using a general linear model (GLM) with six different conditions: the first standard (std0), the standards after the first standard but before the deviant (std1), the standards after the deviant (std2), and deviants at positions 4, 5, and 6 (dev4, dev5, and dev6, respectively; Figure 1B). The conditions std1 and std2 were parametrically modulated according to their positions to account for possible variations in the responses over subsequent repetitions (see Materials and methods and Figure 1—figure supplement 1).

In the first step of the analysis, we determined those voxels within the ICs and MGBs that showed SSA at the mesoscopic level; that is, that adapted to repeated stimuli and had restored responses to a deviant. We first identified the bilateral IC and MGB (IC and MGB ROIs; yellow patches in Figure 2) based on an atlas of the subcortical auditory pathway (Sitek et al., 2019). Within these ROIs, we tested: (1) for voxels with adapting responses to repeated standards (contrast std0>0.5std1+0.5std2) and (2) for voxels showing deviant detection, where the deviant elicited a stronger response than the repeated standards (contrast dev4>0.5std1+0.5std2); since all tones were used the same number of times as deviant and standard, dev4-0.5std1-0.5std2 is equivalent to the definition of the SSA index used in the animal literature (e.g. Parras et al., 2017). We included only dev4 in the contrast because it is the only deviant for which the habituation and predictive coding hypotheses make the same prediction. Including dev5 and dev6, which according to the predictive coding hypothesis will elicit weaker responses, would have biased the SSA regions towards the habituation hypothesis.
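The two contrasts can be written as weight vectors over the GLM conditions. The sketch below only illustrates the arithmetic; the beta estimates are hypothetical and the helper `contrast_value` is not taken from any fMRI package:

```python
# Contrast weights over the GLM conditions named in the text.
ADAPTATION = {"std0": 1.0, "std1": -0.5, "std2": -0.5}
DEVIANT_DETECTION = {"dev4": 1.0, "std1": -0.5, "std2": -0.5}


def contrast_value(betas, weights):
    """Weighted sum of condition estimates for one voxel."""
    return sum(w * betas[cond] for cond, w in weights.items())


# Hypothetical beta estimates for a single voxel (not data from the study).
betas = {"std0": 1.2, "std1": 0.4, "std2": 0.6,
         "dev4": 1.5, "dev5": 1.0, "dev6": 0.5}

print(contrast_value(betas, ADAPTATION))         # positive: adaptation
print(contrast_value(betas, DEVIANT_DETECTION))  # positive: deviant detection
```

A voxel shows adaptation or deviant detection when the corresponding contrast value is reliably greater than zero across subjects.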

Figure 2. Mesoscopic stimulus-specific adaptation (SSA) in bilateral IC and MGB.

Regions within the anatomical MGB and IC ROIs showed adaptation to the repeated standards (adaptation; blue+purple) and deviant detection (red+purple). SSA (i.e. recovered responses to a deviant in voxels showing adaptation) occurred in bilateral MGB and IC (purple). Contrast patches show the voxels thresholded at p<0.05 FDR-corrected for the number of voxels in each anatomical ROI.

We found significantly adapting (p<0.001) and deviant detecting (p<0.0002) voxels in all four anatomical ROIs (Table 1). To test for voxels with significant SSA, we combined the adaptation and deviant-detection p-values so that $p_{\mathrm{SSA}} = \max(p_{\mathrm{adaptation}}, p_{\mathrm{deviant\ detection}})$ in each voxel. Most voxels that showed adaptation also showed deviant detection ($p_{\mathrm{SSA}} < 0.0009$; purple patches in Figure 2).
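Taking the voxel-wise maximum of the two p-values means a voxel counts as showing SSA only if both tests are significant. A minimal sketch with hypothetical p-values; the function names are illustrative:

```python
def ssa_p_values(p_adaptation, p_deviant_detection):
    """Voxel-wise conjunction: the larger of the two p-values per voxel."""
    return [max(pa, pd) for pa, pd in zip(p_adaptation, p_deviant_detection)]


def ssa_voxels(p_adaptation, p_deviant_detection, alpha=0.05):
    """Indices of voxels whose combined p-value is below `alpha`."""
    p_ssa = ssa_p_values(p_adaptation, p_deviant_detection)
    return [i for i, p in enumerate(p_ssa) if p < alpha]


# Hypothetical corrected p-values for four voxels (not data from the study).
p_adapt = [0.001, 0.20, 0.03, 0.0004]
p_dev = [0.0002, 0.01, 0.04, 0.30]

print(ssa_p_values(p_adapt, p_dev))
print(ssa_voxels(p_adapt, p_dev))
```

In this toy example only the first and third voxels pass both tests.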

Table 1
Statistics and MNI coordinates of peak adaptation, deviant detection, and SSA in the four regions of interest.

All p-values are FWE-corrected for the number of voxels in each anatomical ROI and Holm-Bonferroni corrected for 12 statistical comparisons.

| Contrast | ROI | Cluster size | MNI coordinates (mm) | Peak-level p-value |
|---|---|---|---|---|
| Adaptation | Left IC | 177 voxels | [-4, -34, -11] | p=0.0003 |
| | Right IC | 196 voxels | [3, -36, -11] | p=0.0002 |
| | Left MGB | 280 voxels | [-16, -24, -6] | p=0.0001 |
| | Right MGB | 276 voxels | [18, -24, -7] | p=0.001 |
| Deviant detection | Left IC | 243 voxels | [-5, -35, -11] | p=0.0002 |
| | Right IC | 249 voxels | [4, -35, -12] | p=0.0002 |
| | Left MGB | 278 voxels | [-15, -25, -6] | p=0.0001 |
| | Right MGB | 280 voxels | [16, -23, -7] | p=0.0001 |
| SSA | Left IC | 173 voxels | [-4, -34, -11] | p=0.0002 |
| | Right IC | 194 voxels | [3, -35, -11] | p=0.0002 |
| | Left MGB | 267 voxels | [-16, -24, -6] | p=0.00009 |
| | Right MGB | 269 voxels | [15, -23, -7] | p=0.0009 |

BOLD responses correlate with the predictability of the deviants

We used the SSA ROIs of the ICs and MGBs to study the estimated BOLD responses to the different deviant positions (Figure 3). On visual inspection, the response profile showed that the more expected the deviants, the more reduced the responses, fitting with h2 (the predictive coding hypothesis; Figure 1C). Formal (Ranksum) statistical tests revealed significant differences in responses to the different deviant positions at α=0.05 for all contrasts (dev4 vs dev5, dev5 vs dev6, dev4 vs dev6) in the four ROIs, except for dev4 vs dev5 in the right IC (d=-0.69, p=0.18); the significant contrasts had p<0.04 (Holm-Bonferroni corrected for 32 comparisons) and |d|>1.00 (for statistical details see Table 2). The results of these tests show that MGB and IC mesoscopic responses to deviant tones cannot be explained by habituation only.

Figure 3 (with 1 supplement). BOLD responses in the four ROIs to the three different positions of the deviants.

Kernel density estimations of the distribution of z-scores of the estimated BOLD responses, averaged over voxels of each ROI, to the three deviant positions (dev4, dev5, dev6) in each of the four ROIs: left and right IC, and left and right MGB (IC-L, IC-R, MGB-L, MGB-R). Responses to the three different standards (std0, std1, std2) are displayed for reference. Each distribution holds 19 samples, one per subject. Error bars signal the mean and standard error of the distributions. * p<0.05, ** p<0.005, *** p<0.0005, **** p<0.00005; all p-values are Holm-Bonferroni corrected for 8×4=32 comparisons. Std0, first standard; std1: standards preceding the deviant; std2: standards following the deviant; dev4, dev5, and dev6: deviants at positions 4, 5, and 6, respectively.

Table 2
Statistics of the BOLD response differences between conditions.

Effect size is expressed as Cohen’s d. Statistical significance was evaluated with two-tailed Ranksum tests between the distributions of the mean response in each ROI across subjects (N=19). All p-values in the table are Holm-Bonferroni corrected for 4×8=32 comparisons.

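IC-L

| | dev4 | dev5 | dev6 |
|---|---|---|---|
| std0 | d=-1.04, p=0.046 | d=-0.36, p=1 | d=1.21, p=0.025 |
| std2 | d=-2.97, $p=8.6\times10^{-6}$ | | d=-0.02, p=0.95 |
| dev4 | | d=-1.05, p=0.038 | d=-2.45, $p=5.5\times10^{-5}$ |
| dev5 | | | d=-1.90, p=0.00043 |

IC-R

| | dev4 | dev5 | dev6 |
|---|---|---|---|
| std0 | d=-1.07, p=0.028 | d=-0.50, p=0.9 | d=0.93, p=0.061 |
| std2 | d=-1.88, p=0.00044 | | d=-0.16, p=1 |
| dev4 | | d=-0.69, p=0.18 | d=-1.87, p=0.001 |
| dev5 | | | d=-1.44, p=0.0053 |

MGB-L

| | dev4 | dev5 | dev6 |
|---|---|---|---|
| std0 | d=-1.46, p=0.0024 | d=-0.55, p=1 | d=1.38, p=0.017 |
| std2 | d=-3.78, $p=7.6\times10^{-6}$ | | d=-0.48, p=1 |
| dev4 | | d=-1.15, p=0.016 | d=-2.52, $p=2.8\times10^{-5}$ |
| dev5 | | | d=-1.93, p=0.00035 |

MGB-R

| | dev4 | dev5 | dev6 |
|---|---|---|---|
| std0 | d=-1.15, p=0.024 | d=-0.04, p=1 | d=1.47, p=0.0063 |
| std2 | d=-2.57, $p=5.6\times10^{-5}$ | | d=-0.17, p=1 |
| dev4 | | d=-1.26, p=0.014 | d=-2.44, $p=6.1\times10^{-5}$ |
| dev5 | | | d=-1.67, p=0.0026 |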

We tested if the responses to deviants were negatively correlated to the posterior probability of the deviant positions, as hypothesised by the predictive coding hypothesis (h2; Figure 1C). We computed the correlation between the estimated BOLD response elicited by the different deviant positions in each of the SSA ROIs of the ICs and MGBs and the probability of finding the deviant in the nth position after hearing n-1 standards (namely: 1/3, 1/2 and 1, for deviant positions 4, 5, and 6, respectively; Figure 3—figure supplement 1). We found a strong negative Pearson’s correlation between predictability and BOLD responses in all four ROIs (left IC: r=-0.33, right IC: r=-0.27, left MGB: r=-0.43, right MGB: r=-0.32; N = 19 and $p<4\times10^{-7}$ in the four ROIs).
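The correlation analysis pairs each estimated response with the posterior probability of its deviant position. A self-contained sketch of Pearson's r on hypothetical values (the responses below are invented for illustration and are not the study's data):

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Posterior probabilities for deviant positions 4, 5, and 6 (from the text),
# repeated for two hypothetical subjects.
predictability = [1 / 3, 1 / 2, 1.0, 1 / 3, 1 / 2, 1.0]
# Hypothetical BOLD estimates in the same order (dev4, dev5, dev6 per subject).
responses = [1.4, 1.1, 0.2, 1.6, 0.8, 0.4]

print(pearson_r(predictability, responses))
```

A negative coefficient indicates that more predictable deviants elicit weaker responses, which is the pattern predicted by h2.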

To explore the robustness of these findings we tested the correlation between the mean BOLD responses and deviant predictability at the single-subject level. We found negative correlations for each subject, with Pearson’s r ranging from r=-0.27 to r=-0.72 (Figure 3—figure supplement 1). The correlations were statistically significant for 14 of the 19 subjects (p>0.19 for the non-significant correlations, and $p\in[10^{-10}, 0.036]$ for the significant ones; Pearson’s test comprised N=4×4×3=48 samples, corresponding to one sample for each ROI, run, and condition).

Deviant detection can be abolished by making the deviant predictable

The correlation analyses suggested that the mesoscopic responses in the IC and MGB to the deviants can be interpreted as prediction error. If that is indeed the case, we expect that the deviant in position 6 would elicit similar responses as the standards after a deviant (std2), because the expectation of occurrence is the same (i.e. P=1). In contrast, responses to a deviant in position 4 should show similar behaviour as deviants in traditional SSA designs; namely, higher response to the deviant than to the first standard (std0; deviant detection) (Cacciaglia et al., 2015; Gao et al., 2014; Malmierca et al., 2009). The present results are consistent with both predictions: response magnitudes for dev6 and std2 are similar and the response to dev4 is significantly higher than to std0 in all four ROIs (Figure 3; Cohen’s d<-0.8; p<0.05 Holm-Bonferroni corrected for 32 comparisons; Table 2).

The negligible differences between the responses to the fully expected deviant (dev6) and the standards after the deviant (std2) fit the predictive coding framework: although the deviant is different from the standards in terms of frequency, it elicits the same response as a standard. Thus, deviance detection can be virtually abolished at the mesoscopic level by manipulating subjects’ expectations; that is, by rendering the deviant predictable.

IC and MGB respond in accordance with the predictive-coding model

To formally test the habituation (h1) and predictive coding (h2) hypotheses against each other in a voxel-by-voxel manner, we used Bayesian model comparison. Following the methodology described in Rosa et al., 2010 and Stephan et al., 2009, we first calculated the log-likelihood of each model in each voxel of the four SSA regions in each subject. Each of the two models associated different relative amplitudes to different tone positions in the sequences. The habituation model assumed an asymptotic decay of the standards and recovered responses to the deviants (Figure 4A), whereas the predictive-coding model assumed that the responses to both deviants and standards would depend on their predictability (Figure 4A; Figure 1C).
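To make the difference between the two models concrete, the sketch below generates relative amplitudes per tone position under each hypothesis. It is an illustration under stated assumptions, not the authors' model definition: the predictive coding amplitude is taken to be one minus the probability of the observed tone, the first tone is given amplitude 1 because no prediction is available, and the habituation model uses a hypothetical geometric decay (`decay=0.5`).

```python
from fractions import Fraction


def predictive_coding_amplitudes(deviant_pos, n_tones=8, allowed=(4, 5, 6)):
    """Relative amplitude per position, assumed to equal 1 - P(observed tone)."""
    amps = []
    for pos in range(1, n_tones + 1):
        if pos == 1:
            amps.append(Fraction(1))  # assumption: no prediction for tone 1
        elif pos > deviant_pos or pos not in allowed:
            amps.append(Fraction(0))  # a standard is certain at this position
        else:
            p_dev = Fraction(1, sum(1 for a in allowed if a >= pos))
            p_observed = p_dev if pos == deviant_pos else 1 - p_dev
            amps.append(1 - p_observed)
    return amps


def habituation_amplitudes(deviant_pos, n_tones=8, decay=0.5):
    """Standards decay with each repetition; the deviant response recovers."""
    amps, repetitions = [], 0
    for pos in range(1, n_tones + 1):
        if pos == deviant_pos:
            amps.append(1.0)
        else:
            amps.append(decay ** repetitions)
            repetitions += 1
    return amps


for d in (4, 5, 6):
    print(d, predictive_coding_amplitudes(d), habituation_amplitudes(d))
```

Under these assumptions the habituation model assigns the same amplitude to the deviant at every position, whereas the predictive coding model assigns 2/3, 1/2, and 0 to deviants at positions 4, 5, and 6, and non-zero amplitudes to standards at positions 4 and 5 when the deviant comes later.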

Figure 4 (with 1 supplement). Bayesian model comparison analysis of the BOLD responses.

(A) Design of the Bayesian analysis: each model was defined according to the relative amplitudes it predicted for the different positions of the standards and deviants in the tone sequences. Note that, depending on the deviant position, standards in positions 4 and 5 were not fully expected in the predictive coding model. (B) Posterior probability map of the predictive coding model. Since we only used two models to compute the posteriors, p<0.5 means that the habituation model (blue) is the most likely explanation of the data, and p>0.5 means that the predictive coding model is the most likely explanation of the data. (C) Histograms showing the prevalence of each of the two models in each of the SSA regions. See also Figure 4—figure supplement 1, which shows the posterior maps and histograms for the anatomical ROIs.

Subject-specific log-likelihoods were used to construct a posterior probability map for each model at the group level. Posterior maps showed that most voxels in both ICs and MGBs were more likely to respond according to the principles of predictive coding (red sections in Figure 4B). For the IC, this was the case for 98% (right IC) and 86% (left IC) of the voxels. Only negligible parts of the four nuclei (maximum of 3%) were more likely to be driven by habituation (blue sections in Figure 4B). We repeated the analysis without restriction to the SSA regions, but for the anatomical IC and MGB regions. The results were qualitatively the same (Figure 4—figure supplement 1).
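As a rough intuition for how per-subject log-likelihoods translate into a posterior probability per model, the sketch below uses a simplified fixed-effects combination (summing log-likelihoods over subjects) with a uniform prior over models. This is an assumption made for illustration; it is not the random-effects procedure of the methodology cited in the text, and the numbers are hypothetical.

```python
import math


def group_log_likelihood(per_subject):
    """Sum log-likelihoods over subjects, model by model (fixed effects)."""
    return [sum(column) for column in zip(*per_subject)]


def model_posteriors(log_likelihoods):
    """Posterior over models under a uniform prior (numerically stable)."""
    peak = max(log_likelihoods)
    weights = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical log-likelihoods in one voxel: rows are subjects,
# columns are (habituation, predictive coding).
voxel = [(-12.0, -11.2), (-10.5, -10.1), (-11.0, -11.3)]

posterior = model_posteriors(group_log_likelihood(voxel))
print(posterior)  # second entry > 0.5 favours predictive coding
```

With two models, a posterior above 0.5 for one model means it is the more likely explanation of the data in that voxel; with three models the corresponding chance level is 1/3.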

SSA is present and driven by predictive coding in both primary and secondary MGB

Next, we tested whether voxels showing SSA and responding to the principles of predictive coding were present in the primary (lemniscal) or only in the secondary (non-lemniscal) sections of the auditory pathway. Whilst the primary pathway is characterised by neurons that carry auditory information with high fidelity, the secondary pathway typically shows contextual and multisensory effects (Hu, 2003). Both the MGB and the IC comprise subregions that belong to either the primary or the secondary pathway. Distinguishing between the primary and secondary subsections of the IC and MGB non-invasively is technically challenging. A recent study (Mihai et al., 2019) distinguished two distinct tonotopic gradients of the MGB. The ventral tonotopic gradient was identified as the ventral MGB (vMGB), which is the primary or lemniscal subsection of the MGB (see Figure 5A, green). Although the parcellation is based only on the topography of the tonotopic axes and their anatomical location, the region is the best approximation to date of the vMGB in humans.

Figure 5. Analyses of BOLD responses in ventral MGB.

(A) Masks from Mihai et al., 2019 of the ventral MGBs (green); blue marks the remainder of the anatomical MGB ROIs. (B) The distribution of the SSA index SI=(dev-std)/(dev+std) across each of the two subdivisions of the MGB ROIs. (C) Histograms showing the prevalence of the habituation (hab) and predictive coding (pred) models in each of the subdivisions.

First, we assessed whether the strength of SSA is comparable in the ventral tonotopic gradient and in the rest of the MGB ROIs. Following the procedures described in previous literature (e.g. Ulanovsky et al., 2003), we computed the SSA index SI=(dev4-std1/2-std2/2)/(dev4+std1/2+std2/2) for each voxel in each of the subdivisions of the MGB. Similar distributions of the SI were observed in the vMGB and the rest of the MGB (Figure 5B). We also observed similar distributions of the posterior probability of the habituation and predictive coding model across the voxels of each of the subdivisions (Figure 5C). Predictive coding was the most likely underlying model in the entire left and right vMGB, and in 97% and 93% of the voxels not belonging to the ventral subdivision in the left and right MGB, respectively. We conclude that both the vMGB and the rest of the MGB are dominated by responses driven by predictive coding.
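The SSA index defined above can be computed per voxel as follows. This is a minimal sketch with hypothetical response estimates; the function name `ssa_index` is illustrative:

```python
def ssa_index(dev4, std1, std2):
    """SI = (dev4 - std1/2 - std2/2) / (dev4 + std1/2 + std2/2)."""
    std = 0.5 * std1 + 0.5 * std2
    denominator = dev4 + std
    if denominator == 0:
        raise ValueError("SI is undefined when dev4 + std is zero")
    return (dev4 - std) / denominator


# Hypothetical response estimates for one voxel (not data from the study).
print(ssa_index(dev4=1.5, std1=0.4, std2=0.6))
```

The index is positive when the deviant elicits a stronger response than the averaged standards and zero when both responses are equal.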

Deviant detection can be elicited by unpredictable standards

So far, we assumed that not only the responses to deviants, but also those to standards, were modulated by predictability (Figures 4 and 5). This means that unexpected standards elicit stronger responses than expected standards; that is, deviant detection is not restricted to deviant tones but applies more generally to unexpected tones. To validate this choice formally, we ran a further Bayesian model comparison including a model that we call the deviant-only predictive coding model, where only the responses to deviants but not the standards are modulated by predictability (see Figure 6A).

Figure 6. Bayesian model comparison of a variation of the predictive coding model.

(A) Design: relative amplitudes assumed by the habituation, predictive coding, and deviant-only predictive coding model. The first two models are identical to the ones defined in Figure 4A. (B) Posterior probability map of the deviant-only predictive coding model. Since three models were considered when computing the posteriors, P<0.33 means that the deviant-only predictive coding model is not the most likely explanation of the data, but P>0.33 does not necessarily mean that the deviant-only predictive coding model is the most likely explanation of the data. (C) Histograms showing the prevalence of each of the three models in each of the SSA regions.

BOLD responses in most voxels (a minimum of 96%) of the four nuclei are best explained by the level of predictability of both the deviants and standards (Figure 6B and C).

Discussion

We tested two opposing views on the mechanism of sensory processing in the auditory midbrain (IC) and auditory thalamus (MGB). In one view, sensory processing can be explained by habituation to local stimulus statistics (Figure 1C, h1), in the other by predictive coding (Figure 1C, h2). The study included a novel paradigm that orthogonalised local stimulus statistics and subjects’ expectations. We used ultra-high-resolution 7-Tesla fMRI optimised for imaging the IC and MGB. There were three key findings: First, mean BOLD responses in IC and MGB correlated with the subjects’ expectations of the probability of the stimulus occurrence but not with the local stimulus statistics. Second, events deviating from local stimulus statistics did not lead to increased responses in IC and MGB if subjects expected these events. Third, Bayesian model comparison showed that the responses of the majority of voxels in IC and MGB are best explained by a predictive coding model. Together, the findings indicate that sensory processing in auditory midbrain and thalamus is mostly driven by expectations of the subject and not by regularities in the local stimulus statistics.

Several previous studies have interpreted response properties of subcortical sensory nuclei within a predictive coding framework (Font-Alaminos et al., 2020; Carbajal and Malmierca, 2018; Parras et al., 2017; Malmierca et al., 2015; Cacciaglia et al., 2015; Ulanovsky et al., 2003). These studies have, however, used designs where predictions were generated based on the regularities of the local stimulus statistics. Although mesoscopic responses to violation of abstract rules have been reported in the sensory cortex (e.g., Näätänen et al., 1978; Paavilainen, 2013; Kok and de Lange, 2015; de Lange et al., 2018), they have not been reported in subcortical nuclei to date. Our study breaks with a long tradition of research on subcortical SSA (Font-Alaminos et al., 2020; Parras et al., 2017; Robinson et al., 2016; Cacciaglia et al., 2015; Duque and Malmierca, 2015; Ayala et al., 2015; Cornella et al., 2015; Gao et al., 2014; Anderson and Malmierca, 2013; Ayala et al., 2012; Pérez-González et al., 2012; Zhao et al., 2011; Bäuerle et al., 2011; Antunes and Malmierca, 2011; Antunes et al., 2010; Anderson et al., 2009; Malmierca et al., 2009; Yu et al., 2009) by defining the predictions based on abstract rules that were orthogonal to the regularity of the stimulus local statistics. Only one study attempted to investigate the impact of abstract rules on SSA using alternating tone sequences in anaesthetised rats (Malmierca et al., 2019). They found that only around 5% of the measured units (comparable to the false discovery rate α=0.05 of the study) showed deviant responses to violations of the abstract rules.

A study on SSA in the rodent auditory system (Parras et al., 2017) where predictability was controlled using local stimulus statistics reported that structures at increasingly higher stages of the auditory pathway show increasing amounts of prediction error. The authors defined prediction error as the responses to sounds that deviate from the predictions in comparison to the responses to those same sounds when there were no available predictions. The authors concluded that the IC, MGB, and AC form a hierarchical network of prediction error. Although the studies use different paradigms in different species, a similar analysis can be done in our data by comparing the responses to the most unexpected deviant (dev4) with those for which no prediction is available; that is, the first standard in the sequences (std0). Responses to dev4 are higher than responses to std0 in both IC and MGB (Table 2 and Figure 3). This contrasts with the results of Parras et al. (2017), where the IC showed little or no difference between the responses elicited by deviant and control sounds.

Nuclei in the auditory pathway are organised in primary (or lemniscal) and secondary (or non-lemniscal) subdivisions. The lemniscal division of the auditory pathway has narrowly tuned frequency responses and is considered responsible for the transmission of bottom-up information; the non-lemniscal division presents more widely tuned frequency responses and is also involved in multisensory integration (Hu, 2003). In the animal neurophysiology literature the strongest SSA is typically reported in non-lemniscal areas; that is, in dorsal and medial sections of the MGB (Antunes et al., 2010; Antunes and Malmierca, 2011; Duque et al., 2014) and the cortices of the IC (Pérez-González et al., 2012; Gao et al., 2014; Duque et al., 2014; Ayala and Malmierca, 2015; Ayala and Malmierca, 2018). Subdivisions of IC and MGB are notoriously difficult to assess in humans in vivo because of their small size and deep location within the brain (Moerel et al., 2015; Mihai et al., 2019). Nevertheless, our results showed that the SSA index had comparable distributions in the ventral and dorsal subdivisions of the MGB (Figure 5B). Moreover, our results showed that MGB regions driven by the predictive coding model were predominant in the ventral (lemniscal) tonotopic gradient of the MGB (Mihai et al., 2019) as well as in the rest of the MGB. Regarding the IC, there is to date no available anatomical or functional atlas delimiting its central section (lemniscal) from its cortex (non-lemniscal). Nevertheless, our results show that the predictive coding model is the most likely generator of the data across the entire nuclei. We therefore assume that predictive coding underlies encoding in both lemniscal and non-lemniscal subdivisions of the IC and MGB.

This fundamental difference with the animal literature might stem from a number of reasons. First, our design involved an active task: lemniscal pathways might only be strongly modulated by predictions when they carry behaviourally relevant sensory information. Second, the modulation of the subcortical pathways might be fundamentally different in humans compared to other mammals. Last, given the strength of the SSA effects reported in this study, it is possible that regions with weak SSA might have been contaminated with signal stemming from areas with strong SSA due to smoothing and interpolation necessary for the analysis of fMRI data.

It is tempting to hypothesise that the predictions on the sensory input that drive the subcortical responses in our experiment are generated in the cerebral cortex. This hypothesis would be consistent with the strong feedback connections from cerebral cortex to the subcortical sensory pathway (Winer, 1984; Winer, 2005). It would also be consistent with the results from animal studies where the deactivation of unilateral auditory cortex (Bäuerle et al., 2011) or the TRN (Yu et al., 2009) led to reduction of SSA in the ventral MGB (but also see contradictory findings in non-lemniscal MGB, Antunes and Malmierca, 2011, and non-lemniscal IC, Anderson and Malmierca, 2013). Our paradigm was optimised to study prediction error rather than the generation of such predictions, and we lacked the resolution to study cortical responses in enough detail as to disentangle activity representing predictions from activity representing prediction error. Thus, although it is unlikely that subcortical sensory nuclei like the MGB or IC are able to generate predictions based on the task instructions, whether these predictions originate in the cerebral cortex remains an open question.

Higher BOLD responses to attended in contrast to unattended sounds are present in auditory cortex (Lee et al., 2014; Paltoglou et al., 2011), and to a much weaker extent also in the IC (Rinne et al., 2007; Rinne et al., 2008; Varghese et al., 2015; Riecke et al., 2018). Our results showed that responses to fully expected deviants at position 6 (posterior probability of 1) are strongly attenuated with respect to responses to deviants in positions where standards might also occur. This strong attenuation might be interpreted not only in terms of predictive coding, but also in terms of attentional gain modulation: deviants with a posterior probability of 1 might not need to be examined as carefully as deviants with low posterior probability, because their occurrence is guaranteed by task design. Two independent arguments support the interpretation that predictive coding underlies our results. First, although both conditions dev4 and dev5 required full attention of the participants and are thus not affected by any potential changes in the attentional state of the subject, BOLD response differences for these two conditions had strong effect sizes, ranging from d=-1.36 to d=-0.69 (see Table 2).

Second, our results showed that deviance responses were virtually abolished for dev6 (Table 2). From previous work in animals, we know that deviance detection is salient even in anaesthetised animals (Malmierca et al., 2015) and effect sizes of SSA in the IC are comparable in the awake and anaesthetised mouse (Duque and Malmierca, 2015). Using fMRI in humans, Cacciaglia and colleagues (Cacciaglia et al., 2015) showed deviance detection in the human subcortical auditory pathway in passive listening conditions. Despite the much lower BOLD sensitivity of their experimental setup in comparison to ours, they reported a t-statistic for the deviant versus repeated standard contrast (e.g., in the left IC) of t(11)=5.24, corresponding to an effect size of d=3.15. In contrast, our effect sizes for the dev6 versus std2 contrast range from d=0.26 (left IC) to d=-0.74 (right MGB; Table 2). If the dev6 response in our study had been influenced by lack of attention, we would still have expected deviance responses similar to those in Cacciaglia and colleagues' passive listening design. Only by interpreting the BOLD responses in our data as a correlate of predictability based on abstract rules can we explain why we measured similar responses to dev6 and std2 in our paradigm.

The present study focused on auditory sensory pathway nuclei. Stimulus-specific adaptation at early stages of the sensory pathways has, however, also been reported in the visual (Dhruv and Carandini, 2014), olfactory (Fletcher and Wilson, 2003), and somatosensory (Maravall et al., 2013) pathways. Predictive coding serves to optimise the dynamic range of sensory systems (Brenner et al., 2000), and to maximise information transmission in the neural code by reducing the responses to expected stimuli (Fairhall et al., 2001) and to redundant portions of the incoming sensory signal (Huang and Rao, 2011). We speculate that abstract expectations are used as well in other sensory modalities to facilitate sensory processing in subcortical sensory nuclei.

Given the importance of predictive coding for sensory processing (e.g., Sohoglu and Davis, 2016; Davis and Johnsrude, 2007), atypical predictive coding in the subcortical sensory pathway is expected to result in profound repercussions at the cognitive level (McFadyen et al., 2020). For instance, individuals with developmental dyslexia, a disorder that is characterised by difficulties with processing speech sounds, have altered adaptation dynamics to stimulus regularities (Perrachione et al., 2016; Ahissar et al., 2006; Chandrasekaran et al., 2009), altered responses in the left MGB (Díaz et al., 2012; Chandrasekaran et al., 2009), and atypical left hemispheric cortico-thalamic pathways (Müller-Axt et al., 2017; Tschentscher et al., 2019). Understanding the mechanisms underlying SSA and its relation to sensory processing in subcortical sensory pathways could have valuable applications in clinical contexts.

Materials and methods

This study was approved by the Ethics committee of the Medical Faculty of the University of Leipzig, Germany (ethics approval number 273/14-ff). All listeners provided written informed consent and received monetary compensation for their participation.

Participants

Nineteen German native speakers (12 female), aged 24 to 34 years (mean 26.6), participated in the study. None of them reported a history of psychiatric or neurological disorders, hearing difficulties, or current use of psychoactive medications. Normal hearing abilities were confirmed with pure tone audiometry (250 Hz to 8000 Hz; Madsen Micromate 304, GN Otometrics, Denmark) with a threshold equal to or below 25 dB SPL. Participants were also screened for dyslexia (rapid automatised naming test of letters, numbers, and objects [Denckla and Rudel, 1974]; German LGVT 6–12 test [Schneider et al., 2007]) and autism (Autism Spectrum Quotient [Baron-Cohen et al., 2001]). All scores were within the neurotypical range (RAN: maximum of 3.5 errors and RT=30 seconds across the four categories; AQ: all participants under a score of 23, below the cut-off value of 32; LGVT scores: all subjects were performing in the normal range). As we had no estimations of the possible sizes of the effects, we maximised our statistical power by recruiting as many participants as we could fit in the MRI measurement time allocated to the study. This number was fixed to nineteen before we started data collection.

Experimental paradigm

All sounds were 50 ms long (including 5 ms in/out ramps) pure tones of frequencies 1455 Hz, 1500 Hz, or 1600 Hz, corresponding to three local minima of the power spectrum of the noise produced by the MRI during the scanning. From those three tones, we constructed six standard-deviant frequency combinations that were used the same number of times across each run, so that all tones were used the same number of times as deviants and as standards. We used three rather than two tones so that each run contained six rather than two different standard-deviant combinations, rendering the task more engaging.

Each tone sequence consisted of seven repetitions of the standard stimulus and a single event of the deviant stimulus. Stimuli were separated by 700 ms inter-stimulus-intervals (ISI), amounting to a total duration of 5300 ms per sequence. To choose the ISI, we ran a pilot behavioural study where we measured the reaction time to deviants 4, 5, and 6 with different ISIs. We took the shortest possible ISI that allowed the subjects to predict the fully expected deviant, as revealed by a significant behavioural benefit in the RT for a deviant located in position 6.
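
The sequence timing can be checked with a short Python sketch. It assumes that the ISI is measured from the offset of one tone to the onset of the next, which is the reading that reproduces the stated 5300 ms; the constant and function names are illustrative and are not taken from the study's code.

```python
TONE_MS = 50   # tone duration, including the 5 ms ramps
ISI_MS = 700   # assumed to be the silent gap from tone offset to next onset
N_TONES = 8    # seven standards plus one deviant


def tone_onsets_ms(n_tones=N_TONES, tone_ms=TONE_MS, isi_ms=ISI_MS):
    """Onset of each tone relative to the start of the sequence."""
    return [i * (tone_ms + isi_ms) for i in range(n_tones)]


def sequence_duration_ms(n_tones=N_TONES, tone_ms=TONE_MS, isi_ms=ISI_MS):
    """Time from the first tone onset to the last tone offset."""
    return n_tones * tone_ms + (n_tones - 1) * isi_ms


onsets = tone_onsets_ms()
duration = sequence_duration_ms()  # 8 * 50 + 7 * 700 = 5300
```

Under this assumption, a deviant in position 6 starts 3750 ms after the onset of the sequence.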

In each trial of the fMRI experiment, subjects listened to one tone sequence and reported, as fast and accurately as possible using a button box with three buttons, the position of the deviant (4, 5, or 6). The inter-trial-interval (ITI) was jittered so that deviants were separated by an average of 5 s, up to a maximum of 11 s, with a minimum ITI of 1500 ms. We chose such ITI properties to maximise the efficiency of the response estimation of the deviants (Friston et al., 1999), while keeping a sufficiently long ITI to ensure that the sequences belonging to separate trials were not confounded.

The experiment consisted of four runs with the same task. Each run contained 6 blocks of 10 trials. The 10 trials in each block used one of the six possible combinations of pure tones, so that all the sequences within each block had the same standard and deviant. Thus, within a block only the position of the deviant was unknown, while the frequency of the deviant was known. The order of the blocks within the experiment was randomised. The position of the deviant was pseudorandomised across all trials in each run so that each deviant position happened exactly 20 times per run but an unknown number of times per block. This constraint allowed us to keep the same a priori probability for all deviant positions in each block. In addition, there were 23 silent gaps of 5300 ms duration (i.e., null events of the same duration as the tone sequences) randomly located in each run (Friston et al., 1999).
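
One way to satisfy the constraint on deviant positions is sketched below in Python. The exact randomisation procedure used in the study is not described here, so the shuffle-then-split approach and all names are assumptions made for illustration only.

```python
import random

POSITIONS = (4, 5, 6)      # possible deviant positions
TRIALS_PER_POSITION = 20   # each position occurs exactly 20 times per run
N_BLOCKS = 6
TRIALS_PER_BLOCK = 10


def draw_run(seed=None):
    """Return one run as a list of blocks of deviant positions.

    Positions are balanced across the run (20 each) but not within blocks,
    so the count of each position inside a block is left to chance.
    """
    rng = random.Random(seed)
    trials = [p for p in POSITIONS for _ in range(TRIALS_PER_POSITION)]
    rng.shuffle(trials)
    return [
        trials[b * TRIALS_PER_BLOCK:(b + 1) * TRIALS_PER_BLOCK]
        for b in range(N_BLOCKS)
    ]


blocks = draw_run(seed=0)
```

Because the balance is enforced at the level of the run rather than the block, a listener cannot infer the remaining positions by counting within a block.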

Each run lasted around 10 minutes, depending on the reaction times of the participant. The runs were separated by breaks of a minimum of 1 minute, during which the subjects could rest. Fieldmaps and a whole-head EPI (see Data acquisition) were acquired between the second and third run. The first run was preceded by a practice run of four randomly chosen trials to ensure the subjects had understood the task. We acquired fMRI during the practice run in order to allow the subjects to undertake the training with MRI-noise. As we had no estimations of the possible sizes of the effects, we maximised our statistical power by measuring as many trials as we could fit within the expected engagement span of the participants, which we estimated to be around 45 minutes.

Data acquisition

MRI data were acquired using a Siemens Magnetom 7 Tesla scanner (Siemens Healthineers, Erlangen, Germany) with an eight-channel head coil (RAPID Biomedical, Rimpar, Germany).

Functional MRI data were acquired using echo planar imaging (EPI) sequences. We used a field of view (FoV) of 132 mm × 132 mm and partial coverage with 30 slices. This volume was oriented in parallel to the superior temporal gyrus such that the slices encompassed the IC, the MGB, and the superior temporal gyrus. In addition, we acquired three volumes of an additional whole-head EPI with the same parameters (including the FoV) and 80 slices during resting to aid the coregistration process (see Data preprocessing).

The EPI sequence had the following acquisition parameters: TR = 1600 ms, TE = 19 ms, flip angle 65°, GRAPPA with acceleration factor 2 (Griswold et al., 2002), 33% phase oversampling, matrix size 88 × 88, FoV 132 mm × 132 mm, phase partial Fourier 6/8, voxel size 1.5 mm isotropic, interleaved acquisition, and anterior to posterior phase-encode direction. During fMRI data acquisition, heart rate and respiration rate were acquired using a BIOPAC MP150 system (BIOPAC Systems Inc, Goleta, CA, USA).

Structural images were recorded using an MP2RAGE (Marques et al., 2010) T1 protocol with 700 µm isotropic resolution, TE = 2.45 ms, TR = 5000 ms, TI1 = 900 ms, TI2 = 2750 ms, flip angle 1 = 5°, flip angle 2 = 3°, FoV = 224 mm × 224 mm, GRAPPA acceleration factor 2.

Stimuli were presented using MATLAB (The Mathworks Inc, Natick, MA, USA; RRID:SCR_001622) with the Psychophysics Toolbox extensions (Brainard, 1997) and delivered through an MrConfon amplifier and headphones (MrConfon GmbH, Magdeburg, Germany). Loudness was adjusted independently for each subject before starting the data acquisition to a comfortable level.

Data preprocessing

The preprocessing pipeline was coded in Nipype 1.1.2 (Gorgolewski et al., 2011) (RRID:SCR_002502), and carried out using tools of the Statistical Parametric Mapping toolbox, version 12 (SPM; RRID:SCR_007037); Freesurfer (RRID:SCR_001847), version 6 (Fischl et al., 2002); the FMRIB Software Library, version 5 (FSL; RRID:SCR_002823) (Jenkinson et al., 2012); and the Advanced Normalisation Tools, version 2.2.0 (ANTS; RRID:SCR_004757) (Avants et al., 2011). All data were coregistered to the Montreal Neurological Institute (MNI) MNI152 1 mm isotropic symmetric template (RRID:SCR_014087).

First, we realigned the functional runs. We used SPM’s FieldMap Toolbox to calculate the geometric distortions caused in the EPI images due to field inhomogeneities. Next, we used SPM’s Realign and Unwarp to perform motion and distortion correction on the functional data. Motion artefacts, recorded using SPM’s ArtifactDetect, were later added to the design matrix (see Estimation of the BOLD responses).

Next, we processed the structural data. We first masked the structural data to eliminate voxels that contained air, scalp, skull, and cerebrospinal fluid. The masks were computed by segmenting the white matter with SPM’s Segment and applied with FSLmaths. Then, we used Freesurfer’s recon-all routine to calculate the boundaries between grey and white matter (these are necessary to register the functional data to the structural images) and ANTs to compute the transformation between the structural images and the MNI152 symmetric template.

Last, we coregistered the functional data to the MNI152 space. The transformation between the functional runs and the structural image was computed using Freesurfer’s BBregister, using the boundaries between grey and white matter of the structural data and the whole-brain EPI as an intermediate step. The final functional-to-MNI transformation, computed as the concatenation of the functional-to-structural and structural-to-MNI transformations, was then applied using ANTs. Note that, since the resolution of the MNI space (1 mm isotropic) was higher than the resolution of the functional data (1.5 mm isotropic), the transformation resulted in a spatial oversampling.
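
The size of this oversampling follows directly from the two voxel sizes. The Python sketch below only restates that arithmetic; the names are illustrative.

```python
FUNC_VOXEL_MM = 1.5      # isotropic voxel size of the functional data
TEMPLATE_VOXEL_MM = 1.0  # isotropic voxel size of the MNI152 template


def oversampling_factor(src_mm, dst_mm):
    """Number of destination voxels covered by one source voxel (by volume)."""
    return (src_mm / dst_mm) ** 3


factor = oversampling_factor(FUNC_VOXEL_MM, TEMPLATE_VOXEL_MM)  # 1.5 ** 3 = 3.375
```

Each acquired voxel is therefore spread over roughly 3.4 template voxels, so neighbouring voxels in MNI space are not independent measurements.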

All the preprocessing parameters, including the smoothing kernel size, were fixed before we started fitting the general linear model (GLM) and remained unchanged during the subsequent steps of the data analysis.

Physiological (heart rate and respiration rate) data were processed by the PhysIO Toolbox (Kasper et al., 2017), which computes the Fourier expansion of each component along time and adds the coefficients as covariates of no interest in the model’s design matrix.

Estimation of the BOLD responses

First level and second level analyses were coded in Nipype and carried out using SPM. Statistical analyses of the model estimations in the SSA ROIs were carried out using custom code in MATLAB. BOLD data acquired during the practice run were not included in the analysis.

The coregistered data were first smoothed using a 2 mm full-width half-maximum Gaussian kernel with SPM’s Smooth.
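
For reference, the full-width half-maximum (FWHM) of a Gaussian relates to its standard deviation by FWHM = 2·sqrt(2·ln 2)·σ. The Python sketch below applies this standard identity to the 2 mm kernel; it is not part of the study's pipeline and the names are illustrative.

```python
import math


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full-width half-maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


sigma_mm = fwhm_to_sigma(2.0)  # about 0.85 mm
sigma_vox = sigma_mm / 1.0     # same value in voxels on the 1 mm template grid
```

A 2 mm FWHM thus corresponds to a standard deviation of about 0.85 mm, which is less than one acquired voxel (1.5 mm).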

The first level GLM’s design matrix included six conditions: first standard (std0), standards before the deviant (std1), standards after the deviant (std2), and deviants in positions 4, 5, and 6 (dev4, dev5, and dev6, respectively; Figure 1). Conditions std1 and std2 were modelled using linear parametric modulation (O'Doherty et al., 2007), whose linear factors were coded according to the position of the sound within the sequence (see Figure 1—figure supplement 1). We modelled the first standard separately from the remaining standards preceding the deviant so that we could perform a contrast comparing the responses to the first and the adapted standards to locate voxels showing adaptation. We modelled the standards preceding and following the deviant separately because we cannot propose a set of linear factors simultaneously valid for both std1 and std2. On top of the main regressors, the design matrix also included the physiological PhysIO and artefact regressors of no interest.

Definition of the anatomical and SSA ROIs

We used a recent anatomical atlas of the subcortical auditory pathway (Sitek et al., 2019) to locate the voxels corresponding to the left IC, right IC, left MGB, and right MGB, respectively. The atlas comprises three different definitions of the ROIs calculated using (1) data from the big brain project, (2) postmortem data, and (3) in vivo fMRI data. We used the mask computed with the fMRI data because this data collection method resembled our experimental setup the most.

We used the coefficients of the GLM or beta estimates from the first level analysis to calculate the adaptation (Figure 2, blue patches) and deviant detection (red patches) ROIs, defined as the sets of voxels within the IC and MGB ROIs that responded significantly to the contrasts std0>0.5std1+0.5std2 and dev4>0.5std1+0.5std2, respectively. Significance was defined as p<0.05, false-discovery-rate (FDR)-corrected for the number of voxels within each of the IC/MGB ROIs. SSA voxels are defined as voxels that show both adaptation and deviant detection; thus, we calculated an upper bound of the p-value maps for the SSA contrast as the maximum of the uncorrected p-values associated with the adaptation and deviant detection contrasts. The SSA ROIs (Figure 2, purple patches) were calculated by FDR-correcting and thresholding the resulting p-maps at α=0.05. All calculations were performed using custom-made scripts (see Data and code availability).
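
The conjunction step can be illustrated with a minimal Python sketch. It takes the voxel-wise maximum of two uncorrected p-values and then applies an FDR threshold. The text does not state which FDR procedure was used, so the Benjamini-Hochberg step-up rule below is an assumption, and the function names and p-values are made up for illustration.

```python
def conjunction_p(p_adapt, p_dev):
    """Upper bound on the conjunction p-value: voxel-wise maximum of both maps."""
    return [max(a, b) for a, b in zip(p_adapt, p_dev)]


def fdr_bh_mask(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up rule (assumed procedure); True marks survivors."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value lies under the BH line.
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank
    mask = [False] * m
    for idx in order[:cutoff_rank]:
        mask[idx] = True
    return mask


# Hypothetical uncorrected p-values for four voxels of one ROI.
p_adaptation = [0.001, 0.004, 0.20, 0.03]
p_deviant = [0.002, 0.30, 0.01, 0.02]

p_ssa = conjunction_p(p_adaptation, p_deviant)
ssa_mask = fdr_bh_mask(p_ssa, alpha=0.05)
```

In this toy example only the first voxel survives: it is the only one whose larger p-value (0.002) stays under the corrected threshold, whereas voxels that are significant for just one of the two contrasts are excluded.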

Bayesian model comparison

The Bayesian analysis of the data also consisted of first and second level analyses. In the first level, we used SPM via Nipype to compute the log-evidence in each voxel of each subject for each of the four models: habituation, predictive coding, task engagement, and deviant-only predictive coding. The models were described using a single regressor with parametric modulation whose coefficients corresponded to a simplified view of the expected responses according to each model. The expected responses of each model were the same in all trials that had the same deviant position.

The values assigned to each stimulus in the models are schematically shown in Figures 4 and 6. In the habituation model, the amplitude was one for the first standard in the sequences (std0 in the regression models) and the deviant, 1/n for standards n = 2, 3, …, and 1/(n − 1) for the standards n = d + 1, d + 2, …, where d is the position of the deviant; for example, tones in a sequence with d = 6 have amplitudes [1, 1/2, 1/3, 1/4, 1/5, 1, 1/5, 1/6]. For the predictive coding model, the amplitude of the first standard was set to 0.5 and, for the rest of stimuli, to 1 − P, where P is the probability of occurrence of the stimulus; for example, tones in a sequence with d = 6 have amplitudes [0.5, 0, 0, 0.66, 0.5, 0, 0, 0]. For the deviant-only predictive coding model, amplitudes were set as in the predictive coding model, but turning the standards in positions 4 and 5 also to zero; for example, tones in a sequence with d = 6 have amplitudes [0.5, 0, 0, 0, 0, 0, 0, 0]. Amplitudes of all the models were normalised to have a mean of zero and a variance of one along the entire run before fitting.
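
The 1 − P rule can be made concrete for the deviants, whose probabilities follow directly from the task design. The Python sketch below assumes a uniform prior over positions 4, 5, and 6 and conditions on no deviant having been heard earlier in the sequence. It covers the deviant amplitudes only, and it uses the population variance for the normalisation, which is also an assumption; all names are illustrative.

```python
from fractions import Fraction

POSITIONS = (4, 5, 6)  # possible deviant positions, equally likely a priori


def deviant_posterior(position, positions=POSITIONS):
    """P(deviant at `position` | no deviant at any earlier position)."""
    remaining = [p for p in positions if p >= position]
    return Fraction(1, len(remaining))


def deviant_amplitude(position):
    """Predictive coding amplitude 1 - P for a deviant at `position`."""
    return 1 - deviant_posterior(position)


def zscore(values):
    """Rescale to zero mean and unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


amplitudes = {p: deviant_amplitude(p) for p in POSITIONS}
normalised = zscore([float(a) for a in amplitudes.values()])
```

Under these assumptions the deviants receive amplitudes of 2/3, 1/2, and 0 at positions 4, 5, and 6, so a fully expected deviant at position 6 is predicted to elicit no prediction error.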

Log-evidence maps were combined using custom scripts (see Data and code availability) and following the procedure described in Rosa et al., 2010 and Stephan et al., 2009 to compute the posterior probability maps associated to each model. Histograms shown in Figures 4 and 6 are kernel-density estimates computed with the distribution of the posterior probabilities across voxels for each of the SSA ROIs.

Data availability

Derivatives (beta maps and log-likelihood maps, computed with SPM) and all code used for data processing and analysis are publicly available at https://doi.org/10.17605/OSF.IO/F5TSY.

The following data sets were generated
Tabas A (2020) Predictive processing in the human subcortical auditory pathway. Open Science Framework. https://doi.org/10.17605/OSF.IO/F5TSY

References

Friston K (2005) A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences 360:815–836. https://doi.org/10.1098/rstb.2005.1622
Huang Y, Rao RPN (2011) Predictive coding. Wiley Interdisciplinary Reviews: Cognitive Science 2:580–593. https://doi.org/10.1002/wcs.142
Kok P, de Lange FP (2015) Predictive Coding in Sensory Cortex. In: Forstmann B, Wagenmakers EJ, editors. An Introduction to Model-Based Cognitive Neuroscience. New York: Springer. pp. 221–244. https://doi.org/10.1007/978-1-4939-2236-9_11
Schneider W, Schlagmüller M, Ennemoser M (2007) LGVT 6-12: Lesegeschwindigkeits-Und-Verständnistest Für Die Klassen 6-12. Hogrefe Göttingen.
Winer JA (2005) Three Systems of Descending Projections to the Inferior Colliculus. In: Winer JA, Schreiner CE, editors. The Inferior Colliculus. Springer-Verlag. pp. 231–247. https://doi.org/10.1007/0-387-27083-3_8

Decision letter

  1. Barbara G Shinn-Cunningham
    Senior Editor; Carnegie Mellon University, United States
  2. Timothy D Griffiths
    Reviewing Editor; University of Newcastle, United Kingdom
  3. Manuel S Malmierca
    Reviewer; University of Salamanca, Spain

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The work addresses whether predictive coding models of perception, which have previously been applied to cortical analysis, can also be applied to subcortical processing. This has been done using high field fMRI and a paradigm that aims to disambiguate sensory adaptation and expectation for sound sequences.

Decision letter after peer review:

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting your work entitled "Abstract rules drive adaptation in the subcortical sensory pathway via hierarchical predictive coding" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Timothy D Griffiths as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by a Senior Editor.

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.

The reviewers raise a number of issues that require an extensive revision in order to be addressed. It is likely that further data and analysis will also be required as discussed below. It is eLife policy to reject any manuscript where the work necessary to address criticisms would take more than 3 months; therefore, we are rejecting the paper at this time. That said, the reviewers agree that the work addresses an important issue, and that it is possible that further analyses and clarification will address the large number of concerns raised by the reviewers. We therefore would welcome a "new" submission of this work when you are in a position to address the concerns raised by the reviewers.

General comments

The work addresses the interesting question of whether high-level prediction affects processing in the ascending auditory pathway. There is much less work on possible correlates of constructive models of auditory perception compared to visual perception and much less work on afferent pathways to the sensory cortices as opposed to cortical processing, so the initiative is welcome. BOLD responses are studied in the human subcortical auditory pathway using 7 Tesla fMRI with a spatial resolution of 1.5 mm isotropic to study adaptation in the subcortical regions of the auditory pathway, more precisely in the inferior colliculus and medial geniculate thalamus, in order to study hierarchical predictive coding. The paradigm is appealing because of the claim that it allows interpretation in terms of stimulus specific adaptation versus predictive coding.

The reviewers were all concerned about interpretation of the data in terms of subcortical bases for predictive coding. A formal model comparison is suggested by one reviewer, and another required an additional control. The interpretation of the data is speculative in places. A further concern was the extent to which this is an advance on the work of Parras and colleagues who demonstrated a hierarchical organization of prediction error at the neural level.

Specific criticisms

Data

1) Several frequency combinations are used in the paradigm. Authors show that the latency of response for the early DEV is larger than for the later ones. This is quite reasonable and expected. While the main paradigm used is generally interesting, there is a question whether or not the BOLD response shows genuine expectancy. The issue is that expectancy may be generated when authors warn subjects that a DEV will be in 4th, 5th or 6th position. It is unclear whether this BOLD response is something that the brain will extract from the history of stimulation. The experimental subjects already know that these DEV will appear, no matter what, in these 3 positions. A missing control is to have a regular sequence with unexpected DEV where the subjects will have no idea of when the DEV will appear. Then you can manipulate the appearance of the DEV the way you wish, making the DEV appear regularly or irregularly, and see if subjects can really extract any abstract rule. One reviewer felt this control was required to know if the BOLD response is due to the expectancy, the position of the DEV or any other reason.

2) Another confound is that authors wish to show stimulus-specific adaptation (SSA) in IC and MGB (subsection “Adjudicating between habituation and predictive coding”). Strictly speaking, to show SSA one should use the classical oddball paradigm and the flip-flop control to make sure that there is indeed a genuine adaptation that is specific to the stimulus and that the difference is not due to a different sensitivity to the two frequencies used as DEV and STD.

3) The choice of contrast to represent adaptation was std0>std2, but why not also std0>std1?

4) In the same vein as above, deviance detection was defined as dev4>std2; why not also consider dev4>std1? Also, why focus only on dev4? Would it not be more complete to look at all deviants (that is, devi>stdj with i=4, 5, and 6; and j=1, 2), since they all elicit deviance detection? (Even if dev6 is predictable, it is still deviant.)

5) Were the data from the silent gaps analysed? These data are potentially invaluable to disentangle between the 2 models since with gaps we evoke prediction errors but not adaptation.

6) It was unclear how the 3 pure tones were played in the sequence structure. Did they alternate between std and dev? Was a flip-flop design used (where sound A being dev in one sequence is then dev in a different sequence)? Why use 3 pure tones instead of 2 (one for dev and one for std)?

7) The correlational approach to testing H2 (predictive coding) seems somewhat suboptimal since it is not really a formal model comparison. A regressor for probability in the GLM would make more sense, if the goal is to show that BOLD responses increase as a function of probability (as per Figure 1C). Alternatively, and even more elegantly, a Bayesian approach comparing the 2 models simultaneously using posterior probability mapping (Rosa et al., 2010) would be most appropriate to adjudicate between these alternative models. That approach has the great advantage of formally testing alternative (2 or more) models simultaneously at each and every voxel (while also avoiding the need for multiple comparisons); hence, at the end one has a map of where h1 and h2 are more likely.

8) The approaches mentioned above would also preserve the original 7T superb spatial resolution. It seems from Table 2 that activity from all voxels within MGB and IC were lumped together for the correlations. Is this really necessary or even desirable? Given how ubiquitous habituation is in the brain, it is quite likely that it also occurs along this subcortical pathway. And yet, with the approach taken, this is surprisingly completely ruled out. Is it possible that by lumping the data together we are losing specificity about voxels within IC/MGB where habituation is more likely than predictive coding, and voxels where the reverse (predictive coding) is more likely? If the latter dominates then, by lumping data, we can only see predictive coding, likely missing out on evidence for habituation (in at least some voxels).

9) If a hierarchy exists as the authors claim, this would ideally be shown by a quantitative analysis of the observed effect between the IC and MGB.

10) Why restrict the analysis to IC and MGB with functional localisers? Would it not be interesting to see if these effects emerge elsewhere in the brain (within the slab imaged – e.g., replicate effects within primary auditory cortex for example)? Also why use functional localisers instead of anatomically defined ROIs?

11) Authors also claim that the response is observed in both lemniscal and non-lemniscal regions of the IC and MGB; however, again I miss a detailed analysis of this issue and of how they have separated the lemniscal IC from the non-lemniscal IC, and similarly for the MGB. Authors refer to a previous work, but no details and actual results are provided or evident here. In fact, no data are shown about the IC.

12) Will code and data be made publicly available in keeping with the open science framework?

Exposition

1) Nomenclature. This is not a trivial issue. Authors made an unconventional use of the words habituation, stimulus-specific adaptation, stimulus-specific habituation, etc. and then they speak of neural habituation. These need to be precisely defined and used consistently in the text.

2) The section on “Results cannot be explained by task-engagement” is basically a discussion where authors try to argue if results could be explained by effects of task-engagement. This whole part pertains to the Discussion, which on the other hand is unusually brief and rather speculative and unfocused.

3) The Discussion itself required expansion to more deeply reflect on the implications of this study.

https://doi.org/10.7554/eLife.64501.sa1

Author response

[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]

The reviewers were all concerned about interpretation of the data in terms of subcortical bases for predictive coding. A formal model comparison is suggested by one reviewer, and another required an additional control. The interpretation of the data is speculative in places.

We now use formal Bayesian model comparison to interpret the data (for details, see responses to reviewer comment 7). We have also thoroughly rephrased the Discussion to remove speculative data interpretation. We assume that the request for an additional control was based on a misleading description of our paradigm and we have now rephrased this description (for details, see responses to reviewer comment 1).

A further concern was the extent to which this is an advance on the work of Parras and colleagues who demonstrated a hierarchical organization of prediction error at the neural level.

The essential differences with other previous studies on subcortical SSA are now discussed in depth in the Discussion:

“[Previous] studies have [...] used designs where predictions were generated based on the regularities of the local stimulus statistics. Although mesoscopic responses to violation of abstract rules have been reported in the sensory cortex [...], they have not been reported in subcortical nuclei to date. Our study breaks with a long tradition of research on subcortical SSA [...] by defining the predictions based on abstract rules that were orthogonal to the regularity of the stimulus local statistics.”

This is a fundamental conceptual difference from previous work on predictive coding in the auditory pathway [7]: while previous work shows that the responses to violations of the regularities of the stimuli increase at increasingly higher stages of the ascending auditory pathway, we show that the model of the sensory world used to compute expectations on the deviants incorporates abstract information already in the MGB and IC.

Specific criticisms

Data

1) Several frequency combinations are used in the paradigm. Authors show that the latency of response for the early DEV is larger than for the later ones. This is quite reasonable and expected. While the main paradigm used is generally interesting, there is a question whether or not the BOLD response shows genuine expectancy. The issue is that expectancy may be generated when the authors warn subjects that a DEV will be in the 4th, 5th or 6th position. It is unclear whether this BOLD response reflects something that the brain extracts from the history of stimulation. The experimental subjects already know that these DEV will appear, no matter what, in these 3 positions. A missing control is to have a regular sequence with unexpected DEV where the subjects will have no idea of when the DEV will appear. Then you can manipulate the appearance of the DEV the way you wish, making the DEV appear regularly or irregularly, and see if subjects can really extract any abstract rule. One reviewer felt this control was required to know if the BOLD response is due to the expectancy, the position of the DEV, or any other reason.

We thank the reviewer for this point that prompted us to clarify the rationale and nature of our design. The participants were not required to “extract any abstract rule”; instead, the rules were made clear to the participants from the outset of the experiment. That participants already know that DEV will appear “no matter what” is an intended feature of the design. We realised that there were a couple of sentences in the manuscript that might have led to a misunderstanding of the nature of the design. We have now rephrased them:

“Note that, although the three deviant positions were equally likely at the beginning of the sequence, due to the two abstract rules the probability of finding a deviant in position 4 after hearing 3 standards is 1/3, the probability of finding a deviant in position 5 after hearing 4 standards is 1/2, and the probability of finding a deviant in position 6 after hearing 5 standards is 1. This means that participants expected deviants at all positions, but with different posterior probabilities of finding the deviant. Therefore, habituation and predictive coding make opposing predictions for the responses at the different deviant positions (Figure 1B).”
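The posterior probabilities quoted above follow from conditioning a uniform prior over the three deviant positions on the standards heard so far. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative assumptions, not taken from the authors' code.

```python
from fractions import Fraction

# Deviant positions allowed by the two abstract rules (from the quoted text).
positions = [4, 5, 6]

# Prior: each position is equally likely at the start of a sequence.
prior = {p: Fraction(1, len(positions)) for p in positions}


def posterior_given_standards(position):
    """P(deviant at `position` | every earlier tone was a standard)."""
    # Probability mass of the positions not yet ruled out by the standards heard so far.
    remaining = sum(prior[p] for p in positions if p >= position)
    return prior[position] / remaining


posteriors = {p: posterior_given_standards(p) for p in positions}
for p, prob in posteriors.items():
    print(f"position {p}: {prob}")
```

Under these assumptions the values come out as 1/3, 1/2 and 1 for positions 4, 5 and 6, matching the probabilities stated in the rephrased passage.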

2) Another confound is that authors wish to show stimulus-specific adaptation (SSA) in IC and MGB (subsection “Adjudicating between habituation and predictive coding”). Strictly speaking, to show SSA one should use the classical oddball paradigm and the flip-flop control to make sure that there is indeed a genuine adaptation that is specific to the stimulus and that the difference is not due to a different sensitivity to the two frequencies used as DEV and STD.

We thank the reviewer for making us aware that this important aspect of the design was not clearly enough described. We can exclude that the SSA we find is due to a different sensitivity to the two frequencies as all tones were used the same number of times as deviant and standard. The design thus contains a flip-flop control. We have now clarified:

“[...] since all tones were used the same number of times as deviant and standard, dev4 − 0.5std1−0.5std2 is equivalent to the definition of the SSA index used in the animal literature.”

Since each of the three pure tones was used as many times as a deviant as it was used as a standard, our definition of deviant detection (contrast dev > std) is equivalent to that of the animal literature, (dev − std)/(dev + std) > 0; we have only assumed that dev + std > 0.
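The equivalence can be spelled out in one line. In the sketch below, the label SI for the index is an assumption introduced here for readability, and dev and std stand for the response estimates to deviants and standards; the only requirement is the stated condition dev + std > 0, which allows multiplying both sides by the denominator without flipping the inequality.

```latex
\[
\mathrm{SI} \;=\; \frac{\mathrm{dev} - \mathrm{std}}{\mathrm{dev} + \mathrm{std}} \;>\; 0
\;\Longleftrightarrow\;
\mathrm{dev} - \mathrm{std} \;>\; 0
\;\Longleftrightarrow\;
\mathrm{dev} \;>\; \mathrm{std},
\qquad \text{given } \mathrm{dev} + \mathrm{std} > 0 .
\]
```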

3) The choice of contrast to represent adaptation was std0>std2, but why not also std0>std1?

We thank the reviewer for this helpful comment. We selected std0 > std2 to maximise power, assuming that the standards immediately after the first standard would elicit higher activation than the standards at the tail of the trains. We acknowledge that by doing so we were not following the exact same principles as in the animal literature on SSA. We have repeated the analysis with the following new definition of the adaptation contrast: std0 > 0.5std1+0.5std2. This definition takes into account that there are as many tones in the std1 as in the std2 condition (an average of 3.5 in each, across all deviant positions). Results remained qualitatively the same.

4) In the same vein as above, deviance detection was defined as dev4 > std2; why not also consider dev4 > std1? Also, why focus only on dev4? Would it not be more complete to look at all deviants (that is, devi > stdj with i = 4, 5, and 6; and j = 1, 2), since they all elicit deviance detection? (Even if dev6 is predictable, it is still deviant.)

We thank the reviewer for prompting us to clarify the rationale of our analysis procedure. The standard condition std1 is now included in the deviant detection contrast (see the response to reviewer comment 3).

Regarding the inclusion of the other deviants, we aimed to define SSA regions that remained agnostic with respect to the underlying mechanisms. We have added the following explanation to the manuscript:

“We included only dev4 in the contrast because it is the only deviant for which the habituation and predictive coding hypotheses make the same prediction. Including dev5 and dev6, which according to the predictive coding hypothesis will elicit weaker responses, would have biased the SSA regions towards the habituation hypothesis.”

Since it turned out that the predictive coding model explains the data much better than the habituation model, including dev5 and dev6 in the adaptation contrast increases the variance and reduces the power of the dev > std contrast, which effectively shrinks the size of the SSA areas. For the reviewer’s information we have now included an additional characterisation of SSA in IC and MGB (Figure 1 rev) using the alternative definition of deviant detection proposed by the reviewer: (std1 + std2)/2 < (dev4 + dev5 + dev6)/3. Using these alternative SSA ROIs the results stay qualitatively the same (Author response image 1).
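The contrasts discussed in responses 3 and 4 can be written as weight vectors over the six conditions. The sketch below is illustrative only: the condition ordering, the helper names, and the beta values are assumptions made for this example and are not taken from the authors' analysis code.

```python
# Condition order is an assumption made for this sketch.
conditions = ["std0", "std1", "std2", "dev4", "dev5", "dev6"]


def contrast(weights):
    """Build a contrast vector over `conditions` from a {condition: weight} dict."""
    return [weights.get(c, 0.0) for c in conditions]


def apply_contrast(c, betas):
    """Weighted sum of per-condition response estimates (betas)."""
    return sum(w * b for w, b in zip(c, betas))


# Adaptation: std0 > 0.5*std1 + 0.5*std2
adaptation = contrast({"std0": 1.0, "std1": -0.5, "std2": -0.5})

# Deviant detection (main analysis): dev4 > 0.5*std1 + 0.5*std2
deviant_dev4 = contrast({"dev4": 1.0, "std1": -0.5, "std2": -0.5})

# Alternative deviant detection: (dev4 + dev5 + dev6)/3 > (std1 + std2)/2
deviant_all = contrast(
    {"dev4": 1 / 3, "dev5": 1 / 3, "dev6": 1 / 3, "std1": -0.5, "std2": -0.5}
)

# Hypothetical beta estimates for a single voxel, for illustration only.
betas = [1.0, 0.4, 0.2, 1.2, 0.9, 0.3]

for name, c in [
    ("adaptation", adaptation),
    ("deviant_dev4", deviant_dev4),
    ("deviant_all", deviant_all),
]:
    # A positive value means the corresponding inequality holds for this voxel.
    print(name, round(apply_contrast(c, betas), 3))
```

Each vector sums to zero, so a contrast value is positive only when the inequality it encodes holds for the estimated responses.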

Author response image 1
Mesoscopic stimulus specific adaptation (SSA) using all deviants.

Regions within the MGB and IC ROIs showed adaptation to the repeated standards (adaptation; blue+purple) and deviant detection (red+purple). Deviant detection was defined according to the alternative contrast suggested by the reviewer, (std1 + std2)/2 < (dev4 + dev5 + dev6)/3. Stimulus specific adaptation occurred in bilateral MGB and IC (purple). Contrasts are thresholded at p < 0.05 FWE-corrected for the size of each anatomical ROI. Cf. Figure 2 of the main text.

5) Were the data from the silent gaps analysed? These data are potentially invaluable to disentangle between the 2 models since with gaps we evoke prediction errors but not adaptation.

We thank the reviewer for suggesting this analysis. The silent gaps (i.e., null-events and inter-trial intervals) introduced in the design are, however, unlikely to elicit prediction error. Fully expected inter-trial intervals, of durations ranging from 1.5 to 11 seconds, separated the sequences. Because of this jitter, participants were unable to perform predictions on when the next first standard would be presented. The null-events had a duration of around 5 seconds, which makes them practically indistinguishable from a long inter-trial interval. The design was optimized to disentangle between the 2 models based on the responses to the three deviants.

6) It was unclear how the 3 pure tones were played in the sequence structure. Did they alternate between std and dev? Was a flip-flop design used (where sound A, being dev in one sequence, is then std in a different sequence)? Why use 3 pure tones instead of 2 (one for dev and one for std)?

We thank the reviewer for making us aware that this point was not clear in the previous version of the manuscript. We now write:

“From those 3 tones we constructed 6 standard-deviant frequency combinations that were used the same number of times across each run, so that all tones were used the same number of times as deviant and standards. We used 3 rather than 2 tones so that each run contained 6 rather than 2 different standard-deviant combinations, rendering the task more engaging.”

7) The correlational approach to testing H2 (predictive coding) seems somewhat suboptimal since it's not really formal model comparison. A regressor for probability in the GLM would make more sense, if the goal is to show that BOLD responses increase as a function of probability (as per Figure 1C right). Alternatively, even more elegantly, a Bayesian approach comparing the 2 models simultaneously using posterior probability mapping (Rosa et al., 2010) would be most appropriate to adjudicate between these alternative models. That approach has the great advantage of formally testing alternative (2 or more) models simultaneously at each and every voxel (while also avoiding the need for multiple comparisons); hence at the end one has a map of where H1 and H2 are more likely.

We thank the reviewer for this fantastic idea! The models are now compared using Bayesian model comparison (BMC) according to [35]. In brief, the results show that the predictive coding model is more likely to explain the BOLD responses in most voxels of the SSA regions. For more details see the new section in the manuscript, Figure 4 and reviewer comment 8.

8) The approaches mentioned above would also preserve the superb original 7T spatial resolution. It seems from Table 2 that activity from all voxels within MGB and IC were lumped together for the correlations; is this really necessary or even desirable? Given how ubiquitous habituation is in the brain, it is quite likely that it also occurs along this subcortical pathway. And yet, with the approach taken, this is surprisingly completely ruled out. Is it possible that by lumping the data together we're losing specificity about voxels within IC/MGB where habituation is more likely than predictive coding, and voxels where the reverse (predictive coding) is more likely? If the latter dominates then, by lumping data, we can only see predictive coding, likely missing out on evidence for habituation (in at least some voxels).

We thank the reviewer for this tremendously insightful comment. Using the BMC approach we were able to construct a map showing the posterior probability of each model across the ICs and MGBs. As the reviewer suggested, we were indeed dismissing a really interesting functional parcellation of some of the nuclei. Although the majority of voxels responded according to the predictive coding model, there were three small but continuous areas, one in the left IC and one in each of the MGBs, which were strongly driven by the habituation model. We have integrated these results in a new section “SSA is present and driven by predictive coding in both primary and secondary MGB” and in Figure 4.
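To illustrate how a per-voxel posterior probability map can be derived from log model evidences, here is a minimal sketch. It is a simplification under stated assumptions (two models, equal prior probabilities, each voxel treated independently) and is not the procedure of the cited reference [35], which may differ, for example by treating models as random effects across participants. All names and numbers are hypothetical.

```python
import math


def posterior_pc(log_ev_pc, log_ev_hab):
    """Posterior probability of the predictive coding model at one voxel.

    Assumes equal prior probabilities for the two models, so the posterior
    is the logistic function of the log Bayes factor.
    """
    log_bf = log_ev_pc - log_ev_hab
    # Branch on the sign to keep the exponential from overflowing.
    if log_bf >= 0:
        return 1.0 / (1.0 + math.exp(-log_bf))
    e = math.exp(log_bf)
    return e / (1.0 + e)


# Hypothetical log-evidence values for five voxels (illustration only,
# not values from the study).
log_ev_predictive_coding = [-120.0, -98.5, -101.0, -110.2, -95.0]
log_ev_habituation = [-123.0, -99.0, -97.0, -115.2, -95.0]

posteriors = [
    posterior_pc(a, b)
    for a, b in zip(log_ev_predictive_coding, log_ev_habituation)
]

# Label each voxel by the more probable model and compute the share of
# voxels favouring predictive coding.
favours_pc = [p > 0.5 for p in posteriors]
fraction_pc = sum(favours_pc) / len(favours_pc)
print([round(p, 3) for p in posteriors], fraction_pc)
```

A map of such posteriors, thresholded per voxel, is one way to obtain the kind of percentage summaries reported for each nucleus; the tie at equal evidence is counted here as not favouring predictive coding, which is an arbitrary choice of this sketch.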

9) If a hierarchy exists as the authors claim, this would ideally be shown by a quantitative analysis of the observed effect between the IC and MGB.

We thank the reviewer for making us aware that our use of the term “hierarchy” was not described precisely in the paper. We referred to hierarchy to emphasise that “higher-level” abstract representations influence “lower-level” sensory processing. We have now removed “hierarchical predictive coding” from the title to avoid misunderstanding. The main aim of the paper was not to show a hierarchy of prediction errors in the auditory pathway. Nevertheless, we have now discussed our results in contrast to the hierarchical effects described in the Parras et al. study:

“(Parras et al., 2017) defined prediction error as the responses to sounds that deviate from the predictions in comparison to the responses to those same sounds when there were no available predictions. The authors concluded that the IC, MGB, and AC form a hierarchical network of prediction error. Although our studies use different paradigms in different species, a similar analysis can be done in our data by comparing the responses to the most unexpected deviant (dev4) with those for which no prediction is available, i.e., the first standard in the sequences, std0. Responses to dev4 are higher than responses to std0 in both IC and MGB (Table 2 and Figure 3). This contrasts with Parras’ results, where the IC showed little or no response difference between deviant and control sound.”

Moreover, the novel BMC now allows us to quantify the extent of effects in IC and MGB. In brief, we find that most voxels in IC and MGB are driven by predictive coding. This is the case for 79% of voxels in the right MGB, 61% in the left MGB, 98% in the right IC, and 86% in the left IC (see new result section “SSA is present and driven by predictive coding in both primary and secondary MGB”). We have also discussed these findings in the context of the hierarchical relationship between IC and MGB:

“The extent of the habituation regions in our results was qualitatively larger in the MGBs than in the ICs. This is a surprising result since the MGB receives bottom-up inputs from the IC: if, as hypothesised in the predictive coding model, the representation of expected stimuli is attenuated in IC, this attenuation should be transmitted to the MGB. One possibility is that the habituation regions of the MGBs receive inputs via the direct connections from the cochlear nucleus (CN) that bypass the IC.”

10) Why restrict the analysis to IC and MGB with functional localisers? Would it not be interesting to see if these effects emerge elsewhere in the brain (within the slab imaged – e.g., replicate effects within primary auditory cortex for example)? Also why use functional localisers instead of anatomically defined ROIs?

We restricted the analyses to the MGB and the IC because these are the two subcortical stages of auditory processing where SSA has been demonstrated in the animal literature. The positioning and size of the slab was optimised to include these structures. Although the slice potentially covered other nuclei in midbrain and thalamus, and in most subjects parts of auditory cortex, we performed the data analysis according to the stipulations of our experimental aims; namely, to test whether SSA in the subcortical auditory pathway nuclei (that are readily accessible with high-resolution fMRI) is driven by habituation or predictive coding. Including data from non-auditory subcortical nuclei or cortical areas would be beyond the scope of the study. For example, including the cerebral cortex results would require adding repetitions of Figures 2-5 to the Results section. In addition, we would also need to considerably extend the Introduction and the Discussion to include the large body of previous work on deviant detection in cortex, which would shift the focus away from the main aim of the present study. We plan to report elsewhere the results of the cortical responses for the subjects where the slab had full auditory cortical coverage.

We consider functional and anatomical localisers as equally valid to define the IC and MGB. Given that we did not have functional localisers for all participants, we have now removed the functional localisers from the paper and re-defined the IC and MGB using the recent anatomical atlas by Sitek. The results stay qualitatively the same.

11) Authors also claim that the response is observed in both lemniscal and non-lemniscal regions of the IC and MGB; however, again I miss a detailed analysis of this issue and of how they have separated the lemniscal IC from the non-lemniscal IC, and similarly for the MGB. Authors refer to a previous work, but no details and actual results are provided or evident here. In fact, no data are shown about the IC.

We thank the reviewer for making us aware that the lemniscal/non-lemniscal part was not written in sufficient detail. We have now integrated a short description of the vMGB mask in the Materials and methods:

“A recent study [38] distinguished two distinct tonotopic gradients of the MGB. The ventral tonotopic gradient was identified as the ventral MGB (vMGB), which is the primary or lemniscal subsection of the MGB (see Figure 5A, green). Although the parcellation is based only on the topography of the tonotopic axes and their anatomical location, the region is the best approximation to date of the vMGB in humans.”

Unfortunately, there is currently no parcellation of the central IC in the human brain. To avoid misunderstanding, we have now rephrased the title of the subsection to “SSA is present and driven by predictive coding in both primary and secondary MGB”.

12) Will code and data be made publicly available in keeping with the open science framework?

We will publish in an open-science repository all the derivatives (β maps and log evidence maps) and code used to preprocess and analyse the data, including all the scripts needed to plot the figures of the paper. We do not have consent from the participants, in compliance with the European General Law of Data Protection, to share the raw MRI data.

Exposition

1) Nomenclature. This is not a trivial issue. Authors made an unconventional use of the words habituation, stimulus-specific adaptation, stimulus-specific habituation, etc. and then they speak of neural habituation. These need to be precisely defined and used consistently in the text.

We thank the reviewers for this important comment. We have now checked the whole manuscript and adapted the nomenclature to the standard definitions. We now use only three terms: habituation (as “decreased responsiveness to increased regularities in their local statistics independently of their predictability”), predictive coding (as the framework that “suggests that neural activity represents prediction error and that such prediction error is minimal for predictable stimuli independently of their local statistics”), and stimulus specific adaptation (as the phenomenon where neurons “adapt to so-called standards (frequently occurring stimuli) yet show restored responses to so-called deviants (rarely occurring stimuli)”).

2) The section on “Results cannot be explained by task-engagement” is basically a discussion where authors try to argue if results could be explained by effects of task-engagement. This whole part pertains to the Discussion, which on the other hand is unusually brief and rather speculative and unfocused.

We have moved the paragraphs on attention to the Discussion and rewritten the Discussion.

3) The Discussion itself required expansion to more deeply reflect on the implications of this study.

We have considerably extended the Discussion.

https://doi.org/10.7554/eLife.64501.sa2

Article and author information

Author details

  1. Alejandro Tabas

    1. Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
    2. Max Planck Research Group Neural Mechanism of Human Communication, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Methodology, Writing - original draft, Writing - review and editing
    For correspondence
    alejandro.tabas@tu-dresden.de
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-8643-1543
  2. Glad Mihai

    1. Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
    2. Max Planck Research Group Neural Mechanism of Human Communication, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
    Contribution
    Software, Methodology
    Competing interests
    No competing interests declared
  3. Stefan Kiebel

    1. Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
    2. Centre for Tactile Internet with Human-in-the-Loop (CeTI), Technische Universität Dresden, Dresden, Germany
    Contribution
    Conceptualization, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-5052-1117
  4. Robert Trampel

    Department of Neurophysics, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
    Contribution
    Methodology
    Competing interests
    No competing interests declared
  5. Katharina von Kriegstein

    1. Faculty of Psychology, Technische Universität Dresden, Dresden, Germany
    2. Max Planck Research Group Neural Mechanism of Human Communication, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
    Contribution
    Conceptualization, Resources, Supervision, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0001-7989-5860

Funding

H2020 European Research Council (SENSOCOM (647051))

  • Katharina von Kriegstein
  • Alejandro Tabas

DFG (EXC 2050/1-Project ID 390696704)

  • Stefan Kiebel

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We sincerely thank the reviewers and editor for their constructive feedback and methodological suggestions.

Ethics

Human subjects: This study was approved by the Ethics committee of the Medical Faculty of the University of Leipzig, Germany (ethics approval number 273/14-ff). All listeners provided written informed consent and received monetary compensation for their participation.

Senior Editor

  1. Barbara G Shinn-Cunningham, Carnegie Mellon University, United States

Reviewing Editor

  1. Timothy D Griffiths, University of Newcastle, United Kingdom

Reviewer

  1. Manuel S Malmierca, University of Salamanca, Spain

Publication history

  1. Received: October 30, 2020
  2. Accepted: December 3, 2020
  3. Accepted Manuscript published: December 8, 2020 (version 1)
  4. Version of Record published: January 5, 2021 (version 2)

Copyright

© 2020, Tabas et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 2,774
    Page views
  • 331
    Downloads
  • 2
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
