Decoding contextual influences on auditory perception from primary auditory cortex

Bernhard Englitz
Sahar Akram
Mounya Elhilali
Shihab Shamma

Institute for Systems Research, University of Maryland, College Park, MD, USA
Computational Neuroscience Lab, Donders Institute for Brain Cognition and Behavior, Radboud University, Nijmegen, The Netherlands
Research Data Science, Meta Platforms, United States
Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, USA
Equipe Audition, Ecole Normale Supérieure, Paris, France

https://doi.org/10.7554/eLife.94296.2

Open access

Figures and data

Stimulus design and recording techniques
A Shepard tones are acoustic complexes composed of octave-spaced pure tones (top). Each Shepard tone is uniquely characterized by its base frequency f_base, and the difference between two Shepard tones by the difference of their base frequencies f_diff, usually given in semitones. A Shepard tone shifted by a full octave ‘projects’ onto itself, and is therefore physically the same stimulus. The space of Shepard tones therefore forms a circle (bottom), in which each stimulus can be characterized by a so-called pitch class in semitones, which runs from 0 to 12, corresponding to a full octave. The base pitch class 0 was here chosen to correspond to f_base = 440 Hz. We use the displayed color mapping (hue) throughout the paper.
B We used the tritone paradox - a sequence of two Shepard tones - to investigate how ambiguous percepts are resolved, for example by preceding stimuli. In the tritone paradox, two Shepard tones are presented, which are separated by half an octave (6 semitones). Listeners, asked to judge the relative pitch between the two Shepard tones, are ambiguous as to their percept of an ascending or a descending step. If the ambiguous Shepard pair is preceded with a sequence of Shepard tones with pitch classes above the first but below the second tone (red area), listeners report an ascending percept. Conversely, if the ambiguous Shepard pair is preceded by a sequence of Shepard tones with pitch classes below the first, but above the second tone (blue area), listeners perceive a descending step. The neural representation of this contextual influence is not known, and we conducted a series of physiological and psychophysical experiments to elucidate the neural basis.
C Neural responses from individual neurons were collected in awake ferrets from the auditory cortex (left). Individual neurons modulated their firing rate during the presentation of the stimulus sequence and exhibited tuned responses (see Fig. S1). Using MEG recordings, we also collected neural responses from populations of neurons in auditory cortex from human subjects, performing the up/down discrimination task. The amplitude of the magnetic field was modulated as a function of time during the stimulus presentation.
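The construction described in panel A can be illustrated with a minimal sketch. It assumes equal-amplitude components and a frequency range of 20 Hz to 20 kHz; these values, the function names, and the absence of a spectral envelope are illustrative assumptions, not the stimulus parameters used in the experiments.

```python
import math

def pitch_class_to_base_freq(pitch_class_st, f_ref=440.0):
    """Base frequency for a pitch class in semitones (0 st corresponds to f_ref)."""
    return f_ref * 2.0 ** (pitch_class_st / 12.0)

def shepard_components(pitch_class_st, f_min=20.0, f_max=20000.0):
    """Frequencies of all octave-spaced components inside [f_min, f_max].

    f_min and f_max are assumed values chosen for illustration only.
    """
    f = pitch_class_to_base_freq(pitch_class_st)
    # Step down by octaves until one more step would leave the range.
    while f / 2.0 >= f_min:
        f /= 2.0
    freqs = []
    # Step up by octaves and collect every component inside the range.
    while f <= f_max:
        freqs.append(f)
        f *= 2.0
    return freqs

def shepard_tone(pitch_class_st, duration_s=0.1, fs=44100):
    """Samples of a Shepard tone: normalized sum of octave-spaced sinusoids."""
    freqs = shepard_components(pitch_class_st)
    n_samples = int(round(duration_s * fs))
    return [
        sum(math.sin(2.0 * math.pi * f * t / fs) for f in freqs) / len(freqs)
        for t in range(n_samples)
    ]
```

In this sketch, a shift by 12 semitones yields the same component list as no shift, which is the sense in which a Shepard tone shifted by an octave projects onto itself.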

Neural responses adapt locally during the Bias sequence for awake ferrets (top) and behaving humans (bottom).
A1 During the presentation of the Bias sequence (10 tones, black bars), the neural response adapts over time (individual cell). This adaptation occurs for all parts of the response, shown here is the onset part (0-50 ms, black). See Fig. S1 for more details on the different response types. Errorbars denote 2 SEM across trials.
A2 On the population level, the response reaches an adapted plateau 13% below the initial response after about 5 stimuli (τ=3.9 stimuli, also for the onset response). This rate of reduction is similar to the rate of build-up of perceptual influence in human behavior [15,18]. Errorbars denote 2 SEM across neurons.
A3 After the Bias, the activity of the cells is reduced significantly more (Δ=33%, p=0.001) around the center of the Bias (<2.5 semitones from the center) than at the edges (2.5-3 semitones from the center). Errorbars denote 2 SEM across neurons.
A4 Recovery of spontaneous neural firing rates from the adaptation due to the Bias sequence progressed over multiple seconds with an exponential recovery time-constant of 1.2 s.
B1 In human auditory cortex the Bias sequences also evoked an adapting sequence of responses, here shown is the activity for a single subject (#8). Errorbars denote 2 SEM across trials.
B2 On average, the adaptation of the neural response proceeded with a similar time course as the single-unit response (A2), and plateaued after about 3-4 stimuli. Errorbars denote 2 SEM across subjects.
B3 Following the Bias, the activity state of the cortex is probed with a sequence of brief stimuli (35 ms Shepard tones, after 0.5 s silence). Responses to probe tones in the same (red) semi-octave are significantly reduced (21% for the first time window, signed ranks test, p<0.0001) compared to the corresponding response in the opposite semi-octave (blue), indicating a local effect of adaptation. Errorbars denote one SEM across subjects.
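The time constants quoted in panels A2 and A4 can be read as parameters of simple exponential curves. The sketch below assumes a single-exponential form for both the build-up of adaptation and its recovery; the captions report only the time constants and the plateau, so the functional form and the function names are assumptions made for illustration.

```python
import math

def adapted_response(n, plateau_drop=0.13, tau_stimuli=3.9):
    """Normalized response to the n-th Bias tone (n = 0 is the first tone).

    Assumes an exponential decay from 1.0 towards a plateau that lies
    plateau_drop below the initial response.
    """
    return (1.0 - plateau_drop) + plateau_drop * math.exp(-n / tau_stimuli)

def recovered_fraction(t_s, tau_s=1.2):
    """Fraction of the adaptation that has recovered t_s seconds after the Bias."""
    return 1.0 - math.exp(-t_s / tau_s)

# Normalized response across a 10-tone Bias sequence under these assumptions.
bias_responses = [adapted_response(n) for n in range(10)]
```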

Repulsive shifts are not consistent with minimal distance hypothesis
A In the case of an unbiased Shepard pair, steps of less than 6 pitch classes lead to unambiguous percepts, e.g. a 0st to 3st step leads to an ascending percept (red) and a 0st to 9st (i.e. 0st => -3st) step to a descending percept (blue). Semi-octave (0st to 6st, blue/red) steps lead to an ambiguous percept, which strongly depends on the stimulus history [15,18] and even on the individual’s specifics, such as language experience and vocal range [49]. This suggests the minimal distance hypothesis, which predicts that the percept follows the smaller of the two distances along the circle between the two pitch classes.
B In the case of an ambiguous Shepard pair (0st to 6st), preceded by a Bias sequence (red bar, right), here an UP-Bias, the ascending percept together with the minimal distance hypothesis would predict the distance between the Shepard tones to be reduced on the side of the UP-Bias (red dots). However, the population decoding shows that the distance between the tones is indeed increased on the side of the UP-Bias, challenging the minimal distance hypothesis.
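The minimal distance hypothesis can be stated compactly in code. The sketch below is illustrative: the function names are hypothetical, and the shift of 0.5 semitones used in the last example is an arbitrary value chosen to show the direction of the argument, not a measured quantity.

```python
def signed_circular_step(pc_from, pc_to, period=12.0):
    """Shortest signed step in semitones on the pitch-class circle, in [-6, 6)."""
    return (pc_to - pc_from + period / 2.0) % period - period / 2.0

def minimal_distance_percept(pc_from, pc_to, period=12.0):
    """Percept predicted by the minimal distance hypothesis."""
    step = signed_circular_step(pc_from, pc_to, period)
    if abs(abs(step) - period / 2.0) < 1e-9:
        return "ambiguous"      # both directions are equally long
    if step == 0:
        return "no step"
    return "ascending" if step > 0 else "descending"

# Unbiased pairs from panel A.
print(minimal_distance_percept(0, 3))   # ascending
print(minimal_distance_percept(0, 9))   # descending
print(minimal_distance_percept(0, 6))   # ambiguous

# Panel B: after an UP-Bias the represented tones are repelled from the Bias,
# here by a hypothetical 0.5 st each (first tone down, second tone up).
print(minimal_distance_percept(0 - 0.5, 6 + 0.5))   # descending
```

With the repulsive shift, the shorter path between the represented tones lies on the side opposite to the Bias, so the hypothesis would predict a descending step where listeners report an ascending one.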

Population decoding predicts a Bias-induced, repulsive shift in pitch class.
A We decoded the represented stimulus using dimensionality reduction techniques (see Fig. S2 for population-vector decoding). The stimulus identity (top) is reflected in the joint activity of all neurons (middle). If the neurons are considered as dimensions of a high-dimensional response space, the circular stimulus space of Shepard tones induces a circular manifold of responses, which lies in a lower dimensional space (light red plane). Colors represent a Shepard tone’s pitch class, also in the following graphs.
B The entire set of responses to the 240 distinct Shepard tones (from the various Bias sequences) is projected by the decoding into a low dimensional space (dots, hue = true pitch class), in which neighboring stimuli fall close to each other and the stimuli overall form a circle. The thick, colored line is computed from local averages along the range of pitch classes and emphasizes the circular structure. The Shepard tones in the ambiguous pairs are projected using the same decoder (denoted by the different triangles, hue = true pitch class), and roughly fall into their expected locations. However, if a stimulus was relatively above the Bias sequence (△, bright, upward triangles), its representation is shifted to higher pitch classes, compared to the same stimulus when located relatively below the Bias sequence (▾, dark, downward triangles). Hence, the preceding Bias repulsed the stimuli in the ambiguous pair in their represented pitch class. Both stimuli of the pair are treated equally here.
C To demonstrate that decoding in this way is reliable, we compare real and estimated pitch classes (by taking the circular position in B) for each stimulus, which exhibits a reliable relation (r=0.995).
D The influence of the Bias can be compared quantitatively by centering the represented test stimuli around their actual pitch class and inspecting the difference between the two different Bias conditions. After the Bias, the decoded pitch classes are shifted from their actual pitch classes, away from the biased pitch class range, with high significance (p<10^-43, Wilcoxon-test).
E The size of the shift is influenced by the length of the Bias sequence (5 tones = red, 10 tones = black) and the time between the Bias and the test tones (τ=1.1s).

Distributed activity & local adaptation predict tuning changes and repulsive shifts
A We model the encoding by a simplified model, which starts from the cochlea, includes only one intermediate station (e.g. the MGB), and then projects to cortical neurons. The model is general in the sense that a cascaded version would lead to the same response, as long as similar mechanisms act on each level. A stimulus elicits an activity distribution along the cochlea (bottom) which is retained in shape on the intermediate level (2nd from below). In the native state, the stimulus is transferred to the cortical level without adaptation (2nd from top, black) and integrated by the cortical neuron (top, black). After a stimulus presentation, an adapted trough is left behind in the connections leading up to the cortical level (2nd from top, red), which reduces the cortical tuning curve locally. Since tuning curves closer to the center adapt more strongly, the stimulus representation in the neural population shifts away from the region of adaptation.
B Applying the same analysis as used above for the real data (Fig. 3) again leads to a circular decoding (B1), with the estimated pitch classes of the tones of the Shepard pair shifted repulsively by the preceding Bias (B2; for more details see the description of Fig. 4).
C Single cells show adaptation of their responses colocalized (different colors) with the biased region (colored bars, bottom). The Bias was presented in 4 different regions in separate trials, and the tuning of the cell probed in between the biasing stimuli. The left side shows a model example and the right side a representative, neural example.
D Centered on the Bias, neurons in the auditory cortex adapt their response colocal with the Bias. The curves represent the difference in response rate between the unadapted tuning and the adapted tuning, again for model cells on the left and actual data on the right.
E The presence of the Bias reduces the firing rate relative to the initial discharge rate, by ∼40% (red), while the rate stays the same or is slightly elevated outside of the Bias regions (green) (see Fig. S3 for the decoding results and two related, incompatible models, which demonstrate noteworthy subtleties of the decoding process).
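The mechanism in panel A, distributed input activity combined with local adaptation, can be illustrated with a toy simulation. This is a minimal sketch and not the model used in the paper: the Gaussian profiles, their widths, the channel spacing, the adaptation depth of 0.4, and the population-vector readout are all assumptions chosen for illustration.

```python
import cmath
import math

PERIOD = 12.0  # semitones per octave

def circ_diff(a, b):
    """Signed circular difference a - b in semitones, in [-6, 6)."""
    return (a - b + PERIOD / 2.0) % PERIOD - PERIOD / 2.0

def bump(d, sigma):
    """Gaussian profile of a circular distance d (semitones)."""
    return math.exp(-0.5 * (d / sigma) ** 2)

# Input channels and preferred pitch classes share one grid (assumed spacing).
CHANNELS = [0.25 * k for k in range(48)]

def population_response(stim, bias_center=None, depth=0.4):
    """Responses of all model neurons to a stimulus at pitch class stim."""
    responses = []
    for pref in CHANNELS:
        r = 0.0
        for c in CHANNELS:
            activity = bump(circ_diff(c, stim), 1.0)  # distributed input activity
            weight = bump(circ_diff(c, pref), 1.0)    # integration by the neuron
            gain = 1.0
            if bias_center is not None:
                # Local adaptation: a trough in the inputs around the Bias.
                gain -= depth * bump(circ_diff(c, bias_center), 1.5)
            r += activity * weight * gain
        responses.append(r)
    return responses

def decode(responses):
    """Population-vector readout of the represented pitch class."""
    vec = sum(r * cmath.exp(2j * math.pi * p / PERIOD)
              for r, p in zip(responses, CHANNELS))
    return (cmath.phase(vec) * PERIOD / (2.0 * math.pi)) % PERIOD

# Bias centered at 3 st; test tones at its lower (0 st) and upper (6 st) edge.
shift_upper = circ_diff(decode(population_response(6.0, bias_center=3.0)), 6.0)
shift_lower = circ_diff(decode(population_response(0.0, bias_center=3.0)), 0.0)
```

Under these assumptions, `shift_upper` is positive and `shift_lower` is negative, i.e. both test tones are represented further away from the adapted region, while the unadapted population decodes the stimulus at its true pitch class.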

Decoding based on the directionality of the individual cells predicts the directional percept.
A The predicted directional percept D(S) for a stimulus S is computed by the average over the cells' activity f_i(S), weighted by their directionality D_i and their distance to the stimulus w_i. The inset images show prototypical SSTRFs of cells with the ascending (top, up), descending (bottom, down) and undirected (middle) directional preference.
B Examples of three directional cells, based on their Shepard-tone-based spectrotemporal receptive fields (SSTRFs). Directionality was determined by the asymmetry of the 2nd column of the SSTRF (response to previous stimulus), centered at the maximum (BF) of the first column (response to current stimulus, see Methods for details). As usual, time on the abscissa runs into the past. The middle cell, for example, is a down-cell, since according to its SSTRF it responds more strongly to a stimulus sequence 10st => 7st than to 4st => 7st.
C The prediction of the decoding (ordinate) compared to the usually perceived direction for the two sequences (abscissa). Predictions depended on the length of the sequence (o = 5 tones, • = 10 tones) and the predicted tone (red = 1st tone, blue = 2nd tone). The dashed red line corresponds to a flat prediction.
D Predictive performance increased as a function of Bias length and distance to the Bias, reflected as 1st (red) or 2nd (blue) tone after the Bias. Both dependencies are consistent with human performance and the build-up and recovery of adaptive processes.
E The basis for the directional decoding can be analyzed by considering the entire set of Bias-induced differences in response, arranged by the directional preference of each cell (abscissa), and the location in BF relative to each stimulus in the Shepard pair (ordinate). Applying the analysis to the neural data, the obtained pattern of activity (top) is composed of two angled stripes of positive (red) and negative (blue) differential activity. For cells with BFs close to the pitch class of the test tones, the relative activities are significantly different (p=0.03, 1-way ANOVA) between ascending and descending preferring cells, thus predicting the percept of these tones. Gray boxes indicate combinations of directionality and relative location which did not exist in the cell population.
F Applied to a population of model neurons (as in Fig. 5, see Methods for details) subjected to the same stimulus as the real neurons, in the absence of adaptation (left) no significant pattern emerges. If no directional cells are present (middle), adaptation leads to a distinct pattern for different relative spectral locations, but the lack of directional cells prevents a directional judgment. Finally, with adaptation and directional cells a pattern of differential activation is obtained, similar to the pattern in the neural data. Cells located close to the target tone (near 0 on the ordinate) show a differential activity, predictive of the percept, which was used in the direct decoding above (shown separately in the lower plots). While these activities exhibit no significant dependence in the absence of adaptation or directional cells, the dependence becomes significant with adaptation (p<0.001, 1-way ANOVA, bottom right).
G The above results can be summarized as a symmetric imbalance in the activities of directional cells after the Bias around it (right), which when decoded predict steps consistent with the percept, i.e. both are judged in their relative position to the Bias. Hence the percept of the pitch change direction is determined by the local activity, rather than by the circular distance between Shepard tones (Fig. 3).
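The directional readout described in panel A can be sketched as a weighted average. The caption does not state the normalization, so dividing by the summed weights is an assumption, and the cell values below are made up purely to illustrate the sign convention (positive for ascending, negative for descending).

```python
def directional_percept(activity, directionality, weights):
    """Weighted average D(S) = sum_i w_i * D_i * f_i(S) / sum_i w_i.

    activity       f_i(S): response of each cell to the stimulus S
    directionality D_i:    > 0 for up-cells, < 0 for down-cells
    weights        w_i:    larger for cells tuned closer to the stimulus
    The normalization by the summed weights is an assumption.
    """
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one cell must have a non-zero weight")
    weighted = sum(w * d * f
                   for w, d, f in zip(weights, directionality, activity))
    return weighted / total_weight

# Hypothetical population: two up-cells and two down-cells.
directionality = [+1.0, +1.0, -1.0, -1.0]
weights = [1.0, 0.5, 1.0, 0.5]
rates = [8.0, 6.0, 4.0, 6.0]   # up-cells near the stimulus respond more

d_value = directional_percept(rates, directionality, weights)
print("ascending" if d_value > 0 else "descending")
```

In this reading, an imbalance between up-cells and down-cells near the test tone, such as the one left behind by local adaptation, is sufficient to tip the sign of D(S).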

Auditory cortex neurons respond tuned to Shepard tones. Three representative neurons are shown, which span the range of observed response types. Neurons differed in their pitch class tuning and their temporal response profiles. The top row depicts the response pattern for 10 repetitions of a fixed Bias sequence, in the bottom row the responses to all presented Shepard tones during the Bias sequences have been sorted by pitch class, much like in a classical pure tone based tuning.
A Most cells exhibited an onset type tuning, i.e. a brief response within the first 50 ms after each stimulus onset (orange), but no sustained (red, 50-100 ms) or offset response (black, 0-50 ms in the silence after the stimulus). These cells typically showed broad/complex tuning with respect to the pitch class of the stimulus.
B A considerable fraction of cells had a surprisingly sharp pitch class tuning, with tuning half-widths of only 2-3 semitones, leading to comparatively sparse response patterns. These cells often showed cotuned onset and sustained responses.
C The remaining cells exhibited predominant offset pitch class tunings, which showed similarly broad/complex tunings as the onset tuned cells.
D Distribution of response types over the entire set of cells.
E Tuning halfwidth (f_50%) for individual cells as a function of pitch class (angle) and response type (colors as in D). The radius indicates the halfwidth of the tuning in octaves (see Methods). More precisely tuned cells lie in the center of the circle. This indicates that while many cells were tuned narrowly enough to only be excited by a single constituent tone, many others were excited and/or inhibited by multiple constituent tones of the Shepard tone.

Population-vector-based decoding also predicts a repulsive shift in pitch class.
A Individual stimuli (top) can also be decoded, by assigning each neuron its preferred pitch class (a complex number, gray vectors, bottom) and weighting these by the neural response (small purple dots, bottom) for the given stimulus. The angle of the average of these vectors (black vectors, bottom) then corresponds to the decoded stimulus (big purple dot, bottom).
B As in Fig. 3, the entire set of 240 distinct Shepard tones (from the various Bias sequences) is represented in a lower dimensional space (dots, hue = true pitch class), in which neighboring stimuli fall close to each other and the stimuli overall form a circle. The Shepard tones in the ambiguous pairs are projected using the same decoder (denoted by the different triangles, hue = true pitch class), and roughly fall into their expected locations. As before, if a stimulus was relatively above the Bias sequence (△, bright, upward triangles), its representation is shifted to higher pitch classes, compared to the same stimulus when located relatively below the Bias sequence (▾, dark, downward triangles). Hence, the preceding Bias repulsed the stimuli in the ambiguous pair in their represented pitch class. Both stimuli of the pair are treated equally here.
C As before for the dimensionality reduction decoding we compare real and estimated pitch classes (by taking the circular position in B) for each stimulus, which explains >99% of the variance.
D The influence of the Bias can be compared quantitatively, by centering the represented test stimuli around their actual pitch class and inspecting the difference between the two different Bias conditions. The Bias shifts them away with high significance (p<10^-43, Wilcoxon-test).
E As before, the size of the shift is influenced by the length of the Bias sequence (5 tones = red, 10 tones = black) and the time between the Bias and the test tones (τ=0.92s).
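The population-vector readout described in panel A can be written in a few lines. The sketch below follows the caption (preferred pitch classes as unit complex numbers, weighted by the response, angle of the average taken as the decoded stimulus); the function name and the four-cell example are hypothetical.

```python
import cmath
import math

def population_vector_decode(preferred_pc, responses, period=12.0):
    """Decode a pitch class (in semitones) from a population response.

    Each neuron contributes a unit vector at its preferred pitch class,
    scaled by its response; the angle of the average vector is the estimate.
    """
    vectors = [cmath.exp(2j * math.pi * p / period) for p in preferred_pc]
    mean_vec = sum(r * v for r, v in zip(responses, vectors)) / len(vectors)
    return (cmath.phase(mean_vec) * period / (2.0 * math.pi)) % period

# Four hypothetical cells preferring 0, 3, 6 and 9 st.
preferred = [0.0, 3.0, 6.0, 9.0]
print(population_vector_decode(preferred, [0.0, 1.0, 0.0, 0.0]))  # close to 3 st
print(population_vector_decode(preferred, [1.0, 1.0, 0.0, 0.0]))  # close to 1.5 st
```

Working with complex numbers keeps the readout consistent with the circular stimulus space, since averaging pitch classes directly would fail at the wrap-around between 12 st and 0 st.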

Local and global (postsynaptic) adaptation cannot explain both influences of the Bias on the tuning (A1-C1) and the represented stimuli (A2-C2).
A The observed tuning curves could also be obtained if the adaptation was assumed to be strictly local, and no distributed activity in response to a stimulus was considered. In this case, however, no change in stimulus representation is expected (A2, below (red) and above (blue) curves are identical), since the local adaptation would reduce the response of each neuron (either multiplicatively or subtractively) and, hence, would not change the center of the represented stimulus.
B The repulsive shifts in the represented stimulus could also be obtained, if adaptation was assumed to be global across all inputs, i.e. only on the postsynaptic side, determined by the total postsynaptic activity. In this case, however, tuning curves are predicted to scale, rather than change in shape or position (B1). This is inconsistent with the observed changes in tuning curve shape (Fig. 5C-E).
C If local adaptation (either on the synaptic level or in the cells projecting on the present cell) is combined with a distributed activity in the ascending auditory system, sharply adapted tuning curves are obtained as well as a repulsive influence of a preceding sequence of Bias stimuli. In response to a stimulus, both local changes in tuning curves (C1) and Bias-repulsed stimulus representation are obtained (C2). It should be noted that other models exist which would produce similar results; yet the present model makes only a limited set of well accepted assumptions and is consistent with the data.
A3-C3 The decoding curves are obtained from the encoding-decoding matrix, which relates the stimulus (abscissa) to the response for each neuron (ordinate). Decoding consists of reading off the population response at the test stimulus (blue vertical line). Here, the Bias below condition is illustrated (red curves in the other plots).
