Stimulus design and recording techniques

A Shepard tones are acoustic complexes composed of octave-spaced pure tones (top). Each Shepard tone is uniquely characterized in frequency by its base frequency fbase, and the difference between two Shepard tones by the difference of their base frequencies fdiff, usually given in semitones. A Shepard tone shifted by a full octave ‘projects’ onto itself and is therefore physically the same stimulus. The space of Shepard tones therefore forms a circle (bottom). We use this color mapping (hue) throughout the paper.
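
The circularity described in panel A can be illustrated with a minimal sketch. It assumes a reference frequency of 440 Hz for pitchclass 0 and a frequency range of 20 Hz to 20 kHz; these values and the function names are illustrative choices, not taken from the paper.

```python
import math

def shepard_partials(pitchclass_st, f_ref=440.0, f_lo=20.0, f_hi=20000.0):
    """Return the octave-spaced partial frequencies (Hz) inside [f_lo, f_hi).

    f_ref, f_lo and f_hi are assumed values for illustration only.
    """
    f = f_ref * 2.0 ** (pitchclass_st / 12.0)
    # Walk down in octaves to the lowest partial inside the range.
    while f / 2.0 >= f_lo:
        f /= 2.0
    # Walk up in case the starting frequency was below the range.
    while f < f_lo:
        f *= 2.0
    partials = []
    while f < f_hi:
        partials.append(f)
        f *= 2.0
    return partials

def base_frequency_difference(f_base_1, f_base_2):
    """Difference fdiff between two Shepard tones in semitones, wrapped to [0, 12)."""
    return (12.0 * math.log2(f_base_2 / f_base_1)) % 12.0

tone = shepard_partials(0.0)
octave_shifted = shepard_partials(12.0)

# A full-octave shift yields the same set of partials (the same stimulus).
same_stimulus = len(tone) == len(octave_shifted) and all(
    math.isclose(a, b) for a, b in zip(tone, octave_shifted)
)
print(same_stimulus)

# Two base frequencies half an octave apart differ by 6 semitones (a tritone).
tritone = base_frequency_difference(440.0, 440.0 * 2.0 ** 0.5)
print(round(tritone, 6))
```

Because only the position within the octave survives, a single angle (the pitchclass) is enough to identify each stimulus, which is what the circular hue map encodes.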

B We used the tritone paradox, a sequence of two Shepard tones, to investigate how ambiguous percepts are resolved, for example by preceding stimuli. In the tritone paradox, two Shepard tones are presented that are separated by half an octave (6 semitones). Listeners asked to judge the relative pitch between the two Shepard tones report an ambiguous percept, hearing either an ascending or a descending step. If the ambiguous Shepard pair is preceded by a sequence of Shepard tones with pitches above the first but below the second tone (red area), listeners report an ascending percept. Conversely, if the ambiguous Shepard pair is preceded by a sequence of Shepard tones with pitches below the first but above the second tone (blue area), listeners perceive a descending step. The neural representation of this contextual influence is not known, and we conducted a series of physiological and psychophysical experiments to elucidate the neural basis.

C Neural responses from individual neurons were collected in awake ferrets from the auditory cortex (left). Individual neurons modulated their firing rate during the presentation of the stimulus sequence and exhibited tuned responses (see Fig. S1). Using MEG recordings, we also collected neural responses from populations of neurons in the auditory cortex of human subjects performing the up/down discrimination task. The amplitude of the magnetic field was modulated as a function of time during the stimulus presentation.

Neural responses adapt locally during the bias sequence for awake ferrets (top) and behaving humans (bottom).

A1 During the presentation of the bias sequence (10 tones, black bars), the neural response adapts over time (individual cell). This adaptation occurs for all parts of the response, shown here is the onset part (0-50 ms, black). See Fig. S1 for more details on the different response types. Errorbars denote 2 SEM across trials.

A2 On the population level, the response reaches an adapted plateau 13% below the initial response after about 5 stimuli (τ=3.9 stimuli). This rate of reduction is similar to the rate of build-up of perceptual influence in human behavior (Chambers et al. 2010, 2017). Errorbars denote 2 SEM across neurons.
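
The time course in panel A2 can be sketched as a single-exponential approach to a plateau. The exponential form is an assumption made here for illustration; only the plateau depth (13%) and the time constant (τ=3.9 stimuli) are taken from the caption.

```python
import math

def adapted_response(n, drop=0.13, tau=3.9):
    """Relative response to the tone at index n (n = 0 is the first tone).

    Assumes an exponential decay towards a plateau that lies `drop`
    below the initial response, with time constant `tau` in stimuli.
    """
    return 1.0 - drop * (1.0 - math.exp(-n / tau))

# Relative response for each of the 10 tones of a bias sequence.
responses = [adapted_response(n) for n in range(10)]
for n, r in enumerate(responses, start=1):
    print(f"tone {n:2d}: {r:.3f}")
```

Under this assumed form, most of the reduction has occurred within the first few tones, and later tones change the response only marginally.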

A3 After the bias, the activity of the cells is significantly more reduced (Δ=33%, p=0.001) around the center of the bias (<2.5 semitones from the center) compared to the edges (2.5-3 semitones from the center). Errorbars denote 2 SEM across neurons.

B1 In human auditory cortex the bias sequences also evoked an adapting sequence of responses, here shown is the activity for a single subject (#8). Errorbars denote 2 SEM across trials.

B2 On average, the adaptation of the neural response proceeded with a similar time course as the single-unit response (A2), and plateaued after about 3-4 stimuli. Errorbars denote 2 SEM across subjects.

B3 Following the Bias, the activity state of the cortex is probed with a sequence of brief stimuli (35 ms Shepard tones, after 0.5 s silence). Responses to probe tones in the same (red) semi-octave are significantly reduced (21% for the first time window, signed ranks test, p<0.0001) compared to the corresponding responses in the opposite semi-octave (blue), indicating a local effect of adaptation. Errorbars denote one SEM across subjects.

Population decoding predicts a bias-induced, repulsive shift in pitchclass.

A We decoded the represented stimulus using dimensionality reduction techniques (see Fig. S2 for population-vector decoding). The stimulus identity (top) is reflected in the joint activity of all neurons (middle). If the neurons are considered as dimensions of a high-dimensional response space, the circular stimulus space of Shepard tones induces a circular manifold of responses, which lies in a lower dimensional space (light red plane). Colors represent a Shepard tone’s pitchclass, also in the following graphs.

B The entire set of responses to the 240 distinct Shepard tones (from the various bias sequences) is projected by the decoding into a low dimensional space (dots, hue = true pitchclass), in which neighboring stimuli fall close to each other and the stimuli overall form a circle. The thick, colored line is computed from local averages along the range of pitchclasses and emphasizes the circular structure. The Shepard tones in the ambiguous pairs are projected using the same decoder (denoted by the different triangles, hue = true pitchclass), and roughly fall into their expected locations. However, if a stimulus was located above the bias sequence (△, bright, upward triangles), its representation is shifted to higher pitchclasses, compared to the same stimulus when located below the bias sequence (▾, dark, downward triangles). Hence, the preceding bias repulsed the stimuli in the ambiguous pair in their represented pitchclass. Both stimuli of the pair are treated equally here.

C To demonstrate that decoding in this way is reliable, we compare real and estimated pitchclasses (by taking the circular position in B) for each stimulus, which exhibits a reliable relation (r=0.995).

D The influence of the bias can be compared quantitatively by centering the represented test stimuli around their actual pitchclass and inspecting the difference between the two different bias conditions. The bias shifts them away with high significance (p<10^-43, Wilcoxon-test).

E The size of the shift is influenced by the length of the bias sequence (5 tones = red, 10 tones = black) and the time between the bias and the test tones (τ=1.1s).

Repulsive shifts are not consistent with the minimal distance hypothesis

A In the case of an unbiased Shepard pair, steps of less than 6 semitones lead to unambiguous percepts, e.g. a 0st to 3st step leads to an ascending percept (red) and a 0st to 9st (i.e. 0st => -3st) step to a descending percept (blue). Semi-octave steps (0st to 6st, blue/red) lead to an ambiguous percept, which on a given trial can depend on individual properties (Deutsch 1986) and the stimulus history (Chambers et al. 2012). This suggests the minimal distance hypothesis, which predicts that the percept follows the smaller of the two distances along the circle between the two pitch-classes.
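
The rule stated by the minimal distance hypothesis can be written down in a few lines. This is a sketch of the hypothesis only, not of the percept reported by listeners; the function name and the returned labels are illustrative.

```python
def predicted_step(pc_first, pc_second):
    """Percept predicted by the minimal distance hypothesis.

    Pitch-classes are given in semitones; the upward distance along the
    circle is compared with the downward distance (12 minus upward).
    """
    upward = (pc_second - pc_first) % 12
    if upward == 0:
        return "same"
    if upward < 6:
        return "ascending"   # shorter path goes up
    if upward > 6:
        return "descending"  # shorter path goes down
    return "ambiguous"       # both paths are 6 semitones long

for second in (3, 9, 6):
    print(f"0st -> {second}st: {predicted_step(0, second)}")
```

For the semi-octave step the two distances are equal, so the rule by itself makes no prediction; any disambiguation has to come from another source such as the stimulus history.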

B In the case of an ambiguous Shepard pair (0st to 6st), preceded by a bias sequence (red bar, right), here an UP-Bias, the ascending percept together with the minimal distance hypothesis would predict the distance between the Shepard tones to be reduced on the side of the UP-Bias (red dots). However, the population decoding shows that the distance between the tones is in fact increased on the side of the UP-Bias, challenging the minimal distance hypothesis.

Distributed activity & local adaptation predict tuning changes and repulsive shifts

A We model the encoding by a simplified model, which starts from the cochlea, includes only one intermediate station (e.g. the MGB), and then projects to cortical neurons. The model is general in the sense that a cascaded version would lead to the same response, as long as similar mechanisms act on each level. A stimulus elicits an activity distribution along the cochlea (bottom) which is retained in shape on the intermediate level (2nd from below). In the native state, the stimulus is transferred to the cortical level without adaptation (2nd from top, black) and integrated by the cortical neuron (top, black). After a stimulus presentation, an adapted trough is left behind in the connections leading up to the cortical level (2nd from top, red), which reduces the cortical tuning curve locally. Since tuning curves closer to the center adapt more strongly, the stimulus representation in the neural population shifts away from the region of adaptation.
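
The mechanism in panel A can be illustrated with a toy version of such a model. This is a minimal sketch under stated assumptions, not the model from the Methods: the number of channels, all Gaussian widths, the adaptation depth and the population-vector readout are illustrative choices, and all function names are hypothetical.

```python
import cmath
import math

N = 120                                  # channels per octave (assumed)
PCS = [12.0 * i / N for i in range(N)]   # preferred pitchclass of each channel (st)

def circ_dist(a, b):
    """Shortest distance between two pitchclasses on the 12-semitone circle."""
    d = abs(a - b) % 12.0
    return min(d, 12.0 - d)

def bump(center, sigma):
    """Circular Gaussian profile over all channels, centered on `center`."""
    return [math.exp(-circ_dist(pc, center) ** 2 / (2.0 * sigma ** 2)) for pc in PCS]

def cortical_response(stim_pc, bias_pc=None, depth=0.4,
                      sigma_in=1.5, sigma_adapt=1.5, sigma_w=1.0):
    """Population response of the cortical layer to one Shepard tone."""
    # Distributed activity elicited by the stimulus on the input level.
    inp = bump(stim_pc, sigma_in)
    if bias_pc is not None:
        # Local adaptation: a trough in the gain of the inputs around the bias.
        trough = bump(bias_pc, sigma_adapt)
        inp = [x * (1.0 - depth * t) for x, t in zip(inp, trough)]
    # Each cortical neuron integrates the inputs around its preferred pitchclass.
    return [sum(w * x for w, x in zip(bump(pc, sigma_w), inp)) for pc in PCS]

def decode(resp):
    """Population-vector readout of the represented pitchclass in [0, 12)."""
    vec = sum(r * cmath.exp(2j * math.pi * pc / 12.0) for r, pc in zip(resp, PCS))
    return (cmath.phase(vec) * 12.0 / (2.0 * math.pi)) % 12.0

unadapted = decode(cortical_response(3.0))
above = decode(cortical_response(3.0, bias_pc=0.0))   # tone 3 st above the bias
below = decode(cortical_response(9.0, bias_pc=0.0))   # tone 3 st below the bias
print(f"unadapted: {unadapted:.3f} st")
print(f"tone above bias: {above:.3f} st, tone below bias: {below:.3f} st")
```

In this sketch the side of the input activity facing the bias is attenuated more strongly than the far side, so the decoded pitchclass moves away from the bias in both directions.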

B Applying the same analysis as used above for the real data (Fig. 3) leads again to a circular decoding (B1), with the estimated pitches of the tones of the Shepard pair shifted repulsively by the preceding bias (B2, for more details see the description of Fig. 3).

C Single cells show adaptation of their responses colocalized (different colors) with the biased region (colored bars, bottom). The bias was presented in 4 different regions in separate trials, and the tuning of the cell probed in between the biasing stimuli. The left side shows a model example and the right side a representative neural example.

D Centered on the bias, neurons in the auditory cortex adapt their onset and sustained responses colocal with the bias, and more broadly for the offset response. The curves represent the difference in response rate between the unadapted tuning and the adapted tuning, again for model cells on the left and actual data on the right.

E The presence of the bias reduces the firing rate relative to the initial discharge rate, by ∼40% (red), while the rate stays the same or is slightly elevated outside of the bias regions (green) (see Fig. S3 for the decoding results and two related, incompatible models, which demonstrate noteworthy subtleties of the decoding process).

Decoding based on the directionality of the individual cells predicts the directional percept.

A The directional percept (left) is predicted by the average over the cells' activity (right), weighted by their directionality (right) and their distance to the stimulus (middle).
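
The weighted average described in panel A can be sketched as follows. The Gaussian distance weighting, its width, the range of the directionality values (-1 for down-cells to +1 for up-cells) and the normalization are assumptions made for illustration; the actual decoder is specified in the Methods.

```python
import math

def circ_diff(a, b):
    """Signed shortest difference a - b on the 12-semitone circle, in [-6, 6)."""
    return (a - b + 6.0) % 12.0 - 6.0

def directional_prediction(cells, stim_pc, sigma=2.0):
    """Predict the step direction at `stim_pc` from a population of cells.

    cells: list of (best_pitchclass_st, directionality, firing_rate).
    Returns a value > 0 for a predicted ascending step, < 0 for descending.
    """
    num = 0.0
    den = 0.0
    for bf, directionality, rate in cells:
        # Cells with a BF close to the stimulus contribute most (assumed Gaussian).
        w = math.exp(-circ_diff(bf, stim_pc) ** 2 / (2.0 * sigma ** 2))
        num += w * directionality * rate
        den += w * rate
    return num / den if den > 0.0 else 0.0

# Hypothetical population: near the stimulus, up-cells fire more than down-cells.
cells = [
    (6.0, +1.0, 10.0),   # up-cell at the stimulus, strongly active
    (6.5, -1.0, 4.0),    # down-cell nearby, weakly active
    (0.0, -1.0, 20.0),   # distant down-cell, almost no weight
]
prediction = directional_prediction(cells, stim_pc=6.0)
print("ascending" if prediction > 0 else "descending")
```

If the bias adapts one directional subpopulation more than the other near the test tone, the sign of this average flips accordingly.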

B Examples of three directional cells, based on their Shepard-tone-based spectrotemporal receptive fields (STRFs). Directionality was determined by the asymmetry of the 2nd column of the STRF (response to previous stimulus), centered at the maximum (BF) of the first column (response to current stimulus, see Methods for details). As usual, time on the abscissa runs into the past. The middle cell, for example, is a down-cell, since based on its STRF it responds more strongly to the stimulus sequence 10st => 7st than to 4st => 7st.
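
One way to turn the asymmetry described in panel B into a number is sketched below. The index, its sign convention (negative for down-cells) and the discretization into 12 pitchclass bins are assumptions for illustration; the measure actually used is defined in the Methods.

```python
def directionality_index(strf):
    """Asymmetry of the 2nd STRF column around the BF of the 1st column.

    strf[i][k] is the weight for pitchclass bin i at time lag k
    (k = 0: current stimulus, k = 1: previous stimulus).
    Returns a value in [-1, 1]; negative values indicate a down-cell,
    i.e. a stronger drive when the previous tone was above the BF.
    """
    n = len(strf)
    current = [row[0] for row in strf]
    previous = [row[1] for row in strf]
    bf = max(range(n), key=lambda i: current[i])
    half = n // 2
    # Sum the previous-stimulus weights on either side of the BF (circularly).
    above = sum(previous[(bf + k) % n] for k in range(1, half))
    below = sum(previous[(bf - k) % n] for k in range(1, half))
    total = abs(above) + abs(below)
    return 0.0 if total == 0 else (below - above) / total

def toy_strf(bf_bin, previous_bin, n=12):
    """Hypothetical STRF with one current-stimulus and one previous-stimulus peak."""
    strf = [[0.0, 0.0] for _ in range(n)]
    strf[bf_bin][0] = 1.0
    strf[previous_bin][1] = 1.0
    return strf

down_cell = directionality_index(toy_strf(bf_bin=7, previous_bin=10))  # 10st => 7st
up_cell = directionality_index(toy_strf(bf_bin=7, previous_bin=4))     # 4st => 7st
print(down_cell, up_cell)
```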

C The prediction of the decoding (ordinate) compared to the usually perceived direction for the two sequences (abscissa). Predictions depended on the length of the sequence (o = 5 tones, = 10 tones) and the predicted tone (red = 1st tone, blue = 2nd tone). The dashed red line corresponds to a flat prediction.

D Predictive performance increased as a function of bias length and distance to the bias, reflected as 1st (red) or 2nd (blue) tone after the bias. Both dependencies are consistent with human performance and the build-up and recovery of adaptive processes.

E The basis for the directional decoding can be analyzed by considering the entire set of bias-induced differences in response, arranged by the directional preference of each cell (abscissa), and the location in BF relative to each stimulus in the Shepard pair (ordinate). Applying the analysis to the neural data, the obtained pattern of activity (top) is composed of two angled stripes of positive and negative differential activity. For cells with BFs close to the pitchclass of the test tones, the relative activities are significantly different (p=0.03, 1-way ANOVA) between ascending- and descending-preferring cells, thus predicting the percept of these tones. Grey boxes indicate combinations of directionality and relative location which did not exist in the cell population.

F Applied to a population of model neurons (as in Fig. 4, see Methods for details) subjected to the same stimulus as the real neurons, in the absence of adaptation (left) no significant pattern emerges. If no directional cells are present (middle), adaptation leads to a distinct pattern for different relative spectral locations, but the lack of directional cells prevents a directional judgement. Finally, with adaptation and directional cells a pattern of differential activation is obtained, similar to the pattern in the neural data. Cells located close to the target tone (near 0 on the ordinate) show a differential activity, predictive of the percept, which was used in the direct decoding above (shown separately in the lower plots). While these activities exhibit no significant dependence in the absence of adaptation or directional cells, the dependence becomes significant with adaptation (p<0.001, 1-way ANOVA, bottom right).

G The above results can be summarized as a symmetric imbalance in the activities of directional cells around the Bias after its presentation (right), which, when decoded, predicts steps consistent with the percept, i.e. both tones are judged by their position relative to the bias. Hence the percept of the frequency change direction is determined by the local activity, rather than by a global distance.

Auditory cortex neurons show tuned responses to Shepard tones. Three representative neurons are shown, which span the range of observed response types. Neurons differed in their pitchclass tuning and their temporal response profiles. The top row depicts the response pattern for 10 repetitions of a fixed bias sequence; in the bottom row, the responses to all presented Shepard tones during the bias sequences have been sorted by pitchclass, much like in a classical pure-tone-based tuning curve.

A Most cells exhibited an onset type tuning, i.e. a brief response within the first 50 ms after each stimulus onset (orange), but no sustained (red, 50-100 ms) or offset response (black, 0-50 ms in the silence after the stimulus). These cells typically showed broad/complex tuning w.r.t. the pitchclass of the stimulus.

B A considerable fraction of cells had a surprisingly sharp pitchclass tuning, with tuning half-widths of only 2-3 semitones, leading to comparably sparse response patterns. These cells often showed cotuned onset and sustained responses.

C The remaining cells exhibited predominant offset pitchclass tunings, which showed similarly broad/complex tunings as the onset tuned cells.

D Distribution of response types over the entire set of cells.

E Tuning halfwidth (f50%) for individual cells as a function of pitchclass (angle) and response type (colors as in D). The radius indicates the halfwidth of the tuning in octaves (see Methods). More precisely tuned cells lie in the center of the circle.

Populationvector-based decoding also predicts a repulsive shift in pitchclass.

A Individual stimuli (top) can also be decoded, by assigning each neuron its preferred pitchclass (a complex number, gray vectors, bottom) and weighting these by the neural response (small purple dots, bottom) for the given stimulus. The angle of the average of these vectors (black vectors, bottom) then corresponds to the decoded stimulus (big purple dot, bottom).
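
The population-vector readout described in panel A can be sketched in a few lines. The toy population (12 neurons with evenly spaced preferred pitchclasses and von Mises-shaped tuning) is an assumption for illustration and not the recorded data.

```python
import cmath
import math

def population_vector_decode(preferred_pcs, responses):
    """Decode a pitchclass (in semitones, [0, 12)) from a population response.

    Each neuron contributes a unit complex number at the angle of its
    preferred pitchclass, weighted by its response; the angle of the
    summed vector is the decoded stimulus.
    """
    vec = sum(
        r * cmath.exp(2j * math.pi * pc / 12.0)
        for pc, r in zip(preferred_pcs, responses)
    )
    return (cmath.phase(vec) / (2.0 * math.pi) * 12.0) % 12.0

def toy_response(preferred_pc, stim_pc, kappa=2.0):
    """Hypothetical von Mises-shaped tuning curve on the pitchclass circle."""
    return math.exp(kappa * (math.cos(2.0 * math.pi * (preferred_pc - stim_pc) / 12.0) - 1.0))

preferred = [float(pc) for pc in range(12)]
responses = [toy_response(pc, stim_pc=4.0) for pc in preferred]
decoded = population_vector_decode(preferred, responses)
print(round(decoded, 3))
```

Because the readout is an angle, it respects the circular stimulus space directly; a stimulus near 0st and one near 12st decode to neighboring values.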

B As in Fig. 3, the entire set of 240 distinct Shepard tones (from the various bias sequences) is represented in a lower dimensional space (dots, hue = true pitchclass), in which neighboring stimuli fall close to each other and the stimuli overall form a circle. The Shepard tones in the ambiguous pairs are projected using the same decoder (denoted by the different triangles, hue = true pitchclass), and roughly fall into their expected locations. As before, if a stimulus was located above the bias sequence (△, bright, upward triangles), its representation is shifted to higher pitchclasses, compared to the same stimulus when located below the bias sequence (▾, dark, downward triangles). Hence, the preceding bias repulsed the stimuli in the ambiguous pair in their represented pitchclass. Both stimuli of the pair are treated equally here.

C As before for the dimensionality reduction decoding we compare real and estimated pitchclasses (by taking the circular position in B) for each stimulus, which explains >99% of the variance.

D The influence of the bias can be compared quantitatively, by centering the represented test stimuli around their actual pitchclass and inspecting the difference between the two different bias conditions. The bias shifts them away with high significance (p<10^-43, Wilcoxon-test).

E As before, the size of the shift is influenced by the length of the bias sequence (5 tones = red, 10 tones = black) and the time between the bias and the test tones (τ=0.92s).

Local and global (postsynaptic) adaptation cannot explain both influences of the bias on the tuning (A1-C1) and the represented stimuli (A2-C2).

A The observed tuning curves could also be obtained if the adaptation was assumed to be strictly local, and no distributed activity in response to a stimulus was considered. In this case, however, no change in stimulus representation is expected (A2, below (red) and above (blue) curves are identical), since the local adaptation would reduce the response of each neuron (either multiplicatively or subtractively) and hence would not change the center of the represented stimulus.

B The repulsive shifts in the represented stimulus could also be obtained, if adaptation was assumed to be global across all inputs, i.e. only on the postsynaptic side, determined by the total postsynaptic activity. In this case, however, tuning curves are predicted to scale, rather than change in shape or position (B1). This is inconsistent with the observed changes in tuning curve shape (Fig. 4C-E).

C If local adaptation (either on the synaptic level or in the cells projecting on the present cell) is combined with a distributed activity in the ascending auditory system, sharply adapted tuning curves are obtained as well as a repulsive influence of a preceding sequence of bias stimuli. In response to a stimulus, both local changes in tuning curves (C1) and bias-repulsed stimulus representation are obtained (C2). It should be noted that other models exist which would produce similar results, yet the present model makes only a limited set of well-accepted assumptions and is consistent with the data.

A3-C3 The decoding curves are obtained from the encoding-decoding matrix, which relates the stimulus (abscissa) to the response for each neuron (ordinate). Decoding consists of reading off the population response at the test stimulus (blue vertical line). Here, the bias below condition is illustrated (red curves in the other plots).