Abstract
Perception can be highly dependent on stimulus context, but whether and how sensory areas encode the context remains uncertain. We used an ambiguous auditory stimulus - a tritone pair - to investigate the neural activity associated with a preceding contextual stimulus that strongly influenced the tritone pair’s perception: either as an ascending or a descending step in pitch.
We recorded single-unit responses from a population of auditory cortical cells in awake ferrets listening to the tritone pairs preceded by the contextual stimulus. We find that the responses adapt locally to the contextual stimulus, consistent with human MEG recordings from the auditory cortex under the same conditions. Decoding the population responses demonstrates that cells responding to pitch-class-changes are able to predict well the context-sensitive percept of the tritone pairs. Conversely, decoding the individual pitch-class representations and taking their distance in the circular Shepard tone space predicts the opposite of the percept. The various percepts can be readily captured and explained by a neural model of cortical activity based on populations of adapting, pitch-class and pitch-class-direction cells, aligned with the neurophysiological responses.
Together, these decoding and model results suggest that contextual influences on perception may well be already encoded at the level of the primary sensory cortices, reflecting basic neural response properties commonly found in these areas.
Introduction
In real world scenarios, the elements of the sensory environment do not occur independently [1,2]. Temporal, spatial and informational predictability exists within and across modalities, already as a consequence of the basic physical properties, such as spatial and temporal continuity [3]. Neural systems make efficient use of this inherent predictability of the environment in the form of expectations [4]. Expectations are valuable, because they provide an internal mechanism to recognize stimuli faster [5] and more reliably under noisy conditions [6] with speech being of specific relevance for humans [7]. As a consequence, the same stimulus can be perceived differently, depending on the context it occurs in, e.g. which stimuli it is preceded by or which stimuli co-occur with it [8]. We can thus study the expectation underlying a percept by studying the nature of the contextual influence.
Within audition, several forms of contextual influences have been found to shape perception. They range from spatial (e.g. localization in different contexts), to grouping (e.g. ABA sequences, [9]) and phonetic [10] contextual influences. A striking example in human communication concerns the perception of certain syllable sequences, such as an ambiguous syllable between /ga/ and /da/ preceded by either /al/ or /ar/. In both, the second syllable is physically identical, but is heard as /da/ or /ga/ depending on the preceding syllable [11]. Subsequent psychoacoustic investigations have revealed that this effect still occurs if the preceding syllable is replaced by appropriately chosen tone sequences, that it persists with substantial silent gaps between the two syllables, and that only very few tones are in fact necessary to bias the percepts one way or another [10]. Hence, these contextual effects are likely not linguistic in nature, but reflect more basic adaptive neural mechanisms. Several interpretations of these findings have been proposed, such as the enhancement of contrasts [10,12].
Here, we investigate the neural correlates of these contextual effects using a simplified paradigm, in which the context reliably biases the percept of an ambiguous acoustic stimulus: A sequence of two Shepard tones [13], differing by half an octave in the frequencies of its constituent tones, can be perceived as ascending or descending in pitch. Shepard tones are complex tones with octave spaced constituent tones (Fig. 1A). This percept can be reliably manipulated by presenting a suitably chosen sequence of Shepard tones before, setting up different contexts (Fig. 1B). This contextual influence is highly effective, rapidly established and can last for multiple seconds [14,15]. Importantly, the ability to determine the changes in pitch has relevance for a wide spectrum of real-world tasks, ranging from distinguishing an approaching from a departing vehicle to distinguishing different emotions in human communication [16,17].
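The physical ambiguity of the half-octave step can be made concrete with a minimal sketch of the component frequencies of two Shepard tones. The base frequency, the number of octaves and the function name `shepard_components` are illustrative assumptions, not values taken from the study, and the spectral amplitude envelope is omitted.

```python
import numpy as np

def shepard_components(pitch_class, f_base=27.5, n_octaves=9):
    """Frequencies (Hz) of the octave-spaced components of one Shepard tone.

    pitch_class is given in semitones (0-12). f_base and n_octaves are
    illustrative assumptions; the amplitude envelope is not modelled.
    """
    octaves = np.arange(n_octaves)
    return f_base * 2.0 ** (octaves + pitch_class / 12.0)

first = shepard_components(0.0)
second = shepard_components(6.0)   # half an octave (tritone) above the first

# Each component of the second tone lies half an octave above one component
# of the first tone and half an octave below the next one, so the step is
# physically ambiguous between +6 and -6 semitones.
ratio_up = second[:-1] / first[:-1]     # all equal to sqrt(2)
ratio_down = second[:-1] / first[1:]    # all equal to 1/sqrt(2)
```

A shift by a full octave maps the component set onto itself (apart from the outermost components), which is why only the pitch class, not the octave, identifies a Shepard tone.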
We presented various Shepard tone sequences to awake ferrets and humans while simultaneously performing single-unit population recordings using chronically implanted electrode arrays in the left primary auditory cortex, and MEG (magnetoencephalographic) recordings, respectively. Exposure to the contextual sequence resulted in localized adaptation that faded over the time-course of ∼1 s, consistent with stimulus specific adaptation [12] and with findings from a related human MEG study [18]. A straightforward decoding approach demonstrates that the perceived pitch-change direction can be directly related to the contextually adapted activity of direction selective cells, i.e. cells that have a preference for sounds with ascending or descending frequency over time [19,20]. Conversely, decoding the represented Shepard tone pitch classes and their respective differences predicts a repulsive effect, opposite to the perceived direction of pitch change. These underlying neuronal adaptation dynamics are consistent with changes in neural activity in the auditory cortex estimated from human MEG recordings collected for the same sounds.
We can account for these effects in a simplified model of the cortical representation based on known properties of pitch-class-change selective cells, which matches both the results from the directional- and the distance-decoding analysis. Further, the model is consistent with multiple observed properties of the neural representation, including tuning changes and directional tuning of individual neurons as well as the build-up of the contextual effect in humans.
Results
We collected neural recordings from 7 awake ferrets (662 responsive, tuned single units) and from 16 humans (MEG recordings) in response to sequences of Shepard tones (the ‘Bias’) followed by an ambiguous, semi-octave separated test pair (Fig. 1B). The human participants performed a 2-alternative forced choice task, selecting between hearing an ascending or descending step in the test pair, while ferrets listened passively. It has been previously shown [15,18] that the presence of the Bias reliably influences human perception, towards hearing a pitch step ’bridging’ the location of the Bias, i.e. if the Bias is located above the first tone an ascending step is heard (Fig. 1B middle), and a descending one, if it is below (Fig. 1B bottom). The present study investigates the neural representation underlying these modified percepts. In the following, we first show that the bias sequence induces a local adaptation in the neural population activity in both passive (ferret) and active (human) condition (Fig. 2, see Discussion for more details). Next, we demonstrate the effect of this adaptation on the stimulus representation by decoding the neural population response. We find a repulsive influence of the Bias sequence on the pitch classes in the pair, i.e. the pitch class of each Shepard tone shifts away from the Bias. This increases their distance in pitch class along the perceived direction (Fig. 3) and is thus not compatible with a simple, pitch (class)-distance based decoder as an explanation of the directionality percept (Fig. 4). We then provide a simplified model of neuronal activity in auditory cortex that captures both the population representation and the adaptive changes in tuning of individual cells (Fig. 5). Finally, based on this model, we provide an alternative explanation for the directionality percept of the ambiguous pair by showing that the adaptation pattern of directional cells predicts the percept (Fig. 6).
For simplicity, the term pitch is used interchangeably with pitch class as only Shepard tones are considered in this study. Pitch (class) is here mainly used as a term to describe the neural responses to Shepard tones, consistent with previous literature on the topic and with the fact that Shepard tones are composite stimuli that lead to a pitch percept.
The contextual bias adapts the neural population locally
Adaptation to a stimulus is a ubiquitous phenomenon in neural systems [21–23]. Multiple kinds and roles of adaptation have been proposed, ranging from fatigue to adaptation to stimulus statistics [24–27] to higher-order adaptation [28,29]. Since adaptation has previously been implicated in affecting perception, e.g. in the tilt after-effect in vision [30,31], we start out by characterizing the adaptation in neural response during and following the Bias under awake (ferrets, single units) and behaving (humans, MEG) conditions. The Bias was matched to the choice from a previous human study ([18], see Fig. 1B), i.e. it consisted of a sequence of 5 or 10 Shepard tones with pitch classes randomly drawn from a range of 5 semitones, symmetrically arranged around a central pitch class, e.g. a Bias sequence centered at 3 semitones had individual tones drawn from 0.5-5.5 semitones. Relative to the ambiguous pairs, there were both an up- and a down-Bias, positioned above or below the first tone in the ambiguous pair, respectively. In the single unit data the average response strength decreases as a function of the position in the Bias sequence. Cells adapted their onset, sustained and offset response within a few tones in the biasing sequence (Fig. 2 A1). This behavior was observable for the vast majority of cells (91%, p<0.05, Kruskal-Wallis test), and is thus conserved in the grand average (Fig. 2 A2). The adaptation plateaued after about 3-4 stimuli (corresponding to a time-constant of 0.59s) on a level about 13% below the initial level.
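As an illustration of the Bias construction and the reported adaptation time course, the following minimal sketch draws a Bias sequence and evaluates a single-exponential decay. The uniform sampling, the exponential form and the function names are assumptions made for illustration; only the 5-semitone range, the 0.59 s time constant and the 13% plateau reduction come from the text, and 7 Hz is taken here to be the ferret presentation rate.

```python
import numpy as np

OCTAVE = 12.0  # semitones per octave

def draw_bias(center, n_tones, width=5.0, rng=None):
    """Draw Bias pitch classes within `width` semitones around `center`.

    Uniform sampling is an assumption; the text only states 'randomly drawn'.
    """
    rng = np.random.default_rng() if rng is None else rng
    offsets = rng.uniform(-width / 2.0, width / 2.0, size=n_tones)
    return (center + offsets) % OCTAVE

def adapted_level(t, tau=0.59, plateau=0.87):
    """Assumed exponential decay of the response relative to the first tone."""
    return plateau + (1.0 - plateau) * np.exp(-np.asarray(t) / tau)

bias = draw_bias(center=3.0, n_tones=10, rng=np.random.default_rng(0))
onsets = np.arange(10) / 7.0      # tone onsets (s) at an assumed 7 Hz rate
levels = adapted_level(onsets)    # relative response at each tone onset
```

Under these assumptions the response has covered most of the distance to its plateau by the fourth or fifth tone, in line with the 3-4 stimuli quoted above.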
The single-unit response strength is reduced locally by the Bias sequence, i.e. more strongly for the range of Shepard tones occurring in the Bias. The responses to the tones in the ambiguous pair (Fig. 2 A3, blue) - which are at the edges of the Bias sequence - are significantly less reduced (33%, p=0.0011, 2 group t-test) compared to within the bias (Fig. 2 A3, blue vs. red), relative to the first responses of the bias. The average response was here compared against the unadapted response of the neurons measured via their Shepard tone tuning curve, collected prior to the Bias experiment (see Fig. S1 for some examples). This difference is enhanced if longer sequences are used and the entire non-biased region is measured (see below).
The response strength remains adapted on the order of one second for single cells. The average spontaneous activity recovered with a time constant of 1.2 s (Fig. 2 A4). The initial buildup before the reduction is probably due to offset responses of some cells.
For the human recordings, we obtained similar time-courses and qualitative response progressions. The neural response adapted both for individuals (Fig. 2 B1) and on average (Fig. 2 B2), with slightly faster time-constants (0.69s), which could stem from the lower repetition rate (4 Hz compared to 7 Hz) used in the human experiments, potentially leading to less adaptation. Contrary to this expectation, however, the amount of adaptation under behaving conditions in humans appears to be more substantial (40%) than for the average single units under awake conditions. While this difference could be partly explained by desynchronization, which is typically associated with active behavior or attention [32], general response adaptation to repeated stimuli is also typical in behaving humans [33]. However, comparisons between the level of adaptation in MEG and single neuron firing rates may be misleading, due to the differences in the signal measured and subsequent processing.
Similarly to the neuronal data, in the MEG data the responses to probe tones in the same semi-octave (Fig. 2B3, red) are significantly reduced (21%, signed ranks test, p<0.0001) compared to the corresponding response in the opposite semi-octave (blue). This local reduction is not surprising, given that single neurons in the auditory cortex can be well tuned to Shepard tones, with tuning widths of as little as 2-3 semitones (see Fig. S1E). The effect adaptation has on individual cells is studied in detail further below. Note that the term local here could mean that a neuron is adapted in multiple octaves, but this is collapsed into the Shepard tone space.
In summary, both under awake (ferret) and behaving (humans) conditions, we find that neural responses adapt with similar time courses. The adaptation is local in nature, despite the global, i.e. wide-band, nature of the Shepard tones. Together, this suggests that adaptation may play an important role in explaining the effect the Bias sequence has on perceiving the ambiguous Shepard pair. Below we investigate the specific influence of the Bias on the detailed neural representation using the neuronal recordings from the ferret auditory cortex, which cannot be achieved currently with the human MEG data.
The contextual bias repels the ambiguous pair in pitch
Adaptation can have a variety of effects on the represented stimulus attributes (see [30]): stimulus properties can be attracted, repelled or left unchanged depending on the kind of adaptation. In the present paradigm, one hypothesis to explain the percept would be that the bias attracts subsequent tones, thus reducing the distance along this side of the pitch-circle, e.g. an UP-Bias would reduce an ambiguous 6 semitone step to a non-ambiguous 5 semitone step (Fig. 3). To test this hypothesis, we decoded the represented stimulus using various decoding techniques from the neural population.
In population decoding the goal is to estimate a mapping from neural activity to stimulus properties, which assigns a stimulus to the population response. In the present context, this amounts to predicting the pitch class for a given neural response. Several decoding techniques exist which apply different algorithms and start from different assumptions. We here present a dimensionality reduction technique based on principal component analysis. Other techniques gave very similar results (e.g. t-distributed Stochastic Neighbor Embedding (t-SNE, [34]), or population vector decoding, see Fig. S2).
Decoding techniques based on dimensionality reduction attempt to discover a new coordinate system, which accounts for a substantial portion of the variance within much fewer dimensions (Fig. 4A). In other words, they estimate a new representation adapted to the intrinsic geometry of the set of neural responses. In the case of Shepard tones, we predict this geometry to be circular (assuming the neural representation is not degenerate), given the circular nature of the Shepard tones (Fig. 1A). As a circular variable can be represented (embedded) in a 2D Euclidean space, we only consider two dimensions of the decoding, typically the first two, if sorted by explained variance.
The projection of the neural data onto the first two principal components indeed forms a circular arrangement (Fig. 4B). Each point corresponds to a Shepard tone, with its actual pitch given by its color. The dimensionality reduction was based on the neural responses of 662 neurons to 240 distinct Shepard tones, compiled from the 32 biasing sequences (16 for both sequence lengths, 5 & 10), which covered the octave evenly. The orderly progression of colors indicates that a proximity in stimulus space leads to a proximity in neural response space.
To reassign a pitch class to each point, we estimated a continuous pitch-circle (Fig. 4B, colorful polygon), by computing the local average of each set of 10 adjacent points (w.r.t. actual pitch) and interpolating in between these points. This decoder produces an excellent mapping between actual and estimated pitch classes on the training set (r=0.995, Pearson correlation, Fig. 4C). Based on the responses to the Bias sequences, we have thus constructed a decoder of high accuracy.
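The decoding steps described above (projection onto two principal components, local averaging of 10 pitch-adjacent points, interpolation, nearest-point readout) can be sketched as follows. The synthetic population with von Mises tuning, the noise level and all variable names are assumptions standing in for the recorded data; the nearest-point readout on the interpolated polygon is one plausible reading of the procedure, not necessarily the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
OCT = 12.0

# Synthetic stand-in for the recorded population (assumption): von Mises
# tuning over pitch class, random preferred pitch classes, additive noise.
n_cells, n_tones = 200, 240
pref = rng.uniform(0, OCT, n_cells)
tones = np.arange(n_tones) * OCT / n_tones        # covers the octave evenly
phase = 2 * np.pi * (tones[:, None] - pref[None, :]) / OCT
resp = np.exp(2.0 * (np.cos(phase) - 1.0))
resp += 0.05 * rng.standard_normal(resp.shape)

# PCA via SVD of the mean-centered (tones x cells) response matrix.
mean_resp = resp.mean(axis=0)
_, _, vt = np.linalg.svd(resp - mean_resp, full_matrices=False)
pcs = vt[:2].T                                    # first two components
proj = (resp - mean_resp) @ pcs                   # (n_tones, 2)

# Pitch circle: local average of each set of 10 pitch-adjacent points.
anchors_xy = proj.reshape(-1, 10, 2).mean(axis=1)  # (24, 2)
anchors_pc = tones.reshape(-1, 10).mean(axis=1)    # (24,)

def build_circle(xy, pc, n_sub=50):
    """Linearly interpolate a closed polygon between neighboring anchors."""
    nxt_xy = np.roll(xy, -1, axis=0)
    nxt_pc = np.roll(pc, -1)
    nxt_pc[-1] += OCT                              # unwrap closing segment
    t = np.linspace(0, 1, n_sub, endpoint=False)
    cxy = xy[:, None, :] * (1 - t)[None, :, None] + nxt_xy[:, None, :] * t[None, :, None]
    cpc = pc[:, None] * (1 - t)[None, :] + nxt_pc[:, None] * t[None, :]
    return cxy.reshape(-1, 2), cpc.reshape(-1) % OCT

circle_xy, circle_pc = build_circle(anchors_xy, anchors_pc)

def decode(responses):
    """Assign each response the pitch class of the nearest circle point."""
    xy = (responses - mean_resp) @ pcs
    d2 = ((xy[:, None, :] - circle_xy[None, :, :]) ** 2).sum(axis=2)
    return circle_pc[d2.argmin(axis=1)]

decoded = decode(resp)
circ_err = (decoded - tones + OCT / 2) % OCT - OCT / 2   # signed, semitones
mean_abs_err = np.abs(circ_err).mean()
```

The error is computed on the circle rather than by a linear correlation so that a tone decoded just across the 0/12 boundary is not counted as a large miss. The same `decode` function could then be applied to responses outside the training set, such as those to the tones of the ambiguous pair.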
Next, we apply the decoder to the responses of the Shepard tones in the ambiguous pair to estimate their represented pitch class, and check whether they are represented at their expected pitch class. We find that their pitch classes are shifted away from the Bias, i.e. tones in the pair that occur above the Bias are shifted further above (Fig. 4B/D/E, △, bright, upward triangles), and vice versa (Fig. 4B/D/E, ▾, dark, downward triangles). This result is highly significant (p=10⁻⁴¹, exact signed-rank test, MATLAB signrank) and holds for all tested pitch classes in the pair ([0,3,6,9], Fig. 4D). The size of the shift increases with the length of the Bias sequence (Fig. 4E, red: L=5, black: L=10) and decreases with the temporal separation between Bias and ambiguous pair (Fig. 4E, τ=1.1s). This time constant agrees well with the time course of recovery from adaptation (Fig. 2A4). The effect size - ranging up to a total of 0.8 semitones - is much greater than the human threshold of ∼0.2 semitones for distinguishing two Shepard tones (internal pilot experiment, data not shown). Practically the same result is obtained using population vector decoding instead (Fig. S2).
Consequently, the presence of the Bias has a repulsive effect on the tones in the pair. Therefore their distance increases (when measured on the Shepard pitch circle) on the side of the pitch circle where the Bias was presented, e.g. for a 6 semitone step an UP-Bias leads to a represented 7 semitone (or correspondingly -5 semitone) step (Fig. 3B). Hence, the population decoding suggests that a decoder based on circular distance in pitch cannot account for the effect of the Bias on the percept, since this would have predicted the distance to shrink on the side of the Bias.
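The argument against the circular-distance decoder amounts to a short calculation on the pitch circle, sketched here. The function name and the shift of 0.5 semitones per tone are illustrative assumptions chosen to reproduce the 7 / -5 semitone example; they are not measured values.

```python
def circular_step(pc_first, pc_second, octave=12.0):
    """Signed shortest step on the pitch circle, in (-octave/2, octave/2]."""
    d = (pc_second - pc_first) % octave
    return d - octave if d > octave / 2 else d

# Ambiguous pair at pitch classes 0 and 6, UP-Bias located between them.
shift = 0.5  # illustrative repulsion per tone (assumption)

# Observed repulsion: both tones move away from the Bias.
repelled = circular_step(0.0 - shift, 6.0 + shift)    # -5.0, i.e. descending

# Hypothetical attraction: both tones move toward the Bias.
attracted = circular_step(0.0 + shift, 6.0 - shift)   # 5.0, i.e. ascending
```

With repulsion, the shortest path runs on the side opposite to the Bias, so a minimal-distance decoder would report a descending step, whereas listeners report an ascending step after an UP-Bias.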
An SSA-like model accounts for repulsion & local adaptation
Before we can propose an alternative decoder for the directionality percept, we devise a basic neural model which is consistent with both the local nature of the adaptation (Fig. 2) and the repulsion in pitch (Fig. 4). This will serve to highlight a few boundary conditions coming from the neural activity and tuning properties that a complete model of the perceptual effect will have to obey. Generally, the model is a 2-layer, tonotopically organized model with spectral integration for both layers and adaptation occurring in the transition from the first to the second layer (see Fig. 5A for a depiction and Methods for more details).
As detailed below, the Bias not only leads to ’local’ adaptation in the sense of ’cells close to the location of the Bias’, but the adaptation even acts locally within the tuning curve of a given cell (Fig. 5D, left). Hence, while global, postsynaptic adaptation (fatigue) has been shown to be sufficient in producing a repulsion from the adaptor in decoded stimuli (see Fig. S3, [30,35]), the local reduction of individual tuning curves actually observed in our data requires us to consider non-global adaptation in the model. Adaptation of this type is not unheard of in the auditory cortex, given that another cortical property of stimulus representation - stimulus specific adaptation (SSA, [12,36]) - is likely to rest on similar mechanisms.
The simplest possibility along these lines is very local adaptation, i.e. specific and limited to each stimulus. While this adaptation could account for the local changes in tuning-curves, it would not predict a repulsion to occur in decoding (Fig. S3 A, see Methods for details on the model implementation). Since this adaptation is assumed to be specific for each stimulus, the population activity for this decoded stimulus would simply be scaled, which would leave the mean and all other moments the same (Fig. S3 A3).
A more biological variant is adaptation that is based on the internal, neural representation of the stimulus (Fig. 5A). As the auditory system has non-zero filter-bandwidths, every stimulus elicits a distributed, rather than a perfectly localized activity. At least up to the primary auditory cortex, acoustic stimuli are therefore represented in a distributed manner, in particular in the medial geniculate body (MGB) of the thalamus. If this distribution of activity adapts the corresponding channels locally, the tuning curves of cells in field AI of the primary auditory cortex - which receive forward input from the MGB - will be locally reduced, however, less local than in the point-like representation due to the width of the distribution in its inputs (Fig. S3 C1). On the other hand, decoding of pitches will be repulsive after an adaptor, because cells closer in BF to the adapting stimulus will integrate more adaptation (Fig. 5A top right), and thus contribute less of their stimulus preference to the decoding. This imbalance shifts the average in the decoded pitch further away (Fig. S3 C3). For simplicity, the adaptation here is attributed to the incoming synaptic connections to AI, yet, it could equally be localized at an earlier or multiple levels.
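The repulsion argument can be illustrated with a deliberately reduced, non-dynamical sketch: each cell's gain is lowered in proportion to how strongly the adaptor activates it, and the pitch class is read out with a population vector. The von Mises tuning, the parameter values and the per-cell (rather than per-input-channel) gain are simplifying assumptions and do not reproduce the model described in the Methods.

```python
import numpy as np

OCT = 12.0

def bump(center, pc, kappa):
    """Von Mises-shaped activity profile over pitch class (peak value 1)."""
    return np.exp(kappa * (np.cos(2 * np.pi * (pc - center) / OCT) - 1.0))

def decoded_pitch_class(stim, adaptor=None, n_cells=120, kappa=4.0, alpha=0.4):
    """Population-vector estimate of pitch class, optionally after adaptation.

    kappa (tuning sharpness) and alpha (maximal gain reduction) are
    illustrative assumptions.
    """
    best = np.arange(n_cells) * OCT / n_cells     # best pitch classes
    rate = bump(stim, best, kappa)
    if adaptor is not None:
        # Cells driven more strongly by the adaptor lose more gain.
        rate = rate * (1.0 - alpha * bump(adaptor, best, kappa))
    vec = np.sum(rate * np.exp(2j * np.pi * best / OCT))
    return (np.angle(vec) * OCT / (2 * np.pi)) % OCT

def signed_shift(stim, adaptor=None):
    """Decoded minus actual pitch class, wrapped to [-6, 6) semitones."""
    d = decoded_pitch_class(stim, adaptor) - stim
    return (d + OCT / 2) % OCT - OCT / 2

shift_above = signed_shift(5.5, adaptor=3.0)   # tone above the adaptor
shift_below = signed_shift(0.5, adaptor=3.0)   # tone below the adaptor
```

In this sketch the tone above the adaptor is decoded further above and the tone below further below, because the cells between tone and adaptor contribute less to the population vector than their counterparts on the far side.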
The models discussed above (Fig. S3) were implemented non-dynamically to illustrate the interaction between context and different types of adaptation. Based on the aforementioned considerations, we implemented a dynamical rate model including local adaptation and distributed representation (the latter two as in Fig. S3 C3), which receives the identical stimulus sequences as were presented to the real neurons. The dynamical model provides a quantitative match to the adaptation of single cells and the repulsive representation of Shepard tones after the Bias and further allows us to estimate parameters of the underlying processing (Fig. 5B-E).
First, when subjected to population decoding analysis as for the real data before, the model exhibits a very similar circular representation of the space of Shepard tones (Fig. 5B1) and repulsive shifts in represented pitch class (Fig. 5B2). The basis for this representation is analyzed in the following.
Local adaptation in the tuning of individual cells is retained in the model (Fig. 5C). Individual cells showed adaptation patterns matched in location to the region where the Bias was presented (different colors represent different Bias regions), in comparison to the unbiased tuning curve (gray). In the model, the locally implemented adaptation together with the distributed activity in the middle level leads to similarly adapted individual tuning curves.
To study the adaptation in a more standardized way, we computed the pointwise difference between adapted and unadapted tuning curve. These differences were then cocentered with the Bias before averaging (Fig. 5D). Both for the onset (brown) and the sustained response (red), the reduction is highly local, with steep flanks at the boundaries of the Bias for both the model and the actual data. The offset data appears to show some local reduction but not as sharply defined as the other two.
In order to find the relative reduction, the response rates inside (red lines) and outside (green lines) the biased regions are plotted against the corresponding regions in the unbiased case (Fig. 5E). While small firing rates appear to be less influenced, a reduction of ∼40% stabilized for higher firing rates, largely independent of the response bin. This analysis can illustrate dependencies on firing rate and prevents degenerate divisions by small firing rates.
The adaptation encapsulated in the model makes only few assumptions, yet provides a qualitatively matched description of the neural behavior in response to the stimulus sequences (w.r.t. the present level of analysis). The model can now be extended to directional cells, to provide a novel explanation for the directionality percept of the biased Shepard pairs.
The contextual Bias differentially adapts directional cells predicting the percept
The decoding techniques used in the previous sections relied only on the cells’ tunings to the currently presented Shepard tone. However, cells in the auditory cortex also possess preferences for the succession of two (or more) stimuli, e.g. differing in frequency [19,20,37]. We hypothesized that perception of pitch steps could rely on the relative activities of cells preferring ascending and descending steps in frequency, or presently pitch class (Fig. 6A), instead of the difference in decoded pitch class (as in the minimal distance hypothesis, disproven above).
We tested this directional hypothesis by decoding the perceived direction more directly by taking the directional preferences of each cell into account (Fig. 6A). The directional percept is predicted by the population response weighted by each cell’s directionality index (DI) and the distance to the currently presented stimulus (see Methods for details). The DI is computed from the cells’ spectro-temporal receptive fields (SSTRFs), which are estimated from the sequences of Shepard tones. This approach avoids estimating receptive fields from stimuli with different statistics. Directional cells have asymmetric SSTRFs (see Fig. 6B for some examples): down-selective cells have SSTRFs with active zones (red) angled down (from past to future, current time being on the left), up-selective cells the opposite. In both cases the SSTRF is dominated by the activity in response to the current stimulus, i.e. the recent past.
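A minimal sketch of such a DI-weighted readout is given below. The Gaussian weighting of circular distance, its width `sigma`, the sign convention (positive DI for up-preferring cells) and the toy firing rates are all assumptions for illustration; the actual weighting is specified in the Methods, which are not reproduced here.

```python
import numpy as np

def directional_evidence(rates, di, best_pc, stim_pc, sigma=1.5, octave=12.0):
    """Population response weighted by directionality and distance to the tone.

    Positive values are read as an ascending step, negative values as a
    descending step. The Gaussian distance weighting and sigma are
    hypothetical choices.
    """
    rates, di, best_pc = map(np.asarray, (rates, di, best_pc))
    dist = (best_pc - stim_pc + octave / 2) % octave - octave / 2
    weight = np.exp(-0.5 * (dist / sigma) ** 2)
    return float(np.sum(rates * di * weight))

# Toy population around a tone at pitch class 6: two up-preferring and two
# down-preferring cells. In the first case the up-cells respond less (as if
# more strongly adapted), in the second case the down-cells respond less.
best_pc = [5.0, 5.0, 7.0, 7.0]
di = [+0.6, -0.6, +0.6, -0.6]
up_cells_adapted = directional_evidence([4.0, 9.0, 5.0, 8.0], di, best_pc, 6.0)
down_cells_adapted = directional_evidence([9.0, 4.0, 8.0, 5.0], di, best_pc, 6.0)
```

The sign of the returned value flips with the relative activity of up- and down-preferring cells near the tone, which is the quantity the directional hypothesis relies on.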
The directional decoding successfully predicts the percept for both stimuli in the ambiguous pair. Predictions are performed for each stimulus in the test pair separately. The step direction of the first stimulus is defined based on its relative position to the center of the Bias. For the analysis of the second Shepard tone in the pair, the step is assumed to be perceived on the side of the Bias, i.e. as previously shown in human perception [15]. Predictions (Fig. 6C) for the first tone (red) were generally more reliable than for the second tone (blue), and predictions also improved for both tones with the length of the Bias sequence (5 tones = o, 10 tones = •). This dependence of prediction reliability is consistent with the certainty of judgment in human psychophysics [15]. On average, the prediction was correct in 88% and 95% for the first tone, for a Bias length of 5 and 10 tones, respectively, and 67% and 88% for the second tone, respectively (Fig. 6D). In summary, the Bias seems to influence the relative activities of up- and down-preferring cells differentially above and below the Bias, such that responses from down-preferring cells prevail below the Bias, and up-preferring cells prevail above the Bias, predicting the human percept correctly.
We next investigate in what way the activities of directional cells are modified by the Bias to generate these perception-matched decodings. In the previous sections, we have seen that rather local adaptation occurs during the Bias and modifies the response properties of a cell. How does this adaptation affect a cell that has a directional preference? To address this question we distinguish the differential response between the Up and Down Bias as a function of a cell’s directionality and pitch class separation from the test tone, i.e. Shepard tones in the ambiguous pair. Hence, the analysis (Fig. 6E/F) plots the difference in response between a preceding Up- and Down-Bias (specifically: Bias-is-locally-above-tone minus Bias-is-locally-below-tone, color scale), as a function of the cell’s directional selectivity (abscissa) and the cell’s best pitch class location with respect to each tone (ordinate). For the present analysis, the responses to the first and second tone in the pair were analyzed together. For the second tone, a cell thus contributes also to a second relative-to-tone bin (ordinate) at the same directionality, however, with a different set of responses. Also, each cell contributed for each tone in multiple locations, since multiple target tones (4) were tested in the paradigm.
For the neural data, the differential responses exhibit an angled stripe pattern, formed by a positive and a negative stripe (Fig. 6E top). The stripes are connected at the top and bottom ends, due to the circularity of the Shepard space. The pattern of differential responses conforms to the directional hypothesis, if down-cells (left half) are more active than up-cells (right half) close to the pair tones (Fig. 6E, ordinate around 0). This central region was considered here, since these are the cells that will respond most strongly to the tone. For the neural data, this differential activity is significantly dependent on the directionality of the cells (Fig. 6E bottom, ANOVA, p<0.005).
Extending the neuronal model to directionally sensitive cells
In order to better understand the mechanisms shaping this biasing pattern, the same analysis was applied to neural models including different properties (Fig. 6F). The same model as in the previous section was used, with the only difference that the tuning of the cells extended in time to include two stimuli instead of only one, comparable to the SSTRFs of the actual cells (Fig. 6B). To illustrate the effect of adaptation, three models are compared: without adaptation (Fig. 6F left), without directionally tuned cells (Fig. 6F middle), and with adaptation and directionality tuned cells (Fig. 6F right).
Without adaptation, the cells do not show a differential response (Fig. 6F left), since the Bias does not affect the responses in the test pair (note that there is a 200 ms pause between the end of the Bias and the test pair, such that the directionality itself cannot explain the pattern of responses). Here, the difference in activity around the current tone is not significant (ANOVA, p=0.5; Fig. 6F left bottom).
Without directional cells, the pattern reflects only the difference in activity generated by the interaction of the Bias with the adaptation. The lack of directional cells limits the pattern to a small range of directionalities, generated by estimation inaccuracy. Hence, the local pattern of differential response around the test tone is not significantly modulated, due to the lack of directional cells to span the range (ANOVA, p=0.7; Fig. 6F middle bottom).
In the model with adapting and directional cells, the pattern resembles the angled double stripe pattern from the neural data. The stripes in the pattern are generated by the adaptation, whereas the directionality of the cells leads to the angle of those stripes. Locally around the test tone, this difference shows a statistically significant dependence on the directionality of the cells (ANOVA, p<0.0001; Fig. 6F right bottom).
The resulting distribution of activities in their relation to the Bias is, hence, symmetric around the Bias (Fig. 6G). Without prior stimulation, the population of cells is unadapted and thus exhibits balanced activity in response to a stimulus. After a sequence of stimuli, the population is partially adapted (Fig. 6G right), such that a subsequent stimulus now elicits an imbalanced activity. Translated concretely to the present paradigm, the Bias will locally adapt cells. The degree of adaptation will be stronger, if their tuning curve overlaps more with the biased region. Adaptation in this region should therefore most strongly influence a cell’s response. For example, if one considers two directional cells, an up- and a down-selective cell, cocentered in the same spectral location below the Bias, then the Bias will more strongly adapt the up-cell, which has its dominant, recent part of the SSTRF more inside the region of the Bias (Fig. 6G right). Consistent with the percept, this imbalance predicts the tone to be perceived as a descending step relative to the Bias. Conversely, for the second stimulus in the pair, located above the Bias, the down-selective cells will be more adapted, thus predicting an ascending step relative to the previous tone.
In summary, taking into account the directional selectivities of the population of cells and local neural adaptation, the changes in the directional percept induced by the Bias sequences can be predicted from the neural data. Specifically, the local adaptation of the directionally selective cells caused by the Bias underlies the imbalance in their responses, and is thus likely to be the mechanism underlying the biased Shepard tone percept.
Discussion
We have investigated the physiological basis underlying the influence of stimulus history on the perception of pitch-direction, using a bistable acoustic stimulus, pairs of Shepard tones. Stimulus history is found to persist as spectrally-localized adaptation in animal and human recordings, which specifically shapes the activity of direction-selective cells in agreement with the percept. The adaptation’s spectral and temporal properties suggest a common origin with previously described mechanisms, such as stimulus specific adaptation (SSA). Conversely, the typically assumed [13,14], but rarely explicitly discussed, circle-distance hypothesis in Shepard tone judgements is in conflict with the repulsive effect on cortically represented pitch revealed presently using different types of population decoding. While our entire study was based on Shepard tones, we hypothesize that the underlying mechanisms will influence many other stimuli as well, although the perceptual salience will depend on the level of ambiguity.
Relation to previous studies of Shepard tone percepts and their underlying physiology
While context-dependent auditory perception of Shepard tones has been studied previously in humans, we here provide a first account of the underlying neurophysiological representation. Previous studies have considered how the stimulus context influences various judgements, e.g. whether subsequent tones influence each other in frequency [38,39], whether a sound is continuous [40] or which of two related phonemes is perceived [10]. In the present study, we chose directional judgements, due to the fundamental role frequency-modulations play in the perception of natural stimuli and language. We find the preceding stimulus to locally bias directional cells, such that on a population level the first Shepard tone is perceived as a step downward, and the second tone as a step upward for an UP Bias, and conversely for a DOWN Bias. While the present study cannot directly rule out that the local adaptation occurs before the thalamo-cortical junction, both physiological [41] and psychophysical (binaural fusion, [42]) results suggest a location beyond the olivary nuclei.
The directional decoder links the cellular activity in A1 to the human percept under the same stimulus conditions with remarkable accuracy (∼90%). We would like to emphasize that the use of the directional decoder is equally plausible as the use of a decoder based on preferred frequency. In both cases it is assumed that a downstream region in the brain pools the neural responses from A1, assuming either only a frequency preference, or a combination of frequency and direction preference. The elegance of the directional decoder is that it makes a direct connection to well-known directional response characteristics of auditory cortical neurons (see below), while avoiding any mechanisms specific to Shepard tones, such as the computation of the circular distance.
Are these results compatible with the cellular mechanisms that give rise to direction selectivity? The cellular mechanisms underlying the emergence of directional selectivity in the auditory system have been elucidated in recent years using in-vivo intracellular recordings [43–45]. Two mechanisms have been identified in the auditory cortex, (i) excitatory inputs with different timing and spectral location and (ii) excitatory & inhibitory inputs with different spectral location. In both mechanisms local adaptation by prior stimulation would tend to equalize direction selectivity, by diminishing (i) one excitatory channel or (ii) either the inhibitory or the excitatory channel. The observed changes in response properties under local stimulation are thus compatible with the network mechanisms underlying direction selectivity. This makes the prediction that the presentation of FM-sweeps of one direction should bias subsequent perception to the opposite direction. The timescales tested in these studies [43,46] are similar to what was presently used, e.g. Ye et al. (2010) used FM sweeps that lasted for up to 200 ms, which is quite comparable to our SOA of 150 ms. Psychophysical evidence in this respect has been observed previously in terms of threshold shifts of directional perception, which are in agreement with a local bias influencing the directional percept of subsequent stimuli [47,48]. More specific adaptation paradigms are required to resolve some of the more detailed effects e.g. local differences across the octave [48].
Conversely, we could disprove the hypothesis that directional judgements are based on the distance between the tones on the circular Shepard space. Earlier studies on directional judgements of Shepard pairs have - implicitly or explicitly - used the circular nature of the Shepard space to predict the percept [14,48,49], starting from the fundamental work of [13]. The original idea was to construct a stimulus with tonal character but ambiguous pitch, and as such it has interesting applications in the study of pitch perception. However, as presently shown, the percept of directionality does not rest on the circular construction. This conclusion is obtained by decoding the represented pitch of the Shepard tones in the context of different biasing sequences. This analysis demonstrated that the biasing sequence exerts a repulsive rather than an attractive effect on the pitch of following stimuli. Repulsive effects of this kind have been widely investigated in the visual literature, in particular the tilt after-effect [30,31,35,50,51], where exposure to a single oriented grating perceptually repels subsequently presented gratings of similar orientation. Repulsive effects have also been described in auditory perception [10,38,40], but not in auditory physiology. In conclusion, we find the percept to be inconsistent with the increase in the circular distance in the Shepard tone space.
An interesting approach would be to provide a Bayesian interpretation for the effect of the Bias on the cortical representation. Typically an increase in activity is considered as a representation of the prior occurrence probability of stimuli [52]. Given the local reduction in activity described above, this interpretation would, however, not predict the percept. Alternatively, one could propose to interpret the negative deviation, i.e. local adaptation, as the local magnitude of the prior, which could be consistently interpreted with the percept in this paradigm, as has been proposed before [18]. Recordings from different areas in the auditory cortex might, however, show different characteristics, including a sign inversion.
Relation to other principles of adaptation in audition
Adaptation has been assigned several functions in sensory processing, ranging from fatigue (adaptation in excitability of spiking), representation of stimulus statistics [21], compensation for stimulus statistics [27] and sensitization for novel stimuli [12,53] to sensory memory [22,54]. Adaptation is also present on multiple time-scales, ranging from milliseconds to minutes [21,22,28,55]. Based on the time-scales of the stimulus and the task-design, the present experiments mainly revealed adaptation in the range of fractions of a second. Adaptation can be global - in the sense that a neuron responds less to all stimuli - or local - in the sense that adaptation is specific to certain stimuli, usually the previously presented ones, as in SSA [12,56]. Here, adaptation was well confined to the set of stimuli presented before. Hence, the adaptation identified presently is temporally and spectrally well matched to SSA as described before. In recent years, the research on SSA has focussed on the aspect of stimulus novelty [57–59], as a potential single-cell correlate of mismatch-negativity (MMN) recorded in human EEG and MEG tasks. While the connection between SSA and MMN appears convincing for some properties, e.g. stimulus frequency, it does not appear to transfer in a similar way to other, still primary properties, such as stimulus level or duration, which elicit robust MMN [60]. The present results reemphasize another putative role of SSA, namely sensory memory. Naturally, adaptation - if it is local - constitutes a ‘negative afterimage’ of the preceding stimulus history. Recent studies in humans suggest a functional role for this adapted state in representing properties of the task. This was recently demonstrated in an auditory delayed match-to-sample task, where a frequency-specific reduction in activity was maintained between the sample and the match ([61], see also [62]).
Localized adaptation as described presently provides a likely substrate for such a sensory memory trace.
Future directions
While in human perception task engagement is not necessary to be influenced by the biasing sequence, a natural continuation of the present work would be to record from behaving animals. This would allow us to investigate potential differences in neural activity depending on the activity state, and how individual neurons contribute to the decision on a trial-by-trial basis [63,64]. Furthermore, the current study was limited to the primary auditory cortex of the ferret, but secondary areas as well as parietal and frontal areas could also be involved and should be explored in subsequent research. Switching to mice as an experimental species would allow us to differentiate the roles of different cell types better [65]. On the paradigm level, an extension of the time between the end of the Bias sequence and the test pair would be of particular interest in the active condition, where human research suggests that the Bias can persist for more extended times than suggested by the decay properties of the adaptation in the present data set.
Methods
Experimental Procedures
All animal experiments were performed in accordance with the regulations of the National Institutes of Health and the University of Maryland Institutional Animal Care and Use Committee. All human experiments were performed in accordance with the ethical guidelines of the University of Maryland. We collected single unit recordings from 7 adult, female ferrets (age: 6-12 months, Mustela putorius furo) in the awake condition, MEG recordings from 16 human subjects (9 female, average age: 26y, range 23–34 y) and psychophysical recordings from 10 human subjects (5 female, average age: 28y, range 25–32 y).
Surgical Procedures
A dental cement cap and a headpost were surgically implanted on the animal’s head using sterile procedures, as described previously [66]. Microelectrode arrays (Microprobes Inc., 32-96 channels, 2.5 MOhm, shaft ø=125 μm, various planar layouts with 0.5mm interelectrode spacing) were surgically implanted in the primary auditory cortex AI at a depth of ∼500 μm, for 2 animals sequentially on both hemispheres. A custom-designed, chronic drive system was used in some recordings to change the depth of the electrode array.
Physiology : Stimulation & Recording
Acoustic stimuli were generated at 80 kHz using custom written software in MATLAB (The Mathworks, Natick, USA) and presented via a flat calibrated (within +/-5dB in the range 0.1-32 kHz using the inverse impulse response) sound system (amplifier : Crown D75A; speaker: Manger, flat within 0.08-35 kHz). Animals were head-restrained in a standard position in a tube inside a soundproof chamber (mac3, Industrial Acoustics Corporation). The speaker was positioned centrally above the animal’s head and calibration was performed for the animal head’s position during recordings.
Signals were pre-amplified directly on the head (1x or 2x, Blackrock/TBSI) and further amplified (1000x, Plexon Inc.) and bandpass-filtered (0.1-8000 Hz, Plexon Inc.) before digitization ([-5,5]V, 16 bits, 25 kHz, M-series cards, National Instruments) and storage/display using an open-source DAQ system [67]. Single units were identified using custom written software for spike sorting (for details see [68]). All subsequent analyses were performed in Matlab.
Magnetoencephalography & Psychophysics : Stimulation, Recording and Data Analysis
Acoustic stimuli were generated at 44.1 kHz using custom written software in MATLAB and presented via a flat calibrated (within +/-5 dB in the range 40–3000 Hz) sound system. During MEG experiments, the sound was delivered to the ear via sound tubing (ER-3A, Etymotic), inserted with foam plugs (ER-3-14) into the ear canal, while during psychophysical experiments an over-the-ear headphone (Sony MDR-V700) was used. While the limited calibration range (due to the sound tubing) is not optimal, it still encompasses >6 octaves/constituent tones for every Shepard tone. Sound stimuli were presented at 70 dBSPL. Magnetoencephalographic (MEG) signals were recorded in a magnetically shielded room (Yokogawa Corp.) using a 160 channel, whole-head system (Kanazawa Institute of Technology, Kanazawa, Japan), with the detection coils (ø = 15.5 mm) arranged uniformly (∼25 mm center-to-center spacing) around the top part of the head. Sensors are configured as first-order axial gradiometers with a baseline of 50 mm, with field sensitivities of >5 fT/Hz in the white noise region. Three of the 160 channels were used as reference channels in noise-filtering methods [69]. The magnetic signals were band-passed between 1 Hz and 200 Hz, notch filtered at 60 Hz, and sampled at 1 kHz. Finally, the power spectrum was computed and the amplitude at the target rate of 4 Hz was extracted (as in [70], all magnetic field amplitudes in Fig. 2B represent this measure).
Subjects had to press one of two buttons (controller held in the right hand, away from the sensors) to indicate an ascending or a descending percept. Subjects listened to 120 stimuli in a block, and completed 3 blocks in a session, lasting ∼1 hour.
Acoustic Stimuli
All stimuli were composed of sequences of Shepard tones. A Shepard tone is a complex tone built as the sum of octave-spaced pure-tones. To stimulate a wide range of neurons, we used a flat envelope, i.e. all constituent tones had the same amplitude. Phases of the constituent tones were randomized for each trial, to prevent any single, fixed phase relationship from influencing the pitch percept. Each Shepard tone was gated with 5 ms sinusoidal ramps at the beginning and end.
A Shepard tone can be characterized by its position in an octave, termed pitch class (in units of semitones), w.r.t. a base-tone. In the present study, the Shepard tone based on 440 Hz was assigned pitch class 0. The Shepard tone with pitch class 1 is one semitone higher than pitch class 0 and pitch class 12 is identical to pitch class 0, since all constituent tones are shifted by an octave and range from inaudibly low to inaudibly high frequencies. Hence, the space of Shepard tones is circular (see Fig. 1B). Across the entire set of experiments the duration of the Shepard tones was 0.1 s (neural recordings) / 0.125 s (MEG recordings) and the amplitude 70 dB SPL (at the ear).
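The construction described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB stimulus code: the audible band limits `f_lo` and `f_hi`, the quarter-sine shape of the 5 ms ramps and the function names are assumptions introduced here for illustration, whereas the 440 Hz base tone, the flat envelope, the random phases, the 0.1 s duration and the 80 kHz sampling rate follow the text.

```python
import math
import random

def shepard_components(pitch_class, base_hz=440.0, f_lo=20.0, f_hi=20000.0):
    """Octave-spaced partial frequencies (Hz) of a Shepard tone.

    f_lo and f_hi are assumed band limits, not values given in the text.
    """
    f = base_hz * 2.0 ** (pitch_class / 12.0)
    while f / 2.0 >= f_lo:      # fold down to the lowest octave copy in the band
        f /= 2.0
    while f < f_lo:             # fold up if the starting frequency was below the band
        f *= 2.0
    freqs = []
    while f <= f_hi:
        freqs.append(f)
        f *= 2.0
    return freqs

def shepard_tone(pitch_class, dur=0.1, fs=80000, ramp=0.005, rng=None):
    """Flat-envelope Shepard tone with random phases and on/off ramps."""
    rng = rng or random.Random()
    freqs = shepard_components(pitch_class)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    n = int(round(dur * fs))
    wave = []
    for k in range(n):
        t = k / fs
        # assumed quarter-sine onset and offset ramps
        gate = (math.sin(0.5 * math.pi * min(1.0, t / ramp))
                * math.sin(0.5 * math.pi * min(1.0, (dur - t) / ramp)))
        s = sum(math.sin(2.0 * math.pi * f * t + p) for f, p in zip(freqs, phases))
        wave.append(gate * s / len(freqs))   # equal amplitude for all partials
    return wave
```

Because shifting by 12 semitones maps every partial onto its octave neighbour, `shepard_components(0)` and `shepard_components(12)` return the same set of frequencies, which is the circularity referred to above.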
We used two different stimulus sequences to probe the neural representation of the ambiguous Shepard pairs and their spectral and temporal tuning properties, (i) the Biased Shepard Pair and (ii) the Biased Shepard Tuning:
i) Biased Shepard Pair In this paradigm, an ambiguous Shepard pair (6 st separation) is preceded by a longer sequence of Shepard tones, the Bias (see Fig. 1C). The Bias consists of a sequence of Shepard tones (lengths: 5 and 10 stimuli) which are within 6 semitones above or below the first Shepard tone in the pair. These biases are called ‘up’ and ‘down’ Bias respectively, as they bias the perception of the ambiguous pair to be ‘ascending’ or ‘descending’, respectively, in pitch [15,18]. A pause of different length ([0.05,0.2,0.5] s) was inserted between the Bias and the pair, to study the temporal aspects of the neural representation. Altogether we presented 32 different Bias sequences (4 base pitch classes ([0,3,6,9] st), 2 randomizations (pitch classes and position in sequence), 2 Bias lengths ([5,10] stimuli), ‘up’ and ‘down’ Bias), which in total contained 240 distinct Shepard tones. The individual pitch classes of the tones in the Bias were drawn randomly and continuously from their respective range. Each stimulus was repeated 10 times. In all subsequent analyses, neural responses are averages over these repetitions, and all analyses are performed on the pooled data from all animals. For the neural data, these 240 different Shepard tones were also used to obtain a ‘Shepard tone tuning’ for individual cells (see Fig. S1). The stimulus described above was presented to both animals and humans. The human psychophysical data were only used to reproduce the previous findings by Chambers et al. (2014) with the current parameters. For the MEG recordings, a variation of the biased Shepard pair stimulus was used, which enabled the separate measurement of the activation state in the biased and the unbiased spectral regions.
For this purpose a second sequence of Shepard tones (tone duration: 30 ms; SOA: 250 ms; pitch classes: 3 st above or below the tone of the pair) was inserted between the Bias sequence and the Shepard pair, with the time between the two adapted to include the duration of the sequence (2s) and a pause after the Bias sequence ([0.5,1,2] s).
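The counts quoted for the Biased Shepard Pair paradigm (32 Bias sequences containing 240 tones in total) follow directly from the factorial design, as the following Python sketch shows. The helper `bias_sequence`, the uniform draw over a 6 st range and the identification of the base pitch class with the first tone of the pair are illustrative assumptions, not the authors' stimulus code.

```python
import random
from itertools import product

BASE_PITCH_CLASSES = [0, 3, 6, 9]   # st, taken here as the first tone of the pair (assumption)
RANDOMIZATIONS = [0, 1]             # two random draws per condition
BIAS_LENGTHS = [5, 10]              # number of Shepard tones in the Bias
DIRECTIONS = ["up", "down"]         # Bias above or below the first tone of the pair

def bias_sequence(base_pc, length, direction, rng):
    """Draw Bias pitch classes continuously within 6 st above or below base_pc."""
    sign = 1.0 if direction == "up" else -1.0
    return [(base_pc + sign * rng.uniform(0.0, 6.0)) % 12.0 for _ in range(length)]

rng = random.Random(0)
sequences = [
    bias_sequence(pc, n, d, rng)
    for pc, _, n, d in product(BASE_PITCH_CLASSES, RANDOMIZATIONS,
                               BIAS_LENGTHS, DIRECTIONS)
]
n_sequences = len(sequences)               # 4 * 2 * 2 * 2
n_tones = sum(len(s) for s in sequences)   # 16 * 5 + 16 * 10
```

Half of the 32 sequences contain 5 tones and half contain 10, which gives the 240 tones that also serve as the stimulus set for the Shepard tone tuning.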
ii) Biased Shepard Tuning For estimating the changes in the tuning curve of individual neurons, much longer sequences (154 Shepard tones) were presented to a subset of the neurons. The tone duration and stimulus onset asynchrony were matched to the Bias sequence. The Shepard tones in these sequences were chosen to maintain the influence of the Bias over the entire sequence, while intermittently probing the entire octave of semitones to estimate the overall influence of the Bias on the tuning of neurons. For this purpose, 5/6 (∼83%) of the tones in the sequence were randomly drawn from one of the four Bias regions ([0–5],[3–8],[6–11],[9–2] st), while the 6th tone was randomly drawn from the entire octave, discretized to 24 steps (reminiscent of the studies of [24]). The 6th tone could thus be used to measure each neuron’s ‘Shepard tuning’ at a resolution of 0.5 semitones, adapted to different Bias locations. To avoid onset effects, a lead-in sequence of 15 Bias tones preceded the first tuning estimation tone. Individual stimulus parameters (intensity, durations of tone and interstimulus interval) were chosen as above. Five pseudorandom sequences were presented for each of the four Bias regions, repeated 6 or more times, providing at least 30 repetitions for each location in the tuning curve (results of these conditions are shown in Fig. 5). A randomly varied pause of ∼5 s separated the trials.
Unit Selection
Overall, we recorded from 1467 neurons across all ferrets, out of which 662 were selected for the decoding analysis based on their driven firing rate (i.e. whether they responded significantly to auditory stimulation) and whether they showed a differential response to different Shepard tones. The thresholds for auditory response and tuning to Shepard tones were not critical: setting the thresholds low led to qualitatively the same result, albeit with more noise. Setting the thresholds very high reduced the set of cells included in the analysis, and eventually made the results less stable, as the cells no longer covered the entire range of preferences to Shepard tones.
Response Type Analysis
Whether a cell was adapting or facilitating within the Bias sequence was assessed by averaging the responses across all Bias sequences for a given cell separately. The resulting PSTH was then split up into Onset, Sustained and Offset bins, each 50 ms in time, for each stimulus in the Bias. The sequence of response rates was then fitted with an exponential function, and the direction of the adaptation assessed by comparing the initial rate and the asymptotic rate. If the asymptotic rate exceeded the initial rate, a cell was classified as facilitating, conversely as adapting. The three response bins showed similar proportions of adapting response and were thus averaged to assign a single response type to a given cell, as reported in the Results.
Population Decoding
The represented stimuli in the ambiguous pair were estimated from the neural responses by training a decoder on the biasing sequences and then applying the decoder to the neural response to the pair. We used two different decoders to compare their results, one based on dimensionality reduction (PCA, Principal Component Analysis) and one based on a weighted population-vector, which both gave very similar results (see Fig. 4 and S2). For both decoders, we first built a matrix of responses which had the (240) different Shepard tones occurring in all Bias sequences running along one dimension and the neurons along the other dimension.
The PCA decoder performed a linear dimensionality reduction, utilizing the stimuli as examples and the neurons as dimensions of the representation. The data were projected to the first three dimensions, which represented the pitch class as well as the position in the sequence of stimuli (see Fig. 4A for a schematic). As the position in the Bias sequence was not relevant for the subsequent pitch class decoding, we only focussed on the two dimensions that spanned the pitch circle. A wide range of linear and non-linear dimensionality reduction techniques - e.g. tSNE [34] - was tested leading to very similar results.
The weighted population decoder was computed by assigning each neuron its best pitch class (i.e. pitch class that evoked the highest response) and then evaluating the firing-rate weighted sum of all neurons’ best pitch classes (see Fig. S2A for a schematic). Since the stimulus space is circular, this weighted average was performed in the complex domain, where each neuron was represented by a unit vector in the complex plane, with an angle corresponding to the best pitch class. More precisely, this decoder is simply (omitting indices in the following)
where PCi,best is the preferred pitch class of cell i, C is the set of pitch classes, and fi(S) is the firing rate of neuron i for stimulus S. In the decoding, the firing rate is normalized to the maximal firing rate of each cell, and the contribution of each preferred pitch class is normalized by its empirical frequency of occurrence P(PCi,best), to compensate for uneven sampling of preferred pitch classes.
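A weighted population-vector decoder of the kind described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the function name and argument layout are hypothetical, and the exact form of the normalization (division by the maximal rate and by the empirical frequency of each preferred pitch class) is our reading of the verbal description.

```python
import cmath
import math
from collections import Counter

def population_vector_decode(rates, best_pc, max_rates):
    """Decode a pitch class (in st, range [0, 12)) from population activity.

    rates     : firing rate of each neuron for the stimulus
    best_pc   : preferred pitch class of each neuron (st)
    max_rates : maximal firing rate of each neuron, used for normalization
    """
    counts = Counter(best_pc)
    n = len(best_pc)
    vec = 0j
    for r, pc, r_max in zip(rates, best_pc, max_rates):
        if r_max <= 0:
            continue                           # skip silent cells
        p = counts[pc] / n                     # empirical P(PC_best)
        # unit vector at the preferred pitch class, weighted by normalized rate
        vec += (r / r_max) / p * cmath.exp(2j * math.pi * pc / 12.0)
    angle = cmath.phase(vec)                   # angle of the summed vector
    return (angle * 12.0 / (2.0 * math.pi)) % 12.0
```

Averaging in the complex plane avoids the wrap-around problem of an arithmetic mean: two cells preferring 11 st and 1 st decode to 0 st rather than 6 st.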
To assign a pitch class to the decoded stimuli of the test pair, we projected them onto the ‘pitch-circle’ formed by the decoded stimuli from the Bias sequences (Fig. 4A/B). More precisely, we first estimated a coarse pitch circle with 24 steps at a resolution of 0.5 st, by averaging over bins of 10 neighboring pitch classes (partitioning the total of 240 Bias tones). Next, a more finely resolved trajectory through the set of Bias-tones at a resolution of 0.05 st was created by linear interpolation. Then, the pitch class of the test tone was set to the pitch class of the closest point on the trajectory.
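The projection onto the pitch circle can be sketched as follows. The sketch assumes that the coarse pitch circle is already available as 24 two-dimensional points (one per 0.5 st bin, e.g. the bin averages in the decoder's two pitch dimensions); the function name and argument names are hypothetical.

```python
import math

def project_to_pitch_circle(coarse_xy, point, fine_per_step=10, coarse_step=0.5):
    """Return the pitch class (st) of the closest point on the interpolated circle.

    coarse_xy     : list of (x, y) points, one per coarse pitch-class bin
    point         : (x, y) position of the decoded test tone
    fine_per_step : interpolation steps per coarse bin (10 gives 0.05 st resolution)
    """
    n = len(coarse_xy)
    best_pc, best_d2 = None, float("inf")
    for k in range(n):
        x0, y0 = coarse_xy[k]
        x1, y1 = coarse_xy[(k + 1) % n]        # wrap around the octave
        for j in range(fine_per_step):
            s = j / fine_per_step
            x = x0 + s * (x1 - x0)             # linear interpolation
            y = y0 + s * (y1 - y0)
            d2 = (x - point[0]) ** 2 + (y - point[1]) ** 2
            if d2 < best_d2:
                best_d2, best_pc = d2, (k + s) * coarse_step
    return best_pc
```

For an idealized, perfectly circular set of coarse points, a test point placed at the angle of 3.2 st is assigned a pitch class of 3.2 st; for real data the trajectory is irregular, which is why the nearest-point search is used instead of an angle computation.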
For the present purpose the decoder was not cross-validated within the Bias sequence data, because its purpose was to provide a reference for the ambiguous pair stimuli, which were not part of the training set.
Neural modeling
We used rate-based models of neural responses in the auditory cortex to investigate the link between the Bias-induced changes in response characteristic and the population decoding results. These are not trivially related, as different kinds of adaptation can lead to different - repulsive or attractive - effects [30,35]. Two types of models were investigated for this purpose:
(i) a non-dynamic tuning model, which serves to investigate generally the effect of different types of adaptation on the represented stimuli. This model is detailed in the Supplementary Methods and results are shown in Fig. S3.
(ii) a dynamic model, which serves to use the insights of the non-dynamic model to account in more detail for the neural data. We used the identical stimulus sequences and analyses as for the real data. The structure of the dynamic model corresponded to the non-dynamic model (c) (see Supplementary Methods), i.e. a distributed stimulus representation before cortex and local adaptation in the thalamo-cortical synapses (see Fig. 5A for a schematic representation of the model). A sampling rate of 20 Hz was used for the simulations to speed up computations. Stimuli were represented as spectrograms - i.e. time-frequency representations - with ‘frequency’ being encoded as Shepard tones, i.e. they ranged over one octave and wrapped at the spectral boundaries.
In the mid-level (e.g. MGB) neural representation of the stimulus, each cell’s response was modeled by a peak-normalized von Mises distribution for each time t of the filter, where φ denotes the stimulus, φi the mean of the distribution, i.e. the cell’s best pitch class, and σ the standard deviation, all in semitones. The maximal rate Rmax was arbitrarily set to 1, after normalizing the height of the distribution to 1 by dividing by its maximum. Hence, the responses on the mid-level Tj(S(t)) of each neuron j were modeled as a weighted average of the spectrogram at time t with the neuron’s tuning curve
On the top-level, corresponding to auditory cortex, the activity of each neuron was modeled as a spectrotemporal filter on the activity of the mid-level representation with local synaptic depression at the synapses
where the SSTRFi(τ, j) is the time-frequency filter for cortical neuron i, weighting the activity of the MGB neurons j at times τ=0…T before the current time. SSTRF stands for Shepard Spectro-Temporal Receptive Field, which is equivalent to a classical STRF [71], just for Shepard tones. The state of synaptic depression between cortical neuron i and thalamic neuron j is given by Ai,j(t). The adaptation was determined by the activity locally present at each synapse and thus led to relatively local changes in the postsynaptic tuning curves. The dynamics of Ai,j(t) are given by
where FA is a constant weighting factor, which scales the amount of adaptation. In both cases the response computed via the SSTRF is weighted with the adaptation coefficients APC/G, and each coefficient recovers by a fraction FR in each step (leading to exponential recovery).
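For orientation, one set of equations consistent with the verbal description of the dynamic model is sketched below. These are assumed forms, not the authors' equations: the symbols R_i (cortical response) and κ (concentration parameter), the small-σ relation between κ and σ, and the specific depression update are introduced here for illustration only.

```latex
\begin{align}
% peak-normalized von Mises tuning on the 12 st pitch circle (assumed form)
M(\varphi \mid \varphi_j, \sigma) &= \exp\!\Big(\kappa\,\big[\cos\!\big(\tfrac{2\pi}{12}(\varphi-\varphi_j)\big)-1\big]\Big),
  \qquad \kappa \approx \Big(\tfrac{12}{2\pi\sigma}\Big)^{2} \\
% mid-level response: tuning-weighted average of the spectrogram at time t
T_j(S(t)) &= \frac{\sum_{\varphi} M(\varphi \mid \varphi_j,\sigma)\, S(t,\varphi)}
                 {\sum_{\varphi} M(\varphi \mid \varphi_j,\sigma)} \\
% cortical response: SSTRF filter over depressed thalamo-cortical synapses
R_i(t) &= \sum_{\tau=0}^{T}\sum_{j} \mathrm{SSTRF}_i(\tau,j)\, A_{i,j}(t-\tau)\, T_j(S(t-\tau)) \\
% depression driven by local presynaptic activity, recovery by a fraction F_R per step
A_{i,j}(t+1) &= A_{i,j}(t) - F_A\, A_{i,j}(t)\, T_j(S(t)) + F_R\,\big(1 - A_{i,j}(t)\big)
\end{align}
```

With A_{i,j} = 1 for an unadapted synapse, the last line yields exponential recovery towards 1 in the absence of input, and a reduction that is confined to synapses whose presynaptic cell was driven by the Bias, which is the local character of the adaptation emphasized in the text.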
For the final simulations (Fig. 6), the model was extended to contain a subset of directional cells, by extending the dependence of the SSTRF by another 150 ms (3 timesteps at the SR). A directional preference was implemented by adding a von Mises distribution (see above for definition) at the time range 150-250 ms with a peak size of 0.25, roughly matching the observed peak-sizes in the SSTRFs of real directional cells. For downward preferring cells the center of the von Mises was placed relatively higher than the best semitone of the cell, and vice versa for upward preferring cells, in each case wrapping at the edges to account for the circularity of the Shepard tone response. The simulated population of 500 cells was split into one third non-directional cells, one third upward selective cells and one third downward selective cells.
Tuning Curve Adaptation Analysis
We estimated the biased Shepard tunings from the long stimulus sequences (see Acoustic Stimuli: Biased Shepard Tuning) by averaging the responses to the test stimuli for each location in the octave (see Fig. 4C, different colors indicate different locations of the Bias sequence). To get an estimate of the unadapted tuning curve, we collected the initial 5 stimuli from each condition and thus constructed a corresponding tuning curve at a resolution of 1 st. To evaluate the influence of the Bias, the local difference (Fig. 5D) and fraction (Fig. 5E) between the adapted and the unadapted tuning curve were analyzed. The same analysis was applied to model data generated from the identical stimuli using the same model as above (local adaptation, distributed input on the intermediate level, see Supplementary Methods and Fig. S3C).
Directionality Analysis
We investigated the effect of the Bias sequence on directionally selective cells. For this purpose, each cell’s directional selectivity was estimated from the steps contained in the biasing sequences. SSTRF were approximated by reverse correlating each neuron’s response with the Bias sequences of Shepard tones (using normalized linear regression, three examples are shown in Fig. 6B). The SSTRFs were discretized at 50 ms, and include only the bins during the stimulus, not during the pause.
First, directional preference was assessed by computing the asymmetry in response strength in the second time bin t2, centered on the maximal response in the first time bin, i.e.
Positive values of DI indicate upward-selective cells, negative values downward-selective cells. The SSTRF was first normalized to its maximal value to obtain comparable values between cells.
Second, a cell’s spectral location relative to the test stimulus was determined by computing the distance between a cell’s SSTRF center-of-mass and the pitch class of the test tone. These first two steps located a cell on the x- and y-axis of the following analysis (see Fig. 6C/D, top).
Third, the difference in response for identical test stimuli with different preceding Bias locations (relative to each tone in the pair) was computed (‘above’ - ‘below’).
Finally, these differences were averaged for all cells with a given directionality and pitch-class relative to the target tone.
This analysis was also applied to the second test stimulus, which means that each cell contributes to two locations, separated by the semi-octave distance between the two test tones. The two contributions are, however, based on different (later) responses of the cell. This analysis was conducted both for the actual neural data and for model data. These modeling results were obtained with the same model as above (local adaptation, distributed input on the intermediate level), although adaptation was set to 0 in one condition to demonstrate its role in generating the asymmetry of responses.
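The first step of this analysis, the directionality index, can be illustrated with a short Python sketch. Several choices here are assumptions rather than details taken from the source: the second time bin is taken to describe the earlier tone (further back in time), DI is computed as the mean SSTRF weight below the peak minus the mean weight above it, and the function name is hypothetical. The sign convention (positive for upward-selective cells) and the normalization to the maximal SSTRF value follow the text.

```python
def directionality_index(sstrf_t1, sstrf_t2):
    """Asymmetry of the second SSTRF time bin around the peak of the first bin.

    sstrf_t1, sstrf_t2 : SSTRF weights over circular pitch-class bins for the
                         first and second time bin (equal length).
    """
    n = len(sstrf_t1)
    peak = max(range(n), key=lambda k: sstrf_t1[k])     # best pitch-class bin
    norm = max(abs(v) for v in list(sstrf_t1) + list(sstrf_t2)) or 1.0
    half = n // 2
    # weights on either side of the peak, wrapping around the octave
    below = [sstrf_t2[(peak - d) % n] / norm for d in range(1, half)]
    above = [sstrf_t2[(peak + d) % n] / norm for d in range(1, half)]
    return (sum(below) - sum(above)) / len(below)

# toy example: excitation 2 st below the peak in the second time bin
t1 = [0.0] * 12
t1[6] = 1.0
t2_up = [0.0] * 12
t2_up[4] = 0.25
di_up = directionality_index(t1, t2_up)
```

In this toy example the cell responds best when a lower tone precedes its best pitch class, so `di_up` is positive; mirroring the second time bin around the peak flips the sign.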
Directional Decoder
The decoder above first estimated the neurally represented pitch class of the stimulus and then evaluated the circular distance between the pitch classes for a given ambiguous pair to predict the percept. This approach implicitly assumes that the neural system organizes the neural representations correspondingly and can compute distances in this way. A more general and direct way of assessing whether a given stimulus is relatively higher or lower in pitch than a preceding stimulus may be to integrate the responses of neurons with regard to their directional preference (see previous section). We refer to this as the directional hypothesis. Quantitatively, this approach to decoding simply takes the estimated direction selectivity DI of each cell, weighs it by the cell’s activity, and then sums across all cells. Analogously to decoding based on preferred pitch class, it thus assumes that a downstream decoder in the brain ‘knows’ about one or multiple characteristic properties of the cell (e.g. spectral and/or direction selectivity) and combines the activity of many cells in a weighted manner to arrive at a single estimate. We assume that this directionality is evaluated at the location of the current stimulus, i.e. the contribution of each cell is weighted by the distance between its preferred pitch class and the pitch class of the currently presented stimulus (see Fig. 6A for the mathematical structure of the decoding).
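The directional decoder described above amounts to a weighted sum, which the following Python sketch makes explicit. The circular weighting function and its width parameter `kappa` are assumptions chosen for illustration (the actual structure is given in Fig. 6A), and the function name is hypothetical.

```python
import math

def directional_decode(di, rates, best_pc, stim_pc, kappa=2.0):
    """Sum of directionality indices weighted by activity and spectral proximity.

    di      : directionality index of each cell (positive = upward selective)
    rates   : response of each cell to the current stimulus
    best_pc : preferred pitch class of each cell (st)
    stim_pc : pitch class of the current stimulus (st)
    kappa   : assumed concentration of the circular proximity weight
    """
    total = 0.0
    for d, r, pc in zip(di, rates, best_pc):
        dist = 2.0 * math.pi * (pc - stim_pc) / 12.0
        weight = math.exp(kappa * (math.cos(dist) - 1.0))   # 1 at the stimulus
        total += d * r * weight
    return total    # > 0 read out as 'ascending', < 0 as 'descending'
```

If the Bias has adapted the upward-selective cells near the stimulus more strongly than the downward-selective ones, the sum becomes negative and the readout is a descending step, in line with the mechanism proposed in the Results.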
Statistical Analysis
Non-parametric tests were used throughout the study to avoid assumptions regarding distributional shape. Single group medians were assessed with the Wilcoxon signed rank test, two group median comparisons with the Mann-Whitney U-test, multiple groups with the Kruskal-Wallis (one-way) and Friedman test (two-way), with post-hoc testing performed using Bonferroni-correction of p-values. All tests are implemented in the Matlab Statistics Toolbox (The Mathworks, Natick).
Acknowledgements
The authors would like to thank Barak Shechter, John Rinzel and Romain Brette for interesting discussions and comments on the manuscript. Funding information: European Research Council (Neume to SS); National Institutes of Health (to MH and SS: U01 AG058532). BE acknowledges funding from an NWO VIDI grant (016.VIDI.189.052) and a NWO ALW Open (ALWOP.146).
Supplementary Materials
Tuning Halfwidth
A neuron’s tuning halfwidth with respect to Shepard tones was estimated as the range of Shepard tones over which the firing rate exceeded f50% = (fMax - fMin)/2. We used a conservative estimation method by determining fMin and then computing the range between the closest crossings of f50% above and below fMin. In this way, neurons with a small difference between fMax and fMin were assigned comparatively large tuning halfwidths, corresponding to their less salient tuning.
Non-dynamic neural models
In the non-dynamic model each neuron is represented as a von Mises distribution
with two parameters, best pitch class φi and standard deviation σi, both measured in semitones, and Mtotal, normalizing the response to an area of 1. We simulate the response of a population of N=100 cortical neurons with φi equally spaced within [0,12] st. The models were run at the same sampling rate (20Hz) as the data analysis for consistency.
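As a rough illustration of such a population, the following Python sketch builds von Mises-shaped tuning curves over pitch class and normalizes each to unit area. The mapping from the standard deviation σi to the von Mises concentration (kappa = 1/σ² with σ in radians), the value σ = 1 st, the stimulus grid and the exclusion of the duplicate endpoint at 12 st are assumptions, not taken from the text.

```python
import numpy as np

PERIOD = 12.0  # semitones per octave; pitch class lives on [0, 12)

def von_mises_tuning(phi, best_pc, sigma_st):
    """Circular tuning curve over pitch class, normalized to unit area.

    Assumption: concentration kappa = 1 / sigma^2, sigma in radians.
    """
    to_rad = 2.0 * np.pi / PERIOD
    kappa = 1.0 / (sigma_st * to_rad) ** 2
    # Subtracting 1 inside the exponent keeps the values in (0, 1].
    raw = np.exp(kappa * (np.cos((phi - best_pc) * to_rad) - 1.0))
    d_phi = phi[1] - phi[0]
    m_total = raw.sum() * d_phi  # plays the role of M_total
    return raw / m_total

n_cells = 100
phi = np.linspace(0.0, PERIOD, 240, endpoint=False)          # stimulus grid
best_pcs = np.linspace(0.0, PERIOD, n_cells, endpoint=False)  # equally spaced
population = np.stack([von_mises_tuning(phi, b, 1.0) for b in best_pcs])
```

Each row of `population` is one cell's tuning curve; the area under every row is 1 on the chosen grid.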
The influence of the Bias is modeled assuming an idealized, continuous range of Biases, rather than individual tones. We consider three different models of adaptation: (a) local adaptation (Fig. S3 A), (b) global adaptation (Fig. S3 B), and (c) local adaptation with spread representation (Fig. S3 C):
a) Local adaptation refers to a multiplicative reduction of responses to individual stimuli, based on the local, recent stimulus history. The amount of local adaptation is taken as the prominence of this stimulus in the recent history, i.e.
where SBias(φ) is defined as a function over [0,12] st taking values in [0,1]. A0 is the maximal fraction of adaptation, set to 0.8 in Fig. S3. The cell's adapted (biased) response to a single Shepard tone is then given by
In the more general case of a complex stimulus S, one would replace Mi(S) with Mi(S) ∗ S, i.e. the convolution of response and stimulus distribution.
This form of local adaptation resembles a highly stimulus-specific version of adaptation. Hence, the responses are adapted only to previously presented stimuli, but no transfer to other stimuli occurs (see Fig. S3A). This type of local adaptation leads to no shift in decoding, since neurons uniformly reduce their response to the test stimulus, which keeps the mean of the population response the same.
b) Global adaptation refers to a multiplicative reduction of the entire tuning curve, based on the recent response history, irrespective of which stimulus caused it. The amount of global adaptation is computed as the correlation between a cell’s tuning curve and the stimulus history SBias(PC), i.e.
where * denotes convolution. By the normalization of both SBias and Ri, Ai will also be normalized within [0,A0]. The cell's biased tuning curve to a single Shepard tone is then given by
Global adaptation in this sense captures summarized adaptation effects that occur ‘globally’ for the postsynaptic cell (e.g. a change in excitability which changes the slope of the IF-curve) (see Fig. S3 B). Global adaptation shifts subsequent stimuli away from the adapting stimulus, since neurons close to the adaptor adapt more strongly (Series et al. 2009).
c) Local adaptation with input spread combines local adaptation with a distributed neural representation of pointlike stimuli (like a single tone or single Shepard tone), i.e. the stimulus is first represented on an intermediate level (e.g. the MGB) and then integrated on the cortical level, with adaptation occurring locally at the synapses connecting MGB-AI (see Fig. 5A for an illustration of the architecture). Concretely, the cortical response is given as
where the intermediate representation Tj(S) is given as
i.e. the convolution of the stimulus with the intermediate response properties Mj(φ), which are also assumed to be given by von Mises distributions. The adaptation Ai(φ) is equated with the midlevel activity induced by the bias
This form of local adaptation is ‘less local’ than the purely local adaptation described above. Hence, a presentation of a given stimulus will adapt the neural response not only to this stimulus, but - via the distributed representation - also to neighboring stimuli (see Fig. S3 C), which in the decoding leads to repulsive shifts, while reducing tuning curves locally. The resulting shape of the adaptation has been described as a shift in tuning curve combined with a global adaptation (Jin et al. 2005). We suggest that the form of adaptation proposed here provides a simpler explanation for this observed shape of tuning curve change.
Note that the differences in decoding emerge only at the boundaries of the Bias region, depicted by the encoding-decoding matrices in Fig. S3 A3/B3/C3. If the distribution at the vertical line (at 0) has more weight above 0 on the abscissa, this corresponds to a repulsive shift.
In summary, purely local adaptation can account for the local changes in Shepard tunings of the real data, but fails to explain the repulsive decoding (Fig. S3A). Global adaptation is consistent with the repulsive decoding results, but fails to explain the local tuning curve changes (Fig. S3B). The combination of local adaptation and distributed input on the intermediate level (Fig. S3C, Fig. 4) is consistent with both the encoding and decoding findings.
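The qualitative difference between the three adaptation models can be reproduced in a small Python sketch. It is a simplified stand-in for the models above, not the authors' code: the population-vector decoder, the concentration values `kappa`, the Bias extent (0 to 6 st), the mid-level-to-cortex weights `w` and the normalization of the adaptation strength by its maximum are all assumptions chosen for illustration. Only A0 = 0.8 is taken from the text.

```python
import numpy as np

PERIOD = 12.0                      # semitones per octave
TO_RAD = 2.0 * np.pi / PERIOD
A0 = 0.8                           # maximal fraction of adaptation (as in Fig. S3)

def vm(x, centers, kappa):
    """Von Mises-shaped kernel; rows index centers, columns index x."""
    return np.exp(kappa * (np.cos((x[None, :] - centers[:, None]) * TO_RAD) - 1.0))

def decode_pc(resp, best_pcs):
    """Population-vector (circular mean) estimate of pitch class in semitones."""
    z = np.sum(resp * np.exp(1j * best_pcs * TO_RAD))
    return (np.angle(z) / TO_RAD) % PERIOD

def shift(adapted, plain, best_pcs):
    """Decoded pitch class, adapted minus unadapted, wrapped to [-6, 6)."""
    d = decode_pc(adapted, best_pcs) - decode_pc(plain, best_pcs)
    return (d + PERIOD / 2.0) % PERIOD - PERIOD / 2.0

pcs = np.linspace(0.0, PERIOD, 120, endpoint=False)  # grid for cells and stimuli
s_bias = ((pcs >= 0.0) & (pcs < 6.0)).astype(float)  # idealized Bias, values in [0, 1]
test_idx = 55                                        # test tone at 5.5 st, near the Bias edge

# (a) Local adaptation: the gain depends only on the stimulus, equal for all cells.
m = vm(pcs, pcs, kappa=4.0)                          # rows: cells, columns: stimuli
m_local = m * (1.0 - A0 * s_bias)[None, :]

# (b) Global adaptation: one gain per cell, from the overlap of tuning and Bias.
overlap = m @ s_bias
m_global = m * (1.0 - A0 * overlap / overlap.max())[:, None]

# (c) Local adaptation with input spread: adaptation acts on mid-level inputs j.
t_mid = vm(pcs, pcs, kappa=8.0)                      # mid-level responses T_j
w = vm(pcs, pcs, kappa=8.0)                          # assumed mid-level to cortex weights
drive = t_mid @ s_bias                               # mid-level activity induced by the Bias
gain_mid = 1.0 - A0 * drive / drive.max()
r_plain = w @ t_mid
r_spread = w @ (gain_mid[:, None] * t_mid)

shift_local = shift(m_local[:, test_idx], m[:, test_idx], pcs)
shift_global = shift(m_global[:, test_idx], m[:, test_idx], pcs)
shift_spread = shift(r_spread[:, test_idx], r_plain[:, test_idx], pcs)
```

Under these assumptions, `shift_local` is zero up to rounding, whereas `shift_global` and `shift_spread` are positive, i.e. directed away from the center of the Bias region. In model (b) each cell's tuning curve is scaled by a single factor, while in model (c) the reduction of a given cell's response depends on the stimulus, so the tuning curve changes shape locally.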
References
- 1. Efficient coding of natural sounds. Nat Neurosci 5:356–363. https://doi.org/10.1038/nn831
- 2. Efficient auditory coding. Nature 439:978–982. https://doi.org/10.1038/nature04485
- 3. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci 2:79–87. https://doi.org/10.1038/4580
- 4. Bayesian Computation through Cortical Latent Dynamics. Neuron 103:934–947. https://doi.org/10.1016/j.neuron.2019.06.012
- 5. The effect of prediction accuracy on choice reaction time. Mem Cognit 17:503–508. https://doi.org/10.3758/bf03202624
- 6. Temporal predictability enhances auditory detection. J Acoust Soc Am 135:EL357–EL363. https://doi.org/10.1121/1.4879667
- 7. Bayesian inference and feedback in speech recognition. Lang Cogn Neurosci 31:4–18. https://doi.org/10.1080/23273798.2015.1081703
- 8. Diverse effects of stimulus history in waking mouse auditory cortex. J Neurophysiol 118:1376–1393. https://doi.org/10.1152/jn.00094.2017
- 9. Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press
- 10. Temporally nonadjacent nonlinguistic sounds affect speech categorization. Psychol Sci 16:305–312. https://doi.org/10.1111/j.0956-7976.2005.01532.x
- 11. Putting phonetic context effects into context: a commentary on Fowler (2006). Percept Psychophys 68:178–183. https://doi.org/10.3758/bf03193667
- 12. Processing of low-probability sounds by cortical neurons. Nat Neurosci 6:391–398. https://doi.org/10.1038/nn1032
- 13. Circularity in Judgments of Relative Pitch. J Acoust Soc Am 36. https://doi.org/10.1121/1.1919362
- 14. Spectral envelope and context effects in the tritone paradox. Perception 26:645–665. https://doi.org/10.1068/p260645
- 15. Perceptual hysteresis in the judgment of auditory pitch shift. Atten Percept Psychophys 76:1271–1279. https://doi.org/10.3758/s13414-014-0676-5
- 16. On the use of pitch-based features for fear emotion detection from speech. 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP). IEEE:1–6. https://doi.org/10.1109/ATSIP.2018.8364512
- 17. Investigating audiovisual integration of emotional signals in the human brain. Understanding Emotions. Elsevier:345–361. https://doi.org/10.1016/S0079-6123(06)56019-4
- 18. Prior context in audition informs binding and shapes simple features. Nat Commun 8. https://doi.org/10.1038/ncomms15027
- 19. Sequence sensitivity of neurons in cat primary auditory cortex. Cereb Cortex 10:1155–1167. https://doi.org/10.1093/cercor/10.12.1155
- 20. Processing of sound sequences in macaque auditory cortex: response enhancement. J Neurophysiol 82:1542–1559. https://doi.org/10.1152/jn.1999.82.3.1542
- 21. Efficiency and ambiguity in an adaptive neural code. Nature 412:787–792. https://doi.org/10.1038/35090500
- 22. Multiple time scales of adaptation in auditory cortex neurons. J Neurosci 24:10440–10453. https://doi.org/10.1523/JNEUROSCI.1905-04.2004
- 23. Visual adaptation: neural, psychological and computational aspects. Vision Res 47:3125–3131. https://doi.org/10.1016/j.visres.2007.08.023
- 24. Neural population coding of sound level adapts to stimulus statistics. Nat Neurosci 8:1684–1689. https://doi.org/10.1038/nn1541
- 25. Rapid neural adaptation to sound level statistics. J Neurosci 28:6430–6438. https://doi.org/10.1523/JNEUROSCI.0470-08.2008
- 26. Time course of dynamic range adaptation in the auditory nerve. J Neurophysiol 108:69–82. https://doi.org/10.1152/jn.00055.2012
- 27. Adaptation maintains population homeostasis in primary visual cortex. Nat Neurosci 16:724–729. https://doi.org/10.1038/nn.3382
- 28. Response adaptation to broadband sounds in primary auditory cortex of the awake ferret. Hear Res 221:91–103. https://doi.org/10.1016/j.heares.2006.08.002
- 29. Contrast gain control in auditory cortex. Neuron 70:1178–1191. https://doi.org/10.1016/j.neuron.2011.04.030
- 30. Is the homunculus “aware” of sensory adaptation? Neural Comput 21:3271–3304. https://doi.org/10.1162/neco.2009.09-08-869
- 31. Adaptation changes the direction tuning of macaque MT neurons. Nat Neurosci 7:764–772. https://doi.org/10.1038/nn1267
- 32. The asynchronous state’s relation to large-scale potentials in cortex. J Neurophysiol 122:2206–2219. https://doi.org/10.1152/jn.00013.2019
- 33. Stimulus-specific adaptation: can it be a neural correlate of behavioral habituation? J Neurosci 31:17811–17820. https://doi.org/10.1523/JNEUROSCI.4790-11.2011
- 34. Hinton GE. Visualizing High-Dimensional Data using t-SNE :2579–2605
- 35. Tilt aftereffect and adaptation-induced changes in orientation tuning in visual cortex. J Neurophysiol 94:4038–4050. https://doi.org/10.1152/jn.00571.2004
- 36. Neurons along the auditory pathway exhibit a hierarchical organization of prediction error. Nat Commun 8. https://doi.org/10.1038/s41467-017-02038-6
- 37. Time course of forward masking tuning curves in cat primary auditory cortex. J Neurophysiol 77:923–943. https://doi.org/10.1152/jn.1997.77.2.923
- 38. Changes in frequency discrimination caused by leading and trailing tones. J Acoust Soc Am 51:1947–1950. https://doi.org/10.1121/1.1913054
- 39. How recent history affects perception: the normative approach and its heuristic approximation. PLoS Comput Biol 8. https://doi.org/10.1371/journal.pcbi.1002731
- 40. Recalibration of the auditory continuity illusion: sensory and decisional effects. Hear Res 277:152–162. https://doi.org/10.1016/j.heares.2011.01.013
- 41. Synaptic mechanisms of forward suppression in rat auditory cortex. Neuron 47:437–445. https://doi.org/10.1016/j.neuron.2005.06.009
- 42. Paradoxes of musical pitch. Sci Am 267:88–95
- 43. Synaptic mechanisms of direction selectivity in primary auditory cortex. J Neurosci 30:1861–1868. https://doi.org/10.1523/JNEUROSCI.3088-09.2010
- 44. Topography and synaptic shaping of direction selectivity in primary auditory cortex. Nature 424:201–205. https://doi.org/10.1038/nature01796
- 45. The generation of direction selectivity in the auditory system. Neuron 73:1016–1027. https://doi.org/10.1016/j.neuron.2011.11.035
- 46. Organization of response areas in ferret primary auditory cortex. J Neurophysiol 69:367–383. https://doi.org/10.1152/jn.1993.69.2.367
- 47. Evidence for direction-specific channels in the processing of frequency modulation. J Acoust Soc Am 66:704–709. https://doi.org/10.1121/1.383220
- 48. Spectral-motion aftereffects and the tritone paradox among Canadian subjects. Percept Psychophys 60:209–220
- 49. A Musical Paradox. Music Percept 3:275–280. https://doi.org/10.2307/40285337
- 50. Adaptation, after-effect and contrast in the perception of tilted lines. I. Quantitative studies. J Exp Psychol 20:453–467. https://doi.org/10.1037/h0059826
- 51. Visual adaptation: physiology, mechanisms, and functional benefits. J Neurophysiol 97:3155–3164. https://doi.org/10.1152/jn.00086.2007
- 52. Fast population coding. Neural Comput 19:404–441. https://doi.org/10.1162/neco.2007.19.2.404
- 53. Novelty detector neurons in the mammalian auditory midbrain. Eur J Neurosci 22:2879–2885. https://doi.org/10.1111/j.1460-9568.2005.04472.x
- 54. Short-term plasticity in auditory cognition. Trends Neurosci 30:653–661. https://doi.org/10.1016/j.tins.2007.09.003
- 55. Spectrotemporal receptive fields in anesthetized cat primary auditory cortex are context dependent. Cereb Cortex 19:1448–1461. https://doi.org/10.1093/cercor/bhn184
- 56. Habituation produces frequency-specific plasticity of receptive fields in the auditory cortex. Behav Neurosci 105:416–430
- 57. Mismatch Negativity and Stimulus-Specific Adaptation in Animal Models. J Psychophysiol 21:214–223. https://doi.org/10.1027/0269-8803.21.34.214
- 58. Correlating stimulus-specific adaptation of cortical neurons and local field potentials in the awake rat. J Neurosci 29:13837–13849. https://doi.org/10.1523/JNEUROSCI.3475-09.2009
- 59. Stimulus-specific adaptation in the gerbil primary auditory thalamus is the result of a fast frequency-specific habituation and is regulated by the corticofugal system. J Neurosci 31:9708–9722. https://doi.org/10.1523/JNEUROSCI.5814-10.2011
- 60. Stimulus-specific adaptation in auditory cortex is an NMDA-independent process distinct from the sensory novelty encoded by the mismatch negativity. J Neurosci 30:16475–16484. https://doi.org/10.1523/JNEUROSCI.2793-10.2010
- 61. Stimulus-specific suppression preserves information in auditory short-term memory. Proc Natl Acad Sci USA 108:12961–12966. https://doi.org/10.1073/pnas.1102118108
- 62. Modulation of auditory cortex activation by sound presentation rate and attention. Hum Brain Mapp 26:94–99. https://doi.org/10.1002/hbm.20123
- 63. Perceptual Decision-Making as Probabilistic Inference by Neural Sampling. Neuron 90:649–660. https://doi.org/10.1016/j.neuron.2016.03.020
- 64. Inferring decoding strategies from choice probabilities in the presence of correlated variability. Nat Neurosci 16:235–242. https://doi.org/10.1038/nn.3309
- 65. Cortical Interneurons Differentially Shape Frequency Tuning following Adaptation. Cell Rep 21:878–890. https://doi.org/10.1016/j.celrep.2017.10.012
- 66. Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nat Neurosci 6:1216–1223. https://doi.org/10.1038/nn1141
- 67. MANTA--an open-source, high density electrophysiology recording suite for MATLAB. Front Neural Circuits 7. https://doi.org/10.3389/fncir.2013.00069
- 68. Reliability of synaptic transmission at the synapses of Held in vivo under acoustic stimulation. PLoS ONE 4. https://doi.org/10.1371/journal.pone.0007014
- 69. Denoising based on time-shift PCA. J Neurosci Methods 165:297–305. https://doi.org/10.1016/j.jneumeth.2007.06.003
- 70. Interaction between attention and bottom-up saliency mediates the representation of foreground and background in an auditory scene. PLoS Biol 7. https://doi.org/10.1371/journal.pbio.1000129
- 71. Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex. J Neurophysiol 85:1220–1234. https://doi.org/10.1152/jn.2001.85.3.1220
Copyright
© 2024, Englitz et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
Views, downloads and citations are aggregated across all versions of this paper published by eLife.