Hemodynamic activity reflects encoding of foregrounds and backgrounds.

(A) Stationarity for foregrounds (squares) and backgrounds (diamonds). (B) Sound presentation paradigm, with example cochleagrams. We created continuous streams by concatenating 9.6-second foreground (cold colors) and background (warm colors) segments following the illustrated design. Each foreground (resp. background) stream was presented in isolation and with two different background (resp. foreground) streams. (C) We measured cerebral blood volume (CBV) in coronal slices (blue plane) of the ferret auditory cortex (black outline) with functional ultrasound imaging. We imaged the whole auditory cortex in successive slices across several days. (D) Average changes in CBV (normalized to silent baseline) in auditory cortex, aligned to transitions between sound segments, for different conditions, averaged across all ferrets. The shaded area represents the standard error of the mean across sound segments. (E) Test-retest cross-correlation for each condition. Voxel responses to two repeats of the sounds are correlated at different lags; the resulting matrices are then averaged across all responsive voxels (ΔCBV > 2.5%).
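For concreteness, a minimal Python sketch of one plausible implementation of the test-retest cross-correlation in (E) follows. The exact construction of the lag matrices is not specified in the legend, so this reading (correlating the two repeats across sound segments at every pair of time points) is an assumption, as are all variable names, array shapes, and the synthetic data; only the 2.5% ΔCBV responsiveness threshold comes from the legend.

```python
import numpy as np

def test_retest_matrix(rep1, rep2):
    """One reading of panel E: correlate repeat 1 at each time point with
    repeat 2 at each time point, across sound segments, yielding a
    time-by-time (lag) correlation matrix for a single voxel."""
    z1 = (rep1 - rep1.mean(axis=0)) / rep1.std(axis=0)  # z-score per time bin
    z2 = (rep2 - rep2.mean(axis=0)) / rep2.std(axis=0)
    return z1.T @ z2 / rep1.shape[0]

# Hypothetical data: (voxels x segments x time) responses for two repeats.
rng = np.random.default_rng(0)
n_vox, n_seg, n_t = 40, 24, 30
reps1 = rng.standard_normal((n_vox, n_seg, n_t))
reps2 = reps1 + 0.5 * rng.standard_normal((n_vox, n_seg, n_t))
mean_dcbv = rng.uniform(0, 5, n_vox)  # placeholder mean response per voxel (%)

# Average the matrices across responsive voxels only (ΔCBV > 2.5%).
responsive = mean_dcbv > 2.5
avg_matrix = np.mean([test_retest_matrix(reps1[v], reps2[v])
                      for v in np.where(responsive)[0]], axis=0)
```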

Invariance to background sounds is hierarchically organized in ferret auditory cortex.

(A) Map of average response for an example hemisphere (ferret L). Responses are expressed as percent change in CBV relative to baseline activity measured during periods of silence. Values are averaged across depth to obtain this surface view of auditory cortex. (B) Map of test-retest reliability. In the following maps, only reliably responding voxels are displayed (test-retest reliability > 0.3 for at least one category of sounds), and the transparency of surface bins is determined by the number of (reliable) voxels included in the average. (C) Map of the regions of interest (ROIs) considered, based on anatomical landmarks. The arrows indicate the example slices shown in D (orange: primary; green: non-primary). (D) Responses to isolated and combined foregrounds. Bottom: Responses to mixtures and to foregrounds in isolation for example voxels (left: primary; right: non-primary). Each dot represents the voxel’s time-averaged response to one foreground (x-axis) and the corresponding mixture (y-axis), averaged across two repetitions. r indicates the Pearson correlation. Top: Maps show invariance, defined as the noise-corrected correlation between responses to mixtures and to foregrounds in isolation, for the example voxel’s slice, overlaid on anatomical images representing baseline CBV. Example voxels are marked with white squares. (E) Map of background invariance for the same hemisphere (see Figure S2 for other ferrets). (F) Quantification of background invariance for each ROI. The crosses (+) indicate median values across all voxels of each ROI, across animals. Grey dots represent median values across primary (MEG) and non-primary (dPEG + VP) voxels for each animal. The size of each dot is proportional to the number of voxels across which the median is taken. The thicker line corresponds to the example ferret L. ***: p <= 0.001 for comparisons of average background invariance between pairs of ROIs across animals, obtained by a permutation test of voxel ROI labels within each animal. (G-I) Same as D-F for foreground invariance (comparing mixtures to backgrounds in isolation). AEG, anterior ectosylvian gyrus; MEG, medial ectosylvian gyrus; dPEG, dorsal posterior ectosylvian gyrus; VP, ventral posterior auditory field.
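To make the invariance index in (D) concrete, here is a minimal Python sketch, assuming the noise correction divides the mixture-foreground correlation by the geometric mean of the two test-retest reliabilities (a common convention; the exact formula is not given in the legend). All names and the synthetic data are illustrative.

```python
import numpy as np

def pearson(a, b):
    return np.corrcoef(a, b)[0, 1]

def noise_corrected_invariance(mix1, mix2, fg1, fg2):
    """Correlation between time-averaged responses to mixtures and to the
    same foregrounds in isolation, divided by the geometric mean of the two
    test-retest reliabilities (assumed noise-correction formula).

    mix1, mix2 : responses to each mixture, repeats 1 and 2 (one per sound).
    fg1, fg2   : responses to each isolated foreground, repeats 1 and 2.
    """
    r = pearson((mix1 + mix2) / 2, (fg1 + fg2) / 2)  # mixture-foreground corr.
    r_mm = pearson(mix1, mix2)                       # mixture reliability
    r_ff = pearson(fg1, fg2)                         # foreground reliability
    return r / np.sqrt(max(r_mm, 1e-6) * max(r_ff, 1e-6))

# Hypothetical usage for one voxel with 30 foreground/mixture pairs.
rng = np.random.default_rng(1)
fg_true = rng.standard_normal(30)
mix_true = 0.8 * fg_true + 0.6 * rng.standard_normal(30)
fg1 = fg_true + 0.3 * rng.standard_normal(30)
fg2 = fg_true + 0.3 * rng.standard_normal(30)
mix1 = mix_true + 0.3 * rng.standard_normal(30)
mix2 = mix_true + 0.3 * rng.standard_normal(30)
print(noise_corrected_invariance(mix1, mix2, fg1, fg2))
```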

Simple spectrotemporal tuning explains spatial organization of background invariance.

(A) Schematic of the two-stage filter-bank (spectrotemporal) model. Cochleagrams (shown for an example foreground and background) are convolved with a bank of spectrotemporal modulation filters. (B) Energy of foregrounds and backgrounds in spectrotemporal modulation space, averaged across all frequency bins. (C) Average difference in energy between foregrounds and backgrounds in the full acoustic feature space (frequency × temporal modulation × spectral modulation). (D) We predicted time-averaged voxel responses from sound features derived from the spectrotemporal model in A using ridge regression. For each voxel, we thus obtain a set of weights for frequency and spectrotemporal modulation features, as well as cross-validated predicted responses to all sounds. (E) Average model weights for MEG. (F) Maps of preferred frequency, temporal modulation, and spectral modulation based on the fitted model. To calculate the preferred value for each feature, we marginalized the weight matrix over the two other dimensions. (G) Average weight differences between voxels of each non-primary region (dPEG and VP) and the primary region (MEG). (H) Background invariance (left) and foreground invariance (right) for voxels tuned to low (< 8 Hz) or high (> 8 Hz) temporal modulation rates within each ROI. ***: p <= 0.001 for comparing average background invariance across animals between voxels tuned to low vs. high rates, obtained by a permutation test of tuning labels within each animal.
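A compact Python sketch of the pipeline in (A), (D), and (F) follows, using toy Gabor-like modulation filters and scikit-learn's RidgeCV. Filter shapes, the rate/scale grids, and all data here are illustrative assumptions; note that RidgeCV cross-validates only the regularization strength, whereas the full analysis also cross-validates the predicted responses.

```python
import numpy as np
from scipy.signal import fftconvolve
from sklearn.linear_model import RidgeCV

# Hypothetical grids: frequency/time bins and modulation rates/scales.
n_freq, n_time = 32, 96
temp_rates = np.array([1, 2, 4, 8, 16, 32])   # temporal modulation (Hz)
spec_scales = np.array([0.25, 0.5, 1, 2])     # spectral modulation (cyc/oct)

def gabor_filter(rate, scale, size=9):
    """Toy 2-D Gabor standing in for a spectrotemporal modulation filter."""
    t = np.linspace(-1, 1, size)
    T, F = np.meshgrid(t, t)
    return (np.cos(2 * np.pi * (rate * T / temp_rates.max()
                                + scale * F / spec_scales.max()))
            * np.exp(-(T**2 + F**2) / 0.5))

def features(cochleagram):
    """Time-averaged modulation energy for every (rate, scale) filter,
    concatenated over frequency: one feature vector per sound."""
    feats = []
    for rate in temp_rates:
        for scale in spec_scales:
            energy = np.abs(fftconvolve(cochleagram, gabor_filter(rate, scale),
                                        mode='same'))
            feats.append(energy.mean(axis=1))   # average over time
    return np.concatenate(feats)

# Fit ridge regression from features to one voxel's time-averaged responses
# (synthetic placeholders here).
rng = np.random.default_rng(0)
sounds = [rng.standard_normal((n_freq, n_time)) for _ in range(40)]
X = np.stack([features(s) for s in sounds])
y = rng.standard_normal(len(sounds))
model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(X, y)

# Preferred temporal modulation (panel F): marginalize the weight matrix
# over the other two dimensions, then take the peak.
W = model.coef_.reshape(len(temp_rates), len(spec_scales), n_freq)
pref_rate = temp_rates[np.argmax(W.sum(axis=(1, 2)))]
```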

A model of auditory processing predicts hierarchical differences in ferret auditory cortex.

Same as in Figure 2, using cross-validated predictions from the spectrotemporal model. (A) Predicted responses to mixtures and foregrounds in isolation for example voxels (left: primary; right: non-primary). Each dot represents the voxel’s predicted response to one foreground (x-axis) and the corresponding mixture (y-axis). r indicates the Pearson correlation. Maps above show predicted invariance values for the example voxel’s slice, overlaid on anatomical images representing baseline CBV. Example voxels are marked with white squares. (B) Maps of predicted background invariance, defined as the correlation between predicted responses to mixtures and to foregrounds in isolation. (C) Binned scatter plot of predicted vs. measured background invariance across voxels. Each line corresponds to one animal, using 0.1-wide bins of measured invariance. (D) Predicted background invariance for each ROI. The crosses (+) indicate median values across all voxels of each ROI, across animals. Grey dots represent median values across primary (MEG) and non-primary (dPEG + VP) voxels for each animal. The size of each dot is proportional to the number of voxels across which the median is taken. The thicker line corresponds to the example ferret L. *: p <= 0.05; ***: p <= 0.001 for comparisons of average predicted background invariance between pairs of ROIs across animals, obtained by a permutation test of voxel ROI labels within each animal. (E-H) Same as A-D for predicted foreground invariance, i.e. comparing predicted responses to mixtures and to backgrounds in isolation.
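The cross-validated predictions and the predicted invariance of (A-B) can be sketched as follows, assuming one ridge model per voxel whose predictions for each sound come from folds that exclude that sound. Feature matrices, responses, and the regularization strength are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def cv_predict(X, y, alpha=1.0, n_splits=5):
    """Cross-validated predictions: each sound's response is predicted by a
    ridge model fit only on the remaining sounds."""
    yhat = np.empty_like(y, dtype=float)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train, test in kf.split(X):
        yhat[test] = Ridge(alpha=alpha).fit(X[train], y[train]).predict(X[test])
    return yhat

# Hypothetical features and responses for one voxel: the first n_sounds rows
# are the mixtures, the rest the matching foregrounds in isolation.
rng = np.random.default_rng(2)
n_sounds, n_feat = 30, 20
X_fg = rng.standard_normal((n_sounds, n_feat))
X_mix = X_fg + 0.4 * rng.standard_normal((n_sounds, n_feat))
w = rng.standard_normal(n_feat)
X = np.vstack([X_mix, X_fg])
y = X @ w + 0.2 * rng.standard_normal(2 * n_sounds)

# Predicted background invariance: correlation between the cross-validated
# predictions for mixtures and for the same foregrounds in isolation.
yhat = cv_predict(X, y)
pred_bg_inv = np.corrcoef(yhat[:n_sounds], yhat[n_sounds:])[0, 1]
```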

The spectrotemporal model is a poor predictor of human background invariance.

(A) We replicated our analyses on a dataset from a similar experiment measuring fMRI responses in human auditory cortex (Kell & McDermott, 2019). We compared responses in primary and non-primary auditory cortex, as delineated in Kell & McDermott (2019). (B) Responses to mixtures and foregrounds in isolation for example voxels (left: primary; right: non-primary). Each dot represents the voxel’s response to one foreground (x-axis) and the corresponding mixture (y-axis), averaged across repetitions. r indicates the Pearson correlation. (C-D) Quantification of background invariance for each ROI, for measured responses (C) and responses predicted from the spectrotemporal model (D). The crosses (+) indicate median values across all voxels of each ROI, across subjects. Grey dots represent median values for each ROI and subject. The size of each dot is proportional to the number of (reliable) voxels across which the median is taken. (E) Binned scatter plot of predicted vs. measured background invariance across voxels. Each line corresponds to one subject, using 0.1-wide bins of measured invariance. (F-I) Same as B-E for foreground invariance, i.e. comparing responses to mixtures and to backgrounds in isolation.
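The binned scatter plots in (E) and (I) reduce each subject's voxel-wise cloud to a single curve; a minimal sketch follows, with the 0.1 bin width taken from the legend and everything else (names, synthetic data) assumed.

```python
import numpy as np

def binned_curve(measured, predicted, width=0.1):
    """Mean predicted invariance within consecutive bins (default width 0.1)
    of measured invariance, collapsing one subject's voxel-wise scatter into
    a single curve."""
    edges = np.arange(-1.0, 1.0 + width, width)
    idx = np.digitize(measured, edges)
    xs, ys = [], []
    for b in np.unique(idx):
        sel = idx == b
        xs.append(measured[sel].mean())
        ys.append(predicted[sel].mean())
    return np.array(xs), np.array(ys)

# Hypothetical usage: one subject's measured and predicted invariance values.
rng = np.random.default_rng(3)
measured = np.clip(rng.normal(0.5, 0.3, 500), -1, 1)
predicted = 0.6 * measured + 0.2 * rng.standard_normal(500)
x, y = binned_curve(measured, predicted)
```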