Modeling Approach.

Participants were presented with an audiovisual movie. To create a design matrix for our modeling, we extracted the audio track of this movie and decomposed it into a spectrogram that represents the power spectral density of the audio signal. The spectrogram for a 5-minute segment of the movie is shown. A predicted timecourse is generated by taking the dot product of the spectrogram and a Gaussian function defined over log frequency (Hz), parameterised by its peak position (μ) and size (σ). A static nonlinearity (n) is then applied to the timecourse before convolution with a hemodynamic response function. Representative model fits, as well as the effect of varying each parameter on model predictions, are depicted in Supplementary Figure S1.
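The prediction steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the static nonlinearity is assumed to be a power-law exponent (as in compressive summation models), σ is assumed to be expressed in natural-log frequency units, the spectrogram is assumed to be already resampled to one row per fMRI volume, and the double-gamma hemodynamic response function is a generic placeholder shape. All function names are hypothetical.

```python
import math
import numpy as np


def gaussian_tuning(freqs_hz, mu_hz, sigma):
    """Gaussian defined over log frequency; sigma in natural-log units (assumption)."""
    log_f = np.log(freqs_hz)
    return np.exp(-0.5 * ((log_f - np.log(mu_hz)) / sigma) ** 2)


def double_gamma_hrf(tr, duration=32.0):
    """Generic double-gamma HRF sampled every `tr` seconds (placeholder shape, assumption)."""
    t = np.arange(0.0, duration, tr)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    hrf = peak - undershoot / 6.0
    return hrf / hrf.sum()


def predict_timecourse(spectrogram, freqs_hz, mu_hz, sigma, n, tr):
    """Spectrogram (time x frequency, non-negative power) -> predicted timecourse."""
    weights = gaussian_tuning(freqs_hz, mu_hz, sigma)
    drive = spectrogram @ weights          # dot product over the frequency axis
    drive = drive ** n                     # static nonlinearity (power law, assumption)
    hrf = double_gamma_hrf(tr)
    return np.convolve(drive, hrf)[: len(drive)]  # trim to the original length


# Usage with synthetic data (values are arbitrary, for illustration only).
rng = np.random.default_rng(0)
freqs = np.geomspace(100.0, 8000.0, 64)
spec = rng.random((100, 64))
pred = predict_timecourse(spec, freqs, mu_hz=1000.0, sigma=0.5, n=0.5, tr=1.0)
```

In a fitting procedure, μ, σ and n would be adjusted per vertex so that the predicted timecourse best matches the measured signal; the optimisation itself is not shown here.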

A) Preferred frequency (μ) of the tonotopic population of vertices designated for further analysis (see Methods), rendered onto flattened and semi-inflated representations of the cortical surface. The 2D colormap encodes μ along the x axis and generalisation performance (Xval R²) as transparency along the y axis. B) Zoomed-in views of the data from A. C) Difference in out-of-sample performance (Xval R² diff) between the speech-selective and CSS models. D) Category preferences (speech vs. nonspeech). E) The μ estimated by the CSS model, weighted by the model improvement offered by the speech-selective model. Primarily low frequencies near STG are visible, indicating that speech selectivity is observed at low-frequency portions of non-primary tonotopic maps.

Panels A-D depict functional parameters of the data derived from the CSS and speech-selective models. Panels E-H depict averaged statistics for myelo-architectural and structural parameters. I) Feature loadings onto each of the 3 principal components. J) The 3 components overlaid in RGB color space, as indicated by the color wheel. Panels K-M depict the individual components.
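One way to obtain feature loadings and an RGB overlay of three principal components is sketched below. The caption does not specify the preprocessing or the colour mapping, so the z-scoring of features and the per-component min-max rescaling to [0, 1] are assumptions, and the function names are hypothetical.

```python
import numpy as np


def pca_scores(features, n_components=3):
    """PCA via SVD on z-scored features (vertices x features)."""
    x = np.asarray(features, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0)      # z-scoring is an assumption
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[:n_components].T                # (features, components)
    scores = z @ loadings                         # (vertices, components)
    return scores, loadings


def scores_to_rgb(scores):
    """Rescale each component to [0, 1] so the three can be used as R, G, B (assumption)."""
    lo = scores.min(axis=0)
    hi = scores.max(axis=0)
    return (scores - lo) / (hi - lo)


# Usage with synthetic data: 200 vertices, 8 features (arbitrary values).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))
scores, loadings = pca_scores(features)
rgb = scores_to_rgb(scores)
```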

Panel A shows, for the left hemisphere, the ROI definitions (left), the μ data (middle) and the linear fit to the μ data per ROI (right). ROI colours correspond to the colours of the violin/marginal density plots in the remaining panels. Panel B shows the same for the right hemisphere. Panels C-F show the per-ROI distributions for out-of-sample variance explained by the CSS model (C), speech selectivity (D), FWHM (E) and n (F). Violins are normalised to have equal maximum width. The 5th, 50th and 95th percentiles of the distributions are demarcated by white horizontal lines. Panels G-J depict the between-ROI pairwise differences for each of the corresponding parameters in panels C-F. The values in each cell are signed log10-transformed p values. Larger values for a parameter in a reference ROI (y axis) compared to a comparison ROI (x axis) are assigned positive values (p values are -log10 transformed). Smaller values in the reference ROI are assigned negative values (p values are log10 transformed). p values that do not reach the false discovery rate corrected alpha level of .05 are marked by a cross in the corresponding cell. Panel K shows hexbin plots for the relationship between μ and FWHM in each ROI. The cells at the bottom left of each plot depict the same signed log10-transformed p values (now with positive values signifying positive relationships). Panel L shows hexbin plots for the relationship between μ and speech selectivity. The corresponding colorbar for all p values can be found at the bottom of the figure.
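The signed log10 transform and the significance marking described above can be illustrated as follows. This is a sketch only: the caption does not name the false discovery rate procedure, so the Benjamini-Hochberg step-up method is assumed here, and the function names are hypothetical.

```python
import numpy as np


def signed_log10_p(p_values, differences):
    """-log10(p) where reference > comparison, +log10(p) where reference < comparison.

    A difference of exactly zero maps to 0.
    """
    p = np.asarray(p_values, dtype=float)
    d = np.asarray(differences, dtype=float)
    return -np.sign(d) * np.log10(p)


def bh_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method); True = significant."""
    p = np.asarray(p_values, dtype=float).ravel()
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()   # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject


# Usage with made-up values: cells where `significant` is False would get a cross.
p_vals = np.array([0.001, 0.04, 0.5])
diffs = np.array([1.2, -0.3, 0.1])        # reference ROI minus comparison ROI
cell_values = signed_log10_p(p_vals, diffs)
significant = bh_reject(p_vals)
```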