Concepts, Stimuli and Recordings. (A) Conceptual definition of selective, preferred and shared neural processes.

Illustration of the continua between speech and music selectivity, speech and music preference, and shared resources. “Selective” responses are neural responses that are significant for one domain but not the other, with a significant difference between domains (top-left for speech; bottom-right for music). “Preferred” responses are neural responses that occur during both speech and music processing, but with a significantly stronger response for one domain than for the other (striped triangles). Finally, “shared” responses occur when there is a significant neural response to at least one of the two stimuli and no significant difference between domains (visible along the diagonal). If neither domain produces a significant neural response, the difference is not assessed (lower-left square). (B) Stimuli. Modulation spectrum of the acoustic temporal envelope of the continuous, 10-minute-long speech and music stimuli. (C) Anatomical localization of the sEEG electrodes for each patient (N=18). (D) Anatomical localization of the sEEG electrodes for each anatomical region. Abbreviations follow the Brainnetome Atlas (Fan et al., 2016).
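The decision rule behind the selective/preferred/shared partition can be sketched as a small function. This is an illustrative reconstruction from the caption only: the p-value inputs and the fixed α threshold are assumptions standing in for the study’s actual permutation statistics.

```python
def classify_response(p_speech, p_music, p_diff, alpha=0.01):
    """Classify one channel as 'selective', 'preferred', 'shared', or None.

    p_speech, p_music: p-values of the response vs. baseline in each domain.
    p_diff: p-value of the speech-vs-music contrast.
    Hypothetical sketch of the rule described in the caption, not the
    authors' implementation.
    """
    sig_speech = p_speech < alpha
    sig_music = p_music < alpha
    sig_diff = p_diff < alpha
    if not (sig_speech or sig_music):
        return None                # no response in either domain: not assessed
    if sig_diff and sig_speech != sig_music:
        return "selective"         # significant in exactly one domain
    if sig_diff:
        return "preferred"         # both respond, one significantly stronger
    return "shared"                # at least one response, no domain difference
```

For instance, a channel with a significant speech response, no music response, and a significant domain contrast would be classified as speech selective.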

Power spectrum analyses of activations (speech or music > tones).

(A)-(F) Neural responses to speech and/or music for the 6 canonical frequency bands. Only significant activations compared to the baseline condition (pure tones listening) are reported (see Figure S8 for non-thresholded results). Nested pie charts indicate: (1) in the center, the percentage of channels that showed a significant response to speech and/or music; (2) in the outer pie, the percentage of channels, relative to the center, classified as shared (white), selective (light gray), or preferred (dark gray); (3) in the inner pie, for the selective (plain) and preferred (patterned) categories, the proportion of channels that were (more) responsive to speech (red) or music (blue). Brain plots show the distribution of shared (white) and selective (red/blue) sEEG channels projected on the brain surface. Results are significant at α < 0.01 (N=18).

Power spectrum analyses of deactivations (speech or music < tones).

(A)-(F) Neural responses to speech and/or music for the 6 canonical frequency bands. Only significant deactivations compared to the baseline condition (pure tones listening) are reported. Same conventions as in Figure 2. Results are significant at α < 0.01 (N=18).

Cross-frequency channel selectivity for the power spectrum analyses.

Percentage of channels that selectively respond to speech (red) or music (blue) across different frequency bands. For each plot, the first (leftmost) value corresponds to the percentage (%) of channels displaying a selective response in a specific frequency band (either activation or deactivation, compared to the baseline condition of pure tones listening). At each subsequent value, we additionally remove the channels that are significantly responsive to the other domain in the next frequency band (e.g. in panel A: speech selective in delta; speech selective in delta and not music responsive in theta; speech selective in delta and not music responsive in theta or alpha; and so forth). The black dots at the bottom of the graph indicate which frequency bands were successively included in the analysis. Note that channels remaining selective across frequency bands did not necessarily respond selectively in every band; they simply never showed a significant response to the other domain in the other bands.
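The successive-exclusion procedure can be sketched with set differences. The function name and set-based inputs are hypothetical, introduced only to illustrate the logic described above.

```python
def surviving_selective(selective_in_band, responsive_other_domain):
    """Count channels that stay 'selective' as more bands are included.

    selective_in_band: set of channel ids selective in the starting band.
    responsive_other_domain: list of sets, one per successively added
    band, of channels significantly responsive to the OTHER domain.
    Illustrative sketch, not the authors' code.
    """
    current = set(selective_in_band)
    counts = [len(current)]
    for other in responsive_other_domain:
        current -= other           # drop channels responsive to the other domain
        counts.append(len(current))
    return counts

# e.g. 5 speech-selective channels in delta; 2 of them respond to music
# in theta, and one more responds to music in alpha
counts = surviving_selective({1, 2, 3, 4, 5}, [{2, 3}, {4}])
assert counts == [5, 3, 2]
```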

Population prevalence for the power spectral analyses of activations (speech or music > tones; N = 18).

(A)-(F) Population prevalence of shared or selective responses for the 6 canonical frequency bands, per anatomical region (note that preferred responses are excluded). Only significant activations compared to the baseline condition (pure tones listening) are reported. Regions (on the x-axis) were included in the analyses if they had been explored in at least 2 patients with at least 2 significant channels. Patients were considered to show regional selective processing when all of their channels in a given region responded selectively to either speech (red) or music (blue). When a region contained a mixture of channels with speech-selective, music-selective, or shared responses, the patient was considered to show shared (white) processing in that region. The height of each lollipop (y-axis) indicates the percentage of patients out of the total number of patients explored in that region; its size indicates the number of patients. As an example, in panel F (HFa band), most lollipops are white with a height of 100%, indicating that, in these regions, all patients presented a shared response profile. However, in the left inferior parietal lobule (IPL, Left), one patient (out of the 7 explored) shows speech-selective processing (filled red circle). A fully selective region would thus show a full-height lollipop of a single color across all frequency bands. Abbreviations follow the Brainnetome Atlas (Fan et al., 2016).
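The patient-level aggregation rule can be sketched as follows. Function names and the label vocabulary are hypothetical; only the all-channels-agree criterion comes from the caption.

```python
def regional_label(channel_labels):
    """Patient-level label for one anatomical region.

    channel_labels: per-channel labels for one patient in that region
    ('speech', 'music', or 'shared'). A patient counts as selective only
    if ALL channels agree on one domain; any mixture counts as shared.
    Sketch of the rule described in the caption.
    """
    labels = set(channel_labels)
    if labels == {"speech"}:
        return "speech"
    if labels == {"music"}:
        return "music"
    return "shared"

def prevalence(patients):
    """Percentage of explored patients per regional label.

    patients: list of per-patient channel-label lists for one region.
    """
    regional = [regional_label(ch) for ch in patients]
    n = len(regional)
    return {lab: 100 * regional.count(lab) / n
            for lab in ("speech", "music", "shared")}

# 7 patients explored; one fully speech-selective, six mixed or shared
pct = prevalence([["speech", "speech"]] + [["shared", "speech"]] * 6)
assert pct["speech"] == 100 / 7
```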

Population prevalence for the power spectral analyses of deactivations (speech or music < tones; N = 18).

(A)-(F) Population prevalence of shared or selective responses for the 6 canonical frequency bands, per anatomical region. Only significant deactivations compared to the baseline condition (pure tones listening) are reported. Same conventions as in Figure 5.

Temporal Response Function (TRF) analyses.

(A) Model comparison. At the top left, a toy model illustrates the use of Venn diagrams comparing the winning model (‘peakRate & LF’) to each of the three other models for speech and music (pooled). Four TRF models were investigated to quantify the encoding of the instantaneous envelope and of the discrete acoustic onset edges (peakRate) by either the low-frequency band (LF) or the high-frequency amplitude (HF). The ‘peakRate & LF’ model significantly captures the largest proportion of channels and is therefore considered the winning model. The percentages on top (in bold) indicate the percentage of total channels for which significant encoding was observed during speech and/or music listening in either of the two compared models. In each Venn diagram, we indicate, out of all significant channels, the percentage that responded in the winning model (left) or in the alternative model (right); the middle part indicates the percentage of channels shared between the winning and the alternative model (percentage not shown). q-values indicate pairwise model comparisons (Wilcoxon signed-rank test, FDR-corrected). (B) and (C) ‘peakRate & LF’ model: spatial distribution of sEEG channels wherein low-frequency (LF) neural activity significantly encodes the speech (red) and music (blue) peakRates. Higher prediction accuracy (Pearson’s r) is indicated by increasing color saturation. All results are significant at q < 0.01 (N=18). (D) Anatomical localization of the best-encoding channel within the left hemisphere for each patient (N=15), as estimated by the ‘peakRate & LF’ model (averaged across speech and music). These channels all lie within the auditory cortex and serve as seeds for the subsequent connectivity analyses. (E) TRFs averaged across the seed channels (N=15), for speech and music. (F) Prediction accuracy (Pearson’s r) of the neural activity of each seed channel, for speech and music.
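As a rough sketch of how a TRF can be estimated, the following fits a lagged ridge regression of a stimulus feature (e.g. a peakRate vector) onto a neural time series. This is a generic textbook formulation under assumed inputs, not the authors’ pipeline, which compares four feature/band combinations and reports cross-validated Pearson’s r.

```python
import numpy as np

def fit_trf(stimulus, neural, n_lags, ridge=1.0):
    """Estimate a Temporal Response Function by ridge regression.

    Regresses the neural signal on time-lagged copies of the stimulus
    feature; prediction accuracy would then be Pearson's r, ideally on
    held-out data. Minimal illustrative sketch.
    """
    n = len(stimulus)
    # design matrix of lagged stimulus values (lags 0 .. n_lags-1)
    X = np.stack([np.roll(stimulus, lag) for lag in range(n_lags)], axis=1)
    for lag in range(n_lags):          # zero out wrapped-around samples
        X[:lag, lag] = 0.0
    # closed-form ridge solution: w = (X'X + lambda*I)^-1 X'y
    w = np.linalg.solve(X.T @ X + ridge * np.eye(n_lags), X.T @ neural)
    return w, X @ w                    # TRF weights and predicted signal
```

On synthetic data where the neural signal is an exact lagged filtering of the stimulus, the recovered weights match the generating filter.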

Seed-based functional connectivity analyses.

(A)-(F) Significant coherence responses to speech and/or music for the 6 canonical frequency bands (see Figure S8 for non-thresholded results). The seed was located in the left auditory cortex (see Figure 7D). Same conventions as in Figure 2, except that the center percentage of the nested pie charts here reflects the percentage of channels significantly connected to the seed. Results are significant at α < 0.01 (N=15).
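A seed-based coherence analysis of this kind can be sketched with `scipy.signal.coherence`. The windowing parameters and band definition below are placeholder assumptions; the study’s statistical thresholding (α < 0.01) is not reproduced here.

```python
import numpy as np
from scipy.signal import coherence

def seed_coherence(seed, channels, fs, band):
    """Mean magnitude-squared coherence between a seed and each channel.

    seed: 1-D seed time series; channels: (n_channels, n_samples) array;
    band: (low, high) frequency range in Hz. Illustrative sketch of a
    seed-based connectivity measure, not the authors' exact analysis.
    """
    f, cxy = coherence(seed, channels, fs=fs, nperseg=fs)  # 1-s windows
    mask = (f >= band[0]) & (f <= band[1])
    return cxy[:, mask].mean(axis=1)   # one coherence value per channel
```

A channel that is a noisy copy of the seed shows high coherence, while an independent channel stays near the estimator’s bias floor.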

Cross-frequency channel selectivity for the connectivity analyses.

Percentage of channels showing selective coherence with the primary auditory cortex during speech (red) or music (blue) listening, across different frequency bands. Same conventions as in Figure 4.

Population prevalence for the connectivity analyses (N=15).

Same conventions as in Figure 5.

Power spectrum analyses of activations (speech or music > syllables).

(A)-(F) Neural responses to speech and/or music for the 6 canonical frequency bands. Only significant activations compared to the baseline condition (syllables listening) are reported. Same conventions as in Figure 2. Results are significant at α < 0.01 (N=18).

Power spectrum analyses of deactivations (speech or music < syllables).

(A)-(F) Neural responses to speech and/or music for the 6 canonical frequency bands. Only significant deactivations compared to the baseline condition (syllables listening) are reported. Same conventions as in Figure 2. Results are significant at α < 0.01 (N=18).

Cross-frequency channel selectivity for the power spectrum analyses (syllables baseline).

Percentage of channels that selectively respond to speech (red) or music (blue) across different frequency bands, compared to the baseline condition (syllable listening). Same conventions as in Figure 4.

Population prevalence for the power spectral analyses of activations (speech or music > syllables; N = 18).

(A)-(F) Population prevalence of shared or selective responses for the 6 canonical frequency bands, per anatomical region. Only significant activations compared to the baseline condition (syllables listening) are reported. Same conventions as in Figure 5.

Population prevalence for the power spectral analyses of deactivations (speech or music < syllables; N = 18).

(A)-(F) Population prevalence of shared or selective responses for the 6 canonical frequency bands, per anatomical region. Only significant deactivations compared to the baseline condition (syllables listening) are reported. Same conventions as in Figure 5.

Power spectrum analyses of activations (speech or music > tones) for each hemisphere separately.

Same conventions as in Figure 2. Results are significant at α < 0.01 (N=18).

Power spectrum analyses of deactivations (speech or music < tones) for each hemisphere separately.

Same conventions as in Figure 2. Results are significant at α < 0.01 (N=18).

Contrast between the neural responses to speech and music, for the 6 canonical frequency bands (tones baseline).

Distribution of non-thresholded t-values from the speech vs. music permutation test, projected on the brain surface. Only sEEG channels for which the response to speech and/or music is significantly stronger (left column) or weaker (right column) than in the baseline condition (pure tones listening) are shown. Channels with a significant speech vs. music contrast are circled in black; channels without a significant contrast are circled in white. Results are significant at α < 0.01 (N=18).
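A generic two-sample permutation contrast of this kind can be sketched as follows. The statistic (a Welch-style t) and the number of permutations are assumptions chosen for illustration, not the authors’ exact test.

```python
import numpy as np

def permutation_t(speech_trials, music_trials, n_perm=1000, seed=0):
    """Two-sample permutation test on trial-wise responses of one channel.

    Returns an observed t-like statistic (difference of means over pooled
    standard error) and a two-sided p-value from label shuffling.
    Hypothetical sketch of a speech-vs-music permutation contrast.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(speech_trials, dtype=float)
    b = np.asarray(music_trials, dtype=float)
    pooled = np.concatenate([a, b])
    n_a = len(a)

    def tstat(x, y):
        se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
        return (x.mean() - y.mean()) / se

    t_obs = tstat(a, b)
    null = np.empty(n_perm)
    for i in range(n_perm):            # shuffle condition labels
        perm = rng.permutation(pooled)
        null[i] = tstat(perm[:n_a], perm[n_a:])
    p = (np.abs(null) >= np.abs(t_obs)).mean()
    return t_obs, p
```

With a real mean difference between conditions the p-value is small; with identical inputs it is maximal.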

Contrast between the coherence responses to speech and music, for the 6 canonical frequency bands.

Distribution of non-thresholded differences in coherence values between speech and music, projected on the brain surface. Same conventions as in Figure S8. Results are significant at α < 0.01 (N=15).

Description of the patients.