Acoustic structure of the speech envelope across 17 languages.

a) Power spectrum between 2 and 12 Hz (top) and between 12 and 180 Hz (bottom) of the envelope of naturalistic, discourse-level speech, displayed for each language (gray lines) and averaged across languages (blue line). b) Significant phase-amplitude coupling (PAC) comodulogram estimated across languages (N = 17), showing coupling between the phase (2–12 Hz) and amplitude (20–150 Hz) of the speech envelope (p<.05, surrogate tests). c) Example of a speech envelope segment exhibiting strong PAC. From top to bottom: raw envelope, then the same signal filtered in the 100–150 Hz, 30–50 Hz, and 2–6 Hz bands. The dashed line indicates the peak of the slow-frequency cycle, aligned with increased amplitude in the higher-frequency bands. d) Correlation across languages between the median fundamental frequency (f0) and the amplitude frequency exhibiting maximal and significant PAC (between 80 and 150 Hz). e) Averaged time-frequency response of the speech envelope aligned to consonant (left) or vocalic (right) onsets, extracted from 9 languages. f) Maximum of 30–50 Hz activity (N=9 languages) after phoneme onset for consonants (blue) and vowels (yellow). Black lines represent individual languages (paired t-test, *** p<.001). g) Example phrase exhibiting strong 30–50 Hz activity after vocalic onset. From top to bottom: sentence in French, phonetic transcription with onset-aligned phonemes, waveform, envelope, and time-frequency representation of the envelope. Arrows indicate fluctuations in the envelope generating the 30–50 Hz activity. h) Significant PAC comodulogram of the speech envelope of the 10-minute French stories used during neural recordings, uttered by (left) a male voice (median f0: 139 Hz) and (right) a female voice (median f0: 189 Hz; p<.05, surrogate tests).
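The PAC estimate and surrogate test mentioned in panels (b) and (h) can be illustrated with a minimal sketch. It is not the authors' pipeline: the mean-vector-length PAC measure, the ideal FFT band-pass, the circular-shift surrogates, and the synthetic envelope are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def band_analytic(x, fs, lo, hi):
    """Analytic signal of x restricted to [lo, hi] Hz (ideal FFT band-pass)."""
    spec = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x), d=1.0 / fs)
    keep = (freqs >= lo) & (freqs <= hi)  # positive frequencies only
    return np.fft.ifft(2.0 * spec * keep)

def pac_with_surrogates(env, fs, phase_band, amp_band, n_surr=200, seed=0):
    """Mean-vector-length PAC and a surrogate p-value (assumed method)."""
    rng = np.random.default_rng(seed)
    phase = np.angle(band_analytic(env, fs, *phase_band))
    amp = np.abs(band_analytic(env, fs, *amp_band))
    unit = np.exp(1j * phase)
    observed = np.abs(np.mean(amp * unit))
    null = np.empty(n_surr)
    for k in range(n_surr):
        # Circularly shift the amplitude by at least 1 s to break the
        # phase-amplitude alignment while keeping both time series intact.
        shift = rng.integers(fs, len(env) - fs)
        null[k] = np.abs(np.mean(np.roll(amp, shift) * unit))
    p_value = (1 + np.sum(null >= observed)) / (1 + n_surr)
    return observed, p_value

# Synthetic "envelope" (illustrative only): a slow rhythm whose frequency
# jitters between 3 and 5 Hz, and a 40 Hz component whose amplitude
# follows the slow phase.
fs = 500                                   # sampling rate in Hz
n = 30 * fs                                # 30 s of signal
rng = np.random.default_rng(1)
t = np.arange(n) / fs
f_inst = np.repeat(rng.uniform(3.0, 5.0, size=60), n // 60)
slow_phase = 2 * np.pi * np.cumsum(f_inst) / fs
env = (np.cos(slow_phase)
       + 0.5 * (1 + np.cos(slow_phase)) * np.sin(2 * np.pi * 40 * t)
       + 0.1 * rng.standard_normal(n))

mvl, p = pac_with_surrogates(env, fs, (2, 6), (30, 50))
print(f"PAC (mean vector length) = {mvl:.3f}, surrogate p = {p:.3f}")
```

The slow rhythm is deliberately non-stationary: for a perfectly periodic signal, a circular shift leaves the mean vector length unchanged and the surrogate distribution would be uninformative. A comodulogram such as the one in panel (b) would repeat this computation over a grid of phase and amplitude bands.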

Identification of oscillatory and speech related neural sources.

a-e) Methods. a) Example of cerebral MRI scan (3D T1-weighted): axial cross section with reconstruction of the SEEG electrode position for patient #6. The location of each contact is represented with white rectangles. b) Example of monopolar recordings during resting state. c) Spatial profile of two SEEG-ICs (independent components) across the electrode shaft, representing their contribution to each contact. Both components have clear peaks (ICA weights) along the electrode, indicating the origin of the neural activity around these contacts. d) SEEG-IC traces during the same time period as panel (b). The first component (SEEG-IC 1) captures high-amplitude low-frequency oscillations from the contacts close to the surface, while SEEG-IC 2 is characterized by irregular activity, with no clear oscillatory pattern. e) Schematic of the approach to detect oscillations for SEEG-IC 1. The power spectrum is modeled as a combination of an aperiodic fit and oscillatory peaks. The relative power (yellow lines) of each oscillatory peak is measured as the difference between oscillatory (gray) and aperiodic (dashed) fits. f-g) Resting state. f) Relative power of all significant oscillatory peaks identified across patients during resting state. Points represent oscillatory peaks (N=507). g) Histogram of frequencies for all the oscillatory peaks in panel (f). h-m) Speech processing. h) Normalized power spectra (speech-rest)/rest, averaged across all components exhibiting significant effect (N=112). For all panels: bold lines represent power ratios significantly different from zero at the group level (p<.05, t-tests against zero, FDR-corrected). Error bars are s.e.m. i) Left: example of SEEG-IC with a significant Granger Causality (GC) from the speech envelope, but not in the opposite direction (black line represents statistical threshold: p=.05, surrogate tests). Right: Normalized power spectra across all components with a significant GC from the speech envelope (N=60). 
j) Left: example of one SEEG-IC with a significant phase-amplitude coupling (PAC) between the phase of theta and the gamma amplitude (p<.05, surrogate tests). Right: Normalized power spectra across all components with a significant theta-gamma PAC (N=28). k) Venn diagram of SEEG-ICs classified by their outcomes in the different analyses: non-responsive channels (N=193; in gray), significant power changes only (N=49; in blue), significant GC and power changes (N=35; in green), significant PAC, GC and power changes (N=25; in orange), and significant PAC and power changes (N=3; in orange). l) Histogram of oscillatory peaks during speech processing, for the different groups of SEEG-ICs identified in panel (k). Each group was considered independently, excluding those contained within smaller circles (e.g., the blue histogram represents the SEEG-ICs with significant power changes but no GC or PAC). m) Normalized power spectra for the different groups of SEEG-ICs. Aperiodic activity was corrected, with different fits between 1–28 Hz and 28–150 Hz (vertical black line).
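The peak-detection idea in panel (e), an aperiodic fit plus oscillatory peaks with relative power measured as the difference between the two, can be sketched as follows. This is a simplified stand-in and not the authors' algorithm: the straight-line fit in log-log space, the crude peak-rejection rule, the synthetic spectrum, and the function names are all assumptions.

```python
import numpy as np

def fit_aperiodic(freqs, power, n_iter=3):
    """Fit log10(power) = offset - exponent * log10(freq), ignoring peaks."""
    log_f, log_p = np.log10(freqs), np.log10(power)
    keep = np.ones(log_f.shape, dtype=bool)
    for _ in range(n_iter):
        slope, offset = np.polyfit(log_f[keep], log_p[keep], 1)
        resid = log_p - (offset + slope * log_f)
        # Crude peak rejection (assumption): drop bins that sit more than
        # one residual standard deviation above the current fit.
        keep = resid < resid.std()
    return offset, -slope

def strongest_peak(freqs, power, offset, exponent):
    """Frequency and relative power (log10 units) of the largest peak."""
    relative = np.log10(power) - (offset - exponent * np.log10(freqs))
    i = np.argmax(relative)
    return freqs[i], relative[i]

# Synthetic spectrum (illustrative only): a 1/f^1.5 background with a
# 10 Hz oscillatory peak, sampled between 1 and 28 Hz.
rng = np.random.default_rng(0)
freqs = np.arange(1.0, 28.25, 0.25)
log_power = (1.0 - 1.5 * np.log10(freqs)
             + 0.8 * np.exp(-(freqs - 10.0) ** 2 / (2 * 1.5 ** 2))
             + 0.02 * rng.standard_normal(freqs.size))
power = 10.0 ** log_power

offset, exponent = fit_aperiodic(freqs, power)
peak_f, peak_rel = strongest_peak(freqs, power, offset, exponent)
print(f"exponent = {exponent:.2f}, peak at {peak_f:.2f} Hz, "
      f"relative power = {peak_rel:.2f} (log10 units)")
```

Following panel (m), a broadband spectrum would call for separate fits below and above 28 Hz; the sketch covers only the lower range.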

Spectral GC and neural PAC profiles.

a) Representative example of the three types of outcomes observed in the spectral GC analysis between the speech envelope and neural activity: theta (2–6 Hz; in blue), gamma (low 30–50 Hz and/or high 100–150 Hz; in yellow), or theta and gamma (low and/or high; in green) significant GC. Shaded areas indicate the significance threshold (surrogate tests). b) Average PAC modulation spectrum for the three groups of SEEG-ICs: theta (left; N=10), gamma (middle; N=26) and theta-gamma (right; N=24). c) PAC between the theta (2–6 Hz) phase and low- (30–50 Hz) or high-gamma (100–150 Hz) for each SEEG-IC group (** p<.05, Tukey’s HSD post-hoc test). d) Position of the 60 SEEG-ICs from the three groups on a 3D surface of the temporal lobe. The location of each SEEG-IC is defined by the contact with the maximal contribution in the spatial IC profile.

Relation between speech-brain coupling and fundamental frequency.

a) Comparison of neural responses to male and female recordings with representative examples of the two types of outcomes observed in the spectral GC analysis between the speech envelope (left: male voice; right: female voice) and neural activity: theta (2–6 Hz; top; in blue) and gamma (low 30–50 Hz and/or high 100–150 Hz; bottom; in yellow). Shaded areas indicate the significance threshold (surrogate tests). b) PAC modulation spectrum estimated between the phase of the speech envelope (male: left, female: right) and the amplitude of the gamma SEEG-IC from panel (a) (p<.05, surrogate tests).

Directed speech-brain and brain-brain PAC.

a) Left: Averaged speech-brain cross-frequency directionality (CFD) comodulogram, estimated between the phase of the speech envelope and the amplitude of speech-driven gamma SEEG-ICs (N=50). Right: Average CFD between theta phase (2–6 Hz) and low- (30–50 Hz) or high-gamma (100–150 Hz) amplitude. b) Speech-brain CFD estimated between the phase of speech-driven theta SEEG-ICs (N=34) and the amplitude of the speech envelope. c) Inter-source CFD averaged across electrodes (N=15). For each electrode, a single CFD map was computed as the average of all its speech-driven pairs. d) Intra-source CFD averaged across electrodes (N=16). Points represent (a,b) individual SEEG-ICs or (c,d) electrodes. Positive CFD values indicate phase-to-amplitude directionality; negative values indicate the reverse. One-sample t-test against zero: ** p<.01, *** p<.001.

Diagram summarizing the relationship between the speech signal and early auditory cortex dynamics during speech perception.

The green box (Granger Causality) shows brain dynamics that mirror the speech envelope, linearly driven by it, indicated by directional arrows. In the purple box (Power), up/down vertical arrows reflect increases or decreases in power during speech perception compared to rest across all neural sources (SEEG-ICs). The blue box (Cross-Frequency Directionality, CFD) depicts internal connectivity between brain dynamics associated with phase–amplitude coupling (PAC).

Description of the speech corpus of 17 languages.

Anatomical location of the SEEG-ICs.

STG: Superior Temporal Gyrus; A22c: caudal area 22; A22r: rostral area 22; STS: Superior Temporal Sulcus; rpSTS: rostroposterior STS. Brainnetome atlas.

PAC of the speech envelope across 17 languages.

PAC comodulogram between the phase and the amplitude of the speech envelope of naturalistic, discourse-level speech (p<.05, surrogate tests).

PAC of the envelope of various naturalistic musical and environmental sounds.

PAC comodulogram between the phase and the amplitude of the envelope of (top) 10 musical genres and (bottom) 5 categories of environmental sounds.

Preferred phase of PAC.

Averaged normalized amplitude distribution across two 4 Hz cycles of the speech envelope of the French male voice. The black line represents a reference for the phase signal.

Non-normalized power spectrum in rest and speech.

Power spectra during rest and speech for all components with: (left) significant changes in power between conditions; (middle) significant GC from the speech envelope; (right) significant phase-amplitude coupling (PAC) between the phase of theta and the gamma amplitude.

Temporal lag between the speech envelope and auditory activity.

Delay estimated using cross-correlations between the speech envelope and speech-driven SEEG-ICs with spectral GC at theta (N=33 SEEG-ICs showing a significant cross-correlation with the speech envelope, see Methods), low-gamma (N=22), or high-gamma (N=29) frequencies (* p<.05, *** p<.001, Tukey’s HSD post-hoc test). Points represent individual SEEG-ICs. Box plots show the median (center line), interquartile range (box), and full data range excluding outliers (whiskers); outliers are marked with red crosses.
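The delay estimate can be illustrated with a minimal cross-correlation sketch. The lag window, the normalization, the synthetic signals, and the function name are assumptions; the significance testing mentioned in the caption is not reproduced here.

```python
import numpy as np

def xcorr_lag(envelope, neural, fs, max_lag_s=0.5):
    """Lag (in seconds) at which the neural signal best matches the envelope.

    A positive lag means the neural signal follows the envelope.
    """
    a = (envelope - envelope.mean()) / envelope.std()
    b = (neural - neural.mean()) / neural.std()
    n = len(a)
    max_lag = int(round(max_lag_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    cc = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            cc[i] = np.mean(a[:n - lag] * b[lag:])   # neural delayed by lag
        else:
            cc[i] = np.mean(a[-lag:] * b[:n + lag])  # neural leading by |lag|
    best = np.argmax(np.abs(cc))
    return lags[best] / fs, cc[best]

# Synthetic example (illustrative only): the "neural" trace is the
# envelope delayed by 30 ms plus noise.
fs = 1000
rng = np.random.default_rng(0)
envelope = np.convolve(rng.standard_normal(20 * fs), np.ones(20) / 20, mode="same")
neural = np.roll(envelope, 30) + 0.5 * envelope.std() * rng.standard_normal(envelope.size)

lag_s, peak_cc = xcorr_lag(envelope, neural, fs)
print(f"estimated delay = {lag_s * 1000:.0f} ms, peak correlation = {peak_cc:.2f}")
```

For band-specific delays such as the theta, low-gamma, and high-gamma groups in the caption, one plausible variant would band-pass both signals, or use band-limited amplitude envelopes, before computing the cross-correlation.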