Experimental design and stimulus characteristics.

(A) Experimental stimuli. Twenty gestures were paired with 20 semantically related speech stimuli. Two gating studies were conducted to define the minimal length of each gesture and speech stimulus required for semantic identification, namely the discrimination point (DP) of gesture and the identification point (IP) of speech. The mean gesture DP was 183.78 ms (SD = 84.82 ms), and the mean speech IP was 176.40 ms (SD = 66.21 ms). Speech onset was set at the gesture DP. Responses to each item were quantified with information-theoretic metrics, entropy and mutual information (MI), to characterize the information content of both gesture and speech during integration.

(B) Procedure of Experiment 1. HD-tDCS (Anodal, Cathodal, or Sham) was administered to the IFG or the pMTG using a 4 × 1 ring-based electrode montage. For IFG stimulation, electrode F7 was targeted, with return electrodes placed on AF7, FC5, F9, and FT9. For pMTG stimulation, TP7 was targeted, with return electrodes positioned on C5, P5, T9, and P9. Active sessions lasted 20 minutes with a 5-second fade-in and fade-out, whereas the Sham condition involved only 30 seconds of stimulation.

(C) Procedure of Experiment 2. Eight time windows (TWs, duration = 40 ms) were segmented relative to the speech IP. Of the eight TWs, five (TW1, TW2, TW3, TW6, and TW7) were selected based on the significant results of our previous study (ref. 28). Double-pulse TMS was delivered over either the pMTG or the IFG in each of the selected TWs.

(D) Procedure of Experiment 3. Semantically congruent gesture-speech pairs were presented in random order while the electroencephalogram (EEG) was recorded. Epochs were time-locked to speech onset and lasted 1000 ms. A 200 ms pre-stimulus baseline correction was applied before the onset of the gesture stroke. Distinct ERP components were hypothesized to be elicited.

(E-F) Proposed gradations in cortical engagement during gesture-speech information changes. Stepwise variations in the quantity of gesture and speech information during integration, as characterized by information-theoretic metrics (E), are hypothesized to be underpinned by progressive neural engagement within the IFG-pMTG gesture-speech integration circuit (F).

Quantification formulas (A) and the distribution of Shannon's entropy across stimuli (B).

Two separate pre-tests (N = 30) were conducted to assign a single verb describing each of the 20 isolated gestures and 20 speech items. Responses to each item were transformed into Shannon's entropy using the corresponding quantification formula (A). Gesture entropy (B, left) and speech entropy (B, right) quantify the randomness of the gestural or speech information, representing the uncertainty of the probabilistic representations activated when a specific stimulus occurs. Joint entropy (B, middle) captures the overall uncertainty of the two information sources combined. Mutual information (MI) was calculated as the sum of gesture entropy and speech entropy minus the joint entropy (A), thereby capturing the overlap between gesture and speech and representing semantic integration.
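Under these definitions, MI = H(gesture) + H(speech) − H(gesture, speech). A minimal sketch of this computation, assuming response data in the form of verb labels collected in a pre-test (the labels below are illustrative, not actual stimulus responses):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(X) = -sum p(x) * log2 p(x), with p(x)
    estimated from the relative frequency of each response label."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# Hypothetical pre-test responses (verbs named for one gesture-speech item)
gesture_resp = ["press", "press", "push", "press"]
speech_resp  = ["press", "press", "press", "tap"]

H_g = entropy(gesture_resp)                       # gesture entropy
H_s = entropy(speech_resp)                        # speech entropy
H_joint = entropy(list(zip(gesture_resp, speech_resp)))  # joint entropy

# MI = H(gesture) + H(speech) - H(gesture, speech)
MI = H_g + H_s - H_joint
```

Estimating joint entropy from paired (gesture, speech) response labels guarantees MI ≥ 0, with larger values indicating greater semantic overlap between the two channels.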

tDCS effect on semantic congruency.

(A) tDCS effect was defined as active-tDCS minus sham-tDCS. The semantic congruency effect was calculated as the reaction time (RT) difference between semantically incongruent and semantically congruent pairs.
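The two effect definitions above amount to a difference of differences. A minimal sketch with hypothetical mean RTs (the values below are illustrative, not experimental data):

```python
# Hypothetical mean reaction times in ms, per stimulation condition
rt = {"active": {"incongruent": 720.0, "congruent": 650.0},
      "sham":   {"incongruent": 700.0, "congruent": 660.0}}

def congruency_effect(cond):
    """Semantic congruency effect: RT(incongruent) - RT(congruent)."""
    return rt[cond]["incongruent"] - rt[cond]["congruent"]

# tDCS effect: active congruency effect minus sham congruency effect
tdcs_effect = congruency_effect("active") - congruency_effect("sham")
# active: 70 ms; sham: 40 ms; tDCS effect: 30 ms
```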

(B) Correlations of the tDCS effect on the semantic congruency effect with three information models (gesture entropy, speech entropy, and MI) are displayed with best-fitting regression lines. Significant correlations are marked in red. * p < 0.05, ** p < 0.01 after FDR correction.

TMS effect on semantic congruency.

(A) TMS effect was defined as active-TMS minus vertex-TMS. The semantic congruency effect was calculated as the reaction time (RT) difference between semantically incongruent and semantically congruent pairs.

(B) Correlations of the TMS effect on the semantic congruency effect with three information models (gesture entropy, speech entropy, and MI) are displayed with best-fitting regression lines. Significant correlations are marked in red. * p < 0.05, ** p < 0.01, *** p < 0.001 after FDR correction.

ERP results of gesture entropy (A), speech entropy (B), and MI (C).

Four ERP components were identified from grand-average ERPs at the Cz electrode by contrasting trials in the lower 50% (red lines) and the higher 50% (blue lines) of gesture entropy, speech entropy, or MI (Top panels). Clusters of adjacent time points and electrodes were then identified within each component using a cluster-based permutation test (Bottom right). Topographical maps depict amplitude differences between the lower and higher halves of each information model, with significant ROIs or electrode clusters highlighted in black. Solid rectangles delineate the ROIs exhibiting the maximal correlation and paired t-values (Bottom left). Paired t-test comparisons (with normal distribution curves) and correlations (with best-fitting regression lines) were calculated between the average ERP amplitude within the rectangular ROI (Bottom left) or the identified clusters (Bottom right) and each of the three information models. * p < 0.05, ** p < 0.01 after FDR correction.

Progressive processing stages of gesture–speech information within the pMTG-IFG loop.

Correlations of the TMS disruption effect over the pMTG and the IFG with the three information models are represented by the orange and green lines, respectively. Black lines denote the strongest correlations of ROI-averaged ERP components with the three information models. * p < 0.05, ** p < 0.01 after FDR correction.

Gesture description and pairing with incongruent and congruent speech.

Examples of 'an4 (press)' for the calculation of speech entropy, gesture entropy, and mutual information (MI).

Quantitative information for each stimulus.

Raw RTs for semantically congruent (Sc) and semantically incongruent (Si) pairs in Experiment 1 and Experiment 2.