Abstract
Semantic representation emerges from distributed multisensory modalities, yet a comprehensive understanding of how function changes within the convergence zones or hubs that integrate multisensory semantic information remains elusive. In this study, we quantified gesture and speech information, alongside their interaction, with the information-theoretic metrics of entropy and mutual information (MI). Neural activity was assessed via interruption effects induced by High-Definition transcranial direct current stimulation (HD-tDCS). Additionally, chronometric double-pulse transcranial magnetic stimulation (TMS) and high-temporal-resolution event-related potentials were used to decipher the dynamic neural changes driven by the various information contributors. Results showed gradual inhibition of both the inferior frontal gyrus (IFG) and the posterior middle temporal gyrus (pMTG) as the degree of gesture-speech integration, indexed by MI, increased. Moreover, a time-sensitive and staged progression of neural engagement was observed, evidenced by distinct correlations between neural activity patterns and the entropy measures of speech and gesture, as well as MI, across early sensory and lexico-semantic processing stages. These findings illuminate the gradual nature of neural activity during multisensory gesture-speech semantic processing, shaped by dynamic gesture constraints and speech encoding, thereby offering insights into the neural mechanisms underlying multisensory language processing.
Introduction
Semantic representation, distinguished by its cohesive conceptual nature, emerges from distributed modality-specific regions. Consensus acknowledges the presence of 'convergence zones' within the temporal and inferior parietal areas1, or the 'semantic hub' located in the anterior temporal lobe2, pivotal for integrating, converging, or distilling multimodal inputs. Contemporary perspectives on semantic processing portray it as a sequence of quantitatively functional mental states defined by a specific parser3, unified by statistical regularities among multiple sensory inputs4 through hierarchical prediction and multimodal interactions5–9. Hence, proposals suggest that the coherent semantic representation emerges from statistical learning mechanisms within these 'convergence zones' or 'semantic hub'10–12, potentially functioning in a graded manner12,13. However, the exact nature of the graded structure within these integration hubs, along with their temporal dynamics, remains elusive.
Among the many kinds of multimodal extralinguistic information, representational gesture is the one most closely related to the semantic content of co-occurring speech14,15. Representational gesture is regarded as 'part of language'16 or as a functional equivalent of lexical units that alternates and integrates with speech in a 'single unification space' to convey a coherent meaning17–19. Empirical studies have investigated the semantic integration between representational gesture (hereafter, gesture) and speech by manipulating their semantic relationship20–23 and have revealed a mutual interaction between them24–26, as reflected by the N400 latency and amplitude19 as well as by common neural underpinnings in the left inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG)20,27,28. By quantifying the amount of information from both sources and their interaction, the present study delved into cortical engagement and temporal dynamics during multisensory gesture-speech integration, with a specific focus on the IFG and pMTG, alongside various ERP components.
To this end, we developed an analytic approach to directly probe the contribution of gesture and speech during multisensory semantic integration, adopting the information-theoretic complexity metrics of entropy and mutual information (MI). Entropy captures the disorder or randomness of information and is used as a measure of the uncertainty of the representation activated when an event occurs29. MI quantifies the mutual constraint that two variables impose on each other30. Herein, during gesture-speech integration, entropy measures the uncertainty of gesture or speech information, while MI indexes the degree of integration.
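For reference, these metrics instantiate the standard Shannon definitions; writing $p(x_i)$ for the probability of response $x_i$,

$$H(X) = -\sum_i p(x_i)\,\log_2 p(x_i), \qquad MI(X;Y) = H(X) + H(Y) - H(X,Y),$$

where $H(X,Y)$ is the joint entropy of the two response distributions. MI is therefore large precisely when the gesture and speech response distributions overlap substantially (see Materials and methods).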
Three experiments were conducted to unravel the intricate neural processes underlying gesture-speech semantic integration. In Experiment 1, High-Definition Transcranial Direct Current Stimulation (HD-tDCS) was used to administer Anodal, Cathodal, and Sham stimulation to either the IFG or the pMTG. HD-tDCS induces membrane depolarization with anodal stimulation and membrane hyperpolarization with cathodal stimulation31, thereby respectively increasing or decreasing cortical excitability in the targeted brain area. Hence, Experiment 1 aimed to determine whether the facilitation effect (Anodal-tDCS minus Sham-tDCS) and/or the inhibitory effect (Cathodal-tDCS minus Sham-tDCS) on the integration hubs of the IFG and/or pMTG was modulated by the degree of gesture-speech integration, indexed with MI. Considering the different roles of the IFG and pMTG during integration28, as well as the various ERP components reported in prior investigations, such as the early sensory P1 and N1–P2 effects33,34, the N400 semantic conflict effect19,34,35, and the late positive component (LPC) reconstruction effect36,37, Experiment 2 employed chronometric double-pulse transcranial magnetic stimulation (TMS) to target short time windows along the gesture-speech integration period32. In parallel, Experiment 3 used high-temporal-resolution event-related potentials to explore whether the various neural engagements were temporally and progressively modulated by distinct information contributors during gesture-speech integration.
Material and methods
Participants
Ninety-eight young Chinese participants signed written informed consent forms and took part in the present study (Experiment 1: 29 females, 23 males, age = 20 ± 3.40 years; Experiment 2: 11 females, 13 males, age = 23 ± 4.88 years; Experiment 3: 12 females, 10 males, age = 21 ± 3.53 years). All of the participants were right-handed (Experiment 1: laterality quotient (LQ)38 = 88.71 ± 13.14; Experiment 2: LQ = 89.02 ± 13.25; Experiment 3: LQ = 88.49 ± 12.65), had normal or corrected-to-normal vision and were paid ¥100 per hour for their participation. All experiments were approved by the Ethics Committee of the Institute of Psychology, Chinese Academy of Sciences.
Stimuli
Twenty gestures (Appendix Table 1) paired with 20 semantically congruent speech signals, taken from a previous study28, were used. The stimulus set was recorded from two native Chinese speakers (1 male, 1 female) and validated by replicating the semantic congruency effect with 30 participants. Results showed significantly (t(29) = 7.16, p < 0.001) longer reaction times when participants were asked to judge the gender of the speaker if the gesture carried semantic information incongruent with the speech (a 'cut' gesture paired with the spoken word '喷 pen1 (spray)': mean = 554.51 ms, SE = 11.65) than when gesture and speech were semantically congruent (a 'cut' gesture paired with the word '剪 jian3 (cut)': mean = 533.90 ms, SE = 12.02)28.
Additionally, two separate pre-tests with 30 subjects each (pre-test 1: 16 females, 14 males, age = 24 ± 4.37 years; pre-test 2: 15 females, 15 males, age = 22 ± 3.26 years) were conducted to determine when gesture and speech could be comprehended. Participants were presented with segments of increasing duration, beginning at 40 ms, and were prompted to provide a single verb describing either the isolated gesture they observed (pre-test 1) or the isolated speech they heard (pre-test 2). For each pre-test, the response consistently given by a participant across four to six consecutive segments was taken as the comprehension answer for that gesture or speech. The duration of the first such segment was marked as the discrimination point (DP) for gesture (mean = 183.78 ± 84.82 ms) or the identification point (IP) for speech (mean = 176.40 ± 66.21 ms) (Figure 1A top).
To quantify information content, responses for each item were converted into Shannon's entropy (H) as a measure of information richness (Figure 1A bottom). As no significant gender differences were observed for either gesture (t(20) = 0.21, p = 0.84) or speech (t(20) = 0.52, p = 0.61), responses were aggregated across genders, resulting in 60 answers per item (Appendix Table 2). Here, p(xi) and p(yi) represent the distribution of the 60 answers for a given gesture (Appendix Table 2B) and speech (Appendix Table 2A), respectively. High entropy indicates diverse answers, reflecting a broad representation, while low entropy suggests focused lexical recognition of a specific item (Figure 2B). The joint entropy of gesture and speech, represented by H(xi, yi), was computed by amalgamating the gesture and speech response datasets to depict their combined distribution. For a given gesture-speech combination, equivalence between the joint entropy and the sum of the individual gesture and speech entropies indicates an absence of overlap in the response sets. Conversely, substantial overlap, i.e., a considerable number of shared responses between the gesture and speech datasets, produces a noticeable discrepancy between the joint entropy and the sum of the gesture and speech entropies. This gesture-speech overlap was therefore quantified as the difference between the combined entropies of gesture and speech and their joint entropy, i.e., the Mutual Information (MI) (see Appendix Table 2C). Elevated MI values thus signify substantial overlap, indicative of a robust mutual interaction between gesture and speech. The quantitative information for each stimulus, including gesture entropy, speech entropy, joint entropy, and MI, is displayed in Appendix Table 3.
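A minimal Python sketch of this computation is given below. It assumes paired gesture and speech answers and applies the standard relation MI = H(gesture) + H(speech) − H(gesture, speech); the toy response lists and function names are illustrative only, and the exact amalgamation procedure used for the joint distribution in the study may differ.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of categorical answers."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def mutual_information(gesture_answers, speech_answers):
    """MI = H(gesture) + H(speech) - H(gesture, speech) over paired answers."""
    joint = list(zip(gesture_answers, speech_answers))
    return entropy(gesture_answers) + entropy(speech_answers) - entropy(joint)

# Toy example: verbs given for one gesture and for its accompanying speech.
gesture_answers = ["cut", "cut", "trim", "cut", "snip", "cut"]
speech_answers  = ["cut", "cut", "cut", "trim", "cut", "cut"]

print(f"gesture H = {entropy(gesture_answers):.2f} bits")
print(f"speech  H = {entropy(speech_answers):.2f} bits")
print(f"MI        = {mutual_information(gesture_answers, speech_answers):.2f} bits")
```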
To accurately assess whether entropy/MI corresponds to graded neural changes, the current study aggregated neural responses (non-invasive brain stimulation (NIBS) inhibition effect or ERP amplitude) across items with identical entropy or MI values prior to the correlational analyses.
Experimental procedure
Adopting a semantic priming paradigm of gestures onto speech16,32, speech onset was set to be at the DP of each accompanying gesture. An irrelevant factor of gender congruency (e.g., a man making a gesture combined with a female voice) was created27,28,39. This involved aligning the gender of the voice with the corresponding gender of the gesture in either a congruent (e.g., male voice paired with a male gesture) or incongruent (e.g., male voice paired with a female gesture) manner. This approach served as a direct control mechanism, facilitating the investigation of the automatic and implicit semantic interplay between gesture and speech39. In light of previous findings indicating a distinct TMS-disruption effect on the semantic congruency of gesture-speech interactions28, both semantically congruent and incongruent pairs were included in Experiment 1 and Experiment 2. Experiment 3, conversely, exclusively utilized semantically congruent pairs to elucidate ERP metrics indicative of nuanced semantic progression.
Gesture–speech pairs were presented randomly using Presentation software (www.neurobs.com). Experiment 1 comprised 480 gesture-speech pairs delivered in three separate sessions spaced one week apart; in each session, participants received one of three stimulation types (Anodal, Cathodal, or Sham). Experiment 2 consisted of 800 pairs and was conducted across 15 blocks over three days, with one week between sessions; the order of stimulation site and time window (TW) was counterbalanced using a Latin square design. Experiment 3, comprising 80 gesture-speech pairs, was completed in a single session on one day. Participants were asked to watch the screen but to respond with both hands, as quickly and accurately as possible, only to the gender of the voice they heard. The reaction time (RT) and the button pressed were recorded. Each trial started with a fixation cross presented at the center of the screen for 0.5-1.5 s.
Experiment 1: HD-tDCS protocol and data analysis
The HD-tDCS protocol employed a constant current stimulator (the Starstim 8 system) delivering stimulation at an intensity of 2 mA. A 4 × 1 ring-based electrode montage was used, comprising a central (stimulation) electrode positioned directly over the target cortical area and four return electrodes encircling it to provide focal stimulation. For targeting the left IFG at Montreal Neurological Institute (MNI) coordinates (-62, 16, 22), electrode F7 was selected as the optimal cortical projection site40, with the four return electrodes placed on AF7, FC5, F9, and FT9. For stimulation of the pMTG at coordinates (-50, -56, 10), TP7 was identified as the cortical projection site40, with return electrodes positioned on C5, P5, T9, and P9. Stimulation lasted 20 minutes with a 5-second fade-in and fade-out for both the Anodal and Cathodal conditions. The Sham condition involved a 5-second fade-in, 30 seconds of stimulation, 19 minutes and 20 seconds of no stimulation, and finally a 5-second fade-out (Figure 1B). Stimulation was controlled using NIC software, with participants blinded to the stimulation conditions.
All incorrect responses (702 out of 24,960 trials, 2.81%) were excluded. To eliminate the influence of outliers, a 2 SD trimmed mean was calculated for every participant in each session. Our analysis focused on Pearson correlations between the interruption effect of HD-tDCS (active tDCS minus sham tDCS) on the semantic congruency effect (the difference in reaction time between semantically incongruent and semantically congruent pairs) and gesture entropy, speech entropy, or MI. This approach tests whether neural activity within the left IFG and pMTG is gradually affected by varying levels of gesture and speech information during integration, as quantified by entropy and MI.
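A compact sketch of this analysis logic, in Python, is shown below. The column names, placeholder values, and data layout are hypothetical; the essential steps are the item-level congruency effect, the active-minus-sham interruption effect, the aggregation of items sharing an MI value, and the Pearson correlation.

```python
import pandas as pd
from scipy.stats import pearsonr

# Toy item-level data (placeholder values): per item, mean RT (ms) for congruent
# and incongruent pairs under Sham and Cathodal tDCS, plus the item's MI (bits).
df = pd.DataFrame({
    "mi":                 [0.8, 1.2, 1.2, 1.9, 2.4],
    "rt_incong_sham":     [560, 555, 548, 552, 565],
    "rt_cong_sham":       [535, 538, 530, 541, 543],
    "rt_incong_cathodal": [552, 549, 543, 544, 551],
    "rt_cong_cathodal":   [537, 540, 533, 540, 544],
})

# Semantic congruency effect (incongruent minus congruent RT) per condition.
df["congruency_sham"] = df["rt_incong_sham"] - df["rt_cong_sham"]
df["congruency_cathodal"] = df["rt_incong_cathodal"] - df["rt_cong_cathodal"]

# HD-tDCS interruption effect: active (Cathodal) minus Sham.
df["cathodal_effect"] = df["congruency_cathodal"] - df["congruency_sham"]

# Aggregate items sharing the same MI value before correlating (see above).
agg = df.groupby("mi", as_index=False)["cathodal_effect"].mean()

r, p = pearsonr(agg["mi"], agg["cathodal_effect"])
print(f"r = {r:.3f}, p = {p:.3f}")
```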
Experiment 2: TMS protocol and data analysis
At an intensity of 50% of the maximum stimulator output, double-pulse TMS was delivered via a 70 mm figure-eight coil using a Magstim Rapid² stimulator (Magstim, UK) over either the left IFG in TW3 (-40∼0 ms relative to the speech identification point (IP)) and TW6 (80∼120 ms), or the left pMTG in TW1 (-120∼-80 ms), TW2 (-80∼-40 ms), and TW7 (120∼160 ms). Among the TWs covering the period of gesture-speech integration, those that had shown a TW-selective disruption of gesture-speech integration were selected28 (Figure 1C).
High-resolution (1 × 1 × 0.6 mm) T1-weighted MRI scans were obtained using a Siemens 3T Trio/Tim scanner for image-guided TMS navigation. Frameless stereotaxic procedures (BrainSight 2; Rogue Research) allowed real-time monitoring of the stimulation. To ensure precision, individual anatomical images were manually registered by identifying the anterior and posterior commissures, and subject-specific target regions were defined using trajectory markers in the MNI coordinate system. The vertex served as the control site.
All incorrect responses (922 out of 19,200 trials, 4.8%) were excluded. We focused our analysis on Pearson correlations between the TMS interruption effect (active TMS minus vertex TMS) on the semantic congruency effect and gesture entropy, speech entropy, or MI. This allowed us to determine how the time-sensitive contributions of the left IFG and pMTG to gesture-speech integration were affected by the distribution of gesture and speech information. FDR correction was applied for multiple comparisons.
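For illustration, the sketch below applies Benjamini-Hochberg FDR correction across a family of such correlations using SciPy and statsmodels. The site/TW/metric keys and the random placeholder arrays are hypothetical stand-ins for the real item-level TMS effects and information measures.

```python
import numpy as np
from scipy.stats import pearsonr
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

# Hypothetical per-item values: (information measure, TMS interruption effect)
# for each tested (site, TW, metric) combination.
analyses = {
    ("pMTG", "TW2", "speech_entropy"): (rng.normal(size=11), rng.normal(size=11)),
    ("IFG",  "TW6", "gesture_entropy"): (rng.normal(size=16), rng.normal(size=16)),
    ("IFG",  "TW6", "MI"):              (rng.normal(size=18), rng.normal(size=18)),
}

keys, p_vals = [], []
for key, (metric_values, tms_effect) in analyses.items():
    r, p = pearsonr(metric_values, tms_effect)
    keys.append(key)
    p_vals.append(p)

# Benjamini-Hochberg FDR correction across all tested correlations.
reject, p_adj, _, _ = multipletests(p_vals, alpha=0.05, method="fdr_bh")
for key, rej, p in zip(keys, reject, p_adj):
    print(key, "significant" if rej else "n.s.", f"p_FDR = {p:.3f}")
```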
Experiment 3: Electroencephalogram (EEG) recording and data analysis
EEG was recorded from 48 Ag/AgCl electrodes mounted in a cap according to the 10-20 system41, amplified with a PORTI-32/MREFA amplifier (TMS International B.V., Enschede, NL), and digitized online at 500 Hz (bandpass 0.01-70 Hz). EEGLAB, a MATLAB toolbox, was used to analyze the EEG data42. Vertical and horizontal eye movements were measured with four electrodes placed above the left eyebrow, below the left orbital ridge, and at the bilateral external canthi. All electrodes were referenced online to the left mastoid and re-referenced offline to the average of the left and right mastoids. Electrode impedance was kept below 5 kΩ. A high-pass filter with a cutoff of 0.05 Hz and a low-pass filter with a cutoff of 30 Hz were applied. Semi-automated artifact removal, including independent component analysis (ICA) to identify components reflecting eye blinks and muscle activity, was performed (Figure 1D). Participants with more than 30% of trials rejected were excluded from further analysis.
All incorrect responses were excluded (147 out of 1760, 8.35% of trials). To eliminate the influence of outliers, a 2 SD trimmed mean was calculated for every participant in each condition. Data were epoched from the onset of speech and lasted for 1000 ms. To ensure a clean baseline with no stimulus presented, a 200 ms pre-stimulus baseline correction was applied before gesture onset.
To objectively identify the time windows of activated components, grand-average ERPs at electrode Cz were compared between the higher (≥50%) and lower (<50%) halves for gesture entropy (Figure 5A1), speech entropy (Figure 5B1), and MI (Figure 5C1). Consequently, four ERP components were predetermined: the P1 effect observed within the time window of 0-100 ms33,34, the N1-P2 effect observed between 150-250 ms33,34, the N400 within the interval of 250-450 ms19,34,35, and the LPC spanning from 550-1000 ms36,37. Additionally, seven regions-of-interest (ROIs) were defined in order to locate the modulation effect on each ERP component: left anterior (LA): F1, F3, F5, FC1, FC3, and FC5; left central (LC): C1, C3, C5, CP1, CP3, and CP5; left posterior (LP): P1, P3, P5, PO3, PO5, and O1; right anterior (RA): F2, F4, F6, FC2, FC4, and FC6; right central (RC): C2, C4, C6, CP2, CP4, and CP6; right posterior (RP): P2, P4, P6, PO4, PO6, and O2; and midline electrodes (ML): Fz, FCz, Cz, Pz, Oz, and CPz43.
Subsequently, cluster-based permutation tests44 in FieldTrip were used to identify significant clusters of adjacent time points and electrodes in which ERP amplitude differed between the higher and lower halves of gesture entropy, speech entropy, and MI, respectively. The type I error threshold at the sample (time point × electrode) level was set to 0.025. The cluster-level statistic, defined as the sum of the t-values of all samples within a cluster, was evaluated against a permutation distribution estimated from 5000 Monte Carlo simulations. The cluster-level type I error threshold was set to 0.05, and clusters with a p-value below this critical alpha level were considered to differ significantly between conditions.
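The original analysis was run in FieldTrip (MATLAB); as an illustration of the same logic, the sketch below uses the analogous MNE-Python routine on simulated difference waves. The array shapes, the random placeholder data, and the lattice-adjacency simplification are assumptions; a real analysis would supply the epoched ERP data and a sensor adjacency matrix.

```python
import numpy as np
from scipy.stats import t as t_dist
from mne.stats import spatio_temporal_cluster_1samp_test

rng = np.random.default_rng(1)

# Hypothetical per-subject difference waves (higher minus lower half of one
# information measure): subjects x time points x electrodes. Random placeholder
# values stand in for the real epoched ERP data.
n_subj, n_times, n_chan = 23, 200, 48
diff_waves = rng.normal(size=(n_subj, n_times, n_chan))

# Sample-level t threshold corresponding to the alpha of 0.025 used above.
threshold = t_dist.ppf(1 - 0.025, df=n_subj - 1)

# adjacency=None assumes a lattice neighborhood over the electrode dimension;
# in practice a sensor adjacency matrix (e.g., from mne.channels.find_ch_adjacency)
# would be passed so that clusters respect true electrode neighbors.
t_obs, clusters, cluster_pv, h0 = spatio_temporal_cluster_1samp_test(
    diff_waves,
    threshold=threshold,
    n_permutations=5000,   # Monte Carlo draws, as in the FieldTrip analysis
    tail=0,
)
print("clusters with p < 0.05:", int(np.sum(cluster_pv < 0.05)))
```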
Paired t-tests were conducted to compare the lower and upper halves of each information model for the averaged amplitude within each ROI or cluster across the four ERP time windows, separately. Pearson correlations were calculated between each model value and each averaged ERP amplitude in each ROI or cluster, individually. False discovery rate (FDR) correction was applied for multiple comparisons.
Results
Experiment 1: Modulation of left pMTG and IFG engagement by gradual changes in gesture-speech semantic information
In the IFG, a one-way ANOVA examining the effects of the three tDCS conditions (Anodal, Cathodal, or Sham) on the semantic congruency effect (RT(semantically incongruent) – RT(semantically congruent)) demonstrated a significant main effect of stimulation condition (F(2, 75) = 3.673, p = 0.030, ηp2 = 0.089). Post hoc paired t-tests indicated a significantly reduced semantic congruency effect in the Cathodal condition relative to the Sham condition (t(26) = -3.296, p = 0.003, 95% CI = [-11.488, 4.896]) (Figure 3A left). Subsequent Pearson correlation analysis revealed that this reduction scaled with MI, as evidenced by a significant correlation between the Cathodal-tDCS effect (Cathodal-tDCS minus Sham-tDCS) and MI (r = -0.595, p = 0.007, 95% CI = [-0.995, -0.195]) (Figure 3B).
Similarly, in the pMTG, a one-way ANOVA assessing the effects of the three tDCS conditions on the semantic congruency effect also revealed a significant main effect of stimulation condition (F(2, 75) = 3.250, p = 0.044, ηp2 = 0.080). Subsequent paired t-tests identified a significantly reduced semantic congruency effect in the Cathodal condition relative to the Sham condition (t(25) = -2.740, p = 0.011, 95% CI = [-11.915, 6.435]) (Figure 3A right). Moreover, a significant correlation was observed between the Cathodal-tDCS effect and MI (r = -0.457, p = 0.049, 95% CI = [-0.900, -0.014]) (Figure 3B). RTs for congruent and incongruent trials in each stimulation condition of the IFG and pMTG sessions are shown in Appendix Table 4A.
Experiment 2: Time-sensitive modulation of left pMTG and IFG engagements by gradual changes in gesture-speech semantic information
A 2 (TMS effect: active - Vertex) × 5 (TW) ANOVA on the semantic congruency effect revealed a significant interaction between TMS effect and TW (F(3.589, 82.538) = 3.273, p = 0.019, ηp2 = 0.125). Further t-tests identified a significant TMS effect over the pMTG in TW1 (t(23) = -3.068, p = 0.005, 95% CI = [-6.838, 0.702]), TW2 (t(23) = -2.923, p = 0.008, 95% CI = [-6.490, 0.644]), and TW7 (t(23) = -2.005, p = 0.047, 95% CI = [-5.628, 1.618]). In contrast, a significant TMS effect over the IFG was found in TW3 (t(23) = -2.335, p = 0.029, 95% CI = [-5.928, 1.258]) and TW6 (t(23) = -4.839, p < 0.001, 95% CI = [-7.617, -2.061]) (Figure 4A). Raw RTs of congruent and incongruent trials are shown in Appendix Table 4B.
Additionally, a significant negative correlation was found between the TMS effect (a more negative TMS effect represents a stronger interruption of the integration effect) and speech entropy when the pMTG was inhibited in TW2 (r = -0.792, p = 0.004, 95% CI = [-1.252, -0.331]). Meanwhile, when the IFG activity was interrupted in TW6, a significant negative correlation was found between the TMS effect and gesture entropy (r = -0.539, p = 0.014, 95% CI = [-0.956, -0.122]), speech entropy (r = -0.664, p = 0.026, 95% CI = [-1.255, -0.073]), and MI (r = -0.677, p = 0.001, 95% CI = [-1.054, -0.300]) (Figure 4B).
Experiment 3: Temporal modulation of P1, N1-P2, N400 and LPC components by gradual changes in gesture-speech semantic information
Topographical maps illustrating amplitude differences between the lower and higher halves of speech entropy demonstrate a central-posterior P1 amplitude (0-100 ms, Figure 5B2 middle). Aligning with prior findings33, the paired t-tests demonstrated a significantly larger P1 amplitude within the ML ROI (t(22) = 2.510, p = 0.020, 95% confidence interval (CI) = [1.66, 3.36]) when contrasting stimuli with higher 50% speech entropy against those with lower 50% speech entropy (Figure 5B2 left). Subsequent correlation analyses unveiled a significant increase in the P1 amplitude with the rise in speech entropy within the ML ROI (r = 0.609, p = 0.047, 95% CI = [0.039, 1.179], Figure 5B2 right). Furthermore, a cluster of neighboring time-electrode samples exhibited a significant contrast between the lower 50% and higher 50% of speech entropy, revealing a P1 effect spanning 16 to 78 ms at specific electrodes (FC2, FCz, C1, C2, Cz, and CPz, Figure 5B3 middle) (t(22) = 2.754, p = 0.004, 95% confidence interval (CI) = [1.65, 3.86], Figure 5B3 left), with a significant correlation with speech entropy (r = 0.636, p = 0.035, 95% CI = [0.081, 1.191], Figure 5B3 right).
Additionally, topographical maps comparing the lower 50% and higher 50% gesture entropy revealed a frontal N1-P2 amplitude (150-250 ms, Figure 5A2 middle). In accordance with previous findings on bilateral frontal N1-P2 amplitude33, paired t-tests displayed a significantly larger amplitude for stimuli with lower 50% gesture entropy than with higher 50% entropy in both ROIs of LA (t(22) = 2.820, p = 0.011, 95% CI = [2.21, 3.43]) and RA (t(22) = 2.223, p = 0.038, 95% CI = [1.56, 2.89]) (Figure 5A2 left). Moreover, a negative correlation was found between N1-P2 amplitude and gesture entropy in both ROIs of LA (r = -0.465, p = 0.039, 95% CI = [-0.87, -0.06]) and RA (r = -0.465, p = 0.039, 95% CI = [-0.88, -0.05]) (Figure 5A2 right). Additionally, through a cluster-permutation test, the N1-P2 effect was identified between 184 to 202 ms at electrodes FC4, FC6, C2, C4, C6, and CP4 (Figure 5A3 middle) (t(22) = 2.638, p = 0.015, 95% CI = [1.79, 3.48], (Figure 5A3 left)), exhibiting a significant correlation with gesture entropy (r = -0.485, p = 0.030, 95% CI = [-0.91, -0.06], Figure 5A3 right).
Furthermore, in line with prior research45, a left-frontal N400 amplitude (250-450 ms) was discerned from topographical maps of both gesture entropy (Figure 5A4 middle) and MI (Figure 5C2 middle). Notably, a larger N400 amplitude in the LA ROI was consistently observed for stimuli with lower 50% values compared to those with higher 50% values, both for gesture entropy (t(22) = 2.455, p = 0.023, 95% CI = [1.95, 2.96], Figure 5A4 left) and MI (t(22) = 3.00, p = 0.007, 95% CI = [2.54, 3.46], Figure 5C2 left). Concurrently, a negative correlation was noted between the N400 amplitude and both gesture entropy (r = -0.480, p = 0.032, 95% CI = [-0.94, -0.03], Figure 5A4 right) and MI (r = -0.504, p = 0.028, 95% CI = [-0.97, -0.04], Figure 5C2 right) in the LA ROI.
The identified clusters with the N400 effect for gesture entropy (282 – 318 ms at electrodes FC1, FCz, C1, and Cz, Figure 5A5 middle) (t(22) = 2.828, p = 0.010, 95% CI = [2.02, 3.64], Figure 5A5 left) exhibited significant correlation between the N400 amplitude and gesture entropy (r = -0.445, p = 0.049, 95% CI = [-0.88, -0.01], Figure 5A5 right). Similarly, the cluster with the N400 effect for MI (294 – 306 ms at electrodes F1, F3, Fz, FC1, FC3, FCz, and C1, Figure 5C3 middle) (t(22) = 2.461, p = 0.023, 95% CI = [1.62, 3.30], Figure 5C3 left) also exhibited significant correlation (r = -0.569, p = 0.011, 95% CI = [-0.98, -0.16], Figure 5C5 right).
Finally, consistent with previous findings33, an anterior LPC effect (550-1000 ms) was observed in topographical maps comparing stimuli with lower and higher 50% speech entropy (Figure 5B4 middle). The reduced LPC amplitude was evident in the paired t-tests conducted in ROIs of LA (t(22) = 2.614, p = 0.016, 95% CI = [1.88, 3.35]); LC (t(22) = 2.592, p = 0.017, 95% CI = [1.83, 3.35]); RA (t(22) = 2.520, p = 0.020, 95% CI = [1.84, 3.24]); and ML (t(22) = 2.267, p = 0.034, 95% CI = [1.44, 3.10]) (Figure 5B4 left). Simultaneously, a marked negative correlation with speech entropy was evidenced in ROIs of LA (r = -0.836, p = 0.001, 95% CI = [-1.26, -0.42]); LC (r = -0.762, p = 0.006, 95% CI = [-1.23, -0.30]); RA (r = -0.774, p = 0.005, 95% CI = [-1.23, -0.32]) and ML (r = -0.730, p = 0.011, 95% CI = [-1.22, -0.24]) (Figure 5B4 right). Additionally, a cluster with the LPC effect (644 - 688 ms at electrodes Cz, CPz, P1, and Pz, Figure 5B5 middle) (t(22) = 2.754, p = 0.012, 95% CI = [1.50, 4.01], Figure 5B5 left) displayed a significant correlation with speech entropy (r = -0.699, p = 0.017, 95% CI = [-1.24, -0.16], Figure 5B5 right).
Discussion
Through mathematical quantification of gesture and speech information using entropy and mutual information (MI), we examined the functional pattern and dynamic neural structure underlying multisensory semantic integration. Our results, for the first time, unveiled a progressive inhibition of the IFG and pMTG by HD-tDCS as the degree of gesture-speech interaction, indexed by MI, increased (Experiment 1). Additionally, this gradual neural engagement was found to be time-sensitive and staged, as evidenced by the selectively interrupted time windows (Experiment 2) and the distinctly correlated ERP components (Experiment 3), which were modulated by top-down gesture constraint (gesture entropy) and bottom-up speech encoding (speech entropy). These findings significantly expand our understanding of the cortical foundations of statistically regularized multisensory semantic information.
It is widely acknowledged that a single, amodal system mediates the interactions among perceptual representations of different modalities11,12,46. Moreover, observations have suggested that semantic dementia patients experience increasing overregularization of their conceptual knowledge due to the progressive deterioration of this amodal system47. Consequently, a graded function and structure of the transmodal 'hub' representational system has been proposed12,48,49. In line with this, using the NIBS techniques of HD-tDCS and TMS, the present study provides compelling evidence that the integration hubs for gesture and speech, namely the pMTG and IFG, function in a graded manner. This is supported by the progressive inhibition effect observed in these brain areas as the entropy and mutual information of gesture and speech increase.
Moreover, by dividing the potential integration period into eight TWs relative to the speech IP and administering inhibitory double-pulse TMS within each TW, the current study attributed the gradual TMS-selective regional inhibition to distinct information sources. In the early pre-lexical TW2 of gesture-speech integration, the suppression effect observed in the pMTG correlated with speech entropy. Conversely, in the later post-lexical TW6, the IFG interruption effect was influenced by gesture entropy, speech entropy, and their MI. A dual-stage pMTG-IFG-pMTG neurocircuit loop during gesture-speech integration has been proposed previously28. As an extension, the present study unveils a staged accumulation of engagement within the neurocircuit linking the transmodal regions of pMTG and IFG, arising from distinct information contributors.
Furthermore, we disentangled the sub-processes of integration with high-temporal-resolution ERPs as the representations of gesture and speech varied. Early P1-N1 and P2 sensory effects linked to perception and attentional processes34,50 were interpreted as reflecting early audiovisual gesture-speech integration in the sensory-perceptual processing chain51. Note that a semantic priming paradigm was adopted here to create a top-down prediction of gesture over speech. The observed positive correlation of the P1 effect with speech entropy and the negative correlation of the N1-P2 effect with gesture entropy suggest that the early interaction of gesture-speech information was modulated by both top-down gesture prediction and bottom-up speech processing. Additionally, the lexico-semantic N400 and LPC effects were differentially mediated by top-down gesture prediction, bottom-up speech encoding, and their interaction: the N400 was negatively correlated with both gesture entropy and MI, whereas the LPC was negatively correlated only with speech entropy. Nonetheless, activation of representations is modulated progressively: the input stimuli activate a dynamically distributed neural landscape, the state of which builds up gradually, as measured by entropy and MI, and correlates with the electrophysiological signals (N400 and LPC) that index changes in lexical representation. Consistent with recent accounts of multisensory information processing4,52, our findings further confirm that the changed activation pattern can be induced from both top-down and bottom-up directions of gesture-speech processing.
Considering the close alignment of the ERP components with the TWs of the TMS effect, it is reasonable to tentatively link the ERP components with the cortical involvements (Figure 6). Consequently, referencing the recurrent neurocircuit connecting the left IFG and pMTG for semantic unification53, we extended the previously proposed two-stage gesture-speech integration circuit28 into sequential steps. First, bottom-up speech processing mapping the acoustic signal onto its lexical representation proceeds from the STG/S to the pMTG. The larger the speech entropy, the greater the effort required to match the acoustic input with its stored lexical representation, leading to larger involvement of the pMTG at the pre-lexical stage (TW2) and a larger P1 effect (Figure 6 ①). Second, the gesture representation is activated in the pMTG and exerts a top-down modulation over the phonological processing of speech in the STG/S54. The higher the certainty of the gesture (i.e., the smaller its entropy), the stronger its modulation of speech, as indexed by an enhanced N1-P2 amplitude (Figure 6 ②). Third, information is relayed from the pMTG to the IFG for sustained activation, during which gesture imposes a semantic constraint on the semantic retrieval of speech. A greater TMS effect over the IFG at the post-lexical stage (TW6), accompanied by a reduced N400 amplitude, was found as gesture entropy increased, i.e., when the representation of gesture was widely distributed and its constraint over the following speech was weak (Figure 6 ③). Fourth, the activated speech representation is compared with that of the gesture in the IFG. At this stage, the larger the overlap of the neural populations activated by gesture and speech (indexed by a larger MI), the greater the TMS disruption effect over the IFG and the more reduced the N400 amplitude, indexing easier integration and less semantic conflict (Figure 6 ④). Last, the activated speech representation disambiguates and reanalyzes the semantic information stored in the IFG and is further unified into a coherent comprehension in the pMTG17,55. The more uncertain the information provided by speech (indicated by increased speech entropy), the stronger the reweighting of the activated semantic information, resulting in stronger involvement of the IFG and a reduced LPC amplitude (Figure 6 ⑤).
Note that the sequential cortical involvements and ERP components discussed above derive from a deliberate alignment of speech onset with the gesture DP, creating an artificial priming effect in which gesture semantically precedes speech. Caution is therefore advised when generalizing these findings to spontaneous gesture-speech relationships, although gestures do naturally precede speech56.
Limitations exist. ERP components and cortical engagements were linked only through the intermediary variables of entropy and MI, and dissociations were observed between ERP components and cortical engagement. Importantly, there is no direct evidence of the brain structures underpinning the corresponding ERPs, which needs clarification in future studies. Additionally, not all influenced TWs exhibited significant associations with entropy and MI. While HD-tDCS and TMS may impact functionally and anatomically connected brain regions43,44, the graded functionality of every disturbed period is not guaranteed. Caution is therefore warranted in interpreting the causal relationship between NIBS inhibition effects and the information-theoretic metrics (entropy and MI). Finally, the current study incorporated a restricted set of entropy and MI measures; the generalizability of the findings should be assessed in future studies using a more extensive range of metrics.
In summary, utilizing the information-theoretic complexity metrics of entropy and mutual information (MI), our study demonstrates that multisensory semantic processing of gesture and speech gives rise to dynamically evolving representations through the interplay between gesture-primed prediction and speech presentation. This process correlates with the progressive engagement of the pMTG-IFG-pMTG circuit and various ERP components. These findings significantly advance our understanding of the neural mechanisms underlying multisensory semantic integration.
Acknowledgements
This research was supported by grants from the STI 2030—Major Projects 2021ZD0201500, the National Natural Science Foundation of China (31822024, 31800964), the Scientific Foundation of Institute of Psychology, Chinese Academy of Sciences (E2CX3625CX), and the Strategic Priority Research Program of Chinese Academy of Sciences (XDB32010300).
Additional information
Author contributions
Conceptualization, W.Y.Z. and Y.D.; Investigation, W.Y.Z. and Z.Y.L.; Formal Analysis, W.Y.Z. and Z.Y.L.; Methodology, W.Y.Z. and Z.Y.L.; Validation, Z.Y.L. and X.L.; Visualization, W.Y.Z. and Z.Y.L. and X.L.; Funding Acquisition, W.Y.Z. and Y.D.; Supervision, Y.D.; Project administration, Y.D.; Writing – Original Draft, W.Y.Z.; Writing – Review & Editing, W.Y.Z., Z.Y.L., X.L., and Y.D.
Competing interests
The authors declare no competing interests.
References
- 1.A neural basis for lexical retrievalNature 380:499–505https://doi.org/10.1038/380499a0
- 2.Where do you know what you know? The representation of semantic knowledge in the human brainNature Reviews Neuroscience 8:976–987https://doi.org/10.1038/nrn2277
- 3.Abstract linguistic structure correlates with temporal activity during naturalistic comprehensionBrain and Language 157:81–94https://doi.org/10.1016/j.bandl.2016.04.008
- 4.Multimodal processing in face-to-face interactions: A bridging link between psycholinguistics and sensory neuroscienceFront Hum Neurosci 17https://doi.org/10.3389/fnhum.2023.1108354
- 5.Probabilistic language models in cognitive neuroscience: Promises and pitfallsNeurosci Biobehav R 83:579–588https://doi.org/10.1016/j.neubiorev.2017.09.001
- 6.Dynamic encoding of speech sequence probability in human temporal cortexJournal of Neuroscience 35:7203–7214https://doi.org/10.1523/JNEUROSCI.4100-14.2015
- 7.Shared computational principles for language processing in humans and deep language modelsNature Neuroscience 25https://doi.org/10.1038/s41593-022-01026-4
- 8.A hierarchy of linguistic predictions during natural language comprehensionP Natl Acad Sci USA 119:e2201968119–e2201968119https://doi.org/10.1073/pnas.2201968119
- 9.What do we mean by prediction in language comprehension?Lang Cogn Neurosci 31:32–59https://doi.org/10.1080/23273798.2015.1102299
- 10.Neurocognitive insights on conceptual knowledge and its breakdownPhilos T R Soc B 369https://doi.org/10.1098/rstb.2012.0392
- 11.Structure and deterioration of semantic memory: A neuropsychological and computational investigationPsychological Review 111:205–235https://doi.org/10.1037/0033-295x.111.1.205
- 12.The neural and computational bases of semantic cognitionNature Reviews Neuroscience 18:42–55https://doi.org/10.1038/nrn.2016.150
- 13.Multimodal language processing in human communicationTrends in Cognitive Sciences 23:639–652https://doi.org/10.1016/j.tics.2019.05.006
- 14.Gestures occur with spatial and Motoric knowledge: It’s more than just coincidencePerspectives on Language Learning and Education 22:42–49https://doi.org/10.1044/lle22.2.42
- 15.Gesture and Thought. University of Chicago Press https://doi.org/10.7208/chicago/9780226514642.001.0001
- 16.GestureAnnu Rev Anthropol 26:109–128https://doi.org/10.1146/annurev.anthro.26.1.109
- 17.On broca, brain, and binding: a new frameworkTrends in Cognitive Sciences 9:416–423https://doi.org/10.1016/j.tics.2005.07.004
- 18.Integration of word meaning and world knowledge in language comprehensionScience 304:438–441https://doi.org/10.1126/science.1095455
- 19.On-line integration of semantic information from speech and gesture: Insights from event-related brain potentialsJ Cognitive Neurosci 19:605–616https://doi.org/10.1162/jocn.2007.19.4.605
- 20.Differential roles for left inferior frontal and superior temporal cortex in multimodal integration of action and languageNeuroimage 47:1992–2004https://doi.org/10.1016/j.neuroimage.2009.05.066
- 21.Rapid invisible frequency tagging reveals nonlinear integration of auditory and visual informationHuman Brain Mapping 42:1138–1152https://doi.org/10.1002/hbm.25282
- 22.Native language status of the listener modulates the neural integration of speech and iconic gestures in clear and adverse listening conditionsBrain and Language 177:7–17https://doi.org/10.1016/j.bandl.2018.01.003
- 23.Native and non-native listeners show similar yet distinct oscillatory dynamics when using gestures to access speech in noiseNeuroimage 194:55–67https://doi.org/10.1016/j.neuroimage.2019.03.032
- 24.The role of iconic gestures in speech disambiguation: ERP evidenceJ Cognitive Neurosci 19:1175–1192https://doi.org/10.1162/jocn.2007.19.7.1175
- 25.What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speakingJ Mem Lang 48:16–32https://doi.org/10.1016/S0749-596x(02)00505-3
- 26.Speech and gesture share the same communication systemNeuropsychologia 44:178–190https://doi.org/10.1016/j.neuropsychologia.2005.05.007
- 27.Transcranial magnetic stimulation over left inferior frontal and posterior temporal cortex disrupts gesture-speech integrationJournal of Neuroscience 38:1891–1900https://doi.org/10.1523/Jneurosci.1748-17.2017
- 28.TMS reveals dynamic interaction between inferior frontal gyrus and posterior middle temporal gyrus in gesture-speech semantic integrationThe Journal of Neuroscience https://doi.org/10.1523/jneurosci.1355-21.2021
- 29.A mathematical theory of communicationBell Syst Tech J 27:379–423https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
- 30.Neural sensitivity to syllable frequency and mutual information in speech perception and productionNeuroimage 136:106–121https://doi.org/10.1016/j.neuroimage.2016.05.018
- 31.Effects of uniform extracellular DC electric fields on excitability in rat hippocampal slicesJ Physiol-London 557:175–190https://doi.org/10.1113/jphysiol.2003.055772
- 32.TMS evidence for the involvement of the right occipital face area in early face processingCurrent Biology 17:1568–1573https://doi.org/10.1016/j.cub.2007.07.063
- 33.Both sides get the point: hemispheric sensitivities to sentential constraintMemory & Cognition 33:871–886https://doi.org/10.3758/bf03193082
- 34.Neural correlates of bimodal speech and gesture comprehensionBrain and Language 89:253–260https://doi.org/10.1016/s0093-934x(03)00335-3
- 35.Meaningful gestures: Electrophysiological indices of iconic gesture comprehensionPsychophysiology 42:654–667https://doi.org/10.1111/j.1469-8986.2005.00356.x
- 36.Multimodal language processing: How preceding discourse constrains gesture interpretation and affects gesture integration when gestures do not synchronise with semantic affiliatesJ Mem Lang 117https://doi.org/10.1016/j.jml.2020.104191
- 37.When to take a gesture seriously: On how we use and prioritize communicative cuesJ Cognitive Neurosci 29:1355–1367https://doi.org/10.1162/jocn_a_01125
- 38.The assessment and analysis of handedness: the Edinburgh inventoryNeuropsychologia 9:97–113https://doi.org/10.1016/0028-3932(71)90067-4
- 39.Integrating speech and iconic gestures in a Stroop-like task: Evidence for automatic processingJournal of Cognitive Neuroscience 22:683–694https://doi.org/10.1162/jocn.2009.21254
- 40.Automated cortical projection of EEG sensors: Anatomical correlation via the international 10-10 systemNeuroimage 46:64–72https://doi.org/10.1016/j.neuroimage.2009.02.006
- 41.IFCN standards for digital recording of clinical EEG. The International Federation of Clinical NeurophysiologyElectroencephalogr Clin Neurophysiol Suppl 52:11–14https://doi.org/10.1016/S0013-4694(97)00106-5
- 42.EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysisJ Neurosci Methods 134:9–21https://doi.org/10.1016/j.jneumeth.2003.10.009
- 43.The Role of Synchrony and Ambiguity in Speech-Gesture Integration during ComprehensionJ Cognitive Neurosci 23:1845–1854https://doi.org/10.1162/jocn.2010.21462
- 44.FieldTrip: Open Source Software for Advanced Analysis of MEG, EEG, and Invasive Electrophysiological DataComputational Intelligence and Neuroscience 2011https://doi.org/10.1155/2011/156869
- 45.Thirty Years and Counting: Finding Meaning in the N400 Component of the Event-Related Brain Potential (ERP)Annual Review of Psychology 62:621–647https://doi.org/10.1146/annurev.psych.093008.131123
- 46.Perceptual Inference, Learning, and Attention in a Multisensory WorldAnnual Review of Neuroscience 44:449–473https://doi.org/10.1146/annurev-neuro-100120-085519
- 47.Object recognition under semantic impairment: The effects of conceptual regularities on perceptual decisionsLang Cognitive Proc 18:625–662https://doi.org/10.1080/01690960344000053
- 48.Graded and sharp transitions in semantic function in left temporal lobeBrain and Language 251https://doi.org/10.1016/j.bandl.2024.105402
- 49.Graded modality-specific specialisation in semantics: A computational account of optic aphasiaCognitive Neuropsychology 19:603–639https://doi.org/10.1080/02643290244000112
- 50.Human motor cortex excitability during the perception of others’ actionCurrent Opinion in Neurobiology 15:213–218https://doi.org/10.1016/j.conb.2005.03.013
- 51.Auditory-visual integration during multimodal object recognition in humans: A behavioral and electrophysiological studyJ Cognitive Neurosci 11:473–490https://doi.org/10.1162/089892999563544
- 52.Interactionally Embedded Gestalt Principles of Multimodal Human CommunicationPerspect Psychol Sci 18:1136–1159https://doi.org/10.1177/17456916221141422
- 53.MUC (Memory, Unification, Control) and beyond. Frontiers in Psychology 4https://doi.org/10.3389/fpsyg.2013.00416
- 54.Defining auditory-visual objects: Behavioral tests and physiological mechanismsTrends in Neurosciences 39:74–85https://doi.org/10.1016/j.tins.2015.12.007
- 55.Unification of speaker and meaning in language comprehension: An fMRI studyJ Cognitive Neurosci 21:2085–2099https://doi.org/10.1162/jocn.2008.21161
- 56.Hand and Mind: What Gestures Reveal about Thought. University of Chicago Press https://doi.org/10.2307/1576015
Article and author information
Copyright
© 2024, Zhao et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.