Uncovering the functional anatomy of the human insula during speech

  1. Oscar Woolnough
  2. Kiefer James Forseth
  3. Patrick Sarahan Rollo
  4. Nitin Tandon (corresponding author)
  1. McGovern Medical School at UT Health Houston, United States
  2. University of Texas Health Science Center at Houston, United States
  3. Texas Medical Center, United States
Research Article
Cite this article as: eLife 2019;8:e53086 doi: 10.7554/eLife.53086

Abstract

The contribution of insular cortex to speech production remains unclear and controversial given diverse findings from functional neuroimaging and lesional data. To create a precise spatiotemporal map of insular activity, we performed a series of experiments: single-word articulations of varying complexity, non-speech orofacial movements and speech listening, in a cohort of 27 patients implanted with penetrating intracranial electrodes. The posterior insula was robustly active bilaterally, but only after the onset of articulation, as well as during listening to speech and during production of non-speech mouth movements. Preceding articulation there was very sparse activity, localized primarily to the frontal operculum rather than the insula. The posterior insula was active coincident with the superior temporal gyrus but was more active for self-generated speech than external speech, the opposite of the superior temporal gyrus. These findings support the conclusion that the insula does not serve pre-articulatory preparatory roles.

Introduction

Multiple lesion studies have linked insular damage to speech and orofacial motor control disorders such as apraxia of speech (AOS) (Dronkers, 1996; Marien et al., 2001; Ogar et al., 2006; Itabashi et al., 2016), dysarthria (Baier et al., 2011) and dysphagia (Daniels et al., 1996). Dronkers’ study was the first to quantitatively link anterior insular damage to a disruption in speech production, finding a 100% lesion overlap in the left superior precentral gyrus of the insula (SPG) in AOS patients. Lesion symptom mapping revealed that patients with SPG lesions produced a greater number of speech errors during complex, multisyllabic articulations (Ogar et al., 2006; Baldo et al., 2011). This implicated the insula in pre-articulatory motor plans specific to speech. Further evidence was contributed by fMRI studies that implicate the anterior insula in both speech perception and production (Mutschler et al., 2009; Adank, 2012; McGettigan et al., 2013; Ardila et al., 2014; Oh et al., 2014). In the dual-stream model of the speech articulation network, the anterior insula is included as part of the putative dorsal sensorimotor pathway (Hickok and Poeppel, 2007).

However, the data regarding the insula’s direct involvement in speech are not all concordant. Comparative assessments of activation to speech and non-speech oral movements have shown greater activity for non-speech within the insula (Bonilha et al., 2006), suggesting a more general role in oral motor control without specialization for speech. Also, it has been suggested that the apparent involvement of the SPG in complex articulation might actually be attributable to the inferior frontal gyrus (IFG) (Fedorenko et al., 2015). The insula’s engagement based on AOS lesion studies may therefore simply reflect the high probability of insular damage following middle cerebral artery ischemia, with Broca’s area lesions being more causative of AOS (Hillis et al., 2004).

Experiments directly stimulating the insula have also been inconclusive. Of the many patients who have undergone insular stimulation with implanted electrodes, somatosensory manifestations are relatively common (in 70% of cases), but speech disruptions occur very infrequently (5–7% of the time) (Ostrowsky et al., 2000; Afif et al., 2010; Pugnaghi et al., 2011; Stephani et al., 2011; Mazzola et al., 2017).

The major impediment in categorizing the role of the insula has been the lack of information regarding the timing of its activation. Its deep location renders it inaccessible to standard non-invasive electrophysiological techniques. We have previously shown that it is possible to obtain direct high spatiotemporal resolution recordings of the right (non-dominant) insula using stereotactically placed depth electrodes, work that allowed us to disambiguate its role in stopping movement relative to activity of the right IFG (Bartoli et al., 2018).

Here, we performed direct, invasive recordings of cortical activity from multiple sites across the insula in both hemispheres, in patients undergoing seizure localization for intractable epilepsy, testing the theories generated from the existing literature, namely that the insula acts as a pre-articulatory preparatory region. Given that the insula is thought to be engaged by complex articulation, we used two speech articulation tasks of varying levels of complexity, either reading complex multisyllabic words or naming objects with mono or multisyllabic names. To help disentangle speech-specific neural activity from more general processes, recordings were also performed during listening to speech and during orofacial movements. Taken together, these tasks allow us to map insular regions specifically involved in the preparation and production of speech.

Results

Participants read visually presented words aloud (n = 27), performed an object naming task (n = 23) of common objects presented as line drawings, listened to speech stimuli as a part of a naming to description task (n = 21) and performed an orofacial praxis task (n = 8), where they silently performed non-speech mouth movements (Figure 1A).

Experimental design.

(A) Schematic representation of the four tasks. (B) Representative spatial coverage map on a standard N27 inflated surface illustrating how many patients had electrodes in each brain region. (C) Individual electrodes shown on the same brain surface. Colored electrodes represent those included in each ROI for the grouped electrode analyses. PI: Posterior Insula, AI: Anterior Insula, STG: Superior Temporal Gyrus, CS: Central Sulcus, IFG: Inferior Frontal Gyrus.

Behavioral performance

Task accuracy for the word reading task was 96.4 ± 5.3% (mean ± SD) with an average response time (RT) of 978 ± 221 ms from word presentation. The RT for the object naming task was 1192 ± 245 ms. 75.1 ± 10.6% of responses in the naming task had both correct articulation and the expected, most common word choice; only these were used for subsequent analysis.

Electrode coverage

A plot of the coverage across our standardized brain, accumulating electrode recording zones from each patient (Figure 1B) (Kadipasaoglu et al., 2014), confirmed good coverage bilaterally in all sub-regions of the insula and across our extra-insular ROIs.

Chronology of insula activation

To compare the timing of activation of other functional regions with the insula, we used ROIs based on known anatomico-functional parcellation of the insula, separating the short gyri (anterior) and long gyri (posterior) (Naidich et al., 2004), and targeted adjacent regions well-established to be involved in speech production, as detailed in the methods: left superior temporal gyrus (STG; primary and secondary auditory cortex), bilateral central sulcus (CS) and left inferior frontal gyrus (IFG) (Figure 1C). During both reading and naming, activity in these ROIs was as expected (Figure 2; Figure 2—source data 1). IFG activation began ~750 ms before speech onset, prior to CS activation. CS activity was maximal at speech onset and, shortly after speech onset, STG became active.

Figure 2 (with 3 supplements)
Spectrotemporal representations of activity in the ROIs.

Broadband gamma activity (A–D) and spectrogram (E–H) plots of activity within each ROI, averaged across subjects during the complex reading (A,E; n = 27), monosyllabic naming (B,F; n = 23), multisyllabic naming (C,G; n = 23) and listening (D,H; n = 21) tasks. Colored bars under the BGA plots represent regions of significant activation (q < 0.05). Responses are time locked to speech onset in the reading and naming tasks and to the stimulus onset in listening.

The posterior insula was active exclusively after speech onset, implying that it did not play a role in speech planning. There were no differences in the amplitude of activation with varying levels of articulation complexity: simple monosyllabic names vs. complex read words (Wilcoxon signed-rank test, 200–600 ms; p=0.083) and multisyllabic naming responses (p=0.898) (Figure 2—figure supplement 1). The only difference observed was a duration effect, with longer articulation times and therefore longer activation duration for multisyllabic words (600–1000 ms; p=0.024).

In posterior insula, the timing of responses resembled that of STG very closely, with similar onset, offset and peak activity times. As expected from other studies of auditory cortex, the STG responded more strongly to external speech than to self-generated speech (p=0.002) (Figure 2—figure supplement 1). The posterior insula, however, showed a significantly greater response during speech production than during speech listening (p=0.019); thus, these two adjacent regions are functionally dissociable. Further, in the praxis task, we observed significant post-articulatory activation in posterior insula where none was seen in STG (Figure 2—figure supplement 2).

The anterior insula showed a very weak though significant activation in both speech articulation tasks, starting shortly after the IFG. This low amplitude response first became significant around 150 ms before speech onset and remained active for the duration of speech, concurrent with the posterior insula and STG. This small but reliable response could represent a low magnitude local processing, but could also represent activity from an active adjacent region, such as the frontal operculum which overlies the insula. Additionally, when considering the evoked activity from anterior insula, we observed no significant response (Figure 2—figure supplement 3).

Topography of insula activation

To evaluate activity in the insula without imposing regional homogenization intrinsic to grouped electrode analysis, and form an accurate spatiotemporal activity map, we also represented the activity of individual insular electrodes on a standardized brain surface (Figure 3A). During both reading and naming the magnitude and significance of the activation at individual electrodes were highly consistent. In the post-articulatory period of both the reading and naming tasks, the posterior insula, bilaterally along the anterior long gyri (ALG), showed clusters of electrodes responding with high significance and amplitude, primarily in the superior part of the gyrus (Figure 3B; Figure 3—source data 1). This region also appeared to be the primary region of activation in the listening task and also showed significant but low amplitude activation during the praxis task.

Topographic maps of a standardized insula.

(A) Insula coverage map showing all insula electrodes from patients who did the reading task on a standard N27 pial surface insula. (B) Activity maps showing activation above baseline for each task in either the pre (−500 to −100 ms) or post (200 to 600 ms) articulatory time window. ASG: Anterior Short Gyrus, PSG: Posterior Short Gyrus, ALG: Anterior Long Gyrus. Electrodes with a non-significant activation (q > 0.05) shown in black.

Overall the anterior insula showed characteristics comparable to those in the grouped analysis, with significant but very low amplitude responses in the post-articulatory period but little to no pre-articulatory activity. The active electrodes seen at the anterior edge of the left insula in the pre-articulatory period were both from one patient and were in close proximity to the frontal operculum. The remaining electrodes with significant responses were very low in amplitude and were scattered across the region, with no clear spatial organization.

To account for variations in sampling density across the insula, we also performed a population-level analysis using surface-based mixed-effects multilevel analysis (sb-MEMA) (Conner et al., 2011; Kadipasaoglu et al., 2014; Kadipasaoglu et al., 2015; Forseth et al., 2018) on the reading task (Figure 4; Video 1). In the pre-articulatory period, we saw prominent activity in the IFG and a cluster across the medial frontal operculum but little to no significant activity anywhere in the insula. By comparison, in the post-articulatory time period, the majority of posterior insula showed substantial activation, accompanied by a large activation across the STG. Additionally, the activation cluster extended across the frontal operculum and spread into part of the superior anterior insula. These results are concordant with our previous analyses in that the posterior insula shows substantial activation during speech production, alongside the superior temporal gyrus. This analysis also showed that, while the anterior insula shows little activity, the adjacent frontal operculum shows prominent activation.

MEMA map showing left hemispheric activity in the pre (−500 to −100 ms) and post (200 to 600 ms) articulatory periods during the reading task.

Regions are shown for clusters with significant activity (p<0.01, corrected), absolute BGA change of >10% and coverage of at least three patients. Regions excluded for lack of patient coverage are shown in black.

Video 1
MEMA video of left hemispheric activity during the reading task.

MEMA was run on short, overlapping time windows (150 ms width, 10 ms spacing). Regions are shown for clusters with significant activity (p<0.01, uncorrected), absolute BGA change of >15% and coverage of at least three patients. Regions excluded for lack of patient coverage are shown in black.

Comparing anterior insula and frontal operculum

In fMRI studies of speech and language, the anterior insula is often represented as being active. However, these activation clusters typically encompass both anterior insula and medial frontal operculum (FO). Given that we recorded little activity from electrodes in anterior insula and the MEMA results revealed prominent activity in the FO, it is pertinent to evaluate if FO activity could be the true source of the subtle activation seen in some anterior insular electrodes.

To assess this, we evaluated electrode responses in patients who had electrodes in both the anterior insula and FO. Two examples are shown in Figure 5. We noted strong, pre-articulatory BGA responses in FO but low amplitude activity in the anterior insula. FO electrodes showed significant gamma activation that started up to 700 ms before articulation and continued for the duration of articulation (Figure 5A,B; Figure 5—source data 1).

Figure 5 (with 1 supplement)
Within-patient differences in anterior insula and frontal operculum activity.

(A) Electrode pairs within two representative patients showing two neighboring electrodes, one within anterior insula (AI) and another within frontal operculum (FO). (B) Activity of these electrodes while reading, showing much greater activity within the FO electrodes. Electrode locations (C) and BGA (D) for neighboring electrode pairs used for within patient comparisons of AI and FO activity. Colored bars under the BGA plots represent regions of significant activation (q < 0.05).

Due to the oblique trajectories used for sampling the insula, the majority of patients (n = 13) with anterior insula electrodes had a nearby electrode on the same probe (separation 5.7 ± 2.2 mm) that was localized to frontal operculum (Figure 5C). The band-limited (70–150 Hz) voltage traces of these electrodes were significantly correlated between the electrode pairs (r = 0.11 ± 0.03, mean ± SE; Wilcoxon signed-rank test, p=0.008). This correlation was maximal at 0 ms time lag, suggestive of volume conduction between the two regions (Figure 5—figure supplement 1). Also, the population level BGA time courses were highly comparable between the two regions (Figure 5D).
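
As a rough illustration of the zero-lag argument, the sketch below correlates two traces over a range of sample lags and reports the lag with the highest correlation. It is not the authors' code: the function names (`pearson`, `lagged_correlation`, `peak_lag`) and the plain-list data layout are assumptions made for this example, and it uses only the Python standard library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for every lag in [-max_lag, max_lag] (samples)."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        result[lag] = pearson(a, b)
    return result


def peak_lag(x, y, max_lag):
    """Lag (in samples) at which the correlation between x and y is largest."""
    corr = lagged_correlation(x, y, max_lag)
    return max(corr, key=corr.get)
```

Under these assumptions, a peak at lag 0 means the two traces rise and fall simultaneously, which is the pattern expected from volume conduction. A consistent non-zero peak would instead point to a propagation delay between the two sites.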

Posterior insular vs. superior temporal gyrus activity

As we have shown earlier, the posterior insula, particularly the superior ALG, is active in the post-articulatory period during both reading and naming and also during auditory perception and orofacial movements. Posterior insula and STG followed comparable time courses of activation and appeared as one contiguous activation cluster in the MEMA potentially suggesting comparable functional characteristics. However, when we compared the response amplitudes between the reading and listening tasks, within individual electrodes (Figure 6; Figure 6—source data 1), significantly larger responses were seen along the entire left STG to externally generated over self-generated speech. By contrast, the posterior insula showed greater activation during self-generated speech.

Functional dissociation of posterior insula from superior temporal gyrus.

(A) Contrast map comparing activation during self-generated speech in the reading task to external speech from the listening task in the 200 to 600 ms window. STG showed greater activation during speech perception while PI activity was greater during speech production. Electrodes with a non-significant difference (q > 0.05) shown in black. Electrode locations (B) and BGA for reading (C) and listening (D) for electrode pairs used for within patient comparisons of PI and STG activity. Colored bars under the BGA plots represent regions of significant activation (q < 0.05).

In patients with electrodes in both PI and STG (n = 8), we took the closest electrode pair (separation 11.9 ± 2.9 mm) (Figure 6B). In contrast to the AI-FO correlation, band-limited voltage traces in these electrode pairs were not significantly correlated (r = 0.01 ± 0.03, mean ± SE; Wilcoxon signed-rank test, p=0.74).

Discussion

Our recordings showed no clear evidence for insular involvement in the preparation for speech. Rather, bilateral posterior insula, particularly the superior anterior long gyri, appears to be involved in some aspect of monitoring during speech production, a process that is functionally separable from that of the superior temporal gyrus. We find little evidence that the anterior insula is directly involved in speech or language production, with the neighboring frontal operculum more likely being the true regional activity source.

Posterior insula

The posterior insula was strongly active during speech production, with weaker activation during speech perception and much weaker activation during silent mouth movements. The weaker activation during these constituent components of speech compared with speech production could suggest that this region may serve a role in integration of auditory and somatosensory input (Rodgers et al., 2008). It could also suggest its involvement in respiratory or laryngeal control (Ackermann and Riecker, 2010; Fedorenko et al., 2015), aspects we did not control for in these experiments.

This activation profile in posterior insula is the exact opposite of the STG, which is more active during externally produced speech than self-generated speech. It is well known that auditory cortex is suppressed during self-generated speech (Creutzfeldt et al., 1989; Paus et al., 1996; Numminen et al., 1999; Chan et al., 2014) and self-generated sounds more generally (Rummell et al., 2016; Singla et al., 2017), likely as a result of interactions between auditory and non-auditory sensory feedback in auditory cortex. While STG is more active during externally produced speech than self-generated speech, posterior insular activity does not suppress in response to articulation and rather, is more active during self-generated speech.

The notion that the posterior insula is involved in somatosensory processing is broadly consistent with lesional studies of dysarthric (Baier et al., 2011) and dysphagic (Daniels et al., 1996) populations and with stimulation studies that result in orofacial sensations (Pugnaghi et al., 2011). Our higher spatiotemporal resolution allows us to resolve these properties of the insula better than fMRI studies of mouth movement related activity in this region (Bonilha et al., 2006; Fedorenko et al., 2015).

Lesional analyses of apraxia of speech attribute the primary cause to either (i) disruption of pre-articulatory planning in the superior left posterior short gyrus (also called the precentral gyrus of the insula), (ii) disruption of pre-articulatory planning in IFG or (iii) impairment of audio-motor integration. Our study provides evidence to rule out the first possibility. Given the lack of pre-articulatory activity shown in this study and the lack of any relationship of activation to the complexity of articulation, it is unlikely that lesions of this region are crucial for AOS (Kent, 2000; Baldo et al., 2011). While our results are suggestive of audio-motor integration in the ALG, we do not have direct evidence of this function (Kent and Rosenbek, 1983; Rogers et al., 1996; Maas et al., 2015). Thus, our findings best support AOS representing a disruption of the IFG (Hillis et al., 2004; Fedorenko et al., 2015).

In summary, the posterior insula (i) lacks pre-articulatory activity, (ii) lacks complexity sensitivity (Baldo et al., 2011), (iii) is activated by externally produced sounds and (iv) by non-speech mouth movements. Taken together these findings are suggestive of a sensory monitoring region - congruent with the role of the insula in auditory-somatosensory integration (Rodgers et al., 2008) where both auditory and somatosensory activity in rodent insula is maximal during coincident presentation, comparable to what we see during human self-generated speech.

Anterior insula

The anterior insula is a common area of fMRI activation during speech and language studies. However, these do not show a clear delineation between insula and the operculum in their activation clusters. Our results reveal that the majority of activity lies on the side of the operculum rather than anterior insula. The frontal operculum is also the only peri-insular region in this study with significant and substantial pre-articulatory gamma activity, the timing of which would implicate it as a preparatory region, a role that has traditionally been attributed to the insula. This agrees with stimulation studies of the operculum as disruption of this region has been shown to lead to language disruption (Mălîia et al., 2018).

Our electrodes have a center-to-center separation of ~4 mm and this distance between neighboring electrodes allowed us to distinguish the highly active frontal operculum from the minimally active anterior insula. In most modern fMRI, a smoothing kernel of 4–8 mm is used (Mikl et al., 2008) which would likely remove the distinction between these two regions. The interpretation of fMRI data derived from peri-insular cortex should therefore consider the degree of smoothing, the use of surface-based smoothing (Saad and Reynolds, 2012) and use of individualized patient ROI masks (Fedorenko et al., 2015).

In summary, we find that the insula does not serve pre-articulatory preparatory roles, and that bilateral posterior insular cortices may function as auditory and somatosensory integration or monitoring regions. Our findings, analyzed several different ways, have implications for existent models of language production in humans and for the pathophysiology of speech disorders following brain injury.

Materials and methods

Participants

Twenty-seven patients (14 male, 18–50 years, 8 left handed) participated in the experiments after written informed consent was obtained. All experimental procedures were reviewed and approved by the Committee for the Protection of Human Subjects (CPHS) of the University of Texas Health Science Center at Houston as Protocol Number HSC-MS-06–0385. Inclusion criteria were that participants were native English speakers, had at least one electrode contact localized in the insular long or short gyri and that the insula was not identified as a seizure onset zone.

Electrode implantation and data recording

Recordings were acquired from stereo-electroencephalographic (sEEG) electrodes (PMT corporation, Chanhassen, Minnesota) implanted for clinical purposes of seizure localization in patients with pharmaco-resistant epilepsy using a Robotic Surgical Assistant (ROSA; Medtech, Montpellier, France) (Tandon et al., 2019). sEEG probes were 0.8 mm in diameter, with 8–16 electrode contacts, each of which was a platinum-iridium cylinder, 2.0 mm in length and separated from the adjacent contact by 1.5–2.43 mm. Thus, the center-to-center distance between the electrode contacts was 3.5–4.43 mm. Each patient had multiple (12-16) such probes implanted.

Following implantation, electrodes were localized by co-registration of pre-operative anatomical 3T MRI and post-operative CT scans using a cost function in AFNI (Cox, 1996). Electrode positions were projected onto a cortical surface model generated in FreeSurfer (Dale et al., 1999), and displayed on the cortical surface model for visualization (Pieters et al., 2013). sEEG data were collected using the NeuroPort recording system (Blackrock Microsystems, Salt Lake City, UT) digitized at 2 kHz. Data were imported into MATLAB, initially referenced to the white matter channel used as a reference by the clinical acquisition system, and visually inspected for line noise, artefacts and epileptic activity. Electrodes with excessive line noise or localized to sites of seizure onset were excluded. Each electrode was re-referenced offline to the common average of the remaining channels. Trials contaminated by inter-ictal epileptic spikes were discarded.

Stimuli and experimental design

Participants read visually presented words aloud (n = 27), performed an object naming task (n = 23) of common objects presented as line drawings, listened to speech stimuli as a part of a naming to description task (n = 21) and performed an orofacial praxis task (n = 8), where they silently performed non-speech mouth movements (Figure 1A).

Word Reading

Fifty-eight unique words were visually presented in a pseudorandom order with no repetition and patients read the words aloud. Stimuli were presented using Python v2.7, on a 15.4-inch LCD screen positioned at eye-level, 2–3 feet from the patient, for 2000 ms with an inter-stimulus interval of 3000 ms. Black, lower-case text (Calibri, height 150 pixels) centered on a 2880 × 1800 pixel white background was used. To create high articulatory complexity and to maximally engage the insula (Baldo et al., 2011), all words used had three or more syllables, an initial CCV phoneme structure, and a high articulatory travel (e.g. snapdragon, globalization, claustrophobia).

Object Naming

To enable comparisons of a range of articulatory complexities and allow for word selection processes, participants were presented with visual stimuli selected from a standardized set of line drawings (Snodgrass and Vanderwart, 1980; Kaplan et al., 1983) and instructed to verbally name the objects (Conner et al., 2014; Forseth et al., 2018). Stimuli were presented in two recording sessions, each containing presentation of 165 unique images, in a pseudorandom order, that were either coherent images or their spatially scrambled versions. Stimuli were presented using Python v2.7 at a size of 1000 × 1000 pixels centered on a 2880 × 1800 pixel white background on a 15.4-inch LCD screen positioned at eye-level, 2–3 feet from the patient, for 1500 ms with an inter-stimulus interval of 3000 ms. Unlike the reading task, this task did not require a single specific response, allowing the patient to choose the response to any given stimulus (e.g. Pelican vs. Bird, Rhinoceros vs. Rhino). Only trials with the most commonly produced word for each stimulus were used for analysis. If all responses were given using the expected answers, this resulted in 112 monosyllabic responses and 37 multisyllabic (3+ syllable) responses.

Listening to speech

Participants listened to recorded phrases that described common objects (Hamberger and Seidel, 2003; Forseth et al., 2018). A total of 72 unique speech stimuli were presented in a pseudorandom order, balanced with an equal number of trials with male and female speakers. Auditory stimuli were played using stereo speakers (44.1 kHz, 15-inch MacBook Pro 2008) with an inter-stimulus interval of 5000 ms.

Orofacial Praxis

Participants were cued to perform various orofacial movements silently (e.g. smile then pout, stick tongue out straight) based on the stimuli from Fedorenko et al. (2015). There were 12 unique movements, repeated five times each in a pseudorandom order resulting in a total of 60 trials. Instructions were presented using the same style visual stimuli and timings as in the reading task.

Audio and video analysis

Continuous audio recordings were carried out using an omnidirectional microphone (30–20,000 Hz response, 73 dB SNR, Audio Technica U841A) placed adjacent to the presentation laptop. These recordings were analyzed offline to transcribe patient responses and manually select the onset of audible speech. In participants who performed the praxis task, we performed video recordings of both reading and praxis tasks to obtain timing of articulation onset and to allow a comparison between the timing of the onset of articulation and that of audible speech. This comparison showed that in the reading task, articulation onset preceded audible speech by 108 ± 127 ms.

Signal analysis

A total of 5312 electrode contacts were implanted in these patients; 1807 of these were excluded from analysis due to proximity to the seizure onset zone, excessive inter-ictal spikes or line noise. Insular electrodes were selected manually, based on anatomical criteria, after localization of the electrodes’ CT artifacts relative to a pre-operative MRI scan. Electrodes in either the anterior (short) gyri or posterior (long) gyri of the insula, as separated by the central sulcus (Naidich et al., 2004), were selected for analysis. 52 electrodes in 20 patients were localized to anterior insula (LH n = 40 (16), RH n = 12 (5); # electrodes (# patients)) and 53 electrodes in 14 patients were localized to posterior insula (LH n = 30 (9), RH n = 23 (7)). For the remainder, electrodes were indexed to the closest node on a standardized cortical surface in patient-space to enable grouped representation and analysis (Saad and Reynolds, 2012). Regions of interest (ROIs) derived from the Human Connectome Project cortical parcellation (Glasser et al., 2016) on the standardized surface were used to select electrodes for further analysis. These were (i) the superior temporal gyrus (STG) ROI, comprising A1, PBelt, MBelt and LBelt in the left hemisphere (n = 106 (16)); (ii) the central sulcus (CS) ROI, comprising areas 3a, 3b and 4 bilaterally (n = 31 (5)); and (iii) the inferior frontal gyrus (IFG) ROI, comprising areas 45, IFSp and IFSa in the left hemisphere (n = 70 (17)). A post-hoc medial frontal operculum ROI (n = 13 (13)) was defined using the same techniques as the insular ROIs, manually selecting electrodes from individual patient MRIs based on anatomical criteria (Naidich et al., 2004).

Analyses were performed by first bandpass filtering raw data of each electrode into broadband gamma activity (BGA; 70–150 Hz) following removal of line noise and its harmonics (zero-phase second-order Butterworth band-stop filters). A frequency domain bandpass Hilbert transform (paired sigmoid flanks with half-width 1.5 Hz) was applied and the analytic amplitude was smoothed (Savitzky-Golay FIR, third-order, frame length of 151 ms; Matlab 2017a, Mathworks, Natick, MA). BGA was defined as percentage change from baseline level, with the baseline taken as 500 to 100 ms before the presentation of the visual stimulus in each speech production task and 500 to 100 ms before auditory stimulus presentation for the listening task. Periods of significant activation were tested using a one-tailed t-test at each time point and were accepted at a Benjamini-Hochberg false discovery rate (FDR) corrected threshold of q < 0.05. Responses were time aligned to the onset of audible speech production. For the grouped analysis, all electrodes were averaged within each subject and then the between subject averages were used. This minimized the influence of outliers in the grouped data.
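
Two steps of this pipeline, the baseline normalization and the Benjamini-Hochberg thresholding, can be sketched in a few lines of Python. This is an illustration only and not the authors' MATLAB implementation: the function names and the list-based inputs are assumptions made for this example, and the filtering, Hilbert transform and per-time-point t-tests are omitted.

```python
def percent_change_from_baseline(amplitude, baseline_idx):
    """Express an amplitude trace as % change from its mean over the baseline samples."""
    base = sum(amplitude[i] for i in baseline_idx) / len(baseline_idx)
    return [100.0 * (a - base) / base for a in amplitude]


def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per test: True where it is significant at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest 1-based rank k with p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff = rank
    # Every test ranked at or below the cutoff is declared significant.
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            significant[i] = True
    return significant
```

In this sketch, `baseline_idx` would hold the sample indices of the 500 to 100 ms pre-stimulus window, and `p_values` would hold one p-value per time point. The step-up rule accepts all tests up to the largest rank that meets its threshold, even if a lower-ranked p-value misses its own threshold.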

To evaluate individual insular electrodes, data were tested by calculating the Z-score of the time period of interest against the baseline period. For articulatory tasks, the time periods used were 500 to 100 ms before speech onset and 200 to 600 ms after speech onset. The listening task was tested from 200 to 600 ms after stimulus onset. Statistical significance was accepted at an FDR corrected threshold of q < 0.05.
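The text does not spell out the exact Z-score formula, so the sketch below shows one common reading, stated here as an assumption: the mean of the window of interest minus the mean of the baseline, divided by the sample standard deviation of the baseline. The function name and arguments are hypothetical.

```python
from statistics import mean, stdev


def z_against_baseline(window_samples, baseline_samples):
    """Z-score of a window's mean relative to the baseline distribution.

    Assumed definition: (mean(window) - mean(baseline)) / stdev(baseline),
    using the sample standard deviation of the baseline samples.
    """
    return (mean(window_samples) - mean(baseline_samples)) / stdev(baseline_samples)
```

Under this reading, each electrode yields one Z-score per window (for example, the pre-articulatory and post-articulatory windows), which would then be converted to a p-value and passed through the FDR correction.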

To provide statistically robust and topologically precise estimates of BGA, population-level representations were created using surface-based mixed-effects multilevel analysis (sb-MEMA) (Fischl et al., 1999; Conner et al., 2011; Kadipasaoglu et al., 2014; Kadipasaoglu et al., 2015; Forseth et al., 2018). This method accounts for sparse sampling, outlier inferences, as well as intra- and inter-subject variability to produce population maps of cortical activity. Significance levels were computed at a corrected alpha-level of 0.01 using family-wise error rate corrections for multiple comparisons. The minimum criterion for the family-wise error rate was determined by white-noise clustering analysis (Monte Carlo simulations, 5000 iterations) of data with the same dimension and smoothness as that analyzed (Kadipasaoglu et al., 2014). Subsequently, a geodesic Gaussian smoothing filter (3 mm full-width at half-maximum) was applied. Results were further restricted to regions with at least three patients contributing to coverage and BGA percent change exceeding 10%. To produce an activation movie, sb-MEMA was run on short, overlapping time windows (150 ms width, 10 ms spacing) to generate the frames of a movie portraying cortical activity.
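The windowing used for the activation movie can be sketched as follows. Only the 150 ms width and 10 ms spacing come from the text; the function name and the overall time range in the usage note are assumptions chosen for illustration.

```python
def sliding_windows(start_ms, stop_ms, width_ms=150, step_ms=10):
    """Return (window_start, window_end) pairs that fit inside [start_ms, stop_ms]."""
    windows = []
    t = start_ms
    while t + width_ms <= stop_ms:
        windows.append((t, t + width_ms))
        t += step_ms
    return windows
```

For instance, `sliding_windows(-500, 0)` yields windows starting at −500, −490, −480 ms and so on, each overlapping its neighbor by 140 ms; one population map would be computed per window to form one frame of the movie.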

To generate event-related potentials (ERPs; Figure 2—figure supplement 3), the raw data were band pass filtered (0.1–50 Hz). Speech aligned trials were averaged together and the resultant waveform was smoothed (Savitzky-Golay FIR, third-order, frame length of 151 ms). Periods of significant activity were determined as described previously. All electrodes were averaged within each subject, within ROI, and then the between subject averages were used.
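The two-level averaging described above (electrodes within each subject first, then across subjects) can be sketched in Python as below. The input layout, a list of `(subject_id, trace)` pairs with equal-length traces for a single ROI, is an assumption made for this example.

```python
from collections import defaultdict
from statistics import mean


def grouped_average(electrode_traces):
    """Average traces within each subject, then average the subject means.

    electrode_traces: list of (subject_id, trace) pairs, where every trace
    has the same number of samples (assumed layout for one ROI).
    """
    by_subject = defaultdict(list)
    for subject, trace in electrode_traces:
        by_subject[subject].append(trace)
    # One mean trace per subject, computed sample by sample.
    subject_means = [
        [mean(samples) for samples in zip(*traces)]
        for traces in by_subject.values()
    ]
    # Each subject contributes equally, however many electrodes they have.
    return [mean(samples) for samples in zip(*subject_means)]
```

The design choice is that a subject with many electrodes in an ROI does not dominate the group trace: with three electrodes from one subject and one from another, each subject still carries half the weight, whereas pooling all four electrodes would weight the first subject three to one.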

References

  27. E Kaplan, H Goodglass, S Weintraub (1983) Boston Naming Test. Philadelphia: Lea & Febiger.
  29. RD Kent, JC Rosenbek (1983) Acoustic patterns of apraxia of speech. Journal of Speech, Language, and Hearing Research 26:231–249. https://doi.org/10.1044/jshr.2602.231
  33. L Mazzola, F Mauguière, J Isnard (2017) Electrical stimulations of the human insula: their contribution to the ictal semiology of insular seizures. Journal of Clinical Neurophysiology: Official Publication of the American Electroencephalographic Society 34:307–314. https://doi.org/10.1097/WNP.0000000000000382
  37. TP Naidich, E Kang, GM Fatterpekar, BN Delman, SH Gultekin, D Wolfe, O Ortiz, I Yousry, M Weismann, TA Yousry (2004) The insula: anatomic study and MR imaging display at 1.5 T. AJNR. American Journal of Neuroradiology 25:222–232.
  46. MA Rogers, R Eyraud, EA Strand, H Storkel (1996) The effects of noise masking on vowel duration in three patients with apraxia of speech and a concomitant aphasia. Clinical Aphasiology 24:83–96.

Decision letter

  1. Barbara G Shinn-Cunningham
    Senior and Reviewing Editor; Carnegie Mellon University, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Woolnough et al. use intracranial electrocorticography in 27 patients with intractable epilepsy to describe the role of the insula in processing speech during listening and speaking. The goal of evaluating the role of the insula in speech production and planning is well motivated by past lesion and fMRI studies showing its importance in, for example, apraxia of speech. The study finds that responses in the posterior insula are functionally separable from the superior temporal gyrus (STG). Specifically, insula is more responsive during speech production than during perception, whereas STG shows the opposite effect. Moreover, the posterior insula did not show significant activity prior to speech onset, suggesting that it is not part of the pre-articulatory planning network. The anterior insula showed only very low amplitude activity (before speech onset), which was correlated with activity recorded from the frontal operculum (FO), suggesting that FO may be the true source of this planning activity. Gathering human intracranial recordings of this sort is challenging, yet there are no other methods available that provide a view of speech perception and production with this level of resolution and detail. Using this approach, the paper shows the unexpected result that activity in posterior insula during speaking exceeds that during listening. The current study thus provides us with a rare look at the neural processing underlying speech that has important implications.

Decision letter after peer review:

[Editors’ note: a previous version of this study was rejected after peer review, but the authors submitted for reconsideration. The first decision letter after peer review is shown below.]

Thank you for submitting your work entitled "Uncovering the functional anatomy of the insula during speech" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor.

Our decision has been reached after consultation among the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered for publication in eLife as it stands. However, if you are able to fully address the major concerns, we will consider a resubmission since the reviewers also agree that the work is of substantial potential interest.

This is a clearly written paper that evaluates the role of anterior and posterior insular cortex, as well as overlying inferior frontal gyrus and STG and CS regions, in speech production and perception, using reading naming, listening, and orofacial praxis tasks. Recordings were analyzed from a large number of electrodes, in a large number of patients. The researchers conclude that insular activity is evident only post speech onset, indicating that it does not participate in speech planning.

Some critical limitations led the reviewers to question whether the conclusion is really justifiable. First of all, the main conclusion, that insula does not participate in motor planning, stems from a null result, which is always tricky to interpret. (Is this a sensitivity issue? Or perhaps looking in an inappropriate time window?) Second, the imputation of anterior insular activity to FO is qualitative, as are inferences about functional specialization (e.g., anterior vs posterior insula). A more rigorous statistical treatment would help to show that different regions participate in the tasks in different ways.

Another issue is a lack of detail around the acoustic stimuli, and the possible degree to which acoustic (durational?) differences may influence results. Related to this, the time windows analyzed may have a large impact on what is seen. Reviewer #1 mentions the possibility of low-amplitude early activity that may contain important speech-relevant information, and reviewer #2 indicates that the red curves in Figure 2A-D provide a specific example of how the choice of time window changes interpretation – that production responses last longer than perception responses but this could be due to differences in the stimuli between the tasks.

Finally, as reviewer #3 notes, perhaps data from the praxis task could be leveraged more thoroughly to get more information about the timing of speech motor responses, and to inform analysis of data.

Reviewer #1:

This is a well written paper that evaluates the role of anterior and posterior insular cortex, as well as overlying inferior frontal gyrus and STG and CS regions, in speech production and perception, using reading naming, listening, and orofacial praxis tasks. Recordings were analyzed from a large number of electrodes, in a large number of patients. The results appear to demonstrate insular activity only post speech onset, indicating that it does not participate in speech planning.

I have one major concern with this paper as it stands.

First, the time scales that are discussed are exclusively long ones – 200 ms or more. Other iEEG studies (e.g. Brugge et al., 2003, and others from Matt Howard's group in Iowa) demonstrate early evoked auditory activity, within a few tens of ms of stimulation, in addition to the much later population responses discussed here (and evident on whole-brain noninvasive EEG). The amplitude is much lower than the population response that happens later, but the earlier activity is consistent with synaptic conduction delays. The problem is that these earlier, smaller signals might carry a lot of information, and be functionally important. In other words, there may be planning activity at a lower amplitude in insular cortex, early, but it is being missed.

At a minimum this issue of first evoked response (early) vs population activity (late) needs to be discussed, I think.

Reviewer #2:

Woolnough and colleagues present data on the role of the insula in speech. Using multiple speech and non-speech tasks, they report that high-frequency activity in bilateral posterior insula starts after speech onset during both speaking and listening, which they argue may reflect a monitoring process. In contrast, activation in anterior insula is very low amplitude, starting before speech onset. They suggest that this activity may actually reflect the frontal operculum. In comparison to other cortical speech regions, they conclude that posterior insula is involved in monitoring or sensorimotor processing particularly during speech production.

This is a unique data set, combining intracranial recordings from a relatively large cohort of epilepsy patients. The broad question about the role of the insula in speech is extremely important, as they point out that this area is consistently activated across many studies, yet we do not understand its functions. In general, the analyses are statistically robust, and the use of multiple tasks to compare across speech, articulatory, and sensory processing is important.

However, there are several serious problems with the manuscript. I hope that the following comments are helpful to the authors in their revisions of the paper.

Essential revisions:

1) Most of the major claims of the paper are highly speculative. To be clear, the data themselves seem solid, and for the most part, I think it is possible to make novel claims from the analyses that were done. The problem is that there are massive logical leaps required to get from some of the data, e.g.:

"the activity in posterior insula was exclusively post-speech onset".… "There were no differences in amplitude of activation with varying levels of articulation complexity, comparing the simple monosyllabic words from the naming task against the complex reading […] and multisyllabic naming responses"

to the interpretation, e.g.:

"implying a role not as a planning region but possibly as a monitoring region"

The claim about posterior insula as a monitoring region (or is it auditory and somatosensory integration [subsection “Anterior Insula”]?) has no clear support in the data or analyses. It relies on the fact that response differences from baseline do not begin until after acoustic speech onset, and that articulatory complexity (varying across tasks) does not affect response amplitude. I do not deny that these interpretations could be accurate, but I do not see any analyses that explicitly test them in a falsifiable way.

Here's another major example: the claim that any supra-threshold activation in anterior insula actually reflects neural activity in the frontal operculum is based entirely on a qualitative analysis of electrodes in both regions (Figure 5, subsection “Comparing Anterior Insula and Frontal Operculum”). It seems that the authors want to claim that the low amplitude and proximity to FO means that these responses actually come from FO (subsection “Comparing Anterior Insula and Frontal Operculum”). Yet the analysis in Figure 5 seems to actually suggest that the activity between these regions is completely different. One way to actually evaluate this hypothesis is to simply (cross-)correlate the activity between the regions (can you predict FO from AI, and vice versa?), yet no similar analysis was done here.

Overall, the analyses that were done provide some relatively clear results regarding the localization and timing of activity in the (at least posterior) insula during various speech, motor, and sensory tasks. But in my opinion, the interpretations are grossly overextended. In general, this paper seems to be a series of descriptions of activity, many of which are novel. But it's not clear to me how this changes our understanding of either speech production or speech perception networks in the brain, and specifically toward the stated goal of the paper, which is understanding the role of the insula in these networks.

2) In general, the specific hypotheses are hard to follow. For instance, the hypothesis laid out in the Introduction seems to suggest that the FO/AI link shown later in the paper is an a priori hypothesis. Yet the way the Results section is written, it's not clear if the authors truly suspected this was the case, or whether they observed the low amplitude AI activity and then tried to test whether it actually reflects FO. A related example is that the ROIs briefly mentioned in subsection “Comparing Anterior Insula and Frontal Operculum” (and detailed more in the Materials and methods section) do not state that there was a particular reason (possibly other than a briefly mentioned set of fMRI results) to parcellate the insula into anterior/posterior. Thus, I am left confused about both the hypothesis and how it relates to the previous fMRI and lesion literature mentioned in the Introduction.

Another example: the fourth paragraph of subsection “Comparing Anterior Insula and Frontal Operculum”. The paragraph starts by saying the goal is to disambiguate something (what is not clear – is it STG vs insula, or auditory vs sensorimotor?). At the end of the paragraph, a claim is made that STG and posterior insula are functionally separable. The relative amplitude of activation between speaking and listening in both regions does not really disambiguate the role of these regions, it simply shows that one has higher activity than the other in each task (also without noting the extensive literature on suppressed responses during self-produced speech in areas like STG). The analysis in Figure 6 quantifies this in a slightly different way, though ultimately it is the same analysis, and therefore I do not think strengthens the claim.

Similarly, the interpretation of the FO/AI relationship in the Discussion also presents claims that the present manuscript explains lesion data, but without much clarity as to how it rules out the supposed alternatives (subsection “Posterior Insula”; what makes the alternatives "less likely"?).

In general, an approach like linear mixed-effects modeling would greatly improve the ability to understand the functions of these electrodes/regions, rather than just comparing average amplitudes across tasks that may not be perfectly controlled for features that these neurons care about.

3) It is unclear to me why the authors put so much focus on the FO/AI relationship while not testing the same hypothesis for STG/PI. As far as I am aware, these regions have similarly proximate locations, and depending on the trajectories of the depth electrodes, could theoretically pick up the same signals. Here again, the differences in response amplitude between speaking and listening tasks would not truly disambiguate these regions either in terms of shared signals or functional properties. I believe that additional approaches are necessary for making the FO/AI claim (and perhaps also testing the STG/PI claim), including modeling of the BGA spatial spread and/or single pulse stimulation to look for functionally connected electrodes.

4) Why do Figure 4 and Figure 6 only show data from the left hemisphere? Figure 1 and Figure 3 show clear bilateral coverage, and at times, the authors claim that effects are indeed bilateral. However, the summary paragraph at the end of the Discussion section suddenly claims that these effects are left-lateralized.

Reviewer #3:

In "Uncovering the functional anatomy of the insula during speech" Woolnough et al. employ intracranial electrocorticography in a large cohort of intractable epilepsy patients during listening and speaking tasks to describe the role of the insula in speech processing. They find that, despite previous assertions on the role of the insula in speech production, the insula is not involved in pre-articulatory preparation, but is more likely an auditory and somatosensory integration area that is particularly responsive during self-produced speech compared to externally generated speech. I found this paper to be well-written and exciting, since I have not yet seen any ECoG papers directly address the timing of speech responses in the insular cortex and compare them to other speech-responsive areas. The analysis was relatively straightforward, comparing reading, naming, listening, and praxis tasks (orofacial movements) while recording from the posterior and anterior insula, superior temporal gyrus, central sulcus, and inferior frontal gyrus. The authors assert that the posterior insula, in particular the anterior long gyrus (ALG), is not active during pre-articulatory time periods (pre-speech), but is active during listening. More specifically, it is more active during reading and naming as compared to listening to pre-recorded sounds, while superior temporal gyrus shows the opposite pattern. The ALG is also active during orofacial movements, but to a lesser extent than when these movements produce an overt sound (i.e. during object naming or reading). Activity in this area is also dissociated from activity in the frontal operculum, which appears to be motor/preparatory in nature. This result provides an important missing link into how self-monitoring during speech production is mediated by separate auditory circuits, and serves to reconcile disparate findings in noninvasive modalities. I only have a few comments to improve the manuscript.

1) Overall, the use of "superior temporal gyrus" to mean the entire auditory core, belt, and parabelt areas is fine, but I found it distracting at first since different gyri within the superior temporal gyrus may have quite different functional roles. For example, much of the activity recorded in the STG ROI appears to be in the transverse temporal gyrus (Heschl's gyrus), which could be parcellated as a separate area. I do not suggest that the authors need to do this, rather, it would be good to state upfront (e.g. in the Results section, or in any case before the methods section) that this ROI was a large region encompassing primary, secondary, and tertiary auditory cortex.

2) The authors note that ~75% of responses in the naming task were correctly articulated and used the most common word choice. If the number of trials allows, the authors could consider comparing a subset of the ~25% error trials (incorrect articulations) to correct trials to determine whether the posterior insula shows a difference in response magnitude or timing when the utterance is incorrect. If posterior insula is indeed critical for self-monitoring and is implicated in dysarthria, this could provide some key insight into how that process occurs.

3) In Figure 2, the large scale of activity in motor areas and the superior temporal gyrus means that the color scale is blown out. I would suggest scaling each row of panels from -max to +max within an ROI, since the main comparison being made with these is across tasks, while the comparison across areas can be done using panels A-D.

4) Related to point (1), in Figure 3, the authors show the difference in activation for reading and listening to isolate contributions from externally generated vs. self-generated speech. It is difficult to tell the difference between STG, Heschl's gyrus, and regions of the insula. I would suggest that the authors draw an ROI over the regions that they're classifying as each anatomical area.

5) During the listening task, the patients hear words that are presumably not the same as the words that they generate in the reading and naming tasks. To what extent might these acoustic differences influence their results? Do the authors expect that responses during a playback condition should show the same result as their listening task? An analysis of the acoustic properties of the reading/naming vs. listening sounds could be helpful here.

6) Although fewer participants completed the praxis task, it would be helpful to see this task as a similar plot to Figure 2, in a supplemental figure. Currently the praxis task is shown only in Figure 3B for two averaged time points, so it is difficult to know whether the time-course of activity is similar to the other tasks.

[Editors’ note: minor issues and corrections have not been included, so there is not an accompanying Author response.]

In looking over your final copy, please consider the following points and make any changes you think merited.

The reported data provide modest support for the idea that primary anterior insula (AI) activity can be explained by a source in FO, and insula is not involved in pre-articulatory motor planning. The data do show that FO has substantially stronger activity, but that is not the same as showing that there is no activity in AI. The authors also present a cross-correlation analysis showing a double dissociation between activity in posterior and anterior insula. Yet the support for the idea that activity in AI is actually coming from FO rests on the observation of what is a low, albeit significant, correlation (r = 0.11). Phrased another way, FO activity only explains 1.2% of the variance in AI responses. Yet, the amplitude of AI is likely close to the noise floor of the recordings; thus, one may not expect to observe a particularly high correlation coefficient even if FO is the source of AI responses. In the end, this argument is not particularly convincing. The problem is a classic: a negative result is essentially impossible to prove. That said, negative results are important to report and have extraordinary value in combatting publication bias, so should be included in the published paper.
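The 1.2% figure follows from squaring the correlation coefficient, since for a simple linear relationship the proportion of variance explained is r². A short Python check, with a hypothetical function name:

```python
def variance_explained(r):
    """Proportion of variance explained by a linear relationship with correlation r."""
    return r * r


# r = 0.11 is the AI-FO correlation quoted in the letter.
print(f"{variance_explained(0.11):.1%}")
```

Squaring 0.11 gives 0.0121, which is the roughly 1.2% quoted above.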

Regardless of how the results are to be interpreted, the authors should include substantially more information about how the cross-correlations were done. Was it pairwise for each electrode in each region? Or for the most proximal electrodes in each region? Or averaged across electrodes in each region? What was the peak lag, which could help us resolve whether AI activity really reflects FO, (since volume conduction should result in a 0 lag)? Providing such details will help readers understand the results and their significance.
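To make the peak-lag question concrete, the sketch below computes a normalized cross-correlation between two traces over a range of sample lags and reports the lag with the largest absolute correlation. It is an illustration only: the letter does not describe how the authors computed their cross-correlations, and the function names, the lag convention and the pure-Python implementation are assumptions.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def peak_lag(x, y, max_lag):
    """Return (lag, r) with the largest |r| over lags in [-max_lag, max_lag].

    Convention (assumed): a positive lag means y is delayed relative to x.
    """
    best = (0, 0.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]  # pair x[i] with y[i + lag]
        else:
            a, b = x[-lag:], y[:len(y) + lag]  # pair x[i - lag] with y[i]
        r = pearson(a, b)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best
```

Under this convention, two electrodes picking up the same source by volume conduction would be expected to peak at a lag of zero, whereas a peak at a non-zero lag would point toward propagation between distinct sources.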

https://doi.org/10.7554/eLife.53086.sa1

Author response

[Editors’ note: the author responses to the first round of peer review follow.]

Reviewer #1:

I have one major concern with this paper as it stands.

First, the time scales that are discussed are exclusively long ones – 200 ms or more. Other iEEG studies (e.g. Brugge et al., 2003, and others from Matt Howard's group in Iowa) demonstrate early evoked auditory activity, within a few tens of ms of stimulation, in addition to the much later population responses discussed here (and evident on whole-brain noninvasive EEG). The amplitude is much lower than the population response that happens later, but the earlier activity is consistent with synaptic conduction delays. The problem is that these earlier, smaller signals might carry a lot of information, and be functionally important. In other words, there may be planning activity at a lower amplitude in insular cortex, early, but it is being missed.

At a minimum this issue of first evoked response (early) vs population activity (late) needs to be discussed, I think.

We performed a new evoked response analysis, at the reviewer’s behest, looking across our ROIs. We find no evidence of significant pre-articulatory potentials in either insular region. We have added text in the methods relevant to the ERP analysis and have incorporated a new Figure 2—figure supplement 3 in the manuscript.

“To generate event related potentials (ERPs; Figure 2—figure supplement 3) the raw data were band pass filtered (0.1 – 50 Hz). Speech aligned trials were averaged together and the resultant waveform was smoothed (Savitzky-Golay FIR, third order, frame length of 151 ms). Periods of significant activity were determined as described previously. All electrodes were averaged within each subject, within ROI, and then the between subject averages were used.”

Additionally, to help improve the presentation of our time resolved data we have generated a new 4D representation of cortical activation using MEMA, using smaller time windows (150 ms windows, 10 ms center offset) (Video 1), that shows the progression of pre-articulatory activity from IFG to FO but does not result in AI activation.

Reviewer #2:

Essential revisions:

1) Most of the major claims of the paper are highly speculative. To be clear, the data themselves seem solid, and for the most part, I think it is possible to make novel claims from the analyses that were done. The problem is that there are massive logical leaps required to get from some of the data, e.g.:

"the activity in posterior insula was exclusively post-speech onset".… "There were no differences in amplitude of activation with varying levels of articulation complexity, comparing the simple monosyllabic words from the naming task against the complex reading […] and multisyllabic naming responses"

to the interpretation, e.g.:

"implying a role not as a planning region but possibly as a monitoring region"

The claim about posterior insula as a monitoring region (or is it auditory and somatosensory integration [subsection “Anterior Insula”]?) has no clear support in the data or analyses. It relies on the fact that response differences from baseline do not begin until after acoustic speech onset, and that articulatory complexity (varying across tasks) does not affect response amplitude. I do not deny that these interpretations could be accurate, but I do not see any analyses that explicitly test them in a falsifiable way.

Our claims are based on a reasonable extension of the results we have found. However, we concede the reviewer’s point, as the insula may have other functions beyond those assessable by the experimental paradigms presented here. We have therefore eliminated any reference to the role of posterior insula in monitoring in the abstract and in the Results sections. In the discussion, we summarize salient findings and have added this text, so that future efforts can be informed by our perspective.

“In summary, the posterior insula (i) lacks pre-articulatory activity, (ii) lacks complexity sensitivity (Baldo et al., 2011), (iii) is activated by externally produced sounds and (iv) by non-speech mouth movements. Taken together these findings are suggestive of a sensory monitoring region – congruent with the role of the insula in auditory-somatosensory integration (Rodgers et al., 2008) where both auditory and somatosensory activity in rodent insula is maximal during coincident presentation, comparable to what we see during human self-generated speech.”

Here's another major example: the claim that any supra-threshold activation in anterior insula actually reflects neural activity in the frontal operculum is based entirely on a qualitative analysis of electrodes in both regions (Figure 5, subsection “Comparing Anterior Insula and Frontal Operculum”). It seems that the authors want to claim that the low amplitude and proximity to FO means that these responses actually come from FO (subsection “Comparing Anterior Insula and Frontal Operculum”). Yet the analysis in Figure 5 seems to actually suggest that the activity between these regions is completely different. One way to actually evaluate this hypothesis is to simply (cross-)correlate the activity between the regions (can you predict FO from AI, and vice versa?), yet no similar analysis was done here.

Motivated by the reviewers’ comments, we performed a cross-correlation analysis of the gamma band limited voltage trace data from electrode pairs within these adjacent regions (subsection “Comparing Anterior Insula and Frontal Operculum”, subsection “Posterior Insular vs. Superior Temporal Gyrus activity”). As anticipated, we found significant correlation between the signals in AI and FO (r = 0.11, p = 0.008). In contrast, and also as per expectations, in the PI-STG analysis, we found no significant correlation (r = 0.01, p = 0.74). This text has been added to the Results section and Figure 5 and Figure 6 have now been modified as below.

“Due to the oblique trajectories used for sampling the insula, a majority of patients (n = 13) with anterior insular electrodes also had an electrode on the same probe (separation 5.7 ± 2.2 mm) that was localized to frontal operculum (Figure 5C). The band-limited (70–150 Hz) voltage traces at these electrodes were significantly correlated between electrode pairs (r = 0.11 ± 0.03, mean ± SE; Wilcoxon signed-rank, p = 0.008). Also, the population level BGA time courses were highly comparable between the two regions (Figure 5D).

In patients with electrodes in both PI and STG (n = 8) we correlated activity in the closest electrode pair (separation 11.9 ± 2.9 mm) (Figure 6B). In contrast to the AI-FO correlation, band-limited voltage traces in these electrode pairs were not significantly correlated (r = 0.01 ± 0.03, mean ± SE; Wilcoxon signed-rank, p = 0.74).”

Overall, the analyses that were done provide some relatively clear results regarding the localization and timing of activity in the (at least posterior) insula during various speech, motor, and sensory tasks. But in my opinion, the interpretations are grossly overextended. In general, this paper seems to be a series of descriptions of activity, many of which are novel. But it's not clear to me how this changes our understanding of either speech production or speech perception networks in the brain, and specifically toward the stated goal of the paper, which is understanding the role of the insula in these networks.

This study was initially motivated by the prevalent literature that associates speech production and insula, derived from results of lesional (Dronkers, 1996; Marien et al., 2001; Ogar et al., 2006; Itabashi et al., 2016) and functional imaging (Mutschler et al., 2009; Adank, 2012; McGettigan et al., 2013; Ardila et al., 2014; Oh et al., 2014) studies. The Dronkers 1996 paper linking the anterior insula to disruption of speech has now amassed >1300 citations and anterior insula has been included in several high-profile models of speech and language production (Hickok and Poeppel, 2007 (cited 3474 times) and Dehaene, 2009). Further, the role of the insula as a pre-articulatory node has been used as an explanation of apraxia of speech (Ogar et al., 2006; Baldo et al., 2011). Contrary to the predictions of these models (all derived from studies that use techniques without the temporal resolution to confirm this), we do not find significant pre-articulatory activity that originates in the insula. More specifically, we show that the multitude of fMRI studies that have been interpreted to show that the anterior insula generates activity during speech production have misattributed activity from the frontal operculum to the anterior insula.

2) In general, the specific hypotheses are hard to follow. For instance, the hypothesis laid out in the Introduction seems to suggest that the FO/AI link shown later in the paper is an a priori hypothesis. Yet the way the Results section is written, it's not clear if the authors truly suspected this was the case, or whether they observed the low amplitude AI activity and then tried to test whether it actually reflects FO.

We thought we were clear in our statements in the section the reviewer raises. We have included the following in the introduction to clarify our initial hypothesis.

“Here, we performed direct, invasive recordings of cortical activity from multiple sites across the insula in both hemispheres, in patients undergoing seizure localization for intractable epilepsy, testing the theories generated from the existing literature, namely that the insula acts as a pre-articulatory preparatory region.”

In the Introduction, we refer to the IFG as confounding functional imaging and lesional studies of the SPG. The fMRI literature shows speech related activation clusters which are attributed to anterior insula but we believe originate from FO. Lesional studies (Hillis et al., 2004) implicate IFG as the likely alternative to insula as the cause of apraxia of speech. Therefore, we were also concerned about the possibility of it contaminating the signals recorded in insula using sEEG electrodes. Fortunately, γ band signal is highly focal and falls off rapidly with distance. This spatial resolution allowed us to disambiguate the specific roles of AI and FO in our study (Figure 2, Figure 5; Video 1).

A related example is that the ROIs briefly mentioned in subsection “Comparing Anterior Insula and Frontal Operculum” (and detailed more in the Materials and methods section) do not state that there was a particular reason (possibly other than a briefly mentioned set of fMRI results) to parcellate the insula into anterior/posterior. Thus, I am left confused about both the hypothesis and how it relates to the previous fMRI and lesion literature mentioned in the Introduction.

The insula is defined by gross anatomical boundaries between the insular short gyri (anterior) and long gyri (posterior) (Naidich et al., 2004). This division is also supported by the existing literature, which invokes separable functional roles: the anterior insula is presumed to be the primary locus of speech production related activity in functional imaging studies (Mutschler et al., 2009; Adank, 2012; McGettigan et al., 2013; Ardila et al., 2014; Oh et al., 2014). The posterior insula, however, is more strongly linked to somatosensory and nociceptive processing (Stephani et al., 2011; Garcia-Larrea, 2012).

“To compare the timing of activation of other functional regions with the insula, we used ROIs based on known anatomico-functional parcellation of the insula, separating the short gyri (anterior) and long gyri (posterior) (Naidich et al., 2004), and targeted adjacent regions well-established to be involved in speech production, as detailed in the methods: left superior temporal gyrus (STG; primary and secondary auditory cortex), bilateral central sulcus (CS) and left inferior frontal gyrus (IFG) (Figure 1C).”

Another example: the fourth paragraph of subsection “Comparing Anterior Insula and Frontal Operculum”. The paragraph starts by saying the goal is to disambiguate something (what is not clear – is it STG vs insula, or auditory vs sensorimotor?). At the end of the paragraph, a claim is made that STG and posterior insula are functionally separable. The relative amplitude of activation between speaking and listening in both regions does not really disambiguate the role of these regions, it simply shows that one has higher activity than the other in each task (also without noting the extensive literature on suppressed responses during self-produced speech in areas like STG). The analysis in Figure 6 quantifies this in a slightly different way, though ultimately it is the same analysis, and therefore I do not think strengthens the claim.

Our goal in this analysis was to show a functional dissociation between PI and STG. While, as has previously been shown, STG is suppressed during self-generated speech, the posterior insula is not and in fact shows preferential activation. We present this result separately in Figure 2 and Figure 6 to show that the effect can be seen at the single electrode level in each ROI and is not a consequence of combining responses across the region.

We have rewritten the entire subsection “Chronology of Insular Activation” to improve clarity. We are well aware of the speech induced suppression of auditory cortex and have now included references to this literature on the suppression of self-generated sounds in our Discussion section.

“To compare the timing of activation of other functional regions with the insula, we used ROIs based on known anatomico-functional parcellation of the insula, separating the short gyri (anterior) and long gyri (posterior) (Naidich et al., 2004), and targeting adjacent regions well established to be involved in speech production, as detailed in the methods: left superior temporal gyrus (STG – primary and secondary auditory cortex), bilateral central sulcus (CS) and left inferior frontal gyrus (IFG) (Figure 1C). During both reading and naming, activity in these ROIs was as expected (Figure 2). IFG activation began ~750 ms before speech onset, prior to CS activation. CS activity was maximal at speech onset and, shortly after speech onset, STG became active.

The posterior insula was active exclusively after speech onset, implying that it did not play a role in speech planning. There were no differences in the amplitude of activation with varying levels of articulation complexity – simple monosyllabic names vs. complex read words (Wilcoxon signed-rank, 200-600 ms; p=0.083) and multisyllabic naming responses (p=0.898) (Figure 2—figure supplement 1). The only difference observed was a duration effect, with longer articulation times and therefore longer activation duration for multisyllabic words (600-1000 ms; p=0.024). In posterior insula, the timing of responses resembled those of STG very closely, with similar onset, offset and peak activity times. As expected from other studies of auditory cortex, the STG responded more strongly to external speech than to self-generated speech (p=0.002) (Figure 2—figure supplement 1). The posterior insula, however, showed a significantly greater response during speech production than during speech listening (p=0.019) – thus these two adjacent regions are functionally dissociable.

The anterior insula showed a very weak though significant activation in both speech articulation tasks, starting shortly after the IFG. This low amplitude response first became significant around 150 ms before speech onset and remained active for the duration of speech, concurrent with the posterior insula and STG. This small but reliable response could represent a low magnitude local processing, but could also represent activity from an active adjacent region, such as the frontal operculum which overlies the insula.”

Discussion section

“This activation profile in PI is the opposite of STG. It is well known that auditory cortex is suppressed during self-generated speech (Creutzfeldt et al., 1989; Paus et al., 1996; Numminen et al., 1999; Chan et al., 2014) and self-generated sounds (Rummell et al., 2016; Singla et al., 2017), likely as a result of interactions between auditory and non-auditory sensory feedback in auditory cortex. While STG is more active during externally produced speech than self-generated speech, posterior insular activity does not suppress in response to articulation and rather, is more active during self-generated speech.”

Similarly, the interpretation of the FO/AI relationship in the Discussion also presents claims that the present manuscript explains lesion data, but without much clarity as to how it rules out the supposed alternatives (subsection “Posterior Insula”; what makes the alternatives "less likely"?).

That is not exactly what we have said. Lesional analyses of apraxia of speech attribute the problem to (i) disruption of pre-articulatory planning in insula, (ii) disruption of pre-articulatory planning in IFG or (iii) impairment of audio-motor integration. Our study provides the first clear evidence to rule out the first possibility and instead invokes IFG or FO disruption. We have made modifications to the relevant paragraph to prevent any misinterpretations.

“Lesional analyses of apraxia of speech attribute the primary cause to either (i) disruption of pre-articulatory planning in the superior left posterior short gyrus (also called the precentral gyrus of the insula), (ii) disruption of pre-articulatory planning in IFG or (iii) impairment of audio-motor integration. Our study provides evidence to rule out the first possibility. Given the lack of pre-articulatory activity shown in this study and the lack of any relationship of activation to the complexity of articulation, it is unlikely that lesions of this region are crucial for AOS (Kent, 2000; Baldo et al., 2011). While our results are suggestive of audio-motor integration in the ALG, we do not have direct evidence of this function (Kent and Rosenbek, 1983; Rogers et al., 1996; Maas et al., 2015). Thus, our findings best support AOS representing a disruption of the IFG (Hillis et al., 2004; Fedorenko et al., 2015).”

3) It is unclear to me why the authors put so much focus on the FO/AI relationship while not testing the same hypothesis for STG/PI. As far as I am aware, these regions have similarly proximate locations, and depending on the trajectories of the depth electrodes, could theoretically pick up the same signals. Here again, the differences in response amplitude between speaking and listening tasks would not truly disambiguate these regions either in terms of shared signals or functional properties. I believe that additional approaches are necessary for making the FO/AI claim (and perhaps also testing the STG/PI claim), including modeling of the BGA spatial spread and/or single pulse stimulation to look for functionally connected electrodes.

We focused on the AI and the FO given the principal role attributed to the AI in prior explanations of speech apraxia. We have now performed and included a cross-correlation analysis of the voltage trace data from electrode pairs within these abutting regions (subsection “Comparing Anterior Insula and Frontal Operculum”, subsection “Posterior Insular vs. Superior Temporal Gyrus activity”). In the AI-FO comparison there is a significant correlation between the signals, and this is not seen in the PI-STG comparison.

“Due to the oblique trajectories used for sampling the insula, the majority of patients (n=13) with anterior insula electrodes had a nearby electrode on the same probe (separation 5.7 ± 2.2 mm) that was localized to frontal operculum (Figure 5C). The band-limited (70-150 Hz) voltage traces of these electrodes were significantly correlated between the electrode pairs (r = 0.11 ± 0.03, mean ± SE; Wilcoxon signed-rank, p = 0.008). Also, the population level BGA time courses were highly comparable between the two regions (Figure 5D).”

“In patients with electrodes in both PI and STG we took the closest electrode pair (separation 11.9 ± 2.9 mm) (Figure 6B). The band-limited voltage traces in these electrode pairs were not significantly correlated (r = 0.01 ± 0.03, mean ± SE; Wilcoxon signed-rank, p = 0.74).”
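
As an illustration of the pairwise analysis quoted above, the following is a minimal Python sketch rather than the pipeline used in the study. It assumes the voltage traces have already been band-limited to 70-150 Hz, uses a zero-lag Pearson correlation per electrode pair as a stand-in for the correlation measure, and tests the per-pair coefficients against zero with an exact two-sided Wilcoxon signed-rank test. The function names (`pearson_r`, `mean_and_se`, `signed_rank_p`) and the synthetic data are hypothetical and introduced only for this example.

```python
import math
import random
from itertools import product


def pearson_r(x, y):
    """Zero-lag Pearson correlation between two equal-length traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_and_se(values):
    """Mean and standard error of the mean (sample standard deviation / sqrt(n))."""
    n = len(values)
    m = sum(values) / n
    var = sum((v - m) ** 2 for v in values) / (n - 1)
    return m, math.sqrt(var / n)


def signed_rank_p(values):
    """Exact two-sided Wilcoxon signed-rank p-value against a median of zero.

    Zeros are dropped and tied magnitudes share their average rank.
    All 2**n sign assignments are enumerated, so this suits small n only.
    """
    d = [v for v in values if v != 0]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank of the tie group
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    observed = abs(w_plus - total / 2)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - total / 2) >= observed - 1e-12:
            extreme += 1
    return extreme / 2 ** n


if __name__ == "__main__":
    # Synthetic stand-in for 13 electrode pairs sharing a weak common signal.
    random.seed(0)
    pair_r = []
    for _ in range(13):
        shared = [random.gauss(0, 1) for _ in range(500)]
        a = [s + random.gauss(0, 3) for s in shared]
        b = [s + random.gauss(0, 3) for s in shared]
        pair_r.append(pearson_r(a, b))
    m, se = mean_and_se(pair_r)
    print(f"r = {m:.2f} ± {se:.2f}, p = {signed_rank_p(pair_r):.3f}")
```

The exact enumeration is chosen here because the number of pairs is small (13 and 8 in the two comparisons); for larger samples a normal approximation or a library routine would be the more usual choice.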

4) Why do Figure 4 and Figure 6 only show data from the left hemisphere? Figure 1 and Figure 3 show clear bilateral coverage, and at times, the authors claim that effects are indeed bilateral. However, the summary paragraph at the end of the Discussion section suddenly claims that these effects are left-lateralized.

We had not included the right hemisphere data because we did not have sufficiently broad coverage of the right hemisphere to run a meaningful MEMA analysis. Specifically, the MEMA analysis generally requires coverage from a minimum of 3 patients in any given region. In deference to the reviewer’s request, we have also represented right hemisphere data in Figure 6, and the ambiguity in the summary paragraph has been corrected.

“In summary, we find that the insula does not serve pre-articulatory preparatory roles, and that bilateral posterior insular cortices may function as auditory and somatosensory integration or monitoring regions.”

Reviewer #3:

1) Overall, the use of "superior temporal gyrus" to mean the entire auditory core, belt, and parabelt areas is fine, but I found it distracting at first since different gyri within the superior temporal gyrus may have quite different functional roles. For example, much of the activity recorded in the STG ROI appears to be in the transverse temporal gyrus (Heschl's gyrus), which could be parcellated as a separate area. I do not suggest that the authors need to do this, rather, it would be good to state upfront (e.g. in the Results section, or in any case before the methods section) that this ROI was a large region encompassing primary, secondary, and tertiary auditory cortex.

We have now added this distinction to our Results section:

“left superior temporal gyrus (STG; primary and secondary auditory cortex)”

2) The authors note that ~75% of responses in the naming task were correctly articulated and used the most common word choice. If the number of trials allows, the authors could consider comparing a subset of the ~25% error trials (incorrect articulations) to correct trials to determine whether the posterior insula shows a difference in response magnitude or timing when the utterance is incorrect. If posterior insula is indeed critical for self-monitoring and is implicated in dysarthria, this could provide some key insight into how that process occurs.

As we state in the Materials and methods section, patients were not constrained in their answers during naming trials. To ensure that specific articulations were mono- or multisyllabic, we analyzed only trials in which the answer was the expected word for a given stimulus. More than 95% of the responses were correctly articulated, but in ~25% of these the word chosen was not the one most commonly associated with the presented visual stimulus (e.g. bird instead of pelican). Thus, for reasonable grouping of data and population-level analyses, we excluded trials with these variations in response.

3) In Figure 2, the large scale of activity in motor areas and the superior temporal gyrus means that the color scale is blown out. I would suggest scaling each row of panels from -max to +max within an ROI, since the main comparison being made with these is across tasks, while the comparison across areas can be done using panels A-D.

STG and CS have been rescaled for better visualization.

4) Related to point (1), in Figure 3, the authors show the difference in activation for reading and listening to isolate contributions from externally generated vs. self-generated speech. It is difficult to tell the difference between STG, Heschl's gyrus, and regions of the insula. I would suggest that the authors draw an ROI over the regions that they're classifying as each anatomical area.

In Figure 3 we are purely showing the insula, with STG and Heschl’s gyrus hidden as they would usually be obscuring the insula in this representation.

5) During the listening task, the patients hear words that are presumably not the same as the words that they generate in the reading and naming tasks. To what extent might these acoustic differences influence their results? Do the authors expect that responses during a playback condition should show the same result as their listening task? An analysis of the acoustic properties of the reading/naming vs. listening sounds could be helpful here.

Within both reading and listening tasks we have a phonologically diverse range of stimuli. Reading consisted of 60 unique words and listening contained 72 sentences with varying sentence structure and initial word phonemes. There may be variations in posterior insular activity driven by phonological effects, but this does not detract from our principal findings and is beyond the scope of this study.

6) Although fewer participants completed the praxis task, it would be helpful to see this task as a similar plot to Figure 2, in a supplemental figure. Currently the praxis task is shown only in Figure 3B for two averaged time points, so it is difficult to know whether the time-course of activity is similar to the other tasks.

In deference to the reviewer’s wishes, we performed a subgroup analysis using only those patients who performed the praxis task (n = 8). We depict these results as Figure 2—figure supplement 2.

“Further, in the praxis task we observed significant activation in posterior insula where none was seen in STG (Figure 2—figure supplement 2).”

https://doi.org/10.7554/eLife.53086.sa2

Article and author information

Author details

  1. Oscar Woolnough

    1. Vivian L. Smith Department of Neurosurgery, McGovern Medical School at UT Health Houston, Houston, United States
    2. Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-5878-6865
  2. Kiefer James Forseth

    1. Vivian L. Smith Department of Neurosurgery, McGovern Medical School at UT Health Houston, Houston, United States
    2. Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, United States
    Contribution
    Data curation, Software, Investigation
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-1624-8329
  3. Patrick Sarahan Rollo

    1. Vivian L. Smith Department of Neurosurgery, McGovern Medical School at UT Health Houston, Houston, United States
    2. Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, United States
    Contribution
    Data curation, Investigation
    Competing interests
    No competing interests declared
  4. Nitin Tandon

    1. Vivian L. Smith Department of Neurosurgery, McGovern Medical School at UT Health Houston, Houston, United States
    2. Texas Institute for Restorative Neurotechnologies, University of Texas Health Science Center at Houston, Houston, United States
    3. Memorial Hermann Hospital, Texas Medical Center, Houston, United States
    Contribution
    Conceptualization, Supervision, Methodology, Project administration
    For correspondence
    Nitin.Tandon@uth.tmc.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2752-2365

Funding

National Institute on Deafness and Other Communication Disorders (DC014589)

  • Oscar Woolnough
  • Kiefer James Forseth
  • Patrick Sarahan Rollo
  • Nitin Tandon

National Institute of Neurological Disorders and Stroke (NS098981)

  • Oscar Woolnough
  • Kiefer James Forseth
  • Patrick Sarahan Rollo
  • Nitin Tandon

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Vitoria Piai and Eleonora Bartoli for assistance with stimulus design and for input into earlier versions of the manuscript. We express our gratitude to all the patients who participated in this study; the neurologists at the Texas Comprehensive Epilepsy Program who participated in the care of these patients; and the nurses and technicians in the Epilepsy Monitoring Unit at Memorial Hermann Hospital who helped make this research possible. This work was supported by the National Institute on Deafness and Other Communication Disorders (DC014589) and the National Institute of Neurological Disorders and Stroke (NS098981).

Ethics

Human subjects: Patients participated in the experiments after written informed consent was obtained. All experimental procedures were reviewed and approved by the Committee for the Protection of Human Subjects (CPHS) of the University of Texas Health Science Center at Houston as Protocol Number: HSC-MS-06-0385.

Senior and Reviewing Editor

  1. Barbara G Shinn-Cunningham, Carnegie Mellon University, United States

Publication history

  1. Received: October 27, 2019
  2. Accepted: December 12, 2019
  3. Accepted Manuscript published: December 19, 2019 (version 1)
  4. Version of Record published: January 3, 2020 (version 2)

Copyright

© 2019, Woolnough et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


