1. Neuroscience
Download icon

Neuronal populations in the occipital cortex of the blind synchronize to the temporal dynamics of speech

  1. Markus Johannes Van Ackeren
  2. Francesca M Barbero
  3. Stefania Mattioni
  4. Roberto Bottini
  5. Olivier Collignon  Is a corresponding author
  1. University of Trento, Italy
  2. University of Louvain, Belgium
Research Article
  • Cited 3
  • Views 1,206
  • Annotations
Cite this article as: eLife 2018;7:e31640 doi: 10.7554/eLife.31640

Abstract

The occipital cortex of early blind individuals (EB) activates during speech processing, challenging the notion of a hard-wired neurobiology of language. But, at what stage of speech processing do occipital regions participate in EB? Here we demonstrate that parieto-occipital regions in EB enhance their synchronization to acoustic fluctuations in human speech in the theta-range (corresponding to syllabic rate), irrespective of speech intelligibility. Crucially, enhanced synchronization to the intelligibility of speech was selectively observed in primary visual cortex in EB, suggesting that this region is at the interface between speech perception and comprehension. Moreover, EB showed overall enhanced functional connectivity between temporal and occipital cortices that are sensitive to speech intelligibility and altered directionality when compared to the sighted group. These findings suggest that the occipital cortex of the blind adopts an architecture that allows the tracking of speech material, and therefore does not fully abstract from the reorganized sensory inputs it receives.

https://doi.org/10.7554/eLife.31640.001

Introduction

The human cortex comprises a number of specialized units, functionally tuned to specific types of information. How this functional architecture emerges, persists, and develops throughout a person’s life are among the most challenging and exciting questions in neuroscience research. Although there is little debate that both genetic and environmental influences affect brain development, it is currently not known how these two factors shape the functional architecture of the cortex. A key topic in this debate is the organization of the human language system. Language is commonly thought to engage a well-known network of regions around the lateral sulcus. The consistency of this functional mapping across individuals and its presence early in development are remarkable, and often used to argue that the neurobiological organization of the human language system is the result of innate constraints (Dehaene-Lambertz et al., 2006; Berwick et al., 2013). Does the existence of a highly consistent set of regions for language acquisition and processing imply that this network is ‘hardwired’ and immutable to experience? Strong nativist theories for linguistic innateness leave little room for plasticity due to experience (Bates, 1999), suggesting that we should conceive ‘the growth of language as analogous to the development of a bodily organ’ (Chomsky, 1976, p.11). However, studies in infants born with extensive damage to cortical regions that are typically involved in language processing may develop normal language abilities, thereby demonstrating that the language network is subject to reorganization (Bates, 2005). Perhaps the most intriguing demonstrations to show that the neurobiology of language is susceptible to change due to experiencecome from studies showing functional selectivity to language in primary and secondary ‘visual’ areas in congenitally blind individuals (Röder et al., 2002; Burton, 2003; Amedi et al., 2004; Bedny et al., 2011; Arnaud et al., 2013). Such reorganization of the language network is particularly fascinating because it arises in the absence of injury to the core language network (Bates, 2005; Atilgan et al., 2017).

However, the level at which the occipital cortex is involved in speech representation in the early blind (EB), remains poorly understood. Speech comprehension requires that the brain extracts meaning from the acoustic features of sounds (de Heer et al., 2017). Although several neuroimaging studies have yielded valuable insights about the processing of speech in EB adults (Arnaud et al., 2013; Bedny et al., 2011; Büchel, 2003; Lane et al., 2015; Röder et al., 2002) and infants (Bedny et al., 2015), these methods do not adequately capture the fast and continuous nature of speech processing. Because speech unfolds over time, understanding spoken language relies on the ability to track the incoming acoustic signal in near real-time (Peelle and Davis, 2012). Indeed, speech is a fluctuating acoustic signal that rhythmically excites neuronal populations in the brain (Poeppel et al., 2008; Gross et al., 2013; Peelle et al., 2013). Several studies have demonstrated that neuronal populations in auditory areas entrain to the acoustic fluctuations that are present in human speech around the syllabic rate (Luo and Poeppel, 2007; Kayser et al., 2009; Szymanski et al., 2011; Zoefel and VanRullen, 2015). It has therefore been suggested that entrainment reflects a key mechanism underlying hearing by facilitating the parsing of individual syllables through adjusting the sensory gain relative to fluctuations in the acoustic energy (Giraud and Poeppel, 2012; Peelle and Davis, 2012; Ding and Simon, 2014). Crucially, because some regions that track the specific acoustic rhythm of speech are sensitive to speech intelligibility, neural synchronization is not only driven by changes in the acoustic cue of the auditory stimuli, but also reflects cortical encoding and processing of the auditory signal (Peelle et al., 2013; Ding and Simon, 2014). Speech tracking is therefore an invaluable tool to probe regions that interface speech perception and comprehension (Poeppel et al., 2008; Gross et al., 2013; Peelle et al., 2013).

Does the occipital cortex of EB people synchronize to speech rhythm? Is this putative synchronization of neural activity to speech influenced by comprehension? Addressing these questions would provide novel insights into the functional organization of speech processing rooted in the occipital cortex of EB people. In the current study, we investigated whether neuronal populations in blind occipital cortex synchronize to rhythmic dynamics of speech, by relating the amplitude fluctuations in speech to electromagnetic dynamics recorded from the participant’s brain. To this end, we quantified the local brain activity and directed connectivity in a group of early blind (EB; n = 17) and sighted individuals (SI; n = 16) using magnetoencephalography (MEG) while participants listened to short narrations from audiobooks. If the occipital cortex of the blind entrains to speech rhythms, this will support the idea that this region processes low-level acoustic features relevant for understanding language. We further tested whether the putative synchronization of occipital responses in EB people benefits from linguistic information or only relates to acoustic information. To separate linguistic and acoustic processes, we relied on a noise-vocoding manipulation. This method spectrally distorted the speech signal in order to impair intelligibility gradually, but systematically preserves the slow amplitude fluctuations responsible for speech rhythm (Shannon et al., 1995; Peelle and Davis, 2012). Furthermore, to go beyond differences in the local encoding of speech rhythm in the occipital cortex, we also investigated whether the connectivity between occipital and temporal regions sensitive to speech comprehension is altered in early blindness.

Results

Story comprehension

Participants listened to either natural speech segments (nat-condition) or to altered vocoded versions of these segments (see Materials and methods for details). In the 8-channel vocoded condition, the voice of the speaker is highly distorted but the intelligibility is unperturbed. By contrast, in the 1-channel vocoded condition, the speech is entirely unintelligible. After listening to each speech segment, participants were provided with a short statement about the segment, and asked to indicate whether the statement was true or false. Behavioural performance on these comprehension statements was analysed using linear mixed-effects models with maximum likelihood estimation. This method is a linear regression that takes into account dependencies in the data, as present in repeated measures designs. Blindness and intelligibility were included as fixed effects, while subject was modelled as a random effect. Intelligibility was nested in subjects. Intelligibility had a significant effect on story comprehension (χ(2)=110.7, p<0.001). The effect of blindness and the interaction between intelligibility and blindness were non-significant (χ(1)=1.14, p=0.286 and χ(2)=0.24, p=0.889). Orthogonal contrasts demonstrated that speech comprehension was stronger in the nat and 8-channel condition versus the 1-channel condition (b = 0.78, t(62)=14.43, p<0.001, r = 0.88). There was no difference between the nat and the 8-channel condition (b = 0.02, t(62)=1.39, p=0.17). Thus, speech intelligibility was reduced in the 1-channel, but not the 8-channel vocoded condition (Figure 1D). The lack of effect for the factor blindness suggests that there is no evidence for a potential difference in comprehension, attention or motivation between groups. Descriptive statistics of the group, and condition means are depicted in Table 1.

Table 1
Proportion of correct responses for the story comprehension questions.

Depicted are the number of subjects (N) in the early blind (EB) and sighted individuals (SI) groups, the mean (M), and the standard error of the mean (SE) for each of the three conditions (Natural, 8-channel, 1-channel).

https://doi.org/10.7554/eLife.31640.002
Natural8-channel1-channel
GroupNMSEMSEMSE
 EB170.8990.0190.8530.0310.5590.027
 SI150.9140.0200.8900.0250.5760.034
Cerebro-acoustic coherence in EB and SI.

(A) Segments from each condition (nat, 8-channel, and 1-channel) in the time domain (see also Video 1). Overlaid is the amplitude envelope of each condition. (B) Spectrogram of the speech samples from (A). The effect of spectral vocoding on the fine spectral detail in all three conditions. (C) The superimposed amplitude envelopes of the samples for each condition are highly similar, despite distortions in the fine spectral detail. (D) Behavioural performance on the comprehension statements reveals that comprehension is unperturbed in the nat and 8-channel conditions, whereas the 1-channel condition elicits chance performance. (E) Coherence spectrogram extracted from bilateral temporal sensors reveals a peak in cerebro-acoustic coherence at 7 Hz, across groups and conditions. The shaded area depicts the frequency range of interest in the current study. The topography shows the spatial extent of coherence values in the range 4–9 Hz. Enhanced coherence is observed selectively in bilateral temporal sensors. (F) Source reconstruction of the raw coherence confirms that cerebro-acoustic coherence is strongest in the vicinity of auditory cortex, bilaterally, extending into superior and middle temporal gyrus. (G) Statistically thresholded maps for the contrast between natural and 1-channel vocoded speech show an effect of intelligibility (p<0.05, FWE-corrected) in right STG. (H) Enhanced envelope tracking is observed for EB versus SI in a bilateral parieto-occipital network along the medial wall centred on Precuneus (p<0.05, FWE-corrected). (I) The statistical map shows the interaction effect between blindness and intelligibility: Early blind (EB) individuals show enhanced synchronization during intelligible (nat) versus non-intelligible speech (1-channel) as compared to SI in right calcarine sulcus (CS) (p<0.005, uncorrected). (J) Boxplots for three regions identified in the whole-brain analysis (top panel, STG; middle panel, parieto-occipital cortex; bottom panel, calcarine sulcus).

https://doi.org/10.7554/eLife.31640.003

Tracking intelligible speech in blind and sighted temporal cortex

The overall coherence spectrum, which highlights the relationship between the amplitude envelope of speech and the signal recorded from the brain, was maximal over temporal sensors between 6 Hz and 7 Hz (Figure 1E). This first analysis was performed on the combined dataset, and hence is not prone to bias or circularity for subsequent analyses, targeting group differences. The peak in the current study is slightly higher than those reported in previous studies (Gross et al., 2013). A likely explanation for this shift is the difference in syllabic rate between English (~6.2 Hz), used in previous studies, and Italian (~7 Hz) (Pellegrino et al., 2011). The syllabic rate is the main carrier of amplitude fluctuations in speech, and thus is most prone to reset oscillatory activity.

To capture the temporal scale of cerebro-acoustic coherence effects (6–7 Hz) optimally, and to achieve a robust source estimate, source reconstruction was performed on two separate frequency windows (6 ± 2, and 7 ± 2 Hz) for each subject and condition. The two source images were averaged for subsequent analysis, yielding a single image representing the frequency range 4–9 Hz. By combining the two frequency-bands, we acquire a source estimate that emphasizes the center of our frequency band of interest (6–7 Hz) and tapers off towards the edges. This source estimate optimally represents the coherence spectrum observed in sensor space. The choice of the center frequency was also restricted by the length of the time window used for the analysis. That is, with a 1 s time window and a resulting 1 Hz frequency resolution, a non-integer center frequency at, for example, 6.5 Hz was not feasible. The source-reconstructed coherence at the frequency-range of interest confirmed that cerebro-acoustic coherence was strongest across groups and conditions in bilateral temporal lobes, including primary auditory cortex. (Figure 1F).

To test whether envelope tracking in the current study is modulated by intelligibility, we compared the coherence maps for the intelligible (nat) versus non-intelligible (1-channel) condition, in all groups combined (i.e., EB and SI), with dependent-samples permutation t-tests in SPM. The resulting statistical map (Figure 1G, p<0.05, FWE-corrected) revealed a cluster in right superior and middle temporal cortex (STG, MTG), where synchronization to the envelope of speech was stronger when participants had access to the content of the story. In other words, the effect of intelligibility on the sensory response (coherence with the speech envelope) suggests an interface between acoustic speech processing and comprehension in STG that is present in both groups.

Speech tracking in blind parieto-occipital cortex

Whether EB individuals recruit additional neural substrate for tracking the envelope of speech was tested using independent-samples permutation t-tests in SPM, contrasting coherence maps between EB and SI for all three conditions combined. The statistical maps (Figure 1H, p<0.05, FWE-corrected) revealed enhanced coherence in EB versus SI in parieto-occipital cortex along the medial wall, centered on bilateral Precuneus, branching more extensively into the right hemisphere (see Table 2). This main effect of group highlights that neuronal populations in blind parieto-occipital cortex show enhanced synchronization to the acoustic speech signal. That is, blind participants show a stronger sensitivity to the sensory properties of the external speech stimulus at this level.

Table 2
MNI (Montreal Neurological Institute) coordinates and p-values for the three contrasts tested with threshold-free cluster enhancement (TFCE).
https://doi.org/10.7554/eLife.31640.004
MNI-coordinates (mm)
p-value (FWE-cor)xyz
Intelligibility
 Superior temporal gyrus0.02764−160
Blindness
Posterior cingulate0.0308−4040
 Postcentral sulcus0.03316−3264
Precuneus0.033-8−6448
Blindness x intelligibility
 Calcarine sulcus0.00716−880

Finally, to investigate whether and where envelope tracking is sensitive to top-down predictions during intelligible speech comprehension, we subtracted the unintelligible (1-channel) from the intelligible (nat) condition, and computed independent-samples t-tests between groups. As highlighted in the introduction, we were particularly interested in the role played by the Calcarine sulcus (V1) in processing semantic attributes of speech (Burton et al., 2002; Röder et al., 2002; Amedi et al., 2003), and therefore we restricted the statistical analysis to the area around the primary visual cortex. The search volume was constructed from four 10 mm spheres around coordinates in bilateral Calcarine sulcus ([−7,–81, −3]; [−6,–85, 4]; [12, -87, 0; 10, –84, 6]). The coordinates were extracted from a previous study on speech comprehension in EB (Burton et al., 2002). The resulting mask also include regions highlighted in similar studies by other groups (Röder et al., 2002; Bedny et al., 2011).

A significant effect of the interaction between blindness and intelligibility [(natEB – 1-channelEB) – (natSI – 1-channelSI)] was observed in right calcarine sulcus. To explore the spatial specificity of the effect, we show the whole-brain statistical map for the interaction between intelligibility and blindness at a more liberal threshold (p<0.005, uncorrected) in Figure 1I. This shows that intelligible speech selectively engages the area around the right calcarine sulcus in the blind versus the sighted. Specifically, the region corresponding to right ‘visual’ cortex showed enhanced sensitivity to intelligible speech in EB versus SI (p<0.05, FWE-corrected). However, the analysis contrasting the two intelligible conditions (nat and 8-channel) did not yield a significant effect in the EB, suggesting that the low-level degrading of the stimulus alone does not drive the effect, but rather the intelligibility of the speech segment. Follow-up post-hoc comparisons between the two groups revealed that coherence was stronger during the intelligible condition for EB versus SI (t[30] = 3.09, p=0.004), but not during the unintelligible condition (t[30] = –1.08, p=0.29).

A group difference between EB and SI across conditions was not observed in the calcarine region (t[30] = 1.1, p=0.28). The lack of an overall group effect in calcarine sulcus suggests that there is not a simple enhanced sensory response to speech in the blind. This was different to the response we observed in parietal cortex, in which synchronization was stronger in the blind than in the sighted. Rather, the response in calcarine sulcus only differed between the two groups when speech was intelligible. While overall coherence with the speech envelope was equally high in EB and SI, the blind population showed significantly higher coherence in the intelligible speech condition than did the sighted, who showed higher cerebro-acoustic coherence in the unintelligible condition (1-Chan). The latter is reminiscent of the fact that the more adverse the listening condition (low signal-to-noise-ratio or audio-visual incongruence), the more the visual cortex is entrained to the visual speech signal of actual acoustic speech when presented together with varying levels of acoustic noise (Park et al., 2016; Giordano et al., 2017). Moreover, Giordano et al. (2017) showed an increase of directed connectivity between superior frontal regions and visual cortex under the most challenging (acoustic noise and uninformative visual cues) conditions, again suggesting a link between the reorganization observed in the occipital cortex of blind individuals and typical multisensory pathways involving the occipital cortex in audio-visual speech processing (Kayser et al., 2008). Visual deprivation since birth, however, triggers a functional reorganization of the calcarine region that can then dynamically interact with the intelligibility of the speech signal. For illustration purposes, cerebro-acoustic coherence from functional peak locations for intelligibility (across groups) in STG, blindness in the parieto-occipital cortex (Figure 1G; POC), and the interaction are represented as boxplots in Figure 1J.

Changes in occipito-temporal connectivity in the EB

To further investigate whether cerebro-acoustic peaks in CS and STG which are sensitive to intelligible speech in EB are indicative of a more general re-organization of the network, we conducted functional connectivity analysis. Statistical analysis of the connectivity estimates was performed on the mean phase-locking value in the theta (4–8 Hz) range.

Using linear mixed-effects models, blindness (EB, SI), intelligibility (nat, 1-channel), and the interaction between blindness and intelligibility were added to the model in a hierarchical fashion. Blindness and intelligibility were modelled as fixed effects, while subject was a random effect. Intelligibility was nested in subject. A main effect was observed only for blindness (χ[1] = 4.32, p=0.038) and was caused by greater connectivity for EB versus SI (Figure 2A–B). The main effects of intelligibility and of the interaction between intelligibility and blindnesswere non-significant (p=0.241 and p=0.716, respectively).

Occipital-temporal connectivity in EB and SI.

(A) Spectra illustrate the phase locking between CS and STG in EB (dark curve) and in SI (light curve). Shaded areas illustrate the SEM. A difference between the curves is observed in the theta range. (B) Boxplots depict the mean phase locking between CS and STG for EB (black) and SI (white) in the theta range. Connectivity is enhanced in EB versus SI. (C) Directional connectivity analysis using the phase-slope index (PSI). Positive values suggest enhanced directionality from STG to CS, and negative values represent enhanced connectivity in the opposite direction. The boxplots highlight the strong feed-forward drive from CS to STG shown by SI, whereas blind individuals show a more balanced pattern, with a non-significant trend in the opposite direction.

https://doi.org/10.7554/eLife.31640.005

Subsequently, linear mixed-effects models were applied to test for the effects of blindness and intelligibility on the directional information flow (PSI) between CS and STG. The fixed and random effects structure was the same as that described in the previous analysis. Here, only the main effect of blindness was significant (χ[1] = 4.54, p=0.033), indicating that the directional information flow differs between groups. The effect of intelligibility and the interaction between intelligibility and blindness were both non-significant (p=0.51 and p=0.377, respectively). Follow-up post-hoc one-sample t-tests on the phase slope estimates for each group individually revealed a significant direction bias from CS to STG for SI (t[14] = –2.22, p=0.044). No directional bias was found for EB (t[16] = 0.74, p=0.47). As depicted in Figure 2C–D, these results suggest that CS projects predominantly to STG in SI, whereas in EB, this interaction is more balanced and trending in the opposite direction.

Discussion

To test whether neuronal populations in the early blind’s occipital cortex synchronize to the temporal dynamics present in human speech, we took advantage of the statistical and conceptual power offered by correlating magnetoencephalographic recordings with the envelope of naturalistic continuous speech. Using source-localized MEG activity, we confirm and extend previous research (Scott et al., 2000; Davis and Johnsrude, 2003; Narain et al., 2003; Rodd et al., 2005, 2010; Okada et al., 2010; Peelle et al., 2010) by showing that temporal brain regions entrain to the natural rhythm of speech signal in both blind and sighted groups (Figure 1F). Strikingly, we show that following early visual deprivation, occipito-parietal regions enhance their synchronization to the acoustic rhythm of speech (Figure 1G) independently of language content. That is, the region around the Precuneus/Cuneus area seems to be involved in low-level processing of acoustic information alone. Previous studies have demonstrated that this area enhances its response to auditory information after early, but not late, visual deprivation (Collignon et al., 2013) and maintains this elevated response to sound even after sight-restoration early in life (Collignon et al., 2015). These studies did not find that this crossmodal response was related to a specific cognitive function, but rather pointed to a more general response to sounds (Collignon et al., 2013, 2015). Interestingly, the current data suggest that this area is capable of parsing complex temporal patterns in sounds but is insensitive to the intelligibility of speech, again suggesting a more general role in sound processing.

In order to isolate the brain regions that are modulated by speech comprehension, we identified regions showing a greater response to amplitude modulations that convey speech information as compared to amplitude modulations that do not. In addition to the right STG observed in both groups (Figure 1H), the blind showed enhanced cerebro-acoustic coherence during intelligible speech in the vicinity of calcarine sulcus (V1; Figure 1I). This pattern of local encoding was accompanied by enhanced occipito-temporal connectivity during speech comprehension in EB as compared to SI. SI show the expected feed-forward projections from occipital to temporal regions (Lamme et al., 1998), whereas EB show a more balanced connectivity profile, trending towards the reverse temporal to occipital direction (see Figure 2). These findings support the idea of a reverse hierarchical model (Büchel, 2003) of the occipital cortex in EB, where the regions typically coding for ‘low-level’ visual features in the sighted (e.g. visual contrast or orientation) participate in higher-level function (e.g. speech intelligibility). Indeed, previous studies have found increased activity in the primary ‘visual’ cortex of EB people during Braille reading (Sadato et al., 1996; Burton et al., 2002, 2012), verbal memory and verb generation tasks (Amedi et al., 2003), and during auditory language-related processing (Bedny et al., 2011). In line with our results, activity in primary occipital regions in EB people is stronger in a semantic versus a phonologic task (Burton et al., 2003), and vary as a function of syntactic and semantic complexity (Röder et al., 2002; Bedny et al., 2011; Lane et al., 2015). Moreover, repetitive transcranial magnetic stimulation (rTMS) over the occipital pole induces more semantic errors than phonologic errors in a verb-generation task in EB people (Amedi et al., 2003). As we show that occipital regions entrain to the envelope of speech and are enhanced by its intelligibility, our results clearly suggest that the involvement of the occipital pole for language is not fully abstracted from sensory inputs as previously suggested (Bedny, 2017). The role of the occipital cortex in tracking the flow of speech in EB people may constitute an adaptive strategy to boost perceptual sensitivity at informational peaks in language.

Mechanistic origins of envelope tracking in occipital cortex

Although language is not the only cognitive process that selectively activates the occipital cortex of EB people, it is arguably one of the most puzzling. The reason is that reorganization of other processes, such as auditory motion perception and tactile object recognition, appears to follow the topography of the functionally equivalent visual processes in the sighted brain (Ricciardi et al., 2007a; Amedi et al., 2010; Dormal et al., 2016). For example, the hMT+/V5 complex, which is typically involved in visual motion in the sighted, selectively processes auditory (Poirier et al., 2006; Dormal et al., 2016; Jiang et al., 2016) or tactile (Ricciardi et al., 2007b) motion in blind people. However, in the case of language, such recruitment is striking in light of the cognitive and evolutionary differences between vision and language (Bedny et al., 2011). This led to the proposal that, at birth, human cortical areas are cognitively pluripotent: capable of assuming a broad range of unrelated cognitive functions (Bedny, 2017). However, this argument resides on the presupposition that language has no computational relation with vision. But does this proposition tally with what we know about the relationship between the visual system and the classical language network? Rhythmic information in speech has a well-known language-related surrogate in the visual domain: lip movements (Lewkowicz and Hansen-Tift, 2012; Park et al., 2016). Indeed, both acoustic and visual speech signals exhibit rhythmic temporal patterns at prosodic and syllabic rates (Chandrasekaran et al., 2009; Schwartz and Savariaux, 2014; Giordano et al., 2017). The perception of lip kinematics that are naturally linked to amplitude fluctuations in speech serves as an important vehicle for the everyday use of language (Kuhl and Meltzoff, 1982; Weikum et al., 2007; Lewkowicz and Hansen-Tift, 2012) and helps language understanding, particularly in noisy conditions (Ross et al., 2007). Indeed, reading lips in the absence of any sound activates both primary and association auditory regions overlapping with regions that are active during the actual perception of spoken words (Calvert et al., 1997). The synchronicity between auditory and visual speech entrains rhythmic activity in the observer’s primary auditory and visual regions, and facilitates perception by aligning neural excitability with acoustic or visual speech features (Schroeder et al., 2008; Schroeder and Lakatos, 2009; Giraud and Poeppel, 2012; Mesgarani and Chang, 2012; Peelle and Davis, 2012; van Wassenhove, 2013; Zion Golumbic et al., 2013b; Park et al., 2016; Giordano et al., 2017). These results strongly suggest that both the auditory and the visual components of speech are processed together at the earliest level possible in neural circuitry, based on the shared slow temporal modulations (around 2–7 Hz range) present across modalities (Chandrasekaran et al., 2009). Corroborating this idea, it has been demonstrated that neuronal populations in visual cortex follow the temporal dynamics of lip movements in sighted individuals, similar to the way in which temporal regions follow the acoustic and visual fluctuations of speech (Luo et al., 2010; Zion Golumbic et al., 2013a; Park et al., 2016). Similar to temporal cortex, occipital cortex in the sighted also shows enhanced lip tracking when attention is directed to speech content. This result highlights the fact that a basic oscillatory architecture for tracking the dynamic aspects of (visual-) speech in occipital cortex exists even in sighted individuals.

Importantly, audiovisual integration of the temporal dynamics of speech has been suggested to play a key role when learning speech early in life: young infants detect, match, and integrate the auditory and visual temporal coherence of speech (Kuhl and Meltzoff, 1982; Rosenblum et al., 1997; Lewkowicz, 2000, 2010; Brookes et al., 2001; Patterson and Werker, 2003; Lewkowicz and Ghazanfar, 2006; Kushnerenko et al., 2008; Bristow et al., 2009; Pons et al., 2009; Vouloumanos et al., 2009; Lewkowicz et al., 2010; Nath et al., 2011). For instance, young infants between 4 and 6 months of age can detect their native language from lip movements only (Weikum et al., 2007). Around the same period, children detect synchrony between lip movements and speech sounds, and distribute more attention towards the mouth than towards the eyes (Lewkowicz and Hansen-Tift, 2012). Linking what they hear to the lip movements may provide young infants with a stepping-stone towards language production (Lewkowicz and Hansen-Tift, 2012). Moreover, infants aged 10 weeks already exhibit a McGurk effect, again highlighting the early multisensory nature of speech perception (Rosenblum et al., 1997). Taken together, these results suggest that an audio-visual link between observing lip movements and hearing speech sounds is present at very early developmental stages, potentially from birth, which helps infants acquire their first language.

In line with these prior studies, the current results may support the biased connectivity hypothesis of cross-modal reorganization (Reich et al., 2011; Hannagan et al., 2015; Striem-Amit et al., 2015). Indeed, it has been argued that reorganization in blind occipital cortex may be constrained by functional pathways to other sensory and cognitive systems that are also present in sighted individuals (Elman et al., 1996; Hannagan et al., 2015). This hypothesis may explain the overlap in functional specialization between blind and sighted individuals (Collignon et al., 2011; Dormal et al., 2016; He et al., 2013; Jiang et al., 2016; Peelen et al., 2013; Pietrini et al., 2004Weeks et al., 2000; Poirier et al., 2006). In our experiment, sensitivity to acoustic dynamics of intelligible speech in blind occipital cortex could arise from pre-existing occipito-temporal pathways connecting the auditory and visual system that are particularly important for the early developmental stages of language acquisition. In fact, the reorganization of this potentially predisposed pathway to process language content would explain how language-selective response may appear in blind children as young as 3 years old (Bedny et al., 2015).

Previous studies have suggested that language processing in the occipital cortex arises through top-down projections from frontal regions typically associated with the classical language network (Bedny et al., 2011; Deen et al., 2015), and that the representational content is symbolic and abstract rather than sensory (Bedny, 2017). Our results contrast with this view by showing that neuronal populations in (peri-)calcarine cortex align to the temporal dynamics of intelligible speech, and are functionally connected to areas sensitive to auditory information in temporal cortex. In sighted individuals, regions of the temporal lobe including STS are sensitive to acoustic features of speech, whereas higher-level regions such as anterior temporal cortex and left inferior frontal gyrus are relatively insensitive to these features and therefore do not entrain to the syllabic rate of speech (Davis and Johnsrude, 2003; Hickok and Poeppel, 2007; see confirmation in Figure 1F–G). This suggests that occipital areas respond, at least partially, to speech at a much lower (sensory) level than previously thought in EB brains, which may be caused by the reorganization of existing multisensory pathways connecting the ‘auditory’ and ‘visual’ centres in the brain.

Functional dependencies between sensory systems exist between the earliest stages of sensory processing in both human (Ghazanfar and Schroeder, 2006; Kayser et al., 2008; Schroeder and Lakatos, 2009; Murray et al., 2016) and nonhuman primates (Falchier et al., 2002; Lakatos et al., 2007; Schroeder and Lakatos, 2009). Several neuroimaging studies have demonstrated enhanced functional connectivity in sighted individuals between auditory and visual cortices under multisensory conditions (see Murray et al., 2016 for a recent review), including multisensory speech (Giordano et al., 2017). Moreover, neuroimaging studies have shown that hearing people consistently activate left temporal regions during silent speech-reading (Calvert et al., 1997; MacSweeney et al., 2000, 2001). We therefore postulate that brain networks that are typically dedicated to the integration of audio-visual speech signal, might be reorganized in the absence of visual inputs and might lead to an extension of speech tracking in the occipital cortex (Collignon et al., 2009). Although an experience-dependent mechanism related to EB affects the strength and directionality of the connectivity between the occipital and temporal regions, the presence of intrinsic connectivity between these regions — which can also be observed in sighted individuals (e.g. for multisensory integration) — may constrain the expression of the plasticity observed in our task. Building on this connectivity bias, the occipital pole may extend its sensitivity to the intelligibility of speech, a computation this region is obviously not originally dedicated to.

A number of recent studies have suggested that visual deprivation reinforces the functional connections between the occipital cortex and auditory regions typically classified as the language network (Hasson et al., 2016; Schepers et al., 2012). Previous studies using dynamic causal modelling support the idea that auditory information reaches the reorganized occipital cortex of the blind through direct temporo-occipital connection, rather than using subcortical (Klinge et al., 2010) or top-down pathways (Collignon et al., 2013). In support of these studies, we observed that the overall magnitude of functional connectivity between occipital and temporal cortex is higher in blind people than in sighted people during natural speech comprehension. Moreover, directional connectivity analysis revealed that the interaction between the two cortices is also qualitatively different: sighted individuals show a strong feed-forward drive towards temporal cortex, whereas blind individuals show a more balanced information flow, and a trend in the reverse direction. These results highlight one possible pathway by which the speech signal is enhanced in the occipital cortex of EB individuals. However, this does not mean that the changes in connectivity between blind and sighted individuals are limited to this specific network (e.g. see Kayser et al., 2015; Park et al., 2015).

Speech comprehension in the right hemisphere

We observed that neuronal populations in right superior temporal cortex synchronize to the temporal dynamics of intelligible, but not non-intelligible, speech in both EB and SI groups. Why does speech intelligibility modulate temporal regions of the right, but not the left, hemisphere? According to an influential model of speech comprehension – the asymmetric sampling in time model (AST; Giraud and Poeppel, 2012; Hickok and Poeppel, 2007; Poeppel, 2003) – there is a division of labour between the left- and right auditory cortices (Poeppel, 2003; Boemio et al., 2005; Hickok and Poeppel, 2007), with the left auditory cortex being more sensitive to high-frequency information (+20 Hz), whereas the right temporal cortex is more sensitive to low-frequency information (~6 Hz) such as syllable sampling and prosody (Belin et al., 1998; Poeppel, 2003; Boemio et al., 2005; Obleser et al., 2008; Giraud and Poeppel, 2012). Several studies have shown that the right hemisphere is specifically involved in the representation of connected speech (Bourguignon et al., 2013; Fonteneau et al., 2015; Horowitz-Kraus et al., 2015; Alexandrou et al., 2017), whereas other studies have directly demonstrated the prevalence of speech-to-brain entrainment while listening to sentences or stories in delta and theta bands in the right hemisphere more than in the left hemisphere (Luo and Poeppel, 2007; Abrams et al., 2008; Gross et al., 2013; Giordano et al., 2017). The present study therefore replicates these results by showing enhanced phase coupling between the right hemisphere and the speech envelope at the syllabic rate (low-frequency phase of speech envelope), consistent with the AST model. An interesting observation in the current study is that right hemispheric sensitivity to intelligible speech in temporal areas coincides with the enhanced right hemispheric sensitivity to intelligible speech in the occipital cortex of blind individuals.

Having more cortical tissue devoted to sentence processing and understanding could potentially support enhanced sentence comprehension. Previous studies have indeed demonstrated that, as compared to sighted individuals, blind people have enhanced speech discrimination in noisy environments (Niemeyer and Starlinger, 1981), as well as the capability to understand speech displayed at a much faster rate (sighted listeners at rates of 9–14 syllables/s andblind listeners at rates of 17–22 syllables/s; Moos and Trouvain, 2007). Crucially, listening to intelligible ultra-fast speech (as compared to reverse speech) has been shown to cause enhanced activity in the right primary visual cortex in early and late blind individuals when compared to sighted controls, and activity in this region correlates with individual ultra-fast speech perception skills [Dietrich et al., 2013]). These results raise the interesting possibility that the engagement of right V1 in the analysis of the speech envelope, as demonstrated in our study, may support the enhanced encoding of early supra-segmental aspects of the speech signal, supporting the ability to understand an ultra-fast speech signal. However, our behavioral task (speech comprehension) did not allow us to directly assess this link between the reorganized occipital cortex and speech comprehension, as by design performance was almost at ceiling in the nat and 8-chan condition but at chance in the 1-chan condition (see Table 1).

However, it is important to note that linking brain activity and behavior is not a trivial issue. Many studies have not found a direct link between crossmodal reorganization and non-visual processing. More generally, as a behavioral outcome is the end product of a complex interactive process between several brain regions, linking the role of one region in isolation (e.g. the reorganized region in the occipital cortex of blind people) to behavior (e.g. performance) in a multifaced task is not straightforward. An absence of a direct relation between behavior (e.g. speech processing) and crossmodal plasticity could be explained by the fact that this complex process is supported by additional networks other than the reorganized ones. Interestingly, recent studies have shown that early visual deprivation triggers a game of ‘balance’ between the brain systems typically dedicated to a specific process and the reorganized occipital cortex (Dormal et al., 2016). It has indeed been proposed that early visual deprivation triggers a redeployment mechanism that would reallocate part of the processing typically implemented in the preserved networks (i.e. the temporal or frontal cortices for speech processing) to the occipital cortex deprived of its most salient input (vision). Two recent studies using multivoxel pattern analysis (MVPA) showed that the ability to decode the different auditory motion stimuli was enhanced in hMT+ (a region typically involved in visual motion in sighted individuals) of early blind individuals, whereas an enhanced decoding accuracy was observed in the planum temporale in the sighted group (Jiang et al., 2016; Dormal et al., 2016). Moreover, Bedny et al. (2015) reported an enhanced activation of occipital cortex and a simultaneous deactivation of prefrontal regions during a linguistic task in blind children. The authors suggested that the increased involvement of the occipital cortex might decrease the pressure on the prefrontal areas to specialize in language processing. Interestingly, a transcranial magnetic stimulation (TMS) study reported a reduced disruptive effect of TMS applied over the left inferior prefrontal cortex during linguistic tasks for both blind and sighted individuals, whereas TMS applied over the occipital cortex caused more disruption in early blind as compared to sighted people (Amedi et al., 2004). Such studies support the idea that brain regions that are typically recruited for specific tasks in sighted individuals may become less essential in EB people if they concomitantly recruit occipital regions for the same task. An important open question for future research therefore concerns the relative behavioral contribution of occipital and perisylvian cortex to speech understanding.

Conclusions

We demonstrate that the primary ‘visual’ cortex synchronizes to the temporal dynamics of intelligible speech at the rate of syllable transitions in language (~6–7 Hz). Our results demonstrate that this neural population is involved in processing the sensory signal of speech, and therefore contrasts with the proposition that occipital involvement in speech processing is abstracted from its sensory input and purely reflects higher-level operations similar to those observed in prefrontal regions (Bedny, 2017). Blindness, due to the absence of organizing visual input, leaves the door open for sensory and functional colonization of occipital regions. This colonization might however not be stochastic, but could be constrained by modes of information processing that are natively anchored in specific brain regions and networks. Even if the exact processing mode is still to be unveiled by future research, we hypothesise that the mapping of language onto occipital cortex builds upon pre-existing oscillatory architecture typically linking auditory and visual speech rhythm (Chandrasekaran et al., 2009). Our study therefore supports the view that the development of functional specialisation in the human cortex is the product of a dynamic interplay between genetic and environmental factors during development, rather than being predetermined at birth (Elman et al., 1996).

Video 1
Movie of the natural and two vocoding conditions used for one exemplar of auditory segment used in our experiment.
https://doi.org/10.7554/eLife.31640.006

Materials and methods

Participants

Seventeen early blind (11 female; mean ± SD, 32.9 ± 10.19 years; range, 20–67 years) and sixteen sighted individuals (10 female; mean ± SD, 32.2 ± 9.92 years; range, 20–53 years) participated in the current study. There was no age difference between the blind and the sighted group (t[30] = 0.19, p=0.85). All blind participants were either totally blind or severely visually impaired from birth; however, two participants reported residual visual perception before the age of 3 and one before the age of 4, as well as one participant who lost their sight completely at age 10. Causes of vision loss were damage or detached retina (10), damage to the optic nerve (3), infection of the eyes (1), microphtalmia (2), and hypoxia (1). Although some participants reported residual light perception, none were able to use vision functionally. All participants were proficient braille readers and native speakers of Italian. None of them suffered from a known neurological or peripheral auditory disease. The data from one sighted individual were not included because of the discovery of a brain structural abnormality that was unknown to the experimenters at the time of the experiment. The project was approved by the local ethical committee at the University of Trento. In agreement with the Declaration of Helsinki, all participants provided written informed consent before participating in the study.

Experimental design

Auditory stimuli were delivered into the magnetically shielded MEG room via stereo loudspeakers using a Panphonics Sound Shower two amplifier at a comfortable sound level, which was the same for all participants. Stimulus presentation was controlled via the Matlab (RRID:SCR_001622) Psychophysics Toolbox 3 (http://psychtoolbox.org; RRID:SCR_002881) running on a Dell Alienware Aurora PC under Windows 7 (64 bit). Both sighted and blind participants were blindfolded during the experiment, and the room was dimly lit to allow for visual monitoring of the participant via a video feed from the MEG room. Instructions were provided throughout the experiment using previous recordings from one of the experimenters (FB).

The stimuli consisted of 14 short segments (~1 min) from popular audiobooks (e.g., Pippi Longstocking and Candide) in Italian (nat). Furthermore, channel-vocoding in the Praat software was used to produce two additional control conditions. First, the original sound file was band-pass filtered into 1 (1-channel) or 8 (8-channel) logarithmically spaced frequency bands. The envelope for each of these bands was computed, filled with Gaussian white noise, and the different signals were recombined into a single sound file. The resulting signal has an amplitude envelope close to the original, while the fine spectral detail was gradually distorted. Perceptually, the voice of the speaker is highly distorted in the 8-channel condition, but intelligibility is unperturbed. By contrast, in the 1-channel condition, speech is entirely unintelligible. In total, 42 sound files were presented in a pseudo-randomized fashion, distributed among seven blocks. Each block contained two sound files from each condition, and the same story was never used twice in the same block. Examples of the stimuli are provided in supplementary media content ( see Video 1).

To verify story comprehension, each speech segment was followed by a single-sentence statement about the story. Participants were instructed to listen to the story carefully, and to judge whether the statement at the end was true or false, using a nonmagnetic button box. Responses were provided with the index and middle finger of the right hand.

MEG data acquisition and preprocessing

MEG was recorded continuously from a 306 triple sensor (204 planar gradiometers; 102 magnetometers) whole-head system (Elekta Neuromag, Helsinki, Finland) using a sampling rate of 1 kHz and an online band-bass filter between 0.1 and 300 Hz. The headshape of each individual participant was measured using a Polhemus FASTRAK 3D digitizer. Head position of the subject was recorded continuously using five localization coils (forehead, mastoids).

Data pre-processing was performed using the open-source Matlab toolbox Fieldtrip (www.fieldtriptoolbox.org; RRID:SCR_004849). First the continuous data were filtered (high-pass Butterworth filter at 1 Hz, low-pass Butterworth filter at 170 Hz, and DFT filter at 50,100, and 150 Hz to remove line-noise artefacts in the signal), and downsampled to 256 Hz. Next, the data were epoched into segments of 1 s for subsequent analysis.

Rejection of trials containing artefacts and bad channels was performed using a semi-automatic procedure. First, outliers were rejected using a pre-screening based on the variance and range in each trial/channel. Then, algorithmically guided visual inspection of the raw data was performed to remove any remaining sources of noise.

Extraction of the speech amplitude envelope

Amplitude envelopes of the stories were computed using the Chimera toolbox (Chandrasekaran et al., 2009) and custom code, following the procedure described by Gross and colleagues (Gross et al., 2013). First, the sound files were band-pass filtered between 100 and 1000 Hz into nine frequency-bands, using a fourth order Butterworth filter. The filter was applied in forward and backward direction to avoid any phase shifts with respect to the original signal, and frequency-bands were spaced with equal width along the human basilar membrane. Subsequently, the analytic amplitude for each filtered segment was computed as the absolute of the Hilbert transform. Finally, the amplitude envelopes of all nine bands were summed and scaled to a maximum value of 1. The resulting envelope was combined with the MEG data, and processed identically henceforth.

Analysis of cerebro-acoustic coherence

To determine where, and at what temporal scale, neuronal populations follow the temporal dynamics of speech, we computed spectral coherence between the speech envelope and the MEG signal. Coherence is a statistic that quantifies the phase relationship between two signals, and can be used to relate oscillatory activity in the brain with a peripheral measure such as a speech signal (Peelle et al., 2013). The first analysis was conducted in sensor space, across conditions and participants, to determine the temporal scale at which coherence between the speech envelope and the MEG signal is strongest. To this end, a Hanning taper was applied to the 1 s data segments, and Fourier transformation was used to compute the cross-spectral density between 1 and 30 Hz, with a step size of 1 Hz. To render the coherence values more normally distributed, a Fisher z-transform was applied by computing the inverse hyperbolic tangent (atanh).

Source-space analysis was centred on the coherence frequency-band of interest (FOI) identified in the sensor space analysis. Source reconstruction was performed using a frequency domain beamformer called Dynamic Imaging of Coherent Sources (DICS) (Gross et al., 2001; Liljeström et al., 2005). DICS was used to localize spectral coherence, observed at the sensors, on a three-dimensional grid (8 × 8 × 8 mm). The forward model was based on a realistic single-shell headmodel (Nolte, 2003) for each individual. As structural scans could not be acquired for all participants, we approximated individual anatomy by warping a structural MNI template brain (MNI, Montreal, Quebec, Canada; www.bic.mni.mcgill.ca/brainweb) into individual headspace using the information from each participant’s headshape.

Source connectivity analysis

Functional connectivity between occipital and temporal cortex was computed by extracting virtual sensor time-series at the locations of interest in CS and STG using time-domain beamforming. These virtual sensor time series were used to compute non-directional and directional connectivity metrics. The rationale behind focusing our analyses on STG and CS is to limit our hypothesis space by the results as they unfolded in our analytic steps. Indeed, we found that the right STG was the only region modulated by the intelligibility of speech in both groups, confirming previous results using similar methods (Luo and Poeppel, 2007; Abrams et al., 2008; Gross et al., 2013; Giordano et al., 2017). In addition to STG, we found enhanced cerebro-acoustic coherence for intelligible speech in CS of EB individuals. We therefore decided to focus our connectivity analyses on these two ROIs since they were functionally defined from our first analytical step (modulation of cerebro-acoustic coherence by speech intelligibility). The advantage of following this procedure is that connectivity analyses are based on nodes that we can functionally interpret and for which we have clear predictions. We successfully used the same hierarchical analytic structure in previous studies (Collignon et al., 2013, 2015; Benetti et al., 2017), following guidelines on how to investigate directional connectivity in brain networks (Stephan et al., 2010).

Virtual sensor time-series at the two locations of interest were extracted using a time-domain vector-beamforming technique called linear constrained minimum variance (LCMV) beamforming (Van Veen et al., 1997). First, average covariance matrices were computed for each participant to estimate common spatial filter coefficients. These filter coefficients were multiplied with the single-trial cleaned MEG data. To reduce the resulting three-dimensional time-series to one singular value, decomposition (SVD) was applied, resulting in a single time-series for each trial and participant.

Functional connectivity between the two regions was computed at peak locations using the phase-locking value (PLV) (Lachaux et al., 1999). First, single-trial virtual sensor time courses were converted to the frequency domain (0–50 Hz) using Fourier transformation. The data were padded to 2 s and a Hanning taper was applied to reduce spectral leakage. Coherence was used as a proxy for functional connectivity. To disentangle phase consistency between regions from joint fluctuations in power, the spectra were normalized with respect to the amplitude, resulting in an estimate of the phase-locking (Lachaux et al., 1999) between regions. Connectivity estimates were normalized using a Fischer z-transform (atanh), as in the analysis of cerebro-acoustic coherence.

PLV is a symmetric proxy for connectivity and does not allow for inferences regarding the direction of information flow between two regions of interest. To test whether differences in functional connectivity between EB and SI are accompanied by changes in the directionality of the flow of information between the regions of interest, we computed the phase slope index (PSI) (Nolte et al., 2008). The PSI deduces net directionality from the time delay between two time series (x1 and x2). Time series x1 is said to precede, and hence drive, time series x2 in a given frequency band if the phase difference between x1 and x2 increases with higher frequencies. Consequently, a negative phase slope reflects a net information flux in the reverse direction, that is, from x2 to x1. Here, we computed the PSI using a bandwidth of ±5 Hz around the frequencies of interest. Following the recommendation by Nolte and colleagues (Nolte et al., 2008), PSI estimates were normalized with the standard error, which was computed using the jackknife method.

Statistical analysis

Statistical testing of the behavioural comprehension scores, as well as the connectivity estimates, was performed using linear mixed-effects models in R. Differences between conditions and groups in source space were evaluated using Statistical Parametric Mapping (SPM12; RRID:SCR_007037), and the Threshold-Free Cluster Enhancement (TFCE) toolboxes in Matlab. TFCE (Smith and Nichols, 2009) computes new values for each voxel in a statistical map as a function of the original voxel value and the values of the surrounding voxels. By enhancing the statistical values in voxels with a high T-value that are also part of a local cluster, TFCE optimally combines the benefits of voxel-wise and cluster-based methods. TFCE was applied to the whole-brain contrasts. Final correction for multiple comparisons was applied using a maximum statistic based on 1000 permutations of group membership (independent testing) or conditions (dependent testing). Here, we applied a variance smoothing of 15 mm FWHM to reduce the effects of high spatial frequency noise in the statistical maps. The smoothing kernel used in the current study is higher than that in comparable fMRI studies because of the inherent smoothness of the source-reconstructed MEG data.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
    Biol Knowl Revisit From Neurogenes to Psychogenes
    1. ED Bates
    (2005)
    205–253, Plasticity, Localization, and Language Development, Biol Knowl Revisit From Neurogenes to Psychogenes, 9351.
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
    Visual cortex activity in early and late blind people
    1. H Burton
    (2003)
    The Journal of Neuroscience : The Official Journal of the Society for Neuroscience 23:4005–4011.
  24. 24
  25. 25
  26. 26
  27. 27
    Reflections on Language
    1. N Chomsky
    (1976)
    Reflections on Language.
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
    Hierarchical processing in spoken language comprehension
    1. MH Davis
    2. IS Johnsrude
    (2003)
    Journal of Neuroscience 23:3423–3431.
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
    Rethinking Innateness: Connectionist Perspective on Development
    1. JL Elman
    2. EA Bates
    3. MH Johnson
    4. A Karmiloff-Smith
    5. D Parisi
    6. K Plunkett
    (1996)
    Rethinking Innateness: Connectionist Perspective on Development.
  40. 40
    Anatomical evidence of multimodal integration in primate striate cortex
    1. A Falchier
    2. S Clavagnier
    3. P Barone
    4. H Kennedy
    (2002)
    The Journal of Neuroscience : The Official Journal of the Society for Neuroscience 22:5749–5759.
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
    Comprehension of ultra-fast speech–blind vs.’normally hearing’ persons
    1. A Moos
    2. J Trouvain
    (2007)
    Proc 16th Int Congr Phonetic Sci. pp. 677–680.
  75. 75
  76. 76
  77. 77
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
  84. 84
  85. 85
  86. 86
  87. 87
  88. 88
  89. 89
  90. 90
  91. 91
  92. 92
  93. 93
  94. 94
  95. 95
  96. 96
  97. 97
  98. 98
  99. 99
  100. 100
  101. 101
  102. 102
  103. 103
  104. 104
  105. 105
  106. 106
  107. 107
  108. 108
  109. 109
  110. 110
  111. 111
  112. 112
  113. 113
  114. 114
  115. 115
  116. 116
  117. 117
  118. 118
    A positron emission tomographic study of auditory localization in the congenitally blind
    1. R Weeks
    2. B Horwitz
    3. A Aziz-Sultan
    4. B Tian
    5. CM Wessinger
    6. LG Cohen
    7. M Hallett
    8. JP Rauschecker
    (2000)
    The Journal of Neuroscience : The Official Journal of the Society for Neuroscience 20:2664–2672.
  119. 119
  120. 120
  121. 121
  122. 122

Decision letter

  1. Andrew J King
    Reviewing Editor; University of Oxford, United Kingdom

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Neuronal populations in the occipital cortex of the blind synchronize to the temporal dynamics of speech" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Andrew King as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Elana Zion Golumbic (Reviewer #2); Yanchao Bi (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This paper examines the role of the deprived visual cortex in the processing of language in blind individuals. Using magnetoencephalography, the authors report that, compared to individuals who could see, early-blind subjects show more synchronized brain responses within the calcarine sulcus to the temporal dynamics of speech. Intriguingly, they observed enhanced synchronization to the intelligibility of speech in the region of the primary visual cortex, and changes in the flow of information between areas of the occipital and temporal cortex that are sensitive to speech intelligibility. The reviewers agreed that this is an important topic and that this is a timely study that builds nicely on previous work on the neural tracking of speech and in the domain of language-system plasticity in the blind. However, various issues were identified with the analysis and presentation of the data, which have implications for the conclusions made by the authors.

Essential revisions:

1) The authors offer up a compelling narrative, but do not address the fundamental question of why blind participants need to recruit visual cortices for speech comprehension. What is the added functional value of having a more expanded brain network? The manuscript provides no explanation for this central aspect of the results. This needs to be addressed in the Discussion.

2) The reviewers found your description of the data analysis confusing and inadequate. Please be more specific about exactly which group or interaction effects are used as the marker for sensory processing. If this is based on the interaction between group and speech intelligibility in the calcarine sulcus, as assumed by one of the reviewers, other contrasts, e.g., the main effect of group (do blind and sighted subjects differ in terms of the coherence effects?) should also be reported. Given that the three conditions were matched on speech envelope properties, wouldn't interaction with intelligibility actually reflect top down modulation, which is presumably from lexical/semantic/syntactic properties rather than sensory processes?

3) There are several conflicting sentences in the Results, which make it unclear whether synchronization is present in both sighted and early-blind individuals, with the two groups differing in the degree of tracking, or is present only in the early blind. From Figure 1J it seems that the sighted did in fact show both a general coherence effect (at least for the 8 and 1 channel conditions) and an intelligibility effect (negative correlation). Demonstrating clearly whether the temporal locking effects are present in the sighted subjects and how exactly the early-blind group behaves in comparison is important because this directly links to the conclusions drawn in the paper. It appears from this figure that the functional role of the calcarine sulcus in synchronizing to speech in these populations may need reconsideration.

4) The reviewers pointed out the analysis of source reconstructions on two separate frequency windows that were largely overlapping needs clearer explanation and justification.

5) Another issue concerns the presentation of the same story clip in 3 different versions to each participant. Again, this requires clearer explanation and justification.

https://doi.org/10.7554/eLife.31640.010

Author response

Summary:

This paper examines the role of the deprived visual cortex in the processing of language in blind individuals. Using magnetoencephalography, the authors report that, compared to individuals who could see, early-blind subjects show more synchronized brain responses within the calcarine sulcus to the temporal dynamics of speech. Intriguingly, they observed enhanced synchronization to the intelligibility of speech in the region of the primary visual cortex, and changes in the flow of information between areas of the occipital and temporal cortex that are sensitive to speech intelligibility. The reviewers agreed that this is an important topic and that this is a timely study that builds nicely on previous work on the neural tracking of speech and in the domain of language-system plasticity in the blind. However, various issues were identified with the analysis and presentation of the data, which have implications for the conclusions made by the authors.

Essential revisions:

1) The authors offer up a compelling narrative, but do not address the fundamental question of why blind participants need to recruit visual cortices for speech comprehension. What is the added functional value of having a more expanded brain network? The manuscript provides no explanation for this central aspect of the results. This needs to be addressed in the Discussion.

Why crossmodal plasticity arises in the brain of blind individuals is a crucial question that the field is currently trying to address. It is a very complex question, however, since the underlying mechanisms governing the expression of crossmodal plasticity is the result of a complex interplay between intrinsic organizational principles and experience-dependent influences on these intrinsic neurophysiological constraints.

One possibility that has been proposed in the literature is that the reorganization observed in the occipital cortex of blind people would support enhanced perceptual abilities for the reorganized function (De Villers-Sidani et al., 2016). The rational is simple: since an extension in the cortical territories underlying a computation is observed in the blind, this extension may support enhanced compensatory behavior. For instance, it has been demonstrated that early blind individuals with the higher auditory localization abilities (Gougoux et al., 2005) or superior verbal memory (Amedi et al., 2003) also show higher occipital activity levels. Moreover, occipital cortical thickness predicts performance on pitch and musical tasks in blind individuals (Voss and Zatorre, 2012). Similarly, in early deaf humans the higher the recruitment of temporal regions for face processing the better they are at processing faces (Benetti et al., 2017) and temporarily deactivating the functioning of crossmodally reorganized regions of the temporal cortex of deaf cats also alleviates their superior performance in visuo-spatial processing (Lomber et al., 2010). All-together these studies suggest that the crossmodal recruitment of sensory deprived regions may induce enhanced abilities in the remaining senses.

In relation to the topic of the current study, having more cortical tissue devoted to sentence processing and understanding could potentially support enhanced sentence comprehension. Previous studies have indeed demonstrated that blind people have enhanced speech discrimination in noisy environments (Niemeyer and Starlinger, 1981) and the capability to understand speech displayed at a much faster rate than sighted persons (sighted listeners at rates between 9-14 syllables/s and to blind listeners between 17-22 syllables /s; (Moos and Trouvain, 2007) Crucially, listening to intelligible ultra-fast speech (compared to reverse speech) engage enhanced activity in the right primary visual cortex in early and late blind individuals when compared to sighted controls; and activity in this region correlated with individual ultra-fast speech perception skills (Dietrich et al., 2013). These results raise the interesting possibility that the engagement of right V1 in the analysis of the speech envelope as demonstrated in our study may support the enhanced encoding of early supra-segmental aspects of the speech signal, supporting the ability to understand ultra-fast speech signal. Our behavioral task (speech comprehension) did not however create enough between-subject variability to assess directly this link between the reorganized occipital cortex and speech comprehension (performance was almost at ceiling in the Nat and 8-chan condition but at chance in the 1-chan condition, as targeted- see table 1).

However, it is important to note that linking brain activity and behavior is not a trivial issue. Many studies did not find a direct link between crossmodal reorganization and non-visual processing. More generally, since a behavioral outcome is the end product of a complex interactive process between several brain regions, linking the role of one region in isolation (e.g. the reorganized region in the occipital cortex of blind people) to behavior (e.g. performance) in a multifaced task is not straightforward. The absence of a direct relation between behavior, e.g. speech processing, and crossmodal plasticity could be explained, for instance, by the fact that this complex process will be supported by additional networks than the reorganized ones. Interestingly, recent studies have shown that early visual deprivation triggers a game of “balance” between the brain systems typically dedicated to a specific process and the reorganized occipital cortex (Dormal et al., 2016). It has indeed been proposed that early visual deprivation triggers a redeployment mechanism that would reallocate part of the processing typically implemented in the preserved networks (i.e. the temporal or frontal cortices for speech processing) to the occipital cortex deprived of its most salient input (vision). Two recent studies using multivoxel pattern analysis (MVPA) showed that the ability to decode the different auditory motion stimuli was enhanced in hMT+ (a region typically involved in visual motion in sighted) of early blind while an enhanced decoding accuracy was observed in the sighted group in the planum temporale (Dormal et al., 2016; Jiang et al., 2016). Moreover, Bedny and collaborators (2015) reported enhanced activation of occipital cortex and a simultaneous deactivation of prefrontal regions during a linguistic task in blind children. The authors suggested that the increased involvement of the occipital cortex might decrease the pressure on the prefrontal areas to specialize for language. Interestingly, a Transcranial Magnetic Stimulation (TMS) study reported a reduced disruptive effect of TMS applied over the left inferior prefrontal cortex during linguistic tasks, while TMS application over the occipital had more effects in early blind compared to sighted people (Amedi et al., 2004). This evidence supports the idea that brain regions normally recruited for specific tasks in sighted might become less essential in EB in the case they concomitantly recruit occipital regions for the same task. An important open question for future research therefore concerns the relative behavioral contribution of occipital and perisylvian cortex to speech understanding.

We now have summarized these arguments in the revised Discussion section of our manuscript.

2) The reviewers found your description of the data analysis confusing and inadequate. Please be more specific about exactly which group or interaction effects are used as the marker for sensory processing. If this is based on the interaction between group and speech intelligibility in the calcarine sulcus, as assumed by one of the reviewers, other contrasts, e.g., the main effect of group (do blind and sighted subjects differ in terms of the coherence effects?) should also be reported. Given that the three conditions were matched on speech envelope properties, wouldn't interaction with intelligibility actually reflect top down modulation, which is presumably from lexical/semantic/syntactic properties rather than sensory processes?

We thank the reviewers for pointing out that our previous description of the data analysis was not clear enough. We now have thoroughly modified our data analysis section in order to make it more streamlined. We clarify here, in brief, the specific points raised by the reviewers.

The marker for sensory processing of speech is the cerebro-acoustic coherence that is independent of the intelligibility of speech. Therefore, the regions that are solely sensitive to the acoustic information of our signal (e.g. variation of acoustic energy across time, at the syllabic rate) are tracking the signal envelope for the unaltered speech segments but also for the two vocoded conditions, including the 1-chanel vocoding condition during which speech is not intelligible. Figure 1F therefore shows the regions that synchronize most strongly to the acoustic properties of speech independently of the group (sighted and blind) and independently of the condition of presentation (Natural speech, 8 channels vocoding and 1 channel vocoding). Figure 1H however shows regions where the cerebro-acoustic coherence is higher in the blind than in the sighted, but still independently of condition presentation. This is the effect of group. These regions therefore reorganize in the blind to track acoustic stimulation but do not necessarily show an effect of intelligibility of the speech signal. It is certainly the case that phase-locked responses to sensory stimuli are not unique to human speech comprehension, as evidenced by oscillatory responses to simple auditory stimuli in non-human primates (Lakatos et al., 2007; Schroeder et al., 2008). Might there be additional information in the speech signal that affects the phase-locked cortical response to spoken language in the occipital cortex of the blind? This is exactly what we target with the group-level effect of blindness. The interaction effect we observed in calcarine sulcus (CS) highlights that synchronization to intelligible speech is different in the blind versus sighted. (see Figure 1I). As such, the enhanced neural phase-locking observed in the calcarine sulcus of early blind individuals is not solely driven by changes in the acoustic cue of the auditory stimuli, but also reflects cortical encoding and processing of the speech signal.

As we now discuss at length in the manuscript, the calcarine sulcus in the blind might therefore locate at the interface between high-level speech comprehension and acoustic processing. This sheds important new lights on language processing in the occipital cortex of the blind by demonstrating that the involvement of this neural population relates to the sensory signal of speech and therefore contrasts with the dominant proposition that occipital involvement in speech processing is abstracted from its sensory input and purely reflect higher-level operations similar to those observed in prefrontal regions (Bedny, 2017)

3) There are several conflicting sentences in the Results, which make it unclear whether synchronization is present in both sighted and early-blind individuals, with the two groups differing in the degree of tracking, or is present only in the early blind. From Figure 1J it seems that the sighted did in fact show both a general coherence effect (at least for the 8 and 1 channel conditions) and an intelligibility effect (negative correlation). Demonstrating clearly whether the temporal locking effects are present in the sighted subjects and how exactly the early-blind group behaves in comparison is important because this directly links to the conclusions drawn in the paper. It appears from this figure that the functional role of the calcarine sulcus in synchronizing to speech in these populations may need reconsideration.

We thank the reviewers for this valuable question. With the evidence presented in the current study we are describing differences in the degree of cerebro-acoustic tracking. A binary classification as to whether tracking does, or does not occur in one of the groups was not intended. We outline our reasons for this argument below.

First, from a practical point of view, coherence is a bivariate metric that ranges between 0 and 1. Thus, with no negative values, a simple one-sample t-test in this case is not possible. Indeed, the majority of studies use group or condition comparisons to identify regions where cerebro-acoustic tracking is modulated. Here we use the contrast between intelligible and vocoded speech to highlight areas sensitive to speech intelligibility, and the effect of blindness on target areas where sensitivity to the speech rhythm. Hence all our conclusions are limited to describing modulatory effects.

Second, while it is theoretically possible to compare the results with a surrogate distribution to mimic a one-sample t-test, we believe that this analysis would not add to our argument. The current study was designed to address questions regarding group and condition differences. Additional one-sample t-test would not affect these conclusions. For example, if we do find a significant difference from zero, the conclusions would still hold, while a lack of an effect could not be interpreted. That said, our main proposal is based on the idea that the blind recycle existing functional pathways, and therefore we would not be surprised to find weak forms of cerebro-acoustic coherence in the occipital cortex of the sighted as well. For example, following the reviewer’s suggestion, we now have also examined the group effect as a post-hoc comparison in CS. Here we do not find that coherence is overall (across the 3 conditions) higher in blind versus sighted individuals. However, we do find a qualitative difference where the blind show enhanced coherence during intelligible speech compared to sighted individuals, while in the unintelligible condition, we find no significant difference between groups. The fact that the occipital cortex of sighted individuals shows stronger entrainment for altered speech condition may relate to the previous demonstration that the more adverse the listening condition (low signal-to-noise-ratio or audio-visual incongruence), the more the visual cortex is entrained to the speech signal of actual acoustic speech presented together with varying levels of acoustic noise (Park et al., 2016; Giordano et al., 2017). Moreover, Giordano and colleagues (2017), showed an increase of directed connectivity between superior frontal regions and visual cortex under the most challenging (acoustic noise and uninformative visual cues) conditions. Kayser et al. (Kayser et al., 2015) also proposed top-down processes modulating acoustic entrainment. In case of absence of structuring visual inputs since birth, the occipital cortex that does not play a role in integrating visual and auditory speech signals, like in the sighted, would now start to show enhanced coherence during intelligible speech, like what is typically observed in temporal regions. This hypothesis again suggests a link between the reorganization observed in blind individuals and the typical multisensory structure involving the occipital cortex in audio-visual speech processing (Kayser et al., 2007).

We now have added the following paragraph to the Results section to highlight that idea.

“A group difference between blind and sighted individuals across conditions was not observed in the calcarine region (t(30)=1.1, p=.28). The lack of an overall group effect in calcarine sulcus suggests that there is not a simple enhanced sensory response to speech in the blind. In this respect, calcarine sulcus responds differently than the region wide identified in parietal cortex where synchronization was overall stronger in the blind. Rather, the two groups differ only when speech is intelligible. But, while overall coherence with the speech envelope is equally high in early blind and sighted individuals, the blind population show significantly higher coherence in the intelligible speech condition compared to the sighted, who show higher cerebro-acoustic coherence in the unintelligible condition (1-Chan). The latter is reminiscent to the fact that the more adverse the listening condition (low signal-to-noise-ratio or audio-visual incongruence), the more the visual cortex is entrained to visual speech signal of actual acoustic speech presented together with varying levels of acoustic noise (Park et al., 2016; Giordano et al., 2017). Moreover, Giordano and colleagues (2017), showed an increase of directed connectivity between superior frontal regions and visual cortex under the most challenging (acoustic noise and uninformative visual cues) conditions. This hypothesis again suggests a link between the reorganization observed in the occipital cortex of blind individuals and typical multisensory pathways involving the occipital cortex in audio-visual speech processing (Kayser et al., 2007). Visual deprivation since birth however triggers a functional reorganization of the calcarine region that may now dynamically interact with the intelligibility of speech signal.”

4) The reviewers pointed out the analysis of source reconstructions on two separate frequency windows that were largely overlapping needs clearer explanation and justification.

We apologize for this lack of clarity. We have decided to combine the reconstructions with filters for two different frequency bands to optimally capture the shape of the coherence spectrum we saw in sensors space. We have decided to opt for this approach to cover the broad range of frequencies where syllables can occur. In addition, our choice was restricted by our decision to segment the trials into 1s segments, and hence compute a frequency spectrum with integer frequencies. A center-frequency at 6.5Hz was thus not feasible. We have now elaborated on this approach in the Materials and methods.

By combining the two frequency-bands we acquire a source estimate that emphasizes the center of our frequency band of interest (6-7Hz) and tapers off towards the edges, which optimally represents the coherence spectrum observed in sensor space. The choice of the center frequency was also restricted by the length of the time window used for the analysis. That is, with a 1s time window and a resulting 1Hz frequency resolution, a non-integer center frequency at e.g. 6.5Hz was not feasible.

5) Another issue concerns the presentation of the same story clip in 3 different versions to each participant. Again, this requires clearer explanation and justification.

Because the key analyses involved contrasting the different conditions, we wanted to make sure that the speech envelopes for the three different conditions were exactly the same and, so it was crucial to keep the sentences the same. However, to avoid contamination as much as possible, we ensured that during the randomization procedure of the trials two versions of the same story were never presented close to each other. Due to the length and complexity of the stories presented in the current study, it would be very difficult to reconstruct a story in the 1chan condition from a previously heard comprehensible version. We argue that this is a different situation than hearing short sentences in close succession during a vocoded and intelligible condition. In addition, if this effect existed, it would be balanced out due to the randomization procedure.

https://doi.org/10.7554/eLife.31640.011

Article and author information

Author details

  1. Markus Johannes Van Ackeren

    Center for Mind/Brain Studies, University of Trento, Trento, Italy
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
  2. Francesca M Barbero

    1. Institute of research in Psychology, University of Louvain, Louvain, Belgium
    2. Institute of Neuroscience, University of Louvain, Louvain, Belgium
    Contribution
    Conceptualization, Data curation, Validation, Methodology, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-4445-4402
  3. Stefania Mattioni

    Center for Mind/Brain Studies, University of Trento, Trento, Italy
    Contribution
    Resources, Data curation, Validation, Investigation, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared
  4. Roberto Bottini

    Center for Mind/Brain Studies, University of Trento, Trento, Italy
    Contribution
    Resources, Data curation, Validation, Investigation, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-7941-7762
  5. Olivier Collignon

    1. Center for Mind/Brain Studies, University of Trento, Trento, Italy
    2. Institute of research in Psychology, University of Louvain, Louvain, Belgium
    3. Institute of Neuroscience, University of Louvain, Louvain, Belgium
    Contribution
    Conceptualization, Resources, Data curation, Supervision, Funding acquisition, Validation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    olivier.collignon@uclouvain.be
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-1882-3550

Funding

H2020 European Research Council (337573)

  • Markus Johannes Van Ackeren
  • Stefania Mattioni
  • Roberto Bottini
  • Olivier Collignon

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The project was funded by the ERC grant MADVIS – Mapping the Deprived Visual System: Cracking function for Prediction (Project: 337573, ERC-20130StG) awarded to Olivier Collignon. We would also like extend our gratitude to Valeria Occelli, and to the Masters students who assisted with the data acquisition (Marco Barilari, Giorgia Bertonati, and Elisa Crestale) for their kind assistance with the blind participants. In addition, we would like to thank Gianpiero Monittola for ongoing support with the hardware and Jodie Davies-Thompson for insightful comments on the manuscript. Finally, we would like to thank the organizations for the blind in Trento, Mantova, Genova, Savona, Cuneo, Torino, Trieste and Milano for their help in recruiting participants.

Ethics

Human subjects: The project was approved by the local ethical committee at the University of Trento (protocol 2014-007). In agreement with the Declaration of Helsinki, all participants provided written informed consent to participate in the study.

Reviewing Editor

  1. Andrew J King, University of Oxford, United Kingdom

Publication history

  1. Received: August 30, 2017
  2. Accepted: January 16, 2018
  3. Accepted Manuscript published: January 17, 2018 (version 1)
  4. Version of Record published: January 30, 2018 (version 2)
  5. Version of Record updated: February 13, 2018 (version 3)

Copyright

© 2018, Van Ackeren et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,206
    Page views
  • 193
    Downloads
  • 3
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, Scopus, PubMed Central.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Neuroscience
    Noa Bielopolski et al.
    Research Article Updated
    1. Neuroscience
    Pierre-Yves Musso et al.
    Research Advance