Resolving multisensory and attentional influences across cortical depth in sensory cortices

Abstract

In our environment, our senses are bombarded with a myriad of signals, only a subset of which is relevant for our goals. Using sub-millimeter-resolution fMRI at 7T, we resolved BOLD-response and activation patterns across cortical depth in early sensory cortices to auditory, visual and audiovisual stimuli under auditory or visual attention. In visual cortices, auditory stimulation induced widespread inhibition irrespective of attention, whereas auditory relative to visual attention suppressed mainly central visual field representations. In auditory cortices, visual stimulation suppressed activations, but amplified responses to concurrent auditory stimuli, in a patchy topography. Critically, multisensory interactions in auditory cortices were stronger in deeper laminae, while attentional influences were greatest at the surface. These distinct depth-dependent profiles suggest that multisensory and attentional mechanisms regulate sensory processing via partly distinct circuitries. Our findings are crucial for understanding how the brain regulates information flow across senses to interact with our complex multisensory world.

Introduction

In our natural environment, our senses are exposed to a constant influx of sensory signals that arise from many different sources. How the brain flexibly regulates information flow across the senses to enable effective interactions with the world remains unclear.

Mounting evidence from neuroimaging (Beauchamp et al., 2004; Noesselt et al., 2007; Rohe and Noppeney, 2016), neurophysiology (Atilgan et al., 2018; Kayser et al., 2010; Kayser et al., 2008; Lakatos et al., 2007) and neuroanatomy (Falchier et al., 2002; Rockland and Ojima, 2003) suggests that interactions across the senses are pervasive in neocortex, arising even in primary cortices (Driver and Noesselt, 2008; Ghazanfar and Schroeder, 2006; Liang et al., 2013; Schroeder and Foxe, 2002). Visual stimuli can directly drive as well as modulate responses in cortices that are dedicated to other sensory modalities. Most prominently, functional magnetic resonance imaging (fMRI) in humans has shown that visual stimuli can induce crossmodal deactivations in primary and secondary auditory cortices (Laurienti et al., 2002; Leitão et al., 2013; Mozolic et al., 2008), yet enhance the response to a concurrent auditory stimulus (Werner and Noppeney, 2011; Werner and Noppeney, 2010). Further, neurophysiological research has suggested that multisensory influences emerge early in sensory cortices and are to some extent preserved in anaesthetized animals (Butler et al., 2012; Ibrahim et al., 2016; Iurilli et al., 2012; Kayser et al., 2007; Mercier et al., 2013). Yet, the ability to extrapolate from neurophysiological findings in animals to human fMRI studies is limited by the nature of the BOLD response, which pools neural activity over time and across a vast number of neurons (Logothetis, 2008).

Information flow is regulated not only by multisensory but also by attentional mechanisms that are guided by our current goals (Fairhall and Macaluso, 2009; Talsma et al., 2010). Critically, multisensory and attentional mechanisms are closely intertwined. Both enhance perceptual sensitivity (Leo et al., 2011) and precision of sensory representations (Ernst and Bülthoff, 2004; Fetsch et al., 2012; Meijer et al., 2019; Rohe and Noppeney, 2015). Most importantly, the co-occurrence of two congruent sensory stimuli boosts the salience of an event (Lewis and Noppeney, 2010; Van der Burg et al., 2008), which may thereby attract greater attention. Conversely, a stimulus presented in one sensory modality alone may withdraw attentional resources from other sensory modalities. Behavioural and functional imaging studies have shown that shifting attention endogenously to one sensory modality reduces processing and activations in the unattended sensory systems (Ciaramitaro et al., 2007; Johnson and Zatorre, 2005; Mozolic et al., 2008; Rohe and Noppeney, 2018; Rohe and Noppeney, 2015; Shomstein and Yantis, 2004). As a consequence, attentional mechanisms may contribute to competitive and cooperative interactions across the senses, for instance by amplifying responses for congruent audiovisual stimuli, and generating crossmodal deactivations for unisensory stimuli. Inter-sensory attention can also profoundly modulate multisensory interactions (Talsma et al., 2007). Most prominently, the influence of visual stimuli on auditory cortices was shown to be enhanced when attention was focused on the visual sense (Lakatos et al., 2009).

While previous neurophysiological studies have revealed influences of modality-specific attention predominantly in superficial laminae in non-human primates (Lakatos et al., 2009), visual influences on auditory cortices have recently been shown to be most prominent in deep layer 6 of auditory cortex in rodents (Morrill and Hasenstaub, 2018). In other words, combined neurophysiological evidence from primates and rodents suggests a double dissociation of attentional and multisensory influences.

To investigate whether this double dissociation can be found in human neocortex, we exploited recent advances in submillimeter-resolution fMRI at 7T that allow the characterization of depth-dependent activation profiles (De Martino et al., 2015a; Duong et al., 2003; Harel et al., 2006; Kok et al., 2016; Koopmans et al., 2010; Muckli et al., 2015; Polimeni et al., 2010; Trampel et al., 2012). While gradient-echo echo-planar-imaging (GE-EPI) BOLD fMRI is not yet able to attribute activations unequivocally to specific cortical layers (Duong et al., 2003; Goense et al., 2012; Harel et al., 2006; Huber et al., 2017; Markuerkiaga et al., 2016; Trampel et al., 2019), the observation in the same cortical territories of distinct laminar activation profiles induced by multisensory and attentional influences would strongly imply distinct neural mechanisms.

The current study investigated the processing of auditory, visual or audiovisual looming stimuli under auditory and visual attention in the human brain. Combining submillimeter-resolution fMRI at 7T with laminar and multivariate pattern analyses, we show distinct depth-dependent activation profiles and/or patterns for multisensory and attentional influences in early auditory and visual cortices. These results suggest that multisensory and attentional mechanisms regulate sensory processing in early sensory cortices via partly distinct neural circuitries.

Results

In this fMRI study, participants were presented with blocks of auditory (A), visual (V) and audiovisual (AV) looming stimuli interleaved with fixation (Figure 1A). We used looming motion as a biologically relevant and highly salient stimulus that reliably evokes crossmodal influences in sensory cortices in human neuroimaging and animal neurophysiology (Cappe et al., 2012; Maier et al., 2008; Tyll et al., 2013). Modality-specific attention was manipulated by requiring participants to detect and respond selectively to weak auditory or visual targets, which were adjusted prior to the main study to threshold performance in sound amplitude or visual size for each participant. The targets were interspersed throughout all auditory, visual and audiovisual looming blocks (e.g. visual targets were presented during both visual and auditory looming blocks; see Figure 1B).

Figure 1 (with 2 supplements)

Experimental design, timeline and cortical layering.

(A) Experimental design: Participants were presented with auditory, visual and audiovisual looming stimuli under auditory and visual attention. (B) Example trial and timeline: Participants were presented with brief auditory, visual or audiovisual looming stimuli in 33 s blocks interleaved with 16 s fixation. At the beginning of each block, a cue indicated whether the auditory or visual modality needed to be attended. Brief visual and auditory targets (grey) were interspersed in the looming activation blocks. Participants were instructed to respond to the targets in the attended and ignore the targets in the unattended sensory modality. (C) Cortical layering: Left: A parasagittal section of a high resolution T1 map is shown with a colour coded laminar label for each voxel (voxel size: (0.4 mm)³). The primary auditory cortex is circled in yellow. Right: The cortical sheet is defined by the pial and white matter surfaces (thick black solid lines). Six additional surfaces (thin black solid lines) were determined at different cortical depths. Data were mapped onto those surfaces by sampling (blue dots) radially along the normal (white dashed line) to the mid-cortical depth surface (not shown here). WM: white matter, GM: grey matter, CSF: cerebrospinal fluid.

For the behavioural analysis, the percentages of visual and auditory targets that gained a response (see Figure 1—figure supplement 1 and Supplementary file 1 for ‘% target responses’) were entered into a 2 (target modality: auditory vs. visual) X 2 (attended modality: auditory vs. visual) X 3 (stimulus block modality: auditory, visual, audio-visual) repeated measures ANOVA. We observed significant main effects of attended modality (F(1, 10)=291.263, p<0.001, ηp² = 0.967), target modality (F(1, 10)=13.289, p=0.004, ηp² = 0.571), and stimulus modality (F(1.986, 19.864)=5.304, p=0.014, ηp² = 0.347), together with a significant interaction between target modality and attended modality (F(1, 10)=14.786, p=0.003, ηp² = 0.597). This interaction confirmed that participants had successfully maintained auditory or visual attention as instructed.
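As a consistency check on the effect sizes above, partial eta squared can be recovered from an F statistic and its degrees of freedom with the standard identity below. This is a worked sketch using two of the reported values; it is not taken from the authors' analysis code.

```latex
\eta_p^2 = \frac{F \cdot df_1}{F \cdot df_1 + df_2}

% Attended modality, F(1, 10) = 291.263:
\eta_p^2 = \frac{291.263 \cdot 1}{291.263 \cdot 1 + 10} \approx 0.967

% Stimulus modality, F(1.986, 19.864) = 5.304:
\eta_p^2 = \frac{5.304 \cdot 1.986}{5.304 \cdot 1.986 + 19.864} \approx 0.347
```

Both results agree with the reported values to three decimal places.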

Further, we observed significant interactions between target modality and stimulus block modality (F(1.813, 18.126)=8.149, p=0.004, ηp² = 0.449) and between attended modality and stimulus block modality (F(1.436, 14.360)=24.034, p<0.001, ηp² = 0.706). These interactions resulted from greater hit rates for auditory targets given auditory attention than for visual targets given visual attention during both auditory (t(10)=2.845, p=0.017, Hedges g_av = 0.804) and visual blocks (t(10)=4.432, p=0.001, Hedges g_av = 2.037), but not for audio-visual blocks (t(10)=0.276, p=0.788, Hedges g_av = 0.081) suggesting that observers’ performance was not completely matched across all conditions. Specifically, the presentation of auditory and visual stimuli in the audiovisual blocks interfered with the detection of auditory targets. For completeness, there was no significant three-way interaction (F(1.783, 17.826)=2.467, p=0.118, ηp² = 0.198).

Using sub-millimeter resolution fMRI, we characterized the laminar profiles in auditory (primary auditory cortex, A1; planum temporale, PT) and visual (primary, V1; higher order, V2/3) regions for the following effects: 1. sensory deactivations in unisensory contexts for non-preferred stimuli (i.e. crossmodal, e.g. [V-Fix] in auditory cortices), 2. crossmodal modulation (e.g. [AV-A] in auditory cortices, [AV-V] in visual cortices) in audiovisual context and 3. direct and modulatory effects of modality-specific attention.

Briefly, we used the following methodological approach (see Materials and methods): in each ROI and participant, we estimated the regional BOLD response (i.e. ‘B parameter estimate’; e.g. Figure 2A row 1, left) and the multivariate pattern decoding accuracy for each of the six laminae (e.g. Figure 3A row 1, right). We then characterized the laminar profiles of BOLD response and decoding accuracy in terms of a constant and a linear shape parameter (i.e. ‘S parameter estimates’) and show their across-subjects’ mean and distribution in violin plots (e.g. Figure 2A row 2). Finally, we characterized the spatial topography of those S parameter estimates by projecting their group mean onto a normalized group cortical surface (e.g. Figure 2B).
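To make the two shape parameters concrete, the following minimal sketch fits a constant and a linear term to a single six-depth profile by ordinary least squares. The function name `shape_parameters`, the ordering of depths (deepest first) and the scaling of the linear regressor (one unit per depth bin, mean-centred) are assumptions made for illustration; the exact regressors used in the study are described in Materials and methods and may be scaled differently.

```python
def shape_parameters(profile):
    """Fit profile[i] = C + L * x[i] by least squares.

    x is a mean-centred depth index, so the constant C equals the mean of the
    profile and L is the change per depth bin (assumed ordering: deep to superficial).
    """
    n = len(profile)
    centre = (n - 1) / 2
    x = [i - centre for i in range(n)]  # e.g. -2.5 ... 2.5 for six depths
    # Because x sums to zero, the two regressors are orthogonal and the
    # least-squares estimates decouple into these two closed forms.
    constant = sum(profile) / n
    linear = sum(xi * yi for xi, yi in zip(x, profile)) / sum(xi * xi for xi in x)
    return constant, linear


# Hypothetical BOLD profile that rises toward the cortical surface.
example_profile = [0.5, 0.7, 0.9, 1.1, 1.3, 1.5]
constant, linear = shape_parameters(example_profile)
print(f"constant = {constant:.3f}, linear = {linear:.3f}")
```

With a mean-centred linear regressor, a profile that is shifted up or down changes only the constant term, and a profile that tilts toward the surface changes only the linear term.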

Figure 2 (with 3 supplements)

Auditory and visual deactivations.

(A) BOLD response profiles: Rows 1 and 3: The BOLD response (i.e. B parameters, across subjects’ mean ± SEM) for visual and auditory looming stimuli averaged over auditory and visual attention in A1, PT, V1, V2/3 is shown as a function of percentage cortical depth. WM: white matter; GM: grey matter; CSF: cerebrospinal fluid. Percentage cortical depth is indicated by the small numbers and colour coded in red. Rows 2 and 4: Across subjects’ mean (± SEM) and violin plot of the participants’ shape parameter estimates that characterize the mean (C: constant) and linear increase (L: linear) of the laminar BOLD response profile. n = 11. (B) Surface projections: Across subjects’ mean of the ‘constant’ (row i) and ‘linear’ increase (row ii) shape parameter estimates of the laminar BOLD response profile for auditory and visual looming stimuli (averaged over auditory and visual attention) are projected on an inflated group mean surface to show auditory and visual regions of the left hemisphere. A1 and V1 are delineated by black solid lines, PT and V2/3 by dashed lines. For visualization purposes only: i. surface projections were smoothed (FWHM = 1.5 mm); ii. values are also presented for vertices for which data were not available from all subjects and which were therefore not included in our formal statistical analysis. Grey areas denote vertices with no available data for any subject.

Figure 3 (with 2 supplements)

Cross-modal modulation in auditory areas.

(A) Laminar profiles: Rows 1 and 3: The BOLD response (solid line; column 1 and 3) and decoding accuracy (dashed line; columns 2 and 4) (across subjects’ mean ± SEM) for [AV-A] in A1 and PT is shown as a function of percentage cortical depth pooled (i.e. averaged) over auditory and visual attention. WM: white matter, GM: grey matter, CSF: cerebrospinal fluid. Percentage cortical depth is indicated by the small numbers and colour coded in red. Rows 2 and 4: Across subjects’ mean (± SEM) and violin plot of participants’ shape parameter estimates that characterize the mean (C: constant) and linear increase (L: linear) of the laminar BOLD response and decoding accuracy profiles. n = 11. (B) Surface projections and raster plots: Left: Across subjects’ mean of the ‘constant’ (row i) and ‘linear increase’ (row ii) shape parameter estimates of the laminar BOLD response profile for [AV-A] (averaged over auditory and visual attention) are projected on an inflated group mean surface to show auditory regions of the left hemisphere. A1 and PT are delineated by black solid and dashed lines. For visualization purposes only: i. surface projections were smoothed (FWHM = 1.5 mm); ii. values are also presented for vertices for which data were not available from all subjects and which were therefore not included in our formal statistical analysis. Grey areas denote vertices with no available data for any subject. Right: Row i. PT: The raster plot illustrates the statistical relationship between the ‘constant’ shape parameters for the visual evoked response [V-Fix]_{AttA, AttV} and the crossmodal modulation [AV-A]_{AttA, AttV} in PT. Each raster plot shows the laminar profiles (colour coded along abscissa) of the vertices for the ‘predicting contrast’ [V-Fix] and of the ‘predicted contrast’ [AV-A]. The laminar profiles of the vertices were sorted along the ordinate according to the value of the ‘constant’ shape parameter for [V-Fix]. The raster plot shows that the laminar profile of a vertex for [V-Fix] can predict its laminar profile for [AV-A]: PT vertices with weaker deactivations across laminae for [V-Fix] are associated with greater crossmodal enhancement [AV-A]. Row ii. The raster plot illustrates the statistical relationship between the ‘linear slope’ shape parameters for the visual evoked response [V-Fix]_{AttA, AttV} and the crossmodal modulation [AV-A] in A1. Each raster plot shows the laminar profiles (colour coded along abscissa) of the vertices for the ‘predicting contrast’ [V-Fix] and of the ‘predicted contrast’ [AV-A]. The laminar profiles of the vertices are sorted along the ordinate according to the value of the ‘linear’ shape parameter for [V-Fix]. A1 vertices with weaker deactivations in deeper laminae for [V-Fix] are associated with greater crossmodal enhancement [AV-A] in deeper laminae. To enable averaging the raster plots across participants, the vertices were binned after sorting (number of bins for A1: 10100, number of bins for PT: 6800). For visualization purposes all raster plots were smoothed along the vertical axis (FWHM = 1% of the number of data bins). The subplot (i.e. black line) to the left of the raster plots shows the across subjects’ mean value (± STD) of the shape parameters (i.e. i. constant, ii. linear) of the sorting contrast. n = 11.

All statistical results are presented in Table 1, Table 2, Table 3 and Supplementary files 3 and 4. Additional descriptive statistics as well as effect sizes can be found here: https://osf.io/tbh37/.

Table 1

Auditory and visual deactivations.

Contrast            | ROI            | Linear or constant        | Constant               | Linear
[V-Fix]Att_A, Att_V | mean(A1, PT)   | F(2,40) = 9.280, p<0.001  | t(10)=−2.460, p=0.017* | F(1,20) = 2.083, p=0.164
                    | A1             |                           | t(10)=−2.077, p=0.032* |
                    | PT             |                           | t(10)=−2.042, p=0.034* |
[A-Fix]Att_A, Att_V | mean(V1, V2/3) | F(2,40) = 58.615, p<0.001 | t(10)=−5.547, p<0.001* | F(1,20) = 22.433, p<0.001
                    | V1             |                           | t(10)=−6.538, p<0.001* | t(10)=−5.080, p<0.001
                    | V2/3           |                           | t(10)=−4.305, p<0.001* | t(10)=−4.142, p=0.002

* indicates p-values from one-sided t-tests based on a priori hypotheses. p-values<0.05 are indicated in bold. n = 11.

Using 2 (shape parameter: constant, linear) x 2 (ROI: primary, non-primary) linear mixed effects models, we performed the following statistical comparisons in a 'step down procedure':
1. Two-dimensional F-test assessing whether the constant or linear parameter (e.g. each averaged across ROIs in auditory or visual cortices, respectively) was significantly different from zero (dark grey),
2. If this two-dimensional F-test was significant, we computed one-dimensional F-tests separately for the constant and the linear parameters (again averaged across auditory or visual ROIs, respectively) (light grey),
3. If the one-dimensional F-test was significant, we computed follow-up t-tests separately for each of the two ROIs (white).
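The gating logic of this step-down procedure can be sketched as follows. The function `step_down_tests` and its argument names are hypothetical and only decide which tests would be carried out given the p-values of the preceding step; the threshold of 0.05 is assumed from the table notes, and the model fitting itself is not shown.

```python
def step_down_tests(p_omnibus, p_parameter, rois, alpha=0.05):
    """Return the tests that are run under the step-down rules.

    p_omnibus   : p-value of the two-dimensional F-test (constant or linear)
    p_parameter : dict mapping 'constant'/'linear' to the p-value of the
                  corresponding one-dimensional test (averaged across ROIs)
    rois        : names of the two ROIs that receive follow-up t-tests
    """
    tests = ["omnibus"]
    if p_omnibus >= alpha:
        return tests  # step 1 not significant: stop here
    for name, p in p_parameter.items():
        tests.append(name)  # step 2: one test per shape parameter
        if p < alpha:
            # step 3: ROI-specific follow-up tests for this parameter only
            tests.extend(f"{name}:{roi}" for roi in rois)
    return tests


# p-values taken from the [V-Fix] row of Table 1; 'p<0.001' is entered as 0.001.
v_fix = step_down_tests(0.001, {"constant": 0.017, "linear": 0.164}, ["A1", "PT"])
print(v_fix)
```

For the [V-Fix] row this yields follow-up tests for the constant parameter only, which matches the cells that are filled in Table 1.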

Table 2

Effects of the cross-modal modulation on the laminar BOLD response and decoding accuracy profiles in auditory areas.

A) BOLD profile
Contrast               | ROI          | Linear or constant        | Constant                  | Linear
[AV - A]Att_A, Att_V   | mean(A1, PT) | F(2,40) = 0.196, p=0.823  |                           |
B) Decoding profile
Contrast               | ROI          | Linear or constant        | Constant                  | Linear
[AV vs. A]Att_A, Att_V | mean(A1, PT) | F(2,40) = 34.946, p<0.001 | F(1,20) = 21.966, p<0.001 | F(1,20) = 1.850, p=0.189
                       | A1           |                           | t(10)=3.867, p=0.003      |
                       | PT           |                           | t(10)=4.992, p<0.001      |

Using 2 (shape parameter: constant, linear) x 2 (ROI: primary, non-primary) linear mixed effects models, we performed the following statistical comparisons in a 'step down procedure':

1. Two-dimensional F-test assessing whether the constant or linear parameter (e.g. each averaged across ROIs in auditory or visual cortices, respectively) was significantly different from zero (dark grey),
2. If this two-dimensional F-test was significant, we computed one-dimensional F-tests separately for the constant and the linear parameters (again averaged across auditory or visual ROIs, respectively) (light grey),
3. If the one-dimensional F-test was significant, we computed follow-up t-tests separately for each of the two ROIs (white).

Table 3

Effects of the attentional modulation (irrespective of stimulus type) on the laminar BOLD response and decoding accuracy profiles.

A) BOLD profile
Contrast                  | ROI            | Linear or constant        | Constant                  | Linear
[Att_V - Att_A]A, V, AV   | mean(A1, PT)   | F(2,40) = 12.602, p<0.001 | F(1,20) = 9.249, p=0.006  | F(1,20) = 12.163, p=0.002
                          | A1             |                           | t(10)=1.882, p=0.089      | t(10)=3.123, p=0.011
                          | PT             |                           | t(10)=4.523, p=0.001      | t(10)=3.361, p=0.007
[Att_V - Att_A]A, V, AV   | mean(V1, V2/3) | F(2,40) = 0.669, p=0.518  |                           |
B) Decoding profile
Contrast                  | ROI            | Linear or constant        | Constant                  | Linear
[Att_A vs. Att_V]A, V, AV | mean(A1, PT)   | F(2,40) = 4.687, p=0.015  | F(1,20) = 4.882, p=0.039  | F(1,20) = 4.028, p=0.058
                          | A1             |                           | t(10)=1.260, p=0.236      |
                          | PT             |                           | t(10)=2.031, p=0.070      |
[Att_A vs. Att_V]A, V, AV | mean(V1, V2/3) | F(2,40) = 20.026, p<0.001 | F(1,20) = 13.564, p=0.001 | F(1,20) = 9.951, p=0.005
                          | V1             |                           | t(10)=2.472, p=0.033      | t(10)=1.359, p=0.204
                          | V2/3           |                           | t(10)=4.298, p=0.002      | t(10)=3.089, p=0.011

Using 2 (shape parameter: constant, linear) x 2 (ROI: primary, non-primary) linear mixed effects models, we performed the following statistical comparisons in a 'step down procedure':

1. Two-dimensional F-test assessing whether the constant or linear parameter (e.g. each averaged across ROIs in auditory or visual cortices, respectively) was significantly different from zero (dark grey),
2. If this two-dimensional F-test was significant, we computed one-dimensional F-tests separately for the constant and the linear parameters (again averaged across auditory or visual ROIs, respectively) (light grey),
3. If the one-dimensional F-test was significant, we computed follow-up t-tests separately for each of the two ROIs (white).

p-values<0.05 are indicated in bold. n = 11.

Auditory cortices

Auditory stimuli evoked a positive BOLD response in primary auditory cortex and especially in anterior portions of the planum temporale (Figure 2—figure supplements 1B and 3 right). As expected from the typical physiological point spread function of the GE-EPI BOLD signal, the positive BOLD signal increased roughly linearly towards the cortical surface (Markuerkiaga et al., 2016) (Figure 2—figure supplement 1A and Supplementary file 3).

Deactivations induced by crossmodal visual stimuli (i.e. a negative BOLD response for [V-Fix]) were observed in both A1 and PT with a constant response profile and, based on visual inspection, even a trend towards stronger deactivations in deeper laminae (Figure 2A left, Table 1). These visually induced deactivations were generally observed for both auditory and visual attention conditions, with no significant difference between them ([Att_A-Att_V] for visual stimuli in A1 and PT: F(2,40) = 0.644, p=0.530).

We did not observe a significant crossmodal modulatory effect of visual stimuli on the BOLD-response in A1 and PT in the context of concurrent auditory stimuli (Figure 3A left, Table 2A). However, pattern classifiers succeeded in discriminating between patterns elicited by [AV vs. A] conditions across all laminae in both PT and A1 (Figure 3A right, Table 2B). Thus, even when the mean BOLD response across vertices averaged across A1 and PT did not differ significantly for AV and A stimuli at a particular depth, the visual stimulus changed the activation pattern elicited by a concurrent auditory stimulus. This provides evidence that sub-regions with crossmodal enhancement and suppression co-exist in A1 and PT. The visually induced changes in activation patterns were again not significantly affected by modality-specific attention (for the classification [AV-V]_{att A} vs. [AV-V]_{att V}, F(2,40) = 0.185, p=0.832).

Taken together, these results suggest that salient visual looming stimuli affected auditory cortices irrespective of the direction of modality-specific attention in both unisensory and audiovisual contexts. Further, the surface projections of the shape parameters of the BOLD response profile at the group level revealed a patchy topography both for visually induced deactivations in a unisensory context (Figure 2B left) and for visually induced modulations of the auditory response (Figure 3B left i and ii).

We next explored whether visual stimuli influenced activation patterns in auditory cortices in a similar patchy topography during unisensory visual and audiovisual contexts. For this, we defined a general linear model for each subject that used the ‘constant’ or ‘linear’ shape parameters from the visual deactivations (i.e. [V-Fix]) for each vertex to predict the corresponding ‘constant’ or ‘linear’ shape parameters for the visually induced crossmodal modulation (i.e. [AV-A]) over vertices (again in a cross-validated fashion). We visualized the results of this regression in the form of raster plots (Figure 3B right i and ii). These raster plots show the laminar BOLD response profile for the difference [AV-A] across vertices, sorted according to their BOLD response profile for [V-Fix].
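The core of this vertex-wise regression can be illustrated with a simplified sketch: for each subject, a slope is estimated that relates the [V-Fix] shape parameter of each vertex to its [AV-A] shape parameter, and the per-subject slopes are then tested against zero across subjects. The helper names `vertex_slope` and `one_sample_t`, and the toy numbers, are hypothetical; the cross-validation mentioned above is omitted because its details are not given in this section.

```python
from math import sqrt
from statistics import mean, stdev


def vertex_slope(predictor, predicted):
    """Least-squares slope of `predicted` on `predictor` across vertices."""
    mx, my = mean(predictor), mean(predicted)
    sxx = sum((x - mx) ** 2 for x in predictor)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predictor, predicted))
    return sxy / sxx


def one_sample_t(values):
    """t statistic testing whether the mean of `values` differs from zero."""
    return mean(values) / (stdev(values) / sqrt(len(values)))


# Toy data for three hypothetical subjects: one value per vertex.
v_fix_constant = [[-0.4, -0.2, 0.0, 0.1], [-0.5, -0.1, 0.2, 0.3], [-0.3, -0.2, 0.1, 0.4]]
av_minus_a_constant = [[-0.1, 0.0, 0.2, 0.3], [-0.2, 0.1, 0.2, 0.5], [0.0, 0.0, 0.3, 0.4]]

slopes = [vertex_slope(x, y) for x, y in zip(v_fix_constant, av_minus_a_constant)]
print("per-subject slopes:", [round(s, 3) for s in slopes])
print("group t statistic:", round(one_sample_t(slopes), 3))
```

A positive group-level slope corresponds to the pattern described in the text: vertices that are less deactivated by visual stimuli alone show a larger visually induced enhancement.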

In A1 and PT, the shape profile of a vertex for visual deactivations significantly predicted its profile for cross-modal modulation, suggesting similar patchy topographies for visual deactivations and crossmodal modulation. In PT, the constant parameter for [V-Fix] predicted the constant for [AV-A]: the less a vertex is deactivated in PT by visual stimuli in a unisensory context, the more it shows a visually induced enhancement of the response to a concurrent auditory stimulus (constant: t(10)=3.460, p=0.006, linear: t(10)=1.853, p=0.094, see Figure 3Bi right). More interestingly, in A1 the linear shape parameter for [V-Fix] predicted the linear shape parameter of [AV-A]: vertices with weaker deactivations in deeper relative to superficial laminae in unisensory visual context showed a robust crossmodal enhancement that was most pronounced in deeper laminae (linear: t(10)=3.121, p=0.011, constant: t(10)=0.021, p=0.983, Figure 3Bii right). These results strongly suggest that visual stimuli generate a BOLD response in primary auditory cortex with a patchy topography that is similar across unisensory and audiovisual contexts.

Because these commonalities in topography across two contrasts could in principle arise from spurious factors (such as registration errors, curvature-dependent segmentation errors, heterogeneous occurrence of principal veins, curvature dependent occurrence of veins, orientation dependence to B0, signal leakage of kissing gyri) that could affect BOLD-response magnitude similarly in different contrasts, we repeated the statistical analysis and raster plots for the constant shape parameters of the [V-Fix] and [A-Fix] contrasts in auditory areas. However, as shown in the raster plots in Figure 3—figure supplement 2 this control analysis did not reveal any significant relationship between the two contrasts. The absence of a significant effect in this control analysis thus suggests that the similarity in topography between visually induced deactivations and cross-modal modulations reflects similarities in neural organization rather than non-specific effects.

In contrast to these multisensory influences, attentional modulation was greatest at the cortical surface in both A1 and PT for the regional BOLD response (i.e. significant positive linear effect in A1 and PT, Figure 4Ai left, Table 3A). Averaged across A1 and PT, pattern classifiers also discriminated between the auditory and visual attention conditions (Figure 4Ai right, Table 3B). The decoding accuracy profiles were characterized by a significant constant term and a non-significant trend for the linear term. Surprisingly, even though the mean BOLD-response profiles revealed a profound effect of attention, the discrimination between auditory and visual attention conditions was rather limited. Discrimination between activation patterns may be limited if modality-specific attention predominantly amplifies and scales the BOLD response for A, V or AV stimuli whilst preserving the activation patterns, whereas crossmodal influences alter the activation patterns.

Figure 4

Attentional modulation.

(A) Laminar profiles: Rows 1, 3, 5, 7: The BOLD response (solid line; columns 1 and 3) and decoding accuracy (dashed line; columns 2 and 4) (across subjects’ mean ± SEM) for attentional modulation (i: top - [Att_A-Att_V] in A1 and PT; ii: bottom - [Att_V-Att_A] in V1 and V2/3) is shown as a function of percentage cortical depth pooled (i.e. averaged) over auditory, visual and audiovisual looming stimuli. WM: white matter, GM: grey matter, CSF: cerebrospinal fluid. Percentage cortical depth is indicated by the small numbers and colour coded in red. Rows 2, 4, 6, 8: Across subjects’ mean (± SEM) and violin plot of the participants’ shape parameter estimates that characterize the mean (C: constant) and linear increase (L: linear) of the laminar BOLD response and decoding accuracy profiles. n = 11. (B) Surface projections: Across subjects’ mean of the ‘constant’ shape parameter estimates for attentional modulation [Att_A-Att_V]_{A,V, AV} in A1 and PT (i: top) and in V1 and V2/3 (ii: bottom) are projected on an inflated group mean surface of the left hemisphere. A1 and V1 are delineated by black solid lines. PT and V2/3 are delineated by dashed lines. For visualization purposes only: i. surface projections were smoothed (FWHM = 1.5 mm); ii. values are also presented for vertices for which data were not available from all subjects and which were therefore not included in our formal statistical analysis; iii. borders between visual-induced activations and deactivations (white dashed lines on the right) are reported here from Figure 2B. Grey areas denote vertices with no available data for any subject.

To investigate this explanation further, we quantified and compared the similarity in A1 between activation patterns of auditory vs. visual attention conditions (averaging over i. A_Att_V vs. A_Att_A; and ii. AV_Att_V vs. AV_Att_A) and of auditory vs. audiovisual conditions (averaging over iii. A_Att_A vs. AV_Att_A; and iv. A_Att_V vs. AV_Att_V) using the Spearman correlation coefficient computed over vertices. The average similarity between auditory and visual attention conditions was indeed greater than the average similarity between A and AV conditions (two-sided exact sign permutation test on Fisher transformed correlation coefficients, p=0.008). These results suggest that auditory attention mainly scales the BOLD response in A1 whilst preserving the activation pattern, whereas visual stimuli alter the activation pattern in A1.
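The comparison described above combines three standard steps: a Spearman correlation over vertices, a Fisher transformation of the correlation coefficients, and an exact sign permutation test across subjects. The sketch below shows one way to implement them with the Python standard library. All function names and the toy values are hypothetical, and the use of the summed difference as test statistic is an assumption. With n = 11 subjects, the exact test enumerates 2^11 = 2048 sign assignments.

```python
from itertools import product
from math import atanh, sqrt


def ranks(values):
    """Ranks starting at 1, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def sign_permutation_p(differences):
    """Two-sided exact sign permutation test on paired differences."""
    observed = abs(sum(differences))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(differences)):
        stat = abs(sum(s * d for s, d in zip(signs, differences)))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total


# Toy per-subject similarities (hypothetical): attention contrast vs. stimulus contrast.
r_attention = [0.80, 0.75, 0.90]
r_stimulus = [0.60, 0.70, 0.65]
z_differences = [atanh(a) - atanh(b) for a, b in zip(r_attention, r_stimulus)]
print("p =", sign_permutation_p(z_differences))
```

When every subject shows a difference in the same direction, only the observed assignment and its mirror image are as extreme as the data, so the smallest attainable two-sided p-value with 11 subjects is 2/2048.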

In summary, we observed distinct laminar BOLD response profiles and activation patterns for multisensory and attentional influences in auditory cortices. Visual stimuli induced deactivations with a constant profile both in A1 and PT. Likewise, they influenced the activation pattern in auditory cortices similarly across all laminae (i.e. constant effect for decoding accuracy) in audiovisual context leading to successful pattern decoding accuracy for [AV vs. A] even in deeper laminae. In fact, in A1 and PT visual inputs altered the activation pattern with a similar patchy topography when presented alone (i.e. visual-induced deactivation) or together with an auditory stimulus (i.e. crossmodal modulation).

Modality-specific attention amplified the mean BOLD-signal within A1 and PT mainly at the cortical surface (i.e. significant linear parameter). Likewise, the decoding accuracy was better than chance, with a non-significant increase toward the surface.

Visual cortices

In V1 and V2/3, visual stimuli induced activations that increased toward the cortical surface, as expected for GE-EPI BOLD (Markuerkiaga et al., 2016) (Figure 2—figure supplements 1 and 2 left and Supplementary file 1). Contrary to previous research (Chen et al., 2013; Koopmans et al., 2010), we did not observe a more selective increase in activations for layer 4. Potentially, this selective increase in BOLD-response may have been smoothed across multiple layers, because the relative cortical depth of layer 4 varies between foveal and lateral projections in V1 and our region of interest is larger than those used in previous work (for similar findings see figure 6B in Polimeni et al., 2010 for fMRI and figure 6E in Waehnert et al., 2016 for structural anatomy). Consistent with previous research (Shmuel et al., 2002; Silver et al., 2008; Tootell et al., 1998), the centrally presented visual looming stimuli activated predominantly areas representing the centre of the visual field with response suppressions (i.e. negative BOLD) in adjacent areas (see dashed white lines in Figure 2—figure supplement 1B right). By contrast, auditory deactivations were not confined to the central field that was activated by visual stimuli. Instead, they were particularly pronounced in the rostral part of the calcarine sulcus representing more eccentric positions of the visual field (Figure 2B right and Figure 2—figure supplement 2 right). Furthermore, auditory looming stimuli induced deactivations that increased in absolute magnitude toward the cortical surface (significant constant and linear effect, Table 1). Again, as in auditory areas, these auditory deactivations were not significantly modulated by modality-specific attention ([Att_V-Att_A] for auditory stimuli in V1 and V2/3: F(2,40) = 1.651, p=0.205).

Despite the pronounced deactivations elicited by unisensory auditory stimuli, we did not observe any significant cross-modal modulation of the regional BOLD response in V1 and V2/3 (see Figure 3—figure supplement 1 left and Supplementary file 4A). Likewise, the support vector classifiers were not able to discriminate between activation patterns for [AV vs. V] stimuli better than chance. In summary, the effect of auditory stimuli on visual cortices was abolished under concurrent visual stimulation both in terms of regional BOLD response and activation patterns (see Figure 3—figure supplement 1 right and Supplementary file 4B).

With respect to attentional modulation, we observed no significant differences in mean BOLD responses in V1 or V2/3 for visual vs. auditory attention conditions (see Table 3A). As shown in the violin plots of Figure 4Aii left, only a few participants showed a substantial attentional modulation effect when averaged across the entire regions of interest. Yet, the classifier successfully discriminated between patterns for visual and auditory attention conditions across all cortical depth surfaces in V1 and V2/3 (i.e. significant constant term in V1 and V2/3 and linear term in V2/3, see Table 3B, Figure 4Aii right). The surface projection of shape parameters of the BOLD response profile at the group level explains this discrepancy between the attentional effects on regional BOLD response and activation patterns (Figure 4Bii). Consistent with mechanisms of contrast enhancement (Bressler et al., 2013; Müller and Kleinschmidt, 2004; Smith et al., 2000; Tootell et al., 1998), attending to vision relative to audition increased responses in regions that were activated by visual looming stimuli, yet suppressed responses in the surrounding regions that were deactivated by visual stimuli, thereby cancelling out global attentional effects for the regional BOLD response (Figure 4Bii; note that the white lines indicate the border between visual activation and deactivation from Figure 2—figure supplement 1B right).
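The cancellation argument can be made concrete with a small simulation. This is a sketch under stated assumptions: all data are simulated, the effect size and the numbers of runs and vertices are arbitrary, and a leave-one-run-out nearest-centroid classifier stands in for the support vector classifiers used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_vertices = 20, 200  # hypothetical sizes

# Assumed attention effect: enhancement in "central" vertices,
# suppression of equal size in "peripheral" vertices.
effect = np.where(np.arange(n_vertices) < n_vertices // 2, 0.5, -0.5)

att_a = rng.normal(0.0, 1.0, size=(n_runs, n_vertices))           # attend auditory
att_v = rng.normal(0.0, 1.0, size=(n_runs, n_vertices)) + effect  # attend visual

# Regional (ROI-averaged) difference: opposite-signed effects cancel out
regional_difference = att_v.mean() - att_a.mean()

def leave_one_run_out_accuracy(a, b):
    """Nearest-centroid decoding with one run per condition held out."""
    correct = 0
    for test in range(a.shape[0]):
        train = np.arange(a.shape[0]) != test
        centroid_a = a[train].mean(axis=0)
        centroid_b = b[train].mean(axis=0)
        for sample, is_b in ((a[test], False), (b[test], True)):
            closer_to_b = (np.linalg.norm(sample - centroid_b)
                           < np.linalg.norm(sample - centroid_a))
            correct += int(bool(closer_to_b) == is_b)
    return correct / (2 * a.shape[0])

accuracy = leave_one_run_out_accuracy(att_a, att_v)
print(f"regional difference: {regional_difference:.3f}, accuracy: {accuracy:.2f}")
```

In this simulation the ROI-averaged difference stays close to zero, whereas the spatial pattern remains decodable well above the 50% chance level, mirroring the dissociation between regional BOLD response and activation patterns described above.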

In summary, auditory looming induced widespread deactivations in visual cortices that were maximal at the cortical surface and extended into regions that represent more peripheral visual fields. By contrast, attention to vision predominantly increased BOLD response in central parts that were activated by visual stimuli and suppressed BOLD responses in the periphery that were deactivated by visual stimuli.

Discussion

The current high-resolution 7T fMRI study revealed distinct depth-dependent BOLD response profiles and patterns for multisensory and attentional influences in early sensory cortices.

Distinct laminar profiles and activation patterns for multisensory and attentional influences in auditory cortices

Visual looming suppressed activations in auditory cortices (see also Laurienti et al., 2002; Leitão et al., 2013; Mozolic et al., 2008), but enhanced the response to concurrent auditory looming in posterior auditory cortices (Werner and Noppeney, 2011; Werner and Noppeney, 2010). In other words, intersensory competition for purely visual stimuli turned into cooperative interactions for audiovisual stimuli. While the visual-induced response suppression was constant across cortical depth in A1 and PT, the crossmodal BOLD-response enhancement was observed mainly in the caudal parts of PT, that is, in parts of PT where visual influences have previously been shown (Kayser et al., 2007). At a finer spatial resolution, multivariate analyses revealed significant differences between audiovisual and auditory activation patterns across all laminae in both A1 and PT. As shown in the surface projections (see Figure 3B), visual looming enhanced and suppressed auditory responses in adjacent patches, consistent with neurophysiological results in non-human primates (Kayser et al., 2008). Critically, as illustrated in the raster plots, the response profile of a vertex to visual stimuli significantly predicted its laminar profile for crossmodal modulation (see Figure 3B right). In A1, vertices whose activations were only weakly suppressed by visual looming in deeper laminae showed a greater visual-induced response enhancement for audiovisual looming, again in deeper laminae. These results suggest that visual looming influences activations in A1 mainly in deeper layers with similar patchy topographies in unisensory and audiovisual contexts.

By contrast, modality-specific attention affected auditory and visual evoked responses predominantly at the cortical surface (see significant linear term in Table 3). The large attentional BOLD-response effects in superficial laminae dovetail nicely with previous neurophysiological research, which located effects of modality-specific attention in supragranular layers of auditory cortices (De Martino et al., 2015a; Lakatos et al., 2009). Yet, because the GE-EPI BOLD response measured at superficial laminae includes contributions from deeper laminae (Harel et al., 2006; Markuerkiaga et al., 2016), we cannot unequivocally attribute the attentional influences at the cortical surface solely to superficial layers as their neural origin. It is also possible that the increase in attentional BOLD-response effects towards the cortical surface arises from neural effects across all cortical layers. Consistent with this conjecture, the BOLD-response profiles for auditory-induced activations (Figure 2—figure supplement 1A left) and attentional modulation (Figure 4Ai left) in A1 and PT are quite similar.
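Why a neural effect that is uniform across layers can still yield a BOLD profile that increases toward the surface can be sketched with a deliberately simple toy model. It assumes that the signal at each depth equals the local contribution plus a fixed fraction of everything below it; the function `observed_profile` and the `leak` value of 0.3 are hypothetical illustrations, not a validated vascular model and not a parameter estimated in the study.

```python
import numpy as np

def observed_profile(neural, leak=0.3):
    """Toy draining model: index 0 is the deepest depth.

    Each depth carries its local signal plus `leak` times the summed
    local signals of all deeper depths (an illustrative assumption).
    """
    neural = np.asarray(neural, dtype=float)
    drained_from_below = leak * (np.cumsum(neural) - neural)
    return neural + drained_from_below

uniform_neural = np.ones(6)            # same neural effect at every depth
bold = observed_profile(uniform_neural)
print(np.round(bold, 2))
```

Under these assumptions a flat neural profile produces a monotonically increasing measured profile, which is why a surface-weighted attentional effect cannot be attributed to superficial layers alone.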

Critically, however, the laminar profiles for attentional effects were distinct from those for multisensory (i.e. crossmodal) effects in auditory cortices. This dissociation in laminar profiles in the same territories cannot be explained by vascular effects that limit the laminar specificity of the BOLD response, but strongly implies that multisensory and attentional mechanisms regulate information flow in auditory cortices via partly distinct neural circuitries (De Martino et al., 2015a; Lakatos et al., 2009). Further, the constant laminar profiles for crossmodal influences in auditory cortices suggest that visual signals influence auditory cortices mainly via infragranular layers, a finding that converges with a recent neurophysiological study in mice that likewise highlighted deep layer 6 as the key locus for visual influences on auditory cortex (Morrill and Hasenstaub, 2018). Anatomical studies in monkeys have previously suggested that visual inputs can influence auditory cortices via three routes (Ghazanfar and Schroeder, 2006; Musacchia and Schroeder, 2009; Schroeder et al., 2003; Smiley and Falchier, 2009): i. thalamic afferents, ii. direct connectivity between sensory areas (Falchier et al., 2002; Rockland and Ojima, 2003) and iii. connections from higher order association cortices.

Common laminar profiles, but distinct activation patterns for multisensory and attentional influences in visual cortices

In visual cortices, visual and auditory stimuli evoked responses that were maximal in absolute amplitude at the cortical surface, yet differed in their activation pattern. Visual looming stimuli induced activations in areas representing the centre of the visual field and deactivations in surrounding areas representing the periphery (Shmuel et al., 2002). In contrast to this visual ‘centre-surround’ pattern, auditory stimuli induced widespread deactivations (Laurienti et al., 2002; Leitão et al., 2013; Mozolic et al., 2008) particularly in peripheral visual field representations known to have denser direct fibre connections to auditory cortices (Falchier et al., 2002; Rockland and Ojima, 2003).

Thus, our study revealed three types of deactivations. In visual cortices, both visual and auditory stimuli induced deactivations that were most evident at the cortical surface, as previously reported for deactivations to ipsilateral visual stimuli (Fracasso et al., 2018; though see Goense et al., 2012). In auditory cortices, the visual-induced deactivations were constant across cortical depth. These differences in laminar profiles or BOLD patterns between visual and auditory cortices may reflect differences in neural mechanisms. For instance, neurophysiological studies in rodents have shown asymmetries in multisensory influences between auditory and visual cortices, reporting robust auditory-induced inhibition in layers 2/3 in visual cortices but not visual-induced inhibition in auditory cortices (Iurilli et al., 2012).

While we observed neither statistically significant crossmodal nor attentional effects on the regional BOLD response profiles in visual cortices, multivariate pattern analyses indicated that modality-specific attention altered the activation patterns at a sub-regional spatial resolution (Figure 4Aii): attention to vision amplified activations in areas representing the central visual field and suppressed activations in areas representing peripheral visual fields (see Figure 4Bii).

The differences in activation patterns for multisensory and attentional influences indicate that the impact of purely auditory looming (Maier et al., 2008; Maier et al., 2004) on visual cortices cannot solely be attributed to a withdrawal of attentional resources from vision. Instead, purely auditory looming may inhibit activations in visual cortices via direct connectivity between auditory and visual areas (Bizley et al., 2007; Budinger et al., 2006; Campi et al., 2010; Falchier et al., 2002; Ibrahim et al., 2016; Rockland and Ojima, 2003). In line with this conjecture, research in rodents has shown that auditory stimuli modulate activity in supragranular layers in primary visual areas via direct connectivity from auditory cortices and translaminar inhibitory circuits, although the studies disagreed on whether layer 5 (Iurilli et al., 2012) or layer 1 (Deneux et al., 2019; Ibrahim et al., 2016) was the primary target of auditory inputs.

Multisensory effects in visual and auditory cortices – a comparison

Our results reveal distinct laminar profiles for multisensory deactivations in auditory and visual cortices. In auditory cortices, the visual-induced deactivations were constant across laminae; in visual cortices, the deactivations increased linearly (in absolute magnitude) toward the cortical surface. These distinct laminar profiles converge with previous research in rodents that also revealed visual influences in auditory cortices predominantly in infragranular layers (Morrill and Hasenstaub, 2018), but auditory influences in visual cortices mainly in supragranular layers (Deneux et al., 2019; Ibrahim et al., 2016; Iurilli et al., 2012). These distinct laminar profiles may suggest that multisensory influences are mediated by different neural circuitries in auditory and visual cortices. However, the distinct laminar profiles across cortical areas may reflect differences not only in neural circuitries but also in vascular organization. Moreover, a wealth of research has previously shown that multisensory responses profoundly depend on stimulus salience and other input characteristics (Deneux et al., 2019; Meijer et al., 2017; Stein et al., 2020; Werner and Noppeney, 2010). It is currently unclear how these factors influence laminar BOLD-response profiles. Finally, interpreting laminar BOLD-response profiles in relation to previous findings in rodents remains tentative because of cross-species and methodological differences.

Conclusions

Using submillimetre resolution fMRI at 7T, we resolved BOLD response and activation patterns of multisensory and attentional influences across cortical depth in early sensory cortices. In visual cortices auditory stimulation induced widespread inhibition, while attention to vision enhanced central but suppressed peripheral field representations. In auditory cortices, competitive and cooperative multisensory interactions were stronger deep within the cortex, while attentional influences were greatest at the cortical surface. The distinctiveness of these depth-dependent activation profiles and patterns suggests that multisensory and attentional mechanisms control information flow via partly distinct neural circuitries.

Materials and methods

All procedures were approved by the Ethics Committee of the University of Leipzig.

Participants

Thirteen healthy German native speakers (6 females; 7 males; mean age: 24.8 years, standard deviation: 1.5 years, range: 22–27 years) gave written informed consent to participate in this fMRI study. Participants reported no history of psychiatric or neurological disorders, and no current use of any psychoactive medications. All had normal or corrected to normal vision and reported normal hearing. Two of those subjects did not complete all fMRI sessions and were therefore not included in the analysis.

Figure titles: Experimental design, timeline and cortical layering; Auditory and visual deactivations; Cross-modal modulation in auditory areas; Effects of the cross-modal modulation on the laminar BOLD response and decoding accuracy profiles in auditory areas; Effects of the attentional modulation (irrespective of stimulus type) on the laminar BOLD response and decoding accuracy profiles; Attentional modulation.

Author details

Remi Gau

Pierre-Louis Bazin

Robert Trampel

Robert Turner

Uta Noppeney
