
Categorical representation from sound and sight in the ventral occipito-temporal cortex of sighted and blind

  1. Stefania Mattioni  Is a corresponding author
  2. Mohamed Rezk
  3. Ceren Battal
  4. Roberto Bottini
  5. Karen E Cuculiza Mendoza
  6. Nikolaas N Oosterhof
  7. Olivier Collignon  Is a corresponding author
  1. Institute of research in Psychology (IPSY) & Institute of Neuroscience (IoNS) - University of Louvain (UCLouvain), Belgium
  2. Centre for Mind/Brain Sciences, University of Trento, Italy
Research Article
Cite this article as: eLife 2020;9:e50732 doi: 10.7554/eLife.50732

Abstract

Is vision necessary for the development of the categorical organization of the Ventral Occipito-Temporal Cortex (VOTC)? We used fMRI to characterize VOTC responses to eight categories presented acoustically in sighted and early blind individuals, and visually in a separate sighted group. We observed that VOTC reliably encodes sound categories in sighted and blind people using a representational structure and connectivity partially similar to the one found in vision. Sound categories were, however, more reliably encoded in the blind than the sighted group, using a representational format closer to the one found in vision. Crucially, VOTC in blind represents the categorical membership of sounds rather than their acoustic features. Our results suggest that sounds trigger categorical responses in the VOTC of congenitally blind and sighted people that partially match the topography and functional profile of the visual response, despite qualitative nuances in the categorical organization of VOTC between modalities and groups.

eLife digest

The world is full of rich and dynamic visual information. To avoid information overload, the human brain groups inputs into categories such as faces, houses, or tools. A part of the brain called the ventral occipito-temporal cortex (VOTC) helps categorize visual information. Specific parts of the VOTC prefer different types of visual input; for example, one part may tend to respond more to faces, whilst another may prefer houses. However, it is not clear how the VOTC characterizes information.

One idea is that similarities between certain types of visual information may drive how information is organized in the VOTC. For example, looking at faces requires using central vision, while looking at houses requires using peripheral vision. Furthermore, all faces have a roundish shape while houses tend to have a more rectangular shape. Another possibility, however, is that the categorization of different inputs cannot be explained just by vision, and is also driven by higher-level aspects of each category. For instance, how humans use or interact with something may also influence how an input is categorized. If categories are established depending (at least partially) on these higher-level aspects, rather than purely through visual likeness, it is likely that the VOTC would respond similarly to both sounds and images representing these categories.

Now, Mattioni et al. have tested how individuals with and without sight respond to eight different categories of information to find out whether or not categorization is driven purely by visual likeness. Each category was presented to participants using sounds while measuring their brain activity. In addition, a group of participants who could see were also presented with the categories visually. Mattioni et al. then compared what happened in the VOTC of the three groups – sighted people presented with sounds, blind people presented with sounds, and sighted people presented with images – in response to each category.

The experiment revealed that the VOTC organizes both auditory and visual information in a similar way. However, there were more similarities between the way blind people categorized auditory information and how sighted people categorized visual information than between how sighted people categorized each type of input. Mattioni et al. also found that the region of the VOTC that responds to inanimate objects massively overlapped across the three groups, whereas the part of the VOTC that responds to living things was more variable.

These findings suggest that the way that the VOTC organizes information is, at least partly, independent from vision. The experiments also provide some information about how the brain reorganizes in people who are born blind. Further studies may reveal how differences in the VOTC of people with and without sight affect regions typically associated with auditory categorization, and potentially explain how the brain reorganizes in people who become blind later in life.

Introduction

The study of sensory deprived individuals represents a unique model system to test how sensory experience interacts with intrinsic biological constraints to shape the functional organization of the brain. One of the most striking demonstrations of experience-dependent plasticity comes from studies of blind individuals showing that the occipital cortex (traditionally considered as visual) massively extends its response repertoire to non-visual inputs (Neville and Bavelier, 2002; Sadato et al., 1998).

But what are the mechanisms guiding this process of brain reorganization? It was suggested that the occipital cortex of people born blind is repurposed toward new functions that are distant from the typical tuning of these regions for vision (Bedny, 2017). In fact, the functional organization of occipital regions has been thought to develop based on innate protomaps implementing a computational bias for low-level visual features including retinal eccentricity bias (Malach et al., 2002), orientation content (Rice et al., 2014), spatial frequency content (Rajimehr et al., 2011) and the average curvilinearity/rectilinearity of stimuli (Nasr et al., 2014). This proto-organization would serve as low-level visual biases scaffolding experience-dependent domain specialization (Arcaro and Livingstone, 2017; Gomez et al., 2019). Consequently, in the absence of visual experience, the functional organization of the occipital cortex could not develop according to this visual proto-organization and those regions may therefore switch their functional tuning toward distant computations (Bedny, 2017).

In striking contrast with this view, several studies suggested that the occipital cortex of congenitally blind people maintains a division of computational labor somewhat similar to the one characterizing the sighted brain (Amedi et al., 2010; Dormal and Collignon, 2011; Ricciardi et al., 2007). Perhaps the most striking demonstration that the occipital cortex of blind people develops a similar coding structure and topography as the one typically observed in sighted people comes from studies exploring the response properties of the ventral occipito-temporal cortex (VOTC). In sighted individuals, lesion and neuroimaging studies have demonstrated that VOTC shows a medial to lateral segregation in response to living and non-living visual stimuli, respectively, and that some specific regions respond preferentially to visual objects of specific categories like the fusiform face area (FFA; Kanwisher et al., 1997; Tong et al., 2000), the extrastriate body area (EBA; Downing et al., 2001) or the parahippocampal place area (PPA; Epstein and Kanwisher, 1998). Interestingly, in early blind people, the functional preference for words (Reich et al., 2011) or letters (Striem-Amit et al., 2012), motion (Dormal et al., 2016; Poirier et al., 2004), places (He et al., 2013; Wolbers et al., 2011), bodies (Kitada et al., 2014; Striem-Amit and Amedi, 2014), tools (Peelen et al., 2013) and shapes (Amedi et al., 2007) partially overlaps with similar categorical responses in sighted people when processing visual inputs.

Distributed multivariate pattern analyses (Haxby et al., 2001) have also supported the idea that the large-scale categorical layout in VOTC shares similarities between sighted and blind people (Handjaras et al., 2016; van den Hurk et al., 2017; Peelen et al., 2014; Wang et al., 2015). For example, it was shown that the tactile exploration of different manufactured objects (shoes and bottles) elicits distributed activity in VOTC of blind people similar to the one observed in sighted people in vision (Pietrini et al., 2004). A recent study demonstrated that the response patterns elicited by sounds of four different categories in the VOTC of blind people could successfully predict the categorical response to images of the same categories in the VOTC of sighted controls, suggesting overlapping distributed categorical responses in sighted for vision and in blind for sounds (van den Hurk et al., 2017). Altogether, these studies suggest that there is more to the development of the categorical response of VOTC than meets the eye (Collignon et al., 2012).

However, these studies leave several important questions unanswered. Even if a spatial overlap exists between the sighted processing visual inputs and the blind processing non-visual material, whether VOTC represents similar informational content in both groups remains unknown. It is possible, for instance, that the overlap in categorical responses between groups comes from the fact that VOTC represents visual attributes in the sighted (Arcaro and Livingstone, 2017; Gomez et al., 2019) and acoustic attributes in the blind due to crossmodal plasticity (Bavelier and Neville, 2002). Indeed, several studies involving congenitally blind people have shown that their occipital cortex may represent acoustic features – for instance frequencies (Huber et al., 2019; Watkins et al., 2013) – which form the basis of the development of categorical selectivity in the auditory cortex (Moerel et al., 2012). Such preferential responses for visual or acoustic features in the sighted and blind, respectively, may lead to overlapping patterns of activity for similar categories while implementing separate computations on the sensory inputs. Alternatively, it is possible that the VOTC of both groups codes for higher-order categorical membership of stimuli presented in vision in the sighted and in audition in the blind, at least partially independently from low-level features of the stimuli.

Moreover, the degree of similarity between the categorical representation in sighted and in blind might differ across different categories: not all the regions in VOTC seem to be affected to the same extent by the crossmodal plasticity reorganization (Bi et al., 2016; Dormal et al., 2018; Wang et al., 2015). This ‘domain–by–modality interaction’ suggests that intrinsic characteristics of objects belonging to different categories might drive this difference. However, a qualitative exploration of the structure of the categorical representation in the VOTC of blind and sighted is still missing.

Another unresolved but important question is whether sighted people also show categorical responses in VOTC to acoustic information similar to the one they show in vision. For instance, the two multivariate studies using sensory (not word) stimulation (tactile, Pietrini et al., 2004; auditory, van den Hurk et al., 2017) of various categories in sighted and blind either did not find the existence of category-related patterns of response in the ventral temporal cortex of sighted people (Pietrini et al., 2004) or did not report overlapping distributed response between categories presented acoustically or visually in sighted people (van den Hurk et al., 2017). Therefore, it remains controversial whether similar categorical responses in VOTC for visual and non-visual sensory stimuli only emerge in the absence of bottom-up visual inputs during development or whether it is an organizational property also found in the VOTC of sighted people.

Finally, it has been suggested that VOTC regions might display a similar functional profile for sound and sight in sighted and blind because different portions of this region integrate specific large-scale brain networks sharing similar functional coding. However, empirical evidence supporting this mechanistic account remains scarce.

With these unsolved questions in mind, we relied on a series of complementary multivariate analyses in order to carry out a comprehensive mapping of the representational geometry underlying low-level (acoustic or visual features mapping) and categorical responses to images and sounds in the VOTC of sighted and early blind people.

Altogether, our results demonstrate that early visual deprivation triggers an extension of the intrinsic, partially non-visual, categorical organization of VOTC, potentially supported by a connectivity bias of portions of VOTC with specific large-scale functional brain networks. However, the categorical representation of the auditory stimuli in VOTC of both blind and sighted individuals exhibits different qualitative nuances compared to the categorical organization generated by visual stimuli in sighted people.

Results

Topographical selectivity map

Figure 1B represents the topographical selectivity maps, which show the voxel-wise preferred stimulus condition based on a winner-take-all approach (for the four main categories: animals, humans, small objects and places). In the visual modality, we found the well-known functional selectivity map for visual categories (Julian et al., 2012; Kanwisher, 2010). The auditory selectivity maps of the blind subjects partially matched the visual map obtained in sighted controls during vision (r = 0.19, pFDR <0.001).

Figure 1 (with 1 supplement)
Experimental design and topographical selectivity maps.

(A) Categories of stimuli and design of the visual (VIS) and auditory (AUD) fMRI experiments. (B) Averaged unthresholded topographical selectivity maps for the sighted-visual (top), the blind-auditory (center) and the sighted-auditory (bottom) participants. These maps visualize the functional topography of VOTC to the main four categories in each group. These group maps are created for visualization purposes only, since statistics are run from single-subject maps (see methods). To obtain those group maps, we first averaged the β-values among participants of the same group in each voxel inside the VOTC mask for each of our 4 main conditions (animals, humans, manipulable objects and places) separately and we then assigned to each voxel the condition producing the highest β-value. We decided to represent maps including the 4 main categories (instead of 8) to simplify visualization of the main effects (the correlation values are almost identical with 8 categories and those maps can be found in supplemental material).

In addition, a similar selectivity map was also observed in the sighted controls using sounds. The correlation was significant both with the visual map in sighted (r = 0.14, pFDR <0.001) and with the auditory map in the blind (r = 0.06, pFDR = 0.001). The correlation between EBa and SCa was significantly lower than both the correlation between SCv and EBa (pFDR = 0.0003) and the correlation between SCv and SCa (pFDR = 0.0004). Instead, the magnitude of correlation between EBa and SCv was not significantly different from the correlation between SCa and SCv (pFDR = 0.233).

In Figure 1B we report the results on the four main categories for simplicity of visualization; however, in the supplemental material we show that the results including eight categories are almost identical (Figure 1—figure supplement 1).
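The winner-take-all assignment described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the nested-list data layout and the toy β-values are all assumptions made for the example.

```python
from typing import Dict, List


def winner_take_all(betas: Dict[str, List[List[float]]]) -> List[str]:
    """betas[category][subject][voxel] -> preferred category for each voxel."""
    categories = list(betas)
    n_voxels = len(betas[categories[0]][0])
    preferred = []
    for v in range(n_voxels):
        # Average the beta-value across subjects for each category at this voxel.
        means = {
            c: sum(subject[v] for subject in betas[c]) / len(betas[c])
            for c in categories
        }
        # Assign the voxel to the category with the highest mean beta.
        preferred.append(max(means, key=means.get))
    return preferred


# Toy data (made up): 2 subjects, 3 voxels, 4 main categories.
toy_betas = {
    "animals": [[0.9, 0.1, 0.2], [0.7, 0.2, 0.1]],
    "humans": [[0.2, 0.8, 0.1], [0.3, 0.6, 0.2]],
    "manipulable": [[0.1, 0.2, 0.3], [0.1, 0.1, 0.2]],
    "places": [[0.0, 0.1, 0.9], [0.2, 0.3, 0.8]],
}
group_map = winner_take_all(toy_betas)
print(group_map)
```

Passing a single subject's β-values (a list containing one subject) yields a single-subject map of the kind used for the statistics.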

In order to look at the consistency of the topographical representation of the categories across subjects within the same group, we computed the Jaccard similarity between the topographical selectivity map of each subject and the mean topographical map of their own group. The one-sample t-test revealed a significant Jaccard similarity in each category and in each group (in all cases p<0.001, after FDR correction for the 12 comparisons: three groups * four categories), highlighting a significant consistency of the topographical maps for each category between subjects belonging to the same group. We performed a repeated measures ANOVA to look at the differences between categories and groups. We obtained a significant main effect of Category (F(3,138)=18.369; p<0.001) and a significant interaction Group*Category (F(6,138)=4.9; p<0.001), whereas the main effect of Group was not significant (F(2,46)=1.83; p=0.17). We then ran post-hoc comparisons. In SCv we did not find any difference between categories, meaning that the consistency (i.e. the Jaccard similarity) of the topographical maps for the visual stimuli in sighted was similar for each category. In the SCa group, the only two significant differences emerged between the category ‘big objects and places’ and the two animate categories: humans (p=0.01) and animals (p<0.008). In both cases the consistency of the topographical maps in the big objects and places category was significantly higher compared to the consistency in the human and animal categories. Finally, in the blind group only the humans and the manipulable objects categories did not show a significant difference. The animal category was, indeed, significantly lower than the humans (p=0.01), the manipulable objects (p=0.008) and the big objects and places (p=0.002). Both the human and the manipulable objects categories were significantly lower compared to the big objects and places category (p=0.004 and p=0.007).
Finally, when we look at the difference between the three groups for each category, the main differences emerged from the animals and the big objects and places categories. The animal category showed a significantly lower Jaccard similarity within the EBa group compared to both SCa (p=0.008) and SCv (p=0.001) groups, while the similarity of the big objects and places category was significantly higher in EBa compared to SCv (p=0.038).
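As an illustration of the consistency measure used here, the Jaccard similarity for one category can be computed as the number of voxels assigned to that category in both maps divided by the number assigned to it in at least one map. The sketch below assumes maps stored as lists of category labels (one per voxel); the names and toy maps are hypothetical, and returning 0.0 for an empty union is a convention chosen for this example.

```python
from typing import List


def jaccard_for_category(
    subject_map: List[str], group_map: List[str], category: str
) -> float:
    """Jaccard index between the voxel sets selective for `category` in two maps."""
    subject_voxels = {i for i, label in enumerate(subject_map) if label == category}
    group_voxels = {i for i, label in enumerate(group_map) if label == category}
    union = subject_voxels | group_voxels
    if not union:
        return 0.0  # assumed convention: no selective voxel in either map
    return len(subject_voxels & group_voxels) / len(union)


# Toy winner-take-all maps over 5 voxels (made up for illustration).
subject_map = ["animals", "animals", "places", "humans", "places"]
group_mean_map = ["animals", "humans", "places", "humans", "places"]

for category in ("animals", "humans", "places"):
    print(category, jaccard_for_category(subject_map, group_mean_map, category))
```

The same function applies to the between-modality comparison, with the averaged visual map of the sighted group passed as the second argument.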

In addition, we wanted to explore the similarities and differences among the topographical representations of our categories when they were presented visually compared to when they were presented acoustically, both in sighted and in blind. To do that, we computed the Jaccard similarity index for each category, between the topographical map of each blind and sighted subject in the auditory experiment and the averaged topographical selectivity map of the sighted in the visual experiment (see Figure 2C for the results). The one-sample t-tests revealed a significant similarity between EBa and SCv and between SCa and SCv in each category (pFDR <0.001 in all cases). The repeated measures ANOVA highlighted a significant main effect of Category (F(2.2,69.8) = 31.17, p<0.001). In both group comparisons the Jaccard similarity was higher for the big objects and places category than for the other three categories (animal: p<0.001; human: p<0.001; manipulable: p<0.001).

Figure 2 (with 1 supplement)
Voxels’ count and Jaccard analyses.

(A) Number of selective voxels for each category in each group, within VOTC. Each circle represents one subject, the colored horizontal lines represent the group average and the vertical black bars are the standard error of the mean across subjects. (B) The Jaccard similarities values within each group, for each of the four categories. (C) The Jaccard similarity indices between the EBa and the SCv groups (left side) and the Jaccard similarity indices between the SCa and the SCv groups (right side).

No difference emerged between groups, suggesting a comparable level of similarity of the auditory topographical maps (both in blind and in sighted) with the visual topographical map in sighted participants.

Finally, since the degree of overlap highlighted by the Jaccard similarity values might be driven by the number of voxels selective for each category, we looked at the number of voxels showing the preference for each category in each group. The one-sample t-test revealed a significant number of selective voxels in each category and in each group (in all cases p<0.001, after FDR correction for the 12 comparisons: 3 groups * 4 categories). We performed a repeated measures ANOVA to look at the differences between categories and groups. In SCv we found that a smaller number of voxels shows selectivity for the human category compared to the others (human vs animal: t(15)=-3.27; pFDR = 0.03; human vs manipulable: t(15)=2.60; pFDR = 0.08; human vs big: t(15)=-4.16; pFDR = 0.01). In EBa, instead, there is a lower number of voxels preferring animals compared to non-living categories (animals vs manipulable: t(15)=-2.47; pFDR = 0.09; animals vs big: t(15)=-3.22; pFDR = 0.03). Finally, in SCa the number of voxels selective for the big objects and places category was significantly higher than the number of voxels selective for the manipulable category (t(16)=3.27; pFDR = 0.03). Importantly, when we look at the difference between the 3 groups for each category, the main difference emerged from the animal category. In this category, the ANOVA revealed a main effect of group (F(2,46)=3.91; p=0.03). The post-hoc comparisons revealed that this difference was mainly driven by the reduced number of voxels selective for the animal category in EBa compared to SCv (p=0.02).
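Many p-values in this section are reported after FDR correction across a family of tests (for example the 12 comparisons above). The text here does not name the procedure, so the sketch below assumes the widely used Benjamini-Hochberg step-up method; the function name and the example p-values are illustrative only.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values can be forced to be monotonically non-decreasing in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up uncorrected p-values for a family of four tests.
raw_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw_p))
```

A test is then declared significant when its adjusted p-value falls below the chosen threshold.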

For the results of the same analysis on the 8 different categories see also Figure 1—figure supplement 1 and Figure 2—figure supplement 1.

Binary MVP classification

Figure 3B represents the results from the average binary classification analyses for each group and every ROI (FDR corrected). In SCv and in EBa the averaged decoding accuracy was significantly higher than chance level in both EVC (SCv: DA = 69%; t(15)=6.69, pFDR <0.00001; EBa: DA = 55%; t(15)=4.48, pFDR = 0.0006) and VOTC (SCv: DA = 71%; t(15)=7.37, pFDR <0.00001; EBa: DA = 57%; t(15)=8.00, pFDR <0.0001). In the SCa the averaged decoding accuracy was significantly higher than the chance level in VOTC (DA = 54%; t(16)=4.32, pFDR = 0.0006) but not in EVC (DA = 51%; t(16)=1.70, pFDR = 0.11). Moreover, independent sample t-tests revealed higher decoding accuracy values in EBa when compared to SCa in both EVC (t(31)=2.52, pFDR = 0.017) and VOTC (t(31)=2.08, pFDR = 0.046).
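To make the statistics above concrete: with 8 categories there are 28 unordered category pairs, each decoded by a two-class classifier whose chance level is 50%, and the averaged accuracy per subject is compared with chance. The sketch below computes a one-sample t statistic for that comparison; the helper names and the accuracy values are hypothetical, and the p-value lookup is omitted.

```python
import math
import statistics
from itertools import combinations
from typing import List

CHANCE_LEVEL = 0.5  # two-class decoding, so chance is 50%


def n_binary_classifications(n_categories: int) -> int:
    """Number of unordered category pairs (28 for 8 categories)."""
    return len(list(combinations(range(n_categories), 2)))


def one_sample_t(values: List[float], popmean: float) -> float:
    """One-sample t statistic, with len(values) - 1 degrees of freedom."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - popmean) / standard_error


# Hypothetical per-subject accuracies, each already averaged over the 28 pairs.
toy_accuracies = [0.55, 0.60, 0.65]
t_value = one_sample_t(toy_accuracies, CHANCE_LEVEL)
print(n_binary_classifications(8), round(t_value, 3))
```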

Regions of interest and classification results.

(A) Representation of the 2 ROIs in one representative subject’s brain; (B) Binary decoding averaged results in early visual cortex (EVC) and ventral occipito-temporal cortex (VOTC) for visual stimuli in sighted (green), auditory stimuli in blind (orange) and auditory stimuli in sighted (blue). ***p<0.001, **p<0.05.

Results from each binary classification analysis (n = 28) for each group are represented in Figure 4 panel A1 for EVC and in panel B1 for VOTC. The p-values for each t-test are reported in the SI-table 3.

EVC and VOTC functional profiles.

(A1 and B1) Binary decoding bar plots. For each group (SCv: top; EBa: center; SCa: bottom) the decoding accuracy from the 28 binary decoding analyses are represented. Each column represents the decoding accuracy value coming from the classification analysis between 2 categories. The 2 dots under each column represent the 2 categories. (A2 and B2) The 28 decoding accuracy values are represented in the form of a dissimilarity matrix. Each column and each row of the matrix represent one category. In each square there is the accuracy value coming from the classification analysis of 2 categories. Blue color means low decoding accuracy values and yellow color means high decoding accuracy values. (A3 and B3) Binary decoding multidimensional scaling (MDS). The categories have been arranged such that their pairwise distances approximately reflect response pattern similarities (dissimilarity measure: accuracy values). Categories placed close together were based on low decoding accuracy values (similar response patterns). Categories arranged far apart generated high decoding accuracy values (different response patterns). The arrangement is unsupervised: it does not presuppose any categorical structure (Kriegeskorte et al., 2008b). (A4 and B4) Binary decoding dendrogram. We performed hierarchical cluster analysis (based on the accuracy values) to assess if EVC (A4) and VOTC (B4) response patterns form clusters corresponding to natural categories in the 3 groups (SCv: top; EBa: center; SCa: bottom).

RSA: Correlation between the neural dissimilarity matrices of the 3 groups

We used the accuracy values from the binary classifications to build neural dissimilarity matrices for each subject in EVC (Figure 4 - Panel A2) and in VOTC (Figure 4 - Panel B2). Then, in every ROI we computed the correlations between DSMs for each pair of groups (i.e. SCv-EBa; SCv-SCa; EBa-SCa).
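A minimal sketch of this step: the 28 pairwise decoding accuracies fill a symmetric 8 x 8 dissimilarity matrix, and two such matrices are compared by correlating their upper triangles. The category labels and accuracy values below are placeholders, and the use of Pearson's r is an assumption, since this section does not state which coefficient was used or how single-subject DSMs were combined across groups.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def build_dsm(categories: Sequence[str], pair_accuracy: Dict[Tuple[str, str], float]) -> Matrix:
    """Fill a symmetric matrix with pairwise decoding accuracies (diagonal = 0)."""
    index = {c: i for i, c in enumerate(categories)}
    dsm = [[0.0] * len(categories) for _ in categories]
    for (a, b), accuracy in pair_accuracy.items():
        dsm[index[a]][index[b]] = dsm[index[b]][index[a]] = accuracy
    return dsm


def upper_triangle(dsm: Matrix) -> List[float]:
    """Vectorize the off-diagonal upper triangle (28 values for 8 categories)."""
    return [dsm[i][j] for i, j in combinations(range(len(dsm)), 2)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


categories = [f"category_{k}" for k in range(1, 9)]  # hypothetical labels
pair_accuracy = {
    (a, b): 0.5 + 0.05 * abs(categories.index(a) - categories.index(b))
    for a, b in combinations(categories, 2)
}
dsm_one = build_dsm(categories, pair_accuracy)
# Second toy DSM: every accuracy shifted by a constant, so the correlation is 1.
dsm_two = build_dsm(categories, {pair: acc + 0.1 for pair, acc in pair_accuracy.items()})
r = pearson(upper_triangle(dsm_one), upper_triangle(dsm_two))
print(round(r, 6))
```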

In EVC, the permutation test revealed a significant positive correlation only between the SCv and the EBa DSMs (mean r = 0.19; pFDR = 0.0002), whereas negative correlations emerged between the SCv and SCa DSMs (mean r = –0.18; pFDR <0.001) and between the SCa and the EBa DSMs (mean r = –0.03; pFDR = 0.18). Moreover, the correlation between SCv and EBa was significantly higher compared to both the correlation between SCv and SCa (mean corr. diff = 0.38; pFDR = 0.008) and the correlation between SCa and EBa (mean corr. diff = 0.22; pFDR = 0.008).

In VOTC, we observed a significant correlation for all the pairs of groups: SCv and EBa (r = 0.34; pFDR = 0.0002), SCv and SCa (r = 0.1; pFDR = 0.002), SCa and EBa (r = 0.1; pFDR = 0.002). Moreover, the correlation between SCv and EBa was significantly higher compared to both the correlation between SCv and SCa (corr. diff = 0.25; pFDR = 0.008) and the correlation between SCa and EBa (corr. diff = 0.25; pFDR = 0.008).
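The permutation scheme behind these p-values is not detailed in this section. One common choice for testing the correlation between two DSMs, shown here purely as an assumed illustration, is to shuffle the category labels of one matrix (permuting its rows and columns together), recompute the correlation each time, and take the proportion of shuffled correlations at least as large as the observed one. All names, the one-tailed test, the number of permutations and the toy matrix are assumptions.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence

Matrix = List[List[float]]


def upper_triangle(dsm: Matrix) -> List[float]:
    return [dsm[i][j] for i, j in combinations(range(len(dsm)), 2)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(dsm_a: Matrix, dsm_b: Matrix, n_permutations: int = 1000, seed: int = 0) -> float:
    """One-tailed p-value for the correlation between two DSMs."""
    rng = random.Random(seed)
    vec_b = upper_triangle(dsm_b)
    observed = pearson(upper_triangle(dsm_a), vec_b)
    n = len(dsm_a)
    count = 0
    for _ in range(n_permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        # Relabel categories: permute rows and columns of dsm_a together.
        shuffled = [[dsm_a[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(shuffled), vec_b) >= observed:
            count += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (count + 1) / (n_permutations + 1)


# Toy 8 x 8 DSM with a graded structure (made up for illustration).
toy_dsm = [[0.0 if i == j else 0.5 + 0.05 * abs(i - j) for j in range(8)] for i in range(8)]
p_value = permutation_p_value(toy_dsm, toy_dsm)
print(p_value)
```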

Hierarchical clustering analysis on the brain categorical representation

We implemented this analysis to go beyond the magnitude of correlation values and to qualitatively explore the representational structure of our ROIs in the 3 groups. Using this analysis, we can see which categories are clustered together in the brain representations according to a specified number n of clusters (see Figure 5 for detailed results). In VOTC the most striking results are related to the way the animal category is represented in the EBa group compared to the SCv. In fact, when the hierarchical clustering was stopped at 2 clusters in SCv, we observed a clear living vs non-living distinction. In EBa, instead, the division was between humans and non-humans (including all the non-living categories plus the animals). When the clusters are 3, we see in SCv a separation into (1) non-living; (2) human; (3) animals. In EBa, instead, the animals remain clustered with non-living, so that the 3 clusters are: (1) Non-living and animals; (2) Human vocalization voices; (3) Human non-vocalization voices. In the case of the 4 clusters, in both SCv and EBa the additional 4th cluster is represented by the manipulable-tools category, while in the EBa the animals remain with the rest of the non-living categories. In the VOTC of the SCa group, the structure of the categorical representation is less straightforward, despite the significant correlation with the DSMs of SCv. For example, we cannot clearly discern the distinction into living/non-living, or into humans and animals. However, there are some specific categories such as manipulable-graspable objects, human vocalizations and environmental sounds that show a segregated representation from the others. When we observe the clustering in EVC, we see that there is no clear categorical clustering in this ROI in any of the groups, with the exception of the SCv in which the human stimuli tend to cluster together.

Finally, the clustering analysis on the behavioral data (see Figure 5) revealed a very similar way of clustering the categories in the 2 groups of sighted, which at the 4-cluster step show exactly the same structure: (1) Animate categories including animals and humans; (2) Manipulable Objects; (3) Big mechanical objects; (4) Big Environmental category. In the EBa, instead, we find a different clustering structure with the animal and the human categories being separated.

Figure 5 (with 1 supplement)
Hierarchical clustering of brain data: VOTC and EVC.

Hierarchical clustering on the dissimilarity matrices extracted from EVC (left) and VOTC (right) in the three groups. The clustering was repeated three times for each DSM, stopping it at 2, 3 and 4 clusters, respectively. This allows us to compare the similarities and the differences of the clusters at the different steps across the groups. In the figure each cluster is represented in a different color. The first line represents 2 clusters (green and red); the second line represents 3 clusters (green, red and pink); finally, the third line represents 4 clusters (green, red, pink and light blue).
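The idea of stopping a hierarchical clustering at 2, 3 or 4 clusters can be illustrated with a small agglomerative procedure that repeatedly merges the two closest clusters until the requested number remains. The linkage rule is not specified in this section; average linkage is assumed below, and the function name and the 4-item toy DSM are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def agglomerate(dsm: Matrix, n_clusters: int) -> List[List[int]]:
    """Merge items bottom-up (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(dsm))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean dissimilarity over all cross-cluster pairs.
                total = sum(dsm[i][j] for i in clusters[a] for j in clusters[b])
                distance = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or distance < best[0]:
                    best = (distance, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is still valid
    return sorted(clusters)


# Toy DSM: items 0 and 1 are similar, items 2 and 3 are similar (made-up values).
toy_dsm = [
    [0.00, 0.50, 0.80, 0.80],
    [0.50, 0.00, 0.80, 0.80],
    [0.80, 0.80, 0.00, 0.52],
    [0.80, 0.80, 0.52, 0.00],
]
for k in (2, 3, 4):
    print(k, agglomerate(toy_dsm, k))
```

Running the procedure at successive values of k shows which items stay together as the partition becomes finer, which is the comparison made across groups in the figure.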

RSA: correlation with representational low-level/behavioral models

In a behavioral session following the fMRI session, we asked our participants to rate each possible pair of stimuli in the experiment they took part in (either visual or acoustic) and we built three dissimilarity matrices based on their judgments. A visual exploration of the ratings using the dissimilarity matrix visualization revealed a clustering of the stimuli into a main living/non-living distinction, with some sub-clustering such as humans, animals and objects (Figure 6A). The three DSMs were highly correlated (SCa/EBa: r = 0.85, p<0.001; SCa/SCv: r = 0.95, p<0.001; EBa/SCv: r = 0.89, p<0.001), revealing a similar way to group the stimuli across the three groups following mostly a categorical strategy to classify the stimuli. Based on this observation, we used the behavioral matrices as a categorical/high-level model to contrast with the low-level models built on the physical properties of the stimuli (HmaxC1 and pitch models, Figure 6A).

Figure 6 (with 1 supplement)
Representational similarity analysis (RSA) between brain and representational low-level/behavioral models.

(A) In each ROI we computed the brain dissimilarity matrix (DSM) in every subject based on binary decoding of our 8 different categories. In the visual experiment (left) we computed the partial correlation between each subject’s brain DSM and the behavioral DSM from the same group (SCv-Behav.) regressing out the shared correlation with the HmaxC1 model, and vice versa. In the auditory experiment (right) we computed the partial correlation between each subject’s brain DSM (in both Early Blind and Sighted Controls) and the behavioral DSM from their own group (either EBa-Behav. or SCa-Behav.) regressing out the shared correlation with the pitch model, and vice versa. (B) Results from the Spearman’s correlation between representational low-level/behavioral models and brain DSMs from both EVC and VOTC. On the left are the results from the visual experiment. Dark green: partial correlation between SCv brain DSM and behavioral model; Light green: partial correlation between SCv brain DSM and HmaxC1 model. On the right are the results from the auditory experiment in both early blind (EBa) and sighted controls (SCa). Orange: partial correlation between EBa brain DSM and behavioral model; Yellow: partial correlation between EBa brain DSM and pitch model. Dark blue: partial correlation between SCa brain DSM and behavioral model; Light blue: partial correlation between SCa brain DSM and pitch model. For each ROI and group, the gray background bar represents the reliability of the correlational patterns, which provides an approximate upper bound of the observable correlations between representational low-level/behavioral models and neural data (Bracci and Op de Beeck, 2016; Nili et al., 2014). Error bars indicate SEM. ***p<0.001, **p<0.005, *p<0.05. P values are FDR corrected.

In order to better understand the representational content of VOTC and EVC, we computed second-order partial correlations between each ROI’s DSM and our representational models (i.e. behavioral and low-level DSMs) for each participant. Figure 6B represents the results for the correlations between the brain DSMs in the 3 groups and the representational low-level/behavioral models DSMs (i.e. behavioral, pitch and Hmax-C1 DSMs).
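A second-order partial correlation of this kind can be sketched by rank-transforming the vectorized DSMs (Spearman) and applying the first-order partial correlation formula, which removes from the brain-model correlation the part shared with the control model. This is a minimal illustration under those assumptions; the function names and toy vectors are hypothetical, and a real analysis would use the 28 off-diagonal DSM values per participant.

```python
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_spearman(brain: Sequence[float], model: Sequence[float], control: Sequence[float]) -> float:
    """Spearman correlation of brain and model, partialling out control."""
    rb, rm, rc = ranks(brain), ranks(model), ranks(control)
    r_bm, r_bc, r_mc = pearson(rb, rm), pearson(rb, rc), pearson(rm, rc)
    return (r_bm - r_bc * r_mc) / math.sqrt((1 - r_bc ** 2) * (1 - r_mc ** 2))


# Toy vectors standing in for vectorized DSMs (made up for illustration).
brain_vec = [1.0, 2.0, 3.0, 4.0, 5.0]
behavioral_vec = [1.0, 2.0, 3.0, 4.0, 5.0]
low_level_vec = [2.0, 1.0, 4.0, 3.0, 5.0]
print(partial_spearman(brain_vec, behavioral_vec, low_level_vec))
```

Swapping the second and third arguments gives the complementary quantity, that is, the correlation with the low-level model once the behavioral model is regressed out.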

The permutation test revealed that in SCv the EVC’s DSM was significantly correlated with the Hmax-C1 model (mean r = 0.13, p(one-tailed)FDR = 0.002) but not with the behavioral model (mean r = 0.03; p(one-tailed)FDR = 0.11). Even though the correlation was numerically higher with the Hmax-C1 model than with the behavioral model, a paired samples t-test did not reveal a significant difference between the two (t(15)=1.24, p(one-tailed)FDR = 0.23). The permutation test showed that VOTC’s DSM, instead, was significantly positively correlated with the behavioral model (mean r = 0.34; pFDR <0.001) but negatively correlated with the Hmax-C1 model (r = –0.07; p(one-tailed)FDR = 0.991). A paired samples t-test revealed that the difference between the correlation with the two models was significant (t(15)=6.71, pFDR <0.001).

In the EBa and SCa groups, EVC’s DSMs were significantly correlated with neither the behavioral (EBa: mean r = 0.004; p(one-tailed)FDR = 0.47; SCa: mean r = –0.11, p(one-tailed)FDR = 0.98) nor the pitch model (EBa: mean r = –0.08; p(one-tailed)FDR = 0.94; SCa: mean r = –0.09, p(one-tailed)FDR = 0.98). In contrast, the VOTC’s DSMs were significantly correlated with the behavioral model in EBa (mean r = 0.12, p(one-tailed)FDR = 0.02) but not in SCa (mean r = 0.06; p(one-tailed)FDR = 0.17). Finally, the VOTC’s DSMs were not significantly correlated with the pitch model in either EBa (mean r = –0.03; p(one-tailed)FDR = 0.49) or SCa (mean r = 0.03; p(one-tailed)FDR = 0.39). In addition, a 2 Groups (EBa/SCa) × 2 Models (behavioral/pitch) ANOVA in VOTC revealed a significant main effect of Model (F(1,31)=11.37, p=0.002) and a significant Group × Model interaction (F(1,31)=4.03, p=0.05), whereas the main effect of Group was non-significant (F(1,31)=2.38e−4, p=0.98). A Bonferroni post-hoc test on the main effect of Model confirmed that the correlation was significantly higher for the behavioral model compared to the pitch model (t = 3.18, p=0.003). However, the Bonferroni post-hoc test on the Group × Model interaction revealed that the difference between behavioral and pitch models was significant only in EBa (t = 3.8, p=0.004).

For completeness, we also report here the correlation results before partialling out the behavioral and low-level models from each other. In EVC, the mean correlation with the behavioral model was: r = 0.2 in SCv, r = 0.02 in EBa and r = –0.07 in SCa. In EVC, the mean correlation with the low-level model was: r = 0.21 in SCv (HmaxC1), r = –0.09 in EBa (pitch) and r = –0.06 in SCa (pitch). In VOTC, the mean correlation with the behavioral model was: r = 0.42 in SCv, r = 0.12 in EBa and r = 0.04 in SCa. In VOTC, the mean correlation with the low-level model was: r = 0.15 in SCv (HmaxC1), r = –0.08 in EBa (pitch) and r = 0.005 in SCa (pitch).

RSA: Inter-subjects correlation

We ran this analysis to assess how variable the brain representation in VOTC was across subjects belonging either to the same group or to different groups. Since we have 3 groups, this analysis resulted in 6 different correlation values: 3 values for the 3 within-group correlation conditions (SCv; EBa; SCa) and 3 values for the 3 between-groups correlation conditions (i.e. SCv-EBa; SCv-SCa; EBa-SCa). Results are represented in Figure 7.
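To make the six conditions concrete, the sketch below averages pairwise correlations between subjects' DSM vectors within each group and between each pair of groups. It is only an illustration under stated assumptions: the correlation measure (Pearson here) is not specified in this section, and the function names and toy vectors are hypothetical.

```python
import math
from itertools import combinations, product

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def inter_subject_correlations(groups):
    """groups maps a group name to a list of DSM vectors, one per subject."""
    result = {}
    # Within-group conditions: every pair of distinct subjects in the same group.
    for name, subjects in groups.items():
        pairs = list(combinations(subjects, 2))
        result[name] = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    # Between-groups conditions: every subject of one group with every subject of the other.
    for g1, g2 in combinations(groups, 2):
        pairs = list(product(groups[g1], groups[g2]))
        result[f"{g1}-{g2}"] = sum(pearson(a, b) for a, b in pairs) / len(pairs)
    return result

# Toy data: two subjects per group, four dissimilarity values each (hypothetical).
toy_groups = {
    "SCv": [[1, 2, 3, 4], [1, 2, 3, 5]],
    "EBa": [[1, 2, 4, 3], [2, 1, 3, 4]],
    "SCa": [[4, 3, 2, 1], [1, 3, 2, 4]],
}
isc = inter_subject_correlations(toy_groups)
for condition, value in isc.items():
    print(condition, round(value, 2))
```

With three groups the returned dictionary has three within-group entries and three between-groups entries, matching the six values described above.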

VOTC Inter-subject correlation within and between groups.

Upper panel represents the correlation matrix between the VOTC brain DSM of each subject with all the other subjects (from the same group and from different groups). The mean correlation of each within- and between-groups combination is reported in the bottom panel (bar graphs). The straight line ending with a square represents the average of the correlation between subjects from the same group (i.e. within groups conditions: SCv, EBa, SCa), the dotted line ending with the circle represents the average of the correlation between subjects from different groups (i.e. between groups conditions: SCv-EBa/SCv-SCa/EBa-SCa). The mean correlations are ranked from the highest to the lowest inter-subject correlation values.

The permutation test revealed that the correlation between subjects’ DSMs in the within-group condition was significant in SCv (r = 0.42; pFDR <0.001) and EBa (r = 0.10; pFDR <0.001), whereas it was not significant in SCa (r = –0.03; pFDR = 0.98). Moreover, the correlation between subjects’ DSMs was significant in all three between-groups conditions (SCv-EBa: r = 0.17, pFDR <0.001; SCv-SCa: r = 0.04, pFDR = 0.002; EBa-SCa: r = 0.02; pFDR = 0.04). When we ranked the correlation values (Figure 7), we observed that the highest inter-subject correlation was the within-group SCv condition, which was significantly higher than all the other five conditions. It was followed by the inter-subject correlation between the SCv and EBa groups and the within-group EBa correlation. Interestingly, both the between-groups SCv-EBa and the within-group EBa correlations were significantly higher than the last 3 inter-subject correlation values (between SCv-SCa; between EBa-SCa; within SCa).

Representational connectivity analysis

Figure 8 represents the results from the representational connectivity analysis in VOTC. The permutation analysis highlighted that the representational connectivity profile of VOTC with the rest of the brain is significantly correlated between all pairs of groups (SCv-EBa: mean r = 0.18, pFDR = 0.001; SCv-SCa: mean r = 0.14, pFDR <0.001; EBa-SCa: mean r = 0.16, pFDR <0.001), with no difference between pairs of groups.

Representational connectivity.

(A) Representation of the z-normalized correlation values between the dissimilarity matrix of the three VOTC seeds (left: Fusiform gyrus, center: Parahippocampal gyrus, Right: Infero-Temporal cortex) and the dissimilarity matrix of 27 parcels covering the rest of the cortex in the three groups (top: SCv, central: EBa, bottom: SCa). Blue color represents low correlation with the ROI seed; yellow color represents high correlation with the ROI seed. (B) The normalized correlation values are represented in format of one matrix for each group. This connectivity profile is correlated between groups. SCv: sighted control-vision; EBa: early blind-audition; SCa: sighted control-audition.
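The logic of this analysis can be sketched as follows: each seed DSM is correlated with each parcel DSM, the resulting values are z-normalized into a connectivity profile, and the profiles are then compared between groups. This is a simplified illustration only. It assumes group-level DSM vectors, Pearson correlation at both steps, and toy sizes (two seeds and three parcels rather than 3 seeds and 27 parcels); all names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def connectivity_profile(seed_dsms, parcel_dsms):
    """Correlate every seed DSM with every parcel DSM, then z-normalize the values."""
    values = [pearson(seed, parcel) for seed in seed_dsms for parcel in parcel_dsms]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

# Toy DSM vectors for two groups (hypothetical values).
seeds_a = [[1, 2, 3, 4], [1, 3, 2, 4]]
parcels_a = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]]
seeds_b = [[1, 2, 4, 3], [2, 1, 3, 4]]
parcels_b = [[1, 2, 3, 4], [3, 4, 1, 2], [2, 1, 4, 3]]

profile_a = connectivity_profile(seeds_a, parcels_a)
profile_b = connectivity_profile(seeds_b, parcels_b)

# Between-group similarity of the two connectivity profiles.
similarity = pearson(profile_a, profile_b)
print(round(similarity, 2))
```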

We performed the same analysis in EVC. In this case the permutation analysis revealed a significant correlation only between the representational connectivity profiles of the two groups of sighted participants, SCv and SCa (mean r = 0.17, pFDR <0.001), whereas the correlation of EBa was significant neither with SCv (mean r = 0.06, pFDR = 0.12) nor with SCa (mean r = 0.06, pFDR = 0.11). Moreover, the correlation between SCv and SCa was significantly higher than both the correlation between SCv and EBa (mean diff = 0.11, pFDR = 0.01) and the correlation between SCa and EBa (mean diff = 0.11, pFDR = 0.01).

Discussion

In our study, we demonstrate that VOTC reliably encodes the categorical membership of sounds from eight different categories in sighted and blind people, using a topography (Figure 1B), representational format (Figure 4) and a representational connectivity profile (Figure 8) partially similar to the one observed in response to images of similar categories in vision.

Previous studies using linguistic stimuli had already suggested that VOTC may actually represent categorical information in a more abstracted fashion than previously thought (Handjaras et al., 2016; Borghesani et al., 2016; Striem-Amit et al., 2018b; Peelen and Downing, 2017). However, even if the use of words is very useful in the investigation of pre-existing representations of concepts (Martin et al., 2017), it prevents the investigation of bottom-up perceptual processing. By contrast, in our study we used sensory-related non-linguistic stimuli (i.e. sounds) in order to investigate both the sensory (acoustic) and categorical nature of the representation implemented in VOTC. To the best of our knowledge, only one recent study investigated the macroscopic functional organization of VOTC during categorical processing of auditory and visual stimuli in sighted and in blind individuals (van den Hurk et al., 2017). They found that it is possible to predict the global large-scale distributed pattern of activity generated by different categories presented visually in sighted using the pattern of activity generated by the same categories presented acoustically in early blind. Relying on a different analytical stream, focusing on representational matrices extracted from pairwise decoding of our eight categories, our study confirms and extends those findings by showing that VOTC reliably encodes sound categories in blind people using a representational structure relatively similar to the one found in vision.

Our study goes beyond previous results in at least six significant ways. First, our results demonstrate that VOTC shows categorical responses to sounds in the sighted and the blind in a representational format partially similar to the one elicited by images of the same categories in sighted people (see Figure 1B). Observation of a similar categorical representational structure in VOTC for sounds and images in sighted people is crucial to support the idea that the intrinsic categorical organization of VOTC might be partially independent from vision even in sighted and that such intrinsic multisensory functional scaffolding may constrain the way crossmodal plasticity expresses in early blind people. Second, we observed that blind people show higher decoding accuracies and higher inter-subject consistency in the representation of auditory categories, and that the representational structure of visual categories in sighted was significantly closer to the structure of the auditory categories in blind than in sighted group (see Figure 1B, Figure 3, Figure 4B, Figure 7). This points to the idea that in absence of feed-forward visual input, VOTC increases its intrinsic representation of non-visual information. Third, VOTC shows similar large-scale representational connectivity profiles when processing images in sighted and sounds in sighted and blind people (see Figure 8). This result provides strong support to the general hypothesis that the functional tuning of a region is determined by large-scale connectivity patterns with regions involved in similar coding strategies (Behrens and Sporns, 2012; Mahon and Caramazza, 2011; Passingham et al., 2002). Fourth, our design allowed us to investigate which dimension of our stimuli, either categorical membership or acoustic properties, may determine the response properties of VOTC to sounds. 
By harnessing the opportunities provided by representational similarity analysis, we demonstrate that categorical membership is the main factor that predicts the representational structure of sounds in VOTC in blind people (see Figure 6), rather than lower-level acoustical attributes of sounds that are at least partially at the basis of category selectivity in the temporal cortex (Moerel et al., 2012). These results elucidate for the first time the computational characteristics that determine the categorical response for sounds in VOTC. Fifth, we provided a qualitative exploration of the structure of the categorical representation in the VOTC. The between-groups Jaccard similarity analysis revealed a domain–by–modality interaction (see Figure 2C), with the big objects and places category showing a higher degree of similarity between the auditory and visual representations compared to the other categories. In addition, both the hierarchical clustering and the within-group Jaccard similarity analysis highlighted a domain-by-sensory experience interaction (see Figure 5 and Figure 2B), with the animal category represented differently in blind compared to sighted subjects (Bi et al., 2016; Wang et al., 2015). Finally, our study discloses that categorical membership is encoded in the EVC of blind people only, using a representational format that relates neither to the acoustic nor to the categorical structure of our stimuli, suggesting different mechanisms of reorganization in this posterior occipital region.

 Different visual categories elicit distinct distributed responses in VOTC using a remarkable topographic consistency across individuals (Julian et al., 2012; Kanwisher, 2010). It was suggested that regular visual properties specific to each category like retinotopic eccentricity biases (Gomez et al., 2019; Malach et al., 2002), curvature (Nasr et al., 2014) or spatial frequencies (Rajimehr et al., 2011) could drive the development of categorical response in VOTC for visual information (Andrews et al., 2010; Baldassi et al., 2013; Bracci et al., 2018; Rice et al., 2014; see Op de Beeck et al., 2019 for a recent review on the emergence of category selectivity in VOTC). For instance, the parahippocampal place area (PPA) and the fusiform face area (FFA) receive dominant inputs from downstream regions of the visual system with differential selectivity for high vs low spatial frequencies and peripheral vs. foveal inputs, causing them to respond differentially to place and face stimuli (Levy et al., 2001). These biases for specific visual attributes could be present at birth and represent a proto-organization driving the development of the categorical responses of VOTC based on experience (Arcaro and Livingstone, 2017; Gomez et al., 2019). For instance, a proto-eccentricity map is evident early in development (Arcaro and Livingstone, 2017) and monkeys trained early in life to discriminate different categories varying in their curvilinearity/rectilinearity develop distinct and consistent functional clusters for these categories (Srihasam et al., 2014). Further, adults who had intensive visual experience with Pokémon early in life demonstrate distinct distributed cortical responses to this trained visual category with a systematic location supposed to be based on retinal eccentricity (Gomez et al., 2019).

Although our results by no means disprove the observations that inherent visual biases can influence the development of the functional topography of high-level vision (Gomez et al., 2019; Hasson et al., 2002; Nasr et al., 2014), our data suggest that category membership, independently of visual attributes, is also a key developmental factor that determines the consistent functional topography of the VOTC. Our study demonstrates that VOTC responds to sounds using a distributed functional profile similar to the one found in response to vision, even in people who have never had visual experience.

By orthogonalizing category membership and visual features of visual stimuli, previous studies reported a residual categorical effect in VOTC, highlighting how some of the variance in the neural data of VOTC might be explained by high-level categorical properties of the stimuli even when the contribution of the basic low-level features has been controlled for (Bracci and Op de Beeck, 2016; Kaiser et al., 2016; Proklova et al., 2016). Category-selectivity has also been observed in VOTC during semantic tasks when word stimuli were used, suggesting an involvement of the occipito-temporal cortex in the retrieval of category-specific conceptual information (Handjaras et al., 2016; Borghesani et al., 2016; Peelen and Downing, 2017). Moreover, previous research has shown that learning to associate semantic features (e.g., ‘floats’) and spatial contextual associations (e.g., ‘found in gardens’) with novel objects influences VOTC representations, such that objects with contextual connections exhibited higher pattern similarity after learning in association with a reduction in pattern information about the object's visual features (Clarke et al., 2016).

Even if we cannot fully exclude that the processing of auditory information in the VOTC of sighted people could be the by-product of the visual imagery triggered by the non-visual stimulation (Cichy et al., 2012; Kosslyn et al., 1995; Reddy et al., 2010; Slotnick et al., 2005; Stokes et al., 2009), we find it unlikely. First, we purposely included two separate groups of sighted people, each one performing the experiment in one modality only, in order to minimize the influence of having heard or seen the stimuli in the other modality in the context of the experiment. Also, we used a fast event-related design that restricted the time window to build a visual image of the actual sound since the next sound was presented quickly after (Logie, 1989). Moreover, we would expect that visual imagery would also trigger information to be processed in posterior occipital regions (Kosslyn et al., 1999). Instead, we found that EVC does not discriminate the different sounds in the sighted group (Figure 3B and Figure 4A). Finally, to further test the visual imagery hypothesis, we correlated the brain representational space of EVC in SCa with a low-level visual model (i.e. HmaxC1). A significant positive correlation between the two would support the presence of a visual imagery mechanism when sighted people hear sounds of categories. We found, instead, a non-significant negative correlation, making the visual imagery hypothesis even less likely to explain our results.

Comparing blind and sighted individuals arguably provides the strongest evidence for the hypothesis that category-selective regions traditionally considered to be ‘high-level visual regions’ can develop independently of visual experience. Interestingly, we found that the decoding accuracy for the auditory categories in VOTC is significantly higher in the early blind compared to the sighted control group (Figure 3B). In addition, the correlation between the topographic distribution of categorical response observed in VOTC was stronger in blind versus sighted people (Figure 1B). Moreover, the representational structure of visual categories in sighted was significantly closer to the structure of the auditory categories in blind than in sighted (Figure 4A2). Finally, the representation of the auditory stimuli in VOTC is more similar between blind than between sighted subjects (Figure 7), showing an increased inter-subject stability of the representation in case of early visual deprivation. All together, these results not only demonstrate that a categorical organization similar to the one found in vision could emerge in VOTC in absence of visual experience, but also that such categorical response to sounds is actually enhanced and more stable in congenitally blind people.

Several studies have shown that in the absence of vision, the occipital cortex enhances its response to non-visual information processing (Collignon et al., 2012; Collignon et al., 2011; Sadato et al., 1998). However, the mechanistic principles guiding the expression of this crossmodal plasticity remain debated. For instance, it was suggested that early visual deprivation changes the computational nature of the occipital cortex which would reorganize itself for higher-level functions, distant from the ones typically implemented for visual stimuli in the same region (Bedny, 2017). In contrast with this view, our results demonstrate that the expression of crossmodal plasticity, at least in VOTC (see differences in EVC below), is constrained by the inherent categorical structure endowed in this region. First, we highlighted a remarkably similar functional profile of VOTC for visual and auditory stimuli in sighted and in early blind individuals (Figure 4B). In addition, we showed that VOTC is encoding a similar categorical dimension of the stimuli across different inputs of presentation and different visual experiences (Figure 6B). In support of this idea, we recently demonstrated that the involvement of the right dorsal occipital region for arithmetic processing in blind people actually relates to the intrinsic ‘spatial’ nature of these regions, a process involved in specific arithmetic computation (e.g. subtraction but not multiplication) (Crollen et al., 2019). Similarly, the involvement of VOTC during ‘language’ as observed in previous studies (Bedny et al., 2011; Burton et al., 2006; Kim et al., 2017; Lane et al., 2015; Röder et al., 2002) may relate to the fact that some level of representation involved in language (e.g. semantic) can be intrinsically encoded in VOTC as supported by the current results (Huth et al., 2016). 
In fact, we suggest that VOTC regions have innate predispositions relevant to important categorical distinctions that cause category-selective patches to emerge regardless of sensory experience. Why would the ‘visual’ system embed representation of categories independently of their perceptual features? One argument might be that items from a particular broad category (e.g. inanimate) are so diverse that they may not share systematic perceptual features and therefore a higher-level of representation, partially abstracted from vision, might prove important. Indeed, we gather evidence in support of an extension of the intrinsic categorical organization of VOTC that is already partially independent from vision in sighted. This finding represents an important step forward in understanding how experience and intrinsic constraints interact in shaping the functional properties of VOTC. An intriguing possibility raised by our results is that the crossmodal plasticity observed in early blind individuals may actually serve to maintain the functional homeostasis of occipital regions.

What would be the mechanism driving the preservation of the categorical organization of VOTC in case of congenital blindness? It is thought that the specific topographic location of a selective brain function is constrained by an innate profile of functional and structural connections with extrinsic brain regions (Passingham et al., 2002). Since the main fiber tracts are already present in full-term human neonates (Dubois et al., 2014; Dubois et al., 2016; Kennedy et al., 1999; Kostović and Judaš, 2010; Marín-Padilla, 2011; Takahashi et al., 2011), such an initial connectome may at least partly drive the functional development of a specific area. Supporting this hypothesis, the visual word form area (VWFA) in VOTC (McCandliss et al., 2003) shows robust and specific anatomical connectivity to EVC and to frontotemporal language networks and this connectivity fingerprint can predict the location of VWFA even before a child learns to read (Saygin et al., 2016). Similarly, the anatomical connectivity profile can predict the location of the fusiform face area (FFA) (Saygin et al., 2012). In addition to intra-occipital connections, FFA has a direct structural connection with the temporal voice area (TVA) in the superior temporal sulcus (Benetti et al., 2018; Blank et al., 2011) thought to support similar computations applied on faces and voices as well as their integration (von Kriegstein et al., 2005). Interestingly, recent studies suggested that the maintenance of those selective structural connections between TVA and FFA explains the preferential recruitment of TVA for face processing in congenitally deaf people (Benetti et al., 2018; Benetti et al., 2017). This TVA-FFA connectivity may explain why voices preferentially map slightly more lateral to the mid-fusiform sulcus (Figure 1B). 
Similarly, sounds of big objects or natural scenes preferentially recruit more mesial VOTC regions (Figure 1B), overlapping with the parahippocampal place area, potentially due to the preserved pattern of structural connectivity of those regions in blind people (Wang et al., 2017). The existence of these innate large-scale brain connections that are specific for each region supporting separate categorical domains may provide the structural scaffolding on which crossmodal inputs capitalize to reach VOTC in both sighted and blind people, potentially through feed-back connections. Indeed, it has been shown that the main white matter tracts, including those involving occipital regions, are not significantly different between blind and sighted individuals (Shimony et al., 2006). In EB, the absence of competitive visual inputs typically coming from feed-forward inputs from EVC may actually trigger an enhanced weighting of those feed-back inter-modal connections leading to an extension of selective categorical response to sounds in VOTC, as observed in the current study. Our results provide crucial support for this ‘biased connectivity’ hypothesis (Hannagan et al., 2015; Mahon and Caramazza, 2011) showing that VOTC subregions are part of a large-scale functional network representing categorical information in a format that is at least partially independent from the modality of the stimuli presentation and from the visual experience.

Even though the categorical representation of VOTC appears, to a certain degree, immune to input modality and visual experience, there are also several differences emerging from the categorical representation of sight and sounds in the sighted and blind. Previous studies already suggested that intrinsic characteristics of objects belonging to different categories might drive different representations in the VOTC of the blind (Bi et al., 2016; Büchel, 2003; Wang et al., 2015). In line with this idea, the between-groups Jaccard similarity analysis (see Figure 2C) revealed a domain–by–modality interaction, with the big objects and places categories showing the highest degree of similarity between vision and audition (both in blind and in sighted). In contrast, the lowest topographical consistency between groups was found for the animal category. We found that in the early blind group the number of voxels selective for animals is reduced compared to the other categories (see Figure 2A), suggesting that the animal category is underrepresented in the VOTC of early blind. Our hierarchical clustering analyses (see Figure 5 and Figure 5—figure supplement 1) also highlight a reduced animate/inanimate division in the EBa group, with the animal and the human categories not clustering together and the animals being represented more like tools or big objects in the EBa. Interestingly, this is the case in both the categorical representation of VOTC (Figure 5) and the behavioral evaluation of our stimuli made by blind individuals (Figure 5—figure supplement 1). An explanation for this effect could be the different ways in which blind and sighted individuals perceive and interact with animals. In fact, if we exclude pets (only 1 out of the six animals we included in this study), sighted individuals normally perceive the animacy of animals (such as bird, donkey, horse etc.) mostly through vision (either in real life or in pictures/movies). 
Blind people, instead, normally learn the peculiar shape of each animal by touching static miniature models of them. Moreover, when blind people hear the sounds of these animals without seeing them, they might combine these sounds with the rest of the environmental sounds, and this is indeed what we see in the behavioral ratings, in which only blind subjects cluster together animals and big environmental sounds. These results therefore reveal that the modality of presentation and/or the visual experience do affect the qualitative structure of the categorical representation in VOTC, and this effect is stronger for some categories (i.e. animals) compared to others (i.e. inanimate).

A different profile emerged from EVC. First, sound categories could be decoded in the EVC of EB (Figure 3B) but not of the SC. In addition, the representational structure of EVC for sounds correlated to the one found in vision only in EB (Figure 4A2). However, neither the categorical membership nor the acoustic attributes of sounds correlated with the representational structure found in the EVC of EB (Figure 6B). A possible explanation for this result is that the posterior part of the occipital cortex in EB is the region that distances itself the most from the native computation it typically implements (Bi et al., 2016; Wang et al., 2015). In support of this possibility, the representational connectivity profile of EVC in EBa did not show any similarity with the one of sighted (neither SCv nor SCa). Because this area has a native computation that does not easily transfer to another sensory modality (i.e. low-level vision), it may therefore rewire itself for distant functions (Bedny, 2017). Some studies, for instance, reported an involvement of EVC in high-level linguistic or memory tasks (Van Ackeren et al., 2018; Amedi et al., 2003; Bedny et al., 2011). However, as demonstrated here, the categorical membership of sounds, which may be a proxy for semantic representation, does not explain the representational structure of EVC in our study. It would be interesting to investigate whether models based on linguistic properties such as word frequency or distributional statistics in language corpora (Baroni et al., 2009) would, at least partially, explain the enhanced information that we found in EVC of EB. However, our design does not allow us to implement this analysis because the language-statistic DSM based on our stimulus space correlates highly with categorical models. Future studies should investigate this point using a set of stimuli in which the categorical and the linguistic dimensions are orthogonalized. 
A further limitation of our study is the limited number of brain regions that we investigated. Since the experimental design and analyses we implemented were a priori focused on VOTC (target region) and EVC (as a control region), it is possible that other brain areas might show either similar or different representation across modalities and groups. In particular, since the brain is a highly interconnected organ (de Pasquale et al., 2018), it is unlikely that early visual deprivation would affect exclusively a portion of the occipital cortex leaving the rest of the functional network unaffected. It would be of particular interest to investigate whether the reorganization of the visual cortex occurs together with changes in brain regions coding for the remaining senses, such as temporal regions typically coding for auditory stimuli (Mattioni et al., 2018).

Materials and methods

Participants

Thirty-four participants completed the auditory version of the fMRI study: 17 early blind participants (EBa; 10F) and 17 sighted controls (SCa; 6F). An additional group of 16 sighted participants (SCv; 8F) performed the visual version of the fMRI experiment. All the blind participants lost sight at birth or before 4 years of age and all of them reported not having visual memories and never having used vision functionally (see Supplementary file 1). The three groups were matched for age (EBa: range 20–67 years, mean ± SD: 33.29 ± 10.24; SCa: range 23–63 years, 34.12 ± 8.69; SCv: range 23–51 years, 30.88 ± 7.24) and gender (χ2 (2,50)=1.92, p=0.38). One blind subject performed only 2 out of the 5 runs in the fMRI due to claustrophobia; his data were therefore excluded. All subjects were blindfolded during the auditory task and were instructed to keep their eyes closed during the entire duration of the experiment. Participants received monetary compensation for their participation. The ethical committee of the University of Trento approved this study (protocol 2014–007) and participants gave their informed consent before participation.
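As a worked check of the gender-matching statistic, the sketch below computes a Pearson chi-square on the 3 × 2 table of group by gender. It is an illustration under stated assumptions: female counts and group sizes are taken from the text, male counts are inferred by subtraction, and the exact test variant used by the authors is not stated in this section.

```python
import math

# (female, male) counts per group; male counts are inferred as group size minus females.
counts = {"EBa": (10, 17 - 10), "SCa": (6, 17 - 6), "SCv": (8, 16 - 8)}

total = sum(f + m for f, m in counts.values())
col_totals = [sum(row[k] for row in counts.values()) for k in range(2)]

chi2 = 0.0
for row in counts.values():
    row_total = sum(row)
    for k, observed in enumerate(row):
        # Expected count under independence of group and gender.
        expected = row_total * col_totals[k] / total
        chi2 += (observed - expected) ** 2 / expected

df = (len(counts) - 1) * (2 - 1)  # (rows - 1) * (columns - 1) = 2

# For exactly 2 degrees of freedom the chi-square survival function is exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi2({df}, N={total}) = {chi2:.2f}, p = {p_value:.2f}")
# prints: chi2(2, N=50) = 1.92, p = 0.38
```

Under these assumptions the computation reproduces the reported values (χ2 = 1.92, p = 0.38).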

Stimuli

We decided to use sounds and images, instead of words, because we wanted to access and model the bottom-up cascade of sensory processing starting from the low-level sensory processing up to the more conceptual level. This methodological decision was crucial in order to assess what level of sound representation is implemented in VOTC of blind and sighted individuals.

A preliminary experiment was carried out in order to select the auditory stimuli. Ten participants who did not participate in the main experiment were presented with 4 different versions of 80 acoustic stimuli from 8 different categories (human vocalization, human non-vocalization, birds, mammals, tools, graspable objects, environmental scenes, big mechanical objects). We asked the participants to name the sound and then to rate, from 1 to 7, how representative the sound was of its category. We selected only the stimuli that were recognized with at least 80% accuracy, and among those, we chose for each category the 3 most representative sounds for a total of 24 acoustic stimuli in the final set (see Supplementary file 1). All sounds were collected from the database Freesound (https://freesound.org), except for the human vocalizations that were recorded in the lab. The sounds were edited and analysed using the software Audacity (http://www.audacityteam.org) and Praat (http://www.fon.hum.uva.nl/praat/). Each mono sound (44,100 Hz sampling rate) was 2 s long (100 ms fade in/out) and amplitude-normalized using the root mean square (RMS) method.
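The final editing steps (fade in/out and RMS normalization) can be illustrated with a short sketch. The sounds in the study were edited with Audacity and Praat, so this is not the authors' procedure: the linear fade shape, the target RMS level of 0.1, the order of the two steps and the sine-tone input are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 44100  # Hz, as stated for the mono stimuli

def apply_fade(samples, fade_s=0.1, rate=SAMPLE_RATE):
    """Apply a linear fade-in and fade-out of fade_s seconds (linear shape is an assumption)."""
    n_fade = round(fade_s * rate)
    out = list(samples)
    for i in range(n_fade):
        gain = i / n_fade
        out[i] *= gain        # fade-in at the start
        out[-1 - i] *= gain   # mirrored fade-out at the end
    return out

def rms(samples):
    """Root mean square amplitude of a signal."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def rms_normalize(samples, target_rms=0.1):
    """Scale the signal so that its RMS equals target_rms (hypothetical target level)."""
    scale = target_rms / rms(samples)
    return [s * scale for s in samples]

# A 2 s, 440 Hz sine tone stands in for a recorded sound.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(2 * SAMPLE_RATE)]
stimulus = rms_normalize(apply_fade(tone))
print(len(stimulus), round(rms(stimulus), 3))
```

Normalizing after the fade means every stimulus ends up with the same RMS level regardless of how much energy the fade removed.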

The final acoustic stimulus set included 24 sounds from 8 different categories (human vocalization, human non-vocalization, birds, mammals, tools, graspable objects, environmental scenes, big mechanical objects) that could be reduced to 4 superordinate categories (human, animals, manipulable objects, big objects/places) (see Figure 1 and Supplementary file 1).

We created a visual version of the stimulus set. The images for the visual experiment were colored pictures collected from the Internet and edited using GIMP (https://www.gimp.org). Images were placed on a gray (129 RGB) 400 × 400 pixel background.

Procedure

The experimental session was divided into two parts: first the subjects underwent the fMRI experiment and then they performed a behavioral rating judgment task on the same stimuli used in the fMRI experiment.

Similarity rating

The behavioral experiment aimed to create individual behavioral dissimilarity matrices to understand how the participants perceived the similarity of our stimulus space. Due to practical constraints, only a subset of our participants underwent the behavioral experiment (15 EBa, 11 SCa, and 9 SCv). We created each possible pair from the 24 stimuli set leading to a total of 276 pairs of stimuli. In the auditory experiment, participants heard each sound of a pair sequentially and were asked to judge from 1 to 7 how similar the two stimuli producing these sounds were. In the visual experiment, we presented each pair of stimuli on a screen to the participants and we asked them to judge from 1 to 7 how similar the two stimuli were. Since their rating was strongly based on the categorical features of the stimuli, we used the data from the behavioral experiment to build the categorical models for the representational similarity analysis (see the section ‘Representational similarity analysis: correlation with representational low-level/behavioral models’).
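The pair count reported above can be checked with a minimal sketch. It assumes that pairs are unordered and that a stimulus is never paired with itself; the variable names are illustrative only.

```python
from itertools import combinations

n_stimuli = 24  # size of the final stimulus set, as stated in the text

# Unordered pairs without self-pairs (assumption about how pairs were formed)
pairs = list(combinations(range(n_stimuli), 2))
n_pairs = len(pairs)

print(n_pairs)  # 276, i.e. 24 * 23 / 2
```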

fMRI experiment

Each participant took part in only one experiment, either the auditory or the visual version. We decided to include two separate groups of sighted people, one for each modality, for two crucial reasons. First, we wanted to limit as much as possible the possibility of triggering mental imagery from one modality to the other. Second, since cross-group comparisons of representational dissimilarity analyses represent a core component of our analysis stream, we wanted to ensure a cross-group variance comparable between the blind versus the sighted and the sighted in audition versus the sighted in vision.

The procedure for the two experiments was highly similar (Figure 1A).

Before entering the scanner, all the stimuli (either auditory or visual) were presented to each participant to ensure perfect recognition. In the fMRI experiment each trial consisted of the same stimulus repeated twice. Rarely (8% of the occurrences), a trial was made up of two different consecutive stimuli (catch trials). Only in this case were participants asked to press a key with their right index finger if the second stimulus belonged to the living category and with their right middle finger if the second stimulus belonged to the non-living category. This procedure ensured that the participants attended to and processed the stimuli. In the auditory experiment, each pair of stimuli lasted 4 s (2 s per stimulus) and the inter-stimulus interval between one pair and the next was 2 s long, for a total of 6 s for each trial (Figure 1A). In the visual experiment, each pair of stimuli lasted 2 s (1 s per stimulus) and the inter-stimulus interval between one pair and the next was 2 s long, for a total of 4 s for each trial (Figure 1A).

The use of a ‘quick’ event-related fMRI paradigm balances the need for separable hemodynamic responses and the need for presenting many stimuli in the limited time-span of the fMRI experiment. Within both the auditory and the visual fMRI sessions, participants underwent five runs. Each run contained 3 repetitions of each of the 24 stimuli, eight catch trials and two 20-s-long rest periods (one in the middle and another at the end of the run). The total duration of each run was 8 min and 40 s for the auditory experiment and 6 min for the visual experiment. For each run, the presentation of trials was pseudo-randomized: two stimuli from the same category were never presented in subsequent trials. The stimulus delivery was controlled using the Psychophysics toolbox implemented in Matlab R2012a (The MathWorks).
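As a quick consistency check, the run durations follow from the trial counts and timings given above. The sketch below only redoes that arithmetic; the function and parameter names are hypothetical.

```python
def run_duration_s(trial_s, n_stimuli=24, n_reps=3, n_catch=8, n_rest=2, rest_s=20):
    """Duration of one run in seconds, from the counts reported in the text."""
    n_trials = n_stimuli * n_reps + n_catch  # 72 regular trials + 8 catch trials
    return n_trials * trial_s + n_rest * rest_s

auditory_s = run_duration_s(trial_s=6)  # 4 s stimulus pair + 2 s interval
visual_s = run_duration_s(trial_s=4)    # 2 s stimulus pair + 2 s interval

print(divmod(auditory_s, 60))  # (8, 40): 8 min 40 s
print(divmod(visual_s, 60))    # (6, 0): 6 min
```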

fMRI data acquisition and preprocessing

We acquired our data on a 4T Bruker Biospin MedSpec equipped with an eight-channel birdcage head coil. Functional images were acquired with a T2*-weighted gradient-recalled echo-planar imaging (EPI) sequence (TR, 2000 ms; TE, 28 ms; flip angle, 73°; resolution, 3 × 3 mm3; 30 transverse slices in interleaved ascending order; 3 mm slice thickness; field of view (FoV) 192 × 192 mm2). The four initial scans were discarded for steady-state magnetization. Before each EPI run, we performed an additional scan to measure the point-spread function (PSF) of the acquired sequence, including fat saturation, which served to correct the distortions expected with high-field imaging.

A structural T1-weighted 3D magnetization prepared rapid gradient echo sequence was also acquired for each subject (MP-RAGE; voxel size 1 × 1 × 1 mm3; GRAPPA acquisition with an acceleration factor of 2; TR 2700 ms; TE 4.18 ms; TI (inversion time) 1020 ms; FoV 256 mm; 176 slices).

To correct for distortions in geometry and intensity in the EPI images, we applied distortion correction on the basis of the PSF data acquired before the EPI scans (Zeng and Constable, 2002). Raw functional images were pre-processed and analyzed with SPM8 (Wellcome Trust Centre for Neuroimaging, London, UK; https://www.fil.ion.ucl.ac.uk/spm/software/spm8/) implemented in MATLAB R2013b (MathWorks). Pre-processing included slice-timing correction using the middle slice as reference, temporal high-pass filtering (128 s cut-off) and motion correction.

Regions of interest

Since we were interested in the brain representation of different categories, we decided to focus on the ventral occipito-temporal cortex as a whole. This region is well known to contain several distinct macroscopic brain regions that prefer a specific category of visual objects, including faces, places, body parts, small artificial objects, etc. (Kanwisher, 2010). We decided to focus our analyses on a full mask of VOTC, and not on specific sub-parcels, because we were interested in the categorical representation across categories and not in the preference for a specific category compared to the others. Our study therefore builds upon the paradigm shift of viewing VOTC as a distributed categorical system rather than a sum of isolated functionally specific areas, which reframes how we should expect to understand those areas (Haxby et al., 2001). In fact, our main aim was to investigate how sensory input channel and visual experience impact the general representation of different categories in the brain. Looking at one specific category-selective region at a time would not allow us to address this question, since we would expect an imbalanced representation of the preferred category compared to the others. Indeed, to tackle our question, we need to observe the distributed representation of the categories over the entire ventral occipito-temporal cortex (Haxby et al., 2001). This approach has already been validated by previous studies that investigated the categorical representation in the ventral occipito-temporal cortex using a wide VOTC mask (van den Hurk et al., 2017; Kriegeskorte et al., 2008b; Wang et al., 2015). We also added the early visual cortex (EVC) as a control node. We decided to work in a structurally and individually defined mask of VOTC using the Desikan-Killiany atlas (Desikan et al., 2006) implemented in FreeSurfer (http://surfer.nmr.mgh.harvard.edu). Six ROIs were selected in each hemisphere: Pericalcarine, Cuneus and Lingual areas were combined to define the early visual cortex (EVC) ROI; Fusiform, Parahippocampal and Infero-Temporal areas were combined to define the ventral occipito-temporal (VOTC) ROI. Then, we combined these areas in order to obtain one bilateral EVC ROI and one bilateral VOTC ROI (Figure 3A).

Our strategy of working on a limited number of relatively large brain parcels has the advantage of minimizing unstable decoding results collected from small regions (Norman et al., 2006) and reducing the multiple comparison problems intrinsic to neuroimaging studies (Etzel et al., 2013). All analyses, except for the topographical selectivity map (see below), were carried out in subject space for enhanced anatomico-functional precision and to avoid spatial normalization across subjects.

General linear model

The pre-processed images for each participant were analyzed using a general linear model (GLM). For each of the five runs, we included 32 regressors: 24 regressors of interest (each stimulus), 1 regressor of no-interest for the target stimulus, six head-motion regressors of no-interest and one constant. From the GLM analysis, we obtained a β-image for each stimulus (i.e. 24 sounds) in each run, for a total of 120 (24 stimuli x five runs) beta maps.

Topographical selectivity map

For this analysis, we needed all participants to be coregistered and normalized in a common volumetric space. To achieve maximal accuracy, we relied on the DARTEL (Diffeomorphic Anatomical Registration Through Exponentiated Lie Algebra; Ashburner, 2007) toolbox. DARTEL normalization takes the gray and white matter templates from each subject to create an averaged template based on our own sample that will be used for the normalization. The creation of a study-specific template using DARTEL was performed to reduce deformation errors that are more likely to arise when registering single subject images to an unusually shaped template (Ashburner, 2007). This is particularly relevant when comparing blind and sighted subjects given that blindness is associated with significant changes in the structure of the brain itself, particularly within the occipital cortex (Dormal et al., 2016; Jiang et al., 2009; Pan et al., 2007; Park et al., 2009).

To create the topographical selectivity map (Figure 1B), we extracted in each participant the β-value for each of our four main conditions (animals, humans, manipulable objects and places) from each voxel inside the VOTC mask and we assigned to each voxel the condition producing the highest β-value (winner takes all). This analysis resulted in specific clusters of voxels that spatially distinguish themselves from their surround in terms of selectivity for a particular condition (van den Hurk et al., 2017; Striem-Amit et al., 2018a).
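The winner-takes-all assignment described above can be illustrated with a minimal sketch. The β-values are made-up toy numbers, the function name is hypothetical, and resolving ties in favour of the first condition is an assumption of this sketch, not something stated in the text.

```python
CONDITIONS = ["animals", "humans", "manipulable objects", "places"]

def winner_takes_all(betas_per_voxel):
    """Return, for each voxel, the index of the condition with the highest beta.

    betas_per_voxel: one list per voxel, holding one beta per condition
    in CONDITIONS order. Ties go to the first maximal condition (assumption).
    """
    labels = []
    for betas in betas_per_voxel:
        labels.append(max(range(len(betas)), key=lambda i: betas[i]))
    return labels

# Toy data: three voxels, four betas each (hypothetical values)
toy_betas = [
    [0.2, 1.1, -0.3, 0.0],
    [0.9, 0.1, 0.4, 0.8],
    [-0.5, -0.2, 0.3, 1.4],
]
toy_map = winner_takes_all(toy_betas)
print([CONDITIONS[i] for i in toy_map])  # ['humans', 'animals', 'places']
```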

Finally, to compare how similar the topographical selectivity maps are in the three groups, we followed these steps for each pair of groups (i.e. 1. SCv-EBa; 2. SCv-SCa; 3. EBa-SCa): (1) We computed the Spearman’s correlation between the topographical selectivity map of each subject from Group one and the averaged selectivity map of Group two, and we computed the mean of these values. (2) We computed the Spearman’s correlation between the topographical selectivity map of each subject from Group two and the averaged selectivity map of Group one, and we computed the mean of these values. (3) We averaged the two mean values obtained from step 1 and step 2, in order to have one mean value for each group comparison (see the section ‘Statistical analyses’ for details about the assessment of statistical differences).
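The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each map is treated as a plain numeric vector, the group-averaged maps are passed in precomputed (how they were averaged is not specified here), and the helper names are hypothetical.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if a vector is constant

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def group_similarity(group_one, avg_one, group_two, avg_two):
    """Steps 1-3: subject-to-other-group-average correlations, averaged both ways."""
    step1 = sum(spearman(s, avg_two) for s in group_one) / len(group_one)
    step2 = sum(spearman(s, avg_one) for s in group_two) / len(group_two)
    return (step1 + step2) / 2

# Toy vectors (hypothetical): identical orderings give 1, reversed orderings give -1
group_a = [[1, 2, 3, 4], [2, 4, 6, 8]]
avg_a = [1.5, 3.0, 4.5, 6.0]
group_b = [[4, 3, 2, 1]]
avg_b = [4, 3, 2, 1]
same = group_similarity(group_a, avg_a, group_a, avg_a)
opposite = group_similarity(group_a, avg_a, group_b, avg_b)
```

The same helper would apply unchanged to any per-subject vector, which is why the procedure can be reused for other vectorized measures.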

We ran this analysis using the four (Figure 1B) and the eight categories (see Figure 1—figure supplement 1) and both analyses led to almost identical results. We decided to represent the data of the four main categories for simpler visualization of the main effect (topographical overlap across modalities and groups).

In order to go beyond the magnitude of the correlation between the topographical selectivity maps and to explore the quality of the (dis)similarity between the topographical maps of our subjects and groups we computed the Jaccard index between them. The Jaccard similarity coefficient is a statistic used for measuring the similarity and diversity of sample sets (Devereux et al., 2013; Xu et al., 2018). The Jaccard coefficient is defined as the size of the intersection divided by the size of the union of the sample sets. This value is 0 when the two sets are disjoint, 1 when they are equal, and between 0 and 1 otherwise.

First, we looked at the consistency of the topographical representation of the categories across subjects within the same group. For this within-group analysis we computed the Jaccard similarity between the topographical selectivity map of each subject and the mean topographical map of their own group. This analysis produces 4 Jaccard similarity indices (one for each of the main categories: (1) animals, (2) humans, (3) manipulable objects and (4) big objects and places) for each group (see Figure 2B). Low values for one category within a group mean that the topographical representation of that category varies a lot across subjects of that group. Statistical significance of the Jaccard similarity index within groups was assessed using parametric statistics: one-sample t-tests to assess the difference against zero and a repeated measures ANOVA (4 categories × 3 groups) to compare the different categories and groups.
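A minimal sketch of the per-category Jaccard index between two selectivity maps is given below. It assumes that each map is a list of category labels (one per voxel) and that the sets being compared are the voxels labeled with a given category; the toy maps and function names are hypothetical, and returning 0 for two empty sets is a convention chosen for this sketch.

```python
def jaccard(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0  # convention of this sketch when both sets are empty
    return len(a & b) / len(union)

def category_voxels(label_map, category):
    """Indices of the voxels whose preferred category equals `category`."""
    return {i for i, label in enumerate(label_map) if label == category}

# Toy maps over six voxels, labels 0-3 standing for the four main categories
subject_map = [0, 0, 1, 2, 3, 3]
group_map = [0, 0, 1, 2, 2, 3]

per_category = {
    c: jaccard(category_voxels(subject_map, c), category_voxels(group_map, c))
    for c in range(4)
}
print(per_category)  # {0: 1.0, 1: 1.0, 2: 0.5, 3: 0.5}
```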

In addition, we computed the Jaccard similarity index for each category, between the topographical map of each blind and sighted subject in the auditory experiment (EBa and SCa) and the averaged topographical selectivity map from the sighted in the visual experiment (SCv; see Figure 2C for the results). In more practical terms, this means that we have 4 Jaccard similarity indices (one for each of the main categories: (1) animals, (2) humans, (3) manipulable objects and (4) big objects and places) for each of the two group pairs: EBa-SCv and SCa-SCv. These values help to explore in more detail the similarities and differences among the topographical representations of our categories when they are presented visually compared to when they are presented acoustically, both in sighted and in blind. Statistical significance of the Jaccard similarity index between groups was assessed using parametric statistics: one-sample t-tests to test the difference against zero and a repeated measures ANOVA (4 categories × 2 groups) to compare the different categories and groups. The Greenhouse-Geisser sphericity correction was applied.

However, the Jaccard similarity values could be partially driven by the number of voxels that show selectivity for each category. For instance, the absence of overlap between the voxels that prefer animals in EBa and SCv could be explained by the fact that different voxels prefer animals in the two groups or by the fact that one or both groups have a limited number of voxels that show a preference for this category. To disentangle these two possibilities we counted in each group the number of voxels within VOTC that prefer each category (see Figure 2A for the results). We ended up with four values (one number for each category) for each group. Statistical significance of the number of voxels within groups was assessed using parametric statistics: one-sample t-tests to assess the difference against zero and a repeated measures ANOVA (4 categories × 3 groups) to compare the different categories and groups.

MVP-classifications: Binary decoding

We performed a binary MVP-classification (using a support vector machine, SVM, classifier) to look at the ability of each ROI to distinguish between two categories at a time. With eight categories we can have 28 possible pairs, resulting in 28 binary MVP-classification tests in each ROI. Statistical significance of the binary classification was assessed using a t-test against the chance level. We then averaged the 28 accuracy values of each subject in order to have one mean accuracy value per subject. Statistical significance of the averaged binary classification was assessed using parametric statistics: t-test against zero and ANOVA.

Representational similarity analysis (RSA): Correlation between neural dissimilarity matrices of the three groups

We further investigated the functional profile of the ROIs using RSA. RSA was performed using the CoSMoMVPA toolbox, implemented in Matlab (R2013b; MathWorks). The basic concept of RSA is the dissimilarity matrix (DSM). A DSM is a square matrix where the number of columns and rows corresponds to the number of conditions (8 × 8 in this experiment) and it is symmetrical about a diagonal of zeros. Each cell contains the dissimilarity index between the two stimuli. We used the binary MVP-classification as dissimilarity index to build neural DSMs (Carlson et al., 2013; Cichy and Pantazis, 2017; Cichy et al., 2013; Dobs et al., 2019; Haxby et al., 2014; Haxby et al., 2011; O'Toole et al., 2005; Pereira et al., 2009; Proklova et al., 2019) for each group, in order to compare the functional profile of the ROIs between the three groups. In this way, we ended up with a DSM for each group for every ROI.
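To make the structure of such a DSM concrete, the sketch below fills an 8 × 8 matrix from 28 pairwise decoding accuracies and extracts the upper triangle as a vector. The accuracy values are hypothetical placeholders, not results from the study, and the function names are illustrative rather than CoSMoMVPA calls.

```python
from itertools import combinations

CATEGORIES = [
    "human vocalization", "human non-vocalization", "birds", "mammals",
    "tools", "graspable objects", "environmental scenes", "big mechanical objects",
]

def build_dsm(pair_accuracy, n):
    """Symmetric n x n matrix with zeros on the diagonal."""
    dsm = [[0.0] * n for _ in range(n)]
    for (i, j), accuracy in pair_accuracy.items():
        dsm[i][j] = dsm[j][i] = accuracy
    return dsm

def upper_triangle(dsm):
    """Off-diagonal upper half as a flat vector (row by row)."""
    return [dsm[i][j] for i, j in combinations(range(len(dsm)), 2)]

n = len(CATEGORIES)
# Hypothetical decoding accuracies, one per category pair
toy_accuracy = {(i, j): 0.5 + 0.01 * (j - i) for i, j in combinations(range(n), 2)}

dsm = build_dsm(toy_accuracy, n)
vector = upper_triangle(dsm)
print(len(toy_accuracy), len(vector))  # 28 28
```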

We preferred binary MVP-classification over other dissimilarity measures (e.g. Pearson correlation, Euclidean distance, Spearman correlation) for building the neural DSMs. With a correlation distance, two experimental conditions that do not drive a response have uncorrelated patterns (noise only, r ≈ 0) and therefore appear very dissimilar (1 – r ≈ 1). With a decoding approach, by contrast, the intrinsic cross-validation steps make two conditions that do not drive responses indistinguishable, despite their substantial correlation distance (Walther et al., 2016): because the noise is independent between the training and testing partitions, cross-validated estimates of the distance do not grow with increasing noise. This was crucial in our study, since we are looking at brain activity elicited by sounds in brain regions that are primarily visual (EVC and VOTC), where the level of noise is expected to be high, at least in sighted people.

Finally, to compare how similar the DSMs are in the three groups, for each pair of groups (i.e. 1. SCv-EBa; 2. SCv-SCa; 3. EBa-SCa): (1) We computed the Spearman’s correlation between the upper triangular DSM (excluding the diagonal) of each subject from Group one and the averaged upper triangular DSM (excluding the diagonal) of Group two, and we computed the mean of these values. (2) We computed the Spearman’s correlation between the upper triangular DSM (excluding the diagonal) of each subject from Group two and the averaged upper triangular DSM (excluding the diagonal) of Group one, and we computed the mean of these values. (3) We averaged the two mean values obtained from step 1 and step 2, in order to have one mean value for each group comparison.

Considering the unidirectional hypothesis for this test (a positive correlation between neural similarity and models similarity) and the difficult interpretation of a negative correlation, one-tailed statistical tests were used. For all other tests (e.g., differences between groups), for which both directions might be hypothesized, two-tailed tests were applied (Peelen et al., 2014; Evans et al., 2019).

See the section ‘Statistical analyses’ for details about the assessment of statistical differences.

Hierarchical clustering analysis on the brain categorical representations

In order to go beyond the correlation values and to explore more qualitatively the representational structure of VOTC and EVC in the three groups, we implemented a hierarchical clustering approach (King et al., 2019). First, we created a hierarchical cluster tree for each brain DSM using the ‘linkage’ function in Matlab, then we defined clusters from this hierarchical cluster tree with the ‘cluster’ function in Matlab. Hierarchical clustering starts by treating each observation as a separate cluster. Then, it identifies the two clusters that are closest together, and merges these two most similar clusters. This continues until all the clusters are merged together or until the clustering is stopped at a given number n of clusters. We repeated the clustering three times for each DSM, stopping it at 2, 3 and 4 clusters, respectively (see Figure 5). In this way, we could compare the similarities and the differences of the clusters at the different scales across the groups. We applied the same clustering analysis also on the behavioral data (see Figure 5—figure supplement 1).
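The merging procedure can be illustrated with a small pure-Python sketch. It is not the Matlab code used in the study: it assumes single linkage (the distance between two clusters is the smallest pairwise dissimilarity), since the linkage method is not stated in the text, and it runs on a toy dissimilarity matrix with a hypothetical function name.

```python
def agglomerative_clusters(dsm, n_clusters):
    """Single-linkage agglomerative clustering on a square dissimilarity matrix.

    Starts with one cluster per item and repeatedly merges the two closest
    clusters until `n_clusters` remain. Returns lists of item indices.
    """
    clusters = [[i] for i in range(len(dsm))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dsm[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters

# Toy dissimilarities: six items placed on a line (hypothetical positions)
positions = [0, 1, 5, 6, 20, 21]
toy_dsm = [[abs(p - q) for q in positions] for p in positions]

solutions = {k: agglomerative_clusters(toy_dsm, k) for k in (2, 3, 4)}
print(solutions[3])  # [[0, 1], [2, 3], [4, 5]]
```

Stopping the same procedure at 2, 3 and 4 clusters, as in the toy `solutions` dictionary, mirrors the three granularities compared across groups.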

Representational similarity analysis (RSA): correlation with representational low-level/behavioral models

We then intended to investigate which features of the visual and auditory stimuli were represented in the different ROIs of sighted and blind subjects. RSA allows the comparisons between the brain DSMs extracted from specific ROIs with representational DSMs, based on physical properties of the stimuli or based on behavioral rating of the perceived categorical similarity of our stimuli.

Low-level DSM in the auditory experiment: pitch DSM

Pitch corresponds to the perceived frequency content of a stimulus. We selected this specific low-level auditory feature for two reasons. First, previous studies showed that this physical property of the sounds is distinctly represented in the auditory cortex and may create some low-level bias of auditory category-selective responses in the temporal cortex (Giordano et al., 2013; Leaver and Rauschecker, 2010; Moerel et al., 2012). Second, we confirmed with our own SCa group that, among alternative auditory RDMs based on separate acoustic features (e.g. harmonicity-to-noise ratio, spectral centroid), the pitch model correlated most with the brain RDM extracted from the temporal cortex. This provided strong support that this model was maximally efficient in capturing the encoding of sounds based on acoustic features in auditory cortical regions (see Figure 6—figure supplement 1).

We computed a pitch value for each of the 24 auditory stimuli, using the Praat software and an autocorrelation method. This method extracts the strongest periodic component of several time windows across the stimulus and averages them to have one mean pitch value for that stimulus. The ‘pitch floor’ determines the size of the time windows over which these values are calculated in Praat. Based on a previous study, we chose a default pitch floor of 60 Hz (Leaver and Rauschecker, 2010). We then averaged the pitch values across stimuli belonging to the same category. Once we obtained one pitch value for each category, we built the DSM by computing the absolute value of the pitch difference for each possible pair (see Figure 6A). The pitch DSM was not positively correlated with the behavioral DSM of either SCa (r = –0.36, p=0.06) or EBa (r = –0.29, p=0.13).
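The construction of the pitch DSM from per-stimulus pitch values can be sketched as follows. The pitch values are invented placeholders (the real values come from Praat), only three categories are shown for brevity, and the function names are hypothetical.

```python
def category_means(pitch_by_category):
    """Average the per-stimulus pitch values within each category."""
    return {cat: sum(vals) / len(vals) for cat, vals in pitch_by_category.items()}

def pitch_dsm(mean_pitch):
    """Absolute pitch difference for every pair of categories."""
    cats = list(mean_pitch)
    return [[abs(mean_pitch[a] - mean_pitch[b]) for b in cats] for a in cats]

# Hypothetical pitch values in Hz, three stimuli per category
toy_pitch = {
    "birds": [3000.0, 2500.0, 3500.0],
    "tools": [400.0, 600.0, 500.0],
    "human vocalization": [200.0, 220.0, 180.0],
}

means = category_means(toy_pitch)
toy_pitch_dsm = pitch_dsm(means)
print(toy_pitch_dsm[0])  # [0.0, 2500.0, 2800.0]
```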

Low-level DSM in the visual experiment: Hmax-C1 model

The Hmax model (Serre et al., 2007) reflects the hierarchical organization of the visual cortex (Hubel and Wiesel, 1962) in a series of layers from V1 to infero-temporal (IT) cortex. To build our low-level visual model we used the output from the V1 complex-cell layer. The inputs for the model are the gray-value luminance images presented to the sighted group doing the visual experiment. Each image is first analysed by an array of simple cell (S1) units at 4 different orientations and 16 scales. At the next C1 layer, the image is subsampled through a local max pooling operation over a neighbourhood of S1 units in both space and scale, but with the same preferred orientation (Serre et al., 2007). The C1 layer stage corresponds to V1 cortical complex cells, which show some tolerance to shift and size (Serre et al., 2007). The outputs of all complex cells were concatenated into a vector as the V1 representational pattern of each image (Khaligh-Razavi and Kriegeskorte, 2014; Kriegeskorte et al., 2008a). We averaged the vectors of images from the same category in order to have one vector for each category. Finally, we built the (8 × 8) DSM by computing 1 – Pearson’s correlation for each pair of vectors (see Figure 6A). The Hmax-C1 DSM was significantly correlated with the SCv behavioral DSM (r = 0.56, p=0.002).
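The last two steps (averaging the feature vectors within a category, then taking 1 – Pearson’s correlation between category vectors) can be sketched as below. The feature vectors are tiny invented stand-ins for C1 outputs, which in practice are much longer and come from the Hmax model itself; the function names are hypothetical.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if a vector is constant

def mean_vector(vectors):
    """Element-wise mean of equally long feature vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]

def correlation_distance_dsm(category_vectors):
    """1 - Pearson's r for every pair of category vectors; zeros on the diagonal."""
    cats = list(category_vectors)
    return [
        [0.0 if a == b else 1.0 - pearson(category_vectors[a], category_vectors[b])
         for b in cats]
        for a in cats
    ]

# Hypothetical feature vectors, two images per category
toy_features = {
    "birds": [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    "tools": [[2.0, 4.0, 6.0], [2.0, 4.0, 6.0]],
    "mammals": [[3.0, 2.0, 1.0], [5.0, 4.0, 3.0]],
}
category_vectors = {cat: mean_vector(imgs) for cat, imgs in toy_features.items()}
toy_dsm = correlation_distance_dsm(category_vectors)
```

In this toy case the first two categories have perfectly correlated vectors (distance 0) and the first and third are perfectly anti-correlated (distance 2), the two extremes of this measure.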

Behavioral-categorical DSMs

We used the pairwise similarity judgments from the behavioral experiment to build the semantic DSMs. We computed one matrix for each subject that took part in the behavioral experiment and we averaged all the matrices of the participants from the same group to finally obtain three mean behavioral-categorical DSMs, one for each group (i.e. EBa, SCa, SCv; Figure 4A). The three behavioral matrices were highly correlated with one another (SCv-EBa: r = 0.89, p<0.001; SCv-SCa: r = 0.94, p<0.001; EBa-SCa: r = 0.85, p<0.001), and the similarity judgment was clearly performed on a categorical-membership basis (Figure 6A).

The last step consisted of comparing neural and external DSM models using a second-order correlation. Because we wanted to investigate each representational model independently from the other, we relied on Spearman's rank partial correlation: in the auditory experiment, we removed the influence of the pitch similarity when computing the correlation with the behavioral matrix, and vice versa; in the visual experiment, we removed the influence of the Hmax-C1 model similarity when computing the correlation with the behavioral matrix, and vice versa. In this way, we could measure the partial correlation for each external model for each participant separately. Importantly, we did not correlate the full symmetrical DSMs but only the upper triangular DSM excluding the diagonal.
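A rank partial correlation of this kind can be sketched by applying the standard first-order partial correlation formula to Spearman coefficients. Whether the toolbox used in the study computes it in exactly this way is an assumption; the three vectors are toy stand-ins for vectorized DSMs and the function names are hypothetical.

```python
import math

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

def partial_spearman(x, y, z):
    """Correlation between x and y after removing the influence of z."""
    r_xy, r_xz, r_yz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy vectorized DSMs (hypothetical values)
neural = [1, 3, 2, 4]
behavioral = [1, 2, 3, 4]
pitch = [2, 4, 1, 3]

plain = spearman(neural, behavioral)
partial = partial_spearman(neural, behavioral, pitch)
```

In this toy case the plain correlation with the behavioral model is 0.8, and it rises to about 1 once the pitch model is partialled out, because the neural ranks are fully explained by the two models together.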

See the section ‘Statistical analyses’ for details about the assessment of statistical differences.

RSA: Inter-subjects correlation

To examine the commonalities of the neural representational space across subjects in VOTC, we extracted the neural DSM of every subject individually and then correlated it with the neural DSM of every other subject. Since we have 49 participants in total, this analysis resulted in a 49 × 49 matrix (Figure 7) in which each row and column represents the correlation of one subject’s DSM with all other subjects’ DSMs. The three main squares on the diagonal (Figure 7) represent the within-group correlation of the 3 groups. We averaged the values within each main square on the diagonal (only the upper half, excluding the diagonal) to obtain a mean value of within-group correlation for each group. The three main off-diagonal squares (Figure 7) represent the between-group correlation of the three possible group pairs (i.e. 1. SCv-EBa; 2. SCv-SCa; 3. EBa-SCa). We averaged the values within each main off-diagonal square in order to obtain a mean value of between-group correlation for each group pair.
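The block averaging described above can be sketched on a small toy matrix. The group sizes, subject order and correlation values below are hypothetical, and the function name is illustrative.

```python
def block_mean(matrix, rows, cols, within=False):
    """Mean of a block of a subject-by-subject correlation matrix.

    For a within-group block (rows == cols), only the upper half is used,
    excluding the diagonal, so each subject pair is counted once.
    """
    values = [matrix[i][j] for i in rows for j in cols if not within or i < j]
    return sum(values) / len(values)

# Toy layout: five subjects, group A = first three, group B = last two
groups = {"A": [0, 1, 2], "B": [3, 4]}
group_of = {i: name for name, members in groups.items() for i in members}

# Hypothetical correlations: 0.8 within a group, 0.2 between groups, 1 on the diagonal
toy_matrix = [
    [1.0 if i == j else (0.8 if group_of[i] == group_of[j] else 0.2) for j in range(5)]
    for i in range(5)
]

within_a = block_mean(toy_matrix, groups["A"], groups["A"], within=True)
within_b = block_mean(toy_matrix, groups["B"], groups["B"], within=True)
between_ab = block_mean(toy_matrix, groups["A"], groups["B"])
```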

See the section ‘Statistical analyses’ for details about the assessment of statistical differences.

Representational connectivity analysis

Representational connectivity analyses were implemented to identify the representational relationship among the ROIs composing VOTC and the rest of the brain (Kriegeskorte et al., 2008a; Pillet et al., 2018). This approach can be considered a type of connectivity where similar RDMs of two ROIs indicate shared representational structure and are therefore taken as a proxy for information exchange (Kriegeskorte et al., 2008b). Representational connectivity between two ROIs does not imply a direct structural connection but can provide connectivity information from a functional perspective, assessing to what extent two regions represent information similarly (Xue et al., 2013).

To perform this analysis, we included 30 bilateral parcels (covering almost the entire cortex) extracted from the segmentation of each individual anatomical scan following the Desikan-Killiany atlas implemented in FreeSurfer (http://surfer.nmr.mgh.harvard.edu). We only excluded three parcels (Entorhinal cortex, Temporal Pole and Frontal Pole) because their size was too small and their signal too noisy (these regions are notably highly susceptible to signal drop in EPI acquisition) to allow the extraction of reliable dissimilarity matrices in most of the participants. We merged together the corresponding left and right parcels in order to have a total of 30 bilateral ROIs for each subject. From each ROI we extracted the dissimilarity matrix based on binary decoding accuracies, as described in the section ‘Representational similarity analysis (RSA): Correlation between neural dissimilarity matrices of the three groups’. Finally, we computed the Spearman’s correlation between the three seed ROIs (i.e. fusiform gyrus, parahippocampal gyrus and infero-temporal cortex) and all the other 27 ROIs. We ended up with a connectivity profile of 3 (number of seeds) by 27 (ROIs representing the rest of the brain) for each subject. We considered this 3 × 27 matrix as one representational connectivity profile of the seed region in each subject.

Finally, to compare how similar the RSA connectivity profiles are in the three groups, for each pair of groups (i.e. 1. SCv-EBa; 2. SCv-SCa; 3. EBa-SCa): (1) We computed the Spearman’s correlation between the representational connectivity profile of each subject from Group one and the averaged representational connectivity profile from Group two, and we computed the mean of these values. (2) We computed the Spearman’s correlation between the representational connectivity profile of each subject from Group two and the averaged representational connectivity profile of Group one, and we computed the mean of these values. (3) We averaged the two mean values obtained from step 1 and step 2, in order to have one mean value for each group comparison.

We ran the same analysis in the EVC, as a control ROI. In this case the three seed ROIs were the pericalcarine cortex, the cuneus and the lingual gyrus.

See the section ‘Statistical analyses’ for details about the assessment of statistical differences.

Statistical analyses

To assess statistical differences, we applied parametric tests (t-test and ANOVA) in the analyses that met the main assumptions required by parametric statistics: normal distribution of the data and independence of the observations. Moreover, for statistical comparisons between different groups, we ran Levene’s test to check the assumption of equality of variances between the groups; when the test was significant (suggesting different levels of variance), we applied the Welch homogeneity correction. Parametric tests were used in the Jaccard similarity analyses (both within and between groups), in the analysis of the selective voxel count and in the averaged binary decoding analysis. In all these analyses the correlation data were z-transformed before subjecting them to parametric statistics.

However, the correlation of the topographical selectivity maps, the correlation of the brain dissimilarity matrices, the correlation of the RSA connectivity profiles and the inter-subject DSMs correlation did not meet the assumption of independency of the data. In fact, in these analyses we contrast group’s comparisons, so data from the same subjects are always included in two comparisons (e.g. data from EBa subjects are included both in the SCv-EBa and in EBa-SCa comparisons). For this reason, the use of permutation was a preferable approach in the case of these analyses. In each of this analysis we have one vector of values for each subject in each ROI (i.e. The vector containing the categorical selectivity label of each voxel; The brain dissimilarity matrix in the format of pairwise distance vector; The vectorized RSA connectivity profile). In each analysis we want to correlate these values between each possible pair of groups, which are three in total: SCv-EBa; SCv-SCa; EBa-SCa. To compute the average correlation value between each pair of groups we followed these steps: (1) We computed the Spearman’s correlation between the vector of each subject from Group one with the mean vector of Group two and we computed the mean of these values (e.g. we correlated the vector from each EBa subject with the mean vector from the SCv group). (2) We computed the Spearman’s correlation between the vector of each subject from Group two with the mean vector of Group one and we computed the mean of these values (e.g. we correlated the vector from each SCv sub. with the mean vector from the EBa group). (3) We averaged the two mean values obtained from step 1 and step 2, in order to have one mean value for each group comparison. Since our data points are not completely independent, we cannot use parametric statistics (Parsons et al., 2018). 
Therefore, to test statistical differences we used a permutation test (10,000 iterations): (4) We randomly permuted the conditions of the vector of each subject from Group 1 and of the mean vector of Group 2 and we computed the correlation (as in step 1). (5) We randomly permuted the conditions of the vector of each subject from Group 2 and of the mean vector of Group 1 and we computed the correlation (as in step 2). (6) We averaged the two mean values obtained from steps 4 and 5. (7) We repeated these steps 10,000 times to obtain a distribution of correlations simulating the null hypothesis that the two vectors are unrelated (Kriegeskorte et al., 2008b). If the actual correlation falls within the top α × 100% of the simulated null distribution of correlations, the null hypothesis of unrelated vectors can be rejected with a false-positive rate of α.
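Steps 1 to 7 can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names, the toy data, the choice to permute by shuffling vector entries and the +1 correction in the p-value are all assumptions made for illustration.

```python
import random
from statistics import mean

def ranks(v):
    """Average 1-based ranks; tied values share the mean of their positions."""
    s = sorted(v)
    return [(s.index(x) + 1 + len(s) - s[::-1].index(x)) / 2 for x in v]

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

def one_direction(subjects, other_group, rng=None):
    """Steps 1/2 (or 4/5 when rng is given): correlate each subject vector
    with the mean vector of the other group, then average across subjects."""
    target = [mean(col) for col in zip(*other_group)]
    rs = []
    for subj in subjects:
        s, t = list(subj), list(target)
        if rng is not None:  # ASSUMPTION: permute by shuffling vector entries
            rng.shuffle(s)
            rng.shuffle(t)
        rs.append(spearman(s, t))
    return mean(rs)

def group_similarity(g1, g2, rng=None):
    """Step 3 (or 6): average of the two directional means."""
    return (one_direction(g1, g2, rng) + one_direction(g2, g1, rng)) / 2

def permutation_test(g1, g2, n_iter=10000, seed=0):
    """Step 7: build a null distribution; return (observed, one-tailed p)."""
    rng = random.Random(seed)
    observed = group_similarity(g1, g2)
    null = [group_similarity(g1, g2, rng) for _ in range(n_iter)]
    # ASSUMPTION: the +1 correction keeps p above zero; the text only
    # describes comparing the observed value with the null distribution.
    p = (sum(r >= observed for r in null) + 1) / (n_iter + 1)
    return observed, p

# Hypothetical toy data: two subjects per group, eight values per subject.
group_1 = [[1, 2, 3, 4, 5, 6, 7, 8], [2, 1, 3, 4, 5, 6, 8, 7]]
group_2 = [[1, 2, 4, 3, 5, 6, 7, 8], [1, 2, 3, 4, 6, 5, 7, 8]]
observed, p_value = permutation_test(group_1, group_2, n_iter=2000)
```

For dissimilarity matrices, permuting the condition labels would normally mean reordering rows and columns together before vectorizing; the entry-wise shuffle above is a simplification.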

Only in the case of the correlation of topographical maps, we constrained the permutation performed in step 5 in order to take into consideration the inherent smoothness/spatial dependencies in the univariate fMRI data. In each subject, we identified each cluster of voxels showing selectivity for the same category and we kept these clusters fixed in the permutation, randomly assigning a condition to each of these predefined clusters. In this way, the spatial structure of the topographical maps was kept identical to the original one, making it very unlikely that a significant result could be explained by the voxels’ spatial dependencies. We note, however, that this null distribution is likely overly conservative, since it assumes that the size and position of clusters could arise only from task-independent spatial dependencies (either intrinsic to the acquisition or due to smoothing). We had to exclude one EBa subject from the analysis because he had fewer than seven clusters in his topographical map, which is not enough to obtain the 10,000 combinations needed for the permutation given our four categories tested (possible combinations = n_categories^n_clusters; 4^7 = 16,384).
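The cluster-constrained shuffle and the combination count can be sketched as follows. This is a hedged illustration only: `cluster_ids` (one cluster index per voxel) and the category names are hypothetical inputs, and the step that identifies spatially contiguous clusters is not shown.

```python
import random

def n_label_assignments(n_categories, n_clusters):
    """Number of distinct ways to assign one category to each cluster."""
    return n_categories ** n_clusters

def permute_cluster_labels(cluster_ids, categories, rng):
    """Draw one random category per cluster, so that all voxels belonging
    to the same cluster keep sharing a single label."""
    new_label = {c: rng.choice(categories) for c in sorted(set(cluster_ids))}
    return [new_label[c] for c in cluster_ids]

# Hypothetical example: eight voxels grouped into four clusters.
rng = random.Random(0)
cluster_ids = [0, 0, 1, 1, 1, 2, 3, 3]
categories = ["cat_1", "cat_2", "cat_3", "cat_4"]  # placeholder names
shuffled = permute_cluster_labels(cluster_ids, categories, rng)

# With four categories, seven clusters give 4**7 = 16384 assignments,
# whereas six clusters give only 4**6 = 4096, fewer than 10,000.
enough_with_seven = n_label_assignments(4, 7) >= 10000
enough_with_six = n_label_assignments(4, 6) >= 10000
```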

To test the difference between the group pairs’ correlations (e.g. to test if the correlation between SCv and EBa was different from the correlation between SCv and SCa), we used a permutation test (10,000 iterations): (8) We computed the difference between the correlation of Pair 1 and Pair 2: mean correlation Pair 1 – mean correlation Pair 2. (9) We kept fixed the labels of the group common to the two pairs and we shuffled the labels of the subjects from the other two groups (e.g. if we are comparing SCv-EBa versus SCv-SCa, we keep the SCv group fixed and we shuffle the labels of EBa and SCa). (10) After shuffling the groups’ labels we recomputed steps 1, 2, 3 and 8. (11) We repeated these steps 10,000 times to obtain a distribution of differences simulating the null hypothesis that there is no difference between the two pairs’ correlations. If the actual difference falls within the top α × 100% of the simulated null distribution of differences, the null hypothesis of absence of difference can be rejected with a false-positive rate of α.
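The label-shuffling logic of steps 8 to 11 might look like the sketch below. It takes the between-group similarity measure as an argument (`similarity`, a hypothetical callable standing in for steps 1 to 3) so that the example stays short. The two-sided comparison and the +1 correction are assumptions, and the toy similarity function and data are invented for illustration.

```python
import random

def pair_difference_test(common, group_a, group_b, similarity,
                         n_iter=10000, seed=0):
    """Permutation test on similarity(common, A) - similarity(common, B).

    The common group stays fixed; subjects of the other two groups are
    pooled and randomly re-split into groups of the original sizes."""
    rng = random.Random(seed)
    observed = similarity(common, group_a) - similarity(common, group_b)  # step 8
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    null = []
    for _ in range(n_iter):                                               # step 11
        rng.shuffle(pooled)                                               # step 9
        diff = similarity(common, pooled[:n_a]) - similarity(common, pooled[n_a:])
        null.append(diff)                                                 # step 10
    # ASSUMPTION: two-sided p-value with a +1 correction.
    p = (sum(abs(d) >= abs(observed) for d in null) + 1) / (n_iter + 1)
    return observed, p

def toy_similarity(g1, g2):
    """Hypothetical stand-in: groups with closer means count as more similar."""
    return -abs(sum(g1) / len(g1) - sum(g2) / len(g2))

# Hypothetical toy data: each subject is summarized by a single number.
common_group = [1.0, 1.1, 0.9]
group_a = [1.0, 1.2, 0.8, 1.1]
group_b = [5.0, 5.2, 4.8, 5.1]
observed_diff, p_diff = pair_difference_test(
    common_group, group_a, group_b, toy_similarity, n_iter=2000)
```

In a real analysis the `similarity` argument would be the averaged Spearman correlation described in steps 1 to 3, applied to per-subject vectors rather than single numbers.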

Finally, the RSA with low-level and behavioral representational models also did not meet the assumption of independence of the data. In fact, for dissimilarity matrices the independence of the samples cannot be assumed, because each similarity is dependent on two response patterns, each of which also codetermines the similarities of all its other pairings in the RDM (Kriegeskorte et al., 2008b). For each group, the statistical difference from zero was determined using a permutation test (10,000 iterations), building a null distribution for these correlation values by computing them after randomly shuffling the labels of the matrices. Similarly, the statistical difference between groups was assessed using a permutation test (10,000 iterations), building a null distribution for these correlation values by computing them after randomly shuffling the group labels. Moreover, considering the unidirectional hypothesis for this test (a positive correlation between neural similarity and model similarity) and the difficult interpretation of a negative correlation, one-tailed statistical tests were used. For all other tests (e.g., differences between groups), for which both directions might be hypothesized, two-tailed tests were used (Peelen et al., 2014; Evans et al., 2019).
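A minimal sketch of the one-tailed label-shuffling test against a model matrix is given below. The function names, the two-cluster toy model and the +1 correction are assumptions, not the authors' implementation. Condition labels are shuffled by reordering rows and columns of the model matrix together.

```python
import random
from statistics import mean

def ranks(v):
    """Average 1-based ranks; tied values share the mean of their positions."""
    s = sorted(v)
    return [(s.index(x) + 1 + len(s) - s[::-1].index(x)) / 2 for x in v]

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

def lower_triangle(rdm):
    """Vectorize the entries below the diagonal of a square matrix."""
    return [rdm[i][j] for i in range(len(rdm)) for j in range(i)]

def relabel(rdm, perm):
    """Reorder rows and columns with the same permutation of condition labels."""
    return [[rdm[a][b] for b in perm] for a in perm]

def model_permutation_test(brain_rdm, model_rdm, n_iter=10000, seed=0):
    """One-tailed test of a positive brain-model correlation."""
    rng = random.Random(seed)
    brain = lower_triangle(brain_rdm)
    observed = spearman(brain, lower_triangle(model_rdm))
    labels = list(range(len(model_rdm)))
    count = 0
    for _ in range(n_iter):
        rng.shuffle(labels)
        r = spearman(brain, lower_triangle(relabel(model_rdm, labels)))
        count += r >= observed
    # ASSUMPTION: +1 correction so that p is never exactly zero.
    return observed, (count + 1) / (n_iter + 1)

# Hypothetical toy model: eight conditions forming two clusters of four.
n = 8
model_rdm = [[0 if (i < 4) == (j < 4) else 1 for j in range(n)] for i in range(n)]
# Toy brain matrix: the model plus a small deterministic perturbation.
brain_rdm = [[0.0 if i == j else model_rdm[i][j] + 0.01 * (i + j)
              for j in range(n)] for i in range(n)]
observed_r, p_model = model_permutation_test(brain_rdm, model_rdm, n_iter=2000)
```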

In each analysis, all the p-values are reported after false discovery rate (FDR) correction implemented using the MATLAB function ‘mafdr’.
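As a rough illustration of FDR adjustment, the sketch below implements the Benjamini-Hochberg step-up procedure in Python. This is an assumption about the kind of correction involved: ‘mafdr’ supports more than one procedure, the text does not state which option was used, and the example p-values are invented.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values from four tests.
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = bh_adjust(raw_p)
```

An adjusted value below the chosen threshold (for example 0.05) would then be reported as significant after correction.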

References

  28. Collignon O, Dormal G, Lepore F (2012). Building the Brain in the Dark: Functional and Specific Crossmodal Reorganization in the Occipital Cortex of Blind Individuals. In: Plasticity in Sensory Systems. Cambridge University Press. doi: 10.1017/CBO9781139136907.007

Decision letter

  1. Tamar R Makin
    Reviewing Editor; University College London, United Kingdom
  2. Barbara G Shinn-Cunningham
    Senior Editor; Carnegie Mellon University, United States
  3. Tamar R Makin
    Reviewer; University College London, United Kingdom

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The study adds important support, using brain decoding, to the theory that the visual cortex in early blindness processes similar category boundaries as in vision. This is an extremely well designed study and extends beyond previous research in the field by adopting a substantially richer stimulus space, and by carefully considering alternative contributions to the stimulus domains. Together with detailed and thoughtful analysis, the study provides a comprehensive account of the similarities and differences of categorical representation in the blind ventral occipito-temporal cortex.

Decision letter after peer review:

Thank you for submitting your article "Similar categorical representation from sound and sight in the occipito-temporal cortex of sighted and blind" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Tamar R Makin as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Barbara Shinn-Cunningham as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary

This study is aimed at assessing categorical representational similarities between sighted and blind individuals in the early and ventral occipitotemporal visual cortex. The authors used sounds of objects, animals, humans and scenes to construct a representational structure in the visual cortex of blind individuals and compared that to the representational structure evoked by images and sounds of the same categories in sighted individuals. They showed that the blind response pattern and response pattern connectivity for such sounds are broadly similar to those found in the sighted using images in the VOTC, and that this representational structure didn't merely reflect auditory pitch similarity or low-level visual processing. Additionally, they showed that the inter-subject consistency is higher in the blind for sounds than in the sighted for the same sounds, but lower than for sight. Overall the study adds important support, using RSA, to the theory that the visual cortex in early blindness processes similar category boundaries as in vision. This is an extremely well designed study and extends beyond previous research in the field by adopting a richer stimulus space and more detailed analyses that provide a more comprehensive account of the similarities and differences between the different groups and stimulus domains.

All three reviewers were particularly enthusiastic about the experimental design and were excited about the opportunity to develop a more comprehensive understanding of the similarities and differences between blind and sighted individuals. In that respect, we felt that although interesting, the results highlighting the similarities across groups shouldn't prevent the authors from exploring potential differences between groups. In addition, multiple questions were raised relating to the specific statistical tests and ROI definition, which require further justification or greater clarity about the test specifics. In some cases, the manuscript would benefit from more mainstream/standardised analyses, as there were concerns that the current approaches were at times inflating the effects or their significance. The consensus across reviewers was that we are interested in a more nuanced understanding of your findings, even if this might potentially weaken some of your key conclusions (i.e. similar categorical representation from sound and sight in the occipito-temporal cortex of sighted and blind).

Major comments

1) The reviewers felt that there are some missed opportunities here to explore beyond the simple DSM correlations. One reviewer suggested more formal model comparisons, another suggested interrogating the representation structure with more detail. I'm appending the specific comments below:

Reviewer 1: In Abstract and elsewhere (e.g. “the representational structure of visual and auditory categories was almost double in the blind when compared to the sighted group”) – The authors are confusing the R with the effect size (R square), which is actually almost four times greater. In general – they might like to consider more formal comparison across models (PCM or equivalent), while accounting for noise levels (if the inter-subject variance is greater, then the correlation is likely to be lower).

Reviewer 3: The authors suggest that "VOTC reliably encodes sounds categories in blind people using a representational structure strikingly similar to the one found in vision". This conclusion is based on the magnitude of correlations between DSMs (e.g. Figure 3). However, I think it is important to go beyond the significant correlations and look carefully at the representational structure (e.g. see King et al., 2019). For example, while the dendrogram for SCv shows an initial split between the animate and inanimate items, for EBa the initial split is for the human conditions (HV, HN) from all others, and the animal conditions (AM, AM) are more closely grouped with BM and MG. Thus, the high correlation between SCv and EBa actually belies some differences in the representational structure that I think are worth commenting on. A similar comparison can be made for EVC and for SCa. So, I recommend the authors consider the representational structure in more detail and probably draw back slightly from the claim that the representational structures in EBa and SCv are "strikingly similar".

2) One of the reviewers was particularly concerned about the usage of group means to characterise differences between groups, without adequately representing inter-subject variance within each group. In some analyses (e.g. the topographical selectivity map) the mean is taken as the sole group statistic (as a fixed effect), making the analysis highly susceptible to outliers and precluding generalisation to other samples. In other analyses (e.g. DSMs) a permutation test is included to determine group differences; however, the specifics of this permutation test are not well explained and require more thought – does this permutation test adequately capture the variance of the categorical representation between sighted and blind individuals? Or are the authors potentially confusing their units of analysis (see Parsons et al., eLife, 2018)? Ideally, variance should be assessed by measuring the effect in each individual participant and comparing the effect across groups (indeed this approach is used in some of the tests in the current study). In the case of DSM, each participant (e.g. from the EBa group) should be correlated to the SCv group, generating a distribution of the effect of interest. To an extent, a similar analysis is presented in Figure 5, providing much lower effect sizes than highlighted throughout the paper. So, we are looking for a more considered quantification of the main effects. A lot of confusion/misinterpretation could be avoided if the authors used standard statistics that take into consideration the inter-subject variance, which is the key unit of your analysis (e.g. t-test and its variants). If the authors feel that their own approach is preferable, then they should increase the clarity of their analysis and its validity, with a clear statement of what the null is.

3) A few of the reviewers commented on inconsistent statistical criteria, with the significance test flipping between one-tailed and two-tailed. For example, correlations between DSMs were tested for significance using a one-tailed permutation test. This resulted in identifying a very strong negative correlation between the SCv and SCa in the early visual cortex (r = -.50) as non-significant, where in fact this effect is stronger than the correlation presented between SCv and EBa DSMs (r = .41, p = 0.03). Similarly, the SCv DSM in the early visual cortex may be significantly negatively correlated to the behavioral similarity ratings (r = -0.09; in the sighted, the EVC DSM is reported to be significantly correlated with both behavioural DSMs at r = .09, p = 0.01). This could be an interesting finding, e.g. if cross-modal inhibition is categorically selective (allowing for decoding of sound categories in EVC in another study; Vetter et al., 2014). This finding could also add to the mechanistic explanation of the plasticity in the blind. But at present it is uninterpretable. We therefore request that one-tailed testing be avoided, unless there is strong statistical justification to use it (e.g. in case of a replication of previous findings). In that case, the reasons for the one-tailed hypothesis should be reported more transparently and interpreted more cautiously. But if the authors are confirming a new hypothesis, which we believe is the case for most of the tests reported, we'd encourage them to stick to two-tailed hypothesis testing. The authors are welcome to consider correction for multiple comparisons, but again, we are looking for consistency throughout the manuscript and justification of the criteria used (or abandoned).

4) The focus on VOTC, and the specific definition of the ROIs, requires further consideration. In particular the reviewers mentioned the lack of discrimination within it, limited coverage of the ventral stream, and lack of consideration of other areas which might be involved in categorical representation, e.g. high-order auditory areas or other visual areas in the lateral occipital cortex (see also comment 6 below). Further exploratory analysis, focused by previous research on similar topics, would strengthen the interpretability of the results. A searchlight approach might be particularly helpful, though we leave it to the authors to decide on the specific methodology.

5) The topographic selectivity analysis raised multiple comments from the reviewers. Beyond the issue raised in comment 2 above, it was agreed that this analysis potentially reveals important differences between the groups that are not adequately captured in the current presentation and analysis. For example, Reviewer 2 mentions: "the blind show nearly no preference for animal sounds in accordance with claims made by (Bi et al., 2016); in the sighted the human non-vocalizations are the preferred category for face selective areas in the visual cortex but in the auditory version (in both blind and sighted) there is mostly vocalization selectivity; The blind show little or no preference for the large-environmental sounds whereas the sighted show a differentiation between the two sound types". Reviewer 3 mentions: "In SCv, the medial ventral temporal cortex is primarily BIG-ENV, but in EBa, that same region is primarily BIG-MEC. Similarly, there appears to be more cortex showing VOC in SCa than in either of the other two groups". We appreciate that some of the analysis might have been used as a replication of the Van den Hurk study, but we also expect it to stand alone, in terms of quality of analysis and interpretation. In particular, these topographical preference differences could be interesting and meaningful in relation to understanding blind visual cortex function, but it is hard to judge their strength or significance as the topographical selectivity maps are not thresholded by activation strength or selectivity significance. In addition, we would like some further clarification of how the correlations have been constructed, if indeed a winner-takes-all approach is used to label each voxel.

6) In the representational connectivity analysis, the authors may be overestimating the connectivity between regions due to contribution of intrinsic fluctuations between areas. To more accurately estimate the representational connectivity, it would be better if the authors used separate trials for the seed and target regions. See Henriksson et al., 2015, Neuroimage for a demonstration and discussion of this issue. Indeed, considering the clear differences found elsewhere between groups, the high between-group correlations are striking. It could have been informative to examine a control node (e.g. low-level visual cortex; low level auditory cortex) just to gain a better sense for what these correlations reveal. Also, in this section it is not clear why the authors use a separate ROI for each of the 3 VOTC sub-regions, rather than the combination of all three, as they have for the main analysis? As a minor point, the reviewer wasn't clear how the authors switched from 3 nodes for each group to only one comparison for each group pair.

https://doi.org/10.7554/eLife.50732.sa1

Author response

Major comments

1) The reviewers felt that there are some missed opportunities here to explore beyond the simple DSM correlations. One reviewer suggested more formal model comparisons, another suggested interrogating the representation structure with more detail. I'm appending the specific comments below:

Reviewer 1: In Abstract and elsewhere (e.g. “the representational structure of visual and auditory categories was almost double in the blind when compared to the sighted group”) – The authors are confusing the R with the effect size (R square), which is actually almost four times greater.

We thank the reviewers for highlighting this confusing statement. We agree that such claims were unclear and potentially leading to misunderstanding the outcome of correlation values and effect size. In order to avoid such confusion, we removed the sentence from the Abstract and we rephrased the sentence in the Discussion. The sentence now reads:

“The representational structure of visual categories in sighted was significantly closer to the structure of the auditory categories in blind than in sighted”.

In general – they might like to consider more formal comparison across models (PCM or equivalent), while accounting for noise levels (if the inter-subject variance is greater, then the correlation is likely to be lower).

We thank the reviewers for this suggestion. We fully agree that the intersubject variance within a brain region and the consequent noise level within each group are important points to take into consideration when looking at the correlation with our models. The Pattern Component Modeling (PCM) is indeed a useful analysis to perform a formal comparison between models notably normalizing the results according to the noise ceiling (Diedrichsen et al., 2018).

However, we believe that normalization according to the noise ceiling might not be the best approach in our study. A high noise ceiling is indicative of high inter-subject reliability in the representational structure of a region for our stimulus space. Therefore, if we normalized our data according to the noise ceiling, we would lose important information about the inter-subject variability itself, and this could potentially lead to erroneous conclusions. One illustrative example of a possible misinterpretation of the data after such normalization comes from the comparison of occipital and temporal brain regions in the sighted in the auditory experiment (SCa). As expected, in Author response image 1 we see much higher decoding of auditory categories in the temporal cortex (~75% decoding accuracy using mean binary decoding of our 8 categories) when compared to VOTC (~55% decoding accuracy using mean binary decoding of our 8 categories).

Author response image 1
MVP-classification results in the SCa group.

The 28 binary classification accuracies averaged are represented for the VOTC (in pink) and the temporal ROI (in green). Each circle represents one subject, the coloured horizontal line represents the group mean and the black vertical bars represent the standard error of the mean across subjects.

We can now go a step further and look at the RSA analysis in the TEMP and VOTC parcels. In Author response image 2, left side, you see a dot-plot graph representing the correlation of the behavioral model with the DSMs extracted from both VOTC and TEMP parcels in SCa. We see, again as expected, that the categorical model correlates more with the temporal DSM than with the VOTC DSM. In addition, the inter-subject correlation (and therefore the noise ceiling, represented by the grey line) is higher in the temporal region than in VOTC. This is an expected result, suggesting that the inter-subject variance in VOTC is higher (i.e. the noise ceiling is lower) than in the temporal region, because temporal regions likely have a more coherent response code across subjects given their prominent role in coding auditory information (including low-level features of the sounds that can be segregated from our different categories). Such an effect has been extensively described by Hasson et al. (2004, 2008), who reported a striking level of voxel-by-voxel synchronization between individuals, especially in the sensory cortices involved in the processing of the stimuli at play (e.g. primary and secondary visual and auditory cortices during the presentation of visual and auditory video clips, respectively).

Author response image 2
Dot-plot graphs representing the correlation of the behavioral model with the DSMs extracted from both VOTC (pink) and TEMP (green) parcels, in sighted for auditory stimuli (SCa).

Each dot represents one subject, the colored (pink and green) lines represent the group mean values. The vertical black bars represent the standard error of the mean across subjects. Left panel: Spearman’s correlation with the behavioural model is represented here. The grey lines represent the noise ceiling (i.e. inter-subject correlation in each ROI). High inter-subject correlation means low variance. Right panel: The Spearman’s correlation values are represented after normalization for the noise ceiling. Since the noise ceiling was higher than the group mean correlation in both ROIs, negative values were expected after normalization.

This information is, by itself, important. However, if we normalized the results of the model comparisons according to these noise ceilings, we would lose this information and would even reverse the results of the correlation with the behavioural models (see Author response image 2, right side): we would observe a higher correlation in VOTC than in the temporal lobe. We believe this would misleadingly suggest higher auditory categorical information in VOTC than in the temporal cortex, unless one took into account that the noise ceiling is much higher in the temporal cortex. As a side note, the negative values after the normalization were expected here, since in both ROIs the noise ceiling was higher than the group average correlation values.

We therefore think that the noise ceiling not only provides interesting information about the brain representations in the different groups but also, as a complement to the correlation between the brain and model DSMs, allows a full and transparent evaluation of the fine-grained structure of how a region codes the stimulus space. This is a point that we actually address more directly with the new inter-subject analysis that we ran to understand how variable the brain representation in VOTC is across subjects belonging either to the same group or to different groups (see new Figure 5 in the main manuscript for further details).

If we normalize the correlation between the representational similarity of VOTC and the behavioural model according to the noise ceiling of each group, we would indeed find a comparable amount of correlation in VOTC both for EBa and SCa (Author response image 3, right panel).

Author response image 3
Dot-plot graphs representing the correlation of the behavioral model with the DSMs extracted from VOTC in both SCa (pink) and EBa (green) groups.

Each dot represents one subject, the colored (pink and green) lines represent the group mean values. The vertical black bars represent the standard error of the mean across subjects. Left panel: Spearman’s correlation with the behavioural model. The grey lines represent the noise ceiling (i.e. inter-subject correlation in each group). High inter-subject correlation means low variance. Right panel: The Spearman’s correlation values are represented after normalization for the noise ceiling. Since the noise ceiling was higher than the group mean correlation in both groups, negative values were expected after normalization.

However, if we look at the data without the normalization (Author response image 3, left panel), we clearly see that there is a difference between the two groups, with the representation in SCa being much noisier and more variable across subjects (therefore having a lower noise ceiling). We are convinced that, in case of normalization for the noise ceiling, such an absence of difference would be misleading (following the reasoning we made above when comparing temporal and occipital regions).

For these reasons we think that the use of RSA, with the representation of both the correlation results and the noise ceiling (provided in each of our representation of RSA analyses) for each group and ROI is preferable for fully and transparently appreciating our results, also avoiding misleading the reader. Indeed, RSA provides a straightforward way of measuring the correspondence between the predicted (our models) and observed (the brain response) distances, in order to select the model that better explains the data in each group, given a certain level of noise.

In addition, there are also some methodological points that make RSA the best choice for our investigation (as highlighted by Diedrichsen and Kriegeskorte, 2017 in their work comparing PCM and RSA analyses). First of all, with a condition-rich design and simple models, as is the case in our study, RSA is much more computationally efficient than PCM. More importantly, the assumption of a linear relationship between predicted and measured representational dissimilarities (which is made by PCM) is not always a desirable choice for fMRI measurements, since it might be violated in many cases (Diedrichsen and Kriegeskorte, 2017). We now use the Spearman’s correlation to compare our models and the brain representations, since this rank-correlation-based RSA provides a robust method that does not rely on a linear relationship between the predicted and measured dissimilarities (Diedrichsen and Kriegeskorte, 2017).

Reviewer 3: The authors suggest that "VOTC reliably encodes sounds categories in blind people using a representational structure strikingly similar to the one found in vision". This conclusion is based on the magnitude of correlations between DSMs (e.g. Figure 3). However, I think it is important to go beyond the significant correlations and look carefully at the representational structure (e.g. see King et al., 2019). For example, while the dendrogram for SCv shows an initial split between the animate and inanimate items, for EBa the initial split is for the human conditions (HV, HN) from all others, and the animal conditions (AM, AM) are more closely grouped with BM and MG. Thus, the high correlation between SCv and EBa actually belies some differences in the representational structure that I think are worth commenting on. A similar comparison can be made for EVC and for SCa. So, I recommend the authors consider the representational structure in more detail and probably draw back slightly from the claim that the representational structures in EBa and SCv are "strikingly similar".

We fully take this point and we followed the recommendation of the reviewers to go beyond the correlation values and to look in more detail at the representational structure. This further exploration of our data is in line with the suggestion of toning down our claim about the similarity of the VOTC representational structure in EBa and SCv. In the Abstract, and elsewhere in the paper, we systematically replaced the wording “strikingly similar” with “partially similar”. More generally, we shifted our attention from the similarity between the representational structures of our three groups towards highlighting some interesting between-group differences.

To do so, we now use a hierarchical clustering approach similar to the one of King et al., 2019, as suggested by the reviewer. We describe the analysis in the new section of the Materials and Methods entitled Hierarchical clustering analysis on the brain categorical representations:

“In order to go beyond the correlation values and to explore more qualitatively the representational structure of VOTC and EVC in the 3 groups, we implemented a hierarchical clustering approach (King et al., 2019). […] We applied the same clustering analysis also on the behavioural data (see Figure 4—figure supplement 2)”.

Here are the results, reported in the paper under the section “Hierarchical clustering analysis on the brain categorical representation”:

“We implemented this analysis to go beyond the magnitude of correlation values and to qualitatively explore the representational structure of our ROIs in the 3 groups. […] In the EBa, instead, we find a different clustering structure with the animal and the human categories being separate“.

As now reported in the Discussion section, the results from the hierarchical clustering analysis, together with the Jaccard similarity analysis, highlight a domain-by-sensory experience interaction, with the animal category represented differently in blind compared to the sighted subjects.

As we now report in the Discussion:

“Previous studies already suggested that intrinsic characteristics of objects belonging to different categories might drive different representation in the VOTC of the blind (Wang et al., 2015; Bi et al., 2016).”

Compared to previous literature, our results highlight a further distinction within the animate category in early blind people. In this group, the animal category does not cluster together with the human stimuli at either the behavioural or the brain level but tends to be assimilated to the inanimate categories. An explanation for this effect could be the different ways in which blind and sighted individuals perceive and interact with animals. In fact, if we exclude pets (only 1 out of the 6 animals we included in this study), sighted individuals normally perceive the animacy of animals (such as birds, donkeys, horses etc.) mostly through vision (either in real life or in documentaries or videos). Blind people, instead, normally learn the peculiar shape of each animal by touching miniature, static models of them. Moreover, when blind people hear the sounds of these animals without seeing them, they might combine these sounds with the rest of the environmental sounds, and this is indeed what we see in the behavioral ratings, in which only blind subjects cluster together animals and big environmental sounds.

2) One of the reviewers was particularly concerned about the usage of group means to characterise differences between groups, without adequately representing inter-subject variance within each group. In some analyses (e.g. the topographical selectivity map) the mean is taken as the sole group statistic (as a fixed effect), making the analysis highly susceptible to outliers and precluding generalisation to other samples. In other analyses (e.g. DSMs) a permutation test is included to determine group differences; however the specifics of these permutation tests are not well explained and require more thought – does this permutation test adequately capture the variance of the categorical representation between sighted and blind individuals? Or are the authors potentially confusing their units of analysis (see Parsons et al., eLife, 2018)? Ideally, variance should be assessed by measuring the effect in each individual participant and comparing the effect across groups (indeed this approach is used in some of the tests in the current study). In the case of DSM, each participant (e.g. from the EBa group) should be correlated to the SCv group, generating a distribution of the effect of interest. To an extent, a similar analysis is presented in Figure 5, providing much lower effect sizes than highlighted throughout the paper. So, we are looking for a more considered quantification of the main effects. A lot of confusion/misinterpretation could be avoided if the authors used standard statistics that take into consideration the inter-subject variance, which is the key unit of your analysis (e.g. t-test and its variants). If the authors feel that their own approach is preferable, then they should increase the clarity on their analysis and its validity, with a clear statement of what the null is.

We thank the reviewer for bringing this to our attention. We agree that in some of our statistical analyses we were not adequately considering the inter-subject variance within each group. Following the recommendations made by the reviewers, we have now implemented a different way of computing statistics. The main point is that we no longer use the group mean as a fixed effect in any of the analyses; instead, we take the variance into account by measuring the effect in each individual participant.

To assess statistical differences, we now apply parametric tests (t-test and ANOVA) in the analyses that met all the assumptions required by parametric statistics: normal distribution of the data, homogeneity of variances and independence of the observations. This was the case for the Jaccard similarity analyses (both within and between groups), the voxels’ count analysis and the averaged binary decoding analysis. In the remaining cases, the criteria for parametric testing were not met, so we considered non-parametric statistics more appropriate. Our choice of relying on non-parametric statistics (i.e. permutation analysis) is mainly driven by the most recent guidelines about the most appropriate way to run statistics on multivariate fMRI data (e.g. Stelzer et al., 2013). In this work the authors outline several theoretical points, supported by simulated data, according to which MVPA data might not respect some of the assumptions imposed by the t-test:

1) Normal distribution of the samples. One fundamental assumption of the t-test is that the samples need to be distributed normally, especially when the sample size is small. On the contrary, the unknown distribution of decoding accuracies is generally skewed and long-tailed and, in practice, depends massively on the classifier used and the input data itself.

2) Continuous distribution of the samples. A further assumption is that the underlying distribution from which samples are drawn should be continuous. This is not the case in accuracy values from decoding. In fact, the indicator function, which maps the number of correctly predicted labels to an accuracy value between 0 and 1, can only take certain values: for k cross-validation steps and a test set of size t, only k×t+1 different values between 0 and 1 can be taken.

3) Low variance of the samples is important for the t-test. Quite the opposite, the single subject accuracies are in general highly variable. In fact, the number of observations available for classification is very limited. This limitation of samples represents one of the main causes of the high variance of the accuracy values.
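Point 2 above can be checked with a short enumeration. The sketch below is illustrative only: the function name and the values of k and t are hypothetical and are not taken from the study. It lists every accuracy value that k cross-validation steps with t test items each can produce.

```python
from fractions import Fraction


def possible_accuracies(k, t):
    """Return every accuracy reachable with k folds of t test items each.

    Accuracy is (number of correct predictions) / (k * t), so it can only
    take k * t + 1 distinct values between 0 and 1.
    """
    n_predictions = k * t
    return [Fraction(correct, n_predictions) for correct in range(n_predictions + 1)]


# Hypothetical example: 4 cross-validation steps, 2 test items per step.
values = possible_accuracies(k=4, t=2)
n_values = len(set(values))
```

With so few attainable values, the accuracy distribution is clearly discrete, which is the property that conflicts with the continuity assumption of the t-test.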

Similar guidelines also concern the correlation values between dissimilarity matrices (at the base of RSA analyses). In their seminal paper about how to implement RSA, Kriegeskorte and collaborators (2008) highlight that for dissimilarity matrices the independence of the samples cannot be assumed, because each similarity is dependent on two response patterns, each of which also codetermines the similarities of all its other pairings in the RDM. Therefore, they suggested testing the relatedness of dissimilarity matrices by using permutation tests (e.g. randomly permuting the conditions, reordering rows and columns of one of the two dissimilarity matrices to be compared according to this permutation, and computing the correlation).
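The condition-label permutation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy matrices, the use of Spearman correlation on the upper triangle, the one-sided comparison and the "+1" correction in the p-value are all choices made for this example.

```python
import random


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            result[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def rdm_permutation_test(rdm_a, rdm_b, n_perm=1000, seed=0):
    """One-sided permutation test of the relatedness of two RDMs.

    The same random permutation of condition labels is applied to the rows
    and columns of rdm_a, so the dependency structure of the matrix is kept.
    """
    rng = random.Random(seed)
    n = len(rdm_a)
    vec_b = upper_triangle(rdm_b)
    observed = spearman(upper_triangle(rdm_a), vec_b)
    n_extreme = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        shuffled = [[rdm_a[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if spearman(upper_triangle(shuffled), vec_b) >= observed:
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_perm + 1)


# Toy 8-condition RDMs built from made-up one-dimensional positions.
pos_a = [0, 1, 2, 3, 10, 11, 12, 13]
pos_b = [0, 2, 3, 5, 20, 21, 24, 26]
toy_a = [[abs(p - q) for q in pos_a] for p in pos_a]
toy_b = [[abs(p - q) for q in pos_b] for p in pos_b]
r_obs, p_value = rdm_permutation_test(toy_a, toy_b)
```

Because whole conditions are shuffled rather than individual matrix cells, each permuted matrix is still a valid dissimilarity matrix, which is why this null distribution does not rely on the independence of the cells.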

That being said, we agree with the reviewers that in certain cases the use of standard statistics might avoid some confusion/misinterpretation and might be more straightforward. Therefore, we decided to move to parametric statistics for the analyses that allowed it (i.e. the analyses that met the main three assumptions required to apply the parametric statistics: normality of the distribution/ homogeneity of variances/ independency of data).

However, some of our analyses (i.e. the correlation of the topographical selectivity maps, the correlation of the brain dissimilarity matrices, the correlation of the RSA connectivity profiles and the inter-subject variance analysis) did not meet the assumption of independence of the data. This issue is related to the identification of the unit of analysis also mentioned by the reviewer in the current comment. We thank the reviewer for pointing out the interesting paper from Parsons et al., 2018, in which the problem of independence of the data is formally described. We now follow the guidance of this paper in building our statistical procedure.

Only in an experimental setting where each experimental unit provides a single outcome or observation is the experimental unit the same as the unit of analysis (i.e. the one analyzed). In this case the independence of the data is maintained, and the use of parametric statistics is allowed. However, this is not always the case. For example, in the analyses mentioned above (i.e. the correlation of the topographical selectivity maps, the correlation of the brain dissimilarity matrices, the correlation of the RSA connectivity profiles and the inter-subject variance analysis), our units of analysis are the correlation values between experimental units. In fact, in these analyses we contrast between-group comparisons, so data from the same subjects are always included in two comparisons (e.g. data from EBa subjects are included both in the SCv-EBa and in the EBa-SCa comparisons). Therefore, the independence of the data is not respected here. In this case, classical parametric statistics would treat all observations in the analysis equally, ignoring the dependency in the data, and this would lead to inflation of the false positive rate and incorrect (often highly inflated) estimates of statistical power, resulting in invalid statistical inference (Parsons et al., 2018).

For this reason, we believe that permutation testing is the preferable approach for these analyses. This approach is more flexible than parametric statistics and allows us to overcome the problem of non-independence. For example, in testing the difference between two comparisons we can build our null distribution by keeping fixed the labels of the group common to the 2 pairs and shuffling the labels of the subjects from the other two groups (see next paragraphs for a more detailed description of all the statistical procedures used in the permutation analysis).

Also in the case of RSA with representational models we stick to permutation statistics, since also in this case there is a problem of independency of the data at the level of the DSMs’ building, as Kriegeskorte and collaborators highlighted in their seminal paper on RSA (2008): For dissimilarity matrices the independence of the samples cannot be assumed, because each similarity is dependent on two response patterns, each of which also codetermines the similarities of all its other pairings in the RDM. Therefore, the authors suggest testing the relatedness of dissimilarity matrices by using permutation tests (e.g. randomly permuting the conditions, to reorder rows and columns of one of the two dissimilarity matrices to be compared according to this permutation, and to compute the correlation).

In contrast to the previous version of our manuscript, we have now changed the way we compute the permutations so that the variance of our values is taken into account, and we also explain our null hypothesis more clearly. In order to avoid confusion about the way we run statistics, we now provide all the information related to our statistical tests in a new section at the end of Materials and methods titled “Statistical analyses”:

“To assess statistical differences, we applied parametric tests (T-Test and ANOVA) in the analyses that met the main assumptions required by parametric statistics: normal distribution of the data and independency of the observations. […]

In each analysis, all the p-values are reported after false discovery rate (FDR) correction implemented using the matlab function ‘mafdr’”.
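For readers who want to see what an FDR adjustment does to a set of p-values, a minimal sketch follows. It implements the Benjamini-Hochberg step-up adjustment, which is an assumption made for this illustration: the text above names the MATLAB function but not the options passed to it, so this is not necessarily the exact procedure that was applied. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, not results from the study.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```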

The main consequence of this change in the statistical methods is a decrease of the effect sizes in all the three analyses (correlation of the topographical selectivity maps; correlation of the brain dissimilarity matrices and correlation of the RSA connectivity profiles), as anticipated by the reviewer. However, even though the magnitude of the effect is reduced, this did not affect the significance of the results.

For example, if we look at the correlation between the brain DSMs, both in EVC and VOTC, we see that even though the average correlation between the DSMs is lower in the new version of the results than in the previous one, the same correlations that were significant in the previous results remain significant in the new results (see Author response image 4).

Author response image 4
New version of the results from the correlation of brain DSMs in EVC (left) and in VOTC (right).

Despite a reduction in effect size in the new results, there is no major change in the statistical results.

Similarly, the correlation values between the topographical maps remain significant (see Figure 1B).

Finally, a similar effect also appears in the correlation results between the RSA connectivity profiles (see Figure 8B). In this case too, we find correlation values that are decreased in magnitude, but no difference in the significance tests.

3) A few of the reviewers were commenting on inconsistent statistical criteria, with the significance test flipping between one-tailed and two-tailed. For example, correlations between DSM were tested for significance using a one-tailed permutation test. This resulted in identifying a very strong negative correlation between the SCv and SCa in the early visual cortex (r = -.50), as non-significant, where in fact this effect is stronger than the correlation presented between SCv and EBa DSMs (r=.41, p=0.03). Similarly, the SCv DSM in the early visual cortex may be significantly negatively correlated to the behavioral similarity ratings (r=-0.09; the EVC DSM in the sighted is reported to be significantly correlated with both the behavioural DSMs at r=.09, p=0.01). This could be an interesting finding, e.g. if cross-modal inhibition is categorically selective (allowing for decoding of sound categories in EVC in another study; (Vetter et al., 2014)). This finding could also add to the mechanistic explanation of the plasticity in the blind. But at present it is uninterpretable. We therefore request that one-tailed testing be avoided, unless there's strong statistical justification to use it (e.g. in case of a replication of previous findings). In that case, the reasons for the one-tailed hypothesis should be reported more transparently and interpreted more cautiously. But if the authors are confirming a new hypothesis, which we believe is the case for most of the tests reported, we'd encourage them to stick to two-tailed hypothesis testing. The authors are welcome to consider correction for multiple comparisons, but again, we are looking for consistency throughout the manuscript and justification of the criteria used (or abandoned).

We thank the reviewer for raising this important point, which gives us the opportunity to further clarify our statistical strategy. We use one-tailed permutation tests in only two analyses: the correlation of neural DSMs between groups and the correlation between neural DSMs and representational models (i.e. behavioural/low-level models); otherwise we systematically used two-sided hypothesis testing. Our decision to use one-tailed hypothesis testing in those specific conditions is based on the lack of interpretability of negative correlations in RSA.

First, we would like to point out that the interesting possibility of a categorically selective cross-modal inhibition, suggested by the reviewer, would actually still produce a positive correlation with our behavioral/categorical model. In RSA, in fact, we look at the structure of the representation of our categories in a specific brain region (i.e. how the representation of each category is similar to or different from the representation of the other categories in a given ROI); however, the structure of the representation does not tell us anything about the level of activity of this region. For instance, we could see that in a specific brain ROI the representation of tools looks different from the representation of animals. From this information we cannot infer whether the categorical representation embedded in the multivoxel pattern was the product of a deactivation or an activation; we can only infer that the brain activity patterns for tools are different from the brain activity patterns for animals. Therefore, a categorically selective cross-modal inhibition and a categorically selective cross-modal activation would create a similar structure of the representation, in both cases positively correlated with a categorical model.

A negative correlation with a representational model, such as the one we find in EVC of SCa with the behavioural model, highlights the fact that categories that look similar in our model have different brain activity patterns, while categories that look different in our model share a similar pattern of activity. These results are very difficult to interpret. One possibility is that there is another model, which we did not test, that is anti-correlated with our behavioural model and that can explain the representational structure of this ROI. This is one of the main limitations of RSA: there are in theory infinite models that we could test (Kriegeskorte et al., 2008). In practice, we need to select our model a priori for obvious statistical reasons (e.g. to avoid fitting a model a posteriori after observing the structure of our data).

Based on these limitations, studies relying on RSA typically do not interpret negative correlations and tend to build unidirectional hypothesis tests (for an example with a design similar to the one of our study, including blind and sighted participants and using RSA to test different representational models, see Peelen et al., 2014; for recent examples of one-tailed tests of RSA correlations see also Fischer-Baum et al., 2017; Handjaras et al., 2017; Leshinskaya et al., 2017; Zhao et al., 2017; Wang et al., 2018; Evans et al., 2019).

In other words, since we cannot easily interpret negative correlations, it makes more sense to build a unidirectional hypothesis. We have now added a short explanation in the paper to justify our choice of using a one-tailed rather than a two-tailed test:

“Considering the unidirectional hypothesis for this test (a positive correlation between neural similarity and models similarity) and the difficult interpretation of a negative correlation, one-tailed statistical tests were used. For all other tests (e.g., differences between groups), for which both directions might be hypothesized, two-tailed tests were used (Peelen et al., 2014; Evans et al., 2019)”.

That being said, and in order to reassure the reviewers that using one-tailed instead of two-tailed test has no significant impact on the (positive) correlation results, we show the permutation results reporting the p values for both the one-tailed and the two-tailed test after FDR correction for the 12 multiple comparisons (see Author response image 5). As you can see, the only correlation values that are significant with the two-tailed and not with the one-tailed are the negative ones, that in any case (for the reasons previously reported) we would not have interpreted.

Author response image 5
Comparison of the p-values (one-tailed vs two-tailed) resulting from the correlation of the brain DSMs (EVC on the left and VOTC on the right) with the representational behavioural and low-level models in the 3 groups (first line: SCv, middle line: EBa, lower line: SCa).

The null distribution is represented in dark blue. The red line represents the actual correlation value. For each permutation test, p-values are reported for both two-tailed and one-tailed tests. P-values reported in red are significant according to the selected threshold of 0.05. P-values are reported after FDR correction for 12 multiple comparisons.
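The relation between the two kinds of p-value can be sketched with a toy null distribution. The code below is an illustration under assumptions: the function name, the "+1" correction, the use of absolute values for the two-tailed test and the evenly spaced null values are all hypothetical and do not reproduce the permutation results shown in the image.

```python
def permutation_p_values(observed, null_values):
    """Return (one_tailed, two_tailed) p-values for a permutation test.

    one_tailed: proportion of null values at least as large as the observed
    one (tests for a positive correlation only).
    two_tailed: proportion of null values at least as extreme in absolute
    value (sensitive to both positive and negative correlations).
    """
    n = len(null_values)
    one_tailed = (sum(v >= observed for v in null_values) + 1) / (n + 1)
    two_tailed = (sum(abs(v) >= abs(observed) for v in null_values) + 1) / (n + 1)
    return one_tailed, two_tailed


# Toy symmetric null distribution from -0.40 to 0.40 (81 made-up values).
null = [i / 100 for i in range(-40, 41)]

# A strongly positive observed value is extreme under both tests.
p_one_pos, p_two_pos = permutation_p_values(0.5, null)

# A strongly negative observed value is extreme only under the two-tailed test.
p_one_neg, p_two_neg = permutation_p_values(-0.5, null)
```

In this toy case the negative value yields a one-tailed p-value of 1 but a small two-tailed p-value, which mirrors the pattern described above: the two tests disagree only for negative correlations.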

Finally, the reviewer might have noticed that there is one difference compared to the results of this analysis in the previous version of the paper: the correlation between the behavioural model and the neural DSM from VOTC in SCa is no longer significant. This is, however, unrelated to the use of a one- or two-tailed test and is instead related to the use of Spearman instead of Pearson correlation in our RSA analyses.

4) The focus on VOTC, and the specific definition of the ROIs requires further consideration. In particular the reviewers mentioned the lack of discrimination within it, limited coverage of the ventral stream, and lack of consideration of other areas which might be involved in categorical representation, e.g. high order auditory areas or other visual areas in the lateral occipital cortex (see also comment 6 below). Further exploratory analysis, focused by previous research on similar topics, would strengthen the interpretability of the results. A searchlight approach might be particularly helpful, though we leave it to the authors to decide on the specific methodology.

We thank the reviewer for raising this important point, which leads us to further clarify our hypothesis-driven analytical strategy. There are three aspects that need to be clarified: 1) why we decided to take the entire VOTC (and not separate subregions within it) as region of interest; 2) how we defined our VOTC mask and why we chose that way; 3) why we did not include other regions in our analyses.

First, since we were interested in the brain representation of different categories, we decided to focus on the ventro-occipito temporal cortex as a whole. This region is well known to contain several distinct macroscopic brain regions known to prefer a specific category of visual objects including faces, places, body parts, small artificial objects, etc. (Kanwisher, 2010). We decided to focus our analyses on a full mask of VOTC, and not on specific sub-parcels (such as FFA, PPA, etc.), because we were interested in looking at the categorical representation across categories and not within a specific category. Our study therefore builds upon the paradigm shift of viewing VOTC as a distributed categorical system rather than a sum of isolated functionally specific areas, which reframes how we should expect to understand those areas (Haxby et al., 2001). In fact, our main aim was to investigate how the modality of presentation and visual experience affect the general representation of different categories in the brain. Looking at one specific category-selective region at a time would not allow us to address this specific question. This approach has already been validated by previous studies that investigated the categorical representation in the ventral-occipito temporal cortex using a wide VOTC mask (Kriegeskorte et al., 2008; Grill-Spector and Weiner, 2014; Wang et al., 2015; Xu et al., 2016; Hurk et al., 2017; Peelen and Downing, 2017; Ritchie et al., 2017).

That being said, our winner-take-all topographic analyses (see Figure 1B), in combination with the calculation of the Jaccard index (assessing topographic overlap), provide a clear measure of how each category maps onto a similar brain region within the VOTC mask across modalities and groups. This analysis suggests that there is a partial spatial overlap in where separate categories map, but also highlights some differences (e.g. for animals).

Secondly, why did we decide to use a structural definition of the VOTC mask and not a functional localizer? The main reason is that it is quite challenging to functionally localize VOTC in blind subjects. Indeed, all the classical localizers are based on visual stimulation. Therefore, we thought that it would be misleading to functionally localize the mask only in the sighted and then apply it both to sighted and blind participants. In relation to the previous point, this would be even more true if we had to localize specific category-selective regions (e.g. FFA, PPA, LO), since these regions need to be identified with functional localizers because their locations show important inter-individual variability (Kanwisher, 2010; Julian et al., 2012). This is not feasible in blind people, for whom clear functional localizers for those regions are not trivial to define. For this reason, we decided to rely on a structural definition of the mask. We decided to work in the subject space, limiting the transformation and normalization of the brain images. This is particularly relevant when comparing blind and sighted subjects given that blindness is associated with significant changes in the structure of the brain itself, particularly within the occipital cortex (Pan et al., 2007; Jiang et al., 2009; Park et al., 2009; Dormal et al., 2016). That is why we used the anatomical scan to segment the brain into separate regions according to the Desikan-Killiany atlas (Desikan et al., 2006) implemented in FreeSurfer (http://surfer.nmr.mgh.harvard.edu). Finally, among the parcels produced by this atlas, we selected the regions lying within the ventral-occipito-temporal cortex and known to be involved in the visual categorical representation of different categories.

Finally, we did not include other regions because our a priori hypothesis was based on VOTC. The exploration of additional parcels would be more exploratory and would increase the multiple comparisons problem of our study. For this reason, we decided to focus our analysis selectively on VOTC, adding the EVC as a control node. What drives the functional organization of VOTC (visual experience, retinotopy, curvature etc.) is a burgeoning topic, with recent influential papers focusing precisely on this question (Hurk et al., 2017; Grill-Spector et al., 2014; Bracci et al., 2017; Bi et al., 2016; Peelen et al., 2017; Wang et al., 2015). We decided to situate our study within that topic by testing the role visual experience plays in driving the functional organization of VOTC, using categorical sounds in sighted and blind individuals.

We agree with the reviewers that it would be interesting to investigate the categorical representation in other parts of the brain, such as the lateral occipital complex or the temporal cortex. We are indeed currently investigating the categorical representation in the temporal cortex of sighted and blind subjects (using a part of the current dataset), with an additional extension including blind subjects with late onset of blindness. However, we believe that this topic is beyond the scope of the present study, which is already theoretically and technically challenging. Adding other regions would necessarily require changing the theoretical focus of the study and would likely result in a paper presenting a patchwork of results that are difficult to integrate in a streamlined global theoretical framework.

That being said, we agree with the reviewers that we needed to better explain our choice about the definition of our ROIs. We added some clarification in the section “Regions of interest”:

“Since we were interested in the brain representation of different categories we decided to focus on the ventro-occipito temporal cortex as a whole. […] Then, we combined these areas in order to obtain one bilateral EVC ROI and one bilateral VOTC ROI (Figure 3A). […]”

Finally, we thank the reviewer for highlighting the important point related to the searchlight approach. The searchlight approach might indeed be a useful and powerful analysis in the case of a more exploratory study. However, we again believe that this analysis is beyond the scope of the current paper, and we think that it might not add value to our study, which has a more hypothesis-driven orientation (such as Wang et al., 2015; Hurk et al., 2017; Bi et al., 2016). On the contrary, the control for the (copious) multiple comparisons could even obscure part of the effects that emerge using the hypothesis-driven ROI approach. Moreover, as highlighted in the previous part of this comment, we decided to work in the subject space since there is ample evidence that early blindness is associated with significant changes in the structure of the brain itself, particularly within the occipital cortex (Dormal et al., 2016; Jiang et al., 2009; Pan et al., 2007; Park et al., 2009). Therefore, even though we could implement the searchlight approach in each individual subject, a normalization step would be required to move to a common space in order to infer the location at the group level. Working in individually defined anatomical masks allowed us to circumvent this problem.

In addition, even though we agree that the searchlight approach might be helpful as a side analysis to explore our data more extensively, we respectfully remind the reviewer that we build our neural dissimilarity matrices using a crossvalidation approach for each possible pair of conditions (i.e. for each ROI and for each group we run 28 binary decoding tests); we believe that it would be extremely challenging (especially for a side-analysis), at the computational level, to repeat the same analysis for each voxel.
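The figure of 28 binary decoding tests follows directly from the number of unordered pairs among eight categories, which is also the number of unique off-diagonal cells in an 8 x 8 dissimilarity matrix. A minimal check, using hypothetical placeholder labels rather than the study's category names:

```python
from itertools import combinations

# Hypothetical placeholder labels for the eight sound categories.
categories = [f"category_{i}" for i in range(1, 9)]

# Each unordered pair of categories corresponds to one binary decoding test
# and to one cell in the upper triangle of the dissimilarity matrix.
pairs = list(combinations(categories, 2))
n_pairs = len(pairs)
```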

As a final note, the multidimensional nature of fMRI data can always trigger new hypothesis-testing based on sensibilities of reviewers (which region is selected, which method to test the hypothesis etc.). We however believe that this can be problematic in generating post-hoc hypotheses and enlarging the theoretical and statistical space of a research project, potentially producing scientific malpractice (Zimring, Nature, 2016). More generally speaking, the request for additional experiments or analyses in the review phase has been recently eloquently debated by Hidde Ploegh in a Nature News called “End the Wasteful Tyranny of Reviewer Experiments” or by Derek Lowe in Science Translational Medicine: “Just A Few More Month’s Work, That’s All I’m Asking Here”. Quoting Ploegh here: “Submit a biomedical-research paper to Nature or other high-profile journals, and a common recommendation often comes back from reviewers: perform additional experiments. Although such extra work can provide important support for the results being presented, all too frequently it represents instead an entirely new phase of the project or does not extend the reach of what is reported”.

Again, our study was strongly hypothesis-driven by being focused on testing whether different modalities of input and sensory experiences affect the categorical representation in VOTC; following a focused analytical strategy as implemented by previous major literature in this field (Kriegeskorte et al., 2008; Grill-Spector et al., 2014; Bracci et al., 2017;Bi et al., 2016; Peelen et al., 2017; Wang et al., 2015).

5) The topographic selectivity analysis raised multiple comments from the reviewers. Beyond the issue raised in comment 2 above, it was agreed that this analysis potentially reveals important differences between the groups that are not adequately captured in the current presentation and analysis. For example, Reviewer 2 mentions: "the blind show nearly no preference for animal sounds in accordance with claims made by (Bi et al., 2016); in the sighted the human non-vocalizations are the preferred category for face selective areas in the visual cortex but in the auditory version (in both blind and sighted) there is mostly vocalization selectivity; The blind show little or no preference for the large-environmental sounds whereas the sighted show a differentiation between the two sound types". Reviewer 3 mentions: "In SCv, the medial ventral temporal cortex is primarily BIG-ENV, but in EBa, that same region is primarily BIG-MEC. Similarly, there appears to be more cortex showing VOC in SCa than in either of the other two groups". We appreciate that some of the analysis might have been used as a replication of the Van den Hurk study, but we also expect them to stand alone, in terms of quality of analysis and interpretation. In particular, these topographical preference differences could be interesting and meaningful with relation to understanding blind visual cortex function, but it is hard to judge their strength or significance as the topographical selectivity maps are not thresholded, by activation strength or selectivity significance.

We thank the reviewers for raising these interesting comments about the topographic selectivity analysis. We agree that this analysis might reveal important differences between the groups and that we should try to adequately capture and highlight these interesting differences in our work.

We now compute the Jaccard index to quantify the similarity between the topographical maps preferentially elicited by each category in our 3 groups. The Jaccard similarity coefficient is a statistic used for measuring the topographic similarity and diversity of sample sets. The Jaccard coefficient is defined as the size of the intersection divided by the size of the union of the sample sets (see Author response image 6):

Author response image 6
Visual representation of the intersection (left) and union (right) of two sets of samples.

The Jaccard coefficient is based on these two measures.

This value is 0 when the two sets are disjoint, 1 when they are equal, and between 0 and 1 otherwise. We used the Jaccard similarity index to look at two important aspects of our data. (1) The similarity of the topographical maps (for each category) between subjects from the same group (within-group similarity). This analysis provides information about the consistency of the topographical representation of the categories across subjects within the same group (see Figure 1B-left). (2) The similarity between the topographical maps from both the blind and the sighted in the auditory experiment and the topographical map of the sighted in the visual experiment (between-groups similarity). This analysis provides information about the similarity between the visual and auditory (both in sighted and in blind) topographical representations (see Figure 1C).
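As a minimal sketch of the index described above, the following Python snippet computes the Jaccard similarity between the voxels assigned to one category in two label maps. The function names, the category labels and the six-voxel toy maps are illustrative assumptions, not the code or data used in the study; plain lists are used only to keep the example self-contained.

```python
def jaccard(a, b):
    """Jaccard index of two sets: size of intersection / size of union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty: the index is undefined. Returning 0.0 is a
        # choice made for this sketch, not something stated in the text.
        return 0.0
    return len(a & b) / len(union)


def category_voxels(label_map, category):
    """Indices of the voxels whose winning label equals `category`."""
    return {i for i, label in enumerate(label_map) if label == category}


# Hypothetical winner-takes-all maps over six voxels.
subject_map = ["animal", "human", "human", "object", "place", "place"]
group_map = ["animal", "human", "object", "object", "place", "human"]

for cat in ("animal", "human", "object", "place"):
    j = jaccard(category_voxels(subject_map, cat),
                category_voxels(group_map, cat))
    print(cat, round(j, 3))
```

The same function would serve for both comparisons described above: a subject map against the map of the subject's own group, or an auditory map against the visual map of the sighted.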

For the within-group similarity, we computed the Jaccard similarity between the topographical selectivity map of each subject and the mean topographical map of that subject's own group. This analysis produces 4 Jaccard similarity indices (one for each of the main categories: (1) animal, (2) human, (3) manipulable objects and (4) big objects and places) for each group. Low values for one category mean that the topographical representation of that category varies a lot across subjects of the same group (e.g. the animal category in EBa). The results of the Jaccard similarity within each group are shown in Figure 2B.

The results from the Jaccard similarity within groups are now reported in our article as follows:

“In order to look at the consistency of the topographical representation of the categories across subjects within the same group we computed the Jaccard similarity between the topographical selectivity map of each subject and the mean topographical map of his own group. […] The animals’ category showed a significantly lower Jaccard similarity within the EBa group compared to both SCa (pFDR = 0.002) and SCv (pFDR <0.001) groups while the similarity of big objects and places category was significantly higher in EBa compared to SCv (pFDR = 0.038).”

For the between-group analysis, we computed the Jaccard similarity index for each category, between the topographical map of each blind and sighted subject in the auditory experiment and the averaged topographical selectivity map of the sighted in the visual experiment (see Figure 1C for the results). In more practical terms, this means that we have 4 Jaccard similarity indices (one for each of the main categories: (1) animal, (2) human, (3) manipulable objects and (4) big objects and places) for each of the two group pairs: SCv-EBa and SCv-SCa.

Figure 2C shows the results from this analysis.

We now present these new results from the Jaccard similarity between groups:

“In addition, we wanted to explore the similarity and differences among the topographical representations of our categories when they were presented visually compared to when they were presented acoustically, both in sighted and in blind. […] No difference emerged between groups, suggesting comparable level of similarity of the auditory topographical maps (both in blind and in sighted) with the visual topographical map in sighted participants.”

In addition to the Jaccard similarity analyses, we thought that a further relevant analysis would be to look at the number of voxels showing a preference for each category, in each group. In fact, the degree of overlap might be driven by the number of voxels selective for each category (e.g. there might be no overlap for the animal category between SCv and EBa because almost no voxels prefer animals in the blind).
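The dependence of the overlap on the number of selective voxels can be made concrete with a short sketch. Given only the sizes of two voxel sets, the Jaccard index can never exceed the ratio of the smaller size to the larger one, so a category with very few selective voxels in one group caps the achievable similarity. The names and toy maps below are hypothetical.

```python
from collections import Counter

CATEGORIES = ("animal", "human", "object", "place")


def selective_voxel_counts(label_map, categories=CATEGORIES):
    """Number of voxels whose winning label is each category."""
    counts = Counter(label_map)
    return {cat: counts.get(cat, 0) for cat in categories}


def max_possible_jaccard(n_a, n_b):
    """Upper bound on the Jaccard index given only the two set sizes.

    The bound min/max is reached when the smaller set is fully contained
    in the larger one.
    """
    if max(n_a, n_b) == 0:
        return 0.0
    return min(n_a, n_b) / max(n_a, n_b)


# Hypothetical maps: the second has a single "animal" voxel.
map_a = ["animal", "animal", "animal", "human", "object", "place"]
map_b = ["animal", "human", "human", "object", "place", "place"]

counts_a = selective_voxel_counts(map_a)
counts_b = selective_voxel_counts(map_b)
for cat in CATEGORIES:
    bound = max_possible_jaccard(counts_a[cat], counts_b[cat])
    print(cat, counts_a[cat], counts_b[cat], round(bound, 3))
```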

Figure 2B shows the number of selective voxels for each category and in every group.

We now present the results from the analysis on number of voxels selective for each category:

“Finally, since the degree of overlap highlighted by the Jaccard similarity values might be driven by the number of voxels selective for each category, we looked at the number of voxels showing the preference for each category in each group. […] The post hoc comparisons revealed that this difference was mainly driven by the reduced number of voxels selective for the animal category in EBa compared to SCv (p=0.02).”

We now include a new figure (Figure 2) in the new version of our manuscript in which we represent the results from these three analyses (i.e. number of selective voxels, Jaccard similarity within groups and Jaccard similarity with SCv).

We also added a section in the Discussion of our manuscript to integrate these results in the general framework of our paper and to relate them with previous studies. This section now reads as follows:

“Even though the categorical representation of VOTC appears, to a certain degree, immune to input modality and visual experience, there are also several differences emerging from the categorical representation of sight and sounds in the sighted and blind. […] Interestingly, in case of early visual deprivation this difference is also in the behavioural evaluation of the similarity of our stimuli.”

To go into even more detail, in the supplemental material we included a representation of the same analyses applied to the 8 categories (see Figure 2—figure supplement 1).

The most striking observation from this analysis is that a large number of voxels show a preference for the big mechanical category (almost double the number of voxels preferring the other categories) in the VOTC of the EBa group. In line with this result, the topographical map for the big mechanical objects is the most stable across EBa subjects, while the map for big environmental is the least stable (i.e. lowest Jaccard similarity within the EBa group). Moreover, also in this analysis, we see that in EBa a limited number of voxels is selective for both birds and mammals and the topographical maps of these two animal categories show a lower consistency across blind subjects.

Finally, it is true that our topographical selectivity maps are not thresholded by activation strength or selectivity significance. Our experiment was specifically designed to suit a multivariate fMRI analysis approach (e.g. fast event-related design, short inter-stimulus interval, etc.); therefore, our statistical power for univariate analyses might not be optimal. Moreover, it is important to keep in mind that we are looking at the activity of the occipital cortex for sounds, including in sighted individuals; therefore, we do not expect high β-values in our univariate GLM. However, we believe that our topographical selectivity analysis is nevertheless valuable for making inferences about some aspects of the categorical representation of the different categories across the different groups (for a similar approach see also Hurk et al., 2017; Striem-Amit et al., 2018). In other words, even though we cannot (and do not) make any inference about the topographical category selectivity itself, since our maps are not thresholded, we can use those maps for second-order statistics (i.e. the correlation between the winner-take-all maps). It is important to highlight that if those maps were noisy, we would not observe any significant correlation. On the contrary, our results (supported by stringent statistics) show that the topographical map generated in VOTC by the auditory stimuli (both in blind and in sighted) is not stochastic; rather, it partially reflects the topographical map generated by the visual stimuli in sighted (see Figure 1B).

In addition, we would like some further clarification of how the correlations have been constructed, if indeed a winner-takes-all approach is used to label each voxel.

To create the topographical selectivity map, we assign a label to each voxel within the VOTC mask. Because of the issue raised at point #2 concerning the use of group means to test statistical differences, we now compute a topographical selectivity map in each subject, in order to adequately represent inter-subject variance within each group. We improved the description of the analysis and the statistical tests in the “Materials and methods”, in the section related to the “Topographical selectivity map”:

“To create the topographical selectivity map (Figure 1B) we extracted in each participant the β-value for each of our 4 main conditions (animals, humans, manipulable objects and places) from each voxel inside the VOTC mask and we assigned to each voxel the condition producing the highest β-value (winner takes all). This analysis resulted in specific clusters of voxels that spatially distinguish themselves from their surround in terms of selectivity for a particular condition (Hurk et al., 2017; Striem-Amit et al., 2018).”
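The labelling step described in this passage can be sketched as follows. The input layout (one list of four β-values per voxel), the function name and the toy values are assumptions made for illustration only.

```python
CONDITIONS = ("animals", "humans", "manipulable objects", "places")


def winner_takes_all(betas, conditions=CONDITIONS):
    """Label each voxel with the condition that has the highest β-value.

    betas: list with one entry per voxel; each entry is a list holding one
    β-value per condition, in the order given by `conditions`.
    """
    labels = []
    for voxel_betas in betas:
        # On ties, max() keeps the first condition; this tie-breaking rule
        # is an arbitrary choice of the sketch.
        best = max(range(len(conditions)), key=lambda c: voxel_betas[c])
        labels.append(conditions[best])
    return labels


# Hypothetical β-values for three voxels inside the mask.
toy_betas = [
    [0.2, 0.9, 0.1, 0.3],
    [0.5, 0.1, 0.4, 0.8],
    [-0.1, -0.3, 0.0, -0.2],
]
print(winner_takes_all(toy_betas))
```

Note that a voxel receives a label even when all of its β-values are low or negative, which is what it means for such a map to be unthresholded.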

6) In the representational connectivity analysis, the authors may be overestimating the connectivity between regions due to contribution of intrinsic fluctuations between areas. To more accurately estimate the representational connectivity, it would be better if the authors used separate trials for the seed and target regions. See Henriksson et al., 2015, Neuroimage for a demonstration and discussion of this issue. Indeed, considering the clear differences found elsewhere between groups, the high between-group correlations are striking. It could have been informative to examine a control node (e.g. low-level visual cortex; low level auditory cortex) just to gain a better sense for what these correlations reveal.

We thank the reviewer for highlighting this point and for pointing out the relevant paper from Henriksson et al., 2015. In this paper, the authors reported that intrinsic cortical dynamics might strongly affect the representational geometry of a brain region, as reflected in response-pattern dissimilarities, and might exaggerate the similarity (quantified using a correlation measure) of representations between brain regions (Henriksson et al., 2015). In the paper the authors show that visual areas closer in cortex tended to exhibit greater similarity of representations. To bypass this problem, they suggest using independent data (e.g. data acquired in different runs) to compute the DSM in the seed ROI and the DSMs from the other brain ROIs that we want to correlate with the seed. This would, indeed, be a clever way to remove the intrinsic fluctuation bias.

However, we think that we are not under the influence of similar problems due to the way we built our dissimilarity matrices. As we are building our neural dissimilarity matrices using a cross-validation approach for each possible pair of conditions (and not 1 – the correlation value), we drastically reduce the possibility of this intrinsic fluctuation bias. As we explain in the Materials and methods section: “we preferred to use binary MVP-classification as dissimilarity index to build neural DSMs rather than other types of dissimilarity measures […] because when using a decoding approach, due to the intrinsic cross-validation steps, we would find that the two conditions that don’t drive responses are indistinguishable, despite their substantial correlation (Walther et al., 2016) since the noise is independent between the training and testing partitions, therefore cross-validated estimates of the distance do not grow with increasing noise. This was crucial in our study since we are looking at brain activity elicited by sounds in brain regions that are primarily visual (EVC and VOTC) where the SNR is expected to be low, at least in sighted people”.

In other words, the use of cross-validation across imaging runs ensures that the estimated distances between neural patterns are not systematically biased by run-specific noise (Walther et al., 2016; Evans et al., 2019). Therefore, we believe that in our case the method suggested by Henriksson et al. is not necessary.
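To illustrate the idea of a dissimilarity matrix built from cross-validated pairwise decoding, here is a minimal sketch. It uses a leave-one-run-out scheme with a nearest-centroid classifier, which is a stand-in chosen for brevity and is an assumption: the classifier actually used in the study is not specified in this response. The data layout, function names and toy patterns are likewise hypothetical.

```python
from itertools import combinations


def mean_pattern(patterns):
    """Voxel-wise mean of a list of equally long patterns."""
    n = len(patterns)
    return [sum(vals) / n for vals in zip(*patterns)]


def sq_dist(u, v):
    """Squared Euclidean distance between two patterns."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def pairwise_decoding_accuracy(runs, cond_a, cond_b):
    """Leave-one-run-out accuracy for one pair of conditions.

    runs: list of dicts mapping condition name -> voxel pattern.
    Centroids are estimated on the training runs only, so the noise in the
    held-out run is independent of the noise used for training.
    """
    correct = 0
    total = 0
    for test_idx, test_run in enumerate(runs):
        train = [r for i, r in enumerate(runs) if i != test_idx]
        centroids = {c: mean_pattern([r[c] for r in train])
                     for c in (cond_a, cond_b)}
        for true_cond in (cond_a, cond_b):
            pattern = test_run[true_cond]
            predicted = min(centroids,
                            key=lambda c: sq_dist(pattern, centroids[c]))
            correct += predicted == true_cond
            total += 1
    return correct / total


def decoding_dsm(runs, conditions):
    """Dissimilarity for every pair of conditions = decoding accuracy."""
    return {(a, b): pairwise_decoding_accuracy(runs, a, b)
            for a, b in combinations(conditions, 2)}


# Hypothetical two-voxel patterns for three conditions in three runs.
toy_runs = [
    {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.9, 0.1]},
    {"A": [1.1, 0.1], "B": [0.1, 0.9], "C": [1.2, 0.0]},
    {"A": [0.9, -0.1], "B": [-0.1, 1.1], "C": [1.0, 0.2]},
]
toy_dsm = decoding_dsm(toy_runs, ["A", "B", "C"])
for pair, acc in toy_dsm.items():
    print(pair, acc)
```

With this construction, two conditions that are not reliably distinguishable across runs stay near chance level (0.5 for a binary problem) rather than acquiring a spurious distance from noise shared within a run.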

Moreover, the results we obtained from the representational connectivity analysis in VOTC already suggest that this intrinsic fluctuation bias cannot explain our results. If we look at the representational connectivity profiles represented in Figure 6, we clearly see that the proximity of the ROIs in the cortex is not systematically linked with a higher representational similarity (in contrast with what Henriksson et al. showed in their data). One example is the DSM from the fusiform node, which shows a higher similarity with the DSM from the inferior parietal cortex than with those from the cuneus and even the lingual gyrus, which are nevertheless much closer in cortex.

Following the suggestion of the reviewers, we also performed the same analysis in the EVC as a control node, in order to gain a better sense of what these correlations reveal. In this ROI, we found a significant correlation only between the representational connectivity profiles of the two groups of sighted (SCv and SCa), and neither between EBa and SCv nor between EBa and SCa. This highlights that the conclusions drawn from VOTC are specific and not the byproduct of a methodological property of how we compute our representational connectivity analysis. As we highlight in the Discussion:

“In support of this possibility, the representational connectivity profile of EVC in EBa did not show any similarity with the one of sighted (neither SCv nor SCa), suggesting a different way of crossmodal plasticity expression in this brain region”.

These results (together with other evidence coming from the decoding analysis and the RSA analysis) support the hypothesis that the posterior part of the occipital cortex in EB is the region that distances itself the most from the native computation it typically implements (Bi, Wang, and Caramazza, 2016; Büchel, 2003; Wang et al., 2015).

Importantly for the sake of this comment, from a methodological point of view, this is in support of the idea that the intrinsic cortical dynamics alone cannot explain our results, since the correlation between the representational connectivity profiles is sometimes absent.

Also, in this section it is not clear why the authors use a separate ROI for each of the 3 VOTC sub-regions, rather than the combination of all three, as they have for the main analysis? As a minor point, the reviewer wasn't clear how the authors switched from 3 nodes for each group to only one comparison for each group pair.

Our decision to use a separate ROI for each of the 3 VOTC sub-regions is mostly driven by previous literature. Previous studies have shown that, in sighted people, specific regions within VOTC are functionally and/or structurally connected with specific extrinsic brain regions. For instance, the visual word form area (VWFA) in VOTC shows robust and specific anatomical connectivity to EVC and to frontotemporal language networks (Saygin et al., 2016). Similarly, the fusiform face area (FFA) shows a specific connectivity profile with other occipital regions (Saygin et al., 2012), and also a direct structural connection with the temporal voice area (TVA) in the superior temporal sulcus (Blank et al., 2011; Benetti et al., 2018) thought to support similar computations applied on faces and voices as well as their integration (Von Kriegstein, Kleinschmidt, Sterzer, and Giraud, 2005). In order to explore these large-scale brain connections in our 3 groups, we thought that it would be better to keep the specificity of the representational connectivity profile of each sub-region of VOTC. In this way we did not cancel out the differences across the representational connectivity profile of each VOTC sub-region and we ended up with a higher variance in the connectivity profile of each subject, which is methodologically good for the following correlation analysis between the connectivity profiles. It is important to understand that we did not compare the connectivity profile of the single sub-regions, but in line with the rest of the paper, we concatenated the connectivity profile obtained from the three ROIs: our main aim was to keep more granularity in our connectivity profile.

However, to reassure the reviewers that keeping the entire VOTC as seed ROI would not have a major impact on the results, we report the results from this analysis (Author response image 7, bottom panel), together with the same analysis using the 3 seed ROIs (Author response image 7, top panel). As you can see, the correlation values do not change critically between the two analyses.

Author response image 7
Comparison of the results from the RSA connectivity analysis using the 3 sub-regions of VOTC (fusiform, parahippocampal, infero-temporal) as seed ROIs or VOTC as a single ROI.

Moreover, since the reviewers highlighted that our explanation of how we switched from 3 nodes for each group to only one comparison for each group pair was not clear enough, we have now clarified this section in the new version of our manuscript:

“Finally, we computed the Spearman’s correlation between the 3 seed ROIs (i.e. fusiform gyrus, parahippocampal gyrus and infero-temporal cortex) and all the other 27 ROIs. We ended up with a connectivity profile of 3 (number of seeds) by 27 (ROIs representing the rest of the brain) for each subject. We considered this 3*27 matrix as one representational connectivity profile of the seed region (e.g. VOTC) in each subject.”
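The construction of the 3 by 27 profile can be sketched as below. Each DSM is represented as a flat list of pairwise dissimilarities (28 values for 8 categories), and the Spearman correlation is implemented as a Pearson correlation on average ranks. The function names and the randomly generated DSMs are placeholders for illustration and carry no empirical meaning.

```python
import random


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def connectivity_profile(seed_dsms, target_dsms):
    """Rows: seed ROIs; columns: target ROIs; entries: Spearman's rho."""
    return [[spearman(seed, target) for target in target_dsms]
            for seed in seed_dsms]


# Placeholder DSMs: 3 seeds and 27 targets, 28 condition pairs each.
rng = random.Random(0)
seed_dsms = [[rng.random() for _ in range(28)] for _ in range(3)]
target_dsms = [[rng.random() for _ in range(28)] for _ in range(27)]

profile = connectivity_profile(seed_dsms, target_dsms)
# Concatenate the three rows into one vector per subject.
flat_profile = [rho for row in profile for rho in row]
print(len(profile), len(profile[0]), len(flat_profile))
```

The flattened vector is what could then be correlated between subjects or groups to compare their representational connectivity profiles.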

https://doi.org/10.7554/eLife.50732.sa2

Article and author information

Author details

  1. Stefania Mattioni

    Institute of research in Psychology (IPSY) & Institute of Neuroscience (IoNS) - University of Louvain (UCLouvain), Louvain-la-Neuve, Belgium
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology
    For correspondence
    stefania.mattioni@uclouvain.be
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-8279-6118
  2. Mohamed Rezk

    1. Institute of research in Psychology (IPSY) & Institute of Neuroscience (IoNS) - University of Louvain (UCLouvain), Louvain-la-Neuve, Belgium
    2. Centre for Mind/Brain Sciences, University of Trento, Trento, Italy
    Contribution
    Methodology
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-1866-8645
  3. Ceren Battal

    1. Institute of research in Psychology (IPSY) & Institute of Neuroscience (IoNS) - University of Louvain (UCLouvain), Louvain-la-Neuve, Belgium
    2. Centre for Mind/Brain Sciences, University of Trento, Trento, Italy
    Contribution
    Methodology
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9844-7630
  4. Roberto Bottini

    Centre for Mind/Brain Sciences, University of Trento, Trento, Italy
    Contribution
    Conceptualization
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7941-7762
  5. Karen E Cuculiza Mendoza

    Centre for Mind/Brain Sciences, University of Trento, Trento, Italy
    Contribution
    Software, Methodology
    Competing interests
    No competing interests declared
  6. Nikolaas N Oosterhof

    Centre for Mind/Brain Sciences, University of Trento, Trento, Italy
    Contribution
    Data curation, Software, Methodology
    Competing interests
    No competing interests declared
  7. Olivier Collignon

    Institute of research in Psychology (IPSY) & Institute of Neuroscience (IoNS) - University of Louvain (UCLouvain), Louvain-la-Neuve, Belgium
    Contribution
    Conceptualization, Resources, Data curation, Software, Supervision, Funding acquisition, Validation, Visualization, Methodology, Project administration
    For correspondence
    olivier.collignon@uclouvain.be
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-1882-3550

Funding

European Commission (Starting Grant MADVIS: 337573)

  • Olivier Collignon

Excellence of Science (30991544)

  • Olivier Collignon

Fonds De La Recherche Scientifique - FNRS

  • Olivier Collignon

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We would like to express our gratitude to Marco Barilari, Stefania Benetti, Giorgia Bertonati, Francesca Barbero who have helped with the data acquisition, to Yangwen Xu, Matthew Bennet and Remi Gau for giving comments on a preliminary version of the paper, to Jorge Jovicich for helping to set up the fMRI acquisition parameters and to Pietro Chiesa for continuing support with stimuli-delivery systems. We are also extremely thankful to our blind participants and to the Unioni Ciechi of Trento, Mantova, Genova, Savona, Cuneo, Torino, Trieste and Milano and the blind Institute of Milano for helping with the recruitment. The project was funded by the ERC starting grant MADVIS (Project: 337573) and the Belgian Excellence of Science program (Project: 30991544) awarded to Olivier Collignon. Olivier Collignon is research associate at the Fonds National de la Recherche Scientifique de Belgique (FRS-FNRS).

Ethics

Human subjects: The ethical committee of the University of Trento approved this study (protocol 2014-007) and participants gave their informed consent before participation.

Senior Editor

  1. Barbara G Shinn-Cunningham, Carnegie Mellon University, United States

Reviewing Editor

  1. Tamar R Makin, University College London, United Kingdom

Reviewer

  1. Tamar R Makin, University College London, United Kingdom

Publication history

  1. Received: July 31, 2019
  2. Accepted: February 14, 2020
  3. Accepted Manuscript published: February 28, 2020 (version 1)
  4. Version of Record published: March 31, 2020 (version 2)

Copyright

© 2020, Mattioni et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
