The nature of the animacy organization in human ventral temporal cortex

  1. Sushrut Thorat (corresponding author)
  2. Daria Proklova
  3. Marius V Peelen (corresponding author)
  1. Radboud University, Netherlands
  2. University of Western Ontario, Canada
Research Article
Cite this article as: eLife 2019;8:e47142 doi: 10.7554/eLife.47142

Abstract

The principles underlying the animacy organization of the ventral temporal cortex (VTC) remain hotly debated, with recent evidence pointing to an animacy continuum rather than a dichotomy. What drives this continuum? According to the visual categorization hypothesis, the continuum reflects the degree to which animals contain animal-diagnostic features. By contrast, the agency hypothesis posits that the continuum reflects the degree to which animals are perceived as (social) agents. Here, we tested both hypotheses with a stimulus set in which visual categorizability and agency were dissociated based on representations in convolutional neural networks and behavioral experiments. Using fMRI, we found that visual categorizability and agency explained independent components of the animacy continuum in VTC. Modeled together, they fully explained the animacy continuum. Finally, clusters explained by visual categorizability were localized posterior to clusters explained by agency. These results show that multiple organizing principles, including agency, underlie the animacy continuum in VTC.

https://doi.org/10.7554/eLife.47142.001

Introduction

One of the main goals of visual cognitive neuroscience is to understand the principles that govern the organization of object representations in high-level visual cortex. There is broad consensus that the first principle of organization in ventral temporal cortex (VTC) reflects the distinction between animate and inanimate objects. These categories form distinct representational clusters (Kriegeskorte et al., 2008) and activate anatomically distinct regions of VTC (Grill-Spector and Weiner, 2014; Chao et al., 1999).

According to the visual categorization hypothesis, this animate-inanimate organization supports the efficient readout of superordinate category information, allowing for the rapid visual categorization of objects as being animate or inanimate (Grill-Spector and Weiner, 2014). The ability to rapidly detect animals may have constituted an evolutionary advantage (Caramazza and Shelton, 1998; New et al., 2007).

However, recent work has shown that the animacy organization reflects a continuum rather than a dichotomy, with VTC showing a gradation from objects and insects to birds and mammals (Connolly et al., 2012; Sha et al., 2015). This continuum was interpreted as evidence that VTC reflects the psychological property of animacy, or agency, in line with earlier work showing animate-like VTC responses to simple shapes whose movements imply agency (Castelli et al., 2002; Martin and Weisberg, 2003; Gobbini et al., 2007). According to this agency hypothesis, the animacy organization reflects the degree to which animals share psychological characteristics with humans, such as the ability to perform goal-directed actions and to experience thoughts and feelings.

Importantly, however, the finding of an animacy continuum can also be explained under the visual categorization hypothesis. This is because some animals (such as cats) are easier to visually categorize as animate than others (such as stingrays). This visual categorizability is closely related to visual typicality – an animal's perceptual similarity to other animals (Mohan and Arun, 2012). Indeed, recent work showed that the visual categorizability of animals (as measured by reaction times) correlates with the representational distance of those animals from the decision boundary of an animate-inanimate classifier trained on VTC activity patterns (Carlson et al., 2014). The finding of an animacy continuum is thus fully in line with the visual categorization hypothesis.

The difficulty in distinguishing between the visual categorization and agency hypotheses lies in the fact that animals’ visual categorizability and agency are correlated. For example, four-legged mammals are relatively fast to categorize as animate and are also judged to be psychologically relatively similar to humans. Nevertheless, visual categorizability and agency are distinct properties that can be experimentally dissociated. For example, a dolphin and a trout differ strongly in perceived agency (dolphin > trout) but not necessarily in visual categorizability. In the present fMRI study, we disentangled visual categorizability and agency to assess their ability to explain the animacy continuum in VTC. This was achieved by selecting, out of a larger set, 12 animals for which visual categorizability and agency were orthogonal to each other across the set.

We find that visual categorizability and agency independently contribute to the animacy continuum in VTC as a whole. A model that combines these two factors fully explains the animacy continuum. In further analyses, we localize the independent contributions of visual categorizability and agency to distinct regions in posterior and anterior VTC, respectively. These results provide evidence that multiple organizing principles, including agency, underlie the animacy continuum and that these principles express in different parts of visual cortex.

Results

Disentangling visual categorizability and agency

To create a stimulus set in which visual categorizability and agency are dissociated, we selected 12 animals from a total of 40 animals. Visual categorizability was quantified in two ways, using convolutional neural networks (CNNs) and human behavior, to ensure a comprehensive measure of visual categorizability. Agency was measured using a rating experiment in which participants indicated the degree to which an animal can think and feel. Familiarity with the objects was also assessed and controlled for in the final stimulus set used in the fMRI experiment.

Agency and familiarity

Agency and familiarity measures were obtained through ratings (N = 16), in which participants indicated the thoughtfulness of, feelings of, and familiarity with the 40 animals (Figure 1A). The correlation between the thoughtfulness and feelings ratings (τ = 0.70) was at the noise ceiling of both those ratings (τthought = 0.69, τfeel = 0.70). We therefore averaged the thoughtfulness and feelings ratings and considered the averaged rating a measure of agency.
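The comparison between the two rating scales can be illustrated with a small sketch. The block below computes Kendall's τ between two sets of mean ratings and averages them into one agency score per animal. The tie-free τ-a variant and the rating values are assumptions made for illustration only; the text above does not state which τ variant was used, and the noise-ceiling computation is not shown.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    score = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            score += 1      # pair ordered the same way on both scales
        elif product < 0:
            score -= 1      # pair ordered in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return score / n_pairs


# Hypothetical mean ratings for five animals (not data from the study).
thoughtfulness = [4.1, 2.0, 3.2, 1.1, 2.7]
feelings = [4.4, 2.3, 2.9, 1.5, 3.1]

tau = kendall_tau_a(thoughtfulness, feelings)
# Agency is taken as the average of the two ratings for each animal.
agency = [(t + f) / 2 for t, f in zip(thoughtfulness, feelings)]
```

With these toy values, nine of the ten animal pairs are ordered the same way on both scales, so the two ratings agree closely and averaging them loses little information.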

Figure 1 (with 1 supplement)
Obtaining the models to describe animacy in the ventral temporal cortex.

(A) Trials from the ratings experiment are shown. Participants were asked to rate 40 animals on three factors: familiarity, thoughtfulness, and feelings. The correlations between the thoughtfulness and feelings ratings are at the noise ceilings of both these ratings. Therefore, the average of these ratings was taken as a measure of agency. (B) A schematic of the convolutional neural network (CNN) VGG-16 is shown. The CNN contains 13 convolutional layers (shown in green), which are constrained to perform spatially local computations across the visual field, and three fully-connected layers (shown in blue). The network is trained to take RGB image pixels as inputs and output the label of the object in the image. Linear classifiers are trained on layer FC8 of the CNN to classify between the activation patterns in response to animate and inanimate images. The distance from the decision boundary, toward the animate direction, is the image categorizability of an object. (C) A trial from the visual search task is shown. Participants had to quickly indicate the location (in the left or right panel) of the oddball target among 15 identical distractors which varied in size. The inverses of the pairwise reaction times are arranged as shown. Either the distractors or the targets are assigned as features of a representational space on which a linear classifier is trained to distinguish between animate and inanimate exemplars (Materials and methods). These classifiers are then used to categorize the set of images relevant to subsequent analyses; the distance from the decision boundary, toward the animate direction, is a measure of the perceptual categorizability of an object.

https://doi.org/10.7554/eLife.47142.002

Visual categorizability

The first measure of visual categorizability was based on the features extracted from the final layer (FC8) of a pre-trained CNN (VGG-16 [Simonyan and Zisserman, 2015]; Materials and methods). This layer contains rich feature sets that can be used to accurately categorize objects as animate or inanimate by a support vector machine (SVM) classifier. This same classifier was then deployed on the 40 candidate objects (4 exemplars each) of our experiment to quantify their categorizability. This resulted, for each object, in a representational distance from the decision boundary of the animate-inanimate classifier (Figure 1B). Because this measure was based on a feedforward transformation of the images, which was not informed by inferred agentic properties of the objects (such as thoughtfulness), we label this measure image categorizability.

The second measure of visual categorizability was based on reaction times in an oddball detection task previously shown to predict visual categorization times (Mohan and Arun, 2012; Figure 1C). The appeal of this task is that it provides reliable estimates of visual categorizability using simple and unambiguous instructions (unlike a direct categorization task, which relies on the participants’ concept of animacy, again potentially confounding agency and visual categorizability). Participants were instructed to detect whether an oddball image appears to the left or the right of fixation. Reaction times in this task are an index of visual similarity, with relatively slow responses to oddball objects that are visually relatively similar to the surrounding objects (e.g. a dog surrounded by cats). A full matrix of pairwise visual similarities was created by pairing all images with each other. For a given object, these similarity values constitute a perceptual representation with respect to the other objects. These visual similarity values were then used as features in an SVM trained to classify animate vs inanimate objects. Testing this classifier on the images of the fMRI experiment resulted, for each object, in a representational distance from the decision boundary of the animate-inanimate classifier (Figure 1C). Because this measure was based on human perception, we labeled this measure perceptual categorizability. The neural representations the reaction times in this task rely on are not fully known, and might reflect information about inferred agency of the objects. As such, accounting for the contribution of perceptual categorizability in subsequent analyses provides a conservative estimate of the independent contribution of agency to neural representations in VTC.
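How pairwise search times become classifier features can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the use of unordered pairs (ignoring which image was target and which was distractor), the zero on the diagonal, and the reaction times are all assumptions.

```python
def dissimilarity_features(objects, reaction_times):
    """Return, per object, a vector of 1/RT values against every object.

    reaction_times maps an unordered pair (a frozenset of two names) to the
    mean search time in seconds. Slow search means the two objects look
    alike, so 1/RT is small for similar pairs and large for dissimilar ones.
    """
    features = {}
    for target in objects:
        row = []
        for other in objects:
            if other == target:
                row.append(0.0)  # assumption: an object is not dissimilar from itself
            else:
                row.append(1.0 / reaction_times[frozenset((target, other))])
        features[target] = row
    return features


# Hypothetical mean reaction times in seconds (not data from the study).
rts = {
    frozenset(("dog", "cat")): 2.0,    # slow search: visually similar
    frozenset(("dog", "chair")): 0.5,  # fast search: visually dissimilar
    frozenset(("cat", "chair")): 0.8,
}
features = dissimilarity_features(["dog", "cat", "chair"], rts)
```

Each row of `features` is one object's position in the perceptual space, and rows of this kind are what a linear animate-inanimate classifier would be trained on.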

The two measures of visual categorizability were positively correlated for the 12 animals that were selected for the fMRI experiment (Kendall’s τ = 0.64), indicating that they partly reflect similar animate-selective visual properties of the objects. The correspondence between these two independently obtained measures of visual categorizability provides a validation of these measures and also shows that the image categorizability obtained from the CNN is meaningfully related to human perception.

Selection of image set

The final set of 12 animals for the fMRI experiment was chosen from the set of 40 animals such that the correlations between image categorizability, agency, and familiarity were minimized (Figure 2). This was successful, as indicated by low correlations between these variables (τ < 0.13, for all correlations). Because perceptual categorizability was not part of the selection procedure of the stimulus set, there was a moderate residual correlation (τ = 0.30) between perceptual categorizability and agency.
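The text does not state how the subset was searched for. One plausible procedure, shown here purely as an assumption, is a random search that keeps the subset whose largest absolute pairwise τ is smallest. The function names, the number of draws, and the toy scores are hypothetical.

```python
import random
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a; tied pairs count as neither concordant nor discordant."""
    score = 0
    for i, j in combinations(range(len(x)), 2):
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    return score / (len(x) * (len(x) - 1) // 2)


def max_abs_tau(subset, measures):
    """Largest |tau| between any two measures, restricted to the subset."""
    worst = 0.0
    for a, b in combinations(measures, 2):
        xa = [a[i] for i in subset]
        xb = [b[i] for i in subset]
        worst = max(worst, abs(kendall_tau_a(xa, xb)))
    return worst


def select_subset(measures, k, n_draws=2000, seed=0):
    """Random search for k items that minimize the largest pairwise |tau|."""
    rng = random.Random(seed)
    n_items = len(measures[0])
    best = list(range(k))
    best_score = max_abs_tau(best, measures)
    for _ in range(n_draws):
        candidate = sorted(rng.sample(range(n_items), k))
        score = max_abs_tau(candidate, measures)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score


# Hypothetical scores for 10 candidate animals (not data from the study).
agency = [5, 3, 8, 1, 9, 2, 7, 4, 6, 10]
image_cat = [4, 2, 9, 3, 10, 1, 6, 5, 7, 8]
familiarity = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
subset, score = select_subset([agency, image_cat, familiarity], k=4)
```

For the real problem (12 of 40 animals) exhaustive enumeration is infeasible, which is why a stochastic or greedy search of this kind is a natural choice.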

Figure 2
Disentangling image categorizability and agency.

The values of agency and image categorizability are plotted for the 40 animals used in the ratings experiment. We selected 12 animals such that the correlation between agency and image categorizability is minimized. Data-points corresponding to those 12 animals are highlighted in red.

https://doi.org/10.7554/eLife.47142.004

Animacy in the ventral temporal cortex

Participants (N = 17) in the main fMRI experiment viewed the 4 exemplars of the 12 selected animals while engaged in a one-back object-level repetition-detection task (Figure 3). The experiment additionally included 3 inanimate objects (cars, chairs, plants) and humans. In a separate block-design animacy localizer experiment, participants viewed 72 object images (36 animate, 36 inanimate) while detecting image repetitions (Figure 3).

Figure 3
The fMRI paradigm.

In the main fMRI experiment, participants viewed images of the 12 selected animals and four additional objects (cars, trees, chairs, persons). Participants indicated, via button-press, one-back object repetitions (here, two parrots). In the animacy localizer experiment, participants viewed blocks of animal (top sequence) and non-animal (bottom sequence) images. All images were different from the ones used in the main experiment. Each block lasted 16s, and participants indicated one-back image repetitions (here, the fish image).

https://doi.org/10.7554/eLife.47142.006

In a first analysis, we aimed to replicate the animacy continuum for the objects in the main experiment. The VTC region of interest was defined anatomically, following earlier work (Haxby et al., 2011; Figure 4A; Materials and methods). An SVM classifier was trained on activity patterns in this region to distinguish between blocks of animate and inanimate objects in the animacy localizer, and tested on the 16 individual objects in the main experiment. The distances from the decision boundary, towards the animate direction, were taken as the animacy scores.

Figure 4 (with 3 supplements)
Assessing the nature of the animacy continuum in the ventral temporal cortex (VTC).

(A) The region-of-interest, VTC, is highlighted. (B) The order of objects on the VTC animacy continuum, image categorizability (IC), perceptual categorizability (PC), and agency (Ag) are shown. (C) The within-participant correlations between VTC animacy and image categorizability (IC), perceptual categorizability (PC), visual categorizability (VC, a combination of image categorizability and perceptual categorizability; Materials and methods), and agency (Ag) are shown. All four models correlated positively with VTC animacy. (D) The left panel shows the correlations between VTC animacy and VC and Ag after regressing out the other measure from VTC animacy. Both correlations are positive, providing evidence for independent contributions of both agency and visual categorizability. The right panel shows the correlation between VTC animacy and a combination of agency and visual categorizability (Materials and methods). The combined model does not differ significantly from the VTC animacy noise ceiling (Materials and methods). This suggests that visual categorizability and agency are sufficient to explain the animacy organization in VTC. Error bars indicate 95% confidence intervals for the mean correlations.

https://doi.org/10.7554/eLife.47142.007

The mean cross-validated training accuracy (animacy localizer) of animate-inanimate classification in VTC was 89.6%, while the cross-experiment accuracy in classifying the 16 stimuli from the main fMRI experiment was 71.3%, indicating reliable animacy information in both experiments. Importantly, there was systematic and meaningful variation in the animacy scores for the objects in the main experiment (Figure 4B). Among the animals, humans were the most animate whereas reptiles and insects were the least animate (they were classified as inanimate, on average). These results replicate previous findings of an animacy continuum (Connolly et al., 2012; Sha et al., 2015).

Now that we have established the animacy continuum for the selected stimulus set, we can turn to our main question of interest: what are the contributions of visual categorizability and agency to the animacy continuum in VTC? To address this question, we first correlated the visual categorizability scores and the agency ratings with the VTC animacy scores (Figure 4C). VTC animacy scores correlated positively with all three measures: image categorizability (τ = 0.16; p = 10⁻³), perceptual categorizability (τ = 0.26; p < 10⁻⁴), and agency (τ = 0.30; p < 10⁻⁴). A combined model of image categorizability and perceptual categorizability (visual categorizability; Materials and methods) also positively correlated with VTC animacy (τ = 0.23; p < 10⁻⁴).

Because agency and visual categorizability scores were weakly correlated, it is possible that the contribution of one of these factors was partly driven by the other. To test for their independent contributions, we regressed out the contribution of the other factor(s) from VTC animacy scores before computing the correlations. The correlation between VTC animacy and agency remained positive in all individual participants (τ = 0.23; p < 10⁻⁴; Figure 4D) after regressing out both image categorizability and perceptual categorizability. Similarly, the correlation between VTC animacy and visual categorizability remained positive after regressing out agency (τ = 0.12; p = 4.7 × 10⁻³).
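The regress-out step can be sketched as an ordinary least-squares fit followed by subtraction of the fitted values. This is a minimal illustration assuming a linear model with an intercept; the helper names and the toy scores are hypothetical, and the authors' exact regression settings are not given in this section.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def residualize(y, regressors):
    """Return y minus its least-squares fit on an intercept plus regressors."""
    columns = [[1.0] * len(y)] + [list(r) for r in regressors]
    # Normal equations: (X^T X) beta = X^T y
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    moment = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    beta = solve_linear(gram, moment)
    fitted = [sum(b * col[i] for b, col in zip(beta, columns)) for i in range(len(y))]
    return [yi - fi for yi, fi in zip(y, fitted)]


# Hypothetical scores for six objects (not data from the study).
image_cat = [0.2, 1.5, 0.7, 2.1, 1.1, 0.4]
perceptual_cat = [0.9, 1.2, 0.1, 1.8, 1.6, 0.3]
vtc_animacy = [0.5, 1.9, 0.2, 2.4, 2.2, 0.1]

# VTC animacy with both categorizability measures regressed out.
residual = residualize(vtc_animacy, [image_cat, perceptual_cat])
```

The residual is, by construction, uncorrelated with the regressed-out measures, so any remaining rank correlation with agency cannot be attributed to them.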

Finally, to test whether a combined model including visual categorizability and agency fully explained the animacy continuum, we performed leave-one-out regression on VTC animacy with all three factors as independent measures. The resultant combined model (derived separately for each left-out participant) had a higher correlation with VTC animacy than any of the three factors alone (within-participant comparisons: ΔIC = 0.21, p < 10⁻⁴; ΔPC = 0.10, p = 6 × 10⁻⁴; ΔAg = 0.07, p = 8.3 × 10⁻³). Furthermore, the correlation between the combined model and VTC animacy (τ = 0.37; Figure 4D) is at the VTC animacy noise ceiling (τNC = 0.39; Materials and methods). This result suggests that a linear combination of the two models (visual categorizability and agency) fully explains the animacy continuum in VTC, but the single models alone do not.

Whole-brain searchlight analysis

Our results indicate that both visual categorizability and agency contribute to the animacy continuum in VTC as a whole. Can these contributions be anatomically dissociated as well? To test this, we repeated the analyses in a whole-brain searchlight analysis (spheres of 100 proximal voxels). To reduce the number of comparisons, we constrained the analysis to clusters showing significant above-chance animacy classification (Materials and methods). Our aim was to reveal spheres showing independent contributions of visual categorizability or agency. To obtain the independent contribution of agency, we regressed out both image categorizability and perceptual categorizability from each sphere's animacy continuum and tested if the residual reflected agency. Similarly, to obtain the independent contribution of visual categorizability, we regressed out agency from the sphere's animacy continuum and tested if the residual reflected either image categorizability or perceptual categorizability. The resulting brain maps were corrected for multiple comparisons (Materials and methods).
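A searchlight "sphere" of 100 proximal voxels can be sketched as the 100 voxels nearest to a center voxel. The block below is an illustration under assumptions: Euclidean distance in voxel coordinates, a hypothetical function name, and a toy grid standing in for a brain mask.

```python
import heapq
import math


def searchlight_sphere(center, voxels, size=100):
    """Return the `size` voxels closest to `center`, nearest first."""
    return heapq.nsmallest(size, voxels, key=lambda v: math.dist(center, v))


# Hypothetical 6 x 6 x 6 grid of voxel coordinates standing in for a brain mask.
grid = [(x, y, z) for x in range(6) for y in range(6) for z in range(6)]
sphere = searchlight_sphere((3, 3, 3), grid)
```

In a full analysis, the per-sphere steps (classification, regressing out the other factor, correlating the residual) would be repeated with every voxel in the mask serving as a center in turn.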

Results (Figure 5) showed that both visual categorizability and agency explained unique variance in clusters of VTC, consistent with the region-of-interest analysis. Moreover, there was a consistent anatomical mapping of the two factors: the independent visual categorizability contribution (LH: 1584 mm³, center Montreal Neurological Institute (MNI) coordinates: x = -38, y = -80, z = 7; RH: 7184 mm³, center coordinates: x = 41, y = -71, z = 1) was located posterior to the independent agency contribution (LH: 592 mm³, center coordinates: x = -42, y = -56, z = -19; RH: 4000 mm³, center coordinates: x = 39, y = -52, z = -12), extending from VTC into the lateral occipital regions. The majority of the independent agency contribution was located in the anterior part of VTC. A similar posterior-anterior organization was observed in both hemispheres (Figure 5B), though stronger in the right hemisphere. These results provide converging evidence for independent contributions of visual categorizability and agency to the animacy continuum, and show that these factors explain the animacy continuum at different locations in the visual system.

Figure 5 (with 1 supplement)
Searchlight analysis testing for the independent contributions of agency and visual categorizability to the animacy continuum.

(A) The analysis followed the approach performed within the VTC ROI (Figure 4C, middle panel) but now separately for individual spheres (100 voxels). The independent contribution of agency is observed within anterior VTC, while the independent contribution of visual categorizability extends from posterior VTC into the lateral occipital regions. Results are corrected for multiple comparisons (Materials and methods). (B) The correlations between agency and the animacy continuum in the searchlight spheres (variance independent of visual categorizability, in red) and the mean of the correlations between image and perceptual categorizabilities and the animacy continuum in the searchlight spheres (variance independent of agency, in blue), are shown as a function of the MNI y-coordinate. For each participant, the correlations are averaged across x and z dimensions for all the searchlight spheres that survived multiple comparison correction in the searchlight analysis depicted in (A). The blue and red bounds around the means reflect the 95% confidence bounds of the average correlations across participants. The green area denotes the anatomical bounds of VTC. Visual categorizability contributes more than agency to the animacy organization in the spheres in posterior VTC. This difference in contribution switches within VTC and agency contributes maximally to the animacy organization in more anterior regions of VTC.

https://doi.org/10.7554/eLife.47142.012

Discussion

The present study investigated the organizing principles underlying the animacy organization in human ventral temporal cortex. Our starting point was the observation that the animacy organization expresses as a continuum rather than a dichotomy (Connolly et al., 2012; Sha et al., 2015; Carlson et al., 2014), such that some animals evoke more animate-like response patterns than others. Our results replicate this continuum, with the most animate response patterns evoked by humans and mammals and the weakest animate response patterns evoked by insects and snakes (Figure 4B). Unlike previous studies, our stimulus set was designed to distinguish between two possible organizing principles underlying the animacy continuum, reflecting the degree to which an animal is visually animate (visual categorizability) and the degree to which an animal is perceived to have thoughts and feelings (agency). We found that both dimensions independently explained part of the animacy continuum in VTC; together, they fully explained the animacy continuum. Whole-brain searchlight analysis revealed distinct clusters in which visual categorizability and agency explained the animacy continuum, with the agency-based organization located anterior to the visual categorizability-based organization. Below we discuss the implications of these results for our understanding of the animacy organization in VTC.

The independent contribution of visual categorizability shows that the animacy continuum in VTC is at least partly explained by the degree to which the visual features of an animal are typical of the animate category. This was observed both for the image features themselves (as represented in a CNN) and for the perceptual representations of these images in a behavioral task. These findings are in line with previous studies showing an influence of visual features on the categorical organization in high-level visual cortex (Baldassi et al., 2013; Coggan et al., 2016; Nasr et al., 2014; Rice et al., 2014; Jozwik et al., 2016). Furthermore, recent work has shown that mid-level perceptual features allow for distinguishing between animate and inanimate objects (Levin et al., 2001; Long et al., 2017; Schmidt et al., 2017; Zachariou et al., 2018) and that these features can elicit a VTC animacy organization in the absence of object recognition (Long et al., 2018). Our results show that (part of) the animacy continuum is similarly explained by visual features: animals that were more easily classified as animate by a CNN (based on visual features) were also more easily classified as animate in VTC. This correlation persisted when regressing out the perceived agency of the animals. Altogether, these findings support accounts that link the animacy organization in VTC to visual categorization demands (Grill-Spector and Weiner, 2014).

In parallel to investigations into the role of visual features in driving the categorical organization of VTC, other studies have shown that visual features do not fully explain this organization (for reviews, see Peelen and Downing, 2017; Bracci et al., 2017). For example, animate-selective responses in VTC are also observed for shape- and texture-matched objects (Proklova et al., 2016; Bracci and Op de Beeck, 2016) and animate-like VTC responses can be evoked by geometric shapes that, through their movement, imply the presence of social agents (Castelli et al., 2002; Martin and Weisberg, 2003; Gobbini et al., 2007). The current results contribute to these findings by showing that (part of) the animacy continuum reflects the perceived agency of the animals: animals that were perceived as being relatively more capable of having thoughts and feelings were more easily classified as animate in VTC. This correlation persisted when regressing out the influence of animal-diagnostic visual features. These findings provide evidence that the animacy continuum is not fully explained by visual categorization demands, with perceived agency contributing significantly to the animacy organization. The finding of an agency contribution to the animacy continuum raises several interesting questions.

First, what do we mean by agency and how does it relate to other properties? In the current study, agency was measured as the perceived ability of an animal to think and feel. Ratings on these two scales were highly correlated with each other, and also likely correlate highly with related properties such as the ability to perform complex goal-directed actions, the degree of autonomy, and levels of consciousness (Appendix). On all of these dimensions, humans will score highest and animals that score highly will be perceived as relatively more similar to humans. As such, the agency contribution revealed in the current study may reflect a human-centric organization (Contini et al., 2019). Future studies could aim to disentangle these various properties.

Second, why would agency be an organizing principle? One reason why agency could be an important organizing principle is that the level of agency determines how we interact with an animal: we can meaningfully interact with cats but not with slugs. Predicting the behavior of highly agentic animals requires inferring internal states underlying complex goal-directed behavior (Sha et al., 2015). Again, these processes will be most important when interacting with humans but will also, to varying degrees, be recruited when interacting with animals. The agency organization may reflect the specialized perceptual analysis of facial and bodily signals that allow for inferring internal states, or the perceptual predictions that follow from this analysis.

Finally, how can such a seemingly high-level psychological property as agency affect responses in visual cortex? One possibility is that anterior parts of visual cortex are not exclusively visual and represent agency more abstractly, independent of input modality (Fairhall et al., 2017; van den Hurk et al., 2017). Alternatively, agency could modulate responses in visual cortex through feedback from downstream regions involved in social cognition. Regions in visual cortex responsive to social stimuli are functionally connected to the precuneus and medial prefrontal cortex – regions involved in social perspective taking and reasoning about mental states (Simmons and Martin, 2012). Indeed, it is increasingly appreciated that category-selective responses in visual cortex are not solely driven by bottom-up visual input but are integral parts of large-scale domain-selective networks involved in domain-specific tasks like social cognition, tool use, navigation, or reading (Peelen and Downing, 2017; Price and Devlin, 2011; Martin, 2007; Mahon and Caramazza, 2011). Considering the close connections between regions within each of these networks, stimuli that strongly engage the broader system will also evoke responses in the visual cortex node of the network, even in the absence of visual input (Peelen and Downing, 2017; Amedi et al., 2017).

An alternative possibility is that agency is computed within the visual system based on visual features. This scenario is consistent with our results as long as these features are different from the features leading to animate-inanimate categorization. Similar to the visual categorizability of animacy, visual categorizability of agency could arise if there was strong pressure to quickly determine the agency of an animal based on visual input. In a supplementary analysis, we found that a model based on the features represented in the final fully connected layer of a CNN allows for predicting agency ratings (Appendix). As such, it remains possible that agency is itself visually determined.

The unique contributions of agency and visual categorizability were observed in different parts of VTC, with the agency cluster located anterior to the visual categorizability cluster. This posterior-anterior organization mirrors the well-known hierarchical organization of visual cortex. A similar posterior-anterior difference was observed in studies dissociating shape and category representations in VTC, with object shape represented posterior to object category (Proklova et al., 2016; Bracci and Op de Beeck, 2016). The finding that visual categorizability and agency expressed in different parts of VTC is consistent with multiple scenarios. One possibility is that VTC contains two distinct representations of animals, one representing category-diagnostic visual features and one representing perceived agency. Alternatively, VTC may contain a gradient from a more visual to a more conceptual representation of animacy, with the visual representation gradually being transformed into a conceptual representation. More work is needed to distinguish between these possibilities.

In sum, our results provide evidence that two principles independently contribute to the animacy organization in human VTC: visual categorizability, reflecting the presence of animal-diagnostic visual features, and agency, reflecting the degree to which animals are perceived as thinking and feeling social agents. These two principles expressed in different parts of visual cortex, following a posterior-to-anterior distinction. The finding of an agency organization that is not explained by differences in visual categorizability raises many new questions that can be addressed in future research.

Materials and methods

Neural network for image categorizability

Image categorizability was quantified using a convolutional neural network (CNN), VGG-16 (Simonyan and Zisserman, 2015), which had 13 convolutional layers and 3 fully-connected layers that map 224 × 224 × 3 RGB images to a 1000-dimensional object category space (each neuron corresponds to a distinct label such as cat or car). The CNN was taken from the MatConvNet package (Vedaldi and Lenc, 2015) and was pre-trained on images from the ImageNet ILSVRC classification challenge (Russakovsky et al., 2015).

Activations were extracted from the final fully-connected layer, prior to the softmax operation, for 960 colored images obtained from Kiani et al. (2007), of which 480 contain an animal or animal parts and the rest contain inanimate objects or their parts. The dimensionality of the activations was reduced to 495 dimensions using principal component analysis in order to reduce the training time while keeping the captured variance high (more details can be found in Thorat, 2017). Support vector machines (SVMs) with linear kernels were trained (with the default parameters of the function fitcsvm in MATLAB r2017b, The MathWorks, Natick, MA) to distinguish between animate and inanimate object-driven activations. Classification accuracy was estimated using 6-fold cross-validation; the average cross-validation accuracy was 92.2%. The image categorizability of an object was defined as the distance of that object's representation from the decision boundary of the trained classifier. The stimuli used in the subsequent tasks, behavioral and fMRI, did not occur in this training set of 960 images.
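The decision-score definition above can be illustrated with a minimal sketch. It assumes a linear classifier has already been trained, so that its boundary is described by a weight vector `w` and a bias `b`; these values, the two feature vectors, and the function name are hypothetical and are not taken from the study. Whether the original analysis normalized scores by the norm of `w` is not stated, so the geometric distance used here is an assumption.

```python
import math

def signed_distance(x, w, b):
    """Signed Euclidean distance of feature vector x from the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (dot + b) / norm

# Hypothetical weights and bias of an already-trained linear classifier
w = [3.0, 4.0]
b = -5.0

# Hypothetical (dimensionality-reduced) activations for two images
image_a = [3.0, 4.0]   # lies on the positive side, here taken to mean "animate"
image_b = [1.0, 0.5]   # lies on the decision boundary itself

score_a = signed_distance(image_a, w, b)
score_b = signed_distance(image_b, w, b)
```

Under this convention, a larger positive score marks an image as more categorizable as animate, and a score near zero marks an image close to the boundary.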

Visual search task for perceptual categorizability

Perceptual categorizability was quantified using a visual search task adopted from Mohan and Arun (2012). Twenty-nine healthy adults participated in this experiment, of whom 21 (9 females; age range: 19–62, median: 27) were selected according to the conditions specified below to obtain an estimate of perceptual categorizability. All participants gave informed consent. All procedures were carried out in accordance with the Declaration of Helsinki and were approved by the ethics committee of Radboud University (ECSW2017-2306-517).

For details of the experimental procedure, we refer the reader to Proklova et al. (2016). Briefly, on each trial, participants were presented with 16 images and had to indicate, as fast and as accurately as possible, in which panel (left or right) the oddball image occurred (see Figure 1C). The 15 distractor images were identical except that they varied in size. Participants were not instructed to look for a particular category and only had to indicate the position of the different-looking object. All participants had accuracies of 90% or higher in every block. The trials on which they made an error were repeated at the end of the respective blocks. Psychtoolbox (Brainard, 1997) controlled the stimulus presentation. Gray-scaled images of the animate objects (four exemplars each of 12 animals and humans) used in the fMRI experiment and 72 images (36 animate) from the functional localizer experiment of Proklova et al. (2016) were used in this experiment. Images were gray-scaled to make the participants capitalize on differences in object shapes and textures rather than color.

In order to obtain perceptual categorizability scores for the animate objects used in the fMRI experiment, we trained animate-inanimate classifiers on representations capturing perceptual similarity between objects. For each participant, the images of animate objects from the fMRI experiment were the test set, and 28 (14 animate) images randomly chosen from the 72 images were the training set. In order to obtain representations that encoded perceptual similarity between objects, each of the images from the training and test sets was used as either a target or a distractor (randomly chosen for each participant), while the images from the training set were used as distractors or targets (corresponding to the previous choice made for each participant). The inverse of the reaction time was used as a measure of perceptual similarity (Mohan and Arun, 2012). For each of the images in the training and test sets, 82 values (1/RT) were obtained, which were associated with a perceptual similarity-driven representational space and were used as features for the animate-inanimate classifier. A linear SVM was trained to classify the training images as animate or inanimate. The distances of the representations of the test images from the classification boundary were then calculated and were termed decision scores. This resulted, for each participant, in decision scores for images of animals and humans used in the main fMRI experiment.

For further analysis, only those participants who had both training (4-fold cross-validation) and test accuracies for animacy classification above 50% were selected. For the relevant 21 participants, the mean training accuracy was 63.4% (>50%, p < 10⁻⁴), and the mean test accuracy was 70.0% (>50%, p < 10⁻⁴). Each object’s perceptual categorizability was quantified as the average of its decision scores across participants.
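The selection and averaging step can be sketched as follows. This is an illustration under stated assumptions rather than the original MATLAB analysis: the dictionary layout, the function name, and all numbers are hypothetical.

```python
def perceptual_categorizability(participants, chance=0.5):
    """Average decision scores per object over participants whose training
    and test accuracies both exceed chance."""
    kept = [p for p in participants
            if p["train_acc"] > chance and p["test_acc"] > chance]
    if not kept:
        raise ValueError("no participant passed the accuracy criterion")
    objects = kept[0]["scores"].keys()
    averaged = {obj: sum(p["scores"][obj] for p in kept) / len(kept)
                for obj in objects}
    return averaged, len(kept)

# Hypothetical data: three participants, two objects
participants = [
    {"train_acc": 0.64, "test_acc": 0.70, "scores": {"cow": 1.0, "snake": -0.2}},
    {"train_acc": 0.48, "test_acc": 0.75, "scores": {"cow": 5.0, "snake": 5.0}},  # excluded
    {"train_acc": 0.60, "test_acc": 0.66, "scores": {"cow": 2.0, "snake": 0.4}},
]
scores, n_kept = perceptual_categorizability(participants)
```

In this toy example the second participant is dropped because the training accuracy does not exceed 50%, so the averages are taken over the remaining two.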

Main ratings experiment for agency

Agency was quantified using a ratings experiment. Sixteen healthy adults participated in this experiment (9 females; all students at the University of Trento). All participants gave informed consent. All procedures were carried out in accordance with the Declaration of Helsinki and were approved by the ethics committee of the University of Trento (protocol 2013–015).

In each trial, four colored images of an animal from a set of 40 animals were shown, and participants were asked to indicate, on a scale of 0 to 100, how much thoughtfulness or feelings they attributed to the animal, or how familiar they were with the animal. These three factors constituted three blocks of the experiment (the order was randomized across participants). At the beginning of each block, a description of the relevant factor was provided. Participants were encouraged to use the descriptions as guidelines for the three factors. In quantifying their familiarity with an animal, participants had to account for the knowledge about and the amount of interaction they have had with the animal. In quantifying the thoughtfulness an animal might have, participants had to account for the animal’s ability in planning, having intentions, and abstraction. In quantifying the feelings an animal might have, participants had to account for the animal’s ability to empathise, have sensations, and react to situations.

As mentioned in the Results section, the feelings and thoughtfulness ratings co-varied substantially with each other (the within-participant correlations were at the noise ceilings of the two factors). Agency was quantified as the average of the ratings for feelings and thoughtfulness.

fMRI experiment

A functional magnetic resonance imaging (fMRI) experiment was performed to obtain the animacy continua in high-level visual cortex, specifically the ventral temporal cortex (VTC). The design was adopted from Proklova et al. (2016). Schematics of the main experiment and the animacy localizer experiment are shown in Figure 3.

Participants

Seventeen healthy adults (6 females; age range: 20–32, median: 25) were scanned at the Center for Mind/Brain Sciences of the University of Trento. This sample size was chosen to be the same as in Proklova et al. (2016), as our animacy localizer and main experiment procedure were similar to theirs. All participants gave informed consent. All procedures were carried out in accordance with the Declaration of Helsinki and were approved by the ethics committee of the University of Trento (protocol 2013–015).

Main experiment procedure

The stimuli consisted of colored images (4 exemplars each) of the 12 animals for which image categorizability and agency were orthogonalized, humans, and three inanimate objects (cars, plants, and chairs). There were a total of 64 images.

The main experiment consisted of eight runs. Each run consisted of 80 trials that were composed of 64 object trials and 16 fixation-only trials. In object trials, a single stimulus was presented for 300 ms, followed by a 3700 ms fixation period. In each run, each of the 64 images appeared exactly once. In fixation-only trials, the fixation cross was shown for 4000 ms. Trial order was randomized, with the constraints that there were exactly eight one-back repetitions of the same category (e.g., two cows in direct succession) within the object trials and that there were no two fixation trials appearing in direct succession. Each run started and ended with a 16 s fixation period, leading to a total run duration of 5.9 min. Participants were instructed to press a button whenever they detected a one-back object repetition.
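The stated run duration follows directly from the trial counts and timings given above, as the short calculation below shows. The variable names are illustrative only.

```python
# Trial counts and timings as stated in the text (milliseconds)
stimulus_ms, fixation_ms = 300, 3700
trial_ms = stimulus_ms + fixation_ms      # same length as a fixation-only trial
n_trials = 64 + 16                        # object trials + fixation-only trials
padding_ms = 2 * 16_000                   # fixation at the start and end of the run

run_ms = n_trials * trial_ms + padding_ms
run_min = run_ms / 60_000
```

This gives 352 s per run, which rounds to the reported 5.9 min.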

Animacy localizer experiment procedure

In addition to the main experiment, participants completed one run of a functional localizer experiment. During the localizer, participants viewed grey-scale images of 36 animate and 36 inanimate stimuli in a block design. Each block lasted 16 s, containing 20 stimuli that were each presented for 400 ms, followed by a 400 ms blank interval. There were eight blocks of each stimulus category and four fixation-only blocks per run. The order of the first 10 blocks was randomized and then mirror-reversed for the other 10 blocks. Participants were asked to detect one-back image repetitions, which happened twice during every non-fixation block.
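A minimal sketch of the block timing and the mirror-reversed block order is given below. It assumes that each half of the run contains four animate, four inanimate, and two fixation-only blocks, which is implied by the mirror-reversal but not stated explicitly; the function name and the random seed are hypothetical.

```python
import random

def localizer_block_order(rng):
    """Randomize the first 10 blocks, then mirror-reverse them for the last 10."""
    # Assumption: each half holds 4 animate, 4 inanimate and 2 fixation-only blocks
    first_half = ["animate"] * 4 + ["inanimate"] * 4 + ["fixation"] * 2
    rng.shuffle(first_half)
    return first_half + first_half[::-1]

block_order = localizer_block_order(random.Random(0))

# Block duration from the stated stimulus timing (milliseconds)
stimuli_per_block = 20
stimulus_ms, blank_ms = 400, 400
block_ms = stimuli_per_block * (stimulus_ms + blank_ms)
```

The resulting sequence has 20 blocks of 16 s each, with eight blocks per stimulus category and four fixation-only blocks, matching the description.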

fMRI acquisition

Imaging data were acquired using a MedSpec 4-T head scanner (Bruker Biospin GmbH, Rheinstetten, Germany), equipped with an eight-channel head coil. For functional imaging, T2*-weighted EPIs were collected (repetition time = 2.0 s, echo-time = 33 ms, 73° flip-angle, 3 mm × 3 mm × 3 mm voxel size, 1 mm gap, 34 slices, 192 mm field of view, 64 × 64 matrix size). A high-resolution T1-weighted image (magnetization prepared rapid gradient echo; 1 mm × 1 mm × 1 mm voxel size) was obtained as an anatomical reference.

fMRI data pre-processing

The fMRI data were analyzed using MATLAB and SPM8. During preprocessing, the functional volumes were realigned, co-registered to the structural image, re-sampled to a 2 mm × 2 mm × 2 mm grid, and spatially normalized to the Montreal Neurological Institute 305 template included in SPM8. No spatial smoothing was applied.

Region of interest - Ventral temporal cortex

VTC was defined as in Haxby et al. (2011). The region extended from −71 to −21 on the y-axis of the Montreal Neurological Institute (MNI) coordinates. The region was drawn to include the inferior temporal, fusiform, and lingual/parahippocampal gyri. The gyri were identified using Automated Anatomical Labelling (AAL) parcellation (Tzourio-Mazoyer et al., 2002).

Obtaining the animacy continua from fMRI data

Animacy continua were extracted from parts of the brain (either a region of interest such as VTC or a searchlight sphere) with a cross-decoding approach. SVM classifiers were trained on the BOLD images obtained from the animate and inanimate blocks of the localizer experiment, and tested on the BOLD images obtained from the main experiment. The degree of animacy of an object is given by the distance of its representation from the classifier decision boundary. As the BOLD response is delayed by seconds after stimulus onset, we had to decide the latency of the BOLD images we wanted to base our analysis on. The classification test accuracy and the animacy continuum noise ceiling for the objects from the main experiment were higher for the BOLD images at 4 s latency than both 6 s latency and the average of the images at 4 and 6 s latencies. Therefore, we based our analysis on the BOLD images at 4 s latency. The findings remain unchanged across the latencies mentioned.

All the images of the brain shown in this article were rendered using MRIcron.

Comparing models with the animacy continua in the brain

We compared the animacy continua in the brain (such as the animacy continuum in VTC and animacy continua in searchlight spheres) with image and perceptual categorizabilities (visual categorizability), agency, their combination, and their independent components. The comparisons were performed at the participant level with rank-order correlations (Kendall's τ). The comparison between an animacy continuum and the independent component of a model was performed by regressing out the other models from the animacy continuum and correlating the residuals with the model of interest, for each participant.
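The regress-out-then-correlate logic can be sketched in a few lines. For brevity this sketch regresses out a single model and uses Kendall's tau-a (ties contribute zero); which tau variant the original analysis used and how ties were handled are not stated, so both choices are assumptions, as are the function names and the toy scores.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def regress_out(y, x):
    """Residuals of y after a least-squares fit on one regressor x plus an intercept."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

# Hypothetical scores for five objects
categorizability = [1.0, 2.0, 3.0, 4.0, 5.0]
agency = [2.0, 1.0, 4.0, 3.0, 5.0]
animacy = [c + a for c, a in zip(categorizability, agency)]  # toy continuum

# Independent contribution of agency: remove categorizability, then correlate
residual = regress_out(animacy, categorizability)
tau_agency_unique = kendall_tau(residual, agency)
```

With two models to remove (image and perceptual categorizability), the single-regressor fit would be replaced by a multiple regression; the correlation step is unchanged.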

Given a participant, the comparison between an animacy continuum and the combination of models was computed as follows. The animacy continuum was modeled as a linear combination of the models (with linear regression) for the rest of the participants. The regression weights associated with each model in the combination across those participants were averaged, and the animacy continuum of the participant of interest was predicted using a linear combination of the models using the averaged weights. The predicted animacy continuum was then correlated with the actual animacy continuum of this participant. This procedure was implemented iteratively for each participant to get a group estimate of the correlation between an animacy continuum and a combination of models.
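The leave-one-participant-out procedure described above can be sketched as follows. The sketch fits ordinary least squares through the normal equations with a small hand-written solver so that it stays self-contained; the function names and the toy data are hypothetical, and the final correlation between predicted and actual continuum (Kendall's τ in the text) is left out for brevity.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_weights(models, y):
    """Least-squares weights (intercept first) for y ~ intercept + models."""
    cols = [[1.0] * len(y)] + models
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)

def predict_left_out(models, continua, left_out):
    """Average the weights fitted on all other participants and apply them to the models."""
    fits = [fit_weights(models, y) for i, y in enumerate(continua) if i != left_out]
    mean_w = [sum(w[k] for w in fits) / len(fits) for k in range(len(fits[0]))]
    return [mean_w[0] + sum(mean_w[k + 1] * models[k][i] for k in range(len(models)))
            for i in range(len(models[0]))]

# Hypothetical model scores for five objects (shared across participants)
models = [[1.0, 2.0, 3.0, 4.0, 5.0],   # categorizability
          [2.0, 1.0, 4.0, 3.0, 5.0]]   # agency
# Hypothetical animacy continua for three participants
continua = [[3.0, 3.0, 7.0, 7.0, 10.0],
            [3.0, 5.0, 7.0, 9.0, 11.0],
            [3.0, 1.0, 7.0, 5.0, 9.0]]

predicted = predict_left_out(models, continua, left_out=0)
```

Because the weights come only from the other participants, the prediction for the left-out participant is not fitted to that participant's own data.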

Comparing visual categorizability with the animacy continua

The contribution of visual categorizability to an animacy continuum is gauged by the comparison between that animacy continuum and a combination of image and perceptual categorizabilities in a leave-one-participant-out analysis as mentioned above. The independent contribution of visual categorizability to an animacy continuum is gauged by regressing out agency from the image and perceptual categorizabilities and combining the residuals to model the animacy continuum in a leave-one-participant-out analysis as mentioned above. When visual categorizability is to be regressed out of an animacy continuum (to obtain the independent contribution of agency), image and perceptual categorizabilities are regressed out. When visual categorizability is to be included in a combination of models, image and perceptual categorizabilities are added as models.

To assess whether a model or a combination of models explained all the variance in the animacy continuum across participants, we computed two correlations for each participant: (i) the correlation between the model (or the animacy continuum predicted by the combined model, in a leave-one-out fashion as above) and the average of the animacy continua of the other participants, and (ii) the correlation between the participant’s own animacy continuum and that same average. We then tested whether (i) was lower than (ii). On the group level, if this one-sided test (see ‘Statistical tests in use’) was not significant (p > 0.05), we concluded that the correlation for the model or combination of models reached the animacy continuum noise ceiling and thus explained all the variance in the animacy continuum across participants. In the comparisons in Figure 4C–D, only the correlation between VTC animacy and the combination of visual categorizability and agency was at the VTC animacy noise ceiling.
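The per-participant quantity entering the noise-ceiling test can be sketched as below. The sketch again uses Kendall's tau-a as an assumed stand-in for the rank correlation, and the function names and toy continua are hypothetical; the group-level one-sided test on the resulting differences is not included.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def noise_ceiling_gaps(continua, predictions):
    """Per participant: tau(prediction, mean of others) - tau(own continuum, mean of others)."""
    gaps = []
    for i, own in enumerate(continua):
        others = [c for j, c in enumerate(continua) if j != i]
        mean_others = [sum(vals) / len(others) for vals in zip(*others)]
        gaps.append(kendall_tau(predictions[i], mean_others)
                    - kendall_tau(own, mean_others))
    return gaps

# Hypothetical continua (three participants, four objects) and model predictions
continua = [[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]]
predictions = [[1, 2, 3, 4]] * 3

gaps = noise_ceiling_gaps(continua, predictions)
```

A model is said to reach the noise ceiling when these differences are not reliably below zero across participants.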

Searchlight details

In the whole-brain searchlight analysis, the searchlight spheres contained 100 proximal voxels. SVM classifiers were trained to distinguish between the BOLD images, within the sphere, corresponding to animate and inanimate stimuli from the localizer experiment. The classifiers were tested on the BOLD images, within the sphere, from the main experiment. Threshold-free cluster enhancement (TFCE; Smith and Nichols, 2009) with a permutation test was used to correct for multiple comparisons of the accuracies relative to baseline (50%). Further analysis was constrained to the clusters which showed above-chance classification (between-subjects, p < 0.05, on both localizer and main experiment accuracies) of animate and inanimate objects. Within each searchlight sphere that survived the multiple comparisons correction, the animacy continuum was compared with image and perceptual categorizabilities (after regressing out agency) and agency (after regressing out both image and perceptual categorizabilities). Multiple comparisons across spheres of correlations to baseline (0) were corrected using TFCE. The independent visual categorizability clusters were computed as a union of spheres that had a significant contribution (independent of agency) from either image or perceptual categorizabilities.
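How a searchlight sphere of 100 proximal voxels might be assembled is sketched below. The text does not specify how proximity was computed, so this sketch assumes Euclidean distance on the voxel grid with the center voxel included; the function name, the toy grid, and the tie-breaking by list order are all illustrative choices.

```python
def searchlight_voxels(center, voxels, k=100):
    """Return the k voxels nearest to `center` (squared Euclidean distance on the grid)."""
    def dist2(v):
        return sum((a - b) ** 2 for a, b in zip(v, center))
    return sorted(voxels, key=dist2)[:k]

# Hypothetical 7 x 7 x 7 grid of voxel coordinates
grid = [(x, y, z) for x in range(7) for y in range(7) for z in range(7)]
sphere = searchlight_voxels((3, 3, 3), grid, k=100)
```

In a real analysis the candidate voxels would be restricted to a brain mask, and the classifier would then be trained and tested on the patterns within each such sphere.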

Statistical tests in use

Hypothesis testing was done with bootstrap analysis. We sampled 10,000 times with replacement from the observations being tested. p-values correspond to one minus the proportion of sample means that are above or below the null hypothesis (corresponding to the test of interest). The p-values reported in the paper correspond to one-sided tests. The 95% confidence intervals were computed by identifying the values below and above which 2.5% of the values in the bootstrapped distribution lay. Exact p-values are reported except when the means of all the bootstrap samples are higher or lower than hypothesized, in which case we report p < 10⁻⁴.
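The bootstrap procedure can be sketched as follows for the case of testing whether a mean exceeds a null value. The function name, the seed, the percentile indexing, and the example observations are assumptions made for illustration.

```python
import random

def bootstrap_test(values, null=0.0, n_boot=10_000, seed=0):
    """One-sided bootstrap test that the mean exceeds `null`, plus a 95% interval."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each bootstrap sample
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    # p-value: one minus the proportion of bootstrap means above the null
    p = 1.0 - sum(m > null for m in means) / n_boot
    # Values below and above which 2.5% of the bootstrap means lie
    lo = means[n_boot * 25 // 1000]
    hi = means[n_boot * 975 // 1000 - 1]
    return p, (lo, hi)

# Hypothetical per-participant correlations
observations = [0.20, 0.30, 0.25, 0.40, 0.35]
p_value, (ci_low, ci_high) = bootstrap_test(observations)
```

When every bootstrap mean lies above the null, as in this toy example, the computed p-value is zero, which corresponds to the case reported as below 10⁻⁴ with 10,000 samples.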

Appendix 1

Agency can be derived from visual feature differences

To test whether agency ratings can be predicted based on high-level visual feature representations, agency ratings were collected for a set of 436 images. The activation patterns of these images in the final fully-connected layer (FC8) of VGG-16 were extracted. A regression model trained on these activation patterns could accurately predict agency ratings of the stimuli used in our fMRI experiment, as described in more detail below.

Agency ratings were collected for 436 object images, which included the 12 animal images from the main fMRI experiment. The ratings experiment was similar to the main ratings experiment. However, instead of thoughtfulness and feelings, 16 participants rated the agency of animals shown in the stimuli. All participants gave informed consent. All procedures were carried out in accordance with the Declaration of Helsinki and were approved by the ethics committee of Radboud University (ECSW2017-2306-517). One image of an object was shown at a time. Agency was defined as the capacity of individuals to act independently and to make their own free choices, and participants were instructed to consider factors such as ‘planning, intentions, abstraction, empathy, sensation, reactions, thoughtfulness, feelings’. The agency ratings for the 12 animals co-varied positively with the agency scores from the main ratings experiment (τ = 0.75, p < 10⁻⁴), and the mean correlation was at (main ratings experiment's) agency noise ceiling (τ = 0.75).

Activations from VGG-16 FC8 were extracted for these 436 images, and principal component analysis was performed on the activations driven by the 388 images, excluding the 12 × 4 animal images from the main fMRI experiment. A cross-validated regression analysis was performed, with the agency ratings as the dependent variable and the principal components of FC8 as the independent variables. The first 20 principal components (regularisation cut-off) were included in the final model, as models with more components provided little gain in the similarity of the computed scores to the actual agency scores for the left-out images, while the similarity for the included images kept increasing (over-fitting). The agency scores were computed for the left-out 12 animals and compared to the agency ratings obtained from the current experiment. They co-varied positively (τ = 0.61, p < 10⁻⁴), but the mean correlation was not at the agency ratings noise ceiling (τ = 0.79). These observations show that agency ratings can be predicted based on high-level visual feature representations in a feedforward convolutional neural network.

References

  13. How deep is the feature analysis underlying rapid visual categorization?
    1. S Eberhardt
    2. JG Cader
    3. T Serre
    (2016)
    Advances in Neural Information Processing Systems. pp. 1100–1108.
  38. Very deep convolutional networks for large-scale image recognition
    1. K Simonyan
    2. A Zisserman
    (2015)
    International Conference on Learning Representations.
  43. MatConvNet: convolutional neural networks for MATLAB
    1. A Vedaldi
    2. K Lenc
    (2015)
    Proceedings of the 23rd ACM International Conference on Multimedia. pp. 689–692.
    https://doi.org/10.1145/2733373.2807412

Decision letter

  1. Thomas Serre
    Reviewing Editor; Brown University, United States
  2. Michael J Frank
    Senior Editor; Brown University, United States
  3. James Haxby
    Reviewer; Dartmouth College, United States
  4. Dirk Bernhardt-Walther
    Reviewer

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "The nature of the animacy organization in human ventral temporal cortex" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Thomas Serre as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Michael Frank as the Senior Editor. The following individuals involved in the review of your submission have agreed to reveal their identity: James Haxby (Reviewer #2); Dirk Bernhardt-Walther (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This is a timely manuscript that seeks to explain the animacy continuum found in the Ventral Temporal Cortex (VTC). The authors used computational and behavioral methods to parameterize a set of stimuli used for an fMRI experiment to disentangle visual categorizability vs. agency. While these two measures have been confounded in previous work, the present study uses a stimulus set in which visual categorizability and agency are dissociated based on representations derived from convolutional neural networks and behavioral experiments. A linear model which incorporates both measures fully explains the continuum found in the fMRI experiment. A subsequent searchlight analysis shows that categorizability and agency are actually represented in separate clusters of voxels, further strengthening the main result.

The reviewers unanimously agreed that the paper is of sufficient interest, quality, and novelty to merit accepting it for publication in eLife. The reviewers all agreed that no additional experiments are needed. A revision is however needed to clarify some of the methods and concepts used as well as to provide additional data analyses. We have the following suggestions to improve the manuscript.

Essential revisions:

The reviewers suggest further analyses and figures to clarify how the representation of visual categorization and of agency are structured. Is the animacy continuum based both on visual features and agentic properties, or are there two animacy continua or a gradient reflecting a transition from a representation of visual information that lays the foundation and a representation of agency? Should the term "animacy continuum" refer to an amalgam of visual features and agency, or should it be reserved for semantic content?

One suggestion includes a multidimensional scaling analysis similar to the analyses and figures of Connolly et al. (2016). In the original paper, the authors showed gradients of changes in representational geometry that (a) disentangled representation of animal taxonomy vs. visual appearance in VTC and (b) disentangled predacity from visual appearance and taxonomy along the STS. A similar method could be used to help clarify exactly how the animacy continuum in VTC reflects both visual categorizability and agency. They seem to be dissociable, and the anatomical and conceptual aspects of this dissociation could be explored and explicated more thoroughly.

In a similar vein, the authors pose the question "How can such a seemingly high-level psychological property as agency affect responses in visual cortex?" Why the need to assume VTC is purely visual? This assumption carries with it the assertion that semantic information must come from elsewhere in a large-scale network. It seems quite possible that IT cortex makes the same computations to extract this critical feature from the constellation of visual features and learned associations that could be calculated elsewhere in a network, and direct and automatic extraction of this information in VTC may be more efficient and adaptive, as the authors acknowledge later in the Discussion. Are such computations "visual" or should VTC be characterized as something more than "visual"? The authors cite numerous papers that show the influence of nonvisual information on representation in VTC, and clearer incorporation of these facts in their reasoning could help here. Incidentally, the discussion of nonvisual activation of VTC should include a paper by Fairhall et al. (2017), who show representation in VTC of agentic properties conveyed by voices in the congenitally blind.

With regard to using a CNN as a measure of visual categorizability, the authors write, "Because this measure was based on a feedforward transformation of the images, we label this measure image categorizability." This statement may not be completely accurate. The training of a CNN uses millions of semantically-labeled images, meaning that semantics are incorporated in the development of feature spaces that afford view-invariant, lighting-invariant, and most importantly exemplar-invariant labeling of new images. Thus, characterizing this CNN as solely feedforward overlooks the fact that it is trained with semantic labels and produces semantic labels. This gray area between "visual categorizability" and semantic classification needs to be clarified. The current manuscript tries to make a binary distinction between visual categorizability and semantic judgment of agency, when in fact it is more complicated insofar as both the CNN and behavioral criteria for visual categorizability can be influenced by semantics – the CNN in how it is trained and the behavioral task in terms of the role that automatic activation of semantic features and categories may influence response times.

The reviewers suggest a rework of the Materials and methods section as they found it difficult to decipher what exactly had been done in the experiment and in the data analysis. Below is a list of clarifications or requests for missing information.

Clarifications regarding methods for using the CNN are also requested:

1) Please confirm/clarify that the network was not specifically fine-tuned for animal/non-animal categorization. An SVM is trained for animal vs. non-animal categorization on top of features extracted from a pre-trained CNN.

2) More generally, when using a CNN to get a visual categorizability score per image, the choice of the final layer as "feature representation" is somewhat odd because units are already category selective and responses across category units tend to be very sparse. For these kinds of transfer learning tasks, people normally consider layers below. Prior work (see Eberhardt et al., 2016) has shown that the best correlation between layer outputs and human decisions for rapid animal/non-animal categorization was actually in the higher convolutional layers. Please comment.

3) In the statement: "Training accuracies were quantified using 6-fold cross-validation. The training accuracy was 92.2%." The authors probably meant "average test or validation accuracy", right (i.e., by training using 5 folds and testing on the remaining one and averaging over 6 rounds)? Please confirm.

4) The image categorizability of an object was defined as the distance to the representation of that object from the decision boundary of the trained classifier. Please confirm that this is distance when the image is used as a test.

5) There is a methodological difference in the way categorizability scores are derived for CNNs and humans. Why? It would have been better to use the cross-validation method for both but obviously, there are constraints on how perceptual measures can be derived. Why not then use the same method for CNNs as for the human experiment of categorizability?

Clarifications regarding human experiments:

1) Which classifier was used for the animacy decoding?

2) What is the "animacy continuum"? Is it the unthresholded decision value of the classifier that was trained on the localizer run?

3) The authors introduced two separate measures of categorizability, which turn out to be correlated. How do these two separate measures of categorizability interact with animacy?

4) The linear relationship between animacy, agency, and categorizability could be investigated solely based on behavioral data. What was the scientific contribution that we obtained from the fMRI data? There is no question that there is one, but we would like to see this worked out better.

5) There is no statement about informed consent for the psychophysics experiments.

6) The images used differ across experiments: sometimes images are grayscale, sometimes color and sometimes unspecified. Please comment.

7) The subsection about "Comparing visual categorizability with the animacy continua" is confusing. As written, the statistical analysis tests whether the model correlation with the animacy score is lower than that of the human-to-human correlation. If the corresponding p-value is <.05 as stated, then the model is indeed lower than the noise ceiling, not at the noise ceiling? Please clarify. At the very least, the wording should remove any source of ambiguity about the null hypothesis etc.

8) It is said that "Animacy classifiers were trained on the BOLD images obtained from the animate and inanimate blocks of the localizer experiment and tested on the BOLD images obtained from the main experiment." How do you then get a p-value for significance for above chance classification for each sphere as reported in the subsection “Searchlight details”?

https://doi.org/10.7554/eLife.47142.020

Author response

Essential revisions:

The reviewers suggest further analyses and figures to clarify how the representation of visual categorization and of agency are structured. Is the animacy continuum based both on visual features and agentic properties, or are there two animacy continua or a gradient reflecting a transition from a representation of visual information that lays the foundation and a representation of agency? Should the term "animacy continuum" refer to an amalgam of visual features and agency, or should it be reserved for semantic content?

Thanks for raising these intriguing questions. In terms of terminology, we use “animacy continuum” to refer to the continuum observed in the representational distances from the decision boundary of the animate-inanimate classifier in VTC. As we (and others before us) show, not all animals are equally animate when considering SVM decision scores – some animals (e.g., mammals and humans) are very strongly animate while others (e.g., insects and reptiles) are weakly animate. As we outline in the Introduction of the paper, this continuum in visual cortex might be explained by differences in animate-diagnostic features, agency, or both. Our results show that the animacy continuum in VTC is explained by both visual features and agency. In a new analysis (Figure 5B), we confirm that these two factors contribute at different locations in the visual system. This is consistent with a gradient reflecting a transformation from visual features to agency. However, it is also consistent with the existence of two separate continua that appear as gradients due to overlap or cross-participant averaging. We have added these considerations to the Discussion: “The finding that visual categorizability and agency expressed in different parts of VTC is consistent with multiple scenarios. One possibility is that VTC contains two distinct representations of animals, one representing diagnostic visual features and one representing perceived agency. Alternatively, VTC may contain a gradient from a more visual to a more conceptual representations of animacy, with the visual representation gradually being transformed into a conceptual representation. More work is needed to distinguish between these possibilities.”

One suggestion includes a multidimensional scaling analysis similar to the analyses and figures of Connolly et al. (2016). In the original paper, the authors showed gradients of changes in representational geometry that (a) disentangled representation of animal taxonomy vs. visual appearance in VTC and (b) disentangled predacity from visual appearance and taxonomy along the STS. A similar method could be used to help clarify exactly how the animacy continuum in VTC reflects both visual categorizability and agency. They seem to be dissociable, and the anatomical and conceptual aspects of this dissociation could be explored and explicated more thoroughly.

We have added new analyses in response to these suggestions:

1) Figure 5B shows the contributions of visual categorizability and agency to the animacy continuum along the posterior-anterior axis. Confirming the searchlight maps, we observed a clear posterior-to-anterior transition.

2) We have added a principal component analysis of VTC patterns, similar to that of Connolly et al. (2016), as a supplement to Figure 4 (Figure 4—figure supplement 1). This analysis, while not directly testing our hypothesis, visualizes the main dimensions of VTC representations more generally, as well as how these relate to VTC animacy.

In a similar vein, the authors pose the question "How can such a seemingly high-level psychological property as agency affect responses in visual cortex?" Why the need to assume VTC is purely visual? This assumption carries with it the assertion that semantic information must come from elsewhere in a large-scale network. It seems quite possible that IT cortex makes the same computations to extract this critical feature from the constellation of visual features and learned associations that could be calculated elsewhere in a network, and direct and automatic extraction of this information in VTC may be more efficient and adaptive, as the authors acknowledge later in the Discussion. Are such computations "visual" or should VTC be characterized as something more than "visual"? The authors cite numerous papers that show the influence of nonvisual information on representation in VTC, and clearer incorporation of these facts in their reasoning could help here. Incidentally, the discussion of nonvisual activation of VTC should include a paper by Fairhall et al. (2017), who show representation in VTC of agentic properties conveyed by voices in the congenitally blind.

We agree that it is possible that VTC itself is not exclusively visual. We have now added this possibility to the cited paragraph, which also includes a citation to the Fairhall et al. paper: “One possibility is that anterior parts of visual cortex are not exclusively visual and represent agency more abstractly, independent of input modality (Fairhall et al., 2017; van den Hurk et al., 2017). Alternatively, agency could modulate responses in visual cortex through feedback from downstream regions involved in social cognition.”

With regard to using a CNN as a measure of visual categorizability, the authors write, "Because this measure was based on a feedforward transformation of the images, we label this measure image categorizability." This statement may not be completely accurate. The training of a CNN uses millions of semantically-labeled images, meaning that semantics are incorporated in the development of feature spaces that afford view-invariant, lighting-invariant, and most importantly exemplar-invariant labeling of new images. Thus, characterizing this CNN as solely feedforward overlooks the fact that it is trained with semantic labels and produces semantic labels. This gray area between "visual categorizability" and semantic classification needs to be clarified. The current manuscript tries to make a binary distinction between visual categorizability and semantic judgment of agency, when in fact it is more complicated insofar as both the CNN and behavioral criteria for visual categorizability can be influenced by semantics – the CNN in how it is trained and the behavioral task in terms of the role that automatic activation of semantic features and categories may influence response times.

We agree that it is likely that there are semantic influences on our visual measures, if only because the CNN was trained with semantically-labeled images. We have changed the mentioned statement to: "Because this measure was based on a feedforward transformation of the images which was not informed by inferred agentic properties of the objects (such as thoughtfulness), we label this measure image categorizability.” Importantly, in our work, we are interested in a specific part of semantic information – agency. VGG-16 is trained to classify images but is not provided with any kind of information about the agency of the objects in the images.

The behavioral task might also show some influence of semantic information. We now mention this in the text: “The neural representations the reaction times in this task rely on are not fully known, and might reflect information about inferred agency of the objects. As such, accounting for the contribution of perceptual categorizability in subsequent analyses provides a conservative estimate of the independent contribution of agency to neural representations in VTC.”

The reviewers suggest a rework of the Materials and methods section as they found it difficult to decipher what exactly had been done in the experiment and in the data analysis. Below is a list of clarifications or requests for missing information.

Clarifications regarding methods for using the CNN are also requested:

1) Please confirm/clarify that the network was not specifically fine-tuned for animal/non-animal categorization. An SVM is trained for animal vs. non-animal categorization on top of features extracted from a pre-trained CNN.

Yes, we trained an SVM on top of features of a pre-trained CNN. This is mentioned as follows: “The CNN was taken from the MatConvNet package (Vedaldi and Lenc, 2015) and was pre-trained on images from the ImageNet ILSVRC classification challenge (Russakovsky et al., 2015). Activations were extracted from the final fully-connected layer, prior to the softmax operation, for 960 colored images obtained from Kiani et al. (2007) of which 480 contain an animal or animal parts and the rest contain inanimate objects or their parts.”

2) More generally, when using a CNN to get a visual categorizability score per image, the choice of the final layer as "feature representation" is somewhat odd because units are already category selective and responses across category units tend to be very sparse. For these kinds of transfer learning tasks, people normally consider layers below. Prior work (see Eberhardt et al., 2016) has shown that the best correlation between layer outputs and human decisions for rapid animal/non-animal categorization was actually in the higher convolutional layers. Please comment.

We chose to focus on FC8 based on our previous (unpublished) work, in which we observed that the neural representations in VTC correlated as highly with the representations in FC8 of VGG-16 as with those in any other layer, and that animal/non-animal classification performance in FC8 was among the highest across layers.

Following the reviewer’s suggestion, and motivated by Eberhardt et al., we ran the main analysis again using C5-2 features to compute image categorizability. Results are reported in Figure 4—figure supplement 3. As shown in the figure, results were highly similar when image categorizability (IC) was based on features of C5-2 rather than FC8. Specifically, the independent contributions of visual categorizability and agency to VTC animacy remained significant and the correlation between the combined model and VTC animacy was at VTC animacy noise ceiling.

We now also present the correlations between the image categorizability scores across all layers in Figure 1—figure supplement 1. This shows that the image categorizability scores in the fully connected layers (FC6-FC8) are highly similar. All the findings described in the paper are robust to a change in layer selection among the fully connected layers.

3) In the statement: "Training accuracies were quantified using 6-fold cross-validation. The training accuracy was 92.2%." The authors probably meant "average test or validation accuracy", right (i.e., by training using 5 folds and testing on the remaining one and averaging over 6 rounds)? Please confirm.

Yes. We have changed the statement to: "The average cross-validation accuracy was 92.2%".
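
To make the corrected wording concrete, the following is a minimal sketch of how an average cross-validation accuracy can be computed: the items are split into six folds, a classifier is fit on five folds and scored on the held-out fold, and the six held-out accuracies are averaged. The nearest-centroid classifier, the toy two-dimensional features, and all function names are illustrative assumptions; the study itself trained an SVM on CNN activations.

```python
import random

def k_fold_indices(n_items, k, seed=0):
    """Shuffle item indices and split them into k near-equal folds."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cross_val_accuracy(features, labels, k=6, seed=0):
    """Average held-out accuracy over k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(features), k, seed):
        held = set(held_out)
        train = [i for i in range(len(features)) if i not in held]
        # Stand-in classifier (NOT an SVM): nearest class centroid,
        # fit on the training folds only.
        centroids = {
            c: centroid([features[i] for i in train if labels[i] == c])
            for c in set(labels[i] for i in train)
        }
        correct = 0
        for i in held_out:
            pred = min(centroids, key=lambda c: sq_dist(features[i], centroids[c]))
            correct += pred == labels[i]
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

# Toy, well-separated data (hypothetical values, not the study's features).
animate = [[1 + 0.1 * i, 1 - 0.05 * i] for i in range(12)]
inanimate = [[-1 - 0.1 * i, -1 + 0.05 * i] for i in range(12)]
features = animate + inanimate
labels = ["animate"] * 12 + ["inanimate"] * 12
mean_accuracy = cross_val_accuracy(features, labels, k=6)
```

Because every item is scored only by a classifier that never saw it during fitting, the averaged value estimates generalization rather than training accuracy, which is the distinction the reviewers asked about.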

4) The image categorizability of an object was defined as the distance to the representation of that object from the decision boundary of the trained classifier. Please confirm that this is distance when the image is used as a test.

Yes. To clarify, we have added this sentence to the corresponding Materials and methods section: “The stimuli used in the subsequent tasks, behavioral and fMRI, did not occur in this training set of 960 images.”
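
As an illustration of the measure discussed here, the sketch below computes the signed distance of a held-out feature vector from the decision boundary of an already-trained linear classifier. The weight vector `w`, bias `b`, and the example points are hypothetical; the geometric normalization by the norm of `w` is also an assumption, since raw decision values differ from it only by a positive constant and therefore rank stimuli identically.

```python
import math

def signed_distance(x, w, b):
    """Signed Euclidean distance from point x to the hyperplane w.x + b = 0.

    Positive values fall on the side the classifier labels as the
    positive class (here taken to be 'animal'); negative on the other.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Hypothetical boundary learned on a separate training set.
w = [3.0, 4.0]
b = -5.0

# Hypothetical held-out stimuli that were not part of that training set.
strongly_positive = signed_distance([3.0, 4.0], w, b)
on_boundary = signed_distance([1.0, 0.5], w, b)
negative_side = signed_distance([0.0, 0.0], w, b)
```

A larger positive value corresponds to a stimulus that is classified as an animal with a wider margin, which is the sense in which the score indexes categorizability.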

5) There is a methodological difference in the way categorizability scores are derived for CNNs and humans. Why? It would have been better to use the cross-validation method for both but obviously, there are constraints on how perceptual measures can be derived. Why not then use the same method for CNNs as for the human experiment of categorizability?

In principle, the method used for extracting image categorizability scores from the CNN is the most straightforward. As noted by the reviewer, that method could not be used to extract perceptual categorizability scores. In computing perceptual categorizability we obtained a measure of visual similarity between two given images. There are numerous metrics which could be used to obtain such a measure of visual similarity between the representations of two images from a CNN (e.g. Pearson correlation, Euclidean distance, Isomap distance). Which of such metrics is appropriate is not a trivial consideration. Rather than choosing an arbitrary metric, we chose to stick to the most straightforward approach for the CNN. It should be noted that image and perceptual categorizability correlated quite strongly, despite the differences in the methodology of these measures.
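
The point that the choice of similarity metric is not trivial can be illustrated with a small sketch: for the same reference vector, Pearson correlation and Euclidean distance can disagree about which of two other vectors is more similar. The three vectors below are made up for illustration and do not correspond to any CNN activations.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

reference = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]      # same pattern, larger amplitude
shuffled = [1.0, 3.0, 2.0]    # similar amplitude, different pattern

# Correlation ignores amplitude, so it favours the scaled vector ...
closer_by_correlation = "scaled" if pearson(reference, scaled) > pearson(reference, shuffled) else "shuffled"
# ... whereas Euclidean distance is sensitive to amplitude and favours the shuffled one.
closer_by_distance = "scaled" if euclidean(reference, scaled) < euclidean(reference, shuffled) else "shuffled"
```

Any pairwise-similarity-based categorizability score derived from CNN features would inherit such metric-dependent differences.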

Clarifications regarding human experiments

1) Which classifier was used for the animacy decoding?

SVMs were used for animacy decoding, and this is now indicated in the sub-section “Obtaining the animacy continua from fMRI data”.

2) What is the "animacy continuum"? Is it the unthresholded decision value of the classifier that was trained on the localizer run?

Yes, as indicated by "The degree of animacy of an object is given by the distance of its representation from the classifier decision boundary." in the sub-section “Obtaining the animacy continua from fMRI data”.

3) The authors introduced two separate measures of categorizability, which turn out to be correlated. How do these two separate measures of categorizability interact with animacy?

We have added two new figures (Figure 4—figure supplement 2; Figure 5—figure supplement 1) to show the separate contributions of visual and perceptual categorizability to animacy after regressing out agency.
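
One common way to examine a contribution "after regressing out" another variable is a partial correlation: both variables are residualized with respect to the nuisance variable and the residuals are then correlated. The sketch below shows this under stated assumptions. The scores are invented, the function names are hypothetical, and this response does not state whether both variables or only the VTC animacy scores were residualized.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def residuals(y, x):
    """Residuals of y after least-squares regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

def partial_corr(target, predictor, nuisance):
    """Correlation of target and predictor after removing nuisance from both."""
    return pearson(residuals(target, nuisance), residuals(predictor, nuisance))

# Invented per-object scores (not data from the study).
agency = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
categorizability = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
# Toy "animacy" built as an exact linear mix of the two predictors.
animacy = [a + 0.5 * c for a, c in zip(agency, categorizability)]

r_partial = partial_corr(animacy, categorizability, agency)
```

In this toy case the partial correlation is 1 because, once agency is removed, categorizability accounts for all remaining variance by construction; real data would yield lower values.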

4) The linear relationship between animacy, agency, and categorizability could be investigated solely based on behavioral data. What was the scientific contribution that we obtained from the fMRI data? There is no question that there is one, but we would like to see this worked out better.

Our study aimed to improve our understanding of the organizing principles of human ventral temporal cortex. Without the fMRI data we would not have had access to the animacy continuum in VTC. Behavioral experiments can be used to obtain estimates of how animate an object is for humans, and how visual or conceptual features contribute to this, but would not tell us how those estimates relate to the animacy continuum in VTC.

5) There is no statement about informed consent for the psychophysics experiments.

We have added informed consent statements for every experiment mentioned.

6) The images used differ across experiments: sometimes images are grayscale, sometimes color and sometimes unspecified. Please comment.

We now mention whether the images were colored or gray-scaled in the methods description of each experiment.

7) The subsection about "Comparing visual categorizability with the animacy continua" is confusing. As written, the statistical analysis tests whether the model correlation with the animacy score is lower than that of the human to human correlation. If the corresponding p-value is <.05 as stated, then the model is indeed lower than the noise ceiling not at the noise ceiling? Please clarify. At the very least the wording should be such as to remove any source of ambiguity about the null hypothesis etc.

We have adjusted the wording: “To assess if a model or a combination of models explained all the variance in the animacy continuum across participants, for each participant we tested if the correlation between the model or the animacy continuum predicted by the combined model (in a leave-one-out fashion as above) and the average of animacy continua of the other participants was lower than the correlation between the participant's animacy continuum and the average of animacy continua of the other participants. On the group level, if this one-sided test (see “Statistical tests in use”) was not significant (p > 0.05), we concluded that the correlation between the model or a combination of models hit the animacy continuum noise ceiling and thus explained all the variance in the animacy continuum across participants.”
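
The logic of this comparison can be sketched as follows for the single-model case. For each participant, the correlation of the participant's own continuum with the average of the other participants (the noise ceiling) is compared with the correlation of the model with that same average. The exact sign-flip test used at the group level here is an assumption chosen for illustration; the test actually used is the one described under “Statistical tests in use”. All data and function names are hypothetical.

```python
import math
from itertools import product

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def mean_of_others(continua, left_out):
    """Average continuum over all participants except one."""
    others = [c for i, c in enumerate(continua) if i != left_out]
    return [sum(vals) / len(vals) for vals in zip(*others)]

def ceiling_differences(continua, model):
    """Per participant: corr(own, others' mean) minus corr(model, others' mean)."""
    diffs = []
    for i, own in enumerate(continua):
        reference = mean_of_others(continua, i)
        diffs.append(pearson(own, reference) - pearson(model, reference))
    return diffs

def sign_flip_p_value(diffs):
    """Exact one-sided sign-flip test that the mean difference exceeds zero."""
    observed = sum(diffs)
    outcomes = [sum(s * d for s, d in zip(signs, diffs))
                for signs in product((1, -1), repeat=len(diffs))]
    return sum(o >= observed - 1e-12 for o in outcomes) / len(outcomes)

# Invented continua for three participants and a deliberately poor model.
continua = [[1, 2, 3, 4], [1, 2, 4, 3], [2, 1, 3, 4]]
poor_model = [4, 3, 2, 1]
diffs = ceiling_differences(continua, poor_model)
p_value = sign_flip_p_value(diffs)
```

A small p-value indicates that the model falls reliably below the noise ceiling; a non-significant result is what the quoted passage interprets as the model reaching the ceiling. With only three toy participants the smallest attainable p-value is 1/8, so this example shows the mechanics rather than a meaningful inference.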

8) It is said that "Animacy classifiers were trained on the BOLD images obtained from the animate and inanimate blocks of the localizer experiment and tested on the BOLD images obtained from the main experiment." How do you then get a p-value for significance for above chance classification for each sphere as reported in the subsection “Searchlight details”?

The “Searchlight details” section has been updated to further clarify the methods. The p-value mentioned in the question is computed in a test of above-chance classification for each sphere across participants.

https://doi.org/10.7554/eLife.47142.021

Article and author information

Author details

  1. Sushrut Thorat

    Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
    Contribution
    Conceptualization, Investigation, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    s.thorat@donders.ru.nl
    Competing interests
    No competing interests declared
  2. Daria Proklova

    Brain and Mind Institute, University of Western Ontario, London, Canada
    Contribution
    Conceptualization, Resources
    Competing interests
    No competing interests declared
  3. Marius V Peelen

    Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
    Contribution
    Conceptualization, Supervision, Funding acquisition, Writing—original draft, Writing—review and editing
    For correspondence
    m.peelen@donders.ru.nl
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-4026-7303

Funding

Horizon 2020 Framework Programme (725970)

  • Marius V Peelen

Autonomous Province of Trento (ATTEND)

  • Marius V Peelen

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Daniel Kaiser for his help with experimental design. The research was supported by the Autonomous Province of Trento, Call ‘Grandi Progetti 2012’, project ‘Characterizing and improving brain mechanisms of attention - ATTEND’. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 725970).

Ethics

Human subjects: All participants gave informed consent. All procedures were carried out in accordance with the Declaration of Helsinki and were approved by the ethics committees of the University of Trento (protocol 2013-015) and the Radboud University (ECSW2017-2306-517).

Senior Editor

  1. Michael J Frank, Brown University, United States

Reviewing Editor

  1. Thomas Serre, Brown University, United States

Reviewers

  1. James Haxby, Dartmouth College, United States
  2. Dirk Bernhardt-Walther

Publication history

  1. Received: March 26, 2019
  2. Accepted: July 17, 2019
  3. Version of Record published: September 9, 2019 (version 1)

Copyright

© 2019, Thorat et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

