
Causal neural mechanisms of context-based object recognition

  1. Miles Wischnewski
  2. Marius V Peelen (corresponding author)
  1. Donders Institute for Brain, Cognition and Behaviour, Radboud University, Netherlands
  2. Department of Biomedical Engineering, University of Minnesota, United States
Research Article
Cite this article as: eLife 2021;10:e69736 doi: 10.7554/eLife.69736

Abstract

Objects can be recognized based on their intrinsic features, including shape, color, and texture. In daily life, however, such features are often not clearly visible, for example when objects appear in the periphery, in clutter, or at a distance. Interestingly, object recognition can still be highly accurate under these conditions when objects are seen within their typical scene context. What are the neural mechanisms of context-based object recognition? According to parallel processing accounts, context-based object recognition is supported by the parallel processing of object and scene information in separate pathways. Output of these pathways is then combined in downstream regions, leading to contextual benefits in object recognition. Alternatively, according to feedback accounts, context-based object recognition is supported by (direct or indirect) feedback from scene-selective to object-selective regions. Here, in three pre-registered transcranial magnetic stimulation (TMS) experiments, we tested a key prediction of the feedback hypothesis: that scene-selective cortex causally and selectively supports context-based object recognition before object-selective cortex does. Early visual cortex (EVC), object-selective lateral occipital cortex (LOC), and scene-selective occipital place area (OPA) were stimulated at three time points relative to stimulus onset while participants categorized degraded objects in scenes and intact objects in isolation, in different trials. Results confirmed our predictions: relative to isolated object recognition, context-based object recognition was selectively and causally supported by OPA at 160–200 ms after onset, followed by LOC at 260–300 ms after onset. These results indicate that context-based expectations facilitate object recognition by disambiguating object representations in the visual cortex.

Introduction

Objects are typically seen within a rich, structured, and familiar context, such as cars on a road and chairs in a living room. Decades of behavioral work have shown that context facilitates the recognition of objects (Bar, 2004; Biederman et al., 1982; Oliva and Torralba, 2007). This contextual facilitation is crucial for everyday behavior, allowing us to recognize objects under poor viewing conditions (Figure 1), at a distance, in clutter, and in the periphery where visual resolution is low. Yet despite the pervasive influence of context on object recognition, our knowledge of the neural mechanisms of object recognition almost exclusively comes from studies in which participants view clearly visible isolated objects without context. These studies have shown that isolated object recognition results from the transformation of local, low-level features into view-invariant object representations along the ventral stream (DiCarlo et al., 2012; Liu et al., 2009; Riesenhuber and Poggio, 1999; Serre et al., 2007). Does a similar local-to-global hierarchy support context-based object recognition?

Figure 1. Example of context-based object recognition.

At night (top panels), the truck is easily recognized by participants when placed in context (left) but not when taken out of context (right). With sufficient light (bottom panels), the truck is easily recognized also when presented in isolation.

One possibility is that context-based object recognition is supported by the parallel feedforward processing of local object information in the ventral stream object pathway and global scene processing in a separate scene pathway (Henderson and Hollingworth, 1999; Park et al., 2011). Output of these pathways may then be combined in downstream decision-making regions, leading to contextual benefits in object recognition. Alternatively, context-based object recognition may be supported by feedback processing, with scene context providing a prior that is integrated with ambiguous object representations in the visual cortex (Bar, 2004; Brandman and Peelen, 2017; de Lange et al., 2018). Neuroimaging studies have not been able to distinguish between these possibilities because the contextual modulation of neural activity in object-selective cortex (Brandman and Peelen, 2017; Faivre et al., 2019; Gronau et al., 2008; Rémy et al., 2014) could precede but also follow object recognition, for example reflecting post-recognition imagery (Dijkstra et al., 2018; Reddy et al., 2010).

To distinguish between these accounts, we used transcranial magnetic stimulation (TMS) to interfere with processing in right object-selective lateral occipital cortex (LOC; Grill-Spector, 2003; Malach et al., 1995) and right scene-selective occipital place area (OPA; Dilks et al., 2013; Grill-Spector, 2003) at three time points relative to stimulus onset. We additionally stimulated the early visual cortex (EVC) to investigate the causal contribution of feedback processing in this region during both isolated object recognition and context-based object recognition (Camprodon et al., 2010; Koivisto et al., 2011; Pascual-Leone and Walsh, 2001; Wokke et al., 2013). EVC stimulation was targeted around 2 cm above the inion, with the coil positioned such that TMS induced static phosphenes centrally in the visual field, where the stimuli were presented. This region corresponds primarily to V1 (Koivisto et al., 2010; Pascual-Leone and Walsh, 2001). The three regions were stimulated in separate pre-registered experiments (N=24 in each experiment; see Materials and methods).

Because TMS effects are variable across individuals, for example, due to individual differences in functional coordinates but also skull thickness and subject-specific gyral folding patterns (Opitz et al., 2013), we used a TMS-based assignment procedure to ensure the effectiveness of TMS over each of the three stimulated regions at the individual participant level (van Koningsbruggen et al., 2013). To achieve this, all 72 participants in the current study first underwent a separate TMS session in which the effectiveness of TMS over the three regions was established using object and scene recognition tasks (for the full procedure and results of this screening experiment, see Wischnewski and Peelen, 2021). Only participants who showed reduced scene recognition performance after OPA stimulation were assigned to the OPA experiment (N=24), only participants who showed reduced object recognition performance after LOC stimulation were assigned to the LOC experiment (N=24), and only participants who experienced TMS-induced phosphenes after EVC stimulation were assigned to the EVC experiment (N=24). All 72 participants satisfied at least one of these criteria such that no participants had to be excluded.

In all experiments, participants performed an unspeeded eight-alternative forced-choice object recognition task, indicating whether a briefly presented stimulus belonged to one of the eight categories (Figure 2). Participants performed this task for clearly visible isolated objects (isolated object recognition) as well as for degraded objects presented within a congruent scene context (context-based object recognition). In addition to the object recognition tasks, participants also performed a scene-alone task in which the object was cropped out and replaced with background. In this condition, participants had to guess the object category of the cropped-out object.

Figure 2. Overview of task and stimulation methods.

(a) Schematic overview of a trial. Two TMS pulses (40 ms apart) were delivered on each trial at one of three time windows relative to stimulus onset (60–100 ms, 160–200 ms, and 260–300 ms). The three TMS timings occurred in random order within each block. (b) Examples of each of the eight categories shown in the experiment, in the isolated object condition (left) and the context-based object condition (right). Note that the local degradation of the objects in the context-based object condition is not clearly visible from these small example images. This degradation strongly reduces object recognition when the degraded objects are presented out of scene context (see Brandman and Peelen, 2017). These conditions were presented in random order and participants performed the same categorization task on all stimuli. (c) Overview of the three TMS sites and the three time windows of stimulation. Shaded background colors indicate presumed time windows of inhibition for double-pulse TMS. TMS, transcranial magnetic stimulation.

Predictions (Figure 3a) were based on the findings of recent functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) experiments investigating context-based object recognition (Brandman and Peelen, 2017). In those experiments, participants viewed degraded objects in scene context, degraded objects alone, and scenes alone. Behavioral results showed that the degraded objects were easy to recognize when presented in scene context (>70% correct in a nine-category task) but hard to recognize when presented alone (37% correct). fMRI results showed that the multivariate representation of the category of the degraded objects in LOC was strongly enhanced when the objects were viewed in scene context relative to when they were viewed alone. Importantly, the corresponding scenes presented alone did not evoke discriminable object category responses in LOC, providing evidence for supra-additive contextual facilitation. Interestingly, the contextual facilitation of object processing in LOC was correlated with concurrently evoked activity in scene-selective regions, suggesting an interaction between scene- and object-selective regions. MEG results showed that the information about the category of the degraded objects in scenes (derived from multivariate sensor patterns) peaked at two time points: at 160–180 ms and at 280–300 ms after stimulus onset. Crucially, only the later peak showed a significant contextual facilitation effect, with more information about the degraded objects in scenes than the degraded objects alone. Similar to the LOC results, at this time point, the scenes alone did not evoke discriminable object category responses, such that the contextual facilitation of object processing could not reflect the additive processing of scenes and objects. Taken together, these results indicate that scenes—processed in scene-selective cortex—disambiguate object representations in LOC at around 300 ms after stimulus onset.

Figure 3. Predictions and results.

(a) We hypothesized that isolated object recognition (top row) would be causally supported by EVC at 60–100 ms (early time point in right plot), followed by LOC at 160–200 ms (middle time point in central plot), reflecting feedforward processing of intact object features (Cichy et al., 2014). Scene-selective OPA (left plot) was not expected to contribute to isolated object recognition at any time point (Dilks et al., 2013; Wischnewski and Peelen, 2021). Similar to isolated object recognition, we hypothesized that context-based object recognition (middle row) would be causally supported by EVC at 60–100 ms and by LOC at 160–200 ms, reflecting feedforward processing. In contrast to isolated object recognition, we hypothesized that OPA would causally support context-based object recognition at 160–200 ms (middle time point in left plot), reflecting scene processing. Crucially, scene-based expectations were hypothesized to reach LOC later in time, disambiguating object representations at 260–300 ms (late time point in central plot; Brandman and Peelen, 2017). TMS over LOC at this time point should thus selectively disrupt context-based object recognition. EVC was hypothesized to receive feedback from LOC at 160–200 ms (Camprodon et al., 2010; Koivisto et al., 2011; Murray et al., 2002; Wokke et al., 2013), which we expected to be most important for context-based object recognition, in which the object needs to be segregated from the background scene (Korjoukov et al., 2012; Lamme and Roelfsema, 2000; Scholte et al., 2008). Finally, OPA was predicted to causally support scene-alone recognition at 160–200 ms (bottom row). (b) Results of three TMS experiments. Predictions were largely confirmed, except for feedback effects in EVC (at 160–200 ms), which were specific to isolated object recognition rather than context-based object recognition. *p<0.05, **p<0.01, ***p<0.001, with error bars reflecting the SEM. 
EVC, early visual cortex; LOC, lateral occipital cortex; OPA, occipital place area; TMS, transcranial magnetic stimulation.

Figure 3—source data 1

Individual participant means (accuracy and RT).

https://cdn.elifesciences.org/articles/69736/elife-69736-fig3-data1-v1.xlsx

The current TMS study was designed to provide causal evidence for this account. Differently from the neuroimaging studies, here we compared the recognition of degraded objects in scenes with the recognition of intact objects alone, rather than degraded objects alone. This was because the large accuracy difference between the recognition of degraded objects in scenes and degraded objects alone prevents a direct comparison of TMS effects between these conditions. Furthermore, this design allowed us to compare the causal neural mechanisms underlying object recognition based on scene context and local features, with the possibility to match the tasks in terms of recognition accuracy.

Results

Across all TMS conditions, objects were equally recognizable when presented in isolation (without degradation; 75.8%) and when presented degraded within a scene (76.7%; main effect of Task: F(1,71)=1.22, p=0.27), showing that scene context can compensate for the loss of object visibility induced by the local degradation (Brandman and Peelen, 2017). Importantly, despite the equal performance, recognition in the two object recognition tasks was supported by different neural mechanisms in a time-specific manner (three-way interaction between Task [isolated object recognition, context-based object recognition], Region [OPA, LOC, and EVC], and Time [60–100 ms, 160–200 ms, and 260–300 ms]; F(4,138)=14.37, p<0.001, ηp²=0.294). This interaction was followed up by separate analyses for each of the stimulated regions.

TMS did not significantly affect response time (RT), with no interactions involving either TMS time or TMS region: Time×Region, F(4,138)=0.163, p=0.957; Task×Region, F(2,69)=0.81, p=0.450; Time×Task, F(2,142)=0.37, p=0.689; Time×Region×Task, F(4,138)=1.153, p=0.334. There were also no significant main effects of Time (F(2,142)=2.88, p=0.060) or Region (F(2,69)=0.82, p=0.447).

OPA experiment

Stimulation of scene-selective OPA differentially affected performance in the two tasks (Figure 3b, left panel; Task×Time interaction F(2,46)=8.21, p<0.001, ηp²=0.263). For isolated object recognition, there was no effect of TMS time (F(2,46)=0.07, p=0.935, ηp²=0.003), indicating that isolated object recognition was not influenced by TMS over OPA. By contrast, context-based object recognition was strongly modulated by TMS time (F(2,46)=19.54, p<0.001, ηp²=0.459). As predicted, TMS selectively impaired context-based object recognition performance when OPA was stimulated 160–200 ms after scene onset, both relative to earlier stimulation (t(23)=5.39, p<0.001, d=1.099) and relative to later stimulation (t(23)=5.36, p<0.001, d=1.095), with no significant difference between early and late stimulation (t(23)=0.26, p=0.795). These results show that OPA, a scene-selective region, is causally and selectively involved in (context-based) object recognition.
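As a reader's check on the effect sizes reported above, the sketch below recomputes partial eta squared from each F statistic and its degrees of freedom, and Cohen's d from each paired t statistic. The formula d = t/√N is an assumption on our part (the text does not state how d was computed), although it reproduces the reported values to within rounding. The function names are illustrative only.

```python
import math


def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_d_paired(t_value: float, n: int) -> float:
    """Cohen's d for a paired comparison, assumed here to be t / sqrt(n)."""
    return t_value / math.sqrt(n)


# Values reported for the OPA experiment (N = 24 participants).
print(round(partial_eta_squared(8.21, 2, 46), 3))   # Task x Time interaction
print(round(partial_eta_squared(19.54, 2, 46), 3))  # context-based task, effect of TMS time
print(round(cohens_d_paired(5.39, 24), 3))          # 160-200 ms vs. earlier stimulation
print(round(cohens_d_paired(5.36, 24), 3))          # 160-200 ms vs. later stimulation
```

The same two functions can be applied to the statistics reported for the LOC and EVC experiments.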

The pre-registration of the OPA experiment additionally included predictions for a third task, the scene-alone task (Figure 3a, bottom row). Similar to the context-based recognition task, we expected that OPA stimulation at 160–200 ms after scene onset would impair accuracy in the scene-alone task. The Task×Time interaction reported above was also significant when including this condition as a third task in the ANOVA (F(4,92)=4.64, p=0.002, ηp²=0.168). For the scene-alone task, accuracy was significantly affected by TMS time (F(2,46)=4.77, p=0.013, ηp²=0.172). TMS impaired scene-alone accuracy when OPA was stimulated at 160–200 ms after scene onset relative to later stimulation (t(23)=3.02, p=0.006, d=0.616), though not relative to earlier stimulation (t(23)=1.62, p=0.118). There was no significant difference between early and late stimulation (t(23)=−1.50, p=0.145). Together with the context-based object recognition results, these findings provide information about the causal time course of OPA’s involvement in scene recognition, showing a selective OPA effect at 160–200 ms after stimulus onset.

LOC experiment

Stimulation of object-selective LOC differentially affected performance in the two tasks (Figure 3b, middle panel; Task×Time interaction F(2,46)=12.99, p<0.001, ηp²=0.361). For isolated object recognition, there was a main effect of TMS time (F(2,46)=15.50, p<0.001, ηp²=0.403; Figure 3b). As predicted, TMS selectively impaired isolated object recognition performance when LOC was stimulated at 160–200 ms after stimulus onset, both relative to earlier stimulation (t(23)=4.58, p<0.001, d=0.936) and relative to later stimulation (t(23)=5.39, p<0.001, d=1.101), with no significant difference between early and late stimulation (t(23)=−1.17, p=0.255). A different temporal profile was observed for context-based object recognition. For this task, TMS time also had a significant effect (F(2,46)=9.03, p<0.001, ηp²=0.282; Figure 3b). In contrast to the isolated object condition, performance strongly decreased when TMS was applied later in time, at 260–300 ms after stimulus onset, both relative to early stimulation (t(23)=4.01, p<0.001, d=0.818) and relative to middle stimulation (t(23)=2.26, p=0.034, d=0.461). Context-based object recognition accuracy was moderately reduced when TMS was applied at 160–200 ms relative to earlier stimulation (t(23)=2.17, p=0.041, d=0.442). These findings confirm that LOC is causally involved in both isolated object recognition and context-based object recognition at 160–200 ms after stimulus onset. Crucially, LOC was causally involved in context-based object recognition at 260–300 ms, confirming our hypothesis that contextual feedback to LOC supports context-based object recognition.

EVC experiment

Finally, stimulation of EVC allowed us to test whether similar feedback effects could be observed earlier in the visual hierarchy. Results showed that the time of EVC stimulation differentially affected performance in the two tasks (Figure 3b, right panel; Task×Time interaction F(2,46)=14.42, p<0.001, ηp²=0.385). For isolated object recognition, there was a main effect of TMS time (F(2,46)=13.27, p<0.001, ηp²=0.366; Figure 3b). As predicted, TMS applied early in time impaired recognition performance relative to TMS late in time (t(23)=3.44, p=0.002, d=0.701). Interestingly, and contrary to our prediction, isolated object recognition was also impaired when TMS was applied at 160–200 ms compared to late stimulation (t(23)=5.19, p<0.001, d=1.06). There was no difference in performance between TMS at early and intermediate time windows (t(23)=1.16, p=0.257). For context-based object recognition, there was a main effect of TMS time (F(2,46)=19.01, p<0.001, ηp²=0.452; Figure 3b). As predicted, TMS applied early in time impaired recognition performance relative to TMS late in time (t(23)=5.41, p<0.001, d=1.105). Contrary to our prediction, context-based object recognition performance was not significantly reduced when TMS was applied at the middle time window relative to later stimulation (t(23)=0.99, p=0.334). These findings confirm that EVC is causally involved in initial visual processing, supporting both isolated object recognition and context-based recognition. In the 160–200 ms time window, EVC was causally involved in isolated object recognition but not context-based object recognition.

Discussion

Altogether, these results reveal distinct neural mechanisms underlying object recognition based on local features (isolated object recognition) and scene context (context-based object recognition). During feedforward processing, EVC and object-selective cortex supported both the recognition of objects in scenes and in isolation, while scene-selective cortex was uniquely required for context-based object recognition. Results additionally showed that feedback to EVC causally supported isolated object recognition, while feedback to object-selective cortex causally supported context-based object recognition. These results provide evidence for two routes to object recognition, each characterized by feedforward and feedback processing but involving different brain regions at different time points (Figure 4).

Figure 4. Schematic summarizing results.

Distinct cortical routes causally support isolated object recognition and context-based object recognition. Isolated object recognition (top row) was supported by EVC early in time (60–100 ms), reflecting initial visual encoding. This was followed by LOC at 160–200 ms, reflecting higher-level object processing. At this time window, EVC was still required for isolated object recognition, presumably reflecting feedback processing. Similar to isolated object recognition, context-based object recognition (bottom row) was supported by EVC at 60–100 ms, followed by LOC at 160–200 ms. However, context-based object recognition additionally required OPA at 160–200 ms, reflecting scene processing. Finally, context-based object recognition causally depended on late processing (260–300 ms) in LOC, reflecting contextual disambiguation (Brandman and Peelen, 2017). Note that the arrows do not necessarily reflect direct connections between brain regions. EVC, early visual cortex; LOC, lateral occipital cortex.

The finding that EVC (for isolated object recognition) and LOC (for context-based object recognition) causally supported object recognition well beyond the feedforward sweep suggests that feedback processing is required for accurate object recognition. Feedback processing in EVC and LOC may be explained under a common hierarchical perceptual inference framework (Friston, 2005; Haefner et al., 2016; Lee and Mumford, 2003; Rao and Ballard, 1999), in which a global representation provides a prior that allows for disambiguating relatively more local information. For context-based object recognition, the scene (represented in OPA) would be the global element, providing a prior for processing the relatively more local shape of the object (represented in LOC). For isolated object recognition, object shape would be the global element, providing a prior for processing the relatively more local inner object features (e.g., the eyes of a squirrel; represented in EVC). Feedback from the more global representation thus serves to disambiguate the more local representation. While feedback processing was hypothesized for LOC based on previous neuroimaging findings, we did not hypothesize that feedback to EVC would be required for recognizing isolated objects. Future studies are needed to test under what conditions feedback to EVC causally contributes to object recognition (Camprodon et al., 2010; Koivisto et al., 2011; Wokke et al., 2013). In line with the reverse hierarchy theory, we expect that the specific feedback that is useful for a given task, and the brain regions involved, depend on the available information in the image together with specific task demands (Hochstein and Ahissar, 2002).

An alternative interpretation of the relatively late causal involvement of EVC in isolated object recognition, and LOC in context-based object recognition, is that these effects reflect local recurrence rather than feedback. This interpretation cannot be ruled out based on the current results alone. However, based on previous findings, we think this is unlikely, at least for LOC. In the fMRI study that used a stimulus set similar to the one used here (Brandman and Peelen, 2017), representations of degraded objects in LOC were facilitated (relative to degraded objects alone) by the presence of scene context. This indicates input from outside of LOC, given that LOC did not represent object information from scenes presented alone. Furthermore, the corresponding MEG study showed two peaks for degraded objects in scenes, one at 160–180 ms and one at 280–300 ms. The later peak showed a significant contextual facilitation effect in the MEG study, with better decoding of degraded objects in scenes than degraded objects alone. The present finding that TMS over LOC at 260–300 ms selectively impaired context-based object recognition is fully in line with these fMRI and MEG findings, pointing to feedback processing rather than local recurrence.

Taken together with previous findings, the current results are thus best explained by an account in which information from scenes (processed in scene-selective cortex) feeds back to LOC to disambiguate object representations. This mechanism may underlie the behavioral benefits previously observed for object recognition in semantically and syntactically congruent (vs. incongruent) scene context (Biederman et al., 1982; Davenport and Potter, 2004; Munneke et al., 2013; Võ and Wolfe, 2013), as predicted by interactive accounts that propose that contextual facilitation is supported by contextual expectations (Bar, 2004; Davenport and Potter, 2004), with quickly extracted global scene ‘gist’ priming the representation of candidate objects in the visual cortex (Bar, 2004; Oliva and Torralba, 2007; Torralba, 2003). The current TMS results suggest that OPA is crucial for extracting this global scene information at around 160–200 ms after scene onset, and that this information is integrated with local object information in LOC around 100 ms later. The current results do not speak to whether OPA-LOC connectivity is direct or indirect, for example involving additional brain regions such as other scene-selective regions or the orbitofrontal cortex (Bar, 2004).

Our study raises the interesting question of what type of context-based expectations help to disambiguate object representations in LOC. The scenes in the current study provided multiple cues that may help to recognize the degraded objects. For example, the scenes provided information about the approximate real-world size of the objects as well as the objects’ likely semantic category. Both of these cues may help to recognize objects (Biederman et al., 1982; Davenport and Potter, 2004; Munneke et al., 2013; Võ and Wolfe, 2013). Future experiments could test whether feedback to LOC is specifically related to one of these cues. For example, one could test whether similar effects are found when objects are presented in semantically uninformative scenes, with the scene only providing information about the approximate real-world size of the object.

To conclude, the current study provides causal evidence that context-based expectations facilitate object recognition by disambiguating object representations in the visual cortex. More generally, results reveal that distinct neural mechanisms support object recognition based on local features and global scene context. Future experiments may extend our approach to include other contextual features such as co-occurring objects, temporal context, and input from other modalities.

Materials and methods

Participants

Prior to experimentation, we decided to test 24 participants in each of the three experiments. Preregistrations can be found at https://aspredicted.org/cs4wz.pdf (OPA), https://aspredicted.org/yc969.pdf (LOC), and https://aspredicted.org/cy9fq.pdf (EVC). In total, 72 right-handed volunteers (43 females, mean age ± SD = 23.33 ± 3.59, age range = 18–33) with normal or corrected-to-normal vision took part in the experiment, after participating in a TMS localization experiment (Wischnewski and Peelen, 2021). Participants were excluded if they reported any of the following: CNS-acting medication, previous neurosurgical treatments, metal implants in the head or neck area, migraine, epilepsy or previous cerebral seizures (also within their family), pacemaker, intracranial metal clips, cochlear implants, or pregnancy. Additionally, participants were asked to refrain from consuming alcohol and recreational drugs 72 hr before the experiment and to refrain from consuming coffee 2 hr before the experiment. Participants were divided over three experiments, targeting three cortical areas, based on the results of the TMS localization experiment. All experiments included 24 participants (OPA experiment, 12 females, mean age ± SD = 23.67 ± 3.92; LOC experiment, 14 females, mean age ± SD = 23.50 ± 3.09; EVC experiment, 17 females, mean age ± SD = 22.83 ± 3.81). Prior to the experimental session, participants were informed about the experimental procedures and gave written informed consent. The study procedures were approved by the ‘Centrale Commissie voor Mensgebonden Onderzoek (CCMO)’ and conducted in accordance with the Declaration of Helsinki.

Transcranial magnetic stimulation

TMS was applied via a Cool-B65 figure-of-8 coil with an outer diameter of 75 mm, which received input from a Magpro-X-100 magnetic stimulator (MagVenture, Farum, Denmark). Two TMS pulses (biphasic, wavelength: 280 µs) separated by 40 ms (25 Hz) were applied to disrupt visual cortex activity. Given that latency of visual cortex activation varies across participants, a two-pulse TMS design was chosen since it allows for a broader time window of disruption while maintaining relatively good temporal resolution (O'Shea et al., 2004; Pitcher et al., 2007; Wokke et al., 2013). The intensity of stimulation was adjusted to 85% of the individual phosphene threshold (PT). PT was established by increasing stimulator output targeting EVC until 50% of the pulses resulted in the perception of a phosphene while participants fixated on a black screen in a dimly lit room. The TMS coil was placed with the help of an infrared-based neuronavigation system (Localite, Bonn, Germany) using an individually adapted standard brain model over the right LOC, right OPA, or EVC. Each stimulation location was identified through Talairach coordinates set in the Localite neuronavigation system. The coordinates were 45, –74, 0 for LOC (Pitcher et al., 2009) and 34, –77, 21 for OPA (Julian et al., 2016). TMS was placed on EVC based on its anatomical location, 2 cm above the inion (Koivisto et al., 2010; Pascual-Leone and Walsh, 2001). We then established the optimal coil position in such a way that phosphenes were reported centrally in the visual field, where the stimuli were presented.
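The two numerical stimulation parameters above follow from simple arithmetic, sketched here for illustration. The 40 ms inter-pulse interval and the 85% fraction come from the text; the phosphene threshold of 60% of maximum stimulator output is a hypothetical example value, not a figure from the study, and the function names are our own.

```python
def pulse_rate_hz(inter_pulse_interval_ms: float) -> float:
    """Pulse rate implied by the interval between two consecutive pulses."""
    return 1000.0 / inter_pulse_interval_ms


def stimulation_intensity(phosphene_threshold: float, fraction: float = 0.85) -> float:
    """Stimulator output set to a fixed fraction of the individual phosphene threshold."""
    return fraction * phosphene_threshold


print(pulse_rate_hz(40))  # two pulses 40 ms apart correspond to 25 Hz

# Hypothetical participant with a phosphene threshold of 60% of maximum stimulator output.
print(round(stimulation_intensity(60.0), 1))
```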

Experimental stimuli

Stimuli consisted of 128 scene photographs with a single object belonging to one of the following eight categories: airplane, bird, car, fish, human, mammal, ship, and train. For the isolated object recognition task, the object was cropped out of the scene and presented at its original location on a gray background. For the context-based object recognition task, the object was pixelated to remove local features. The experiment additionally included a scene-alone condition, in which the object was cropped out and replaced with background using a content-aware fill tool. In this condition, participants had to guess the object category of the cropped-out object.

To prevent participants from recognizing the degraded objects in scenes based on having seen their intact versions, the stimulus set was divided into two halves: for each participant, half of the stimuli were used in the context-based object condition, and the other half of the stimuli were used in both the isolated object condition and the scene-alone condition. This assignment was counterbalanced across participants. The scenes spanned a visual angle of 6°×4.5°.

Main task

Before the experiment, participants received instructions and were presented with an example stimulus (which was not used in the main experiment). This example displayed how each stimulus variation (context-based, isolated object, and scene alone) was derived from an original photograph. For the main task, each trial started with a fixation cross (500 ms), followed by a stimulus presented for 33 ms. Next, a blank screen was shown for 500 ms. After this, participants were asked to respond by pressing one of eight possible keys according to the object category presented (Figure 2). No limit was imposed on response time (RT), although participants were encouraged during the instructions to respond within 3 s. The response screen was presented until the participant responded. The next trial started after a 2 s inter-trial interval. This relatively long interval was chosen to prevent repetitive TMS effects. TMS was applied at one of three different time points, in randomized order. TMS pulses could be applied at 60 ms and 100 ms after stimulus onset, 160 ms and 200 ms after stimulus onset, or 260 ms and 300 ms after stimulus onset. In 2 participants out of the 72 (1 in the LOC experiment and 1 in the EVC experiment), each pulse was accidentally delivered 16 ms earlier than described above.
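
The trial timeline described above can be laid out as a list of events relative to stimulus onset. The Python sketch below is illustrative only: the window labels ("early", "middle", "late"), the event names, and the function are hypothetical, and the code does not reproduce the actual presentation software.

```python
FIXATION_MS = 500   # fixation cross duration (from the text)
STIMULUS_MS = 33    # stimulus duration (from the text)
BLANK_MS = 500      # blank screen duration (from the text)
ITI_MS = 2000       # inter-trial interval after the response (from the text)

# Pulse pairs in ms after stimulus onset (from the text); labels are hypothetical.
TMS_WINDOWS_MS = {
    "early": (60, 100),
    "middle": (160, 200),
    "late": (260, 300),
}


def trial_events(window, timing_offset_ms=0):
    """Return (time_ms, event) pairs relative to stimulus onset, sorted by time.

    timing_offset_ms shifts both pulses; -16 would mimic the two participants
    whose pulses were delivered 16 ms early.
    """
    first, second = TMS_WINDOWS_MS[window]
    events = [
        (-FIXATION_MS, "fixation on"),
        (0, "stimulus on"),
        (STIMULUS_MS, "blank on"),
        (first + timing_offset_ms, "TMS pulse 1"),
        (second + timing_offset_ms, "TMS pulse 2"),
        (STIMULUS_MS + BLANK_MS, "response screen on"),
    ]
    return sorted(events)


if __name__ == "__main__":
    for time_ms, event in trial_events("middle"):
        print(time_ms, event)
```

Because the response screen appears 533 ms after stimulus onset, both pulses in every window are delivered during the blank interval, before the response prompt.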

Each stimulus was repeated three times, once for each TMS timing (60–100 ms, 160–200 ms, and 260–300 ms). This resulted in a total of 576 trials, which were presented in a random order. To avoid fatigue, the task was divided into 12 blocks of 48 trials, each lasting approximately 4 min, with short breaks in between of approximately 1 min. Thus, completing the task took about 60 min. The total duration of the experiment, including preparation and PT determination, was approximately 90 min.
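
The trial and duration figures above follow from the design and can be checked with a few lines of Python. The variable names are hypothetical, and the sketch assumes that breaks occurred only between blocks (11 breaks for 12 blocks), which the text does not state explicitly.

```python
N_STIMULI = 128   # scene photographs in the stimulus set
N_TIMINGS = 3     # TMS timing windows per stimulus
N_BLOCKS = 12     # blocks in the main task
BLOCK_MIN = 4     # approximate duration of one block, in minutes
BREAK_MIN = 1     # approximate duration of one break, in minutes

half = N_STIMULI // 2

# One half appears in the context-based condition; the other half appears in
# both the isolated object and the scene-alone conditions.
images_per_participant = half + 2 * half

total_trials = images_per_participant * N_TIMINGS
trials_per_block = total_trials // N_BLOCKS

# Assumption: breaks fall only between blocks, so there are N_BLOCKS - 1 of them.
task_minutes = N_BLOCKS * BLOCK_MIN + (N_BLOCKS - 1) * BREAK_MIN

if __name__ == "__main__":
    print("Images per participant:", images_per_participant)
    print("Total trials:", total_trials)
    print("Trials per block:", trials_per_block)
    print("Approximate task duration (min):", task_minutes)
```

Under these assumptions the totals match the 576 trials and 48 trials per block reported in the text, with a task duration close to the stated 60 min.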

Data availability

All data generated or analysed during this study are included in the manuscript and supporting files. Source data files have been provided for Figure 3.

Decision letter

  1. Redmond G O'Connell
    Reviewing Editor; Trinity College Dublin, Ireland
  2. Joshua I Gold
    Senior Editor; University of Pennsylvania, United States
  3. Redmond G O'Connell
    Reviewer; Trinity College Dublin, Ireland
  4. Peter Kok
    Reviewer; University College London, United Kingdom

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Acceptance summary:

This study will be of interest to scientists involved in high-level vision. The data provide a compelling demonstration of the causal role of three key visual areas in context-based object recognition. The key claims of the manuscript are supported by the data, and are strengthened by the pre-registration of each of the three experiments.

Decision letter after peer review:

Thank you for submitting your article "Causal neural mechanisms of context-based object recognition" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, including Redmond G O’Connell as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Joshua Gold as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Peter Kok (Reviewer #3).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) The preregistration for this study commits to including the scene-only condition in the statistical analyses for the main experiment; however, only the object-based and context-based conditions were considered. The authors should either provide a strong justification for this deviation or else run and report the statistics as originally planned.

2) More detail on the precise participant screening procedure is required. Exactly what tasks and stimuli were participants exposed to? What TMS SOAs were used? How many participants were excluded based on this procedure? A statement should be added to the main text flagging this procedure to the reader so that they are clear that the basic main effect of TMS to LOC object-based task performance was pre-ordained.

3) The authors should provide greater discussion of alternative interpretations of their results. Currently the authors only entertain the possibility that the results reflect feedback effects but they should also consider the possibility of persistent/recurrent activity within visual areas reflecting extended processing of stimuli held in iconic memory.

Reviewer #1 (Recommendations for the authors):

The study hypotheses could be articulated more clearly. Citations should be provided to back up the hypothesised TMS SOA effects in each region. The legend of Figure 3 would seem a good place to lay out the predictions in more detail.

The area identified as EVC should be clarified in the Introduction. What visual regions does this area encompass? Just V1? Others?

Reviewer #2 (Recommendations for the authors):

1. The authors argue that feedback based on more global representations (of the scene) serves to disambiguate more local representations (of the object).

The scene-only condition seems important to this interpretation, and I would suggest discussing this manipulation further in the main text. An alternative interpretation is that scene recognition simply reduces the epistemic priors of what the object could be (e.g., if the scene is grass, the object is unlikely to be a fish, ship, airplane, or train). Indeed, Figure S2 suggests that performance in the scene-only condition, although relatively poor (40-50%), is much higher than chance (12.5%). So scene recognition seems to filter the range of possible correct responses in the first place, and the effects that are observed may be separate to the capacity of scene recognition to disambiguate specific features of the object itself.

I take the authors' point that the absent effect of TMS on accuracy in the scene-only condition argues against this as a possibility. Nevertheless, an informative condition may have been one in which the image was simply occluded by a shape (e.g., circle) that provided only information about the relative size of the object, but not about its internal features. Such a manipulation would have maintained a similarity with the context-based condition while eliminating the intrinsic perceptual features of the object that could be disambiguated by the scene (other than its approximate size, which provides information about the possible object category, but not the object itself). This may have allowed the authors to more clearly determine whether context-based object recognition is specifically driven by disambiguation of the perceptual features intrinsic to each object.

I am not necessarily suggesting running an additional experiment, but further clarification/discussion on the above issues would be worthwhile.

Reviewer #3 (Recommendations for the authors):

– Why were these exact time windows chosen for stimulation (and hypotheses)? Can you provide a more concrete (neurophysiological) rationale?

– The EVC results are opposite to the hypothesised ones, with later involvement for objects in isolation than those in context. The authors acknowledge this, but do not discuss in much depth why this might be. Alternatively, they might simply acknowledge that we do not know why this is, and more research is needed on this point.

– Could you also draw the hypotheses that the parallel account yields, so that it can be clearly seen which data points distinguish the hypotheses? Is the late LOC stimulation effect in context vs. isolation the only crucial point?

– Regarding this datapoint, could it be that late LOC stimulation interferes with object recognition in the context condition because the objects were degraded, rather than because they were presented in a scene context? In other words, because there is more (local) recurrence in LOC required to resolve degraded objects? This seems important to rule out (or acknowledge).

– From Figure S1, it seems that participants were generally faster in the EVC experiment. Any idea why this might be?

https://doi.org/10.7554/eLife.69736.sa1

Author response

Essential revisions:

1) The preregistration for this study commits to including the scene-only condition in the statistical analyses for the main experiment; however, only the object-based and context-based conditions were considered. The authors should either provide a strong justification for this deviation or else run and report the statistics as originally planned.

The pre-registered statistics for the scene-alone condition in the OPA experiment are now included in the manuscript (p.9-10). The relevant figure (Figure 3) has also been updated such that the scene-alone condition results are now in the main text rather than the Supplement. Results confirmed our predictions, showing a reduction of scene-alone performance when OPA was stimulated 160-200 ms after stimulus onset. Note that the scene-alone condition was only included in the pre-registration of the OPA experiment, which is why we had not reported the corresponding statistics previously. (This condition was not relevant for the LOC and EVC experiments.)

2) More detail on the precise participant screening procedure is required. Exactly what tasks and stimuli were participants exposed to? What TMS SOAs were used? How many participants were excluded based on this procedure? A statement should be added to the main text flagging this procedure to the reader so that they are clear that the basic main effect of TMS to LOC object-based task performance was pre-ordained.

We now introduce the screening procedure in the main text and point the reader to a recent publication that documents the methods and results of this experiment (Wischnewski and Peelen, J Neurosci 2021). The screening experiment followed the design of Dilks et al. (J Neurosci 2013), stimulating OPA and LOC using 5 TMS pulses at a rate of 10 Hz (i.e., no SOAs were used). No participants were excluded: all participants were assigned to one of the three conditions (OPA, LOC, EVC). This is now more clearly explained in the manuscript.

3) The authors should provide greater discussion of alternative interpretations of their results. Currently the authors only entertain the possibility that the results reflect feedback effects but they should also consider the possibility of persistent/recurrent activity within visual areas reflecting extended processing of stimuli held in iconic memory.

We have added a paragraph to the Discussion section in which we discuss the alternative interpretation of local recurrence (p.13-14).

Reviewer #1 (Recommendations for the authors):

The study hypotheses could be articulated more clearly. Citations should be provided to back up the hypothesised TMS SOA effects in each region. The legend of Figure 3 would seem a good place to lay out the predictions in more detail.

Thanks for these suggestions. We have added citations to back up the hypothesized SOAs. We have also explained the previous fMRI/MEG study that led to these predictions in more detail. We have now also included a more detailed explanation in the Figure 3 legend.

The area identified as EVC should be clarified in the Introduction. What visual regions does this area encompass? Just V1? Others?

We have added this clarification to the Introduction, citing previous work using the same procedures. EVC here primarily corresponds to V1.

Reviewer #2 (Recommendations for the authors):

1. The authors argue that feedback based on more global representations (of the scene) serves to disambiguate more local representations (of the object).

The scene-only condition seems important to this interpretation, and I would suggest discussing this manipulation further in the main text. An alternative interpretation is that scene recognition simply reduces the epistemic priors of what the object could be (e.g., if the scene is grass, the object is unlikely to be a fish, ship, airplane, or train). Indeed, Figure S2 suggests that performance in the scene-only condition, although relatively poor (40-50%), is much higher than chance (12.5%). So scene recognition seems to filter the range of possible correct responses in the first place, and the effects that are observed may be separate to the capacity of scene recognition to disambiguate specific features of the object itself.

I take the authors' point that the absent effect of TMS on accuracy in the scene-only condition argues against this as a possibility. Nevertheless, an informative condition may have been one in which the image was simply occluded by a shape (e.g., circle) that provided only information about the relative size of the object, but not about its internal features. Such a manipulation would have maintained a similarity with the context-based condition while eliminating the intrinsic perceptual features of the object that could be disambiguated by the scene (other than its approximate size, which provides information about the possible object category, but not the object itself). This may have allowed the authors to more clearly determine whether context-based object recognition is specifically driven by disambiguation of the perceptual features intrinsic to each object.

I am not necessarily suggesting running an additional experiment, but further clarification/discussion on the above issues would be worthwhile.

The scene-alone condition is now included in the main text, as well as in Figure 3. We have also included a more extensive summary of previous fMRI and MEG studies that formed the basis for the current study, where we focused on the question of whether scene and object information are additive or super-additive. We have explained this study in more detail in the Introduction to make the predictions clearer.

The reviewer also raises an interesting point about which aspects of the object the scene helps to disambiguate. Object size is certainly one candidate. We have added a paragraph to the Discussion raising this possibility (p.14-15).

Reviewer #3 (Recommendations for the authors):

– Why were these exact time windows chosen for stimulation (and hypotheses)? Can you provide a more concrete (neurophysiological) rationale?

We have added citations to back up the hypothesized SOAs. We have also explained our previous fMRI and MEG work in more detail in the Introduction, as this led to the current predictions.

– The EVC results are opposite to the hypothesised ones, with later involvement for objects in isolation than those in context. The authors acknowledge this, but do not discuss in much depth why this might be. Alternatively, they might simply acknowledge that we do not know why this is, and more research is needed on this point.

We have extended the discussion of these results and mention that more work is needed to follow up on these findings (p.13).

– Could you also draw the hypotheses that the parallel account yields, so that it can be clearly seen which data points distinguish the hypotheses? Is the late LOC stimulation effect in context vs. isolation the only crucial point?

Yes, the late LOC effect is the most informative point to distinguish between the hypotheses. We have now made this clear in the Introduction, also based on a more detailed description of previous fMRI/MEG work.

– Regarding this datapoint, could it be that late LOC stimulation interferes with object recognition in the context condition because the objects were degraded, rather than because they were presented in a scene context? In other words, because there is more (local) recurrence in LOC required to resolve degraded objects? This seems important to rule out (or acknowledge).

We have added a paragraph to the Discussion section in which we discuss the alternative interpretation of local recurrence (p.13-14).

– From Figure S1, it seems that participants were generally faster in the EVC experiment. Any idea why this might be?

There were no significant differences between regions (main effect of Region: F(2,69) = 0.82, p = 0.447), which is now reported in the main text (p.8). Thus, while participants were numerically faster in the EVC experiment, this was not reliably different from the other regions (note that Region was manipulated across participants, unlike the other variables).

https://doi.org/10.7554/eLife.69736.sa2

Article and author information

Author details

  1. Miles Wischnewski

    1. Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
    2. Department of Biomedical Engineering, University of Minnesota, Minneapolis, United States
    Contribution
    Data curation, Formal analysis, Investigation, Methodology, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  2. Marius V Peelen

    Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
    Contribution
    Conceptualization, Supervision, Funding acquisition, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    m.peelen@donders.ru.nl
    Competing interests
    Reviewing editor, eLife
    ORCID: 0000-0002-4026-7303

Funding

H2020 European Research Council (725970)

  • Marius V Peelen

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 Research and Innovation Program (Grant agreement no 725970). The authors would like to thank Talia Brandman for help in stimulus creation, Andrea Ghiani for discussing results, and Marco Gandolfo, Floris de Lange, and Surya Gayet for feedback on an earlier version of the manuscript.

Ethics

Human subjects: Prior to the experimental session, participants were informed about the experimental procedures and gave written informed consent. The study procedures were approved by the 'Centrale Commissie voor Mensgebonden Onderzoek (CCMO)' under project number 2019-5311 (NL69407.091.19), and conducted in accordance with the Declaration of Helsinki.

Publication history

  1. Received: April 24, 2021
  2. Preprint posted: April 26, 2021
  3. Accepted: July 26, 2021
  4. Version of Record published: August 10, 2021 (version 1)

Copyright

© 2021, Wischnewski and Peelen

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
