Successful recognition of partially occluded objects is presumed to involve dynamic interactions between brain areas responsible for vision and cognition, but neurophysiological evidence for the involvement of feedback signals is lacking. Here, we demonstrate that neurons in the ventrolateral prefrontal cortex (vlPFC) of monkeys performing a shape discrimination task respond more strongly to occluded than unoccluded stimuli. In contrast, neurons in visual area V4 respond more strongly to unoccluded stimuli. Analyses of V4 response dynamics reveal that many neurons exhibit two transient response peaks, the second of which emerges after vlPFC response onset and displays stronger selectivity for occluded shapes. We replicate these findings using a model of V4/vlPFC interactions in which occlusion-sensitive vlPFC neurons feed back to shape-selective V4 neurons, thereby enhancing V4 responses and selectivity to occluded shapes. These results reveal how signals from frontal and visual cortex could interact to facilitate object recognition under occlusion.
https://doi.org/10.7554/eLife.25784.001
When an object is partially occluded, relevant sensory evidence available to the visual system is diminished, making the process of object recognition challenging. Nevertheless, primates are remarkably adept at recognizing partially occluded objects, a common occurrence in the natural world. The neural mechanisms that mediate this perceptual capacity are largely unknown and are the focus of this study.
Biologically inspired models of object recognition are often implemented as hierarchical, feedforward architectures (Perrett and Oram, 1993; Wallis and Rolls, 1997; Riesenhuber and Poggio, 1999) despite extensive evidence for the role of feedback signaling in visual processing (Lamme et al., 1998; Gilbert and Li, 2013). These feedforward models, and even more elaborate schemes such as artificial convolutional neural networks, remain incapable of successfully recognizing partially occluded objects, although they can perform other object recognition tasks well (Wyatte et al., 2012; Pepik et al., 2015). The failure of these models has been attributed to the exclusion of critical computations mediated by feedback signals (Yuille and Kersten, 2006; Kriegeskorte, 2015; Rust and Stocker, 2010; Tang and Kreiman, 2017). Indeed, more recent models that incorporate feedback signals show improved recognition performance for occluded objects (O'Reilly et al., 2013; Tang et al., 2014b). However, little is known about where the relevant feedback signals originate in higher cortex, where they terminate in visual cortex and how they contribute to the recognition of occluded objects. To provide new insights, we investigated the role of the prefrontal cortex in the representation of occluded objects, focusing on how responses in this area compare to responses in the visual cortex, and how frontal and visual cortical areas might interact to facilitate recognition performance.
The prefrontal cortex (PFC) plays an important role in cognitive control, the orchestration of thought and action in accordance with internal goals (Miller and Cohen, 2001). Given its high-level function, it may seem unlikely that PFC would contribute to low-level visual representations and mediate the perception and recognition of occluded objects. However, anatomical studies demonstrate that a sub-region of PFC, the ventrolateral PFC (vlPFC), receives direct projections from visual cortical areas involved in higher form processing, namely V4 and inferotemporal cortex (IT) (Barbas and Mesulam, 1985; Ungerleider et al., 2008). The vlPFC also sends projections back to these visual areas (Ninomiya et al., 2012). The existence of functional interactions between these areas is also supported by the demonstration of synchronous neural activity in the theta frequency range between lateral PFC and V4 during perceptual discrimination of visual stimuli (Liebe et al., 2012) and by the engagement of PFC in perceptual processing under conditions of greater task difficulty (Jiang and Kanwisher, 2003). Given the anatomical and physiological evidence for interactions between vlPFC and visual cortical areas, we hypothesized that vlPFC responses could contribute to the representation and recognition of objects when perceptual judgments are made more difficult by partial occlusion.
To test this hypothesis, we conducted neurophysiological recordings in rhesus monkeys while they discriminated partially occluded shapes. Based on these neuronal data, we addressed three questions. First, how do vlPFC neurons respond to partially occluded shapes compared to neurons in visual area V4? Second, are the response dynamics and tuning properties of V4 neurons consistent with the arrival of feedback signals from vlPFC? Third, is V4 neuronal discriminability for occluded shapes enhanced after the putative arrival of feedback signals from vlPFC?
To determine how vlPFC contributes to the representation and recognition of partially occluded objects, we studied single neuronal responses in monkeys performing a sequential shape discrimination task. In this task, monkeys reported whether two sequentially presented shapes, the ‘reference’ and ‘test’, were the same or different by making a saccade to one of two choice targets (Figure 1A). To test discrimination under occlusion, the test stimulus was partially occluded with a field of randomly positioned dots. The level of occlusion was titrated by varying dot diameter and was quantified as the percentage of the shape area that remained visible (% visible area). In each session, two shapes were chosen from a standard stimulus set (Pasupathy and Connor, 2001; Kosai et al., 2014) to serve as the discriminanda. For both monkeys, task performance was high for unoccluded stimuli (100% visible area) and decreased gradually as the % visible area decreased (Figure 1B), that is, as occlusion increased (gray arrow).
We analyzed the responses of vlPFC neurons during the test stimulus epoch in which occlusion level was varied. Many neurons responded strongly to occluded stimuli and weakly to unoccluded stimuli. Data from two example neurons are shown (Figure 2). The responses of the first example neuron demonstrate a preference for one of the two shapes used (compare Figure 2A–B). For both the preferred and non-preferred shape, responses were stronger when the shapes were occluded (colored lines) than unoccluded (black lines). These responses were also more discriminable when the shapes were occluded: shape selectivity was stronger for occluded than unoccluded stimuli (Figure 2C; see Materials and methods). The responses of the second example neuron were also stronger when the shapes were occluded than unoccluded (Figure 2D–F). However, this neuron showed no preference for either of the two shapes used; shape selectivity was therefore weak for occluded and unoccluded stimuli (Figure 2F). The responses of this second example neuron are consistent with sensitivity for the total area or circumference of the occluding dots. In contrast, the responses of the first example neuron are inconsistent with this interpretation because the stronger responses to occluded stimuli were also accompanied by stronger shape selectivity.
Most of the vlPFC neurons we recorded responded more strongly to occluded stimuli (Figure 3). Of 216 neurons that were visually responsive during the test stimulus epoch (see Materials and methods), 98 neurons (45%) were significantly modulated by occlusion level (2-way ANOVA, p<0.05). The responses of most of these occlusion-sensitive neurons were stronger for higher occlusion levels (Figure 3A). For individual neurons, this observation manifested as a negative linear regression slope between % visible area and the average responses during the test epoch (Figure 3B). Of the 98 occlusion-sensitive neurons, 71 had a negative regression slope; 59 neurons had a slope that was significantly less than zero (p<0.05). For the subset of occlusion-sensitive neurons, normalized responses were also stronger at higher occlusion levels (Figure 3C). The results were also qualitatively similar when all visually responsive vlPFC neurons were included.
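The slope-based classification described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of mean test-epoch firing rate against % visible area; the function name, the occlusion levels and the firing rates are hypothetical placeholders rather than values from the recordings, and the significance test on the slope is not reproduced here.

```python
def regression_slope(visible_area, responses):
    """Ordinary least-squares slope of response against % visible area."""
    n = len(visible_area)
    mean_x = sum(visible_area) / n
    mean_y = sum(responses) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(visible_area, responses))
    sxx = sum((x - mean_x) ** 2 for x in visible_area)
    return sxy / sxx

# Hypothetical neuron: illustrative rates (spikes/s), not recorded data.
visible_area = [20, 40, 60, 80, 100]
responses = [30.0, 26.0, 21.0, 15.0, 10.0]

slope = regression_slope(visible_area, responses)
# A negative slope means the response grows as less of the shape is visible.
prefers_occluded = slope < 0
```

Under this convention, a neuron whose rate falls as % visible area rises toward 100 has a negative slope and would be counted among the neurons that respond more strongly to occluded stimuli.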
Shape selectivity was stronger for occluded than unoccluded stimuli across the population of vlPFC neurons. For the subset of occlusion-sensitive vlPFC neurons, shape selectivity was strongest for occluded stimuli at intermediate occlusion levels (blue/green) and weakest for unoccluded stimuli (black) (Figure 3D). This observation also held for the subset of shape-selective vlPFC neurons (N = 66; Figure 3—figure supplement 1B) and for all visually responsive neurons (N = 216; Figure 3—figure supplement 1D). Even for the small subset of vlPFC neurons that responded more strongly or equally well to unoccluded stimuli (27/98 neurons had a positive regression slope; 17/98 neurons had a slope significantly greater than zero, p<0.05), shape selectivity was not stronger for unoccluded than occluded stimuli (see Figure 3—figure supplement 2A). Thus, the vlPFC neuronal population has stronger, more shape-selective responses to occluded than unoccluded stimuli.
In addition to showing a preference for occluded stimuli during the test epoch, the responses of many vlPFC neurons during this epoch also signaled whether the test and reference shapes were a match/nonmatch. Of 216 neurons, the responses of 42 had a significant main effect of match/nonmatch condition, and the responses of 65 other neurons had a significant interaction between shape and match/nonmatch condition (two-way ANOVA, p<0.05).
The vlPFC results presented thus far differ markedly from what we and others have reported in the visual cortex regarding the representation of occluded and unoccluded objects. In monkey cortical areas V4 (Kosai et al., 2014) and IT (Kovács et al., 1995) and in human occipitotemporal cortex (Tang et al., 2014a), neuronal responses are strongest for unoccluded objects and neuronal shape selectivity declines gradually with increasing occlusion level. Thus, the strong responses and shape selectivity of vlPFC neurons for occluded stimuli cannot be inherited directly from visual cortex. Next, we examine how the signals in vlPFC compare to those in the visual cortex by analyzing neuronal response dynamics in V4 datasets collected previously (Kosai et al., 2014), recorded in the same monkeys while they performed the same behavioral task used in the vlPFC testing sessions.
If feedback signals originating in vlPFC contribute to V4 responses, their influence on V4 would be evident after the onset of vlPFC responses. Additionally, their influence would manifest in V4 as stronger responses for occluded than unoccluded stimuli, consistent with the vlPFC response properties described earlier. Many V4 neurons in our dataset did not show evidence of feedback modulation in their temporal response profiles. The responses of one such example neuron (Figure 4A–B) had a temporal response profile with a single transient response phase (i.e. peak) followed by a sustained response phase. During both the transient and sustained phases, responses were stronger for unoccluded than occluded stimuli, unlike what we observed in vlPFC. However, many other V4 neurons showed a different temporal profile with two transient response peaks – one early and one late – each of which showed a different dependency on occlusion level. The responses of one such example neuron (Figure 4C–D) had two transient response peaks: the first ~82 ms and the second ~150 ms after test stimulus onset. The neuron’s responses during the first peak (Figure 4D, black bar) were strongest for the unoccluded stimulus and declined gradually with increasing occlusion level. In contrast, the neuron’s responses during the second peak (Figure 4D, red bar) were strongest for intermediate occlusion levels.
The responses of two other V4 neurons with two peaks are shown (Figure 5). For one neuron (Figure 5A), the first and second peaks occurred ~63 ms and ~191 ms after stimulus onset. For the other neuron (Figure 5B), the first and second peak occurred ~66 ms and ~218 ms after stimulus onset. Additional examples of V4 neurons with two peaks are provided (Figure 5—figure supplement 1).
We developed an ad hoc peak finding algorithm to identify V4 neurons with and without two transient response peaks (see Materials and methods; Figure 6—figure supplement 1). The algorithm detected the occurrence of two robust transient peaks separated by a sizeable intervening trough, and the results were vetted using statistical tests. Of 85 neurons, 30 neurons (35%; 14 neurons recorded in Monkey O and 16 neurons recorded in Monkey M) were classified as having two peaks (Figure 6A) and 55 neurons were classified as not having two peaks (Figure 6B). The second response peak was less striking when the responses were averaged across neurons (Figure 6C) due to variability in second peak times for individual neurons (see also Figure 10—figure supplement 6). Across all V4 neurons with two peaks, the timing of the first and second peaks had a broad range, with a median of 84 ms and 214 ms, respectively (Figure 6C). In comparison, across all occlusion-sensitive vlPFC neurons, the peak response occurred later than in V4, 93–581 ms after test stimulus onset, with a median of 157 ms (Figure 6D). Thus, the median peak time in vlPFC straddled the median peak times of the first and second response peaks in V4.
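The idea of detecting two transient peaks separated by an intervening trough can be sketched as follows. This is not the algorithm described in Materials and methods: the two thresholds (`min_peak`, `max_trough_ratio`), the function names and the toy response profiles are assumptions chosen only to make the logic concrete, and the statistical vetting step is omitted.

```python
def local_maxima(psth):
    """Indices of strict local maxima (input assumed free of plateaus)."""
    return [i for i in range(1, len(psth) - 1)
            if psth[i - 1] < psth[i] > psth[i + 1]]

def find_two_peaks(psth, min_peak=0.5, max_trough_ratio=0.75):
    """Return (first_peak_idx, second_peak_idx), or None if no such pair.

    min_peak: a peak must reach this fraction of the global maximum
        (assumed threshold).
    max_trough_ratio: the trough between the two peaks must fall to or below
        this fraction of the smaller peak (assumed threshold).
    """
    top = max(psth)
    peaks = [i for i in local_maxima(psth) if psth[i] >= min_peak * top]
    for pos, first in enumerate(peaks):
        for second in peaks[pos + 1:]:
            trough = min(psth[first:second + 1])
            if trough <= max_trough_ratio * min(psth[first], psth[second]):
                return first, second
    return None

# Toy response profiles (arbitrary units), not recorded data.
two_peak_psth = [0, 2, 10, 4, 3, 5, 8, 6, 3, 2]
one_peak_psth = [0, 2, 10, 8, 7, 6, 5, 4, 3, 2]

two_peak_result = find_two_peaks(two_peak_psth)
one_peak_result = find_two_peaks(one_peak_psth)
```

In practice the response profile would be smoothed before searching for maxima, and the classification would be checked against different threshold choices.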
If feedback signals from vlPFC contribute to V4 activity during the second response peak, we expect that V4 responses would differ in their dependence on occlusion level over time. We therefore assessed neuronal sensitivity to occlusion during the first and second response peaks in V4. Data from three example V4 neurons are shown (Figure 7A–C). For the first neuron (Figure 7A; same neuron as in Figure 4C–D), responses during the first peak (black) declined gradually as occlusion level increased. In contrast, responses during the second peak were strongest at intermediate occlusion levels. Thus, the difference in responses between the first and second peak (gray) was largest at intermediate occlusion levels. The two other example neurons showed similar results (Figure 7B–C; same neurons as in Figure 5A–B, respectively): the difference in responses between the first and second peak was larger for occluded stimuli than unoccluded stimuli for both neurons.
We compared the first and second peak responses for V4 neurons with and without two peaks. For both groups of neurons, responses during the first peak (69–99 ms) declined gradually as occlusion level increased. There was no significant difference between neurons with and without two peaks in their responses during the first peak at any occlusion level (t Test, p>0.1). However, later in the test stimulus epoch (199–229 ms), around the time of the second peak, the responses of the two groups of neurons had different trends. For neurons with two peaks, the difference in responses between the first and second peak was largest for intermediate occlusion levels and small for unoccluded stimuli and high occlusion levels (Figure 7D, dark gray). Thus, the response difference curve had an inverted U shape, as seen for the example neurons (compare dark gray curves, Figure 7A–D). In contrast, for neurons without two peaks, the difference in responses between the two time points was small and similar in magnitude across all occlusion levels (Figure 7D, light gray). The difference in responses between the first and second peaks was significantly greater for neurons with two peaks than for neurons without two peaks at intermediate occlusion levels (compare curves in Figure 7D; t Test, p<0.05, asterisks). This finding is explained by the observation that V4 neurons without two peaks had responses that declined gradually over time, for all occlusion levels (Figure 6B). In contrast, neurons with two peaks showed a relative increase in responses to occluded stimuli during the second peak (Figure 6A), a pattern that mirrored the responses of occlusion-sensitive neurons in vlPFC.
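The window-based comparison can be made concrete with a small sketch. It assumes responses binned at 1 ms from test stimulus onset, half-open analysis windows, and a difference defined as the late window minus the early window; the sign convention, the function names and the toy response profiles are assumptions rather than details taken from the analysis.

```python
def window_mean(psth, start_ms, end_ms):
    """Mean rate over [start_ms, end_ms); psth has 1-ms bins from onset."""
    segment = psth[start_ms:end_ms]
    return sum(segment) / len(segment)

def late_minus_early(psth, early=(69, 99), late=(199, 229)):
    """Response near the second peak minus response near the first peak."""
    return window_mean(psth, *late) - window_mean(psth, *early)

# Toy profiles (spikes/s) keyed by % visible area; not recorded data.
psths = {
    100: [20.0] * 150 + [12.0] * 150,
    60: [14.0] * 150 + [18.0] * 150,
    20: [8.0] * 150 + [9.0] * 150,
}
differences = {area: late_minus_early(p) for area, p in psths.items()}
```

With these toy values the difference is largest at the intermediate level, which is the inverted U pattern described for neurons with two peaks.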
To quantify how shape selectivity evolves during the test stimulus epoch, we examined average neuronal shape selectivity across time for unoccluded and occluded stimuli for V4 neurons with two peaks (Figure 8A) and those without (Figure 8B). For unoccluded stimuli (black lines), shape selectivity was similar in magnitude and time course for the two groups, reaching a maximum value at ~120 ms. For occluded stimuli (colored lines), shape selectivity was similar for the two groups (t Test, p=0.6) early in the test stimulus epoch, around the time of the first peak (69–99 ms). However, shape selectivity for neurons with two peaks was significantly stronger (t Test, p<0.01) later in the test stimulus epoch, around the time of the second peak (199–229 ms). This is because for neurons with two peaks, shape selectivity for occluded stimuli increased over time and reached a maximal value closer to the time of the second peak (Figure 8—figure supplement 1).
We demonstrate this enhanced shape selectivity for occluded stimuli later in the test stimulus epoch in two ways. First, we compared the magnitude of shape selectivity at different periods (Figure 8C, early and late). Second, we compared the timing of peak selectivity for occluded and unoccluded stimuli (Figure 8D–E). For neurons with two peaks, shape selectivity around the time of the second peak was significantly stronger than around the time of the first peak (t Test, p<0.01; Figure 8C). This observation did not hold for neurons without two peaks (p=0.92) or for unoccluded stimuli for either group of neurons (p>0.5). The timing of maximal shape selectivity for unoccluded stimuli occurred significantly earlier than the second peak (Figure 8D, median 131 vs. 214 ms, respectively; t Test, p<0.01). In contrast, the timing of maximal shape selectivity for occluded stimuli occurred around the time of the second peak (Figure 8E, median 188 vs. 214 ms, respectively; t Test, p>0.98).
The enhanced shape selectivity for occluded stimuli around the time of the second peak occurred even for neurons that had stronger responses during the first peak than the second peak (Figure 8—figure supplement 2A). This finding suggests that response magnitude during the second peak does not fully account for the strength of shape selectivity. However, for neurons that had stronger shape selectivity during the second peak, the relative magnitude of responses during the second peak was larger for the preferred than non-preferred shapes (Figure 8—figure supplement 2B). This differential enhancement of responses to preferred shapes serves to amplify shape selectivity during the second peak.
Given that we classified neurons based on an ad hoc algorithm with customized parameters (see Materials and methods and Figure 6—figure supplement 1), we sought to ensure that the findings did not depend on the choice of parameters used and that the algorithm did not yield false-positives. To address these concerns, we examined population results for neurons with and without two peaks using different choices of threshold parameters (Figure 8—figure supplements 3–4). Additionally, we developed a model-based procedure that was independent of the ad hoc peak finding algorithm to identify neurons whose responses to occluded stimuli were stronger than expected from a linear scaling of responses to unoccluded stimuli (Figure 8—figure supplements 5–6). We found good correspondence between the model-based and algorithm-based approaches in terms of the neurons identified as having two peaks. Population results generated using different parameter choices for the ad hoc algorithm and using the model-based procedure were remarkably similar to those presented earlier (Figures 6 and 8).
Collectively, these results support the hypothesis that occlusion-sensitive signals in vlPFC are relayed to V4 and that these feedback signals contribute to V4 responses during the second peak, enhancing neuronal selectivity for occluded shapes. These putative feedback signals may be well suited to enhance perceptual discriminability of partially occluded objects.
To demonstrate the plausibility of feedback signals from vlPFC contributing to V4 responses to occluded stimuli, we constructed a two-layer dynamical model of V4 and vlPFC interactions (Figure 9; see Materials and methods). In this model, shape-selective V4 units send feedforward inputs to vlPFC units (Figure 9, light gray arrows). The shape preference of each vlPFC unit is inherited from the V4 unit which provides the strongest input. vlPFC units also receive a gain modulation signal that increases with increasing occlusion level (dashed box), imparting a preference for occluded stimuli that is not observed in the feedforward V4 inputs to vlPFC. Additionally, vlPFC units send feedback inputs onto V4 units (medium gray arrows) with connection strengths that are proportional to the feedforward signals from each V4 unit. Importantly, feedback signals from vlPFC first pass through a rectifying nonlinearity prior to their arrival in V4 (Equation 7, Materials and methods). The vlPFC feedback signals contribute to two key response features of the V4 units: a second transient response peak and a dynamic preference for occlusion level over the test stimulus epoch.
To demonstrate model performance, we present data for a simulated V4 and vlPFC unit (Figure 10). In the model, feedforward input to the V4 unit is modulated both by shape and occlusion level (Figure 10A): it is strongest when the preferred shape is unoccluded and is progressively weaker for higher occlusion levels. This pattern is consistent with our V4 neuronal data and captures responses during the first peak for V4 neurons with two peaks, as well as the responses of V4 neurons without two peaks. Occlusion-dependent gain modulation of feedforward input from V4 produced vlPFC unit responses that were weak to unoccluded stimuli and stronger to occluded stimuli (Figure 10B). Furthermore, the V4 unit receiving feedback signals from vlPFC had two transient response peaks: one earlier and one later than the response peak of the vlPFC unit (Figure 10C). The V4 unit’s responses during the first peak were strongest for the unoccluded stimulus and declined with increasing occlusion level. In contrast, the V4 unit’s responses during the second peak were strongest at intermediate occlusion levels. Thus, the response dynamics generated by this V4–vlPFC interaction model successfully recapitulated the dynamics observed in our neuronal recordings (Figures 2 and 4).
Our model was constructed to include only the minimal set of mechanisms that were needed to account for the main features of the neurophysiological data. We started with the simplest feedforward model composed of two V4 units and two vlPFC units, and we included four mechanisms to achieve the desired dynamics in V4 and vlPFC responses: (1) feedback from vlPFC to V4; (2) synaptic adaptation in the feedforward inputs from V4 to vlPFC; (3) half-wave rectification of feedback signals from vlPFC to V4; (4) occlusion-dependent gain modulation input to vlPFC. First, we included feedback from vlPFC to V4 because a network with only feedforward connections from V4 to vlPFC units cannot generate the second response peak in V4 units (Figure 10—figure supplement 1A). Thus, in our model, feedback from vlPFC to V4 is necessary to reproduce the response dynamics observed in V4. Second, without synaptic adaptation on the feedforward connections from V4 to vlPFC (Equation 9, Materials and methods), the feedforward-feedback loop positively reinforces activity in V4 and vlPFC units (Figure 10—figure supplement 1B). The resulting ‘ringing’ and ‘blow up’ in simulated model responses are inconsistent with the data, thus arguing for the inclusion of an adaptation mechanism. Indeed, such an adaptation mechanism is often used in models to soften positive feedback loops (e.g. Wei and Wang, 2016). Third, without half-wave rectification of the feedback input from vlPFC to V4 units, the model produces a large second response peak even to presentation of non-preferred stimuli (Figure 10—figure supplement 2), which conflicts with the data (Figure 8—figure supplement 2B). Thus, without half-wave rectification, the enhanced shape selectivity during the second peak in the V4 neuronal data (Figure 8C) is not reproduced by the model.
Given that rectifying nonlinearities can occur when synaptic inputs are transformed into output spikes, the model suggests that feedback from vlPFC may arrive in V4 after passing through a synapse. Indeed, anatomical observations of disynaptic feedback connections between V4 and vlPFC exist (Ninomiya et al., 2012). Fourth, when all other mechanisms are in place but the gain modulation is removed, vlPFC responses decrease with increasing occlusion level (Figure 10—figure supplement 1C). Consequently, V4 shape selectivity under occlusion is not enhanced during the second peak. To consider the possibility that gain modulation of vlPFC may be mediated by signals from V4 or IT cortex, we verified that our simulations were unaffected by delays of up to ~50 ms in the arrival of gain modulation relative to the arrival of shape selective signals (Figure 10—figure supplement 3).
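How the four mechanisms fit together can be sketched with a reduced rate model. This is not the model specified in Materials and methods: it uses one V4 unit and one vlPFC unit instead of two of each, and the update equations, the feedback threshold `theta`, the linear gain function and every parameter value are untuned placeholders. The traces should therefore not be expected to match Figure 10.

```python
import math

def rectify(x):
    """Half-wave rectification: negative values are clipped to zero."""
    return max(0.0, x)

def simulate(visible, w_fb=0.3, theta=0.05, use_adaptation=True,
             delay=30, n_steps=400):
    """Euler-integrate one V4 unit and one vlPFC unit in 1-ms steps.

    visible: fraction of the shape left visible (1.0 = unoccluded).
    w_fb: feedback strength (0.0 gives a purely feedforward network).
    theta: assumed threshold applied before rectifying the feedback.
    delay: assumed conduction delay (ms) in both directions.
    Returns (v4, pfc), two lists of unit activity.
    """
    dt = 1.0
    tau_v4, tau_pfc, tau_in = 10.0, 20.0, 40.0   # time constants (ms)
    tau_rec, depletion = 300.0, 0.02             # adaptation parameters
    # Mechanism 4: gain on vlPFC input grows as less of the shape is visible.
    gain = 0.2 + 3.0 * (1.0 - visible)
    v4, pfc, resource = [0.0], [0.0], 1.0
    for step in range(1, n_steps):
        t = step * dt
        # Transient stimulus drive to V4, weaker at higher occlusion.
        drive = visible * (t / tau_in) * math.exp(1.0 - t / tau_in)
        lag = max(0, step - 1 - delay)
        v4_lag, pfc_lag = v4[lag], pfc[lag]
        # Mechanisms 1 and 3: delayed, rectified feedback from vlPFC to V4.
        feedback = w_fb * rectify(pfc_lag - theta)
        # Mechanism 2: feedforward input scaled by a depleting resource.
        feedforward = gain * resource * v4_lag
        v4.append(v4[-1] + dt / tau_v4 * (-v4[-1] + drive + feedback))
        pfc.append(pfc[-1] + dt / tau_pfc * (-pfc[-1] + feedforward))
        if use_adaptation:
            resource += dt * ((1.0 - resource) / tau_rec
                              - depletion * resource * v4_lag)
            resource = min(1.0, max(0.0, resource))
    return v4, pfc

# Compare the V4 unit with and without feedback at 60% visible area.
v4_fb, pfc_fb = simulate(0.6)
v4_ff, pfc_ff = simulate(0.6, w_fb=0.0)
```

Setting `w_fb=0.0` or `use_adaptation=False` removes one mechanism at a time, which mirrors the logic of the comparisons described above without reproducing their results.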
To generate the simulated responses shown (Figure 10), we chose model parameters that reproduced the response dynamics of example neurons (Figures 2A, 4 and 5). However, V4 neurons show substantial diversity in the magnitude and timing of the second response peak (Figure 5—figure supplement 1). vlPFC neurons also show diversity in terms of their shape selectivity and the dependence of their responses on occlusion level. Therefore, we systematically varied the model parameters governing synaptic strengths and delays to generate diverse simulated response dynamics and patterns in V4 and vlPFC model units. We varied the relative strengths of the feedforward input from the two V4 units to vlPFC, and we verified that a second response peak was observed in the V4 unit responses even when vlPFC units are only weakly shape-selective (Figure 10—figure supplement 4). By varying the feedback connection strengths, and the synaptic delays between the two areas, we were able to generate a range of second peak magnitudes (Figure 10—figure supplement 5A) and second peak times (Figure 10—figure supplement 5B). Finally, the population average response across model V4 units (Figure 10—figure supplement 6) resembles the V4 population data (Figure 6A).
To determine the contributions of prefrontal cortex to the representation and recognition of partially occluded objects, we compared the response dynamics of vlPFC and V4 neurons in monkeys discriminating shapes in the presence and absence of occluders. Our study provides three new insights. First, neuronal responses in vlPFC are strongest for occluded stimuli and weaker for unoccluded stimuli, in contrast to neuronal responses in visual areas V4 and IT (Kosai et al., 2014; Kovács et al., 1995; Tang et al., 2014a). Second, the responses of many V4 neurons have two transient peaks, the second of which emerges after the onset of vlPFC responses and shows a stronger preference for occluded stimuli. Third, neuronal shape selectivity for occluded stimuli in V4 is enhanced during the second transient peak. Our results support the hypothesis that feedback signals from vlPFC mediate V4 responses during the second transient peak and that these signals facilitate object recognition under occlusion.
Our results demonstrate that visual representations in vlPFC do not always mirror representations in visual cortex and suggest that vlPFC may play an important role in representing objects. We used different experimental approaches for the V4 and vlPFC recordings, but these methodological differences cannot account for differences in how V4 and vlPFC neurons represent occluded and unoccluded stimuli. In V4 recording sessions, but not in vlPFC sessions, we tailored the stimulus color and shape to the preferences of the neuron; this may explain the preponderance of V4 neurons that responded preferentially to unoccluded shapes in our dataset. Without tailoring stimuli we would expect a roughly equal proportion of neurons showing responses that increased and decreased with increasing occlusion level. We found, however, that 72% of vlPFC neurons responded preferentially to occluded shapes, a proportion that deviates significantly from the null hypothesis (binomial test, p<0.01). Furthermore, because the visible difference between any two shapes declines with increasing occlusion level, we expect shape selectivity to decline regardless of whether we tailored visual stimuli. The enhanced shape selectivity we observed in vlPFC under occlusion defies this expectation.
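The binomial comparison can be checked with a short calculation. The sketch assumes that the 72% figure refers to the 71 of 98 occlusion-sensitive neurons with a negative regression slope, and that the test is an exact two-sided binomial test against a chance proportion of 0.5; the text does not state which variant of the test was used, and the function name is hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided p-value: total probability of outcomes that are
    no more likely than the observed count k under Binomial(n, p)."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    # Small relative tolerance so that ties with pmf[k] are included.
    return sum(q for q in pmf if q <= pmf[k] * (1 + 1e-9))

n_sensitive, n_negative = 98, 71
fraction = n_negative / n_sensitive
p_value = binomial_two_sided_p(n_negative, n_sensitive)
```

Under these assumptions the observed proportion lies far in the tail of the chance distribution, in line with the reported p<0.01.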
The stronger responses to occluded stimuli in vlPFC cannot be attributed to neuronal preferences for the color of the occluding dots. We verified in control experiments that the preference for occluded stimuli was independent of dot color (data not shown). Given that many vlPFC neurons are selective for the shape of the occluded stimulus, it is unlikely that vlPFC responses solely reflect task difficulty level or attentional demands. If difficulty or attention could fully explain vlPFC responses, occlusion-sensitive neurons would not be shape-selective (i.e. the PSTHs in Figure 2A and Figure 2B would be identical). We also verified in control experiments that vlPFC neuronal responses were weaker when the occluding dots were in the same color as the background – an observation suggesting that the vlPFC responses we observed rely on explicit occlusion-related signals.
The dependence of vlPFC responses on occlusion level varied across neurons. The responses of some neurons increased gradually with increasing occlusion level whereas the responses of other neurons increased abruptly, even at the lowest occlusion levels. Further experiments are needed to determine whether neuronal sensitivity to occlusion is determined by feedforward inputs, by gating in vlPFC or by the difficulty of the perceptual discrimination.
We propose that vlPFC responses arise from the modulation of occlusion-dependent, shape-selective feedforward signals from V4 by another feedforward signal that is dependent only on the occlusion level. In our simple behavioral task, where the occluding dots have a different color than the occluded shapes, a neuron sensitive to the color and area of the occluding dots could signal the level of occlusion. Indeed, in one monkey performing the same behavioral task used in the current study, we found that the responses of many IT neurons were consistent with encoding the total area of the occluders (Namima and Pasupathy, 2016). However, in the natural world, where there are multiple objects and the attributes of the occluders are not known a priori, identifying which object is occluded, and by how much, could be challenging. Extending our simple model to tackle more complex, naturalistic cases would likely require the incorporation of attention and memory processes.
To perform the sequential shape discrimination task used in the current study, the reference stimulus held in memory must be compared to the test stimulus on the screen. Given its role in working memory, the PFC is a plausible neural locus for this comparison (Fuster, 1989; Kim and Shadlen, 1999; Romo and de Lafuente, 2013). Our results suggest, however, that the comparison of reference and test stimuli is unlikely to be implemented in vlPFC. We found stronger neuronal selectivity in vlPFC for occluded than unoccluded test stimuli. Thus, if behavioral performance depended on comparisons implemented in vlPFC, discriminability would be higher for occluded stimuli and lower for unoccluded stimuli – the opposite of the performance we observed (Figure 1B). The weak neuronal responses in vlPFC to unoccluded stimuli are consistent with a report of weak neuronal selectivity in this area for stimulus color in monkeys performing a color change detection task (Lara and Wallis, 2014). Together, these findings challenge the notion that vlPFC activity mediates perceptual discriminations of form and color directly. The comparison of sensory representations could be implemented in other parts of the PFC or in sensory cortex, where signals correlated with monkeys’ behavioral decisions have been reported (Kim and Shadlen, 1999; Eskandar et al., 1992; Miller and Desimone, 1994; Wallis and Miller, 2003; Romo and Salinas, 2003; Zaksas and Pasternak, 2006; Kosai et al., 2014). The evidence for functional connectivity between V4 and lateral PFC during memory maintenance also supports the implementation of decision computations in visual cortex (Liebe et al., 2012).
The finding that neuronal responses in vlPFC are stronger for occluded stimuli is consistent with two possibilities—a role for this area in decision-making and a role in the recognition of occluded objects. Given that the monkeys were required to report their perceptual judgments, it is possible that the vlPFC responses we recorded reflect this area’s engagement in facilitating decisions under limited sensory evidence. When shape selectivity in visual cortex is weakened by the presence of occlusion (Kosai et al., 2014; Kovács et al., 1995; Tang et al., 2014a), decision-making becomes difficult. In this case, vlPFC feedback might serve to amplify weak signals expressly to facilitate decisions. In this regard, our results are consistent with vlPFC’s engagement in tasks of greater difficulty or cognitive demand (Crittenden and Duncan, 2014). Importantly, when the task becomes difficult, rather than reflecting task difficulty per se, we propose that vlPFC responses amplify behaviorally relevant signals to facilitate perceptual decisions.
An alternative possibility is that the preference of vlPFC neurons for occluded stimuli may be related to the specific engagement of vlPFC in the recognition of occluded objects. Previous work has argued that the processing of complex visual scenes containing clutter and occlusions may be guided by higher cognitive, memory processes (Cavanagh, 1991; Kveraga et al., 2007). For example, image representations in early and mid-level areas of the ventral visual pathway may be relayed to higher processing stages where they are compared to stored representations of object prototypes, leading to the recognition of objects in the scene (Kveraga et al., 2007). This recognition process may then guide the grouping of appropriate contours and regions, thereby facilitating object segmentation and scene understanding (McDermott, 2004; Kveraga et al., 2007). Our results are also broadly consistent with the possibility that vlPFC activity embodies a recognition signal that is fed back to V4 to refine object representations. Further experiments are needed to differentiate between the two alternative roles for vlPFC.
Studies of object representation and recognition often consider spiking activity only within the first hundred milliseconds after stimulus onset (e.g. Hung et al., 2005). The rationale for choosing this early temporal epoch for analysis is based on the argument that successful categorization can be achieved by feedforward processes alone (VanRullen and Thorpe, 2001; Serre et al., 2007). However, neuronal responses to visual stimuli depend on signals carried not only by feedforward connections but also by feedback and horizontal connections. Feedback and horizontal connections may modulate neuronal responses based on stimulus context and behavioral goals, and confer selectivity for more complex visual stimuli (Lamme and Roelfsema, 2000; Gilbert and Li, 2013).
The relative contributions of feedforward, feedback and horizontal connections to neuronal responses are hard to disentangle experimentally, but examining the response dynamics provides useful insights. For example, recent studies comparing V4 and V1 response dynamics during contour grouping and scene segmentation tasks suggest that feedback from V4 to V1 enhances the representation of figures and suppresses the representation of backgrounds in V1 (Chen et al., 2014; Poort et al., 2012). Similarly, our results suggest that feedback from vlPFC to V4 enhances the representation of behaviorally relevant, occluded shapes in V4.
Our model simulations suggest that the enhancement of V4 shape selectivity could be mediated even by weakly tuned vlPFC neurons. This may be because we used only two discriminanda in each experimental session, thereby simplifying the object recognition problem. In this case, any given vlPFC neuron receiving differential input from the two subsets of V4 neurons that signal the two shape discriminanda could contribute to enhanced V4 shape selectivity. We cannot rule out the possibility that IT responses also contribute to V4 responses during the second transient peak. However, our IT recordings suggest this is unlikely because, as in V4, shape selectivity in IT is stronger for unoccluded than occluded stimuli (Namima and Pasupathy, 2016). Thus, putative feedback from IT may not be well-suited for enhancing V4 shape selectivity at intermediate occlusion levels.
The V4 neurons in our data set showed a broad range of second peak times and peak magnitudes. The diversity in timing of the second peak is likely because V4 and vlPFC are connected via feedforward and feedback pathways that are direct and indirect, and these different pathways are expected to have different conduction times. It is also possible that feedback signals from vlPFC to V4 are carried by a sparse projection and then distributed more broadly across V4 via horizontal connections, resulting in longer delays (than expected for disynaptic transmission) between the peak of vlPFC responses and the second peak of V4 responses. The strength of connections between V4 and vlPFC is likely heterogeneous, which could explain the range of second peak magnitudes we documented. Our simulations demonstrate that even weak functional interactions between the two areas could result in a small, second response peak in V4 that may be undetectable in highly variable responses. Overall, the heterogeneous properties of the second response peak support the possibility that V4 neurons with and without two peaks lie along a continuum.
While our model simulations successfully reproduced the diversity of neuronal dynamics observed in V4 in terms of the amplitude and timing of the second transient response peak, this result was achieved by tweaking the parameters of response timing and connection strengths in different instantiations of the two-layer V4–vlPFC interaction model. We do not know whether a large set of interconnected neurons, each with a large number of incoming inputs, could exhibit diverse response dynamics despite differences in response timing and connection strengths. In such a network, ‘averaging’ across the many inputs each neuron receives could dampen diversity in the response dynamics. It is also possible that in the limiting case where the neural population is large in size, but the number of active incoming connections per neuron is relatively small, substantial variation in response dynamics could persist across neurons. A detailed study of the relationships between network connectivity, network size and neuronal dynamics would be useful to validate the proposed model.
We studied vlPFC and V4 neuronal responses in the same monkeys and using the same behavioral paradigm, thereby facilitating direct comparisons of measurements made in the two cortical areas. Nevertheless, our approach to studying vlPFC contributions to V4 had several limitations. First, we conducted V4 and vlPFC recordings in separate sessions so we cannot compare V4 and vlPFC activity on individual trials. Second, we used only two discriminanda in each behavioral session, so we do not know whether vlPFC neurons are sufficiently sensitive to shape information to mediate the recognition of occluded objects. Third, we did not instruct monkeys to report their behavioral decisions as soon as possible, so we cannot infer the precise epoch of V4 neural activity that mediates perceptual judgments. Specifically, we do not know whether V4 neuronal responses during the second transient peak, which have stronger shape selectivity, contributed to the monkeys’ perceptual decisions. Fourth, we do not know whether the engagement of vlPFC neurons in our study was contingent on the monkeys reporting their perceptual decisions, or whether these same neurons would also be engaged in the recognition of partially occluded objects in natural viewing conditions. We hope that future studies will answer these outstanding questions and probe the causal link between V4 and vlPFC activity using perturbations of neuronal activity.
Partial occlusions pose a major challenge to the successful recognition of visual objects because they reduce the evidence available to the brain. Recognizing partially occluded objects could require solving an ill-posed inverse problem (Helmholtz, 1910; Yuille and Kersten, 2006), one that lacks a unique solution because the retinal image of an occluded object is often compatible with multiple interpretations (e.g. see Bregman's B illusion, Bregman, 1981). As a result, recognition must rely not only on information about the physical object but also on information about the occlusion, scene context and perceptual experience. Our results provide support for the hypothesis that feedback signals from vlPFC, which carry information about occlusions, contribute to object representation in V4 and to object recognition under occlusion. Other brain regions, for example IT cortex, are likely to be involved and should be studied. Future experiments are needed to reveal the detailed algorithms used by neurons and circuits to solve object recognition under occlusion.
Two adult male rhesus macaques (Macaca mulatta) were prepared for neurophysiological recordings using sterile surgical procedures. For experiments in prefrontal cortex, recording chambers were centered over the principal sulcus and targeted the ventrolateral prefrontal cortex (vlPFC), located ventral to and along the caudal third of the principal sulcus. The stereotaxic, central coordinates of prefrontal recording chambers were derived based on structural MRI images for each animal, and were ~21 mm anterior of interaural zero and ~19 mm lateral to the midline. For experiments in visual cortex, recording chambers were centered on the dorsal surface along the prelunate gyrus and targeted area V4, extending between the lunate sulcus and the superior temporal sulcus. Recordings from the two areas were carried out serially in the same monkeys, starting with V4 then vlPFC. All animal procedures conformed to NIH guidelines and were approved by the Institutional Animal Care and Use Committee at the University of Washington.
Extracellular recordings were performed using epoxy-coated tungsten microelectrodes (250 µm, FHC) lowered into cortex through an acute microdrive system (Gray Matter Research, 8-channel). Voltage signals were amplified and band-pass filtered (0.1–8 kHz) using a recording system (Plexon Systems, 16-channel). The waveforms of single units were isolated manually using spike-sorting software (Plexon Systems, Offline Sorter). The results reported in the current study are based on 381 vlPFC neurons (260 and 121 from Monkey M and Monkey O, respectively) and on 85 V4 neurons (41 from Monkey M and 44 from Monkey O). A subset of the V4 neurons (62 of 85 neurons) contributed to a previous study (see Kosai et al., 2014).
Visual stimuli were presented on a calibrated CRT monitor (1600 × 1200 pixels; 97 Hz frame rate; 57 cm in front of the monkey). Stimuli were presented against an achromatic gray background of mean luminance 5.4 cd/m². Stimulus onset and offset times were based on photodiode detection of synchronized pulses in one corner of the monitor. Stimulus presentation and behavioral events were controlled by custom software written in Python (Pype, originally developed by Jack Gallant and James Mazer; Mazer, 2013). Eye position was monitored using a 1 kHz infrared eye-tracking system (Eyelink 1000; SR Research).
Monkeys performed a sequential shape discrimination task in the presence and absence of occluders (Figure 1A). Each trial began with the presentation of a central point (0.1°), which the monkey had to fixate within a circular window of radius 0.75°. After the monkey acquired fixation, two stimuli were presented: a ‘reference’ stimulus, followed by a ‘test’ stimulus. The reference stimulus was always an unoccluded, 2D shape. The test stimulus was a 2D shape that was unoccluded or partially occluded by a field of randomly positioned dots. Occlusion level was quantified as the percentage of the shape area that remained visible (‘% visible area’) and was titrated by varying the diameter of the occluding dots (for details, see Kosai et al., 2014). Each stimulus was presented for 600 ms, with an inter-stimulus interval of 200 ms between the reference and the test stimuli. Following a 50 ms delay, the fixation point was extinguished and two peripheral choice targets appeared (left and right dots, 6° eccentric; Figure 1A). The monkey reported whether the two shapes presented were the same or different via a saccade to the right or left target, respectively, within 500 ms of target onset. The monkey received a liquid reward for correct performance. In cases where the monkey broke fixation or failed to respond, the trial was repeated later in the session. Behavioral trials were separated by an inter-trial interval of 2 s.
V4 data were collected by studying one neuron at a time and tailoring the shapes, occluding dots, colors and position of the test stimulus to the preferences of the neuron recorded. Based on preliminary characterizations (for details, see Kosai et al., 2014) we chose two shapes as the discriminanda: one preferred and one non-preferred. The two shapes were presented in the neuron’s preferred color whereas the occluding dots were presented in a contrasting, non-preferred color. The reference stimulus was presented at central fixation whereas the test stimulus was presented at the center of the neuron’s RF. The vlPFC data were collected by studying several neurons simultaneously, an approach that precluded tailoring the shapes and occluding dots to the preferences of individual neurons. To equate behavioral task difficulty across V4 and vlPFC recording sessions, we chose, at random, stimulus parameters for each vlPFC recording session from among those used for V4 recording sessions.
Two shapes were used in each behavioral session, yielding four trial conditions per occlusion level (two shapes × two behavioral outcomes). We studied each neuron’s responses to the two shapes under four or more occlusion levels, including the unoccluded case. In V4 recordings, we sampled 4–9 occlusion levels (median = 6). In vlPFC recordings, we sampled 5–6 occlusion levels (median = 5). We only included data from neurons tested with at least seven repeated presentations of each trial condition and of each occlusion level tested. The median number of repeats was 24 for V4 recordings and 15 for vlPFC recordings.
We identified visually responsive vlPFC neurons by comparing the firing rate during a 150 ms window, beginning 80 ms after test stimulus onset, to the firing rate during the fixation epoch before reference stimulus onset. Neuronal responses to the test stimulus were often transient (see Figure 2), motivating us to calculate firing rates in a 150 ms window rather than the full duration of stimulus presentation. The 80 ms offset was introduced to account for the visual response latency of neurons. Among 381 vlPFC neurons, 216 (142/260 in Monkey M and 74/121 in Monkey O) were significantly responsive during the test stimulus epoch (t-test, p<0.01). All further data analyses were restricted to these visually responsive neurons (57% of vlPFC neurons recorded).
To assess whether neuronal responses to the test stimulus were modulated by shape and/or occlusion level, we conducted a 2-way ANOVA on activity during the same response window defined above, with stimulus shape and occlusion level as factors. Among the 216 visually responsive neurons, the responses of 98 neurons (71/260 in Monkey M and 27/121 in Monkey O) showed a significant dependence on occlusion level (p<0.05) and the responses of 66 neurons showed a significant dependence on stimulus shape (p<0.05).
To examine the dynamics of neuronal shape selectivity, we performed a sliding-window Receiver Operating Characteristic (ROC) curve analysis on responses to the preferred and non-preferred shapes at each occlusion level. For V4 neurons, the preferred shape was that which evoked the largest average response across all occlusion levels. For vlPFC, because many neurons did not respond to unoccluded shapes, we computed the average response for each shape across all occlusion levels (i.e. visible area <100%) during the test epoch, and identified the preferred shape as that which evoked the largest average response. At every time point (1 ms steps), we counted spikes in a centered window of duration 75 ms (for V4) or 150 ms (for vlPFC). We then assessed shape selectivity by computing the area under the ROC curve derived from the spike count distributions of responses to preferred and non-preferred shapes. Shape selectivity values ranged from 0.5 (unselective) to 1.0 (very selective). To identify the time of maximal shape selectivity for occluded stimuli (Figure 8D, red bars), we also computed shape selectivity as described above, pooling across all levels of occlusion tested for each neuron.
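The sliding-window ROC analysis described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the data layout (one list of spike times in ms per trial) and the tie handling are assumptions made for illustration. The area under the ROC curve is computed as the probability that a spike count drawn from the preferred-shape trials exceeds one drawn from the non-preferred-shape trials, with ties counted as one half. Note that a raw area computed this way can fall below 0.5 when the non-preferred shape evokes more spikes; how such cases were handled is not stated in the text.

```python
def spike_count(spike_times_ms, center_ms, width_ms):
    """Count spikes in a window of width_ms centered on center_ms."""
    lo = center_ms - width_ms / 2.0
    hi = center_ms + width_ms / 2.0
    return sum(1 for t in spike_times_ms if lo <= t < hi)


def roc_area(pref_counts, nonpref_counts):
    """Area under the ROC curve: P(pref > nonpref) + 0.5 * P(pref == nonpref)."""
    wins = 0.0
    for p in pref_counts:
        for n in nonpref_counts:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pref_counts) * len(nonpref_counts))


def sliding_roc(pref_trials, nonpref_trials, t_start, t_stop,
                width_ms=75, step_ms=1):
    """Return a list of (time_ms, roc_area) pairs.

    pref_trials / nonpref_trials: one list of spike times (ms, relative to
    test stimulus onset) per trial. width_ms=75 mirrors the V4 window in the
    text; 150 would mirror the vlPFC window.
    """
    curve = []
    t = t_start
    while t <= t_stop:
        pc = [spike_count(tr, t, width_ms) for tr in pref_trials]
        nc = [spike_count(tr, t, width_ms) for tr in nonpref_trials]
        curve.append((t, roc_area(pc, nc)))
        t += step_ms
    return curve
```

For example, `sliding_roc(pref, nonpref, 0, 600)` would return one selectivity value per millisecond across the test epoch for a single occlusion level.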
To generate population response histograms (Figures 3 and 6), we normalized the responses of each neuron to the maximum across all occlusion levels then averaged the data at each occlusion level for all neurons. We did not test all neurons at the same occlusion levels (for each neuron, we tested 4–9 occlusion levels), so the number of neurons contributing to the average histograms varied for each occlusion level; these numbers are listed in the figures. For both cortical areas, population response histograms for the occlusion conditions of 44% and 27% visible area were based on only a few neurons and were therefore excluded.
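As a rough illustration of the normalization and averaging steps described above, the following Python sketch normalizes each neuron's response histograms to that neuron's maximum across all occlusion levels and then averages bin by bin at each occlusion level, keeping track of how many neurons contributed. The data layout (a dictionary from % visible area to a binned histogram) and the function name are hypothetical, and all histograms are assumed to share the same number of bins.

```python
def population_psth(neurons):
    """Average normalized PSTHs across neurons, separately per occlusion level.

    neurons: list of dicts mapping occlusion level (% visible area) to a
             binned response histogram (list of firing rates).
    Returns {level: (mean_normalized_psth, n_neurons_contributing)}.
    """
    sums, counts = {}, {}
    for psths in neurons:
        # Normalize to this neuron's maximum across all occlusion levels.
        peak = max(max(p) for p in psths.values())
        if peak <= 0:
            continue  # assumption: skip silent neurons to avoid dividing by zero
        for level, psth in psths.items():
            if level not in sums:
                sums[level] = [0.0] * len(psth)
                counts[level] = 0
            for i, rate in enumerate(psth):
                sums[level][i] += rate / peak
            counts[level] += 1
    return {lvl: ([s / counts[lvl] for s in sums[lvl]], counts[lvl])
            for lvl in sums}
```

Because neurons were not all tested at the same occlusion levels, the returned count differs between levels, which is why the number of contributing neurons is reported alongside each average.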
To find the time to peak response for each neuron, we first constructed an average response histogram from the Gaussian-smoothed (σ = 10 ms) PSTHs across all occlusion levels. We then identified the time of maximal response between 50–600 ms after test stimulus onset. This temporal window allowed us to identify peaks associated with responses to the test stimulus rather than responses related to the preceding reference stimulus, memory delay or saccades that followed the test stimulus.
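A minimal Python sketch of this time-to-peak procedure is given below. It assumes 1 ms bins aligned so that index 0 is test stimulus onset, truncates the Gaussian kernel at three standard deviations and renormalizes it at the edges of the histogram; these choices, and the function names, are assumptions rather than details taken from the study.

```python
import math


def gaussian_smooth(psth, sigma_ms=10.0, bin_ms=1.0):
    """Smooth a binned PSTH with a Gaussian kernel truncated at 3 sigma."""
    sigma_bins = sigma_ms / bin_ms
    half = int(math.ceil(3 * sigma_bins))
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma_bins) ** 2) for k in offsets]
    n = len(psth)
    smoothed = []
    for i in range(n):
        num, den = 0.0, 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:  # renormalize near the edges (assumption)
                num += w * psth[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def time_to_peak(psths_by_level, t0_ms=50, t1_ms=600, sigma_ms=10.0):
    """Time (ms after test onset) of the maximal average smoothed response.

    psths_by_level: list of 1-ms-binned PSTHs, one per occlusion level.
    """
    smoothed = [gaussian_smooth(p, sigma_ms) for p in psths_by_level]
    n = len(smoothed[0])
    avg = [sum(s[i] for s in smoothed) / len(smoothed) for i in range(n)]
    last = min(t1_ms, n - 1)
    return max(range(t0_ms, last + 1), key=lambda i: avg[i])
```

Restricting the search to 50–600 ms means that a large response outside this window, for example one tied to the reference stimulus, cannot be selected as the peak.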
To identify V4 neurons with two transient peaks in their responses to occluded shapes, we devised an ad hoc algorithm, described below. This procedure was designed to identify neurons with a robust second transient response peak that could not be attributed to small, noisy ripples in the response. For each neuron, we first constructed an average PSTH of its responses to the preferred shape at different occlusion levels, smoothed with a Gaussian function (σ = 10 ms). We only included occlusion levels that evoked a response that was at least 33% of the maximal response to the unoccluded preferred shape. We then used a zero-crossing algorithm to identify local peaks within 300 ms of stimulus onset. Small peaks (<50% of the first transient peak) and small trough-to-peak modulation ratios (<15% of local peak magnitude) were rejected as false positives (see Figure 6—figure supplement 1 for a schematic of the procedure and examples of rejection cases). For each putative peak that met the peak amplitude and modulation criteria, we asked whether there was a statistically significant response increase relative to the preceding trough. To assess statistical significance, we conducted a paired t-test between single trial spike counts within a 30 ms window centered at the peak and at the preceding trough (p<0.05, Bonferroni corrected). Of 85 V4 neurons, 43 had no robust peaks beyond the first transient that passed the peak amplitude and modulation criteria. Of the remaining 42 neurons, 30 had a second peak that showed a statistically significant response increase relative to the preceding trough; these neurons were classified as having two peaks. Specifically, 29/30 neurons had exactly one peak that qualified as the second response peak.
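The amplitude and modulation criteria of this procedure can be sketched in Python as follows. This is an illustrative reading of the description, not the authors' implementation: it takes an already smoothed, 1-ms-binned PSTH, treats the earliest local peak as the first transient, locates extrema from sign changes of the first difference, and omits the paired t-test on single-trial spike counts. All function and parameter names are hypothetical.

```python
def local_extrema(y):
    """Indices of local peaks and troughs from sign changes of the first difference."""
    peaks, troughs = [], []
    for i in range(1, len(y) - 1):
        d_prev = y[i] - y[i - 1]
        d_next = y[i + 1] - y[i]
        if d_prev > 0 and d_next <= 0:
            peaks.append(i)
        elif d_prev < 0 and d_next >= 0:
            troughs.append(i)
    return peaks, troughs


def candidate_second_peaks(psth, max_ms=300, min_rel_amp=0.5,
                           min_modulation=0.15):
    """Return (peak_index, preceding_trough_index) pairs that pass both criteria.

    psth: smoothed, 1-ms-binned response aligned to stimulus onset.
    min_rel_amp: peak must reach this fraction of the first transient peak.
    min_modulation: trough-to-peak rise must reach this fraction of the
                    local peak magnitude.
    """
    y = psth[: max_ms + 1]
    peaks, troughs = local_extrema(y)
    if not peaks:
        return []
    first = peaks[0]  # assumption: earliest local peak is the first transient
    accepted = []
    for p in peaks[1:]:
        earlier = [t for t in troughs if first < t < p]
        if not earlier:
            continue
        trough = earlier[-1]
        amp_ok = y[p] >= min_rel_amp * y[first]
        mod_ok = (y[p] - y[trough]) >= min_modulation * y[p]
        if amp_ok and mod_ok:
            accepted.append((p, trough))
    return accepted
```

Each accepted pair would then be passed to the significance test described above, comparing single-trial spike counts in 30 ms windows centered on the peak and on the preceding trough.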
To evaluate whether interactions between vlPFC and V4 could account for the observed response dynamics, we constructed a network model (Figure 9). The model includes two stages of cortical processing that are intended to map onto areas vlPFC and V4. The V4 stage comprises two units, each selective for one of the shapes used in a testing session. The vlPFC stage also comprises two units, which receive excitatory feedforward input from the V4 units. Each vlPFC unit inherits a preference for stimulus shape from the V4 unit that provides the strongest input (e.g. vlPFC unit 1 receives its strongest feedforward input from V4 unit 1). In the model simulations presented here, feedforward and feedback connection strengths are proportional. However, we have verified that the results hold for a broad range of connection strengths, as long as each vlPFC unit sends stronger feedback input to the V4 unit that provides its dominant feedforward input.
All model parameters (listed in Table 1) were chosen to reproduce the response dynamics observed in the experimental data. In addition, we also varied synaptic weights and delays over a range of values (see Table 2) to compare the heterogeneity of model units to observed data (Figure 10—figure supplements 3–5).
We modeled the dynamical firing rate response of each model V4 unit, , as:
and the firing rate response of each model vlPFC unit, , as:
where and denote the time constants of the responses; t denotes time; is a nonlinear function of the Naka Rushton form given by:
where and are the firing rate thresholds, and is a Gaussian white noise term with a standard deviation of and a timestep . We omitted the noise term for some simulations (Figure 10—figure supplements 1–6). Note that the precise form of the nonlinear function is not critical; any monotonically increasing nonlinear function with saturation and threshold, along with the dynamics of firing rates defined in Equations (1) and (2) provide a standard firing rate model (Dayan and Abbott, 2005).
For V4 model units, the input was the sum of two sources: (i) excitatory feedforward input from upstream visual areas, , and (ii) excitatory feedback inputs from vlPFC, and . The feedforward input confers shape selectivity to the V4 model units and a dependence of their responses on occlusion level (Figure 9). For the preferred stimulus, this input is strong and declines gradually with increasing occlusion level. For the non-preferred shape, this input is weak, as is the modulatory influence of occlusion level. The feedforward input, , was constructed by first convolving a difference of Gaussian filter (the standard kernel normalized difference of and )) with a -long ramp followed by cubing, normalization and half-wave rectification:
The ramp function (R), defined separately for the preferred (i = 1) and nonpreferred (i = 2) shapes, increases monotonically with the percentage of visible area (c) and declines over time with a support of 500 ms, that is
when . is otherwise.
Equations 4 and 5 were designed to simulate the input to V4 units (e.g. Figure 10A) with an onset latency of 30 ms, a strong initial transient response, a gradually declining sustained response, collectively lasting ~500 ms. Note that the precise function defining is not critical as long as it produces strong input signals for the preferred shape that decrease with increasing occlusion level, thus capturing the observed V4 neuronal response properties.
For the vlPFC units, the input, is the excitatory feedforward inputs from both V4 units, and . In addition, the vlPFC units receive a gain modulation signal, , that is proportional to the occlusion level. We modeled as a nonlinear, cubic function of the % visible area,. The function’s output was lowest for the unoccluded shape ( and increased for higher occlusion levels (Figure 9, ). The coefficients were fit so that the model responses closely resembled the neuronal data, but the qualitative results were independent of the coefficient values used:
Inputs between model units were modulated by connection weights: for the stronger feedforward inputs from V4 units to vlPFC units of the same shape preference (e.g. V4 unit 1→ vlPFC unit 1), for the corresponding feedback inputs (e.g. vlPFC unit 1→ V4 unit 1), for the weaker feedforward inputs from V4 units to vlPFC units of a different shape preference (e.g. V4 unit 1→ vlPFC unit 2), and for the corresponding feedback inputs (e.g. vlPFC unit 1→ V4 unit 2). Thus, the feedback input from vlPFC unit onto V4 unit , , was implemented as follows:
where the responses of vlPFC units were thresholded and half-wave rectified. This threshold on vlPFC firing rates was introduced to reduce the magnitude of the second transient peak in V4 unit responses to the non-preferred shape (see Figure 10—figure supplement 2).
The feedforward excitatory input from V4 unit to vlPFC unit was implemented as:
The feedforward and feedback temporal delays between vlPFC and V4 unit responses, and were chosen to be consistent with the difference in time between the vlPFC and V4 response peaks observed in our neuronal data.
To prevent the second response peak of V4 units from inducing a second response peak in vlPFC units (see Figure 10—figure supplement 1B), the feedforward connections from V4 to vlPFC included an adaptation term, as follows:
where the weight of connections from V4 to vlPFC represents both and , and evolves with time scale . When vlPFC activity exceeds the value of (10 spk/sec, see Table 1), the steady state feedforward connection from V4 to vlPFC, , goes to 0, and any subsequent input from V4 will fail to activate vlPFC. The feedback connectivity weight was time-independent and set to steady state values: , .
The set of differential equations, with stochastic noise term was solved using the Forward Euler Method in MATLAB. The initial firing rate values for and were set to 0 spikes per second. The initial connectivity weights were equivalent to the steady-state weights , , and ; these and other parameters are given in Tables 1 and 2.
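To make the integration scheme concrete, the following Python sketch applies the forward Euler method to a generic firing rate model of the standard form τ dr/dt = −r + F(I), with one V4 unit and one vlPFC unit coupled through delayed feedforward and feedback connections. It is a deliberately reduced illustration and not the published model: the authors' implementation is in MATLAB and uses two units per area, thresholds, an adapting feedforward weight and a specific input ramp, none of which are reproduced here. Every parameter value, the exact Naka-Rushton parameterization, the way noise enters the update and the clipping of rates at zero are assumptions chosen only so that the example runs.

```python
import random


def naka_rushton(x, r_max=100.0, half_sat=30.0, n=2.0):
    """Saturating nonlinearity; zero for non-positive input (assumed form)."""
    if x <= 0.0:
        return 0.0
    return r_max * x ** n / (x ** n + half_sat ** n)


def simulate(ff_input, gain=1.0, w_ff=1.0, w_fb=0.5, delay_ms=40,
             tau_v4=10.0, tau_pfc=20.0, dt=1.0, noise_sd=0.0, seed=0):
    """Forward Euler integration of a two-unit V4/vlPFC firing rate sketch.

    ff_input: feedforward drive to the V4 unit, one value per time step.
    gain:     hypothetical stand-in for the occlusion-dependent gain on vlPFC.
    Returns (r_v4, r_pfc), lists of firing rates starting from 0 spikes/s.
    """
    rng = random.Random(seed)
    steps = len(ff_input)
    d = int(round(delay_ms / dt))
    r_v4 = [0.0] * steps
    r_pfc = [0.0] * steps
    for k in range(steps - 1):
        # Delayed inputs exchanged between the two areas.
        fb = w_fb * r_pfc[k - d] if k >= d else 0.0
        ff = w_ff * r_v4[k - d] if k >= d else 0.0
        drive_v4 = naka_rushton(ff_input[k] + fb)
        drive_pfc = naka_rushton(gain * ff)
        # Assumption: additive Gaussian noise inside the rate equation.
        n_v4 = noise_sd * rng.gauss(0.0, 1.0)
        n_pfc = noise_sd * rng.gauss(0.0, 1.0)
        # Euler step: r(t + dt) = r(t) + (dt / tau) * (-r + F(I) + noise).
        r_v4[k + 1] = max(0.0, r_v4[k] + dt / tau_v4 * (-r_v4[k] + drive_v4 + n_v4))
        r_pfc[k + 1] = max(0.0, r_pfc[k] + dt / tau_pfc * (-r_pfc[k] + drive_pfc + n_pfc))
    return r_v4, r_pfc
```

With `noise_sd=0.0` the simulation is deterministic, which corresponds to the noise-free runs mentioned earlier; the time step `dt` should stay well below both time constants for the Euler update to remain stable.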
The code for the full model is available on GitHub (Choi, 2017). A copy is archived at https://github.com/elifesciences-publications/V4-PFC-dynamics.
Asking the “What-For” questions in auditory perception. In: Perceptual Organisation. Erlbaum.
What's up in top-down processing? In: Representation of Vision: Trends and Tacit Assumptions in Vision Research, pp. 295–304.
Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. MIT Press.
The Prefrontal Cortex. New York: Raven.
Treatise on Physiological Optics. New York: Dover.
Common neural mechanisms for response selection and perceptual processing. Journal of Cognitive Neuroscience 15:1095–1110. https://doi.org/10.1162/089892903322598076
Neural correlates of a decision in the dorsolateral prefrontal cortex of the macaque. Nature Neuroscience 2:176–185. https://doi.org/10.1038/5739
The role of visual area V4 in the discrimination of partially occluded shapes. Journal of Neuroscience 34:8570–8584. https://doi.org/10.1523/JNEUROSCI.1375-14.2014
Deep Neural Networks: A New Framework for Modeling Biological Vision and Brain Information Processing. Annual Review of Vision Science 1:417–446. https://doi.org/10.1146/annurev-vision-082114-035447
Feedforward, horizontal, and feedback processing in the visual cortex. Current Opinion in Neurobiology 8:529–535. https://doi.org/10.1016/S0959-4388(98)80042-1
The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences 23:571–579. https://doi.org/10.1016/S0166-2236(00)01657-X
Executive control processes underlying multi-item working memory. Nature Neuroscience 17:876–883. https://doi.org/10.1038/nn.3702
An integrative theory of prefrontal cortex function. Annual Review of Neuroscience 24:167–202. https://doi.org/10.1146/annurev.neuro.24.1.167
Neural responses in the inferior temporal cortex to partially occluded and occluding stimuli. Society for Neuroscience Abstracts.
Hierarchical models of object recognition in cortex. Nature Neuroscience 2:1019–1025. https://doi.org/10.1038/14819
Flutter discrimination: neural codes, perception, memory and decision making. Nature Reviews Neuroscience 4:203–218. https://doi.org/10.1038/nrn1058
Conversion of sensory signals into perceptual decisions. Progress in Neurobiology 103:41–75. https://doi.org/10.1016/j.pneurobio.2012.03.007
Ambiguity and invariance: two fundamental challenges for visual processing. Current Opinion in Neurobiology 20:382–388. https://doi.org/10.1016/j.conb.2010.04.013
Recognition of occluded objects. In: Computational and Cognitive Neuroscience of Vision. Singapore: Springer-Verlag. https://doi.org/10.1007/978-981-10-0213-7_3
Invariant face and object recognition in the visual system. Progress in Neurobiology 51:167–194. https://doi.org/10.1016/S0301-0082(96)00054-8
From rule to response: neuronal processes in the premotor and prefrontal cortex. Journal of Neurophysiology 90:1790–1806. https://doi.org/10.1152/jn.00086.2003
The limits of feedforward vision: recurrent processing promotes robust object recognition when objects are degraded. Journal of Cognitive Neuroscience 24:2248–2261. https://doi.org/10.1162/jocn_a_00282
Vision as Bayesian inference: analysis by synthesis? Trends in Cognitive Sciences 10:301–308. https://doi.org/10.1016/j.tics.2006.05.002
Directional signals in the prefrontal cortex and in area MT during a working memory for visual motion task. Journal of Neuroscience 26:11726–11742. https://doi.org/10.1523/JNEUROSCI.3420-06.2006
Nicole Rust, Reviewing Editor; University of Pennsylvania, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Dynamic representation of partially occluded objects in primate prefrontal and visual cortex" for consideration by eLife. Your article has been favorably evaluated by David Van Essen (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors. The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this letter to crystallize our concerns going forward. We feel the work is important and interesting but key issues remain unresolved that must be addressed satisfactorily to produce an acceptable manuscript.
At this point we are unable to render a binding recommendation and require a response from you indicating the feasibility of your completing the essential tasks in a reasonable period of time – around 2 months. The Board member and reviewers will consider your response and provide a binding decision.
This paper characterizes responses of V4 and vlPFC neurons to partially occluded visual stimuli and suggests that feedback from vlPFC to V4 boosts V4 responses to occluded shapes and helps resolve stimulus identity. The authors recorded vlPFC and V4 neurons from the same monkeys performing the same task, although in separate experimental sessions. They demonstrate that vlPFC neurons respond more strongly and more selectively to occluded stimuli, unlike V4 neurons, which commonly respond most strongly and selectively to unoccluded images. The authors suggest that a subset of V4 neurons respond to occluded images with two distinct peaks (but not all reviewers are convinced that the distinction between subpopulations with one and two peaks is real). The second peak follows the vlPFC response peak and shows similar characteristics to vlPFC. Based on these observations, the authors construct a two-layer neural network in which V4 and vlPFC model units are reciprocally connected and vlPFC responses shape the second peak of V4 units.
The reviewers find the proposal that PFC interacts with V4 to resolve the challenge of solving shape discrimination in the presence of occlusion to be timely and of significant interest. At the same time, the reviewers identified problematic issues with the data analysis that must be resolved before this work could be considered for publication.
General concerns about the experimental aspects of the paper include the reproducibility of the main result and the impact of using different approaches when recording from the two brain areas that are compared. General concerns about the model include that it may be unnecessarily complex for the current illustrations provided, and that, even with this complexity, the current illustrations do not reflect population average effects.
1) The main claims of the paper rest on the assertion that V4 shape selectivity increases as a function of time. The concerns about this claim are two-fold:
1A) The current illustration is made via an argument that there are two subpopulations of neurons, those that have a second, shape selective peak and those that do not. The reviewers are concerned that the existence of these two subpopulations will not be reproducible. This includes some confusion about the methods that were employed to identify the two peaks, as well as the suspicion that the specific parameters used for this identification were overfit to this particular data set.
The way that the two peaks are identified (subsection “Peak finding algorithm”) is hard to understand: "were within ± 12 ms of at least one third of the peaks identified based on the PSTHs for individual occlusion levels".
How sensitive are these findings to the parameters of the ad hoc peak finding algorithm? Similarly, one may think that the algorithm for detecting double-peak V4 cells has many false positives and the true frequency of cells with two peaks is lower than that suggested in the paper. Can you clarify?
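To make the sensitivity question concrete, the quoted criterion can be sketched in code under one possible reading: a peak found in the occlusion-averaged PSTH is kept only if at least one third of the individual occlusion levels have a peak within ± 12 ms of it. This reading, the function name `supported_peaks`, and the peak times used below are assumptions for illustration, not the authors' implementation.

```python
from fractions import Fraction


def supported_peaks(avg_peaks_ms, level_peaks_ms, tol_ms=12, min_fraction=Fraction(1, 3)):
    """Return the averaged-PSTH peaks that are matched in enough single-level PSTHs.

    avg_peaks_ms   : peak times (ms) found in the PSTH averaged across occlusion levels
    level_peaks_ms : one list of peak times (ms) per occlusion level
    tol_ms         : matching window (hypothetical parameter; 12 ms in the quoted text)
    min_fraction   : fraction of occlusion levels that must contain a matching peak
    """
    n_levels = len(level_peaks_ms)
    kept = []
    for t in avg_peaks_ms:
        # Count occlusion levels with at least one peak within +/- tol_ms of t.
        n_match = sum(
            any(abs(t - u) <= tol_ms for u in peaks) for peaks in level_peaks_ms
        )
        if n_match >= min_fraction * n_levels:
            kept.append(t)
    return kept


# Hypothetical peak times for six occlusion levels.
example_levels = [[78], [85, 215], [90], [60], [79, 300], [100]]
example_kept = supported_peaks([80, 220], example_levels)
```

Rerunning such a function over a range of `tol_ms` and `min_fraction` values and counting how many neurons retain two peaks would be one way to report the sensitivity asked about above.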
In Figure 6C, how can the "Time to peak" for V4 neurons be greater than 300 ms, if the second peak was required to be before 300 ms ("The second peak was constrained to be no later than 300 ms after test stimulus onset")?
The manuscript states: "However, shape selectivity for occluded shapes, particularly at intermediate occlusion levels (visible area 72- 95%), was different for the two groups: neurons with two peaks had significantly greater shape selectivity than neurons without two peaks in the time interval ~200-260 ms after test stimulus onset (t-Test, p < 0.05)." The supporting figure is Figure 8A vs. B. What exactly is this stating? Was significance tested for each occlusion level separately, and was it significant for visible area conditions of 72, 82, 90, and 95%? If so, was a Bonferroni correction or another correction applied? How was this time period (~200-260 ms) selected? Was there a correction for testing multiple time periods? Do the results hold in different time periods?
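For reference, a Bonferroni correction across the four intermediate visible-area conditions would look like the minimal sketch below. The p-values are hypothetical placeholders, not values from the manuscript, and the function name is an assumption.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction for a family of m tests.

    Returns the adjusted p-values (p * m, capped at 1) and, for each test,
    whether it remains significant at family-wise level alpha (p <= alpha / m).
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p <= alpha / m for p in p_values]
    return adjusted, reject


# Hypothetical uncorrected p-values for visible areas of 72, 82, 90 and 95%.
p_uncorrected = [0.012, 0.030, 0.004, 0.049]
p_adjusted, significant = bonferroni(p_uncorrected)
```

With these placeholder values all four tests pass an uncorrected threshold of 0.05, but only two survive the corrected threshold of 0.05 / 4 = 0.0125, which is why the question of correction matters for the claim. The same logic extends to the number of time windows examined.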
To examine the tuning of the "second peak", the authors use as baseline the activity in an earlier phase of the visual response. Hence, if the neurons respond strongly early on, their response will decrease more strongly during the second peak. In other words, we suspect that the result presented in Figure 7, where activity during the second peak decreases when the visible area is large, is simply an artifact of this erroneous choice of time window for computing the baseline. Instead, the authors should use the pre-stimulus activity as baseline for all epochs. This choice implies, in the first equation of the subsection “Response change”, that b depends on i, the incoming input. In Figure 7D, E the y-axis is in units of "normalized response change". How was the normalization performed?
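The baseline concern can be illustrated with a toy calculation. The firing rates below are hypothetical, and the simple difference used for "response change" is an assumption made for illustration rather than the manuscript's equation.

```python
def response_change(late_rate, baseline_rate):
    """Change in firing rate (spikes/s) during the late epoch relative to a baseline."""
    return late_rate - baseline_rate


# Hypothetical rates (spikes/s): both conditions have the SAME late-epoch rate,
# but the unoccluded stimulus drives a much stronger early response.
pre_stimulus_rate = 5.0
conditions = {
    "unoccluded": {"early": 60.0, "late": 30.0},
    "occluded": {"early": 20.0, "late": 30.0},
}

# Baseline taken from the early visual response (the choice questioned above).
change_vs_early = {
    name: response_change(r["late"], r["early"]) for name, r in conditions.items()
}
# Baseline taken from pre-stimulus activity (the choice recommended above).
change_vs_pre = {
    name: response_change(r["late"], pre_stimulus_rate) for name, r in conditions.items()
}
```

In this toy case the early-response baseline yields an apparent decrease for the unoccluded stimulus and an increase for the occluded one, although the late-epoch rates are identical; with the pre-stimulus baseline the two conditions show the same change.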
The average PSTH of V4 subpopulation with two response peaks is quite distinct from the single cell examples. For the population, the second peak is not obvious and responses to unoccluded stimuli stay above the occluded shapes, unlike the single cells in Figures 4 and 5. What causes the discrepancy? Is it the variability of the time of the second peak in V4 neurons? It will be helpful to have a supplementary figure with more single cell examples.
1B) More generally, the claim is that V4 shape selectivity changes as a function of time, and missing is the more direct comparison of shape selectivity for the same neurons early versus later in the response.
2) The broad tuning of neurons in vlPFC seems qualitatively inconsistent with it playing a role in enhancing V4 shape selectivity. The model does not currently resolve this nor is an explanation provided.
3) There are concerns that the functional differences between the responses in PFC and V4 may follow from differences in the experimental paradigm. These include:
3A) What was the effect of tailoring stimuli for neurons in one area and not doing so in another area? Can the latency of responses or their tolerance to occlusion be influenced by tailoring stimuli for neurons?
3B) It is unclear why the authors focus on the vlPFC neurons that are influenced by occlusion. There are also many neurons that do not care about the occlusion and we wonder whether these cells include some neurons that are tuned to the shape of the stimuli. If yes, is their tuning better or worse than that of the neurons that are influenced by the occlusion?
What would Figure 3D look like if you included all of the 216 vlPFC neurons that were responsive during the test epoch?
Some cells have a stronger response if the visible area increases. Is it possible that these neurons are better tuned to the shapes than those neurons that are most active for high levels of occlusion?
Related to this last point: is it conceivable that the neurons that increase their response if there is more occlusion are tuned to some aspect of the occluders, e.g. the total surface area or the total perimeter of the occluding dots?
This latter possibility seems to be supported by the finding that the vlPFC preference for occlusion decreased when the occluders had the same color as the background (as is stated in the first paragraph of the subsection “Representation of occluded stimuli in vlPFC”).
There were 381 vlPFC neurons, 98 were significantly modulated by occlusion. How many of these 98 neurons are in monkey M vs. O?
4) Can the model replicate all aspects of the data? Including:
The shape selectivity index in vlPFC is considerably lower than in V4. Can the model in figure 9 work with reduced shape selectivity?
V4 cells are divided into two subpopulations, one with two peaks and another with a single peak. How can the feedback model explain that many V4 cells do not show two peaks in their responses to occluded stimuli? Are the authors assuming inhomogeneous feedback from vlPFC to V4? Alternatively, they may be suggesting that V4 population includes a continuum of responses that vary from single peak to double peaks and include in-between responses.
Can the model be adjusted to replicate population data in Figure 6? In its current form the model seems to best explain single cell examples. It is unclear how well these examples represent the population.
5) The gain mechanism that the authors propose in their model may be envisioned as PFC receiving two signals – an intertwined shape and occlusion signal and an occlusion-only signal, and then correcting the intertwined estimate with occlusion information. This seems like a bit of a chicken-and-egg problem: how and why would the brain extract occlusion level from occluded stimuli but not shape (preceding the locus at which it disambiguates shape)? Where could this V4-independent but occlusion-dependent gain modulation be coming from? The authors suggest IT as a potential source, but IT units receive input from V4 and also feed back to V4. Does the model assume that gain modulation arrives at vlPFC with the same latency as the V4 inputs? Why can't feedback from IT to V4 be the source of the second V4 response peak? The authors cite their SfN abstract for occlusion selective signals in IT. It will be useful to provide more explanation about the results, especially for readers who did not stop by the SfN poster. What if there are two visible stimuli: one that is occluded and one that is not? Would the gain modulation then be different for the two stimuli? What if one stimulus occludes another stimulus?
6) Can the model be simplified? The proposed dynamic model has multiple degrees of freedom and several nonlinearities. Is all this complexity necessary? In the current version of the paper, it is difficult to gain intuition about the model based on the text. Simplifying the model can shine light on which components and nonlinearities are indispensable and therefore reasonable targets for follow-up studies. For example, the adaptation term for V4 projections to vlPFC seems a little arbitrary. Why does such adaptation not happen for other connections in the model? A similar question can be asked about the half-rectified feedback from vlPFC to V4. Why are other connections in the model not half-rectified? More generally, we encourage the authors to explore the model space a little more extensively for simpler model architectures. As it stands, the network seems like a high-parameter model that one can tweak to get many different types of outputs. Establishing the necessity of the proposed architecture and parameterization requires a little more work.
7) On the interpretation of the effects in V4:
In the first paragraph of the subsection “Representation of occluded stimuli in vlPFC” the authors argue that the responses of some of the vlPFC neurons that are stronger if occlusion is stronger do not depend on the increased task difficulty or attentional demands, but this reasoning is unclear. Furthermore, the authors seem to have changed their mind in the fourth paragraph of the subsection “Representation of occluded stimuli in vlPFC”, where they argue that vlPFC may amplify weak signals and that vlPFC is engaged in tasks of greater difficulty or cognitive demand.
In the first paragraph of the subsection “Representation of occluded stimuli in vlPFC” the authors argue that because many vlPFC neurons were tuned to shape, these neurons therefore cannot reflect task difficulty or attentional demands. This argument does not hold because neurons may well be tuned to multiple aspects of a task.
8) How did you know whether penetrations were indeed in vlPFC? Was histology performed?
9) [Additional comment sent to the authors in response to authors’ plan for revision]: The authors should clarify if there is a real dichotomy between neurons with one versus two peaks, as well as how broad the distribution of the timing for the second peak is. Can they really convince the reader that they are not simply amplifying noise with their analysis? Furthermore, they now make the point that the tuning is stronger during the second peak. We would like to know if this is not simply predicted by the presence of extra spikes – i.e. the presence of a peak implies some extra spikes at a certain point in time.
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Dynamic representation of partially occluded objects in primate prefrontal and visual cortex" for further consideration at eLife. Your revised article has been favorably evaluated by David Van Essen (Senior Editor), a Reviewing Editor, and two reviewers.
The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as described below by reviewer #3. We envision that these revisions will be straightforward to carry out and that verification can be handled by the Reviewing Editor.
The authors have addressed the most critical comments. There are structural weaknesses in the dataset that keep alternative interpretations plausible. However, I believe the authors' interpretation is strengthened by the new analyses and the paper passes threshold for publication.
In their revision, Fyall et al. have addressed many of my concerns satisfactorily. They have made it clearer that some of the V4 neurons have a second peak that is more prominent in the case of occlusion (Figure 4C represents a compelling example). Also, the peak detection method is now more convincing and it is also better documented. It remains unclear whether activity in vlPFC indeed contributes to late V4 activity and it is therefore conceivable that there are additional areas that could contribute to late V4 activity. Yet, I do realize that demonstrating the causal link between vlPFC and V4 would require a different approach, which would be beyond the scope of the present contribution. However, establishing such a causal link might be an important topic for future research, and the authors could mention this point, which could be added to the paragraph of suggested future work (subsection “Response dynamics in V4”, fifth paragraph).
1) I find it difficult to understand why the vlPFC neurons do not respond so well when the occluders have the same color as the background (subsection “Representation of occluded stimuli in vlPFC”, second paragraph). I would suspect that the processes for shape recognition would remain the same. Or did the monkeys' performance show signs that this was not the case?
2) Several p-values are missing; three examples:
"The responses of most of these occlusion-sensitive neurons (71/98) increased with increasing occlusion level".
"Even for the small subset of vlPFC neurons that responded more strongly to unoccluded stimuli (27/98), shape selectivity was not stronger for unoccluded than occluded stimuli (see Figure 3—figure supplement 2A)."
"Shape selectivity for occluded shapes was significantly higher during the second peak than during the first peak."
3) "Of 85 neurons, 30 neurons (~35%) were classified as having two peaks". How were these cells distributed across the two monkeys?
4) The model with interactions between vlPFC and V4 still seems somewhat simplistic, as there are only a few neurons and the variation (in effect size and timing) across neurons shown in the figures is actually a variation across neurons in different models rather than a variation of neurons within the same model. In networks with many units and reciprocal connections, the network dynamics might actually work against variation across neurons. The authors should discuss this. It would be great if it were possible to show the same range of differences between neurons within the same model, but I will not insist on such a demonstration given that making such a larger model might require a substantial investment of time.
5) "We cannot rule out the possibility that IT responses also contribute to V4 responses during the second transient peak. However, our IT recordings suggest this is unlikely because, as in V4, shape selectivity in IT is stronger for unoccluded than occluded stimuli". Is it conceivable that some IT neurons also have two phases in their response where the second phase is more pronounced in the presence of occlusion? It would be great if the authors could look for this possibility in the previous data set by Namima and Pasupathy, 2016. If the two phases are there it would strengthen the paper, but it would also be interesting if that is not the case.
6) Equations 4/5: I failed to see the logic of these equations, would it be possible to clarify this? Equation 9: what is thr2?
7) I found Figure 8—figure supplement 6A confusing: how do you compute y/z for neurons with one peak?
https://doi.org/10.7554/eLife.25784.030
- Hannah Choi
- Eric Shea-Brown
- Anitha Pasupathy
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We thank Wyeth Bair, Gregory Horwitz and Dina Popovkina for helpful discussions and comments on the manuscript, and Yoshito Kosai for assistance with animal training and V4 data collection. Technical support was provided by the Bioengineering group at the Washington National Primate Research Center. This work was funded by NEI grant R01EY018839 to A Pasupathy, Vision Core grant P30EY01730 to the University of Washington, P51 grant OD010425 to the Washington National Primate Research Center, NSF grant DMS-1056125 to E Shea-Brown, and Washington Research Foundation Innovation Postdoctoral Fellowship in Neuroengineering to H Choi.
Animal experimentation: All animal procedures conformed to NIH guidelines and were approved by the Institutional Animal Care and Use Committee at the University of Washington (IACUC Protocol #4133-01).
- Nicole Rust, Reviewing Editor, University of Pennsylvania, United States
- Received: February 19, 2017
- Accepted: August 24, 2017
- Version of Record published: September 19, 2017 (version 1)
© 2017, Fyall et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.