Visual processing of informative multipoint correlations arises primarily in V2

  1. Yunguo Yu
  2. Anita M Schmid
  3. Jonathan D Victor (corresponding author)
  1. Weill Cornell Medical College, United States
Research Advance
Cite as: eLife 2015;4:e06604 doi: 10.7554/eLife.06604

Abstract

Using the visual system as a model, we recently showed that the efficient coding principle accounted for the allocation of computational resources in central sensory processing: when sampling an image is the main limitation, resources are devoted to compute the statistical features that are the most variable, and therefore the most informative (eLife 2014;3:e03722. DOI: 10.7554/eLife.03722 Hermundstad et al., 2014). Building on these results, we use single-unit recordings in the macaque monkey to determine where these computations—sensitivity to specific multipoint correlations—occur. We find that these computations take place in visual area V2, primarily in its supragranular layers. The demonstration that V2 neurons are sensitive to the multipoint correlations that are informative about natural images provides a common computational underpinning for diverse but well-recognized aspects of neural processing in V2, including its sensitivity to corners, junctions, illusory contours, figure/ground, and ‘naturalness.’

https://doi.org/10.7554/eLife.06604.001

Introduction

We recently showed (eLife 2014;3:e03722. DOI: 10.7554/eLife.03722 [Hermundstad et al., 2014]) how a normative theory based on the efficient coding principle (Barlow, 1961) can account for the allocation of resources for the representation of complex sensory features. Specifically, we analyzed the local statistics of natural images, and compared the variability of these statistics with their perceptual salience. The statistics that were the most variable—that is, the least predictable and therefore the most informative—were the most salient perceptually. This relationship, in which greater resources are allocated to more variable features, emerges from the efficient coding principle in the regime that the main constraint is input sampling (Barlow, 1961; van Hateren, 1992; Doi and Lewicki, 2014; Hermundstad et al., 2014). The observed relationship contrasts with the more familiar ‘whitening’ regime (Srinivasan et al., 1982), which emerges when the main constraint is output capacity (e.g., with regard to the retina and the optic nerve bottleneck); the whitening regime predicts that fewer resources are allocated to more variable features. We note that the results of (Hermundstad et al., 2014) provide empirical support for the hypothesis that input sampling, rather than output capacity, is the main constraint—since a transmission limit would have predicted a lower sensitivity for image statistics that were the most variable, the opposite of what we found.

To reach this result, we analyzed natural images via their multipoint correlations, that is, the statistics of the combinations of luminance values that appear in several points of the image. This approach has several advantages. First, it reduces the dimensionality of the space of image statistics that need to be considered, since it can be applied to binarized images, and it separates informative from uninformative statistics (Tkačik et al., 2010). Second, the approach enables rigorous tests of theoretical predictions, since the individual kinds of informative and uninformative multipoint correlations can be isolated in synthetic image sets (Victor and Conte, 2012). In contrast, the multipoint correlations in natural images covary in a complex manner that is difficult to capture or control. Synthetic image sets that isolate individual kinds of multipoint correlations are highly un-natural, but here this is an advantage: our predictions, which are derived from natural images, are tested in an out-of-sample fashion.

The information-theoretic framework of (Hermundstad et al., 2014) and (Tkačik et al., 2010) played a key role in this analysis, and we briefly summarize it here. We used a two-stage model: first, the informative multipoint features (as identified by [Tkačik et al., 2010]) are extracted from a visual image by a set of local nonlinear processing elements. Then, the output of this stage, that is, the frequency of each feature in patches of the image, is represented and transmitted by central visual areas, to serve as the basis for visual inferences (Figure 4C of [Hermundstad et al., 2014]). We used a linear channel with additive Gaussian noise as an approximation for this latter process. While obviously a simplification, this leads to an analytic solution (van Hateren, 1992) for the allocation of resources that maximizes the mutual information between stimuli and their central representation—and the analytic solution accurately accounted for dozens of independently-determined psychophysical parameters (Hermundstad et al., 2014).

Unaddressed, however, was where the extraction of multipoint correlations takes place. Several lines of evidence suggested that selective sensitivity to multipoint correlations arises in visual cortex (discussed in [Hermundstad et al., 2014]), but a direct demonstration was lacking.

Here, we report single-unit recordings in macaque V1 and V2, showing that neuronal selectivity for multipoint correlations is infrequent in V1, and becomes prominent in V2, especially in its supragranular layers. Well-recognized characteristics of V2 neurons, including sensitivity to corners, junctions (Das and Gilbert, 1999), illusory contours (von der Heydt et al., 1984), figure/ground (Qiu and von der Heydt, 2005), and ‘naturalness’ (Freeman et al., 2013) all entail sensitivity to multipoint correlations; here we show that this sensitivity is present even when these correlations are separated from their natural context.

Results

We recorded the extracellular activity of 421 individual neurons (269 in V1, 152 in V2) in the anesthetized, paralyzed macaque to stimulus sets that isolate the multipoint correlations previously studied in natural images (Tkačik et al., 2010; Hermundstad et al., 2014) and psychophysically (Victor and Conte, 1991, 2012).

The stimulus sets are illustrated in the top row of Figure 1 (see ‘Materials and methods’ for details). In the ‘random’ stimulus set, check colors are assigned independently, with an equal chance of being white or black. The six structured stimulus sets were as follows: The ‘even’ and ‘odd’ sets isolate the opposite extremes of the visually salient four-point correlation (Hermundstad et al., 2014), there denoted α. The ‘white triangle’ and ‘black triangle’ sets isolate the extremes of the visually salient three-point correlation (Hermundstad et al., 2014), there denoted θ. The ‘wye’ and ‘foot’ sets have multipoint correlations that are not visually salient (Victor and Conte, 1991); this is in keeping with the efficient coding principle because in natural images, these correlations are predictable from simpler quantities (Tkačik et al., 2010). We focused on three- and four-point correlations, since one- and two-point statistics (luminance and spatial contrast) are well-known to modulate responses throughout the visual system, beginning in the retina.

Example responses to multipoint correlations in V1 and V2.

Top row: examples of the stimulus sets used to isolate the different kinds of multipoint correlations. Six sets consist of 1024 16 × 16 binary checkerboards, each with a different statistical structure (left columns); the seventh set consists of 1024 16 × 16 random checkerboards (right column); see ‘Materials and methods’ for details. In each column, the row of PSTHs shows responses of a single neuron to 1024 examples of stimuli drawn from the seven sets. Responses are generally dominated by a transient increase or decrease in firing, occurring 70 to 100 ms after the onset of each stimulus. In some cases, the size or configuration of this transient depends on the type of multipoint correlation (for example, the units in the second row). The asterisks indicate responses to the structured stimulus sets (black) that are significantly different (see ‘Materials and methods’) from the responses to the random stimuli (light gray, beginning of each row). Decremental responses following contrast onset were present in both areas, but more often in V2. However, a decremental response was not a requirement for discriminating among multipoint correlations: outside of supragranular V2, there were many neurons that had incremental responses to the stimulus transient and distinguished among the types of multipoint correlation (for example, the third and fourth rows on the right).

https://doi.org/10.7554/eLife.06604.002

Figure 1 shows post-stimulus time histograms (PSTHs) of typical neurons in V1 and V2. Responses have a prominent transient after each stimulus transition, when on average half of the checks change from black to white or from white to black. For some neurons (e.g., the first, third, fourth, and sixth examples in V1), this transient is nearly identical for each of the stimulus sets. For other neurons (e.g., the second and fifth examples in V1, and most of the V2 examples), the transients differ in magnitude or configuration, suggesting a differential response to multipoint correlations.

To quantify these differences, we applied a shuffle test to the smoothed firing rates (see ‘Materials and methods’). Significant differences between responses to structured and random stimuli (the asterisks in Figure 1) were more common in V2 than in V1. For a more thorough characterization, we defined the ‘multipoint correlation discrimination index’ (MCDI), which counted not only the comparison between structured and random stimuli, but also comparisons among pairs of different structured stimuli. The MCDI was defined as the fraction of the 21 pairwise comparisons that differed by the above statistical criterion. An MCDI of 0 means that a neuronal response to all stimulus types is indistinguishable; an MCDI of 1 means that a neuronal response distinguishes between all stimulus pairings, and therefore, between all of the structured stimulus sets.
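As a rough illustration of this definition, the Python sketch below computes an MCDI for one neuron at one time point from a list of stimulus pairs that have already been flagged as significantly different. The function names and the input format are assumptions made for this example; the shuffle test that produces the flags is not reproduced here.

```python
from itertools import combinations

# The seven stimulus types named in the text (identifiers are illustrative).
STIMULUS_TYPES = ["random", "even", "odd", "white_triangle",
                  "black_triangle", "wye", "foot"]


def all_pairs(types=STIMULUS_TYPES):
    """Return every unordered pair of stimulus types (7 choose 2 = 21)."""
    return list(combinations(types, 2))


def mcdi(significant_pairs, types=STIMULUS_TYPES):
    """Fraction of pairwise comparisons flagged as significantly different.

    significant_pairs: iterable of 2-tuples of stimulus names
    (hypothetical input format); order within a pair is ignored.
    """
    pairs = all_pairs(types)
    flagged = {frozenset(p) for p in significant_pairs}
    hits = sum(1 for p in pairs if frozenset(p) in flagged)
    return hits / len(pairs)


# Hypothetical neuron that separates 'even' from 'random' and from 'odd'.
example = mcdi([("even", "random"), ("odd", "even")])
```

With seven stimulus types there are 21 pairs, so a hypothetical neuron that separates two pairs has an MCDI of 2/21 ≈ 0.095, and one that separates three pairs has an MCDI of 3/21 ≈ 0.143.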

Figure 2A (upper row) summarizes the MCDI across the neuronal population. The average MCDI peaked at a value of approximately 0.05 in V1, and approximately 0.10 in V2; this difference became significant at 70 ms after stimulus onset.

Figure 2 (with 1 supplement)
Differential sensitivity to multipoint correlations arises intracortically, primarily in V2, and is selective for informative (Tkačik et al., 2010) multipoint correlations.

(A) The multipoint correlation discrimination index (MCDI, see ‘Materials and methods’) for all stimulus types. Upper panels include all neurons in each area, lower three rows subdivide according to lamina. Mean (dark red), median (gray), and 75th percentile (dark green). 25th percentile is 0 in all cases. The red dots in the upper right panel indicate a significant difference between V2 and V1 (p < 0.05, two-tailed, Wilcoxon rank-sum test, false-discovery-rate corrected). The number in the upper left of each panel indicates the number of units analyzed. (B) Mean values of the stimulus-specific MCDI. The stimuli with the highest contributions are the ones that contain correlations that are informative for natural images (Tkačik et al., 2010): even (red), odd (green), white triangles (yellow), and black triangles (blue). In contrast, the others (random (black), wye (magenta), and foot (cyan)) are uninformative for natural images, and contributed little to the MCDI. (C) Pairwise discrimination of the multipoint correlation types. The grayscale shows the average pair-specific MCDI, which is the fraction of neurons that respond differentially at any time from 55 to 250 ms following stimulus onset. The stimuli for each row and column are indicated by the same color code as in panel B. Note that panel A shows the overall MCDI, panel B shows the stimulus-specific MCDI, and panel C shows the pair-specific MCDI. (D) Multidimensional scaling of the pair-specific MCDI. The distance between two points corresponds to the fraction of neurons that respond differentially to the two corresponding types of multipoint correlation. A semitransparent gray plane marks the 0-value along the vertical. Note that in V2, especially in the supragranular layer, there is a wide separation between even and odd stimuli, and between black and white triangle stimuli, and these separations lie on different axes.

https://doi.org/10.7554/eLife.06604.003

A laminar analysis of the MCDI (Figure 2A, lower three rows) revealed a slight increase from the V1 granular (input) layer (mean 0.025) to the V1 extragranular layers (supragranular: 0.033, infragranular, 0.045), followed by a jump at the V2 granular layer (0.101), with a marked upsurge in the V2 supragranular layer (0.162), but not the infragranular layer (0.052). The difference between the MCDI in supragranular V2 and each of the other compartments was significant, except for the comparison with granular V2 (p = 0.053). The median value of the MCDI in supragranular V2 was 0.12, meaning that the ‘typical’ neuron responded differentially for 2 or 3 of the 21 pairwise comparisons. In all other compartments (in V1 and V2), the median was 0, that is, the ‘typical’ neuron did not distinguish between any of the stimulus types. Atypical neurons in V1 did distinguish among multipoint correlations. These were primarily neurons in the infragranular layer and with large receptive fields (RFs)—see Figure 2—figure supplement 1. But overall, the mean MCDI was lower in V1 (0.027) than in V2 (0.081), especially in its supragranular compartment (0.162).

Sensitivity to multipoint correlations was not restricted to specific cell types. Specifically, the MCDI was not significantly associated with the simple vs complex distinction, nor with the distinction between regular-spiking and fast-spiking neurons, as determined by extracellular action potential shapes (Niell and Stryker, 2008). Sensitivity to multipoint correlations was also present in isolated units that did not have overt RFs by hand-mapping or by reverse correlation (81/269 in V1 and 65/152 in V2); these units had waveforms that were isolated by the tetrode recordings, and likely include many of the ‘unresponsive’ neurons (Olshausen and Field, 2004) that would not have been selected for study with single-electrode methods. There was no significant difference in the MCDI between these neurons and the simultaneously-recorded neurons with mappable RFs, either in V1 or V2. Among the neurons with mappable RFs, the MCDI was not significantly different for neurons whose RFs were above vs below the median size for their laminar compartment. Thus, the sensitivity to multipoint correlations does not require a precise match between the RF size and the spatial scale of the correlations. In sum, sensitivity to multipoint correlations was widely distributed across V2 neurons.

The difference in sensitivity to multipoint correlations between V1 and V2 was not due to a difference in RF size, nor to stimulus scaling (i.e., the number of stimulus checks within the RF). The upper left panel of Figure 2—figure supplement 1 compares MCDI across V1 and V2 as a function of RF area; across the entire range of sizes, the MCDI is higher in V2 than in V1. The upper right panel makes this comparison as a function of the number of checks within the RF, which also equates neurons whose RFs covered the same fraction of the stimulus area. Here too, the MCDI in V2 was larger than in V1. The remaining rows of Figure 2—figure supplement 1 break the analysis down by laminar compartment. In granular and supragranular layers, the above observations hold, but there is a suggestion of a subset of V1 neurons with large RFs (lower left panel) that are sensitive to multipoint correlations. However, it is unlikely that this subpopulation underlies the high MCDI seen in V2: the targets of infragranular V1 (Felleman and Van Essen, 1991) are the superior colliculus (layer 5) and the lateral geniculate (layer 6), while the inputs to V2 arise mainly from layers 2, 3, and 4b, where the MCDI is low. Moreover, the difference between supragranular V2 and the V2 input layer strongly suggests that the behavior in supragranular V2 is a result of intrinsic processing in V2, not a feature of signals passed on by V1 (which would already have been present in the granular layer).

Figure 2B shows that the multipoint correlations that contribute to the MCDI are the ones previously identified as being informative about natural images (Tkačik et al., 2010) and perceptually salient (Victor and Conte, 1991), namely, the even, odd, white triangle, and black triangle stimuli. Figure 2C further breaks down the MCDI into the individual pairwise comparisons. Few neurons, either in supragranular V2 or across the population, discriminated among pairs of the stimuli with uninformative multipoint correlations (random, wye, and foot). To visualize the pattern of discrimination across the neuronal population, we applied multidimensional scaling to the data of Figure 2C. This led to a three-dimensional representation (Figure 2D) in which the seven stimulus types are represented by points, and the distance between the points corresponds to the fraction of neurons that distinguishes between them (i.e., the average pair-specific MCDI across the population). In V1, points are clustered near the origin, since most neurons cannot distinguish between any stimulus types. In V2, the representation expands into a multidimensional space. The two opposite stimulus pairs (even vs odd, and white triangle vs black triangle) are separated along different axes. Correspondingly, psychophysical studies show that the even-vs-odd gamut, and the white triangle-vs-black triangle gamut are independent perceptual axes (Victor and Conte, 2012) (Figure 8 panel 2 of reference [Victor and Conte, 2012], and the [θ, α]-panels of Figure 3 of Hermundstad et al. (2014)). Human perceptual sensitivities are larger for the four-point configuration than for the three-point configurations (Victor and Conte, 2012; Hermundstad et al., 2014); this is mirrored by higher values of the MCDI for the even stimuli than for the white triangle or black triangle stimuli in supragranular V2 (Figure 2D).

However, there are some differences between representation of informative multipoint correlations in the V2 population (as shown in Figure 2D) and human psychophysics (Victor and Conte, 2012; Hermundstad et al., 2014). First, the points corresponding to the uninformative stimuli are close to, but not superimposed on, the random stimulus. Additionally, while psychophysical sensitivity to the odd stimulus is only about 25% less than sensitivity to the even stimuli (Victor and Conte, 2012), the MCDI for the odd stimulus is much lower. We note that the odd stimulus contains even correlations when analyzed at spatial scales larger than a single check (Victor and Conte, 1989), so neuronal mechanisms sensitive to the even correlation will also contribute to the perceptual salience of the odd stimulus. More generally, the discrepancies between V2 neuronal activity and perception may reflect the simple measure used for quantifying discrimination at the population level (the average MCDI and multidimensional scaling), as well as further neural processing between V2 and perception.

Discussion

Building on recent findings that the perceptual salience of complex (multipoint) image statistics is governed by their informativeness in natural images, here we show that selective sensitivity to these image statistics arises primarily in V2. Within V2, the greatest sensitivity is in the supragranular layers, where the typical (median) neuron can distinguish between two or three of the stimulus pairs. In contrast, typical neurons in V1 do not distinguish between any of the stimuli, although there appears to be a subpopulation of large-RF neurons in infragranular V1 with a modest level of selective sensitivity. The overall pattern of neuronal sensitivity to image statistics (Figure 2D) resembles the sensitivity of human observers, driven primarily by the multipoint statistics that are visually salient.

We speculate that sensitivity to informative multipoint correlations is the computational underpinning of many of the changes in neural characteristics from V1 to V2 that have previously been noted—sensitivity to corners, junctions (Das and Gilbert, 1999), illusory contours (von der Heydt et al., 1984), figure vs ground (Qiu and von der Heydt, 2005), and ‘naturalness’ (Freeman et al., 2013). The distinction between informative and uninformative multipoint correlations emerged from a formal information-theoretic analysis of natural images (Tkačik et al., 2010). While this analysis did not relate ‘informativeness’ to these other characteristics, inspection of the examples of Figure 1 suggests several points of contact. With regard to junctions and contours, examples of the odd ensemble images contain large numbers of corners, while examples of the even ensemble contain large numbers of crossings. The extended contours evident in the even ensemble are a kind of illusory contour, since the polarity changes that define them undergo random flips, which would confound a linear edge detector. With regard to figure vs ground, stimuli in the black triangle and white triangle ensembles appear to contain, respectively, black figures on white backgrounds, vs white figures on black backgrounds—even though the stimulus sets are matched for spatial frequency content and the number of black and white checks. Thus, informative multipoint correlations result in images that are enriched for junctions, contours, and objects, compared to images that have the same first- and second-order statistics but lacking these correlations. While the extent to which these local features account for ‘naturalness’ remains for future work, the present results show that selective sensitivity of V2 neurons for informative multipoint correlations persists even when they are removed from the context of a natural image.

It is unclear to what extent it is necessary to match the scale of a multipoint correlation with that of an illusory contour or junction in order for the visual feature to be extracted. However, the distinction between informative and uninformative multipoint statistics holds over at least a fourfold range of length scales (the entire range analyzed, SI figure 14 of [Tkačik et al., 2010]). Human sensitivity to these correlations is present over at least a similar range of check sizes (0.03–0.25 deg, Figure 2, 8 of [Victor and Conte, 1989]; also [Conte et al., 2014]), comparable to the range of check sizes used in this study (0.08–0.5 deg). This broad range of sensitivities is found even when stimuli are restricted in eccentricity (Victor and Conte, 1989). Figure 2—figure supplement 1 (right column) shows that V2 sensitivity to multipoint correlations also does not require a close match between RF size and the scale of the multipoint correlation; this sensitivity is present over a threefold range of length scales (i.e., a 10-fold range of the number of checks per receptive field). Thus, it is likely that the entire range of scales relevant to perception can be accounted for by the properties of individual neurons, along with the variation in RF sizes at each eccentricity (Hubel and Wiesel, 1968).

Neurons whose RFs are difficult to map are often ignored in physiologic studies (Olshausen and Field, 2004). We were able to analyze their responses here because of the tetrode recording method, and found that many V2 neurons whose RFs could not be mapped nevertheless often showed selective sensitivity to multipoint correlations. We consider some possible reasons for this here. As defined in this paper, a neuron is considered to have a mappable RF if the reverse correlation of the neuron's responses to the stimulus passes a statistical criterion (see ‘Materials and methods’). Standard practice is to use random binary stimuli for this mapping procedure (Reid et al., 1997); here we include stimuli with high-order correlations in the mapping computation. The rationale is that inclusion of these stimuli allows some kinds of nonlinear responses to emerge in a first-order cross-correlation between stimulus and response, because of correlations within the stimuli (Schmid et al., 2011). But even an expanded stimulus set may not reveal the RFs of all neurons that respond to multipoint correlations. Reverse correlations may not exceed our statistical threshold because of response variability, or because the neuron is only responsive to stimulus configurations that occur very rarely in the stimulus set. We also note that from a computational point of view, our assay for sensitivity to multipoint correlations is independent of whether the neural response is correlated with the state of any single check: for each of the ensembles that probe a different kind of multipoint correlation, the number of black and white checks are equated, at each location. Thus, it is quite possible for a neuron to process information in a localized region of space (as manifest by its sensitivity to multipoint correlations) yet fail to have a RF that is measurable by reverse correlation methods, as we show here.

Finally, our findings carry implications for neural mechanisms. Many biologically-plausible mechanisms can extract multipoint correlations, including a simple linear-nonlinear cascade (provided that the nonlinearity is more than quadratic), and modulatory surrounds (Schmid and Victor, 2014; Self et al., 2014). But models need to account for the specificity of the responses, not just their existence. In this regard, we note (Victor and Conte, 1991) that the specificity we observe can be produced by a two-stage (linear-nonlinear-linear-nonlinear) cascade, in which the first linear-nonlinear element detects local edges, and the second one combines signals from collinear edges via a second threshold. Removal of either component of the second stage—either its linear or the nonlinear element—eliminates this specificity. The finding that responses to multipoint correlations are more prominent in supragranular V2 than in its input layers or in V1 suggests possible correspondences between this cascade and neural circuitry. One possibility is that the first stage is in V1 and the second stage is in V2 (Wilson et al., 1992; Rust et al., 2005). Alternatively, the two linear-nonlinear stages may represent two loops of signal passage through a recurrent network within a single cortical area (Joukes et al., 2014).

Materials and methods

All procedures conformed to the guidelines provided by the US National Institutes of Health and Weill Cornell Medical College Animal Care and Use Committee. Full details concerning the physiologic preparation are provided in Schmid et al. (2014), and are summarized here.

Preparation

Single-unit recordings using arrays of three to six independently positioned tetrodes (typical input resistance, 1–2 MΩ; Thomas Recording GmbH, Giessen, Germany) were made in V1 and V2 of 14 macaques, anesthetized with propofol and sufentanil and paralyzed with vecuronium or rocuronium. Tetrodes were placed on opposite sides of the V1/V2 boundary, and typically within 1 mm of each other within each region, so that the units recorded by the tetrodes generally had neighboring or overlapping RFs. This yielded a total dataset of 421 neurons (269 in V1, 152 in V2), following spike sorting and selection for firing rate criteria (see below).

Initial neuronal characterization

Tetrodes were independently lowered until they recorded visually-driven extracellular action potentials. After initial hand-mapping, tuning properties were determined from responses to 3–4 s presentations of drifting sinusoidal gratings. Stimulus parameters were successively refined in the order of orientation, spatial frequency, temporal frequency and contrast, based on on-line analysis of the responses of the target unit. When the recorded cluster had well-isolated units that preferred an orientation other than the preferred orientation for the target unit, this process was repeated for a second, and rarely a third, orientation as well.

Characterization of sensitivity to multipoint correlations

To determine neuronal responses to multipoint correlations, we measured responses to a sequence of black-and-white checkerboards that isolated the individual kinds of correlation. Figure 1 (top) shows three examples of these seven stimulus types. Each stimulus consisted of a 16 × 16 array of black and white checks. In the ‘random’ stimulus set, check colors were assigned independently, with an equal chance of being white or black. In the other stimulus sets, the coloring rule isolated a single kind of multipoint correlation. In the ‘even’ set, there was always an even number of white (or black) checks in any 2 × 2 neighborhood of checks. In the ‘odd’ set, there was always an odd number of white (or black) checks in a 2 × 2 neighborhood. Even and odd sets are the opposite extremes of the visually salient four-point correlation (Hermundstad et al., 2014), α. In the ‘white triangle’ set, there were always one or three white checks within a triangular region; in the ‘black triangle’ set, there were always one or three black checks within a region of the same shape. These two sets correspond to opposite extremes of the visually salient three-point correlation (Hermundstad et al., 2014), θ. We also examined responses to four-point correlations in two other spatial configurations, ‘wye’ and ‘foot.’ Multipoint correlations in the wye and foot configurations are predictable from simpler quantities in natural images (Tkačik et al., 2010), and, in keeping with predictions of efficient coding (Hermundstad et al., 2014), they are not visually salient (Victor and Conte, 1991).
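The even and odd coloring rules can be sketched in Python as follows. This is a minimal illustration, not the authors' stimulus code: it assumes white is coded as 1 and black as 0, seeds the first row and first column at random, and then fills each remaining check so that every 2 × 2 neighborhood has the required parity. The function names are hypothetical.

```python
import random


def parity_checkerboard(n=16, odd=False, rng=None):
    """Return an n x n board (lists of 0/1) obeying the even or odd 2 x 2 rule.

    Assumption for this sketch: 1 = white, 0 = black.
    """
    rng = rng or random.Random()
    board = [[0] * n for _ in range(n)]
    # The first row and first column are free: color them independently.
    for j in range(n):
        board[0][j] = rng.randint(0, 1)
    for i in range(1, n):
        board[i][0] = rng.randint(0, 1)
    # Every other check is fixed by its three already-colored neighbors.
    for i in range(1, n):
        for j in range(1, n):
            board[i][j] = (board[i - 1][j - 1] ^ board[i - 1][j]
                           ^ board[i][j - 1] ^ int(odd))
    return board


def block_parities(board):
    """Set of parities (0 = even, 1 = odd) of white counts over all 2 x 2 blocks."""
    n = len(board)
    return {(board[i][j] + board[i][j + 1]
             + board[i + 1][j] + board[i + 1][j + 1]) % 2
            for i in range(n - 1) for j in range(n - 1)}


even_board = parity_checkerboard(16, odd=False, rng=random.Random(0))
odd_board = parity_checkerboard(16, odd=True, rng=random.Random(0))
```

Because each 2 × 2 neighborhood contains four checks, an even (or odd) count of white checks implies an even (or odd) count of black checks, which is why the rule can be stated for either color.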

Check size was scaled to the RF size of the target neuron so that approximately two checks corresponded to one lobe of the optimal spatial frequency, and orientation was set according to the orientation preference of the target neuron. This resulted in about 8 checks within the classical RF (V1: mean 7.40, median 6.00, SD 5.33; V2: mean 8.68, median 7.00, SD 5.46; statistics across all mappable units and not just the target; see below for details on RF mapping and Figure 2—figure supplement 1 for the distribution of number of checks in the RF); thus, the checks are within the resolution limits of the neuron, and the stimulus patch covers an area that is substantially larger than the RF. Across all recordings (including mappable and un-mappable units), check sizes ranged from 0.08 to 0.5 deg (V1: mean 0.18, median 0.20, SD 0.05; V2: mean 0.22, median 0.20, SD 0.12).

For each type of stimulus, we presented 1024 examples (two repetitions each) for 320 ms, interleaved in a pseudorandom sequence. This large set size was chosen so that we could distinguish average responses to each of the stimulus sets (our focus) from responses that might be driven by the specific white or black checks or edges present in particular examples (a potential confound).

Stimuli were generated via a Markov recurrence rule (Victor and Conte, 1991, 2012), so that other than the constraint of their defining multipoint correlation, they are as random as possible (maximum-entropy). This yields stimulus sets that enable testing of each kind of multipoint correlation in isolation. In each set, there are no two-point correlations—checks at any pair of locations are colored independently—so that the sets have the same power spectra, and therefore the same spatial frequency content. The four kinds of correlations (the even/odd axis, the black triangle/white triangle axis, wye, and foot) are independently controlled: each set extremizes one of these correlations, while keeping all the others at 0 (Gilbert, 1980). Thus they provide a way to assay responsiveness to each kind of multipoint correlation in isolation.
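The absence of two-point correlations can be checked exhaustively on a small board. The Python sketch below does this for the even rule only, under the assumption that the ensemble is uniform over all boards satisfying the rule (so that each board corresponds to one choice of first row and first column). It enumerates every such 3 × 3 board and tallies the joint colors at each pair of locations; the function names are hypothetical.

```python
from itertools import combinations, product


def even_board(first_row, first_col):
    """Fill an n x n board from its first row and column using the even rule."""
    n = len(first_row)
    board = [[0] * n for _ in range(n)]
    board[0] = list(first_row)
    for i in range(1, n):
        board[i][0] = first_col[i]
    for i in range(1, n):
        for j in range(1, n):
            board[i][j] = board[i - 1][j - 1] ^ board[i - 1][j] ^ board[i][j - 1]
    return board


def pairwise_joint_counts(n=3):
    """Tally joint colors for every pair of locations over all even n x n boards."""
    boards = []
    for bits in product((0, 1), repeat=2 * n - 1):
        first_row = bits[:n]
        first_col = (bits[0],) + bits[n:]  # the corner check is shared
        boards.append(even_board(first_row, first_col))
    locations = [(i, j) for i in range(n) for j in range(n)]
    counts = {}
    for a, b in combinations(locations, 2):
        tally = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
        for bd in boards:
            tally[(bd[a[0]][a[1]], bd[b[0]][b[1]])] += 1
        counts[(a, b)] = tally
    return counts


def has_no_two_point_correlation(n=3):
    """True if all four joint colorings are equally frequent at every pair."""
    total = 2 ** (2 * n - 1)
    return all(set(t.values()) == {total // 4}
               for t in pairwise_joint_counts(n).values())
```

Equal frequencies of the four joint colorings at every pair of locations mean that the two checks are colored independently, which is the property that equates the power spectra of the sets.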

All stimuli were rendered on a 1280 × 1024-pixel display at 100 Hz, using either a 21-inch ViewSonic G225f monitor (mean luminance 47 cd/m², gamma-corrected) or a Sun GDM5410 monitor (mean luminance 46 cd/m², gamma-corrected) at 114 cm. Control signals for the displays were generated by a PC-based system using OpenGL software.

Spike sorting

After bandpass filtering (300–9000 Hz) and thresholding, waveforms were clustered using customized versions of KlustaKwik and Klusters (Hazan et al., 2006); details as in Schmid et al. (2014). The 17 features consisted of peak and trough amplitudes (8 features), the first 8 principal components, and time. All neurons whose mean firing rates across all stimuli were ≥ 1 Hz were analyzed for their responsiveness to the multipoint correlation stimuli described above.

To classify extracellular spike waveforms as narrow-spiking (putative inhibitory) and broad-spiking (putative excitatory), we used a method similar to that of Mitchell et al. (2007) and Niell and Stryker (2008). For each single unit, the waveforms from each tetrode channel were averaged and the channel with the largest signal-to-noise ratio (SNR) was selected for the spike width measurement. Two parameters of spike width were measured: (1) trough-to-peak width, the duration from the trough to the peak of the waveform, and (2) half-peak width, the duration from the peak of the waveform to half its height. The distributions of both measurements across the 1856 waveforms from the laboratory database were significantly bimodal (p < 0.01 by the Hartigan dip test [Hartigan and Hartigan, 1985]). Based on the notch in the distribution, we classified extracellular waveforms as narrow-spiking (<405 µs) and broad-spiking (>430 µs). Next, the averaged waveforms themselves were clustered using k-means. The k-means clustering of the waveforms and the split based on the distribution of the spike width parameters separated the units identically.
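The width-based classification can be sketched as follows. This is a toy illustration, not the laboratory's analysis code. It assumes that the 405 µs and 430 µs cutoffs apply to the trough-to-peak width (the text does not say which of the two measures they refer to), that units falling between the cutoffs are left unclassified, and that the sampling interval and example waveform are invented for the demonstration.

```python
def trough_to_peak_us(waveform, sample_interval_us):
    """Duration (in microseconds) from the waveform trough to the following peak."""
    trough = min(range(len(waveform)), key=waveform.__getitem__)
    # Search for the peak only in the samples at or after the trough.
    peak = max(range(trough, len(waveform)), key=waveform.__getitem__)
    return (peak - trough) * sample_interval_us


def classify_width(width_us, narrow_below=405.0, broad_above=430.0):
    """Label a unit from its spike width; the gap between cutoffs is an assumption."""
    if width_us < narrow_below:
        return "narrow"
    if width_us > broad_above:
        return "broad"
    return "unclassified"


# Invented averaged waveform: trough at sample 10, peak at sample 20.
toy_waveform = [0.0] * 8 + [-1, -3, -5, -3, -1, 0, 1, 1.5, 2, 2.5, 2.8, 2.9, 3, 2.5, 2, 1, 0]

fast_width = trough_to_peak_us(toy_waveform, sample_interval_us=25)  # 10 samples * 25 us
slow_width = trough_to_peak_us(toy_waveform, sample_interval_us=50)  # 10 samples * 50 us
```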

Localization of recording sites

At the conclusion of the experiment, we made small lesions at locations that bracketed the recording sites along each tetrode track, via current passage through the most distal tetrode contact. Details concerning the procedures for lesions, perfusion, and histology are in Schmid et al. (2014). For sites for which the laminar assignment was uncertain, neurons were included in the tallies for V1 and V2 (e.g., top rows of each panel of Figure 2) and the statistical comparisons between them, but not in the breakdown by layer or statistical comparisons between layers. This amounted to <10% of the units.

Data analysis

Tuning curves were computed in the standard fashion from the Fourier components of the spike train elicited by each grating stimulus, as detailed in Schmid et al. (2014). Tuning curve peaks were determined from the DC response (F0) or the first harmonic (F1), whichever was larger. We classified neurons as simple or complex according to whether their response to a drifting grating was primarily at the period of the grating (simple) or primarily a maintained elevation (complex), as quantified by the F1/F0 ratio (Skottun et al., 1991). Here, F1 is the first harmonic of the response to the optimal grating tested, and F0 is the maintained firing rate of the response, after subtraction of the average firing rate in response to a uniform field at the mean illumination. Note that since grating parameters were chosen according to the preference of neurons whose waveforms could be discriminated online, some neurons may not have been stimulated at the optimal orientation or spatial frequency.
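As a rough illustration of the F1/F0 measure, the sketch below computes the ratio from a binned firing rate that spans a whole number of grating cycles. The function names, bin width, temporal frequency, and synthetic responses are assumptions made for the example, and the dividing point of 1 is only one of the two criteria mentioned in this section.

```python
import cmath
import math


def f1_f0_ratio(rates, bin_s, temporal_freq_hz, baseline=0.0):
    """F1/F0 for a binned rate covering an integer number of grating cycles.

    F0 is the mean rate minus the uniform-field baseline; F1 is the peak
    amplitude of the Fourier component at the grating's temporal frequency.
    """
    n = len(rates)
    f0 = sum(rates) / n - baseline
    if f0 <= 0:
        raise ValueError("F0 must be positive to form the ratio")
    component = sum(
        r * cmath.exp(-2j * math.pi * temporal_freq_hz * k * bin_s)
        for k, r in enumerate(rates)
    ) / n
    f1 = 2 * abs(component)  # factor 2 converts to peak amplitude
    return f1 / f0


def classify_cell(ratio, dividing_point=1.0):
    return "simple" if ratio > dividing_point else "complex"


# Synthetic 1 s responses to a 4 Hz drifting grating, in 1 ms bins.
bin_s = 0.001
times = [k * bin_s for k in range(1000)]
simple_like = [max(0.0, 50 * math.sin(2 * math.pi * 4 * t)) for t in times]
complex_like = [20 + 2 * math.sin(2 * math.pi * 4 * t) for t in times]

ratio_simple = f1_f0_ratio(simple_like, bin_s, 4.0)
ratio_complex = f1_f0_ratio(complex_like, bin_s, 4.0)
```

A half-wave rectified sinusoid gives a ratio close to π/2, which is why strongly modulated responses fall on the simple side of the dividing point.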

RF maps were determined by correlating the neural response with the checkerboard stimuli (16 × 16 checks, coded as 1 for white checks and −1 for black checks). The response measure was the total number of spikes over the duration of each presentation (320 ms) averaged across both repetitions; this is equivalent to computing the spike-triggered average and then summing over the stimulus duration. Maps were separately computed for each of the seven stimulus types; as reported previously (Schmid et al., 2012), some neurons that did not have mappable RFs for random checkerboards nevertheless had mappable RFs for the other stimulus types. Statistical significance for each of these seven maps was determined by a shuffle test: we recomputed maps from 500 surrogate data sets in which the responses to each stimulus type were permuted, determined the mean and standard deviation of these surrogate maps at each check, and then used the corresponding Gaussian distribution to determine which actual map values were significant at p < 0.05 (two-tailed, correcting for multiple comparisons via the Benjamini-Hochberg false discovery rate [FDR] method; Benjamini and Hochberg, 1995). We then determined the union of the seven maps obtained from each stimulus type. Usually this yielded a single connected component, and the RF was taken to be its convex hull. When more than one connected component was present, smaller components were merged with the largest one if they were separated by no more than a single check, and the convex hull of the resulting region was taken as the RF. The number of checks in this convex hull was taken as the measure of RF size. If none of the seven classes of stimuli yielded a significant RF map by the above criteria, the neuron was considered not to have a mappable RF. As an alternative procedure, we also computed RF maps by correlating the responses with all (7 × 1024) stimuli, and this yielded very similar results.
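The map-and-shuffle step can be sketched on a toy problem as follows. This is a simplified illustration under stated assumptions: the stimuli are a 4 × 4 array flattened to 16 checks rather than 16 × 16, the synthetic neuron and its noise are invented, only 100 surrogates are drawn, and the multiple-comparison correction, the union across stimulus types, and the convex hull are omitted.

```python
import math
import random
import statistics


def rf_map(stimuli, responses):
    """Correlate responses with +1/-1 check values; one map value per check."""
    n = len(stimuli)
    n_checks = len(stimuli[0])
    return [
        sum(r * s[k] for s, r in zip(stimuli, responses)) / n
        for k in range(n_checks)
    ]


def shuffle_z_scores(stimuli, responses, n_surrogates=500, seed=0):
    """Z-score each map value against maps recomputed from permuted responses."""
    rng = random.Random(seed)
    actual = rf_map(stimuli, responses)
    shuffled = list(responses)
    surrogate_maps = []
    for _ in range(n_surrogates):
        rng.shuffle(shuffled)  # break the stimulus-response pairing
        surrogate_maps.append(rf_map(stimuli, shuffled))
    z = []
    for k, value in enumerate(actual):
        column = [m[k] for m in surrogate_maps]
        z.append((value - statistics.mean(column)) / statistics.stdev(column))
    return z


def two_tailed_p(z_value):
    """Two-tailed p-value under a standard Gaussian."""
    return math.erfc(abs(z_value) / math.sqrt(2))


# Invented neuron whose response depends only on check 5 of a 16-check stimulus.
rng = random.Random(1)
stimuli = [[rng.choice((-1, 1)) for _ in range(16)] for _ in range(200)]
responses = [5 + 3 * s[5] + rng.gauss(0, 1) for s in stimuli]

z = shuffle_z_scores(stimuli, responses, n_surrogates=100)
best_check = max(range(16), key=lambda k: abs(z[k]))
```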

To measure sensitivity to multipoint correlations, we proceeded as follows (see Figure 3). For each of the stimulus types, we accumulated a PSTH across all 1024 examples (and 2 repeats), and then determined the smoothed firing rate via local linear regression (Loader, 2012). Significance of the difference between two firing rate functions was determined by a shuffle test, in which 3000 surrogate data sets were created by randomly exchanging responses among a pair of stimulus types. The exchanges were limited to responses that were recorded in adjacent trials (within 4 s of each other), to avoid confounds due to slow changes in firing rate over time. The difference between the smoothed firing rates of the actual data was compared to the distribution of differences seen in the surrogate datasets at each 5 ms bin, from 55 to 250 ms. The number of the 3000 surrogates in which the actual difference was exceeded yielded a raw two-tailed p-value at each of these 40 time points. If the raw p-value was below the threshold obtained by false discovery correction at a significance level of 0.05, the neuron was considered to have a different response to the two kinds of stimuli at that time point. For each neuron, the MCDI at each time point (Figure 2A) was defined as the fraction of stimulus pairs that elicited statistically different responses as determined by the above procedure; the MCDI was therefore n/21, where n is the number of stimulus pairs that elicited statistically different responses. For each of the seven stimuli, we also calculated a stimulus-specific MCDI, considering only the six pairs of discriminations involving that particular stimulus (Figure 2B); this was therefore a quantity n/6. Finally, to detail the pattern of pairwise discriminations (Figure 2C,D), we computed a ‘pair-specific’ MCDI, either 0 (no discrimination) or 1 (discrimination), and averaged it across the population. For this purpose, we considered a neuron to distinguish a pair of stimuli if a difference was present at any time during the 55–250 ms period, again using the above statistical criteria.
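The bookkeeping behind the MCDI and its stimulus-specific variant is simple enough to sketch directly. In the example below, the stimulus labels and the list of significant pairs are invented for illustration; the significance decisions themselves would come from the shuffle test described above.

```python
from itertools import combinations

# Labels for the seven stimulus types (names chosen here for illustration).
STIMULI = ["random", "even", "odd", "white_triangle", "black_triangle", "wye", "foot"]


def mcdi(significant_pairs):
    """Fraction of the 21 stimulus pairs that elicited different responses."""
    significant = {frozenset(pair) for pair in significant_pairs}
    all_pairs = list(combinations(STIMULI, 2))  # 7 choose 2 = 21 pairs
    n = sum(frozenset(pair) in significant for pair in all_pairs)
    return n / len(all_pairs)


def stimulus_specific_mcdi(significant_pairs, stimulus):
    """Fraction of the 6 pairs involving `stimulus` that were discriminated."""
    significant = {frozenset(pair) for pair in significant_pairs}
    others = [s for s in STIMULI if s != stimulus]
    n = sum(frozenset((stimulus, other)) in significant for other in others)
    return n / len(others)


# Invented outcome for one neuron at one time point.
example_pairs = [("random", "even"), ("even", "odd"), ("wye", "even")]

overall = mcdi(example_pairs)                             # 3 of 21 pairs
even_only = stimulus_specific_mcdi(example_pairs, "even")  # 3 of 6 pairs
foot_only = stimulus_specific_mcdi(example_pairs, "foot")  # 0 of 6 pairs
```

Using unordered pairs (`frozenset`) means that the order in which a pair is listed does not affect the count.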

Procedure for determination of differential sensitivity to multipoint correlations for a stimulus pair.

(A) A smoothed firing rate is constructed from the responses to examples of each stimulus type (1024 examples, each presented twice). (B) A parallel procedure is carried out for 3000 surrogate datasets, in which responses are randomly exchanged among the stimuli. The exchanges were limited to responses recorded in adjacent trials, to avoid confounds due to slow changes in firing rate over time. (C) The difference between the smoothed responses to the two stimuli is computed, both for the actual responses and each of the surrogate datasets. The relationship of the actual firing rate difference (black) to the distribution of differences encountered in the surrogate datasets (gray) is determined. (D) At each time point, the position of the actual difference in the surrogate difference distribution is expressed as a two-tailed p-value. The actual difference is considered to be significant if any of these p-values over the range 55–250 ms (dashed vertical lines) falls below the false-discovery-rate (FDR) threshold q corresponding to a significance level of 0.05. The FDR threshold q, illustrated as the horizontal dashed line in Figure 3D, is a data-determined quantity (Benjamini and Hochberg, 1995) that is substantially less than the raw significance level of 0.05 (in this case, q < 0.01).

https://doi.org/10.7554/eLife.06604.005

Sensitivity to multipoint correlations was not associated with the simple vs complex distinction (as measured by F1/F0 ratio, with a dividing point at 1 [as in ref. Mechler and Ringach, 2002] or at the population median). These and other comparisons between subsets of cells (e.g., V1 vs V2, or between laminar compartments) were carried out using a two-tailed Wilcoxon rank-sum test. The raw p-values were subjected to false discovery correction (Benjamini and Hochberg, 1995) across time points, in 5 ms bins from 55 ms to 250 ms. Statistical significance corresponds to p < 0.05.
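For readers unfamiliar with the false discovery correction, the step-up rule of Benjamini and Hochberg (1995) can be sketched as follows. The function names and the 40 example p-values are invented for illustration; the point is that the threshold depends on the data and is typically well below the raw level of 0.05.

```python
def bh_threshold(p_values, alpha=0.05):
    """Largest sorted p-value p_(k) satisfying p_(k) <= k * alpha / m.

    Returns None when no p-value passes, in which case nothing is significant.
    """
    m = len(p_values)
    threshold = None
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k * alpha / m:
            threshold = p  # keep the largest p-value that passes
    return threshold


def bh_significant(p_values, alpha=0.05):
    """Boolean flag per p-value, in the original order."""
    threshold = bh_threshold(p_values, alpha)
    return [threshold is not None and p <= threshold for p in p_values]


# Invented raw p-values at 40 time points: three small ones, the rest at 0.5.
example_p = [0.0005, 0.001, 0.003] + [0.5] * 37

threshold = bh_threshold(example_p)
flags = bh_significant(example_p)
```

In this example the data-determined threshold is 0.003, so only the three smallest p-values are flagged.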

To visualize the population pattern of differential responses (Figure 2D), we used standard multidimensional scaling (Kruskal and Wish, 1978), applied to the fraction of neurons that distinguished between each pair of correlation types. The first two embedding dimensions (as shown in Figure 2D) typically accounted for > 90% of the variance.

References

  1. H Barlow (1961). Sensory communication. Cambridge, MIT Press.
  2. Y Benjamini, Y Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, Statistical Methodology 57:289–300.
  3. MM Conte, SM Rizvi, DJ Thengone, JD Victor (2014). Sensitivity to local image statistics is (almost) scale-invariant. Vision Sciences Society Annual Meeting.
  14. JB Kruskal, M Wish (1978). Multidimensional scaling. SAGE publications, inc.
  15. C Loader (2012). Handbook of computational statistics. Springer Berlin Heidelberg.
  19. BA Olshausen, DJ Field (2004). 23 Problems in Systems Neuroscience. Oxford University Press.
  25. AM Schmid, Y Yu, JD Victor (2011). Mapping receptive fields using stimuli with third- and fourth-order statistics: black blobs better than random. Washington, DC: Society for Neuroscience.
  26. AM Schmid, Y Yu, JD Victor (2012). Mapping receptive fields using stimuli with high-order statistics in V1 and V2. Annual meeting for Society for Neuroscience, October 16, 2012, New Orleans, LA, Society for Neuroscience.
  28. BC Skottun, RL De Valois, DH Grosof, JA Movshon, DG Albrecht, AB Bonds (1991). Classifying simple and complex cells on the basis of response modulation. Vision Research 31:1079–1086.
  29. MV Srinivasan, SB Laughlin, A Dubs (1982). Predictive coding: a fresh view of inhibition in the retina. Proceedings of the Royal Society of London Series B, Biological Sciences 216:427–459. https://doi.org/10.1098/rspb.1982.0085
  34. JD Victor, MM Conte (2012). Local image statistics: maximum-entropy constructions and perceptual salience. Journal of the Optical Society of America A, Optics, Image Science, and Vision 29:1313–1345. https://doi.org/10.1364/JOSAA.29.001313

Decision letter

  1. Timothy Behrens
    Reviewing Editor; Oxford University, United Kingdom

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

Thank you for sending your work entitled “Visual processing of informative multipoint correlations arises primarily in V2” for consideration at eLife. Your article has been favorably evaluated by Timothy Behrens (Senior editor and Reviewing editor) and three reviewers, one of whom, Michael Landy, has agreed to share his identity.

The editor and the reviewers discussed their comments before we reached this decision, and the editor has assembled the following comments to help you prepare a revised submission.

The editor and reviewers agree that the finding of neural correlates of multipoint correlations reflects an important advance over your previous behavioural findings and are enthusiastic about the potential publication of this Research Advance.

For example:

This paper takes the previous work of this group on which 2-, 3- and 4-point correlations are visually salient and to which the visual system is sensitive, and shows that some differential responses to these correlations arise first in area V2. As the authors are aware, I'm pretty familiar with this line of work (having reviewed several of the earlier papers in the series). Tying this story to the physiology is certainly a logical and useful next step.

In the current manuscript Yu et al. add to previously published findings that human observers are more sensitive to more informative multi-point correlations in images. Here they provide a candidate for the neural substrate of this sensitivity: supragranular layers of V2. I think the manuscript is an interesting addition to the previous paper.

Overall, this Research Advance is clearly written and nicely complements the founding article by providing neuronal correlates for multipoint correlation stimuli that have theoretical significance and perceptual relevance.

However, there were several questions that the review panel would like addressed before we could consider publication of the study.

During discussion the panel agreed that the most critical issue to address before the paper can be published is the issue of the scaling of the stimuli.

Since receptive fields are larger in V2 and you are adjusting the stimulus to the receptive field size, aren't you effectively presenting two different stimulus ensembles to the V1 and the V2 population, respectively? Could that explain the differential response between V1 and V2? Do you have control data, where you, instead of upscaling the 16x16 patch, simply increased the number of pixels to match the receptive field size? Alternatively, you could drive the V1 population with the upscaled stimuli for the V2 neurons and see whether your results change. If you do not have such data readily to hand, do you have other means of ruling out that the stimulus scaling confounds the results? For example, are there recorded V1 cells whose receptive field completely overlapped with that of a V2 cell? In that case, you would have responses to the same stimulus ensemble, but from two different areas. The review panel agreed that this issue should be rigorously addressed.

If such data do not exist, then the review panel asks you to remove the claim of a distinction between V1 and V2 from the paper since the data does not really support a comparison, and to explicitly mention that V1 and V2 are stimulated with differently scaled stimuli and explain the reasoning behind it.

A related question about the relationship between V1 and V2 coding was also raised:

Figure 2A shows that at least 75% of V1 cells have no selectivity for multipoint correlation stimuli, yet the mean MCDI of all cells is ∼0.05. This implies that V1 has a small population of V1 cells that have an MCDI of 0.2 or more. How do the response properties of these “special” V1 neurons compare to a “typical” V2 neuron? With the current presentation, it's hard to tell whether V2's representation of multipoint correlations is new or enhances a representation already present in a small subpopulation of V1. As such, the first sentence of the Discussion, which says “arises primarily in V2” seems imprecise. This distinction is also relevant to the authors' Discussion hypothesis that higher-order correlation specificity might emerge from a two-stage cascade from V1 to V2.

The reviewers were also concerned about the illusory contour figure:

At present, the connections between higher-order correlations and previously hypothesized roles for V2 seem tenuous and insufficiently detailed to warrant a full figure in the main text.

For example:

The authors use Figure 3 to argue that V2's selectivity to multipoint correlations helps explain its involvement in the detection of illusory contours and the discrimination of figure and ground. I have two comments. First, in panel a, the “even and odd” correlation structure picks out the corners of the black bars. As such, the association of this correlation with the illusory contour is a consequence of the fact that the bars are spaced by the same distance that defined the “even and odd” correlation structure. Can the authors say anything about whether there is an association between the spatial scales of illusory contour detection and the “even and odd” correlation structure? For example, do humans perceive illusory contours over the same length scales that “even and odd” correlation structures are informative for natural images? Has V2 previously been shown to be sensitive to illusory contours on the spatial scale that the authors use in the “even and odd” correlation structure? Second, the “white and black triangle” correlation structure is the only correlation structure that can distinguish between the two stimuli in panel b because it's the only odd-ordered correlation. This is why I earlier alluded to the point that it would have been helpful to include an uninformative third-order correlation stimulus. Also, in the specific example shown, couldn't one just use the mean (i.e. a first-order structure) to discriminate between the stimuli? I wonder if there might be a better choice of stimuli for this panel.

The reviewers also had several questions that we believe can be addressed by changes to the manuscript text.

In Hermundstad et al. 2014 the second order stimuli (beta) are more informative than the fourth order stimuli (alpha), which are more informative than the third order stimuli (theta). However, the authors only present data for alpha and theta stimuli (that were less informative in the Hermundstad paper) while not presenting the beta stimuli (that were more informative in the Hermundstad paper). I would like to know whether the authors (i) performed experiments with beta stimuli, (ii) if so why they did not report them, or (iii) why they did not consider them. I am sure the authors had a good reason which should be mentioned in the paper.

It would be nice if you could mention more explicitly how the rank order of the MCDI relate to the sensitivity order found in the psychophysical experiments of the previous paper.

I was interested by the finding that the MCDI was unassociated with mappable receptive fields and would be interested in hearing some thoughts from the authors in their Discussion section.

Can the authors clarify how they determined the p-value threshold in Figure 4D? Since the authors declare significance “if any of these p-values” falls below the FDR threshold, is the Benjamini-Hochberg correction equivalent to a Bonferroni correction? If so, then 40 comparisons would lead to a p-value correction less than that displayed. Or do the authors somehow correct for the fact that their temporal smoothing effectively leads to fewer than 40 comparisons?

There were also questions about the underlying assumptions in the model. We understand that these comments pertain equally to the already-published paper, but we nevertheless hope that you will be able to deal with them in a few sentences, which we felt would help the current manuscript.

What justifies the authors to assume the regime of sampling limitation rather than transmission limitation? You write that your results fit into the efficient coding framework if sampling an image is the main limitation. What is the empirical evidence that justifies this assumption as opposed to the transmission limited regime many other studies are based on?

If I understand correctly, the fact that humans/neurons should be more sensitive to more variable features is derived using a linear model with Gaussian input and channel noise. However, the mapping from images to multi-point correlations does not seem to be linear. How do you know that this result still holds in the non-linear case, in particular if the sampling regime is characterized by dominating input noise (which would get nonlinearly transformed)?

How do you justify that more variable features contain more information? In the discrete case, I can see that. However, in the limit of infinitely many images, the multi-point correlations become continuous. In that case I could transform all features by a pointwise monotonic transformation (histogram equalization) that would not change the information content but make all features equally variable.

https://doi.org/10.7554/eLife.06604.007

Author response

During discussion the panel agreed that the most critical issue to address before the paper can be published is the issue of the scaling of the stimuli.

Since receptive fields are larger in V2 and you are adjusting the stimulus to the receptive field size, aren't you effectively presenting two different stimulus ensembles to the V1 and the V2 population, respectively? Could that explain the differential response between V1 and V2? Do you have control data, where you, instead of upscaling the 16x16 patch, simply increased the number of pixels to match the receptive field size? Alternatively, you could drive the V1 population with the upscaled stimuli for the V2 neurons and see whether your results change. If you do not have such data readily to hand, do you have other means of ruling out that the stimulus scaling confounds the results? For example, are there recorded V1 cells whose receptive field completely overlapped with that of a V2 cell? In that case, you would have responses to the same stimulus ensemble, but from two different areas. The review panel agreed that this issue should be rigorously addressed.

If such data do not exist, then the review panel asks you to remove the claim of a distinction between V1 and V2 from the paper since the data does not really support a comparison, and to explicitly mention that V1 and V2 are stimulated with differently scaled stimuli and explain the reasoning behind it.

We agree that this is an important issue, and we also very much appreciate the latitude given by the reviewers in responding to the point. Because of space limitations in the original submission, we had only touched on this, mentioning that responsiveness to multipoint correlations was similar for neurons with large and small receptive fields (RFs); we now devote a new figure to this issue (Figure 2–figure supplement 1), and add material describing this analysis to the Results section, after Figure 2A. Briefly, the new figure shows that the difference between V1 and V2 cannot be explained merely by a difference in RF size, or a difference in the number of checks within the RF: whether one compares the MCDI for V1 and V2 neurons with comparable RF sizes, or for V1 and V2 neurons stimulated with comparably scaled stimuli, the V2 neurons have a larger MCDI. In the upper left panel, the MCDI is plotted as a function of RF area, showing a difference in MCDI for V2 vs. V1 neurons across the entire range. So, V2 neurons are not simply larger versions of V1 neurons. But, as the reviewers point out, since we adjust the stimulus to match the resolution of the neural activity at one of the tetrodes (i.e., at one brain site), there is the possibility that the V2 vs. V1 difference relates to the number of checks in the receptive field, or, equivalently, the fraction of the 16 × 16 stimulus array that is “seen” by the RF. The upper right panel rules out this possibility as well: when compared on the basis of an equal number of checks within the RF, V2 neurons have a larger MCDI than V1 neurons. The remaining rows of Figure 2–figure supplement 1 break the analysis down by laminar compartment; there are no surprises in the supragranular and granular layers, but the analysis of the infragranular compartment (see below) suggests that there is a subset of V1 neurons with large receptive fields that are sensitive to multipoint correlations. We now mention this point at the end of the new paragraph devoted to Figure 2–figure supplement 1, but (as discussed below) it does not change our basic finding, that sensitivity to multipoint correlations arises primarily in V2.

We also want to mention (see point below concerning number of checks in the RF) that the number of checks considered to be within the RF (median, ∼8) is probably an underestimate of the actual RF size, as a check is considered to be “within” the RF only if reverse-correlation analysis passes a criterion level of statistical significance; a substantial fraction of the dataset consists of neurons that did not meet this criterion but still had clear responses to the stimuli, as manifest by a large MCDI.

A related question about the relationship between V1 and V2 coding was also raised:

Figure 2A shows that at least 75% of V1 cells have no selectivity for multipoint correlation stimuli, yet the mean MCDI of all cells is ∼0.05. This implies that V1 has a small population of V1 cells that have an MCDI of 0.2 or more. How do the response properties of these “special” V1 neurons compare to a “typical” V2 neuron? With the current presentation, it's hard to tell whether V2's representation of multipoint correlations is new or enhances a representation already present in a small subpopulation of V1. As such, the first sentence of the Discussion, which says “arises primarily in V2” seems imprecise. This distinction is also relevant to the authors' Discussion hypothesis that higher-order correlation specificity might emerge from a two-stage cascade from V1 to V2.

There are two related points here, and both are interesting: (a) given that the average MCDI in V1 is driven by a small subset of cells with large MCDIs, does this subset constitute a subpopulation with definable features? And (b) should one view V2 as merely enhancing a representation of multipoint correlations that is already present in V1? With regard to (a), the new analysis presented in Figure 2–figure supplement 1 provides an important clue, which we now mention in the Results where the summary statistics are described: the infragranular layer contains a subset of V1 neurons, with large receptive fields, and this subpopulation has a large MCDI. With regard to (b), we don’t think that the existence of these neurons changes the basic conclusion that sensitivity to multipoint correlations arises primarily in V2: these units make up only a small subset of one laminar compartment in V1, and even among these neurons, the median MCDI does not approach the levels in supragranular V2. Independently, anatomical evidence supports this view: the targets of infragranular V1 (Felleman and Van Essen, 1991) are the superior colliculus (layer 5) and the lateral geniculate (layer 6), while the inputs to V2 arise mainly from layers 2, 3, and 4b, where the MCDI is low. Finally, the difference between supragranular V2 and the V2 input layer strongly suggests that the behavior in supragranular V2 is a result of intrinsic processing in V2, not a feature of signals passed on by V1 (which would already have been present in the granular layer). We now mention these points in the main text concerning Figure 2–figure supplement 1. Finally, to add precision to the broad statement that sensitivity to multipoint correlations arises primarily in V2, we add mention of the subpopulation of infragranular V1 neurons to the opening paragraph of the Discussion.

The reviewers were also concerned about the illusory contour figure:

At present, the connections between higher-order correlations and previously hypothesized roles for V2 seem tenuous and insufficiently detailed to warrant a full figure in the main text.

Our main reason for presenting this figure was to provide an intuition as to why only some kinds of multipoint correlations are informative, as an aid to readers who might not be comfortable with the technicalities of the approach taken in Hermundstad et al. and Tkacik et al. (PNAS, 2010). But we agree that the argument made by the figure is intuitive and un‐rigorous, rendering it unhelpful for the more sophisticated reader, so we removed it. Nevertheless it is useful to speculate about why these multipoint correlations are important, and we retain these comments, which we clearly mark as speculative. These comments, which are in a rewritten second paragraph of the Discussion, use the stimulus examples in the retained Figure 1 to illustrate the points made.

For example:

The authors use Figure 3 to argue that V2's selectivity to multipoint correlations helps explain its involvement in the detection of illusory contours and the discrimination of figure and ground. I have two comments. First, in panel a, the “even and odd” correlation structure picks out the corners of the black bars. As such, the association of this correlation with the illusory contour is a consequence of the fact that the bars are spaced by the same distance that defined the “even and odd” correlation structure. Can the authors say anything about whether there is an association between the spatial scales of illusory contour detection and the “even and odd” correlation structure? For example, do humans perceive illusory contours over the same length scales that “even and odd” correlation structures are informative for natural images? Has V2 previously been shown to be sensitive to illusory contours on the spatial scale that the authors use in the “even and odd” correlation structure?

With regard to the scaling question: we don’t know to what extent it is necessary to match the scale of a multipoint correlation with the features that define an illusory contour in order for the contour to be extracted. As the reviewer notes, the former Figure 3 suggests that this is the case, but of course one expects that illusory contours are present across a wide range of scales too.

On the other hand, our findings do not rest on a narrow range of RF sizes or correlation scales, and the broad range of spatial scales over which neurons are sensitive to multipoint correlations corresponds to that of human perceptual sensitivity. Specifically, the distinction between informative and uninformative four-point correlations persists over a fourfold range of length scales (the entire range analyzed, see SI of Tkacik et al., PNAS 2010, Figure 14). The characteristics of human perception are approximately constant over a similar range of length scales (0.03 to 0.25 deg checks, a range similar to that used in these experiments; Figures 2 and 8 of Victor and Conte 1989 [four-point correlations]; also Victor, Thengone, Rizvi, and Conte et al., VSS Abstracts 2014 [two- and three-point correlations as well]). Sensitivity across this range of scales is present even when eccentricity is restricted (Victor and Conte 1989). While individual neurons may not subtend the entire range, the population of V1/V2 neurons as a whole is likely to do so, and Figure 2–figure supplement 1 shows that V2 sensitivity to multipoint correlations does not require a close match between RF size and the scale of the correlations. We now add a paragraph about this to the Discussion, and add further specifics about check size to the Methods section.

Second, the white and black triangle correlation structure is the only correlation structure that can distinguish between the two stimuli in panel b because it's the only odd-ordered correlation. This is why I earlier alluded to the point that it would have been helpful to include an uninformative third-order correlation stimulus. Also, in the specific example shown, couldn't one just use the mean (i.e. a first-order structure) to discriminate between the stimuli? I wonder if there might be a better choice of stimuli for this panel.

The “earlier alluded” passage seems to have been omitted from the consolidated reviews, but we think we understand the point. We agree that it would have been a good control to include an uninformative third‐order correlation, but previous work on natural image statistics did not identify any such configurations (and didn’t seek to do so). We agree with the comment that other third‐order correlations could distinguish figure from ground. But the reviewer is incorrect about first‐order statistics—they would not have been able to make this distinction, as the fraction of black and white checks in the figure/ground component of the former Figure 3 was each 50% (this is not at all apparent to casual viewing).
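The point about first‐order statistics can be illustrated with a minimal sketch. It assumes a simple recursive texture generator in which the product of three checks in an L‐shaped configuration is forced to a fixed sign; the generator, the function names, and the texture size are hypothetical choices for illustration and are not the stimuli of the former Figure 3. Two such textures have nearly the same mean luminance but opposite three‐point statistics.

```python
import random

def theta_texture(size, sign, rng):
    """Binary (+1 = white, -1 = black) texture in which the product of the
    three checks in an L-shaped template always equals `sign`."""
    # Start from independent random checks; row 0 and column 0 stay random.
    x = [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]
    for i in range(1, size):
        for j in range(1, size):
            # Force x[i][j] * x[i-1][j] * x[i][j-1] == sign.
            x[i][j] = sign * x[i - 1][j] * x[i][j - 1]
    return x

def mean_luminance(x):
    """First-order statistic: average check value (0 means 50% white, 50% black)."""
    return sum(sum(row) for row in x) / (len(x) * len(x[0]))

def three_point(x):
    """Third-order statistic: average product over the L-shaped template."""
    size = len(x)
    total = sum(x[i][j] * x[i - 1][j] * x[i][j - 1]
                for i in range(1, size) for j in range(1, size))
    return total / (size - 1) ** 2

rng = random.Random(0)  # fixed seed so the illustration is repeatable
figure = theta_texture(200, +1, rng)
ground = theta_texture(200, -1, rng)

print("mean luminance:", mean_luminance(figure), mean_luminance(ground))
print("three-point statistic:", three_point(figure), three_point(ground))
```

In this toy construction the three‐point statistic is exactly +1 for one texture and −1 for the other, while both mean luminances stay close to zero, so a first‐order measurement does not separate them.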

The reviewers also had several questions that we believe can be addressed by changes to the manuscript text.

In Hermundstad et al. 2014 the second order stimuli (beta) are more informative than the fourth order stimuli (alpha), which are more informative than the third order stimuli (theta). However, the authors only present data for alpha and theta stimuli (that were less informative in the Hermundstad paper) while not presenting the beta stimuli (that were more informative in the Hermundstad paper). I would like to know whether the authors (i) performed experiments with beta stimuli, (ii) if so why they did not report them, or (iii) why they did not consider them. I am sure the authors had a good reason which should be mentioned in the paper.

The reason that we did not include them in the paper was that our focus was on determining the origin of sensitivity to three‐ and four‐point statistics, and the study was designed to concentrate the experiment time on that issue. First‐ and second‐order statistics, which affect, respectively, the mean luminance and spatial frequency content of the stimuli, will modulate responses even at the level of retinal ganglion cells, so clearly there’s no mystery as to where responses to such statistics arise. We now state this rationale in the first paragraph of Results.

On the other hand, we agree that the extent to which the pattern of sensitivities in V1 and V2 accounts for the pattern of human sensitivities to local image statistics of all orders (e.g., the shape and orientation of the isodiscrimination contours in Hermundstad et al.) is interesting and important, and we are carrying out experiments in the macaque to look at this. Preliminary results confirm the expectation of sensitivity to first‐ and second‐order statistics in V1 and V2 (as well as the current findings about third‐ and fourth‐order sensitivities primarily in V2), and also indicate that neurons have preferences pointing in many oblique directions in the space. But this goes far beyond the current manuscript, and we think deserves its own paper.

It would be nice if you could mention more explicitly how the rank order of the MCDI relate to the sensitivity order found in the psychophysical experiments of the previous paper.

We now add this material to the final two paragraphs of the Results section, underscoring that the V2 results are not a precise match for perception, but this is not surprising, as further processing may well intervene.

I was interested by the finding that the MCDI was unassociated with mappable receptive fields and would be interested in hearing some thoughts from the authors in their Discussion section.

We add a paragraph concerning this to the Discussion. In brief, there are several ways in which neurons that are not “mappable” could, nevertheless, participate in form vision tasks in general (Olshausen and Field 2004), or, in particular, discriminate among multipoint correlations (as we show here). As defined in this paper, a “mappable” RF requires that reverse correlation of the neuron’s responses to the stimulus passes a statistical criterion. Standard practice is to use random binary stimuli for this mapping procedure; here we include stimuli with high‐order correlations in the mapping computation. The rationale is that this allows some kinds of nonlinear responses to emerge in a first‐order cross‐correlation between stimulus and response, because of correlations within the stimuli (Schmid, Yu, and Victor, SfN Abstracts, 2012). But there is no reason to believe that this expanded stimulus set reveals all RFs. For example: (i) reverse correlations may not exceed threshold because of response variability; (ii) a neuron may only be responsive to stimulus configurations that occur very rarely in the stimulus set; or (iii) a neuron may be selectively responsive to one or more of the multipoint correlations tested, but since the occurrence of this configuration is not correlated with whether a check is white or black, there will be no correlation of the response to the state of individual checks. Our data do not permit us to determine which of these explanations dominate, but we suspect that (i) and (iii) contribute.
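Explanation (iii) can be made concrete with a small sketch. It assumes a hypothetical toy neuron that fires only when a 2x2 block of checks has even parity (the product of the four checks, coded as +1 for white and −1 for black, equals +1), and it assumes that all 16 blocks are equally likely, as in a random binary stimulus. The response model and all names are illustrative assumptions, not the analysis used in the paper.

```python
from itertools import product

def toy_response(block):
    """Hypothetical neuron: responds (1) to even-parity 2x2 blocks, else is silent (0)."""
    a, b, c, d = block
    return 1 if a * b * c * d == 1 else 0

# All 16 possible 2x2 blocks of binary checks, treated as equally likely.
blocks = list(product((-1, 1), repeat=4))

# First-order reverse correlation: average of response * state of each check.
first_order = [
    sum(toy_response(blk) * blk[k] for blk in blocks) / len(blocks)
    for k in range(4)
]

# Correlation of the response with the four-point product itself.
fourth_order = sum(
    toy_response(blk) * blk[0] * blk[1] * blk[2] * blk[3] for blk in blocks
) / len(blocks)

print("first-order map:", first_order)
print("four-point correlation:", fourth_order)
```

Under these assumptions the first‐order map is zero at every check even though the response is fully determined by the four‐point configuration, so such a neuron would fail a mapping criterion based on individual check states.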

Can the authors clarify how they determined the p-value threshold in Figure 4D? Since the authors declare significance if any of these p-values falls below the FDR threshold, is the Benjamini-Hochberg correction equivalent to a Bonferroni correction? If so, then 40 comparisons would lead to a p-value correction less than that displayed. Or do the authors somehow correct for the fact that their temporal smoothing effectively leads to fewer than 40 comparisons?

Sorry for being unclear on this. As is standard, the FDR threshold for a given significance level alpha is a data‐determined quantity q, for which it can be anticipated that less than a fraction alpha of the values below q are false‐positives. Typically q is substantially less than alpha (in the case shown in the Figure, alpha is 0.05 but q is <0.01). We clarify this by labeling the significance threshold in panel d as “q”, and adding text to the caption to highlight that the FDR threshold q is distinct from, and more stringent than, the un‐corrected significance threshold of alpha=0.05. With regard to positive correlations among the individual values, the false discovery correction procedure we used takes this into account: under conditions of positive correlation, the standard procedure reduces to the FDR procedure we used (Benjamini & Yekutieli (2001), “The control of the false discovery rate in multiple testing under dependency”, Annals of Statistics 29 (4): 1165–1188. doi:10.1214/aos/1013699998. MR 1869245). [Note: previous Figure 4, the subject of these changes, is now Figure 3.]
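To show why a data‐determined threshold q differs from a Bonferroni threshold, here is a minimal sketch of the standard Benjamini‐Hochberg step‐up rule. The 40 p‐values are invented for illustration and are not taken from the recordings, and the function name `bh_cutoff` is hypothetical; the sketch is an assumption about how such a threshold can be computed, not a reproduction of the authors' code.

```python
def bh_cutoff(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up rule.

    Returns (k_max, q): k_max is the number of p-values declared significant,
    and q = alpha * k_max / m is the data-determined cutoff (0 if none pass).
    """
    m = len(pvals)
    k_max = 0
    for k, p in enumerate(sorted(pvals), start=1):
        # Keep the largest rank k whose p-value lies under the line alpha * k / m.
        if p <= alpha * k / m:
            k_max = k
    return k_max, alpha * k_max / m

alpha = 0.05
# Hypothetical data: 7 small p-values and 33 unremarkable ones (40 tests in total).
small = [0.0001, 0.0004, 0.001, 0.002, 0.004, 0.006, 0.008]
large = [0.1 + 0.85 * i / 32 for i in range(33)]
pvals = small + large

n_fdr, q = bh_cutoff(pvals, alpha)
bonferroni = alpha / len(pvals)
n_bonferroni = sum(p <= bonferroni for p in pvals)

print("FDR cutoff q:", q, "significant:", n_fdr)
print("Bonferroni cutoff:", bonferroni, "significant:", n_bonferroni)
```

For these invented values the FDR cutoff falls between the Bonferroni threshold (alpha divided by the number of tests) and the uncorrected alpha. It equals the Bonferroni threshold only when no more than one p‐value passes.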

There were also questions about the underlying assumptions in the model. We understand that these comments pertain equally to the already-published paper, but we nevertheless hope that you will be able to deal with them in a few sentences, which we felt would help the current manuscript.

What justifies the authors to assume the regime of sampling limitation rather than transmission limitation? You write that your results fit into the efficient coding framework if sampling an image is the main limitation. What is the empirical evidence that justifies this assumption as opposed to the transmission limited regime many other studies are based on?

The empirical evidence for the “sampling limitation” hypothesis is that it predicts the main finding of Hermundstad et al., relating psychophysical sensitivities of local features to statistics of natural images: greater sensitivity for image statistics that are more variable. Had output capacity (transmission) been the main limitation, psychophysical sensitivities would have been large for image statistics that were less variable—the opposite of what we found (Figure 4B vs. Figure 4C of Hermundstad et al.). We now add this point to the first paragraph of the Introduction.

If I understand correctly, the fact that humans/neurons should be more sensitive to more variable features is derived using a linear model with Gaussian input and channel noise. However, the mapping from images to multi-point correlations does not seem to be linear. How do you know that this result still holds in the non-linear case, in particular if the sampling regime is characterized by dominating input noise (which would get nonlinearly transformed)?

The model in Hermundstad et al. consists of local, nonlinear extraction of features, followed by analysis of how the statistics of these features are represented. The model did not attempt to analyze those nonlinear processes but simply considered their outputs to be “tokens”, which, once extracted, need to be represented and transmitted by central visual areas (Figure 4C of Hermundstad et al.). To apply principles of efficient coding to this process of representation and transmission, we assumed that once the tokens are extracted, the rest of the process is linear and noise is Gaussian. Obviously this is a simplification, but it is one that appears to account for the data. We now summarize this framework in a new paragraph (the third paragraph) in the Introduction.

How do you justify that more variable features contain more information? In the discrete case, I can see that. However, in the limit of infinitely many images, the multi-point correlations become continuous. In that case I could transform all features by a pointwise monotonic transformation (histogram equalization) that would not change the information content but make all features equally variable.

As the reviewer indicates, the Hermundstad et al. analysis was predicated on binarizing the continuum of gray levels in natural images, and naturally leads to the question of how the analysis would “scale up” to the full continuum of gray levels. But the notion that histogram equalization would make all features equally variable is not correct, on several grounds. First, although it would equalize first‐order statistics, it would not equalize higher‐order statistics. Second, even for first‐order statistics, histogram equalization is only information‐preserving if one ignores limitations on discriminating nearby gray levels that are arbitrarily close. We know from the work of Chubb et al. that with only 16 gray levels in play (limiting consideration to first‐order statistics), there are only three perceptual coordinates. So something much more interesting than histogram equalization is going on. We’ve begun to look at the three‐gray‐level case with multipoint statistics. Here, there are 66 possible feature dimensions in a 2x2 local neighborhood, and at least a dozen of these are visually perceptible, but not nearly all 66. In any case, this is a complex and interesting matter, and would take us far beyond the scope of this paper.

https://doi.org/10.7554/eLife.06604.008

Article and author information

Author details

  1. Yunguo Yu

    Brain and Mind Research Institute, Weill Cornell Medical College, New York, United States
    Contribution
    YY, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  2. Anita M Schmid

    Brain and Mind Research Institute, Weill Cornell Medical College, New York, United States
    Contribution
    AMS, Acquisition of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  3. Jonathan D Victor

    Brain and Mind Research Institute, Weill Cornell Medical College, New York, United States
    Contribution
    JDV, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    jdvicto@med.cornell.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-9293-0111

Funding

National Institutes of Health (NIH) (EY09314)

  • Jonathan D Victor

The funder had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by NIH EY09314 to JV. We thank Ferenc Mechler, Ifije Ohiorhenuan, Qin Hu and Eyal Nitzany for their assistance with the physiological experiments, Mary Conte and Keith Purpura for many helpful discussions, and Ann Hermundstad for her comments on the manuscript. We also thank Daniel Thengone for his help classifying spike waveforms into narrow- and broad-spiking. A portion of this work was reported at the 2013 meeting of CoSyNe (Computational and Systems Neuroscience), Salt Lake City, UT.

Ethics

Animal experimentation: This study was performed in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. All animal procedures were approved by the Institutional Animal Care and Use Committee (IACUC) of Weill Cornell Medical College, protocol 0810-795A.

Reviewing Editor

  1. Timothy Behrens, Oxford University, United Kingdom

Publication history

  1. Received: January 22, 2015
  2. Accepted: April 24, 2015
  3. Accepted Manuscript published: April 27, 2015 (version 1)
  4. Version of Record published: May 12, 2015 (version 2)

Copyright

© 2015, Yu et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

