Using the visual system as a model, we recently showed that the efficient coding principle accounted for the allocation of computational resources in central sensory processing: when sampling an image is the main limitation, resources are devoted to computing the statistical features that are the most variable, and therefore the most informative (Hermundstad et al., 2014; eLife 2014;3:e03722, DOI: 10.7554/eLife.03722). Building on these results, we use single-unit recordings in the macaque monkey to determine where these computations (sensitivity to specific multipoint correlations) occur. We find that these computations take place in visual area V2, primarily in its supragranular layers. The demonstration that V2 neurons are sensitive to the multipoint correlations that are informative about natural images provides a common computational underpinning for diverse but well-recognized aspects of neural processing in V2, including its sensitivity to corners, junctions, illusory contours, figure/ground, and ‘naturalness.’ https://doi.org/10.7554/eLife.06604.001
We recently showed (Hermundstad et al., 2014; eLife 2014;3:e03722, DOI: 10.7554/eLife.03722) how a normative theory based on the efficient coding principle (Barlow, 1961) can account for the allocation of resources for the representation of complex sensory features. Specifically, we analyzed the local statistics of natural images, and compared the variability of these statistics with their perceptual salience. The statistics that were the most variable (that is, the least predictable and therefore the most informative) were the most salient perceptually. This relationship, in which greater resources are allocated to more variable features, emerges from the efficient coding principle in the regime in which the main constraint is input sampling (Barlow, 1961; van Hateren, 1992; Doi and Lewicki, 2014; Hermundstad et al., 2014). The observed relationship contrasts with the more familiar ‘whitening’ regime (Srinivasan et al., 1982), which emerges when the main constraint is output capacity (e.g., with regard to the retina and the optic nerve bottleneck); the whitening regime predicts that fewer resources are allocated to more variable features. We note that the results of Hermundstad et al. (2014) provide empirical support for the hypothesis that input sampling, rather than output capacity, is the main constraint, since a transmission limit would have predicted a lower sensitivity for the image statistics that were the most variable, the opposite of what we found.
To reach this result, we analyzed natural images via their multipoint correlations, that is, the statistics of the combinations of luminance values that appear in several points of the image. This approach has several advantages. First, it reduces the dimensionality of the space of image statistics that need to be considered, since it can be applied to binarized images, and it separates informative from uninformative statistics (Tkačik et al., 2010). Second, the approach enables rigorous tests of theoretical predictions, since the individual kinds of informative and uninformative multipoint correlations can be isolated in synthetic image sets (Victor and Conte, 2012). In contrast, the multipoint correlations in natural images covary in a complex manner that is difficult to capture or control. Synthetic image sets that isolate individual kinds of multipoint correlations are highly un-natural, but here this is an advantage: our predictions, which are derived from natural images, are tested in an out-of-sample fashion.
The information-theoretic framework of (Hermundstad et al., 2014) and (Tkačik et al., 2010) played a key role in this analysis, and we briefly summarize it here. We used a two-stage model: first, the informative multipoint features (as identified by [Tkačik et al., 2010]) are extracted from a visual image by a set of local nonlinear processing elements. Then, the output of this stage, that is, the frequency of each feature in patches of the image, is represented and transmitted by central visual areas, to serve as the basis for visual inferences (Figure 4C of [Hermundstad et al., 2014]). We used a linear channel with additive Gaussian noise as an approximation for this latter process. While obviously a simplification, this leads to an analytic solution (van Hateren, 1992) for the allocation of resources that maximizes the mutual information between stimuli and their central representation—and the analytic solution accurately accounted for dozens of independently-determined psychophysical parameters (Hermundstad et al., 2014).
Unaddressed, however, was where the extraction of multipoint correlations takes place. Several lines of evidence suggested that selective sensitivity to multipoint correlations arises in visual cortex (discussed in [Hermundstad et al., 2014]), but a direct demonstration was lacking.
Here, we report single-unit recordings in macaque V1 and V2, showing that neuronal selectivity for multipoint correlations is infrequent in V1, and becomes prominent in V2, especially in its supragranular layers. Well-recognized characteristics of V2 neurons, including sensitivity to corners, junctions (Das and Gilbert, 1999), illusory contours (von der Heydt et al., 1984), figure/ground (Qiu and von der Heydt, 2005), and ‘naturalness’ (Freeman et al., 2013) all entail sensitivity to multipoint correlations; here we show that this sensitivity is present even when these correlations are separated from their natural context.
We recorded the extracellular activity of 421 individual neurons (269 in V1, 152 in V2) in the anesthetized, paralyzed macaque in response to stimulus sets that isolate the multipoint correlations previously studied in natural images (Tkačik et al., 2010; Hermundstad et al., 2014) and psychophysically (Victor and Conte, 1991, 2012).
The stimulus sets are illustrated in the top row of Figure 1 (see ‘Materials and methods’ for details). In the ‘random’ stimulus set, check colors are assigned independently, with an equal chance of being white or black. The six structured stimulus sets were as follows: The ‘even’ and ‘odd’ sets isolate the opposite extremes of the visually salient four-point correlation (Hermundstad et al., 2014), there denoted α. The ‘white triangle’ and ‘black triangle’ sets isolate the extremes of the visually salient three-point correlation (Hermundstad et al., 2014), there denoted θ. The ‘wye’ and ‘foot’ sets have multipoint correlations that are not visually salient (Victor and Conte, 1991); this is in keeping with the efficient coding principle because in natural images, these correlations are predictable from simpler quantities (Tkačik et al., 2010). We focused on three- and four-point correlations, since one- and two-point statistics (luminance and spatial contrast) are well-known to modulate responses throughout the visual system, beginning in the retina.
Figure 1 shows post-stimulus time histograms (PSTHs) of typical neurons in V1 and V2. Responses have a prominent transient after each stimulus transition, when on average half of the checks change from black to white or from white to black. For some neurons (e.g., the first, third, fourth, and sixth examples in V1), this transient is nearly identical for each of the stimulus sets. For other neurons (e.g., the second and fifth examples in V1, and most of the V2 examples), the transients differ in magnitude or configuration, suggesting a differential response to multipoint correlations.
To quantify these differences, we applied a shuffle test to the smoothed firing rates (see ‘Materials and methods’). Significant differences between responses to structured and random stimuli (the asterisks in Figure 1) were more common in V2 than in V1. For a more thorough characterization, we defined the ‘multipoint correlation discrimination index’ (MCDI), which counted not only the comparison between structured and random stimuli, but also comparisons among pairs of different structured stimuli. The MCDI was defined as the fraction of the 21 pairwise comparisons that differed by the above statistical criterion. An MCDI of 0 means that a neuronal response to all stimulus types is indistinguishable; an MCDI of 1 means that a neuronal response distinguishes between all stimulus pairings, and therefore, between all of the structured stimulus sets.
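As a concrete illustration of the index defined above, the following minimal sketch counts the 21 unordered pairs among the seven stimulus types and returns the fraction flagged as discriminated. The stimulus labels, the function name `mcdi`, and the example neuron are hypothetical and introduced only for illustration; the significance decisions themselves are assumed to come from the shuffle test described in ‘Materials and methods’.

```python
from itertools import combinations

# Labels for the seven stimulus types (names chosen here for illustration)
STIMULI = ["random", "even", "odd", "white_triangle",
           "black_triangle", "wye", "foot"]

def mcdi(significant_pairs):
    """Fraction of the unordered stimulus pairs flagged as discriminated.

    significant_pairs: iterable of 2-tuples of stimulus labels for which
    the responses differed by the statistical criterion.
    """
    all_pairs = [frozenset(p) for p in combinations(STIMULI, 2)]
    flagged = {frozenset(p) for p in significant_pairs}
    return sum(p in flagged for p in all_pairs) / len(all_pairs)

# Hypothetical neuron that distinguishes 'even' from 'random' and from 'odd'
example = [("even", "random"), ("even", "odd")]
n_pairs = len(list(combinations(STIMULI, 2)))  # 7 choose 2 = 21
print(n_pairs, round(mcdi(example), 3))        # 21 0.095
```

A neuron that discriminates 2 or 3 of the 21 pairs therefore has an MCDI of about 0.10 to 0.14.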
Figure 2A (upper row) summarizes the MCDI across the neuronal population. The average MCDI peaked at a value of approximately 0.05 in V1, and approximately 0.10 in V2; this difference became significant at 70 ms after stimulus onset.
A laminar analysis of the MCDI (Figure 2A, lower three rows) revealed a slight increase from the V1 granular (input) layer (mean 0.025) to the V1 extragranular layers (supragranular: 0.033, infragranular: 0.045), followed by a jump at the V2 granular layer (0.101), with a marked upsurge in the V2 supragranular layer (0.162), but not the infragranular layer (0.052). The difference between the MCDI in supragranular V2 and each of the other compartments was significant, except for the comparison with granular V2 (p = 0.053). The median value of the MCDI in supragranular V2 was 0.12, meaning that the ‘typical’ neuron responded differentially for 2 or 3 of the 21 pairwise comparisons. In all other compartments (in V1 and V2), the median was 0, that is, the ‘typical’ neuron did not distinguish between any of the stimulus types. Atypical neurons in V1 did distinguish among multipoint correlations. These were primarily neurons in the infragranular layer and with large receptive fields (RFs); see Figure 2—figure supplement 1. But overall, the mean MCDI was lower in V1 (0.027) than in V2 (0.081), especially in its supragranular compartment (0.162).
Sensitivity to multipoint correlations was not restricted to specific cell types. Specifically, the MCDI was not significantly associated with the simple vs complex distinction, nor with the distinction between regular-spiking and fast-spiking neurons, as determined by extracellular action potential shapes (Niell and Stryker, 2008). Sensitivity to multipoint correlations was also present in isolated units that did not have overt RFs by hand-mapping or by reverse correlation (81/269 in V1 and 65/152 in V2); these units had waveforms that were isolated by the tetrode recordings, and likely include many of the ‘unresponsive’ neurons (Olshausen and Field, 2004) that would not have been selected for study with single-electrode methods. There was no significant difference in the MCDI between these neurons and the simultaneously-recorded neurons with mappable RFs, either in V1 or V2. Among the neurons with mappable RFs, the MCDI was not significantly different for neurons whose RFs were above vs below the median size for their laminar compartment. Thus, the sensitivity to multipoint correlations does not require a precise match between the RF size and the spatial scale of the correlations. In sum, sensitivity to multipoint correlations was widely distributed across V2 neurons.
The difference in sensitivity to multipoint correlations between V1 and V2 was not due to a difference in RF size, nor to stimulus scaling (i.e., the number of stimulus checks within the RF). The upper left panel of Figure 2—figure supplement 1 compares MCDI across V1 and V2 as a function of RF area; across the entire range of sizes, the MCDI is higher in V2 than in V1. The upper right panel makes this comparison as a function of the number of checks within the RF, which also equates neurons whose RFs covered the same fraction of the stimulus area. Here too, the MCDI in V2 was larger than in V1. The remaining rows of Figure 2—figure supplement 1 break the analysis down by laminar compartment. In granular and supragranular layers, the above observations hold, but there is a suggestion of a subset of V1 neurons with large RFs (lower left panel) that are sensitive to multipoint correlations. However, it is unlikely that this subpopulation underlies the high MCDI seen in V2: the targets of infragranular V1 (Felleman and Van Essen, 1991) are the superior colliculus (layer 5) and the lateral geniculate (layer 6), while the inputs to V2 arise mainly from layers 2, 3, and 4b, where the MCDI is low. Moreover, the difference between supragranular V2 and the V2 input layer strongly suggests that the behavior in supragranular V2 is a result of intrinsic processing in V2, not a feature of signals passed on by V1 (which would already have been present in the granular layer).
Figure 2B shows that the multipoint correlations that contribute to the MCDI are the ones previously identified as being informative about natural images (Tkačik et al., 2010) and perceptually salient (Victor and Conte, 1991), namely, the even, odd, white triangle, and black triangle stimuli. Figure 2C further breaks down the MCDI into the individual pairwise comparisons. Few neurons, either in supragranular V2 or across the population, discriminated among pairs of the stimuli with uninformative multipoint correlations (random, wye, and foot). To visualize the pattern of discrimination across the neuronal population, we applied multidimensional scaling to the data of Figure 2C. This led to a three-dimensional representation (Figure 2D) in which the seven stimulus types are represented by points, and the distance between the points corresponds to the fraction of neurons that distinguishes between them (i.e., the average pair-specific MCDI across the population). In V1, points are clustered near the origin, since most neurons cannot distinguish between any stimulus types. In V2, the representation expands into a multidimensional space. The two opposite stimulus pairs (even vs odd, and white triangle vs black triangle) are separated along different axes. Correspondingly, psychophysical studies show that the even-vs-odd gamut, and the white triangle-vs-black triangle gamut are independent perceptual axes (Victor and Conte, 2012) (Figure 8 panel 2 of reference [Victor and Conte, 2012], and the [θ, α]-panels of Figure 3 of Hermundstad et al. (2014)). Human perceptual sensitivities are larger for the four-point configuration than for the three-point configurations (Victor and Conte, 2012; Hermundstad et al., 2014); this is mirrored by higher values of the MCDI for the even stimuli than for the white triangle or black triangle stimuli in supragranular V2 (Figure 2D).
However, there are some differences between representation of informative multipoint correlations in the V2 population (as shown in Figure 2D) and human psychophysics (Victor and Conte, 2012; Hermundstad et al., 2014). First, the points corresponding to the uninformative stimuli are close to, but not superimposed on, the random stimulus. Additionally, while psychophysical sensitivity to the odd stimulus is only about 25% less than sensitivity to the even stimuli (Victor and Conte, 2012), the MCDI for the odd stimulus is much lower. We note that the odd stimulus contains even correlations when analyzed at spatial scales larger than a single check (Victor and Conte, 1989), so neuronal mechanisms sensitive to the even correlation will also contribute to the perceptual salience of the odd stimulus. More generally, the discrepancies between V2 neuronal activity and perception may reflect the simple measure used for quantifying discrimination at the population level (the average MCDI and multidimensional scaling), as well as further neural processing between V2 and perception.
Building on recent findings that the perceptual salience of complex (multipoint) image statistics is governed by their informativeness in natural images, here we show that selective sensitivity to these image statistics arises primarily in V2. Within V2, the greatest sensitivity is in the supragranular layers, where the typical (median) neuron can distinguish between two or three of the stimulus pairs. In contrast, typical neurons in V1 do not distinguish between any of the stimuli, although there appears to be a subpopulation of large-RF neurons in infragranular V1 with a modest level of selective sensitivity. The overall pattern of neuronal sensitivity to image statistics (Figure 2D) resembles the sensitivity of human observers, driven primarily by the multipoint statistics that are visually salient.
We speculate that sensitivity to informative multipoint correlations is the computational underpinning of many of the changes in neural characteristics from V1 to V2 that have previously been noted—sensitivity to corners, junctions (Das and Gilbert, 1999), illusory contours (von der Heydt et al., 1984), figure vs ground (Qiu and von der Heydt, 2005), and ‘naturalness’ (Freeman et al., 2013). The distinction between informative and uninformative multipoint correlations emerged from a formal information-theoretic analysis of natural images (Tkačik et al., 2010). While this analysis did not relate ‘informativeness’ to these other characteristics, inspection of the examples of Figure 1 suggests several points of contact. With regard to junctions and contours, examples of the odd ensemble images contain large numbers of corners, while examples of the even ensemble contain large numbers of crossings. The extended contours evident in the even ensemble are a kind of illusory contour, since the polarity changes that define them undergo random flips, which would confound a linear edge detector. With regard to figure vs ground, stimuli in the black triangle and white triangle ensembles appear to contain, respectively, black figures on white backgrounds, vs white figures on black backgrounds—even though the stimulus sets are matched for spatial frequency content and the number of black and white checks. Thus, informative multipoint correlations result in images that are enriched for junctions, contours, and objects, compared to images that have the same first- and second-order statistics but lacking these correlations. While the extent to which these local features account for ‘naturalness’ remains for future work, the present results show that selective sensitivity of V2 neurons for informative multipoint correlations persists even when they are removed from the context of a natural image.
It is unclear to what extent it is necessary to match the scale of a multipoint correlation with that of an illusory contour or junction in order for the visual feature to be extracted. However, the distinction between informative and uninformative multipoint statistics holds over at least a fourfold range of length scales (the entire range analyzed, SI figure 14 of [Tkačik et al., 2010]). Human sensitivity to these correlations is present over at least a similar range of check sizes (0.03–0.25 deg, Figure 2, 8 of [Victor and Conte, 1989]; also [Conte et al., 2014]), comparable to the range of check sizes used in this study (0.08–0.5 deg). This broad range of sensitivities is found even when stimuli are restricted in eccentricity (Victor and Conte, 1989). Figure 2—figure supplement 1 (right column) shows that V2 sensitivity to multipoint correlations also does not require a close match between RF size and the scale of the multipoint correlation; this sensitivity is present over a threefold range of length scales (i.e., a 10-fold range of the number of checks per receptive field). Thus, it is likely that the entire range of scales relevant to perception can be accounted for by the properties of individual neurons, along with the variation in RF sizes at each eccentricity (Hubel and Wiesel, 1968).
Neurons whose RFs are difficult to map are often ignored in physiologic studies (Olshausen and Field, 2004). We were able to analyze their responses here because of the tetrode recording method, and found that many V2 neurons whose RFs could not be mapped nevertheless showed selective sensitivity to multipoint correlations. We consider some possible reasons for this here. As defined in this paper, a neuron is considered to have a mappable RF if the reverse correlation of the neuron's responses to the stimulus passes a statistical criterion (see ‘Materials and methods’). Standard practice is to use random binary stimuli for this mapping procedure (Reid et al., 1997); here we include stimuli with high-order correlations in the mapping computation. The rationale is that inclusion of these stimuli allows some kinds of nonlinear responses to emerge in a first-order cross-correlation between stimulus and response, because of correlations within the stimuli (Schmid et al., 2011). But even an expanded stimulus set may not reveal the RFs of all neurons that respond to multipoint correlations. Reverse correlations may not exceed our statistical threshold because of response variability, or because the neuron is only responsive to stimulus configurations that occur very rarely in the stimulus set. We also note that from a computational point of view, our assay for sensitivity to multipoint correlations is independent of whether the neural response is correlated with the state of any single check: for each of the ensembles that probe a different kind of multipoint correlation, the numbers of black and white checks are equated at each location. Thus, it is quite possible for a neuron to process information in a localized region of space (as manifest by its sensitivity to multipoint correlations) yet fail to have an RF that is measurable by reverse correlation methods, as we show here.
Finally, our findings carry implications for neural mechanisms. Many biologically-plausible mechanisms can extract multipoint correlations, including a simple linear-nonlinear cascade (provided that the nonlinearity is more than quadratic), and modulatory surrounds (Schmid and Victor, 2014; Self et al., 2014). But models need to account for the specificity of the responses, not just their existence. In this regard, we note (Victor and Conte, 1991) that the specificity we observe can be produced by a two-stage (linear-nonlinear-linear-nonlinear) cascade, in which the first linear-nonlinear element detects local edges, and the second one combines signals from collinear edges via a second threshold. Removal of either component of the second stage—either its linear or the nonlinear element—eliminates this specificity. The finding that responses to multipoint correlations are more prominent in supragranular V2 than in its input layers or in V1 suggests possible correspondences between this cascade and neural circuitry. One possibility is that the first stage is in V1 and the second stage is in V2 (Wilson et al., 1992; Rust et al., 2005). Alternatively, the two linear-nonlinear stages may represent two loops of signal passage through a recurrent network within a single cortical area (Joukes et al., 2014).
All procedures conformed to the guidelines provided by the US National Institutes of Health and Weill Cornell Medical College Animal Care and Use Committee. Full details concerning the physiologic preparation are provided in Schmid et al. (2014), and are summarized here.
Single-unit recordings using arrays of three to six independently positioned tetrodes (typical input resistance, 1–2 MΩ; Thomas Recording GmbH, Giessen, Germany) were made in V1 and V2 of 14 macaques, anesthetized with propofol and sufentanil and paralyzed with vecuronium or rocuronium. Tetrodes were placed on opposite sides of the V1/V2 boundary, and typically within 1 mm of each other within each region, so that the units recorded by the tetrodes generally had neighboring or overlapping RFs. This yielded a total dataset of 421 neurons (269 in V1, 152 in V2), following spike sorting and selection for firing rate criteria (see below).
Tetrodes were independently lowered until they recorded visually-driven extracellular action potentials. After initial hand-mapping, tuning properties were determined from responses to 3–4 s presentations of drifting sinusoidal gratings. Stimulus parameters were successively refined in the order of orientation, spatial frequency, temporal frequency and contrast, based on on-line analysis of the responses of the target unit. When the recorded cluster had well-isolated units that preferred an orientation other than the preferred orientation for the target unit, this process was repeated for a second, and rarely a third, orientation as well.
To determine neuronal responses to multipoint correlations, we measured responses to a sequence of black-and-white checkerboards that isolated the individual kinds of correlation. Figure 1 (top) shows three examples of these seven stimulus types. Each stimulus consisted of a 16 × 16 array of black and white checks. In the ‘random’ stimulus set, check colors were assigned independently, with an equal chance of being white or black. In the other stimulus sets, the coloring rule isolated a single kind of multipoint correlation. In the ‘even’ set, there was always an even number of white (or black) checks in any 2 × 2 neighborhood of checks. In the ‘odd’ set, there was always an odd number of white (or black) checks in a 2 × 2 neighborhood. Even and odd sets are the opposite extremes of the visually salient four-point correlation (Hermundstad et al., 2014), α. In the ‘white triangle’ set, there were always one or three white checks within a triangular region; in the ‘black triangle’ set, there were always one or three black checks within a region of the same shape. These two sets correspond to opposite extremes of the visually salient three-point correlation (Hermundstad et al., 2014), θ. We also examined responses to four-point correlations in two other spatial configurations, ‘wye’ and ‘foot.’ Multipoint correlations in the wye and foot configurations are predictable from simpler quantities in natural images (Tkačik et al., 2010), and, in keeping with predictions of efficient coding (Hermundstad et al., 2014), they are not visually salient (Victor and Conte, 1991).
Check size was scaled to the RF size of the target neuron so that approximately two checks corresponded to one lobe of the optimal spatial frequency, and orientation was set according to the orientation preference of the target neuron. This resulted in about 8 checks within the classical RF (V1: mean 7.40, median 6.00, SD 5.33; V2: mean 8.68, median 7.00, SD 5.46; statistics across all mappable units and not just the target; see below for details on RF mapping and Figure 2—figure supplement 1 for the distribution of number of checks in the RF); thus, the checks are within the resolution limits of the neuron, and the stimulus patch covers an area that is substantially larger than the RF. Across all recordings (including mappable and un-mappable units), check sizes ranged from 0.08 to 0.5 deg (V1: mean 0.18, median 0.20, SD 0.05; V2: mean 0.22, median 0.20, SD 0.12).
For each type of stimulus, we presented 1024 examples (two repetitions each) for 320 ms, interleaved in a pseudorandom sequence. This large set size was chosen so that we could distinguish average responses to each of the stimulus sets (our focus) from responses that might be driven by the specific white or black checks or edges present in particular examples (a potential confound).
Stimuli were generated via a Markov recurrence rule (Victor and Conte, 1991, 2012), so that other than the constraint of their defining multipoint correlation, they are as random as possible (maximum-entropy). This yields stimulus sets that enable testing of each kind of multipoint correlation in isolation. In each set, there are no two-point correlations—checks at any pair of locations are colored independently—so that the sets have the same power spectra, and therefore the same spatial frequency content. The four kinds of correlations (the even/odd axis, the black triangle/white triangle axis, wye, and foot) are independently controlled: each set extremizes one of these correlations, while keeping all the others at 0 (Gilbert, 1980). Thus they provide a way to assay responsiveness to each kind of multipoint correlation in isolation.
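For the even and odd sets, the recurrence can be sketched as follows. This is a minimal illustration under stated assumptions, not the stimulus-generation code used in the experiments: checks are coded as +1 (white) and −1 (black), the first row and first column are seeded with independent coin flips, and each remaining check is set so that the product over its 2 × 2 neighborhood equals +1 (even) or −1 (odd). The function names are hypothetical, and the triangle, wye, and foot rules are not covered.

```python
import random

def parity_texture(size=16, parity=+1, rng=None):
    """Return a size x size grid of +1 (white) / -1 (black) checks in which
    the product over every 2 x 2 block equals `parity`
    (+1 gives an 'even' texture, -1 gives an 'odd' texture)."""
    rng = rng or random.Random()
    grid = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == 0 or j == 0:
                # Seed row and column: independent, equiprobable checks
                grid[i][j] = rng.choice((-1, 1))
            else:
                # Recurrence: the fourth check is fixed by the other three
                grid[i][j] = (parity * grid[i - 1][j - 1]
                              * grid[i - 1][j] * grid[i][j - 1])
    return grid

def block_products(grid):
    """Set of products found over all 2 x 2 blocks of the grid."""
    n = len(grid)
    return {grid[i][j] * grid[i + 1][j] * grid[i][j + 1] * grid[i + 1][j + 1]
            for i in range(n - 1) for j in range(n - 1)}

even = parity_texture(16, +1, random.Random(0))
odd = parity_texture(16, -1, random.Random(0))
print(block_products(even), block_products(odd))  # {1} {-1}
```

A block product of +1 corresponds to an even number of black (and hence of white) checks among the four, and −1 to an odd number, matching the verbal definitions of the two sets.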
All stimuli were rendered on a 1280 × 1024-pixel display at 100 Hz, using either a 21-inch ViewSonic G225f monitor (mean luminance 47 cd/m², gamma-corrected) or a Sun GDM5410 monitor (mean luminance 46 cd/m², gamma-corrected) at 114 cm. Control signals for the displays were generated by a PC-based system using OpenGL software.
After bandpass filtering (300–9000 Hz) and thresholding, waveforms were clustered using customized versions of KlustaKwik and Klusters (Hazan et al., 2006); details as in Schmid et al. (2014). The 17 features consisted of peak and trough amplitudes (8 features), the first 8 principal components, and time. All neurons whose mean firing rates across all stimuli were ≥ 1 Hz were analyzed for their responsiveness to the multipoint correlation stimuli described above.
To classify extracellular spike waveforms as narrow-spiking (putative inhibitory) and broad-spiking (putative excitatory), we used a method similar to that of Mitchell et al. (2007) and Niell and Stryker (2008). For each single unit, the waveforms from each tetrode channel were averaged and the channel with the largest signal-to-noise ratio (SNR) was selected for the spike width measurement. Two parameters of spike width were measured: (1) trough-to-peak width, the duration from the trough to the peak of the waveform, and (2) half-peak width, the duration from the peak of the waveform to half its height. The distributions of both measurements across the 1856 waveforms from the laboratory database were significantly bimodal (p < 0.01 by the Hartigan dip test [Hartigan and Hartigan, 1985]). Based on the notch in the distribution, we classified extracellular waveforms as narrow-spiking (<405 µs) and broad-spiking (>430 µs). Next, the averaged waveforms themselves were clustered using k-means. Clustering the waveforms by k-means and splitting on the distribution of the spike width parameters separated the units identically.
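The trough-to-peak measurement and the resulting classification can be sketched as below. Several details are assumptions made for illustration only: the peak is taken as the maximum following the trough, the 405 µs and 430 µs cutoffs are applied to the trough-to-peak width, widths falling between the two cutoffs are left unclassified, and the 40 kHz sampling rate in the example is hypothetical.

```python
def trough_to_peak_us(waveform, sample_rate_hz):
    """Duration (in microseconds) from the waveform trough to the
    subsequent peak, for an averaged waveform sampled at sample_rate_hz."""
    trough = min(range(len(waveform)), key=waveform.__getitem__)
    # Assumption: the relevant peak is the maximum after the trough
    peak = max(range(trough, len(waveform)), key=waveform.__getitem__)
    return (peak - trough) / sample_rate_hz * 1e6

def classify_width(width_us, narrow_below=405.0, broad_above=430.0):
    """Label a spike width using the two cutoffs quoted in the text."""
    if width_us < narrow_below:
        return "narrow"
    if width_us > broad_above:
        return "broad"
    return "unclassified"  # falls in the notch between the two modes

# Toy averaged waveform: trough at sample 10, peak at sample 20
waveform = [0.0] * 40
waveform[10] = -1.0
waveform[20] = 0.5
width = trough_to_peak_us(waveform, sample_rate_hz=40_000)
print(round(width), classify_width(width))  # 250 narrow
```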
At the conclusion of the experiment, we made small lesions at locations that bracketed the recording sites along each tetrode track, via current passage through the most distal tetrode contact. Details concerning the procedures for lesions, perfusion, and histology are in ref. (Schmid et al., 2014). For sites for which the laminar assignment was uncertain, neurons were included in the tallies for V1 and V2 (e.g., top rows of each panel of Figure 2) and the statistical comparisons between them, but not in the breakdown by layer or statistical comparisons between layers. This amounted to <10% of the units.
Tuning curves were computed in the standard fashion from the Fourier components of the spike train elicited by each grating stimulus, as detailed in Schmid et al. (2014). Tuning curve peaks were determined from the DC response (F0) or the first harmonic (F1), whichever was larger. We classified neurons as simple or complex according to whether their response to a drifting grating was primarily at the period of the grating (simple) or primarily a maintained elevation (complex), as quantified by the F1/F0 ratio (Skottun et al., 1991). F1 is the first harmonic of the response to the optimal grating tested; F0 is the maintained firing rate of the response, after subtraction of the average firing rate in response to a uniform field at the mean illumination. Note that since grating parameters were chosen according to the preference of neurons whose waveforms could be discriminated online, some neurons may not have been stimulated at the optimal orientation or spatial frequency.
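The F1/F0 computation can be illustrated with a short sketch. It assumes a firing-rate trace sampled evenly over an integer number of grating cycles, takes F1 as the amplitude of the Fourier component at the grating's temporal frequency, and uses the conventional cutoff of F1/F0 > 1 for simple cells; that cutoff is not stated in the text above and is an assumption here, as are the function names.

```python
import cmath
import math

def f1_f0_ratio(rate, cycles, baseline=0.0):
    """F1/F0 for a firing-rate trace spanning `cycles` grating periods.

    rate: evenly sampled firing rate (spikes/s)
    baseline: mean rate for a uniform field, subtracted from F0
    """
    n = len(rate)
    f0 = sum(rate) / n - baseline
    if f0 <= 0:
        return math.inf  # no maintained elevation above baseline
    # Fourier coefficient at the grating's temporal frequency
    coeff = sum(r * cmath.exp(-2j * math.pi * cycles * k / n)
                for k, r in enumerate(rate)) / n
    f1 = 2 * abs(coeff)  # amplitude of the first harmonic
    return f1 / f0

def classify_cell(ratio, cutoff=1.0):
    """Assumed convention: ratio above the cutoff means 'simple'."""
    return "simple" if ratio > cutoff else "complex"

n = 400
# Half-wave rectified modulation (simple-like) vs. elevated, weakly modulated
simple_like = [max(0.0, math.sin(2 * math.pi * 4 * k / n)) for k in range(n)]
complex_like = [10 + math.sin(2 * math.pi * 4 * k / n) for k in range(n)]
print(classify_cell(f1_f0_ratio(simple_like, cycles=4)),
      classify_cell(f1_f0_ratio(complex_like, cycles=4)))  # simple complex
```

For an ideal half-wave rectified sinusoid the ratio is π/2, which is why a cutoff of 1 separates the two toy cases.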
RF maps were determined by correlating the neural response with the checkerboard stimuli (16 × 16 checks; 1 for white checks, −1 for black checks). The response measure was the total number of spikes over the duration of each presentation (320 ms) averaged across both repetitions; this is equivalent to computing the spike-triggered average and then summing over the stimulus duration. Maps were separately computed for each of the seven stimulus types; as reported previously (Schmid et al., 2012), some neurons that did not have mappable RFs for random checkerboards nevertheless had mappable RFs for the other stimulus types. Statistical significance for each of these seven maps was determined by a shuffle test: we recomputed maps from 500 surrogate data sets in which the responses to each stimulus type were permuted, determined the mean and standard deviation of these surrogate maps at each check, and then used the corresponding Gaussian distribution to determine which actual map values were significant at p < 0.05 (two-tailed, correcting for multiple comparisons via the Benjamini-Hochberg method, that is, the false discovery rate [FDR] method) (Benjamini and Hochberg, 1995). We then determined the union of the seven maps obtained from each stimulus. Usually this yielded a single connected component, and the RF was taken to be its convex hull. When more than one connected component was present, smaller components were merged with the largest one if they were separated by no more than a single check, and the convex hull of the resulting region was taken as the RF. The number of checks in this convex hull was taken as the measure of RF size. If none of the seven classes of stimuli yielded a significant RF map by the above criteria, the neuron was considered not to have a mappable RF. As an alternative procedure, we also computed RF maps by correlating the responses with all (7 × 1024) stimuli, and this yielded very similar results.
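The core of the mapping procedure (reverse correlation followed by a Gaussian approximation to the shuffle distribution) can be sketched as follows. This is a simplified illustration under stated assumptions: a 4 × 4 toy stimulus stored as a flat list of ±1 values, a simulated neuron driven by one check, and 200 rather than 500 surrogates; the FDR correction, the union across stimulus types, and the convex-hull step are omitted. All function names are hypothetical.

```python
import math
import random

def rf_map(stimuli, responses):
    """Response-weighted average of the stimuli, one value per check."""
    n = len(stimuli)
    n_checks = len(stimuli[0])
    return [sum(r * s[c] for s, r in zip(stimuli, responses)) / n
            for c in range(n_checks)]

def shuffle_z_scores(stimuli, responses, n_surrogates=500, rng=None):
    """z-score of each map value against maps from permuted responses."""
    rng = rng or random.Random()
    actual = rf_map(stimuli, responses)
    total = [0.0] * len(actual)
    total_sq = [0.0] * len(actual)
    shuffled = list(responses)
    for _ in range(n_surrogates):
        rng.shuffle(shuffled)  # break the stimulus-response pairing
        for c, v in enumerate(rf_map(stimuli, shuffled)):
            total[c] += v
            total_sq[c] += v * v
    z = []
    for c, v in enumerate(actual):
        mean = total[c] / n_surrogates
        var = max(total_sq[c] / n_surrogates - mean * mean, 1e-12)
        z.append((v - mean) / math.sqrt(var))
    return z

def two_tailed_p(z):
    """Two-tailed p-value under a standard Gaussian."""
    return math.erfc(abs(z) / math.sqrt(2))

# Toy data: 400 random 4 x 4 stimuli; the simulated neuron follows check 5
rng = random.Random(1)
stimuli = [[rng.choice((-1, 1)) for _ in range(16)] for _ in range(400)]
responses = [5 + 3 * s[5] + rng.gauss(0, 1) for s in stimuli]
z = shuffle_z_scores(stimuli, responses, n_surrogates=200, rng=rng)
best = max(range(16), key=lambda c: abs(z[c]))
print(best, two_tailed_p(z[best]) < 0.05)
```

In a full analysis the per-check p-values would then be passed through a false-discovery-rate correction before thresholding.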
To measure sensitivity to multipoint correlations, we proceeded as follows (see Figure 3). For each of the stimulus types, we accumulated a PSTH across all 1024 examples (and 2 repeats), and then determined the smoothed firing rate via local linear regression (Loader, 2012). Significance of the difference between two firing rate functions was determined by a shuffle test, in which 3000 surrogate data sets were created by randomly exchanging responses among a pair of stimulus types. The exchanges were limited to responses that were recorded in adjacent trials (within 4 s of each other), to avoid confounds due to slow changes in firing rate over time. The difference between the smooth firing rates of the actual data was compared to the distribution of differences seen in the surrogate datasets at each 5 ms bin, from 55 to 250 ms. The number of times the actual difference was exceeded by any of the 3000 surrogates yielded a raw two-tailed p-value at each of these 40 time points. If the raw p-value was below the false-discovery-corrected threshold of p = 0.05, the neuron was considered to have a different response to the two kinds of stimuli at that time point. For each neuron, the MCDI at each time point (Figure 2A) was defined as the fraction of stimulus pairs that elicited statistically different responses as determined by the above procedure; the MCDI was therefore n/21, where n is the number of stimulus pairs that elicited statistically different responses. For each of the seven stimuli, we also calculated a stimulus-specific MCDI, considering only the six pairs of discriminations involving that particular stimulus (Figure 2B); this was therefore a quantity n/6. Finally, to detail the pattern of pairwise discriminations (Figure 2C,D), we computed a ‘pair-specific’ MCDI—either 0 (no discrimination) or 1 (discrimination), and averaged it across the population. 
For this purpose, we considered a neuron to distinguish a pair of stimuli if a difference was present at any time during the 55–250 ms period, again using the above statistical criteria.
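The index definitions above reduce to simple counting once the pairwise significance tests have been run. The following Python sketch, with hypothetical function and argument names, takes the set of stimulus pairs judged significantly different at one time point and returns the overall MCDI (n/21 for seven stimulus types) together with the stimulus-specific MCDI for each stimulus (n/6).

```python
from itertools import combinations


def mcdi(significant_pairs, n_stimuli=7):
    """Compute the overall and stimulus-specific MCDI at one time point.

    significant_pairs: iterable of (i, j) stimulus-index pairs whose
        responses differed significantly (order within a pair is ignored).
    Returns (overall, per_stimulus), where overall is the fraction of all
    pairs that were discriminated and per_stimulus[s] is the fraction of
    the pairs involving stimulus s that were discriminated.
    """
    sig = {frozenset(pair) for pair in significant_pairs}
    all_pairs = list(combinations(range(n_stimuli), 2))  # 21 pairs when n_stimuli = 7
    overall = sum(frozenset(p) in sig for p in all_pairs) / len(all_pairs)
    per_stimulus = [
        sum(frozenset((s, t)) in sig for t in range(n_stimuli) if t != s)
        / (n_stimuli - 1)
        for s in range(n_stimuli)
    ]
    return overall, per_stimulus
```

The pair-specific MCDI for a neuron is then just the membership test `frozenset((i, j)) in sig`, evaluated over the whole 55–250 ms window and averaged across the population.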
Sensitivity to multipoint correlations was not associated with the simple vs complex distinction (as measured by F1/F0 ratio, with a dividing point at 1 [as in ref. Mechler and Ringach, 2002] or at the population median). These and other comparisons between subsets of cells (e.g., V1 vs V2, or between laminar compartments) were carried out using a two-tailed Wilcoxon rank-sum test. The raw p-values were subjected to false discovery correction (Benjamini and Hochberg, 1995) across time points, in 5 ms bins from 55 ms to 250 ms. Statistical significance corresponds to p < 0.05.
To visualize the population pattern of differential responses (Figure 2D), we used standard multidimensional scaling (Kruskal and Wish, 1978), applied to the fraction of neurons that distinguished between each pair of correlation types. The first two embedding dimensions (as shown in Figure 2D) typically accounted for > 90% of the variance.
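To make the embedding step concrete, here is a minimal Python sketch of classical (Torgerson) multidimensional scaling applied to a symmetric dissimilarity matrix, such as the matrix of fractions of neurons distinguishing each pair of correlation types. The text says only that standard multidimensional scaling was used, so the choice of the classical variant is an assumption. Power iteration stands in for a library eigensolver to keep the example dependency-free, and it assumes the leading eigenvalues are positive and well separated.

```python
import math
import random


def classical_mds(dissim, n_dims=2, n_iter=500, seed=0):
    """Embed items in n_dims dimensions from a symmetric dissimilarity matrix.

    Returns (coords, explained): coords[i] is the position of item i, and
    explained[k] is the share of the trace of the centred matrix carried
    by embedding dimension k.
    """
    n = len(dissim)
    # Double-centre the squared dissimilarities: B = -1/2 * J * D^2 * J.
    sq = [[dissim[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    trace = sum(b[i][i] for i in range(n))

    rng = random.Random(seed)
    coords = [[0.0] * n_dims for _ in range(n)]
    explained = []
    for k in range(n_dims):
        # Power iteration for the dominant eigenvector of the current B.
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the signed eigenvalue for this direction.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        explained.append(eigval / trace if trace > 0 else 0.0)
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords, explained
```

Summing the first two entries of `explained` gives a quantity analogous to the variance accounted for by the first two embedding dimensions; for dissimilarities that are not exactly Euclidean, the trace can include negative eigenvalues, so this share is only approximate.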
Sensory communication. Cambridge: MIT Press.
Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, Statistical Methodology 57:289–300.
Sensitivity to local image statistics is (almost) scale-invariant. Vision Sciences Society Annual Meeting.
A simple model of optimal population coding for sensory systems. PLOS Computational Biology 10:e1003761. https://doi.org/10.1371/journal.pcbi.1003761
A functional and perceptual signature of the second visual area in primates. Nature Neuroscience 16:974–981. https://doi.org/10.1038/nn.3402
Random colorings of a lattice of squares in the plane. SIAM Journal on Algebraic Discrete Methods 1:152–159. https://doi.org/10.1137/0601018
Klusters, NeuroScope, NDManager: a free software suite for neurophysiological data processing and visualization. Journal of Neuroscience Methods 155:207–216. https://doi.org/10.1016/j.jneumeth.2006.01.017
Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology 195:215–243. https://doi.org/10.1113/jphysiol.1968.sp008455
Motion detection based on recurrent network dynamics. Frontiers in Systems Neuroscience 8:239. https://doi.org/10.3389/fnsys.2014.00239
Multidimensional scaling. SAGE Publications, Inc.
Handbook of computational statistics. Springer Berlin Heidelberg.
Highly selective receptive fields in mouse visual cortex. The Journal of Neuroscience 28:7520–7536. https://doi.org/10.1523/JNEUROSCI.0623-08.2008
23 Problems in Systems Neuroscience. Oxford University Press.
Responses to orientation discontinuities in V1 and V2: physiological dissociations and functional implications. The Journal of Neuroscience 34:3559–3578. https://doi.org/10.1523/JNEUROSCI.2293-13.2014
Mapping receptive fields using stimuli with third- and fourth-order statistics: black blobs better than random. Washington, DC: Society for Neuroscience.
Mapping receptive fields using stimuli with high-order statistics in V1 and V2. Annual meeting for Society for Neuroscience, October 16, 2012, New Orleans, LA, Society for Neuroscience.
Orientation-tuned surround suppression in mouse visual cortex. The Journal of Neuroscience 34:9290–9304. https://doi.org/10.1523/JNEUROSCI.5051-13.2014
Classifying simple and complex cells on the basis of response modulation. Vision Research 31:1079–1086.
Predictive coding: a fresh view of inhibition in the retina. Proceedings of the Royal Society of London Series B, Biological Sciences 216:427–459. https://doi.org/10.1098/rspb.1982.0085
Local statistics in natural scenes predict the saliency of synthetic textures. Proceedings of the National Academy of Sciences of USA 107:18149–18154. https://doi.org/10.1073/pnas.0914916107
Cortical interactions in texture processing: scale and dynamics. Visual Neuroscience 2:297–313. https://doi.org/10.1017/S0952523800001218
Spatial organization of nonlinear interactions in form perception. Vision Research 31:1457–1488. https://doi.org/10.1016/0042-6989(91)90125-O
Local image statistics: maximum-entropy constructions and perceptual salience. Journal of the Optical Society of America A, Optics, Image Science, and Vision 29:1313–1345. https://doi.org/10.1364/JOSAA.29.001313
Timothy Behrens, Reviewing Editor; Oxford University, United Kingdom
eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.
Thank you for sending your work entitled “Visual processing of informative multipoint correlations arises primarily in V2” for consideration at eLife. Your article has been favorably evaluated by Timothy Behrens (Senior editor and Reviewing editor) and three reviewers, one of whom, Michael Landy, has agreed to share his identity.
The editor and the reviewers discussed their comments before we reached this decision, and the editor has assembled the following comments to help you prepare a revised submission.
The editor and reviewers agree that the finding of neural correlates of multipoint correlations reflects an important advance over your previous behavioural findings and are enthusiastic about the potential publication of this Research Advance.
This paper takes the previous work of this group on which 2-, 3- and 4-point correlations are visually salient and to which the visual system is sensitive, and shows that some differential responses to these correlations arise first in area V2. As the authors are aware, I'm pretty familiar with this line of work (having reviewed several of the earlier papers in the series). Tying this story to the physiology is certainly a logical and useful next step.
In the current manuscript Yu et al. add to previously published findings that human observers are more sensitive to more informative multi-point correlations in images. Here they provide a candidate for the neural substrate of this sensitivity: supragranular layers of V2. I think the manuscript is an interesting addition to the previous paper.
Overall, this Research Advance is clearly written and nicely complements the founding article by providing neuronal correlates for multipoint correlation stimuli that have theoretical significance and perceptual relevance.
However, there were several questions that the review panel would like addressed before we could consider publication of the study.
During discussion the panel agreed that the most critical issue to address before the paper can be published is the issue of the scaling of the stimuli.
Since receptive fields are larger in V2 and you are adjusting the stimulus to the receptive field size, aren't you effectively presenting two different stimulus ensembles to the V1 and the V2 population, respectively? Could that explain the differential response between V1 and V2? Do you have control data, where you, instead of upscaling the 16x16 patch, simply increased the number of pixels to match the receptive field size? Alternatively, you could drive the V1 population with the upscaled stimuli for the V2 neurons and see whether your results change. If you do not have such data readily to hand, do you have other means of ruling out that the stimulus scaling confounds the results? For example, are there recorded V1 cells whose receptive field completely overlapped with that of a V2 cell? In that case, you would have responses to the same stimulus ensemble, but from two different areas. The review panel agreed that this issue should be rigorously addressed.
If such data do not exist, then the review panel asks you to remove the claim of a distinction between V1 and V2 from the paper since the data does not really support a comparison, and to explicitly mention that V1 and V2 are stimulated with differently scaled stimuli and explain the reasoning behind it.
A related question about the relationship between V1 and V2 coding was also raised:
Figure 2A shows that at least 75% of V1 cells have no selectivity for multipoint correlation stimuli, yet the mean MCDI of all cells is ∼0.05. This implies that V1 has a small population of V1 cells that have an MCDI of 0.2 or more. How do the response properties of these “special” V1 neurons compare to a “typical” V2 neuron? With the current presentation, it's hard to tell whether V2's representation of multipoint correlations is new or enhances a representation already present in a small subpopulation of V1. As such, the first sentence of the Discussion, which says “arises primarily in V2” seems imprecise. This distinction is also relevant to the authors' Discussion hypothesis that higher-order correlation specificity might emerge from a two-stage cascade from V1 to V2.
The reviewers were also concerned about the illusory contour figure:
At present, the connections between higher-order correlations and previously hypothesized roles for V2 seem tenuous and insufficiently detailed to warrant a full figure in the main text.
The authors use Figure 3 to argue that V2's selectivity to multipoint correlations helps explain its involvement in the detection of illusory contours and the discrimination of figure and ground. I have two comments. First, in panel a, the “even and odd” correlation structure picks out the corners of the black bars. As such, the association of this correlation with the illusory contour is a consequence of the fact that the bars are spaced by the same distance that defined the “even and odd” correlation structure. Can the authors say anything about whether there is an association between the spatial scales of illusory contour detection and the “even and odd” correlation structure? For example, do humans perceive illusory contours over the same length scales that “even and odd” correlation structures are informative for natural images? Has V2 previously been shown to be sensitive to illusory contours on the spatial scale that the authors use in the “even and odd” correlation structure? Second, the “white and black triangle” correlation structure is the only correlation structure that can distinguish between the two stimuli in panel b because it's the only odd-ordered correlation. This is why I earlier alluded to the point that it would have been helpful to include an uninformative third-order correlation stimulus. Also, in the specific example shown, couldn't one just use the mean (i.e. a first-order structure) to discriminate between the stimuli? I wonder if there might be a better choice of stimuli for this panel.
The reviewers also had several questions that we believe can be addressed by changes to the manuscript text.
In Hermundstad et al. 2014 the second order stimuli (beta) are more informative than the fourth order stimuli (alpha), which are more informative than the third order stimuli (theta). However, the authors only present data for alpha and theta stimuli (that were less informative in the Hermundstad paper) while not presenting the beta stimuli (that were more informative in the Hermundstad paper). I would like to know whether the authors (i) performed experiments with beta stimuli, (ii) if so why they did not report them, or (iii) why they did not consider them. I am sure the authors had a good reason which should be mentioned in the paper.
It would be nice if you could mention more explicitly how the rank order of the MCDI relate to the sensitivity order found in the psychophysical experiments of the previous paper.
I was interested by the finding that the MCDI was unassociated with mappable receptive fields and would be interested in hearing some thoughts from the authors in their Discussion section.
Can the authors clarify how they determined the p-value threshold in Figure 4D? Since the authors declare significance “if any of these p-values” falls below the FDR threshold, is the Benjamini-Hochberg correction equivalent to a Bonferroni correction? If so, then 40 comparisons would lead to a p-value correction less than that displayed. Or do the authors somehow correct for the fact that their temporal smoothing effectively leads to fewer than 40 comparisons?
There were also questions about the underlying assumptions in the model. We understand that these comments pertain equally to the already-published paper, but we nevertheless hope that you will be able to deal with them in a few sentences, which we felt would help the current manuscript.
What justifies the authors to assume the regime of sampling limitation rather than transmission limitation? You write that your results fit into the efficient coding framework if sampling an image is the main limitation. What is the empirical evidence that justifies this assumption as opposed to the transmission limited regime many other studies are based on?
If I understand correctly, the fact that humans/neurons should be more sensitive to more variable features is derived using a linear model with Gaussian input and channel noise. However, the mapping from images to multi-point correlations does not seem to be linear. How do you know that this result still holds in the non-linear case, in particular if the sampling regime is characterized by dominating input noise (which would get nonlinearly transformed)?
How do you justify that more variable features contain more information? In the discrete case, I can see that. However, in the limit of infinitely many images, the multi-point correlations become continuous. In that case I could transform all features by a pointwise monotonic transformation (histogram equalization) that would not change the information content but make all features equally variable. https://doi.org/10.7554/eLife.06604.007
- Jonathan D Victor
The funder had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
This work was supported by NIH EY09314 to JV. We thank Ferenc Mechler, Ifije Ohiorhenuan, Qin Hu and Eyal Nitzany for their assistance with the physiological experiments, Mary Conte and Keith Purpura for many helpful discussions, and Ann Hermundstad for her comments on the manuscript. We also thank Daniel Thengone for his help classifying spike waveforms into narrow- and broad-spiking. A portion of this work was reported at the 2013 meeting of CoSyNe (Computational and Systems Neuroscience), Salt Lake City, UT.
Animal experimentation: This study was performed in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. All animal procedures were approved by the Institutional Animal Care and Use Committee (IACUC) of Weill Cornell Medical College, protocol 0810-795A.
- Timothy Behrens, Reviewing Editor, Oxford University, United Kingdom
© 2015, Yu et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.