1. Neuroscience

Mouse V1 population correlates of visual detection rely on heterogeneity within neuronal response patterns

  1. Jorrit S Montijn (corresponding author)
  2. Pieter M Goltstein
  3. Cyriel MA Pennartz (corresponding author)
  1. University of Amsterdam, Netherlands
  2. Max Planck Institute of Neurobiology, Germany
Research Article
Cite as: eLife 2015;4:e10163 doi: 10.7554/eLife.10163

Abstract

Previous studies have demonstrated the importance of the primary sensory cortex for the detection, discrimination, and awareness of visual stimuli, but it is unknown how neuronal populations in this area process detected and undetected stimuli differently. Critical differences may reside in the mean strength of responses to visual stimuli, as reflected in bulk signals detectable in functional magnetic resonance imaging, electro-encephalogram, or magnetoencephalography studies, or may be more subtly composed of differentiated activity of individual sensory neurons. Quantifying single-cell Ca2+ responses to visual stimuli recorded with in vivo two-photon imaging, we found that visual detection correlates more strongly with population response heterogeneity than with overall response strength. Moreover, neuronal populations showed consistencies in activation patterns across temporally spaced trials in association with hit responses, but not during nondetections. Contrary to models relying on temporally stable networks or bulk signaling, these results suggest that detection depends on transient differentiation in neuronal activity within cortical populations.

https://doi.org/10.7554/eLife.10163.001

eLife digest

Seeing is not the same as perceiving, where an object is recognized and information about it is interpreted by the brain. Things might be in your field of view, but not actively perceived; for example, when daydreaming with your eyes open. Many researchers have investigated how the brain responds differently to a perceived object compared with something that is seen but not perceived. However, using relatively coarse techniques, only small differences in brain activity have been found.

Many of the techniques used to investigate brain activity only look at the average activity of a group of neurons – the cells in the brain that process information. This raises the possibility that the perception of an object relies on more subtle or complex interactions in brain activity. To investigate this, Montijn et al. trained mice to lick a reward spout that gave out sugar water when they perceived a particular image. A technique called two-photon calcium imaging was then used to simultaneously record the activity of tens to hundreds of neurons in part of the brain called the visual cortex as the mice performed the perception task.

This revealed that the average activation of a group of neurons was only weakly related to whether a mouse had perceived the image. However, differences in the strength of the responses of the individual neurons in the group reflected perception more strongly: when a mouse perceived the image and licked in response, a heterogeneous (non-uniform) set of neuronal responses occurred. The diversity of the neuronal responses could also be used to predict how quickly a mouse would respond to an image. These activity differences would not be picked up by techniques that detect the average activity of many neurons, explaining why these effects had not previously been seen.

These findings shed light on which patterns of activity in the visual region of the brain lead to objects being perceived or not. Whether similar mechanisms operate in different regions of the brain remains to be investigated.

https://doi.org/10.7554/eLife.10163.002

Introduction

Lesion studies in humans and animals indicate the causal importance of the primary visual cortex (V1) in detection, discrimination, and awareness of visual stimuli (Lashley, 1943; Weiskrantz et al., 1974; Weiskrantz, 1996), and this role has been recently confirmed by direct optogenetic inhibition of mouse V1 (Glickfeld et al., 2013). Visual perception has been proposed to arise from interactions between stimulus-specific processing in V1 and neural activity in higher visual and frontoparietal areas, involving both feed-forward propagation of activity and recurrent, top-down feedback (Shadlen and Newsome, 1996; Britten and van Wezel, 1998; Lamme et al., 2000; Haynes et al., 2005). Critical in unraveling neural correlates of vision is how detected and undetected stimuli are processed differently, especially when these stimuli are physically identical. For instance, it has been suggested that the intensity, duration, and reproducibility of sensory neural activity may provide signatures critical for visual perception (e.g. Moutoussis and Zeki, 2002; Schurger et al., 2010). In addition, it has been proposed that neural activity in V1 does not correlate with visual perception because stimuli that were seen or not seen evoked similar V1 blood-oxygenation-level-dependent signals (Vuilleumier et al., 2001; Rees, 2000), but this remains an area of substantial controversy (Ress and Heeger, 2003; Palmer et al., 2007; Nienborg and Cumming, 2014). In this context, it is important to recall that functional magnetic resonance imaging (fMRI), electro-encephalogram (EEG), and magnetoencephalography (MEG) rely on a mean-field approach, leaving open the possibility that neural correlates of perception may be coded in more subtle ways that take into account the local differentiation present in populations of sensory neurons.

Such local, functional differentiation is supported by single- or multiunit recording studies in visual, auditory, and somatosensory areas of animals trained to make perceptual decisions (Logothetis et al., 1995; Britten et al., 1996; Posner and Gilbert, 1999; Petersen, 2002; Luna et al., 2005; Palmer et al., 2007; Mitchell et al., 2009; Cohen and Maunsell, 2009; Cohen and Maunsell, 2011; Sachidhanandam et al., 2013; Chen et al., 2013; Miyashita and Feldman, 2013; Doron et al., 2014; McGinley et al., 2015). Over the last decade, it has become clear that the shared response variability between neurons (i.e. noise correlation) might be particularly important for sensory processing because noise correlations can influence the amount of information that can be extracted from neuronal population codes (Averbeck et al., 2006; Cafaro and Rieke, 2010). Furthermore, it has been observed that these correlations can be reduced during stimulus presentation (Gutnisky and Dragoi, 2008; Snyder et al., 2014) and directed attention, which may aid in disentangling stimulus information from noisy population responses (Mitchell et al., 2009; Cohen and Maunsell, 2009; Herrero et al., 2013).

Although noise correlations have been well studied, they have the drawback of not being an instantaneous measure—their computation requires integrating neural activity over multiple time points or stimulus repetitions. Instantaneous aspects of population activity in cortex, such as temporal spike co-occurrence and population sparseness, seem critical for efficient neural coding (Olshausen and Field, 1997; Vinje, 2000; Benucci et al., 2013; Harris and Mrsic-Flogel, 2013). Some population-based measures have been proposed and tested in somatosensory and auditory cortex (Romo et al., 2003; Safaai et al., 2013; Carnevale et al., 2013; Buran et al., 2014). It has, for example, been shown that measures based on the variability of and correlations between neurons correlate better with the animal’s decision than simpler approaches based on the mean spiking rate (Safaai et al., 2013; Carnevale et al., 2013). However, in the domain of visual perception, the behavioral relevance of only a few population measures has been experimentally tested in paradigms where animals report behaviorally whether they have seen a stimulus or not.

Therefore, we investigated correlates of visual stimulus detection using two-photon calcium imaging of populations of ~100 neurons in V1 L2/3 of mice performing a detection task, as superficial layers are easy to access with calcium imaging and have been reported to show neural correlates of stimulus detection (van der Togt, 2006; Ito and Gilbert, 1999). Our first aim was to examine whether visual detection correlates with the mean visual response strength of V1 neurons or rather with other metrics of population responses, such as noise correlation or variance. This led us to develop a novel population metric—response heterogeneity—that, by capturing the dissimilarity of neuronal responses within a population, correlates better with stimulus detection performance, and particularly with the animal’s reaction time, than traditional measures. Second, an assumption in many computational models of vision is that neurons in distributed cortical architectures have relatively fixed roles in encoding visual features, but modulate their activation in a temporally dynamic manner based on attentional needs that can influence perception (e.g. Jones and Palmer, 1987; Itti et al., 1998; Desimone, 1998; Dayan and Abbott, 2001; Deco and Rolls, 2004; Reynolds and Heeger, 2009). To study whether modulations of neuronal activity that influence stimulus perception show temporally recurring patterns, we asked whether population activation patterns are more similar across trials that repeat the same stimulus presentation when the stimulus is successfully detected.
We report that (1) visual stimulus detection does not correlate well with mean response strength, but is significantly correlated with population heterogeneity; (2) neuronal populations show consistencies in activation patterns across temporally spaced trials in association with hit responses, but not when the animal fails to report a stimulus; and (3) in addition to heterogeneity, multidimensional structures in neuronal population responses provide information on visual detection.

Results

To investigate how ensembles of primary visual cortex (V1) neurons are involved in visual detection, we trained mice to perform a go/no-go stimulus detection task (Figure 1a). After task acquisition, we performed two-photon calcium imaging in V1 contralateral to the visually stimulated eye (Figure 1—figure supplement 1). Animals were awake, head-fixed, and performed a detection task in which they indicated by licking whether a square-wave drifting grating was presented. Stimulus duration was delimited by the onset of the first licking response, with a maximum of 3.0 s when no response occurred; therefore, no licks occurred during presentation of the stimulus. To acquire a sufficient range of hit/miss ratios, we presented test stimuli with different luminance contrasts: 0.5%, 2%, 8%, and 32%. These test trials were interleaved with 0% no-contrast and 100% full-contrast probe trials to estimate the animals’ rates of false alarms and of omissions due to lack of motivation. For all analyses, we discarded trials in which animals responded within 150 ms after stimulus onset (0.3–3.5% of trials per animal) because such fast responses may be ascribed to spontaneous licking.

Figure 1 with 1 supplement
Mice perform a go/no-go task during in vivo calcium imaging.

(a) Task schematic showing the time course of a single trial. In each trial, one combination of eight different directions and five contrasts, or a 0% contrast probe trial (isoluminant gray blank screen), was presented (ratio 1:5 of 0%-contrast-probe:stimulus trials). When mice made a licking response during stimulus presentation, the visual stimulus was turned off and sugar water was presented. (b) Schematic of the experimental setup. During task performance, we recorded eye movements with an infrared-sensitive camera, licking responses, and running on a treadmill. (c) All eight animals performed statistically significant stimulus detection during neural recordings, as quantified by non-overlapping 2.5th–97.5th Clopper–Pearson (CP) percentile confidence intervals (95% CI) (p<0.05) of behavioral response proportions for 0% and 100% contrast probe trials. (d) Example of simultaneously recorded behavioral measures, population heterogeneity, mean population dF/F0, and traces of the neurons labeled in panel (b). Vertical colored bars represent stimulus presentations; width, color, and saturation represent duration, orientation, and contrast, respectively. (e, f) Animals showed significant increases in behavioral response (behav. resp.) proportion (e; linear regression analysis, see ‘Materials and methods’, p<0.001) and reductions in reaction time (f; p<0.01) with higher stimulus contrasts. Shaded areas show the standard error of the mean. Statistical significance: **p<0.01; ***p<0.001.

https://doi.org/10.7554/eLife.10163.003

To quantify behavioral performance during execution of the task, we calculated the 2.5th–97.5th percentile intervals [henceforth 95% confidence intervals (CIs)] of response proportions to the two types of probe trials: no-contrast and full-contrast stimuli. All eight animals (see ‘Materials and methods’) showed significantly above-chance visual detection of square-wave drifting gratings during the acquisition of neural data (Figure 1c) (non-overlapping Clopper–Pearson 95% CIs). Behavioral response proportions increased with higher stimulus contrasts (Figure 1e) (group-level linear regression analysis, p<0.001) and mean reaction times decreased (Figure 1f) (p<0.005).
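
The probe-trial comparison above can be reproduced with a short sketch. Below is a minimal, self-contained implementation of exact Clopper–Pearson intervals, computed by bisection on the binomial tail so that only the standard library is needed; the trial counts (4/40 blank-probe responses, 38/40 full-contrast responses) are illustrative, not values from the paper.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _bisect(f, tol=1e-9):
    # f is True on [0, boundary) and False beyond; return the boundary.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) 95% CI for k successes out of n trials."""
    # lower bound: the p at which P(X >= k | p) equals alpha/2
    lower = 0.0 if k == 0 else _bisect(lambda p: 1 - binom_cdf(k - 1, n, p) <= alpha / 2)
    # upper bound: the p at which P(X <= k | p) equals alpha/2 (bisect on q = 1 - p)
    upper = 1.0 if k == n else 1.0 - _bisect(lambda q: binom_cdf(k, n, 1 - q) <= alpha / 2)
    return lower, upper

# Detection counts as significant when the two probe-trial CIs do not overlap.
ci_blank = clopper_pearson(4, 40)    # illustrative: 4 responses in 40 blank probes
ci_full = clopper_pearson(38, 40)    # illustrative: 38 responses in 40 full-contrast probes
significant = ci_blank[1] < ci_full[0]
```

With these illustrative counts the blank-probe upper bound falls well below the full-contrast lower bound, mirroring the non-overlap criterion used in Figure 1c.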

Response dissimilarity within neuronal populations correlates with detection

As a first approach to examine population correlates of visual detection, we investigated differences in mean activity levels between hit and miss trials (Figure 2). We defined each neuron’s response during a trial as the mean dF/F0 during the entire stimulus presentation (Figure 2a,b). Because hit/miss differences would arguably be stronger in the population of neurons that prefer the features of the visual stimulus, we began by investigating neural correlates of detection in the preferred population. We, therefore, calculated each neuron’s preferred stimulus orientation (see ‘Materials and methods’), and for the analysis in Figure 2c,d took for each trial the responses of only those neurons that preferred the presented stimulus orientation (henceforth ‘preferred population’). In Figure 2c, all trials of a single animal were grouped by stimulus contrast and behavioral response [hit/false-alarm (‘response’) or miss/correct-rejection (‘no-response’)], and the average preferred population response was calculated for hits and misses. As expected, the mean response increased with higher stimulus contrasts (see Figure 2—figure supplement 1 for traces across time). However, for this animal we did not find a significant difference between hit and miss trials for any individual contrast [false-discovery rate (FDR)-corrected paired t-test, p>0.05 for all contrasts] (note that for both false alarms and correct rejections the V1 mean population response was indistinguishable from zero; Figure 2c,d, 0% contrast; Figure 2—figure supplement 1a). When grouping the test contrasts (0.5–32%), the data did show a modestly higher response for hit than miss trials for single animals as well as across animals (p<0.05). We, therefore, asked whether this increase in neuronal responses during stimulus detection was due to consistent response enhancements of specific neurons or due to a population-distributed process.

Figure 2 with 1 supplement
The difference in neural activity between hit and miss trials can be partly explained by consistent hit-associated increases in activity of specific neurons, and somewhat better by trial-by-trial population-wide fluctuations, but these mean-based approaches fall short of being fully descriptive.

(a) Recorded traces from 10 randomly selected neurons over four subsequent trials. For further analysis, we took the mean dF/F0 per neuron over the visual stimulation period (thick colored lines) as the single mean neural activity measure per trial. (b) Data of one entire recording block consisting of 74 tuned and simultaneously recorded neurons over 336 trials. The blue rectangle shows the four trials depicted in panel (a). (c) In an example animal, the detection of stimuli (green) with test contrasts (0.5–32%) correlated with a modest increase in preferred population dF/F0 over undetected stimuli (red) (two-sample t-test, p<0.05), but none of the individual contrasts reached statistical significance (resp. vs. no resp., two-sample t-tests, FDR-corrected p>0.05). (d) As (c), but for the mean over all animals; the graph shows a small but consistent overall difference in dF/F0 with visual detection (test contrasts 0.5–32%, n=8 animals, p<0.05). (c, d) Shaded areas show the standard error of the mean. (e) The hit-associated increase in neural activity per neuron for all hit trials of test contrast stimuli (panel I) can be partly explained by specific neurons showing consistent dF/F0 increases or decreases across trials (panel II), partly by trial-by-trial population-wide fluctuations regardless of neuronal identity (panel III), and somewhat better by both (panel IV). (f) Control analysis by shuffling neuronal identities (IDs) per trial (n=1000 iterations, black distribution) shows that the population activity is more predictable based on consistent hit modulations per neuron (top panel) and that more neurons are significantly hit-modulated (bottom panel) than can be expected by chance. (g) Analyses as in (f), but across animals; comparison versus the shuffle-based R2-expectation showed above-chance (at α=0.05) predictability of hit modulations using neuron ID, trial ID, or both for 7/8, 8/8, and 8/8 animals, respectively (left panel).
The fraction of significantly hit-modulated neurons was above chance (at α=0.05) for 7/8 animals (right panel).

https://doi.org/10.7554/eLife.10163.005

Including nonpreferred neurons again for all further analyses (unless stated otherwise), we calculated the hit modulation (dF/F0 increase during hits relative to misses) per neuron per hit trial (see ‘Materials and methods’) (Figure 2e(I)) and investigated whether this hit modulation could be explained by a subgroup of neurons that consistently enhances its activation during detection trials, by random trial-by-trial population fluctuations, or by both (Figure 2e). Hit modulation was explained to a small but significant extent by neuronal identity [R2=0.059; p<0.05; Figure 2e,f], and to a larger extent by population fluctuations across trials [R2=0.248; Figure 2e(III)] or both processes together [R2=0.281; Figure 2e(IV)]. The number of consistently hit-modulated neurons (which could be either up- or downregulated) was significantly above chance (Figure 2f; p<0.05). This pattern was robust over animals, as hit modulations could be explained with above-chance accuracy by neuron identity, population fluctuations, or both in 7/8, 8/8, and 8/8 animals, respectively (Figure 2g). The fraction of significantly hit-modulated neurons was above chance (at α=0.05) for 7/8 animals. Although significant for most subjects, the variance explained by the consistency of neuronal responses was fairly low (always R2<0.1), and even the combination of trial-by-trial population fluctuations and neuronal identity never exceeded R2=0.35. This could indicate either that detection-related neural correlates in V1 are minor or that a simple enhancement of mean activity is an index ill-suited to describe potentially strong, but more complex, changes in neuronal population dynamics. In particular, we hypothesized that correlates of stimulus detection may unfold through multineuron interactions at the single-trial level and rely on the relative contrast in activation between neurons.
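
To make the variance decomposition concrete, here is a sketch of how a hit-modulation matrix (neurons × hit trials) can be split into a neuron-identity component, a trial-wide component, and their additive combination, each scored with R² in the spirit of Figure 2e. The data are synthetic and the effect sizes invented for illustration; the paper's exact estimation procedure is in its 'Materials and methods'.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_trials = 60, 40

# Synthetic hit-modulation matrix: a small neuron-specific component,
# a larger trial-wide (population) fluctuation, and unexplained noise.
neuron_eff = 0.2 * rng.standard_normal(n_neurons)[:, None]
trial_eff = 0.5 * rng.standard_normal(n_trials)[None, :]
M = neuron_eff + trial_eff + rng.standard_normal((n_neurons, n_trials))

def r2(pred, y):
    """Coefficient of determination of a prediction matrix."""
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

grand = M.mean()
by_neuron = np.broadcast_to(M.mean(axis=1, keepdims=True), M.shape)  # neuron ID only
by_trial = np.broadcast_to(M.mean(axis=0, keepdims=True), M.shape)   # trial ID only
both = by_neuron + by_trial - grand                                  # additive two-factor fit

r2_neuron, r2_trial, r2_both = r2(by_neuron, M), r2(by_trial, M), r2(both, M)
```

For a balanced design the row-mean and column-mean fits are orthogonal, so the two single-factor R² values add up to the combined one, echoing the paper's observation that neuron identity and population fluctuations contribute separately.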

Several metrics aim to quantify response heterogeneity within neuronal populations, such as sparseness (Field, 1994) or variance (Seung and Sompolinsky, 1993). However, such metrics are rarely studied in the context of behavioral relevance, and in the few cases where they are, their ability to predict behavior appeared modest (Froudarakis et al., 2014). Therefore, we developed an alternative measure of population heterogeneity that aims to capture the spread in normalized population activity (Figure 3a,b; see also ‘Materials and methods’): by subtracting the z-scored response (each trial being a single data point per neuron, see Equation 2) of each neuron from that of all other neurons in that same trial, we obtained a Δz-score matrix in which high values indicate high pairwise dissimilarity in neuronal activation. Taking the mean over all pairwise Δz-scores provides a measure of population heterogeneity that can in theory be computed over an arbitrarily small time interval (but note that for all analyses, except those shown in Figure 5, we used a single trial as time unit). This way, similarly strongly activated as well as similarly weakly activated pairs of neurons will decrease heterogeneity. By contrast, dissimilarly activated neuronal pairs (i.e. one strong, one weak) will increase it. Therefore, population heterogeneity incorporates both trial-by-trial fluctuations and intra-population differences in a neuronal pairwise manner. Its dependence on z-scored activity means that a neuron’s contribution to heterogeneity is scaled to its relative level of activation—and, because highly active neurons are often highly variable (Baddeley et al., 1997; Montijn et al., 2014), also to its signal-to-noise ratio.
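
The heterogeneity computation described above can be written in a few lines of NumPy. This is a sketch from the verbal definition (z-score per neuron across trials, then the mean absolute pairwise difference per trial); the paper additionally z-scores per contrast, which is omitted here for brevity, and the example data are synthetic.

```python
import numpy as np

def heterogeneity(dff, eps=1e-12):
    """
    Population heterogeneity per trial, following the definition in the text:
    z-score each neuron's dF/F0 across trials, then take the mean absolute
    difference in z-scored activity over all unique neuron pairs per trial.

    dff : array of shape (n_neurons, n_trials)
    returns : array of shape (n_trials,)
    """
    z = (dff - dff.mean(axis=1, keepdims=True)) / (dff.std(axis=1, keepdims=True) + eps)
    diff = np.abs(z[:, None, :] - z[None, :, :])   # (n_neurons, n_neurons, n_trials)
    iu = np.triu_indices(dff.shape[0], k=1)        # n*(n-1)/2 unique pairs
    return diff[iu[0], iu[1], :].mean(axis=0)

# A trial where half the population is strongly active and half is not yields
# higher heterogeneity than a trial where all neurons ramp up uniformly.
rng = np.random.default_rng(1)
dff = rng.standard_normal((20, 50))
dff[:10, 0] += 3.0   # heterogeneous trial: only half the cells activate
dff[:, 1] += 3.0     # uniform trial: everyone activates equally
h = heterogeneity(dff)
```

Note that the uniform trial barely changes the pairwise differences, because adding the same offset to every neuron leaves the within-trial contrasts intact; this is exactly why the metric dissociates from mean dF/F0.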

Figure 3 with 2 supplements
Neuronal response heterogeneity within populations correlates better with visual detection than mean preferred population (pref. pop.) activity.

(a) Neuronal activity as in Figure 2d, but presented as z-scores normalized per contrast to enable comparison of relative changes across contrasts. (b) Schematic representation of the method to compute heterogeneity on an example trial (see also ‘Materials and methods’). The dF/F0 response of each neuron is z-scored per contrast and the distance (absolute difference) in z-scored activity between all pairs of neurons is calculated for each trial (color-coded ΔZ-score). The population heterogeneity in a given trial is defined as the mean ΔZ-score over all neuronal pairs. (c) Population activity heterogeneity in an example animal shows a strong correlation with visual detection. The comparison between detected (resp.) and undetected (no resp.) trials for test contrasts as a group (paired t-test, p<0.001) was highly significant. (d) As (c), but showing the mean over all animals (n=8). Stimulus detection correlated with higher heterogeneity; the test contrast group hit–miss comparison was highly significant (p<0.001). (e) As (d), but for heterogeneity within the preferred (left panel) and within the nonpreferred (right panel) population only. Hit–miss differences were found in the preferred population (test contrast group, p<0.01) and nonpreferred population (test contrast group, p<0.01), similar to the whole population (d). (f) Using a measure of effect size (Cohen’s d), heterogeneity was found to show a stronger correlation with stimulus detection than mean dF/F0 within the whole population (Cohen’s d=0.114 vs. d=0.218), within the preferred population (Cohen’s d=0.119 vs. d=0.213), and within the nonpreferred population (Cohen’s d=0.110 vs. d=0.206) [paired t-tests over animals (n=8): whole population, p<0.05; preferred population, p<0.05; nonpreferred population, p<0.01]. (g) Example receiver operating characteristic (ROC) curve showing the linear separability of single-trial hit and miss trials using population heterogeneity (see ‘Materials and methods’).
The separability can be quantified by the area under the curve (AUC; blue shaded area). True positive rate: fraction of hit trials classified as hit. False positive rate: fraction of miss trials classified as hit. (h) Statistical quantification of hit/miss separability using either mean dF/F0 (black) or heterogeneity (red) across animals (n=8). Both measures predict the animal’s response above chance (FDR-corrected paired t-test dF/F0 and heterogeneity AUC vs. 0.5, p<0.05 and p<0.001, respectively) but behavior can be predicted better using heterogeneity (paired t-test, dF/F0 vs. heterogeneity AUC, p<0.01). All panels: shaded areas/error bars show the standard error of the mean. Statistical significance: *p<0.05; **p<0.01; ***p<0.001.

https://doi.org/10.7554/eLife.10163.007

We applied this metric to the activity of neurons from the entire population during hit and miss trials and found that it correlated more strongly with behavioral stimulus detection than mean response strength did (see Video 1 and Figure 3c for a single animal example). Test contrasts (0.5–32%) showed a highly significant overall increase in heterogeneity for hit trials (Figure 3c, paired t-test, p<0.001), but such modulations were absent for probe trials. This difference was consistent over animals (Figure 3d) and showed similar patterns for the within-preferred and within-nonpreferred population heterogeneity (Figure 3—figure supplement 1). Using a measure of effect size over animals (Cohen’s d), we observed that heterogeneity showed a stronger correlation with visual detection than mean dF/F0 (Figure 3f; three paired t-tests vs. dF/F0, all p<0.05). Linear single-trial prediction of hit or miss responses with a receiver operating characteristic (ROC) analysis on either mean dF/F0 or heterogeneity showed that behavioral responses could be predicted above chance on a single-trial basis with both metrics, but heterogeneity showed a significantly higher prediction score [area under the curve (AUC), t-tests across animals: dF/F0 vs. 0.5, p<0.05; heterogeneity vs. 0.5, p<0.001; dF/F0 vs. heterogeneity, p<0.01] (Figure 3g,h). These results show that correlates of visual detection are better captured by the strength of pairwise response dissimilarities within the neuronal population than by overall increases in mean activation (but note that correlation-based measures also work well for hit–miss differentiation; Figure 3—figure supplement 2).
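
The single-trial ROC analysis reduces to the rank-sum identity AUC = P(score on a hit trial > score on a miss trial). A minimal sketch is below; the single-trial values are simulated, with numbers chosen purely for illustration so that heterogeneity separates hits from misses better than mean dF/F0 (qualitatively as in Figure 3h), not taken from the recordings.

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney identity:
    AUC = P(pos > neg), counting ties as 0.5."""
    pos = np.asarray(scores_pos)[:, None]
    neg = np.asarray(scores_neg)[None, :]
    return np.mean(pos > neg) + 0.5 * np.mean(pos == neg)

# Simulated single-trial metric values for 80 hit and 60 miss trials:
rng = np.random.default_rng(2)
het_hit, het_miss = rng.normal(1.3, 0.25, 80), rng.normal(0.9, 0.25, 60)   # well separated
dff_hit, dff_miss = rng.normal(0.32, 0.2, 80), rng.normal(0.30, 0.2, 60)   # barely separated

auc_het = auc(het_hit, het_miss)
auc_dff = auc(dff_hit, dff_miss)
```

An AUC of 0.5 corresponds to chance-level separability, matching the "vs. 0.5" comparisons reported in the text.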

Video 1
Typical raw data example showing the first 10,000 recorded frames from animal 5.

The left-hand side shows xy-corrected, but otherwise unaltered, raw fluorescence data. The legend above this raw data video shows the time after start of the experiment in seconds and the number of acquired frames. The top two panels on the right-hand side show (left) a depiction of the stimulation screen (gray isoluminant background or oriented drifting grating during stimulus presentation), and (right) whether the mouse is making a licking response. The two panels below show a live updated summary of mean dF/F0 (left) and heterogeneity (right) during each trial. Green indicates a licking response, and red indicates no response. The two lower panels show a live trace of mean population dF/F0 and heterogeneity. Licking responses are shown as red dotted lines, and stimulus presentations are shown as a gray shaded area. Note that the recording is very stable, except during periods of heavy licking, such as after hit responses, when reward is delivered. Also note that neural data acquired during licking are not used for any of our analyses and therefore do not influence our results. The mouse licks vigorously during the initial period of the recording, but more typical behavior sets in less than 2 min after the start of the recording. Near the end of the video, it can be seen that hits and misses are more easily separable using heterogeneity than dF/F0 (although this difference is stronger in this example video than in the data set as a whole).

https://doi.org/10.7554/eLife.10163.010

Heterogeneity predicts reaction time

Our observations suggest that behavioral accuracy is determined not by a general gain increase in population activity, but rather by more complex changes in response strengths within a population. Behavioral reaction time is often used as a proxy for salience, attention, and readiness (Beck et al., 2008), so we hypothesized that dissociations similar to those between hit and miss trials may be found between fast and slow responses. We performed linear regressions per animal for dF/F0 (Figure 4a) and heterogeneity (Figure 4b) as a function of reaction time. Similarly to hit/miss differences, the preferred population dF/F0 was not significantly associated with behavioral performance (regression slopes vs. 0, FDR-corrected one-sample t-test, n.s.), nor were preferred population z-scored activity, variance, sparseness, instantaneous Pearson-like correlations (see ‘Materials and methods’), whole-population (raw and z-scored) dF/F0, and sliding-window-based correlations (Figure 4—figure supplement 1). However, heterogeneity and the spread in instantaneous Pearson-like correlations were inversely correlated with reaction time (p<0.001 and p<0.01, respectively), and heterogeneity explained significantly more reaction-time-dependent variance in the data than all other measures (FDR-corrected pairwise t-tests, heterogeneity vs. all, p<0.05). This relationship holds when analyzed over animals (Figure 4c) as well as per individual animal (Figure 4—figure supplement 1).
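
The per-animal regression plus meta-analysis procedure can be sketched as follows. The slopes, noise levels, and the inverse heterogeneity–reaction-time relationship in the simulated data are assumptions for illustration, not the paper's fitted values; the group-level test mirrors the one-sample t-test over per-animal regression slopes described in the text.

```python
import numpy as np

rng = np.random.default_rng(3)

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    return np.polyfit(x, y, 1)[0]

# Hypothetical per-animal trials: reaction time decreases with heterogeneity.
slopes = []
for _ in range(8):                                     # 8 animals, as in the study
    het = rng.normal(1.0, 0.2, 100)                    # heterogeneity per hit trial
    rt = 1.5 - 0.8 * het + rng.normal(0, 0.2, 100)     # reaction time in seconds
    slopes.append(ols_slope(het, rt))

# One-sample t statistic of the regression slopes against zero (cf. the
# FDR-corrected one-sample t-tests over slopes per animal in the paper).
s = np.asarray(slopes)
t_stat = s.mean() / (s.std(ddof=1) / np.sqrt(len(s)))
```

A strongly negative t statistic here corresponds to the reported inverse correlation between heterogeneity and reaction time across animals.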

Figure 4 with 2 supplements
Heterogeneity is correlated with reaction time.

(a) Reaction time shows no correlation with mean preferred population dF/F0 activity during stimulus presentation for any individual animal (left panel; example animal) nor for the meta-analysis over animals (right panel; FDR-corrected one-sample t-test over individual regression slopes per animal, n=8, n.s.). (b) Same as (a), but for heterogeneity. Heterogeneity shows a strong correlation with reaction time for the example animal (left panel; p<0.001) as well as for the meta-analysis over animals (p<0.001). (a, b) Note different y-axis scaling per panel for display purposes. (c) Comparison of the explained variance of several neural metrics. Only heterogeneity (FDR-corrected one-sample t-test, p<0.001) and the spread (SD) in instantaneous Pearson-like correlation (see ‘Materials and methods’) (p<0.01) correlate significantly with reaction time; all other metrics do not [preferred-population (Pref. P.) dF/F0, preferred-population (P.P.) z-scored dF/F0, variance, sparseness (kurtosis), mean instantaneous Pearson-like correlation, whole-population (z-scored) dF/F0, mean and SD of sliding-window correlation (width 1.0 s); all n.s.]. Heterogeneity explains more reaction-time-dependent variance than any other metric (FDR-corrected paired t-tests, all p<0.05). (d) Decoding of stimulus presence shows accuracy similar to the animals’ actual behavioral performance (Figure 1e). When the animal has detected the stimulus (resp.; green line), the decoder is better able to correctly judge its presence (a value of 1 indicates perfect performance; paired t-test, p<0.001). Shaded areas show the standard error of the mean. (e) Behavioral detection performance is more similar (sim.) to the optimal decoder’s performance than expected by chance (paired t-test, n=8 animals, shuffled vs. real similarity, p<0.001). Gray: single animal; blue: mean across animals. All panels: error bars/shaded regions show the standard error of the mean. Statistical significance: *p<0.05; **p<0.01; ***p<0.001.

https://doi.org/10.7554/eLife.10163.011

Our definition of heterogeneity is computationally somewhat similar to the width of the distribution of pairwise neuronal correlations in a population (see ‘Materials and methods’). However, whereas the spread in instantaneous Pearson-like correlations is based on multiplying z-scored pairwise neuronal responses and taking their standard deviation, the heterogeneity metric (which instead uses the mean absolute distance in z-scored dF/F0 between pairs of neurons) was an even better predictor of behavioral reaction times (Figure 4c, p<0.05). As such, it is more closely related to the population mean of nondirectional neuron-pairwise Mahalanobis (i.e. normalized Euclidean) distances than to Pearson’s correlations. Our analysis shows that visual detection correlates well with large mean Mahalanobis distances in neural activity between pairs of neurons; that is, with a high heterogeneity in population activity.

Nonetheless, it could be argued that changes in population activity might be uncorrelated with the fidelity with which the population code represents visual stimuli. To address this, we used a Bayesian maximum-likelihood decoder to assess the presence of a stimulus from V1 population activity (see also ‘Materials and methods’; Montijn et al., 2014). Decoding performance was higher for behaviorally correct detection trials (Figure 4d; hit vs. miss, p<0.05), and the performance was similar to the animals’ actual behavioral performance at a global level across contrasts (shuffled vs. nonshuffled, p<0.001; Figure 4e), as well as at a single-trial level (chi-square similarity analysis of hit/miss trials for behavioral response and stimulus presence decoding, χ2=135.36, p<10−30; Figure 4—figure supplement 2). Moreover, additional analyses revealed that stimulus features (orientation, contrast) were better decodable when the animal made a correct detection (Figure 4—figure supplement 2a) and when heterogeneity was high (Figure 4—figure supplement 2b–d). Thus, stimulus features, such as orientation, are represented more accurately by neuronal populations in V1 during hit trials, even though the specific orientation was irrelevant for the animal to perform the visual stimulus detection task.
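The decoding logic can be illustrated with a simplified Gaussian maximum-likelihood classifier. This is a hedged sketch only: the cited decoder (Montijn et al., 2014) is more elaborate, and the independent-Gaussian assumption per neuron, the function name, and the `eps` parameter are ours.

```python
import numpy as np

def ml_decode_presence(train_x, train_y, test_x, eps=1e-6):
    """Toy Gaussian maximum-likelihood decoder of stimulus presence.

    train_x: (n_trials, n_neurons) responses; train_y: 0/1 labels
    (stimulus absent/present). Returns predicted labels for test_x,
    assuming independent Gaussian response statistics per neuron.
    """
    classes = np.unique(train_y)
    stats = []
    for c in classes:                      # fit mean/SD per neuron per class
        x = train_x[train_y == c]
        stats.append((x.mean(axis=0), x.std(axis=0) + eps))
    preds = []
    for x in test_x:                       # pick class maximizing log-likelihood
        ll = [np.sum(-0.5 * ((x - mu) / sd) ** 2 - np.log(sd))
              for mu, sd in stats]
        preds.append(classes[int(np.argmax(ll))])
    return np.array(preds)
```

Comparing such decoder judgments with the animal's hit/miss responses, trial by trial, is what underlies the similarity analyses in Figure 4d,e.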

During higher levels of arousal, neuronal activation has been observed to be more desynchronized (Cohen and Maunsell, 2009; Froudarakis et al., 2014). Together with our current observations, this led us to hypothesize that high heterogeneity in V1 populations reflects a brain state conducive to stimulus detection. If correct, heterogeneity immediately prior to stimulus presentation should be predictive of reaction time. To test this, we split all hit trials into the slowest 50% and fastest 50% per contrast (e.g. see Figure 5a–d) and calculated a measure of predictability of slow versus fast responses based on the 3 s preceding stimulus presentation (Figure 5—figure supplement 1). Using pre-stimulus-onset heterogeneity, fast response trials were highly predictable (FDR-corrected one-sample t-tests, fast-slow, p<0.001; fast-miss, p<0.01), while slow versus miss trials were not (p=0.799). Behavioral responses were not predictable based on population dF/F0 (slow-miss, p=0.157; slow-fast, p=0.811; fast-miss, p=0.924), and the difference in predictability between heterogeneity and dF/F0 was significant for slow-fast (p<0.01) and fast-miss (p<0.05), but not for slow-miss (p=0.477) trials.

Figure 5 with 1 supplement see all
Heterogeneity preceding stimulus onset predicts behavioral reaction time.

(a, b) Example traces from an example animal for population dF/F0 (a) and heterogeneity (b) of fast (F, green), slow (S, purple), and miss (M, red) responses (mean ± standard error over trials), also showing mean stimulus offsets. (c, d) As (a, b), but showing mean ± standard error over animals. (e) Fast behavioral responses are predictable before stimulus onset using heterogeneity [FDR-corrected one-sample t-tests vs. chance level (0); S–M, p=0.799; F–S, p<0.001; F–M, p<0.01] but not using dF/F0 (FDR-corrected one-sample t-tests vs. chance level; S–M, p=0.157; F–S, p=0.811; F–M, p=0.924; FDR-corrected paired t-tests for heterogeneity vs. dF/F0; S–M, p=0.477; S–F, p<0.01; F–M, p<0.05). (f) The population heterogeneity during stimulus presentation does not merely reflect a continuation of the pre-stimulus neural state; detected stimuli (slow and fast responses) elicit a faster rise to the maximum heterogeneity level than undetected stimuli (miss trials) (paired t-tests, n=8 animals, p<0.05). Slow and fast responses do not differ significantly (p>0.05). All panels: error bars/shaded areas indicate standard error of the mean. Statistical significance: *p<0.05; **p<0.01; ***p<0.001.

https://doi.org/10.7554/eLife.10163.014

Although heterogeneity before stimulus onset thus predicts behavior, we also found a dissociation between detected (slow and fast responses) and undetected stimuli (miss trials) in the rise-time latency to maximum heterogeneity upon stimulus onset (p<0.05). Detected trials correlated with fast rise times, while neuronal response heterogeneity to undetected stimuli ramped up much more slowly (Figure 5f). This argues against the interpretation that heterogeneity merely reflects a tonic brain state that can be fully gauged before stimulus onset. The formation of nonhomogeneous response patterns within neuronal populations is thus also related to the actual detection of visual stimuli and constitutes a second effect, in addition to background heterogeneity.

Temporal consistency of the population code

So far, we have mainly addressed static differences in population activity structure correlating with behavioral responses. However, population codes can show complex temporal properties, such as the transient formation of assemblies (Miller et al., 2014; Harris and Mrsic-Flogel, 2013). After confirming the stability of our recordings to avoid potential confounds (Figure 1—figure supplement 1), we addressed whether such temporal population structures might offer additional insight into the neural mechanisms of visual detection. We again split the data into miss, fast, and slow response trials, and computed the correlations between response patterns from different trials separately for preferred and nonpreferred neuronal populations (Figure 6a,b). Note that this analysis is not sensitive to potential nonstationary effects that might create artificial differences, because all stimulus types and behavioral responses are intermingled in time.

Consistency in population activation patterns across trials is increased during hit trials (fast and slow) compared to miss trials.

(a) Data from an example animal showing inter-trial correlations (Pearson’s r) between population responses to same-orientation stimuli (pooled over test contrasts only). (b) Data from the same animal as in panel (a), showing higher population activity pattern consistency (mean ± standard error over trial pairs) for fast and slow response trials than for miss trials within both the preferred and nonpreferred population. Colored lines show real data, and black lines show shuffled data (see text and ‘Materials and methods’). (c, d) Inter-trial correlations (mean ± standard error over all animals) as quantification of population activation pattern consistency are significantly higher during fast trials (with a trend for slow trials) than during miss trials within the preferred as well as the nonpreferred neuronal population, suggesting that visual stimulus detection is correlated with the occurrence of more stereotyped population responses (FDR-corrected paired t-tests, preferred population: miss-slow, p=0.081; miss-fast, p<0.05; slow-fast, n.s.; nonpreferred population: miss-slow, p<0.05; miss-fast, p<0.05; slow-fast, n.s.). Comparison with correlations of shuffled data yielded similar results (both preferred and nonpreferred populations; paired t-tests real vs. shuffled: miss, p>0.2; slow and fast, p<0.05). Error bars indicate standard error. Statistical significance: *p<0.05; trend: 0.05<p<0.1.

https://doi.org/10.7554/eLife.10163.016

Within the preferred population, as well as within the nonpreferred population, we found that neuronal population activity patterns were more similar during fast trials, with a trend for slow trials, than during miss trials (preferred population: miss-slow, p=0.081; miss-fast, p<0.05; nonpreferred population: miss-slow, p<0.05; miss-fast, p<0.05) (Figure 6c,d). To rule out that this effect might arise from biases in the analysis, we also compared these population pattern consistencies to those obtained from a shuffling procedure within stimulus types (see ‘Materials and methods’). Population pattern correlations were significantly higher than shuffled for fast and slow trials, while miss trial consistency was not statistically different from the shuffled control (both preferred and nonpreferred populations; paired t-tests shuffled vs. real: miss, p>0.3; slow and fast, p<0.05). However, note that these pattern consistencies are relatively low: they cannot fully account for the population activity structure and must therefore be interpreted as arising against a background of dynamic population activity.
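The pattern-consistency measure used here reduces to pairwise Pearson correlations between single-trial population response vectors. The following is a minimal sketch of that idea; the function name is illustrative and the authors' analysis additionally restricts comparisons to same-orientation trials and uses a within-stimulus-type shuffle control.

```python
import numpy as np

def pattern_consistency(responses):
    """Mean inter-trial Pearson correlation of population patterns.

    responses: (n_trials, n_neurons); each row is one trial's population
    response vector. Returns the mean pairwise Pearson r over all
    unique trial pairs.
    """
    r = np.corrcoef(responses)               # trial-by-trial correlation matrix
    i, j = np.triu_indices(r.shape[0], k=1)  # unique trial pairs
    return r[i, j].mean()
```

A high value indicates that the same neurons tend to be strongly (or weakly) active on repeated presentations, i.e. a stereotyped population response; values near zero indicate that participation is effectively random from trial to trial.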

Analysis of heterogeneity in multidimensional space

Most of our results so far have focused on the experimentally observed differences in hit-miss and reaction-time-dependent effect sizes between heterogeneity and mean population activity, but have not addressed the question of how this metric might be interpreted within a theoretical framework. Although heterogeneity correlates better with behavioral responses, and especially with reaction time, than many other metrics, this does not exclude that heterogeneity might be an epiphenomenon. We will address this issue next (see also ‘Materials and methods’, section ‘Analysis of multidimensional inter-trial distance in neural activity’) using an alternative definition of heterogeneity extended to multidimensional space. This alternative definition is required to study multidimensional properties of population responses, but yields a very similar correlation with stimulus detection to our pairwise definition of heterogeneity (Figure 3—figure supplement 2).

Neuronal population activity during any time epoch can be visualized as a single point in multidimensional neural space. For instance, the mean output in spikes per second of a ‘population’ of two neurons will always lie somewhere within a two-dimensional space bounded by the minimal and maximal neural activity of these two neurons (Figure 7a). Within a normalized version of this space, a change in mean neural activity will always be parallel to the main diagonal that crosses the origin (minimal neural activity of both neurons) and the point of maximum activity for both neurons. This is true for populations consisting of two neurons, but can be readily extended to any number of dimensions (Figure 7d). Heterogeneity, on the other hand, does not change when a point moves along this diagonal (the difference will be zero regardless of whether all neurons are firing at 0 spikes per second or at their maximum), but rather changes as a point moves orthogonally to this diagonal (Figure 7c).

Conceptual interpretation of heterogeneity as neuronal population coding phenomenon.

(a–d) The mean population activity during a certain time epoch can be visualized as a single point in multidimensional neural space, where every axis represents the activity of a single neuron. (a) For an example population of two neurons, the main diagonal (arrow) represents the line along which the mean population activity changes. Orthogonal to this line is the gradient along which heterogeneity changes, representing the distance of each point to the main diagonal. The effects of heterogeneity on hit/miss differentiation as reported in this study could be epiphenomenal if the real underlying differentiation depends on localized, segregated clusters of neural activity for hits (green cloud) and misses (red cloud). (b) This principle can be extended to multidimensional space; segregated clusters of activity will show asymmetrical distributions of population activity around the diagonal. (c, d) Alternatively, heterogeneity itself could represent a fundamental characteristic of hit/miss differences; in this case, population responses should be distributed symmetrically around the diagonal (see text for more explanation). (e) Calculating the pairwise inter-point distance (each point being the population activity during a single trial) can reveal information about the underlying multidimensional structure of neuronal population activity. Green: distribution of inter-point distances for hit trials; red: same for miss trials. (f) Population responses during hit trials are distributed within a larger volume of neural space, as shown by the on average larger inter-point distance for hits than misses [paired t-test, difference in center of mass, d(CoM), hit vs. miss, p<0.05]. Mirroring points across the diagonal to assess symmetry shows a small but significant asymmetry for hits and misses (both p<0.05) and a larger asymmetry for hits than misses (p<0.05). This suggests that neuronal populations during hit trials show more structured behavior in a more extended neural space than during miss trials. (g) Schematic representation of how the mean, heterogeneity, or both can be removed from population responses to assess the effect they have on hit/miss separability (see also text and ‘Materials and methods’). (h) Removing heterogeneity impairs hit/miss decoding more than removing the mean (paired t-test, p<0.05), but in all cases (including removing both) the hit/miss separability is still well above chance (0.5). This suggests that heterogeneity is more important than population mean activity for differentiating stimulus detection from nondetection, but that other, more complex neural phenomena account for most of the population response structure. All panels: error bars indicate standard error. Statistical significance: *p<0.05.

https://doi.org/10.7554/eLife.10163.017

The observation we present in our current study, viz. that heterogeneity correlates with hit responses, can be explained by two mutually exclusive hypotheses: (1) the basis for hit- and miss-related responses in V1 resides in specific regions of multidimensional neural response space (i.e. discrete states in the neural circuit), and heterogeneity is therefore an epiphenomenon (Figure 7a,b); or (2) neuronal response heterogeneity per se is important for stimulus detection. The latter implies that neuronal population response patterns during hit and miss trials should be distributed symmetrically around the main diagonal (which is the gradient along which the mean changes, as well as the axis where heterogeneity is zero). This is because, regardless of the specific location in multidimensional space, heterogeneity only captures the distance to this diagonal; rotation or mirroring around this diagonal therefore does not change heterogeneity, but does change the distribution in multidimensional space (Figure 7c,d).

Accurately quantifying a distribution’s shape in multidimensional space requires exponentially more data points as the number of dimensions increases. Unfortunately, our current data set is too small for direct quantification. Therefore, in order to estimate multidimensional symmetry around the main diagonal, we studied the effect of mirroring points across this diagonal. First, we calculated the distribution of pairwise inter-point distances in neural response space without mirroring (Figure 7e). In this case, each point again represents the population response during a single trial. The data show that the inter-point distance was slightly, but significantly, larger for hit than miss trials (paired t-test, p<0.05) (Figure 7f). This indicates that population responses during hits encompass a larger volume of neural response space than during misses, which increases heterogeneity values and allows more information to be encoded with the same number of neurons.

To assess symmetry specifically, we mirrored each trial one at a time across the main diagonal and recomputed in each case the distribution of pairwise inter-point distances. If the population responses are distributed asymmetrically around the diagonal, then mirroring will increase the pairwise distance, while if they are distributed symmetrically, no change should be observed. There was a small but significant increase in inter-point distance for both hits and misses, and mirroring increased the inter-point distance more for hits than misses (p<0.05; Figure 7f, inset). Although the effect sizes are small, they are significant; based on this analysis alone, therefore, we cannot (yet) conclude that heterogeneity is more than an epiphenomenon. In fact, the difference between hits and misses suggests that population responses during hit trials are more asymmetrical (i.e. more clustered in discrete states of neural activity) than during miss trials (Figure 7f, inset). Considering that inter-trial population pattern consistencies were lower during miss trials (Figure 6), we can conclude that neuronal populations during miss trials show more random behavior within a limited neural space, while neuronal populations during hit trials show more structured behavior in a more extended neural space.
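The mirroring operation described above has a simple closed form: a point's projection onto the main diagonal is its population mean replicated over all neurons, and the reflection is twice that projection minus the point. The sketch below illustrates this geometry; function names are ours and the actual analysis operates on the full trial-by-trial data set.

```python
import numpy as np

def mirror_across_diagonal(point):
    """Reflect one population activity vector across the main diagonal
    (the line where all neurons are equally active). Preserves both the
    population mean and the distance to the diagonal (heterogeneity)."""
    proj = np.full_like(point, point.mean())  # projection onto the diagonal
    return 2 * proj - point

def mean_pairwise_distance(points):
    """Mean Euclidean distance over all unique pairs of points
    (each row of `points` is one trial's population response)."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(-1))
    i, j = np.triu_indices(len(points), k=1)
    return dist[i, j].mean()
```

Because the reflection leaves the mean and the diagonal distance untouched, any change in `mean_pairwise_distance` after mirroring a trial can only come from asymmetry of the distribution around the diagonal, which is exactly what the analysis exploits.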

Noting the small effect size of the previous analysis, we asked whether the removal of the mean of the neuronal population response was as detrimental to encoding hit/miss differences as the removal of heterogeneity. Again visualizing population responses as points in neural space, one can remove any differences in mean response between trials by projecting all population responses onto a plane orthogonal to the diagonal, or remove any differences in heterogeneity by projecting all points onto a manifold at a fixed distance from the diagonal (see ‘Materials and methods’ and Figure 7g). To test the effect of these removals on hit/miss differences, we performed a decoding procedure of hit versus miss trials (i.e. we decoded the animal’s response) on the original data, and on data with the mean, the heterogeneity, or both aspects removed. The results show no difference between the original data and the mean-removed data, but removing heterogeneity (or both heterogeneity and the mean) led to a small but significant decrease in decoding performance (heterogeneity-removed vs. original data, p<0.05; heterogeneity-removed vs. mean-removed, p<0.05) (Figure 7h). However, even with heterogeneity removed, decoding performance was still well above chance (63% correct for original data, 60% correct for both removed; 50% is chance).
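The two projections can be written compactly: removing the mean subtracts each trial's population mean from every neuron (projection onto the hyperplane orthogonal to the diagonal), while removing heterogeneity rescales each trial's orthogonal component to a fixed length. This is a minimal sketch of the geometric idea under our reading of the text; function names and the `target` length are illustrative, not the authors' exact procedure.

```python
import numpy as np

def remove_mean(points):
    """Equalize mean activity across trials: subtract each trial's
    population mean from every neuron (projection onto the hyperplane
    orthogonal to the main diagonal)."""
    return points - points.mean(axis=1, keepdims=True)

def remove_heterogeneity(points, target=1.0, eps=1e-12):
    """Equalize heterogeneity across trials: rescale each trial's
    component orthogonal to the diagonal to a fixed length `target`,
    while preserving the trial's population mean."""
    mean_part = points.mean(axis=1, keepdims=True)
    ortho = points - mean_part                      # orthogonal component
    norm = np.linalg.norm(ortho, axis=1, keepdims=True) + eps
    return mean_part + ortho * (target / norm)
```

Decoding hit versus miss trials from the transformed data then reveals how much each aspect (mean, heterogeneity, or both) contributes to the separability of the two trial types.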

We therefore conclude that heterogeneity contributes significantly as a non-epiphenomenal population property to the differentiation in neural responses between visual detections and failures to detect a stimulus, but also that most information resides in neuronal population response patterns other than the mean or heterogeneity. Moreover, the mean response of a neuronal population is less important than its heterogeneity. While neural response heterogeneity may be an important factor, and a useful metric especially for its strong correlation with reaction times, further research is required to discover which other neural properties may be important for visual stimulus detection.

Discussion

We found that behavioral stimulus detection correlates more with nonlinear neuronal population activation patterns, such as heterogeneity, correlations, and variance, than with overall response strength in L2/3 of mouse V1 (Figures 2 and 3; Figure 2—figure supplement 1, Figure 3—figure supplement 1, Figure 3—figure supplement 2). Using a novel measure of population heterogeneity, we show that the differentiation in activation within these populations predicts visual detection, and particularly behavioral reaction time, and is associated with an increased accuracy of stimulus presence and feature representation by the population (Figure 4; Figure 4—figure supplement 2). High heterogeneity prior to stimuli correlated with fast hit responses, but also showed a dissociation between detection and nondetection behavior, indicating that detection-related population activity may be gated by arousal mechanisms (Figure 5; Figure 5—figure supplement 1). Neuronal population activation patterns are more similar during accurate task performance upon repetition of the same stimulus, but not when the animal fails to respond, suggesting that specific population patterns may recur when the animal is well engaged in the task (Figure 6). Taken together, these results suggest that neural processing of information related to detection behavior depends on transient differentiation in neuronal activity within cortical populations, rather than on temporally stable ensembles or on gain modulation of population activity as a whole (Figure 7).

Potential confounds

Our analyses show differences between hit and miss trials that we interpret as being related to perception and visual processing. However, in principle, the observed differences could be subject to a number of confounds that might limit their interpretability. First, the relatively mild water restriction led to a behavioral performance at the 100% probe trials that is lower than in other studies using similar tasks (Glickfeld et al., 2013). This could mean that the observed differences in heterogeneity between hit and miss trials are due to changes in motivation rather than visual detection. If this were the case, however, hit–miss differences should be as large during 100% probe trials as during test contrast trials. Figure 3 and Figure 3—figure supplement 1f,2 show that this is not the case; during intermediate test contrasts (2% and 8%), the hit–miss difference in heterogeneity is largest. A similar reasoning applies to 0% contrast probe trials, where a heterogeneity difference between (false alarm) responses and correct rejections was lacking, which strongly argues against heterogeneity being due to response emissions per se. Thus, it is highly unlikely that the neural correlates of behavioral responses as reported in this study are due to differences in motivation. Given this caveat on suboptimal performance, one may predict that a better behavioral performance would only have increased the hit–miss effect sizes we report in this study.

Other potential confounds include instabilities in z-plane focus, locomotion-related artifacts, and running-induced modulations: it has been reported that behavioral activity and running can induce instabilities in the plane of focus during awake two-photon calcium recordings, as well as changes in neuronal responses in mouse V1 (Dombeck et al., 2007; Niell and Stryker, 2010; Saleem et al., 2013). To address potential z-shifts during the acquisition of neural data, we compared each imaging frame to 3D anatomical z-stacks acquired after recording the neural data (see ‘Materials and methods’). Slight changes in z-location were detected after the onset of hit responses by licking, but these were mostly confined to the reward presentation period (which was not used for any of our analyses) and rarely exceeded more than a couple of microns (Figure 1—figure supplement 1). Moreover, exclusion of trials where mice were running did not qualitatively change hit–miss differences in neuronal activity (Figure 2—figure supplement 1g,h), nor did using only the first 400 ms after stimulus onset to avoid licking-preparation feedback from other brain regions (Figure 2—figure supplement 1i,j).

To control for potential confounds related to eye movements and blinking, we analyzed eye-tracking videos to detect blinks and saccades. After removing all trials where the animals were making saccades or were blinking, we again found no qualitative difference from the results we observed previously (Figure 2—figure supplement 1k,l), but did observe a small but significant correlation between pupil size and heterogeneity, suggesting higher heterogeneity with increased arousal (Figure 4—figure supplement 1e). Overall, we conclude that the neural correlates we report here, and interpret as related to perception, are most likely not due to recording instability, changes in motivation, locomotion, motor-related signals associated with licking, or eye movements and blinking.

Possible cortical layer specificity of poor correlation of mean population responses

Importantly, our results only pertain to L2/3 of the primary visual cortex in mouse, which does not exclude the possibility that the mean population response of, for example, deeper layers (L5) in V1 would correlate better with visual stimulus detection. Previous research has shown that extensive differences exist between superficial and deep layers: whereas L2/3 neurons often show relatively low peak firing rates and sparse responses to sensory stimulation, L5 neurons show denser response patterns with on average higher peak firing rates (De Kock and Sakmann, 2008; Harris and Mrsic-Flogel, 2013). In somatosensory cortex, it has been shown that hits and correct rejections in a go/no-go object localization task can be better separated using mean spiking rates in L5 than in L2/3 (O'Connor et al., 2010). Our result that L2/3 populations show only a small differentiation between hit and miss responses in mean activation should therefore not be taken as proof for a canonical principle also applicable to other cortical layers. Future validation of our results in deep layers is necessary to determine decisively whether they are indeed applicable to different layers of primary sensory cortex.

Comparison with other studies and neural interpretation of heterogeneity

Our task design included drifting gratings with different orientations, with the qualification that orientation was task-irrelevant for the mice, as they were only required to detect stimuli whenever they appeared. Our observation that stimulus features are represented more accurately (as quantified by decoding accuracy) during hit than miss trials may therefore be somewhat surprising (Figure 4—figure supplement 2a). This suggests that mechanisms that increase the likelihood of stimulus detection may act through a general enhancement of stimulus processing intensity, corroborating previous research in monkey showing that attention can lead to horizontal shifts in contrast response curves, as if the stimulus were of higher contrast (Martínez-Trujillo and Treue, 2002). It is interesting to ask whether our results on heterogeneity can be cast in terms of dynamic range effects. Neurons are expected to climb in this dynamic range when visual contrast increases, which is confirmed by the rise in dF/F0 (Figure 2d). However, if heterogeneity were primarily determined by neurons being able to operate along the steep slope of their dynamic range, then the large difference in heterogeneity between hits and misses (Figure 3c) along the test contrasts (0.5–32%) would not be expected.

Of further interest is to compare our results on heterogeneity with studies reporting that sparseness in L2/3 populations of rodent V1 is high during passive viewing (Barth and Poulet, 2012), depends on cortical state, and improves neural discriminability during passive processing of natural scenes (Froudarakis et al., 2014). Although in our analysis sparseness and variance explained more behavioral variability in reaction time than (z-scored) mean population activity (Figure 4c), these measures performed much worse than heterogeneity and the spread of instantaneous Pearson-like correlations. Possibly, a sample of ~60–70 tuned neurons is insufficient to estimate instantaneous sparseness accurately. An alternative explanation for this poor correlation could be that sparseness of L2/3 populations results from anatomical wiring required for efficient stimulus coding and to enable locally selective synaptic plasticity without immediately changing the coding of stimulus features within the population response (Rao and Ballard, 1999). Correlates of visual detection that depend on accurate stimulus feature representation might then be better captured by a maximization of Mahalanobis distances in neural activity between pairs of neurons within this already sparse network. This latter interpretation is in line with the data we recorded and suggests that sparse stimulus representation by L2/3 neurons reflects a structural optimization of the population code to represent stimulus features, while heterogeneity captures more temporally dynamic modulations related to perception.

Multidimensional analysis: heterogeneity contributes to but does not fully account for visual detection

In addition to our approach based on pairwise relations in neuronal responses, we investigated multidimensional patterns of population activity (Figure 7). These results indicate that, while heterogeneity is more important than the population mean for separating stimulus detections from nondetections in neural response space, these properties combined still cannot capture the full set of neuronal response characteristics that define the accurate detection of visual stimuli in L2/3 of mouse V1. This suggests that other patterns of population activity, such as potentially transient assembly formation, may be important for visual stimuli to be correctly detected. From our multidimensional analyses, we can conclude that simple bulk approaches (i.e. correlating a population’s mean response with behavioral output) are insufficient when one aims to address how early sensory cortical areas are involved in the processing and detection of visual stimuli.

Related to these findings is our observation of behavioral-state-specific consistencies in population activation patterns across trials. This provides some constraints on how population heterogeneity is modulated at a neurophysiological level. Neuromodulators such as acetylcholine (ACh) and noradrenaline are correlated with attention and arousal, and may influence cortical population dynamics (Metherate et al., 1992; Coull et al., 2004; Pinto et al., 2013), such that they facilitate repeated activation of similar subnetworks of neurons within a population responding to the same stimulus. Without such neuromodulators, neurons within the same preferred population would participate randomly in representing the current stimulus. This interpretation is compatible with previous work; for instance, ACh has been observed to influence burst spiking, membrane potential fluctuations, cortical oscillations, and desynchronization. These processes have been implicated in modulating competitive inhibition effects within neuronal populations and may very well influence the consistency with which specific neuronal subnetworks are activated (Borgers et al., 2008; Fries, 2009; Bosman et al., 2014). If heterogeneity in a recurrently connected V1 population is in part determined by suppression of the most weakly stimulus-driven neurons by the most strongly driven ones, then behaviorally correlated heterogeneity enhancements may be another facet of arousal- as well as perception-related modulations of stimulus-evoked population activity.

Population coding phenomena have long been hypothesized to be important for sensory processing, but so far few studies have investigated their relevance for perceptual decisions. Here, we show that population heterogeneity is correlated with behavioral stimulus detection and that it predicts correct behavioral performance. Our results imply that neurophysiological measures dependent on population averages (i.e. multiunit activity, EEG, and fMRI) may underestimate the correlation between visual detection and V1 L2/3 activity because the assumption of population response homogeneity is violated especially during active processing of visual information. In short, our results support contrast-sensitive changes in mean population activity during visual task performance (Figure 3c,d), but stress the importance of population recordings with single-cell resolution (Figure 4c–f).

Materials and methods

Animals and surgery

All experimental procedures were conducted with approval of the animal ethics committee of the University of Amsterdam (cf. Goltstein et al., 2013; Montijn et al., 2014). Experiments were performed on eight adult, male wild-type C57BL/6 mice (Harlan), 128–164 days old on the day of calcium imaging (29.1–32.7 g). Prior to the imaging experiment, all animals were surgically fitted with a head-bar implant and trained head-fixed for up to 3 months to perform a visual go/no-go detection task. On the day of the imaging experiment, we performed intrinsic signal imaging to define the area corresponding to the retinotopic region in V1 responsive to the visual stimulus. We performed a small (1.5–2.0 mm) craniotomy at that location and used multicell bolus loading with Oregon Green BAPTA-1 AM to record calcium transients and sulforhodamine-101 (SR101) to label astrocytes (Stosiek et al., 2003; Nimmerjahn et al., 2004).

Behavioral training

Mice were trained 5 days per week, for approximately 45 min per day, on a head-fixed go/no-go visual detection task over a period of 10–12 weeks, with the aim of obtaining sufficient numbers of both hit and miss trials at the test contrasts. Mice were water-deprived for 6 h preceding training and otherwise had ad libitum access to water. Body weight was monitored three times per week and never dropped below 90% of the animals' nonrestricted growth curve. Behavioral training was performed inside four dark, sound-attenuated chambers and occurred during the active (dark) cycle of the animals; each animal was always trained in the same behavioral setup. We did not observe any deviant learning effects associated with any specific behavioral setup (data not shown). During the first five days of training (stage 1), we conditioned licking in response to visual stimulation by pairing passive stimulation with reward delivery (~9 µl of water with 15% sucrose and 1% vanilla extract). After the conditioning phase, visual stimuli (100% contrast drifting gratings, as described under 'Visual stimulation' below) were presented indefinitely until mice made a licking response, which was monitored using a custom-built infrared LED-based lick detector. When animals made a response, visual stimulus presentation terminated and reward was available for 5 s. This shaping phase (stage 2) lasted a maximum of 5 days, or fewer if the animals were already making clear lick responses. After this ~2-week initial phase, we started training the animals on a simple version of the final task (stage 3): maximum stimulus presentation was reduced to 5 s, and subsequent trials would only start if the mice did not make any lick responses for a random interval of at least 1–3 s. During this stage, reward size was gradually reduced to ~3 µl per trial. When animals consistently performed at least 80 trials within a period of 45 min, they were moved to the next stage.
In stage 4, we introduced 0% contrast probe trials to monitor the behavioral performance of the animals by testing for false-alarm responses and determining whether they performed statistically significantly above chance. In this stage, we also lengthened the inter-trial interval to a random duration between 6 and 8 s. Once mice attained a sufficient ratio of hit/miss trials, we moved them to training stage 5, where we increased the inter-trial interval to 10–12 s and presented mild air puffs as a negative reinforcer whenever mice licked outside the stimulus presentation or reward delivery period. At this stage, animals were required to withhold licking for a random interval of 1–3 s in order to gain access to the next stimulus presentation. Stage 5 lasted until the mice had been trained for 8–10 weeks in total. Finally, if mice performed consistently and significantly above chance during stage 5 (n = 12/21 animals), they were trained on the microscope setup in the 2-week period preceding the imaging experiment, with the setup's resonant mirrors activated to produce the characteristic 8000 Hz sound that would also be present during calcium imaging. In this final stage, we made every effort to simulate the surroundings of the eventual calcium imaging experiment, to habituate the mice to the two-photon laser lab's environment. Mice were always allowed up to 3 s after stimulus onset to respond and were thus not explicitly trained to make fast behavioral responses.

Craniotomy and dye injection

On the day of the two-photon calcium imaging experiment, buprenorphine (0.05 mg/kg) was injected subcutaneously 30–60 min before induction of anesthesia with isoflurane (4.0% induction, 0.8% maintenance during intrinsic signal imaging, 1.5–2.5% maintenance during invasive surgical procedures). After induction, the animal was placed in a custom-built head-bar holder designed for performing surgical procedures. We removed the cover glass, silicon elastomer, and layer of glue covering the skull in the cranial window before performing intrinsic signal imaging to determine the precise retinotopic location in the primary visual cortex (V1) corresponding to our stimulus' receptive field. We subsequently performed a small (1.5–2 mm) craniotomy above the retinotopic area responding to visual stimulation with drifting gratings. After the craniotomy, the dura was kept wet with artificial cerebrospinal fluid (ACSF: NaCl 125 mM, KCl 5.0 mM, MgSO4·7H2O 2.0 mM, NaH2PO4 2.0 mM, CaCl2·2H2O 2.5 mM, glucose 10 mM) buffered with HEPES (10 mM, adjusted to pH 7.4). Next, multicell bolus loading with Oregon Green BAPTA-1 AM (OGB) and SR101 was performed 230–270 µm below the dura, as previously described (Montijn et al., 2014; Goltstein et al., 2013). After injection of the dyes, the exposed dura was covered with agarose (1.5% in ACSF) and sealed with a circular cover glass that was fixed to the skull using cyanoacrylate glue. The animal was allowed to recover for a minimum of 90 min before starting the behavioral task and two-photon calcium imaging. Of the 12 mice that learned the task, 2 animals were rejected due to insufficient imaging quality.

Visual stimulation

All visual stimulation was performed on a 15-inch TFT screen with a refresh rate of 60 Hz, positioned 16 cm from the mouse's eye and controlled by MATLAB using the Psychtoolbox extension (Brainard, 1997; Pelli, 1997). Stimuli consisted of sequences of eight different directions of square-wave drifting gratings, presented monocularly in randomized order. Stimulus duration was unlimited during the initial training phase and was gradually reduced to a maximum of 3 s for the final task stage. Stimuli were separated by a blank inter-trial interval of variable duration (random minimum of 10–12 s) during which an isoluminant gray screen was presented. Visual drifting gratings (diameter 60 retinal degrees, spatial frequency 0.05 cycles/°, temporal frequency 1 Hz) were presented within a circular cosine-ramped window to avoid edge effects at the border of the circular window. A field-programmable gate array (OpalKelly XEM6001, Opal Kelly Incorporated, Portland, OR) was connected to the microscope and behavioral setup and interfaced with the visual stimulus presentation computer to synchronize the timing of visual stimulation with the microscope frame acquisition and behavioral setups.

Z-drift quantification and recording stability analysis

Slow z-drifts were quantified by comparing the similarity of 100 frames at the beginning, middle, and end of each stimulus repetition set to slices recorded at different cortical depths (step size ~1–2 µm) before or after functional calcium imaging, for five of eight animals. If z-drifts larger than 10 µm occurred slowly over multiple repetition blocks, or if slow z-drift was detected manually, the entire recording of a single animal was split into multiple analysis periods (n=2 populations for animals 1 and 7; n=1 population for all other animals) that were analyzed independently (Figure 1—figure supplement 1). For the two animals for which we split the recordings, we afterwards averaged all measures over the two populations, yielding a single independent data point for these animals as well for each measure.

To confirm the stability of our recordings, we performed a further analysis quantifying the discriminability of neurons relative to their surroundings over time (Figure 1—figure supplement 1). To this end, we calculated for each imaging frame the mean fluorescence of the pixels within the neuron's soma (Fsoma) and that of a neuropil annulus surrounding the soma (Fneuropil), defined as all pixels within a concentric band 2–5 µm away from the soma. For each frame, we then calculated the discriminability ratio Dr as Dr = Fsoma / (Fsoma + Fneuropil) and set a threshold at Dr = 0.5 (equal fluorescence of soma and neuropil). Whenever this measure dropped below the threshold, we calculated the duration of the epoch until it returned above the threshold and took the maximum duration of all such epochs as a single measure per neuron. Most neurons from all sessions showed maximum below-threshold durations near 0 s, and no neuron showed durations longer than 1 s (Figure 1—figure supplement 1).

To address the potential confound of fast changes in z-plane due to anticipatory fidgeting behavior by the animals, we calculated the depth of each imaging frame and analyzed whether responses to visual stimuli were preceded by shifts in z-plane that could influence our results. As can be seen in Figure 1—figure supplement 1L–O, z-shifts were mostly confined to the epoch immediately following hit responses, which is not used in our analyses, and in general z-shifts were very small and rarely exceeded 1 µm.

Eye tracking

We recorded eye movements during the entire calcium imaging experiment to be able to correct for possible contamination of our results by excessive blinking and/or saccades. For this purpose, we placed a near-infrared-sensitive camera (JAI CV-A50IR-C Monochrome 1/2" IT CCD Camera, JAI A/S, Germany) with a large-aperture narrow-field lens (50 mm EFL, f/2.8) above the visual stimulation screen, directed at the mouse's visually stimulated eye. Images were acquired at 25 Hz and pupil tracking was performed offline using custom-written MATLAB scripts. Eye position was used to control for possible saccade effects (Figure 2—figure supplement 1k,l), and pupil diameter was used to assess its correlation with heterogeneity (Figure 4—figure supplement 2e).

Calcium imaging recordings and final task parameters

Dual-channel two-photon imaging recordings (filtered at 500–550 nm for OGB and 565–605 nm for SR101; see Figure 1d) with a 512 × 512 pixel frame size were performed at a sampling frequency of 25.4 Hz. We used an in vivo two-photon laser scanning microscopy setup (modified Leica SP5 confocal system) with a Spectra-Physics Mai-Tai HP laser set at a wavelength of 810 nm to simultaneously excite OGB and SR101 molecules, as previously described (Montijn et al., 2014), in cortical layer 2/3 at depths of 140–170 µm from the pia mater (Figure 1—figure supplement 1, Video 1). During data acquisition, mice performed a go/no-go stimulus detection task in which they had to lick whenever a visual stimulus was presented. Stimulus parameters were identical to those described above. We varied the contrast of the drifting grating (0%, 0.5%, 2%, 8%, 32%, and 100%) to elicit a wide range of hit/miss ratios. Responses to 0% contrast probe trials were not rewarded, but responses to all other contrasts were. We did not explicitly aim for very high detection performance (high hit rates and low miss rates), to avoid overtraining and associated habitual or automated responding (Balleine and Dickinson, 1998). A complete set of visual stimuli therefore consisted of 48 trials (6 contrasts × 8 directions), whose order of presentation was randomized independently for each repetition block. After the experiment was completed, we tested for statistically significant stimulus detection performance by calculating the binomial 2.5th–97.5th percentile intervals (henceforth 95% CI) of the response proportions to 100% and 0% contrast trials using the Clopper–Pearson method.
Of the 10 animals from which we recorded calcium imaging data during task performance, one was rejected because of excessive variability in responses due to brain movement and one due to insufficient discriminability between 100% and 0% contrast trials (overlapping CIs). All data presented in this paper are from the remaining eight animals. The number of repetitions per stimulus type (unique orientation × contrast) ranged from 6 to 16. For most analyses, we took the mean over all orientations (n=4), so each contrast was presented 24–64 times. For all analyses of single-animal data, each trial was taken as a single data point, its value being the mean dF/F0 over all recorded frames during stimulus presentation (which depended on the reaction time of the mouse). To avoid the confound of higher signal-to-noise ratios for miss than for hit trials due to longer data acquisition, within each contrast group we assigned each miss trial a duration randomly drawn from the reaction time distribution of hit trials.

Data preprocessing

After a recording was completed, small x–y drifts were corrected offline with an image registration algorithm (Guizar-Sicairos et al., 2008). To retrieve dF/F0 values from the recordings, regions of interest (ROIs; neurons, astrocytes, and blood vessels) were determined semiautomatically for each repetition block separately using custom-made MATLAB software (see https://github.com/JorritMontijn/Preprocessing_Toolbox). For these ROIs, we subsequently calculated dF/F0 values as previously described (Montijn et al., 2014): for each image frame i, a single dFi/F0i value was obtained for each neuron by calculating the baseline fluorescence (F0i) as the mean of the lowest 50% of fluorescence values within a 30 s window surrounding image frame i; dFi is the difference between the neuron's fluorescence in the given frame and this sliding baseline (dFi = Fi – F0i) (Montijn et al., 2014). The mean number of simultaneously recorded neurons per session was 92.6 (range 68–130, SD 19.0). After this initial analysis, all neurons were tested for consistency of preferred stimulus orientation, and any neurons that showed inconsistencies over different repetition blocks (i.e. more than one-third of blocks showing a different preferred orientation) were rejected from further analysis [mean number of consistently tuned neurons per animal: 66.3 ± 18.6, or 70.8% ± 7.75% of all neurons (mean ± SD)]. Unless otherwise specified, all analyses shown in this paper are based on across-animal meta-statistics over a set of eight independent data points (one data point per animal), and all multiple-comparison t-test p-values were adjusted with the Benjamini–Hochberg FDR correction procedure and deemed significant if the resultant p-value was <0.05. For quantification and control procedures related to z-drift and recording stability, see Figure 1—figure supplement 1.
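The sliding-baseline dF/F0 computation can be sketched in NumPy as follows (a paraphrase of the procedure described above, not the authors' MATLAB implementation; the function name and the brute-force per-frame loop are ours):

```python
import numpy as np

def df_f0(F, fs=25.4, win_s=30.0):
    """Sliding-baseline dF/F0: for each frame i, F0 is the mean of the
    lowest 50% of fluorescence values in a 30 s window surrounding i."""
    half = int(round(win_s * fs / 2))
    T = len(F)
    out = np.empty(T)
    for i in range(T):
        w = F[max(0, i - half):min(T, i + half + 1)]
        lowest = np.sort(w)[: max(1, len(w) // 2)]  # lowest 50% of window values
        f0 = lowest.mean()
        out[i] = (F[i] - f0) / f0
    return out
```

A constant fluorescence trace yields a dF/F0 of zero everywhere, which is a quick way to sanity-check such an implementation.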
For control analyses in which we performed neuropil fluorescence subtraction (Figure 4—figure supplement 2i), we used procedures similar to those described previously (Greenberg et al., 2008; Mittmann et al., 2011): we calculated the correlation (r) between each neuron's somatic fluorescence and the surrounding neuropil (annulus between 2 and 5 µm from the soma) and corrected the neuron's fluorescence on each frame as Fcorr = Fsoma – r · Fneuropil. Estimated neuropil contamination varied widely between neurons, but was generally in the range of 0.1–0.6, similar to previously reported values (Greenberg et al., 2008; Mittmann et al., 2011). We recomputed the explained variance of several metrics as a function of reaction time (see Figure 4c, Figure 4—figure supplement 1) and found that neuropil correction did not affect our main conclusions (Figure 4—figure supplement 2i).

Linear regression analysis

All linear regressions were performed on single-animal data sets, yielding regression coefficients for intercept and slope by minimizing the squared error between a linear function and the single animal's data points. Statistical significance was quantified by performing a one-sample t-test on the coefficients from all animals (n=8). The significance level was set at α = 0.05 and p-values were adjusted where necessary by a post hoc Holm–Bonferroni correction.

Calculation of preferred stimulus orientation

We presented eight directions of visual drifting gratings and calculated the preferred stimulus orientation of all neurons by pooling opposite directions as belonging to the same stimulus type, because the vast majority of mouse V1 neurons are sharply tuned to an axis of movement, but much less so to a specific direction within that axis (i.e. most neurons are strongly orientation-tuned, but less direction-tuned; e.g. Andermann et al., 2011). For these four orientations, we took each neuron's mean response over all trials and defined its preferred orientation as the stimulus that evoked the highest mean dF/F0 value. For most analyses, we used the neuronal responses to all orientations, except for Figure 2c,d, where we used only the responses to the preferred orientation, as we hypothesized that the preferred population might yield stronger hit/miss differences in neuronal activity.

Predictability of hit modulation by consistent neuronal responses and whole-population fluctuations

To investigate the source of hit-related increases in population dF/F0 and determine whether there might exist a subgroup of neurons that consistently enhances its activation during detection trials (as compared to nondetection), we defined a dF/F0 hit modulation index Ψ for each hit trial (t) for each neuron (i) as the neuron’s dF/F0 activity (R) relative to the mean (µ) and standard deviation (σ) of its response during miss trials (m) of the same type [identical orientation (θ) and contrast (c)]:

(1) Ψi,t = (Ri,t - μm,c,θ) / σm,c,θ

In other words, Ψi,t of a given trial represents the z-scored dF/F0 activity relative to the neuron’s response to the same stimulus when the stimulus remained undetected (Figure 2e, left panel). The hit-modulation matrix Ψ of all hit trials and all neurons can then be approximated by neuron identity (mean over trials), trial-by-trial fluctuations (mean over neurons), or both (addition of the matrices yielded by the two previous approximations) (Figure 2e). We then calculated the explained variance (R2) of the population response pattern by its canonical equation based on the residual (SSres) and total sum of squares (SStot). We defined SStot as the sum of all squared values in Ψ, and SSres as the sum of the squared differences between Ψ and the approximation matrix as defined above (by neurons, trials, or both). To assess significance, we performed 1000 shuffle iterations where we randomized neuronal identities per trial (for approximation by neuron identity), randomized trial identities per neuron (for approximation by trial identity), or randomized both (for approximation by both). Per shuffle iteration, we calculated the explained variance, which yielded a shuffled distribution per prediction (e.g. Figure 2f). A prediction was defined as significantly above chance when the real explained variance was at least 2 SDs away from the shuffled distribution mean (corresponding to p<0.05).
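The three approximations of the hit-modulation matrix and their explained variance can be sketched as follows (our NumPy paraphrase, assuming Ψ has already been computed per Equation 1 for a single stimulus type; the original analysis was in MATLAB):

```python
import numpy as np

def hit_modulation_r2(psi):
    """psi: neurons x hit-trials matrix of z-scored hit modulation (Eq. 1).
    Returns explained variance (R^2) of three approximations: by neuron
    identity (mean over trials), by trial fluctuations (mean over neurons),
    and by both (sum of the two approximation matrices)."""
    by_neuron = psi.mean(axis=1, keepdims=True) * np.ones_like(psi)
    by_trial = np.ones_like(psi) * psi.mean(axis=0, keepdims=True)
    both = by_neuron + by_trial
    ss_tot = np.sum(psi ** 2)  # per the text, SStot is the sum of all squared values in Psi
    r2 = {}
    for name, approx in [("neuron", by_neuron), ("trial", by_trial), ("both", both)]:
        ss_res = np.sum((psi - approx) ** 2)
        r2[name] = 1.0 - ss_res / ss_tot
    return r2
```

If Ψ contains pure neuron-identity structure (each neuron's modulation constant across trials), the neuron-identity approximation explains all variance.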

Heterogeneity calculation

We calculated the heterogeneity of population activity as follows (see also Figure 3d). For all analyses, we took the time point t to be a single trial, except for those shown in Figure 5, where t corresponds to a single data acquisition point (i.e. one calcium imaging frame). First, for each independent data source i (i.e. a neuron) providing a measurement R at each time point t (i.e. the dF/F0 activity of a single trial), we z-scored the responses of i over all trials T (i.e. all contrasts and orientations; high-contrast, preferred-orientation stimuli therefore yield higher z-score values than low-contrast, nonpreferred orientations):

(2) Zi,t = (Ri,t - μi)/σi

Z is therefore an n (number of neurons) by T (trials) matrix of distances, expressed in standard deviations (σ), from each neuron's mean over all trials (μ). Next, for each trial t, we calculated the pairwise distance (in standard deviations) from each independent source to every other independent source (pairwise neuronal Δσ): we repeated the z-scored population response vector zt along its singleton dimension n times, where n is the number of neurons in zt (yielding a square matrix), subtracted this matrix from its own transpose ztT, and took the absolute value of the result, giving the heterogeneity matrix Ht:

(3) Ht = | zt - ztT |

To get a single measure of population heterogeneity per trial (ht), we next took the mean of all z-scored distances between neuronal pairs (i,j) in the heterogeneity matrix; this provides a measure of the mean distance in activation levels within our population at a single trial t:

(4) ht = Σ[i=1→n−1] Σ[j=i+1→n] Ht,i,j / ( n·(n−1)/2 )
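Equations 2–4 can be sketched in NumPy as follows (our paraphrase of the described procedure, not the authors' MATLAB code):

```python
import numpy as np

def heterogeneity(R):
    """R: neurons x trials matrix of dF/F0 responses.
    Returns h, one heterogeneity value per trial (Eqs. 2-4)."""
    # Eq. 2: z-score each neuron over all trials
    Z = (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)
    n = Z.shape[0]
    h = np.empty(Z.shape[1])
    for t in range(Z.shape[1]):
        # Eq. 3: pairwise |delta sigma| matrix for this trial
        H = np.abs(Z[:, t][:, None] - Z[:, t][None, :])
        # Eq. 4: mean over unique neuron pairs
        iu = np.triu_indices(n, k=1)
        h[t] = H[iu].mean()
    return h
```

When all neurons respond identically on every trial, all pairwise distances are zero and heterogeneity is zero, regardless of how strong the shared response is.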

Effect size of mean population dF/F0 and heterogeneity

We used Cohen's d as a measure of effect size to quantify which metric (mean dF/F0 or heterogeneity) showed the stronger correlation with visual detection. For both metrics, we calculated per animal the hit/miss effect size at each intermediate contrast (0.5–32%) and took the mean over these four values, yielding one mean hit/miss effect size per animal for dF/F0 and one for heterogeneity. This allowed us to perform a paired t-test between the dF/F0 and heterogeneity effect sizes to test for statistical significance. Cohen's d is defined as the difference between the two means (hit: µh; miss: µm) divided by the pooled standard deviation of the data:

(5) d = (μh - μm) / σp,

where σp is defined as

(6) σp = √( [ (nh − 1)·varh + (nm − 1)·varm ] / (nh + nm − 2) )
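Equations 5–6 translate directly into a few lines of NumPy (our sketch; function name is ours):

```python
import numpy as np

def cohens_d(hit, miss):
    """Hit/miss effect size (Eqs. 5-6): difference of means
    divided by the pooled standard deviation."""
    nh, nm = len(hit), len(miss)
    var_p = ((nh - 1) * np.var(hit, ddof=1) + (nm - 1) * np.var(miss, ddof=1)) / (nh + nm - 2)
    return (np.mean(hit) - np.mean(miss)) / np.sqrt(var_p)
```

For example, two equally sized groups whose means differ by exactly one pooled standard deviation give d = 1.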

Instantaneous Pearson-like correlations and sliding-window Pearson’s correlations

For a pair of neurons x and y, the Pearson’s correlation (R) of their activity can be calculated by z-scoring each neuron’s response vector (as in Equation 2) and taking the mean of the element-wise multiplication of the two vectors:

(7) Rx,y = Σ[t=1→T] (Zx,t · Zy,t) / T

Here, notations are the same as for Equations 2–4; t is a single trial and T is the total number of trials. Using this equation, it is impossible to obtain an instantaneous correlation value between two neurons for a single trial, because its calculation requires taking the mean over all trials; this poses a problem if we want to estimate the correlation between a pair of neurons at a given trial. We therefore computed a modified measure, the instantaneous Pearson-like correlation (Ř). For each pair of neurons, we calculated the z-scored element-wise product (each element being a single trial), which yields a three-dimensional matrix Ž of size [n by n by T], where n is the number of neurons:

(8) Žx,y,t = Zx,t · Zy,t

Taking the mean over the matrix’s third dimension (trials) gives the conventional Pearson’s pairwise correlation matrix over neuronal pairs. However, the matrix also allows us to approximate the mean pairwise correlations within the whole population at any given trial (Řt) by taking the mean over all unique neuronal pair values in matrix Ž:

(9) Řt = Σ[i=1→n−1] Σ[j=i+1→n] Ži,j,t / ( n·(n−1)/2 )

Similarly, we can take the standard deviation instead of the mean over all unique pairs per trial to estimate the spread of the instantaneous pairwise correlation distribution. Note, however, that while the instantaneous Pearson-like correlation is similar to the conventional Pearson correlation, Ř is not bounded within the interval [−1, 1], because the z-scored element-wise product and the mean operator work over different sets of values (i.e. matrix dimensions).
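Equations 8–9 can be sketched as follows (our NumPy paraphrase; the function name is ours):

```python
import numpy as np

def instantaneous_correlation(R):
    """R: neurons x trials dF/F0 matrix. Returns R_hat, one Pearson-like
    correlation value per trial (Eqs. 8-9): the mean over unique neuron
    pairs of the z-scored element-wise product."""
    Z = (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)
    n, T = Z.shape
    prod = Z[:, None, :] * Z[None, :, :]  # n x n x T matrix Z-tilde (Eq. 8)
    # averaging prod over its third dimension instead would recover the
    # conventional pairwise Pearson correlation matrix
    iu = np.triu_indices(n, k=1)
    return prod[iu[0], iu[1], :].mean(axis=0)  # mean over unique pairs, per trial (Eq. 9)
```

As a sanity check: for two perfectly correlated neurons, the trial-averaged value of Ř equals the conventional Pearson correlation of 1.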

We additionally used for comparison a more conventional measure of correlations across time by using a wavelet-based sliding-window correlation (Cooper and Cowan, 2008). The time scale of the wavelet used in all sliding-window analyses was set to 1.0 s as this was similar to the animals’ median reaction times and should therefore maximize the stimulus-driven change in neuronal pairwise correlations.

ROC analysis of hit/miss separability

We quantified single-trial behavioral response predictability using a receiver operating characteristic (ROC) approach, calculating the area under the curve (AUC) of a false positive rate versus true positive rate plot. All ROC curves were computed separately per contrast and animal for both heterogeneity and mean population dF/F0 (Figure 3g). For comparison across animals, we averaged the AUCs of the four test contrasts per animal, yielding a single AUC value per animal for both heterogeneity and dF/F0 (Figure 3h).
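The AUC for separating hit from miss trials by a scalar metric can be computed directly from the two per-trial value distributions via the rank-sum (Mann–Whitney U) equivalence; a minimal sketch (our implementation, not the authors'):

```python
import numpy as np

def roc_auc(hit_values, miss_values):
    """AUC for discriminating hit from miss trials by a per-trial metric
    (e.g. heterogeneity). Equals P(hit > miss); ties count as half."""
    hit_values = np.asarray(hit_values, float)
    miss_values = np.asarray(miss_values, float)
    greater = (hit_values[:, None] > miss_values[None, :]).sum()
    ties = (hit_values[:, None] == miss_values[None, :]).sum()
    return (greater + 0.5 * ties) / (len(hit_values) * len(miss_values))
```

Fully separated distributions give an AUC of 1, identical distributions an AUC of 0.5 (chance).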

Decoding of stimulus presence

To ascertain the performance of a decoder on the same task the mouse performed, we created an algorithm that calculated the probability of a stimulus being present. This decoder was based on a previously published maximum-likelihood naive Bayes decoding algorithm (for a more complete description, see Montijn et al., 2014). For each neuron and stimulus orientation, we computed the mean and standard deviation of the mean dF/F0 during presentation of a 100% contrast stimulus, as well as during 0% probe trials. For each test trial, using the neurons whose preferred orientation matched the trial's stimulus orientation, we calculated the probability that a stimulus was present by reading out the likelihood density functions for 0% and 100% contrast trials. The product over neurons in the preferred population then yields, for each trial, population posterior probability values for stimulus absence (0% likelihood) and presence (100% likelihood). The decoder's read-out was the posterior with the highest probability. Because the likelihood was based only on 0% and 100% contrast responses, cross-validation was automatically ensured when decoding test-contrast stimuli. After decoding stimulus presence for all trials, we split the trials into hits and misses and calculated, per response type and contrast, the percentage of trials for which the decoder indicated a stimulus was present, averaging over repetitions and orientations. This yielded two curves per animal (see Figure 4d). We tested for statistically significant differences between response and no-response trials by performing a paired t-test over animals on the intermediate contrasts (0.5–32%).
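A simplified sketch of the presence read-out for one test trial, assuming Gaussian likelihood density functions per neuron (consistent with the mean/SD statistics described above, though the original implementation is in MATLAB); summing log-likelihoods replaces the product over neurons for numerical stability:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Gaussian likelihood density, evaluated element-wise."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def decode_presence(trial, mu0, sd0, mu100, sd100):
    """trial: dF/F0 of the preferred-population neurons on one test trial;
    mu/sd arrays hold each neuron's response statistics during 0% and 100%
    contrast trials. Returns True if the 100%-contrast (stimulus present)
    likelihood exceeds the 0%-contrast (stimulus absent) likelihood."""
    ll0 = np.sum(np.log(gauss_pdf(trial, mu0, sd0) + 1e-12))
    ll100 = np.sum(np.log(gauss_pdf(trial, mu100, sd100) + 1e-12))
    return ll100 > ll0
```

A trial whose responses sit on the 100%-contrast means decodes as "stimulus present"; one on the 0%-contrast means decodes as "stimulus absent".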

Furthermore, we quantified the similarity of our decoder’s performance to the animal’s performance in the visual stimulus detection task by calculating the similarity per animal of its actual behavioral performance to the decoder’s performance (Pearson’s correlation over contrasts). We compared this value to the similarity obtained with a bootstrapped shuffling procedure (1000 iterations). Here, we shuffled the animal’s behavioral and decoder performance over contrasts, recalculated the similarity index, and took the mean over all iterations as the resultant shuffled similarity. To test for statistical significance, we performed a paired t-test over animals between the shuffled and real similarities (Figure 4e).

Moreover, we investigated the similarity between the animal's and the decoder's output at the single-trial level with a chi-square analysis. Pooling all trials across animals showed a significant correspondence between the decoder's and the animal's judgment of stimulus presence: hit trials were more often decoded as 'stimulus present' and miss trials more often as 'stimulus absent' (Figure 4—figure supplement 2j). Note that this decoding procedure is not optimal; the absolute decoding performance should therefore not be interpreted as reflecting the actual amount of information present in the neural responses. The purpose of this decoder is merely to test, in coarse terms, the similarity between the neural signal and the animal's behavior.

Behavioral response predictability

We analyzed the predictability of behavioral responses before they occurred, based on either the mean population dF/F0 response or the population heterogeneity between 3 and 0 s before stimulus onset (Figure 5e). Hit trials were split into the 50% fastest and 50% slowest reaction times per contrast per animal and then averaged over contrasts, yielding six data points per animal: the mean pre-stimulus population dF/F0 and mean population heterogeneity preceding fast, slow, and miss trials. We then quantified the consistency of differences over animals by calculating the distance of these points per animal to the mean of their own response group and to the means of the other two. We defined the predictability metric per point i (animal) for two response types r1 and r2 (i.e. two types out of fast, slow, or miss) as

(10) δr1,r2,i = ( ‖d(ir1, μr2)‖ / ( ‖d(ir1, μr2)‖ + ‖d(ir1, μr1¬i)‖ ) − 0.5 ) · 2,

where ‖d‖ is the absolute Euclidian distance (vector magnitude), µr is the mean location of lr, the group of points for response type r, and µr¬i is the mean location of lr excluding point i. This analysis yields a vector δr1,r2: the separability between response types r1 and r2. Random placement would lead to a separability of δ = 0, so we quantified statistically significant predictability of responses by performing FDR-corrected one-sample t-tests (vs. 0) for each separability vector and both neuronal population metrics (heterogeneity and mean dF/F0). We also tested whether separability was higher for heterogeneity or for dF/F0 by performing FDR-corrected paired t-tests between the dF/F0 and heterogeneity separability vectors for the same response type comparisons (Figure 5e).
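Equation 10 can be sketched as follows (our NumPy paraphrase, with a leave-one-out mean for the point's own group as described above; names are ours):

```python
import numpy as np

def separability(points_r1, points_r2):
    """Eq. 10: for each point i of response type r1, compare its distance to
    the mean of group r2 with its distance to the leave-one-out mean of its
    own group r1. Returns delta values in [-1, 1]; 0 = chance placement."""
    points_r1 = np.atleast_2d(points_r1)
    mu_r2 = np.mean(np.atleast_2d(points_r2), axis=0)
    delta = np.empty(len(points_r1))
    for i in range(len(points_r1)):
        own_mu = np.delete(points_r1, i, axis=0).mean(axis=0)  # mu_{r1 without i}
        d_other = np.linalg.norm(points_r1[i] - mu_r2)
        d_own = np.linalg.norm(points_r1[i] - own_mu)
        delta[i] = (d_other / (d_other + d_own) - 0.5) * 2
    return delta
```

Two tightly clustered, well-separated groups yield δ values near 1; overlapping groups yield values near 0.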

Rise time to maximum heterogeneity

We defined the rise time to maximum stimulus-driven heterogeneity as the time it took the population heterogeneity to rise from 10% to 90% of the difference between pre-stimulus baseline levels and maximum heterogeneity during the stimulus period. This rise time was calculated on the mean curves per animal and contrast as shown in Figure 5d. To create the graph shown in Figure 5f, we took the rise time across test contrasts per animal (n=8) and behavioral response type (miss, slow, fast). We tested for significant differences in average rise times between response types with paired t-tests across animals.
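The 10–90% rise-time computation can be sketched as follows (our NumPy paraphrase of the definition above; the signature and threshold-crossing search are ours):

```python
import numpy as np

def rise_time(trace, t, stim_onset=0.0):
    """10-90% rise time of a heterogeneity trace. Baseline is the mean
    before stimulus onset; the maximum is taken over the stimulus period.
    Returns the time from the 10% to the 90% threshold crossing."""
    base = trace[t < stim_onset].mean()
    stim = t >= stim_onset
    peak = trace[stim].max()
    lo = base + 0.1 * (peak - base)
    hi = base + 0.9 * (peak - base)
    idx = np.where(stim)[0]
    t10 = t[idx[np.argmax(trace[idx] >= lo)]]  # first crossing of 10% level
    t90 = t[idx[np.argmax(trace[idx] >= hi)]]  # first crossing of 90% level
    return t90 - t10
```

For a linear ramp from baseline to peak over 1 s, this returns a rise time of about 0.8 s, as expected for 10–90% thresholds.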

Population activation pattern consistency

Detection of a visual stimulus might be associated with consistencies in population activity. We therefore analyzed whether the inter-trial correlation of population activity varies with the behavioral performance of the animal. We again separated fast, slow, and miss trials and, for each stimulus orientation, calculated the correlation of the dF/F0 response vector between pairs of trials with the same type of behavioral response (Figure 6a). We separated the neuronal responses for that orientation's preferred and nonpreferred populations of neurons, also to address whether consistency across trials might be restricted to the preferred population or would also occur in the nonpreferred population (Figure 6b–d). Note that because we calculated the correlations separately for the preferred and nonpreferred populations, the relative contribution of the orientation signal is fairly low, which explains the relatively low correlation values. To assess above-chance similarities, we compared these values to correlations obtained from shuffled data. By randomly shuffling all trial identities for each neuron within each stimulus orientation, the orientation signal is preserved, but other similarities across trials are destroyed. We repeated this shuffling procedure for 100 iterations and took the mean over these iterations as the shuffled correlation value per animal (Figure 6b–d). To test for statistically significant consistencies in population activation patterns, we performed FDR-corrected paired t-tests between the real and shuffled correlation values over animals for the different response types and the two neuronal population types. We also quantified the differences between response groups in the real data with paired t-tests (miss vs. slow, miss vs. fast, and fast vs. slow).

Analysis of multidimensional inter-trial distance in neural activity

To study the theoretical implications of our results relating to heterogeneity, we next analyzed whether heterogeneity forms a special case of population coding that does not merely reflect increased activity of all neurons upon visual detection. For the specific purpose of these analyses (shown in Figure 7), we define multidimensional heterogeneity as the distance in neural space from the population's activity to the closest point on the main diagonal (see text and below for further explanation). Although this definition is computationally different from our pairwise definition of heterogeneity, it also captures the overall dissimilarity of responses within a population of neurons. Moreover, applying this procedure to z-scored dF/F0 values yields Pearson's correlations of r > 0.9 when compared with our original definition of heterogeneity (Equations 3 and 4), and gives very similar hit/miss Cohen's d values (Figure 3—figure supplement 2). The two metrics therefore likely capture the same neural phenomenon and show that heterogeneity can be studied with different, but related, computational definitions.

To assess the distribution of neuronal population activity in multidimensional neural response space (where each dimension represents the activity of a single neuron; see Figure 7a–d), we calculated the inter-point distance (each point representing the population activity during a single trial) between all hit trial pairs and between all miss trial pairs. The distance in neuronal activity for a population of n neurons between a pair of trials x and y in multidimensional space can be calculated as the n-dimensional Euclidean distance:

(11) $d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2}$

The pairwise inter-point distance is then given in units of neural activity (dF/F0, Figure 7e). Note that this formula can also be used to calculate the multidimensional heterogeneity, as defined above, by taking the distance between any trial (x) and the closest point on the diagonal (y).
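Equation 11 and the diagonal-distance definition of multidimensional heterogeneity can be sketched as follows (a minimal illustration with hypothetical helper names, not the authors' implementation). The closest point on the main diagonal to a response vector r is the diagonal point whose coordinate equals the mean of r, so the heterogeneity reduces to the distance between r and a vector filled with its own mean.

```python
import numpy as np

def pairwise_distance(x, y):
    """n-dimensional Euclidean distance between two trials' population
    activity vectors (Equation 11), in units of dF/F0."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

def multidim_heterogeneity(r):
    """Distance from a population response r to the closest point on the
    main diagonal. For the diagonal t*(1,1,...,1), the closest point has
    t = mean(r), so this is the distance to the all-mean vector."""
    r = np.asarray(r, dtype=float)
    return pairwise_distance(r, np.full(r.shape, r.mean()))
```

A uniform response (all neurons equally active) thus has a multidimensional heterogeneity of exactly zero, since it lies on the diagonal itself.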

Next, we investigated the symmetry of population responses around the main diagonal as this symmetry gives an indication of whether heterogeneity is an epiphenomenal observation or a fundamental neural characteristic underlying visual detection (see text). In order to do so, we mirrored each point across the diagonal and recalculated the inter-point distances for the mirrored data. Mirroring across the diagonal was achieved by direct inversion of the signs per neuron relative to the main diagonal. For a population response [r1 r2 … ri … rn], where n is the number of neurons, the mirrored version r’ = [r’1 r’2 … r’i … r’n] was calculated as follows:

(12) $r'_i = \mu_r - (r_i - \mu_r)$

where $\mu_r$ is the mean of the population response $r$.
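The mirroring operation of Equation 12 sign-inverts each neuron's deviation from the population mean, which is equivalent to $2\mu_r - r_i$. A minimal sketch (illustrative only); note that the mirrored response keeps the same mean and the same distance to the diagonal as the original:

```python
import numpy as np

def mirror_across_diagonal(r):
    """Mirror a population response across the main diagonal (Equation 12):
    r'_i = mu_r - (r_i - mu_r) = 2*mu_r - r_i, so each neuron's deviation
    from the population mean flips sign while the mean is preserved."""
    r = np.asarray(r, dtype=float)
    return 2.0 * r.mean() - r
```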

Removal of mean and heterogeneity, and subsequent hit/miss decoding

For the analyses displayed in Figure 7g,h, we removed the mean and/or heterogeneity from the population responses and assessed the effect on decoding accuracy of hit/miss responses during test contrast stimuli. As mentioned before, for these analyses heterogeneity was defined as the distance to the main diagonal. As such, removal of the mean without influencing heterogeneity is trivial and can be achieved by simply subtracting the mean population response from all neuronal dF/F0 values obtained for each trial. Heterogeneity was removed from each trial without affecting the mean in two steps. First, heterogeneity was removed by dividing each neuron’s response during that trial by the square root of the sum of the squared differences between the neuronal responses and the mean (i.e. by dividing by the heterogeneity):

(13) $r'_{\neg H} = \dfrac{r}{\sqrt{(r_1 - \mu_r)^2 + (r_2 - \mu_r)^2 + \dots + (r_n - \mu_r)^2}}$

Next, changes in the mean were corrected by subtracting the new mean of the heterogeneity-removed population activation ($\mu_{r'_{\neg H}}$) and adding back the old population mean $\mu_r$:

(14) $r_{\neg H} = r'_{\neg H} + \mu_r - \mu_{r'_{\neg H}}$

This way, the heterogeneity (i.e. the Euclidean distance of that trial’s population activity to the main diagonal) is normalized to 1.0 for all trials. The multidimensional location relative to the diagonal is preserved, but its distance is always the same; all trials now fall on a cylinder with a radius of 1.0 dF/F0 around the main diagonal. In other words, the population activation during a trial is projected as a vector from the closest point on the diagonal to the trial’s position, and the vector’s angle is preserved, but its magnitude is normalized to 1.0. Both properties (mean and heterogeneity) can be removed by subtracting the mean from the heterogeneity-removed responses; this collapses the cylinder onto a circle through multidimensional space around the origin.
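The two-step normalization of Equations 13 and 14 might look as follows in code (a sketch; function names are hypothetical). After the operation, every trial has heterogeneity (distance to the diagonal) equal to 1.0 while its original mean is restored:

```python
import numpy as np

def remove_mean(r):
    """Remove the population mean without affecting heterogeneity:
    deviations from the mean, and hence the distance to the diagonal,
    are unchanged."""
    r = np.asarray(r, dtype=float)
    return r - r.mean()

def remove_heterogeneity(r):
    """Normalize a trial's heterogeneity (distance to the main diagonal)
    to 1.0 while restoring the original mean (Equations 13 and 14)."""
    r = np.asarray(r, dtype=float)
    mu = r.mean()
    het = np.sqrt(np.sum((r - mu) ** 2))  # distance to the diagonal
    r_prime = r / het                     # Eq. 13: divide by heterogeneity
    return r_prime + mu - r_prime.mean()  # Eq. 14: restore the old mean
```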

Control analyses for confounds related to running, licking, and eye movements

To control for potential locomotor confounds, we split all data sets into trials where the mouse was still (90.9% ± 3.6% of trials) and where it was moving during stimulus presentation (8.1% ± 3.6% of trials), and reanalyzed our data. Our results with exclusion of running trials (Figure 2—figure supplement 1g,h) are very similar to our original analysis (Figure 2a,b), showing that the effects we observed cannot have been due to running-induced modulations (paired t-test, hit vs. miss, 0.5–32%, p<0.05).

Another potential confound for our results could be that response trials induce signals related to motor feedback or the motivation to initiate motor actions, because the animal initiates licking as a behavioral response. This also seems unlikely, because 0% contrast probe trials did not induce neuronal activity during false alarms (Figure 2—figure supplement 1a, green line). Theoretically, however, such signals could still be present and influence population activity only when occurring concurrently with visual stimulation. To control for this, we re-performed our analyses shown in Figure 2a,b, but now used data only from the first 0.4 s after stimulus onset, approximately 0.8 s before the mean reaction time. Leaving a window of 0.8 s between the latest frame included in the data analysis and the licking response should also eliminate potential modulatory activity from motor cortex related to the preparation of licking. The results from this control analysis were slightly noisier due to the shorter data acquisition duration per trial, but showed no qualitative differences from the original analysis regarding heterogeneity (Figure 2—figure supplement 1i,j). The intermediate contrasts still showed significant enhancements in heterogeneity (p<0.01) during hit trials, but we found no significant differences for mean population dF/F0 (p=0.543). We therefore conclude that our results regarding heterogeneity are not confounded by motor-related modulations due to running or licking, nor by reward expectation prior to licking responses, and confirm that the mean population dF/F0 is less useful as a measure of neural correlates of perception.

To control for possible effects of blinking and saccades, we performed pupil detection on our eye-tracking data and removed all trials in which the animals blinked or made saccades at any time during stimulus presentation (10.2% ± 4.6% of trials removed; mean ± SD). We re-performed our analyses on only the trials where no contamination by incorrect eye position and/or closing of the eyelids was possible (Figure 2—figure supplement 1k,l) and observed that our results regarding heterogeneity were qualitatively and quantitatively similar to our original analyses, but that the dF/F0 results were again more sensitive to a conservative analysis (hit/miss difference for intermediate contrasts, paired t-test, n=8; dF/F0, p=0.136; heterogeneity, p<0.005). We conclude that our main results are not biased by incorrect eye position and blinking.

Orientation decoding

We addressed whether the orientation information contained in the population responses depended on the mean dF/F0 and heterogeneity during stimulus presentation. We decoded the presented stimulus orientation for each contrast separately (i.e. 100% contrast trials based on likelihoods from 100% contrast trials, etc.) using leave-one-out cross-validation, and afterwards split all trials into correctly and incorrectly decoded ones (Figure 4—figure supplement 2b). To quantify the dependence of decoding accuracy on dF/F0 during stimulus presentation, we took for each contrast the trials with the highest and lowest 50% of dF/F0 values and calculated the mean decoding accuracy for both groups (high and low activity). Next, we averaged these groups over contrasts per animal and calculated the percentage increase in decoding accuracy for the highest versus lowest 50% of dF/F0 trials (see Figure 4—figure supplement 2c). To test for statistical significance, we performed a one-sample t-test of the percentage increase values over animals. For heterogeneity, we performed the same steps and tested against 0% increase (Figure 4—figure supplement 2c).
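The median-split comparison for one contrast could be sketched as follows (illustrative; the per-trial metric values and decoder outcomes are assumed inputs, and the function name is hypothetical):

```python
import numpy as np

def accuracy_increase(metric, correct):
    """Percentage increase in decoding accuracy for the highest vs lowest
    50% of trials on some per-trial metric (e.g. mean dF/F0 or
    heterogeneity), for one contrast.

    metric  : (n_trials,) per-trial values of the metric
    correct : (n_trials,) boolean, whether each trial was decoded correctly
    """
    metric = np.asarray(metric, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(metric)       # trials sorted by the metric
    half = len(metric) // 2
    low = correct[order[:half]].mean()    # accuracy in lowest 50%
    high = correct[order[-half:]].mean()  # accuracy in highest 50%
    return 100.0 * (high - low) / low     # assumes low accuracy > 0
```

Per the text, these percentages would then be averaged over contrasts per animal before the one-sample t-test across animals.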

Stimulus feature decoding

To address whether visual stimulus features (i.e. orientation and contrast) were more accurately represented by neuronal population activity during correct versus incorrect behavioral performance, we used a Bayesian maximum-likelihood decoder as previously described to extract those features from the population activity (for a more complete description, see Montijn et al., 2014). We defined all combinations of orientations and contrasts as different stimulus types, yielding a total of 21 different stimulus types (four orientations times five contrasts plus probe trials). Next, we performed a leave-one-out cross-validated decoding procedure for all trials and calculated the mean percentage correct decoding trials for hits and misses per stimulus type; then we averaged the percentage correct over stimulus types, yielding an accuracy per animal for hit and miss trials. We tested for a statistical difference between hits and misses with a paired t-test over animals (Figure 4—figure supplement 2a).
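The full Bayesian maximum-likelihood decoder is described in Montijn et al. (2014); as a rough illustration of the leave-one-out procedure, here is a simplified variant that assumes independent Gaussian response distributions per neuron (an assumption of this sketch, not necessarily of the original decoder):

```python
import numpy as np

def loo_ml_decode(responses, labels):
    """Leave-one-out maximum-likelihood decoding of stimulus type from
    population activity, assuming independent Gaussian response
    distributions per neuron.

    responses : (n_trials, n_neurons) dF/F0 values
    labels    : (n_trials,) integer stimulus-type label per trial
    Returns the fraction of correctly decoded trials.
    """
    responses = np.asarray(responses, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n_trials = len(labels)
    n_correct = 0
    for t in range(n_trials):
        train = np.ones(n_trials, dtype=bool)
        train[t] = False                      # hold out trial t
        log_lik = []
        for c in classes:
            sel = train & (labels == c)
            mu = responses[sel].mean(axis=0)
            sd = responses[sel].std(axis=0) + 1e-6  # avoid zero variance
            # Summed Gaussian log-likelihood of the held-out trial
            ll = -0.5 * np.sum(((responses[t] - mu) / sd) ** 2
                               + np.log(2 * np.pi * sd ** 2))
            log_lik.append(ll)
        if classes[np.argmax(log_lik)] == labels[t]:
            n_correct += 1
    return n_correct / n_trials
```

In the paper's analysis, the 21 stimulus types (4 orientations × 5 contrasts, plus probe trials) would play the role of the class labels, and accuracy would be computed separately for hit and miss trials.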

Noise correlations

To investigate detection-related increases or decreases in noise correlations (Figure 4—figure supplement 2f,g), we first calculated a response vector for each stimulus orientation θ that was presented during a test contrast trial. Here, each element in the vector is the neuron’s response to a single presentation t (i.e. a trial) of that stimulus orientation:

(15) $R_\theta = [R_{\theta,1} \;\dots\; R_{\theta,t} \;\dots\; R_{\theta,n}]$

where n is the number of repetitions per response type per orientation. Because we aim to compare a single noise correlation value per neuronal pair i,j, we took the mean noise correlation over all four stimulus orientations:

(16) $\rho^{\mathrm{noise}}_{i,j} = \dfrac{1}{4} \sum_{\theta \in \{0°, 45°, 90°, 135°\}} \mathrm{corr}(R_{i,\theta}, R_{j,\theta})$

The noise correlation is, therefore, an index of the mean trial-by-trial variability shared by pairs of neurons over all stimulus orientations.
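Equation 16 amounts to averaging per-orientation trial-by-trial Pearson correlations between two neurons' response vectors; a minimal sketch (hypothetical function name):

```python
import numpy as np

def noise_correlation(resp_i, resp_j):
    """Mean noise correlation between neurons i and j over stimulus
    orientations (Equation 16).

    resp_i, resp_j : (n_orientations, n_repetitions) arrays of
                     single-trial responses for each neuron.
    """
    resp_i = np.asarray(resp_i, dtype=float)
    resp_j = np.asarray(resp_j, dtype=float)
    # Pearson correlation per orientation, then average over orientations
    rho = [np.corrcoef(ri, rj)[0, 1] for ri, rj in zip(resp_i, resp_j)]
    return float(np.mean(rho))
```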

Behavioral response predictability on single-trial basis

To verify that the behavioral predictability before stimulus onset (Figure 5e) was not merely a group-level effect, but indeed also a single-trial phenomenon, we subsequently performed single-trial decoder-based predictions of the fast/slow/miss behavioral responses that occurred during the subsequent stimulus presentation (see Figure 5—figure supplement 1). We used a leave-one-out cross-validated naive Bayes decoder similar to that described above for fast, slow, and miss trials, and calculated per trial the relative likelihood that the subsequent stimulus presentation would lead to a miss, fast, or slow response. We then split the predictive decoding results per actual behavioral response group and averaged the relative prediction likelihoods per animal. This yields three relative probability values per actual response type per animal. By assigning to each behavioral response type an angle on the unit circle, separated by 2π/3, and taking the relative likelihood as the vector magnitude, it is then possible to calculate a resultant prediction vector per actual response type per animal. To quantify statistical significance, we multiplied an angle-based correctness index (+1 when the resultant prediction vector angle is perfectly aligned with the actual response angle and –1 when they are separated by π) with the vector magnitude, giving a normalized decoding accuracy index between –1.0 and +1.0, where chance level is 0. Lastly, we performed one-sample t-tests on the normalized decoding accuracy indices over animals and response types for heterogeneity and dF/F0, and a paired t-test between dF/F0 and heterogeneity (Figure 5—figure supplement 1).
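The resultant-vector index might be computed as follows (a sketch; using the cosine of the angular difference as the angle-based correctness index is an interpolation assumption between the stated endpoints of +1 for perfect alignment and –1 for a separation of π):

```python
import numpy as np

def normalized_accuracy_index(likelihoods, actual_idx):
    """Resultant-vector accuracy index for three behavioral response
    types (e.g. miss/slow/fast) placed at angles separated by 2*pi/3
    on the unit circle.

    likelihoods : relative likelihoods for the three response types
                  (summing to 1), used as vector magnitudes
    actual_idx  : index (0, 1, or 2) of the actual response type
    Returns a value in [-1, 1], with 0 at chance level.
    """
    angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
    # Resultant of the three likelihood-weighted unit vectors
    weights = np.asarray(likelihoods, dtype=float)[:, None]
    vec = np.sum(weights * np.column_stack([np.cos(angles),
                                            np.sin(angles)]), axis=0)
    magnitude = np.hypot(vec[0], vec[1])
    if magnitude == 0.0:
        return 0.0
    resultant_angle = np.arctan2(vec[1], vec[0])
    # Angle-based correctness: +1 aligned with true type, -1 opposite
    correctness = np.cos(resultant_angle - angles[actual_idx])
    return float(correctness * magnitude)
```

A uniform prediction (equal likelihood for all three types) yields a near-zero resultant vector and hence an index at chance level.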

References

  21. Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems
    1. P Dayan
    2. LF Abbott
    (2001)
    Cambridge, MA: The MIT Press.
  39. A model of saliency-based visual attention for rapid scene analysis
    1. L Itti
    2. C Koch
    3. E Niebur
    (1998)
    IEEE Transactions on Pattern Analysis and Machine Intelligence 20:1254–1259.
  40. An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex
    1. JP Jones
    2. LA Palmer
    (1987)
    Journal of Neurophysiology 58:1233–1258.
  48. Cellular bases of neocortical activation: modulation of neural oscillations by the nucleus basalis and endogenous acetylcholine
    1. R Metherate
    2. CL Cox
    3. JH Ashe
    (1992)
    The Journal of Neuroscience 12:4701–4711.
  64. Attention and primary visual cortex
    1. MI Posner
    2. CD Gilbert
    (1999)
    Proceedings of the National Academy of Sciences of the United States of America 96:2585–2587.
    https://doi.org/10.1073/pnas.96.6.2585
  74. Simple models for reading neuronal population codes
    1. HS Seung
    2. H Sompolinsky
    (1993)
    Proceedings of the National Academy of Sciences of the United States of America 90:10749–10753.
    https://doi.org/10.1073/pnas.90.22.10749
  75. Motion perception: seeing and deciding
    1. MN Shadlen
    2. WT Newsome
    (1996)
    Proceedings of the National Academy of Sciences of the United States of America 93:628–633.
  77. In vivo two-photon calcium imaging of neuronal networks
    1. C Stosiek
    2. O Garaschuk
    3. K Holthoff
    4. A Konnerth
    (2003)
    Proceedings of the National Academy of Sciences of the United States of America 100:7319–7324.
    https://doi.org/10.1073/pnas.1232232100
  81. Visual capacity in the hemianopic field following a restricted occipital ablation
    1. L Weiskrantz
    2. EK Warrington
    3. MD Sanders
    4. J Marshall
    (1974)
    Brain 97:709–728.

Decision letter

  1. David C Van Essen
    Reviewing Editor; Washington University in St Louis, United States

eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.

Thank you for submitting your work entitled "Mouse V1 population correlates of visual detection rely on heterogeneity within neuronal response patterns" for peer review at eLife. Your submission has been evaluated by Eve Marder (Senior editor), a Reviewing editor, and three reviewers.

The reviewers have discussed the reviews with one another and the Reviewing editor has drafted this decision to help you prepare a revised submission.

The reviewers articulated a desire to see your responses to their critiques and felt you should be given a chance to revise your work. We are allowing you to submit a revision, as long as you understand that a positive outcome is not assured, and will depend on whether the reviewers and Reviewing editor feel that you have adequately revised your manuscript and/or rebutted the critiques.

Summary:

How the statistics of neuronal population responses in early sensory cortices relate to perceptual behavior is an important issue in neuroscience. In this paper, Montijn and collaborators compared neuronal fluorescence signals from L2/3 of mouse V1 with behavioral responses in a visual detection task. The reviewers think that there are interesting results, but it is not clear to what extent the observed correlation between behavior and the heterogeneity measure supports the authors' claim. An important issue is that the authors neglect many previous developments on neuronal correlates in other early primary cortical areas (somatosensory and auditory cortex). The paper focuses on the neuronal correlates of visual detection performance in V1, but this is a general problem and the authors should mention this.

Essential revisions:

Reviewer #1:

1) The authors state that "for each trial took the responses of only neurons that preferred the presented stimulus orientation" (subsection “Response dissimilarity within neuronal populations correlates with detection”). This practice is extremely dubious. First, it is totally unclear how it affects subsequent analysis. Were Z-scores calculated once over all orientations, or calculated separately for each orientation for those neurons that preferred it? When examining the relationship between behavior and heterogeneity, was a correlation calculated for each orientation or over all orientations?

2) The presented stimuli consist of square wave gratings with 8 different directions, but a response to any orientation was rewarded. In essence, this becomes a matter of responding to a change in light level. Thus, analyzing responses to oriented stimuli may result in a bias towards those neurons with inherent responses to the stimuli, which obscures or masks the responses from the neurons actually involved in the discrimination of the change in intensity. Nonetheless, the authors, on the basis of Ca2+-transient measurements from ~100 L2 neurons in monocular V1, conclude that "visual perception does not correlate well with mean response strength, but is significantly correlated with population heterogeneity." This statement ought to be drastically revised to reflect that it is contingent on the ad-hoc procedures chosen by the authors, and on how the correlation is calculated using data-selection procedures based on orientation, which was not part of the behavioral task.

3) Since the animal needs in principle only to respond to an increase in ambient light intensity brought about by the stimulus and since no behavioral dependence on orientation has been reported, all of the analysis concerning orientation selectivity (preferred populations etc.) is potentially irrelevant, and the logic behind this experimental design is not clear. If one were designing an experiment to test for a correlation between mean response strength and visual perception, surely it would be wise to do one's best to ensure that the neurons from which responses were recorded had response properties that were at least to some degree related to the discrimination target? While it would be equally unwise to assume that orientation selective neurons in V1 do not play a role in visual discriminations not involving oriented stimuli at their preferred orientation, the failure on the authors' side to discuss in any way the caveats associated with their experimental design and simultaneously to draw the conclusions that they do and state them as strongly as they do is remarkable.

4) The heterogeneity measure, the sum of pairwise absolute z-score differences, does not correspond to any normal usage of the word heterogeneity and is never adequately justified. For example, if all neurons respond to a given stimulus with the same fluorescence increase, the heterogeneity of that stimulus will not be zero but will depend on their responses to other stimuli. Even for a trial that elicits no fluorescence change in any neuron, the heterogeneity will not be zero. Since the measure is based on z-scores, it will amplify fluorescence noise in neurons that are less frequently active, so that for sparse activity noise can dominate the measure, but this issue is never discussed. While it does indeed seem to correlate better than some other measures with behavior, the manuscript does not adequately explain how this measure was calculated, and in any case this measure would not tell us what is going on in the brain.

5) The alternative measure "instantaneous Pearson correlations" suffers from the same problems as "heterogeneity." It is improperly named as it is not a Pearson correlation. Time varying correlation measures already exist and should be mentioned; they are generally based on sliding windows (e.g. "Time-varying correlation coefficients estimation and its application to dynamic connectivity analysis of fMRI" Fu et al. 2013 or "The sliding window correlation procedure for detecting hidden correlations: existence of behavioral subgroups illustrated with aged rats" Schulz and Huston 2002).

6) The nature of the decoder used (subsection “Heterogeneity predicts reaction time”) is never explained in the main text or Methods. The extremely convoluted use of a similarity metric and p-value based on comparison to randomly shuffled data (Figure 4E) to claim that the decoder and the animal behave similarly is not a clear and honest presentation of results. The similarity metric was not explained in the Methods. There is nothing to support the statement that "the performance as a function of contrast was strikingly similar to the animals' actual behavioral performance."

7) The assertion, in the Introduction, that "a widely held assumption in computational models of vision is that neurons in distributed cortical architectures have relatively fixed roles in information coding" is a straw-man argument. The authors do not adequately characterize what this assumption of "fixed roles" means, and also fail to characterize the diverse set of existing theories and conjectures about how the visual system may function.

8) We need to see much more raw data so as to evaluate data quality. In particular, we should see supplementary movies showing simultaneous raw, unprocessed imaging data, behavior, and "heterogeneity" for ~10 consecutive trials.

9) The very large responses of some neurons with nearly 100% DF/F in Figure 1d don't seem to match the very modest DF/F of 4% over "preferred populations" in Figure 2d. Are the data in Figure 1 not representative of the full dataset? Or is the time window for averaging each trial's responses perhaps too long? The presentation, figure and analyses are unclear.

10) The first stated aim is to ask: “does visual detection correlate with mean visual response strength or other metrics?". This may be of interest if one could determine for certain that the response strength was being determined for the neurons really involved in the detection/perception required by the task. But why should we care what L2/3 is doing during this task, when it may not even be involved in generating the behavioral response?

11) The authors assert in the Introduction that "specific ensemble activation patterns reoccur across temporally spaced trials in association with hit responses, but not when the animal fails to report a stimulus." I do not understand how, on the basis of the data presented in Figure 6 and the manuscript text associated with it, that this conclusion can be drawn. The authors state: "We again split the data into miss, fast and slow response trials, and computed the correlations between response patterns from different trials separately for preferred and non-preferred neuronal populations…" What response patterns are being correlated? The Methods states that the "mean inter-trial correlations over animals" was compared. I find the link between this measure and the conventional definition of ensemble tenuous at best. Further, the calculated correlation coefficients are very low (<0.12), which does not support well the claim made above.

12) The authors describe their method for assessing the extent of slow drift in the z-plane, which they quantize into 10 μm bins. It is unclear what additional effects this may have on the measured Ca2+-transients, something that would be best determined empirically using simultaneous electrophysiology. More importantly, fast shifts in the z-plane are a considerably larger problem, and these would be anticipated as the animal changes its posture or shifts a fore- or hindlimb. This sort of "fidgeting" is commonly observed in advance of a rodent making a behavioural response. How the authors measured these postural adjustments is not clear, and neither is the effect that these movements have on the recorded activity. It is certainly conceivable that a z-shift could move the focal plane further inside some neurons and further outside others, thereby increasing "heterogeneity."

13) Previous multiphoton Ca2+-imaging studies have shown that correcting xy-shifts uniformly across the whole image is not sufficient for motion correction in awake animals (see Dombeck et al. 2007, Greenberg and Kerr 2009). As described above, motion-associated artefacts resulting from the fidgeting of the animal around a response are not quantified and are potentially important.

14) The caveat that the only neurons from which recordings were made were superficial neurons ought to have been explicitly discussed. Is it not conceivable that the correlation of mean activity with perception might be significantly higher for neurons in deeper layers?

15) How did the authors control for possible ocular torsion (twisting of the eye and retina round the optic axis) during the experiment? This would totally invalidate all analyses based on orientation if present but not accounted for.

Reviewer #2:

1) The concern is about the animal's behavior. The performances shown in Figure 1C,E are relatively low at 100% contrast, and in many cases only slightly different from those at 32%. The presence of errors at full contrast implies that mechanisms other than visual detection contribute to the animal's response variability, which will potentially contaminate all other conditions as well.

2) Regarding the correlation between heterogeneity and behavior, the authors claim that "…the increased spread of neuronal response strengths within a population determine the behavioral accuracy". This reviewer is concerned about whether the change in heterogeneity between hits and misses is strong enough to support this claim. In his opinion, the authors should explicitly quantify how well the animal's decision can be predicted from this population measure, on a trial-by-trial basis.

3) He finds it very interesting that the measure of heterogeneity – but not the mean population response – correlates with detection. However, as far as he understands, this would be the case in any situation in which the detection of the stimulus is represented by a population code that is not merely an increase in the activity of all neurons. The mean population response is only one particular projection of the population activity (let's say, described by the vector [1 1 … 1]). If detecting the stimulus activates the neural population in any other direction in neural space, this measure of heterogeneity will increase (because some neurons increase activity while others decrease). In particular, if detection of the stimulus modulates the population activity in a direction orthogonal to [1 1 … 1], the mean population response will not be affected (and won't correlate with the animal's behavior). His concern is that, if this is the case, it is not heterogeneity per se that is relevant, but the presence of complex population patterns of activity that are not visible at the level of the mean response. He thinks the authors should check whether there is a population signal other than the mean response that correlates with the animal's decision.

4) The authors claim that ensemble patterns reoccur upon presentation of the same stimulus. However, inter-trial correlations of population responses are relatively low (~0.11). They should explain what value they take as a reference to validate this claim and why. Correlations could increase for reasons other than recurrence of the same activity pattern; a more detailed analysis is needed to support this claim.

5) He believes it is necessary to explain why the authors chose this particular measure of spread in neural responses, as opposed to – arguably – more natural ones like the variance. If the variance does not correlate with behavior as much as heterogeneity does, then this might also be informative of the properties of the population code. A set of related statistics are examined in regard to reaction times (Figure 4C) but not in relation to the decision of the animal.

Reviewer #3:

Positive points:

1) Evaluated multiple metrics for stimuli detection.

2) Propose a new metric for population heterogeneity, where dissimilarly activated neurons have high population heterogeneity.

3) Data from a sufficient number of mice, 8, were collected and analyzed and the results hold across animals.

Negative points:

1) Preferred orientation and non-preferred orientation neurons are analyzed separately – this ignores potential interactions between neurons (subsection “Data processing”).

2) The preferred orientation neurons are selected using the mean dF/F0 value, however, the main result of the paper suggests that a different metric, heterogeneity, is more robust in capturing stimuli recognition; how will the analysis be affected if the same metric is used for pruning the neurons? (subsection “Calculation of preferred stimulus orientation”).

3) As defined, heterogeneity seems a reasonable metric, however, it only considers pairwise relationships between neurons; a more holistic, group-level metric should be considered, since the goal of the analysis is to discover groups of neurons.

4) Can you explain or cite the reasoning behind using the procedure in the subsection “Behavioral response predictability on single trial basis”, to compute a prediction? Can the model likelihood be used to make predictions instead?

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Mouse V1 population correlates of visual detection rely on heterogeneity within neuronal response patterns" for further consideration at eLife. Your revised article has been favorably evaluated by David Van Essen (Senior editor) and two reviewers. The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

In brief, both reviewers had positive comments about the revisions but also request minor additional revisions that will not require re-review.

Reviewer #1:

Regarding the authors’ response to Reviewer #1, comment 10: The study by Glickfeld activated PV neurons within a 1 mm diameter around the injection pipettes, up to ~1 mm below the V1 surface, and showed that this increased the threshold for detection of both orientation and contrast by the animals. I do not see the relationship between the authors' response and my question. Why is it reasonable to assume that L2/3 is involved in the task that is presented?

Regarding the authors’ response to Reviewer #1, comment 11: Please change the last sentence in the Abstract to reflect the changes in terminology by removing ensembles (see below). I’m not sure what “selective and dynamic neuronal ensembles” are. Please also rephrase the first paragraph of the Discussion, which suffers from the same issue.

From the Abstract:

"Contrary to models relying on temporally stable networks or bulk-signaling, these results suggest that detection depends on transient activation of selective and dynamic neuronal ensembles."

Reviewer #2:

Single-trial population recordings in behaving animals have the potential to uncover how the dynamics of a network of neurons give rise to perception, decision and behavior. In the context of visual detection, given the activity of a population of neurons, what is the population measure that better relates with the animal detecting or not the stimulus is unknown. This study shows that in L2/3 of primary visual cortex, measures of spread of neural activity are more predictive of the animal's detection than mean-based measures. The authors did a very good job addressing the issues mentioned in the revision. I believe the paper has improved significantly both in the analysis of the data and in the precision with which the claims are expressed.

Response to my prior comments:

1) I had noted that the low performances at full contrast imply mechanisms other than visual detection contributing to the animal's decision (lack of motivation, for example). This means that test contrast trials are probably contaminated with a significant amount of trials (close to 50% for several animals) in which the animal actually detected the stimulus but didn't respond. The authors argue that heterogeneity does not reflect these other mechanisms because it's equal for both behavioral responses at full contrast. I agree with the argument and understand that the low performances might actually be diluting the effect reported in the paper. But I still would like to ask, does the distribution of heterogeneity in "No Resp" trials show any hint of bimodality, reflecting the 50% of trials in which the stimulus was in fact detected?

2) I had requested a quantification of how predictive the single-trial value of heterogeneity is of the animal's behavior. This was added in Figure 3G, H.

3) I had asked whether the reported effect of increased heterogeneity could be an artifact of the presence of complex (but well-defined) patterns of activation orthogonal to the mean activity. The authors developed an elegant new analysis to address this question by mirroring neural responses with respect to the mean and measuring their symmetry. The results show that neural responses are slightly asymmetrical, pointing to the existence of a structured activation related to visual detection, although the effect size is very small. In addition, this analysis leads to the finding that hits are more structured than misses. Finally, removal of the mean, the heterogeneity, or both allows identifying the importance of each property for hit/miss decoding. I consider the point well taken.

4) I had requested more details on the analysis of reoccurring patterns of activity between trials. The authors addressed this question by expanding the analysis of correlations between population patterns and added the corresponding controls.

5) I had asked for a deeper explanation of why they chose this particular mathematical definition for heterogeneity as opposed to others. The authors expanded the analysis of hit/miss difference for other metrics of heterogeneity and found that many lead to the same results. They mention this fact in the revised manuscript, clarifying that the main result is that measures of "spread" of neural responses are more predictive than mean-based ones.

https://doi.org/10.7554/eLife.10163.019

Author response

Essential revisions:

Reviewer #1:

1) The authors state that "for each trial took the responses of only neurons that preferred the presented stimulus orientation" (subsection “Response dissimilarity within neuronal populations correlates with detection”). This practice is extremely dubious. First, it is totally unclear how it affects subsequent analysis. Were Z-scores calculated once over all orientations, or calculated separately for each orientation for those neurons that preferred it? When examining the relationship between behavior and heterogeneity, was a correlation calculated for each orientation or over all orientations?

We have clarified in the revised version of the manuscript that the line the reviewer quotes referred only to Figure 2c,d; the reviewer’s concern about “dubious” practice was therefore based on an apparently unclear description in the original version and not on an actually dubious practice. We have more clearly stated in the revised version that we include all neurons in almost all analyses (subsection “Response dissimilarity within neuronal populations correlates with detection”, first, second and fourth paragraphs). We apologize for the confusion and hope these changes are sufficient to avoid further ambiguity. We also agree with the reviewer that the z-scoring procedure that heterogeneity is based upon could have been described more clearly, especially in the initial in-text description. We have edited this description (in the third paragraph of the aforementioned subsection) as well as in the Material and methods describing the calculation of heterogeneity (subsection “Heterogeneity calculation”).

To explain these clarifications in the manuscript, and to answer the reviewer’s questions more directly: the z-scoring procedure is performed on trial responses, not on single time points (except for Figure 5). Because heterogeneity calculations are trial-based rather than single-frame-based, we changed “time-point” to “trial” (subsections “Response dissimilarity within neuronal populations correlates with detection” and “Predictability of hit-modulation by consistent neuronal responses and whole-population fluctuations”). The z-scoring is performed for each neuron across all trials of all contrasts and orientations. Thus, trials of the neuron’s preferred, as well as its non-preferred, orientation were included in the z-score calculation. As such, each trial yields a heterogeneity value that includes the response of preferred and non-preferred neurons. We pooled the heterogeneity values across all trials, and calculated the mean heterogeneity for response vs. no-response trials (Figure 3c,d), the mean for fast, slow and miss trials (Figure 5), and the correlation of heterogeneity with the animal’s reaction time (Figure 4a–c). Moreover, to allow a more faithful comparison between the heterogeneity and mean dF/F0, we calculated both metrics using the whole population, the preferred population, and only the non-preferred population and show these results side-by-side in the original version as well as the revised version (Figure 3F), with single animal examples as well as the mean over animals (Figure 3—figure supplement 1). These figures show that heterogeneity hit-miss differences are larger than mean dF/F0 hit-miss differences in each of these cases.

2) The presented stimuli consist of square wave gratings with 8 different directions, but a response to any orientation was rewarded. In essence, this becomes a matter of responding to a change in light level. Thus, analyzing responses to oriented stimuli may result in a bias towards those neurons with inherent responses to the stimuli, which obscures or masks the responses from the neurons actually involved in the discrimination of the change in intensity. Nonetheless, the authors, on the basis of Ca2+ transient measurements from ~100 L2 neurons in monocular V1, conclude that "visual perception does not correlate well with mean response strength, but is significantly correlated with population heterogeneity." This statement ought to be drastically revised to reflect that it is contingent on the ad-hoc procedures chosen by the authors, and on how the correlation is calculated using data-selection procedures based on orientation, which was not part of the behavioral task.

As the reviewer mentions, the task our animals had to perform was stimulus detection, and not orientation discrimination. However, because our goal was to investigate neural correlates of visual stimulus detection rather than orientation discrimination (as we state throughout the manuscript), we do not believe this presents a confound for our conclusions. In the original (as well as the revised) version we consistently used the term “detection” rather than “discrimination” throughout the manuscript. In the Introduction and Discussion we tried to place our results in a broader perspective and therefore referred to “perception” as a more general phenomenon. However, we recognize that our results are not based on a generalized, multi-faceted perception task, but on a visual stimulus detection task, and have therefore edited the manuscript to state instead that “visual stimulus detection does not correlate well with mean response strength, but is significantly correlated with population heterogeneity” (Introduction). Moreover, as we also mention below in response to a different comment by the reviewer, in the revised version we now explicitly discuss that our observations relate only to L2/3, and that L5 might show a different relationship with the detection of visual stimuli (subsection “Possible cortical layer specificity of poor correlation of mean population responses”).

3) Since the animal needs in principle only to respond to an increase in ambient light intensity brought about by the stimulus and since no behavioral dependence on orientation has been reported, all of the analysis concerning orientation selectivity (preferred populations etc.) is potentially irrelevant, and the logic behind this experimental design is not clear. If one were designing an experiment to test for a correlation between mean response strength and visual perception, surely it would be wise to do one's best to ensure that the neurons from which responses were recorded had response properties that were at least to some degree related to the discrimination target? While it would be equally unwise to assume that orientation selective neurons in V1 do not play a role in visual discriminations not involving oriented stimuli at their preferred orientation, the failure on the authors' side to discuss in any way the caveats associated with their experimental design and simultaneously to draw the conclusions that they do and state them as strongly as they do is remarkable.

We agree with the reviewer that the features of the drifting gratings were irrelevant for the performance of the task by our animals; this was also clear from the task description in the original version. However, we hypothesized that despite this irrelevance of stimulus feature specifics, correct stimulus detection would rely on general population phenomena that also influence coding fidelity of irrelevant stimulus features. To address the issue the reviewer raises here we performed the analyses described in the original version; we investigated whether there was a correlation between stimulus feature (contrast and orientation) representation fidelity by the recorded neurons and behavioral response (Figure 4—figure supplement 2a). Although stimulus feature representation was irrelevant for the task the mice had to perform, we did find a behavioral correlate. We interpret these results as suggesting that the neural mechanisms governing performance in our behavioral task are also reflected in changes in population representation of irrelevant stimulus features. Although we consistently used “stimulus detection” rather than “orientation discrimination” throughout the manuscript, this line of reasoning was indeed insufficiently explained. We changed the text in the relevant section to now state that the stimulus features used for decoding (i.e. orientation) were in fact irrelevant for the mouse to perform the task (subsection “Heterogeneity predicts reaction time”, third paragraph). However, note that during the inter-trial interval the screen’s background was a grey that was isoluminant with the drifting gratings. Although not gamma-corrected, the grey background’s luminance was therefore in between the perceived brightness of the black and white bars of the drifting grating stimuli.

Performing similar analyses as we did, but on a different behavioral task where mice have to discriminate orientations, would probably also yield interesting results. However, such an experiment addresses a different question than we focus on in our current manuscript. We believe that the observation that visual detection is correlated with increased fidelity of the neural representation of stimulus features, despite these features being irrelevant to the task, is interesting by itself. In particular, it suggests that sensory detection is coupled to better feature representation than non-detection, even if that feature is task-irrelevant. This observation could not have been made if we had performed a similar experiment with an orientation discrimination task. We have added a similar comment as described here to the manuscript’s Discussion section (subsection “Comparison with other studies and neural interpretation of heterogeneity”, first paragraph).

4) The heterogeneity measure, the sum of pairwise absolute z-score differences, does not correspond to any normal usage of the word heterogeneity and is never adequately justified. For example, if all neurons respond to a given stimulus with the same fluorescence increase, the heterogeneity of that stimulus will not be zero but will depend on their responses to other stimuli. Even for a trial that elicits no fluorescence change in any neuron, the heterogeneity will not be zero. Since the measure is based on z-scores, it will amplify fluorescence noise in neurons that are less frequently active, so that for sparse activity noise can dominate the measure, but this issue is never discussed. While it does indeed seem to correlate better than some other measures with behavior, the manuscript does not adequately explain how this measure was calculated, and in any case this measure would not tell us what is going on in the brain.

The definition of heterogeneity as the sum of pairwise differences in neuronal response might indeed come across as a narrow and specific form of heterogeneity, although its calculation was described in detail in the original version of the manuscript. As we have noted above in response to the reviewer’s first comment, some details may have been described somewhat unclearly, which we have rectified in the revised version (subsections “Response dissimilarity within neuronal populations correlates with detection” and “Predictability of hit-modulation by consistent neuronal responses and whole-population fluctuations”).

To address (among other things) the reviewer’s concerns about the specificity of our computational analysis, its relation to a more common definition of “heterogeneity” and the implication of our results for the functional properties of cortical circuits, we have added extra analyses using an alternative, more population-centered definition of heterogeneity (Figure 7, Results subsection “Analysis of heterogeneity in multidimensional space”). This alternative definition of heterogeneity is based on the location of the population activity in multidimensional neural response space, where heterogeneity is orthogonal to the main axis along which the mean population response changes (see the aforementioned subsection and subsections “Analysis of multidimensional inter-trial distance in neural activity” and “Removal of mean and heterogeneity, and subsequent hit/miss decoding”). We found a high trial-by-trial correspondence between both forms of heterogeneity, supporting the idea that the effects we observed in our data are robust and do not depend strongly on the precise computational definition of heterogeneity.
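The geometric idea behind this alternative, population-centered definition can be illustrated with a small sketch: a trial's population response vector is split into a component along the uniform "mean activity" axis and an orthogonal residual, whose size plays the role of heterogeneity. This is a minimal illustration of the geometry under our own assumptions, not the authors' exact computation:

```python
import numpy as np

def mean_and_orthogonal_spread(pop):
    """Decompose one trial's (z-scored) population response vector into
    its mean-activity component and the norm of the residual that is
    orthogonal to the mean-activity axis. Hypothetical helper, for
    illustration only."""
    n = len(pop)
    u = np.ones(n) / np.sqrt(n)   # unit vector along the mean-activity axis
    along = pop @ u               # signed length along that axis
    resid = pop - along * u      # component orthogonal to the mean axis
    return along / np.sqrt(n), np.linalg.norm(resid)
```

A uniform response has zero orthogonal spread, whereas two neurons deviating in opposite directions leave the mean unchanged but increase the spread, which is exactly the property that distinguishes heterogeneity from the mean population response.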

Regarding the reviewer’s comment on z-scoring of neural activity across trials, and the resulting non-zero heterogeneity arising from zero change in actual fluorescence, we would like to note that this is an effect of z-scoring neuronal activity and not of our pairwise-based definition of heterogeneity. The reviewer’s issue therefore comes down to whether z-scoring of neuronal responses is an acceptable practice. As the ability to separate two classes (which can also be the absence and presence of a stimulus) depends on the relative distance in standard deviations between the two distributions representing the responses to their respective stimulus conditions, z-scoring is a natural and widely used way to study information content in neuronal spike trains. As the reviewer noted, this indeed changes neuronal output properties at face value, because it is a non-linear transformation. For sparsely active neurons it will enhance the relative variability in z-score values for trials where the neuron shows low activity, but it will also similarly enhance the relative neural response when the neuron is in fact active. As highly active neurons (i.e. non-sparsely responding) are likely to also be highly variable (i.e. neurons generally have Fano factors higher than one), neural signals can often be better separated using normalized neuronal activity rather than absolute spiking rates or calcium fluorescence dF/F0 (Baddeley et al. 1997 as cited (subsection “Response dissimilarity within neuronal populations correlates with detection”, third paragraph); Montijn, Vinck, and Pennartz 2014). We agree with the reviewer that neuronal response variability and normalization procedures thereof are important and interesting topics to discuss, but we believe this is beyond the scope of our current manuscript (also considering the already substantial size of the manuscript). 
We trust that the clarifications of the heterogeneity calculation in combination with the additional analyses in the revised manuscript (Figure 7) will sufficiently address the reviewer’s concerns and better explain how to interpret heterogeneity, and its relevance for the brain.
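For concreteness, the trial-based heterogeneity discussed in this exchange can be sketched as follows. This is a minimal sketch under our own assumptions (array layout, and normalizing the pairwise sum to a mean over pairs), not the authors' exact implementation:

```python
import numpy as np

def heterogeneity(dff):
    """Trial-wise heterogeneity as the mean pairwise absolute z-score
    difference across neurons.

    dff: array of shape (n_neurons, n_trials) holding each neuron's mean
    dF/F0 response per trial. Z-scoring is done per neuron across all
    trials (all contrasts and orientations pooled), as described above.
    Assumes each neuron's responses vary across trials (non-zero std).
    """
    z = (dff - dff.mean(axis=1, keepdims=True)) / dff.std(axis=1, keepdims=True)
    n_neurons, n_trials = z.shape
    het = np.empty(n_trials)
    for t in range(n_trials):
        diffs = np.abs(z[:, t][:, None] - z[:, t][None, :])   # all pairwise |zi - zj|
        het[t] = diffs.sum() / (n_neurons * (n_neurons - 1))  # mean over ordered pairs
    return het
```

Note that two neurons with identical z-scored response profiles contribute zero on every trial, while the z-scoring step is also what produces the reviewer's observation that a trial with no fluorescence change in any neuron can still yield a non-zero value.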

5) The alternative measure "instantaneous Pearson correlations" suffers from the same problems as "heterogeneity." It is improperly named as it is not a Pearson correlation. Time varying correlation measures already exist and should be mentioned; they are generally based on sliding windows (e.g. "Time-varying correlation coefficients estimation and its application to dynamic connectivity analysis of fMRI" Fu et al. 2013 or "The sliding window correlation procedure for detecting hidden correlations: existence of behavioral subgroups illustrated with aged rats" Schulz and Huston 2002).

The initial name of “instantaneous Pearson correlation” might have been interpreted as being somewhat inaccurate, because these correlations are indeed not classical Pearson correlations, but are only computationally similar. We have therefore renamed them to “instantaneous Pearson-like correlations” in the revised manuscript. We believe this name is warranted, because they are instantaneous and share all underlying computational steps with Pearson correlations, except the last; they are both pairwise and rely on the relative distance from the mean in standard deviations. As the reviewer suggested, we have also added a sliding-window correlation analysis to our comparison of different metrics (Figure 4, Figure 3—figure supplement 2, subsection “Instantaneous Pearson-like correlations and sliding window Pearson correlations”). These correlations showed a similar performance as other metrics, and were significantly poorer at predicting reaction times than heterogeneity (Figure 4).
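For reference, a sliding-window Pearson correlation of the kind the reviewer suggested can be sketched as follows; the window length and trace format are illustrative assumptions, not the manuscript's exact parameters:

```python
import numpy as np

def sliding_window_corr(x, y, win):
    """Time-varying Pearson correlation between two activity traces,
    computed over a window that slides one sample at a time."""
    n_windows = len(x) - win + 1
    out = np.empty(n_windows)
    for t in range(n_windows):
        out[t] = np.corrcoef(x[t:t + win], y[t:t + win])[0, 1]
    return out
```

Unlike the instantaneous Pearson-like correlations, which yield one value per time point from the z-scored deviations at that point, this windowed estimate trades temporal resolution for a full Pearson correlation within each window.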

6) The nature of the decoder used (subsection “Heterogeneity predicts reaction time”) is never explained in the main text or Methods. The extremely convoluted use of a similarity metric and p-value based on comparison to randomly shuffled data (Figure 4e) to claim that the decoder and the animal behave similarly is not a clear and honest presentation of results. The similarity metric was not explained in the Methods. There is nothing to support the statement that "the performance as a function of contrast was strikingly similar to the animals' actual behavioral performance."

In the original version of the manuscript we explained the basic decoding procedure and referred to one of our previously published papers describing the nature of the decoder in extended detail (Montijn, Vinck, and Pennartz 2014). In the original version we also explained that the similarity metric was defined as the “Pearson correlation over contrasts” and was compared to a distribution of correlations after randomly shuffling over contrasts. We therefore believe our original statement was valid, honest and supported by our analyses, but we recognize the reviewer’s wish for further information on the specifics of the analyses. We have therefore revised the manuscript to explain in more detail our decoding procedure (subsection “Decoding of stimulus presence”), and have added an additional chi-square analysis of the trial-by-trial correspondence between the decoder’s output and the animals’ response (Figure 4—figure supplement 2J; subsection “Heterogeneity predicts reaction time”, third paragraph; and subsection “Decoding of stimulus presence”, first paragraph). We have removed “strikingly” from the revised manuscript, as we agree with the reviewer this is somewhat subjective.
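The shuffle-based similarity test described here can be sketched as follows; the number of shuffles and the one-sided p-value convention are our own assumptions for illustration:

```python
import numpy as np

def similarity_p_value(decoder_perf, animal_perf, n_shuffles=2000, seed=0):
    """Similarity = Pearson correlation of decoder and behavioral
    performance over contrast levels; the p-value is the fraction of
    shuffles (contrast labels permuted) whose correlation reaches the
    observed one. Hypothetical sketch, not the authors' exact code."""
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(decoder_perf, animal_perf)[0, 1]
    null = np.array([
        np.corrcoef(rng.permutation(decoder_perf), animal_perf)[0, 1]
        for _ in range(n_shuffles)
    ])
    return observed, (null >= observed).mean()
```

A low p-value indicates that the decoder's performance curve tracks the animals' psychometric curve more closely than expected if the contrast ordering carried no shared information.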

7) The assertion, in the Introduction, that "a widely held assumption in computational models of vision is that neurons in distributed cortical architectures have relatively fixed roles in information coding" is a straw-man argument. The authors do not adequately characterize what this assumption of "fixed roles" means, and also fail to characterize the diverse set of existing theories and conjectures about how the visual system may function.

We agree with the reviewer that the statement in the original version of our manuscript was insufficiently clear. We have edited the text to more accurately reflect our line of reasoning – that while the stimulus features represented by neurons are generally stable across time, neural modulations that regulate or affect whether physically identical stimuli are perceived may be more temporally dynamic (Introduction, last paragraph). As we also mention below, we have revised the manuscript to now refer to “population pattern consistencies” rather than “ensemble reoccurrences”.

8) We need to see much more raw data so as to evaluate data quality. In particular, we should see supplementary movies showing simultaneous raw, unprocessed imaging data, behavior, and "heterogeneity" for ~10 consecutive trials.

We have included a movie (Video 1) showing fluorescence data, the presented stimulus, licking responses, heterogeneity and mean population dF/F0 with the revised manuscript. Note that x-y shifts and z-drifts are small, both preceding and during stimulus presentation periods. During reward periods (after stimulus offsets) these shifts are more visible, but because we only take neural responses up to the first licking response, these spatial instabilities do not influence our neural data (see also below for a more comprehensive discussion of potential confounds related to z-shifts). Several neurons can be seen by eye to respond with increased fluorescence to visual stimulation.

9) The very large responses of some neurons with nearly 100% DF/F in Figure 1d don't seem to match the very modest DF/F of 4% over "preferred populations" in Figure 2d. Are the data in Figure 1 not representative of the full dataset? Or is the time window for averaging each trial's responses perhaps too long? The presentation, figure and analyses are unclear.

As can be seen in Figure 2a, in many trials the animal responds to the stimulus before the neuronal calcium response reaches maximum intensity. Moreover, because it takes time for the initial neuronal response to emerge in V1, and OGB’s rise time is a few dozen milliseconds, the first few imaging frames yield baseline dF/F0 values (Chen et al. 2013). Averaging over only these early frames up to the animal’s response, where dF/F0 has not yet reached maximal intensity, therefore results in mean trial dF/F0 values that are much lower than the peak responses visible in the traces. As can also be seen in Figure 1d and 2a, during most trials some neurons respond vigorously, while others show very little change in dF/F0 (even within preferred populations), thereby further lowering the mean dF/F0 averaged across neurons.

10) The first stated aim is to ask: “does visual detection correlate with mean visual response strength or other metrics?". This may be of interest if one could determine for certain that the response strength was being determined for the neurons really involved in the detection/perception required by the task. But why should we care what L2/3 is doing during this task, when it may not even be involved in generating the behavioral response?

Although we have not performed experiments to test the causal involvement of mouse V1 in our specific task and setup, it has been reported before that (as we also mention in the first paragraph of the Introduction) mouse primary visual cortex is used to detect both orientation and contrast changes, as shown by optogenetic silencing of mostly superficial layers in V1 during behavioral task performance (Glickfeld et al., 2013). For our experiment stimulus orientation was irrelevant for obtaining reward, but the contrast-dependent stimulus detection in our design is similar to the one employed by Glickfeld and colleagues in their optogenetic study. It may therefore be reasonably assumed that the neuronal populations we recorded in L2/3 of mouse V1 are causally involved in generating the behavioral response to visual stimuli with different contrasts. However, we agree with the reviewer that we were insufficiently clear in stating that our results only pertain to L2/3 neurons in mouse V1. We have changed the text in the manuscript so that it more accurately reflects the scope of our results (subsection “Possible cortical layer specificity of poor correlation of mean population responses” and throughout the manuscript).

11) The authors assert in the Introduction that "specific ensemble activation patterns reoccur across temporally spaced trials in association with hit responses, but not when the animal fails to report a stimulus." I do not understand how, on the basis of the data presented in Figure 6 and the manuscript text associated with it, this conclusion can be drawn. The authors state: "We again split the data into miss, fast and slow response trials, and computed the correlations between response patterns from different trials separately for preferred and non-preferred neuronal populations…" What response patterns are being correlated? The Methods states that the "mean inter-trial correlations over animals" was compared. I find the link between this measure and the conventional definition of ensemble tenuous at best. Further, the calculated correlation coefficients are very low (<0.12), which does not support well the claim made above.

In the original version when using “ensemble”, we meant to refer to the population of neurons we measured, conforming to the common use of the term "ensemble recordings" in the literature, but we agree that within certain disciplines “ensemble” would imply a stronger correlation of activation, similar to what “assembly” would entail. In the revised version of the manuscript we therefore now state that “neuronal populations show consistencies in activation patterns” (Introduction). Moreover, we now note that because we performed this analysis separately for preferred and non-preferred neurons, the low correlation values can be explained by a removal of much of the orientation-based signal in population responses (subsection “Population activation pattern consistency”). To further validate our results, we also performed a shuffling procedure as an extra control. This control showed significantly lower correlation values than the unshuffled data for slow and fast, but not for miss trials (Figure 6).

12) The authors describe their method for assessing the extent of slow drift in the z-plane, which they quantize into 10μm bins. It is unclear what additional effects this may have on the measured Ca2+ transients, something that would be best determined empirically using simultaneous electrophysiology. More importantly, fast shifts in the z-plane are a considerably larger problem, and these would be anticipated as the animal changes its posture or shifts a fore- or hindlimb. This sort of "fidgeting" is commonly observed in advance of a rodent making a behavioural response. How the authors measured these postural adjustments is not clear, neither is the effect that these movements have on the activity recorded. It is certainly conceivable that a z-shift could move the focal plane further inside some neurons and further outside others, thereby increasing "heterogeneity."

Although having addressed slow drifts in the original version of the manuscript, we had indeed not performed any quantification of fast shifts in z-plane, which – as the reviewer mentions – could present a major confound to our results. To address this issue in the revised version, we computed the depth in z-plane for each imaging frame. As we mention in the revised version of our manuscript, z-depth was quite stable (shifts rarely exceeded 1-2 microns) and pre-response z-shifts were not observable (subsections “Potential confounds” and “Z-drift quantification and recording stability analysis”). Some increase in z-shifts was visible following a behavioral hit response (Figure 1—figure supplement L-O), but these epochs were not included in any of our analyses, and can therefore not influence our results.

13) Previous multiphoton Ca2+ imaging studies have shown that correcting xy-shifts uniformly across the whole image is not sufficient for motion correction in awake animals (see Dombeck et al. 2007, Greenberg and Kerr 2009). As described above, motion-associated artefacts resulting from the fidgeting of the animal around a response are not quantified and potentially important.

We agree that motion artifacts can be a profound problem in calcium imaging experiments in awake animals. The first paper the reviewer mentions (Dombeck et al., 2007) shows that image warping within single frames occurs only during running, which limits the potential confound to running episodes. Nevertheless, within-frame image warping could indeed prove to be quite problematic when recording fluorescence data during locomotion. However, we believe that the noise levels induced by within-frame warping of the acquisition image as reported by Dombeck et al. (2007) and Greenberg and Kerr (2009) are most likely much higher than for our data sets for the following reasons. Dombeck et al. (2007) quantify the level of noise artifacts in an experimental setup where the mouse is running on a Styrofoam ball, rather than sitting on a treadmill. Mice placed on Styrofoam balls tend to run spontaneously because they are placed on a non-level surface, while all of our animals were trained for several months on a level-surface treadmill. Locomotion by our animals generally became less pronounced over several months of training, and had in all animals become quite rare during calcium imaging recordings.

More importantly, both studies mentioned by the reviewer used frame acquisition rates much lower than in our study (4 or 8 Hz by Dombeck et al., 2007 and 10.4 Hz by Greenberg and Kerr, 2009 vs. 25.4 Hz in our study). Within-frame image warping depends directly on the frame acquisition speed, because for low acquisition speeds there is more time for spatial movements in the brain to distort the image. As acquisition speed increases, brain tissue movement leads to displacement between frames rather than warping within frames. Such between-frame image displacement can still greatly influence data quality if left uncorrected, but these movements can be corrected using affine x-y registration algorithms, as we have done in our study (see our response to the previous comment for issues related to z-shifts and potential fidgeting behavior). We would also like to note that Greenberg and Kerr performed experiments on rats, where the dura was removed, which probably leads to significantly larger motion artifacts than in a mouse brain with intact dura. Still, these arguments provide only circumstantial evidence that our data would show fewer and weaker motion artifacts. Therefore, to test for contamination by running-induced artifacts, we performed control analyses where we removed all trials during which the animals were running and found that this did not influence our results (Figure 2—figure supplement 1g–h). Therefore the within-frame image warping artifacts the reviewer mentions are unlikely to influence our data to a significant degree (see also Video 1).

14) The caveat that the only neurons from which recordings were made were superficial neurons ought to have been explicitly discussed. Is it not conceivable that the correlation of mean activity with perception might be significantly higher for neurons in deeper layers?

We agree with the reviewer that this is an important issue. We have corrected this flaw in the revised version and now explicitly discuss that our observations relate only to L2/3, and that L5 might show a very different correlation with the detection of visual stimuli (subsection “Possible cortical layer specificity of poor correlation of mean population responses”).

15) How did the authors control for possible ocular torsion (twisting of the eye and retina round the optic axis) during the experiment? This would totally invalidate all analyses based on orientation if present but not accounted for.

Ocular torsion in mice has been reported to be less than in primates and humans, and rarely exceeds more than a couple of degrees (Mezey et al. 2004 (Vision Research 44); Goonetilleke et al. 2008 (Vision Research 48); Migliaccio, Meierhofer, and Santina 2010 (Experimental Brain Research 210)). We presented drifting gratings of 8 different directions to the mice, and thus they were spaced around the unit circle in steps of 45 degrees. The inter-stimulus difference of 45 degrees is therefore an order of magnitude higher than reported ocular torsions in mice. As such, it would have little impact on our analyses based on orientation. In any case, this potential confound would not influence our main result (the hit/miss and RT prediction of heterogeneity), because this does not depend on analyzing different orientations separately. Potential ocular torsion effects would influence our analyses based on explicit differential treatment of stimulus orientations (for example, the difference between preferred and non-preferred populations in Figure 3e,f, and Figure 6), where its effect would not be to totally invalidate them, but to inject some extra noise into the results. We would also like to note that it is very uncommon to record ocular torsion in awake rodents, and the general consensus is that no corrections are necessary (e.g. Dombeck et al. 2007; Greenberg and Kerr 2009, as cited by the reviewer, also do not report any data on or potential confounds related to ocular torsion).

Reviewer #2:

1) The concern is about the animal's behavior. The performances shown in Figure 1c,e are relatively low at 100% contrast; in many cases slightly different from the ones at 32%. The presence of errors at full contrast implies that mechanisms other than visual detection contribute to the animal's response variability and will potentially contaminate all other conditions as well.

We understand the reviewer’s concerns about the animal’s performance. However, we believe that the behavioral mechanisms contributing to the animal’s decision to respond to 100% contrast visual stimuli are not reflected to a significant degree in the physiological hit/miss differences we find for test contrasts. As we mention in the original version of the manuscript, if V1 population dF/F0 or heterogeneity were influenced by these factors, we would expect to also find a differential effect between hits and misses for 100% stimulus contrast trials. Our data show that this is not the case (Figure 3 and Figure 2—figure supplement 1a); hit/miss differences in dF/F0 and heterogeneity are largest for intermediate contrasts. The animals’ suboptimal performance most likely dilutes the actual hit/miss difference, and better behavioral performance would therefore probably have led to larger, rather than smaller, effect sizes. We now also describe these caveats in the revised version of our manuscript (subsection “Potential confounds”).

2) Regarding the correlation between heterogeneity and behavior, the authors claim that "…the increased spread of neuronal response strengths within a population determine the behavioral accuracy". This reviewer is concerned about whether the change in heterogeneity between hits and misses is strong enough to support this claim. In his opinion the authors should explicitly quantify, on a trial-by-trial basis, how predictive this population measure is of the animal's decision.

We agree with the reviewer that providing a single-trial based analysis of the linear separation of hits and misses would improve the quality of our manuscript. We have performed a single-trial based analysis of the predictability of the behavioral response from either mean dF/F0 or heterogeneity using a receiver operating characteristic (ROC) analysis (Figure 3f–h; subsections “Response dissimilarity within neuronal populations correlates with detection”, last paragraph, and “Receiver Operating Characteristic (ROC) analysis of hit/miss separability”). As we mention in the manuscript, single-trial linear predictions were significantly above chance for both dF/F0 and heterogeneity, and heterogeneity performed significantly better than dF/F0.
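For readers unfamiliar with the procedure, the single-trial ROC analysis described above can be sketched as follows. This is a minimal illustration with made-up values, not the code used in the study; it exploits the equivalence between the area under the ROC curve and the normalized Mann-Whitney U statistic.

```python
def roc_auc(hit_values, miss_values):
    """Area under the ROC curve for separating hits from misses based
    on a scalar population measure (e.g. mean dF/F0 or heterogeneity).
    Computed as the Mann-Whitney U statistic normalized by the number
    of hit/miss trial pairs; ties count as half a win."""
    n_pairs = len(hit_values) * len(miss_values)
    wins = sum((h > m) + 0.5 * (h == m)
               for h in hit_values for m in miss_values)
    return wins / n_pairs

# Illustrative heterogeneity values on hit vs. miss trials (made up).
hits = [1.2, 1.5, 1.1, 1.8, 1.4]
misses = [0.9, 1.0, 1.3, 0.8]
print(roc_auc(hits, misses))
```

An AUC of 0.5 corresponds to chance-level separability; values approaching 1 indicate that the measure separates hits from misses well on single trials.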

3) He finds it very interesting that the measure of heterogeneity – but not the mean population response – correlates with detection. However, as far as he understands, this would be the case in any situation in which the detection of the stimulus is represented by a population code that is not merely an increase in activity of all neurons. The mean population response is only one particular projection of the population activity (let's say, described by the vector [1 1 … 1]). If detecting the stimulus activates the neural population in any other direction in neural space, this measure of heterogeneity will increase (because some neurons increase activity while others decrease). In particular, if detecting the stimulus or not modulates the population activity in a direction orthogonal to [1 1 … 1], the mean population response will not be affected (and won't correlate with the animal's behavior). His concern is that, if this is the case, it is not heterogeneity per se that is relevant, but the presence of complex population patterns of activity that are not visible at the level of the mean response. He thinks the authors should check if there is a population signal other than the mean response that correlates with the animal's decision.

We agree with the reviewer that this is a very interesting question and thank him/her for this excellent suggestion. It has inspired us to perform extra analyses that we describe in the revised version of our manuscript (Figure 7, subsections “Analysis of heterogeneity in multidimensional space” and “Analysis of multidimensional inter-trial distance in neural activity”). In summary, in order to study multidimensional aspects of population responses and their relation to heterogeneity, we had to develop an alternative definition of multidimensional heterogeneity. We argue that within a multidimensional neural response space (where each neuron represents a single dimension), heterogeneity within a population response corresponds to the distance from this response to the main diagonal, which represents the gradient along which the mean of the response changes. Here, the mean and heterogeneity are two complementary features of population responses with gradients in orthogonal directions (Figure 7). We studied whether population responses were distributed symmetrically around the main diagonal (i.e. no complex multidimensional patterns are present, other than those captured by heterogeneity) or rather asymmetrically, which would show that heterogeneity is most likely an epiphenomenon. Population responses showed a modest but significant asymmetry, for hit as well as miss trials, but interestingly hits were more asymmetrically distributed than misses (Figure 7f). We provide further analyses in which we removed either the mean or the heterogeneity of population responses (but preserved all other response structure) and quantified the effect of their removal on decoding the animal’s behavioral response. Removal of the mean had little effect, but removal of heterogeneity significantly impaired decoding performance (Figure 7g,h).
We can thus confirm that heterogeneity contributes more to the differentiation between detected and non-detected stimuli than the mean of the population response, but also that other, more complex population response properties contribute even more to the detection of visual stimuli than heterogeneity does. We describe these results in more detail in the revised manuscript (subsection “Analysis of heterogeneity in multidimensional space”).
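The geometric decomposition described above can be illustrated with a small sketch. Treating each trial's population response as a vector with one entry per neuron, the mean corresponds to the component along the main diagonal [1, 1, …, 1], and a heterogeneity-like quantity to the distance from that diagonal (function and data are illustrative, not the authors' code):

```python
import math

def diagonal_decomposition(response):
    """Decompose a population response vector into (1) its component
    along the main diagonal [1, 1, ..., 1] (the population-mean axis)
    and (2) its distance from that diagonal, an orthogonal measure of
    how unevenly activity is spread across neurons."""
    n = len(response)
    mean = sum(response) / n
    # Projection onto the unit diagonal [1, ..., 1] / sqrt(n),
    # which equals mean * sqrt(n).
    along = mean * math.sqrt(n)
    # Norm of the residual after removing the mean from every neuron.
    dist = math.sqrt(sum((r - mean) ** 2 for r in response))
    return along, dist

uniform = [1.0, 1.0, 1.0, 1.0]  # lies exactly on the diagonal
uneven = [2.0, 0.0, 2.0, 0.0]   # same mean, but off the diagonal
print(diagonal_decomposition(uniform))
print(diagonal_decomposition(uneven))
```

The two example vectors have identical means (identical diagonal components), yet only the uneven one has a nonzero distance from the diagonal, which is why the two features carry complementary information.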

4) The authors claim that ensemble patterns reoccur upon presentation of the same stimulus. However, inter-trial correlations of population responses are relatively low (~0.11). They should explain what value they take as a reference to validate this claim and why. Correlations could increase for reasons other than recurrence of the same activity pattern; a more detailed analysis is needed to support this claim.

In the revised manuscript we now describe in more detail the exact procedures for calculating the population activity pattern similarities, and note that the low correlations can be explained by the separate analysis of preferred and non-preferred populations, which removes pattern correlations due to stimulus tuning (subsection “Population activation pattern consistency”). Moreover, we performed an additional control analysis in which we shuffled trials randomly across neurons, but kept the stimulus types intact. This way, we kept the stimulus response properties the same for each individual neuron, but destroyed temporal interrelations between the activity of neurons. This control revealed that our analyses indeed have an inherent bias towards small, non-zero correlations. Interestingly, shuffled correlations were almost identical to correlations during miss trials, but were significantly lower than non-shuffled correlations during fast and slow responses (Figure 6c,d).
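The shuffling control can be sketched as follows: trials are permuted independently for each neuron, but only within a stimulus type, so that each neuron's stimulus response distribution is preserved while trial-by-trial co-fluctuations between neurons are destroyed. This is a hypothetical minimal implementation; the data layout and names are assumptions, not taken from the study.

```python
import random

def shuffle_across_neurons(responses, stim_labels, seed=0):
    """Shuffle trials independently for each neuron, restricted to
    trials of the same stimulus type. Each neuron keeps its own set of
    responses per stimulus; inter-neuron trial correlations are broken.

    responses: list of trials, each a list of per-neuron dF/F0 values.
    stim_labels: stimulus type per trial."""
    rng = random.Random(seed)
    n_trials, n_neurons = len(responses), len(responses[0])
    shuffled = [list(trial) for trial in responses]
    for stim in set(stim_labels):
        idx = [t for t in range(n_trials) if stim_labels[t] == stim]
        for neuron in range(n_neurons):
            perm = idx[:]
            rng.shuffle(perm)  # independent permutation per neuron
            for t, src in zip(idx, perm):
                shuffled[t][neuron] = responses[src][neuron]
    return shuffled

trials = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
labels = ['grating_0', 'grating_0', 'grating_90', 'grating_90']
print(shuffle_across_neurons(trials, labels))
```

Computing pattern correlations on such shuffled data gives the baseline against which the observed inter-trial correlations can be compared.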

5) He believes it is necessary to explain why the authors chose this particular measure of spread in neural responses, as opposed to – arguably – more natural ones like the variance. If the variance does not correlate with behavior as much as heterogeneity does, then this might also be informative of the properties of the population code. A set of related statistics are examined in regard to reaction times (Figure 4c) but not in relation to the decision of the animal.

As we mentioned in response to this reviewer’s third comment, we have significantly extended our manuscript with an in-depth analysis of the nature of heterogeneity in neuronal population responses (Figure 7, subsection “Analysis of heterogeneity in multidimensional space”). As the reviewer requested, we have also performed similar analyses of effect size for hits and misses on other measures (Figure 3—figure supplement 2). These analyses showed that heterogeneity performed significantly better than mean-based measures, but was not statistically different from other metrics (such as the variance or correlation-based measures) in distinguishing hits and misses. We have edited the revised manuscript accordingly to more accurately reflect that heterogeneity is specifically better than other measures in explaining reaction time variability, but only better than mean-based metrics in hit/miss differentiation.
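To make the comparison of spread measures concrete, here is a sketch of a pairwise heterogeneity-style metric alongside the variance, for a single trial of z-scored responses. The exact normalization used in Eqs. 2-4 of the manuscript may differ; this is an illustrative reimplementation with made-up values, not the authors' code.

```python
from itertools import combinations
from statistics import pvariance

def heterogeneity(z_responses):
    """Mean absolute pairwise difference between (z-scored) neuronal
    responses on a single trial -- a spread measure in the spirit of
    Eqs. 2-4, sensitive to how differentiated the responses are."""
    pairs = list(combinations(z_responses, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

# One trial of z-scored responses for four neurons (illustrative).
trial = [0.5, -1.0, 2.0, 0.0]
print(heterogeneity(trial), pvariance(trial))
```

Both quantities grow as responses spread apart, but they weight deviations differently (absolute pairwise differences vs. squared deviations from the mean), so they need not separate hits and misses equally well.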

Reviewer #3:

Negative points:

1) Preferred orientation and non-preferred orientation neurons are analyzed separately – this ignores potential interactions between neurons (subsection “Data processing”).

As we also mentioned in response to the first comment by reviewer 1, we analyzed preferred and non-preferred populations only separately for Figure 2c,d and Figure 6. We apologize for the apparently ambiguous explanation in the original version of the manuscript and have attempted to more clearly state in the revised version that we include all neurons in almost all analyses (subsection “Response dissimilarity within neuronal populations correlates with detection”, first, second and fourth paragraphs).

2) The preferred orientation neurons are selected using the mean dF/F0 value, however, the main result of the paper suggests that a different metric, heterogeneity, is more robust in capturing stimuli recognition; how will the analysis be affected if the same metric is used for pruning the neurons? (subsection “Calculation of preferred stimulus orientation”).

As heterogeneity is a population-based metric, it is ill-suited to be applied to single neurons. Only a jackknifing procedure would allow an estimation of the effect single neurons have on population heterogeneity on average. However, even in this case a neuron’s contribution to population heterogeneity would depend on the activation of other neurons (see Eqs. 2-4), and therefore does not represent a unique feature of single neurons. That said, we recognize the reviewer’s concern about potentially different effects occurring in preferred and non-preferred populations. We have therefore performed an analysis of the hit/miss effect size in dF/F0 and heterogeneity for the preferred, non-preferred and whole populations (Figure 3f and Figure 3—figure supplement 1). Also note that Figure 4—figure supplement 2h shows that heterogeneity differences exist between hits and misses in all quintiles of neurons when sorted by relative activity, further supporting our results that behavioral correlates of heterogeneity are distributed among the entire population, including weakly active and non-preferred neurons.

3) As defined, heterogeneity seems a reasonable metric, however, it only considers pairwise relationships between neurons; a more holistic, group-level metric should be considered, since the goal of the analysis is to discover groups of neurons.

We agree with the reviewer that pairwise relationships between neurons are a narrow spectrum of interactions to consider. We have therefore added extensive new analyses to the revised version of the manuscript, including an alternative, 'holistic' definition of heterogeneity (Figure 7 and subsections “Analysis of heterogeneity in multidimensional space”, “Analysis of multidimensional inter-trial distance in neural activity”, and “Removal of mean and heterogeneity, and subsequent hit/miss decoding”). As we note in the revised manuscript, our analyses suggest that the differentiation in heterogeneity between hits and misses is at least in part non-epiphenomenal, but that detection vs. non-detection differences also reside in neuronal population response patterns other than its mean or heterogeneity (subsections “Analysis of heterogeneity in multidimensional space” and “Multidimensional analysis: heterogeneity contributes to, but does not fully account for visual detection”).

4) Can you explain or cite the reasoning behind using the procedure in the subsection “Behavioral response predictability on single trial basis”, to compute a prediction? Can the model likelihood be used to make predictions instead?

The description of the procedure the reviewer refers to erroneously stated in the original version that the calculation was based on the relative decoding probabilities of response types (miss, fast, or slow), while in fact it was already the relative likelihood of response types that determined the output. We have corrected this error in the revised version of the manuscript (subsection “Behavioral response predictability on single trial basis”). We would also like to note that in the revised version we have performed an additional single-trial based hit/miss analysis using receiver operating characteristic (ROC) curves (Figure 3g,h). We hope this sufficiently addresses the reviewer’s concerns.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Reviewer #1:

Regarding the authors’ response to Reviewer #1, comment 10: The study by Glickfeld activated PV neurons in a 1 mm diameter around the injection pipettes, up to ~1 mm below the V1 surface, and showed that this increased the threshold for detection of both orientation and contrast by the animals. I do not see the relationship between the author's response to my question and the question. Why is it reasonably assumed that L2/3 is involved in the task that is presented?

We agree with the reviewer, as we also stated in our initial reply, that we cannot claim to have shown causal involvement of V1 L2/3 neurons in detecting stimuli. In our manuscript we therefore only discuss our results in terms of correlation, rather than causation. However, the results presented by Glickfeld and colleagues also show that their effects of optogenetic stimulation were stronger for superficial than deep layers (reduction of firing rate: -66.7% for units in superficial layers vs. -33.9% for units in deep layers), as the channelrhodopsin-exciting blue light was shone onto the cortical surface and was therefore less intense when it reached deeper layers. Arguably, the reduction in behavioral performance could therefore be due mostly to the stronger neural effect in superficial layers. This is of course hypothetical, as Glickfeld et al. did not explicitly quantify the dependence of their behavioral effect on the reduction in firing rate in superficial vs. deep layers. However, we believe that correlational rather than causal research is still valuable in itself, as it may provide new insights and ideas on how the visual cortex could be involved in the detection of visual stimuli. Moreover, it has been reported that superficial layers (L2/3) in visual cortex correlate with the perception of visual stimuli (e.g. Ito and Gilbert 1999; van der Togt et al., 2006).
In summary, although there is no proof yet for the causal involvement of L2/3 in visual detection, we believe these considerations offer sufficient justification for examining L2/3 population behavior. We have added references to these previous studies that show correlates in superficial layers to the revised manuscript (Introduction, last paragraph).

Regarding the authors’ response to Reviewer #1, comment 11: Please change the last sentence in the Abstract to reflect the changes in terminology by removing ensembles (see below). I’m not sure what “selective and dynamic neuronal ensembles” are. Please also rephrase the first paragraph of the Discussion, which suffers from the same issue.

From the Abstract:"Contrary to models relying on temporally stable networks or bulk-signaling, these results suggest that detection depends on transient activation of selective and dynamic neuronal ensembles."

We agree with the reviewer and have changed the corresponding sentence to “detection depends on transient differentiation in neuronal activity within cortical populations”, which more accurately reflects the main findings of our study (Abstract and Discussion, first paragraph). We have also removed our reference to ensemble formation from the Discussion.

Reviewer #2:

1) I had noted that the low performances at full contrast imply mechanisms other than visual detection contributing to the animal's decision (lack of motivation, for example). This means that test contrast trials are probably contaminated with a significant amount of trials (close to 50% for several animals) in which the animal actually detected the stimulus but didn't respond. The authors argue that heterogeneity does not reflect these other mechanisms because it's equal for both behavioral responses at full contrast. I agree with the argument and understand that the low performances might actually be diluting the effect reported in the paper. But I still would like to ask, does the distribution of heterogeneity in "No Resp" trials show any hint of bimodality, reflecting the 50% of trials in which the stimulus was in fact detected?

A coarse analysis of the distribution of heterogeneity values across miss trials during test contrasts shows that there might be a slight bimodality (or at least a skewed distribution; see below). Supposing the distribution of heterogeneity values during miss trials is indeed bimodal, the effect size between the two underlying distributions (Cohen’s d) would be 0.9641 (the difference between the means of the red and blue curves in Author response image 1, expressed in units of their pooled standard deviation). However, this is of course hypothetical, as the distribution shown may very well be unimodal. Given the current data, and considering that testing for bimodality is a notoriously difficult problem, it is hard to draw a strong conclusion. Because this analysis gave mostly inconclusive results, we have not incorporated these findings in the manuscript.
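For reference, the effect size quoted above is Cohen's d: the difference between group means divided by the pooled standard deviation. A minimal sketch follows, with illustrative values rather than the data underlying Author response image 1:

```python
from statistics import mean

def cohens_d(x, y):
    """Cohen's d: difference in group means in units of the pooled
    standard deviation (unbiased sample variances, pooled by df)."""
    mx, my = mean(x), mean(y)
    nx, ny = len(x), len(y)
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    pooled_sd = (((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)) ** 0.5
    return (mx - my) / pooled_sd

# Two hypothetical groups of heterogeneity values, one SD apart.
print(cohens_d([2.0, 4.0, 3.0], [1.0, 3.0, 2.0]))
```

By common convention, d near 0.8 or above is considered a large effect, which is why a d of 0.9641 between putative sub-distributions would be notable if the bimodality were real.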

https://doi.org/10.7554/eLife.10163.020

Article and author information

Author details

  1. Jorrit S Montijn

    Swammerdam Institute for Life Sciences, Center for Neuroscience, Faculty of Science, University of Amsterdam, Amsterdam, Netherlands
    Contribution
    JSM, Built the setup, performed the experiments and analyzed the data, designed the experiments and analyses, and wrote the paper
    For correspondence
    j.s.montijn@uva.nl
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-5621-090X
  2. Pieter M Goltstein

    1. Swammerdam Institute for Life Sciences, Center for Neuroscience, Faculty of Science, University of Amsterdam, Amsterdam, Netherlands
    2. Max Planck Institute of Neurobiology, Martinsried, Germany
    Contribution
    PMG, Built the setup
    Competing interests
    The authors declare that no competing interests exist.
  3. Cyriel MA Pennartz

    1. Swammerdam Institute for Life Sciences, Center for Neuroscience, Faculty of Science, University of Amsterdam, Amsterdam, Netherlands
    2. Research Priority Program Brain and Cognition, University of Amsterdam, Amsterdam, Netherlands
    Contribution
    CMAP, Designed the experiments and analyses, and wrote the paper
    For correspondence
    c.m.a.pennartz@uva.nl
    Competing interests
    The authors declare that no competing interests exist.

Funding

Nederlandse Organisatie voor Wetenschappelijk Onderzoek (Excellence grant for the Brain & Cognition project 433-09-208)

  • Cyriel MA Pennartz

European Commission (EU FP7-ICT grant 270108)

  • Cyriel MA Pennartz

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors thank Q Perrenoud, G Meijer, and M Vinck for feedback on earlier versions of the manuscript and fruitful discussions. They would also like to thank W Oldenhof, J Verharen, and L Forsman for assistance with training the animals.

Ethics

Animal experimentation: All experimental procedures were conducted with approval of the animal ethics committee of the University of Amsterdam (DED234). All animals were housed socially in enriched cages and received analgesia (buprenorphine) and anesthesia (isoflurane) during invasive operations to minimize suffering.

Reviewing Editor

  1. David C Van Essen, Reviewing Editor, Washington University in St Louis, United States

Publication history

  1. Received: July 16, 2015
  2. Accepted: December 6, 2015
  3. Accepted Manuscript published: December 8, 2015 (version 1)
  4. Version of Record published: January 21, 2016 (version 2)

Copyright

© 2015, Montijn et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

