A parameterfree statistical test for neuronal responsiveness
Abstract
Neurophysiological studies depend on a reliable quantification of whether and when a neuron responds to stimulation. Simple methods to determine responsiveness require arbitrary parameter choices, such as binning size, while more advanced modelbased methods require fitting and hyperparameter tuning. These parameter choices can change the results, which invites bad statistical practice and reduces the replicability. New recording techniques that yield increasingly large numbers of cells would benefit from a test for cellinclusion that requires no manual curation. Here, we present the parameterfree ZETAtest, which outperforms ttests, ANOVAs, and renewalprocessbased methods by including more cells at a similar falsepositive rate. We show that our procedure works across brain regions and recording techniques, including calcium imaging and Neuropixels data. Furthermore, in illustration of the method, we show in mouse visual cortex that (1) visuomotormismatch and spatial location are encoded by different neuronal subpopulations and (2) optogenetic stimulation of VIP cells leads to early inhibition and subsequent disinhibition.
Introduction
Many neuroscience studies rely on the analysis and visualization of neuronal spiking signals. Classical studies used manual curation during experiments to select cells for analysis (Hubel and Wiesel, 1959; Mountcastle, 1957), but this method cannot provide a statistically unbiased sample. Moreover, such manual curation is unsuitable for stateoftheart largescale recording techniques, such as Neuropixels and highdensity multielectrode arrays (Bartolo et al., 2020; Jun et al., 2017; Semedo et al., 2019; Steinmetz et al., 2019).
Despite the widespread application of neuronal responsiveness analyses, neuroscience currently lacks a standard practice for determining whether a neuron is responsive to an experimental stimulus or treatment (Mesa et al., 2021). Common approaches, such as comparing a neuron’s average spike rate during the presence and absence of a stimulus, can only detect meanrate modulated cells (Mazurek et al., 2014; Ringach et al., 2002). On the other hand, approaches such as computing a peristimulus time histogram (PSTH) and applying an ANOVA, require the a priori selection of an arbitrary binning window size (Palm et al., 1988). Choosing the wrong bin size reduces the test’s sensitivity, while optimizing from a range of window sizes creates a multiplecomparison problem. This lowers the approach’s statistical power when corrections are applied, or can even lead to (unintentional) ‘phacking’ if the results are not corrected (Head et al., 2015). Finally, while (pointprocess) modelbased approaches can circumvent many of the above problems (Kass et al., 2014), they still require the a priori selection, or tuning, of hyperparameters specific to the statistical properties of classes of cells, or even individual neurons. Many modelbased approaches are therefore not well suited to an unsupervised analysis of largescale data recorded with stateoftheart techniques.
To solve these problems, we developed a method that detects whether a cell is responsive to stimulation in a statistically robust way and avoids binning and parameter selection altogether. This method, which we call ZETA (Zenith of Eventbased Timelocked Anomalies), either outperformed or matched that of ttests, ANOVAs and pointprocessbased methods in all conditions tested. Building upon this framework, we also present a procedure to visualize instantaneous spiking rates without the need for binning, and show how this can be used to estimate peakactivity latencies with submillisecond accuracy. We apply these methods to transientdetected twophoton calcium imaging data from the visual cortex of mice traversing a virtual linear track and find that visuomotor mismatch signals and spatial location are encoded by different V1 neuronal subpopulations. Finally, we apply our approach to Allen Brain Institute Neuropixels data and show that optogenetic stimulation of VIPexpressing cells in mouse visual cortex has a separable early inhibitory and late disinhibitory effect on the local neural circuit. We anticipate that the ZETAtest will be a useful resource for a wide range of applications across various disciplines.
Results
ZETA: Zenith of eventbased timelocked anomalies
A common procedure in preprocessing neural data is removing cells that are not responsive to an experimental stimulus. Many experimenters determine the ‘stimulus responsiveness’ of a cell by comparing its average spiking rate during the presentation and absence of a stimulus (see Figure 1A–C for an example V1 cell). This procedure will therefore remove neurons that show no response, but has the risk of also removing neurons that show a strong, but complex timelocked response to stimuli. To remedy this shortcoming, we developed a binningfree method for determining whether a neuron shows any timelocked modulation. We call this statistical test ZETA for Zenith of Eventbased Timelocked Anomalies (Figure 1D–F). It represents whether a neuron’s spike train could be observed by chance, if it were not responding to an experimenter’s event of interest: for example, the presentation of a visual stimulus, the onset of optogenetic stimulation, or a selfgenerated variable, such as an animal’s location on a track.
ZETA is calculated on a single cell by performing the following steps. First, we align all spikes to stimulus onsets, as when making a raster plot (Figure 1A). Pooling all spikes across trials, we obtain a single vector of spike times relative to stimulus onset, and calculate the cumulative distribution as a function of time (Figure 1D). The deviation of this curve from a linear baseline represents whether the neuron has a higher or lower spiking density relative to a nonmodulated spiking rate (Figure 1E, blue curve). We compare this pattern to the likelihood of observing it by chance by running multiple bootstraps by jittering stimulusonset times to generate a null hypothesis distribution (Figure 1E, gray curves). After scaling the experimentally observed curve to the variation in the null hypothesis distribution, we use it to obtain a pvalue corresponding to the Zenith of Eventbased Timelocked Anomalies. Low ZETAtest pvalues indicate that the neuron’s firing pattern is statistically unlikely to be observed if the neuron is not modulated by the event of interest.
The ZETAtest bears similarities to a meansubtracted KolmogorovSmirnov test applied to a renewal process model (see methods). In the method section ‘ZETA and renewalprocess models’ we show how the ZETAtest gains robustness to violations of the assumptions underlying renewal processes and outperforms alternative approaches. In short, the ZETAtest’s main difference from other approaches used to infer a neuron’s stimulus responsiveness, is that our test makes no a priori assumptions about the underlying distribution of temporal modulations and is binningfree. It can therefore detect both longtimescale changes in mean firing rate, as well as shorttimescale stimuluslocked bursts or lapses of activity at any point in time relative to stimulus onset.
Benchmarking the ZETAtest
To investigate whether the ZETAtest includes more cells recorded in mouse visual cortex in response to a drifting grating, while still retaining a 5 % falsepositive rate at a significance level of α = 0.05, we used a benchmarking test comparing onsetjittered and nonjittered data (Figure 2). In the nonjittered case, we compared the inclusion rate of the ZETAtest, as described above, to that of a meanrate ttest. For the ttest, we calculated the average spiking rate of a cell during stimulation (0 s – 1 s after onset) and after stimulation (1 s – 1.5 s), and performed a paired ttest over trial repetitions (Figure 1C). This showed that cells included with a ttest were almost exclusively a subset of the cells detected with ZETA (see Figure 2B for all V1 cells recorded with Neuropixels). In other words, if a cell is detected as being visually responsive with a ttest, it is almost guaranteed to also be detected by the ZETAtest. In addition to these cells, the ZETAtest also includes cells that were not registered by a ttest. Although many varieties exist, Figure 2C shows an example cell that is detected by both ttests and ZETA (top, sustained change in firing rate), and an example only detected by ZETA (bottom, balanced on/off peaks). In general, any cell lacking a sustained change but displaying a temporally nonuniform spiking distribution would be picked up by the ZETAtest but not a ttest: for example, cells with a sharp but narrow onset peak and variable baseline activity, cells with a balanced on/off response, or oscillatory cells that phasereset on stimulus onset.
Next, we ran the same tests again, but now on data where we randomly jittered the stimulus onset times between –τ and +τ, where τ is median onsettoonset duration. This procedure preserves the properties of a neuron’s spike train, but removes locking of the responses to the stimulus (Figure 2D). If the ZETAtest simply always gave low pvalues, then this would result in a high falsepositive rate as many neurons would still be included. In contrast, the falsepositive rate of the ZETAtest was generally low, and consistent with that expected for a significance level of α = 0.05 (Figure 2E–F).
Sensitivity of ZETA is superior to meanrate Ttests
We performed this benchmark for singlecell activity obtained from n = 12 combinations of various visual regions (V1, AM, PM, LGN, SC, LP, NOT, APN, Retina) using multiple techniques (Neuropixels, n = 8; Neuronexus, n = 2; GCaMP6, n = 1; pMEA, n = 1), in response to light flashes (retina, n = 1) or drifting gratings (all others, n = 11). Under all conditions, the inclusion rate using ZETAtests was higher than using ttests: at a significance level of α = 0.05, the inclusion rate for the ZETAtests was 79 % and for meanrate ttests was 64 %; ttest of ZETA vs meanrate ttest inclusion rates: n = 12 data sets, p = 2.8 × 10^{–7} (Figures 1G and 3, Figure 3—figure supplement 1). This means that the ZETAtest includes 42 % of the cells that were not included by a ttest. A significance level of α = 0.05 is rather arbitrary, so we also performed a receiver operating characteristic (ROC) analysis, where we investigated the number of inclusions as a function of the number of false positives (Figure 3). The ROC’s summary statistic is the area under the curve (AUC); one being a perfect discriminator. Again, we found that the ZETAtest showed a higher statistical sensitivity (ZETAtest, mean AUC = 0.914) than a meanrate ttest (ttest, mean AUC = 0.843), and that this difference was statistically significant (paired ttest, n = 12, p = 4.9 x 10^{–6}).
We also benchmarked various versions of tests derived from the theoretical framework of renewal and Poisson process models. None of these models reached the statistical power and computational efficiency of the ZETAtest (Figure 3—figure supplements 2 and 3), but they do provide an attractive mathematical connection to a more widely studied class of models. We have therefore provided more information on the mathematical relationship between this class of models and the ZETAtest in the method section.
The percentage of visually responsive cells detected by the ZETAtest is higher than typically reported. For example, using gratings that only differed by orientation, we found 93.3 % of all V1 cells to be visuallymodulated. Even more striking was that the lower bound of the binomial 95%confidence interval was at 89.7 %. This lowerbound is higher than the responsiveness previously reported in many studies, including our own (Montijn et al., 2016a; Niell and Stryker, 2008; Shuler and Bear, 2006; Steinmetz et al., 2019), exemplifying the advantage of the ZETAtest.
The data used so far for benchmarking contains more stimulus repetitions than commonly used in neuroscience. This raises the possibility that the ZETAtest is only advantageous when large numbers of trials are used. To test this, we randomly subsampled the number of trials included in the analysis, and repeated our benchmark. As the results for V1 in Figure 3—figure supplement 4A show, the ZETAtest consistently included more than the ttest, regardless of the number of trials.
We hypothesized that the ttest’s worse performance might result from pooling the responses to different orientations into one group. Therefore, we repeated the ttest’s benchmark after first splitting the trials into 24 groups corresponding to the 24 directions we presented. A neuron was included if the spiking rate during stimulus presentation was significantly different from its prestimulus baseline rate in any group, after applying a Bonferroni correction. However, this procedure reduced the ttests’ performance (Figure 3—figure supplement 4A, middle panel).
We noticed that the ttest’s false positive rate was rather low after Bonferroni corrections. To test whether we overcorrected the ttest, we removed the multiplecomparison correction (Figure 3—figure supplement 4A, righthand panel). In this case, the ttest false positive rate increased to >50%, while its inclusion rate (89.9%) remained lower than that of the ZETAtest (95.0%). Finally, we investigated whether the ttest’s performance was hampered by including the immediate offresponse after stimulus offset in the baseline period. We reran the above analyses, but now limited the baseline to the 300 ms preceding the stimulus onset. As can be seen in Figure 3—figure supplement 4B, this did not improve the ttest’s performance. In summary, the ttest is at its most sensitive when using the full 500 ms epoch inbetween stimulus presentations as baseline period and when pooling data across all orientations.
The sensitivity of ZETA is superior to an ANOVA with an optimal bin width
Meanrate ttests are common in neuroscientific analysis, but it could be argued that this is somewhat of a strawman to use as baseline performance. An alternative is to construct a peristimulus time histogram (PSTH) and run a oneway ANOVA across bins to test a neuron’s responsiveness to a particular stimulus. However, because this requires picking a bin width, this can lead to arbitrary choices based on the experimenter’s visual inspection of the data, which might increase false positive rates. A better solution is to use one of the various methods to estimate optimal widths for binning (Freedman and Diaconis, 1981; Scott, 2009; Shimazaki and Shinomoto, 2007). We therefore calculated the optimal bin width using the Shimazaki & Shinomoto method, which was specifically designed for building a PSTH, and repeated the benchmark described above, but now testing the responsiveness of neurons using an ANOVA (see Materials and methods). This ANOVA procedure performed markedly better than a meanrate ttest (Figures 1G and 3). However, we found that it still showed an inclusion rate (at α = 0.05) that was lower than using the ZETAtest (ANOVA mean inclusion rate = 71 %; ZETAtest inclusion = 79%, paired ttest, n = 12, p = 0.0014). Importantly, this difference could not be explained by different levels of false positives, as an ROC analysis also showed a superior statistical sensitivity for the ZETAtest: mean ANOVAAUC = 0.880, mean ZETAAUC = 0.914, paired ttest, n = 12, p = 7.7 × 10^{–4}.
Taken together, the results of comparing the ZETAtest to ttests, ANOVAs, and renewalprocess based tests show that the binless ZETAtest has a statistical sensitivity superior to all alternative tests, regardless of number of trial repetitions, brain region where the data were recorded, or specifics of the data preparation.
ZETAtest in the absence of short peaks of activity
Having established that the ZETAtest performs well in real neural data, we looked for conditions under which the ZETAtest fails. We know that the ttest has access to information that the ZETAtest does not: the spike times used by the ZETAtest are flattened over trials, while the ttest uses the variability across trials. Therefore, when the variability of mean activity across trials is low, but the variability of spike times within a trial is high, the ZETAtest could perform worse than a ttest.
To test this hypothesis, we simulated Poissonspiking artificial neurons (see Materials and methods) where we changed two variables: (1) we varied the difference in spiking rate (dHz) between stimulus and baseline (both 1 s) from 0 to 1 Hz with a background rate of 1 Hz and (2) we varied the period over which the neuron was active during stimulus presentation (T_{r}) from 0.5 to 1.0 s while keeping the total spike count constant. In effect, neurons with T_{r} <1.0 show a triphasic response, first increasing their spiking rate, then showing a cessation of activity, and finally returning to baseline. As before, we compared the tests’ performance using an ROC analysis. The ROC’s area under the curve (AUC) is conveniently similar to a WilcoxonMannWhitney statistic and can be used to directly determine which of two procedures is more sensitive (Calders and Jaroszewicz, 2007). As expected (Figure 4A), the ttest’s AUC depends only on dHz and fails when dHz = 0, while the ZETAtest also discriminates well when there is no difference in spike counts, but there is a consistent temporal discontinuity (dHz = 0, T_{r} <1.0). Interestingly, while the ttest performs better than the ZETAtest when T_{r} = 1.0, the ZETAtest also still performs reasonably well (Figure 4B).
While this scenario is important to consider from a theoretical perspective, pure Poissonspiking neurons probably do not exist in the brain. We therefore proceeded with a (somewhat) more biologically plausible simulation of bursting cells, where their bursting probability is orientationtuned (Figure 4C, see Materials ad methods and Table 1). These neurons show no consistent peaks or troughs of activity (Figure 4D and E). However, the highly variable spike counts this bursting produces result in the ZETAtest outperforming the ttest (AUC, ZETA = 0.941, ttest = 0.902, ztest, p = 4.1 × 10^{–11}). To conclude, even in hypothetical scenarios that we specifically constructed to investigate the limits of the ZETAtest, it performs close to the ttest (Figure 4B4). Importantly, in the case of strongly bursting cells (Figure 4F), the ZETAtest clearly outperforms the ttest.
Neuronal responsiveness to natural movies
Next, we asked how the performance of the ZETAtest compares to that of an ANOVA, in a case where there is no a priori knowledge regarding the neuronal response profile, but where the stimulus itself provides a natural timescale that may be used for binning neuronal responses. We therefore determined the responsiveness of neurons to natural movies, using either the ZETAtest, or a oneway ANOVA across bins, repeated for different bin sizes (i.e. timescales). Singlecell data were recorded using Neuropixels in seven visual brain areas of 3 mice, while the animals were presented with repetitions of 20 s long natural movies (Figure 5A).
To ensure that the ANOVA approach could detect short bursts of activity, as well as long timescale wholescene modulations of firing rates, we chose a wide range of bin sizes. We picked a single movie frame duration (0.0167 s) as the centre point, and used bins from 1/512th up to 512 movie frames, spaced equidistantly on a base2 logarithmic scale in 18 steps. For each area, we pooled all cells, and calculated the arealevel inclusion rate with either the timescalefree ZETAtest or Bonferronicorrected ANOVAs at different timescales (Figure 5B). Bin sizes of 33–267 ms did not differ significantly from the ZETAtest’s inclusion rate (FDRcorrected paired ttests, p > 0.05, n = 7 areas), while the ZETAtest’s inclusion rate was higher than with an ANOVA for all short ( < 33 ms) and long ( > 267 ms) bin durations (p < 0.05).
The above approaches give some insight into which bin sizes best capture the dominant temporal components in neuronal responses in our data, but a more powerful approach might be to classify a neuron as “included” whenever any of the 19 ANOVAs reached significance (i.e. p < α). Repeating this procedure for various significance levels α on the interval (0,1) produces an ROC curve (Figure 5C). Using this approach, we found there was no significant difference in performance between a set of ANOVAs and the ZETAtest (ztest, p = 0.554): ANOVAs AUC = 0.798 ± 0.010, ZETAtest AUC = 0.792 ± 0.011 (mean ± sd). Overall, the above results show that, under these conditions, the binless and timescalefree ZETAtest performs as well as an aggregate set of ANOVAs binned at various timescales.
Instantaneous firing rates (IFRs) for visualization and onset latency detection
The above paragraphs have shown that the ZETAtest is a sensitive statistical tool to detect whether neurons respond to a stimulus. However, it cannot be used to determine when exactly the strongest response of a neuron occurs. Therefore, we also developed a method that determines the instantaneous firing rate (IFR) using the temporal deviations upon which ZETA is based. Like the ZETAtest, it avoids the bin size selection issue of peristimulus time histograms (PSTHs). Another advantage of this IFR is that its temporal resolution is limited only by the neuron’s spike density. Moreover, unlike modelbased methods, such as the multiplicative inhomogeneous Markov interval (MIMI) model (Kass and Ventura, 2001), it requires no fitting and is orders of magnitude faster (see Materials and methods, Figure 3—figure supplement 2). It is therefore a useful tool for determining spike train features with high precision, such as a neuron’s onset latency.
Figure 6A and B show two example V1 neurons with a relatively high (Figure 6A) and low (Figure 6B) firing rate. Here we define ‘onset’ as the time the halfmaximal response of the peak is first crossed, a metric that is heavily influenced by the chosen bin size when using PSTHbased analyses. Moreover, this binwidthdependent estimation makes PSTHbased comparisons across cell classes problematic, as the choice of optimal bin width depends on spiking properties such as the firing rate and peak firing duration (Freedman and Diaconis, 1981; Scott, 2009; Shimazaki and Shinomoto, 2007), which are heterogeneous across neuronal cell types (Figure 6A5 and B5). Hence, the main advantage of our method is that it avoids having to tailor the bin size to each neuron individually; allowing for better comparisons across varying cell types and brain regions.
To test performance on real data, we compared the estimated onset latency using our IFR to what we obtain from a PSTH analysis with varying bin sizes. Across all V1 neurons (Figure 6C, n = 119 neurons) as well as all neurons recorded in visual areas (Figure 6D, n = 1403 neurons), our metric showed the strongest agreement in latency estimation with PSTH bin sizes between 1 and 10 ms.
As no ground truth is known for the real latencies of experimentally recorded neurons, we performed a benchmark test with artificially generated spike trains. Like before, we used Poisson neurons with a constant background spiking rate, where each neuron was assigned a background rate ranging from 0.5 to 32 Hz (Figure 6E–G). We used 100 trials, each 2 s long, and created a peak response on top of this baseline rate by adding a single spike in 50 % of all trials (i.e. 50 spikes in total). We also varied the peakresponse width by jittering the time each spike was added according to a normal distribution, σ ranging from 1 to 10 ms. We found that the error in the peak estimate was independent of background spiking rate (Figure 6H), indicating robust performance even when the spikes contributing to the peak were only 0.78 % of the total (i.e. for 32 Hz). The mean error was also independent of the jitter, while the standard deviation of the error estimate grew in a theoretically optimal fashion as O(σ) (Figure 6I).
We next tested the estimator accuracy when using a binningbased PSTH approach (bin width 1–58 ms) and found it depends on both the bin width and the width of the peak response (Figure 6J). This means that accurate latency determination using PSTHs requires the use of multiple bin sizes. More importantly, when comparing the estimation error between these binbased methods and our IFR, we found that the binningless IFRbased latencies were consistently as accurate as, or more accurate than, the best possible binwidth for any given peakwidth (Figure 6K). Finally, the IFRlatency accuracy also exceeded the MIMImodel fit based method. In other words, the IFRbased accuracy supersedes the PSTHbased (and MIMIbased) accuracy without the need to handpick the optimal binning width per neuron for a PSTHmethod, nor tune hyperparameters such as knot number, location, and regularization strength for the MIMImethod.
Visuomotor mismatch and spatial location are mediated by different neuronal subpopulations
Having established that the ZETAtest and IFR are statistically robust and have clear advantages over meanrate approaches and PSTHs, we applied these tools to a GCaMP6 data set. Many theories, such as predictive coding (Friston, 2005; Gregory et al., 1980; Rao and Ballard, 1999), biologically realistic error backpropagation (Ooyen and Roelfsema, 2003; Whittington and Bogacz, 2019), and canonical cortical microcircuit operation (Bastos et al., 2012; Douglas et al., 1989), define a ‘topdown’ signal representing an expectation, error or surprise signal as distinct from a bottomup sensory drive. In mouse V1, such topdown visuomotor mismatch signals have been reported previously (Attinger et al., 2017; Keller et al., 2012; Leinweber et al., 2017; Saleem et al., 2013). However, whether individual V1 neurons can be classified into different groups based on their encoding of topdown visuomotor mismatch or bottomup sensorydriven spatial location signals has not been studied.
To examine this issue, and to provide an example of how one could use ZETA in a neurophysiology study, we used neuronal calcium data recorded in L2/3 V1 of 4 mice running on a virtualreality linear track (Figure 7A). In 87 % of all corridor runs (N = 622/713 trials), the track was rendered normally, and the mice received visual feedback matching their running speed. In the remaining 13 % of runs (N = 91), rendering was halted at a random location for 500 ms before resuming (Figure 7B). After performing calciumtransient detection to obtain putative spike times (Montijn et al., 2016b), we calculated ZETAscores for all neurons in three different ways. We aligned the spikes to mismatchonsets, to trial starts, or converted the spike times into locations on the track, and aligned these spike locations to the start. For each recording (n = 7), we calculated the Pearson correlation for each pair of these three ZETAscores (Figure 7C–F). Across recordings, time and locationmodulation were positively correlated (mean r = 0.23, onesample ttest, n = 7 recordings, p = 0.016); time and mismatchmodulation were not significant (r = 0.13, p = 0.28); and location and mismatchmodulation were negatively correlated (r = −0.22, p = 0.04).
We next used a kerneldensity estimate (KDE) to directly test whether the joint encoding of two features within single neurons was different from chance (Figure 7G–I; Figure 7—figure supplement 1). Indeed, we found that it was less likely that a neuron showed high modulation values for both spatial location and visuomotor mismatch than if the two features were encoded independently (z = −4.3, p = 2.0 × 10^{–5}). We also found time and location to be more likely to be encoded by the same neurons (z = 3.8, p = 1.1 × 10^{–4}), while time and mismatchmodulation show no effect (z = −0.06, p = 0.95). This suggests a functional specialization for many neurons to encode either visuomotor mismatch signals or spatial location, but not both. While it is possible that location and mismatchencoding are also mediated by specific genetic subtypes of interneurons (Attinger et al., 2017), our analysis of putative pyramidal cells demonstrates that principal cells also show encoding specialization.
Optogenetic stimulation of VIP cells disinhibits visual cortex
Finally, we applied the IFR and ZETAtest to data recorded at the Allen Brain Institute (Siegle et al., 2019). In this case, we investigate whether optogenetic stimulation of VIP cells in visual cortex disinhibits the local circuit, as has been shown previously for auditory cortex and mPFC (Pi et al., 2013). The analysis of optogenetic stimulation in visuallyresponsive areas is complicated by the fact that mice can see the blue light used for optogenetic stimulation. In other words, if a neuron is active after a laser pulse, it could be caused by direct stimulation, indirect circuit disinhibition, or simply be a sensorydriven response. Using the methods described in this paper to overcome these issues, we show that optogenetic stimulation of VIPexpressing cells in mouse visual cortex causes shortlatency inhibition and longer latency disinhibition in separate neuronal subpopulations (Figure 8).
Data used for this analysis were recorded from 5 VipIRESCre; Ai32 mice at the Allen Brain Institute. Cells were only included for analysis if the clustering quality was sufficient, and they were recorded in a visual cortex area (1706 cells total in AL, AM, PM, L, V1, RL, MMP). We performed a ZETAtest to discard cells that were not modulated within the interval (–0.5, + 0.5 s) after optogenetic stimulation (N = 1144 cells remaining). Instantaneous firing rate peak and troughlatencies were computed and cells were discarded if their peak occurred earlier than +1 ms after optogenetic stimulation onset. The remaining cells (N = 909) were classified as VIP (N = 13) if their peak latencies occurred within the 10 ms duration of the optogenetic stimulation, as Inhibited (N = 59) or Activated (N = 137) if they displayed respectively decreased or increased firing within 20 ms from the stimulus offset, or Other (N = 700). We limited our classification window to the first 30 ms after the onset of the optogenetic stimulus as it takes about 30–50 ms for retinal light responses to emerge in the visual cortex. Hence, the majority of ‘Other’ cells likely show sensorydriven responses.
Based on these classifications, we compared the singlecell latencies of Inhibited and Activated neurons, restricting our analysis to peaks within the 10–30 ms postoptogenetic stimulation window. The latency of Inhibited cells was significantly shorter after optogenetic stimulation than it was for Activated cells (median latency Inh = 16.5 ms, Act = 23.6, MannWhitney Utest, p = 5.1 × 10^{–7}). These results show a VIPmediated disinhibition mechanism operates in vivo in visual cortex, confirm slice connectivity studies (Pfeffer et al., 2013), and are in general agreement with results from the auditory cortex and mPFC (Pi et al., 2013).
Discussion
We developed the ZETAtest, a statistical method for determining whether neuronal spiking responses are modulated by the occurrence of events, such as the onset of sensory stimulation. The ZETAtest is widely applicable: we have shown that it can accurately detect neuronal responsiveness in a wide range of settings, across various brain regions, stimuli, and recording techniques. In most cases, the ZETAtest showed markedly improved statistical sensitivity compared to established and powerful statistical techniques, such as ttests and ANOVAs. For example, the ZETAtest detected a visual response in 42 % of the cells that were not included by a ttest and in 28 % of the cells not included by a PSTHbased ANOVA. In addition to its improved statistical performance, the ZETAtest avoids arbitrary parameter choices, as it does not require the selection of a temporal bin size. This makes the ZETAtest even easier to use than established methods, as it can be applied directly to raw spike times and stimulus onsets, and the lack of a parameter selection naturally lends itself to the bulkanalysis of large numbers of cells.
Secondly, we developed an instantaneous firing rate for analysis and visualization of neuronal firing patterns. Similar to the ZETAtest, it has two main advantages over alternative common approaches, such as PSTHs: (1) it does not require binning, and therefore removes the need to tune binning widths for individual cells or cell types and (2) its temporal resolution is only limited by the spike density, which allows more accurate determination of spike train events such as peak response latencies. When benchmarking its performance for latencyestimation, it outperformed PSTHbased approaches without the need to finetune bin sizes for each neuron individually (Figure 6K). Finally, compared to methods that rely on model fitting, such as the multiplicative inhomogeneous Markov interval (MIMI) model, the ZETAtest is considerably faster, not subject to overfitting, and does not require tuning of hyperparameters (Figure 3—figure supplement 2).
We investigated the performance of the ZETAtest in a variety of cases and observed that the ZETAtest sometimes performed as well as other techniques and sometimes better. So when, and why, exactly does the ZETAtest provide extra statistical power? We have shown in the methods that the ZETAtest shares mathematical properties with a KolmogorovSmirnov test of a neuron’s spike train against permuted onsetjitter bootstraps. Various components of the ZETAtest were chosen to relax the assumptions made by other tests with the aim to gain relative invariance to the specifics of a neuron’s spike train (Figure 3—figure supplement 3). While the ZETAtest is slightly worse than a ttest for purely Poissondistributed spike counts, it outperforms the ttest in every other case we tested, including those where the interspike interval distribution is highly peaked, such as in bursting cells (Figures 1—4). Finally, while more sophisticated modelbased approaches might be able to attain better performance than the ZETAtest, this requires that their hyperparameters be individually tuned to a cell’s firing statistics (e.g. MIMIbased methods; Figure 3—figure supplement 2). Therefore, the ZETAtest may often be the preferred choice as it required no fitting, hyperparameter tuning, and shows a statistical sensitivity superior to modelbased approaches in all cases we tested.
An alternative to using the ZETAtest could be to perform a set of ANOVAs as in Figure 5. While this would certainly be an improvement over doing a paired ttest, multiple ANOVAs are more complex to implement and the procedure needs multiplecomparison correction (Head et al., 2015). More importantly, it still requires choosing an arbitrary set of bin sizes, as simply taking the optimal bin size still leads to an underperformance relative to the ZETAtest (Figure 1G). Moreover, the ZETAtest has the advantage of not having to choose any parameter at all. As we have shown in Figure 8, using the ZETAtest and IFR allowed us to significantly simplify various analysis steps when investigating the heterogeneous latency effects of optogenetic stimulation of VIP neurons.
However, like all statistical tools, the ZETAtest also has its limitations. As we designed it to be a generalist test applicable to any spike train, the ZETAtest might not perform as well as models manually fitted to describe a particular cell’s response. However, we believe in many cases it will provide a superior alternative to other responsiveness tests, and responselatency determinations. The ZETAtest shows great merit especially as an unbiased parameterfree inclusion criterion, which is something that currently often varies between studies and can adversely affect the replicability of scientific results (Mesa et al., 2021). Another, and fairly obvious, limitation is that the ZETAtest only compares pointevents (e.g. spike times) to other pointevents (e.g. stimulus onsets). Consequentially, it is a powerful tool for stateoftheart electrophysiological techniques, like Neuropixels recordings, but does not apply to calcium imaging dF/F0 traces. However, as we show in Figure 7, the ZETAtest performs well when applied to putative spiking events extracted from calcium imaging data. Therefore, the ZETAtest would still be useful in calcium imaging data sets where the temporal response profile of neurons to the experimental treatment is unknown. Moreover, optical recording techniques continue to improve, allowing better singlespike extractions with calcium imaging (Packer et al., 2015) and genetically encoded voltage indicators (Knöpfel and Song, 2019). As such, in all likelihood nearfuture neural data will remain spikebased regardless of the underlying recording technique used.
The main aim of this paper is to present the IFR and ZETAtest and show how they improve upon commonly used analysis tools, such as PSTHs, ttests, Markovmodelbased approaches, and ANOVAs. However, we also show some results that by themselves are scientifically noteworthy. For example, our results indicate that almost all V1 neurons ( > 93%) are responsive to drifting gratings, even if the spatial frequency (SF) and temporal frequency (TF) parameters are not optimal to drive these cells (Figure 3). This calls into question the idea that V1 cells are generally narrowly tuned and can only be driven by specific stimulus feature parameters (Xing et al., 2004). Our analysis indicates that neuronal responsiveness rarely drops to 0 for a combination of stimulus features, as otherwise we would not have found such a large number of responsive V1 cells using a featuresparse stimulus (24 directions, 1 SF: 0.05cpd, 1 TF: 2 Hz).
This suggests that the response of V1 neurons to features such as SF and TF may drop off rather slowly with distance to their preferred stimulus properties. Conceptually, this would mean that the V1 neural code may in fact be more ‘dense’ than proposed by some (Ohiorhenuan et al., 2010; Olshausen and Field, 1997; Vinje and Gallant, 2000). That said, our results show that only a relatively small group of neurons is strongly driven by any one stimulus: sparse coding may therefore still operate in V1, but it acts on top of a dense, but weak code.
Two other noteworthy findings are that VIP cell activation drives disinhibition in visual cortex (Figure 8) and that the strength of visuomotor signals and modulation by spatial location are negatively correlated (Figure 7). This suggests that bottomup and topdown processing in visual cortex may be mediated by distinct neuronal subnetworks. While this functional segregation has previously been shown based on a macroanatomical (e.g. laminar) analysis (Kok et al., 2016; Markov et al., 2014; Poort et al., 2012; Self et al., 2019), our results suggest that this functional segregation also holds at a microanatomical level for neurons located within a single recording plane (Figure 7).
In conclusion, the IFR and ZETAtest are simpler, more statistically powerful, and less errorprone tools than binbased PSTHs, ttests, Markovmodel based approaches, and ANOVAs widely used in neuroscience today. Statistically underpowered studies are still common in neuroscience, which makes the development of statistically sensitive tools especially important (Button et al., 2013). Moreover, when power analyses are used to determine the necessary sample size, then increased statistical sensitivity, such as with the ZETAtest, can reduce the number of required experimental animals. To facilitate the adoption of the ZETAtest by the neuroscientific community, we provide easytouse and welldocumented opensource implementations online in MATLAB (https://github.com/JorritMontijn/ZETA, Jorrit, 2021b) and Python (https://github.com/JorritMontijn/zetapy).
Materials and methods
ZETA
Request a detailed protocolWell documented and easytouse Matlab and python code performing the procedures described in the following paragraphs can be found here: https://github.com/JorritMontijn/ZETA and https://github.com/JorritMontijn/zetapy.
We developed a timescalefree, binningless statistical test for determining whether a neuron shows a timelocked modulation of spiking activity. It is derived from a metric that represents the reliability, as number of standard deviations away from chance, that the temporal density of spikes is nonrandom across trial repetitions. This metric, we call ZETA (ζ), can be computed on a vector of i = [1 … N] spike times x, and a vector of k = [1 … q] event times (e.g. stimulus onsets), w, using the following steps.
First, we make a vector v of the spike times in x relative to the most recent stimulus onset, as when making a raster plot of spike times:
where
Next, we remove all spike times that are larger than a cutoff value τ, for example the trial duration, and add two artificial spikes at t = 0 and t=τ to ensure coverage of the full epoch. We sort the n spike times in v such that v_{i} < v_{i+1}, and calculate the fractional position g_{i}, ranging from 1 /n to 1, of each spike time in v:
Therefore, another interpretation is that g represents a neuron’s cumulative density function sampled at the spike times in v. In order to quantify whether this distribution is different from our null hypothesis – that is that the neuron’s firing rate is not modulated with respect to the stimulus onset – we compare this vector to a linear baseline density vector b. If a neuron’s spiking rate is constant, the cumulative density function is linear over time, and therefore the expected fractional position of spike i at time v_{i} converges to the spike time divided by the trial duration τ as the number of events q increases:
The difference δ_{i} between g_{i} and b_{i} therefore gives a neuron’s deviation from a temporally nonmodulated spiking rate at time point v_{i}:
As we show in the Materials and methods section ‘A proof of timeinvariance’, using δ_{i} to compute ZETA would make it dependent on the choice of onset times. Therefore, we create d, a timeinvariant meannormalized version of δ:
where
We then define the Zenith of Eventbased Timelocked Anomalies (ZETA, or ζ_{r}) as the most extreme value, that is the maximum of the absolute values:
Null hypothesis for ZETA
Request a detailed protocolHaving calculated ZETA from the temporal deviation vector d, we wish to quantify its statistical significance. First, we scale it such that its value is interpretable as a zscore. We therefore construct a null hypothesis distribution by repeating the above procedure P times with jittered eventtimes w’, where we move each event time by a random sample drawn from the interval [τ, τ]. This way, we calculate the chance of observing randomly high values in d without having to make assumptions about the underlying distribution of d. However, a naive approach would lead to difficulties here, as jittering w also changes the corresponding values of v; and any jittered vector d’ we obtain would be sampled at different times than the original vector d. Therefore, we instead linearly interpolate the values of jittered fractional position vector g’ at the original spike times of v. First, we construct a vector f of fractional spiking positions analogously to g, but based on jittered eventtimes w’:
Note that we cannot simply take g, as the total number of spikes n’ in this jittered version is likely different from the original number of spikes n, because we only consider the spike times in the interval [0, τ] after the (jittered) event times. Next, we interpolate the values of f at sample times v’ to the original sample times v:
where
with
We repeat this process P times; where for each jitter iteration j, we calculate δ’(j):
Note that b is invariant with respect to the jitter iteration, as it is simply the nelement linear vector from 1 /n to n. As before, we meannormalize δ’(j) to obtain a temporal deviation vector d’(j).
Now we can define a nullhypothesis ZETA sample j as:
Statistical significance of ZETA
Request a detailed protocolHaving constructed a way to generate samples from a nullhypothesis distribution, we are left with the task of using it in calculating the statistical significance of ZETA. If we had infinite samples, we could directly calculate the percentile of the empirical ζ_{r} from the nulldistribution. However, as this is computationally intractable, we will approximate the true distribution from a finite number of nullhypothesis samples. From extreme value theory we know that the distribution of maximum values is known as a Gumbel distribution (Gumbel, 1941). Its cumulative density is given by:
Here, x is the sample maximum (i.e., ζ_{r}), m is the mode, and β is the scale parameter. Therefore, we need to find m and β, which can be derived from the estimated sample mean and variance over jittered ZETAs of ζ’. The mean $\stackrel{\xb4}{x}$ and variance v are given by Gumbel, 1954:
Here, γ is the Euler–Mascheroni constant (γ ≈ 0.577), m is the mode, and β is the scale parameter. Using $v=\text{Var}\left({\mathit{\zeta}}^{\prime}\right)$ , and Equation 18, we can write the scale parameter β as:
Then using Equation 17 and $\overline{x}=\overline{{\zeta}^{\prime}}$ , the mode can be computed from β and the mean:
Now we can define the pvalue by reading out the cumulative Gumbel distribution at ζ_{r}:
Finally, we can use p with the standard normal’s quantile function ${\Phi}^{1}$ to obtain a corrected ZETA ζ that is interpretable as a zscore:
Note that when we refer to ZETA or ζ in the rest of the manuscript, we mean the corrected version and its pvalue as defined above.
Computing an optimal bin size
Request a detailed protocolFor the analyses where we used an optimal binning width to compare the performance of the ZETAtest and a binwise ANOVA, we computed the optimal bin width using the procedure described by Shimazaki and Shinomoto, 2007. Their method describes a loss function that can be computed for a given binwidth. To find the optimal bin width, we used a simple iterative 10point grid search until a local minimum was found. The code used for finding the optimal bin size is available online at https://github.com/JorritMontijn/GeneralAnalysis, (copy archived at swh:1:rev:7f866e0c875af17e9d76fdfbd8cec3d41145c031, Jorrit, 2021a) in the function opthist.m.
The multiplicative inhomogeneous Markov Interval (MIMI) Model
Request a detailed protocolA classic model for neuronal firing rates are inhomogeneous Poisson processes, where a timedependent function f(t) can be used to describe how the mean firing rate λ of a cell varies with time t after some experimental intervention, such as the onset of a visual stimulus (Kass et al., 2014):
Spiketimes v can then be generated by sampling from an exponential distribution with an interspike interval equal to 1/λ:
While this framework is attractive in its simplicity, it cannot capture several important properties of spiking dynamics, such as refractory periods or burst firing. Even when one is only interested in the question whether a particular neuron responds to a visual stimulus, bursting cells might produce apparent bumps in a peristimulus time histogram (PSTH). This could lead to the possibly erroneous conclusion that a cell is stimulusmodulated, simply because their spiking patterns are nonPoisson by nature. This problem is remedied by a class of models that combine a Poisson process with a renewal process, which describes the likelihood of a spike conditioned on the time since the last spike. These processes are called multiplicative inhomogeneous Markov interval (MIMI) processes (Kass and Ventura, 2001):
Here, s_{*}(t) is the time of the last spike, the λ_{1} term refers to the inhomogeneous Poisson process described above, while the latter λ_{2} term captures the interspikeinterval dependent spiking probability. While this model can be extended to include interaction terms, nback spike dependencies, and bias constants per trial, prior work has shown these additions do not appreciably improve the fitting quality (Kass and Ventura, 2001). As an additional baseline model, we therefore also compared the performance of the ZETAtest to that of a method based on the MIMImodel (Equation 25).
MIMImodel fit evaluation as a statistical test for responsiveness
Request a detailed protocolTo use the MIMImodel framework as a statistical test, we binned spikes in 1 ms bins. We used cubic splines with 16 Bform coefficients spread uniformly over the trial’s 1.5 s duration for the inhomogeneous Poisson component, and 16 Bform coefficients spread uniformly over a time horizon of 500 ms for the renewalprocess component (Kass et al., 2014). We then simultaneously fitted these 32 coefficients to produce the closest match to the neuron’s spike train by running a leastsquares curve fitting algorithm. Specifically, we minimized the error between the stimuluslocked trace reconstituted from the Bform splines and the real average spiking rate per bin of the PSTH. The fitting procedure used 1 ms binning, but the resulting model can be resolved at theoretically infinitesimal time steps. We ran a couple of fits with different numbers of coefficients ranging from 8 + 8–32 + 32, but this did not strongly impact either the fitting quality or computation time (Figure 3—figure supplement 2).
Theoretically, we could now follow the same procedure as with the ZETAtest by generating a nullhypothesis distribution from multiple iterations of jittered onset times. However, the MIMI fitting procedure’s computational time cost meant this was not a realistic solution: even a single MIMImodel fit took 557 times as long to run as the ZETAtest, which already included 100 random jitter iterations (Figure 3—figure supplement 2, panel H). This means that using 100 random jitters for the MIMImodel test would take 55,700 times as long as the ZETAtest. We therefore took an alternative approach by calculating pairwise d’ values (i.e. the distance in standard deviations between the two bins) for all bin pairs, and transformed the highest d’ into a pvalue. This is obviously a suboptimal approach, as this will lead to a high number of false positives. However, our goal here is not to develop a fullfledged MIMIbased test, but rather to verify what the statistical sensitivity of such an approach could be. It is possible to investigate the statistical sensitivity as the low pvalue bias exists for both real data and shuffle controls, so instead of comparing the inclusion level at an alpha of 0.05, we performed an ROC analysis over all cells in V1.
The ROC analysis is insensitive to the absolute level of significance values, but instead provides insight in the discriminability of real inclusions from false positives (Figure 3—figure supplement 2, panel G). To keep the computational time tractable, we subsampled the data to include only V1 cells, and only their response to the first 480 drifting grating trials. We compared the ZETAtest, ttest and MIMImodel method as described above. The ZETAtest gave an area under the curve (AUC) of 0.996, the ttest 0.900, and the MIMImethod 0.749. However, we noticed that the MIMImethod appeared to fail mostly for cells with low firing rates, so we also added a hypothetical curve where we only included neurons with > 1000 spikes during the 480trial long epoch (MIMI1k). This significantly boosted the MIMImethod’s discriminability to an AUC similar to the ttest’s at 0.898. Perhaps choosing a different number of coefficients would improve the MIMI test, but the issue remains that this test does not work without manual tuning. These results suggest it will require significant work to develop a MIMIbased test that can compete with a ttest, and that even if we were successful, it might not exceed the ZETAtest’s performance. Moreover, as the MIMImodel requires parameters to be iteratively fitted to experimental data, it is multiple orders of magnitude slower than the ZETAtest and ttest. To conclude, full MIMImodel based methods do not seem to be suited for unsupervised, largescale use as neuronal responsiveness tests.
Decomposition of ZETA
Request a detailed protocolWhile using full multiplicative inhomogeneous Markov interval models of the form of Equation 25 is not a viable option when creating a responsiveness test, we used it as a starting point to further explore which properties of the ZETA allow it to function so well. As already noted above, the main problem for developing a robust responsiveness test is finding a suitable nullhypothesis distribution to test against. As a first naïve baseline, we built a simple test that checks whether a cell’s firing rate, binned with width τ and averaged across trials, differs from a homogeneous Poisson process with rate λ. Under this nullhypothesis, the number of spikes per bin X is therefore distributed as:
The nullhypothesis random variable H_{0} for rates averaged over T trials follows:
Whether a neuron’s observed number of spikes per bin, averaged over trials, differs from this null distribution can then be tested using a standard KolmogorovSmirnov test.
We benchmarked this approach with V1 cells, using 1 ms bins, and performed an ROC analysis of real inclusions versus false positives using two different ways to construct neuronal responses that are unmodulated by the visual stimulation: (1) we shuffled the interspike intervals to construct a new set of spike times (Figure 3—figure supplement 3A) and (2) we jittered the stimulus onset times by ±6 s (Figure 3—figure supplement 3B). We found that the Poissonbased test performed well if the H0 spiking statistics matched the shufflecontrol (PoissKS, AUC = 0.976), but failed due to high numbers of false positives when we instead jittered the stimulus onset times (PoissKS, AUC = 0.611). By construction, the fluctuations in the rate are not linked to the stimulus onset times anymore, but the distribution of the spike times over the trial period are not consistent with a Poisson neuron.
This may seem a trivial result, as the nullhypothesis distribution in this latter case does not match the null (Poisson) distribution used by the statistical test. This is a critical issue, however: in the case of real experimental data sets, there is no known ground truth, so a robust statistical test for neuronal responsiveness must be able to handle a wide variety of intrinsic spiking behaviors. Clearly, the Poisson test fails this requirement, as it is only able to distinguish i.i.d. Poissondistributed spiking from anything that is not exactly i.i.d. Poissondistributed spiking. For comparison, both the ZETAtest and ttest show robust behavior that is insensitive to the specifics of the shufflecontrol we use. The ZETAtest gives an AUC of 0.983 using ISIshuffles and 0.984 using onset jittering; and the meanrate ttest gives AUCs of 0.899 and 0.902 respectively. In the following section, we investigate which properties of the ZETAtest allow it to perform so much better than the Poisson test.
The first aspect we investigated is the assumption of the homogeneous Poissondistributed spiking when a neuron is unmodulated by visual stimulation. We know that neurons can be intrinsically bursting and have refractory periods, so even the most purely sensorydriven cell in V1 is likely to not fire i.i.d. Poisson when no visual stimulus is present. A more versatile and possibly more accurate H0 might be a renewal process. We therefore constructed an interspikeinterval (ISI)based test, where we first calculated a neuron’s interspike intervals dt from its spike time vector t:
We then randomly permuted the ISIs, creating a shuffled ISI vector dt^{s} and using it to construct a nullhypothesis vector of spike times t^{0}:
In effect, this nullhypothesis vector is a different random sample of the same renewal process that would generate the real neuron’s spiking times, under the simplifying assumption that the spike times are generated by a renewal process. We constructed mean firing rates x^{0} by binning t^{0} using 1 ms bins. We repeated this 100 times, and used a twosample KS test between the real binned spike count vector x and shufflecontrol spike count matrix X^{0}, as we also did for the Poisson test described above. Unfortunately, this SISIKS (Shuffled InterSpikeInterval KolmogorovSmirnov) test performed very similar to the Poisson test, resulting in an AUC of 0.985 for the ISIshuffle control, and an AUC of 0.610 for the jittercontrol.
This similarity might be explained by the ability of the KS test to distinguish with high sensitivity between the spiking distributions obtained from shuffling ISIs and jittering stimulus onsets, as these are not identical. If we wish to construct a more robust test, we must therefore use a procedure that is less sensitive to the full shape of the H0 distribution, and only takes into account the likelihood of observing extreme deviations from the average firing rate. This will make the test less sensitive to real stimulusinduced activity, but also less sensitive to errors in the specific shape of the null hypothesis distribution we use to estimate the neuron’s natural variability. We achieved this by using the Gumbeldistribution of maximum absolute deviations in our random ISIshuffle samples to calculate a pvalue of the maximum absolute deviation in the real, unshuffled, PSTH. The procedure works as intended: this SISIG test gives an AUC of 0.844 for the ISIshuffle controls, and an AUC of 0.821 for the jitter controls.
We noticed that the SISIG test suffered from high variance in the PSTH when using 1 ms bins, especially for cells with few spikes. One option would therefore be to increase the bin width, but this would come at the expense of temporal resolution and ability to detect short peaks of activity. We therefore opted instead to calculate the maximum absolute deviation of the cumulative sum of spikes counts, similar to the ZETAtest’s, but in this case over the discrete 1 ms spike count vector x^{0}. For the SISI∫G test, we defined the normalized cumulative spike count vector T0 (similar to Equation 29) as:
Moreover, as this would create fixed points with 0 variance at j = 1 and j = n, we also meansubtracted the T0 vector itself:
In essence, this test is a 1 ms binned and ISIshuffle based version of the binless onsetjittering ZETAtest. Benchmarking this test, we found that it performed close to, but slightly less well than, the ZETAtest. This SISI∫G test gave an AUC of 0.968 for the ISIshuffle controls and an AUC of 0.974 for the jitter controls.
Finally, we created an alternative, also binless, version of the ZETAtest where we created the null distributions by shuffling the interspike intervals rather than jittering the stimulus onsets. The alternative ZETAISI test performed at a level indistinguishable from ZETA; the ZETAISI gave an AUC of 0.986 for the ISIshuffle controls and an AUC of 0.982 for the jitter controls. To conclude; the (alternative) ZETAtest strongly outperforms other tests, mainly for two reasons: (1) using the Gumbel distribution to calculate a cell’s significance based on the most extreme stimuluslocked spiking deviation rather than a KS test allows the ZETAtest to be relatively invariant to the full, and a priori unknown, spike time distribution of a neuron and (2) using an integralbased approach has a timescalefree smoothing effect that reduces spurious peaks that can occur in the firing rate domain. Finally, the binless ZETAtest shows a small, but significant, improvement over its 1 ms binned cousin in terms of both statistical power and computational efficiency (Figure 3—figure supplement 3C). While the random null distribution (ISI shuffling or onset jittering) did not seem to have a large impact on the ZETAtest’s performance for this data set of predominantly regularspiking V1 neurons, the following section shows that this distinction becomes more important when one tests the responsiveness of bursting cells.
Simulated bursting cells: onset jittering versus interspike interval shuffling
Request a detailed protocolStimulus onset jittering and interspike interval shuffling produce different distributions, unless a neuron’s probability of spiking only depends on the time since the last spike. The previous section showed that, when using our V1 data set, performance has already saturated too much to show a difference between the ISIshuffle ZETA and onsetjitter ZETA tests. To better differentiate their performance, we therefore generated a population of simulated bursting cells. While bursting cells are rare in visual cortex, they are abundant in many brain regions, such as the subiculum and others (Cooper, 2002; Mattia et al., 1993). To test the performance on bursting cells, we generated artificial spike trains, using the parameters for burst properties from Chen et al., 2009 as listed in Table 1.
All cells were assigned a background singlespike firing rate that was on average 1 Hz; on top of these single spikes we generated bursts with varying length and interspike intervals. The interburst intervals were 16 s on average during intertrial intervals and nonpreferred stimuli; and 0.8 s during the neuron’s preferred stimulus. Each neuron had a randomly assigned preferred orientation and tuning width following a von Mises distribution. We presented 20 repetitions of 24 orientations (15 degree steps) for a total of 480 trials. We generated responses for 1,000 independently parameterized neurons (see Figure 3—figure supplement 3DE). To create stimulus nonresponsive neurons, we set the interburst intervals to be approximately 6.8 s both within and outside stimulus presentation; we chose this value to ensure the spiking rate distributions for stimulusresponsive and unresponsive neurons were approximately equal (25th75th population percentiles of firing rates; stimulusresponsive: 6.7–19.7 Hz; unresponsive: 6.9–20.7 Hz).
Running the same benchmark as before on this artificial data set, we found that the onsetjitter ZETAtest indeed outperformed both the ISIshuffle ZETAtest and meanrate ttest (Figure 3—figure supplement 3F). The ZETAtest gave an AUC of 0.941, the ISIshuffle ZETAtest an AUC of 0.845, and the meanrate ttest an AUC of 0.902. This difference in performance can be attributed to the fact that jittering stimulus onsets keeps the properties of individual bursts intact, while ISIshuffling changes these properties, leading to more variable burst spike trains. While shuffling of ISIs or stimulus onsets both produce spike trains that are unmodulated by stimulus presence, we have shown that they are not equivalent.
Multiscale derivatives of ZETA for latency detection
Request a detailed protocolThe ZETAtest indicates whether a neuron shows reliable deviations in spiking rate with respect to a particular series of events. The time of this maximum deviation, however, is not necessarily when the neuron shows its strongest firing rate modulation, but rather when the cumulative distribution reaches peak statistical significance. Therefore, in order to use the ZETA procedure to calculate the time of peaks in modulations (e.g. onset latencies), we should take the derivative of the temporal deviation vector d underlying ZETA. A naïve approach with a simple spiketospike derivative unfortunately yields a curve with many spurious peaks. One solution would be to calculate the derivative over a larger time interval, but this comes at the expense of temporal resolution. Moreover, many different cell types exist with different dominant time constants. To balance temporal resolution and robustness, we therefore developed a multiscale derivative procedure. First, we define a vector t of S timescales at which to compute derivatives. By default, we define the timescales to lie on a logarithmic scale with base 1.5, as this gave a reasonable tradeoff between computational speed and accuracy. Base values closer to one will give more accurate results at the cost of computational speed. For base b and a trial duration of τ:
where
The derivative $\dot{d}$ at spike i can then be defined for d_{i} and timescale t_{k} as:
where
Here, v are spike times, following the definition above. To avoid undefined edges, we set a and b to one and n respectively, iff v_{i} ± t_{k}/2 falls outside the interval [0, τ]. Taking the mean over all S timescales, we obtain an average of multiscale derivatives m,
which has two important properties. First, longtimescale derivatives tend to 0, so there is a bias of m to more strongly follow shorter timescales. Secondly, random noise at the shortest timescales averages out over multiple shorttimescale derivatives. Therefore, these two properties combined lead m to reflect the shortest timescales at which a real signal starts to emerge from random noise.
Calculation of highresolution instantaneous firing rates
Request a detailed protocolAnother interesting property of the mean multiscale derivative m is that it scales with the actual firing rate. In other words, it produces a timelocked neural activation curve, similar to a peristimulus time histogram (PSTH). If we properly rescale m, we can therefore create an instantaneous firing rate metric with a temporal resolution that is only limited by the spike density.
Remember that the temporal deviation vector δ itself is scaled to lie between –1 and +1, as its value depends on the difference between the fractional position of a spike (from 0 to 1) and the linear interval from x,y=[0,0] to [τ,1]. The theoretical lower limit of the multiscale derivative is therefore $1/\tau $. This can be illustrated as follows. Imagine a hypothetical neuron where all spikes are fired in an arbitrarily short interval close the start of each trial. This means that δ rises from 0 to 1 in a short interval and from then decays linearly from 1 to 0 over an interval of τ. As $\dot{d}$ is defined between two spikes (including window edges 0 and τ), this means that the lowest possible value in this extreme case is the point between the last spike n and τ, i.e.:
The lowest possible firing rate is 0 Hz, which therefore corresponds to $1/\tau $ . An upper bound for $\dot{d}$ does not exist, as an arbitrarily short interval with a finite number of n spikes would lead to arbitrarily high $\dot{d}$ :
This is a desirable property, as the maximum instantaneous firing rate of a neuron is not theoretically constrained. Finally, the average firing rate of our metric should correspond to the real average firing rate in Hz (n/τq), where q is the number of events as defined above. We therefore define our instantaneous firing rate metric r as:
Here, $\overline{m}$ is the weighted average of m by the interspike interval, such that the averaging occurs in the time domain and not the spikenumber domain:
Considering the definitions above, we can therefore state that the maximum firing rate in r occurs where
is at its maximum, with ${a}_{k}$ the index of the largest spike time smaller than ${v}_{i}\frac{{t}_{k}}{2}$ and ${b}_{k}$ the index of the smallest spike time larger than ${v}_{i}+\frac{{t}_{k}}{2}$ ; with ${t}_{k}$ being the logarithmically distributed time ranges. In the limit of a high number of spikes, we find that ${v}_{{a}_{k}}\approx {v}_{i}\frac{{t}_{k}}{2}$ and ${v}_{{b}_{k}}\approx {v}_{i}+\frac{{t}_{k}}{2}$ , and therefore,
which is an average of the instantaneous spike rates at ${v}_{i}$ at timescales $t}_{k$.
Meanrate artificial Poisson neurons
Request a detailed protocolWe tested whether ZETA required short bursts of activity to work by generating artificial spike trains that only varied in meanrate between a 1 s stimulus presentation and a 1 s interstimulus interval, and did not show onset peak responses. We created spike trains for 100 neurons with an orientation preference θ randomly sampled from a uniform distribution on the interval (0,π). The shape of the tuning curve was defined as the sum of two von Mises distributions centered at preferred orientation θ and θ+π with a concentration parameter of κ = 5 + ε, where ε was randomly sampled from a uniform distribution on the interval (0,5). The von Mises probability density function with mean θ and concentration parameter κ is given by:
Here, where I_{0}(κ) is the modified Bessel function of order 0. The baseline mean spiking rate µ_{base} was defined by randomly sampling from an exponential distribution with a mean of λ_{base} = 5 Hz. µ_{base} defined the trough of the neuron’s tuning curve (i.e. the activity at θ+π/2 and θπ/2) as well as the activity of the neuron when no stimulus was present. The firing rate for the preferred stimulus µ_{stim} was determined by similarly sampling from an exponential distribution with a mean of λ_{stim} = µ_{base} +20 Hz. We generated spiking activity for 160 trials (20 repetitions of 8 stimulus orientations: θ_{stim} = [0, 45, …, 315]). The average baseline firing across all n = 10,000 artificial neurons was therefore 5 Hz and the average firing rate during the preferred stimulus was 25 Hz. Spike times were generated for each trialepoch (stimulus/baseline) independently by consecutively drawing interspike intervals from a Poisson distribution with λ = 1/µ.
Artificial Poisson neurons for peaklatency benchmarking
Request a detailed protocolWe also addressed the question whether our instantaneous firing rate metric was sufficiently robust to allow accurate peaktime detection over a range of background firing rates and a range of peak widths. The procedure here was similar as above, with the exception that the firing rate during baseline and stimulus periods was identical: µ_{base} = µ_{stim}. We tested 13 base rates (0.5–32 Hz) and 19 jitter widths (1–10 ms in steps of 0.5). For each combination of base rate and jitter width, we generated 100 neurons and 100 trials per neuron. Peaks were added to the background activity by adding a single spike to half of all trials. Spike times were chosen by random sampling from a normal distribution with the standard deviation equal to the above jitter width and centered at 100 ± 10 ms after stimulus onset.
Figure panels 6 J,K used slightly different parameters. Instead, we used 10 jitter widths (1–10 ms in steps of 1), simulated only a base rate of 32 Hz, used 160 trials for 1000 neurons, and compared the peaklatency detection using the ZETAIFR with binning windows ranging from 1.00 ms to 57.67 ms; a logarithmic scale of base 1.5 with the exponent ranging from 0 to 10 in steps of 1.
Acquisition and preprocessing of laminar probe data (neuronexus and neuropixels)
Request a detailed protocolWe performed silicon probe recordings in six C57BL/6 mice, 2–7 months of age. Mice were housed in a 12 hr/12 hr dark/light cycle with ad libitum access to food and water. All experiments were approved by the animal ethics committee of the Royal Netherlands Academy of Arts and Sciences, in compliance with all relevant ethical regulations.
Mice were habituated for 1–4 weeks before being implanted with a cranial bar used for headfixation. Mice were anesthetized with isoflurane (3 % induction, 1–1.5% maintenance in 50 % O2) and injected subcutaneously with an analgesic and antiinflammatory compound (Metacam, 2 mg/kg). The eyes were protected from drying by Cavasan eye ointment. They were moved to a stereotact with a thermal mat to keep their core temperature at 37°C, and the fur on their heads was removed. Once anesthesia was sufficiently deep, as indicated by the absence of a toepinch reflex, we applied lidocaine locally on the skin of the head and sterilized it with 70%ethanol or betadine. The skin was removed, the skull cleaned, and a small metal rod was fixed to the skull anterior of bregma with the use of bluelight curing dental cement. If necessary, we sutured the skin, and let the mice recover for 2–7 days.
After the mice recovered as indicated by a return to their preoperative weight, mice were habituated to sitting headfixed in the electrophysiology rig for 3–10 days. Once habituated, they underwent a craniotomy surgery, following the same preparatory steps as described above. Before performing the craniotomy (1.5–3 mm in diameter), we first constructed a small ring of dental cement so the brain could be bathed in saline during recordings to avoid tissue desiccation. Once the craniotomy was complete, this ring was filled with sterile silicone, and the animals were left to recover for at least 16 hr. Over the next 1–7 days, we performed repeatedinsertion recordings using either NeuroNexus or Neuropixels silicon probes. For a subset of animals, we dipped the probe into DiI on the last day of recording, and perfused the animal to perform posthoc tracing of the electrode recording locations. For the remaining recordings, the probe position was determined using anatomical landmarks (i.e., bregma and lambda).
NeuroNexus recordings were performed using either a TuckerDavis Technologies digitizer and customwritten MATLAB code as described previously (Ahmadlou and Heimel, 2015). Neuropixels recordings were performed using a National Instruments I/O PXIe6341 module and SpikeGLX (https://github.com/billkarsh/SpikeGLX). Visual stimulation was performed as described previously (Montijn et al., 2016a), and synchronized with high accuracy ( < 1 ms) using photodiode signals that recorded visual stimulus onsets. Spikes were sorted posthoc using Kilosort2 (https://github.com/MouseLand/Kilosort2, Pachitariu, 2021) and only clusters of sufficient quality, as defined by Kilosort2’s default threshold, were included for further analysis. Highquality clusters (i.e. putative neurons) were assigned a brain region using the AllenCCF MATLAB toolbox (https://github.com/cortexlab/allenCCF, Peters, 2021), which automatically calculates a neuron’s anatomical position based on penetration location, angle, and depth of the silicon probe. Abbreviations for brain areas mostly follow the Allen Brain Atlas area codes. V1: primary visual cortex. AM: anteromedial visual cortex. PM: posteromedial visual cortex. LGN: lateral geniculate nucleus. LP: lateral posterior nucleus. NOT: nucleus of the optic tract. APN: anterior pretectal nucleus. SC: superior colliculus. All code used in laminar probe data acquisition and preprocessing is available online (https://github.com/JorritMontijn/Acquipix, Montijn, 2021).
Visual stimulus parameters
Visual stimuli during Neuronexus and Neuropixels recordings were shown at 60 Hz on a 51 by 29 cm Dell screen at 17 cm distance from the animal’s eyes, using Psychtoolbox three in Matlab. Drifting gratings were displayed within a 120 visualdegree diameter window with two visualdegree cosine edge that faded smoothly into a neutralgray background. Drifting gratings were shown in 24 directions: [0, 15, … 345] degrees at a spatial frequency of 0.05 cycles per degree and a temporal frequency of 1 cycle per second. Natural movies were 20 s long and consisted of four distinct scenes taken from the BBC nature documentary Earthflight (Montijn et al., 2016a).
Acquisition and preprocessing of retinal multielectrode array data
Request a detailed protocolAdult ( > 1 year) zebrafish (Danio rerio), were darkadapted for at least 1 hr. Then under IR illumination fish were euthanized by rapid immersion in icecold water, the eyes removed, the retina isolated and placed photoreceptor side up on a perforated 60 electrode array (60pMEA200/30iRTi using a MEA2100 system: Multichannel systems, Reutlingen, Germany) in a recording chamber mounted on an Nikon Optiphot2 upright microscope and viewed under IR with an Olympus 2 x objective and video camera (Abus TVCC 20530). Room temperature Ames’ medium (SigmaAldrich) gassed with a mixture of O2 and CO2 at pH of 7.6 continuously superfused the MEA recording chamber.
Extracellular multiunit GC activity was recorded at 25 kHz in MC rack (Multichannel systems, Reutlingen, Germany), zerophase bandpass filtered (250–6250 Hz) with a fourthorder Butterworth filter in Matlab (MathWorks, Natick, MA, USA), and sorted into singleunit activity with ‘offline spike sorter’ (Plexon, Dallas, TX, USA). Spikes were detected using an amplitude threshold > 4σ_{n} where σ_{n} is an estimation of the background noise
with x being the bandpassfiltered signal (Quiroga et al., 2004). The detected spikes were manually sorted into single units based on the first two principal components versus time.
The light stimulus consisted of a 500 ms full field light flash, preceded and followed by a 500 and 1000 ms period of darkness, generated using Psychophysics Toolbox Version 3 (Brainard, 1997; Kleiner et al., 2007), and repeated either 50 or 100 times. Stimuli were projected onto the retina from the photoreceptor side by a DLP projector (Light Crafter 4500, Wintech, Carlsbad, CA, USA) using a custombuilt 2 x water immersion objective. Only white light stimuli were used. The “dark” light intensity was 6 μW/m2, and the maximal ‘light’ intensity was 176.2 μW/m^{2}.
Acquisition and preprocessing of calcium imaging data
Request a detailed protocolDrifting grating responses were recorded as previously described (Montijn et al., 2016a) and putative spike times were extracted using an exponential fitting algorithm (Montijn et al., 2016b). All codes used in preprocessing of drifting grating calcium imaging data are available online (https://github.com/JorritMontijn/Preprocessing_Toolbox).
Virtual corridor experiments were performed on adult male RCFLtdTOM x PV FlpO x VIP2rcre (n = 1) and VIP2rcre mice (n = 3). During water restriction, the weight of the animals was carefully monitored and kept stable through powdered milk intake (10 % diluted in drinking water) in the corridor or additionally fed solid drink. Mice were kept on a reversed day/night schedule (12 hr/12 hr) and experiments were performed in their active dark phase.
For virus injection and chronic window implantation procedures, animals were anesthetized with isoflurane at 5 % for induction and 1.0–1.5% for maintenance. Metacam (1 mg/kg) was administered subcutaneously for systemic analgesia and dexamethasone (8 mg/kg) was administered to prevent brain swelling. The eyes were covered throughout the surgeries with Cavasan ointment. For virus injections, three small holes were drilled around the center of the right V1 (2.9 mm lateral from midline, 0.5 mm anterior from lambda). Animals were injected with salinediluted AAV2/1.hSyn.GCaMP6f.WPRE.SV40 (Addgene) at a depth of 250 and 550 µm (final concentration, 10^{12} viral particles/ml; 36.8 nl/depth/location). Mice were allowed to recover for a minimum of 2 weeks in their home cage before window implantations. For chronic window implantations, a custommade head bar was positioned above V1 and fixed on the skull with dental cement (Kerr). A circular 3 mm craniotomy was made on the area of injection while the dura was kept intact. A double glass window (3 + 4 mm diameter) was placed inside the craniotomy and fixed on the skull with dental cement (Kerr). Animals were allowed to recover for a minimum of 1 week before training began.
Before imaging, mice were trained to being head fixed and run through a virtual corridor on a custommade treadmill (Figure 7A). Running speed of the mice was measured with a rotary encoder and processed using an Arduino and Matlab. Absolute running speed was used to render the virtual corridor in realtime. The left half of the virtual corridor was rendered on a gammacorrected monitor (Dell) placed under an angle of 45° with a mirror to create the perception of a symmetrical corridor. The virtual corridor was written in Matlab using OpenGL and Psychophysics Toolbox three and contained a 100 cm black and white Gaussian noise texture with overlying visual stimuli. We used three vertical gratings and three checkerboard stimuli positioned 11 cm apart, at locations 22–77 cm (Figure 7B). After completing a run in the corridor, mice were immediately shown a luminancematched gray screen followed by a 0.5 s auditory cue (8 kHz) 1 second later. Two seconds after this cue, the mice received a 5–10 µl milk reward. Mice were trained for ~ 10 sessions until they completed up to ~ 150 trials during imaging sessions.
Imaging was performed with a twophoton microscope (Neurolabware) equipped with a Tisapphire laser (MaiTai ‘Deepsee’, Spectraphysics; wavelength, 920 nm) and a 16 x, 0.8 NA water immersion objective (Nikon). The microscope was controlled by Scanbox (Neurolabware) running on Matlab. One one mouse we performed dualplane imaging at 15.5 Hz/plane using an electrically tunable lens (OptoTune). The other three mice were imaged in a single plane at 15.5 Hz. During the preprocessing stage, we discarded all interneurons and only included putative pyramidal cells for further analysis.
Analysis of virtual corridor jointfeature encoding
Request a detailed protocolFor each neuron, we calculated ZETAscores after aligning the putative spike times to trial start (‘timealigned’), converting spike times to locations and aligning them to the beginning of the corridor (‘locationaligned’), or aligning the spike times to a visuomotor mismatch event, where rendering of the virtual corridor was paused for 500 ms (‘mismatchaligned’) (Figure 7C). We call the resulting values timemodulation, locationmodulation and mismatchmodulation respectively. In the case of the spatial “locationaligned” analysis, we furthermore discarded the first and last 10 % of the track to avoid including the start and end box locations. We investigated whether there was a relationship between the modulation values that neurons showed for these three different features in two ways. First, we simply calculated the Pearson correlation at the level of a single recording between time/location, time/mismatch, and location/mismatch modulations (Figure 7D–F). A onesample ttest on these correlation values showed there was a significant, positive correlation for time/location and a significant, negative correlation for location/mismatch. However, this analysis does not directly answer the question whether the joint encoding of two features at the level of single neurons is more or less likely than we would expect by chance.
We therefore zscored the modulation values per recording and pooled all neurons. We then used a kerneldensity estimator (KDE) to construct a probability density estimate for the distribution of modulation values for time, location, and mismatch (Figure 7—figure supplement 1). Combining these distributions for a pair of features, we obtain a nullhypothesis distribution of what the jointfeatureencoding distribution would look like if the modulationscores are independent of each other. Using the same bandwidths as for the singlefeature KDEdistributions, we smoothed the real data so we could directly compare which regions are over or underrepresented in the real data, producing the heat maps in Figure 7G–I. To quantify this over/underrepresentation, we counted the number of neurons in the upperright quadrant, where neurons lie that have high modulation scores for both features. We compared the real count to the distribution obtained from 10,000 random samples taken from the jointfeature KDEderived nullhypothesis distribution, where there is no correlation between the two feature modulation scores. As shown in Figure 7GI, Figure 7—figure supplement 1, time and locationmodulation scores were more often both high in single neurons than expected from chance. Moreover, there were fewer neurons that showed a joint encoding of both spatial location and visuomotor mismatch than expected from chance.
VIPoptogenetics analysis of Allen Brain Institute cephys data
Request a detailed protocolDetailed information on experimental and data acquisition procedures can be found online at the Allen Brain Institute website: https://portal.brainmap.org/explore/circuits/visualcodingneuropixels. We used data from 5 VipIRESCre; Ai32 mice that underwent laserbased optogenetic stimulation. We preselected 1706 clusters that were recorded in visuallyresponsive cortex (AL, AM, PM, L, V1, RL, or MMP) and were of sufficient quality, specifically: KiloSort2 tagged the cluster as “good”, ISI violations were under 0.5, amplitude thresholds under 0.1, and presence ratios over 0.9. The next step was to cull this population to only cells that showed modulated spiking with respect to the onset of optogenetic stimulation. We therefore included N = 1144 cells that showed P < 0.05 with a ZETAtest on the window between (–0.5, + 0.5 s) after optogenetic stimulation. We calculated the peakresponse latency using our IFR method and discarded cells with peak responses earlier than 1 ms after optogenetic stimulation onset (N = 909). The remaining cells were classified as VIP if their IFR peak latency was < 10 ms (N = 13), as Inhibited if their firing rate was significantly lower during 10–30 ms after optogenetic stimulation than during the 50 ms preceding optogenetic onsets (N = 59), as Activated if their rate was significantly higher during 10–30 ms after optogenetic onset than during the prestimulus baseline (N = 137), and otherwise as Other (N = 700). The large size of this latter group could be explained by many cells being visuallyresponsive to the blue laser light. We chose a 10–30 ms window to compare the IFR peak/trough latencies of Activated and Inhibited cells, as 10 ms was the time of laser offset, and visual responses start to emerge in visual cortex after about 30–50 ms.
A Proof of TimeInvariance
Request a detailed protocolWe have described some properties of the ZETAtest in the main method section, but we have not yet explained what the function is of the meansubtraction of δ in Equation 14. This step plays a critical role in ensuring that the ZETAtest is timeinvariant: i.e., that the latency of a neuronal response with respect to the stimulus onset does not affect the statistical significance of the ZETAtest.
We can see that this is the case if we have made a specific choice for the trial onsets, consisting of consecutive intervals of τ, and made a set of n spike times v_{1} to v_{n} in the interval [0, τ]. First we rewrite Equations 1–4 as:
Recall that d_{i} = δ_{i} – 1 /n Σ_{i} δ_{i} (Equations 6; 7). Now, consider a shift of the trial onset times by Δ and let v_{k} be the highest spike time smaller than Δ. This results in a new set of n spike times v_{i}’:
Note that Equation 47 implies circular time with the recording wrapping back to the beginning at the end of all trials, which we assume here to keep n constant. If we define δ_{i}’, analogous to δ_{i}, and use Equation 46–47 we find:
And if we subtract its mean, the constants are removed, and we get:
The set of d’_{i} are thus identical to the set d_{i} except for a reordering. The maximum of the set d_{i} will therefore also be the maximum of the set d_{i}’. This means that it does not matter when the trial onsets are taken to compute ZETA. Note that this is not the case if δ_{i} is used instead of d_{i}.
To illustrate this difference, we will now derive closedform solutions for the expectation and variance of δ_{i} and d_{i} in the specific case of stepwise changing Poissondistributed spiking rates. Note that this section serves only to illustrate the above derivation (Equation 45–51) for a specific case, so the reader may choose to skip the rest of this section without missing out on any particularly important comments.
First, recall the base variables: we use v as a vector of spike times relative to stimulus onset, with the total onsettoonset epoch duration defined as τ (see equation 1). We will set the neuron’s firing probability to be homogeneous with Poisson rate λ. Since exponentially distributed interspike intervals generate Poissondistributed spike counts, we use:
Therefore, spike time w_{i} is:
In the limit of large n, the incremental deviation $\dot{D}}_{i$ from an exactly uniform spiketime distribution (i.e., 1/λ) is therefore:
In reality, however, the rate $\widehat{\lambda}$ is estimated from data that are limited by the observation window τ, number of trials m, and observed number of spikes $n=m\tau \widehat{\lambda}$ . First we collapse all spikes over trials, following Equation 1, such that v_{n}=τ. Then we normalize by τ to make v_{n} = 1, which means that we now have:
generating spike times
with incremental deviations
We can confirm this is correct by rewriting the total deviation δ_{i} at spike i as:
To compute a closedform for the variance of d, we need the first and second central moments of δ (i.e., the mean and variance). δ depends on $\dot{\delta}$, the variance of which is:
In the case for large n, each $\dot{\delta}$ is an exponential random variable. As δ_{i} is a sum over $\dot{\delta}}_{\mathrm{s}$, we can approximate its pdf as an Erlang distribution with scale parameter k equal to the spike number i:
The Erlang distribution is a special case of the gamma distribution with discrete parameters. An equivalent formulation is therefore:
However, we have so far ignored that δ_{i} is meanzero and fixed at 0 at t = 0 and t = τ. For a Wiener process W, such fixed points are known as a Brownian bridge (Mansuy and Yor, 2008), which is described by:
However, since the underlying stochastic process in our case is not a standard normal, but gammadistributed, we cannot directly apply the above equation. Instead, we found that with sufficiently large n, the behavior of δ is described by a weighted difference of two timesymmetric series of gammas (see Figure 2—figure supplement 1B). One series grows as Equation 60, whereas its symmetric counterpart shrinks as k = ni:
The central moments of the difference between two gamma distributions are given by Klar, 2015:
Where the subscripts indicate the two distributions and α_{1}= k = i, α_{2}= n  i, and β_{1} = β_{2} = 1/λ. Filling in the above parameters and weighting variables, we therefore get:
Now, we need to compute the variance over d_{i}, which is defined as the meansubtracted δ_{i}.
To simplify the following derivation, we will assume that time is circular (as in Equation 47) and that jittering does not strongly impact the number of spikes n, so we can we treat n as a fixed number of samples. As shown in Figure 2—figure supplement 1C, these assumptions allow accurate estimations. Now, we consider v_{k}, $1\le k\le n$ taken from a uniform distribution on [0,1] and ordered such that ${v}_{1}\le {v}_{2}\le \dots \le {v}_{n}$. We recognize the v_{k} as the order statistics of the sample, for which the probability distribution is
The expectation of ${v}_{k}$ and ${v}_{k}^{2}$ are given by
Now, redefine
Then using Equation 69 and Equation 70, we find
We find that the variance of ${\delta}_{i}$ is parabolic with respect to i, with its minimum at $i=1$ and $i=n$ and its maxium at the middle between those points. The maximum of all ${\delta}_{i}$ is thus also more likely to at the middle ${\delta}_{i}$ than at the extremes.
We define again
with $\overline{\delta}=1/n{\mathrm{\Sigma}}_{i}{\delta}_{i}$. We know already that $E({d}_{i}^{2})$ is the same for all i, and we can thus write
To compute $E({\overline{\delta}}^{2})$ , we need to know $E\left({\delta}_{i}{\delta}_{j}\right)$ and therefore $E\left({v}_{i}{v}_{j}\right)$. To compute $E\left({v}_{i}{v}_{j}\right)$ for $j>i$, we can write ${v}_{j}={v}_{i}+{x}_{ji}$ and understand that ${x}_{ji}$ follows the same order statistic distribution as ${v}_{i}$ except that there are now only $ni$ samples taken from an interval $\left[\mathrm{0,1}{v}_{i}\right]$ and ${x}_{ji}$ is the ji^{th} sample.
Using this we see that for$j>i$
Then we find, after a long but straightforward calculation, that
And therefore
Note that the dependence on i has disappeared: i.e., the variance of d is timeinvariant; which is what we aimed to show. Also note that while we made various assumptions to simplify the above derivations, our theoretical solutions accurately predict simulated data (Figure 2—figure supplement 1); showing these assumptions have little impact on the results.
If we could now compute Var[max(d’)], E[max(d’)] and E[max(d)] from the above solutions for d’, we would even be able to construct a closedform solution for the ZETAtest’s pvalue in the case of exponentiallydistributed interspike intervals. Unfortunately, the distribution of max(d) is unknown and fairly complex, because the elements of d are not statistically independent.
Data availability
As stated in the manuscript, opensource code for the ZETAtest is available at https://github.com/JorritMontijn/ZETA and https://github.com/JorritMontijn/zetapy. Furthermore, code to reproduce the ZETA benchmarks are available at https://github.com/JorritMontijn/ZETA_analysis_repository (copy archived at https://archive.softwareheritage.org/swh:1:rev:58dc4d8d3e9db6c06906445a8c3fa4a253b1fe3a) The Neuropixels data are annotated and available here: https://doi.org/10.5061/dryad.6djh9w108.

Dryad Digital RepositoryZETA benchmarking neuropixels data.https://doi.org/10.5061/dryad.6djh9w108
References

Preference for concentric orientations in the mouse superior colliculusNature Communications 6:6773.https://doi.org/10.1038/ncomms7773

Informationlimiting correlations in large neural populationsThe Journal of Neuroscience 40:1668–1678.https://doi.org/10.1523/JNEUROSCI.207219.2019

Power failure: Why small sample size undermines the reliability of neuroscienceNature Reviews. Neuroscience 14:365–376.https://doi.org/10.1038/nrn3475

BookEfficient AUC optimization for classificationIn: Kok JN, Koronacki J, Lopez de Mantaras R, Matwin S, Mladenič D, Skowron A, editors. Knowledge Discovery in Databases. Berlin Heidelberg: Springer. pp. 42–53.https://doi.org/10.1007/9783540749769_8

Detection of bursts in neuronal spike trains by the mean interspike interval methodProgress in Natural Science 19:229–235.https://doi.org/10.1016/j.pnsc.2008.05.027

The significance of action potential bursting in the brain reward circuitNeurochemistry International 41:333–340.https://doi.org/10.1016/s01970186(02)000682

A canonical microcircuit for neocortexNeural Computation 1:480–488.https://doi.org/10.1162/neco.1989.1.4.480

On the histogram as a density estimator:L 2 theoryZeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 57:453–476.https://doi.org/10.1007/BF01025868

A theory of cortical responsesPhilosophical Transactions of the Royal Society of London. Series B, Biological Sciences 360:815–836.https://doi.org/10.1098/rstb.2005.1622

Perceptions as hypothesesPhilosophical Transactions of the Royal Society of London. Series B, Biological Sciences 290:181–197.https://doi.org/10.1098/rstb.1980.0090

The return period of flood flowsThe Annals of Mathematical Statistics 12:163–190.https://doi.org/10.1214/aoms/1177731747

BookStatistical Theory of Extreme Values and Some Practical Applications: A Series of LecturesU.S. Government Printing Office.

The extent and consequences of Phacking in sciencePLOS Biology 13:e1002106.https://doi.org/10.1371/journal.pbio.1002106

Receptive fields of single neurones in the cat’s striate cortexThe Journal of Physiology 148:574–591.https://doi.org/10.1113/jphysiol.1959.sp006308

A spiketrain probability modelNeural Computation 13:1713–1720.https://doi.org/10.1162/08997660152469314

A note on gamma difference distributionsJournal of Statistical Computation and Simulation 85:3708–3715.https://doi.org/10.1080/00949655.2014.996566

Optical voltage imaging in neurons: Moving from technology development to practical toolNature Reviews. Neuroscience 20:719–727.https://doi.org/10.1038/s4158301902314

BookAspects of Brownian MotionHeidelberg, Germany: SpringerVerlag.https://doi.org/10.1007/9783540499664

Anatomy of hierarchy: Feedforward and feedback pathways in macaque visual cortexThe Journal of Comparative Neurology 522:225–259.https://doi.org/10.1002/cne.23458

Membrane properties of rat subicular neurons in vitroJournal of Neurophysiology 70:1244–1248.https://doi.org/10.1152/jn.1993.70.3.1244

Robust quantification of orientation selectivity and direction selectivityFrontiers in Neural Circuits 8:92.https://doi.org/10.3389/fncir.2014.00092

Modality and topographic properties of single neurons of cat’s somatic sensory cortexJournal of Neurophysiology 20:408–434.https://doi.org/10.1152/jn.1957.20.4.408

Highly selective receptive fields in mouse visual cortexThe Journal of Neuroscience 28:7520–7536.https://doi.org/10.1523/JNEUROSCI.062308.2008

ConferenceProcessing Supplementary Proceedings ICANN/ICONIP 2003In Artificial Neural Networks and Neural Information.

On the significance of correlations among neuronal spike trainsBiological Cybernetics 59:1–11.https://doi.org/10.1007/BF00336885

Orientation selectivity in macaque V1: Diversity and laminar dependenceThe Journal of Neuroscience 22:5639–5651.https://doi.org/10.1523/JNEUROSCI.221305639.2002

Integration of visual motion and locomotion in mouse visual cortexNature Neuroscience 16:1864–1869.https://doi.org/10.1038/nn.3567

A method for selecting the bin size of a time histogramNeural Computation 19:1503–1527.https://doi.org/10.1162/neco.2007.19.6.1503

Reward timing in the primary visual cortexScience 311:1606–1609.https://doi.org/10.1126/science.1123513

Theories of Error BackPropagation in the BrainTrends in Cognitive Sciences 23:235–250.https://doi.org/10.1016/j.tics.2018.12.005

Correlation of local and global orientation and spatial frequency tuning in macaque v1The Journal of Physiology 557:923–933.https://doi.org/10.1113/jphysiol.2004.062026
Decision letter

Mark CW van RossumReviewing Editor; University of Nottingham, United Kingdom

Joshua I GoldSenior Editor; University of Pennsylvania, United States

Jonathan D VictorReviewer; Weill Cornell Medical College, United States
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
A central question in neuroscience is whether a neuron responds to a certain stimulus. However, when the response is complex, e.g., biphasic where an increase in rate is followed by a lower rate, this is not always easy to determine. In this methods paper, the authors introduce a new binless method that detects whether a neuron responds to a stimulus. In particular for labs using highthroughput data collection, the method should be useful.
Decision letter after peer review:
[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]
Thank you for submitting your work entitled "A parameterfree statistical test that improves the detection of neuronal responsiveness" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Jonathan D Victor (Reviewer #3).
Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.
While the reviewers saw considerable merit in the study, in particular regarding the extensive and robust result of applying the method to data, the claims regarding the method being state of the art were found to be insufficiently convincing for eLife.
Reviewer #1:
This paper introduces a, as far as I know, new method to detect modulation of neural firing rates. It is particularly more sensitive to neurons with a biphasic modulation of rate.
The method is straightforward and perhaps not very advanced but should be of interest to a wide community, also because it does not depend on binning.
The study appears to be competently carried out and applies the method to a very large range of systems.
Reviewer #2:
This paper will be of interest to researchers using electrophysiology or Calcium imaging who want to detect whether cells are responsive to a stimulus and are concerned that testing the mean firing rate may miss cells that are responsive through temporal patterns of their spike trains. The method reported here is rigorous and is an advance compared to doing a standard statistical test on mean firing rate, since it is sensitive to additional temporal information. However, other methods for detecting responsiveness based on temporal information are available and it is not clear that the new test has a meaningful performance advantage compared to these.
The goal here is (1) to develop a reliable method for detecting whether and when a neuron responds to stimulation and does so without requiring the bin size to be specified and (2) that this method is superior to previous approaches. As stated by the authors "we present the parameterfree ZETAtest, which outperforms ttests and ANOVAs by including more cells at a similar falsepositive rate".
The authors have achieved the first goal; developing a parameterfree "zeta test", which works by testing whether the cumulative distribution of spike times deviates significantly from what would be expected from random firing. However, achievement of the second goal is not proven.
Readers should consider the following when reading the paper:
1. The binless aspect of the zeta test is elegant as it avoids the need to choose a binsize for the analysis.
2. That the zeta test is more sensitive than a ttest is a straightforward consequence of the fact that the two tests are applied to different inputs. The ttest is used to compare whether the mean firing rate in some temporal interval is greater during the stimulus compared to baseline. The zeta test works on the basis of the spike times within the response interval. The increased sensitivity of the zeta test compared to the ttest stems from the property that the timing of spikes within an interval necessarily conveys as much or more information then the mean firing rate within the interval.
3. Given the previous point, a better benchmark for the zeta test is against other approaches that take spike timing into account. The authors consider an ANOVA, which splits the response interval into bins, and tests whether or not the pattern of firing rate over those bins is temporally modulated. They find that the zeta test is more sensitive than the ANOVA for some binsizes. However, for the most physiologically relevant binsizes (those around the timescale of the stimulus), the two methods perform similarly. The apparent advantage of the zeta test is largest at small bin sizes < 1 ms. It is possible that the apparent performance advantage in these small bins compared to 20100 ms bins, is a sampling artefact. Further work is needed to exclude this. In sum, contrary to the claim of the abstract, it is not clear that the zeta method has a genuine performance improvement compared to ANOVA.
Overall, following on from the comments above, this method will be of potential interest to a broad community of researchers seeking that measure neural activity with cellular resolution (extracellular electrophysiology and Calcium imaging) but the advance is at a level more suited to a specialised journal.
The authors find the inclusion rates of the zeta and ANOVA to be indistinguishable for binsizes of ~220 times the sampling interval of the movie, so within this broad range of timescales, there is no performance advantage for the zeta test. There is an advantage for smaller bins and for larger bins. The advantage for larger bins is presumably for the same reason as occurs in the ttest section (this should be explained in Results). The advantage for small bins is problematic for two reasons. First, the range of bin sizes is taken down to < 0.1 ms. This is 100 times less than the natural timescale of the movie stimulus (its frame interval) so, although not impossible, it is surprising/interesting/unconvincing that there is extra temporal information there missed by the ANOVA. Another possibility (as extensively considered in the information theory literature, e.g. Panzeri et al., 2007) is that this is a sampling artefact. Further work is needed to address this point, for example by using simulated data where the response timescales are controlled. It is also possible that the ANOVA is underperforming due to overconservative nature of Bonferroni correction. the ANOVA's inclusion rate might be better with multiple comparisons control done via the BenjaminiHochbergYekutieli procedure.
Reviewer #3:
This work presents a new approach to determining whether neuronal responses are modulated in response to a stimulus. The approach is motivated by the need for statistical tests that do not require a choice of a timeframe for analysis, or other analysis parameters. The investigators derive a null distribution for the new measure, provide code for its implementation, and benchmark it against the ttest for both simulated and real data in several diverse systems.
While I am generally sympathetic to the motivation of the work, I have both conceptual and technical concerns. The most critical is that, while I think they have presented strong evidence that the tool is useful, I don't think that they have shown that it advances the state of the art (point 1 below): they compare their approach to a ttest, which I think is something of a straw man. I believe the biological results are strong and interesting, but, since the paper is submitted as "Tools and Resources" paper, my review focuses on the tool itself, and whether it has been shown to be an advance.
1. The main comparisons are to the ttest. While the ttest may well be widely used and simple, it is far from the state of the art. Whenever a response is modulated in time (and not just steadily elevated), there is almost always a better approach. For example, for periodic stimulation, such as the examples of Figure 2, one can use Fourier analysis to identify modulations in the response. More generally, point process models – see for example the work of Emery Brown and Rob Kass, including the Kass, Eden, and Brown textbook "Analysis of Neural Data", provide a theoretical framework for detecting response modulations in a wide variety of settings, as well as determining when the peak modulation occurs; these methods provide estimates of the underlying firing rate along with error bars. As is the case for the proposed method, these pointprocess approaches work directly with spike times, and do not require the choice of a bin width. So, while it is a good point that tstatistics may underestimate the fraction of neurons that are modulated by an experimental manipulation in some situations, there are a number of standard solutions already available. The manuscript does not discuss these other approaches or compare the new approach to them, and therefore stops substantially short of showing that the ZETAtest is an advance on the state of the art. [On a subjective note, I think that as a general approach, the pointprocess framework mentioned above is superior conceptually, as it begins with an explicit statistical model and an inference framework.]
2. For the examples in which the response is constructed to be a pure elevation of the firing rate, the ZETA test does not appear to be superior (Figure 4, row 1): there is a larger fraction of cells detected as responsive, but also a greater number of false positives. An ROC curve would likely show this. But more importantly, if the underlying firing statistics are Poisson – as they are constructed to be – then, since the firing rate is a sufficient statistic, simple estimation of the firing rate cannot be worse than the zeta statistic.
3. The derivation of the null distribution for the statistic makes a lot of assumptions, including the extent of jittering and the use of extremevalue asymptotics; there may be a much simpler way to the goal. If I understand correctly, the idea is to construct an estimate of the cumulative distribution of spike times, and to compare that with the null hypothesis that they are uniformly distributed. If that is the case, perhaps one could just use the KolmogorovSmirnov test? This also brings up another issue: if the neuron's underlying firing pattern is far from Poisson, e.g., that it is similar to a renewal process whose interspike interval distribution is highly peaked – will the zeta test work?
4. Finally, I think that the virtues of a "nonparametric" test is not as obvious or universal as they might first appear. On the one hand, even with a test such as the one proposed here, one still needs to choose the time period to analyze. On the other hand, one often cares very much about the timescale of the response variation, both to exclude phenomena that one is not interested in, and to understand what aspects of the response are informative. This point is of course not germane to whether the paper makes an advance, but I think it would be preferable to discuss these considerations, rather than to assert that nonparametric tests are preferable.
https://doi.org/10.7554/eLife.71969.sa1Author response
[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]
Reviewer #1:
This paper introduces a, as far as I know, new method to detect modulation of neural firing rates. It is particularly more sensitive to neurons with a biphasic modulation of rate.
The method is straightforward and perhaps not very advanced but should be of interest to a wide community, also because it does not depend on binning.
We would like to thank the reviewer for the positive comments and review. As noted above, we now show that the ZETAtest is not only superior in the case of biphasic neurons, but also for bursting cells (Figure 4). We have also added an analysis of the marginal case of purely exponentially distributed interspike intervals (ISIs); in this hypothetical scenario, the ttest exceeds the ZETAtest.
However, we also show that only small deviations from exact alignment are required for the ZETA test to exceed the ttest’s performance (Figure 4).
The study appears to be competently carried out and applies the method to a very large range of systems. All my comments are minor.
Reviewer #2:
This paper will be of interest to researchers using electrophysiology or Calcium imaging who want to detect whether cells are responsive to a stimulus and are concerned that testing the mean firing rate may miss cells that are responsive through temporal patterns of their spike trains. The method reported here is rigorous and is an advance compared to doing a standard statistical test on mean firing rate, since it is sensitive to additional temporal information. However, other methods for detecting responsiveness based on temporal information are available and it is not clear that the new test has a meaningful performance advantage compared to these.
The goal here is (1) to develop a reliable method for detecting whether and when a neuron responds to stimulation and does so without requiring the bin size to be specified and (2) that this method is superior to previous approaches. As stated by the authors "we present the parameterfree ZETAtest, which outperforms ttests and ANOVAs by including more cells at a similar falsepositive rate".
The authors have achieved the first goal; developing a parameterfree "zeta test", which works by testing whether the cumulative distribution of spike times deviates significantly from what would be expected from random firing. However, achievement of the second goal is not proven.
We thank the reviewer for these useful comments on our manuscript. We appreciate that our previous version lacked sufficient comparisons with more advanced statistical approaches. As noted above, we have added a considerable amount of new material to the manuscript, including multiple alternative statistical tests. We hope that the reviewer finds the new comparisons sufficiently convincing. Regarding the comparison with the multitimescale ANOVA, we believe that our previous manuscript was not sufficiently clear, and we have now rewritten parts of the description of the ANOVAbased analysis.
Readers should consider the following when reading the paper:
1. The binless aspect of the zeta test is elegant as it avoids the need to choose a binsize for the analysis.
Thank you, we appreciate the comment.
2. That the zeta test is more sensitive than a ttest is a straightforward consequence of the fact that the two tests are applied to different inputs. The ttest is used to compare whether the mean firing rate in some temporal interval is greater during the stimulus compared to baseline. The zeta test works on the basis of the spike times within the response interval. The increased sensitivity of the zeta test compared to the ttest stems from the property that the timing of spikes within an interval necessarily conveys as much or more information then the mean firing rate within the interval.
We agree with the reviewer that the number of data points used by a ttest (n=number of trials) is much lower than with the ZETAtest (n=number of spikes). However, the ttest does have access to information that the ZETAtest does not: the spike times used by the ZETAtest are flattened over trials, while the ttest uses the variability across trials. Therefore, when the variability across trials is low, but the variability within a trial is high, the ZETAcould perform worse than a ttest. The important analysis here would therefore be to find where their point of equality lies. As we show in our new Figure 4AB, the ttest exceeds the ZETAtest’s sensitivity only for biologically implausible exponentiallyspiking cells. Importantly, even in the case of strongly bursting cells (Figure 4CF), the ZETAtest outperforms the ttest. We discuss this issue in the paragraph “ZETAtest in the absence of short peaks of activity” (lines 172202).
3. Given the previous point, a better benchmark for the zeta test is against other approaches that take spike timing into account.
We have now added a detailed comparison of the ZETAtest with alternative tests derived from renewalprocess theory to the manuscript (lines 112117, 542697, Supplementary Methods and Figure 3 – Supplements 2,3), and performed a new comparison with an optimallybinned ANOVA (lines 149170, 533540, Figure 3). The ZETAtest is superior to all alternatives we have tested, and we now also describe more clearly which mathematical properties are crucial to the ZETAtest’s superior statistical sensitivity.
The authors consider an ANOVA, which splits the response interval into bins, and tests whether or not the pattern of firing rate over those bins is temporally modulated. They find that the zeta test is more sensitive than the ANOVA for some binsizes. However, for the most physiologically relevant binsizes (those around the timescale of the stimulus), the two methods perform similarly. The apparent advantage of the zeta test is largest at small bin sizes < 1 ms. It is possible that the apparent performance advantage in these small bins compared to 20100 ms bins, is a sampling artefact. Further work is needed to exclude this. In sum, contrary to the claim of the abstract, it is not clear that the zeta method has a genuine performance improvement compared to ANOVA.
Overall, following on from the comments above, this method will be of potential interest to a broad community of researchers seeking that measure neural activity with cellular resolution (extracellular electrophysiology and Calcium imaging) but the advance is at a level more suited to a specialised journal.
The authors find the inclusion rates of the zeta and ANOVA to be indistinguishable for binsizes of ~220 times the sampling interval of the movie, so within this broad range of timescales, there is no performance advantage for the zeta test. There is an advantage for smaller bins and for larger bins. The advantage for larger bins is presumably for the same reason as occurs in the ttest section (this should be explained in Results). The advantage for small bins is problematic for two reasons. First, the range of bin sizes is taken down to < 0.1 ms. This is 100 times less than the natural timescale of the movie stimulus (its frame interval) so, although not impossible, it is surprising/interesting/unconvincing that there is extra temporal information there missed by the ANOVA. Another possibility (as extensively considered in the information theory literature, eg Panzeri et al., 2007) is that this is a sampling artefact. Further work is needed to address this point, for example by using simulated data where the response timescales are controlled.
We apologize for not describing this procedure sufficiently clearly. The ZETAtest is intrinsically timescalefree, and we chose different bin widths (i.e., timescales) only for the various ANOVAs. The fact that the ANOVAs perform worse than the ZETAtest at short and long timescales is not an indication that the ANOVA missed information at those timescales. Rather, no information about the stimulus exists at those timescales, and therefore the ANOVA performs poorly. We agree with the reviewer’s logic if this were what our results showed, but this is not the case: the reviewer’s interpretation was caused by us describing our methods insufficiently clearly. We have made various changes to the text (lines 204229) and the legend of Figure 5 to further clarify that only the ANOVAs used different bin sizes.
It is also possible that the ANOVA is underperforming due to overconservative nature of Bonferroni correction. the ANOVA's inclusion rate might be better with multiple comparisons control done via the BenjaminiHochbergYekutieli procedure.
Using a less conservative correction procedure would increase the ANOVA’s inclusion, but also increase the number of false alarms. As such, it would not change the ROC curve shown in Figure 5C, except that the curve’s inclusion and FA rates would correspond to different values of the significance level α.
Reviewer #3:
This work presents a new approach to determining whether neuronal responses are modulated in response to a stimulus. The approach is motivated by the need for statistical tests that do not require a choice of a timeframe for analysis, or other analysis parameters. The investigators derive a null distribution for the new measure, provide code for its implementation, and benchmark it against the ttest for both simulated and real data in several diverse systems.
While I am generally sympathetic to the motivation of the work, I have both conceptual and technical concerns. The most critical is that, while I think they have presented strong evidence that the tool is useful, I don't think that they have shown that it advances the state of the art (point 1 below): they compare their approach to a ttest, which I think is something of a straw man. I believe the biological results are strong and interesting, but, since the paper is submitted as "Tools and Resources" paper, my review focuses on the tool itself, and whether it has been shown to be an advance.
1. The main comparisons are to the ttest. While the ttest may well be widely used and simple, it is far from the state of the art. Whenever a response is modulated in time (and not just steadily elevated), there is almost always a better approach. For example, for periodic stimulation, such as the examples of Figure 2, one can use Fourier analysis to identify modulations in the response. More generally, point process models – see for example the work of Emery Brown and Rob Kass, including the Kass, Eden, and Brown textbook "Analysis of Neural Data", provide a theoretical framework for detecting response modulations in a wide variety of settings, as well as determining when the peak modulation occurs; these methods provide estimates of the underlying firing rate along with error bars. As is the case for the proposed method, these pointprocess approaches work directly with spike times, and do not require the choice of a bin width. So, while it is a good point that tstatistics may underestimate the fraction of neurons that are modulated by an experimental manipulation in some situations, there are a number of standard solutions already available. The manuscript does not discuss these other approaches or compare the new approach to them, and therefore stops substantially short of showing that the ZETAtest is an advance on the state of the art. [On a subjective note, I think that as a general approach, the pointprocess framework mentioned above is superior conceptually, as it begins with an explicit statistical model and an inference framework.]
We understand the reviewer’s point of view that (A) the ZETAtest was not well described in terms of existing mathematical frameworks and (B) that more specialized and advanced techniques may outperform the ZETAtest.
Regarding point A: In this revised version, we have dedicated significant work to better ground the ZETAtest in existing mathematical frameworks. We compare the ZETAtest to alternative formulations derived from renewal and pointprocess models (lines 608697 and Figure 3 – Supplements 2,3), showing how ZETA can be constructed stepbystep from a simple Kolmogorov Smirnov test of real spiketimes versus a Poisson H0 distribution up to the fullfledged ZETAtest. Comparing the performance of these various alternative tests, we found that the ZETAtest outperforms all others (Figure 3 – Supplement 3). Moreover, we now describe how the ZETAtest achieves timeinvariance (Supplementary Methods), and show that an ANOVA that uses an optimal bin width still performs worse than the ZETAtest (lines 149170, Figure 3).
Regarding point B: We created the ZETAtest as an easytouse, firstpass test for neuronal responsiveness that requires no hyperparameter tuning and is relatively fast, as no such test yet exists for neurophysiological analysis. We do not claim that the ZETAtest always exceeds the performance of more sophisticated and specialized modelbased approaches, nor was this our aim. Rather, the point we wish to make in our manuscript is that the ZETAtest is a powerful, statistically sensitive test that exceeds the performance of other naïve approaches. That said, we acknowledge that the range of alternative statistical tests in our previous version was limited. We have therefore added a comparison with a multiplicative inhomogeneous Markov interval (MIMI) model, as described in (Kass and Ventura, 2001; Neural computation; Kass, Eden and Brown, 2014; Analysis of Neural Data), to the current version. We use it to both detect responsive cells (Figure 3 – Supplement 2) and to determine response latencies (Figure 6). In both cases, the MIMImethod performed somewhat mediocrely. It was very well able to detect responsive cells with many spikes, but showed a high false alarm rate for cells with few spikes, presumably due to overfitting. As we also note in the manuscript, the MIMImethod may be a useful technique in many regards, but it requires hyperparameter tuning, such as regularization and the choice of number and location of knots, in order to perform well. It is therefore not suited as an automated generalist test for responsiveness of neurons with varied spiking statistics. Moreover, any alternative method that uses model fitting will likely show similar requirements. Using the MIMImethod to determine latencies, we found that it showed more timescaleinvariance than PSTHbased approaches, but still did not perform as well as the ZETAIFR (Figure 6). We made numerous edits throughout the manuscript to better communicate the aim and applicability of the ZETAtest, and expanded the discussion of the proposed scope of the ZETAtest (lines 377418). In summary, full MIMImodel based methods do not seem to be suited for unsupervised, largescale use as neuronal responsiveness tests.
2. For the examples in which the response is constructed to be a pure elevation of the firing rate, the ZETA test does not appear to be superior (Figure 4, row 1): there is a larger fraction of cells detected as responsive, but also a greater number of false positives. An ROC curve would likely show this. But more importantly, if the underlying firing statistics are Poisson – as they are constructed to be – then, since the firing rate is a sufficient statistic, simple estimation of the firing rate cannot be worse than the zeta statistic.
We have added a more indepth analysis of how the ZETAtest and ttest behave in the case of cells with purely exponentiallydistributed interspike intervals (Figure 4AB). Our analyses show that there indeed exists a range of a parameters where the ttest outperforms the ZETAtest. Importantly, however, this is a fairly narrow range and only holds for biologically implausible parameter values.
3. The derivation of the null distribution for the statistic makes a lot of assumptions, including the extent of jittering and the use of extremevalue asymptotics; there may be a much simpler way to the goal.
As we now also better explain in the methods (lines 514521), we use the extreme value asymptotics only to limit the number of random samples needed to calculate the pvalue of ZETA. In the limit of infinite random jitters, we would obtain the true null distribution. The use of the Gumbel approximation is therefore simply to make the test more computationally tractable.
If I understand correctly, the idea is to construct an estimate of the cumulative distribution of spike times, and to compare that with the null hypothesis that they are uniformly distributed. If that is the case, perhaps one could just use the KolmogorovSmirnov test?
We agree with the reviewer that this is indeed an interesting idea. We have therefore explored it in more detail. As we show in lines 608668 and Figure 3 – Supplement 3, the basis of ZETA is not dissimilar to a KS test, but a KS test shows a very high false alarm rate. In short, this is presumably caused by the issue that any deviation from nonuniform ISIs will lead to a low pvalue; i.e., stimulus nonresponsive, but bursting, cells will cause significant pvalues.
This also brings up another issue: if the neuron's underlying firing pattern is far from Poisson, e.g., that it is similar to a renewal process whose interspike interval distribution is highly peaked – will the zeta test work?
Again, we agree with the reviewer that this is an important question, and thank the reviewer for bringing this up. We further investigated this issue using simulated bursting cells. Our new analysis show that even in the absence of peaks, but in the presence of bursting and highly peaked ISI distributions, the ZETAtest performs very well, and better than the ttest (Figure 4CF).
4. Finally, I think that the virtues of a "nonparametric" test is not as obvious or universal as they might first appear. On the one hand, even with a test such as the one proposed here, one still needs to choose the time period to analyze. On the other hand, one often cares very much about the timescale of the response variation, both to exclude phenomena that one is not interested in, and to understand what aspects of the response are informative. This point is of course not germane to whether the paper makes an advance, but I think it would be preferable to discuss these considerations, rather than to assert that nonparametric tests are preferable.
We have now added some additional comments to the discussion on modelbased approaches (e.g., lines 386391, noting that they may attain better performance than the ZETAtest), but require supervision and hyperparameter selection. As we wrote above, we do not wish to claim that the ZETAtest is the endall and beall of statistical tests: each test must be chosen for a specific use case. We have added various comments relating to the scope of the ZETAtest throughout the manuscript.
https://doi.org/10.7554/eLife.71969.sa2Article and author information
Author details
Funding
Stichting Vrienden van het Herseninstituut
 J Alexander Heimel
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
Brain images in Figures 3—5 were generated using Brain Explorer 2 (Lau et al., 2008). We thank the Allen Brain Institute for their openly accessible data sets. We also thank the engineers of the mechatronics workshop at the NIN.
Ethics
All experiments were approved by the animal ethics committee of the Royal Netherlands Academy of Arts and Sciences, in compliance with all relevant ethical regulations. Animals received anesthetics and analgesics where applicable, such as during surgeries, and every effort was made to minimize animal suffering.
Senior Editor
 Joshua I Gold, University of Pennsylvania, United States
Reviewing Editor
 Mark CW van Rossum, University of Nottingham, United Kingdom
Reviewer
 Jonathan D Victor, Weill Cornell Medical College, United States
Publication history
 Received: July 6, 2021
 Accepted: September 22, 2021
 Accepted Manuscript published: September 27, 2021 (version 1)
 Version of Record published: November 26, 2021 (version 2)
 Version of Record updated: November 30, 2021 (version 3)
Copyright
© 2021, Montijn et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 3,612
 Page views

 552
 Downloads

 2
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Neuroscience
Sensory responses of cortical neurons are more discriminable when evoked on a baseline of desynchronized spontaneous activity, but cortical desynchronization has not generally been associated with more accurate perceptual decisions. Here, we show that mice perform more accurate auditory judgments when activity in the auditory cortex is elevated and desynchronized before stimulus onset, but only if the previous trial was an error, and that this relationship is occluded if previous outcome is ignored. We confirmed that the outcomedependent effect of brain state on performance is neither due to idiosyncratic associations between the slow components of either signal, nor to the existence of specific cortical states evident only after errors. Instead, errors appear to gate the effect of cortical state fluctuations on discrimination accuracy. Neither facial movements nor pupil size during the baseline were associated with accuracy, but they were predictive of measures of responsivity, such as the probability of not responding to the stimulus or of responding prematurely. These results suggest that the functional role of cortical state on behavior is dynamic and constantly regulated by performance monitoring systems.

 Neuroscience
Successive auditory inputs are rarely independent, their relationships ranging from local transitions between elements to hierarchical and nested representations. In many situations, humans retrieve these dependencies even from limited datasets. However, this learning at multiple scale levels is poorly understood. Here, we used the formalism proposed by network science to study the representation of local and higherorder structures and their interaction in auditory sequences. We show that human adults exhibited biases in their perception of local transitions between elements, which made them sensitive to highorder network structures such as communities. This behavior is consistent with the creation of a parsimonious simpliﬁed model from the evidence they receive, achieved by pruning and completing relationships between network elements. This observation suggests that the brain does not rely on exact memories but on a parsimonious representation of the world. Moreover, this bias can be analytically modeled by a memory/efficiency tradeoff. This model correctly accounts for previous findings, including local transition probabilities as well as highorder network structures, unifying sequence learning across scales. We finally propose putative brain implementations of such bias.