Abstract
Hebb's idea of a cell assembly as the fundamental unit of neural information processing has dominated neuroscience like no other theoretical concept within the past 60 years. A range of different physiological phenomena, from precisely synchronized spiking to broadly simultaneous rate increases, has been subsumed under this term. Yet progress in this area is hampered by the lack of statistical tools that would enable to extract assemblies with arbitrary constellations of time lags, and at multiple temporal scales, partly due to the severe computational burden. Here we present such a unifying methodological and conceptual framework which detects assembly structure at many different time scales, levels of precision, and with arbitrary internal organization. Applying this methodology to multiple single unit recordings from various cortical areas, we find that there is no universal cortical coding scheme, but that assembly structure and precision significantly depends on the brain area recorded and ongoing task demands.
https://doi.org/10.7554/eLife.19428.001Introduction
Even more than six decades after its conception, Hebb's (1949) fundamental idea of a cell assembly continues to play a key role in our understanding of how neural physiology may link up to cognitive function. Loosely, a cell assembly refers to a group of neurons which, by functionally organizing into a temporally coherent set, come to represent mental or perceptual entities, thereby forming the basis of neural coding and computation (Hebb, 1949). However, the term lacks a stringent and universally accepted definition, and has been used to denote anything from the precise zerophaselag spike synchronization in a defined subset of neurons (Abeles, 1991; Singer and Gray, 1995; Roelfsema et al., 1997; Diesmann et al., 1999; Harris et al., 2003) to temporally coherent changes in average firing rates on larger time scales (GoldmanRakic, 1995; Durstewitz et al., 2000). Often the term is meant to imply precise millisecond coordination of spike times for a 'volley' of activity which repeats at regular or irregular intervals in relation to specific perceptual or motor events (Figure 1A, I; e.g. [Riehle et al., 1997; Roelfsema et al., 1997; Harris et al., 2003; Fries et al., 2007]). Precise sequential patterns of spiking times (i.e., with time lags ≠ 0) have been reported as well (Figure 1A, II), most commonly in the hippocampal formation where they may correspond to sequential orders of places (Skaggs and McNaughton, 1996; Buzsáki and Draguhn, 2004), in the visual cortex as a consequence of different activation levels (König et al., 1995), or as possibly generated through synfirechainlike structures (Abeles, 1991; Diesmann et al., 1999). More generally, neurons may contribute several spikes in any order to a fixed spatiotemporal pattern (Figure 1A, III), as reported and linked to putative synaptic input motifs in vitro and in vivo (Ikegaya et al., 2004; Yuste et al., 2005). At a coarser temporal scale, neurons could fire with a specific temporal patterning to which each neuron may contribute 'bursts' of variable length (Figure 1A, IV). Such temporally ordered transitions among coherent firing rate patterns across sets of simultaneously recorded neurons have been described in different cognitive tasks and systems (Seidemann et al., 1996; Beggs and Plenz, 2003; Jones et al., 2007; Lapish et al., 2008; Durstewitz et al., 2010). At a still broader temporal scale, sets of neurons jointly increasing their average rates for some period of time (Figure 1A,V), as during persistent activity in a working memory task, have also been linked to the cell assembly idea (Durstewitz et al., 2000).
There is indeed an ongoing, sometimes heated, controversy about the degree of temporal precision and coordination present in neural activity and its relevance for neural coding, partly based on empirical (Shadlen and Movshon, 1999; London et al., 2010), partly on statistical arguments (Mokeichev et al., 2007). Based on this discussion, it seems at present premature and limiting to focus on a single specific assembly concept, theoretical idea, or particular timescale. Here we develop a novel statistical approach for multicell recordings that treats the temporal scale, precision, and internal organization of coherent activity patterns as free parameters, to be determined from the data, and is thus open to a large family of possible assembly definitions (Figure 1A). By deriving a fast parametric test statistic for pairwise dependencies that automatically corrects for nonstationarity locally, computationally costly bootstrapping and sliding window analyses are avoided, reducing the computational burden by factors of 100–1000 (see Materials and methods). Thus, in combination with a computationally efficient agglomeration scheme which recursively combines units into larger sets based on significant relations detected in the previous step, considerable speedups are achieved. This in turn enables screening for assemblies at all possible lag constellations and temporal scales, not accomplished (to this extent) by previous algorithms to our knowledge (see Materials and methods). We then apply this methodology to examine in multiple singleunit (MSU) recordings from different cortical areas whether these employ a kind of universal temporal coding scheme, or whether and how the properties of the assembly code are adapted to the areaspecific computations and task demands.
Results
Theoretical framework for assembly detection
From a statistical perspective, any of the assemblies from Figure 1A should reveal itself through recurring activity patterns in a set of simultaneously recorded spike trains, where a pattern can be any suprachance constellation of unit activities with a specific distribution of time lags $l$ among them. The idea is to capture the multiple temporal scales introduced above through the width $\mathrm{\Delta}$ used for binning the spike time series. We start from the relatively old notion of assessing the departure of the joint spike count distribution $p\left(A,B\right)$ of two units (or sets) A and B from independence (Grün et al., 2002a; Pipa et al., 2008). For two independent units with stationary spike trains, the joint distribution of spike occurrences at a specified time lag $l$ would factor into the single unit (‘marginal’) distributions, $p\left(A,B\right)=p\left(A\right)p\left(B\right)$. Assume each recorded spike time series has been converted into a series $\left\{{c}_{t}\right\}$ of spike counts of length $T$ at bin width $\mathrm{\Delta}$, with ${\#}_{A}$ and ${\#}_{B}$ denoting the total numbers of spikes emitted by units A and B, respectively. If $\mathrm{\Delta}$ is small enough such that $c}_{t}\in \left\{0,1\right\$ (binary counts), then, under the null hypothesis ($H}_{0$) of independence, the joint spike count ${\#}_{AB,l}$ at time lag $l$ follows a hypergeometric distribution with mean ${\mu}_{AB,l}={\#}_{A}{\#}_{B}/\left(Tl\right)$ and variance ${\sigma}_{AB,l}^{2}$. If the binning is such that spike counts $c}_{t$ larger than one occur, the hypergeometric distribution is no longer directly applicable. We then split the series into several (mutually dependent) binary series (cf. Figure 6A) for which we obtain a joint mean and variance as derived in the Materials and methods.
The mean $\mu}_{AB,l$ and variance $\sigma}_{AB,l}^{2$ could, in principle, be used to check for deviation from the $H}_{0$ of independence at lag $l$, but in practice such a statistic would be corrupted by nonstationarities like (coupled) changes in the underlying firing rate (see Materials and methods, Figure 7, and Appendix on the importance of accounting for nonstationarity). Sliding window (Grün et al., 2002b) or bootstrapbased (Fujisawa et al., 2008; Pipa et al., 2008; PicadoMuiño et al., 2013) analyses have most commonly been used to deal with this issue, but come at the price of considerable data loss or computational burden. Here we suggest a simple remedy which corrects for nonstationarity locally by using the difference statistic ${\#}_{ABBA,l}={\#}_{AB,l}{\#}_{AB,l}$ (see Materials and methods, Figure 6B). The idea is that this way nonstationarities in firing rates would cancel out locally, on a comparatively fine time scale ($\approx l\mathrm{\Delta}$), since they would affect ${\#}_{AB,l}$ and ${\#}_{AB,l}$ alike (for assessment of synchronous spiking, we use ${\#}_{ABBA,0}={\#}_{AB,0}{\#}_{AB,{l}^{*}}$, with ${l}^{*}=2$; see sect. ‘Choice of reference (correction) lag’ for the motivation of this particular choice and a more general discussion of the reference statistics chosen). The statistic $Q}_{l}\equiv {{\mathrm{\#}}_{ABBA,l}}^{2}/{{\hat{\sigma}}^{2}}_{ABBA,l$ finally is approximately Fdistributed and can be used for fast parametric assessment of the $H}_{0$ (see Materials and methods and Figure 7; Figure 7—figure supplements 1 and 2, for derivation and empirical confirmation using nonstationary synthetic data).
Having derived a fast, nonstationaritycorrected parametric test statistic for assessing the independence of pairs, we designed an agglomerative, heuristic clustering algorithm for fusing significant pairs into higherorder assemblies (see Figure 6—figure supplement 1 and Materials and methods for full derivation and pseudocode). In essence, at each agglomeration step the algorithm treats each set of units fused in an earlier step just like a single unit with activation times defined through one of its member units. This allows for the same pairwise test procedure on sets of units as defined for single units above, while at the same time effectively testing for higherorder dependencies based on the joint (set) distributions (see Materials and methods). Each pair is tested at all possible lags $l\in \left\{{l}_{max}\dots {l}_{max}\right\}$ (with ${l}_{max}$ provided by the user), which is a reasonably fast process given the parametric evaluation introduced above. Should a pair of unitsets prove significant at several lags $l$ at any step, only the one associated with the minimum pvalue is retained. The recursive setfusing scheme stops if no more significant relationships among agglomerated sets and single units are detected. All subsets nested within larger sets are then discarded. This whole procedure is repeated for a set of userprovided bin widths $\mathrm{\Delta}\in \{{\mathrm{\Delta}}_{min}\dots {\mathrm{\Delta}}_{max}\}$. For each formed assembly, the width $\mathrm{\Delta}}^{\ast$ associated with the lowest pvalue may then be defined as its characteristic temporal precision. All tests are performed at a userspecified, strictly Bonferronicorrected αlevel (always set to 0.05 here; see Materials and methods for details).
Performance evaluation on simulated data
The agglomerative scheme described above is a fast heuristic proxy, similar in spirit to the apriori algorithm in machine learning (Hastie et al., 2009; PicadoMuiño et al., 2013), for evaluation of all possible unit and lag combinations. To illustrate and evaluate its performance, synthetic data with known ground truth were created. Cell assembly structures with the different levels of temporal precision and internal organization (i.e., lag distributions) as shown in Figure 1A were simultaneously embedded within inhomogenous (i.e., nonstationary) Poisson spike trains, with a mean rate following an autoregressive process (see Materials and methods). The assemblyassignment matrix in Figure 1B demonstrates that all five different types of assemblies (and only these, no false detections) were correctly identified with their associated temporal precision and lag distributions. Figure 1C illustrates the quality of ‘assembly retrieval’ (measured as fraction of assembly units correctly assigned) as a function of bin width Δ: As expected, the retrieval quality steeply declines for the temporally precise assemblies as the bin width increases (types I and II), while it rises up to the appropriate temporal scale for the more broadly defined assemblies (types IV and V). For assemblytype III, defined by precise temporal relationships, yet extended across time without strictly sequential structure, both these time scales are revealed (leading to the local peak at ~300 ms). Also note that the correlated rate increases which define assemblies of types IV and V naturally can be discovered already at lower bin widths than the one which corresponds to the temporal extent of the whole pattern. We also investigated more systematically (Figure 2, see also Materials and methods) how assembly retrieval varies as a function of sample size and potential spike sorting errors. Assembly detection starts to significantly degrade only when their relative contribution to the spike series drops below ~4% (Figure 2A), or when more than ~30% of all spike times were corrupted by spike sorting errors (Figure 2B). More importantly, across a whole range of sample sizes, spike assignment errors, and assembly structures tested, the fraction of units falsely ascribed to any one assembly stayed uniformly low at about 0.5% (Figure 2C,D), indicating that our procedure is quite conservative and rarely returns false positives in the simulated scenarios.
Area and taskspecific assembly configurations and time scales
We next examined assembly structure in different brain regions from which multiple singleunit recordings were obtained in previously published experiments, including the rat anterior cingulate cortex (ACC; [Hyman et al., 2012; Hyman et al., 2013]), hippocampal CA1 region, and entorhinal cortex (EC, [Pastalkova et al., 2008; Mizuseki et al., 2009, 2013]) (see Materials and methods for further specification). Figure 3A presents the assemblyassignment matrix from one of the ACC data sets. Detected assemblies span a large range of temporal precisions, from ~10 ms to about 1.5 s, with a variety of lag distributions, and are composed of about 10% (ACC) to 16% (CA1, EC) of the recorded neurons. Note that different from the clearcut hypothetical examples (Figure 1B) which were strictly disjoint by design, many of the experimentally recorded assemblies partially overlap (i.e., share units; see also Materials and methods). Figure 3B also gives specific examples of assemblies with relatively high (top) and with lower (bottom) temporal precision. Finally, many of the unraveled assemblies were highly selective for specific task events as illustrated in Figure 3C, Figure 3—figure supplement 1.

Figure 3—source data 1
 https://doi.org/10.7554/eLife.19428.006
A specific question one might ask is whether different brain areas host different types of assembly structures, and how these may depend on the behavioral task. These aspects are quantified in Figure 4A by plotting the distribution of all significant unit pairs as a function of bin width $\mathrm{\Delta}$ and time lag $l$. Several features are noteworthy here: First, the joint $\left(\mathrm{\Delta},l\right)$distribution changes dramatically as the animals move from unstructured, completely selfpaced, little demanding environmental exploration (Figure 4A, left column) to a highly structured, demanding delayed alternation task (Figure 4A, right column). During the latter, a much larger number and richer repertoire of assembly structures turned out, as also indicated by the ‘marginal’ distributions of significant unit pairs across time scales $\mathrm{\Delta}$ in Figure 4B. However, secondly, while these changes in ACC and CA1 were mostly focused on the larger timescales, in EC they appeared to run across all timescales, yet were overall less dramatic than in CA1 (Figure 4A, bottom; Figure 4B, bottom). Third, assemblies remarkably differ among the three brain regions in the range of temporal scales they occupy: While EC mainly harbors fine temporal structure with a precision of about 15–50 ms, in ACC broad rate change patterns in the 120–700 ms range appear to dominate (Figure 4B). CA1, in contrast, expresses both temporal scales, or in fact a wide spectrum from about 30–1500 ms, of which the broader scales (>100 ms) only surface during delayed alternation. Overall, a much larger number of units engaged in any one assembly in CA1 (>90%) than in ACC (30–50%; Figure 4B). These observations indicate that the temporal composition and precision of assembly activity strongly depends on both the brain area and the behavioral setting.
On closer inspection, some of the temporally more precise 30–50 ms assemblies in CA1 were found to code for specific place fields (‘place assemblies’) in the rat’s environment (Figure 5A, Figure 5—figure supplement 1). These assemblies mainly consisted of synchronous (lag0) spiking units (Figure 4A). Meanwhile, the more broadly tuned assemblies in CA1 tended to code for temporally extended events which often appeared to have a specific behavioral meaning in the task context: For instance, these assemblies may become active during the reward event irrespective of its spatial location (Figure 5B), or for the whole correct choice path after a behavioral decision was made (Figure 5D). These temporally broader assemblies commonly also followed a more sequential (lag≠0) layout (Figure 4A). Interestingly, the single cells constituting CA1 assemblies did not necessarily share the same place preference with their ‘parent’ assembly (Figure 5—figure supplement 1). Similar as in CA1, broader assemblies in ACC were tuned to specific task phases and events (lever presses, delays, stimulus conditions) and reflected the task’s sequential structure (Figure 3C, Figure 3—figure supplement 1).
Discussion
Here we introduced a novel theoretical and statistical framework, based on fast parametric testing and computationally efficient agglomerative algorithms, which detects assembly structure at many different temporal scales, and with arbitrary internal organization, while at the same time accounting for nonstationarity on a fine time scale. This enables to readdress fundamental questions about the temporal structure and nature of neural representations in a largely unbiased way. One potential caveat to be noted here, however, is that the particular choice of reference bin for removing nonstationarity still entails a (mild) assumption about structure: For the present choice of pairing ${\#}_{AB,l}$ with its time reverse, ${\#}_{AB,l}$, it is that temporal dependencies are essentially directed (except for the synchronous case), i.e. do not simultaneously, within a given pair of time series, occur with exactly the same time lag in both directions. If in doubt about this, however, analyses may be repeated with another (not too large) reference lag (see sect. ‘Choice of reference (correction) lag’ for a discussion of alternative choices and factors to consider, including differences in sensitivity to nonstationarity implied by different lag spans covered by the test statistic). Testing other reference lags may also help with types of nonstationarity that may violate the conditions defined below Equation 6.
Illustrating this methodology on multiple singleunit recordings from ACC, CA1, and EC, it appeared as if the temporal structure and precision of the revealed assemblies were closely related to the computations performed by these brain areas: While the CA1 region processes precise spatial (O’Keeffee and Nadel, 1978; Harris et al., 2003; Diba and Buzsáki, 2007) and temporal (Eichenbaum, 2014) environmental structure, the ACC is much less concerned with finelygranulated details of the spatial world (Hyman et al., 2012). Rather, activity in ACC reflects behavioral organization, behavioral monitoring, overall context, and task structure, processes which typically unfold on much slower temporal scales (Lapish et al., 2008; Hyman et al., 2012). Likewise, in addition to spatial coding, the hippocampal CA1 region has also been reported to represent aspects of higherorder decision making, like paths to a defined goal state or choice outcomes (Lisman and Redish, 2009; Buzsáki, 2015). These capacities may become relevant only when an animal is transferred from unstructured environmental exploration to a task which involves clearly defined goal states, rewardrelated choices, and possibly time delays between them. Consequently, sequential organization of assemblies at broader time scales was much more often observed in the latter than in the former task context.
Numerous other statistical procedures for detecting assemblies or sequential patterns have been proposed previously (Grün et al., 2002a; Grün et al., 2002b; Pipa et al., 2008; Torre et al., 2016a), but most of these adhere to one or the other theoretical conceptualization of a cell assembly (cf. Figure 1A), or become computationally impractical for larger cell numbers or multiple lags (see Appendix for further discussion of both more recent and more 'traditional', crosscorrelationbased, approaches). Also, none of these, to our knowledge, combines all of the features presented here. The statistical tools developed here may allow readdressing questions about the nature of neural coding in different brain areas, without requiring the researcher to commit to any particular assembly concept or theoretical framework a priori. Indeed, we observed that there may be not just one type of cortical assembly code, but that the temporal precision, scale, and sequential composition with which cortical neurons organize into coherent patterns strongly depends on the brain area and task context investigated. We further note that our methods are not specific to the neuroscientific domain, but could be used more widely in other scientific areas to detect structure at multiple temporal scales in multivariate event count series.
Materials and methods
Statistical test for pairwise interaction
Request a detailed protocolAssume we have recorded $N$ spike trains, each divided into $T$ bins of equal bin width $\mathrm{\Delta}$, resulting in a spike count series $\left\{{c}_{K,t}\right\}$, $t=1\dots T$, for each recorded unit $K$. Bin width $\mathrm{\Delta}$ sets the temporal precision at which unit interactions are to be detected. We would like to test for a range of time lags $l\in \left\{{l}_{max}\dots {l}_{max}\right\}$ whether the joint spike count ${\#}_{AB,l}$ of units A and B at lag $l$ significantly exceeds of what would have been expected under the null hypothesis ($H}_{0$) of independence of the two spiking processes. For clarity, note that ${\#}_{AB,l}$ is computed by counting the number of times we have a spike in unit A and a corresponding spike in unit B $l$ time bins later. From the range of all considered lags, we select the one which corresponds to the highest count $\mathrm{\#}}_{AB,\overline{l}$, i.e. $\overline{l}\equiv {\mathrm{a}\mathrm{r}\mathrm{g}\mathrm{m}\mathrm{a}\mathrm{x}}_{l}({\mathrm{\#}}_{AB,l})$. For deriving the proper distributional assumptions under the $H}_{0$, spike count series $\left\{{c}_{K,t}\right\}$ are often thresholded (Grün et al., 2002a; Humphries, 2011; Shimazaki et al., 2012; PicadoMuiño et al., 2013) such that binary {0,1}series are obtained, presumably partially since multivariate extensions of the binomial or hypergeometric distribution are not yet commonplace (see Teugels, 1990; Dai et al., 2013). Especially for larger bin widths $\mathrm{\Delta}$ this implies a serious loss of information, however. Therefore, we sought a different approach to the problem that makes use of the full spike counts, based on the first two moments of a multivariate hypergeometric distribution. Instead of thresholding, we split each spike count series into a set of $\alpha =1\dots {M}_{K}$ binary processes as indicated in Figure 6A, where ${M}_{k}\equiv max\left({c}_{K,t}\right)$ is the maximum spike count observed for unit $K$ for the specified bin width. The first binary process is defined by having a ‘1’ in all time bins for which spike count ${c}_{K,t}\ge 1$ (and ‘0’ otherwise), the second process by having a ‘1’ only in those time bins for which ${c}_{K,t}\ge 2$, and so on. For any two units A and B, defining $M\equiv min\left({M}_{A},{M}_{B}\right)$, the total joint count $\mathrm{\#}}_{AB,\overline{l}$ at selected lag $\overline{l}$ is now simply given by the sum of joint counts $\mathrm{\#}}_{AB,\overline{l}}^{\alpha$ across all pairs of binary subprocesses $\alpha =1\dots M$,
Since each of the $M$ subprocesses is binary, each number $\mathrm{\#}}_{AB,\overline{l}}^{\alpha$ follows a hypergeometric distribution under the $H}_{0$ (since marginal counts ${\#}_{A}^{\alpha}$ and ${\#}_{B}^{\alpha}$ are fixed by the observed data, a binomial distribution is not appropriate [Gütig et al., 2002]). From this, and noting that the $M$ binary processes are not independent since by construction each higherrank process γ >α can only have ‘1 s’ where the lowerrank processes α had as well (but not necessarily vice versa), one can derive the expectancy value and variance of $\mathrm{\#}}_{AB,\overline{l}$, respectively, under the $H}_{0$ as
A parametric test statistic, $S}_{\overline{l}}\equiv \left({\mathrm{\#}}_{AB,\overline{l}}{\hat{\mu}}_{AB,\overline{l}}\right)/{\hat{\sigma}}_{AB,\overline{l}$ (tdistributed with $2\left(T\overline{l}\right)M1$ degrees of freedom), could be based directly on these moments under the null hypothesis of independence on all time scales (for all choices of $\mathrm{\Delta}$), and if the data were truly stationary (by which we mean here that the joint probability distribution ${\pi}_{AB,t,l}\left(u,v\right)\equiv pr\left({c}_{A,t}=u\wedge {c}_{B,t+l}=v\right)$ is timeinvariant, i.e. ${\pi}_{AB,t,l}\left(u,v\right)={\pi}_{AB,l}\left(u,v\right)$ for all t). This will commonly not be the case with electrophysiological time series. Rather, there will be rate fluctuations on different temporal scales, as for instance induced by oscillatory drive or external stimuli (QuirogaLombard et al., 2013). Under these conditions, variance Equation 3 will usually be highly biased, often underestimating the true variation. Furthermore, the joint count $\mathrm{\#}}_{AB,\overline{\text{l}}$ may not factor into the product of the marginal counts as in Equation 2 anymore, since, in general, $E\left[{\mathrm{\#}}_{A}\right]E\left[{\mathrm{\#}}_{B,l}\right]\ne \left(Tl\right){\sum}_{t=1}^{Tl}E\left[{\mathrm{\#}}_{A,t}\right]E\left[{\mathrm{\#}}_{B,t+l}\right]$ for any fixed nonstationary set $\left\{{\pi}_{AB,t,l}\left(u,v\right),t=1\dots Tl\right\}$. Finally, we are interested in testing for independence at a defined time scale $\mathrm{\Delta}$, but may want to permit the processes to be coupled at other temporal scales, like for instance with common external or oscillatory drive. In this case the simple test statistic defined above ($S}_{\overline{l}$) breaks down (Figure 7 left column). Hence, we would like to test against a stronger $H}_{0$ which allows for uncoupled or coupled rate changes on broader temporal scales without corrupting our assessment of independence on finer scales.
In the time series literature, the most common remedies for nonstationarity issues are bootstrapbased techniques (Fujisawa et al., 2008; Pipa et al., 2008; PicadoMuiño et al., 2013; Torre et al., 2013) and sliding window analyses (Grün et al., 2002b).These two methods have, however, severe limitations. Bootstrapbased approaches are computationally quite demanding since essential steps of the algorithm may have to be repeated for a 100–1000 bootstrap replications. This may become outright prohibitive especially when multiple lag constellations are to be considered as in the present work. Sliding window analysis, on the other hand, uses only small fragments of the data set for estimation in each window, thus can be seriously plagued by low sample size issues (resulting in weak statistical power). Sometimes this is (partly) addressed by pooling across many trials, but this in turn requires (a) a sufficient number of trials, (b) stationarity across trials, and (c) clear external timestamps such that windows across trials are indeed comparable and can be aligned. In many tasks probing higher cognition, where just a handful of trials are not rare (e.g. [Lapish et al., 2008; Hyman et al., 2012]), or in selfpaced tasks, these methods are thus not applicable.
We therefore propose a new approach to nonstationarity here. Rather than testing $\mathrm{\#}}_{AB,\overline{l}$ directly for significance, our approach focuses on the asymmetry between the occurrence rates of patterns $\left(AB,l\right)$ and $\left(BA,l\right)=\left(AB,l\right)$, respectively, defined through
The idea is that nonstationarity on slower time scales ($>\overline{l}\mathrm{\Delta}$, if $\mathrm{\Delta}$ is the time scale considered), e.g. in the form of correlated rate changes, will, on average, cancel out in $\mathrm{\#}}_{ABBA,\overline{l}$, since it will affect $\mathrm{\#}}_{AB,\overline{l}$ and $\mathrm{\#}}_{AB,\overline{l}$ alike (for clarity, note that $\mathrm{\#}}_{AB,\overline{l}$ is accumulated across $t=1:T\overline{l}$, while $\mathrm{\#}}_{AB,\overline{l}$ runs across $t=\overline{l}+1:T$; we also note that while differencing to remove nonstationarity is, in general, a more common practice in the ‘classical’ time series literature, e.g. (Box et al., 2013), here we use this technique in a very specific sense by forming the difference between one joint count series and its reverse). More precisely, the scenarios covered by our ${H}_{0}\left(\mathrm{\Delta}\right)$ should be those where the two processes A and B are independent on the specified scale $\mathrm{\Delta}$, in fact independent for all scales at least up to $l\mathrm{\Delta}$, while they may be coupled and/or nonstationary on slower temporal scales of at least $l\mathrm{\Delta}$. Our $H}_{0$ requires $E\left[{\mathrm{\#}}_{AB,\overline{l}}^{\mathrm{\Delta}}\right]=E\left[{\mathrm{\#}}_{AB,\overline{l}}^{\mathrm{\Delta}}\right]$ (where superscript $\mathrm{\Delta}$ indicates that counts were taken at that temporal resolution), which strictly holds if
The H_{0}(Δ) furthermore demands that this factorizes as
Now, if the two processes A and B are reasonably stationary at least on scales up to $l\cdot \mathrm{\Delta}$, we have $p\left({c}_{A,t}^{\mathrm{\Delta},\alpha}\right)\approx p\left({c}_{A,t+l}^{\mathrm{\Delta},\alpha}\right)$ and $p\left({c}_{B,t+l}^{\mathrm{\Delta},\alpha}\right)\approx p\left({c}_{B,t}^{\mathrm{\Delta},\alpha}\right)$ and the equalities above should (approximately) hold, even if the processes are nonstationary and/or coupled on broader temporal scales. Thus, under the $H}_{0$ of independence on the time scale fixed by precision $\mathrm{\Delta}$, $E\left[{\mathrm{\#}}_{AB,\overline{l}}^{\mathrm{\Delta}}\right]=E\left[{\mathrm{\#}}_{AB,\overline{l}}^{\mathrm{\Delta}}\right]$ if (eq. 6) holds exactly, and approximately otherwise.
Under the alternative hypothesis of dependence on the specific scale $\mathrm{\Delta}$ considered, on the other hand, if pattern $(AB,\overline{l})$ occurs more frequently than expected by chance, e.g. because unit A excites unit B, it appears (mechanistically) rather unlikely that the same is true for the exact time reversed pattern $(AB,\overline{l})$, unless the two units are in perfect synchrony as treated below (but see sect. ‘Choice of reference (correction) lag’ where this issue and alternative choices of reference bin are further discussed); hence $\mathrm{\#}}_{ABBA,\overline{l}$ would be expected to differ from zero.
To accommodate the strictly synchronous case ($\overline{l}=0$), finally, we slightly modify Equation 4 to be
with ${l}^{*}\ne 0$. While, in principle, different reference lags ${l}^{*}$ may be attempted for confirmatory purposes, here we suggest ${l}^{*}=2$ as a tradeoff between potential ‘spillover’ issues and the effective timescale at which nonstationarity is removed, as discussed in more detail in the sect. ‘Choice of reference (correction) lag’ below (see also Figure 1—figure supplement 1). Note that ${\#}_{ABBA,0}>0$ if synchrony is the dominant pattern, while the sequential case $\overline{l}=2$ is tested separately by ${\#}_{ABBA,2}$. Again, under the $H}_{0$, $E\left[{\mathrm{\#}}_{AB,0}\right]=E\left[{\mathrm{\#}}_{AB,{l}^{\ast}}\right]$, while under the $H}_{1$ of synchronous spiking ${\#}_{ABBA,0}$ would be expected to be larger than zero (one may also define the symmetric quantity ${\#}_{ABBA,0}={\#}_{AB,0}\left({\#}_{AB,{l}^{*}}+{\#}_{AB,{l}^{*}}\right)/2$, but it makes the computation of the variance, Equation 10 below, more cumbersome). Following the same lines laid out above in deriving Equations 2 and 3, for these modified, stationaritycorrected statistics we thus obtain for the mean $\mu}_{ABBA,\overline{l}$ and variance $\sigma}_{ABBA,\overline{l}}^{2$ under the $H}_{0$, respectively,
and
where ${\mu}_{ABBA,\overline{l}}=0$ will approximately hold even under nonstationarity (see above). If the process is strongly nonstationary, however, evaluating Equation 9 directly across the whole time series may still give an inaccurate estimate of variance. In general, we therefore divide the binned spike train into $C$ segments of $k$ time bins each, and combine the local, segmentwise variance estimates into the global estimate $\hat{\sigma}}_{ABBA,\overline{l}}^{2$ with
For smaller segment length $k$ (here we used $k$=100), this approximation will become more accurate (as within each short segment the process will approach stationarity), yet at the same time computationally more demanding. In practice, covariation among segments may often be negligible compared to the withinsegment variance contributions (see below; e.g. if autocorrelations decay relatively fast), so to reduce the computational burden one may evaluate the quantity ${\hat{\sigma}}_{ABBA,\overline{l}}^{2}\approx 2{\sum}_{c=1}^{C}\mathrm{v}\mathrm{a}\mathrm{r}({\mathrm{\#}}_{AB,\overline{l}}^{c})2{\sum}_{c=1}^{C}\mathrm{c}\mathrm{o}\mathrm{v}({\mathrm{\#}}_{AB,\overline{l}}^{c},{\mathrm{\#}}_{AB,\overline{l}}^{c})$. This is in fact the estimate we have used throughout this manuscript.
Based on the estimates derived above, we can then define the following approximately Fdistributed quantity which can be used for significance testing:
with 1 numerator and $v$ denominator degrees of freedom. We found these approximations to work reasonably well for $E\left[{\mathrm{\#}}_{AB,\overline{l}}\right]>4$ (see Figure 7). For one numerator and large denominator d.f., the F distribution is known to converge to the $\chi}_{1}^{2$ distribution which could be used for testing instead. For smaller sample sizes and nonstationary scenarios (where the variance $\hat{\sigma}}_{ABBA,\overline{l}}^{2$ is not exact anymore but estimated from segments), however, the F distribution appeared more appropriate, although the exact denominator d.f. are unknown in this case. For all analysis reported here we have used $v=2\left(T\overline{l}\right)M1$, but more generally we recommend $v=T\overline{l}$ which is more conservative for low sample size ($T<50$; the differences become negligible for $T>400$ as in the case of most analyses reported here). Finally, all αlevels are Bonferronicorrected for the number $R$ of tests performed (see pseudocode below and sect. ‘Recursive assembly agglomeration algorithm’).
Limitations of parametric testing under nonstationarity
Request a detailed protocolTo examine the error made by the various approximations introduced above, we empirically studied different scenarios by simulation. In one set of simulations, discrete, steplike ratechanges were used. Within a total of $T$=10^{6} ‘elementary’ time bins (not to be confused with bin width $\mathrm{\Delta}$ used for assembly detection), lowrate states were randomly interspersed with $m$ highrate states of duration $L$ (expressed in terms of numbers of elementary bins). For each elementary time bin, spikes were drawn from a Bernoulli process with probabilities π_{low} and π_{high}, respectively. Here, $L$ explicitly defines the time scale on which the two processes are nonstationary and, in some simulations, coupled. Simulations (with $\mathrm{\Delta}$=100, $l$∈{0,5,10}) were performed with both relatively fast ($m$=25, $L$=3000, on the order of $2{l}_{max}\mathrm{\Delta}$; Figure 7—figure supplement 1, center column) and slow ($m$=1, $L$=75000; Figure 7—figure supplement 1, right column) rate variations (both with the same number of highrate bins), and with one independent scenario (Figure 7—figure supplement 1, top row) and one where the rate variations were completely synchronized between the two processes (Figure 7—figure supplement 1, bottom row). As the percentilepercentile plots in Figure 7—figure supplement 1 reveal, in all these cases departures from the theoretical F distribution were relatively mild. We emphasize that this was the case although the difference between the low and high rate states was assumed to be rather large in our simulations (π_{A,low}=0.01 vs. π_{A,high}=0.05, and π_{B,low}=0.03 vs. π_{B,high}=0.15), a larger violation of stationarity than one may actually expect empirically.
In another set of simulations, timevarying firing rates for the two neurons were drawn from a slowly varying autoregressive process with Gaussian noise (or, equivalently, a joint multivariate Gaussian). Spike counts for each bin were then drawn from a Poisson distribution with the rate λ determined by the autoregressive process passed through a nonnegative transform (‘linkfunction’, see Equation 13 below). We simulated scenarios with both somewhat faster ($cov({\lambda}_{A,t},{\lambda}_{A,t+2\mathrm{\Delta}{l}_{max}})\approx 0.8$; Figure 7—figure supplement 2, center column) and very slowly decaying autocorrelations ($cov({\lambda}_{A,t},{\lambda}_{A,t+2\mathrm{\Delta}{l}_{max}})\approx 0.99$; Figure 7—figure supplement 2, right column), and without ($cov\left({\lambda}_{A,t},{\lambda}_{B,t}\right)=0$; Figure 7—figure supplement 2, top row) or with ($cov\left({\lambda}_{A,t},{\lambda}_{B,t}\right)\approx 0.7$; Figure 7—figure supplement 2, bottom row) correlations among the two processes present. As Figure 7—figure supplement 2 reveals, the results for these simulation experiments were very similar to those shown in Figure 7—figure supplement 1, with the empirical $Q}_{\overline{l}$ distribution deviating only slightly from the theoretical F distribution. We thus conclude that for empirically reasonable scenarios of nonstationarity and coupling on longer time scales, the parametric statistical procedure introduced above should return quite accurate results.
It is important to note that while coherent rate changes constitute a coupled nonstationarity from the viewpoint of smaller timescales $\mathrm{\Delta}$, for which they will be removed by our difference statistic, they actually represent an assembly on their own characteristic scale (type V in Figure 1). They are indeed correctly identified as such in both simulated scenarios when the bin width is chosen to be about the same as the timescale of the ‘nonstationarity’, i.e. $\mathrm{\Delta}\approx L$ ($m$=75, $L$=1000; Figure 7—figure supplements 1,2, left columns, blue curves in bottom graphs). Hence, nonstationarity, in the present definition, refers to somewhat slower (co)variations that may mislead detection of coincident events on comparatively finer time scales. Both, the relevant time scale for assembly detection and the associated time scale of nonstationarity, are strictly defined by the experimenter by fixing $\mathrm{\Delta}$. Whether the detected temporal structure is interpreted as an assembly or as nonstationarity therefore depends on the timescale chosen, and is ultimately up to the experimenter and the research question posed. From this perspective, our method basically ensures that coincidence structure is not falsely attributed to a certain temporal precision $\mathrm{\Delta}$ while it was really produced by slower (co)variations, an issue that has frequently plagued the discussion about precise temporal coding in the nervous system in the past (eg., [Singer, 1999] vs. [Shadlen and Movshon, 1999]).
As another note of caution, we remark that while under the ${H}_{0}$ assumptions (independence and stationarity on scales $\le l\mathrm{\Delta}$) equality (GoldmanRakic, 1995) will always hold (in expectancy), the size of the deviations from the ${H}_{0}$ in the case of true structure (and thus the test’s sensitivity or power) may, in principle, depend on how exactly nonstationary processes interact with underlying structurecreating processes (e.g., linearly vs. nonlinearly).
Oscillations
Request a detailed protocolFinally, as an example of a particularly common form of nonstationarity in neural data (e.g. [QuirogaLombard et al., 2013]), we considered oscillations (Buzsáki and Draguhn, 2004). (Note that oscillations constitute a type of nonstationarity from the present neurophysiological perspective, although they may be considered stationary in the classical statistical definition with access to an infinite ensemble of time series starting at random phases (see [Fan and Yao, 2003]). We tested two scenarios here: One in which two neurons were spiking independently at the time scale considered, but were driven by a common oscillatory drive at the same frequency and phase, and one where on top the units exhibited suprachance coincident patterns. Specifically, for both units A and B the firing probability was taken to follow a Poisson distribution with rate parameter ${\lambda}_{\{A,B\}}(t)=5(\mathrm{s}\mathrm{i}\mathrm{n}(2\pi \theta t)0.6+{a}_{\{A,B\}})$, ${a}_{A}=1$, ${a}_{B}=0.5$, $\theta =4Hz$, yielding mean firing rates of 5Hz and 2.5Hz, respectively. For the independent case, no structure is detected for smaller bin widths $\mathrm{\Delta}$ up to ≈30 ms (Figure 7—figure supplement 3A), while for larger bin widths (>50 ms) the algorithm picks up the coordinated rate changes. For the second scenario (finer timescale dependence on top of oscillation), patterns are inserted on top with a spike in unit A followed by one in unit B 20 ms later, phaselocked to the underlying rhythm. Within a $T$=1500 s long simulation period, a total of 90 of such patterns were placed randomly with a 20 ms delay to the oscillation peak. Figure 7—figure supplement 2B shows that these patterns are indeed correctly detected by our procedure at the smaller bin sizes ({5, 10} ms) tested. Thus, once again, this example illustrates that the detection of finer time scale spike time relations is not confounded by covariations at slower scales, while the slower covariations are flagged up as a coherent structure at their own respective time scale (50–120 ms).
Choice of reference (correction) lag
Request a detailed protocolTo eliminate nonstationarity as a confounding factor, we suggested computing the difference between the targetlag joint count ${\#}_{AB,l}$ and its time inverse ${\#}_{AB,l}$. Strictly, this implies that if there were precise repeats of spiking sequences in reverse order, both the forward and backward directions would go undetected as they would cancel each other. Although a few studies have suggested that reverse replay may indeed occur in hippocampus (Foster and Wilson, 2006; Diba and Buzsáki, 2007), in general it seems unlikely that sequential patterns are replayed in reverse order with exactly the same time lag constellation and at the very same temporal scale as in the forward direction, such that the patterns would fully cancel (or at least this seems not more likely than constellations implied by any other choice of reference lags). Nevertheless, this brings up the more general question of whether there is a better choice of reference lag. Furthermore, the pairing of $l$ with its exact opposite, $l$, implies that nonstationarity will be removed at different timescales (with different levels of precision, $l\mathrm{\Delta}$) for different lags, a potentially undesired effect if direct comparisons between structures with different lag sizes are sought.
Indeed, in principle, any other lag might potentially be chosen as a reference. A general recommendation therefore might be to simply repeat the analyses with other reference lags if researchers suspect that there might be significant structure at both $\overline{l}$ and $\overline{l}$, or to fix the reference lag to be, e.g., ${l}^{*}=\pm ({l}_{max}+1)$, with ${l}_{max}$ the maximum lag tested for, for all lags considered. Doing this for the ACC delayed alternation data set presented in Figure 4 as an example, we found that the overlap between assemblies detected with $l}^{\ast}=\overline{l$ and ${l}^{*}=\pm ({l}_{max}+1)$ (fixed for all query lags) was on average ≈97% across all timescales (range 96–99%; as measured by the Rand index, Equation 14, defined across all matching pairs with significant [$r$] or nonsignificant [$s$] relation), including the synchronous ($\overline{l}=0$) case.
This does not imply that the reference lag is arbitrary, however. In general, there are two factors to consider: The amount of nonstationarity permitted (loosely related to the type I error in statistical terms) vs. the true structure potentially removed by the choice of reference lag (related to the test’s sensitivity or ‘type II error’). For instance, while choosing a directly neighboring bin as reference, ${l}^{\ast}=\overline{l}1$, may, at first glance, appear like the ideal choice from the perspective of removing nonstationarity confounds most efficiently (since in this case we only require $p\left({c}_{B,t}^{\mathrm{\Delta},\alpha}\right)\approx p\left({c}_{B,t+1}^{\mathrm{\Delta},\alpha}\right)$ for Equation 6 to hold), it may also imply the highest risk of removing true structure from both a physiological and statistical point of view (the more similar and temporally proximal target and reference statistics, the more likely they are to be correlated). This is illustrated for the important case of synchronous activity on simulated data (type I & V assemblies) in Figure 1—figure supplement 1, where with ${l}^{\ast}=\overline{l}1$ we find that the test starts to lose part of the structure due to spillover into neighboring bins, while for too large lags the test’s temporal accuracy goes down, and with that, more generally, the false discovery rates due to nonstationarity would be expected to rise. Furthermore, given that by far most synaptic connections and physiological responses are unidirectional or at least asymmetrical (Markram et al., 1997), the scenario that a neuron A is correlated with a neuron B at a couple of different forward lags (e.g., due to bursting) appears more likely than a pair of asynchronous neurons strictly switching order multiple times. Finally, the strict symmetry implied by $l}^{\ast}=\overline{l$ makes the joint count statistics for target vs. reference lag strictly comparable (unspoiled, e.g., by possibly differing autocorrelations across the target vs. reference lag), and permits to interpret Equation 11 effectively as a twosided test (with directionality indicated by the sign of $\mathrm{\#}}_{ABBA,\overline{l}$).
While the ideal choice of reference lag may be an issue of further theoretical and empirical investigation, we emphasize that a), in practice, the precise choice of reference lag should not be overly crucial (as supported by the analyses reported above and in Figure 1—figure supplement 1), and b) analyses may always be repeated for a few different reference lags if in doubt about structure possibly missed by the initial choice of reference.
Recursive assembly agglomeration algorithm
Request a detailed protocolOur assembly agglomeration scheme starts from all significant pairwise interactions, and then adds new elements only on the basis of the structures already formed, similar in spirit to the apriorialgorithm in machine learning (Hastie et al., 2009; Sastry and Unnikrishnan, 2010). This heuristic procedure drastically reduces the number of configurations to be tested, but may lose significant unit configurations with nonsignificant subgroups (PicadoMuiño et al., 2013). For each pair of units A and B, the spike count ${\#}_{AB,l}$ is obtained for each triplet $\left(AB,l\right)$, with $l\in \left\{{l}_{max}\dots {l}_{max}\right\}$, and the maximum count is tested for significance (Figure 6—figure supplement 1, step 2). Since the marginal elementary processes ${\#}_{A}^{\alpha}$ and ${\#}_{B}^{\alpha}$ do not change for different lags $l$ (except for the small influence from the $l$ bins cut off), this selection procedure is formally equivalent to performing an explicit significance test for each lag $l$ and retaining the one associated with the lowest p value. In the next step, all significant configurations $(AB,\overline{l})$ are treated like single units, with the joint ‘spike (activation) times’ defined (arbitrarily) as those of unit A whenever it matches up with a spike in unit B separated by $\overline{l}$ time steps (bins). Each significant pair $\left(AB\right)$ is then paired in turn with all single units C which had entertained a significant relationship with either A or B in the previous step (Figure 6—figure supplement 1, step 3). Proceeding with composite pairs $((AB,{\overline{l}}_{AB})C,{\overline{l}}_{(AB)C})$ exactly as described above, higherorder structures are thus recursively built up. Note that this procedure effectively tests for higher order structure, rather than just aggregating pairwise information: After screening for pairwise relations in the first step, for instance, in the second iteration the algorithm tests for the factorization $P\left(A,B,C\right)=P\left(A,B\right)P\left(C\right)$ whenever a unit $C$ is considered for inclusion into the already formed set $(AB,\overline{l})$, rather than, e.g., $P\left(A,B,C\right)=P\left(A\right)P\left(B\right)P\left(C\right)$, or “$P\left(A,B\right)=P\left(A\right)P\left(B\right)\vee P\left(A,C\right)=P\left(A\right)P\left(C\right)\vee P\left(B,C\right)=P\left(B\right)P\left(C\right)$”. Likewise, higher order joint distributions are considered in all subsequent iterations.
Significance levels $\alpha$ at each step of the agglomeration scheme are strictly Bonferronicorrected as $\overline{\alpha}}_{i}=\alpha /{R}_{i$ (using $\alpha =0.05$ here), with ${R}_{i}$ the total number of tests performed. Specifically, for the first step, ${R}_{1}=N\left(N1\right)\left(2{l}_{max}+1\right)/2$, where $N$ is the total number of single units (correcting for the total number of different pairs). For each subsequent step $i$, ${R}_{i,a}={N}_{a,i}{N}_{u,a}\left(2{l}_{max}+1\right)$, where ${N}_{a,i}$ is the number of assemblies tested in iteration $i$, and ${N}_{u,a}$ the number of units tested in combination with that assembly $a$ (hence allowing an higher $\alpha$level for assemblies tested in conjunction with less different units). At any step, unitsets with the same elementary units but different lag distributions may result. From all these, we select only the one associated with the lowest pvalue, and discard all others. This whole procedure will stop when no units engage in significant relationships anymore with the already agglomerated sets (Figure 6—figure supplement 1, step 4). All true subsets of larger sets are finally discarded (but may be retained if hierarchical nesting is of interest, see below). A pseudocode for the agglomeration scheme is included below. To test for structure at different temporal resolutions (scales), the whole scheme is rerun for a range of userprovided bin widths $\mathrm{\Delta}=\{{\mathrm{\Delta}}_{min}\dots {\mathrm{\Delta}}_{max}\}$. For each assembly pattern the width $\mathrm{\Delta}}^{\ast$ associated with the lowest pvalue is defined as the characteristic time scale (or temporal precision) for that assembly.
As a final note, for very large $\mathrm{\Delta}$, the binned spike counts may potentially fluctuate around a high mean level and never fall below some minimum count c_{floor} considerably larger than zero for the whole time series. In our count statistics, also spikes up to that baseline rate c_{floor} would contribute to the coincidence counts $\mathrm{\#}}_{AB,\overline{l}$, although they are completely noninformative with respect to the coupled dynamics among units, thus potentially biasing the results for large $\mathrm{\Delta}$. In this case, since we are only interested in actual firing rate covariations, we suggest to subtract off the minima $min\left({\left\{{c}_{A}^{\mathrm{\Delta}}\right\}}_{t}\right)$ and $min\left({\left\{{c}_{B}^{\mathrm{\Delta}}\right\}}_{t}\right)$ from the two considered series A and B, respectively, thus removing the noninformative floor count before statistical testing. This procedure would not affect the evaluation of spike coincidences at reasonably small $\mathrm{\Delta}s$ for which $min\left({\left\{{c}^{\mathrm{\Delta}}\right\}}_{t}\right)=0$, obviously.
Further assembly pruning
Request a detailed protocolFurther pruning may be applied to the set of assemblies returned by the algorithm if desired. This may sometimes help interpretability and visualization, but of course depends on the exact analysis goals. If, for instance, the interest is in whether the same assemblies are replayed at a different time scale (e.g. [Diba and Buzsáki, 2007]), then one may want to keep more than just the one assembly associated with the lowest pvalue across time scales $\mathrm{\Delta}$. Here, solely for the purpose of visualization, in Figure 3A the full set of assemblies returned by the algorithm was pruned by selecting among all assemblies (across different $\mathrm{\Delta}$) with cosine distance <0.3 only the one with lowest pvalue. In Figure 1B, again for clarity and visualization, pruning was performed by discarding across scales $\mathrm{\Delta}$ any assembly which is a subset of another, larger assembly (by default this is always done within each time scale $\mathrm{\Delta}$). No pruning was used for any of the other figures presented in here.
Assembly activation
Request a detailed protocolAn instance of assembly activation in the multivariate spike time series was registered whenever spikes in the elementary assembly units occurred in the order prescribed by the associated pattern of time lags, with the activation time point defined as that of the assembly unit spiking earliest. The total assembly activation score (as given in Figure 2B–C) is then defined as the number of such activation instances within a given time bin of size $\mathrm{\Delta}$. This can lead to activation scores much larger than one, especially for assemblies defined through rate changes on coarser time scales, since each set of single assembly unit spikes occurring in the right order is counted. (Alternatively, one may define assembly activation through the correlation of the average assembly spike count pattern with the observed spike count patterns along the series of binned spike counts at the respective assembly resolution $\mathrm{\Delta}$. This would result, however, in a temporally much less well resolved activation score which otherwise would essentially return the same information.)
A Matlab (MathWorks) implementation of the whole procedure is provided at https://www.zimannheim.de/en/research/departmentsresearchgroupsinstitutes/theorneurosciencee/informationcomputationalneurosciencee.html. To give an idea of the performance speed, on a 12core, 2.5 GHz, workstation, for a set of 50 simulated units (see below), a time series of length $T$=1400 s, and with $\mathrm{\Delta}=\left\{0.015,\text{}0.05,\text{}0.1,\text{}0.15,1\right\}$s and $l=\left\{10,\dots ,10\right\}$, this whole procedure took <50 min for five embedded assemblies with five units each (scenario from Figure 1). Significant further improvements in performance speed may be obtained through an implementation in a more basic language like C++.
Pseudocode for agglomerative assembly formation
Request a detailed protocol% <italic>N</italic>: total number of units
% <italic>u<sub>i</sub></italic>, i=1…N: single units
% <italic>U<sub>m</sub></italic>: set of units and corresponding lags (assemblies)
% <italic>r</italic>: set counter
for <italic>i</italic> = 1:<italic>N</italic>, <italic>U<sub>i</sub></italic> ← {(<italic>u<sub>i</sub></italic>, 0)} % Initialize lists with single units <italic>u<sub>i</sub></italic>
for all <italic>i</italic> ≤ <italic>N</italic>, <italic>j</italic> ≤ <italic>N</italic>: <italic>Z<sub>ij</sub></italic> = <italic>FALSE</italic> % Initialize all single unit pair comparisons to be
‘false’ (= ‘accept H<sub>0</sub>’)
<italic>r</italic> = <italic>N</italic>, <italic>L<sup>old</sup></italic> = 0
REPEAT % agglomeration procedure
<italic>L<sup>new</sup> = r</italic>
for <italic>m</italic> = <italic>L<sup>old</sup></italic> + 1:<italic>L<sup>new</sup></italic> % move through all lists formed in previous step
for all <italic>u<sub>s</sub> </italic>∉ <italic>U<sub>m </sub> m </italic>< <italic>s</italic> ≤ <italic>N</italic> ∨ (<italic>m</italic> > <italic>N</italic> ∧ ∃ <italic>u<sub>l</sub></italic> ∈ <italic>U<sub>m</sub></italic>:<italic>Z<sub>sl</sub></italic> = <italic>TRUE</italic>)
% in first step (m ≤ N) probe <italic>U<sub>m</sub></italic> with all other single units not yet tested, or (for
% m>N) probe <italic>U<sub>m</sub></italic> with all other single units that occur in at least one other % significant pair with a unit from <italic>U<sub>m</sub></italic>
<italic>l¯</italic> ≡ argmax<sub><italic>l</italic></sub>(#<sub>(<italic>u<sub>m</sub>,u<sub>s</sub>),l</italic></sub>) % test for significance at lag <italic>l¯</italic> with maximum pairwise count:
if <italic>Pr</italic>(<italic>Q<sub><italic>l¯</italic></sub></italic> ≥ <italic>F</italic><sub>1,(T−<italic>l¯</italic>)<italic>M</italic>−</sub><sub>1</sub>[<italic>U<sub>m</sub></italic>, <italic>u<sub>s</sub></italic>, <italic>l¯</italic>]<italic>H</italic><sub>0</sub>) ≤ α / R with <italic>R</italic> = (<italic>L<sup>new</sup></italic> − <italic>L<sup>old</sup></italic>) · {<italic>u<sub>s</sub></italic>} · (2<italic>l<sub>max</sub></italic> + 1):
<italic>r</italic> ← <italic>r</italic> + 1,
<italic>U<sub>r</sub></italic> ← {<italic>U<sub>m</sub></italic>, (<italic>u<sub>s</sub></italic>, <italic>l¯</italic>)} = {(<italic>u<sub>r</sub></italic><sub>,1</sub>, 0), (<italic>u<sub>r</sub></italic><sub>,2</sub>, <italic>l¯</italic><sub>2</sub>), ... , (<italic>u<sub>r</sub></italic>,<sub><italic>U<sub>m</sub></italic>+1</sub>, <italic>l¯</italic><sub><italic>U<sub>m</sub></italic>+1</sub>)}
if <italic>U<sub>m</sub></italic> = 1, <italic>Z<sub>sm</sub></italic> = <italic>Z<sub>ms</sub></italic> = <italic>TRUE </italic>
% form new list where each <italic>l¯<sub>j</sub></italic> is defined relative to the activationtime point (‘0’) of
the first unit <italic>u<sub>r</sub></italic>,<sub>1</sub> in the ordered list; set pairwise flag to ‘true’ if singleunit
comparison
<italic>L<sup>old</sup></italic> ← <italic>L<sup>new</sup></italic>
<monospace>UNTIL</monospace> <italic>Pr</italic>(<italic>Q<sub><italic>l¯</italic></sub></italic> ≥ <italic>F</italic>[<italic>U<sub>m</sub></italic>, <italic>u<sub>s</sub></italic>, <italic>l¯</italic>]<italic>H</italic><sub>0</sub>) > α /<italic>R</italic> for all m, s
% Pruning steps:
Discard all <italic>U<sub>m</sub></italic> for which ∃ <italic>n</italic> ≠ <italic>m</italic>: <italic>U<sub>m</sub></italic> ⊂ <italic>U<sub>n</sub></italic>For all <italic>U<sub>n</sub></italic>, <italic>U<sub>m</sub></italic> for which ∀ <italic>u<sub>s</sub></italic> ∈ <italic>U<sub>n</sub></italic>:<italic>u<sub>s</sub></italic> ∈ <italic>U<sub>m</sub></italic> :<monospace>Remove</monospace> <italic>U<sub>m</sub></italic> if <italic>Pr</italic>(<italic>Q<sub><italic>l¯</italic></sub></italic> ≥ <italic>F</italic>[<italic>U<sub>m</sub></italic>]<italic>H</italic><sub>0</sub>) > <italic>Pr</italic>(<italic>Q<sub><italic>l¯</italic></sub></italic> ≥ <italic>F</italic>[<italic>U<sub>n</sub></italic>]<italic>H</italic><sub>0</sub>), and <italic>U<sub>n</sub></italic> otherwise
In the algorithm above ⋅ denotes the cardinality of a set, and all setoperations (∈, ⊂, etc.) are defined in terms only of the unitelements composing a set (i.e., ignoring the associated lags with which their occur).
Alternative procedure
Request a detailed protocolIn the REPEATloop, instead of probing all pairwise relations among the current lists (assemblies) and all single units from significant pairs, one could also check for significant relationships among pairs of lists ${U}_{n},{U}_{m}$. As in classical hierarchical, agglomerative clusteranalytic procedures (Gordon, 1999), at each step one may only fuse the pair $\left({U}_{n},{U}_{m}\right)$ associated with the lowest pvalue, add this to the current set of lists while removing ${U}_{n},{U}_{m}$, and repeat until $Pr({Q}_{\overline{l}}\ge F[{U}_{n},{U}_{m},\overline{l}]{H}_{0})>\alpha /R$ for all $n$, $m$. This would yield a dendrogramlike representation and thus reveal strictly hierarchical nesting among the assemblies. It comes at the cost, however, that a) many higherorder assemblies may go undetected, and b) partial overlap among assemblies, which one may expect if the units in assemblies act like ‘letters in words’, would be prohibited by the definition of the agglomerative procedure. Also note that hierarchical nesting, to the degree present, could also be revealed with the definition of the agglomeration scheme in the pseudocode above if subsets of further agglomerated sets are not pruned away at the end.
Construction of synthetical ‘groundtruth’ data
Request a detailed protocolTo test the full assembly detection schemes developed above, artificial spike trains from 50 cells were created according to inhomogeneous Poisson processes by drawing interspikeintervals from an exponential distribution with rate parameter ${\lambda}_{it}$ for each unit $i$. Instantaneous firing rates ${\lambda}_{it}$ were governed by an underlying stable firstorder autoregressive process
with coefficient matrix $\mathbf{D}$, and $E\left[{\mathit{\epsilon}}_{\mathit{t}}{\mathit{\epsilon}}_{{\mathit{t}}^{\mathbf{\prime}}}^{\mathit{T}}\right]=0$ for all $t\ne {t}^{\text{'}}$ (white noise process). (Note that although $\mathbf{D}$ is set such that the autoregressive process itself is stationary, i.e. $max\left(eig\left(\mathbf{D}\right)\right)<1$, it implies fluctuations in the firing rate which makes the Poisson spiking processes themselves nonstationary in our definition above). Since $\mathbf{s}}_{t$, in principle, is unbounded (in particular, can assume negative values), it was pushed through a sigmoid nonlinearity
with erf the error function, and constant mean rate vector $\overline{\lambda}$. Finally, to ensure a refractory period, a constant delay ${\tau}_{ref}$ is added to each interspikeinterval. Where not indicated otherwise, parameters used for the simulations were $\overline{\lambda}=5Hz$ for all units, ${\tau}_{ref}=15ms$, $\mathbf{D}=0.9\text{}\mathbf{I}$, where $\mathbf{I}$ denotes the identity matrix, $\upsilon =0.2$, ${\sigma}_{s}=0.01$.
Assemblies of all five types illustrated in Figure 1A were embedded within the same set of 50 spike trains as disjunctive groups of 5 neurons each. Note that since our algorithm is aimed at detecting significant spike time patterns (rather than, for instance, underlying connectivity), explicit control of such patterns and spike train statistics with vivolike characteristics is most important for a ground truth check, while adding more biophysical realism to the underlying simulation setup would not help in this case. For assembly type I, each occurrence is marked by five precisely synchronous spikes across the set of assembly neurons (e.g. [Harris et al., 2003; Miller et al., 2014]). For assembly type II, spikes follow a precise sequential pattern across the set of assembly neurons on each instance of activation (Lee and Wilson, 2002; Diba and Buzsáki, 2007). Time lags between spikes were drawn from a uniform distribution [0 0.1] s, and then fixed for each occurrence. For assembly type III, spikes across the set of assembly neurons followed a precise temporal pattern, but did not exhibit a strict temporal order, i.e. each neuron could contribute one to several spikes to the assembly pattern without strictly leading or following others (e.g. [Ikegaya et al., 2004]). For the simulations, these patterns were generated by distributing a few spikes at a Poisson rate of 10 Hz across a period of 0.2 s for each assembly neuron, but then keeping these patterns fixed on each occasion of assembly activation.
For the less precise assembly type IV, short windows of extra spikes for each assembly neuron were organized in a specific temporal pattern, with the exact occurrence of the extra spikes within the defined time windows determined randomly on each repetition (cf. Figure 1A; e.g. [Friedrich et al., 2004; Euston et al., 2007; Luczak et al., 2007; Peyrache et al., 2009; Adler et al., 2012]). Specifically, time windows of 0.3 s with extra spikes at a Poisson rate of 10 Hz were (without loss of generality) arranged in a sequential order, with the time lag between these windows drawn from a uniform distribution, [0 0.4] s. While this sequential ordering of time windows was fixed, within each window spikes were drawn at random on each assembly repetition. Assembly type V, finally, was simply defined by an increase of the Poisson firing rate from 5 Hz to 10 Hz for periods of 1 s simultaneously within the set of assembly neurons, as, e.g., during the delay period of a working memory task (e.g. [Fuster, 1973]).
For all assembly patterns, all spikes from the background process were erased within a $\pm 15ms$ window around each assembly spike to preserve the refractory period. Assembly activation times were distributed (uniformly) randomly across the whole spike time series.
Performance evaluation: Low sample size limit and corrupted spike trains
Request a detailed protocolTo evaluate the performance, statistical power, and potential biases of our assembly detection algorithm more systematically, we focused on two experimentally relevant scenarios: Low assembly occurrence rates and spike sorting errors. The Rand index (Rand, 1971) was used to quantify the match between predefined assemblies and those retrieved by the algorithm. The Rand index measures the agreement between two partitions, in our case of units into assemblies, and is defined as
where $r$ is the number of unit pairs correctly assigned to the same assembly in both partitions, $s$ the number of unit pairs correctly assigned to two different assemblies, and $n$ the total number of detected assembly units. $R\left(r,s\right)$ varies between 0 and 1, assuming 1 only if the assembly structure extracted by the algorithm exactly maps onto the one predefined. To obtain a clean picture on the algorithm’s statistical performance for each assembly type, unconfounded by the presence of other assembly types, in these analyses each assembly type from Figure 1 was investigated separately (i.e., unlike the analyses described in the main text where the different assembly types were mixed in the same simulations).
Figure 2A plots $R\left(r,s\right)$ for all five assembly types from Figure 1 as a function of total assembly occurrences (in spike time series of length $T$=1400 s). Obviously, assembly detection gradually degrades as these structures start to drown in the noise but, as Figure 2A reveals, this only happens when the occurrence rates drop below ~0.18 repetitions/sec. Likewise, as shown in Figure 2B, more than ~30% of all spike times need to be corrupted by spike sorting errors (assignment of assembly spikes to wrong units) before performance notably decays. In more detail, Figure 2A and B suggest that sequential assembly patterns (types II and IV) are more vulnerable to lower sample sizes than those assemblies defined through precisely or broadly, respectively, synchronous firing (types I and V). The likely reason for this is that the binning procedure itself introduces some noise (as spikes may fall randomly into one or the other of two neighboring bins), which affects sequential assemblies more than simultaneous assemblies (for which, in the case of our simulations, it is guaranteed that aligned groups of spikes end up in the same bin). Considering just assembly types I and V, also note that assemblies defined by broader simultaneous rate increases are detected much more easily than those characterized by precise spiking. This is not surprising given that in absolute terms each incidence of an assembly of type V contributes much more spikes to the whole process than an assembly of type I, thus in turn causing assemblies of type V being detected across a much larger range of bin widths than assemblies of type I (cf. Figure 1C). Perhaps most importantly, however, as studied in Figure 2C–D, our statistical framework is quite conservative and rarely produces false positives in the simulated scenarios, with a false discovery rate (fraction of units incorrectly assigned to an assembly) mostly remaining below or around 0.5% across all conditions examined.
Experimental procedures
Request a detailed protocolThe invivo recordings from the rat (LongEvans) anterior cingulate cortex (ACC) were taken from two studies by Hyman et al. (Hyman et al., 2012; Hyman et al., 2013). In both studies, multiple single unit recordings were performed with a set of 16 simultaneously implanted tetrodes, with an average of 35 and 30 isolated (and artifactfree) units per recording session for the environmental exploration and delayed alternation task, respectively (with n=9 and n=11 sessions in total). In the environmental exploration task studied in Hyman et al. (Hyman et al., 2012), rats were offloaded in a novel environment which they were free to explore, with one to several transfers between two different environments. Each environment was analyzed separately by concatenating the spike trains associated with the repeated exposures to the same environment. The delayed alternation task studied in Hyman et al. (Hyman et al., 2013), a classical working memory paradigm, took place in a Skinnerbox with two levers which the animals had to press in alternating fashion. A delay of 10 s was introduced between each lever press and a nose poke the animals had to perform on the side opposite to the levers before continuing with the next lever press.
Hippocampal and entorhinal cortex (EC) recordings on the exploration task, performed simultaneously within these two areas, were borrowed from (Mizuseki et al., 2013). Recordings were collected from three LongEvans rats implanted with multishank (32 or 64 sites) silicon probes lowered into the CA1 hippocampal pyramidal layer and into layers 3–5 of entorhinal cortex. In this task (Mizuseki et al., 2009), rats were free to explore a 180 cm x 180 cm arena with water or Froot Loop items randomly dispersed throughout. Here we analyzed n=28 sessions (selecting always the longest session from each day) with on average 22 (CA1) and 19 (EC) artefactfree units per session, respectively. CA1 and EC recordings on the delayed alternation task come from (Pastalkova et al., 2008), who used the same animals employed on the exploration task (Mizuseki et al., 2009), from which we took n=23 sessions (again selecting the longest from each day) with on average 28 (CA1) and 22 (EC) isolated and reasonably artefactfree units. Animals had to alternate between the two arms of a figureeight shaped maze to obtain reward at water spouts located at the rear of the arms. A delay of 10 or 20 s, respectively, spent in a running wheel, was inserted between trials for the two animals tested. For all analyses, all units with average firing rates below 0.2 Hz were excluded. Please see original publications for further details on electrode placement, unit separation, and experimental design. CA1 and EC datasets are publicly available at www.crcns.org; ACC datasets will be made available at https://www.zimannheim.de/en/research/departmentsresearchgroupsinstitutes/theorneurosciencee/informationcomputationalneurosciencee.html.
Appendix
Relation to previous methodological approaches
Numerous other statistical procedures for detecting assemblies or sequential patterns have been previously proposed (Abeles and Gerstein, 1988; Abeles and Gat, 2001, 2001b; Tetko and Villa, 2001a; Grün et al., 2002a, 2002b; Harris, 2005; Pipa et al., 2008; Sastry and Unnikrishnan, 2010; Staude et al., 2010a, 2010b; Humphries, 2011; LopesdosSantos et al., 2011; Gansel and Singer, 2012; Gerstein et al., 2012; Shimazaki et al., 2012; PicadoMuiño et al., 2013; Torre et al., 2013, 2016a; LopesdosSantos et al., 2013; Billeh et al., 2014; Logiaco et al., 2016), but most of these adhere to one or the other theoretical conceptualization of a cell assembly (cf. Figure 1A), or become (computationally) impractical for larger cell numbers or multiple lags, e.g. because they rely on timeconsuming bootstrap analyses (e.g. [Abeles and Gat, 2001; Pipa et al., 2008; Fujisawa et al., 2008; Gansel and Singer, 2012; PicadoMuiño et al., 2013; Torre et al., 2013, 2016a]).
Along similar lines as our procedure, unitary event analysis scans simultaneously recorded spike trains for precise spike cooccurrences (Figure 1A,I) that exceed the joint spike probability predicted from independent Poisson processes with the same local rate (Grün et al., 2002a, 2002b). However, this procedure has not been extended yet to multiple lags (but see Torre et al. (2016a)) or larger bins (with higher counts), and deals with nonstationarity through sliding windows or bootstrap analyses. In another approach to synchronous spikecluster detection based on the cumulants of the population spike density of all simultaneously recorded neurons, Staude et al. (Staude et al., 2010a, 2010b) developed a method and stringent statistical test for checking the presence of higherorder (lag0) correlations among neurons, without however providing the identity of the recorded assembly units. A recent ansatz by Shimazaki et al. (Shimazaki et al., 2012) builds on a statespace model for Poisson point processes developed by Smith and Brown (Smith and Brown, 2003) to extract higherorder (lag0) precise correlation patterns from multiple simultaneously recorded spike trains (see also (Pipa et al., 2008; Gansel and Singer, 2012; PicadoMuiño et al., 2013; Torre et al., 2013, 2016b; Billeh et al., 2014) for other recent approaches to the detection of groups of synchronous single spikes).
Smith et al. (Smith and Smith, 2006; Smith et al., 2010) address the problem of testing significance of recurring spike time sequences or activity chains like those observed in hippocampal place cells (Figure 1A, II, IV; see also [Abeles and Gerstein, 1988; Abeles and Gat, 2001; Lee and Wilson, 2004; Fujisawa et al., 2008; Gerstein et al., 2012]). Their approach makes use only of the order information in the neural activations, neglecting exact relative timing of spikes or even the number of spikes emitted by each neuron, in order to allow for derivation of exact probabilities based on the multinomial distribution and combinatorial considerations. In a similar vein, Sastry & Unnikrishnan (Sastry and Unnikrishnan, 2010) employ data mining techniques like 'marketbasket analysis' and the apriorialgorithm to combat the combinatorial explosion problem in sequence detection, scanning first for significant sequential pairs, then based on this subset of pairs for triples, quadruples, and so on, iteratively narrowing down the search space as potential sequences become longer. Several procedures for revealing common modulations in firing rate have been proposed as well (Peyrache et al., 2010; LopesdosSantos et al., 2011; Humphries, 2011; LopesdosSantos et al., 2013).
Although some of these techniques are related in one or the other aspect to our algorithm, none of them, to our knowledge, combines all of the features presented here. Our procedure combines fast parametric (bootstrapfree) testing with a fast agglomeration algorithm. This enables to consider, in a heuristic sense, all potential cell combinations with a large range of different temporal lags. We furthermore introduce a novel way for dealing with nonstationarity which is fast and allows to utilize the complete data set for assembly estimation, rather than slicing it into sufficiently short windows or using computationally demanding bootstrapping. Finally, we showed how a parametric statistic for evaluating the deviation of joint spike distributions from independence can be obtained also for series of counts larger than one by splitting the process into several binary streams. This enables to treat processes developing on slower scales (larger bin widths) within the same statistical framework without loss of information, another novel feature introduced here. The statistical tools developed here may thus allow to readdress important questions about the nature of neural coding in different brain areas, without requiring the researcher to sign up for any particular assembly concept or theoretical framework a priori.
Comparison with linear decomposition and correlationbased methods
The most popular choices for studying pairwise interactions and (synchronous) multipleunit structures are, respectively, (Pearsontype) crosscorrelations (e.g. [Shadlen and Newsome, 1998; Brody, 1999]) and principal component analysis (PCA; e.g. [Peyrache et al., 2010; LopesdosSantos et al., 2011]), owing to their methodological simplicity. In comparison to the methods developed here, there are a number of important issues to note:
First, both standard crosscorrelation and PCA are purely linear techniques. For strictly binary series, the Pearson crosscorrelation is equivalent to computing the deviation of the joint spiking probability from the product of its marginals in the numerator, p(A,B)p(A)p(B) (e.g., [QuirogaLombard et al., 2013]). For larger counts or more than two units, however, crosscorrelations and PCA, unlike our method, do not capture nonlinear interactions or higherorder joint probabilities. Indeed, PCA, strictly, does not even extract correlations among units but rather variancemaximizing directions from the multiple singleunit activity space (e.g. [Krzanowski, 2000; Durstewitz, In press]), which is a different objective and may lead to results different from methods aimed directly at correlations (e.g., [Yu et al., 2009]).
Second, importantly, for either crosscorrelation or PCA based methods, statistical significance of the unraveled relationships or structures needs to be properly assessed. In particular, false positives should be avoided. This is in itself a nontrivial topic: Several authors have pointed out in the past that interpretation of crosscorrelations may be severely plagued by the presence of (inevitable) nonstationarity (like slow rate covariations or stimulus responses) which may compromise ‘traditional’ testing by means of, e.g., mere bin or interspikeinterval shuffling (Brody, 1999; Grün, 2009; QuirogaLombard et al., 2013). Indeed, as illustrated in Appendix 1—figure 1, superfluous peaks in the crosscorrelation may occur, and even flagged up as significant by conventional bootstrapping, although in reality spike times for the two units were drawn independently. Hence, more sophisticated bootstrapping and sliding window analyses have to be afforded that take into account autocorrelations in the time series and nonstationarity (Davison and Hinkley, 1997). But these imply that statistical quantities need to be recalculated hundreds to thousands of times, a heavy computational burden that may severely restrict assembly assessment to a limited number of units or few possible lag constellations and temporal scales only (given that this is an NPhard combinatorial optimization problem [Nakahara and Amari, 2002; Staude et al., 2010]). In fact, the majority of assembly detection methods focus mostly on synchronous activity (Wilson and McNaughton, 1994; Grün et al., 2003; Peyrache et al., 2010; LopesdosSantos et al., 2011). Moreover, these methods usually come with some adhoc choices, e.g., the to be used window or block length (in case of block permutations; e.g. [Efron and Tibshirani, 1993]), for which there is likely no globally optimal choice across the whole time series. For PCA based methods, sometimes the theoretical MarčenkoPastur distribution has been used for assessing significance of eigenvalues (Peyrache et al., 2010; LopesdosSantos et al., 2011), but as illustrated in Appendix 1—figure 2, this may be quite misleading especially in the case of nonstationary data.
Third, crosscorrelation analysis still needs to be augmented with some agglomeration scheme that builds up higherorder structures from the pairwise interactions, again a nontrivial endeavor in its own right, especially if various time lag constellations are to be evaluated. PCA, on the other hand, in its basic and most applied formulation only recovers strictly synchronous activity. PCA also comes with a number of other inherent shortcomings, (a) because it is not geared toward identifying correlations but, as noted above, variancemaximizing directions (Krzanowski, 2000; Yu et al., 2009; Durstewitz, In press), (b) because it may produce assemblies of very unequal size since it places most variation on the first factor, after which variance contributions often fall off exponentially, and (c) because of quite high susceptibility to noise if assembly structures are not very clearcut (Appendix 1—figure 3).
Our algorithm aims to address all these statistical and computational issues in one go, provides proper and fast statistical assessment of unit interactions, unconfounded by nonstationarity within the limits shown (Figure 7, Figure 7—figure supplements 1,2), does this for all possible lag constellations (not just synchrony), and comes with an efficient, statistically anchored algorithm for detecting higherorder structures.
References

1
Detecting precise firing sequences in experimental dataJournal of Neuroscience Methods 107:141–154.https://doi.org/10.1016/S01650270(01)003648

2
Detecting spatiotemporal firing patterns among simultaneously recorded single neuronsJournal of Neurophysiology 60:909–924.

3
Corticonics: Neural Circuits of the Cerebral CortexCambridge: Cambridge University Press.

4
Temporal convergence of dynamic cell assemblies in the striatopallidal networkJournal of Neuroscience 32:2473–2484.https://doi.org/10.1523/JNEUROSCI.483011.2012
 5

6
Neuronal avalanches in neocortical circuitsJournal of Neuroscience 23:11167–11177.

7
Revealing cell assemblies at multiple levels of granularityJournal of Neuroscience Methods 236:92–106.https://doi.org/10.1016/j.jneumeth.2014.08.011

8
Time Series Analysis: Forecasting and ControlTime Series Analysis: Forecasting and Control.

9
Correlations without synchronyNeural Computation 11:1537–1551.https://doi.org/10.1162/089976699300016133

10
Neuronal oscillations in cortical networksScience 304:1926–1929.https://doi.org/10.1126/science.1099745
 11
 12
 13

14
Forward and reverse hippocampal placecell sequences during ripplesNature Neuroscience 10:1241–1242.https://doi.org/10.1038/nn1961
 15

16
Neurocomputational models of working memoryNature Neuroscience 3:1184–1191.https://doi.org/10.1038/81460
 17
 18
 19

20
Time cells in the hippocampus: a new dimension for mapping memoriesNature Reviews. Neuroscience 15:1–13.https://doi.org/10.1038/nrn3827
 21
 22
 23

24
Multiplexing using synchrony in the zebrafish olfactory bulbNature Neuroscience 7:862–871.https://doi.org/10.1038/nn1292
 25

26
Behaviordependent shortterm assembly dynamics in the medial prefrontal cortexNature Neuroscience 11:823–833.https://doi.org/10.1038/nn.2134

27
Unit activity in prefrontal cortex during delayedresponse performance: neuronal correlates of transient memoryJournal of Neurophysiology 36:61–78.

28
Detecting multineuronal temporal patterns in parallel spike trainsFrontiers in Neuroinformatics 6:18.https://doi.org/10.3389/fninf.2012.00018
 29

30
Detecting synfire chains in parallel spike dataJournal of Neuroscience Methods 206:54–64.https://doi.org/10.1016/j.jneumeth.2012.02.003
 31
 32
 33
 34

35
Effect of crosstrial nonstationarity on jointspike eventsBiological Cybernetics 88:335–351.https://doi.org/10.1007/s0042200203862

36
Datadriven significance estimation for precise spike correlationJournal of Neurophysiology 101:1126–1140.https://doi.org/10.1152/jn.00093.2008
 37
 38

39
Neural signatures of cell assembly organizationNature Reviews. Neuroscience 6:399–407.https://doi.org/10.1038/nrn1669

40
The Elements of Statistical Learning: Data Mining, Inference, and PredictionSpringer US.
 41

42
Spiketrain communities: finding groups of similar spike trainsJournal of Neuroscience 31:2321–2336.https://doi.org/10.1523/JNEUROSCI.285310.2011
 43

44
Action and outcome activity state patterns in the anterior cingulate cortexCerebral Cortex 23:1257–1268.https://doi.org/10.1093/cercor/bhs104
 45
 46

47
Principles of Multivariate Analysis: A User’s PerspectiveNew York: Oxford Univ Press.

48
How precise is neuronal synchronization?Neural Computation 7:469–485.https://doi.org/10.1162/neco.1995.7.3.469
 49
 50
 51

52
Prediction, sequences and the hippocampusPhilosophical Transactions of the Royal Society of London. Series B, Biological Sciences 364:1193–1201.https://doi.org/10.1098/rstb.2008.0316
 53
 54
 55

56
Detecting cell assemblies in large neuronal populationsJournal of Neuroscience Methods 220:149–166.https://doi.org/10.1016/j.jneumeth.2013.04.010
 57
 58
 59
 60

61
Multiple single unit recordings from different rat hippocampal and entorhinal regions while the animals were performing multiple behavioral tasksContributed by Gyorgy Buzsáki Lab, New York University.
 62

63
Informationgeometric measure for neural spikesNeural Computation 14:2269–2316.https://doi.org/10.1162/08997660260293238
 64
 65

66
Principal component analysis of ensemble recordings reveals cell assemblies at high temporal resolutionJournal of Computational Neuroscience 29:309–325.https://doi.org/10.1007/s1082700901546

67
Replay of rulelearning related neural patterns in the prefrontal cortex during sleepNature Neuroscience 12:919–926.https://doi.org/10.1038/nn.2337

68
Finding neural assemblies with frequent item set miningFrontiers in Neuroinformatics 7:9.https://doi.org/10.3389/fninf.2013.00009

69
NeuroXidence: reliable and efficient analysis of an excess or deficiency of jointspike eventsJournal of Computational Neuroscience 25:64–88.https://doi.org/10.1007/s1082700700653

70
Method for stationaritysegmentation of spike train data with application to the Pearson crosscorrelationJournal of Neurophysiology 110:562–572.https://doi.org/10.1152/jn.00186.2013

71
Objective criteria for the evaluation of clustering methodsJournal of the American Statistical Association 66:846–850.https://doi.org/10.1080/01621459.1971.10482356
 72
 73
 74

75
Simultaneously recorded single units in the frontal cortex go through sequences of discrete and stable states in monkeys performing a delayed localization taskJournal of Neuroscience 16:752–768.
 76

77
The variable discharge of cortical neurons: implications for connectivity, computation, and information codingJournal of Neuroscience 18:3870–3896.

78
Statespace analysis of timevarying higherorder spike correlation for multiple neural spike train dataPLoS Computational Biology 8:e1002385.https://doi.org/10.1371/journal.pcbi.1002385

79
Visual feature integration and the temporal correlation hypothesisAnnual Review of Neuroscience 18:555–586.https://doi.org/10.1146/annurev.ne.18.030195.003011
 80
 81

82
Estimating a statespace model from point process observationsNeural Computation 15:965–991.https://doi.org/10.1162/089976603765202622

83
Probability of repeating patterns in simultaneous neural dataNeural Computation 22:2522–2536.https://doi.org/10.1162/NECO_a_00020
 84

85
Higherorder correlations in nonstationary parallel spike trains: statistical modeling and inferenceFrontiers in Computational Neuroscience 4:1–17.https://doi.org/10.3389/fncom.2010.00016

86
CuBIC: cumulant based inference of higherorder correlations in massively parallel spike trainsJournal of Computational Neuroscience 29:327–350.https://doi.org/10.1007/s108270090195x
 87
 88

89
Some representations of the multivariate Bernoulli and binomial distributionsJournal of Multivariate Analysis 32:256–268.https://doi.org/10.1016/0047259X(90)90084U

90
ASSET: analysis of sequences of synchronous events in massively parallel spike trainsPLoS Computational Biology 12:e1004939.https://doi.org/10.1371/journal.pcbi.1004939

91
Statistical evaluation of synchronous spike patterns extracted by frequent item set miningFrontiers in Computational Neuroscience 7:132.https://doi.org/10.3389/fncom.2013.00132

92
Synchronous spike patterns in macaque motor cortex during an instructeddelay reachtograsp taskJournal of Neuroscience 36:8329–8340.https://doi.org/10.1523/JNEUROSCI.437515.2016
 93

94
Gaussianprocess factor analysis for lowdimensional singletrial analysis of neural population activityJournal of Neurophysiology 102:614–635.https://doi.org/10.1152/jn.90941.2008

95
The cortex as a central pattern generatorNature Reviews Neuroscience 6:477–483.https://doi.org/10.1038/nrn1686
Decision letter

Marc HowardReviewing Editor; Boston University, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Cell assemblies at multiple time scales with arbitrary lag distributions" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom, Marc Howard (Reviewer #1), is a member of our Board of Reviewing Editors and the evaluation has been overseen by David Van Essen as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Peter König (Reviewer #2); Bruno Averbeck (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary:
Noting the many definitions of “cell assembly” the paper introduces a novel technique for measuring coordinated motifs of sequential firing. This method is computationally efficient and, because it makes minimal assumptions about scale, could be widely used to clarify the notion of cell assemblies, a muchneeded theoretical development. Results from different brain regions in different behavioral tasks show a range of time scales suggesting that these reflect a diversity of mechanisms supporting sequences of firing patterns.
Essential revisions:
1) It is essential that a revision better discuss the filtering properties implicit in the method and address a specific concern:
The binning is equivalent to a boxcar average, i.e. temporal filtering with a rectangular window. I could say no window, see Press, W. H., & Teukolsky, S. A. (1988). Numerical recipes in C. Cambridge Figure 2.4.2. This implies that different frequencies are mapped to very different frequency bins. As here no Fourier analysis is performed, in itself it is not that bad. However, the comparison with the time reversed pattern is for different patterns at different distance. Thus, the implicit filtering is suddenly very important. At the very least, the sensitivity varies for patterns of different distance. Moreover, for the important case of perfect 0lag synchrony, the choice of reference is important. Too close by might lead to a spill over, too far away might allow non stationarities to creep in.
This problem might be addressed either by a convolution with e.g. a Gaussian kernel before binning, in order to create well defined and better behaved temporal properties. And/or, a standard and uniform across tests distance to the reference bin could help. Alternatively, I'm open to very good arguments, why it is not a problem after all jointly with an investigation of sensitivity.
2) A revision should better discuss the generality of the method. The method is advertised as a unified tool for all cases. Although I like it a lot, there might be a few gaps. Consider an avalanche model, where the avalanche might propagate in either one of the other direction. This creates a pair of above chance coactivation with reverse sequence, the present method is blind to. Such limitations have to be better discussed.
https://doi.org/10.7554/eLife.19428.022Author response
[…]
Essential revisions:
1) It is essential that a revision better discuss the filtering properties implicit in the method and address a specific concern:
The binning is equivalent to a boxcar average, i.e. temporal filtering with a rectangular window. I could say no window, see Press, W. H., & Teukolsky, S. A. (1988). Numerical recipes in C. Cambridge Figure 13.4.2. This implies that different frequencies are mapped to very different frequency bins. As here no Fourier analysis is performed, in itself it is not that bad. However, the comparison with the time reversed pattern is for different patterns at different distance. Thus, the implicit filtering is suddenly very important. At the very least, the sensitivity varies for patterns of different distance.
Yes, we agree, this is a very valid point, and we appreciate the referees' insight on this.
As the referees noted, there are two steps in our method that may imply some sort of temporal filtering. The first is the choice of bin size itself, as larger bins imply stronger lowpass filtering. This is, we think, however, not a problematic, or in fact even desired, feature of our algorithm, since the purpose of evaluating larger bins is precisely to search for structure with coarser granularity, averaging across finer temporal details (which themselves are addressed through the choice of smaller bin widths).
The second type of 'filtering', as the referees indicate, is through the choice of target and reference lag. It may be important to first stress that this ‘filtering’ does not affect the detection of significant patterns at a given lag, other than through the fact that nonstationarity is removed 'only' at time scales defined by this lag (see subsection “Statistical test for pairwise interaction“). This is to say that our significance test will always be valid as long as the process is stationary across at least the length of the lag considered.
However, it may affect the comparisons of patterns at different lags, as essentially for the smaller lags nonstationarity on finer time scales is corrected (in this sense enhancing temporal ‘sensitivity’, as the referees correctly pointed out).
We address this issue in a new section (“Choice of reference (correction) lag”) where we generally advice choosing the reverse lag, for several reasons, unless an explicit comparison of the relevance of patterns spanning different lag sizes is central to the research question. In this case, where this issue of different ‘sensitivities’ cannot be fully avoided, we now recommend choosing ${\text{l}}_{\text{max}}+1$ (where ${\text{l}}_{\text{max}}$ is the maximum lag tested) as a common reference lag for all target lags considered (thus eliminating differential ‘filtering’ as a possible confound).
Although there are theoretical reasons why sometimes one choice of reference lag may be preferred over the other, additional analyses we have run now on the experimental data sets (also reported in the new section “Choice of reference (correction) lag”) suggest that, in practice, the precise choice of reference lag is not overly crucial and the significant pairs detected by one or the other method largely overlapped (>96%, including the 0lag case).
Moreover, for the important case of perfect 0lag synchrony, the choice of reference is important. Too close by might lead to a spill over, too far away might allow non stationarities to creep in.
Yes, this is correct, and the most pragmatic solution to the problem may be to evaluate 0lag assemblies for a few different reference lags to confirm that the structure unraveled is largely insensitive to this precise choice (this is now included in the revised text as an explicit recommendation in the Discussion section and Materials and methods section).
We have evaluated this issue now in more depth by probing the detection of synchronous assemblies at both fine (‘type I’, cf. Figure 1) and broad (‘type V’) time scales with a set of different reference lags (new Figure 1—figure supplement 1). While we observed that indeed using too small a reference lag (${\text{l}}^{\text{*}}=1$) starts losing some structure while too large lags (${\text{l}}^{\text{*}}<3$) diminish temporal sensitivity, generally these effects were rather mild in the sense that detection of 0lag structure was, once again (see above), rather robust and independent of the precise choice of reference lag. Thus, from these theoretical and empirical considerations, one may derive the recommendation to use ${\text{l}}^{\text{*}}=2$ as a default for the synchronous case, as in the present manuscript, to safeguard against potential spillover issues, but possibly probe with a few different reference lags if unsure about this particular choice for a given experimental situation.
This problem might be addressed either by a convolution with e.g. a Gaussian kernel before binning, in order to create well defined and better behaved temporal properties. And/or, a standard and uniform across tests distance to the reference bin could help. Alternatively, I'm open to very good arguments, why it is not a problem after all jointly with an investigation of sensitivity.
We have addressed this issue as indicated in our previous replies above, that is, by basically following the referees' second and third recommendation, providing both some theoretical arguments for the choices made, but also rerunning analyses with a common reference lag, as suggested, and performing some sensitivity analyses for the lag0 case (reported in the new sect. “Choice of reference (correction) lag”).
Convolution with a Gaussian kernel is a nice idea which we have frequently relied on in the past (e.g., Durstewitz et al. 2010, Lapish et al., 2015, J Neurosci, 35(28):10172–10187), but is less straightforward to apply in the present case since, strictly, our test statistic is derived for count (integervalued) variables (although extensions are certainly possible).
2) A revision should better discuss the generality of the method. The method is advertised as a unified tool for all cases. Although I like it a lot, there might be a few gaps. Consider an avalanche model, where the avalanche might propagate in either one of the other direction. This creates a pair of above chance coactivation with reverse sequence, the present method is blind to. Such limitations have to be better discussed.
We thank the referees for bringing this up. Sequences repeated in reverse direction are indeed an important special case (please note, however, that avalanches, in particular, at least by some authors have been attributed to feedforward networks like synfire chains (e.g., Beggs and Plenz, 2003), which, if we are not mistaken, would imply directionality?). In the revised version, we have now included this point about reverse sequences, and have extended it into a wider discussion about the choice of reference lag and its implications and limitations (sect. ‘Limitations of parametric testing under nonstationarity’, new subsection “Choice of reference (correction) lag”). We have also included a new paragraph on limitations associated with the proposed difference statistic and choice of reference lag into the main Discussion section.
For the specific case of reverse sequences, we note that another choice of reference lag may provide a remedy, as discussed now in subsection “Choice of reference (correction lag).
https://doi.org/10.7554/eLife.19428.023Article and author information
Author details
Funding
Deutsche Forschungsgemeinschaft (CRC1134, SP D01)
 Daniel Durstewitz
Bundesministerium für Bildung und Forschung (01GQ1003B)
 Daniel Durstewitz
Deutsche Forschungsgemeinschaft (DU354/81)
 Daniel Durstewitz
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We are deeply indebted to Drs. James Hyman and Jeremy Seamans for lending us their ACC data (reported in [Hyman et al., 2012; Hyman et al., 2013]) for the present analysis. The CA1 and EC data were made publicly available by the authors (Mizuseki et al., 2013) at www.crcns.org. We also would like to thank Drs. Andreas Draguhn, Martin Both, Thomas Fucke, Hazem Toutounji, Loreen Hertäg, and Grant Sutcliffe for commenting on a previous version of this article.
Reviewing Editor
 Marc Howard, Boston University, United States
Publication history
 Received: July 8, 2016
 Accepted: October 27, 2016
 Version of Record published: January 11, 2017 (version 1)
Copyright
© 2017, Russo et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 3,091
 Page views

 621
 Downloads

 21
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, Scopus, PubMed Central.