Peer review process
Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.
Read more about eLife’s peer review process.
Editors
- Reviewing Editor: Annalisa Scimemi, University at Albany, State University of New York, Albany, United States of America
- Senior Editor: Panayiota Poirazi, FORTH Institute of Molecular Biology and Biotechnology, Heraklion, Greece
Reviewer #1 (Public review):
Summary:
The authors note that while many software packages exist for spike sorting, these do not automatically differentiate, with known accuracy, between excitatory and inhibitory neurons. Moreover, most existing spike sorting packages are for in vivo use, where the majority of electrodes are separated from each other by several hundred microns or more. There is a need for spike sorting packages that can take advantage of high-density electrode arrays, where all electrodes are within a few tens of microns of one another. Here, the authors offer such a software package with SpikeMAP, and they validate its performance in identifying parvalbumin interneurons that were optogenetically stimulated.
Strengths:
The main strength of this work is that the authors use ground truth measures to show that SpikeMAP can take features of spike shapes to correctly identify known parvalbumin interneurons against a background of other neuron types. They use spike width and peak-to-peak distance as the key features for distinguishing between neuron types, a method that has been around for many years (Barthó, Peter, et al. "Characterization of neocortical principal cells and interneurons by network interactions and extracellular features." Journal of Neurophysiology 92.1 (2004): 600-608.), but whose performance has not been validated in the context of high-density electrode arrays.
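To make the classic width-based rule concrete, here is a minimal sketch (assuming numpy; the function names and thresholds are illustrative, not the authors' implementation) of extracting spike half-width and trough-to-peak duration from a mean waveform and thresholding them, in the spirit of Barthó et al. (2004):

```python
import numpy as np

def waveform_features(waveform, fs):
    """Extract two classic features from a mean spike waveform (1-D array).

    Returns (half_width_ms, trough_to_peak_ms). Assumes a negative-going
    spike, as seen on extracellular electrodes near the soma.
    """
    trough = np.argmin(waveform)
    # Trough-to-peak: time from the negative trough to the following positive peak.
    peak = trough + np.argmax(waveform[trough:])
    trough_to_peak_ms = (peak - trough) / fs * 1e3
    # Half-width: duration the waveform stays below half the trough amplitude.
    half_amp = waveform[trough] / 2.0
    below = np.where(waveform < half_amp)[0]
    half_width_ms = (below[-1] - below[0]) / fs * 1e3
    return half_width_ms, trough_to_peak_ms

def classify(half_width_ms, trough_to_peak_ms,
             width_thresh=0.25, t2p_thresh=0.5):
    """Label a unit as putative inhibitory (narrow) or excitatory (broad).

    The thresholds here are illustrative only; real cutoffs must be
    calibrated, e.g. against optotagged ground truth.
    """
    if half_width_ms < width_thresh and trough_to_peak_ms < t2p_thresh:
        return "putative inhibitory (fast-spiking)"
    return "putative excitatory (regular-spiking)"
```

In practice, thresholds of this kind have to be calibrated against ground truth, which is precisely what the optogenetic validation in this paper provides.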
Another strength of this approach is that it is automated - a necessity if your electrode array has 4096 electrodes. Hand-sorting or even checking such a large number of channels is something even the cruellest advisor would not wish upon a graduate student. With such large channel counts, it is essential to have automated methods that are known to work accurately. Hence, the combination of validation and automation is an important advance.
A nice feature of this work is that with high-density electrode arrays, the spike waveforms appear on multiple nearby electrodes simultaneously. Since spike amplitudes fall off with distance, this allows triangulation of neuron locations within the regular electrode array. Thus, spike correlations between neuron types, or within neuron types, can be plotted as a function of distance. While SpikeMAP is not the first to do this (Peyrache, Adrien, et al. "Spatiotemporal dynamics of neocortical excitation and inhibition during human sleep." Proceedings of the National Academy of Sciences 109.5 (2012): 1731-1736.), it is a welcome capability of this package.
It is also good that the code for this package is open-source, allowing a community of people (I expect in vitro labs will especially want to use this) to use the code and further improve it.
Weaknesses:
As this code was developed for use with a 4096-electrode array, it is important to be aware of double counting neurons across the many electrodes. I understand that there are ways within the code to ensure that this does not happen, but care must be taken in two key areas: First, action potentials traveling down axons will exhibit a triphasic waveform that is different from the biphasic waveform that appears near the cell body, but these two signals will still be from the same neuron (for example, see Litke et al., 2004 "What does the eye tell the brain: Development of a System for the Large-Scale Recording of Retinal Output Activity"; figure 14). I did not see anything that would directly address this situation, so it might be something for you to consider in updated versions of the code. Second, spike shapes are known to change when firing rates are high, like in bursting neurons (Harris, K.D., Hirase, H., Leinekugel, X., Henze, D.A. & Buzsáki, G. Temporal interaction between single spikes and complex spike bursts in hippocampal pyramidal cells. Neuron 32, 141-149 (2001)). I did not see this addressed in the present version of the manuscript.
Another area for possible improvement would be to build on the excellent validation experiments you have already conducted with parvalbumin interneurons. Although it would take more work, similar experiments could be conducted for somatostatin and vasoactive intestinal peptide neurons against a background of excitatory neurons. These may have different spike profiles, but your success in distinguishing them can only be known if you validate against ground truth, like you did for the PV interneurons.
Appraisal:
This work addresses the need for an automated spike sorting software package for high-density electrode arrays. Although no spike sorting software is flawless, the package presented here, SpikeMAP, has been validated on PV interneurons, inspiring a degree of confidence. This is a good start, and further validation on other neuron types could increase that confidence. Groups doing in vitro experiments, where 4096-electrode arrays are more common, could find this system particularly helpful.
Comments on revised version:
I appreciate the dialogue that has occurred over this submission. I have seen how the authors have taken into account the issues that I have raised, as well as those brought up by reviewer 2. I am satisfied that the paper has improved and is now a novel and useful contribution in the area of spike sorting.
Reviewer #2 (Public review):
Summary:
In this paper, entitled "SpikeMAP: An unsupervised spike sorting pipeline for cortical excitatory and inhibitory neurons in high-density multielectrode arrays with ground-truth validation", the authors present spikeMAP, a pipeline for the analysis of large-scale recordings of in vitro cortical activity. According to the authors, spikeMAP not only allows for the detection of spikes produced by single neurons (spike sorting), but also allows for the reliable distinction between genetically determined cell types by using viral and optogenetic strategies as ground-truth validation. While I find that the paper is nicely written and easy to follow, the algorithmic part of the paper is not really new and should have been compared more carefully to existing solutions. While the GT recordings used to assess the ability of a spike sorting tool to distinguish properly between excitatory and inhibitory neurons are interesting, spikeMAP does not seem to bring anything new relative to state-of-the-art solutions, or, at the very least, it would deserve to be properly benchmarked. This is why I suggest that the authors perform a more extensive comparison with existing spike sorters.
Strengths:
The GT recordings with optogenetic activation of opsin-expressing cells are interesting and might provide useful data to quantify how good spike sorting pipelines are, in vitro, at discriminating between excitatory and inhibitory neurons. Such an approach can be quite complementary to artificially generated ground truth.
Weaknesses:
The global workflow of spikeMAP, described in Figure 1, seems to be very similar to that of [Hilgen et al, 2017, 10.1016/j.celrep.2017.02.038]. Therefore, the first question is: what is the rationale for reinventing the wheel rather than using tools that do something very similar (as mentioned by the authors themselves)? I have a hard time, in general, believing that spikeMAP has something particularly special, given its Methods, compared to state-of-the-art spike sorters. This is why, at the very least, the title of the paper is misleading, because it lets the reader think that the core of the paper will be about a new spike sorting pipeline. If this is the main message the authors want to convey, then I think that numerous validations/benchmarks are missing to first assess how good spikeMAP is with respect to spike sorting in general, before deciding if this is indeed the right tool to discriminate excitatory vs. inhibitory cells. The GT validation, while interesting, is not enough to entirely validate the paper. The details are a bit too scarce for me, or would deserve to be better explained (see further comments below).
Regarding the putative location of the spikes, it has been shown that center of mass, while easy to compute, is not the most accurate solution [Scopin et al, 2024, 10.1016/j.jneumeth.2024.110297]. For example, it has an intrinsic bias toward positions within the boundaries of the electrodes, while other methods such as monopolar triangulation or grid-based convolution might perform better. Can the authors comment on the choice of center of mass as the sole way to triangulate the sources?
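For readers unfamiliar with the estimator under discussion, here is a minimal numpy sketch (a hypothetical helper with an illustrative neighbour count, not the authors' code) of center-of-mass localization. It also makes the boundary bias explicit: the estimate is a convex combination of electrode positions, so it can never fall outside the array.

```python
import numpy as np

def center_of_mass(positions, amplitudes, n_neighbors=6):
    """Estimate a source location as the amplitude-weighted mean of the
    positions of the channels with the largest spike amplitude.

    positions:  (n_channels, 2) electrode coordinates in um
    amplitudes: (n_channels,) per-channel spike amplitude (positive)
    Known bias: since weights are non-negative, the estimate is a convex
    combination of electrode positions and stays inside the array's hull.
    """
    idx = np.argsort(amplitudes)[-n_neighbors:]  # strongest channels
    w = amplitudes[idx]
    return (positions[idx] * w[:, None]).sum(axis=0) / w.sum()
```

On a regular grid this recovers sources inside the array reasonably well, but a source beyond the array edge is always pulled back inside, which is one of the failure modes that monopolar triangulation addresses.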
Still in Figure 1, I am not sure I really see the point of the spline interpolation. I see the point of such smoothing, but the authors should demonstrate that it has a key impact on the distinction of excitatory vs. inhibitory cells. What is special about the value of 90 kHz for a signal recorded at 18 kHz? What is the gain with spline enhancement compared to without? Does such a value depend on the sampling rate, or is it a global optimum found by the authors?
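For reference, the kind of spline upsampling in question (e.g. 18 kHz to 90 kHz) can be sketched in a few lines with scipy; this is an illustrative reimplementation, not the authors' code. Its benefit, if any, is sub-sample precision on trough and peak times, which is exactly what should be benchmarked:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def upsample_waveform(waveform, fs_in=18_000, fs_out=90_000):
    """Cubic-spline upsampling of a spike waveform.

    The 5x factor (18 kHz -> 90 kHz) is not magic: the interpolation only
    refines the trough/peak times to sub-sample precision, which in turn
    tightens width and trough-to-peak features.
    """
    t = np.arange(len(waveform)) / fs_in
    cs = CubicSpline(t, waveform)
    t_fine = np.arange(0, t[-1], 1.0 / fs_out)
    return t_fine, cs(t_fine)
```

Whether this sub-sample refinement measurably improves the E/I separation, compared to working at the native sampling rate, is the comparison the authors should report.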
Figure 2 is not really clear, especially panel B. The choice of the time scale for panel B might not be the most appropriate, and the filtered/unfiltered legend with a dot in Bii is not clear to me. In panel E, the authors make two clusters from PCA projections of single waveforms. Does this mean that the PCA is applied only to the main waveforms, i.e., the ones obtained where the amplitudes peak the most? This is not really clear from the Methods, but if this is the case, then the approach is a bit simplistic and does not really match state-of-the-art solutions. Spike waveforms quite often, especially with such high-density arrays, cover multiple channels at once, and thus the extracellular patterns triggered by single units on the MEA are spatio-temporal motifs occurring on several channels. This is why, in modern spike sorters, the information in a local neighbourhood is often kept and projected, via PCA, onto a lower-dimensional space before clustering. Information on a single channel only might not be informative enough to disambiguate sources. Can the authors comment on that, and what is the exact spatial resolution of the 3Brain device? The way the authors perform the SVD should be clarified in the Methods section: is it on a single channel, and/or on multiple channels in a local neighbourhood?
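The multi-channel alternative described above can be sketched as follows (assuming numpy; the function name, radius, and component count are illustrative, not taken from the manuscript): waveforms from all channels in a local neighbourhood are concatenated into one feature vector per spike before the SVD, so the clustering sees the spatio-temporal motif rather than a single trace.

```python
import numpy as np

def neighborhood_features(snippets, chan_pos, main_chan, radius_um=60.0,
                          n_components=3):
    """Project spike snippets onto principal components computed over a
    local multi-channel neighbourhood rather than the peak channel alone.

    snippets: (n_spikes, n_channels, n_samples) extracted waveforms
    chan_pos: (n_channels, 2) electrode coordinates in um
    main_chan: index of the channel where the amplitude peaks
    """
    d = np.linalg.norm(chan_pos - chan_pos[main_chan], axis=1)
    nbrs = np.where(d <= radius_um)[0]
    # Concatenate the neighbourhood channels into one vector per spike,
    # so the PCA sees the spatio-temporal motif, not a single trace.
    X = snippets[:, nbrs, :].reshape(len(snippets), -1)
    X = X - X.mean(axis=0)
    # SVD of the centered data matrix gives the principal components.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T
```

Two units that look identical on their peak channel but differ on a neighbouring channel are separable in this feature space, whereas single-channel PCA cannot distinguish them.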
About the isolation of single units, here again I think the manuscript lacks some technical details. The authors say that they use k-means cluster analysis with k=2, which means they are explicitly looking for two clusters per electrode. If so, this is a really strong assumption that should not be made in the context of spike sorting: because it is a blind source separation problem, one cannot pre-determine how many sources are present in the vicinity of a given electrode. While the illustration in Figure 2E is fine, there is no guarantee that one could not find more clusters, so why this choice of k=2? Again, this is why most modern spike sorting pipelines do not rely on k-means, to avoid any hard-coded number of clusters. Can the authors comment on that?
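One simple alternative to a hard-coded k=2 is to select k with an internal validity index. The sketch below (assuming numpy and scikit-learn; illustrative only, not the authors' method, and modern sorters typically use density-based or template-matching approaches instead) uses the silhouette score:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_with_model_selection(X, k_max=5, random_state=0):
    """Choose the number of clusters by silhouette score instead of
    hard-coding k=2, since more than two sources can sit in the
    vicinity of one electrode.

    X: (n_spikes, n_features) PCA projections. Returns (labels, best_k).
    """
    best_k, best_score = 2, -1.0
    best_labels = np.zeros(len(X), dtype=int)
    for k in range(2, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = km.fit_predict(X)
        score = silhouette_score(X, labels)  # higher = better-separated
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return best_labels, best_k
```

This is only one of many model-selection criteria (BIC on Gaussian mixtures, density-based clustering, etc.), but any of them would remove the fixed-k assumption the reviewer objects to.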
I am surprised by the linear decay of the maximal amplitude as a function of the distance from the soma, as shown in Figure 2H. Is this really what should be expected? Based on the properties of the extracellular medium, shouldn't we expect a power law for the decay of the amplitude? It is strange that up to 100 um away from the soma, the maximal amplitude only drops from 260 to 240 uV. Can the authors comment on that? It would be interesting to plot this for all recorded neurons, in a normalized manner (V/max(V)) as a function of distance, to see what the curve looks like.
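The suggested check can be run directly on the amplitude-distance data: fit both a linear and a power-law model and compare residuals. A minimal numpy sketch (a hypothetical helper, not the authors' analysis) follows:

```python
import numpy as np

def fit_decay(dist_um, amp_uv):
    """Compare a linear and a power-law fit to amplitude-vs-distance data.

    Fits amp = a + b*d (linear) and amp = c * d**(-p) (power law, via a
    log-log least-squares fit). Returns the power-law exponent p and the
    residual sum of squares of each model, so one can check whether a
    power-law decay describes the data better than a linear one.
    """
    # Linear model: amp = a + b*d
    A = np.vstack([np.ones_like(dist_um), dist_um]).T
    coef_lin, *_ = np.linalg.lstsq(A, amp_uv, rcond=None)
    rss_lin = np.sum((amp_uv - A @ coef_lin) ** 2)

    # Power law: log(amp) = log(c) - p*log(d)
    L = np.vstack([np.ones_like(dist_um), np.log(dist_um)]).T
    coef_pow, *_ = np.linalg.lstsq(L, np.log(amp_uv), rcond=None)
    p = -coef_pow[1]
    pred_pow = np.exp(L @ coef_pow)
    rss_pow = np.sum((amp_uv - pred_pow) ** 2)
    return p, rss_lin, rss_pow
```

Applied to the pooled, normalized (V/max(V)) curves the reviewer asks for, this would directly test whether the near-flat 260-to-240 uV drop in Figure 2H is consistent with volume conduction.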
In Figure 3A, it seems that the total number of cells is rather low for such a large number of electrodes. What quality criteria are used to keep these cells? Did the authors exclude some cells from the analysis, and if so, on what criteria? If no criteria are used (because none are mentioned in the Methods), then how come so few cells are detected, and can the authors convince us that these neurons are indeed "clean" units (refractory period violations, SNRs, ...)?
Still in Figure 3A, it looks like there is a bias toward finding inhibitory cells at the borders, since they do not appear to be uniformly distributed over the MEA. Can the authors comment on that? What would be the explanation for such a behaviour? It would also be interesting to see some macroscopic quantities for excitatory/inhibitory cells, such as mean firing rates and averaged SNRs, because again, in Figure 3C, it is not clear to me that the firing rates of inhibitory cells are higher than those of excitatory ones, as they should be in theory.
For Figure 3 in general, I would have performed an exhaustive comparison of the putative cells found by spikeMAP and by other sorters. More precisely, I think that to prove the point that spikeMAP is indeed bringing something new to the field of spike sorting, the authors should have compared the performance of various spike sorters at discriminating excitatory vs. inhibitory cells based on their ground-truth recordings - for example, using Kilosort [Pachitariu et al, 2024, 10.1038/s41592-024-02232-7], or other sorters that might work with such large high-density data [Yger et al, 2018, 10.7554/eLife.34518].
Figure 4 has a big issue, and I think panels A and B should be redrawn: I don't understand what the red rectangle is displaying.
I understand that Figure 4 is only one example, but I have a hard time understanding from the manuscript how many slices/mice were used to obtain the GT data. I think the manuscript could be enhanced by turning the data into an open-access dataset, but then some clarification is needed: how many flashes/animals/slices are we talking about? Maybe this should be illustrated in Figure 4, if this figure is devoted to the introduction of the GT data.
While there is no doubt that GT data such as those recorded here by the authors are the most interesting data from a validation point of view, the rather low yield of such experiments should not discourage the use of artificially generated recordings such as those made in [Buccino et al, 2020, 10.1007/s12021-020-09467-7] or, more recently, in [Laquitaine et al, 2024, 10.1101/2024.12.04.626805v1]. In these papers, the authors have putative waveforms/firing-rate patterns for excitatory and inhibitory cells, and thus the authors could test how good they are at discriminating the two subtypes.
Comments on revised version:
While I must thank the authors for their answers, I still think that they missed an important one and only partially answered some of my concerns.
I truly think that SpikeMAP would benefit from a comparison with a state-of-the-art spike sorting pipeline, for example Kilosort. The authors said that they made the sorter modular enough that only the E/I classification step can be compared. I think this would be worth doing, just to be sure that SpikeMAP's spike sorting, which might be simpler than other recent solutions (with template matching), is not missing some cells and thus degrading the E/I classification performance. I know that such a comparison is not straightforward, because there is no clear ground truth, but I would still need to be convinced that the sorting pipeline brings something on its own. While there is no doubt that the E/I classification layer can be interesting, especially given the recordings shared by the authors, I am still a bit puzzled by the sorting step. Maybe this could take the form of a table or a figure, even a supplementary one. Alternatively, the authors could generate fake GT data with MEArec, for example, with putative E/I cells (discriminated via waveforms and firing rates) and show on such (oversimplified) data that SpikeMAP performs similarly to modern spike sorters. Otherwise, this is a bit hard to judge.