Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Read more about eLife’s peer review process.
Editors
- Reviewing Editor: Lisa Giocomo, Stanford School of Medicine, Stanford, United States of America
- Senior Editor: Panayiota Poirazi, FORTH Institute of Molecular Biology and Biotechnology, Heraklion, Greece
Reviewer #1 (Public review):
Summary:
Extracellular electrophysiology datasets are growing in both number and size, and recordings with thousands of sites per animal are now commonplace. Analyzing these datasets to extract the activity of single neurons (spike sorting) is challenging: signal-to-noise is low, the analysis is computationally expensive, and small changes in analysis parameters and code can alter the output. The authors address the problem of volume by packaging the well-characterized SpikeInterface pipeline in a framework that can distribute individual sorting jobs across many workers in a compute cluster or cloud environment. Reproducibility is ensured by running containerized versions of the processing components.
The authors apply the pipeline in two important examples. The first is a thorough study comparing the performance of two widely used spike-sorting algorithms (Kilosort 2.5 and Kilosort 4). They use hybrid datasets created by injecting measured spike waveforms (templates) into existing recordings, adjusting those waveforms according to the measured drift in the recording. These hybrid ground truth datasets preserve the complex noise and background of the original recording. Similar to the original Kilosort 4 paper, which uses a different method for creating ground truth datasets that include drift, the authors find Kilosort 4 significantly outperforms Kilosort 2.5. The second example measures the impact of compression of raw data on spike sorting with Kilosort 4, showing that accuracy, precision, and recall of the ground truth units are not significantly impacted even by lossy compression. As important as the individual results, these studies provide good models for measuring the impact of particular processing steps on the output of spike sorting.
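The hybrid ground-truth evaluation described above reduces to matching each sorted unit's spike times against the injected unit's known spike times and computing agreement scores. A minimal, illustrative sketch of that scoring logic (this is not the authors' actual code; the greedy matching and the 0.4 ms tolerance are assumptions for illustration):

```python
def match_spike_trains(gt_times, sorted_times, tolerance=0.4):
    """Greedily match ground-truth and sorted spike times (both in ms)
    and return (precision, recall, accuracy) for the sorted unit."""
    gt_times = sorted(gt_times)
    sorted_times = sorted(sorted_times)
    used = [False] * len(sorted_times)
    tp = 0
    for t in gt_times:
        for j, s in enumerate(sorted_times):
            if not used[j] and abs(s - t) <= tolerance:
                used[j] = True  # each sorted spike may match only once
                tp += 1
                break
    fn = len(gt_times) - tp       # ground-truth spikes the sorter missed
    fp = len(sorted_times) - tp   # sorted spikes with no ground-truth partner
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, accuracy
```

In practice, frameworks such as SpikeInterface perform this matching over all unit pairs and report the scores per ground-truth unit; the sketch shows only the per-pair computation.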
Strengths:
The pipeline uses the Nextflow framework, which makes it adaptable to different job schedulers and environments. The high-level documentation is useful, and the GitHub code is well organized. The two example studies are thorough and well-designed, and address important questions in the analysis of extracellular electrophysiology data.
Weaknesses:
The pipeline is very complete, but also complex. Workflow choices - the optimal artifact removal, or the best curation strategy for data from a particular brain area or species - will vary across experiments. A discussion of the adaptability of the pipeline in the "Limitations" section would therefore be helpful for readers.
Reviewer #2 (Public review):
Summary:
This work presents a reproducible, scalable workflow for spike sorting that leverages parallelization to handle large neural recording datasets. The authors introduce both a processing pipeline and a benchmarking framework that can run across different computing environments (workstations, HPC clusters, cloud). Key findings include demonstrating that Kilosort4 outperforms Kilosort2.5 and that 7× lossy compression has minimal impact on spike sorting performance while substantially reducing storage costs.
Strengths:
(1) Extremely high-quality figures with clear captions that effectively communicate complex workflow information.
(2) Very detailed, well-written methods section providing thorough documentation.
(3) Strong focus on reproducibility, scalability, modularity, and portability using established technologies (Nextflow, SpikeInterface, Code Ocean).
(4) Pipeline publicly available on GitHub with documentation.
(5) Clear cost analysis showing ~$5/hour for AWS processing with transparent breakdown.
(6) Good overview of previous spike sorting benchmarking attempts in the introduction.
(7) Practical value for the community by lowering barriers to processing large datasets.
Weaknesses:
No significant weaknesses were identified, although it is noted that the limitations section of the discussion could be expanded.
Reviewer #3 (Public review):
Summary:
The authors provide a highly valuable and thoroughly documented pipeline to accelerate the processing and spike sorting of high-density electrophysiology data, particularly from Neuropixels probes. The scale of data collection is increasing across the field, and processing times and data storage are growing concerns. This pipeline provides parallelization and benchmarking of performance after data compression that helps address these concerns. The authors also use their pipeline to benchmark different spike sorting algorithms, providing useful evidence that Kilosort4 performs the best out of the tested options. This work, and the ability to implement this pipeline with minimal effort to standardize and speed up data processing across the field, will be of great interest to many researchers in systems neuroscience.
Strengths:
The paper is very well written and clear in most places. The accompanying GitHub and ReadTheDocs are well organized and thorough. The authors provide many benchmarking metrics to support their claims, and it is clear that the pipeline has been very thoroughly tested and optimized by users at the Allen Institute for Neural Dynamics. The pipeline incorporates existing software and platforms that have also been thoroughly tested (such as SpikeInterface), so the authors are not reinventing the wheel, but rather putting together the best of many worlds. This is a great contribution to the field, and it is clear that the authors have put a lot of thought into making the pipeline as accessible as possible.
Weaknesses:
There are no major weaknesses. I have only a handful of very minor questions and suggestions that could clarify/generalize aspects of the pipeline or make the text more understandable to non-specialists.
(1) Could the authors please expand on the statement on line 274, that processing their test dataset serially "on a single GPU-capable cloud workstation... would take approximately 75 hours and cost over 90 USD." How were these values calculated? I was a bit surprised that this is a >4-fold slow-down from their pipeline, but only increases the cost by ~1.35x, if I understood correctly. More context on why this is, and maybe some context on what a g4dn.4xlarge is compared to the other instances, might help readers who are less familiar with AWS and cloud computing.
(2) One of the most commonly used preprocessing pipelines for Neuropixels data is the CatGT/ecephys pipeline from the developers of SpikeGLX at Janelia. It may be worth commenting very briefly, either in the preprocessing section or in the discussion, on how the preprocessing steps available in this pipeline compare to the steps available in CatGT. For example, is "destriping" similar to the "-gfix" option in CatGT for removing high-amplitude artifacts?
(3) Why are there duplicate units (line 194), and how often is this an issue? I understand that this is likely more of a spike sorter issue than an issue with this pipeline, but 1-2 sentences elaborating why might be helpful for readers.
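On the duplicate-unit question: a common way such duplicates are detected is by flagging unit pairs whose spike trains coincide at an implausibly high rate within a short window, then discarding the lower-quality member of each pair. A hypothetical sketch of that test (function names, the 0.2 ms window, and the 0.9 threshold are illustrative, not the pipeline's actual values):

```python
def coincidence_fraction(times_a, times_b, window=0.2):
    """Fraction of unit A's spikes that have a spike of unit B
    within +/- window (ms). A binary search would be faster;
    a linear scan keeps the sketch simple."""
    if not times_a:
        return 0.0
    times_b = sorted(times_b)
    hits = sum(1 for t in times_a
               if any(abs(t - u) <= window for u in times_b))
    return hits / len(times_a)

def is_duplicate(times_a, times_b, window=0.2, threshold=0.9):
    """Treat two units as duplicates if their spike trains
    coincide above the threshold."""
    return coincidence_fraction(times_a, times_b, window) >= threshold
```

Duplicates of this kind typically arise when a sorter over-splits one neuron's spikes across two templates, so the check operates on sorter output rather than raw data.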
(4) It seems from the parameter files on GitHub that the cluster curation parameters are customizable - correct? If so, it may be worth saying so explicitly in the curation section of the text, as the presented recipe will not always be appropriate. A presence ratio of >0.8 could be particularly problematic for some recordings: for example, a cell that is active only during a specific part of the behavior may be a feature of the experiment, or the animal could be transitioning between sleep and wake states, during which different units become active at different times.
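For readers unfamiliar with the metric discussed in point (4): the presence ratio is conventionally the fraction of equal-duration bins spanning the recording that contain at least one spike from the unit. A minimal sketch (the bin count is an illustrative default, not the pipeline's setting):

```python
def presence_ratio(spike_times, t_start, t_stop, num_bins=100):
    """Fraction of equal-duration bins in [t_start, t_stop)
    containing at least one spike."""
    bin_width = (t_stop - t_start) / num_bins
    occupied = set()
    for t in spike_times:
        if t_start <= t < t_stop:
            occupied.add(int((t - t_start) // bin_width))
    return len(occupied) / num_bins
```

A unit firing only during, say, the first 40% of a session scores 0.4 and would be rejected by a >0.8 criterion, which is exactly the scenario the reviewer flags.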
(5) The axis labels in Figures 3d-e are too small to see, and Figure 3d would benefit from a brief description of what is shown.
(6) What is the difference between "neural" and "passing QC" in Figure 4?
(7) I understand the current paper is focused on spike data, so there may not be an answer to this, but I am curious about the NP2.0 probes that save data in wideband. Does the lossy compression negatively affect the LFP data? Is software filtering applied for the spike band before or after compression?