1. Neuroscience

SpikeForest, reproducible web-facing ground-truth validation of automated neural spike sorters

  1. Jeremy Magland (corresponding author)
  2. James J Jun
  3. Elizabeth Lovero
  4. Alexander J Morley
  5. Cole Lincoln Hurwitz
  6. Alessio Paolo Buccino
  7. Samuel Garcia
  8. Alex H Barnett
  1. Center for Computational Mathematics, Flatiron Institute, United States
  2. Scientific Computing Core, Flatiron Institute, United States
  3. Medical Research Council Brain Network Dynamics Unit, University of Oxford, United Kingdom
  4. Institute for Adaptive and Neural Computation Informatics, University of Edinburgh, United Kingdom
  5. Centre for Integrative Neuroplasticity (CINPLA), University of Oslo, Norway
  6. Centre de Recherche en Neuroscience de Lyon, Université de Lyon, France
Tools and Resources
Cite this article as: eLife 2020;9:e55167 doi: 10.7554/eLife.55167
8 figures, 2 tables and 1 additional file


Simplified flow diagram of the SpikeForest analysis pipeline.

Each spike sorting code in a collection (top) is run on each recording with ground truth (left side) to yield a large matrix of sorting results and accuracy metrics (right). See the section on comparison with ground truth for mathematical notations. Recordings are grouped into ‘studies’, and those into ‘study sets’; these share features such as probe type and laboratory of origin. The web interface summarizes the results by grouping them into study sets (as in Figure 2), but also allows drilling down to the single study and recording level. Aspects such as extraction of mean waveforms, representative firing events, and computation of per-unit SNR are not shown, for simplicity.
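The grouping and the sorter-by-recording matrix described in this caption can be sketched in a few lines of Python. This is only an illustrative sketch: the class names, the `run_all` function, and the idea of representing a sorter as a plain callable are assumptions made for this example and are not taken from the SpikeForest code base.

```python
from dataclasses import dataclass, field


@dataclass
class Recording:
    name: str


@dataclass
class Study:
    name: str
    recordings: list = field(default_factory=list)


@dataclass
class StudySet:
    name: str
    studies: list = field(default_factory=list)


def run_all(sorters, study_sets):
    """Run every sorter on every recording.

    `sorters` maps a sorter name to a callable taking a Recording
    (a hypothetical stand-in for a real spike sorter wrapper).
    Returns a dict keyed by (sorter, study set, study, recording),
    i.e. one entry per cell of the results matrix.
    """
    results = {}
    for sorter_name, sort_fn in sorters.items():
        for study_set in study_sets:
            for study in study_set.studies:
                for rec in study.recordings:
                    key = (sorter_name, study_set.name, study.name, rec.name)
                    results[key] = sort_fn(rec)
    return results
```

Keying each result by the full (sorter, study set, study, recording) tuple keeps the drill-down from study set to single recording a matter of filtering on key prefixes.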

Main results table from the SpikeForest website showing aggregated results for 10 algorithms applied to 13 registered study sets.

The left columns of the table show the average accuracy (see Equation (5)) obtained from averaging over all ground-truth units with SNR above an adjustable threshold, here set to 8. The right columns show the number of ground-truth units with accuracy above an adjustable threshold, here set to 0.8. The first five study sets contain paired recordings with simultaneous extracellular and juxta- or intra-cellular ground truth acquisitions. The next six contain simulations from various software packages. The SYNTH_JANELIA study set, obtained from Pachitariu et al., 2019, consists of simulated noise with realistic spike waveforms superimposed at known times. The last study set is a collection of human-curated tetrode data. An asterisk indicates an incomplete (timed out) or failed sorting on a subset of results; in these cases, missing accuracies are imputed using linear regression as described in the Materials and methods. Empty cells correspond to excluded sorter/study set pairs. These results reflect the analysis run of March 23rd, 2020.
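The two summary quantities in each table cell can be illustrated with a minimal sketch. The function name, the input layout (a list of `(snr, accuracy)` pairs), and the choice of inclusive comparisons (`>=`) are assumptions made for this example; the caption only says "above" a threshold. Imputation of missing accuracies is not covered here.

```python
from statistics import mean


def summarize_units(units, snr_threshold=8.0, accuracy_threshold=0.8):
    """Summarize per-unit results for one sorter/study-set cell.

    `units` is a list of (snr, accuracy) pairs, one per ground-truth unit.
    Returns a pair:
      - average accuracy over units with SNR >= snr_threshold
        (None if no unit passes the threshold),
      - number of units with accuracy >= accuracy_threshold.
    """
    high_snr_accuracies = [acc for snr, acc in units if snr >= snr_threshold]
    avg_accuracy = mean(high_snr_accuracies) if high_snr_accuracies else None
    n_accurate = sum(1 for _, acc in units if acc >= accuracy_threshold)
    return avg_accuracy, n_accurate
```

For example, with the made-up units `[(10.0, 0.9), (9.0, 0.7), (5.0, 0.85), (3.0, 0.2)]`, only the first two units pass the SNR threshold of 8, while the first and third pass the accuracy threshold of 0.8.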

Screenshots from the SpikeForest website.

(left) Scatter plot of accuracy vs. SNR for each ground-truth unit, for a particular sorter (KiloSort2) and study (a simulated drift dataset from the SYNTH_JANELIA study set). The SNR threshold for the main table calculation is shown as a dashed line, and the user-selected unit is highlighted. Marker area is proportional to the number of events, and the color indicates the particular recording within the study. (right) A subset of spike waveforms (overlaid) corresponding to the selected ground truth unit, in four categories: ground truth, sorted, false negative, and false positive.

Results table from the SpikeForest website, similar to the left side of Figure 2 except showing aggregated precision and recall scores rather than accuracy.

Precision measures how well the algorithm avoids false positives, whereas recall is the complement of the false negative rate. An asterisk indicates an incomplete (timed out) or failed sorting on a subset of results; in these cases, missing accuracies are imputed using linear regression as described in the Materials and methods. Empty cells correspond to excluded sorter/study set pairs. These results reflect the analysis run of March 23rd, 2020.
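As a rough illustration of these definitions, the sketch below computes per-unit scores from event counts. The function and argument names are hypothetical. The precision and recall formulas follow directly from the description above; the accuracy formula (matches divided by matches plus misses plus false positives) is an assumption here, since the equation the captions refer to is not reproduced in this section.

```python
def unit_scores(n_match, n_miss, n_false_pos):
    """Per-unit precision, recall, and accuracy from event counts.

    n_match:     ground-truth events matched by a sorted event
    n_miss:      ground-truth events with no match (false negatives)
    n_false_pos: sorted events with no ground-truth match
    Each score is None when its denominator is zero.
    """
    def ratio(num, den):
        return num / den if den > 0 else None

    return {
        # High precision means few false positives.
        "precision": ratio(n_match, n_match + n_false_pos),
        # Recall is one minus the false negative rate.
        "recall": ratio(n_match, n_match + n_miss),
        # Assumed accuracy definition: penalizes both error types.
        "accuracy": ratio(n_match, n_match + n_miss + n_false_pos),
    }
```

With 80 matched events, 20 misses, and 10 false positives, recall is 80/100, precision is 80/90, and the assumed accuracy is 80/110, which is lower than either of the other two scores.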

Relationship between ground-truth accuracy and three quality metrics for all sorted units (with SNR ≥5), for the SYNTH_JANELIA tetrode study and five spike sorting algorithms.

Each marker represents a sorted unit. The x-axis of the plots in the final column is the predicted accuracy via linear regression using all three predictors (SNR, firing rate, and log ISI-vr).
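A linear regression on three predictors can be sketched with ordinary least squares. The code below is a generic, dependency-free illustration rather than the authors' implementation: the function names are hypothetical, and the fit solves the normal equations by Gauss-Jordan elimination, which is adequate for a handful of well-scaled predictors.

```python
def fit_linear(X, y):
    """Ordinary least squares with an intercept.

    X: list of predictor tuples, e.g. (snr, firing_rate, log_isi_vr).
    y: list of observed accuracies.
    Returns [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]
    p = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coeffs, x):
    """Predicted accuracy for one unit's predictor tuple."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x))
```

Plotting `predict(coeffs, x)` against the measured accuracy for each unit gives the kind of predicted-versus-actual comparison described for the final column.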

Interaction of software and hardware components of the SpikeForest system, showing the flow of data from the server-side analysis (left) to the user’s web browser (right).

The processing pipeline automatically detects which sorting jobs need to be updated and runs these in parallel as needed on a compute cluster. Processing results are uploaded to two databases, one for relatively small JSON files and the other for large binary content. A NodeJS application pulls data from these databases in order to show the most up-to-date results on the front-end website.
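The caption does not say how the pipeline decides that a sorting job needs to be updated. One common approach, shown here purely as an assumption, is to derive a deterministic key from everything that defines a job and to rerun only jobs whose key has no stored result. All names below (`job_key`, `jobs_to_run`, and the job fields) are hypothetical.

```python
import hashlib
import json


def job_key(sorter_name, sorter_version, params, recording_id):
    """Deterministic fingerprint of a sorting job specification."""
    spec = {
        "sorter_name": sorter_name,
        "sorter_version": sorter_version,
        "params": params,
        "recording_id": recording_id,
    }
    # sort_keys makes the JSON text independent of dict ordering.
    text = json.dumps(spec, sort_keys=True)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def jobs_to_run(jobs, completed_keys):
    """Return the jobs whose fingerprint has no stored result.

    `jobs` is a list of dicts with the four job_key arguments;
    `completed_keys` is a set of fingerprints already in the database.
    """
    return [job for job in jobs if job_key(**job) not in completed_keys]
```

Under this scheme, changing a sorter's version or any parameter changes the fingerprint, so the affected jobs are picked up on the next run while unchanged jobs are skipped.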

Author response image 1
Accuracy vs. duration for nine sorters applied to a study set of simulated recordings.
Author response image 2
Same as Author response image 1, except that the simulations included synthetic drift.


Table 1
Table of spike sorting algorithms currently included in the SpikeForest analysis.

Each algorithm is registered into the system via a Python wrapper. A Docker recipe defines the operating system and environment where the sorter is run. Algorithms with asterisks were updated and optimized using SpikeForest data. For the other algorithms, we used the default or recommended parameters.

Sorting algorithm | Language | Notes
HerdingSpikes2* | Python | Designed for large-scale, high-density multielectrode arrays. See Hilgen et al., 2017.
IronClust* | MATLAB and CUDA | Derived from JRCLUST. See Jun et al., in preparation.
JRCLUST | MATLAB and CUDA | Designed for high-density silicon probes. See Jun et al., 2017a.
KiloSort | MATLAB and CUDA | Template matching. See Pachitariu et al., 2016.
KiloSort2 | MATLAB and CUDA | Derived from KiloSort. See Pachitariu et al., 2019.
Klusta | Python | Expectation-Maximization masked clustering. See Rossant et al., 2016.
MountainSort4 | Python and C++ | Density-based clustering via ISO-SPLIT. See Chung et al., 2017.
SpyKING CIRCUS* | Python and MPI | Density-based clustering and template matching. See Yger et al., 2018.
Tridesclous* | Python and OpenCL | See Garcia and Pouzat, 2019.
WaveClus | MATLAB | Superparamagnetic clustering. See Chaure et al., 2018; Quiroga et al., 2004.

Table 2
Table of study sets currently included in the SpikeForest analysis.

Study sets fall into three categories: paired, synthetic, and curated. Each study set comprises one or more studies, which in turn comprise multiple recordings acquired or generated under the same conditions.

Study set | # Rec. / # Elec. / Dur. | Source lab. | Description
Paired intra/extracellular
PAIRED_BOYDEN | 19 / 32ch / 6-10min | E. Boyden | Subselected from 64, 128, or 256-ch. probes, mouse cortex
PAIRED_CRCNS_HC1 | 93 / 4-6ch / 6-12min | G. Buzsaki | Tetrodes or silicon probe (one shank) in rat hippocampus
PAIRED_ENGLISH | 29 / 4-32ch / 1-36min | D. English | Hybrid juxtacellular-Si probe, behaving mouse, various regions
PAIRED_KAMPFF | 15 / 32ch / 9-20min | A. Kampff | Subselected from 374, 127, or 32-ch. probes, mouse cortex
PAIRED_MEA64C_YGER | 18 / 64ch / 5min | O. Marre | Subselected from 252-ch. MEA, mouse retina
PAIRED_MONOTRODE | 100 / 1ch / 5-20min | Boyden, Kampff, Marre, Buzsaki | Subselected from paired recordings from four labs
Synthetic
SYNTH_BIONET | 36 / 60ch / 15min | AIBS | BioNet simulation containing no drift, monotonic drift, and random jumps; used by JRCLUST, IronClust
SYNTH_JANELIA | 60 / 4-64ch / 5-20min | M. Pachitariu | Distributed with KiloSort2, with and without simulated drift
SYNTH_MAGLAND | 80 / 8ch / 10min | Flatiron Inst. | Synthetic waveforms, Gaussian noise, varying SNR, channel count and unit count
SYNTH_MEAREC_NEURONEX | 60 / 32ch / 10min | A. Buccino | Simulated using MEAREC, varying SNR and unit count
SYNTH_MEAREC_TETRODE | 40 / 4ch / 10min | A. Buccino | Simulated using MEAREC, varying SNR and unit count
SYNTH_MONOTRODE | 111 / 1ch / 10min | Q. Quiroga | Simulated by Quiroga lab by mixing averaged real spike waveforms
SYNTH_VISAPY | 6 / 30ch / 5min | G. Einevoll | Generated using VISAPy simulator
Human curated
MANUAL_FRANKLAB | 21 / 4ch / 10-40min | L. Frank | Three manual curations of the same recordings
