1. Neuroscience
Download icon

A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo

  1. Pierre Yger
  2. Giulia LB Spampinato
  3. Elric Esposito
  4. Baptiste Lefebvre
  5. Stéphane Deny
  6. Christophe Gardella
  7. Marcel Stimberg
  8. Florian Jetter
  9. Guenther Zeck
  10. Serge Picaud
  11. Jens Duebel
  12. Olivier Marre  Is a corresponding author
  1. Institut de la Vision, INSERM UMRS 968, UPMC UM 80, France
  2. Laboratoire de Physique Statistique, CNRS, ENS, UPMC, 75005, France
  3. NMI, Neurophysics Group, Germany
Tools and Resources
Cite this article as: eLife 2018;7:e34518 doi: 10.7554/eLife.34518
4 figures, 1 table, 2 data sets and 1 additional file


Figure 1 with 1 supplement
Main steps of the spike sorting algorithm.

(A) Five randomly chosen electrodes, each of them with its own detection threshold (dash dotted line). Detected spikes, as threshold crossings, are indicated with red markers (B) Example of a spike in the raw data. Red: electrodes that can be affected by the spike, that is the ones close enough to the electrode where the voltage peak is the highest; gray: other electrodes that should not be affected. (C) Example of two clusters (red and blue)with associated templates. Green points show possible combinations of two overlapping spikes from the two cells for various time delays. (D) Graphical illustration of the template matching for in vitro data (see Materials and methods). Every line is a electrode. Grey: real data. Red: sum of the templates added by the template matching algorithm; top to bottom: successive steps of the template matching algorithm. E. Final result of the template matching. Same legend as (D, F) Examples of the fitted amplitudes for the first component of a given template as a function of time. Each dot correspond to a spike time at which this particular template was fitted to the data. Dashed dotted lines represent the amplitude thresholds (see Materials and methods).

Figure 1—figure supplement 1
Schematic of the parallel clustering of the spikes, in a toy example with two electrodes.

(A) Pre-clustering step. The different snippets are sorted according to the electrode where they peak. This divides a set of snippets in Nelec groups. Each of these groups is then processed independently. (B) Each group of snippets is projected in a low-dimension space, where clustering is performed using a density-based approach (see text and Materials and methods). (C) A template is extracted from each cluster and used for the template matching step.

Performance of the algorithm on ground truth datasets.

(A) Top: Schematic of the experimental protocol in vitro. A neuron close to the multielectrode array (MEA) recording is recorded in loose patch. Bottom: Image of the patch electrode on top of a 252 electrodes MEA, recording a ganglion cell. (B) Top, pink box: loose patch recording showing the spikes of the recorded neuron. Bottom, green box: Extra-cellular recordings next to the loose patched soma. Each line is a different electrode (C) Error rate of the algorithm as function of the largest peak amplitude of the ground-truth neuron, recorded extracellularly in vitro. (D) Comparison between the error rates produced by the algorithm on different ground truth datasets and the error rates of nonlinear classifiers (Best Ellipsoidal Error Rate) trained to detect the spikes for in vitro data (Spampinato et al., 2018). (E) Comparison of average performance for all neurons detected by the Optimal Classifier with an error less than 10% (n = 37). F. Same as D. but for in vivo data (Neto et al., 2016) (see Materials and methods).

Scaling to thousands of electrodes.

(A) Execution time as function of the number of processors for a 90 min dataset in vitro with 252 electrodes, expressed as a real-time ratio, that is the number of hours necessary to process one hour of data. (B) Execution time as function of the number of electrodes for a 30 min dataset recorded in vitro with 4225 electrodes. (C) Execution time as function of the number of templates for a 10-min synthetic dataset with 30 electrodes. (D) Creation of ‘hybrid’ datasets: chosen templates are injected elsewhere in the data as new templates. Dashed-dotted lines shows the detection threshold on the main electrode for a given template, and normalized amplitude is expressed relative to this threshold (see Materials and methods). (E) Mean error rate as function of the firing rate of injected templates, in various datasets. Errors bars show standard error over eight templates (F) Error rate as function of the normalized amplitude of injected templates, in various datasets. Errors bars show standard error over nine different templates (G) Injected and recovered cross-correlation value between pairs of neurons for five templates injected at 10 Hz, with a normalized amplitude of 2 (see Materials and methods).

Automated merging.

(A) Dip estimation (y-axis) compared to the geometrical mean of the firing rate (x-axis) for all pairs of units and artificially generated and split spike trains (see Materials and methods). Blue: pairs of templates originating from thesame neuron that have to be merged. Orange: pairs of templates corresponding to different cells. Panels on the right: for two chosen pairs, one that needs to be merged (in blue, top panel) and one should not be merged (orange, bottom panel) the full cross-correlogram and the geometrical mean of the firing rate (dashed line). The average correlation is estimated in the temporal window defined by the gray area. (B) Same as A, for a ground truth dataset. Blue points: points that need to be merged. Green points: pairs that should not be merged. Orange points: pairs where our ground truth data does not allow us telling if the pair should be merged or not. The gray area corresponds to the region where pairs are merged by the algorithm. Inset: zoom on one region of the graph. (C) Average error rate in the case where the decision of merging units was guided by the ground truth data (left) against the automated strategy designed here (right).



Table 1
Table of all the variables and notations found in the algorithm.
VariableExplanationDefault value
Generic notations


Number of electrodes


Physical position of electrode k [μm]


Ensemble of nearby electrodes for electrode k [μm]


Cardinal of Gk


Spike detection threshold for electrode k [μV]


Raw data [μV]


First component of the template for neuron j [μV]


Second component of the template for neuron j [μV]


Sampling frequency of the signal [Hz]
Preprocessing of the data


Cutoff frequency for butterworth filtering100 Hz


Temporal width for the templates5 ms


Spatial radius for the templates250 μm


Gain for threshold detection for channel k (θk)6


Number of waveforms collected per electrode10000


Number PCA features kept to describe a waveform5
Clustering and template estimation


l spikes peaking on electrode k and projected after PCA


Density around xlk


Minimal distance from xlk to spikes with higher densities


Number of spikes collected per electrode for clustering10000


Number of PCA features kept to describe a spike5


Number of neighbors for density estimation100


Maximal number of clusters per electrode10


Normalized distance between pairs of clusters


Threshold for merging clusters on the same electrode3


Centroid of the cluster m


Dispersion around the centroid αm


Minimal size of a cluster (in percent of Nspikes)0.005


Amplitudes allowed during fitting for a given template
Dictionary cleaning


Max over time for the Cross-correlation between wm and wn


Threshold above which templates are considered as similar0.975
Template matching


Product between s(t) and wj (normalized) at time ti


Same as aij but for the second component vj


Number of fitting attempts for a given spike time3
Automated merging


Similarity threshold to consider neurons as a putative pair0.8


Cross correlogram between spikes of unit m and n


Geometrical mean of the firing rates for units m and n [Hz2]


Maximal value for the dip in the cross correlogram at time 00.1 [Hz2]

Data availability

The following data sets were generated
  1. 1
The following previously published data sets were used
  1. 1

Additional files

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)