Efficient and accurate extraction of in vivo calcium signals from microendoscopic video data
Abstract
In vivo calcium imaging through microendoscopic lenses enables imaging of previously inaccessible neuronal populations deep within the brains of freely moving animals. However, it is computationally challenging to extract singleneuronal activity from microendoscopic data, because of the very large background fluctuations and high spatial overlaps intrinsic to this recording modality. Here, we describe a new constrained matrix factorization approach to accurately separate the background and then demix and denoise the neuronal signals of interest. We compared the proposed method against previous independent components analysis and constrained nonnegative matrix factorization approaches. On both simulated and experimental data recorded from mice, our method substantially improved the quality of extracted cellular signals and detected more wellisolated neural signals, especially in noisy data regimes. These advances can in turn significantly enhance the statistical power of downstream analyses, and ultimately improve scientific conclusions derived from microendoscopic data.
https://doi.org/10.7554/eLife.28728.001Introduction
Monitoring the activity of largescale neuronal ensembles during complex behavioral states is fundamental to neuroscience research. Continued advances in optical imaging technology are greatly expanding the size and depth of neuronal populations that can be visualized. Specifically, in vivo calcium imaging through microendoscopic lenses and the development of miniaturized microscopes have enabled deep brain imaging of previously inaccessible neuronal populations of freely moving mice (Flusberg et al., 2008; Ghosh et al., 2011; Ziv and Ghosh, 2015). This technique has been widely used to study the neural circuits in cortical, subcortical, and deep brain areas, such as hippocampus (Cai et al., 2016; Ziv et al., 2013; Jimenez et al., 2018; Rubin et al., 2015), entorhinal cortex (Kitamura et al., 2015; Sun et al., 2015), hypothalamus (Jennings et al., 2015), prefrontal cortex (PFC) (Pinto and Dan, 2015), premotor cortex (Markowitz et al., 2015), dorsal pons (Cox et al., 2016), basal forebrain (Harrison et al., 2016), striatum (Barbera et al., 2016; Carvalho Poyraz et al., 2016; Klaus et al., 2017), amygdala (Yu et al., 2017), and other brain regions.
Although microendoscopy has potential applications across numerous neuroscience fields (Ziv and Ghosh, 2015), methods for extracting cellular signals from this data are currently limited and suboptimal. Most existing methods are specialized for twophoton or lightsheet microscopy. However, these methods are not suitable for analyzing singlephoton microendoscopic data because of its distinct features: specifically, this data typically displays large, blurry background fluctuations due to fluorescence contributions from neurons outside the focal plane. In Figure 1, we use a typical microendoscopic dataset to illustrate these effects (see Video 1 for raw video). Figure 1A shows an example frame of the selected data, which contains large signals additional to the neurons visible in the focal plane. These extra fluorescence signals contribute as background that contaminates the singleneuronal signals of interest. In turn, standard methods based on local correlations for visualizing cell outlines (Smith and Häusser, 2010) are not effective here, because the correlations in the fluorescence of nearby pixels are dominated by background signals (Figure 1B). For some neurons with strong visible signals, we can manually draw regionsofinterest (ROI) (Figure 1C). Following (Barbera et al., 2016; Pinto and Dan, 2015), we used the mean fluorescence trace of the surrounding pixels (blue, Figure 1D) to roughly estimate this background fluctuation; subtracting it from the raw trace in the neuron ROI yields a relatively good estimation of neuron signal (red, Figure 1D). Figure 1D shows that the background (blue) has much larger variance than the relatively sparse neural signal (red); moreover, the background signal fluctuates on similar timescales as the singleneuronal signal, so we can not simply temporally filter the background away after extraction of the mean signal within the ROI. This large background signal is likely due to a combination of local fluctuations resulting from outoffocus fluorescence or neuropil activity, hemodynamics of blood vessels, and global fluctuations shared more broadly across the field of view (photobleaching effects, drifts in $z$ of the focal plane, etc.), as illustrated schematically in Figure 1E.
The existing methods for extracting individual neural activity from microendoscopic data can be divided into two classes: semimanual ROI analysis (Barbera et al., 2016; Klaus et al., 2017; Pinto and Dan, 2015) and PCA/ICA analysis (Mukamel et al., 2009). Unfortunately, both approaches have wellknown flaws (Resendez et al., 2016). For example, ROI analysis does not effectively demix signals of spatially overlapping neurons, and drawing ROIs is laborious for large population recordings. More importantly, in many cases, the background contaminations are not adequately corrected, and thus the extracted signals are not sufficiently clean enough for downstream analyses. As for PCA/ICA analysis, it is a linear demixing method and therefore typically fails when the neural components exhibit strong spatial overlaps (Pnevmatikakis et al., 2016), as is the case in the microendoscopic setting.
Recently, constrained nonnegative matrix factorization (CNMF) approaches were proposed to simultaneously denoise, deconvolve, and demix calcium imaging data (Pnevmatikakis et al., 2016). However, current implementations of the CNMF approach were optimized for $2$photon and lightsheet microscopy, where the background has a simpler spatiotemporal structure. When applied to microendoscopic data, CNMF often has poor performance because the background is not modeled sufficiently accurately (Barbera et al., 2016).
In this paper, we significantly extend the CNMF framework to obtain a robust approach for extracting singleneuronal signals from microendoscopic data. Specifically, our extended CNMF for microendoscopic data (CNMFE) approach utilizes a more accurate and flexible spatiotemporal background model that is able to handle the properties of the strong background signal illustrated in Figure 1, along with new specialized algorithms to initialize and fit the model components. After a brief description of the model and algorithms, we first use simulated data to illustrate the power of the new approach. Next, we compare CNMFE with PCA/ICA analysis comprehensively on both simulated data and four experimental datasets recorded in different brain areas. The results show that CNMFE outperforms PCA/ICA in terms of detecting more wellisolated neural signals, extracting higher signaltonoise ratio (SNR) cellular signals, and obtaining more robust results in low SNR regimes. Finally, we show that downstream analyses of calcium imaging data can substantially benefit from these improvements.
Model and model fitting
CNMF for microendoscope data (CNMFE)
The recorded video data can be represented by a matrix $Y\in {\mathbb{R}}_{+}^{d\times T}$, where $d$ is the number of pixels in the field of view and $T$ is the number of frames observed. In our model, each neuron $i$ is characterized by its spatial ‘footprint’ vector ${\mathit{\bm{a}}}_{i}\in {\mathbb{R}}_{+}^{d}$ characterizing the cell’s shape and location, and ‘calcium activity’ timeseries ${\mathit{\bm{c}}}_{i}\in {\mathbb{R}}_{+}^{T}$, modeling (up to a multiplicative and additive constant) cell $i$’s mean fluorescence signal at each frame. Here, both ${\mathit{\bm{a}}}_{i}$ and ${\mathit{\bm{c}}}_{i}$ are constrained to be nonnegative because of their physical interpretations. The background fluctuation is represented by a matrix $B\in {\mathbb{R}}_{+}^{d\times T}$. If the field of view contains a total number of $K$ neurons, then the observed movie data is modeled as a superposition of all neurons’ spatiotemporal activity, plus timevarying background and additive noise:
where $A=[{\mathit{\bm{a}}}_{1},\mathrm{\dots},{\mathit{\bm{a}}}_{K}]$ and $C={[{\mathit{\bm{c}}}_{1},\mathrm{\dots},{\mathit{\bm{c}}}_{K}]}^{T}$. The noise term $E\in {\mathbb{R}}^{d\times T}$ is modeled as Gaussian, $E(t)\sim \mathcal{\mathcal{N}}(\mathbf{0},\mathrm{\Sigma})$ is a diagonal matrix, indicating that the noise is spatially and temporally uncorrelated.
Estimating the model parameters $A,C$ in model (1) gives us all neurons’ spatial footprints and their denoised temporal activity. This can be achieved by minimizing the residual sum of squares (RSS), aka the Frobenius norm of the matrix $Y(AC+B)$,
while requiring the model variables $A,C$ and $B$ to follow the desired constraints, discussed below.
Constraints on neuronal spatial footprints $A$ and neural temporal traces $C$
Each spatial footprint ${\mathit{\bm{a}}}_{i}$ should be spatially localized and sparse, since a given neuron will cover only a small fraction of the field of view, and therefore most elements of ${\mathit{\bm{a}}}_{i}$ will be zero. Thus, we need to incorporate spatial locality and sparsity constraints on $A$ (Pnevmatikakis et al., 2016). We discuss details further below.
Similarly, the temporal components ${\mathit{\bm{c}}}_{i}$ are highly structured, as they represent the cells’ fluorescence responses to sparse, nonnegative trains of action potentials. Following (Vogelstein et al., 2010; Pnevmatikakis et al., 2016), we model the calcium dynamics of each neuron ${\mathit{\bm{c}}}_{i}$ with a stable autoregressive (AR) process of order $p$,
where ${s}_{i}(t)\ge 0$ is the number of spikes that neuron fired at the $t$th frame. (Note that there is no further noise input into ${c}_{i}(t)$ beyond the spike signal ${s}_{i}(t)$.) The AR coefficients $\{{\gamma}_{j}^{(i)}\}$ are different for each neuron and they are estimated from the data. In practice, we usually pick $p=2$, thus incorporating both a nonzero rise and decay time of calcium transients in response to a spike; then Equation (3) can be expressed in matrix form as
The neural activity ${\mathit{\bm{s}}}_{i}$ is nonnegative and typically sparse; to enforce sparsity, we can penalize the ${\mathrm{\ell}}_{0}$ (Jewell and Witten, 2017) or ${\mathrm{\ell}}_{1}$ (Pnevmatikakis et al., 2016; Vogelstein et al., 2010) norm of ${\mathit{\bm{s}}}_{i}$, or limit the minimum size of nonzero spike counts (Friedrich et al., 2017b). When the rise time constant is small compared to the timebin width (low imaging frame rate), we typically use a simpler AR(1) model (with an instantaneous rise following a spike) (Pnevmatikakis et al., 2016).
Constraints on background activity $B$
In the above we have largely followed previously described CNMF approaches (Pnevmatikakis et al., 2016) for modeling calcium imaging signals. However, to accurately model the background effects in microendoscopic data, we need to depart significantly from these previous approaches. Constraints on the background term $B$ in Equation (1) are essential to the success of CNMFE, since clearly, if $B$ is completely unconstrained we could just absorb the observed data $Y$ entirely into $B$, which would lead to recovery of no neural activity. At the same time, we need to prevent the residual of the background term (i.e. $B\widehat{B}$, where $\widehat{B}$ denotes the estimated spatiotemporal background) from corrupting the estimated neural signals $AC$ in model (1), since subsequently, the extracted neuronal activity would be mixed with background fluctuations, leading to artificially high correlations between nearby cells. This problem is even worse in the microendoscopic context because the background fluctuation usually has significantly larger variance than the isolated cellular signals of interest (Figure 1D), and therefore any small errors in the estimation of $B$ can severely corrupt the estimated neural signal $AC$.
In (Pnevmatikakis et al., 2016), $B$ is modeled as a rank$1$ nonnegative matrix $B=\mathit{\bm{b}}\cdot {\mathit{\bm{f}}}^{T}$, where $\mathit{\bm{b}}\in {\mathbb{R}}_{+}^{d}$ and $\mathit{\bm{f}}\in {\mathbb{R}}_{+}^{T}$. This model mainly captures the global fluctuations within the field of view (FOV). In applications to twophoton or lightsheet data, this rank1 model has been shown to be sufficient for relatively small spatial regions; the simple lowrank model does not hold for larger fields of view, and so we can simply divide large FOVs into smaller patches for largely parallel processing (Pnevmatikakis et al., 2016; Giovannucci et al., 2017b). (See [Pachitariu et al., 2016] for an alternative approach.) However, as we will see below, the local rank1 model fails in many microendoscopic datasets, where multiple large overlapping background sources exist even within modestly sized FOVs.
Thus, we propose a new model to constrain the background term $B$. We first decompose the background into two terms:
where ${B}^{f}$ represents fluctuating activity and ${B}^{c}={\mathit{\bm{b}}}_{0}\cdot {\mathrm{\U0001d7cf}}^{T}$ models constant baselines ($\mathrm{\U0001d7cf}\in {\mathbb{R}}^{T}$ denotes a vector of $T$ ones). To model ${B}^{f}$, we exploit the fact that background sources (largely due to blurred outoffocus fluorescence) are empirically much coarser spatially than the average neuron soma size $l$. Thus, we model ${B}^{f}$ at one pixel as a linear combination of the background fluorescence in pixels which are chosen to be nearby but not nearest neighbors:
where $\mathrm{\Omega}}_{i}=\{j\text{}\text{dist}({\mathit{x}}_{i},{\mathit{x}}_{j})\in [{l}_{n},{l}_{n}+1)\$, with $\text{dist}({\mathit{\bm{x}}}_{i},{\mathit{\bm{x}}}_{j})$ the Euclidean distance between pixel $i$ and $j$. Thus, ${\mathrm{\Omega}}_{i}$ only selects the neighboring pixels with a distance of ${l}_{n}$ from the $i$th pixel (the green dot and black pixels in Figure 2B illustrate $i$ and ${\mathrm{\Omega}}_{i}$, respectively); here ${l}_{n}$ is a parameter that we choose to be greater than $l$ (the size of the typical soma in the FOV), e.g., ${l}_{n}=2l$. This choice of ${l}_{n}$ ensures that pixels $i$ and $j$ in Equation (6) share similar background fluctuations, but do not belong to the same soma.
We can rewrite Equation (6) in matrix form:
where ${W}_{ij}=0$ if $\text{dist}({\mathit{\bm{x}}}_{i},{\mathit{\bm{x}}}_{j})\notin [{l}_{n},{l}_{n}+1)$. In practice, this hard constraint is difficult to enforce computationally and is overly stringent given the noisy observed data. We relax the model by replacing the righthand side ${B}^{f}$ with the more convenient closedform expression
According to Equations (1) and (5), this change ignores the noise term $E$; since elements in $E$ are spatially uncorrelated, $W\cdot E$ contributes as a very small disturbance to ${\widehat{B}}^{f}$ in the lefthand side. We found this substitution for ${\widehat{B}}^{f}$ led to significantly faster and more robust model fitting.
Fitting the CNMFE model
Table 1 lists the variables in the proposed CNMFE model. Now we can formulate the estimation of all model variables as a single optimization metaproblem:
We call this a ‘metaproblem’ because we have not yet explicitly defined the sparsity and spatial locality constraints on $A$ and $S={[{\mathit{\bm{s}}}_{1},\mathrm{\dots},{\mathit{\bm{s}}}_{K}]}^{T}$; these can be customized by users under different assumptions (see details in Materials and methods). Also note that ${\mathit{\bm{s}}}_{i}$ is completely determined by ${\mathit{\bm{c}}}_{i}$ and ${G}^{(i)}$, and ${B}^{f}$ is not optimized explicitly but (as discussed above) can be estimated as $W\cdot (YAC{\mathit{\bm{b}}}_{0}\cdot {\mathrm{\U0001d7cf}}^{T})$, so we optimize with respect to $W$ instead.
The problem (PAll) optimizes all variables together and is nonconvex but can be divided into three simpler subproblems that we solve iteratively:
Estimating $A\mathrm{,}{b}_{\mathrm{0}}$ given $\widehat{C}\mathrm{,}{\widehat{B}}^{f}$
Estimating $C\mathrm{,}{b}_{\mathrm{0}}$ given $\widehat{A}\mathrm{,}{\widehat{B}}^{f}$
Estimating $W\mathrm{,}{b}_{\mathrm{0}}$ given $\widehat{A}\mathrm{,}\widehat{C}$
For each of these subproblems, we are able to use wellestablished algorithms (e.g. solutions for (PS) and (PT) are discussed in Friedrich et al., 2017a; Pnevmatikakis et al., 2016) or slight modifications thereof. By iteratively solving these three subproblems, we obtain tractable updates for all model variables in problem (PAll). Furthermore, this strategy gives us the flexibility of further potential interventions (either automatic or semimanual) in the optimization procedure, for example, incorporating further prior information on neurons’ morphology, or merging/splitting/deleting spatial components and detecting missed neurons from the residuals. These steps can significantly improve the quality of the model fitting; this is an advantage compared with PCA/ICA, which offers no easy option for incorporation of stronger prior information or manually guided improvements on the estimates.
Full details on the algorithms for initializing and then solving these three subproblems are provided in the Materials and methods section.
Results
CNMFE can reliably estimate large highrank background fluctuations
We first use simulated data to illustrate the background model in CNMFE and compare its performance against the lowrank NMF model used in the basic CNMF approach (Pnevmatikakis et al., 2016). We generated the observed fluorescence $Y$ by summing up simulated fluorescent signals of multiple sources as shown in Figure 1E plus additive Gaussian white noise (Figure 2A).
An example pixel (green dot, Figure 2A,B) was selected to illustrate the background model in CNMFE (Equation (6)), which assumes that each pixel’s background activity can be reconstructed using its neighboring pixels’ activities. The selected neighbors form a ring and their distances to the center pixel are larger than a typical neuron size (Figure 2B). Figure 2C shows that the fluorescence traces of the center pixel and its neighbors are highly correlated due to the shared large background fluctuations. Here, for illustrative purposes, we fit the background by solving problem (PB) directly while assuming $\widehat{A}\widehat{C}=0$. This mistaken assumption should make the background estimation more challenging (due to true neural components getting absorbed into the background), but nonetheless in Figure 2 we see that the background fluctuation was well recovered (Figure 2D). Subtracting this estimated background from the observed fluorescence in the center yields a good visualization of the cellular signal (Figure 2D). Thus, this example shows that we can reconstruct a complicated background trace while leaving the neural signal uncontaminated.
For the example frame in Figure 2A, the true cellular signals are sparse and weak (Figure 2E). When we subtract the estimated background using CNMFE from the raw data, we obtain a good recovery of the true signal (Figure 2D,F). For comparison, we also estimate the background activity by applying a rank$1$ NMF model as used in basic CNMF; the resulting backgroundsubtracted image is still severely contaminated by the background (Figure 2G). This is easy to understand: the spatiotemporal background signal in microendoscopic data typically has a rank higher than one, due to the various signal sources indicated in Figure 1E), and therefore a rank$1$ NMF background model is insufficient.
A naive approach would be to simply increase the rank of the NMF background model. Figure 2H demonstrates that this approach is ineffective: higher rank NMF does yield generally better reconstruction performance, but with high variability and low reliability (due to randomness in the initial conditions of NMF). Eventually as the NMF rank increases many singleneuronal signals of interest are swallowed up in the estimated background signal (data not shown). In contrast, CNMFE recovers the background signal more accurately than any of the highrank NMF models.
In real data analysis settings, the rank of NMF is an unknown and the selection of its value is a nontrivial problem. We simulated data sets with different numbers of local background sources and use a single parameter setting to run CNMFE for reconstructing the background over multiple such simulations. Figure 2I shows that the performance of CNMFE does not degrade quickly as we have more background sources, in contrast to rank$1$ NMF. Therefore, CNMFE can recover the background accurately across a diverse range of background sources, as desired.
CNMFE accurately initializes singleneuronal spatial and temporal components
Next, we used simulated data to validate our proposed initialization procedure (Figure 3A). In this example, we simulated 200 neurons with strong spatial overlaps (Figure 3B). One of the first steps in our initialization procedure is to apply a Gaussian spatial filter to the images to reduce the (spatially coarser) background and boost the power of neuronsized objects in the images. In Figure 3C, we see that the local correlation image (Smith and Häusser, 2010) computed on the spatially filtered data provides a good initial visualization of neuron locations; compare to Figure 1B, where the correlation image computed on the raw data was highly corrupted by background signals.
We choose two example ROIs to illustrate how CNMFE removes the background contamination and demixes nearby neural signals for accurate initialization of neurons’ shapes and activity. In the first example, we choose a wellisolated neuron (green box, Figure 3A+B). We select three pixels located in the center, the periphery, and the outside of the neuron and show the corresponding fluorescence traces in both the raw data and the spatially filtered data (Figure 3D). The raw traces are noisy and highly correlated, but the filtered traces show relatively clean neural signals. This is because spatial filtering reduces the shared background activity and the remaining neural signals dominate the filtered data. Similarly, Figure 3E is an example showing how CNMFE demixes two overlapping neurons. The filtered traces in the centers of the two neurons still preserve their own temporal activity.
After initializing the neurons’ traces using the spatially filtered data, we initialize our estimate of their spatial footprints. Note that simply initializing these spatial footprints with the spatially filtered data does not work well (data not shown), since the resulting shapes are distorted by the spatial filtering process. We found that it was more effective to initialize each spatial footprint by regressing the initial neuron traces onto the raw movie data (see Materials and methods for details). The initial values already match the simulated ground truth with fairly high fidelity (Figure 3D+E). In this simulated data, CNMFE successfully identified all 200 neurons and initialized their spatial and temporal components (Figure 3F). We then evaluate the quality of initialization using all neurons’ spatial and temporal similarities with their counterparts in the ground truth data. Figure 3G shows that all initialized neurons have high similarities with the truth, indicating a good recovery and demixing of all neuron sources.
Thresholds on the minimum local correlation and the minimum peaktonoise ratio (PNR) for detecting seed pixels are necessary for defining the initial spatial components. To quantify the sensitivity of choosing these two thresholds, we plot the local correlations and the PNRs of all pixels chosen as the local maxima within an area of $\frac{l}{4}\times \frac{l}{4}$, where $l$ is the diameter of a typical neuron, in the correlation image or the PNR image (Figure 3H). Pixels are classified into two classes according to their locations relative to the closest neurons: neurons’ central areas and outside areas (see Materials and methods for full details). It is clear that the two classes are linearly well separated and the thresholds can be chosen within a broad range of values (Figure 3H), indicating that the algorithm is robust with respect to these threshold parameters here. In lower SNR settings, these boundaries may be less clear, and an incremental approach (in which we choose the highestSNR neurons first, then estimate the background and examine the residual to select the lowestSNR cells) may be preferred; this incremental approach is discussed in more depth in the Materials and methods section.
CNMFE recovers the true neural activity and is robust to noise contamination and neuronal correlations in simulated data
Using the same simulated dataset as in the previous section, we further refine the neuron shapes ($A$) and the temporal traces ($C$) by iteratively fitting the CNMFE model. We compare the final results with PCA/ICA analysis (Mukamel et al., 2009) and the original CNMF method (Pnevmatikakis et al., 2016).
After choosing the thresholds for seed pixels (Figure 3H), we run CNMFE in full automatic mode, without any manual interventions. Two opensource MATLAB packages, CellSort (https://github.com/mukamellab/CellSort; Mukamel, 2016) and ca_source_extraction (https://github.com/epnev/ca_source_extraction; Pnevmatikakis, 2016), were used to perform PCA/ICA (Mukamel et al., 2009) and basic CNMF (Pnevmatikakis et al., 2016), respectively. Since the initialization algorithm in CNMF fails due to the large contaminations from the background fluctuations in this setting (recall Figure 2), we use the ground truth as its initialization. As for the rank of the background model in CNMF, we tried all integer values between 1 and 16 and set it as 7 because it has the best performance in matching the ground truth. We emphasize that including the CNMF approach in this comparison is not fair for the other two approaches, because it uses the ground truth heavily, while PCA/ICA and CNMFE are blind to the ground truth. The purpose here is to show the limitations of basic CNMF in modeling the background activity in microendoscopic data.
We first pick three closeby neurons from the ground truth (Figure 4A, top) and see how well these neurons’ activities are recovered. PCA/ICA fails to identify one neuron, and for the other two identified neurons, it recovers temporal traces that are sufficiently noisy that small calcium transients are submerged in the noise. As for CNMF, the neuron shapes remain more or less at the initial condition (i.e. the ground truth spatial footprints), but clear contaminations in the temporal traces are visible. This is because the pure NMF model in CNMF does not model the true background well and the residuals in the background are mistakenly captured by neural components. In contrast, on this example, CNMFE recovers the true neural shapes and neural activity with high accuracy.
We also compare the number of detected neurons: PCA/ICA detected 195 out of 200 neurons, while CNMFE detected all 200 neurons. We also quantitatively evaluated the performance of source extraction by showing the spatial and temporal cosine similarities between detected neurons and ground truth (Figure 4C); we find that the neurons detected using PCA/ICA have much lower similarities with the ground truth (Figure 4C). We also note that the CNMF results are much worse than those of CNMFE here, despite the fact that CNMF is initialized at the ground truth parameter values. This result clarifies an important point: the improvements from CNMFE are not simply due to improvements in the initialization step. Furthermore, running the full iterative pipeline of CNMFE leads to improvements in both spatial and temporal similarities, compared with the results in the initialization step.
In many downstream analyses of calcium imaging data, pairwise correlations provide an important metric to study coordinated network activity (Warp et al., 2012; Barbera et al., 2016; Dombeck et al., 2009; Klaus et al., 2017). Since PCA/ICA seeks statistically independent components, which forces the temporal traces to have nearzero correlation, the correlation structure is badly corrupted in the raw PCA/ICA outputs (Figure 4D). We observed that a large proportion of the independence comes from the noisy baselines in the extracted traces (data not shown), so we postprocessed the PCA/ICA output by thresholding at the 3 standard deviation level. This recovers some nonzero correlations, but the true correlation structure is not recovered accurately (Figure 4D). By contrast, the CNMFE results matched the ground truth very well due to accurate extraction of individual neurons’ temporal activity (Figure 4D). As for CNMF, the estimated correlations are slightly elevated relative to the true correlations. This is due to the shared (highly correlated) background fluctuations that corrupt the recovered activity of nearby neurons.
Next, we compared the performance of the different methods under different SNR regimes. Because of the above inferior results we skip comparisons to the basic CNMF here. Based on the same simulation parameters as above, we vary the noise level $\mathrm{\Sigma}$ by multiplying it with a SNR reduction factor. Figure 4E shows that CNMFE detects all neurons over a wide SNR range, while PCA/ICA fails to detect the majority of neurons when the SNR drops to sufficiently low levels. Moreover, the detected neurons in CNMFE preserve high spatial and temporal similarities with the ground truth (Figure 4F–G). This high accuracy of extracting neurons’ temporal activity benefits from the modeling of the calcium dynamics, which leads to significantly denoised neural activity. If we skip the temporal denoising step in the algorithm, CNMFE is less robust to noise, but still outperforms PCA/ICA significantly (Figure 4G). When SNR is low, the improvements yielded by CNMFE can be crucial for detecting weak neuron events, as shown in Figure 4H.
Finally, we examine the ability of CNMFE to demix correlated and overlapping neurons. Using the two example neurons in Figure 3E, we ran multiple simulations at varying correlation levels and extracted neural components using the CNMFE pipeline and PCA/ICA analysis. The spatial footprints in these simulations were fixed, but the temporal components were varied to have different correlation levels ($\gamma $) between calcium traces by tuning their shared component with the common background fluctuations. For high correlation levels ($\gamma >0.7$), the initialization procedure tends to first initialize a component that explains the common activity between two neurons and then initialize another component to account for the residual of one neuron. After iteratively refining the model variables, CNMFE successfully extracted the two neurons’ spatiotemporal activity even at very high correlation levels ($\gamma =0.95$; Figure 5A,B). PCA/ICA was also often able to separate two neurons for large correlation levels ($\gamma =0.9$, Figure 5B), but the extracted traces have problematic negative spikes that serve to reduce their statistical dependences (Figure 4A).
Application to dorsal striatum data
We now turn to the analysis of largescale microendoscopic datasets recorded from freely behaving mice. We run both CNMFE and PCA/ICA for all datasets and compare their performances in detail.
We begin by analyzing in vivo calcium imaging data of neurons expressing GCaMP6f in the mouse dorsal striatum. (Full experimental details and algorithm parameter settings for this and the following datasets appear in the Methods and Materials section.) CNMFE extracted 692 putative neural components from this dataset; PCA/ICA extracted 547 components (starting from 700 initial components, and then automatically removing false positives using the same criterion as applied in CNMFE). Figure 6A shows how CNMFE decomposes an example frame into four components: the constant baselines that are invariant over time, the fluctuating background, the denoised neural signals, and the residuals. We highlight an example neuron by drawing its ROI to demonstrate the power of CNMFE in isolating fluorescence signals of neurons from the background fluctuations. For the selected neuron, we plot the mean fluorescence trace of the raw data and the estimated background (Figure 6B). These two traces are very similar, indicating that the background fluctuation dominates the raw data. By subtracting this estimated background component from the raw data, we acquire a clean trace that represents the neural signal.
To quantify the background effects further, we compute the contribution of each signal component in explaining the variance in the raw data. For each pixel, we compute the variance of the raw data first and then compute the variance of the backgroundsubtracted data. Then the reduced variance is divided by the variance of the raw data, giving the proportion of variance explained by the background. Figure 6C (blue) shows the distribution of the backgroundexplained variance over all pixels. The background accounts for around 90% of the variance on average. We further remove the denoised neural signals and compute the variance reduction; Figure 6C shows that neural signals account for less than 10% of the raw signal variance. This analysis is consistent with our observations that background dominates the fluorescence signal and extracting highquality neural signals requires careful background signal removal.
The contours of the spatial footprints inferred by the two approaches (PCA/ICA and CNMFE) are depicted in Figure 6D, superimposed on the correlation image of the filtered raw data. The indicated area was cropped from Figure 6A (left). In this case, most neurons inferred by PCA/ICA were inferred by CNMFE as well, with the exception of a few components that seemed to be false positives (judging by their spatial shapes and temporal traces and visual inspection of the raw data movie; detailed data not shown). However, many realistic components were only detected by CNMFE (shown as the green areas in Figure 6D). In these plots, we rank the inferred components according to their SNRs; the color indicates the relative rank (decaying from red to yellow). We see that the components missed by PCA/ICA have low SNRs (green shaded areas with yellow contours).
Figure 6E shows the spatial and temporal components of 14 example neurons detected only by CNMFE. Here (and in the following figures), for illustrative purposes, we show the calcium traces before the temporal denoising step. For neurons that are inferred by both methods, CNMFE shows significant improvements in the SNR of the extracted cellular signals (Figure 6F), even before the temporal denoising step is applied. In panel G we randomly select 10 examples and examine their spatial and temporal components. Compared with the CNMFE results, PCA/ICA components have much smaller size, often with negative dips surrounding the neuron (remember that ICA avoids spatial overlaps in order to reduce nearby neurons’ statistical dependences, leading to some loss of signal strength; see (Pnevmatikakis et al., 2016) for further discussion). The activity traces extracted by CNMFE are visually cleaner than the PCA/ICA traces; this is important for reliable event detection, particularly in low SNR examples. See Klaus et al., 2017) for additional examples of CNMFE applied to striatal data.
Application to data in prefrontal cortex
We repeat a similar analysis on GCaMP6s data recorded from prefrontal cortex (PFC, Figure 7), to quantify the performance of the algorithm in a different brain area with a different calcium indicator. Again we find that CNMFE successfully extracts neural signals from a strong fluctuating background (Figure 7A), which contributes a large proportion of the variance in the raw data (Figure 7B). Similarly as with the striatum data, PCA/ICA analysis missed many components that have very weak signals (33 missed components here). For the matched neurons, CNMFE shows strong improvements in the SNRs of the extracted traces (Figure 7D). Consistent with our observation in striatum (Figure 6G), the spatial footprints of PCA/ICA components are shrunk to promote statistical independence between neurons, while the neurons inferred by CNMFE have visually reasonable morphologies (Figure 6E). As for calcium traces with high SNRs (Figure 7E, cell 16), CNMFE traces have smaller noise values, which is important for detecting small calcium transients (Figure 7E, cell 4). For traces with low SNRs (Figure 7, cell 710), it is challenging to detect any calcium events from the PCA/ICA traces due to the large noise variance; CNMFE is able to visually recover many of these weaker signals. For those cells missed by PCA/ICA, their traces extracted by CNMFE have reasonable morphologies and visible calcium events (Figure 7F).
The demixing performance of PCA/ICA analysis can be relatively weak because it is inherently a linear demixing method (Pnevmatikakis et al., 2016). Since CNMFE uses a more suitable nonlinear matrix factorization method, it has a better capability of demixing spatially overlapping neurons. As an example, Figure 7G shows three closeby neurons identified by both CNMFE and PCA/ICA analysis. PCA/ICA forces its obtained filters to be spatially separated to reduce their dependence (thus reducing the effective signal strength), while CNMFE allows inferred spatial components to have large overlaps (Figure 7G, left), retaining the full signal power. In the traces extracted by PCA/ICA, the component labeled in green contains many negative ‘spikes,’ which are highly correlated with the spiking activity of the blue neuron (Figure 7G, yellow). In addition, the green PCA/ICA neuron has significant crosstalk with the red neuron due to the failure of signal demixing (Figure 7G, cyan); the CNMFE traces shows no comparable negative ‘spikes’ or crosstalk. See also Video 8 for further details.
Application to ventral hippocampus neurons
In the previous two examples, we analyzed data with densely packed neurons, in which the neuron sizes are all similar. In the next example, we apply CNMFE to a dataset with much sparser and more heterogeneous neural signals. The data used here were recorded from amygdalaprojecting neurons expressing GCaMP6f in ventral hippocampus. In this dataset, some neurons that are slightly above or below the focal plane were visible with prominent signals, though their spatial shapes are larger than neurons in the focal plane.
This example is somewhat more challenging due to the large diversity of neuron sizes. It is possible to set multiple parameters to detect neurons of different sizes (or to e.g. differentially detect somas versus smaller segments of axons or dendrites passing through the focal plane), but for illustrative purposes here we use a single neural size parameter to initialize all of the components. This in turn splits some large neurons into multiple components. Following this crude initialization step, we updated the background component and then picked the missing neurons from the residual using a second greedy component initialization step. Next, we ran CNMFE for three iterations of updating the model variables $A,C$, and $B$. The first two iterations were performed automatically; we included manual interventions (e.g. merging/deleting components) before the last iteration, leading to improved source extraction results (see Video 10 for details on the manual merge and delete interventions performed here). In this example, we detected 24 CNMFE components and 24 PCA/ICA components. The contours of these inferred neurons are shown in Figure 8A. In total we have 20 components detected by both methods (shown in the first three rows of Figure 8B+C); each method detected extra components that are not detected by the other (the last rows of Figure 8B+C). Once again, the PCA/ICA filters contain many negative pixels in an effort to reduce spatial overlaps; see components 3 and 5 in Figure 8A–C, for example. All traces of the inferred neurons are shown in Figure 8D+E. We can see that the CNMFE traces have much lower noise level and cleaner neural signals in both high and low SNR settings. Conversely, the calcium traces of the three extra neurons identified by PCA/ICA show noisy signals that are unlikely to be neural responses.
Application to footshock responses in the bed nucleus of the stria terminalis (BNST)
Identifying neurons and extracting their temporal activity is typically just the first step in the analysis of calcium imaging data; downstream analyses rely heavily on the quality of this initial source extraction. We showed above that, compared to PCA/ICA, CNMFE is better at extracting activity dynamics, especially in regimes where neuronal activities are correlated (c.f. Figure 4D). Using in vivo electrophysiological recordings, we previously showed that neurons in the bed nucleus of the stria terminalis (BNST) show strong responses to unpredictable footshock stimuli (Jennings et al., 2013). We therefore measured calcium dynamics in CaMKIIexpressing neurons that were transfected with the calcium indicator GCaMP6s in the BNST and analyzed the synchronous activity of multiple neurons in response to unpredictable footshock stimuli. We chose 12 example neurons that were detected by both CNMFE and PCA/ICA methods and show their spatial and temporal components in Figure 9A–C. The activity around the onset of the repeated stimuli are aligned and shown as pseudocolored images in panel D. The median responses of CNMFE neurons display prominent responses to the footshock stimuli compared with the resting state before stimuli onset. In comparison, the activity dynamics extracted by PCA/ICA have relatively low SNR, making it more challenging to reliably extract footshock responses. Panel E summarizes the results of panel D; we see that CNMFE outputs significantly more easily detectable responses than does PCA/ICA. This is an example in which downstream analyses of calcium imaging data can significantly benefit from the improvements in the accuracy of source extraction offered by CNMFE. (sheintuch2017tracking recently presented another such example, showing that more neurons can be tracked across multiple days using CNMFE outputs, compared to PCA/ICA.)
Conclusion
Microendoscopic calcium imaging offers unique advantages and has quickly become a critical method for recording large neural populations during unrestrained behavior. However, previous methods fail to adequately remove background contaminations when demixing single neuron activity from the raw data. Since strong background signals are largely inescapable in the context of onephoton imaging, insufficient removal of the background could yield problematic conclusions in downstream analysis. This has presented a severe and wellknown bottleneck in the field. We have delivered a solution for this critical problem, building on the constrained nonnegative matrix factorization framework introduced in Pnevmatikakis et al., 2016 but significantly extending it in order to more accurately and robustly remove these contaminating background components.
The proposed CNMFE algorithm can be used in either automatic or semiautomatic mode, and leads to significant improvements in the accuracy of source extraction compared with previous methods. In addition, CNMFE requires very few parameters to be specified, and these parameters are easily interpretable and can be selected within a broad range. We demonstrated the power of CNMFE using data from a wide diversity of brain areas (subcortical, cortical, and deep brain areas), SNR regimes, calcium indicators, neuron sizes and densities, and hardware setups. Among all these examples (and many others not shown here), CNMFE performs well and improves significantly on the standard PCA/ICA approach. Considering that source extraction is typically just the first step in calcium imaging data analysis pipelines (Mohammed et al., 2016), these improvements should in turn lead to more stable and interpretable results from downstream analyses. Further applications of the CNMFE approach appear in (Cameron et al., 2016; Donahue and Kreitzer, 2017; Jimenez et al., 2016; Jimenez et al., 2018; Klaus et al., 2017; Lin et al., 2017; Murugan et al., 2016; Murugan et al., 2017; RodriguezRomaguera et al., 2017; Tombaz et al., 2016; Ung et al., 2017; Yu et al., 2017; Mackevicius et al., 2017; Madangopal et al., 2017; Roberts et al., 2017; Ryan et al., 2017; Roberts et al., 2017; Sheintuch et al., 2017).
We have released our MATLAB implementation of CNMFE as opensource software (https://github.com/zhoupc/CNMF_E (Zhou, 2017a)). A Python implementation has also been incorporated into the CaImAn toolbox (Giovannucci et al., 2017b). We welcome additions or suggestions for modifications of the code, and hope that the large and growing microendoscopic imaging community finds CNMFE to be a helpful tool in furthering neuroscience research.
Materials and methods
Algorithm for solving problem (PS)
Request a detailed protocolIn problem (PS), ${\mathit{\bm{b}}}_{0}$ is unconstrained and can be updated in closed form: ${\widehat{\mathit{\bm{b}}}}_{0}=\frac{1}{T}(\stackrel{~}{Y}A\cdot \widehat{C}{\widehat{B}}^{f})\cdot \mathrm{\U0001d7cf}$. By plugging this update into problem (PS), we get a reduced problem
where $\stackrel{~}{Y}=Y{\widehat{B}}^{f}\frac{1}{T}Y{\mathrm{\U0001d7cf\U0001d7cf}}^{T}$ and $\stackrel{~}{C}=\widehat{C}\frac{1}{T}\widehat{C}{\mathrm{\U0001d7cf\U0001d7cf}}^{T}$. We approach this problem using a version of ”hierarchical alternating least squares’ (HALS; Cichocki et al., 2007), a standard algorithm for nonnegative matrix factorization. (Friedrich et al., 2017b) modified the fastHALS algorithm (Cichocki and Phan, 2009) to estimate the nonnegative spatial components $A,\mathit{\bm{b}}$ and the nonnegative temporal activity $C,\mathit{\bm{f}}$ in the CNMF model $Y=A\cdot C+\mathit{\bm{b}}{\mathit{\bm{f}}}^{T}+E$ by including sparsity and localization constraints. We solve a problem similar to the subproblem solved in Friedrich et al. (2017b):
where ${P}_{k}$ denotes the the spatial patch constraining the nonzero pixels of the $k$th neuron and restricts the candidate spatial support of neuron $k$. This regularization reduces the number of free parameters in $A$, leading to speed and accuracy improvements. The spatial patches can be determined using a mildly dilated version of the support of the previous estimate of $A$ (Pnevmatikakis et al., 2016; Friedrich et al., 2017a).
Algorithms for solving problem (PT)
In problem (PT), the model variable $C\in {\mathbb{R}}^{K\times T}$ could be very large, making the direct solution of (PT) computationally expensive. Unlike problem (PS), the problem (PT) cannot be readily parallelized because the constraints ${G}^{(i)}{\mathit{\bm{c}}}_{i}\ge 0$ couple the entries within each row of C, and the residual term couples entries across columns. Here, we follow the block coordinatedescent approach used in (Pnevmatikakis et al., 2016) and propose an algorithm that sequentially updates each ${\mathit{\bm{c}}}_{i}$ and ${\mathit{\bm{b}}}_{0}$. For each neuron, we start with a simple unconstrained estimate of ${\mathit{\bm{c}}}_{i}$, denoted as $\widehat{{\mathit{\bm{y}}}_{i}}$, that minimizes the residual of the spatiotemporal data matrix while fixing other neurons’ spatiotemporal activity and the baseline term ${\mathit{\bm{b}}}_{0}$,
where $Y}_{\text{res}}=Y\hat{A}\hat{C}{\hat{\mathit{b}}}_{0}{\mathbf{1}}^{T}{B}^{f$ represents the residual given the current estimate of the model variables. Due to its unconstrained nature, $\hat{\mathit{y}}}_{i$ is a noisy estimate of ${\mathit{\bm{c}}}_{i}$, plus a constant baseline resulting from inaccurate estimation of ${\mathit{\bm{b}}}_{0}$. Given ${\widehat{\mathit{\bm{y}}}}_{i}$, various deconvolution algorithms can be applied to obtain the denoised trace ${\widehat{\mathit{\bm{c}}}}_{i}$ and deconvolved signal ${\widehat{\mathit{\bm{s}}}}_{i}$(Vogelstein et al., 2009; Pnevmatikakis et al., 2013; Deneux et al., 2016; Friedrich et al., 2017b; Jewell and Witten, 2017); in CNMFE, we use the OASIS algorithm from (Friedrich et al., 2017b). (Note that the estimation of ${\mathit{\bm{c}}}_{i}$ is not dependent on accurate estimation of ${\mathit{\bm{b}}}_{0}$, because the algorithm for estimating ${\mathit{\bm{c}}}_{i}$ will also automatically estimate the baseline term in ${\widehat{\mathit{\bm{y}}}}_{i}$.) After the ${\mathit{\bm{c}}}_{i}$’s are updated, we update ${\mathit{\bm{b}}}_{0}$ using the closedform expression ${\widehat{\mathit{\bm{b}}}}_{0}=\frac{1}{T}(\stackrel{~}{Y}\widehat{A}\cdot \widehat{C}{\widehat{B}}^{f})\cdot \mathrm{\U0001d7cf}$.
Estimating background by solving problem (PB)
Request a detailed protocolNext we discuss our algorithm for estimating the spatiotemporal background signal by solving problem (PB) as a linear regression problem given $\widehat{A}$ and $\widehat{C}$. Since ${B}^{f}\cdot \mathrm{\U0001d7cf}=\mathrm{\U0001d7ce}$, we can easily estimate the constant baselines for each pixel as
Next we replace the ${\mathit{\bm{b}}}_{0}$ in (PB) with this estimate and rewrite (PB) as
where $X=Y\widehat{A}\cdot \widehat{C}{\widehat{\mathit{\bm{b}}}}_{0}{\mathrm{\U0001d7cf}}^{T}$. Given the optimized $\widehat{W}$, our estimation of the fluctuating background is ${\widehat{B}}^{f}=\widehat{W}X$. The new optimization problem (PW) can be readily parallelized into $d$ linear regression problems for each pixel separately. By estimating all row columns of ${W}_{i,:}$, we are able to obtain the whole background signal as
In some cases, $X$ might include large residuals from the inaccurate estimation of the neurons’ spatiotemporal activity $AC$, for example, missing neurons in the estimation. These residuals act as outliers and distort the estimation of ${\widehat{B}}^{f}$ and ${\mathit{\bm{b}}}_{0}$. To overcome this problem, we use robust least squares regression (RLSR) via hard thresholding to avoid contaminations from the outliers (Bhatia et al., 2015). Before solving the problem (PW), we compute ${B}^{}=\widehat{W}(Y\widehat{A}\cdot \widehat{C}{\widehat{\mathit{\bm{b}}}}_{0}{\mathrm{\U0001d7cf}}^{T})$ (the current estimate of the fluctuating background) and then apply a simple clipping preprocessing step to $X$:
Then we update the regression estimate using ${X}^{clipped}$ instead of $X$, and iterate. Here, ${\sigma}_{i}$ is the standard deviation of the noise at ${\mathit{\bm{x}}}_{i}$ and its value can be estimated using the power spectral density (PSD) method (Pnevmatikakis et al., 2016). As for the first iteration of the model fitting, we set each $B}_{it}^{}=\frac{1}{{\mathrm{\Omega}}_{i}}\sum _{j\in {\mathrm{\Omega}}_{i}}{\stackrel{~}{X}}_{jt$ as the mean of the ${\stackrel{~}{X}}_{jt}$ for all $j\in {\mathrm{\Omega}}_{i}$. The thresholding coefficient $\zeta $ can be specified by users, although we have found a fixed default works well across the datasets used here. This preprocessing removes most calcium transients by replacing those frames with the previously estimated background only. As a result, it increases the robustness to inaccurate estimation of $AC$, and in turn leads to a better extraction of $AC$ in the following iterations.
Initialization of model variables
Since problem (PAll) is not convex in all of its variables, a good initialization of model variables is crucial for fast convergence and accurate extraction of all neurons’ spatiotemporal activity. Previous methods assume the background component is relatively weak, allowing us to initialize $\widehat{A}$ and $\widehat{C}$ while ignoring the background or simply initializing it with a constant baseline over time. However, the noisy background in microendoscopic data fluctuates more strongly than the neural signals (c.f. Figure 6C and Figure 7B), which makes previous methods less valid for the initialization of CNMFE.
Here, we design a new algorithm to initialize $\widehat{A}$ and $\widehat{C}$ without estimating $\widehat{B}$. The whole procedure is illustrated in Figure 10 and described in Algorithm 1. The key aim of our algorithm is to exploit the relative spatial smoothness in the background compared to the single neuronal signals visible in the focal plane. Thus, we can use spatial filtering to reduce the background in order to estimate single neurons’ temporal activity, and then initialize each neuron’s spatial footprint given these temporal traces. Once we have initialized $\widehat{A}$ and $\widehat{C}$, it is straightforward to initialize the constant baseline ${\mathit{\bm{b}}}_{0}$ and the fluctuating background ${B}^{f}$ by solving problem (PB).
Spatially filtering the data
Request a detailed protocolWe first filter the raw video data with a customized image kernel (Figure 10A). The kernel is generated from a Gaussian filter
Here, we use $h(\mathit{\bm{x}})$ to approximate a cell body; the factor of $1/4$ in the Gaussian width is chosen to match a Gaussian shape to a cell of width $l$. Instead of using $h(\mathit{\bm{x}})$ as the filtering kernel directly, we subtract its spatial mean (computed over a region of width equal to $l$) and filter the raw data with $\stackrel{~}{h}(\mathit{\bm{x}})=h(\mathit{\bm{x}})\overline{h}(\mathit{\bm{x}})$. The filtered data is denoted as $Z\in {\mathbb{R}}^{d\times T}$ (Figure 10B). This spatial filtering step helps accomplish two goals: (1) reducing the background $B$, so that $Z$ is dominated by neural signals (albeit somewhat spatially distorted) in the focal plane (see Figure 10B as an example); (2) performing a template matching to detect cell bodies similar to the Gaussian kernel. Consequently, $Z$ has large values near the center of each cell body. (However, note that we can not simply e.g. apply CNMF to $Z$, because the spatial components in a factorization of the matrix $Z$ will typically no longer be nonnegative, and therefore NMFbased approaches can not be applied directly.) More importantly, the calcium traces near the neuron center in the filtered data preserve the calcium activity of the corresponding neurons because the filtering step results in a weighted average of cellular signals surrounding each pixel (Figure 10B). Thus, the fluorescence traces in pixels close to neuron centers in $Z$ can be used for initializing the neurons’ temporal activity directly. These pixels are defined as seed pixels. We next propose a quantitative method to rank all potential seed pixels.
Ranking seed pixels
Request a detailed protocolA seed pixel $\mathit{\bm{x}}$ should have two main features: first, $Z(\mathit{\bm{x}})$, which is the filtered trace at pixel $\mathit{\bm{x}}$, should have high peaktonoise ratio (PNR) because it encodes the calcium concentration ${\mathit{\bm{c}}}_{i}$ of one neuron; second, a seed pixel should have high temporal correlations with its neighboring pixels (e.g. 4 nearest neighbors) because they share the same ${\mathit{\bm{c}}}_{i}$. We computed two metrics for each of these two features:
Recall that $\sigma (\mathit{\bm{x}})$ is the standard deviation of the noise at pixel $\mathit{\bm{x}}$; the function $\text{\mathbf{c}\mathbf{o}\mathbf{r}\mathbf{r}}()$ refers to Pearson correlation here. In our implementation, we usually threshold $Z(\mathit{\bm{x}})$ by $3\sigma (\mathit{\bm{x}})$ before computing $L(\mathit{\bm{x}})$ to reduce the influence of the background residuals, noise, and spikes from nearby neurons.
Most pixels can be ignored when selecting seed pixels because their local correlations or PNR values are too small. To avoid unnecessary searches of the pixels, we set thresholds for both $P(\mathit{\bm{x}})$ and $L(\mathit{\bm{x}})$, and only pick pixels larger than the thresholds ${P}_{\mathrm{min}}$ and ${L}_{\mathrm{min}}$. It is empirically useful to combine both metrics for screening seed pixels. For example, high PNR values could result from large noise, but these pixels usually have small $L(\mathit{\bm{x}})$ because the noise is not shared with neighboring pixels. On the other hand, insufficient removal of background during the spatial filtering leads to high $L(\mathit{\bm{x}})$, but the corresponding $P(\mathit{\bm{x}})$ are usually small because most background fluctuations have been removed. So we create another matrix $R(\mathit{\bm{x}})=P(\mathit{\bm{x}})\cdot L(\mathit{\bm{x}})$ that computes the pixelwise product of $P(\mathit{\bm{x}})$ and $L(\mathit{\bm{x}}$). We rank all $R(\mathit{\bm{x}})$ in a descending order and choose the pixel ${\mathit{\bm{x}}}^{*}$ with the largest $R(\mathit{\bm{x}})$ for initialization.
Algorithm 1. Initialize model variables $A$ and $C$ given the raw data 

$\mathbf{R}\mathbf{e}\mathbf{q}\mathbf{u}\mathbf{i}\mathbf{r}\mathbf{e}\phantom{\rule{negativethinmathspace}{0ex}}\mathbf{:}\text{}\mathrm{d}\mathrm{a}\mathrm{t}\mathrm{a}\text{}Y\in {\mathbb{R}}^{d\times T},\mathrm{n}\mathrm{e}\mathrm{u}\mathrm{r}\mathrm{o}\mathrm{n}\text{}\mathrm{s}\mathrm{i}\mathrm{z}\mathrm{e}\text{}l,\phantom{\rule{thinmathspace}{0ex}}\mathrm{t}\mathrm{h}\mathrm{e}\text{}\mathrm{m}\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{m}\mathrm{u}\mathrm{m}\phantom{\rule{thinmathspace}{0ex}}\mathrm{l}\mathrm{o}\mathrm{c}\mathrm{a}\mathrm{l}\phantom{\rule{thinmathspace}{0ex}}\mathrm{c}\mathrm{o}\mathrm{r}\mathrm{r}\mathrm{e}\mathrm{l}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{n}\phantom{\rule{thinmathspace}{0ex}}{L}_{min}\phantom{\rule{thinmathspace}{0ex}}\mathrm{a}\mathrm{n}\mathrm{d}\phantom{\rule{thinmathspace}{0ex}}\mathrm{t}\mathrm{h}\mathrm{e}\phantom{\rule{thinmathspace}{0ex}}\mathrm{m}\mathrm{i}\mathrm{n}\mathrm{i}\mathrm{m}\mathrm{u}\mathrm{m}\text{}\mathrm{P}\mathrm{N}\mathrm{R}\text{}{P}_{min}\text{}\mathrm{f}\mathrm{o}\mathrm{r}\text{}\mathrm{s}\mathrm{e}\mathrm{l}\mathrm{e}\mathrm{c}\mathrm{t}\mathrm{i}\mathrm{n}\mathrm{g}\phantom{\rule{thinmathspace}{0ex}}\mathrm{s}\mathrm{e}\mathrm{e}\mathrm{d}\phantom{\rule{thinmathspace}{0ex}}\mathrm{p}\mathrm{i}\mathrm{x}\mathrm{e}\mathrm{l}\mathrm{s}.$ $\begin{array}{lr}{\displaystyle 1\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\mathrm{h}\leftarrow \mathrm{a}\text{}\mathrm{t}\mathrm{r}\mathrm{u}\mathrm{n}\mathrm{c}\mathrm{a}\mathrm{t}\mathrm{e}\mathrm{d}\text{}2\mathrm{D}\text{}\mathrm{G}\mathrm{a}\mathrm{u}\mathrm{s}\mathrm{s}\mathrm{i}\mathrm{a}\mathrm{n}\text{}\mathrm{k}\mathrm{e}\mathrm{r}\mathrm{n}\mathrm{e}\mathrm{l}\text{}\mathrm{o}\mathrm{f}\text{}\mathrm{w}\mathrm{i}\mathrm{d}\mathrm{t}\mathrm{h}{\sigma}_{\mathrm{x}}={\sigma}_{\mathrm{y}}=\frac{\mathrm{l}}{4};\mathrm{h}\in {\mathbb{R}}^{\mathrm{l}\times \mathrm{l}}}& {\displaystyle \u25b9\text{}2\mathrm{D}\text{}\mathrm{G}\mathrm{a}\mathrm{u}\mathrm{s}\mathrm{s}\mathrm{i}\mathrm{a}\mathrm{n}\text{}\mathrm{k}\mathrm{e}\mathrm{r}\mathrm{n}\mathrm{e}\mathrm{l}}\\ {\displaystyle 2\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\stackrel{~}{h}\leftarrow h\overline{h};\stackrel{~}{h}\in {\mathbb{R}}^{l\times l}}& \phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}{\displaystyle \u25b9\text{}\mathrm{m}\mathrm{e}\mathrm{a}\mathrm{n}\mathrm{c}\mathrm{e}\mathrm{n}\mathrm{t}\mathrm{e}\mathrm{r}\mathrm{e}\mathrm{d}\text{}\mathrm{k}\mathrm{e}\mathrm{r}\mathrm{n}\mathrm{e}\mathrm{l}\text{}\mathrm{f}\mathrm{o}\mathrm{r}\text{}\mathrm{s}\mathrm{p}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{a}\mathrm{l}\text{}\mathrm{f}\mathrm{i}\mathrm{l}\mathrm{t}\mathrm{e}\mathrm{r}\mathrm{i}\mathrm{n}\mathrm{g}}\\ {\displaystyle 3\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}Z\leftarrow \text{conv}(Y,h);Z\in {\mathbb{R}}^{d\times T}}& {\displaystyle \u25b9\text{}\mathrm{s}\mathrm{p}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{a}\mathrm{l}\mathrm{l}\mathrm{y}\text{}\mathrm{f}\mathrm{i}\mathrm{l}\mathrm{t}\mathrm{e}\mathrm{r}\text{}\mathrm{t}\mathrm{h}\mathrm{e}\text{}\mathrm{r}\mathrm{a}\mathrm{w}\text{}\mathrm{d}\mathrm{a}\mathrm{t}\mathrm{a}}\\ {\displaystyle 4\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}L\leftarrow \text{}\mathrm{l}\mathrm{o}\mathrm{c}\mathrm{a}\mathrm{l}\text{}\mathrm{c}\mathrm{r}\mathrm{o}\mathrm{s}\mathrm{s}\mathrm{c}\mathrm{o}\mathrm{r}\mathrm{r}\mathrm{e}\mathrm{l}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{n}\text{}\mathrm{i}\mathrm{m}\mathrm{a}\mathrm{g}\mathrm{e}\text{}\mathrm{o}\mathrm{f}\text{}\mathrm{t}\mathrm{h}\mathrm{e}\text{}\mathrm{f}\mathrm{i}\mathrm{l}\mathrm{t}\mathrm{e}\mathrm{r}\mathrm{e}\mathrm{d}\text{}\mathrm{d}\mathrm{a}\mathrm{t}\mathrm{a}\text{}\mathrm{Z};\text{}L\in {\mathbb{R}}^{d}}& \\ {\displaystyle 5\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}P\leftarrow \text{}\mathrm{P}\mathrm{N}\mathrm{R}\text{}\mathrm{i}\mathrm{m}\mathrm{a}\mathrm{g}\mathrm{e}\text{}\mathrm{o}\mathrm{f}\text{}\mathrm{t}\mathrm{h}\mathrm{e}\text{}\mathrm{f}\mathrm{i}\mathrm{l}\mathrm{t}\mathrm{e}\mathrm{r}\mathrm{e}\mathrm{d}\text{}\mathrm{d}\mathrm{a}\mathrm{t}\mathrm{a}\text{}\mathrm{Z};\text{}P\in {\mathbb{R}}^{d}}& \\ {\displaystyle 6\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}k\leftarrow 0}& {\displaystyle \u25b9\text{}\mathrm{n}\mathrm{e}\mathrm{u}\mathrm{r}\mathrm{o}\mathrm{n}\text{}\mathrm{n}\mathrm{u}\mathrm{m}\mathrm{b}\mathrm{e}\mathrm{r}}\\ {\displaystyle 7\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\mathbf{w}\mathbf{h}\mathbf{i}\mathbf{l}\mathbf{e}\text{}\mathrm{T}\mathrm{r}\mathrm{u}\mathrm{e}\text{}\mathbf{d}\mathbf{o}}& \\ {\displaystyle 8\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\phantom{\rule{1em}{0ex}}\mathbf{i}\mathbf{f}\text{}L(\mathit{x})\le {L}_{min}\text{}or\text{}P(\mathit{x})\le {P}_{min}\text{}\mathrm{f}\mathrm{o}\mathrm{r}\text{}\mathrm{a}\mathrm{l}\mathrm{l}\text{}\mathrm{p}\mathrm{i}\mathrm{x}\mathrm{e}\mathrm{l}\text{}\mathit{x}\text{}\mathbf{t}\mathbf{h}\mathbf{e}\mathbf{n}}& \\ {\displaystyle 9\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\phantom{\rule{2em}{0ex}}\mathrm{b}\mathrm{r}\mathrm{e}\mathrm{a}\mathrm{k};}& \\ {\displaystyle 10\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\phantom{\rule{1em}{0ex}}\mathbf{e}\mathbf{l}\mathbf{s}\mathbf{e}}& \end{array}$ $\begin{array}{lr}{\displaystyle 11\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\phantom{\rule{2em}{0ex}}k\leftarrow k+1}& \\ {\displaystyle 12\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\phantom{\rule{2em}{0ex}}{\hat{\mathit{a}}}_{k}\leftarrow \mathbf{0};\mathit{a}\in {\mathbb{R}}^{d}}& \\ {\displaystyle 13\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\phantom{\rule{2em}{0ex}}{\mathit{x}}^{\ast}\leftarrow {\text{argmax}}_{\mathit{x}}(L(\mathit{x})\cdot P(\mathit{x}))}& {\displaystyle \u25b9\text{}\mathrm{s}\mathrm{e}\mathrm{l}\mathrm{e}\mathrm{c}\mathrm{t}\text{}\mathrm{a}\text{}\mathrm{s}\mathrm{e}\mathrm{e}\mathrm{d}\text{}\mathrm{p}\mathrm{i}\mathrm{x}\mathrm{e}\mathrm{l}}\\ {\displaystyle 14\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\phantom{\rule{2em}{0ex}}{\mathrm{\Omega}}_{k}\leftarrow \{\mathit{x}\mathit{x}\text{}\mathrm{i}\mathrm{s}\text{}\mathrm{i}\mathrm{n}\text{}\mathrm{t}\mathrm{h}\mathrm{e}\text{}\mathrm{s}\mathrm{q}\mathrm{u}\mathrm{a}\mathrm{r}\mathrm{e}\text{}\mathrm{b}\mathrm{o}\mathrm{x}\text{}\mathrm{o}\mathrm{f}\text{}\mathrm{l}\mathrm{e}\mathrm{n}\mathrm{g}\mathrm{t}\mathrm{h}\text{}\text{}(2\mathrm{l}+1)\mathrm{s}\mathrm{u}\mathrm{r}\mathrm{r}\mathrm{o}\mathrm{u}\mathrm{n}\mathrm{d}\mathrm{i}\mathrm{n}\mathrm{g}\text{}\mathrm{p}\mathrm{i}\mathrm{x}\mathrm{e}\mathrm{l}\text{}{\mathit{x}}^{\ast}\}}& {\displaystyle \u25b9\text{}\mathrm{c}\mathrm{r}\mathrm{o}\mathrm{p}\text{}\mathrm{a}\text{}\mathrm{s}\mathrm{m}\mathrm{a}\mathrm{l}\mathrm{l}\text{}\mathrm{b}\mathrm{o}\mathrm{x}\text{}\mathrm{n}\mathrm{e}\mathrm{a}\mathrm{r}\text{}{\mathit{x}}^{\ast}}& \\ {\displaystyle 15\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\phantom{\rule{2em}{0ex}}\mathit{r}(\mathit{x})\leftarrow \text{corr}(Z(\mathit{x},:),Z({\mathit{x}}^{\ast},:))\text{}\mathrm{f}\mathrm{o}\mathrm{r}\text{}\mathrm{a}\mathrm{l}\mathrm{l}\text{}\mathit{x}\in {\mathrm{\Omega}}_{k};\mathit{r}\in {\mathbb{R}}^{{\mathrm{\Omega}}_{k}}}\\ {\displaystyle 16\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\phantom{\rule{2em}{0ex}}{\mathit{y}}_{BG}\leftarrow {\displaystyle \frac{\sum _{\{\mathit{x}\mathit{r}(\mathit{x})\le 0.3\}}Y(\mathit{x},:)}{\sum _{\{\mathit{x}\mathit{r}(\mathit{x})\le 0.3\}}1};{\mathit{y}}_{BG}\in {\mathbf{R}}^{T}}}& \phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}{\displaystyle \u25b9\text{}\mathrm{e}\mathrm{s}\mathrm{t}\mathrm{i}\mathrm{m}\mathrm{a}\mathrm{t}\mathrm{e}\text{}\mathrm{t}\mathrm{h}\mathrm{e}\text{}\mathrm{b}\mathrm{a}\mathrm{c}\mathrm{k}\mathrm{g}\mathrm{r}\mathrm{o}\mathrm{u}\mathrm{n}\mathrm{d}\text{}\mathrm{s}\mathrm{i}\mathrm{g}\mathrm{n}\mathrm{a}\mathrm{l}}\end{array}$ $\begin{array}{lr}{\displaystyle 17\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\phantom{\rule{2em}{0ex}}{\hat{\mathit{c}}}_{k}\leftarrow {\displaystyle \frac{\sum _{\{\mathit{x}\mathit{x}(\mathit{x})\ge 0.7\}}Z(\mathit{x},:)}{\sum _{\{\mathit{x}\mathit{r}(\mathit{x})\ge 0.7\}}1};{\hat{\mathit{c}}}_{k}\in {\mathbf{R}}^{T}}}& {\displaystyle \u25b9\text{}\mathrm{e}\mathrm{s}\mathrm{t}\mathrm{i}\mathrm{m}\mathrm{a}\mathrm{t}\mathrm{e}\text{}\mathrm{n}\mathrm{e}\mathrm{u}\mathrm{r}\mathrm{a}\mathrm{l}\text{}\mathrm{s}\mathrm{i}\mathrm{g}\mathrm{n}\mathrm{a}\mathrm{l}}\\ {\displaystyle 18\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\phantom{\rule{2em}{0ex}}{\hat{\mathit{a}}}_{k}({\mathrm{\Omega}}_{k}),{\hat{\mathit{b}}}^{(f)},{\hat{\mathit{b}}}^{(0)}\leftarrow {\text{argmin}}_{\mathit{a},{\mathit{b}}^{(f)},{\mathit{b}}^{(0)}}\Vert {Y}_{{\mathrm{\Omega}}_{k}}(\mathit{a}\cdot {\hat{\mathit{c}}}_{k}^{T}+{\mathit{b}}^{(f)}\cdot {\mathit{y}}_{BG}^{T}+{\mathit{b}}^{(0)}\cdot {\mathbf{1}}^{T}){\Vert}_{F}^{2}}& \\ {\displaystyle 19\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\phantom{\rule{2em}{0ex}}{\hat{\mathit{a}}}_{k}\leftarrow max(0,{\hat{\mathit{a}}}_{k})}& \phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}{\displaystyle \u25b9\text{}\mathrm{t}\mathrm{h}\mathrm{e}\text{}\mathrm{s}\mathrm{p}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{a}\mathrm{l}\text{}\mathrm{c}\mathrm{o}\mathrm{m}\mathrm{p}\mathrm{o}\mathrm{n}\mathrm{e}\mathrm{n}\mathrm{t}\text{}\mathrm{o}\mathrm{f}\text{}\mathrm{t}\mathrm{h}\mathrm{e}\text{}\mathrm{k}\mathrm{t}\mathrm{h}\text{}\mathrm{n}\mathrm{e}\mathrm{u}\mathrm{r}\mathrm{o}\mathrm{n}}\\ {\displaystyle 20\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\phantom{\rule{2em}{0ex}}Y\leftarrow Y{\hat{\mathit{a}}}_{k}\cdot {\hat{\mathit{c}}}_{k}^{T}}& \phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}{\displaystyle \u25b9\text{}\mathrm{p}\mathrm{e}\mathrm{e}\mathrm{l}\text{}\mathrm{a}\mathrm{w}\mathrm{a}\mathrm{y}\text{}\mathrm{t}\mathrm{h}\mathrm{e}\text{}\mathrm{n}\mathrm{e}\mathrm{u}\mathrm{r}\mathrm{o}{\mathrm{n}}^{\prime}\mathrm{s}\text{}\mathrm{s}\mathrm{p}\mathrm{a}\mathrm{t}\mathrm{i}\mathrm{o}\mathrm{t}\mathrm{e}\mathrm{m}\mathrm{p}\mathrm{o}\mathrm{r}\mathrm{a}\mathrm{l}\text{}\mathrm{a}\mathrm{c}\mathrm{t}\mathrm{i}\mathrm{v}\mathrm{i}\mathrm{t}\mathrm{y}}\\ {\displaystyle 21\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\phantom{\rule{2em}{0ex}}\mathrm{u}\mathrm{p}\mathrm{d}\mathrm{a}\mathrm{t}\mathrm{e}\text{}\mathrm{L}(\mathit{x})\text{}\mathrm{a}\mathrm{n}\mathrm{d}\text{}\mathrm{P}(\mathit{x})\text{}\mathrm{l}\mathrm{o}\mathrm{c}\mathrm{a}\mathrm{l}\mathrm{l}\mathrm{y}\text{}\mathrm{g}\mathrm{i}\mathrm{v}\mathrm{e}\mathrm{n}\text{}\mathrm{t}\mathrm{h}\mathrm{e}\text{}\mathrm{n}\mathrm{e}\mathrm{w}\text{}\mathrm{Y}}& \\ {\displaystyle 22\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}A\leftarrow [{\hat{\mathit{a}}}_{1},{\hat{\mathit{a}}}_{2},\cdots ,{\hat{\mathit{a}}}_{k}]}& \\ {\displaystyle 23\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}C\leftarrow [{\hat{\mathit{c}}}_{1},{\hat{\mathit{c}}}_{2},\cdots ,{\hat{\mathit{c}}}_{k}{]}^{T}}& \\ {\displaystyle 24\phantom{\rule{negativethinmathspace}{0ex}}\phantom{\rule{negativethinmathspace}{0ex}}:\text{}\mathbf{r}\mathbf{e}\mathbf{t}\mathbf{u}\mathbf{r}\mathbf{n}\text{}A,C}& \end{array}$ 
Greedy initialization
Request a detailed protocolOur initialization method greedily initializes neurons one by one. Every time we initialize a neuron, we will remove its initialized spatiotemporal activity from the raw video data and initialize the next neuron from the residual. For the same neuron, there are several seed pixels that could be used to initialize it. But once the neuron has been initialized from any of these seed pixels (and the spatiotemporal residual matrix has been updated by peeling away the corresponding activity), the remaining seed pixels related to this neuron have lowered PNR and local correlation. This helps avoid the duplicate initialization of the same neuron. Also, $P(\mathit{\bm{x}})$ and $L(\mathit{\bm{x}})$ have to be updated after each neuron is initialized, but since only a small area near the initialized neuron is affected, we can update these quantities locally to reduce the computational cost. This procedure is repeated until the specified number of neurons have been initialized or no more candidate seed pixels exist.
This initialization algorithm can greedily initialize the required number of neurons, but the subproblem of estimating ${\widehat{a}}_{i}$ given ${\widehat{c}}_{i}$ still has to deal with the large background activity in the residual matrix. We developed a simple method to remove this background and accurately initialize neuron shapes, described next. We first crop a $(2l+1)\times (2l+1)$ square centered at ${\mathit{\bm{x}}}^{*}$ in the field of view (Figure 10A–E). Then we compute the temporal correlation between the filtered traces of pixel ${x}^{*}$ and all other pixels in the patch (Figure 10D). We choose those pixels with small temporal correlations (e.g. 0.3) as the neighboring pixels that are outside of the neuron (the green contour in Figure 10D). Next, we estimate the background fluctuations as the median values of these pixels for each frame in the raw data (Figure 10E). We also select pixels that are within the neuron by selecting correlation coefficients larger than $0.7$, then ${\widehat{\mathit{\bm{c}}}}_{i}$ is refined by computing the mean filtered traces of these pixels (Figure 10E). Finally, we regress the raw fluorescence signal in each pixel onto three sources: the neuron signal (Figure 10E), the local background fluctuation (Figure 10F), and a constant baseline. Our initial estimate of ${\widehat{a}}_{i}$ is given by the regression weights onto ${\widehat{c}}_{i}$ in Figure 10F.
Modifications for high temporal or spatial correlation
Request a detailed protocolThe above procedure works well in most experimental datasets as long as neurons are not highly spatially overlapped and temporally correlated. However, in a few extreme cases, this initialization may lead to bad local minima. We found that two practical modifications can lead to improved results.
High temporal correlation, low spatial overlaps
The greedy initialization procedure assumes that closeby neurons are not highly correlated. If this assumption fails, CNMFE will first merge nearby neurons into one component for explaining the shared fluctuations, and then the following initialized components will only capture the residual signals of each neuron. Our solution to this issue relies on our accurate background removal procedure, after which we simply reestimate each neural trace ${\mathit{\bm{c}}}_{i}$ as a weighted fluorescence trace of the backgroundsubtracted video $(Y{\widehat{B}}^{f}{\widehat{\mathit{\bm{b}}}}_{0}{\mathrm{\U0001d7cf}}^{T})$,
where ${\stackrel{~}{\mathit{\bm{a}}}}_{i}$ only selects pixels with large weights by thresholding the estimated ${\widehat{\mathit{\bm{a}}}}_{i}$ with $\mathrm{max}({\widehat{\mathit{\bm{a}}}}_{i})/2$ (this reduces the contributions from smaller neighboring neurons). This strategy improves the extraction of individual neurons’ traces in the high correlation scenarios and the spatial footprints can be corrected in the following step of updating $\widehat{A}$. Figure 4B and Figure 5 illustrate this procedure.
High spatial overlaps, low temporal correlation
CNMFE may initialize components with shared temporal traces because they have highly overlapping areas. We solve this problem by decorrelating their traces (following a similar approach in [Pnevmatikakis et al., 2016]). We start by assuming that neurons with high spatial overlap do not fire spikes within the same frame. If so, only the inferred spiking trace with the largest value is kept and the rest will be set to 0. Then we initialize each ${\mathit{\bm{c}}}_{i}$ given these thresholded spiking traces and the corresponding AR coefficients.
Interventions
We use iterative matrix updates to estimate model variables in CNMFE. This strategy gives us the flexibility of integrating prior information on neuron morphology and temporal activity during the model fitting. The resulting interventions (which can in principle be performed either automatically or under manual control) can in turn lead to faster convergence and more accurate source extraction. We integrate 5 interventions in our CNMFE implementation. Following these interventions, we usually run one more iteration of matrix updates.
Merge existing components
Request a detailed protocolWhen a single neuron is split mistakenly into multiple components, a merge step is necessary to rejoin these components. If we can find all split components, we can superimpose all their spatiotemporal activities and run rank1 NMF to obtain the spatial and temporal activity of the merged neuron. We automatically merge components for which the spatial and temporal components are correlated above certain thresholds. Our code also provides methods to manually specify neurons to be merged based on human judgment.
Split extracted components
Request a detailed protocolWhen highly correlated neurons are mistakenly merged into one component, we need to use spatial information to split into multiple components according to neurons’ morphology. Our current implementation of component splitting requires users to manually draw ROIs for splitting the spatial footprint of the extracted component. Automatic methods for ROI segmentation (Apthorpe et al., 2016; Pachitariu et al., 2013) could be added as an alternative in future implementations.
Remove false positives
Request a detailed protocolSome extracted components have spatial shapes that do not correspond to real neurons or temporal traces that do not correspond to neural activity. These components might explain some neural signals or background activity mistakenly. Our source extraction can benefit from the removal of these false positives. This can be done by manually examining all extracted components, or in principle automatically by training a classifier for detecting real neurons. The current implementation relies on visual inspection to exclude false positives. We also rank neurons based on their SNRs and set a cutoff to discard all extracted components that fail to meet this cutoff. As with the splitting step, removing false positives could also potentially use automated ROI detection algorithms in the future. See Video 10 for an example involving manual merge and delete operations.
Pick undetected neurons from the residual
Request a detailed protocolIf all neural signals and background are accurately estimated, the residual of the CNMFE model ${Y}_{\text{res}}=Y\widehat{A}\widehat{C}\widehat{B}$ should be relatively spatially and temporally uncorrelated. However, the initialization might miss some neurons due to large background fluctuations and/or high neuron density. After we estimate the background $\widehat{B}$ and extract a majority of the neurons, those missed neurons have prominent fluorescent signals left in the residual. To select these undetected neurons from the residual ${Y}_{\text{res}}$, we use the same algorithm as for initializing neurons from the raw video data, but typically now the task is easier because the background has been removed.
Postprocess the spatial footprints
Request a detailed protocolEach single neuron has localized spatial shapes and including this prior into the model fitting of CNMFE, as suggested in (Pnevmatikakis et al., 2016), leads to better extraction of spatial footprints. In the model fitting step, we constrain $A$ to be sparse and spatially localized. These constraints do give us compact neuron shapes in most cases, but in some cases there are still some visually abnormal components detected. We include a heuristic automated postprocessing step after each iteration of updating spatial shapes (PS). For each extracted neuron $A(:,k)$, we first convert it to a 2D image and perform morphological opening to remove isolated pixels resulting from noise (Haralick et al., 1987). Next we label all connected components in the image and create a mask to select the largest component. All pixels outside of the mask in $A(:,i)$ are set to be. This postprocessing induces compact neuron shapes by removing extra pixels and helps avoid mistakenly explaining the fluorescence signals of the other neurons.
Further algorithmic details
The simplest pipeline for running CNMFE includes the following steps:
Initialize $\widehat{A},\widehat{C}$ using the proposed initialization procedure.
Solve problem (PB) for updates of ${\widehat{\mathit{\bm{b}}}}_{0}$ and ${\widehat{B}}^{f}$.
Iteratively solve problem (PS) and (PT) to update $\widehat{A},\widehat{C}$ and ${\mathit{\bm{b}}}_{0}$.
If desired, apply interventions to intermediate results.
Repeat steps 2, 3, and 4 until the inferred components are stable.
In practice, the estimation of the background $B$ (step 2) often does not vary greatly from iteration to iteration and so this step usually can be run with fewer iterations to save time. In practice, we also use spatial and temporal decimation for improved speed, following (Friedrich et al., 2017a). We first run the pipeline on decimated data to get good initializations, then we upsample the results $\widehat{A},\widehat{C}$ to the original resolution and run one iteration of steps (23) on the raw data. This strategy improves on processing the raw data directly because downsampling increases the signaltonoise ratio and eliminates many false positives.
Step 4 provides a fast method for correcting abnormal components without redoing the whole analysis. (This is an important improvement over the PCA/ICA pipeline, where if users encounter poor estimated components it is necessary to repeat the whole analysis with new parameter values, which may not necessarily yield improved cell segmentations.) The interventions described here themselves can be independent tasks in calcium imaging analysis; with further work we expect many of these steps can be automated. In our interface for performing manual interventions, the most frequently used function is to remove false positives. Again, components can be rejected following visual inspection in PCA/ICA analysis, but the performance of CNMFE can be improved with further iterations after removing false positives, while this is not currently an option for PCA/ICA.
We have also found a twostep initialization procedure useful for detecting neurons: we first start from relatively high thresholds of ${P}_{\mathrm{min}}$ and ${L}_{\mathrm{min}}$ to initialize neurons with large activity from the raw video data; then we estimate the background components by solving problem (PB); finally we can pick undetected neurons from the residual using smaller thresholds. We can terminate the model iterations when the residual sum of squares (RSS) stabilizes (see Figure 4B), but this is seldom used in practice because computing the RSS is timeconsuming. Instead we usually automatically stop the iterations after the number of detected neurons stabilizes. If manual interventions are performed, we typically run one last iteration of updating $B,A$ and $C$ sequentially to further refine the results.
Parameter selection
Request a detailed protocolTable 2 shows 5 key parameters used in CNMFE. All of these parameters have interpretable meaning and can be easily picked within a broad range. The parameter $l$ controls the size of the spatial filter in the initialization step and is chosen as the diameter of a typical neuron in the FOV. As long as $l$ is much smaller than local background sources, the filtered data can be used for detecting seed pixels and then initializing neural traces. The distance between each seed pixel and its selected neighbors ${l}_{n}$ has to be larger than the neuron size $l$ and smaller than the spatial range of local background sources; in practice, this range is fairly broad. We usually set ${l}_{n}$ as $2l$. To determine the thresholds ${P}_{\mathrm{min}}$ and ${L}_{\mathrm{min}}$, we first compute the correlation image and PNR image and then visually select very weak neurons from these two images. ${P}_{\mathrm{min}}$ and ${L}_{\mathrm{min}}$ are determined to ensure that CNMFE is able to choose seed pixels from these weak neurons. Small ${P}_{\mathrm{min}}$ and ${L}_{\mathrm{min}}$ yield more false positive neurons, but they can be removed in the intervention step. Finally, in practice, our results are not sensitive to the selection of the outlier parameter $\zeta $, thus we frequently set it as 10.
Complexity analysis
Request a detailed protocolIn step 1, the time cost is mainly determined by spatial filtering, resulting in $O(dT)$ time. As for the initialization of a single neuron given a seed pixel, it is only ($O(T)$). Considering the fact that the number of neurons is typically much smaller than the number of pixels in this data, the complexity for step one remains $O(dT)$. In step 2, the complexity of estimating ${\widehat{\mathit{\bm{b}}}}_{0}$ is $O(dT)$ and estimating ${\widehat{B}}^{f}$ scales linearly with the number of pixels $d$. For each pixel, the computational complexity for estimating ${W}_{i,:}$ is $O(T)$. Thus, the computational complexity in updating the background component is $O(dT)$. In step 3, the computational complexities of solving problems (PS) and (PT) have been discussed in previous literature (Pnevmatikakis et al., 2016) and they scale linearly with pixel number $d$ and time $T$, that is, $O(dT)$. For the interventions, the one with the largest computational cost is picking undetected neurons from the residual, which is the same as the initialization step. Therefore, the computational cost for step 4 is $O(dT)$. To summarize, the complexity for running CNMFE is $O(dT)$, that is, the method scales linearly with both the number of pixels and the total recording time.
Implementations
Request a detailed protocolOur MATLAB implementation supports running CNMFE in three different modes that are optimized for different datasets: singlemode, patchmode and multibatchmode.
Singlemode is a naive implementation that loads data into memory and fits the model. It is fast for processing small datasets (<1 GB).
For larger datasets, many computers have insufficient RAM for loading all data into memory and storing intermediate results. Patchmode CNMFE divides the whole FOV into multiple small patches and maps data to the hard drive (Giovannucci et al., 2017b). The data within each patch are loaded only when we process that patch. This significantly reduces the memory consumption. More importantly, this mode allows running CNMFE in parallel on multicore CPUs, yielding a speedup roughly proportional to the number of available cores.
Multibatch mode builds on patchmode and is optimized for even larger datasets, especially data collected over multiple sessions/days. This mode segments data into multiple batches temporally and assumes that the neuron footprints $A$ are shared across all batches. We process each batch using patch mode and perform partial weighted updates on $A$ given the traces $C$ obtained in each batch.
All modes also include a logging system for keeping track of manual interventions and intermediate operations.
The Python implementation is similar; see Giovannucci et al., 2017b) for full details.
Running time
Request a detailed protocolTo provide a sense of the running time of the different steps of the algorithm, we timed the code on the simulation data shown in Figure 4. This dataset is $253\times 316$ pixels $\times 2000$ frames. The analyses were performed on a desktop with Intel Xeon CPU E52650 v4 @2.20 GHz and 128 GB RAM running Ubuntu 16.04. We used a parallel implementation for performing the CNMFE analysis, with patch size $64\times 64$ pixels, using up to 12 cores. PCA/ICA took $\sim 211$ seconds to converge, using 250 PCs and 220 ICs. CNMFE spent 55 s for initialization, 1 s for merging and deleting components, 110 s for the first round of the background estimation and 40 s in the following updates, 8 s for picking neurons from the residual, and 10 s per iteration for updating spatial ($A$) and temporal ($C$) components, resulting in a total of 258 s.
Finally, Table 3 shows the running time of processing the four experimental datasets.
Simulation experiments
Details of the simulated experiment of Figure 2
Request a detailed protocolThe field of view was $256\times 256$, with 1000 frames. We simulated 50 neurons whose shapes were simulated as spherical 2D Gaussian. The neuron centers were drawn uniformly from the whole FOV and the Gaussian widths ${\sigma}_{x}$ and ${\sigma}_{y}$ for each neuron was also randomly drawn from $\mathcal{N}(\frac{l}{4},{(\frac{1}{10}\frac{l}{4})}^{2})$, where $l=12$ pixels. Spikes were simulated from a Bernoulli process with probability of spiking per timebin $0.01$ and then convolved with a temporal kernel $g(t)=\mathrm{exp}(t/{\tau}_{d})\mathrm{exp}(t/{\tau}_{r})$, with fall time ${\tau}_{d}=6$ timebin and rise time ${\tau}_{r}=1$ timebin. We simulated the spatial footprints of local backgrounds as 2D Gaussian as well, but the mean Gaussian width is 5 times larger than the neurons’ widths. As for the spatial footprint of the blood vessel in Figure 2A, we simulated a cubic function and then convolved it with a 2D Gaussian (Gaussian width=$3$pixel). We use a random walk model to simulate the temporal fluctuations of local background and blood vessel. For the data used in Figure 2A–H, there were 23 local background sources; for Figure 2I, we varied the number of background sources.
We used the raw data to estimate the background in CNMFE without subtracting the neural signals $\widehat{A}\widehat{C}$ in problem (PB). We set ${l}_{n}=15$ pixels and left the remaining parameters at their default values. The plain NMF was performed using the builtin MATLAB function nnmf, which utilizes random initialization.
Details of the simulated experiment of Figure 3, Figure 4 and Figure 5
Request a detailed protocolWe used the same simulation settings for both Figure 3 and Figure 4. The field of view was $253\times 316$ and the number of frames was 2000. We simulated 200 neurons using the same method as the simulation in Figure 2, but for the background we used the spatiotemporal activity of the background extracted using CNMFE from real experimental data (data not shown). The noise level $\mathrm{\Sigma}$ was also estimated from the data. When we varied the SNR in Figure 4D–G, we multiplied $\mathrm{\Sigma}$ with an SNR reduction factor.
We set $l=12$ pixels to create the spatial filtering kernel. As for the thresholds used for determining seed pixels, we varied them for different SNR settings by visually checking the corresponding local correlation images and PNR images. The selected values were ${L}_{\mathrm{min}}=[0.9,0.8,0.8,0.8,0.6,0.6]$ and ${P}_{\mathrm{min}}=[15,10,10,8,6,6]$ for different SNR reduction factors $[1,2,3,4,5,6]$. For PCA/ICA analysis, we set the number of PCs and ICs as 600 and 300, respectively.
The simulation in Figure 5 only includes two neurons (as seen in Figure 3E) using the same simulation parameters. We replaced their temporal traces ${\mathit{\bm{c}}}_{1}$ and ${\mathit{\bm{c}}}_{2}$ with $(1\rho ){\mathit{\bm{c}}}_{1}+\rho {\mathit{\bm{c}}}_{3}$ and $(1\rho ){\mathit{\bm{c}}}_{2}+\rho {\mathit{\bm{c}}}_{3}$, where $\rho $ is tuned to generate different correlation levels ($\gamma $), and ${\mathit{\bm{c}}}_{3}$ is simulated in the same way as ${\mathit{\bm{c}}}_{1}$ and ${\mathit{\bm{c}}}_{2}$. We also added a new background source whose temporal profile is ${\mathit{\bm{c}}}_{3}$ to increase the neuronbackground correlation as $\rho $ increases. CNMFE was run as in Figure 4. We used 20 PCs and ICs for PCA/ICA.
In vivo microendoscopic imaging and data analysis
For all experimental data used in this work, we ran both CNMFE and PCA/ICA. For CNMFE, we chose parameters so that we initialized about 10–20% extra components, which were then merged or deleted (some automatically, some under manual supervision) to obtain the final estimates. Exact parameter settings are given for each dataset below. For PCA/ICA, the number of ICs were selected to be slightly larger than our extracted components in CNMFE (as we found this led to the best results for this algorithm), and the number of PCs was selected to capture over 90% of the signal variance. The weight of temporal information in spatiotemporal ICA was set as 0.1. After obtaining PCA/ICA filters, we again manually removed components that were clearly not neurons based on neuron morphology.
We computed the SNR of extracted cellular traces to quantitatively compare the performances of two approaches. For each cellular trace $\mathit{\bm{y}}$, we first computed its denoised trace $\mathit{\bm{c}}$ using the selected deconvolution algorithm (here, it is thresholded OASIS); then the SNR of $\mathit{\bm{y}}$ is
For PCA/ICA results, the calcium signal $\mathit{\bm{y}}$ of each IC is the output of its corresponding spatial filter, while for CNMFE results, it is the trace before applying temporal deconvolution, that is, ${\widehat{\mathit{\bm{y}}}}_{i}$ in Equation (9). All the data can be freely accessed online (Zhou et al., 2017).
Dorsal striatum data
Request a detailed protocolExpression of the genetically encoded calcium indicator GCaMP6f in neurons was achieved using a recombinant adenoassociated virus (AAV) encoding the GCaMP6f protein under transcriptional control of the synapsin promoter (AAVSynGCaMP6f). This viral vector was packaged (Serotype 1) and stored in undiluted aliquots at a working concentration of $>1012$ genomic copies per ml at ${80}^{\circ}$C until intracranial injection. $500\text{}\mu$l of AAV1SynGCaMP6f was injected unilaterally into dorsal striatum ($0.6$ mm anterior to Bregma, $2.2$mm lateral to Bregma, $2.5$mm ventral to the surface of the brain). 1 week postinjection, a $1$mm gradient index of refraction (GRIN) lens was implanted into dorsal striatum $\sim 300\mu $m above the center of the viral injection. Three weeks after the implantation, the GRIN lens was reversibly coupled to a miniature onephoton microscope with an integrated $475$nm LED (Inscopix). Using nVistaHD Acquisition software, images were acquired at 30 frames per second with the LED transmitting $0.1$ to $0.2$ mW of light while the mouse was freely moving in an openfield arena. Images were down sampled to $10$Hz and processed into TIFFs using Mosaic software. All experimental manipulations were performed in accordance with protocols approved by the Harvard Standing Committee on Animal Care following guidelines described in the US NIH Guide for the Care and Use of Laboratory Animals.
The parameters used in running CNMFE were: $l=13$ pixels, ${l}_{n}=18$ pixels, ${L}_{\mathrm{min}}=0.7$, and ${P}_{min}=7.728$ components were initialized from the raw data in the first pass before subtracting the background, and then additional components were initialized in a second pass. Highly correlated nearby components were merged and false positives were removed using the automated approach described above. In the end, we obtained $692$ components.
Prefrontal cortex data
Request a detailed protocolCortical neurons were targeted by administering two microinjections of 300 ul of AAVDJCamkIIaGCaMP6s (titer: 5.3 × 1012, 1:6 dilution, UNC vector core) into the prefrontal cortex (PFC) (coordinates relative to bregma; injection 1: +1.5 mm AP, 0.6 mm ML, −2.4 ml DV; injection 2: +2.15 AP, 0.43 mm ML, −2.4 mm DV) of an adult male wild type (WT) mice. Immediately following the virus injection procedure, a 1 mm diameter GRIN lens implanted 300 um above the injection site (coordinates relative to bregma: +1.87 mm AP, 0.5 mm ML, −2.1 ml DV). After sufficient time had been allowed for the virus to express and the tissue to clear underneath the lens (3 weeks), a baseplate was secured to the skull to interface the implanted GRIN lens with a miniature, integrated microscope (nVista, 473 nm excitation LED, Inscopix) and subsequently permit the visualization of Ca2 +signals from the PFC of a freely behaving mouse. The activity of PFC neurons were recorded at 15 Hz over a 10 min period (nVista HD Acquisition Software, Inscopix) while the test subject freely explored an empty novel chamber. Acquired data was spatially down sampled by a factor of 2, motion corrected, and temporally down sampled to 15 Hz (Mosaic Analysis Software, Inscopix). All procedures were approved by the University of North Carolina Institutional Animal Care and Use Committee (UNC IACUC).
The parameters used in running CNMFE were: $l=13$ pixels, ${l}_{n}=18$ pixels, ${L}_{\mathrm{min}}=0.9$, and ${P}_{\mathrm{min}}=15$. There were $169$ components initialized in the first pass and we obtained $225$ components after running the whole CNMFE pipeline.
Ventral hippocampus data
Request a detailed protocolThe calcium indicator GCaMP6f was expressed in ventral hippocampalamygdala projecting neurons by injecting a retrograde canine adeno type 2Cre virus (CAV2Cre; from Larry Zweifel, University of Washington) into the basal amydala (coordinates relative to bregma: −1.70 AP, 3.00 mm ML, and −4.25 mm DV from brain tissue at site), and a Credependent GCaMP6f adeno associated virus (AAV1flexSynapsinGCaMP6f, UPenn vector core) into ventral CA1 of the hippocampus (coordinates relative to bregma: −3.16 mm AP, 3.50 mm ML, and −3.50 mm DV from brain tissue at site). A 0.5 mm diameter GRIN lens was then implanted over the vCA1 subregion and imaging began 3 weeks after surgery to allow for sufficient viral expression. Mice were then imaged with Inscopix miniaturized microscopes and nVistaHD Acquisition software as described above; images were acquired at 15 frames per second, while mice explored an anxiogenic Elevated Plus Maze arena. Videos were motion corrected and spatially downsampled using Mosaic software. All procedures were performed in accordance with protocols approved by the New York State Psychiatric Institutional Animal Care and Use Committee following guidelines described in the US NIH Guide for the Care and Use of Laboratory Animals.
The parameters used in running CNMFE were: $l=15$ pixels, ${l}_{n}=30$ pixels, $\zeta =10$, ${L}_{\mathrm{min}}=0.9$, and ${P}_{\mathrm{min}}=15$. We first temporally downsampled the data by $2$. Then we applied CNMFE to the downsampled data. There were $53$ components initialized. After updating the background component, the algorithm detected six more neurons from the residual. We merged most of these components and deleted false positives. In the end, there were $24$ components left. The intermediate results before and after each manual intervention are shown in Video 10.
BNST data with footshock
Request a detailed protocolCalcium indicator GCaMP6s was expressed within CaMKIIexpressing neurons in the BNST by injecting the recombinant adenoassociated virus AAVdjCaMKIIGCaMP6s (packaged at UNC Vector Core) into the anterior dorsal portion of BNST (coordinates relative to bregma: 0.10 mm AP, −0.95 mm ML, −4.30 mm DV). A 0.6 mm diameter GRIN lens was implanted above the injection site within the BNST. As described above, images were acquired using a detachable miniature onephoton microscope and nVistaHD Acquisition Software (Inscopix). Images were acquired at 20 frames per second while the animal was freely moving inside a soundattenuated chamber equipped with a house light and a white noise generator (Med Associates). Unpredictable foot shocks were delivered through metal bars in the floor as an aversive stimulus during a 10 min session. Each unpredictable foot shock was 0.75 mA in intensity and 500 ms in duration on a variable interval (VI60). As described above, images were motion corrected, downsampled and processed into TIFFs using Mosaic Software. These procedures were conducted in adult C57BL/6J mice (Jackson Laboratories) and in accordance with the Guide for the Care and Use of Laboratory Animals, as adopted by the NIH, and with approval from the Institutional Animal Care and Use Committee of the University of North Carolina at Chapel Hill (UNC).
The parameters used in running CNMFE were: $l=15$ pixels, ${l}_{n}=23$ pixels, $\zeta =10$, ${L}_{\mathrm{min}}=0.9$, and ${P}_{\mathrm{min}}=15$. There were $149$ components initialized and we detected $29$ more components from the residual after estimating the background. there were $127$ components left after running the whole pipeline.
Code availability
Request a detailed protocolAll analyses were performed with customwritten MATLAB code. MATLAB implementations of the CNMFE algorithm can be freely downloaded from https://github.com/zhoupc/CNMF_E (Zhou, 2017a). We also implemented CNMFE as part of the Python package CaImAn (Giovannucci et al., 2017b), a computational analysis toolbox for largescale calcium imaging and behavioral data (https://github.com/simonsfoundation/CaImAn [Giovannucci et al., 2017a]).
The scripts for generating all figures and the experimental data in this paper can be accessed from https://github.com/zhoupc/eLife_submission (Zhou, 2017b).
Data availability

Data from: Efficient and accurate extraction of in vivo calcium signals from microendoscopic video dataAvailable at Dryad Digital Repository under a CC0 Public Domain Dedication.
References

Automatic neuron detection in calcium imaging data using convolutional networksAdvances in Neural Information Processing Systems 29:3270–3278.

Robust regression via hard thresholdingAdvances in Neural Information Processing Systems 28:721–729.

ConferenceCellular resolution calcium imaging and optogenetic excitation reveal a role for IL to NAc projection neurons in encoding of spatial information during cocaineseekingSociety for Neuroscience.

Fast local algorithms for large scale nonnegative matrix and tensor factorizationsIEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E92A:708–721.https://doi.org/10.1587/transfun.E92.A.708

Hierarchical ALS Algorithms for Nonnegative Matrix and 3D Tensor FactorizationLecture Notes in Computer Science 4666:169–176.https://doi.org/10.1007/9783540744948_22

Calcium imaging of sleepwake related neuronal activity in the dorsal ponsNature Communications 7:10763.https://doi.org/10.1038/ncomms10763

ConferenceFunction of Basal Ganglia Circuitry in MotivationSociety for Neuroscience.

Multiscale approaches for highspeed imaging and analysis of large neural populationsPLOS Computational Biology 13:e1005685.https://doi.org/10.1371/journal.pcbi.1005685

Fast online deconvolution of calcium imaging dataPLOS Computational Biology 13:e1005423–1005426.https://doi.org/10.1371/journal.pcbi.1005423

Miniaturized integration of a fluorescence microscopeNature Methods 8:871–878.https://doi.org/10.1038/nmeth.1694

ConferenceCaImAn: an open source toolbox for large scale calcium imaging data analysis on standalone machinesCosyne2017.

Image analysis using mathematical morphologyIEEE Transactions on Pattern Analysis and Machine Intelligence 9:532–550.https://doi.org/10.1109/TPAMI.1987.4767941

Calcium imaging of basal forebrain activity during innate and learned behaviorsFrontiers in Neural Circuits 10:1–12.https://doi.org/10.3389/fncir.2016.00036

ConferenceSubcortical projectionspecific control of innate anxiety and learned fear by the ventral hippocampusSociety for Neuroscience.

ConferenceIn vivo calcium imaging of hippocampal neuronal network activity associated with memory behavior deficits in the Alzheimer’s disease mouse modelSociety for Neuroscience.

ConferenceNeural sequences underlying the rapid learning of new syllables in juvenile zebra finchesSociety for Neuroscience.

ConferenceIn vivo calcium imaging to assess the role of prelimbic cortex neuronal ensembles in encoding reinstatement of palatable foodseeking in ratsSociety for Neuroscience.

ConferenceDetecting action potentials in neuronal populations with calcium imagingSociety for Neuroscience.

Extracting regions of interest from biological images with convolutional sparse block codingAdvances in Neural Information Processing Systems 26:1745–1753.

2013 Asilomar Conference on Signals, Systems and Computers349–353, Bayesian spike inference from calcium imaging data, 2013 Asilomar Conference on Signals, Systems and Computers, 10.1109/ACSSC.2013.6810293.

Identification of a motortoauditory pathway important for vocal learningNature Neuroscience 20:978–986.https://doi.org/10.1038/nn.4563

ConferenceNociceptin neurons in the bed nucleus of the stria terminalis regulate anxietySociety for Neuroscience.

Oxytocinreceptorexpressing neurons in the parabrachial nucleus regulate fluid intakeNature Neuroscience 20:1722–1733.https://doi.org/10.1038/s415930170014z

Parallel processing of visual space by neighboring neurons in mouse visual cortexNature Neuroscience 13:1144–1149.https://doi.org/10.1038/nn.2620

ConferenceAction planning and action observation in rodent parietal cortexSociety for Neuroscience.

ConferenceEncoding the relationship between anxietyrelated behaviors and nociceptin neurons of the bed nucleus of the stria terminalisSociety for Neuroscience.

Fast nonnegative deconvolution for spike train inference from population calcium imagingJournal of Neurophysiology 104:3691–3704.https://doi.org/10.1152/jn.01073.2009

Spike inference from calcium imaging using sequential Monte Carlo methodsBiophysical Journal 97:636–655.https://doi.org/10.1016/j.bpj.2008.08.005

The central amygdala controls learning in the lateral amygdalaNature Neuroscience 20:1680–1685.https://doi.org/10.1038/s4159301700099

Longterm dynamics of CA1 hippocampal place codesNature Neuroscience 16:264–266.https://doi.org/10.1038/nn.3329

Miniature microscopes for largescale imaging of neuronal activity in freely behaving rodentsCurrent Opinion in Neurobiology 32:141–147.https://doi.org/10.1016/j.conb.2015.04.001
Decision letter

David C Van EssenReviewing Editor; Washington University in St. Louis, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article " Efficient and accurate extraction of in vivo calcium signals from microendoscopic video data" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by David Van Essen as the Senior Editor and Reviewing Editor. One of the reviewers was Dr Timothy Holy (Reviewer 2); the other two have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Senior Editor has drafted this decision to help you prepare a revised submission. With regard to the issue raised with you by email as to whether the new algorithm's success is perhaps more of an accident of initialization than a reflection of good statistical design, the reviewers found your response to be thoughtful and helpful, but not entirely convincing. We therefore plan to ask the original reviewers to take another look at your revised manuscript before making a final decision.
Summary:
This is an excellent paper showing how the CNMF technique has been adapted for use with 1P calcium imaging in endoscopic data. The paper is for the most part clearly written, and the simulations backed up by the invivo imaging data clearly demonstrates that this technique is by far superior to any other currently being used, including ICA/PCA or other variations. It builds on previous work from Pnevmatikakis and colleagues, differing primarily in the statistical model used to describe the background. It will likely be essential to use this technique to obtain the highest quality data, as background fluctuations and crosstalk between neurons can severely impact the data recorded.
However, some aspects of the analysis are difficult to judge, and there are a number of concerns and recommendations raised by the reviewers.
Essential revisions:
1) It is important to discuss carefully to what degree this new algorithm's success reflects more of an accident of initialization than good statistical design. Given that (a) SVD will always produce a lower reconstruction error than NMF for the same number of components, and that (b) having a high background might allow the neural components to "insulate" themselves from the constraints of nonnegativity by offsetting them from zero, it seems possible that the model also admits solutions that are better from the standpoint of the loss function yet worse from the standpoint of biological plausibility. As a consequence, it may only be the initialization causing the algorithm to land in a local (but far from global) minimum that is leading to the kinds of plausible solutions exhibited in the manuscript; the risk is that a more exhaustive optimization algorithm, applied to the same data and model, would prefer solutions that lack this plausibility.
2) The exposition of the CNMFE algorithm in this paper could stand to be improved substantially. It is extremely hard (or, really, impossible) to make sense of the details of the proposed algorithm as presented in this paper! While the big picture of the algorithm is clear from a quick glance at Equation 8, the details are not clear from the many pages of text that follow.
3) There are concerns about how well the technique would work if the true firing correlations increase. Can the authors do a simulation where they increase pairwise correlations between the deltaF traces systematically and see at what point segmentation breaks down or crosstalk removal artificially lowers the correlations.
4) Please provide more practical advice about how to implement the software. This method is very computationally intensive, and some direction needs to be given on how to run the software to allow a large number of movies to be analyzed in a reasonable amount of time. There are no benchmarks given (from my reading) on how long the analysis takes per minute of recording and how this can be optimized.
5) Please comment on success of being able to segment the same region over several recordings over days and match up neurons across days?
6) The authors should also present what happens when they iterate their model to full convergence (e.g. square root of the machine precision) and discuss the heuristics they use to choose when to terminate the iteration early.
7) How does the quality of the neuronal reconstruction compare if you just use CNMF on the spatiallyfiltered (and 0truncated) image? If it performs similarly, it's not entirely clear that this more complex model represents much of an advance.
8) It is mentioned in subsection “in vivomicroendoscopic imaging and data analysis” that manual interventions were applied in the data analysis shown in this paper. This makes the comparisons shown in this paper unfair to competitors that are fully automated (since of course any method can be improved using manual intervention). Can the authors show results of CNMFE without manual intervention? Also, how much manual investment (time, numbers of each type of decision, etc.) is necessary, and how much the manual intervention improves the result.
9) How does CNMFE fare against CNMF when applied to twophoton data? i.e., when the additional flexibility of the model is perhaps not essential, does this extra flexibility degrade the performance in some fashion?
10) It is important to make the data and the scripts available for recreating the figures shown in this paper, for both the simulated and the real data. Otherwise it will be very hard for others to apply CNMFE and get comparable results, given the many tuning parameters and semimanual interventions involved.
11) What happens if one passes in the initialization described in subsection “Initialization of model variables” to the CNMF algorithm or to other competitors in the literature?
Reviewer #1:
This is an excellent paper showing how the CNMF technique has been adapted for use with 1P calcium imaging in endoscopic data. The paper is very clearly written, and the simulations backed up by the invivo imaging data clearly demonstrates that this technique is by far superior to any other currently being used, including ICA/PCA or other variations. In fact, it will be essential to use this technique to obtain the highest quality data as background fluctuations and crosstalk between neurons can severely impact the data recorded.
These are my concerns:
1) I have some concerns about how well the technique would work if the true firing correlations increase. Can the authors do a simulation where they increase pairwise correlations between the deltaF traces systematically and see at what point segmentation breaks down or crosstalk removal artificially lowers the correlations.
2 I felt that the authors could provide more practical advice about how to implement the software. This method is very computationally intensive, and some direction needs to be given on how to run the software to allow a large number of movies to be analyzed in a reasonable amount of time. There are no benchmarks given (from my reading) on how long the analysis takes per minute of recording and how this can be optimized.
3) Can the authors comment on success of being able to segment the same region over several recordings over days and match up neurons across days?
Reviewer #2:
The manuscript by Zhou and colleagues presents a computational method for extracting calcium signals from neurons in images that are "corrupted" by high background, with a specific interest in microendoscopic recordings. It builds on previous work from Pnevmatikakis and colleagues, differing primarily in the statistical model used to describe the background. The authors present several examples using both simulations and real experimental data to demonstrate the characteristics of their new method. In comparison with their previous method and a PCA/ICA method, the authors show examples where the new method outperforms the previous one.
The manuscript has many strengths, including the application to several different in vivo data sets and the realisticlooking simulated (with available ground truth) data sets. The model also seems thoughtfully designed, and considerable effort was made to ensure that it will be practical at least for data sets of the size of typical endoscopic recordings. The figures are mostly clear except as noted below.
Overall, this may be a worthy addition to the armamentum of methods; however, some aspects of the analysis are difficult to judge, and my overall assessment is that its importance is not yet clear. My major questions/concerns are:
1) First, this reviewer doesn't fully understand why/how the background/foreground separation works. Loosely, the model is of the form `Y ≈ b + a*c`, where `b` is the background and `a` and `c` are the spatial/temporal components of the neurons. In previous work they argued that a nonnegativity constraint on `a` and `c` led to substantial improvement over unconstrained methods and effective separation of the neurons from a (small) background. However, in this case `b` is of quite large magnitude, and one might expect this to effectively "relax" the constraint on `a` and `c`: the model could fit a smaller value for `b` in the vicinity of a neuron, allowing `a` and `c` to take on larger positive values and thus "cushioning" them against the impact of the nonnegativity constraint. In a sense, one might imagine that the fullyconverged result would not be terribly different from a (sparse, local) singular value decomposition, which is known to not properly segment cells particularly in the presence of temporal correlations among nearby/overlapping cells.
Nevertheless, the temporal traces exhibit the characteristics expected from a "meaningful" nonnegative factorization. So, the question is, why? One possible answer is that the authors are not iterating their model to convergence, and that an intermediate initialization via a "meaningful" nonnegative step still has substantial influence over the final form of the solution. If this is the case, the authors should also present what happens when they iterate their model to full convergence (e.g. square root of the machine precision) and discuss the heuristics they use to choose when to terminate the iteration early.
2) For the real datasets they initialize the neuronal factorization using a highpass spatial filtering of the raw image, then reintroduce the full image and fit the complete model. But in the end, we're mostly interested in the neurons, and we may not really care that much about an accurate model of the background. How does the quality of the neuronal reconstruction compare if you just use CNMF on the spatiallyfiltered (and 0truncated) image? If it performs similarly, it's not entirely clear that this more complex model represents much of an advance.
It's possible that there's a bit of data in the manuscript on this point (Figure 4B, black dots), but to this reviewer it's not clearly described, and this is shown only for simulated data. If you don't know the ground truth, would you care about the difference between 0.99 similarity and 0.999 similarity?
3) The authors describe some manual interventions in the methods, but there does not appear to be much in the way of description of exactly how these interventions were leveraged. It would be helpful to readers to understand what the "raw" output of the unsupervised algorithm really looks like, how much manual investment (time, numbers of each type of decision, etc.) is necessary, and how much the manual intervention improves the result.
Related to these points (particularly the first two), how does CNMFE fare against CNMF when applied to twophoton data? i.e., when the additional flexibility of the model is perhaps not essential, does this extra flexibility degrade the performance in some fashion?
Reviewer #3:
This paper introduces the CNMFE algorithm for extraction of neurons from 1photon calcium imaging data.
1) First and foremost, CNMFE has quickly become widelyadopted by scientists who are collecting microendoscopic video data. Clearly this paper is very high impact and deserves to be published in a prominent journal. I congratulate the authors for an important piece of work, which will surely have a sustained and important impact on the field.
2) I'm concerned about the semimanual "interventions" described in subsection “Interventions”. It is mentioned in subsection “in vivomicroendoscopic imaging and data analysis” that manual interventions were applied in the data analysis shown in this paper. This makes the comparisons shown in this paper unfair to competitors that are fully automated (since of course any method can be improved using manual intervention). Can the authors show results of CNMFE without manual intervention?
3) The CNMFE algorithm is just a slight tweak on CNMF, which is itself a pretty standard matrix factorization. This lack of statistical innovation is probably OK, though, since eLife is not looking to publish innovative statistical methodology, but rather an algorithm that works well and has very high impact.
4) I feel very strongly that the exposition of the CNMFE algorithm in this paper could stand to be improved substantially. It is extremely hard (or, really, impossible) to make sense of the details of the proposed algorithm as presented in this paper! While the big picture of the algorithm is clear from a quick glance at Equation 8, the details are not clear from the many pages of text that follow. When I read a statistical or computationalfocused paper, I wonder "would a statistically/computationallyliterate reader of this paper be able to reimplement this algorithm, from scratch, based on the description of this paper?" The answer to that question is definitely "no". More details about this are provided below.
5) It is really wonderful that the authors have easytouse code available for the CNMFE algorithm! However, it is also very important that they post scripts online recreating the figures shown in this paper, for both the simulated and the real data. For instance, I'd like to see a script that one can simply call into matlab that will read in the mouse dorsal striatum data and output exactly the results shown in Figure 5. (A similar request applies to the other figures). This is particularly important in light of the "semimanual interventions" mentioned in my point 2 above. Furthermore, the actual data used to make the figures (e.g. the calcium recordings for the mouse dorsal striatum area that were used to make Figure 5) should be made available.
Without the availability of the data and a script to get the results shown in the paper, I think it will be very hard for others to apply CNMFE and get comparable results, given the many tuning parameters and semimanual interventions involved. It will also be very hard for others to objectively evaluate how well CNMFE works, and how sensitive the results in this paper are to tuning parameter selection, etc.!
6) I think that most of the benefit of CNMFE over CNMF comes from the careful initialization of the algorithm described in subsection “Initialization of model variables”, rather than from the details of the optimization problem Equation 8. What happens if one passes in the initialization described in subsection “Initialization of model variables” to the CNMF algorithm or to other competitors in the literature?
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Efficient and accurate extraction of in vivo calcium signals from microendoscopic video data" for further consideration at eLife. Your revised article has been favorably evaluated by David Van Essen (Senior Editor and Reviewing Editor), and the original three reviewers.
The manuscript has been improved but there are two small remaining technical points that need to be addressed before acceptance, as outlined below. Both should be easy to address.
1) In subsection “Fitting the CNMFE model”, the optimization problem (PAll) needs to list s and B^f as optimization variables (i.e. they should appear under 'minimize'). While the reviewer agrees with the authors that "s_i is completely determined by c_i and G^{(i)}, and B^f is not optimized explicitly", nonetheless s_i and B^f are optimization variables. This is actually just a fundamental math issue that has nothing to do with the specifics of the paper at hand!
To understand this point, consider the optimization problem "minimize_{x,y} f(x,y) s.t. x=y". It would NOT be correct to instead write this as "minimize_x f(x,y) s.t. x=y", EVEN THOUGH (to quote the authors), "x is completely determined by y". (Why is "minimize_x f(x,y) s.t. x=y" incorrect? Well, unless y is listed as an optimization variable, then it is treated as a constant in the optimization problem; therefore, the solution to that problem is trivially just x=y, with minimum value attained at f(y,y).)
Thus, (PAll) is simply incorrect as written. The fix is simple: put "s_i" and "B^f" under the "minimize".
2) In a couple of places (e.g. subsections “Fitting the CNMFE model”and” Ranking seed pixels”) it is mentioned that (PAll) is not jointly convex in the optimization variables. Unfortunately, though this statement is true, it is misleading: when we say that a problem is not "jointly convex", we are implying that it is convex in the individual optimization variables. By contrast, (PAll) isn't even convex in the individual optimization variables. This is because details aren't provided of what (for instance) "is sparse" means. If that means that the l1 norm is penalized, then yes, it is convex in the individual optimization variables. But if it means some other notion of sparsity, e.g. that the l0 norm is penalized, then (PAll) isn't convex in the individual optimization variables.
Therefore, the reviewer feels that stating that (PAll) is "not jointly convex" or "jointly nonconvex" is misleading and should be removed from the text where it appears. The authors should instead just say "nonconvex" without the word "jointly".
https://doi.org/10.7554/eLife.28728.030Author response
Thank you for your letter and for the thoughtful and constructive comments, which helped us greatly. We believe the revision is a substantial improvement over the original submission.
Essential revisions:
1) It is important to discuss carefully to what degree this new algorithm's success reflects more of an accident of initialization than good statistical design. Given that (a) SVD will always produce a lower reconstruction error than NMF for the same number of components, and that (b) having a high background might allow the neural components to "insulate" themselves from the constraints of nonnegativity by offsetting them from zero, it seems possible that the model also admits solutions that are better from the standpoint of the loss function yet worse from the standpoint of biological plausibility. As a consequence, it may only be the initialization causing the algorithm to land in a local (but far from global) minimum that is leading to the kinds of plausible solutions exhibited in the manuscript; the risk is that a more exhaustive optimization algorithm, applied to the same data and model, would prefer solutions that lack this plausibility.
Thanks for these thoughtful points.
Regarding the importance of initialization, we note first that the initialization results are typically not perfect; running the iterative optimization proposed here does lead to quantitatively better solutions, as shown in simulated data in Figure 4 and the new Figure 5. In real data, we often find that the initialization picks out highSNR neurons but misses lowSNR neurons that can only be detected after further iterations. Second, the results of Figure 4 further show that if we apply a perfect initialization (i.e., initialize with the ground truth) and then apply a statistical model that fits the data poorly (e.g., CNMF with a rank1 background model) then we obtain significantly worse results than CNMFE initialized with no prior knowledge of the ground truth. We have clarified both of these important points in the revised text (see Results section). Regarding the reviewer’s interesting point about the SVD: we agree that SVD minimizes a certain approximation norm: svds(Y,k) finds the closest rankk matrix to Y in terms of the meansquare reconstruction error. However, the resulting singular vectors will typically be very poor estimators of the neuronal components, for reasons that are wellunderstood: unlike real neural components, the singular vectors will be orthogonal, not sparse, not nonnegative, not spatially localized, etc. In short, the SVD is solving the wrong optimization problem, because while it is minimizing a reasonable cost function it is optimizing over the wrong constraint space. We don’t want to search over all possible lowrank matrices to approximate the data matrix Y; instead, to obtain good estimates we need to impose constraints both on the singleneuron components (e.g. sparsity and nonnegativity of neural activity, sparsity and locality of neuronal shapes, etc.) and on the background (which is spatially smooth, as encoded by the novel “ring” background model introduced here). We have tried to impose these constraints in a tractable manner in this work.
So, to address this question of artificially increasing the size of the background: yes, by doing this one could reduce the meansquared reconstruction error but at the cost of forcing the inferred neural activity to be highly nonsparse and the background to be highly nonsmooth in the spatial domain. In other words, this solution would achieve a good objective value (like the SVD solution) but would grossly violate the constraints we have placed on the problem, and thus this solution is not favored by our optimizer – as desired.
Regarding the point about local vs global optima – it is theoretically possible there is a solution that satisfies the constraints and gives better MSE but is less neurally plausible. Due to the nonconvexity of the objective function completely exhaustive search seems intractable. But our experience as described in this manuscript (and the experience of our users) has been that solutions that find a good MSE under the objective function are “plausible.” The new Figure 4B provides additional support for this claim; see further discussion below.
2) The exposition of the CNMFE algorithm in this paper could stand to be improved substantially. It is extremely hard (or, really, impossible) to make sense of the details of the proposed algorithm as presented in this paper! While the big picture of the algorithm is clear from a quick glance at Equation 8, the details are not clear from the many pages of text that follow.
Thanks for this feedback. We have edited the text significantly to clarify a number of details that were not sufficiently clear in the original manuscript. See detailed changes discussed further below.
3) There are concerns about how well the technique would work if the true firing correlations increase. Can the authors do a simulation where they increase pairwise correlations between the deltaF traces systematically and see at what point segmentation breaks down or crosstalk removal artificially lowers the correlations.
Great suggestion. See the new Figure 5 and the new text paragraph in subsection “initialization of model variables” describing highly spatially or temporallycorrelated data.
4) Please provide more practical advice about how to implement the software. This method is very computationally intensive and some direction needs to be given on how to run the software to allow a large number of movies to be analyzed in a reasonable amount of time. There are no benchmarks given (from my reading) on how long the analysis takes per minute of recording and how this can be optimized.
Good suggestion. We added more practical advice right after the summary of the CNMFE pipeline (subsection “Initialization of model variables”). We also improved our CNMFE implementation to support large scale data analysis and to process multiple videos together. Details are included in the revised manuscript (subsection “Ranking seed pixels”). We also provided more timing information. Regarding optimization of timing, we believe further significant speedups could be achieved with more parallelization, implementation in lowlevel languages, etc., and plan to pursue these directions in the future, but we chose to focus this manuscript on the model, estimation, and demonstrations on real and simulated data, rather than these lowerlevel timing optimizations.
We are also maintaining wiki pages for CNMFE (https://github.com/zhoupc/CNMF_E/wiki) for continuously providing practical advice about how to use the software.
5) Please comment on success of being able to segment the same region over several recordings over days and match up neurons across days?
We think CNMFE should make this task significantly easier (because the output SNR is much improved compared to PCA/ICA), but we have not pursued a quantitative analysis of this point. A recent paper (Sheintuch et al., 2017) addressed exactly this issue and showed that more neurons could be reliably tracked using CNMFE vs PCA/ICA; we have added a reference to this work.
6) The authors should also present what happens when they iterate their model to full convergence (e.g., square root of the machine precision), and discuss the heuristics they use to choose when to terminate the iteration early.
We added a residual sum of squares (RSS) vs iteration plot in Figure 4. We note in the revised text that we can use the RSS as a convergence criterion, but actually computing the RSS is relatively slow; instead we monitor the estimated neural components A and C, e.g., the number of detected components.
7) How does the quality of the neuronal reconstruction compare if you just use CNMF on the spatiallyfiltered (and 0truncated) image? If it performs similarly, it's not entirely clear that this more complex model represents much of an advance.
Yes, this occurred to us as well. We tried this, and it doesn’t work well, because spatial filtering distorts the neural shapes and also removes the useful nonnegativity constraint from the optimization. We have added a note to the text making this point clear.
8) It is mentioned in subsection “in vivo microendoscopic imaging and data analysis” that manual interventions were applied in the data analysis shown in this paper. This makes the comparisons shown in this paper unfair to competitors that are fully automated (since of course any method can be improved using manual intervention). Can the authors show results of CNMFE without manual intervention? Also, how much manual investment (time, numbers of each type of decision, etc.) is necessary, and how much the manual intervention improves the result.
Fair points. We should clarify one point: it is not quite true that any method can be trivially improved using manual intervention. Specifically, PCA/ICA outputs a collection of spatial & temporal filters, after the user inputs the dataset and the number of principal components and the number of independent components. Users can then easily manually delete components; however, there is not an obvious way to merge or split neurons (e.g., after splitting a neural component it is not clear how to assign temporal traces to the resulting split components). More importantly, there is no mechanism for refining the analysis results after manual interventions (for example, to account for a gap left after deleting some bad components that were partially explaining some of the observed signal). In CNMFE, on the other hand, we can run additional iterative updates of the estimated components following manual interventions, which in practice can lead to significantly improved results. This point has been clarified in the revised text.
That said, to respond to the main point, we emphasize that the manual intervention in CNMFE is an optional choice for correcting results using human knowledge, instead of being required by processing data. For the simulated data and three out of the four real datasets in the revised manuscript (subsection “Initialization of model variables”), we now apply no manual interventions at all: the algorithm was run in fully automatic mode. For the more challenging hippocampal dataset, we show the results before and after manual intervention in Video S10.
Of course, fully automated algorithms are preferable where possible. We believe that most of the interventions discussed here (e.g., detection of “bad” components) can be cast as standard classification or computer vision tasks and can in the future be fully automated with sufficient training data; this is the topic of ongoing work. Our opensource pipeline makes it easy to swap in more automated intervention steps when and if these become available and reliable in the future. Again, these points have been clarified in the revised text.
9) How does CNMFE fare against CNMF when applied to twophoton data? i.e., when the additional flexibility of the model is perhaps not essential, does this extra flexibility degrade the performance in some fashion?
The CNMFE model also works for 2p data in which the background has simple spatiotemporal structure. This is hinted at in Figure 2I, where the CNMFE model works well for both a small and large number of background sources. Several labs already use CNMFE for processing their 2p data e.g. obtained via GRIN lenses; we hope to present these results in future work but decided against including these results here to avoid lengthening an alreadylong paper. That said, for “vanilla” 2p data where the rank1 background model suffices, the basic CNMF model is preferred in practice, because the resulting inference is faster (though not significantly more accurate) than in the CNMFE model. In the toolbox we have developed users can easily choose different initialization options and background models depending on the type of data under analysis.
10) It is important to make the data and the scripts available for recreating the figures shown in this paper, for both the simulated and the real data. Otherwise it will be very hard for others to apply CNMFE and get comparable results, given the many tuning parameters and semimanual interventions involved.
Agreed. We now provide all code and scripts for generating the figures and the videos (https://github.com/zhoupc/eLife_submission)
11) What happens if one passes in the initialization described in subsection “Initialization of model variables” to the CNMF algorithm or to other competitors in the literature?
This is an important point that we emphasize more clearly in the revised text (subsection “CNMFE accurately initializes singleneuronal spatial and temporal components”). In Figure 4 we initialize CNMF with the ground truth and find that it performs poorly, due to the poor fit of its background model here. (We obtained similar results with real instead of simulated data; in real microendoscopic data, performing CNMF iterations significantly reduces the SNR of the estimated traces, due to background contamination.)
Reviewer #1:
This is an excellent paper showing how the CNMF technique has been adapted for use with 1P calcium imaging in endoscopic data. The paper is very clearly written, and the simulations backed up by the invivo imaging data clearly demonstrates that this technique is by far superior to any other currently being used, including ICA/PCA or other variations. In fact, it will be essential to use this technique to obtain the highest quality data as background fluctuations and crosstalk between neurons can severely impact the data recorded.
Thanks very much for your kind comments.
These are my concerns:
1) I have some concerns about how well the technique would work if the true firing correlations increase. Can the authors do a simulation where they increase pairwise correlations between the deltaF traces systematically and see at what point segmentation breaks down or crosstalk removal artificially lowers the correlations.
Please see our responses to the Essential revision question (Bhatia et al., 2015).
2 I felt that the authors could provide more practical advice about how to implement the software. This method is very computationally intensive, and some direction needs to be given on how to run the software to allow a large number of movies to be analyzed in a reasonable amount of time. There are no benchmarks given (from my reading) on how long the analysis takes per minute of recording and how this can be optimized.
Please see our responses to the Essential revision question (4).
3) Can the authors comment on success of being able to segment the same region over several recordings over days and match up neurons across days?
Please see our responses to the Essential revision question (Cameron et al., 2016).
Reviewer #2:
The manuscript by Zhou and colleagues presents a computational method for extracting calcium signals from neurons in images that are "corrupted" by high background, with a specific interest in microendoscopic recordings. It builds on previous work from Pnevmatikakis and colleagues, differing primarily in the statistical model used to describe the background. The authors present several examples using both simulations and real experimental data to demonstrate the characteristics of their new method. In comparison with their previous method and a PCA/ICA method, the authors show examples where the new method outperforms the previous one.
The manuscript has many strengths, including the application to several differentin vivo data sets and the realisticlooking simulated (with available ground truth) data sets. The model also seems thoughtfully designed, and considerable effort was made to ensure that it will be practical at least for data sets of the size of typical endoscopic recordings. The figures are mostly clear except as noted below.
Overall, this may be a worthy addition to the armamentum of methods; however, some aspects of the analysis are difficult to judge, and my overall assessment is that its importance is not yet clear. My major questions/concerns are:
1) First, this reviewer doesn't fully understand why/how the background/foreground separation works. Loosely, the model is of the form `Y ≈ b + a*c`, where `b` is the background and `a` and `c` are the spatial/temporal components of the neurons. In previous work they argued that a nonnegativity constraint on `a` and `c` led to substantial improvement over unconstrained methods and effective separation of the neurons from a (small) background. However, in this case `b` is of quite large magnitude, and one might expect this to effectively "relax" the constraint on `a` and `c`: the model could fit a smaller value for `b` in the vicinity of a neuron, allowing `a` and `c` to take on larger positive values and thus "cushioning" them against the impact of the nonnegativity constraint. In a sense, one might imagine that the fullyconverged result would not be terribly different from a (sparse, local) singular value decomposition, which is known to not properly segment cells particularly in the presence of temporal correlations among nearby/overlapping cells.
Nevertheless, the temporal traces exhibit the characteristics expected from a "meaningful" nonnegative factorization. So, the question is, why? One possible answer is that the authors are not iterating their model to convergence, and that an intermediate initialization via a "meaningful" nonnegative step still has substantial influence over the final form of the solution. If this is the case, the authors should also present what happens when they iterate their model to full convergence (e.g., square root of the machine precision), and discuss the heuristics they use to choose when to terminate the iteration early.
Thanks for these very thoughtful comments.
Re: the `Y ≈ b + a*c`model – yes, but additionally recall that the background term b here is constrained to have a strong degree of spatial smoothness, so the model can’t simply add a “hole” into the new timevarying background term b to somehow help a*c avoid the nonnegativity constraint. In addition, as noted above, the model would also have to violate the sparsity constraint on c to avoid the nonnegativity constraint. We have also significantly revised the “Algorithms for solving problem (PT)” subsection in the Materials and methods section to clarify this point.
Re: iterating to convergence – our results are not dependent on early stopping. See the new panel B in Figure 4, and also our responses to Essential revision question (Carvalho Poyraz et al., 2016).
2) For the real datasets they initialize the neuronal factorization using a highpass spatial filtering of the raw image, then reintroduce the full image and fit the complete model. But in the end, we're mostly interested in the neurons, and we may not really care that much about an accurate model of the background. How does the quality of the neuronal reconstruction compare if you just use CNMF on the spatiallyfiltered (and 0truncated) image? If it performs similarly, it's not entirely clear that this more complex model represents much of an advance.
See our response to the Essential revision question (Cichocki and Phan, 2009).
To expand on this slightly: we agree that many users may not care much about the background signals, just as many users of extracellular voltage recordings only care about extracted spikes, not local field potentials. But our goal for CNMFE is to demix and assign all the signals in the data correctly, with minimal spatial or temporal distortion. In our opinion it is likely that the background signal (like the local field potential) carries a lot of useful information which has to date been largely ignored, and we want to avoid corrupting or discarding this information. Without accurate backgroundsubtraction, the extracted temporal components will often be highly contaminated, as we have emphasized in this manuscript (subsection “Model and model fitting”).
It's possible that there's a bit of data in the manuscript on this point (Figure 4B, black dots), but to this reviewer it's not clearly described, and this is shown only for simulated data. If you don't know the ground truth, would you care about the difference between 0.99 similarity and 0.999 similarity?
Fair question. In addition to our response above (emphasizing the importance of a full separation of the movie data into properlydefined and interpretable spatial, temporal, and background components), our response to Essential Revision question (Apthorpe et al., 2016) touches on this issue (on fitting a complete model compared to just using the initializations directly). See also the new Figure 5, showing a more direct example of significant improvements of running the full CNMFE pipeline (not just the initialization).
3) The authors describe some manual interventions in the methods, but there does not appear to be much in the way of description of exactly how these interventions were leveraged. It would be helpful to readers to understand what the "raw" output of the unsupervised algorithm really looks like, how much manual investment (time, numbers of each type of decision, etc.) is necessary, and how much the manual intervention improves the result.
Please see our respondents to the Essential revision question (8).
Related to these points (particularly the first two), how does CNMFE fare against CNMF when applied to twophoton data? i.e., when the additional flexibility of the model is perhaps not essential, does this extra flexibility degrade the performance in some fashion?
Please see our respondents to the Essential revision question (9).
Reviewer #3:
This paper introduces the CNMFE algorithm for extraction of neurons from 1photon calcium imaging data.
1) First and foremost, CNMFE has quickly become widelyadopted by scientists who are collecting microendoscopic video data. Clearly this paper is very high impact and deserves to be published in a prominent journal. I congratulate the authors for an important piece of work, which will surely have a sustained and important impact on the field.
Thanks for this kind comment.
2) I'm concerned about the semimanual "interventions" described in subsection “Interventions”. It is mentioned in subsection “in vivo microendoscopic imaging and data analysis” that manual interventions were applied in the data analysis shown in this paper. This makes the comparisons shown in this paper unfair to competitors that are fully automated (since of course any method can be improved using manual intervention). Can the authors show results of CNMFE without manual intervention?
Please see our responses to the Essential revision question (8).
3) The CNMFE algorithm is just a slight tweak on CNMF, which is itself a pretty standard matrix factorization. This lack of statistical innovation is probably OK, though, since eLife is not looking to publish innovative statistical methodology, but rather an algorithm that works well and has very high impact.
We would argue against the comment that there is no statistical innovation here. Much careful, effective, cuttingedge statistical treatment of neural data might be dismissed as "slight tweaks," in this sense. But statistical methods cannot be judged solely from an abstract perspective, and as soon as methods are brought to bear on complicated problems, the "tweaks" constitute a major portion of the research effort. It is a serious statistical challenge to cleanly demix data in which the background artifacts are an order of magnitude larger than the desired singleneuronal signals – and as we have argued here, both PCA/ICA and vanilla CNMF do not do an adequate job of removing this background contamination. (To put it another way, we experimented with many “tweaks” of CNMF that did not work well; this was a highly nontrivial development process.) But that said, we agree that in the end the main differences between CNMF and CNMFE are a different background model and a different initialization procedure; these two innovations led to an algorithm that works well and that we hope will have a significant positive impact on the field.
4) I feel very strongly that the exposition of the CNMFE algorithm in this paper could stand to be improved substantially. It is extremely hard (or, really, impossible) to make sense of the details of the proposed algorithm as presented in this paper! While the big picture of the algorithm is clear from a quick glance at Equation 8, the details are not clear from the many pages of text that follow. When I read a statistical or computationalfocused paper, I wonder "would a statistically/computationallyliterate reader of this paper be able to reimplement this algorithm, from scratch, based on the description of this paper?" The answer to that question is definitely "no". More details about this are provided below.
Please see our responses to the Essential revision question (Barbera et al., 2016).
5) It is really wonderful that the authors have easytouse code available for the CNMFE algorithm! However, it is also very important that they post scripts online recreating the figures shown in this paper, for both the simulated and the real data. For instance, I'd like to see a script that one can simply call into matlab that will read in the mouse dorsal striatum data and output exactly the results shown in Figure 5. (A similar request applies to the other figures). This is particularly important in light of the "semimanual interventions" mentioned in my point 2 above. Furthermore, the actual data used to make the figures (e.g. the calcium recordings for the mouse dorsal striatum area that were used to make Figure 5) should be made available.
Without the availability of the data and a script to get the results shown in the paper, I think it will be very hard for others to apply CNMFE and get comparable results, given the many tuning parameters and semimanual interventions involved. It will also be very hard for others to objectively evaluate how well CNMFE works, and how sensitive the results in this paper are to tuning parameter selection, etc.!
Please see our responses to the Essential revision question (10).
6) I think that most of the benefit of CNMFE over CNMF comes from the careful initialization of the algorithm described in subsection “Initialization of model variables”, rather than from the details of the optimization problem Equation 8. What happens if one passes in the initialization described in subsection “Initialization of model variables” to the CNMF algorithm or to other competitors in the literature?
Please see our responses to the Essential revision question (Apthorpe et al., 2016) for explaining the limitations of the initialization step.
To our knowledge, the background model in CNMFE is the first one that accurately models the background components in this type of data; this new background model (along with our new initialization approach) is the main innovation of our work.
As for the suggestion of passing the initialization to other competitors, we did so using simulated data, where we passed in the ground truth spatial and temporal components to the vanilla CNMF algorithm (Figure 4). The results are much worse than CNMFE. This is a clear evidence that the low rank NMF background model used in vanilla CNMF is not enough for modeling the background components in microendoscopic data. We would expect similar results with related methods that use a similar lowrank matrix factorization model (e.g. DiegoAndilla and Hamprecht, 2013, or Maruyama et al., 2014). We also tried the similar model in Suite2p (Pachitariu et.al., 2016), which projects the background onto a prespecified spatial basis, and again found that this led to worse results in recovering the background signal in the datasets examined here.
PCA/ICA does not offer an easy method for incorporating warm initialization starts, as discussed further above.
[Editors' note: further revisions were requested prior to acceptance, as described below.]
1) In subsection “Fitting the CNMFE model”, the optimization problem (PAll) needs to list s and B^f as optimization variables (i.e. they should appear under 'minimize'). While the reviewer agrees with the authors that "s_i is completely determined by c_i and G^{(i)}, and B^f is not optimized explicitly", nonetheless s_i and B^f are optimization variables. This is actually just a fundamental math issue that has nothing to do with the specifics of the paper at hand!
To understand this point, consider the optimization problem "minimize_{x,y} f(x,y) s.t. x=y". It would NOT be correct to instead write this as "minimize_x f(x,y) s.t. x=y", EVEN THOUGH (to quote the authors), "x is completely determined by y". (Why is "minimize_x f(x,y) s.t. x=y" incorrect? Well, unless y is listed as an optimization variable, then it is treated as a constant in the optimization problem; therefore, the solution to that problem is trivially just x=y, with minimum value attained at f(y,y).)
Thus, (PAll) is simply incorrect as written. The fix is simple: put "s_i" and "B^f" under the "minimize".
Great suggestion. We made three changes:
a) in (PAll), we put 'S' and 'B^f' under the 'minimize'
b) in (PT), we put 'S' under the minimize
c) in (PB), we put 'B^f' under the minimize.
2) In a couple of places (e.g. subsections “Fitting the CNMFE model”and “Ranking seed pixels”) it is mentioned that (PAll) is not jointly convex in the optimization variables. Unfortunately, though this statement is true, it is misleading: when we say that a problem is not "jointly convex", we are implying that it is convex in the individual optimization variables. By contrast, (PAll) isn't even convex in the individual optimization variables. This is because details aren't provided of what (for instance) "is sparse" means. If that means that the l1 norm is penalized, then yes, it is convex in the individual optimization variables. But if it means some other notion of sparsity, e.g. that the l0 norm is penalized, then (PAll) isn't convex in the individual optimization variables.
Therefore, the reviewer feels that stating that (PAll) is "not jointly convex" or "jointly nonconvex" is misleading and should be removed from the text where it appears. The authors should instead just say "nonconvex" without the word "jointly".
Great suggestion. We removed the word ‘jointly’ in all places.
https://doi.org/10.7554/eLife.28728.031Article and author information
Author details
Funding
National Institute of Mental Health
 Pengcheng Zhou
 Jessica C Jimenez
 Rene Hen
 Mazen A Kheirbek
 Robert E Kass
National Institute on Drug Abuse
 Pengcheng Zhou
 Jose RodriguezRomaguera
 Garret D Stuber
Intelligence Advanced Research Projects Activity
 Pengcheng Zhou
 Liam Paninski
Defense Advanced Research Projects Agency
 Liam Paninski
Army Research Office
 Liam Paninski
National Institute of Biomedical Imaging and Bioengineering
 Liam Paninski
Eunice Kennedy Shriver National Institute of Child Health and Human Development
 Shanna L Resendez
 Garret D Stuber
Howard Hughes Medical Institute
 Jessica C Jimenez
National Institute on Aging
 Jessica C Jimenez
 Rene Hen
New York State Stem Cell Science
 Jessica C Jimenez
 Rene Hen
Hope for Depression Research Foundation
 Jessica C Jimenez
 Rene Hen
Canadian Institutes of Health Research
 Shay Q Neufeld
Simons Foundation
 Andrea Giovannucci
 Johannes Friedrich
 Eftychios A Pnevmatikakis
 Garret D Stuber
 Liam Paninski
International Mental Health Research Organization
 Mazen A Kheirbek
National Institute of Neurological Disorders and Stroke
 Bernardo L Sabatini
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We would like to thank CNMFE users who received early access to our package and provided tremendously helpful feedback and suggestions, especially James Hyde, Jesse Wood, and Sean Piantadosi in Susanne Ahmari’s lab in University of Pittsburgh, Andreas Klaus in Rui Costa’s Lab in the Champalimaud Neurobiology of Action Laboratory, Suoqin Jin in Xiangmin Xu’s lab at University of California  Irvine, Conor Heins at the National Institute of Drug Abuse, Chris Donahue in Anatol Kreitzer’s lab at University of California  San Francisco, Xian Zhang in Bo Li’s lab at Cold Spring Harbor Laboratory, Emily Mackevicius in Michale Fee’s lab at Massachusetts Institute of Technology, Courtney Cameron and Malavika Murugan in Ilana Witten’s lab at Princeton University, Pranav Mamidanna in Jonathan Whitlock’s lab at Norwegian University of Science and Technology, and Milekovic Tomislav in Gregoire Courtine’s group at EPFL. We also thank Andreas Klaus for valuable comments on the manuscript.
Ethics
Animal experimentation: These procedures were conducted in accordance with the Guide for the Care and Use of Laboratory Animals, as adopted by the NIH, and with approval from the Harvard Standing Committee on Animal Care (protocol number: IS00000571 ), or the University of North Carolina Institutional Animal Care and Use Committee (UNC IACUC, protocol number: 16075.0), or the New York State Psychiatric Institutional Animal Care and Use Committee (protocol number: NYSPI1412 ).
Reviewing Editor
 David C Van Essen, Washington University in St. Louis, United States
Publication history
 Received: May 19, 2017
 Accepted: February 20, 2018
 Accepted Manuscript published: February 22, 2018 (version 1)
 Version of Record published: March 27, 2018 (version 2)
Copyright
© 2018, Zhou et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 21,944
 Page views

 3,077
 Downloads

 171
 Citations
Article citation count generated by polling the highest count across the following sources: Scopus, Crossref, PubMed Central.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Neuroscience
Gustatory sensory neurons detect caloric and harmful compounds in potential food and convey this information to the brain to inform feeding decisions. To examine the signals that gustatory neurons transmit and receive, we reconstructed gustatory axons and their synaptic sites in the adult Drosophila melanogaster brain, utilizing a wholebrain electron microscopy volume. We reconstructed 87 gustatory projections from the proboscis labellum in the right hemisphere and 57 from the left, representing the majority of labellar gustatory axons. Gustatory neurons contain a nearly equal number of interspersed preand postsynaptic sites, with extensive synaptic connectivity among gustatory axons. Morphology and connectivitybased clustering revealed six distinct groups, likely representing neurons recognizing different taste modalities. The vast majority of synaptic connections are between neurons of the same group. This study resolves the anatomy of labellar gustatory projections, reveals that gustatory projections are segregated based on taste modality, and uncovers synaptic connections that may alter the transmission of gustatory signals.

 Neuroscience
Categorization of everyday objects requires that humans form representations of shape that are tolerant to variations among exemplars. Yet, how such invariant shape representations develop remains poorly understood. By comparing human infants (6–12 months; N=82) to computational models of vision using comparable procedures, we shed light on the origins and mechanisms underlying object perception. Following habituation to a neverbeforeseen object, infants classified other novel objects across variations in their component parts. Comparisons to several computational models of vision, including models of highlevel and lowlevel vision, revealed that infants’ performance was best described by a model of shape based on the skeletal structure. Interestingly, infants outperformed a range of artificial neural network models, selected for their massive object experience and biological plausibility, under the same conditions. Altogether, these findings suggest that robust representations of shape can be formed with little language or object experience by relying on the perceptually invariant skeletal structure.