Deep neural networks to register and annotate cells in moving and deforming nervous systems
eLife Assessment
Whole-brain imaging of neuronal activity in freely behaving animals holds great promise for neuroscience, but numerous technical challenges limit its use. In this important study, the authors describe a new set of deep learning-based tools to track and identify the activity of head neurons in freely moving nematodes (C. elegans) and jellyfish (Clytia hemisphaerica). While the tools convincingly enable high tracking speed and accuracy in the settings in which the authors have evaluated them, the claim that these tools should be easily generalizable to a wide variety of datasets is incompletely supported.
https://doi.org/10.7554/eLife.108159.2.sa0
Important: Findings that have theoretical or practical implications beyond a single subfield
Convincing: Appropriate and validated methodology in line with current state-of-the-art
Abstract
Aligning and annotating the heterogeneous cell types that make up complex cellular tissues remains a major challenge in the analysis of biomedical imaging data. Here, we present a series of deep neural networks that allow for automatic non-rigid registration and cell identification, developed in the context of freely moving and deforming invertebrate nervous systems. A semi-supervised learning approach was used to train a Caenorhabditis elegans registration network (BrainAlignNet) that aligns pairs of images of the bending C. elegans head with single-pixel-level accuracy. When incorporated into an image analysis pipeline, this network can link neurons over time with 99.6% accuracy. This network could also be readily purposed to align neurons from the jellyfish Clytia hemisphaerica, an organism with a vastly different body plan and set of movements. A separate network (AutoCellLabeler) was trained to annotate >100 neuronal cell types in the C. elegans head based on multi-spectral fluorescence of genetic markers. This network labels >100 different cell types per animal with 98% accuracy, exceeding individual human labeler performance by aggregating knowledge across manually labeled datasets. Finally, we trained a third network (CellDiscoveryNet) to perform unsupervised discovery of >100 cell types in the C. elegans nervous system: by comparing multi-spectral imaging data from many animals, it can automatically identify and annotate cell types without using any human labels. The performance of CellDiscoveryNet matched that of trained human labelers. These tools should be immediately useful for a wide range of biological applications and should be straightforward to generalize to many other contexts requiring alignment and annotation of dense heterogeneous cell types in complex tissues.
Introduction
Optical imaging of dense cellular tissues is widespread in biomedical research. Recently developed methods to label cells with highly multiplexed fluorescent probes should soon make it feasible to determine the heterogeneous cell types in any given sample (Alon et al., 2021; Chen et al., 2015; Ke et al., 2013). However, it remains challenging to extract critical information about cell identity and position from fluorescent imaging data. Aligning images within or across animals that have non-rigid deformations can be inefficient and lack cellular-level accuracy. Additionally, annotating cell types in a given sample can involve time-consuming manual labeling and often only results in coarse labeling of the main cell classes, rather than full annotation of the vast number of defined cellular subtypes.
Deep neural networks provide a promising avenue for aligning and annotating complex images of fluorescently labeled cells with high levels of efficiency and accuracy (Moen et al., 2019). Deep learning has generated high-performance tools for related problems, such as the general task of segmenting cells from background in images (Stirling et al., 2021; Kirillov et al., 2023). In addition, deep learning approaches have proven useful for non-rigid image registration in the context of medical image alignment (Zou et al., 2022) and for anatomical atlases (Brezovec et al., 2024). However, this has not been as widely applied to align images with single-cell resolution, which requires single-micron-level accuracy. Automated cell annotation using clustering of single-cell RNA sequencing data has been widely adopted (Zidane et al., 2023), but annotation of imaging data is more complex. Recent studies have shown the feasibility of using deep learning applied on image features (Geuenich et al., 2021) or raw imaging data to label major cell classes (Zidane et al., 2023; Amitay et al., 2023; Brbić et al., 2022). However, these methods are still not sufficiently advanced to label the potentially hundreds of cellular subtypes in images of complex tissues. In addition, fully unsupervised discovery of the many distinct cell types in cellular imaging data remains an unsolved challenge.
There is considerable interest in using these methods to automatically align and annotate cells in the nervous system of Caenorhabditis elegans, which consists of 302 uniquely identifiable neurons (White et al., 1986; Witvliet et al., 2021; Cook et al., 2019). The optical transparency of the animal enables in vivo imaging of fluorescent indicators of neural activity at brain-wide scale (Prevedel et al., 2014; Schrödel et al., 2013). Advances in closed-loop tracking made this imaging feasible in freely moving animals (Nguyen et al., 2016; Venkatachalam et al., 2016). These approaches are being used to map the relationship between brain-wide activity and flexible behavior (reviewed in Flavell and Gordus, 2022; Kramer and Flavell, 2024). However, analysis is still impeded by how the animal bends and warps its head as it moves, resulting in non-rigid deformations of the densely packed cells in its nervous system. Fully automating the alignment and annotation of cells in C. elegans imaging data would facilitate high-throughput and high-SNR brain-wide calcium imaging. These methods would also aid studies of reporter gene expression, developmental trajectories, and more. Moreover, similar challenges exist in a variety of other organisms whose nervous systems undergo deformations, like hydra, jellyfish, Drosophila larvae, and others.
Previous studies have described methods to align and annotate cells in multicellular imaging datasets from C. elegans and hydra. Datasets from freely moving animals pose an especially challenging case. Methods for aligning cells across timepoints in moving datasets include approaches that link neurons across adjacent timepoints (Lagache et al., 2021; Hanson et al., 2024; Wen et al., 2021), as well as approaches that use signal demixing (Nejatbakhsh et al., 2020), alignment of body position markers using anatomical constraints (Christensen et al., 2015; Ardiel et al., 2022), or registration/clustering/matching based on features of the neurons, such as their centroid positions (Ryu et al., 2024; Nguyen et al., 2017; Yu et al., 2021; Wu et al., 2022; Nejatbakhsh and Varol, 2021; Deng et al., 2024; Fieseler et al., 2025). Targeted data augmentation combined with deep learning applied to raw images has recently been used to reduce manual labeling time during cell alignment (Park et al., 2024). Deep learning applied to raw images has also been used to identify specific image features, like multicellular structures in C. elegans (Wang et al., 2019). We have previously applied non-rigid registration to full fluorescent images from brain-wide calcium imaging datasets to perform neuron alignment, but performing this complex image alignment via gradient descent is very slow, taking multiple days to process a single animal’s data even on a computing cluster (Atanas et al., 2023). In summary, all these current methods for neuron alignment are constrained by a tradeoff between alignment accuracy and processing time, either due to manually labeling subsets of neurons or computing the complex alignments required to yield >95% alignment accuracy.
For C. elegans neuron class annotation, ground-truth measurements of neurons’ locations in the head have allowed researchers to develop atlases describing the statistical likelihood of finding a given neuron in a given location (Toyoshima et al., 2020; Bubnis et al., 2019; Chaudhary et al., 2021; Skuhersky et al., 2022; Yemini et al., 2021; Varol et al., 2020; Sprague et al., 2024). Some of these atlases used the NeuroPAL transgene in which four fluorescent proteins are expressed in genetically defined sets of cells, allowing users to manually determine their identity based on multi-spectral fluorescence and neuron position (Yemini et al., 2021; Varol et al., 2020; Sprague et al., 2024). However, this manual labeling is time-consuming (hours per dataset), and statistical approaches to automate neuron annotation based on manual labeling have still not achieved human-level performance (>95% accuracy).
Here, we describe deep neural networks that solve these alignment and annotation tasks. First, we trained a neural network (BrainAlignNet) that can perform non-rigid registration to align images of the worm’s head from different timepoints in freely moving data. It is >600 fold faster than our previous gradient descent-based approach using elastix (Atanas et al., 2023) and aligns neurons with 99.6% accuracy. Moreover, we show that this same network can be easily purposed to align cells recorded from the jellyfish C. hemisphaerica, an organism with a vastly different body plan and pattern of movement. Second, we trained a neural network (AutoCellLabeler) that annotates the identity of each C. elegans head neuron based on multi-spectral NeuroPAL labels, which achieves 98% accuracy. Finally, we trained a different network (CellDiscoveryNet) that can perform unsupervised discovery and labeling of >100 cell types of the C. elegans nervous system with 93% accuracy by comparing unlabeled NeuroPAL images across animals. Overall, our results reveal how to train neural networks to automatically identify and annotate cells in complex cellular imaging data with high performance. These tools will be immediately useful for many biological applications and can be easily extended to solve related image analysis problems in other systems.
Results
BrainAlignNet: a neural network that registers cells in the deforming head of freely moving C. elegans
When analyzing neuronal calcium imaging data, it is essential to accurately link neurons’ identities over time to construct reliable calcium traces. This task is challenging in freely moving animals where animal movement warps the nervous system on sub-second timescales. Therefore, we sought to develop a fast and accurate method to perform non-rigid image registration that can deal with these warping images. Previous studies have described such methods for non-rigid registration of point clouds (e.g. neuron centroid positions; Nguyen et al., 2017; Yu et al., 2021; Wu et al., 2022; Ma et al., 2016), but, as we describe below, we found that performing full image alignment allows for higher accuracy neuron alignment.
To solve this task, we used a previously described network architecture (Fu et al., 2020; Hu et al., 2018) that takes as input a pair of 3-D images (i.e. volumes of fluorescent imaging data of the head of the worm) from different timepoints of the same neural recording (Figure 1A). The network is tasked with determining how to warp one 3-D image (termed the ‘moving image’) so that it resembles the other 3-D image (termed the ‘fixed image’). Specifically, the network outputs a dense displacement field (DDF), a pixel-wise coordinate transformation function designed to indicate which points in the moving and fixed images are the same (see Materials and methods). The moving image is then transformed through this DDF to create a warped moving image, which should look like the fixed image. This network was selected because its LocalNet architecture (a modified 3-D U-Net) allows it to do the feature extraction and image reconstruction necessary to solve the task. To train and evaluate the network, we used data from freely moving animals expressing both pan-neuronal NLS-GCaMP and NLS-tagRFP, but only provided the tagRFP images to the network, since this fluorophore’s brightness should remain static over time. Since Euler registration of images (rotation and translation) is simple, we performed Euler registration on the images using a GPU-accelerated grid search prior to inputting them into the network. During training, we also provided the network with the locations of the centroids of matched neurons found in both images, which were available for these training and validation data since we had previously used gradient descent to solve those registration problems (‘registration problem’ here is defined as a single image pair that needs to be aligned) and link neurons’ identities (Atanas et al., 2023). The centroid locations are only used for network training and are not required for the network to solve registration problems after training. The loss function that the network was tasked with minimizing had three components: (1) image loss: negative of the Local squared zero-Normalized Cross-Correlation (LNCC) of the fixed and warped moving RFP images, which takes on a higher value when the images are more similar (hence making the image loss more negative); (2) centroid alignment loss: the average of the Euclidean distances between the matched centroid pairs, where lower values indicate better alignment; and (3) regularization loss: a term that increases the overall loss the more that the images are deformed in a non-rigid manner (in particular, penalizing image scaling and scrambling of adjacent pixels; see Materials and methods).
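To make this three-part objective concrete, the sketch below combines the three terms in simplified form. It is a minimal illustration, not the training code: a global zero-normalized cross-correlation stands in for the windowed LNCC, a displacement-gradient penalty stands in for the scaling/pixel-scrambling regularizer, and the relative weights are arbitrary placeholders.

```python
# Minimal sketch of a three-part registration loss (NumPy). Assumptions: a
# global zero-normalized cross-correlation replaces the windowed LNCC, a
# displacement-gradient penalty replaces the scaling/scrambling regularizer,
# and the relative weights are illustrative.
import numpy as np

def ncc(a, b, eps=1e-8):
    """Zero-normalized cross-correlation between two images (1 = identical)."""
    a = (a - a.mean()) / (a.std() + eps)
    b = (b - b.mean()) / (b.std() + eps)
    return float(np.mean(a * b))

def registration_loss(fixed_img, warped_moving_img, cents_a, cents_b, ddf,
                      w_img=1.0, w_cent=0.1, w_reg=0.01):
    # (1) Image loss: more similar images -> higher (squared) NCC -> more
    #     negative loss term.
    image_loss = -ncc(fixed_img, warped_moving_img) ** 2

    # (2) Centroid alignment loss: mean Euclidean distance between matched
    #     centroid pairs expressed in a common coordinate frame (one set
    #     having been mapped through the DDF). Rows are pairs; columns x, y, z.
    centroid_loss = np.mean(np.linalg.norm(cents_a - cents_b, axis=1))

    # (3) Regularization: penalize sharp spatial variation in the displacement
    #     field, discouraging unrealistic, pixel-scrambling deformations.
    grads = np.gradient(ddf, axis=(0, 1, 2))      # ddf shape: (x, y, z, 3)
    reg_loss = np.mean([np.mean(g ** 2) for g in grads])

    return w_img * image_loss + w_cent * centroid_loss + w_reg * reg_loss
```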
BrainAlignNet can perform non-rigid registration to align the neurons in the C. elegans head.
(A) Network training pipeline. The network takes in a pair of images and a pair of centroid position lists corresponding to the images at two different time points (fixed and moving). (In the LocalNet diagram, this is represented as ‘IN’. Intermediate cuboids represent intermediate representations of the images at various stages of network processing. In reality, the cuboids are four-dimensional, but we represent them with three dimensions (up/down is x, left/right is y, in/out is channel, and we omit z) for visualization purposes. Spaces and arrows between cuboids represent network blocks, layers, and information flow. See Materials and methods for a detailed description of network architectures.) Image pairs were selected based on the similarity of worm postures (see Materials and methods). The fixed and moving images were pre-registered using an Euler transformation, translating and rotating the moving images to maximize their cross-correlation with the fixed images. The fixed and moving neuron centroid positions were obtained by computing the centers of the same neurons in both the fixed and moving images as a list of (x, y, z) coordinates. This information was available since we had previously extracted calcium traces from these videos using a previous, slow version of our image analysis pipeline. The network outputs a Dense Displacement Field (DDF), a 4-D tensor that indicates a coordinate transformation from fixed image coordinates to moving image coordinates. The DDF is then used to transform the moving images and fixed centroids to resemble the fixed images and moving centroids. During training, the network is tasked with learning a DDF that transforms the centroids and images in a way that minimizes the centroid alignment and image loss, as well as the regularization loss (see Materials and methods). Note that, after training, only images (not centroids) need to be input into the network to align the images. (B) Network loss curves. The training and validation loss curves show that validation performance plateaued around 300 epochs of training. (C) Example of registration outcomes on neuronal ROI images. The network-learned DDF warps the neurons in the moving image (‘moving ROIs’). The warped-moving ROIs are meant to be closer to the fixed ROIs. Each neuron is uniquely colored in the ROI images to represent its identity. The centroids of these neurons are represented by the white dots. Here, we take a z-slice of the 3-D fixed and moving ROI blocks on the x-y plane to show that the DDF can warp the x and y coordinates of the moving centroids to align with the x and y coordinates of the fixed centroids with one-pixel precision. (D) Example of registration outcomes on tagRFP images. We show the indicated image blocks as Maximal Intensity Projections (MIPs) along the z-axis, overlaying the fixed image (orange) with different versions of the moving image (blue). While the fixed image remains untransformed, the uninitialized moving image (left) gets warped by an Euler transformation (middle) and a network-learned DDF (right) to overlap with the fixed image. (E) Registration outcomes shown on example tagRFP and ROI images for four different trained networks. We randomly selected one registration problem from one of the testing datasets and tasked the trained networks with creating a DDF to warp the moving (RFP) image and moving ROI onto the fixed (RFP) image and fixed ROI. The full network with full loss function aligns neurons in both RFP and ROI images almost perfectly. 
For the networks trained without the centroid alignment loss, regularization loss, or image loss—while keeping the rest of the training configurations identical—the resulting DDF is unable to fully align the neurons and displays unrealistic deformation (closely inspect the warped moving ROI images). (F) Evaluation of registration performance on testing datasets before network registration and after registration with four different networks. ‘pre-align’ shows alignment statistics on images after Euler alignment, but before neural network registration. Two performance metrics are shown. Normalized cross-correlation (NCC, top) quantifies alignment of the fixed and warped moving RFP images, where a score of one indicates perfect alignment. Centroid distance (bottom) is measured as the mean Euclidean distance between the centroids of all neurons in the fixed ROI and the centroids of their corresponding neurons in the warped moving ROI; a distance of 0 indicates perfect alignment. All violin plots are accompanied by lines indicating the minimum, mean, and maximum values. **p<0.01, ***p<0.001, ****p<0.0001, distributions of registration metrics (NCC and centroid distance) were compared pairwise across all four versions of the network with the two-tailed Wilcoxon signed rank test only on problems that register frames from unique timepoints. For all networks (‘pre-align’, ‘full’, ‘no-centroid’, ‘no-regul.’, ‘no-image’), we evaluated performance on the same registration problems drawn from five animals in the testing set (n=447 total registration problems from those five animals). For NCC, W=35,281, 64,854, 78,754 for ‘no-centroid’, ‘no-regul.’, ‘no-image’ vs ‘full’, respectively; for centroid distance, W=12,168, 12,634, 13,345 for ‘no-centroid’, ‘no-regul.’, ‘no-image’ vs. ‘full’, respectively. (G) Example image of the head of an animal from a strain that expresses both pan-neuronal NLS-tagRFP and eat-4::NLS-GFP. The neurons expressing both NLS-tagRFP and eat-4::NLS-GFP are a subset of all the neurons expressing pan-neuronal NLS-tagRFP. (H) A comparison of the registration qualities of the four trained registration networks: full network, no-centroid alignment loss, no-regularization loss, no-image loss. Each network was evaluated on four datasets in which both pan-neuronal NLS-tagRFP and eat-4::NLS-GFP are expressed, examining 3927 registration problems per dataset. For a total of 15,708 registration problems, each network was tasked with registering the tagRFP images. The resulting DDFs from the tagRFP registrations were also used to register the eat-4::GFP images. For each channel in each problem, we determined which of the four networks had the highest performance (i.e. highest NCC). Note that the no-centroid alignment network performs the best in the RFP channel, but not in the GFP channel. Instead, the full network performs the best in the GFP channel. This suggests that the network without the centroid alignment loss deforms RFP images in a manner that does not accurately move the neurons to their correct locations (i.e. scrambles the pixels).
We trained and validated the network on 5176 and 1466 image pairs (from 57 and 22 animals), respectively, over 300 epochs (Figure 1B). We then evaluated network performance on a separate set of 447 image pairs reserved for testing that were recorded from different animals. On average, the network improved the Normalized Cross-Correlation (NCC; an overall metric of image similarity, where 1.0 means images are identical; see Materials and methods for equation) from 0.577 in the input image pairs to 0.947 in the registered image pairs (Figure 1C shows an example of centroid positions; Figure 1D shows an example image; Figure 1E shows both; Figure 1F shows quantification of pre-aligned and registered images). After alignment, the average distance between aligned centroids was 1.45 pixels (Figure 1F). These results were only modestly different depending on the animal or the exact registration problem being solved (Figure 1—figure supplement 1A–C).
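For reference, a standard formulation of the NCC between a fixed image $F$ and a warped moving image $M$, each with $N$ pixels, is shown below; the exact evaluation equation used in the paper is given in Materials and methods.

$$\mathrm{NCC}(F, M) = \frac{1}{N}\sum_{i=1}^{N}\frac{\left(F_i - \bar{F}\right)\left(M_i - \bar{M}\right)}{\sigma_F\,\sigma_M},$$

where $\bar{F}$ and $\bar{M}$ are the image means, $\sigma_F$ and $\sigma_M$ the standard deviations, and identical images give $\mathrm{NCC} = 1$.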
To determine which features of the network were critical for its performance, we trained three additional networks, using loss functions where we omitted either the centroid alignment loss, the regularization loss, or the image loss. In the first case, the network would not be able to learn based on whether the neuron centroids were well-aligned; in the second case, there would be no constraints on the network performing any type of deformation to solve the task; in the third case, the deformations that the network learned to apply could only be learned from the alignment of the centroids, not the raw tagRFP images. Registration performance of each network was evaluated using the NCC and centroid distance, which quantify the quality of tagRFP image alignment and centroid alignment, respectively (Figure 1F). While the NCC scores were similar for the full network and the no-regularization and no-centroid alignment networks, other performance metrics like centroid distance were significantly impaired by the absence of centroid alignment loss or regularization loss (Figure 1E–F). This suggests that in the absence of centroid alignment loss or regularization loss, the network learns how to align the tagRFP images, but does so using unnatural deformations that do not reflect how the worm bends. In the case of the no-image loss network, all performance metrics, including both image and centroid alignment, were impaired compared to the full network (Figure 1F). This suggests that requiring the network to learn how to warp the RFP images enhances the network’s ability to learn how to align the neuron positions (i.e. centroids). Thus, the loss on RFP image alignment can serve to regularize point cloud alignment.
The finding that the centroid positions were precisely aligned by the full network indicates that the centers of the neurons were correctly registered by the network. However, it does not ensure that all of the pixels that comprise a neuron are correctly registered, which could be important for subsequent feature extraction from the aligned images. For example, it is formally possible to have perfect RFP image alignment in a context where the pixels from one neuron in the moving RFP image are scrambled to multiple neuron locations in the warped moving RFP image. In fact, we observed this in our first efforts to build such a network, where the loss function was only composed of the image loss. As an additional control to test for this possibility in our trained networks, we examined the network’s performance on data from a different strain that expresses pan-neuronal NLS-mNeptune (analogous to the pan-neuronal NLS-tagRFP) and eat-4::NLS-GFP, which is expressed in ~40% of the neurons in the C. elegans head (Figure 1G shows example image). If the pixels within the neurons are being correctly registered, then applying image registration to the GFP channel for these image pairs should result in highly correlated images (i.e. a high NCC value close to 1). If the pixels within neurons are being scrambled, then these images should not be well-aligned. This test is somewhat analogous to building a ground-truth labeled dataset, but does not require manual labeling and is less susceptible to human labeling errors. We used the DDF that the network output based on the pan-neuronal mNeptune data to register the corresponding eat-4::NLS-GFP images from the same timepoints and found that this resulted in high-quality GFP image alignment (Figure 1H). In contrast, while the no-centroid alignment and no-regularization networks output a DDF that successfully aligned the RFP images, applying this DDF to corresponding GFP images resulted in poor GFP image registration (Figure 1H shows that the no-centroid alignment network aligns the RFP channel, but not the GFP channel, in the eat-4::NLS-GFP strain). This further suggests that these reduced networks lacking centroid alignment or regularization loss are aligning the RFP images through unnatural image deformations. Together, these results suggest that the full Brain Alignment Neural Network (BrainAlignNet) can perform non-rigid registration on pairs of images from freely moving animals.
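As an illustration of this control analysis, the sketch below applies a DDF computed from the red channel to the paired GFP volume and scores the result with NCC. The variable names and the coordinate convention (the DDF storing, for each fixed-image voxel, the moving-image coordinate to sample) are assumptions made for illustration.

```python
# Sketch: reuse an RFP-derived DDF to warp the paired GFP volume, then check
# whether the warped GFP image matches the fixed GFP image. High GFP alignment
# is expected only if whole neurons, not just red-channel statistics, were
# moved correctly. Variable names (moving_gfp, fixed_gfp, ddf_from_rfp) are
# illustrative placeholders.
import numpy as np
from scipy.ndimage import map_coordinates

def warp_with_ddf(moving_img, ddf):
    """ddf has shape (x, y, z, 3) and stores, for every fixed-image voxel, the
    (x, y, z) coordinate at which to sample the moving image."""
    coords = np.moveaxis(ddf, -1, 0)          # -> shape (3, x, y, z)
    return map_coordinates(moving_img, coords, order=1, mode="constant")

warped_gfp = warp_with_ddf(moving_gfp, ddf_from_rfp)
score = ncc(fixed_gfp, warped_gfp)            # ncc() as in the earlier sketch
```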
The registration problems included in the training, validation, and test data above were pulled from a set of registration problems that we had been able to solve with gradient descent (example images in Figure 1—figure supplement 1D). These problems did not include the most challenging cases, for example, when the two images to be registered had the worm’s head bent in opposite directions (although we note that they did include substantial non-rigid deformations). We next asked whether a network trained on arbitrary registration problems, including those that were not solvable with gradient descent (example images in Figure 1—figure supplement 1E), could obtain high performance. For this test, we also omitted the Euler registration step that we performed in advance of network training, since the goal was to test whether this network architecture could solve any arbitrary C. elegans head alignment problem. For this analysis, we used the same loss function as the successful network described above. We also increased the amount of training data from 5176 to 335,588 registration problems. The network was trained for 300 epochs. However, the test performance of the network was not high in terms of image alignment or centroid alignment (Figure 1—figure supplement 1F). This suggests that additional approaches may be necessary to solve these more challenging registration problems. Overall, our results suggest that, provided there is an appropriate loss function, a deep neural network can perform non-rigid registration problems to align neurons across the C. elegans head with high speed and single-pixel-level accuracy.
Integration of BrainAlignNet into a complete calcium imaging processing pipeline
The above results suggest that BrainAlignNet can perform high-quality image alignments. The main value of performing these alignments is to enable accurate linking of neurons over time. To test whether performance was sufficient for this, we incorporated BrainAlignNet into our existing image analysis pipeline for brain-wide calcium imaging data and compared the results to our previously described pipeline, which used gradient descent to solve image registration (Atanas et al., 2023). This image analysis pipeline, the Automated Neuron Tracking System for Unconstrained Nematodes (ANTSUN), includes steps for neuron segmentation (via a 3D U-Net), image registration, and linking of neurons’ identities (Figure 2A). Several steps are required to link neurons’ identities based on registration of these high-complexity 3D images. First, image registration defines a coordinate transformation between the two images, which is then applied to the segmented neuron ROIs, warping them into a common coordinate frame. To link neuron identities over time, we then build an N-by-N matrix (where N is the number of all segmented neuron ROIs at all timepoints in a given recording, i.e. approximately # ROIs * # timepoints) with the following structure: (1) Enter zero if the ROIs were in an image pair that was not registered (we do not attempt to solve all registration problems, as this is unnecessary); (2) Enter zero if the ROIs were from a registered image pair, but the registration-warped ROI did not overlap with the fixed ROI; and (3) Otherwise, enter a heuristic value indicating the confidence that the ROIs are the same neurons based on several ROI features. These features include similarity of ROI positions and sizes, similarity of red channel brightness, registration quality, a penalty for overly nonlinear registration transformations, and a penalty if ROIs were displaced over large distances during alignment. Finally, custom hierarchical clustering is applied to the matrix to generate clusters consisting of the ROIs that reflect the same neuron recorded at different time points. Calcium traces are then constructed from all of these timepoints, normalizing the GCaMP signal to the tagRFP signal. We term the ANTSUN pipeline with gradient descent registration ANTSUN 1.4 (Atanas et al., 2023; Pradhan et al., 2025) and the version with BrainAlignNet registration ANTSUN 2.0 (Figure 2A).
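The sketch below illustrates the linkage-matrix idea in simplified form: pairwise match confidences between registration-warped ROIs are assembled into a symmetric N-by-N matrix and hierarchically clustered, with each resulting cluster read out as one neuron tracked over time. The average-linkage clustering and distance threshold are illustrative stand-ins for ANTSUN's custom clustering and heuristic scoring, not the actual implementation.

```python
# Simplified sketch of the ROI-linkage step: an N x N matrix of heuristic
# match confidences between registration-warped ROIs is converted to a
# distance matrix and hierarchically clustered, so that each cluster is read
# out as one neuron tracked across timepoints.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def link_rois(confidence, max_distance=0.7):
    """confidence: N x N symmetric array with entries in [0, 1]; zero means
    'not linked' (unregistered image pair or no ROI overlap). Returns one
    cluster label per ROI."""
    distance = 1.0 - confidence               # high confidence -> small distance
    np.fill_diagonal(distance, 0.0)
    condensed = squareform(distance, checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=max_distance, criterion="distance")
```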
BrainAlignNet supports calcium trace extraction with high accuracy and high SNR.
(A) Diagram of ANTSUN 1.4 and 2.0, which are two full calcium trace extraction pipelines that only differ with regard to image registration. Raw tagRFP channel data is input into the pipeline, which submits image pairs with similar worm postures for registration using either elastix (ANTSUN 1.4; red) or BrainAlignNet (ANTSUN 2.0; blue). The registration is used to transform neuron ROIs identified by a segmentation U-Net (the cuboid diagram is represented as in Figure 1A). These are input into a heuristic function (ANTSUN 2.0-specific heuristics shown in blue) which defines an ROI linkage matrix. Clustering this matrix then yields neuron identities. (B) Sample dataset from an eat-4::NLS-GFP strain, showing ratiometric (GFP/tagRFP) traces without any further normalization. This strain has some GFP+ neurons (bright horizontal lines) as well as some GFP- neurons (dark horizontal lines, which have F~0). Registration artifacts between GFP+ and GFP- neurons would be visible as bright points in GFP- traces or dark points in GFP+ traces. (C) Error rate of ANTSUN 2.0 registration across four eat-4::NLS-GFP animals, computed based on mismatches between GFP+ and GFP- neurons in the eat-4::NLS-GFP strain. The dashed red line shows the error rate of ANTSUN 1.4. Individual dots are different recorded datasets. Note that all error rates are <1%. (D) Sample dataset from a pan-neuronal GCaMP strain, showing F/Fmean fluorescence. Robust calcium dynamics are visible in most neurons. (E) Number of detected neurons across three pan-neuronal GCaMP animals for the two different ANTSUN versions (1.4 or 2.0). Individual dots are individual recorded datasets. (F) Computation time to process one animal based on ANTSUN version (1.4 or 2.0). ANTSUN 1.4 was run on a computing cluster that provided an average of 32 CPU cores per registration problem; computation time is the total number of CPU hours used (i.e. the time it would have taken to run ANTSUN 1.4 registration locally on a comparable 32-core machine). ANTSUN 2.0 was run locally on NVIDIA A4000, A5500, and A6000 graphics cards.
We ran a series of control datasets through both versions of ANTSUN to benchmark their results. The first was from animals with pan-neuronal NLS-mNeptune and eat-4::NLS-GFP, described above. The resulting GFP traces from these recordings allow us to quantify the number of timepoints where the neuron identities are not accurately linked together into a single trace (Figure 2B shows example dataset). Specifically, in this strain, this type of error can be easily detected since it can result in a low-intensity GFP neuron (eat-4-) suddenly having a high-intensity value when the trace mistakenly incorporates data from a high-intensity neuron (eat-4+), or vice versa. We computed this error rate, taking into account the overall similarity of GFP intensities (i.e. since we can only observe errors when GFP- and GFP+ neurons are combined into the same trace). For both versions of ANTSUN, the error rates were <0.5%, suggesting that >99.5% of timepoints reflect correctly linked neurons (Figure 2C).
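One rough way to operationalize this check (a sketch under stated assumptions, not the exact ANTSUN analysis) is to classify each extracted trace as GFP-positive or GFP-negative by its median intensity, count timepoints that jump into the other class's intensity range, and scale by the probability that a random mislink would pair neurons of opposite classes and therefore be detectable at all. The intensity thresholds below are illustrative.

```python
# Sketch of the mislink-detection idea for eat-4::GFP traces: an error is only
# visible when a GFP+ and a GFP- neuron get merged into one trace, so the raw
# count of out-of-class timepoints is scaled by the chance that a random
# mislink is detectable at all. Thresholds are illustrative placeholders.
import numpy as np

def estimate_error_rate(traces, hi=0.5, lo=0.2):
    """traces: (n_neurons, n_timepoints) ratiometric GFP/RFP traces."""
    medians = np.nanmedian(traces, axis=1)
    is_pos = medians > hi                               # GFP+ vs GFP- by median
    # Timepoints that jump into the other class's intensity range:
    visible = np.where(is_pos[:, None], traces < lo, traces > hi)
    visible_rate = np.nanmean(visible)
    # Fraction of random mislinks that would pair opposite classes:
    p = is_pos.mean()
    detectable = max(2 * p * (1 - p), 1e-8)
    return visible_rate / detectable
```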
We next estimated motion artifacts in the data collected from ANTSUN 2.0, as compared to ANTSUN 1.4. Here, we processed data from three pan-neuronal GFP datasets. GFP traces should ideally be constant, so the fluctuations in these traces provide an indication of any possible artifacts in data collection or processing (Figure 2—figure supplement 1A shows example dataset). Quantification of signal variation in GFP traces showed similar results for ANTSUN 1.4 and 2.0, suggesting that incorporating BrainAlignNet did not introduce artifacts that impair signal quality of the traces (Figure 2—figure supplement 1B).
We also quantified other aspects of performance on GCaMP datasets (example GCaMP datasets in Figure 2D). ANTSUN 2.0 successfully extracted traces from a similar number of neurons as ANTSUN 1.4 (Figure 2E). However, while ANTSUN 1.4 requires 250 CPU days per dataset for registration, ANTSUN 2.0 only requires 9 GPU hours, reflecting a >600 fold increase in computation speed (Figure 2F). These results suggest that ANTSUN 2.0, which uses BrainAlignNet, provides a massive speed improvement in extracting neural data from GCaMP recordings without compromising the accuracy of the data.
BrainAlignNet can be used to register neurons from moving jellyfish
We next sought to examine whether BrainAlignNet could be purposed to align neurons from other animals in distinct imaging paradigms. The hydrozoan jellyfish Clytia hemisphaerica has recently been developed as a genetically and optically tractable model for systems and evolutionary neuroscience (Cunningham et al., 2024). It presents a valuable test of the generalizability of our methods as the neuronal morphology, radially symmetric body plan, and centripetal motor output are all distinct from our C. elegans preparations above.
To test the generalizability of BrainAlignNet, we collected a set of videos in which Clytia were gently restrained under cover glass: neurons were restricted to a single z-plane in an epifluorescence configuration. In this setting, Clytia warps significantly as it moves and behaves, necessitating non-rigid motion correction. This paradigm provides an opportunity to relate neural activity to behavior, but even in this context, there are non-rigid deformations that make neuron tracking over time very challenging. Further, in this preparation, neurons have the potential to overlap in the single z-plane. This makes it important to perform full image alignment on all frames of a video, that is aligning all data to a single frame (note that this is different from the approach in C. elegans, see Figure 2A). We tested whether BrainAlignNet could facilitate this demanding case of neuron alignment.
To assess BrainAlignNet’s performance, we utilized recordings of a sparsely labeled transgenic line where neuron tracking is solvable using classic point-tracking across adjacent frames. This allowed us to determine ground-truth neuron positions (i.e. centroids) that could be used to rigorously quantify alignment accuracy. Specifically, we collected videos of transgenic Clytia medusae expressing mCherry under control of the RFamide neuropeptide promoter, which expresses in roughly 10% of neurons distributed throughout the animal’s body (Weissbourd et al., 2021). We formulated the alignment problem as follows. All images were Euler-aligned to a single reference frame. While Euler registration removes much of the macro displacement of the animal, it is entirely insufficient for trace extraction as it was unable to account for the myriad non-rigid deformations occurring across the jellyfish nerve ring (Figure 3A shows example). The network was as described above, with only slight modifications to mask out DDF deformations in the z axis and ignore regularization across z slices (since the image was 2-D). We then trained on 300 total image pairs from three animals using only image loss and regularization loss (we found that centroid alignment loss was unnecessary in this more constrained 2D search space). We then evaluated image alignment (Figure 3B) and centroid alignment (Figure 3C) on 25,997 frames from three separate animals withheld from the training data. Compared to the Euler-aligned images, BrainAlignNet led to a dramatic increase in image alignment and centroid alignment, ultimately aligning matched centroids within 1.51 pixels of each other on average (Figure 3B–C). Additionally, we found comparable performance across the withheld frames of the training animals (Figure 3—figure supplement 1). These results obtained in jellyfish show that, with only light modification, a version of BrainAlignNet can produce neuron alignment in a radically different organism at a quality level matching that observed in our C. elegans data – aligning matched neurons within ~1.5 pixels of each other.
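The Euler pre-alignment step can be thought of as a brute-force search over rotations and translations that maximizes image correlation with the reference frame, as in the simplified sketch below; the actual pipeline runs this search on a GPU, and the search ranges and step sizes here are illustrative values only.

```python
# Sketch of Euler (rotation + translation) pre-alignment of each 2-D frame to
# a single reference frame by grid search over image correlation. Search
# ranges and step sizes are illustrative; the real pipeline is GPU-accelerated.
import numpy as np
from scipy.ndimage import rotate, shift

def euler_align(reference, frame, angles=range(-180, 180, 5),
                shifts=range(-20, 21, 4)):
    best, best_params = -np.inf, None
    for angle in angles:
        rotated = rotate(frame, angle, reshape=False, order=1)
        for dy in shifts:
            for dx in shifts:
                candidate = shift(rotated, (dy, dx), order=1)
                score = ncc(reference, candidate)   # ncc() as in the earlier sketch
                if score > best:
                    best, best_params = score, (angle, dy, dx)
    return best_params, best
```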
BrainAlignNet can be used to perform neuron alignment in jellyfish.
(A) Example of image registration on a pair of mCherry images (from a testing animal, withheld from training data), composed by overlaying a moving image (blue) on the fixed image (orange). While the fixed image remains untransformed, the uninitialized moving image (left) gets warped by a Euler transformation (middle) and a BrainAlignNet-generated DDF (right) to overlap with the fixed image. (B) Evaluation of registration performance by examining mCherry image alignment (via NCC) on testing datasets before and after registration with BrainAlignNet. ‘pre-align’ shows alignment statistics on images after Euler alignment, but before neural network registration. Here, we evaluated all registration problems for all three animals in the testing set. As in Figure 1, NCC quantifies alignment of the fixed and warped moving RFP images, where a score of 1 indicates perfect alignment. All violin plots are accompanied by lines indicating the minimum, mean, and maximum values. ****p<0.0001, two-tailed Wilcoxon signed rank test. For both ‘pre-align’ and ‘post-BrainAlignNet’, n=25,997 registration problems (from three animals). (C) Evaluation of registration performance by examining neuron alignment (measured via distance between matched centroids) on testing datasets before and after BrainAlignNet registration. Centroid distance is measured as the mean Euclidean distance between the centroids of all neurons in the fixed image and the centroids of the corresponding neurons in the warped moving image; a distance of 0 indicates perfect alignment. ****p<0.0001, two-tailed Wilcoxon signed rank test. For both ‘pre-align’ and ‘post-BrainAlignNet’, n=25,997 registration problems (from three animals).
AutoCellLabeler: a neural network that automatically annotates >100 neuron classes in the C. elegans head from multi-spectral fluorescence
We next turned our attention to annotating the identities of the recorded neurons in brain-wide calcium imaging data. C. elegans neurons have fairly stereotyped positions in the heads of adult animals, though fully accurate inference of neural identity from position alone has not been shown to be possible. Fluorescent reporter gene expression using well-defined genetic drivers can provide additional information. The NeuroPAL strain is especially useful in this regard. It expresses pan-neuronal NLS-tagRFP, but also has expression of NLS-mTagBFP2, NLS-CyOFP1, and NLS-mNeptune2.5 under a set of well-chosen genetic drivers (example image in Figure 4A; Yemini et al., 2021). With proper training, humans can manually label the identities of most neurons in this strain using neuron position and multi-spectral fluorescence. For most of the brain-wide recordings collected using our calcium imaging platform, we used a previously characterized strain with a pan-neuronal NLS-GCaMP7F transgene crossed into NeuroPAL (Atanas et al., 2023). While freely moving recordings were conducted with only NLS-GCaMP and NLS-tagRFP data acquisition, animals were immobilized at the end of each recording to capture multi-spectral fluorescence. Humans could manually label many neurons’ identities in these multi-spectral images, and the image registration approaches described above could map the ROIs in the immobilized data to ROIs in the freely moving recordings in order to match neuron identity to GCaMP traces.
The AutoCellLabeler Network can automatically annotate >100 neuronal cell types in the C. elegans head.
(A) Procedure by which AutoCellLabeler generates labels for neurons. First, the tagRFP component of a multi-spectral image is passed into a segmentation neural network, which extracts neuron ROIs, labeling each pixel as an arbitrary number with one number per neuron. Then, the full multi-spectral image is input into AutoCellLabeler, which outputs a probability map. This probability map is applied to the ROIs to generate labels and confidence values for those labels. The network cuboid diagrams are represented as in Figure 1A. (B) AutoCellLabeler’s training data consists of a set of multi-spectral images (NLS-tagRFP, NLS-mNeptune2.5, NLS-CyOFP1, and NLS-mTagBFP2), human neuron labels, and a pixel weighting matrix based on confidence and frequency of the human labels that controls how much each pixel is weighted in AutoCellLabeler’s loss function. (C) Pixel-weighted cross-entropy loss and pixel-weighted IoU metric scores for training and validation data. Cross-entropy loss captures the discrepancy between predicted and actual class probabilities for each pixel. The IoU metric describes how accurately the predicted labels overlap with the ground truth labels. (D) During the label extraction procedure, AutoCellLabeler is less confident of its label on pixels near the edge of ROI boundaries. Therefore, we allow the central pixels to have much higher weight when determining the overall ROI label from pixel-level network output. (E) Distributions of AutoCellLabeler’s confidence across test datasets based on the relationship of its label to the human label (‘Correct’=agree, ‘Incorrect’=disagree, ‘Human low conf’=human had low confidence, ‘Human no label’=human did not even guess a label for the neuron). ****p<0.0001, as determined by a Mann-Whitney U Test between the indicated condition and the ‘Correct’ condition where the network agreed with the human label; n=835, 25, 322, 302 labels (from 11 animals) for the conditions ‘Correct’, ‘Incorrect’, ‘Human low conf’, ‘Human no label’, respectively; U=16,700, 202,691, 210,797 for ‘Incorrect’, ‘Human low conf’, ‘Human no label’ vs ‘Correct’, respectively. (F) Categorization of neurons in test datasets based on AutoCellLabeler’s confidence. Here ‘Correct’ and ‘Incorrect’ are as in (E), but ‘No human label’ also includes low-confidence human labels. Printed percentage values are the accuracy of AutoCellLabeler on the corresponding category, computed as the number of correct labels divided by the total number of correct and incorrect labels in that category. (G) Distributions of accuracy of AutoCellLabeler’s high confidence (>75%) labels on neurons across test datasets based on the confidence of the human labels. n.s. not significant, *p<0.05, as determined by a paired permutation test comparing mean differences (n=11 test datasets). (H) Accuracy of AutoCellLabeler compared with high-confidence labels from new human labelers on neurons in test datasets that were labeled at low confidence, not at all, or at high confidence by the original human labelers. Error bars are bootstrapped 95% confidence intervals. A dashed red line shows accuracy of new human labelers relative to the old human labelers, when both gave high confidence to their labels. There was no significant difference between the human vs human accuracy and the network accuracy for any of these categories of labels, determined via two-tailed empirical p-values from the bootstrapped distributions. (I) Distributions of number of high-confidence labels per animal over test datasets. High confidence was 4–5 for human labels and >75% for network labels.
We note that we standardized the manner in which split ROIs were handled for human- and network-labeled data so that the number of detected neurons could be properly compared between these two groups. n.s. not significant, ***p<0.001, as determined by a paired permutation test comparing mean differences (n=11 animals). (J) Distributions of accuracy of high-confidence labels per animal over test datasets, relative to the original human labels. A paired permutation test comparing mean differences to the full network’s label accuracy did not find any significance. (K) Number of ROIs per neuron class labeled at high confidence in test datasets that fall into each category, along with average confidence for all labels for each neuron class in those test datasets. ‘New’ represents ROIs that were labeled by the network as the neuron and were not labeled by the human. ‘Correct’ represents ROIs that were labeled by both AutoCellLabeler and the human as that neuron. ‘Incorrect’ represents ROIs that were labeled by the network as that neuron and were labeled by the human as something else. ‘Lost’ represents ROIs that were labeled by the human as that neuron and were not labeled by the network. ‘Network conf’ represents the average confidence of the network for all its labels of that neuron. ‘Human conf’ represents the average confidence of the human labelers for all their labels of that neuron. Neuron classes with high values in the ‘Correct’ column and low values in the ‘Incorrect’ column indicate a very high degree of accuracy in AutoCellLabeler’s labels for those classes. If those classes also have a high value in the ‘New’ column, it could indicate that AutoCellLabeler is able to find the neuron with high accuracy in animals where humans were unable to label it.
Manual annotation of NeuroPAL images is time-consuming. First, to perform accurate labeling, the individual needs substantial training. Even after being trained, labeling all ROIs in one NeuroPAL animal can take 3–5 hr. In addition, different individuals have different degrees of knowledge or confidence in labeling certain cell classes. For these reasons, it was desirable to automate NeuroPAL labeling using datasets that had previously been labeled by a panel of human labelers. In particular, the labels that they provided with a high degree of confidence would be most useful for training an automated labeling network. Previous studies have developed statistical approaches for semi-automated labeling to label neural identity from NeuroPAL images, but the maximum precision that we are aware of is 90% without manual correction (Yemini et al., 2021).
We trained a 3-D U-Net (Wolny et al., 2020) to label the C. elegans neuron classes in a given NeuroPAL 3-D image. As input, the network received four fluorescent 3-D images from the head of each worm: pan-neuronal NLS-tagRFP, plus the NLS-mTagBFP2, NLS-CyOFP1, and NLS-mNeptune2.5 images that label stereotyped subsets of neurons (Figure 4A). During training, the network also received the human-annotated labels of which pixels belong to which neurons. Humans provided ROI-level labels and the boundaries of each ROI were determined using a previously described neuron segmentation network (Atanas et al., 2023) trained to label all neurons in a given image (agnostic to their identity). Finally, during training, the network also received an array indicating the relative weight to assign each pixel during training (Figure 4B). This was incorporated into a pixel-weighted cross-entropy loss function (lower values indicate more accurate labeling of each pixel), summing across the pixels in a weighted manner. Pixel weighting was adjusted as follows: (1) background was given extremely low weight; (2) ROIs that humans were not able to label were given extremely low weight; (3) all other ROIs received higher weight proportional to the subjective confidence that the human had in assigning the label to the ROI and the rarity of the label (see Materials and methods for exact details). Regarding this latter point, neurons that were less frequently labeled by human annotation received higher weight so that the network could potentially learn how to classify these neurons from fewer labeled examples.
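A minimal PyTorch-style sketch of such a pixel-weighted cross-entropy is shown below; the per-pixel weight array is assumed to already encode the background, confidence, and rarity weighting described above, whose exact formula is given in Materials and methods.

```python
# Sketch of a pixel-weighted cross-entropy loss for the labeling U-Net.
# `logits` is (batch, n_classes, x, y, z), `target` holds one integer class
# per pixel, and `pixel_weights` encodes the confidence/rarity-based weights.
import torch
import torch.nn.functional as F

def pixel_weighted_cross_entropy(logits, target, pixel_weights):
    per_pixel = F.cross_entropy(logits, target, reduction="none")
    return (per_pixel * pixel_weights).sum() / pixel_weights.sum().clamp(min=1e-8)
```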
We trained the network over 300 epochs using a training set of 81 annotated images and a validation set of 10 images (Figure 4C). Because the size of the training set was fairly small, we augmented the training data using both standard image augmentations (rotation, flipping, adding Gaussian noise, etc.) and a custom augmentation where the images were warped in a manner to approximate worm head bending (see Materials and methods). Overall, the goal was for this network to be able to annotate neural identities in worms in any posture provided that they were roughly oriented length-wise in a 284 x 120 x 64 (x, y, z) image. Because this Automatic Cell Labeling Network (AutoCellLabeler) labels individual pixels, it was necessary to convert these pixel-wise classifications into ROI-level classifications. AutoCellLabeler outputs its confidence in its label for each pixel, and we noted that the network’s confidence for a given ROI was highest near the center of the ROI (Figure 4D). Therefore, to determine ROI-level labels, we took a weighted average of the pixel-wise labels within an ROI, weighing the center pixels more strongly. The overall confidence of these pixel scores was also used to compute an ROI-level confidence score, reflecting the network’s confidence that it labeled the ROI correctly. Finally, after all ROIs were assigned a label, heuristics were applied to identify and delete problematic labels. Labels were deleted if (1) the network already labeled another ROI as that label with higher confidence; (2) the label was present too infrequently in the network’s training data; (3) the network labeled that ROI as something other than a neuron (e.g. a gut granule or glial cell, which we supplied as valid labels during training and had labeled in training data); or (4) the network confidently predicted different parts of the ROI as different labels (see Materials and methods for details).
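The sketch below illustrates the ROI-level read-out: per-pixel class probabilities are averaged within each ROI with higher weight toward the ROI center, and the most probable class and its averaged probability become the ROI's label and confidence. The distance-transform weighting used here is an illustrative stand-in for the center-weighting described in Materials and methods, and the label-deletion heuristics are omitted.

```python
# Sketch: convert AutoCellLabeler's per-pixel probability map into one label
# and confidence per ROI, weighting central pixels of each ROI more heavily.
import numpy as np
from scipy.ndimage import distance_transform_edt

def roi_labels(prob_map, roi_map, class_names):
    """prob_map: (n_classes, x, y, z) softmax output; roi_map: (x, y, z) with
    0 = background and k > 0 = ROI id. Returns {roi_id: (label, confidence)}."""
    labels = {}
    for roi_id in np.unique(roi_map):
        if roi_id == 0:
            continue
        mask = roi_map == roi_id
        weights = distance_transform_edt(mask)[mask]   # larger near ROI center
        weights /= weights.sum()
        class_probs = prob_map[:, mask] @ weights      # weighted mean per class
        best = int(np.argmax(class_probs))
        labels[roi_id] = (class_names[best], float(class_probs[best]))
    return labels
```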
We evaluated the performance of the network on 11 separate datasets that were reserved for testing. We assessed the accuracy of AutoCellLabeler on the subset of ROIs with high-confidence human labels (subjective confidence scores of 4 or 5, on a scale from 1 to 5). On these neurons, average network confidence was 96.8% and its accuracy was 97.1%. We furthermore observed that the network was more confident in its correct labels (average confidence 97.3%) than its incorrect labels (average confidence 80.7%; Figure 4E). More generally, AutoCellLabeler confidence was highly correlated with its accuracy (Figure 4F shows a breakdown of cell labeling at different confidences, with an inset indicating accuracy). Indeed, excluding the neurons where the network assigns low (<75%) confidence increased its accuracy to 98.1% (Figure 4—figure supplement 1A displays the full accuracy-recall tradeoff curve). Under this confidence threshold cutoff, AutoCellLabeler still assigned a label to 90.6% of all the ROIs that had high-confidence human labels, so we chose to delete the low-confidence (<75%) labels altogether (see Figure 4—figure supplement 1A for rationale for the 75% cutoff value).
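The accuracy-versus-coverage tradeoff behind this cutoff can be generated by sweeping a confidence threshold over per-ROI labels, as in the brief sketch below (assuming arrays of per-ROI network confidences and correctness against high-confidence human labels).

```python
# Sketch: sweep a confidence threshold over per-ROI network labels to trade
# off accuracy of retained labels against how many labels are retained.
import numpy as np

def accuracy_coverage_curve(confidences, is_correct,
                            thresholds=np.linspace(0, 1, 101)):
    curve = []
    for t in thresholds:
        keep = confidences >= t
        accuracy = is_correct[keep].mean() if keep.any() else np.nan
        curve.append((t, accuracy, int(keep.sum())))
    return curve
```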
We also examined model performance on data where humans had either low confidence or did not assign a neuron label. In these cases, it was harder to estimate the ground truth. Overall, model confidence was much lower for neurons that humans labeled with low confidence (87.3%) or did not assign a label (81.3%). The concurrence of AutoCellLabeler relative to low-confidence human labels was also lower (84.1%; we note that this is not really a measure of accuracy since these ‘ground-truth’ labels had low confidence). Indeed, overall the network’s concurrence versus human labels scaled with the confidence of the human label (Figure 4G).
We carefully examined the subset of ROIs where the network had high confidence (>75%), but humans had either low confidence or entered no label at all. This was quite a large set of ROIs: AutoCellLabeler identified significantly more high-confidence neurons (119/animal) than the original human labelers (83/animal), and this could conceivably reflect a highly accurate pool of network-generated labels exceeding human performance. To determine whether this was the case, we obtained new human labels by different human labelers for a random subset of these neurons. Whereas some human labels remained low-confidence, others were now labeled with high confidence (20.9% of this group of ROIs). The new human labelers also labeled neurons that were originally labeled with high confidence so that we could compare the network’s performance on relabeled data where the original data was unlabeled, low confidence, or high confidence. AutoCellLabeler’s performance on all three groups was similar (88%, 86.1%, and 92.1%, respectively), which was comparable to the accuracy of humans relabeling data relative to the original high-confidence labels (92.3%; Figure 4H). The slightly lower accuracy on these re-labeled data is likely due to the human labeling of the original training, validation, and testing data being highly vetted and thoroughly double-checked, whereas the re-labeling that we performed just for this analysis was done in a single pass. Overall, these analyses indicate that the high-confidence network labels (119/animal) have similar accuracy regardless of whether the original data had been labeled by humans as unlabelable, low confidence, or high confidence. We note that this explains a subset of the cases where human low-confidence labels were not in agreement with network labels (Figure 4G). Taken together, these observations indicate that AutoCellLabeler can confidently label more neurons per dataset than individual human labelers.
We also split out model performance by cell type. This largely revealed similar trends. Model labeling accuracy and confidence were variable among the neuron types, with highest accuracy and confidence for the cell types where there were higher confidence human labels and a higher frequency of human labels (Figure 4K). For the labels where there were high confidence network and human labels, we generated a confusion matrix to see if AutoCellLabeler’s mistakes had recurring trends (Figure 4—figure supplement 1B). While mistakes of this type were very rare, we observed that the ones that occurred could mostly be categorized as either mislabeling a gut granule as the neuron RMG, or mislabeling the dorsal/ventral categorization of the neurons IL1 and IL2 (e.g. mislabeling IL2D as IL2). Together, these categories accounted for 50% of all AutoCellLabeler’s mistakes. We also observed that across cell types, AutoCellLabeler’s confidence was highly correlated with human confidence (Figure 4—figure supplement 1C), suggesting that the main limitations of model accuracy are due to limited human labeling accuracy and confidence.
To provide better insights into which network features were critical for its performance, we trained additional networks lacking some of AutoCellLabeler’s key features. To evaluate these networks, we considered both the number of high-confidence labels assigned by AutoCellLabeler and the accuracy of those labels measured against high-confidence human labels. Surprisingly, a network that was trained with only standard image augmentations (i.e. lacking the custom augmentation to bend the images in a manner that approximates a worm head bend) had similar performance (Figure 4I). However, a network that was trained without a pixel-weighting scheme (i.e. where all pixels were weighted equally) provided far fewer high-confidence labels. This suggests that devising strategies for pixel weighting is critical for model performance, though our custom augmentation was not important. Interestingly, all trained networks had similar accuracy (Figure 4J) on their high-confidence labels, suggesting that the network architecture in all cases is able to accurately assess its confidence.
Automated annotation of C. elegans neurons from fewer fluorescent labels and in different strains
We examined whether the full group of fluorophores was critical for AutoCellLabeler performance. This is a relevant question because (i) it is laborious to make, inject, and annotate a large number of plasmids driving fluorophore expression and (ii) the large number of plasmids in the NeuroPAL strain has been noted to adversely impact the animals’ growth and behavior (Atanas et al., 2023; Yemini et al., 2021; Sharma et al., 2024). To test whether fewer fluorescent labels could still facilitate automatic labeling, we trained four additional networks: one that only received the pan-neuronal tagRFP image as input, and three that received pan-neuronal tagRFP plus a single other fluorescent channel (CyOFP, mTagBFP2, or mNeptune). As we still had the ground-truth labels based on humans viewing the full set of fluorophores, the supervised labels were identical to those supplied to the full network.
We evaluated the performance of these models by quantifying the number of high-confidence labels that each network provided in each testing dataset (Figure 5A) and the accuracy of these labels measured against high-confidence human labels (Figure 5B). We found that all four networks had attenuated performance relative to the full AutoCellLabeler network, which was almost entirely explained by these networks having lower confidence in their labels, since network accuracy was always consistent with confidence (Figure 5—figure supplement 1A). Of the four attenuated networks, the tagRFP+CyOFP network (107 neurons per animal labeled at 97.4% accuracy) came closest to matching the full network. Given that there are >20 mTagBFP2 and mNeptune plasmids in the full NeuroPAL strain, these results raise the possibility that a smaller set of carefully chosen plasmids could permit training of a network with equal performance to the full network that we trained here.
Variants of AutoCellLabeler can annotate neurons from fewer fluorescent channels and in different strains.
(A) Distributions of number of high-confidence labels per animal over test datasets for the networks trained on the indicated set of fluorophores. The ‘tagRFP (on low SNR)’ column corresponds to a network that was trained on high-SNR, tagRFP-only data and tested on low-SNR tagRFP data due to shorter exposure times in freely moving animals. *p<0.05, **p<0.01, ***p<0.001, as determined by a paired permutation test comparing mean differences to the full network (n=11 animals). (B) Distributions of accuracy of high-confidence labels per animal over test datasets for the networks trained on the indicated set of fluorophores. The ‘tagRFP (on low SNR)’ column is as in (A). n.s. not significant, *p<0.05, **p<0.01, as determined by a paired permutation test comparing mean differences to the full network (n=11 animals). (C) Same as Figure 4K, except for the tagRFP-only network. (D) Accuracy vs detection tradeoff for various AutoCellLabeler versions. For each network, we can set a confidence threshold above which we accept labels. By varying this threshold, we can produce a tradeoff between accuracy of accepted labels (x-axis) and number of labels per animal (y-axis) on test data. Each curve in this plot was generated in this manner. The ‘tagRFP-only (on low SNR)’ values are as in (A). The ‘tagRFP-only (on freely moving)’ values come from evaluating the tagRFP-only network on 100 randomly-chosen timepoints in the freely moving (tagRFP) data for each test dataset. The final labels were then computed on each immobilized ROI by averaging together the 100 labels and finding the most likely label. To ensure fair comparison to other networks, only immobilized ROIs that were matched to the freely moving data were considered for any of the networks in this plot. (E) Evaluating the performance of tagRFP-only AutoCellLabeler on data from another strain SWF415, where there is pan-neuronal NLS-GCaMP7f and pan-neuronal NLS-mNeptune2.5. Notably, the pan-neuronal promoter used for NLS-mNeptune2.5 differs from the pan-neuronal promoter used for NLS-tagRFP in NeuroPAL. Performance here was quantified by computing the fraction of network labels with the correct expected activity-behavior relationships in the neuron class (y-axis; quantified by whether an encoding model showed significant encoding; see Materials and methods). For example, when the label was the reverse-active AVA neuron, did the corresponding calcium trace show higher activity during reverse? The blue line shows the expected fraction as a function of the true accuracy of the network (x-axis), computed via simulations (see Materials and methods). The orange circle shows the actual fraction when AutoCellLabeler was evaluated on SWF415. Based on this, the dashed line shows estimated true accuracy of this labeling.
We did not expect the tagRFP-only network to perform well, since the task of labeling tagRFP-only images is nearly impossible for humans. Surprisingly, this network still exhibited relatively high performance, with an average of 94 high-confidence neurons per animal and 94.8% accuracy on those neurons. On most neuron classes, it behaved nearly as well as the full network, though there are 10–20 neuron classes that it labels far less accurately, such as ASG, IL1, and RMG (Figure 5C). This suggests a high level of stereotypy in neuron position, shape, and/or RFP brightness for many neuron types. Since this network only requires the red channel fluorescence, it could be used directly on freely moving data, which has only GCaMP and tagRFP channel data. Since the tagRFP-only network was trained only on high-SNR images collected from immobilized animals, we first checked that the network was able to generalize outside its training distribution to single images with lower SNR (example images in Figure 5—figure supplement 1B). It was able to label 79 high-confidence neurons per animal at 95.2% accuracy on the lower SNR images (Figure 5A–B, right). We then investigated whether allowing the network to access different postures of the same animal improved its accuracy. Specifically, we evaluated the tagRFP-only network on 100 randomly selected timepoints in the freely moving data of each animal (example images in Figure 5—figure supplement 1B). We then compared these 100 network labels to the human labels, which could be readily determined since ANTSUN registers the freely moving images back to the immobilized NeuroPAL images that had been labeled by humans. We averaged the 100 network labels to obtain the most likely network label for each neuron, as well as the average confidence for that label. To properly compare network versions, we determined how many neurons could be labeled at any given target labeling accuracy – for example, how many neurons the network can label and still achieve 95% accuracy (Figure 5D; varying the network confidence threshold for accepting a label allowed us to trace out these full curves). This analysis revealed that averaging network labels across the 100 timepoints improved network performance, though only modestly. These results suggest that single-color labels can be used to train networks to a high level of performance, but additional fluorescence channels further improve performance.
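To make the label-averaging step described above concrete, a minimal sketch is shown below. It assumes that each ROI has been assigned a per-timepoint class probability vector by the network; the array and function names are illustrative and not those used in the released code.

```python
import numpy as np

def consensus_label(label_probs):
    """Average per-timepoint label probabilities for one ROI and return
    the most likely label plus its mean confidence.

    label_probs : array of shape (n_timepoints, n_classes), where each row
        is the network's softmax output for this ROI at one timepoint.
    """
    mean_probs = label_probs.mean(axis=0)      # average over the ~100 timepoints
    best = int(np.argmax(mean_probs))          # most likely class index
    return best, float(mean_probs[best])       # label and its average confidence
```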
The strong performance of the tagRFP-only network on out-of-domain lower SNR images suggests an impressive ability of the AutoCellLabeler network to generalize across different modalities of data. This raised the possibility that it may be possible to use this network architecture to build a foundation model of C. elegans neuron annotation that works across strains and imaging conditions. As a first step to explore this idea, we investigated to what extent the tagRFP-only network could generalize to other strains of C. elegans besides the NeuroPAL strain. We used our previously described SWF415 strain, which contains pan-neuronal NLS-GCaMP7F, pan-neuronal NLS-mNeptune2.5, and sparse tagRFP expression (Atanas et al., 2023). Notably, the pan-neuronal promoter used in this strain for NLS-mNeptune expression (Primb-1) is distinct from the pan-neuronal promoter that drives NLS-tagRFP expression in NeuroPAL (a synthetic promoter), so the levels of red fluorescence are likely to be different across these strains. Since humans do not know how to label neurons in SWF415 (it lacks the NeuroPAL transgene), we did a more limited analysis by analyzing network labels for a subset of neurons that have highly reliable activity dynamics with respect to behavior (AVA, AVE, RIM, and AIB encode reverse locomotion; RIB, AVB, RID, and RME encode forward locomotion; SMDD encodes dorsal head curvature; and SMDV and RIV encode ventral head curvature; Atanas et al., 2023; Kato et al., 2015; Li et al., 2014; Chalfie et al., 1985; Gordus et al., 2015; Luo et al., 2014; Wang et al., 2020; Ben Arous et al., 2010; Lim et al., 2016; Hallinen et al., 2021; Ray and Gordus, 2025). Specifically, we asked whether neurons labeled in SWF415 recordings with high confidence by the network had the behavior encoding properties typical of the neuron, assessed via analysis of the GCaMP traces from that neuron. Our previously described CePNEM model (Atanas et al., 2023) was used to determine whether each labeled neuron encoded forward/reverse locomotion or dorsal/ventral head curvature. The network provided high-confidence labels for an average of 7.4/21 of these neurons per animal, and the encoding properties of these neurons matched expectations 68% of the time (randomly labeled neurons had a match of 19%). However, it was possible for the network to (i) incorrectly label a neuron as another neuron that happened to have the same encoding; or (ii) correctly label a neuron that CePNEM lacked statistical power to declare an encoding for. We accounted for these effects via simulations (see Materials and methods), which estimated that the actual labeling accuracy of the network on SWF415 was 69% (Figure 5E). This is substantially lower than this network’s accuracy on similar images from the NeuroPAL strain (i.e. the strain used to train the network), where an average of 12.5 of these neurons were labeled per animal with 97.1% accuracy. Nevertheless, this analysis indicates that AutoCellLabeler has a reasonable ability to generalize to strains with different genetic drivers and fluorophores, suggesting that in the future it may be worthwhile to pursue building a foundation model that labels C. elegans neurons across many strains.
A neural network (CellDiscoveryNet) that facilitates unsupervised discovery of >100 cell types by aligning data across animals
Annotation of cell types via supervised learning (using human-provided labels) is fundamentally limited by prior knowledge and human throughput in labeling multi-spectral imaging data. In principle, unsupervised approaches that can automatically identify stereotyped cell types would be preferable. Thus, we next sought to train a neural network to perform unsupervised discovery of the cell types of the C. elegans nervous system (Figure 6A). If successful, these approaches could be useful for labeling of mutant genotypes, new permutations of NeuroPAL, or even related species, where it might be laborious to painstakingly characterize new multispectral lines and generate >100 labeled datasets. In addition, such an approach would be useful in more complex animals that do not yet have complete catalogs of cell types.
CellDiscoveryNet and ANTSUN 2 U can perform unsupervised cell type discovery by analyzing data across different C. elegans animals.
(A) A schematic comparing the approaches of AutoCellLabeler and CellDiscoveryNet. AutoCellLabeler uses supervised learning, taking as input both images and manual labels for those images, and learns to label neurons accordingly. CellDiscoveryNet uses unsupervised learning and can learn to label neurons after being trained only on images (with no labels provided). (B) CellDiscoveryNet training pipeline. The network takes as input two multi-spectral NeuroPAL images from two different animals. It then outputs a Dense Displacement Field (DDF), which is a coordinate transformation between the two images. It warps the moving image under this DDF, producing a warped moving image that should ideally look very similar to the fixed image. The dissimilarity between these images is the image loss component of the loss function, which is added to the regularization loss that penalizes non-linear image deformations present in the DDF. (C) Network loss curves. Both training and validation loss curves start to plateau around 600 epochs. (D) Distributions of normalized cross-correlation (NCC) scores comparing the CellDiscoveryNet predictions (warped moving images) and the fixed images for each pair of registered images. These NCCs were computed on all four channels simultaneously, treating the entire image as a single 4D matrix for this purpose. The ‘Train’ distribution contains the NCC scores for all pairs of images present in CellDiscoveryNet’s training data, while the ‘Val+Test’ distribution contains any pair of images that was not present in its training data. (E) Distributions of centroid distance scores based on human labels. These are computed over all (moving, fixed) image pairs on all neurons with high-confidence human labels in both moving and fixed images. The centroid distance scores represent the Euclidean distance between the network’s prediction of where the neuron is and its correct location as labeled by the human. Values of a few pixels or less roughly indicate that the neuron was mapped to its correct location, while large values mean the neuron was mis-registered. The ‘Train’ and ‘Val+Test’ distributions are as in (D). The ‘High NCC’ distribution is from only (moving, fixed) image pairs where the NCC score was greater than the 90th percentile of all such NCC scores. ****p<0.0001, Mann-Whitney U-Test comparing All versus High NCC (n=5048 vs 486 image pairs). (F) Labeling accuracy vs number of linked neurons tradeoff curve. Accuracy is the fraction of linked ROIs with labels matching their cluster’s most frequent label (see Materials and methods). The number of linked neurons is the total number of distinct clusters; each cluster must contain an ROI in more than half of the animals to be considered a cluster. The curve is parameterized by a clustering threshold that determines when to terminate the clustering algorithm – higher values mean the clustering algorithm terminates earlier, resulting in more accurate but fewer detections. The red dot is the selected value, at which 125 clusters were detected with 93% labeling accuracy. (G) Number of neurons labeled per animal in the 11 testing datasets. This plot compares the number of neurons labeled as follows: human labels with 4–5 confidence, AutoCellLabeler labels with 75% or greater confidence, and CellDiscoveryNet with ANTSUN 2 U labels at the selected clustering threshold. ***p<0.001, as determined by a paired permutation test comparing mean differences (n=11 animals). (H) Accuracy of neuron labels in the 11 testing datasets. This plot defines the original human confidence 4–5 labels as ground truth.
‘Human relabel’ refers to confidence 4–5 labels done by different humans (independently from the first set of human labels). AutoCellLabeler labels are confidence 75% or greater labels. CellDiscoveryNet labels were created by running ANTSUN 2 U with the selected clustering threshold and defining the correct label for each cluster to be its most frequent label. A paired permutation test comparing mean differences to the full network’s label accuracy did not find any significant differences. (I) Same as Figure 5K, except using labels from CellDiscoveryNet with ANTSUN 2 U. The neurons ‘NEW 1’ through ‘NEW 5’ are clusters that were not labeled frequently enough by humans to be able to determine which neuron class they corresponded to, as described in the main text.
To facilitate unsupervised cell type discovery, we trained a network to register different animals’ multi-spectral NeuroPAL imaging data to one another. Successful alignment of cells across all recorded animals would amount to unsupervised cell type annotation, since the cells that align across animals would comprise a functional ‘cluster’ corresponding to the same cell type identified in different animals (we note though that each cell type, or cluster, here would still need to be assigned a name). The architecture of this network was similar to BrainAlignNet, but the training data here consisted of pairs of four-color NeuroPAL images from two different animals and the network was tasked with aligning all four fluorescent channels (Figure 6B). No cell type positions (i.e. centroids) or neuronal identities were provided to the network during training. Regularization and augmentation were similar to that of BrainAlignNet (see Materials and methods). Training and validation data comprised 91 animals’ datasets, which gave rise to 3285 unique pairs for alignment; 11 animals were withheld for testing (the same test set as for AutoCellLabeler). The network was trained over 600 epochs (Figure 6C). In the analyses below, we characterize performance on training data and withheld testing data, describing any differences. We note that, in contrast to the networks described above, high performance on training data is still useful in this case, since the only criterion for success in unsupervised learning is successful alignment (i.e. even if all data need to be used for training to do so). Strong performance on testing data is nevertheless more desirable, since it would be inefficient to retrain the network each time new data are incorporated into the full dataset.
We first characterized the ability of this Unsupervised Cell Discovery Network (CellDiscoveryNet) to align images of different animals. Image alignment quality was reasonably high for all four fluorescent NeuroPAL channels, with a median NCC of 0.80 overall (Figure 6D). Alignment accuracy was nearly equivalent in training and testing data (Figure 6D). We also examined how well the centroid positions of defined cell types were aligned, utilizing our prior knowledge of neurons’ locations from human labels (Figure 6E). We computed this metric only on cell types that were identified with high confidence in both of the images of a given registration problem. The median centroid distance was 7.2 pixels, with similar performance on training and testing data. This was initially rather disappointing, as it suggested that the majority of neurons were not being placed at their correct locations during registration. However, we observed two important properties of the centroid alignments. First, the distribution of centroid distances was bimodal (Figure 6E) – the 20th percentile centroid distance was only 1.4 pixels, which corresponds to a correct neuron alignment. Second, the median centroid distance decreased to 3.3 pixels for registration problems with high (>90th percentile, i.e. >0.85) NCC scores on the images. Together, these observations suggest that CellDiscoveryNet correctly aligns neurons some of the time.
We next sought to differentiate the neuron alignments where CellDiscoveryNet was correct from those where it was incorrect. To accomplish this, we adapted our ANTSUN pipeline (described in Figure 2) to use CellDiscoveryNet instead of BrainAlignNet. This modified ANTSUN 2 U (Unsupervised) takes as input multi-spectral data from many animals instead of monochrome images from different time points of the same animal. This approach then allows us to effectively cluster ROIs that correspond to the same neuron found across different animals. We ran CellDiscoveryNet on pairs of images and used the resulting DDFs to align the corresponding segmented neuron ROIs. We then constructed an N-by-N matrix (where N is the total number of segmented neuron ROIs across all animals, i.e. approximately the number of ROIs per animal multiplied by the number of animals). Entries in the matrix are zero if the two neurons were in an image pair that was never registered or if the two neurons did not overlap at all in the registered image pair. Otherwise, a heuristic value indicating the likelihood that the neurons are the same was entered into the matrix. This heuristic included the same information as in ANTSUN 2.0 (described above), such as registration quality and ROI position similarity. The only difference was that the heuristic for tagRFP brightness similarity was replaced with a heuristic for four-channel color similarity (see Materials and methods). Custom hierarchical clustering of the rows of this matrix then identified groups of ROIs hypothesized to be the same cell type identified in different animals.
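As an illustration, a minimal sketch of how such a sparse, symmetric ROI similarity matrix could be assembled is shown below. The actual heuristic terms and weights are detailed in Materials and methods; the function and variable names here are hypothetical.

```python
import numpy as np
from scipy.sparse import lil_matrix

def build_similarity_matrix(n_rois, registered_pairs, heuristic):
    """Assemble the sparse ROI-by-ROI similarity matrix.

    registered_pairs : iterable of (roi_i, roi_j) index pairs whose images
        were registered and whose warped ROIs overlapped.
    heuristic : function returning a similarity score for such a pair
        (e.g. combining registration quality, position, and color similarity).
    """
    S = lil_matrix((n_rois, n_rois))
    for i, j in registered_pairs:
        s = heuristic(i, j)
        S[i, j] = s
        S[j, i] = s            # keep the matrix symmetric
    return S.tocsr()           # sparse: most ROI pairs were never registered
```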
To determine the performance of this unsupervised cell type discovery approach, we quantified both the number of cell types that were discovered (i.e. number of clusters) and the accuracy of cell type labeling within each cluster. Here, accuracy was computed by first determining the most frequent neuron label for each cell type, based on the human labels. We then determined the number of correct versus incorrect detections of this cell type for all cells that fell within the cluster. A correct detection was defined to be when the human label for that cell matched the most frequent label for that cell’s cluster. The number of cell types identified and the labeling accuracy are directly related: more permissive clustering identifies more cell types, but at the cost of lower accuracy. A full curve revealing this tradeoff is shown in Figure 6F (see Materials and methods). Based on this curve, we selected the clustering parameters that identified 125 cell types with 93% labeling accuracy. Not every cell type is detected in every animal. On the testing data, the CellDiscoveryNet-powered ANTSUN 2 U roughly matched human-level performance in terms of accuracy and number of neurons labeled per animal (Figure 6G–H). However, it fell slightly short of AutoCellLabeler (Figure 6G–H). Overall, this analysis reveals that CellDiscoveryNet facilitates unsupervised cell type discovery with a high level of performance, matching trained human labelers. Specifically, this network, combined with our clustering approach, can identify 125 canonical cell types across animals and assign cells to these cell type classes with 93% accuracy.
We examined whether the accuracy of cell identification differed across cell types or across animals. Figure 6I shows the accuracy of labeling for each cell type (see Figure 6—figure supplement 1A for per-animal accuracy). Indeed, results were mixed: some cell types had highly accurate detections across animals (e.g. OLQD and RME), whereas a smaller subset of cell types was detected with lower accuracy (e.g. AIZ and ASG), and for yet other cell types accuracy was harder to assess due to a smaller number of human labels (e.g. AIM and I4). In addition, there were five clusters that did not contain a sufficient number of human-labeled ROIs to be given a cell type label (<3 cells in these clusters had matching human labels; these are labeled ‘NEW 1’ through ‘NEW 5’). To examine which neurons these might correspond to, we examined the high-confidence AutoCellLabeler labels for ROIs in these clusters. This produced enough labels to categorize four of these five clusters as SAAD, SMBD, VB02, and VB02. The repeated VB02 label is likely an indication of under-clustering (i.e. the two VB02 clusters should have been merged into the same cluster). The identity of the fifth cluster was unclear, as the ROIs in that cluster were not well labeled by either humans or AutoCellLabeler.
Finally, we examined whether CellDiscoveryNet was able to label cells not detected by AutoCellLabeler. Specifically, we determined the fraction of the cells detected by CellDiscoveryNet that were also labeled by AutoCellLabeler, which was 86%. The new unsupervised detections (the remaining 14%) included: new labels for cells that were otherwise well-labeled by AutoCellLabeler (e.g. M3); the detection and labeling of several cell types that were uncommonly labeled by AutoCellLabeler (e.g. RMEV); and the previously mentioned cell type that could not be identified based on human labels. This suggests that the unsupervised approach that we describe here is able to provide cell annotations that were not possible via human labeling or AutoCellLabeler. Overall, these results show how CellDiscoveryNet, together with our clustering approach, can identify >120 cell type classes that occur across animals and assign cells to these classes with 93% accuracy. This unsupervised discovery of cell types occurs in the absence of any human labels.
Discussion
Aligning and annotating the cell types that make up complex tissues remains a key challenge in computational image analysis. We trained a series of deep neural networks that allow for automated non-rigid registration and neuron identification in the context of brain-wide calcium imaging in freely moving C. elegans. This provides an appealing test case for the development of such tools. C. elegans movement creates major challenges with tissue deformation, and the animal has >100 defined cell types in its nervous system. We describe BrainAlignNet, which can perform non-rigid registration of the neurons of the C. elegans head, allowing for 99.6% accuracy in aligning individual neurons. We further show that this approach readily generalizes to another system, the jellyfish C. hemisphaerica. We also describe AutoCellLabeler, which can automatically label >100 neuronal cell types with 98% accuracy, exceeding the performance of individual human labelers by aggregating their knowledge. Finally, CellDiscoveryNet aligns data across animals to perform unsupervised discovery of stereotyped cell types, identifying >100 cell types of the C. elegans nervous system from unlabeled data. These tools should be straightforward to generalize to analyses of complex tissues in other species beyond those tested here.
Our newly described network for freely moving worm registration on average aligns neurons to within ~1.5 pixels of one another. Incorporating the network into a full image processing pipeline allows us to link neurons across time with 99.6% accuracy. Training a network to achieve this high performance highlighted a series of general challenges. For example, our attempt to train this network in a fully unsupervised manner (i.e. to simply align two images with no further information) failed. While the resulting networks aligned RFP images of testing data nearly perfectly, it turned out that the image transformations underlying this registration reflected a scrambling of pixels and that the network was not warping the images in the manner that the animal actually bends. Regularization and a semi-supervised training procedure ultimately prevented this failure mode.
Another limitation was that even with the semi-supervised approach, we were only able to train networks to register images from reasonably well-initialized conditions. Specifically, we provided Euler-registered image pairs that were selected to have moderately similar head curvature (although we note that these examples still had fairly dramatic non-rigid deformations; see Figure 1). Solving this problem was sufficient to fully align neurons from freely moving C. elegans brain-wide calcium imaging, since clustering could effectively be used to link identity across all timepoints even if our image registration only aligned a subset of the image pairs. Our attempts to train a network to register all time points to one another were unsuccessful, although a variety of approaches could conceivably improve upon this moving forward.
Once we had learned how to optimize BrainAlignNet to work in C. elegans, we were able to use a similar approach to align neurons in the jellyfish nervous system. Simple modifications to the DDF generation and loss function allowed us to restrict warping to the two-dimensional space of the jellyfish images, while images were padded to maintain similarity to the imaging paradigm of the worm. Overall, the changes that were made were minimal. The results were also extremely similar to those in C. elegans. We again found that Euler alignment of image pairs prior to network alignment was critical for accurate warping, suggesting that solving macro deformations before the non-rigid deformations is a generally robust approach. Moreover, we found that it was feasible to align neurons within ~1.5 pixels of one another in jellyfish, just like in C. elegans. Having solved this alignment in multiple systems now, we believe it should be straightforward to similarly modify BrainAlignNet for other species as well.
The AutoCellLabeler network that we describe here successfully automated a task that previously required several hours of manual labeling per dataset. It achieves 98% accuracy in cell identification and confidently labels more neurons per dataset than individual human labelers. This performance required a pixel weighting scheme where the network was trained to be especially sensitive to high-confidence labels of neurons that were not ubiquitously labeled by all human labelers. In other words, the network could aggregate knowledge across human labelers and example animals to achieve high performance. While the high performance of AutoCellLabeler is extremely useful from a practical point of view, we note that AutoCellLabeler still cannot label all ROIs in a given image, which would be the highest level of desirable performance. Our analyses above suggest that it is currently bounded by human labeling of training data, which in turn is bounded by our NeuroPAL image quality and the ambiguity of labeling certain neurons in the NeuroPAL strain.
While improvements in human labeling could improve performance of the network, this analysis also highlighted that it would be highly desirable to perform fully unsupervised cell labeling, where the cell types could be inferred and labeled in multispectral images even without any human labeling. To accomplish this, we developed CellDiscoveryNet, which aligns NeuroPAL images across animals. Together with a custom clustering approach, this enabled us to identify 125 neuron classes, labeling them with 93% accuracy in a completely unsupervised manner. This approach could be very useful within the C. elegans system, since it is extremely time-consuming to perform human labeling, and it is conceivable that the NeuroPAL labels may change in different genotypes or if the NeuroPAL transgene is modified. Beyond C. elegans, these unsupervised approaches should be useful, since most tissues in larger animals do not yet have a full catalog of cell types and therefore would greatly benefit from unsupervised discovery. In this spirit, other recent studies have started to develop approaches for unsupervised labeling of imaging data (Brbić et al., 2022; Kim et al., 2022; Liu et al., 2023), although these efforts were not aimed at identifying the full set of cellular subtypes (>100) in individual images, which was the chief objective of CellDiscoveryNet.
These approaches for registering and annotating cells in dense tissues should be straightforward to generalize to other species. For example, variants of BrainAlignNet could be trained to facilitate alignment of tissue sections or to register single-cell resolution imaging data onto a common anatomical reference. Our results suggest that training these networks on subsets of data with labeled feature points, such as cell centroids (i.e. the semi-supervised approach we use here), will facilitate more accurate solutions that, after training, can still be applied to datasets without any labeled feature points. In addition, variants of AutoCellLabeler could be trained on any multi-color cellular imaging data with manual labels. A pixel-wise labeling approach, together with appropriate pixel weighting during training, should be generally useful to build models for automatic cell labeling in a range of different tissues and animals. Finally, models similar to CellDiscoveryNet could be broadly useful to identify previously uncharacterized cell types in many tissues. It is conceivable that hybrid or iterative versions of AutoCellLabeler and CellDiscoveryNet could lead to even higher performance cell type discovery and labeling.
Materials and methods
C. elegans strains and genetics
All data were collected from 1-day-old adult hermaphrodite C. elegans animals raised at 22 °C on nematode growth medium (NGM) plates with E. coli strain OP50.
For the GCaMP-expressing animals without NeuroPAL, two transgenes were present: (1) flvIs17: tag-168::NLS-GCaMP7F+NLS-tagRFPt expressed under a small set of cell-specific promoters: gcy-28.d, ceh-36, inx-1, mod-1, tph-1(short), gcy-5, gcy-7; and (2) flvIs18: tag-168::NLS-mNeptune2.5. This resulting strain, SWF415, has been previously characterized (Atanas et al., 2023).
For the GCaMP-expressing animals with NeuroPAL, two transgenes were present in the strain: (1) flvIs17: described above; and (2) otIs670: low-brightness NeuroPAL. This resulting strain, named SWF702, has been previously characterized (Atanas et al., 2023).
The animals with eat-4::NLS-GFP and tag-168::NLS-GFP were also previously described (Atanas et al., 2023). As is described in the strain list, tag-168::NLS-mNeptune2.5 was also co-injected with each of these plasmids to generate the two strains: SWF360 (eat-4::NLS-GFP; tag-168::NLS-mNeptune2.5) and SWF467 (tag-168::NLS-GFP; tag-168::NLS-mNeptune2.5).
We provide here a list of these four strains:
SWF415 flvIs17[tag-168::NLS-GCaMP7F, gcy-28.d::NLS-tag-RFPt, ceh-36:NLS-tag-RFPt, inx-1::tag-RFPt, mod-1::tag-RFPt, tph-1(short)::NLS-tag-RFPt, gcy-5::NLS-tag-RFPt, gcy-7::NLS-tag-RFPt]; flvIs18[tag-168::NLS-mNeptune2.5]; lite-1(ce314); gur-3(ok2245)
SWF702 flvIs17; otIs670 [low-brightness NeuroPAL]; lite-1(ce314); gur-3(ok2245)
SWF360 flvEx450[eat-4::NLS-GFP, tag-168::NLS-mNeptune2.5]; lite-1(ce314); gur-3(ok2245)
SWF467 flvEx451[tag-168::NLS-GFP, tag-168::NLS-mNeptune2.5]; lite-1(ce314); gur-3(ok2245)
Jellyfish strains
For jellyfish recordings, we utilized the following strain, which has been previously described (Weissbourd et al., 2021):
Clytia hemisphaerica: RFamide::NTR-2A-mCherry.
C. elegans microscope and recording conditions
Data used to train and evaluate the models include previously published datasets (Atanas et al., 2023; Pradhan et al., 2025; Dag et al., 2023) and newly collected data. These animals were recorded under similar recording conditions to those described in our previous study (Atanas et al., 2023). There were two types of datasets collected, relevant to this study: freely moving GCaMP/TagRFP data and immobilized NeuroPAL data.
Briefly, all neural data (free-moving and NeuroPAL) were acquired on a dual light-path microscope that was previously described (Atanas et al., 2023). The light path used to image GCaMP, mNeptune, and the fluorophores in NeuroPAL at single-cell resolution is an Andor spinning disk confocal system built on a Nikon ECLIPSE Ti microscope. Laser light (405 nm, 488 nm, 560 nm, or 637 nm) passes through a 5000 rpm Yokogawa CSU-X1 spinning disk unit (with Borealis upgrade and dual-camera configuration). For imaging, a 40 x water immersion objective (CFI APO LWD 40 X WI 1.15 NA LAMBDA S, Nikon) with an objective piezo (P-726 PIFOC, Physik Instrumente [PI]) was used to image the worm’s head (Newport NP0140SG objective piezo was used in some recordings). A dichroic mirror directed light from the specimen to two separate sCMOS cameras (Zyla 4.2 PLUS sCMOS, Andor) with in-line emission filters (525/50 for GCaMP/GFP, and 570 longpass for tagRFP/mNeptune in freely moving recordings; NeuroPAL filters described below). The volume rate of acquisition was 1.7 Hz (1.4 Hz for the datasets acquired with the Newport piezo).
For recordings, L4 worms were picked 18–22 hr before the imaging experiment to a new NGM agar plate seeded with OP50 to ensure that we recorded 1-day-old adults. Animals were recorded on a thin, flat NGM agar pad (2.5 cm x 1.8 cm x 0.8 mm). On the four corners of this agar pad, we pipetted a single layer of microbeads with diameters of 80 µm to alleviate the pressure of the coverslip (#1.5) on the worm. Animals were transferred to the agar pad in a drop of M9, after which the coverslip was added.
For NeuroPAL data collection, animals were immobilized via cooling to 1 °C, after which multi-spectral information was captured. We obtained a series of images from each recorded animal while the animal was immobilized (as previously described; Atanas et al., 2023):
(1-3) Isolated images of mTagBFP2, CyOFP1, and mNeptune2.5: We excited CyOFP1 with a 488 nm laser at 32% intensity using a 585/40 bandpass filter. mNeptune2.5 was then recorded with a 637 nm laser at 48% intensity under a 655LP-TRF filter (to not contaminate this recording with TagRFP-T emissions). mTagBFP2 was then isolated with a 405 nm laser at 27% intensity and a 447/60 bandpass filter.
(4) An image with TagRFP-T, CyOFP1, and mNeptune2.5 (all ‘red’ markers) in one channel, and GCaMP7f in another channel. As described in our previous study, this image was used for neuronal segmentation and registration to both the freely moving recording and individually isolated marker images. We excited TagRFP-T and mNeptune2.5 with a 561 nm laser at 15% intensity and CyOFP1 and GCaMP7f with a 488 nm laser at 17% intensity. TagRFP-T, mNeptune2.5, and CyOFP1 were imaged using a 570LP filter, and GCaMP7f was imaged using a 525/50 bandpass filter.
Each of these images was recorded for 60 time points. We effectively increased the SNR of each image by registering all 60 time points for a given image to one another and averaging the transformed images, creating an average image. To facilitate manual labeling of these datasets, we made a composite, three-dimensional RGB image in which we set the mTagBFP2 image to blue, the CyOFP1 image to green, and the mNeptune2.5 image to red, as done by Yemini et al., 2021. We manually adjusted channel intensities to optimally match the reference images in their NeuroPAL manual.
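A minimal sketch of how such an RGB composite can be assembled from the time-averaged channel stacks is shown below. The channel gains are placeholders for the manually tuned intensities, and the function name is illustrative rather than the exact script we used.

```python
import numpy as np

def make_neuropal_composite(bfp, cyofp, mneptune, gains=(1.0, 1.0, 1.0)):
    """Build an RGB composite from time-averaged NeuroPAL channel stacks.

    bfp, cyofp, mneptune : 3D arrays (already registered and averaged over
        the 60 acquired timepoints). `gains` are manually tuned intensities.
    """
    def norm(img, gain):
        img = img.astype(float) * gain
        return np.clip(img / img.max(), 0, 1)

    # channel assignment follows the convention above:
    # mTagBFP2 = blue, CyOFP1 = green, mNeptune2.5 = red
    rgb = np.stack([norm(mneptune, gains[2]),
                    norm(cyofp, gains[1]),
                    norm(bfp, gains[0])], axis=-1)
    return rgb
```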
C. hemisphaerica microscope and recording conditions
Clytia medusae were mounted on a glass slide in artificial seawater (ASW; Weissbourd et al., 2021) and gently compressed under a glass coverslip, using a small amount of Vaseline as a spacer, so that swimming and margin folding motions were visible but limited. The imaging ROI included a segment of the nerve rings and subumbrella. Widefield epifluorescent images of transgenic mCherry fluorescence were acquired at 20 Hz, with 5 ms exposure times, using an Olympus BX51WI microscope, a Photometrics Prime95B camera, and Olympus cellSens software. Videos without visible motion, or with significant lateral drift, were discarded.
Availability of data, code, and materials
All data are freely and publicly available at the following link:
All code is freely and publicly available (use main/master branches):
BrainAlignNet for worms: https://github.com/flavell-lab/BrainAlignNet, flavell-lab, 2024a and https://github.com/flavell-lab/DeepReg, flavell-lab, 2024b.
BrainAlignNet for jellyfish: https://github.com/flavell-lab/BrainAlignNet_Jellyfish, flavell-lab, 2025a.
GPU-accelerated Euler registration: https://github.com/flavell-lab/euler_gpu, flavell-lab, 2024c.
ANTSUN 2.0: https://github.com/flavell-lab/ANTSUN, flavell-lab, 2024d (branch v2.1.0); see also https://github.com/flavell-lab/flv-c-setup and https://github.com/flavell-lab/FlavellPkg.jl/blob/master/src/ANTSUN.jl for auxiliary package installation.
AutoCellLabeler: https://github.com/flavell-lab/pytorch-3dunet, flavell-lab, 2020 and https://github.com/flavell-lab/AutoCellLabeler, flavell-lab, 2024g.
CellDiscoveryNet: https://github.com/flavell-lab/DeepReg, flavell-lab, 2024b.
ANTSUN 2 U: https://github.com/flavell-lab/ANTSUN-Unsupervised, flavell-lab, 2024h.
All materials (strains, etc) are freely available upon request.
BrainAlignNet
Network architecture
BrainAlignNet’s architecture is derived from the DeepReg software package, which uses a variation of a 3-D U-Net architecture termed a LocalNet (Fu et al., 2020; Hu et al., 2018). BrainAlignNet first has a concatenation layer that concatenates the moving and fixed images together along a new, channel dimension. The resulting 284×120×64×2 image (x, y, z, channel) is then passed as input to the LocalNet, which outputs a 284×120×64×3 dense displacement field (DDF). The DDF defines a coordinate transformation from fixed image coordinates to moving image coordinates, relative to the fixed image coordinate system. So, for instance, if $\mathrm{DDF}(x,y,z) = (\Delta x, \Delta y, \Delta z)$, it means that the coordinates (x,y,z) in the fixed image are mapped to the coordinates $(x+\Delta x,\, y+\Delta y,\, z+\Delta z)$ in the moving image. The network has a final warping layer that applies the DDF to transform the moving image into a predicted fixed image whose pixel at location (x,y,z) contains the moving image pixel at location $(x,y,z) + \mathrm{DDF}(x,y,z)$. It also has another final warping layer that transforms the fixed image centroids (x,y,z) into predicted moving image centroids $(x,y,z) + \mathrm{DDF}(x,y,z)$. The network’s loss function causes it to seek to minimize the difference between its predictions and the corresponding input data.
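To illustrate this coordinate convention, the sketch below warps a moving image and a set of fixed-image centroids with a DDF using NumPy/SciPy. It is a simplified stand-in for the network’s warping layers, not the actual implementation; interpolation details and batching are omitted, and all names are illustrative.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_with_ddf(moving, ddf):
    """Warp the moving image into a predicted fixed image.

    moving : 3D array (x, y, z).
    ddf    : 4D array (x, y, z, 3) of displacements in fixed-image coordinates.
    The predicted fixed image at (x, y, z) samples the moving image at
    (x, y, z) + ddf[x, y, z].
    """
    grid = np.stack(np.meshgrid(*[np.arange(s) for s in moving.shape],
                                indexing="ij"), axis=-1)
    sample_coords = (grid + ddf).transpose(3, 0, 1, 2)   # shape (3, X, Y, Z)
    return map_coordinates(moving, sample_coords, order=1, mode="constant")

def warp_centroids(fixed_centroids, ddf):
    """Map fixed-image centroids (N, 3) to predicted moving-image centroids,
    using the DDF value at the nearest voxel to each centroid."""
    idx = np.clip(np.round(fixed_centroids).astype(int), 0,
                  np.array(ddf.shape[:3]) - 1)
    return fixed_centroids + ddf[idx[:, 0], idx[:, 1], idx[:, 2]]
```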
The LocalNet is at its core a 3-D U-Net with an additional output layer that receives inputs from multiple output levels. In more detail, it has three input levels and three output levels, with the number of feature channels increasing at each level $i$, for $i \in \{0, 1, 2\}$. It contains an encoder block mapping the input to level 0, followed by two more encoder blocks mapping input level $i-1$ to level $i$ for $i \in \{1, 2\}$. Each of these three encoder blocks contains a convolutional block, a residual convolutional block, and a 2×2×2 max-pool layer. The convolutional block consists of a 3-D convolutional layer with kernel size 3 that doubles the number of feature channels, followed by a batch normalization layer, followed by a ReLU activation function. The residual convolutional block consists of two convolutional blocks in sequence, except that the input (to the residual convolutional block) is added to the output of the second convolutional block right before its ReLU activation function. The bottom block comes after the encoder block at level 2, mapping input level 2 to output level 2. It has the same architecture as a single convolutional block; notably, it does not contain the max-pool layer.
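For illustration, the convolutional and residual convolutional blocks described above could be written as follows in PyTorch. The released implementation builds on DeepReg (TensorFlow), so this sketch only mirrors the block structure; channel bookkeeping is simplified and the class names are hypothetical.

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv3d (kernel 3), then BatchNorm and ReLU. In the encoder, out_ch is
    typically 2 * in_ch, i.e. the block doubles the feature channels."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

class ResidualConvBlock(nn.Module):
    """Two conv blocks in sequence; the block input is added back in
    right before the final ReLU (channel count kept constant here)."""
    def __init__(self, channels):
        super().__init__()
        self.block1 = ConvBlock(channels, channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm3d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        h = self.block1(x)
        h = self.bn2(self.conv2(h))
        return self.relu(h + x)   # residual connection before the ReLU
```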
There are three decoder blocks receiving inputs from the three encoder blocks described above. The first two decoder blocks map output level $i$ to output level $i-1$ for $i \in \{2, 1\}$; the third one maps output level 0 to the preliminary output with the same dimensions as the input. Each decoding block consists of an upsampling block, a skip-connection layer, a convolutional block, and a residual convolutional block. The upsampling block contains a transposed 3D convolutional layer with kernel size 3 that halves the number of feature channels and an image resizing layer (run independently on the upsampling block’s input) using bilinear interpolation to double each dimension of the image. The output of the resizing layer is then split into two equal pieces along the channel axis and summed, and then added to the output of the transposed convolutional layer. The skip-connection layer appends the output of the mirrored encoder block (for the third decoder block, this corresponds to the first encoder block) right before that encoder block’s max pool layer. The skip-connection layer appends this output to the channel dimension, doubling its size. The convolutional and residual convolutional blocks are identical to those in the encoding block, except that the convolutional block halves the number of input channels instead of doubling it.
Finally, there is the output layer. It takes as input the output of the bottom block, as well as the output of every decoder block. To each of these inputs, it applies a 3D convolutional layer that outputs exactly three channels, followed by an upsampling layer that uses bilinear interpolation to increase the dimensions to the size of the original input images. It then averages together all of these images to compute the final 284×120×64×3 DDF.
Preprocessing
To train and validate a registration network that aligns neurons across time series in freely moving C. elegans, we took several steps to prepare the calcium imaging datasets with images and their corresponding centroids. The preprocessing procedure consisted of (i) selecting two different time points from a single video (fixed and moving time points) at which to obtain RFP images (all images given to the network are from the red channel, which contains the signal from NLS-TagRFP) and neuron centroids; (ii) cropping all RFP images to a consistent size; (iii) performing Euler registration (translation and rotation) to align neurons from the image at the moving time point (moving image) to the image at the fixed time point (fixed image); (iv) creating image centroids for the network, which consist of matched lists of centroid positions of all the neurons in both the fixed and moving images.
Selection of registration problems
We refer to the task of solving the transformation function that aligns neurons from the moving image to the fixed image as a registration problem. We selected our registration problems based on previously constructed (Atanas et al., 2023) image registration graphs using ANTSUN 1.4. In these registration graphs, the time points of a single calcium imaging recording served as vertices. An edge between two time points indicates a registration problem that we will attempt to solve. Edges were preferentially created between time points with higher worm posture similarities.
In ANTSUN 1.4, we selected approximately 13,000 pairs of time points (fixed and moving) per video that had sufficiently high worm posture similarity. These registration problems were solved by gradient descent using our old image processing pipeline, and ANTSUN clustering yielded linked neuron ROIs across frames that were the basis of constructing calcium traces (Atanas et al., 2023). To train BrainAlignNet here, we randomly sampled about 100 problems per animal across a total of 57 animals, ultimately compiling 5176 registration problems for training (some registration problems were discarded during subsequent preprocessing steps). To prepare the validation datasets, we sampled 1466 problems across 22 animals. Testing data comprised 447 problems from five animals.
Cropping
The registration network requires all 3D image volumes in training, validation, and testing to be of the same size. Therefore, a crucial step in preprocessing was to crop or pad the images along the x, y, z dimensions to a consistent size of (284, 120, 64). Before reshaping the images, we first subtracted the median pixel value from each image (both fixed and moving) and set the negative pixels to zero. Then, we either cropped or padded with zeros around the centers of mass of these images to make the x dimension 284, the y dimension 120, and the z dimension 64.
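A minimal sketch of this crop-or-pad step is shown below, assuming a single 3D volume and using SciPy’s center-of-mass routine; it is illustrative only, not the exact preprocessing code.

```python
import numpy as np
from scipy.ndimage import center_of_mass

def crop_or_pad(img, target=(284, 120, 64)):
    """Median-subtract, zero out negative pixels, then crop and/or zero-pad
    the volume to `target`, centered on the image's center of mass."""
    img = img.astype(float) - np.median(img)
    img[img < 0] = 0
    com = center_of_mass(img)

    out = np.zeros(target, dtype=img.dtype)
    src, dst = [], []
    for ax, t in enumerate(target):
        start = int(round(com[ax])) - t // 2      # desired source start (may be negative)
        src0, dst0 = max(start, 0), max(-start, 0)
        n = min(img.shape[ax] - src0, t - dst0)   # voxels to copy along this axis
        src.append(slice(src0, src0 + n))
        dst.append(slice(dst0, dst0 + n))
    out[tuple(dst)] = img[tuple(src)]
    return out
```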
Euler registration
Through experimentation with various settings of the network, we have found that it is difficult for the network to learn large rotations and translations at the same time as smaller nonlinear deformations. Euler registration is far more computationally tractable than nonlinear deformation, so we solved Euler registration for the images before providing them to the network. In Euler registration, we rotate or translate the moving images by a certain amount to create predicted fixed images, aiming to maximize their normalized cross-correlation (NCC) with the fixed image. The NCC between a fixed image $F$ and a predicted fixed image $\tilde{F}$ is defined as follows:

$$\mathrm{NCC}(F, \tilde{F}) = \frac{1}{N_x N_y N_z} \sum_{x,y,z} \frac{\left(F(x,y,z) - \mu_F\right)\left(\tilde{F}(x,y,z) - \mu_{\tilde{F}}\right)}{\sqrt{\sigma_F^2 \, \sigma_{\tilde{F}}^2}}$$

Here, $N_x$, $N_y$, and $N_z$ are the dimensions of the images, $\mu_F$ is the mean of the fixed image, $\mu_{\tilde{F}}$ is the mean of the predicted fixed image, $\sigma_F^2$ is the variance of the fixed image, and $\sigma_{\tilde{F}}^2$ is the variance of the predicted fixed image.
The optimal parameters of translation and rotation that resulted in the highest NCC were determined using a brute-force, GPU-accelerated parameter grid search. To further accelerate the grid search, we projected the fixed and moving images onto the x-y plane using a maximum-intensity projection along the z-axis (so the above NCC calculation only happens along 2 dimensions). We also downsampled the fixed and moving images by a factor of 4 after the z maximal projection. The best parameters identified for transforming the projected images were then applied to each z-slice to transform the entire 3D image. Finally, a translation along the z-axis (computed via brute force search to minimize the total 3D NCC) was applied. This approach was feasible because the vast majority of worm movement occurs along the x-y axes.
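A simplified, CPU-only sketch of this brute-force search over 2D projections is shown below. The angle and translation grids are placeholders, and the GPU acceleration, downsampling, and z-translation steps are omitted.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def ncc(a, b):
    """Normalized cross-correlation of two equally sized 2D images."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

def euler_grid_search(fixed, moving, angles, dxs, dys):
    """Brute-force search for the rotation/translation maximizing NCC.
    `fixed` and `moving` are 2D maximum-intensity projections."""
    best_score, best_params = -np.inf, None
    for theta in angles:
        rot = rotate(moving, theta, reshape=False, order=1)
        for dx in dxs:
            for dy in dys:
                cand = shift(rot, (dx, dy), order=1)
                score = ncc(fixed, cand)
                if score > best_score:
                    best_score, best_params = score, (theta, dx, dy)
    return best_score, best_params

# example (hypothetical) parameter grids:
# euler_grid_search(fix2d, mov2d, np.arange(-180, 180, 5),
#                   np.arange(-20, 21, 2), np.arange(-20, 21, 2))
```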
Creating image centroids
We obtained the neuronal ROI images for both the fixed and moving RFP images, designating them as the fixed and moving ROI images, respectively. The full sets of ROIs in each image were obtained using ANTSUN 1.4’s image segmentation and watershedding functions. ROI images were then constructed as follows. Each pixel in an ROI image contains an index value: 0 for background, or a positive integer for a neuron. All pixels belonging to a specific neuron have the same index, and pixels belonging to any other neuron have a different index. Since the ROI images are created independently at each time point, their neuronal indices are not a priori consistent across time points. Therefore, we used previous runs of ANTSUN 1.4 to link the ROI identities across time points and generated new ROI images with consistent indices across time points – for example, all pixels with value 6 in one time point correspond to the same neuron as pixels with value 6 in any other time point. We deleted any ROIs with indices that were not present in both the moving and fixed images.
We then cropped these ROI images to the same size and subjected them to Euler transformations using the same parameters as their corresponding fixed and moving RFP images. Next, we computed the centroids of each neuron index in the resulting moving and fixed ROI images. The centroid was defined to be the mean x, y, and z coordinates of all pixels of a given ROI. We stored these centroids as two lists of equal length (typically, around 110). Note that these lists are now the matched positions of neurons in the fixed and moving images.
Since the network expects image centroids to be of the same size, all neuronal centroids in the fixed and moving images were padded and aggregated into arrays of shape (200, 3), ensuring the same ordering of neurons. The extra entries that do not contain neurons are filled with (−1, −1, −1) to make the total number of neurons equal to 200. We designate the neuronal centroid positions in the fixed and moving ROI images as fixed and moving centroids, respectively.
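A minimal sketch of how matched, padded centroid arrays could be computed from a pair of index-consistent ROI images is shown below; the function name and padding convention follow the description above but the code itself is illustrative only.

```python
import numpy as np

def matched_centroids(fixed_roi_img, moving_roi_img, max_neurons=200):
    """Compute matched centroid arrays of shape (max_neurons, 3) from two
    ROI index images that share neuron indices; unused rows are (-1, -1, -1)."""
    shared = sorted((set(np.unique(fixed_roi_img)) &
                     set(np.unique(moving_roi_img))) - {0})
    fixed_c = np.full((max_neurons, 3), -1.0)
    moving_c = np.full((max_neurons, 3), -1.0)
    for row, idx in enumerate(shared[:max_neurons]):
        fixed_c[row] = np.mean(np.argwhere(fixed_roi_img == idx), axis=0)
        moving_c[row] = np.mean(np.argwhere(moving_roi_img == idx), axis=0)
    return fixed_c, moving_c
```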
Loss functions
Our main custom modifications to the DeepReg network focus on the design of the loss function. In particular, we implemented a new supervised centroid alignment loss component and new regularization loss sub-components. Overall, the loss function consists of three major components:
Image loss captures the difference between the warped moving image and the ground-truth fixed image.
Centroid alignment loss is a supervised portion of the loss function. Given pre-labeled centroids corresponding to ground-truth information about neuron positions in the fixed and moving images, this loss component captures the difference between the predicted moving centroids and the ground-truth moving centroids.
Regularization loss captures the prior that the “simplest” DDF that achieves the desired transform outcome is the best. For example, it’s implausible that a pair of neurons that start close together end up on opposite sides of the worm, so a DDF that generates such a transformation would have a high value of regularization loss.
The total loss is then computed as the weighted sum $\mathcal{L} = \lambda_{\text{image}} \mathcal{L}_{\text{image}} + \lambda_{\text{centroid}} \mathcal{L}_{\text{centroid}} + \lambda_{\text{reg}} \mathcal{L}_{\text{reg}}$. We set $\lambda_{\text{image}}$, $\lambda_{\text{centroid}}$, and $\lambda_{\text{reg}}$ to fixed values during training.
Image loss
The image loss is the negative of the local squared zero-normalized cross-correlation (LNCC) between the fixed and warped moving RFP images. We designate the fixed image as $F$ and the warped moving image as $\tilde{M}$. Define $\mathbb{E}_n[I]$ as a function that computes the discrete expectation of image $I$ within a sliding cube of side length n=16:

$$\mathbb{E}_n[I](x,y,z) = \frac{1}{n^3} \sum_{i,j,k} I(x+i,\, y+j,\, z+k), \quad i, j, k \in \left[-\tfrac{n}{2},\, \tfrac{n}{2}\right)$$

We then can compute the discrete sliding variance as

$$\mathrm{Var}_n[I] = \mathbb{E}_n[I^2] - \left(\mathbb{E}_n[I]\right)^2$$

The image loss (i.e. negative LNCC) is then defined as

$$\mathcal{L}_{\text{image}} = -\frac{1}{N_x N_y N_z} \sum_{x,y,z} \frac{\left(\mathbb{E}_n[F\tilde{M}] - \mathbb{E}_n[F]\,\mathbb{E}_n[\tilde{M}]\right)^2}{\mathrm{Var}_n[F]\,\mathrm{Var}_n[\tilde{M}]}$$
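For reference, the LNCC image loss can be sketched with SciPy’s uniform filter standing in for the sliding-cube expectation. The trained network computes this inside the DeepReg graph, so this is only an illustration; the epsilon term is an assumption for numerical stability.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lncc_loss(fixed, warped, n=16, eps=1e-5):
    """Negative local squared zero-normalized cross-correlation.
    uniform_filter implements the sliding-cube expectation E_n[.]."""
    fixed = fixed.astype(float)
    warped = warped.astype(float)
    E = lambda img: uniform_filter(img, size=n, mode="constant")
    mu_f, mu_w = E(fixed), E(warped)
    var_f = E(fixed * fixed) - mu_f ** 2
    var_w = E(warped * warped) - mu_w ** 2
    cov = E(fixed * warped) - mu_f * mu_w
    lncc = (cov ** 2) / (var_f * var_w + eps)
    return -float(lncc.mean())
```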
Centroid alignment loss
The centroid alignment loss is calculated as the negative of the sum of the Euclidean distances between the moving centroids and the network’s predicted moving centroids, averaged across the number of centroids available. We designate the ground truth and network predicted centroids as matrices $Y$ and $\hat{Y}$, respectively, where $C$ is the number of centroids, and the $i$th row of each matrix represents the coordinates of neuron $i$’s centroid. Centroid alignment loss in the overall loss function is then expressed as follows:

$$\mathcal{L}_{\text{centroid}} = -\frac{1}{C} \sum_{i=1}^{C} \left\lVert Y_i - \hat{Y}_i \right\rVert_2$$
Regularization loss
Our regularization loss function consists of four terms that seek to penalize DDFs that do not correspond to possible physical motion of the worm. Of these terms, gradient norm is unchanged from its previous implementation in the DeepReg package, while the other three components are our additions:
Gradient norm loss penalizes transformations for being nonuniform.
Difference norm loss penalizes transformations for moving pixels too far.
Axis difference norm loss penalizes transformations for moving pixels too far along the z-dimension, which is less plausible than movement along the x- and y-dimensions in our recordings.
Nonrigid penalty loss penalizes transformations for being nonrigid (i.e. not translation and rotation). (Note that unlike the gradient norm loss, this loss function will not penalize DDFs that apply rigid-body rotations.)
We then set the regularization loss to be a weighted sum of these four terms, $\mathcal{L}_{\text{reg}} = \lambda_{\text{grad}} \mathcal{L}_{\text{grad}} + \lambda_{\text{diff}} \mathcal{L}_{\text{diff}} + \lambda_{\text{axis}} \mathcal{L}_{\text{axis}} + \lambda_{\text{nonrigid}} \mathcal{L}_{\text{nonrigid}}$, with fixed weights for each term.
Gradient norm
The gradient norm computes the average gradient of the DDF by summing up the central finite difference of the DDF as the approximation of derivatives along the x, y, and z axes. Specifically, we first approximate the partial derivatives along each axis as follows (shown here for x):

$$\frac{\partial\, \mathrm{DDF}}{\partial x}(x,y,z) \approx \frac{\mathrm{DDF}(x+1,y,z) - \mathrm{DDF}(x-1,y,z)}{2}$$

and analogously for the y and z axes. These results are then stacked to obtain the gradient $\nabla \mathrm{DDF}$. The gradient norm is calculated as the squared sum of these derivatives, averaged across all elements:

$$\mathcal{L}_{\text{grad}} = \operatorname{mean}_{x,y,z,\,a,\,b}\left[\left(\frac{\partial\, \mathrm{DDF}_b}{\partial a}(x,y,z)\right)^2\right]$$

where the mean runs over all voxels, the three derivative axes $a$, and the three DDF components $b$.
Difference norm
The difference norm computes the average squared displacement of a pixel under the DDF:

$$\mathcal{L}_{\text{diff}} = \frac{1}{N_x N_y N_z} \sum_{x,y,z} \left\lVert \mathrm{DDF}(x,y,z) \right\rVert^2$$

where $N_x$, $N_y$, and $N_z$ are the sizes of the image along the x, y, and z axes, respectively.
Axis difference norm
Axis difference norm of the DDF calculates the average squared displacement of a pixel along the z-axis:

$$\mathcal{L}_{\text{axis}} = \frac{1}{N_x N_y N_z} \sum_{x,y,z} \mathrm{DDF}_z(x,y,z)^2$$
Nonrigid penalty
This term penalizes nonrigid transformations of the neurons by utilizing the gradient information of the DDF. Unlike the approach used in computing the gradient norm, where global rotations would have nonzero gradient, here we are interested in penalizing specifically nonrigid transforms. We accomplish this by constructing a reference DDF, denoted as $\mathrm{DDF}_0$, which warps the entire image to the origin: $\mathrm{DDF}_0(x,y,z) = -(x,y,z)$. Then the difference DDF $\mathrm{DDF}' = \mathrm{DDF} - \mathrm{DDF}_0$ has the property that the magnitude of its gradient is rotation-invariant. We can then compute $\nabla \mathrm{DDF}'$ as for the gradient norm and define the gradient magnitude:

$$g(x,y,z) = \left\lVert \nabla \mathrm{DDF}'(x,y,z) \right\rVert$$

Under any rigid-body transform, $g(x,y,z) = \sqrt{3}$. Thus, the nonrigid penalty is calculated as

$$\mathcal{L}_{\text{nonrigid}} = \frac{1}{N_x N_y N_z} \sum_{x,y,z} \left(g(x,y,z) - \sqrt{3}\right)^2$$

In this way, rigid-body transforms will have 0 loss while any nonrigid transform will have a positive loss.
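For illustration, three of the regularization terms can be sketched in NumPy as follows. np.gradient stands in for the explicit central finite differences, the nonrigid penalty is omitted, and the function names are illustrative rather than those in the released code.

```python
import numpy as np

def gradient_norm(ddf):
    """Mean squared central finite difference of the DDF over x, y, z.
    ddf has shape (X, Y, Z, 3)."""
    grads = np.gradient(ddf, axis=(0, 1, 2))   # list of 3 arrays, each (X, Y, Z, 3)
    return float(np.mean([np.mean(g ** 2) for g in grads]))

def difference_norm(ddf):
    """Mean squared pixel displacement under the DDF."""
    return float(np.mean(np.sum(ddf ** 2, axis=-1)))

def axis_difference_norm(ddf, axis=2):
    """Mean squared displacement along the z-axis only."""
    return float(np.mean(ddf[..., axis] ** 2))
```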
Data augmentation
During training, input data was subject to augmentation. We used random affine transformations for augmentation. Each transformation was generated by perturbing the corner points of a cube by random amounts and computing the affine transformation resulting in that perturbation. The same transformation was then applied to the moving image, fixed image, moving centroids, and fixed centroids.
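A minimal sketch of generating such a random affine augmentation by perturbing cube corners and solving a least-squares fit is shown below; the perturbation magnitude is a placeholder, and the function name is illustrative.

```python
import numpy as np

def random_affine_from_corners(scale=0.05, rng=None):
    """Perturb the 8 corners of the unit cube by random offsets and fit the
    affine transform (A, b) mapping original corners to perturbed corners."""
    rng = rng or np.random.default_rng()
    corners = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)],
                       dtype=float)
    perturbed = corners + rng.normal(scale=scale, size=corners.shape)
    # least-squares fit of [corners, 1] @ M = perturbed, with M of shape (4, 3)
    X = np.hstack([corners, np.ones((8, 1))])
    M, *_ = np.linalg.lstsq(X, perturbed, rcond=None)
    A, b = M[:3].T, M[3]          # new_point = A @ point + b
    return A, b

# The same (A, b) would then be applied to the moving image, fixed image,
# moving centroids, and fixed centroids of a training example.
```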
Optimizer
BrainAlignNet was trained using the Adam optimizer with a fixed learning rate (specified in the configuration file linked below).
Configuration file
The full configuration file we used during network training is available at https://github.com/flavell-lab/BrainAlignNet/tree/main/configs, flavell-lab, 2024i.
Automatic Neuron Tracking System for Unconstrained Nematodes (ANTSUN) 2.0
We integrated BrainAlignNet into our previously described ANTSUN pipeline (Atanas et al., 2023; Pradhan et al., 2025; also applied in Dag et al., 2023). Briefly, the pipeline: (i) performs some image pre-processing such as shear correction and cropping; (ii) segments the images into neuron ROIs via a 3D U-Net; (iii) finds time points where the worm postures are similar; (iv) performs image registration to define a coordinate mapping between these time points; (v) applies that coordinate mapping to the ROIs; (vi) constructs an ROI similarity matrix storing how likely different ROIs are to correspond to the same neuron; (vii) clusters that matrix to extract neuron identity; (viii) maps the linked ROIs onto the GCaMP data to extract neural traces; and (ix) performs some postprocessing such as background subtraction and bleach correction to extract neural traces.
The differences in ANTSUN 2.0 compared with our previously published version of this pipeline, ANTSUN 1.4, are that in ANTSUN 2.0 we use BrainAlignNet to perform image registration rather than the gradient descent-based elastix, and we modified the heuristic function used to construct the ROI similarity matrix. We only replaced the freely moving registration with BrainAlignNet; the immobilized registrations, channel alignment registration, and freely moving to immobilized registration are still performed with elastix. These remaining elastix-based registrations are much less computationally expensive, taking only about 2% of the total computation time of the original ANTSUN 1.4 pipeline. They will also likely be replaced with BrainAlignNet in a future release of ANTSUN, after further diagnostics and controls are run.
The heuristic function used to compute the ROI similarity matrix was updated to add additional terms specific to BrainAlignNet, including regularization and an additional ROI displacement term that serves to implement our prior that ROIs which moved less far in the registration are more likely to be correctly registered. Letting and be two different ROIs in our recording at time points (moving) and (fixed), the full expression for the ROI similarity matrix is:
where:
is 1 if there exists a registration mapping to , and 0 otherwise.
is the displacement of the centroid of ROI under the DDF registration between and .
is the registration quality, computed as the NCC of warped moving image vs fixed image .
is the fractional overlap of warped moving ROI and fixed ROI (intersection/max size).
is the absolute difference in marker channel activity (i.e. tagRFP brightness) between ROIs and , normalized to mean activity at the corresponding timepoints and .
is the distance between the centroids of warped moving ROI and fixed ROI .
is the (unweighted) nonrigid penalty loss of the DDF registration from to .
are weights controlling how important each variable is.
Additionally, the matrix is forced to be symmetrical by setting whenever and . It is also sparse since and are usually 0. Finally, there are two additional hyperparameters in the clustering algorithm, and . controls the minimum height the clustering algorithm will reach (effectively, is a cap on how low values can get, or how low the heuristic value can fall before determining that the ROIs are not the same neuron) and controls the acceptable collision fraction (a collision is defined by a cluster containing multiple ROIs from the same timepoint, which should not happen since each neuron should correspond to only one ROI at each time point).
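The exact functional form of this heuristic is defined in the released ANTSUN 2.0 code. Purely to illustrate how the terms above could be combined into a sparse, symmetric matrix, a hypothetical weighted-sum version is sketched below; the combination rule, variable names, and weight names are ours, not the pipeline's.

```python
import numpy as np

def build_similarity_matrix(pair_features, n_rois, w):
    """Assemble a sparse, symmetric ROI similarity matrix.

    pair_features maps (roi_i, roi_j) -> dict of the per-pair quantities
    described above, for ROI pairs linked by a registration; unregistered
    pairs are simply absent (the indicator term). The weighted-sum rule
    below is an illustrative stand-in for the actual ANTSUN heuristic.
    """
    S = np.zeros((n_rois, n_rois))
    for (i, j), f in pair_features.items():
        score = (w["quality"] * f["ncc"]
                 + w["overlap"] * f["roi_overlap"]
                 - w["displacement"] * f["roi_displacement"]
                 - w["marker"] * f["marker_diff"]
                 - w["distance"] * f["centroid_dist"]
                 - w["nonrigid"] * f["nonrigid_penalty"])
        S[i, j] = S[j, i] = score   # enforce symmetry, as described above
    return S
```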
We determined the weights by performing a grid search through 2912 different combinations of weights on three eat-4::NLS-GFP datasets. To evaluate the outcome of each combination, we computed the error rate (rate of incorrect neuron linkages) and number of detected neurons. The error rate was computed as previously described by Atanas et al., 2023: since the strain eat-4::NLS-GFP expresses GFP in some but not all neurons, we can quantify registration errors as instances where a GFP-positive neuron lacked GFP in a time point and vice versa, as these correspond to neuron mismatches. We then selected the combination of parameters that maximizes the number of detected neurons while minimizing the error rate. One eat-4::NLS-GFP dataset (the one shown in Figure 2) was used as a withheld testing animal to determine this optimal set of parameters. The pan-neuronal GFP and pan-neuronal GCaMP animals were not included in this parameter search.
The values of the parameters we used were:
Computing registration error using sparse GFP strain eat-4::NLS-GFP
As described in the text, we computed the error rate in neuron alignment by analyzing ANTSUN-extracted fluorescence for the eat-4::NLS-GFP strain. This strain has different levels of GFP in different neurons such that some neurons are GFP-positive and others are GFP-negative. Since alignment is done only using information in the red channel, we can then inspect the green signal to determine whether ‘traces’ are extracted correctly from this strain. Specifically, the intensity of the GFP signal in a given neuron should be constant across time points. We quantified errors in registration by looking for inconsistencies in the GFP signal across time. However, it was important to account for the fact that only certain errors would be detectable. For example, when a trace from a GFP-negative neuron mistakenly included a timepoint from a GFP-positive neuron, this could be detected. However, it would not be feasible to detect cases where a trace from a GFP-negative neuron mistakenly included a timepoint from a different GFP-negative neuron. As we described in a previous study (Atanas et al., 2023), this was quantified by first setting a threshold of to call a neuron as GFP-positive. This threshold resulted in of neurons being called as GFP-positive, which approximately matches expectations for the eat-4 promoter. Then, for each neuron, we quantified the number of time points where the neuron’s activity at that time point differed from its median by more than 1.5, and exactly one of [the neuron’s activity at that time point] and [its median activity] was larger than 1.5. These time points represent mismatches, since they correspond to GFP-negative neurons that were mismatched to GFP-positive neurons or vice versa. We then computed an error rate of as an estimate of the mis-registration rate of our pipeline. The term corrects for the fact that mis-registration errors that send GFP-negative to other GFP-negative neurons, or GFP-positive to other GFP-positive neurons, would otherwise not be detected by this analysis.
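The core of this mismatch count can be expressed as in the sketch below; the trace normalization, the GFP-positive threshold, and the final correction factor follow Atanas et al., 2023 and are not reproduced here, so the 1.5 threshold and the array layout should be read as illustrative.

```python
import numpy as np

def gfp_mismatch_rate(traces, threshold=1.5):
    """Fraction of timepoints inconsistent with each trace's median GFP level.

    traces: (n_neurons, n_timepoints) array of normalized GFP 'activity'
    extracted by ANTSUN from the eat-4::NLS-GFP recordings (layout assumed).
    """
    medians = np.median(traces, axis=1, keepdims=True)
    # mismatch: the timepoint and the median lie on opposite sides of the
    # threshold, and they differ by more than the threshold
    opposite_sides = (traces > threshold) != (medians > threshold)
    large_diff = np.abs(traces - medians) > threshold
    return np.mean(opposite_sides & large_diff)
```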
BrainAlignNet for jellyfish
Architecture
The network architecture for jellyfish BrainAlignNet is essentially as described above, with the notable exception that deformations in the z axis are masked out. This masking was applied after the final upsampling layer of the LocalNet, which produces a DDF. The component of this DDF corresponding to z deformations (the last slice of the depth-3 dimension) was zeroed out to keep pixels contained within their respective z slices. This step was necessary, rather than simply increasing the weight of the axis difference norm term of the hybrid regularization loss drastically, because even small shifts out of plane cause pixel values to decrease when interpolated.
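In PyTorch-like pseudocode, the masking step amounts to the following; the (…, 3) layout of the DDF with z as the last component is an assumption of the sketch.

```python
import torch

def mask_z_deformation(ddf: torch.Tensor) -> torch.Tensor:
    """Zero the z-component of a dense displacement field.

    ddf: tensor whose last dimension holds the (x, y, z) displacements
    (ordering assumed). Zeroing the final slice keeps every pixel within
    its original z slice, as described above.
    """
    masked = ddf.clone()
    masked[..., -1] = 0.0
    return masked
```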
Preprocessing
In preparation for training, frames were randomly sampled to make up the ‘moving images’. However, in contrast to the C. elegans network, the ‘fixed images’ were all the first frame of each dataset. (However, given the network’s ability to generalize to new datasets, it seems that taking this step to standardize fixed images was not necessary.) These image pairs were then cropped to a specified size and padded once above and below along the z axis with the minimum value of the image. It was important that the padding value was within the range of the dataset, as padding with zeros would cause normalization to drastically reduce the dynamic range of the images if the baseline intensity is not close to zero. This procedure results in W×H×3 images, on which the network was trained. Note that the training image pairs were not Euler aligned. At inference time, the data was initialized via Euler registration, as described above for C. elegans. We found that training on harder problems (non-Euler-aligned frame pairs) and then performing inference on Euler-initialized pairs improved performance.
Loss function
The loss function of jellyfish BrainAlignNet was mostly unaltered from its standard counterpart, with two exceptions: (1) there was no centroid alignment loss, and (2) the gradient norm term of the hybrid regularization loss was modified to no longer consider deformations in the z axis. Without this latter change, the warping generated by the network tended to be blurry and spread out. The gradient norm was thus calculated as follows:
Data augmentation
Likewise, data augmentation was very similar to the C. elegans network, utilizing affine transformations as described above. As an additional step, however, each time an image pair was presented during training, a random integer between 0 and 3 was generated, and both images were rotated by 90 degrees that many times. This was vital to allow the network to generalize and avoid overfitting.
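A minimal sketch of this quarter-turn augmentation, assuming the rotation is applied in the xy plane of (W, H, D) volumes:

```python
import numpy as np

def random_quarter_turn(moving, fixed, rng):
    """Rotate both images of a training pair by the same random multiple of 90 degrees."""
    k = rng.integers(0, 4)   # 0, 1, 2, or 3 quarter turns
    return (np.rot90(moving, k, axes=(0, 1)),
            np.rot90(fixed, k, axes=(0, 1)))

# example usage
rng = np.random.default_rng(0)
mov, fix = random_quarter_turn(np.zeros((64, 64, 3)), np.zeros((64, 64, 3)), rng)
```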
Configuration file
The full configuration file used during training of the jellyfish network is available at https://github.com/flavell-lab/private_BrainAlign_jellyfish/tree/main/configs, flavell-lab, 2025b.
AutoCellLabeler
Human labeling of NeuroPAL datasets
For cell annotation based on NeuroPAL fluorescence, human labelers underwent training based on the NeuroPAL Reference Manual for strain OH15500 (https://www.hobertlab.org/neuropal/) and an internal lab-generated training manual. They practiced labeling on previously labeled datasets until they achieved a labeling accuracy of 90% or higher for high-confidence neurons (confidence >3). Following protocols outlined in the NeuroPAL Reference Manual, labelers first identified hallmark neurons that are distinct in color and stable in spatial location. Identification proceeded systematically through anatomical regions, beginning with neurons in the anterior pharynx and ganglion, followed by the lateral ganglion, and concluding with the ventral ganglion. For a trained labeler, each dataset took approximately 4–6 hr to label, and each dataset was labeled at least twice by the same observer. Some datasets were verified by an additional labeler to increase certainty. Neurons given different labels on different passes or by different labelers had the confidence of those labels decremented to 2 or less, depending on the degree of labeling ambiguity.
Network architecture
AutoCellLabeler uses a 3-D U-Net architecture (Atanas et al., 2023; Wolny et al., 2020), with input dimensions 4×64×120×284 (fluorophore channel, z, y, x) and output dimensions 185×64×120×284 (label channel, z, y, x). The 3D U-Net has four input levels and four output levels, with feature channels at the ith level for .
There is an encoder block that maps an input image to the 0th input level, followed by three additional encoder blocks that map input level i to input level i+1. Each encoder block consists of two convolutional blocks followed by a 2×2×2 max pool layer, with the exception of the first encoder block, which does not have the max pool layer. The first convolutional block in each encoder increases the number of channels by a factor of 2, and the second leaves it unchanged.
Each convolutional block consists of a GroupNorm layer with group size 16 (except for the first convolutional layer in the first encoder, which has group size 1), followed by a 3D convolutional layer with kernel size 3 and the appropriate number of input and output channels, followed by a ReLU activation layer.
After the encoder, the 3D U-Net then has three decoder blocks, each mapping output level i+1 and input level i to output level i. Output level 3 is defined to be the same as input level 3. Each decoder block consists of a 2×2×2 upsampling layer which upsamples output level i+1 via interpolation, followed by a concatenation layer that concatenates it to input level i along the channel axis, followed by two convolutional blocks. The first convolutional block decreases the number of channels by a factor of 2, and the second convolutional block leaves the number of channels unchanged. After the final decoder layer, a 1×1 convolutional layer is applied to increase the number of output channels to the desired 185.
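For orientation, a compressed PyTorch sketch of this style of architecture is shown below. The per-level feature counts (32, 64, 128, 256), the placement of the max pool at the start of each later encoder block, and the mapping of "group size 16" onto nn.GroupNorm's num_groups are all assumptions of the sketch; the actual model is defined by the released pytorch-3dunet configuration.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, groups=16):
    # GroupNorm -> 3x3x3 convolution -> ReLU
    return nn.Sequential(nn.GroupNorm(groups, in_ch),
                         nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
                         nn.ReLU(inplace=True))

class AutoCellLabelerSketch(nn.Module):
    """Reduced 3D U-Net in the style described above (feature counts assumed)."""

    def __init__(self, in_ch=4, n_labels=185, feats=(32, 64, 128, 256)):
        super().__init__()
        self.encoders = nn.ModuleList()
        prev = in_ch
        for i, f in enumerate(feats):
            layers = [conv_block(prev, f, groups=1 if i == 0 else 16),  # first block increases channels
                      conv_block(f, f)]                                 # second leaves them unchanged
            if i > 0:
                layers.insert(0, nn.MaxPool3d(2))                       # no pooling in the first encoder
            self.encoders.append(nn.Sequential(*layers))
            prev = f
        self.up = nn.Upsample(scale_factor=2, mode="trilinear", align_corners=False)
        self.decoders = nn.ModuleList([
            nn.Sequential(conv_block(feats[i + 1] + feats[i], feats[i]),
                          conv_block(feats[i], feats[i]))
            for i in reversed(range(len(feats) - 1))])
        self.head = nn.Conv3d(feats[0], n_labels, kernel_size=1)        # 1x1 conv to 185 label channels

    def forward(self, x):  # x: (batch, 4, z, y, x); spatial dims divisible by 8 in this sketch
        levels = []
        for enc in self.encoders:
            x = enc(x)
            levels.append(x)
        y = levels[-1]
        for dec, skip in zip(self.decoders, reversed(levels[:-1])):
            y = dec(torch.cat([self.up(y), skip], dim=1))   # upsample, concatenate skip, convolve
        return self.head(y)
```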
Training inputs
We trained the AutoCellLabeler network on a set of 81 human-annotated NeuroPAL images, with 10 images withheld for validation and another 11 withheld for testing. Each training dataset contained three components: image, label, and weight. The images were 4×64×120×284, with the first dimension corresponding to channel: we spectrally isolated each of the four fluorescent proteins NLS-mNeptune 2.5, NLS-CyOFP1, NLS-mTagBFP2, and NLS-tagRFP using our previously described imaging setup (Atanas et al., 2023), described in detail above. The training images were then created by registering all of the images to the NLS-tagRFP image as described above (i.e. the other three colors were registered to tagRFP), cropping each channel to 64×120×284, and then stacking them along the channel axis to form a 4×64×120×284 image (in the reverse order that they were for the BrainAlignNet). Images were manually rotated so that the head was pointing in the same direction in all datasets, but there was no Euler alignment of the different datasets to one another. (Note that, due to the data augmentations described below, it is likely that this manual rotation had no effect and was an unnecessary step.)
To create the labels, we ran our segmentation U-Net on each such image to generate ROIs corresponding to neurons in these images. Humans then manually annotated the images and assigned a label and a confidence to these ROIs. These confidence values ranged from 1 to 5, with 5 being the maximum. For network training, only confidence-1 labels were excluded while all labels from confidence 2 through 5 were included. We then made a list of length 185: the background label, and all 184 labels that were ever assigned in any of the human-annotated images. This list contained all neurons expected to be in the C. elegans head with the exceptions of ADFR, AVFR, RMHL, RMHR, and SABD, as these neurons were not labeled in any dataset. The list also contained six other possible classes corresponding to neurons in the anterior portion of the ventral cord: VA01, VB01, VB02, VD01, DD01, and DB02, as well as the classes ‘glia’ and ‘granule’ to denote non-neuronal objects that fluoresce (and might be labeled with an ROI), and the class ‘RMH?’ as the human labelers were never able to disambiguate whether their ‘RMH’ labels corresponded to RMHL or RMHR.
Due to a data processing glitch, labels for 2 of the 81 training datasets were imported incorrectly; validation and testing datasets were unaffected. This resulted in those datasets effectively having random labels during training. We are currently re-training all versions of the AutoCellLabeler network and expect their performance to modestly increase once this is rectified.
For each image, the human labels were transformed into matrices with dimensions 185×64×120×284 via one-hot encoding, so that denotes whether the pixel at position has label . Specifically, we set for to be 1 if the pixel at position corresponded to an ROI that the human labeled as , and 0 otherwise. For example, the fourth element of was I2L (i.e. ), so would be 1 in the ROI labeled as I2L and 0 everywhere else. The first label (i.e. ) corresponded to the background, which was 1 if all other channels were 0, and 0 otherwise.
Finally, we create a weight matrix of dimensions 64×120×284 (in the code, this matrix has dimensions 185×64×120×284, but the loss function is mathematically equivalent to the version presented here). The entries of are determined by the following set of rules for weighting each corresponding pixel in the human label matrix :
for all with the background label, that is
if there is an ROI at with label that has confidence . Here, is the number of ROIs across all datasets (train, validation, and testing) with the label . This makes neurons with fewer labels more heavily weighted in training. Additionally, is a function that weighs labels based on human confidence score , where . Specifically, , , , and . The number 130 was the maximum number of times that any neuronal label (e.g.: not ‘granule’ or ‘glia’) was detected across all of the training datasets.
For the ‘no weight’ network described in Figure 5, all entries of this matrix were set to 1.
Loss function
The loss function is pixel-wise weighted cross-entropy loss. This is computed as:
Here, the image dimensions are (64, 120, 284), the number of total labels is the length of the label list defined above, and the sums run over the label and image dimensions. The label and weight matrices are as defined above, and the remaining term is the prediction (output) of the network. In this way, the network has a lower loss if its predicted probability is high for the correct label at a given pixel, as the log-softmax term is then close to 0 (a small negative number), so multiplying it by the weight results in an overall small (positive) loss. The weight term makes the network care more about pixels and labels with high weight – in particular, it cares more about foreground labels and about higher-confidence and rarer labels.
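A sketch of this loss for a single image follows; whether the implementation normalizes by the mean or the sum over pixels is not specified here and is an assumption of the sketch.

```python
import torch
import torch.nn.functional as F

def weighted_pixel_cross_entropy(logits, onehot_labels, weights):
    """Pixel-wise weighted cross-entropy for one image.

    logits:        (185, z, y, x) raw network output
    onehot_labels: (185, z, y, x) one-hot human labels
    weights:       (z, y, x) per-pixel weight matrix described above
    """
    logp = F.log_softmax(logits, dim=0)              # softmax over the label axis
    per_pixel = -(onehot_labels * logp).sum(dim=0)   # cross-entropy at each pixel
    return (weights * per_pixel).mean()
```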
Evaluation metric
The evaluation metric is a weighted mean intersection-over-union (IoU) across channels. Let be the network’s argmax label matrix. Specifically, when and otherwise. Then the evaluation metric is defined as:
In this manner, if the network is always correct, , the numerator and denominator will be equal, and the evaluation score will be 1. Similarly, if the network is always wrong, the evaluation score will be 0. (In the code, this metric is slightly different from the version presented here due to additional complexity with the matrix having a nonuniform extra dimension, but they act very similarly.)
Optimizer
The network was optimized with the Adam optimizer with a learning rate of .
Data augmentation
The following data augmentations are performed on the training data. One augmentation is generated for each iteration, in the following order. The same augmentation is applied to the image, label, and weight matrices, except that contrast adjustment and noise are not used for the label and weight matrices. Missing pixels are set to the median of the image, or to 0 for the label and weight matrices. Interpolation is linear for the images and nearest neighbors for label and weight. Full parameter settings such as strength or range of each augmentation are given in the parameter file (see below).
Rotation. The rotations in the xy plane and yz plane are much larger than the rotation in the xz plane because the worm is oriented to lie roughly along the x axis, and the physics of the coverslip are such that it cannot rotate about the y axis.
Translation. The image is translated.
Scaling. The image is scaled.
Shearing. The image is sheared.
B-Spline deformation. The B-spline augmentation first builds a 3D B-spline warp by placing a small number of control points evenly along the x-axis and adding Gaussian noise to all their parameters, yielding a smooth (piecewise-cubic), random deformation throughout the volume. On top of that, it introduces a ‘worm-bend’ in the XY plane by randomly displacing the y-coordinates of successive control points within a specified limit, computing corresponding x-shifts to preserve inter-point spacing (so the chain of points forms an arc). Notably, the training images are set up so that the worm is lying along the x-axis, and this augmentation is the first to execute (e.g.: before the worm can be rotated).
Rotation by multiples of 180 degrees. The image is rotated 180 degrees about a random axis (x, y, or z).
Contrast adjustment. Each channel is adjusted separately.
Gaussian blur. Gaussian blur is added to the image, in a gradient along the z-axis. The gradient is intended to mimic the optical effect of the image becoming blurrier farther away from the objective.
Gaussian noise. Added to the image, with each pixel being sampled independently.
Poisson noise. Added to the image, with each pixel being sampled independently.
‘Less aug’ network training
We trained a version of the network with some of our custom augmentations disabled, to see how important they were to the overall performance, compared with the other more standard data augmentations. The specific augmentations that were disabled were:
The second B-Spline deformation focusing on deformations in the xy plane
Contrast adjustment
Gaussian blur
Parameter file
The full parameter files are available at: https://github.com/flavell-lab/pytorch-3dunet/tree/master/AutoCellLabeler_parameters, flavell-lab, 2024j.
They include augmentation hyperparameters and various other settings not listed here. There is a different parameter file for each version of the network, though in most cases the differences are simply the number of input channels. If users install the pytorch-3dunet package from that GitHub repository and replace the paths to the training and validation data with their locations on their own computer, they can train it with the exact settings we used here. Training will require a GPU with at least 48 GB of VRAM. It also currently requires about 10 GB of CPU RAM per training and validation dataset.
Evaluation
During evaluation, an additional softmax layer is applied to convert network output into probabilities. Let be the input image and let be the network’s output (after the softmax layer). Then at every pixel , the network’s output array represents the probability that this pixel has label .
ROI image creation
To convert the output into labels, we first ran our previously described neuron segmentation network (Atanas et al., 2023; Pradhan et al., 2025) on the tagRFP channel of the NeuroPAL image. Specifically, since this segmentation network was trained on lower-SNR freely moving data, we ran it on a lower-SNR copy of the tagRFP channel. (This copy was one of the 60 images we averaged together to get the higher-SNR image fed to AutoCellLabeler.)
The segmentation network and subsequent watershed post-processing (Atanas et al., 2023) were then used to generate a matrix with dimensions 284×120×6 (same as the original tagRFP image). Each pixel in this matrix contains an index, either 0 for background or a positive integer indicating a specific neuron. The segmentation network and watershed algorithms were designed such that all pixels belonging to a specific neuron have the same index, and pixels belonging to any other neuron have different indices. We define an ROI as the set of all pixels sharing a given nonzero index.
ROI label assignment
We now wish to use AutoCellLabeler to assign a label to ROI . To do this, we first generate a mask matrix with the same dimensions as , defined by:
if
if and there exists face-adjacent to such that .
otherwise.
Here, the 0.01 entries are provided to the edges of the ROI so as to weight the central pixels of each ROI more heavily when determining the neuron’s identity.
Finally, we define a prediction matrix that allows us to determine the label of each ROI and the corresponding confidence of each label. Letting be the number of distinct nonzero values in (i.e.: the number of ROIs) and be the number of possible labels (as before), we define a prediction matrix whose th entry represents the probability that ROI has label as follows:
Here, the sums are taken over all pixels in the image.
Note that because of the additional softmax layer, we have for all . From this, we can then define the label index of ROI to be . From this, we can define its label to be , and the confidence of that label to be .
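Putting the mask and prediction matrices together, ROI label assignment can be sketched as follows. The interior-pixel weight of 1 (with 0.01 at ROI edges) and the simplified boundary handling are assumptions of this sketch.

```python
import numpy as np

def assign_roi_labels(probs, roi_map, labels, edge_weight=0.01):
    """Aggregate per-pixel label probabilities into one label per ROI.

    probs:   (n_labels, z, y, x) softmax output of AutoCellLabeler
    roi_map: (z, y, x) integer ROI image from the segmentation network
    labels:  list of label names, index 0 = background
    """
    out = {}
    for roi in np.unique(roi_map):
        if roi == 0:
            continue
        inside = roi_map == roi
        interior = inside.copy()
        for axis in range(3):   # erode once: interior pixels have all face-neighbors inside
            interior &= np.roll(inside, 1, axis=axis) & np.roll(inside, -1, axis=axis)
        mask = np.where(interior, 1.0, np.where(inside, edge_weight, 0.0))
        p = (probs * mask).sum(axis=(1, 2, 3)) / mask.sum()   # per-label probability for this ROI
        best = int(np.argmax(p))
        out[int(roi)] = (labels[best], float(p[best]))        # (label, confidence)
    return out
```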
ROI label postprocessing
After all ROIs are assigned a label, they are sorted by confidence in descending order. The ROIs are iterated through in this order, with each ROI being assigned its most likely label and the set of all assigned labels being tracked. If an ROI has its most likely label already assigned to a different ROI , the distance between the centroids of ROIs and is computed. If this distance is small enough, the collision is likely due to over-segmentation by the segmentation U-Net (i.e. ROIs and are actually the same neuron). In this case, they are assigned the same label. Otherwise, the collision is likely due to a mistake on the part of AutoCellLabeler, and the label for ROI is deleted. (i.e. the higher-confidence label for ROI is kept and the lower-confidence label is discarded.)
Additionally, ROIs are checked for under-segmentation. This occurs, rarely, when the segmentation U-Net incorrectly merges two neurons into the same ROI. It is assessed by checking how many pixels in the ROI have predictions other than the full ROI label index . Specifically, we count the number of pixels with within for some . If there exist at least 10 pixels with label , or 20% of the pixels in the ROI are labeled as in this way, it is plausible that the ROI contains parts of another neuron. In this case, the label for that ROI is deleted.
Most neuron classes in the C. elegans brain are bilaterally symmetric and have two distinct cell bodies on the left and right part of the animal. These are genetically identical and therefore have exactly the same shape and color, which can often make it difficult to distinguish between them. For most applications, it is also usually unnecessary to distinguish between them since they typically have nearly identical activity and function. In some cases, AutoCellLabeler can be confident in the neuron class but uncertain about the L/R subclass, assigning a probability of >10% to both L and R subclasses. In this case, we do not assign a specific subclass, instead assigning a label only for the main class with the sum of its confidence for either of the two subclasses. We note that this is only done for the L/R subclass – other neurons can also have D/V subclasses, but these are typically functionally distinct, so we require the network to disambiguate D/V for all neuron classes.
Finally, certain neuron classes were present only a few times in our manually labeled data, making it more likely for the network to mislabel them due to lack of training data, and simultaneously making it difficult for us to assess its performance on these neuron classes due to the lack of testing data in which they were labeled. We deleted any AutoCellLabeler labels corresponding to one of these classes, which were ADF, AFD, AVF, AVG, DB02, DD01, RIF, RIG, RMF, RMH, SAB, SABV, SIAD, SIBD, VA01, and VD01. Additionally, there are other fluorescent cell types in the worm’s head. AutoCellLabeler was trained to label them as either ‘glia’ or ‘granule’ to avoid mislabeling them as neurons, and any AutoCellLabeler labels of ‘glia’ or ‘granule’ were deleted to ensure all of our analyses are based on actual neuron labels.
Altogether, these postprocessing heuristics resulted in deleting network labels for only 6.3% of ROIs with confidence 4 or greater human neuron labels (i.e.: not ‘granule’ or ‘glia’).
CePNEM simulation analysis (Figure 5E)
To assess the performance of our AutoCellLabeler network on the SWF415 strain, we could not compare its labels to human labels since humans do not know how to label neurons in this strain. Therefore, we used functional information about neuron activity patterns to assess accuracy of the network. We used our previously described CePNEM model to do this (Atanas et al., 2023). Briefly, CePNEM fits a single neural activity trace to the animal’s behavior to extract parameters about how that neuron represents information about the animal’s behavior. CePNEM fits a posterior distribution for each parameter, and statistical tests run on that posterior are used to determine encoding of behavior. For example, if nearly all parameter sets in the CePNEM posterior for a given neuron have the property that they predict the neuron’s activity is higher when the animal is reversing, then CePNEM would assign a reversal encoding to that neuron.
By doing this analysis in NeuroPAL animals where the identity of each neural trace is known, we have previously created an atlas of neural encoding of behavior (Atanas et al., 2023). This atlas revealed a set of neurons that have consistent encodings across animals: AVA, AVE, RIM, and AIB encode reverse locomotion; RIB, AVB, RID, and RME encode forward locomotion; SMDD encodes dorsal head curvature; and SMDV and RIV encode ventral head curvature. Based on this prior knowledge, we decided to quantify the fraction of labeled neurons with the expected activity-behavior coupling. For example, if AutoCellLabeler labeled 10 neurons as AVA and 7 of them encoded reverse locomotion when fit by CePNEM, this fraction would be 0.7.
However, this fraction is not necessarily an accurate estimate of AutoCellLabeler’s accuracy. For example, it might have been possible for AutoCellLabeler to mislabel a neuron as AVA that happened to encode reverse locomotion in that animal, thus making the incorrect label appear accurate. On the other hand, CePNEM is limited by statistical power and can sometimes fail to detect the appropriate encoding. This could make a correct label appear inaccurate.
To correct for these factors, we ran a simulation analysis to try to estimate the fraction of labels that were correct. To do this, we iterated through every one of AutoCellLabeler’s labels that was one of the consistent-encoding neuron classes (i.e. one of the neurons listed above). In each simulation, we assign labels to neurons in the following manner: with probability (i.e. the fraction of labels estimated by our simulation to be correct), the label was reassigned to a random neuron that was given that label by a human in a NeuroPAL animal (at confidence 3 or greater); with probability , the label was reassigned to a random neuron in the same (SWF415) animal. In this way, the simulation controls for both of the possible inaccuracies outlined above. Then the fraction of labeled neurons with the expected encoding was computed for each simulation. 1000 simulation trials were run for each value of , which ranged from 0 to 100 – the mean and standard deviation of these trials are shown in Figure 5E. We then computed the probability for which was in closest agreement to , which was 69% (dashed vertical line). This is our estimate for the ground-truth correct label probability .
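Schematically, the simulation can be expressed as below; the data structures and function names are ours, and the encoding comparison is reduced to simple equality for illustration.

```python
import numpy as np

def simulated_expected_fraction(auto_labels, atlas_encodings, animal_encodings,
                                p_correct, n_trials=1000, seed=0):
    """Mean and s.d. (over trials) of the fraction of labels showing the
    expected encoding, given a hypothesized label accuracy p_correct.

    auto_labels:      list of (label, expected_encoding) pairs from AutoCellLabeler
    atlas_encodings:  dict label -> list of encodings seen for that label in
                      human-labeled NeuroPAL animals (confidence >= 3)
    animal_encodings: list of encodings of all neurons in the SWF415 animal
    """
    rng = np.random.default_rng(seed)
    fractions = []
    for _ in range(n_trials):
        hits = 0
        for label, expected in auto_labels:
            if rng.random() < p_correct:
                enc = rng.choice(atlas_encodings[label])   # treat the label as correct
            else:
                enc = rng.choice(animal_encodings)         # treat the label as a random mistake
            hits += int(enc == expected)
        fractions.append(hits / len(auto_labels))
    return float(np.mean(fractions)), float(np.std(fractions))
```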
CellDiscoveryNet
Network architecture
The architecture of CellDiscoveryNet uses the same LocalNet backbone from DeepReg that BrainAlignNet uses, with the following modifications to the architecture and training procedure:
The input images to CellDiscoveryNet are 284×120×64×4 instead of 284×120×64.
The image concatenation layer in CellDiscoveryNet concatenates the moving and fixed images along the existing channel dimension instead of adding a new channel dimension. Effectively, this means that the output of that layer (and input to the LocalNet backbone) is now 284×120×64×8 instead of 284×120×64×4.
The affine data augmentation procedure was adjusted to first construct a 3D affine transformation, then independently apply that same transformation to each channel in the 4D input images.
In the output warping layer, the DDF is now applied independently to each channel of the moving image to create the predicted fixed image.
Loss function
The loss function in CellDiscoveryNet is a weighted sum of image loss and regularization loss. As this is an entirely unsupervised learning procedure, label loss is not used.
The image loss component has weight 1 and uses squared channel-averaged NCC instead of LNCC used by BrainAlignNet. The NCC loss is computed as:
where are the dimensions of the image, is the fixed image, is the predicted fixed image (i.e. the DDF-transformed moving image), is the number of channels, and are the mean of channel in the fixed and predicted images, respectively, and and are the variance of channel in the fixed and predicted images, respectively.
(We note that Figure 6D displays the usual NCC computed by treating channel as a fourth dimension.)
The regularization loss terms are as before for BrainAlignNet, except with weights 0 for the axis difference norm, 0.05 for the gradient norm, 0.05 for the nonrigid penalty, and 0.0025 for the difference norm.
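A sketch of this image loss term follows; whether the square is applied to the channel-averaged NCC (as here) or to each channel's NCC before averaging, and the sign convention, are assumptions of the sketch.

```python
import numpy as np

def squared_channel_averaged_ncc_loss(fixed, pred, eps=1e-8):
    """Negative squared, channel-averaged NCC between fixed and predicted images.

    fixed, pred: arrays of shape (x, y, z, channels).
    """
    nccs = []
    for c in range(fixed.shape[-1]):
        f = fixed[..., c] - fixed[..., c].mean()
        p = pred[..., c] - pred[..., c].mean()
        nccs.append((f * p).mean() / np.sqrt(f.var() * p.var() + eps))
    mean_ncc = np.mean(nccs)
    return -(mean_ncc ** 2)   # more negative = better alignment
```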
Training data
CellDiscoveryNet was trained on 3240 pairs of 284×120×64×4 images, comprising every possible pair combination of 81 distinct images. These were the same 81 images used to train AutoCellLabeler. Each pair consisted of a moving image and a fixed image. Both images were pre-processed by setting the dynamic range of pixel intensities to [0, 1], independently for each channel.
Each moving image was additionally pre-processed by using our previously described GPU-accelerated Euler registration to coarsely align it to the corresponding fixed image. This registration was run on the NLS-tagRFP channel, and the Euler transform fit to that channel was then independently applied to each other channel to generate the full transformed moving image.
There were 45 validation image pairs (from 10 validation images), and 1866 testing image pairs. The testing image pairs added 11 additional images and consisted of all pairs not present in either the training or validation data. (So, for example, a registration problem between an image in the training data and an image in the validation data would count as a testing image pair, since the network never saw that image pair in training or validation.) The split of images in the validation and testing data was identical to that for AutoCellLabeler.
The network was trained for 600 epochs with the Adam optimizer and a learning rate of . Full training parameter settings are available at https://github.com/flavell-lab/DeepReg/tree/main/CellDiscoveryNet/train_config.yaml, flavell-lab, 2024k.
ANTSUN 2U
To convert the CellDiscoveryNet registration outputs into neuron labels across animals, we created a modified version of our ANTSUN image processing pipeline. We skipped the pre-processing steps since the images were already pre-processed, used 102 four-channel images instead of 1600 one-channel images, set the registration graph to be the complete graph (except each pair of images is only registered once and not once in each direction), substituted BrainAlignNet with CellDiscoveryNet for the nonrigid registration step, and skipped the trace extraction steps of ANTSUN (stopping after it computed linked neuron IDs).
We also modified the heuristic function in the matrix that was subject to clustering to better account for the nature of this multi-spectral data. Specifically, we removed the marker channel brightness heuristic since brightness of neurons relative to the mean ROI is not likely to be well conserved across different animals. We replaced it with a more problem-specific heuristic: color. Specifically, the color of an ROI was defined as the 4-vector of its brightness in each of the four channels, normalized to the average brightness of that ROI across the four channels. We then define
where each indicates an ROI label.
In this way, will be small if the ROIs have similar colors and large if they have different colors. We use this new color-based in the same way in the heuristic function that we used the original brightness-based , except that we set its weight .
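A sketch of this color term; the use of a Euclidean distance between the normalized 4-vectors is our choice for illustration, since only the qualitative behavior (small for similar colors) is specified above.

```python
import numpy as np

def color_distance(brightness_a, brightness_b):
    """Distance between the normalized 4-channel 'colors' of two ROIs.

    brightness_*: length-4 vectors of an ROI's mean brightness in each channel.
    """
    a = np.asarray(brightness_a, float)
    b = np.asarray(brightness_b, float)
    a = a / a.mean()   # normalize to the ROI's average brightness across channels
    b = b / b.mean()
    return float(np.linalg.norm(a - b))   # small for similar colors, large otherwise
```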
We did not run a hyperparameter search on any of the other weight parameters for this dataset to avoid overfitting to the 102 animals included in it, instead leaving them all at their default values from the original ANTSUN 2.0 pipeline (with the one exception of , which we set to 0 here in light of having far fewer animals than timepoints). We hypothesize that performance may increase even further upon hyperparameter search, although this would likely require considerably more data for testing. The only parameter we varied was , which controls the precision vs. recall tradeoff. Larger values of result in more, but less accurate, detected clusters; each cluster corresponds to a single neuron class label. We elected to use a value of for all displayed results; the full tradeoff curve is available in Figure 6f.
Accuracy metric
By construction, clusters in ANTSUN 2U should correspond to individual neuron classes. To compute its accuracy, we checked whether clusters indeed only correspond to single neuron classes. Let be the function mapping ROI in animal to its human label (ignoring L/R), and let be a set of values belonging to the same cluster. We can then define to be the set of labels in : . Then let be the most frequent label in . We can then define the accuracy of ANTSUN 2U as follows:
Here, is the number of elements in the set , is the set of all clusters with (which included all but one cluster in our data with ), and is the Kronecker delta function.
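In code, this accuracy reduces to the fraction of cluster members whose human label matches the cluster's most frequent label; the minimum-cluster-size cutoff below is a placeholder, since the exact threshold appears in the formula above.

```python
from collections import Counter

def antsun2u_accuracy(clusters, human_label, min_size=2):
    """Fraction of labeled cluster members matching their cluster's modal label.

    clusters:    list of clusters, each a list of (animal, roi) tuples
    human_label: dict mapping (animal, roi) -> human label (L/R ignored)
    """
    correct = total = 0
    for cluster in clusters:
        labels = [human_label[m] for m in cluster if m in human_label]
        if len(labels) < min_size:
            continue
        modal_label, _ = Counter(labels).most_common(1)[0]
        correct += sum(l == modal_label for l in labels)
        total += len(labels)
    return correct / total if total else float("nan")
```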
Data availability
All data are freely and publicly available at the following link: https://doi.org/10.7910/DVN/8UE0H9. All code is freely and publicly available (use main/master branches): BrainAlignNet for worms (https://github.com/flavell-lab/BrainAlignNet, flavell-lab, 2024a and https://github.com/flavell-lab/DeepReg, flavell-lab, 2024b); BrainAlignNet for jellyfish (https://github.com/flavell-lab/BrainAlignNet_Jellyfish, flavell-lab, 2025a); GPU-accelerated Euler registration (https://github.com/flavell-lab/euler_gpu, flavell-lab, 2024c); ANTSUN 2.0 (https://github.com/flavell-lab/ANTSUN, flavell-lab, 2024d [branch v2.1.0]; see also https://github.com/flavell-lab/flv-c-setup, flavell-lab, 2024e and https://github.com/flavell-lab/FlavellPkg.jl/blob/master/src/ANTSUN.jl, flavell-lab, 2024f for auxiliary package installation); AutoCellLabeler (https://github.com/flavell-lab/pytorch-3dunet, flavell-lab, 2020 and https://github.com/flavell-lab/AutoCellLabeler, flavell-lab, 2024g); CellDiscoveryNet (https://github.com/flavell-lab/DeepReg, flavell-lab, 2024b); ANTSUN 2U (https://github.com/flavell-lab/ANTSUN-Unsupervised, flavell-lab, 2024h).
References
- Automated imaging of neuronal activity in freely behaving Caenorhabditis elegans. Journal of Neuroscience Methods 187:229–234. https://doi.org/10.1016/j.jneumeth.2010.01.011
- Annotation of spatially resolved single-cell data with STELLAR. Nature Methods 19:1411–1418. https://doi.org/10.1038/s41592-022-01651-8
- The neural circuit for touch sensitivity in Caenorhabditis elegans. The Journal of Neuroscience 5:956–964. https://doi.org/10.1523/JNEUROSCI.05-04-00956.1985
- Jellyfish for the study of nervous system evolution and function. Current Opinion in Neurobiology 88:102903. https://doi.org/10.1016/j.conb.2024.102903
- Conference: Neuron tracking in C. elegans through automated anchor neuron localization and segmentation. Computational Optical Imaging and Artificial Intelligence in Biomedical Sciences. pp. 84–95. https://doi.org/10.1117/12.3001982
- Dynamic functional connectivity in the static connectome of Caenorhabditis elegans. Current Opinion in Neurobiology 73:102515. https://doi.org/10.1016/j.conb.2021.12.002
- DeepReg: a deep learning toolkit for medical image registration. Journal of Open Source Software 5:2705. https://doi.org/10.21105/joss.02705
- Conference: Label-driven weakly-supervised learning for multimodal deformable image registration. 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). pp. 1070–1074. https://doi.org/10.1109/ISBI.2018.8363756
- In situ sequencing for RNA analysis in preserved tissue and cells. Nature Methods 10:857–860. https://doi.org/10.1038/nmeth.2563
- Conference: Segment Anything. 2023 IEEE/CVF International Conference on Computer Vision (ICCV). https://doi.org/10.1109/ICCV51070.2023.00371
- Building and integrating brain-wide maps of nervous system function in invertebrates. Current Opinion in Neurobiology 86:102868. https://doi.org/10.1016/j.conb.2024.102868
- Tracking calcium dynamics from individual neurons in behaving animals. PLOS Computational Biology 17:e1009432. https://doi.org/10.1371/journal.pcbi.1009432
- Non-rigid point set registration by preserving global and local structures. IEEE Transactions on Image Processing 25:53–64. https://doi.org/10.1109/TIP.2015.2467217
- Deep learning for cellular image analysis. Nature Methods 16:1233–1246. https://doi.org/10.1038/s41592-019-0403-1
- Conference: Neuron matching in C. elegans with robust approximate linear regression without correspondence. 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). https://doi.org/10.1109/WACV48630.2021.00288
- Automatically tracking neurons in a moving and deforming brain. PLOS Computational Biology 13:e1005517. https://doi.org/10.1371/journal.pcbi.1005517
- Versatile multiple object tracking in sparse 2D/3D videos via deformable image registration. PLOS Computational Biology 20:e1012075. https://doi.org/10.1371/journal.pcbi.1012075
- Toward a more accurate 3D atlas of C. elegans neurons. BMC Bioinformatics 23:195. https://doi.org/10.1186/s12859-022-04738-3
- The structure of the nervous system of the nematode Caenorhabditis elegans. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 314:1–340. https://doi.org/10.1098/rstb.1986.0056
- Rapid detection and recognition of whole brain activity in a freely behaving Caenorhabditis elegans. PLOS Computational Biology 18:e1010594. https://doi.org/10.1371/journal.pcbi.1010594
- A review on deep learning applications in highly multiplexed tissue imaging data analysis. Frontiers in Bioinformatics 3:1159381. https://doi.org/10.3389/fbinf.2023.1159381
- A review of deep learning-based deformable medical image registration. Frontiers in Oncology 12:1047215. https://doi.org/10.3389/fonc.2022.1047215
Article and author information
Author details
Funding
MIT School of Science (Mathworks Fellowship)
- Talya S Kramer
Life Sciences Research Foundation (Fellowship)
- Karen L Cunningham
National Institutes of Health (NS141960)
- Brandon Weissbourd
National Institutes of Health (NS119749)
- Brandon Weissbourd
Searle Scholars Program (Award)
- Brandon Weissbourd
Klingenstein-Simons (Fellowship)
- Brandon Weissbourd
Picower Institute for Learning and Memory
- Brandon Weissbourd
- Steven W Flavell
Freedom Together Foundation
- Brandon Weissbourd
- Steven W Flavell
National Institutes of Health (NS131457)
- Steven W Flavell
National Institutes of Health (GM135413)
- Steven W Flavell
National Institutes of Health (DC020484)
- Steven W Flavell
U.S. National Science Foundation (1845663)
- Steven W Flavell
McKnight Foundation (Scholars Award)
- Steven W Flavell
Alfred P. Sloan Foundation (Research Scholar)
- Steven W Flavell
Howard Hughes Medical Institute (Investigator Award)
- Steven W Flavell
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank members of the Flavell lab for critical reading of the manuscript. We thank A Jia for assistance with Clytia data collection and B Bunch for assistance in developing Clytia BrainAlignNet. TSK acknowledges funding from a MathWorks Science Fellowship. KLC is a Shurl and Kay Curci Awardee of the Life Sciences Research Foundation. BW acknowledges funding from the NIH (R01NS141960, R00NS119749); The Searle Scholars Program; The Klingenstein-Simons Fellowship Award; The Picower Institute for Learning and Memory; and The Freedom Together Foundation. SWF acknowledges funding from NIH (NS131457, GM135413, DC020484); NSF (Award #1845663); the McKnight Foundation; Alfred P Sloan Foundation; The Picower Institute for Learning and Memory; Howard Hughes Medical Institute; and The Freedom Together Foundation. SWF is an investigator of the Howard Hughes Medical Institute.
Version history
- Preprint posted:
- Sent for peer review:
- Reviewed Preprint version 1:
- Version of Record published:
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.108159. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2025, Atanas et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.