CRF_ID for multi-cell images. a) Computational workflow from image acquisition to final cell identity predictions. a-1,2,3) Image preprocessing steps include automatic cell segmentation and coordinate axes prediction. a-4) Feature variables that represent positional relationships of the cells are extracted (PA, posterior and anterior; LR, left and right; DV, dorsal and ventral). a-5) The CRF algorithm maximizes the similarity between the features extracted from the images and those from an atlas. a-6) The final results are presented as a list of the most likely neuron candidates for each cell, with predicted probabilities. b) The atlas can be customized to match the specifications of the images by compiling and averaging annotated data. The images show half of the specimen volume (left or right side) for illustration.

Improved method of assigning coordinate axes. a) Coordinate axes generated for multi-cell images by PCA alone are not accurate. A two-step correction process is implemented: correction of the AP axis using natural landmarks, and correction of the LR and DV axes by searching for the best plane of symmetry. b) The corrected axes are more accurate than the axes generated by PCA alone, showing smaller angle deviations from the ground truth axes for all three coordinate axes. c) Corrected axes yield higher neuron ID accuracy (correspondence to manual cell annotations) than PCA-predicted axes, and accuracy comparable to that obtained with ground truth axes. Best single prediction results are reported. Two-sample t-tests were performed for statistical analysis. The asterisk denotes a significance level of p<0.05.
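As an illustration of the quantities in panels a and b, the following is a minimal sketch of how PCA axes and their angle deviation from reference axes could be computed. The function names `pca_axes` and `axis_deviation_deg` are hypothetical, and the sketch assumes that the sign of an axis is ignored when measuring the deviation; it is not the CRF_ID implementation and does not include the landmark or symmetry-plane corrections.

```python
import numpy as np

def pca_axes(points):
    """Return the principal axes (as rows) of an (n, 3) point cloud,
    ordered by decreasing variance. Hypothetical helper for illustration."""
    centered = points - points.mean(axis=0)
    # Rows of vt are unit-length principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt

def axis_deviation_deg(predicted, truth):
    """Angle in degrees between two axis directions.
    Assumption: the sign of the axis is ignored, so the result lies in [0, 90]."""
    p = np.asarray(predicted, dtype=float)
    t = np.asarray(truth, dtype=float)
    p = p / np.linalg.norm(p)
    t = t / np.linalg.norm(t)
    cos = np.clip(abs(np.dot(p, t)), 0.0, 1.0)
    return float(np.degrees(np.arccos(cos)))

# Synthetic point cloud elongated along x (made-up data, not from the paper).
rng = np.random.default_rng(0)
cloud = rng.normal(size=(200, 3)) * np.array([10.0, 3.0, 1.0])
first_axis = pca_axes(cloud)[0]
deviation = axis_deviation_deg(first_axis, [1.0, 0.0, 0.0])
```

In this sketch the first PCA axis of the synthetic cloud lies close to the x axis, so `deviation` is small; real cell clouds are less elongated, which is one plausible reason PCA alone can be inaccurate.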

Characterizing the importance of data-specific atlases. a,b) Several example atlases (b) are compared on their performance in neuron ID prediction on the glr-1p::NLS-mCherry-NLS multi-cell images (a). c) The neuron ID accuracy (correspondence to manual cell annotations) depends greatly on the atlas used. Each data point represents the cell cluster from one animal (n=26). Best single prediction results are reported. Two-sample t-tests were performed for statistical analysis. The asterisk denotes a significance level of p<0.05. d,e) Difference of each atlas from the most accurate available atlas (glr-1, 25 datasets) in terms of pairwise angle relationships (d) and PA/LR/DV positional relationships (e). All distributions in panels d and e had a p-value of less than 0.0001 in a one-sample t-test against zero.
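The pairwise angle comparison in panel d can be sketched as follows. This is a minimal illustration assuming each atlas is an (n, 3) array of neuron positions with the same neuron order in both atlases; the function name `pair_vector_angles_deg` is hypothetical and the exact definition used in CRF_ID may differ.

```python
import numpy as np

def pair_vector_angles_deg(atlas_a, atlas_b):
    """For every ordered neuron pair (i, j), return the angle in degrees
    between the vector i -> j in atlas_a and the same vector in atlas_b.
    Assumption: both atlases list the same neurons in the same order."""
    a = np.asarray(atlas_a, dtype=float)
    b = np.asarray(atlas_b, dtype=float)
    va = a[None, :, :] - a[:, None, :]  # va[i, j] = a[j] - a[i]
    vb = b[None, :, :] - b[:, None, :]
    dot = (va * vb).sum(axis=-1)
    norms = np.linalg.norm(va, axis=-1) * np.linalg.norm(vb, axis=-1)
    np.fill_diagonal(norms, 1.0)  # avoid 0/0 for the undefined i == j entries
    cos = np.clip(dot / norms, -1.0, 1.0)
    angles = np.degrees(np.arccos(cos))
    np.fill_diagonal(angles, 0.0)  # a neuron has no pair vector with itself
    return angles

# Two toy atlases with two neurons each (made-up coordinates).
toy_a = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
toy_b = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_angles = pair_vector_angles_deg(toy_a, toy_b)
```

Here the single pair vector points along x in one toy atlas and along y in the other, so the off-diagonal entries of `toy_angles` are 90 degrees.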

Characterization of neuron identification accuracy using CRF_ID 2.0. a) Side-by-side comparison of automated and manual neuron ID accuracy. Each data point represents the cell cluster from one animal. For automated neuron ID, the top 1, 2, and 3 results are from an iterative operation of the CRF_ID algorithm, and the best single prediction (BSP) results are from a single run. The atlas is compiled from data annotated by three different annotators. The ground truth labels are defined by the consensus of the three annotators. Two-sample t-tests were performed for statistical analysis. b) No significant differences in best single prediction accuracy are found between atlases derived from data annotated by different annotators. One-way ANOVA was performed for statistical analysis. c) The automated and manual neuron ID accuracies of each neuron are positively correlated.
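The top 1, 2, 3 accuracies in panel a can be read as the fraction of cells whose ground truth label appears among the k highest-ranked candidates. A minimal sketch under that assumption is shown below; the function name `top_k_accuracy`, the dictionary layout, and the example labels are hypothetical and are not taken from the CRF_ID code or data.

```python
def top_k_accuracy(candidates, truth, k):
    """Fraction of annotated cells whose true label is among the first k
    ranked candidates.

    candidates: dict mapping cell id -> list of labels, most likely first
    truth:      dict mapping cell id -> ground truth label
    """
    if not truth:
        raise ValueError("truth must contain at least one annotated cell")
    hits = sum(truth[cell] in candidates.get(cell, [])[:k] for cell in truth)
    return hits / len(truth)

# Made-up ranked predictions for three cells.
ranked = {
    "cell_1": ["AVAR", "AVER", "RMDR"],
    "cell_2": ["AVER", "AVAR", "AVDR"],
    "cell_3": ["RMDR", "SMDDR", "AVBR"],
}
labels = {"cell_1": "AVAR", "cell_2": "AVAR", "cell_3": "AVBR"}

top1 = top_k_accuracy(ranked, labels, 1)
top2 = top_k_accuracy(ranked, labels, 2)
top3 = top_k_accuracy(ranked, labels, 3)
```

By construction the accuracy cannot decrease as k grows, which is why top 3 results bound the top 1 results from above.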

Multi-cell neuron identification in in vivo gene expression analysis. a) CRF_ID 2.0 facilitates multi-cell annotation by providing the top 3 most likely neuron labels for each cell, from which the user makes the final decision. b,c) The example strain contained extrachromosomal (b) and integrated (c) reporter transgenes for glr-1. Neuron-specific gene expression levels are plotted as the normalized fluorescence intensities of selected neurons. The neuron labels on the x axis are listed in descending order of the single-cell RNA sequencing expression levels reported by CeNGEN (Taylor et al., 2021). Box plots indicate the median and quartiles; whiskers indicate 1.5 IQR. Data points indicate signals from individual worms. The images show half of the specimen volume (right side) for illustration. a.u.: arbitrary units.
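For readers unfamiliar with the box plot convention, the following minimal sketch computes the median, quartiles, and 1.5 IQR whisker limits for one neuron. It assumes the common Tukey convention in which whiskers extend to the most extreme data points within 1.5 IQR of the quartiles; the function name `box_stats` and the intensity values are made up for illustration.

```python
import numpy as np

def box_stats(values):
    """Median, quartiles, and whisker ends for a box plot.
    Assumption: whiskers reach the most extreme data points that lie
    within 1.5 * IQR of the first and third quartiles (Tukey convention)."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    inside = v[(v >= q1 - 1.5 * iqr) & (v <= q3 + 1.5 * iqr)]
    return {
        "median": float(median),
        "q1": float(q1),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
    }

# Made-up normalized intensities (a.u.) with one outlier.
stats = box_stats([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
```

In this example the value 100 lies beyond the upper fence, so it would be drawn as an individual point rather than being covered by the whisker.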

Neuron ID accuracy no longer depends on axes inaccuracy after axes correction. a) Strong negative correlation between axes inaccuracy and neuron ID accuracy before axes correction. b) No correlation between axes inaccuracy and neuron ID accuracy after axes correction. c) No correlation between worm orientation and neuron ID accuracy.

A more detailed visual representation of the difference of each atlas from the best available atlas (glr-1 from 25 datasets (D.S.)). a) Differences in angular relationships. The red color intensity indicates the angle difference between the neuron pair vectors in the given atlas and those in the best available atlas. b) Differences in PA/LR/DV relationships. The blue color intensity (ranging from 0 to 3) indicates the summed absolute differences in the three pairwise relationships between the given atlas and the best available atlas.
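The 0 to 3 range in panel b follows from summing one absolute difference per axis. The sketch below illustrates this with binary relationships computed from a single set of coordinates; this is a simplifying assumption, since an atlas compiled by averaging annotated data could hold fractional relationship values between 0 and 1. The function names are hypothetical.

```python
import numpy as np

def positional_relationships(coords):
    """Binary pairwise relationships along each of the three axes.

    coords: (n, 3) array of cell positions in (PA, LR, DV) coordinates.
    Returns a (3, n, n) array where out[k, i, j] = 1 if cell i has a
    smaller coordinate than cell j along axis k, and 0 otherwise.
    """
    c = np.asarray(coords, dtype=float)
    smaller = c[:, None, :] < c[None, :, :]  # shape (n, n, 3)
    return smaller.astype(int).transpose(2, 0, 1)

def relationship_difference(atlas_a, atlas_b):
    """Summed absolute difference over the three axes for each cell pair.
    Each entry lies between 0 (all three relationships agree) and 3."""
    rel_a = positional_relationships(atlas_a)
    rel_b = positional_relationships(atlas_b)
    return np.abs(rel_a - rel_b).sum(axis=0)

# Two toy atlases that disagree only along the second axis (made-up data).
toy_ref = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
toy_alt = [[0.0, 0.0, 0.0], [1.0, -1.0, 1.0]]
toy_diff = relationship_difference(toy_ref, toy_alt)
```

Because the two toy atlases disagree on one of the three axes, the off-diagonal entries of `toy_diff` equal 1.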

No correlation between the degree of mosaicism (fraction of cells expressed in the worm) and neuron ID correspondence.

a) Correspondence between each pair of the three annotators. b) Slight correlation between the fraction of cells unanimously labeled by the three annotators and neuron ID correspondence.

Neuron-specific expression of the glr-1 gene for neurons on the “dim” side of the specimen. These left/right paired neurons are on the side of the C. elegans farther from the objective. a) Extrachromosomal transgene expression. b) Integrated transgene expression.

High correlation between extrachromosomal (mCherry) and integrated (GFP) transgene expression. Each data point indicates the GFP and mCherry (RFP) intensities of a single neuron.