CRF_ID for multi-cell images. a) Computational workflow starting from image acquisition to final cell identity predictions. a-1,2,3) Image preprocessing steps include automatic cell segmentation and coordinate axes prediction. a-4) Feature variables that represent positional relationships of the cells are extracted (PA, posterior and anterior; LR, left and right; DV, dorsal and ventral). a-5) The CRF algorithm maximizes the similarities between the extracted features from the images and those from an atlas. a-6) The final results are represented as a list of most likely neuron candidates for each cell with predicted probabilities. b) The atlas can be customized to meet the specifications of the images, and this is easily done by compiling and averaging annotated data.
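
The positional features in a-4 and the atlas averaging in b can be illustrated with a minimal sketch. It assumes each annotated dataset maps a neuron label to coordinates already expressed along the PA, LR and DV axes, and that a feature is a binary indicator of whether one cell precedes another along an axis. The function names, the data layout and the sign convention are hypothetical and are not taken from the CRF_ID code.

```python
from itertools import permutations

# Assumed order of the three values in each coordinate triple.
AXES = ("PA", "LR", "DV")


def positional_features(coords):
    """Map (cell_i, cell_j, axis) to 1.0 if cell_i has the smaller coordinate.

    coords: dict of neuron label -> (pa, lr, dv) coordinates (assumed layout).
    """
    feats = {}
    for i, j in permutations(coords, 2):
        for k, axis in enumerate(AXES):
            feats[(i, j, axis)] = 1.0 if coords[i][k] < coords[j][k] else 0.0
    return feats


def build_atlas(annotated_datasets):
    """Average the binary features over every dataset containing the cell pair."""
    sums, counts = {}, {}
    for coords in annotated_datasets:
        for key, value in positional_features(coords).items():
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


# Two made-up annotated animals; the coordinates are illustrative only.
animal_1 = {"AVA": (0.0, 0.0, 0.0), "RMD": (1.0, 1.0, -1.0)}
animal_2 = {"AVA": (0.0, 0.0, 0.0), "RMD": (2.0, -1.0, -1.0)}
atlas = build_atlas([animal_1, animal_2])
```

In this sketch an atlas entry close to 0 or 1 marks a positional relationship that is consistent across animals, while an entry near 0.5 marks one that varies between animals.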

Improved method of assigning coordinate axes. a) Coordinate axes generated for multi-cell images by PCA alone are not accurate. A two-step correction process is implemented: correction of the AP axis using natural landmarks and correction of the LR and DV axes by searching for the best plane of symmetry. b) The corrected axes are more accurate than the axes generated by PCA alone, showing smaller angle deviations from the ground truth axes for all three coordinate axes. c) Corrected axes yield a neuron ID accuracy (correspondence to manual cell annotations) that is higher than that obtained with PCA-predicted axes and comparable to that obtained with ground truth axes. Best single prediction results are reported. Two-sample t-tests were performed for statistical analysis. The asterisk denotes a significance level of p<0.05.
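
The angle deviation reported in b can be read as the angle between a predicted axis vector and the corresponding ground truth axis vector. The sketch below is one plausible way to compute it, assuming both axes are given as 3D direction vectors; the function name is hypothetical and the exact definition used for the figure is not stated in the caption.

```python
import math


def angle_deviation_deg(predicted, truth):
    """Angle in degrees between a predicted axis and the ground truth axis.

    Both arguments are 3D direction vectors; they need not be unit length.
    """
    dot = sum(p * t for p, t in zip(predicted, truth))
    norm = math.hypot(*predicted) * math.hypot(*truth)
    if norm == 0.0:
        raise ValueError("axis vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# Made-up example: a predicted axis tilted away from the true axis.
deviation = angle_deviation_deg((1.0, 1.0, 0.0), (1.0, 0.0, 0.0))
```

Because an axis has a direction (for example anterior versus posterior), this sketch treats a flipped axis as a 180 degree deviation rather than as a perfect match.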

Characterizing the importance of data-specific atlases. a,b) Several example atlases (b) are compared on their performance in neuron ID prediction on the glr-1p::NLS-mcherry-NLS multi-cell images (a). c,d) Difference of each atlas from the most accurate available atlas (glr-1 25 datasets) in terms of pairwise angle relationships (c) and PA/LR/DV positional relationships (d). e) The neuron ID accuracy (correspondence to manual cell annotations) depends greatly on the atlas used. Each data point represents the cell cluster from one animal (n=26). Best single prediction results are reported. Two-sample t-tests were performed for statistical analysis. The asterisk denotes a significance level of p<0.05.

Characterization of neuron identification accuracy using CRF_ID. a) Side-by-side comparison of automated neuron ID accuracy and manual neuron ID accuracy. Each data point represents the cell cluster from one animal. For automated neuron ID, top 1, 2, 3 results are from an iterative operation of the CRF_ID algorithm, and the best single prediction (BSP) results are from a single run. The atlas is compiled from data annotated by three different annotators. The ground truth labels are defined by the consensus of the three annotators. b) No significant differences in best single prediction accuracies are found when using atlases derived from data annotated by different annotators. c) There is a positive correlation between the automatic and manual neuron ID accuracy of each neuron. Two-sample t-tests were performed for statistical analysis. The asterisk denotes a significance level of p<0.05.
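
Top 1, 2, 3 accuracy in a can be understood as the fraction of cells whose ground truth label appears among the first k candidates. The sketch below only scores ranked candidate lists against consensus labels; it does not reproduce the iterative CRF_ID procedure that generates them. The function name, the cell identifiers and the example labels are hypothetical.

```python
def top_k_accuracy(ranked_candidates, ground_truth, k):
    """Fraction of cells whose true label is among the first k candidates.

    ranked_candidates: dict of cell id -> labels ordered from most to least likely.
    ground_truth: dict of cell id -> consensus label.
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    hits = 0
    for cell, true_label in ground_truth.items():
        # A cell with no prediction counts as a miss.
        candidates = ranked_candidates.get(cell, [])
        if true_label in candidates[:k]:
            hits += 1
    return hits / len(ground_truth)


# Made-up predictions and consensus labels for four cells.
ranked = {
    "cell_1": ["AVA", "AVE", "AVD"],
    "cell_2": ["RMD", "SMD", "RIM"],
    "cell_3": ["AVB", "PVC", "AVD"],
    "cell_4": ["RIM", "AIB", "AVA"],
}
truth = {"cell_1": "AVA", "cell_2": "SMD", "cell_3": "AVD", "cell_4": "RIG"}
accuracies = [top_k_accuracy(ranked, truth, k) for k in (1, 2, 3)]
```

Accuracy computed this way can only stay equal or increase with k, since each additional candidate can add hits but never remove them.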

Multi-cell neuron identification in in vivo gene expression analysis. a) CRF_ID 2.0 facilitates multi-cell annotation by providing the top 3 most likely neuron labels for each cell, from which the user makes the final decision. b,c) The example strain contained extrachromosomal (b) and integrated (c) transcriptional reporters for glr-1. Plotted are the neuron-specific gene expression levels, displayed as the normalized fluorescence intensities of identified neurons. The neuron labels on the x axis are listed in descending order of the single-cell RNA sequencing expression levels reported by CeNGEN. Box plots indicate the median and quartiles; whiskers indicate 1.5 IQR. Data points indicate signals from individual worms.
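
The box plot convention described above can be made concrete with a small sketch. It assumes the common convention in which whiskers extend to the most extreme data points lying within 1.5 IQR of the quartiles, and it uses the "inclusive" quantile method from the Python standard library; the plotting software used for the figure may compute quartiles slightly differently. The function name and the intensity values are hypothetical.

```python
import statistics


def box_stats(values):
    """Median, quartiles and 1.5 IQR whisker ends for one group of intensities."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }


# Made-up normalized intensities for one neuron; 100 lies beyond the upper fence.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Under this convention a value beyond the fences, such as the 100 in the example, is drawn as an individual point rather than stretching the whisker.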

Neuron ID accuracy no longer depends on the axes inaccuracy after axes correction. a) High negative correlation between axes inaccuracy and neuron ID accuracy before axes correction. b) No correlation between axes inaccuracy and neuron ID accuracy after axes correction. c) No correlation between worm orientation and neuron ID accuracy.

A more detailed visual representation of the difference of each atlas from the best available atlas (glr-1 from 25 datasets). a) Differences in angular relationships. b) Differences in PA/LR/DV relationships.

a) Correspondence between any two of the three annotators. b) Slight correlation between the fraction of cells unanimously labeled by the three annotators and neuron ID correspondence.

High correlation between extrachromosomal (mCherry) and integrated (GFP) transgene expression.