Data pipeline for automated segmentation and quantitative characterization.

Samples are fixed, stained with PTA, and embedded in resin prior to scanning at synchrotron micro-CT facilities. Representative areas are selected for manual segmentation, enabling machine learning algorithm training and the establishment of ground truth validation sets. Whole unsegmented samples are processed through the automated segmentation pipeline and compared to the manually segmented volumes: cellular-level statistical shape and image intensity attributes are extracted from both sets and confirmed in the same regions. Processing multiple mutant and wild-type samples at different developmental stages allows for accurate blood cell comparisons between experimental and control groups.

Comparison of slice images and 3D renders of blood cells acquired with micro-CT and serial-section electron microscopy (ssEM).

(A) Blood cells from our 5-dpf dataset were compared to (B) a published ssEM dataset of another 5-dpf fish. (C, D) Both datasets were automatically segmented using Ilastik and rendered in 3D. Average volumes of individual blood cells, measured using ITK-SNAP, were compared between 3 manually segmented ssEM cells and the manually segmented micro-CT validation set from the heart; the values differed by only 8 µm³.

Blood cells were identified in micro-CT datasets based on their known staining patterns in histology.

(A) Hematoxylin and eosin stained 5-micron thick slices of a normally developing 5-dpf zebrafish are compared to (B) a 5-micron thick maximum intensity projection taken at a similar angle from a micro-CT scan of a similar, normal 5-dpf fish. Histology and PTA-stained micro-CT images show similar staining patterns across anatomic regions, including the heart (C, D). Individual nucleated blood cells can be distinguished in both imaging modalities for segmentation. Physical cutting artifacts in the histology section push certain features out of the section (e.g., yolk in the bottom right of the zoom-in of the histology is partially missing due to loss during floating of the tissue section, but is fully present in the virtual section of the micro-CT dataset).

An iterative, machine learning-based training approach was used to isolate non-specific blood cell voxels for automated segmentation.

(A) A 2D example of random forest classifier-based iterative model development is presented on a single slice of a wt 5 dpf fish dataset, where the pixel prediction of each tissue class improves with additional data. (A’) A single annotated pixel per class can separate yolk and empty background space but does not separate the tissues into distinct zones of confidence and is highly error prone. A round of basic annotation (A’’) significantly improves the result, but still shows classification issues in complex areas (e.g., brain regions interpreted as muscle in the green box). Additional annotation specific to those areas (A’’’) further refines the accuracy of the pixel classification. (B) Output from a trained Ilastik model is in the form of a probability map for each class. The probability map for blood cells in the same slice is shown as a heat map, and (B’) is overlaid back onto the original data. (C) Input to Cellpose for automated segmentation of individual cells was generated by multiplying the probability map by the original image, generating a dataset that retains any cellular properties in the volumes of interest ((C’) in this case intensity profiles of blood cells) while suppressing all other anatomic background.
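The multiplication step in (C) can be sketched as follows. This is a minimal illustration rather than the pipeline's actual code: the function name `weight_by_probability` and the toy arrays are hypothetical, and the sketch assumes the probability map has the same shape as the image and holds values between 0 and 1.

```python
import numpy as np

def weight_by_probability(image, prob_map):
    """Multiply an image by a per-pixel class probability map.

    Pixels with a probability near 1 keep their original intensity;
    pixels with a probability near 0 are suppressed toward zero.
    """
    image = np.asarray(image, dtype=np.float32)
    prob_map = np.asarray(prob_map, dtype=np.float32)
    if image.shape != prob_map.shape:
        raise ValueError("image and probability map must have the same shape")
    # Clip defensively in case the map was rescaled or contains rounding noise.
    return image * np.clip(prob_map, 0.0, 1.0)

# Toy 2x2 example (hypothetical values, not taken from the dataset).
image = np.array([[100, 200], [50, 80]], dtype=np.uint8)
prob = np.array([[0.0, 1.0], [0.5, 0.25]])
weighted = weight_by_probability(image, prob)
```

The same function applies unchanged to 3D volumes, since the product is computed element by element.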

Automated segmentation workflow visually confirms boundary detection of cells and is further improved by shape statistic filtering.

(A) A 2D slice of micro-CT data from a 5-dpf fish is shown with the heart highlighted in red. (B) The same region is presented as a result of the image multiplied by the corresponding probability map. Arrows point to regions of high probability that resemble blood cells but do not share the manually derived blood cell shape statistics. (C) Initial segmentation results identify these areas, despite (D) visually confirmed accurate boundary detection. (E) Automated removal of any detected objects that do not fall within the known shape parameters of zebrafish blood cells removes these objects. The 3D render of segmented blood cells in the entire region is shown in (F).

Filtering the preliminary cell segmentation results by biologically informed shape and intensity statistics removes false positives from the dataset while retaining segmented blood cells in expected regions.

(A) Cellpose output of the processed Ilastik data filters cells from most areas of the body but leaves false positives in tissues that resemble blood cells in texture, such as neural nuclei in the brain and eye. (B) Filtering these preliminary results by known biological characteristics of blood cells, obtained from automated measurements of manually segmented validation sets, removes these false positives while retaining true positives within expected regions such as the heart and major vessels.
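One way to picture the filtering in (B) is as a range check on per-object statistics. The sketch below is an assumption-laden illustration, not the authors' implementation: it uses only voxel count and mean intensity as criteria, and the function name `filter_labels` and the accepted ranges are hypothetical. In practice the ranges would come from the manually segmented validation sets.

```python
import numpy as np

def filter_labels(labels, image, volume_range, intensity_range):
    """Zero out labeled objects whose statistics fall outside accepted ranges.

    labels: integer array, 0 = background, 1..N = segmented objects.
    image: intensity array of the same shape.
    volume_range, intensity_range: inclusive (min, max) tuples.
    """
    labels = np.asarray(labels)
    image = np.asarray(image, dtype=np.float64)
    flat = labels.ravel()
    counts = np.bincount(flat)                       # voxels per label
    sums = np.bincount(flat, weights=image.ravel())  # summed intensity per label
    means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    keep = (
        (counts >= volume_range[0]) & (counts <= volume_range[1])
        & (means >= intensity_range[0]) & (means <= intensity_range[1])
    )
    keep[0] = False  # background is never kept as an object
    return np.where(keep[labels], labels, 0)

# Toy example: label 2 is too small and label 3 is too dim.
labels = np.array([[1, 1, 0, 2],
                   [1, 1, 0, 0],
                   [3, 3, 3, 3]])
image = np.array([[10, 10, 0, 10],
                  [10, 10, 0, 0],
                  [2, 2, 2, 2]])
filtered = filter_labels(labels, image, volume_range=(3, 5), intensity_range=(5, 20))
```

Additional shape attributes, such as elongation or flatness, could be added as further range checks in the same way.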

Degree of physical overlap, measured as an F1 score, between automatically segmented blood cells and manual sets validates the accuracy of our automated workflow.

(A) Validation regions in 5-dpf fish were taken from a section of the dorsal aorta with few blood cells and abundant tissue background (a’), and the heart, which is dense in blood cells and low in other biological background (a’’). This allowed us to test automated segmentation accuracy in widely varied regions. (B) A region used for validation is shown with varying degrees of overlap between manually and automatically segmented cells. Arrows point to examples of clear false positives (black arrow), clear false negatives (blue arrow) and potential true positives (white arrow). (C) The similarity of automatically segmented cells to manual validation sets was optimized quantitatively. F1 scores were generated using increasing Dice-Sorensen coefficients as the required degree of overlap for a manually segmented and an automatically segmented cell to qualify as a true positive. Pre-processing the raw micro-CT data using the probability map and post-processing the data by filtering cells using the known biological shape statistics of blood cells increased the performance of the automated segmentation pipeline in the validation regions of both the heart and aorta (blue arrows). (D) The overlap (D’’’) of a single cell, segmented manually (D’) and automatically (D’’), is shown as a 3D render. Qualitative observation of an optimized output shows a large degree of overlap (blue) between the segmented data.
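The scoring described in (C) can be sketched as below. This is a minimal example under stated assumptions: each manual cell is paired with the overlapping automated cell that gives the highest Dice-Sorensen coefficient, each automated cell can be matched at most once, and the pair counts as a true positive only if its coefficient reaches the cutoff. The matching rule and the names `dice` and `f1_at_cutoff` are illustrative choices, not details confirmed by the source.

```python
import numpy as np

def dice(a, b):
    """Dice-Sorensen coefficient between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def f1_at_cutoff(manual, auto, cutoff):
    """F1 score where a true positive requires Dice >= cutoff."""
    manual = np.asarray(manual)
    auto = np.asarray(auto)
    manual_ids = [int(i) for i in np.unique(manual) if i != 0]
    auto_ids = [int(i) for i in np.unique(auto) if i != 0]
    matched = set()  # automated labels already paired with a manual cell
    tp = 0
    for m in manual_ids:
        m_mask = manual == m
        # Only automated cells that touch this manual cell are candidates.
        candidates = [int(a) for a in np.unique(auto[m_mask])
                      if a != 0 and int(a) not in matched]
        if not candidates:
            continue
        best = max(candidates, key=lambda a: dice(m_mask, auto == a))
        if dice(m_mask, auto == best) >= cutoff:
            matched.add(best)
            tp += 1
    fp = len(auto_ids) - tp    # automated cells with no qualifying match
    fn = len(manual_ids) - tp  # manual cells with no qualifying match
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy 1D "volumes": two manual cells, three automated cells.
manual = np.array([1, 1, 1, 1, 0, 0, 2, 2, 2, 2, 0, 0])
auto = np.array([1, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 3])
scores = {c: f1_at_cutoff(manual, auto, c) for c in (0.5, 0.7)}
```

In this toy case the two overlapping pairs have coefficients of 6/7 and 4/7, so raising the cutoff from 0.5 to 0.7 drops one true positive and lowers the F1 score from 0.8 to 0.4.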

Automated segmentation pipeline was optimized through maximizing Dice-Sorensen coefficients between automated results and manually segmented validation sets.

(A) Generated F1 scores for various Dice-Sorensen cutoffs are shown for cells automatically segmented in the heart (a’’ in Figure 7). The pipeline was adjusted to segment cells with a series of average diameters of 3 pixels (D3) through 9 pixels (D9). The closest computational similarity between automatically and manually segmented cells was obtained for an average diameter of 6 pixels, which corresponds to the average spherical radius of manually-segmented zebrafish blood cells. Visual assessment for an individual cell in the center of the distribution illustrates this result (B). The same cell is rendered as the manual segmentation from the validation set (red), automated results (green), and volumes of overlap (blue) for diameters of 3 pixels (B’), 6 pixels (B’’), and 9 pixels (B’’’). Visual results confirm the highest Dice-Sorensen score when using an expected diameter of 6 pixels.

Rendering of detected blood cells in context of original data.

(A) All detected blood cells in a 5-dpf zebrafish are shown overlaid onto a rendering of half of the background anatomical data, cut sagittally. (B) A transversely cut, zoomed-in render of the detected cells in the heart region of the 5-dpf fish is shown, illustrating a high concentration of blood cells. A zoomed-in render of the raw zebrafish data with all detected cells is provided across large areas of the (C) air bladder muscles, (D) brain neural nuclei, and (E) yolk sac to emphasize the absence of detected blood cells in regions expected to lack them.

The automated segmentation pipeline detects blood cells in thin blood vessels.

(A) A 5 µm thick MIP of a vessel approximately 10 µm in diameter running adjacent to the yolk shows blood cells in various orientations. (B) A MIP of a thinner, approximately 5 µm thick blood vessel running to the brain is presented with overlaid detected cells oriented in the same direction. A dotted line is shown between its two halves where the tortuous nature of the vessel travels out of slice. (C) The nature of 3D data allows the same vessel in (B) to be straightened out digitally for interrogation as a single line. (D) The walls of thin vessels with few blood cells are easier to visualize due to their high amount of empty space.

Comparison of blood cell shape statistics reveals distribution differences between samples.

(A) Distributions of wild-type (WT) and hht blood cell volume measurements extracted from the automated segmentation pipeline are shown plotted according to zebrafish sample. (B) Linear discriminant (LD) analysis applied to shape attributes from each sample reveals separation of cells by genetic and age phenotypes. (C) Distribution of cell volume within the wild-type genotype is shown stratified by age. (D) Distribution of cell volume within the hht genotype is shown stratified by age.

3D renders of blood cell distributions in 4- and 5-dpf wt and hht mutant fish.

Blood cells (rendered in red) are shown overlaid onto a cut-in render of the background datasets. Cellular distribution is visibly different across the widely varied shapes of the WT and hht samples. (A, C) The highest concentration of blood cells in the 4- and 5-dpf WT fish can be seen in the heart and dorsal aorta, with various blood cells distributed through vessels in the body. (B, D) This distribution is roughly conserved in the hht samples, though the progressive degradation of the cells and organs alters the heart shape, cell counts, and (E) cell shape between the 5-dpf samples.

Elongation: A ratio of the moments of every individual cell along the principal long and medium axes.

Flatness: The degree to which each label approximates a mathematical plane. Calculated as a ratio of the moments of every individual cell along the smallest and medium axes.

Equivalent Radius: Radius of a sphere with the same volume as the cell label.
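The three attributes defined above can be computed from a single binary cell mask roughly as follows. This is a sketch under stated assumptions, not the measurement code used for the study: the principal moments are taken as the eigenvalues of the voxel-coordinate covariance matrix, and elongation and flatness are taken as square roots of eigenvalue ratios with the larger value in the numerator, which is one common convention. The function name `shape_descriptors` and the `spacing` parameter are hypothetical.

```python
import numpy as np

def shape_descriptors(mask, spacing=(1.0, 1.0, 1.0)):
    """Equivalent radius, elongation and flatness of one 3D binary mask."""
    mask = np.asarray(mask, dtype=bool)
    spacing = np.asarray(spacing, dtype=float)
    coords = np.argwhere(mask) * spacing          # voxel centers in physical units
    volume = coords.shape[0] * spacing.prod()
    # Equivalent radius: radius of the sphere with the same volume.
    equivalent_radius = (3.0 * volume / (4.0 * np.pi)) ** (1.0 / 3.0)
    # Principal moments: eigenvalues of the coordinate covariance matrix.
    cov = np.cov(coords, rowvar=False, bias=True)
    small, medium, large = np.sort(np.linalg.eigvalsh(cov))
    # Assumed convention: square root of the ratio of adjacent principal moments.
    elongation = np.sqrt(large / medium) if medium > 0 else np.inf
    flatness = np.sqrt(medium / small) if small > 0 else np.inf
    return {
        "volume": volume,
        "equivalent_radius": equivalent_radius,
        "elongation": elongation,
        "flatness": flatness,
    }

# Toy example: an 8 x 4 x 2 voxel box inside an empty volume.
mask = np.zeros((10, 10, 10), dtype=bool)
mask[:8, :4, :2] = True
stats = shape_descriptors(mask)
```

For a perfectly spherical object both ratios approach 1; values grow as the object becomes more rod-like (elongation) or more plate-like (flatness).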