Introduction

Modelling the complexity of the human brain and its (dys)function has proven notoriously challenging, owing to its intricate wiring, cellular heterogeneity and species-specific differences1. To increase the translational value of neuroscientific research, there is a need for physiologically relevant human models. With the advent of human induced pluripotent stem cell (iPSC) technology, it has become possible to generate a wealth of brain-resident cell types including neurons2, astrocytes3, microglia4, oligodendrocytes5 and endothelial cells6, allowing the study of complex polygenic pathologies that cannot be modelled in animals and opening avenues to precision pharmacology7,8. Furthermore, the ability of iPSCs to develop into organoids renders them attractive for studying multicellular interactions in a 3D context that is closer to the in vivo situation9. However, genetic drift and batch-to-batch heterogeneity cause variability in reprogramming and differentiation efficiency10. This hinders the use of iPSC-derived cell systems in systematic drug screening or cell therapy pipelines. Quality control in such settings demands a robust and cost-effective approach. Current culture validation methods include sequencing, flow cytometry and immunocytochemistry10,11. These methods are often low in throughput, costly and/or destructive. We therefore set out to develop a method for evaluating the composition of such cultures based on high-content imaging, which is fast and affordable. The primary criteria were to facilitate cell type identification with cellular resolution while preserving spatial information and ensuring compatibility with subsequent immunocytochemistry or molecular assays for further biological inquiry. To this end, we employed the Cell Painting (CP)12,13 method, which is based on labelling cells with simple organic dyes and analysing the resulting phenotypes.
CP has proven to be a powerful and generic method for predicting the mode of action of pharmacological or genetic perturbations, purely using a cell morphology readout14–16. Thus far, CP has primarily been utilized to predict conditions associated with pharmacological treatments or genetic modifications using images as input. An exemplary achievement in this context has been realized by the JUMP consortium, which has published an open-source CP image dataset comprising over 13,000 distinct conditions17. Recognizing the predictive potential of morphological information, we explored its utility at the single-cell level, shifting focus from the image level to the prediction of individual cell types. By implementing deep learning for cell classification and tuning the input space, we established an approach that allows recognizing cell types with high accuracy, even in very dense cultures. We illustrate its potential by using it to distinguish iPSC-derived cell types in co-culture and to assess the maturity of differentiating neurons.

Results

Neural cell lines have a unique morphotextural fingerprint

Several groups have demonstrated that morphological cell profiling can be used to discriminate pharmacogenomic perturbations based on phenotypic similarity14–16. We asked whether a similar approach could be leveraged to unequivocally distinguish individual cell types as well. To this end, we implemented a version of CP based on 4-channel confocal imaging (Fig. 1A) and applied it to monocultures of two neural cell lines from different lineages, namely astrocyte-derived 1321N1 astrocytoma cells and neural crest-derived SH-SY5Y neuroblastoma cells. First, we explored whether traditional morphotextural feature extraction from cells and their nuclei provided sufficient discriminative power. Representation of the resulting standardized feature set in UMAP space revealed a clear separation of both cell types without replicate bias (Fig. 1B). Clustering of instances was less pronounced after principal component analysis (Suppl. Fig. 4A), but UMAP better preserves local and global data structure18. When mapping the individual features onto the UMAP space, we found that both texture (e.g., energy, homogeneity) and shape (e.g., nuclear area, cellular area) metrics contributed to the separation (Fig. 1C). The contribution of intensity-related features (e.g., mean, max, min, std intensity) to cell type separation was less pronounced, as they correlated more with the biological replicate. Thus, we concluded that cell types can be separated across replicates based on a morphotextural fingerprint.
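As a minimal illustration of this step, the sketch below standardizes a synthetic morphotextural feature matrix and embeds it in two dimensions. PCA stands in for UMAP here purely to keep the example dependency-free; with the umap-learn package, `umap.UMAP(n_components=2)` exposes the same `fit_transform` interface. All feature values, dimensions and class labels are hypothetical.

```python
# Sketch of feature standardization + 2-D embedding of single-cell profiles.
# Rows = segmented cells, columns = morphotextural features (shape, texture,
# intensity). Synthetic data; PCA as a stand-in for UMAP.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two synthetic "cell types" with shifted feature distributions
features_a = rng.normal(0.0, 1.0, size=(200, 50))
features_b = rng.normal(1.5, 1.0, size=(200, 50))
X = np.vstack([features_a, features_b])

X_std = StandardScaler().fit_transform(X)        # z-score each feature
embedding = PCA(n_components=2).fit_transform(X_std)
# With umap-learn: embedding = umap.UMAP(n_components=2).fit_transform(X_std)
print(embedding.shape)  # (400, 2)
```

The separation of the two synthetic populations along the first embedding axis mirrors the cell-type separation seen in Fig. 1B.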

| Shallow classification using manually defined features.

(A) Overview image of 1321N1 cells after CP staining, acquired at higher magnification (Plan Apo VC 60×A WI DIC N2, NA = 1.20). Scale bar is 30 µm. (B) UMAP dimensionality reduction, colour-coded for either cell type (left) or replicate (right). (C) Feature importance deduced from the UMAP (feature maps) and RF (Gini importance). Three exemplary feature maps are highlighted alongside the quantification per cell type. (D) Random Forest classification performance with confusion matrix and Gini importance. All features used in the UMAP were used for RF building.

CNN outperforms random forest in cell type classification

To evaluate whether the morphotextural fingerprint could be used to predict cell type, we performed Random Forest (RF) classification, using different seeds for splitting up training and validation sets. This resulted in a rather poor prediction accuracy of 71.0±1.0%, mainly caused by the substantial (46%) misclassification of 1321N1 cells (Fig. 1D). The imbalance in recall and precision was surprising given the clear separation of cell populations in UMAP space using the same feature matrix. When inspecting the main contributions to the RF classifier, we found that very similar features as those highlighted in UMAP space added to the discrimination (Fig. 1C). Whereas the top 2 features (nuclear Cy3 energy and cellular area) showed the expected gradient along the UMAP1 direction, the third parameter (cellular DAPI contrast) had no contribution to UMAP separation. Reducing noise by manually removing redundant and correlated features did not improve RF performance, which may be due to the documented bias in feature selection for node splitting in high-dimensional data19. This result drove us to improve the classification accuracy by means of a ResNet20 convolutional neural network (CNN). Here, we no longer relied on the extraction of features from segmented cell objects, but instead used image crops centred around individual cells (bounding box around the cell centroid) with their surroundings blanked as input (Fig. 2A). Using this approach, we found a significantly higher prediction accuracy of 96.0±1.8%, with a much more balanced recall and precision (Fig. 2B). Even with only 100 training instances per class, the CNN outperformed RF, but optimal performance was attained with 5000 training instances (Fig. 2C). As expected, a model trained on all three biological replicates performed slightly worse (lower accuracy and higher variability) at predicting a test set of a given replicate than a model trained on that same replicate (Fig. 2D).
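The RF step described above can be sketched as follows: train on a standardized feature matrix with different seeds for the train/validation split and read out the Gini importances. Synthetic features stand in for the real morphotextural matrix; all numbers are illustrative.

```python
# Hypothetical sketch of the Random Forest baseline: per-seed splits and
# Gini importances (feature_importances_), as described in the text.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic standardized features for two cell types
X = np.vstack([rng.normal(0.0, 1, (300, 20)), rng.normal(0.8, 1, (300, 20))])
y = np.array([0] * 300 + [1] * 300)

accuracies = []
for seed in range(5):  # different seeds for splitting, as in the text
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X_tr, y_tr)
    accuracies.append(rf.score(X_va, y_va))

print(f"accuracy: {np.mean(accuracies):.2f} ± {np.std(accuracies):.2f}")
# rf.feature_importances_ holds the Gini importances underlying Fig. 1C/D
```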
However, models trained on two different replicates outperformed those trained on only one when used for predicting instances of a third (unseen) replicate. This emphasizes the need for including sufficient variation in the training set (Fig. 2E). Although more performant, CNN classification does not allow direct retrieval of discriminative features, which complicates model interpretation. To gain a visual understanding of the image information contributing most to the classification, we therefore resorted to Gradient-weighted Class Activation Mapping (Grad-CAM)21. This revealed that the cell borders (edge information) and nuclear and nucleolar signals were the most distinctive features (Fig. 2F). When scrutinizing CNN misclassifications, we found that these were mainly caused by faulty cell detection (e.g., oversegmentation of cell ramifications), unhealthy/dead cells, debris or visibly aberrant cell shapes (Suppl. Fig. 4B) – mistakes not necessarily attributable to the CNN. To shed light on the contribution of individual markers to the classification, we reduced the input to single-channel images. In all cases, the prediction performance was at or below 85.0%, suggesting that no single channel contains all relevant information (Fig. 2G). Combinations of two or three channels could not match the prediction accuracy of the full four-channel image either. Thus, we concluded that all channels contribute at least in part to a successful classification.
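The channel-ablation experiment can be mimicked as below: every subset of the four channels is used to retrain a classifier, and the resulting accuracies are compared. A logistic regression on flattened synthetic crops stands in for the actual CNN; data and effect sizes are hypothetical.

```python
# Sketch of the channel-ablation test: retrain on every subset of the four
# CP channels and compare accuracies. Synthetic crops; logistic regression
# as a lightweight stand-in for the CNN.
import itertools
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n, h, w, c = 400, 8, 8, 4              # crops: height x width x 4 channels
X_img = rng.normal(0, 1, (n, h, w, c))
y = rng.integers(0, 2, n)
X_img[y == 1] += 0.4                   # class signal spread over all channels

results = {}
for k in range(1, c + 1):
    for subset in itertools.combinations(range(c), k):
        X_sub = X_img[:, :, :, list(subset)].reshape(n, -1)
        X_tr, X_te, y_tr, y_te = train_test_split(
            X_sub, y, test_size=0.3, random_state=0, stratify=y)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        results[subset] = clf.score(X_te, y_te)

best = max(results, key=results.get)
print("best channel subset:", best, round(results[best], 2))
```

In this toy setup the signal is spread evenly over channels, so larger subsets tend to win; in the real data, the same loop quantifies how much each marker combination contributes.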

| Convolutional neural network classification of monoculture conditions.

(A) Schematic of image pre-processing for CNN classification. (B) CNN accuracy and confusion matrix for cell type classification in monocultures. (C) Shallow vs. deep learning accuracy for varying training set sizes. Dots represent different training iterations (cross-validation). Train-validation-test dataset selection was performed with the same random seed at 60-10-30 ratios. (D) The impact of variability in the training dataset on model accuracy for shallow vs. deep learning. Mann-Whitney U test, p-values 7.47e-6 and 1.96e-4, respectively. (E) The performance of deep learning classifiers trained on single replicates (low variability) and mixed replicates (high variability) on unseen images (independent replicates). Kruskal-Wallis test on the single training condition, p-value of 0.026 with post-hoc Tukey test. Mann-Whitney U test on mixed training, p-value of 7.47e-6. (F) Grad-CAM map highlighting the regions on which the CNN relies for making its prediction. (G) Prediction accuracy of the CNN when only a selection of image channels is given as training input.

Nucleocentric predictions remain accurate regardless of culture density

Morphological profiling relies on cell identification for adequate class prediction. This may become difficult in dense cultures such as iPSC cultures, clustered cells and tissue (mimics). Having established a method to distinguish cell types with high accuracy, we next asked how robust the classification was to increasing cell density. To this end, we acquired images of 1321N1 and SH-SY5Y monocultures, grown at densities ranging from 0 to 100% confluency (Fig. 3A). Based on the nuclear count, we binned individual fields into 6 density classes (0-20%, 20-40%, 40-60%, 60-80%, 80-95%, 95-100%) and trained a CNN with equal sampling of cell numbers per density class to avoid bias. No decrease in accuracy was observed until culture density reached 80% confluency. Only for very high densities (95-100%) did we find a significant decrease in prediction accuracy (92.0±1.7%) (Fig. 3B). We reasoned that under these conditions cell shape would be predominantly determined by neighbouring cells and cell segmentation performance would decrease (Suppl. Fig. 1). Nuclei are less malleable and remain largely separated even in dense cultures, allowing their robust segmentation and avoiding CNN misclassifications resulting from segmentation errors. Hence, we asked whether using the nuclear ROI as input would improve classification performance at high densities, despite containing less cellular information. However, despite a relatively high average prediction accuracy of 92.4±1.0% (Fig. 3C), the performance was lower than that of the cell ROI across the density range. We therefore tested an intermediate condition in which nuclear ROIs with their direct surroundings (a square bounding box of 60×60 µm around the nuclear centroid coordinates) were used as input (Suppl. Fig. 4C). This resulted in a classification accuracy of 95.0±0.4% (Fig. 3C), comparable to that of models trained on whole-cell inputs for low to moderate cell densities, while also maintaining prediction performance at high cell densities (Fig. 3B). To understand why this was not the case for nuclear crops, we inspected the Grad-CAM output for these predictions and found that the latter diverted the attention of the CNN to the background (Fig. 3D). We interpret this result as the CNN using the background as a proxy for nuclear size. As nuclear area dropped with culture density, the dynamic range decreased, which could explain the increased error rate of the CNN at high densities unrelated to segmentation errors (Suppl. Fig. 4B). Thus, we conclude that using the nucleocentric region as input for the CNN is a valuable strategy for accurate cell phenotype identification in dense cultures.
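The nucleocentric input described above amounts to cropping a fixed-size window around each nuclear centroid, padded at the image borders so all crops share the same shape. A minimal single-channel sketch, with illustrative pixel sizes (the paper's boxes are 60×60 µm):

```python
# Nucleocentric crop: a box_px x box_px window centred on the nuclear
# centroid, zero-padded where the box extends beyond the image border.
import numpy as np

def nucleocentric_crop(image, centroid, box_px):
    """Crop a box_px x box_px window centred on (row, col)."""
    half = box_px // 2
    r, c = int(round(centroid[0])), int(round(centroid[1]))
    padded = np.pad(image, ((half, half), (half, half)), mode="constant")
    # the centroid shifts by +half in the padded image, so slicing from
    # (r, c) in padded coordinates keeps the nucleus centred
    return padded[r:r + box_px, c:c + box_px]

img = np.arange(100 * 100, dtype=float).reshape(100, 100)
crop = nucleocentric_crop(img, (5, 5), 32)   # nucleus near the top-left corner
print(crop.shape)  # (32, 32)
```

Because only the nuclear centroid is needed, this input survives dense cultures where whole-cell outlines can no longer be segmented reliably.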

| The impact of regional information on CNN performance in relation to culture density.

(A) Definition of different culture density categories with corresponding images. (B) The effect of culture density (number of cells within the FOV) on prediction accuracy, for each of the different input regions. Kruskal-Wallis test for each of the regions; statistically significant decrease (p-value 2.07e-5) between the densest condition and all other density categories for the cell region. (C) CNN accuracy for different input regions. Kruskal-Wallis test, p-value of 4.29e-4 with post-hoc Tukey test. (D) Grad-CAM intensity plots highlighting the regions on which the CNN relies for making its prediction in nucleocentric (top) and nuclear (bottom) crops.

Cell prediction remains accurate in mixed cell culture conditions

Although a very high classification accuracy was obtained with nucleocentric CNN predictions, both training and testing were performed with input images drawn from monocultures. As our goal was to allow cell type prediction in complex, heterogeneous cultures, we next switched to a more faithful situation in which we co-cultured both cell types. Ground truth for these predictions was generated either by performing post-hoc immunofluorescence (IF) with cell-type-specific antibodies (CD44 for 1321N1 and TUBB3 for SH-SY5Y cells), or by differential replication labelling (EdU or BrdU for either of the cell types) after dye quenching22 (Fig. 4A). The latter proved significantly more successful for binary ground-truth classification (through intensity thresholding) than the former (Fig. 4B). When training a CNN to recognize the cell types using these post-hoc markers as ground truth, we found that the prediction accuracy on the left-out dataset of the co-culture (Co2Co) was almost as high as when a model was trained and tested on monocultures (Mono2Mono) (Fig. 4C).
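The intensity-based ground-truth assignment can be sketched as follows: a threshold derived from the monoculture intensity distributions is applied to cells in the co-culture. All intensity values are synthetic, and the midpoint rule below is an illustrative stand-in for whatever thresholding rule is used in practice.

```python
# Ground-truth assignment by intensity thresholding: derive the threshold
# from the two monocultures, then binarize co-culture cells. Synthetic data.
import numpy as np

rng = np.random.default_rng(3)
mono_pos = rng.normal(1000, 150, 500)   # e.g. replication-labelled monoculture
mono_neg = rng.normal(200, 80, 500)     # unlabelled monoculture

# midpoint between the two monoculture means as a simple binary threshold
threshold = (mono_pos.mean() + mono_neg.mean()) / 2

# co-culture with a roughly 50/50 mix of both populations
coculture_intensity = rng.normal([1000, 200], [150, 80], (300, 2)).ravel()
labels = coculture_intensity > threshold
print(f"fraction labelled positive: {labels.mean():.2f}")
```

When the two monoculture distributions overlap (as they did for the IF markers in Fig. 4B), no threshold separates them cleanly, which is why the replication pre-label was preferred.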

| Cell type prediction in mixed cultures by post-hoc ground truth identification.

(A) Schematic overview of virtually vs. physically mixed cultures and the subsequent CNN models. For virtual mixing, individual cell types arise from distinct cultures and the cell type label was assigned based on the culture condition. For physical mixing, the true phenotype of each crop was determined by intensity thresholding after post-hoc staining. Three model types are defined: Mono2Mono (culture-based label), Co2Co (cell-based label) and Mono2Co (trained with culture-based labels and tested on cell-based labels). (B) Determining the ground-truth phenotype by intensity thresholding on IF (top) or pre-labelling (bottom). The threshold was determined on the intensity of monocultures. Only the pre-labelled ground truth resulted in good cell type separation by thresholding for 1321N1 vs. SH-SY5Y cells. (C) Mono2Mono (culture-based ground truth) vs. Co2Co (cell-based ground truth) models for cell type classification. Mann-Whitney U test, p-value 0.0027. (D) Monoculture-trained models were tested on mixed cultures. Pretrained models were trained on independent biological replicates. These could be fine-tuned by additional training on monoculture images from the same replicate as the co-culture, which reduced the variation between model iterations (median performance: Mann-Whitney U test, p-value 0.0015; coefficient of variation: Mann-Whitney U test, p-value 3.48e-4).

We then tested whether it was possible to use a classifier trained on monocultures for cell type prediction in co-cultures (Mono2Co). This resulted in a prediction accuracy of 85.0±4.6%. We attributed this drop in accuracy to the effect on cell phenotype exerted by the presence of other cell types in mixed cultures, which is not captured in monocultures. As the images from monocultures and co-cultures were obtained from different plates, we suspected inter-replicate variability in culture and staining procedures to contribute in part to this reduced performance. Therefore, we tested whether we could improve the performance of the CNN by including monocultures from the same plate as the co-cultures. This fine-tuning indeed improved the average performance to 89.0±0.7%, and more importantly, it significantly reduced the variability (coefficient of variation) of the predictions (Fig. 4D). Thus, while not yet reaching the same accuracy, this shows that it is possible to establish a model that recognizes cell types in co-cultures while being trained solely on monocultures.
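The fine-tuning strategy can be illustrated with a linear model in place of the CNN: pre-train on monoculture data from other replicates, then continue training on same-plate monoculture data. `partial_fit` plays the role of the additional training epochs; all data, shifts and dimensions are synthetic.

```python
# Sketch of Mono2Co fine-tuning: pre-train on other replicates (with a
# slight batch shift), then fine-tune on same-plate monocultures.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(4)
# pretraining data (other replicates), shifted to mimic batch effects
X_pre = np.vstack([rng.normal(0.2, 1, (300, 30)), rng.normal(1.2, 1, (300, 30))])
y_pre = np.array([0] * 300 + [1] * 300)
# same-plate monocultures, matching the co-culture's staining conditions
X_plate = np.vstack([rng.normal(0.0, 1, (100, 30)), rng.normal(1.0, 1, (100, 30))])
y_plate = np.array([0] * 100 + [1] * 100)

clf = SGDClassifier(random_state=0)
clf.partial_fit(X_pre, y_pre, classes=[0, 1])   # pre-training phase
clf.partial_fit(X_plate, y_plate)               # fine-tuning on same-plate data
print(round(clf.score(X_plate, y_plate), 2))
```

The same warm-start logic applies to the CNN, where fine-tuning continues gradient descent from the pretrained weights instead of re-initializing.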

Nucleocentric phenotyping can be applied to stage iPSC-derived neuronal cultures

iPSC-derived neuronal cell cultures suffer from significant inter- and intra-batch variability and could benefit from efficient quality control23,24. Hence, we applied our nucleocentric phenotyping pipeline to stage the maturity of a neuronal cell culture based on its cell type composition. Using a guided protocol2, two differentiation stages were simulated: a primed stage, where most cells are cycling neural progenitors (NPCs), and a differentiated stage, where most cells are post-mitotic neurons (Fig. 5A). The two cell types were discriminated by post-hoc IF labelling for the cell cycle marker Ki67 (NPCs) and the microtubule marker β-III-tubulin (TUBB3, neurons). Not all cells in the CP image could be assigned a ground-truth label, due to cell detachment upon re-staining or sheer absence of the tested marker. Since no single monoculture consists of 100% of either cell type, we applied gates to retain only those cells that showed unequivocal staining for one of both markers. Based on these gates, ROIs were classified as neuron (Ki67-/TUBB3+), NPC (Ki67+/TUBB3-) or undefined (outside of the gating boundaries). We assume the latter category to represent transitioning cells in intermediate stages of neural development, or un- or dedifferentiated iPSCs (Fig. 5B). This gating strategy resulted in a fractional abundance of neurons vs. NPCs of 36.4% in the primed condition and 80.0% in the differentiated condition (Fig. 5C). UMAP dimensionality reduction revealed a clear clustering of both phenotypes, suggesting that the morphotextural profile captured their distinct phenotypic fingerprint and should enable their separation (Fig. 5D).
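The two-marker gating logic can be written down compactly; the gate values below are hypothetical and serve only to illustrate how cells fall into the three categories.

```python
# Gate cells into neuron (Ki67-/TUBB3+), NPC (Ki67+/TUBB3-) or undefined.
# Gate thresholds are illustrative, not the ones used in the paper.
import numpy as np

def gate_cells(ki67, tubb3, ki67_gate, tubb3_gate):
    ki67 = np.asarray(ki67)
    tubb3 = np.asarray(tubb3)
    out = np.full(ki67.shape, "undefined", dtype=object)
    out[(ki67 < ki67_gate) & (tubb3 >= tubb3_gate)] = "neuron"
    out[(ki67 >= ki67_gate) & (tubb3 < tubb3_gate)] = "NPC"
    return out

ki67 = [50, 900, 60, 850, 700]
tubb3 = [1200, 100, 1100, 950, 900]
print(gate_cells(ki67, tubb3, ki67_gate=400, tubb3_gate=500))
# the last two cells are positive for both markers and stay "undefined"
```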

| iPSC-derived differentiation staging using morphology profiling.

(A) Schematic overview of guided vs. spontaneous neural differentiation. (B) Representative images of morphological staining and post-hoc IF of primed and differentiated iPSC-derived cultures (guided differentiation). Ground-truth determination was performed using TUBB3 (mature neurons) and Ki67 (mitotic progenitors). Scale bar is 200 µm. (C) Fraction of neurons vs. NPCs in the primed vs. differentiated condition as determined by IF staining. Upon guided differentiation, the fraction of neurons increased. (D) Following feature extraction (handcrafted features), supervised UMAP dimensionality reduction shows separation between the primed vs. differentiated cultures. (E) CNN performance when classifying neurons (Ki67-/TUBB3+) vs. NPCs (Ki67+/TUBB3-) using either a condition-based or cell-based ground truth. Mann-Whitney U test, p-value 4.04e-4. (F) Fractional abundance of predicted cell phenotypes (NPCs vs. neurons) in primed vs. differentiated culture conditions using the cell-based CNN. (G) Representative images of spontaneously differentiating neural cultures. Scale bar is 200 µm. (H) Handcrafted features were extracted from the morphological images and the resulting profiles were plotted onto the UMAP feature space created using the guided differentiation (panel D). Plotting the points of spontaneously differentiating cultures onto this space revealed a gradual displacement of the morphological profile from the NPC towards the neuronal cluster with increasing differentiation time. (I) Quantification of the number of points in each cluster of panel H. (J) Prediction of differentiation status using the cell-based CNN model trained on the guided differentiated culture.

In a first attempt to classify cells within the guided culture, we trained a CNN on individual cell inputs using the culture condition as a weak label. This resulted in a prediction accuracy of 87.0±0.9%, which is not unexpected given the cellular heterogeneity within each condition. When we used the IF ground-truth labels instead, we obtained a classification performance of 96.0±0.5% (Fig. 5E). Applying this cell-based (as opposed to condition-based) CNN to the two culture conditions resulted in a predicted fractional abundance of neurons to NPCs of 40.5% in images of the primed condition and 74.2% in images of the differentiated condition – aligning well with the manually defined ratios (Fig. 5F).

We then went on to apply the established cell-based classifier to a primed neuronal cell culture undergoing gradual spontaneous differentiation after dual-SMAD inhibition25. We examined cell cultures at 13, 30, 60 and 90 days in vitro (DIV) after the start of the differentiation process and visually confirmed a gradual change in neural maturity, as evidenced by the increasing dendritic outgrowth (Fig. 5G). In the absence of post-hoc ground-truth labelling, we plotted the cells of the different DIVs onto the UMAP space of the guided differentiation and quantified their relative abundance in regions encompassing either NPCs or neurons (Fig. 5H-I). This confirmed a shift in the neuron-to-NPC fractional abundance, with an especially strong accumulation of neurons at DIV90. The cell-based CNN model trained on the guided differentiation dataset recapitulated this shift in its predictions (Fig. 5J).
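The projection-based staging can be sketched as follows: fit an embedding on the guided-differentiation profiles, transform the unlabelled time-course cells into that space, and count how many land closer to the NPC or the neuron cluster centre. PCA stands in for the supervised UMAP used in the paper, and all profiles are synthetic.

```python
# Project unlabelled time-course profiles into an embedding fitted on the
# guided differentiation, then assign each cell to the nearest cluster centre.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
npc_ref = rng.normal(0, 1, (200, 40))      # synthetic NPC profiles
neuron_ref = rng.normal(2, 1, (200, 40))   # synthetic neuron profiles
reducer = PCA(n_components=2).fit(np.vstack([npc_ref, neuron_ref]))

centres = {
    "NPC": reducer.transform(npc_ref).mean(axis=0),
    "neuron": reducer.transform(neuron_ref).mean(axis=0),
}

# hypothetical late-timepoint culture: mostly neuron-like profiles
div90 = np.vstack([rng.normal(2, 1, (90, 40)), rng.normal(0, 1, (10, 40))])
emb = reducer.transform(div90)             # same fitted embedding, no refit
assigned = [
    min(centres, key=lambda k: np.linalg.norm(pt - centres[k])) for pt in emb
]
print(f"neuron fraction at DIV90: {assigned.count('neuron') / len(assigned):.2f}")
```

Crucially, the reducer is fitted once on the guided dataset and only `transform` is applied to new timepoints, so all conditions share the same coordinate system.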

Nucleocentric phenotyping can be applied to characterize mixed iPSC-derived neuronal cultures

Next to NPCs and neurons, iPSC-derived neuronal cultures are often studied in conjunction with other relevant cell types that influence neuronal connectivity and homeostasis, such as astrocytes and microglia26. Therefore, we tested whether the cell-based approach could be extended to these cell types as well. We generated iPSC-derived astrocytes, neurons and microglia from the same parental iPSC line (Fig. 6A) and trained a CNN to identify each cell type. In monocultures, this led to a prediction accuracy of 96.8±1.0% (Fig. 6B), and in mixed cultures of neurons and microglia (using NeuN and Iba1, respectively, as ground-truth IF markers) (Fig. 6C) an accuracy of 96.9±0.5% (Fig. 6D). Based on these results, we conclude that nucleocentric phenotyping can be used to gauge the composition of iPSC-derived cultures.

| iPSC cell type identification using morphology profiling.

(A) Representative images of iPSC-derived neurons, astrocytes and microglia with morphological staining. Scale bar is 200 µm. (B) Prediction accuracy of a CNN trained to classify monocultures of iPSC-derived astrocytes, microglia and neurons (panel A), with confusion matrix (average of all models). (C) Representative images of a mixed culture of iPSC-derived neurons and microglia. Ground-truth identification was performed using IF. (D) Accuracy of a CNN trained to identify both cell types in mixed culture reached up to 98%.

Discussion

iPSC-derived cell cultures can improve the translatability of biomedical research but represent a challenge due to their higher variability and multicellular composition. In this work, we have presented a method to efficiently identify neural cell types in mixed cultures to aid in their validation and application in routine screening settings. We first benchmarked our approach using neural cell lines and found that a CNN-based approach outperforms shallow learning (RF) based on handcrafted feature extraction, even with a limited number of input images. This aligns well with recent data showing that CNN-based feature representations outperform shallow techniques for condition-based drug response prediction27.

We found that all channels contribute to the overall prediction accuracy but that the whole cell is not required to obtain the highest prediction accuracy, especially in dense cultures. In line with earlier studies that highlight the biomarker potential of the nucleus for patient stratification28, cell state identification29 or drug response14, we found that the nuclear ROI as such already carries highly distinctive information for cell type prediction. However, extension to its direct surroundings proved to be the best and most robust input across a range of cell densities. Since errors made by the CNN largely arose from segmentation errors, we hypothesize that such errors will be most pronounced in dense cultures. The nucleocentric approach, which is based on more robust nuclear segmentation, minimizes such mistakes. Furthermore, it opens possibilities for applying cell profiling to even more complex 3D cell systems such as tissue slices or organoids, where cell segmentation becomes extremely challenging.

One important conclusion from this work is that cell types can be identified in mixed cultures solely using the input from monocultures. This implies that cells retain their salient characteristics even in complex heterogeneous environments. While for now predictions are still superior when based on training on mixed cultures, we found that the prediction accuracy of monoculture-trained models can be increased by employing replicate controls. This suggests that it may become possible to apply or adapt existing models without the need for a cell-level ground truth (as provided by post-hoc labelling). This technique could be of use for cultures or tissues where no antibody- or pre-labelling is possible (e.g., no unique IF marker is available, non-replicating cells). To increase the accuracy, one could resort to intelligent data augmentation30 or transfer learning31 strategies. While a ground-truth-independent method holds promise for bulk predictions (e.g., for quality control purposes), the use of post-hoc labelling allows building more refined models that can distinguish multiple cell types and/or cell states at once. Especially with the cyclic staining that we have used22, much richer information content can be gained. With the method presented here, we combine the rich multidimensional information from cyclic IF with morphological information and show that it is possible to predict these IF outcomes. Multiplexed imaging has previously shown its utility in gaining in-depth information on culture composition and differentiation status in iPSC-derived neural cultures (progenitors, radial glia, astrocytes (immature and mature) and neurons (inhibitory and excitatory))32. Similarly, this could be expanded to cell states (apoptosis, DNA damage)15. Initial model training requires both CP and multiplexed IF images.
After the initial training phase, the IF step can be omitted, and the CNN will predict the phenotype outcomes of the IF with high accuracy on the CP images alone. This significantly reduces experimental costs and time requirements.

Applying the method to iPSC-neuronal cultures revealed its potential to score their differentiation state. Although guided protocols manage to speed up the differentiation process and lead to enhanced culture purity, the neural differentiation process proves to be less than binary, as evidenced by the mixed presence of Ki67+/TUBB3- and Ki67-/TUBB3+ cells. The spontaneous differentiation protocol illustrated the unpredictable nature and late stage of the differentiation process. Reports in the literature highlight the difficulty of reproducible neural differentiation and attribute this to culture conditions, cultivation time and variation in developmental signalling pathways in the source iPSC material33,34. Spontaneous neural differentiation has previously been shown to require approximately 80 days before mature neurons arise that can fire action potentials and show neural circuit formation. Although these differentiation processes display a stereotypical temporal sequence25, the exact timing and duration may vary. Yet, we could reliably predict the gradual transition to a more differentiated state using our CNN model. Complementarily, the same transition from NPC to neuronal phenotype could be observed in UMAP space. Although UMAP dimensionality reduction merely visualizes data structure and is less sensitive than CNN classification, it aids in the visualization and comprehension of the dataset. We realize that differentiating iPSC cultures are highly heterogeneous and are composed of a landscape of transitioning progenitors in various stages of maturity that our binary models currently fail to capture. To predict the maturation of iPSC-derived neural progenitors into differentiated neurons more continuously, it could be of interest to consider deep learning-based regression approaches35.

For now, we have tailored models to the individual datasets, but it is conceivable that a more generalist CNN could be established for multiple culture types. This would obviously demand a much larger dataset to encompass the variability encountered in such models (e.g., various starting iPSC lines, various differentiation protocols). Publicly available datasets (e.g., JUMP-Cell Painting Consortium) can aid in creating an input database containing a large variability (different iPSC lines, different neural differentiation protocols, …), which would ultimately lead to a more robust and generalist predictor. Our results showing the prediction accuracy of a guided differentiation model on spontaneously differentiating cultures indicate that the approach can be transferred to other differentiation protocols. Inclusion of more input images and variability should thus enable developing a generalist model for other differentiation protocols without the need for ground truth validation and further CNN training.

In conclusion, we present a novel application for unbiased morphological profiling by extending its use to complex mixed neural cultures using sequential multispectral imaging and convolutional network-informed cell phenotype classification. We show that the resulting predictors are robust with respect to biological variation and cell culture density. This method holds promise for use in quality control of iPSC cultures to allow their routine use in high-throughput and high-content screening applications.

Methods

Cell culture

Cells were cultured at 37 °C and 5% CO2. 1321N1 and SH-SY5Y cell lines were maintained in DMEM-F12 + Glutamax (Gibco, 10565018) supplemented with 10% Fetal Bovine Serum (Gibco, 10500064). Cell seeding prior to imaging was done in 96-well black multiwell plates with #1.5 glass-like polymer coverslip bottoms (Cellvis, P96-1.5P). Only the inner 60 wells were used, while the outer wells were filled with PBS−/− to avoid plate effects. Plates were coated with Matrigel (Corning, 734-1440). After seeding, the imaging plate was centrifuged at 100 g for 3 min.

iPSCs (Sigma Aldrich, iPSC0028 Epithelial-1) were cultured on Matrigel (Corning, 734-1440) in Essential 8 medium (Gibco, A1517001). Upon cell thawing, Rock inhibitor (Y-27632 dichloride, MedChem, HY-10583) was added at a concentration of 10 µM. Subculturing of iPSCs was performed with ReLeSR (Stemcell Technologies, 05872).

Unguided differentiation25 of iPSCs to neural progenitor cells (NPCs) was started by subculturing the iPSCs as single cells using Tryple Express Enzyme (Life Technologies, 12605010) at a density of 10e4 cells/cm² in mTesR1 medium (Stemcell Technologies, 85850) with Rock inhibitor. The following day, differentiation to NPCs was started by dual SMAD inhibition in neural maintenance medium (1:1 Neurobasal (Life Technologies, 21103049):DMEM-F12+Glutamax (Gibco, 10565018), 0.5× Glutamax (Gibco, 35-050-061), 0.5% MEM Non-Essential Amino Acids Solution (Gibco, 11140050), 0.5% Sodium Pyruvate (Gibco, 11360070), 50 μM 2-Mercaptoethanol (Gibco, 31350010), 0.025% Human Insulin Solution (Sigma Aldrich, I9278), 0.5× N2 (Gibco, 17502048), B27 (Gibco, 17504044), 50 U/ml Penicillin-Streptomycin (Gibco, 15140122)) supplemented with 1 μM LDN-193189 (Miltenyi, 130-106-540) and SB431542 (Tocris, 1614). Daily medium changes were performed for 11 consecutive days. Following neural induction, neural rosettes were selected with STEMdiff Neural Rosette Selection Reagent (Stemcell Technologies, 05832). Maintenance of neural progenitor cells was performed in neural maintenance medium with 10 µM bFGF (Peprotech, 100-18C) added after subculturing. Single-cell detachment of NPCs during maintenance was performed using Tryple Express Enzyme. Cell seeding prior to imaging was done in 96-well black multiwell plates µCLEAR (Greiner, 655090) coated with Poly-L-ornithine (Sigma-Aldrich, P4957) and laminin (Sigma-Aldrich, L2020). Only the inner 60 wells were used, while the outer wells were filled with PBS−/− to avoid plate effects. After seeding, the imaging plate was centrifuged at 100 g for 3 min.

Guided iPSC differentiation to neurons was performed according to Bell et al. (2019)2. The initial phase of neural induction consisted of 12 days in neural induction medium (DMEM-F12+Glutamax (Gibco, 10565018), 1× N2 (Gibco, 17502048), 1× B27 (Gibco, 17504044), 1 mg/ml BSA (Sigma-Aldrich, A7979), 0.5% MEM Non-Essential Amino Acids Solution (Gibco, 11140050)). During the first 7 of these 12 days, the medium was supplemented with dual-SMAD inhibitors (1 μM LDN-193189 (Miltenyi, 130-106-540), SB431542 (Tocris, 1614)). Following neural induction, the NPCs were floated in uncoated MW6 culture plates in NPC medium (DMEM-F12+Glutamax (Gibco, 10565018), 1× N2 (Gibco, 17502048), 1× B27 (Gibco, 17504044), 10 µM bFGF (Peprotech, 100-18C), 10 µM EGF (Peprotech, 100-47)). After 4 days, NPC clusters of appropriate size were filtered using a cell strainer (37 µm cell strainer, Stemcell Technologies, 27250) and plated on Matrigel-coated (Corning, 734-1440) MW6 culture plates. NPCs could then be expanded in NPC medium. Guided differentiation into forebrain neurons was induced by switching to neuronal medium (BrainPhys (STEMCELL Technologies, 05792), 1× N2 (Gibco, 17502048), 1× B27 (Gibco, 17504044), 10 µM BDNF (PeproTech, AF-450-02), 10 µM GDNF (PeproTech, AF-450-02)) for 15 days before fixation.

Differentiation of iPSC to microglia4 was performed by forming embryoid bodies (EBs) of 10e3 iPSCs/well in a 96-well U-bottom plate (Corning, 351177) coated with Anti-Adherence Rinsing Solution (Stemcell Technologies, 07010), in mTeSR medium supplemented with Rock inhibitor, 50 ng/mL BMP4 (Peprotech, 120-05), 50 ng/mL VEGF (Peprotech, 100-20) and 20 ng/mL SCF (Peprotech, 250-03). 75% of the medium was changed for 4 consecutive days.
After mesoderm induction, EBs were transferred to a 6-well plate at 20 EBs/well in macrophage precursor medium (X-vivo15 (Lonza, BE02-060Q), 100 ng/mL M-CSF (Peprotech, 300-25), 25 ng/mL IL-3 (Peprotech, 213-13), 1× Glutamax, 50 U/ml Penicillin-Streptomycin, 50 μM 2-Mercaptoethanol). After 14 days of macrophage differentiation, macrophage precursors were harvested using a cell strainer (Stemcell Technologies, 27250) and added to the NPC culture in 1:1 neural maintenance medium:microglia medium (DMEM-F12+Glutamax, 100 ng/mL M-CSF (Peprotech, 300-25), 100 ng/mL IL-34 (Peprotech, 200-34), 1× Glutamax, 50 U/ml Penicillin-Streptomycin, 50 μM 2-Mercaptoethanol).

Replication labelling

Prior to co-seeding of 1321N1 and SH-SY5Y mixed cultures, the individual cultures were incubated with either 10 µM EdU (Click-iT® EdU Imaging Kit, Life Technologies, C10340) or 10 µM BrdU (Sigma-Aldrich, B5002) for 24 h. This incubation time exceeded the doubling time, allowing incorporation of the nucleotide analogue in all cells. The labelling period was followed by a 24 h washout in regular cell culture medium, after which the cells were subcultured and plated in coculture. In half of the replicate wells, SH-SY5Y cells received BrdU and 1321N1 cells EdU; in the other half, the labels were switched between the cell types.

Morphological staining

Morphological staining (Cell Painting) was adapted from Bray et al. (2016)13. After careful titration, all dye concentrations were adjusted and evaluated for compatibility with the 4-channel laser and filter combinations available on the confocal microscope (see Image acquisition). Staining was performed on cell cultures fixed in 4% PFA (Roti-Histofix 4% paraformaldehyde, Roth, 3105.2) for 15 min. Cells were rinsed once with PBS−/− (Life Technologies, 10010015) prior to fixation and 3× 5 min after fixation. Staining solutions were prepared fresh before staining in PBS−/− with 0.3% Triton-X-100 (Sigma Aldrich, X100) (Table 1). Each staining solution was incubated for 30 min on a shaker at RT in the dark. After staining, the cells were washed 1× with PBS−/− and sealed for imaging.

Specifications of morphological staining composition.

Cyclic staining and immunocytochemistry

Cyclic staining was performed by quenching fluorescence after each sequential imaging round. A 1 mg/ml solution of LiBH4 in ddH2O22 (Acros Organics, 206810050) was prepared fresh before use. The quenching solution was incubated for 1.5 h before each successive staining series, after which it was removed by washing 3× 5 min in PBS−/−. Successful fluorescence quenching was verified microscopically before proceeding with immunofluorescence (IF) staining (Table 2). Cells were treated with PAV blocking buffer (Thimerosal 0.5% (Fluka, 71230), NaN3 0.1% (Merck, k 6688), bovine serum albumin (Sigma-Aldrich, A7284), normal horse serum, PBS−/−) for 8 min. The primary antibodies (pAB) were diluted in PAV blocking buffer and incubated for 12 h (overnight) at 4°C, after which the cells were washed 1× 5 min in PBS−/− and incubated in secondary antibody (sAB) solution in PAV with DAPI as nuclear counterstain. sAB staining was performed for 3 h at RT while shaking. Prior to imaging, the cells were washed 2× with PBS−/− and stored in PBS−/− + 0.1% NaN3.

Used antibodies.

BrdU staining was performed using IF, requiring DNA denaturation before pAB incubation. This was achieved by a 10 min incubation with 2 N HCl at 37°C, after which the HCl was neutralized with 0.1 M sodium borate buffer (pH 8.5) for 30 min at RT. Cells were washed 3× 5 min in PBS−/− before continuing with the general IF protocol. EdU Click-iT labelling was performed according to the manufacturer's instructions (Click-iT® EdU Imaging Kit, Life Technologies, C10340).

Image acquisition

Images were acquired using a spinning disk confocal microscope (Nikon CSU-W1 SoRa) with a 20× 0.75 NA objective (Plan APO VC 20× DIC N2) and a Kinetix sCMOS camera (pinhole 50 µm; disk speed 4000 rpm; pinhole aperture 10; bit depth 12-bit; pixel size 0.325 µm²). We opted for confocal microscopy instead of widefield to overcome image quality limitations caused by highly dense cell clusters. 96-well plates were scanned, capturing a minimum of 16 images per well spread in a regular pattern (0.8 mm spacing in x and y) across the surface of the well. If multiple z-slices were acquired (to correct for surface inclinations in the field of view), a maximum projection was performed before further analysis. Morphological images were acquired in all 4 channels (Table 3).
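Where multiple z-slices were acquired, the projection step amounts to a per-pixel maximum over z. A minimal NumPy sketch (the (Z, C, Y, X) array layout and function name are our assumptions, not taken from the acquisition software):

```python
import numpy as np

def max_project(zstack: np.ndarray) -> np.ndarray:
    """Collapse a (Z, C, Y, X) stack to (C, Y, X) by per-pixel maximum
    projection, retaining the in-focus signal when the surface is inclined."""
    return zstack.max(axis=0)

# Toy stack: 3 z-slices, 4 channels, 32x32 px (sizes are illustrative)
stack = np.random.rand(3, 4, 32, 32)
projection = max_project(stack)
```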

Specifications of the used laser lines, excitation and emission filters.

Software

Images were captured with the Nikon CSU-W1 confocal system in combination with the NIS-Elements software (RRID:SCR_014329). Image visualization was later performed using Fiji freeware36,37. In-depth image analysis, pre-processing and machine learning for cell classification were performed in the Python programming language38 in combination with Anaconda39 (distribution and package management software) and Visual Studio Code (code editor). The packages and versions used for data analysis are listed in Table 4.

Python packages used for image and data analysis alongside the software version.

Image pre-processing

Cell/nuclear segmentation

Using a tailor-made script implementing the Cellpose40 (cell segmentation) or StarDist41 (nuclear segmentation) package, images were pre-processed by normalizing the intensity values of each channel between the 1st and 99th percentile. Individual channels or channel combinations could be selected for segmentation depending on the desired output mask. Outputs included the segmentation mask alongside quality control images. For the StarDist implementation, the probability hyperparameter was set to 0.6 and the overlap to 0.3. For Cellpose segmentation, 4 models (cyto2) were averaged to obtain the final mask. Segmentation was performed on the composite of all CP channels. An estimate of the cell diameter could be provided to optimize cell detection.
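The per-channel percentile normalization applied before segmentation can be sketched as follows (a NumPy sketch with illustrative function names; the actual Cellpose/StarDist calls are omitted):

```python
import numpy as np

def percentile_normalize(channel: np.ndarray, p_low=1, p_high=99) -> np.ndarray:
    """Rescale one channel to [0, 1] between its 1st and 99th percentile,
    clipping outliers, before handing the image to Cellpose or StarDist."""
    lo, hi = np.percentile(channel, [p_low, p_high])
    return np.clip((channel - lo) / (hi - lo + 1e-8), 0.0, 1.0)

def normalize_image(image: np.ndarray) -> np.ndarray:
    """Apply percentile normalization per channel to a (C, Y, X) image."""
    return np.stack([percentile_normalize(c) for c in image])

img = np.random.default_rng(0).random((4, 64, 64)) * 4095  # 12-bit-like input
norm = normalize_image(img)
```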

Ground truth alignment

Following sequential staining and imaging rounds, multiple images were captured representing the same cell with different markers. All images were aligned using Fourier-based image cross-correlation on the intensity-normalized multichannel images. The translational shift between the two images was determined using scikit-image's phase cross-correlation, after which the second image was shifted accordingly to align the morphological with the ground-truth images.
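A minimal sketch of this registration step, using scikit-image's `phase_cross_correlation` and SciPy's `ndimage.shift` on a synthetic image pair (the helper name is ours, not from the published scripts):

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.registration import phase_cross_correlation

def align_to_reference(reference: np.ndarray, moving: np.ndarray):
    """Estimate the translation between two imaging rounds by Fourier-based
    phase cross-correlation and shift the moving image into register."""
    shift, _error, _phasediff = phase_cross_correlation(reference, moving)
    return ndi.shift(moving, shift), shift

# Synthetic example: round 2 is round 1 translated by (5, -3) pixels
rng = np.random.default_rng(0)
round1 = rng.random((64, 64))
round2 = np.roll(round1, shift=(5, -3), axis=(0, 1))
aligned, shift = align_to_reference(round1, round2)
```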

Ground truth phenotyping

The true cell phenotype was determined by the fluorescence intensity of the post-hoc immunostaining with class-specific markers. Each ground truth image was imported alongside the corresponding cell mask for that image. For each cell label, the fluorescence intensity was determined and tabulated. The threshold was set manually based on the fluorescence intensity of the monoculture controls. Ground truth labels were assigned to each region of interest (ROI).
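The per-ROI intensity tabulation and thresholding can be sketched as follows (toy mask and threshold values; in practice the threshold was set manually on the monoculture controls):

```python
import numpy as np

def assign_ground_truth(mask: np.ndarray, if_image: np.ndarray, threshold: float):
    """Tabulate the mean immunofluorescence intensity per segmented ROI and
    call each ROI marker-positive or -negative by thresholding."""
    calls = {}
    for lab in np.unique(mask):
        if lab == 0:            # 0 encodes background in the label mask
            continue
        calls[int(lab)] = bool(if_image[mask == lab].mean() > threshold)
    return calls

# Toy example: two ROIs, one marker-positive and one marker-negative
mask = np.zeros((10, 10), dtype=int)
mask[0:3, 0:3] = 1
mask[5:8, 5:8] = 2
if_image = np.where(mask == 1, 200.0, 0.0) + np.where(mask == 2, 20.0, 0.0)
calls = assign_ground_truth(mask, if_image, threshold=50.0)
```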

Data filtering

Over the entire pipeline, ROIs could be discarded based on 3 conditions: (1) Cell detachment. Due to the harsh sample preparation and repeated washing steps, cells could detach from the imaging substrate, resulting in a lack of ground-truth information for those cells. Consequently, all ROIs for which no DAPI signal was detected in the ground-truth image were removed from the dataset as incomplete. (2) Ambiguous ground-truth fluorescence intensity. Ground-truth labels were determined based on the presence of class-specific IF staining; if no class-specific IF staining was detected by thresholding, no true label could be assigned and the ROI was discarded. (3) Likely faulty ROI detection. The DAPI signal was used to discard ROIs that did not represent (complete) cells, by thresholding on minimal DAPI intensity and minimal nuclear size.
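The three exclusion criteria can be expressed as a single filter over the tabulated ROI records (a sketch; field names and thresholds are hypothetical placeholders, not values from the published pipeline):

```python
def keep_roi(roi: dict, min_dapi: float = 500.0, min_area: float = 40.0) -> bool:
    """Apply the three exclusion criteria to one tabulated ROI record."""
    if roi["gt_dapi"] <= 0:          # (1) cell detached: no DAPI in GT round
        return False
    if roi["gt_label"] is None:      # (2) ambiguous ground-truth staining
        return False
    if roi["dapi"] < min_dapi or roi["nuc_area"] < min_area:
        return False                 # (3) likely faulty ROI detection
    return True

rois = [
    {"gt_dapi": 900, "gt_label": "neuron", "dapi": 800, "nuc_area": 90},
    {"gt_dapi": 0,   "gt_label": "neuron", "dapi": 800, "nuc_area": 90},
    {"gt_dapi": 900, "gt_label": None,     "dapi": 800, "nuc_area": 90},
    {"gt_dapi": 900, "gt_label": "NPC",    "dapi": 100, "nuc_area": 20},
]
kept = [r for r in rois if keep_roi(r)]
```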

ROI classification

Train-validation-test split

Model training requires splitting the available data into 3 subsets: training (60%), validation (10%) and testing (30%). The training dataset was used to train the machine learning models (either RF or CNN). The validation dataset, comprising 10% of the total data, was used to tune the hyperparameters and for intermediate testing. The remaining 30% of the dataset was kept apart and used to test the final model once training was completed. For both RF and CNN, the testing dataset was never shown to the model during the training phase, but only used afterwards to compare the predictions with the ground truth. The number of instances per class was equalized by sampling an equal number of instances from each predicted class. To account for variation between technical replicates, the train-validation-test split was stratified per well; as a result, no datapoints arising from the same well could appear in more than one of the training, validation and testing subsets. This data stratification was repeated 3 times with different random seeds (see Methods – Reproducibility).
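A well-stratified split of this kind can be implemented with scikit-learn's group-aware splitters (a sketch under our own naming; the published scripts on GitHub may differ in detail):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def well_stratified_split(wells: np.ndarray, seed: int = 0):
    """60/10/30 train/validation/test split in which all ROIs from a given
    well end up in the same subset (group-aware splitting on the well ID)."""
    n = len(wells)
    holdout = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=seed)
    trainval_idx, test_idx = next(holdout.split(np.zeros(n), groups=wells))
    # 10% of the total corresponds to 1/7 of the remaining 70%
    inner = GroupShuffleSplit(n_splits=1, test_size=1 / 7, random_state=seed)
    tr, va = next(inner.split(np.zeros(len(trainval_idx)),
                              groups=wells[trainval_idx]))
    return trainval_idx[tr], trainval_idx[va], test_idx

wells = np.repeat(np.arange(20), 50)   # toy data: 20 wells, 50 ROIs each
train, val, test = well_stratified_split(wells)
```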

Random Forest

For each ROI, a set of manually defined parameters was extracted corresponding to cell shape (area, convex area, filled area, minor axis length, major axis length, centroid, eccentricity, equivalent diameter area, maximal Feret diameter, orientation, perimeter), texture (GLCM: contrast, dissimilarity, homogeneity, energy, correlation, ASM), intensity (maximal intensity, minimal intensity, mean intensity, intensity standard deviation) and distribution (solidity). This was done for 3 regions (nucleus, cytoplasm and whole cell), and for all channel dimensions. Redundant parameters were removed. All parameters were standardized per ROI, grouped per replicate. The number of trees in the forest was varied between 10 and 150, reaching maximum accuracy at around 30 trees.
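The classifier itself reduces to a standard scikit-learn random forest on the standardized feature matrix; a sketch on synthetic stand-in features (the real features came from scikit-image regionprops and GLCM measurements, omitted here):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Stand-in for the handcrafted feature matrix (rows = ROIs, columns = shape,
# texture, intensity and distribution features per region and channel)
X = rng.normal(size=(400, 25))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy two-class labels

X = StandardScaler().fit_transform(X)            # standardize the features
rf = RandomForestClassifier(n_estimators=30, random_state=0)  # ~30 trees
rf.fit(X[:300], y[:300])
accuracy = rf.score(X[300:], y[300:])
```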

Uniform Manifold Approximation and Projection

Dimensionality reduction using UMAP was performed on the same feature matrix as used for RF prediction. Hyperparameters were left at their default values.

Convolutional neural network

A ResNet50 model was trained for image classification. In contrast to classical machine learning techniques, no handcrafted features were extracted. Cell crops (60 µm for the whole cell; 15 µm for the nucleocentric/nuclear area) were defined based on the segmentation mask for each ROI. Each ROI was cropped out of the original morphological image and associated with metadata corresponding to its ground-truth label. Images alongside their labels were fed to the network, and tensors were normalized per channel. Models were trained on a minimum of 5000 training inputs per class for 50 epochs (training iterations) with a batch size of 100. The training input was augmented by linear transformations (horizontal and vertical flips, random rotation). After each epoch, the current model was tested against the validation dataset; the performance on this validation subset determined whether the model was stored as the new best (if its accuracy exceeded that of the previous best model) or discarded. The initial learning rate was set to 0.0001 and automatically reduced by a factor of 0.1 during training when no improvement was seen for 10 epochs. After 50 epochs, the best resulting model was tested on the separate test dataset to determine the accuracy on previously unseen data.

Reproducibility

Each model training was performed 3 independent times; this was repeated for 3 different random seeds. Each model received input data arising from a minimum of 16 images per well and at least 15 technical replicates (wells). The optimization experiments (Figures 1-4) were performed with cell lines, which display limited variability. These models were trained on 3 independent experiments in which ground-truth pre-labelling (EdU/BrdU) was performed at least once on either of the cell lines in coculture. For iPSC-derived cultures, where variability is inherent to the differentiations, 3 biological replicates (independent differentiations) were pooled for model training.

Statistics

All statistical comparisons were performed using nonparametric tests: Mann-Whitney U for comparing two independent samples, or Kruskal-Wallis for multiple-sample comparisons followed by pairwise testing with Tukey's honestly significant difference test. We opted for nonparametric testing because the number of models in each group to be compared was < 15. Significance levels are indicated in the figures as ns (not significant, p-value above 0.05), * (p-value between 0.05 and 5e-4), ** (p-value between 5e-4 and 5e-6) and *** (p-value below 5e-6).
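The tests and the significance notation map directly onto SciPy (the accuracy values are illustrative, and `stars` is our helper name):

```python
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

rng = np.random.default_rng(0)
# Accuracies of two small groups of models (n < 15 per group, hence the
# nonparametric tests); values here are purely illustrative
group_a = rng.normal(0.90, 0.02, size=9)
group_b = rng.normal(0.80, 0.02, size=9)

u_stat, p_two = mannwhitneyu(group_a, group_b)            # two groups
h_stat, p_kw = kruskal(group_a, group_b, group_a - 0.05)  # > two groups

def stars(p: float) -> str:
    """Map a p-value to the manuscript's significance notation."""
    if p >= 0.05:
        return "ns"
    if p >= 5e-4:
        return "*"
    if p >= 5e-6:
        return "**"
    return "***"
```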

Funding

This work was funded by Fonds Wetenschappelijk Onderzoek Vlaanderen (1SB7423N; 1274822N), BOF (FFB210009) and IOF Uantwerpen (FFI210239; FFI230099) and VLAIO (HBC.2023.0155).

Acknowledgements

We thank Marlies Verschuuren and Hugo Steenberghen for their assistance and knowledge regarding the cell lines. By extension, we would like to thank all current and former members of the De Vos lab.

Data availability statement

The authors report that the results of this study are available within the manuscript and supplementary materials. All image analysis scripts are openly available on GitHub (https://github.com/DeVosLab).

Conflict of interest

The authors declare no conflict of interest.

Abbreviations

BrdU: Bromodeoxyuridine; CNN: convolutional neural network; CP: cell painting; DAPI: 4’,6-diamidino-2-fenylindool; DIV: days in vitro; EdU: 5-ethynyl-2’-deoxyuridine; ER: endoplasmic reticulum; Grad-CAM: Gradient-weighted Class Activation Mapping; IF: immunofluorescence; iPSC: induced pluripotent stem cell; NPC: neural progenitor cell; RF: random forest; ROI: region of interest; TUBB3: beta-III-tubulin; UMAP: Uniform Manifold Approximation and Projection

Author contributions

The experiments were conceptualized by SDB, TVDL, JVDD, PP and WDV. Experiments were executed by SDB and JVDD. TVDL and SDB optimized the data analysis scripts. SDB analysed the imaging data and performed the data analysis. SDB and WDV prepared the original draft. PP and WDV supervised the work. All authors took part in reviewing and editing the manuscript.

| The influence of cell density on segmentation quantity and quality.

(A) Representative images of 1321N1 cells with increasing density alongside their cell and nuclear masks, produced using Cellpose and StarDist, respectively. (B) The number of ROIs detected in comparison to the ground truth (manual segmentation) (determined as intersection over union (IoU) < 0.15). (C) ROI detection quality (IoU) for increasing cell density, for cell and nuclear masks.

| Fluorescence quenching over time using LiBH4.

(A) Images before and after quenching for all 4 fluorescence channels. (B) Time course of normalized image-level fluorescence intensity during incubation with 1 mg/ml LiBH4.

| Methods.

(A) Pipeline used for morphological profiling in mixed cultures. (B) Overview of the steps within the image analysis pipeline. (C) Evaluation of accuracy, true negative and true positive rate during CNN training (1321N1 vs. SH-SY5Y in monocultures) across all 50 epochs.

| Additional figures.

(A) PCA dimension reduction on handcrafted features extracted from monocultures of 1321N1 and SH-SY5Y cells. In contrast to UMAP dimension reduction, this linear approach does not result in clear cell type clustering. (B) Examples of misclassified ROIs. (C) Definition of the cell regions given as training input for nuclear and nucleocentric model training. (D) Decreasing nuclear area as a function of culture density. (E) Evaluation of cell phenotype prediction on an image of differentiating neurons (guided differentiation). The predicted phenotype is indicated by the colour of the square around each ROI; the distinctive phenotype is shown in the insets. (F) Additional Grad-CAM images for both cell type classes. (G) Neuron-to-NPC ratio in spontaneously differentiating cultures as predicted by the condition-based classifier (Fig. 5E), in contrast to the cell-based classifier performance on the same dataset (Fig. 5H).