Joint profiling of cell morphology and gene expression during in vitro neurodevelopment
Figures
Overview of the experimental design and Cell Painting (CP) at the single-cell level.
(a) Overview of the experiment. Six induced pluripotent stem cell (iPSC) lines from four donors were differentiated to cortical neural fate and assayed at days 20, 40, and 70 of the differentiation using CP and scRNA-seq. (b) UMAP of the CP dataset at the single-cell level. Cells are colored and labeled by unsupervised Leiden clustering. A composite image representative of cells belonging to each CP cluster is indicated by arrows (staining legend bottom-right). (c) Heatmap of CP feature-module activation scores. 541 cells (minimum cluster size) were drawn from each CP cluster to ensure an equal contribution of each CP cluster to the heatmap. The CP feature modules represent sets of CP features that are highly correlated. The right-most column contains selected highly enriched terms from GSEA of CP feature modules. (d) Differentiation day (time points) projected on the CP UMAP.
Benchmarking of in vitro cell types.
Representative ICC images per donor in differentiation batches 1, 2, and a representative donor from differentiation batch 3, showing canonical marker expression per time point – neural progenitors (NESTIN), early and maturing neurons (TBR1, TUJ1), and astrocytes (GFAP). Scale bars correspond to 50 μm.
Cell Painting (CP) analysis workflow.
(a) Schematic of the CellProfiler segmentation steps. The nucleus channel was used to segment the nuclei, and the three cytoplasmic channels were then summed to detect cell contours, by propagation from the nuclei. (b) Schematic of feature extraction steps and image export. (c) Secondary analysis workflow (processing of the CP feature matrix). (d) Effect of regularization on the CP feature distribution. Original distribution (first row) and regularized distribution (second row) are shown for one feature per family of transformations.
Additional Cell Painting (CP) results.
(a) Well-to-well Pearson correlation of the CP dataset. Heatmap header annotates the corresponding differentiation day (first row) and the cell line-replicate combination (second row), per well. (b) CP feature-to-feature Pearson correlation, grouped by CP feature module. The ‘misc’ module corresponds to unclassified CP features. The feature name nomenclature is as follows: [Compartment]_[Feature family]_[Feature name]_[Channel]_[Texture variation], where ‘Compartment’ indicates where the feature was measured (Cell, Cytoplasm, or Nucleus); ‘Feature family’ describes the broad measure type of the feature (intensity, texture, shape, neighbors); ‘Feature name’ abbreviates the precise name of the measure (e.g. MedInt is median intensity); ‘Channel’ indicates which channel the feature was extracted from (only for intensity or texture features); and ‘Texture variation’ provides information about how the texture was extracted (only for texture features). (c) CP regularized features values corresponding to Cell area (Cel_Shape_Area) and density (Cel_Neighbors_PctTouching_Adjacent) projected on the CP UMAP.
Joint profiling of neural cell types observed in vitro.
(a) UMAP of cell types as identified by scRNA-seq analysis, split by time point. (b) Representative heatmap of the model linking expression of marker genes (columns) to Cell Painting (CP) image-based features (rows). (c) Violin plots of predicted cell cycle scores per CP cluster. (d) Cell type composition from the scRNA-seq dataset. The facet on the left represents aggregated cell type proportions across donors per time point, while the other facets represent individual time points, depicting cell type proportions for each donor. (e) Correlation heatmap of cell types to CP clusters, between average of CP features values per cell cluster and average of predicted CP features from scRNA-seq. Abbreviations: PgS – S phase progenitors, PgG2M – G2M phase progenitors, vRG – ventral radial glia, oRG – outer radial glia, IP – intermediate progenitors, ExDp1/2 – excitatory deep layer neurons 1/2, ExN – migrating excitatory neurons, ExM – maturing excitatory neurons, ExM-U – upper-layer enriched maturing excitatory neurons, InCGE – caudal ganglionic eminence interneurons, InMGE – medial ganglionic eminence interneurons, OPC – oligodendrocyte precursors, Per – pericytes, End – endothelial cells.
Multi-modal analysis workflow.
Schematic of the steps used to link gene expression to Cell Painting (CP) features. Metacells were computed from each experimental population and used to train lasso regression models, one per CP feature. The computed coefficients and intercepts were stored in a coefficient matrix, used for CP feature functional enrichment analysis. The coefficient matrix was also used with the gene expression data to predict the CP feature values of each cell from the scRNA-seq dataset.
Cell type annotations in scRNA-seq.
Faceted panel per fetal-mapped cell type showing distribution of annotated cells on the scRNA-seq UMAP.
In vitro to fetal cell type correlations.
(a) Correlation of marker gene expression between annotated in vitro (query) cell types and fetal (reference) cell types, prior to filtering out unmapped cells (mean R = 0.50, off-diagonal mean R = 0.05). (b) Same correlation as in (a), after filtering out unmapped cells showing higher specificity (mean R = 0.55, off-diagonal mean R = 0.03).
Mitochondrial feature dynamics in Cell Painting (CP) and gene expression.
(a) Predicted pseudotime trajectory across time points in scRNA-seq showing progression from radial glia and progenitors, via intermediate progenitors, toward maturing excitatory and inhibitory neurons. Pseudotime is rooted in the predicted earliest node within day 20. (b) CP UMAPs with differentiation day (left), feature values for mitochondrial intensity (middle), and mitochondrial texture (right) projected. (c) scRNA-seq UMAPs with differentiation day (left), predicted values for the CP features mitochondrial intensity (middle) and texture (right) projected. (d) Gene set variation analysis (GSVA) enrichment scores for glycolysis and oxidative phosphorylation pathways per pseudobulked cell type. Enrichment scores indicate a higher activation score of OxPhos in excitatory neurons, except for the deep layer 1 subtype (ExDp1). (e) GSVA enrichment scores for mitochondria-associated gene ontology (GO) terms per pseudobulked cell type.
Functional enrichment across the developmental trajectory.
(a) Aggregated expression per cell type of modules of genes changing expression as a function of pseudotime. Enrichment of gene ontology (GO) terms (for each of the three sub-ontologies: ‘Biological Process’, ‘Cellular Component’, and ‘Molecular Function’) within the top-activated module for (b) PgG2M (module 15), (c) PgS (module 2), (d) IP (module 21), (e) ExM and ExDp1 (module 20), and (f) InMGE and InCGE (module 17). No significant terms were found for ‘Cellular Component’ within module 21 (d).
Mechanisms of donor-specific inhibitory neuron production.
(a, b) Association plots between the cell lines used in this study and cell type/cluster annotations. x-axis: log2 of the odds ratio, y-axis: AMI (Adjusted Mutual Information): AMI score here measures dependence between proportions of cell types/clusters present and cell line of origin, with an AMI score closer to 1 indicating a high degree of agreement, while values near 0 suggest little to no similarity beyond random chance. (a) Association of cell line abundance with InMGE cells from scRNA-seq at days 40 and 70. Each point represents aggregate expression per technical replicate, donor, and the differentiation protocol used. (b) Association of cell line abundance with the ‘d20.endoRetNeg’ cluster at d20 and ‘d40.endoRet’ at day 40, from Cell Painting (CP) data. Each point represents aggregate expression of all wells per induction replicate and donor. (c) Gene set variation analysis (GSVA) enrichment scores from curated KEGG pathways between experimental populations (donor x time point) with hierarchical clustering on both axes. Annotation bar plot to the right of the heatmap shows the percentage of cells produced by each experimental population annotated as inhibitory cell types (InCGE, InMGE, and InhN-O). (d) Relative expression per donor (log-normalized values) of inhibitory interneuron-associated transcription factors within the progenitor pool (oRG + vRG + PgG2M + PgS) at day 40. The expression per donor is grouped per 10x sample. (e) Expression of interneuron-associated DLX genes and ribosomal subunits RPL39 and RPL19, along pseudotime in two donors shown to produce a high (HEL61.2, left) and a low (HEL11.4, right) proportion of inhibitory neurons, respectively. Point colors represent fetal-annotated cell types.
Characterization of inhibitory neurons produced in vitro.
(a) Dot plot representing the expression of markers of CGE- and MGE-like interneurons across all cell types from the two-step annotation. (b) Gene set variation analysis (GSVA) enrichment scores per pseudobulked cell type for WNT and Hedgehog signaling pathways, ordered from highest to lowest enrichment. (c) Scaled gene expression aggregated per experimental population (time point × cell line) of genes representing key WNT/Hedgehog signaling pathway readouts as per Strano et al., 2020. ‘Ventralized’ experimental populations (producing more inhibitory neurons, Figure 4c) show higher PTCH1 expression as previously reported (Strano et al., 2020), but other genes do not follow reported patterns (higher PTCH1 and GLI1, and lower GAS1 in ventralized differentiations; higher AXIN2 and TNFRSF19 in dorsalized differentiations). (d) GSEA (KEGG) of differentially expressed genes between inhibitory neurons (InMGE, InCGE, and InhN-O) and excitatory neurons (ExDp-O, ExM-U, ExN, ExNeuNew-O, ExDp1,ExM, and ExU-O), at each differentiation time point. (e) Inverse correlation of KEGG ribosome pathway activation (GSVA enrichment score) with the percentage of inhibitory neurons, as seen in our 2D in vitro model and two independent organoid experiments (Bhaduri et al., 2020; He et al., 2024). (f) Representation of the excitatory–inhibitory branching segment within the pseudotime trajectory, as compared to the overall UMAP (shown in gray), and the scaled expression of the inhibitory branch-associated module (module 12) within this segment.
Characterization of low-quality cells within the scRNA-seq dataset.
(a) UMAP of scRNA-seq data across time points with unmapped cells highlighted in red. (b) Enrichment scores from gene set variation analysis (GSVA, across cell types) of key pathways driving the differences between high- and low-quality subsets per pseudobulked cell type. (c) Percentage of cells from the initial unmapped fraction that align to organoid cell types (Bhaduri et al.) per time point. (d) Ridge plots representing the distribution of module activation scores of key differentially activated pathways between pan-neuronal (ExPanNeu-O), excitatory (ExN), and inhibitory (InCGE) neuronal cell types. (e) Differential abundance of cell types produced by the original (left) or the modified (right) versions of the differentiation protocol, computed by miloR. Abbreviations: panRG – pan radial glia, glycoRG – glycolytic radial glia, IPC-Mature – mature intermediate progenitors, ExDp – excitatory deep layer, ExNeuNew – newborn excitatory neurons, ExU – excitatory upper layer, ExPanNeu – pan-neuronal (excitatory), InhN – inhibitory neurons, Astro – astrocytes, hRG – hindbrain radial glia, AstroHindb – hindbrain astrocytes. The suffix ‘-O’ differentiates organoid cell type annotations from their fetal counterparts.
LQ and HQ cell type subsets show similarities in expression and pseudotime.
(a) Dot plot representing the expression of cell-type-specific canonical markers in the HQ and LQ subsets of each cell type. (b) Average pseudotime values per cell type grouped by HQ and LQ subsets, ordered by increasing pseudotime of the HQ subset.
scRNA-seq quality control.
(a) Pearson correlation of cell type composition between differentiation replicates at day 20. Sample labeling is donor-batchRep-techRep-protocol with numbering corresponding to Supplementary file 1c. Pearson correlation as in (a) for samples at day 40 (b) and day 70 (c).
Reproducibility of replicates in scRNA-seq: HEL11.4.
(a) Cell type composition per replicate, per differentiation time point within cells from cell line HEL11.4 (BR – batch replicate, TR – technical replicate, PLivesy – original protocol, PModLivesey – modified protocol). (b) Number of cells per replicate from cell line HEL11.4. (c) Percentage of cells unmapped in the first annotation step or annotated as ‘panRG’/‘ExPanNeu’ (within unmapped cells) in the second annotation step, from cell line HEL11.4.
Reproducibility of replicates in scRNA-seq: HEL47.2.
Same as in Figure 5—figure supplement 3, for cell line HEL47.2.
Reproducibility of replicates in scRNA-seq: HEL61.1.
Same as in Figure 5—figure supplement 3, for cell line HEL61.1.
Reproducibility of replicates in scRNA-seq: HEL61.2.
Same as in Figure 5—figure supplement 3, for cell line HEL61.2.
Reproducibility of replicates in scRNA-seq: HEL62.4.
Same as in Figure 5—figure supplement 3, for cell line HEL62.4.
Reproducibility of replicates in scRNA-seq: HEL82.6.
Same as in Figure 5—figure supplement 3, for cell line HEL82.6.
In vitro neurons capture brain-relevant traits.
(a) Stratified linkage disequilibrium (LD) score regression analysis is shown for selected brain-related traits per cell type from our in vitro differentiation. Tile color represents the corresponding p-values after multiple testing correction across traits and cell types, with the significance level indicated as follows: *pAdj < 0.05, **pAdj < 0.01, and ***pAdj < 0.001. The top-most row represents enrichment across brain-specific genes associated with developmental disorders (n=68), with tile color representing p-values from a one-tailed Fisher’s exact test (*p < 0.05, **p < 0.01, and ***p < 0.001). (b) Stratified LD score regression analysis and Deciphering Developmental Disorders (DDD)-brain enrichment as in (a) per Cell Painting (CP) feature module. (c) Differential abundance of CP clusters between d20 samples from eight healthy individuals and four individuals with Kabuki syndrome (KS). (d) Visualization of gene ontology (GO): Biological Process terms enriched within the top genes predictive of CP clusters which are differentially abundant in either healthy donors (left, downregulated) or KS donors (right, upregulated).
Cell-type specificity of brain- and non-brain-related traits.
(a) Stratified linkage disequilibrium (LD) score regression analysis shown for all analyzed traits (n = 79) per cell type from our in vitro differentiation. Tile color represents the corresponding adjusted p-values after multiple testing correction across traits, within cell types. The significance level is indicated as follows: *pAdj < 0.05, **pAdj < 0.01, and ***pAdj < 0.001. (b) Stratified LD score regression analysis as in (a) per GTEx brain tissue type. (c) Heatmap of normalized expression, pseudobulked per cell type and scaled per gene, across brain-specific genes associated with developmental disorders (Deciphering Developmental Disorders [DDD] study). All DDD genes within the confidence category ‘definitive’ are shown (n = 719). ExDp1, ExM, ExM-U, ExN, InCGE, and InMGE together were considered as mature neuronal cell types, whereas PgS, PgG2M, oRG, and vRG were considered progenitors.
Integration of the two Cell Painting (CP) batches.
(a) Bar plot representing the cell distribution of the annotated CP clusters, from the original CP unsupervised clustering (batch 1), or from the reannotation of cells via the nearest neighbor (batch 2, see Methods). (b) 2D density of the integrated UMAP at day 20 per cell line. The Kabuki syndrome cell lines are indicated in blue, while lines included in batch 2 are indicated with the facet title box in gray.
Gene ontology (GO) enrichment of differentially abundant clusters between Kabuki syndrome (KS) samples and healthy donors.
(a) Visualization of GO: Cellular Component terms enriched within the top genes predictive of Cell Painting (CP) clusters which are differentially abundant in either healthy donors (left, downregulated) or KS donors (right, upregulated). (b) Same as in (a) for GO: Molecular Function terms.
Stratified LD score regression analysis including low-quality cells.
Results of the stratified LD score regression analysis shown for selected brain-related traits per cell type from our in vitro differentiation, prior to filtering for reference mapping thresholds. The adjusted p-values are reduced in comparison with the analysis shown in original Figure 6a, indicating a dilution of signal for the same set of common diseases. Tile color represents the corresponding p-values after multiple testing correction across traits and cell types, with the significance level indicated as follows: pAdj<0.05(*), pAdj<0.01(**) and pAdj<0.001 (***).
Tables
Cell lines used in the study with their corresponding donor information, derivation method, registration (https://hpscreg.eu/), and karyotype cytogenetic nomenclature.
| Name | Cell type | Diagnosis | Sex | Derivation method | Characterized as iPSC | Marker expression | Ref. | Registry info | Karyotype |
|---|---|---|---|---|---|---|---|---|---|
| HEL11.4 | Skin fibroblast | Unaffected | Male | Retrovirus | Yes | Yes | Mikkola et al., 2013 | UHi007-B | 46,X,inv(Y)(p11q11),add(1)(q12q21) |
| HEL47.2 | Skin fibroblast | Unaffected | Male | Sendai virus | Yes | Yes | Trokovic et al., 2015 | UHi007-A | 46,X,inv(Y)(p11q11) |
| HEL62.4 | Skin fibroblast | Unaffected | Female | Sendai virus | Yes | Yes | UHi020-A | 47,XX,+12[2]/46,XX[18] | |
| HEL61.1 | Skin fibroblast | Unaffected | Female | Sendai virus | Yes | Yes | UHi021-A | 46, XX | |
| HEL61.2 | Skin fibroblast | Unaffected | Female | Sendai virus | Yes | Yes | UHi021-B | 46, XX | |
| HEL82.6 | Skin fibroblast | Unaffected | Female | Sendai virus | Yes | Yes | UHi022-A | 46, XX | |
| aask4 | Skin fibroblast | Kabuki syndrome | Female | Sendai virus | Yes | Yes | WTSIi580-A | 46, XX | |
| hoik1 | Skin fibroblast | Unaffected | Female | Sendai virus | Yes | Yes | WTSIi026-A | 46,XX,t(6;13)(p11;q11) | |
| ierp4 | Skin fibroblast | Kabuki syndrome | Female | Sendai virus | Yes | Yes | WTSIi469-A | 47,XXX,t(2;5)(q12;q35.1)[9]/46,XX,t(2;5)(q12;q35.1)[11] | |
| oadp4 | Skin fibroblast | Kabuki syndrome | Male | Sendai virus | Yes | Yes | WTSIi558-A | 46, XY | |
| oikd4 | Skin fibroblast | Unaffected | Female | Sendai virus | Yes | Yes | WTSIi260-A | 46, XX | |
| qeti2 | Skin fibroblast | Kabuki syndrome | Male | Sendai virus | Yes | Yes | WTSIi526-A | 46, XY |
Additional files
-
Supplementary file 1
Methods-related information.
(a) Passage number information from BSCC induced pluripotent stem cell iPSC lines used in this study. (b) Protocol information for iPSC lines profiled by scRNA-seq at days 40 and 70, with their corresponding 10x library, replicate information, as well as enzyme usage for cell dissociation. (c) Cell lines included per 10x library at each differentiation day, with corresponding CellPlex barcodes (for day 20), donor and replicate information. (d) 10x libraries and sequencing information including runIDs, FASTQ IDs, and sequencing specs. (e) Reagents with their final concentration, supplier, and catalog number. (f) List of primary and secondary antibodies with their host species, dilution, supplier, and catalog numbers. (g) GWAS summary stats used for stratified LD score regression analysis.
- https://cdn.elifesciences.org/articles/102578/elife-102578-supp1-v1.xlsx
-
Supplementary file 2
Cell Painting (CP) regularization.
Formula of transformation applied per CP feature, ordered by feature families.
- https://cdn.elifesciences.org/articles/102578/elife-102578-supp2-v1.pdf
-
MDAR checklist
- https://cdn.elifesciences.org/articles/102578/elife-102578-mdarchecklist1-v1.docx