Figures and data

Cell types’ locations and relative proportions are consistent across morphologically normal gastruloids
a. Representative gastruloid with the expression of T shown in magenta. Each spot is one transcript (count); nuclei are shown in gray. The thick dashed line denotes the AP axis; thin lines outline the anterior and posterior halves of the gastruloid. Scale bar is 200 μm. b. Gallery of representative gastruloids. Scale bar is the same as in a. c. Proportion of each cell type across all gastruloids, ordered by median proportion. d. Correlation between the ratio of anterior area to posterior area across gastruloids and the proportion of that gastruloid typed as either somite (left) or paraxial mesoderm (right). e. Smoothed density of each cell type along the AP axis. Nuclei in each gastruloid were projected onto the AP axis, and their position was normalized to the total length. Traces from individual gastruloids are shown by individual curves. f. Distribution of mixing coefficients across all gastruloids and examples of low, medium, and high mixing. Scale bar is 200 μm.
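
The projection and normalization described for panel e can be sketched as follows. This is a minimal illustration, not the pipeline used for the figure: it assumes the AP axis is available as an ordered polyline with distinct consecutive vertices, and the function name `project_onto_axis` and its inputs are hypothetical.

```python
import numpy as np

def project_onto_axis(points, axis_pts):
    """Normalized position (0 to 1) of each point along a polyline axis.

    points   : (n, 2) nuclei centroids (hypothetical input)
    axis_pts : (m, 2) ordered AP-axis vertices, m >= 2 (hypothetical input)
    """
    points = np.asarray(points, dtype=float)
    axis_pts = np.asarray(axis_pts, dtype=float)
    seg_start = axis_pts[:-1]
    seg_vec = axis_pts[1:] - seg_start
    seg_len = np.linalg.norm(seg_vec, axis=1)
    # Arc length accumulated at each vertex of the axis
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)])

    # Fractional position of the projection on every segment, clipped to [0, 1]
    rel = points[:, None, :] - seg_start[None, :, :]
    t = np.einsum("nsk,sk->ns", rel, seg_vec) / seg_len**2
    t = np.clip(t, 0.0, 1.0)

    # Closest point on every segment and its distance to the nucleus
    closest = seg_start[None] + t[..., None] * seg_vec[None]
    dist = np.linalg.norm(points[:, None, :] - closest, axis=2)

    # Keep the nearest segment and convert to arc length along the axis
    best = np.argmin(dist, axis=1)
    idx = np.arange(len(points))
    arc = cum_len[best] + t[idx, best] * seg_len[best]
    return arc / cum_len[-1]
```

The normalized positions can then be passed to any one-dimensional density estimator, one cell type at a time, to obtain curves comparable across gastruloids of different lengths.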

Pairwise cell type interactions quantify cell type mixing and interaction motifs
a. Segregation index, which represents how frequently the source cell type (row) is found next to the target cell type (column). The maximum possible value is 1; self-interactions are excluded for readability. b. Significantly enriched and depleted triplet combinations of cell types, ordered by effect size from left to right. Bars are colored by the composition of the triplet they represent. Only triplets found in over half of the gastruloids are shown.

Progressive NMP differentiation is revealed by the single cell L-metric
a. Illustration of NMP differentiation with predictions for where along the trajectory gene expression is shared or unique. b. Co-expression plot showing the relative amount of Eogt vs Pax6 (left) and Eogt vs Rfx4 (right) per cell in a representative gastruloid. Pax6 and Rfx4 are both annotated as spinal cord-associated genes. c. Per cell expression of all NMP exclusive genes summed versus all other gene categories summed, one plot per type. n=25514 cells typed as NMP, presomitic mesoderm, or spinal cord in n=26 gastruloids. d. Per cell expression of all spinal cord exclusive genes summed versus expression of all presomitic mesoderm genes summed. n=25514 cells typed as NMP, presomitic mesoderm, or spinal cord in n=26 gastruloids. e. Per cell expression scatterplots of the two pairs of genes shown in b). The y axis of each is the per-cell expression of Eogt. The x-axis is the per-cell expression of Pax6 (left) or Rfx4 (right). R is Pearson’s r, scL is scL-metric. f. Distribution of scL-metric values for pairs of spinal cord and presomitic mesoderm genes (left), NMP and presomitic mesoderm genes (center), and NMP and spinal cord genes (right). g. Hierarchical clustering heatmap of scL-metric vectors for NMP, presomitic mesoderm, spinal cord, and combined category genes. Blocks highlighted are discussed in the text. h. Tree of the hierarchical relationships between genes resulting from the clustering shown in g). Color indicating the gene type is shown at the bottom; the legend is the same as in g. The density plots shown are the summed, averaged gene expression of all genes in the leaves up until that node, smoothed with a 2D kernel density estimate (see Methods for details about smoothing).

Clustering L-metric vectors clearly resolves cell types and reveals novel genetic interactions
a. Heatmap of all scL-metric values for all genes in the panel (excluding poorly detected and cell cycle genes, see Methods, n=166). Colored bars on the top and right hand sides indicate if a gene is associated with a particular cell type. Heatmap was hierarchically clustered by row. b. Expansion of a presomitic mesoderm cluster (i.). Clustering relationships are indicated with the dendrogram, and summed densities for all genes in an example gastruloid show where the genes in each cluster are expressed spatially. Clustering relationships determined from the full gene set shown in a. c. Expansion of the posterior cell type cluster (ii.). Clustering relationships are indicated with the dendrogram, and summed densities for all genes in an example gastruloid show where the genes in each cluster are expressed spatially. Clustering relationships determined from the full gene set shown in a. d. Expansion of the endothelial cluster (iii.). Clustering relationships are indicated with the dendrogram, and summed densities for all genes in an example gastruloid show where the genes in each cluster are expressed spatially. Clustering relationships determined from the full gene set shown in a. e. Expression of two example pairs of genes from the endothelial cluster: Cldn5 and Tgfb1, and Cldn5 and Sox17.

Spatial L-metric reveals tissue-level patterns of gene expression
a. Illustration of how the density-based L-metric (spatial L-metric) is calculated, and how the value changes as a function of how much the density estimate is smoothed. b. Clustered heatmap of the spatial L-metric for a representative gastruloid. Purple box indicates a mixed cluster of endothelial and endoderm genes. c. An image of the gastruloid used to generate the heatmap in b. Cell types are indicated by color, legend is the same as in b.

Endothelial precursors show unique organization and distinct, spatially-dependent cell states
a. Example of clusters of endothelial cells. Representative gastruloid shown. Highlight shows cluster morphology, color is cell type (endothelial in pink, paraxial mesoderm in mint green, somite in orange, differentiation front in light blue, and untyped cells in gray). Number of clusters ranges from 23 (cardiac mesoderm) to 185 (paraxial mesoderm). b. Circularity of clusters of different cell types. See Methods for a description of how circularity was calculated. c. Location and extent of endothelial clusters projected onto the AP axis and normalized by total length. 10 representative gastruloids are shown; the average of all gastruloids (n=26, total clusters = 155) is shown as a smoothed kernel density estimate at the top of the plot. d. Representative gastruloid showing two distinct morphologies of endothelial cells: somite-associated (anterior) and endoderm-associated (posterior). e. Genes differentially expressed in somite-associated or endoderm-associated endothelial cells. Bar color represents the cell type associated with the gene (if any); genes that were not previously known to be associated with a cell type in gastruloids are shown in gray. Bars are ordered by significance (adjusted p value) from greatest (Nepn p=5.95e-23) to least (Igfbp5 p=1.10e-5). f. Spatial distribution of expression for example genes shown in e). Each plot shows one gene, each dot is a single transcript, and cells typed as endothelial are outlined in black. The top row shows genes enriched in somite-associated endothelium, the bottom row shows genes enriched in endoderm-associated endothelium.

Cell type entropy scores, cell type proportion covariation, and correlation between individual cell type proportions and the overall gastruloid mixing coefficient
a. Cell type score entropy distributions for each cell type. Each violin shows all the values for that cell type across all gastruloids. n=79607 nuclei across 26 gastruloids. b. Per-gastruloid cell type proportions (summarized in Figure 1b). c. Co-variation in proportion (normalized by Centered Log-Ratio (CLR) transformation) between cell types. d. Correlation between the per-gastruloid mixing coefficient and the proportion of the labeled cell type in that gastruloid. Pearson’s r and the significance (p) for each relationship are shown in the black box.
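
For readers unfamiliar with the two quantities in panels a and c, the following is a minimal sketch of a centered log-ratio transform and a Shannon entropy over per-cell type scores. The pseudocount, the use of base-2 logarithms, the normalization of scores to sum to 1, and the function names are all assumptions made for illustration; see Methods for the definitions actually used.

```python
import numpy as np

def clr(proportions, pseudocount=1e-6):
    """Centered log-ratio transform of each row (one row per gastruloid).

    The pseudocount (an assumption) avoids log(0) for absent cell types.
    """
    p = np.asarray(proportions, dtype=float) + pseudocount
    p = p / p.sum(axis=1, keepdims=True)
    log_p = np.log(p)
    # Subtracting the row mean of the logs divides by the geometric mean
    return log_p - log_p.mean(axis=1, keepdims=True)

def score_entropy(scores):
    """Shannon entropy (bits) of each row of non-negative cell type scores."""
    s = np.asarray(scores, dtype=float)
    q = s / s.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(q > 0, q * np.log2(q), 0.0)
    return -terms.sum(axis=1)
```

With proportions transformed this way, co-variation between cell types can be computed with an ordinary correlation (for example `np.corrcoef(clr(P), rowvar=False)`) without the spurious negative correlations that raw proportions summing to 1 would introduce.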

Gallery of individual gastruloids colored by cell type
a. Gastruloids from the experiment conducted on 05/07/2025. b. Gastruloids from the experiment conducted on 09/24/2024. c. Gastruloids from the experiment conducted on 02/09/2024. d. Proportion of each plate of gastruloids that, at 120 hours, were scored as ‘correct’. The correct phenotype was elongated with one axis and a clear anterior and posterior domain. The experiments used to generate the samples used in this paper are shown with red dots. One of the datasets used multiple plates pooled together, but the plates are shown individually for clarity.

Comparison of AP-axis gene expression centers of mass between individual gastruloids reported in this work and previously reported tomo-seq performed on gastruloids
a. Center of mass of gene expression in all gastruloids compared to the center of mass of gene expression in one gastruloid analyzed by tomo-seq as reported in [2]. The number of genes in each plot is shown in the title; numbers vary by gastruloid as some of the genes in the seqFISH panel were not consistently detected across gastruloids (see Methods).
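
A center of mass of expression along the AP axis can be read as a count-weighted mean position. The sketch below assumes positions are already normalized to the axis length and that counts are given per nucleus or per section; the function name is hypothetical and the exact definition used here is given in Methods.

```python
import numpy as np

def expression_center_of_mass(positions, counts):
    """Count-weighted mean AP position for one gene in one gastruloid.

    positions : normalized AP positions (0 to 1) of nuclei or sections
    counts    : transcript counts of the gene at those positions
    Returns NaN when the gene is not detected at all.
    """
    positions = np.asarray(positions, dtype=float)
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    if total == 0:
        return float("nan")
    return float((positions * counts).sum() / total)
```

Because the result is a single number per gene on a common 0 to 1 scale, values from imaging-based data and from sectioned data can be placed on the same axes.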

Exposure indices and triplet motifs
a. Exposure indices for all cell types shown individually as bar plots. b. All motifs significantly enriched or depleted in any gastruloid.

Single cell L-metric of NMP, presomitic mesoderm, and spinal cord genes
a. Left: cell type scores for all nuclei in the gastruloid used in this figure. Right: cell type scores for all nuclei typed as NMP, presomitic mesoderm, or spinal cord in each of those categories. b. Distribution of scL-metric values between pairs of terminal cell type genes and genes annotated as belonging to that cell type and NMP genes. c. Spatial distribution of the co-expression of Nkx1-2 and Rfx4 in an example gastruloid. d. Nkx1-2 transcripts in the same gastruloid as in c. e. Example of a pair of NMP-exclusive genes that had a high L-metric value.

Example plots of simulated distributions used to calculate the L-metric
a. Rank order plot of the per-cell expression of Pax6 (blue). The expression of Eogt in the same cells is shown in red, and the same values randomized are shown in green. b. Cumulative sum and reordered cumulative sum plots used to calculate the L-metric. See Methods for calculation details.

Heatmap of L-metric values including cell cycle genes and NMP subcluster
a. Heatmap of all scL-metric values for all genes in the panel (excluding poorly detected genes but including cell cycle genes). Colored bars on the top and right-hand sides indicate if a gene is associated with a particular cell type. Heatmap was hierarchically clustered by row. b. Highlight of the NMP cluster from Figure 4a. Clustering relationships are indicated with the dendrogram, and summed densities for all genes in an example gastruloid show where the genes in each cluster are expressed spatially. Clustering relationships determined from the full gene set shown in Figure 4a.

Comparison of scL-metric clusters to cNMF clusters
a. cNMF stability analysis for varying numbers of components. cNMF was run with standard parameters on all nuclei from the dataset taken on 05/07/2025. b. Top 24 genes in each cluster averaged in space and summed together for visualization. 24 genes were chosen as this was the average number of genes per cluster when the L-metric tree was truncated to produce 7 clusters for comparison (see c). c. Truncation of the scL-metric tree to produce 7 clusters. The genes in each cluster were then visualized as in b. d. Qualitative clusters on the scL-metric tree that have many genes associated with a single cell type.
We ran cNMF [39] on our data to identify gene programs. Stability analysis indicated that 7 was the optimal number of components. To compare the genes identified by cNMF with those related by scL-metric similarity, we truncated the L-metric tree to produce 7 clusters. Because a gene can appear in multiple cNMF programs but only once in an L-metric tree, comparing cluster overlap by component genes was not directly possible. Instead, we visualized the genes for each method in space. There was a clear correspondence between the clusters produced by the two methods. The cNMF clusters looked ‘cleaner’ (because only the top 24 genes in each program were plotted, many genes that did not meet this threshold in any program were not visualized at all), while the L-metric clusters had a hierarchy of relatedness that was absent from the cNMF programs. Furthermore, we could qualitatively select tree nodes enriched for cell type marker genes, and found that these clusters also closely matched the cNMF clusters.
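
Truncating a hierarchical tree to a fixed number of clusters, as in panel c, can be sketched with SciPy. The linkage method and distance metric below are assumptions chosen for illustration (the settings used for the scL-metric tree are described in Methods), and `truncate_tree` is a hypothetical helper name.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def truncate_tree(vectors, n_clusters=7, method="average", metric="euclidean"):
    """Cut a hierarchical clustering of row vectors into at most n_clusters groups.

    vectors : (n_genes, n_features) array, e.g. one scL-metric vector per gene
    method, metric : assumed linkage settings, not taken from the paper
    Returns an integer cluster label (starting at 1) for each row.
    """
    Z = linkage(np.asarray(vectors, dtype=float), method=method, metric=metric)
    # "maxclust" picks the cut height that yields no more than n_clusters clusters
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

Each gene receives exactly one label under this scheme, which is why a gene-by-gene overlap with cNMF programs (where a gene may load on several programs) is not directly defined.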

Refining and assessing marker gene panels with scL-metric values
a. A representative gastruloid with cell types as scored by the full marker gene panel (left) and entropy values of the score distributions of each cell (right). b. The same gastruloid scored with a reduced panel of marker genes filtered by the properties of their scL-metric values with all other genes. c. Heatmap showing the average scL-metric values between sets of marker genes from the hand-curated panel used in this paper. Panels were filtered by average per-cell expression, see Methods for details.

Using the scL-metric to assess unsupervised clustering and identify marker genes
a. Left: UMAP of all gastruloid nuclei with > 40 counts from the dataset taken on 05/07/2025, colored according to Leiden clusters. Right: Cluster assignment for a representative gastruloid projected into spatial coordinates. b. Heatmap showing the average scL-metric values for the top 10 genes associated with each cluster shown in a.
We reasoned that since the L-metric quantifies mutually exclusive gene expression, a property of good marker genes, it might be possible to use a gene’s scL-metric values to refine our marker gene panel. We calculated the scL-metric vectors, and then kept only genes that had at least 10% of their L-metric values < -0.8 and at least 12% > -0.3. This filtering shifted the balance of differentiation front and presomitic mesoderm genes, but it slightly increased the entropy scores (Figure S4.3a,b), indicating that for small, hand-chosen marker gene panels, removing genes decreases scoring confidence, albeit only slightly. However, we also used the average scL-metric of genes associated with one type compared to another type to quantify type similarity (in terms of per-cell gene expression), and found marker pairs that distinguished cell types or were concordantly expressed in the same cell types (Figure S4.3c). This method can also be used to analyze post hoc the similarity of clusters produced by Leiden or other clustering methods. To demonstrate this, we pseudo-bulked all the nuclei for our entire dataset and clustered using a standard scanpy workflow (Figure S4.4a). We then took the top genes for each cluster and calculated the average scL-metric score between the groups (Figure S4.4b). This analysis demonstrated that the clusters produced by unsupervised clustering, while they could be qualitatively mapped onto the cell types we expected to see (Figure S4.4a), were much more similar to one another in terms of gene expression. Just using the top genes from this analysis would not be sufficient to produce marker genes. However, we used scL-metric values to find genes that were divergently expressed, even among similar clusters (Figure S4.4b).
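
The threshold-based filter described above (at least 10% of a gene's values below -0.8 and at least 12% above -0.3) can be sketched as follows. The sketch assumes a square gene-by-gene matrix of scL-metric values and excludes each gene's value with itself; that exclusion, the handling of missing values, and the function name are assumptions for illustration.

```python
import numpy as np

def filter_marker_genes(L, genes, low=-0.8, high=-0.3,
                        min_low_frac=0.10, min_high_frac=0.12):
    """Keep genes whose scL-metric vector has enough strongly negative values
    and enough values above the upper threshold.

    L     : (n, n) matrix of scL-metric values between all gene pairs
    genes : list of n gene names, in the same order as the rows of L
    """
    L = np.asarray(L, dtype=float)
    off_diag = ~np.eye(L.shape[0], dtype=bool)  # assumption: ignore self-pairs
    keep = []
    for i, gene in enumerate(genes):
        vals = L[i, off_diag[i]]
        vals = vals[~np.isnan(vals)]  # assumption: skip undefined pairs
        if vals.size == 0:
            continue
        frac_low = (vals < low).mean()
        frac_high = (vals > high).mean()
        if frac_low >= min_low_frac and frac_high >= min_high_frac:
            keep.append(gene)
    return keep
```

Requiring both tails selects genes that are mutually exclusive with some genes while still sharing cells with others, which is the behavior expected of a marker that belongs to a coherent program.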

Spatial L-metric heatmap of values averaged across all samples and direct comparison of scL-metric and spatial L-metric values for all gene pairs.
a. Heatmap of clustered spatial L-metric values for all genes averaged across all gastruloids. n=202 genes. b. Comparison of scL-metric and spatial L-metric values for cell type markers. Colored dots represent intra-type pairs, with the color specifying the cell type. The gray trace is the smoothed density estimate for all pairs, including inter-type pairs and pairs including cell cycle or non-marker genes.

Clustered heatmap of the scL-metric for all genes in an example gastruloid
a. Heatmap of clustered scL-metric values for all genes for the representative gastruloid shown in the main figure. Purple rectangle highlights endoderm and endothelial gene clusters.

Characterization of endothelial clusters
a. Cluster circularity versus size for all clusters identified in the gastruloid shown in Figure 6a. b. Distribution of the number of clusters of each cell type detected per gastruloid (across all gastruloids). n=26 gastruloids. c. Proportion of total gastruloids in which clusters were identified for each cell type. d. Exposure index of endoderm cells with all other cell types (left) and endothelial cells with all other cell types (right). The colored bars are the average across all gastruloids, and the gray lines are the values for the gastruloid shown in Figure 6d.

AP axis position of all endothelial clusters in all gastruloids
a. AP axis position of all endothelial clusters in all gastruloids

Average nearest neighbor distances between cell centroids in each gastruloid in our dataset from 05/07/2025 (n=18 gastruloids).
These values multiplied by 2 were used as bandwidths for individual two-dimensional Gaussian kernels fitted to each RNA spot to produce a smoothed spatial profile of expression for a given gene on a particular gastruloid.
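
As a rough illustration of this smoothing, the sketch below computes the average nearest neighbor distance between centroids and evaluates a sum of two-dimensional Gaussians centered on each RNA spot. It assumes that "bandwidth" means the standard deviation of an isotropic Gaussian and that each kernel integrates to 1; both are assumptions, and the function names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def mean_nn_distance(centroids):
    """Average distance from each cell centroid to its nearest other centroid."""
    centroids = np.asarray(centroids, dtype=float)
    # k=2 because the closest hit for every point is the point itself
    dists, _ = cKDTree(centroids).query(centroids, k=2)
    return float(dists[:, 1].mean())

def smoothed_expression(spots, grid_xy, bandwidth):
    """Sum of isotropic 2D Gaussians (one per RNA spot) evaluated at grid points.

    spots     : (n_spots, 2) transcript coordinates for one gene
    grid_xy   : (n_grid, 2) coordinates at which to evaluate the profile
    bandwidth : Gaussian standard deviation (assumed meaning of "bandwidth")
    """
    spots = np.asarray(spots, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)
    d2 = ((grid_xy[:, None, :] - spots[None, :, :]) ** 2).sum(axis=2)
    kern = np.exp(-d2 / (2 * bandwidth**2)) / (2 * np.pi * bandwidth**2)
    return kern.sum(axis=1)
```

For one gastruloid, the bandwidth would be `2 * mean_nn_distance(centroids)`, so the smoothing scale tracks the local cell spacing of that sample.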

Marker genes for all scored cell types.
Genes that mark more than one cell type are shown with both.