Identification and comparison of orthologous cell types from primate embryoid bodies shows limits of marker gene transferability

  1. Jessica Jocher
  2. Philipp Janssen
  3. Beate Vieth
  4. Fiona C Edenhofer
  5. Tamina Dietl
  6. Anita Térmeg
  7. Paulina Spurk
  8. Johanna Geuder
  9. Wolfgang Enard (corresponding author)
  10. Ines Hellmann (corresponding author)
  1. Anthropology and Human Genomics, Faculty of Biology, Ludwig-Maximilians-Universität München, Germany
  2. Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt: Munich, Germany
8 figures, 2 tables and 1 additional file

Figures

Figure 1 with 3 supplements
Generation of primate embryoid bodies.

(A) Overview of the embryoid body (EB) differentiation workflow of the four primate species human (Homo sapiens), orangutan (Pongo abelii), cynomolgus (Macaca fascicularis), and rhesus (Macaca mulatta), including their phylogenetic relationship. Scale bar represents 500 µm. (B) Immunofluorescence staining of day 16 EBs using α-fetoprotein (AFP), β-III-tubulin, and α-smooth muscle actin (α-SMA). Scale bar represents 100 µm. (C) Schematic overview of the sampling and processing steps prior to 10x scRNA-seq. (D) UMAP representation of the whole scRNA-seq dataset, integrated across all four species with Harmony. Single cells are colored by the expression of known marker genes for the three germ layers and undifferentiated cells. (E) UMAP representation, colored by assigned germ layers, split by species. Created with BioRender.com.

Figure 1—figure supplement 1
Comparison of embryoid body (EB) differentiation protocols using flow cytometry.

(A) Antibody combination to analyze induced pluripotent stem cells (iPSCs) and cells of the three primary germ layers in a single sample. (B) Flow cytometry gating overview using human EBs at day 7 of differentiation. 1. Gating of cell population. 2. Gating of single cell population. 3. Gating of live cell population. 4-6. Gating of cells belonging to pluripotent or germ layer populations based on the antibody combination shown in (A). (C) Phase contrast images of orangutan EBs on day 6 of differentiation in four different culture conditions. Scale bar represents 250 µm. (D) Bar plot of pluripotency and germ layer proportions of day 7 EBs from human, orangutan, cynomolgus, and rhesus in the four different culture conditions. Created with BioRender.com.

Figure 1—figure supplement 2
Total number of recovered cells.

(A) Bar plot of cell numbers per species, experimental batch, and 10x lane. (B) Bar plot of cell numbers per species and day of differentiation. (C) Bar plot of cell numbers per clone. (D) Bar plot of cell numbers per clone and day of differentiation.

Figure 1—figure supplement 3
Reference-based cell type classification.

(A) UMAP representations colored by labels from a classification with a reference dataset of day 21 human embryoid bodies (Rhodes et al., 2022). (B) Single-cell clusters in integrated data from all four species. (C) Stacked bar plot of the proportions of predicted labels across clusters obtained in the integrated dataset.

Figure 2 with 6 supplements
Assignment of orthologous cell types across species.

(A) Schematic overview of the pipeline to match clusters between species and assign orthologous cell types. (B) Sankey plot visualizing the intermediate steps of the cell type assignment pipeline. Each line represents a cell, colored by its species of origin on the left and by its current cell type assignment during the annotation procedure on the right. An initial set of 118 high-resolution clusters (HRCs), 25–35 per species, was combined into 26 orthologous cell type clusters (OCCs). Similar cell type clusters were merged and, after further manual refinement, provided the basis for the final orthologous cell type assignments. (C) Fraction of annotated cell types per species. (D) UMAPs for each species colored by cell type. (E) To validate our cell type assignments, we selected three marker genes per cell type that exhibit a similar expression pattern across all four species and have been reported to be specific for this cell type in both human and mouse (Appendix 1—table 1). The heatmap depicts the fraction of cells of a cell type in which the respective gene was detected, for cell types present in at least three species.

Figure 2—figure supplement 1
Replicability of cell types across species measured by reciprocal classification.

(A) Heatmap illustrating ‘all vs all’ similarities of cell types from all four species. For each cell type pair, the similarity represents the average classification fraction obtained through reciprocal classification between each species pair. (B) Average classification fractions for cell types that are shared among each species pair. AP: astrocyte progenitor, CFib: cardiac fibroblasts, CEndo: cardiac endothelial cells, CPC: cardiac progenitor cells, EEC: early epithelial cells, EE: early ectoderm, EC: epithelial cells, Fib: fibroblasts, GPC: granule precursor cells, Hepa: hepatocytes, NCI: neural crest I, NCII: neural crest II, Neu: neurons, SMC: smooth muscle cells.
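The averaging behind one heatmap entry can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the nested-dictionary layout, and the example fractions are all hypothetical, and only a single species pair is shown (the caption's value would additionally be averaged over species pairs).

```python
def reciprocal_similarity(frac_a_to_b, frac_b_to_a, type_a, type_b):
    """Average the two directional classification fractions for one cell type pair.

    frac_a_to_b[x][y] is assumed to hold the fraction of species-A cells of
    type x that a classifier trained on species B labels as type y
    (hypothetical layout). Missing entries count as 0.
    """
    forward = frac_a_to_b.get(type_a, {}).get(type_b, 0.0)
    backward = frac_b_to_a.get(type_b, {}).get(type_a, 0.0)
    return (forward + backward) / 2.0


# Made-up fractions, for illustration only.
human_to_rhesus = {"Neu": {"Neu": 0.9, "NCI": 0.1}}
rhesus_to_human = {"Neu": {"Neu": 0.7, "NCI": 0.3}}

sim_neu = reciprocal_similarity(human_to_rhesus, rhesus_to_human, "Neu", "Neu")
sim_cross = reciprocal_similarity(human_to_rhesus, rhesus_to_human, "Neu", "NCI")
```

Averaging both directions makes the similarity symmetric, so a cell type pair scores highly only if each species' cells are recognized by the other species' classifier.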

Figure 2—figure supplement 2
Replicability of cell types across species measured with MetaNeighbor.

(A) Heatmap illustrating ‘all vs all’ similarities of cell types from all four species. For each cell type pair, the similarity represents the area under the receiver operating characteristic curve (AUROC) score obtained with MetaNeighbor (Crow et al., 2018) in unsupervised mode. (B) AUROC scores for cell types that are shared among each species pair. AP: astrocyte progenitor, CFib: cardiac fibroblasts, CEndo: cardiac endothelial cells, CPC: cardiac progenitor cells, EEC: early epithelial cells, EE: early ectoderm, EC: epithelial cells, Fib: fibroblasts, GPC: granule precursor cells, Hepa: hepatocytes, NCI: neural crest I, NCII: neural crest II, Neu: neurons, SMC: smooth muscle cells.
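For readers unfamiliar with AUROC scores, the sketch below shows how an AUROC can be computed from per-cell scores and binary labels using the rank-sum (Mann-Whitney U) identity. It does not reproduce MetaNeighbor or its interface; the function name and the toy scores are assumptions made for illustration.

```python
def auroc(scores, labels):
    """AUROC from scores and 0/1 labels via the rank-sum identity.

    Tied scores receive their average rank, so two indistinguishable
    cells contribute 0.5 each.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy example: cells of the target type (label 1) all score highest.
perfect = auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
# Toy example: scores carry no information.
uninformative = auroc([0.5, 0.5], [1, 0])
```

An AUROC of 1 means every cell of the target type outranks every other cell, while 0.5 corresponds to a random ordering.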

Figure 2—figure supplement 3
Cell type annotation.

(A) Bar plot of cell type fractions per species and clone. (B) Bar plot of cell type fractions per experimental batch and 10x lane. (C) Bar plot of cell type fractions per day of differentiation across different species.

Figure 2—figure supplement 4
Pseudotime analysis of ectoderm differentiation trajectories.

(A-D) PHATE embeddings of induced pluripotent stem cells (iPSCs) and ectodermal cells from human (A), orangutan (B), cynomolgus (C), and rhesus macaque (D). (E-F) Distribution of pseudotime values across major ectodermal lineages for each species. Box plots illustrate shifts in pseudotime distributions between day 8 (d8) and day 16 (d16) of differentiation.

Figure 2—figure supplement 5
Pseudotime analysis of mesoderm differentiation trajectories.

(A-D) PHATE embeddings of induced pluripotent stem cells (iPSCs) and mesodermal cells from human (A), orangutan (B), cynomolgus (C), and rhesus macaque (D). (E-F) Distribution of pseudotime values across major mesodermal lineages for each species. Box plots illustrate shifts in pseudotime distributions between day 8 (d8) and day 16 (d16) of differentiation.

Figure 2—figure supplement 6
Pseudotime analysis of endoderm differentiation trajectories.

(A-D) PHATE embeddings of induced pluripotent stem cells (iPSCs) and endodermal cells from human (A), orangutan (B), cynomolgus (C), and rhesus macaque (D). (E-F) Distribution of pseudotime values across major endodermal lineages for each species. Box plots illustrate shifts in pseudotime distributions between day 8 (d8) and day 16 (d16) of differentiation.

Figure 3 with 1 supplement
Effect of cell type specificity on expression conservation.

(A) UMAP visualizations depicting expression patterns of selected example genes: SOX10 (conserved cell type-specific expression in neural crest cells), ESRG (species-specific and cell type-specific expression in human iPSCs), and RPL22 (conserved, broad expression). (B) For each gene, expression was summarized per species and cell type as the expression fraction and binarized into ‘not expressed’/‘expressed’ (black frame) based on cell type-specific thresholds. The same example genes as in (A) are shown here. iPSCs: induced pluripotent stem cells, EE: early ectoderm, NC: neural crest, SMC: smooth muscle cells, CFib: cardiac fibroblasts, EC: epithelial cells, Hepa: hepatocytes. (C) Boxplot of expression conservation of genes according to the number of different cell types in which a gene is expressed in humans (cell type specificity). (D) Boxplot of the fraction of coding sequence sites that were found to evolve under constraint based on a 43-primate phylogeny (Sullivan et al., 2023), stratified by human cell type specificity.
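The summarization in (B) and the specificity measure in (C) can be sketched for a single gene in a single species as follows. This is an illustrative sketch only: the function names, the toy counts, the threshold values, and the use of `>=` for the cutoff are assumptions, since the caption does not state how the cell type-specific thresholds were chosen.

```python
def expression_fraction(counts, cell_types):
    """Fraction of cells per cell type in which the gene was detected (count > 0)."""
    detected, total = {}, {}
    for count, cell_type in zip(counts, cell_types):
        total[cell_type] = total.get(cell_type, 0) + 1
        detected[cell_type] = detected.get(cell_type, 0) + (1 if count > 0 else 0)
    return {ct: detected[ct] / total[ct] for ct in total}


def binarize(fractions, thresholds):
    """Call a gene 'expressed' in a cell type if its fraction reaches the threshold."""
    return {ct: fractions[ct] >= thresholds[ct] for ct in fractions}


def cell_type_specificity(expressed):
    """Number of cell types in which the gene is called expressed."""
    return sum(expressed.values())


# Toy counts for one gene across six cells (made up for illustration).
counts = [3, 0, 1, 0, 0, 2]
cell_types = ["NC", "NC", "NC", "Hepa", "Hepa", "iPSCs"]
thresholds = {"NC": 0.5, "Hepa": 0.5, "iPSCs": 0.5}  # hypothetical values

fractions = expression_fraction(counts, cell_types)
expressed = binarize(fractions, thresholds)
n_types = cell_type_specificity(expressed)
```

A low count of expressing cell types corresponds to a highly cell type-specific gene, and a count equal to the number of cell types corresponds to broad expression.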

Figure 3—figure supplement 1
Characteristics of genes with different levels of cell type-specific expression.

(A) Stacked bar plot of the number of genes per cell type specificity level for different species. (B) Boxplot of expression conservation of genes with different levels of cell type specificity in orangutan, cynomolgus, and rhesus. (C) Boxplot of gene-level constraint based on primate phastCons scores (Sullivan et al., 2023) for protein-coding genes. (D) Boxplot of mean expression per cell type for genes with different levels of cell type specificity. (E) Boxplot of mean expression per cell type for a subset of 236 genes per cell type specificity and species that were sampled to have a similar distribution of mean expression. (F) Boxplot of expression conservation of the same subsampled gene sets as in (E).

Figure 4 with 3 supplements
Evaluation of marker gene conservation.

(A) UpSet plot illustrating the overlap between species for the top 100 marker genes per cell type. (B) Heatmap showing the expression fractions of marker genes: on the left, markers shared among all species, and on the right, markers unique to the human ranking. For each cell type, one representative gene is labeled and further detailed in Figure 4—figure supplement 1. iPSCs: induced pluripotent stem cells, EE: early ectoderm, NC: neural crest, SMC: smooth muscle cells, CFib: cardiac fibroblasts, EC: epithelial cells, Hepa: hepatocytes. (C) Rank-biased overlap (RBO) analysis comparing the concordance of gene rankings per cell type for lncRNAs, protein-coding genes, and transcription factors. (D) Average F1-score for a k-nearest neighbor (kNN)-classifier trained in the human clone 29B5 to predict cell type identity based on the expression of 1–30 marker genes. Each line represents the performance in a different clone, with shaded areas indicating 95% bootstrap confidence intervals.
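To illustrate the rank-biased overlap used in (C), the sketch below computes the truncated RBO sum, (1 - p) Σ p^(d-1) · A_d, where A_d is the fraction of shared items in the top d of both rankings. It is a simplified lower-bound variant written for illustration; the function name, the persistence value p = 0.9, and the toy rankings are assumptions, and the authors may have used an extrapolated form.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap of two rankings without duplicate items.

    Evaluated down to the length of the shorter ranking; identical rankings
    of length k therefore score 1 - p**k rather than exactly 1.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    overlap = 0  # size of the intersection of the two top-d prefixes
    score = 0.0
    for d in range(1, depth + 1):
        a, b = ranking_a[d - 1], ranking_b[d - 1]
        if a == b:
            overlap += 1
        else:
            if a in seen_b:
                overlap += 1
            if b in seen_a:
                overlap += 1
        seen_a.add(a)
        seen_b.add(b)
        score += p ** (d - 1) * overlap / d
    return (1 - p) * score


# Toy marker rankings (made up): the top two genes are swapped.
human_rank = ["SOX10", "FOXD3", "S100B"]
rhesus_rank = ["FOXD3", "SOX10", "S100B"]
rbo_swapped = rbo_truncated(human_rank, rhesus_rank)
rbo_identical = rbo_truncated(human_rank, human_rank)
```

Because the weights decay geometrically with depth, disagreements near the top of the marker ranking lower the score more than disagreements further down.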

Figure 4—figure supplement 1
Expression patterns of shared and human-specific marker genes.

(A) UMAP representation per species filtered for the seven cell types that are present in all four species. (B) UMAP representations colored by the log-normalized expression of seven representative marker genes that are shared among the top 100 marker genes per cell type in all four species. (C) UMAP representations colored by the log-normalized expression of seven representative marker genes that are only present in the human top 100 marker gene ranking per cell type.

Figure 4—figure supplement 2
k-nearest neighbor (kNN) classification performance per cell type.

F1-score per cell type for a kNN classifier trained in the human clone 29B5 to predict cell type identity based on the expression of 1-30 protein-coding marker genes. Each line represents the performance in a different clone, colored by species identity.
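The evaluation described above can be sketched with a small, dependency-free kNN classifier and a per-class F1 score. This is a toy illustration, not the authors' implementation: the function names, the Euclidean distance, k = 3, and the two-marker expression values are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training cells."""
    neighbors = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def f1_per_class(true_labels, predicted_labels):
    """F1 = 2TP / (2TP + FP + FN) for each class present in true_labels."""
    scores = {}
    for c in set(true_labels):
        pairs = list(zip(true_labels, predicted_labels))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Toy training clone: expression of two hypothetical markers per cell.
train_x = [(5.0, 0.1), (4.5, 0.0), (4.8, 0.3), (0.2, 3.9), (0.0, 4.2), (0.1, 4.4)]
train_y = ["NC", "NC", "NC", "Hepa", "Hepa", "Hepa"]

# Toy test clone: the third cell has shifted marker expression.
test_x = [(4.9, 0.2), (0.3, 4.0), (2.0, 2.6)]
test_y = ["NC", "Hepa", "NC"]

predicted = [knn_predict(train_x, train_y, cell) for cell in test_x]
f1 = f1_per_class(test_y, predicted)
average_f1 = sum(f1.values()) / len(f1)
```

In this toy case the third cell is misassigned because its marker expression has drifted away from the training clone, which lowers the F1 score of both classes.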

Figure 4—figure supplement 3
k-nearest neighbor (kNN) classification performance for transcription factors and protein coding marker genes.

(A) Average F1-score for a kNN-classifier trained in the human clone 29B5 to predict cell type identity in the other clones. The classifier is trained on the expression of the top 1-30 protein-coding markers (solid lines) or transcription factor markers (dashed lines). (B) Comparison of the maximum average F1 score between transcription factors and protein coding markers for the classifications depicted in (A).

Author response image 1
Pseudotime analysis for a differentiation trajectory towards neurons.

Single cells were first aggregated into metacells per species using SEACells (Persad et al., 2023). Pluripotent and ectoderm metacells were then integrated across all four species using Harmony, and a combined pseudotime was inferred with Slingshot (Street et al., 2018), specifying iPSCs as the starting cluster. Here, lineage 3 is shown, illustrating differentiation towards neurons. (A) PHATE embedding (Moon et al., 2019) colored by pseudotime. (B) PHATE embedding colored by cell type. (C) Pseudotime distribution across the sampling timepoints (day 8 and day 16) in different species.

Author response image 2
UMAP visualization of the Harmony-integrated dataset across all four species for the seven shared cell types, colored by cell type identity (A) and species (B).

Author response image 3

(A) Evaluation of species mixing per cell type in the Harmony-integrated dataset, quantified by the fraction of cells with an adjusted cell-specific mixing score (cms) above 0.05. (B) Summary of rank-biased overlap (RBO) scores per cell type to assess concordance of marker gene rankings for all species pairs.

Author response image 4

(A) UMI counts per gene for the same cynomolgus macaque iPSC samples. Counts on the x-axis were obtained with the GTF file from Ensembl (Macaca_fascicularis_6.0.111), and counts on the y-axis with our filtered Liftoff annotation, which transferred the human gene models from Gencode v32. (B) Number of differentially expressed (DE) genes between human and cynomolgus iPSCs detected with DESeq2. For Liftoff, we counted human samples using Gencode v32 and compared them to counts from the Liftoff annotation of the same human gene models transferred to macFas6. For Ensembl, we used Gencode v32 for human and Ensembl Macaca_fascicularis_6.0.111 for macaque. For both comparisons, we subset the genes to one-to-one orthologs as annotated in BioMart. Up- and downregulation is relative to human expression. (C) Read counts for one example gene, SALL4. Here, in addition to the Liftoff and Ensembl annotations, we also used transcripts derived from Nanopore cDNA sequencing of cynomolgus iPSCs. (D) Gene models for SALL4 in macFas6 coordinates, with coverage from iPSC-Prime-seq bulk RNA-sequencing.

Tables

Table 1
Cell lines.

List of cell lines used for embryoid body (EB) differentiation.

| ID | Species | Sex | Publication |
| --- | --- | --- | --- |
| 29B5 | Homo sapiens | Male | Geuder et al., 2021 |
| 63Ab2.2 | Homo sapiens | Female | Geuder et al., 2021 |
| 69A1 | Pongo abelii | Male | Geuder et al., 2021 |
| 68A20 | Pongo abelii | Male | Geuder et al., 2021 |
| 82A3 | Macaca fascicularis | Female | Edenhofer et al., 2024 |
| 56B1 | Macaca fascicularis | Female | Edenhofer et al., 2024 |
| 56A1 | Macaca fascicularis | Female | |
| 87B1 | Macaca mulatta | Male | Jocher et al., 2024 |
| 83D1 | Macaca mulatta | Male | Jocher et al., 2024 |
| 83Ab1.1 | Macaca mulatta | Male | Jocher et al., 2024 |

Appendix 1—table 1
Marker genes.

Literature review for marker genes used in human and mouse / rodents to determine a specific cell type.

| Cell type | Marker gene | Used in human | Used in mouse |
| --- | --- | --- | --- |
| iPSCs | POU5F1 | Nguyen et al., 2018 | Loh et al., 2006 |
| iPSCs | NANOG | Nguyen et al., 2018 | Apostolou et al., 2013 |
| iPSCs | L1TD1 | Närvä et al., 2012 | Närvä et al., 2012 |
| early ectoderm | SOX2 | Graham et al., 2003 | Lodato et al., 2013 |
| early ectoderm | HES5 | Ziller et al., 2015 | Harada et al., 2021 |
| early ectoderm | RFX4 | Ziller et al., 2015 | Kawase et al., 2014 |
| granule precursor cells | NFIA | Tan et al., 2023 | Fraser et al., 2020 |
| granule precursor cells | ZIC1 | Aruga et al., 1998 | Schüller et al., 2006 |
| granule precursor cells | ZIC4 | Aruga et al., 1998 | Blank et al., 2011 |
| neural crest | SOX10 | Mollaaghababa and Pavan, 2003 | Mollaaghababa and Pavan, 2003; Kim et al., 2003 |
| neural crest | FOXD3 | Tseng et al., 2016 | Dottori et al., 2001 |
| neural crest | S100B | Hackland et al., 2017 | Murphy et al., 1991 |
| neurons | STMN2 | Klim et al., 2019 | Guerra San Juan et al., 2022; Ware et al., 2016 |
| neurons | TAGLN3 (NP25) | Mori et al., 2004 | Ware et al., 2016 |
| neurons | DCX | Gleeson et al., 1999 | Gleeson et al., 1999 |
| smooth muscle cells | COL8A1 | Rojas et al., 2024 | Muhl et al., 2022 |
| smooth muscle cells | ACTG2 | Hashmi et al., 2020 | Muhl et al., 2022 |
| smooth muscle cells | ACTA2 | Rojas et al., 2024 | Muhl et al., 2022 |
| cardiac fibroblasts | TNNT2 | Mononen et al., 2020 | Tachampa and Wongtawan, 2020 |
| cardiac fibroblasts | DCN | Floy et al., 2021 | Ko et al., 2022 |
| cardiac fibroblasts | HAND2 | Mononen et al., 2020 | Furtado et al., 2014 |
| epithelial cells | CDH1 | Oikawa et al., 2018 | Bondow et al., 2012 |
| epithelial cells | EPCAM | Martowicz et al., 2016 | Huang et al., 2018 |
| epithelial cells | CLDN7 | Farkas et al., 2015 | Xing et al., 2020 |
| hepatocytes | TTR | Banas et al., 2007 | Lavon and Benvenisty, 2005 |
| hepatocytes | APOA1 | Krueger et al., 2013 | De Giorgi et al., 2021 |
| hepatocytes | APOA2 | Krueger et al., 2013 | Peng et al., 2018 |

Additional files

Cite this article

Jessica Jocher, Philipp Janssen, Beate Vieth, Fiona C Edenhofer, Tamina Dietl, Anita Térmeg, Paulina Spurk, Johanna Geuder, Wolfgang Enard, Ines Hellmann (2026)
Identification and comparison of orthologous cell types from primate embryoid bodies shows limits of marker gene transferability
eLife 14:RP105398.
https://doi.org/10.7554/eLife.105398.3