
Self-assembling manifolds in single-cell RNA sequencing data

  1. Alexander J Tarashansky
  2. Yuan Xue
  3. Pengyang Li
  4. Stephen R Quake
  5. Bo Wang (corresponding author)
  1. Stanford University, United States
  2. Chan Zuckerberg Biohub, United States
  3. Stanford University School of Medicine, United States
Tools and Resources
Cite this article as: eLife 2019;8:e48994 doi: 10.7554/eLife.48994
6 figures, 1 table, 52 data sets and 5 additional files


Figure 1 with 2 supplements
The SAM algorithm.

(a) SAM starts with a randomly initialized kNN adjacency matrix and iterates to refine the adjacency matrix and gene weight vector until convergence. (b) Root mean square error (RMSE) of the gene weights (top) and the fraction of different edges of the nearest-neighbor adjacency matrices (bottom) between adjacent iterations (blue) and between independent runs at the same iteration (orange) to show that SAM converges to the same solution regardless of initial conditions. The differences between the gene weights and nearest-neighbor graphs from independent runs are relatively small, indicating that SAM converges to the same solution through similar paths. (c) Graph structures and gene weights of the schistosome stem cell data converging to the final output over the course of 10 iterations (i denotes iteration number). Top: nodes are cells and edges connect neighbors. Nodes are color-coded according to the final clusters. Bottom: weights are sorted according to the final gene rankings. (d) Network properties iteratively improve for the graphs reconstructed from the original data (red) but not on the randomly shuffled data (blue). The network properties converge to the same values when initializing SAM with the Seurat-reconstructed graph instead of a random graph (yellow). Dashed lines: metrics measured from the Seurat-reconstructed graphs.
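
The two convergence measures in panel (b) can be illustrated with a minimal Python sketch. This is not code from the SAM package: the function names `rmse` and `fraction_different_edges` are hypothetical, and dividing the edge difference by the number of edges in the first graph is an assumption, since the caption does not state the denominator.

```python
import math


def rmse(weights_a, weights_b):
    """Root mean square error between two equal-length gene-weight vectors."""
    if len(weights_a) != len(weights_b):
        raise ValueError("weight vectors must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(weights_a, weights_b))
    return math.sqrt(squared / len(weights_a))


def fraction_different_edges(adj_a, adj_b):
    """Fraction of directed edges in adj_a that are absent from adj_b.

    Both inputs are square 0/1 adjacency matrices given as nested lists.
    Normalizing by the edge count of adj_a is an assumption of this sketch.
    """
    edges_a = {(i, j) for i, row in enumerate(adj_a) for j, v in enumerate(row) if v}
    edges_b = {(i, j) for i, row in enumerate(adj_b) for j, v in enumerate(row) if v}
    if not edges_a:
        return 0.0
    return len(edges_a - edges_b) / len(edges_a)


# Toy example: four cells, one nearest neighbor each (k = 1).
adj_prev = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
adj_next = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
edge_change = fraction_different_edges(adj_prev, adj_next)
weight_change = rmse([0.0, 0.5, 1.0], [0.0, 0.5, 0.0])
```

For kNN graphs built with the same k, both graphs have the same number of directed edges, so the measure is symmetric in its two arguments.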

Figure 1—figure supplement 1
Quality control of library preparation and sequencing of the schistosome stem cells.

(a) Histograms of h2a qPCR measurements in 2.5- (left) and 3.5- (right) week post-infection samples. (b) Scatter plot of gene count (>2 TPM) vs. mapped read count of individual sequenced cells. Cells with low gene count or low h2a expression are excluded from analysis (red), and the remaining cells are analyzed (blue). The number of cells retained for analysis is given in the top-left corner of each plot.

Figure 1—figure supplement 2
A user interface for interactively exploring single-cell data using SAM.

An interactive Jupyter notebook interface provided by the SAM package supports visualization of single-cell data (upper left) and adjustment of SAM parameters through control panels (upper right and bottom). The interface allows clustering, subclustering, visualization of gene expression, and many other applications.

Figure 2 with 1 supplement
SAM identifies novel subpopulations within schistosome stem cells.

(a) UMAP projections of the manifolds reconstructed by SAM, PCA, and Seurat. SIMLR outputs its own 2D projection based on its constructed similarity matrix using a modified version of t-SNE. The schistosome cells are color-coded by the stem cell subpopulations μ, δ’, εɑ, and εβ determined by Louvain clustering. (b) UMAP projections with gene expressions of subpopulation-specific markers (eledh, nanos-2, cabp, astf, bhlh) and a ubiquitous stem cell marker, ago2-1, overlaid. Insets: magnified views of the expressing populations. (c) FISH of cabp and EdU labeling of dividing stem cells in juvenile parasites at 2.5 weeks post-infection show that μ-cells (cabp+EdU+, arrowheads) are close to the parasite surface and beneath a layer of post-mitotic cabp+ cells. Dashed outline: parasite surface. Right: magnified views of the boxed region. (d) FISH of cabp and a set of canonical muscle markers, troponin, myosin, tropomyosin, and collagen, shows colocalization in post-mitotic cabp+ cells. Images in (c–d) are single confocal slices. (e) FISH of astf and bhlh shows their orthogonal expression in adjacent EdU+ cells (arrowheads). Bottom: magnified views of the boxed region. Image is a maximum intensity projection of a confocal stack with a thickness of 12 µm. (f) UMAP projection of stem cells isolated from juveniles at 2.5 and 3.5 weeks post-infection. Cell subpopulation assignments based on marker gene expressions are specified. Right: a magnified view to show the mapping of εɑ- and εβ-cells. (g) Standardized dispersions as calculated by Seurat plotted vs. the SAM gene weights. (h) SC3 AUROC scores plotted vs. the SAM gene weights. Error bars indicate the standard deviation of SC3 AUROC scores between trials using different chosen numbers of clusters. In (g) and (h), the top 20 genes specific to each subpopulation are colored according to the color scheme used in (a).

Figure 2—figure supplement 1
μ-cells express ubiquitous stem cell markers and population-specific genes.

UMAP projections with gene expressions of (a) stem cell markers and (b) μ-cell-specific genes overlaid.

Figure 3 with 1 supplement
SAM improves clustering accuracy and runtime performance.

(a) Accuracy of cluster assignment quantified by adjusted Rand index (ARI) on nine annotated datasets (left). Right: differences between the number of clusters found by each method (N) and the number of annotated clusters (NTRUE). Smaller differences indicate more accurate clustering. Seurat* denotes Seurat analysis using parameters that maximize ARI. (b) RMSE of gene weights output by SAM averaged across ten replicate runs with random initial conditions for 56 datasets (blue) and simulated datasets with no intrinsic structure (green, Materials and methods). (c) Runtime of SAM, SC3, SIMLR, and Seurat as a function of the number of cells in each dataset. SC3 and SIMLR were not run on datasets with >3000 cells as the runtime exceeds 20 min.
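
For readers unfamiliar with the ARI used in panel (a), the following is a minimal pure-Python sketch of the standard pair-counting formula. The function name `adjusted_rand_index` is hypothetical and the code is illustrative only; it is not taken from the benchmarking pipeline of this study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: treat as trivially identical

    # Pairs of cells placed together in both labelings (contingency table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together within each labeling separately.
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (together_both - expected) / (maximum - expected)


# Cluster names do not matter, only the grouping of cells.
same_grouping = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
```

An ARI of 1 means the two groupings are identical up to relabeling, values near 0 are what random assignment would give, and negative values indicate agreement worse than chance.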

Figure 3—figure supplement 1
SAM converges to a stable solution independent of random initial conditions and is robust to the number of nearest neighbors and choice of distance metric.

(a) RMSE of gene weights between adjacent iterations within a run, averaged across ten replicate runs for all datasets. (b–c) Average ARI scores for the nine annotated benchmarking datasets when varying (b) the number of nearest neighbors, k, from 10 to 30 or (c) the choice of distance metric (Euclidean or Pearson correlation). Error bars indicate standard deviations of ARI scores across the different values of k and distance metrics. The errors for data with no error bars are too small to be seen.

Figure 4
SAM improves the analysis of datasets with varying network sensitivities.

(a) Network sensitivity of all 56 datasets ranked in descending order. Blue: the nine benchmarking datasets used in Figure 3a. Sensitivity measures the robustness of a dataset to changes in which features are selected (Materials and methods). (b) The network sensitivity plotted against the fraction of genes with SAM weight greater than 0.5 (in log scale) with Spearman correlation coefficient specified in the upper-right corner. (c) Fold improvement of SAM over Seurat for NACC, modularity, and spatial dispersion with respect to sensitivity for all 56 datasets. These ratios are linearly correlated with network sensitivity with Pearson correlations (r²) specified in the upper-left corner of each plot.

Figure 5
Robust feature selection improves cell clustering and manifold reconstruction.

(a) Network sensitivity, ARI, NACC, modularity, and spatial dispersion with respect to corruption of the Darmanis dataset, in which we randomly permute fractions of the data ranging from 0 to 100% of the total number of elements (Materials and methods). Performance is compared between SAM (blue), Seurat (red), Seurat with optimal parameters (black), and Seurat rescued with the top-ranked SAM genes (indigo). Error bars indicate the standard deviations across 10 replicate runs. The errors for points with no bars are too small to be seen. (b) Comparison of the area under curve (AUC) of the metrics in (a) with respect to data corruption for all nine datasets. Error bars indicate the standard deviations across 10 replicate runs. The errors for data with no error bars are too small to be seen.
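
The corruption described in panel (a) can be pictured with the short Python sketch below. It is one plausible reading of "randomly permute fractions of the data", not the procedure from Materials and methods: the function name `corrupt_matrix`, the `seed` parameter, and the choice to shuffle values among a randomly chosen subset of matrix positions are all assumptions.

```python
import random


def corrupt_matrix(matrix, fraction, seed=None):
    """Return a copy of `matrix` with a random fraction of its elements permuted.

    `matrix` is a list of equal-length rows (cells x genes). A subset of
    positions covering `fraction` of all elements is drawn without
    replacement, and the values at those positions are shuffled among
    themselves. The input is left unmodified.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)
    positions = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    n_corrupt = round(fraction * len(positions))

    chosen = rng.sample(positions, n_corrupt)
    values = [matrix[i][j] for i, j in chosen]
    rng.shuffle(values)

    corrupted = [list(row) for row in matrix]
    for (i, j), value in zip(chosen, values):
        corrupted[i][j] = value
    return corrupted


# Toy expression matrix: 3 cells x 4 genes, half of the elements permuted.
expression = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
half_corrupted = corrupt_matrix(expression, 0.5, seed=0)
```

Permuting existing values, rather than adding noise, keeps the overall distribution of expression values unchanged while destroying the structure linking cells to genes.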

Figure 6 with 2 supplements
SAM captures the cellular activation dynamics in a stimulated macrophage dataset.

(a) GSEA analysis (left) and UMAP projections (right) of the activated macrophages before (top) and after (bottom) removing cell cycle effects. Teal: significantly enriched gene sets determined by the significance threshold of 0.25 for the False Discovery Rate (FDR, dashed lines). Bottom: the two clusters are denoted as MT and M with colors representing the time since LPS induction. Arrows: evolution of time. (b) TNFα is enriched in the MT cluster. (c) Diagram of NF-κB activation in response to LPS stimulation via the Myd88 and TRIF signaling pathways. (d) Log2 fold changes of the average expressions of selected inflammatory genes in the MT cluster vs. the M cluster. All genes are significantly differentially expressed between the two clusters according to Welch’s two-sample t-test (p < 5 × 10⁻³). (e) Representative traces for transient (left) and prolonged (right) NF-κB activation (Materials and methods). (f) Cells with prolonged NF-κB response (denoted as P) are primarily in the MT population. (g) Seurat and SIMLR projections show that they fail to order the cells by time since LPS induction and do not identify cell clusters representing the different modes of NF-κB activation.
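
The two quantities reported in panel (d) can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the analysis code of the study: the function names are hypothetical, no pseudocount is added before taking the ratio of means, and only the Welch t statistic and the Welch–Satterthwaite degrees of freedom are computed. Obtaining a p-value additionally requires the t distribution, for example via SciPy's `scipy.stats.ttest_ind` with `equal_var=False`.

```python
import math
from statistics import mean, variance


def log2_fold_change(group_a, group_b):
    """Log2 ratio of the mean expression in group_a to that in group_b."""
    mean_a, mean_b = mean(group_a), mean(group_b)
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("both group means must be positive")
    return math.log2(mean_a / mean_b)


def welch_t(group_a, group_b):
    """Welch's unequal-variance t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard errors of the two group means (sample variances).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Toy expression values of one gene in two clusters.
cluster_mt = [4.0, 6.0, 8.0]
cluster_m = [1.0, 2.0, 3.0]
fold_change = log2_fold_change(cluster_mt, cluster_m)
t_stat, dof = welch_t(cluster_mt, cluster_m)
```

Welch's test is the natural choice when the two clusters differ in size and variance, because it does not pool the variances of the groups.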

Figure 6—figure supplement 1
Cluster-specific marker genes before and after removing cell cycle effects.

UMAP projections with marker genes specific to the dividing cells (a) and the MT cluster (b) overlaid.

Figure 6—figure supplement 2
SAM groups cells based on NF-κB activation dynamics while other methods cannot.

(a) UMAP projection of the macrophage cells after the removal of cell cycle effects. Cells with prolonged NF-κB dynamics are highlighted in red. (b) UMAP and t-SNE projections for Seurat and SIMLR, respectively, after the removal of cell cycle effects. Cells with prolonged NF-κB dynamics are highlighted in red. (c) UMAP projections for Seurat and SIMLR with three MT-specific marker gene expressions overlaid.



Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| Commercial assay or kit | SsoAdvanced Universal SYBR Green Supermix | Biorad | 1725270 | qPCR |
| Commercial assay or kit | Quant-iT PicoGreen dsDNA Assay Kit | Thermo-Fisher | P7589 | cDNA quantification |
| Peptide, recombinant protein | RNase Inhibitor | Takara Bio | 2313B | RT mix |
| Chemical compound, drug | dNTP Set 100 mM solutions | Thermo-Fisher | R0181 | RT mix and cDNA pre-amplification |
| Sequence-based reagents | 100 µM oligo-dT | IDT | AAGCAGTGGTATCAACGCAGAGTACT(30)VN | |
| Sequence-based reagents | 100 µM TSO | Exiqon | AAGCAGTGGTATCAACGCAGAGTACATrGrG+G | |
| Commercial assay or kit | ERCC RNA Spike-In Mix | Thermo-Fisher | 4456740 | RT mix |
| Chemical compound, drug | 10% Triton X-100 | Thermo-Fisher | 28314 | RT mix |
| Peptide, recombinant protein | SMARTscribe reverse transcriptase | Takara Bio | 639538 | RT mix |
| Chemical compound, drug | 100 mM DTT | Promega | P1171 | RT mix |
| Chemical compound, drug | 5 M Betaine | Thermo-Fisher | B0300-1VL | RT mix |
| Commercial assay or kit | Kapa Hotstart Ready Mix | Roche | KK2602 | cDNA pre-amplification |
| Sequence-based reagents | 100 µM IS_PCR primer | IDT | AAGCAGTGGTATCAACGCAGAGT | |
| Peptide, recombinant protein | lambda exonuclease | NEB | M0262S | Depletion of primer dimers |
| Commercial assay or kit | Ampure purification beads | NEB | M0262S | DNA purification |
| Commercial assay or kit | TG Nextera XT DNA Sample Preparation Kit | Illumina | FC-131–1096 | Library preparation |
| Commercial assay or kit | TG Nextera XT Index Kit v2 Set A (96 Indices, 384 Samples) | Illumina | TG-131–2001 | Library preparation |
| Strain, strain background (S. mansoni) | NMRI | BEI Resources | NR-21963 | |
| Antibody | Anti-Digoxigenin-POD, Fab fragments from sheep | Roche | 11207733910 | (1:1,000); FISH |
| Antibody | Anti-Fluorescein-POD, Fab fragments from sheep | Roche | 11426346910 | (1:1,500); FISH experiments |
| Peptide, recombinant DNA reagents | Plasmid-pJC53.2 | Addgene | 26536 | Cloning vector |
| Chemical compound, drug | Cy5-azide | Click Chemistry Tools | AZ118 | EdU detection |
| Chemical compound, drug | 5-ethynyl-2-deoxyuridine (EdU) | Invitrogen | A10044 | |
| Chemical compound, drug | Vybrant DyeCycle Violet (DCV) | Invitrogen | V35003 | FACS |
| Chemical compound, drug | TOTO-3 | Invitrogen | T3604 | FACS |

Data availability

The schistosome stem cell scRNAseq data generated in this study is available through the Gene Expression Omnibus (GEO) under accession number GSE116920.

The following data sets were generated
  1. 1
    NCBI Gene Expression Omnibus
    1. Y Xue
    2. B Wang
    ID GSE116920. Single-cell RNA sequencing of proliferative stem cell population from juvenile Schistosoma mansoni.
The following previously published data sets were used
  1. 1
    NCBI Gene Expression Omnibus
    1. F Tang
    2. J Qiao
    3. R Li
    ID GSE36552. Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells.
  2. 2
    1. M Goolam
    2. A Scialdone
    3. SJL Graham
    4. IC Macaulay
    5. A Jedrusik
    6. A Hupalowska
    7. T Voet
    8. JC Marioni
    9. M Zernicka-Goetz
    ID E-MTAB-3321. Heterogeneity in Oct4 and Sox2 targets biases cell fate in 4-cell mouse embryos.
  3. 3
    NCBI Gene Expression Omnibus
    1. B Tasic
    2. V Menon
    3. TN Nguyen
    4. TK Kim
    5. Z Yao
    6. LT Gray
    7. M Hawrylycz
    8. C Koch
    9. H Zeng
    ID GSE71585-GPL17021. Adult mouse cortical cell taxonomy revealed by single cell transcriptomics.
  4. 4
    NCBI Gene Expression Omnibus
    1. F Guo
    2. H Guo
    3. L Li
    4. F Tang
    ID GSE63818. The transcriptome and DNA methylome landscapes of human primordial germ cells.
  5. 5
    1. JK Kim
    2. AA Kolodziejczyk
    3. T Ilicic
    4. SA Teichmann
    5. JC Marioni
    ID E-MTAB-2600. Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation.
  6. 6
    NCBI Gene Expression Omnibus
    1. D Wollny
    2. S Zhao
    3. A Martin-Villalba
    ID GSE80032. Single-cell analysis uncovers clonal acinar cell heterogeneity in the adult pancreas.
  7. 7
    1. KM Loh
    2. A Chen
    3. PW Koh
    4. TZ Deng
    5. R Sinha
    6. JM Tsai
    7. AA Barkal
    8. KY Shen
    9. R Jain
    10. RM Morganti
    11. N Shyh-Chang
    12. NB Fernhoff
    13. BM George
    14. G Wernig
    15. REA Salomon
    16. Z Chen
    17. H Vogel
    18. JA Epstein
    19. A Kundaje
    20. WS Talbot
    21. PA Beachy
    22. LT Ang
    23. IL Weissman
    ID SRP073808. Mapping the pairwise choices leading from pluripotency to human bone, heart, and other mesoderm cell types.
  8. 8
    NCBI Gene Expression Omnibus
    1. Q Deng
    2. D Ramsköld
    3. B Reinius
    4. R Sandberg
    ID GSE45719. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells.
  9. 9
    NCBI Gene Expression Omnibus
    1. P Anoop
    2. T Itay
    ID GSE57872. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma.
  10. 10
    NCBI Gene Expression Omnibus
    1. AH Rizvi
    2. PG Camara
    3. EK Kandror
    4. TJ Roberts
    5. I Schieren
    6. T Maniatis
    7. R Rabadan
    ID GSE94883. Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development.
  11. 11
    NCBI Gene Expression Omnibus
    1. Q Tang
    2. D Langenau
    ID GSE100911. Dissecting hematopoietic and renal cell heterogeneity in adult zebrafish at single-cell resolution using RNA sequencing.
  12. 12
    NCBI Gene Expression Omnibus
    1. I Engel
    2. G Seumois
    3. L Chavez
    4. A Chawla
    5. B White
    6. D Mock
    7. P Vijayanand
    8. M Kronenberg
    ID GSE74596. Innate-like functions of natural killer T cell subsets result from highly divergent gene programs.
  13. 13
    1. D Edsgard
    2. F Lanner
    3. R Sandberg
    4. S Petropoulos
    ID E-MTAB-3929. Single-cell RNA-seq reveals lineage and X chromosome dynamics in human preimplantation embryos.
  14. 14
    NCBI Gene Expression Omnibus
    1. JC Burns
    2. MC Kelly
    3. M Hoa
    4. RJ Morell
    5. MW Kelley
    ID GSE71982. Single-cell RNA-Seq resolves cellular complexity in sensory organs from the neonatal inner ear.
  15. 15
    NCBI Gene Expression Omnibus
    1. A Namani
    2. XJ Wang
    3. X Tang
    ID GSE94383. Measuring signaling and RNA-Seq in the same cell links gene expression to dynamic patterns of NF-κB activation.
  16. 16
    NCBI Gene Expression Omnibus
    1. FH Biase
    2. X Cao
    3. S Zhong
    ID GSE57249. Cell fate inclination within 2-cell and 4-cell mouse embryos revealed by single-cell RNA sequencing.
  17. 17
    NCBI Gene Expression Omnibus
    1. C Trapnell
    2. D Cacchiarelli
    3. J Grimsby
    4. P Pokharel
    5. S Li
    6. M Morse
    7. T Mikkelsen
    8. J Rinn
    ID GSE52529-GPL16791. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells.
  18. 18
    1. AA Pollen
    2. TJ Nowakowski
    3. J Shuga
    4. X Wang
    5. AA Leyrat
    6. JH Lui
    7. N Li
    8. L Szpankowski
    9. B Fowler
    10. P Chen
    11. N Ramalingam
    12. G Sun
    13. M Thu
    14. M Norris
    15. R Lebofsky
    16. D Toppani
    17. DW Kemp
    18. M Wong
    19. B Clerkson
    20. BN Jones
    21. S Wu
    22. L Knutsson
    23. B Alvarado
    24. J Wang
    25. LS Weaver
    26. AP May
    27. RC Jones
    28. MA Unger
    29. AR Kriegstein
    30. JA West
    ID SRP041736. Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex.
  19. 19
    1. F Buettner
    2. KN Natarajan
    3. FP Casale
    4. V Proserpio
    5. A Scialdone
    6. FJ Theis
    7. SA Teichmann
    8. JC Marioni
    9. O Stegle
    ID E-MTAB-2805. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells.
  20. 20
    NCBI Gene Expression Omnibus
    1. R Satija
    ID GSE48968-GPL13112. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation.
  21. 21
    NCBI Gene Expression Omnibus
    1. L Ning
    2. C Li-Fang
    ID GSE64016. Oscope identifies oscillatory genes in unsynchronized single-cell RNAseq experiments.
  22. 22
    NCBI Gene Expression Omnibus
    1. SE Meyer
    2. T Qin
    3. DE Muench
    4. K Masuda
    5. M Venkatasubramanian
    6. E Orr
    7. E Paietta
    8. MS Tallman
    9. H Fernandez
    10. A Melnick
    11. MM Beau
    12. S Kogan
    13. N Salomonis
    14. ME Figueroa
    15. HL Grimes
    ID GSE77847. DNMT3A haploinsufficiency transforms FLT3ITD myeloproliferative disease into a rapid, spontaneous, and fully penetrant acute myeloid leukemia.
  23. 23
    NCBI Gene Expression Omnibus
    1. B Treutlein
    2. SR Quake
    ID GSE52583-GPL13112. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq.
  24. 24
    NCBI Gene Expression Omnibus
    1. A Olsson
    2. M Venkatasubramanian
    3. VK Chaudhri
    4. BJ Aronow
    ID GSE70245. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice.
  25. 25
    NCBI Gene Expression Omnibus
    1. J Shin
    2. H Song
    ID GSE71485. Single-cell RNA-seq with waterfall reveals molecular cascades underlying adult neurogenesis.
  26. 26
    1. PC Schwalie
    2. H Dong
    3. M Zachara
    4. J Russeil
    5. D Alpern
    6. N Akchiche
    7. C Caprara
    8. W Sun
    9. KU Schlaudraff
    10. G Soldati
    11. C Wolfrum
    12. B Deplancke
    ID E-MTAB-6677. A stromal cell population that inhibits adipogenesis in mammalian fat depots.
  27. 27
    NCBI Gene Expression Omnibus
    1. S Darmanis
    2. S Quake
    ID GSE84465. Single-Cell RNA-Seq analysis of infiltrating neoplastic cells at the migrating front of human glioblastoma.
  28. 28
    1. A Scialdone
    2. Y Tanaka
    3. W Jawaid
    4. V Moignard
    5. NK Wilson
    6. IC Macaulay
    7. JC Marioni
    8. B Göttgens
    ID E-MTAB-4079. Resolving early mesoderm diversification through single-cell expression profiling.
  29. 29
    NCBI Gene Expression Omnibus
    1. M Enge
    2. HE Arda
    ID GSE81547. Single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns.
  30. 30
    NCBI Gene Expression Omnibus
    1. I Stévant
    2. Y Neirjinck
    3. C Borel
    4. J Escoffier
    5. LB Smith
    6. SE Antonarakis
    7. ET Dermitzakis
    8. S Nef
    ID GSE97519. Deciphering cell lineage specification during male sex determination with single-cell RNA sequencing.
  31. 31
    NCBI Gene Expression Omnibus
    1. MJ Phillips
    2. P Jiang
    3. S Howden
    ID GSE98556. A novel approach to single cell RNA-sequence analysis facilitates in silico gene reporting of human pluripotent stem cell-derived retinal cell types.
  32. 32
    NCBI Gene Expression Omnibus
    1. M Vanlandewijck
    2. L He
    3. MA Mäe
    4. J Andrae
    5. C Betsholtz
    ID GSE99235. A molecular atlas of cell types and zonation in the brain vasculature.
  33. 33
  34. 34
    NCBI Gene Expression Omnibus
    1. A Ghahramani
    2. F Watt
    3. N Luscombe
    ID GSE99989. Epidermal Wnt signalling regulates transcriptome heterogeneity and proliferative fate in neighbouring cells.
  35. 35
    NCBI Gene Expression Omnibus
    1. F Lescroart
    2. X Wang
    3. X Li
    4. S Gargouri
    5. V Moignard
    6. C Dubois
    7. C Paulissen
    8. B Göttgens
    9. C Blanpain
    ID GSE100471. Defining the earliest step of cardiovascular lineage segregation by single-cell RNA-seq.
  36. 36
    NCBI Gene Expression Omnibus
    1. H Mohammed
    2. I Hernando-Herraez
    3. W Reik
    ID GSE100597. Single-cell landscape of transcriptional heterogeneity and cell fate decisions during mouse early gastrulation.
  37. 37
    NCBI Gene Expression Omnibus
    1. H Mathys
    2. F Gao
    3. L Tsai
    ID GSE103334. Temporal tracking of microglia activation in neurodegeneration at single-cell resolution.
  38. 38
    NCBI Gene Expression Omnibus
    1. M Chevée
    2. JD Robertson
    3. GH Cannon
    4. SP Brown
    5. LA Goff
    ID GSE107632. Variation in activity state, axonal projection, and position define the transcriptional identity of individual neocortical projection neurons.
  39. 39
    NCBI Gene Expression Omnibus
    1. PW Hook
    2. SA McClymont
    3. GH Cannon
    4. WD Law
    5. AJ Morton
    6. LA Goff
    7. AS McCallion
    ID GSE108020. Single-Cell RNA-Seq of mouse dopaminergic neurons informs candidate gene selection for sporadic parkinson disease.
  40. 40
    NCBI Gene Expression Omnibus
    1. D Mi
    2. Z Li
    3. M Li
    ID GSE109796. Early emergence of cortical interneuron diversity in the mouse embryo.
  41. 41
    NCBI Gene Expression Omnibus
    1. F Zanini
    2. S Pu
    3. E Bekerman
    4. S Einav
    5. SR Quake
    ID GSE110496. Single-cell transcriptional dynamics of flavivirus infection.
  42. 42
    NCBI Gene Expression Omnibus
    1. I Tirosh
    2. A Venteicher
    3. M Suva
    4. A Regev
    ID GSE70630. Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma.
  43. 43
    NCBI Gene Expression Omnibus
    1. I Amit
    2. A Tanay
    3. F Paul
    4. Y Arkin
    5. A Giladi
    ID GSE72857. Transcriptional heterogeneity and lineage commitment in myeloid progenitors.
  44. 44
    NCBI Gene Expression Omnibus
    1. GK Smyth
    2. Y Chen
    3. B Pal
    4. JE Visvader
    ID GSE95430. Construction of developmental lineage relationships in the mouse mammary gland by single-cell RNA profiling.
  45. 45
    NCBI Gene Expression Omnibus
    1. M Häring
    2. A Zeisel
    3. S Linnarsson
    4. P Ernfors
    ID GSE103840. Neuronal atlas of the dorsal horn defines its architecture and links sensory input to transcriptional cell types.
  46. 46
    NCBI Gene Expression Omnibus
    1. S Darmanis
    2. M Enge
    3. SR Quake
    4. SA Sloan
    5. BA Barres
    6. Y Zhang
    7. C Caneda
    8. MG Hayden Gephart
    9. LM Shuer
    ID GSE67835. A survey of human brain transcriptome diversity at the single cell level.
  47. 47
    NCBI Gene Expression Omnibus
    1. YJ Wang
    2. J Schug
    3. ML Golson
    4. K Won
    5. C Liu
    6. A Naji
    7. D Avrahami
    8. KH Kaestner
    ID GSE83139. Single-cell transcriptomics of the human endocrine pancreas.
  48. 48
    NCBI Gene Expression Omnibus
    1. A Veres
    2. M Baron
    ID GSE84133. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure.
  49. 49
    NCBI Gene Expression Omnibus
    1. CT Fincher
    2. O Wurtzel
    3. T de Hoog
    4. KM Kravarik
    5. PW Reddien
    ID GSE111764. Cell type transcriptome atlas for the planarian Schmidtea mediterranea.
  50. 50
    1. Å Segerstolpe
    2. A Palasantza
    3. P Eliasson
    4. EM Andersson
    5. AC Andréasson
    6. X Sun
    7. S Picelli
    8. A Sabirsh
    9. M Clausen
    10. MK Bjursell
    11. DM Smith
    12. M Kasper
    13. C Ämmälä
    14. R Sandberg
    ID E-MTAB-5061. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes.
  51. 51
    NCBI Gene Expression Omnibus
    1. MJ Muraro
    2. G Dharmadhikari
    3. E de Koning
    4. A van Oudenaarden
    ID GSE85241. A Single-cell transcriptome atlas of the human pancreas.

Additional files

Supplementary file 1

Ranked gene list with high SAM weights in the schistosome stem cell data.

Gene IDs and annotations are given in the S. mansoni genome version 9 (WormBase, WS268). Genes are assigned to the cluster corresponding to the marker gene, nanos-2, cabp, astf, or bhlh, with which they have the highest correlation. Genes found in our prior work (Wang et al., 2018) to be enriched in subsets of stem cells are specified.

Supplementary file 2

Datasets used in this study.

Accession numbers, library size normalization methods, data preprocessing methods, sensitivity scores, and corresponding references are provided for each dataset. Accession numbers with asterisks indicate datasets that are sourced from the conquer database (Soneson and Robinson, 2018). Accession numbers with crosses indicate the nine well-annotated datasets that were used for benchmarking.

Supplementary file 3

ARI clustering accuracy of individual annotated cell types.

The ARI scores of SAM, Seurat, SC3, and SIMLR applied to the nine benchmarking datasets are provided for each annotated ground truth cluster.

Supplementary file 4

Cloning primer sequences used for generating riboprobes for the FISH experiments and primer sequences for qPCR analysis.

Functional annotations of the genes were given in the S. mansoni genome version 9 (WormBase, WS268).

Transparent reporting form
