1. Computational and Systems Biology
  2. Stem Cells and Regenerative Medicine

Self-assembling manifolds in single-cell RNA sequencing data

  1. Alexander J Tarashansky
  2. Yuan Xue
  3. Pengyang Li
  4. Stephen R Quake
  5. Bo Wang (corresponding author)
  1. Stanford University, United States
  2. Chan Zuckerberg Biohub, United States
  3. Stanford University School of Medicine, United States
Tools and Resources
Cite this article as: eLife 2019;8:e48994 doi: 10.7554/eLife.48994
6 figures, 1 table, 52 data sets and 5 additional files


Figure 1 with 2 supplements
The SAM algorithm.

(a) SAM starts with a randomly initialized kNN adjacency matrix and iteratively refines the adjacency matrix and the gene weight vector until convergence. (b) Root mean square error (RMSE) of the gene weights (top) and fraction of differing edges in the nearest-neighbor adjacency matrices (bottom), computed between adjacent iterations (blue) and between independent runs at the same iteration (orange), showing that SAM converges to the same solution regardless of initial conditions. The differences between the gene weights and nearest-neighbor graphs from independent runs are relatively small, indicating that SAM converges to the same solution through similar paths. (c) Graph structures and gene weights of the schistosome stem cell data converging to the final output over the course of 10 iterations (i denotes the iteration number). Top: nodes are cells and edges connect neighbors; nodes are color-coded according to the final clusters. Bottom: weights are sorted according to the final gene rankings. (d) Network properties improve iteratively for graphs reconstructed from the original data (red) but not for graphs reconstructed from randomly shuffled data (blue). The network properties converge to the same values when SAM is initialized with the Seurat-reconstructed graph instead of a random graph (yellow). Dashed lines: metrics measured from the Seurat-reconstructed graphs.
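
The two convergence measures in panel (b) can be illustrated with a short sketch. It assumes, hypothetically, that gene weights are plain lists of floats and that each kNN graph is a dict mapping a cell index to the set of its neighbor indices; "fraction of different edges" is computed here as the fraction of directed edges in one graph that are absent from the other, which is one plausible reading rather than the exact definition in Materials and methods.

```python
import math

def weight_rmse(w_a, w_b):
    """Root mean square error between two gene-weight vectors of equal length."""
    if len(w_a) != len(w_b):
        raise ValueError("weight vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_a, w_b)) / len(w_a))

def fraction_different_edges(knn_a, knn_b):
    """Fraction of directed kNN edges in graph A that are absent from graph B.

    Each graph (hypothetical representation) maps a cell index to the set
    of its nearest-neighbor indices.
    """
    edges_a = {(i, j) for i, nbrs in knn_a.items() for j in nbrs}
    edges_b = {(i, j) for i, nbrs in knn_b.items() for j in nbrs}
    return len(edges_a - edges_b) / len(edges_a)

# Two graphs that differ in one of four edges.
g1 = {0: {1, 2}, 1: {0, 2}}
g2 = {0: {1, 2}, 1: {0, 3}}
print(fraction_different_edges(g1, g2))
print(weight_rmse([1.0, 0.5, 0.0], [1.0, 0.5, 0.3]))
```

Comparing iteration t with iteration t+1 of one run gives the blue curves; comparing the same iteration across independent runs gives the orange ones.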

Figure 1—figure supplement 1
Quality control of library preparation and sequencing of the schistosome stem cells.

(a) Histograms of h2a qPCR measurements in samples collected 2.5 (left) and 3.5 (right) weeks post-infection. (b) Scatter plot of gene count (>2 TPM) vs. mapped read count for individual sequenced cells. Cells with low gene counts or low h2a expression are filtered out (red), and the remaining cells are analyzed (blue). The number of cells kept for analysis is given in the top left corner of each plot.
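
The filtering rule in panel (b) can be sketched as follows. The caption gives the 2 TPM detection cutoff but not the minimum gene count or h2a level, so `min_genes` and `min_h2a` below are hypothetical parameters, and the per-cell input format is likewise an assumption.

```python
def count_expressed_genes(tpm_row, tpm_cutoff=2.0):
    """Number of genes detected above the TPM cutoff in one cell."""
    return sum(1 for v in tpm_row if v > tpm_cutoff)

def filter_cells(cells, min_genes, min_h2a):
    """Split cells into kept and discarded lists using both QC criteria.

    `cells` maps a cell ID to (tpm_values, h2a_level). The thresholds
    `min_genes` and `min_h2a` are hypothetical and dataset-specific.
    """
    kept, discarded = [], []
    for cell_id, (tpm_row, h2a_level) in cells.items():
        if count_expressed_genes(tpm_row) >= min_genes and h2a_level >= min_h2a:
            kept.append(cell_id)
        else:
            discarded.append(cell_id)
    return kept, discarded

cells = {
    "c1": ([0, 3, 5, 10], 1.0),  # enough genes, high h2a: kept
    "c2": ([0, 1, 2, 3], 1.0),   # too few genes above 2 TPM: discarded
    "c3": ([5, 5, 5, 5], 0.1),   # low h2a: discarded
}
print(filter_cells(cells, min_genes=3, min_h2a=0.5))
```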

Figure 1—figure supplement 2
A user interface for interactively exploring single-cell data using SAM.

An interactive Jupyter notebook interface provided by the SAM package enables convenient visualization of single-cell data (upper left) and adjustment of SAM parameters through various control panels (upper right and bottom). The interface supports clustering, subclustering, visualization of gene expression, and many other applications.

Figure 2 with 1 supplement
SAM identifies novel subpopulations within schistosome stem cells.

(a) UMAP projections of the manifolds reconstructed by SAM, PCA, and Seurat. SIMLR outputs its own 2D projection, computed from its constructed similarity matrix using a modified version of t-SNE. The schistosome cells are color-coded by the stem cell subpopulations μ, δ′, εα, and εβ determined by Louvain clustering. (b) UMAP projections overlaid with the expression of subpopulation-specific markers (eledh, nanos-2, cabp, astf, and bhlh) and of a ubiquitous stem cell marker, ago2-1. Insets: magnified views of the expressing populations. (c) FISH of cabp and EdU labeling of dividing stem cells in juvenile parasites at 2.5 weeks post-infection show that μ-cells (cabp+EdU+, arrowheads) are located close to the parasite surface, beneath a layer of post-mitotic cabp+ cells. Dashed outline: parasite surface. Right: magnified views of the boxed region. (d) FISH of cabp and a set of canonical muscle markers (troponin, myosin, tropomyosin, and collagen) shows colocalization in post-mitotic cabp+ cells. Images in (c–d) are single confocal slices. (e) FISH of astf and bhlh shows their orthogonal expression in adjacent EdU+ cells (arrowheads). Bottom: magnified views of the boxed region. The image is a maximum intensity projection of a 12 µm-thick confocal stack. (f) UMAP projection of stem cells isolated from juveniles at 2.5 and 3.5 weeks post-infection. Cell subpopulation assignments based on marker gene expression are indicated. Right: a magnified view showing the mapping of εα- and εβ-cells. (g) Standardized dispersions calculated by Seurat plotted against SAM gene weights. (h) SC3 AUROC scores plotted against SAM gene weights. Error bars indicate the standard deviation of SC3 AUROC scores across trials using different numbers of clusters. In (g) and (h), the top 20 genes specific to each subpopulation are colored according to the color scheme used in (a).
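
Panel (h) compares SAM weights with SC3's AUROC marker scores. As a rough illustration of what such a score measures, the sketch below computes the generic AUROC of one gene for separating one cluster from all other cells, via the Mann–Whitney pair-counting definition with ties counted as one half. It is not SC3's own code, and the input lists are hypothetical per-cell expression values.

```python
def auroc(in_cluster, out_cluster):
    """AUROC of a gene's expression for separating one cluster from the rest.

    Equals the probability that a randomly chosen in-cluster cell expresses
    the gene more highly than a randomly chosen out-of-cluster cell
    (ties count as one half). Quadratic time; fine for illustration.
    """
    wins = 0.0
    for x in in_cluster:
        for y in out_cluster:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(in_cluster) * len(out_cluster))

print(auroc([3, 4, 5], [0, 1, 2]))  # perfect marker
print(auroc([1, 1], [1, 1]))        # uninformative gene
```

A score near 1 marks a gene expressed specifically in the cluster, and a score near 0.5 marks a gene that does not distinguish it.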

Figure 2—figure supplement 1
μ-cells express ubiquitous stem cell markers and population-specific genes.

UMAP projections with gene expressions of (a) stem cell markers and (b) μ-cell-specific genes overlaid.

Figure 3 with 1 supplement
SAM improves clustering accuracy and runtime performance.

(a) Accuracy of cluster assignment quantified by the adjusted Rand index (ARI) on nine annotated datasets (left). Right: differences between the number of clusters found by each method (N) and the number of annotated clusters (N_TRUE); smaller differences indicate more accurate clustering. Seurat* denotes Seurat analysis using parameters that maximize ARI. (b) RMSE of gene weights output by SAM, averaged across ten replicate runs with random initial conditions, for 56 datasets (blue) and for simulated datasets with no intrinsic structure (green; Materials and methods). (c) Runtime of SAM, SC3, SIMLR, and Seurat as a function of the number of cells in each dataset. SC3 and SIMLR were not run on datasets with >3000 cells because their runtime exceeded 20 min.
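
For readers unfamiliar with the ARI used in panel (a), here is a minimal stdlib sketch of the standard pair-counting definition (the same quantity computed by scikit-learn's `adjusted_rand_score`). It is an illustration, not the benchmarking code used for this figure, and requires Python 3.8+ for `math.comb`.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    # Pairs of cells placed together in both clusterings.
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    # Pairs placed together in each clustering separately.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # relabeling only
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # one cluster split
```

ARI is invariant to cluster relabeling and is corrected for chance, so random assignments score near 0 regardless of how many clusters a method reports.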

Figure 3—figure supplement 1
SAM converges to a stable solution independent of random initial conditions and is robust to the number of nearest neighbors and choice of distance metric.

(a) RMSE of gene weights between adjacent iterations within a run, averaged across ten replicate runs for all datasets. (b–c) Average ARI scores for the nine annotated benchmarking datasets when varying (b) the number of nearest neighbors, k, from 10 to 30 or (c) the choice of distance metric (Euclidean or Pearson correlation). Error bars indicate standard deviations of ARI scores across the different values of k and distance metrics. The errors for data with no error bars are too small to be seen.
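
Panel (c) varies the distance metric used to build the kNN graph. The sketch below, a brute-force illustration rather than SAM's implementation, shows how Euclidean distance and Pearson correlation distance (1 − r) can pick different nearest neighbors for the same cells. The toy expression vectors are hypothetical, and constant vectors (zero variance) are not handled.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    """1 - Pearson correlation between two expression vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1.0 - cov / (sa * sb)

def knn(cells, k, metric):
    """Map each cell index to the indices of its k nearest other cells."""
    graph = {}
    for i, ci in enumerate(cells):
        dists = sorted((metric(ci, cj), j) for j, cj in enumerate(cells) if j != i)
        graph[i] = [j for _, j in dists[:k]]
    return graph

cells = [[1, 2, 3], [2, 4, 6], [10, 10, 10.5], [3, 2, 1]]
print(knn(cells, 1, euclidean)[0])             # nearest in absolute level
print(knn(cells, 1, correlation_distance)[0])  # nearest in expression shape
```

Cell 1 is a scaled copy of cell 0, so correlation distance pairs them, while Euclidean distance prefers cell 3, which has similar magnitudes but the opposite trend.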

Figure 4
SAM improves the analysis of datasets with varying network sensitivities.

(a) Network sensitivity of all 56 datasets, ranked in descending order. Blue: the nine benchmarking datasets used in Figure 3a. Sensitivity quantifies how strongly the analysis of a dataset depends on which features are selected (Materials and methods). (b) Network sensitivity plotted against the fraction of genes with SAM weight greater than 0.5 (log scale), with the Spearman correlation coefficient given in the upper-right corner. (c) Fold improvement of SAM over Seurat in NACC, modularity, and spatial dispersion as a function of sensitivity for all 56 datasets. These ratios correlate linearly with network sensitivity; squared Pearson correlation coefficients (r²) are given in the upper-left corner of each plot.
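
Panel (b) reports a Spearman rather than a Pearson coefficient. A minimal stdlib sketch: Spearman's coefficient is the Pearson correlation of average ranks, so any strictly increasing transform of one variable (such as the log scale on the plot axis) leaves it unchanged. The inputs below are toy values, not data from the figure.

```python
def rank(values):
    """1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))

# Squaring positive values is monotonic, so the coefficient is unchanged.
print(spearman([1, 2, 3, 4], [2, 1, 4, 3]), spearman([1, 2, 3, 4], [4, 1, 16, 9]))
```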

Figure 5
Robust feature selection improves cell clustering and manifold reconstruction.

(a) Network sensitivity, ARI, NACC, modularity, and spatial dispersion as a function of corruption of the Darmanis dataset, in which we randomly permute fractions of the data ranging from 0 to 100% of the total number of elements (Materials and methods). Performance is compared among SAM (blue), Seurat (red), Seurat with optimal parameters (black), and Seurat rescued with the top-ranked SAM genes (indigo). Error bars indicate standard deviations across 10 replicate runs; errors for points without bars are too small to be seen. (b) Comparison of the area under the curve (AUC) of the metrics in (a) with respect to data corruption for all nine datasets. Error bars indicate standard deviations across 10 replicate runs; errors for data without error bars are too small to be seen.
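
The corruption experiment and the AUC summary in panel (b) can be sketched as follows. The exact permutation procedure is described in Materials and methods; here we assume, for illustration only, that a random subset of matrix entries (the given fraction of all elements) is shuffled among its own positions, and that the AUC is the trapezoidal area under a metric plotted against the corruption fraction.

```python
import random

def corrupt(matrix, fraction, seed=0):
    """Return a copy of `matrix` in which `fraction` of all entries are shuffled.

    The selected entries are permuted among their own positions, so the
    multiset of values is preserved while their placement is randomized.
    (Assumed procedure, for illustration.)
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    positions = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    chosen = rng.sample(positions, round(fraction * len(positions)))
    values = [matrix[r][c] for r, c in chosen]
    rng.shuffle(values)
    out = [row[:] for row in matrix]
    for (r, c), v in zip(chosen, values):
        out[r][c] = v
    return out

def trapezoid_auc(xs, ys):
    """Area under the piecewise-linear curve through the points (xs[i], ys[i])."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))

# A metric that degrades linearly from 1 to 0 as corruption goes 0% -> 100%.
print(trapezoid_auc([0.0, 0.5, 1.0], [1.0, 0.5, 0.0]))
```

A method whose metric stays high as corruption increases accumulates a larger AUC, which is why the AUC serves as a single robustness score per dataset.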

Figure 6 with 2 supplements
SAM captures the cellular activation dynamics in a stimulated macrophage dataset.

(a) GSEA (left) and UMAP projections (right) of the activated macrophages before (top) and after (bottom) removing cell cycle effects. Teal: significantly enriched gene sets, determined using a False Discovery Rate (FDR) threshold of 0.25 (dashed lines). Bottom: the two clusters are denoted MT and M, with colors representing the time since LPS induction. Arrows: progression of time. (b) TNFα is enriched in the MT cluster. (c) Diagram of NF-κB activation in response to LPS stimulation via the MyD88 and TRIF signaling pathways. (d) Log2 fold changes of the average expression of selected inflammatory genes in the MT cluster vs. the M cluster. All genes are significantly differentially expressed between the two clusters according to Welch’s two-sample t-test (p < 5 × 10⁻³). (e) Representative traces of transient (left) and prolonged (right) NF-κB activation (Materials and methods). (f) Cells with a prolonged NF-κB response (denoted P) are primarily in the MT population. (g) Seurat and SIMLR projections fail to order the cells by time since LPS induction and do not identify cell clusters representing the different modes of NF-κB activation.
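
The two statistics in panel (d) can be sketched with the standard library. The log2 fold change below uses a pseudocount of 1, which is our assumption (the caption does not state one), and `welch_t` returns the t statistic and Welch–Satterthwaite degrees of freedom; converting these to a p-value requires the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math

def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression (group_a over group_b).

    The pseudocount (assumed, not from the source) avoids division by zero.
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

print(log2_fold_change([3, 3], [1, 1]))
print(welch_t([2, 4, 6], [1, 2, 3]))
```

Welch's test does not assume equal variances in the two clusters, which suits comparisons between clusters of different sizes and spreads.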

Figure 6—figure supplement 1
Cluster-specific marker genes before and after removing cell cycle effects.

UMAP projections with marker genes specific to the dividing cells (a) and the MT cluster (b) overlaid.

Figure 6—figure supplement 2
SAM groups cells based on NF-κB activation dynamics while other methods cannot.

(a) UMAP projection of the macrophage cells after the removal of cell cycle effects. Cells with prolonged NF-κB dynamics are highlighted in red. (b) UMAP and t-SNE projections for Seurat and SIMLR, respectively, after the removal of cell cycle effects. Cells with prolonged NF-κB dynamics are highlighted in red. (c) UMAP projections for Seurat and SIMLR overlaid with the expression of three MT-specific marker genes.



Key resources table

Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Commercial assay or kit | SsoAdvanced Universal SYBR Green Supermix | Bio-Rad | 1725270 | qPCR
Commercial assay or kit | Quant-iT PicoGreen dsDNA Assay Kit | Thermo Fisher | P7589 | cDNA quantification
Peptide, recombinant protein | RNase Inhibitor | Takara Bio | 2313B | RT mix
Chemical compound, drug | dNTP Set 100 mM solutions | Thermo Fisher | R0181 | RT mix and cDNA pre-amplification
Sequence-based reagent | 100 µM oligo-dT | IDT | AAGCAGTGGTATCAACGCAGAGTACT(30)VN |
Sequence-based reagent | 100 µM TSO | Exiqon | AAGCAGTGGTATCAACGCAGAGTACATrGrG+G |
Commercial assay or kit | ERCC RNA Spike-In Mix | Thermo Fisher | 4456740 | RT mix
Chemical compound, drug | 10% Triton X-100 | Thermo Fisher | 28314 | RT mix
Peptide, recombinant protein | SMARTScribe reverse transcriptase | Takara Bio | 639538 | RT mix
Chemical compound, drug | 100 mM DTT | Promega | P1171 | RT mix
Chemical compound, drug | 5 M Betaine | Thermo Fisher | B0300-1VL | RT mix
Commercial assay or kit | Kapa Hotstart Ready Mix | Roche | KK2602 | cDNA pre-amplification
Sequence-based reagent | 100 µM IS_PCR primer | IDT | AAGCAGTGGTATCAACGCAGAGT |
Peptide, recombinant protein | Lambda exonuclease | NEB | M0262S | Depletion of primer dimers
Commercial assay or kit | Ampure purification beads | NEB | M0262S | DNA purification
Commercial assay or kit | TG Nextera XT DNA Sample Preparation Kit | Illumina | FC-131-1096 | Library preparation
Commercial assay or kit | TG Nextera XT Index Kit v2 Set A (96 Indices, 384 Samples) | Illumina | TG-131-2001 | Library preparation
Strain, strain background (S. mansoni) | NMRI | BEI Resources | NR-21963 |
Antibody | Anti-Digoxigenin-POD, Fab fragments from sheep | Roche | 11207733910 | FISH experiments (1:1,000)
Antibody | Anti-Fluorescein-POD, Fab fragments from sheep | Roche | 11426346910 | FISH experiments (1:1,500)
Recombinant DNA reagent | Plasmid pJC53.2 | Addgene | 26536 | Cloning vector
Chemical compound, drug | Cy5-azide | Click Chemistry Tools | AZ118 | EdU detection
Chemical compound, drug | 5-ethynyl-2′-deoxyuridine (EdU) | Invitrogen | A10044 |
Chemical compound, drug | Vybrant DyeCycle Violet (DCV) | Invitrogen | V35003 | FACS
Chemical compound, drug | TOTO-3 | Invitrogen | T3604 | FACS

Data availability

The schistosome stem cell scRNA-seq data generated in this study are available through the Gene Expression Omnibus (GEO) under accession number GSE116920.

The following data sets were generated
    1. Xue Y
    2. Wang B
    (2018) NCBI Gene Expression Omnibus
    ID GSE116920. Single-cell RNA sequencing of proliferative stem cell population from juvenile Schistosoma mansoni.
The following previously published data sets were used
    1. Tang F
    2. Qiao J
    3. Li R
    (2013) NCBI Gene Expression Omnibus
    ID GSE36552. Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells.
    1. Goolam M
    2. Scialdone A
    3. Graham SJL
    4. Macaulay IC
    5. Jedrusik A
    6. Hupalowska A
    7. Voet T
    8. Marioni JC
    9. Zernicka-Goetz M
    (2016) ArrayExpress
    ID E-MTAB-3321. Heterogeneity in Oct4 and Sox2 targets biases cell fate in 4-cell mouse embryos.
    1. Tasic B
    2. Menon V
    3. Nguyen TN
    4. Kim TK
    5. Yao Z
    6. Gray LT
    7. Hawrylycz M
    8. Koch C
    9. Zeng H
    (2016) NCBI Gene Expression Omnibus
    ID GSE71585-GPL17021. Adult mouse cortical cell taxonomy revealed by single cell transcriptomics.
    1. Guo F
    2. Guo H
    3. Li L
    4. Tang F
    (2015) NCBI Gene Expression Omnibus
    ID GSE63818. The transcriptome and DNA methylome landscapes of human primordial germ cells.
    1. Kim JK
    2. Kolodziejczyk AA
    3. Ilicic T
    4. Illicic T
    5. Teichmann SA
    6. Marioni JC
    (2015) ArrayExpress
    ID E-MTAB-2600. Single cell RNA-sequencing of pluripotent states unlocks modular transcriptional variation.
    1. Wollny D
    2. Zhao S
    3. Martin-Villalba A
    (2016) NCBI Gene Expression Omnibus
    ID GSE80032. Single-cell analysis uncovers clonal acinar cell heterogeneity in the adult pancreas.
    1. Loh KM
    2. Chen A
    3. Koh PW
    4. Deng TZ
    5. Sinha R
    6. Tsai JM
    7. Barkal AA
    8. Shen KY
    9. Jain R
    10. Morganti RM
    11. Shyh-Chang N
    12. Fernhoff NB
    13. George BM
    14. Wernig G
    15. Salomon REA
    16. Chen Z
    17. Vogel H
    18. Epstein JA
    19. Kundaje A
    20. Talbot WS
    21. Beachy PA
    22. Ang LT
    23. Weissman IL
    (2016) NCBI SRA
    ID SRP073808. Mapping the pairwise choices leading from pluripotency to human bone, heart, and other mesoderm cell types.
    1. Deng Q
    2. Ramsköld D
    3. Reinius B
    4. Sandberg R
    (2014) NCBI Gene Expression Omnibus
    ID GSE45719. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells.
    1. Patel AP
    2. Tirosh I
    (2014) NCBI Gene Expression Omnibus
    ID GSE57872. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma.
    1. Rizvi AH
    2. Camara PG
    3. Kandror EK
    4. Roberts TJ
    5. Schieren I
    6. Maniatis T
    7. Rabadan R
    (2017) NCBI Gene Expression Omnibus
    ID GSE94883. Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development.
    1. Tang Q
    2. Langenau D
    (2017) NCBI Gene Expression Omnibus
    ID GSE100911. Dissecting hematopoietic and renal cell heterogeneity in adult zebrafish at single-cell resolution using RNA sequencing.
    1. Engel I
    2. Seumois G
    3. Chavez L
    4. Chawla A
    5. White B
    6. Mock D
    7. Vijayanand P
    8. Kronenberg M
    (2016) NCBI Gene Expression Omnibus
    ID GSE74596. Innate-like functions of natural killer T cell subsets result from highly divergent gene programs.
    1. Edsgard D
    2. Lanner F
    3. Sandberg R
    4. Petropoulos S
    (2016) ArrayExpress
    ID E-MTAB-3929. Single-cell RNA-seq reveals lineage and X chromosome dynamics in human preimplantation embryos.
    1. Burns JC
    2. Kelly MC
    3. Hoa M
    4. Morell RJ
    5. Kelley MW
    (2015) NCBI Gene Expression Omnibus
    ID GSE71982. Single-cell RNA-Seq resolves cellular complexity in sensory organs from the neonatal inner ear.
    1. Namani A
    2. Wang XJ
    3. Tang X
    (2017) NCBI Gene Expression Omnibus
    ID GSE94383. Measuring signaling and RNA-Seq in the same cell links gene expression to dynamic patterns of NF-κB activation.
    1. Biase FH
    2. Cao X
    3. Zhong S
    (2014) NCBI Gene Expression Omnibus
    ID GSE57249. Cell fate inclination within 2-cell and 4-cell mouse embryos revealed by single-cell RNA sequencing.
    1. Trapnell C
    2. Cacchiarelli D
    3. Grimsby J
    4. Pokharel P
    5. Li S
    6. Morse M
    7. Mikkelsen T
    8. Rinn J
    (2014) NCBI Gene Expression Omnibus
    ID GSE52529-GPL16791. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells.
    1. Pollen AA
    2. Nowakowski TJ
    3. Shuga J
    4. Wang X
    5. Leyrat AA
    6. Lui JH
    7. Li N
    8. Szpankowski L
    9. Fowler B
    10. Chen P
    11. Ramalingam N
    12. Sun G
    13. Thu M
    14. Norris M
    15. Lebofsky R
    16. Toppani D
    17. Kemp DW
    18. Wong M
    19. Clerkson B
    20. Jones BN
    21. Wu S
    22. Knutsson L
    23. Alvarado B
    24. Wang J
    25. Weaver LS
    26. May AP
    27. Jones RC
    28. Unger MA
    29. Kriegstein AR
    30. West JA
    (2014) NCBI SRA
    ID SRP041736. Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex.
    1. Buettner F
    2. Natarajan KN
    3. Casale FP
    4. Proserpio V
    5. Scialdone A
    6. Theis FJ
    7. Teichmann SA
    8. Marioni JC
    9. Stegle O
    (2015) ArrayExpress
    ID E-MTAB-2805. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells.
    1. Satija R
    (2014) NCBI Gene Expression Omnibus
    ID GSE48968-GPL13112. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation.
    1. Leng N
    2. Chu LF
    (2015) NCBI Gene Expression Omnibus
    ID GSE64016. Oscope identifies oscillatory genes in unsynchronized single-cell RNAseq experiments.
    1. Meyer SE
    2. Qin T
    3. Muench DE
    4. Masuda K
    5. Venkatasubramanian M
    6. Orr E
    7. Paietta E
    8. Tallman MS
    9. Fernandez H
    10. Melnick A
    11. Le Beau MM
    12. Kogan S
    13. Salomonis N
    14. Figueroa ME
    15. Grimes HL
    (2016) NCBI Gene Expression Omnibus
    ID GSE77847. DNMT3A haploinsufficiency transforms FLT3ITD myeloproliferative disease into a rapid, spontaneous, and fully penetrant acute myeloid leukemia.
    1. Treutlein B
    2. Quake SR
    (2014) NCBI Gene Expression Omnibus
    ID GSE52583-GPL13112. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq.
    1. Olsson A
    2. Venkatasubramanian M
    3. Chaudhri VK
    4. Aronow BJ
    (2016) NCBI Gene Expression Omnibus
    ID GSE70245. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice.
    1. Shin J
    2. Song H
    (2015) NCBI Gene Expression Omnibus
    ID GSE71485. Single-cell RNA-seq with waterfall reveals molecular cascades underlying adult neurogenesis.
    1. Schwalie PC
    2. Dong H
    3. Zachara M
    4. Russeil J
    5. Alpern D
    6. Akchiche N
    7. Caprara C
    8. Sun W
    9. Schlaudraff KU
    10. Soldati G
    11. Wolfrum C
    12. Deplancke B
    (2018) ArrayExpress
    ID E-MTAB-6677. A stromal cell population that inhibits adipogenesis in mammalian fat depots.
    1. Darmanis S
    2. Quake S
    (2017) NCBI Gene Expression Omnibus
    ID GSE84465. Single-Cell RNA-Seq analysis of infiltrating neoplastic cells at the migrating front of human glioblastoma.
    1. Scialdone A
    2. Tanaka Y
    3. Jawaid W
    4. Moignard V
    5. Wilson NK
    6. Macaulay IC
    7. Marioni JC
    8. Göttgens B
    (2016) ArrayExpress
    ID E-MTAB-4079. Resolving early mesoderm diversification through single-cell expression profiling.
    1. Enge M
    2. Arda HE
    (2017) NCBI Gene Expression Omnibus
    ID GSE81547. Single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns.
    1. Stévant I
    2. Neirjinck Y
    3. Borel C
    4. Escoffier J
    5. Smith LB
    6. Antonarakis SE
    7. Dermitzakis ET
    8. Nef S
    (2018) NCBI Gene Expression Omnibus
    ID GSE97519. Deciphering cell lineage specification during male sex determination with single-cell RNA sequencing.
    1. Phillips MJ
    2. Jiang P
    3. Howden S
    (2018) NCBI Gene Expression Omnibus
    ID GSE98556. A novel approach to single cell RNA-sequence analysis facilitates in silico gene reporting of human pluripotent stem cell-derived retinal cell types.
    1. Vanlandewijck M
    2. He L
    3. Mäe MA
    4. Andrae J
    5. Betsholtz C
    (2018) NCBI Gene Expression Omnibus
    ID GSE99235. A molecular atlas of cell types and zonation in the brain vasculature.
    1. Ghahramani A
    2. Watt F
    3. Luscombe N
    (2018) NCBI Gene Expression Omnibus
    ID GSE99989. Epidermal Wnt signalling regulates transcriptome heterogeneity and proliferative fate in neighbouring cells.
    1. Lescroart F
    2. Wang X
    3. Li X
    4. Gargouri S
    5. Moignard V
    6. Dubois C
    7. Paulissen C
    8. Göttgens B
    9. Blanpain C
    (2018) NCBI Gene Expression Omnibus
    ID GSE100471. Defining the earliest step of cardiovascular lineage segregation by single-cell RNA-seq.
    1. Mohammed H
    2. Hernando-Herraez I
    3. Reik W
    (2017) NCBI Gene Expression Omnibus
    ID GSE100597. Single-cell landscape of transcriptional heterogeneity and cell fate decisions during mouse early gastrulation.
    1. Mathys H
    2. Gao F
    3. Tsai L
    (2017) NCBI Gene Expression Omnibus
    ID GSE103334. Temporal tracking of microglia activation in neurodegeneration at single-cell resolution.
    1. Chevée M
    2. Robertson JD
    3. Cannon GH
    4. Brown SP
    5. Goff LA
    (2018) NCBI Gene Expression Omnibus
    ID GSE107632. Variation in activity state, axonal projection, and position define the transcriptional identity of individual neocortical projection neurons.
    1. Hook PW
    2. McClymont SA
    3. Cannon GH
    4. Law WD
    5. Morton AJ
    6. Goff LA
    7. McCallion AS
    (2018) NCBI Gene Expression Omnibus
    ID GSE108020. Single-Cell RNA-Seq of mouse dopaminergic neurons informs candidate gene selection for sporadic Parkinson disease.
    1. Mi D
    2. Li Z
    3. Li M
    (2018) NCBI Gene Expression Omnibus
    ID GSE109796. Early emergence of cortical interneuron diversity in the mouse embryo.
    1. Zanini F
    2. Pu S
    3. Bekerman E
    4. Einav S
    5. Quake SR
    (2018) NCBI Gene Expression Omnibus
    ID GSE110496. Single-cell transcriptional dynamics of flavivirus infection.
    1. Tirosh I
    2. Venteicher A
    3. Suva M
    4. Regev A
    (2016) NCBI Gene Expression Omnibus
    ID GSE70630. Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma.
    1. Amit I
    2. Tanay A
    3. Paul F
    4. Arkin Y
    5. Giladi A
    (2015) NCBI Gene Expression Omnibus
    ID GSE72857. Transcriptional heterogeneity and lineage commitment in myeloid progenitors.
    1. Smyth GK
    2. Chen Y
    3. Pal B
    4. Visvader JE
    (2017) NCBI Gene Expression Omnibus
    ID GSE95430. Construction of developmental lineage relationships in the mouse mammary gland by single-cell RNA profiling.
    1. Häring M
    2. Zeisel A
    3. Linnarsson S
    4. Ernfors P
    (2018) NCBI Gene Expression Omnibus
    ID GSE103840. Neuronal atlas of the dorsal horn defines its architecture and links sensory input to transcriptional cell types.
    1. Darmanis S
    2. Enge M
    3. Quake SR
    4. Sloan SA
    5. Barres BA
    6. Zhang Y
    7. Caneda C
    8. Hayden Gephart MG
    9. Shuer LM
    (2015) NCBI Gene Expression Omnibus
    ID GSE67835. A survey of human brain transcriptome diversity at the single cell level.
    1. Wang YJ
    2. Schug J
    3. Golson ML
    4. Won K
    5. Liu C
    6. Naji A
    7. Avrahami D
    8. Kaestner KH
    (2016) NCBI Gene Expression Omnibus
    ID GSE83139. Single-cell transcriptomics of the human endocrine pancreas.
    1. Veres A
    2. Baron M
    (2016) NCBI Gene Expression Omnibus
    ID GSE84133. A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure.
    1. Fincher CT
    2. Wurtzel O
    3. de Hoog T
    4. Kravarik KM
    5. Reddien PW
    (2018) NCBI Gene Expression Omnibus
    ID GSE111764. Cell type transcriptome atlas for the planarian Schmidtea mediterranea.
    1. Segerstolpe Å
    2. Palasantza A
    3. Eliasson P
    4. Andersson EM
    5. Andréasson AC
    6. Sun X
    7. Picelli S
    8. Sabirsh A
    9. Clausen M
    10. Bjursell MK
    11. Smith DM
    12. Kasper M
    13. Ämmälä C
    14. Sandberg R
    (2016) ArrayExpress
    ID E-MTAB-5061. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes.
    1. Muraro MJ
    2. Dharmadhikari G
    3. de Koning E
    4. van Oudenaarden A
    (2016) NCBI Gene Expression Omnibus
    ID GSE85241. A single-cell transcriptome atlas of the human pancreas.

Additional files

Supplementary file 1

Ranked gene list with high SAM weights in the schistosome stem cell data.

Gene IDs and annotations are based on the S. mansoni genome version 9 (WormBase, WS268). Each gene is assigned to the cluster of the marker gene (nanos-2, cabp, astf, or bhlh) with which it is most highly correlated. Genes found in our prior work (Wang et al., 2018) to be enriched in subsets of stem cells are indicated.

Supplementary file 2

Datasets used in this study.

Accession numbers, library size normalization methods, data preprocessing methods, sensitivity scores, and corresponding references are provided for each dataset. Accession numbers marked with asterisks indicate datasets sourced from the conquer database (Soneson and Robinson, 2018). Accession numbers marked with crosses indicate the nine well-annotated datasets used for benchmarking.

Supplementary file 3

ARI clustering accuracy of individual annotated cell types.

The ARI scores of SAM, Seurat, SC3, and SIMLR applied to the nine benchmarking datasets are provided for each annotated ground truth cluster.

Supplementary file 4

Cloning primer sequences used for generating riboprobes for the FISH experiments and primer sequences for qPCR analysis.

Functional annotations of the genes are based on the S. mansoni genome version 9 (WormBase, WS268).

Transparent reporting form
